I am grabbing URLs from a list. They come from all over the internet, so the pages vary in length.
I convert them to plain text, hoping to strip out a lot of the garbage.
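For illustration, a conversion step like this could be sketched in Python using only the standard library. This is an assumption about what such a step might look like, not necessarily the tool used here; the names `TextExtractor` and `html_to_text` and the sample HTML are made up for the example. It also shows why plain tag stripping leaves the junk in: navigation and footer text survive right alongside the article.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Strip tags from an HTML string and return the remaining text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page: one article paragraph surrounded by page chrome.
sample = """<html><head><title>T</title><style>p{color:red}</style></head>
<body><nav>Home | About</nav><p>The article body.</p>
<script>track()</script><footer>Subscribe now</footer></body></html>"""

result = html_to_text(sample)
print(result)
```

Scripts and styles are dropped, but the nav and footer text still end up in the output, which is exactly the kind of extra material that inflates the prompt.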
When I try to get summaries of them from ChatGPT 3, I usually exceed the prompt window. The text being sent is far more than just the article itself; it includes all the extra junk you always see on web pages.
Is there a way to extract just the articles themselves, without all the extra stuff? Again, these articles come from all over, so I doubt there’ll be standard tagging across the lot of them.