Get only article text from a remote page

samliew · January 4, 2024, 2:30am

Usually RSS feeds contain the body text/content. Try using the data contained within the RSS feed.

You could probably try to configure a module that does web scraping (like ScrapeNinja, ScrapingBee), to extract the selectors you mentioned (article/paragraphs/headers).

Alternatively, you can try to use Feedly to aggregate different RSS feeds for you.

A possible solution would be to use AI (OpenAI GPT) to parse the raw web page data and return you only the article bits you need (structured data).

Screenshot_2024-01-04_100138

Topic		Replies	Views
I am having trouble getting Chatgpt to read a webpage Beginner Questions connections	4	1269	August 2, 2024
Get HTML content without the rubbish (buttons,ads...) Questions http	9	650	February 20, 2024
News Automation (RSS -> Scraptio -> OpenAI --> Google Sheet): almost there, please help! Questions error	6	501	September 11, 2024
How to extract information in an http page using http request Questions api	6	310	July 31, 2024
Scraping a news story Questions api	2	83	August 14, 2025

Get only article text from a remote page

Related topics