MrStack
February 5, 2024, 11:00pm
1
Hi all,
I am reading RSS data from various feeds, and it works OK.
But some feeds have an HTML link to the full news story. I can iterate through these links, but when I do I also get junk that I don’t want, such as the site logo, menu, etc.
Here is an example of an RSS feed:
https://pitchfork.com/feed/feed-news/rss
Within the RSS feed, you might see a link such as
Is there a way of getting just the story content (without the header menu, footer menu, signup button, and read more section)?
Thanks
Gar
samliew
February 5, 2024, 11:57pm
2
Yes, you can use web scraping apps like ScrapeNinja and ScrapingBee where you can specify which sections of the website you want to return.
3 Likes
Have you tried the Text Parser module called HTML to Text?
2 Likes
MrStack
February 6, 2024, 10:37pm
4
OK, thanks.
This looks like what I need, but I am getting a 403 error with my RapidAPI key.
I have opened a new topic on that.
Thanks
Gar
MrStack
February 6, 2024, 10:38pm
5
Thanks, but I think that I want to keep the HTML content (images, etc.).
1 Like
Hi @samliew
This works well for me, thanks.
But I am wondering: what if I want to select an element whose class attribute contains a space (e.g. “article main-content”)?
Is this possible?
I have tried adding extra single quotes: “‘.article main-content’”
I have also tried “.article.main-content”
Any ideas?
Thanks
samliew
February 7, 2024, 10:35pm
7
.article.main-content
should be correct. No single quotes.
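To illustrate why the dotted form works: a class attribute such as “article main-content” is a space-separated list of two class names, and `.article.main-content` asks for an element that carries both. With a space in the selector (`.article main-content`), CSS would instead look for a `<main-content>` element nested inside `.article`. The snippet below is only a simplified model of that matching rule in plain JavaScript; the helper names are made up for illustration and this is not how cheerio implements selectors.

```javascript
// Illustration only: a simplified model of how a compound class selector
// such as ".article.main-content" is checked against a class attribute.
// The function names are hypothetical and not part of cheerio.

// Split a class attribute value into its individual class names.
function classTokens(classAttr) {
  return classAttr.trim().split(/\s+/).filter(Boolean);
}

// True when every class named in the compound selector is present on the element.
function matchesCompoundClassSelector(classAttr, selector) {
  const wanted = selector.split(".").filter(Boolean);
  const present = new Set(classTokens(classAttr));
  return wanted.length > 0 && wanted.every((name) => present.has(name));
}

// class="article main-content" carries both classes.
console.log(matchesCompoundClassSelector("article main-content", ".article.main-content")); // true
// class="article" is missing "main-content".
console.log(matchesCompoundClassSelector("article", ".article.main-content")); // false
// Order and extra classes do not matter.
console.log(matchesCompoundClassSelector("main-content article wide", ".article.main-content")); // true
```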
If you need further assistance, please provide the following:
1. Extractor function
Please provide the contents of the extractor function here. Paste the text formatted in this manner:
Either add three backticks before and after the code, like this:
```
input/output bundle content goes here
```
Or use the format code button in the editor.
2 Likes
MrStack
February 10, 2024, 1:49am
8
function (input, cheerio) {
  let $ = cheerio.load(input);
  return {
    title: $("h1").text().trim(),
    excerpt: $(".body__inner-container").text().trim(),
    body: $(".article.main-content").text().trim()
  };
}
The above code is what I tried.
The end result is that the excerpt and body values are the same.
Thanks for any help.
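Since the goal earlier in the thread was to keep the HTML (images etc.) rather than plain text, one possible variant is to return markup with cheerio's `.html()` instead of `.text()`. This is a rough sketch, assuming the same `(input, cheerio)` extractor signature used above; the name `extractArticle` and the `bodyHtml` key are made up for illustration, and the selectors are copied from this thread, so they may not match other sites.

```javascript
// Sketch only: same extractor shape as in the thread, but keeping markup.
// "extractArticle" and "bodyHtml" are hypothetical names.
function extractArticle(input, cheerio) {
  const $ = cheerio.load(input);
  // Limit the match to the first element that has both classes.
  const article = $(".article.main-content").first();
  return {
    title: $("h1").first().text().trim(),
    // .html() returns the inner HTML of the first matched element,
    // or null when nothing matched, so images and links are preserved.
    bodyHtml: article.html()
  };
}
```

If `bodyHtml` comes back as `null`, the selector did not match anything on that page, which is a quick way to tell a selector problem apart from a fetch problem.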
samliew
February 10, 2024, 11:24am
9
I tried your function above, with no modifications, in the ScrapeNinja Live Sandbox, and it gives the correct result.
2 Likes