How to grab only the main content and main image URL from a webpage?

Hello everyone. First-time user, first-time poster.

I’m trying to get the main content and main image URL from a webpage. I used an HTTP GET request, and it returned the full HTML source. Is there a way to get only the parts I want? Thank you in advance 🙏

Welcome to the Make community!

So you basically need to “visit” the site yourself to get the content. This is called Web Scraping.

Web Scraping

For web scraping, you can use apps such as ScrapingBee and ScrapeNinja to get content from the page.

I’ve used ScrapeNinja, and you can use jQuery-like selectors in the extractor function.
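
To illustrate the general idea of pulling only selected pieces out of the full HTML, here is a minimal Python sketch that runs outside Make. It is not ScrapeNinja's extractor (which is written in JavaScript), and it rests on two assumptions that will not hold for every site: the main image is exposed through an `og:image` meta tag, and the main text sits inside an `<article>` element. The names `MainContentParser` and `extract_main`, as well as the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser


class MainContentParser(HTMLParser):
    """Collects the og:image URL and the text found inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.image_url = None      # first og:image URL seen, if any
        self._article_depth = 0    # > 0 while we are inside an <article>
        self._chunks = []          # text fragments collected from the article

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the page declares its main image via Open Graph metadata.
        if tag == "meta" and attrs.get("property") == "og:image":
            if self.image_url is None:
                self.image_url = attrs.get("content")
        elif tag == "article":
            self._article_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self._article_depth > 0:
            self._article_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that appears inside <article>.
        text = data.strip()
        if self._article_depth > 0 and text:
            self._chunks.append(text)

    @property
    def content(self):
        return " ".join(self._chunks)


def extract_main(html):
    """Return the main image URL and article text from an HTML string."""
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    return {"image_url": parser.image_url, "content": parser.content}


# Hypothetical sample page, standing in for the HTML returned by an HTTP request.
sample = """
<html><head>
<meta property="og:image" content="https://example.com/cover.jpg">
</head><body>
<nav>Home | About</nav>
<article><h1>Title</h1><p>First paragraph.</p></article>
<footer>Copyright</footer>
</body></html>
"""

result = extract_main(sample)
print(result)
```

The navigation and footer text are ignored because they sit outside `<article>`. On a real site you would first inspect the page to find which element or meta tag actually holds the content you want, which is the same step you would do when writing selectors for a scraping app.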

ScrapeNinja can also run the page in a real web browser, loading all the content and running the page-load scripts, so the result closely matches what you see in your own browser, as opposed to just the raw page HTML fetched by the HTTP module.

If you want an example, take a look at Grab data from page and url - #5 by samliew

AI-powered “easier” method

You can also use AI-powered web scraping tools like Dumpling AI.

This is probably the easiest and quickest way to set up, because all you need to do is describe the content that you want, instead of inspecting elements to create selectors or coming up with regular expression patterns.

The upside is that such services combine BOTH fetching and extracting the data in a single module (saving operations), and do away with the lengthy setup of the other methods.

For more information on the different methods of web scraping, see Overview of Different Web Scraping Techniques in Make 🌐

Hope this helps! Let me know if there are any further questions or issues.

@samliew


P.S.: Did you know that the concepts behind about 70% of the questions asked on this forum are already covered in the Make Academy? Investing some effort into it will save you lots of time and frustration when using Make later!