I have been reading up on web scraping for the last few days and wanted to ask whether it is possible to scrape information from sites such as AliExpress.
Yesterday I saw a tutorial somewhere here, tried it, and it worked. Everything was based on an HTML page.
Then I tried a product page from AliExpress, and it just returned a lot of code/scripts that I currently can't make sense of.
After that I found a web scraper (ParseHub). I haven't tried it yet, but it seems promising as a starting point.
Nevertheless…
Is there a way to get the data that is on the AliExpress page with Make, or do I have to use an additional program?
Can someone tell me the best way to do that so I can figure it out?
Hello @Cetryn,
I've used ParseHub to scrape various web pages for my customers.
ParseHub offers an easy-to-use approach: you just select elements on the page. You can also grab multiple pages based on a top-level listing and pagination. They also provide easy-to-use API integration.
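For reference, once a ParseHub project is set up, its results can also be pulled into a flow over HTTP. The sketch below is only a rough illustration assuming ParseHub's v2 REST API (project token, run token, API key); the tokens are placeholders, and the exact endpoints and response fields should be verified against their current documentation.

```js
// Rough sketch: trigger a ParseHub run and fetch its results over HTTP.
// PROJECT_TOKEN and API_KEY are placeholders; verify endpoints in ParseHub's docs.
const API_KEY = 'YOUR_API_KEY';
const PROJECT_TOKEN = 'YOUR_PROJECT_TOKEN';

// Start a new run of the project.
const runResponse = await fetch(
  `https://www.parsehub.com/api/v2/projects/${PROJECT_TOKEN}/run`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ api_key: API_KEY }),
  }
);
const { run_token } = await runResponse.json();

// Once the run has finished, download the extracted data as JSON.
const dataResponse = await fetch(
  `https://www.parsehub.com/api/v2/runs/${run_token}/data?api_key=${API_KEY}&format=json`
);
console.log(await dataResponse.json());
```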
Note: all of these suggestions are based on my experience; it's possible that you get different results.
So you basically need to “visit” the site yourself to get the content. This is called Web Scraping.
Web Scraping
For web scraping, you can use a service like ScrapeNinja to get the content from the page.
ScrapeNinja lets you use jQuery-like selectors to extract content from elements via an extractor function. It can also run the page in a real web browser, loading all the content and running the page-load scripts, so it closely simulates what you see in your own browser, as opposed to the raw page HTML fetched with the HTTP module.
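To give an idea of what that looks like, here is a minimal extractor sketch in the function style ScrapeNinja uses (a function receiving the page HTML and a cheerio instance). The selectors are hypothetical placeholders; you would swap in the ones you find on the actual product page.

```js
// Minimal extractor sketch in ScrapeNinja's style.
// `input` is the page HTML, `cheerio` is a jQuery-like parser.
// The selectors below are hypothetical placeholders, not AliExpress's real class names.
function extract(input, cheerio) {
  const $ = cheerio.load(input);

  return {
    // Product title text, trimmed of surrounding whitespace.
    title: $('.product-title').first().text().trim(),

    // Displayed price as a plain string.
    price: $('.product-price-value').first().text().trim(),
  };
}
```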
You can also use AI-powered web scraping tools like Dumpling AI.
This is probably the easiest and quickest way to set up, because all you need to do is describe the content you want, instead of inspecting elements to create selectors or coming up with regular expression patterns.
The plus side is that such services combine both fetching and extracting the data in a single module (saving operations), doing away with the lengthy setup of the other methods.
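As a purely illustrative sketch, a call to such a service usually boils down to a single HTTP request that takes a URL and a plain-language description of the fields you want back. The endpoint, body fields, and response shape below are invented placeholders, not Dumpling AI's actual API; check the provider's documentation for the real format.

```js
// Hypothetical sketch of an AI-powered extraction request.
// The URL, fields, and response shape are placeholders, not a real provider's API.
const response = await fetch('https://api.example-ai-scraper.com/extract', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer YOUR_API_KEY', // placeholder credential
  },
  body: JSON.stringify({
    url: 'https://www.aliexpress.com/item/1234567890.html', // placeholder product URL
    // Describe the fields you want instead of writing selectors.
    prompt: 'Return the product title, price, and rating as JSON.',
  }),
});

console.log(await response.json()); // e.g. { title: '...', price: '...', rating: '...' }
```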
I did some testing with ParseHub, but the results were somehow incorrect, or I was blocked after too many scrapes.
I had some success with ScrapeNinja, but I don't know how to extract specific data. I think I need to learn some code. I tried the extractors and didn't get the right results.
Do I have to write the extractors myself for my needs? Right now I just get a huge list of something.
As far as I understand it, I have to find the elements of the website (in the browser console) and then write an extractor that pulls the data from the page. Is that correct?
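For example, as far as I understand it, I could first test a selector directly in the browser console before putting it into an extractor, roughly like this (the class name is just a placeholder I picked, not the real one):

```js
// Run in the browser console on the product page to test a selector.
// '.product-price-value' is a placeholder; substitute the class found via "Inspect".
document.querySelectorAll('.product-price-value').forEach((el) => {
  console.log(el.textContent.trim());
});
```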
I am just writing to understand the workflow…
I have currently found a semi-automatic workaround, but I would like to replace that workflow later.
Did you know this forum has a Hire a Pro category, where you can post your request for off-site specialised help on other platforms (video call/screenshare/private messaging/etc.)? This may help you get your issue resolved faster, especially if it is urgent. It is important to post your request in the Hire a Pro category, as forum members are not allowed to advertise their services elsewhere (like here).
Hello @Cetryn,
By default, ParseHub doesn't support IP rotation on its free plan; IP rotation is only available on paid plans. When a webpage is protected against scraping, it will always be a struggle to get data from it. If you decide to try one of ParseHub's paid plans, also confirm which type of IPs are used: normal datacenter IPs or residential proxies. Check their documentation for more about this.