How to handle "Javascript disabled" issue when web scraping

mobo_dot · September 13, 2024, 6:19pm

Hey guys, I need help while trying to scrape data from a listing website.

The scenario runs without errors; however, I’m unable to scrape all the listings on the site.

My scenario only scrapes 3 listing results correctly; after that, the remaining output bundles contain HTML with “Javascript disabled” information, which prevents me from scraping the data on subsequent pages.

I tried simulating normal user interaction by generating random values for the sleep modules placed just before the HTTP request modules, but it doesn’t seem to work.

Can anyone provide me with a solution to this?

Attached below is a copy of my scenario for anyone looking to help.

Thanks.
blueprint(1).json (399.5 KB)

Donald_Mitchell · September 13, 2024, 7:25pm

Hello @mobo_dot and welcome to the Make Community!

You may need to use ScrapeNinja or some other app that has a scrape module for this.
Also, check out this post: Scraping data from website like Reddoorz

mobo_dot · September 13, 2024, 7:52pm

Hey man, Thanks for the reference.

I thought I could do it using just the HTTP request module, but after going through some of the articles on here, I figured out that the HTTP request module cannot run client-side JavaScript.
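For anyone who wants to confirm this on their own output bundles, here is a rough Python sketch of how you might flag a response that is only the “Javascript disabled” fallback page. The marker phrases and the sample HTML strings are assumptions for illustration; check what the target site actually returns and adjust the patterns.

```python
import re

# Hypothetical marker phrases (assumptions): adjust to the fallback text
# the target site really sends when scripts are not executed.
JS_REQUIRED_PATTERNS = [
    r"javascript\s+(is\s+)?disabled",
    r"enable\s+javascript",
    r"requires?\s+javascript",
]


def looks_js_blocked(html: str) -> bool:
    """Return True if the HTML looks like a 'JavaScript disabled' fallback page."""
    text = html.lower()
    return any(re.search(pattern, text) for pattern in JS_REQUIRED_PATTERNS)


# Made-up sample responses, not taken from the real site.
blocked_page = (
    "<html><body><noscript>Javascript disabled. "
    "Please enable JavaScript.</noscript></body></html>"
)
listing_page = "<html><body><div class='listing'>2-bed flat</div></body></html>"

print(looks_js_blocked(blocked_page))
print(looks_js_blocked(listing_page))
```

A check like this only tells you that the plain fetch did not get the real content; it does not get around the problem, which still needs something that runs the page scripts.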

I will try out ScrapeNinja and see how it goes.

samliew · September 14, 2024, 9:58am

Welcome to the Make community!

So you basically need to “visit” the site yourself to get the content. This is called Web Scraping.

Web Scraping

For web scraping, you can use a service such as ScrapeNinja to get content from the page.

ScrapeNinja allows you to use jQuery-like selectors to extract content from elements by using an extractor function. ScrapeNinja can also run the page in a real web browser, loading all the content and running the page load scripts, so the result closely matches what you see in your own browser, as opposed to just the raw page HTML fetched by the HTTP module.
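To illustrate the general idea of an extractor function (this is not ScrapeNinja's API, just a standard-library Python sketch of the same concept), the code below pulls the text of every element with a given class out of already-rendered HTML. The class name `listing` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside a matching element
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1  # entering a matching element
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # matching element closed
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract(html: str, css_class: str) -> list:
    """Return the text content of all elements with the given class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.results


# Made-up rendered HTML, standing in for what a real browser would produce.
rendered = (
    '<ul><li class="listing"><b>Flat A</b> - $100</li>'
    '<li class="listing">Flat B<br> - $200</li>'
    '<li class="ad">Sponsored</li></ul>'
)
print(extract(rendered, "listing"))
```

The important part is the input: an extractor like this only finds the listings if the HTML it receives was produced after the page scripts ran, which is what the real-browser rendering option is for.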

If you want an example, take a look at Grab data from page and url - #5 by samliew

AI-powered “easier” method

You can also use AI-powered web scraping tools like Dumpling AI.

This is probably the easiest and quickest method to set up, because all you need to do is describe the content that you want, instead of inspecting elements to create selectors or coming up with regular expression patterns.

The plus side is that such services combine BOTH fetching and extracting the data in a single module (saving operations), and do away with the lengthy setup of the other methods.

For more information on the different methods of web scraping, see Overview of Different Web Scraping Techniques in Make 🌐

Hope this helps! Let me know if there are any further questions or issues.

— @samliew

P.S.: Investing some effort into the Make Academy will save you lots of time and frustration using Make.