Hi, I have a quick question.
I am trying to learn how to use an HTTPS request module to scrape real estate websites, but I've found that certain websites return an "unexpected error contact support" message. I then tried to do the same thing with another real estate website and it worked perfectly. Both times I didn't use any headers or parameters, so I know it's to do with the specific website. I've since run the request through Postman to get more information on the error, but it just says error 429 Too Many Requests.
My questions are:
Why do some websites not work? Are there restrictions in place that block the GET request?
If there are restrictions and I need to scrape a website (say, for a client), how do I overcome this problem so I can get the HTML data?
This could be an intermittent server issue on the external service, but a 429 response points elsewhere: yes, it is entirely possible that external websites implement anti-scraping measures, and 429 Too Many Requests specifically means the site is rate-limiting your client, often because it has identified the traffic as automated.
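One way to tell the two cases apart is to look at the status code you get back. A minimal, hypothetical helper (the function name and labels are my own, not from any library) might look like this:

```python
def classify_response(status_code: int) -> str:
    """Rough triage of an HTTP status code from a scraping request."""
    if status_code == 429:
        return "rate-limited"    # too many requests from your IP; slow down
    if status_code in (401, 403):
        return "blocked"         # auth wall or deliberate anti-scraping block
    if 500 <= status_code < 600:
        return "server-error"    # likely intermittent; worth retrying later
    if 200 <= status_code < 300:
        return "ok"
    return "other"
```

Under this rough scheme, 5xx suggests a problem on their end, while 403/429 suggests the site is intentionally restricting you.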
One option is to use a service that specialises in getting around these restrictions (typically via rotating proxies, realistic browser headers, and CAPTCHA handling).
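Whatever you use, pacing your requests helps avoid the 429 in the first place. Here is a sketch, assuming a simple minimum-interval scheme (the class and parameter names are illustrative; the clock and sleep functions are injectable so the logic can be tested without real delays):

```python
import time

class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # timestamp of the previous request, if any

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

You would call `limiter.wait()` immediately before each request, so bursts are spread out instead of hitting the site's rate limit.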
Also, handle errors so your scenarios don't stop. You might want to add some error handling to your modules, so the failing module(s) can automatically be retried or ignored. By adding an error handler to the module, the scenario won't throw an error and get turned off.
Error directives can only be added to an error handler route.