I have an automation scenario in which the Exa web crawler module sometimes throws errors. I currently mitigate this with a break error handler, and about 80% of the time that is sufficient, since the Exa module can continue thanks to the break retry attempts.
But 20% of the scenarios still lead to an error after the number of retry attempts (currently 3).
I'd like to mitigate that remaining case as well: after the break retry attempts are exhausted, the flow should continue by resuming with another web crawler module from Apify.
So:
- Exa crawler module errors
- break attempt #1
- break attempt #2
- break attempt #3
- resume with the Apify crawler module
Is this possible? Thanks in advance for thinking along.
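To make the intent concrete, here is a minimal pseudo-code sketch of the behaviour I'm after; `crawl_with_exa` and `crawl_with_apify` are just placeholders, not real Exa or Apify API calls:

```python
# Sketch of the intended behaviour only; the two crawl functions are
# placeholders, not actual Exa/Apify SDK calls.

class CrawlError(Exception):
    """Raised when a crawl attempt fails (e.g. the 'not indexed' 400 error)."""

def crawl_with_exa(url: str) -> str:
    raise CrawlError(f"Exa could not crawl {url}")  # placeholder primary crawler

def crawl_with_apify(url: str) -> str:
    return f"<content of {url} via fallback crawler>"  # placeholder fallback crawler

def crawl_with_fallback(url: str, max_retries: int = 3) -> str:
    # break-style error handling: retry the primary crawler a few times
    for _ in range(max_retries):
        try:
            return crawl_with_exa(url)
        except CrawlError:
            continue
    # resume-style error handling: fall back to the secondary crawler
    return crawl_with_apify(url)

print(crawl_with_fallback("https://example.com/article"))
```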
@stenkate
Wouldn’t using Ignore in the Error Handling Module solve the problem?
This may not be an accurate answer, because we don't know what the current scenario looks like, which errors are occurring, or whether retrying on errors is needed in the first place.
Thanks for your response, and sorry for my late reply. I prefer Exa's web page extraction capabilities over any similar service, and the break error retries already mitigate most of the initial errors. Unfortunately, those retries don't catch every error. For that case (where the 3x break retry does not resolve the error) I'd like to use an additional web scraping service.
If I understand correctly, your proposal to use an Ignore error handler instead of a break retry would mean I'd miss the web data to be scraped/extracted entirely whenever the first Exa attempt fails. I can't afford to miss that data, hence I'm looking for a more robust solution.
Thanks for the explanation. I finally understand.
You are right, Break Error Handling is indeed the best choice.
I am not sure whether the error is caused by Exa(?). Would the easiest solution be to increase the number of retries?
I feel that this is probably not practical due to the increased number of operations.
Thanks. The error is caused by Exa in this case: my requested webpages are not (yet) part of their index, and the service returns a specific 400 error. My 3x break retry configuration already mitigates that error in most cases, but I don't think increasing the retry count will solve the remaining errors.
That's why I'm looking for a solution where another web scraping service acts as resume error handling for the case where the 3x break retries still end in that 400 error.
I am sorry, but I have never used the scraping service itself, so I cannot suggest an alternative.
I hope other community members can help.
Thanks, I understand.
More generally, I'm hoping to find an answer to the question of how to use break and resume error handling together in one flow. We'll see!
After contacting Make support and doing some experimentation of my own, I have an update with a working workaround, which I'd like to share with the community for completeness.
Important conclusion:
It does not seem possible to combine resume error handling with break error handling.
However:
I mitigated this with the setup below (a rough code-style sketch of the same logic follows the list):
table #1 = each record is an online article with e.g. url and text description
table #2 = each record is an error from the web crawler module
- add a ‘create record’ (in table #2) module between the web crawling module and the error handling, where each record contains the url that failed to crawl
- create another Make automation flow with the following setup:
- search records: search the previously created error records in table #2
- text aggregator: row separator = , and group by url
- filter with numeric operator: greater than or equal to 3 (because the max number of break error retries in the original Make flow is set to 3)
{{length(split(2.text; ","))}}
- crawl module: crawl the url from the text aggregator result for the text description
- search records: search (in table #1) for the record IDs from the original Make automation, based on the url from the text aggregator result (which was crawled in the previous step)
- update records: update the records from the previous step with the crawled output, based on their IDs
- search errors: search for the error record IDs in table #2, based on the url from the text aggregator result
- delete records: delete the records from the previous step, based on their IDs
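For anyone who finds it easier to follow as code, here is a rough Python sketch of what that second flow does. The in-memory "tables" and the `crawl_with_apify` function are illustrative placeholders, not real Make or Apify APIs:

```python
# Rough sketch of the second flow's logic; the in-memory "tables" and the
# crawl_with_apify function are placeholders, not real Make or Apify APIs.
from collections import Counter

MAX_RETRIES = 3  # matches the break retry count in the original flow

# table #1: one record per online article
articles = [{"id": 1, "url": "https://example.com/a", "text": ""}]

# table #2: one error record per failed crawl attempt
errors = [{"url": "https://example.com/a"}] * 3

def crawl_with_apify(url: str) -> str:
    return f"<content of {url} via the fallback crawler>"  # placeholder

# text aggregator + filter: count error records per url
error_counts = Counter(e["url"] for e in errors)

for url, count in error_counts.items():
    if count < MAX_RETRIES:
        continue                               # url has not exhausted its retries yet
    text = crawl_with_apify(url)               # crawl module: fallback crawl
    for record in articles:                    # search + update records in table #1
        if record["url"] == url:
            record["text"] = text
    errors = [e for e in errors if e["url"] != url]  # search + delete error records

print(articles)
```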