Scraping Timeout Roadblock

Good morning Community Makers, hope you’re all doing well! I’m running the “Website Content Crawler” Actor by Apify and I’ve run into an issue with the automation I’m building. Here’s what’s happening:

  • When a lead enters my database (Airtable), I want the Website Content Crawler to scrape their website and produce LLM-ready data.
  • A manual button in Airtable triggers the run on that website.
  • There’s a 120-second timeout, whether you run the scrape synchronously or not.
  • Because of this, websites beyond a certain page count can’t be scraped in one run. For example, I’m testing with a website that has 200+ pages.

Instead of one workflow within Make.com, I’ll have to make two workflows:

  1. The Airtable button triggers the run, starts the “Run an Actor” Apify module, and stores the Dataset ID & Run ID in Airtable.
  2. A second workflow, started by a manual trigger, retrieves the “Get Dataset Items” results from the Run ID or the Dataset ID (I’m not clear which) and pulls in all the scraped bundles.
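On the “Run ID or Dataset ID” question: a run’s details include its default dataset, so either identifier gets you there. A minimal sketch of what the second workflow does, using Apify’s v2 REST endpoints (the token value is a placeholder you’d supply; the Make.com HTTP module can make the same calls):

```python
# Sketch: resolving a Run ID to its dataset and fetching the scraped items.
# Endpoints follow the Apify API v2; APIFY_TOKEN is your own token.
import json
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"

def run_details_url(run_id: str, token: str) -> str:
    # GET /v2/actor-runs/{runId} returns the run object, whose
    # "defaultDatasetId" field is the Dataset ID you need.
    return f"{API_BASE}/actor-runs/{run_id}?token={token}"

def dataset_items_url(dataset_id: str, token: str) -> str:
    # GET /v2/datasets/{datasetId}/items returns the scraped records
    # (one object per crawled page for Website Content Crawler).
    return f"{API_BASE}/datasets/{dataset_id}/items?token={token}&format=json"

def fetch_json(url: str):
    # Plain stdlib HTTP GET; the "Get Dataset Items" module does the same.
    with urlopen(Request(url)) as resp:
        return json.load(resp)

def fetch_items_for_run(run_id: str, token: str):
    # Run ID -> run details -> default Dataset ID -> items.
    run = fetch_json(run_details_url(run_id, token))["data"]
    return fetch_json(dataset_items_url(run["defaultDatasetId"], token))
```

So even if Airtable only stored the Run ID, one extra lookup recovers the Dataset ID.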

Question: Is there a way to trigger the “Get Dataset Items” Apify module within Make when the run completes? I’m not sure if that’s a native feature. I also took a screenshot of the API for that module and am wondering if that could do it.

Thanks for the attention!

I resolved the issue - you can create a custom webhook in Make and have Apify call it. Go to the menu section of your Actor and look for ‘Integrations’. From there, create a webhook in Apify, paste in the URL of your Make custom webhook, and under ‘Event types’ select “Run succeeded”. This fires the trigger when the run finishes, similar to what a ‘Watch Results’ module would do in Airtable, but specifically for website scrapes or operations that run too long.
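For anyone wiring this up: below is a sketch of the JSON the Make webhook receives when that event fires. The field names follow Apify’s default webhook payload template (`eventType`, `eventData`, `resource`), but verify them against a test run before mapping them in Make; the IDs here are placeholders:

```python
# Sketch of Apify's "Run succeeded" webhook payload and how the second
# workflow can pull the Dataset ID straight out of it. Placeholder IDs.
import json

sample_payload = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",          # the event selected above
    "eventData": {"actorId": "ACTOR_ID", "actorRunId": "RUN_ID"},
    "resource": {"id": "RUN_ID", "defaultDatasetId": "DATASET_ID"},
})

def dataset_id_from_webhook(body: str) -> str:
    # The run object in "resource" carries defaultDatasetId, so the
    # workflow can go straight to "Get Dataset Items" with no extra lookup.
    payload = json.loads(body)
    if payload["eventType"] != "ACTOR.RUN.SUCCEEDED":
        raise ValueError("unexpected event type")
    return payload["resource"]["defaultDatasetId"]
```

In Make, the same mapping is just pointing “Get Dataset Items” at the webhook bundle’s `resource.defaultDatasetId`.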

Hope this helps someone save hours of time in the future :joy:
