I'm using Apify to scrape the top 10 Google SERP results for a given search term.
Using an Iterator to output each URL as a single line item.
Passing each URL to Firecrawl via the HTTP module.
Inserting the data response from Firecrawl into a new paragraph in a Google Doc.
– These two steps should loop until every URL from step 2 has been crawled and added to the Google Doc.
I then want to Get Content of Document for the Google Doc so that I can pass it to ChatGPT for analysis in future steps.
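The intended flow, sketched loosely in Python (this is just an analogy, not Make itself; the stub functions stand in for the Apify, Firecrawl, and Google Docs modules, and all names and data are illustrative only):

```python
# Rough sketch of the intended scenario (stub functions stand in for the
# Apify, Firecrawl, and Google Docs modules; names are illustrative only).

def apify_serp(term):
    # Apify: top-10 SERP results; note that some results can have a blank URL
    return [{"url": "https://example.com/a"},
            {"url": ""},
            {"url": "https://example.com/b"}]

def firecrawl_scrape(url):
    # Firecrawl via the HTTP module: returns page content for one URL
    return f"scraped content of {url}"

paragraphs = []
for bundle in apify_serp("example search term"):  # Iterator: one bundle per URL
    if not bundle["url"]:                         # Filter: "if URL exists"
        continue
    paragraphs.append(firecrawl_scrape(bundle["url"]))  # insert a paragraph

# Get Content of Document: the consolidated text passed to ChatGPT later
doc_text = "\n\n".join(paragraphs)
```

The point of contention is the last line: in code the join trivially happens after the loop, whereas in Make something has to tell the scenario that the loop is finished.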
HOWEVER, no matter what I try, I cannot get the scenario to wait until all 10 URLs have been looped through.
I’m not helped by the fact that the Apify scraper sometimes returns a blank URL field. I have worked around this by adding a Filter after the Iterator in step 2: “if URL exists”.
Please help a noob! Thank you all
Steps taken so far
Tried the Repeat function, and tried adding an Aggregator after the Google Doc step - about 6 hours of YouTube, Perplexity, and ChatGPT, but I am pretty sure I am asking the wrong questions at this point! Next step, beer.
You need to set the “Source Module” field of the aggregator to where the bundles are coming from. This is usually an Iterator module, but can also be a search/list/repeater module.
For more information, please refer to the Make Academy.
Aggregators
Every result (item/record) from an iterator/list/search/match module is output as a separate bundle. This can result in multiple bundles, which then trigger multiple operations in subsequent modules (one operation per bundle). To “combine” multiple bundles into a single variable, you’ll need to use an aggregator of some sort.
Aggregators are modules that accumulate multiple bundles into one single bundle. An example of a commonly used aggregator module is the Array Aggregator. Another popular one is the Text Aggregator, which is very flexible and applies to many use cases, like building JSON, CSV, or HTML.
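As a loose analogy in Python (not how Make implements it internally), an iterator fans one bundle out into many, and an aggregator folds them back into one:

```python
# Iterator output: one bundle (here, a dict) per item
bundles = [{"data": "page 1 text"}, {"data": "page 2 text"}]

# Text Aggregator: joins one mapped field from every bundle into a single string
aggregated_text = "\n".join(b["data"] for b in bundles)

# Array Aggregator: collects the chosen fields into one array of collections
aggregated_array = [{"data": b["data"]} for b in bundles]
```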
There are other types of aggregator modules; see the links below to find out more:
Array Aggregator – mapping multiple bundles into a complex field
The Array Aggregator module is very powerful because it allows you to build a complex array of collections for a later module’s field to map multiple items (collections) to it.
This is done using the “Target structure type” of an Array Aggregator module.
As you can see, the “Map” toggle on complex fields is used when you have an array. You can easily build an array variable to map into a future module’s field by using an Array Aggregator module and selecting the “Target Structure Type” as the future module’s field you are mapping the array into.
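For example, sketched in Python (a made-up later module with an “attachments” field; the field names are hypothetical, not from Make):

```python
# Bundles arriving at the Array Aggregator, one per file (illustrative data)
bundles = [
    {"name": "report.pdf", "url": "https://example.com/report.pdf"},
    {"name": "notes.txt",  "url": "https://example.com/notes.txt"},
]

# With "Target structure type" set to the later module's field, each bundle
# is shaped into the collection layout that field expects (names hypothetical)
attachments = [{"fileName": b["name"], "fileUrl": b["url"]} for b in bundles]

# The resulting single array is then mapped into the later module's field
# via its "Map" toggle
later_module_input = {"attachments": attachments}
```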
Hope this helps! Let me know if there are any further questions or issues.
In my flow, I am using Apify to scrape the top 10 results for a given search term.
Then, since the Firecrawl scraper seems to require being fed URLs one at a time, I am using the Iterator to achieve this.
Firecrawl scrapes and returns a result after each page (it doesn’t appear to have an ‘at end of run’ option in their API so I am expecting multiple operations here).
Now I want to put everything into a Google Doc, because Firecrawl’s outputs exceed Google Sheets’ character limits. To achieve this (and, more hopefully, close the Iterator/Firecrawl loop) I have added the Aggregator step.
As you advise, I have set the Source Module to “Iterator [5]” and selected the “data” value from the Firecrawl Webscraper as this is what I need to consolidate and put into the Google Doc.
The filter between the Apify and Iterator steps is to handle cases where Apify outputs a blank ‘url’ value.