I’m building an automation that, once a week, scrapes a website with a large number of event listings. The events live on separate sub-pages, e.g. www.example.com/~p1, …/~p2, …/~p3.
My current process:
Scrape multiple sub-pages one after the other.
Feed each page’s text into an OpenAI module to extract event details (date, time, title, location).
At the end, I want to compile all of the OpenAI outputs, use one final OpenAI module to format the events into a single HTML table, and send it by email.
The problem:
The last AI module only processes one of the previous outputs (one “block”) instead of the combined content.
The automation is bulky and cost-intensive, since it goes through page after page.
My questions:
How can I streamline the process so that all sub-pages are scraped simultaneously?
Are there best practices for aggregating multiple AI outputs (per page) and then running one final AI call for formatting?
Any tips for reducing token usage but still getting a full, clean HTML table for email delivery?
You can try the Flow Control “Repeater” module instead of duplicating a module chain for each page — it emits one bundle per iteration, so a single scraper module can handle every sub-page.
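To make the looping (and the parallel scraping you asked about) concrete, here is a plain-Python sketch of the same idea outside Make: one fetch function applied to every page number, run concurrently. The URL pattern and page count are assumptions based on your example URLs.

```python
# Sketch only: fetch all sub-pages concurrently instead of one after the other.
# BASE_URL and PAGE_COUNT are assumptions taken from the question's examples.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE_URL = "https://www.example.com/~p{}"  # placeholder pattern from the question
PAGE_COUNT = 3                             # assumed number of sub-pages

def fetch_page(n: int) -> str:
    """Download one sub-page and return its raw HTML."""
    with urlopen(BASE_URL.format(n)) as resp:
        return resp.read().decode("utf-8", errors="replace")

def fetch_all_pages(fetch=fetch_page) -> list[str]:
    """Fetch all sub-pages in parallel; results stay in page order."""
    with ThreadPoolExecutor(max_workers=PAGE_COUNT) as pool:
        return list(pool.map(fetch, range(1, PAGE_COUNT + 1)))
```

Note that in Make itself the Repeater iterates sequentially; true parallelism would need separate scenarios or an external script like this one.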
Combining Bundles Using Aggregators
Every result (item/record) from trigger/iterator/list/search/match modules is output as a bundle. This can produce multiple bundles, which then trigger multiple operations in subsequent modules (one operation per bundle). To combine multiple bundles into a single variable, you’ll need to use an aggregator of some sort.
Aggregators are modules that accumulate multiple bundles into one single bundle. A commonly used example is the Array Aggregator module. Another popular one is the Text Aggregator, which is very flexible and fits many use cases, such as building JSON, CSV, or HTML.
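In plain code, the aggregation step amounts to collecting each page’s extraction result into one structure before making the single final call — which is exactly why your last module only saw one “block”: it received one bundle per page instead of the merged result. A sketch, where the per-page OpenAI outputs are invented placeholders assumed to be JSON arrays of events:

```python
import json

# Assumed shape: each per-page OpenAI extraction returns a JSON string
# containing a list of events. These sample outputs are placeholders.
per_page_outputs = [
    '[{"title": "Concert", "date": "2024-06-01", "time": "19:00", "location": "Hall A"}]',
    '[{"title": "Workshop", "date": "2024-06-02", "time": "10:00", "location": "Room 2"}]',
]

def aggregate(outputs: list[str]) -> str:
    """Merge per-page event lists into one JSON array — the code equivalent
    of an Array/Text Aggregator — so the final formatting step sees
    everything in a single input instead of one bundle per page."""
    all_events = []
    for raw in outputs:
        all_events.extend(json.loads(raw))
    return json.dumps(all_events)

combined = aggregate(per_page_outputs)
# `combined` is the single value you would map into the final module.
```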
You can find out more about the other types of aggregator modules here:
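On reducing token usage: if each per-page extraction already returns structured JSON, you don’t strictly need a final AI call at all — the HTML table can be built deterministically, which removes that call’s cost entirely. A minimal sketch under that assumption (the column names come from the fields mentioned in the question):

```python
import html

def events_to_html_table(events: list[dict]) -> str:
    """Render extracted events as a simple HTML table for email delivery.
    Assumes each event dict has date/time/title/location keys; missing
    keys render as empty cells."""
    cols = ["date", "time", "title", "location"]
    header = "".join(f"<th>{c.title()}</th>" for c in cols)
    rows = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(e.get(c, '')))}</td>" for c in cols)
        + "</tr>"
        for e in events
    )
    return f"<table><tr>{header}</tr>{rows}</table>"
```

If you do keep a final AI call for formatting, sending it compact JSON (rather than raw page text) is usually the biggest token saving.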