How do I get only one Airtable record per customer (Apify → Scrape → Aggregate → OpenAI → Airtable)?

Goal:
For each Apify dataset item (customer) I want exactly one OpenAI call and exactly one record created in Airtable — even if I scrape multiple subpages from the customer’s website.

Current flow:

  • Apify → Get Dataset Items (gives company, website)

  • HTTP (main website)

  • Extract links → HTTP (subpages) → HTML to text

  • Array Aggregator → Text Aggregator

  • OpenAI (ChatGPT)

  • Airtable (Create Record)

Problem:
Airtable still receives multiple records per customer (depending on the number of subpages). OpenAI also fires multiple times.

Main Question:
👉 How do I configure the Array/Text Aggregators so that for each Apify item (customer) there is only one single output bundle → 1x OpenAI → 1x Airtable?

Welcome to the Make community!

You simply need to use an aggregator for the previous Text Parser module.

Combining Bundles Using Aggregators

Every result (item/record) from trigger/iterator/list/search/match modules will output a bundle. This can result in multiple bundles, which then trigger multiple operations in future modules (one operation per bundle). To “combine” multiple bundles into a single variable, you’ll need to use an aggregator of some sort.

Aggregators are modules that accumulate multiple bundles into a single bundle. A commonly used example is the Array Aggregator module. Another popular one is the Text Aggregator, which is very flexible and suits many use cases, such as building JSON, CSV, or HTML.
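To make the bundle idea concrete, here is a minimal sketch in plain Python (not Make configuration) of what an aggregator does conceptually: many per-page bundles go in, one bundle per customer comes out. The field names (`company`, `url`, `text`, `website_text`) and the sample data are hypothetical and only there for illustration.

```python
from collections import defaultdict

# Hypothetical bundles: one per scraped subpage, each still carrying its customer.
bundles = [
    {"company": "Acme GmbH", "url": "https://acme.example/about", "text": "About Acme"},
    {"company": "Acme GmbH", "url": "https://acme.example/team", "text": "Our team"},
    {"company": "Beta AG", "url": "https://beta.example/services", "text": "Services"},
]


def aggregate_by_customer(bundles, separator="\n\n"):
    """Collapse many per-page bundles into a single bundle per customer."""
    pages = defaultdict(list)
    for bundle in bundles:
        pages[bundle["company"]].append(bundle["text"])
    # One output bundle per customer, with all page texts joined into one string.
    return [
        {"company": company, "website_text": separator.join(texts)}
        for company, texts in pages.items()
    ]


aggregated = aggregate_by_customer(bundles)

# Anything that runs after the aggregation (the OpenAI call, the Airtable record)
# now runs once per aggregated bundle instead of once per subpage.
for item in aggregated:
    print(item["company"], "->", len(item["website_text"]), "characters of text")
```

The point of the sketch is the grouping boundary: the text is collected per customer, so three page bundles become two customer bundles. In a scenario, the equivalent decision is where the aggregation starts and ends.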

You can find out more about the other types of aggregator modules in the Make Help Centre.

Question: Which aggregator do you think best fits your use case?

Mapping a Specific Structure Into a Complex Field

If you have an array of collections, in programming terms, this is called an array of objects, or an array with non-primitive data types (“complex”).

The Array Aggregator module is very powerful because it lets you build a new array of collections whose structure matches a field in a later module, so you can map multiple items (collections) into that field at once. Such fields normally let you add items manually, but you can switch the “Map” toggle on and map a whole array into the single field.

This is done by selecting the “Target structure type” in an Array Aggregator module.

The “Map” toggle on complex fields is used when you have an array variable (for example, the output of an Array Aggregator). Other combinations of modules can also generate an array that matches a later field’s array structure, such as “Aggregate to JSON + Parse JSON” or “Create JSON + Parse JSON”, but this is an advanced topic.
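In programming terms, picking a target structure amounts to reshaping each incoming bundle into the item shape the later field expects. A rough Python sketch of that idea follows; the source keys (`page_url`, `page_text`, `status`) and the target keys (`title`, `body`) are made up for the example and do not come from any particular Make module.

```python
# Hypothetical source bundles, each with more fields than the target needs.
source_bundles = [
    {"page_url": "https://acme.example/about", "page_text": "About Acme", "status": 200},
    {"page_url": "https://acme.example/team", "page_text": "Our team", "status": 200},
]


def to_target_structure(bundles):
    """Build an array of collections that matches the (assumed) target item shape."""
    # Assumption: the later field expects items with exactly "title" and "body".
    return [{"title": b["page_url"], "body": b["page_text"]} for b in bundles]


items = to_target_structure(source_bundles)
print(items)
```

The resulting list is an "array of objects" whose keys line up with the target field, which is what allows the whole array to be mapped into that one field.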

Question: Are you mapping your array into a field that accepts more than one item/collection?

Example

Here is an example of what your scenario could look like:

This is just an example. Your final solution may or may not look like this depending on your requirements and actual data.

For more information, see “Mapping with arrays” in the Help Centre. You should also work through the Make Academy courses, which cover the use of Iterators & Aggregators.

Hope this helps! Let me know if there are any further questions or issues.

@samliew
P.S.: investing some effort into the tutorials in the Make Academy will save you lots of time and frustration using Make!


Hi, thanks again for your help earlier with storing the contacts individually in Airtable – that part is working now.

Right now I’m stuck at the link filtering step in my scenario:

  • I’m scraping websites, extracting all <a> elements, and then using a filter to only keep “good links” (like /about, /team, /karriere, /leistungen, etc.).

  • The issue: my filter is blocking everything. Even when the links clearly contain those keywords, nothing passes through.

  • I tried combining an Include regex (to allow only relevant paths) with an Exclude regex (to filter out assets like .png, .css, mailto: etc.), but no links make it past the filter.

  • Because of that, my Array Aggregator right after “Links extrahieren” receives no bundles and stays empty.

  • So ChatGPT later in the flow only gets the person’s name/company but no real website text context.

Basically: the regex filter setup is the bottleneck — it’s blocking all links instead of letting through the relevant ones.

Could you help me adjust the regex/filter logic so that only the useful company subpages (About, Team, Services, Jobs, etc.) pass through, while skipping logos, assets, or legal pages?
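As a reference point for the include/exclude logic described above, here is a small sketch in plain Python. It is not Make filter syntax, and Make's regex behaviour may differ in details. The keyword list extends the examples from the post (`ueber-uns`, `jobs`, `services`, and the legal page names are my assumptions), and `is_good_link` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

# Include: the path contains one of the wanted segments as a whole path segment.
INCLUDE = re.compile(
    r"/(about|ueber-uns|team|karriere|jobs|leistungen|services)(/|$)",
    re.IGNORECASE,
)
# Exclude: asset file extensions, plus (assumed) legal page names.
EXCLUDE = re.compile(
    r"\.(png|jpe?g|gif|svg|css|js|pdf)$|/(impressum|datenschutz|privacy|legal)(/|$)",
    re.IGNORECASE,
)


def is_good_link(href):
    """Return True only for links that match INCLUDE and do not match EXCLUDE."""
    if not href:
        return False
    # Non-page schemes and in-page anchors are rejected before any regex runs.
    if href.strip().lower().startswith(("mailto:", "tel:", "javascript:", "#")):
        return False
    # Test only the path, so query strings and domains cannot confuse the match.
    path = urlparse(href.strip()).path
    return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)


links = [
    "https://acme.example/about",
    "/team/",
    "https://acme.example/Karriere",
    "https://acme.example/assets/about.png",
    "mailto:info@acme.example",
    "https://acme.example/impressum",
]
print([link for link in links if is_good_link(link)])
```

Two things may be worth checking against this sketch: that the filter tests the link's href value rather than the whole `<a>` element, and that the exclude condition is a negated match ("does not match") rather than a second positive match. If both patterns are applied as positive matches, no link can pass.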

No problem, glad I could help you with “[Question] How do I get only one Airtable record per customer (Apify → Scrape → Aggregate → OpenAI → Airtable)?”

1. If anyone has a new question in the future, please start a new thread. This makes it easier for others with the same problem to search for the answers to specific questions, and you are more likely to receive help as newer questions are displayed higher on the forum’s “new” page.

2. The Make Community guidelines encourage users to mark helpful replies as solutions to help keep the Community organized.

This marks the topic as solved, so that:

  • others can save time when catching up with the latest activity here, and
  • others can quickly jump to the solution if they come across the same problem.

To do this, simply click the checkbox at the bottom of the post that answers your question.

3. Don’t forget to like and bookmark this topic so you can get back to it easily in future!

Hope this helps! Let me know if there are any further questions or issues.

@samliew
