Avoiding duplicates when bulk adding to Google Sheets

I’m trying to upload leads to a spreadsheet while checking for duplicates in a specific column, without using hundreds of credits at a time - is this possible?

-

I have a scenario that scrapes leads from Apify, checks for duplicates in an existing Google Sheet, uploads the new leads, then uses AI for some further enrichment (we can ignore that last part).

This uses up a lot of credits, especially when passing 100+ leads at a time. I tried setting up aggregators, which reduced the credit usage, but since the data is now nested inside arrays I'm unable to filter duplicates properly.

Original scenario:

This is the test scenario I’m using to try and reduce operations:

My first problem here is I’m unsure how to map everything so that all items in the array are bulk uploaded to google sheets.

My other problem is that the filter only checks the first item in the array, which then stops anything from pulling through:

There are duplicates on the sheet, so this first result is correct; however, it's not checking through every lead.

Is what I’m trying to achieve possible, or am I going to have to suck it up and deal with the credit usage?

Many thanks

Hi @Laurie_Brocklehurst

You have an interesting optimization issue.

You see, by design, Make favors simplicity over efficiency, meaning your solution will be up and running fast, but won’t necessarily be time- or cost-efficient.

In your use case, you need to check each lead individually, resulting in as many operations as the number of leads you have.

That’s how it works in Make. There is no simple and easy way to compare items inside an array without “cracking open” the array.

Therefore, if you really need to decrease the number of used operations, you’ll need to do the comparisons outside of Make.

Good news is that you already have a good starting point.

Aggregate all the results from Apify and bulk-insert the resulting array into a Google Sheets tab.

On another tab, use Google Sheets’ UNIQUE and QUERY functions to remove duplicates and, in an adjacent cell, look up the rest of that row’s data in the original tab by its linkedin_url, with XLOOKUP or something similar.
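As a rough sketch of those formulas (assuming the raw leads land on a tab named “Leads” with linkedin_url in column A and the lead’s name in column B — adjust the tab name and ranges to match your sheet):

```
' In A2 of the dedupe tab: one copy of each linkedin_url, blanks dropped
=QUERY(UNIQUE(Leads!A2:A), "select Col1 where Col1 is not null")

' In B2 (filled down): pull the matching name from the original tab,
' falling back to "not found" if the URL has no match
=XLOOKUP(A2, Leads!A:A, Leads!B:B, "not found")
```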

(If you don’t have write access to the original spreadsheet, use Make modules to copy all of its rows to your working spreadsheet.)

Then, add another Search Rows module with the condition where the linkedin_url “Does not exist”.

If I understood it correctly, that should do the trick, or at least point you in the right direction.

@damato

Thanks for confirming! I’ll do exactly that, now I can stop going in circles trying to make this work haha
