Issue with filtering duplicates

Hello, I will try my best to make this as simple as possible.

I’m trying to build a “lead generator” scenario that uses Apify to scrape leads and adds the new ones to my Google Sheet. Obviously I want to avoid duplicates, but I’m having trouble getting filters to screen them out. I have no coding background whatsoever, so I’m sure this is easily solvable…

Initially, my scenario was as follows: Apify Run an Actor > Apify Get Dataset Items > Google Sheets Create New Row. This worked fine, but I started getting duplicates, which is expected.

So I tried using the “Google Sheets Search Rows” module (set to return one specific column; my unique identifier is “Business names”) along with a filter set to: if the Google Sheets “Business names” value is NOT EQUAL to the new Apify actor data, the bundle passes.
Unfortunately this didn’t work as intended. Duplicates still appeared, because the filter seemed to compare only ONE row of the “Business names” column to each new Apify item instead of the WHOLE column. Make sense?

So I tried using an Aggregator, set to aggregate only the “Business names”, and mapped its output into the filter (compare array to new data). But it still compares the data ONE BY ONE: every new Apify item is compared to only one “Business name” from my Google Sheet instead of the entire column, resulting in hundreds of duplicates of the same row.
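
The difference between the two comparisons can be sketched outside Make. The Python below is only an illustration under assumptions: the business names are made-up sample data, and the nested loop stands in for a filter that runs once per sheet-row bundle. It is not how Make is implemented.

```python
# Hypothetical sample data standing in for the sheet column and the Apify results.
existing_names = ["Acme Plumbing", "Bolt Electric", "Cedar Cafe"]
scraped_names = ["Acme Plumbing", "Delta Dental"]

# Row-by-row "not equal": each scraped name is tested against one sheet row
# at a time, so it passes once for every row that happens to differ.
row_by_row = [
    scraped
    for scraped in scraped_names
    for existing in existing_names
    if scraped != existing
]

# Whole-column check: a scraped name passes only if no row contains it.
whole_column = [name for name in scraped_names if name not in existing_names]

print(len(row_by_row))  # 5
print(whole_column)     # ['Delta Dental']
```

With three existing rows, the already-known "Acme Plumbing" still passes twice and the new "Delta Dental" passes three times, which matches the pattern of repeated rows described above.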

Hopefully this makes sense. I’d love to clarify if you have questions. Below are images/illustrations to help convey the issue I’m facing.

Also, I’m thinking maybe I should put the “Search Rows” module at the start, because it’s eating up my operations.

How do I fix this? Or how do I achieve what I want effectively?

Any help is much appreciated :)


Update:

I’ve made some changes. I put the Search Rows module at the start so it doesn’t use so many operations. I also changed my filter to use an Array Operator instead of a Text Operator (apparently that was an issue). Except now it doesn’t let anything through: the filter says the array contains all the new data (which it doesn’t). Image below.

Here are the output bundles of the Array and JSON:
Make 4.txt (96.3 KB)

@Jeroom

This is because the mapping is wrong. From the array, you have mapped only the first item, which is of text type. You should use map() like this:

{{map(Array;2)}} does not contain “title”
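
As a rough picture of what the formula above is doing, here is a Python sketch. It assumes the aggregated array is a list of row collections and that key "2" is the column holding the business names; `map_key` and the sample rows are hypothetical stand-ins, not Make's implementation.

```python
def map_key(collections, key):
    """Collect the value stored under `key` from every collection that has it."""
    return [item[key] for item in collections if key in item]


# Hypothetical aggregator output: one collection per sheet row.
aggregated_rows = [
    {"0": "1", "2": "Acme Plumbing"},
    {"0": "2", "2": "Bolt Electric"},
]

# Every business name in the array, not just the first item.
names = map_key(aggregated_rows, "2")


def passes_filter(title):
    """Array 'does not contain' check against every mapped name."""
    return title not in names


print(names)
print(passes_filter("Acme Plumbing"))  # False
print(passes_filter("Delta Dental"))   # True
```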

Anyway, you don’t even need the aggregator.

Get Dataset Items > Google Sheets Search Rows > filter: “Row number” does not exist > then proceed to Create Row.

Inside the “Search Rows” module, set the filter to search by business name and set the limit to 1.
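
The flow above can be sketched in Python as follows. This is an illustration only: `search_rows` is a hypothetical stand-in for the Search Rows module, the sheet is a plain list of names, and row numbering starting at 1 is an assumption.

```python
def search_rows(sheet, business_name, limit=1):
    """Return up to `limit` matching rows, each carrying its row number."""
    matches = [
        {"row_number": index, "business_name": name}
        for index, name in enumerate(sheet, start=1)
        if name == business_name
    ]
    return matches[:limit]


def add_new_leads(sheet, scraped_names):
    """Append names with no matching row; return how many searches ran."""
    searches = 0
    for name in scraped_names:
        searches += 1
        found = search_rows(sheet, name, limit=1)
        if not found:           # filter: "row number" does not exist
            sheet.append(name)  # create row
    return searches


sheet = ["Acme Plumbing", "Bolt Electric"]
searches = add_new_leads(sheet, ["Acme Plumbing", "Delta Dental"])
print(sheet)     # ['Acme Plumbing', 'Bolt Electric', 'Delta Dental']
print(searches)  # 2
```

In this sketch each scraped item triggers its own search, so the number of searches grows with the number of items.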

Regards,
Msquare Automation - Platinum Partner of Make
@Msquare_Automation

Super, it works!

Is there a way to make this more efficient (operations-wise)? My Search Rows module uses as many operations as there are scraped items (I have set the limit to 1). I’m assuming I’d have to put the Search Rows module at the start, but I’m unsure how to proceed from there.

You will need to remove the limit and search all rows, then aggregate them, and then use map() and get() functions over that array to match the records.
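
A minimal Python sketch of that idea, under assumptions: the sheet is read once into a collection (standing in for Search Rows without a limit plus an aggregator), and a simple membership test stands in for the map()/get() matching. All names and data here are hypothetical.

```python
def add_new_leads_bulk(sheet, scraped_names):
    """Read existing names once, then append only names not already present."""
    existing = set(sheet)  # single read of the whole column
    searches = 1
    added = []
    for name in scraped_names:
        if name not in existing:
            sheet.append(name)
            # Sketch-only choice: remember the name so a repeat inside the
            # same batch is not appended twice.
            existing.add(name)
            added.append(name)
    return searches, added


sheet = ["Acme Plumbing", "Bolt Electric"]
searches, added = add_new_leads_bulk(
    sheet, ["Acme Plumbing", "Delta Dental", "Delta Dental"]
)
print(searches)  # 1
print(added)     # ['Delta Dental']
```

The sheet is read a single time here regardless of how many items were scraped, which is the point of moving the search to the start.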

Regards,
Msquare Automation - Platinum Partner of Make
@Msquare_Automation

I’ll try it, thanks for your help :)
