Google Sheets Duplicate Removal Issue

What are you trying to achieve?

Get data from a Google Sheet.

Identify and remove duplicates from this dataset.

Use the cleaned (deduplicated) data to parse HTML content.

Add enriched results to another sheet, ensuring no duplicate records are added.

Steps taken so far

I’m using Array Aggregator + Iterator to group and loop through records.

If I use Update Row and tag duplicates with a “Duplicate” flag, it works smoothly — but I don’t want to mark them; I want to delete them entirely.

When I switch to using Delete Row, it only deletes some duplicates — not all of them.

So I reverted to tagging duplicates again using Update Row.

Then, I search for non-duplicate rows and add them to the final sheet — but many of them get added multiple times, even though they shouldn’t.

What I’m looking for

A reliable way to delete duplicates, not just mark them.

A method to ensure rows are added only once to the target sheet — even if the original has many similar-looking entries.

Ideally: an efficient deduplication pattern that doesn’t involve manually maintaining a “Duplicate” flag.
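
One flag-free pattern is to pick a key that identifies a record and keep only the first row seen for each key. Here is a minimal Python sketch of that idea, not a Make blueprint. The `url` field, the sample rows, and the normalization rules in `normalize_url` are assumptions for illustration, so adjust them to whatever "same job" means in your data.

```python
def normalize_url(row):
    # Assumed normalization: trim whitespace, ignore case and a trailing slash.
    return row["url"].strip().lower().rstrip("/")


def dedupe(rows, key):
    """Return rows with only the first occurrence of each key, order preserved."""
    seen = set()
    unique = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


# Hypothetical sample data standing in for the source sheet.
jobs = [
    {"title": "Data Analyst", "url": "https://example.com/jobs/1"},
    {"title": "Data Analyst", "url": "https://example.com/jobs/1/ "},
    {"title": "Engineer", "url": "https://example.com/jobs/2"},
]

unique_jobs = dedupe(jobs, key=normalize_url)
```

Because the duplicates are dropped in memory before anything is written, nothing has to be tagged or deleted in the source sheet.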

Hey Ahsen,

Can you show some screenshots of the setup you have?

Best guess: deleting a row shifts every row below it up by one, so the row numbers you collected earlier no longer point at the right rows, and later deletes hit the wrong row or fail.
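
If that guess is right, the effect can be reproduced outside Make. The sketch below models the sheet as a plain Python list (an assumption made only for illustration, as are the sample values) and compares deleting in the order the duplicates were found with deleting from the bottom up.

```python
def delete_rows_naive(sheet, row_numbers):
    """Delete 1-based row numbers top-down; later numbers go stale."""
    for n in row_numbers:
        if n - 1 < len(sheet):
            del sheet[n - 1]
    return sheet


def delete_rows_bottom_up(sheet, row_numbers):
    """Delete 1-based row numbers from the highest to the lowest."""
    for n in sorted(set(row_numbers), reverse=True):
        del sheet[n - 1]
    return sheet


rows = ["A", "B", "B", "C", "C", "D"]
duplicates = [3, 5]  # row numbers of the second "B" and the second "C"

naive = delete_rows_naive(list(rows), duplicates)
safe = delete_rows_bottom_up(list(rows), duplicates)

print(naive)  # ['A', 'B', 'C', 'C'] -> "D" was deleted instead of the second "C"
print(safe)   # ['A', 'B', 'C', 'D']
```

Deleting the highest row number first leaves the numbers of the remaining rows untouched, which would explain why only some duplicates disappear when rows are deleted in the order they were found.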

I’ve been struggling with issues in my original scenario, so I decided to take a step back and build a simplified test scenario to better understand the problem. Here’s a breakdown of the original workflow:

  1. Starting Point: The scenario begins by retrieving user-defined metrics for a job search.
  2. Job Search Automation: It uses those metrics to search job websites, extract job titles and URLs, and then add this data to a Google Sheet.
     • Problem 1: Not all job URLs are being captured, and some job entries are added multiple times, causing duplication.
  3. Job Parsing and Processing: From the job list in the Google Sheet, the scenario visits each URL, scrapes the HTML content, converts it to plain text, and runs it through AI modules to generate results. These are then written to a separate “Results” Google Sheet.
     • Problem 2: The results sheet ends up with hundreds of rows containing the same job repeatedly, creating a lot of unnecessary duplicates.
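
For Problem 2, one general way to keep the results sheet clean is to make the write step idempotent: check the key against what is already in the target before adding a row. The Python sketch below shows that logic under assumptions; the `url` key, the `append_new` helper, and the sample batch are hypothetical, and the list `results` only stands in for the “Results” sheet.

```python
def append_new(target, incoming, key):
    """Append only rows whose key is not already in target; return what was added."""
    existing = {key(row) for row in target}
    added = []
    for row in incoming:
        k = key(row)
        if k not in existing:
            existing.add(k)  # also guards against repeats inside the same batch
            target.append(row)
            added.append(row)
    return added


results = []  # stands in for the "Results" sheet
batch = [
    {"url": "https://example.com/jobs/1", "summary": "first"},
    {"url": "https://example.com/jobs/1", "summary": "repeat"},
    {"url": "https://example.com/jobs/2", "summary": "second"},
]

first_run = append_new(results, batch, key=lambda r: r["url"])
second_run = append_new(results, batch, key=lambda r: r["url"])
```

Running the same batch twice adds nothing the second time, so a re-run of the scenario, or a step that fires once per repeated bundle, cannot inflate the results.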

Because this full scenario includes resource-intensive steps like PDF conversion and ChatGPT integration, I didn’t want to keep running it repeatedly for debugging. Instead, I created a test scenario to replicate the issue in a controlled environment — and unfortunately, I’m encountering the same duplication problems here too.

The problem seems to be my Search Row module: it returns multiple bundles, and I don’t know how to handle that.

I have replied below