How to Avoid Duplicate or Similar News When Sending an RSS Feed to Google Sheets in Make.com

Hello,

I’m using Make.com to send items from an RSS feed to a Google Sheet. The current scenario works as follows:

  1. The RSS module retrieves articles.
  2. A Google Sheets “Search Rows” module checks for duplicates based on the article’s URL or Title.
  3. If the article is unique, it proceeds to be added to the sheet.

Despite this, I’m facing the issue of similar news items being added. For example, multiple sources may report on the same event, and while their URLs or Titles differ slightly, the content is redundant.
Example 1 - Charlotte airport workers plan to strike during busy Thanksgiving travel week - Trumbull Times
Example 2 - Charlotte airport workers plan to strike during busy Thanksgiving travel week - Star Tribune
Example 3 - Charlotte Airport Workers Plan to Strike During Busy Thanksgiving Travel Week
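
To illustrate why an exact match on Title misses these, here is a rough Python sketch (outside Make.com) that normalizes the three example titles before comparing them. The function name `normalize_title` is hypothetical, and the sketch assumes the publisher name always follows the last " - " in the title, which would wrongly truncate a headline that contains " - " itself.

```python
import re

def normalize_title(title: str) -> str:
    """Return a comparison key: no publisher suffix, no case, no punctuation."""
    # Assumption: anything after the last " - " is the publisher name.
    head, sep, _publisher = title.rpartition(" - ")
    core = head if sep else title
    # Case-fold and drop punctuation so capitalization differences vanish.
    core = re.sub(r"[^\w\s]", "", core.casefold())
    # Collapse runs of whitespace into single spaces.
    return " ".join(core.split())

titles = [
    "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Trumbull Times",
    "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Star Tribune",
    "Charlotte Airport Workers Plan to Strike During Busy Thanksgiving Travel Week",
]

# All three titles collapse to the same key, so a set keeps only one entry.
keys = {normalize_title(t) for t in titles}
```

If a normalized key like this were stored in its own column, the existing “Search Rows” check could probably match on that column instead of the raw Title.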

Could you advise on the following:

  1. How to set up the scenario to avoid adding similar articles (not just exact duplicates)?
  2. Is there a way to compare the content or titles for similarity and apply a threshold to filter out redundant entries?
  3. Any best practices for deduplication or clustering similar content in scenarios like this?
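
To make question 2 concrete, this is the kind of threshold comparison I have in mind, sketched in Python with the standard library's `difflib.SequenceMatcher`. The function name `is_similar` and the 0.85 threshold are assumptions for illustration only; the threshold would need tuning on real feed data, and this is not something Make.com provides as written.

```python
from difflib import SequenceMatcher

def is_similar(new_title, existing_titles, threshold=0.85):
    """Return True if new_title is close enough to any existing title."""
    new_key = new_title.casefold()
    for old in existing_titles:
        # ratio() is 2*M/T: M matched characters, T total characters in both strings.
        ratio = SequenceMatcher(None, new_key, old.casefold()).ratio()
        if ratio >= threshold:
            return True
    return False

existing = [
    "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Trumbull Times",
]

# Same headline without the publisher suffix and with different capitalization.
duplicate = is_similar(
    "Charlotte Airport Workers Plan to Strike During Busy Thanksgiving Travel Week",
    existing,
)

# An unrelated headline should fall well below the threshold.
unrelated = is_similar("Local bakery wins national award", existing)
```

Character-level similarity like this only catches near-identical wording. Two differently worded reports of the same event would still pass as unique.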

Attached are screenshots of my scenario and output for context.
RSS_Feed_to_Google_Sheet.json (51.2 KB)

Thank you for your help!

I’m facing the same issue here. I have tried to put an OpenAI module in between and ask ChatGPT if there are any similarities between the new RSS topic and the ones already present in the sheet. Then, by filtering its response (yes or no), I can add the title to the sheet.

The problem is that ChatGPT searches rows one by one and gives a reply for each instead of a single reply. Maybe I still haven’t found the right prompt for this case.
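
My guess is that this happens because the search step outputs one bundle per row, so the OpenAI module runs once per row. A possible workaround would be to combine all existing titles into a single text first (for example with an aggregator module) and send one prompt. Here is a Python sketch of the prompt I mean; `build_dedup_prompt` is a hypothetical helper and the wording of the prompt is only a starting point.

```python
def build_dedup_prompt(new_title, existing_titles):
    """Build one prompt that asks for a single yes/no answer across all rows."""
    # Number the existing headlines so they read as one list, not separate questions.
    listed = "\n".join(
        f"{i}. {title}" for i, title in enumerate(existing_titles, start=1)
    )
    return (
        "Existing headlines:\n"
        f"{listed}\n\n"
        f"New headline: {new_title}\n\n"
        "Does the new headline report the same event as any existing headline? "
        "Answer with exactly one word: yes or no."
    )

prompt = build_dedup_prompt(
    "Charlotte Airport Workers Plan to Strike During Busy Thanksgiving Travel Week",
    [
        "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Trumbull Times",
        "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Star Tribune",
    ],
)
```

With a single prompt like this, the filter after the OpenAI module would only have one yes/no reply to check per new article.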

I’ll be following this topic…