Hello,
I’m using Make.com to automatically add articles from an RSS feed to a Google Sheet. The current scenario works as follows:
- The RSS module retrieves articles.
- A Google Sheets “Search Rows” module checks for duplicates based on the article’s URL or Title.
- If the article is unique, it proceeds to be added to the sheet.
Despite this, near-duplicate news items are still being added. For example, several sources may report on the same event; their URLs or Titles differ slightly, but the content is redundant.
Example 1 - Charlotte airport workers plan to strike during busy Thanksgiving travel week - Trumbull Times
Example 2 - Charlotte airport workers plan to strike during busy Thanksgiving travel week - Star Tribune
Example 3 - Charlotte Airport Workers Plan to Strike During Busy Thanksgiving Travel Week
Could you advise on the following:
- How to set up the scenario to avoid adding similar articles (not just exact duplicates)?
- Is there a way to compare the content or titles for similarity and apply a threshold to filter out redundant entries?
- Any best practices for deduplication or clustering similar content in scenarios like this?
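To make the similarity question concrete, here is a rough Python sketch of the kind of check I have in mind. It is not something Make.com provides. The function names `normalize_title` and `is_similar`, the 0.9 threshold, and the assumption that the publisher name always follows the last " - " in the title are all my own, purely for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Reduce a title to a comparable form (hypothetical helper)."""
    # Assumption: the source name follows the last " - " in the title.
    # A title that legitimately contains " - " would lose its last part.
    head, sep, _source = title.rpartition(" - ")
    base = head if sep else title
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    base = re.sub(r"[^a-z0-9 ]+", " ", base.lower())
    return " ".join(base.split())


def is_similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Return True when two titles look like the same story.

    The 0.9 threshold is an arbitrary starting point, not a tuned value.
    """
    ratio = SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
    return ratio >= threshold


titles = [
    "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Trumbull Times",
    "Charlotte airport workers plan to strike during busy Thanksgiving travel week - Star Tribune",
    "Charlotte Airport Workers Plan to Strike During Busy Thanksgiving Travel Week",
]

# Keep only titles that are not similar to one already kept.
unique = []
for t in titles:
    if not any(is_similar(t, kept) for kept in unique):
        unique.append(t)
```

With the three example titles above, only the first one would be kept. Ideally I would store the normalized title in an extra column of the sheet and have the “Search Rows” step match on that column, or do something equivalent to the similarity comparison, if that is possible in Make.com.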
Attached are screenshots of my scenario and output for context.
RSS_Feed_to_Google_Sheet.json (51.2 KB)
Thank you for your help!