How can I prevent duplicate or similar news entries from being sent to Google Sheets?

What are you trying to achieve?

Hello Makers!

I’m a beginner and I struggle a bit with the following…

I really tried my best to search the internet to see whether other people had run into this type of problem, AND of course they had… The issue is that neither they nor I could fix it, even with extensive help.

I have a rather simple sequence flow:

- scraping the selected news outlets
- modifying the news description with ChatGPT
- returning the result to a Google Sheets spreadsheet with Title, Content and URL.

My problem is getting the Content and URL not to repeat. I imagine preventing repeated URLs is the easier part, yet I couldn’t solve even that.

The Content (the news description modified by ChatGPT) is trickier. It should still be possible to detect a similar style or pattern between articles, yet I struggle to implement that as well…

I would appreciate a head-on approach to this. Hopefully other people can benefit later from a potential solution.

I attached a JSON blueprint export if anyone can help directly (I ran out of operations testing all possible solutions xD)

Screenshots: scenario setup, module configuration, errors

blueprint (1).json (181 KB)

Hey Vlas,

just to clarify, is there a specific technical problem you have with the scenario or is it more of a general issue in the process?

If you have the URLs in an array, you can use the distinct() function to get only unique values.
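If it helps to see what "only unique values" means in practice, here is a rough Python sketch of the same idea. It is not Make code: in the scenario you would apply the built-in distinct() to the array instead. The function name `distinct_urls` and the sample URLs are made up for illustration, and keeping the first-seen order is a choice of this sketch.

```python
def distinct_urls(urls):
    """Return the URLs with duplicates removed, keeping first-seen order."""
    seen = set()      # URLs we have already come across
    unique = []       # result list, in original order
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


# Hypothetical sample data: the first URL appears twice.
scraped = [
    "https://example.com/news/a",
    "https://example.com/news/b",
    "https://example.com/news/a",
]
print(distinct_urls(scraped))
```

The same check works against URLs that are already stored: pre-fill `seen` with the existing values and only new URLs get through.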

But regarding the content of articles being similar… this sounds more like something you need to prompt ChatGPT for: send it the articles so it can find the similar ones and remove them from the array. It is not a Make-specific issue that can be solved with a couple of routers or formulas. Maybe have one assistant generate the articles and, when it's done, send all the articles to a second assistant to flag similar ones?
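To make the "flag similar ones" step more concrete, here is a minimal Python sketch. Note that it swaps in plain string similarity from the standard library (`difflib.SequenceMatcher`) instead of a second ChatGPT assistant, so it only catches articles with closely matching wording, not articles that merely cover the same story. The function name `flag_similar` and the 0.8 threshold are assumptions for illustration, not anything from the blueprint.

```python
from difflib import SequenceMatcher


def flag_similar(articles, threshold=0.8):
    """Return index pairs (i, j) of articles whose text similarity
    ratio is at or above the threshold (0.8 is an arbitrary guess)."""
    flagged = []
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            # Compare case-insensitively; ratio() is between 0.0 and 1.0.
            ratio = SequenceMatcher(
                None, articles[i].lower(), articles[j].lower()
            ).ratio()
            if ratio >= threshold:
                flagged.append((i, j))
    return flagged
```

Every pair is compared, so this only suits small batches. For rewritten articles where the wording changes a lot, the second-assistant approach above is probably the better fit.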

Hello!

Thank you for your reply!

I tend to agree with you that it is a technical problem on my side rather than an issue with the process.

Regarding your advice about putting the URLs in an array and using the distinct function… I don’t wanna sound unknowledgeable, but this is literally my first automation, so I don’t really know what you’re talking about…

Sorry about that, I will try to build up my skills in make.com.

How would you implement it in the process? I just haven’t figured it out yet…

  1. For now, the filter for URLs only checks the first row and no other rows. The idea is to check the whole spreadsheet and, if a URL repeats, not let it go through. How do I apply it to all rows while minimizing the operations spent? :thinking: Also, whether I choose “Contains” or “Does Not Contain”, nothing goes through.

  2. For the content of articles being similar, I’m not sure how to implement it technically with the prompt and models…

I would appreciate any help with it!