I’m a beginner and wondering how I can add some sort of filter to skip URLs from my selected news outlets that have already been uploaded, as well as similar texts (paraphrased by ChatGPT), because, as you understand, some news outlets publish the same news.
Could you please explain in detail how I can keep duplicates and already published news out of my Google Sheets?
To filter out duplicate results, you first have to “load” the existing results into your scenario so that you can compare the current item against the list.
To do this, you can use the Google Sheets “Search Rows” module.
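In case it helps to see the compare-against-the-list logic outside of Make, here is a rough Python sketch of the same idea: collect the URLs that are already in the sheet, normalize them, and only keep items whose URL is not in that set. The function names (`normalize_url`, `filter_new_items`), the `"url"` key and the choice to strip `utm_` tracking parameters are all assumptions made for illustration; they are not part of Make or Google Sheets.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def normalize_url(url):
    """Return a comparable form of a URL (hypothetical helper).

    Assumption: two links point to the same article if they differ only in
    scheme/host letter case, a trailing slash, a #fragment or utm_* parameters.
    """
    parts = urlsplit(url.strip())
    # Keep every query parameter except common tracking ones (utm_source, ...).
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        urlencode(query),
        "",  # drop the fragment
    ))


def filter_new_items(items, existing_urls):
    """Keep only items whose URL is not already in the sheet or in this batch."""
    seen = {normalize_url(u) for u in existing_urls}
    fresh = []
    for item in items:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)  # also blocks repeats inside the same run
            fresh.append(item)
    return fresh


if __name__ == "__main__":
    already_in_sheet = ["https://example.com/news/1"]
    scraped = [
        {"title": "A", "url": "https://Example.com/news/1/?utm_source=rss"},
        {"title": "B", "url": "https://example.com/news/2"},
    ]
    print(filter_new_items(scraped, already_in_sheet))
```

In a scenario, the equivalent would be a filter placed after the search step that only lets an item through when no existing row matches its URL.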
As I see from the screenshot, the beginning of the sequence is modified. However, I do not understand how to proceed from there to actually stop the duplicates from repeating…
I really tried my best to search the internet to find out whether other people have encountered this type of problem, AND of course they have… The issue is that neither they nor I could fix it, even with extensive help.
I have a rather simple sequence flow:
- scraping the selected news outlets
- modifying the news description with ChatGPT
- returning the result into an Excel spreadsheet with Title, Content and URL.
My problem is getting the Content and URL not to repeat. I imagine that solving the repetition of the URL is the easier part, yet I couldn’t even solve that.
The Content (the news description rewritten by ChatGPT) is trickier. It should still be possible to detect a similar style or pattern across articles, yet I struggle to implement that as well…
I would appreciate a head-on approach to this. Hopefully other people can benefit later from a potential solution.
Here is my JSON blueprint export if anyone can help directly - blueprint (1).json (181.3 KB)
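For the similar-Content part, one possible approach (a sketch, not a tested fix for this blueprint) is to score how many words a new text shares with each text already stored and to skip it above some threshold. The Python below uses Jaccard similarity on word sets; the helper names and the `0.6` threshold are assumptions chosen for illustration. Heavily paraphrased text can score low, so comparing the original scraped descriptions before the ChatGPT step may work better than comparing the rewritten ones.

```python
import re


def word_set(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def is_near_duplicate(candidate, existing_texts, threshold=0.6):
    """True if the candidate is at least `threshold` similar to any stored text.

    The default threshold is an arbitrary starting point (assumption);
    tune it on real articles.
    """
    return any(jaccard(candidate, old) >= threshold for old in existing_texts)


if __name__ == "__main__":
    stored = ["The central bank raised interest rates by half a point on Tuesday"]
    same_story = "On Tuesday the central bank raised interest rates by half a point"
    other_story = "Local team wins the championship after dramatic final"
    print(is_near_duplicate(same_story, stored))
    print(is_near_duplicate(other_story, stored))
```

Word overlap is deliberately simple; it catches reordered or lightly edited copies of the same story, not articles that share only a topic.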