Filter RSS entry with multiple keywords and "deduplicate" entries

Hello everyone… I would like to ask if anyone has an idea to help resolve a problem with duplicate RSS entries…

I’m working on a scenario with the following configuration: a Google Spreadsheet contains a column of RSS sources and, for each source, a keyword in the column to its right. Every day the scenario finds the new RSS items from each source that mention the keyword and includes them in a digest email with all the news extracted this way:

image

Following the scenario tutorial, I have more than one keyword for some sources, so I repeated the source on a new row with another keyword in the cell to its right. Some RSS entries mention more than one keyword, and these entries end up repeated in the news list generated for the email.

Is there a way to filter the result to avoid repetitions? Or, better yet, check all the keywords at once so that the entry is included IF ONE OR MORE keywords are found, without repeating the check and generating multiple positive results for the same entry?

Thank you for your help!

Welcome to the Make community!

For assistance, please provide the following:

1. Read-only link to the Google sheet

2. Scenario blueprint

Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.

(Note: Exporting your scenario will not include private information or keys to your connections)

Uploading it here will look like this:

blueprint.json (12.3 KB)

Following these steps will allow others to assist you here. Thanks!


Sure!

Read only link: Fontes de RSS - Google Sheets

blueprint.json (44.5 KB)

Thank you!

The issue is that you were calling the “Get RSS Feed Items” module multiple times for each unique RSS feed, once per keyword row.

First, aggregate the rows, grouping them by RSS feed URL.

This way you call the RSS module only 2 times in total instead of 20.

Then you can filter each item against a single pattern built from the keywords (the aggregator collects them into an array).
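For anyone who finds the logic easier to follow outside the scenario editor, here is a rough Python sketch of the same idea: group the sheet rows by feed URL, then test each entry once against a single case-insensitive alternation of that feed's keywords. It is not what Make runs internally. The function names, feed URLs, and sample entries are all made up for illustration, and `re.escape` is an extra precaution that the pattern in the scenario filter does not apply.

```python
import re
from collections import defaultdict


def group_keywords_by_feed(rows):
    """Group (feed_url, keyword) rows into {feed_url: [keywords]}."""
    grouped = defaultdict(list)
    for feed_url, keyword in rows:
        grouped[feed_url].append(keyword)
    return dict(grouped)


def build_keyword_pattern(keywords):
    """Build one case-insensitive alternation such as (?:solar|wind)."""
    # Assumption: keywords are meant literally, so regex metacharacters are escaped.
    alternation = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile("(?:" + alternation + ")", re.IGNORECASE)


def filter_entries(entries, keywords):
    """Keep each entry at most once if it mentions one or more keywords."""
    pattern = build_keyword_pattern(keywords)
    return [
        entry
        for entry in entries
        if pattern.search(f"{entry['title']} {entry['description']}")
    ]


# Hypothetical rows, as they would come from the two spreadsheet columns.
rows = [
    ("https://example.com/feed.xml", "solar"),
    ("https://example.com/feed.xml", "wind"),
    ("https://example.org/rss", "hydrogen"),
]

# Hypothetical entries from the first feed; the first one mentions both keywords.
entries = [
    {"title": "Solar and wind capacity grows", "description": "Yearly report."},
    {"title": "Local sports results", "description": "Weekend round-up."},
]

grouped = group_keywords_by_feed(rows)
matches = filter_entries(entries, grouped["https://example.com/feed.xml"])
print(len(grouped), [entry["title"] for entry in matches])
```

Because each entry is checked once against the whole keyword list, an entry that mentions two keywords still shows up only once, and each feed is read once no matter how many keyword rows it has.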


You can copy and paste this module export into your scenario. This will paste the modules shown in my screenshots above.

  1. Copy the code below by clicking the copy button that appears when you hover over the top-right corner of the code block.

  2. Enter your scenario editor. Press ESC to close any dialogs. Press CTRL+V to paste onto the canvas.

  3. Click on each imported module and save it. You may need to remap some variables.

Modules JSON Export

{
    "subflows": [
        {
            "flow": [
                {
                    "id": 10,
                    "module": "google-sheets:filterRows",
                    "version": 2,
                    "parameters": {
                        "__IMTCONN__": 95013
                    },
                    "mapper": {
                        "from": "share",
                        "limit": "100",
                        "sheetId": "Fontes",
                        "sortOrder": "asc",
                        "spreadsheetId": "1gb4brHxIewzECGvrPC44OsqR_5L6LBKOnCb6RktJfzQ",
                        "tableFirstRow": "A1:CZ1",
                        "includesHeaders": true,
                        "valueRenderOption": "FORMATTED_VALUE",
                        "dateTimeRenderOption": "FORMATTED_STRING"
                    },
                    "metadata": {
                        "designer": {
                            "x": 114,
                            "y": 1
                        },
                        "restore": {
                            "expect": {
                                "from": {
                                    "label": "Select from all"
                                },
                                "orderBy": {
                                    "mode": "chose",
                                    "label": "Empty"
                                },
                                "sheetId": {
                                    "mode": "chose",
                                    "label": "Fontes"
                                },
                                "sortOrder": {
                                    "mode": "chose",
                                    "label": "Ascending"
                                },
                                "tableFirstRow": {
                                    "label": "A-CZ"
                                },
                                "includesHeaders": {
                                    "mode": "chose",
                                    "label": "Yes"
                                },
                                "valueRenderOption": {
                                    "mode": "chose",
                                    "label": "Formatted value"
                                },
                                "dateTimeRenderOption": {
                                    "mode": "chose",
                                    "label": "Formatted string"
                                }
                            },
                            "parameters": {
                                "__IMTCONN__": {
                                    "data": {
                                        "scoped": "true",
                                        "connection": "google"
                                    },
                                    "label": "Google Docs"
                                }
                            }
                        },
                        "parameters": [
                            {
                                "name": "__IMTCONN__",
                                "type": "account:google",
                                "label": "Connection",
                                "required": true
                            }
                        ],
                        "expect": [
                            {
                                "name": "from",
                                "type": "select",
                                "label": "Enter a Spreadsheet ID and Sheet Name",
                                "required": true,
                                "validate": {
                                    "enum": [
                                        "drive",
                                        "share"
                                    ]
                                }
                            },
                            {
                                "name": "valueRenderOption",
                                "type": "select",
                                "label": "Value render option",
                                "validate": {
                                    "enum": [
                                        "FORMATTED_VALUE",
                                        "UNFORMATTED_VALUE",
                                        "FORMULA"
                                    ]
                                }
                            },
                            {
                                "name": "dateTimeRenderOption",
                                "type": "select",
                                "label": "Date and time render option",
                                "validate": {
                                    "enum": [
                                        "SERIAL_NUMBER",
                                        "FORMATTED_STRING"
                                    ]
                                }
                            },
                            {
                                "name": "limit",
                                "type": "number",
                                "label": "Maximum number of returned rows"
                            },
                            {
                                "name": "spreadsheetId",
                                "type": "text",
                                "label": "Spreadsheet ID",
                                "required": true
                            },
                            {
                                "name": "sheetId",
                                "type": "select",
                                "label": "Sheet Name",
                                "required": true
                            },
                            {
                                "name": "includesHeaders",
                                "type": "select",
                                "label": "Table contains headers",
                                "required": true,
                                "validate": {
                                    "enum": [
                                        true,
                                        false
                                    ]
                                }
                            },
                            {
                                "name": "tableFirstRow",
                                "type": "select",
                                "label": "Column range",
                                "required": true,
                                "validate": {
                                    "enum": [
                                        "A1:Z1",
                                        "A1:BZ1",
                                        "A1:CZ1",
                                        "A1:DZ1",
                                        "A1:MZ1",
                                        "A1:ZZ1",
                                        "A1:AZZ1",
                                        "A1:BZZ1",
                                        "A1:CZZ1",
                                        "A1:DZZ1",
                                        "A1:MZZ1",
                                        "A1:ZZZ1"
                                    ]
                                }
                            },
                            {
                                "name": "filter",
                                "type": "filter",
                                "label": "Filter",
                                "options": "rpc://google-sheets/2/rpcGetFilterKeys?includesHeaders=true"
                            },
                            {
                                "name": "sortOrder",
                                "type": "select",
                                "label": "Sort order",
                                "validate": {
                                    "enum": [
                                        "asc",
                                        "desc"
                                    ]
                                }
                            },
                            {
                                "name": "orderBy",
                                "type": "select",
                                "label": "Order by"
                            }
                        ],
                        "interface": [
                            {
                                "name": "__IMTLENGTH__",
                                "type": "uinteger",
                                "label": "Total number of bundles"
                            },
                            {
                                "name": "__IMTINDEX__",
                                "type": "uinteger",
                                "label": "Bundle order position"
                            },
                            {
                                "name": "__ROW_NUMBER__",
                                "type": "number",
                                "label": "Row number"
                            },
                            {
                                "name": "__SPREADSHEET_ID__",
                                "type": "text",
                                "label": "Spreadsheet ID"
                            },
                            {
                                "name": "__SHEET__",
                                "type": "text",
                                "label": "Sheet"
                            },
                            {
                                "name": "0",
                                "type": "text",
                                "label": "RSS (A)"
                            },
                            {
                                "name": "1",
                                "type": "text",
                                "label": "keyword (B)"
                            }
                        ]
                    }
                },
                {
                    "id": 12,
                    "module": "builtin:BasicAggregator",
                    "version": 1,
                    "parameters": {
                        "feeder": 10
                    },
                    "mapper": {
                        "1": "{{10.`1`}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 356,
                            "y": -1
                        },
                        "restore": {
                            "extra": {
                                "feeder": {
                                    "label": "Google Sheets - Search Rows [10]"
                                },
                                "target": {
                                    "label": "Custom"
                                }
                            }
                        },
                        "advanced": true
                    },
                    "flags": {
                        "groupBy": "{{10.`0`}}",
                        "stopIfEmpty": true
                    }
                },
                {
                    "id": 2,
                    "module": "rss:ActionReadArticles",
                    "version": 4,
                    "parameters": {
                        "include": []
                    },
                    "mapper": {
                        "url": "{{12.`__IMTKEY__`}}",
                        "gzip": true,
                        "password": "",
                        "username": "",
                        "maxResults": "100",
                        "filterDateTo": "",
                        "filterDateFrom": ""
                    },
                    "metadata": {
                        "designer": {
                            "x": 600,
                            "y": 0
                        },
                        "restore": {},
                        "parameters": [
                            {
                                "name": "include",
                                "type": "select",
                                "label": "Process RSS fields",
                                "multiple": true,
                                "validate": {
                                    "enum": [
                                        "google-merchant-center",
                                        "itunes"
                                    ]
                                }
                            }
                        ],
                        "expect": [
                            {
                                "help": "Enter a URL that points to a valid RSS or Atom export.",
                                "name": "url",
                                "type": "url",
                                "label": "URL",
                                "required": true
                            },
                            {
                                "help": "Only for password-protected export.",
                                "name": "username",
                                "type": "text",
                                "label": "User name",
                                "advanced": true,
                                "required": false
                            },
                            {
                                "help": "Only for password-protected export.",
                                "name": "password",
                                "type": "text",
                                "label": "Password",
                                "advanced": true,
                                "required": false
                            },
                            {
                                "help": "Enter a date in the 'date from' field from which you want to filter RSS feed items. Make will process all items that are more recent than the specified date.",
                                "name": "filterDateFrom",
                                "type": "date",
                                "label": "Date from"
                            },
                            {
                                "help": "Enter a date in the 'date to' field from which you want to filter RSS feed items. Make will process all items dated on or before the specified date.",
                                "name": "filterDateTo",
                                "type": "date",
                                "label": "Date to "
                            },
                            {
                                "name": "maxResults",
                                "type": "number",
                                "label": "Maximum number of returned items",
                                "required": true
                            },
                            {
                                "help": "Adds an `Accept-Encoding` header to request compressed content.",
                                "name": "gzip",
                                "type": "boolean",
                                "label": "Request compressed content",
                                "default": true,
                                "advanced": true,
                                "required": true
                            }
                        ]
                    }
                },
                {
                    "id": 3,
                    "module": "util:ComposeTransformer",
                    "version": 1,
                    "parameters": {},
                    "filter": {
                        "name": "keywords",
                        "conditions": [
                            [
                                {
                                    "a": "{{2.title}} {{stripHTML(2.description)}} {{2.summary}}",
                                    "o": "text:pattern:ci",
                                    "b": "(?:{{join(map(12.array; 1); \"|\")}})"
                                }
                            ]
                        ]
                    },
                    "mapper": {
                        "value": "📰 {{formatDate(2.dateUpdated; \"DD/MM/YYYY\")}} - <b>{{2.title}}</b><br>\nMatéria completa: {{2.url}}\n<br><br>"
                    },
                    "metadata": {
                        "designer": {
                            "x": 900,
                            "y": 0
                        },
                        "restore": {},
                        "expect": [
                            {
                                "name": "value",
                                "type": "text",
                                "label": "Text",
                                "multiline": true
                            }
                        ]
                    }
                },
                {
                    "id": 4,
                    "module": "util:AggregateAggregator",
                    "version": 1,
                    "parameters": {
                        "feeder": 12,
                        "rowSeparator": "",
                        "columnSeparator": ""
                    },
                    "mapper": {
                        "value": "{{3.value}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 1200,
                            "y": 0,
                            "messages": [
                                {
                                    "category": "last",
                                    "severity": "warning",
                                    "message": "A transformer should not be the last module in the route."
                                }
                            ]
                        },
                        "restore": {
                            "extra": {
                                "feeder": {
                                    "label": "Array aggregator [12]"
                                }
                            },
                            "parameters": {
                                "rowSeparator": {
                                    "label": "Empty"
                                },
                                "columnSeparator": {
                                    "label": "Empty"
                                }
                            }
                        },
                        "parameters": [
                            {
                                "name": "columnSeparator",
                                "type": "select",
                                "label": "Column separator",
                                "validate": {
                                    "enum": [
                                        "\n",
                                        "\t",
                                        "other"
                                    ]
                                }
                            },
                            {
                                "name": "rowSeparator",
                                "type": "select",
                                "label": "Row separator",
                                "validate": {
                                    "enum": [
                                        "\n",
                                        "\t",
                                        "other"
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "metadata": {
        "version": 1
    }
}

Thank you very much!

The JSON you sent didn’t work completely as-is, but from it I understood the logic of the aggregator you added, and I was then able to make it work in the scenario I had already built. I took the opportunity to improve it a little, filtering the date range better and adding a URL shortener, since I changed the sources to RSS topics on Google News and the URLs generated there are not exactly friendly.

Once again, thank you very much for your prompt help!


No problem, glad I could help!
