Deduplicate and sum Array

Hello.

I have an Array of values coming from Google Analytics that contains 3 values

URL - Referrer - Sessions

The URLs sometimes have multiple referrers and I want to be able to consolidate this list down so that the Array contains only unique URLS, but still holds a sum of all of the sessions.

IE…
URL, Referrer, Sessions
mysite.com/mypage, pinterest.com, 4000
mysite.com/mypage, pinterest.ca, 1000
mysite.com/mypage, pinterest.co.uk, 800

Evaluate for duplicates then consolidate to 1 row
mysite.com/mypage, pinterest, 5,800

I’m not actually worried about the referrer, I’m mostly interested in any ideas on how I can deduplicate the URLs and combine the sessions. Any thoughts?

I saw this thread, but writing potentially thousands of values to each data store would be costly at the scale I require.

Any other thoughts on how to deduplicate and sum the values?

Welcome to the Make community!

Output bundles

Please provide example output bundles of the modules by running the scenario, then click the white speech bubble on the top-right of each module and select “Download output bundles”.
Screenshot_2023-10-06_141025

A.

Save the bundle contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

Uploading it here will look like this:

bundle.txt (12.3 KB)

B.

If you are unable to upload files on this forum, alternatively you can paste the formatted output bundle in this manner:

  • Either add three backticks ``` before and after the code, like this:

    ```
    input/output bundle content goes here
    ```

  • Or use the format code button in the editor:
    Screenshot_2023-10-02_191027

Providing the output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!

2 Likes

Hello @Matt_Lane,

I worked something up using your CSV as an example.
image

The result was this:

image

I can’t really post a Blueprint because it involves a couple custom data structures, which wouldn’t be included and the scenario would just fail.

The general process is this (7 Modules/Ops total).
The numbers are positional from left to right and don’t all correspond to the final screenshot.

  • Modules 1 (Parse CSV) and 2 (Aggregate to JSON):
    Assuming you’re starting with CSV (my first screenshot), Parse it, then aggregate your data into JSON (this is the first custom data structure).
    This data structure is identical to the source structure except you have an added field called subdomain to hold “pinterest”.
    Subdomain is extracted from Referrer by splitting it by period then taking the first element of the result array.
  • Modules 3 (Parse JSON) and 4 (Array Aggregator):
    Iterate this data then aggregate it into an array.
    You can use this array for lookups/mapping.
  • Module 5 (Iterator):
    Take all the subdomains from the array and iterate those. This is where the map() and deduplicate() functions are used.
  • Module 6 (Aggregate to JSON):
    In each iteration, you pull out the URL, Subdomain, and Sum of Sessions.
    Aggregate these into a new JSON with a custom data structure (the second custom data structure).
    This data structure contains URL, Subdomain, Sum of Sessions (the data points you specified).
  • Module 7 (Parse JSON):
    Parse the final JSON (my second screenshot)

Looks like this:

It can be more efficient, but then it would be much more difficult to understand.
Also while writing this, I already thought of a couple ways to rewrite parts of this, but out of time now.

Does this sound like what you need?

3 Likes

Hi Sam. Here is the Output bundle file. Hopefully that helps.

Output_Bundles.txt (6.1 KB)

@Donald_Mitchell thank you so much for the extended time and description. Unfortunately, I think it’s going to be too resource heavy for me to parse through thousands of records regularly. Unless someone provides some other ideas, I may have to just take this outside of Make to do the dedupe and combine.

Thanks again for your help though. I am going to give it a shot and see if I can get it to work.

Welcome to the Make community!

Yes, that is possible. You’ll need a minimum of three modules, and three operations:

Input showing both cheese breadsticks

Screenshot_2024-04-30_000445

Output

Screenshot_2024-04-30_000442

As you can see, the URL with duplicate entries, “cheese-breadsticks” have a total of 14906.

Give it a go and let us know if you have any issues!

Module Export

You can copy and paste this module export into your scenario. This will paste the modules shown in my screenshots above.

  1. Copy the JSON code below by clicking the copy button when you mouseover the top-right of the code block
    Screenshot_2024-01-17_200117

  2. Enter your scenario editor. Press ESC to close any dialogs. Press CTRLV (paste keyboard shortcut for Windows) to paste directly in the canvas.

  3. Click on each imported module and save it for validation. You may be prompted to remap some variables and connections.

JSON

{
    "subflows": [
        {
            "flow": [
                {
                    "id": 96,
                    "module": "builtin:BasicFeeder",
                    "version": 1,
                    "parameters": {},
                    "mapper": {
                        "array": "{{95.body.rows}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 1099,
                            "y": -624,
                            "name": "Iterate Rows"
                        },
                        "restore": {
                            "expect": {
                                "array": {
                                    "mode": "edit"
                                }
                            }
                        },
                        "expect": [
                            {
                                "name": "array",
                                "type": "array",
                                "label": "Array",
                                "mode": "edit",
                                "spec": []
                            }
                        ]
                    }
                },
                {
                    "id": 98,
                    "module": "builtin:BasicAggregator",
                    "version": 1,
                    "parameters": {
                        "feeder": 96
                    },
                    "mapper": {
                        "dimensionValues": "{{96.dimensionValues}}",
                        "metricValues": "{{96.metricValues}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 1341,
                            "y": -624,
                            "name": "Group by URL"
                        },
                        "restore": {
                            "extra": {
                                "feeder": {
                                    "label": "Iterator [96]"
                                },
                                "target": {
                                    "label": "Custom"
                                }
                            }
                        },
                        "advanced": true
                    },
                    "flags": {
                        "groupBy": "{{96.dimensionValues.1.value}}",
                        "stopIfEmpty": true
                    }
                },
                {
                    "id": 101,
                    "module": "util:TextAggregator",
                    "version": 1,
                    "parameters": {
                        "rowSeparator": "\n",
                        "feeder": 98
                    },
                    "mapper": {
                        "value": "{{98.`__IMTKEY__`}}, {{sum(map(98.array; \"metricValues.1.value\"))}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 1583,
                            "y": -624,
                            "name": "Sum values in array",
                            "messages": [
                                {
                                    "category": "last",
                                    "severity": "warning",
                                    "message": "A transformer should not be the last module in the route."
                                }
                            ]
                        },
                        "restore": {
                            "parameters": {
                                "rowSeparator": {
                                    "label": "New row"
                                }
                            },
                            "extra": {
                                "feeder": {
                                    "label": "Array aggregator [98]"
                                }
                            }
                        },
                        "parameters": [
                            {
                                "name": "rowSeparator",
                                "type": "select",
                                "label": "Row separator",
                                "validate": {
                                    "enum": [
                                        "\n",
                                        "\t",
                                        "other"
                                    ]
                                }
                            }
                        ],
                        "expect": [
                            {
                                "name": "value",
                                "type": "text",
                                "multiline": true,
                                "label": "Text"
                            }
                        ],
                        "advanced": true
                    },
                    "flags": {
                        "stopIfEmpty": true
                    }
                }
            ]
        }
    ],
    "metadata": {
        "version": 1
    }
}

samliewrequest private consultation

2 Likes

This is amazing @samliew !!! Thank you for the help.

So I’m now stuck with a little data structure confusion. In the 3 modules you suggested, I have a array of comma separated text fields. I want to first sort those fields by the highest session counts, then write the top 10 values to my DB in separate fields.

URL Sessions (sorted Z-A)

I know there is a simple way to do this, but I’m struggling to figure out the sort on the sessions field. Could you maybe provide me a little more direction on that?

Thanks!
Matt

No problem, glad I could help!

1. If you have a new question in the future, please start a new thread. This makes it easier for others with the same problem to search for the answers to specific questions, and you are more likely to receive help since newer questions are monitored closely. You can share a link to this thread for more context if necessary.

2. The Make Community guidelines encourages users to try to mark helpful replies as solutions to help keep the Community organized.

This marks the topic as solved, so that:

  • others can save time when catching up with the latest activity here, and
  • allows others to quickly jump to the solution if they come across the same problem

To do this, simply click the checkbox at the bottom of the post that answers your question:
Screenshot_2023-10-04_161049

3. Don’t forget to like and bookmark this topic so you can get back to it easily in future!

samliewrequest private consultation

1 Like

Thank you @samliew for the solution. For others reference, the solution provided leaves you with a single list of comma separated text values. To make it usable for additional work, I had to convert it to CSV (parse CSV block) and then run it through an Array aggregator to get it into a standard form.

Other than that, it works wonderfully!

Donald! This is actually very close to what I had to do in order to not only combine the values, but then order the new array based on a numeric value. I actually combined what @samliew added in this post with your JSON based transforms and was able to aggregate, deduplicate, and then sort it numerically. SUPER complicated set of 10 blocks to do all of this, but with a lot of help I was able to get it to work. Thanks for your efforts on this!

Glad you got it all sorted out!
Just curious, what format was your source data in? I don’t think you ever mentioned that… or was it in fact CSV?

2 Likes

@Donald_Mitchell Thanks! The source data was from Google Analytics 4. It is formatted in a slightly complex way. I think that’s what made this a bit confusing for me. It’s an array of “rows” that contain a bunch more nested values. Probably an easier way to do everything that I did, but the complexity makes me go a bit cross-eyed :slight_smile:

image

2 Likes