Issue with Scraping a Website

Hey guys,

I’m trying to scrape a job listing website and have run into an issue. Could you please watch my short Loom video and suggest a solution? I’d really appreciate it!

Welcome to the Make community!

Every result (item/record) from a search/match module will output a bundle. To “combine” them into a single structure, you’ll need to use an aggregator of some sort.

Aggregators are modules that accumulate multiple bundles into one single bundle. A commonly used example is the Array aggregator module. The next most popular is the Text aggregator, which is very flexible and applies to many use cases.
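To make the idea concrete, here is a rough conceptual sketch in Python (not Make code) of what the two aggregators do with a stream of bundles. The field name `title`, the sample values, and the output keys `array` and `text` are made up for illustration only.

```python
from typing import Any

def array_aggregate(bundles: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect every incoming bundle into ONE bundle that holds an array."""
    return {"array": list(bundles)}

def text_aggregate(bundles: list[dict[str, Any]], field: str,
                   separator: str = ", ") -> dict[str, str]:
    """Join one field from every incoming bundle into ONE text value."""
    return {"text": separator.join(str(b[field]) for b in bundles)}

# Three hypothetical bundles, as a search/match module might emit them.
bundles = [{"title": "Nurse"}, {"title": "Engineer"}, {"title": "Chef"}]

combined = array_aggregate(bundles)       # one bundle, three array items
joined = text_aggregate(bundles, "title") # one bundle, one joined string
```

In both cases, whatever comes after the aggregator runs once instead of once per bundle.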

If you need further assistance, please provide the following:

1. Screenshots of module fields and filters

Please share screenshots of the relevant module fields and filters. It would really help other community members to see what you’re looking at.

You can upload images here using the Upload icon in the text editor.

2. Scenario blueprint

Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.

(Note: Exporting your scenario will not include private information or keys to your connections)

Uploading it here will look like this:

blueprint.json (12.3 KB)

3. And most importantly, Input/Output bundles

Please provide the input and output bundles of the modules by running the scenario (or retrieving them from the scenario History tab), then clicking the white speech bubble on the top-right of each module and selecting “Download input/output bundles”.

A.

Save each bundle’s contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

Uploading them here will look like this:

module-1-input-bundle.txt (12.3 KB)
module-1-output-bundle.txt (12.3 KB)

B.

If you are unable to upload files on this forum, you can instead paste the formatted bundles in one of these ways:

  • Either add three backticks ``` before and after the code, like this:

    ```
    input/output bundle content goes here
    ```

  • Or use the format code button in the editor.

Providing the input/output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!

samliew (request private consultation)

Join the unofficial Make Discord server to chat with other makers!


Hi,

Thanks for your reply! I’m not sure how to apply an aggregator to my case. I would appreciate some help.
I’ve uploaded the blueprint.
blueprint(1).json (56.9 KB)

The issue is that I parse all the needed data (job title, job ref #, etc.) with different match pattern modules. They all work fine separately, but at the end the values get mixed up in Airtable: the job title from bundle 1 ends up next to the job ref # from bundle 2. How can I make sure that all values from bundle 1 go to row 1, all values from bundle 2 go to row 2, and so on in my scenario? Please help.

Welcome to the Make community!

Yes, that is possible. You’ll need a minimum of one module:

Screenshot_2024-05-25_141956

In your HTTP module’s settings, set the “Parse response” field to “Yes” (it is “No” by default).

Screenshot_2023-12-19_141214

This will allow you to map the response collection properties (variables) in subsequent modules.
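As a rough illustration of why this keeps the values from getting mixed, here is a conceptual Python sketch (not Make code). Once the response is parsed, each record is handled as a whole, so its fields never get paired with another record's. The `data.queryResult` path matches the Iterator mapping in the module export in this thread; the field names `title` and `reference` and the sample values are assumptions, so check the real property names in your own output bundle.

```python
from typing import Any, Iterator

def iterate_jobs(parsed_response: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one row per job record, like an Iterator fed data.queryResult."""
    for job in parsed_response["data"]["queryResult"]:
        # Both values come from the SAME record, so they stay together.
        yield {
            "Job title": job.get("title"),     # hypothetical property name
            "Job ref": job.get("reference"),   # hypothetical property name
        }

# Made-up stand-in for a parsed HTTP response.
sample_response = {
    "data": {
        "queryResult": [
            {"title": "Nurse", "reference": "A-1"},
            {"title": "Chef", "reference": "B-2"},
        ]
    }
}

rows = list(iterate_jobs(sample_response))  # one dict per Airtable row
```

Compare that with running a separate pattern match per field, where each match produces its own list of bundles and nothing ties the first title to the first reference.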


Module Export

You can copy and paste this module export into your scenario. This will paste the modules shown in my screenshots above.

  1. Copy the JSON code below by clicking the copy button that appears when you hover over the top-right of the code block.

  2. Enter your scenario editor. Press ESC to close any dialogs. Press CTRL+V (paste keyboard shortcut for Windows) to paste directly in the canvas.

  3. Click on each imported module and save it for validation. You may be prompted to remap some variables and connections.


JSON

{
    "subflows": [
        {
            "flow": [
                {
                    "id": 46,
                    "module": "http:ActionSendData",
                    "version": 3,
                    "parameters": {
                        "handleErrors": true,
                        "useNewZLibDeCompress": true
                    },
                    "mapper": {
                        "url": "http://jobsapi.staffcv.com/api/Job?organization=2052",
                        "serializeUrl": false,
                        "method": "get",
                        "headers": [],
                        "qs": [],
                        "bodyType": "",
                        "parseResponse": true,
                        "authUser": "",
                        "authPass": "",
                        "timeout": "",
                        "shareCookies": false,
                        "ca": "",
                        "rejectUnauthorized": true,
                        "followRedirect": true,
                        "useQuerystring": false,
                        "gzip": true,
                        "useMtls": false,
                        "followAllRedirects": false
                    },
                    "metadata": {
                        "designer": {
                            "x": 824,
                            "y": 356
                        },
                        "restore": {
                            "expect": {
                                "method": {
                                    "mode": "chose",
                                    "label": "GET"
                                },
                                "headers": {
                                    "mode": "chose"
                                },
                                "qs": {
                                    "mode": "chose"
                                },
                                "bodyType": {
                                    "label": "Empty"
                                }
                            }
                        },
                        "parameters": [
                            {
                                "name": "handleErrors",
                                "type": "boolean",
                                "label": "Evaluate all states as errors (except for 2xx and 3xx )",
                                "required": true
                            },
                            {
                                "name": "useNewZLibDeCompress",
                                "type": "hidden"
                            }
                        ],
                        "expect": [
                            {
                                "name": "url",
                                "type": "url",
                                "label": "URL",
                                "required": true
                            },
                            {
                                "name": "serializeUrl",
                                "type": "boolean",
                                "label": "Serialize URL",
                                "required": true
                            },
                            {
                                "name": "method",
                                "type": "select",
                                "label": "Method",
                                "required": true,
                                "validate": {
                                    "enum": [
                                        "get",
                                        "head",
                                        "post",
                                        "put",
                                        "patch",
                                        "delete",
                                        "options"
                                    ]
                                }
                            },
                            {
                                "name": "headers",
                                "type": "array",
                                "label": "Headers",
                                "spec": [
                                    {
                                        "name": "name",
                                        "label": "Name",
                                        "type": "text",
                                        "required": true
                                    },
                                    {
                                        "name": "value",
                                        "label": "Value",
                                        "type": "text"
                                    }
                                ]
                            },
                            {
                                "name": "qs",
                                "type": "array",
                                "label": "Query String",
                                "spec": [
                                    {
                                        "name": "name",
                                        "label": "Name",
                                        "type": "text",
                                        "required": true
                                    },
                                    {
                                        "name": "value",
                                        "label": "Value",
                                        "type": "text"
                                    }
                                ]
                            },
                            {
                                "name": "bodyType",
                                "type": "select",
                                "label": "Body type",
                                "validate": {
                                    "enum": [
                                        "raw",
                                        "x_www_form_urlencoded",
                                        "multipart_form_data"
                                    ]
                                }
                            },
                            {
                                "name": "parseResponse",
                                "type": "boolean",
                                "label": "Parse response",
                                "required": true
                            },
                            {
                                "name": "authUser",
                                "type": "text",
                                "label": "User name"
                            },
                            {
                                "name": "authPass",
                                "type": "password",
                                "label": "Password"
                            },
                            {
                                "name": "timeout",
                                "type": "uinteger",
                                "label": "Timeout",
                                "validate": {
                                    "max": 300,
                                    "min": 1
                                }
                            },
                            {
                                "name": "shareCookies",
                                "type": "boolean",
                                "label": "Share cookies with other HTTP modules",
                                "required": true
                            },
                            {
                                "name": "ca",
                                "type": "cert",
                                "label": "Self-signed certificate"
                            },
                            {
                                "name": "rejectUnauthorized",
                                "type": "boolean",
                                "label": "Reject connections that are using unverified (self-signed) certificates",
                                "required": true
                            },
                            {
                                "name": "followRedirect",
                                "type": "boolean",
                                "label": "Follow redirect",
                                "required": true
                            },
                            {
                                "name": "useQuerystring",
                                "type": "boolean",
                                "label": "Disable serialization of multiple same query string keys as arrays",
                                "required": true
                            },
                            {
                                "name": "gzip",
                                "type": "boolean",
                                "label": "Request compressed content",
                                "required": true
                            },
                            {
                                "name": "useMtls",
                                "type": "boolean",
                                "label": "Use Mutual TLS",
                                "required": true
                            },
                            {
                                "name": "followAllRedirects",
                                "type": "boolean",
                                "label": "Follow all redirect",
                                "required": true
                            }
                        ]
                    }
                },
                {
                    "id": 48,
                    "module": "builtin:BasicFeeder",
                    "version": 1,
                    "parameters": {},
                    "mapper": {
                        "array": "{{46.data.queryResult}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 1070,
                            "y": 359,
                            "messages": [
                                {
                                    "category": "last",
                                    "severity": "warning",
                                    "message": "A transformer should not be the last module in the route."
                                }
                            ]
                        },
                        "restore": {
                            "expect": {
                                "array": {
                                    "mode": "edit"
                                }
                            }
                        },
                        "expect": [
                            {
                                "name": "array",
                                "type": "array",
                                "label": "Array",
                                "mode": "edit",
                                "spec": []
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "metadata": {
        "version": 1
    }
}
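If you would like to check what an export contains before pasting it, a small Python sketch like the one below can list the modules in it. It only assumes the `subflows` → `flow` → `id`/`module` layout visible in the JSON above; the helper name `list_modules` is made up, and the embedded string is a trimmed copy of the export, not the full thing.

```python
import json

def list_modules(export: dict) -> list[tuple[int, str]]:
    """Return (id, module) pairs for every module in every subflow."""
    return [
        (module["id"], module["module"])
        for subflow in export.get("subflows", [])
        for module in subflow.get("flow", [])
    ]

# Trimmed copy of the export above: only the keys this sketch reads.
trimmed_export = json.loads("""
{
    "subflows": [
        {
            "flow": [
                {"id": 46, "module": "http:ActionSendData"},
                {"id": 48, "module": "builtin:BasicFeeder"}
            ]
        }
    ],
    "metadata": {"version": 1}
}
""")

modules = list_modules(trimmed_export)
for module_id, module_name in modules:
    print(module_id, module_name)
```

For this export that gives the HTTP request module (id 46) followed by the module whose `array` field is mapped to `{{46.data.queryResult}}` (id 48).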

Here are some useful links and guides you can use to learn more about how to use the Make platform, apps, and app modules. I found these useful when I was learning Make, and hope they might benefit you too:

General

Help Center Basics

Articles & Videos
