Array aggregator to decode URL encoding when scraping LinkedIn jobs

Hi,

I’m in the process of creating a scraper for LinkedIn jobs. One of the pieces of information I’m pulling from certain jobs is the URL of the actual job posting outside of LinkedIn.

In the data, the link is URL-encoded, so I have to decode it.

Below is a screenshot showing the code that each parser module handles.

I’ve tried using an array aggregator, but either it’s buggy or I’m doing something wrong, because when I try to save a module within it, it doesn’t save.

Either way, an aggregator doesn’t seem to cut the number of modules down.

Anyone have a suggestion about how I can cut down on the number of modules in the red box?

Welcome to the Make community!

Usually if you have to chain more than two Text Parsers in a row, you’re probably doing something wrong.

But we can’t see what the issue is because you haven’t shared what each module is trying to do here.

If you need further assistance, please provide the following:

1. Screenshots of module fields and filters

Please share screenshots of the relevant module fields and filters in question. It would really help other community members to see what you’re looking at.

You can upload images here using the Upload icon in the text editor:

2. Scenario blueprint

Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.


(Note: Exporting your scenario will not include private information or keys to your connections)

Uploading it here will look like this:

blueprint.json (12.3 KB)

3. And most importantly, Input/Output bundles

Please provide the input and output bundles of the trigger/iterator/aggregator modules by running the scenario (or get from the scenario History tab), then click the white speech bubble on the top-right of each module and select “Download input/output bundles”.

A.

Save each bundle’s contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

Uploading them here will look like this:

module-1-output-bundle.txt (12.3 KB)

B.

If you are unable to upload files on this forum, alternatively you can paste the formatted bundles in this manner:

  • Either add three backticks ``` before and after the code, like this:

    ```
    input/output bundle content goes here
    ```

  • Or use the format code button in the editor:

Providing the input/output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!
samliew (request private consultation)

Join the Make Fans Discord server to chat with other makers!


Hi @samliew and thanks for the lightning fast reply!

I’m attaching the blueprint here.

I’d love to bring all the Text Parser Replace modules into one, but I’m under the impression that they have to be individualized.

The whole point of this automation is to scrape through a LinkedIn (LI) job post.

As anyone who’s looked for a job on LI knows, there are jobs with links to the individual company’s website, and jobs with the “Easy Apply” option.

Modules 4-22 exist because LI provides the individual company application link in URL-encoded format, so each module handles a different code (e.g. %20 for a space).

Is there an option to put all the codes into a single replace, or is there another free tool that does this?

blueprint (1).json (292.6 KB)
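
For context, what modules 4-22 do one code at a time is standard percent-decoding, which most languages expose as a single call. Below is a minimal Python sketch of that idea, outside Make; the function name `decode_job_link` and the sample link are made up for illustration. Make’s text functions appear to include a `decodeURL` function that may do the same inside one mapping, but nobody in this thread confirms it, so treat that as an assumption to verify.

```python
from urllib.parse import unquote


def decode_job_link(encoded: str) -> str:
    """Percent-decode a URL in one pass, handling every %XX code at once."""
    return unquote(encoded)


# Made-up example link, not taken from a real LinkedIn job post.
sample = "https%3A%2F%2Fexample.com%2Fcareers%3Fjob%3D123%26src%3Dlinkedin%20jobs"
print(decode_job_link(sample))
```

A single decoder also avoids the ordering problem of chained replaces, where decoding %25 (the percent sign itself) too early can create new codes that a later replace then decodes by mistake.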

Looks like you can just use a single module to get all the Job Details.

Output

For more information, see Linkedin API on RapidAPI.

The key is not to do the LinkedIn scraping yourself.

How to call an API on RapidAPI

Use the HTTP “Make an API Key Auth Request” module.

Create a new keychain connection and insert your RapidAPI API Key.

Key: <YOUR_RAPIDAPI_KEY>
API Key parameter name: X-RapidAPI-Key

You can reuse this RapidAPI keychain for all API calls to RapidAPI. You’ll just need to change the X-RapidAPI-Host value based on the API you are calling.
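
As a rough illustration of what that HTTP module sends, here is a minimal Python sketch of the same request. The host, path, and `id` query parameter are the ones used in the module export in this thread; the function names are hypothetical, and the shape of the response is not documented here, so it is simply returned as parsed JSON.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Host taken from the module export in this thread.
RAPIDAPI_HOST = "linkedin-api8.p.rapidapi.com"


def build_job_details_request(api_key: str, job_id: str):
    """Return (url, headers) for the get-job-details endpoint."""
    url = f"https://{RAPIDAPI_HOST}/get-job-details?" + urlencode({"id": job_id})
    headers = {
        "X-RapidAPI-Key": api_key,         # same key for every RapidAPI call
        "X-RapidAPI-Host": RAPIDAPI_HOST,  # changes per API being called
    }
    return url, headers


def fetch_job_details(api_key: str, job_id: str) -> dict:
    """Send the GET request and parse the JSON response (uses API credit)."""
    url, headers = build_job_details_request(api_key, job_id)
    request = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_job_details("<YOUR_RAPIDAPI_KEY>", "3959823737")` would issue one billable request, so the sketch keeps request building separate from sending.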


Module Export

You can copy and paste this module export into your scenario. This will paste the modules shown in my screenshots above.

  1. Copy the JSON code below by clicking the copy button that appears when you hover over the top-right of the code block

  2. Enter your scenario editor. Press ESC to close any dialogs. Press CTRL+V (paste keyboard shortcut for Windows) to paste directly in the canvas.

  3. Click on each imported module and save it for validation. You may be prompted to remap some variables and connections.


JSON - Copy and Paste this directly in the scenario editor

{
    "subflows": [
        {
            "flow": [
                {
                    "id": 1,
                    "module": "http:ActionSendDataAPIKeyAuth",
                    "version": 3,
                    "parameters": {
                        "auth": 1,
                        "handleErrors": true
                    },
                    "mapper": {
                        "url": "https://linkedin-api8.p.rapidapi.com/get-job-details",
                        "serializeUrl": false,
                        "method": "get",
                        "headers": [
                            {
                                "name": "X-RapidAPI-Host",
                                "value": "linkedin-api8.p.rapidapi.com"
                            }
                        ],
                        "qs": [
                            {
                                "name": "id",
                                "value": "3959823737"
                            }
                        ],
                        "bodyType": "",
                        "parseResponse": true,
                        "timeout": "",
                        "shareCookies": false,
                        "ca": "",
                        "rejectUnauthorized": true,
                        "followRedirect": true,
                        "useQuerystring": false,
                        "gzip": true,
                        "useMtls": false,
                        "followAllRedirects": false
                    },
                    "metadata": {
                        "designer": {
                            "x": 30,
                            "y": -2810
                        },
                        "restore": {
                            "parameters": {
                                "auth": {
                                    "collapsed": true,
                                    "label": "RapidAPI"
                                }
                            },
                            "expect": {
                                "method": {
                                    "mode": "chose",
                                    "label": "GET"
                                },
                                "headers": {
                                    "mode": "chose",
                                    "items": [
                                        null
                                    ]
                                },
                                "qs": {
                                    "mode": "chose",
                                    "items": [
                                        null
                                    ]
                                },
                                "bodyType": {
                                    "label": "Empty"
                                }
                            }
                        },
                        "parameters": [
                            {
                                "name": "auth",
                                "type": "keychain:apikeyauth",
                                "label": "Credentials",
                                "required": true
                            },
                            {
                                "name": "handleErrors",
                                "type": "boolean",
                                "label": "Evaluate all states as errors (except for 2xx and 3xx )",
                                "required": true
                            }
                        ],
                        "expect": [
                            {
                                "name": "url",
                                "type": "url",
                                "label": "URL",
                                "required": true
                            },
                            {
                                "name": "serializeUrl",
                                "type": "boolean",
                                "label": "Serialize URL",
                                "required": true
                            },
                            {
                                "name": "method",
                                "type": "select",
                                "label": "Method",
                                "required": true,
                                "validate": {
                                    "enum": [
                                        "get",
                                        "head",
                                        "post",
                                        "put",
                                        "patch",
                                        "delete",
                                        "options"
                                    ]
                                }
                            },
                            {
                                "name": "headers",
                                "type": "array",
                                "label": "Headers",
                                "spec": [
                                    {
                                        "name": "name",
                                        "label": "Name",
                                        "type": "text",
                                        "required": true
                                    },
                                    {
                                        "name": "value",
                                        "label": "Value",
                                        "type": "text"
                                    }
                                ]
                            },
                            {
                                "name": "qs",
                                "type": "array",
                                "label": "Query String",
                                "spec": [
                                    {
                                        "name": "name",
                                        "label": "Name",
                                        "type": "text",
                                        "required": true
                                    },
                                    {
                                        "name": "value",
                                        "label": "Value",
                                        "type": "text"
                                    }
                                ]
                            },
                            {
                                "name": "bodyType",
                                "type": "select",
                                "label": "Body type",
                                "validate": {
                                    "enum": [
                                        "raw",
                                        "x_www_form_urlencoded",
                                        "multipart_form_data"
                                    ]
                                }
                            },
                            {
                                "name": "parseResponse",
                                "type": "boolean",
                                "label": "Parse response",
                                "required": true
                            },
                            {
                                "name": "timeout",
                                "type": "uinteger",
                                "label": "Timeout",
                                "validate": {
                                    "max": 300,
                                    "min": 1
                                }
                            },
                            {
                                "name": "shareCookies",
                                "type": "boolean",
                                "label": "Share cookies with other HTTP modules",
                                "required": true
                            },
                            {
                                "name": "ca",
                                "type": "cert",
                                "label": "Self-signed certificate"
                            },
                            {
                                "name": "rejectUnauthorized",
                                "type": "boolean",
                                "label": "Reject connections that are using unverified (self-signed) certificates",
                                "required": true
                            },
                            {
                                "name": "followRedirect",
                                "type": "boolean",
                                "label": "Follow redirect",
                                "required": true
                            },
                            {
                                "name": "useQuerystring",
                                "type": "boolean",
                                "label": "Disable serialization of multiple same query string keys as arrays",
                                "required": true
                            },
                            {
                                "name": "gzip",
                                "type": "boolean",
                                "label": "Request compressed content",
                                "required": true
                            },
                            {
                                "name": "useMtls",
                                "type": "boolean",
                                "label": "Use Mutual TLS",
                                "required": true
                            },
                            {
                                "name": "followAllRedirects",
                                "type": "boolean",
                                "label": "Follow all redirect",
                                "required": true
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "metadata": {
        "version": 1
    }
}


@samliew - thanks for the help.

I tried RapidAPI and it is absolutely LOVELY!

That being said, it costs an arm and a leg. I burned through the $5 credit within an hour, so I’ll be looking for other solutions.

Thanks again though.

I am also looking for the same: an alternative API for LinkedIn jobs that is less expensive. The RapidAPI costs are INSANE!

@Jak - I found something that might help you.

Find the tags that open and close the JSON within the page you are fetching. In my case they were:

<script type="application/ld+json"> and </script>

Then put a Text Parser Match Pattern module right after your HTTP module with the following pattern:

<script type="application/ld\+json">([\s\S]*?)</script>

After that you have 2 options:

1 - Use the ChatGPT module to extract the JSON into a nice, clean output (but for me there’s a problem*)
2 - Use a JSON parser. It extracts the raw data, which ends up as a bunch of separate outputs.

Take a look at the blueprint and the screenshot below:
blueprint (2).json (70.2 KB)

*My problem is that the output from ChatGPT is a single output, and I need each piece of info as a separate output so I can put them into different Google Sheets columns. That’s why I went with the JSON parser.
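
To make the match-then-parse approach concrete, here is a minimal Python sketch of the same two steps: the regex from this post pulls out the JSON-LD block, and a JSON parser turns it into separate fields. The sample HTML, the field names `title` and `datePosted`, and the helper names are assumptions for illustration; the fields on a real page may differ.

```python
import json
import re

# Same pattern as the Text Parser Match Pattern module above.
JSON_LD_PATTERN = re.compile(
    r'<script type="application/ld\+json">([\s\S]*?)</script>'
)


def extract_json_ld(html: str) -> dict:
    """Return the first JSON-LD block in the page as a dictionary."""
    match = JSON_LD_PATTERN.search(html)
    if match is None:
        raise ValueError("no JSON-LD script block found")
    return json.loads(match.group(1))


def to_row(job: dict, columns: list) -> list:
    """Pick one value per column, leaving a blank for missing fields."""
    return [job.get(name, "") for name in columns]


# Made-up page fragment for illustration only.
sample_html = """<html><head>
<script type="application/ld+json">
{"@type": "JobPosting", "title": "Data Analyst", "datePosted": "2024-01-17"}
</script>
</head></html>"""

job = extract_json_ld(sample_html)
print(to_row(job, ["title", "datePosted"]))
```

Each element of the returned row corresponds to one spreadsheet column, which is the separation that the JSON parser option provides and the single ChatGPT output does not.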