How can I extract all the URLs from a sitemap.xml to Google Sheets?

Hey,

I want to list all the URLs from a sitemap.xml in a Google Sheets spreadsheet.

I also need to skip the home page URL (so I don’t want https://www.domain.com listed as one of the scraped URLs).

Thanks! 🙂

Please provide an example of what the sitemap.xml looks like.

If you need further assistance, please provide the following:

1. Relevant Screenshots

Please share screenshots of your scenario, any error messages, relevant module fields, and the filters in question. It would really help other community members to see what you’re looking at.

You can upload images here using the Upload icon in the text editor.

2. Scenario Blueprint

Please export the scenario blueprint file to allow others to view the mapped variables in the module fields. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.

3. Output Bundles of Modules

Please provide the output bundles of the modules by running the scenario (or get them from the scenario’s History tab), then click the white speech bubble on the top-right of each module and select “Download input/output bundles”.

A. Upload as Text File

Save each bundle contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

B. Insert as Formatted Code Block

If you are unable to upload files on this forum, alternatively you can paste the formatted bundles.
These are the two ways to format text so that it won’t be modified by the forum:

  • Method 1: Type code block manually

    Add three backticks ``` before and after the content/bundle, like this:

    ```
    content goes here
    ```

  • Method 2: Highlight and click the format button in the editor

Providing the input/output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!

It’s a “default” WordPress sitemap: https://www.matematicafinanceira.net/page-sitemap.xml

This is roughly what I think my scenario should look like.

Here’s my blueprint.json (39.9 KB).

I don’t have the output bundles because I’m struggling to get there… 🙂
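For anyone following along: a standard sitemap, in the sitemaps.org format that WordPress sitemaps generally use, is a `<urlset>` of `<url>` entries, each holding the page address in a `<loc>` element, all in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. The sketch below parses a small hypothetical sample (the `example.com` URLs are made up, not copied from the sitemap above) with Python’s standard `xml.etree.ElementTree`; `sitemap_locs` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; <urlset>, <url> and <loc> live in it.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample in the usual sitemap shape (not the real sitemap's contents).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about/</loc></url>
  <url><loc>https://www.example.com/contact/</loc></url>
</urlset>"""


def sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> value from a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    # Namespaced lookup: each <url> child of the root, then its <loc> child.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


print(sitemap_locs(SAMPLE_SITEMAP))
```

Because the elements are namespaced, the lookup has to pass the namespace map; a plain `findall("url/loc")` would return nothing.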

Welcome to the Make community!

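Yes, that is possible. You’ll need a minimum of four modules (4 operations): an HTTP module to get the sitemap, a Text parser to match each `<loc>` URL, an Array aggregator, and a Google Sheets Bulk Add Rows module.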

This is just an example. Your final solution may or may not look like this depending on your requirements.
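Outside Make, the same chain (download the sitemap, match every `<loc>` with a global regex, then collect one row per URL for a bulk insert) looks roughly like the Python sketch below. It also adds the home-page exclusion asked for in the question. The module export in this reply doesn’t include that step, so in Make you would add it yourself, for example as a filter between the Text parser and the aggregator. `fetch_sitemap`, `is_home_page` and `sitemap_rows` are hypothetical helper names.

```python
import re
import urllib.request
from urllib.parse import urlparse

# Same pattern as the Text parser module, written with Python's
# named-group syntax (?P<url>...).
LOC_PATTERN = re.compile(r"<loc>(?P<url>[^<]+)</loc>")


def fetch_sitemap(url: str) -> str:
    """Step 1 (the HTTP 'Get sitemap' module): download the sitemap as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def is_home_page(url: str) -> bool:
    """True for the site root, e.g. https://www.domain.com or https://www.domain.com/."""
    parts = urlparse(url)
    return parts.path in ("", "/") and not parts.query


def sitemap_rows(xml_text: str) -> list[list[str]]:
    """Steps 2-3: match every <loc>, drop the home page, build one row per URL."""
    urls = [m.group("url").strip() for m in LOC_PATTERN.finditer(xml_text)]
    return [[url] for url in urls if not is_home_page(url)]
```

Usage would be along the lines of `rows = sitemap_rows(fetch_sitemap("https://www.matematicafinanceira.net/page-sitemap.xml"))`, where each row is a one-element list holding a URL for column A. Comparing the parsed path rather than the full string means both `https://www.domain.com` and `https://www.domain.com/` are treated as the home page.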

Module Export - quick import into your scenario

You can copy and paste this module export into your scenario. This will import the modules (with fields/settings/filters) shown in my screenshots above.

  1. Move your mouse over the line of code below. Copy the JSON by clicking the copy button on the right of the code.

  2. Enter your scenario editor. Press ESC to close any dialogs. Press Ctrl+V (the paste keyboard shortcut on Windows) to paste directly in the editor.

  3. Click on each imported module and re-save it for validation. There may be some errors prompting you to remap some variables and connections.

JSON module export — paste this directly in your scenario

{"subflows":[{"flow":[{"id":197,"module":"http:ActionSendData","version":3,"parameters":{"handleErrors":true,"useNewZLibDeCompress":true},"mapper":{"url":"https://www.matematicafinanceira.net/page-sitemap.xml","serializeUrl":false,"method":"get","headers":[],"qs":[],"bodyType":"","parseResponse":false,"authUser":"","authPass":"","timeout":"","shareCookies":false,"ca":"","rejectUnauthorized":true,"followRedirect":true,"useQuerystring":false,"gzip":true,"useMtls":false,"followAllRedirects":false},"metadata":{"designer":{"x":2622,"y":-392,"name":"Get sitemap"},"parameters":[{"name":"handleErrors","type":"boolean","label":"Evaluate all states as errors (except for 2xx and 3xx )","required":true},{"name":"useNewZLibDeCompress","type":"hidden"}]}},{"id":198,"module":"regexp:Parser","version":1,"parameters":{"pattern":"<loc>(?<url>[^<]+)</loc>","global":true,"sensitive":true,"multiline":false,"singleline":false,"continueWhenNoRes":false,"ignoreInfiniteLoopsWhenGlobal":false},"mapper":{"text":"{{toString(197.data)}}"},"metadata":{"designer":{"x":2868,"y":-393},"parameters":[{"name":"pattern","type":"text","label":"Pattern","required":true},{"name":"global","type":"boolean","label":"Global match","required":true},{"name":"sensitive","type":"boolean","label":"Case sensitive","required":true},{"name":"multiline","type":"boolean","label":"Multiline","required":true},{"name":"singleline","type":"boolean","label":"Singleline","required":true},{"name":"continueWhenNoRes","type":"boolean","label":"Continue the execution of the route even if the module finds no matches","required":true},{"name":"ignoreInfiniteLoopsWhenGlobal","type":"boolean","label":"Ignore errors when there is an infinite search loop","required":true}]}},{"id":199,"module":"builtin:BasicAggregator","version":1,"parameters":{"target":"200.rows","feeder":198},"mapper":{"values":["{{198.url}}"]},"metadata":{"designer":{"x":3111,"y":-393}}},{"id":200,"module":"google-sheets:addMultipleRows","version":2,"parameters":{"__IMTCONN__":95013},"mapper":{"spreadsheetId":"1Ch3XfA7gFKLSQZAezl9DSZgMMqoU1idB_lwWKXLja9M","insertUnformatted":false,"valueInputOption":"USER_ENTERED","insertDataOption":"INSERT_ROWS","sheetId":"Test Sheet","tableFirstRow":"A1:Z1","rows":"{{199.array}}"},"metadata":{"designer":{"x":3352,"y":-396},"parameters":[{"name":"__IMTCONN__","type":"account:google","label":"Connection","required":true}]}}]}],"metadata":{"version":1}}

Note: Did you know you can reduce the size of blueprints and module export code like the above, using the Make Blueprint Scrubber?
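In the export, the Array aggregator maps `198.url` into the first column, so every match becomes a one-cell row, and the Bulk Add Rows module is set to `valueInputOption: USER_ENTERED` and `insertDataOption: INSERT_ROWS`. Those two option names match the query parameters of the Google Sheets API v4 `spreadsheets.values.append` method. Assuming the module builds a similar request (Make’s internals aren’t shown in this thread), the data shape would look like this sketch, which only builds the request and sends nothing; `build_append_request` is a hypothetical helper name.

```python
import json


def build_append_request(sheet_name: str, urls: list[str]) -> tuple[str, dict, dict]:
    """Return (A1 range, query params, JSON body) for a values.append call.

    Each URL becomes its own row, with the URL in column A.
    """
    # In A1 notation, sheet names with spaces are wrapped in single quotes,
    # and any single quote inside the name is doubled.
    quoted = sheet_name.replace("'", "''")
    range_a1 = f"'{quoted}'!A1:Z1"  # "tableFirstRow" from the export
    params = {
        "valueInputOption": "USER_ENTERED",  # parse values as if typed into the UI
        "insertDataOption": "INSERT_ROWS",   # insert new rows instead of overwriting
    }
    body = {"majorDimension": "ROWS", "values": [[url] for url in urls]}
    return range_a1, params, body


range_a1, params, body = build_append_request("Test Sheet", ["https://www.example.com/about/"])
print(range_a1)
print(json.dumps(body))
```

The `values` list of one-element lists is the same “one URL per row” shape the aggregator produces for the `rows` field.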

Hope this helps! Let me know if there are any further questions or issues.

@samliew


P.S.: Did you know, the concepts of about 70% of questions asked on this forum are already covered in the Make Academy. Investing some effort into it will save you lots of time and frustration using Make later!

Thanks, @samliew

The first three modules run fine. When I reach the last one (Bulk Add Rows), it fails and says:

The operation failed with an error. Function ‘extractBulkModuleOutput’ finished with error! Cannot read properties of null (reading ‘1’)

But the data is there. If I run the scenario again, it runs error-free, but the URLs are added a second time.

It’s not clear from the error log what wasn’t found…
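About the duplicates: a bulk insert only appends, so URLs that reached the sheet during the failed run are added again on the next run. One way around that (a suggestion, not something tested in this thread) is to read the URLs already in column A first and append only the ones that are missing. The comparison itself is short; `new_urls_only` is a hypothetical helper name.

```python
def new_urls_only(existing_column: list[str], candidates: list[str]) -> list[str]:
    """Return the candidate URLs not already in the sheet, keeping sitemap order.

    Repeats within `candidates` are also dropped.
    """
    seen = {value.strip() for value in existing_column if value}
    fresh = []
    for url in candidates:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


existing = ["https://www.example.com/about/"]
found = ["https://www.example.com/about/", "https://www.example.com/contact/"]
print(new_urls_only(existing, found))
```

Using a set for the already-present values keeps each lookup constant-time, which matters little for one sitemap but helps if the sheet grows to thousands of rows.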

Do you mean the Sheets module fails, yet it still inserts the data?

What does your aggregator and sheets module look like now?

@samliew

Sorry for the delay, @samliew. I was on holiday.

I tested again, adding a second and a third sitemap. They work perfectly.

Thank you! 🙂
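When several sitemaps are involved, as in the last reply, the same `<loc>` regex from the module export can be run over each downloaded sitemap and the results merged in order, dropping any URL that appears twice. A minimal sketch, assuming the sitemap texts are already downloaded; `merge_sitemaps` and the sample strings are hypothetical.

```python
import re

# Python form of the export's pattern <loc>(?<url>[^<]+)</loc>
LOC_PATTERN = re.compile(r"<loc>(?P<url>[^<]+)</loc>")


def merge_sitemaps(sitemap_texts: list[str]) -> list[str]:
    """Collect <loc> URLs from several sitemaps, keeping first-seen order
    and skipping repeats."""
    merged: dict[str, None] = {}  # dicts preserve insertion order
    for text in sitemap_texts:
        for match in LOC_PATTERN.finditer(text):
            merged.setdefault(match.group("url").strip(), None)
    return list(merged)


pages = "<urlset><url><loc>https://www.example.com/about/</loc></url></urlset>"
posts = ("<urlset><url><loc>https://www.example.com/hello/</loc></url>"
         "<url><loc>https://www.example.com/about/</loc></url></urlset>")
print(merge_sitemaps([pages, posts]))
```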