Web links from semi-structured data

I am trying to get web links from 750-plus web pages. I can get Make to read the URLs from a Google Sheet and return the page data from an HTTP request, but then I get stuck trying to extract the links from the returned data.
Initially I tried ScrapingBee - but that kind of defeated the object, as I already get the page data back from the HTTP request.
The data is held in one line (always line 379) and the links to the articles I want are in sections like this

Welcome to the Make community!

Looks like your images aren’t working. Could you please reupload directly to this forum?

If you need further assistance, please provide the following:

1. Screenshots of module fields and filters

Please share screenshots of the relevant module fields and filters in question. It would really help other community members to see what you’re looking at.

You can upload images here using the Upload icon in the text editor.

2. Scenario blueprint

Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.

(Note: Exporting your scenario will not include private information or keys to your connections)

Uploading it here will look like this:

blueprint.json (12.3 KB)

3. And most importantly, Input/Output bundles

Please provide the input and output bundles of the modules by running the scenario (or get from the scenario History tab), then click the white speech bubble on the top-right of each module and select “Download input/output bundles”.

A.

Save each bundle contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

Uploading them here will look like this:

module-1-input-bundle.txt (12.3 KB)
module-1-output-bundle.txt (12.3 KB)

B.

If you are unable to upload files on this forum, alternatively you can paste the formatted bundles in this manner:

  • Either add three backticks ``` before and after the code, like this:

    ```
    input/output bundle content goes here
    ```

  • Or use the format code button in the editor.

Providing the input/output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!


Sure - thanks.

Scenario is here

I’m feeding in 750 rows with URLs - at the moment it’s limited to 3 rows

I pull the HTML, then try to pull out the sub-links using the parser to get the a (anchor) links.

That results in 256 output bundles.

At the moment all I can do is output those to a sheet as columns and identify which ones to keep, i.e. the links to sub pages.
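
As a rough illustration of the "which ones to keep" step, here is a minimal Python sketch that filters a list of hrefs down to article-style links. It assumes article paths look like `/<digits>/<slug>`, which is only inferred from the sample HTML in this post; the names `ARTICLE_PATH` and `keep_article_links` are made up for the example, and the same pattern might be usable as a filter inside Make.

```python
import re

# Assumption: article paths look like "/<digits>/<slug>", as in the sample
# teaser in this thread. Other pages on the site may not follow this shape.
ARTICLE_PATH = re.compile(r"^/\d+/[a-z0-9-]+$")

def keep_article_links(hrefs):
    """Return article-style hrefs in first-seen order, without duplicates."""
    seen = set()
    kept = []
    for href in hrefs:
        if ARTICLE_PATH.match(href) and href not in seen:
            seen.add(href)
            kept.append(href)
    return kept

# hrefs as they appear in the sample teaser (the article link occurs twice)
sample_hrefs = [
    "/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024",
    "/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024",
    "/users/ruby-miranda",
    "/zodiac",
    "/user/login",
]
article_links = keep_article_links(sample_hrefs)
print(article_links)
```

Author, category and login links are dropped because their first path segment is not numeric.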

I’m sure there is a regex to just pull the links out of the string in that line - attached is a txt file with the HTML I want to pull the links from

 <article id="node-372033" class="article node node-blog node-teaser"><div class="row teaser-mix-cat"><div class="image-bx col-lg-6 col-md-6 col-sm-6"><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024"><img class="img-responsive lazyload" typeof="foaf:Image" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAALCAABAAIBAREA/8QAFAABAAAAAAAAAAAAAAAAAAAACv/EABoQAAEFAQAAAAAAAAAAAAAAAAQAAgU1dLP/2gAIAQEAAD8AdJI2B2wns9f/2Q==" data-src="https://www.yourtango.com/sites/default/files/styles/header_slider_480/public/image_blog/zodiac-signs-love-lives-improve-week-march-11-17-2024.png?itok=L3y2Pxi_" width="376" height="188" alt="zodiac signs love lives improve week of march 11 - 17, 2024" title="zodiac signs relationships improve week of march 11 - 17, 2024" /><noscript><img class="img-responsive" typeof="foaf:Image" src="https://www.yourtango.com/sites/default/files/styles/header_slider_480/public/image_blog/zodiac-signs-love-lives-improve-week-march-11-17-2024.png?itok=L3y2Pxi_" width="376" height="188" alt="zodiac signs love lives improve week of march 11 - 17, 2024" title="zodiac signs relationships improve week of march 11 - 17, 2024" /></noscript></a></div><div class="texts col-lg-6 col-md-6 col-sm-6"><div class="teaser-mix-left"><h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">3 Zodiac Signs Whose Relationships Drasticallly Improve The Week Of March 11 - 17, 2024</a></h2><div class="teaser-mix-dek">Perhaps there really is something to being open-minded.</div></div><div class="row teaser-mix-right"><div class="col-lg-8 col-md-8 col-sm-8 col-xs-7"><div class="row"><div class="col-lg-6 col-md-6 col-sm-6 col-xs-12"><div class="author"><div class='author-name'><div><a href="/users/ruby-miranda" class="yte-stats url fn" data-expert-id="rubymiranda" data-type="profile"><span>Ruby 
Miranda</span></a></div>Author</div></div></div></div></div><div class="col-lg-4 col-md-4 col-sm-4 col-xs-5 text-right"><a href="/zodiac" class="category active">Zodiac</a><div class="read-more"><a href="/user/login" rel="nofollow">Read Later</a></div></div></div></div></div> </article>

So in summary - 750 master links each have approx 20 links to sub pages, and it’s those I want: approx 15k links in total. Each set of links is stored in one long data row (approx 263,000 columns wide) on each of the 750 pages.
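
For what it's worth, a regex along the lines asked about could be sketched like this in Python. It assumes every article teaser puts its headline link inside `<h2><a href="...">`, as the sample above does, and that the site base is `https://www.yourtango.com` (inferred from the image URLs in the sample). `HEADLINE_LINK`, `BASE_URL` and `extract_article_urls` are hypothetical names, and the sample string is a shortened version of the teaser above.

```python
import re
from urllib.parse import urljoin

# Assumption: base URL inferred from the image URLs in the sample HTML.
BASE_URL = "https://www.yourtango.com"

# Assumption: headline links sit directly inside <h2>, as in the sample.
# The capture group is the href value.
HEADLINE_LINK = re.compile(r'<h2>\s*<a\s+href="([^"]+)"')

def extract_article_urls(html, base_url=BASE_URL):
    """Return absolute headline URLs in page order, without duplicates."""
    urls = []
    for path in HEADLINE_LINK.findall(html):
        url = urljoin(base_url, path)
        if url not in urls:
            urls.append(url)
    return urls

# Shortened version of the teaser posted above
sample_html = (
    '<article id="node-372033" class="article node node-blog node-teaser">'
    '<div><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">'
    '<img src="x.png" /></a></div>'
    '<h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">'
    '3 Zodiac Signs</a></h2>'
    '<a href="/users/ruby-miranda" class="yte-stats url fn"><span>Ruby Miranda</span></a>'
    '<a href="/zodiac" class="category active">Zodiac</a>'
    '<a href="/user/login" rel="nofollow">Read Later</a>'
    '</article>'
)
article_urls = extract_article_urls(sample_html)
print(article_urls)
```

Anchoring on `<h2>` skips the image, author, category and "Read Later" links, so each teaser should yield one URL. If the real pages use different markup around the headline, the pattern would need adjusting.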