I am trying to get article links from 750-plus web pages. I can get Make to read the URLs from a Google Sheet and return the page data from an HTTP request, but then I get stuck trying to extract the links from the returned response.
Initially I tried ScrapingBee, but that rather defeated the object, as I already get the data back from the HTTP request.
The data is held on a single line (always line 379), and the links to the articles I want are in sections like this:
Welcome to the Make community!
It looks like your images aren’t working. Could you please re-upload them directly to this forum?
If you need further assistance, please provide the following:
1. Screenshots of module fields and filters
Please share screenshots of the relevant module fields and filters in question. It would really help other community members to see what you’re looking at.
You can upload images here using the Upload icon in the text editor:
2. Scenario blueprint
Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.
(Note: Exporting your scenario will not include private information or keys to your connections)
Uploading it here will look like this:
blueprint.json (12.3 KB)
3. And most importantly, Input/Output bundles
Please provide the input and output bundles of the modules by running the scenario (or get from the scenario History tab), then click the white speech bubble on the top-right of each module and select “Download input/output bundles”.
A. Save the contents of each bundle in your text editor as a bundle.txt file, and upload it here into this discussion thread.
Uploading them here will look like this:
module-1-input-bundle.txt (12.3 KB)
module-1-output-bundle.txt (12.3 KB)
B. If you are unable to upload files on this forum, alternatively you can paste the formatted bundles in this manner:
- Either add three backticks before and after the code, like this:
```
input/output bundle content goes here
```
- Or use the format code button in the editor:
Providing the input/output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.
Following these steps will allow others to assist you here. Thanks!
Sure, thanks.
The scenario is here.
I’m feeding in 750 rows of URLs; at the moment it’s limited to 3 rows.
I pull the HTML, then try to extract the sublinks by using the parser to get the anchor links.
That results in 256 output bundles.
At the moment, all I can do is output those to a sheet as columns and identify which ones to keep, i.e. the links to the subpages.
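Instead of sending all 256 bundles to a sheet and sorting them by hand, the "which ones to keep" decision can be written as a rule on the link path. The sketch below is in Python rather than Make, purely to show the logic. It collects every anchor href with the standard library's `html.parser.HTMLParser` (standing in for the parser step) and keeps only paths shaped like /NUMERIC-ID/SLUG. That path shape is an assumption inferred from the single snippet posted in this thread, and the names `AnchorCollector`, `ARTICLE_PATH` and `article_links` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: article links look like /<numeric id>/<slug>, e.g.
# /2024372033/zodiac-signs-..., based only on the one posted snippet.
ARTICLE_PATH = re.compile(r"^/\d+/[\w-]+$")


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, like a 'get all links' parser step."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def article_links(html: str) -> list[str]:
    collector = AnchorCollector()
    collector.feed(html)
    # Keep only article-style paths; dict.fromkeys drops repeats
    # (image link + headline link) while preserving first-seen order.
    return list(dict.fromkeys(h for h in collector.hrefs if ARTICLE_PATH.match(h)))


sample = (
    '<article><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">'
    '<img src="x.png"></a><h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">'
    '3 Zodiac Signs</a></h2><a href="/users/ruby-miranda">Ruby Miranda</a>'
    '<a href="/zodiac" class="category active">Zodiac</a>'
    '<a href="/user/login" rel="nofollow">Read Later</a></article>'
)
print(article_links(sample))
# ['/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024']
```

Each article is linked twice in the snippet (once from the image, once from the headline), which is why the sketch de-duplicates while keeping order. The author, category and "Read Later" links fail the path rule and are dropped.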
I’m sure there is a regex that can pull just the links out of the string on that line. Attached is a .txt file with the HTML I want to pull the links from:
```html
<article id="node-372033" class="article node node-blog node-teaser"><div class="row teaser-mix-cat"><div class="image-bx col-lg-6 col-md-6 col-sm-6"><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024"><img class="img-responsive lazyload" typeof="foaf:Image" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAALCAABAAIBAREA/8QAFAABAAAAAAAAAAAAAAAAAAAACv/EABoQAAEFAQAAAAAAAAAAAAAAAAQAAgU1dLP/2gAIAQEAAD8AdJI2B2wns9f/2Q==" data-src="https://www.yourtango.com/sites/default/files/styles/header_slider_480/public/image_blog/zodiac-signs-love-lives-improve-week-march-11-17-2024.png?itok=L3y2Pxi_" width="376" height="188" alt="zodiac signs love lives improve week of march 11 - 17, 2024" title="zodiac signs relationships improve week of march 11 - 17, 2024" /><noscript><img class="img-responsive" typeof="foaf:Image" src="https://www.yourtango.com/sites/default/files/styles/header_slider_480/public/image_blog/zodiac-signs-love-lives-improve-week-march-11-17-2024.png?itok=L3y2Pxi_" width="376" height="188" alt="zodiac signs love lives improve week of march 11 - 17, 2024" title="zodiac signs relationships improve week of march 11 - 17, 2024" /></noscript></a></div><div class="texts col-lg-6 col-md-6 col-sm-6"><div class="teaser-mix-left"><h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">3 Zodiac Signs Whose Relationships Drasticallly Improve The Week Of March 11 - 17, 2024</a></h2><div class="teaser-mix-dek">Perhaps there really is something to being open-minded.</div></div><div class="row teaser-mix-right"><div class="col-lg-8 col-md-8 col-sm-8 col-xs-7"><div class="row"><div class="col-lg-6 col-md-6 col-sm-6 col-xs-12"><div class="author"><div class='author-name'><div><a href="/users/ruby-miranda" class="yte-stats url fn" data-expert-id="rubymiranda" data-type="profile"><span>Ruby Miranda</span></a></div>Author</div></div></div></div></div><div class="col-lg-4 col-md-4 col-sm-4 col-xs-5 text-right"><a href="/zodiac" class="category active">Zodiac</a><div class="read-more"><a href="/user/login" rel="nofollow">Read Later</a></div></div></div></div></div> </article>
```
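On the regex question itself: in the snippet above, each article's headline is an h2 wrapping a single link, so anchoring the pattern on that h2 prefix gives exactly one link per article and skips the image, author, category and "Read Later" links. A minimal Python sketch, assuming every article on the page uses the same headline markup (only one article was posted, so this is unverified); `HEADLINE_HREF`, `headline_links` and `line_379` are hypothetical names.

```python
import re

# Assumption: each article's headline link is written exactly as <h2><a href="...">,
# as in the posted snippet. The capture group holds the link path.
HEADLINE_HREF = re.compile(r'<h2><a href="([^"]+)"')


def headline_links(line: str) -> list[str]:
    """Return the href of every h2 headline link found in one long HTML line."""
    return HEADLINE_HREF.findall(line)


line_379 = (
    '<article id="node-372033"><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024"><img /></a>'
    '<h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">3 Zodiac Signs</a></h2>'
    '<a href="/users/ruby-miranda"><span>Ruby Miranda</span></a></article>'
)
print(headline_links(line_379))
```

In Make, the same pattern could in principle be used in the Text parser's Match pattern module with global matching enabled, with the link in the first capture group. That is untested against this site, and the pattern will silently miss any article whose headline markup differs (for example, extra attributes on the h2).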
So, in summary: each of the 750 master pages has approximately 20 links to subpages, and it’s those I want, roughly 15,000 links in total. On each of the 750 pages, that set of links is stored in one long data line (approximately 263,000 columns wide).
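At roughly 750 × 20 ≈ 15,000 links, the extracted paths also need to be made absolute and de-duplicated across pages before they go back into a sheet. A sketch of that final step in Python, assuming article links are site-relative paths of the form /NUMERIC-ID/SLUG (inferred from the one posted snippet). The master page URLs in the example are hypothetical placeholders on the www.yourtango.com domain that appears in the snippet's image URLs, and `collect_links` is a hypothetical helper.

```python
import re
from urllib.parse import urljoin

# Assumption (from the posted snippet only): article links are site-relative
# paths of the form /<numeric id>/<slug>; author/category links don't match.
ARTICLE_HREF = re.compile(r'href="(/\d+/[^"]+)"')


def collect_links(pages: dict[str, str]) -> list[str]:
    """Turn {master page URL: page HTML} into a de-duplicated list of absolute article URLs."""
    found: dict[str, None] = {}  # used as an ordered set: keeps first-seen order, drops repeats
    for page_url, html in pages.items():
        for path in ARTICLE_HREF.findall(html):
            # urljoin resolves "/2024372033/..." against the master page's scheme and host.
            found.setdefault(urljoin(page_url, path), None)
    return list(found)


# Hypothetical master page URLs, each with a trimmed-down HTML line.
pages = {
    "https://www.yourtango.com/hypothetical-page-1": (
        '<a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024"><img /></a>'
        '<h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">3 Zodiac Signs</a></h2>'
        '<a href="/users/ruby-miranda">Ruby Miranda</a>'
    ),
    "https://www.yourtango.com/hypothetical-page-2": (
        '<h2><a href="/2024372033/zodiac-signs-relationships-improve-week-march-11-17-2024">3 Zodiac Signs</a></h2>'
    ),
}
links = collect_links(pages)
print(len(links), links[0])
```

Because the dict acts as an ordered set, an article linked from several master pages (or twice on the same page) is kept once. Around 15,000 short strings fit comfortably in memory, so no batching is needed for the de-duplication itself.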