Web Scraping with Pagination - Repeating Operations

Hi

I’m trying to build a scenario that scrapes a website using pagination.

I’m testing it with just one page first and will use a repeater to loop through the other pages.
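For context, a Repeater module emits one bundle per repetition. Each bundle carries a counter `i` that starts at an initial value and grows by a step. The sketch below models that behavior in plain Python to show how page numbers could become page URLs. It illustrates the idea under assumptions and is not Make code. The URL template `https://example.com/products?page={page}` and the functions `repeater` and `page_urls` are hypothetical, because the thread does not show the real site's pagination scheme.

```python
# Hypothetical URL template; the real site's pagination scheme is not shown in the thread.
BASE_URL = "https://example.com/products?page={page}"


def repeater(initial_value: int, repeats: int, step: int = 1):
    """Yield one counter value per repetition, like one bundle per iteration."""
    for n in range(repeats):
        yield initial_value + n * step


def page_urls(initial_value: int = 1, repeats: int = 3) -> list:
    """Turn each counter value into a page URL (hypothetical helper)."""
    return [BASE_URL.format(page=i) for i in repeater(initial_value, repeats)]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```

Testing with `repeats=1` corresponds to scraping a single page first.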

The problem is that I’m getting endlessly repeating operations, and I don’t understand why or how to fix it.

Can someone explain why this is happening and how to fix it? Much appreciated…

blueprint.json (27.0 KB)
bundles.txt (147.2 KB)

You set the “Source Module” field of your Array Aggregator incorrectly. Point it to the Repeater module.

Hope this helps! Let me know if there are any further questions or issues.

@samliew

P.S.: Investing some effort into the Make Academy will save you lots of time and frustration using Make.

Hi @samliew

I changed my Source Module, but I still have the same problem. The operation counts on the last three modules just keep increasing.

Also, I noticed that only the URL variable is updating, while the rest of the variables stay the same.

Any other suggestions? I’m wasting so many Operations every time I test this… money out the window :expressionless:

You’ll need another aggregator for the Match Pattern module too (module [27]), since it outputs more than one bundle.

Aggregators

Iterator, list, search, and match modules output one bundle for every result (item/record) they return. Multiple bundles then trigger multiple operations in later modules (one operation per bundle). To combine multiple bundles into a single variable, you’ll need some kind of aggregator.

Aggregators are modules that accumulate multiple bundles into a single bundle. A commonly used example is the Array aggregator module. Another popular one is the Text aggregator, which is very flexible and suits many use cases, such as building JSON, CSV, or HTML.
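The sketch below models the "one operation per bundle" rule in plain Python. A module runs once for each bundle it receives, so three results from a search mean three runs of the next module, and an aggregator in between reduces that to one run. Everything here is a simplified, hypothetical model (`run_module`, `array_aggregator`, and `demo` are illustrative names, not Make APIs), and the aggregator's own operations are deliberately not counted.

```python
from typing import Callable

# A bundle is modeled as a plain dict of field names to values.
Bundle = dict


def run_module(name: str, bundles: list, fn: Callable[[Bundle], list], ops: dict) -> list:
    """Run a module once per incoming bundle, counting one operation per run."""
    out: list = []
    for bundle in bundles:
        ops[name] = ops.get(name, 0) + 1
        out.extend(fn(bundle))  # a module may emit zero, one, or many bundles
    return out


def array_aggregator(bundles: list) -> list:
    """Collapse many bundles into a single bundle holding them as an array."""
    return [{"array": list(bundles)}]


def demo(aggregate: bool) -> dict:
    ops: dict = {}
    # Hypothetical search module: one trigger bundle in, three result bundles out.
    results = run_module("search", [{}], lambda _: [{"id": 1}, {"id": 2}, {"id": 3}], ops)
    if aggregate:
        results = array_aggregator(results)
    # Hypothetical downstream module that passes each bundle through unchanged.
    run_module("downstream", results, lambda b: [b], ops)
    return ops


if __name__ == "__main__":
    print(demo(aggregate=False))  # {'search': 1, 'downstream': 3}
    print(demo(aggregate=True))   # {'search': 1, 'downstream': 1}
```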

@samliew

I think I understand what you are saying.

I fixed my Match Pattern Regex (module [27]) so that it outputs only $1 (not $1 & $2).
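One point worth checking: reducing the number of capture groups changes what each bundle contains, not how many bundles there are. With a global match, the bundle count follows the number of matches. The Python sketch below uses the standard `re` module as an analogy for the Match Pattern module, not as a description of its internals. The HTML snippet and both patterns are hypothetical.

```python
import re

# Hypothetical HTML standing in for one scraped page with three products.
HTML = """
<div class="product"><a href="/p/1">Alpha</a></div>
<div class="product"><a href="/p/2">Beta</a></div>
<div class="product"><a href="/p/3">Gamma</a></div>
"""

# Two capture groups: $1 is the link, $2 is the product name.
TWO_GROUPS = re.compile(r'<a href="([^"]+)">([^<]+)</a>')
# One capture group: only the link is captured; the name is still matched.
ONE_GROUP = re.compile(r'<a href="([^"]+)">[^<]+</a>')


def bundles(pattern: re.Pattern) -> list:
    """One tuple of captured groups per match, i.e. one 'bundle' per match."""
    return [m.groups() for m in pattern.finditer(HTML)]


if __name__ == "__main__":
    print(len(bundles(TWO_GROUPS)), bundles(TWO_GROUPS)[0])  # 3 ('/p/1', 'Alpha')
    print(len(bundles(ONE_GROUP)), bundles(ONE_GROUP)[0])    # 3 ('/p/1',)
```

Both patterns yield three bundles. Only the fields inside each bundle differ.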

I am scraping 100 products on each page of the website. So, each Match Pattern module should have 100 records/bundles for a single scenario execution.
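Because every later module runs once per bundle it receives, unaggregated bundle counts multiply rather than add. The rough estimate below makes that arithmetic explicit. It is a simplified model under stated assumptions. The helper `downstream_operations` is hypothetical. Each downstream module is assumed to run once per bundle and emit one bundle. The figures (100 matches per page, three modules afterwards) are borrowed from this thread only for illustration.

```python
from math import prod


def downstream_operations(pages: int, fan_outs: list, downstream_modules: int) -> int:
    """Estimate operations used by the modules after a chain of unaggregated fan-outs.

    pages: bundles entering the chain (for example, one per Repeater iteration).
    fan_outs: bundles each fan-out module emits per run, in pipeline order.
    downstream_modules: later modules, each assumed to run once per bundle
        and to emit exactly one bundle.
    """
    bundles = pages * prod(fan_outs)  # fan-outs multiply; prod([]) is 1
    return bundles * downstream_modules


if __name__ == "__main__":
    # One page, one match module emitting 100 bundles, three modules after it.
    print(downstream_operations(1, [100], 3))       # 300
    # Same, but a second unaggregated match module runs once per earlier bundle.
    print(downstream_operations(1, [100, 100], 3))  # 30000
    # An aggregator right after the match module collapses 100 bundles into 1.
    print(downstream_operations(1, [1], 3))         # 3
```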

Everything seems to be going fine until it reaches module [27]; then it goes crazy.

Blueprint:
blueprint.json (26.8 KB)

Anyone?

I’m blowing about 2000+ operations every time I test this thing. It’s driving me insane (and broke)!!!

If you don’t set Global Match to NO, the module can return multiple bundles, one for every match.

If that is your intention, then you need to aggregate all the matches.

If not, then set Global Match to NO, so that only one match is returned.
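The difference can be illustrated with a Python analogy using the standard `re` module. Here `finditer` stands in for Global Match set to YES (one bundle per match), and `search` stands in for Global Match set to NO (at most one bundle, the first match). The function `match_pattern` and its `$1`-style keys are hypothetical stand-ins for the module's outputs, not Make's actual implementation.

```python
import re


def match_pattern(text: str, pattern: str, global_match: bool) -> list:
    """Return one dict per output bundle, with groups keyed "$1", "$2", ...

    global_match=True mimics returning every match; False returns only the first.
    """
    regex = re.compile(pattern)
    if global_match:
        matches = list(regex.finditer(text))
    else:
        first = regex.search(text)
        matches = [first] if first else []
    return [
        {f"${i}": group for i, group in enumerate(m.groups(), start=1)}
        for m in matches
    ]


if __name__ == "__main__":
    page = "price=10 price=20 price=30"
    print(match_pattern(page, r"price=(\d+)", global_match=True))
    # [{'$1': '10'}, {'$1': '20'}, {'$1': '30'}]
    print(match_pattern(page, r"price=(\d+)", global_match=False))
    # [{'$1': '10'}]
```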

@samliew
