Overview of Different Web Scraping Techniques in Make 🌐

As this is a commonly asked question, I’ve created this post to explore the different methods for web scraping using Make. Each method offers varying levels of complexity and control.

Traditional Web Scraping + Text Parser

If you don’t want to rely on external services (which may not be free), you can always fetch the content of the page using the HTTP “Make a request” module, then use a Text Parser “Match Pattern” module to find and return the content you want from the page’s source code.

To do this effectively, you need to know how to set up regular expression patterns, which can get complex very quickly if you want to match multiple pieces of content across the page with a single Match Pattern module. Alternatively, you can use one Match Pattern module per piece of content you want to extract, but that uses more operations.
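
For example, here is a minimal sketch of the kind of named-capture-group pattern you might paste into a Text Parser “Match Pattern” module (shown in JavaScript purely to demonstrate the matching; the HTML snippet and class names are made-up assumptions, so inspect your own page’s source and adjust):

```javascript
// A minimal sketch: one pattern that grabs two pieces of content at once.
// The HTML and the class names ("product-title", "price") are hypothetical;
// inspect the real page source and adjust. In Make, named capture groups
// typically become the Match Pattern module's output variables.
const html =
  '<h1 class="product-title">Blue Widget</h1> ... <span class="price">$19.99</span>';

const pattern =
  /<h1 class="product-title">(?<title>[^<]*)<\/h1>[\s\S]*?<span class="price">(?<price>[^<]*)<\/span>/;

const match = html.match(pattern);
console.log(match.groups.title); // "Blue Widget"
console.log(match.groups.price); // "$19.99"
```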

Alternatives to consider:

  • XML “Perform XPath Query” —
    You can extract items using XPath, but you need one module per extraction.
  • Set Multiple Variables —
    You can use an inverted regular expression with the replace() function to strip out unwanted content, leaving only the “match” behind (see the sketch after this list).
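
For illustration, here is a rough JavaScript equivalent of that replace() trick; the HTML and the pattern are made-up assumptions, and in Make you would build the same idea with the built-in replace() function inside a Set Multiple Variables module:

```javascript
// A rough sketch of the "strip everything except the match" trick.
// The HTML and pattern below are hypothetical; adjust for the real page.
const html =
  "<html><head><title>Blue Widget | Example Shop</title></head><body>…</body></html>";

// Capture the <title> text and replace the entire string with just that capture,
// effectively removing all the unwanted content around it.
const pageTitle = html.replace(/^[\s\S]*?<title>([\s\S]*?)<\/title>[\s\S]*$/, "$1");

console.log(pageTitle); // "Blue Widget | Example Shop"
```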

Need help with complex web scraping requirements, building a pattern for your Text Parser, AI prompt engineering, or have some other Make-related question?
—> Let’s Talk

Hosted Web Scraping

If you don’t want to deal with web scraping yourself, you can use apps such as ScrapingBee and ScrapeNinja to get content from the page.

ScrapeNinja supports jQuery-like selectors in its extractor function, which is how you select elements on the page. This means no regular expressions are required, though you can still use regex inside the extractor function if you wish.

The main advantage of hosted web scraping services like ScrapeNinja is that they can handle and bypass anti-scraping measures, running the page in a real web browser, loading all the content, and executing the page-load scripts, so the result closely matches what you see in your browser rather than just the raw HTML fetched by the HTTP module. Dedicated scraping services like these make scraping much more reliable, because they specialize in one thing and do it well.
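
For a rough idea of what an extractor function looks like, here is a minimal ScrapeNinja-style sketch; the function signature (raw HTML in, a cheerio instance for jQuery-like selection) and the selectors are assumptions for illustration, so check the ScrapeNinja documentation and your target page for the real details:

```javascript
// A minimal sketch of a ScrapeNinja-style extractor function.
// The signature and the selectors are assumptions; adjust to your page.
function extract(input, cheerio) {
  const $ = cheerio.load(input); // parse the rendered page HTML

  return {
    title: $("h1.product-title").first().text().trim(), // hypothetical selector
    price: $("span.price").first().text().trim(),       // hypothetical selector
    images: $("img.gallery")                             // hypothetical selector
      .map((i, el) => $(el).attr("src"))
      .get(),
  };
}
```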

If you want an example of ScrapeNinja usage, take a look at Grab data from page and url - #5 by samliew

Need help with complex web scraping requirements, building a pattern for your Text Parser, AI prompt engineering, or have some other Make-related question?
—> Book a Consultation

Either of the Above + AI Structured Data Extraction

You can use either the traditional HTTP scraping or the hosted web scraping method to fetch the source code of the target page, then feed it through an AI that transforms it into structured data (outputting variables/collections, or JSON that you then run through a Parse JSON module).

This gives you the flexibility to extract content into complex data structures (collections), but it requires some prompt engineering and setting up of the data structure, whether via fields (OpenAI) or JSON in the prompt itself (Groq).
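
As a minimal sketch of the idea (shown with a plain fetch call rather than the Make modules, and with an illustrative model name and made-up fields): you hand the page source to a chat-completion endpoint, ask it to reply with JSON only, and then parse the result. In Make, the OpenAI or Groq module plus a Parse JSON module play these roles:

```javascript
// A minimal sketch: page HTML in, structured JSON out.
// The model name and the fields in the schema are assumptions for illustration.
const pageHtml = "<html>…page source from the HTTP or ScrapeNinja module…</html>";

const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini", // illustrative model name
    response_format: { type: "json_object" }, // ask the model to reply with JSON only
    messages: [
      {
        role: "system",
        content:
          'Extract {"title": string, "price": string, "inStock": boolean} ' +
          "from the HTML the user provides. Reply with JSON only.",
      },
      { role: "user", content: pageHtml },
    ],
  }),
});

const completion = await response.json();
const extracted = JSON.parse(completion.choices[0].message.content);
console.log(extracted.title, extracted.price, extracted.inStock);
```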

Need help with complex web scraping requirements, building a pattern for your Text Parser, AI prompt engineering, or have some other Make-related question?
—> Submit Enquiry

AI-powered Web Scraping

This is probably the easiest and quickest method to set up, because all you need to do is describe the content that you want, instead of inspecting elements to create selectors or coming up with regular expression patterns.

The plus side is that such services combine BOTH fetching and extraction of the data in a single module (saving operations), doing away with the lengthy setup required by the other methods.

For example, with the Dumpling AI “Extract data from URL” module you can set this up within a few seconds: just map the URL variable in the module and add the fields that you want extracted from the page (you don’t even need to specify the type of data).

Also, if you don’t want structured data and just want to pass the page content to another AI for further analysis, you can use the “Scrape URL” module, which also removes unnecessary elements like headers and footers, leaving just the main/article content. This is extremely useful for feeding clean page content to LLMs (e.g. OpenAI or Hugging Face models), whether for analysis or for building training data.

To learn more about Dumpling AI, see the official documentation at API Reference - DumplingAI Docs


For those comfortable with regular expressions, traditional web scraping with the “Make a request” and “Match Pattern” modules allows precise control over data extraction. However, this method can become complex when dealing with multiple content points. Hosted web scraping services like ScrapeNinja offer a more user-friendly approach, with jQuery-like selectors and the ability to handle anti-scraping measures. AI-powered web scraping with tools like Dumpling AI provides the easiest and quickest setup, requiring only a description of the desired content. This method offers great ease of use but potentially less control over the specific data points.

View my profile for more useful links and articles like these (you may need to be logged in to view forum profiles):

— @samliew —> connect with me

Here is more information about the Dumpling AI integration in Make.

AI Agents

AI agents can be grounded in your own data and knowledge base for RAG (Retrieval-Augmented Generation). You can set one up in the dashboard and then call the Dumpling AI “Generate AI Agent Completion” module:

Runs AI Agent completion and returns the result

For more information, see the official documentation at Build Custom AI Agents, Simply.

Run JavaScript (with plugins)

If you need to run JavaScript/TypeScript with JS libraries (NPM packages) in your scenario, you can consider Dumpling AI’s “JavaScript Code Execution API” available via the “Run Javascript Code” module —

Run your javascript or typescript code and get the result back.

The official documentation on how to use NPM modules with this module can be found here.
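
To give a feel for it, here is a hedged sketch of the kind of snippet you might run in such a code-execution module, assuming the sandbox lets you require() an NPM package and hand back a return value (the package, the data, and the return convention are all assumptions; see the linked documentation for the exact rules):

```javascript
// A hedged sketch only: assumes the sandbox exposes require() for NPM packages
// and that the module's output is whatever you return.
const dayjs = require("dayjs"); // assumed-available NPM package

// Example task: reformat a list of ISO dates and compute their age in days.
const invoiceDates = ["2024-01-31", "2024-02-29", "2024-03-31"];

const result = invoiceDates.map((d) => ({
  date: dayjs(d).format("DD MMM YYYY"),
  daysAgo: dayjs().diff(dayjs(d), "day"),
}));

// Hand the array back to the scenario as the module's output.
return result;
```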

Dumpling AI also does so much more; see also:

Examples of How to use Dumpling AI

For more information, see the Dumpling AI tutorials, grouped by category:

  • YouTube & Videos
  • Image Generation
  • AI Agents & RAGs
  • Searching & Scraping
  • Other Data Extraction
  • Business & Social

Dumpling AI Tutorials

In short, Dumpling AI can replace several other paid services that, combined, would cost more than Dumpling AI itself, making it a noteworthy choice as the “multi-tool” of AI services.

How to Use

For more information on how to set this up, refer to these forum threads:

View my profile for more useful links and articles like these!

— @samliew —> Connect with me
