As this is a commonly asked question, I've created this post to explore the different methods for web scraping using Make. Each method offers a different level of complexity and control.
Traditional Web Scraping + Text Parser
If you don't want to rely on external services, which may not be free, you can fetch the content of the page using the HTTP "Make a request" module, then use a Text Parser "Match Pattern" module to find and return the content you need from the page's source code.
To do this effectively, you need to know how to set up regular expression patterns, which can get complex very quickly if you want to match several pieces of content across the page with a single Match Pattern module. Alternatively, you can use one Match Pattern module per piece of content you want to extract, but this method uses more operations.
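To illustrate the trade-off, here is a minimal Python sketch. Python is used only for illustration: Make's Text Parser has its own regex flavor, so details such as the named-group syntax may differ. The HTML, class names, and field names are hypothetical. One combined pattern extracts every field in one pass but fails entirely if a field is missing or out of order; one pattern per field is simpler and more forgiving, at the cost of one module (and operation) per field in Make.

```python
import re

# Hypothetical page HTML, standing in for the body returned by the
# HTTP "Make a request" module.
HTML = """
<html><body>
  <h1 class="title">Blue Widget</h1>
  <span class="price">$19.99</span>
  <div class="sku">SKU: BW-001</div>
</body></html>
"""

# Option A: one pattern with named groups captures every field at once.
# [\s\S]*? lazily skips whatever lies between fields, so the fields must
# appear in this order, and one missing field makes the whole match fail.
COMBINED = re.compile(
    r'<h1 class="title">(?P<title>[^<]+)</h1>'
    r'[\s\S]*?<span class="price">(?P<price>[^<]+)</span>'
    r"[\s\S]*?SKU:\s*(?P<sku>[\w-]+)"
)

# Option B: one simple pattern per field (one Match Pattern module each in Make).
PER_FIELD = {
    "title": re.compile(r'<h1 class="title">([^<]+)</h1>'),
    "price": re.compile(r'<span class="price">([^<]+)</span>'),
    "sku": re.compile(r"SKU:\s*([\w-]+)"),
}


def extract_combined(html: str) -> dict:
    """All fields from a single match, or {} if any field is missing."""
    match = COMBINED.search(html)
    return match.groupdict() if match else {}


def extract_per_field(html: str) -> dict:
    """Each field matched separately; a missing field only becomes None."""
    result = {}
    for name, pattern in PER_FIELD.items():
        match = pattern.search(html)
        result[name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    print(extract_combined(HTML))
    print(extract_per_field(HTML))
```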
Alternatives to consider:
- XML "Perform XPath Query": You can extract items using XPath, but you have to use one module per extraction.
- Set Multiple Variables: It is possible to use negative regular expressions with the replace function to remove unwanted content, leaving the "match" behind.
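Both alternatives can be sketched in Python as well (again only as an illustration, not Make syntax). The XPath part uses the standard library's ElementTree, which supports only a subset of XPath and needs well-formed markup; real-world HTML often isn't. The replace part deletes everything before and after the wanted content with a single regex, which is the idea behind leaving the "match" behind. The sample page and helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page. ElementTree needs well-formed XML, which real-world
# HTML often is not, so this only works on tidy markup.
XHTML = """<html><body>
<h1>Blue Widget</h1>
<p class="price">$19.99</p>
<p class="note">Ships in 2 days</p>
</body></html>"""


def xpath_text(markup: str, path: str):
    """Text of the first node matching `path` (ElementTree's limited XPath)."""
    node = ET.fromstring(markup).find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def keep_only_match(text: str, before: str, after: str) -> str:
    """Replace-based extraction: delete everything up to and including
    `before`, and everything from `after` onwards, leaving only the match."""
    unwanted = (
        r"^[\s\S]*?" + re.escape(before)            # start of text .. `before`
        + r"|" + re.escape(after) + r"[\s\S]*$"     # `after` .. end of text
    )
    return re.sub(unwanted, "", text)


if __name__ == "__main__":
    print(xpath_text(XHTML, ".//p[@class='price']"))
    print(keep_only_match(XHTML, "<h1>", "</h1>"))
```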
Need help with complex web scraping requirements, building a pattern for your Text Parser, AI prompt engineering, or have some other Make-related question?
→ Let's Talk
Hosted Web Scraping
If you don't want to deal with scraping yourself, you can use apps such as ScrapingBee and ScrapeNinja to get content from the page.
ScrapeNinja supports jQuery-like selectors in its extractor function; selectors are how you pick out elements on a page. This way no regular expressions are involved, although you can still use regex in the extractor function if you wish.
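As a rough idea of what a selector such as span.name does (find every span element with the class name and read its text), here is a stdlib-only Python sketch. It is not ScrapeNinja's extractor, which has its own syntax; it only understands the simple tag and tag.class forms, and the markup is hypothetical.

```python
from html.parser import HTMLParser


class SelectorCollector(HTMLParser):
    """Collect the text of elements matching a simple `tag` or `tag.class`
    selector. Elements nested inside a match count as part of that match."""

    def __init__(self, selector: str):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.want_tag = tag
        self.want_class = cls or None
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track same-name tags nested inside the match so the right
            # closing tag ends it.
            if tag == self.want_tag:
                self.depth += 1
            return
        if tag != self.want_tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.want_class is None or self.want_class in classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.want_tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def select_text(html: str, selector: str) -> list:
    """Text of each element the selector matches, in document order."""
    parser = SelectorCollector(selector)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical listing markup.
PAGE = (
    '<ul><li class="item"><span class="name">Alpha</span> '
    '<span class="price">$5</span></li>'
    '<li class="item"><span class="name">Beta</span> '
    '<span class="price">$7</span></li></ul>'
)

if __name__ == "__main__":
    print(select_text(PAGE, "span.name"))
```

Compared with a regex, the selector describes which element you want rather than the exact characters around it, so small markup changes (extra attributes, whitespace) are less likely to break it.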
The main advantages of hosted web scraping services like ScrapeNinja are that they can handle and bypass anti-scraping measures, and run the page in a real web browser, loading all the content and running the page's scripts so the result closely matches what you see, as opposed to just the raw page HTML fetched by the HTTP module. Dedicated scraping services like these make scraping much more reliable, because they specialize in one thing and do it well.
If you want an example of ScrapeNinja usage, take a look at Grab data from page and url - #5 by samliew
Alternatives to consider:
- ScrapeNinja "Scrape (Real browser)"
- ScrapingBee "Extract Data"
- 0CodeKit's "Scrape HTML From Website"
- Scraptio "Scrape Website Texts"
- Other web scraping APIs on RapidAPI: search for "scrape" at https://rapidapi.com/search plus whichever service you want to scrape (e.g. LinkedIn)
- Other "Data Extraction" integrations on Make: https://www.make.com/en/integrations/category/data-extraction-collection
Either of the Above + AI Structured Data Extraction
You can combine either the traditional HTTP scraping method or the hosted web scraping method to fetch the source code of the target page, then feed it to an AI that transforms it into structured data (outputting variables/collections directly, or JSON that you then have to put through a Parse JSON module).
This gives you the flexibility to extract content into complex data structures (collections), but it requires some prompt engineering and setting up of the data structure, whether via fields (OpenAI) or as JSON in the prompt itself (Groq).
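A minimal Python sketch of the "JSON in the prompt" approach and the parsing step that follows it, assuming a hypothetical product structure. It does not call any AI API; it only builds the prompt text, then parses and checks a reply, roughly what the Parse JSON module does in a scenario. The fence stripping is there because models sometimes wrap JSON in Markdown code fences, which would otherwise make parsing fail.

```python
import json
import re

# Hypothetical target structure. In the "JSON in the prompt" approach, the
# shape you want back is described inside the prompt text itself.
SCHEMA_EXAMPLE = {"title": "string", "price": "number", "tags": ["string"]}


def build_prompt(page_text: str) -> str:
    """Prompt text asking the model to reply with JSON only, in a fixed shape."""
    return (
        "Extract the product details from the page below. "
        "Reply with JSON only, using exactly this structure:\n"
        + json.dumps(SCHEMA_EXAMPLE)
        + "\n\nPage:\n"
        + page_text
    )


def parse_model_reply(reply: str) -> dict:
    """Turn the model's reply into data, much like a Parse JSON step would.

    Strips Markdown code fences a model may wrap around the JSON, then
    checks that every expected top-level key is present.
    """
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", reply.strip())
    data = json.loads(cleaned)
    missing = [key for key in SCHEMA_EXAMPLE if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


if __name__ == "__main__":
    print(build_prompt("<h1>Blue Widget</h1> $19.99"))
    print(parse_model_reply('{"title": "Blue Widget", "price": 19.99, "tags": []}'))
```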
References:
- Using Chat GPT to extract data from email
- Help Needed: Structuring Website Form Data into a JSON Array - #2 by drnic
- OpenAI respond can't respond with json structure - #2 by samliew
- News Automation (RSS -> Scraptio -> OpenAI --> Google Sheet): almost there, please help! - #3 by samliew
- How to cleanup HTTP get request (html object)
AI-powered Web Scraping
This is probably the easiest and quickest method to set up, because all you need to do is describe the content you want, instead of inspecting elements to create selectors or coming up with regular expression patterns.
The upside is that such services combine BOTH fetching and extracting the data in a single module (saving operations) and do away with the lengthy setup of the other methods.
Here is a simple example using the Dumpling AI "Extract data from URL" module:
As you can see, you can set this up within a few seconds using Dumpling AI. Just map the URL variable in the module and add the fields you want extracted from the page (you don't even need to specify the data type)!
Also, if you don't want structured data and just want to pass the page content to another AI for further analysis, you can use the "Scrape URL" module, which also removes unnecessary elements like headers and footers, leaving just the main/article content! This is extremely useful for training LLMs (e.g. OpenAI, HuggingFace).
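The following is not Dumpling AI's implementation, only a simplified Python sketch of the general idea behind keeping "just the main content": drop text inside elements such as header, footer, nav, and script, and keep the rest. Which elements count as page chrome is an assumption here; real services use more sophisticated rules.

```python
from html.parser import HTMLParser

# Elements treated as page "chrome" rather than content. This list is an
# assumption for illustration only.
SKIP_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text, ignoring anything inside SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def main_text(html: str) -> str:
    """Main content as plain text, one chunk per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page.
SAMPLE = """<html><body>
<header><nav>Home | About</nav></header>
<article><h1>Title</h1><p>First paragraph.</p></article>
<footer>Copyright</footer>
<script>var x = 1;</script>
</body></html>"""

if __name__ == "__main__":
    print(main_text(SAMPLE))
```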
To learn more about Dumpling AI, see the official documentation at Introduction - Dumpling AI Docs
For those comfortable with regular expressions, traditional web scraping with the "Make a request" and "Match Pattern" modules allows fine-grained control over data extraction, but it can become complex when extracting many pieces of content. Hosted web scraping services like ScrapeNinja offer a more user-friendly approach, with jQuery-like selectors and the ability to handle anti-scraping measures. AI-powered web scraping with tools like Dumpling AI provides the easiest and quickest setup, requiring only a description of the content you want extracted, though potentially with less control over the specific data points.
Please leave a comment below if you do web scraping in other ways.
View my profile for more useful links and articles like these (you need to be logged-in to view forum profiles):
@samliew → connect with me
P.S.: Did you know that the concepts behind about 70% of the questions asked on this forum are already covered in the Make Academy? Investing some effort into it will save you lots of time and frustration when using Make later!