How can Make scrape a website, get the product url, image url and add an affiliate code?

What is your goal?

Affiliate posting on social

What is the problem & what have you tried?

Getting knowledge

Hi

The short answer is: you can use Apify actors (via the Apify modules) and then manipulate the data as needed for your final outcome.
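To illustrate the "manipulate the data" part for the affiliate code, here is a minimal Python sketch of the idea (in Make you would do the equivalent with built-in functions or a code module). The parameter name `tag`, the example URL, and the code `myaffid-20` are all assumptions; check what your affiliate programme actually expects.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_affiliate_code(product_url: str, code: str, param: str = "tag") -> str:
    """Return product_url with the affiliate parameter set.

    `param` is a hypothetical parameter name; affiliate programmes differ.
    """
    parts = urlsplit(product_url)
    # Keep the existing query parameters, dropping any previous affiliate value
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, code))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_affiliate_code("https://shop.example.com/item/123?color=red", "myaffid-20"))
# https://shop.example.com/item/123?color=red&tag=myaffid-20
```

Parsing the URL instead of just appending `?tag=...` avoids broken links when the product URL already has query parameters.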

Cheers!

Welcome to the Make community!

So you basically need to “visit” the site to get the content. This is called Web Scraping. It can seem fairly simple, but it gets complex very quickly if you encounter the issues described below.

Incomplete Scraping; No Errors?

1. Anti-Scraping; Anti-Bot Measures

Are you getting no output from the HTTP “Make a request” module? This is likely because the website employs anti-scraping measures: it has detected that the visit was not made by a human, and has silently blocked the request by returning no content. Hence, you cannot use normal scraping integrations like the HTTP “Make a request” module to fetch pages from websites like these. This is NOT a Make platform, HTTP, Text Parser, or Regular Expression issue/bug.

Example: Scraping Bee Integration Runtime Error 400

2. Script Tags Do Not Run

Are you getting NO output from the Text Parser “HTML to Text” module? This is because there is NO text content in the HTML! The entire page content you are scraping is likely held in a script tag, and is dynamically generated and placed onto the page by JavaScript when it runs in the user’s web browser (e.g. when the page loads, or when an action is taken, like scrolling).

Make is a server-side runtime environment, so the HTTP modules only fetch the initial page code. The Text Parser “HTML to Text” module ignores all script tags because they are not HTML layout elements. Furthermore, the HTTP “Make a request” module does not run any of those scripts, so no content is loaded onto the page. You’ll probably get a default message that tells you to enable JavaScript.
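To see why this happens, here is a rough Python sketch of what an HTML-to-text step does. It is not Make's actual implementation, and the sample page and the `window.__DATA__` variable are made up for illustration. The product data sits inside the script tag, so the text conversion only returns the "enable JavaScript" fallback.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that is outside script/style tags
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Hypothetical page: the real content only exists as data inside a script tag
PAGE = """<html><body>
<noscript>Please enable JavaScript.</noscript>
<div id="app"></div>
<script>window.__DATA__ = {"title": "Blue Mug", "image": "/img/mug.jpg"};</script>
</body></html>"""

print(html_to_text(PAGE))
# Please enable JavaScript.
```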

3. Incorrect Regular Expression Pattern

Are you getting the same output as the input when using the Text Parser “Match Pattern” module? Your regular expression pattern may simply be incorrect. One reason is that every page is different, so a pattern usually only works for the specific page it was written for. You also need to ensure that your pattern is built correctly to handle the raw output from the website. One way of building and testing a regular expression pattern is with regex101.com, a popular tool that I use.
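As a small illustration of how page-specific a pattern is, here is a Python sketch that pulls a product URL and image URL out of some raw HTML. The HTML snippet and the pattern are made up for this example, so your own page will almost certainly need a different pattern.

```python
import re

# Hypothetical raw HTML for one product card; real pages will differ
RAW = '<a class="product" href="/item/123"><img src="https://cdn.example.com/123.jpg" alt="Blue Mug"></a>'

# Named groups make it obvious what each captured value is
PRODUCT = re.compile(
    r'<a[^>]*\bhref="(?P<url>[^"]+)"[^>]*>\s*<img[^>]*\bsrc="(?P<image>[^"]+)"'
)

def extract(html):
    """Return the product URL and image URL, or None if the pattern does not match."""
    match = PRODUCT.search(html)
    return match.groupdict() if match else None

print(extract(RAW))
# {'url': '/item/123', 'image': 'https://cdn.example.com/123.jpg'}

# The same page using single quotes instead of double quotes no longer matches
print(extract(RAW.replace('"', "'")))
# None
```

A tiny change in the markup (here, just the quote style) is enough to break the match, which is why you should always test against the raw output you actually receive.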

Running Page Scripts; Emulating User Input

For web scraping, a service you can use is ScrapeNinja to get content from the page.

ScrapeNinja allows you to use jQuery-like selectors to extract content from elements by using an extractor function. This is way easier than coming up with a valid and robust[1] regular expression pattern!

ScrapeNinja can also run the page in a real web browser, loading all the content and running the page load scripts, so it closely simulates what you see, as opposed to just the raw page HTML. It can even perform user actions like clicking on elements on the page!

Example: Grab data from page and url

Some tools that ScrapeNinja has provided for free

Use this to test the scraping parameters on web pages:

Use these to build and test the “extractor function”:

If you need help with the above tools, please start a new topic.

AI-powered Web Scraping

You can also use AI-powered web scraping tools like Dumpling AI.

This is probably the easiest and quickest method to set up, because all you need to do is describe the content that you want via a prompt.

The plus side is that such services combine BOTH fetching and extracting the data in a single module (saving operations), and do away with the lengthy setup and maintenance required by the other methods described in the previous sections.

More information; Other methods

For more information on the different methods of web scraping, see my full community blog post here: Overview of Different Web Scraping Techniques in Make 🌐

Hope this helps! If you are still having trouble, please provide more details.

— @samliew
P.S.: investing some effort into the tutorials in the Make Academy will save you lots of time and frustration using Make!


  1. A robust regular expression is one that is reliable, efficient, and handles various potential inputs and edge cases, and is able to fail gracefully.
