Welcome to the Make community!
ChatGPT doesn't work on Make
"ChatGPT" (including Plus) and the "OpenAI API" are two separate products. The consumer chat "ChatGPT" at chatgpt.com is NOT compatible with Make.
Make uses the "OpenAI API" via the OpenAI developer platform.
So you basically need to "visit" the site to get the content. This is called Web Scraping. It can seem fairly simple, but it gets complex very quickly if you encounter the issues described below.
Incomplete Scraping; No Errors?
1. Anti-Scraping; Anti-Bot Measures
Are you getting no output from the HTTP "Make a request" module? This is likely because the website employs anti-scraping measures: it has detected that the visit was not made by a human, and has silently blocked the request by returning no content. Hence, you cannot use normal scraping integrations like the HTTP "Make a request" module to fetch pages from websites like these. This is NOT a Make platform, HTTP, Text Parser, or Regular Expression issue/bug.
Example: Scraping Bee Integration Runtime Error 400
2. Script Tags Do Not Run
Are you getting NO output from the Text Parser "HTML to Text" module? This is because there is NO text content in the HTML! The entire page content you are scraping is likely held in a script tag, and is dynamically generated and placed onto the page by JavaScript running in the user's web browser (e.g. when the page loads, or when an action like scrolling is taken).
Make is a server-side runtime environment, so the HTTP modules only fetch the initial page code. The HTTP "Make a request" module does not run any of the page's scripts, so no content is loaded onto the page, and the Text Parser "HTML to Text" module ignores all script tags because they are not HTML layout elements. You'll probably get a default message that tells you to enable JavaScript.
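To see why this happens, here is a minimal Python sketch of an HTML-to-text conversion that skips script tags. It is only an illustration of the behaviour described above, not how Make's Text Parser is actually implemented. The sample page, the `VisibleTextExtractor` class and the `html_to_text` function are all made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the text a non-browser client would see."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up raw response from a JavaScript-rendered page: the real data
# only exists inside the script tag, which is never executed here.
RAW_HTML = """
<html>
  <body>
    <div id="app"></div>
    <noscript>Please enable JavaScript to view this page.</noscript>
    <script>window.__DATA__ = {"title": "Blue Widget", "price": "19.99"};</script>
  </body>
</html>
"""

print(html_to_text(RAW_HTML))
```

The product data is sitting in the raw HTML, but only the "enable JavaScript" fallback message comes out as text, because nothing ever ran the script that would have put the data onto the page.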
3. Incorrect Regular Expression Pattern
Are you getting the same output as the input when using the Text Parser "Match Pattern" module? Your regular expression pattern may simply be incorrect. One reason is that every page is structured differently, so a pattern written for one page usually only works for that specific page. You also need to ensure that your pattern is built correctly to handle the raw output from the website. One way of building and testing a regular expression pattern is with regex101.com, a popular tool that I use.
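As a rough illustration of how page-specific a pattern is, here is a small Python sketch. The HTML snippets, the pattern and the `extract_price` function are all invented for this example. Regex flavours also differ slightly between tools, so treat it as a sketch and test your own pattern against your own raw page output.

```python
import re

# Made-up raw HTML, as it might come back from an HTTP request.
RAW_HTML = (
    '<div class="product">\n'
    '  <h2 class="title">Blue Widget</h2>\n'
    '  <span class="price">$19.99</span>\n'
    '</div>'
)

# The pattern is tied to this exact markup: a span with class "price".
# The capture group grabs only the number, without the currency symbol.
PRICE_PATTERN = re.compile(r'<span class="price">\s*\$?([\d.,]+)\s*</span>')


def extract_price(html):
    """Hypothetical helper: return the captured price, or None if no match."""
    match = PRICE_PATTERN.search(html)
    return match.group(1) if match else None


print(extract_price(RAW_HTML))

# A different page with different markup does not match at all,
# even though it shows the same price to a human visitor.
OTHER_PAGE = '<p class="cost">19.99 USD</p>'
print(extract_price(OTHER_PAGE))
```

The first call finds the price, while the second returns nothing because the markup is different. This is why a pattern copied from someone else's scenario often fails on your page.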
4. Authentication Required
See below.
Running Page Scripts; Emulating User Input
For web scraping, you can use a service such as ScrapeNinja to get content from the page.
ScrapeNinja allows you to use jQuery-like selectors to extract content from elements by using an extractor function. This is way easier than coming up with a valid and robust regular expression pattern!
ScrapeNinja can also run the page in a real web browser, loading all the content and running the page load scripts, so the result closely matches what you see rather than just the raw page HTML. It can even perform user actions like clicking on elements on the page!
Example: Grab data from page and url
Some tools that ScrapeNinja has provided for free
Use this to test the scraping parameters on web pages:
Use these to build and test the "extractor function":
If you need help with the above tools, please start a new topic.
AI-powered Web Scraping
You can also use AI-powered web scraping tools like Dumpling AI.
This is probably the easiest and quickest method to set up, because all you need to do is describe the content that you want via a prompt.
The upside is that such services combine BOTH fetching and extracting the data in a single module (saving operations), and do away with the lengthy setup and maintenance required by the methods described in the previous sections.
More information; Other methods
For more information on the different methods of web scraping, see my full community blog post here: Overview of Different Web Scraping Techniques in Make
- @samliew