How to fetch the content of a specific news article from a website and add it to a Google Sheet

Hi, I have a scenario like this. I have tried many approaches but I can't build it. Can somebody help me? The scenario will be as follows:

  • Retrieve new documents from the “Thư viện pháp luật” (https://thuvienphapluat.vn/) website
  • Filter documents related to science and technology, innovation, and digital transformation
  • Summarize the content (Title, time, content) of a specific post
  • Record the information in a Google Sheet
I guess the hardest part is how to open a specific news item automatically and get the data from it. Anyway, please help me. Thank you all very much.

Welcome to the Make community!

So you basically need to “visit” the site yourself to get the content. This is called Web Scraping.

Incomplete Scraping

Are you getting NO output from the Text Parser “HTML to Text” module? This is because there is NO text content in the HTML! The entire page content you are scraping is stored in a script tag, and it is only placed onto the page by JavaScript when the page is loaded and run in the user’s web browser (client-side). Make is a server-side runtime environment, so the HTTP modules return just the script tags. The Text Parser “HTML to Text” module ignores those script tags because a script tag is NOT an HTML layout element.

The Make HTTP “Make a request” module does NOT run any of those JavaScript scripts, so there is no content on the page other than a default message that tells you to enable JavaScript.
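To illustrate why the “HTML to Text” step comes back nearly empty, here is a minimal Python sketch that runs outside Make and uses only the standard library. The sample HTML, the `page-data` id, and the JSON field names are all made up for this example and are not taken from the actual site. It only shows the general mechanism: a plain HTML-to-text pass sees the "enable JavaScript" notice, while the real data sits inside a script tag.

```python
import json
from html.parser import HTMLParser

# Hypothetical raw response: the article data lives in a JSON <script> tag,
# and the only visible text is an "enable JavaScript" notice.
SAMPLE_HTML = """
<html>
  <body>
    <noscript>Please enable JavaScript to view this page.</noscript>
    <div id="app"></div>
    <script id="page-data" type="application/json">
      {"title": "Sample document", "published": "2024-01-01", "content": "Body text"}
    </script>
  </body>
</html>
"""


class TextAndScriptSplitter(HTMLParser):
    """Collects visible text and <script> contents separately."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.visible_text = []   # what an HTML-to-text step would keep
        self.script_text = []    # what an HTML-to-text step throws away

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace-only chunks between tags
        if self._in_script:
            self.script_text.append(data.strip())
        else:
            self.visible_text.append(data.strip())


def split_page(html):
    """Return (visible text, raw script text) for an HTML string."""
    parser = TextAndScriptSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible_text), "".join(parser.script_text)


visible, script_body = split_page(SAMPLE_HTML)
data = json.loads(script_body)  # works only because the sample script is pure JSON

print("Visible text:", visible)
print("Data hidden in the script tag:", data)
```

In this sample, only the notice ends up in the visible text, which matches the symptom described above. On a real page the script contents are usually not clean JSON like this, which is why a real browser is needed to run the scripts and build the page.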

This is NOT a Make platform, or Text Parser, or Regular Expression issue/bug.

You CANNOT use normal scraping integrations like ScrapingBee or HTTP “Make a request” module to fetch this page’s structure.

You will need to use ScrapeNinja’s “Scrape (Real browser)” module to emulate a real person visiting the site using a web browser, as client-side JavaScript needs to run to parse the JSON data in the script tags, and generate the page structure and content.

For more information and demo using ScrapeNinja, see Scraping Bee Integration Runtime Error 400

Web Scraping

For web scraping, one service you can use to get content from the page is ScrapeNinja.

ScrapeNinja allows you to use jQuery-like selectors to extract content from elements by using an extractor function. ScrapeNinja can also run the page in a real web browser, loading all the content and running the page load scripts so it closely simulates what you see, as opposed to just the raw page HTML fetched from the HTTP module.
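To give a rough idea of what selector-based extraction does once the page has been rendered, here is a small Python sketch using only the standard library. This is NOT ScrapeNinja's extractor API; it only mirrors the idea of picking elements out of the rendered HTML. The HTML, the class names (`doc-title`, `doc-date`, `doc-body`), and the helper names are hypothetical. It also applies a plain keyword filter for the topics in the question and builds one row in the Title, time, content order.

```python
from html.parser import HTMLParser

# Hypothetical rendered HTML; the class names are invented for this example.
RENDERED_HTML = """
<article>
  <h1 class="doc-title">Decree on digital transformation</h1>
  <span class="doc-date">2024-05-01</span>
  <div class="doc-body">Rules for <b>digital</b> public services.</div>
</article>
"""

# Hypothetical mapping: output field -> class attribute to look for.
FIELDS = {"title": "doc-title", "time": "doc-date", "content": "doc-body"}

# Topics from the question, used as a simple substring filter.
KEYWORDS = ("science and technology", "innovation", "digital transformation")

# Tags without a closing tag; ignored so the nesting count stays correct.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class FieldExtractor(HTMLParser):
    """Collects the text inside the first element carrying each target class."""

    def __init__(self, fields):
        super().__init__()
        self._class_to_field = {cls: name for name, cls in fields.items()}
        self._current = None  # field currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element
        self.result = {name: "" for name in fields}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1  # nested tag inside the captured element
            return
        for cls in (dict(attrs).get("class") or "").split():
            field = self._class_to_field.get(cls)
            if field and not self.result[field]:
                self._current, self._depth = field, 1
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None  # left the captured element

    def handle_data(self, data):
        if self._current is not None:
            self.result[self._current] += data


def extract_fields(html, fields=FIELDS):
    """Return a dict of field name -> whitespace-normalized text."""
    parser = FieldExtractor(fields)
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split()) for name, text in parser.result.items()}


def is_relevant(doc, keywords=KEYWORDS):
    """True if any keyword appears in the title or content (case-insensitive)."""
    text = f"{doc['title']} {doc['content']}".lower()
    return any(keyword in text for keyword in keywords)


doc = extract_fields(RENDERED_HTML)
# One row per relevant document: Title, time, content.
row = [doc["title"], doc["time"], doc["content"]] if is_relevant(doc) else None
print(row)
```

In a Make scenario, the equivalent of `row` would be the values you map into the Google Sheets module, and the keyword check would be a filter between modules. For Vietnamese pages, the keyword list would need the Vietnamese terms as well.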

If you want an example, take a look at Grab data from page and url - #5 by samliew

AI-powered “easier” method

You can also use AI-powered web scraping tools like Dumpling AI.

This is probably the easiest and quickest way to set up, because all you need to do is describe the content that you want, instead of inspecting the element to create selectors or having to come up with regular expression patterns.

The plus side of this is that such services combine BOTH fetching and extracting of the data in a single module (saving operations), and do away with the lengthy setup from the other methods.

More information, other methods

For more information on the different methods of web scraping, see Overview of Different Web Scraping Techniques in Make 🌐

If you need further assistance, please provide the following:

1. Relevant Screenshots

Could you please share screenshots of your full scenario? Also include screenshots of any error messages, module settings (fields), relevant filter settings (conditions), and module output bundles. We need to see what you’re working with to give you the best advice.

You can upload images here using the Upload icon in the text editor.

We would appreciate it if you could upload screenshots here instead of linking to them outside of the forum. This allows us to zoom in on the image when clicked, and prevent tracking cookies from third-party websites.

2. Scenario Blueprint

Please export the scenario blueprint. Providing your scenario blueprint file will allow others to quickly recreate your scenario and see how you have set up the mappings in each module. It also allows us to take screenshots or provide module exports of any solutions we have for you in return. This would greatly benefit you in implementing our suggestions, as you can simply paste module exports back into your scenario editor!

To export your scenario blueprint, click the three dots at the bottom of the editor then choose ‘Export Blueprint’.

You can upload the file here using the Upload icon in the text editor.

3. Output Bundles of Modules

Please provide the output bundles of each of the relevant modules by running the scenario (you can also get this without re-running your scenario from the History tab).

Click on the white speech bubbles on the top-right of each module and select “Download input/output bundles”.

A. Upload as a Text File

Save each bundle’s contents in a plain text editor (without formatting) as a bundle.txt file.

You can upload the file here using the Upload icon in the text editor.

B. Insert as Formatted Code Block

If you are unable to upload files on this forum, you can paste the formatted bundles instead.
There are two ways to format text so that it won’t be modified by the forum:

  • Method 1: Type code block manually

    Add three backticks ``` before and after the content/bundle, like this:

    ```
    content goes here
    ```

  • Method 2: Highlight and click the format button in the editor

Providing the input/output bundles will allow others to replicate what is going on in the scenario, especially if there are complex data structures (nested arrays and collections) or if external services are involved, and help you with mapping the raw property names from collections.

Sharing these details will make it easier for others to assist you.

Hope this helps! Let me know if there are any further questions or issues.

@samliew

P.S.: Investing some effort into the Make Academy will save you lots of time and frustration using Make.