How to read a particular HTML tag/DOM Object from HTML content

Panda_Rentals · June 13, 2024, 5:40am

Goal:
Read an HTML string and find a particular node using some kind of query, as I would in JavaScript with document.querySelector, with XPath, or in any other language. I need to get the exact node.

Tried:
ChatGPT: Too non-deterministic. I get different results with the same prompt and input.
Convert to text, then use regular expressions: I get matches from the wrong places and cannot add more logic to refine them.

Ideal solution: Convert HTML to XML, then use XPath. Or execute JavaScript code.

Unacceptable solutions: Invoking a paid service to convert HTML to XML. I expect built-in modules to handle this.

samliew · June 13, 2024, 8:24am

You can either use the Text Parser “Match Pattern” module, or the XML “Perform XPath Query” module. If you can’t do it then it’s probably a misconfiguration, or too advanced for you.

Alternatively, for web scraping, some apps you can use are ScrapingBee and ScrapeNinja to get content from the page.

I’ve used ScrapeNinja, and you can use jQuery-like selectors there in the extractor function.

ScrapeNinja also can run the page in a web-browser so it closely emulates what users see, as opposed to just the raw page HTML fetched from the HTTP module.

If you want an example, take a look at Grab data from page and url - #5 by samliew

samliew – request private consultation

Join the Make Fans Discord server to chat with other makers!

Panda_Rentals · June 13, 2024, 12:05pm

I’ve tried the Match Pattern module and I do get results, but they are unreliable.
For instance, I need to match a date, but sometimes the HTML contains several dates in different places, and it’s impossible to tell which match is the one I need (a regex alone is not enough to capture the context).
If I could convert the HTML to XML, I would definitely use XPath. But I can’t find a module to convert HTML to XML (I would use XSLT if I had to do it myself, but there is no XSLT module either).
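For illustration only, the "convert HTML to a tree, then query it by path" idea described above can be sketched in Python with nothing but the standard library. This is not a Make module and says nothing about what Make offers; the sample email markup, the labels, and the helper names (`HtmlToTree`, `html_to_tree`, `find_text`) are all hypothetical. Note that `xml.etree.ElementTree` supports only a limited subset of XPath, which happens to be enough to pick a date by the label next to it.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToTree(HTMLParser):
    """Tolerant HTML reader that builds a well-formed ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []              # stack of currently open elements
        self.builder.start("root", {})   # synthetic wrapper element

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)        # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return                       # stray closing tag: ignore it
        # Close any unclosed children first, then the tag itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()                     # flush any buffered text
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("root")
        return self.builder.close()


def html_to_tree(html):
    parser = HtmlToTree()
    parser.feed(html)
    return parser.finish()


def find_text(html, path):
    """Return the text of the first node matching path, or None."""
    node = html_to_tree(html).find(path)
    return node.text if node is not None else None


# Hypothetical email body containing several dates in different places.
SAMPLE = """
<html><body>
  <p>Sent on 2024-06-01<br>
  <table>
    <tr><td class="label">Pickup date</td><td class="value">2024-06-13</td></tr>
    <tr><td class="label">Return date</td><td class="value">2024-06-20</td></tr>
  </table>
</body></html>
"""

# Select the value cell in the row whose label cell reads "Pickup date".
print(find_text(SAMPLE, ".//tr[td='Pickup date']/td[@class='value']"))
```

The path expression carries the context that a bare date regex lacks: it names the row by its label and only then takes the value cell, so the other dates in the message are never candidates.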

samliew · June 13, 2024, 12:17pm

If you need further assistance, please provide the following:

1. Screenshots of module fields and filters

Please share screenshots of the relevant module fields and filters in question. It would really help other community members to see what you’re looking at.

You can upload images here using the Upload icon in the text editor.

2. Scenario blueprint

Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.

(Note: Exporting your scenario will not include private information or keys to your connections)

Uploading it here will look like this:

blueprint.json (12.3 KB)

3. And most importantly, Input/Output bundles

Please provide the input and output bundles of the trigger/iterator/aggregator modules by running the scenario (or get from the scenario History tab), then click the white speech bubble on the top-right of each module and select “Download input/output bundles”.

A.

Save each bundle contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

Uploading them here will look like this:

module-1-output-bundle.txt (12.3 KB)

B.

If you are unable to upload files on this forum, alternatively you can paste the formatted bundles in this manner:

Either add three backticks ``` before and after the code, like this:

```
input/output bundle content goes here
```
Or use the format code button in the editor.

Providing the input/output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!


Panda_Rentals · June 13, 2024, 12:28pm

I’ve signed up for ScrapeNinja. Do you know if I can provide the raw HTML as input instead of a URL? My content comes from an email, not a public website.

samliew · June 13, 2024, 12:48pm

Hmm, no then. Those are web scrapers only. You didn’t mention the HTML was from an email earlier.

A regex should be reliable enough, provided you have the skill to build one.
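As a rough sketch of what a more reliable regex could look like, the pattern can be anchored to the markup around the wanted date instead of matching any date in the message. The example is written in Python purely for illustration; the email markup, the label text, the `YYYY-MM-DD` date format, and the helper name `date_after_label` are all assumptions, since the actual email HTML is not shown in this thread.

```python
import re

# Hypothetical email body containing several dates in different places.
SAMPLE = """
<p>Sent on 2024-06-01</p>
<table>
  <tr><td class="label">Pickup date</td><td class="value">2024-06-13</td></tr>
  <tr><td class="label">Return date</td><td class="value">2024-06-20</td></tr>
</table>
"""


def date_after_label(html, label):
    """Return the date in the cell that directly follows the given label cell."""
    pattern = re.compile(
        re.escape(label)                # label text that identifies the right cell
        + r"\s*</td>\s*<td[^>]*>\s*"    # markup between the label and its value
        + r"(\d{4}-\d{2}-\d{2})",       # capture only the date itself
        re.IGNORECASE,
    )
    match = pattern.search(html)
    return match.group(1) if match else None


print(date_after_label(SAMPLE, "Pickup date"))
```

The pattern uses only common regex syntax, so the same idea (literal label, then the tags in between, then a capture group) may carry over to other regex tools. It stays fragile if the sender changes the layout of the email, which is the usual trade-off against a tree-based query.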
