Text Parsing HTML

AlT · November 18, 2024, 2:16pm

I have downloaded a page’s HTML using an HTTP GET request, and now I’m trying to extract the page’s category tags, which are shown in the page text.

Here’s an example source URL: Example page (see Categories towards the bottom, above the map)

If the page categories include ‘Accommodation’, ‘Guest House-B&B’, ‘Restaurant’, etc., then I need to extract them into a list that I can then write to a Google Sheet.

So far, I’ve managed to extract the HTML tags that contain the category information (using ChatGPT to create the regex), but I cannot seem to extract just the category text.

The expression I’m using to extract the phrase so far is:

Categories:</strong>\s*((?:<[^>]+>[^<]+</[^>]+>\s*,?\s*)+)

And that’s giving me this output:

<a href="https://www.discovercarlisle.co.uk/eat-drink/category/accomodation" class="Accomodation EDNcategorycolor-default">Accomodation</a>, <a href="https://www.discovercarlisle.co.uk/eat-drink/category/guest-house-bb" class="Guest_House-B_B EDNcategorycolor-default">Guest House-B&B</a>

Note: The categories will change for each URL I do a HTTP request for.

In my scenario I need to do further text parsing to get just the category text, so I’m trying to reduce the output with a second text parser. (I imagine there’s a more succinct way to do this, but I’m new to this.) The expression for the second parser is:

>([^<]+)<

…and that’s not working either despite what ChatGPT says. The result of the second text parser is empty.
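For reference, here is a minimal sketch of the second step in Python rather than in the scenario itself, run against the first parser's output as pasted above. The name `extract_categories` and the stricter pattern `<a\b[^>]*>([^<]+)</a>` are illustrative assumptions, not part of the scenario. It suggests the expression `>([^<]+)<` does match this text when applied globally, but it also picks up the ", " separator between the links, so a pattern anchored on the `<a>` tags may be closer to what is needed.

```python
import html
import re

# Output of the first text parser, copied from the post.
fragment = (
    '<a href="https://www.discovercarlisle.co.uk/eat-drink/category/accomodation" '
    'class="Accomodation EDNcategorycolor-default">Accomodation</a>, '
    '<a href="https://www.discovercarlisle.co.uk/eat-drink/category/guest-house-bb" '
    'class="Guest_House-B_B EDNcategorycolor-default">Guest House-B&B</a>'
)

# The expression from the post, applied to every match (global matching).
loose = re.findall(r">([^<]+)<", fragment)

# Hypothetical stricter pattern: capture only the text between <a ...> and </a>.
LINK_TEXT = re.compile(r"<a\b[^>]*>([^<]+)</a>")


def extract_categories(text: str) -> list[str]:
    """Return the visible text of each link, with HTML entities decoded."""
    return [html.unescape(m).strip() for m in LINK_TEXT.findall(text)]


categories = extract_categories(fragment)

print(loose)       # includes the ", " that sits between the two links
print(categories)  # link text only
```

The `html.unescape` call is there in case the raw page encodes the ampersand as `&amp;`; it leaves a bare `&` unchanged. Whether the same pattern returns every match inside the scenario depends on the text parser being set to match globally, which is an assumption to check in the module settings.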

blueprint (1).json (47.9 KB)

Any ideas how I can get just the data back that I need?
