Text parsing HTML

I have downloaded a page’s HTML using an HTTP GET request, and now I’m trying to extract the page’s category tags, which are shown in the text.

Here’s an example source url: Example page - see Categories towards the bottom above the map

If the page categories include ‘Accommodation’, ‘Guest House-B&B’, ‘Restaurant’, etc., then I need to extract this info and create a list which I can then populate into a Google Sheet.

So far, I’ve managed to capture the HTML fragment that contains the category information (using ChatGPT to create the regex), but I cannot seem to extract just the category text.

The expression I’m using to extract the phrase so far is:

Categories:</strong>\s*((?:<[^>]+>[^<]+</[^>]+>\s*,?\s*)+)

And that’s giving me this output:

<a href="https://www.discovercarlisle.co.uk/eat-drink/category/accomodation" class="Accomodation EDNcategorycolor-default">Accomodation</a>, <a href="https://www.discovercarlisle.co.uk/eat-drink/category/guest-house-bb" class="Guest_House-B_B EDNcategorycolor-default">Guest House-B&B</a>

Note: The categories will change for each URL I do an HTTP request for.
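
For anyone wanting to reproduce the first step outside the scenario, here is a minimal sketch in Python. It is only an illustration: the `SAMPLE_HTML` fragment is an assumption modelled on the output above (the real page markup may differ), and `extract_category_block` is a hypothetical helper name, not anything from the blueprint.

```python
import re

# Hypothetical fragment modelled on the output shown above.
# The surrounding <p> and <strong> tags are an assumption about the page.
SAMPLE_HTML = (
    '<p><strong>Categories:</strong> '
    '<a href="https://www.discovercarlisle.co.uk/eat-drink/category/accomodation" '
    'class="Accomodation EDNcategorycolor-default">Accomodation</a>, '
    '<a href="https://www.discovercarlisle.co.uk/eat-drink/category/guest-house-bb" '
    'class="Guest_House-B_B EDNcategorycolor-default">Guest House-B&B</a></p>'
)

# The first expression from the post, unchanged.
FIRST_PATTERN = re.compile(
    r'Categories:</strong>\s*((?:<[^>]+>[^<]+</[^>]+>\s*,?\s*)+)'
)


def extract_category_block(page_html):
    """Return the run of category links after 'Categories:', or None."""
    match = FIRST_PATTERN.search(page_html)
    return match.group(1) if match else None


category_block = extract_category_block(SAMPLE_HTML)
print(category_block)
```

On this sample, the captured group is the two `<a>` elements separated by a comma, which matches the output I’m seeing.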

In my scenario I need to do further text parsing to extract just the category text, so I’m trying to reduce the output further with a second text parser. (I imagine there’s a more succinct way to do this, but I’m new to this.) The second expression is:

>([^<]+)<

…and that’s not working either despite what ChatGPT says. The result of the second text parser is empty.
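
As a sanity check outside the scenario, here is a small Python sketch of the second step. It assumes the input is exactly the first-stage output shown above; `extract_categories` and the stricter anchor-only pattern are hypothetical additions for illustration, not part of the blueprint. In Python’s `re` engine, at least, the second expression does match when every match is collected, although it also picks up the `, ` separator between the links, so an empty result may point at the parser settings or the input rather than the expression itself.

```python
import html
import re

# The first-stage output from the post, used here as test input.
FIRST_STAGE_OUTPUT = (
    '<a href="https://www.discovercarlisle.co.uk/eat-drink/category/accomodation" '
    'class="Accomodation EDNcategorycolor-default">Accomodation</a>, '
    '<a href="https://www.discovercarlisle.co.uk/eat-drink/category/guest-house-bb" '
    'class="Guest_House-B_B EDNcategorycolor-default">Guest House-B&B</a>'
)

# The second expression from the post, collecting every match.
loose_matches = re.findall(r'>([^<]+)<', FIRST_STAGE_OUTPUT)

# A stricter, hypothetical alternative: capture only the text inside <a>...</a>.
ANCHOR_TEXT = re.compile(r'<a\b[^>]*>([^<]+)</a>')


def extract_categories(fragment):
    """Return the link texts, with HTML entities such as &amp; decoded."""
    return [html.unescape(text).strip() for text in ANCHOR_TEXT.findall(fragment)]


categories = extract_categories(FIRST_STAGE_OUTPUT)
print(loose_matches)
print(categories)
```

The resulting list has one entry per category, which is the shape I need for writing rows to the sheet.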

blueprint (1).json (47.9 KB)

Any ideas how I can get just the data back that I need?