Parsing custom HTML payload efficiently

You_never_woke_alone · January 11, 2023, 3:50pm

Guys,
I have a webhook that sends an HTML-nested-within-JSON payload that I’d like to parse, map and use downstream in an efficient way, meaning: as few operations as possible (since we’re running multiple 100k’s monthly).

I’m a Zapier veteran, but Make/Integromat newbie, so I haven’t quite figured out the right approach yet, so I’m looking for crowd wisdom!

I have initial payload like

{
"var":"<h4>this is some headline</h4><h4>with another headline</h4><p>And some content</p><p>and some more content</p>",
"var_as_markdown":"#### this is some headline \n\n#### with another headline\n\n And some content \n and some more content"
}

Note that this is not fully-fledged, valid html (e.g. head, body etc), but only a subset. Additionally, I get the same payload in markdown, if this helps the case or opens up new solutions.

Now, I need those different html elements (h4’s, p’s) stored into some kind of variables so I can use them downstream - stitching together a Wordpress post, specifically. In Zapier, I’d have used a custom “Code” block that executes arbitrary Python/JS code, but since I haven’t found a Make equivalent of this, I toyed around with…

Text Parser “Match pattern”, using various regexp combinations → while this got me ahead somewhat, the parser output will be multiple “bundles” that seem to be processed in multiple operations, or I would need to create multiple modules doing the same thing for different variables, neither of which seems very efficient
Text Parser “Get Elements from HTML”, but for some reason that is beyond me, this allows only for img, a and iframe elements to be selected. What’s up with that?
Tools “Set multiple Variables”, but I’m unable to cut the payload down to the elements/ contents I need.
I’m vaguely aware that there are custom functions available, but I haven’t mastered this yet.

As for the desired result, I’d like to be able to

create a Wordpress module node
use all those captured/mapped h4’s to form one single WP title field
use the rest of the content (the p’s) as post body
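
To make the target concrete, here is roughly what I would have written in a Zapier-style Python code step. It is only a sketch of the desired result, not something that runs in Make. The function name `split_post` and joining the headlines with a single space are my own assumptions.

```python
import re

# Sample payload from above; only the "var" field is used here.
payload = {
    "var": "<h4>this is some headline</h4><h4>with another headline</h4>"
           "<p>And some content</p><p>and some more content</p>"
}


def split_post(html: str) -> dict:
    """Collect all h4 texts into one title and all p elements into one body.

    Hypothetical helper: assumes flat, non-nested <h4> and <p> tags
    without attributes, as in the sample payload.
    """
    headlines = re.findall(r"<h4>(.*?)</h4>", html, flags=re.S)
    paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.S)
    return {
        # Assumption: headlines are joined with a single space.
        "title": " ".join(h.strip() for h in headlines),
        # The body keeps its <p> wrappers so it can be used as post HTML.
        "body": "".join(f"<p>{p.strip()}</p>" for p in paragraphs),
    }


post = split_post(payload["var"])
print(post["title"])
print(post["body"])
```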

So: How can I achieve my desired result with the least amount of steps?

Any help appreciated!

Bjorn.drivn · January 11, 2023, 11:20pm

Hey @You_never_woke_alone ,

I think the best way would be the text parser with an array aggregator. The aggregator adds up all bundles into an array again, and then you can use that array however you want.
With RegEx named groups you can name each output to map() it later again. This regex names each part for example:

(?<h4>(?<=\<h4\>).+?(?=<\/h4>))|(?<p>(?<=\<p\>).+?(?=<\/p>))
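
To illustrate the idea outside of Make, here is a small Python sketch of the same pattern. Python spells named groups as `(?P<name>...)`, so the regex is adapted accordingly. The list comprehension stands in for the array aggregator and the two filtered lists stand in for map(). This only shows the logic; it says nothing about how Make's own regex engine behaves.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(
    r"(?P<h4>(?<=<h4>).+?(?=</h4>))|(?P<p>(?<=<p>).+?(?=</p>))"
)

html = (
    "<h4>this is some headline</h4><h4>with another headline</h4>"
    "<p>And some content</p><p>and some more content</p>"
)

# One dict per match (comparable to one bundle per match); collecting them
# into a list plays the role of the array aggregator.
bundles = [
    {name: text for name, text in m.groupdict().items() if text is not None}
    for m in PATTERN.finditer(html)
]

# Picking values by group name, comparable to map() on the aggregated array.
headlines = [b["h4"] for b in bundles if "h4" in b]
paragraphs = [b["p"] for b in bundles if "p" in b]

print(headlines)
print(paragraphs)
```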

Hope this helps you out

You_never_woke_alone · January 12, 2023, 11:03am

Wow, this is some solid advice! Thanks @Bjorn.drivn !

How can I only get the content between the brackets, without the “<h4>” and “=<h4” in the output bundle? Not entirely sure where they come from, as on https://regex101.com/ with ECMAScript set, the matches will be just fine, but not here

Will dive into Array Aggregator right afterwards!

IainM · January 12, 2023, 11:05am

@Bjorn.drivn WHAT AN ANSWER!!! Just wow!

Bjorn.drivn · January 12, 2023, 12:08pm

Hmmmm you are correct @You_never_woke_alone , I actually missed that…I was building the RegEx on 101 as well, where it all looked good.
I expect this is an issue with the Make engine, which seems to treat the positive lookbehind as a named group. Maybe they can fix this bug; I would recommend creating a support ticket with all the information.
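
In the meantime, one thing that might be worth trying is a pattern without any lookarounds, where the tags sit outside the named groups: `<h4>(?<h4>.+?)<\/h4>|<p>(?<p>.+?)<\/p>`. The full match then includes the tags, but each named group should hold only the inner text. I have not tested this in Make. The sketch below only checks the logic in plain Python, which uses `(?P<name>...)` for named groups.

```python
import re

# Tags are matched outside the named groups, so no lookbehind or
# lookahead is needed. Untested in Make; plain Python only.
PATTERN = re.compile(r"<h4>(?P<h4>.+?)</h4>|<p>(?P<p>.+?)</p>")

html = (
    "<h4>this is some headline</h4><h4>with another headline</h4>"
    "<p>And some content</p><p>and some more content</p>"
)

# For each match, keep the name of the group that matched and its text.
matches = [(m.lastgroup, m.group(m.lastgroup)) for m in PATTERN.finditer(html)]

for name, text in matches:
    print(name, "->", text)
```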
