Parsing custom HTML payload efficiently

Guys,
I have a webhook that sends an HTML-nested-within-JSON payload that I’d like to parse, map and use downstream efficiently, meaning with as few operations as possible (since we’re running several hundred thousand of these a month).

I’m a Zapier veteran but a Make/Integromat newbie, and I haven’t quite figured out the right approach yet, so I’m looking for crowd wisdom!

The initial payload looks like this:

{
"var":"<h4>this is some headline</h4><h4>with another headline</h4><p>And some content</p><p>and some more content</p>",
"var_as_markdown":"#### this is some headline \n\n#### with another headline\n\n And some content \n and some more content"
}

Note that this is not a complete, valid HTML document (no head, body, etc.), only a fragment. Additionally, I get the same payload as Markdown, in case that helps or opens up other solutions.

Now, I need those different HTML elements (the h4’s and p’s) stored in some kind of variables so I can use them downstream, specifically to stitch together a WordPress post. In Zapier, I’d have used a custom “Code” block that executes arbitrary Python/JS code, but since I haven’t found a Make equivalent of this, I toyed around with…

  • Text Parser “Match pattern”, using various regexp combinations → while this got me somewhat further, the parser outputs multiple “bundles” that seem to be processed in multiple operations, or I would need to create multiple modules doing the same thing for different variables; neither seems very efficient
  • Text Parser “Get Elements from HTML”, but for some reason that is beyond me, this allows only for img, a and iframe elements to be selected. What’s up with that?
  • Tools “Set multiple Variables”, but I’m unable to cut the payload down to the elements/contents I need.
  • I’m vaguely aware that there are custom functions available, but I haven’t mastered these yet.

As for the desired result, I’d like to be able to

  • create a WordPress module node
  • use all those captured/mapped h4’s to form one single WP title field
  • use the rest of the content (the p’s) as post body
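
To make the target concrete, here is a minimal Python sketch of the mapping, roughly what a Zapier “Code” step would do. It is not a Make solution; the function name `split_payload` and the choice of a single space to join the headlines are assumptions made for illustration only.

```python
import json
import re

# Sample payload from the post (only the "var" key, with valid JSON syntax).
raw = (
    '{"var": "<h4>this is some headline</h4><h4>with another headline</h4>'
    '<p>And some content</p><p>and some more content</p>"}'
)

def split_payload(raw_json, sep=" "):
    """Return (title, body): all h4 texts joined into one title, all p's kept as body."""
    html = json.loads(raw_json)["var"]
    # Capture groups return only the text between the tags, not the tags themselves.
    headlines = re.findall(r"<h4>(.*?)</h4>", html, flags=re.DOTALL)
    paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)
    title = sep.join(h.strip() for h in headlines)
    # Re-wrap the paragraphs so the body stays valid HTML for the post content.
    body = "".join(f"<p>{p}</p>" for p in paragraphs)
    return title, body

title, body = split_payload(raw)
print(title)
print(body)
```

The regexes here only cope with flat, attribute-free tags like the ones in the sample; nested or attributed markup would need a real HTML parser.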

So: How can I achieve my desired result with the least amount of steps?

Any help appreciated!

Hey @You_never_woke_alone ,

I think the best way would be the Text Parser combined with an Array Aggregator. The aggregator collects all the bundles back into a single array, and then you can use that array however you want.
With regex named groups you can name each output so that you can map() it later. For example, this regex names each part:

(?<h4>(?<=\<h4\>).+?(?=<\/h4>))|(?<p>(?<=\<p\>).+?(?=<\/p>))
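
As a rough illustration of the idea (named groups, then collecting every match into one structure per name), here is a sketch in Python. It only shows the regex technique, not Make’s engine: Python spells named groups `(?P<name>...)`, and the `aggregate` helper is a hypothetical stand-in for what the Array Aggregator would do.

```python
import re
from collections import defaultdict

html = (
    "<h4>this is some headline</h4><h4>with another headline</h4>"
    "<p>And some content</p><p>and some more content</p>"
)

# Same structure as the regex above, in Python syntax: each alternative is a
# named group whose lookbehind/lookahead keep the tags out of the match.
PATTERN = re.compile(r"(?P<h4>(?<=<h4>).+?(?=</h4>))|(?P<p>(?<=<p>).+?(?=</p>))")

def aggregate(text):
    """Collect every match into a list keyed by the name of the group that matched."""
    groups = defaultdict(list)
    for m in PATTERN.finditer(text):
        # lastgroup is the name of the alternative that produced this match.
        groups[m.lastgroup].append(m.group())
    return dict(groups)

result = aggregate(html)
print(result["h4"])
print(result["p"])
```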



Hope this helps you out :wink:

Wow, this is some solid advice! Thanks @Bjorn.drivn !

How can I get only the content between the tags, without the “<h4>” and “=<h4” in the output bundle? I’m not entirely sure where they come from: on https://regex101.com/ with ECMAScript set, the matches are just fine, but not here :thinking:

Will dive into Array Aggregator right afterwards! :slightly_smiling_face:

@Bjorn.drivn WHAT AN ANSWER!!! Just wow! :exploding_head:

Hmmmm, you are correct @You_never_woke_alone , I actually missed that… I was building the regex on regex101 as well, where it all looked good.
I suspect this is an issue with the Make engine, which seems to treat the positive lookbehind as a named group. Maybe they can fix this bug; I’d recommend creating a support ticket with all the information.
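
One possible way around the lookbehind, sketched in Python: let the pattern consume the tags and put the named group only around the inner text, so the tags appear in the full match but not in the named group. Whether Make’s Text Parser exposes the groups this way is an assumption that would need testing there; `extract` is a hypothetical helper for the demo.

```python
import re

html = "<h4>this is some headline</h4><p>And some content</p>"

# No lookarounds: the tags are matched literally, the named group wraps the text only.
TAGGED = re.compile(r"<h4>(?P<h4>.+?)</h4>|<p>(?P<p>.+?)</p>")

def extract(text):
    """Return (group name, inner text, full match) for every element found."""
    rows = []
    for m in TAGGED.finditer(text):
        name = m.lastgroup
        rows.append((name, m.group(name), m.group(0)))
    return rows

rows = extract(html)
for name, inner, full in rows:
    print(name, "|", inner, "|", full)
```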