How to extract and remove elements from HTML code

Hi good people

I am trying to pass some HTML code from one WordPress installation to another. But of course the two installations have slightly different configurations, so I have to alter the HTML of the content.

My use case is this:

  1. First I want to identify certain elements in the html-code.
  2. Then I want to remove some of them from the html-code. But the rest of the code has to remain html.

For example - here is an html input coming into make.com:

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Article Title</title>
</head>
<body>

<header>
    <h1>Welcome to My Article</h1>
</header>
<article>
    <h2>Introduction</h2>
    <p>This is the introduction of the article. It sets the stage for the content that will follow. Here you can include interesting facts, figures, and foreshadow the main points that will be covered.</p>
</article>
    
    <div class="subsection">
        <h2>Main Section</h2>
        <p>This is the main section of the article. This part is significantly important as it carries the core information intended for the reader. It's advisable to break it down into several paragraphs to enhance clarity and reader engagement.</p>

        <h3>Subsection A</h3>
        <p>Details about Subsection A come here. It’s good to include data, statistics, and other in-depth info that supports the main article topic.</p>
    </div>

<footer>
    <p>Copyright © 2024 Your Website</p>
</footer>

</body>

I want to get everything within the title tag and the article tag:

  • So “Article Title” should be output number 1.
  • And this should be output number 2:
<h2>Introduction</h2>
    <p>This is the introduction of the article. It sets the stage for the content that will follow. Here you can include interesting facts, figures, and foreshadow the main points that will be covered.</p>

Then I want the rest of the html without the two first outputs and without the footer to be output 3.

So first I will remove output 1, then remove output 2, then look for the footer tag and remove it. And then output 3 should be:

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>

<header>
    <h1>Welcome to My Article</h1>
</header>
    
    <div class="subsection">
        <h2>Main Section</h2>
        <p>This is the main section of the article. This part is significantly important as it carries the core information intended for the reader. It's advisable to break it down into several paragraphs to enhance clarity and reader engagement.</p>

        <h3>Subsection A</h3>
        <p>Details about Subsection A come here. It’s good to include data, statistics, and other in-depth info that supports the main article topic.</p>
    </div>

</body>

I have tried to work with arrays and with the built-in “replace” and “get” functions. But I can’t seem to get my head around the right solution.

Can any of you guide me towards the correct tools to use? Then I will try to code it following your examples.

Thank you in advance.

Best regards,
Rasmus MP

Welcome to the Make community!

You can use a Text Parser “Match Pattern” module with this Pattern (regular expression):

<title>\s*(?<title>[^<]+)\s*<\/title>[\w\W]+<article>\s*(?<article>[\w\W]+?)\s*<\/article>

Proof: https://regex101.com/r/NABD4d

Important Info

  • Warning: Global match must be set to NO!
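
For anyone who wants to check the pattern outside Make, here is a minimal Python sketch of the same match. It is only an illustration, not how the Text Parser module is implemented. Assumptions: the function name `extract_title_and_article` and the shortened `SAMPLE_HTML` are made up for this example, and the named groups are written as `(?P<name>...)` because Python's `re` module does not accept the `(?<name>...)` spelling. Using `search` (first match only) roughly corresponds to setting Global match to NO.

```python
import re

# Same pattern as above, with Python-style named groups (?P<name>...).
PATTERN = re.compile(
    r"<title>\s*(?P<title>[^<]+)\s*</title>"
    r"[\w\W]+"
    r"<article>\s*(?P<article>[\w\W]+?)\s*</article>"
)

# Shortened, hypothetical sample based on the HTML in the question.
SAMPLE_HTML = """<head>
    <title>Article Title</title>
</head>
<body>
<article>
    <h2>Introduction</h2>
    <p>This is the introduction of the article.</p>
</article>
<footer>
    <p>Copyright</p>
</footer>
</body>"""


def extract_title_and_article(html):
    """Return (title, article_inner_html), or None if the pattern does not match."""
    match = PATTERN.search(html)  # first match only, like Global match = NO
    if match is None:
        return None
    # Strip the title because [^<]+ may also capture trailing whitespace.
    return match.group("title").strip(), match.group("article")


if __name__ == "__main__":
    print(extract_title_and_article(SAMPLE_HTML))
```

Note that the pattern assumes the `<title>` and `<article>` tags carry no attributes and that the title appears before the article.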


For more information, see Text Parser in the Make Help Center:

Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. The search pattern is a regular expression (aka regex or regexp), which is a sequence of characters in which each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning.

Hope this helps!

Hi Samliew

Thank you so much. This is an approach I haven’t tried yet.

I will try in a couple of hours to test if it works in my environment. I will reply again when I have the results.

Thank you for your help so far.

Best,
Rasmus MP

Hi again Samliew (or others?)

Once again, thank you so much for helping out. With your and ChatGPT’s help, I think I have now managed to build all the regex I needed, even though I had never tried this language before.

One make question if I may?

The last operation I want to do is to use all the output provided to remove that text from the original html. What tool in make.com would you use to remove it?

Thank you in advance.

Best,
Rasmus MP

The Text Parser also has a Replace module. Use two of those (or you can try using the built-in replace function).

Replace the matched text with {{emptystring}}
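
The same "replace with an empty string" idea can be sketched in Python to see what the third output would look like. This is a rough illustration under stated assumptions, not Make's Replace module: the function name `remove_elements` and the sample input are hypothetical, and the patterns assume the `<title>`, `<article>` and `<footer>` tags have no attributes and are not nested.

```python
import re


def remove_elements(html, tags=("title", "article", "footer")):
    """Remove each listed element (tags and content) by replacing it with ""."""
    for tag in tags:
        # [\w\W]*? is lazy, so the match stops at the first closing tag.
        # The surrounding [ \t]* and \n? also drop the now-empty line.
        pattern = rf"[ \t]*<{tag}>[\w\W]*?</{tag}>[ \t]*\n?"
        html = re.sub(pattern, "", html)
    return html


if __name__ == "__main__":
    sample = (
        "<head>\n"
        "    <title>Article Title</title>\n"
        "</head>\n"
        "<body>\n"
        "<article>\n"
        "    <h2>Intro</h2>\n"
        "</article>\n"
        "<div>keep</div>\n"
        "<footer>\n"
        "    <p>c</p>\n"
        "</footer>\n"
        "</body>"
    )
    print(remove_elements(sample))
```

In Make, each pass of the loop corresponds to one Replace step with {{emptystring}} as the new value.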

Thank you!

It was the emptystring that I didn’t know about.

No problem, glad I could help!

1. If you have a new question in the future, please start a new thread. This makes it easier for others with the same problem to search for the answers to specific questions, and you are more likely to receive help since newer questions are monitored closely.

2. The Make Community guidelines encourage users to mark helpful replies as solutions to help keep the Community organized.

This marks the topic as solved, so that:

  • others can save time when catching up with the latest activity here, and
  • others can quickly jump to the solution if they come across the same problem

To do this, simply click the checkbox at the bottom of the post that answers your question.

3. Don’t forget to like and bookmark this topic so you can get back to it easily in future!
