HTML to Text Parsing ONLY for <Body> text

Hi,

I am having a hard time parsing HTML for elements.

I want to get results similar to Zapier’s Parser tool, but I couldn’t find a way to do it.

I tried several regex variations but could not extract only the content. I managed to isolate part of the HTML, but the result still had URLs and other noise in it.

I am looking for just the text (p), lists (ul or li) and headers (h1-h6).

Please help!
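For illustration only, here is a minimal sketch of the extraction described above, written in Python with the standard-library `html.parser` rather than regex. It is not a Make module and not how Zapier's tool works internally; `ContentExtractor` and `extract_content` are hypothetical names, and the sketch assumes reasonably well-formed HTML in which every kept element has a closing tag.

```python
from html.parser import HTMLParser

# Tags whose text is kept; text outside these tags is ignored.
KEEP = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class ContentExtractor(HTMLParser):
    """Collects the text of <p>, <li>, and <h1>-<h6> elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished (tag, text) pairs, in document order
        self._stack = []   # kept tags that are currently open
        self._buffer = []  # text pieces seen inside the outermost kept tag

    def handle_starttag(self, tag, attrs):
        # Attributes (including href URLs) are never copied into the output.
        if tag in KEEP:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and tag == self._stack[-1]:
            self._stack.pop()
            if not self._stack:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.blocks.append((tag, text))
                self._buffer = []

    def handle_data(self, data):
        if self._stack:
            self._buffer.append(data)


def extract_content(html):
    """Return a list of (tag, text) pairs for the kept elements."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Because link targets live in attributes, they are dropped while the visible link text is kept. A kept element nested inside another (for example a `p` inside an `li`) is reported once, under the outer tag.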

Welcome to the Make community!

When asking for help with creating a regex pattern for a text parser module, please copy and paste the sample text here as well so that we can run it against test patterns.

If you do not provide proper examples, you could be wasting our time as we have to guess your sample input. Not only that, you may not get the correct answer, or it may take several “guesses”.

You could also provide the following information so that we can see what you’re working with:

1. Screenshots of module fields and filters

Please share screenshots of the relevant module fields and filters. It would really help other community members to see what you’re looking at.

You can upload images here using the Upload icon in the text editor.

2. Scenario blueprint

Please export the scenario blueprint file to allow others to view the mappings and settings. At the bottom of the scenario editor, you can click on the three dots to find the Export Blueprint menu item.

(Note: Exporting your scenario will not include private information or keys to your connections)

Uploading it here will look like this:

blueprint.json (12.3 KB)

3. And most importantly, Output bundles

Please provide the output bundles of the modules: run the scenario, then click the white speech bubble on the top-right of each module and select “Download output bundles”.

A.

Save the bundle contents in your text editor as a bundle.txt file, and upload it here into this discussion thread.

Uploading it here will look like this:

bundle.txt (12.3 KB)

B.

If you are unable to upload files on this forum, alternatively you can paste the formatted output bundle in this manner:

  • Either add three backticks ``` before and after the code, like this:

    ```
    input/output bundle content goes here
    ```

  • Or use the format code button in the editor.

Providing the output bundles will allow others to replicate what is going on in the scenario even if they do not use the external service.

Following these steps will allow others to assist you here. Thanks!

samliew

Hi, sorry about that.

Attached is a screenshot of the data I want to parse. I am looking to extract the body, headlines, etc. I want to filter out links, URLs, and all other noise (basically anything contained in “[” and “]”).

Here’s the code:
blueprint (3).json (66.9 KB)
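As an illustration of the filter described above (dropping anything enclosed in “[” and “]”), here is a small Python sketch. The actual sample text was not posted, so the input format is an assumption; `strip_bracketed` is a hypothetical helper name, and the pattern handles only non-nested brackets.

```python
import re

# Matches one "[...]" group that contains no further brackets inside it.
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def strip_bracketed(text):
    """Remove [bracketed] spans, tidy spaces, and drop lines left empty."""
    cleaned = BRACKETED.sub("", text)
    # Collapse repeated whitespace within each line, keeping line breaks.
    lines = (" ".join(line.split()) for line in cleaned.splitlines())
    return "\n".join(line for line in lines if line)
```

With nested brackets, a single pass removes only the innermost group, so the helper would need to be applied repeatedly or replaced with a different approach in that case.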

When reaching out for assistance with your regex pattern for a Text Parser module, it would be super helpful if you could share the actual text you’re trying to match. Screenshots of text can be a bit tricky, so if you could copy and paste the text directly here, that would be awesome! It ensures we can run it against test patterns effectively. If there’s any sensitive info, feel free to change it to something fictional yet still valid by keeping the format intact.

Don’t forget to format the text when you paste it:

  • Either add three backticks ``` before and after the code, like this:

    ```
    text content goes here
    ```

  • Or use the format code button in the editor.

Providing clear text examples saves time on both ends and helps us give you the best possible solution, so you are more likely to get a correct answer faster. Without proper examples, we might end up playing a guessing game, and nobody wants that! So, help us help you by sharing those text snippets. Thanks a bunch!
