Extract a long text to parse in JSON

I am using Claude to generate some content.
I specified that the output must be strictly JSON, in the format below.
Overall, it follows the format very well:

{
    "Activity": [
        {
            "Title_activity": "",
            "Description_activity": ""
        }
      ]
}

The only problem is that it often adds some text before the JSON, like:

HERE IS THE CODE / generation / jsonformat etc....
{
    "Activity": [
        {
            "Title_activity": "",
            "Description_activity": ""
        }
      ]
}

To parse it properly, I want to either remove everything before {"Activity": [ or select all the text from {"Activity": [ to the end.
I tried the latter, using the following regex: \{[\s\S]*?"Activity": \[(.*\n)+}
I validated it on regex101.com, but I get this output:

 {
            "Title_activity": "Chasse",
            "Description_activity": "Transformez"
        },
        {
            "Title_activity": "Récit ",
            "Description_activity": "Racontez"
        },
        {
            "Title_activity": "Photographie",
            "Description_activity": "Utilisez votre téléphone"
        },
        {
            "Title_activity": "Souterrain",
            "Description_activity": "Amusez-vous"
        }
    ]

which is incorrect, as it is missing the beginning of my JSON:

{
    "Activity": [

And the } at the end.
I am not sure how to approach this: it is a very long text to extract, and the text before the JSON is random.
I’ve attached the blueprint; the data is passed in as scenario input.
Many thanks
blueprint.json (6.0 KB)

Just after posting, I had this post suggested: Stuck with REGEX to extract JSON object

I used part of that regex to amend mine and managed to make it work: ({[\w\W]*?"Activity": \[(.*\n)+?})
The only strange thing is that I get two values, $1 and $2.
$1 gives me exactly the JSON I want.
$2 has this value: ]
I am not sure why. I am using $1, and I am able to parse it with a JSON module afterwards; I am going to run a few more tests to make sure it works properly.
If someone has a better or safer approach than mine for extracting the JSON reliably every time, I am happy to hear it.
I will leave the post open for 24 hours and mark this response as the solution if there is no reply.
Thanks

That’s because you have a second capturing group.

To omit it, make the inner group non-capturing by writing (?:...) instead of (...).

You can use a Text Parser “Match Pattern” module with this Pattern (regular expression):

(?<json>{[\w\W]*?"Activity": \[(?:.*\n)+?})

Proof https://regex101.com/r/i5fqQx/1
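
The difference between the two patterns can also be checked outside Make. The following is a minimal sketch in Python, used here only for illustration: the sample reply is made up, the braces are escaped for clarity, and Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`. It suggests that `$2` holds `]` because a repeated capturing group keeps only its last iteration, which is the line just before the closing `}`.

```python
import json
import re

# Hypothetical model reply: free text followed by the JSON payload.
reply = (
    "Here is the JSON you asked for:\n"
    "{\n"
    '    "Activity": [\n'
    "        {\n"
    '            "Title_activity": "Chasse",\n'
    '            "Description_activity": "Transformez"\n'
    "        }\n"
    "    ]\n"
    "}\n"
)

# Pattern from the follow-up post: an outer and an inner capturing group.
two_groups = re.compile(r'(\{[\w\W]*?"Activity": \[(.*\n)+?\})')

# Pattern from the answer: the inner group is non-capturing, the outer one is named.
named = re.compile(r'(?P<json>\{[\w\W]*?"Activity": \[(?:.*\n)+?\})')

m1 = two_groups.search(reply)
m2 = named.search(reply)

# The inner group repeats once per line, so only the last line it matched
# (the one containing "]") is kept as group 2.
last_iteration = m1.group(2)

# With the non-capturing version there is a single group holding the JSON text.
extracted = m2.group("json")
data = json.loads(extracted)
```

Both patterns rely on the closing `}` being the first character on its own line, so output formatted differently (for example minified JSON on one line) would need another pattern.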

Important Info

  • :warning: Global match must be set to NO!

For more information, see Text Parser in the Make Help Center:

Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. The search pattern is a regular expression (aka regex or regexp), which is a sequence of characters in which each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning.

Hope this helps! Let me know if there are any further questions or issues.

— @samliew

P.S.: Investing some effort into the Make Academy will save you lots of time and frustration using Make.


Thank you @samliew, works like a charm!
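
For anyone doing the same extraction in code rather than in a Make scenario, a different technique from the regex above is to let a JSON parser find the end of the object. The sketch below is an assumption-laden illustration, not part of the accepted solution: the function name `extract_first_json_object` is hypothetical, and it uses Python's standard `json.JSONDecoder.raw_decode`, which parses a value starting at a given index and ignores whatever follows it.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object found in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # returns it together with the index where the value ends.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not start valid JSON; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in text")


# Made-up reply with text before and after the JSON.
reply = (
    "HERE IS THE CODE\n"
    '{"Activity": [{"Title_activity": "Chasse", '
    '"Description_activity": "Transformez"}]}\n'
    "Let me know if you need more."
)
result = extract_first_json_object(reply)
```

Because the parser decides where the object ends, this does not depend on line breaks or indentation in the generated text.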