Text split

Hello Guys,

I hope you are well.

I created a scenario that extracts text from a PDF with Google Cloud Vision, aggregates it, and summarises it with Claude.
This is where the problems start. The output is the following:

I want to have each lesson in a separate array of its own. How can I do this, please? Thanks a lot.
blueprint.json (47.5 KB)

@Youssef_Enjri, we can split this text either with the split() function (plus a few helper functions), using line breaks as separators, or with a regex module.

But do you really need a text aggregator? I see in the blueprint that you want to add a row for each lesson, so why not use the add row module directly after the iterator?
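
For readers who want to see the split-on-line-breaks idea outside of Make, here is a minimal Python sketch. It assumes each lesson starts on its own line with "Lesson #" and that any following lines belong to that lesson. The sample text and the helper name split_lessons are made up for illustration, since the real output was only posted as a screenshot.

```python
# Hypothetical sample standing in for the summarised text; the real output will differ.
sample = (
    "Lesson #1 : Start small.\n"
    "Keep the first step easy.\n"
    "Lesson #2 : Be consistent.\n"
    "Lesson #3 : Review often."
)

def split_lessons(text: str) -> list[str]:
    """Split on line breaks, then glue continuation lines back onto their lesson."""
    lessons: list[str] = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("Lesson #") or not lessons:
            lessons.append(line)          # a new lesson begins here
        else:
            lessons[-1] += "\n" + line    # continuation of the previous lesson
    return lessons

lessons = split_lessons(sample)
for lesson in lessons:
    print(repr(lesson))
```

The grouping step is the "helper" part: a plain split on line breaks alone would also cut a lesson that spans several lines.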

@kudrachaa thank you for your reply.
Actually, I added the iterator because Google Cloud Vision sends the content of each page separately. So I aggregated all the pages, then sent them to Claude to get the quotes/learnings of each page. Claude now sends all the quotes in one array, which is why I did not use the add row module directly after the iterator.

@Youssef_Enjri Can’t you have Claude reply in JSON format and parse that, as is possible with ChatGPT?

Or did you mean that you want to have them in separate operations?
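
To illustrate the JSON idea, here is a small Python sketch of what parsing such a reply could look like. The reply string and its field names ("lesson", "text") are invented for the example; they assume the prompt asked Claude to answer with a JSON array only, and they are not taken from the blueprint.

```python
import json

# Hypothetical reply, assuming the prompt asked for a JSON array and nothing else.
claude_reply = (
    '[{"lesson": 1, "text": "Start small."},'
    ' {"lesson": 2, "text": "Be consistent."}]'
)

# json.loads turns the string into a list of dicts, one per lesson.
parsed = json.loads(claude_reply)

# Each element can then be handled on its own, e.g. written as one row.
rows = [item["text"] for item in parsed]
print(rows)
```

Once the reply is parsed, each lesson is a separate item, so no text splitting is needed afterwards.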

@Youssef_Enjri, OK, the short answer to your question is below (a simple working solution, though the whole process could be optimized to use fewer operations):

Add a Text parser: Match pattern module with this regex:
(Lesson #\d+ :[\s\S]+?)(?=\nLesson #\d+ :|$)

Result :

For other configurations, just ask ChatGPT for a specific regex; it’ll be faster.
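
If you want to check the pattern outside of Make, the following Python sketch runs the same regex against a made-up sample (the real text is only visible in the screenshots, so the sample is an assumption). It uses Python's re module, not Make's text parser, so the module options may behave differently.

```python
import re

# Same pattern as in the post: capture from one "Lesson #N :" heading
# up to (but not including) the next heading or the end of the text.
LESSON_PATTERN = re.compile(r"(Lesson #\d+ :[\s\S]+?)(?=\nLesson #\d+ :|$)")

# Hypothetical sample text; the real output will differ.
sample = (
    "Lesson #1 : Start small.\n"
    "Keep the first step easy.\n"
    "Lesson #2 : Be consistent.\n"
    "Lesson #3 : Review often."
)

# findall returns the captured group of every match, one string per lesson.
lessons = LESSON_PATTERN.findall(sample)
for lesson in lessons:
    print(repr(lesson))
```

One thing to watch in Python: with the re.MULTILINE flag, `$` also matches at the end of every line, so a lesson that spans several lines would be cut after its first line. The sketch above leaves that flag off for this reason.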

Thank you @Juliusforster for your answer. It works perfectly after using the Parse JSON module :pray:

Thank you @kudrachaa, really appreciate your help.

It does not seem to be working in my scenario.

I would be interested to know how you configured the “Set variable” module.

This is my configuration, and I suspect it is the reason why the parser does not work. Also, which text would you choose?

I would love to know more to solve this, as I am a beginner, and the more solutions and knowledge I have, the better :pray:

@Youssef_Enjri, you do not need to declare the variable. I only used it to simulate the output you were getting. You just put your ‘Text Response’ variable from Text Aggregator in the Text field.

Glad it worked. Feel free to mark it as answer so others can easily find it.

A cool automation by the way! :smiley:

@kudrachaa I think I did not explain very well what I wanted to do.

The text aggregator holds the whole text, which is too long and is not what I want to parse. I want to parse the text I receive from Claude, because that text is a summary of what is in the aggregator. Claude’s output is a list of lessons that we can learn from the book (Lesson #1, Lesson #2, Lesson #3…).
The issue I face is that when I parse the text, I do not get the lessons separated.




I also left these screenshots to explain. I hope it is clearer for you now :slightly_smiling_face:

Thank you @Juliusforster

@Youssef_Enjri, yes, I get that. You need to put Text from Text Aggregator, not from Claude. As for the comment, since this takes 2 operations after Claude, Juliusforster’s answer is probably more optimized if Claude can provide JSON.

@Youssef_Enjri @kudrachaa I looked around a bit. There doesn’t seem to be a direct “JSON mode” that can be activated, but you can normally tell it to output in a certain JSON format. Here’s what I’ve found: Increase output consistency (JSON mode) - Anthropic
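
Since the JSON is only requested through the prompt, the reply may still contain a sentence before or after the array. As a rough Python sketch of one way to cope with that, the hypothetical helper below (extract_json_array is my own name, and the sample reply is invented) cuts out everything between the first "[" and the last "]" before parsing.

```python
import json

def extract_json_array(reply: str) -> list:
    """Parse the JSON array embedded in a reply that may have extra text around it."""
    start = reply.find("[")   # first opening bracket
    end = reply.rfind("]")    # last closing bracket
    if start == -1 or end < start:
        raise ValueError("no JSON array found in reply")
    return json.loads(reply[start:end + 1])

# Hypothetical reply with chatty text around the requested JSON.
reply = (
    "Here are the lessons:\n"
    '["Start small.", "Be consistent."]\n'
    "Let me know if you need more."
)

lessons = extract_json_array(reply)
print(lessons)
```

This is only a fallback: it will fail if the surrounding text itself contains square brackets, so a prompt that asks for the array and nothing else is still the safer route.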

@Juliusforster @kudrachaa Thanks a lot, guys, for your help. Both are working fine now. I will go for the one that uses fewer operations :slightly_smiling_face:

Thanks
