Hi all,
My scenario extracts text from images. I use an OpenAI module for the text extraction. After the extraction is done, I use the Text Aggregator module to get the following output:
As you can see, the text extracted from 0002.jpg, 0003.jpg, and 0004.jpg is the same as the text extracted from 0001.jpg. Likewise, the text extracted from 0006.jpg and 0007.jpg is the same as the text extracted from 0005.jpg.
So, I would like to remove the portions enclosed in red boxes. I use the Text Aggregator module to create the above result, as shown below:
I tried several approaches after reading related topics here, but I can't find a way to remove the duplicated text from the Text Aggregator output.
It would be great if anyone here can help. Thank you!
@Kaz_Suzuki, have you tried asking ChatGPT to filter out those duplicate texts? You could also look for a potential regex solution with ChatGPT. Just copy the output into ChatGPT and explain what you're trying to do, then use the "Match pattern" module (text = output text, pattern = the pattern given by GPT).
Thank you very much for the reply. I tried, but I was not able to solve my issue by using Match Pattern + regex.
Thank you all who read my question and tried to come up with a solution.
I ended up using an Array Aggregator right after the Text Aggregator, and then used the distinct function to deduplicate. The Text Aggregator might not be needed, but it's part of a somewhat complex scenario and I didn't want to break anything. So, I haven't tried removing the Text Aggregator yet (I will probably try later).
Thank you!
Would the function deduplicate() work here?
I've never used it, but it might be worth a shot. It probably only works with simple arrays, though.

Thank you for the reply! I think deduplicate() is for a simple array whose items don't have any other attributes. In this case, each array item includes the text and its associated image filename. So, I used distinct() like "distinct(; text)".
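For anyone who wants to see the idea outside of Make, here is a rough Python sketch of what deduplicating by a key does. This is only an illustration, not how Make implements distinct(). The helper name `distinct_by_key`, the sample texts, and the assumption that the first occurrence of each text is the one kept are all mine.

```python
def distinct_by_key(items, key):
    """Keep the first item seen for each unique value of `key`, preserving order."""
    seen = set()
    result = []
    for item in items:
        value = item.get(key)
        if value not in seen:
            seen.add(value)
            result.append(item)
    return result


# Hypothetical bundles: each item carries the extracted text and its image filename.
pages = [
    {"filename": "0001.jpg", "text": "Text A"},
    {"filename": "0002.jpg", "text": "Text A"},
    {"filename": "0003.jpg", "text": "Text A"},
    {"filename": "0004.jpg", "text": "Text A"},
    {"filename": "0005.jpg", "text": "Text B"},
    {"filename": "0006.jpg", "text": "Text B"},
    {"filename": "0007.jpg", "text": "Text B"},
]

# Compare on the "text" attribute only; the filenames differ on every item.
unique_pages = distinct_by_key(pages, "text")
for page in unique_pages:
    print(page["filename"], page["text"])
```

Comparing whole items would not remove anything here, because every item has a different filename. That is why the comparison has to be made on the text attribute alone.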
Just wanted to comment that as well. Distinct is the correct one ^^