Situation:
The attached blueprint is only one part of a larger flow.
- I’m getting PDFs from a form upload (up to 5 PDFs)
- All PDFs are processed with the OCR API from mistral.ai, which returns one bundle per document
- The content (markdown) from all bundles needs to be merged into one big text for further processing
- All images from all bundles will be converted and prepared for further processing
Goal: steps 3 and 4 should create one bundle with one big text and a list of all images for further processing.
What should be achieved:
- No routing within this part of the flow
- Content from PDFs is available as one big text for the following steps
- Images from the PDF content are available as a list (generated by Mistral's OCR)
- The one big text and all images are available as one bundle for all subsequent steps
Why “no routing”:
After this flow, there will be 3 routings based on specific conditions (not part of the blueprint) for storing data in different Salesforce objects.
I want to avoid a large number of tracks. I will stay with 3 tracks afterwards, which will be split into 6 later.
Values needed from the OCR output:
- body(collection) > pages(array) > 1 (collection) > markdown
- body(collection) > pages(array) > 1 (collection) > images (if images were detected)
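To make the target shape concrete, here is a rough sketch of the merge logic in Python. It is not taken from the blueprint and is only meant to illustrate the result I'm after. It assumes each OCR response is a dictionary with the body > pages > markdown / images structure listed above; the function name `merge_ocr_bundles` and the output keys `text` and `images` are hypothetical.

```python
from typing import Any


def merge_ocr_bundles(bundles: list[dict[str, Any]], separator: str = "\n\n") -> dict[str, Any]:
    """Merge several OCR responses into one bundle without any branching.

    Assumed input shape (from the paths above): bundle["body"]["pages"] is a
    list of pages, each with a "markdown" string and an optional "images" list.
    """
    texts: list[str] = []
    images: list[Any] = []

    for bundle in bundles:
        pages = bundle.get("body", {}).get("pages", [])
        for page in pages:
            markdown = page.get("markdown")
            if markdown:
                texts.append(markdown)
            # "images" may be missing or empty when no images were detected.
            images.extend(page.get("images") or [])

    # One big text plus one flat list of all images, in document/page order.
    return {"text": separator.join(texts), "images": images}
```

The idea is that missing images are handled by falling back to an empty list rather than by a separate route, so the whole step stays on a single track.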
How can I streamline this?
I’m also open to suggestions and ideas for optimizing the flow within the given setup.
Thanks in advance
Frank
Get-all-content-and-images-from-multiple-pdfs.json (25.6 KB)
example-output-from-mistral-2.json (44.0 KB)
example-output-from-mistral-1.json (5.0 KB)