I have several different Purchase Orders in PDF (Text). Each purchase order contains multiple items lines. The files are formatted quite well, but there are some issues, for example the item name will appear on the following line, so each item is actually broken down into 2 lines:
The approaches I have tried so far:
Use PDF.co to convert to JSON. This works well and producing a workable JSON, but there are some challenges here:
I cant use ChatGPT to manipulate the data and merge the items into 1 row because the JSON file is more than 150K characters.
If each product-line would come in 1 single row I would be able to process it on Make with iterators etc, but if I iterate on row 1 the product name only appears on row 2 which makes this kind of manipulation using Make tools quite complicated
Would trying to break the JSON into chunks and then send them to chatgpt to manipulate and merge would be a good idea?
If I upload the PDF directly on the OpenAI console and ask him to parse it correctly it works well but its a manual process. Is there a way to send a PDF using the API?
Any suggestions will be welcomed ? I dont mind taking any other approach
What about converting it in json as you did, but after iterating you just filter out bundle position if itâs Odd or Even with mod(bundleposition;2)=0 being even and mod(bundleposition;2)=1 being odd. Assuming youâre only working with 2 lines. If not, we can just iterate till next âPosâ column value is not empty. If you provide what youâre trying to do with the data, I might be able to help to determine the output data structure and optimize the scenario.
At the moment i set filters to start at bundle position X to avoid all the stuff coming before the line items.
So letâs assume i start with Bundle 10, then i take the data and ADD a certain row in Excel.
Maybe what i should do is then UPDATE the same row with info from the following bundle. The trick here is to know which row to update in the Excel and how to know when its an add and when its an update (probably that can be done with MOD).
Can you send a screen of your current scenario ? The trick of knowing if itâs add or update is filtering through a router with 2 choices : either bundleposition is odd or even. Thatâs it. You probably will need a âset variableâ just after the âadd rowâ module to set âcurrent rowâ, and then in the next router just before âupdate a rowâ add âget variableâ module to reference that row. cuz data in different router branch donât transfer to other branches. There might be another solution if youâre looking to optimize operations and donât use setvariable-getvariable. You can set up this solutions and if it works, we can talk about optimization afterwards.