Hi there, I’m trying to use PDF.co to parse a PDF and eventually push some fields into an Excel file.
I apologize in advance for the long post.
The PDF file I’m parsing contains purchase orders: each page is one order, and each order may have several SKUs.
The PDF has multiple pages, and each page looks like this:
[screenshot of a sample purchase-order page]
I’m using the PDF.co “PDF to JSON” module. Its output maps the PDF by rows and columns, and I found two ways to get at the data:
- Body/Document/Page/Row/Column
- Body/Object Values: already a matrix of row/column keys, each holding the value found at that position in the document. I used this one to map the data.
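Because every “Object Values” key spells out its own position (`Row_2_Column_11`, or `Page_3_Row_2_Column_11` once several pages are parsed), one way to stop hard-coding a single cell is to regroup the keys by page and row first and then loop over them. A minimal Python sketch, assuming the values arrive as a flat dictionary of key → text; the `regroup` helper and the rule that a key without a `Page_` prefix belongs to page 1 are assumptions, not documented PDF.co behaviour, and the sample values are made up:

```python
import re
from collections import defaultdict

# Key pattern as described in the post: "Row_2_Column_11" for a single page,
# "Page_3_Row_2_Column_11" when several pages are parsed.
KEY_RE = re.compile(r"^(?:Page_(\d+)_)?Row_(\d+)_Column_(\d+)$")


def regroup(object_values: dict[str, str]) -> dict[int, dict[int, dict[int, str]]]:
    """Regroup flat Object Values keys into {page: {row: {column: value}}}."""
    pages = defaultdict(lambda: defaultdict(dict))
    for key, value in object_values.items():
        match = KEY_RE.match(key)
        if match is None:
            continue  # skip keys that are not Row/Column cells
        page = int(match.group(1) or 1)  # assumption: no prefix means page 1
        row, col = int(match.group(2)), int(match.group(3))
        pages[page][row][col] = value
    # Convert the nested defaultdicts to plain dicts
    return {p: {r: dict(cells) for r, cells in rows.items()} for p, rows in pages.items()}


sample = {
    "Page_1_Row_2_Column_3": "A-100",
    "Page_1_Row_3_Column_3": "B-200",
    "Page_2_Row_2_Column_3": "C-300",
}
# Iterate every page and every row instead of mapping one fixed key
for page, rows in sorted(regroup(sample).items()):
    for row, cells in sorted(rows.items()):
        print(page, row, cells)
# 1 2 {3: 'A-100'}
# 1 3 {3: 'B-200'}
# 2 2 {3: 'C-300'}
```

Each (page, row) pair then becomes one candidate line for the Excel sheet, so adding pages or rows no longer requires new mappings.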
If I configure the PDF.co module to run on a single page, everything works fine and maps easily into Excel.
The challenges start when I run the automation on the entire PDF file, which contains many pages (one order per page):
- As mentioned above, I’m mapping according to the matrix in “Object Values”, for example Row_2_Column_11. However, when the output contains more than one page, each key also carries a page prefix, for example Page_3_Row_2_Column_11. So if I map a specific page into Excel, only that page gets output. I think some sort of iteration is needed, but I’m not sure how to approach it.
- When a PO has more than one line, as in the example above, the same problem applies: since I’m mapping a specific row and column, the second row onwards never gets mapped.
- Even when parsing just a single page with a single line in the order, I noticed that the row is sometimes not in the same position: if I map Row=2, Column=3 on one page, on another page the same field might be at Row=3, Column=3, and I get an empty value. It seems that, although it isn’t visible when looking at the PDF, the content shifts by one row from time to time. Any creative ideas on how to tackle this?
- A bonus question: is there a way to sort the Excel sheet by SKU, for example using Make.com?
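For the multi-page, multi-row and shifting-row problems, one option is to stop relying on fixed row numbers and locate cells by their labels instead: on each page, find the row that holds the column headers, read every following row as a line item until the SKU cell is empty, and look up order-level fields by the label next to them. A minimal Python sketch, assuming each page has already been read (for example from Body/Document/Page/Row/Column) into a list of rows of cell text; the labels `"SKU"` and `"PO Number"`, the function names and the sample data are hypothetical placeholders for whatever the real purchase orders use:

```python
from typing import Optional

Page = list[list[str]]  # one page: rows of cell text, top to bottom


def find_header(page: Page, label: str) -> Optional[tuple[int, dict[str, int]]]:
    """Return (row index, {header text: column index}) for the first row containing label."""
    for r, row in enumerate(page):
        cells = [c.strip() for c in row]
        if label in cells:
            return r, {text: c for c, text in enumerate(cells) if text}
    return None


def value_right_of(page: Page, label: str) -> str:
    """First non-empty cell to the right of label, whichever row it landed on."""
    for row in page:
        cells = [c.strip() for c in row]
        if label in cells:
            for text in cells[cells.index(label) + 1:]:
                if text:
                    return text
    return ""


def line_items(pages: list[Page], sku_label: str = "SKU",
               po_label: str = "PO Number") -> list[dict[str, str]]:
    """One record per SKU line, across all pages (one page = one order)."""
    items = []
    for page in pages:
        found = find_header(page, sku_label)
        if found is None:
            continue  # no line-item table on this page
        header_row, columns = found
        po = value_right_of(page, po_label)
        for row in page[header_row + 1:]:
            cells = [c.strip() for c in row]
            if not any(cells):
                continue  # skip blank rows that push the content down
            sku_col = columns[sku_label]
            if sku_col >= len(cells) or not cells[sku_col]:
                break  # first row without a SKU ends the table
            record = {"PO": po}
            for name, col in columns.items():
                record[name] = cells[col] if col < len(cells) else ""
            items.append(record)
    return items


pages = [
    [["PO Number", "4501"], [], ["SKU", "Description", "Qty"],
     ["A-100", "Widget", "2"], ["B-200", "Gadget", "1"], ["", "Total", "3"]],
    [[], ["PO Number", "4502"], ["SKU", "Description", "Qty"],
     [], ["C-300", "Bolt", "10"]],
]
for item in line_items(pages):
    print(item)
# {'PO': '4501', 'SKU': 'A-100', 'Description': 'Widget', 'Qty': '2'}
# {'PO': '4501', 'SKU': 'B-200', 'Description': 'Gadget', 'Qty': '1'}
# {'PO': '4502', 'SKU': 'C-300', 'Description': 'Bolt', 'Qty': '10'}
```

Because each value is found relative to its label rather than by a fixed index, an invisible extra row that pushes the table down no longer produces an empty cell, and every SKU line on every page yields its own record, ready to become one Excel row.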
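For the bonus question, it may be simpler to sort the collected line items once, before they are written, than to sort the sheet afterwards. A minimal Python sketch, assuming each line item is a dictionary with a `SKU` field (the field name, the helper names and the sample SKUs are hypothetical); it uses a "natural" sort so the numeric parts of a SKU compare as numbers:

```python
import re


def natural_key(sku: str) -> list:
    """Split a SKU into text and number chunks so 'A-2' sorts before 'A-10'."""
    # re.split with a capture group alternates text, digits, text, ...
    # so odd positions are always digit runs and can be compared as ints.
    parts = re.split(r"(\d+)", sku)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def sort_by_sku(rows: list[dict[str, str]], sku_field: str = "SKU") -> list[dict[str, str]]:
    """Return a new list of line items ordered by SKU (missing SKUs sort first)."""
    return sorted(rows, key=lambda row: natural_key(row.get(sku_field, "")))


rows = [{"SKU": "A-10"}, {"SKU": "B-1"}, {"SKU": "A-2"}]
print([r["SKU"] for r in sort_by_sku(rows)])  # ['A-2', 'A-10', 'B-1']
```

A plain text sort would put `A-10` before `A-2`; if the SKUs are fixed-width codes, `sorted(rows, key=lambda r: r["SKU"])` is enough.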
That’s it; I think it’s already too much for one post…