Slicing Arrays and collections

OmriPe · February 4, 2025, 1:21pm

Hey guys,

I have a project where I’m trying to take a PDF purchase order and extract all the items into an Excel file. I tried this with ChatGPT and Kimi, but they make too many mistakes and invent data, so I decided to do it the old-fashioned way, which means:

1. Use PDF.co to convert the PDF to JSON.
2. Slice the JSON file. For example, I know that the relevant information only starts at line 7 (see below), so I want to remove lines 1-6.
3. Merge the pages so I don’t have to worry about items spilling over between pages (it’s happening).
4. Send the cleaned JSON to ChatGPT to receive a slightly manipulated JSON.

P.S. I cannot send the raw JSON to ChatGPT since it’s above the character limit (278,097 characters), and that’s why I want to slice it first.

Here is the PDF2JSON Output:

Here is the info expanded:

Here is what the PDF looks like (the relevant item lines):

Can anyone point me to how to:

Merge the pages
Slice off the unnecessary rows…

Many thanks!

Ronak_Bhagdev · February 4, 2025, 2:03pm

@OmriPe After the PDF.co module, add an Iterator to run through the pages, and then use an Array aggregator.
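
For anyone who wants to see the iterate-then-aggregate logic spelled out, here is a minimal Python sketch of the same idea outside Make. It is only an illustration: the "pages" and "rows" key names, the sample data, and the merge_pages helper are assumptions, not the actual PDF.co output format, and it assumes the six unwanted lines repeat at the top of every page.

```python
import json

# Hypothetical shape: the real PDF.co output may use different key names.
sample = {
    "pages": [
        {"rows": [f"header {i}" for i in range(1, 7)] + ["item A", "item B"]},
        {"rows": [f"header {i}" for i in range(1, 7)] + ["item C"]},
    ]
}

def merge_pages(doc, skip_rows=6):
    """Drop the first `skip_rows` rows of each page, then join all pages into one list."""
    merged = []
    for page in doc["pages"]:                    # the "Iterator" step
        merged.extend(page["rows"][skip_rows:])  # slice, then aggregate
    return merged

items = merge_pages(sample)
print(items)                   # ['item A', 'item B', 'item C']
print(len(json.dumps(items)))  # size of the cleaned payload in characters
```

Checking the length of the serialized result is a quick way to confirm the cleaned JSON fits under the character limit before sending it on.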

OmriPe · February 4, 2025, 3:26pm

@Ronak_Bhagdev Thanks, it’s a good start, and I’m already doing that, but I still need to figure out how to do the same on the inner collections…
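
The inner-collections case can be pictured as one more nested loop: iterate pages, then rows, then the cells inside each row. The Python sketch below shows that logic under assumed names; the pages -> rows -> columns -> text nesting, the sample values, and the extract_item_rows helper are all hypothetical and would need to be matched to the real output.

```python
# Hypothetical nested shape: pages -> rows -> columns -> text.
# The real PDF.co key names may differ; adjust them to your output.
pages = [
    {"rows": [
        {"columns": [{"text": "PO header"}]},
        {"columns": [{"text": "SKU-1"}, {"text": ""}, {"text": "2"}]},
    ]},
    {"rows": [
        {"columns": [{"text": "PO header"}]},
        {"columns": [{"text": "SKU-2"}, {"text": "5"}]},
    ]},
]

def extract_item_rows(pages, skip_rows=1):
    """Skip the leading rows of each page and keep only non-empty cell texts."""
    item_rows = []
    for page in pages:                        # outer iteration over pages
        for row in page["rows"][skip_rows:]:  # slice off the unwanted rows
            # inner iteration over the cells of one row
            cells = [col["text"] for col in row["columns"] if col.get("text")]
            if cells:                         # drop rows with no text at all
                item_rows.append(cells)
    return item_rows

print(extract_item_rows(pages))  # [['SKU-1', '2'], ['SKU-2', '5']]
```

In Make terms this would roughly correspond to a second Iterator on the inner array, followed by another aggregator, though the exact module setup depends on the actual structure.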
