Hi there, I am new to Make, but I have a fairly complex requirement that I would appreciate assistance with. I am processing invoices with the help of GPT to create structured JSON for downstream processing. The problem is that on invoices with many pages, GPT hits a limitation. My solution is to split each invoice into pages, process them one page at a time, and then aggregate the JSON results into one collection. The components of the invoice data are:
- Company details (duplicated on every page)
- Customer details (duplicated on every page)
- Charges (unique on every page and need to be aggregated)
- Totals (present on the last page)
- Hotel details (duplicated on every page)
Example of the JSON created per page:
{
  "invoice": {
    "company_details": {
      "name": "XXXX (Pty) Ltd",
      "address": "PO Box XXXX, Johannesburg, 2017, South Africa"
    },
    "customer_details": {
      "name": "De XXX, Wilhelmina XXXX",
      "vat_reg_no": "48XX025XXXX",
      "user_id": "SIHXXX",
      "company_name": "Island XXXXX",
      "ar_number": "REXXXX",
      "voucher_po_number": "X1X53X37",
      "room_no": "0511",
      "arrival_date": "25/03/24",
      "departure_date": "05/04/24",
      "folio_no": "821674",
      "invoice_no": "298510",
      "invoice_closed_date": "05/04/24",
      "no_of_guests": 1,
      "confirmation_no": [
        "51995064",
        "21763747",
        "20609768"
      ],
      "vat_reg_no_company": "4010113001"
    },
    "charges": [
      {"description": "Bernoulli's Dinner Food", "amount": 275.00},
      {"description": "Bernoulli's Dinner Beverage", "amount": 101.00},
      {"description": "F&B Tips (outsourced)", "amount": 24.00},
      {"description": "Accommodation", "amount": 1573.00},
      {"description": "Tourism Levy", "amount": 13.93},
      {"description": "Accommodation", "amount": 1269.00}
    ],
    "totals": {
      "total": 20648.23,
      "total_inclusive_vat": 3744.32,
      "standard_vat": 488.39,
      "net_amount": 3255.93,
      "non_supply": 165.00,
      "balance_due": 20648.23
    },
    "hotel_details": {
      "name": "Southern Sun OR Tambo Int. Airport",
      "address": "Jones Road OR Tambo International Airport, Kempton Park, 1620, GAU, ZA",
      "telephone": "+27 11 977 3600",
      "fax": "+27 11 975 5846",
      "website": "southernsun.com",
      "company_reg": "1969/001365/07"
    }
  }
}
I have managed to get Make to split the pages, send each one to GPT, and get a decent result in JSON. I am now struggling to create either an array aggregator or a text aggregator that produces one aggregate JSON object (same structure as the per-page example above) representing the original invoice document for downstream processing.
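To make the target concrete, here is a rough Python sketch of the merge logic I am after. This is not a Make scenario, only an illustration of the rules: the function name merge_invoice_pages is made up for this post, and I am assuming each page's GPT output has already been parsed into a dict and that the list is in page order. Duplicated sections are taken from the first page that has them, charges are concatenated in page order, and totals are taken from the last page that has them.

```python
import json
from typing import Any


def merge_invoice_pages(pages: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-page invoice dicts (in page order) into one invoice dict.

    Hypothetical helper for illustration only; assumes every page uses the
    same {"invoice": {...}} structure as the per-page example.
    """
    invoices = [page.get("invoice") or {} for page in pages]

    def first_present(key: str) -> Any:
        # Duplicated sections: the first non-empty copy is enough.
        return next((inv[key] for inv in invoices if inv.get(key)), {})

    def last_present(key: str) -> Any:
        # Totals only appear on the last page, so search from the end.
        return next((inv[key] for inv in reversed(invoices) if inv.get(key)), {})

    # Charges are unique per page: concatenate them, keeping page order.
    charges = [c for inv in invoices for c in (inv.get("charges") or [])]

    return {
        "invoice": {
            "company_details": first_present("company_details"),
            "customer_details": first_present("customer_details"),
            "charges": charges,
            "totals": last_present("totals"),
            "hotel_details": first_present("hotel_details"),
        }
    }


if __name__ == "__main__":
    # Two tiny made-up pages, just to show the shape of the result.
    page_1 = {"invoice": {"company_details": {"name": "XXXX (Pty) Ltd"},
                          "charges": [{"description": "Accommodation", "amount": 1573.00}]}}
    page_2 = {"invoice": {"company_details": {"name": "XXXX (Pty) Ltd"},
                          "charges": [{"description": "Tourism Levy", "amount": 13.93}],
                          "totals": {"balance_due": 20648.23}}}
    print(json.dumps(merge_invoice_pages([page_1, page_2]), indent=2))
```

What I cannot work out is how to express these same three rules (first copy, concatenate, last copy) with the aggregator modules in Make.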
Your help with this is much appreciated, and apologies if this has been comprehensively addressed elsewhere; I did go through a few other posts along the same lines.