Automation for PDF extraction

Francisco_Reis · July 23, 2024, 1:52pm

Hello everyone,

I’m looking to develop an automation process to extract specific information from 15 to 30 pages within PDF documents. The challenge is that the required values are not consistently located on the same page or in the same place. Using a PDF parser tool like pdf.co to extract this information might become expensive as I’d need to parse the entire file.

I’ve considered uploading the file to OpenAI in order to train an agent to extract this information; however, it seems that OpenAI doesn’t have direct access to the PDF and provides outputs based on the examples I provide. I also attempted to convert the full PDF to text, but it becomes too large to create a prompt for OpenAI or other similar platforms. When I divided it into chunks, I encountered difficulties in using all the chunks to prompt the agent.
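One possible way around the size problem, sketched here under assumptions rather than as a tested recipe: keep the text split per page, send only the pages that mention the labels you are looking for, and chunk whatever is still too long. The function names, the keyword list, and the character limit below are all hypothetical; the per-page text is assumed to come from whatever PDF-to-text step is already in place.

```python
import re


def select_relevant_pages(pages, keywords, context=0):
    """Return sorted indices of pages that mention any keyword.

    `pages` is a list of strings, one per PDF page (assumed to come from
    an existing text-extraction step). Matching is case-insensitive.
    `context` also keeps that many neighbouring pages on each side.
    """
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = set()
    for i, text in enumerate(pages):
        if pattern.search(text):
            lo = max(0, i - context)
            hi = min(len(pages) - 1, i + context)
            hits.update(range(lo, hi + 1))
    return sorted(hits)


def chunk_text(text, max_chars=8000, overlap=200):
    """Split text into overlapping chunks of at most max_chars characters."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        # Step forward, repeating the last `overlap` characters so a value
        # split across a chunk boundary still appears whole in one chunk.
        start += max_chars - overlap
    return chunks
```

Each chunk can then be sent as its own request with the same instructions, and the partial answers merged afterwards, so no single prompt has to hold the whole document.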

Does anyone have any suggestions on how to create a solution for this, or how to improve the interaction with OpenAI’s assistance?

Any help would be greatly appreciated.

Thanks

samliew · July 23, 2024, 3:19pm

You’ll need to set up a custom assistant and use the Message an Assistant module to use previously uploaded files.

For more information, see

https://community.make.com/t/how-to-pdf-into-openai-solution/43811

Francisco_Reis · July 23, 2024, 3:34pm

I did this, but the assistant doesn’t reply based on the content I submit; it replies with the examples I provided in the instructions. Do you know what could be wrong?

Msquare_Automation · July 24, 2024, 2:13am

Hi @Francisco_Reis

Ensure that you have set up the system, assistant, and user messages correctly. We have done a video on PDF extraction here.

If you need any setup guidance, don’t hesitate to reach out to us.

Regards,
Msquare Automation - Gold Partner of Make

Mit · July 24, 2024, 8:52am

Besides the solution I posted here: How to PDF into openAI (Solution!)

I’ve found the pdf.co module a lot better for extracting PDF data, as it’s set up specifically for that.

Also, Make’s OpenAI app has a module for producing structured JSON output, which is worth noting too.
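If the structured JSON route is used, it can help to check the reply before passing it further down the scenario. The snippet below is only a sketch of that idea: the field names (`invoice_number`, `total_amount`) and the function name are hypothetical placeholders, not anything defined by Make or OpenAI.

```python
import json

# Hypothetical fields and expected types; replace with the values you need.
REQUIRED_FIELDS = {
    "invoice_number": str,
    "total_amount": (int, float),
}


def parse_extraction(raw):
    """Parse the model's JSON reply and verify the required fields.

    Raises ValueError if the reply is not a JSON object, or if a
    required field is missing or has an unexpected type.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for field: {name}")
    if problems:
        raise ValueError("; ".join(problems))
    return data
```

A failed check is a useful signal that the assistant answered from its instructions or examples instead of from the uploaded file.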

Francisco_Reis · July 24, 2024, 8:54am

Hello
thanks for the reply!

I used your method, but in the last part, when I message the assistant, it doesn’t give me the information from the PDF I just uploaded.

Mit · July 24, 2024, 8:58am

Check the back end of the playground and see what happened.

Which step did it fail at? I need more specific info. It seems it can’t find the PDF file.

Is the PDF file being moved to the vector store?
Is the correct vector store being accessed?
Do you have enough credits in your account?
Are you using GPT-4?

Francisco_Reis · July 24, 2024, 10:53am

I don’t know. The setup is exactly like yours.
The PDF is in the vector store, and I do have enough credits in my account. I selected the right vector store, and the assistant was created with GPT-4o. But the output is still not right.

Mit · July 24, 2024, 1:09pm

If you give the same input and PDF in the Playground, do you get the output you want?

Essentially, if the Playground works, then all you are doing via Make is automating it, and the hardest step is for it to find the PDF, which you say is there.

PDF.co · July 25, 2024, 9:58pm

Hello Francisco_Reis,

To accurately parse specific pages from your PDF file using the PDF.co Document Parser, please use the pages parameter. By specifying the pages you want to extract, the Document Parser will focus only on those pages instead of processing the entire file. For more details on how to use the pages parameter, please visit our documentation at the following link: API Docs
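As a rough illustration of preparing the pages value, the sketch below turns a list of human-readable page numbers into a compact range string. It assumes the parameter accepts comma-separated indices and ranges with the first page at index 0; the payload field names (`url`, `templateId`, `pages`) and both function names are assumptions to verify against the API Docs before use. No request is sent.

```python
def pages_param(page_numbers, first_index=0):
    """Build a compact pages string such as "0,2-4,8".

    `page_numbers` are 1-based, as shown in a PDF viewer. `first_index`
    is the index the API gives to the first page (assumed 0 here).
    """
    if any(n < 1 for n in page_numbers):
        raise ValueError("page numbers start at 1")
    indices = sorted({n - 1 + first_index for n in page_numbers})
    ranges = []
    start = prev = None
    for i in indices:
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i  # extend the current run of consecutive pages
        else:
            ranges.append((start, prev))
            start = prev = i
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def build_parser_request(file_url, template_id, page_numbers):
    """Assemble a request body (field names are assumptions, see above)."""
    return {
        "url": file_url,
        "templateId": template_id,
        "pages": pages_param(page_numbers),
    }
```

For example, `pages_param([1, 3, 4, 5, 9])` gives `"0,2-4,8"`, so only five pages are parsed instead of the whole file.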

If you have any questions or need further assistance, please let us know.

Msquare_Automation · July 26, 2024, 1:14am

Hi @Francisco_Reis

You can check the detailed demo of PDF extraction here.

Regards,
Msquare Automation - Gold Partner of Make
