How to PDF into openAI (Solution!)

Ok, after much messing about I managed to figure it out, as searching wasn’t helping.
Hopefully this helps others out there.

In summary; use openAI upload module, but make sure the file name has the correct extention.
Then you need to move this file into a Vector Store that you have already created (you could create via an api call if you wanted too).

Then when you use the openAI assistant module, you use file search feature.
It only works with GPT-4, and file search only works with vector store.

https://platform.openai.com/docs/assistants/tools/file-search
(Correct of 28/06/2024)

There are a few specific things I learn’t along the way.

The main missing step that took me a while; is when you upload a file, you cannot choose which store to save it in. Maybe a feature request for make team. And if the file is not in a store, you cannot search within it!

In my example I’m looking for new pdf files in a folder on oneDrive, I’m sure this also works for Google and other sources.

Step 1:
Use HTTP to get the file info. I passed the Download URL in my case

Step 2:
Use the openAI upload module. Set purpose to Assistants (as this is what we will use later)

But this is where I had made a mistake. Make sure you use map and the file name has the correct extension, like dot pdf. You can check if this works by loggin into your storage setting in open ai playground https://platform.openai.com/storage

File extensions supported:
https://platform.openai.com/docs/assistants/tools/file-search/supported-files

I made this mistake early on and didn’t release it was an issue until later. As you can see if you used the HTTP file name, it end up like the bottom file in the above screen shot. This causes an issue later when you want to move it as the file extension is not allowed i.e. its ‘none’

Step 3:
Create JSON to help with moving the file. This can be skipped as you can just type this in directly later

image

Step 4:
If you don’t already have one, make a vector store in openAI

[Edit: if you have an expiry date then openAI will expire the store and your automation will fail!, so set to never expire, this also means as the storage becomes larger it will cost you more, so make sure you remove files you do not need!]

Copy the ID, will need it later

Step 5:
Move the file into the vector store by making an openAI API call

For the URL you need to follow this: https://platform.openai.com/docs/api-reference/vector-stores-files/createFile

Which in make.com would be: v1/vector_stores/{vector_store_id}/files
So in the URL replace the vector store id with the ID you pasted. (The bit in bold)

You will need to add the extra header too
Then for body add the JSON output, or just paste the code.
Which should be the mapped file id from the upload openAI module

{"file_id":"FILE ID"}

You can check if this bit works after running, by going into your playground store and seeing if the file is in there

Step 6:
Now you can run your openAI assistant

Make sure you have file search0 turned on (not sure if you need this step as there is an option in openAI module, but I had it on.

If you get 404 error, check your URL.

Step 7:
Point to your assistant
In your message, point to the pdf you want it to reference. Since your store can hold multiple files, you need to tell it which one.

In tools make sure file search is turned on.

Then make sure you point to your vector database, either paste the ID, or manually select it by the name you gave it earlier.

So this will tell your assistant to search within a specific file, and also where this file can be found i.e. which vector database.

Note:

Repeating again: If you have an expiry date on vector store then openAI will expire the store and your automation will fail!, so set to never expire, BUT this also means as the storage becomes larger it will cost you more, so make sure you remove files you do not need!

Of course don’t forget to add in any error handling.

Hope this helps. I had wished someone told me about all of this.

It would make sense of the make.com team made a video on this.

You may also want to use the openAI module called “Transform Text to Structured data”
(If all else fails, maybe worth looking at specific pdf extract options from other companies)

16 Likes

Congratulations on being the first to figure it out and make a showcase about it! :smiley:

Thanks for sharing it with the community!

samliewrequest private consultation

Join the Make Fans Discord server to chat with other makers!

4 Likes

Thank you very much.
I noticed you have helped on many of the similar posts and found your input helpful too.

I believe you are much more clued up then I am so keep up the great support you are giving :+1:

I know my solution won’t be prefect, so feel free to give any additional feedback (that applies to anyone else too)

3 Likes

I think the Create JSON with the file_id might not be necessary, and you should be able to just type this in the OpenAI API module:

{
  "file_id": "{{7.file_id}}"
}

Like this:

Screenshot_2024-06-28_230637

:smiley:

samliewrequest private consultation

Join the Make Fans Discord server to chat with other makers!

oh yes, 100%. I think I mentioned that. :slight_smile:
But will be sure to implement it.

1 Like

Amazing work @Mit. I’m sure this will be very useful to a lot of people.

1 Like

Great Efforts @Mit :100:

It going to be really helpful for makers.

Regards,
Msquare Automation - Gold Partner of Make

Free Consultation | Live Implementation

Visit us here | Youtube Channel

1 Like

Great job dude! Im sure this post will be usefull for a long time for many people.

1 Like

Thank you, this is exactly what I was looking for!
I’m new to OpenAI in make.com, so a newbie question: I have to buy credits to make this work, there’s no way to connect make.com to my ChatGPT Plus plan, correct?

This is correct.

Do not buy ChatGPT Plus, it doesn’t work with Make

“ChatGPT Plus” and “OpenAI GPT-4 models” are two separate products.

You might have bought the consumer chat “Plus” version at chat.openai.com, which is NOT compatible with Make.

Make uses the “OpenAI GPT-4 models”, only accessible via the OpenAI developer platform.

OpenAI APIs for developer (commercial) use does not have any free plan.

To resolve this issue,

  1. You can buy credits on the OpenAI Developer Dashboard, under Accounts > Billing page

  2. Next, go to the Usage Limits page, and set the Monthly budget field to the same as the value above (e.g.: 120), then save your changes.

More Information

For more information, see

samliewrequest private consultation

Join the Make Fans Discord server to chat with other makers!

3 Likes

Seems @samliew has answered this above. So what he said!

It’s different to the monthly plan, and is pay per use. So depends on how and what you use.

https://openai.com/api/pricing/

@Mit How can i automatically remove the files because i have bunch of files. so deleting manually is not a right solution.? and currently this solution if just for one file? we have to upload it manually?
Can you make a video on that too?

@Mit in Step 6: what should be instructions.

In red colour line like give the name of the File or anything?

Welcome to the Make community!

Make doesn’t have an OpenAI module for this particular endpoint (Delete a file).

If the external service has a Developer API Reference/Documentation then you should be able to integrate the endpoints in Make using the app’s universal module (Make an API call) or generic HTTP “Make a request” module.

Screenshot_2024-07-29_120759 (2)

You can also suggest for it to be made in the Idea exchange. Don’t forget to search for it first, just in case someone already suggested it, so that you don’t end up creating a duplicate.

If you need assistance in setting up the app’s universal module, or the generic HTTP module, please provide additional information about what you have tried with regards to the external service’s Developer API Reference – how you are setting the connection up, a link to the endpoint are you trying to connect to, and what errors you are encountering.

You can also complete this brand new course/tutorial in the Make Academy on how to use external APIs — API calls with HTTP modules

  • API and Endpoints
  • Header and body
  • Multipart/form-data
  • OAuth 2.0

samliewrequest private consultation

Join the Make Fans Discord server to chat with other makers!

in Step 6: what should be instructions.

In red colour line like give the name of the File or anything?

No, the file is not done in step 6. The first “red line” is the assistant ID, which is redacted. The second “red line” is probably just a private instruction in the prompt that can be omitted.

The file is referenced in step 7 using the vector store it was uploaded to:

Screenshot_2024-07-29_130717

@Med_FutureXAI If you have a new question in future, please create a new thread.

While it’s tempting to continue an existing thread, a more effective approach would be to start a new topic. It helps other community users to respond to your query, and keeps our space organised for everyone. If you start a new conversation you are also more likely to get help from other users. You can refer others back to a related topic by including that link in your question. Thank you for understanding and keeping our community neat and tidy.

The “New Topic” link can be found in the top-right of the header:

samliewrequest private consultation

Join the Make Fans Discord server to chat with other makers!

2 Likes

@Med_FutureXAI Samliew reply is correct. I just redacted some info.

My prompt was asking something’s specific for a client, which I didn’t want to make public.

Also yes, as above you need to use openAI api call to delete the file. There should be details on the openAI platform in addition to what has been said.

2 Likes

you can even do it like this:
Scherm­afbeelding 2024-08-10 om 11.24.38

Use the body id, of the uploaded file.

1 Like

Thanks a lot for sharing this scenario!

I was able to read pdfs in my scenarios. Now I am facing a problem: It seems, that previous PDFs are getting analyzed and therefore the output is completely wrong.
How can I make sure only the newest uploaded files are getting analyzed??

Could it be, that I would have to delete all previous files? So adding the Delete-file mentioned above at the end of my scenario?!

Thanks in advance!

Glad it helped.

I guess you make it so you specifically reference and name of just that file.

OR

In your vector storage, just store that one file. So every time your scenario is completed, it deletes the file. So start with an empty storage, and as part of your scenario, you upload read, etc, then delete it after. This will also mean you don’t end up paying for storage you don’t really need.