Ok, after much messing about I managed to figure it out, as searching wasn’t helping.
Hopefully this helps others out there.
In summary; use openAI upload module, but make sure the file name has the correct extention.
Then you need to move this file into a Vector Store that you have already created (you could create via an api call if you wanted too).
Then when you use the openAI assistant module, you use file search feature.
It only works with GPT-4, and file search only works with vector store.
https://platform.openai.com/docs/assistants/tools/file-search
(Correct of 28/06/2024)
There are a few specific things I learn’t along the way.
The main missing step that took me a while; is when you upload a file, you cannot choose which store to save it in. Maybe a feature request for make team. And if the file is not in a store, you cannot search within it!
In my example I’m looking for new pdf files in a folder on oneDrive, I’m sure this also works for Google and other sources.
Step 1:
Use HTTP to get the file info. I passed the Download URL in my case
Step 2:
Use the openAI upload module. Set purpose to Assistants (as this is what we will use later)
But this is where I had made a mistake. Make sure you use map and the file name has the correct extension, like dot pdf. You can check if this works by loggin into your storage setting in open ai playground https://platform.openai.com/storage
File extensions supported:
https://platform.openai.com/docs/assistants/tools/file-search/supported-files
I made this mistake early on and didn’t release it was an issue until later. As you can see if you used the HTTP file name, it end up like the bottom file in the above screen shot. This causes an issue later when you want to move it as the file extension is not allowed i.e. its ‘none’
Step 3:
Create JSON to help with moving the file. This can be skipped as you can just type this in directly later
Step 4:
If you don’t already have one, make a vector store in openAI
[Edit: if you have an expiry date then openAI will expire the store and your automation will fail!, so set to never expire, this also means as the storage becomes larger it will cost you more, so make sure you remove files you do not need!]
Copy the ID, will need it later
Step 5:
Move the file into the vector store by making an openAI API call
For the URL you need to follow this: https://platform.openai.com/docs/api-reference/vector-stores-files/createFile
Which in make.com would be: v1/vector_stores/{vector_store_id}/files
So in the URL replace the vector store id with the ID you pasted. (The bit in bold)
You will need to add the extra header too
Then for body add the JSON output, or just paste the code.
Which should be the mapped file id from the upload openAI module
{"file_id":"FILE ID"}
You can check if this bit works after running, by going into your playground store and seeing if the file is in there
Step 6:
Now you can run your openAI assistant
Make sure you have file search0 turned on (not sure if you need this step as there is an option in openAI module, but I had it on.
If you get 404 error, check your URL.
Step 7:
Point to your assistant
In your message, point to the pdf you want it to reference. Since your store can hold multiple files, you need to tell it which one.
In tools make sure file search is turned on.
Then make sure you point to your vector database, either paste the ID, or manually select it by the name you gave it earlier.
So this will tell your assistant to search within a specific file, and also where this file can be found i.e. which vector database.
Note:
Repeating again: If you have an expiry date on vector store then openAI will expire the store and your automation will fail!, so set to never expire, BUT this also means as the storage becomes larger it will cost you more, so make sure you remove files you do not need!
Of course don’t forget to add in any error handling.
Hope this helps. I had wished someone told me about all of this.
It would make sense of the make.com team made a video on this.
You may also want to use the openAI module called “Transform Text to Structured data”
(If all else fails, maybe worth looking at specific pdf extract options from other companies)