Data extraction from many files, different formats

:bullseye: What is your goal?

I need to automate data extraction from about 150 different files I receive monthly. The files are in different formats.

:thinking: What is the problem?

Hello. I’m new to this platform.

I’m trying to automate a workflow where I extract data from about 150 different files I receive monthly. These files are in xls and pdf. They are sent by different organizations, and their formats are not standardized.

I created a dictionary for the data, in the exact structure of the table in the screenshot attached to this message.

I’ve been trying to automate this with ChatGPT, by using a customized GPT I made for this task.

I’ve been running tests by sending it files in different formats and seeing what it gets right. The biggest problem is that it often fails to locate pieces of data in the files.

In the beginning it did dumb things, like taking information from the wrong column. I made sure to add clear instructions on how to go about this and it does much better now.

But still, it often flat-out ignores obvious pieces of data that I instructed it to extract. The files are very much readable. I don’t see why there’s an issue reading the xls ones, and most of the data in the pdfs is easily discernible.

The labels of the data vary across files, which is why I made the dictionary.
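To illustrate what I mean by the dictionary, here is a minimal Python sketch of the idea: each canonical field maps to the label variants that appear in different files. The field names and variants below are hypothetical placeholders, not my real table, and `canonical_field` is just an illustrative helper.

```python
import re

# Hypothetical canonical fields and label variants; the real dictionary
# follows the table in the attached screenshot.
LABEL_DICTIONARY = {
    "invoice_number": ["Invoice No.", "Invoice #", "Inv. Number"],
    "total_amount": ["Total", "Amount Due", "Total Amount"],
    "issue_date": ["Date", "Issue Date", "Date of Issue"],
}


def normalize(label: str) -> str:
    """Lowercase a label and reduce punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


# Reverse lookup: normalized variant -> canonical field name.
VARIANT_TO_FIELD = {
    normalize(variant): field
    for field, variants in LABEL_DICTIONARY.items()
    for variant in variants
}


def canonical_field(label: str):
    """Return the canonical field for a label found in a file, or None if unknown."""
    return VARIANT_TO_FIELD.get(normalize(label))
```

With a lookup like this, `canonical_field("  AMOUNT due: ")` resolves to `"total_amount"`, while a label that is not in the dictionary returns `None` and can be flagged for review instead of being silently skipped.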

I’ve tried countless prompt iterations, and it still doesn’t work well. That’s why I signed up to Make today. This whole thing is driving me insane.

Any guidance on this would be much appreciated.

:camera_with_flash: Screenshots: scenario setup, module configuration, errors

Hey @Breno_Barbosa, I have just two questions. Is it having trouble reading only the xls files? And are the files only in XLS and PDF formats?

Both. It runs into the same issue of not finding some pieces of information regardless of the file type. In most cases it does find most of the pieces of data.

And yes, it’s only xls and pdf.
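To make "not finding some pieces of information" measurable, one option is to check every extracted record against the list of required fields and log what is missing per file. This is only a rough sketch: `REQUIRED_FIELDS` and the record shape are assumptions for illustration, not my actual setup.

```python
# Hypothetical list of fields every file is expected to yield.
REQUIRED_FIELDS = ["invoice_number", "total_amount", "issue_date"]


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in one extracted record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if record.get(field) is None or str(record[field]).strip() == ""
    ]


def report(records_by_file: dict) -> dict:
    """Map each file name to its missing fields, keeping only files with gaps."""
    gaps = {name: missing_fields(rec) for name, rec in records_by_file.items()}
    return {name: fields for name, fields in gaps.items() if fields}
```

Running `report` over a month's batch would show whether the same fields go missing across files or whether the gaps are tied to particular senders.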