PDF to TEXT

Davidof90 · June 1, 2023, 3:27pm

Hello,

I want to extract the raw text from a PDF file. I can fetch the file with the Google Drive module “download file”.

It gives me raw data. How can I convert it to text?

Thanks
David

Bruno_T · June 2, 2023, 11:05pm

Hey David!

Yeah, that’s not easy. Unlike simpler formats, a PDF describes page layout rather than storing plain text, so there’s no one-step way to convert it.

The way to go is to use OCR (Optical Character Recognition) software. You can search for “OCR API” and try to find some options. I’ve never used any so I can’t recommend based on experience, but I know Make has built-in apps for Google Cloud Vision and PDF4me.

OCR is not perfect, so some typos can occur, but to the best of my knowledge that’s your best (or only) bet.

I hope this at least points you in the right direction.

Davidof90 · June 5, 2023, 9:29am

Thanks Bruno, will try this!

Trond · June 7, 2023, 3:15pm

Hey David,

I used Google Cloud Vision to solve this issue. Worked like a charm and was super easy to set up.

We use it to convert invoices to text. It works at >95% accuracy. Feel free to reach out if you have questions.

Michaela · June 16, 2023, 11:53am

A post was split to a new topic: Read and analyze email content and PDFs using Chat GPT

Michaela · July 7, 2023, 8:10am

2 posts were split to a new topic: How to configure Google Cloud Vision on Make

E11iott · July 19, 2023, 7:19am

Hey Trond, I’m trying to do something similar with invoices and Google Cloud Vision.

I’ve got GCV working and to some extent Text Parsing working using RegEx. But I’m intrigued by what you’re doing to get the invoice data out of the files and into the format that you’re using.

How are you getting things like invoice date, name, total etc?

Would you mind sharing more of your scenario and how it works?
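
For reference, the RegEx part of a setup like this might look like the minimal Python sketch below. The sample OCR text, the field names and the patterns are all invented for illustration; they are not taken from any real scenario and would need tuning for each invoice layout.

```python
import re

# Hypothetical OCR output; real invoices vary a lot in layout.
ocr_text = """ACME GmbH
Invoice No: INV-2023-001
Invoice Date: 05.06.2023
Total: 1190.00 EUR
"""

# Illustrative patterns only (assumptions, not a general solution).
PATTERNS = {
    "number": re.compile(r"Invoice\s+No\.?:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{2}\.\d{2}\.\d{4})\b"),
    "total": re.compile(r"Total:?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each pattern, or None if it is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(ocr_text)
print(fields)
```

Patterns like these break easily, for example when a "Subtotal" line appears before the total.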

Trond · July 19, 2023, 8:05am

Thanks for reaching out.

Here are some screenshots. I hope they help.

Set-up of Text parser (replace):

Set-up of ChatGPT:

Prompt I am using to extract the data:
"I want you to act as an accountant. The company you are working for is called YOUR COMPANY NAME. You are tasked with handling the inbound invoices your company receives. Your task is to extract specific data from each inbound invoice for further processing.

You will output your results into a result-array.

Here is the Text:{{26.text}}

The data you are looking for is:
Invoicing Company name (i.e. the company which has issued the invoice),
Invoice Date (i.e. the date on which the invoice was issued),
Invoice Total Amount (i.e. the total amount to be paid including VAT, excluding any currency signs; use the euro amount if two options are available),
Invoice Number (i.e. the unique identifier of this invoice),
Invoice Email (i.e. the email of the invoicing company),
Invoice Currency (i.e. the currency of the total amount, EUR or USD)

The invoice date is required to be formatted like dd.mm.yyyy
The invoice company can never be YOUR COMPANY NAME. If you only find this result, reconsider your answer and search again.
All values inside the result-array must be separated by §

The result-array [Value of invoice company§value of invoice date§value of Invoice Total amount§value of invoice number§value of invoice email§value of invoice currency]

Print result-array"
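
Once ChatGPT returns the result-array, the next step has to split it on § again and map the values to fields. Outside of Make, that could look roughly like this minimal Python sketch. The `Invoice` class, the `parse_result_array` function and the sample values are made up for illustration, and it assumes the total uses a dot as the decimal separator with no thousands separators.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class Invoice:
    company: str
    invoice_date: date
    total: Decimal
    number: str
    email: str
    currency: str


def parse_result_array(raw: str) -> Invoice:
    """Split a '§'-separated reply into typed invoice fields."""
    # Drop surrounding whitespace and the optional [ ] wrapper.
    body = raw.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    parts = [p.strip() for p in body.split("§")]
    # The prompt asks for exactly six values in a fixed order.
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    company, day, total, number, email, currency = parts
    return Invoice(
        company=company,
        # The prompt requires the dd.mm.yyyy format.
        invoice_date=datetime.strptime(day, "%d.%m.%Y").date(),
        # Assumption: dot as decimal separator, e.g. "1190.00".
        total=Decimal(total),
        number=number,
        email=email,
        currency=currency,
    )


# Hypothetical model reply, for illustration only.
sample = "[ACME GmbH§05.06.2023§1190.00§INV-2023-001§billing@acme.example§EUR]"
invoice = parse_result_array(sample)
print(invoice)
```

Checking the field count before unpacking is worthwhile here, since the model may occasionally return extra text or a missing value.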

Darzk777 · August 18, 2023, 10:13am

Hi David,

To convert the raw data from the PDF file that you’ve downloaded with the Google Drive module into text, you can use the PDF.co API. PDF.co offers tools for working with PDF documents, including text extraction. We have a simple guide on how to do this: How to Extract Text from Scanned PDF using Make - PDF.co

If you have any questions or encounter any issues during the process, please don’t hesitate to reach out to us via email at support@bytescout.com. Our dedicated support team is available to provide prompt and helpful assistance.

We hope you have a fantastic day!
