Have you ever wondered how to automate invoice data extraction directly from PDFs to Google Sheets without manual input? If you’re managing multiple invoices, This tutorial will guide you through setting up an automation in Make.com that extracts specific data from a PDF stored in Google Drive using Dumpling AI, ChatGPT and Google Sheets.
Here’s how I set up this automation:
Step 1: Setting Up the Trigger – Watch Files in a Google Drive Folder
To begin, we need Make.com to automatically detect when a new invoice PDF is uploaded to a Google Drive folder. This is the trigger that starts the automation.
- In your Google Drive, create a dedicated folder for invoice PDFs (e.g., “Invoices”).
- In Make.com, start by adding the Watch Files in a Folder module under Google Drive.
- Folder: Select the folder you just created.
- Watch for: Set this to Created (so the automation triggers when a new file is uploaded).
- File Types: Choose “All” to ensure it watches for all types of files.
What this does is monitor the folder so that each time you add a new PDF invoice, Make.com gets notified and starts the automation.
Step 2: Retrieving the New File from Google Drive
Now that Make.com knows when a new file is uploaded, the next step is to retrieve that file for processing.
- Add the Download a file module from Google Drive.
- In the module’s settings:
- File ID: Map this to the output of the previous “Watch Files” module. It will dynamically retrieve the file that triggered the automation.
At this point, Make.com is retrieving the actual invoice PDF from Google Drive, getting it ready for text extraction.
Step 3: Converting the PDF to Text with Dumpling AI
The next step is to extract the text from the PDF so that we can pull out important details like the invoice number, client name, and amounts. When working with invoices or other PDF documents in automation, the content format can vary widely. Some PDFs have searchable text, while others are image-based, making extraction a challenge. Dumpling AI offers two powerful modules to handle both scenarios in Make.com: Convert PDF to Text and Extract Data from PDF with AI. Here’s how you can choose the best one for your needs and why the Extract Data from PDF with AI module might be your top choice for comprehensive accuracy.
Scenario 1: Text-Based PDF Invoices
If your PDF contains searchable text (where you can highlight and search for text), Dumpling AI’s Convert PDF to Text module is a great option. This module extracts the text from the PDF, allowing you to pass the data to tools like ChatGPT or other AI models to structure it as needed.
Here’s how it works:
- Convert PDF to Text: The module reads the contents of the PDF and converts it into plain text.
- Extract Structured Data: After converting the PDF, you can use AI tools like ChatGPT to extract specific values such as invoice numbers, client names, and totals.
This method is effective for clean, text-based invoices, but it may fail if the invoice contains images or complex layouts where text extraction might jumble the data.
Scenario 2: Image-Based or Complex PDFs
Now, what happens when your PDF is more complex, such as image-based invoices where the text cannot be highlighted or searched? In these cases, the Convert PDF to Text module won’t work. This is where Dumpling AI’s Extract Data from PDF with AI comes in.
Extract Data from PDF with AI uses a vision-based approach, leveraging multimodal AI to read and extract data from both text and images within a PDF. It’s a more reliable option for handling various types of invoices, especially when:
- The PDF is entirely image-based.
- The PDF contains a mix of text and images.
- Traditional text extraction methods fail to structure the data correctly.
Why Choose “Extract Data from PDF with AI”?
Here are the advantages of using the Extract Data from PDF with AI module:
- Handles All PDF Formats: Whether the invoice is text-based, image-based, or a combination, this module can extract data reliably using vision-based AI.
- Increased Accuracy: Unlike the text extraction method, which may struggle with messy or jumbled text, the AI-driven extraction can handle complex layouts and ensure the correct values are captured.
- Multimodal AI Power: Traditional OCR methods have their limits, but this module uses modern AI capable of recognizing and extracting data from both text and images, making it much more versatile.
- Minimal Setup: The configuration is simple. You provide the PDF and a basic prompt (like instructing it to find the total amount, invoice number, etc.), and the module does the rest.
For our Scenario, we will be using the Dumpling AI Extract Data from PDF with AI Module, Here is how it is Structured:
- Add the Dumpling AI Extract Data From PDF with AI module .
- In the settings:
- File Input Method: Choose base64. This method allows Make.com to handle the PDF file as encoded text, which is necessary for extraction.We used this method because our file is in a data format, you can also use the URL option.
- File: Map the file data from the “Download a File” module. To do this, use the base64 function in Make.com to encode the file before sending it to the next step. This encoding ensures the file is converted into a format that the Dumpling AI can process.
- Add Your Prompt: For example, ask it to extract fields like invoice number, billed amount, and items. Here is the prompt used for this scenario:
Extract the following details from the invoice text:
-
Invoice Number
-
Billed To
-
Item Description
-
Rate
-
Amount
Return the data in JSON format like this:
{
“Invoice Number”: “”,
“Billed To”: “”,
“Items”: [
{“Item”: “”, “Description”: “”, “Rate”: “”, “Amount”: “”}
]
}
Set JSON Mode to “Yes”: In the module’s settings, switch the Mode to “Yes” under the “JSON Mode” option.This ensures that the extracted data is returned in a structured JSON.
If you’re working with simple, text-based PDFs, the Convert PDF to Text module will do the job. But for maximum flexibility and accuracy, especially with complex or image-heavy invoices, the Extract Data from PDF with AI module is your best bet. It ensures that your PDF data extraction is robust, no matter what type of document you’re working with.
Step 5: Parsing the JSON Data
The data Dumpling AI Extract Data from PDF with AI module returns is structured as JSON. Now we need to parse this JSON so that we can map it to specific columns in Google Sheets.
- Add the Parse JSON module.
- In the settings:
- JSON String: Map this to the output from the Dumpling AI module.
- Save the module.
This will break the JSON down into individual data points (invoice number, billed to, etc.) for easy handling.
Step 6: Use an Iterator to Handle Multiple Invoice Items
Invoices often have multiple line items. To handle this, we need to iterate over each item.
- Add the Iterator module.
- Array: Map the array of items (like product descriptions, rates, and amounts) from the parsed JSON output.
This step breaks down the items in the invoice, allowing you to process each one individually.
Step 7: Add Data to Google Sheets
Finally, it’s time to send the extracted data to Google Sheets.
- Add the Add Row module from Google Sheets.
- Spreadsheet: Select the sheet where you want the data.
- Mapping: Map each piece of data (Invoice Number, Billed To, Item, Description, Rate, and Amount) to the appropriate columns.
Wrapping Up
With this workflow, you can drop a PDF invoice into Google Drive, Dumpling AI and Make.com will automatically extract the details and organize them in Google Sheets. The iterator ensures all line items are handled, saving you from manual data entry for every item!