Why this automation exists
Scheduling meetings usually looks simple, but in reality it takes time.
You check calendars, find a free slot, send invites, add agendas, and then remember to follow up. When this happens multiple times a day, it adds up.
The goal of this automation is simple:
handle meeting-related tasks using a single WhatsApp voice message. Instead of opening Gmail or Google Calendar, the user sends a voice note like:
“Book a product meeting with Jumar.”
From there, the workflow takes care of the rest.
What this automation does (in practice)
When a voice message is sent on WhatsApp, the scenario can:
- Convert the voice message into text
- Understand what the user wants (meeting or email)
- Create a Google Calendar event when needed
- Send confirmation back on WhatsApp (text + voice note)
- Send emails when the request is email-related
All actions are triggered only by the voice instruction.
Tools involved
- WhatsApp Cloud API (via Meta Developer account)
- OpenAI (for transcription and instruction parsing)
- Google Calendar
- Email service
- ElevenLabs (text-to-speech)
Scenario overview
The scenario starts by watching incoming WhatsApp messages.
This includes both text and audio messages, but this setup focuses on audio input.
1. Watching WhatsApp responses
The first module listens for new WhatsApp responses.
This requires a verified Meta Developer account and an approved WhatsApp app.
Once connected, every incoming message is captured by the scenario.
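To make this step concrete, here is a minimal sketch of the webhook side that Meta talks to when a message arrives. It is not part of the scenario itself; the Flask app and the verify token value are assumptions used purely for illustration, but the payload structure (entry → changes → value → messages) matches the WhatsApp Cloud API.

```python
# Minimal Flask webhook sketch for the WhatsApp Cloud API (illustrative only).
from flask import Flask, request

app = Flask(__name__)
VERIFY_TOKEN = "my-verify-token"  # assumption: a value you choose in the Meta console

@app.get("/webhook")
def verify():
    # Meta calls this once with a challenge to confirm the webhook URL.
    if request.args.get("hub.verify_token") == VERIFY_TOKEN:
        return request.args.get("hub.challenge", ""), 200
    return "Verification failed", 403

@app.post("/webhook")
def receive():
    payload = request.get_json()
    # Incoming messages sit under entry[].changes[].value.messages[]
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for message in change.get("value", {}).get("messages", []):
                if message.get("type") == "audio":
                    print("Audio message id:", message["audio"]["id"])
    return "OK", 200
```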
2. Downloading the audio message
If the incoming message is an audio note, it is downloaded using the WhatsApp media download step.
This audio file becomes the input for the next stage.
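If you wanted to reproduce the download outside the built-in module, it is two plain HTTP calls against the Graph API: one to resolve the media id into a temporary URL, one to fetch the bytes. The token, API version, and file name below are placeholders.

```python
import requests

ACCESS_TOKEN = "YOUR_WHATSAPP_TOKEN"        # placeholder
GRAPH = "https://graph.facebook.com/v19.0"  # assumption: pick your API version

def download_voice_note(media_id: str, out_path: str = "voice_note.ogg") -> str:
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    # Step 1: resolve the media id to a temporary download URL.
    meta = requests.get(f"{GRAPH}/{media_id}", headers=headers).json()
    # Step 2: fetch the actual audio bytes (the same token is required here too).
    audio = requests.get(meta["url"], headers=headers)
    with open(out_path, "wb") as f:
        f.write(audio.content)
    return out_path
```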
3. Converting voice to text (transcription)
The audio file is sent to OpenAI for transcription.
At this point, we now have:
- The original voice message
- The text version of what the user said
This transcript is used in all following steps.
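Wired by hand, the transcription call looks roughly like this; the model name and file path are assumptions rather than the exact scenario settings.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe(audio_path: str) -> str:
    # Send the downloaded voice note to OpenAI's transcription endpoint.
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",  # assumption: any transcription-capable model works
            file=audio_file,
        )
    return result.text

transcript = transcribe("voice_note.ogg")
```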
4. Understanding the user’s intent
A second OpenAI step is used to interpret the instruction, not just read the text.
For example:
- “Book a meeting” → calendar action
- “Send an email” → email action
Based on this interpretation, the scenario assigns a result type:
- Type 0 → Meeting-related request
- Type 1 → Email-related request
This decides which route the scenario will follow.
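A rough equivalent of this interpretation step, done directly against the OpenAI API. The model name and prompt wording are assumptions, not the exact ones used in the scenario; the point is that the model returns a small, machine-readable result type.

```python
import json
from openai import OpenAI

client = OpenAI()

INTENT_PROMPT = (
    "Classify the user's instruction. "
    'Reply with JSON like {"type": 0} for a meeting request '
    'or {"type": 1} for an email request.'
)

def classify_intent(transcript: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model
        messages=[
            {"role": "system", "content": INTENT_PROMPT},
            {"role": "user", "content": transcript},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["type"]

result_type = classify_intent(transcript)  # 0 = meeting, 1 = email
```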
Route 1: Meeting scheduling flow (Result Type 0)
When the instruction is about scheduling a meeting, the scenario:
- Calculates:
  - Start time
  - End time
  - Duration
- Creates a Google Calendar event (see the sketch after this list)
- Adds meeting details and agenda
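A minimal sketch of the calendar step using the Google Calendar API, assuming OAuth credentials are already in place. The times, timezone, and agenda text are illustrative values built around the earlier "product meeting with Jumar" example.

```python
from googleapiclient.discovery import build

# `creds` is assumed to be a valid google.oauth2 Credentials object obtained elsewhere.
service = build("calendar", "v3", credentials=creds)

event = {
    "summary": "Product meeting with Jumar",
    "description": "Agenda:\n- Roadmap review\n- Open questions",  # agenda generated by OpenAI
    "start": {"dateTime": "2024-06-12T15:00:00", "timeZone": "Europe/London"},
    "end":   {"dateTime": "2024-06-12T15:30:00", "timeZone": "Europe/London"},
}

created = service.events().insert(calendarId="primary", body=event).execute()
print("Created event:", created.get("htmlLink"))
```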
WhatsApp confirmation (text)
A WhatsApp text message is sent with:
- Meeting date and time
- Calendar confirmation
- Any additional details
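The confirmation message itself is a single Cloud API call. The phone number id, token, and recipient below are placeholders.

```python
import requests

PHONE_NUMBER_ID = "YOUR_PHONE_NUMBER_ID"  # placeholder
ACCESS_TOKEN = "YOUR_WHATSAPP_TOKEN"      # placeholder

def send_whatsapp_text(to: str, body: str) -> None:
    requests.post(
        f"https://graph.facebook.com/v19.0/{PHONE_NUMBER_ID}/messages",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={
            "messaging_product": "whatsapp",
            "to": to,
            "type": "text",
            "text": {"body": body},
        },
    )

send_whatsapp_text("15551234567", "Your product meeting with Jumar is booked for Wednesday, 3:00–3:30 PM.")
```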
WhatsApp confirmation (voice note)
To keep the interaction natural, a voice response is also sent.
Steps involved:
- OpenAI generates the confirmation text
- ElevenLabs converts text to speech
- Audio is converted to Opus format (required for WhatsApp voice notes)
- Media is uploaded and sent via WhatsApp API
An HTTP module is used here instead of the default WhatsApp module, because WhatsApp requires voice=true to deliver messages as real voice notes.
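The middle two steps (text-to-speech and the Opus conversion) look roughly like this. The voice id, model id, and ffmpeg flags are assumptions rather than the exact scenario settings; the key detail is that the final file is OGG audio encoded with Opus.

```python
import subprocess
import requests

ELEVEN_API_KEY = "YOUR_ELEVENLABS_KEY"  # placeholder
VOICE_ID = "YOUR_VOICE_ID"              # placeholder

def text_to_voice_note(text: str) -> str:
    # ElevenLabs returns MP3 audio by default.
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVEN_API_KEY},
        json={"text": text, "model_id": "eleven_multilingual_v2"},  # assumption: model choice
    )
    with open("confirmation.mp3", "wb") as f:
        f.write(resp.content)

    # WhatsApp voice notes expect OGG audio encoded with Opus.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "confirmation.mp3", "-c:a", "libopus", "confirmation.ogg"],
        check=True,
    )
    return "confirmation.ogg"
```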
Route 2: Email-related flow (Result Type 1)
If the voice instruction is about sending an email:
- OpenAI generates:
  - Email subject
  - Email body (HTML)
- The email is sent to the recipient
- A voice note confirmation is also generated and delivered
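A bare-bones version of the email branch, assuming plain SMTP as the email service (the scenario could just as well use Gmail or another provider's module). Addresses, server, and message text are placeholders; the subject and HTML body would come from the OpenAI step above.

```python
import smtplib
from email.mime.text import MIMEText

def send_email(to_addr: str, subject: str, html_body: str) -> None:
    msg = MIMEText(html_body, "html")
    msg["Subject"] = subject
    msg["From"] = "assistant@example.com"  # placeholder sender
    msg["To"] = to_addr

    # Placeholder SMTP server and credentials.
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("assistant@example.com", "APP_PASSWORD")
        server.send_message(msg)

send_email("jumar@example.com", "Follow-up from today's call", "<p>Hi Jumar, ...</p>")
```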
This keeps the experience consistent—every voice request gets a voice response.
Why HTTP is used for voice notes
The official WhatsApp module allows sending audio files, but not voice notes.
To send proper voice notes:
- The WhatsApp API requires voice=true
- This is only possible through an HTTP request following official API documentation
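Here is roughly what that HTTP request pair looks like: upload the Opus file to get a media id, then send an audio message that references it. The upload endpoint and message structure follow Meta's documented Cloud API; the voice flag and its exact placement inside the audio object are taken from the behaviour described above and are worth checking against the current API docs.

```python
import requests

PHONE_NUMBER_ID = "YOUR_PHONE_NUMBER_ID"  # placeholder
ACCESS_TOKEN = "YOUR_WHATSAPP_TOKEN"      # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
GRAPH = f"https://graph.facebook.com/v19.0/{PHONE_NUMBER_ID}"

def send_voice_note(to: str, ogg_path: str) -> None:
    # 1. Upload the Opus-encoded OGG file to get a media id.
    with open(ogg_path, "rb") as f:
        upload = requests.post(
            f"{GRAPH}/media",
            headers=HEADERS,
            data={"messaging_product": "whatsapp", "type": "audio/ogg"},
            files={"file": ("confirmation.ogg", f, "audio/ogg")},
        ).json()

    # 2. Send the audio message, flagging it as a voice note.
    requests.post(
        f"{GRAPH}/messages",
        headers=HEADERS,
        json={
            "messaging_product": "whatsapp",
            "to": to,
            "type": "audio",
            # voice=true is what makes WhatsApp render this as a real voice note;
            # its placement inside the audio object is an assumption in this sketch.
            "audio": {"id": upload["id"], "voice": True},
        },
    )
```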
Final outcome
With this setup:
- The user never opens Calendar or Gmail
- Everything starts from WhatsApp
- Voice instructions are enough to trigger real actions
- The system responds the way a human assistant would
This automation is not about replacing people.
It’s about removing repetitive steps that slow down daily work.
Once set up, it quietly runs in the background and handles routine requests triggered by simple voice messages.