WhatsApp Voice Into Actions: Scheduling Meetings Using Voice Instructions

Why this automation exists

Scheduling meetings usually looks simple, but in reality it takes time.
You check calendars, find a free slot, send invites, add agendas, and then remember to follow up. When this happens multiple times a day, it adds up.

The goal of this automation is simple:

handle meeting-related tasks using a single WhatsApp voice message. Instead of opening Gmail or Google Calendar, the user sends a voice note like:
“Book a product meeting with Jumar.”

From there, the workflow takes care of the rest

What this automation does (in practice)

When a voice message is sent on WhatsApp, the scenario can:

  • Convert the voice message into text
  • Understand what the user wants (meeting or email)
  • Create a Google Calendar event when needed
  • Send confirmation back on WhatsApp (text + voice note)
  • Send emails when the request is email-related

All actions are triggered only by the voice instruction.

Tools involved

  • WhatsApp Cloud API (via Meta Developer account)
  • OpenAI (for transcription and instruction parsing)
  • Google Calendar
  • Email service
  • ElevenLabs (text-to-speech)

Scenario overview

The scenario starts by watching incoming WhatsApp messages.
This includes both text and audio messages, but this setup focuses on audio input.

1. Watching WhatsApp responses

The first module listens for new WhatsApp responses.
This requires a verified Meta Developer account and an approved WhatsApp app.

Once connected, every incoming message is captured by the scenario.

2. Downloading the audio message

If the incoming message is an audio note, it is downloaded using the WhatsApp media download step.

This audio file becomes the input for the next stage.

3. Converting voice to text (transcription)

The audio file is sent to OpenAI for transcription.

At this point, we now have:

  • The original voice message

  • The text version of what the user said

This transcript is used in all following steps.

4. Understanding the user’s intent

A second OpenAI step is used to interpret the instruction, not just read the text.

For example:

  • “Book a meeting” → calendar action

  • “Send an email” → email action

Based on this interpretation, the scenario assigns a result type:

This decides which route the scenario will follow.

Route 1: Meeting scheduling flow (Result Type 0)

When the instruction is about scheduling a meeting, the scenario:

  1. Calculates:
  • Start time

  • End time

  • Duration

  1. Creates a Google Calendar event

Adds meeting details and agenda

WhatsApp confirmation (text)

A WhatsApp text message is sent with:

WhatsApp confirmation (voice note)

To keep the interaction natural, a voice response is also sent.

Steps involved:

  • OpenAI generates the confirmation text
  • ElevenLabs converts text to speech
  • Audio is converted to Opus format (required for WhatsApp voice notes)
  • Media is uploaded and sent via WhatsApp API
    An HTTP module is used here instead of the default WhatsApp module, because WhatsApp requires voice=true to deliver messages as real voice notes.

Route 2: Email-related flow (Result Type 1)

If the voice instruction is about sending an email:

  1. OpenAI generates:
  • Email subject
  • Email body (HTML)
  1. The email is sent to the recipient

  2. A voice note confirmation is also generated and delivered

This keeps the experience consistent—every voice request gets a voice response.

Why HTTP is used for voice notes

The official WhatsApp module allows sending audio files, but not voice notes.
To send proper voice notes:

  • The WhatsApp API requires voice=true
  • This is only possible through an HTTP request following official API documentation

Final outcome

With this setup:

  • The user never opens Calendar or Gmail
  • Everything starts from WhatsApp
  • Voice instructions are enough to trigger real actions
  • The system responds the way a human assistant would

This automation is not about replacing people.
It’s about removing repetitive steps that slow down daily work.

Once set up, it quietly runs in the background and handles routine requests triggered by simple voice messages.

1 Like