I want to download a Zoom cloud recording transcript (.vtt file) in Make and convert it into readable UTF-8 text so I can extract the spoken content and use it in later steps (e.g., GPT processing, summaries, description generation).
What is the problem & what have you tried?
Problem
When I download the transcript file via the Zoom module, the file arrives in Make as binary data.
The content is not directly readable; it looks like a hex/byte stream.
I need a way to decode this binary into actual text so that I can work with the VTT subtitle content.
What I've tried
I tried using the module "Convert encoding of a text", but since the input is binary, it does not decode the content.
The output looks identical to the input bundle and remains unreadable.
I also checked other Tools modules, but I haven't found anything that takes binary input and returns text output.
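To illustrate the decoding being asked for, here is a minimal Python sketch of the two steps outside Make: turning a hex stream into UTF-8 text, then keeping only the spoken lines of the VTT. It assumes the binary shows up as a hex string of UTF-8 bytes and that the transcript follows the usual WebVTT layout (a `WEBVTT` header, then cue blocks separated by blank lines). The function names `hex_to_text` and `vtt_to_plain_text` are hypothetical, not part of Make or Zoom.

```python
import re


def hex_to_text(hex_stream: str) -> str:
    """Decode a hex string such as '574542565454' into UTF-8 text."""
    # Remove any whitespace between byte pairs before decoding.
    cleaned = "".join(hex_stream.split())
    # 'utf-8-sig' also drops a leading byte order mark if one is present.
    return bytes.fromhex(cleaned).decode("utf-8-sig")


def vtt_to_plain_text(vtt: str) -> str:
    """Return only the spoken lines of a WebVTT document."""
    spoken = []
    normalized = vtt.replace("\r\n", "\n").strip()
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Blocks without a timing line (header, NOTE, STYLE) carry no speech.
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue
        # Everything after the timing line is cue text.
        spoken.extend(
            line.strip() for line in lines[timing_index + 1:] if line.strip()
        )
    return "\n".join(spoken)
```

The plain-text result is what would then be passed on to the GPT step. Inside Make itself the equivalent would have to be done with whatever binary-to-text conversion the scenario offers; this sketch only shows the logic.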
Are you very sure you're downloading a VTT file?
It looks like Zoom's recordings are in M4A (an audio format). Try using a module like the OpenAI (ChatGPT, Sora, DALL-E, Whisper) "Generate a transcription" module:
Transcribes an audio to text.
For more information about the "Generate a transcription" module and the OpenAI (ChatGPT, Sora, DALL-E, Whisper) app, see the corresponding Integrations page and the Help Centre documentation.
Thanks for the suggestion!
Using the OpenAI Generate Transcription (Whisper or transcribe) module would normally be a great workaround, but unfortunately in my case it doesn't solve the problem.
The audio files coming from Zoom are much longer than the current input limit of the OpenAI Whisper module in Make.
Whisper in Make has an audio length limit of 1,400 seconds (≈ 23 minutes)
My Zoom recordings are 40 to 90 minutes, so the module rejects the file
Splitting the files is not an option in this automation, because Zoom delivers them as a single M4A
So while Whisper would work for shorter audio, it can't process these full-length recordings inside Make due to the time limit.
I need the transcript because I use it in OpenAI to generate a YouTube headline and a YouTube description. That's why having the transcript is so important for my workflow.
Creating the transcript directly from the audio file is not an option, because the OpenAI module has a strict audio-length limitation, so the full recordings canât be processed that way.
Yeah, and what I mean is your screenshot seems to be showing the audio file and not a transcript. And due to Make's built-in timeout limits and data transfer limits, it would be better to try compressing the file first, like Sam suggested. Or try an external service to do the transfer.
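One way to settle whether the downloaded file is a transcript or audio is to look at its first bytes. A WebVTT file begins with the text `WEBVTT` (optionally after a UTF-8 byte order mark), while MP4-family files such as M4A normally carry `ftyp` at bytes 4 to 8. Below is a minimal Python sketch of that check, assuming you can get at the raw bytes outside Make; `sniff_zoom_file` is a hypothetical helper name.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_zoom_file(data: bytes) -> str:
    """Guess whether raw bytes are a WebVTT transcript or an MP4/M4A file."""
    # A transcript may start with a byte order mark before the header.
    text_start = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    if text_start.startswith(b"WEBVTT"):
        return "vtt"
    # MP4-family containers usually have an 'ftyp' box right after a
    # 4-byte size field.
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4/m4a"
    return "unknown"
```

If the first bytes of the bundle in the screenshot decode to `ftyp` rather than `WEBVTT`, the module is returning the audio recording and not the transcript.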