Differentiate Speakers in Audio File

Hi everyone!

I built a scenario that watches a Google Drive folder for new audio file uploads, transcribes and summarizes the files, and then sends the output to a Google Sheet.

For the transcription, I am using OpenAI Whisper.

My question is: how can I differentiate the speakers of the audio in the transcript?

Is it possible to do it with Whisper? Do I need to use something else?

Thanks a lot in advance :blush:

Hey @stom,
Instead of using ChatGPT for transcription, you can use the Google Cloud Text-to-Speech "Synthesize a Speech" module.
That should do the job.

Thanks, I will look into this!


Hey again @thakur, I just tried it, but I'm not sure I understand your advice. The goal is to transcribe the audio into text, and in that text it should be clear who's talking (hence my question). With the Text-to-Speech module I would just create a new audio file, no?

The Google Cloud Speech "Start Asynchronous Speech Recognition" module can differentiate multiple speakers via its speaker diarization setting.

The OpenAI Whisper module is not able to do this.
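For context, here is a minimal sketch of what you do with diarized output once you have it. With speaker diarization enabled, Google Cloud Speech-to-Text returns word-level results where each word carries a `speaker_tag`; the helper below (a hypothetical function, not part of any module) groups consecutive words by tag into a speaker-labelled transcript. The `mock_words` data only imitates that response shape:

```python
# Sketch: turn diarized word-level output into speaker-labelled transcript lines.
# Assumes (speaker_tag, word) pairs, mimicking the word list that Google Cloud
# Speech-to-Text returns when speaker diarization is enabled.

def format_diarized(words):
    """Group consecutive words with the same speaker tag into one line."""
    lines = []
    current_tag, current_words = None, []
    for tag, word in words:
        if tag != current_tag:
            # Speaker changed: flush the accumulated line, start a new one.
            if current_words:
                lines.append(f"Speaker {current_tag}: {' '.join(current_words)}")
            current_tag, current_words = tag, [word]
        else:
            current_words.append(word)
    if current_words:
        lines.append(f"Speaker {current_tag}: {' '.join(current_words)}")
    return lines

# Mock data standing in for the API's word-level results.
mock_words = [(1, "hi"), (1, "there"), (2, "hello"), (1, "how"), (1, "are"), (1, "you")]
print(format_diarized(mock_words))
# → ['Speaker 1: hi there', 'Speaker 2: hello', 'Speaker 1: how are you']
```

In a scenario you would feed each line into your summarization step or straight into the Google Sheet row.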

Hope this helps!