Differentiate Speakers in Audio File

Hi everyone!

I built a scenario that watches a Google Drive folder for new audio file uploads, transcribes and summarizes the files, and then sends the output to a Google Sheet.

For the transcription, I am using OpenAI Whisper.

My question is: how can I differentiate the speakers of the audio in the transcript?

Is it possible to do it with Whisper? Do I need to use something else?

Thanks a lot in advance :blush:

Hey @stom,
Instead of using ChatGPT for transcription, you can use the Google Cloud Text-to-Speech "Synthesize a Speech" module.
That should do the job.

Thanks, I will look into this!


Hey again @thakur, I just tried it, but I'm not sure I understand your advice. The goal is to transcribe the audio into text, and in that text it should be clear who's talking (hence my question). With the Text-to-Speech module I would just create a new audio file, no?

The Google Cloud Speech "Start Asynchronous Speech Recognition" module can differentiate multiple speakers via its speaker diarization setting.

The OpenAI Whisper module is not able to do this.
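For context, here is a minimal sketch of what you do with diarized output once you have it. With speaker diarization enabled, Google Cloud Speech-to-Text returns word-level results where each word carries a `speaker_tag`; the helper below (a hypothetical function, not part of any module) groups consecutive words by tag into a speaker-labelled transcript. The `mock_words` data only imitates that response shape:

```python
# Sketch: turn diarized word-level output into speaker-labelled transcript lines.
# Assumes (speaker_tag, word) pairs, mimicking the word list that Google Cloud
# Speech-to-Text returns when speaker diarization is enabled.

def format_diarized(words):
    """Group consecutive words with the same speaker tag into one line."""
    lines = []
    current_tag, current_words = None, []
    for tag, word in words:
        if tag != current_tag:
            # Speaker changed: flush the accumulated line, start a new one.
            if current_words:
                lines.append(f"Speaker {current_tag}: {' '.join(current_words)}")
            current_tag, current_words = tag, [word]
        else:
            current_words.append(word)
    if current_words:
        lines.append(f"Speaker {current_tag}: {' '.join(current_words)}")
    return lines

# Mock data standing in for the API's word-level results.
mock_words = [(1, "hi"), (1, "there"), (2, "hello"), (1, "how"), (1, "are"), (1, "you")]
print(format_diarized(mock_words))
# → ['Speaker 1: hi there', 'Speaker 2: hello', 'Speaker 1: how are you']
```

In a scenario you would feed each line into your summarization step or straight into the Google Sheet row.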

Hope this helps!