Speech to Text (Google or Whisper API)

Jens · April 5, 2023, 1:49pm

Hello all,

I am currently looking for a way to convert speech to text, but I can't find a pre-built app for Google or OpenAI. Does anyone know how to connect to them via the API?

Hope you can help me!

Runcorn · April 5, 2023, 2:06pm

You can use the HTTP module's Make a Request action to call OpenAI for transcriptions. What you want to do is:

Download the audio file
Make a request to OpenAI with the following configuration
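
Outside of Make, the same two steps can be sketched in Python with only the standard library. This is a rough illustration rather than the contents of the blueprint: the function names are made up for this example, and the endpoint URL and the "whisper-1" model name are taken from OpenAI's public API documentation, not from this thread.

```python
import json
import urllib.request
import uuid

# Assumption: OpenAI's documented transcription endpoint.
TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def download_audio(url):
    """Step 1: fetch the audio file and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def build_transcription_request(audio, filename, api_key, model="whisper-1"):
    """Step 2: build a multipart/form-data POST that carries the audio bytes."""
    boundary = uuid.uuid4().hex
    # Text part ("model") followed by the header of the binary part ("file").
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=head + audio + tail,  # the file part holds the bytes, not a URL
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio_url, filename, api_key):
    """Run both steps and return the transcript text from the JSON response."""
    audio = download_audio(audio_url)
    request = build_transcription_request(audio, filename, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Calling transcribe("https://example.com/clip.m4a", "clip.m4a", "YOUR_OPENAI_API_KEY") would download the file and send it on. In Make, the HTTP module builds the multipart body for you, so only the field names and values need to be configured.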

Please find the attached blueprint if you want to test it out.

blueprint (32).json (14.4 KB)

Papatatas · May 8, 2023, 7:19pm

Hi guys, I used the exact same method, but Whisper returns “invalid file format” and I get the following error:

Could you help me sort this out, please? My file is an .m4a, which should work in theory.

I was desperate to find a way to connect Make to Whisper, and you showed me the way! Thanks for that.

rmanor · June 3, 2023, 4:00am

If anybody runs into this issue, the error happens because the OpenAI API expects the file field in the request to contain the binary data of the audio file you want to transcribe. The data value can’t be blank and can’t contain a URL. So add the binary data to the file field where it says “data:” and you should be good to go.
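
If you are building the request in code rather than in Make, a small guard like the one below can catch this mistake before the request is sent. It is only a sketch: check_file_field is a hypothetical helper written for this post, not part of Make or the OpenAI API, and it checks nothing beyond the two failure cases described above.

```python
def check_file_field(data):
    """Return the audio bytes unchanged, or raise ValueError if the value
    is blank or holds a URL instead of binary data."""
    if isinstance(data, str):
        # Text (for example a URL or a file path) is not binary audio data.
        raise ValueError("file field holds text, not binary data")
    if not isinstance(data, (bytes, bytearray)):
        raise ValueError("file field must be bytes")
    if len(data) == 0:
        raise ValueError("file field is blank")
    if data[:8].lower().startswith((b"http://", b"https://")):
        raise ValueError("file field holds a URL; download the file and send its bytes")
    return bytes(data)
```

For example, check_file_field("https://example.com/clip.m4a") raises ValueError, while the bytes returned by a download step pass through untouched.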

Mohamed_Jahar · June 3, 2023, 8:48am

Hi @Jens

You can use the Make app “Google Cloud Speech” to achieve your requirement. Let me know if you have any further questions.

MSquare Support

Robert_Mitchell · June 27, 2023, 4:26pm

Hi,
Just to clarify what this thread seems to imply:
I can’t use the make.com HTTP request for a Whisper API transcription, and you are suggesting we use Google Cloud Speech on Google Cloud instead, whose quality is, at the moment, really poor compared to the Whisper API?
If so, is there a way to extract the file data and input it into this API call?
Thanks,
Robert.

Eden_AI · June 28, 2023, 2:52pm

Hi,

You can use the Eden AI Speech-to-Text module. It is a pre-built app made by Eden AI.

It gives you access to all the best speech-to-text services: Google, Whisper, Assembly, Deepgram, Speechmatics, Azure, IBM, Rev, AWS, etc.

vendy · August 24, 2023, 10:40am

Hi @Jens

Your question turned a few heads. I’m just wondering whether any of this helped you solve your issue.

If yes, could you mark one of the suggestions as a solution? This way we keep the community neat and tidy for other users.

Thanks a lot!

Damianski · September 10, 2023, 6:57am

How do I add the binary data?

annianni · September 20, 2023, 5:30pm

To link OpenAI with Google Speech-to-Text, you can use the following Python code:

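import speech_recognition as sr  # pip install SpeechRecognition (sr.Microphone also needs PyAudio)
from openai import OpenAI  # pip install openai (version 1.0 or later)

# Create an OpenAI client with your API key
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

# Create a speech recognizer
r = sr.Recognizer()

# Listen to audio from the default microphone
with sr.Microphone() as source:
    audio = r.listen(source)

# Convert the audio to text with Google's Web Speech API
text = r.recognize_google(audio)

# Generate a response from OpenAI
# (gpt-3.5-turbo-instruct is a model served by the completions endpoint)
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=text,
    temperature=0.7,
)

# Print the response
print(response.choices[0].text)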

This code first listens to audio input from the microphone. The audio is then converted to text by Google's speech recognition service, called through the SpeechRecognition library's recognize_google method. The text is then sent to OpenAI to generate a response, and finally the response is printed to the console.

Here is an example of how to use the code:

>>> import speech_recognition as sr
>>> from openai import OpenAI
>>>
>>> # Create an OpenAI client with your API key
>>> client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
>>>
>>> # Create a speech recognizer
>>> r = sr.Recognizer()
>>>
>>> # Listen to audio from the default microphone
>>> with sr.Microphone() as source:
...     audio = r.listen(source)
...
>>>
>>> # Convert the audio to text with Google's Web Speech API
>>> text = r.recognize_google(audio)
>>>
>>> # Generate a response from OpenAI
>>> response = client.completions.create(
...     model="gpt-3.5-turbo-instruct",
...     prompt=text,
...     temperature=0.7,
... )
>>>
>>> # Print the response
>>> print(response.choices[0].text)

You can modify the code to suit your needs, such as changing the OpenAI model that you use or the prompt that you send. You can also use the code to create a more interactive experience, such as by prompting the user to speak to OpenAI again after receiving a response.

You can put the code in any text editor, such as Notepad or Visual Studio Code. Once you have saved the code as a Python file (with the .py extension), you can run it in a terminal or command prompt.

To run the code in a terminal or command prompt, navigate to the directory where the file is saved and type the following command:

python filename.py

For example, if you saved the code as openai_speech_to_text.py, you would type the following command to run it:

python openai_speech_to_text.py

You can also create a shortcut to the file on your desktop or in your Start menu, so that you can run it with a double-click.

I personally find using an external online audio transcription or text-to-speech service more feasible.

DavidGurr_Make · September 21, 2023, 8:28am

FYI, we also recently added Whisper modules to the Make OpenAI App
