Speech to Text (Google or Whisper API)

Hello all,

I am currently looking for a way to convert speech to text, but there is no pre-built app for either Google or OpenAI. Does anyone know how to connect to them via the API?

Hope you can help me!

You can use the HTTP module to make a request to OpenAI for transcriptions. What you want to do is:

  1. Download the Audio File
  2. Make a Request to OpenAI with the following configuration
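
For anyone who wants to see what those two steps amount to outside of Make, here is a minimal Python sketch using only the standard library. The endpoint and the `whisper-1` model name are the ones OpenAI documents for transcriptions; the function names, the generic `application/octet-stream` content type, and the way the pieces are split up are my own assumptions for illustration, not something taken from the blueprint.

```python
import json
import urllib.request
import uuid

OPENAI_TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def download_audio(url: str) -> bytes:
    """Step 1: download the audio file and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Step 2: build a multipart/form-data POST with `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        "whisper-1\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        OPENAI_TRANSCRIPTION_URL,
        data=head + audio + tail,  # the file part carries the bytes themselves
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(audio_url: str, api_key: str) -> str:
    """Run both steps and return the transcript text (performs network calls)."""
    audio = download_audio(audio_url)
    # Hypothetical file name; it should carry the real extension of your audio.
    request = build_transcription_request(audio, "recording.m4a", api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

In Make, the HTTP module builds the multipart body for you; the point of the sketch is only that the `file` part has to contain the downloaded bytes plus a file name.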


Please find the attached blueprint if you want to test it out.

blueprint (32).json (14.4 KB)

Hi guys, I used exactly the same method, but Whisper returns “invalid file format” and I get the following error:


Could you help me sort this out, please? My file is an .m4a, which should work in theory.

I was desperate to find a way to connect Make to Whisper, and you showed me the way! Thanks for that.

If anybody runs into this issue, the error happens because the OpenAI API expects the file field in the request to contain the binary data of the audio file you want to transcribe. The data value can’t be blank and can’t contain a URL. So add the binary data to the file field where it says “data:” and you should be good to go.
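
To make that concrete, here is a small hedged Python sketch of the checks you could run on the value before it goes into the file field. The helper name `check_file_field` is hypothetical, and the file-extension check rests on my assumption that the API relies on the file name to recognise the audio format; treat it as an illustration of the rule above rather than anything official.

```python
from pathlib import Path


def check_file_field(data, filename: str) -> bytes:
    """Return `data` as bytes if it looks like a usable audio payload, else raise ValueError."""
    if isinstance(data, str):
        # A string here usually means a URL or a text mapping was passed by mistake.
        raise ValueError("file field holds text (for example a URL), not binary data")
    if not data:
        raise ValueError("file field is empty")
    payload = bytes(data)
    if payload[:8].lower().startswith((b"http://", b"https://")):
        raise ValueError("file field holds a URL encoded as bytes, not the audio itself")
    if not Path(filename).suffix:
        # Assumption: without an extension such as .m4a the format may not be detected.
        raise ValueError("file name has no extension")
    return payload
```

If the value passes these checks, it is at least the right kind of thing: real bytes with a named file, which is what the data field in the HTTP module should be mapped to.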


Hi @Jens

You can use the Make app “Google Cloud Speech” to achieve your requirement. Let me know if you have any further questions.

MSquare Support

Hi,
So, just to clarify what this thread seems to imply:
Are you saying I can’t use the make.com HTTP request for a Whisper API transcription, and suggesting Google Cloud Speech on Google Cloud Services instead? Its quality is, at the moment, really poor compared to the Whisper API.
If so, is there a way to extract the file data and pass it into this API call?
Thanks,
Robert.

Hi,

You can use the Eden AI Speech-to-Text module. It is a pre-built app made by Eden AI.

It gives you access to a range of speech-to-text services: Google, Whisper, Assembly, Deepgram, Speechmatics, Azure, IBM, Rev, AWS, etc.

Hi @Jens :wave:

Your question turned a few heads :smile:. I’m just wondering if any of the replies helped you solve your issue.

If yes, could you mark one of the suggestions as a solution? This way we keep the community neat and tidy for other users. :broom:

Thanks a lot!


How do I add the binary data?

To link OpenAI with Google Speech-to-Text, you can use the following Python code:

import openai
import speech_recognition as sr

# Set your OpenAI API key
# (this example uses the legacy interface of the openai package, version < 1.0)
openai.api_key = "YOUR_OPENAI_API_KEY"

# Create a speech recognizer
r = sr.Recognizer()

# Listen to audio from the microphone (requires PyAudio)
with sr.Microphone() as source:
    audio = r.listen(source)

# Convert the audio to text with Google's Web Speech API
text = r.recognize_google(audio)

# Generate a response from OpenAI
response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=text,
    temperature=0.7,
    max_tokens=256,
)

# Print the response
print(response.choices[0].text)

This code first listens to audio input from the microphone. The audio is then converted to text using Google's speech recognition. The text is sent to OpenAI to generate a response, and finally the response is printed to the console.

Here is an example of how to use the code:

>>> import openai
>>> import speech_recognition as sr
>>>
>>> # Set your OpenAI API key (legacy openai package interface, version < 1.0)
>>> openai.api_key = "YOUR_OPENAI_API_KEY"
>>>
>>> # Create a speech recognizer
>>> r = sr.Recognizer()
>>>
>>> # Listen to audio from the microphone (requires PyAudio)
>>> with sr.Microphone() as source:
...     audio = r.listen(source)
...
>>>
>>> # Convert the audio to text with Google's Web Speech API
>>> text = r.recognize_google(audio)
>>>
>>> # Generate a response from OpenAI
>>> response = openai.Completion.create(
...     model="gpt-3.5-turbo-instruct",
...     prompt=text,
...     temperature=0.7,
...     max_tokens=256,
... )
>>>
>>> # Print the response
>>> print(response.choices[0].text)

You can modify the code to suit your needs, such as changing the OpenAI model that you use or the prompt that you send. You can also use the code to create a more interactive experience, such as by prompting the user to speak to OpenAI again after receiving a response.

You can put the code in any text editor, such as Notepad or Visual Studio Code. Once you have saved the code as a Python file (with the .py extension), you can run it in a terminal or command prompt.

To run the code in a terminal or command prompt, navigate to the directory where the file is saved and type the following command:

python filename.py

For example, if you saved the code as openai_speech_to_text.py, you would type the following command to run it:

python openai_speech_to_text.py

You can also create a shortcut to the file on your desktop or in your Start menu, so that you can run it with a double-click.

Personally, I find it more practical to use an external online audio transcription or text-to-speech service.

FYI, we also recently added Whisper modules to the Make OpenAI app.
