Hello I would like to use Google Cloud Speech to automatically transcribe MP3 voice recordings into German.
With the template “Transcribe new short mp3 files from Google Drive with Google Cloud Speech” I managed to do this, but only with MP3s shorter than 1 minute.
My workflow currently looks like this:
- 
Google Drive: watch folder, for new mp3 files
 - 
Google Drive: Move a File, to move the MP3 file to a subfolder, so I know which ones have been processed.
 - 
Google Drive: Download File, so I can use the MP3 file.
 - 
google cloud storage: Upload Object, the file will be uploaded to the cloud bucket.
Output 
[
    {
 "kind": "storage#object",
        "id": "syltfraeulein_speech/Loewenzahn_Kurz.mp3/1696272261434851",
        "selfLink": "https://www.googleapis.com/storage/v1/b/syltfraeulein_speech/o/Loewenzahn_Kurz.mp3",
        "mediaLink": "https://www.googleapis.com/download/storage/v1/b/syltfraeulein_speech/o/Loewenzahn_Kurz.mp3?generation=1696272261434851&alt=media",
        "name": "Loewenzahn_Kurz.mp3",
        "bucket": "syltfraeulein_speech",
        "generation": "1696272261434851",
        "metageneration": "1",
        "contentType": "audio/mpeg",
        "storageClass": "STANDARD",
        "size": "2385375",
        "md5Hash": "r8WoC7j2sodz8PDe/SKzxA==",
        "crc32c": "uhEIkA==",
        "etag": "COOb18yC2IEDEAE=",
        "timeCreated": "2023-10-02T18:44:21.457Z",
        "updated": "2023-10-02T18:44:21.457Z",
        "timeStorageClassUpdated": "2023-10-02T18:44:21.457Z"
- 
google cloud storage: Update Object to set the sharing of the file to public. (was just a test, in case Make.com does not have access).
 - 
Google Cloud Speech: Asynchronous Speech Recognition, to transcribe the MP3 file from the bucket into text.
 
I got this setting from this post: Google Cloud Speech-to-text module not working - #2 by Kirill_D
Input
[
    {
        "uri": "gs://syltfraeulein_speech/Loewenzahn_Kurz.mp3",
        "encoding": "MP3",
        "metadata": {
            "interactionType": "VOICEMAIL",
            "originalMimeType": "audio/mpeg",
            "originalMediaType": "AUDIO",
            "recordingDeviceType": "SMARTPHONE"
        },
        "FLACorWAV": true,
        "uploadType": "uri",
        "languageCode": "de-DE",
        "sampleRateHertz": 16000,
        "audioChannelCount": 1,
        "enableSeparateRecognitionPerChannel": false
    }
]
Output
[
    {
        "name": "791715767449740107"
    }
]
Later with 7. the text should be saved in a Google Doc and the file will be deletet from the bucket, but I am not there yet.
Point 1-5 runs without problems.
At point 6, I then come to a problem. I pass the file to Cloud Speech via URI, but only get an output back that is not a transcription.
When I do the transcription with my test workflow, I end up with an output of
"alternatives": [
            {
                "transcript": "Here is the text",
                "confidence": 0.90918183
            }
This works only with audio recordings shorter than one minute but we want to have whole interviews transcribed.
Am I doing something wrong or do I think the workflow is too complicated?
Greetings ![]()







