Hello, I would like to use Google Cloud Speech to automatically transcribe German MP3 voice recordings.
With the template “Transcribe new short mp3 files from Google Drive with Google Cloud Speech” I managed to do this, but only with MP3s shorter than 1 minute.
My workflow currently looks like this:

1. Google Drive: Watch Files, to watch a folder for new MP3 files.
2. Google Drive: Move a File, to move the MP3 file into a subfolder so I know which ones have been processed.
3. Google Drive: Download File, so I can use the MP3 file.
4. Google Cloud Storage: Upload Object, to upload the file to the cloud bucket.
Output
[
    {
        "kind": "storage#object",
        "id": "syltfraeulein_speech/Loewenzahn_Kurz.mp3/1696272261434851",
        "selfLink": "https://www.googleapis.com/storage/v1/b/syltfraeulein_speech/o/Loewenzahn_Kurz.mp3",
        "mediaLink": "https://www.googleapis.com/download/storage/v1/b/syltfraeulein_speech/o/Loewenzahn_Kurz.mp3?generation=1696272261434851&alt=media",
        "name": "Loewenzahn_Kurz.mp3",
        "bucket": "syltfraeulein_speech",
        "generation": "1696272261434851",
        "metageneration": "1",
        "contentType": "audio/mpeg",
        "storageClass": "STANDARD",
        "size": "2385375",
        "md5Hash": "r8WoC7j2sodz8PDe/SKzxA==",
        "crc32c": "uhEIkA==",
        "etag": "COOb18yC2IEDEAE=",
        "timeCreated": "2023-10-02T18:44:21.457Z",
        "updated": "2023-10-02T18:44:21.457Z",
        "timeStorageClassUpdated": "2023-10-02T18:44:21.457Z"
    }
]
5. Google Cloud Storage: Update Object, to set the file's sharing to public (this was just a test, in case Make.com does not have access).
6. Google Cloud Speech: Asynchronous Speech Recognition, to transcribe the MP3 file from the bucket into text.
I got this setting from this post: Google Cloud Speech-to-text module not working - #2 by Kirill_D
Input
[
    {
        "uri": "gs://syltfraeulein_speech/Loewenzahn_Kurz.mp3",
        "encoding": "MP3",
        "metadata": {
            "interactionType": "VOICEMAIL",
            "originalMimeType": "audio/mpeg",
            "originalMediaType": "AUDIO",
            "recordingDeviceType": "SMARTPHONE"
        },
        "FLACorWAV": true,
        "uploadType": "uri",
        "languageCode": "de-DE",
        "sampleRateHertz": 16000,
        "audioChannelCount": 1,
        "enableSeparateRecognitionPerChannel": false
    }
]
Output
[
    {
        "name": "791715767449740107"
    }
]
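If I understand the Speech API docs correctly, the "name" in the output above is the ID of a long-running operation, not the transcript itself; the transcript has to be fetched with a second request to the operations endpoint once the job is done. Here is a rough Python sketch of that lookup as I understand it (untested; auth is reduced to a bearer token, and `get_operation`/`extract_transcript` are just my own helper names):

```python
import json
import urllib.request

SPEECH_API = "https://speech.googleapis.com/v1"

def get_operation(name, access_token):
    """Fetch the long-running operation by name (e.g. "791715767449740107")."""
    req = urllib.request.Request(
        f"{SPEECH_API}/operations/{name}",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_transcript(operation):
    """Join the transcript pieces once the operation reports done, else None."""
    if not operation.get("done"):
        return None
    results = operation.get("response", {}).get("results", [])
    return " ".join(r["alternatives"][0]["transcript"] for r in results)
```

In Make this second request could presumably be done with a generic HTTP module, repeated with a pause until "done" is true in the response.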
Later, in step 7, the text should be saved to a Google Doc and the file deleted from the bucket, but I am not there yet.
Steps 1-5 run without problems.
At step 6 I run into a problem: I pass the file to Cloud Speech via its URI, but the output I get back is not a transcription.
When I do the transcription with my test workflow, I end up with output like this:

{
    "alternatives": [
        {
            "transcript": "Here is the text",
            "confidence": 0.90918183
        }
    ]
}

However, this works only with audio recordings shorter than one minute, and we want to transcribe whole interviews.
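For reference, my mental model of the whole asynchronous flow in Python would be something like the sketch below, using the google-cloud-speech client library (untested; I am assuming the v1p1beta1 API surface for MP3 support, and `transcribe_gcs_mp3` is just my own name for it):

```python
def transcribe_gcs_mp3(gcs_uri, language_code="de-DE", client=None, timeout_s=3600):
    """Start asynchronous recognition for an MP3 in GCS and wait for the text.

    Untested sketch: assumes the google-cloud-speech package and that MP3
    input is available (v1p1beta1). A client can be injected for testing.
    """
    if client is None:
        # Deferred import so the function can also be exercised with a fake client.
        from google.cloud import speech_v1p1beta1 as speech
        client = speech.SpeechClient()

    config = {
        "encoding": "MP3",
        "sample_rate_hertz": 16000,
        "language_code": language_code,
        "audio_channel_count": 1,
    }
    # long_running_recognize only starts the job; .result() blocks until it is done.
    operation = client.long_running_recognize(config=config, audio={"uri": gcs_uri})
    response = operation.result(timeout=timeout_s)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

Called as `transcribe_gcs_mp3("gs://syltfraeulein_speech/Loewenzahn_Kurz.mp3")`, this is what I expect steps 6 and 7 to boil down to.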
Am I doing something wrong, or am I overcomplicating the workflow?
Greetings