Video Transcribe using Google Vertex

Prabha-10 · October 3, 2024, 5:20pm

I'm trying to build a transcription bot by replacing ChatGPT Whisper with Google Gemini Vertex. I'm facing an issue with the sharable link.

Here is the blueprint:
blueprint.json (48.8 KB)

Output bundle for the Google Drive "Get sharable link" element:

[
    {
        "fileId": "1hMP4g_9tCvy_FZRMDsCRTGqkCyCv5T9F",
        "kind": "drive#permission",
        "id": "anyoneWithLink",
        "type": "anyone",
        "role": "reader",
        "allowFileDiscovery": false,
        "shareLink": "https://drive.google.com/file/d/1hMP4g_9tCvy_FZRMDsCRTGqkCyCv5T9F",
        "webContentLink": "https://drive.google.com/uc?id=1hMP4g_9tCvy_FZRMDsCRTGqkCyCv5T9F&export=download"
    }
]

Input bundle for Google Gemini Vertex:

[
    {
        "topK": 32,
        "topP": 1,
        "model": "gemini-pro-vision",
        "messages": [
            {
                "role": "user",
                "prompt": "transcribe the video",
                "fileUri": "https://drive.google.com/file/d/1hMP4g_9tCvy_FZRMDsCRTGqkCyCv5T9F",
                "mimeType": "video/mp4",
                "videoMetadata": {
                    "endOffset": {},
                    "startOffset": {}
                },
                "fileUploadType": "fileUri"
            }
        ],
        "projectId": "make-vertex-436917",
        "temperature": 0.4,
        "serviceEndpointLocationId": "us-central1"
    }
]

RobertAndrews · October 12, 2024, 8:47pm

I think you want to be using the webContentLink (which is the actual raw video URL), not the shareLink.
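In the scenario itself, the simplest route is probably to map webContentLink from the Drive module straight into the fileUri field. If only the shareLink is available, here is a minimal Python sketch that rebuilds the download-style URL from it. It assumes links follow the two shapes shown in the output bundle above; the helper name share_link_to_download_url is made up for illustration, and whether Vertex can actually fetch the resulting URL is not something this sketch verifies.

```python
import re
from urllib.parse import urlencode

# Matches the file ID in links shaped like the shareLink in the output bundle:
#   https://drive.google.com/file/d/<fileId>
_SHARE_LINK_RE = re.compile(r"^https://drive\.google\.com/file/d/([A-Za-z0-9_-]+)")


def share_link_to_download_url(share_link: str) -> str:
    """Hypothetical helper: build a webContentLink-style URL from a share link."""
    match = _SHARE_LINK_RE.match(share_link)
    if match is None:
        raise ValueError(f"not a Drive file share link: {share_link!r}")
    # Same query shape as the webContentLink in the bundle: ?id=<fileId>&export=download
    query = urlencode({"id": match.group(1), "export": "download"})
    return f"https://drive.google.com/uc?{query}"


share_link = "https://drive.google.com/file/d/1hMP4g_9tCvy_FZRMDsCRTGqkCyCv5T9F"
print(share_link_to_download_url(share_link))
```

For the file in this thread, the printed URL should be the same as the webContentLink value in the output bundle.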

Separately, though, I’m interested to see how your use case goes. I just tried something similar, with a working video URL, and it just spat out complete hallucination for the transcript.

When I asked it to identify speaker names from captions in the lower third, it got the two speakers’ names correct, and then hallucinated a load of others that weren’t present in the video.

Is gemini-pro-vision even a real model? Maybe the first version? The doc page "Explore vision capabilities with the Gemini API" (Google AI for Developers) suggests transcribing video with gemini-1.5-pro.

I got fantastic results in Google AI Studio, but don’t know how to achieve the equivalent through this module. Via the same interface, Gemini 1.5 Pro denies to me that its API supports multimodality.

Prabha-10 · October 18, 2024, 6:56am

Thanks for your reply, Mr Robert. Basically, I'm just trying to transcribe videos using free resources, and yeah, Google is not working like we expected.
