How can I get a transcription from a video?

I’m making a scenario that automates posting Reels on Instagram. The flow is roughly this:

Google Drive (watches a folder and picks up each new video as it is uploaded) + Google Drive (downloads the file) + video transcription tool (I don’t know which one to use) + ChatGPT (writes a Reel description based on the video’s transcription) + Instagram
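
In code terms, the middle of the flow would look roughly like the sketch below. The names `build_reel_post`, `transcribe`, and `describe` are made up for illustration and are not real modules or APIs; `transcribe` stands for the piece I’m missing.

```python
from typing import Callable


def build_reel_post(
    video_path: str,
    transcribe: Callable[[str], str],
    describe: Callable[[str], str],
) -> dict:
    """Run the two middle steps of the flow for one downloaded video.

    Both callables are hypothetical placeholders supplied by the caller.
    """
    # Step 3: turn the video into text (the tool I still need to find).
    transcript = transcribe(video_path)
    # Step 4: turn the transcript into a Reel description (e.g. a ChatGPT prompt).
    caption = describe(transcript)
    # Step 5 (posting to Instagram) would take this result as its input.
    return {"video": video_path, "caption": caption}


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without any external service.
    post = build_reel_post(
        "reel.mp4",
        transcribe=lambda path: "hello from the video",
        describe=lambda text: text.capitalize(),
    )
    print(post)
```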

At first I thought I could use Whisper, but then I found out it only transcribes audio. Is there anything else I can use? I also thought a playground assistant from OpenAI could work, but it doesn’t accept video as far as I know.

Thanks in advance