I am building a workflow for transcribing audio files with Whisper. If an audio file is larger than 25 MB, I use CloudConvert to compress it with the option audio_convert = 25. However, when testing with a 25.6 MB audio file (52 minutes long), the compressed file came out at 25.1 MB, and the Whisper action failed with the error: Payload Too Large - {"error": {"message": "Maximum content size limit (26214400) exceeded (26320136 bytes)"}}.
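For anyone wiring this up themselves: the error message gives the exact cap (26214400 bytes, i.e. 25 MiB), so one thing that helps is checking against that limit with some headroom before deciding whether to compress, instead of comparing against a round "25 MB". This is just an illustrative sketch; the function name and the 5% margin are my own choices, not anything from Whisper or CloudConvert:

```python
# The limit comes straight from the error message: 26214400 bytes = 25 MiB.
WHISPER_LIMIT_BYTES = 26_214_400

def needs_compression(size_bytes: int, margin: float = 0.05) -> bool:
    """Return True if the file should be compressed before upload.

    The margin leaves headroom so the compressed output doesn't land
    just barely over the cap, as happened with the 25.1 MB file above.
    """
    return size_bytes > WHISPER_LIMIT_BYTES * (1 - margin)
```

With a 5% margin, a 25.1 MB (≈26.3 million byte) file is flagged for compression even though it is "under 25 MB" by some measures.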
I also tried compressing with the audio-bitrate option, which reduced the file to 9.5 MB, but the audio quality was too low and the transcription accuracy suffered. Would increasing the audio bitrate value improve the quality?
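Yes, for a constant-bitrate encode the output size is roughly bitrate × duration, so you can work backwards from the 25 MiB cap to the highest bitrate that still fits. A rough back-of-the-envelope calculation (my own sketch, ignoring container overhead):

```python
def max_bitrate_kbps(duration_s: float, target_bytes: int) -> float:
    """Highest constant bitrate (kbps) that keeps the file under target_bytes.

    size_bytes ≈ bitrate_bps * duration_s / 8, solved for bitrate.
    """
    return target_bytes * 8 / duration_s / 1000

# For the 52-minute file against the 25 MiB limit:
print(round(max_bitrate_kbps(52 * 60, 25 * 1024 * 1024)))  # → 67
```

So for this file, anything up to roughly 64 kbps should fit, which is a lot better for speech than whatever rate produced the 9.5 MB file. Mono audio at 48 to 64 kbps is generally still fine for speech transcription.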
Does anyone have suggestions or workarounds for this issue? I'm considering splitting long audio files into smaller chunks as an error-handler step, but I need guidance on how to do this, or recommendations for applications that can help.
I'm having a similar issue. Even parameters that compress the file to the correct size on the CloudConvert website stop working when used through the CloudConvert module.
A workaround for splitting the file into multiple chunks is to use the trim_start and trim_end parameters to cut the audio file roughly in half.
Sure, here's the trim_end parameter configured to trim the audio file to only the first 10 minutes. trim_start works the same way for setting the start time of the file, but I'm not using it in this scenario.
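If you want more than two pieces, the same trim idea generalizes: compute a list of (trim_start, trim_end) windows covering the whole file and run one CloudConvert job per window. A minimal sketch, assuming the offsets are expressed in seconds (check the CloudConvert docs for the exact time format the trim options expect):

```python
def chunk_windows(duration_s: float, chunk_s: float = 600):
    """Yield (trim_start, trim_end) offsets in seconds covering the file.

    chunk_s = 600 gives 10-minute chunks, matching the trim_end example above.
    """
    start = 0.0
    while start < duration_s:
        yield start, min(start + chunk_s, duration_s)
        start += chunk_s

# A 52-minute (3120 s) file splits into five full 10-minute chunks
# plus a final 2-minute remainder.
print(list(chunk_windows(3120, 600)))
```

You would then feed each window into the trim parameters of a separate conversion, transcribe each output with Whisper, and concatenate the transcripts in order.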
You can find the full documentation of the available options here. That said, I suspect either bitrate or sample_rate has a cap that isn't mentioned in the documentation: there was a hard limit on the file size I could achieve with these parameters, even though the same parameters produced smaller files on the website itself.