Hi,
I’m transcribing an audio file using Whisper and I want to send the result to GPT to check and fix some problems in the text.
The text is formatted as an SRT file of unknown length (number of characters), for example:
162
00:12:53,505 --> 00:12:56,591
Now in this class you can
either call me Mr Keating…

163
00:12:56,675 --> 00:13:00,668
or if you’re slightly more daring
“O Captain my Captain.”

164
00:13:04,432 --> 00:13:07,519
Now let me dispel a few rumours
so they don’t fester into facts.

165
00:13:07,602 --> 00:13:11,345
Yes I too attended Hell- ton
and survived.

As you can see, the text is divided into sections. Each one starts with a section number, followed by a timestamp, and then the lines of text belonging to that section.
I’m looking for a way to send the text to GPT via the API without exceeding the prompt size limit. So I need a way to split the text into chunks of at most a predefined number of characters (counting the section number, the timestamp and the text), but without splitting a section in the middle. For example, if section 1 is 20 characters, section 2 is 50 characters and section 3 is 50 characters (120 in total) and my limit is 100, I want to send sections 1 + 2 (70 characters) and then section 3 (50 characters). I don’t want to cut section 3 in the middle and send 1 + 2 + part of 3 (100 characters) followed by the rest of section 3 (20 characters).
I hope this is clear.
Any idea how to do it?
Please note that the number of sections can change depending on the text I’m transcribing, and I also want to control the number of characters to split by.
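To make the requirement concrete, here is a rough Python sketch of the behaviour I have in mind. It is only an illustration, not a tested solution: the function names `split_srt_sections` and `chunk_sections` are made up by me, it assumes sections are separated by blank lines as in a standard SRT file, it counts the blank-line separator between sections toward the limit, and it simply keeps a section whole in its own chunk if that single section is longer than the limit.

```python
import re


def split_srt_sections(srt_text):
    """Split SRT text into whole sections (number, timestamp, text lines).

    Assumption: sections are separated by one or more blank lines.
    """
    normalized = srt_text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", normalized)
    return [block.strip() for block in blocks if block.strip()]


def chunk_sections(sections, max_chars, sep="\n\n"):
    """Greedily group whole sections into chunks of at most max_chars.

    A section is never cut in the middle. If a single section is longer
    than max_chars, it is emitted alone as an oversized chunk.
    """
    chunks = []
    current = []      # sections collected for the chunk being built
    current_len = 0   # length of the current chunk including separators

    for section in sections:
        # Cost of adding this section: its text plus a separator if needed.
        added = len(section) + (len(sep) if current else 0)
        if current and current_len + added > max_chars:
            # Adding it would exceed the limit: close the chunk, start a new one.
            chunks.append(sep.join(current))
            current = [section]
            current_len = len(section)
        else:
            current.append(section)
            current_len += added

    if current:
        chunks.append(sep.join(current))
    return chunks


if __name__ == "__main__":
    # Stand-ins for sections of 20, 50 and 50 characters with a limit of 100.
    demo = ["a" * 20, "b" * 50, "c" * 50]
    for chunk in chunk_sections(demo, max_chars=100):
        print(len(chunk))
```

With the 20 / 50 / 50 example and a limit of 100, this groups sections 1 + 2 together and sends section 3 separately. Each chunk would then go to GPT in its own API call, and `max_chars` is the value I want to be able to change.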
Regards,
Ram