How to split the GPT prompt input?

Ram · September 14, 2023, 5:55pm

Hi,
I’m transcribing an audio file using Whisper and I want to send the result to GPT to check and fix some text problems.

The text is formatted as an SRT file of unknown length (number of characters), for example:

162
00:12:53,505 → 00:12:56,591
Now in this class you can
either call me Mr Keating…

163
00:12:56,675 → 00:13:00,668
or if you’re slightly more daring
“O Captain my Captain.”

164
00:13:04,432 → 00:13:07,519
Now let me dispel a few rumours
so they don’t fester into facts.

165
00:13:07,602 → 00:13:11,345
Yes I too attended Hell-ton
and survived.

As you can see, the text is divided into sections. Each one starts with a section number, then a timestamp, then the lines of text belonging to that section.

I’m looking for a way to send the text to GPT via the API without exceeding the prompt size limit. So I need a way to split the text into chunks of at most a predefined number of characters (counting the section number, the timestamp and the text), but without splitting a section in the middle.

For example, if section 1 is 20 characters, section 2 is 50 characters and section 3 is 50 characters (120 in total) and my limit is 100, I want to send sections 1 + 2 (70 characters) and then section 3 (50 characters). I don’t want to cut section 3 in the middle and send 1 + 2 + part of 3 (100 characters) followed by the rest of section 3 (20 characters).

I hope this is clear.
Any idea how to do it?

Please note that the number of sections can change depending on the text I’m transcribing, and I also want to control the number of characters to split by.
Regards,
Ram

samliew · September 15, 2023, 12:45am

Would you be able to add an example of a full SRT file you are trying to process to your original question, so that I can import it into a scenario to test with?

Ram · September 15, 2023, 7:07am

Hi,
In my scenario I’m uploading an MP3 to Whisper, converting it to SRT format, and then I want to send the SRT output I got from Whisper to GPT.

Since Whisper produces the SRT as plain text like the example I added before, all you need is some text in the format: number, then timestamp, then text.

Here is another text example:

2
00:00:42,315 → 00:00:45,767
Now remember keep
your shoulders back.

3
00:00:54,786 → 00:00:57,403
Okay put your arm
around your brother. That’s it.

4
00:00:57,497 → 00:00:59,457
That’s it.
Right and breathe in.

5
00:01:01,209 → 00:01:03,117
Okay one more.

6
00:01:07,632 → 00:01:09,707
Now just to review

7
00:01:09,801 → 00:01:12,210
you’re going to follow
along the procession…

8
00:01:12,303 → 00:01:15,223
until you get
to the headmaster.

9
00:01:15,306 → 00:01:17,882
At that point
he will indicate to you…

10
00:01:17,976 → 00:01:20,051
to light the candles
of the boys.

11
00:01:20,145 → 00:01:22,720
All right boys
let’s settle down.

12
00:01:29,320 → 00:01:32,105
Banners up!

13
00:02:22,791 → 00:02:26,868
Ladies and gentlemen
boys…

14
00:02:26,961 → 00:02:28,870
the Light of Knowledge.

15
00:02:43,478 → 00:02:48,900
One hundred years ago
in 1859

16
00:02:48,983 → 00:02:52,727
41 boys sat in this room…

17
00:02:52,821 → 00:02:55,313
and were asked
the same question…

18
00:02:55,407 → 00:02:58,576
that now greets you
at the start of each semester.

19
00:02:58,660 → 00:03:02,487
Gentlemen
what are the Four Pillars?

20
00:03:04,958 → 00:03:09,744
Tradition. Honour.
Discipline. Excellence.

samliew · September 15, 2023, 7:44am

Here you go, use this regex with a Match Pattern module:

(?<value>[\w\W\n]{1,300})(?:\n\n|$)

Demo on regex101.com

Change 300 to the maximum number of characters you want per bundle.
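
If you want to try the same pattern outside the Match Pattern module, here is a minimal Python sketch. It is an adaptation, not the module's own implementation: Python's re module writes named groups as (?P<value>...) instead of (?<value>...), and the function name split_srt_regex and its max_chars parameter are made up for illustration. It assumes sections are separated by exactly one blank line ("\n\n") and that no single section is longer than the limit; an oversized section would be matched from somewhere in its middle.

```python
import re


def split_srt_regex(srt_text: str, max_chars: int = 300) -> list[str]:
    """Split SRT text into chunks of at most max_chars characters.

    Each chunk ends right before a blank line or at the end of the text,
    so a section is not cut in the middle (as long as it fits the limit).
    """
    # Same idea as the pattern above, with Python's (?P<name>...) syntax.
    # [\w\W] matches any character, including newlines.
    pattern = re.compile(rf"(?P<value>[\w\W]{{1,{max_chars}}})(?:\n\n|$)")
    return [match.group("value") for match in pattern.finditer(srt_text)]


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
        "2\n00:00:02,000 --> 00:00:03,000\nWorld\n\n"
        "3\n00:00:03,000 --> 00:00:04,000\nBye"
    )
    for chunk in split_srt_regex(sample, max_chars=80):
        print(repr(chunk))
```

With an 80-character limit, the first two sample sections end up in one chunk and the third in a chunk of its own.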

Output bundle

[
    {
        "i": 1,
        "value": "2\n00:00:42,315 → 00:00:45,767\nNow remember keep\nyour shoulders back.\n\n3\n00:00:54,786 → 00:00:57,403\nOkay put your arm\naround your brother. That’s it.\n\n4\n00:00:57,497 → 00:00:59,457\nThat’s it.\nRight and breathe in.\n\n5\n00:01:01,209 → 00:01:03,117\nOkay one more."
    },
    {
        "i": 2,
        "value": "6\n00:01:07,632 → 00:01:09,707\nNow just to review\n\n7\n00:01:09,801 → 00:01:12,210\nyou’re going to follow\nalong the procession…\n\n8\n00:01:12,303 → 00:01:15,223\nuntil you get\nto the headmaster.\n\n9\n00:01:15,306 → 00:01:17,882\nAt that point\nhe will indicate to you…"
    },
    {
        "i": 3,
        "value": "10\n00:01:17,976 → 00:01:20,051\nto light the candles\nof the boys.\n\n11\n00:01:20,145 → 00:01:22,720\nAll right boys\nlet’s settle down.\n\n12\n00:01:29,320 → 00:01:32,105\nBanners up!\n\n13\n00:02:22,791 → 00:02:26,868\nLadies and gentlemen\nboys…\n\n14\n00:02:26,961 → 00:02:28,870\nthe Light of Knowledge."
    },
    {
        "i": 4,
        "value": "15\n00:02:43,478 → 00:02:48,900\nOne hundred years ago\nin 1859\n\n16\n00:02:48,983 → 00:02:52,727\n41 boys sat in this room…\n\n17\n00:02:52,821 → 00:02:55,313\nand were asked\nthe same question…\n\n18\n00:02:55,407 → 00:02:58,576\nthat now greets you\nat the start of each semester."
    },
    {
        "i": 5,
        "value": "19\n00:02:58,660 → 00:03:02,487\nGentlemen\nwhat are the Four Pillars?\n\n20\n00:03:04,958 → 00:03:09,744\nTradition. Honour.\nDiscipline. Excellence."
    }
]
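
For comparison, the same grouping can be done without a regex by adding whole sections to a chunk until the next one would exceed the limit. This is only a sketch under a few assumptions: the name pack_sections is hypothetical, sections are separated by "\n\n" (files with "\r\n" line endings would need normalising first), the two separator characters count toward the limit, and a section longer than the limit is kept whole in its own chunk rather than cut.

```python
def pack_sections(srt_text: str, max_chars: int) -> list[str]:
    """Greedily group whole SRT sections into chunks of at most max_chars."""
    # One section = number line + timestamp line + text lines.
    sections = [s for s in srt_text.strip().split("\n\n") if s.strip()]

    chunks: list[str] = []
    current = ""
    for section in sections:
        candidate = section if not current else current + "\n\n" + section
        if not current or len(candidate) <= max_chars:
            # Either the chunk is empty (an oversized section is still kept
            # whole) or the section fits next to what is already there.
            current = candidate
        else:
            # The section does not fit: close this chunk and start a new one.
            chunks.append(current)
            current = section
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # The example from the question: sections of 20, 50 and 50 characters
    # with a limit of 100.
    text = "\n\n".join(["a" * 20, "b" * 50, "c" * 50])
    print([len(chunk) for chunk in pack_sections(text, max_chars=100)])
```

For the 20 / 50 / 50 example from the question this gives two chunks: sections 1 + 2 together (72 characters including the blank line between them) and section 3 on its own (50 characters).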

Ram · September 15, 2023, 8:30am

Superb.
Thanks for your help.
