Summarizing a 2.5-Hour Jam Session into a Highlight Reel – Advice Needed

Hi Make Community,

I play in a jam band called Pizza Jam in Santa Monica, CA, and I have a 2.5-hour video that I want to condense into a 5-minute highlight reel. I’ve been experimenting with OpenAI and FFmpeg (thanks to @Stoyan_Vatov), but the timecode cues OpenAI produced turned out to be pretty unreliable. They were nowhere near the best moments; the model just selected the first 10-20 seconds of each clip. It did summarize the overall video in terms of songs and even instruments, but it didn’t go much deeper than that.

I’m looking for a system or workflow that can generate a meaningful summary of a large video like this, one that captures musical instruments, musical content, and speech, so I can create accurate cue points. I’m able to programmatically split the video with FFmpeg, but I need a way to automatically produce a musically informed summary.
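For context, the splitting side looks roughly like the sketch below. It only builds the FFmpeg command lines from a list of cue points and does not run them. The cue values, the file name `pizza_jam.mp4`, and the helper names `to_timestamp` and `build_clip_commands` are placeholders for illustration, not my actual setup. The cue list is exactly the part I can't generate reliably yet.

```python
import shlex


def to_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, a form FFmpeg accepts for -ss and -t."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_clip_commands(source, cues, prefix="clip"):
    """Return one FFmpeg argument list per (start, end) cue, in seconds.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    commands = []
    for index, (start, end) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} ends before it starts")
        commands.append([
            "ffmpeg", "-y",
            "-ss", to_timestamp(start),      # seek to the cue start
            "-i", source,
            "-t", to_timestamp(end - start),  # clip duration
            "-c", "copy",                     # no re-encode; cuts snap to keyframes
            f"{prefix}_{index:02d}.mp4",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder cue points (seconds), not real timings from the session.
    cues = [(754.0, 784.0), (3120.5, 3150.5)]
    for cmd in build_clip_commands("pizza_jam.mp4", cues):
        print(shlex.join(cmd))
```

Stream copy keeps this fast on a 2.5-hour file, but the cuts land on keyframes, so re-encoding would be needed for frame-accurate clips.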

Has anyone tackled something like this? Any tips, tools, or workflows would be much appreciated!

Here’s our Pizza Jam :)
