I use this to summarize videos that were transcribed with Whisper. For this use case, I convert the transcripts with vtt2txt (included).
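For readers unfamiliar with the .vtt format, here is a minimal sketch of the kind of conversion a VTT-to-text step performs. It is not the code of the included vtt2txt; the function name `vtt_to_text` and the filtering rules are assumptions made for illustration, and they only cover simple Whisper-style files (header, timestamps, text).

```python
def vtt_to_text(vtt: str) -> str:
    """Illustrative only: strip WebVTT scaffolding and keep the spoken text.

    Hypothetical helper, not the bundled vtt2txt.
    """
    kept = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between cues
        if line.startswith("WEBVTT"):
            continue  # file header
        if "-->" in line:
            continue  # cue timing line, e.g. 00:00.000 --> 00:02.000
        if line.isdigit():
            continue  # numeric cue identifier (also drops digit-only speech)
        kept.append(line)
    # Join all cue text into one paragraph for the summarizer.
    return " ".join(kept)


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00.000 --> 00:02.000\nHello there.\n\n"
        "00:02.000 --> 00:04.000\nGeneral Kenobi.\n"
    )
    print(vtt_to_text(sample))
```

The sketch ignores NOTE blocks, styling, and inline tags, so treat it as a starting point rather than a full WebVTT parser.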
Forked from daveshap/RecursiveSummarizer. This fork uses fewer .txt files and takes two arguments: the input file and the output file.
It also creates a grand summary of the summaries at the end.
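The overall flow (split the text into chunks, summarize each chunk, then summarize the summaries) can be sketched roughly as follows. This is an illustration under assumptions, not this repo's actual code: `summarize_recursively`, `chunk_chars`, and the `summarize` callback are hypothetical names, and the callback stands in for whatever model call does the real summarizing.

```python
import textwrap
from typing import Callable, List, Tuple


def summarize_recursively(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for the model call
    chunk_chars: int = 4000,          # assumed chunk size, tune to your model
) -> Tuple[List[str], str]:
    """Return (per-chunk summaries, grand summary of those summaries)."""
    # Split the text into pieces of at most chunk_chars characters,
    # breaking on whitespace where possible.
    chunks = textwrap.wrap(text, width=chunk_chars)
    if not chunks:
        return [], ""
    # First pass: one summary per chunk.
    summaries = [summarize(chunk) for chunk in chunks]
    # Second pass: a single "grand summary" over all chunk summaries.
    grand_summary = summarize("\n\n".join(summaries))
    return summaries, grand_summary


if __name__ == "__main__":
    # Toy summarizer that just truncates, so the sketch runs offline.
    def toy(s: str) -> str:
        return s[:20]

    parts, grand = summarize_recursively("word " * 50, toy, chunk_chars=50)
    print(len(parts), "chunk summaries")
    print("grand summary:", grand)
```

Passing the summarizer in as a callback keeps the chunking logic testable without network access; in practice you would swap the toy function for a real model call.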
For processing, you can use the process_vtt.sh script.