YouTube did that already. These parts are marked in the (converted) SRT files, for example:
1
00:00:06,560 --> 00:00:12,780
[Music]

2
00:00:09,650 --> 00:00:15,360
hi I'm Ron Gilbert I'm here winning and

3
00:00:12,780 --> 00:00:17,640
this is our first stand-up meeting
You could write a script that keeps only the timestamp line directly above each “[Music]” line and removes everything else. That gives you the start and end times of the music.
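A minimal sketch of such a script in Python, assuming the converted files follow the usual SRT layout (cue number, timing line, then the caption text) and that music is always marked with the literal text "[Music]" in any capitalisation. The function name music_times and the sample text are only for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:06,560 --> 00:00:12,780".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
)


def music_times(srt_text):
    """Return (start, end) pairs for every cue whose text contains "[music]"."""
    results = []
    current = None  # timing of the cue currently being read
    for line in srt_text.splitlines():
        match = TIMING.search(line)
        if match:
            current = match.groups()
        elif current and "[music]" in line.lower():
            results.append(current)
            current = None  # do not report the same cue twice
    return results


# Sample taken from the excerpt above (assumed layout).
sample = """1
00:00:06,560 --> 00:00:12,780
[Music]

2
00:00:09,650 --> 00:00:15,360
hi I'm Ron Gilbert I'm here winning and
"""

print(music_times(sample))
```

For a real file you would read its contents first, for example with open("video.srt", encoding="utf-8").read(), where video.srt is a placeholder name.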
If you would like to have the SRT files, please send me a PM.