The (almost) Real Transcript Podcast #68 (off-topic again)

From YouTube I can export .vtt, .srt, and .sbv files.

:ring: :elf:

3 Likes

Ah, nice. With the SRT format we could skip a step. Via youtube-dl I only get vtt - but I don’t have to download it for each video separately.

@Sushi: Can you use the transcript (besides the Sauron part)? :slight_smile:

Yeah, I’ll be fine. I know @besmaller uses one of the formats to reinsert some time markers, though, or doesn’t he?

I’m at 33’ now. At my average rate of transcribing 10’ a day (2 to 3 hours), I should be done by Wednesday…ish.

3 Likes

Yes, but it’s one of the files I can download so I don’t need anything else.

As promised, here is my code. Please note that I used Linux:

# Get VTT subtitles:
youtube-dl --write-auto-sub --skip-download --output foo 'https://www.youtube.com/watch?v=xrgj-v8_nUo'

# Convert VTT to SRT format:
ffmpeg -i foo.en.vtt foo.srt

# Keep only the text (drop each cue number and the timestamp line after it):
sed -r '/^[0-9]+$/{N;d}' foo.srt > foo.txt

# Remove empty lines:
sed '/^\s*$/d' foo.txt > plain.txt

# Keep line 1 and every third line (3, 6, 9, ...):
awk 'NR == 1 || NR % 3 == 0' plain.txt > final.txt

This isn’t the “best” solution, I just used the commands I needed for the next step.
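For what it's worth, the two sed steps can be folded into one filter, so the intermediate foo.txt isn't needed. A minimal sketch, assuming GNU or BSD sed; the function name srt_to_text and the sample cues are made up for illustration and are not real YouTube output:

```shell
#!/bin/sh
# Sketch: both sed steps merged into one filter (function name is hypothetical).
# Reads SRT on stdin, writes only the subtitle text to stdout.
srt_to_text() {
    # 1st expression: drop a cue number plus the timestamp line after it.
    # 2nd expression: drop blank lines.
    sed -E \
        -e '/^[0-9]+$/{N;d;}' \
        -e '/^[[:space:]]*$/d'
}

# Hand-written sample input, not real YouTube output.
srt_to_text <<'EOF'
1
00:00:00,000 --> 00:00:02,000
hello there

2
00:00:02,000 --> 00:00:04,000
welcome to the podcast

EOF
```

This prints only the two text lines. One caveat that applies to the original sed command as well: a subtitle line consisting solely of digits would be mistaken for a cue number and dropped together with the line after it.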

The advantage of youtube-dl (and the other command-line tools) is that you can download and process several videos at once. You just have to replace the video URL with the URL of the video list (and wrap the commands in a little shell script :slight_smile: ).
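Such a "little shell script" could look roughly like this. It is only a sketch: the function names, the YTDL/FFMPEG variables and the %(id)s output template are assumptions made for illustration, and it expects the subtitles to arrive as .vtt files in the current directory, as in the single-video run. The "every third line" awk step is carried over unchanged, so it inherits whatever that step assumes about the auto-generated subtitles.

```shell
#!/bin/sh
# Sketch only: batch version of the single-video steps, for a playlist URL.
# All variable and function names here are hypothetical.
YTDL=${YTDL:-youtube-dl}   # override to point at a different binary
FFMPEG=${FFMPEG:-ffmpeg}

# Turn one .vtt file into an .srt and a .txt next to it,
# using the same ffmpeg/sed/awk steps as for the single video.
vtt_to_text() {
    base=${1%.vtt}
    "$FFMPEG" -loglevel error -y -i "$1" "$base.srt" || return 1
    sed -E -e '/^[0-9]+$/{N;d;}' -e '/^[[:space:]]*$/d' "$base.srt" |
        awk 'NR == 1 || NR % 3 == 0' > "$base.txt"
}

# Fetch the auto-generated subtitles for every video behind the URL,
# then convert each .vtt file found in the current directory.
process_playlist() {
    "$YTDL" --write-auto-sub --skip-download \
        --output '%(id)s.%(ext)s' "$1" || return 1
    for vtt in *.vtt; do
        [ -e "$vtt" ] || continue   # no subtitles were written
        vtt_to_text "$vtt"
    done
}

# Only do work when a URL is passed on the command line.
if [ "$#" -gt 0 ]; then
    process_playlist "$1"
fi
```

Saved under a name such as subs.sh (hypothetical), it would be run as `sh subs.sh PLAYLIST_URL` in an empty directory, leaving one .txt file per video.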

You can get the SRT format with youtube-dl too:

python3 youtube-dl --write-auto-sub --skip-download --sub-format srt 'https://www.youtube.com/watch?v=xrgj-v8_nUo'

but this doesn’t seem to work with the subtitles generated by YouTube (I haven’t investigated this further).

2 Likes

1 Like

Yes - for such things Linux is really great. :slight_smile:

Currently at 43 minutes (almost 2 AM now, time to hit my bed)
:sleeping:

2 Likes

Thank you, it’s helpful to see the steps you took, and especially the youtube-dl command-line options - I was hoping to avoid figuring those out on my own, since it’s a fairly robust program with a million options.

1 Like

I did it.
It’s done!
It’s long!
Longer than Discourse allows
In a single post at least!
You asked for it…
You got it:

7 Likes

How ironic!
First a post with “coming soon…” is deemed too short (less than 20 characters) and then 45900 characters is above its limit of 32000?!

2 Likes

Yes, Discourse is weird. But don’t forget that Discobot is still partying (hard). So such things can happen…

Oh, how I dread doing those 1.5-hour podcasts… like Captain Dread proportions

On the bright side, the ones of 10 minutes or less will be a breeze…

2 Likes

I’ve read the whole podcast transcript today. Really impressive, and thanks again.

When I have time, I want to listen to it again while reading the transcript…

1 Like

I might be able to help with that… I’m hoping to release a new podcast animation demo soon using this podcast.

6 Likes