Text Version of Podcasts

ZakPhoenixMcKracken · October 30, 2017, 8:17am

I love you guys!
And also I love AI

Someone · October 30, 2017, 10:24am

Great work! I will try to download the subtitles later (after work :)).

Sushi · October 30, 2017, 4:16pm

So, I should stop my manual work, I guess? Or perhaps finish one and see if some things can be improved in the auto-extract?
Note that I’m adding some annotations in the transcript, which no tool could do.

Sushi · October 30, 2017, 4:47pm

Wow. That is pretty bad (edit: for such an expensive tool), imo. Without proper punctuation and cleaning of speech patterns, stuttering etc., it is quite unreadable unless used as a subtitle.
It may look like actual text at first, but it is very tiring to make sense out of it.

besmaller · October 30, 2017, 4:48pm

Well, this process is just auto “captioning” for the video format of the podcast. I would see this as an “aid” to proper transcription, doing most of the word entry, but it should be edited to include speaker names and to clean up the occasional (now rarer) transcription errors.

If you are near completion with one or more podcasts, I would finish that work, and then we can compare the effort with someone working to clean up one of the auto-created transcripts, and decide the best way to go forward.

Sushi · October 30, 2017, 4:50pm

Great idea. I’m not a fast typist, so any base to start from will be useful. For the Friday Questions, I would make sure everyone’s unpronounceable nickname is written correctly by going back to the posts (and copy and paste the questions while I’m at it, of course).

besmaller · October 30, 2017, 4:53pm

Yes, it’s not bad for an automatic tool, but it has lots of limitations. I needed to double check which post you were replying to, and I see it was the Sonix version. The Google automatic captions are actually much better than the Sonix output, but they don’t do speaker identification, punctuation, etc.

BigRedButton · October 30, 2017, 5:21pm

If Sonix can write timestamps in the output, we could merge the speaker information from Sonix with the transcription from YouTube.
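
A rough sketch of how such a merge might work, assuming both sources can be reduced to timestamped tuples. The tuple layouts, the function names `overlap` and `label_captions`, and the sample times are all made up for illustration; they are not Sonix's or YouTube's actual export formats. Each caption gets the speaker whose turn overlaps it the longest.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return how many seconds two time ranges share (0.0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_captions(captions, turns):
    """Attach a speaker name to each caption.

    captions: list of (start_seconds, end_seconds, text)      -- hypothetical layout
    turns:    list of (start_seconds, end_seconds, speaker)   -- hypothetical layout
    """
    labelled = []
    for start, end, text in captions:
        # Pick the speaker turn that shares the most time with this caption.
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]), default=None)
        if best is None or overlap(start, end, best[0], best[1]) == 0:
            speaker = "unknown"
        else:
            speaker = best[2]
        labelled.append((speaker, text))
    return labelled


# Made-up speaker turns and caption timings, for illustration only.
turns = [(0.0, 12.0, "Ron"), (12.0, 20.0, "Gary")]
captions = [
    (9.65, 14.13, "hi I'm Ron Gilbert"),
    (11.61, 16.949, "I'm Gary winning and this is our first"),
]
for speaker, text in label_captions(captions, turns):
    print(f"{speaker}: {text}")
```

Since caption time ranges can straddle a speaker change, a "longest overlap" rule like this will still mislabel some lines, so the result would need a manual pass anyway.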

Someone · October 30, 2017, 5:33pm

It’s working.

youtube-dl is able to download all subtitles from all videos in the playlist. The results are WebVTT files that look like this:

WebVTT

WEBVTT
Kind: captions
Language: en
Style:
::cue(c.colorCCCCCC) { color: rgb(204,204,204);
 }
::cue(c.colorE5E5E5) { color: rgb(229,229,229);
 }
##

00:00:06.550 --> 00:00:11.610 align:start position:19%
[Music]

00:00:09.650 --> 00:00:14.130 align:start position:19%
hi<c.colorCCCCCC><00:00:10.650><c> I'm</c><00:00:10.889><c> Ron</c><00:00:11.099><c> Gilbert</c></c>

00:00:11.610 --> 00:00:16.949 align:start position:19%
I'm<c.colorE5E5E5><00:00:11.880><c> Gary</c><00:00:12.150><c> winning</c><00:00:12.480><c> and</c><00:00:12.780><c> this</c><00:00:13.530><c> is</c><00:00:13.650><c> our</c><00:00:13.799><c> first</c></c>

00:00:14.130 --> 00:00:18.449 align:start position:19%
stand-up<c.colorE5E5E5><00:00:15.000><c> meeting</c><00:00:15.360><c> podcast</c><00:00:15.929><c> and</c><00:00:16.350><c> a</c><00:00:16.529><c> stand-up</c></c>
...

I can convert these files into the SRT format that looks like this:

SRT

1
00:00:06,550 --> 00:00:11,610
[Music]

2
00:00:09,650 --> 00:00:14,130
hi I'm Ron Gilbert

3
00:00:11,610 --> 00:00:16,949
I'm Gary winning and this is our first

4
00:00:14,130 --> 00:00:18,449
stand-up meeting podcast and a stand-up

5
00:00:16,949 --> 00:00:21,720
meeting is the thing a project usually

6
00:00:18,449 --> 00:00:24,810
has in the mornings where everybody on

7
00:00:21,720 --> 00:00:27,029
the team gets together and talks very
...
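
For anyone who wants to reproduce the WebVTT-to-SRT step, here is a minimal sketch in Python. It is not necessarily the conversion used for the files above. It assumes cues shaped like the sample shown (a timing line followed by text lines, with inline `<c>` and timestamp tags), and the function name `vtt_to_srt` is just an illustrative choice.

```python
import re

# Matches a cue timing line such as
# "00:00:06.550 --> 00:00:11.610 align:start position:19%"
VTT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})"
)
# Matches inline tags such as <c>, </c>, <c.colorCCCCCC> and <00:00:10.650>
INLINE_TAG = re.compile(r"<[^>]*>")


def vtt_to_srt(vtt_text):
    """Convert WebVTT text to SRT, dropping the header and inline tags."""
    cues = []
    current = None  # [start, end, text_lines] of the cue being read
    for line in vtt_text.splitlines():
        match = VTT_TIMING.match(line)
        if match:
            # SRT uses a comma instead of a dot before the milliseconds.
            start = f"{match.group(1)},{match.group(2)}"
            end = f"{match.group(3)},{match.group(4)}"
            current = [start, end, []]
            cues.append(current)
        elif current is not None:
            text = INLINE_TAG.sub("", line).strip()
            if text:
                current[2].append(text)
            else:
                current = None  # a blank line ends the cue

    # Skip cues that ended up with no text, then number the rest from 1.
    cues = [cue for cue in cues if cue[2]]
    blocks = []
    for number, (start, end, lines) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{start} --> {end}\n" + "\n".join(lines) + "\n")
    return "\n".join(blocks)
```

Everything before the first timing line (the header and the style block) is ignored, and the positioning settings after the timestamps are dropped because SRT has no equivalent.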

A little sed command converts this to plain TXT:

TXT

[Music]
hi I'm Ron Gilbert
I'm Gary winning and this is our first
stand-up meeting podcast and a stand-up
meeting is the thing a project usually
has in the mornings where everybody on
the team gets together and talks very
very briefly about what's going on in
the project what happened what's gonna
happen today and what happened yesterday
usually the meetings are stand up just
so they're quick and they don't drag on
so we're gonna do the stand-up meeting
podcasts they're probably gonna last
less than five minutes and we'll
...
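
The same SRT-to-text step can also be sketched in Python instead of sed. This is only an illustration of the idea (drop the cue numbers, timing lines and blank lines), not the actual sed command, and the name `srt_to_text` is made up.

```python
import re

# Matches an SRT timing line such as "00:00:06,550 --> 00:00:11,610"
SRT_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt_text):
    """Keep only the caption text from SRT input, one caption line per line."""
    kept = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, cue numbers and timing lines.
        # Caveat: a caption consisting only of digits would be dropped too.
        if not stripped or stripped.isdigit() or SRT_TIMING.match(stripped):
            continue
        kept.append(stripped)
    return "\n".join(kept)


srt_sample = """1
00:00:06,550 --> 00:00:11,610
[Music]

2
00:00:09,650 --> 00:00:14,130
hi I'm Ron Gilbert
"""
print(srt_to_text(srt_sample))
```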

Which file format do we need? All?

There are several tools to edit and translate SRT files. So theoretically someone could translate the podcast into, let’s say, Italian, and then we could re-import the subtitles into the video. Then everyone could listen to the English podcast with Italian subtitles. But that would be a huge amount of work, of course.

milanfahrnholz · October 30, 2017, 5:42pm

I always knew it. While Gary doesn’t show himself too much publicly, he secretly does all of the winning. So much winning!

Someone · October 30, 2017, 5:47pm

So, you would pay the Sonix service?

I thought the same. Although I liked the PocketSphinx interpretation more, for example: “brenda our red phantom asks how much of the boot park was playboy wire frame before the real art was added.”

BigRedButton · October 30, 2017, 5:59pm

No, I forgot about the costs. Sorry.

ZakPhoenixMcKracken · October 30, 2017, 6:10pm

I hope so.

LowLevel · October 30, 2017, 7:57pm

Can we quantify the Sonix costs? How many hours of podcasts are there?

RonGilbert · October 30, 2017, 8:12pm

I like the idea of this. Once the captioning issues are figured out, I’d be happy to put them on the official YouTube channel if that’s desired.

besmaller · October 30, 2017, 8:22pm

There are approximately 21 hours of audio in the podcasts. At $8/hr that’s $168. The service also costs $15/month on top of that. It would require 1 month at least. The service has a nice editing tool as well as the transcription. However, it’s clear the transcription quality is substantially worse than Google’s, and I don’t know how easy it would be to merge data. Also, looking at the one example I posted earlier, it missed speaker transitions quite a bit, especially with more than 2 speakers.

The service offers a free trial with one hour of audio transcription. If someone wanted to, they could run one of the podcasts shorter than an hour (Podcast #1, for example, is only about 14 minutes, I think) through the free trial and see how well it works, along with the online editing. Theoretically, we could import the transcription made by Sonix into the YouTube video captions, assuming the data is represented with a timestamp format.
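
To make the cost estimate above easy to redo with different numbers, here is the arithmetic as a tiny Python sketch. The figures are the ones quoted in this post; the variable names are just for illustration.

```python
audio_hours = 21        # approximate total length of the podcasts
rate_per_hour = 8       # transcription price in dollars per hour of audio
monthly_fee = 15        # subscription price in dollars per month
months_needed = 1       # at least one month would be required

transcription_cost = audio_hours * rate_per_hour
total_cost = transcription_cost + monthly_fee * months_needed

print(transcription_cost)  # prints 168
print(total_cost)          # prints 183
```

So with one month of subscription the total would come to about $183.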

Nor_Treblig · October 30, 2017, 8:46pm

21 hours? This can’t be right, you are missing half an hour!

besmaller · October 30, 2017, 8:50pm

Good point! I’ll add these tonight, I’m curious how the Google transcription will work…

BigRedButton · October 30, 2017, 9:06pm

I think I have found the name of what we are discussing: Speaker diarisation.

There are some open-source speaker recognition tools out there. Though, I’m not sure about their functionality.

Here is a link to one tool which seems to provide diarization. The “Tutorial for LIA_SpkSeg — Top-down Speaker Segmenting and Clustering System” may be interesting for us.

ZakPhoenixMcKracken · October 30, 2017, 10:31pm

Ahah, thank you for your mention, but I don’t think I deserve it. Maybe as a “bonus file”

…with an English pronunciation with a heavy Italian accent. It’s a tough challenge for the AI!
