I have this crazy idea. Since we now have a process for creating transcripts of the podcasts, I think we have enough raw data to pull it off. @Sushi has taken the autogenerated transcripts for the first 5 podcasts, identified the speakers, cleaned up the text, and added references as well (nice work).
For the video versions of the podcasts, I was thinking we could create animated “heads” of the speakers, with mouth movements. We can use the avatars for Ron, Gary, Dave, and the other devs; we just need to create the 8 or so images for the different mouth positions, and use the Rhubarb Lip Sync tool from @DanielWolf to extract the timing and which mouth position to use at each moment.
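To get a feel for the Rhubarb step, here is a rough sketch of the glue code I have in mind. I’m assuming the rhubarb executable is on the PATH and that its JSON export is a list of mouth cues with start/end times and a shape letter (that’s how I read the docs, but it’s worth checking against a real run before building on it; the file names are placeholders):

    import json
    import subprocess

    def get_mouth_cues(audio_path, dialog_path=None):
        # Run Rhubarb on one audio file and return (start, end, shape) tuples.
        cmd = ["rhubarb", "-f", "json", audio_path]
        if dialog_path:
            # Giving Rhubarb the transcript text is supposed to improve accuracy.
            cmd += ["--dialogFile", dialog_path]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        data = json.loads(result.stdout)
        # Each cue: start time (s), end time (s), mouth shape letter (A-F plus G/H/X).
        return [(c["start"], c["end"], c["value"]) for c in data["mouthCues"]]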
This should theoretically be all automated once we have the final transcripts available, which identify the times different speakers start speaking.
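The transcript parsing side depends on the exact format @Sushi settled on, so this is just a guess at something like “[00:01:23] Ron: some line of dialog”; the regex would need to change to match the real files:

    import re

    LINE_RE = re.compile(r"\[(\d+):(\d+):(\d+)\]\s+(\w+):\s*(.*)")

    def parse_transcript(path):
        segments = []  # (start_seconds, speaker, text)
        with open(path, encoding="utf-8") as f:
            for line in f:
                m = LINE_RE.match(line.strip())
                if m:
                    h, mnt, s, speaker, text = m.groups()
                    segments.append((int(h) * 3600 + int(mnt) * 60 + int(s), speaker, text))
        return segments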
We need some sort of 2D animation engine to put the pieces together, and I’m thinking of trying the Python library Pygame as a starting point. The pieces I don’t have readily available yet are the avatar images (we should be able to clip those from the blog) and the mouth images. We may need a few sets of these to match the mouth styles on the different avatars, so we’ll probably want help from someone who is better at drawing than I am.
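To show what I mean by putting the pieces together in Pygame, here’s a first stab at the rendering step: draw each frame off-screen, save it as a PNG, and stitch the PNGs together with the audio afterwards. The image file names, the 640x360 canvas, and the blit offsets are all placeholders, and this only animates a single head so far:

    import pygame

    FPS = 24
    WIDTH, HEIGHT = 640, 360

    def render_frames(duration, mouth_cues, out_dir="frames"):
        pygame.init()
        surface = pygame.Surface((WIDTH, HEIGHT))
        head = pygame.image.load("ron_head.png")      # avatar clipped from the blog
        mouths = {shape: pygame.image.load(f"mouth_{shape}.png")
                  for shape in "ABCDEFGHX"}           # the 8 or so mouth images
        cue_index = 0
        for frame in range(int(duration * FPS)):
            t = frame / FPS
            # Advance to the mouth cue covering the current time (cues are sorted).
            while cue_index + 1 < len(mouth_cues) and mouth_cues[cue_index][1] <= t:
                cue_index += 1
            shape = mouth_cues[cue_index][2]
            surface.fill((40, 40, 40))
            surface.blit(head, (100, 50))
            surface.blit(mouths[shape], (130, 120))   # mouth position within the head
            pygame.image.save(surface, f"{out_dir}/frame_{frame:05d}.png")
        pygame.quit()

Something like “ffmpeg -framerate 24 -i frames/frame_%05d.png -i podcast.mp3 -c:v libx264 -pix_fmt yuv420p out.mp4” should turn the frames plus the podcast audio back into a video, though I haven’t tested that exact command line.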
My thought is we could identify all the speakers for a podcast ahead of time, put them in equally spaced locations on the screen, and then animate their mouths as they speak. I was also thinking we could have the other heads look in the direction of the current speaker using a few different eye images. In addition, it would be fairly easy to drop text onto the screen when a reference is mentioned in the podcast.
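For the layout and eye part, the math is simple enough that a couple of helpers should cover it. This just spaces the heads evenly across the canvas and picks an eye image based on which side the current speaker is on (the eye image names are made up):

    def head_positions(num_speakers, width=640, y=80):
        # Evenly spaced x positions across the screen, all at the same height.
        spacing = width // (num_speakers + 1)
        return [((i + 1) * spacing, y) for i in range(num_speakers)]

    def eye_image_for(head_x, speaker_x):
        # The current speaker looks straight ahead; everyone else looks toward them.
        if head_x == speaker_x:
            return "eyes_forward.png"
        return "eyes_left.png" if speaker_x < head_x else "eyes_right.png"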
The beauty of this is that it could potentially be fully automated, so while it might take a while to get working for the first podcast, the script could then run automatically for the remaining ones.