We’re gross folks…
Demo 3 now available. I have the dev avatars from the spritesheets provided by Ron now, and addressed the small text size issue (still tweaking the way captions are done, comments are welcome).
First of all: Really great work!
But I have to admit that I was a little bit “confused” by the fact that the heads are static and don’t move. It’s not about moving the eyes, it’s about the whole head.
Is it possible that there is a delay that affects the synchronization? For example, at the beginning I see Ron moving his mouth before the speech begins. I can’t tell whether that delay persists beyond that point or if it’s just something that happens at the beginning.
About the mouth shapes chosen by Rhubarb Lip Sync: sometimes I have the impression that they don’t match the spoken words very well, especially for Gary. Are you giving Rhubarb just the audio file, or also the text file via the --dialogFile option?
That’s good feedback. Moving the heads in a realistic way might be a bit complex. I’ll play some Thimbleweed Park and see if I can get some ideas on that.
The problem is that they don’t move the head in the game. AFAIR Ron, David and Gary are just looking straight ahead out of the display. So we don’t have any sprites or animations that you could use. You could only try to “squeeze” or “scale” the heads. But maybe the movement of the eyes brings some more “life” into the faces.
Actually, they dance. Depending on the dance, we might have other positions of the heads.
There are a few issues I’m still working on. One is that Rhubarb is detecting mouth movement during the initial music. The other is that the YouTube auto-sync to the transcript is also mistiming the first caption (this might be for the same reason). I’m trying to find a good automated solution to this. At first I ignored all Rhubarb data until the first caption timing from YouTube, but that didn’t work in this case, as they both had an issue. For now I’ve added a ‘hack’ to the script that ignores all captions and mouth movement for the first 10 seconds of the podcast. But that may not work well for all podcasts, and it’s still not perfect (as you noticed). What I might have to do is edit the Rhubarb data to remove all the mouth movements detected during the intro music, but that’s another manual step in the process I’d really like to avoid.
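The 10-second hack could be done as a post-processing pass over the Rhubarb data instead of hand-editing it. A minimal Python sketch, assuming the cues have already been parsed from Rhubarb’s TSV output (time/mouth-shape pairs); the 10-second cutoff value and the function name are my own invention:

```python
# Sketch: neutralize Rhubarb mouth cues that fall inside the intro music
# by forcing them to the closed-mouth shape "X".
# INTRO_SECONDS is an assumed per-podcast setting, not something Rhubarb provides.

INTRO_SECONDS = 10.0

def filter_intro_cues(cues, intro_end=INTRO_SECONDS):
    """cues: list of (time_in_seconds, shape) parsed from Rhubarb's TSV output."""
    return [(t, "X" if t < intro_end else shape) for t, shape in cues]

cues = [(0.00, "X"), (2.35, "B"), (9.80, "D"), (12.10, "C")]
print(filter_intro_cues(cues))
# the cues before the 10-second mark become "X"; later ones are untouched
```

This keeps the raw Rhubarb output untouched, so the cutoff can be tuned per podcast without re-running the (slow) lip-sync analysis.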
I don’t think the mouth movements are mis-synchronized in general; it’s just a glitch at the beginning of the podcast, during the intro music. It seems well aligned throughout the rest of the podcast.
The other improvement I need to implement is to provide Rhubarb with the optional “guiding” dialog text. Rhubarb works from the audio, doing its own detection of the words spoken, but it allows you to provide a secondary file with the text of the audio to help guide it along. I need to parse out a version of the transcript file (pulling out all but the spoken words) to feed into Rhubarb. Note that the Rhubarb lip sync (at least on my computer, running 8 threads in parallel) takes a while to run (longer than the runtime of the podcast). It doesn’t run with the animation; it’s all a preprocessing step, which makes it slow to iterate on.
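Stripping the transcript down to the spoken words could look something like this. This is a rough sketch: the “Speaker: text” and “[annotation]” conventions are my assumptions about the transcript layout, not a documented format:

```python
import re

# Sketch: reduce an annotated transcript to plain spoken text, suitable for
# Rhubarb's --dialogFile option. Assumes lines like "Ron: some words [laughs]".

def transcript_to_dialog(lines):
    spoken = []
    for line in lines:
        line = re.sub(r"\[.*?\]", "", line)          # drop [annotations]
        line = re.sub(r"^\s*\w+\s*:\s*", "", line)   # drop "Speaker:" prefixes
        line = line.strip()
        if line:
            spoken.append(line)
    return " ".join(spoken)

print(transcript_to_dialog(["Ron: Hello there. [laughs]", "Gary: Hi Ron."]))
# → "Hello there. Hi Ron."
```

Since Rhubarb only uses the dialog file as a hint, an imperfect extraction here should degrade gracefully rather than break the sync.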
From what I remember from the spritesheets, I don’t have any alternate front-facing head positions to use. I could move the heads up and down and scale them, but that might just end up looking weird. I will implement eye movement and maybe simulated blinking as well, which may help with this issue, and we’ll go from there.
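For the simulated blinking, one simple approach is to precompute a blink schedule alongside the other animation data. A minimal sketch; the 2–6 second interval and 0.15 s blink length are guesses at natural-looking values, not measured data:

```python
import random

# Sketch: generate (start_time, duration) pairs for eye-blink events
# across the length of a podcast. Interval and duration values are invented.

def blink_schedule(total_seconds, rng=None):
    rng = rng or random.Random(42)  # fixed seed for repeatable output
    t, blinks = 0.0, []
    while True:
        t += rng.uniform(2.0, 6.0)  # wait 2-6 s between blinks
        if t >= total_seconds:
            break
        blinks.append((round(t, 2), 0.15))  # each blink lasts ~0.15 s
    return blinks

print(blink_schedule(20.0))
```

Randomizing the interval avoids the robotic look of perfectly periodic blinks; desynchronizing the seed per character would keep the three heads from blinking in unison.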
(replying to Han Solo message from Zak…)
This is just before he shoots first, right… I wonder if there’s a hidden message there.
I was tempted to reply to this with a snarky but character-inversion appropriate (and to me, hilarious), “I know”, but I was afraid it would come across wrong.
I have checked. In the game there are also side views of the heads (including the different mouth shapes), so in theory it would be possible to alternate between the front view and the side view. For example, a non-speaking character could turn their head to watch the speaking character.
But I wonder if we really need this, because it’s beginning to look like a lot of additional work for you.
I like hearing the ideas nonetheless. Some things might be easier than others, but much of this polish will take a back seat to getting the full automation flow going, so I can just press a button for each podcast. That’s my first priority.
This is a fun project for me, and as long as it remains mostly fun and educational, I can certainly make a strong argument to myself for making time for it.
The other person really putting a lot of time into this whole effort is @Sushi. That work is a key element of these animated podcasts, because it provides the critical data of who is speaking and the accurate spoken text. He’s edited, cleaned up, and annotated the first 7 podcasts so far. I’m hoping to leverage more of the items he’s added, such as annotations and “best of” highlights, in the animations at some point. I’m guessing he’s put many hours of effort into the transcripts. His effort has certainly inspired me. Hopefully we can get others to help with the podcast transcriptions, myself included.
Actually, in the Kickstarter video, Gary’s head wobbles while he talks. But personally, I’d prefer a simple bobbing, which of course means you should add static shoulders and necks.
Ahahah, sorry, it was love for the work and the effort you put in, nothing personal.
Yes, it’s the option I mentioned in my post. I wonder whether providing the text to Rhubarb would also help with the issues at the beginning of the video (though it could also make things worse, because Rhubarb might try to match the first written words against the music audio).
On weekends, I could help with polishing the captions created by YouTube and attributing the sentences to the correct speakers, but I wouldn’t be able to add interesting notes the way Sushi does.
@Sushi : do you think that the kind of activity that I just described could help you in any way?
But the heads would be still static. I’m not sure if this would be worth the effort.
Yes, I agree!
What kind of movements would you like the characters to make?
Like in Monkey Island 2 (the characters “nod” with the head while speaking):
Oh, then I can do that. I could take the “straight” versions of the heads and rotate them, like in MI2.