I spent the last ~1.5 weeks working on a project I thought you might be interested in.
I loved Thimbleweed Park and was very excited when Delores was released (I finished the game in one sitting after I had seen the announcement on Twitter). I really liked how much the voice acting added to the original game, and I had recently seen people using neural networks for vocal synthesis, so I extracted the files from both games, stared at the assembly code for a while, and started torturing my poor graphics card.
Here you can see the result (Delores sounds like a drunk robot sometimes):
You might be asking yourself: Why did I not use the source code that was released a few days ago? *sigh*
I had never used IDA before or done much reverse engineering. On my second sleepless night of stepping through the assembly and injecting my own code I was finally able to capture the dialogue in real time. I took a break and scrolled through my Twitter feed - that’s when I saw the announcement that the source code had been released.
But since I had spent so much time working on my hacky method I decided to finish it (which was nice because I ended up learning a bunch of new things along the way).
Since you say you used assembly I suppose there won’t be much overlap, but this is somewhat similar to what I want to do with cat noises. Was the sound generated in advance here?
Yup! You can see an example at the end of the video. I loved the “friends” part too, I thought it sounded like Delores got a southern accent all of a sudden.
I saw your project, it looks really cool!
#1: I might make a detailed video soon. I basically extracted the audio and text files for the voiced lines from the original game and fed them to a neural network so it could learn to speak like Delores. I extracted the Delores files as well and let the AI read them. I then looked through the assembly code, found the spot where the dialogue is drawn onto the screen, and wrote my own library that I injected into that function. The library takes the dialogue line, looks up the correct audio file for it (previously generated by the neural network), and plays it. You can see the lines that it didn’t find audio for in the command window on the left (commands and other actors’ lines).
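If anyone is curious, the lookup step could be sketched roughly like this. This is a minimal Python sketch for illustration only - the actual injected library was native code, and the directory name and hashing scheme here are my assumptions, not the real implementation:

```python
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical location of the pre-generated WAV files.
AUDIO_DIR = Path("generated_audio")

def normalize(line: str) -> str:
    """Collapse case and whitespace so the same dialogue line
    always maps to the same key."""
    return " ".join(line.lower().split())

def audio_path_for(line: str) -> Optional[Path]:
    """Map a dialogue line to its pre-generated audio file.

    Returns None when no audio exists - e.g. engine commands or
    other actors' lines, which show up unplayed in the log window.
    """
    key = hashlib.sha1(normalize(line).encode("utf-8")).hexdigest()
    path = AUDIO_DIR / f"{key}.wav"
    return path if path.exists() else None
```

In the real thing, something like this would run inside the hooked dialogue function and hand the path off to an audio playback call.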
Oh, I admit I didn’t watch the whole 14 minute thing. The end was cool. I was thinking along the lines of doing something similar to what I did with the text, basically in parallel with it.
Yes, please, do that.
And what about the neural network? Where does she come from? Does she live with you? Do you feed her? Does she grow well? What about her political ideas? Does she like pizza?
I mean, the scary part is that soon ANYBODY can make YOU say WHATEVER they want.
She’s really hungry most of the time; she usually eats 5 GB of my GPU memory…
Yup, that is pretty scary… While making this I kept wondering whether it was morally wrong, but I decided it was OK for demonstration purposes. What scares me is that if I were a big company with the resources to actually make this sound good, I could just stop hiring actors and make money off their voices.
Hmmm… Definitely scary. An actor could probably sue you for unauthorized use of his/her voice, at least if you make money off it. Or it will be possible in the near future. It’s a more advanced form of piracy, and increasingly difficult to fight.
Unfortunately I know nothing about neural networks (that’s something I’d love to learn, but who has time?), but I suppose there might be a way of changing the tone of a recording so that it sounds like the training data.
I mean, there are algorithms that are able to replicate voices, but I’ve always seen them work in the same language as the training data.
But here it would be something different, it’s not replicating the speech, it’s changing the tone.
It all depends on what rights you have to the actor’s work. Sometimes you have a “buyout”, which means you can do anything you want with the recordings; more common is that you can do anything you want, but only for that game, movie or TV show. Actors know that AI is coming and the contracts are starting to deal with that.
All that legal mumbo-jumbo aside, it’s not something I would ever do, out of respect for their talent, any more than I would train AI on someone’s music or art.
If you were going to record them to make a canon of audio to run AI against, you’d have to be upfront and pay them accordingly.
Interesting, I need to look at this topic. I’m a sound designer and I would love to mess with this stuff. Do you need to know much about coding or are there tools for this like for video fakes? I’m asking about the AI speech synthesis, not putting it in the game.
You don’t really need to know much about coding but the process is still a bit more difficult than it was with the deepfake tools. I used Tacotron-2, you can find some implementations online.
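To give you an idea of the data-prep side: most open-source Tacotron-2 implementations train on LJSpeech-style input, i.e. a folder of short WAV clips plus a pipe-separated transcript file with one “id|transcript” line per clip. A rough sketch (the clip ids and transcripts below are made up for illustration):

```python
# Sketch of writing an LJSpeech-style metadata file for Tacotron-2
# training. The ids and lines are invented examples, not real game data.

def write_metadata(pairs, out_path="metadata.csv"):
    """pairs: iterable of (wav_id, transcript) tuples.
    Writes one 'id|transcript' line per clip."""
    with open(out_path, "w", encoding="utf-8") as f:
        for wav_id, text in pairs:
            f.write(f"{wav_id}|{text}\n")

write_metadata([
    ("delores_0001", "Hi, I'm Delores."),
    ("delores_0002", "Welcome to Thimbleweed Park."),
])
```

The bulk of the work is getting clean, aligned audio/text pairs out of the game files; once the data is in this shape, the training scripts mostly run themselves (for many hours, on a hungry GPU).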