My friends trained a computer to speak in a completely humanlike voice and it sounds better than Google’s leading text-to-speech engine WaveNet
Would you believe me if I told you that the voiceover in this video was not produced by a human?
Seriously. Go give it a listen. I’ll wait…
Did you listen to it yet?
You've probably guessed it by now, but, yep… that entire two-minute video was narrated using artificial intelligence.
As in…a human did not speak a word in that video. A machine learning program generated the entire narration.
You just heard a demo of one of the most advanced text-to-speech programs in existence today…built by two guys in Seattle you've never heard of.
Their program “reads” sentences off of a script and produces the sounds that it believes a human would make if they were reading the same sentences aloud.
(Okay that’s a little vague, but you get the general idea…)
The result is what you just heard: a humanlike voiceover track generated entirely using artificial intelligence.
Pretty cool, right?
Their technology rivals the bleeding-edge research being done at companies like Google.
Seriously, here's a side-by-side demo of Google’s best commercially available WaveNet voice reading the same paragraph as two voices from WellSaid Labs, my friends' company: WellSaid vs. Google.
Google is working on better voices (look up “Tacotron 2”), but they're not commercially available yet. Also, when you read the “fine print” in the research papers themselves, you’ll realize that their technology has far more limitations than meets the…ear.
“So, all of that’s cool, but what would people even use this technology for?”
Text-to-speech technology isn’t new, but text-to-speech technology that doesn’t sound like somebody repeatedly hitting a dumpster with a baseball bat is.
There are tons of exciting use cases for a text-to-speech service that can cross the “uncanny valley” and actually speak like a real person.
My friends are focused on giving video and content producers access to instant, cheap, AI-generated voiceovers.
But that’s just one use case… there are tons more exciting things you can do with this stuff.
There are “serious” use cases like:
Helping visually-impaired people by reading just about any written content in an enjoyable, humanlike voice
Replacing the crappy Hawking-like voice with a “real” voice for people who have lost their ability to speak
Creating a fully automated call center that can ask questions and have dynamic conversations with customers
Replacing all of the automated reminder / alert systems in airports, train stations, etc. with a voice that does not sound like death itself
And there are plenty of fun use cases:
Instantly convert any blog post on the internet into a mini-audiobook narrated by a professional “voice artist”
Add a studio-quality voiceover to your home movies (you could have the movie trailer voice guy narrate your home movie)
Have Morgan Freeman read you any book on your Kindle…or your emails, or your texts
If you’re an indie game developer, you could add hundreds of different voiceovers to your game for a fraction of the cost of traditional voice artists
There are tons more use cases, but you get the idea…
"Cool. This is pretty wild. What can I do with this knowledge?"
Errr, well, I mostly just thought this was interesting and that lots of the Stew’s Letter readership would find this sort of thing intellectually stimulating.
But, more practically:
If you want to play around and generate some AI voiceovers, their beta is live and you can request access here (click "Join Beta")
If you’re a programmer, they are hiring (see "WellSaid Labs" postings)
If you’re just curious about the technology, they post lots of demos on their YouTube page
The future is here… and it’s far more wonderful and weird than any of us probably imagined.
P.S. This post is an excerpt from Stew’s Letter, a weekly-ish newsletter for curious people. You can check it out here.