Roger Ebert using software to find his lost voice
After losing his voice following surgery, the film critic is using a new kind of text-to-speech software to communicate in a voice that sounds just like his.
Although he lost his voice to cancer surgery, Roger Ebert is sounding like his old self thanks to some innovative software.
The famous film critic, known for his spirited debates with the late Gene Siskel on their "At the Movies" show, has survived a difficult few years.
Diagnosed with thyroid cancer in 2002, Ebert underwent a series of operations that eventually robbed him of his voice and lower jaw, taking away his ability to speak, eat, and drink. To communicate with the outside world, he has relied on traditional text-to-speech (TTS) software that speaks whatever he types.
But traditional TTS software is far from perfect. The voice that comes out of the computer can sound robotic and mechanical. One of the best-known examples is probably the audio system used by famed physicist Stephen Hawking. Voices that use an accent for added flair--Ebert initially tried a British voice--often mispronounce words and are still hard to understand.
Then one day, as Ebert writes on his Web site, he was surfing the Web and discovered a site for a company called CereProc with a new kind of TTS software, one that builds voices based on a person's actual recordings.
After initially ruling out audio from his film review broadcasts, Ebert eventually sent CereProc the original commentary tracks that he recorded for the DVD releases of such films as "Casablanca," "Citizen Kane," and even "Beyond the Valley of the Dolls." CereProc first transcribed the audio recordings and then tweaked and programmed them into a final voice pattern for Ebert to beta test.
Matthew Aylett, CereProc's chief technical officer, described in an interview with NPR how the voices are created. The audio is cut up into individual sounds, according to Aylett. For example, the word "cat" would be split into three distinct sounds: /k/, /a/, and /t/. Those sounds, known as phonemes, are then rearranged and put back together to create whole words. Though it might seem that a huge amount of recorded audio would be needed, Aylett pointed out that the English language contains only around 45 different sounds, which can be combined to cover a very full vocabulary.
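As a rough illustration of the cut-and-reassemble idea Aylett describes, here is a minimal Python sketch. It is not CereProc's software: the tiny word list, the phoneme symbols, and the stand-in "recordings" (short lists of numbers in place of real audio clips) are all assumptions made up for this example.

```python
# Toy pronunciation lexicon (hypothetical; a real system would use a
# full pronunciation dictionary covering the whole vocabulary).
LEXICON = {
    "cat": ["k", "a", "t"],
    "act": ["a", "k", "t"],
    "tack": ["t", "a", "k"],
}

# Stand-in "recordings": each phoneme maps to a short list of sample
# values. In a real voice these would be audio snippets cut from the
# speaker's recordings.
UNITS = {
    "k": [0.1, 0.2],
    "a": [0.5, 0.6, 0.5],
    "t": [0.3],
}


def to_phonemes(word):
    """Look up the phoneme sequence for a word in the toy lexicon."""
    return LEXICON[word.lower()]


def synthesize(word):
    """Concatenate the stored unit for each phoneme into one sample list."""
    samples = []
    for phoneme in to_phonemes(word):
        samples.extend(UNITS[phoneme])
    return samples


if __name__ == "__main__":
    print(to_phonemes("cat"))
    print(synthesize("tack"))
```

The same three stored sounds are enough to build "cat," "act," and "tack," which is the point Aylett makes about a small set of sounds covering a large vocabulary. A real system would also have to smooth the joins between sounds and adjust pitch and timing, which this sketch ignores.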
CereProc offers demos of its TTS technology on its Web site, where people can type in text and hear it spoken in voices with different accents, including those of President Obama, California Gov. Arnold Schwarzenegger, and former President Bush.
Ebert said that he used the voice on Friday for a taping of Tuesday's Oprah Winfrey Show, where he and his wife, Chaz, recorded a segment to predict the Oscar winners. Though a couple of kinks still need to be worked out, he seemed quite pleased to hear his own voice again.
"Yes, 'Roger Jr.' needs to be smoother in tone and steadier in pacing, but the little rascal is good," wrote Ebert on his Web site. "To hear him coming from my own computer made me ridiculously happy."
Ebert said he could use Roger Jr. for Webcasts, and even a new movie review show he's producing, though he said he "won't be one of the two critics." But it sounds like his greatest joy will be to use the voice to talk to Chaz and his grandchildren once again.