Microsoft revs speedier, smarter speech recognition for phones

Using a novel method that mimics brain functions, Microsoft researchers say they've doubled the speed at which speech is recognized and results are returned. They've also improved accuracy by 15 percent.

Jay Greene
Microsoft chief executive Steve Ballmer at the Windows Phone 8 launch on October 29, 2012 in San Francisco. Josh Miller/CNET

To chip away at the wide lead that rivals Google and Apple have in the mobile phone market, Microsoft is tapping its vast research unit to help improve speech recognition for folks who speak their text messages or use their voice to search the Web.

Microsoft researchers say they have come up with a novel approach to boost the accuracy of speech recognition and rev up the speed at which results are rendered by creating a computational model that mimics the way the brain works. By applying so-called deep neural networks to speech recognition, Microsoft researchers claim that users in the United States, composing a text message or searching via Bing with their voices, will see results twice as fast as they did with Microsoft's previous technology. And the researchers say accuracy has improved by 15 percent.

"For a normal sentence, you will have one less word to correct," said Michael Tjalve, a senior program manager in the speech technology group at Microsoft.

Microsoft's internal testing found that the word-error rate fell from 16 percent to 13.5 percent. Error rates can vary, depending on everything from background noise to the quality of the microphone in the mobile phone.
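
For readers who want to see how those figures relate, here is a minimal Python sketch. It assumes the standard definition of word-error rate (word-level edit distance divided by the number of words in the reference transcript); Microsoft did not describe its exact test methodology, so the function names and the sample sentences are illustrative only. Under that assumption, a drop from 16 percent to 13.5 percent is a relative reduction of roughly 15.6 percent, which appears consistent with the 15 percent accuracy gain the researchers cite.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes the conventional WER definition; this is not Microsoft's code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which the error rate fell, relative to the old rate."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    # Hypothetical transcript pair: one wrong word and one dropped word.
    wer = word_error_rate("send a text to my brother now",
                          "send a test to my brother")
    print(f"sample WER: {wer:.3f}")
    # Figures reported in the article: 16 percent before, 13.5 percent after.
    print(f"relative reduction: {relative_reduction(16.0, 13.5):.1%}")
```

The distinction matters when reading the numbers: the absolute improvement is 2.5 percentage points, while the relative improvement is the larger figure.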

What's more, speech is now converted to text almost instantaneously. A sentence that previously might have taken a second or more to render pops up on a phone screen almost as soon as the person stops speaking.

Microsoft declined to compare the speed and accuracy of its speech recognition to rival systems, saying companies often take different approaches to measuring speech recognition systems.

The accuracy and speed gains may seem subtle. But in the world of speech research, they are meaningful improvements. And for customers who use voice recognition on their phones, the work from Microsoft researchers is one more step toward making the experience easier.

To get those improvements, Microsoft replaced the acoustic model in its speech recognition technology. In a technical paper, five Microsoft researchers found that using deep neural networks for speech recognition helped minimize the variability in speech that often trips up the acoustic model that Microsoft's previous technology used, known as the Gaussian mixture model.

"The computational capacity was just not there," said Frank Seide, a research manager in Microsoft's research operation in Beijing and one of the paper's authors.

Microsoft began rolling out the update to data centers in the United States in April, and plans to complete the process within the next few weeks. The company didn't say when it planned to roll the technology out internationally.
