By age 3, humans are already experts at speech recognition. Computers, on the other hand, still have only remedial skills after a roughly 30-year history.
That may begin to change, thanks to advances in speech recognition software from the market's biggest players and thriving competition among them for new voice-command markets in mobile devices and automobiles.
One step forward came Tuesday, when Nuance Communications released Dragon NaturallySpeaking 9, a new version of its widely used PC-based speech-recognition software. The software, in development for about two years, improves recognition accuracy by 20 percent over version 8, which debuted in November 2004.
According to the company, that brings dictation accuracy to 99 percent, so people with disabilities or repetitive strain injuries can control their PCs almost entirely by voice instead of with a mouse.
Nuance engineers also built in a shortcut around the once-lengthy script training that had turned off many consumers. People can now begin using the software without first training it to understand their voice. Instead, the software learns as it goes.
"People who tried it three, four, five years ago will notice massive improvements," said Matt Revis, Nuance's director of product management for dictation solutions. "Within several uses, the software catches up. It learns as you correct it."
Nuance's update comes as Microsoft tests its own speech recognition technology, which the software giant plans to offer at no charge in its new operating system, Vista. (NaturallySpeaking 9 costs about $99 for the Standard edition and $199 for the Preferred edition, which includes support for Microsoft Excel and syncing with handheld digital recorders.)
Like Nuance, Microsoft has worked on the accuracy of its program, so that it can tell the word "beach" from "peach" by the context of the sentence it's in. The company is also working on improvements to the user interface so it's easier for average people to command the software to fix errors or to switch applications.
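For readers curious how context can separate "beach" from "peach," the general idea can be illustrated with a toy word-pair (bigram) model that asks which candidate more often follows the preceding word. This is only a minimal sketch under stated assumptions: the tiny corpus and the function names `train_bigrams` and `pick_word` are hypothetical, and nothing here is taken from Microsoft's or Nuance's actual software, which the companies did not describe in detail.

```python
from collections import Counter

# Hypothetical toy corpus; a real recognizer would learn from vastly more text.
CORPUS = [
    "we walked along the sandy beach",
    "the sandy beach was crowded",
    "she ate a ripe peach",
    "a ripe peach fell from the tree",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def pick_word(previous_word, candidates, unigrams, bigrams):
    """Return the candidate most likely to follow previous_word."""
    vocab_size = len(unigrams)

    def score(word):
        # Add-one smoothing so unseen word pairs still get a small probability.
        pair_count = bigrams[(previous_word, word)]
        return (pair_count + 1) / (unigrams[previous_word] + vocab_size)

    return max(candidates, key=score)


unigrams, bigrams = train_bigrams(CORPUS)
print(pick_word("sandy", ["beach", "peach"], unigrams, bigrams))  # beach
print(pick_word("ripe", ["beach", "peach"], unigrams, bigrams))   # peach
```

In this sketch the acoustically similar candidates are simply handed in as a list; an actual recognizer would combine a context score like this one with evidence from the audio itself.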
"The technology is really becoming more mature. The accuracy continues to improve at an exponential rate," Microsoft software architect Rob Chambers said.
Speech recognition is a difficult computing problem. For one thing, background noise can interfere with the program's reception of the speaker's voice and cause it to misinterpret words. Other hurdles include the high pitch of one person's voice or the mumbling tendencies of another's. As a result, the software must learn the nuances of an individual's speech patterns to deliver the highest accuracy.
The next leap for speech recognition is in the mobility market. Handheld devices like BlackBerrys could allow people to dictate an e-mail instead of wearing out their thumbs on a tiny keyboard. Speech technology in automobiles could help drivers control the climate or navigate routes while keeping their hands on the wheel. Nuance's Revis said the company is talking to major wireless carriers and device makers about partnerships as part of a mobility initiative.
Microsoft is eyeing the market, too. Microsoft's Chambers said he believes that speech recognition will one day surpass the natural skills of humans. "At one point in the future, we believe that the speech recognizer will be more accurate than a human is. We already do that in numerical digits."