Intel's sights on lip-reading software

By matching mouth movements with speech, the chipmaker's software promises to iron out the performance glitches that have held back voice recognition applications.

Michael Kanellos Staff Writer, CNET News.com

Michael Kanellos is editor at large at CNET News.com, where he covers hardware, research and development, start-ups and the tech industry overseas.

April 28, 2003 12:19 p.m. PT

Intel has released software that lets computers read lips, a step forward that could lead to better voice recognition applications.

The Audio Visual Speech Recognition (AVSR) software tracks a speaker's face and mouth movements. By matching these movements with speech, the application can provide a computer with enough data to respond to voice recognition commands, even when these are given in noisy environments. The AVSR program is part of the OpenCV computer vision library, a collection of open-source applications and tools that help computers interpret visual data.
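The idea of matching mouth movements with speech can be illustrated with a small sketch. The article does not describe how AVSR combines its two data streams, so the weighted log-likelihood fusion below is an assumption chosen for illustration, not Intel's method. The function name `fuse_scores`, the `audio_weight` parameter, and all probability values are hypothetical.

```python
import math

def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Pick the word with the best weighted sum of per-stream log-likelihoods.

    audio_weight is a hypothetical knob: 1.0 trusts only the microphone,
    0.0 trusts only the camera. A noisy room would call for a lower value.
    """
    fused = {}
    for word in audio_loglik:
        fused[word] = (audio_weight * audio_loglik[word]
                       + (1.0 - audio_weight) * visual_loglik[word])
    return max(fused, key=fused.get)

# Made-up scores: background noise makes the audio stream slightly prefer
# "close", while the lip shapes seen by the camera clearly favor "open".
audio = {"open": math.log(0.40), "close": math.log(0.45), "scroll": math.log(0.15)}
visual = {"open": math.log(0.70), "close": math.log(0.10), "scroll": math.log(0.20)}

best_audio_only = fuse_scores(audio, visual, audio_weight=1.0)
best_fused = fuse_scores(audio, visual, audio_weight=0.5)
```

With these invented numbers, the audio stream alone picks "close", while giving the visual stream equal weight changes the answer to "open". That is the sense in which a second signal can help in a noisy environment.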

Computer companies have tried to popularize voice recognition applications for years, but have been stymied by a shortfall in processing power in most computers and by the restricted performance of their software.

Both of these factors are changing rapidly. Average processors now run at over 1.5GHz, while top-of-the-line chips run at 3GHz. Additionally, researchers are getting a better handle on how to write applications that will work with voice commands.

One way to improve such applications is to add a visual signal to the voice recognition scheme, as Intel is doing. Microsoft Research, for example, has developed a prototype application called GWindows, which a person can use to scroll through files or move windows through a combination of voice commands and hand gestures, said Andy Wilson, the project's designer.

With GWindows, a video camera mounted on a television monitor follows moving objects, such as a hand or pointer, that come within 20 inches of the screen. The application interprets any hand movements (or pointer gestures) as computer commands: Placing a finger over a window and then moving the finger left will move the window left, for example. If a voice command such as "scroll" is given, the computer will combine the finger and voice commands and scroll down. No special gloves are needed.

Microsoft's prototype application works better than a simple voice recognition system because the gestures improve accuracy, according to Wilson, who has demonstrated that the computer can follow voice commands in a crowded room filled with multiple conversations and lots of interference.

Such visual signal software relies in part on Bayesian mathematics, which is influencing other interface and artificial intelligence projects at Microsoft. In Bayesian math, computers essentially rely on statistics. If a computer "sees" a sweeping hand gesture toward the left a number of times, it will consistently interpret that gesture as a command to move a file toward the left.
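The statistical reasoning described here can be sketched with Bayes' rule. This is an illustration only: the article does not say how GWindows is implemented, and the class `GestureInterpreter`, the gesture and command names, and the add-one smoothing are all assumptions made for the example.

```python
from collections import Counter

class GestureInterpreter:
    """Hypothetical counter-based model of P(command | gesture)."""

    def __init__(self, commands, gestures):
        self.commands = list(commands)
        self.gestures = list(gestures)
        self.command_counts = Counter()   # how often each command was intended
        self.pair_counts = Counter()      # how often a gesture accompanied it

    def observe(self, gesture, command):
        """Record one example of a gesture and the command it meant."""
        self.command_counts[command] += 1
        self.pair_counts[(gesture, command)] += 1

    def posterior(self, gesture):
        """Bayes' rule: P(command | gesture) is proportional to
        P(command) * P(gesture | command), with add-one smoothing."""
        total = sum(self.command_counts.values())
        scores = {}
        for c in self.commands:
            prior = (self.command_counts[c] + 1) / (total + len(self.commands))
            likelihood = ((self.pair_counts[(gesture, c)] + 1)
                          / (self.command_counts[c] + len(self.gestures)))
            scores[c] = prior * likelihood
        norm = sum(scores.values())
        return {c: s / norm for c, s in scores.items()}

model = GestureInterpreter(
    commands=["move_left", "move_right", "scroll"],
    gestures=["sweep_left", "sweep_right"],
)
for _ in range(8):
    model.observe("sweep_left", "move_left")
model.observe("sweep_left", "scroll")
for _ in range(5):
    model.observe("sweep_right", "move_right")

belief = model.posterior("sweep_left")
best_command = max(belief, key=belief.get)
```

After seeing a leftward sweep paired with "move left" eight times, the model assigns that command most of the probability for any new leftward sweep, which mirrors the consistent interpretation the paragraph describes.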

Intel has visual applications other than AVSR in the works. The Santa Clara, Calif.-based tech giant is looking into an application that uses cameras to monitor hospital patients for risk of strokes and into software that uses a security camera feed to detect potential criminals in a parking lot. The underlying principles of these programs are the same: The computer sends an alert when it sees something unusual--a slowing in a patient's gait or a person going from car to car instead of into the mall--in its video stream.
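The "alert on something unusual" principle can be sketched as a simple deviation test. The article gives no details of Intel's approach, so the z-score check, the function name `is_unusual`, the threshold of 3.0, and the gait speeds below are all assumptions for illustration.

```python
from statistics import mean, stdev

def is_unusual(history, new_value, threshold=3.0):
    """Return True if new_value lies more than `threshold` sample standard
    deviations from the mean of earlier measurements (hypothetical rule)."""
    if len(history) < 2:
        return False              # not enough data to define "usual"
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_value != mu    # any change from a constant baseline
    return abs(new_value - mu) / sigma > threshold

# Invented walking speeds in meters per second, as if extracted from video.
baseline_gait = [1.20, 1.25, 1.18, 1.22, 1.21, 1.24]

alert_on_slow_gait = is_unusual(baseline_gait, 0.80)
alert_on_normal_gait = is_unusual(baseline_gait, 1.23)
```

In this made-up data, a drop to 0.80 m/s triggers an alert, while 1.23 m/s does not. A real system would also need the vision step that turns video frames into such measurements, which is outside this sketch.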

The work on these applications and the development of AVSR is taking place at Intel's China Research Center in Beijing.

In other Intel software research news, the company has released a test version of a technical library for building Bayesian networks, said Gary Bradski, a senior researcher in Intel's Microprocessor Research Labs who helped create the OpenCV library. A final version of the technical library, called the Probability Network Library, will come out by the end of the year, he said.