Old-school theory is a new force

Thomas Bayes, one of the leading mathematical lights in computing today, differs from most of his colleagues: He's been dead for 241 years.

Old School
18th-century theory is new force in computing

By Michael Kanellos
Staff Writer, CNET News.com
February 18, 2003, 4:00 AM PT

Thomas Bayes, one of the leading mathematical lights in computing today, differs from most of his colleagues: He has argued that the existence of God can be derived from equations. His most important paper was published by someone else. And he's been dead for 241 years.

Yet the 18th-century clergyman's theories on probability have become a major part of the mathematical foundations of application development.

Search giant Google and Autonomy, a company that sells information retrieval tools, both employ Bayesian principles to provide likely (but technically never exact) results to data searches. Researchers are also using Bayesian models to determine correlations between specific symptoms and diseases, create personal robots, and develop artificially intelligent devices that "think" by doing what data and experience tell them to do.

Despite the esoteric symbols, the idea--roughly speaking--is simple: The likelihood that something will happen can be plausibly estimated by how often it occurred in the past. Researchers are applying the idea to everything from gene studies to filtering e-mail.

A detailed mathematical rundown can be found on the University of Minnesota's Web site. And a Bayes Rule Applet on Gametheory.net lets you answer questions such as "How worried should you be if you test positive for some disease?"
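
The disease-test question can be worked through in a few lines of Python. This is a minimal sketch of Bayes' rule, not the applet's own code; the function name and all three input numbers (a 1 percent base rate, a test that catches 95 percent of true cases, and a 5 percent false-positive rate) are hypothetical values chosen only for illustration.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' rule."""
    # A positive result can come from a sick person (true positive)
    # or from a healthy person (false positive).
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Hypothetical inputs, assumed for illustration only.
p = posterior_given_positive(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")  # prints 0.161
```

Under these assumed numbers, a positive result raises the chance of disease from 1 percent to about 16 percent, because false positives among the many healthy people outnumber true positives among the few sick ones.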

One of the more vocal Bayesian advocates is Microsoft. The company is employing ideas based on probability--or "probabilistic" principles--in its Notification Platform. The technology will be embedded in future Microsoft software and is intended to let computers and cell phones automatically filter messages, schedule meetings without their owners' help and derive strategies for getting in touch with other people.

If successful, the technology will give rise to "context servers"--electronic butlers that will interpret people's daily habits and organize their lives under constantly shifting circumstances.

"Bayesian research is used to make the best gambles on where I should flow with computation and bandwidth," said Eric Horvitz, senior researcher and group manager of the Adaptive Systems & Interaction Group at Microsoft Research. "I personally believe that probability is at the foundation of any intelligence in an uncertain world where you can't know everything."

Toward the end of the year, Intel will also come out with a toolkit for constructing Bayesian applications. One experiment deals with cameras that can warn doctors that patients may soon suffer strokes. The company will discuss these developments later this week at its Developer Forum.

Despite its popularity today, Bayesian theory wasn't always universally accepted: Only a decade ago, Bayesian researchers dwelled on the fringes of their professions. Since then, however, improved mathematical models, faster computers and valid results from experiments have given new credibility to the school of thought.

"One of the problems was that it was overhyped," said Omid Moghadam, manager of application software and technology management in Intel's Microprocessor Lab. "In reality, the power to do anything serious didn't exist. The real implementation has taken place in the past 10 years."

Bayes for dummies
Bayesian theory can roughly be boiled down to one principle: To see the future, one must look at the past. Bayes theorized that the probability of future events could be calculated by determining their earlier frequency. Will a flipped coin land heads up? Experimental data assigns it a value of 0.5.

"Bayes said that essentially everything is uncertain, and you have different distributions on probability," said Ron Howard, a professor in the Department of Management Science and Engineering at Stanford.

Suppose, for example, that instead of flipping a coin, a researcher tossed a plastic pushpin and wanted to know what the chances were that it would land flat on its back with the pin pointing up, or, if it landed on its side, what direction it would be pointing. Shape, imperfections in the molding process, weight distribution and other factors, along with the greater variety of outcomes, would affect the results.

The appeal of the Bayesian technique is its deceptive simplicity. The predictions are based completely on data culled from reality--the more data obtained, the better it works. Another advantage is that Bayesian models are self-correcting, meaning that when data changes, so do the results.
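
The self-correcting behavior can be sketched with the pushpin. The snippet below assumes one common formalization, a Beta prior updated after every toss, which the article itself does not specify; the toss record is invented, and the starting counts of 1 and 1 simply encode "no opinion yet."

```python
# Hypothetical record of ten pushpin tosses: 1 = landed point up, 0 = did not.
tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Assumed uniform Beta(1, 1) prior: one pseudo-count for each outcome.
up, down = 1.0, 1.0

estimates = []
for outcome in tosses:
    # Each new observation shifts the counts, so the estimate corrects itself.
    up += outcome
    down += 1 - outcome
    estimates.append(up / (up + down))

print(f"after 1 toss:   {estimates[0]:.3f}")   # 0.667
print(f"after 10 tosses: {estimates[-1]:.3f}")  # 0.667
```

With only ten tosses the estimate still swings noticeably from one toss to the next; with thousands it would settle down, which is the sense in which more data makes the model work better.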

Probabilistic thinking changes the way people interact with computers. "The idea is that the computer seems more like an aid rather than a final device," said Peter Norvig, director of search quality at Google. "What you are looking for is some guidance, not a model answer."

Search has benefited substantially from this shift. A few years ago, so-called Boolean search engines commonly required queries phrased with operators such as AND, OR and NOT to find matching words. Now search engines employ complex algorithms to comb databases and produce likely matches.

As the pushpin example shows, complexity and the need for more data can accelerate rapidly. Harnessing the results required to transform a good guess into a plausible outcome has become possible through the emergence of powerful computers.

More importantly, researchers such as Judea Pearl at UCLA have learned how to make Bayesian models that better home in on the conditional relationships between different phenomena, which greatly reduces the number of calculations.

A survey of the population at large would show lung cancer to be a relatively rare disease, for instance, but research confined to smokers would reveal a correlation. Examining lung cancer victims can then help form a hypothesis about causation between the disease and the habit.

"Every individual attribute or symptom can depend on a lot of different things, but it depends directly only on a small number of things," said Daphne Koller, an assistant professor in the computer science department at Stanford. "In the past 15 years or so, there has been a revolution in tools that will allow you to represent large populations."

Among other projects, Koller is using probabilistic techniques to better match symptoms to diseases and to link genes to specific cell phenomena.
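
Koller's point that each attribute depends directly on only a few others can be illustrated by counting how many probabilities a model has to store. The sketch below assumes yes/no variables and a made-up five-node network built around the lung cancer example; the variable names and the links between them are illustrative assumptions, not taken from any real medical model.

```python
def full_joint_parameters(n_vars):
    """Probabilities needed to tabulate every combination of n yes/no variables."""
    return 2 ** n_vars - 1


def network_parameters(parents):
    """Probabilities needed when each yes/no variable depends only on its parents."""
    # One probability per combination of parent values, for every variable.
    return sum(2 ** len(parent_list) for parent_list in parents.values())


# Hypothetical network: each entry maps a variable to its direct influences.
parents = {
    "smoker": [],
    "lung_cancer": ["smoker"],
    "cough": ["lung_cancer"],
    "abnormal_xray": ["lung_cancer"],
    "fatigue": ["lung_cancer"],
}

print(full_joint_parameters(len(parents)))  # 31
print(network_parameters(parents))          # 9
```

The gap widens quickly: with 30 such variables the full table would need more than a billion entries, while a network in which every variable has at most one parent would need fewer than 60.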

Speaking to numbers
A related technique, the hidden Markov model, uses probability to anticipate sequences. A speech recognition application, for example, knows that the sound most likely to follow "q" is "u." Along those lines, the software can also calculate the probability that a speaker uttered the word Qagga, an extinct zebra.
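
The "q" then "u" intuition can be sketched with letter-pair counts. Note the simplification: this is a plain Markov chain over visible letters, not a full hidden Markov model, which would also have to infer unobserved states such as the sounds behind a noisy audio signal. The five-word corpus and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(words):
    """Count how often each letter is followed by each other letter."""
    counts = defaultdict(Counter)
    for word in words:
        for current, following in zip(word, word[1:]):
            counts[current][following] += 1
    return counts


def next_letter_probability(counts, current, following):
    """Estimate P(following | current) from the observed counts."""
    total = sum(counts[current].values())
    return counts[current][following] / total if total else 0.0


# Hypothetical tiny corpus, assumed for illustration only.
words = ["quick", "quiet", "queen", "quote", "qatar"]
counts = train_bigrams(words)

p_u = next_letter_probability(counts, "q", "u")
p_a = next_letter_probability(counts, "q", "a")
print(f"P(u | q) = {p_u:.1f}, P(a | q) = {p_a:.1f}")  # 0.8 and 0.2
```

An unusual sequence is not ruled out; it is simply assigned a lower probability, which is how such a model can still entertain a rare word.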

Probabilistic techniques are already embedded in Microsoft's products. Outlook Mobile Manager, which determines when to send a deskbound e-mail to a mobile device, grew out of Priorities, an experimental system unveiled at Microsoft in 1998. The troubleshooting engine in Windows XP also relies on probabilistic calculations.

More applications will trickle out over the coming years as the company's Notification Platform becomes embedded in products, Microsoft's Horvitz said.

An application named Coordinate, a major element of the Notification Platform, gathers data from personal calendars, keyboards, sensor cameras and other sources to create a mosaic of a person's life and habits. The data gathered can include arrival schedules, typical time and length of lunches, what types of phone and e-mail messages are kept or discarded, how frequently the keyboard is in use at given times of the day, and so on.

Such data can be used to manage the flow of messages and other information to people who use the application. If a manager sent an e-mail to a worker's computer at 2:40 p.m., for example, Coordinate could check that worker's calendar program and find that a meeting was listed for 2:00 p.m. The program could also scan data about the worker's habits and discover, say, that the person usually resumed keyboard activity about an hour after the listed start times of meetings. The program might also find that the worker typically responded to e-mails from this manager within five minutes. Based on all that data, and given that the worker probably wouldn't return to the computer for at least 20 minutes, the program could decide to forward the message to the worker's cell phone. Meanwhile, the program might decide not to forward e-mails from other people.

"We're balancing the value of information that is coming in with the cost of interrupting you," Horvitz said. With these applications, he maintained, "there will be a lot more people keeping up with things and not drowning in information."

Privacy and user control over these functions, Horvitz added, are assured. Callers don't know why a message may have been prioritized or pushed back.

Other Microsoft Bayesian prototypes include DeepListener and Quartet (voice activation), and SmartOOF and TimeWave (contact control). Consumer multimedia applications will also benefit, Horvitz said.

Bayesian techniques will also go beyond the PC. At the University of Rochester, researchers have determined that a person's gait can change before a stroke. While the changes are too subtle for humans to track, a camera feeding data to a PC can capture and track movements. The computer can then send an alert if walking anomalies occur.

An experimental security camera uses the same principle: Most airport patrons go straight to the terminal after parking, so someone who parks and then goes to another car is out of the ordinary and can trigger an alert. A basic engine for creating a Bayesian model and technical information will be posted to Intel's developer sites this fall.

Clash of the nerds
Although the techniques sound straightforward, the computing world has been slow to embrace them. Horvitz recalled being one of only two graduate students at Stanford working on probability and artificial intelligence in the 1980s. Everyone else was studying logical systems, those that interacted with the world through "if-then" statements.

"Probabilities were definitely out of fashion," Horvitz said. The tide turned as it became apparent that logical systems couldn't anticipate all unexpected circumstances.

Many researchers also began to acknowledge that human decision-making is far more mysterious than originally believed. "There was a cultural bias within the artificial-intelligence community against numbers," Koller said. "People now recognize that people don't realize what they do in their heads."

Even in his day, Bayes found himself outside the mainstream. Born in 1702 in London, he became a Presbyterian minister. Although he saw two of his papers published, his principal work, "An Essay Towards Solving a Problem in the Doctrine of Chances," wasn't published until 1764, three years after he died.

His membership in the prestigious Royal Society was something of a mystery until recent years, when newly discovered letters showed that he privately corresponded with the other leading thinkers of England.

"He never, as far as I can tell, wrote down Bayes' theorem," Howard said of the formal mathematical formula.

Theologian Richard Price and French mathematician Pierre-Simon Laplace became early champions. The ideas, though, ran counter to those set out later by George Boole, father of Boolean math, which is based on algebra-like logic and eventually gave birth to the binary system. Boole, also a member of the Royal Society, died in 1864.

While few discount the importance of probability, debate on its uses lingers. Critics periodically assert that Bayesian models depend on inherently subjective data, leaving humans to judge whether an answer is correct. And probabilistic models do not completely account for the nuances in the human thought process.

"It's not exactly clear how children learn," said Alfred Spector, vice president of services and software at IBM's Research Division, who proposes mixing statistical methods with logical systems in a so-called Combination Hypothesis. "I'm convinced it's statistical initially, but then at a certain point, you will see at three it is not just statistical."

Yet, probability is, in all probability, here to stay.

"This is a foundation," Horvitz said. "It was overlooked for a while, but it is a foundation for reasoning."  


Developers are using Bayes' centuries-old ideas to take on the Information Age.

They're building programs designed to automatically manage the deluge of data that gets thrown at people each day via e-mail, cell phones, instant messaging programs and the like.

Such a program would collect data over time and create a model of a person's past behavior to determine how best to deal with a newly arrived message.

A simplified example:

1) E-mail arrives
From: The boss
Subject: Read me
Sent: Thu 3:10 PM

Program accesses calendar software
"Thursday, 2 PM: Meeting"
Program considers habits and current situation
• When at computer, usually responds to e-mails from "The boss" in 1 to 5 minutes.
• When meeting scheduled for Thursday afternoon, keyboard activity usually resumes 1.5 hours from meeting's listed start time. (Thus, Thursday meetings usually last an hour and a half.)

Program draws conclusions
• Meeting started at 2.
• E-mail arrived at 3:10.
• Recipient probably away till 3:30.
• Usually answers "The boss" in 1 to 5 minutes.

Program acts
Forwards e-mail to recipient's cell phone for immediate retrieval.

2) E-mail arrives
From: "The middle manager"
Subject: Idea
Sent: Thu 3:10 PM

Program accesses calendar software
"Thursday, 2 PM: Meeting"
Program considers habits and current situation
• When at computer, usually responds to e-mails from "The middle manager" in 20 minutes to an hour.

Program draws conclusions
• Meeting started at 2.
• E-mail arrived at 3:10.
• Recipient probably away till 3:30.
• Usually answers "The middle manager" in 20 minutes to an hour.

Program acts
Decides not to forward. OK for recipient to get it upon return.
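
The two cases above can be sketched in Python. This is a deliberately crude stand-in, not Microsoft's method: it predicts the recipient's return from the average of past meetings and forwards a message only when the expected wait exceeds the slowest reply this sender usually gets. The function name, the threshold rule, the history of past delays and the calendar date are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def should_forward(arrival, meeting_start, past_resume_delays_min, usual_reply_max_min):
    """Forward if the expected wait is longer than this sender usually tolerates."""
    # Predict the return time from how long past meetings kept the user away.
    expected_return = meeting_start + timedelta(minutes=mean(past_resume_delays_min))
    wait_min = max((expected_return - arrival).total_seconds() / 60.0, 0.0)
    return wait_min > usual_reply_max_min


# Hypothetical history: minutes from listed start time until keyboard activity resumed.
history = [85, 90, 95]
meeting_start = datetime(2003, 2, 20, 14, 0)  # an arbitrary Thursday, 2 PM
arrival = datetime(2003, 2, 20, 15, 10)       # e-mail sent at 3:10 PM

boss = should_forward(arrival, meeting_start, history, usual_reply_max_min=5)
middle_manager = should_forward(arrival, meeting_start, history, usual_reply_max_min=60)
print(boss, middle_manager)  # True False
```

A fuller system would weigh probabilities and costs rather than apply a single cutoff, but the sketch shows how a model of past behavior turns into a different action for each sender.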

Editors: Mike Yamamoto, Edward Moyer
Copy editor: Lisa Denenmark
Design: Pam Dore
Production: Meghan McDowell