Google practices dividing to conquer

Projects under development at the company are aimed at helping categorize data and improving the relevance of search results.

Stefanie Olsen Staff writer, CNET News
SAN FRANCISCO--Google's 8 billion-plus Web document index may not multiply, but its search engine will learn to better divide the data.

That was part of the message from Peter Norvig, Google's director of search quality, who on Tuesday gave a keynote speech here at the Semantic Technology Conference. Norvig, a former NASA employee and an author of books on artificial intelligence, highlighted several research projects the company is developing to help classify data and improve the relevance of search results.

Those projects focus on adding new clustering capabilities for search results, providing suggestions for related searches, personalizing listings, and returning factual answers to specific questions, Norvig said.

"We want to have a broader bandwidth for that kind of communication," Norvig said. "It's a question of what's the right language."

Despite heavy competition in recent years to own the largest document index, Norvig also said he couldn't foresee Google's database adding many more Web documents without cataloging bogus or useless pages. Still, the company has numerous programs to add otherwise inaccessible data, like that from books and TV shows, to its Web search engine.

Norvig highlighted a research paper, written by a Google employee last year, on a classification engine the company is testing. The technology can sort a proper noun or compound noun into several categories in order to deliver clustered results. For a query on "ATM," or asynchronous transfer mode, for example, the engine would look for the phrase "such as" on indexed Web pages that contain the term and discover that ATM can be linked to the expression "high-speed networks." As a result, a search for high-speed networks might pull up a cluster on ATM.
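The "such as" trick can be illustrated with a few lines of Python. This is only a rough sketch of the general idea, not Google's engine: the page snippets, the regular expression, and the function names `build_index` and `cluster_for` are all invented for illustration, and the sketch assumes that categories are one or two lowercase words and that the terms being classified are capitalized.

```python
import re
from collections import Counter, defaultdict

# Invented page snippets standing in for indexed Web documents.
PAGES = [
    "Carriers deploy high-speed networks such as ATM and Frame Relay.",
    "Many high-speed networks such as ATM use fixed-size cells.",
    "Backbone technologies such as ATM, SONET, and Gigabit Ethernet carry traffic.",
]

# Separators between listed terms: ", ", ", and ", " and ", " or ".
SEP = r"(?:,\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+)"
# A term: one or more capitalized words (a crude stand-in for proper nouns).
NAME = r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"
# "<category> such as <term>[, <term>...]"; the category is the one or two
# lowercase words immediately before "such as".
PATTERN = re.compile(
    rf"\b((?:[a-z][a-z-]*\s+)?[a-z][a-z-]*)\s+such as\s+({NAME}(?:{SEP}{NAME})*)"
)

def build_index(pages):
    """Map each category phrase to a count of the terms listed under it."""
    index = defaultdict(Counter)
    for text in pages:
        for category, names in PATTERN.findall(text):
            for name in re.split(SEP, names):
                index[category][name] += 1
    return index

def cluster_for(index, query):
    """Return the terms linked to a category, most frequently seen first."""
    return [name for name, _ in index.get(query.lower(), Counter()).most_common()]

index = build_index(PAGES)
print(cluster_for(index, "high-speed networks"))  # ['ATM', 'Frame Relay']
```

Here a search for "high-speed networks" surfaces ATM first because two of the sample snippets list it under that category. A production system would need far more robust language handling than a single pattern.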

Norvig said the same technology could be used to mine factual answers from the Web for queries like "President Lincoln's birth date." The technique could offer an edge over Microsoft, which recently added encyclopedic answers from its Encarta software to its database, Norvig said. That's because MSN's engine could miss the chance to deliver the desired factual answer if the searcher's query is inexact. In contrast, Google draws on the semantic Web and various language sets from pages to find a match.
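One way to picture mining a fact from differently worded pages is to try several phrasings and let the pages vote. The Python sketch below is an assumption-laden illustration rather than a description of Google's method: the snippets are invented, the three phrasing patterns are arbitrary, and the names `birth_date_patterns` and `best_birth_date` are hypothetical.

```python
import re
from collections import Counter

# A date written like "February 12, 1809".
DATE = r"([A-Z][a-z]+ \d{1,2}, \d{4})"

# Invented snippets standing in for indexed Web pages.
SNIPPETS = [
    "Abraham Lincoln was born on February 12, 1809, in Kentucky.",
    "President Lincoln (born February 12, 1809) led the Union.",
    "Schools mark the birth of Lincoln on February 12, 1809.",
    "Lincoln was born on February 21, 1809.",  # deliberate typo to show voting
]

def birth_date_patterns(name):
    """Build a few alternative phrasings that state a birth date for `name`."""
    n = re.escape(name)
    return [
        re.compile(n + r" was born on " + DATE),
        re.compile(n + r" \(born " + DATE + r"\)"),
        re.compile(r"birth of " + n + r" on " + DATE),
    ]

def best_birth_date(name, snippets):
    """Return the date most snippets agree on, or None if nothing matched."""
    votes = Counter()
    for pattern in birth_date_patterns(name):
        for text in snippets:
            votes.update(pattern.findall(text))
    return votes.most_common(1)[0][0] if votes else None

print(best_birth_date("Lincoln", SNIPPETS))  # February 12, 1809
```

Because the answer is assembled from many pages, a single page with a mistyped date is outvoted, and the searcher's wording does not have to match any one source exactly.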

Norvig also demonstrated Keyhole, Google's satellite mapping service. He said that over time, the company will more closely integrate its maps with local information on businesses and places. "It's important to deliver information about the real world as people carry devices around," he said.