Academia's quest for the ultimate search tool

U.C. Berkeley is developing a major research center focused on improving Internet search technology.

Stefanie Olsen Staff writer, CNET News

Stefanie Olsen covers technology and science.

Oct. 10, 2005 10:41 a.m. PT

The University of California at Berkeley is creating an interdisciplinary center for advanced search technologies and is in talks with search giants including Google to join the project, CNET News.com has learned.

The project is one of many efforts at U.S. universities designed to address the explosive growth of Internet search and the complex issues that have arisen in the field.

U.C. Berkeley, birthplace of early search highflier Inktomi and the school where Google CEO Eric Schmidt got his computer science doctoral degree, is bringing together roughly 20 faculty members from various departments to cross-pollinate work on search technology, said Robert Wilensky, the center's director. The principal areas of focus: privacy, fraud, multimedia search and personalization.

News.context

What's new:
Continuing a long tradition of academic exploration in Net technology, U.C. Berkeley will soon open an interdisciplinary center for developing new search technologies. The school is talking to a number of search companies, including Google, about participating.

Bottom line:
Drawing on the expertise of faculty from various departments, Berkeley's center will focus on privacy, fraud, multimedia and personalization as these topics relate to the increasingly diverse and in-depth information available on the Internet.

"We want to solve the problems that have been engendered by the success of search," Wilensky said in an interview. Wilensky is a professor of computer science and information management at Berkeley.

Plans are still being worked out for the center's physical space, but Wilensky said he hopes designs will be completed within the next few months and the center opened early next year. He also said he's talking to Google and other search players about membership.

"If you have 20 researchers interested in search, then getting them together where they are cross-fertilizing ideas, you make something bigger than its parts. You can create a nuclear reaction," he said.

~~Google declined to comment. (Google representatives have instituted a policy of not talking with CNET News.com reporters until July 2006 in response to privacy issues raised by a previous story.)~~

The success of the $5 billion-a-year search-advertising business is fueling Internet research and development in many ways. The business has not only bolstered the likes of Yahoo and Google with billion-dollar annual revenues to be spent in new areas but it's also revived hundreds of smaller dot-coms and inspired leagues of upstarts to venture into areas of specialty search.

Looking for the next generation to be born? There's no better place to visit than academia, where today's most successful search companies got their start. "A big source of new ideas comes out of universities," said Geoff Yang, a venture capitalist at Redpoint Ventures, which has backed such companies as AskJeeves and TiVo.

Google and Yahoo were practically hatched in the same dorm room at Stanford University by two pairs of graduate students roughly six years apart. Lycos, a one-time search leader, came out of Carnegie Mellon University. Newer projects include Vivisimo, a clustering search tool from CMU professor Raul Valdes-Perez.

The search problems of today are different from those of five years ago. With books, scholarly papers and television programs being digitized and put online, the technology necessary to search through the material needs to be that much better. People need a way to trust the information they find and to ask more-complex questions with search tools so they can extract knowledge or ideas.

Jaime Carbonell, director of CMU's Language Technologies Institute, said his research team is perfecting a technology for personalized search that would solve some of the privacy concerns surrounding the wide-scale collection of sensitive data, such as names and query histories. CMU's project takes an auxiliary approach to software already being tested by commercial players like Yahoo and Google, which are collecting and storing search histories on their own networks.

CMU developed an add-on application that people download to a PC. It allows users to maintain and modify personal information, such as query history, preferences and favored sites, within a search profile. A search engine would be able to query the profile, along with the user's search term, to deliver a set of tailored results each time, thereby keeping personal information off the network and on the client's desktop.
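The client-side idea can be illustrated with a minimal sketch. This is not CMU's software: the `SearchProfile` class, the `rerank` function and the scoring weights are all hypothetical, and the sketch assumes the tailoring step runs on the user's PC so the profile never leaves the desktop.

```python
from dataclasses import dataclass, field


@dataclass
class SearchProfile:
    """Hypothetical local profile; it is stored on the user's machine."""
    favored_sites: set = field(default_factory=set)
    interests: set = field(default_factory=set)
    query_history: list = field(default_factory=list)


def rerank(results, profile, query):
    """Re-order generic results locally using the profile.

    results: list of (url, snippet) pairs in the engine's original order.
    """
    profile.query_history.append(query)  # recorded locally only

    def score(item):
        position, (url, snippet) = item
        points = 0.0
        # Assumed weight: a favored site counts for two points.
        if any(site in url for site in profile.favored_sites):
            points += 2.0
        # Assumed weight: one point per snippet word matching an interest.
        points += len(set(snippet.lower().split()) & profile.interests)
        # Negative position keeps the engine's order as a tie-breaker.
        return (points, -position)

    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [pair for _, pair in ranked]


results = [
    ("http://a.example/x", "jaguar dealership locations"),
    ("http://b.example/y", "jaguar car reviews"),
    ("http://c.example/z", "jaguar habitat wildlife"),
]
profile = SearchProfile(favored_sites={"c.example"}, interests={"wildlife"})
tailored = rerank(results, profile, "jaguar")
```

Here the wildlife page moves to the top for this user, while the other two results keep the engine's original order.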

Carbonell said the technology will be ready within a year, and CMU could either offer it as open-source software or license it to industry players.

CMU is also working under a government grant on a longer-term project called Javelin, focused on question-and-answer search technology. Google, MSN, Ask Jeeves and others already help people find quick answers for word definitions or encyclopedia facts like "What is the population of Los Angeles?" But for complex queries like "What is the cheapest flight from San Francisco to London?" or "Which university has the largest computer science department?" finding answers is still like doing long division.

"This is dynamic information," Carbonell said. "You must parse the question, look for answers in multiple places and do a comparison. There are multiple steps, and we're looking at how to do it in one step and provide a trace for the user."
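The steps Carbonell lists (look in multiple places, compare, keep a trace) can be sketched roughly as follows. This is an illustration only, not Javelin's design: the function name, the record fields and the in-memory "sources" are hypothetical, and the question is assumed to have been parsed already into an origin and a destination.

```python
def answer_cheapest_flight(origin, destination, sources):
    """Answer a 'cheapest flight' question from several sources.

    sources: dict mapping a source name to a list of fare records.
    Each record is a dict with 'from', 'to' and 'price' keys
    (hypothetical; real sources would be live services).
    Returns ((price, source_name) or None, trace).
    """
    trace = [f"parsed question: cheapest flight {origin} -> {destination}"]
    candidates = []
    for name, fares in sources.items():
        matches = [f for f in fares
                   if f["from"] == origin and f["to"] == destination]
        trace.append(f"{name}: {len(matches)} matching fare(s)")
        candidates.extend((f["price"], name) for f in matches)
    if not candidates:
        trace.append("no answer found")
        return None, trace
    # Comparison step: the lowest price wins.
    price, name = min(candidates)
    trace.append(f"compared {len(candidates)} fare(s); lowest is {price} from {name}")
    return (price, name), trace


sources = {
    "site_a": [{"from": "SFO", "to": "LHR", "price": 820},
               {"from": "SFO", "to": "JFK", "price": 300}],
    "site_b": [{"from": "SFO", "to": "LHR", "price": 760}],
}
answer, trace = answer_cheapest_flight("SFO", "LHR", sources)
```

The returned trace records each step, so a user can see which sources were consulted and how the answer was chosen.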

He said it will likely take another four or five years to build such functionality that can scale computationally for wide consumer usage and deliver the kind of efficiencies the government and Internet users expect. The universities of Texas and Pennsylvania are also exploring different approaches to the same problem.

Stanford continues in its role as a breeding ground for search projects. Since 2003, Google has purchased at least two projects hatched at Stanford--personalization search tool Kaltix and a project from Anna Patterson, a Stanford computer science research associate. Stanford associate professor Andrew Ng, among others, is working on artificial-intelligence techniques for extracting knowledge from text in a search index.

Other projects have turned into young businesses. SearchFox is a Web upstart co-founded in December by James Gibbons, a longtime Stanford professor and former dean of its School of Engineering. The privately held company has created a collaborative search engine that lets people share favorite links and create personalized search indices.

Stanford, the Massachusetts Institute of Technology and many other universities are working to solve problems presented by the library of tomorrow, which will be largely digitized. Sifting through and organizing billions of digital documents will require new search technology.

MIT, for example, has teamed with the World Wide Web Consortium to create next-generation search technology using the Semantic Web, in an overarching project called Simile.

Under that umbrella, an MIT graduate student has developed a tool called Piggybank, software that plugs into the Mozilla Foundation's Firefox Web browser. Piggybank lets people surf the Web, tag visited sites with keywords and build a local, annotated collection that can then be published to a site called the bank. In effect, it turns Firefox into a "Semantic Web browser" that lets users expand the scope of understanding around existing information on the Web.

"A generalized data archive lets you make data work together in ways you couldn't before," said MacKenzie Smith, associate director for technology in the MIT libraries.

In a demonstration of what the tool could do, Piggybank integrated data from Boston.com, a movie site and Google maps to show where coffee shops are located relative to restaurants and movie theaters. The tool also lets users save such information to a "database" record (rather than a bookmark) so that it can later be searched by its attributes or designated keywords.

MIT hopes to deploy the technology and other advances from Simile for use by faculty and students.

At Berkeley's center, Wilensky has ambitious plans to solve problems within a broader definition of search. That means analyzing and organizing diverse forms of information--anything from images and video to e-commerce--and helping people synthesize it and extract knowledge.

One major area of development will be in trust and privacy. For example, how believable is the content dug up on Google, and how do you know an eBay seller is truly trustworthy?

Wilensky said his group has proved that, on average, eBay seller ratings are skewed by what are called retaliatory ratings, in which people slam those who slam them. Other sellers with black marks will disappear, only to re-emerge later with a clean slate. As a result, Wilensky said, his team has built an algorithm called "EM trust" (for expectation maximization) that uses a statistical model to rate how honest an online seller may or may not be. That development might be applied to Web sites as well.
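The article does not describe how "EM trust" works internally, so the following is only a generic illustration of expectation maximization, not the Berkeley team's algorithm. It assumes a deliberately simple model: each seller is either "honest" or "dishonest," each class earns positive ratings at its own unknown rate, and the only data are counts of positive and total ratings. The function name, starting guesses and sample counts are all hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k positive ratings out of n at positive rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def clamp(p, eps=1e-6):
    """Keep a probability away from 0 and 1 to avoid degenerate fits."""
    return min(max(p, eps), 1 - eps)


def em_trust_sketch(ratings, iters=50):
    """Fit a two-class binomial mixture with EM.

    ratings: list of (positive_count, total_count), one pair per seller.
    Returns (posteriors, p_honest, p_dishonest, weight_honest), where
    posteriors[i] is the estimated probability that seller i is honest.
    """
    p_h, p_d, w = 0.9, 0.5, 0.5  # arbitrary starting guesses
    post = []
    for _ in range(iters):
        # E-step: posterior probability that each seller is honest.
        post = []
        for k, n in ratings:
            a = w * binom_pmf(k, n, p_h)
            b = (1 - w) * binom_pmf(k, n, p_d)
            post.append(a / (a + b))
        # M-step: re-estimate the parameters from the soft assignments.
        w = clamp(sum(post) / len(post))
        h_pos = sum(r * k for r, (k, n) in zip(post, ratings))
        h_tot = sum(r * n for r, (k, n) in zip(post, ratings))
        d_pos = sum((1 - r) * k for r, (k, n) in zip(post, ratings))
        d_tot = sum((1 - r) * n for r, (k, n) in zip(post, ratings))
        p_h = clamp(h_pos / max(h_tot, 1e-12))
        p_d = clamp(d_pos / max(d_tot, 1e-12))
    return post, p_h, p_d, w


# Made-up counts: three sellers with mostly positive ratings, two without.
ratings = [(98, 100), (95, 100), (99, 100), (60, 100), (55, 100)]
post, p_h, p_d, w = em_trust_sketch(ratings)
```

On these made-up counts the first three sellers end up with a posterior near 1 and the last two near 0. A model that also accounted for retaliatory ratings would need more structure than this sketch has.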

The center will be modeled after Berkeley's Wireless Research Center in downtown Berkeley, which enjoys the backing of big mobile companies. It will include such faculty as Jitendra Malik, professor and chair of U.C. Berkeley's Department of Electrical Engineering, and David Forsyth, professor of computer sciences, who are both working on computer-vision research.