SUNNYVALE, Calif.--It only took a few years for the science of information retrieval to move from an obscure academic niche to the secretive research departments at the heart of multibillion-dollar Internet companies.
But one of those companies, Yahoo, is trying to give a little more power back to the professors and grad students through a program called BOSS. The service lets academics and start-ups build their own search sites around Yahoo's search engine for free, manipulating results however they want.
Two dozen researchers and students from Stanford, the Massachusetts Institute of Technology, Purdue, and other universities met here at Yahoo for a day in September to hear the company's BOSS pitch, show off some ideas they've had for how to use it, and try to coax Yahoo into sharing even more information through BOSS. Overall, their response to Yahoo's program was favorable.
"It enables a lot of research that we wouldn't otherwise be able to do," said Harr Chen, an MIT researcher at the event.
If it works out as hoped, Yahoo will make some money out of the program: corporate users who reach large scale with BOSS will have to show Yahoo's search ads. The academic side is a step removed from direct revenue, instead giving Yahoo some prominence with potentially influential thinkers in a market Google dominates. Piquing the interest of researchers at universities with a reputation for incubating the next big ideas is smart, though: Yahoo and Google themselves both grew out of Stanford.
And honestly, with Google's share of the search market far ahead of Yahoo's 19.6 percent, what does Yahoo have to lose?
"We're not a market leader," said Prabhakar Raghavan, chief strategist for Yahoo Search. "From a strategic standpoint, it does make sense to let other people innovate on top of us. If the pie grows, our share of the pie grows at the expense of somebody else."
The ultimate hope is that BOSS will mean money, too.
Yahoo has made the investment in a massive infrastructure that constantly scans and re-indexes the Web, filters out some of the dreck, interprets search queries, and provides search results in high volume in very short order. This infrastructure is prohibitively expensive for start-ups, just as it is for academic researchers, so Yahoo is letting companies use BOSS as well. Those operating on a small scale may use BOSS for free, but Yahoo requires larger efforts to either show ads or sign a custom revenue-sharing deal.
Mashing up Yahoo results
One possibility for BOSS is that Yahoo's search results can be combined with other data sets. "Other parties may have more info about their users," Singh said. For example, a social-networking site can track movies or the activities of friends that could be useful in shaping search results. "This is stuff we may or may not have," he said.
Chengxiang Zhai and Bin Tan of the University of Illinois at Urbana-Champaign showed one example of BOSS in action that uses this idea of modifying Yahoo's search results. Their application steered Yahoo's search engine in particular directions based on the data stored on a user's own computer.
In the example, the computer was able to discern what type of jaguar the user was more likely to be looking for--the cat, not the car or the version of Mac OS X--based on evidence on the computer.
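The article does not say how the Illinois application actually works, so the following is only a minimal sketch of the general idea of client-side re-ranking. It assumes a simple word-overlap score between each result and text found on the user's machine; the function names, the stopword list, and the sample results are all hypothetical, and the search results are hard-coded rather than fetched from BOSS.

```python
import re
from collections import Counter

# Hypothetical stopword list so that common words do not dominate the score.
STOPWORDS = {"a", "and", "in", "of", "on", "the"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def build_profile(local_documents):
    """Count how often each word appears in the user's local documents."""
    profile = Counter()
    for doc in local_documents:
        profile.update(tokenize(doc))
    return profile

def rerank(results, profile):
    """Reorder results so those sharing more words with the profile come first.

    sorted() is stable, so results with equal scores keep the engine's order.
    """
    def score(result):
        tokens = set(tokenize(result["title"] + " " + result["snippet"]))
        return sum(profile[t] for t in tokens)
    return sorted(results, key=score, reverse=True)

# Made-up local files and made-up search results for the query "jaguar".
local_docs = [
    "Notes on big cat habitat in the rainforest",
    "Zoo trip: photos of the cat enclosure",
]
results = [
    {"title": "Jaguar cars official site", "snippet": "Luxury sedans and sports cars"},
    {"title": "Mac OS X Jaguar", "snippet": "Apple operating system version 10.2"},
    {"title": "Jaguar, the big cat", "snippet": "Habitat and diet of the rainforest cat"},
]

ranked = rerank(results, build_profile(local_docs))
for r in ranked:
    print(r["title"])
```

Because the profile is built and applied on the user's own machine, nothing about the local files has to be sent to the search engine in this sketch.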
"We believe the client side of personalization has a few advantages over the server side," Zhai said. "It can alleviate concern over privacy and it can provide more information about user activity. And it can naturally distribute computation," so a search company's machines share work with the user's own computer.
Researchers could investigate search and related technologies such as natural-language processing (NLP) without BOSS. But with it, that research is vaulted into a different domain. It isn't just a matter of taking more time; with BOSS's vast index of the Web, the possibilities are qualitatively different.
"You gain enormously from access to the data. There are all sorts of things you can do with tons of data" that you can't with a smaller set, said Stanford's Christopher Manning.
Manning works in the active field of natural-language processing, technology that aims to let computers discern the meaning of real human speech or text and that's behind search technology from search start-up Hakia and Microsoft-acquired PowerSet. NLP benefits tremendously from having large-scale data sources, Manning said.
"To understand what words mean, you look at how they're used. We do that on a large scale, (examining) usage and context to learn about meaning," Manning said.
Please, sir, I want some more
It also was clear the researchers' appetites were whetted by BOSS. Nobody sounded ungrateful, but heck, as long as Yahoo is sharing some important data, why not share a little more?
Yahoo is headed in that direction. On the research day, it opened up access to another slice of search-related "prisma" data.
Prisma powers Yahoo's search assist feature, which suggests searches based on what people have begun to type into the search box. That can make searching more convenient for users, but for researchers trying to build more technology atop Yahoo search results, prisma data offers more than that. For example, it can show a search term's variations, its membership in categories such as place names, movies, and government, and the likelihood that people search for the term by itself or as part of a larger query.
"That's got a lot of potential," said Dan Ramage a natural-language processing Ph.D. candidate at Stanford. Ramage said BOSS is useful for his research, which focuses on determining the various relationships that can connect a pair of words, he said, but he'd like it better if he could get better control over the snippets of text Yahoo shows with its search results.
Yes, Yahoo will share more
Yahoo plans to release more. "Over time you'll see we'll offer a lot more ingredients, a lot more power," said Ashim Chhabra, senior product manager with the BOSS project.
Some researchers are hungry for as much as they can get. Chen, for example, hoped Yahoo could become an engine to run software supplied by researchers that plumbs its entire Web index.
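Yahoo offers no such service, and the article gives no details of how one might look, so this is only a toy sketch of that kind of job: a small researcher-supplied function is applied to every document, and the per-document results are folded into a single number. The function names and the three-document "index" are hypothetical stand-ins.

```python
def run_over_index(documents, per_document_fn):
    """Apply a researcher-supplied function to every document and sum the results."""
    return sum(per_document_fn(doc) for doc in documents)

def mentions_term(term):
    """Build a per-document function that returns 1 if the term appears, else 0."""
    needle = term.lower()
    def fn(doc):
        return 1 if needle in doc.lower() else 0
    return fn

# A tiny stand-in for a Web index; a real index would hold billions of pages.
toy_index = [
    "Search engines index the Web",
    "A new meme spreads across the web",
    "Cooking with cast iron",
]

count = run_over_index(toy_index, mentions_term("web"))
print(count)
```

Running the same count over snapshots of the index taken at different times is one way such a number could be turned into a trend line.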
"We give you a little code, you run that code on every document, then you give us a number," Chen suggested. It would be useful, for example, "to track evolution of themes and memes on the Web, different buzz trends."
Graham Mudd, product marketing manager for Yahoo search, said the idea is "not as crazy as you think," though he also gave the impression that researchers shouldn't hold their breath for that level of access. But Yahoo clearly wants to offer what it can.
When it comes to search research, "The pool of talent is divided between a half a dozen companies," Raghavan said. "We think it behooves us to open up."