
LookSmart bets on distributed computing

The Web search company plans to use unused processing power on volunteers' PCs to build an index of billions of Web documents that can be updated daily. But will it work?

Stefanie Olsen Staff writer, CNET News
Stefanie Olsen covers technology and science.
LookSmart is hoping to spin a small acquisition into a big project that will use distributed computing to improve its Web search results.

In January, LookSmart quietly bought the assets of Grub, an Oklahoma-based developer of technology that lets people donate their computers' otherwise unused processing power to run spiders, or software programs that continually crawl the Net, indexing pages and words. This collective, or distributed, computing power could be used to find new, outdated or updated Web pages daily.

LookSmart, which licenses editorial and commercial directory listings to Microsoft's MSN and other Web sites, paid $1.3 million in cash and stock for Grub, according to a recent filing with the Securities and Exchange Commission. LookSmart said it is testing the Grub system and plans to unveil the distributed computing project in early April.

"Most engines only update their entire document catalog once a month, because there's an inherent computing problem: They can't do it any faster," said Pete Adams, chief technology officer of LookSmart. "The goal of this technology is to be able to crawl every document on the Internet every day. We can only do that if we can grow the number of people that are running the software--the computing power we would use is a function of how many people we have donating their computer power."

The Grub buyout underscores growing interest in distributed computing, in which computing jobs are farmed out in small chunks across the Internet to the otherwise untapped processing cycles of ordinary PCs. The movement has had grand ambitions--to find a cure for cancer or signs of intelligent life in the universe, among other things. But thus far, its chief successes have been curiosities such as the discovery of gigantic prime numbers.
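The idea of farming out a job in small chunks can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of how Grub or any other project actually schedules work: the function names `chunked` and `assign`, the round-robin policy, and the sample URLs and PC names are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(urls: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive batches of at most `size` URLs (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def assign(urls: Iterable[str], volunteers: list[str], size: int) -> dict[str, list[list[str]]]:
    """Hand batches to volunteer PCs in round-robin order.

    Round-robin is an assumption made for this sketch; a real system
    would likely weigh each machine's speed and availability.
    """
    if not volunteers:
        raise ValueError("at least one volunteer is required")
    assignments: dict[str, list[list[str]]] = {v: [] for v in volunteers}
    for i, batch in enumerate(chunked(urls, size)):
        assignments[volunteers[i % len(volunteers)]].append(batch)
    return assignments


if __name__ == "__main__":
    # Hypothetical work list and volunteer names, for illustration only.
    work = ["u1", "u2", "u3", "u4", "u5"]
    print(assign(work, ["pc-a", "pc-b"], size=2))
```

Under this scheme, adding volunteers shrinks each machine's share of the batches, which is the scaling relationship Adams describes.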

LookSmart's long-shot bet on Grub highlights the race to innovate in the search engine arena. A handful of companies are vying for control in the niche, one of the few areas of the Net economy to have generated strong revenue and profit growth since the bursting of the dot-com bubble.

Yahoo just completed its acquisition of Inktomi, while Overture Services recently decided to snap up AltaVista and some of the assets of Norway's Fast Search & Transfer. Meanwhile, Disney has suggested that it might be interested in selling its Infoseek search engine.

Though it has a history as an editorial guide for the Internet, LookSmart has modified its business in recent years in order to survive the dot-com downturn. It still operates a volunteer-staffed directory, but the company has largely turned its focus to small-business listings, in which marketers pay for Web site reviews.

It also sells commercial listings related to keyword searches. The formula helped the company to reach profitability under Generally Accepted Accounting Principles (GAAP) for the first time in its fourth quarter last year.

At the same time, LookSmart has expanded its arsenal of search services to challenge the growing popularity of Google, the Web's best-loved search engine.

Last year, LookSmart used about $9.25 million in stock to buy WiseNut, an emerging technology company that uses automated crawlers to index the Web. LookSmart has yet to fully push the service. In the meantime, the company said that it has expanded its index to more than 1 billion documents and is improving WiseNut's algorithms for calculating the relevance of Web pages in relation to keywords.

Distributed digging
Google itself has experimented with distributed computing. Last year, the search leader invited 500 people to try out a new version of a toolbar that lets Windows users donate their computers' unused processing power to the Folding@home scientific research project at Stanford University. The experiment yielded a small success the same year, when Stanford published a scientific paper based on the Folding@home calculations.

However, the idea of using distributed computing to boost search results is still in its early stages. Grub has operated under the radar since 2000, when it was founded in Oklahoma by Kord Campbell. Since Grub was acquired, the company's four-person team has moved to LookSmart's San Francisco headquarters.

Previous attempts to harness distributed computing models to update search listings have so far failed to produce useful results, according to search experts. For example, Infoseek has a patent on a system under which sites feed their content to a search index, in order to keep it updated and comprehensive--but the company never did anything with it, said Danny Sullivan, editor of the industry newsletter Search Engine Watch.

Sullivan said that setting up a collective effort to catalog the Web could lead to some improvements in efficiency, but it could open the door to other problems.

"If you allow anyone to just send you information, a small number of people will try to manipulate the system," he said. "Suddenly, you'll have someone that says: 'Surprise! I'm an Amazon.com affiliate and I have a million-page Web site, each page duplicates an exact page at Amazon, add me to (your) index.' When it comes to Web search, some people will abuse this because there are monetary reasons to do it."

Many pages on the Web are static and thus don't need to be indexed frequently, said Sullivan. Instead, search engines need to be more intelligent about directing people specifically to information relevant to their searches.

"If it were just that the system was going to harness the collective computing power of Web users, I think it would be useful. But it comes back to...spammers. When you drop the barriers completely, will the experience (for search consumers) be great? The conventional wisdom would be 'no.'"

Sullivan speculated that LookSmart might use the Grub system to start a "trusted feed" service for inclusion into its WiseNut index. Marketers could send updated Web pages to the index to refresh it for a fee--or what's known as paid inclusion. Search engines, including Inktomi and Fast's AlltheWeb, already use such a service to keep indexes of product-related sites and catalogs fresh and to augment revenue.

Pulling in participants
So far, only about 130 volunteers are participating in the test, donating their computers' processing power to crawl the Web by downloading software to their PCs. The company's success will hinge on the number of people it signs up to donate computer resources to the cause. As part of the project, Grub promotes the benefits of "local searching," in which Webmasters can index their own sites and submit changed pages to the Grub directory, a process that can help save on network resources.
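One way a Webmaster's machine could decide which pages count as "changed" is to compare a fingerprint of each page against the one recorded on the previous pass. The sketch below assumes that approach. The article does not say how Grub detects changes, and the names `fingerprint` and `changed_pages`, as well as the choice of SHA-256, are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of a page body (the hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def changed_pages(previous: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """Return the URLs that are new or whose content differs from the last pass.

    `previous` maps URL -> fingerprint stored from the last crawl;
    `current` maps URL -> page body fetched locally just now.
    """
    changed = [
        url
        for url, body in current.items()
        if previous.get(url) != fingerprint(body)
    ]
    return sorted(changed)


if __name__ == "__main__":
    # Hypothetical site contents, for illustration only.
    last_pass = {"/index.html": fingerprint(b"home"), "/about.html": fingerprint(b"about")}
    this_pass = {"/index.html": b"home", "/about.html": b"about us", "/new.html": b"hello"}
    print(changed_pages(last_pass, this_pass))
```

Only the pages returned by such a check would need to be sent upstream, which is where the savings in network resources would come from.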

LookSmart also plans to introduce a Web application programming interface (API), with which Webmasters will be able to query documents contained in the registry.

Charles King, research director for the Sageza Group, a Mountain View, Calif.-based information technology analysis company, said that the real challenge that LookSmart faces is in recruiting devoted volunteers.

King said that projects such as SETI@home, a distributed computing search for extraterrestrial life, have a built-in "geek factor" that draws Web surfers to donate their computer resources to the cause. Similarly, Intel hosts a project to research cures for cancer, luring people who have been touched by the disease. Although he doubts that Web indexing would be a big attractor, King said that it could be a good application for distributed computing.

"The Web is growing at such a phenomenal rate that charting an index of it is an ongoing process," said King. "There are so many pages added on a daily basis, that a snapshot today will be inaccurate tomorrow. But this is the kind of thing that will be successful only as long as they can inspire interest and keep it."