Google, Yahoo duel for documents

Google adds more than a billion documents to its searchable Web database, and rival Yahoo begins a similar endeavor.

Stefanie Olsen Staff writer, CNET News
Stefanie Olsen covers technology and science.
Google has added more than a billion documents to its searchable Web database, and rival Yahoo has begun a similar endeavor.

Mountain View, Calif.-based Google said Tuesday that its searchable index has grown to more than 6 billion documents, up from roughly 4.5 billion in August, making Google one of the most complete search engines on the Web. That figure includes about 4.3 billion Web documents, 880 million images, 845 million Usenet messages, and some book-related data.

"We're searching more comprehensively," said Sergey Brin, company co-founder and president of technology. "It helps, for example, if you're searching for a person like your next-door neighbor, you may get no result....Now you'll get one."

The announcement comes as Yahoo is quietly revving up its own search engines. On Monday, the Web portal deployed a Yahoo-branded crawler, or robot, to scour the Web for documents. Called Yahoo Slurp, the robot "collects documents from the Web to build a searchable index for search services using the Yahoo search engine," according to Yahoo. The crawler is also keeping copies of those pages--a practice known as "caching."

In addition, Yahoo is starting to display results from its own technology, though company spokeswoman Diana Lee denied that it has switched away from longtime search partner Google. Yahoo, which has licensed search technology from Google since 1999, has said that it will introduce an in-house replacement for Google in the first quarter of 2004. It has several technologies to choose from via its acquisitions of Inktomi in late 2002, and AltaVista and Fast Web search last year.

A review by CNET News.com shows that for some commercial queries, Inktomi results appear on Yahoo in place of Google's. For example, search results for the term "Powershot G5" include a listing for Amazon's Web page offering the digital camera for sale. Running a cursor over the link reveals a referral Web address from Inktomi, which attaches referral tags to track its paid-inclusion program. Paid inclusion is designed to let companies pay to have their pages updated in a search index more rapidly. Google does not offer a paid-inclusion program.

Other Yahoo results, for terms like "debt consolidation" and "casinos," show referral links from Yahoo.

Despite this, Lee said that the company is still using Google's search technology. The Yahoo Slurp name is new, she said, but the crawler itself is not; it was previously called Inktomi Slurp. Lee would not disclose how many documents are in Yahoo's searchable database.

Google said that in addition to searching a broader set of documents, its crawlers are now searching information-rich Web sites more deeply. Google doubled the number of images in its index, for example, and updated the specialty index with a new user interface and ranking algorithm, Brin said. The expansion also includes items external to the Web. Google has started adding pages of books to its searchable database, for instance.

Brin said that comprehensiveness is important to delivering the best search results because it helps guarantee that what you're looking for will turn up.

In recent weeks, Google has also made about five quality enhancements to the service, which will have a greater effect on search results than the additional documents, Brin said. Though he didn't detail the improvements, he said that at least two were meant to filter spam, or bogus listings, from its search results.

Characterized as a "good-sized investment" by Brin, the advancements are part of Google's ongoing mission to be the world's largest resource for information. When asked about developing a multimedia search service, Brin said that he was thinking about it, but that there are still some copyright and technology concerns.

"Ultimately we want to have all the world's information, whatever medium it is," Brin said.