Developers dig in to Google's toolbox

A new toolkit that allows software developers to automatically query the search site's database has spawned visions of real-time search results. But critics are skeptical.

Stefanie Olsen Staff writer, CNET News
Stefanie Olsen covers technology and science.
Imagine retrieving up-to-the-second results for all of your Web searches, or personalizing a high-powered navigation system for your desktop.

Sound far-fetched? Not to some software developers who've found inspiration in a new service from Google, the search site everybody loves to love.

Late last week, Google launched a new tool for Web developers called Web APIs (application programming interfaces). Simply put, the service lets developers automatically query 2 billion documents from its database on a limited basis. Then they can publish results as they choose, as long as it's for noncommercial purposes.

To be sure, the experiment could turn out to be just so much hype. It is less than a week old, so it has yet to deliver anything astounding. Early creations built with the APIs are cropping up, such as the "Google box," a display of search results that takes the pulse of any desired term 1,000 times a day. Google is promoting the APIs as a means to create an online game or use its spell-checking technology.

Still, the announcement has the research and software development community dreaming up applications that could be spawned from access to Google's massive database of Web pages, documents, images, news and discussion-group archives.

"It's an extremely important and forward-thinking move by Google," said Rob Sanderson, an England-based researcher working on the University of California at Berkeley's Cheshire project, a library catalog and full-text information retrieval system. "The beta release is just a teaser for what could be done with full access to Google's engine."

Google's APIs come as the company is trying to find its financial footing amid fierce competition in the market for Internet search services. Widely thought to be preparing for an initial public offering, the company has been slowly cementing its position as the No. 1 search provider on the Web, with more than 3 billion documents in its database. In the last couple of years, Google's technology scored it a high-profile contract with Yahoo, taking over Inktomi's position. It has also replaced paid-listing company Overture Services on EarthLink's site.

Although the API service is currently free, many software developers say it could be a catalyst to get creative juices flowing before the company puts a price tag on it. Such a move would fall into line with Google's recent efforts to boost its revenue, which for now relies on advertising. Earlier this year, it introduced an enterprise search device and unveiled new tools for advertisers to bid for better exposure on the site.

The service also comes as Google seeks to stop some parties from performing automated searches of its database for commercial gain--a forbidden practice that typically drains bandwidth resources. By requiring Web developers to register with the site and by capping the number of searches, Google can save resources while endorsing, and potentially charging for, legitimate uses of automated queries.

Chris Sherman, associate editor for search publication SearchEngineWatch.com, said the move helps alleviate some of the problems Google faces with automated robots querying its database.

"This is the warning shot that things are going to change here. And the nice thing is that before shutting off access, Google has opened up this gateway for the Web community to have access to vital information," Sherman said.

Gaga over Google
Google's API test service uses WSDL (Web Services Description Language) and SOAP (Simple Object Access Protocol) standards, so developers can link their applications regardless of the language--such as Java or languages supported by Microsoft's Visual Studio .Net tools--used to program them. And since the service launched five days ago, several people in the software development community have adapted code so that the APIs can be used from other programming languages, including Perl and Ruby.
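To make the SOAP part concrete, here is a minimal Python sketch of how a client might wrap a search request in a SOAP 1.1 envelope. Only the envelope namespace is standard. The service namespace, the `doSearch` operation and the parameter names (`key`, `q`, `start`, `maxResults`) are hypothetical placeholders chosen for illustration, not details taken from Google's WSDL file.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "urn:ExampleSearch"


def build_search_envelope(key, query, start=0, max_results=10):
    """Return a SOAP envelope (as a string) for a hypothetical doSearch call."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element lives in the service's own namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}doSearch")
    # Parameter names below are assumptions, not the real service's names.
    params = (
        ("key", key),
        ("q", query),
        ("start", start),
        ("maxResults", max_results),
    )
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

Because the request is plain XML sent over HTTP, any language that can build a string like this one can talk to the service, which is why ports to Perl and Ruby could appear within days.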

Web developers can retrieve only 10 "hits" at a time and perform only 1,000 searches a day, and only against Google's index of Web pages. But developers are encouraged by the prospect of having access to Google's vast collection of images, directory links, news archives and newsgroups. One potential application could be an automatic query of the day's hot topic using Google News, a database of newspaper headlines and excerpts in test phase at the site. Another offshoot could be based on a query of Web documents related to a single day in history.
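Those two limits bound what any one application can pull in a day. The following Python sketch works through the arithmetic and shows one way a client might budget its calls. The two constants come from the limits described above. The class and function names are hypothetical, and the sketch assumes each page of 10 hits costs one query.

```python
from dataclasses import dataclass

HITS_PER_QUERY = 10     # results returned per request
QUERIES_PER_DAY = 1000  # requests allowed per developer per day


def queries_needed(total_hits):
    """Number of requests needed to page through total_hits results."""
    return -(-total_hits // HITS_PER_QUERY)  # ceiling division


def page_offsets(total_hits):
    """Starting offset of each page when fetching total_hits results."""
    return list(range(0, total_hits, HITS_PER_QUERY))


@dataclass
class DailyQuota:
    """Tracks how many of the day's requests have been spent."""
    limit: int = QUERIES_PER_DAY
    used: int = 0

    def try_spend(self, n=1):
        """Reserve n requests; return False if that would exceed the limit."""
        if self.used + n > self.limit:
            return False
        self.used += n
        return True
```

Under these assumptions the daily ceiling is 10 x 1,000 = 10,000 hits. An application that wants the top 50 results for each term spends five queries per term, so it can cover at most 200 terms a day.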

Chris McClelland, a programmer in Marblehead, Mass., developed a search bot called "botgoogle" for AOL Instant Messenger (AIM). People can search for a term by sending an instant message to "botgoogle," which will return up to five results from Google. McClelland said it made sense to create an IM application because many of the other bot technologies such as ActiveBuddy's "Smarterchild" are based on similar expansive databases. However, the application can only answer 1,000 queries a day, which "severely hinders my bot's performance," he said.

If Google's APIs take off, the benefits could extend far beyond the company to the Web at large, according to Web developers, who said the move could provide a big boost for ballyhooed, but still mostly theoretical, Web services.

By allowing Web sites and applications to sync up or share data to build new systems or sites, Google's APIs could provide the clearest example yet of how Web services will actually work. And that, proponents say, could inspire valuable developments in the Web community, including forcing other software developers such as Microsoft to follow suit.

"Believe me, we're going to get tired of the Google box," said Dave Winer, publisher of a Web log and head of software company UserLand. "It's not at the core of what they released, but it says maybe they'll expose some functionality that is useful. It opens up the conversation."

Even Winer acknowledges that the experiment faces some hurdles, especially if the APIs mature into robust applications that Google can tap for revenues.

The Cheshire project's Sanderson said that because the service uses SOAP, Google could hit a snag from a hotly debated proposal to extend patent rights to Web standards, a possibility that he describes as a "dark cloud looming" over the experiment.

Already Winer and others predict that Google will charge for various applications of the service. But Sanderson predicted that API users might not only have to pay Google for better access to its database--they also might have to pay companies such as Microsoft and IBM to use the mechanism for performing such searches. He added that the patent owners have yet to state any sort of terms that they would consider a "reasonable and nondiscriminatory license."

Nelson Minar, lead software engineer on Google's API project, said that while it's too early to tell what kinds of applications will evolve out of the experiment, interest is high. He said that after only four days, 10,000 developers signed up to use the API service. In addition, he said that 15 or 20 libraries for various programming languages have been created.

"The possibilities are limitless," Minar said. "Our idea is to provide the raw material for the developer community, and (they) will come to us with creative applications that use our Web API."

Real-time searches?
Winer, for one, is enthralled by Google's willingness to let developers collaborate on ideas. In the act of introducing APIs, Google has changed the status of some people from users to developers, and that could open a dialog to allow for real-time "crawling" of the Web.

This means that eventually Web publishers such as CNN or Web loggers could notify Google's search engine every time data on their Web page is updated. Then Google could automatically verify the command and index the page on the fly.
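The notify-then-verify idea can be sketched in a few lines of Python. This is not a description of anything Google has built. It assumes, purely for illustration, that a publisher sends the page's URL along with a SHA-256 digest of the new content, and that the index re-fetches the page to confirm the claim before indexing it. The names `TinyIndex`, `notify` and `digest_of` are hypothetical.

```python
import hashlib


def digest_of(text):
    """SHA-256 hex digest of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class TinyIndex:
    """Toy index that accepts update notifications from publishers."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable: url -> current page text
        self.digests = {}   # url -> digest of the indexed version
        self.pages = {}     # url -> indexed text

    def notify(self, url, claimed_digest):
        """Handle a publisher's 'this page changed' message."""
        text = self.fetch(url)  # verify by fetching the page ourselves
        actual = digest_of(text)
        if actual != claimed_digest:
            return "rejected"   # the claim does not match the live page
        if self.digests.get(url) == actual:
            return "unchanged"  # this version is already indexed
        self.digests[url] = actual
        self.pages[url] = text  # index on the fly
        return "indexed"
```

The verification step matters because the index cannot simply trust a notification: re-fetching the page keeps a third party from pushing false content into the index, at the cost of one request per notification.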

"They could open themselves to that kind of notification and verification so they would provide up-to-the-minute search capability," said Winer, who predicts that one day Google or another search provider will offer a service for the home computer.

Google CEO Eric Schmidt has already said real-time search results are a priority for the company, which launched in 1998. In the last year, the company improved its search technology to begin indexing sites for news organizations such as CNN and The New York Times that change more frequently. Previously, it crawled the Web every 30 days or so.

Like other developers, the Cheshire project's Sanderson believes the test is valuable for the future of interoperable applications, allowing for seamless embedding of Google's resources into other programs through multiple platforms and standards.

For example, Sanderson said the Google APIs could allow his project to create applications that search both traditional library catalogs and the entire Web from a single command.

"Now not only can you find the books available on the topic, but also Web sites which might offer relevant information, and in a uniform interface rather than having to search the library, open a new browser window, and then search Google," he said.

"Students in particular will find this useful when researching papers," he added.

Others, while interested, are more skeptical.

Aaron Straup Cope, one developer who wrote a script for the APIs in Perl, said he sees the move as validation for distributed computing or Web services, but he added, "I doubt that (APIs) will make the Earth move."

"Being able to 'plug' Google-ness in to your Web site will, if nothing else, provide an example of 'distributed computing' that is not as abstract as those that have come before it," Cope said. He also cited the development of cross-publishing functionality in Web logs.

Cope said Google's APIs give further credence to the concept of an "Internet operating system," where people can pull in or manipulate content via a remote function as a page is being published. "That is, there is a growing interconnectedness among pages, sites (and) applications."

Still, Cope predicted that many of the problems that plague Web sites, such as bandwidth limitations, will hamper broader adoption.

"Not everyone has a thousand servers like Google does," he said.