Google adds major libraries to its database

Search giant plans to begin converting the holdings of leading research libraries into digital files that would be searchable online.

Stefanie Olsen Staff writer, CNET News

Dec. 14, 2004 12:13 p.m. PT

Google will expand its book-search capabilities by working with Stanford and Harvard universities, among others, to digitize out-of-print and copyrighted works.

On Tuesday, the Mountain View, Calif.-based search giant announced relationships with five major libraries, including those at the University of Michigan and Oxford University, as well as the New York Public Library, to create digital copies of some books so that they may be searchable using Google. Also on Tuesday, the company began sampling some works already scanned for Google Print, the company's searchable index of books that it formally unveiled in October.

Susan Wojcicki, Google's director of product management, said the project will evolve over several years.

"Libraries have been the keepers of information for centuries," she said. "We're excited to unlock that wealth of information."

For now, the scope of Google's relationship with each institution varies. For example, Harvard Publications director Peter Kosewski said the university is in a pilot program with Google to scan only 40,000 randomly selected books from its collection of 15 million, the largest academic library in the United States and one dating back to the 1630s. By going through the process, Harvard will be able to vet issues such as care of the books and copyright concerns and determine whether it's appropriate to proceed, he said.

Google has long said it plans to make the world's information accessible and searchable, and a cornerstone of that mission is bringing libraries to life online. Google itself was born out of a library digitization project at Stanford, Wojcicki said, and its founders had planned all along to build a vast searchable index of books. Only now has the company found the technology and resources to work with libraries to scan their volumes, she said.

Faced with increasing competition from Microsoft, Yahoo and others, Google is also trying to continually differentiate itself in Web search and make its service vital to consumers in new ways. The task is no longer only to make it easy for consumers to find an obscure travel site on Zimbabwe or track a UPS package; it is also to help a visitor call up and read a work of Shakespeare.

Still, the company must navigate tricky issues of copyright. Because libraries own only copies of copyrighted books and don't hold the rights to reproduce those works for wide distribution, Google will likely have to deal with publishers to share revenue on advertising, excerpt only a small portion of material or promote the purchase of books on third-party sites such as Amazon.com, all of which Google said it plans to do. The company said that at first, it will only display bibliographic information for copyrighted works.

For books in the public domain--books no longer protected by copyright--Google will allow people to search and read the entirety of the work. Oxford, for example, has agreed to let Google scan all of its books published in or before 1900.

The New York Public Library has agreed to a pilot program with Google, granting rights to scan an undisclosed number of books. Stanford and the University of Michigan have given Google the go-ahead to digitize their entire libraries, which Google estimated at 7 million volumes each.

William Gosling, librarian at the University of Michigan, said he expects it will take Google about six years to digitize its 7 million volumes, which do not include special collections, papers or manuscripts. Before this project, the university was scanning roughly 5,000 titles a year on its own, but now Google will digitize hundreds of thousands of volumes a year, he said. At the end of the project the university will have a complete digital rendition for its own use.

Google is underwriting the cost of the project for each library. Gosling said Google is using special equipment to take digital photos of book pages as they're turned, without having to take pages apart.

Many universities tout exclusive collections of books or letters, and for this reason, Google may also run into trouble obtaining clearances down the road to meet its goals. Harvard's Kosewski said that its test is only with a small number of books and that it would require an entirely new set of considerations if the university were to grant Google or others the ability to scan such works.

"The potential to serve people worldwide is without question," Kosewski said. "We have to ensure that the collections can be taken very good care of."

Google's project coincides with another academic pursuit. The company only recently introduced Google Scholar, a service for searching academic papers such as theses and abstracts. A commercial outfit that sells access to similar materials recently sued Google over its new program.

The library project builds on Google's previously released print service, which, when launched, focused largely on digitizing works from publishers, including Random House and Knopf Publishing Group. The company recently invited all publishers to scan their books for inclusion in the index.

The service lets Web surfers call up brief excerpts from books, critics' reviews, bibliographic and author's notes and, in some cases, a picture of the book jacket.

Google makes money from the service by displaying related ads alongside book text, and the company shares the majority of the ad revenue with publishers.

Rivals are jockeying for similar utility. Microsoft, for example, has built encyclopedia answers from its Encarta software into search results for its new proprietary engine. Last year, Yahoo began a content-acquisition project to digitize more searchable material. And Amazon features a search-inside-the-book tool so that people can browse works digitally before buying.