Yahoo to digitize public domain books

Site is launching library-digitization project to rival Google's controversial program, but Yahoo effort avoids copyright mess.

Elinor Mills Former Staff Writer
Elinor Mills covers Internet security and privacy. She joined CNET News in 2005 after working as a foreign correspondent for Reuters in Portugal and writing for The Industry Standard, the IDG News Service and the Associated Press.
Yahoo is launching a library-digitization project to rival Google's controversial program.

Yahoo is working with the Internet Archive, the University of California and others on a project to digitize books in archives around the world and make them searchable through any Web search engine and downloadable for free, the group was set to announce Monday.

"If we get this right so enough people want to participate in droves, we can have an interoperable, circulating library that is not only searchable on Yahoo but other search engines and downloadable on handhelds, even iPods," said Brewster Kahle, founder of the Internet Archive.

The project, to be run by the newly formed Open Content Alliance (OCA), was designed to skirt the copyright concerns that have plagued Google's Print Library Project since it began last year.

The Authors Guild sued Google last week, alleging that its scanning and digitizing of copyright-protected books infringes copyright, even if only small excerpts are displayed in search results, as Google plans. Google argues that the project adheres to the fair use doctrine under U.S. copyright law, which allows excerpts in book reviews and the like.

Unlike Google, Yahoo will scan and digitize only texts in the public domain, except where the copyright holder has expressly given permission. The OCA project also will make the index of digitized works searchable by any Web search engine. Because Google is restricting public access to excerpts of copyright-protected books, it is maintaining control over the searching of all the digitized texts in its program.

The Internet Archive, a nonprofit formed to offer access to historical collections that exist in digital format, will host the digitized material. Hewlett-Packard Labs is providing technology for scanning books, and Adobe Systems is providing licenses for its Acrobat and Photoshop software.

The University of California system, the University of Toronto, the European Archive, the National Archives in the United Kingdom, O'Reilly Media and Prelinger Archives are all providing content, which will include books, speeches, spoken word audio, video and music, Yahoo said.

The University of California's 10 campus libraries have about 33 million volumes, of which an estimated 15 percent are in the public domain, said Daniel Greenstein, associate vice provost and University Librarian of the California Digital Library.

Greenstein said that, contrary to publishers' concerns that people will stop buying books they can read or download free online, making books easy to find on the Internet will broaden the public's exposure to them and is likely to increase sales, not decrease them.

"There is good evidence to suggest that if people see (that a book) is (out) there, they will buy it. Print sales either increase or are unchanged," he said. "We haven't once seen data to suggest that open access, at least to published printed works, decreases sales."

The University of California Press is likely to participate in the project, said Lynne Withey, director of the UC Press. "I'm all in favor of extending the availability of both books and journals in digital formats," she said. "So anything that does that in a way that respects authors' copyrights and also allows publishers to stay in business is a good thing."

By exposing more people to scholarly works, the OCA project could contribute to improved research and help reverse the trend among publishers of cutting back the number and print runs of books, said Lawrence Pitts, chairman of the University of California Academic Council Special Committee on Scholarly Communication.

Rising prices for books from academic publishers have meant fewer purchases by universities, he said. For example, academic presses that printed 12,000 copies of a book a few years ago are now printing as few as 250 copies, he said.

"It is a terrible problem in the liberal arts, in particular, of getting a first book published, and that is often the ticket to being hired by a good university and getting tenure," Pitts said. "Data show that if you can put the material in an open access arena, the mention of the work doubles or quadruples because people out there in the world can find it better."

The OCA is appealing to publishers and other libraries, universities and archives worldwide to offer materials as well. "This is an international effort, not just domestic," said Dave Mandelbrot, Yahoo's vice president of search content. For example, "we would be very eager to integrate French content into the Open Content Alliance and are working with people in France to make that happen."

After Google announced its effort, the French government said it would embark on its own book digitization project, complaining that the Google plan would only accelerate the domination of the English language over other languages.

The OCA effort was applauded by publisher and author groups that have been critical of Google's effort, including the Association of Learned and Professional Society Publishers, the Text and Academic Authors Association, or TAAA, and the Authors Guild.

"It is a wonderful idea. It does all the good things that the Google project was represented as doing, but it respects the copyright," said Richard Hull, executive director of the TAAA.

"Sounds fine, but we would want to see the details, of course," said Paul Aiken, executive director of the Authors Guild. "We have absolutely no problem with digitization of public domain works. With copyright works, we want to make sure the people who actually have the rights are the ones granting the licenses. In most cases it would be the authors."

The OCA also is looking for ways to help publishers be compensated for offering copyright protected books to the repository, said Mandelbrot. "We are working directly with publishers to come up with business models to encourage them to come up with ways to make works publicly available," he said.

O'Reilly will make some copyrighted works available, initially without compensation, to encourage others to participate, Yahoo said.

When asked to comment on the Yahoo project, Google spokesman Nate Tyler said, "We welcome efforts to make information accessible to the world."