Online library offers 1.5 million works and counting

The Universal Digital Library, backed by several major libraries around the globe, is more about preservation and less about getting clicks.

Candace Lombardi

In a software-driven world, it's easy to forget about the nuts and bolts. Whether it's cars, robots, personal gadgetry or industrial machines, Candace Lombardi examines the moving parts that keep our world rotating. A journalist who divides her time between the United States and the United Kingdom, Lombardi has written about technology for the sites of The New York Times, CNET, USA Today, MSN, ZDNet, Silicon.com, and GameSpot. She is a member of the CNET Blog Network and is not a current employee of CNET.

Nov. 27, 2007 2:49 p.m. PT

The Universal Digital Library, a book-scanning project backed by several major libraries across the globe, has finished digitizing 1.5 million books and on Tuesday made them freely and publicly available.

The online library offers full-text downloads of works that are in the public domain or whose copyright holders have granted permission to make them available. Because it is backed by prominent institutions such as the Bibliotheca Alexandrina in Alexandria, Egypt, however, the collection goes far beyond the widely available classics, though those are there, too.

"You're not going to find over 900,000 works in Chinese on Google," said Michael Shamos, a professor of computer science at Carnegie Mellon University and director of intellectual property for the Universal Digital Library (UDL).

In fact, the UDL differs in many ways from the book-scanning projects run by Google and even Microsoft, according to Shamos.

For one, the UDL isn't interested in how many users it attracts. Its large volume of content and easily downloaded texts may appeal to e-book users looking for free, compatible material, but the library also offers a large number of obscure works likely to interest only a niche group of academics or hobbyists.

"If your subject is ancient archery and you have trouble because it's not something stocked at Barnes & Noble, you can find it at the Universal Digital Library," Shamos said.

Most importantly, he said, this is an undertaking of preservation for all humankind.

"Remember when the Taliban took over in Afghanistan and they dynamited statues they thought were heretical? We'll never have them again. But once books are digitized and stored on servers around the world, it becomes impossible for any one government to destroy all the copies of a book. Once it's there it remains immortal," he said.

The project, under way for the past five years, was founded and is still directed by Raj Reddy, a Carnegie Mellon professor of computer science and robotics whose honors range from the ACM Turing Award to the French Legion of Honor. The project is funded in part by the National Science Foundation. In addition to Carnegie Mellon and the Bibliotheca Alexandrina, it is led by Zhejiang University in China and the Indian Institute of Science in India. Seven other Chinese universities and eight other Indian universities are also partners.

Another difference between the UDL and other book-scanning projects is formatting. Because the project was so widespread, with books scanned by several different groups in multiple countries, it uses several open formats rather than a single one. Books from the Universal Digital Library are available in HTML, TIFF, and DjVu (pronounced "déjà vu"), an open alternative to PDF.

Although all of the content, regardless of copyright status, has been digitized and indexed, works still under copyright are offered only as abstracts. Even works from long-dissolved publishers are treated as copyrighted if their rights status cannot be determined with certainty. That ensures there is not even the slightest chance of copyright infringement, Shamos said.

"We don't have the legal resources of Google. We don't want to spend the university's endowment in legal fees," he said.