Google's digital-book future hangs in the balance

Google Book Search has the potential to unlock the musty archives of the world's libraries. But will it overcome antitrust obstacles and other opposition?

Stephen Shankland Former Principal Writer

June 18, 2009 10:10 p.m. PT

Google, the company best equipped and most motivated to digitize the world's books, wants to offer the world an online Library of Alexandria. The decisions of the Justice Department, authors, book publishers, a federal judge, and Google itself likely will determine whether the company actually does.

Nobody in recent years has accused Google of lacking ambition, but its Google Book Search project is certainly among the company's top projects when it comes to chutzpah. That's not just because of the technical and financial hurdles of scanning, indexing, and displaying millions of books online; it's also because of the tangled intellectual property and legal concerns involved in the controversial project.

After revealing the book-search project in 2003, Google drew copyright infringement lawsuits from the Authors Guild and the Association of American Publishers in 2005, but an October 2008 proposed settlement, now under review by Judge Denny Chin of the U.S. District Court for the Southern District of New York, has converted those groups from adversaries to allies.

The settlement, if approved, could neatly cut a Gordian knot of copyright entanglements, though it would set Google back $125 million. That's because it would enable Google to display not only books that are out of copyright and those that are in print by cooperating publishers, as it does today, but also those from the vast collection of in-copyright books that are out of print--even when those holding rights to those books didn't specifically agree to Google's plan.

The complicated proposed settlement provoked the wrath of some authors concerned it would grant Google monopolistic power over online publishing, and the court extended the deadline for authors to choose whether to opt out of the settlement from May to September. Then the other shoe dropped this month: the Justice Department signaled serious antitrust scrutiny by issuing subpoena-like civil investigative demands, or CIDs, to check into the matter.

AIG and General Motors apparently are too big to fail. But the way the opposition to Google Book Search is shaping up, it looks like some believe Google is too big to succeed.

Why doesn't Google just scrap it?

Google Book Search isn't just another Google project. It's a link from Google's current Internet-based view of humanity's collective knowledge to the broad and deep information contained in the world's books. If the company succeeds in its ambition, the world's books will emerge from dusty library stacks to be reborn on the Web, and Google already has a 7-million-book start.

"Google's mission is to organize the world's information and make it universally accessible and useful," the company tells us. And conveniently, the company has found a way to make money presenting that information: sell ads next to search results based on the search terms people type in. To foster business growth and to meet rising expectations, Google must collect more data on its servers and improve the algorithm that selects search results from that data.

Google Book Search can show the content of books as well as advertisements and links to places to buy them. Screenshot by Stephen Shankland/CNET

The beauty of Google's approach is that it picks winners in search results based on the collective judgment of humans on the Internet rather than its own assessment of the content's quality. Adding data from books to search opens up a new pool of data, potentially leading to relevant search results for more search queries.

"We've always said that the perfect 'I'm feeling lucky' experience is when we get that answer right for you every single time. Maybe it comes from a Web page, maybe from a video, sometimes from a book," spokesman Gabriel Stricker said. "Our ability to have the most comprehensive search engine improves our ability to deliver on core search, which is the core of our business and one that's proven itself to be really profitable."

Though search is Google's primary business, the company also stands to make money directly from book search. Under the proposed settlement, Google could share revenue with authors and publishers from sales of PDF copies of books, from fees from institutional subscriptions granting access to its online library, and from advertising.

When Google began its project, it showed only short "snippets" of text from books it had scanned, just as it does today with excerpts from Web sites it shows in search results. The company argues that such snippets may be shown under the "fair use" provision of copyright law, which permits use of copyrighted information under some circumstances without licensing it first.

The book-search lawsuits challenged whether such use was permissible. By the time the proposed settlement arrived, though, Google got much more for its $125 million.

How does the proposed settlement work?

It took months to hammer out the proposed settlement, which runs to 320 pages including 15 appendices. Among its key features is the establishment of a Book Rights Registry, run by authors and publishers to keep track of who owns rights to which books and to collect money from Google's online sale of those books, either through individual use or through institutional subscriptions. For orphaned works, the registry would hold money from online sales for distribution to rightsholders who turn up later.

Google, seeing lemons in the form of the Authors Guild's class-action lawsuit, ended up with lemonade in the settlement. Class-action settlements apply to a class of potential plaintiffs, and in the case of Google Book Search, those with rights to books must opt out of the settlement if they don't want to be a party to it. That means essentially that Google would be permitted to show content from in-copyright, out-of-print books and sell online copies of those books even without an explicit agreement with the books' rightsholders.

The Berkman Center for Internet and Society at Harvard estimates this latter category accounts for 70 percent of Google Book Search books, and it's a key factor for so-called orphan works--books or other materials whose authors can't be located. The settlement would grant Google rights to use those works, but competitors--Microsoft, Amazon, or the Internet Archive are all real possibilities--without their own handy class-action settlement would have to seek such permission in advance from each rightsholder or risk copyright infringement litigation.

Access to these orphan works is the first thing Google could get beyond its original book-search project. The second is the ability to show more material than just snippets, which means that Google users get much more useful search results and that much more of a scanned book might be shown online.

Authors might be afraid to give away online content that they're accustomed to charging for, but showing more can help sales, Google said. It bases that judgment on book-search data from the current program, in which more than 10,000 participating publishers and authors let Google show specified portions of a book.

"Our data show really conclusively a direct correlation between the more pages people view and the likelihood people click 'buy the book,'" Stricker said, referring to present arrangements with in-copyright, in-print books, for which Google Book Search offers purchasing links.

Google keeps 37 percent of revenue from online book sales, advertising, and subscriptions; the not-for-profit registry would take a portion of the remainder for operating costs and distribute the rest to the rights holders. Although Google has an algorithm to set pricing for book downloads, rights holders can set prices through the registry if they want to override Google's decision.
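The arithmetic of that split can be sketched in a few lines of Python. This is only an illustration: the 37 percent figure comes from the settlement terms described above, but the article does not say what portion the registry would take for operating costs, so `registry_fee_rate` and the 10 percent value used in the example are hypothetical placeholders.

```python
from decimal import Decimal

# Google's share of revenue, as described in the article.
GOOGLE_SHARE = Decimal("0.37")


def split_revenue(gross, registry_fee_rate):
    """Split gross revenue among Google, the registry, and rightsholders.

    registry_fee_rate is a HYPOTHETICAL parameter: the fraction of the
    non-Google remainder that the registry keeps for operating costs.
    The article gives no figure for it.
    """
    google = gross * GOOGLE_SHARE
    remainder = gross - google
    registry = remainder * registry_fee_rate
    rightsholders = remainder - registry
    return {
        "google": google,
        "registry": registry,
        "rightsholders": rightsholders,
    }


# Example: $100 in revenue with an assumed 10 percent registry fee.
shares = split_revenue(Decimal("100"), Decimal("0.10"))
for party, amount in shares.items():
    print(f"{party}: ${amount:.2f}")
```

Under those assumed numbers, Google would keep $37.00 of every $100, the registry $6.30, and rightsholders $56.70. A different registry fee changes only the last two figures.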

Settlement resistance

What's not to like for authors? Google Book Search gives them a way to sell books that are out of print, which today make money only for used booksellers. And through other provisions, students and other researchers would get access to vast online libraries at institutions that pay for subscriptions, and the public would get a Google-funded computer with free access to the same in every U.S. library.

But the idea of being a cog in the Google machine doesn't sit well with some--including the fact that authors must figure out whether they want to participate in the settlement and the Book Rights Registry.

"Under the actual law, it is Google's burden and not yours to ask you for permission and then fairly negotiate terms of contract acceptable to you personally, not jam some monstrosity down your throat," said Lynn Chu, a literary agent with Writers' Reps who also called the proposed settlement a "ripoff for authors" in a Wall Street Journal opinion piece.

"The settlement creates a fundamental change in the digital world by consolidating power in the hands of one company," Harvard professor and author Robert Darnton concluded in a New York Review of Books opinion.

Concerns about the settlement and its complexity led the judge to extend the opt-out deadline by four months to September 4, giving rightsholders more time to consider whether they really wanted to join the settlement agreement and giving Google more time to conduct its worldwide campaign to try to inform as many authors as possible of the proposed settlement--an important activity since the company must convince the court it fulfilled its obligations to inform members of the class of their involvement in the suit.

Another organization that raised objections is the Internet Archive, which operates the Wayback Machine to catalog snapshots of the Web in earlier days and offers out-of-copyright books online.

"If the settlement were approved, it would be really difficult for the Internet Archive to work with the same group of books--those with no known rightsholders," said Peter Brantley, an Internet Archive director. If it tried to offer orphaned works online, "we could be faced with significant claims of infringement out of the blue."

Google has patented technology for scanning books. U.S. PTO

Instead, Brantley would prefer to see the issue addressed through legislation that could define what a digital library, for example, had to do in trying to locate an author before being able to use an orphaned work. Such legislation also could set up a mechanism similar to the Book Rights Registry that could hold money in escrow for later distribution to rightsholders once they're located.

"The best way of doing this is not through the court creating a private monopoly through a commercial actor, it's crafting legislation through Congress," Brantley said. That idea is within the realm of possibility: orphaned-works legislation made significant headway through Congress before faltering last year.

Monopoly power?

The Justice Department's scrutiny is a new wrinkle for the settlement. It's lost on no one that the Justice Department torpedoed a Google-Yahoo search-ad partnership last year by threatening a lawsuit. But Google argues Google Book Search isn't anticompetitive.

"The agreement is structured in a way to encourage competition. It's nonexclusive," Google's Stricker said. "The charter of Book Rights Registry explicitly says the registry will be able to work with other third parties to represent rightsholders who come forward."

And Mike Boni, attorney for the authors' subclass, points out that participating in the Book Rights Registry or Google Book Search doesn't preclude an author from other licensing moves. In fact, he thinks the registry could help other online book efforts.

"If anything, it's a positive," he said. "If over time the Book Rights Registry locates authors of out-of-print books, it winnows down to a small number the number of books that have been difficult to find. And it can assist competitors of Google to reach licensing arrangements," by facilitating contact with authors. And Google putting books online will help locate the "parents" of orphan works. "As Google digitizes books, information about the books will become more and more known. It will be easier and easier to locate the rightsholders of these books," he said.

Nonetheless, even supporters have qualms.

"The project will be immensely good for society, and the proposed deal is a fair one for Google, for authors, and for publishers. The public interest demands, however, that the settlement be modified first," said New York Law School's James Grimmelmann. "It creates two new entities--the Books Rights Registry Leviathan and the Google Book Search Behemoth--with dangerously concentrated power over the publishing industry. Left unchecked, they could trample on consumers in any number of ways."

Randal Picker, a University of Chicago Law School professor who's scrutinized the books project, believes that the rights that Google alone gets through the class-action suit are pertinent.

"What I think the judge needs to think about is whether we think the Authors Guild would on its own grant a similar license to competitors to Google. If the answer is no, and there is good reason to think they would say no, this license will by its terms create monopoly power," Picker said. "There is a chance this is the only orphan-works license that will be created. No one else like the Internet Archive would be in a position to compete with Google with respect to the orphan works."

Who else but Google?

Before siding with opponents or supporters of the agreement, try stepping back to look at the big picture. Chu asserts that scanning is neither rocket science nor expensive. But is that true when viewed at the scale of all books published?

Google has patented technology to scan books that can correct for the 3D shape of a page. It's scanned millions of books already. It has technology to search those books fast and to show those books online. It has a functioning business model that can subsidize the expense, and a will to actually take on the monumental challenge.

The music industry, whose CD-based music was unencrypted, still has yet to come to full terms with the digital era. Those with video content are tentatively embracing online distribution but also are struggling with the forces of the Internet. Google Book Search, in contrast, could help an analog publishing industry move to the digital era more gracefully, even possibly with some money to be made.

The physical incarnation of books has a solidity that the fleeting, impermanent Internet can't match, but making books available online gives them new life by exposing them to people who might not have found them otherwise--even if they happened to be near a library that held that book and saw its title in a card catalog. Google has the most powerful engine today to help people discover exactly what's in those books, and it has the servers, storage, and network capacity to deliver that information to the world. It even has increasingly sophisticated translation technology that could bulldoze literary language barriers, and digitization could make countless books more easily available to blind people.

Indeed, who else but Google has the capability to transport centuries of accumulated text into the digital future? Microsoft dropped its book-scanning project, and Amazon appears more interested in commercial transactions. The Internet Archive has hundreds of thousands of books available, but it doesn't operate on Google's scale, and the nonprofit group hasn't pushed hard enough to try to break the copyright logjam the way Google has.

Then, too, think of the consequences of Google controlling the content of the world's books. Do you want the act of browsing the library to leave fingerprints in a server log, to become a transaction whose details can be revealed through a subpoena? Google has the best search engine, the most complete online maps, the most popular video site, and it wants to house your e-mail, spreadsheets, blogs, photos, and health data. Do you want Google to keep the keys to the world's library as well?

"It's not beyond the realm of possibility to digitize every book that's ever been printed. That's a boldness the national libraries had not imagined was in the realm of reach. We all owe Google credit for saying, 'Go for it.' That is a huge benefit to global society--to digitize the information that humans over hundreds of years have garnered into these things we call books. That has benefited everyone," Brantley said. "What doesn't benefit us is that...Google alone will be able to provide access to that information in ways that cause us deep concern for privacy, pricing, and innovation."
