In the 1990s, Carl Malamud prompted the SEC and Patent Office to put their databases online. Now he's focusing more broadly on liberating government data, which is often sold for a princely sum.
SEBASTOPOL, Calif.--From a corner of a nondescript office building at the edge of wine country, Carl Malamud is masterminding an electronic guerrilla war against governments across the nation.
Most geeks tend to be a bit obsessive, and Malamud is no exception. He's devoted his life to liberating laws, regulations, court cases, and the other myriad detritus that governments produce daily, but often lock up in proprietary databases or allow for-profit companies to sell for princely sums.
"One of the most important products our government makes is information," said the 49-year-old tech activist, who created a Lego animation to buttress his point. "We forget the important role of the government in producing these vast databases of information. That to me is infrastructure no different from electrical lines or roads."
Malamud's solution typically has been to create a proof-of-concept Web site, with the hopes of embarrassing government entities into building that infrastructure themselves. In the 1990s, his activism was responsible for persuading the Securities and Exchange Commission and the Patent and Trademark Office to make their data available for free on the Internet. Now, on his public.resource.org Web site, he's resumed posting hundreds of thousands of pages of government documents--all of which are, or at least should be, in the public domain.
This month, he's busy liberating California government codes, including San Francisco's building code, electrical code, fire code, and zoning code. That means purchasing printed copies for as little as $40 or as much as thousands of dollars, digitizing them, and posting them as PDF files without copy protection. Two months earlier, he posted the California Administrative Code.
One hitch is that San Francisco is among the municipalities that claim copyright on their building codes. (The notice says: "All rights reserved. No part of this publication may be reproduced or distributed by any means or stored in a database or retrieval system without prior written permission of the City and County of San Francisco.")
"I haven't heard from anybody" in the city government, Malamud said, since the documents were posted early last week. "That is a little surprising. I would have expected that someone would have at least called up and asked what we hoped to accomplish by doing this."
Adine Varah, a San Francisco deputy city attorney, declined to answer questions on Wednesday about legal action or the enforceability of the copyright notice. "The city and county of San Francisco strongly supports and ensures the public accessibility of its municipal codes," Varah said. "The San Francisco Municipal Code is a public record under our state and local public records laws. In addition, the city and county of San Francisco makes those codes publicly available for free on our Web site."
Varah warned that Malamud's document may prove to be out-of-date, and that city residents rely on it at their peril: "The city does not make any representation as to whether codes accessed on the Internet through non-city Web sites are accurate or up-to-date versions of the San Francisco Municipal Code."
California's Code of Regulations: only $3,288 for one year
One reason that city and state officials tend not to appreciate Malamud's efforts is that selling copies of regulations can be a source of revenue.
In California, Barclay's, a subsidiary of Thomson West, is the officially designated publisher of the state Code of Regulations. A 2008 price list says the complete code of regulations costs $2,315 in printed form, or $3,288 with a one-year subscription to updates. A CD-ROM version with updates is $1,556.
Susan Lapsley, director of California's office of administrative law, said on Wednesday that the state claims copyright "to protect that intellectual property of the state."
"Here in California, we are the ones who publish (and) compile the regulations," Lapsley said. To take legal action against Malamud, "we'd have to go through the state attorney general. We haven't investigated it."
Lapsley said the state already makes an effort to distribute the materials on the Internet, in state depositories, and through libraries. She said having government-certified sources is useful because the code is constantly in flux, with her office approving or rejecting 5 to 10 rulemakings a day and updating the official version accordingly. (In response, Malamud says that only a portion of the code is online and that the second part, with building, electrical, plumbing, mechanical, and elevator regulations, must be purchased.)
"It's a little bit more concrete in other states but here it's very organic, so the products they're getting from him are likely outdated," Lapsley said.
Sonoma County, just an hour's drive north of San Francisco, has chosen LexisNexis, part of Reed Elsevier, as its commercial publisher. The 42-chapter Sonoma County Code can be bought from LexisNexis' online bookstore for a mere $200.
Malamud says that's why he prefers to buy physical copies and pay a local business to scan them in. "The electronic stuff either has a terms and conditions on checkout, or they're using some sort of copy-protected PDF--it is a DMCA thing," he said. "Plus, on checkout, you agree to abide by that. They'll put some sort of contractual restriction around it." The DMCA, or Digital Millennium Copyright Act, broadly restricts circumventing copy-protection measures.
At least California, San Francisco, and Sonoma let their citizens view the documents without using digital rights management techniques. Not New York state, which boasts a DRM-enabled building code on the Web site of the International Code Council. The PDF files can't be printed, probably because the ICC sells the code in book form for $105 a copy.
Given that Malamud has made a habit of butting heads with Reed Elsevier, Thomson West, and various government entities, it's almost surprising that he hasn't been sued. He's not exactly hoping for it, but he's also doing nothing that could be interpreted as shying away from a fight. (The Electronic Frontier Foundation has from time to time provided him with legal advice. His nonprofit group, Public.Resource.Org, has received money from Google, eBay founder Pierre Omidyar's charitable foundation, and the Sunlight Foundation. He's renting office space from O'Reilly Media.)
One recent spat arose when the state of Oregon began sending cease-and-desist letters in April to Web sites that had posted the text of the Oregon Revised Statutes. That is "copyrighted material, the author and copyright owner of which is the Legislative Counsel Committee of the State of Oregon," the warning said.
Malamud and some of his allies replied by drafting a sample court complaint, which made the common-sense argument that the copyright was invalid: If citizens are required to comply with state law, they should be able to reproduce it freely without threat of lawsuits. And, besides, the government employees tasked with creating the law have their salaries paid for by taxpayers.
That dispute, at least, had a happy ending. Malamud showed up to testify before the state legislature ("the fact that works of government are in the public domain is thus one of the foundations of our system of government"), and politicians eventually backed down.
These are merely minor skirmishes in what amounts to a far broader ambition: to persuade all branches of government, at every level, including the court system, to open their massive data banks to free public access through the Internet. Malamud convened a group he calls the Independent Government Observers Task Force that has held a series of meetings and compiled a list of eight principles for what they view as a truly open government.
One of those says the data should be "reasonably structured to allow automated processing," which would allow Malamud and his allies--including the Internet Archive and the Boston Public Library, which Public.Resource.Org is paying to scan 2.5 million pages of congressional hearings--to repackage files with XML tags and permit them to be readily indexed and cataloged. Eventually, search engines might even become smart enough to interpret those tags and act accordingly.
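To illustrate what "reasonably structured to allow automated processing" could mean in practice, here is a minimal Python sketch of wrapping plain-text code sections in XML tags and then building an index from them. The tag names (`code`, `section`, `heading`, `text`), the function names, and the sample sections are all hypothetical, chosen only for illustration. They are not a format used by Public.Resource.Org or any government publisher.

```python
import xml.etree.ElementTree as ET


def sections_to_xml(code_title, sections):
    """Wrap (number, heading, text) tuples in XML tags.

    The element and attribute names are hypothetical, not a real schema.
    """
    root = ET.Element("code", attrib={"title": code_title})
    for number, heading, text in sections:
        sec = ET.SubElement(root, "section", attrib={"number": number})
        ET.SubElement(sec, "heading").text = heading
        ET.SubElement(sec, "text").text = text
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


def build_index(xml_string):
    """Map each section number to its heading by reading the tags."""
    root = ET.fromstring(xml_string)
    return {
        sec.get("number"): sec.findtext("heading")
        for sec in root.iter("section")
    }


# Made-up sample data standing in for scanned, digitized sections.
sample_sections = [
    ("101", "Title", "This code shall be known as the Example Building Code."),
    ("102", "Scope", "This code applies to all new construction."),
]

xml_doc = sections_to_xml("Example Building Code", sample_sections)
index = build_index(xml_doc)
print(index["102"])
```

Once the section boundaries are explicit tags rather than page layout, a program can look up a section by number without guessing where one section ends and the next begins, which is the kind of indexing and cataloging the principle is meant to enable.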
"I believe access to knowledge is a human right," Malamud said. "When I see people putting barriers around useful information, I find that offensive."
[Editor's Note: One of the benefits of having an archive as extensive as ours is that we can provide a window into Internet history. Here's our article about Carl Malamud in April 1998 titled "Patent office slammed for not posting data." A followup from June 1998 reported the Clinton administration's response and was titled "Government puts patents online."]
CNET's Stephanie Condon contributed to this report.