As President Obama's $825 billion-plus financial stimulus package works its way through Congress, a number of groups have started to call for increased transparency in the way that data on the proposed spending will be shared with citizens.
Most noteworthy are demands from public-interest groups and academics that the data be provided in a format conducive to user-generated mashups and remixes.
The American Recovery and Reinvestment Act of 2009 passed through the House Appropriations Committee a couple of weeks ago, and it is expected to come up for a full House vote in the coming weeks.
In addition to authorizing the spending of an obscene amount of money, the act also mandates the creation of a Web site to "foster greater accountability and transparency" in the use of those funds.
While the bill does a great job of mandating the kinds of information that will be put online (contracts, audits, inspector general reports, etc.), it is rather vague about how that information will be provided.
The only hints are provisions mandating that the information be "easy to understand" and "regularly updated," and that the site include a "database of findings from audits," "printable reports," and "user-friendly visual presentations to enhance public awareness of the use of funds."
Such statements bring to mind the possibility of yet another boring and difficult-to-navigate federal government Web site, perhaps similar to the Federal Communications Commission's antiquated and ineffective home page, or the Federal Election Commission's slothlike campaign donation search engine.
Faced with the possibility of another Web 1.0 Web site designed by the federal bureaucracy, a number of pro-transparency activists and tech policy academics have started to weigh in on the issue, all of them demanding the same thing: full, easy, and free access to the complete data set powering the Recovery.gov Web site.
For example, while the FEC's donation search engine was often slow and unresponsive during last year's presidential campaign, a number of third parties were able to create fantastic mashups of the campaign donation data. The most notable of these was the Huffington Post's FundRace tool, which provides users with a Google Maps view of each donation to the presidential campaigns.
The numerous independent sites that made campaign donation data easy to navigate were possible because of the legal requirement that all FEC data be made available in full to the public. As a result, public-interest groups and media organizations were able to build their own mashups and remixes of the data, offering faster and more responsive Web interfaces than the FEC's overwhelmed servers, as well as new visualization methods for navigating the data set.
We'd like the site to serve not just the amateur information consumer, but also the programmers that can skillfully remix the information. The citizen observer's role seems well-addressed by the legislation that mandated the site (with requirements for "printable reports," feedback, and to be "easy to understand"), while the needs of the programmer are largely unaddressed. The data should be available in formats that facilitate more advanced use by programmers and analysts alike.
Certainly, the data should be made available following the 8 Principles of Open Data: (1) complete, (2) primary (as it is collected at the source), (3) timely, (4) accessible, (5) machine-processable, (6) nondiscriminatory, (7) nonproprietary, and (8) license-free. XML and CSV are a minimum.
Search is great if you are looking to find information about any one thing. But original analysis and visualization require access to data in bulk. If the goal of putting the data online is to increase accountability and transparency, then it is necessary to provide bulk data access.
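To illustrate the difference between search and bulk access, here is a minimal sketch of the kind of analysis a full CSV download makes trivial: totaling spending per state across every record, something a one-record-at-a-time search box cannot answer. The bill does not specify any file format, so the column names (`recipient`, `state`, `amount`) and the sample rows below are entirely hypothetical, made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical bulk export: these columns and rows are invented for
# illustration; no Recovery.gov file format had been specified.
SAMPLE_CSV = """recipient,state,amount
Acme Paving,OH,1200000
Blue Ridge Schools,VA,450000
Lakeside Transit,OH,800000
"""


def totals_by_state(csv_text):
    """Sum the 'amount' column for each 'state' across every row of the file."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["state"]] += int(row["amount"])
    return dict(totals)


if __name__ == "__main__":
    # Print one line per state, in alphabetical order.
    for state, total in sorted(totals_by_state(SAMPLE_CSV).items()):
        print(f"{state}: ${total:,}")
```

Swapping in a different grouping column (recipient, agency, or contract type, if such fields existed) is a one-line change, which is precisely the flexibility that a fixed search interface does not offer.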
Echoing the call for bulk data access, David Robinson, the associate director of the Center for Information Technology Policy at Princeton University, told me that "no one person or organization could possibly anticipate all the ways that Americans will want to analyze, reuse, or cross-reference the information that Recovery.gov will offer. And no one person or organization needs to do so, as long as the data itself is readily available."
In 2008, Robinson and his colleagues at Princeton published a paper calling for the government to provide open access to the raw data used by all federal Web sites. The highly influential paper has been widely circulated among technology policy circles in recent months.
"This is a little tricky, because people have to settle on a format, and then require submissions in that format from contractors and state and local entities, etc.," Harper told me. "But if the administration wants to be transparent, a little forcing will go a long way. States and contractors will learn how to deal with standardized data quickly, if it makes the difference on getting federal dollars."
A month ago, Harper moderated a one-day forum at Cato, in which a number of policy experts called for open access to government data. A video and podcast of that event are available online.
Given that this bill has largely been written and shaped behind closed doors, it remains unclear how much of an impact these pro-transparency activists will have on the legislation that will create the Recovery.gov Web site. As of press time, calls for comment left with the House and Senate Appropriations Committees had yet to be returned.