The White House's Office of Science and Technology Policy is launching a research and development program to collect and analyze very large data sets, commonly called big data.
The heads of seven federal science programs are scheduled to hold an online press conference this afternoon to discuss their work with big data. The event at the American Association for the Advancement of Science will have representatives from the Office of Science and Technology Policy, the National Science Foundation, the National Institutes of Health, the Departments of Defense and Energy, DARPA, and the U.S. Geological Survey. Commitments from the different agencies and departments will total about $200 million, according to the New York Times.
A media advisory says the initiative is meant to help researchers in multiple fields work with big data. "To capitalize on this unprecedented opportunity to extract insights and make new connections across disciplines we need better tools and programs to access, store, search, visualize and analyze these data," according to the advisory.
"There is recognition by a broad range of federal agencies that further advances in big data management and analysis are critical to achieving their missions," Edward Lazowska, a computer scientist at the University of Washington, told the New York Times. "It doesn't matter whether the mission is national defense, energy efficiency, evidence-based health care, education or scientific discovery."
Businesses in many fields, such as the Internet, finance, and pharmaceuticals, are already adept at collecting, sifting, and analyzing giant data sets. Researchers in fields such as genetics, astronomy, and climate science have access to huge amounts of data but need better analytical tools, coupled with high-end computers, to speed their work.
Apart from tools for analysis, the White House has already taken steps to publish data in common formats on the Data.gov site. Green Button is a simple data format that utilities can use to make electricity usage information available to consumers. The idea is to create data sets in common formats that let consumers access information and let software developers create custom applications.