Why science really needs big data

The White House Big Data Research and Development Initiative addresses the need for data science to advance military, biomedical, computing, and environmental research.

Martin LaMonica Former Staff writer, CNET News
Martin LaMonica is a senior writer covering green tech and cutting-edge technologies. He joined CNET in 2002 to cover enterprise IT and Web development and was previously executive editor of IT publication InfoWorld.
The explosion of big data is transforming how scientists conduct research, a shift which the White House's Big Data Research and Development Initiative seeks to address. Facebook

In years past, the go-to tools for researchers were specific to their field, whether it was a telescope or a microscope. Increasingly, it's computers and big data sets.

The White House today announced a $200 million big-data initiative to create tools to improve scientific research by making sense of the huge amounts of data now available. The programs are needed to improve the technologies for getting insight from complex and large sets of digital data, according to the White House.

"The initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security," John Holdren, director of the White House Office of Science and Technology Policy, said in a statement (PDF).

Grants and research programs are geared toward improving the core technologies for managing and processing big data sets, speeding up scientific research with big data, and encouraging universities to train more data scientists and engineers.

The initiative addresses an important need not only in computing but in science in general. The emerging field of data science is changing the direction and speed of scientific research by letting people fine-tune their inquiries by tapping into giant data sets.

Medical research, for example, is moving from broad-based treatments to highly targeted pharmaceutical testing for a segment of the population or people with specific genetic markers.

"Scientists have been using data for a long time. What's new is that the scale of the data is overwhelming, which can be an infrastructure challenge," said Puneet Batra, the chief data scientist at big-data health care startup Kyruus and a former physicist.

In the past, certain fields of science relied heavily on big data sets, such as high-energy particle physics or research on nuclear fusion.

But as information becomes available from more sources, collecting and analyzing large amounts of data is becoming common in other fields of research and business, said Richard Lawrence, the manager of machine learning at IBM.

"Big data has moved from the immediate focus on certain scientific disciplines into the infrastructure of large enterprises and private companies...because big data is becoming more pervasive to society," he said.

In another example, climate science researchers now have a large body of observational data from sensors to better create models to predict the effects of climate change.

Data science
The American Association for the Advancement of Science later this afternoon is hosting a press conference with the heads of the Office of Science and Technology Policy, the National Science Foundation, the National Institutes of Health, the Departments of Defense and Energy, DARPA, and the U.S. Geological Survey to discuss the challenges and possibilities around big data for research.

The agencies' goals differ, but the research initiatives all seek to improve people's ability to collect and use large amounts of information.

The Defense Department has made $60 million available for new research projects to analyze texts in different languages and improve autonomous systems, such as robotics, that can collect sensing data and operate in the field.

The National Institutes of Health, meanwhile, has made a 200-terabyte data set on human genetic variation available online. The data is stored on Amazon Web Services and is free for researchers to query and analyze.

Researchers now need to be able to tame large data sets with new software tools and high-performance computing to make rapid advances in their fields.

"The granularity of the data has changed," said Kyruus' Batra. "You're now collecting information from machines or individuals or physical phenomena at frequent intervals so that data can go to large scale. And now you have the tools to start analyzing it."