Learning from physics research to tackle big data
Scientists at CERN are using big-data techniques to process 15 petabytes of information a year to piece together what our universe is made of.
Companies are increasingly collecting digital information in volumes so large as to be unwieldy. It's no surprise that finding a way to store, categorize and recall this information securely and efficiently is a huge advantage for any enterprise or organization.
The growth of information has introduced an entirely new category of software in the arena, which includes a variety of databases, processing engines, and applications. The main objective of all of these tools is to make data more malleable and consumable so that it can be used more easily.
This week, CERN, the European Organization for Nuclear Research, announced a relationship with software integrity provider Coverity to maintain ROOT, a custom-built data analysis framework that processes and makes available the 15 petabytes of information generated each year by experiments using CERN's Large Hadron Collider--dubbed the largest scientific instrument ever built.
The information stored in ROOT helps CERN's 10,000 physicists piece together what our universe is made of and how it works by studying what happens when particles collide. Every second, scientists at CERN oversee 600 million of these particle collisions, which generate enough data to fill up 15,000 standard disk drives.
Usually, it takes years for cutting-edge research to trickle down into enterprise solutions and even longer for it to permeate a market--primarily due to the lack of people familiar with the technology. However, with the explosion of data and with companies competing to help enterprises tackle these mountains and silos of data, we should see the data analysis frameworks now used in academia and research take hold sooner than they have in the past.
The big question with academic projects is whether they can achieve widespread adoption beyond researchers. A number of research-based projects have seen growth over the last few years, including the R free software environment for statistical computing and graphics, the Globus Toolkit for grid computing, and Eucalyptus for private clouds.