Learning from physics research to tackle big data

Scientists at CERN are using big-data techniques to process 15 petabytes of information a year to piece together what our universe is made of.

Dave Rosenberg Co-founder, MuleSource
Dave Rosenberg has more than 15 years of technology and marketing experience that spans from Bell Labs to startup IPOs to open-source and cloud software companies. He is CEO and founder of Nodeable, co-founder of MuleSoft, and managing director for Hardy Way. He is an adviser to DataStax, IT Database, and Puppet Labs.

Companies are collecting digital information in volumes so large that they become unwieldy. It's no surprise that being able to store, categorize, and recall this information securely and efficiently is a huge advantage for any enterprise or organization.

The growth of information has given rise to an entirely new category of big-data software, which includes a variety of databases, processing engines, and applications. The main objective of all of these tools is to make data more malleable and consumable so that it is easier to use.

This week, CERN, the European Organization for Nuclear Research, announced a relationship with software integrity provider Coverity to maintain ROOT, a custom-built data analysis framework that processes and makes available the 15 petabytes of information generated each year by experiments at CERN's Large Hadron Collider, dubbed the largest scientific instrument ever built.

The information stored in ROOT helps CERN's 10,000 physicists piece together what our universe is made of and how it works by studying what happens when particles of matter collide with each other at close to the speed of light. Every second, scientists at CERN oversee 600 million of these particle collisions, which generate enough data to fill 15,000 standard disk drives.

Usually, it takes years for cutting-edge research to trickle down into enterprise solutions and even longer for it to permeate a market, primarily because few people are familiar with the technology. However, with the explosion of data and with companies competing to help enterprises tackle these mountains and silos of data, we should see data analysis frameworks now used in academia and research take hold sooner than they have in the past.

The big question with academic projects is whether they can achieve widespread adoption beyond researchers. A number of research-based projects have seen growth over the last few years, including the R free software environment for statistical computing and graphics, as well as the Globus Toolkit for grid computing and Eucalyptus for private clouds.