Open source powers big data index

Interest in open-source tooling and infrastructure for big data keeps growing among developers and traditionally proprietary companies.

Dave Rosenberg Co-founder, MuleSource
Dave Rosenberg has more than 15 years of technology and marketing experience that spans from Bell Labs to startup IPOs to open-source and cloud software companies. He is CEO and founder of Nodeable, co-founder of MuleSoft, and managing director for Hardy Way. He is an adviser to DataStax, IT Database, and Puppet Labs.

Interest in big data continues to grow, both in downloads of connectors to software packages and in the software infrastructure that powers big data, primarily NoSQL databases and Hadoop-related extensions, according to a report.

The report, released today to coincide with the Hadoop Summit in Santa Clara, Calif., comes via open-source business intelligence provider Jaspersoft. The second-quarter report measures demand for popular data sources for storing, analyzing, and visualizing big data and uses stats from the JasperForge community site.

Key findings:

  • Big-data downloads are on pace to grow 92 percent in 2012 compared with 2011.
  • More than 10,000 big-data connectors (software that enables users to connect to different data stores) have been downloaded so far in 2012.
  • In 2012, NoSQL document stores like MongoDB received more than 70 percent of total big-data tool demand. MongoDB from 10gen remains the top performer in this group.
  • Since January 2011, document stores have garnered the majority of demand, with 58 percent of downloads. Key-value stores and Bigtable clones are nearly tied for second, with 20 percent and 19 percent of demand, respectively.
  • Cassandra from DataStax grew nearly 100 percent in May 2012 alone.
  • Hive, a data warehouse system for Hadoop that facilitates easy data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems, saw steady demand from 2011 to 2012.

It's interesting to note that open-source projects and products continue to dominate the emerging big-data landscape. Even vendors such as VMware, which typically rely on proprietary models, have embraced open-source big-data tools, as witnessed by today's announcement of Serengeti, a new open-source project designed to enable Hadoop to run atop the VMware vSphere cloud platform.