
Cloudera ups the ante on open-source Hadoop

The latest release of Cloudera's Hadoop distribution puts big data processing into a neat stack for ease of use and efficiency.

Dave Rosenberg Co-founder, MuleSource
Dave Rosenberg has more than 15 years of technology and marketing experience that spans from Bell Labs to startup IPOs to open-source and cloud software companies. He is CEO and founder of Nodeable, co-founder of MuleSoft, and managing director for Hardy Way. He is an adviser to DataStax, IT Database, and Puppet Labs.

The Hadoop open-source project for distributed compute processing continues to be one of the most interesting projects for managing the vast amount of data being analyzed and collected in a wide variety of scenarios.

Today, Cloudera, a provider of Hadoop data management software and services, is set to ship a major new version of its open-source software distribution: version 3 of Cloudera's Distribution including Apache Hadoop (CDH3).

Cloudera's CDH3 distribution is an integrated set of components and functions that interoperate through standard APIs and manage required component versions and dependencies.

CDH3 is an integrated stack that includes not just software components but the associated libraries and testing necessary for a smooth experience. Software stacks have remained ever-elusive in the open source world, where there can arguably be too much choice--so much so that developers end up having to tweak every component to address issues with just one.

As such, the stack approach for something like Hadoop, which has inherent complexity and many components (this is big data after all), can be hugely beneficial for both users and the project itself.

CDH3 includes the following components:

  • HBase: Hadoop database for random read/write access
  • Hive: SQL-like queries and tables on large datasets
  • Pig: dataflow language and compiler
  • Sqoop: integrates databases and data warehouses with Hadoop
  • Flume: highly reliable, configurable streaming data collection
  • Extended security and authentication functions

While Hadoop is readily available on its own, CDH makes it more consumable and helps people get up and running quickly, especially in light of the sub-projects that have emerged, according to Cloudera CEO Mike Olson.

Olson said the company has thrived because the core Hadoop software has remained open source and a large community has developed not only to support users but also to extend the platform in ways that no single developer or company could. Additionally, because Cloudera has a large team of Hadoop committers, it has visibility into which features are likely to be of interest and which problems exist in the software, and can best address the needs of its customers.