Hortonworks looks to grow Hadoop ecosystem

Hortonworks CEO talks about the origin of Hadoop and how his company intends to monetize the open-source, big data standout.

Dave Rosenberg Co-founder, MuleSource
Dave Rosenberg has more than 15 years of technology and marketing experience that spans from Bell Labs to startup IPOs to open-source and cloud software companies. He is CEO and founder of Nodeable, co-founder of MuleSoft, and managing director for Hardy Way. He is an adviser to DataStax, IT Database, and Puppet Labs.


As big data becomes increasingly top of mind, a number of new companies have emerged to support Hadoop, the leading open-source platform for data-intensive distributed applications. One of the newer entrants is Hortonworks, a company spun out of Yahoo with a cash infusion of more than $15 million from Yahoo and Benchmark Capital.

Last week I sat down with Hortonworks CEO Eric Baldeschwieler to understand how the company intends to differentiate itself from other vendors such as Cloudera, MapR, and the many as-yet-unlaunched companies that venture capitalists are still funding.

Hadoop itself was initially developed at Yahoo by Doug Cutting (who is now part of Cloudera) to commoditize the storage and processing of big data. According to Baldeschwieler, Yahoo started taking Hadoop seriously both as a way to provide computing resources as a service to its own business units and as a way to attract data scientists to the company.

Yahoo found that it needed to have a more generic infrastructure to roll out new services and make the processing of the data it collected more readily consumable for internal and partner services. The ultimate vision was that Yahoo was building on what could become the standard commodity compute platform.

And Hadoop has broken out as the star of the big-data world. Baldeschwieler believes that half the world's data will be stored in Hadoop within five years and that, given the growth of unstructured data, Hadoop is the logical answer. Hadoop can store data at a different price point, which lets you experiment and store more. If you hit the right cost, you can keep even more data for longer and use it for more purposes. There is more upside to harvest and, ultimately, more business value.

From a business perspective, the big difference with Hortonworks is that the company currently has no intention of creating a new or alternate distribution of Hadoop. Instead it aims to provide support and training on the Apache open-source releases, with the goal of growing the ecosystem. The support model has long been the weapon of choice for early-stage open-source companies, but I wonder if it still flies in today's market.

The Hortonworks approach runs counter to that of Cloudera, which packages its own distribution of Hadoop and other necessary components into releases that, while open source, are not strictly limited to the bits from the Apache Software Foundation. Cloudera also provides consulting in addition to support and training, which CEO Mike Olson told CNET is still a requirement for most enterprises interested in using Hadoop in production. Olson also pointed to ease of use and packaging as the way to gain adoption and, ultimately, revenue.

And while I can't really say which approach is better, it's clear that there is quite a bit of money to be made in the Hadoop ecosystem. That said, the technology is still difficult enough to use that people need help to get up and running, which is how most open-source companies have found their niche.