The tech media recently started taking serious notice of Hadoop, an open-source project developed to process huge amounts of data, and the coverage is growing every day. According to ITDatabase, 161 stories have been written about Hadoop in the last three months alone, including a veritable "coming out party" in The New York Times.
Hadoop is interesting because it is open source, cloud-oriented, and proven in use at large Web shops, and because it solves two major computing problems: handling large amounts of data and writing parallel programs for large numbers of computers. Hadoop clusters can scale to tens or hundreds of terabytes, or even petabytes.
But adoption doesn't always equal commercial success. I've written in the past about Cloudera, a company formed to support Hadoop, and recently sat down with CEO Mike Olson to get his thoughts on the burgeoning Hadoop ecosystem and how the company intends to balance community and commerce.
My first question for Olson was how the company can succeed when users are already happy with the open-source project.
Olson answered with several key points. Cloudera sees "big data" -- terabytes at least -- becoming a common problem for all kinds of companies. The early adopters of Hadoop were all Web 2.0 companies generating logs and mining them for user behavior data. But data processing at this scale is also an enterprise problem. Enterprises aren't always early adopters, and they often require software to be supported by a vendor, not just a community.
Most enterprise buyers are very different from Facebook and Yahoo. They employ much smaller development and IT staff. They need strong SLAs and a quick response to problems from a vendor with deep expertise. Cloudera aims to solve those problems in ways that community support, mailing lists, and online forums can't.
This is typical of open-source projects that become more like products. The challenge is ensuring that the project lives on and that commercialization efforts are balanced with good citizenship toward non-customers.
The open-source community around Hadoop thus far appears to be pretty happy with Cloudera. The company has made its Cloudera Distribution for Hadoop available for free download, put a large amount of free training material on its Web site, and contributed new features to the open-source project.
Good community relations are critical for open-source companies; getting this right is important for Cloudera.
Olson tells me that customers are running Hadoop in-house and, increasingly, in the cloud. A few weeks ago, Amazon even announced a hosted Hadoop offering called "Elastic MapReduce" -- more evidence that Hadoop has gone mainstream. From Olson's perspective, more Hadoop in the world means more demand for enterprise-grade services and support, and that creates a great opportunity for Cloudera to make life better for commercial users of the open-source project.
This is the key to maintaining the balance between commerce and community. Others will certainly watch how Cloudera interacts with the Hadoop community to learn what works and what doesn't.
Follow me on Twitter @daveofdoom