VMware today announced a new open-source project called Serengeti, which enables enterprises to quickly deploy, manage, and scale Apache Hadoop in virtual and cloud environments.
VMware says it is working with the Apache Hadoop community to contribute extensions that will make the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects "virtualization-aware," in order to support elastic scaling and further improve Hadoop performance in virtual environments.
In case you've been living outside the big data vacuum, open-source Hadoop has emerged as the de facto standard for big data processing. It is packaged in several distributions by commercial vendors including Cloudera, IBM, EMC Greenplum, MapR, and Hortonworks, each with a slightly different spin on the platform and associated tooling.
VMware has been particularly aggressive of late, partnering with Hortonworks earlier this week and acquiring big data analytics startup Cetas back in April. The interest in big data, and especially Hadoop, makes a lot of sense as the company moves up the stack toward applications that rely on scalable infrastructure for success. And considering the rising interest in open-source big data, it also fits into VMware's overall software strategy.
My immediate reaction to this news is that it sounds a lot like how Amazon Web Services (AWS) Elastic MapReduce (EMR) works, where the underlying infrastructure scales across virtual instances, only in this case in enterprise-oriented packaging.
The release of Serengeti could also usher in a new world of hosted Hadoop providers -- assuming the open-source angle takes hold and the software supports other hypervisors beyond VMware.