Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables you to explore complex data, using custom analyses tailored to your information and questions. It's also one of the most buzzworthy, talked-about open-source projects around.
I spoke with Christophe Bisciglia, Hadoop World organizer and founder of Cloudera, to ask some questions about this inaugural event. And by the way, if you're interested in attending, click on the link in the answer to question No. 5. (My readers get a 25 percent discount if they register before September 15.)
Q: How can you explain the buzz around Hadoop? It's deafening.
Bisciglia: The reason for this is easy to understand: Hadoop provides a scalable and flexible platform that allows anyone to build data-intensive applications without worrying about the underlying distributed system. Anyone can now store and process terabytes upon terabytes of data using the same platform that powers some of the world's largest Web properties, and enterprises across traditional verticals have demonstrated compelling use cases.
Because Hadoop is open source, we're seeing extensions that allow Hadoop to function more like, and integrate with, existing data systems. This is enabling an entire ecosystem to take advantage of Hadoop's underlying storage and processing technology and to combine that with specific application needs.
Q: Why Hadoop World? Why New York? What's unique about this event compared with the Hadoop Summit that took place in Silicon Valley in June?
Bisciglia: One thing that is becoming increasingly clear is that Hadoop is no longer just for Web companies in Silicon Valley. Traditional enterprises around the world are turning to Hadoop to solve data-intensive computing challenges in a wide variety of verticals. Finance, telecommunications, and biotech are particularly noticeable. NYC is home to both traditional enterprise software companies and a growing community of Web-centric start-ups. It's also a much shorter flight from Europe.
Like the Hadoop Summit in Silicon Valley, there will be deeply technical tracks on Hadoop development, but Hadoop World will also focus more attention on compelling applications for Hadoop across various industries and the resources needed for traditional enterprise users to get started with Hadoop. Hadoop World's sponsors include a wide range of companies to help attendees understand their options for cloud providers, server vendors, and systems integrators. We're also happy to have major contributors like Yahoo and Facebook sponsor the event.
Q: How does the speaker lineup look and what are they going to talk about?
Bisciglia: We couldn't be happier. In addition to the usual suspects from major contributors, many enterprise users relatively new to the Hadoop scene are going into detail about how they solve real business problems with Hadoop. Of particular interest are Visa on large-scale transaction analysis, JPMorgan Chase on data processing for financial services, Booz Allen Hamilton on protein alignment, Rackspace on cross-data-center log analysis, eHarmony on matchmaking, and China Mobile on data mining for telecom. You can see a full list of talks and more details on the individual tracks on the conference Web site.
Q: Where do you see Hadoop in the next 12 to 18 months?
Bisciglia: There are three important trends I'd watch over the next year or so. On the Hadoop development side, we're working towards API stability and improved security--Yahoo is leading a lot of this work, and we're pretty excited about that. It will make upgrades much easier and enable broader enterprise usage.
On the usability side, we're constantly working to make Hadoop easier to deploy. Packaging Hadoop using standard deployment tools (Red Hat Package Manager, Debian packages) is one part of this, but we also invest a fair bit in our partner relationships to make Hadoop easy to deploy on cloud providers like Amazon, Rackspace, and SoftLayer.
The last key thing to watch is the ecosystem. With Hadoop now supporting the JDBC and ODBC protocols, integrating with existing systems is getting much easier. For example, Cloudera's Distribution includes a tool called Sqoop, which automates importing data from existing databases like MySQL and Oracle. This is just a start, and many such extensions will surely come from the vibrant development community.
Q: What are the Hadoop World details? How do I register? How much? Can Dave get me a discount?
Bisciglia: You can find all the details for Hadoop World at http://www.cloudera.com/hadoop-world-nyc. Registration for the main event on October 2 is $299, and there are three days of training options prior to the event for developers and administrators. You can get 25 percent off the regular registration until September 15 using this link.
Follow me on Twitter @daveofdoom.