hadoop

Where IT is going: Cloud, mobile, and data

Cloud computing often seems to be used as a catch-all term for the big trends happening in IT.

This has the unfortunate effect of adding ambiguity to a topic that's already laden with definitional overload. (For example, on a topic like security or compliance, it makes a lot of difference whether you're talking about public clouds like Amazon's, a private cloud within an enterprise, a social network, or some mashup of two or more of the above.)

However, I'm starting to see a certain consensus emerge about how best to think about the broad sense … Read more

IBM Fellow Jeff Jonas on the evolution of Big Data

Last week I reconnected with Jeff Jonas, chief scientist of the IBM Entity Analytics group and a recently named IBM Fellow, about what's going on in the realm of big data.

When I first met Jonas, back in June of 2010, he was focused on how companies are dealing with the deluge of information associated with Big Data. His focus hasn't changed, but he told me his perspective on how we make sense of data continues to evolve -- especially as we move in and out of demand for real-time versus batch data processing.

New Big Data tools … Read more

Is Hadoop the new tape?

I attended GigaOM's Structure:Data 2012 conference in New York City last week. This is the second one I've attended, and I'm now a confirmed advocate of this event. Om Malik brings together people who, in one way or another, represent much of the creative thinking around so-called big data. I got the feeling that I could strike up a conversation with anyone there and learn something new.

I noticed at least two major differences between the Structure:Data event I attended last year and this year's version. Last year, most if not all of the exhibiting … Read more

The end of the server-versus-storage wars is nigh

There's always been tension between server and storage bigots.

Scott McNealy, former CEO of the former Sun Microsystems, once infamously opined that storage was a (mere) feature of the server. The problem was that at the time he made that comment, the storage industry was writing its declaration of independence. Fibre Channel-based SANs were consolidating and replacing direct attached storage (DAS) architectures in many of the world's large data centers. IP-based network attached storage (NAS) systems were consolidating and replacing print and file servers, much to the chagrin of both McNealy and Steve Ballmer.

Vendors with a server … Read more

Hortonworks looks to grow Hadoop ecosystem

As big data becomes more and more top of mind, a number of new companies have popped up to support Hadoop, the leading open-source platform for data-intensive distributed applications. One of the newer entrants is Hortonworks, a company spun out of Yahoo, with a $15 million-plus cash infusion from both Yahoo and Benchmark Capital.

Last week I sat down with Hortonworks CEO Eric Baldeschwieler to understand how the company intends to differentiate from other vendors such as Cloudera, MapR, and the many as yet unlaunched companies that venture capitalists are still funding.

Hadoop itself was initially developed at Yahoo by … Read more

IBM launches Hadoop-based analytics software

IBM said today that it will invest $100 million in research for analytics and big data projects, and it expanded its portfolio accordingly. The company also launched Hadoop-based services.

Hadoop is open-source technology that's used to analyze unstructured data. Both Yahoo and Google are heavy Hadoop proponents.

IBM said it is launching InfoSphere BigInsights and Streams software to analyze unstructured data such as text, video, audio, and social media. The software, cooked up by IBM Research, is based on Hadoop and more than 50 Big Blue patents.

EMC: The platform company

It's Monday morning at EMC World 2011 and EMC Chairman Joe Tucci opens the show with 10,000-plus in the audience. On stage with Tucci are big black boxes. What's wrong with this picture? EMC is no longer a company that can be primarily characterized as a maker of big black boxes. Tucci has engineered a transformation of EMC from an enterprise IT storage box vendor to a provider of computing platforms. Let me count them:

Nos. 1, 2, and 3: Foremost among EMC's platforms is VMware. EMC owns approximately 85 percent of it, but unlike his … Read more

Cloudera ups the ante on open-source Hadoop

The Hadoop open-source project for distributed compute processing continues to be one of the most interesting projects for managing the vast amount of data being analyzed and collected in a wide variety of scenarios.

Today, Cloudera, a provider of Hadoop data management software and services, is set to ship a major release of its open-source software distribution--Cloudera Distribution for Hadoop (CDH), including Apache Hadoop v3.

Cloudera's CDH3 distribution is an integrated set of components and functions that interoperate through standard APIs and manage required component versions and dependencies.

CDH3 is an integrated stack that includes not just software … Read more

IBM takes aim at Smarter Commerce

IBM is putting its expertise in data analysis and business process to work under a new initiative called "Smarter Commerce" to help make sense of what consumers want and help vendors to better target offers.

"Smarter Commerce" is a reaction to the shift in the dynamics of commerce as a whole, with the customer leading the path to sales, according to Yuchun Lee, VP of enterprise commerce for IBM. The newly packaged offerings are designed to help businesses engage customers with a higher level of relevancy, putting the customer back at the center of the business … Read more

Shared storage in a 'shared nothing' environment

The computing industry is seeing dramatic growth in the use of "shared nothing" database architectures, in which each node is self-sufficient and functions independently of the others (the Hadoop Distributed File System, for example). For the sake of performance, these architectures avoid contention among nodes for shared disk resources (SAN and NAS) by dedicating storage resources to each node, i.e., no shared disk.

While these computing architectures are best-known in the context of Web-based applications and development activities, they are no longer confined to the Web. EMC Greenplum, IBM Netezza, and ParAccel are all … Read more