In 2010, the talk about a "big data" trend has reached a fever pitch. "Big data" centers around the notion that organizations are now (or soon will be) dealing with managing and extracting information from databases that are growing into the multi-petabyte range.
This dramatic amount of data has caused developers to seek new approaches that tend to avoid SQL queries and instead process data in a distributed manner. These so-called "NoSQL," such as Cassandra and MongoDB databases, are built to scale easily and handle massive amounts of data in a highly fluid manner.
And while I am a staunch supporter of the NoSQL approach, there is often a point where all of this data needs to be aggregated and parsed for different reasons, in a more traditional SQL data model.
It occurred to me recently that I've heard very little from the relational database (RDBMS) side of the house when it comes to dealing with big data. To that end, I recently caught up via e-mail with EnterpriseDB CEO Ed Boyajian, whose company provides services, support, and training around the open-source relational database PostgreSQL.
Boyajian stressed four points:
1. Relational databases can process ad-hoc queries
Production applications sometimes require only primary key lookups, but reporting queries often need to filter or aggregate based on other columns. Document databases and distributed key value stores sometimes don't support this at all, or they may support it only if an index on the relevant column has been defined in advance.
2. SQL reduces development time and improves interoperability
SQL is, and will likely remain, one of the most popular and successful computer languages of all time. SQL-aware development tools, reporting tools, monitoring tools, and connectors are available for just about every combination of operating system, platform, and database under the sun, and nearly every programmer or IT professional has at least a passing familiarity with SQL syntax.
Even for the types of relatively simple queries that are likely to be practical on huge data stores, writing an SQL query is typically simpler and faster than writing an algorithm to compute the desired answer, as is often necessary for data stores that do not include a query language.
3. Relational databases are mature, battle-tested technology
Nearly all of the major relational databases on the market today have been around for 10 years or more and have very stable code bases. They are known to be relatively bug-free, and their failure modes are well understood. Experienced DBAs can use proven techniques to maximize uptime and be confident of successful recovery in case of failure.
4. Relational databases conform to widely accepted standards
Migrating between two relational databases isn't a walk in the park, but most of the systems available today offer broadly similar capabilities, so many applications can be migrated with fairly straightforward changes. When they can't, products and services to simplify the process are available from a variety of vendors.
Document databases and distributed key-value stores have different interfaces, offer different isolation and durability guarantees, and accept very different types of queries. Changing between such different systems promises to be challenging.
Ed also provided an amusing analogy that perhaps illustrates how the differing types of databases (RDBMS, NoSQL and everything in between) relate to each other. You be the judge.
"An RDBMS is like a car. Nearly everybody has one and you can get almost everywhere in it. A key-value store is like an Indy car. It's faster than a regular car, but it has some limitations that make it less than ideal for a trip to the grocery store. And a column-oriented database is a helicopter. It can do many of the same things that a car can do, but it's unwieldy for some things that a car can do easily, and on the flip side excels at some things that a car can't do at all."
Ultimately, users care more about the data than they do about their database. Managing and manipulating the data to meet their specific needs should always trump any specific technology approach.