In 2010, talk of a "big data" trend has reached a fever pitch. "Big data" centers on the notion that organizations are now (or soon will be) managing and extracting information from databases that are growing into the multi-petabyte range.
This volume of data has led developers to seek new approaches that tend to avoid SQL queries and instead process data in a distributed manner. These so-called "NoSQL" databases, such as Cassandra and MongoDB, are built to scale easily and handle massive amounts of data in a highly fluid manner.
And while I am a staunch supporter of the NoSQL approach, there is often a point at which all of this data needs to be aggregated and parsed, for a variety of reasons, in a more traditional SQL data model.
It occurred to me recently that I've heard very little from the relational database (RDBMS) side of the house when it comes to dealing with big data. To that end, I recently caught up via e-mail with EnterpriseDB CEO Ed Boyajian, whose company provides services, support, and training around the open-source relational database PostgreSQL.
Boyajian stressed four points:
1. Relational databases can process ad-hoc queries
Production applications sometimes require only primary key lookups, but reporting queries often need to filter or aggregate on other columns. Document databases and distributed key-value stores sometimes don't support this at all, or support it only if an index on the relevant column has been defined in advance.
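To make the distinction concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for a relational database. The orders table, its columns, and the sample rows are hypothetical and exist only for illustration; the plain dictionary stands in for a bare key-value store and is not meant to model any particular NoSQL product.

```python
import sqlite3

# Hypothetical sample data: (order_id, region, amount).
ORDERS = [
    (1, "east", 120.0),
    (2, "west", 75.5),
    (3, "east", 30.0),
    (4, "north", 210.0),
]

# A bare key-value store answers primary key lookups and nothing else.
kv_store = {order_id: (region, amount) for order_id, region, amount in ORDERS}
print(kv_store[3])

# The same data in a relational table (in-memory SQLite for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)


def total_by_region(connection, min_amount):
    """Ad-hoc report: filter on amount, aggregate by region.

    Neither column is the primary key, and no index was defined for them.
    """
    return connection.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE amount >= ? GROUP BY region ORDER BY region",
        (min_amount,),
    ).fetchall()


print(total_by_region(conn, 50.0))
```

The reporting query was not anticipated when the table was created, yet it runs without any schema change. With the dictionary alone, the same report would require scanning every value by hand.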
2. SQL reduces development time and improves interoperability
SQL is, and will likely remain, one of the most popular and successful computer languages of all time. SQL-aware development tools, reporting tools, monitoring tools, and connectors are available for just about every combination of operating system, platform, and database under the sun, and nearly every programmer or IT professional has at least a passing familiarity with SQL syntax.
Even for the types of relatively simple queries that are likely to be practical on huge data stores, writing an SQL query is typically simpler and faster than writing an algorithm to compute the desired answer, as is often necessary for data stores that do not include a query language. …
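The comparison between writing a query and writing an algorithm can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a benchmark: the sales rows and both function names are hypothetical, and the standard library's sqlite3 module stands in for a relational database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, category, amount).
SALES = [
    ("alice", "books", 12.0),
    ("bob", "books", 8.0),
    ("alice", "games", 40.0),
    ("carol", "games", 20.0),
]


def avg_by_category_manual(rows):
    """Hand-written aggregation, as needed when the store has no query language."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _customer, category, amount in rows:
        totals[category] += amount
        counts[category] += 1
    return sorted((category, totals[category] / counts[category]) for category in totals)


def avg_by_category_sql(rows):
    """The same answer expressed as a single SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, category TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT category, AVG(amount) FROM sales GROUP BY category ORDER BY category"
    ).fetchall()
    conn.close()
    return result


print(avg_by_category_manual(SALES))
print(avg_by_category_sql(SALES))
```

Both functions return the same per-category averages. The difference is that the SQL version states what is wanted and leaves the grouping and arithmetic to the database, while the manual version has to spell out each step.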