Are databases in the cloud really all that different?

The NoSQL term is losing its luster already as databases in the cloud increase in complexity and functionality.

Last week a discussion emerged about whether the NoSQL moniker is still needed for the new wave of open-source distributed database projects such as CouchDB, MongoDB, and Cassandra.

CouchOne, the commercial entity behind CouchDB, even announced that it's moving away from associating the company with NoSQL as it focuses on enabling offline data and applications.

The current orthodoxy would have you believe that if you are trying to get your head around "big data" or "Web scale," NoSQL is the answer, and that if you are dealing with preset data definitions accessed by all the divisions of your Global 100 company, SQL is better.

Here's the reality--relational databases have been around for decades, and Oracle, Microsoft SQL Server, MySQL, and IBM DB2 won't disappear any time soon. Too many vendors rely on an RDBMS for their applications, and the ecosystem around relational databases is extremely rich.

What's important to note is that using a database in a cloud-like manner requires that system architects and developers recognize the principles associated with building a massively distributed data store.

Traditional SQL-based databases such as Oracle, Microsoft SQL Server, and IBM DB2 were designed to run on a single physical node/cluster in a single location, typically hooked to unified storage with full control over all software/hardware elements.

Running the databases in a virtualized environment with multiple nodes and very limited control imposes obstacles that are difficult to overcome. The wave of NoSQL databases seen recently is a reaction to these limitations.

Via e-mail, Razi Sharir, CEO of cloud database provider Xeround, outlined a few things that architects need to look out for when designing applications for the cloud:

The cloud is not a predictable and stable environment
According to Sharir, there are very few SLAs for databases. That means every database--regardless of size or scope--must run in a replicated, high-availability set-up, which is typically more complex (if doable at all) and prohibitively expensive.

Performance is also not guaranteed unless running on dedicated nodes--which would defeat the purpose of using the cloud.

Databases generally don't scale the same way as applications
Some see the cloud as a purified, stateless computing environment that imposes on-demand capability requirements when trying to scale both elastically and linearly.

This means scaling out (adding nodes) and not up within the same node. While it may be easy to scale an application, scaling a database elastically (depending on throughput and size) can be very complex and tedious to manage.
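To make the scale-out problem concrete, here is a minimal Python sketch of why adding a database node is harder than adding an application server: the data has to move. It compares naive modulo placement with a consistent-hash ring, one common technique for limiting that movement. The node names, key names, and the `HashRing` class are illustrative assumptions, not anything Sharir or Xeround describes.

```python
import hashlib
from bisect import bisect_right


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def modulo_node(key: str, nodes: list) -> str:
    """Naive placement: hash the key, take it modulo the node count."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Hypothetical consistent-hash ring with several points per node."""

    def __init__(self, nodes, points_per_node=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(before, after, keys) -> float:
    """Share of keys whose owning node changes between two placements."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"row-{i}" for i in range(10_000)]
old_nodes = ["db-a", "db-b", "db-c"]   # assumed node names
new_nodes = old_nodes + ["db-d"]       # scale out by one node

modulo_moved = moved_fraction(
    lambda k: modulo_node(k, old_nodes),
    lambda k: modulo_node(k, new_nodes),
    keys,
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(old_ring.node_for, new_ring.node_for, keys)

print(f"modulo placement moved {modulo_moved:.0%} of keys")
print(f"hash ring moved {ring_moved:.0%} of keys")
```

Going from three nodes to four, modulo placement is expected to relocate about three quarters of the keys, while the ring relocates only the keys the new node takes over. Either way, rows must be copied while the database keeps serving traffic, which is the management burden described above.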

Distributed databases are not the same as distributed applications
Maintaining multiple active/master copies of a database in multiple locations and/or clouds requires building logic to handle conflicts, network problems, and latency while attempting to maintain a single source of truth at all times.
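As a rough illustration of what that conflict-handling logic involves, the Python sketch below uses version vectors, one well-known way to tell whether two copies of a record were updated independently. It is not drawn from Sharir's e-mail; the site names, the `Versioned` record, and the helper functions are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Versioned:
    """A record value plus a version vector: per-site update counters."""
    value: str
    clock: dict = field(default_factory=dict)


def write(record: Versioned, site: str, value: str) -> Versioned:
    """Apply a local write at `site`, bumping only that site's counter."""
    clock = dict(record.clock)
    clock[site] = clock.get(site, 0) + 1
    return Versioned(value, clock)


def compare(a: dict, b: dict) -> str:
    """Classify two version vectors as equal, ordered, or conflicting."""
    sites = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"   # concurrent writes: neither copy has seen the other
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


base = Versioned("balance=100")
us = write(base, "us-east", "balance=90")    # master copy in one location
eu = write(base, "eu-west", "balance=120")   # master copy in another
later_us = write(us, "us-east", "balance=80")

print(compare(us.clock, eu.clock))        # conflict
print(compare(later_us.clock, us.clock))  # a_newer
```

Detecting the conflict is the easy part. Deciding which balance is the truth, or how to merge the two, is application-specific logic that a single-node database never had to contain.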

Multi-tenancy introduces many new possibilities--and headaches
In an Infrastructure-as-a-Service (IaaS) scenario, the database is expected to support multi-tenancy to enable a cost-effective and operationally efficient framework.

While a standard SQL database can be installed in multiple copies on the same virtual machine, that doesn't make it a multi-tenant set-up. In fact, that may cause more headaches and management overhead to keep it running.
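For contrast, here is a minimal sketch of one common multi-tenant pattern: a single database and schema shared by all tenants, with every row and every query scoped by a tenant identifier. It uses Python's built-in `sqlite3` module purely for illustration; the `TenantStore` class, table layout, and tenant names are hypothetical and do not describe any particular vendor's design.

```python
import sqlite3


class TenantStore:
    """Hypothetical shared-schema store: one table, rows keyed by tenant."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS orders (
                tenant_id TEXT    NOT NULL,
                order_id  INTEGER NOT NULL,
                item      TEXT    NOT NULL,
                PRIMARY KEY (tenant_id, order_id)
            )
            """
        )

    def add_order(self, tenant_id: str, order_id: int, item: str) -> None:
        self._conn.execute(
            "INSERT INTO orders (tenant_id, order_id, item) VALUES (?, ?, ?)",
            (tenant_id, order_id, item),
        )

    def orders_for(self, tenant_id: str) -> list:
        # Every read is filtered by tenant, so tenants never see each other's rows.
        cursor = self._conn.execute(
            "SELECT order_id, item FROM orders "
            "WHERE tenant_id = ? ORDER BY order_id",
            (tenant_id,),
        )
        return cursor.fetchall()


store = TenantStore(sqlite3.connect(":memory:"))
store.add_order("acme", 1, "widget")
store.add_order("globex", 1, "gadget")   # same order_id, different tenant
store.add_order("acme", 2, "sprocket")

print(store.orders_for("acme"))
print(store.orders_for("globex"))
```

One instance serves every tenant here, so there is a single system to patch, back up, and scale. The trade-off is that isolation now depends on the database and application enforcing the tenant filter on every query.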