cassandra

Are databases in the cloud really all that different?

Last week a discussion emerged regarding the necessity of the NoSQL moniker associated with a new wave of open-source distributed database projects like CouchDB, MongoDB, and Cassandra.

CouchOne, the commercial entity behind CouchDB, even announced that it's moving away from associating the company with NoSQL as it focuses on enabling offline data and applications.

The current orthodoxy would have you believe that if you are trying to get your head around "big data" or "Web scale" (see video), NoSQL is the answer. If you are dealing with preset data definitions being accessed by all … Read more

Why relational databases make sense for big data

In 2010, talk about a "big data" trend has reached a fever pitch. "Big data" centers on the notion that organizations are now (or soon will be) managing and extracting information from databases that are growing into the multi-petabyte range.

This dramatic amount of data has caused developers to seek new approaches that tend to avoid SQL queries and instead process data in a distributed manner. These so-called "NoSQL" databases, such as Cassandra and MongoDB, are built to scale easily and handle massive amounts of data in a highly fluid manner.

And while I am a staunch supporter of the NoSQL approach, there is often a point where all of this data needs to be aggregated and parsed for different reasons, in a more traditional SQL data model.

It occurred to me recently that I've heard very little from the relational database (RDBMS) side of the house when it comes to dealing with big data. To that end, I recently caught up via e-mail with EnterpriseDB CEO Ed Boyajian, whose company provides services, support, and training around the open-source relational database PostgreSQL.

Boyajian stressed four points:

1. Relational databases can process ad-hoc queries

Production applications sometimes require only primary key lookups, but reporting queries often need to filter or aggregate based on other columns. Document databases and distributed key-value stores sometimes don't support this at all, or they may support it only if an index on the relevant column has been defined in advance.

2. SQL reduces development time and improves interoperability

SQL is, and will likely remain, one of the most popular and successful computer languages of all time. SQL-aware development tools, reporting tools, monitoring tools, and connectors are available for just about every combination of operating system, platform, and database under the sun, and nearly every programmer or IT professional has at least a passing familiarity with SQL syntax.

Even for the types of relatively simple queries that are likely to be practical on huge data stores, writing an SQL query is typically simpler and faster than writing an algorithm to compute the desired answer, as is often necessary for data stores that do not include a query language. … Read more
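To make Boyajian's first two points a little more concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module purely as a stand-in for a relational database (the article is about PostgreSQL, not SQLite), and the `orders` table, its columns, and the sample rows are all hypothetical. The first function answers an ad-hoc question (total order amount per region, ignoring small orders) with one SQL statement on non-key columns; the second computes the same answer by hand, the way you might against a store with no query language.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (order_id, region, amount)
ORDERS = [
    (1, "east", 120.0),
    (2, "west", 80.0),
    (3, "east", 45.5),
    (4, "north", 200.0),
    (5, "west", 19.5),
]

MIN_AMOUNT = 40.0  # assumed reporting threshold, for illustration only


def totals_with_sql(rows):
    """Filter and aggregate on non-key columns with a single ad-hoc query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders ("
            "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM orders "
            "WHERE amount >= ? GROUP BY region ORDER BY region",
            (MIN_AMOUNT,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


def totals_by_hand(rows):
    """Compute the same answer by scanning every record in application code."""
    totals = defaultdict(float)
    for _order_id, region, amount in rows:
        if amount >= MIN_AMOUNT:
            totals[region] += amount
    return sorted(totals.items())


sql_result = totals_with_sql(ORDERS)
hand_result = totals_by_hand(ORDERS)
print(sql_result)
print(hand_result)
```

Both functions return the same per-region totals. The difference is that changing the question (a different filter, a different grouping column) means editing one line of SQL in the first case and rewriting the loop in the second.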

Xeround scales MySQL for the cloud

Today, Xeround officially announced the release of the private beta of its "MySQL for the Cloud" service--an elastic, linearly scalable, relational database designed to run applications in cloud environments.

Xeround is based on an in-memory database and has been tested in a number of telco production environments, according to CEO Razi Sharir. The software utilizes virtual partitions where data partitions are decoupled--or abstracted--from physical resources. These virtual partitions hold copies of both the data and the indexes, in order to ensure high availability and performance.

Despite the ubiquity of open-source MySQL, the database has in the past suffered … Read more

Big data in context

A few weeks back I attended venture firm Accel Partners' New Data Workshop event and learned quite a bit about the state of what we are now commonly referring to as "big data" and the challenges that await the vendors trying to target this new way of slicing and dicing vast amounts of information.

One of the big takeaways for me was the realization that even with all of the processing power available nowadays, the amount of data is growing at such a rapid pace that people are simply looking to cope with the problem, rather than facing it head-on.

The issue of processing large amounts of data is not necessarily new--most developers and IT staff can tell you about having too much information to deal with--but the big difference is that there are new approaches, tools, and technologies that can help alleviate the difficulty in processing.

Over the course of the last 30 years or so, the way that machines process transactions has changed, but so too has the amount of data that is being processed and collected, now with an eye toward real-time analysis of information.

This has led to the advent of a number of technologies that allow for data processing to be offloaded and managed in both structured and unstructured ways--examples include open-source projects like Memcached and Hadoop as well as NoSQL data storage mechanisms like Cassandra. … Read more

NoSQL goes mobile with the help of CouchDB

If there is one aspect of mobility that has yet to live up to user expectations, it's the ability for data to be accessible in near real-time across multiple devices.

Despite all the advances in technology, including a wealth of Wi-Fi and 3G networks, many devices become impotent without an Internet connection.

This issue becomes even more apparent when you are dealing with browser-based applications and smartphones that don't have multithreading functionality to maintain state across applications and data stores.

I recently had the chance to chat with Damien Katz, the creator of CouchDB and CEO of Couchio, … Read more

Apache Cassandra gets boost from Riptano (Q&A)

A new company called Riptano recently launched to provide support and services for the Apache Cassandra project, a nonrelational open-source database designed for high performance that has a strong presence in Web shops like Twitter, Digg, and Reddit. I recently had the chance to chat with Matt Pfeil, founder of Riptano, and he provided some insight into the project and the new world of NoSQL database approaches.

What exactly is Cassandra and who uses it?

Cassandra is a highly scalable, distributed, open-source database. It's a top-level Apache project with committers from Riptano, Rackspace, Digg, Facebook, and others.

Cassandra … Read more

Open-source evolution hits overdrive

Update at 5:30 AM Pacific on March 2, 2010: I mistakenly reported that Facebook had moved away from MySQL in favor of Cassandra. According to a credible source familiar with Facebook's systems, this is not the case. Indeed, you can actually follow "MySQLatFacebook" on Facebook. I apologize for the error and am glad to see MySQL is still in active use at Facebook.

Open-source software has hastened the evolution of Web applications as it drives out the inefficiencies and costs of proprietary software to enable companies like Google and Twitter to scale. But it's not just … Read more

Shedding new light on tumors

A new oxygen nanosensor that combines a biopolymer with a light-emitting dye could help identify the most aggressive regions of cancerous tumors, according to a press release by researchers at the University of Virginia.

The material uses polylactic acid as its base--good news for the environment and cost because it is both easy and inexpensive to fabricate in many forms.

Guoqing Zhang, a chemistry doctoral candidate, alongside Cassandra Fraser, a chemistry professor, combined a corn-based biopolymer with a dye that is both fluorescent (the immediate illumination of photon re-emission) and phosphorescent (a slower illumination that appears as an afterglow):

Zhang … Read more