Rethinking the relational database

CopperEye CEO Kate Mitchell says conventional wisdom governing this technology no longer applies.

The relational database so dominates the thinking of information technology and business professionals that its presumed suitability for essentially all data management tasks is rarely questioned. But it's time to revisit that conventional wisdom. To be sure, this preeminent position has been well earned, since relational database management systems (RDBMS) provide sophisticated development tools, capabilities for handling frequently changing information, support for a large number of concurrent users, and many other features.

However, the character of much of the data generated by businesses today does not match the strengths of the RDBMS in virtually any respect. This mismatch is revealed within the context of Information Lifecycle Management, or assessing the handling of data from the time of its creation to its obsolescence. ILM is rapidly gaining favor within enterprise IT departments as an effective approach for coping with rapidly growing volumes of corporate data.

Consider two of the hot-button IT issues on the top of everyone's list--the requirements of RFID and Sarbanes-Oxley. From a raw data perspective, they have a great deal in common with other less pervasively covered IT challenges, such as mobile service carriers' call data records or manufacturers' bill-of-material information.

The huge volumes of data these sources generate are related to past business events. This category of data possesses three key characteristics:

1) The data records occur at high transaction rates (usually from automated sources), resulting in a large volume of stored data.

2) The data records never change once they are created.

3) The data records must be saved primarily for historical-reference purposes and will be infrequently (if ever) accessed.

Two weeks of mobile call records easily fill a database to four terabytes (four thousand gigabytes) or more, and this volume will be multiplied ten- or twentyfold for so-called "3G" mobile networks. In the case of RFID records, major retailers and distributors are expected to generate anywhere from tens of terabytes to, by some incredible estimates, millions of terabytes of these records daily.
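
To put the call-record figures in perspective, the arithmetic can be sketched in a few lines of Python. The inputs are the numbers quoted above; the variable names are illustrative only, and the per-day rate assumes the volume is spread evenly over the 14 days.

```python
# Figures quoted in the article; the even daily spread is an assumption.
TB_PER_TWO_WEEKS = 4       # call records accumulated over two weeks
GB_PER_TB = 1000           # the article equates 4 TB with 4,000 GB
DAYS = 14

gb_per_day = TB_PER_TWO_WEEKS * GB_PER_TB / DAYS

# Projected 3G volumes at the ten- and twentyfold multipliers.
projected_3g_tb = [TB_PER_TWO_WEEKS * factor for factor in (10, 20)]

print(f"Current: about {gb_per_day:.0f} GB per day")
print(f"3G projection: {projected_3g_tb[0]} to {projected_3g_tb[1]} TB per two weeks")
```

At the low end, then, a single carrier would be adding roughly 286 gigabytes of new, never-changing records every day, and 40 to 80 terabytes every two weeks once the 3G multiplier applies.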

Herein lies the mismatch. Relational databases--with their transactional, dynamic and multi-user features--come with functionality that far exceeds what's needed for simply storing and accessing write-once/read-maybe business data. This excess functionality requires sizable hardware and software investments that grow in proportion to the amount of data handled. With costs easily in the seven-figure range, even the most well-funded datacenter would have a difficult time spending its way out of this problem.

The answer likely resides in pairing the RDBMS with a complementary technology that is particularly suited to the demands of capturing and storing large volumes of this write-once data. Ironically, a technology previously destined for the history books may well fit current and future requirements perfectly: the flat file.

Long relegated to application-embedded databases and desktop programs, a flat file that borrows a key feature from the relational database--the index--meets all of the requirements previously described for digital-business event data.

In databases, an index speeds up query access to large volumes of data by providing an entry for each field (such as username, phone number, etc.) and the location of the specific matching record(s). Applying an index to a flat file results in a very accessible repository--much more accessible than a tape library--that can respond quickly to enterprise reporting needs. Further, it can do so using comparatively modest server hardware. Coupled with the ever-decreasing cost of disk-based storage, using a flat file becomes a highly cost-effective approach.
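
The idea can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the class name `IndexedFlatFile`, the pipe-delimited record layout, and the sample call records are all hypothetical, and the index lives only in memory, whereas a production system would persist it on disk. Records are appended to a plain file and never rewritten, while the index maps each key value to the byte offsets of the matching records.

```python
import os
import tempfile
from collections import defaultdict


class IndexedFlatFile:
    """Append-only flat file with an in-memory index on one field (hypothetical sketch)."""

    def __init__(self, path, key_field, delimiter="|"):
        self.path = path
        self.key_field = key_field        # column position used as the index key
        self.delimiter = delimiter
        self.index = defaultdict(list)    # key value -> byte offsets of matching records
        open(path, "ab").close()          # create the file if it does not exist yet

    def append(self, fields):
        """Write one record at the end of the file and note where it starts."""
        line = (self.delimiter.join(fields) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(line)
        self.index[fields[self.key_field]].append(offset)

    def lookup(self, key):
        """Jump straight to each matching record instead of scanning the file."""
        records = []
        with open(self.path, "rb") as f:
            for offset in self.index.get(key, []):
                f.seek(offset)
                text = f.readline().decode("utf-8").rstrip("\n")
                records.append(text.split(self.delimiter))
        return records


# Made-up call records: phone number, timestamp, duration in seconds.
path = os.path.join(tempfile.mkdtemp(), "calls.dat")
store = IndexedFlatFile(path, key_field=0)
store.append(["555-0100", "2004-11-01T09:15:00", "120"])
store.append(["555-0199", "2004-11-01T09:16:30", "45"])
store.append(["555-0100", "2004-11-01T10:02:10", "300"])

matches = store.lookup("555-0100")
```

Because records are only ever appended, the sketch needs no locking, update logic or transaction log, which is the functionality the write-once data never uses. A lookup reads only the matching records, however large the file grows.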

Equally important, moving large volumes of business event data from the RDBMS to a complementary flat-file-based solution enhances the performance of the RDBMS for the tasks it's meant for. At the same time, this approach also delivers on the promise of ILM by putting the right data in the right place for the right cost without sacrificing support for the business.

The time is right to rethink how to deal with the looming explosion in data volumes. The relational database is an impressive technology, but it is also the most expensive way to store large volumes of static data simply to provide for potential access some time in the future.

It is a frequently cited figure that 80 percent of the data stored in relational databases is never accessed once it is written. For digital business event data, the percentage will be much higher: we know right now that the records are unlikely ever to be accessed once they are written, let alone altered by multiple users.

Simply put, the relational database is too much hammer for the digital business-event nail.