Is Google bringing BigTable out of the closet?

Google may begin the first phase of becoming an infrastructure provider for external developers by exposing its BigTable data storage system as a Web service.

Dan Farber

April 15, 2008 8:12 a.m. PT

TechCrunch is speculating that Google may begin the first major phase of becoming an infrastructure provider for developers by exposing its BigTable data storage system as a Web service. This service would be similar to Amazon's SimpleDB service, which automatically indexes data and provides an API for storage and access.

I've queried Google on this potential news and await a response. Given Google's prowess at delivering applications from the cloud, it's logical to expect the company to become a platform for application services, with APIs for storage, compute cycles and databases--similar to what Amazon has done with its S3 storage and Elastic Compute Cloud along with SimpleDB.

It's a way for Google to leverage its massive infrastructure build-out, with hundreds of thousands of custom servers running in parallel, and its deep computer science expertise by effectively becoming the network service for the planet.

It's the "Red Shift" utility computing concept offered by Sun CTO Greg Papadopoulos, who is predicting a "neutron star collapse of datacenters."

At some point, businesses won't build their own datacenters and developers will program on the network itself. Google, Sun and a few other megaliths will provide the computing resources with brutal efficiency for utilization, power, security, service levels and rapid idea-to-deploy time, Papadopoulos said. It's a model that salesforce.com has adopted on a smaller scale with its platform-as-a-service.

Google describes BigTable as a "distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers."

The API includes functions for creating and deleting tables and columns, as well as for changing cluster, table, and column family metadata, such as access control rights, according to a white paper on BigTable, which gives the following description of its evolution and usage:

Over the last two and a half years we have designed, implemented, and deployed a distributed storage system for managing structured data at Google called BigTable. BigTable is designed to reliably scale to petabytes of data and thousands of machines. BigTable has achieved several goals: wide applicability, scalability, high performance, and high availability. BigTable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. These products use BigTable for a variety of demanding workloads, which range from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users.

The BigTable clusters used by these products span a wide range of configurations, from a handful to thousands of servers, and store up to several hundred terabytes of data. In many ways, BigTable resembles a database: it shares many implementation strategies with databases. Parallel databases [14] and main-memory databases [13] have achieved scalability and high performance, but BigTable provides a different interface than such systems.

BigTable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage. Data is indexed using row and column names that can be arbitrary strings. BigTable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas. Finally, BigTable schema parameters let clients dynamically control whether to serve data out of memory or from disk.
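
To make that data model a little more concrete, here is a rough Python sketch of the idea. It is not Google's API or implementation. The class name ToyTable, its methods, and the row and column names are all hypothetical, and keeping row keys in sorted order is an assumption of this sketch about how schema choices could translate into locality. It only illustrates three points from the white paper excerpt: data is indexed by row and column names that are arbitrary strings, values are stored as uninterpreted bytes, and the client is responsible for serializing any structure into them.

```python
import json
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical in-memory stand-in for a sparse table keyed by row and column strings."""

    def __init__(self):
        self._rows = {}       # row key -> {column name -> bytes}
        self._row_keys = []   # row keys kept sorted (an assumption of this sketch)

    def put(self, row, column, value):
        # The table never interprets values; it only accepts raw bytes.
        if not isinstance(value, bytes):
            raise TypeError("values are stored as uninterpreted bytes")
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._row_keys, row)
        self._rows[row][column] = value

    def get(self, row, column):
        # Returns None when the row or the column is absent (the table is sparse).
        return self._rows.get(row, {}).get(column)

    def scan(self, prefix):
        """Yield (row, columns) for rows whose key starts with prefix, in key order."""
        start = bisect_left(self._row_keys, prefix)
        for key in self._row_keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, dict(self._rows[key])


table = ToyTable()

# Row keys chosen so that pages from the same site sort next to each other.
table.put("com.example.www/index", "contents:html", b"<html>home</html>")
table.put("com.example.www/about", "contents:html", b"<html>about</html>")
table.put("org.example/index", "contents:html", b"<html>org</html>")

# The client, not the table, serializes structured data into the stored bytes.
table.put("com.example.www/index", "meta:info",
          json.dumps({"lang": "en"}).encode("utf-8"))

example_rows = [row for row, _ in table.scan("com.example.")]
meta = json.loads(table.get("com.example.www/index", "meta:info").decode("utf-8"))
```

In this toy version, the choice of row key is the "schema" decision: because related rows share a prefix, a single scan picks them up together, which is one plausible reading of what the paper means by clients controlling locality.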