
Latency matters in a hybrid cloud

Especially in a world where we increasingly work with very large data sets, we need to think about data latency when we design hybrid cloud architectures.

Gordon Haff
Gordon Haff is Red Hat's cloud evangelist, although the opinions expressed here are strictly his own. He's focused on enterprise IT, especially cloud computing. However, Gordon writes about a wide range of topics, whether they relate to the way too many hours he spends traveling or his longtime interest in photography.

"There's that pesky speed of light." That cautionary remark was offered by Lee Ziliak of Verizon Data Services, speaking on a panel at the 451 Group's Hosting and Cloud Transformation Summit last week. The context was that hybrid cloud environments may logically appear as something homogeneous, but application architectures need to take the underlying physical reality into account.

Latency, the time it takes to move data from one location to another, often gets overlooked in performance discussions. There's long been a general bias toward emphasizing the amount of data rather than the time it takes to move even a small chunk. Historically, this was reflected in the prominence of bandwidth numbers -- essentially the size of data pipes, rather than their speed.

As I wrote back in 2002, system and networking specs rate computer performance according to bandwidth and clock speed, the IT equivalents of just measuring the width of a road and a vehicle engine's revolutions per minute. While they may be interesting, even important, data points, they're hardly the complete story. Latency is the time that elapses between a request for data and its delivery. It is the sum of the delays each component adds in processing a request. Since it applies to every byte or packet that travels through a system, latency is at least as important as bandwidth, a much-quoted spec whose importance is overrated. High bandwidth just means having a wide, smooth road instead of a bumpy country lane. Latency is the difference between driving it in an old pickup or a Formula One racer.
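To make the road analogy concrete, here's a minimal sketch of how transfer time splits into a latency term and a bandwidth term. The link speeds and latencies are illustrative assumptions, not measurements; the point is that for small, chatty requests the latency term dominates no matter how wide the pipe is.

```python
# Illustrative model: total time for a request = round-trip latency + payload / bandwidth.
# The numbers below are assumptions for the sake of the example, not measurements.

def transfer_time(payload_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to fetch one payload over a link with the given latency and bandwidth."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s

KB = 1024
GBPS = 1e9 / 8  # 1 Gb/s expressed in bytes per second

# A "wide road": 10 Gb/s link, but 50 ms of round-trip latency.
wide_slow = transfer_time(4 * KB, 0.050, 10 * GBPS)
# A "narrow road": 1 Gb/s link, but only 0.5 ms of round-trip latency.
narrow_fast = transfer_time(4 * KB, 0.0005, 1 * GBPS)

print(f"4 KB over 10 Gb/s with 50 ms latency:  {wide_slow * 1000:.2f} ms")
print(f"4 KB over 1 Gb/s with 0.5 ms latency:  {narrow_fast * 1000:.2f} ms")
# For a small request, the low-latency link wins by roughly 100x,
# even though its bandwidth is 10x lower.
```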

The genesis of that decade-old research note lay in the performance of "Big Iron" Unix servers and tightly coupled clusters of them. At the time, large systems were increasingly designed by connecting (typically) four-processor building blocks into a larger symmetric multiprocessing system over some form of coherent memory interconnect. These modular architectures had a number of advantages, not least that they made much more incremental upgrades possible. (In a more traditional system architecture, much of the interconnect hardware and other costly components had to be present even in entry-level systems.)

The downside of modularity is that, relative to monolithic designs, it tends to result in longer access times for memory outside the local building block. As a result, the performance of these Non-Uniform Memory Access (NUMA) systems depended a lot on keeping data close to the processor doing the computing. As NUMA principles crept into mainstream processor designs -- even today's basic two-socket x86 motherboard is NUMA to some degree -- operating systems evolved to keep data affined with the processes using it.

However, while software optimizations have certainly helped, the biggest reason that NUMA designs have been able to become so general purpose and widespread is that modern implementations aren't especially nonuniform. Early commercial NUMA servers running Unix from Data General and Sequent had local-remote memory access ratios of about 10:1. The local-remote ratio in modern servers -- even large ones -- is more like 2:1 or less.
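As a rough illustration of why the ratio matters less than it used to, here's a sketch of the effective (average) memory access time under the 10:1 and 2:1 ratios mentioned above. The base local access time and the fraction of remote accesses are assumptions chosen purely for illustration.

```python
# Effective memory access time as a weighted average of local and remote accesses.
# Local latency and the remote-access fraction are illustrative assumptions.

def effective_access_ns(local_ns, remote_ratio, remote_fraction):
    """Average access time when a fraction of accesses go to remote memory."""
    remote_ns = local_ns * remote_ratio
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns

LOCAL_NS = 100          # assumed local DRAM access time
REMOTE_FRACTION = 0.3   # assumed share of accesses that miss the local node

for ratio in (10, 2):   # early NUMA servers vs. modern ones (per the text)
    avg = effective_access_ns(LOCAL_NS, ratio, REMOTE_FRACTION)
    print(f"{ratio}:1 ratio -> average access {avg:.0f} ns "
          f"({avg / LOCAL_NS:.1f}x the local time)")
# At 10:1, sending 30 percent of accesses remote multiplies the average access
# time by 3.7x; at 2:1 the same traffic adds only about 30 percent.
```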

However, as we start talking about computing taking place over a wider network of connections, the ratio can be much higher. More than once over the past decade, I've gotten pitches for various forms of distributed symmetric multiprocessing systems that were intriguing -- but which rested on the assumption that long access times for data far away from where it was being processed could somehow be mitigated. The results have usually not been good. The problem is that, for many types of computation, synchronizing results tends to pull performance toward the slowest access rather than the fastest. Just because we make it possible to treat a distributed set of computing resources as a single pool of shared memory doesn't mean it will perform the way we expect when we load up an operating system and run a program.
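Here's a hedged sketch of that synchronization point: if each step of a computation has to wait for results from every node before proceeding, the step time is set by the slowest access, not the average. The per-node latencies below are invented to show the shape of the problem.

```python
# When a computation synchronizes across nodes each step, the step time is
# governed by the slowest participant. Latencies below are invented examples.

# Per-node data access times for one step, in milliseconds.
node_access_ms = [0.2, 0.3, 0.25, 40.0]   # three local nodes, one far-away node

average_ms = sum(node_access_ms) / len(node_access_ms)
synchronized_ms = max(node_access_ms)     # a barrier waits for the last arrival

print(f"Average access time:      {average_ms:.2f} ms")
print(f"Per-step time at barrier: {synchronized_ms:.2f} ms")
# One distant participant drags every step out to ~40 ms, even though
# most of the pool responds in a fraction of a millisecond.
```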

This lesson is highly relevant to cloud computing.

By design, a hybrid cloud can be used to abstract away details of underlying physical resources such as their location. Abstraction can be advantageous; we do it in IT all the time as a way to mask complexity. Indeed, in many respects, the history of computer technology is the history of adding abstractions. The difficulty with abstractions is that aspects of the complexity being hidden can be relevant to what's running on top, such as where data is stored relative to where it is processed.

Two factors accentuate the potential problem.

The first is that a hybrid cloud can include both on-premise and public cloud resources. There's an orders-of-magnitude difference in both how much data can be transferred and how quickly it can be accessed over an internal data center network versus the external public network.

The second is that, with the growing interest in what's often called "Big Data," we're potentially talking about huge data volumes being used for analysis and simulation.
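To put rough numbers on these two factors together, here's a sketch of how long it takes simply to move a large data set across an internal 10 Gb/s data center link versus a 200 Mb/s public connection. The link speeds and utilization figure are illustrative assumptions, not benchmarks.

```python
# Time to move a data set = volume / effective bandwidth (per-request latency is
# negligible for bulk transfers this large). Link speeds are illustrative assumptions.

TB = 1e12  # bytes

def bulk_transfer_hours(volume_bytes, link_bits_per_s, efficiency=0.8):
    """Hours to move a volume over a link, assuming 80% effective utilization."""
    effective_bytes_per_s = (link_bits_per_s / 8) * efficiency
    return volume_bytes / effective_bytes_per_s / 3600

dataset = 10 * TB  # a modest "Big Data" working set

internal = bulk_transfer_hours(dataset, 10e9)    # 10 Gb/s data center network
external = bulk_transfer_hours(dataset, 200e6)   # 200 Mb/s public connection

print(f"10 TB over the internal network: {internal:.1f} hours")
print(f"10 TB over the public network:   {external:.1f} hours")
# About three hours inside the data center versus nearly six days across the
# public network, before per-request latency is even considered.
```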

All of this points to the need for policy mechanisms in hybrid clouds that control workload and data placement. Policy controls are needed for many reasons in a hybrid cloud. Data privacy and other regulations may limit where data can legally be stored. Storage in different locations will cost different amounts. Fundamentally, the ability of administrators to set policies is what makes it possible for organizations to build clouds out of heterogeneous resources while maintaining IT control.
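As a sketch of what such a policy mechanism might look like (the site names, fields, and rules below are hypothetical, not drawn from any particular cloud management product), placement can be reduced to evaluating each workload against a small set of constraints: data residency first, then data locality, then cost.

```python
# Hypothetical placement policy: residency rules are hard constraints,
# then prefer the site already holding the data, then the cheaper site.
# All names and rules here are invented for illustration.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    region: str
    on_premise: bool
    cost_per_hour: float

@dataclass
class Workload:
    name: str
    data_site: str            # where the bulk of its data already lives
    allowed_regions: set      # residency / regulatory constraint

def place(workload, sites):
    """Pick a site: obey residency rules, prefer data locality, then cost."""
    candidates = [s for s in sites if s.region in workload.allowed_regions]
    if not candidates:
        raise ValueError(f"No site satisfies residency rules for {workload.name}")
    # Sort so the site already holding the data comes first, cheapest next.
    candidates.sort(key=lambda s: (s.name != workload.data_site, s.cost_per_hour))
    return candidates[0]

sites = [
    Site("dc-internal", region="eu", on_premise=True,  cost_per_hour=0.30),
    Site("public-eu",   region="eu", on_premise=False, cost_per_hour=0.12),
    Site("public-us",   region="us", on_premise=False, cost_per_hour=0.10),
]

analytics = Workload("analytics", data_site="dc-internal", allowed_regions={"eu"})
print(place(analytics, sites).name)   # -> dc-internal: the data lives there, and "us" is off-limits
```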

How applications and their data need to relate to each other will depend on many details. How much data is there? Can the data be preprocessed in some way? Is the data being changed or mostly just read? However, as a general principle, processing is best kept physically close to the data it operates on. In other words, if the data being analyzed is being gathered on-premise, that's probably where the processing should be done as well.

If this seems obvious, perhaps it should be. But it's easy to fall into the trap of thinking that, if differences can be abstracted away, those differences no longer matter. Latencies can be one of those differences -- whether in computer system design or in a hybrid cloud.