The central concept behind benchmarks was historically pretty simple. What's the horsepower of some vendor's "Big Iron"? After all, most systems--the important ones, anyway--were the big boxes sitting in a data center someplace doing important stuff like booking orders or counting money.
They cost a lot. They were based on proprietary architectures that made low-level technical comparisons between vendors difficult. And they were a core part of an enterprise's business.
This was largely the environment that spawned the benchmark business. Ideally, buyers would run their own tests, using their own applications, but this was difficult and expensive. Therefore, demand grew for audited results, using neutral third-party methodologies that were at least reasonably relevant to certain types of workloads. Users could then use these metrics to get an idea of how different systems stacked up against each other. For their part, vendors were quick to seize on a new high-water mark in a benchmark as a marketing club to wield against their competitors.
One problem with positing that a high-end Unix system will be used for a single transaction-processing application is that it leads to some pretty silly results. Take, for example, the leading TPC-C result on the Transaction Processing Performance Council's Web site: roughly 6 million transactions per minute. Consider what that figure means in the context of TPC-C, a widely used metric for comparing transaction-processing performance.
This result was obtained using a configuration that cost more than $17 million! Why so much? Well, for one thing, it used almost 11,000 Fibre Channel disks spinning at 15,000 rpm. Having trouble picturing that? That's 68 cabinets of disks. As for the environment the benchmark purportedly simulated, it assumed that a company would have 518,000 warehouses servicing more than 15 billion customers. Said company was presumably a division of the Intergalactic Acme Corp.
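That customer count isn't arbitrary: the TPC-C specification scales the simulated business with the warehouse count, giving each warehouse 10 districts of 3,000 customers apiece. A quick back-of-the-envelope check (the 518,000-warehouse figure comes from the result cited above) shows how the numbers balloon:

```python
# TPC-C's scaling rules tie the simulated company's size to the
# number of warehouses: 10 districts per warehouse, 3,000 customers
# per district.
DISTRICTS_PER_WAREHOUSE = 10
CUSTOMERS_PER_DISTRICT = 3_000

warehouses = 518_000  # from the top TPC-C result discussed above

customers = warehouses * DISTRICTS_PER_WAREHOUSE * CUSTOMERS_PER_DISTRICT
print(f"{customers:,} simulated customers")  # 15,540,000,000 -- over 15 billion
```

In other words, a top-of-the-chart result is only reachable by simulating a company far larger than any real one.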
My intent isn't to critique the TPC-C benchmark, which is actually one of the better benchmarks for its purpose out there. (The Transaction Processing Performance Council has also introduced a TPC-E benchmark, which, among other things, aims to allow for more realistic hardware configurations, but it hasn't been widely adopted.) Rather, it's to illustrate that large systems--and for that matter, even modest-size boxes built with today's massively multicore processors--are increasingly not used to run a single application. Instead, they run many applications at one time, using virtualization to keep those workloads out of each other's way.
The problem was that there wasn't much in the way of benchmarks to test performance in a virtualized environment, even though that is often the primary way servers are deployed and used. VMmark has been one option, but it's tied to one virtualization vendor, VMware, and has been criticized for this and other reasons. Intel also introduced a virtualization benchmark, VConsolidate, but it was primarily intended to let server vendors do their own internal evaluations; it's also no longer supported.
However, the Standard Performance Evaluation Corp. (SPEC) has now also released a virtualization benchmark, SPECvirt_sc2010. The benchmark is intended to measure "the end-to-end performance of all system components, including the hardware, virtualization platform, and the virtualized guest operating system and application software."
The first published result is for an IBM x3650 server running Red Hat Enterprise Linux 5.5 and its associated KVM virtualization platform. (Disclosure: I work for Red Hat.)
The introduction of this benchmark is significant for a variety of reasons. As I've already discussed, servers are increasingly virtualized as a matter of course, so understanding their performance characteristics while virtualized only makes sense. SPEC is also a good organization to own such a benchmark. It already has a broad stable of widely used and respected benchmarks that range from very simple (such as SPECint) to more application workload-focused (such as SPECjbb). And all of its benchmarks are relatively easy and inexpensive to run compared with the aforementioned TPC-C.
The new SPEC benchmark has a lot of credibility out of the gate. This should, among other things, help it gain a critical mass of results, which will in turn let buyers start making meaningful comparisons of both hardware and software platforms.
Performance isn't the only consideration that goes into making virtualization and server buying decisions, of course, but it's one of the factors, and it hasn't been easy to measure in a comparable way until now.