
Inside CNET Labs: PCMark matures into a robust benchmark suite with PCMark Vantage

FutureMark's new PCMark Vantage is a robust benchmark suite using real-world usage models.

Daniel A. Begun

Contrary to popular belief among the CNET Labs staff, benchmark tests just aren't that sexy to most of the tech-savvy world. Most folks don't want to get bogged down in the minute details of how every aspect and subsystem of a product performs. In fact, many folks don't care about performance at all! The No. 1 purchasing factor, the one that nearly always outranks all others, is price. And the lower the price, the less important performance usually is to the consumer. Those who do care about performance mostly just want the 10,000-foot view, which tells them in general terms how Product X compares with Product Y or, even more simply, whether Product X will do what they need it to.

Unfortunately, it often takes a lot of time and effort to generate enough data to make a simple, straightforward recommendation intelligently. Take laptops, for example. In addition to running benchmark tests composed of real-world applications performing real-world workloads (such as iTunes converting MP3 files to AAC files) to gauge a laptop's application performance, we also test how long a laptop's battery lasts while playing a DVD movie full-screen. Battery-life testing is very time-consuming. First, we have to configure the laptop's power settings according to our established testing methodology so that we test every laptop the same way. Then, starting with a fully charged battery, we have to go through several rundown-and-recharge cycles to make sure our results are repeatable. That's a lot of time and work just to tell you that the Dell XPS M1330's battery lasted 2 hours and 23 minutes.

Many of CNET Labs' system benchmark tests use common, off-the-shelf applications, such as Photoshop and QuickTime, performing tasks similar to those you might actually use the systems for. We made a conscious decision to take this real-world approach rather than rely on synthetic benchmarks. (Explaining exactly what synthetic benchmarks are, and why they are less than ideal, is complicated. For an excellent, in-depth discussion of the topic, see this article on Ars Technica. Even though it was written eight years ago, it's just as valid today.)

We've tried to steer clear of third-party synthetic benchmarks, such as FutureMark's PCMark05 and 3DMark06. But with PCMark Vantage, its brand-new replacement for PCMark05, FutureMark has taken a giant step toward real-world benchmarking.

PCMark Vantage is a benchmark suite that runs on all versions of Microsoft's Windows Vista operating system, including the 64-bit versions. Instead of focusing exclusively on hardware-centric performance analysis, as previous versions of PCMark did, this new version gauges performance in real-world usage scenarios using Vista's built-in applications, such as Windows Movie Maker and Windows Photo Gallery. Not everything in the benchmark is 100 percent real-world, however: one of the gaming tests uses the same synthetic 3D workload found in 3DMark06.

The benchmark includes a number of test suites representing a modest but impressive range of performance-based usage scenarios. The Memories Suite covers image manipulation and video editing. The TV and Movies Suite includes video transcoding and video playback, incorporating some HD content. The Gaming Suite focuses on 3D graphics rendering and AI. The Music Suite handles audio transcoding. The Communications Suite performs data encryption and compression. The Productivity Suite searches through contacts and mail. These descriptions are oversimplifications of what each suite actually does.

Almost all of the test suites use multitasking scenarios, in which multiple apps perform tasks simultaneously. A number of the tests can also take advantage of multiple CPU cores, and one test can use up to eight cores at once. The benchmark is designed to stress all of the major subsystems of a computer (the CPU, memory, hard disk, and GPU), but in a way that mimics how a real user might actually use the system, and it reports scores within the context of each usage scenario. The individual suite scores are used to calculate the overall PCMark score. Note that the Basic version of PCMark Vantage can't run the individual suites; it is designed to provide just a PCMark score.

I ran the benchmark a handful of times on a couple of systems. My initial impression is that PCMark is slightly more sensitive to differences between system configurations than our own benchmark tests are. For instance, the PCMark scores of the two systems we tested differed by 20 percent, while our test scores differed by about 14 percent overall. At this point, however, I don't know how the individual suite subscores are weighted to produce the overall PCMark score, so I'm not ready to draw any final conclusions.
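A percentage gap between two scores depends on which score you treat as the baseline, which matters when comparing a 20 percent spread against a 14 percent one. The short Python sketch below illustrates that arithmetic. The function names and the scores of 6,000 and 5,000 are hypothetical illustrations, not actual PCMark results, and the article doesn't say which baseline was used for its figures.

```python
def percent_higher(higher: float, lower: float) -> float:
    """How much `higher` exceeds `lower`, as a percentage of `lower` (the baseline)."""
    return (higher - lower) * 100.0 / lower


def percent_lower(higher: float, lower: float) -> float:
    """How far `lower` falls below `higher`, as a percentage of `higher` (the baseline)."""
    return (higher - lower) * 100.0 / higher


# Hypothetical overall scores for two systems (not real PCMark data).
fast_system, slow_system = 6000.0, 5000.0

print(percent_higher(fast_system, slow_system))  # 20.0: fast system is 20 percent higher
print(round(percent_lower(fast_system, slow_system), 1))  # slow system is about 16.7 percent lower
```

The same 1,000-point gap reads as 20 percent or roughly 16.7 percent depending on the baseline, so two spreads are only comparable when they're computed the same way.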

What does concern me is that I saw inconsistencies in some of the scores across multiple test runs on the same systems. For a benchmark to be reliable, it must generate reproducible scores over multiple runs. The Communications and Productivity suites, in particular, showed inconsistencies across runs on both systems. The inconsistencies weren't egregious, but they were significant enough to reduce the potential reliability of those scores. The overall PCMark scores, on the other hand, consistently varied by less than our +/-5 percent margin of error. So even with the variations in the subscores, the overall PCMark score appears to land within our acceptable range of variability. You're bound to see some variability over time when benchmarking computers with such complicated and varied subsystems. Additionally, my sample set is too small to draw any final conclusions; I need to test far more systems before I have a solid idea of how reliable PCMark is. Will we integrate PCMark Vantage into our standard test methodologies for Vista-based desktops and laptops? What I've seen so far is promising, but I'm reserving my final decision until I've had the chance to try it on more systems.
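A run-to-run consistency check like the one described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the +/-5 percent margin is measured against the mean of the repeated runs; CNET Labs' actual procedure isn't spelled out here, and the function names and sample scores are hypothetical.

```python
from statistics import mean


def max_deviation_pct(scores: list[float]) -> float:
    """Largest deviation of any single run from the mean run, as a percentage of the mean."""
    if not scores:
        raise ValueError("need at least one score")
    avg = mean(scores)
    return max(abs(s - avg) * 100.0 / avg for s in scores)


def within_margin(scores: list[float], margin_pct: float = 5.0) -> bool:
    """True if every run lands within +/- margin_pct percent of the mean run (assumed definition)."""
    return max_deviation_pct(scores) <= margin_pct


# Hypothetical repeated runs on one system (not real PCMark data).
overall_runs = [5000.0, 5100.0, 4950.0]   # tight spread: passes
subscore_runs = [300.0, 340.0, 280.0]     # wider spread: fails

print(within_margin(overall_runs))    # True
print(within_margin(subscore_runs))   # False
```

Measuring against the mean rather than the first run keeps a single outlier from skewing the baseline, though a lab could reasonably choose the median or a reference run instead.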

PCMark Vantage is available for download from FutureMark.com. Prices range from $6.95 for the Basic Edition to $495 for the Professional version.