The challenges of PC benchmark development
A look at some of the challenges involved in developing and updating benchmarks for PC testing.
One of our goals here in CNET Labs is to keep PC benchmark testing current. This does not come without its challenges. One good example of the process is our recent work on a new Photoshop CS5 benchmark.
First, a little background. We are sometimes asked why we don't use off-the-shelf benchmark apps such as 3DMark or Sysmark. These software packages are popular benchmarking tools, but they're synthetic benchmarks, in that they represent performance in a series of proprietary tasks, not commonly used consumer software. Those kinds of tests certainly have value--for example, we currently use Cinebench 11.5 to test raw multicore CPU performance. We're also evaluating 3DMark 11 as a possible addition to our existing gaming tests.
However, instead of off-the-shelf testing, we generally prefer to create our own benchmarks that incorporate mainstream applications representing real-world usage. In doing so, we hope to give you a better idea of how well these systems perform with apps that are at least close to what the average consumer would use. All benchmark development involves balancing the desire to keep software tests updated against the stability of the test and the ability to compare with previous results.
Here's the latest example. We've been using Adobe's Photoshop CS3, which is now two versions behind the latest version of Photoshop. We skipped a generation, Photoshop CS4, for a few reasons. Only the Windows version of CS4 had 64-bit support, which would have made cross-platform comparisons with Macs less relevant. We also learned from Adobe that the company was accelerating the release of Photoshop CS5 (which eventually came out in the first half of 2010), so any CS4 benchmark would have been short-lived. Over the past several months, we've developed a version of our Photoshop test using CS5, and we're currently finalizing it for use in CNET reviews.
When developing any new benchmark test, whether for a new app or an updated version of an existing app, it can take several months before the test is ready to roll out across our desktop and laptop reviews. We can't transition to the next version until we know it works on all the different types of systems we get in and until we're confident in the results. We also make sure product testing always takes precedence over test development, so new products can be tested and reviewed in a timely manner.
So, until we have a substantial data set of results on high-end, midrange, and low-end systems, we double-test with both the new and older benchmarks. One concern when creating these tests is making them challenging enough for a high-end system while still able to run reasonably well on a low-end PC. The benchmark has to run at least three times and generate scores within 5 percent of one another, or the entire process must be done again. We also have to be able to reliably automate the test, which enables us to be much more productive and to test multiple systems at once.
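To make the consistency rule concrete, here is a minimal Python sketch of the "three runs within 5 percent" check. It is an illustration, not our actual test harness: the function names (scores_are_consistent, run_benchmark), the max_attempts limit, and the choice to measure the spread as the gap between the highest and lowest score relative to the lowest are all assumptions made for the example.

```python
def scores_are_consistent(scores, tolerance=0.05, min_runs=3):
    """Return True if there are enough runs and all scores fall
    within `tolerance` of one another.

    Assumption: "within 5 percent of one another" is read as
    (highest - lowest) / lowest <= 0.05.
    """
    if len(scores) < min_runs:
        return False
    lowest, highest = min(scores), max(scores)
    if lowest <= 0:
        raise ValueError("scores must be positive")
    return (highest - lowest) / lowest <= tolerance


def run_benchmark(run_once, runs=3, tolerance=0.05, max_attempts=5):
    """Run the benchmark `runs` times per attempt; if the scores
    disagree, throw the whole set away and start over.

    `run_once` is a hypothetical callable that performs one
    automated benchmark pass and returns its score.
    """
    for _ in range(max_attempts):
        scores = [run_once() for _ in range(runs)]
        if scores_are_consistent(scores, tolerance, runs):
            return scores
    raise RuntimeError("scores never settled within tolerance")
```

In this sketch a single outlier run invalidates the whole set rather than being dropped on its own, which mirrors the rule that the entire process must be done again.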
Hopefully, this has given you a quick peek inside the benchmark development process. For a more detailed look at what happens behind the scenes here, check out our.