Chipmakers have found it harder to kick up a processor's clock speed while actually getting more useful work out of a chip. As a result, Intel, Sun Microsystems and Advanced Micro Devices have begun emphasizing other features--for example, squeezing multiple processing engines, called cores, onto a single slice of silicon or executing many instruction sequences, called threads, simultaneously.
But IBM announced last week at the ISSCC chip conference that its forthcoming Power6 processor will run at least at 4GHz.
McCredie, Power6's chief architect, joined IBM's mainframe group in 1991 but later moved to its Austin, Texas, team to work on the Power processors at the heart of the company's Unix servers. IBM has steadily gained share in the Unix server market against longtime leader Sun Microsystems in recent years, and much of the future success of that competitive attack lies with McCredie.
A 4GHz minimum puts IBM ahead of the pack, but Big Blue has hit some hurdles elsewhere in the race. In 2004, IBM had outlined an earlier arrival for Power6, with a faster variant called Power6+ scheduled for 2007, but McCredie said Power6 now is scheduled to emerge in 2007.
McCredie spoke with CNET News.com's Stephen Shankland about papers he co-authored for the chip conference.
Q: What was the big news at the conference for IBM?
McCredie: The main thing we did here is reveal the Power6 system design. We're still a year away from general availability, but the message we're trying to put out is we're still on schedule and continuing our pace. We released Power4 in 2001, Power5 in 2004, and we're on track for 2007 for Power6. We were showing pass-two (second-generation prototype) hardware results in all our papers with performance numbers. We're hitting some high-frequency targets.
Many of your rivals say they're moving on from the clock speed race.
McCredie: The high frequency is a real strong message out there. Other people may be shying away from high frequency, but we're still focusing on that. We didn't go to extremely power-hungry circuits or slicing the pipeline into 20, 30 or 40 stages, which implies you got the frequency by sacrificing performance.
A deeper pipeline divides up processing into many different steps so several instructions can be processed at the same time.
McCredie: The pipeline is the measure of the delay from the point where an instruction is launched to the point where an application or user has access to the results. (Having a deeper pipeline) is like having several sinks to wash the dishes--one for washing, a first rinse, a second rinse...If you hit the high frequency by really lengthening the pipe, you increase how long it takes for an instruction to get through the computation. If you double your frequency and double your pipe depth, you don't deliver a lot more performance. We doubled the frequency but held the pipe depth the same as Power5.
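(Editor's illustration: the arithmetic behind McCredie's point can be sketched in a few lines of Python. This is a simplified model that assumes each pipeline stage takes exactly one clock cycle and ignores stalls and branch mispredictions. The stage count and clock speeds are hypothetical round numbers chosen for the example, not IBM's figures, and the function name is made up for this sketch.)

```python
def instruction_latency_ns(pipeline_stages: int, clock_ghz: float) -> float:
    """Time for one instruction to traverse the whole pipeline, in nanoseconds.

    Simplified model: one clock cycle per stage, and a cycle lasts
    1 / clock_ghz nanoseconds.
    """
    return pipeline_stages / clock_ghz


# Hypothetical figures for illustration only, not IBM's published numbers.
BASE_STAGES = 16
BASE_CLOCK_GHZ = 2.0

baseline = instruction_latency_ns(BASE_STAGES, BASE_CLOCK_GHZ)
# Double the frequency AND double the pipe depth: latency does not improve.
deeper_and_faster = instruction_latency_ns(2 * BASE_STAGES, 2 * BASE_CLOCK_GHZ)
# Double the frequency but hold the pipe depth: latency is cut in half.
same_depth_faster = instruction_latency_ns(BASE_STAGES, 2 * BASE_CLOCK_GHZ)

print(f"baseline:            {baseline} ns")
print(f"2x clock, 2x depth:  {deeper_and_faster} ns")
print(f"2x clock, same depth: {same_depth_faster} ns")
```

(In this toy model, doubling both the clock and the depth leaves an instruction's trip through the pipe unchanged, while doubling the clock at a fixed depth halves it.)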
Power6 is built with a more advanced manufacturing process employing 65-nanometer features compared to the 90-nanometer process of Power5+. (A nanometer is a billionth of a meter; smaller circuitry means chips can be made smaller and more cheaply.) How is the 65-nanometer process working out?
McCredie: Right now we're very pleased with 65 nanometers. It's coming along nicely, as you can see from the frequency number. On one paper we showed the maximum frequency reached 5.1GHz. We got a little better than 2x the performance, which shows the 65-nanometer process is performing better than our 90 nanometer.
What frequency will production Power6 chips use?
McCredie: That's the thing we haven't got down yet. Our actual shipping frequency is going to be set by the system environment we choose to put the chip into. Chip development has to be well ahead of system development. Those system environments will have various power constraints and thermal constraints. As we settle those down we'll be able to pin the ultimate frequency. We're telling people now the ultimate frequency will be between 4GHz and 5GHz.
How do you get up to that speed?
McCredie: That's what Brian Curran (the lead author on IBM's 4GHz paper for ISSCC) showed. If you're holding the pipe depth constant, you have to put half as much logic between each pipe stage. We had to get to the point where our circuits were doing double and triple duty, where one set of transistors was doing multiple functions. We had half as much gate delay between latches but had to get more work out of them.
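(Editor's illustration: the "half as much logic" remark follows from the clock period. The sketch below assumes, purely for the example, a fixed delay per logic gate and ignores latch overhead and any speedup from the newer manufacturing process. The 20-picosecond gate delay and the 2GHz starting point are hypothetical, and the function names are invented here.)

```python
def cycle_time_ps(clock_ghz: float) -> float:
    """Length of one clock cycle in picoseconds (1GHz -> 1,000 ps)."""
    return 1000.0 / clock_ghz


def gate_delays_per_stage(clock_ghz: float, gate_delay_ps: float) -> float:
    """How many gate delays fit between latches in one clock cycle.

    Simplified: ignores latch overhead, wire delay and clock skew.
    """
    return cycle_time_ps(clock_ghz) / gate_delay_ps


# Hypothetical value for illustration only, not a measured IBM figure.
GATE_DELAY_PS = 20.0

budget_at_2ghz = gate_delays_per_stage(2.0, GATE_DELAY_PS)
budget_at_4ghz = gate_delays_per_stage(4.0, GATE_DELAY_PS)

print(f"2GHz: {cycle_time_ps(2.0)} ps per cycle, {budget_at_2ghz} gate delays per stage")
print(f"4GHz: {cycle_time_ps(4.0)} ps per cycle, {budget_at_4ghz} gate delays per stage")
```

(With the number of stages held constant, doubling the clock halves the time, and so the logic, available to each stage. That is why the same transistors end up doing more than one job.)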
The chip will come out in 2007. Isn't that later than what IBM had said earlier?
McCredie: We're trying to stay on a three-year cadence.
I imagine you're trying to balance the Power6 systems so that a 4GHz chip won't just idle away more processing cycles waiting for data.
McCredie: We did scale the system with the chip. While we didn't show it at ISSCC, one of the key things is our next-generation I/O (input-output). We also have our third-generation elastic interface, so we are scaling our memory and processor-to-processor communication. We're focusing on scaling our system structure as well so we don't just spin our wheels faster.
Multicore processors are all the rage, and IBM led the market with the dual-core Power4. What does the future hold for Power6?
McCredie: The key thing is the balance between maintaining our single-core performance as well as maintaining our system throughput. There are many applications that still count on that single-core, single-thread, uniprocessor performance. Not all the applications have been migrated over to exploit multiple cores. We're trying to strike the balance between single-thread performance on uniprocessor apps and SMP (symmetrical multiprocessing with many threads). That's one of the reasons we did pursue the frequency.
How many cores does Power6 have?
McCredie: It's still a dual-core chip (like Power5). But we will, as we did on Power5, exploit our chip to get more than two cores per socket.
Your multichip module packaging approach?
McCredie: Exactly. Power5 was the first four-core-per-socket design out there.
Will it still have two threads per core?
McCredie: We haven't talked about that yet.
McCredie: The problem you find is this: When you go over and specialize too much on one aspect of performance, you generally get in trouble. Life is never so kind to us architects. If I do this one thing, like great throughput SMP performance, and ignore all these other things, the world is not kind to you. We usually end up making trade-offs. Every single customer has at least one key app that's never been threaded.
Sun isn't arguing that Niagara is good for everything. They're aiming for the front-end server jobs like Web site hosting or Java programs.
McCredie: I don't believe there's a big role in this world for too-specialized hardware. We have to stay on general-purpose hardware that we can do a lot with. Cray, Thinking Machines--the roadside is littered with people building specialized scientific hardware.
What do you think of asynchronous clocking, where different parts of the chip run on independent schedules?
McCredie: For me, personally, for this particular architect, I'm not a big fan. It breaks too many of our tools and verification suites to build large processors.
Power is a big issue these days. What will power consumption look like going from Power5 to Power6?
McCredie: We're targeting the same classes and categories for Power5 and Power6. We're telling people we're hitting the same power envelope as with the Power5. We are holding the power aligned for the most part. Holding the power is a concern for everybody these days.