Sun puts 16 cores on its 'Rock' chip

High-end chip will likely keep Sun a step ahead of competitors in the multicore processor race, and its design could be finished this year.

SAN FRANCISCO--Sun Microsystems, already an aggressive advocate of multicore processing, will try to stay a step ahead of the game by putting 16 cores in its high-end Rock chip.

With overheating capping chip speeds, chipmakers have been scrambling to improve performance instead by packing multiple processing engines onto a single slice of silicon. Sun got an early start with its UltraSparc T1 "Niagara" processor, which has eight cores, and it looks like Rock will keep the company a step ahead of the competition.

Rock will have 16 cores, John Fowler, executive vice president of Sun's systems business, said in an interview Thursday. Rock-based servers are due to arrive in 2008, when competitors' chips will likely have at most eight cores, analysts say. Boosting performance is crucial to Sun's attempt to reverse the declining influence and use of its Sparc family of processors, which has lost share to mainstream x86 chips from Intel and Advanced Micro Devices and to rivals such as IBM's Power family.

"Sun clearly has gone further with multicore approaches, even with Niagara and Niagara 2, than everybody else. This is just a logical extrapolation of what they've done," said Insight64 analyst Nathan Brookwood. "If it's going to 16 cores, with multiple threads (independent instruction sequences) per core, it's going to be a real barn-burner."

Servers for years have been built with multiple processors, so it's not as if competitors lacking a 16-core design will have no answer to Sun's products. But packing more performance into a single processor provides a way to reduce processor and system manufacturing costs and to boost performance without compounding today's problems with keeping data centers cool.

A Rock design-completion milestone called "tape-out" for the chip is just a few weeks away, Marc Tremblay, Sun's chief architect, said in a meeting here Wednesday. The company is holding a contest right now: if Sun engineers don't tape out the design by December 31, they'll all have to wear a tie, formal attire that Tremblay suspects is lacking from many of the designers' wardrobes.

Among competitors, Intel just moved to quad-core designs by mounting two silicon chips in a single processor package, and AMD's "Barcelona," with four cores on one slice of silicon, is due in mid-2007. Brookwood believes it is possible that some of these competitors will release eight-core designs in 2008, but not 16-core ones.

Moving at a more stately multicore pace is Intel's Itanium family, which just reached dual-core status. Even Power6, due in 2007 from multicore pioneer IBM, will have only two cores. A Fujitsu Sparc64 processor due in 2008 will have four cores.

Defining what exactly constitutes a core is a tricky business, though. David Yen, Sun's previous Sparc chief, said earlier that some Rock features are shared across multiple cores, blurring the boundaries somewhat.

Sun's chip reputation has been tarnished by years of delays and missteps in its Sparc processor business, said Greg Quick, an analyst with the 451 Group, but the company has partially restored it by meeting Niagara schedules. If it can show customers that Rock will significantly boost performance, Sun should be able at least to prevent current customers from phasing out their Sun servers.

Heavyweight cores
Niagara has eight cores, but competitors have dinged Sun because each core is lightweight compared with those in current chips, such as Intel Xeon or IBM's Power. With the ability to handle 32 threads, Niagara can get a lot of work done in a given amount of time, but the time taken to complete a specific task is relatively long.
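The trade-off between aggregate throughput and single-task latency can be put in numbers. This is a toy model, not a description of Niagara or any real chip: apart from the 32-thread count, the thread counts and per-task times below are hypothetical values picked only to show how a design with many slower threads can finish more tasks per second while taking longer on each one.

```python
from dataclasses import dataclass


@dataclass
class ChipDesign:
    name: str
    threads: int             # hardware threads that run concurrently
    seconds_per_task: float  # time one thread needs to finish one task

    def throughput(self) -> float:
        """Tasks finished per second with every thread kept busy."""
        return self.threads / self.seconds_per_task

    def latency(self) -> float:
        """Time a single task takes from start to finish."""
        return self.seconds_per_task


# Hypothetical figures, chosen only to illustrate the trade-off.
many_light = ChipDesign("32 lightweight threads", threads=32, seconds_per_task=4.0)
few_heavy = ChipDesign("8 heavyweight threads", threads=8, seconds_per_task=1.5)

for chip in (many_light, few_heavy):
    print(f"{chip.name}: {chip.throughput():.1f} tasks/s, "
          f"{chip.latency():.1f} s per task")
```

With these made-up figures the 32-thread design completes 8.0 tasks per second against 5.3, yet each individual task takes 4.0 seconds instead of 1.5.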

Rock's design has a more traditional emphasis on performance, though, with threads running faster when measured individually as well as in aggregate. "Rock tries to optimize for high per-thread performance," Tremblay said.

A key part of that performance comes from what Sun calls scout threads.

Scout threads run about 250 steps ahead of the main threads the chip is actually processing. The scouts try to predict the best path to take when they reach branches in the instruction sequence, and they fetch data the main thread will likely need from main memory so that it is already waiting in relatively fast-response cache memory when the main thread gets there.
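The prefetching half of the scout's job can be sketched as a toy cache simulation. Everything here except the lookahead distance of about 250 steps is an assumption for illustration: the function name `run_main_thread`, the 200-cycle miss penalty, an unbounded cache with no evictions, and the simplification that every prefetch completes before the main thread arrives. Branch prediction is left out entirely.

```python
def run_main_thread(trace, scout_distance, miss_penalty=200):
    """Return total cycles for the main thread to walk `trace` (toy model).

    A cache hit costs 1 cycle; a miss costs `miss_penalty` cycles. When
    scout_distance > 0, a hypothetical scout prefetches the address that lies
    `scout_distance` accesses ahead. Simplifying assumption: each prefetch
    finishes before the main thread reaches that address.
    """
    cache = set()  # unbounded toy cache: evictions are not modeled
    cycles = 0
    for i, addr in enumerate(trace):
        # The scout, running ahead, pulls a future line into the cache.
        if scout_distance > 0 and i + scout_distance < len(trace):
            cache.add(trace[i + scout_distance])
        if addr in cache:
            cycles += 1
        else:
            cycles += miss_penalty
            cache.add(addr)  # the demand fetch brings the line in
    return cycles


# 10,000 distinct cache lines accessed in order (a streaming pattern).
trace = list(range(10_000))
print(run_main_thread(trace, scout_distance=0))    # 2000000
print(run_main_thread(trace, scout_distance=250))  # 59750
```

On this purely streaming trace only the first 250 accesses still miss. A real scout has to compute addresses and can itself guess wrong at branches, so real workloads would gain far less.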

"The scout is the guy who does all the dirty work--all the snow-plowing in front of the main thread," Tremblay said.

Sun was happy enough with the scout thread performance that it chose to pair one scout thread with each regular thread in Rock, Tremblay said. The two threads tend to run at opposite times, with the regular thread launching a scout thread only when it stalls waiting for data from memory, so Rock avoids some of the heating problems caused by multiple threads running simultaneously, Tremblay said.
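A second sketch models the timing Tremblay describes, in which the scout runs only while the main thread is stalled on memory. It is a hypothetical toy: the function `run_with_stall_scout`, the 200-cycle penalty and the access pattern are invented, and it assumes the scout already knows every future address, which a real scout would have to compute. The returned counts show that scout cycles never exceed stall cycles, so the two threads are never active at the same time.

```python
def run_with_stall_scout(trace, warm, miss_penalty=200, use_scout=True):
    """Return (total_cycles, stall_cycles, scout_cycles) for a toy core.

    Each access costs 1 cycle once its data has arrived. A miss makes the
    main thread wait `miss_penalty` cycles. If use_scout is True, a
    hypothetical scout walks ahead through `trace` during that wait only,
    one access per cycle, issuing prefetches for lines not yet requested.
    """
    ready = {addr: 0 for addr in warm}  # address -> cycle its data arrives
    now = stall_cycles = scout_cycles = 0
    for i, addr in enumerate(trace):
        if addr not in ready:              # demand miss: start the fetch
            ready[addr] = now + miss_penalty
        if ready[addr] > now:              # main thread must stall
            stall_end = ready[addr]
            stall_cycles += stall_end - now
            if use_scout:
                # The scout runs only while the main thread is stalled.
                t, j = now, i + 1
                while t < stall_end and j < len(trace):
                    if trace[j] not in ready:
                        ready[trace[j]] = t + miss_penalty  # prefetch
                    t, j = t + 1, j + 1
                scout_cycles += t - now
            now = stall_end
        now += 1                           # execute the access itself
    return now, stall_cycles, scout_cycles


# 500 accesses: a cold line every 50th access, warm address 0 otherwise.
trace = [1000 + k if k % 50 == 0 else 0 for k in range(500)]
print(run_with_stall_scout(trace, warm={0}, use_scout=False))  # (2500, 2000, 0)
print(run_with_stall_scout(trace, warm={0}, use_scout=True))   # (900, 400, 400)
```

Because the scout only fills cycles the main thread would have spent idle, it adds work without adding a second thread running at the same moment, which is the heat argument Tremblay makes.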

One consequence of the fast-thread priority is that the chip's clock speed matters more than in Niagara, which runs at a comparatively slow 1.2GHz, Tremblay added. The x86 chips from Intel and AMD have stayed in the 3GHz neighborhood as the companies moved to multicore designs.

Out of order
To speed execution, most modern chips don't methodically execute instruction sequences in a plodding, linear fashion. Instead, they employ various techniques such as out-of-order execution and speculative execution to get a jump on instructions a few steps ahead of the regular sequence.

Niagara employs none of these techniques, each of which requires more circuitry and therefore increases the chip size and power consumption. But Rock takes the opposite approach--and then some.

Rock goes a step beyond with something called out-of-order retirement, Tremblay said. When an instruction is retired, it means the chip has completed that step of processing and has committed its results to internal memory slots called registers.

With speculative execution, the chip makes its best guess about whether or not to take particular branches--conditional decision points that depend on the results of existing calculations. Current chips are able to speculate about the best choices to take, storing results in temporary locations called intermediate registers, Brookwood said. But they don't commit those results to the real registers until the chip is sure the choices were correct.
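Conventional speculation, as Brookwood describes it, can be modeled as a core that parks speculative results in temporary slots and copies them into the real registers only after the branch is confirmed. The class and method names are hypothetical, and the model ignores everything except that commit rule.

```python
class BufferedCommitCore:
    """Toy model: speculative results wait in a buffer until the branch resolves."""

    def __init__(self):
        self.registers = {}  # architectural ("real") registers
        self.pending = []    # speculative results held in temporary slots

    def execute_speculatively(self, reg, value):
        self.pending.append((reg, value))  # computed, but not committed

    def resolve_branch(self, prediction_correct):
        if prediction_correct:
            for reg, value in self.pending:  # commit in program order
                self.registers[reg] = value
        self.pending.clear()                 # on a wrong guess, just discard


core = BufferedCommitCore()
core.registers["r1"] = 5
core.execute_speculatively("r1", 99)
print(core.registers["r1"])  # 5: the result is still waiting
core.resolve_branch(prediction_correct=True)
print(core.registers["r1"])  # 99: committed once the guess is confirmed
```

The cost of this approach is the wait: nothing past the branch reaches the real registers until the branch has been resolved.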

With out-of-order retirement, the chip commits its speculative results to memory and moves on without having to wait for validation. "What Rock will let you do is actually finish the instruction and maybe finish more instructions beyond it," Brookwood said.

If the choices proved to be the wrong ones, the chip can quickly back up to the earlier state, and software moves backward along with it so that incorrect results aren't produced, Brookwood said. "It's an undo button...for the stuff that's been committed," he said.
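The "undo button" can be sketched the same way, but the article does not say how Rock records the state it rolls back to. This sketch assumes a simple snapshot (checkpoint) of the registers taken before speculating, a common general technique swapped in here for illustration. Results are committed immediately, and the snapshot is restored if the guess turns out wrong. All names are hypothetical.

```python
class CheckpointCore:
    """Toy model: commit speculative results at once, roll back if wrong."""

    def __init__(self):
        self.registers = {}      # architectural ("real") registers
        self._checkpoint = None  # snapshot taken before speculating

    def begin_speculation(self):
        # Record the state to return to if the branch guess is wrong.
        self._checkpoint = dict(self.registers)

    def execute_and_commit(self, reg, value):
        # Results go straight into the real registers; no waiting.
        self.registers[reg] = value

    def resolve_branch(self, prediction_correct):
        if self._checkpoint is None:
            raise RuntimeError("no speculation in progress")
        if not prediction_correct:
            self.registers = self._checkpoint  # the "undo button"
        self._checkpoint = None


core = CheckpointCore()
core.registers["r1"] = 5
core.begin_speculation()
core.execute_and_commit("r1", 99)
print(core.registers["r1"])  # 99: already committed
core.resolve_branch(prediction_correct=False)
print(core.registers["r1"])  # 5: rolled back to the checkpoint
```

Compared with waiting for validation, the core keeps working past the branch and pays only on a wrong guess, when it must restore the snapshot.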

Software doesn't need to be rewritten to support out-of-order retirement, Tremblay said. Preserving compatibility is one of Sun's top chip priorities.

Sun had an awkward phase with its processor plans, ripping up its road map, canceling the UltraSparc V chip and relying instead on a partnership with fellow Sparc chip designer Fujitsu. But the company now has a simpler, more attainable strategy, Quick said, and Sun is eager to boast about its progress.

"We are very excited right now about how Sparc is going," Fowler said.
