Sun has high expectations for Niagara

The forthcoming processor embraces both the multicore and multithreading approaches more aggressively than do chips from IBM, Intel and AMD.

SAN JOSE, Calif.--Sun Microsystems' forthcoming Niagara processor performs well on a wide variety of tasks and outdoes competing processors at multithreading, the ability to do many things at once, a top Sun chip executive said.

Niagara has eight processing engines--called cores--each able to simultaneously execute four instruction sequences, called threads. It's neither the first multicore processor nor the first to employ multithreading, but it embraces both ideas more aggressively than competing chips from IBM, Intel and Advanced Micro Devices.

Marc Tremblay, a vice president and chief architect at Sun, argues that Niagara--due in systems to arrive in early 2006 at the latest--benefits from being designed from scratch with multiple cores and multithreading. In a paper that's been accepted for publication, Sun will show that switching on multithreading gives a major performance boost, Tremblay said in an interview at the Fall Processor Forum here.

"The industry is scrambling to get there as fast as possible with whatever (they) have in-house. We decided a few years ago to start from scratch," Tremblay said. "Not until you start from scratch do you see the full advantages."

Specifically, on database tasks measured with the Transaction Processing Performance Council's TPC-C benchmark, running four threads in one core triples performance compared with running one thread, Tremblay said.
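The arithmetic behind those figures can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: the core count, threads per core and threefold speedup come from the figures quoted above, while the function names and the assumption that throughput scales linearly across all eight cores are hypothetical and are not claims Sun made.

```python
# Figures quoted in the article.
CORES = 8                   # processing engines on Niagara
THREADS_PER_CORE = 4        # instruction sequences each core can run at once
SPEEDUP_FOUR_THREADS = 3.0  # four threads vs. one thread on a core (TPC-C claim)


def hardware_threads(cores=CORES, threads_per_core=THREADS_PER_CORE):
    """Total threads the chip can execute simultaneously."""
    return cores * threads_per_core


def per_thread_efficiency(speedup=SPEEDUP_FOUR_THREADS, threads=THREADS_PER_CORE):
    """Fraction of single-thread throughput each thread delivers when sharing a core."""
    return speedup / threads


def chip_throughput_estimate(cores=CORES, speedup=SPEEDUP_FOUR_THREADS):
    """Chip throughput in units of 'one core running one thread'.

    ASSUMPTION (not from the article): throughput scales linearly
    with the number of cores.
    """
    return cores * speedup


if __name__ == "__main__":
    print("hardware threads:", hardware_threads())
    print("per-thread efficiency:", per_thread_efficiency())
    print("chip throughput estimate:", chip_throughput_estimate())
```

In other words, each of the four threads runs at roughly three-quarters of the speed it would have alone, but the core as a whole gets three times as much work done.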

Those are bold words from a company that has struggled to deliver compelling chips in recent years. Those struggles have hurt Sun's market share while "Lintel" machines, which run Linux on Intel chips, have gained.

"We're going to attack new markets--markets we used to be strong in and lost market share," Tremblay said. "When was the last time a Sparc box had price-performance leadership in the Web tier? Everyone thinks it's Lintel."

Intel's newest Xeon, code-named Paxville, is a dual-core, two-thread design, but successors coming in the second half of 2006 will drop the multithreading ability. IBM's Power4 processor in 2001 was the first dual-core server chip, and the Power5 successor introduced last year added two threads to the feature list. The next Itanium--code-named Montecito but delayed until mid-2006--brings a dual-core design and two threads per core. And AMD's Opteron is a dual-core design that can handle only one thread per core.

Sun has a lot riding on its multicore, multithread initiative, called chip multithreading and sometimes throughput computing. Sun's Sparc processor family, hobbled by delays and lackluster performance, has been losing share to x86 chips such as Intel's Xeon and to IBM's Power family. Despite that, Sparc servers remain Sun's biggest revenue source, and improving the Sparc business holds the greatest potential for turning around Sun's financial troubles.

The chip overhaul plan begins with Niagara, continues with a successor called Niagara II, and extends as far as a higher-end sibling code-named Rock, which is due in 2008. To achieve the processor overhaul, Sun scrapped its UltraSparc V processor and in the meantime signed a partnership with Fujitsu to use its Sparc64 VI processor in a server family called the Advanced Product Line.

Sun plans two Niagara systems, the 1.75-inch thick "Erie" model and the 3.5-inch thick "Ontario" model.

Lower-cost models
Pricing will be aggressive, Tremblay said. Instead of pricing by the computers' ability to get work done, he said, Sun will price comparably to other slim, rack-mounted computer designs. Such models, most often using x86 chips such as Intel's Xeon, cost well under $10,000.

Auction site eBay's PayPal division is among those testing the Niagara servers, sources have said, but a potentially higher-profile customer could be Google. The search engine company is buying unspecified Sun servers through a partnership announced earlier this month.

Tremblay wouldn't say whether Google is a customer, but he did say Niagara is good for search applications and that "We've talked to Google several times over the years." In addition, Google employs Luiz Barroso, who worked on chip multithreading designs at Digital Equipment Corp., he said. "He's very familiar with some of that technology."

Niagara has error correction on data transfers to its caches and to its most central memory slots, called registers. It also has four on-board memory controllers that are shared among all eight cores.

Suitable for Web work
Niagara will be good for "Web-facing" tasks such as hosting Web sites and running Java applications, Tremblay said. But it also will work well for housing databases, a type of work Sun engineers studied carefully when designing the chip, he said.

"Application servers and databases are beautiful" on the chip, Tremblay said, running well even if they haven't been optimized for the processor. However, Sun conceives of the Niagara systems as a suitable substitute chiefly for its lower-end models.

He said that Niagara won't do well when a single thread has to execute at top speed--when, for example, it is sending streams of media over the network.

For fast thread execution, Sun will hold the fort with APL-based products due in late 2006, but the company is pinning its longer-term hopes on its in-house Rock design.

Rock will employ an idea, the "hardware scout," for which Tremblay holds several patents. In all, he holds 102 patents, the most at Sun, including 36 related to multicore and multithreading.

The hardware scout is a special-purpose thread that the chip launches by itself whenever the chip stalls because it has to wait for information to be retrieved from memory. The scout's job is to run software in advance of the chip's active task.

Here, Tremblay grows metaphorical.

"When the main thread is stalled, waiting for data, you launch a scout to plow ahead in the code. You try to run 300 or 400 instructions in front of the main thread to find the landmines," Tremblay said. "It's following branches, bringing instructions and data into the processor. It's basically plowing the snow ahead so you have smooth skating, doing the dirty work for you."

The scout can help find areas where new data must be loaded into the chip's high-speed cache memory. Having the data in place when the main thread needs it can save 500 clock cycles--a "huge" improvement, he said.
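The idea can be illustrated with a toy simulation in Python. It is a sketch under loud assumptions, not a model of Rock itself: the 500-cycle miss penalty and the 300-instruction look-ahead echo the figures Tremblay cited, but the `run` function, the one-cycle cost per instruction, the unlimited cache and the premise that every load the scout finds arrives before the main thread resumes are all simplifications invented for illustration.

```python
MISS_PENALTY = 500    # cycles lost on a cache miss (figure cited in the article)
SCOUT_DISTANCE = 300  # instructions the scout runs ahead (low end of "300 or 400")


def run(trace, use_scout):
    """Return total cycles to execute `trace`.

    `trace` is a list with one entry per instruction: a memory address
    the instruction loads, or None if it touches no memory.

    ASSUMPTIONS (hypothetical, for illustration only):
      * every instruction takes 1 cycle, plus MISS_PENALTY on a cache miss
      * the cache never evicts anything
      * loads found by the scout finish arriving during the stall
    """
    cache = set()
    cycles = 0
    for i, addr in enumerate(trace):
        cycles += 1
        if addr is None or addr in cache:
            continue  # no memory access, or a cache hit
        # Cache miss: the main thread stalls while the data is fetched.
        cycles += MISS_PENALTY
        cache.add(addr)
        if use_scout:
            # While the main thread waits, the scout plows ahead and
            # pulls upcoming data into the cache.
            for future in trace[i + 1 : i + 1 + SCOUT_DISTANCE]:
                if future is not None:
                    cache.add(future)
    return cycles


if __name__ == "__main__":
    # 1,000 instructions; every 100th one loads a new address.
    trace = [i if i % 100 == 0 else None for i in range(1000)]
    print("without scout:", run(trace, use_scout=False))
    print("with scout:   ", run(trace, use_scout=True))
```

On this made-up trace the main thread misses 10 times without a scout and only 3 times with one, because each stall is used to fetch the next few loads in advance.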
