The move highlights Sun's effort to muscle its way into the market for high-performance, number-crunching computers, increasingly in demand from government and businesses. Airline companies in particular are interested in these "big iron" machines, so they can tackle computationally intense tasks such as scheduling planes, said Steve MacKay, vice president of architecture and technology at Sun.
If Sun wins the contract for the Los Alamos National Laboratory (LANL) supercomputer, the machine will be built from many top-end Unix-based servers connected through a high-speed interconnection system. The news confirms reports that Sun was among the bidders.
The supercomputer market hasn't been kind to financially struggling SGI and the supercomputer company it acquired, Cray Research. But Sun may enjoy an additional advantage besides its strong financial performance--the computers it markets as number-crunchers are the same as the servers it's sold by the hundreds to ordinary businesses.
Palo Alto, Calif.-based Sun's bid is based on the sequel to the company's current 64-processor E10000 computer, MacKay said. The newer machine, which will accommodate more than 100 of Sun's upcoming UltraSparc-III chips, will be released midway through 2000, he said. By ganging many of these systems together, the supercomputer would use thousands of processors in total, he said.
The nuclear weapon simulation supercomputer--to be installed at the federally funded lab in 2001--will be able to perform about 30 trillion calculations per second, a speed known as 30 teraflops.
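As a rough illustration of the scale involved, the arithmetic can be sketched in a few lines of Python. The 30-teraflop target comes from the figures above; the function names and any processor counts passed in are hypothetical, chosen only to match the "thousands" of processors and "more than 100" chips per system that MacKay described, and are not Sun's actual configuration.

```python
import math

# 30 teraflops = 30 trillion calculations per second (figure from the article).
TARGET_FLOPS = 30e12


def required_flops_per_processor(total_processors: int) -> float:
    """Sustained rate each processor must reach if the load is split evenly."""
    if total_processors <= 0:
        raise ValueError("total_processors must be positive")
    return TARGET_FLOPS / total_processors


def systems_needed(total_processors: int, processors_per_system: int) -> int:
    """Number of ganged systems needed to house the given processor count."""
    if processors_per_system <= 0:
        raise ValueError("processors_per_system must be positive")
    return math.ceil(total_processors / processors_per_system)
```

With an assumed 5,000 processors, for example, each would need to sustain 6 billion calculations per second, and 50 systems of 100 processors apiece would have to be ganged together.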
It's one of several systems in the Department of Energy's Accelerated Strategic Computing Initiative, or ASCI, which aims to push U.S. computing companies ahead faster than they otherwise would move by partially underwriting research and development expenses. The ASCI contracts awarded so far have been worth tens of millions of dollars to Intel, IBM and SGI.
"We are confident we can deliver 30 teraflops," MacKay said. "The question is whether we can deliver it when ASCI wants it."
Incumbent SGI is the company to beat in the bidding. Though in the midst of financial difficulties, SGI won an earlier contract for LANL's current fastest machine, Blue Mountain, and SGI and LANL employees worked hard to transfer the nuclear weapons simulation software from the older Cray supercomputers to the newer SGI system.
MacKay dismissed the issue of rewriting the software for a new computer architecture, saying the software is all custom-written anyway. "They're going to rewrite all that stuff from scratch or 'port' it. They don't care. They just want the flops," MacKay said. "Port" is industry jargon for adapting software to run on a different kind of computer.
Sun captured 113 spots on the most recent list of the world's top 500 computers, a new high and good for third place, after IBM and SGI. It was also a major step up from two years ago, when it had none. Sun's machines, usually found at banks, insurance companies and telecommunications firms, are all based on the E10000 design it acquired from Cray Research.
Earlier predictions that Sun would have half the systems on the supercomputer list by 2000 may not come true that soon, MacKay said. "Because of the dynamics of the chip transition [from today's UltraSparc II to next year's UltraSparc III], I'm not sure whether we're going to be more than 250 in the next year," he said.
IBM, which has won contracts for less powerful machines at Lawrence Livermore National Laboratory, isn't a bidder in the race for the 30-teraflop machine, the company has said. IBM, which rallied its designers behind the Deep Blue chess-playing computer that eventually defeated the world's top human player, will unveil a new long-term initiative Monday. The effort will tackle computational problems whose solutions would benefit the health care and pharmaceutical industries, executives have said.
In the nuclear weapons program, the Energy Department expects to fund two even more powerful systems after the 30-teraflop machine, officials have said.
The bidding process for the 30-teraflop machine allows lesser performance as a trade-off for an earlier delivery date or, conversely, a later delivery date for a more powerful system, a source familiar with the bidding process said.
Sun's bid is based on an architecture somewhat similar to IBM's current supercomputers, which are based on machines using several multi-chip nodes communicating over a high-speed interconnection system. But Sun's method involves fewer nodes with more chips each. IBM, like Sun, is working on a high-speed switch to interconnect the nodes.
The bid is based on a system that's relatively ordinary--everything in it except the high-speed switch is part of Sun's regular product line, MacKay said. "The big advantage from our standpoint is that we're doing only one unique thing for the...teraflop machine: this very high-speed, low-latency interconnect," he said.
Sun's high-performance systems use a technology called symmetrical multiprocessing, or SMP, to tie together all the processors in a single system. With SMP, all the processors talk to a single bank of memory, an approach that competitors such as SGI say has limitations: the more processors there are, the more time each one spends waiting in line to get information from memory.
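The "waiting in line" argument can be made concrete with a deliberately simplified model. The sketch below assumes, purely for illustration, that every processor asks the single memory bank for data at the same instant and that the bank serves one request at a time; the function name and the service time are hypothetical, and real SMP hardware uses caches and wider memory paths that this ignores.

```python
def average_wait(processors: int, service_time: float = 1.0) -> float:
    """Average time a processor waits before its memory request is served.

    Toy model: all processors queue at one memory bank, which handles
    one request at a time, each taking `service_time` units.
    """
    if processors <= 0:
        raise ValueError("processors must be positive")
    # The processor at position i in the line waits for the i requests ahead of it.
    waits = [i * service_time for i in range(processors)]
    return sum(waits) / processors
```

Under these assumptions the average wait grows in step with the processor count: 1.5 time units with four processors, 31.5 with 64.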
SGI instead uses a system called non-uniform memory architecture, or NUMA, for its high-end systems. In NUMA systems, smaller groups of processors each have their own smaller memory bank. Using what amounts to a high-speed internal network, each processor can access information from any of these memory banks distributed across the computer.
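The cost of reaching across that internal network can be sketched with a weighted average. The model below is an assumption-laden illustration, not a description of SGI's hardware: it supposes a fixed access time for a processor's own memory bank, a longer fixed time for any other bank, and a known fraction of accesses that stay local. All names and numbers are hypothetical.

```python
def average_access_time(local_fraction: float,
                        local_latency: float,
                        remote_latency: float) -> float:
    """Expected memory access time when only some accesses stay local.

    `local_fraction` is the share of accesses served by the processor's
    own memory bank; the rest travel over the internal network.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return (local_fraction * local_latency
            + (1.0 - local_fraction) * remote_latency)
```

If three-quarters of accesses are local at 100 time units and the rest are remote at 300, the average works out to 150. The more of a program's data that sits in the nearby bank, the closer the average comes to the local figure.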
But Sun says the drawback of NUMA is that software must be rewritten to take advantage of it. "We don't have to change the programming model of the applications that run," MacKay said. Under a software method called "clustering," software is fooled into thinking it's running on a single system when in fact it's being distributed across several interconnected machines.
Sun's clustering technology, called Full Moon, currently can accommodate up to four systems. The next version of Sun's Solaris operating system, a version of Unix currently in testing, will handle as many as eight systems.
MacKay acknowledged that the Sun clusters in use today serve chiefly to provide redundancy to protect against crashes, not to provide higher performance by sharing a load across several systems. "Today, most are for 'fail-over,' but we're starting to see an increased interest in these high-performance problems. We anticipate an uptick in people [using Full Moon] for scalability," he said.