
Everyman's supercomputer

Virginia Tech College of Engineering dean Hassan Aref explains what's behind the school's powerful cluster of Macintosh G5 desktops.

Editor's note: In response to an outpouring of comments regarding CNET News.com's recent commentary about the Virginia Polytechnic Institute and State University supercomputer, we invited the dean of the College of Engineering, which oversaw the project, to write this piece.

"Supercomputer" is a relative term. Somewhat like "superstar," it reflects the best of breed for a particular era. Just as superstars come and go, this year's supercomputer will be eclipsed a year or two down the road.

Twice a year, Top500.org publishes a list of the fastest computers in the world. The top entries always attract attention--perhaps especially so this year.

An underdog in the competition, Virginia Tech landed the No. 3 position with a machine built in fewer than three months from 1,100 dual-processor Power Mac G5 desktops, at one-fifth to one-tenth the cost of comparable machines.

All the fastest computers in the world today are massively parallel cluster machines. That is, they consist of banks of chips or computers, each with roughly the power of a desktop machine. The computers within the cluster are made to work in parallel, so a cluster of N machines can, roughly speaking, produce results N times as fast as a single machine--provided the work divides cleanly among them.
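
As a rough illustration of that idea--and not the software actually running on X--the Python sketch below splits an embarrassingly parallel job into chunks and hands one chunk to each worker; in the ideal case, N workers finish roughly N times faster than one.

```python
# A rough illustration of cluster-style parallelism (not the software used on X):
# split an embarrassingly parallel job into chunks and give one chunk to each worker.
# In the ideal case, N workers finish roughly N times faster than one.
from multiprocessing import Pool

def partial_sum(bounds):
    """Work on one chunk: sum the squares of the integers in [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 8          # problem size and number of "nodes"
    step = n // workers
    chunks = [(k * step, (k + 1) * step) for k in range(workers)]
    chunks[-1] = (chunks[-1][0], n)     # last chunk absorbs the remainder

    with Pool(workers) as pool:         # each worker stands in for one cluster node
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```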

At the top of the list is the Earth Simulator, a giant project launched in Japan in 2001. As the name suggests, the Earth Simulator was designed to produce, through computer simulation, new scientific insight into the global climate processes that affect our planet. It can complete some 36 trillion floating-point operations per second (36 teraflops).

The second entry on the Top500 list is ASCI Q, housed at the Los Alamos National Laboratory. The geopolitical role of ASCI Q is noteworthy. When the United States signed the test-ban treaty for nuclear weapons, an issue came up: How would we guarantee that our nuclear stockpile was secure and ready for use? That area of technological endeavor is known as "stockpile stewardship."

The government decided that supercomputers could do the job. The name for the activity became ASCI, the Accelerated Strategic Computing Initiative. ASCI Q performs at almost 14 teraflops, so one might argue that national security and global stability today can be safeguarded by that amount of computational power. Supercomputers of any era have typically been installed in the interests of national defense and will, inevitably, continue to be used in this way.

No. 3 on the list is Virginia Tech X, the name of my institution with the Roman numeral for 10 appended. X, as we call it, consists of a cluster of Apple Computer's G5 desktop computers, with certain technological innovations.

These include the InfiniBand communications fabric produced by Mellanox; a new cooling system and customized racks produced by Liebert; the latest release of the Apple OS X operating system; and a proprietary software package known as Deja Vu, which compensates for the inevitable glitches that occur on individual machines when a calculation runs across many of them.
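
Deja Vu itself is proprietary, but the general idea behind such fault-tolerance layers is easy to sketch: periodically save the state of a long calculation so that, when one machine in the cluster hiccups, the run can resume from the last checkpoint rather than starting over. The hypothetical Python sketch below illustrates that checkpoint-and-restart pattern; it is not Deja Vu's actual mechanism.

```python
# A hypothetical sketch of checkpoint/restart, the general idea behind
# fault-tolerance layers like Deja Vu (this is not Deja Vu's actual mechanism).
import os
import pickle

CHECKPOINT = "state.pkl"

def load_state():
    """Resume from the last saved checkpoint, or start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "partial_result": 0}

def save_state(state):
    """Write the checkpoint atomically so a crash never leaves it half-written."""
    with open(CHECKPOINT + ".tmp", "wb") as f:
        pickle.dump(state, f)
    os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

state = load_state()
while state["step"] < 1_000_000:
    state["partial_result"] += state["step"]   # stand-in for real work
    state["step"] += 1
    if state["step"] % 100_000 == 0:           # checkpoint every 100,000 steps
        save_state(state)
print(state["partial_result"])
```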

X got its name by being the first academic machine to exceed the 10-teraflop barrier. The name is also a play on OS X, the current version of Apple's Unix-based operating system. Clocked at 10.28 teraflops for the Top500 list, X may reach even higher speeds. And we at Virginia Tech hope to follow with L and, in due course, C, capable of 50 and 100 teraflops, respectively.

Just this week, Virginia Tech announced plans to migrate its cluster of Power Mac G5 desktop computers to Apple's new Xserve G5 rack-mounted servers. Xserve G5, the most powerful Xserve yet, delivers more than 15 gigaflops of peak processing power per system and features the same 64-bit processor used in the original cluster.

We are advancing our breakthrough terascale computing facility by upgrading to Apple's new 64-bit Xserve G5 cluster nodes, which should deliver even better price-performance.

These powerful machines will primarily be used for "grand challenge" problems in science and engineering, such as modeling biomolecules, global climate change and new materials; simulating turbulent flows; finding huge prime numbers; and the "virtual design" of large engineering systems. They will also provide unique educational opportunities in high-performance computing for future generations of students.

We have received inquiries about using our supercomputer from numerous government agencies and some industry groups, including Argonne National Laboratory, NASA, American Electric Power and others.

Not only is X's computational speed impressive, but its hardware cost a mere $5.2 million--a fraction of the price tags of ASCI Q and the Earth Simulator. The machine was assembled in record time, just three months.

The fourth and fifth entries on the Top500 list are at the University of Illinois' National Center for Supercomputing Applications and the Pacific Northwest National Laboratory, respectively. Otherwise rather similar to X in design and construction, they have not thus far passed the 10-teraflop benchmark, and they cost two to three times as much.

One very interesting lesson from the Top500 list is that yesterday's supercomputers have suddenly become "affordable." If, for argument's sake, we define a supercomputer to be a machine that is a thousand times faster than the average desktop computer, and if we agree that hooking up desktop computers in parallel is one way to make a supercomputer, it follows that any institution or company that can afford to set aside 1,000 desktop machines--and invest in the communications software to link them--can own a supercomputer.
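
The back-of-the-envelope arithmetic is easy to check. The figures below are illustrative assumptions, not measurements: give each desktop a sustained speed of a few gigaflops, accept that communication overhead eats part of the ideal speedup, and a thousand linked desktops still land in supercomputer territory.

```python
# Back-of-the-envelope arithmetic for the "1,000 desktops" definition above.
# The per-desktop figure and efficiency are illustrative assumptions, not measurements.
desktop_gflops = 5.0          # assumed sustained speed of one desktop, in gigaflops
nodes = 1_000                 # desktops set aside for the cluster
efficiency = 0.5              # fraction of ideal speedup left after communication overhead

cluster_gflops = desktop_gflops * nodes * efficiency
print(f"Cluster: ~{cluster_gflops / 1000:.1f} teraflops")
print(f"Speedup over one desktop: ~{cluster_gflops / desktop_gflops:.0f}x")
# ~2.5 teraflops and ~500x here; with better efficiency the factor approaches 1,000.
```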

That is extremely good news for universities and corporations and for society at large. These machines will become miniaturized, and they will become ubiquitous. A step up in commonly available computer power is to be expected.

Some are predicting a minor "revolution" in computing similar to what happened many years ago, when the VAX computer became "everyman's mainframe." Small cluster computers have already been popping up in departmental labs and within academic research groups. Now, clusters at the frontline of performance can be assembled and run anywhere, more or less. The consequences could be truly dramatic.

The great strength of universities is to show what is possible through proof-of-principle experiments and inventions. Machines like Virginia Tech X fall squarely into this category.

The challenge in bringing such proofs of principle to the marketplace generally falls to other sectors of our society, but entrepreneurial faculty often play a role. Most faculty members derive tremendous pleasure from seeing a brainchild of their research in use. Hopefully, individuals who profit from university research--and the societal stakeholders for higher education at large--will understand how to sustain this university "engine" of good ideas and innovations.