TeraGrid supercomputing project expands

The TeraGrid project to build and interconnect supercomputers gets a $35 million National Science Foundation grant to expand the "grid" beyond its original Itanium 2 designs.

A project to build and interconnect mammoth supercomputers has landed a $35 million National Science Foundation grant that will take the "grid" beyond its original Itanium 2 designs.

In 2001, an alliance of academic supercomputing sites won a $53 million NSF grant to build the Distributed Terascale Facility, known for short as the TeraGrid. Now a $35 million supplement this year will expand the TeraGrid in a project formally known as the Extensible Terascale Facility. The expansion will connect different varieties of supercomputers at five different sites, with an eye to adding more later.

Last year's design called for four computers with a total of 3,300 Itanium 2 processors, code-named McKinley, from Intel in servers built by IBM and installed at the National Center for Supercomputing Applications (NCSA), the San Diego Supercomputer Center (SDSC), the California Institute of Technology and Argonne National Laboratory. That plan has now changed: Many of the supercomputers in the TeraGrid will instead use a later version of Itanium 2, code-named Madison, that is due in 2003.

And with the new $35 million funding this year, other varieties of computer are being added to the mix--the "extensible" part of the TeraGrid project. Among them are one existing Hewlett-Packard machine with more than 2,700 Alpha processors and one new HP server using next-generation Alpha processors at the Pittsburgh Supercomputing Center.

"We're building the plans for the TeraGrid in such a way that we anticipate being able to add additional sites in future years," said Rob Pennington, who leads NCSA's grid work. "This is not intended to be a closed system. It's intended so additional sites with sufficient compute, storage and network bandwidth will be able to join."

One candidate for future admission is a new cluster of 12 IBM p690 servers at NCSA that collectively will be able to perform one trillion calculations per second. IBM announced that system earlier this month.

At Pittsburgh, the new HP system will use the company's own next-generation EV7 Alpha processors, code-named Marvel, said Rick Maier, the HP program manager who oversees the company's role in the Pittsburgh work. The new system will be fully installed in April, he said, and will be joined by another cluster of HP Itanium 2 systems running the Linux operating system.

The TeraGrid is one of the most ambitious efforts to develop "grid" computing, an emerging technology to join computers into a single vast pool of processing power and storage capacity. Grid computing is being spawned by the same government-funded academic forces that built the Internet.

Large computing companies--including IBM, HP and Sun Microsystems--are getting involved in grids, while a host of smaller software companies are sprouting up with software to govern how jobs run across grids. Among them are Avaki, Platform Computing, United Devices, Axceleon and Entropia.

One key part of grid work is an open-source project called the Globus Toolkit, which helps with tasks such as discovering what processing resources are available on a grid and deciding what jobs have permission to use them.

In total, the TeraGrid computers will be able to perform 20 trillion calculations per second, or 20 teraflops. But while part of the point of the project is to figure out how best to share single computing jobs and data storage across different elements of the grid, most computing jobs probably won't span the entire collection.

"Using the entire set of computational clusters will be the application team's goal, but that's probably not going to be the primary way people are going to use it," Pennington said.

At the SC2002 supercomputing show last week, TeraGrid representatives demonstrated several aspects of the system, including 32 dual-processor Itanium 2 servers from IBM. Other demonstrations used the system for climate change modeling, high-speed access to data stored at remote sites, and creation of animations of three-dimensional data such as molecular structures.

The TeraGrid also uses network capacity from Qwest Communications, storage networking switches from Brocade Communications and networking routers from

The current fastest single supercomputer in the world is NEC's Earth Simulator in Tokyo, with performance of 36 teraflops; IBM has just won a $290 million contract to build a 100-teraflop machine and a 360-teraflop machine.
