IBM, Nvidia land $325M supercomputer deal

US Energy Department funds two huge machines that combine IBM and Nvidia chips with Mellanox networking. A further $100 million goes toward making faster next-gen supercomputers.

Stephen Shankland Former Principal Writer

This rendering shows a few of the cabinets that ultimately will make up IBM's Sierra supercomputer at Lawrence Livermore National Laboratory. IBM

In a Department of Energy deal worth $325 million, IBM will build two massive supercomputers called Sierra and Summit that combine a new supercomputing approach from Big Blue with Nvidia processing accelerators and Mellanox high-speed networking.

The companies and US government agency announced the deal on Friday ahead of a twice-yearly supercomputing conference that begins Monday. The show focuses on the high-end systems -- sometimes as large as a basketball court -- that are used to calculate car aerodynamics, detect structural weaknesses in airplane designs and predict the performance of new drugs.

The funds will pay for two machines, one for civilian research at the Oak Ridge National Laboratory in Tennessee and one for nuclear weapons simulation at the Lawrence Livermore National Laboratory in California. They'll each clock in with a peak performance surpassing 100 petaflops, or 100 quadrillion calculations per second, as measured in the Top500 list that ranks the world's fastest machines. Trying to do that with modern laptops would take something like 3 million of them, Nvidia estimates.
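As a rough check on that comparison, the arithmetic can be sketched in a few lines of Python. The two input figures come from the paragraph above; the function name is only illustrative, and the per-laptop number is simply what those two estimates imply when divided evenly, not a measured benchmark.

```python
PETAFLOP = 10**15  # one quadrillion floating-point operations per second


def implied_flops_per_laptop(peak_petaflops, laptop_count):
    """Return the per-laptop speed implied by dividing a peak rate evenly across laptops."""
    return peak_petaflops * PETAFLOP / laptop_count


# Figures quoted in the article: 100 petaflops and about 3 million laptops.
per_laptop = implied_flops_per_laptop(100, 3_000_000)
print(f"{per_laptop / 1e9:.1f} gigaflops per laptop")
```

In other words, Nvidia's estimate works out to roughly 33 gigaflops for each laptop.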

In addition, the DOE will spend about $100 million on a program called FastForward2 to make next-generation, massive-scale supercomputers 20 to 40 times faster than today's high-end models, Energy Secretary Ernest Moniz was scheduled to announce Friday. It's all part of a project called Coral after the national labs involved: Oak Ridge, Argonne and Lawrence Livermore.

"We expect that critical supercomputing investments like Coral and FastForward2 will again lead to transformational advancements in basic science, national defense, environmental and energy research that rely on simulations of complex physical systems and analysis of massive amounts of data," Moniz said in a statement.

Supercomputing progress faltering?

The deal is a lucrative feather in the cap for the companies. IBM will build the overall system using a design that marries main processors from its own Power family with Volta accelerators from Nvidia. IBM has decades of experience in high-performance computing, but Nvidia, most of whose revenue comes from graphics chips to speed up video games, is a relative newcomer.

The world is accustomed to steady increases in computing power, but supercomputing progress has slowed in recent years. No longer do processor clock speeds conveniently ratchet up to higher gigahertz levels each year, and the constraints of funding, equipment cooling and electrical power consumption are formidable.

To tackle the problem, IBM is adopting a supercomputing approach it calls data-centric design. The general idea is to distribute processing power so it's close to data storage areas, reducing the performance and energy-consumption problems associated with moving data around a system.

"At the individual compute element level we continue the Von Neumann approach," IBM said of its design, referring to the traditional computer architecture that combines a central processor and memory. "At the level of the system, however, we are providing an additional way to compute, which is to move the compute to the data."

Modern architecture

The system encompasses relatively new computing trends, including flash-memory storage that's faster but more expensive than hard drives, and the graphics processing unit (GPU) boost from Nvidia. Such accelerators aren't as versatile as general-purpose central processing units, but they can solve particular types of math problems faster. That's why accelerators from Nvidia, AMD and Intel have found a place in supercomputing systems.

"This is a huge endorsement for the Tesla GPU accelerator platform," said Sumit Gupta, general manager of Nvidia's Tesla accelerated computing business. "To be able to build up these large systems, you need the energy efficiency that GPU accelerators provide."

One big problem with systems that include both CPUs and GPUs is getting data where it belongs. CPUs generally run the show, offloading some work to GPUs, but to do so, they have to transfer data from CPU memory to GPU memory. To speed that up, Nvidia offers its NVLink interconnect, which IBM said handles that transfer five to 12 times faster than today's technology.
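To get a feel for what a five- to twelvefold speedup means for that CPU-to-GPU copy, here is a minimal Python sketch. Only the 5x and 12x multipliers come from IBM's claim; the baseline link speed and the amount of data are hypothetical values picked for illustration, and the model ignores latency and other overhead.

```python
def transfer_seconds(gigabytes, gigabytes_per_second):
    """Return the ideal time to move the data, ignoring latency and overhead."""
    return gigabytes / gigabytes_per_second


BASELINE_GBPS = 16.0  # hypothetical baseline link speed in GB/s, not a figure from IBM or Nvidia
DATA_GB = 64.0        # hypothetical amount of data copied from CPU memory to GPU memory

baseline = transfer_seconds(DATA_GB, BASELINE_GBPS)
for speedup in (5, 12):  # the range IBM cited for NVLink
    faster = transfer_seconds(DATA_GB, BASELINE_GBPS * speedup)
    print(f"{speedup}x link: {faster:.2f} s instead of {baseline:.2f} s")
```

Under those assumptions, a copy that takes 4 seconds on the baseline link would take between about a third of a second and just under a second.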

Another key player in the system is Mellanox, which is supplying high-speed networking equipment using the InfiniBand standard to rapidly shuttle data around the system.