Intel is trying to overhaul the supercomputer. The idea is to pack more processing power into less space. The 50-plus core Knights Corner processor is how Intel hopes to make it happen.
Let's be clear. It's not that Intel is necessarily losing the supercomputer race--its Xeon processors still power the vast majority of the world's supercomputers--but supercomputing is changing. The chip giant's arch rival, Nvidia, is gaining ground with supercomputers that increasingly rely on its graphics processing units (GPUs) to do calculations more efficiently.
Intel's Knights Corner processor is slated to land in a supercomputer at the Texas Advanced Computing Center (TACC) at the University of Texas at Austin by 2013. The new system, called Stampede, will be built by TACC in partnership with Dell and Intel. When completed, Stampede will house several thousand Dell "Zeus" servers, each with dual 8-core Intel Xeon processors. This production system will offer almost 2 petaflops of peak performance.
But that's not all. Importantly, the supercomputer will also include Knights Corner chips--built on Intel's latest 22-nanometer 3D transistor process--providing an additional 8 petaflops of performance (for a total of 10 petaflops). That's an extremely important number because the fastest supercomputer in the world today does about 10 petaflops. (Also note that Intel is quoting double-precision floating-point performance, an important metric.)
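The split between the two kinds of chips is easier to see as arithmetic. The sketch below takes the two planned figures quoted above (rounding "almost 2 petaflops" up to 2.0) and computes each component's share of Stampede's peak. The function name `peak_shares` is hypothetical, chosen for illustration only.

```python
# Minimal sketch: share of a system's peak performance contributed by each
# component. Inputs are the planned peaks quoted in the article, in petaflops;
# "almost 2 petaflops" for the Xeon hosts is rounded to 2.0 (an assumption).

def peak_shares(components: dict[str, float]) -> dict[str, float]:
    """Return each component's fraction of the combined peak."""
    total = sum(components.values())
    return {name: pf / total for name, pf in components.items()}

stampede = {"Xeon hosts": 2.0, "Knights Corner": 8.0}
total_pf = sum(stampede.values())
shares = peak_shares(stampede)

print(f"total peak: {total_pf:.0f} petaflops")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of peak")
```

On these numbers, about 80 percent of the planned peak would come from the Knights Corner chips.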
I talked with James Reinders, Intel's resident expert in the area of parallelism. He helped design the world's first teraflop (trillion floating-point operations per second) supercomputer, ASCI Red, which was deployed at Sandia National Laboratories in 1996. Knights Corner essentially condenses all of that into one chip.
Question: What is Knights Corner? Where did it come from?
Reinders: They're modified Pentium-era cores. What we've done is go back to a simpler design that's more power efficient. When you know that people are going to use your parallelism, a simpler core is a power-saving opportunity. Modern processors have moved away from that because of more diverse workloads.
What is the connection to the Larrabee project?
Reinders: I hope that it sounds just like Larrabee. It's essentially the same folks working on it. The architecture wasn't the challenge that we ran into with the Larrabee project. The idea that you put many X86 cores all on a single die (piece of silicon) and that you can get great performance out of that--that [idea] wasn't a mistake. (X86 is the architecture used in all Microsoft Windows PCs today; Intel argues that this makes supercomputers easier to program because the tools are already widely used.)
The Knights Ferry prototype cards (Knights Ferry is an older design composed of 30 or 32 cores)--that's a Larrabee, though a slightly modified Larrabee. Knights Corner is a new chip but inherits a lot of the same design philosophies. It's a changed Larrabee, but it still has its roots in Larrabee.
Why didn't Larrabee fly?
Reinders: Larrabee was two things. One was putting a whole lot of programmable cores in one place. The other was a decision to sell it as a high-end graphics card. We didn't get to the market with the right product at the right time. When we revisited it and thought about it, we decided that there was a much better opportunity to put our efforts into data parallelism. If you look at the graphics we've got now in our Sandy Bridge processors, they've gotten better faster than we originally thought they would. As a result, the plug-in graphics card business isn't growing in a big way.
So, is Knights Corner your answer to Nvidia's efforts?
Reinders: It's fair to draw that comparison, but it's not how we would define it. They [Nvidia's chips] were designed to be graphics processors and turned out to have applicability in that [supercomputing] space. We've taken the approach of trying to design a programmable device specifically for that purpose. It's very different at the hardware level. When Intel talks about cores, we mean general-purpose X86 cores, and that's what Knights Corner has in it. When GPU manufacturers talk about having a lot of threads, or talk about cores, it's more a hardware pipeline that they're talking about. The thing that makes our device programmable is that it has real X86 cores, so it's programmable as X86.
Can you elaborate on the differences more?
Reinders: OK, take scalability and vectorization. You have to deal with scaling--writing a program that runs faster if you offer it more cores. The other thing is vectorization. It's a kind of parallelism within an individual core: each instruction operates on more data. Nobody can get away from this. Every device on the planet has to expect that something in the programming or the tools will grab that parallelism.
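The two kinds of parallelism Reinders describes can be sketched in a few lines. This is a hedged illustration, not Intel's tooling. The chunked loop in `vector_sum` stands in for vector instructions, with a default `width=8` that assumes eight double-precision lanes (as in a 512-bit vector unit). `amdahl_speedup` applies Amdahl's law, a standard scaling model that is not mentioned in the interview, to show why a program only scales well when nearly all of its work is parallel. Python does not emit SIMD instructions here; the lane loop only models how much data each instruction would touch.

```python
# Hedged sketch of the two kinds of parallelism described above.
# Assumption: a vector width of 8 doubles (a 512-bit vector unit).
# Python does not emit SIMD instructions here; each pass of the outer
# loop only *models* one vector instruction processing `width` elements.

def vector_sum(data: list[float], width: int = 8) -> tuple[float, int]:
    """Sum `data` in lane-sized chunks; return (sum, modeled vector ops)."""
    lanes = [0.0] * width  # one partial sum per vector lane
    ops = 0
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # This inner loop models a single vector add across all lanes.
        for lane, value in enumerate(chunk):
            lanes[lane] += value
        ops += 1
    return sum(lanes), ops

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` cores when only part of the work is parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

total, ops = vector_sum([1.0] * 100)
print(total, ops)  # 100 elements in 8-wide chunks: 100.0 13
for p in (0.90, 0.99):
    print(f"{p:.0%} parallel on 50 cores: {amdahl_speedup(p, 50):.1f}x")
```

Under this model, even a program that is 99 percent parallel gets only about a 34x speedup on 50 cores, so scaling and vectorization both have to be addressed to use a many-core chip well.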
The second thing: GPGPUs [general-purpose GPUs] and Knights Ferry [the Knights Corner prototype system] present themselves not as the central processor but as an attached processor. You have to get data in and out of them, and that attached-processor scenario brings some additional challenges.
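The cost of the attached-processor model comes down to simple arithmetic: offloading pays only when the time saved on the device exceeds the time spent moving data in and out. The sketch below is a back-of-the-envelope model under stated assumptions. The function names and every number in the example (a 6 GB/s link, 2 GB in, 1 GB out, the compute times) are hypothetical, not measurements of Knights Ferry, Knights Corner, or any GPU.

```python
# Back-of-the-envelope offload model for an attached processor.
# All inputs are hypothetical. Real systems can overlap transfers with
# compute, which this simple serial model ignores.

def offload_time(bytes_in: float, bytes_out: float,
                 link_bytes_per_s: float, device_compute_s: float) -> float:
    """Seconds to copy data to the device, compute there, and copy results back."""
    transfer_s = (bytes_in + bytes_out) / link_bytes_per_s
    return transfer_s + device_compute_s

def offload_pays(host_compute_s: float, bytes_in: float, bytes_out: float,
                 link_bytes_per_s: float, device_compute_s: float) -> bool:
    """True if offloading beats running the same work on the host."""
    return offload_time(bytes_in, bytes_out, link_bytes_per_s,
                        device_compute_s) < host_compute_s

GB = 1e9
# Hypothetical job: 2 GB in, 1 GB out, over an assumed 6 GB/s link.
t = offload_time(2 * GB, 1 * GB, 6 * GB, device_compute_s=0.5)
print(f"offload total: {t:.2f} s")  # 0.50 s transfer + 0.50 s compute = 1.00 s
print(offload_pays(4.0, 2 * GB, 1 * GB, 6 * GB, 0.5))  # long host run: offload wins
print(offload_pays(0.5, 2 * GB, 1 * GB, 6 * GB, 0.1))  # short host run: transfers dominate
```

In the last case the device computes five times faster than the host, yet offloading still loses because the 0.5 seconds of data movement outweighs the compute savings.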
After you've dealt with understanding how your program scales, how it vectorizes, and how it moves data in and out of the device, you need to program the device itself to do something. And that's where we're vastly different, because we took the approach of putting more than 50 X86 cores together.
But Nvidia's GPGPU approach is obviously a viable alternative, considering the increasing number of high-profile installations?
Reinders: The GPU's approach comes from shader-pipeline graphics, and that's being repurposed for the algorithms that map onto it. When you say they're becoming more programmable, they really can't do arbitrarily different things. They're kind of locked together, but you add a little bit of independence.
You kind of have two philosophies. You have very independent cores of an X86 multicore. Or you start with something that's very locked together but simpler and try to make it more generally programmable.
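The difference between "locked together" lanes and independent cores shows up most clearly at a branch. The toy model below is an assumption-laden illustration, not a description of any specific Nvidia or Intel hardware: lanes that execute in lockstep must step through every branch path that any of them takes (with the other lanes masked off), while independent cores each run only their own path. Step counts stand in for time; the path lengths in `PATH_STEPS` and the function names are hypothetical.

```python
# Toy model of branch divergence: lockstep lanes vs. independent cores.
# Each work item takes one of two branch paths; path lengths (in steps) are
# hypothetical. This is an illustration, not a model of real hardware.

PATH_STEPS = {"even": 10, "odd": 40}  # hypothetical cost of each branch path

def branch_taken(item: int) -> str:
    return "even" if item % 2 == 0 else "odd"

def lockstep_steps(items: list[int]) -> int:
    """Lanes share one instruction stream: every path any lane needs is run
    once for the whole group, with non-participating lanes masked off."""
    paths_needed = {branch_taken(i) for i in items}
    return sum(PATH_STEPS[p] for p in paths_needed)

def independent_steps(items: list[int]) -> int:
    """Each core (one item per core) follows only its own path; the group
    finishes when the slowest core does."""
    return max(PATH_STEPS[branch_taken(i)] for i in items)

uniform = [0, 2, 4, 6]    # every item takes the same path
divergent = [0, 1, 2, 3]  # items split across both paths
print(lockstep_steps(uniform), independent_steps(uniform))      # 10 10
print(lockstep_steps(divergent), independent_steps(divergent))  # 50 40
```

With uniform work the two designs tie; the lockstep penalty appears only when items diverge. Whether that penalty outweighs the power savings of simpler lanes is the trade-off Reinders goes on to describe.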
The real debate becomes what you believe will deliver the best power-performance: how much performance you can get per watt. A lot of people find appeal in the idea of something that's less programmable, where each element is simpler, with the feeling that this is lower power (more power efficient). My feeling is that the extra programmability can be made efficient enough.
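One way to frame the power-performance question is sustained throughput per watt: peak FLOPS times the fraction of peak an application actually achieves, divided by power. In the hypothetical comparison below, every figure is invented for illustration and describes no real product. It shows how a simpler device with a higher peak and lower power draw can still deliver less useful work per watt if real code keeps it far from its peak, which is the essence of Reinders' argument that extra programmability can pay for itself.

```python
# Sustained performance per watt = peak * utilization / power.
# All device figures below are hypothetical.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    peak_gflops: float  # theoretical peak
    utilization: float  # fraction of peak a given application sustains (0-1)
    watts: float        # power draw while running it

    def sustained_gflops_per_watt(self) -> float:
        return self.peak_gflops * self.utilization / self.watts

simple = Device("simpler lanes", peak_gflops=1300, utilization=0.30, watts=200)
general = Device("general-purpose cores", peak_gflops=1000, utilization=0.60, watts=225)

for d in (simple, general):
    print(f"{d.name}: {d.sustained_gflops_per_watt():.2f} GFLOPS/W")
```

Swap in a workload that keeps both devices near peak and the simpler design wins instead; the answer depends on the utilization term, which is why the debate is unsettled.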
What's the future of Knights Corner? A supercomputer containing only these many-core processors?
Reinders: Yes, we're definitely on that path. Knights Corner is doing a teraflop. I was one of the designers of ASCI Red, the first teraflop computer on the planet. It had more than 9,000 processors. When you shrink all of that onto one chip, you still need to surround it with memory and I/O and other things, but you can't touch the individual cores the same way we could on a 9,000-processor machine. There are some challenges there, and those get worked out over time. So, in the future we'll have a cluster of these that will be more power efficient. We haven't announced a product that does that, but it's certainly an expectation that we'll fulfill eventually.
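Where a "teraflop on one chip" figure comes from can be shown with the usual peak-FLOPS formula: cores × clock rate × floating-point operations per core per cycle. The inputs below are assumptions, not specifications given in this interview: 60 cores (consistent with "more than 50"), a 1.05 GHz clock, and 16 double-precision operations per cycle (8 vector lanes, each doing a fused multiply-add counted as two operations).

```python
# Peak FLOPS = cores * clock (Hz) * floating-point ops per core per cycle.
# The example inputs are assumptions chosen for illustration.

def peak_flops(cores: int, clock_hz: int, flops_per_cycle: int) -> int:
    return cores * clock_hz * flops_per_cycle

# Assumed: 8 double-precision lanes x 2 ops per fused multiply-add = 16 ops/cycle.
DP_FLOPS_PER_CYCLE = 8 * 2

chip = peak_flops(cores=60, clock_hz=1_050_000_000, flops_per_cycle=DP_FLOPS_PER_CYCLE)
print(f"{chip / 1e12:.3f} teraflops peak")  # 1.008 teraflops peak
```

Peak is an upper bound; sustained results on real workloads are lower, and how much lower depends on how well the code scales, vectorizes, and moves data.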
I don't know of any operating systems that boot on GPUs. When we demonstrated Knights Corner (on Tuesday) we had a Linux operating system running on the chip. Is part of it running on the Xeon? No, actually the full Linux operating system is running on the Knights Corner chip.