PALO ALTO, Calif.--Chip designers at Japan's RIKEN say you can get a lot done by specializing.
RIKEN, an abbreviation of the Japanese name of Japan's Institute of Physical and Chemical Research, on Tuesday described the MDGrape 3, a processor it believes will become the cornerstone of a computer capable of operating at a petaflop, or a quadrillion operations per second. That is far faster than the 36 trillion operations per second reached by today's fastest supercomputers.
Samples of the chip, which was designed for life sciences research, can now perform 230 gigaflops, or 230 billion operations per second, while running at 350MHz, outpacing standard general-purpose chips. In a worst-case scenario, the chip performs 160 gigaflops at 250MHz, said Makoto Tanji, a researcher with RIKEN's high-performance computing group. Tanji spoke at the Hot Chips conference taking place at Stanford University.
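Dividing the quoted throughput by the clock rate gives a rough sense of how much work the chip finishes on every clock tick. This minimal sketch uses only the figures above; the helper name ops_per_cycle is illustrative.

```python
# Throughput and clock figures quoted by RIKEN for MDGrape 3 samples.
BEST_CASE = (230e9, 350e6)   # (operations per second, clock rate in Hz)
WORST_CASE = (160e9, 250e6)


def ops_per_cycle(ops_per_second: float, clock_hz: float) -> float:
    """Operations the whole chip completes on each clock cycle."""
    return ops_per_second / clock_hz


for label, (ops, clock) in (("best case", BEST_CASE), ("worst case", WORST_CASE)):
    print(f"{label}: about {ops_per_cycle(ops, clock):.0f} operations per cycle")
```

Both cases work out to roughly 640 to 660 operations per cycle, far more than the few floating-point operations per cycle a general-purpose processor of the time could issue. That is how a 350MHz part can outrun faster-clocked chips on this kind of work.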
The computational power comes, he said, from specializing the chip for workloads that involve numerous similar calculations on a comparatively small set of data. This sort of workload is common in the life sciences and bio-nanotechnology fields, where researchers need to examine, for example, how a single protein interacts with thousands of different molecules. As a result, the chip and the computers based on it can be compared directly with general-purpose supercomputers only in a limited range of applications, but within that range the processor excels.
"We can obtain about a 100 times better performance through specialization. The number of operations are more limited on a general purpose computer," Tanji said. For the MDGrape 3 to shine, "the amount of computation must be much larger than the data," he added.
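Tanji's condition that computation must far outweigh data fits the way many molecular dynamics codes compute forces. In the simplest, direct method every pair of particles interacts, so the data grows linearly with the number of particles while the work grows roughly with its square. This sketch assumes that all-pairs method with no distance cutoff, a simplification; the function name is hypothetical.

```python
def pair_interactions(n_particles: int) -> int:
    """Number of distinct particle pairs in a direct all-pairs force calculation."""
    return n_particles * (n_particles - 1) // 2


for n in (1_000, 10_000, 100_000):
    pairs = pair_interactions(n)
    # Interactions per particle of input data, (n - 1) / 2, grow linearly with n.
    print(f"{n:>7,} particles: {pairs:>13,} pair interactions, {pairs / n:>8,.1f} per particle")
```

Ten times as many particles means ten times as much data but roughly a hundred times as many interactions, which is the regime in which a specialized chip can keep its pipelines busy.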
The University of Tokyo initiated the MDGrape project 15 years ago to develop a chip for astrophysics. RIKEN, which is one of the world's largest biosciences institutes, has worked over the last several years to extend the chip's architecture to life sciences and molecular dynamics because the range of applications is wider, Tanji explained. The group will create computers based on the chip for its Protein 3000 project to determine the characteristics of 3,000 proteins. Those machines should appear sometime in 2007.
Commercial systems using the MDGrape 2, which runs at 100MHz and delivers 16 gigaflops, are currently on the market, Tanji said. Work on the MDGrape 3, also known as the Protein Explorer, began in 2002, and the chip should start running applications in 2006.
Research also continues at the University of Tokyo to develop a quasi-general-purpose chip capable of 1 teraflop, or a trillion calculations a second. IBM and the University of Texas have a similar teraflop-on-a-chip project.
Architecturally, the MDGrape 3 differs substantially from most other chips. It has 20 calculation pipelines, a pipeline being the processor equivalent of an assembly line; commercial chips typically have one or two. The chip also uses what RIKEN calls a broadcast memory architecture, in which the same data is fed to all of the pipelines simultaneously. The design is optimized for parallelization, the practice of splitting a workload so that many calculations run at the same time.
Despite these differences, the MDGrape 3 is made on a 130-nanometer process, a manufacturing technology that has been in mainstream use for the past few years.
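A broadcast memory can be pictured with a toy model. Assume a particle-style workload in which each pipeline holds one target particle and accumulates the force on it, while every source particle is read from memory once and sent to all pipelines at the same time. The Pipeline class, the broadcast_forces function and the softened one-dimensional force law below are hypothetical illustrations, not RIKEN's actual design, and the inner loop runs serially where the hardware would act in parallel.

```python
from dataclasses import dataclass

NUM_PIPELINES = 20  # pipeline count quoted for MDGrape 3


@dataclass
class Pipeline:
    """Hypothetical force pipeline: one target particle plus a force accumulator."""
    position: float
    force: float = 0.0

    def accumulate(self, src_position: float, src_charge: float, softening: float = 1e-3) -> None:
        # Softened inverse-square force in one dimension (illustrative only).
        dx = src_position - self.position
        r2 = dx * dx + softening
        self.force += src_charge * dx / r2 ** 1.5


def broadcast_forces(targets: list[float], sources: list[tuple[float, float]]) -> list[float]:
    """Load targets in batches of NUM_PIPELINES, then broadcast every source to the batch."""
    forces: list[float] = []
    for start in range(0, len(targets), NUM_PIPELINES):
        pipelines = [Pipeline(x) for x in targets[start:start + NUM_PIPELINES]]
        for src_position, src_charge in sources:  # each source is fetched once per batch...
            for pipe in pipelines:                 # ...and the same value goes to every pipeline
                pipe.accumulate(src_position, src_charge)
        forces.extend(pipe.force for pipe in pipelines)
    return forces
```

With 20 pipelines, each source value fetched from memory feeds 20 force calculations, so memory traffic per calculation falls by the same factor. That is one way such a design can deliver high throughput when, as Tanji put it, the computation is much larger than the data.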
These design choices give the chip a large cost advantage over general-purpose processors. Tanji said the 350MHz MDGrape 3 can provide a gigaflop of computing power for $15, compared with $400 per gigaflop for a Pentium 4, $640 per gigaflop for the chips inside IBM's Blue Gene/L and a whopping $4,000 per gigaflop for NEC's Earth Simulator, currently the world's most powerful supercomputer.
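The cost comparison is easier to read as ratios. This sketch divides each quoted dollars-per-gigaflop figure by MDGrape 3's; it covers only the numbers Tanji gave and says nothing about the cost of a complete system. The names are illustrative.

```python
# Dollars per gigaflop as quoted by Tanji.
COST_PER_GIGAFLOP = {
    "MDGrape 3 (350MHz)": 15,
    "Pentium 4": 400,
    "Blue Gene/L chip": 640,
    "Earth Simulator": 4_000,
}
BASELINE = "MDGrape 3 (350MHz)"


def cost_ratio(system: str) -> float:
    """How many times more a gigaflop costs on `system` than on MDGrape 3."""
    return COST_PER_GIGAFLOP[system] / COST_PER_GIGAFLOP[BASELINE]


for system, dollars in COST_PER_GIGAFLOP.items():
    print(f"{system:<20} ${dollars:>5,}/gigaflop  {cost_ratio(system):6.1f}x")
```

By these figures a gigaflop costs about 27 times as much on a Pentium 4 and roughly 270 times as much on the Earth Simulator.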
In terms of power consumption, the 350MHz MDGrape 3 consumes 14 watts, or 0.1 watts per gigaflop. A 3GHz Pentium 4 draws 82 watts, or 14 watts per gigaflop, he said. The Blue Gene/L chip and the Earth Simulator come in at 6 and 128 watts per gigaflop, respectively, he said.
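The same ratio treatment applies to power. This sketch assumes, as the comparison implies, that the Blue Gene/L and Earth Simulator figures are watts per gigaflop like the others; the names are illustrative.

```python
# Watts per gigaflop as quoted by Tanji.
WATTS_PER_GIGAFLOP = {
    "MDGrape 3 (350MHz)": 0.1,
    "Pentium 4 (3GHz)": 14.0,
    "Blue Gene/L chip": 6.0,   # assumed to be watts per gigaflop
    "Earth Simulator": 128.0,  # assumed to be watts per gigaflop
}
REFERENCE = "MDGrape 3 (350MHz)"


def power_ratio(system: str) -> float:
    """How many times more power per gigaflop `system` draws than MDGrape 3."""
    return WATTS_PER_GIGAFLOP[system] / WATTS_PER_GIGAFLOP[REFERENCE]


for system, watts in WATTS_PER_GIGAFLOP.items():
    print(f"{system:<20} {watts:>6} W/gigaflop  {power_ratio(system):7.0f}x")
```

On these numbers the Pentium 4 uses about 140 times as much power per gigaflop as MDGrape 3, and the Earth Simulator about 1,280 times as much.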
RIKEN is also designing the computer that will house the MDGrape 3. Twelve chips will fit on a board, and two boards will fit into a 2U-high (3.5-inch) box. The chips are connected to one another through an 81-bit bus, and the boards connect to the rest of the computer through PCI Express.
The petaflop computer will consist of 6,144 processors on 512 boards clustered together. In all, the system will fit into 32 standard 19-inch racks.
"It is very small," Tanji said.
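Multiplying the per-chip figures quoted earlier by the chip count shows how the design arrives at a petaflop. This sketch assumes every chip sustains its quoted rate, which real applications rarely do; the names are illustrative.

```python
CHIPS = 6_144
CHIPS_PER_BOARD = 12
GFLOPS_PER_CHIP = {"peak (350MHz)": 230, "worst case (250MHz)": 160}

boards = CHIPS // CHIPS_PER_BOARD
assert boards * CHIPS_PER_BOARD == CHIPS  # chips divide evenly onto boards


def system_petaflops(gflops_per_chip: float) -> float:
    """Aggregate rate if every chip runs at `gflops_per_chip` (1 petaflop = 1e6 gigaflops)."""
    return CHIPS * gflops_per_chip / 1e6


for label, gflops in GFLOPS_PER_CHIP.items():
    print(f"{label}: {boards} boards, {system_petaflops(gflops):.2f} petaflops")
```

At the 230-gigaflop sample rate the 6,144 chips add up to about 1.4 petaflops, and even at the worst-case 160 gigaflops they come to just under one petaflop.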