Intel has disclosed details on a chip that will compete directly with Nvidia and ATI and may take the company into uncharted technological and market-segment waters.
Larrabee will be a stand-alone chip, which makes it very different from the low-end, but widely used, integrated graphics that Intel now offers as part of the silicon that accompanies its processors. And Larrabee will be based on Intel's ubiquitous x86 architecture.
The first Larrabee product will be "targeted at the personal computer market," according to Intel. This means the PC gaming market--putting Nvidia and AMD-ATI directly into Intel's sights. Nvidia and AMD-ATI currently dominate the market for "discrete" or stand-alone graphics processing units.
As Intel sees it, Larrabee combines the best attributes of a central processing unit (CPU) with those of a graphics processor. "The thing we need is an architecture that combines the full programmability of the CPU with the kinds of parallelism and other special capabilities of graphics processors. And that architecture is Larrabee," Larry Seiler, a senior principal engineer in Intel's Visual Computing Group, said at a briefing on Larrabee in San Francisco last week.
"It is not a GPU as many have mistakenly described it, but it can do most graphics functions," Jon Peddie of Jon Peddie Research said in an article he posted Friday about Larrabee.
"It looks like a GPU and acts like a GPU but actually what it's doing is introducing a large number of x86 cores into your PC," said Intel spokesperson Nick Knupffer, alluding to the myriad ways Larrabee could be used beyond graphics processing. Beyond the PC, Intel also mentioned high-performance computing and workstations as potential markets.
Intel describes it in a statement as "the industry's first many-core x86 Intel architecture." The chipmaker currently offers quad-core processors and will offer eight-core processors based on its Nehalem architecture, but Larrabee is expected to have dozens of cores and, later, possibly hundreds.
The number of cores in each Larrabee chip may vary by market segment. Intel showed a slide with core counts ranging from 8 to 48 and claimed that performance scales almost linearly as cores are added: that is, 16 cores should offer roughly twice the performance of eight.
The individual cores in Larrabee are derived from the Intel Pentium processor, and "then we added 64-bit instructions and multi-threading," Seiler said. Each core has 256 kilobytes of level-2 cache, so the total cache size scales with the number of cores, according to Seiler. And standard application programming interfaces (APIs) such as Microsoft's DirectX and Apple's OpenCL can be used. "Larrabee does not require a special API. Larrabee will excel on standard graphics APIs," he said. "So existing games will be able to run on Larrabee products."
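What "almost linear" scaling means can be made concrete with a little arithmetic. The sketch below is not Intel's model: it assumes Amdahl's law and a hypothetical 0.1 percent of work that cannot run in parallel, and compares that against perfectly linear scaling relative to an eight-core part.

```python
# Hypothetical model: Amdahl's law is our assumption here, not Intel's method.
def amdahl_speedup(cores, serial_fraction):
    """Speedup over one core when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def speedup_vs_baseline(cores, serial_fraction, baseline=8):
    """Performance of a `cores`-core part relative to a `baseline`-core part."""
    return amdahl_speedup(cores, serial_fraction) / amdahl_speedup(baseline, serial_fraction)


for cores in (8, 16, 32, 48):
    ideal = cores / 8  # perfectly linear scaling from the 8-core baseline
    near = speedup_vs_baseline(cores, serial_fraction=0.001)  # hypothetical 0.1% serial work
    print(f"{cores:2d} cores: ideal {ideal:.2f}x, with 0.1% serial work {near:.2f}x")
```

Even a tiny serial fraction pulls results slightly below the ideal line, which is why "almost linear" is the honest phrasing for a claim like 16 cores doubling eight.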
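Because the level-2 cache is allocated per core, total capacity is simply the core count multiplied by 256 KB. The snippet below works that out across the 8-to-48-core range from Intel's slide; the intermediate counts (16, 32) are illustrative picks, not announced products.

```python
L2_PER_CORE_KB = 256  # per-core L2 size quoted by Seiler


def total_l2_kb(cores):
    """Aggregate L2 capacity grows in lockstep with the core count."""
    return cores * L2_PER_CORE_KB


for cores in (8, 16, 32, 48):  # 8 and 48 bound the range on Intel's slide
    kb = total_l2_kb(cores)
    print(f"{cores} cores -> {kb} KB ({kb // 1024} MB) of L2")
```

So an 8-core part would carry 2 MB of L2 in total and a 48-core part 12 MB.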
So, what is Larrabee's market potential? Today, the graphics chip market is approaching 400 million units a year and has consolidated into a handful of suppliers. "And of that population, two suppliers, ATI and Nvidia, own 98 percent of the discrete GPU business," according to Peddie.
"And the trend line indicates a flattening to decline in the business... However, Intel is no lightweight startup, and to enter the market today a company has to have a major infrastructure, deep IP (intellectual property), and marketing prowess; Intel has all that and more," Peddie said.
More details will be provided at Siggraph 2008, but Intel has outlined some key Larrabee features:
Larrabee programming model: supports a variety of highly parallel applications, including those that use irregular data structures. This enables development of graphics APIs, rapid innovation of new graphics algorithms, and true general-purpose computation on the graphics processor with established PC software development tools.
Software-based scheduling: Larrabee performs task scheduling entirely in software rather than in fixed-function logic. Therefore, rendering pipelines and other complex software systems can adjust their resource scheduling based on each workload's unique computing demands.
Execution threads: Larrabee architecture supports four execution threads per core, with a separate register set for each thread. This allows the use of a simple, efficient in-order pipeline while retaining many of the latency-hiding benefits of more complex out-of-order pipelines when running highly parallel applications.
Ring network: Larrabee uses a 1024-bit-wide, bidirectional ring network (i.e., 512 bits in each direction) that lets agents communicate with each other with low latency, resulting in very fast communication between cores.
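To illustrate the software-based scheduling item above: when scheduling is ordinary code, the policy can be changed per workload. The sketch below is a generic work-queue scheduler, not Larrabee's actual scheduler; `run_tasks` and the largest-cost-first policy are hypothetical choices. Python threads share one interpreter lock, so this shows scheduling logic, not a real parallel speedup.

```python
import queue
import threading


def run_tasks(tasks, num_workers=4):
    """Run (estimated_cost, fn) pairs on a pool of workers; return results in input order."""
    # Policy chosen in software (hypothetical): enqueue the most expensive tasks
    # first so long jobs are not left stranded at the tail of the schedule.
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0], reverse=True)
    work = queue.Queue()
    for i in order:
        work.put(i)

    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                i = work.get_nowait()  # idle workers pull the next task dynamically
            except queue.Empty:
                return  # queue drained: this worker retires
            results[i] = tasks[i][1]()  # each index is written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


sizes = [10, 100_000, 50, 20_000]  # deliberately uneven workloads
tasks = [(n, lambda n=n: sum(range(n))) for n in sizes]
print(run_tasks(tasks))
```

Swapping the `sorted` call for a different ordering (or a priority queue) changes the schedule without touching any hardware, which is the flexibility the feature description points to.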
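The execution-threads item above describes latency hiding on an in-order pipeline. The toy model below illustrates the idea only: it assumes one issue slot per cycle, round-robin selection among ready threads, a hypothetical 10-cycle load latency, and 1-cycle ALU ops. None of these figures come from Intel.

```python
def simulate(threads, load_latency=10):
    """Toy cycle count for one in-order core with several hardware threads.

    Each cycle the core issues at most one instruction, taken round-robin from
    the next thread that is ready. An "alu" op lets its thread issue again on
    the next cycle; a "load" stalls only its own thread for `load_latency`
    cycles, so other threads can fill the gap.
    Returns (cycles until the last result is ready, instructions issued).
    """
    n = len(threads)
    pc = [0] * n        # next instruction index for each thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    cycle, issued, last = 0, 0, -1
    while any(pc[t] < len(threads[t]) for t in range(n)):
        for k in range(1, n + 1):  # round-robin, starting after the last issuer
            t = (last + k) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                op = threads[t][pc[t]]
                pc[t] += 1
                ready_at[t] = cycle + (load_latency if op == "load" else 1)
                issued += 1
                last = t
                break  # only one issue slot per cycle
        cycle += 1
    return max(ready_at, default=0), issued


program = ["load", "alu"] * 4  # hypothetical instruction mix
for n_threads in (1, 4):
    cycles, ops = simulate([program] * n_threads)
    print(f"{n_threads} thread(s): {ops} ops in {cycles} cycles, "
          f"{ops / cycles:.0%} of issue slots used")
```

In this toy run, one thread finishes 8 ops in 44 cycles while four threads finish 32 ops in 56 cycles: the other threads issue while one waits on memory, so the simple pipeline stays busier without out-of-order hardware.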
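For the ring-network item above, the advantage of a bidirectional ring is that a message can take the shorter way around. The sketch counts hops between ring stops; the 16-stop ring is a hypothetical size, and hop counts are a simplification that ignores contention and per-hop timing.

```python
BITS_PER_DIRECTION = 512  # from the article: 1024 bits total, 512 each way


def ring_hops(src, dst, n_stops, bidirectional=True):
    """Hops from stop `src` to stop `dst` on a ring of `n_stops` agents."""
    clockwise = (dst - src) % n_stops
    if not bidirectional:
        return clockwise  # traffic can only travel one way around
    return min(clockwise, n_stops - clockwise)  # take the shorter direction


def hop_stats(n_stops, bidirectional=True):
    """(average, worst-case) hops over all distinct source/destination pairs."""
    hops = [ring_hops(s, d, n_stops, bidirectional)
            for s in range(n_stops) for d in range(n_stops) if s != d]
    return sum(hops) / len(hops), max(hops)


print(BITS_PER_DIRECTION // 8, "bytes per direction")  # 64
print("bidirectional, 16 stops:", hop_stats(16))
print("one-way, 16 stops:", hop_stats(16, bidirectional=False))
```

On a 16-stop ring, going both ways cuts the worst case from 15 hops to 8 and the average from 8 hops to about 4.3.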
"A key characteristic of this vector processor is a property we call being vector complete...You can run 16 pixels in parallel, 16 vertices in parallel, or 16 more general program indications in parallel," Seiler said of the 16-wide vector unit in each Larrabee core.
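Seiler does not define "vector complete" in this quote. One reading, assumed for this sketch, is that ordinary per-lane code, including branches and indexed memory reads, can run across all 16 lanes at once. The plain-Python model below emulates that with per-lane masks and a gather; `shade`, `vselect`, and `vgather` are hypothetical names, not Larrabee instructions.

```python
LANES = 16  # vector width implied by Seiler's "16 pixels in parallel"


def vselect(mask, a, b):
    """Per-lane choose: take a[i] where mask[i] is set, else b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def vgather(table, indices, mask, fallback=0):
    """Per-lane indexed load (irregular access); masked-off lanes get `fallback`."""
    return [table[i] if m else fallback for i, m in zip(indices, mask)]


def shade(values, threshold):
    # Scalar code "if v > threshold: v * 2 else v + 1", run as straight-line
    # vector code: both paths are computed for all lanes, and the mask picks
    # which result each lane keeps.
    assert len(values) == LANES
    mask = [v > threshold for v in values]
    then_path = [v * 2 for v in values]
    else_path = [v + 1 for v in values]
    return vselect(mask, then_path, else_path)


pixels = list(range(LANES))
print(shade(pixels, threshold=7))
table = [v * v for v in range(32)]
print(vgather(table, [(i * 3) % 32 for i in range(LANES)], [i % 2 == 0 for i in range(LANES)]))
```

Because every lane follows the same instruction stream, divergent branches cost both paths' work, but no lane has to fall back to scalar execution.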