Explaining Intel's Turbo Boost technology

The Turbo Boost technology in Intel's new Core i7 Mobile processor is positioned as a way to run the cores faster under certain circumstances--but that's not really why the technology exists.

Intel promotes the Turbo Boost technology in its new Core i7 Mobile processors as a way to adapt to the needs of the software and get more performance from the chip, but this isn't the real reason the technology exists.

The new "Clarksfield" Core i7 Mobile processors introduced at the Intel Developer Forum last week are certainly very impressive. They're huge high-performance quad-core chips with Hyper-Threading, support for two channels of DDR3-1333 DRAM, and an on-die PCI Express controller for the fastest possible connection to discrete graphics chips.

Intel VP Mooly Eden shows off the new Core i7 Mobile processor and its companion I/O controller at the Intel Developer Forum. (Credit: Intel)

In his IDF session announcing these parts, Intel Vice President Mooly Eden said the top model, the 2GHz Core i7-920XM Extreme Edition, is "the fastest quad-core processor, the fastest dual-core processor, and the fastest single-core processor"--all in one chip.

The key to this dramatic claim is a feature called Turbo Boost technology. Basically, if the current application workload isn't keeping all four cores fully busy and pushing right up against the chip's TDP (Thermal Design Power) limit, Turbo Boost can increase the clock speed of each core individually to get more performance out of the chip.

It's easy to see how this works when just one or two cores are being actively used; whatever power the other two or three cores would have consumed can be redirected over to the active cores, allowing them to run at higher speeds.

The quad-core mode of Turbo Boost is a little more subtle. It applies when the four cores aren't running a worst-case workload and so aren't bumping into the TDP limit--for example, during integer-heavy processing, since it's generally floating-point calculations that consume the most power. Turbo Boost can then increase the frequency of all four cores until they're running as fast as the power and thermal limits allow for the current workload.

Eden said that the Turbo Boost controller samples the current power consumption and chip temperature 200 times per second and makes whatever adjustments are necessary. Of course, if Windows isn't asking for more performance, Turbo Boost doesn't deliver it.
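To make the sampling behavior concrete, here is a toy model of that kind of control loop. It is only a sketch, not Intel's algorithm: the 200-samples-per-second rate, the 2GHz to 3.2GHz range, and the 55W limit come from the figures in this article, while the 133MHz step size, the linear power model, and every function name are assumptions made up for illustration.

```python
# Toy model of a Turbo Boost style controller. NOT Intel's algorithm.
# From the article: 200 samples/s, 2.0GHz rated, 3.2GHz peak, 55W TDP.
# Assumed for illustration: step size, power model, all names.

SAMPLE_PERIOD_S = 1 / 200    # one decision every 5 ms
BASE_GHZ = 2.0
MAX_GHZ = 3.2
BIN_GHZ = 0.4 / 3            # assumed step of about 133MHz
TDP_W = 55.0


def estimate_power(freq_ghz, active_cores, watts_per_core_ghz=6.0, idle_w=5.0):
    """Hypothetical linear power model; real chips are far less tidy."""
    return idle_w + active_cores * watts_per_core_ghz * freq_ghz


def next_frequency(freq_ghz, active_cores, os_wants_performance):
    """One controller decision: step down, step up, or hold."""
    if not os_wants_performance:
        return BASE_GHZ                  # no request, no boost
    if estimate_power(freq_ghz, active_cores) > TDP_W:
        return max(BASE_GHZ, freq_ghz - BIN_GHZ)
    stepped = min(MAX_GHZ, freq_ghz + BIN_GHZ)
    if estimate_power(stepped, active_cores) <= TDP_W:
        return stepped                   # headroom left: go one step faster
    return freq_ghz                      # next step would exceed the limit


def settle(active_cores, samples=200):
    """Run one simulated second of decisions and report the final speed."""
    freq = BASE_GHZ
    for _ in range(samples):
        freq = next_frequency(freq, active_cores, True)
    return round(freq, 2)


if __name__ == "__main__":
    for cores in (1, 2, 3, 4):
        print(cores, "active core(s):", settle(cores), "GHz")
```

With these invented power numbers, a single busy core climbs all the way to the peak speed, while four busy cores stay at the rated speed because the very first step up would exceed the limit.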

In the ideal case, where just one core is running, Turbo Boost can increase the clock rate on that core from the chip's rated speed of 2GHz to 3.2GHz--that's like getting a chip nine speed grades faster than what you paid for. (Speed grades, or "bins" in the parlance of semiconductor manufacturing, go up in steps of around 5 percent at these clock speeds. The Core 2 Mobile processor P series parts have speeds of 2.26, 2.4, 2.53, 2.66, and 2.8GHz. The T series extends this range to 2.93 and 3.06GHz, so by this measurement, with steps of about 133MHz, 3.2GHz would be about nine steps above 2GHz.)
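The step count can be checked from the clock speeds listed above. This is a minimal sketch that assumes the bins are evenly spaced; the spacing is derived from the listed Core 2 Mobile speeds, not taken from an Intel specification, and the variable names are my own.

```python
# Speed-grade arithmetic using only the clock speeds quoted in the text.
# Assumption: bins are evenly spaced at the average gap between them.

speeds_ghz = [2.26, 2.40, 2.53, 2.66, 2.80, 2.93, 3.06]

# Gap between neighbouring bins, in GHz and as a fraction of the lower bin.
gaps = [hi - lo for lo, hi in zip(speeds_ghz, speeds_ghz[1:])]
ratios = [hi / lo - 1 for lo, hi in zip(speeds_ghz, speeds_ghz[1:])]

avg_gap_ghz = sum(gaps) / len(gaps)
avg_step_fraction = sum(ratios) / len(ratios)

# How many such steps separate the rated speed from the peak Turbo speed?
rated_ghz, turbo_ghz = 2.0, 3.2
steps = round((turbo_ghz - rated_ghz) / avg_gap_ghz)

if __name__ == "__main__":
    print(f"average gap: {avg_gap_ghz * 1000:.0f} MHz")
    print(f"average step: {avg_step_fraction * 100:.1f} percent")
    print(f"steps from {rated_ghz}GHz to {turbo_ghz}GHz: {steps}")
```

Under that assumption the gap between 2GHz and 3.2GHz works out to nine steps of about 133MHz, each roughly 5 percent.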

That's how Intel wants everyone to think of Turbo Boost, but it isn't really the natural way to look at it. To explain why, I'll have to digress briefly and describe how chips are designed and built.

Any given microprocessor core architecture, like the Nehalem architecture underlying these new parts, has a certain typical complexity expressed in terms of a number of equivalent gate delays. The clock period has to be long enough to accommodate all of these gate delays.

Any given process technology, like Intel's 45nm "P1266" technology, has its own characteristics. These can be tweaked somewhat to optimize for higher speed, higher yield, lower power consumption, higher transistor density, etc., but generally a company like Intel has just one recipe for high-performance microprocessors like the Core i7.

The combination of the gate delays in the logical design of a chip with the physical transistor and interconnect performance figures for a process determines a maximum clock speed for that chip on that process. As chips are manufactured, they're tested for functionality and speed against various standards like power consumption and temperature rating; each speed grade ends up with its own part number, like "920XM" for the fastest Core i7 Mobile chips.
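The relationship described here is simple division: the clock period must cover the longest chain of gate delays, so the maximum frequency is the reciprocal of that chain's total delay. The sketch below illustrates it with made-up numbers. The figures of 25 gate delays and 12.5 picoseconds per gate are purely hypothetical, chosen only because they land on 3.2GHz; they are not Nehalem or P1266 data.

```python
# Maximum clock from logic depth and gate speed.
# The example inputs are hypothetical, NOT real Nehalem/P1266 figures.

def max_clock_ghz(gate_delays_per_cycle, picoseconds_per_gate):
    """Clock period = depth * per-gate delay; frequency = 1 / period."""
    period_ps = gate_delays_per_cycle * picoseconds_per_gate
    return 1000.0 / period_ps    # 1000 ps per ns, so the result is in GHz


if __name__ == "__main__":
    # Same design (25 gate delays) on a faster and a slower process recipe.
    print(max_clock_ghz(25, 12.5))   # faster transistors
    print(max_clock_ghz(25, 15.0))   # slower transistors, lower peak clock
```

The same logic design on slower transistors gives a lower ceiling, which is why the process recipe matters as much as the architecture.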

Intel's new 'Clarksfield' Core i7 Mobile processor is a big hunk of silicon--296 square mm with a carrier 37.5mm on a side. (Credit: Intel)

For the Core i7-920XM, that maximum speed bin is 3.2GHz, not the 2GHz value marked on the part. In principle, the 920XM could run all of its cores at 3.2GHz all the time if enough power were available and if the heat sink could keep the chip cool. (This is why Turbo Boost isn't like consumer overclocking: the chip is operating within its design specifications at all times.)

In a laptop, the potential for quad-core 3.2GHz operation just can't be realized. Intel selected the 55W TDP specification for the 920XM because that's a practical limit for a laptop processor. Combine that number with the rest of the chipset, the memory, a high-end graphics chip, and a big high-resolution LCD panel, and the whole laptop might be consuming 80W-100W when running all-out.

If the 920XM were configured to run all of its cores at 3.2GHz, I estimate it would consume at least 110W of power for the CPU alone--completely untenable in a mainstream laptop. (Though it's true that some original equipment manufacturers make laptops using desktop Nehalem processors; they're just huge, heavy, and hot.)
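The article doesn't show how that estimate was reached, but a figure in that range falls out of the standard rule of thumb that dynamic power scales with frequency times voltage squared. The sketch below is my own illustration, not the author's calculation: it ignores leakage entirely, and the 12 percent voltage increase is an assumed value, not an Intel figure.

```python
# Rough dynamic-power scaling: P is proportional to f * V^2.
# Assumptions: leakage ignored; the voltage ratio is a guess, not Intel data.

def scaled_power_w(base_w, base_ghz, target_ghz, voltage_ratio=1.0):
    """Scale a known power figure to a new clock speed and supply voltage."""
    return base_w * (target_ghz / base_ghz) * voltage_ratio ** 2


if __name__ == "__main__":
    # 55W at 2GHz is the rated point from the article.
    same_voltage = scaled_power_w(55.0, 2.0, 3.2)
    higher_voltage = scaled_power_w(55.0, 2.0, 3.2, voltage_ratio=1.12)
    print(f"3.2GHz at the same voltage: about {same_voltage:.0f}W")
    print(f"3.2GHz with 12 percent more voltage: about {higher_voltage:.0f}W")
```

Frequency alone pushes the 55W figure to about 88W. Because higher clock speeds generally need a higher supply voltage, even a modest assumed increase lifts the total past 110W.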

So Intel calculated how much it had to slow down the 920XM in order to meet the industry-standard definition of TDP, which amounts to a worst-case real-world workload running on all four cores. (Maximum power is defined in terms of a worst-case synthetic "power virus," but since real applications aren't that brutal in their processing demands, maximum power is only of interest to chip and system designers.)

For the 920XM, that slowdown worked out to 2GHz, and that's why the chip is rated at that speed.

It's worth looking at the previous Extreme Edition mobile processor, the Core 2 Extreme QX9300, which is a quad-core chip that can run all four cores continuously at 2.53 GHz. In spite of the QX9300's faster clock speed, there will still be many situations where the 920XM is faster on quad-core workloads because of the newer Nehalem microarchitecture, which usually gets more work done per clock period.

I haven't seen any good benchmarking comparisons between these two chips. Intel published some selected benchmarks at IDF, but not many, and it isn't clear to me what aspects of chip performance were being stressed.

But for dual-core and single-core performance, the 920XM should be much faster than its predecessor, combining the superior Nehalem architecture with the higher clock speeds enabled by Turbo Boost. The QX9300 has a simpler feature called Dynamic Acceleration Technology, but its effect is limited to only about one speed grade, roughly 10 percent. In most dual-core cases, and I think in all single-core cases, the 920XM will be much faster for the same power consumption.

As I explained in my previous post (see "Intel's Lynnfield mysteries solved"), this same chip design will also be used in desktops and servers, where Intel uses the code names "Lynnfield" and "Jasper Forest" respectively.

In desktops, there's room for the huge heat sinks and fans needed to keep the chip cool, so Intel can mark these chips with faster clock speeds... but the maximum clock rate will still be similar, so the benefits of Turbo Boost will be smaller. In servers, sustained quad-core throughput is the most important thing, so Turbo Boost may not be supported at all; if present, it'll be a relatively minor aspect of the chip's real-world performance.

Servers also provide an opportunity for an apparently paradoxical design optimization. By adjusting process parameters to reduce Jasper Forest's peak clock speed, Intel can actually deliver higher effective performance. The tweaks for slower transistors also reduce leakage currents and thus overall power consumption, making it easier to run all the cores at a slightly higher speed all the time. If Intel offers a 55W server processor from this chip design, it could actually run at a higher clock speed than the 2GHz rating of the 920XM at the small cost of not supporting 3.2GHz for single-core Turbo Boost.

The Core i7-920XM Extreme Edition processor is priced at $1,054 in 1,000-unit quantities, so I think most of us will not be shopping for that particular model. Intel also introduced the Core i7-820QM for $546 and the 720QM for just $364. These two parts have a slightly lower TDP rating of 45W and lower clock speeds to match.

The 820QM is nominally rated at 1.73GHz with a peak Turbo Boost speed of 3.06GHz, which is really so close to the 2.0/3.2GHz figures of the 920XM that the 820QM is a better deal for every use of a laptop except for gaining bragging rights. At 1.6/2.8GHz, with all the same basic features (Hyper-Threading, 8M of L3 cache, DDR3-1333 DRAM, etc.), the 720QM is the best deal of all, and I expect it to be very popular.
