Simultaneous multi-threading is a useful performance tool for processor designers, but it doesn't have the same benefit that adding a full core does.
I've been getting a fair number of questions about multi-threading over the past couple of weeks. The reason is that Intel has been previewing a new Xeon processor in advance of Advanced Micro Devices' six-core "Istanbul" CPU launch. Intel's Nehalem generation has simultaneous multi-threading (SMT)--which Intel calls Hyper-Threading (HT)--while Istanbul does not.
I wrote about this topic in depth a couple of years back in "Gradations of Threading," but it's worth reviewing in the context of these new server processors.
First, a little terminology.
A thread is a sequence of instructions that can execute in parallel with other threads. The details of what exactly constitutes a thread and the relationship between threads and other structures such as processes vary by operating system. However, for our purposes here, think of a thread as an independent task.
A core is, in most respects, a complete processor that includes all the hardware, such as execution units and registers, required to execute a sequence of instructions. Although multiple cores on a single die or in a single package (i.e., a chip or socket) may share certain resources such as cache memories, logically each core is a full central processing unit (CPU). That multiple cores are packaged together today is essentially an implementation detail that relates to getting the best performance out of the most economically sized silicon die.
Absent multi-threading, each core can execute one thread at a time, running that thread until it has completed or until the operating system scheduler swaps it out for another thread.
SMT changes that 1:1 relationship. On a processor with SMT, more than one thread can execute on a single core at the same time--in the case of HT, it's two threads per core.
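To make the counting concrete, here is a minimal sketch in Python. The function name and the server configurations are hypothetical, chosen only for illustration. The result is the number of threads the hardware can run at the same instant (what an operating system typically reports as logical processors), not a measure of performance.

```python
def hardware_threads(sockets: int, cores_per_socket: int, threads_per_core: int = 1) -> int:
    """Number of threads the hardware can execute at the same instant."""
    return sockets * cores_per_socket * threads_per_core


# Hypothetical two-socket servers, for illustration only.
without_smt = hardware_threads(sockets=2, cores_per_socket=6)
with_smt = hardware_threads(sockets=2, cores_per_socket=4, threads_per_core=2)

print("six cores per socket, no SMT:", without_smt)
print("four cores per socket, two-way SMT:", with_smt)
```

The second machine exposes more hardware threads, but that says nothing by itself about which one is faster, because two threads sharing a core are not equivalent to two cores.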
SMT potentially allows a processor to be more efficiently utilized. The reason is that modern microprocessors have multiple execution units within each core. For example, they have separate logic to handle integer operations and floating-point operations. Thus, in principle, if a thread with mostly integer operations runs concurrently with a thread that mostly crunches floating-point numbers, we could keep the processor busier by running both threads at the same time than we could running them sequentially.
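A toy simulation can illustrate the idea. This is a deliberately simplified sketch, not a model of any real processor: it assumes a core with exactly one integer unit and one floating-point unit, every operation takes one cycle, and each thread issues at most one operation per cycle in program order. All names here are made up for the example.

```python
def cycles_sequential(threads):
    """Run the threads back to back; each issues one op per cycle."""
    return sum(len(ops) for ops in threads)


def cycles_smt(threads, units=("int", "fp")):
    """Run the threads together on a core with one unit of each kind.

    Every cycle, each thread may issue its next op if the unit it needs
    has not already been claimed in that cycle. Issue priority rotates
    between threads so that neither one is starved.
    """
    pos = [0] * len(threads)  # index of the next op for each thread
    cycles = 0
    n = len(threads)
    while any(p < len(ops) for p, ops in zip(pos, threads)):
        free = set(units)
        for k in range(n):
            i = (cycles + k) % n  # rotate which thread goes first
            if pos[i] < len(threads[i]) and threads[i][pos[i]] in free:
                free.remove(threads[i][pos[i]])
                pos[i] += 1
        cycles += 1
    return cycles


int_heavy = ["int", "int", "int", "fp"] * 2
fp_heavy = ["fp", "fp", "fp", "int"] * 2

print("back to back:          ", cycles_sequential([int_heavy, fp_heavy]))
print("SMT, complementary mix:", cycles_smt([int_heavy, fp_heavy]))
print("SMT, two int-heavy:    ", cycles_smt([int_heavy, int_heavy]))
```

In this toy model the complementary pair finishes in half the cycles it would need back to back, while two integer-heavy threads gain far less because they keep colliding on the same unit. Real cores have more units and much more elaborate scheduling, so treat the numbers as illustrative only.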
The other main benefit is to hide memory latency. CPUs have to operate on data, and that data ultimately has to come from memory or disk. Computer designs incorporate all sorts of techniques--such as caches and prefetching--to keep data close to processors in time and space. Nonetheless, processors still spend a lot of time waiting for data to arrive from relatively pokey memory. With SMT, when one thread stalls waiting for its data to arrive, the core can keep doing useful work on the other thread instead of sitting idle.
SMT is therefore essentially a technique to use a processor more efficiently. It does not itself add execution resources to a core. And, in fact, the duplicated hardware (such as registers) and other logic that SMT requires take up space that could otherwise go to other features (such as larger caches) that could themselves provide alternative ways to boost chip performance.
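A back-of-the-envelope model shows why this matters. The sketch below is an idealization under stated assumptions: each thread alternates between a burst of computation and a memory stall, switching between threads costs nothing, and the threads never compete for execution units or cache. The function name and the cycle counts are hypothetical.

```python
def core_utilization(compute_cycles: float, stall_cycles: float, hw_threads: int = 1) -> float:
    """Fraction of cycles in which the core does useful work.

    Idealized: each thread repeats "compute, then stall on memory", and
    another thread's compute burst can fill a stall at no extra cost.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, hw_threads * compute_cycles / period)


# A memory-bound workload: short compute bursts, long waits.
print(core_utilization(20, 80, hw_threads=1))
print(core_utilization(20, 80, hw_threads=2))

# A compute-bound workload: the core is already nearly saturated.
print(core_utilization(90, 10, hw_threads=1))
print(core_utilization(90, 10, hw_threads=2))
```

Under these assumptions a second thread doubles utilization for the memory-bound case, but adds little once the core is already busy most of the time.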
Intel's HT implementation--a fairly "lightweight" approach relative to IBM's implementation on its Power processor--uses on the order of 5 percent of the total chip area to deliver typical performance gains of between 10 and 20 percent. (Optimized applications can see bigger gains. On the other hand, applications that are already efficiently using the CPU's execution units--or that are bottlenecked in ways that SMT can't assist with--may see no gain at all.)
Ultimately, SMT is just one performance feature among many that may or may not be a match for a given processor's design. In Intel's case, it's been in some x86 designs but not others since it debuted on the Pentium 4; Itanium uses a simpler temporal multi-threading approach.
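The trade-off in those figures can be put as rough arithmetic. This sketch treats the roughly 5 percent figure as extra area relative to a design without SMT, which is a simplifying assumption, and the function name is made up for the example. A result above 1.0 means the chip delivers more performance per unit of area than the baseline.

```python
def perf_per_area(perf_gain: float, area_cost: float) -> float:
    """Performance per unit area relative to a baseline without the feature."""
    return (1.0 + perf_gain) / (1.0 + area_cost)


AREA_COST = 0.05  # "on the order of 5 percent", treated here as added area

for gain in (0.0, 0.10, 0.20):
    ratio = perf_per_area(gain, AREA_COST)
    print(f"performance gain {gain:.0%}: {ratio:.3f}x performance per area")
```

With a 10 to 20 percent gain the ratio is comfortably above break-even; with no gain, the area spent on SMT is a net loss for that workload.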
SMT's in the plus column of the features checklist. But what really matters is overall processor performance on relevant workloads and platform capabilities. SMT is one tool to get there.