Talkin' about Intel's next generation

Mobile platforms chief Mooly Eden says Intel isn't afraid to go out on a limb when it comes to the company's new chip design blueprint.

Tom Krazit Former Staff writer, CNET News
Tom Krazit writes about the ever-expanding world of Google, as the most prominent company on the Internet defends its search juggernaut while expanding into nearly anything it thinks possible. He has previously written about Apple, the traditional PC industry, and chip companies.
Intel's Mooly Eden is excited. Which means he's in the mood to make big promises.

The irreverent Israeli is now general manager of Intel's Mobile Platforms Group, but he occupies a special role in the company's history as one of the driving forces behind the original Pentium M processor. The chip's combination of low power consumption and high performance emerged from Intel's design labs in Israel, where the company's Core Duo chip was also conceived.

Intel is getting ready to introduce new chips based on what it calls its next-generation microarchitecture, a Pentium M-inspired set of design principles that is the reason for Eden's excitement today. Though the Santa Clara, Calif., company's stock has suffered from its recent market share losses to Advanced Micro Devices, the new chips scheduled for the second half of the year will help Intel regain the performance crown, according to Eden. In fact, he thinks they'll be as much as 20 percent better than AMD products released at the same time, based on internal testing and projections from AMD's public road map.

Eden sat down with CNET News.com to defend his predictions about the new chips and explain why the new architecture detailed at this week's Intel Developer Forum is just what Intel needs.

Q: With Yonah, people bring up the unified cache. (AMD) has dedicated cache in (its chip) core. How much performance does that add? Or is it mostly just flexibility?
Eden: It's huge. The question is, what is the application that you're speaking about? Let's look at several different applications and see how much performance I can gain. Let's say you take Yonah, compare it to the competition and run single-threaded applications. A huge difference--because now one of my cores will be able to use all the 2MB cache. If I pick any one of those (cores) and I increase it from 1MB cache to 2MB cache, you can easily get 10 to 15 percent performance improvement.

This is what we are afraid of when we say we go with dual-core. We might find ourselves in a situation that we deliver great performance in a multitasking or a multithreaded environment. But if you go to a single-threaded environment, a lot of the software developers might come and say, "Whoops, my experience on the new system is worse than my experience with the previous one." So the fact that you can use the overall cache (on single-threaded applications)--this definitely gives you a huge advantage.
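Eden's point about the unified cache can be put as simple arithmetic. The sketch below is a toy model, not a description of how Yonah actually allocates cache: the 2MB total and the two cores come from the interview, while the function name and the even-split assumption are illustrative only.

```python
# Toy model: how much L2 cache one thread can draw on in a dual-core chip.
# The 2MB total is from the interview; the even split is an idealization.

def cache_available_mb(total_mb, cores, active_threads, unified):
    """Return the cache (in MB) available to each active thread."""
    if active_threads < 1 or active_threads > cores:
        raise ValueError("active_threads must be between 1 and cores")
    if unified:
        # Shared cache: idle cores leave their share to the busy ones.
        return total_mb / active_threads
    # Dedicated caches: each core is limited to its own fixed slice.
    return total_mb / cores

# A single-threaded application on a dual-core chip with 2MB in total.
single_unified = cache_available_mb(2.0, cores=2, active_threads=1, unified=True)
single_split = cache_available_mb(2.0, cores=2, active_threads=1, unified=False)
print(single_unified, single_split)
```

With one active thread, the unified design exposes the full 2MB while the split design is capped at 1MB, which is the 1MB-to-2MB jump Eden describes. A real shared cache does not divide capacity this evenly between threads.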

Q: Yonah just came out, but what still will need to be changed or improved in notebook architecture? And what will be the next things that you want to start working on?
Eden: The reason we're so proud of (the next-generation microarchitecture) in the technical community is because it's much more challenging than Dothan. Dothan and Banias are exactly the same architecture. We just (shrank) it and added 2MB cache. We did a lot of local microarchitecture surgery, but it was local.

Merom--it's going to be 14 pipeline stages, and instead of a three-wide machine, you put in a four-wide machine and you change the branch prediction. (A four-wide machine means a chip can process four instructions in a single clock cycle.) It's really a major change in clock and in the amount of time it takes you to execute a sequence through different pipelines to make sure that this is fully compatible.

I believe that with innovation and the things that are being put into Merom, it will take at least a year and a half or two years to close such a gap. I'm not afraid to open up and show techniques and everything. We're going to dive into a lot of the things all the way down to the microcode, and the way that you do micro fusion and the micro-ops fusion.
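As a rough illustration of what moving from a three-wide to a four-wide machine means, the following sketch computes only the theoretical peak. The function names are hypothetical, and the result is an upper bound: instruction dependencies, branch mispredictions and cache misses keep real gains well below it.

```python
# Peak-throughput arithmetic for issue width. This is an upper bound only;
# it says nothing about the performance Merom actually delivers.

def peak_instructions(width, cycles):
    """Most instructions a machine of this width could process in `cycles`."""
    return width * cycles

def peak_gain(old_width, new_width):
    """Fractional increase in peak throughput when widening the machine."""
    return new_width / old_width - 1.0

gain = peak_gain(3, 4)
print(f"Peak gain from three-wide to four-wide: {gain:.0%}")
```

The widths 3 and 4 are the only figures taken from the interview.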

Q: Are you making that comparison based on what you understand the competition is working on, or just based on the architecture you developed in Merom?
Eden: Let's put it like this: You're trying to assess where the competition will be. If you say that you are going to have an advantage, it's based on an assessment. You might be more accurate or you might be less accurate, but it's the risk we are taking. We believe...we'll be able to open a major gap with the new architecture.

Q: How does the four-wide machine performance compare with the integrated memory controller?
Eden: Too many people ask me about the memory controller, and they don't ask me about microfusion or macrofusion, and all these kind of things.

What is memory access? Two things: When you address external memory, you need memory bandwidth and you need memory latency.

Memory bandwidth means in each clock (cycle) you can bring up this amount of data or this amount of data. Memory latency means when I go and I try to fetch something from level 1 or level 2 cache, and I've got a miss and the data doesn't exist in the CPU, now I need to go to the external memory. And if I don't have enough parallelism, the CPU is idle--it goes to sleep until you fetch from the external memory.

Now, (an integrated) memory controller gives you one big advantage. The first advantage is, when you want to access memory you can go to the external memory and fetch it. If you've got a north bridge, you need to go to the north bridge, and then from the north bridge go to the memory, and then from the memory go to the north bridge to fetch the data inside. This takes much longer latency.

What I need to do is to make sure that most of the time, if you go and need data, the data resides inside the cache. If you need to access data from the CPU and the data resides in the cache, the bandwidth to access the cache...is much better than any memory controller, because (with a) memory controller, you still need to leave the chip and go externally.

So, if I go and give you a big cache--and I'll do a great prefetch mechanism that will make sure to prefetch the data and prefetch everything from the memory, to make sure that we've got it well in advance inside the cache before you use it--this solution will probably bring us a better solution than any memory controller.

There is an advantage (to an integrated memory controller), and I'm not trying to diminish it--but there's more than one way to skin a calf. I believe that the architecture balance that we'll deliver both in Yonah and Merom probably will give us better performance overall, and you don't need to look at the memory bandwidth. Yes, maybe my memory bandwidth will be a little bit less, but it'll be able to have most of the data in the cache overall, from the CPU point of view.
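The trade-off Eden describes, a higher cache hit rate versus a shorter trip to memory, can be sketched with the standard average memory access time formula: hit time plus miss rate times miss penalty. Every number below is hypothetical and chosen only to show the shape of the argument; none is a measurement of an Intel or AMD chip.

```python
# Average memory access time (AMAT) = hit time + miss rate * miss penalty.
# All latencies and miss rates here are made-up, illustrative values.

def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average time per memory access, in nanoseconds."""
    return hit_ns + miss_rate * miss_penalty_ns

# Hypothetical design A: big cache and aggressive prefetch (few misses),
# but each miss goes through a north bridge (long penalty).
north_bridge = average_access_ns(hit_ns=5.0, miss_rate=0.02, miss_penalty_ns=100.0)

# Hypothetical design B: integrated memory controller (short penalty),
# but a smaller cache that misses more often.
integrated = average_access_ns(hit_ns=5.0, miss_rate=0.05, miss_penalty_ns=60.0)

print(north_bridge, integrated)
```

Under these assumed figures, the design with fewer misses comes out ahead (roughly 7 ns versus 8 ns per access) despite its longer miss penalty. With different miss rates the result can flip, which is why Eden frames it as a question of overall balance.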

Q: There are critics out there who claim that the Pentium M and Yonah are just really a Pentium III with a few tweaks--a Pentium III with an Israeli accent, let's say. Does it derive a lot from the Pentium III?
Eden: Part of it is based on previous architecture and part of it looks forward. It resembled (Pentium III) architecture to some extent, but (there are) a lot of different features inside.

Can you give performance improvement based on the same architecture? Are you taking things from the previous architecture? Yes. Did Yonah not take from Dothan alone? Did Merom take from Yonah? Much less, but to call it Pentium III architecture I believe is doing an injustice to the hundreds of people that delivered Banias.