Design flaws, defects, and faults

Glaskowsky explains the difference between design flaws and defects.

(Sorry for the brief hiatus... I had an important deadline to deal with at the office.)

On July 5, Microsoft announced in a press release that it was offering three years of warranty coverage for the "three red flashing lights" problem on the Xbox 360.

This announcement began an interesting series of statements, interpretations and outright false conclusions from a variety of sources.

The press release was followed by a conference call featuring Robbie Bach, president of Microsoft's entertainment & devices division. Both the press release and Bach's statements in the call were unusually coy; Microsoft apparently decided not to give details on the technical issues behind the situation. This reluctance worked to the company's detriment, as some people have inferred the worst in the absence of full disclosure.

The press release, for example, refers only to "a number of factors which can cause general hardware failures indicated by three red flashing lights on the console." In the call, Bach repeatedly declined to give more details, but he responded to a question from a financial analyst by saying "you should think of it as a Microsoft design issue" (according to a transcript of the call).

I have to say that "design issue" sounds a lot like "design flaw," the phrase used by Tom Sanders of CRN Australia. But Sanders (or a CRN Australia headline writer) also wrote this in the article's subtitle: "Software giant admits there are 11.6 million faulty consoles sold in the past 19 months, will have to be fixed."

This is very wrong.

Based on this interpretation, popular computer-book author and blogger Robert Bruce Thompson wrote the following in a letter for publication on Jerry Pournelle's Chaos Manor Reviews Web site: "Microsoft admitted today that all 11.6 million Xbox 360 units that have been shipped to date are defective."

And this is wrong too.

Put simply, a design flaw does not automatically create a defect or a fault.

I have no information about the "three red flashing lights" problem in the Xbox 360. There are rumors that the problem involves overheating of the chips in the machine--which could mean the processor, the graphics chip, the memory, the power supply or something else. I don't know.

For the sake of argument, let's say it's the processor. Let's say that Microsoft designed the processor's thermal-management solution (the heat sink, heat pipe, fan, etc.) to handle up to 80W of heat from the processor while keeping the processor within its operating temperature limits, as specified by the manufacturer (IBM).

But let's say that Microsoft underestimated the effect of time and temperature on the thermal grease connecting the processor to its heat sink, or on some other element of the product, so that after a while the thermal-management solution can really handle only 75W of heat.

Leading-edge microprocessors vary considerably from one production run to another. Let's say the typical power consumption of the Xbox 360 processor while running the most demanding games is 70W, with a worst-case maximum of 80W. Of all processors made, 80 percent will never consume more than 75W of power, but the other 20 percent will... and when they do, in systems with degraded thermal-management solutions, they will run hotter than the specified maximum temperature and eventually fail.

Let me stress that I'm just making up these numbers to make a point, but I think they're roughly consistent with the available facts.

So now we've described a situation in which most Xbox 360 systems will never have a problem, and not just as a matter of statistics but absolutely: 80 percent of the systems will never fail, because their processors never consume more power than the thermal-management solution can safely handle.

Of the other 20 percent, only some will fail--the ones that get the heaviest use, the ones run in warm rooms, the ones that collect dust that further degrades heat-sink efficiency. Let's say that of this 20 percent, only half of the systems will actually fail.

So in this hypothetical situation, all Xbox 360s have a design flaw, but only 20 percent have a defect and only 10 percent will have a fault.
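The arithmetic of this funnel can be written out as a short Python sketch. Every fraction in it is the made-up figure from the thought experiment above, not Microsoft data, and the function name `breakdown` is purely illustrative; it simply shows how "all units flawed" shrinks to "20 percent defective" and then to "10 percent faulty."

```python
from typing import Dict

# Hypothetical fractions from the thought experiment (made up, not real Xbox 360 data).
FLAW_FRACTION = 1.00       # every unit shares the same thermal design
DEFECT_FRACTION = 0.20     # processors whose peak draw can exceed the degraded 75 W capacity
FAIL_GIVEN_DEFECT = 0.50   # of those, the share whose use and environment actually cause failure


def breakdown(units: int) -> Dict[str, int]:
    """Split a hypothetical shipment into flawed, defective and faulty units."""
    flawed = round(units * FLAW_FRACTION)
    defective = round(units * DEFECT_FRACTION)
    faulty = round(defective * FAIL_GIVEN_DEFECT)
    return {"flawed": flawed, "defective": defective, "faulty": faulty}


if __name__ == "__main__":
    for label, count in breakdown(1000).items():
        print(f"{label:>9}: {count} per 1,000 units")
```

Per 1,000 units this gives 1,000 flawed, 200 defective and 100 faulty, which is exactly the gap between "design flaw" and "defective" that the headlines missed.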

The flaw is the failure to provide enough safety margin in the thermal design. The defect is the combination of a processor at the high end of its specified operating-power range with a thermal-management solution that can't handle that much power. (The processor itself is not defective, and the slightly degraded thermal solution would be no problem by itself.) The fault--the actual failure--happens when the overheated processor begins malfunctioning.
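The three terms can also be expressed as a per-unit classification. This is a minimal sketch under the same invented numbers (an 80W design capacity degraded to 75W); the `Unit` type, the `classify` function and the `failed` flag are hypothetical names for illustration, not anything Microsoft or IBM has described.

```python
from dataclasses import dataclass

# Hypothetical figures from the thought experiment; not real Xbox 360 specifications.
DESIGN_CAPACITY_W = 80.0    # heat the thermal solution was designed to remove
DEGRADED_CAPACITY_W = 75.0  # what it can actually remove after aging


@dataclass
class Unit:
    peak_power_w: float  # worst-case draw of this particular processor
    failed: bool         # did real-world use actually push it into malfunction?


def classify(unit: Unit) -> str:
    """Label one unit using the flaw/defect/fault distinction."""
    # The processor itself must be within its specified range; otherwise it,
    # not the thermal design, would be the defective part.
    if unit.peak_power_w > DESIGN_CAPACITY_W:
        raise ValueError("processor draws more than its specified maximum")
    # Every unit shares the design flaw (too little thermal margin). Units whose
    # peak draw stays within the degraded capacity have nothing more than that.
    if unit.peak_power_w <= DEGRADED_CAPACITY_W:
        return "flaw only"
    # High-power processor plus degraded cooler: a defect, but not yet a failure.
    if not unit.failed:
        return "defect"
    # The overheated processor actually began malfunctioning.
    return "fault"


if __name__ == "__main__":
    for unit in (Unit(72.0, False), Unit(78.0, False), Unit(78.0, True)):
        print(unit, "->", classify(unit))
```

Note that the check on `DESIGN_CAPACITY_W` encodes the point that the processor is not itself defective: only the combination of a high-end part with a degraded cooler crosses the line.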

I don't know if these numbers are anything like the actual situation with the Xbox 360. The reality could be better or worse. I can't infer enough from Microsoft's behavior; the return rates are obviously high enough to trigger this public admission of a problem, yet they apparently aren't high enough to justify a recall.

But I am quite sure that Microsoft has not said anything to justify these claims that all Xbox 360s are "faulty" or "defective," and I think it's important to understand the difference between these terms and Microsoft's use of "design issue" or CRN Australia's use of "design flaw." The truth is important, and words are all we have to explain the truth to each other.

About the author

    Peter N. Glaskowsky is a computer architect in Silicon Valley and a technology analyst for the Envisioneering Group. He has designed chip- and board-level products in the defense and computer industries, managed design teams, and served as editor in chief of the industry newsletter "Microprocessor Report." He is a member of the CNET Blog Network and is not an employee of CNET.
