The problem, now patched, afflicts several servers using the 900MHz UltraSparc III processors, Sun said in an advisory posted in January and updated Thursday. The problem arises in the cache, high-speed memory that stores data a processor can fetch more quickly than information stored in main memory.
In most instances, error-correction mechanisms fix the problem, but in some "very rare instances" multiple simultaneous errors could cause a crash, Sun said in a statement Tuesday.
Sun started investigating the problem in September 2002 when it discovered more 900MHz UltraSparc III motherboards were being returned than expected. "Only a very small percentage of Sun's global customer base has had the potential to be affected," the company said.
Affected servers include the Sun Fire 3800, 4800, 4810, 1280, 6800, 12K and 15K. These systems run the gamut from four-processor models to the company's top-end 72-processor behemoth. Sun fixed the problem in servers that are shipping and has been notifying affected customers.
To fix the problem, Sun published firmware updates that change a server's hardware settings. It also published a patch to the Solaris operating system that can be used to sidestep the problem.
"Any problems that might affect customers are serious," said Brad Schultz, vice president of operations for Sun services. "The fixes are now shipping and are in place at customer sites."
Sun, which touts its reliability as a main reason to purchase its servers, has run into problems before. In 2000, the company found problems on several of its systems that lacked today's error-correction mechanisms.
Sun has to be careful with its claims, said Sageza Group analyst Charles King. "Their attempts to peel away IBM mainframe customers have led them to claim over the last two years that their stuff offers mainframe-class reliability and availability that's beyond the capacity levels of what most other Unix server operators offer."