Netizens have been encountering widespread difficulty reaching Web sites, sending email, and using the Internet in general today due to the ripple effect of a database problem combined with "human error" at the InterNIC.
Last night, an Ingres database failed, resulting in corrupt ".com" and ".net" zone files, according to David H. Holtzman, senior vice president of engineering for Network Solutions (which runs the InterNIC), and David Graves, an InterNIC business manager.
The InterNIC assigns names in the Internet's most popular domains, and its name servers translate those names into actual addresses on the Net. Although the zone files were unavailable for only four hours, the outage affected the Net for much longer, in the same way that a huge accident on a major freeway ties up traffic for hours even after the roadway is clear.
People were reporting problems reaching email addresses and domains for several hours today, probably because their ISPs' servers had cached the bad data, which kept it circulating in their systems.
The errant zone files should never have been released, but they were, Graves said. "As soon as that occurred, our quality-assurance software raised alarms. Despite all the alarms and warnings, the system administrator on duty allowed the release of the zone files without making sure they were regenerating and without bothering to verify the integrity of those files." That happened last night at 11:30 p.m. PT.
Once the files were released, Internet servers looking up an address ending in ".com" or ".net" were told that the address did not exist.
It is difficult to gauge how many people were affected by the partial outage, because many servers had already cached addresses from earlier lookups and kept working. But many had not. Judging by email and complaints, thousands of people at least saw their email bounce and had trouble reaching Web sites.
Graves added that the InterNIC recognized the problem immediately and fixed it four hours later, at 3:30 a.m. PT. But the ripple effect was widespread, and many people were still having problems after that time.
Graves insisted that today's problem was not "technological." Instead, he said, "the reason it happened was human error. All the protective technology worked."
This problem comes on top of at least a week of negative publicity for the InterNIC. In the past week, Netizens have complained about several different issues relating to the domain registrar. Yesterday, an Internet service provider complained that the InterNIC canceled its domain name without warning, and last weekend, InterNIC's rival domain registry, AlterNIC, redirected users from "www.internic.net" to its own site in what it called a "protest."
Today's problem also highlights an increasingly worrisome concern: on the Net, a single glitch, whether caused by a technical malfunction or by human error, can have a broad impact.
Whether it's a router or a name server that goes down, when something breaks, people all over the world feel it. Because of latency and the way addresses are cached, they often feel the effects for a long time.