
Famed physics lab steps up to storage challenge

Faced with a deluge of data, CERN's computer centers use a combination of x86 architecture and Linux to save money.

Tom Espiner Special to CNET News
Researchers at CERN, the world's largest particle physics laboratory, face a truly immense storage challenge.

One of its latest projects, the Large Hadron Collider (LHC), is being built to study particles and the forces that bind them together. Due to become fully operational around September 2007, the LHC will fire billions of protons around a 27-kilometer circuit, 150 meters below ground.

Each beam carries 3,000 bunches of 100 billion protons, whose paths are bent around the circuit by supercooled superconducting magnets (operating at minus 271 degrees Celsius) and which are made to collide at the center of four detectors in the tunnel. The interactions between the protons are measured there at a rate of 40 million events per second.

In short, this means that scientists at CERN, which is near Geneva, have an enormous amount of data on their hands. They use computers to filter the events down to a few hundred "good" events per second, but even this can generate between 100MB and 1,000MB of data per second.

That equates to 15 petabytes of data per year for four experiments, which will be stored on magnetic tape and disk.
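As a rough check of those figures, a sustained rate of a few hundred megabytes per second per experiment over an operating year lands in the stated range. The beam-time and mid-range rate values below are assumptions for illustration, not figures from the article:

```python
# Back-of-envelope check of the quoted 15 petabytes per year.
# Assumed: an LHC "operating year" of roughly 1e7 seconds of beam time,
# and a mid-range sustained rate of 400 MB/s per experiment
# (the article quotes 100-1,000 MB/s after filtering).
SECONDS_OF_BEAM_TIME = 1e7
RATE_MB_PER_S = 400
EXPERIMENTS = 4

total_mb = RATE_MB_PER_S * EXPERIMENTS * SECONDS_OF_BEAM_TIME
total_pb = total_mb / 1e9  # 1 PB = 1e9 MB
print(round(total_pb), "PB per year")  # same order as the quoted 15 PB
```

With these assumed inputs the estimate comes out at roughly 16 petabytes, consistent with the 15 petabytes per year cited for the four experiments.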

"This is far too large for a single data center," said Helge Meinhard, technical coordinator for CERN-IT Switzerland. "The information is federated to more than 120 data centers worldwide."

The processing power currently required by CERN is equivalent to 30,000 CPU servers, Meinhard told ZDNet UK, speaking at the Storage Networking World event in Frankfurt, Germany.

Experimental event data is sent via optical links to CERN computer centers. One data stream is stored on magnetic tape, one data stream is sent to one or two of CERN's 11 "Tier 1" centers, while a third data stream is sent to the lab's CPUs for analysis and to map the particle events.
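The three-way split described above can be sketched as a simple fan-out. The stream labels and site names here are illustrative stand-ins, not CERN's actual software or center names:

```python
# A minimal sketch of the three-stream fan-out: each batch of filtered
# event data goes to tape, to one or two Tier 1 centers, and to the
# on-site CPU farm for analysis. Names are hypothetical placeholders.
import random

TIER1_SITES = ["tier1-A", "tier1-B", "tier1-C"]  # stand-ins for CERN's 11 Tier 1 centers

def route_event_batch(batch: bytes) -> list[str]:
    """Return the destinations for one batch of event data."""
    destinations = ["tape-archive"]                # stream 1: magnetic tape at CERN
    destinations += random.sample(TIER1_SITES,     # stream 2: one or two Tier 1 centers
                                  k=random.choice([1, 2]))
    destinations.append("local-cpu-farm")          # stream 3: on-site analysis
    return destinations
```

Each batch thus always reaches the archive and the analysis farm, while the Tier 1 replication varies between one and two remote centers, as the article describes.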

This network is dubbed the DataGrid, and CERN's scientists will be able to access data from anywhere on the network.

Storage is complicated by the fact that each center is autonomous, although there are commonalities: all the centers use x86 architecture and Linux. CERN runs x86 and Linux on 98 percent of its own systems, according to Meinhard.

"The main reason is cost," said Meinhard. "It gives us the best value for money. You don't have to pay per machine, which is a significant advantage."

Another CERN scientist, who preferred not to be named, said that it wouldn't be possible to fund CERN projects if they had to rely on proprietary software, because of the cost of licensing.

CERN physicists also keep costs down by developing their own "homemade" software, and relying on commodity or off-the-shelf equipment as much as possible.

With the collisions beginning in earnest in the LHC by late summer 2007, the physicists hope to find the Higgs boson, a hypothetical elementary particle.

American scientists working on the LHC project got a boost last week when two high-speed networks, ESnet and Internet2, announced they would work together to develop a "highly reliable, high-capacity network" across the U.S.

Tom Espiner of ZDNet UK reported from Frankfurt, Germany.