IBM plans open-source storage strategy

Big Blue is releasing an open-source version of the software needed to let servers tap into its forthcoming "Storage Tank" technology.

Stephen Shankland Former Principal Writer
SAN JOSE, Calif.--To encourage the broadest possible support for its forthcoming "Storage Tank" technology, IBM will release an open-source version of the software needed to let servers tap into the next-generation storage system.

Big Blue is working with an undisclosed open-source group on the software and will release the code when the product is generally available in 2003, said David Pease, manager of storage software at IBM's Almaden Research Center and leader of the 5-year-old Storage Tank project. In addition, IBM plans to publish the communication method fundamental to the next-generation storage project.

The collaborative approach is the most recent example of IBM trying to capitalize on the momentum of the open-source movement. The company also backs the Linux operating system, the Globus Toolkit for supercomputing networks, and several other projects of the collaborative programming movement.

IBM has tapped into the open-source community as a way to speed the development and adoption of technologies it favors and to give itself more cachet with in-the-know programmers. The company devotes many of its own resources to open-source projects, most notably its Linux Technology Center.

Storage Tank--fleetingly code-named Golden Retriever--is a technology designed to get more use out of existing storage systems and make them easier to manage. With Storage Tank, existing systems can be linked so that larger amounts of data can be stored.

The technology works by using a different way of keeping track of descriptive information--"metadata" such as physical locations, file sizes or access permissions--that accompanies the actual content within the files. Where most storage systems include this metadata in the storage system itself, Storage Tank spreads the information across a group of metadata servers, lower-end dual-processor Intel servers running Linux.
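To make the metadata-server idea concrete, here is a minimal Python sketch of file records being spread across several metadata servers. It assumes a simple hash-of-the-path partitioning scheme. The article does not say how Storage Tank actually divides the work, and the class names, fields and location strings (`MetadataCluster`, `FileMetadata`, `"disk3:block1024"`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    path: str
    size_bytes: int
    location: str      # hypothetical physical-location label
    permissions: str   # e.g. "rw-r--r--"


@dataclass
class MetadataServer:
    """One hypothetical metadata server holding records for its share of the namespace."""
    name: str
    records: dict = field(default_factory=dict)


class MetadataCluster:
    """Spreads metadata across servers by hashing the file path (an assumed scheme)."""

    def __init__(self, server_names):
        self.servers = [MetadataServer(n) for n in server_names]

    def server_for(self, path: str) -> MetadataServer:
        # A stable hash means every lookup for a path goes to the same server.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def put(self, meta: FileMetadata) -> None:
        self.server_for(meta.path).records[meta.path] = meta

    def get(self, path: str) -> FileMetadata:
        return self.server_for(path).records[path]


cluster = MetadataCluster(["meta1", "meta2", "meta3"])
cluster.put(FileMetadata("/video/keynote.mpg", 700_000_000, "disk3:block1024", "rw-r--r--"))
print(cluster.get("/video/keynote.mpg").location)  # disk3:block1024
```

Because only small metadata records live on these servers, adding servers raises the number of files the system can track without touching the data itself.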

The approach offers several advantages. For one thing, it can keep track of a very large number of files. IBM's goal is for the system to control as many as a billion files, said Jai Menon, an IBM fellow and storage research manager at Big Blue's Almaden Research Center.

In addition, files of a certain type can be automatically moved to a particular storage "pool." For example, video and audio streaming files can be physically stored automatically on a particular storage device suited to that task, while infrequently used text files can be stored on a device with lower performance.

The use of the pools in conjunction with preset policies will let administrators automate tasks such as data backup, Menon said.
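The pool-and-policy idea can be illustrated with a small rule table. This is a hypothetical sketch: the pool names, the extension-based placement rules and the backup intervals are invented for illustration and are not Storage Tank's actual policy language.

```python
import os

# Hypothetical placement rules: file extension -> storage pool.
PLACEMENT_POLICY = {
    ".mpg": "fast_streaming_pool",
    ".mp3": "fast_streaming_pool",
    ".txt": "low_cost_pool",
}
DEFAULT_POOL = "general_pool"

# Hypothetical backup policy per pool, in hours between backups.
BACKUP_INTERVAL_HOURS = {
    "fast_streaming_pool": 24,
    "low_cost_pool": 168,
    "general_pool": 24,
}


def choose_pool(path: str) -> str:
    """Pick the pool a new file should land in, based on its extension."""
    _, ext = os.path.splitext(path)
    return PLACEMENT_POLICY.get(ext.lower(), DEFAULT_POOL)


def needs_backup(pool: str, hours_since_last_backup: float) -> bool:
    """Apply the pool's preset backup policy instead of a manual decision."""
    return hours_since_last_backup >= BACKUP_INTERVAL_HOURS[pool]
```

With rules like these in place, an administrator sets policy once and the system decides placement and backup timing for each file.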

The system also means the same files can be accessed directly by different operating systems. Currently, because most operating systems have their own ways of storing files, that's difficult to do without using file system software from a company such as Veritas Software.

But for servers running different operating systems to tap into Storage Tank, a piece of software called an agent must be running on each of them to communicate with the metadata server. IBM plans to release a sample agent program as open-source software, Pease said.

Releasing the example software will permit others to write agents to tap into Storage Tank, Pease said. In addition, IBM will describe the protocols the agents use to communicate with the metadata servers, allowing others to build their own metadata servers if they wish, he said.
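The agent's role can be sketched as a small request/reply exchange with a metadata server. IBM had not yet published the real protocol, so the JSON message format, the `lookup` operation and the `ToyAgent` and `ToyMetadataServer` classes below are purely hypothetical, and both ends run in one process rather than over a network.

```python
import json


class ToyMetadataServer:
    """Stands in for a metadata server answering lookup requests (hypothetical format)."""

    def __init__(self, table):
        self.table = table  # path -> {"location": ..., "size": ...}

    def handle(self, request_json: str) -> str:
        request = json.loads(request_json)
        if request.get("op") != "lookup":
            return json.dumps({"status": "error", "reason": "unsupported op"})
        entry = self.table.get(request["path"])
        if entry is None:
            return json.dumps({"status": "not_found"})
        return json.dumps({"status": "ok", **entry})


class ToyAgent:
    """Runs on a client server and asks the metadata server where a file's data lives."""

    def __init__(self, server):
        self.server = server

    def locate(self, path):
        request = json.dumps({"op": "lookup", "path": path})
        reply = json.loads(self.server.handle(request))
        return reply["location"] if reply["status"] == "ok" else None


server = ToyMetadataServer({"/reports/q3.txt": {"location": "disk1:42", "size": 2048}})
agent = ToyAgent(server)
print(agent.locate("/reports/q3.txt"))  # disk1:42
```

Publishing the message format is what would let third parties write either side of such an exchange, which is the point of IBM describing the protocols.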

IBM hopes its strategy will make Storage Tank widely used. "Our goal for Storage Tank is nothing short of world domination," Pease said, only partly joking.

IBM's Almaden labs are working on several technologies besides Storage Tank and showed off several of them during a recent media tour of the facility.

IBM expects only about 60 percent to 65 percent of its research lab projects to reach the product stage, Menon said. "We don't want to be more successful, because if we're not being crazy enough, we're not being challenged enough," he said.

IBM has several other projects stewing at the lab:

• A project code-named SledRunner is designed to give priority access to programs that need fast response times from hard drive arrays. Low-priority jobs often take up a storage system's time when they could be postponed by a fraction of a second. The SledRunner name stems from the acronym SLE (service level enforcement).

• IBM is taking a crack at a technology that it acknowledges has flopped in the past--arranging blocks of data on a hard drive in the order they will be needed, which minimizes the time the hard drive has to spend moving mechanical components to grab the next tidbit of information. This project, called Automatic Locality Improving Storage (ALIS), relies on arranging data after monitoring the order in which it's actually used as a computing process runs.

• For a future version of Storage Tank, IBM plans a front-end "gateway" so that remote computers can tap into a tank over a network. It's similar in concept to EMC's Celerra product or Network Appliance's products in a deal with Hitachi Data Systems.

• IBM is working on a "semantic file system," software to make it easier to find a specific file. Current file systems store files in an ever-more-complex cascade of directories, but a better method, such as indexing the contents of files, could help people find what they need faster--"a sort of Google for enterprise file systems," Menon said.

• "Differential remote copy" is a technology to speed up copying data from one storage system to an identical copy at a distant site, which protects against disasters such as earthquakes. The technique sends only the data that has changed since the last update, Menon said.
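The SledRunner idea of serving urgent work first can be illustrated with a plain priority queue. This is a generic sketch, not SledRunner's actual mechanism, which the article does not describe; the request names and numeric priorities are hypothetical.

```python
import heapq
import itertools


class PriorityIOQueue:
    """Serves pending I/O requests highest priority first; ties are first come, first served."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # breaks ties in arrival order

    def submit(self, request_name, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), request_name))

    def next_request(self):
        _, _, name = heapq.heappop(self._heap)
        return name


q = PriorityIOQueue()
q.submit("nightly-report", priority=1)
q.submit("order-lookup", priority=10)
q.submit("log-archive", priority=1)
order = [q.next_request() for _ in range(3)]
print(order)  # ['order-lookup', 'nightly-report', 'log-archive']
```

The low-priority jobs still run; they simply wait the short time it takes to finish the request that has a response-time target.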
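The ALIS approach of laying out blocks in the order they are actually used can be sketched by recording an access trace and rearranging blocks to match it. This is a deliberate simplification that uses the first-access order from a single trace; the real ALIS heuristics are not described in the article, and the block numbers are hypothetical.

```python
def layout_from_trace(trace):
    """Place blocks physically in the order they were first accessed in the trace."""
    layout, seen = [], set()
    for block in trace:
        if block not in seen:
            seen.add(block)
            layout.append(block)
    return layout


def count_seeks(trace, layout):
    """Count transitions where the next block is not physically right after the previous one."""
    position = {block: slot for slot, block in enumerate(layout)}
    return sum(
        1 for prev, cur in zip(trace, trace[1:])
        if position[cur] != position[prev] + 1
    )


trace = [7, 2, 9, 4]            # order a program actually read the blocks
original = list(range(10))      # blocks stored in numeric order
improved = layout_from_trace(trace)
print(count_seeks(trace, original), count_seeks(trace, improved))  # 3 0
```

Each avoided seek saves the time the drive would otherwise spend moving its heads, which is the benefit ALIS is after.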
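Finding files by what they contain, rather than by where they sit in a directory tree, is commonly done with an inverted index that maps each word to the files containing it. The sketch below is that generic technique, not IBM's semantic file system, whose design the article does not describe; the file paths and contents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class ContentIndex:
    """Maps each word to the set of file paths whose contents include it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_file(self, path, text):
        for word in tokenize(text):
            self._postings[word].add(path)

    def search(self, query):
        """Return paths containing every query word, sorted for stable output."""
        words = tokenize(query)
        if not words:
            return []
        matches = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(matches)


idx = ContentIndex()
idx.add_file("/sales/q3.txt", "Quarterly revenue report for Q3")
idx.add_file("/eng/design.txt", "Storage Tank metadata design")
idx.add_file("/sales/q4.txt", "Q4 revenue forecast")
print(idx.search("revenue"))  # ['/sales/q3.txt', '/sales/q4.txt']
```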
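The "send only what changed" idea behind differential remote copy can be sketched by comparing per-block hashes against those recorded at the last update. This assumes a fixed-size volume and a hash comparison; tracking writes with a changed-block map is another common way to find the differences, and the article does not say which method IBM uses. The tiny block size is for readability only.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for the example


def blocks(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def digests(data: bytes):
    """Per-block hashes, kept by the primary site after each update."""
    return [hashlib.sha256(b).digest() for b in blocks(data)]


def delta(new_data: bytes, remote_digests):
    """Blocks whose hash differs from the copy the remote site already holds."""
    return [(i, b) for i, b in enumerate(blocks(new_data))
            if hashlib.sha256(b).digest() != remote_digests[i]]


def apply_delta(remote_data: bytes, changes):
    """Overwrite only the changed blocks at the remote site."""
    out = bytearray(remote_data)
    for i, b in changes:
        out[i * BLOCK_SIZE:i * BLOCK_SIZE + len(b)] = b
    return bytes(out)


remote = b"AAAABBBBCCCCDDDD"               # copy held at the distant site
remote_digests = digests(remote)
primary_new = b"AAAAXXXXCCCCDDDY"          # primary after some writes
changes = delta(primary_new, remote_digests)
print(changes)  # [(1, b'XXXX'), (3, b'DDDY')]
```

Here only 8 of the 16 bytes cross the network, and applying them with `apply_delta(remote, changes)` brings the remote copy back in step with the primary.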