Start-up Evergrid doesn't claim to know exactly what a computer is thinking. But it is banking on the premise that customers will benefit by periodically capturing a server's state--the data a computer is processing at any one time.
Initially, Evergrid's "checkpointing" software will enable administrators to easily start, stop and switch jobs on Linux computers, CEO Dave Anderson said. Eventually, the company hopes its software will ensure transactions won't get lost even when servers crash.
Evergrid, based in Fremont, Calif., announced Monday at the weeklong SC06 supercomputing show in Tampa, Fla., that its software is available to some customers, with general availability planned for January.
The software initially is geared toward high-performance computing customers, who typically run jobs in a series of batches. Those customers can save the state of a particular job--even when it's running across a cluster of hundreds or thousands of computers--and then switch among different tasks.
"We do a checkpoint; we stop whatever number of jobs to run the high-priority job, then resume the other jobs," Anderson said.
Checkpointing also can ensure computing tasks are completed. If a computer fails, another machine can load the last saved state of the failed machine and resume where it left off. And Evergrid's software works with either physical or virtual machines, allowing applications to be moved back and forth.
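The save-and-resume idea can be illustrated with a toy, application-level sketch. It is not Evergrid's implementation, which the company has not detailed here; the function names, the state layout and the simulated crash are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def run_job(items, checkpoint_path, fail_at=None):
    """Sum `items`, saving progress after every step.

    If a checkpoint file already exists, the job resumes from it
    instead of starting over. `fail_at` is a hypothetical hook that
    simulates a machine crash at a given index.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"next_index": 0, "total": 0}

    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temporary file, then rename, so a crash during the
        # write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


def demo():
    """Crash partway through, then let a 'second machine' finish the job."""
    items = list(range(1, 11))
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        try:
            run_job(items, path, fail_at=6)
        except RuntimeError:
            pass  # the first machine has failed
        return run_job(items, path)  # resumes from the last saved state
```

In this sketch, the second call to `run_job` picks up at the item where the first call failed, so no completed work is repeated.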
There is a performance penalty to checkpointing, but the company argues that the hit is less than 5 percent. Using a short checkpoint interval means the data difference from one checkpoint to the next is smaller, which reduces the cost of checkpointing frequently.
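The relationship between checkpoint interval and data difference can be sketched with a simple model. This assumes memory is a set of numbered pages and that a checkpoint stores only the pages changed since the previous one. The page model and the `delta`, `apply_delta` and `simulate` helpers are hypothetical and are not drawn from Evergrid's software.

```python
def delta(previous, current):
    """Return only the pages that changed or were added since `previous`.

    Pages that were removed are not tracked in this simplified model.
    """
    return {page: data for page, data in current.items()
            if previous.get(page) != data}


def apply_delta(base, changes):
    """Rebuild a later state from an earlier checkpoint plus its delta."""
    restored = dict(base)
    restored.update(changes)
    return restored


def simulate(steps, interval, pages=100):
    """Write one page per step; record the delta size at each checkpoint."""
    memory = {p: 0 for p in range(pages)}
    last_checkpoint = dict(memory)
    sizes = []
    for step in range(1, steps + 1):
        memory[step % pages] = step  # the workload touches one page
        if step % interval == 0:
            changes = delta(last_checkpoint, memory)
            sizes.append(len(changes))
            last_checkpoint = apply_delta(last_checkpoint, changes)
    return sizes
```

With this workload, `simulate(100, 5)` records 20 checkpoints of 5 changed pages each, while `simulate(100, 50)` records 2 checkpoints of 50 pages each. Each individual checkpoint is cheaper when the interval is short, which is the effect the company describes.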
In the second half of 2007, the company plans to take its software to the general business market, Anderson said. The goal is to ensure that transactions aren't lost, even as running applications are moved from one machine to another. That version of the software will log a server's communications and the commands made to the operating system, letting those interactions be replayed onto a new system that's restored from a checkpoint.
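A rough sketch of the log-and-replay approach follows. It assumes a deterministic server whose state depends only on the commands it receives. The `LedgerServer` and `Recorder` classes are invented for illustration and do not reflect Evergrid's design.

```python
import copy


class LedgerServer:
    """Hypothetical server that applies (account, amount) commands."""

    def __init__(self):
        self.balances = {}

    def handle(self, command):
        account, amount = command
        self.balances[account] = self.balances.get(account, 0) + amount


class Recorder:
    """Keeps the last checkpoint plus a log of every command since then."""

    def __init__(self, server):
        self.server = server
        self.snapshot = copy.deepcopy(server.balances)
        self.log = []

    def checkpoint(self):
        self.snapshot = copy.deepcopy(self.server.balances)
        self.log.clear()  # older commands are already in the snapshot

    def handle(self, command):
        self.log.append(command)  # record first, then apply
        self.server.handle(command)

    def recover(self):
        """Build a replacement server from the checkpoint and replay the log."""
        replacement = LedgerServer()
        replacement.balances = copy.deepcopy(self.snapshot)
        for command in self.log:
            replacement.handle(command)
        return replacement


def demo():
    recorder = Recorder(LedgerServer())
    recorder.handle(("alice", 100))
    recorder.checkpoint()
    recorder.handle(("bob", 50))      # arrives after the checkpoint
    recorder.handle(("alice", -30))   # arrives after the checkpoint
    # Suppose the original server crashes here.
    return recorder.recover().balances
```

In this sketch, the two transactions that arrived after the checkpoint survive the crash because they are replayed from the log. Replay reproduces the original state only if the server handles each command deterministically.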
The company hasn't set pricing yet but plans to charge annual subscriptions based on the number of processor cores, with aggressive volume discounts, Anderson said.
Current customers include an unnamed financial services company and the University of Oklahoma, Anderson said.