Supercomputer Beagle can analyze 240 whole genomes in two days

University of Chicago researchers say they used one quarter of the Beagle's operating capacity in conjunction with commercially available software packages to analyze raw sequencing data.

Elizabeth Armstrong Moore
Researchers say it would take a single 2.1 GHz CPU roughly 47 years to do what the Beagle completed in 50 hours. (Credit: University of Chicago)

The time and cost of sequencing entire human genomes have dropped dramatically in the 21st century. Unfortunately, it can take months to analyze the results, given that there are 3.2 billion base pairs to decode. (For that reason, it has become popular to focus instead on the less than 2 percent of the genome that codes for proteins, an approach called exome sequencing.)

Now researchers at the University of Chicago report in the journal Bioinformatics that one of the world's fastest supercomputers devoted to life sciences can analyze 240 full genomes in 50 hours. They estimate that the same task would have taken a single 2.1 GHz CPU more than 47 years.

To be clear, tapping the extraordinary powers of the Beagle (named after the ship that carried the then-22-year-old Charles Darwin on his scientific voyage around the world in 1831) isn't going to keep the cost of whole genome analysis down, and low cost is of course imperative if the tech is to be clinically useful. It does, however, demonstrate that genome analysis has the potential to be far, far faster than it is today.

"The supercomputer can process many genomes simultaneously rather than one at a time," first author Megan Puckelwartz, a postdoctoral fellow at the University of Chicago, said in a school news release. "It converts whole genome sequencing, which has primarily been used as a research tool, into something that is immediately valuable for patient care."

The researchers report that they used one quarter of the Beagle's operating capacity in conjunction with commercially available software packages to analyze raw sequencing data from 61 human genomes, and that this approach dramatically improved not only speed but also accuracy. This will presumably help reduce the cost of both sequencing and analyzing whole genomes down the road.

In fact, study author Elizabeth McNally, director of the Cardiovascular Genetics clinic at the University of Chicago Medicine, said in the news release that if the cost of analysis can be moved into the $1,000 range, which is the current target on the sequencing side, it will make sense to analyze entire genomes instead of just a small fraction of them.

Exome sequencing can help spot an estimated 85 percent of the mutations that contribute to diseases. Affordable whole-genome analysis would mean the other 15 percent won't have to be ignored to keep time and costs manageable.

McNally says that in her own clinic, for example, analyzing a whole genome will have an immediate impact: "In the early days we would test one to three genes. In 2007, we did our first five-gene panel. Now we order 50 to 70 genes at a time, which usually gets us an answer. At that point, it can be more useful and less expensive to sequence the whole genome." Plus, she adds, it often makes sense to do several sequences for certain families: "We start genetic testing with the patient, but when we find a significant mutation we have to think about testing the whole family to identify individuals at risk."

The implications go beyond diagnosis. Spotting genetic mutations early, before a person is symptomatic, can help scientists not only diagnose diseases but also learn more about their very early phases as well as treatment options. "In this setting," McNally adds, "each patient is a big-data problem."