
DNA Gets Artificial Upgrade to Store Humanity's Boundless Digital Data

Scientists add seven new letters to the existing nucleotide alphabet, opening the door for extreme levels of data storage capacity.

Monisha Ravisetti Former Science Writer

Could all of humanity's data be transferred to synthetic DNA strands?

Getty/Yuichiro Chino

In the last few years, humanity has created more data than in all of history combined -- a remarkable level of output with no signs of slowing down. But where are we going to put all of it? 

Though scientists are constantly increasing hard drive sizes to hold humanity's information, and many of them believe this could be done indefinitely, some suggest these efforts will eventually be outrun by the exponential rate at which we generate data. In response to such worries, scientists have been looking into a rather unique solution -- storing files, photos and documents on nature's very own information database: DNA. 

DNA is dense enough to hold an unfathomable amount of data in extremely small spaces. After all, the double helix strands protect our bodies' entire blueprints while tucked inside cell nuclei merely 10 micrometers wide. Plus, DNA is naturally abundant and can withstand very harsh conditions on Earth. Scientists can even retrieve genetic information from DNA that's several centuries old.

"Every day, several petabytes of data are generated on the internet. Only one gram of DNA would be sufficient to store that data. That's how dense DNA is as a storage medium," Kasra Tabatabaei, a researcher at the Beckman Institute for Advanced Science and Technology, said in a statement. 

Tabatabaei is the co-author of a new study, published in last month's edition of the journal Nano Letters, that may well take the DNA data storage concept to new heights. Essentially, the study team is the first to artificially extend the DNA alphabet, which could allow for massive storage capacities and accommodate an extreme level of digital data.

Before we dive into the details, here's a quick biology recap. 

DNA encodes genetic information with four molecules called nucleotides. There's adenine, guanine, cytosine and thymine, or A, G, C and T. In a sense, DNA has a four-letter alphabet, and different letter combinations represent different bits of data. With just these four letters, nature can encode the genetic information of every single living organism. So, theoretically, we should be able to store a ton of digital data with this crew of letters, too. But what if we had a longer alphabet? Presumably, that'd give us a much deeper capacity.

Following this line of thought, the team behind the new study artificially added seven new letters to the DNA repertoire. "Imagine the English alphabet," Tabatabaei said. "If you only had four letters to use, you could only create so many words. If you had the full alphabet, you could produce limitless word combinations. That's the same with DNA. Instead of converting zeroes and ones to A, G, C and T, we can convert zeroes and ones to A, G, C, T and the seven new letters in the storage alphabet."
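To get a feel for why a longer alphabet helps, here is a minimal Python sketch. It is not the study's encoding scheme, which the article doesn't describe. It simply treats a file as one big number and writes that number out in base 4 (the natural letters) or base 11 (the extended alphabet). The symbols `1` through `7` are made-up placeholders for the seven new letters, and the function names are hypothetical.

```python
import math

NATURAL = "ACGT"
# Placeholder symbols for the seven added letters (an assumption for
# illustration; these are not the names used in the study).
EXTENDED = NATURAL + "1234567"


def encode(data: bytes, alphabet: str) -> str:
    """Write the bytes as one big integer in base len(alphabet)."""
    base = len(alphabet)
    # Prefix a 0x01 byte so leading zero bytes survive the round trip.
    n = int.from_bytes(b"\x01" + data, "big")
    symbols = []
    while n:
        n, remainder = divmod(n, base)
        symbols.append(alphabet[remainder])
    return "".join(reversed(symbols))


def decode(strand: str, alphabet: str) -> bytes:
    """Invert encode(): rebuild the integer, then drop the 0x01 prefix."""
    base = len(alphabet)
    n = 0
    for letter in strand:
        n = n * base + alphabet.index(letter)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]


def bits_per_letter(alphabet: str) -> float:
    """Upper bound on the information one letter can carry."""
    return math.log2(len(alphabet))


message = b"DNA"
short_strand = encode(message, EXTENDED)
long_strand = encode(message, NATURAL)
print(len(long_strand), "letters with 4 symbols")
print(len(short_strand), "letters with 11 symbols")
```

In this toy scheme, each natural letter carries at most 2 bits while each letter of an 11-symbol alphabet carries up to about 3.46 bits, so the same file fits on a shorter strand. Real DNA storage systems also have to deal with synthesis and read errors, which this sketch ignores.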

To ensure that information encoded in these 11 letters can be retrieved on demand, the researchers also developed a novel mechanism that precisely reads back the synthetic DNA's data. The system uses deep-learning algorithms to tell the human-made DNA letters apart from the natural ones, and to distinguish each individual letter from all the others.

All in all, it provides an extremely clear readout of the DNA's letter combinations, thereby unveiling any and all information hiding inside.

"We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly," Chao Pan, a graduate student at the University of Illinois Urbana-Champaign and a co-author on this study, said in a statement. "The deep learning framework as part of our method to identify different nucleotides is universal, which enables the generalizability of our approach to many other applications."

DNA isn't the only up-and-coming, innovative way of holding our compounding data. A Harvard University research team, for instance, is working on using neon dyes to encode invaluable information. Still, Tabatabaei remarked, "DNA is nature's original data storage system. We can use it to store any kind of data: images, video, music -- anything."