A team at the University of Washington has moved DNA data storage forward a significant step by making the information both searchable and directly accessible.
They encoded four digital images in DNA, and then retrieved them perfectly. A paper released last week details the effort.
The past few years have seen important strides in using DNA to store digital data. In 2012, Harvard researchers demonstrated that 5.5 petabits (5,500 terabits) of data can be stored in a single cubic millimetre of DNA. In 2013, researchers from the European Bioinformatics Institute showed that data could be retrieved by sequencing the DNA.
"Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works -- it's very, very compact and very durable," paper co-author Luis Ceze, associate professor of computer science and engineering, said last week in a statement.
"We're essentially repurposing it to store digital data -- pictures, videos, documents -- in a manageable way for hundreds or thousands of years."
To store data as DNA, the binary code needs to be converted into the four nucleotides that make up DNA. The DNA is then synthesised with the data encoded. Replicating this DNA is a relatively easy process, which is how Technicolor stored 1 million copies of the same film in a small vial of DNA.
The other really cool part is the direct access (also known as random access) and searchability of the data encoded on the DNA, which eliminates the previous need to sequence the entire DNA to find the information.
"Suppose you have a large amount of information encoded in DNA in a big pool -- think petabytes," Ceze explained. "That information is stored in a large collection of small DNA molecules. How would you read just a small specific part of the data, say a video in a large video collection? Without random access you need to access the whole thing until you find what you want. With random access, you can access the desired data directly."
To achieve this, the team used something called Huffman coding, which is usually used in lossless data compression.
At the moment, DNA data storage is still prohibitively expensive, and the process of DNA synthesis is far from perfect. However, this combination of error correction and random access pushes us further into a future where the equivalent of a Walmart of digital data could be stored in a space the size of a sugar cube.