Researchers store, search, retrieve images in DNA

DNA is the future of data storage, and it just got more viable as a team of researchers has found a way to make the stored data directly accessible.

Michelle Starr Science editor

Michelle Starr is CNET's science editor, and she hopes to get you as enthralled with the wonders of the universe as she is. When she's not daydreaming about flying through space, she's daydreaming about bats.

April 11, 2016 12:37 a.m. PT

The pink smear in the test tube could hold up to 10,000 gigabytes of data.
Tara Brown Photography/University of Washington

A team at the University of Washington has moved DNA data storage forward a significant step by making the information both searchable and directly accessible.

They encoded four digital images in DNA, and then retrieved them perfectly. A paper released last week details the effort.

The past few years have seen important strides in using DNA to store digital data. In 2012, Harvard researchers demonstrated that 5.5 petabits (5,500 terabits) of data can be stored in a single cubic millimetre of DNA. In 2013, researchers from the European Bioinformatics Institute showed that data could be retrieved by sequencing the DNA.

"Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works -- it's very, very compact and very durable," paper co-author Luis Ceze, associate professor of computer science and engineering, said last week in a statement.

"We're essentially repurposing it to store digital data -- pictures, videos, documents -- in a manageable way for hundreds or thousands of years."

To store data as DNA, the binary code needs to be converted into the four nucleotides that make up DNA. The DNA is then synthesised with the data encoded. Replicating this DNA is a relatively easy process, which is how Technicolor stored 1 million copies of the same film in a small vial of DNA.
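As a rough illustration of that conversion, here is a minimal Python sketch that maps every two bits to one of the four nucleotides (A, C, G, T) and back again. The mapping table and the function names `encode` and `decode` are assumptions made for this example; it is the simplest possible scheme, not the one the University of Washington team used.

```python
# Hypothetical mapping, for illustration only: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotide letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of nucleotide letters back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four letters, so a file of n bytes needs 4n nucleotides before any addressing or error correction is added.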

Here are three of the images the team encoded into DNA.
University of Washington

The other really cool part is the direct access (also known as random access) and searchability of the data encoded in the DNA, which eliminates the previous need to sequence all of the DNA to find the information.

"Suppose you have a large amount of information encoded in DNA in a big pool -- think petabytes," Ceze explained. "That information is stored in a large collection of small DNA molecules. How would you read just a small specific part of the data, say a video in a large video collection? Without random access you need to access the whole thing until you find what you want. With random access, you can access the desired data directly."
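The difference Ceze describes can be sketched in software terms. The Python below is only an analogy, and every name in it (the `pool` list, the file names, the helper functions) is invented for this example: each stored strand carries an address tag, and the two read functions report how many strands had to be examined to recover one file. In the lab the selection happens chemically rather than with a lookup table.

```python
# Hypothetical pool: each entry is (address tag, payload strand).
pool = [
    ("cat.jpg", "ACGTAC"),
    ("opera.jpg", "TTGACA"),
    ("sydney.jpg", "GGCATA"),
    ("cat.jpg", "CCATGA"),
]


def read_everything(pool, wanted):
    """Without random access: examine every strand and keep the matches."""
    examined = 0
    found = []
    for tag, payload in pool:
        examined += 1
        if tag == wanted:
            found.append(payload)
    return found, examined


def build_index(pool):
    """Group the strands by address tag so one tag can be pulled out directly."""
    index = {}
    for tag, payload in pool:
        index.setdefault(tag, []).append(payload)
    return index


def read_directly(index, wanted):
    """With random access: touch only the strands that carry the wanted tag."""
    found = index.get(wanted, [])
    return found, len(found)


print(read_everything(pool, "cat.jpg"))             # (['ACGTAC', 'CCATGA'], 4)
print(read_directly(build_index(pool), "cat.jpg"))  # (['ACGTAC', 'CCATGA'], 2)
```

Both calls return the same two strands, but the sequential read examined all four entries in the pool while the direct read touched only two.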

To achieve this, the team used something called Huffman coding, which is usually used in lossless data compression.
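For readers who have not met it, Huffman coding gives frequent symbols short codes and rare symbols long ones, with no code being the start of another. The Python below is a generic textbook sketch that produces binary codes; the function name `huffman_codes` and the sample string are assumptions for this example, and the article does not say which variant the team used or how they applied it to DNA.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a binary Huffman code table for the symbols in text."""
    counts = Counter(text)
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}

    # Heap entries are (weight, tie-breaker, partial code table).
    # The unique tie-breaker means the dicts are never compared.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1

    return heap[0][2]


codes = huffman_codes("aaaabbc")
encoded = "".join(codes[ch] for ch in "aaaabbc")
print(codes)         # 'a' gets a 1-bit code, 'b' and 'c' get 2-bit codes
print(len(encoded))  # 10 bits, versus 56 bits at 8 bits per character
```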

At the moment, DNA data storage is still prohibitively expensive, and the process of DNA synthesis is far from perfect. However, this combination of error correction and random access pushes us further into a future where the digital data that would fill a data centre the size of a Walmart could be stored in a space the size of a sugar cube.