DNA as a robust data storage medium

Biomolecules store digital information reliably

The base code of DNA is ideally suited for data storage. © thinkstock

DNA instead of a hard drive: researchers have stored large amounts of digital data in DNA. They encoded two megabytes of compressed information in the biomolecule and then read it back without error. What makes this special: the team's method is not only efficient but also extremely robust. Even from DNA that had been copied many times over, the original data could be recovered perfectly.

Humanity is producing more and more data. At the same time, however, digital forgetting looms, because today's storage media are rather short-lived. Because technologies keep changing and data carriers lack durability, archives of digital data currently have to be copied at regular intervals. Otherwise, what is on an ordinary hard drive could be unreadable in a few decades.

Long lasting and never outdated

To avoid this problem in the future, scientists are now turning to one of nature's ancient storage media: DNA. Its base code does more than store the genetic information of living creatures outstandingly well. It is also well suited to storing digital data, because nature's code is very similar to the binary language of computers. The only difference: on a hard disk the data are represented by zeros and ones, in DNA by the bases A, C, T and G.
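With four bases to choose from, each base can in principle stand for two bits. The following is a minimal sketch of that idea in Python. The particular assignment (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; the article does not say which mapping the researchers used.

```python
# Hypothetical two-bits-per-base mapping (an assumption, not the researchers' scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Translate a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"DNA")
    print(strand)
    print(dna_to_bytes(strand))
```

Each byte becomes four bases, so a two-megabyte file would need at least eight million bases under this naive scheme, before any redundancy is added.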

Compared with solid-state storage, however, the biomolecule has a distinct advantage: it can last for thousands of years without damage if stored under the right conditions. "Unlike tapes and CDs, this storage medium will never become outdated, and if it does, we'll have other problems," says Yaniv Erlich of the New York Genome Center.

Two megabytes as a letter code

Erlich and his colleague Dina Zielinski have now impressively demonstrated how reliably the enormous amounts of data in our society can be stored in DNA. They encoded six files as DNA, including a text by information theorist Claude Shannon, a French film, a computer virus and a complete computer operating system.

To convert these files, which in compressed form amount to two megabytes of data, the researchers used a special algorithm called a fountain code. It bundles the information randomly into small packets, which can later be reassembled in the correct order. The system encodes the information several times over. In this way no data is lost, even if some nucleotides of the resulting DNA storage are damaged. What makes it special: despite the added redundant information, the coding process is very efficient.
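The principle of a fountain code can be illustrated with a toy example. The sketch below is not the researchers' implementation: the chunk size, the simple choice of one to three chunks per packet, the loss pattern and all function names are assumptions made for illustration. Each packet ("droplet") is the XOR of a pseudo-random selection of data chunks, identified only by a seed. The receiver keeps collecting droplets until the chunks can be untangled, no matter which droplets were lost along the way.

```python
import random


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_chunks(data: bytes, size: int) -> list:
    """Cut data into equal chunks, padding the last one with zero bytes."""
    count = -(-len(data) // size)  # ceiling division
    padded = data.ljust(count * size, b"\0")
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def droplet_indices(seed: int, n_chunks: int) -> set:
    """Chunk selection for a seed; sender and receiver derive the same set."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_chunks))  # toy degree distribution (assumption)
    return set(rng.sample(range(n_chunks), degree))


def make_droplet(chunks: list, seed: int) -> tuple:
    """One droplet: the seed plus the XOR of the chunks that seed selects."""
    payload = bytes(len(chunks[0]))
    for i in droplet_indices(seed, len(chunks)):
        payload = xor(payload, chunks[i])
    return seed, payload


def decode(droplets: list, n_chunks: int):
    """Peeling decoder: returns the list of chunks, or None if more droplets are needed."""
    pending = [(droplet_indices(s, n_chunks), p) for s, p in droplets]
    solved = {}
    progress = True
    while progress and len(solved) < n_chunks:
        progress = False
        remaining = []
        for indices, payload in pending:
            for i in list(indices):  # strip out chunks that are already known
                if i in solved:
                    payload = xor(payload, solved[i])
                    indices = indices - {i}
            if len(indices) == 1:  # exactly one unknown chunk left: solved
                (i,) = indices
                solved[i] = payload
                progress = True
            elif len(indices) > 1:
                remaining.append((indices, payload))
        pending = remaining
    if len(solved) < n_chunks:
        return None
    return [solved[i] for i in range(n_chunks)]


def transmit(message: bytes, chunk_size: int = 4, lose_every: int = 3) -> bytes:
    """Send droplets, lose every `lose_every`-th one, stop once decoding succeeds."""
    chunks = split_chunks(message, chunk_size)
    received = []
    for seed in range(1, 10_000):
        if seed % lose_every == 0:  # simulated damage: this droplet never arrives
            continue
        received.append(make_droplet(chunks, seed))
        recovered = decode(received, len(chunks))
        if recovered is not None:
            return b"".join(recovered).rstrip(b"\0")
    raise RuntimeError("not enough droplets to decode")


if __name__ == "__main__":
    print(transmit(b"Stored in DNA with a fountain code"))
```

Because any sufficiently large set of droplets will do, losing a third of them in this toy run only means collecting a few more. The same property is what makes damaged or missing DNA strands tolerable.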

Error-free despite repeated copying

In total, Erlich and Zielinski thus generated a code 72,000 DNA strands long, with each strand consisting of 200 bases. They sent this sequence of letters to a laboratory, which used it as a template to synthesize DNA molecules. Then came the exciting moment: could the original information be recovered from these molecules?
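A rough back-of-the-envelope check shows what these figures imply. The calculation below takes the article's rounded numbers at face value and assumes that one megabyte equals one million bytes, so the result is only an estimate and not a density reported by the researchers.

```python
# Rounded figures from the article; 1 MB = 10**6 bytes is an assumption.
strands = 72_000
bases_per_strand = 200
data_bits = 2 * 10**6 * 8

total_bases = strands * bases_per_strand
bits_per_base = data_bits / total_bases

print(total_bases)              # 14400000
print(round(bits_per_base, 2))  # 1.11
```

With four possible bases, the ceiling is two bits per base. On these rounded figures the stored data come to a little over one bit per base, and part of the gap is presumably taken up by the redundant information that the coding adds.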

In fact, the researchers were able to decode their data correctly using software. Even more remarkable, however: the same success was achieved with copies of the DNA. The scientists amplified the biomolecules again and again using the polymerase chain reaction. These copies, as well as copies of those copies and so on, could be decoded without error. This shows that coding with a fountain code is extremely robust, the team said.

Expensive procedure

The researchers also showed that their method can store information not only safely but also efficiently and in large quantities. One gram of DNA is enough to hold 215 petabytes of data. The only catch in the storage process: the cost. Synthesizing the biomolecules alone cost $7,000. The researchers spent another $2,000 to read out their data. (Science, 2017; doi: 10.1126/science.aaj2038)

(Columbia University School of Engineering and Applied Science, 06.03.2017 - DAL)