Scientists store 214 petabytes of data in a gram of DNA
Scientists in New York have stored more data in a single gram of DNA than ever before. This breakthrough in data science could change how we keep and retrieve information in the future. Using DNA for digital storage is appealing in theory because DNA is ultra-compact and can store, replicate and transmit massive amounts of information. The idea was first demonstrated in 2012, when Harvard University geneticist George Church published a paper describing how he and his colleagues encoded 650 kilobytes of data into DNA strands that together held millions of copies of Church’s 52,000-word book, Regenesis.
Indeed, the proof of concept was a groundbreaking achievement, one that other engineers and biologists would expand upon over the years. But the methods used by Church and his team were inefficient, as they could store only 1.28 petabytes of data per gram of DNA.
Now, scientists from Columbia University and the New York Genome Center have created the highest-density DNA data storage method to date, far surpassing the density achieved in the original work by Church and his team.
Led by Yaniv Erlich, the team successfully stored and retrieved data at a density of 214 petabytes (214 million gigabytes) per gram of DNA.
The stored data comprised six files: an old French film called The Arrival of a Train at La Ciotat Station, a 1948 scientific research paper, a computer operating system, a $50 Amazon gift card, a photo, and a computer virus. How did Erlich and his team do it? They took advantage of the structure of DNA molecules, which look like twisting ladders whose rungs are built from four bases denoted by the letters A, C, G and T.
The sequence of these letters typically acts as a building block for living things, but if the binary digits 0 and 1 are mapped onto the four letters, DNA molecules can encode almost anything.
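The idea of translating between binary digits and the letters A, C, G and T can be illustrated with a short Python sketch. The two-bits-per-letter table below is an assumption chosen for illustration; the article does not say which mapping Erlich's team used, and the function names are hypothetical.

```python
# Hypothetical mapping: each pair of bits becomes one DNA letter.
# This table is an illustrative assumption, not the team's published scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With this table every byte becomes four letters, so the two bytes of "Hi" produce an eight-letter strand.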
Of course, the process is not that easy because not all DNA sequences are robust enough, said Erlich. What’s more, not all data stored in DNA can be retrieved successfully.
To solve these issues, Erlich and his colleagues used a fountain code to protect the data. This DNA fountain provides an unlimited number of clues to the data rather than storing the data itself.
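To give a rough sense of how a fountain code produces "clues" instead of the data itself, here is a minimal Python sketch of the general technique. It is not the team's implementation: the segment size, the toy degree distribution and every function name are assumptions made for illustration. Each clue (called a droplet here) is the XOR of a few randomly chosen segments of the file, and the random choice is derived from a seed so the decoder can repeat it.

```python
import random


def split_segments(data: bytes, size: int) -> list[bytes]:
    """Cut data into equal-size segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def choose_indices(seed: int, k: int) -> list[int]:
    """Pick which segments a droplet combines; repeatable from the seed."""
    rng = random.Random(seed)
    degree = rng.choice([1, 1, 2, 2, 2, 3, 4])  # toy distribution (assumption)
    return rng.sample(range(k), min(degree, k))


def make_droplet(seed: int, segments: list[bytes]) -> tuple[int, bytes]:
    """A droplet is a seed plus the XOR of the segments that seed selects."""
    payload = bytes(len(segments[0]))
    for i in choose_indices(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def decode_droplets(droplets, k):
    """Peeling decoder: returns the k segments, or None if more droplets are needed."""
    recovered = {}
    pending = [(set(choose_indices(seed, k)), payload) for seed, payload in droplets]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Strip out every segment that is already known.
            for i in list(indices):
                if i in recovered:
                    payload = xor_bytes(payload, recovered[i])
                    indices = indices - {i}
            if len(indices) == 1:
                (i,) = indices
                recovered[i] = payload  # one unknown left, so the payload is that segment
                progress = True
            elif indices:
                still_pending.append((indices, payload))
        pending = still_pending
    if len(recovered) < k:
        return None
    return [recovered[i] for i in range(k)]


message = b"Hello, DNA storage!"
segments = split_segments(message, size=4)
droplets, recovered, seed = [], None, 0
# Keep drawing droplets from the "fountain" until the message can be rebuilt.
while recovered is None and seed < 1000:
    droplets.append(make_droplet(seed, segments))
    recovered = decode_droplets(droplets, len(segments))
    seed += 1
restored = b"".join(recovered)[:len(message)]
print(restored == message, "after", len(droplets), "droplets")
```

Because new droplets can be generated indefinitely, any droplet that would make a fragile DNA sequence could in principle be thrown away and replaced by another, and the decoder needs only enough surviving droplets, not any particular ones.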