Data generation is growing at an unprecedented rate. In every minute of 2018, Google processed 3.88 million searches and YouTube users watched 4.33 million videos, while people sent more than 159 million emails, posted 473,000 tweets, and shared 49,000 photos on Instagram, according to software company Domo. By 2020, an estimated 1.7 megabytes of data were being created every second by each person worldwide. Assuming a global population of 7.8 billion, that translates to roughly 418 zettabytes a year, equivalent to 418 billion one-terabyte hard drives.
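The 418-zettabyte figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes decimal (SI) units, so that 1 MB is 10^6 bytes and 1 ZB is 10^21 bytes, and a 365-day year. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the annual data estimate.
# Assumptions: SI units (1 MB = 10**6 bytes) and a 365-day year.
MB = 10**6
TB = 10**12
ZB = 10**21

bytes_per_person_per_second = 1.7 * MB
population = 7.8e9
seconds_per_year = 365 * 24 * 60 * 60

# Total bytes created worldwide in one year.
bytes_per_year = bytes_per_person_per_second * population * seconds_per_year

zettabytes_per_year = bytes_per_year / ZB
one_tb_drives = bytes_per_year / TB

print(f"{zettabytes_per_year:.0f} ZB per year")
print(f"{one_tb_drives / 1e9:.0f} billion one-terabyte drives")
```

Under these assumptions both printed values come out at about 418, matching the estimate above.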
This massive data explosion presents two critical challenges. First, our current magnetic and optical storage systems typically deteriorate within a century, if not sooner. Second, maintaining data centers consumes enormous amounts of energy. We’re racing toward a severe data storage crisis that will only intensify with time.
Enter an unexpected solution: DNA-based data storage. Nature’s own information storage system might hold the key to our digital future. DNA, comprising a chain of nucleotides A, T, C, and G, has already proven itself as life’s perfect information storage molecule. What makes it particularly appealing for data storage is its remarkable stability: scientists have successfully sequenced the complete genome of a horse fossil over 500,000 years old. Even more impressive is its storage density: according to calculations published in Nature Materials by Harvard University’s George Church and his colleagues, the simple bacterium Escherichia coli can store about 10^19 bits per cubic centimeter. To put this in perspective, a DNA cube roughly one meter on each side could theoretically store all the world’s current annual data.
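The cube comparison follows from the quoted density. As a rough sketch, assuming a density of 10^19 bits per cubic centimeter, 8 bits per byte, and the 418-zettabyte annual estimate cited earlier in the article (with 1 ZB taken as 10^21 bytes), the required volume can be worked out as follows. It ignores any overhead for error correction or packaging, and the variable names are illustrative.

```python
# Rough check of the "one-meter cube" comparison.
# Assumptions: 1e19 bits per cm^3, 8 bits per byte, 1 ZB = 1e21 bytes,
# and no overhead for error correction or physical packaging.
BITS_PER_CM3 = 1e19
ZB = 1e21

annual_bytes = 418 * ZB
annual_bits = annual_bytes * 8

# Volume of DNA needed for one year of data, and the side of that cube.
volume_cm3 = annual_bits / BITS_PER_CM3
side_cm = volume_cm3 ** (1 / 3)

# Capacity of a full one-meter cube (100 cm per side), in zettabytes.
cube_1m_capacity_zb = BITS_PER_CM3 * 100**3 / 8 / ZB

print(f"Cube side for one year of data: {side_cm:.0f} cm")
print(f"Capacity of a one-meter cube: {cube_1m_capacity_zb:.0f} ZB")
```

Under these assumptions, a year of data fits in a cube a little under 70 centimeters on a side, and a full one-meter cube would hold about 1,250 zettabytes, so "roughly one meter" is a conservative round figure.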
This isn’t just theoretical science. Researchers are already making significant strides in DNA data storage technology. In 2017, Church’s team at Harvard successfully used CRISPR technology to encode a human hand image into E. coli’s genome, achieving over 90% reading accuracy. Meanwhile, researchers at the University of Washington and Microsoft Research have developed a groundbreaking automated system for writing, storing, and reading DNA-encoded data. Major technology players, including Microsoft and Twist Bioscience, are actively investing in advancing these technologies.
DNA’s potential extends beyond traditional data storage. Scientists are leveraging DNA barcoding—using DNA sequences as molecular identification tags—to accelerate research across various fields. At the Georgia Institute of Technology, James E. Dahlman’s laboratory is using this technology to identify safer gene therapies. Other researchers are applying similar techniques to tackle drug resistance and prevent cancer metastasis.
However, challenges remain before DNA data storage becomes mainstream. The most significant hurdles are the cost and speed of reading and writing DNA, which must decrease substantially to compete with conventional electronic storage. Nevertheless, even if DNA doesn’t become our primary storage medium, it will likely play a crucial role in preserving certain types of data for the long term and generating information at unprecedented scales.
This piece is based on an article by Sang Yup Lee for Scientific American.
Sang Yup Lee, co-chairman of the World Economic Forum’s Global Future Council on Biotechnology since 2016, is a professor of chemical and biomolecular engineering at the Korea Advanced Institute of Science and Technology (KAIST) and dean of the KAIST Institute of Science. He holds more than 680 patents.