Figure 1

DNA-based data storage with the addition of degenerate bases enables increased information capacity. (A) Binary data are encoded into DNA sequences comprising not only the 4 traditional encoding characters A, C, G, and T but also 11 additional degenerate bases. The encoded DNA is shorter than that produced by the four-character encoding method. (B) The theoretical information capacity limit is therefore increased from 2 bits/character (log2 4) to about 3.9 bits/character (log2 15). The dots in the graph show the information capacity values reported in previous research, and the numbers indicate the corresponding references. (C) A degenerate base, represented by a single encoding character, denotes a mixed pool of two or more types of nucleotides. (D) Degenerate bases can be generated by mixing DNA phosphoramidites during synthesis.
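
The capacity figures in panel (B) follow from the alphabet size alone, and a short calculation may make this concrete. The sketch below assumes that the 11 degenerate bases are the standard IUPAC codes (R, Y, M, K, S, W, H, B, V, D, N) and that every character is equally likely and read without error; the names `IUPAC`, `bits_per_character`, and `min_characters` are illustrative and do not come from the original work.

```python
import math

# Assumed alphabet: the 4 plain bases plus the 11 standard IUPAC degenerate codes.
# Each character maps to the nucleotides mixed at that position.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def bits_per_character(alphabet_size: int) -> float:
    """Upper bound on information per character for a uniform alphabet."""
    return math.log2(alphabet_size)


def min_characters(n_bits: int, alphabet_size: int) -> int:
    """Idealized lower bound on the characters needed to store n_bits."""
    return math.ceil(n_bits / bits_per_character(alphabet_size))


degenerate = [c for c, mix in IUPAC.items() if len(mix) > 1]

print(len(degenerate))                            # 11 degenerate bases
print(bits_per_character(4))                      # 2.0 bits/character
print(round(bits_per_character(len(IUPAC)), 2))   # 3.91 bits/character
print(min_characters(1000, 4))                    # 500 characters
print(min_characters(1000, len(IUPAC)))           # 256 characters
```

These values are theoretical bounds only. A practical encoding would also need to account for synthesis and sequencing errors, sequence constraints, and addressing overhead, so the achievable capacity is lower.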