Abstract
Canonical nucleic acids (DNA and RNA) naturally store genetic information with high density and programmability, making them promising candidates for molecular data storage. However, their susceptibility to degradation under harsh conditions, such as extreme pH, nuclease activity, and chemical attack, limits practical applications. In contrast, non-canonical nucleic acids (ncNAs) with natural or synthetic structural modifications exhibit enhanced stability and unique functional potential. This review systematically summarizes the fundamental properties of ncNAs, evaluates their suitability for molecular data storage, and discusses how their distinctive advantages may overcome the intrinsic limitations of canonical nucleic acids while addressing challenges in next-generation storage systems.
Introduction
The digital evolution of information technology has been driven largely by continuous innovation in data storage technologies. Since the mid-20th century, silicon-based semiconductors have enabled the explosive growth of the information age1. However, with global data volumes projected to exceed 175 zettabytes by 2025 (ref. 2), conventional storage media, including magnetic devices, optical discs, and semiconductor memories, are approaching their physical and economic limits. Although magnetic storage offers a theoretical maximum capacity of ~10¹⁴ bits/cm³ (ref. 3), real-world devices typically last fewer than 50 years even under ideal conditions4 and are highly sensitive to environmental stress. Maintaining data on such media over decades is also prohibitively expensive5; for example, maintaining 10⁹ GB of tape storage costs about $1 billion per decade6. Furthermore, large-scale data centers account for a significant share of global electricity use (~5%) and greenhouse gas emissions (~2%), underscoring an urgent need for sustainable alternatives7,8. To increase storage capacity and durability while reducing costs, various novel storage media have been developed. Glass-based systems, such as 5D optical storage in fused silica, use femtosecond laser writing to create nanostructured voxels with ultra-high density (up to 360 TB/disc) and longevity (>14 billion years)9. Metamaterial-based technologies expand the possibilities through engineered electromagnetic properties, enabling terahertz-frequency data encoding and bypassing classical diffraction limits10. Other emerging platforms include phase-change materials, which exploit reversible structural transitions for non-volatile memory11.
As the most compact and durable data carrier, DNA has emerged as a promising medium for molecular data storage, offering ultrahigh storage density (∼455 EB/g; ref. 12), minimal energy consumption5, and long-term stability13,14, as evidenced by the successful retrieval of genetic material from specimens hundreds of thousands of years old15. DNA’s well-understood biochemistry and compatibility with existing molecular biology techniques (e.g., PCR, sequencing) further enhance its attractiveness as a storage substrate. Additionally, DNA can be preserved under mild conditions, and its replication by PCR or cellular proliferation requires little energy input5, substantially reducing the cost of data storage, replication, and transmission.
Nevertheless, several challenges hinder the widespread use of DNA as a data storage material, including poor chemical and biological stability in open environments, restricted chemical diversity, and limited storage capacity. DNA is particularly vulnerable to DNases16,17, extreme pH18, ultraviolet radiation19, and various chemicals, all of which can induce strand breaks and result in information loss20,21. Because such DNA-damaging factors are common in practical settings, they significantly limit the broader use of DNA for data storage.
To address these limitations, several strategies have been developed to enhance the stability, functionality, and range of applications of nucleic acids. These include DNA nanostructures (e.g., DNA origami)22,23, chemical modifications (e.g., non-canonical nucleic acids, ncNAs)24,25, and encapsulation in hydrogels26, metal-organic frameworks27, or SiO2 particles28. Each approach targets different limitations of nucleic acids and contributes to their broader application in data storage29, molecular computing30, diagnostics31, drug delivery32, and synthetic biology33.
Among these strategies, one of the most attractive is the use of ncNAs, which carry natural or synthetic modifications in the base, sugar, or backbone, or adopt mirror-image configurations34. Compared with canonical DNA or RNA, these evolved or artificially designed alterations generally confer distinctive properties and unique advantages for data storage35. (i) Incorporating ncNAs broadens the information-encoding alphabet and increases storage capacity. (ii) NcNAs exhibit unique orthogonality, enabling data encryption and multithreaded molecular computation, which enhances data security and computational complexity. (iii) Some endogenous nucleic acid modifications offer intrinsic advantages for data storage, editing, and erasure in living systems. (iv) Modifications to bases, sugars, and backbones change physical, chemical, and biological properties, enhancing stability against attack in open environments. Hence, ncNAs are promising materials for data storage.
Structure-property relationships of non-canonical nucleic acids
Structure diversity of non-canonical nucleic acids
NcNAs, defined by chemical modifications beyond the canonical nucleotides, are generally classified into several categories: nucleobase modifications (including both naturally occurring non-canonical bases and engineered unnatural base pairs, UBPs), sugar modifications, phosphate backbone modifications, and mirror-image isomerism34 (Fig. 1).
Overview of representative ncNA modification strategies, including base, sugar, backbone, and multiple modifications, and mirror-image isomerism.
Base modifications aim to expand the genetic alphabet and diversify base-pairing capabilities36. Naturally occurring bases such as 2,6-diaminopurine (2-amino-A), which forms three hydrogen bonds with thymine (T), can increase the hybridization affinity of DNA duplexes37,38. Epigenetic base modifications, including N6-methyldeoxyadenosine (6mA), C5-methylcytosine (5mC), and their oxidized derivatives (e.g., N6-hydroxymethyladenine, 6hmA, and 5-hydroxymethylcytosine, 5hmC), play essential regulatory roles in gene expression39.
Base modifications such as 5-fluorouracil (5-F-U)40, 5-chlorouracil (5-Cl-U)41, 5-bromouracil (5-Br-U)42, and 5-iodouracil (5-I-U)43 expand the chemical diversity of nucleic acids, enabling enhanced base-pairing interactions and offering potential for fine-tuning DNA hybridization and stability. Engineered UBPs have extended the repertoire of orthogonal pairing interactions. Genetic alphabets of up to 12 letters have been developed (the natural A, T, G, and C plus the synthetic B, S, P, Z, X, K, J, and V)44, forming six mutually exclusive base pairs. Notably, Hachimoji DNA and RNA, eight-letter systems that add four synthetic nucleotides (dZ, dP, dS, and dB in DNA; Z, P, S, and B in RNA) to the natural ones, allow for enhanced data storage capacity45. In addition, base pairs such as NaM-5SICS and TPT3-NaM, which rely on hydrophobic rather than hydrogen-bonding interactions46,47,48, have demonstrated the feasibility of alternative pairing mechanisms for information encoding and retrieval. Moreover, the unnatural pair between 7-(2-thienyl)-imidazo[4,5-b]pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px), in which Px contains a five-membered pyrrole ring49, illustrates structural and steric complementarity as a further design principle50. Finally, the isoC-isoG pair, composed of 6-amino-2-ketopurine (isoG) and 2-amino-4-ketopyrimidine (isoC), is a structural isomer of the G-C pair that forms three hydrogen bonds with a different hydrogen-bonding geometry, demonstrating alternative base-pairing modes51.
Sugar modifications replace the native ribose or deoxyribose with alternative chemical structures, significantly influencing the conformational flexibility, thermodynamic stability, and enzymatic recognition of the modified nucleic acids52. For instance, 2′-O-methylation (2′-OMe) is a common epitranscriptomic modification in RNA, in which a methyl group is added to the 2′-hydroxyl group of the ribose53. Likewise, in 2′-F-RNA, a fluorine atom replaces the 2′-OH group of RNA, and the resulting strands form more stable duplexes with RNA54. Arabino nucleic acid (ANA), which has the 2′-OH in the configuration opposite to that of RNA, adopts a DNA-like conformation55. In 2′-deoxy-2′-fluoroarabino nucleic acid (FANA), the ribose is replaced by a 2′-fluoroarabinose moiety that adopts a C2′/O4′-endo conformation reminiscent of B-form DNA56, thereby enhancing hybridization affinity57. Threose nucleic acid (TNA), comprising a four-carbon sugar and a 2′,3′-linked phosphodiester backbone58,59, adopts a B-helical conformation with an average twist of 36° and a helical rise of 3.2 Å per nucleotide60,61, and hybridizes robustly with both DNA and RNA. Locked nucleic acid (LNA) incorporates a methylene bridge connecting the 2′-oxygen and 4′-carbon of the ribose ring, locking the sugar in a C3′-endo pucker and thereby increasing thermal stability and affinity62,63. 1,5-Anhydrohexitol nucleic acid (HNA), derived from a six-membered hexitol ring, features 2′,3′-dideoxy-1′,5′-anhydro-D-arabino-hexitol nucleosides with 4′−6′ phosphodiester linkages and base attachment at the 2′-position64. HNA forms stable antiparallel duplexes with RNA, underscoring its structural versatility65. Glycol nucleic acid (GNA) has a backbone of repeating glycol units linked by phosphodiester bonds66, with nucleobases attached to the glycol units, and has been investigated for various biotechnological applications.
L-α-threose nucleic acid (L-αTNA) and D-α-threose nucleic acid (D-αTNA) are a pair of nucleic acid analogues that share a threose sugar backbone but differ in the stereoconfiguration of the sugar (L and D, respectively); L-αTNA often exhibits enhanced stability and specific binding characteristics, whereas D-αTNA may display distinct physicochemical properties owing to its opposite sugar configuration67. Xylonucleic acid (XyloNA), in which ribose is replaced by xylose in the nucleic acid backbone, displays higher thermal stability than DNA or RNA upon self-hybridization but pairs with neither DNA nor RNA68.
Phosphate backbone modifications involve chemical alterations to the phosphate backbone, resulting in changes to strand conformation, charge distribution, and biochemical stability69. A well-known example is the phosphorothioate linkage (PS-DNA), in which one of the non-bridging oxygen atoms of the phosphate group is replaced by sulfur, enhancing nuclease resistance and altering electrostatic properties70,71. N3′→P5′ phosphoramidate DNA (3′-NP DNA) is a synthetic oligonucleotide analogue in which the 3′-oxygen of the natural phosphodiester backbone is replaced by an amino group, resulting in an N→P linkage72. Peptide nucleic acids (PNAs) take this strategy further by replacing the entire sugar-phosphate backbone with a neutral N-(2-aminoethyl)glycine backbone, thereby conferring resistance to nucleases and proteases73 and enabling high-affinity hybridization, including via Hoogsteen-type base pairing74. Serinol nucleic acid (SNA) has an acyclic phosphodiester backbone based on serinol (2-amino-1,3-propanediol).
With increasing availability and characterization of individual modifications, combinatorial strategies are emerging that synergistically integrate multiple modifications to harness their respective advantages. For example, antisense oligonucleotides incorporating PS-FANA exhibit both high binding affinity to target RNAs and efficient RNase H-mediated cleavage75. Similarly, TNA-TPT3/NaM constructs combine the structural stability of TNA with the enhanced data storage capacity of TPT3/NaM UBPs, expanding the chemical diversity and functional repertoire of synthetic genetic systems76.
Mirror-image isomerism, such as in L-DNA, an enantiomer of natural D-DNA, confers enhanced resistance to nuclease-mediated degradation due to the chiral specificity of biological enzymes77. In addition to increased biological stability, L-DNA exhibits reduced immunogenicity, making it a promising scaffold for therapeutic and diagnostic applications78.
In summary, the structural diversity of ncNAs imparts a rich spectrum of physicochemical and biological properties, including improved stability, expanded base-pairing systems, enhanced storage capacity, and increased functional versatility. These attributes underlie their potential for diverse applications in molecular data storage, computation, and synthetic biology, as discussed in the following sections.
Special properties of non-canonical nucleic acids
NcNAs exhibit enhanced chemical and biological stability due to their distinctive structural features (Table 1). Modifications of the sugar, backbone, or bases often introduce conformational constraints and steric hindrance, which reduce susceptibility to degradation by nucleases. Such modifications can also alter the electrostatic potential and hydrogen-bonding profile of the nucleic acid, thereby increasing resistance to chemical or physical attack. Additionally, incorporating hydrophobic moieties reduces solvent accessibility, further minimizing hydrolytic cleavage.
For example, synthetic analogues such as HNA79, L-αTNA80, PS-DNA81, NP-DNA72, and L-DNA78,82 demonstrate remarkable resistance to nuclease-mediated degradation owing to their modified sugar or backbone chemistries. In terms of chemical stability, several ncNAs, including FANA54, TNA83, and PNA84,85, are resistant under acidic conditions. Furthermore, modifications at the 2′-position of the ribose, such as 2′-F-RNA and 2′-OMe-RNA, are well known to significantly enhance resistance to hydrolysis86. NP-DNA also exhibits strong tolerance to divalent and transition metal ions87, underscoring its suitability for chemically challenging environments.
Although structurally distinct from canonical nucleic acids, many ncNAs retain the capacity for Watson-Crick base pairing and programmable hybridization, often with thermodynamic properties comparable to those of canonical duplexes88. Their hybridization affinity, strand selectivity, and overall duplex stability are governed by van der Waals forces, electrostatics, hydrogen bonding, and solvation energetics.
Sugar-modified nucleic acids such as FANA89, HNA90, and L-αTNA91 can form stable homoduplexes as well as heteroduplexes with DNA or RNA. The melting temperature (Tm), defined as the temperature at which the hybridized and dissociated states of a duplex coexist with equal probability, is a widely used parameter for assessing duplex thermal stability and hybridization strength. Several sugar-modified ncNAs exhibit elevated Tm values relative to canonical nucleic acids. Unlike in DNA, 2′-substituents pre-organize the sugar conformation, which entropically favors hybridization89. For example, ANA increases Tm by ~10.2 °C per incorporation65, and LNA raises Tm by ~10 °C per substitution92. In contrast, TNA-DNA duplexes have a lower Tm than DNA-DNA duplexes93 because the distance between adjacent phosphates in TNA does not perfectly match that in DNA94. This geometric mismatch causes asymmetric “breathing” fluctuations of the TNA-DNA duplex95, thereby increasing its dissociation rate96. Table 2 summarizes average Tm shifts across representative systems.
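The two-state definition of Tm can be made concrete with the standard van 't Hoff model for a non-self-complementary duplex A + B ⇌ AB at equal strand concentrations, for which Tm = ΔH/(ΔS + R ln(C_T/4)). The Python sketch below is illustrative only: the enthalpy, entropy, and concentration values are hypothetical and are not taken from any ncNA system discussed here; for real sequences they would come from nearest-neighbour parameters or calorimetry.

```python
import math

R = 1.987  # gas constant, cal/(mol*K)


def tm_kelvin(dH, dS, ct):
    """Two-state melting temperature (K) for A + B <-> AB.

    dH: duplex-formation enthalpy, cal/mol (negative)
    dS: duplex-formation entropy, cal/(mol*K) (negative)
    ct: total strand concentration, M, with [A]0 = [B]0 = ct / 2
    """
    return dH / (dS + R * math.log(ct / 4))


def fraction_bound(temp_k, dH, dS, ct):
    """Equilibrium fraction of strands in the duplex state at temp_k (K)."""
    c = ct / 2                                         # initial concentration of each strand
    K = math.exp(-(dH - temp_k * dS) / (R * temp_k))   # association constant, 1/M
    kc = K * c
    # theta / ((1 - theta)^2 * c) = K rearranges to
    # kc*theta^2 - (2*kc + 1)*theta + kc = 0; take the root with theta <= 1
    return ((2 * kc + 1) - math.sqrt(4 * kc + 1)) / (2 * kc)


# Hypothetical parameters for a short duplex at 1 uM total strand concentration
dH, dS, ct = -150_000.0, -420.0, 1e-6
tm = tm_kelvin(dH, dS, ct)
print(f"Tm = {tm - 273.15:.1f} C")
for offset in (-10, 0, 10):
    print(offset, round(fraction_bound(tm + offset, dH, dS, ct), 3))
```

With these placeholder values the duplex fraction drops from above 0.9 to below 0.1 within ±10 °C of Tm, which is why per-modification Tm shifts of several degrees can noticeably change how much duplex survives at a fixed storage or readout temperature.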
Hybridization affinity and thermal stability are critical not only for molecular durability but also for controlled data retrieval. Duplex denaturation and re-annealing, required during PCR, capture, or strand-displacement readout, are governed by Tm-dependent kinetics (kon/koff). Insufficient stability risks spontaneous strand loss, whereas strong duplex pairing can protect nucleic acids from hydrolysis and nuclease attack. However, excessive stability hampers deliberate strand separation and slows amplification, and stable self-folding (e.g., hairpins) can likewise suppress readout. Hence, sequence composition, modification density, and readout conditions must be balanced to optimize storage performance.
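The trade-off between spontaneous strand loss and hard-to-separate duplexes can be expressed through the rate constants mentioned above: K_d = koff/kon, and under first-order dissociation a duplex has half-life ln 2/koff. The Python sketch below uses hypothetical rate constants (not measured ncNA values) and ignores re-annealing, so it only illustrates how koff sets duplex lifetime.

```python
import math


def dissociation_constant(kon, koff):
    """Equilibrium K_d (M) from kon (1/(M*s)) and koff (1/s)."""
    return koff / kon


def duplex_half_life(koff):
    """Half-life (s) of a duplex that dissociates by first-order kinetics."""
    return math.log(2) / koff


def fraction_intact(koff, t):
    """Fraction of duplexes still hybridized after t seconds, ignoring re-annealing."""
    return math.exp(-koff * t)


# Hypothetical rate constants: the same kon, two different koff values
kon = 1e6  # 1/(M*s)
for label, koff in (("weak duplex", 1e-2), ("strong duplex", 1e-5)):
    print(label,
          f"Kd = {dissociation_constant(kon, koff):.0e} M,",
          f"t1/2 = {duplex_half_life(koff):.0f} s,",
          f"intact after 1 h = {fraction_intact(koff, 3600):.3f}")
```

In this toy comparison the weak duplex is essentially fully dissociated after an hour, while the strong one is still mostly intact, which is the property that protects stored strands but also has to be overcome during readout.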
Overview of nucleic acid-based data storage
Owing to their high storage density and exceptional programmability, nucleic acids have emerged as a highly promising substrate for molecular data storage and have attracted substantial research interest in recent years. Nucleic acid-based data storage generally comprises four key steps: encoding, writing, preservation, and reading. The stored data can further be processed by molecular computation based on hybridization between nucleic acid strands. In the encoding step, arbitrary information, including text, images, audio, and video, is first converted into binary strings (“0” and “1”) using standardized character encodings such as ASCII or Unicode (e.g., UTF-8). These binary sequences are subsequently mapped onto nucleotide sequences97 or higher-order nanostructures. The encoded oligonucleotides are typically synthesized through non-enzymatic or enzymatic approaches and preserved either in vitro or in vivo to ensure long-term stability. Information retrieval is commonly performed via sequencing or non-sequencing techniques. Notably, because ncNAs are difficult to synthesize and sequence directly, ncNA-based systems often require additional processing steps, such as transcription and reverse transcription98.
Storage units
In canonical sequence-based storage, information is encoded using fixed binary-to-nucleotide conversion rules or advanced encoding schemes such as Goldman encoding99, Grass encoding, and DNA Fountain100. The simplest binary mapping (e.g., A = 00, C = 01, T = 10, G = 11) achieves a theoretical data storage capacity of 2 bits per nucleotide101.
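The encoding step and the simplest fixed mapping (A = 00, C = 01, T = 10, G = 11) can be illustrated with a minimal Python sketch in which text is first converted to bits via UTF-8. It deliberately omits what practical schemes such as Goldman encoding or DNA Fountain add (addressing, error correction, homopolymer and GC-content constraints), so it demonstrates only the 2 bits per nucleotide ceiling, not a usable storage code.

```python
# Fixed 2-bit mapping from the text: A = 00, C = 01, T = 10, G = 11
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}
BASE_TO_BIT_PAIR = {base: pair for pair, base in BIT_PAIR_TO_BASE.items()}


def text_to_bits(text: str) -> str:
    """UTF-8 encode the text and return its bits as a '0'/'1' string."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def encode(text: str) -> str:
    bits = text_to_bits(text)  # length is a multiple of 8, hence of 2
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(seq: str) -> str:
    return bits_to_text("".join(BASE_TO_BIT_PAIR[base] for base in seq))


print(encode("Hi"))          # CATACTTC
print(decode(encode("Hi")))  # Hi
```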
While sequence-based approaches have demonstrated remarkable potential in molecular data storage, they remain constrained by several inherent limitations of canonical nucleic acids. These include susceptibility to enzymatic degradation, limited chemical diversity, and a restricted four-letter genetic alphabet, which historically capped the theoretical storage capacity at 2 bits per base (Fig. 2a-i)102,103. However, recent advances in encoding, such as DNA-QLC104, have shown that it is possible to surpass the 2-bit-per-base limit, enabling higher storage densities.
a Sequence-encoded strategies. i Canonical 4-letter DNA encodes 2 bits per nucleotide. ii Unnatural base pairs (UBPs), such as the 12-letter AEGIS system, increase the theoretical storage capacity to 3.58 bits per nucleotide according to log2 N. iii The relationship between alphabet size (N) and coding capacity. iv Sugar-modified nucleic acids (e.g., TNA) improve biochemical stability and data integrity. v Mirror-image (L-) nucleic acids offer high bio-orthogonality and have been explored for steganographic applications. vi Epigenetic modifications enable multi-layered coding and parallelized storage. b Nanostructure-encoded strategies. Static DNA structures: i DNA hairpins, ii DNA origami, and iii DNA tile arrays provide robust and addressable frameworks. Dynamic DNA structures: iv temperature-responsive structures, v pH-responsive structures, and vi light-responsive azobenzene-modified systems enable reversible signal switching between distinct bit states. In this figure, panel a-vi (“Epigenetic modification”) is reproduced with permission from Zhang, C. et al., Nature 634, 824–832 (2024). © 2024 Springer Nature. All graphical elements in this figure were originally created by the authors using Adobe Illustrator.
To address these challenges, ncNAs have been introduced into data storage. These engineered ncNAs offer expanded coding capacity105, enhanced chemical and biological stability83, and greater compatibility with parallel storage106, making them attractive candidates for high-capacity, high-stability, and secure molecular data storage. For an alphabet of \(N\) distinguishable nucleotides, the theoretical storage capacity is \(\log_2 N\) bits per nucleotide, so expanding the alphabet with ncNAs increases capacity logarithmically relative to the four-letter system (Fig. 2a-ii, iii). For example, incorporating UBPs such as the B-S and Z-P pairs expands the alphabet to eight letters and increases the theoretical capacity to 3 bits per nucleotide (Fig. 2a-ii)107,108. In addition, sugar-modified systems such as TNA can stably encode binary data using canonical mapping strategies (Fig. 2a-iv)76,109,110. L-DNA, the mirror-image enantiomer of natural D-DNA, maintains the same theoretical storage capacity while offering high bio-orthogonality, which greatly enhances its resistance to degradation by natural nucleases (Fig. 2a-v)78. Furthermore, 5mC-based epigenetic encoding has been explored, in which self-assembled DNA scaffolds are enzymatically methylated using DNMT1, overcoming the slow kinetics and high cost of de novo synthesis (Fig. 2a-vi)106. Collectively, these strategies enable non-canonical systems to surpass canonical DNA in both security and durability, paving the way for robust, orthogonal molecular data storage platforms.
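The log2 N relation can be checked numerically, and the gain from an eight-letter alphabet made concrete, with the Python sketch below. The letter-to-value assignment in `ALPHABET8` is a hypothetical choice for illustration; it ignores the chemistry (pairing fidelity, polymerase and sequencer compatibility) that determines whether such strands can actually be written and read.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Theoretical capacity (bits per nucleotide) of an N-letter alphabet."""
    return math.log2(alphabet_size)


# Hypothetical assignment of 3-bit values to an eight-letter alphabet
# (the four natural bases plus the synthetic B, S, P, and Z)
ALPHABET8 = "ACGTBSPZ"


def encode_base8(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 3)  # zero-pad to a multiple of 3 bits
    return "".join(ALPHABET8[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3))


def decode_base8(seq: str, n_bytes: int) -> bytes:
    bits = "".join(f"{ALPHABET8.index(letter):03b}" for letter in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))


for n in (4, 6, 8, 12):
    print(n, round(bits_per_position(n), 2))

msg = b"ncNA"
seq = encode_base8(msg)
print(seq, len(seq))  # 32 bits -> 11 letters (vs. 16 with 2 bits per base)
```

The decoder needs the original byte count because the final letter may carry padding bits; a practical scheme would store this length in a header.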
While sequence-based approaches primarily rely on one-dimensional arrangements of canonical or modified nucleotides to represent digital information, DNA nanostructures leverage spatial folding and higher-order assembly to encode data in multiple dimensions (Fig. 2b)111. This structural encoding can substantially increase the data storage capacity, introduce additional layers of security, and diversify retrieval modalities.
For example, DNA hairpin structures further enrich the encoding capacity by exploiting the formation or disruption of secondary motifs. The presence or absence of such hairpins can be detected by fluorescence resonance energy transfer or other optical readouts, enabling simple and robust binary encoding schemes (Fig. 2b-i).
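The FRET readout of hairpin bits relies on the steep distance dependence of energy transfer, E = 1/(1 + (r/R0)^6), where r is the donor-acceptor distance and R0 the Förster radius. The Python sketch below assumes hypothetical values (R0 = 5 nm, a formed hairpin bringing the dyes within a few nanometres, an open one separating them further) and a 0.5 efficiency threshold; the actual dye pair and construct geometry would set these.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Förster transfer efficiency for donor-acceptor distance r_nm."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def read_hairpin_bits(distances_nm, r0_nm=5.0, threshold=0.5):
    """Call bit 1 (hairpin formed, dyes close) when efficiency exceeds threshold."""
    return [int(fret_efficiency(r, r0_nm) > threshold) for r in distances_nm]


# Hypothetical donor-acceptor distances for four hairpin sites
print(read_hairpin_bits([3.0, 10.0, 4.0, 8.0]))  # [1, 0, 1, 0]
```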
DNA origami utilizes long single-stranded scaffolds folded by numerous short staple strands into well-defined two- and three-dimensional architectures (Fig. 2b-ii), which are capable of representing digital symbols, images, or complex patterns that are readily visualized by atomic force microscopy (AFM) or transmission electron microscopy (TEM)112. In contrast to linear sequences, such spatial configurations allow the direct storage of recognizable shapes or logos, circumventing the need for base-by-base sequencing during retrieval.
Similarly, tile-based DNA arrays self-assemble into nanoscale lattices in which the presence, absence, or arrangement of specific tiles encodes binary or multilevel information (Fig. 2b-iii)111. These lattice architectures support parallel data storage and can be interrogated through surface imaging techniques or hybridization-based probing.
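Presence/absence encoding on an addressable lattice can be sketched as a bit matrix. The Python example below assumes a hypothetical rectangular lattice in which each site either carries a marker tile (1) or not (0), and that imaging (e.g., AFM) yields one signal value per site; it does not model tile assembly errors, missing sites, or image registration.

```python
def bytes_to_grid(data: bytes, width: int):
    """Lay out the bits of data row by row on a width-column lattice (1 = marker present)."""
    bits = [int(bit) for byte in data for bit in f"{byte:08b}"]
    bits += [0] * (-len(bits) % width)  # pad the last row with empty sites
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def threshold_image(heights, cutoff):
    """Turn per-site signal (e.g., AFM height) into presence/absence calls."""
    return [[int(h > cutoff) for h in row] for row in heights]


def grid_to_bytes(grid, n_bytes: int) -> bytes:
    bits = "".join(str(bit) for row in grid for bit in row)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))


grid = bytes_to_grid(b"OK", width=4)
# Simulated image: hypothetical heights of 2.0 for occupied and 0.2 for empty sites
heights = [[2.0 if bit else 0.2 for bit in row] for row in grid]
print(grid_to_bytes(threshold_image(heights, cutoff=1.0), n_bytes=2))  # b'OK'
```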
In this context, recent advances in deep learning models capable of accurately predicting the folding free energy of DNA sequences offer powerful tools to preemptively eliminate high-propensity hairpin-forming regions113, thereby streamlining the design of reliable and structurally defined storage architectures.
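Deep learning free-energy predictors are beyond a short example, but the underlying design goal, flagging sequences that can fold back on themselves, can be illustrated with a much cruder substitute: an exact-match search for inverted repeats. The Python sketch below is that simple heuristic, not the models cited above; the stem length and minimum loop size are hypothetical defaults, and it ignores mismatches, wobble pairs, and thermodynamics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_stems(seq: str, stem_len: int = 6, min_loop: int = 3):
    """Return (i, j) pairs where seq[i:i+stem_len] can pair with seq[j:j+stem_len]
    downstream, leaving a loop of at least min_loop bases."""
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        arm = reverse_complement(seq[i:i + stem_len])
        j = seq.find(arm, i + stem_len + min_loop)
        while j != -1:
            hits.append((i, j))
            j = seq.find(arm, j + 1)
    return hits


candidates = ["ACGTTGTTTTCAACGT", "ACGTACTGACTGAGTC"]
# Keep only candidate strands with no detected stem
print([s for s in candidates if not find_hairpin_stems(s)])
```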
Dynamic DNA devices extend this paradigm by incorporating responsive elements that reversibly switch between distinct structural states in response to external stimuli such as temperature (Fig. 2b-iv)114, pH (Fig. 2b-v)115,116, or light117,118 (Fig. 2b-vi). This dynamic behavior enables rewritable and reconfigurable data storage systems that are challenging to achieve with purely sequence-based approaches. Furthermore, systematic studies of oligonucleotide hybridization kinetics and dynamics have elucidated how sequence composition, environmental conditions, and conformational transitions influence pairing rates and stability119, providing a fundamental framework for designing responsive and high-fidelity DNA storage architectures.
In contrast to DNA and RNA, ncNAs are typically used to assemble relatively small nanostructures, because long ncNA strands, especially sugar-modified ones, are difficult to obtain. Most strands in these ncNA nanostructures are prepared by solid-phase synthesis. Compared with DNA nanostructures, ncNA nanostructures display unique advantages; for example, double-crossover nanostructures constructed from five FANA strands show higher thermostability and acid resistance120. RNA oligonucleotides containing the nucleoside analogs floxuridine (FUDR) and gemcitabine (GEM) can be assembled into four-way junction structures, which are highly effective in treating triple-negative breast cancer121. Constructing large ncNA nanostructures requires long ncNA strands, which must be produced enzymatically by polymerases. Taylor et al.122 constructed an all-FANA octahedral origami structure, in which the long FANA strand (~1.7 kb) was synthesized by the engineered polymerase Tgo-D4K from a long single-stranded DNA template. More recently, a ~2000-nt single-stranded RNA origami incorporating the nucleoside analogues m5C, s2U, 5FU, and pseudouridine was synthesized by T7 RNA polymerase and reported to induce epigenetic immunomodulation123. As more ncNA polymerases are obtained through directed evolution, ncNA origami and other large-scale nanostructures are expected to emerge. Collectively, these strategies illustrate how the programmability and architectural versatility of DNA nanotechnology can transcend the limitations of linear nucleotide sequences, offering a promising platform for high-capacity, secure, and multidimensional molecular data storage.
Writing
Writing processes in nucleic acid-based systems are typically classified into non-enzymatic and enzymatic synthesis methods.
Solid-phase synthesis, mostly based on phosphoramidite chemistry (PNA is instead assembled by solid-phase peptide chemistry), remains the most widely adopted technique for synthesizing non-canonical oligonucleotides, including PNA, LNA, and TNA variants124. In phosphoramidite synthesis, protected nucleoside phosphoramidites are coupled stepwise to a growing oligonucleotide anchored on a solid support (Fig. 3a-i). However, synthesis efficiency is limited by side reactions (e.g., deletions, insertions, and substitutions) and by imperfect coupling (~95–99.5% efficiency per cycle35); because these losses compound over cycles, product length is generally restricted to under 120 nt125,126. Additionally, the technique demands tightly controlled environmental conditions and specialized reagents, limiting scalability and increasing cost (Table 3)127,128. Microarray-based synthesis offers an alternative for high-throughput production, leveraging parallel phosphoramidite reactions across surface-bound arrays129.
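Why per-cycle losses cap product length follows from compounding: if each coupling succeeds with probability p, the full-length fraction of an n-mer is about p^(n-1). The Python sketch below assumes independent cycles with the first nucleoside pre-loaded on the support, ignores capping, depurination, and purification losses, and uses a hypothetical 30% yield floor as the cut-off.

```python
import math


def full_length_yield(coupling_eff: float, length: int) -> float:
    """Fraction of chains that are full length, assuming the first nucleoside
    is pre-loaded on the support so that length - 1 couplings are needed."""
    return coupling_eff ** (length - 1)


def max_length(coupling_eff: float, min_yield: float) -> int:
    """Longest oligo whose full-length fraction is still >= min_yield."""
    return 1 + math.floor(math.log(min_yield) / math.log(coupling_eff))


for eff in (0.95, 0.99, 0.995):
    print(eff, f"{full_length_yield(eff, 120):.4f}", max_length(eff, 0.3))
```

At 120 nt, roughly 0.2%, 30%, and 55% of chains are full length for 95%, 99%, and 99.5% coupling efficiency, respectively, which matches the qualitative length limit described above.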
a Strategies for strand synthesis through non-enzymatic and enzymatic approaches: i solid-phase synthesis, ii metal-catalyzed ligation, iii polymerase-mediated extension, iv ligase-assisted assembly, and v TdT-mediated synthesis. b Preservation formats: i in vitro storage via solution, powder, silica encapsulation, and fiber embedding; ii in vivo storage through cellular transfection and genomic integration. c Retrieval of stored data: i physical extraction, ii PCR amplification, and iii CRISPR-Cas-based programmable editing. d Readout technologies: i Sanger sequencing, ii next-generation sequencing, iii nanopore sequencing, iv atomic force microscopy, v gel electrophoresis, and vi fluorescence microscopy. In this figure, panel d (“Reading”) is reproduced with permission from Li, B. et al., Advanced Materials 37, 2412926 (2025). © 2025 Wiley-VCH GmbH. All graphical elements in this figure were originally created by the authors using Adobe Illustrator.
Template-directed chemical ligation offers an alternative route, especially valuable for substrates incompatible with enzymatic systems. In this strategy, phosphate groups are activated by N-cyanoimidazole (CNIm) in the presence of Mn²⁺ ions, enabling nucleophilic attack and subsequent formation of the backbone linkage (Fig. 3a-ii)130. This method has been successfully applied to the synthesis of chemically diverse backbones such as L-αTNA and 3′-NP-DNA131, supporting the incorporation of non-standard linkages with high fidelity.
DNA and RNA polymerases can incorporate modified triphosphates (xNTPs) into growing strands under physiological conditions (Fig. 3a-iii)132. Through directed evolution and rational design, engineered polymerases have been developed to accommodate specific non-canonical substrates133,134 (Table 4). These enzymes support phosphodiester bond formation at the 3′-OH via sequential nucleotidyl transfer, and in specific mutant contexts, can even facilitate phosphoramidate bond formation135. Recent advances involving trivalent ions (e.g., Sc³⁺) have significantly accelerated reaction kinetics87.
Template-directed ligation offers a complementary approach for synthesizing long non-canonical strands without the need for monomer synthesis (Fig. 3a-iv). T3, T4, and T7 DNA ligases have been employed to ligate various modified strands, including FANA136, TNA, and LNA137, with high efficiency and sequence specificity. This method allows modular assembly of oligonucleotide segments (20–120 nt), facilitating the construction of longer polymers from shorter synthetic units.
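Modular ligation requires splitting a long target into segments and bridging each junction with a template (splint). The Python sketch below shows this bookkeeping under stated assumptions: the target is written in canonical letters standing for the corresponding modified nucleotides (e.g., FANA), splints are plain DNA complementary to a hypothetical `arm` bases on each side of a junction, and segment length is left to the caller (the text cites 20–120 nt). It does not check splint melting temperatures or cross-hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_into_segments(target: str, seg_len: int):
    """Cut the target into consecutive segments of seg_len (the last may be shorter)."""
    return [target[i:i + seg_len] for i in range(0, len(target), seg_len)]


def design_splints(segments, arm: int = 10):
    """One splint per junction, complementary to arm bases on each side of the nick."""
    splints = []
    for left, right in zip(segments, segments[1:]):
        if len(left) < arm or len(right) < arm:
            raise ValueError("segment shorter than splint arm")
        splints.append(reverse_complement(left[-arm:] + right[:arm]))
    return splints


segments = split_into_segments("ACGTTGCAAGTCCTAGGATCCGTA" * 3, seg_len=24)
for splint in design_splints(segments, arm=10):
    print(splint)
```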
Template-independent enzymatic oligonucleotide synthesis (TiEOS) leverages the activity of terminal deoxynucleotidyl transferase (TdT), which adds modified dNTPs to the 3′ end of ssDNA without a templating strand (Fig. 3a-v)138. TdT exhibits remarkable substrate tolerance, enabling incorporation of LNA139, 2′-OMe133, 2′-F133, and dTPT3TP140. TiEOS provides a programmable, modular framework for synthesizing structurally diverse nucleic acid libraries suitable for data storage and encryption. In parallel, cap-free TdT synthesis strategies combined with trit-based encoding and enzymatic length control have demonstrated scalable, low-cost production of high-fidelity DNA data pools, further extending the practical potential of enzyme-driven storage architectures141.
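Cap-free TdT synthesis tends to add uncontrolled runs of the same nucleotide, which is one reason trit-based codes fit it well: if information is carried only by the identity of each new base relative to the previous one (three choices, i.e. one trit), run lengths do not matter. The Python sketch below illustrates this transition-coding idea under simplifying assumptions (a fixed hypothetical start base, six trits per byte, no error correction); it is not claimed to be the specific scheme of ref. 141.

```python
from itertools import groupby

BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256 byte values


def byte_to_trits(value: int):
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]  # most significant trit first


def encode_transitions(data: bytes, start: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base."""
    seq, prev = [], start
    for byte in data:
        for trit in byte_to_trits(byte):
            prev = [b for b in BASES if b != prev][trit]
            seq.append(prev)
    return "".join(seq)


def collapse_runs(read: str) -> str:
    """Merge homopolymer runs, e.g. 'CCCGTT' -> 'CGT', discarding run lengths."""
    return "".join(base for base, _ in groupby(read))


def decode_transitions(seq: str, start: str = "A") -> bytes:
    trits, prev = [], start
    for base in seq:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


seq = encode_transitions(b"TdT")
noisy_read = "".join(base * 3 for base in seq)  # simulate uncontrolled run lengths
print(decode_transitions(collapse_runs(noisy_read)))  # b'TdT'
```

Six trits per byte keeps the sketch simple but is wasteful, since about 5.05 trits (log3 256) would suffice.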
Preservation
Efficient preservation of nucleic acid molecules is a critical prerequisite for long-term data integrity in molecular data storage systems. Preservation strategies can be broadly classified into in vitro and in vivo preservation (Fig. 3b), each offering distinct advantages in terms of storage capacity, stability, scalability, and flexibility.
In vitro preservation methods protect DNA outside living systems through physical and chemical stabilization. Approaches such as silica encapsulation, glass immobilization, and polymer-based matrices have enabled long-term storage under ambient conditions by minimizing hydrolysis and oxidation (Fig. 3b, left)5. DNA embedded in electrospun and polymer fibers also offers a scalable, solid-state format with high capacity and mechanical stability142, enabling protection against environmental degradation while maintaining accessibility for downstream retrieval.
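Long-term preservation estimates are typically obtained by accelerated aging: decay is measured at elevated temperature and extrapolated to storage temperature with first-order kinetics and an Arrhenius temperature dependence. The Python sketch below shows that extrapolation; the reference rate, temperatures, and activation energy are hypothetical placeholders, not values from the cited studies.

```python
import math

R = 8.314  # gas constant, J/(mol*K)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def extrapolate_rate(k_ref, t_ref_k, t_target_k, ea_j_per_mol):
    """Arrhenius extrapolation of a first-order decay rate constant (1/s)."""
    return k_ref * math.exp(-ea_j_per_mol / R * (1 / t_target_k - 1 / t_ref_k))


def fraction_intact(k, seconds):
    """Fraction of strands surviving first-order decay for the given time."""
    return math.exp(-k * seconds)


def half_life_years(k):
    return math.log(2) / k / SECONDS_PER_YEAR


# Hypothetical: rate measured at 70 C, extrapolated to 20 C storage
k70 = 1e-6   # 1/s (placeholder)
ea = 150e3   # J/mol (placeholder)
k20 = extrapolate_rate(k70, 343.15, 293.15, ea)
print(f"half-life at 20 C ~ {half_life_years(k20):.0f} years")
print(f"intact after 100 years: {fraction_intact(k20, 100 * SECONDS_PER_YEAR):.3f}")
```

In this framework, the greater chemical stability of an encapsulated strand or an ncNA would show up as a smaller rate constant at the reference temperature.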
In contrast, in vivo preservation strategies integrate synthetic DNA constructs into biological hosts such as plasmids, artificial chromosomes, microbes, or plants143,144,145,146. These systems benefit from the host’s intrinsic replication and repair mechanisms147 (Fig. 3b, right), which not only amplify stored sequences but also correct base-level errors and support dynamic, programmable data editing148,149.
However, both in vitro and in vivo platforms must contend with the chemical vulnerability of natural DNA to environmental perturbations150,151. In this regard, ncNAs such as TNA and L-DNA exhibit markedly enhanced stability due to their resistance to enzymatic degradation and limited interaction with endogenous cellular machinery78,109. These properties render chemically modified systems particularly attractive for applications requiring long-term archival storage and data integrity in hostile environments.
Readout
The final stage of nucleic acid-based data storage involves the retrieval and decoding of stored information. This process typically comprises two sequential steps: random access retrieval from complex DNA pools, and readout of the encoded information, either by sequencing or structural analysis152.
Selective retrieval of target DNA molecules from complex DNA pools is commonly achieved through physical extraction, PCR with designed primers, CRISPR-based techniques153, or digital microfluidics154. Magnetic bead-based physical extraction uses sequence-specific hybridization and magnetic separation for high-throughput, lossless isolation (Fig. 3c-i). PCR amplification is widely employed because of its specificity and scalability, and it can retrieve target sequences from DNA pools containing millions of distinct oligonucleotides (Fig. 3c-ii)155. However, PCR is limited by primer design constraints, susceptibility to non-specific amplification, and incompatibility with rewritable storage architectures129,156. Programmable CRISPR-based DNA storage systems enable precise data manipulation and retrieval (Fig. 3c-iii)157, as exemplified by Cas12a-driven multiplexed searches and Cas9-mediated rewriting via homology-directed repair158,159. Digital microfluidics divides large DNA pools into spatially segregated subpools on microfluidic chips, which minimizes strand interactions and enhances retrieval efficiency27,160,161. Despite these improvements, the strategy still faces limitations due to molecular crowding and amplification-induced errors162,163.
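PCR-based random access can be pictured in silico as selecting the strands whose address region matches a file-specific primer. The sketch below is a deliberate simplification and is not a PCR simulator. It assumes that each strand begins with a 20-nt forward-primer site. It models "amplification" as a filter that tolerates up to `max_mismatches` differences, which mimics how near-matching primer sites cause non-specific amplification. The primers, pool, and function names are all hypothetical.

```python
# Hypothetical in-silico model of primer-based random access; not a PCR simulator.
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def pcr_select(pool: list[str], primer: str, max_mismatches: int = 2) -> list[str]:
    """Keep strands whose 5' primer site is within max_mismatches of the primer.
    Allowing mismatches mimics off-target (non-specific) amplification."""
    k = len(primer)
    return [s for s in pool if len(s) >= k and hamming(s[:k], primer) <= max_mismatches]

def min_pairwise_distance(primers: list[str]) -> int:
    """Design check: well-separated primer sets reduce cross-amplification."""
    return min(hamming(p, q) for i, p in enumerate(primers) for q in primers[i + 1:])

primer_a = "ACGTACGTACGTACGTACGT"  # hypothetical address primer for file A
primer_b = "TGCATGCATGCATGCATGCA"  # hypothetical address primer for file B
pool = [
    primer_a + "GATTACA",             # file A strand
    primer_b + "CCCGGG",              # file B strand
    "ACGTACGTACGTACGTACGA" + "TTTT",  # near-match: one mismatch to primer_a
]
print(len(pcr_select(pool, primer_a)))                    # tolerant: 2 strands
print(len(pcr_select(pool, primer_a, max_mismatches=0)))  # exact match only: 1 strand
print(min_pairwise_distance([primer_a, primer_b]))        # 20
```

The off-target third strand is recovered only when mismatches are tolerated. This is one reason why primer libraries are designed with large pairwise distances.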
For non-canonical systems (e.g., TNA, L-DNA), retrieval is typically performed using PCR-based methods78,110. Nevertheless, CRISPR-guided retrieval remains a promising direction, particularly when adapted to recognize unnatural bases.
Stored information is read out either by sequencing or by non-sequencing techniques, depending on how the information is encoded in the storage units.
Canonical DNA storage systems rely heavily on three generations of sequencing:

- First-generation (Sanger) sequencing (Fig. 3d-i).
- Next-generation sequencing (NGS) (Fig. 3d-ii)164.
- Third-generation (nanopore-based) sequencing (Fig. 3d-iii), which enables long-read, real-time monitoring of DNA strands.

Nanopore sequencing, in particular, has emerged as a versatile platform for directly reading chemically modified nucleic acids165. For example, FANA strands can be read using Nanopore Induced Phase-Shift Sequencing (NIPSS) after ligation to DNA drive strands166. Phosphorothioate (PS) modifications can be detected with the ONT/ELIGOS platform167. DeepMod, a deep-learning framework built on ONT data, detects epigenetic marks such as 5-methylcytosine (5mC, ~99% accuracy) and N6-methyladenine (6mA, ~90% accuracy)39.
When direct sequencing is not feasible, many systems rely on reverse transcription (RT) of modified strands into canonical DNA or RNA88. TNA-to-cDNA conversion followed by Illumina sequencing has demonstrated high fidelity (~99.2%)110. Base substitutions during RT enable Z-P168, Ds-Px169, and TPT3-NaM170 pairs to be detected on conventional sequencing platforms (Sanger, NGS, or nanopore). These methods broaden compatibility, but base-conversion strategies often introduce mutational noise and sequence bias, particularly over multiple RT-PCR cycles. This compromises accuracy and can obscure encrypted information78. Table 5 summarizes the sequencing methods for ncNAs.
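Even ~99.2% fidelity calls for error correction. To see why, assume (as a simplification) that the figure is a per-nucleotide fidelity $f$ and that errors occur independently. The probability that a strand of length $L$ is read without any error is then:

```latex
P_{\text{error-free}}(L) = f^{L}, \qquad
P_{\text{error-free}}(100) = 0.992^{100} = e^{100 \ln 0.992} \approx e^{-0.803} \approx 0.45
```

Under these assumptions, more than half of 100-nt strands would carry at least one error. Redundancy and error-correcting codes must absorb these errors.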
For storage platforms that employ structural or conformational encoding, information retrieval is typically achieved through microscopy-based readout methods. AFM provides nanometer-resolution imaging of nanoscale motifs, enabling direct detection of features such as bulges, nicks, or conformational switches (Fig. 3d-iv)171. Gel electrophoresis offers a sequencing-free readout modality by distinguishing structural variants according to their differential migration rates (Fig. 3d-v). More recently, super-resolution microscopy techniques have enabled multiplexed, high-throughput decoding of DNA nanostructures by resolving structural states at the single-molecule level (Fig. 3d-vi)172,173. These techniques include stochastic optical reconstruction microscopy (STORM) and DNA points accumulation for imaging nanoscale topography (DNA-PAINT). Such sequencing-independent strategies are particularly advantageous for chemically diverse or orthogonal nucleic acid systems, because they bypass the limitations of base-calling algorithms and sequence fidelity. Importantly, structure-based readout approaches provide unique opportunities for secure or tamper-proof storage. In these systems, critical information is embedded in physical topology and spatial configuration rather than in linear sequence, rendering it inaccessible to conventional sequencing platforms.
Molecular computation
Much as silicon-based electronic computers process the data they hold, DNA that stores data can also be used to process it. DNA's molecular recognition capability and data storage capacity have made DNA-based molecular computation a reality. Adleman pioneered DNA computing in 1994 by using it to solve a Hamiltonian path problem174. Since then, DNA has been regarded as a powerful molecular computing tool for solving intricate problems.

Based on Watson-Crick-Franklin base pairing, basic DNA logic gates such as YES, NOT, OR, and AND have been designed175,176,177, and more complex circuits and networks have been developed178. In a landmark 2006 study, Erik Winfree and co-workers179 used toehold-mediated strand displacement to enable cascaded computation and scaling up. Qian et al.180 subsequently designed seesaw gate-based logic circuits and used them to build DNA-based neural networks. Cherry et al. went further and demonstrated a DNA-based supervised learning network181. These advances highlight the power of DNA as a programmable and dynamic material for molecular computation. A heat-rechargeable computation system for DNA logic circuits and neural networks has also been established, suggesting that DNA-based computation may develop more sustainably than other artificial systems181,182.

Recent advances in biophysical techniques, particularly single-molecule fluorescence and magnetic tweezers, have enabled DNA computation at the single-molecule level183,184. This permits real-time, controllable monitoring of the computing process and enhances its interactivity.
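A deliberately simplified mass-action model can illustrate the cascading principle behind toehold-mediated strand-displacement logic. The sketch below assumes a two-input AND gate that works in two sequential steps:

1. Input A displaces a protecting strand from gate G, exposing a new toehold (G → G′).
2. Input B then displaces the output strand O from G′.

This is our illustrative abstraction, not the specific circuit of ref. 179. The rate constant, normalized concentrations, Euler step size, and 0.5 output threshold are all illustrative assumptions, and leak reactions are ignored.

```python
# Simplified mass-action sketch of a two-step strand-displacement AND gate.
# Units are normalized and arbitrary; leak reactions are not modeled.
def simulate_and_gate(a0, b0, gate0=1.0, k=1.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of two sequential displacement reactions:
    A + G  -> G' + W1   (A binds the exposed toehold and displaces a protector)
    B + G' -> O  + W2   (B binds the newly exposed toehold and releases output O)
    Returns the final output concentration."""
    a, b, g, g_prime, out = a0, b0, gate0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        r1 = k * a * g          # rate of the first displacement step
        r2 = k * b * g_prime    # rate of the second displacement step
        a -= r1 * dt
        g -= r1 * dt
        g_prime += (r1 - r2) * dt
        b -= r2 * dt
        out += r2 * dt
    return out

def truth_table(threshold=0.5):
    """Digitize the output for all four input combinations (1 = present, 0 = absent)."""
    return {(x, y): simulate_and_gate(float(x), float(y)) > threshold
            for x in (0, 1) for y in (0, 1)}

print(truth_table())
```

Output is released only when both inputs are present. The first reaction's product, G′, is the second reaction's substrate, and this is what allows strand-displacement gates to be chained.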
However, current DNA computing still faces challenges, particularly low orthogonality. As circuits and networks scale up, their growing complexity increases the likelihood of unwanted leakage reactions caused by mismatched interactions. A promising strategy is to exploit the intrinsically high orthogonality of ncNAs. For example, L-DNA does not directly recognize natural DNA or RNA185, but it can communicate with them through PNA186 or chimeric D/L-oligonucleotides187. Similarly, an amplification circuit composed of D-αTNA is orthogonal to L-αTNA, DNA, and RNA, yet it can be initiated by D-αTNA itself or by SNA188. Conceptually, introducing ncNAs could enhance the parallelism of nucleic acid computing, accelerating computation and extending its scalability.
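At the design stage, leak-prone interactions are often screened by looking for unintended complementary stretches between strands. The sketch below is a crude illustration, not a thermodynamic model. It reports the longest contiguous Watson-Crick complementary stretch between two strands. As an idealization of mirror-image orthogonality, it treats strands of opposite chirality (hypothetical tags `"D"` and `"L"`) as unable to hybridize. Function names and sequences are hypothetical.

```python
# Crude crosstalk screen; ignores thermodynamics, mismatches, and secondary structure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def longest_common_substring(s: str, t: str) -> int:
    """Dynamic-programming length of the longest substring shared by s and t."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def max_complementary_stretch(strand1, strand2) -> int:
    """strand = (chirality, sequence); opposite chiralities are treated as orthogonal."""
    (chir1, seq1), (chir2, seq2) = strand1, strand2
    if chir1 != chir2:
        return 0
    # A stretch of seq1 can pair antiparallel with seq2 if it occurs in revcomp(seq2).
    return longest_common_substring(seq1, revcomp(seq2))

d_input = ("D", "ATGCGTAC")
d_target = ("D", "GTACGCAT")  # full reverse complement of d_input
l_target = ("L", "GTACGCAT")  # same sequence on a mirror-image backbone
print(max_complementary_stretch(d_input, d_target))  # 8
print(max_complementary_stretch(d_input, l_target))  # 0
```

In a real design workflow, long unintended stretches flagged this way would be redesigned. Assigning circuit layers to different chiralities removes whole classes of potential leaks from consideration.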
Advantages of non-canonical nucleic acids-based data storage
Owing to their diverse chemical structures and physicochemical properties, ncNAs offer several notable advantages over their canonical counterparts for digital data storage (Table 6). These include expanded genetic alphabets, enhanced chemical and biological stability, bio-orthogonality, and parallelized data writing.
Expanded genetic alphabet
A major limitation of canonical DNA storage is its dependence on four standard nucleotides (A, T, G, C), which restricts the maximum theoretical storage capacity to 2 bits per base. UBPs expand this alphabet and thus enable higher-capacity storage48,189. For instance, the 12-letter artificially expanded genetic information system (AEGIS) includes synthetic bases such as P, Z, B, S, X, J, K, and V. It can theoretically encode up to 3.58 bits per base (log₂12)44, a significant improvement over canonical systems. Hydrophobic base pairs such as NaM-TPT3 further contribute to this expansion and enhance the functional diversity of the storage system190.
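The capacity figures above follow from the base-2 logarithm of the alphabet size. The sketch below is an idealized calculation that ignores synthesis constraints, primers, and error-correction overhead. For 4-, 8- (Hachimoji-like), and 12-letter (AEGIS-like) alphabets, it compares two quantities: the theoretical bits per base, and the minimum number of bases needed to store a given number of bits.

```python
import math

def bits_per_base(alphabet_size: int) -> float:
    """Theoretical information content of one position: log2(alphabet size)."""
    return math.log2(alphabet_size)

def min_bases(n_bits: int, alphabet_size: int) -> int:
    """Smallest L with alphabet_size**L >= 2**n_bits, using exact integer arithmetic."""
    length, capacity = 0, 1
    while capacity < 2 ** n_bits:
        capacity *= alphabet_size
        length += 1
    return length

for size in (4, 8, 12):
    # Prints alphabet size, bits per base, and bases needed for 1,024 bits.
    print(size, round(bits_per_base(size), 2), min_bases(1024, size))
```

Storing 1,024 bits requires 512 canonical bases, 342 bases with an 8-letter alphabet, and 286 bases with a 12-letter alphabet. This is a reduction of roughly 44% in strand length under these idealized assumptions.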
Enhanced stability for long-term preservation
The chemical and enzymatic stability of the storage medium is critical for data integrity. Sugar-modified nucleic acids such as TNA exhibit enhanced biochemical and thermal stability. They remain intact even after prolonged exposure to biological environments such as human serum, where canonical DNA undergoes rapid degradation110. Moreover, hybrid systems that combine sugar modifications and UBPs (e.g., TNA + NaM-TPT3, ref. 76) not only boost storage capacity but also introduce multifunctional advantages, including environmental responsiveness and intrinsic data encryption.
Bio-orthogonality
Because of their altered spatial structures, ncNAs can evade recognition by biomacromolecules. This bio-orthogonality enables data storage independent of biological systems. Mirror-image nucleic acids (L-DNA) have the same sequence information capacity as D-DNA but offer superior biological stability owing to their resistance to natural nucleases such as DNase I78. This resistance significantly prolongs their lifespan in biological environments191, making them well suited to secure storage. Their bio-orthogonality also ensures that they are not misread by natural systems, offering intrinsic data protection. However, the lack of compatible polymerases and sequencing tools makes L-DNA systems cost-intensive and technically challenging.
Parallel storage
Sequence-independent writing represents a promising frontier for scalable storage. Epigenetic marks such as 5mC allow binary data to be encoded orthogonally onto existing sequences, with methylated cytosine (5mC) denoting “1” and unmethylated cytosine denoting “0”. This strategy, akin to movable-type printing, uses site-specific methylation by DNMT1 to inscribe data without altering the underlying sequence192. Such systems are inherently parallelizable and reversible, offering a reconfigurable, reusable storage architecture.
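The logic of sequence-independent epigenetic writing can be sketched as overlaying a bit vector onto the CpG sites of a fixed template. This illustration rests on several assumptions: one bit per CpG cytosine, `M` as a hypothetical symbol for 5mC, and ideal methylation and readout. It does not model DNMT1 chemistry. The sequence itself never changes; only its methylation state does, so the same template can be rewritten with different data.

```python
# Idealized model of "printing" bits as methylation marks on a fixed template.
def cpg_positions(template: str) -> list[int]:
    """Indices of cytosines that are immediately followed by guanine (CpG sites)."""
    return [i for i in range(len(template) - 1) if template[i:i + 2] == "CG"]

def write_bits(template: str, bits: str) -> str:
    """Mark CpG cytosines as methylated ('M', a hypothetical 5mC symbol) where the bit is '1'."""
    sites = cpg_positions(template)
    if len(bits) > len(sites):
        raise ValueError("not enough CpG sites for this bit string")
    strand = list(template)
    for site, bit in zip(sites, bits):
        if bit == "1":
            strand[site] = "M"
    return "".join(strand)

def read_bits(strand: str, n_bits: int) -> str:
    """Recover bits by checking each CpG site of the unmodified sequence for the mark."""
    unmethylated = strand.replace("M", "C")
    return "".join("1" if strand[i] == "M" else "0"
                   for i in cpg_positions(unmethylated)[:n_bits])

template = "ACGTTCGAACGGTCGA"            # hypothetical template with four CpG sites
printed = write_bits(template, "1011")
print(printed)                            # AMGTTCGAAMGGTMGA
print(read_bits(printed, 4))              # 1011
```

Because every CpG site can in principle be marked independently, writing is naturally parallel. Erasing the marks restores the blank template for reuse.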
Collectively, ncNAs offer diverse advantages for data storage, and modifications at different positions suit different application scenarios:

- Base modifications can expand storage capacity but do not by themselves improve biological and chemical stability.
- Sugar-ring and phosphate-backbone modifications can enhance the stability and orthogonality of nucleic acids. However, they often hinder polymerase recognition, which makes the synthesis of long strands more challenging, and data retrieval is even more difficult than for base-modified systems.
- Multiple modifications, such as TNA + UBP, combine the nuclease-resistant scaffold of TNA with the expanded storage capacity of UBPs.
- Mirror-image nucleic acids (e.g., L-DNA) exhibit bio-orthogonality by evading recognition by natural biomacromolecules, offering enhanced stability and security for data storage. Challenges remain in polymerase compatibility and sequencing tools.
- Parallelized data writing with epigenetic marks such as 5mC enables efficient, reversible, and scalable encoding without altering the original sequence, providing a flexible and reusable storage architecture.
Current challenges and future perspectives
Despite their substantial potential, ncNAs face several technical bottlenecks that hinder their large-scale deployment. Chief among these are challenges in synthesis, readout technologies, and chemical diversification. Addressing these limitations is critical for the realization of scalable, robust, and cost-effective data storage platforms.
Writing: synthesis limitations and polymerase compatibility
Synthesis remains a primary obstacle. Both chemical and enzymatic synthesis routes are technically demanding and costly193,194. Solid-phase synthesis involves harsh reagents and expensive nucleotide building blocks (Table 3)195. Enzymatic approaches, in turn, are limited by the substrate recognition fidelity of natural polymerases, which often fail to process UBPs or modified backbones. These inefficiencies significantly drive up costs.

Current synthesis technologies also struggle to produce long non-canonical sequences. In molecular data storage, longer strands require fewer index sequences, which reduces indexing overhead and thus increases storage capacity. Conversely, short ncNA strands limit both data capacity and reliability. Overcoming these limitations will require more automated and scalable synthesis platforms. Two steps will be essential196: improving coupling efficiency and reducing errors in solid-phase synthesis, and lowering monomer costs through novel chemical or enzymatic production pathways. Meanwhile, polymerase engineering accelerated by machine learning tools holds promise for improving enzymatic compatibility with non-canonical substrates197,198. For instance, Holliger et al. engineered an archaeal polymerase with steric gate mutations to synthesize long 2′-OMe RNA oligonucleotides of up to 750 nt199.
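A simple count makes concrete the claim that longer strands reduce indexing overhead. The sketch below uses a hypothetical layout `[primer | index | payload | primer]` with two 20-nt primer sites, 2 bits per nucleotide, and an index just long enough to give every strand a unique address. For a fixed amount of data, it computes how many strands are needed and what fraction of synthesized nucleotides carries payload. All parameters are assumptions chosen for illustration.

```python
# Hypothetical strand layout: [primer | index | payload | primer].
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)

def layout(data_bits: int, strand_len: int, primer_len: int = 20, bits_per_nt: int = 2):
    """Return (n_strands, index_len, payload_fraction) with the index sized
    just large enough to give every strand a unique address."""
    index_len = 0
    while True:
        payload_nt = strand_len - 2 * primer_len - index_len
        if payload_nt <= 0:
            raise ValueError("strand too short for primers and index")
        n_strands = ceil_div(data_bits, payload_nt * bits_per_nt)
        address_bits = (n_strands - 1).bit_length()  # bits to number 0..n_strands-1
        needed = ceil_div(address_bits, bits_per_nt)
        if needed <= index_len:
            return n_strands, index_len, payload_nt / strand_len
        index_len = needed  # grow the index and recompute with the smaller payload

for length in (150, 300):
    n, idx, frac = layout(8 * 10**6, length)  # 1 MB of data
    print(f"{length} nt: {n} strands, {idx}-nt index, {frac:.0%} payload")
```

Under these assumptions, doubling the strand length from 150 to 300 nt raises the payload fraction from 68% to about 84%. It also cuts the number of strands to be synthesized by more than half.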
Reading: sequencing limitations and optimization strategies
Accurate and efficient readout is another major challenge. Conventional sequencing platforms like Illumina and Sanger methods are often incompatible with non-canonical nucleotides, typically necessitating reverse transcription into canonical DNA, a step that adds complexity, time, and error potential200.
Recent experimental advances demonstrate that nanopore sequencing can be adapted to directly decode synthetic and other non-canonical nucleic acids. Landmark studies have reported (i) high-throughput deconvolution of non-canonical bases201,202,203 and (ii) sequencing of DNA analogs with artificial bases203. Both relied on bespoke training datasets and customized basecaller models. In parallel, methodological improvements have markedly increased the accuracy of modification detection and base analog discrimination. These include Uncalled4204, Rockfish205, the Bonito/Dorado/Remora toolchains206, and transformer-based architectures. Advances in solid-state nanopores (e.g., graphene and MoS₂)207 further underscore the theoretical potential for improved resolution and tunability, although robust superiority over biological pores remains to be experimentally validated. Collectively, these developments suggest that the gap between chemical innovation in ncNAs and their practical single-molecule readout is rapidly narrowing. Closing it depends on continued progress in pore engineering, dataset curation, and machine-learning-driven basecalling.
Expanding the substrate landscape: chemical diversity and storage potential
The current landscape of ncNAs remains limited in scope, representing only a small fraction of their potential. Broadening the chemical repertoire through the development of new bases, sugars, and backbone chemistries will be crucial. For instance, the engineered polymerases Pol6G12 and RT521 can transfer genetic information between DNA and HNA134, highlighting their potential in storage applications.
RNA, while structurally and functionally analogous to DNA, offers both distinct advantages and pronounced challenges in the context of molecular data storage. As a single-stranded polymer, RNA exhibits greater conformational flexibility and is amenable to a wide range of enzymatic manipulations, including transcription, reverse transcription, and programmable editing via CRISPR-Cas systems, making it a promising candidate for dynamic and rewritable data architectures208,209,210. However, its chemical instability, primarily due to the presence of a 2′-hydroxyl group that renders the phosphodiester backbone susceptible to base-catalyzed hydrolysis, significantly limits its utility for long-term storage applications211. Moreover, RNA is highly vulnerable to ubiquitous ribonucleases, which can rapidly degrade unprotected RNA molecules even under mild conditions212. To address these limitations, various chemical modification strategies have been developed to enhance RNA stability while preserving its informational and functional capabilities. Notable among these are 2′-O-methyl (2′-OMe), 2′-fluoro (2′-F), and LNA modifications, which confer increased resistance to nucleases and thermal denaturation without compromising base-pairing fidelity213,214. These modified RNAs have found extensive use in therapeutic and diagnostic applications, and their potential in data storage systems remains an emerging area of interest.
Beyond RNA itself, synthetic analogs with altered sugar-phosphate backbones offer a broader chemical space for engineering storage systems with superior stability and functionality. Several RNA-inspired ncNAs, such as GNA and HNA, can form stable Watson-Crick base pairs with complementary DNA or RNA strands while resisting hydrolytic and enzymatic degradation134. Importantly, polymerases capable of synthesizing and replicating TNA and HNA have been engineered. This suggests that these systems could support autonomous information copying and evolution in synthetic media215.
Looking ahead, combining multiple modifications (e.g., TNA + TPT3, ref. 76; PS + FANA, ref. 75) offers a modular approach to building high-capacity, functional, and responsive storage systems. These hybrid platforms can encode data with enhanced fidelity, security, and environmental adaptability, features vital for the next generation of molecular data storage technologies76,216.
Ethical and biosafety considerations
Given their structural homology with natural DNA and RNA, ncNAs raise important ethical and biosafety concerns, especially as they move from in vitro to potential in vivo applications. Although ncNA-based data storage has been demonstrated only in vitro so far, advancements in DNA-based in vivo storage systems suggest that ncNAs could be explored for future in vivo data storage217,218. However, several biosafety and ethical considerations must be carefully addressed.
One concern arises from recent discussions about mirror-image life forms composed of L-DNA and other mirror-image biological molecules219. Although the synthesis of a mirror-image organism is beyond current technological capabilities, progress in synthetic biology could eventually enable the construction of a fully mirror-image bacterium. Preliminary assessments suggest that these organisms could evade host immune surveillance and resist degradation by natural enzymes, leading to uncontrolled proliferation and potential biosafety risks. To mitigate these concerns, constructing restriction systems that limit the replication of mirror-image life may be necessary.
Similarly, the application of ncNAs for data storage requires caution regarding their biosafety. While the evolution of polymerases capable of handling these modified bases has been a focal point, the development of highly specific nucleases targeting ncNAs is also critical. Furthermore, the self-replication capabilities of ncNAs must be tightly controlled. Prior to in vivo data storage, it is essential to evaluate the efficiency of genome integration, potential mutation risks, immune responses, and metabolic clearance pathways associated with ncNAs. These factors must be clearly defined to ensure safety in future applications.
Molecular security, encryption, and bio-orthogonal steganography
Beyond expanding data storage capacity and chemical diversity, ncNAs also open new horizons for molecular-level security and encryption. Incorporating modified or synthetic nucleotides, such as 5mC or unnatural bases, can render encoded sequences readable only by specific biochemical or enzymatic processes220, so that only authorized entities can decode the data. In parallel, hidden data layers can be embedded within DNA through chemical modifications221 or secondary structural encoding. These layers provide additional protection against unauthorized access, establishing a molecular equivalent of multi-factor authentication.
Furthermore, ncNA sequences can serve as unique molecular signatures, acting as physical unclonable functions (PUFs) or molecular fingerprints. Replicating them requires precise knowledge of both the sequence composition and the synthetic conditions222, which makes counterfeiting or tampering extremely difficult. The inherent complexity of ncNA architectures further enhances their anti-counterfeiting capabilities and steganographic potential103. This complexity includes expanded base alphabets, backbone modifications, and hybridization orthogonality.
A particularly exciting direction is bio-orthogonal steganography, which exploits the orthogonality between ncNAs and natural biomolecules to conceal data within biological systems without interfering with native functions. For instance, L-DNA does not hybridize with natural D-DNA and has been used to create “mirror-encoded” information for covert communication and molecular authentication78. Similarly, PNA with its charge-neutral backbone can evade detection by conventional hybridization probes. Highly stable systems such as TNA, in turn, offer durability for long-term encrypted data storage.
Building on this concept, DNA adducts represent a distinctive class of ncNAs in which DNA bases are covalently modified by small molecules such as psoralens or aldehydes223,224, enabling reversible chemical encryption. These adducts can be selectively formed or cleaved in response to external stimuli, including light, heat, or redox conditions, allowing dynamic data concealment and recovery. Collectively, such strategies point toward a future where ncNA-based data storage systems integrate multi-layered molecular encryption, steganography, and authentication, providing unprecedented levels of data security at the nanoscale.
Advanced encoding schemes for error minimization
In previous sections, we introduced the use of UBPs to expand the genetic alphabet and thereby increase storage capacity in molecular data storage. However, as the alphabet expands, error rates during both writing and reading may also increase. Several encoding schemes can be employed to minimize these errors:

- Constrained and variable-length codes, such as run-length-limited (RLL) codes and Huffman coding, reduce homopolymer-associated errors and improve coding efficiency in large-alphabet systems225.
- Combinatorial encoding strategies, including DNA Fountain and overlapping pool designs, exploit redundancy and stochastic assembly to enhance robustness against synthesis and sequencing errors.
- More recently, 3D structural encoding using DNA origami and nanostructures has enabled storage in conformational states, bypassing sequencing altogether226.

Within this toolbox, non-canonical nucleotides provide a unique opportunity. Expanded alphabets (e.g., AEGIS or Hachimoji systems45) enable octal or higher-order coding schemes, thereby increasing per-base data capacity (up to 3.58 bits/base for a 12-letter alphabet). Moreover, their orthogonality and chemical tunability may synergize with combinatorial and structural encoding, offering parallel channels for error-resilient, multi-layered data representation.
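Fountain-style encoding, as used in DNA Fountain, can be sketched compactly. The version below is a simplified Luby-transform illustration built on explicit assumptions:

- Data are split into equal-length segments.
- Each droplet XORs a seed-determined random subset of segments. Degrees are drawn uniformly from 1 to 3 rather than from the robust soliton distribution used in practice.
- In the spirit of a run-length-limited screen, droplets whose 2-bit-per-base DNA rendering contains a homopolymer longer than 3 nt are discarded.
- Decoding uses a peeling decoder.

All names and parameters are hypothetical.

```python
import random

BASES = "ACGT"

def to_dna(payload: bytes) -> str:
    """Render bytes as DNA at 2 bits per base (most significant bits first)."""
    return "".join(BASES[(byte >> shift) & 3] for byte in payload for shift in (6, 4, 2, 0))

def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases."""
    longest = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    return longest

def choose(seed: int, n_segments: int) -> set[int]:
    """Seed-determined segment subset (degree 1-3, uniform: a simplification)."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return set(rng.sample(range(n_segments), degree))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(segments: list[bytes], n_droplets: int, max_run: int = 3):
    """Emit (seed, payload) droplets, discarding those that fail the homopolymer screen."""
    droplets, seed = [], 0
    while len(droplets) < n_droplets:
        payload = bytes(len(segments[0]))
        for i in choose(seed, len(segments)):
            payload = xor(payload, segments[i])
        if max_homopolymer(to_dna(payload)) <= max_run:
            droplets.append((seed, payload))
        seed += 1
    return droplets

def peel_decode(droplets, n_segments: int):
    """Peeling decoder: resolve droplets with exactly one unknown segment, repeat."""
    pending = [(choose(seed, n_segments), payload) for seed, payload in droplets]
    recovered = {}
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        for subset, payload in pending:
            unknown = subset - set(recovered)
            if len(unknown) == 1:
                target = unknown.pop()
                for i in subset - {target}:
                    payload = xor(payload, recovered[i])  # strip known segments
                recovered[target] = payload
                progress = True
    if len(recovered) < n_segments:
        return None  # not enough droplets collected yet
    return [recovered[i] for i in range(n_segments)]

# Usage: keep collecting droplets until the peeling decoder succeeds.
data = b"DNA storage with ncNAs!!"
segments = [data[i:i + 4] for i in range(0, len(data), 4)]
decoded, n_droplets = None, len(segments)
while decoded is None and n_droplets < 500:
    decoded = peel_decode(encode(segments, n_droplets), len(segments))
    n_droplets += 1
print(decoded is not None and b"".join(decoded) == data)
```

The seed alone determines which segments a droplet combines. For this reason, only the seed and payload need to be stored, and any sufficiently large subset of droplets that survive synthesis and sequencing can reconstruct the data.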
Summary and outlook
As both the types and volumes of data continue growing exponentially, DNA-based storage platforms have attracted considerable attention due to their high storage capacity, durability, and low energy requirements. However, the biochemical instability of canonical DNA poses significant limitations for practical application. NcNAs have emerged as attractive alternatives, with diverse applications in molecular medicine, synthetic biology, and nanotechnology. By introducing various modifications at the base, sugar, or backbone moieties, ncNAs offer several advantages, including enhanced biochemical stability, expanded storage capacity, and novel orthogonal functionalities. Hybrid approaches that integrate multiple modifications, such as epigenetic marks combined with sugar or backbone alterations, hold great promise for creating multi-layered, rewritable, and secure data encoding systems.
Despite the promising potential of ncNAs, several technical challenges hinder their practical application. First, large-scale synthesis remains a significant bottleneck, as current methods are not yet efficient in producing long ncNA sequences. Additionally, many synthetic nucleic acids are incompatible with the polymerases required for replication and transcription227, limiting their functionality. Sequencing also remains a critical challenge; while nanopore sequencing has made progress228,229, reliable single-molecule readouts of UBPs require further optimization, including advancements in pore engineering, signal calibration, and base-calling algorithms. Overcoming these obstacles will necessitate multidisciplinary collaboration, integrating expertise from synthetic chemistry, enzymology, materials science, and computational analysis. Key areas for advancement include expanding the chemical repertoire of nucleic acids, engineering more robust and versatile polymerases, and optimizing nanopore-based sequencing technologies. As these challenges are progressively addressed, ncNAs are poised to form the foundation for next-generation molecular data storage systems.
Outlook
Advances in ncNA-based systems are expected to bring significant improvements in writing, preservation, and reading (Fig. 4).
Fig. 4: Schematic roadmap illustrating the promising development of ncNAs in molecular data storage, spanning information writing, preservation, and reading technologies.
In terms of writing, rapid progress will be made in ncNA synthesis, driven by technologies like TdT and polymerase variants, enabling the creation of long-stranded and chimeric ncNAs in the short term. In the medium to long term, high-throughput real-time synthesis, fully automated ultra-fast synthesis, and even portable micro-writing devices will emerge, drastically enhancing the throughput and flexibility of ncNA writing.
For preservation, on-chip ncNA storage will be achieved in the near future, followed by the integration of ncNA storage and computing systems. Innovations in encryption and steganography using ncNAs will also gain traction. Over time, smart, low-cost encapsulation systems for real-time monitoring of ncNA degradation, along with ultra-fast micro-encapsulation technologies, will be developed to ensure secure and stable long-term storage of ncNA data.
In reading, the rapid development of nanopore technologies should soon make nanopore-based ncNA sequencing available. Supported by advances in AI techniques and DNA-based neural networks, AI-assisted semantic retrieval and human-computer interaction systems may be realized. As technology progresses, we anticipate high-throughput real-time reading, fully automated ultra-fast ncNA reading, and portable micro storage-reading devices. Together, these would greatly enhance the speed and convenience of accessing ncNA-based data storage systems.
References
Bar-Lev, D., Orr, I., Sabary, O., Etzion, T. & Yaakobi, E. Scalable and robust DNA-based storage via coding theory and deep learning. Nat. Mach. Intell. 7, 639–649 (2025).
Yang, S. et al. DNA as a universal chemical substrate for computing and data storage. Nat. Rev. Chem. 8, 179–194 (2024).
Krusin-Elbaum, L., Shibauchi, T., Argyle, B., Gignac, L. & Weller, D. Stable ultrahigh-density magneto-optical recordings using introduced linear defects. Nature 410, 444–446 (2001).
Grass, R. N., Heckel, R., Puddu, M., Paunescu, D. & Stark, W. J. Robust chemical preservation of digital information on DNA in silica with error-correcting codes. Angew. Chem. Int. Ed. 54, 2552–2555 (2015).
Buko, T., Tuczko, N. & Ishikawa, T. DNA data storage. BioTech 12, 44 (2023).
Li, X. DNA data storage: the fusion of digital and biological information. Theor. Nat. Sci. 4, 26–31 (2023).
Malmodin, J., Lövehagen, N., Bergmark, P. & Lundén, D. ICT sector electricity consumption and greenhouse gas emissions – 2020 outcome. Telecommun. Policy 48, 102701 (2024).
Gelenbe, E. Electricity consumption by ICT: facts, trends, and measurements. Ubiquity 2023, 1–15 (2023).
Park, C.-H., Petit, Y., Canioni, L. & Park, S.-H. Five-dimensional optical data storage based on ellipse orientation and fluorescence intensity in a silver-sensitized commercial glass. Micromachines 11, 1026 (2020).
Li, J. et al. Terahertz wavefront shaping with multi-channel polarization conversion based on all dielectric metasurface. Photonics Res. 9, 1939–1947 (2021).
Wuttig, M., Bhaskaran, H. & Taubner, T. Phase-change materials for non-volatile photonic applications. Nat. Photonics 11, 465–476 (2017).
Organick, L. et al. Probing the physical limits of reliable DNA data retrieval. Nat. Commun. 11, 616 (2020).
Huang, X. L. et al. Storage-D: a user-friendly platform that enables practical and personalized DNA data storage. iMeta 3, e168 (2024).
Lin, K. N. et al. A primordial DNA store and compute engine. Nat. Nanotechnol. 19, 1654–1664 (2024).
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
Divya, K., Gk, M., Umema, A. & Ss, D. Environmental factors affecting the concentration of DNA in blood and saliva stains: a review. J. Forensic Sci. Res. 78, s343 (2024).
Wamho, E. C. et al. Controlling nuclease degradation of wireframe DNA origami with minor groove binders. ACS Nano 16, 8954–8966 (2022).
Moret, I. et al. Stability of PEI-DNA and DOTAP-DNA complexes: effect of alkaline pH, heparin and serum. J. Control Release 76, 169–181 (2001).
Fraikin, G. Y., Belenikina, N. S. & Rubin, A. B. Photochemical processes of cell DNA damage by UV radiation of various wavelengths: biological consequences. Mol. Biol. 58, 1–16 (2024).
Saloua Kouass, S., Sonia, G., Pierre, C., Léon, S. & Darel, J. H. The relative contributions of DNA strand breaks, base damage and clustered lesions to the loss of DNA functionality induced by ionizing radiation. Radiat. Res. 181, 99–110 (2014).
Matange, K., Tuck, J. M. & Keung, A. J. DNA stability: a central design consideration for DNA data storage systems. Nat. Commun. 12, 1358 (2021).
Ramakrishnan, S., Ijäs, H., Linko, V. & Keller, A. Structural stability of DNA origami nanostructures under application-specific conditions. Comput. Struct. Biotechnol. J. 16, 342–349 (2018).
Langlois, N. I. & Clark, H. A. Characterization of DNA nanostructure stability by size exclusion chromatography. Anal. Methods 14, 1006–1014 (2022).
Zhong, W. R. & Sczepanski, J. T. Direct comparison of D-DNA and L-DNA strand-displacement reactions in living mammalian cells. ACS Synth. Biol. 10, 209–212 (2021).
Yatsunyk, L. A., Mendoza, O. & Mergny, J.-L. “Nano-oddities”: unusual nucleic acid assemblies for DNA-based nanostructures and nanodevices. Acc. Chem. Res. 47, 1836–1844 (2014).
Fei, Z. J., Gupta, N., Li, M. J., Xiao, P. F. & Hu, X. Toward highly effective loading of DNA in hydrogels for high-density and long-term information storage. Sci. Adv. 9, eadg9933 (2023).
Mao, C. et al. Metal-organic frameworks in microfluidics enable fast encapsulation/extraction of DNA for automated and integrated data storage. ACS Nano 17, 2840–2850 (2023).
Koch, J. et al. A DNA-of-things storage architecture to create materials with embedded memory. Nat. Biotechnol. 38, 39 (2020).
Greengard, S. Cracking the code on DNA storage. Commun. ACM 60, 16–18 (2017).
Fan, D., Wang, J., Han, J., Wang, E. & Dong, S. Engineering DNA logic systems with non-canonical DNA-nanostructures: basic principles, recent developments and bio-applications. Sci. China Chem. 65, 284–297 (2021).
Xiujuan, Y., Huimin, Z., Zhenqiang, H. & Xiao, W. Application of aptamer-functionalized nanomaterials in molecular imaging of tumors. Nanotechnol. Rev. 12, 20230107 (2023).
Yuang, W. et al. Chemically modified DNA nanostructures for drug delivery. Innovation 3, 100217 (2022).
Changping, Y. et al. Genetically encoded nucleic acid nanostructures for biological applications. Chembiochem 26, e202400991 (2025).
Duffy, K., Arangundy-Franklin, S. & Holliger, P. Modified nucleic acids: replication, evolution, and next-generation therapeutics. BMC Biol. 18, 112 (2020).
Yu, M. et al. High-throughput DNA synthesis for data storage. Chem. Soc. Rev. 53, 4463–4489 (2024).
Bag, S. S., Banerjee, A. & Sinha, S. Expansion of genetic alphabets: designer nucleobases and their applications. Synlett 35, 1195–1227 (2023).
Kang, S. H., Liu, Q., Zhang, J., Zhang, Y. & Qi, H. 2,6-diaminopurine (Z)-containing toehold probes improve genotyping sensitivity. Biotechnol. Bioeng. 121, 1384–1393 (2024).
Zhang, M., Singh, N., Ehmann, M. E., Zheng, L. N. & Zhao, H. M. Incorporation of noncanonical base Z yields modified mRNA with minimal immunogenicity and improved translational capacity in mammalian cells. iScience 26, 107739 (2023).
Liu, Q. et al. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat. Commun. 10, 2449 (2019).
Seiple, L., Jaruga, P., Dizdaroglu, M. & Stivers, J. T. Linking uracil base excision repair and 5-fluorouracil toxicity in yeast. Nucleic Acids Res. 34, 140–151 (2006).
Eremeeva, E. & Herdewijn, P. PCR amplification of base-modified DNA. Curr. Protoc. Chem. Biol. 10, 18–48 (2018).
Goodman, M. F., Hopkins, R. L., Lasken, R. & Mhaskar, D. N. The biochemical basis of 5-bromouracil- and 2-aminopurine-induced mutagenesis. Basic Life Sci. 31, 409–423 (1985).
Willis, M. C., Hicke, B. J., Uhlenbeck, O. C., Cech, T. R. & Koch, T. H. Photocrosslinking of 5-iodouracil-substituted RNA and DNA to proteins. Science 262, 1255–1257 (1993).
Kawabe, H. et al. Enzymatic synthesis and nanopore sequencing of 12-letter supernumerary DNA. Nat. Commun. 14, 6820 (2023).
Hoshika, S. et al. Hachimoji DNA and RNA: a genetic system with eight building blocks. Science 363, 884 (2019).
Galindo-Murillo, R. & Barroso-Flores, J. Hydrophobic unnatural base pairs show a Watson-Crick pairing in micro-second molecular dynamics simulations. J. Biomol. Struct. Dyn. 38, 4098–4106 (2020).
Zhang, Y. et al. A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644 (2017).
Mukba, S. A. et al. Expanding the genetic code: unnatural base pairs in biological systems. Mol. Biol. 54, 475–484 (2020).
Okamoto, I., Miyatake, Y., Kimoto, M. & Hirao, I. High fidelity, efficiency and functionalization of Ds-Px unnatural base pairs in PCR amplification for a genetic alphabet expansion system. ACS Synth. Biol. 5, 1220–1230 (2016).
Hirao, I., Kimoto, M. & Lee, K. H. DNA aptamer generation by ExSELEX using genetic alphabet expansion with a mini-hairpin DNA stabilization method. Biochimie 145, 15–21 (2018).
Johnson, S. C., Sherrill, C. B., Marshall, D. J., Moser, M. J. & Prudent, J. R. A third base pair for the polymerase chain reaction: inserting isoC and isoG. Nucleic Acids Res. 32, 1937–1941 (2004).
Evich, M., Spring-Connell, A. M. & Germann, M. W. Impact of modified ribose sugars on nucleic acid conformation and function. Heterocycl. Commun. 23, 155–165 (2017).
Inoue, H. et al. Synthesis and hybridization studies on two complementary nona(2’-O-methyl)ribonucleotides. Nucleic Acids Res. 15, 6131–6148 (1987).
Watts, J. K., Katolik, A., Viladoms, J. & Damha, M. J. Studies on the hydrolytic stability of 2′-fluoroarabinonucleic acid (2′F-ANA). Org. Biomol. Chem. 7, 1904–1910 (2009).
Pontarelli, A. & Wilds, C. J. Arabinonucleic acids containing C5-propynyl modifications form stable hybrid duplexes with RNA that are efficiently degraded by E. coli RNase H. Bioorg. Med. Chem. Lett. 67, 128744 (2022).
Kalota, A. et al. 2′-Deoxy-2′-fluoro-β-D-arabinonucleic acid (2′F-ANA) modified oligonucleotides (ON) effect highly efficient, and persistent, gene silencing. Nucleic Acids Res. 34, 451–461 (2006).
Anosova, I. et al. The structural diversity of artificial genetic polymers. Nucleic Acids Res. 44, 1007–1021 (2016).
Zhang, W. et al. Structural interpretation of the effects of threo-nucleotides on nonenzymatic template-directed polymerization. Nucleic Acids Res. 49, 646–656 (2021).
Murayama, K., Okita, H., Kuriki, T. & Asanuma, H. Nonenzymatic polymerase-like template-directed synthesis of acyclic l-threoninol nucleic acid. Nat. Commun. 12, 804 (2021).
Culbertson, M. C. et al. Evaluating TNA stability under simulated physiological conditions. Bioorg. Med. Chem. Lett. 26, 2418–2421 (2016).
Ebert, M. O., Mang, C., Krishnamurthy, R., Eschenmoser, A. & Jaun, B. The structure of a TNA-TNA complex in solution: NMR study of the octamer duplex derived from α-(L)-threofuranosyl-(3′-2′)-CGAATTCG. J. Am. Chem. Soc. 130, 15105–15115 (2008).
Eze, N. A. & Milam, V. T. Quantitative analysis of locked nucleic acid and DNA competitive displacement events on microspheres. Langmuir 38, 6871–6881 (2022).
Hoshino, H., Kasahara, Y., Kuwahara, M. & Obika, S. DNA polymerase variants with high processivity and accuracy for encoding and decoding locked nucleic acid sequences. J. Am. Chem. Soc. 142, 21530–21537 (2020).
Samson, C. et al. Structural studies of HNA substrate specificity in mutants of an archaeal DNA polymerase obtained by directed evolution. Biomolecules 10, 1647 (2020).
Allart, B. et al. D-altritol nucleic acids (ANA): hybridisation properties, stability, and initial structural analysis. Chem. Eur. J. 5, 2424–2431 (1999).
Zhang, L. L., Peritz, A. & Meggers, E. A simple glycol nucleic acid. J. Am. Chem. Soc. 127, 4174–4175 (2005).
Makino, K., Sugiyama, I., Asanuma, H. & Kashida, H. Kinetics of strand displacement reaction with acyclic artificial nucleic acids. Angew. Chem. Int. Ed. Engl. 63, e202319864 (2024).
Maiti, M. et al. Xylonucleic acid: synthesis, structure, and orthogonal pairing properties. Nucleic Acids Res. 43, 7189–7200 (2015).
Kevin, A. et al. On the influence of nucleic acid backbone modifications on lipid nanoparticle morphology. Langmuir 38, 14036–14043 (2022).
Shin, H., Cho, J. Y., Park, B. Y. & Jung, C. L. Sulfur incorporation into nucleic acids accelerates enzymatic activity. Chem. Eng. J. 493, 152548 (2024).
Novikova, D., Sagaidak, A., Vorona, S. & Tribulovich, V. A visual compendium of principal modifications within the nucleic acid sugar phosphate backbone. Molecules 29, 3025 (2024).
Tereshko, V., Gryaznov, S. & Egli, M. Consequences of replacing the DNA 3′-oxygen by an amino group: high-resolution crystal structure of a fully modified N3′→P5′ phosphoramidate DNA dodecamer duplex. J. Am. Chem. Soc. 120, 269–283 (1998).
Singh, G. & Monga, V. Peptide nucleic acids: recent developments in the synthesis and backbone modifications. Bioorg. Chem. 141, 106860 (2023).
Moccia, M., Adamo, M. F. & Saviano, M. Insights on chiral, backbone modified peptide nucleic acids: properties and biological activity. Artif. DNA PNA XNA 5, e1107176 (2014).
Lok, C. N. et al. Potent gene-specific inhibitory properties of mixed-backbone antisense oligonucleotides comprised of 2′-deoxy-2′-fluoro-D-arabinose and 2′-deoxyribose nucleotides. Biochemistry 41, 3457–3467 (2002).
Depmeier, H. & Kath-Schorr, S. Expanding the horizon of the xeno nucleic acid space: threose nucleic acids with increased information storage. J. Am. Chem. Soc. 146, 7743–7751 (2024).
Mallette, T. L., Lidke, D. S. & Lakin, M. R. Heterochiral modifications enhance robustness and function of DNA in living human cells. Chembiochem 25, e202300755 (2024).
Fan, C. Y., Deng, Q. & Zhu, T. F. Bioorthogonal information storage in L-DNA with a high-fidelity mirror-image DNA polymerase. Nat. Biotechnol. 39, 1548–1555 (2021).
Kang, H. et al. Inhibition of MDR1 gene expression by chimeric HNA antisense oligonucleotides. Nucleic Acids Res. 32, 4411–4419 (2004).
Mads, K. S. et al. Self-assembly of ultrasmall 3D architectures of (L)-acyclic threoninol nucleic acids with high thermal and serum stability. J. Am. Chem. Soc. 146, 20141–20146 (2024).
Ken, Y. et al. Enhancing siRNA efficacy in vivo with extended nucleic acid backbones. Nat. Biotechnol. 43, 904–913 (2024).
Wang, J., Shang, J., Xiang, Y. & Tong, A. Post-synthetic modification of oligonucleotides through oxidative amination of 4-thio-2’-deoxyuridine. Curr. Protoc. 1, e274 (2021).
Lee, E. M., Setterholm, N. A., Hajjar, M., Barpuzary, B. & Chaput, J. C. Stability and mechanism of threose nucleic acid toward acid-mediated degradation. Nucleic Acids Res. 51, 9542–9551 (2023).
Gade, C. R. & Sharma, N. K. Hybrid DNA i-motif: aminoethylprolyl-PNA (pC(5)) enhance the stability of DNA (dC(5)) i-motif structure. Bioorg. Med. Chem. Lett. 27, 5424–5428 (2017).
Petkowski, J. J. et al. Astrobiological implications of the stability and reactivity of peptide nucleic acid (PNA) in concentrated sulfuric acid. Sci. Adv. 11, eadr0006 (2025).
Rozners, E. Hydration of short DNA, RNA and 2’-OMe oligonucleotides determined by osmotic stressing. Nucleic Acids Res. 32, 248–254 (2004).
Lelyveld, V. S., Fang, Z. Y. & Szostak, J. W. Trivalent rare earth metal cofactors confer rapid NP-DNA polymerase activity. Science 382, 423–429 (2023).
Bian, T. Y. et al. Xeno nucleic acids as functional materials: from biophysical properties to application. Adv. Healthc. Mater. 13, e2401207 (2024).
Wilds, C. J. & Damha, M. J. 2′-Deoxy-2′-fluoro-β-D-arabinonucleosides and oligonucleotides (2′F-ANA): synthesis and physicochemical studies. Nucleic Acids Res. 28, 3625–3635 (2000).
De Winter, H., Lescrinier, E., Van Aerschot, A. & Herdewijn, P. Molecular dynamics simulation to investigate differences in minor groove hydration of HNA/RNA hybrids as compared to HNA/DNA complexes. J. Am. Chem. Soc. 120, 5381–5394 (1998).
Kamiya, Y. et al. Intrastrand backbone-nucleobase interactions stabilize unwound right-handed helical structures of heteroduplexes of L-aTNA/RNA and SNA/RNA. Commun. Chem. 3, 156 (2020).
Campbell, M. A. & Wengel, J. Locked vs. unlocked nucleic acids (LNA vs. UNA): contrasting structures work towards common therapeutic goals. Chem. Soc. Rev. 40, 5680–5689 (2011).
Lackey, H. H., Peterson, E. M., Chen, Z., Harris, J. M. & Heemstra, J. M. Thermostability trends of TNA:DNA duplexes reveal strong purine dependence. ACS Synth. Biol. 8, 1144–1152 (2019).
Wilds, C. J., Wawrzak, Z., Krishnamurthy, R., Eschenmoser, A. & Egli, M. Crystal structure of a B-form DNA duplex containing (L)-α-threofuranosyl (3′→2′) nucleosides: a four-carbon sugar is easily accommodated into the backbone of DNA. J. Am. Chem. Soc. 124, 13716–13721 (2002).
Anosova, I. et al. Structural insights into conformation differences between DNA/TNA and RNA/TNA chimeric duplexes. Chembiochem 17, 1705–1708 (2016).
Lackey, H. H., Chen, Z., Harris, J. M., Peterson, E. M. & Heemstra, J. M. Single-molecule kinetics show DNA pyrimidine content strongly affects RNA:DNA and TNA:DNA heteroduplex dissociation rates. ACS Synth. Biol. 9, 249–253 (2020).
Yubin, R. et al. DNA-based concatenated encoding system for high-reliability and high-density data storage. Small Methods 6, e2101335 (2022).
Ping, S., Rujie, Z., Chuanping, H. & Tingjian, C. Transcription, reverse transcription, and amplification of backbone-modified nucleic acids with laboratory-evolved thermophilic DNA polymerases. Curr. Protoc. 1, e188 (2021).
Meiser, L. C. et al. Reading and writing digital data in DNA. Nat. Protoc. 15, 86–101 (2019).
Chiarot, G. & Silvestri, C. Time series compression survey. ACM Comput. Surv. 55, 1–32 (2023).
Li, B., Ou, L. & Du, D. DP-DNA: a digital pattern-aware DNA storage system to improve encoding density. In 2023 31st International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). Institute of Electrical and Electronics Engineers (IEEE), 1–8 (2023).
Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628–1628 (2012).
Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013).
Zheng, Y. et al. DNA-QLC: an efficient and reliable image encoding scheme for DNA storage. BMC Genomics 25, 266 (2024).
Romesberg, F. E. Unnatural base pairs to expand the genetic alphabet and code. In Handbook of Chemical Biology of Nucleic Acids (ed. Sugimoto, N.). Springer Nature Singapore, 1–21 (2022).
Zhang, C. et al. Parallel molecular data storage by printing epigenetic bits on DNA. Nature 634, 824–832 (2024).
Lee, K. H., Hamashima, K., Kimoto, M. & Hirao, I. Genetic alphabet expansion biotechnology by creating unnatural base pairs. Curr. Opin. Biotechnol. 51, 8–15 (2018).
Thomas, C. A. et al. Assessing readability of an 8-letter expanded deoxyribonucleic acid alphabet with nanopores. J. Am. Chem. Soc. 145, 8560–8568 (2023).
Wang, J. & Yu, H. Y. Threose nucleic acid as a primitive genetic polymer and a contemporary molecular tool. Bioorg. Chem. 143, 107049 (2024).
Yang, K. F., McCloskey, C. M. & Chaput, J. C. Reading and writing digital information in TNA. ACS Synth. Biol. 9, 2936–2942 (2020).
Seeman, N. C. & Sleiman, H. F. DNA nanotechnology. Nat. Rev. Mater. 3, 17068 (2017).
Rothemund, P. W. K. Folding DNA to create nanoscale shapes and patterns. Nature 440, 297–302 (2006).
Lin, W. et al. Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model. Sci. Rep. 15, 20920 (2025).
Ke, G. et al. L-DNA molecular beacon: a safe, stable, and accurate intracellular nano-thermometer for temperature sensing in living cells. J. Am. Chem. Soc. 134, 18908–18911 (2012).
Fu, W. et al. Rational design of pH-responsive DNA motifs with general sequence compatibility. Angew. Chem. Int. Ed. 58, 16405–16410 (2019).
Hu, Y. W., Cecconello, A., Idili, A., Ricci, F. & Willner, I. Triplex DNA nanostructures: from basic properties to applications. Angew. Chem. Int. Ed. 56, 15210–15233 (2017).
Barbosa, N., Sagresti, L. & Brancato, G. Photoinduced azobenzene-modified DNA dehybridization: insights into local and cooperativity effects from a molecular dynamics study. Phys. Chem. Chem. Phys. 23, 25170–25179 (2021).
Suyu, L., Fei, D., Qian, L., Chunhai, F. & Jing, F. Azobenzene-integrated DNA Nanomachine. Chem. J. Chin. U 43, 8 (2022).
Ashwood, B. & Tokmakoff, A. Kinetics and dynamics of oligonucleotide hybridization. Nat. Rev. Chem. 9, 305–327 (2025).
Wang, Q. et al. 2′-Fluoroarabinonucleic acid nanostructures as stable carriers for cellular delivery in the strongly acidic environment. ACS Appl. Mater. Interfaces 12, 53592–53597 (2020).
Li, X. et al. RNA nanotechnology for codelivering high-payload nucleoside analogs to cancer with a synergetic effect. Mol. Pharm. 21, 5690–5702 (2024).
Taylor, A. I. et al. Nanostructures from synthetic genetic polymers. Chembiochem 17, 1107–1110 (2016).
Dai, K. et al. Single-stranded RNA origami-based epigenetic immunomodulation. Nano Lett. 23, 7188–7196 (2023).
Dong, B. et al. Synthesis and characterization of (R)-miniPEG-containing chiral γ-peptide nucleic acids using the Fmoc strategy. Tetrahedron Lett. 60, 1430–1433 (2019).
Li, K. J. et al. Empowering DNA-based information processing: computation and data storage. ACS Appl. Mater. Interfaces 16, 68749–68771 (2024).
Hollenstein, M. Enzymatic synthesis of base-modified nucleic acids. In Handbook of Chemical Biology of Nucleic Acids (ed. Sugimoto, N.). Springer Nature Singapore, 1–39 (2022).
Ohkubo, A., Seio, K. & Sekine, M. DNA synthesis without base protection using the phosphoramidite approach. Curr. Protoc. Nucleic Acid Chem. 26, 3 (2006).
Brian, S. S. Chemical nucleic acid synthesis, modification and labelling. Curr. Opin. Biotechnol. 4, 20–28 (1993).
Weng, Z. et al. Massively parallel homogeneous amplification of chip-scale DNA for DNA information storage (MPHAC-DIS). Nat. Commun. 16, 667 (2025).
Okita, H., Kondo, S., Murayama, K. & Asanuma, H. Rapid chemical ligation of DNA and threoninol nucleic acid (TNA) for effective nonenzymatic primer extension. J. Am. Chem. Soc. 145, 17872–17880 (2023).
O’Flaherty, D. K., Zhou, L. & Szostak, J. W. Nonenzymatic RNA-templated synthesis of N3′→P5′ phosphoramidate DNA. Bio-Protoc. 10, e3734 (2020).
Hoff, K., Halpain, M., Garbagnati, G., Edwards, J. S. & Zhou, W. Enzymatic synthesis of designer DNA using cyclic reversible termination and a universal template. ACS Synth. Biol. 9, 283–293 (2020).
Sun, L. P. et al. Template-independent synthesis and 3′-end labelling of 2′-modified oligonucleotides with terminal deoxynucleotidyl transferases. Nucleic Acids Res. 52, 10085–10101 (2024).
Pinheiro, V. B. et al. Synthetic genetic polymers capable of heredity and evolution. Science 336, 341–344 (2012).
Lelyveld, V. S., Zhang, W. & Szostak, J. W. Synthesis of phosphoramidate-linked DNA by a modified DNA polymerase. Proc. Natl. Acad. Sci. USA 117, 7276–7283 (2020).
Liu, Y. Y., Wang, J., Wu, Y. S. & Wang, Y. J. Advancing the enzymatic toolkit for 2′-fluoro arabino nucleic acid (FANA) manipulation: phosphorylation, ligation, replication, and templating RNA transcription. Chem. Sci. 15, 12534–12542 (2024).
Khamissi, N., Korfmann, C., Chaudhry, A. & Hili, R. Ligase-catalyzed transcription and reverse-transcription of XNA-containing nucleic acid polymers using T3 DNA ligase. Chem. Sci. 16, 9749–9755 (2025).
Carlson, C. K. et al. A massively parallel in vivo assay of TdT mutants yields variants with altered nucleotide insertion biases. ACS Synth. Biol. 13, 3326–3343 (2024).
Sabat, N. et al. Towards the controlled enzymatic synthesis of LNA containing oligonucleotides. Front. Chem. 11, 1161462 (2023).
Wang, G. et al. Enzymatic synthesis of DNA with an expanded genetic alphabet using terminal deoxynucleotidyl transferase. ACS Synth. Biol. 11, 4142–4155 (2022).
Lin, W. et al. Scaling the high-yield potential of large-scale DNA data storage with cap-free DNA synthesis. ACS Synth. Biol. 18, 2764–2773 (2025).
Soukarie, D., Nocete, L., Bittner, A. M. & Santiago, I. DNA data storage in electrospun and melt-electrowritten composite nucleic acid-polymer fibers. Mater. Today Bio. 24, 100900 (2024).
Chen, W. G. et al. An artificial chromosome for data storage. Natl. Sci. Rev. 8, nwab028 (2021).
Yim, S. S. et al. Robust direct digital-to-biological data storage in living cells. Nat. Chem. Biol. 17, 246 (2021).
Luo, H. et al. Engineered living memory microspheroid-based archival file system for random accessible in vivo DNA storage. Adv. Mater. 37, e2415358 (2025).
Fister, K., Fister, I. & Murovec, J. The potential of plants and seeds in DNA-based information storage. Adv. Inf. Knowl. Process. 4, 69–81 (2017).
Bencurova, E., Akash, A., Dobson, R. C. J. & Dandekar, T. DNA storage—from natural biology to synthetic biology. Comput. Struct. Biotechnol. J. 21, 1227–1235 (2023).
Farzadfard, F. et al. Single-nucleotide-resolution computing and memory in living cells. Mol. Cell 75, 769–780.e4 (2019).
Zhang, Y. et al. Preservation and encryption in DNA digital data storage. Chempluschem 87, e202200183 (2022).
Sun, F. J. et al. Mobile and self-sustained data storage in an extremophile genomic DNA. Adv. Sci. 10, e2206201 (2023).
Rossmanith, P., Röder, B., Frühwirth, K., Vogl, C. & Wagner, M. Mechanisms of degradation of DNA standards for calibration function during storage. Appl. Microbiol. Biotechnol. 89, 407–417 (2010).
Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
Shen, H. et al. Random sanitization in DNA information storage using CRISPR-Cas12a. J. Am. Chem. Soc. 146, 35155–35164 (2024).
Li, K. J. et al. DNA-DISK: automated end-to-end data storage via enzymatic single-nucleotide DNA synthesis and sequencing on digital microfluidics. Proc. Natl. Acad. Sci. USA 121, 1–9 (2024).
Organick, L. et al. Random access in large-scale DNA data storage (vol 36, pg 242, 2018). Nat. Biotechnol. 36, 660–660 (2018).
Zhou, Y., Bi, K., Ge, Q. Y. & Lu, Z. H. Advances and challenges in random access techniques for in vitro DNA data storage. ACS Appl. Mater. Interfaces 16, 43102–43113 (2024).
Zhang, J. Y., Hou, C. Y. & Liu, C. C. CRISPR-powered quantitative keyword search engine in DNA data storage. Nat. Commun. 15, 2376 (2024).
Shy, B. R. et al. High-yield genome engineering in primary cells using a hybrid ssDNA repair template and small-molecule cocktails. Nat. Biotechnol. 41, 521–531 (2023).
Han, W. J. et al. Efficient precise integration of large DNA sequences with 3’-overhang dsDNA donors using CRISPR/Cas9. Proc. Natl. Acad. Sci. USA 120, e2221127120 (2023).
Newman, S. et al. High density DNA data storage library via dehydration with digital microfluidic retrieval. Nat. Commun. 10, 1706 (2019).
Piao, Y. et al. Bead-based DNA synthesis and sequencing for integrated data storage using digital microfluidics. Angew. Chem. Int. Ed. Engl. 64, e202416004 (2025).
Heckel, R., Mikutis, G. & Grass, R. N. A characterization of the DNA data storage channel. Sci. Rep. 9, 9663 (2019).
Darío Sánchez, M. et al. Reduced amplification by phi29 DNA polymerase in the presence of unbound oligos during reaction in RCA. Biosens. Bioelectron.: X 17, 100456 (2024).
Kircher, M., Heyn, P. & Kelso, J. Addressing challenges in the production and analysis of Illumina sequencing data. BMC Genomics 12, 382 (2011).
Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365 (2021).
Yan, S. H. et al. Direct sequencing of 2′-deoxy-2′-fluoroarabinonucleic acid (FANA) using nanopore-induced phase-shift sequencing (NIPSS). Chem. Sci. 10, 3110–3117 (2019).
Wadley, T. et al. Nanopore sequencing for detection and characterization of phosphorothioate modifications in native DNA sequences. Front. Microbiol. 13, 871937 (2022).
Yang, Z. Y., Chen, F., Alvarado, J. B. & Benner, S. A. Amplification, mutation, and sequencing of a six-letter synthetic genetic system. J. Am. Chem. Soc. 133, 15105–15112 (2011).
Kimoto, M., Matsunaga, K. & Hirao, I. DNA aptamer generation by genetic alphabet expansion SELEX (ExSELEX) using an unnatural base pair system. Methods Mol. Biol. 1380, 47–60 (2016).
Wang, H. L. et al. Locating, tracing and sequencing multiple expanded genetic letters in complex DNA context via a bridge-base approach. Nucleic Acids Res. 51, e52 (2023).
Endo, M. AFM-based single-molecule observation of the conformational changes of DNA structures. Methods 169, 3–10 (2019).
Nieves, D. J., Gaus, K. & Baker, M. A. B. DNA-based super-resolution microscopy: DNA-PAINT. Genes (Basel) 9, 621 (2018).
Kang, T., Lim, D., Lee, W. & Song, Y. Polymerase elongation onto patterned DNA for random accessed DNA data storage. BioChip J. 19, 636–648 (2025).
Adleman, L. M. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024 (1994).
Okamoto, A., Tanaka, K. & Saito, I. DNA logic gates. J. Am. Chem. Soc. 126, 9458–9463 (2004).
Frezza, B. M., Cockroft, S. L. & Ghadiri, M. R. Modular multi-level circuits from immobilized DNA-Based logic gates. J. Am. Chem. Soc. 129, 14875–14879 (2007).
Elbaz, J. et al. DNA computing circuits using libraries of DNAzyme subunits (vol 5, pg 417, 2010). Nat. Nanotechnol. 6, 190–190 (2011).
Phillips, A. & Cardelli, L. A programming language for composable DNA circuits. J. R. Soc. Interface 6, S419–S436 (2009).
Seelig, G., Soloveichik, D., Zhang, D. Y. & Winfree, E. Enzyme-free nucleic acid logic circuits. Science 314, 1585–1588 (2006).
Qian, L. & Winfree, E. Scaling up digital circuit computation with DNA strand displacement cascades. Science 332, 1196–1201 (2011).
Cherry, K. M. & Qian, L. L. Supervised learning in DNA neural networks. Nature 645, 639–647 (2025).
Song, T. & Qian, L. Heat-rechargeable computation in DNA logic circuits and neural networks. Nature 646, 315–322 (2025).
Zhang, Q. et al. High-speed sequential DNA computing using a solid-state DNA origami register. ACS Cent. Sci. 10, 2285–2293 (2024).
Pei, Y. et al. Single-molecule resettable DNA computing via magnetic tweezers. Nano Lett. 22, 3003–3010 (2022).
Garbesi, A. et al. L-DNAs as potential antimessenger oligonucleotides: a reassessment. Nucleic Acids Res. 21, 4159–4165 (1993).
Kabza, A., Young, B. & Sczepanski, J. Heterochiral DNA strand displacement circuits. J. Am. Chem. Soc. 139, 17715–17718 (2017).
Young, B. E. & Sczepanski, J. T. Heterochiral DNA strand-displacement based on chimeric D/L-oligonucleotides. ACS Synth. Biol. 8, 2756–2759 (2019).
Chen, Y., Nagao, R., Murayama, K. & Asanuma, H. Orthogonal amplification circuits composed of acyclic nucleic acids enable RNA detection. J. Am. Chem. Soc. 144, 5887–5892 (2022).
Bag, S. S., Banerjee, A. & Sinha, S. Expansion of genetic alphabets: designer nucleobases and their applications. Synlett 35, 1195–1227 (2024).
Le, A. V. & Hartman, M. C. T. Improved synthesis of the unnatural base NaM, and evaluation of its orthogonality in in vitro transcription and translation. RSC Chem. Biol. 5, 1111–1121 (2024).
Shomorony, I. & Heckel, R. Information-theoretic foundations of DNA data storage. Found. Trends Commun. Inf. Theory 19, 1–106 (2022).
Jin, B., Li, Y. & Robertson, K. D. DNA methylation: superior or subordinate in the epigenetic hierarchy? Genes Cancer 2, 607–617 (2011).
Zhang, Q. L. et al. Programming non-nucleic acid molecules into computational nucleic acid systems. Angew. Chem. Int. Ed. 62, e202214698 (2023).
Chaput, J. C. Redesigning the genetic polymers of life. Acc. Chem. Res. 54, 1056–1065 (2021).
Ahmed, M. K., Subrata, H. M. & Markus, W. G. Cyclic enzymatic solid phase synthesis of isotopically labeled DNA oligonucleotides. Nucleosides, Nucleotides Nucleic Acids 28, 1030–1041 (2009).
Jia-Yi, L., Yu-Ting, H., Xue-Qiang, W. & Yuntian, Z. Development of silyl-protected phosphoramidite building blocks for short ssDNA synthesis. Chin. J. Chem. 43, 1293–1298 (2025).
Patsch, D. & Buller, R. Improving enzyme fitness with machine learning. Chimia 77, 116 (2023).
Yang, J., Li, F.-Z. & Arnold, F. H. Opportunities and challenges for machine learning-assisted enzyme engineering. ACS Cent. Sci. 10, 226–241 (2024).
Freund, N. et al. A two-residue nascent-strand steric gate controls synthesis of 2′-O-methyl- and 2′-O-(2-methoxyethyl)-RNA. Nat. Chem. 15, 91–100 (2023).
Eva, S. H. et al. Reverse transcription as key step in RNA in vitro evolution with unnatural base pairs. RSC Chem. Biol. 5, 556–566 (2024).
Pagès-Gallego, M. et al. Direct detection of 8-oxo-dG using nanopore sequencing. Nat. Commun. 16, 5236 (2025).
Perez, M. et al. Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning. Nat. Commun. 16, 6980 (2025).
Thomas, C. A. et al. Sequencing a DNA analog composed of artificial bases. Nat. Commun. 16, 7240 (2025).
Kovaka, S. et al. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. Nat. Methods 22, 681–691 (2025).
Stanojević, D., Li, Z., Bakić, S., Foo, R. & Šikić, M. Rockfish: a transformer-based model for accurate 5-methylcytosine prediction from nanopore sequencing. Nat. Commun. 15, 5580 (2024).
Wang, Z. et al. Training data diversity enhances the basecalling of novel RNA modification-induced nanopore sequencing readouts. Nat. Commun. 16, 679 (2025).
Lei, L., Han, Q., Wei, X., Yujuan, W. & Kedong, B. Evaluating graphene and molybdenum disulfide nanopores for DNA sequencing. In 2023 IEEE 18th International Conference on Nano/Micro Engineered and Molecular Systems (NEMS). Institute of Electrical and Electronics Engineers (IEEE), 188–192 (2023).
Keasling, J. D. Manufacturing molecules through metabolic engineering. Science 330, 1355–1358 (2010).
Doudna, J. A. & Charpentier, E. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014).
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Lindahl, T. Instability and decay of the primary structure of DNA. Nature 362, 709–715 (1993).
Kornienko, I. V., Aramova, O. Y., Tishchenko, A. A., Rudoy, D. V. & Chikindas, M. L. RNA stability: a review of the role of structural features and environmental conditions. Molecules 29, 5978 (2024).
Burmeister, P. E. et al. Direct in vitro selection of a 2’-O-methyl aptamer to VEGF. Chem. Biol. 12, 25–33 (2005).
Vester, B. & Wengel, J. LNA (locked nucleic acid): high-affinity targeting of complementary RNA and DNA. Biochemistry 43, 13233–13241 (2004).
Maola, V. A. et al. Directed evolution of a highly efficient TNA polymerase achieved by homologous recombination. Nat. Catal. 7, 1173–1185 (2024).
Majumdar, B., Sarma, D., Yu, Y., Lozoya-Colinas, A. & Chaput, J. C. Increasing the functional density of threose nucleic acid. RSC Chem. Biol. 5, 41–48 (2024).
Li, S. et al. Exploring potential biosafety implications in DNA information storage. Biosaf. Health 7, 132–139 (2025).
Ou, Y. & Guo, S. Safety risks and ethical governance of biomedical applications of synthetic biology. Front. Bioeng. Biotechnol. 11, 1292029 (2023).
Adamala, K. P. et al. Confronting risks of mirror life. Science 386, 1351–1353 (2024).
Karpenko, D. V. A method using one fluorophore signal in Sanger read to determine CpG methylation in bisulfite converted DNA. Russian J. Genet. 59, 1255–1262 (2023).
Wang, P., Mu, Z., Sun, L., Si, S. & Wang, B. Hidden addressing encoding for DNA storage. Front. Bioeng. Biotechnol. 10, 916615 (2022).
Sharief, S. A., Chahal, P. & Alocilja, E. Application of DNA sequences in anti-counterfeiting: current progress and challenges. Int. J. Pharm. 602, 120580 (2021).
Cimino, G. D., Shi, Y. B. & Hearst, J. E. Wavelength dependence for the photoreversal of a psoralen-DNA cross-link. Biochemistry 25, 3013–3020 (1986).
Knutson, S. D. et al. Thermoreversible control of nucleic acid structure and function with glyoxal caging. J. Am. Chem. Soc. 142, 17766–17781 (2020).
Erlich, Y. & Zielinski, D. DNA fountain enables a robust and efficient storage architecture. Science 355, 950–954 (2017).
Lin, L. et al. Molecular-level insights on the reactive facet of carbon nitride single crystals photocatalysing overall water splitting. Nat. Catal. 3, 649–655 (2020).
Jiang, F. et al. A general temperature-guided language model to design proteins of enhanced stability and activity. Sci. Adv. 10, eadr2641 (2024).
Singh, G. et al. RUBICON: a framework for designing efficient deep learning-based genomic basecallers. Genome Biol. 25, 49 (2024).
Song, J. et al. DEMINERS enables clinical metagenomics and comparative transcriptomic analysis by increasing throughput and accuracy of nanopore direct RNA sequencing. Genome Biol. 26, 76 (2025).
Pezo, V. et al. Noncanonical DNA polymerization by aminoadenine-based siphoviruses. Science 372, 520–524 (2021).
Heyn, H. & Esteller, M. An adenine code for DNA: a second life for N6-methyladenine. Cell 161, 710–713 (2015).
Dunn, M. R., McCloskey, C. M., Buckley, P., Rhea, K. & Chaput, J. C. Generating biologically stable TNA aptamers that function with high affinity and thermal stability. J. Am. Chem. Soc. 142, 7721–7724 (2020).
Piao, X., Wang, H., Binzel, D. W. & Guo, P. Assessment and comparison of thermal stability of phosphorothioate-DNA, DNA, RNA, 2’-F RNA, and LNA in the context of Phi29 pRNA 3WJ. RNA 24, 67–76 (2018).
Stein, C. A., Subasinghe, C., Shinozuka, K. & Cohen, J. S. Physicochemical properties of phosphorothioate oligodeoxynucleotides. Nucleic Acids Res. 16, 3209–3221 (1988).
Nielsen, P. E., Egholm, M. & Buchardt, O. Peptide nucleic acid (PNA). A DNA mimic with a peptide backbone. Bioconjug. Chem. 5, 3–7 (1994).
Le, B. T. et al. Antisense oligonucleotide modified with serinol nucleic acid (SNA) induces exon skipping in mdx myotubes. RSC Adv. 7, 34049–34052 (2017).
Khudyakov, I. Y., Kirnos, M. D., Alexandrushkina, N. I. & Vanyushin, B. F. Cyanophage S-2L contains DNA with 2,6-diaminopurine substituted for adenine. Virology 88, 8–18 (1978).
Wan, L. Q., Yi, J., Lam, S. L., Lee, H. K. & Guo, P. 5-methylcytosine substantially enhances the thermal stability of DNA minidumbbells. Chem.-Eur. J. 27, 6740–6747 (2021).
Singh, S. K. et al. Characterization of DNA with an 8-oxoguanine modification. Nucleic Acids Res. 39, 6789–6801 (2011).
Ziomek, K., Kierzek, E., Biala, E. & Kierzek, R. The thermal stability of RNA duplexes containing modified base pairs placed at internal and terminal positions of the oligoribonucleotides. Biophys. Chem. 97, 233–241 (2002).
Pallan, P. S. et al. Unexpected origins of the enhanced pairing affinity of 2’-fluoro-modified RNA. Nucleic Acids Res. 39, 3482–3495 (2011).
Wang, Y. J., Vorperian, A., Shehabat, M. & Chaput, J. C. Evaluating the catalytic potential of a general RNA-cleaving FANA enzyme. Chembiochem 21, 1001–1006 (2020).
Matsuda, S. et al. Shorter is better: the α-(L)-threofuranosyl nucleic acid modification improves stability, potency, safety, and Ago2 binding and mitigates off-target effects of small interfering RNAs. J. Am. Chem. Soc. 145, 19691–19706 (2023).
Filichev, V. V., Christensen, U. B., Pedersen, E. B., Babu, B. R. & Wengel, J. Locked nucleic acids and intercalating nucleic acids in the design of easily denaturing nucleic acids: thermal stability studies. Chembiochem 5, 1673–1679 (2004).
Kashida, H., Murayama, K. & Asanuma, H. Acyclic artificial nucleic acids with phosphodiester bonds exhibit unique functions. Polym. J. 48, 781–786 (2016).
Suggs, J. W. & Taylor, D. A. Evidence for sequence-specific conformational-changes in DNA from the melting temperatures of DNA phosphorothioate derivatives. Nucleic Acids Res. 13, 5707–5716 (1985).
Lan, W. X. et al. Structural investigation into physiological DNA phosphorothioate modification. Sci. Rep. 6, 25737 (2016).
Gryaznov, S. M. et al. Oligonucleotide N3′→P5′ phosphoramidates. Proc. Natl. Acad. Sci. USA 92, 5798–5802 (1995).
Eriksson, M. & Nielsen, P. E. PNA-nucleic acid complexes. Structure, stability and dynamics. Q. Rev. Biophys. 29, 369–394 (1996).
Murayama, K., Tanaka, Y., Toda, T., Kashida, H. & Asanuma, H. Highly stable duplex formation by artificial nucleic acids acyclic threoninol nucleic acid (aTNA) and serinol nucleic acid (SNA) with acyclic scaffolds. Chem.-Eur. J. 19, 14151–14158 (2013).
Czernecki, D. et al. How cyanophage S-2L rejects adenine and incorporates 2-aminoadenine to saturate hydrogen bonding in its DNA. Nat. Commun. 12, 2420 (2021).
Cozens, C., Pinheiro, V. B., Vaisman, A., Woodgate, R. & Holliger, P. A short adaptive path from DNA to RNA polymerases. Proc. Natl. Acad. Sci. USA 109, 8067–8072 (2012).
Aschenbrenner, J., Drum, M., Topal, H., Wieland, M. & Marx, A. Direct sensing of 5-methylcytosine by polymerase chain reaction. Angew. Chem. Int. Ed. 53, 8154–8158 (2014).
Malyshev, D. A., Seo, Y. J., Ordoukhanian, P. & Romesberg, F. E. PCR with an expanded genetic alphabet. J. Am. Chem. Soc. 131, 14620–14621 (2009).
Sun, L. et al. From polymerase engineering to semi-synthetic life: artificial expansion of the central dogma. RSC Chem. Biol. 3, 1173–1197 (2022).
Bai, J., Zou, J., Cao, Y., Du, Y. & Chen, T. Recognition of an unnatural base pair by tool enzymes from bacteriophages and its application in the enzymatic preparation of DNA with an expanded genetic alphabet. ACS Synth. Biol. 12, 2676–2690 (2023).
Hamashima, K., Soong, Y. T., Matsunaga, K., Kimoto, M. & Hirao, I. DNA sequencing method including unnatural bases for DNA aptamer generation by genetic alphabet expansion. ACS Synth. Biol. 8, 1401–1410 (2019).
Switzer, C., Moroney, S. E. & Benner, S. A. Enzymatic incorporation of a new base pair into DNA and RNA. J. Am. Chem. Soc. 111, 8322–8323 (1989).
Chen, T. J. et al. Evolution of thermophilic DNA polymerases for the recognition and amplification of C2′-modified DNA. Nat. Chem. 8, 557–563 (2016).
Chen, D., Han, Z., Liang, X. & Liu, Y. Engineering a DNA polymerase for modifying large RNA at specific positions. Nat. Chem. 17, 382–392 (2025).
Liu, Z., Chen, T. & Romesberg, F. E. Evolved polymerases facilitate selection of fully 2’-OMe-modified aptamers. Chem. Sci. 8, 8179–8182 (2017).
Chen, T. & Romesberg, F. E. A method for the exponential synthesis of RNA: introducing the polymerase chain transcription (PCT) reaction. Biochemistry 56, 5227–5228 (2017).
Medina, E., Yik, E. J., Herdewijn, P. & Chaput, J. C. Functional comparison of laboratory-evolved XNA polymerases for synthetic biology. ACS Synth. Biol. 10, 1429–1437 (2021).
Nikoomanzar, A., Dunn, M. R. & Chaput, J. C. Evaluating the rate and substrate specificity of laboratory evolved XNA polymerases. Anal. Chem. 89, 12622–12625 (2017).
Larsen, A. C. et al. A general strategy for expanding polymerase function by droplet microfluidics. Nat. Commun. 7, 11235 (2016).
Horhota, A. et al. Kinetic analysis of an efficient DNA-dependent TNA polymerase. J. Am. Chem. Soc. 127, 7427–7434 (2005).
Dangerfield, T. L., Kirmizialtin, S. & Johnson, K. A. Substrate specificity and proposed structure of the proofreading complex of T7 DNA polymerase. J. Biol. Chem. 298, 101627 (2022).
Smith, D. C. et al. Nanopores map the acid-base properties of a single site in a single DNA molecule. Nucleic Acids Res. 52, 7429–7436 (2024).
Rand, A. C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nat. Methods 14, 411–413 (2017).
Harrison, J., Stirzaker, C. & Clark, S. J. Cytosines adjacent to methylated CpG sites can be partially resistant to conversion in genomic bisulfite sequencing leading to methylation artifacts. Anal. Biochem. 264, 129–132 (1998).
Georgieva, D., Liu, Q., Wang, K. & Egli, D. Detection of base analogs incorporated during DNA replication by nanopore sequencing. Nucleic Acids Res. 48, e88 (2020).
Müller, C. A. et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat. Methods 16, 429–436 (2019).
Maier, K. C., Gressel, S., Cramer, P. & Schwalb, B. Native molecule sequencing by nano-ID reveals synthesis and stability of RNA isoforms. Genome Res. 30, 1332–1344 (2020).
Ledbetter, M. P. et al. Nanopore sequencing of an expanded genetic alphabet reveals high-fidelity replication of a predominantly hydrophobic unnatural base pair. J. Am. Chem. Soc. 142, 2110–2114 (2020).
Yamashige, R. et al. Highly specific unnatural base pair systems as a third base pair for PCR amplification. Nucleic Acids Res. 40, 2793–2806 (2012).
Ellefson, J. W. et al. Synthetic evolutionary origin of a proofreading reverse transcriptase. Science 352, 1590–1593 (2016).
Stephenson, W. et al. Direct detection of RNA modifications and structure using single-molecule nanopore sequencing. Cell Genom. 2, 100097 (2022).
Wang, Y., Ngor, A. K., Nikoomanzar, A. & Chaput, J. C. Evolution of a general RNA-cleaving FANA enzyme. Nat. Commun. 9, 5067 (2018).
Acknowledgements
The authors acknowledge the facility supported by the Hangzhou Institute of Medicine, Chinese Academy of Sciences. This work was supported by the National Key Research and Development Program of China (Grant Nos. 2022YFC3400400 and 2021YFF1200200), the National Natural Science Foundation of China (No. 22307123), Leading Health Talents in Zhejiang Province (WS2022LJ01), and the Natural Science Foundation of Zhejiang Province (No. YXD23B0301).
Author information
Contributions
Y.W.: data curation, formal analysis, visualization, writing–original draft, review & editing; Y.P.*: conceptualization, data curation, formal analysis, funding acquisition, investigation, writing–original draft, review & editing; L.T.: supervision, methodology, validation, review & editing; X.S.: conceptualization, formal analysis, software, supervision; S.Z.: resources, software, supervision, visualization; J.S.*: formal analysis, funding acquisition, investigation, project administration, validation, visualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Nature Communications thanks Yi Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Y., Pei, Y., Tang, L. et al. Advances and challenges in non-canonical nucleic acids data storage. Nat Commun 17, 2354 (2026). https://doi.org/10.1038/s41467-026-68708-6