Advances and challenges in non-canonical nucleic acids data storage

Wang, Yan; Pei, Yufeng; Tang, Linlin; Sun, Xinyu; Zhou, Songtao; Song, Jie

doi:10.1038/s41467-026-68708-6


Review Article
Open access
Published: 04 February 2026


Nature Communications volume 17, Article number: 2354 (2026)


Abstract

Canonical nucleic acids (DNA and RNA) naturally store genetic information with high density and programmability, making them promising candidates for molecular data storage. However, their susceptibility to degradation under harsh conditions, such as extreme pH, nuclease activity, and chemical attack, limits practical applications. In contrast, non-canonical nucleic acids (ncNAs) with natural or synthetic structural modifications exhibit enhanced stability and unique functional potential. This review systematically summarizes the fundamental properties of ncNAs, evaluates their suitability for molecular data storage, and discusses how their distinctive advantages may overcome the intrinsic limitations of canonical nucleic acids while addressing challenges in next-generation storage systems.


Introduction

The digital evolution of information technology has been driven largely by continuous innovations in data storage technologies. Since the mid-20th century, silicon-based semiconductors have enabled the explosive growth of the information age¹. However, as global data volumes are projected to exceed 175 zettabytes by 2025², conventional storage media, including magnetic devices, optical discs, and semiconductor memories, are approaching their physical and economic limits. Although magnetic storage boasts theoretical maximum capacities of ~10¹⁴ bits/cm³ (ref. ³), real-world devices typically last fewer than 50 years even under ideal conditions⁴, are highly sensitive to environmental stress, and incur prohibitive costs for long-term data maintenance⁵; for example, maintaining 10⁹ GB of tape storage costs about $1 billion per decade⁶. Furthermore, the operation of large-scale data centers contributes significantly to global energy demand (~5% of electricity use) and greenhouse gas emissions (~2%), underscoring an urgent need for sustainable alternatives^7,8. To enhance data storage capacity and durability while reducing costs, various novel data storage media have been developed. Glass-based systems, such as 5D optical storage in fused silica, leverage femtosecond laser writing to create nanostructured voxels with ultra-high density (up to 360 TB/disc) and longevity (>14 billion years)⁹. Metamaterial-based technologies expand possibilities through engineered electromagnetic properties, enabling terahertz-frequency data encoding and bypassing classical diffraction limits¹⁰. Other emerging platforms include phase-change materials, which utilize reversible structural transitions for non-volatile memory¹¹.

As the most compact and durable data carrier, DNA has emerged as a promising medium for molecular data storage, offering ultrahigh storage density (∼455 EB/g¹²), minimal energy consumption⁵, and long-term stability^13,14, as evidenced by successful retrieval of genetic material from ancient specimens dating back hundreds of thousands of years¹⁵. DNA’s well-understood biochemistry and compatibility with existing molecular biology techniques (e.g., PCR, sequencing) further enhance its attractiveness as a storage substrate. Additionally, DNA can be preserved and maintained under mild conditions, and biochemical experiments such as PCR or cellular proliferation require minimal energy input⁵, significantly reducing the cost of data storage, replication, and transmission.

Nevertheless, several challenges hinder the widespread application of DNA as a material for data storage, including poor chemical and biological stability in the open environment, restricted chemical diversity, and constrained storage capacity. DNA is particularly vulnerable to DNase^16,17, extreme pH conditions¹⁸, ultraviolet radiation¹⁹, and various chemicals, all of which can induce strand breaks and result in information loss^20,21. Such DNA-damaging factors are commonly encountered in practical applications and significantly limit the broader use of DNA as a storage medium.

To address these limitations, several innovative strategies have been developed to enhance the stability, functionality, and range of applications of nucleic acids. These strategies include the use of DNA nanostructures (such as DNA origami)^22,23, chemical modifications (such as non-canonical nucleic acids, abbreviated ncNAs)^24,25, and encapsulation in hydrogels²⁶, metal-organic frameworks²⁷, or SiO₂ particles²⁸. Each approach targets different aspects of nucleic acid limitations and contributes to their broader application in data storage²⁹, molecular computing³⁰, diagnostics³¹, drug delivery³², and synthetic biology³³.

Among these strategies, one of the most attractive approaches is the use of ncNAs, which are naturally or chemically modified at the base, sugar, or backbone, or which adopt mirror-image isomerism³⁴. Compared with canonical nucleic acids such as DNA or RNA, these evolved or artificially designed alterations generally confer distinctive properties and offer unique advantages in data storage³⁵. (i) The incorporation of ncNAs broadens the information-encoding alphabet and enhances storage capacity. (ii) NcNAs exhibit unique orthogonality, enabling data encryption and multithreaded molecular computation, which enhances both the security of data storage and computational complexity. (iii) Some endogenous modifications of nucleic acids provide intrinsic advantages for data storage, editing, and erasure in living systems. (iv) Modifications of bases, sugars, and backbones change physical, chemical, and biological properties, enhancing stability against attack from the open environment. Hence, ncNAs emerge as promising materials for data storage.

Structure-property relationships of non-canonical nucleic acids

Structure diversity of non-canonical nucleic acids

NcNAs, defined by chemical modifications beyond the canonical nucleotides, are generally classified into several categories: nucleobase modifications (including both naturally occurring non-canonical bases and engineered unnatural base pairs, UBPs), sugar modifications, phosphate backbone modifications, and mirror-image isomerism³⁴ (Fig. 1).

**Fig. 1: Major categories of non-canonical nucleic acids (ncNAs) modifications.**

Base modifications aim to expand the genetic alphabet and diversify base-pairing capabilities³⁶. Naturally occurring bases such as 2,6-diaminopurine (abbreviated 2-amino-A), which forms three hydrogen bonds with thymine (T), can increase the hybridization affinity of DNA duplexes^37,38. Epigenetic base modifications, including N6-methyldeoxyadenosine (6mA), C5-methylcytosine (5mC), and their oxidized derivatives (e.g., N6-hydroxymethyldeoxyadenosine, 6hmA, and 5-hydroxymethylcytosine, 5hmC), play essential regulatory roles in gene expression³⁹.

Base modifications like 5-fluorouracil (5-F-U)⁴⁰, 5-chlorouracil (5-Cl-U)⁴¹, 5-bromouracil (5-Br-U)⁴², and 5-iodouracil (5-I-U)⁴³ expand the chemical diversity of nucleic acids, enabling enhanced base-pairing interactions and offering potential for fine-tuned regulation of DNA hybridization and stability. Engineered UBPs have extended the repertoire of orthogonal pairing interactions. Genetic alphabets of up to 12 letters have been developed (A, T, G, C, B, S, P, Z, X, K, J, and V)⁴⁴, forming six mutually exclusive base pairs. Notably, Hachimoji DNA and RNA systems constructed from eight synthetic nucleotides (dZ, dP, dS, dB, Z, P, S, and B) allow for enhanced data storage capacity⁴⁵. In addition, base pairs such as NaM-5SICS and TPT3-NaM, which rely on hydrophobic rather than hydrogen-bonding interactions^46,47,48, have demonstrated the feasibility of alternative pairing mechanisms for information encoding and retrieval. Moreover, the unnatural pair between 7-(2-thienyl)-imidazo[4,5-b]pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px), in which Px contains a five-membered pyrrole ring⁴⁹, illustrates further structural and steric complementarity as a design principle⁵⁰. Finally, the isoC-isoG base pair, which is composed of 6-amino-2-ketopurine (isoG) and 2-amino-4-ketopyrimidine (isoC) and is a structural isomer of the G-C base pair, forms a three-hydrogen-bond pair through a different hydrogen-bonding geometry, demonstrating the possibility of alternative base-pairing modes⁵¹.

Sugar modifications involve substitution of the native ribose or deoxyribose sugars with alternative chemical structures, significantly influencing the conformational flexibility, thermodynamic stability, and enzymatic recognition of the modified nucleic acids⁵². For instance, 2′-O-methylation (2′-OMe) is a common epitranscriptomic modification found in RNA, whereby a methyl group is added to the 2′ hydroxyl group of the ribose moiety⁵³. Likewise, in 2′-F-RNA, a fluorine atom replaces the 2′-OH group of RNA, forming more stable duplexes with RNA⁵⁴. Arabino nucleic acid (ANA), which has the 2′-OH in a configuration opposite to that of RNA, adopts a DNA-like conformation⁵⁵. In 2′-deoxy-2′-fluoroarabino nucleic acid (FANA), the ribose is replaced by a 2′-fluoroarabinose moiety that adopts a C2′/O4′-endo conformation, reminiscent of B-form DNA⁵⁶, thereby enhancing hybridization affinity⁵⁷. Threose nucleic acid (TNA), comprising a four-carbon sugar and a 2′,3′-linked phosphodiester backbone^58,59, adopts a B-helical conformation with an average twist of 36° and a helical rise of 3.2 Å per nucleotide^60,61, and demonstrates robust hybridization with both DNA and RNA. Locked nucleic acid (LNA) incorporates a methylene bridge connecting the 2′-oxygen and 4′-carbon of the ribose ring, conformationally locking the sugar in a C3′-endo pucker, thereby increasing thermal stability and affinity^62,63. 1,5-Anhydrohexitol nucleic acid (HNA), derived from a six-membered hexitol ring, features 2′,3′-dideoxy-1′,5′-anhydro-D-arabino-hexitol nucleosides with 4′−6′ phosphodiester linkages and base attachment at the 2′-position⁶⁴. HNA forms stable antiparallel duplexes with RNA, underscoring its structural versatility⁶⁵. Glycol nucleic acid (GNA) features a backbone composed of repeating glycol units linked by phosphodiester bonds⁶⁶, with nucleobases attached to the glycol units, and it has been investigated for its potential in various biotechnological applications.
L-alpha-threose nucleic acid (L-αTNA) and D-alpha-threose nucleic acid (D-αTNA) are a pair of nucleic acid analogues that share a threose sugar backbone but differ in the stereoconfiguration of the sugar (L- and D-, respectively); L-αTNA often exhibits enhanced stability and specific binding characteristics, whereas D-αTNA may display distinct physicochemical properties owing to its opposite sugar configuration⁶⁷. Xylonucleic acid (XyloNA), in which ribose is replaced by xylose in the backbone, displays higher thermal stability than DNA or RNA in self-hybridization but pairs with neither DNA nor RNA⁶⁸.

Phosphate backbone modifications involve chemical alterations to the phosphate backbone, resulting in changes to strand conformation, charge distribution, and biochemical stability⁶⁹. A well-known backbone modification is the introduction of phosphorothioate linkages (PS-DNA), wherein one of the non-bridging oxygen atoms in the phosphate group is substituted by sulfur, enhancing nuclease resistance and altering electrostatic properties^70,71. The N3′→P5′ phosphoramidate DNA (3′-NP DNA) is a synthetic oligonucleotide analogue in which the 3′-oxygen of the natural phosphodiester backbone is replaced by an amino group, resulting in an N→P linkage⁷². Peptide nucleic acids (PNAs) exemplify this strategy by replacing the phosphate-sugar backbone with a neutral N-(2-aminoethyl)glycine moiety, thereby conferring resistance to nucleases and proteases⁷³, and enabling high-affinity hybridization via Hoogsteen-like base pairing⁷⁴. Serinol nucleic acid (SNA) has an acyclic phosphodiester backbone based on serinol (2-amino-1,3-propanediol).

With increasing availability and characterization of individual modifications, combinatorial strategies are emerging that synergistically integrate multiple modifications to harness their respective advantages. For example, antisense oligonucleotides incorporating PS-FANA exhibit both high binding affinity to target RNAs and efficient RNase H-mediated cleavage⁷⁵. Similarly, TNA-TPT3/NaM constructs combine the structural stability of TNA with the enhanced data storage capacity of TPT3/NaM UBPs, expanding the chemical diversity and functional repertoire of synthetic genetic systems⁷⁶.

Mirror-image isomerism, such as in L-DNA, an enantiomer of natural D-DNA, confers enhanced resistance to nuclease-mediated degradation due to the chiral specificity of biological enzymes⁷⁷. In addition to increased biological stability, L-DNA exhibits reduced immunogenicity, making it a promising scaffold for therapeutic and diagnostic applications⁷⁸.

In summary, the structural diversity of ncNAs imparts a rich spectrum of physicochemical and biological properties, including improved stability, expanded base-pairing systems, enhanced storage capacity, and increased functional versatility. These attributes underlie their potential for diverse applications in molecular data storage, computation, and synthetic biology, as discussed in the following sections.

Special properties of non-canonical nucleic acids

NcNAs exhibit enhanced chemical and biological stability due to their distinctive structural features (Table 1). Structural modifications of the sugar, backbone, or bases often introduce conformational constraints and steric hindrance, which reduce susceptibility to enzymatic degradation by nucleases. Such modifications can also alter the electrostatic potential and hydrogen-bonding profile of the nucleic acid, thereby conferring resistance to chemical or physical attack. Additionally, the incorporation of hydrophobic moieties reduces solvent accessibility, further minimizing hydrolytic cleavage.

Table 1 Biological and chemical stability of non-canonical nucleic acids


For example, synthetic analogues such as HNA⁷⁹, L-αTNA⁸⁰, PS-DNA⁸¹, NP-DNA⁷², and L-DNA^78,82 demonstrate remarkable resistance to nuclease-mediated degradation owing to their modified sugar or backbone chemistries. In terms of chemical stability, several ncNAs, including FANA⁵⁴, TNA⁸³, and PNA^84,85, display resistance under acidic conditions. Furthermore, modifications at the 2′-position of the ribose, such as 2′-F-RNA and 2′-OMe-RNA, are well known to significantly enhance resistance to hydrolysis⁸⁶. NP-DNA also exhibits strong tolerance to divalent and transition metal ions⁸⁷, underscoring its suitability for chemically challenging environments.

Although structurally distinct from canonical nucleic acids, many ncNAs retain the capability for Watson-Crick base pairing and programmable hybridization, often displaying thermodynamic properties that are comparable to, yet measurably different from, those of canonical duplexes⁸⁸. Hybridization affinity, strand selectivity, and overall duplex stability are governed by van der Waals forces, electrostatics, hydrogen bonding, and solvation energetics.

Sugar-modified nucleic acids such as FANA⁸⁹, HNA⁹⁰, and L-αTNA⁹¹ can form stable homoduplexes as well as heteroduplexes with DNA or RNA. The melting temperature (T_m), defined as the temperature at which the hybridized and dissociated states of a duplex coexist with equal probability, is a widely used parameter for assessing duplex thermal stability and hybridization strength. Several sugar-modified ncNAs exhibit elevated T_m values relative to canonical nucleic acids. Unlike in DNA, 2′-substituted sugar modifications pre-organize the sugar conformation, which entropically facilitates hybridization⁸⁹. For example, ANA increases T_m by ~10.2 °C per incorporation⁶⁵, and LNA raises T_m by ~10 °C per substitution⁹². However, the T_m of TNA-DNA duplexes is lower than that of DNA-DNA duplexes⁹³ because the distance between two adjacent phosphate groups in TNA does not perfectly match that in DNA⁹⁴. This geometric mismatch leads to asymmetric “breathing” fluctuations of the TNA-DNA duplex⁹⁵, thereby increasing the off-rate of the hybridized duplex⁹⁶. For clarity, Table 2 summarizes average T_m shifts across representative systems.

Table 2 Thermodynamic properties of non-canonical nucleic acids


Hybridization affinity and thermal stability are critical not only for molecular durability but also for controlled data retrieval. Duplex denaturation and re-annealing, required during PCR, capture, or strand-displacement readout, are governed by T_m-dependent kinetics (k_on/k_off). Insufficient stability risks spontaneous strand loss, whereas strong duplex pairing can protect nucleic acids from hydrolysis and nuclease attack. However, excessive stability hampers deliberate strand separation and slows amplification, and it may also promote self-folding that impedes data readout. Hence, it is essential to balance sequence composition, modification density, and readout conditions for optimized storage performance.
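The trade-off between stability and readout can be made concrete with a small numerical sketch. The Python snippet below uses a simplified two-state van 't Hoff model in which the hybridized fraction equals 0.5 at T_m, matching the definition of T_m given above. The model form, the function name `fraction_hybridized`, and the enthalpy value of −60 kcal/mol are illustrative assumptions and are not taken from this review; real bimolecular duplexes also depend on strand concentration and salt conditions.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987e-3


def fraction_hybridized(temp_c: float, tm_c: float, delta_h_kcal: float = -60.0) -> float:
    """Two-state estimate of the hybridized fraction at temp_c.

    Assumes ln K(T) = -(dH/R) * (1/T - 1/Tm), so that K = 1 (fraction 0.5) at Tm.
    delta_h_kcal is a hypothetical hybridization enthalpy (negative = exothermic).
    """
    t_kelvin = temp_c + 273.15
    tm_kelvin = tm_c + 273.15
    ln_k = -(delta_h_kcal / R_KCAL) * (1.0 / t_kelvin - 1.0 / tm_kelvin)
    return 1.0 / (1.0 + math.exp(-ln_k))


# Compare a hypothetical duplex (Tm = 60 C) with a variant whose Tm is 10 C
# higher, e.g. after a single stabilizing substitution, at a 65 C readout step.
for tm in (60.0, 70.0):
    print(f"Tm = {tm:.0f} C -> fraction hybridized at 65 C: {fraction_hybridized(65.0, tm):.3f}")
```

Under these assumptions, the same readout temperature that mostly melts the unmodified duplex leaves the stabilized variant largely hybridized, which is why modification density and readout conditions need to be tuned together.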

Overview of nucleic acid-based data storage

Owing to their high storage density and exceptional programmability, nucleic acids have emerged as a highly promising substrate for molecular data storage and have attracted substantial research interest in recent years. Nucleic acid-based data storage generally comprises four key steps: encoding, writing, preservation, and reading. The stored data can further be processed by molecular computation based on communication (hybridization) between nucleic acid strands. In the encoding step, arbitrary information, including text, images, audio, and video, is first converted into binary strings (“0” and “1”) using standardized character encoding tables such as ASCII, UTF-8, or Unicode. These binary sequences are subsequently mapped onto nucleotide sequences⁹⁷ or higher-order nanostructures. The encoded oligonucleotides are typically synthesized through non-enzymatic or enzymatic approaches and preserved either in vitro or in vivo to ensure long-term stability. Information retrieval is commonly performed via sequencing or non-sequencing techniques. Notably, because direct synthesis and sequencing of ncNAs remain challenging, ncNA-based systems often require additional processing steps, including transcription and reverse transcription⁹⁸.

Storage units

In canonical sequence-based storage, information is encoded using fixed binary-to-nucleotide conversion rules or advanced encoding schemes such as Goldman encoding⁹⁹, Grass encoding, and DNA Fountain¹⁰⁰. The simplest binary mapping (e.g., A = 00, C = 01, T = 10, G = 11) achieves a theoretical data storage capacity of 2 bits per nucleotide¹⁰¹.
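As a minimal sketch of the simplest mapping quoted above (A = 00, C = 01, T = 10, G = 11), the following Python snippet converts bytes to a nucleotide string and back. The function names `encode` and `decode` are hypothetical, and the sketch deliberately omits the constraints that practical schemes such as Goldman encoding or DNA Fountain address (homopolymer runs, GC balance, error correction).

```python
# Mapping taken from the text: A = 00, C = 01, T = 10, G = 11.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a nucleotide string at 2 bits per base."""
    bitstring = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bitstring[i:i + 2]] for i in range(0, len(bitstring), 2))


def decode(sequence: str) -> bytes:
    """Invert encode(); the sequence length must be a multiple of 4 bases."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    bitstring = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8))


message = "Hi".encode("utf-8")
strand = encode(message)
print(strand, decode(strand).decode("utf-8"))
```

Each byte becomes four bases, so the two-byte message "Hi" maps to the eight-base strand `CATACTTC`, and decoding recovers the original bytes.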

While sequence-based approaches have demonstrated remarkable potential in molecular data storage, they remain constrained by several inherent limitations of canonical nucleic acids. These include susceptibility to enzymatic degradation, limited chemical diversity, and a restricted four-letter genetic alphabet, which historically capped the maximum theoretical storage capacity at 2 bits per base (Fig. 2a-i)^102,103. However, recent advancements in encoding techniques, such as the use of DNA-QLC¹⁰⁴, have shown that it is possible to surpass the 2 bits per base limit, enabling much higher storage densities.

**Fig. 2: Nucleic acid-based storage units.**

To address these challenges, ncNAs are used in data storage. These engineered ncNAs offer expanded coding capacity¹⁰⁵, enhanced chemical and biological stability⁸³, and greater compatibility with parallel writing¹⁰⁶, making them attractive candidates for high-capacity, high-stability, and secure molecular data storage. The theoretical storage capacity of an ncNA-expanded alphabet reaches $\log_{2} N$ bits per nucleotide, where $N$ is the number of distinct letters, so adding letters raises capacity logarithmically relative to the four-base system (Fig. 2a-ii, iii). For example, incorporation of UBPs, such as B-S and Z-P pairs, expands the alphabet to eight letters and increases theoretical data capacity to 3 bits/nucleotide (Fig. 2a-ii)^107,108. In addition, sugar-modified systems such as TNA can stably encode binary data using canonical mapping strategies (Fig. 2a-iv)^76,109,110. Additionally, L-DNA, the mirror-image enantiomer of natural D-DNA, maintains equivalent theoretical storage capacity while offering high bio-orthogonality, which greatly enhances resistance to natural nuclease degradation (Fig. 2a-v)⁷⁸. Furthermore, epigenetic modifications based on 5mC have been explored, in which self-assembled DNA scaffolds are enzymatically methylated using DNMT1, overcoming the slow kinetics and high cost of de novo synthesis (Fig. 2a-vi)¹⁰⁶. Collectively, these strategies enable non-canonical systems to surpass canonical DNA in both security and durability, paving the way for robust, orthogonal molecular data storage platforms.
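To illustrate how capacity scales with alphabet size, the short Python sketch below reads the capacity expression as log2(N) bits per nucleotide for an N-letter alphabet, an interpretation consistent with the 3 bit/nucleotide figure quoted for the eight-letter B-S/Z-P system. The function name and the alphabet labels are illustrative assumptions, and these are upper bounds that ignore coding constraints and error-correction overhead.

```python
import math


def bits_per_nucleotide(alphabet_size: int) -> float:
    """Theoretical upper bound on information per position for a given alphabet size."""
    if alphabet_size < 2:
        raise ValueError("an alphabet needs at least two letters to carry information")
    return math.log2(alphabet_size)


# Labels are illustrative; sizes follow the alphabets mentioned in the text.
alphabets = [
    ("canonical A/T/G/C", 4),
    ("one added unnatural base pair", 6),
    ("eight-letter (two added pairs)", 8),
    ("twelve-letter (six pairs)", 12),
]

for label, size in alphabets:
    print(f"{label:32s} N = {size:2d} -> {bits_per_nucleotide(size):.3f} bits/nt")
```

Because the gain is logarithmic, doubling the alphabet from four to eight letters adds only one bit per position, so the practical appeal of expanded alphabets also rests on orthogonality and stability rather than density alone.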

While sequence-based approaches primarily rely on one-dimensional arrangements of canonical or modified nucleotides to represent digital information, DNA nanostructures leverage spatial folding and higher-order assembly to encode data in multiple dimensions (Fig. 2b)¹¹¹. This structural encoding can substantially increase the data storage capacity, introduce additional layers of security, and diversify retrieval modalities.

For example, DNA hairpin structures further enrich the encoding capacity by exploiting the formation or disruption of secondary motifs. The presence or absence of such hairpins can be detected by fluorescence resonance energy transfer or other optical readouts, enabling simple and robust binary encoding schemes (Fig. 2b-i).

DNA origami utilizes long single-stranded scaffolds folded by numerous short staple strands into well-defined two- and three-dimensional architectures (Fig. 2b-ii), which are capable of representing digital symbols, images, or complex patterns that are readily visualized by atomic force microscopy (AFM) or transmission electron microscopy (TEM)¹¹². In contrast to linear sequences, such spatial configurations allow the direct storage of recognizable shapes or logos, circumventing the need for base-by-base sequencing during retrieval.

Similarly, tile-based DNA arrays self-assemble into nanoscale lattices in which the presence, absence, or arrangement of specific tiles encodes binary or multilevel information (Fig. 2b-iii)¹¹¹. These lattice architectures support parallel data storage and can be interrogated through surface imaging techniques or hybridization-based probing.

In this context, recent advances in deep learning models capable of accurately predicting the folding free energy of DNA sequences offer powerful tools to preemptively eliminate high-propensity hairpin-forming regions¹¹³, thereby streamlining the design of reliable and structurally defined storage architectures.

Dynamic DNA devices extend this paradigm by incorporating responsive elements that reversibly switch between distinct structural states in response to external stimuli such as temperature (Fig. 2b-iv)¹¹⁴, pH (Fig. 2b-v)^115,116, or light^117,118 (Fig. 2b-vi). This dynamic behavior enables rewritable and reconfigurable data storage systems that are challenging to achieve with purely sequence-based approaches. Furthermore, the kinetics and dynamics of oligonucleotide hybridization systematically elucidate how sequence composition, environmental conditions, and conformational transitions profoundly influence pairing rates and stability¹¹⁹, providing a fundamental framework for designing responsive and high-fidelity DNA storage architectures.

In contrast to DNA and RNA, ncNAs are typically used to assemble relatively small nanostructures because long ncNA strands, especially sugar-modified ones, are difficult to obtain. Most of the strands in these ncNA nanostructures are obtained by solid-phase synthesis. Compared with DNA nanostructures, ncNA nanostructures display unique advantages; for example, double-crossover nanostructures constructed from five FANA strands have higher thermostability and acid resistance¹²⁰. RNA oligonucleotides containing the nucleoside analogs floxuridine (FUDR) and gemcitabine (GEM) can be assembled into four-way junction structures, which show high efficacy in the treatment of triple-negative breast cancer¹²¹. Constructing large ncNA nanostructures requires long ncNA strands, which must be produced by polymerase-mediated transcription. Taylor et al.¹²² constructed a full FANA octahedron origami structure, in which the long FANA strand (~1.7 kbp) was synthesized by transcription with Tgo-D4K polymerase from a long single-stranded DNA template. Recently, a ~2000 nt single-stranded RNA origami incorporating nucleoside analogues (m5C, s2U, 5FU, and PseudoU) and synthesized by T7 polymerase was reported to induce epigenetic immunomodulation¹²³. As the number of ncNA polymerases obtained through directed evolution grows, ncNA origami and other large-scale nanostructures are expected to emerge. Collectively, these strategies illustrate how the programmability and architectural versatility of DNA nanotechnology can transcend the limitations of linear nucleotide sequences, offering a promising platform for high-capacity, secure, and multidimensional molecular data storage.

Writing

Writing processes in nucleic acid-based systems are typically classified into non-enzymatic and enzymatic synthesis methods.

Solid-phase synthesis using phosphoramidite chemistry remains the most widely adopted technique for synthesizing non-canonical oligonucleotides, including PNA, LNA, and TNA variants¹²⁴. This method involves the stepwise coupling of protected nucleoside phosphoramidites to a growing oligonucleotide anchored on a solid support (Fig. 3a-i). However, synthesis efficiency is limited by side reactions that introduce errors (e.g., deletions, insertions, and substitutions) and by cumulative coupling inefficiencies (~95–99.5% coupling efficiency per cycle³⁵), which generally restrict product length to under 120 nt^125,126. Additionally, the technique demands tightly controlled environmental conditions and specialized reagents, limiting scalability and increasing cost (Table 3)^127,128. Microarray-based synthesis offers an alternative for high-throughput production, leveraging parallel phosphoramidite reactions across surface-bound arrays¹²⁹.
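The length limit follows directly from compounding the per-cycle coupling efficiency. The Python sketch below estimates the fraction of full-length product as efficiency raised to the number of coupling cycles. It assumes, for illustration only, that the first nucleotide is pre-attached to the support (so a strand of n nt needs n − 1 couplings) and that every cycle has the same efficiency; the function name is hypothetical.

```python
def full_length_yield(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of strands reaching full length after (length_nt - 1) coupling cycles."""
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be in (0, 1]")
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)


# Per-cycle efficiencies spanning the ~95-99.5% range quoted in the text.
for efficiency in (0.95, 0.99, 0.995):
    for length in (20, 60, 120):
        y = full_length_yield(efficiency, length)
        print(f"efficiency {efficiency:.3f}, {length:3d} nt -> full-length fraction {y:.4f}")
```

At 99.5% per cycle, roughly 55% of 120 nt strands are full length, whereas at 95% the full-length fraction falls to about 0.2%, which is consistent with the practical ceiling of about 120 nt described above.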

**Fig. 3: Data writing and reading of nucleic acid data storage.**

Table 3 Cost of oligonucleotides, phosphoramidite monomers, and triphosphate monomers for canonical and non-canonical nucleic acids


Template-directed chemical ligation offers an alternative route, especially valuable for substrates incompatible with enzymatic systems. In this strategy, phosphate groups are activated by coordination with N-cyanoimidazole (CNIm) and Mn²⁺ ions, enabling nucleophilic attack and subsequent phosphodiester bond formation (Fig. 3a-ii)¹³⁰. This method has been successfully applied to the synthesis of chemically diverse backbones such as L-αTNA and 3′-NP DNA¹³¹, supporting the incorporation of non-standard linkages with high fidelity.

DNA and RNA polymerases can incorporate modified triphosphates (xNTPs) into growing strands under physiological conditions (Fig. 3a-iii)¹³². Through directed evolution and rational design, engineered polymerases have been developed to accommodate specific non-canonical substrates^133,134 (Table 4). These enzymes support phosphodiester bond formation at the 3′-OH via sequential nucleotidyl transfer, and in specific mutant contexts, can even facilitate phosphoramidate bond formation¹³⁵. Recent advances involving trivalent ions (e.g., Sc³⁺) have significantly accelerated reaction kinetics⁸⁷.

Table 4 Polymerase compatibility of non-canonical nucleic acids synthesis


Template-directed ligation offers a complementary approach for synthesizing long non-canonical strands without the need for monomer synthesis (Fig. 3a-iv). T3, T4, and T7 DNA ligases have been employed to ligate various modified strands, including FANA¹³⁶, TNA, and LNA¹³⁷, with high efficiency and sequence specificity. This method allows modular assembly of oligonucleotide segments (20–120 nt), facilitating the construction of longer polymers from shorter synthetic units.

Template-independent enzymatic oligonucleotide synthesis (TiEOS) leverages the template-independent activity of terminal deoxynucleotidyl transferase (TdT), which adds modified dNTPs to the 3′ end of ssDNA without a templating strand (Fig. 3a-v)¹³⁸. TdT exhibits remarkable substrate tolerance, enabling incorporation of LNA¹³⁹, 2′-OMe¹³³, 2′-F¹³³, and dTPT3TP¹⁴⁰. TiEOS provides a programmable, modular framework for synthesizing structurally diverse nucleic acid libraries suitable for data storage and encryption. In parallel, cap-free TdT synthesis strategies combined with trit-based encoding and enzymatic length control have demonstrated scalable, low-cost production of high-fidelity DNA data pools, further extending the practical potential of enzyme-driven storage architectures¹⁴¹.

Preservation

Efficient preservation of nucleic acid molecules is a critical prerequisite for long-term data integrity in molecular data storage systems. Preservation strategies can be broadly classified into in vitro and in vivo preservation (Fig. 3b), each offering distinct advantages in terms of storage capacity, stability, scalability, and flexibility.

In vitro preservation methods protect DNA outside living systems through physical and chemical stabilization. Approaches such as silica encapsulation, glass immobilization, and polymer-based matrices have enabled long-term storage under ambient conditions by minimizing hydrolysis and oxidation (Fig. 3b, left)⁵. DNA embedded in electrospun and polymer fibers also offers a scalable, solid-state format with high capacity and mechanical stability¹⁴², enabling protection against environmental degradation while maintaining accessibility for downstream retrieval.

In contrast, in vivo preservation strategies integrate synthetic DNA constructs into biological carriers and hosts such as plasmids, artificial chromosomes, microbes, or plants^143,144,145,146. These systems benefit from the host’s intrinsic replication and repair mechanisms¹⁴⁷ (Fig. 3b, right), which not only amplify stored sequences but also correct base-level errors and support dynamic, programmable data editing^148,149.

However, both in vitro and in vivo platforms must contend with the chemical vulnerability of natural DNA to environmental perturbations^150,151. In this regard, ncNAs such as TNA and L-DNA exhibit markedly enhanced stability due to their resistance to enzymatic degradation and limited interaction with endogenous cellular machinery^78,109. These properties render chemically modified systems particularly attractive for applications requiring long-term archival storage and data integrity in hostile environments.

Readout

The final stage of nucleic acid-based data storage involves the retrieval and decoding of stored information. This process typically comprises two sequential steps: random access retrieval from complex DNA pools, and readout of the encoded information, either by sequencing or structural analysis¹⁵².

Selective retrieval of target DNA molecules from complex DNA pools is commonly achieved through physical extraction, PCR with designed primers, CRISPR-based techniques¹⁵³, or digital microfluidics¹⁵⁴. Magnetic bead-based physical extraction utilizes sequence-specific hybridization and magnetic separation for high-throughput, lossless isolation (Fig. 3c-i). PCR amplification, a widely employed method due to its specificity and scalability, is capable of retrieving target sequences from DNA pools containing millions of distinct oligonucleotides (Fig. 3c-ii)¹⁵⁵. However, PCR is limited by primer design constraints, susceptibility to non-specific amplification, and incompatibility with rewritable storage architectures^129,156. Programmable CRISPR-based DNA storage systems enable precise data manipulation and retrieval (Fig. 3c-iii)¹⁵⁷, exemplified by Cas12a-driven multiplexed searches and Cas9-mediated rewriting via homology-directed repair^158,159. Digital microfluidics divides large DNA pools into spatially segregated subpools on microfluidic chips to minimize strand interactions and enhance retrieval efficiency^27,160,161. Despite these improvements, this strategy still faces limitations due to molecular crowding and amplification-induced errors^162,163.
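The primer-based random access described above can be illustrated with a minimal in-silico sketch. It assumes a simplified strand layout (forward primer, payload, reverse-primer binding site) and exact primer matching; the primer sequences, payloads, and function names are hypothetical and chosen only for illustration. Real PCR tolerates partial mismatches and amplifies rather than filters, so this is a logical model, not a simulation of the chemistry.

```python
# Toy model of PCR-style random access from a mixed pool (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_strand(primers, payload):
    """Assumed layout: forward primer + payload + reverse-primer binding site."""
    fwd, rev = primers
    return fwd + payload + reverse_complement(rev)


def pcr_select(pool, fwd_primer, rev_primer):
    """Return payloads of strands flanked by exactly this primer pair."""
    rev_site = reverse_complement(rev_primer)
    hits = []
    for strand in pool:
        if strand.startswith(fwd_primer) and strand.endswith(rev_site):
            hits.append(strand[len(fwd_primer):len(strand) - len(rev_site)])
    return hits


# Hypothetical primer pairs acting as "file addresses".
FILE_A = ("ACGTACGTAC", "TTGGCCAATT")
FILE_B = ("GATTACAGAT", "CCATGGTACC")

pool = [
    make_strand(FILE_A, "AAAACCCC"),
    make_strand(FILE_B, "GGGGTTTT"),
    make_strand(FILE_A, "ACACACAC"),
]

print(pcr_select(pool, *FILE_A))  # payloads addressed by the FILE_A primers
```

In this abstraction each primer pair acts as a file address, which also makes the primer design constraint visible: every file in a pool needs a primer pair that does not match any other file.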

For non-canonical systems (e.g., TNA, L-DNA), retrieval is typically performed using PCR-based methods^78,110. Nevertheless, CRISPR-guided retrieval remains a promising direction, particularly when adapted to recognize unnatural bases.

Stored information is then read out by sequencing or by non-sequencing techniques, depending on the storage unit.

Canonical DNA storage systems rely heavily on first-generation (Sanger) sequencing (Fig. 3d-i), next-generation sequencing (NGS) (Fig. 3d-ii)¹⁶⁴, and third-generation (Nanopore-based) sequencing (Fig. 3d-iii), which enables long-read, real-time monitoring of DNA strands. Nanopore sequencing, in particular, has emerged as a versatile platform for direct reading of chemically modified nucleic acids¹⁶⁵. For example, FANA strands can be read using Nanopore Induced Phase-Shift Sequencing (NIPSS) after ligation with DNA drive strands¹⁶⁶. Phosphorothioate (PS) modifications can be detected via the ONT/ELIGOS platform¹⁶⁷. DeepMod, a deep learning framework built on ONT data, achieves high accuracy in detecting epigenetic marks such as 5-methylcytosine (5mC, ~99%) and N6-methyladenine (6mA, ~90%)³⁹.

When direct sequencing is not feasible, many systems rely on reverse transcription (RT) of modified strands into canonical DNA or RNA⁸⁸. TNA-to-cDNA conversion, followed by Illumina sequencing, has demonstrated high fidelity (~99.2%)¹¹⁰. Base substitutions during RT enable detection of Z-P¹⁶⁸, Ds-Px¹⁶⁹, and TPT3-NaM¹⁷⁰ pairs using conventional sequencing platforms (Sanger, NGS, or Nanopore). Although these methods expand compatibility, base conversion strategies often introduce mutagenic noise and sequence bias, particularly during multiple RT-PCR cycles, thereby compromising accuracy and potentially obscuring encrypted information⁷⁸. Table 5 summarizes the sequencing methods for ncNAs.

Table 5 Sequencing methods for non-canonical nucleic acids

For storage platforms that employ structural or conformational encoding, information retrieval is typically achieved through microscopy-based readout methods. AFM provides nanometer-resolution imaging of nanoscale motifs, enabling direct detection of features such as bulges, nicks, or conformational switches (Fig. 3d-iv)¹⁷¹. Gel electrophoresis offers a sequencing-free readout modality by distinguishing structural variants according to their differential migration rates (Fig. 3d-v). More recently, super-resolution microscopy techniques, such as stochastic optical reconstruction microscopy and DNA points accumulation for imaging nanoscale topography (DNA-PAINT), have enabled multiplexed, high-throughput decoding of DNA nanostructures by resolving structural states at the single-molecule level (Fig. 3d-vi)^172,173. These sequencing-independent strategies are particularly advantageous for chemically diverse or orthogonal nucleic acid systems, as they bypass the limitations of base-calling algorithms and sequence fidelity. Importantly, structure-based readout approaches provide unique opportunities for secure or tamper-proof storage, where critical information is embedded in physical topology and spatial configuration rather than in linear sequence, rendering it inaccessible to conventional sequencing platforms.

Molecular computation

Like silicon-based electronic computers, DNA that stores data can also process that data. DNA-based molecular computation builds on the molecular recognition capability of DNA together with its data storage capacity. Since Adleman pioneered the use of DNA computing to solve the Hamiltonian Path Problem¹⁷⁴ in 1994, DNA has been regarded as a powerful molecular computing tool for solving intricate problems. Based on Watson-Crick-Franklin base pairing, elementary DNA logic gates such as YES, NOT, OR, and AND gates have been designed^175,176,177, and more complex circuits and networks have been developed¹⁷⁸. A landmark breakthrough was reported in 2006 by Erik Winfree¹⁷⁹, in which toehold-mediated strand displacement enabled the cascading and scaling up of computations. Subsequently, Qian et al.¹⁸⁰ designed seesaw gate-based logic circuits and achieved DNA-based neural networks. Kevin M. Cherry et al. even demonstrated the implementation of a DNA-based supervised learning network¹⁸¹, highlighting the powerful role of DNA as a programmable and dynamic material in molecular computation. A heat-driven rechargeable computation system for DNA logic circuits and neural networks has also been established, suggesting that DNA-based computation may develop in a more sustainable manner than other artificial systems^181,182. Recent advances in biophysical techniques, particularly single-molecule fluorescence and magnetic tweezers, have enabled DNA computation at the single-molecule level^183,184, thereby permitting real-time and controllable monitoring of the computing process and enhancing its interactivity.
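The idea of cascading gates, where the output strand of one gate serves as the input of the next, can be sketched as a purely Boolean abstraction. The example below is a minimal sketch under strong assumptions: it ignores kinetics, concentrations, and leakage, it models only AND and OR gates (NOT is omitted because it cannot be expressed by the mere presence of a strand in this abstraction), and all names (`Gate`, `run_circuit`, the strand labels) are hypothetical.

```python
from typing import NamedTuple


class Gate(NamedTuple):
    kind: str       # "AND" or "OR"
    inputs: tuple   # labels of input strands
    output: str     # label of the strand released when the gate fires


def run_circuit(gates, initial_strands):
    """Fire gates repeatedly until no new output strand is released."""
    present = set(initial_strands)
    changed = True
    while changed:
        changed = False
        for gate in gates:
            if gate.output in present:
                continue
            hits = [strand in present for strand in gate.inputs]
            fired = all(hits) if gate.kind == "AND" else any(hits)
            if fired:
                present.add(gate.output)  # released strand can feed later gates
                changed = True
    return present


# Two-layer cascade; gate order in the list does not matter.
circuit = [
    Gate("OR", ("mid", "in3"), "out"),
    Gate("AND", ("in1", "in2"), "mid"),
]

print(sorted(run_circuit(circuit, {"in1", "in2"})))
```

The fixed-point loop mirrors the fact that strands in solution react without a clock or an imposed order; scaling such circuits is what makes unwanted cross-reactions between strands increasingly likely.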

However, current DNA computing still faces challenges, particularly low orthogonality. This problem grows as computation scales up, because greater circuit or network complexity increases the likelihood of unwanted leakage reactions from mismatched interactions. A promising strategy is to exploit the intrinsically high orthogonality of ncNAs. For example, L-DNA does not directly recognize natural DNA or RNA¹⁸⁵, but can communicate with them via PNA¹⁸⁶ or chimeric D/L-oligonucleotides¹⁸⁷. An amplification circuit composed of D-αTNA is orthogonal to L-αTNA, DNA and RNA, but can be initiated by itself or SNA¹⁸⁸. Conceptually, introducing ncNAs enhances the parallelism of nucleic acid computing, accelerating computation and extending its scalability.

Advantages of non-canonical nucleic acids-based data storage

NcNAs, owing to their diverse chemical structures and physicochemical properties, present several notable advantages over their canonical counterparts for digital data storage (Table 6). These include expanded genetic alphabets, enhanced chemical and biological stability, and parallelized data writing.

Table 6 Advantages and challenges of non-canonical nucleic acids data storage

Expanded genetic alphabet

A major limitation of canonical DNA storage lies in its dependence on four standard nucleotides (A, T, G, C), restricting the maximum theoretical data storage capacity to 2 bits per base. UBPs expand this alphabet and thus enable higher-capacity storage^48,189. For instance, a 12-letter artificially expanded genetic information system (AEGIS), including synthetic bases such as P, Z, B, S, X, J, K, and V, can theoretically encode up to 3.58 bits per base (log₂12)⁴⁴, offering a significant improvement over canonical systems. Hydrophobic base pairs such as NaM-TPT3 further contribute to this expansion and enhance the functional diversity of the storage system¹⁹⁰.
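The capacity figures quoted above follow from the information-theoretic bound of log₂(N) bits per position for an N-letter alphabet. The short sketch below reproduces this arithmetic; the function names are illustrative, and the values are ideal upper bounds that ignore coding constraints, error correction, and indexing overhead.

```python
import math


def bits_per_base(alphabet_size):
    """Upper bound on information per position for an N-letter alphabet."""
    return math.log2(alphabet_size)


def bases_needed(n_bits, alphabet_size):
    """Minimum number of positions needed to hold n_bits at the ideal bound."""
    return math.ceil(n_bits / math.log2(alphabet_size))


alphabets = [
    ("canonical (A, T, G, C)", 4),
    ("six-letter (canonical bases plus one unnatural pair)", 6),
    ("12-letter expanded alphabet", 12),
]

for name, size in alphabets:
    print(f"{name}: {bits_per_base(size):.2f} bits/base, "
          f"{bases_needed(8000, size)} bases for 8000 bits")
```

Under this bound, storing 8000 bits takes 4000 canonical bases but 2232 positions of a 12-letter alphabet, a reduction of roughly 44%.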

Enhanced stability for long-term preservation

The chemical and enzymatic stability of the storage medium is critical for data integrity. Sugar-modified nucleic acids such as TNA exhibit enhanced biochemical and thermal stability, remaining intact even after prolonged exposure to biological environments such as human serum, where canonical DNA undergoes rapid degradation¹¹⁰. Moreover, hybrid systems combining sugar modifications and UBPs (e.g., TNA + NaM-TPT3⁷⁶) not only boost storage capacity but also introduce multifunctional advantages, including environmental responsiveness and intrinsic data encryption.

Bio-orthogonality

Benefiting from alterations in spatial structure, ncNAs can evade recognition by biomacromolecules, thereby achieving bio-orthogonality and enabling data storage independently of biological systems. Mirror-image nucleic acids (L-DNA) maintain the same sequence information capacity as D-DNA but provide superior biological stability due to their resistance to natural nucleases such as DNase I⁷⁸. This significantly prolongs their lifespan in biological environments¹⁹¹, making them ideal for secure storage. Their bio-orthogonality ensures that they are not misread by natural systems, offering intrinsic data protection. However, the lack of compatible polymerases and sequencing tools makes L-DNA systems cost-intensive and technically challenging.

Parallel storage

Sequence-independent writing represents a promising frontier for scalable storage. The use of epigenetic markers, such as 5-methylcytosine (5mC), allows binary data to be encoded orthogonally onto existing sequences, with methylated cytosine (5mC) denoting “1” and unmethylated cytosine denoting “0”. This strategy, akin to movable type printing, employs site-specific methylation via DNMT1 to inscribe data without altering the underlying sequence¹⁹². Such systems are inherently parallelizable and reversible, and offer a reconfigurable, reusable storage architecture.
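The logic of writing bits as methylation states on a fixed template can be sketched as follows. This is a minimal logical model only, not the enzymatic protocol: it assumes that every cytosine in the template is an addressable site and uses the hypothetical annotation "M" for 5mC; the template sequence and function names are invented for illustration.

```python
def write_bits(template, bits):
    """Mark successive cytosines: 'M' (5mC) encodes 1, plain 'C' encodes 0."""
    sites = [i for i, base in enumerate(template) if base == "C"]
    if len(bits) > len(sites):
        raise ValueError("template has too few cytosines for this message")
    marked = list(template)
    for position, bit in zip(sites, bits):
        if bit not in ("0", "1"):
            raise ValueError("bits must be a string of 0s and 1s")
        marked[position] = "M" if bit == "1" else "C"
    return "".join(marked)


def read_bits(marked, n_bits):
    """Recover the bit string from the methylation state of each cytosine."""
    states = [ch for ch in marked if ch in "CM"]
    return "".join("1" if ch == "M" else "0" for ch in states[:n_bits])


def erase(marked):
    """Remove all marks, returning the reusable, unmodified template."""
    return marked.replace("M", "C")


template = "ACGTCCGATCGA"          # hypothetical template with four cytosines
marked = write_bits(template, "1011")
print(marked, read_bits(marked, 4), erase(marked) == template)
```

Because the base sequence never changes, the same template can be erased and rewritten, which is the sense in which the scheme is reversible and reusable.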

Collectively, ncNAs display diverse advantages in data storage, and modifications at different positions suit different application scenarios. For instance, base modifications may expand storage capacity but do not by themselves improve biological or chemical stability. In contrast, modifications to the sugar ring and phosphate backbone can enhance the stability and orthogonality of nucleic acids; however, they often hinder polymerase recognition, making the synthesis of long strands more challenging and data retrieval even more difficult than for base-modified systems. Combined modifications, such as TNA + UBP, pair the nuclease-resistant scaffold of TNA with the expanded storage capacity of UBPs. Mirror-image nucleic acids (e.g., L-DNA) exhibit bio-orthogonality by evading recognition from natural biomacromolecules, offering enhanced stability and security for data storage, though challenges remain in polymerase compatibility and sequencing tools. Finally, parallelized data writing using epigenetic markers such as 5mC enables efficient, reversible, and scalable encoding without altering the original sequence, providing a flexible and reusable storage architecture.

Current challenges and future perspectives

Despite their substantial potential, ncNAs face several technical bottlenecks that hinder their large-scale deployment. Chief among these are challenges in synthesis, readout technologies, and chemical diversification. Addressing these limitations is critical for the realization of scalable, robust, and cost-effective data storage platforms.

Writing: synthesis limitations and polymerase compatibility

Synthesis remains a primary obstacle. Both chemical and enzymatic synthesis routes are technically demanding and costly^193,194. Solid-phase synthesis involves the use of harsh reagents and expensive nucleotide building blocks (Table 3)¹⁹⁵, while enzymatic approaches are limited by the substrate recognition fidelity of natural polymerases, which often fail to process UBPs or modified backbones. These inefficiencies significantly drive up costs. Additionally, current synthesis technologies struggle to produce long, non-canonical sequences. In molecular data storage, longer nucleic acid strands require fewer index sequences, resulting in less redundancy and thus higher storage capacity. In contrast, short ncNA strands limit both data capacity and reliability. To overcome these limitations, more automated and scalable synthesis platforms are required. Enhancing coupling efficiency and minimizing synthesis errors in solid-phase synthesis, and reducing monomer costs through novel chemical or enzymatic production pathways, will be essential¹⁹⁶. Meanwhile, polymerase engineering, accelerated by machine learning tools, holds promise for improving enzymatic compatibility with non-canonical substrates^197,198. For instance, Holliger et al. engineered an archaeal polymerase with steric gate mutations to enable synthesis of long 2′-OMe RNA oligonucleotides up to 750 nt¹⁹⁹.
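The relationship between strand length and indexing overhead can be made concrete with a simple accounting model. The sketch below assumes that each strand carries a fixed overhead (here 40 nt, standing in for primers or adapters), an index just long enough to address every strand of the file, and an ideal payload density of log₂(N) bits per position. These parameters and the function name are assumptions made for illustration, not values from the cited studies.

```python
import math


def payload_fraction(file_bits, strand_len, fixed_nt=40, alphabet_size=4):
    """Return (index length, payload fraction) for the shortest usable index.

    Assumed strand layout: fixed overhead + index + payload.
    """
    bits_per_nt = math.log2(alphabet_size)
    for index_nt in range(1, strand_len):
        payload_nt = strand_len - fixed_nt - index_nt
        if payload_nt <= 0:
            break
        # An index of this length can address alphabet_size ** index_nt strands.
        capacity_bits = (alphabet_size ** index_nt) * payload_nt * bits_per_nt
        if capacity_bits >= file_bits:
            return index_nt, payload_nt / strand_len
    raise ValueError("strand too short to hold the file under this model")


file_bits = 8_000_000  # a 1 MB file
for strand_len in (100, 200, 1000):
    index_nt, fraction = payload_fraction(file_bits, strand_len)
    print(f"{strand_len} nt strands: index {index_nt} nt, "
          f"payload fraction {fraction:.3f}")
```

Under these assumptions, the share of each strand available for payload rises from 0.51 at 100 nt to 0.76 at 200 nt and 0.953 at 1000 nt, which illustrates why short ncNA strands limit capacity.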

Reading: sequencing limitations and optimization strategies

Accurate and efficient readout is another major challenge. Conventional sequencing platforms like Illumina and Sanger methods are often incompatible with non-canonical nucleotides, typically necessitating reverse transcription into canonical DNA, a step that adds complexity, time, and error potential²⁰⁰.

Recent experimental advances demonstrate that nanopore sequencing can be adapted to directly decode synthetic and ncNAs. Two landmark studies reported (i) high-throughput deconvolution of non-canonical bases^201,202,203 and (ii) sequencing of DNA analogs with artificial bases²⁰³, both relying on bespoke training datasets and customized basecaller models. In parallel, methodological improvements, including Uncalled4²⁰⁴, Rockfish²⁰⁵, and the Bonito/Dorado/Remora toolchains²⁰⁶, as well as transformer-based architectures, have markedly enhanced the accuracy of modification detection and base analog discrimination. Advances in solid-state nanopores (e.g., graphene, MoS₂²⁰⁷) further underscore the theoretical potential for improved resolution and tunability, although robust superiority over biological pores remains to be experimentally validated. Collectively, these developments suggest that the gap between chemical innovation in ncNAs and their practical single-molecule readout is rapidly narrowing, contingent on continued progress in pore engineering, dataset curation, and machine-learning–driven basecalling.

Expanding the substrate landscape: chemical diversity and storage potential

The current landscape of ncNAs remains limited in scope, representing only a small fraction of their potential. Broadening the chemical repertoire, through the development of new bases, sugars, and backbone chemistry, will be crucial. For instance, Pol6G12 and RT521 have shown capability for transferring genetic information between DNA and HNA¹³⁴, highlighting their potential in storage applications.

RNA, while structurally and functionally analogous to DNA, offers both distinct advantages and pronounced challenges in the context of molecular data storage. As a single-stranded polymer, RNA exhibits greater conformational flexibility and is amenable to a wide range of enzymatic manipulations, including transcription, reverse transcription, and programmable editing via CRISPR-Cas systems, making it a promising candidate for dynamic and rewritable data architectures^208,209,210. However, its chemical instability, primarily due to the presence of a 2′-hydroxyl group that renders the phosphodiester backbone susceptible to base-catalyzed hydrolysis, significantly limits its utility for long-term storage applications²¹¹. Moreover, RNA is highly vulnerable to ubiquitous ribonucleases, which can rapidly degrade unprotected RNA molecules even under mild conditions²¹². To address these limitations, various chemical modification strategies have been developed to enhance RNA stability while preserving its informational and functional capabilities. Notable among these are 2′-O-methyl (2′-OMe), 2′-fluoro (2′-F), and LNA modifications, which confer increased resistance to nucleases and thermal denaturation without compromising base-pairing fidelity^213,214. These modified RNAs have found extensive use in therapeutic and diagnostic applications, and their potential in data storage systems remains an emerging area of interest.

Beyond RNA itself, synthetic analogs with altered sugar-phosphate backbones offer a broader chemical space for engineering storage systems with superior stability and functionality. Several RNA-inspired ncNAs, such as GNA and HNA, have demonstrated the ability to form stable Watson-Crick base pairs with complementary DNA or RNA strands, while exhibiting enhanced resistance to hydrolytic and enzymatic degradation¹³⁴. Importantly, polymerases capable of synthesizing and replicating TNA and HNA have been engineered, suggesting that these systems could support autonomous information copying and evolution in synthetic media²¹⁵.

Looking ahead, combining multiple modifications (e.g., TNA + TPT3⁷⁶, PS + FANA⁷⁵) offers a modular approach to building high-capacity, functional, and responsive storage systems. These hybrid platforms can encode data with enhanced fidelity, security, and environmental adaptability, features vital for the next generation of molecular data storage technologies^76,216.

Ethical and biosafety considerations

Given their structural homology with natural DNA and RNA, ncNAs raise important ethical and biosafety concerns, especially as they move from in vitro to potential in vivo applications. Although ncNA-based data storage has been demonstrated only in vitro so far, advancements in DNA-based in vivo storage systems suggest that ncNAs could be explored for future in vivo data storage^217,218. However, several biosafety and ethical considerations must be carefully addressed.

One concern arises from recent discussions about mirror-image life forms composed of L-DNA and other mirror-image biological molecules²¹⁹. Although the synthesis of a mirror-image organism is beyond current technological capabilities, progress in synthetic biology could eventually enable the construction of a fully mirror-image bacterium. Preliminary assessments suggest that these organisms could evade host immune surveillance and resist degradation by natural enzymes, leading to uncontrolled proliferation and potential biosafety risks. To mitigate these concerns, constructing restriction systems that limit the replication of mirror-image life may be necessary.

Similarly, the application of ncNAs for data storage requires caution regarding their biosafety. While the evolution of polymerases capable of handling these modified bases has been a focal point, the development of highly specific nucleases targeting ncNAs is also critical. Furthermore, the self-replication capabilities of ncNAs must be tightly controlled. Prior to in vivo data storage, it is essential to evaluate the efficiency of genome integration, potential mutation risks, immune responses, and metabolic clearance pathways associated with ncNAs. These factors must be clearly defined to ensure safety in future applications.

Molecular security, encryption, and bio-orthogonal steganography

Beyond expanding data storage capacity and chemical diversity, ncNAs also open new horizons for molecular-level security and encryption. Incorporating modified or synthetic nucleotides such as 5mC or other unnatural bases can render encoded sequences readable only by specific biochemical or enzymatic processes²²⁰, ensuring that only authorized entities can decode the data. In parallel, embedding hidden data layers within DNA through chemical modifications²²¹ or secondary structural encoding provides additional protection against unauthorized access, establishing a molecular equivalent of multi-factor authentication.

Furthermore, ncNA sequences can also serve as unique molecular signatures, acting as physical unclonable functions (PUFs) or molecular fingerprints. Their replication requires precise knowledge of both the sequence composition and the synthetic conditions²²², thereby making counterfeiting or tampering extremely difficult. The inherent complexity of ncNA architectures, including expanded base alphabets, backbone modifications, and hybridization orthogonality, further enhances anti-counterfeiting capabilities and steganographic potential¹⁰³.

A particularly exciting direction lies in bio-orthogonal steganography, which leverages the orthogonality between ncNAs and natural biomolecules to conceal data within biological systems without interfering with native functions. For instance, L-DNA, which exhibits no complementarity with natural D-DNA, has been used to create “mirror-encoded” information for covert communication and molecular authentication⁷⁸. Similarly, PNA with charge-neutral backbones can evade detection by conventional hybridization probes, while highly stable systems like TNA offer durability for long-term encrypted data storage.

Building on this concept, DNA adducts represent a distinctive class of ncNAs in which DNA bases are covalently modified by small molecules such as psoralens or aldehydes^223,224, enabling reversible chemical encryption. These adducts can be selectively formed or cleaved in response to external stimuli, including light, heat, or redox conditions, allowing dynamic data concealment and recovery. Collectively, such strategies point toward a future where ncNA-based data storage systems integrate multi-layered molecular encryption, steganography, and authentication, providing unprecedented levels of data security at the nanoscale.

Advanced encoding schemes for error minimization

In previous sections, we introduced the use of UBPs to expand the genetic alphabet, allowing for increased storage capacity in molecular data storage. However, as the alphabet expands, the potential for higher error rates during both data writing and reading processes increases. To address these challenges, several encoding schemes can be employed to minimize errors. Non-linear codes, such as run-length limited and Huffman coding, reduce homopolymer errors and improve error correction efficiency in large-alphabet systems²²⁵. Combinatorial encoding strategies, including DNA Fountain and overlapping pool designs, exploit redundancy and stochastic assembly to enhance robustness against synthesis and sequencing errors. More recently, 3D structural encoding using DNA origami and nanostructures enables storage in conformational states, bypassing sequencing altogether²²⁶. Within this toolbox, non-canonical nucleotides provide a unique opportunity: expanded alphabets (e.g., AEGIS or Hachimoji systems⁴⁵) enable octal or higher-level coding schemes, thereby increasing per-base data capacity (up to 3.58 bits/base for a 12-letter alphabet). Moreover, their orthogonality and chemical tunability may synergize with combinatorial and structural encoding, offering parallel channels for error-resilient, multi-layered data representation.
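As an illustration of how a run-length constraint can be enforced in an alphabet of any size, the sketch below uses a rotating code: each input digit selects one of the letters that differ from the previous letter, so no letter is ever repeated. This construction is swapped in here as a well-known simple example and is not claimed to be the scheme of any cited study. The 12-letter alphabet string, the convention of treating the first letter as a virtual predecessor, and the function names are assumptions.

```python
import math


def encode(digits, alphabet):
    """Map base-(N-1) digits to letters so that no letter repeats."""
    prev = alphabet[0]  # virtual predecessor, so N-1 choices always exist
    out = []
    for d in digits:
        choices = [letter for letter in alphabet if letter != prev]
        prev = choices[d]  # raises IndexError if d >= len(alphabet) - 1
        out.append(prev)
    return "".join(out)


def decode(sequence, alphabet):
    """Invert encode(): recover the digit chosen at each position."""
    prev = alphabet[0]
    digits = []
    for letter in sequence:
        choices = [x for x in alphabet if x != prev]
        digits.append(choices.index(letter))
        prev = letter
    return digits


canonical = "ACGT"
expanded = "ACGTPZBSXJKV"  # assumed ordering of a 12-letter alphabet

seq = encode([0, 0, 0, 1, 2, 2], canonical)
print(seq, decode(seq, canonical))
print(f"rate: {math.log2(len(canonical) - 1):.2f} vs "
      f"{math.log2(len(expanded) - 1):.2f} bits/base")
```

The cost of the constraint shrinks as the alphabet grows: the rate falls from 2 to about 1.58 bits per base for four letters, but only from about 3.58 to about 3.46 bits per base for twelve.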

Summary and outlook

As both the types and volumes of data continue growing exponentially, DNA-based storage platforms have attracted considerable attention due to their high storage capacity, durability, and low energy requirements. However, the biochemical instability of canonical DNA poses significant limitations for practical application. NcNAs have emerged as attractive alternatives, with diverse applications in molecular medicine, synthetic biology, and nanotechnology. By introducing various modifications at the base, sugar, or backbone moieties, ncNAs offer several advantages, including enhanced biochemical stability, expanded storage capacity, and novel orthogonal functionalities. Hybrid approaches that integrate multiple modifications, such as epigenetic marks combined with sugar or backbone alterations, hold great promise for creating multi-layered, rewritable, and secure data encoding systems.

Despite the promising potential of ncNAs, several technical challenges hinder their practical application. First, large-scale synthesis remains a significant bottleneck, as current methods are not yet efficient in producing long ncNA sequences. Additionally, many synthetic nucleic acids are incompatible with the polymerases required for replication and transcription²²⁷, limiting their functionality. Sequencing also remains a critical challenge; while nanopore sequencing has made progress^228,229, reliable single-molecule readouts of UBPs require further optimization, including advancements in pore engineering, signal calibration, and base-calling algorithms. Overcoming these obstacles will necessitate multidisciplinary collaboration, integrating expertise from synthetic chemistry, enzymology, materials science, and computational analysis. Key areas for advancement include expanding the chemical repertoire of nucleic acids, engineering more robust and versatile polymerases, and optimizing nanopore-based sequencing technologies. As these challenges are progressively addressed, ncNAs are poised to form the foundation for next-generation molecular data storage systems.

Outlook

Advances in ncNA-based systems are expected to bring significant improvements in writing, preservation, and reading (Fig. 4).

Fig. 4: Roadmap of non-canonical nucleic acids in data storage.

In terms of writing, rapid progress will be made in ncNA synthesis, driven by technologies like TdT and polymerase variants, enabling the creation of long-stranded and chimeric ncNAs in the short term. In the medium to long term, high-throughput real-time synthesis, fully automated ultra-fast synthesis, and even portable micro-writing devices will emerge, drastically enhancing the throughput and flexibility of ncNA writing.

For preservation, on-chip ncNA storage will be achieved in the near future, followed by the integration of ncNA storage and computing systems. Innovations in encryption and steganography using ncNAs will also gain traction. Over time, smart, low-cost encapsulation systems for real-time monitoring of ncNA degradation, along with ultra-fast micro-encapsulation technologies, will be developed to ensure secure and stable long-term storage of ncNA data.

In reading, building on the rapid development of nanopore technologies, nanopore-based ncNA sequencing will become available soon. Supported by the development of AI techniques and DNA-based neural networks, AI-assisted semantic retrieval and human-computer interaction systems may be realized. As technology progresses, we will see high-throughput real-time reading, fully automated ultra-fast ncNA reading, and portable micro storage-reading devices, greatly enhancing the speed and convenience of accessing ncNA-based data storage systems.

References

Bar-Lev, D., Orr, I., Sabary, O., Etzion, T. & Yaakobi, E. Scalable and robust DNA-based storage via coding theory and deep learning. Nat. Mach. Intell. 7, 639–649 (2025).
Yang, S. et al. DNA as a universal chemical substrate for computing and data storage. Nat. Rev. Chem. 8, 179–194 (2024).
Krusin-Elbaum, L., Shibauchi, T., Argyle, B., Gignac, L. & Weller, D. Stable ultrahigh-density magneto-optical recordings using introduced linear defects. Nature 410, 444–446 (2001).
Grass, R. N., Heckel, R., Puddu, M., Paunescu, D. & Stark, W. J. Robust chemical preservation of digital information on DNA in silica with error-correcting codes. Angew. Chem. Int. Ed. 54, 2552–2555 (2015).
Buko, T., Tuczko, N. & Ishikawa, T. DNA data storage. BioTech 12, 44 (2023).
Li, X. DNA data storage: the fusion of digital and biological information. Theor. Nat. Sci. 4, 26–31 (2023).
Jens, M., Nina, L., Pernilla, B. & Dag, L. ICT sector electricity consumption and greenhouse gas emissions – 2020 outcome. Telecommun. Policy 48, 102701 (2024).
Erol, G. Electricity consumption by ICT: facts, trends, and measurements. Ubiquity 2023, 1–15 (2023).
Park, C.-H., Petit, Y., Canioni, L. & Park, S.-H. Five-dimensional optical data storage based on ellipse orientation and fluorescence intensity in a silver-sensitized commercial glass. Micromachines 11, 1026 (2020).
Li, J. et al. Terahertz wavefront shaping with multi-channel polarization conversion based on all dielectric metasurface. Photonics Res. 9, 1939–1947 (2021).
Wuttig, M., Bhaskaran, H. & Taubner, T. Phase-change materials for non-volatile photonic applications. Nat. Photonics 11, 465–476 (2017).
Organick, L. et al. Probing the physical limits of reliable DNA data retrieval. Nat. Commun. 11, 616 (2020).
Huang, X. L. et al. Storage-D: a user-friendly platform that enables practical and personalized DNA data storage. iMeta 3, e168 (2024).
Kevin, N. L. et al. A primordial DNA store and compute engine. Nat. Nanotechnol. 19, 1654–1664 (2024).
Ludovic, O. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
Divya, K., Gk, M., Umema, A. & Ss, D. Environmental factors affecting the concentration of DNA in blood and saliva stains: a review. J. Forensic Sci. Res. 78, s343 (2024).
Wamho, E. C. et al. Controlling nuclease degradation of wireframe DNA origami with minor groove binders. ACS Nano 16, 8954–8966 (2022).
Moret, I. et al. Stability of PEI-DNA and DOTAP-DNA complexes: effect of alkaline pH, heparin and serum. J. Control Release 76, 169–181 (2001).
Fraikin, G. Y., Belenikina, N. S. & Rubin, A. B. Photochemical processes of cell DNA damage by UV radiation of various wavelengths: biological consequences. Mol. Biol. 58, 1–16 (2024).
Saloua Kouass, S., Sonia, G., Pierre, C., Léon, S. & Darel, J. H. The relative contributions of DNA strand breaks, base damage and clustered lesions to the loss of DNA functionality induced by ionizing radiation. Radiat. Res. 181, 99–110 (2014).
Karishma, M., James, M. T. & Albert, J. K. DNA stability: a central design consideration for DNA data storage systems. Nat. Commun. 12, 1358 (2021).
Ramakrishnan, S., Ijäs, H., Linko, V. & Keller, A. Structural stability of DNA origami nanostructures under application-specific conditions. Comput. Struct. Biotechnol. J. 16, 342–349 (2018).
Langlois, N. I. & Clark, H. A. Characterization of DNA nanostructure stability by size exclusion chromatography. Anal. Methods 14, 1006–1014 (2022).
Zhong, W. R. & Sczepanski, J. T. Direct comparison of D-DNA and L-DNA strand-displacement reactions in living mammalian cells. ACS Synth. Biol. 10, 209–212 (2021).
Yatsunyk, L. A., Mendoza, O. & Mergny, J.-L. “Nano-oddities”: unusual nucleic acid assemblies for DNA-based nanostructures and nanodevices. Acc. Chem. Res. 47, 1836–1844 (2014).
Fei, Z. J., Gupta, N., Li, M. J., Xiao, P. F. & Hu, X. Toward highly effective loading of DNA in hydrogels for high-density and long-term information storage. Sci. Adv. 9, eadg9933 (2023).
Mao, C. et al. Metal-organic frameworks in microfluidics enable fast encapsulation/extraction of DNA for automated and integrated data storage. ACS Nano 17, 2840–2850 (2023).
Koch, J. et al. A DNA-of-things storage architecture to create materials with embedded memory. Nat. Biotechnol. 38, 39 (2020).
Samuel, G. Cracking the code on DNA storage. Commun. ACM 60, 16–18 (2017).
Daoqing, F., Jun, W., Jiawen, H., Erkang, W. & Shaojun, D. Engineering DNA logic systems with non-canonical DNA-nanostructures: basic principles, recent developments and bio-applications. Sci. China Chem. 65, 284–297 (2021).
Xiujuan, Y., Huimin, Z., Zhenqiang, H. & Xiao, W. Application of aptamer-functionalized nanomaterials in molecular imaging of tumors. Nanotechnol. Rev. 12, 20230107 (2023).
Yuang, W. et al. Chemically modified DNA nanostructures for drug delivery. Innovation 3, 100217 (2022).
Changping, Y. et al. Genetically encoded nucleic acid nanostructures for biological applications. Chembiochem 26, e202400991 (2025).
Duffy, K., Arangundy-Franklin, S. & Holliger, P. Modified nucleic acids: replication, evolution, and next-generation therapeutics. BMC Biol. 18, 112 (2020).
Yu, M. et al. High-throughput DNA synthesis for data storage. Chem. Soc. Rev. 53, 4463–4489 (2024).
Bag, S. S., Banerjee, A. & Sinha, S. Expansion of genetic alphabets: designer nucleobases and their applications. Synlett 35, 1195–1227 (2023).
Kang, S. H., Liu, Q., Zhang, J., Zhang, Y. & Qi, H. 2,6-diaminopurine (Z)-containing toehold probes improve genotyping sensitivity. Biotechnol. Bioeng. 121, 1384–1393 (2024).
Zhang, M., Singh, N., Ehmann, M. E., Zheng, L. N. & Zhao, H. M. Incorporation of noncanonical base Z yields modified mRNA with minimal immunogenicity and improved translational capacity in mammalian cells. iScience 26, 107739 (2023).
Liu, Q. et al. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat. Commun. 10, 2449 (2019).
Seiple, L., Jaruga, P., Dizdaroglu, M. & Stivers, J. T. Linking uracil base excision repair and 5-fluorouracil toxicity in yeast. Nucleic Acids Res. 34, 140–151 (2006).
Eremeeva, E. & Herdewijn, P. PCR amplification of base-modified DNA. Curr. Protoc. Chem. Biol. 10, 18–48 (2018).
Goodman, M. F., Hopkins, R. L., Lasken, R. & Mhaskar, D. N. The biochemical basis of 5-bromouracil- and 2-aminopurine-induced mutagenesis. Basic Life Sci. 31, 409–423 (1985).
Willis, M. C., Hicke, B. J., Uhlenbeck, O. C., Cech, T. R. & Koch, T. H. Photocrosslinking of 5-iodouracil-substituted RNA and DNA to proteins. Science 262, 1255–1257 (1993).
Kawabe, H. et al. Enzymatic synthesis and nanopore sequencing of 12-letter supernumerary DNA. Nat. Commun. 14, 6820 (2023).
Hoshika, S. et al. Hachimoji DNA and RNA: a genetic system with eight building blocks. Science 363, 884 (2019).
Galindo-Murillo, R. & Barroso-Flores, J. Hydrophobic unnatural base pairs show a Watson-Crick pairing in micro-second molecular dynamics simulations. J. Biomol. Struct. Dyn. 38, 4098–4106 (2020).
Zhang, Y. et al. A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644 (2017).
Mukba, S. A. et al. Expanding the genetic code: unnatural base pairs in biological systems. Mol. Biol. 54, 475–484 (2020).
Okamoto, I., Miyatake, Y., Kimoto, M. & Hirao, I. High fidelity, efficiency and functionalization of Ds-Px unnatural base pairs in PCR amplification for a genetic alphabet expansion system. ACS Synth. Biol. 5, 1220–1230 (2016).
Hirao, I., Kimoto, M. & Lee, K. H. DNA aptamer generation by ExSELEX using genetic alphabet expansion with a mini-hairpin DNA stabilization method. Biochimie 145, 15–21 (2018).
Johnson, S. C., Sherrill, C. B., Marshall, D. J., Moser, M. J. & Prudent, J. R. A third base pair for the polymerase chain reaction: inserting isoC and isoG. Nucleic Acids Res. 32, 1937–1941 (2004).
Marina, E., Alexander, M. S.-C. & Markus, W. G. Impact of modified ribose sugars on nucleic acid conformation and function. Heterocycl. Commun. 23, 155–165 (2017).
Inoue, H. et al. Synthesis and hybridization studies on two complementary nona(2′-O-methyl)ribonucleotides. Nucleic Acids Res. 15, 6131–6148 (1987).
Watts, J. K., Katolik, A., Viladoms, J. & Damha, M. J. Studies on the hydrolytic stability of 2′-fluoroarabinonucleic acid (2′F-ANA). Org. Biomol. Chem. 7, 1904–1910 (2009).
Pontarelli, A. & Wilds, C. J. Arabinonucleic acids containing C5-propynyl modifications form stable hybrid duplexes with RNA that are efficiently degraded by E. coli RNase H. Bioorg. Med. Chem. Lett. 67, 128744 (2022).
Kalota, A. et al. 2′-Deoxy-2′-fluoro-β-d-arabinonucleic acid (2′F-ANA) modified oligonucleotides (ON) effect highly efficient, and persistent, gene silencing. Nucleic Acids Res. 34, 451–461 (2006).
Anosova, I. et al. The structural diversity of artificial genetic polymers. Nucleic Acids Res. 44, 1007–1021 (2016).
Zhang, W. et al. Structural interpretation of the effects of threo-nucleotides on nonenzymatic template-directed polymerization. Nucleic Acids Res. 49, 646–656 (2021).
Murayama, K., Okita, H., Kuriki, T. & Asanuma, H. Nonenzymatic polymerase-like template-directed synthesis of acyclic l-threoninol nucleic acid. Nat. Commun. 12, 804 (2021).
Culbertson, M. C. et al. Evaluating TNA stability under simulated physiological conditions. Bioorg. Med. Chem. Lett. 26, 2418–2421 (2016).
Ebert, M. O., Mang, C., Krishnamurthy, R., Eschenmoser, A. & Jaun, B. The structure of a TNA-TNA complex in solution: NMR study of the octamer duplex derived from α-(L)-threofuranosyl-(3′-2′)-CGAATTCG. J. Am. Chem. Soc. 130, 15105–15115 (2008).
Eze, N. A. & Milam, V. T. Quantitative analysis of locked nucleic acid and DNA competitive displacement events on microspheres. Langmuir 38, 6871–6881 (2022).
Hoshino, H., Kasahara, Y., Kuwahara, M. & Obika, S. DNA polymerase variants with high processivity and accuracy for encoding and decoding locked nucleic acid sequences. J. Am. Chem. Soc. 142, 21530–21537 (2020).
Samson, C. et al. Structural studies of HNA substrate specificity in mutants of an archaeal DNA polymerase obtained by directed evolution. Biomolecules 10, 1647 (2020).
Allart, B. et al. D-altritol nucleic acids (ANA): hybridisation properties, stability, and initial structural analysis. Chem. Eur. J. 5, 2424–2431 (1999).
Zhang, L. L., Peritz, A. & Meggers, E. A simple glycol nucleic acid. J. Am. Chem. Soc. 127, 4174–4175 (2005).
Makino, K., Sugiyama, I., Asanuma, H. & Kashida, H. Kinetics of strand displacement reaction with acyclic artificial nucleic acids. Angew. Chem. Int. Ed. Engl. 63, e202319864 (2024).
Maiti, M. et al. Xylonucleic acid: synthesis, structure, and orthogonal pairing properties. Nucleic Acids Res. 43, 7189–7200 (2015).
Kevin, A. et al. On the influence of nucleic acid backbone modifications on lipid nanoparticle morphology. Langmuir 38, 14036–14043 (2022).
Shin, H., Cho, J. Y., Park, B. Y. & Jung, C. L. Sulfur incorporation into nucleic acids accelerates enzymatic activity. Chem. Eng. J. 493, 152548 (2024).
Novikova, D., Sagaidak, A., Vorona, S. & Tribulovich, V. A visual compendium of principal modifications within the nucleic acid sugar phosphate backbone. Molecules 29, 3025 (2024).
Tereshko, V., Gryaznov, S. & Egli, M. Consequences of replacing the DNA 3′-oxygen by an amino group: high-resolution crystal structure of a fully modified N3′→P5′ phosphoramidate DNA dodecamer duplex. J. Am. Chem. Soc. 120, 269–283 (1998).
Singh, G. & Monga, V. Peptide nucleic acids: recent developments in the synthesis and backbone modifications. Bioorg. Chem. 141, 106860 (2023).
Moccia, M., Adamo, M. F. & Saviano, M. Insights on chiral, backbone modified peptide nucleic acids: properties and biological activity. Artif. DNA PNA XNA 5, e1107176 (2014).
Lok, C. N. et al. Potent gene-specific inhibitory properties of mixed-backbone antisense oligonucleotides comprised of 2′-deoxy-2′-fluoro-D-arabinose and 2′-deoxyribose nucleotides. Biochemistry 41, 3457–3467 (2002).
Depmeier, H. & Kath-Schorr, S. Expanding the horizon of the xeno nucleic acid space: threose nucleic acids with increased information storage. J. Am. Chem. Soc. 146, 7743–7751 (2024).
Mallette, T. L., Lidke, D. S. & Lakin, M. R. Heterochiral modifications enhance robustness and function of DNA in living human cells. Chembiochem 25, e202300755 (2024).
Fan, C. Y., Deng, Q. & Zhu, T. F. Bioorthogonal information storage in L-DNA with a high-fidelity mirror-image DNA polymerase. Nat. Biotechnol. 39, 1548–1555 (2021).
Kang, H. et al. Inhibition of MDR1 gene expression by chimeric HNA antisense oligonucleotides. Nucleic Acids Res. 32, 4411–4419 (2004).
Mads, K. S. et al. Self-assembly of ultrasmall 3D architectures of (L)-acyclic threoninol nucleic acids with high thermal and serum stability. J. Am. Chem. Soc. 146, 20141–20146 (2024).
Ken, Y. et al. Enhancing siRNA efficacy in vivo with extended nucleic acid backbones. Nat. Biotechnol. 43, 904–913 (2024).
Wang, J., Shang, J., Xiang, Y. & Tong, A. Post-synthetic modification of oligonucleotides through oxidative amination of 4-thio-2′-deoxyuridine. Curr. Protoc. 1, e274 (2021).
Lee, E. M., Setterholm, N. A., Hajjar, M., Barpuzary, B. & Chaput, J. C. Stability and mechanism of threose nucleic acid toward acid-mediated degradation. Nucleic Acids Res. 51, 9542–9551 (2023).
Gade, C. R. & Sharma, N. K. Hybrid DNA i-motif: aminoethylprolyl-PNA (pC(5)) enhance the stability of DNA (dC(5)) i-motif structure. Bioorg. Med. Chem. Lett. 27, 5424–5428 (2017).
Petkowski, J. J. et al. Astrobiological implications of the stability and reactivity of peptide nucleic acid (PNA) in concentrated sulfuric acid. Sci. Adv. 11, eadr0006 (2025).
Rozners, E. Hydration of short DNA, RNA and 2′-OMe oligonucleotides determined by osmotic stressing. Nucleic Acids Res. 32, 248–254 (2004).
Lelyveld, V. S., Fang, Z. Y. & Szostak, J. W. Trivalent rare earth metal cofactors confer rapid NP-DNA polymerase activity. Science 382, 423–429 (2023).
Bian, T. Y. et al. Xeno nucleic acids as functional materials: from biophysical properties to application. Adv. Health. Mater. 13, e2401207 (2024).
Wilds, C. J. & Damha, M. J. 2′-Deoxy-2′-fluoro-β-d-arabinonucleosides and oligonucleotides (2′F-ANA): synthesis and physicochemical studies. Nucleic Acids Res. 28, 3625–3635 (2000).
De Winter, H., Lescrinier, E., Van Aerschot, A. & Herdewijn, P. Molecular dynamics simulation to investigate differences in minor groove hydration of HNA/RNA hybrids as compared to HNA/DNA complexes. J. Am. Chem. Soc. 120, 5381–5394 (1998).
Kamiya, Y. et al. Intrastrand backbone-nucleobase interactions stabilize unwound right-handed helical structures of heteroduplexes of L-aTNA/RNA and SNA/RNA. Commun. Chem. 3, 156 (2020).
Campbell, M. A. & Wengel, J. Locked vs. unlocked nucleic acids (LNA vs. UNA): contrasting structures work towards common therapeutic goals. Chem. Soc. Rev. 40, 5680–5689 (2011).
Lackey, H. H., Peterson, E. M., Chen, Z., Harris, J. M. & Heemstra, J. M. Thermostability trends of TNA:DNA duplexes reveal strong purine dependence. ACS Synth. Biol. 8, 1144–1152 (2019).
Wilds, C. J., Wawrzak, Z., Krishnamurthy, R., Eschenmoser, A. & Egli, M. Crystal structure of a B-form DNA duplex containing (l)-α-threofuranosyl (3′→2′) nucleosides: a four-carbon sugar is easily accommodated into the backbone of DNA. J. Am. Chem. Soc. 124, 13716–13721 (2002).
Anosova, I. et al. Structural insights into conformation differences between DNA/TNA and RNA/TNA chimeric duplexes. Chembiochem 17, 1705–1708 (2016).
Lackey, H. H., Chen, Z., Harris, J. M., Peterson, E. M. & Heemstra, J. M. Single-molecule kinetics show DNA pyrimidine content strongly affects RNA:DNA and TNA:DNA heteroduplex dissociation rates. ACS Synth. Biol. 9, 249–253 (2020).
Yubin, R. et al. DNA-based concatenated encoding system for high-reliability and high-density data storage. Small Methods 6, e2101335 (2022).
Ping, S., Rujie, Z., Chuanping, H. & Tingjian, C. Transcription, reverse transcription, and amplification of backbone-modified nucleic acids with laboratory-evolved thermophilic DNA polymerases. Curr. Protoc. 1, e188 (2021).
Linda, C. M. et al. Reading and writing digital data in DNA. Nat. Protoc. 15, 86–101 (2019).
Giacomo, C. & Claudio, S. Time series compression survey. ACM Comput. Surv. 55, 1–32 (2023).
Bingzhe, L., Li, O. & David, D. DP-DNA: a digital pattern-aware DNA storage system to improve encoding density. In 2023 31st International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). Institute of Electrical and Electronics Engineers (IEEE), 1–8 (2023).
Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628–1628 (2012).
Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013).
Zheng, Y. et al. DNA-QLC: an efficient and reliable image encoding scheme for DNA storage. BMC Genomics 25, 266 (2024).
Romesberg, F. E. Unnatural base pairs to expand the genetic alphabet and code. In Handbook of Chemical Biology of Nucleic Acids (ed. Sugimoto, N.). Springer Nature Singapore, 1–21 (2022).
Zhang, C. et al. Parallel molecular data storage by printing epigenetic bits on DNA. Nature 634, 824–832 (2024).
Kyung Hyun, L., Kiyofumi, H., Michiko, K. & Ichiro, H. Genetic alphabet expansion biotechnology by creating unnatural base pairs. Curr. Opin. Biotechnol. 51, 8–15 (2018).
Thomas, C. A. et al. Assessing readability of an 8-letter expanded deoxyribonucleic acid alphabet with nanopores. J. Am. Chem. Soc. 145, 8560–8568 (2023).
Wang, J. & Yu, H. Y. Threose nucleic acid as a primitive genetic polymer and a contemporary molecular tool. Bioorg. Chem. 143, 107049 (2024).
Yang, K. F., McCloskey, C. M. & Chaput, J. C. Reading and writing digital information in TNA. ACS Synth. Biol. 9, 2936–2942 (2020).
Seeman, N. C. & Sleiman, H. F. DNA nanotechnology. Nat. Rev. Mater. 3, 17068 (2017).
Rothemund, P. W. K. Folding DNA to create nanoscale shapes and patterns. Nature 440, 297–302 (2006).
Lin, W. et al. Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model. Sci. Rep. 15, 20920 (2025).
Ke, G. et al. L-DNA molecular beacon: a safe, stable, and accurate intracellular nano-thermometer for temperature sensing in living cells. J. Am. Chem. Soc. 134, 18908–18911 (2012).
Fu, W. et al. Rational design of pH-responsive DNA motifs with general sequence compatibility. Angew. Chem. Int. Ed. 58, 16405–16410 (2019).
Hu, Y. W., Cecconello, A., Idili, A., Ricci, F. & Willner, I. Triplex DNA nanostructures: from basic properties to applications. Angew. Chem. Int. Ed. 56, 15210–15233 (2017).
Barbosa, N., Sagresti, L. & Brancato, G. Photoinduced azobenzene-modified DNA dehybridization: insights into local and cooperativity effects from a molecular dynamics study. Phys. Chem. Chem. Phys. 23, 25170–25179 (2021).
Suyu, L., Fei, D., Qian, L., Chunhai, F. & Jing, F. Azobenzene-integrated DNA Nanomachine. Chem. J. Chin. U 43, 8 (2022).
Ashwood, B. & Tokmakoff, A. Kinetics and dynamics of oligonucleotide hybridization. Nat. Rev. Chem. 9, 305–327 (2025).
Wang, Q. et al. 2′-Fluoroarabinonucleic acid nanostructures as stable carriers for cellular delivery in the strongly acidic environment. ACS Appl. Mater. Inter. 12, 53592–53597 (2020).
Li, X. et al. RNA nanotechnology for codelivering high-payload nucleoside analogs to cancer with a synergetic effect. Mol. Pharm. 21, 5690–5702 (2024).
Taylor, A. I. et al. Nanostructures from synthetic genetic polymers. Chembiochem 17, 1107–1110 (2016).
Dai, K. et al. Single-stranded RNA origami-based epigenetic immunomodulation. Nano Lett. 23, 7188–7196 (2023).
Dong, B. et al. Synthesis and characterization of (R)-miniPEG-containing chiral γ-peptide nucleic acids using the Fmoc strategy. Tetrahedron Lett. 60, 1430–1433 (2019).
Li, K. J. et al. Empowering DNA-based information processing: computation and data storage. ACS Appl. Mater. Inter. 16, 68749–68771 (2024).
Hollenstein, M. Enzymatic synthesis of base-modified nucleic acids. In Handbook of Chemical Biology of Nucleic Acids (ed. Sugimoto, N.). Springer Nature Singapore, 1–39 (2022).
Akihiro, O., Kohji, S. & Mitsuo, S. DNA synthesis without base protection using the phosphoramidite approach. Curr. Protoc. Nucleic Acid Chem. 26, 3 (2006).
Brian, S. S. Chemical nucleic acid synthesis, modification and labelling. Curr. Opin. Biotechnol. 4, 20–28 (1993).
Weng, Z. et al. Massively parallel homogeneous amplification of chip-scale DNA for DNA information storage (MPHAC-DIS). Nat. Commun. 16, 667 (2025).
Okita, H., Kondo, S., Murayama, K. & Asanuma, H. Rapid chemical ligation of DNA and threoninol nucleic acid (TNA) for effective nonenzymatic primer extension. J. Am. Chem. Soc. 145, 17872–17880 (2023).
O’Flaherty, D. K., Zhou, L. & Szostak, J. W. Nonenzymatic RNA-templated synthesis of N3′→P5′ phosphoramidate DNA. Bio-Protoc. 10, e3734 (2020).
Kendall, H., Michelle, H., Giancarlo, G., Jeremy, S. E. & Wei, Z. Enzymatic synthesis of designer DNA using cyclic reversible termination and a universal template. ACS Synth. Biol. 9, 283–293 (2020).
Sun, L. P. et al. Template-independent synthesis and 3′-end labelling of 2′-modified oligonucleotides with terminal deoxynucleotidyl transferases. Nucleic Acids Res. 52, 10085–10101 (2024).
Pinheiro, V. B. et al. Synthetic genetic polymers capable of heredity and evolution. Science 336, 341–344 (2012).
Lelyveld, V. S., Zhang, W. & Szostak, J. W. Synthesis of phosphoramidate-linked DNA by a modified DNA polymerase. Proc. Natl. Acad. Sci. USA 117, 7276–7283 (2020).
Liu, Y. Y., Wang, J., Wu, Y. S. & Wang, Y. J. Advancing the enzymatic toolkit for 2′-fluoro arabino nucleic acid (FANA) manipulation: phosphorylation, ligation, replication, and templating RNA transcription. Chem. Sci. 15, 12534–12542 (2024).
Khamissi, N., Korfmann, C., Chaudhry, A. & Hili, R. Ligase-catalyzed transcription and reverse-transcription of XNA-containing nucleic acid polymers using T3 DNA ligase. Chem. Sci. 16, 9749–9755 (2025).
Carlson, C. K. et al. A massively parallel in vivo assay of TdT mutants yields variants with altered nucleotide insertion biases. ACS Synth. Biol. 13, 3326–3343 (2024).
Sabat, N. et al. Towards the controlled enzymatic synthesis of LNA containing oligonucleotides. Front. Chem. 11, 1161462 (2023).
Wang, G. et al. Enzymatic synthesis of DNA with an expanded genetic alphabet using terminal deoxynucleotidyl transferase. ACS Synth. Biol. 11, 4142–4155 (2022).
Lin, W. et al. Scaling the high-yield potential of large-scale DNA data storage with cap-free DNA synthesis. ACS Synth. Biol. 18, 2764–2773 (2025).
Soukarie, D., Nocete, L., Bittner, A. M. & Santiago, I. DNA data storage in electrospun and melt-electrowritten composite nucleic acid-polymer fibers. Mater. Today Bio. 24, 100900 (2024).
Chen, W. G. et al. An artificial chromosome for data storage. Natl. Sci. Rev. 8, nwab028 (2021).
Yim, S. S. et al. Robust direct digital-to-biological data storage in living cells. Nat. Chem. Biol. 17, 246 (2021).
Luo, H. et al. Engineered living memory microspheroid-based archival file system for random accessible in vivo DNA storage. Adv. Mater. 37, e2415358 (2025).
Fister, K., Fister, I. & Murovec, J. The potential of plants and seeds in DNA-based information storage. Adv. Inf. Knowl. Process. 4, 69–81 (2017).
Elena, B., Aman, A., Renwick, C. J. D. & Thomas, D. DNA storage—from natural biology to synthetic biology. Comput. Struct. Biotec. 21, 1227–1235 (2023).
Farzadfard, F. et al. Single-nucleotide-resolution computing and memory in living cells. Mol. Cell 75, 769–780.e764 (2019).
Zhang, Y. et al. Preservation and encryption in DNA digital data storage. Chempluschem 87, e202200183 (2022).
Sun, F. J. et al. Mobile and self-sustained data storage in an extremophile genomic DNA. Adv. Sci. 10, e2206201 (2023).
Peter, R., Barbara, R., Karin, F., Claus, V. & Martin, W. Mechanisms of degradation of DNA standards for calibration function during storage. Appl. Microbiol. Biotechnol. 89, 407–417 (2010).
Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
Shen, H. et al. Random sanitization in DNA information storage using CRISPR-Cas12a. J. Am. Chem. Soc. 146, 35155–35164 (2024).
Li, K. J. et al. DNA-DISK: automated end-to-end data storage via enzymatic single-nucleotide DNA synthesis and sequencing on digital microfluidics. Proc. Natl. Acad. Sci. USA 121, 1–9 (2024).
Organick, L. et al. Random access in large-scale DNA data storage (vol 36, pg 242, 2018). Nat. Biotechnol. 36, 660–660 (2018).
Zhou, Y., Bi, K., Ge, Q. Y. & Lu, Z. H. Advances and challenges in random access techniques for in vitro DNA data storage. ACS Appl. Mater. Inter. 16, 43102–43113 (2024).
Zhang, J. Y., Hou, C. Y. & Liu, C. C. CRISPR-powered quantitative keyword search engine in DNA data storage. Nat. Commun. 15, 2376 (2024).
Shy, B. R. et al. High-yield genome engineering in primary cells using a hybrid ssDNA repair template and small-molecule cocktails. Nat. Biotechnol. 41, 521–531 (2023).
Han, W. J. et al. Efficient precise integration of large DNA sequences with 3′-overhang dsDNA donors using CRISPR/Cas9. Proc. Natl. Acad. Sci. USA 120, e2221127120 (2023).
Newman, S. et al. High density DNA data storage library via dehydration with digital microfluidic retrieval. Nat. Commun. 10, 1706 (2019).
Piao, Y. et al. Bead-based DNA synthesis and sequencing for integrated data storage using digital microfluidics. Angew. Chem. Int. Ed. Engl. 64, e202416004 (2025).
Heckel, R., Mikutis, G. & Grass, R. N. A characterization of the DNA data storage channel. Sci. Rep. 9, 9663 (2019).
Darío Sánchez, M. et al. Reduced amplification by phi29 DNA polymerase in the presence of unbound oligos during reaction in RCA. Biosens. Bioelectron.: X 17, 100456 (2024).
Kircher, M., Heyn, P. & Kelso, J. Addressing challenges in the production and analysis of Illumina sequencing data. BMC Genomics 12, 382 (2011).
Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365 (2021).
Yan, S. H. et al. Direct sequencing of 2′-deoxy-2′-fluoroarabinonucleic acid (FANA) using nanopore-induced phase-shift sequencing (NIPSS). Chem. Sci. 10, 3110–3117 (2019).
Wadley, T. et al. Nanopore sequencing for detection and characterization of phosphorothioate modifications in native DNA sequences. Front. Microbiol. 13, 871937 (2022).
Yang, Z. Y., Chen, F., Alvarado, J. B. & Benner, S. A. Amplification, mutation, and sequencing of a six-letter synthetic genetic system. J. Am. Chem. Soc. 133, 15105–15112 (2011).
Kimoto, M., Matsunaga, K. & Hirao, I. DNA aptamer generation by genetic alphabet expansion SELEX (ExSELEX) using an unnatural base pair system. Methods Mol. Biol. 1380, 47–60 (2016).
Wang, H. L. et al. Locating, tracing and sequencing multiple expanded genetic letters in complex DNA context via a bridge-base approach. Nucleic Acids Res. 51, e52 (2023).
Endo, M. AFM-based single-molecule observation of the conformational changes of DNA structures. Methods 169, 3–10 (2019).
Nieves, D. J., Gaus, K. & Baker, M. A. B. DNA-based super-resolution microscopy: DNA-PAINT. Genes (Basel) 9, 621 (2018).
Kang, T., Lim, D., Lee, W. & Song, Y. Polymerase elongation onto patterned DNA for random accessed DNA data storage. BioChip J. 19, 636–648 (2025).
Adleman, L. M. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024 (1994).
Okamoto, A., Tanaka, K. & Saito, I. DNA logic gates. J. Am. Chem. Soc. 126, 9458–9463 (2004).
Frezza, B. M., Cockroft, S. L. & Ghadiri, M. R. Modular multi-level circuits from immobilized DNA-based logic gates. J. Am. Chem. Soc. 129, 14875–14879 (2007).
Elbaz, J. et al. DNA computing circuits using libraries of DNAzyme subunits (vol 5, pg 417, 2010). Nat. Nanotechnol. 6, 190–190 (2011).
Phillips, A. & Cardelli, L. A programming language for composable DNA circuits. J. R. Soc. Interface 6, S419–S436 (2009).
Seelig, G., Soloveichik, D., Zhang, D. Y. & Winfree, E. Enzyme-free nucleic acid logic circuits. Science 314, 1585–1588 (2006).
Qian, L. & Winfree, E. Scaling up digital circuit computation with DNA strand displacement cascades. Science 332, 1196–1201 (2011).
Cherry, K. M. & Qian, L. L. Supervised learning in DNA neural networks. Nature 645, 639–647 (2025).
Song, T. & Qian, L. Heat-rechargeable computation in DNA logic circuits and neural networks. Nature 646, 315–322 (2025).
Zhang, Q. et al. High-speed sequential DNA computing using a solid-state DNA origami register. ACS Cent. Sci. 10, 2285–2293 (2024).
Pei, Y. et al. Single-molecule resettable DNA computing via magnetic tweezers. Nano Lett. 22, 3003–3010 (2022).
Garbesi, A. et al. L-DNAs as potential antimessenger oligonucleotides: a reassessment. Nucleic Acids Res. 21, 4159–4165 (1993).
Kabza, A., Young, B. & Sczepanski, J. Heterochiral DNA strand displacement circuits. J. Am. Chem. Soc. 139, 17715–17718 (2017).
Young, B. E. & Sczepanski, J. T. Heterochiral DNA strand-displacement based on chimeric D/L-oligonucleotides. ACS Synth. Biol. 8, 2756–2759 (2019).
Chen, Y., Nagao, R., Murayama, K. & Asanuma, H. Orthogonal amplification circuits composed of acyclic nucleic acids enable RNA detection. J. Am. Chem. Soc. 144, 5887–5892 (2022).
Bag, S. S., Banerjee, A. & Sinha, S. Expansion of genetic alphabets: designer nucleobases and their applications. Synlett 35, 1195–1227 (2024).
Le, A. V. & Hartman, M. C. T. Improved synthesis of the unnatural base NaM, and evaluation of its orthogonality in in vitro transcription and translation. RSC Chem. Biol. 5, 1111–1121 (2024).
Shomorony, I. & Heckel, R. Information-theoretic foundations of DNA data storage. Found. Trends Commun. 19, 1–106 (2022).
Jin, B., Li, Y. & Robertson, K. D. DNA methylation: superior or subordinate in the epigenetic hierarchy? Genes Cancer 2, 607–617 (2011).
Zhang, Q. L. et al. Programming non-nucleic acid molecules into computational nucleic acid systems. Angew. Chem. Int. Ed. 62, e202214698 (2023).
Chaput, J. C. Redesigning the genetic polymers of life. Acc. Chem. Res. 54, 1056–1065 (2021).
Ahmed, M. K., Subrata, H. M. & Markus, W. G. Cyclic enzymatic solid phase synthesis of isotopically labeled DNA oligonucleotides. Nucleosides, Nucleotides Nucleic Acids 28, 1030–1041 (2009).
Jia-Yi, L., Yu-Ting, H., Xue-Qiang, W. & Yuntian, Z. Development of silyl-protected phosphoramidite building blocks for short ssDNA synthesis. Chin. J. Chem. 43, 1293–1298 (2025).
David, P. & Rebecca, B. Improving enzyme fitness with machine learning. CHIMIA 77, 116 (2023).
Jason, Y., Francesca-Zhoufan, L. & Frances, H. A. Opportunities and challenges for machine learning-assisted enzyme engineering. ACS Cent. Sci. 10, 226–241 (2024).
Freund, N. et al. A two-residue nascent-strand steric gate controls synthesis of 2′-O-methyl- and 2′-O-(2-methoxyethyl)-RNA. Nat. Chem. 15, 91–100 (2023).
Eva, S. H. et al. Reverse transcription as key step in RNA in vitro evolution with unnatural base pairs. RSC Chem. Biol. 5, 556–566 (2024).
Pagès-Gallego, M. et al. Direct detection of 8-oxo-dG using nanopore sequencing. Nat. Commun. 16, 5236 (2025).
Perez, M. et al. Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning. Nat. Commun. 16, 6980 (2025).
Thomas, C. A. et al. Sequencing a DNA analog composed of artificial bases. Nat. Commun. 16, 7240 (2025).
Kovaka, S. et al. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. Nat. Methods 22, 681–691 (2025).
Stanojević, D., Li, Z., Bakić, S., Foo, R. & Šikić, M. Rockfish: a transformer-based model for accurate 5-methylcytosine prediction from nanopore sequencing. Nat. Commun. 15, 5580 (2024).
Wang, Z. et al. Training data diversity enhances the basecalling of novel RNA modification-induced nanopore sequencing readouts. Nat. Commun. 16, 679 (2025).
Lei, L., Han, Q., Wei, X., Yujuan, W. & Kedong, B. Evaluating graphene and molybdenum disulfide nanopores for DNA sequencing. In 2023 IEEE 18th International Conference on Nano/Micro Engineered and Molecular Systems (NEMS). Institute of Electrical and Electronics Engineers (IEEE), 188–192 (2023).
Keasling, J. D. Manufacturing molecules through metabolic engineering. Science 330, 1355–1358 (2010).
Doudna, J. A. & Charpentier, E. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014).
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Lindahl, T. Instability and decay of the primary structure of DNA. Nature 362, 709–715 (1993).
Kornienko, I. V., Aramova, O. Y., Tishchenko, A. A., Rudoy, D. V. & Chikindas, M. L. RNA stability: a review of the role of structural features and environmental conditions. Molecules 29, 5978 (2024).
Burmeister, P. E. et al. Direct in vitro selection of a 2’-O-methyl aptamer to VEGF. Chem. Biol. 12, 25–33 (2005).
Vester, B. & Wengel, J. LNA (locked nucleic acid): high-affinity targeting of complementary RNA and DNA. Biochemistry 43, 13233–13241 (2004).
Maola, V. A. et al. Directed evolution of a highly efficient TNA polymerase achieved by homologous recombination. Nat. Catal. 7, 1173–1185 (2024).
Majumdar, B., Sarma, D., Yu, Y., Lozoya-Colinas, A. & Chaput, J. C. Increasing the functional density of threose nucleic acid. RSC Chem. Biol. 5, 41–48 (2024).
Li, S. et al. Exploring potential biosafety implications in DNA information storage. Biosaf. Health 7, 132–139 (2025).
Ou, Y. & Guo, S. Safety risks and ethical governance of biomedical applications of synthetic biology. Front. Bioeng. Biotechnol. 11, 1292029 (2023).
Adamala, K. P. et al. Confronting risks of mirror life. Science 386, 1351–1353 (2024).
Karpenko, D. V. A method using one fluorophore signal in Sanger read to determine CpG methylation in bisulfite converted DNA. Russian J. Genet. 59, 1255–1262 (2023).
Wang, P., Mu, Z., Sun, L., Si, S. & Wang, B. Hidden addressing encoding for DNA storage. Front. Bioeng. Biotechnol. 10, 916615 (2022).
Sharief, S. A., Chahal, P. & Alocilja, E. Application of DNA sequences in anti-counterfeiting: current progress and challenges. Int. J. Pharm. 602, 120580 (2021).
Cimino, G. D., Shi, Y. B. & Hearst, J. E. Wavelength dependence for the photoreversal of a psoralen-DNA cross-link. Biochemistry 25, 3013–3020 (1986).
Knutson, S. D. et al. Thermoreversible control of nucleic acid structure and function with glyoxal caging. J. Am. Chem. Soc. 142, 17766–17781 (2020).
Erlich, Y. & Zielinski, D. DNA fountain enables a robust and efficient storage architecture. Science 355, 950–954 (2017).
Lin, L. et al. Molecular-level insights on the reactive facet of carbon nitride single crystals photocatalysing overall water splitting. Nat. Catal. 3, 649–655 (2020).
Jiang, F. et al. A general temperature-guided language model to design proteins of enhanced stability and activity. Sci. Adv. 10, eadr2641 (2024).
Singh, G. et al. RUBICON: a framework for designing efficient deep learning-based genomic basecallers. Genome Biol. 25, 49 (2024).
Song, J. et al. DEMINERS enables clinical metagenomics and comparative transcriptomic analysis by increasing throughput and accuracy of nanopore direct RNA sequencing. Genome Biol. 26, 76 (2025).
Pezo, V. et al. Noncanonical DNA polymerization by aminoadenine-based siphoviruses. Science 372, 520–524 (2021).
Heyn, H. & Esteller, M. An adenine code for DNA: a second life for N6-methyladenine. Cell 161, 710–713 (2015).
Dunn, M. R., McCloskey, C. M., Buckley, P., Rhea, K. & Chaput, J. C. Generating biologically stable TNA aptamers that function with high affinity and thermal stability. J. Am. Chem. Soc. 142, 7721–7724 (2020).
Piao, X., Wang, H., Binzel, D. W. & Guo, P. Assessment and comparison of thermal stability of phosphorothioate-DNA, DNA, RNA, 2’-F RNA, and LNA in the context of Phi29 pRNA 3WJ. RNA 24, 67–76 (2018).
Stein, C. A., Subasinghe, C., Shinozuka, K. & Cohen, J. S. Physicochemical properties of phosphorothioate oligodeoxynucleotides. Nucleic Acids Res. 16, 3209–3221 (1988).
Nielsen, P. E., Egholm, M. & Buchardt, O. Peptide nucleic acid (PNA). A DNA mimic with a peptide backbone. Bioconjug. Chem. 5, 3–7 (1994).
Tri et al. Antisense oligonucleotide modified with serinol nucleic acid (SNA) induces exon skipping in mdx myotubes. RSC Adv. 7, 34049–34052 (2017).
Khudyakov, I. Y., Kirnos, M. D., Alexandrushkina, N. I. & Vanyushin, B. F. Cyanophage S-2L contains DNA with 2,6-diaminopurine substituted for adenine. Virology 88, 8–18 (1978).
Wan, L. Q., Yi, J., Lam, S. L., Lee, H. K. & Guo, P. 5-methylcytosine substantially enhances the thermal stability of DNA minidumbbells. Chem.-Eur. J. 27, 6740–6747 (2021).
Singh, S. K. et al. Characterization of DNA with an 8-oxoguanine modification. Nucleic Acids Res. 39, 6789–6801 (2011).
Ziomek, K., Kierzek, E., Biala, E. & Kierzek, R. The thermal stability of RNA duplexes containing modified base pairs placed at internal and terminal positions of the oligoribonucleotides. Biophys. Chem. 97, 233–241 (2002).
Pallan, P. S. et al. Unexpected origins of the enhanced pairing affinity of 2’-fluoro-modified RNA. Nucleic Acids Res. 39, 3482–3495 (2011).
Wang, Y. J., Vorperian, A., Shehabat, M. & Chaput, J. C. Evaluating the catalytic potential of a general RNA-cleaving FANA enzyme. Chembiochem 21, 1001–1006 (2020).
Matsuda, S. et al. Shorter is better: the α-(l)-threofuranosyl nucleic acid modification improves stability, potency, safety, and Ago2 binding and mitigates off-target effects of small interfering RNAs. J. Am. Chem. Soc. 145, 19691–19706 (2023).
Filichev, V. V., Christensen, U. B., Pedersen, E. B., Babu, B. R. & Wengel, J. Locked nucleic acids and intercalating nucleic acids in the design of easily denaturing nucleic acids: thermal stability studies. Chembiochem 5, 1673–1679 (2004).
Kashida, H., Murayama, K. & Asanuma, H. Acyclic artificial nucleic acids with phosphodiester bonds exhibit unique functions. Polym. J. 48, 781–786 (2016).
Suggs, J. W. & Taylor, D. A. Evidence for sequence-specific conformational-changes in DNA from the melting temperatures of DNA phosphorothioate derivatives. Nucleic Acids Res. 13, 5707–5716 (1985).
Lan, W. X. et al. Structural investigation into physiological DNA phosphorothioate modification. Sci. Rep. 6, 25737 (2016).
Gryaznov, S. M. et al. Oligonucleotide N3′→P5′ phosphoramidates. Proc. Natl. Acad. Sci. USA 92, 5798–5802 (1995).
Eriksson, M. & Nielsen, P. E. PNA-nucleic acid complexes. Structure, stability and dynamics. Q. Rev. Biophys. 29, 369–394 (1996).
Murayama, K., Tanaka, Y., Toda, T., Kashida, H. & Asanuma, H. Highly stable duplex formation by artificial nucleic acids acyclic threoninol nucleic acid (aTNA) and serinol nucleic acid (SNA) with acyclic scaffolds. Chem.-Eur. J. 19, 14151–14158 (2013).
Czernecki, D. et al. How cyanophage S-2L rejects adenine and incorporates 2-aminoadenine to saturate hydrogen bonding in its DNA. Nat. Commun. 12, 2420 (2021).
Cozens, C., Pinheiro, V. B., Vaisman, A., Woodgate, R. & Holliger, P. A short adaptive path from DNA to RNA polymerases. Proc. Natl. Acad. Sci. USA 109, 8067–8072 (2012).
Aschenbrenner, J., Drum, M., Topal, H., Wieland, M. & Marx, A. Direct sensing of 5-methylcytosine by polymerase chain reaction. Angew. Chem. Int. Ed. 53, 8154–8158 (2014).
Malyshev, D. A., Seo, Y. J., Ordoukhanian, P. & Romesberg, F. E. PCR with an expanded genetic alphabet. J. Am. Chem. Soc. 131, 14620–14621 (2009).
Sun, L. et al. From polymerase engineering to semi-synthetic life: artificial expansion of the central dogma. RSC Chem. Biol. 3, 1173–1197 (2022).
Bai, J., Zou, J., Cao, Y., Du, Y. & Chen, T. Recognition of an unnatural base pair by tool enzymes from bacteriophages and its application in the enzymatic preparation of DNA with an expanded genetic alphabet. ACS Synth. Biol. 12, 2676–2690 (2023).
Hamashima, K., Soong, Y. T., Matsunaga, K., Kimoto, M. & Hirao, I. DNA sequencing method including unnatural bases for DNA aptamer generation by genetic alphabet expansion. ACS Synth. Biol. 8, 1401–1410 (2019).
Switzer, C., Moroney, S. E. & Benner, S. A. Enzymatic incorporation of a new base pair into DNA and RNA. J. Am. Chem. Soc. 111, 8322–8323 (1989).
Chen, T. J. et al. Evolution of thermophilic DNA polymerases for the recognition and amplification of C2′-modified DNA. Nat. Chem. 8, 557–563 (2016).
Chen, D., Han, Z., Liang, X. & Liu, Y. Engineering a DNA polymerase for modifying large RNA at specific positions. Nat. Chem. 17, 382–392 (2025).
Liu, Z., Chen, T. & Romesberg, F. E. Evolved polymerases facilitate selection of fully 2’-OMe-modified aptamers. Chem. Sci. 8, 8179–8182 (2017).
Chen, T. & Romesberg, F. E. A method for the exponential synthesis of RNA: introducing the polymerase chain transcription (PCT) reaction. Biochemistry 56, 5227–5228 (2017).
Medina, E., Yik, E. J., Herdewijn, P. & Chaput, J. C. Functional comparison of laboratory-evolved XNA polymerases for synthetic biology. ACS Synth. Biol. 10, 1429–1437 (2021).
Nikoomanzar, A., Dunn, M. R. & Chaput, J. C. Evaluating the rate and substrate specificity of laboratory evolved XNA polymerases. Anal. Chem. 89, 12622–12625 (2017).
Larsen, A. C. et al. A general strategy for expanding polymerase function by droplet microfluidics. Nat. Commun. 7, 11235 (2016).
Horhota, A. et al. Kinetic analysis of an efficient DNA-dependent TNA polymerase. J. Am. Chem. Soc. 127, 7427–7434 (2005).
Dangerfield, T. L., Kirmizialtin, S. & Johnson, K. A. Substrate specificity and proposed structure of the proofreading complex of T7 DNA polymerase. J. Biol. Chem. 298, 101627 (2022).
Smith, D. C. et al. Nanopores map the acid-base properties of a single site in a single DNA molecule. Nucleic Acids Res. 52, 7429–7436 (2024).
Rand, A. C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nat. Methods 14, 411–413 (2017).
Harrison, J., Stirzaker, C. & Clark, S. J. Cytosines adjacent to methylated CpG sites can be partially resistant to conversion in genomic bisulfite sequencing leading to methylation artifacts. Anal. Biochem. 264, 129–132 (1998).
Georgieva, D., Liu, Q., Wang, K. & Egli, D. Detection of base analogs incorporated during DNA replication by nanopore sequencing. Nucleic Acids Res. 48, e88 (2020).
Müller, C. A. et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat. Methods 16, 429–436 (2019).
Maier, K. C., Gressel, S., Cramer, P. & Schwalb, B. Native molecule sequencing by nano-ID reveals synthesis and stability of RNA isoforms. Genome Res. 30, 1332–1344 (2020).
Ledbetter, M. P. et al. Nanopore sequencing of an expanded genetic alphabet reveals high-fidelity replication of a predominantly hydrophobic unnatural base pair. J. Am. Chem. Soc. 142, 2110–2114 (2020).
Yamashige, R. et al. Highly specific unnatural base pair systems as a third base pair for PCR amplification. Nucleic Acids Res. 40, 2793–2806 (2012).
Ellefson, J. W. et al. Synthetic evolutionary origin of a proofreading reverse transcriptase. Science 352, 1590–1593 (2016).
Stephenson, W. et al. Direct detection of RNA modifications and structure using single-molecule nanopore sequencing. Cell Genom. 2, 100097 (2022).
Wang, Y., Ngor, A. K., Nikoomanzar, A. & Chaput, J. C. Evolution of a general RNA-cleaving FANA enzyme. Nat. Commun. 9, 5067 (2018).

Acknowledgements

The authors acknowledge the facility supported by the Hangzhou Institute of Medicine, Chinese Academy of Sciences. This work was supported by the National Key Research and Development Program of China (Grant Nos. 2022YFC3400400 and 2021YFF1200200), the National Natural Science Foundation of China (No. 22307123), Leading Health Talents in Zhejiang Province (WS2022LJ01), and the Natural Science Foundation of Zhejiang Province (No. YXD23B0301).

Author information

Authors and Affiliations

School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Yan Wang
Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, China
Yufeng Pei, Linlin Tang, Songtao Zhou & Jie Song
Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, China
Xinyu Sun

Contributions

Y.W.: data curation, formal analysis, visualization, writing–original draft, review & editing; Y.P.*: conceptualization, data curation, formal analysis, funding acquisition, investigation, writing–original draft, review & editing; L.T.: supervision, methodology, validation, review & editing; X.S.: conceptualization, formal analysis, software, supervision; S.Z.: resources, software, supervision, visualization; J.S.*: formal analysis, funding acquisition, investigation, project administration, validation, visualization.

Corresponding authors

Correspondence to Yufeng Pei or Jie Song.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Yi Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, Y., Pei, Y., Tang, L. et al. Advances and challenges in non-canonical nucleic acids data storage. Nat Commun 17, 2354 (2026). https://doi.org/10.1038/s41467-026-68708-6

Received: 23 May 2025
Accepted: 02 January 2026
Published: 04 February 2026
Version of record: 11 March 2026
DOI: https://doi.org/10.1038/s41467-026-68708-6
