Introduction

The digital evolution of information technology has been driven largely by continuous innovation in data storage. Since the mid-20th century, silicon-based semiconductors have enabled the explosive growth of the information age1. However, as global data volumes are projected to exceed 175 zettabytes by 2025 (ref. 2), conventional storage media, including magnetic devices, optical discs, and semiconductor memories, are approaching their physical and economic limits. Although magnetic storage boasts a theoretical maximum capacity of ~10¹⁴ bits/cm³ (ref. 3), real-world devices typically last fewer than 50 years even under ideal conditions4 and are highly sensitive to environmental stress. Long-term data maintenance is also prohibitively expensive5; maintaining 10⁹ GB of tape storage, for example, costs about $1 billion per decade6. Furthermore, the operation of large-scale data centers contributes significantly to global energy demand (~5% of electricity use) and greenhouse gas emissions (~2%), underscoring an urgent need for sustainable alternatives7,8. To enhance data storage capacity and durability while reducing costs, various novel data storage media have been developed. Glass-based systems, such as 5D optical storage in fused silica, leverage femtosecond laser writing to create nanostructured voxels with ultra-high density (up to 360 TB/disc) and longevity (>14 billion years)9. Metamaterial-based technologies expand possibilities through engineered electromagnetic properties, enabling terahertz-frequency data encoding and bypassing classical diffraction limits10. Other emerging platforms include phase-change materials, which utilize reversible structural transitions for non-volatile memory11.

As the most compact and durable data carrier, DNA has emerged as a promising medium for molecular data storage, offering ultrahigh storage density (455 EB/g12), minimal energy consumption5, and long-term stability13,14, as evidenced by successful retrieval of genetic material from ancient specimens dating back hundreds of thousands of years15. DNA’s well-understood biochemistry and compatibility with existing molecular biology techniques (e.g., PCR, sequencing) further enhance its attractiveness as a storage substrate. Additionally, DNA can be preserved and maintained under mild conditions, and biochemical experiments such as PCR or cellular proliferation require minimal energy input5, significantly reducing the cost of data storage, replication, and transmission.

Nevertheless, several challenges hinder the widespread application of DNA as a material for data storage, including poor chemical and biological stability in the open environment, restricted chemical diversity, and constrained storage capacity. DNA is particularly vulnerable to DNase16,17, extreme pH conditions18, ultraviolet radiation19, and various chemicals that can induce strand breaks and result in information loss20,21. Such DNA-damaging factors are commonly encountered in practical applications and significantly limit the broader use of DNA as a storage medium.

To address these limitations, several innovative strategies have been developed to enhance the stability, functionality, and range of applications of nucleic acids. These strategies include the use of DNA nanostructures (such as DNA origami)22,23, chemical modifications (such as non-canonical nucleic acids, ncNAs)24,25, and encapsulation in hydrogels26, metal-organic frameworks27, and SiO2 particles28. Each approach targets different aspects of nucleic acid limitations and contributes to their broader application in data storage29, molecular computing30, diagnostics31, drug delivery32, and synthetic biology33.

Among these strategies, one of the most attractive is the use of ncNAs, which carry natural or chemical modifications in the base, sugar, or backbone, or adopt mirror-image isomerism34. Compared with canonical nucleic acids such as DNA or RNA, these evolved or artificially designed alterations generally confer distinctive properties and offer unique advantages in data storage35. (i) The incorporation of ncNAs broadens the information-encoding alphabet and enhances storage capacity. (ii) NcNAs exhibit unique orthogonality, enabling data encryption and multithreaded molecular computation, which enhances the security of data storage and computational complexity. (iii) Some endogenous modifications of nucleic acids provide intrinsic advantages in data storage, editing, and erasure in living systems. (iv) Modifications on bases, sugars, and backbones change physical, chemical, and biological properties, enhancing stability against degradation in the open environment. Hence, ncNAs are emerging as promising materials for data storage.

Structure-property relationships of non-canonical nucleic acids

Structure diversity of non-canonical nucleic acids

NcNAs, defined by chemical modifications beyond the canonical nucleotides, are generally classified into several categories: nucleobase modifications (including both naturally occurring non-canonical bases and engineered unnatural base pairs, UBPs), sugar modifications, phosphate backbone modifications, and mirror-image isomerism34 (Fig. 1).

Fig. 1: Major categories of non-canonical nucleic acids (ncNAs) modifications.

Overview of representative ncNA modification strategies, including base, sugar, backbone, and multiple modifications, and mirror-image isomerism.

Base modifications aim to expand the genetic alphabet and diversify base-pairing capabilities36. Naturally occurring bases such as 2,6-diaminopurine (2-amino-A), which forms three hydrogen bonds with thymine (T), can increase the hybridization affinity of DNA duplexes37,38. Epigenetic base modifications, including N6-methyldeoxyadenosine (6mA), C5-methylcytosine (5mC), and their oxidized derivatives (e.g., N6-hydroxymethyladenosine, 6hmA, and 5-hydroxymethylcytosine, 5hmC), play essential regulatory roles in gene expression39.

Halogenated bases such as 5-fluorouracil (5-F-U)40, 5-chlorouracil (5-Cl-U)41, 5-bromouracil (5-Br-U)42, and 5-iodouracil (5-I-U)43 expand the chemical diversity of nucleic acids, enabling enhanced base-pairing interactions and offering potential for fine-tuned regulation of DNA hybridization and stability. Engineered UBPs have extended the repertoire of orthogonal pairing interactions. Alphabets of up to 12 letters have been developed (A, T, G, C, B, S, P, Z, X, K, J, and V)44, forming six mutually exclusive base pairs. Notably, Hachimoji DNA and RNA systems, whose eight-letter alphabets add the synthetic nucleotides Z, P, S, and B (dZ, dP, dS, and dB in DNA) to the four canonical ones, allow for enhanced data storage capacity45. In addition, base pairs such as NaM-5SICS and TPT3-NaM, which rely on hydrophobic rather than hydrogen-bonding interactions46,47,48, have demonstrated the feasibility of alternative pairing mechanisms for information encoding and retrieval. Moreover, the unnatural pair between 7-(2-thienyl)-imidazo[4,5-b]pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px), in which Px contains a five-membered pyrrole ring49, further illustrates structural and steric complementarity as a design principle50. Finally, the isoC-isoG base pair, composed of 6-amino-2-ketopurine (isoG) and 2-amino-4-ketopyrimidine (isoC), is a structural isomer of the G-C base pair and forms three hydrogen bonds through a different hydrogen-bonding geometry, demonstrating the possibility of alternative base-pairing modes51.

Sugar modifications involve substitution of the native ribose or deoxyribose sugars with alternative chemical structures, significantly influencing the conformational flexibility, thermodynamic stability, and enzymatic recognition of the modified nucleic acids52. For instance, 2′-O-methylation (2′-OMe) is a common epitranscriptomic modification of RNA, whereby a methyl group is added to the 2′ hydroxyl group of the ribose moiety53. Likewise, in 2′-F-RNA, a fluorine atom replaces the 2′-OH group of RNA, forming more stable duplexes with RNA54. Arabino nucleic acid (ANA), which has the 2′-OH in a configuration opposite to that of RNA, adopts a DNA-like conformation55. In 2′-deoxy-2′-fluoroarabino nucleic acid (FANA), the ribose is replaced by a 2′-fluoroarabinose moiety that adopts a C2′/O4′-endo conformation, reminiscent of B-form DNA56, thereby enhancing hybridization affinity57. Threose nucleic acid (TNA), comprising a four-carbon sugar and a 2′,3′-linked phosphodiester backbone58,59, adopts a B-helical conformation with an average twist of 36° and a helical rise of 3.2 Å per nucleotide60,61, and demonstrates robust hybridization with both DNA and RNA. Locked nucleic acid (LNA) incorporates a methylene bridge connecting the 2′-oxygen and 4′-carbon of the ribose ring, conformationally locking the sugar in a C3′-endo pucker and thereby increasing thermal stability and affinity62,63. 1,5-Anhydrohexitol nucleic acid (HNA), derived from a six-membered hexitol ring, features 2′,3′-dideoxy-1′,5′-anhydro-D-arabino-hexitol nucleosides with 4′-6′ phosphodiester linkages and base attachment at the 2′-position64. HNA forms stable antiparallel duplexes with RNA, underscoring its structural versatility65. Glycol nucleic acid (GNA) features a backbone composed of repeating glycol units linked by phosphodiester bonds66, with nucleobases attached to the glycol units, and it has been investigated for its potential in various biotechnological applications. L-alpha-threose nucleic acid (L-αTNA) and D-alpha-threose nucleic acid (D-αTNA) are a pair of nucleic acid analogues that share a threose sugar backbone but differ in the stereoconfiguration of the sugar (L- and D-, respectively); L-αTNA often exhibits enhanced stability and specific binding characteristics, while D-αTNA may display distinct physicochemical properties owing to its opposite sugar configuration67. Xylonucleic acid (XyloNA), in which ribose is replaced by xylose in the backbone, displays higher thermal stability than DNA or RNA in self-hybridization and pairs with neither DNA nor RNA68.

Phosphate backbone modifications involve chemical alterations to the phosphate backbone, resulting in changes to strand conformation, charge distribution, and biochemical stability69. A well-known backbone modification is the introduction of phosphorothioate linkages (PS-DNA), wherein one of the non-bridging oxygen atoms in the phosphate group is substituted by sulfur, enhancing nuclease resistance and altering electrostatic properties70,71. The N3′ → P5′ phosphoramidate DNA (3′-NP DNA) is a synthetic oligonucleotide analogue in which the 3′-oxygen of the natural phosphodiester backbone is replaced by an amino group, resulting in an N → P linkage72. Peptide nucleic acids (PNAs) exemplify this strategy by replacing the phosphate-sugar backbone with a neutral N-(2-aminoethyl)glycine moiety, thereby conferring resistance to nucleases and proteases73 and enabling high-affinity hybridization via Hoogsteen-like base pairing74. Serinol nucleic acid (SNA) has an acyclic phosphodiester backbone based on serinol (2-amino-1,3-propanediol).

With increasing availability and characterization of individual modifications, combinatorial strategies are emerging that synergistically integrate multiple modifications to harness their respective advantages. For example, antisense oligonucleotides incorporating PS-FANA exhibit both high binding affinity to target RNAs and efficient RNase H-mediated cleavage75. Similarly, TNA-TPT3/NaM constructs combine the structural stability of TNA with the enhanced data storage capacity of TPT3/NaM UBPs, expanding the chemical diversity and functional repertoire of synthetic genetic systems76.

Mirror-image isomerism, such as in L-DNA, an enantiomer of natural D-DNA, confers enhanced resistance to nuclease-mediated degradation due to the chiral specificity of biological enzymes77. In addition to increased biological stability, L-DNA exhibits reduced immunogenicity, making it a promising scaffold for therapeutic and diagnostic applications78.

In summary, the structural diversity of ncNAs imparts a rich spectrum of physicochemical and biological properties, including improved stability, expanded base-pairing systems, enhanced storage capacity, and increased functional versatility. These attributes underlie their potential for diverse applications in molecular data storage, computation, and synthetic biology, as discussed in the following sections.

Special properties of non-canonical nucleic acids

NcNAs exhibit enhanced chemical and biological stability due to their distinctive structural features (Table 1). Structural modifications on the sugar, backbone, or bases often introduce conformational constraints and steric hindrance, which reduce susceptibility to enzymatic degradation by nucleases. Such modifications can also alter the electrostatic potential and hydrogen-bonding profile of the nucleic acid, thereby increasing resistance to chemical or physical attack. Additionally, the incorporation of hydrophobic moieties reduces solvent accessibility, further minimizing hydrolytic cleavage.

Table 1 Biological and chemical stability of non-canonical nucleic acids

For example, synthetic analogues such as HNA79, L-αTNA80, PS-DNA81, NP-DNA72, and L-DNA78,82, demonstrate remarkable resistance to nuclease-mediated degradation owing to their modified sugar or backbone chemistries. In terms of chemical stability, several ncNAs, including FANA54, TNA83, and PNA84,85, display resistance under acidic conditions. Furthermore, modifications at the 2′-position of the ribose, such as 2′-F-RNA and 2′-OMe-RNA, are well known to significantly enhance resistance to hydrolysis86. NP-DNA also exhibits strong tolerance to divalent and transition metal ions87, underscoring its suitability for chemically challenging environments.

Although structurally distinct from canonical nucleic acids, many ncNAs retain the capability for Watson-Crick base pairing and programmable hybridization, often with thermodynamic properties comparable to those of canonical duplexes88. Their hybridization affinity, strand selectivity, and overall duplex stability are governed by van der Waals forces, electrostatics, hydrogen bonding, and solvation energetics.

Sugar-modified nucleic acids such as FANA89, HNA90, and L-αTNA91 can form stable homoduplexes as well as heteroduplexes with DNA or RNA. The melting temperature (Tm), defined as the temperature at which the hybridized and dissociated states of a duplex are equally populated, is a widely used parameter for assessing duplex thermal stability and hybridization strength. Several sugar-modified ncNAs exhibit elevated Tm values relative to canonical nucleic acids. Unlike in DNA, 2′-substituted sugar modifications pre-organize the sugar conformation, which entropically facilitates hybridization89. For example, ANA increases Tm by ~10.2 °C per incorporation65, and LNA raises Tm by ~10 °C per substitution92. However, the Tm of TNA-DNA duplexes is lower than that of DNA-DNA duplexes93 because the distance between adjacent phosphate groups in TNA does not perfectly match that in DNA94. This geometric mismatch leads to asymmetric “breathing” fluctuations of the TNA-DNA duplex95 and thus increases the off-rate of the hybridized duplex96. For clarity, Table 2 summarizes average Tm shifts across representative systems.
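The definition of Tm given above can be made concrete with a simple two-state model. The sketch below is illustrative only: it assumes a non-self-complementary duplex A + B ⇌ AB with equal strand concentrations and temperature-independent ΔH and ΔS, and the numerical values used in the usage note are hypothetical rather than taken from Table 2.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def melting_temperature(dH, dS, c_total):
    """Tm in deg C for A + B <-> AB with equal strand concentrations.

    dH in kcal/mol, dS in kcal/(mol*K), c_total = total strand concentration (M).
    At Tm half of the strands are in duplex, which gives K = 4 / c_total.
    """
    return dH / (dS + R * math.log(c_total / 4.0)) - 273.15

def fraction_duplex(temp_c, dH, dS, c_total):
    """Fraction of strands in the duplex state at temp_c (two-state model)."""
    T = temp_c + 273.15
    K = math.exp(-(dH - T * dS) / (R * T))  # association constant
    a = K * c_total / 2.0
    # theta solves a * (1 - theta)^2 = theta; take the root in [0, 1]
    return (1.0 + 2.0 * a - math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)

# Hypothetical duplex parameters, chosen only for illustration
dH, dS, c_total = -80.0, -0.22, 1e-6
tm = melting_temperature(dH, dS, c_total)
print(f"Tm = {tm:.1f} C, duplex fraction at Tm = {fraction_duplex(tm, dH, dS, c_total):.2f}")
```

In this picture, a modification that makes ΔS less unfavorable (for example through sugar pre-organization) or ΔH more favorable shifts Tm upward, which is the effect reported per substitution above.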

Table 2 Thermodynamic properties of non-canonical nucleic acids

Hybridization affinity and thermal stability are critical not only for molecular durability but also for controlled data retrieval. Duplex denaturation and re-annealing, required during PCR, capture, or strand-displacement readout, are governed by Tm-dependent kinetics (kon/koff). Insufficient stability risks spontaneous strand loss, whereas strong duplex pairing can protect nucleic acids from hydrolysis and nuclease attack. However, excessive stability hampers deliberate strand separation and slows amplification, and it may also promote self-folding, which impedes data readout. Hence, it is essential to balance sequence composition, modification density, and readout conditions for optimized storage performance.

Overview of nucleic acid-based data storage

Owing to their high storage density and exceptional programmability, nucleic acids have emerged as a highly promising substrate for molecular data storage and have attracted substantial research interest in recent years. Nucleic acid-based data storage generally comprises four key steps: encoding, writing, preservation, and reading. The stored data can further be processed by molecular computation based on the communication (hybridization) within nucleic acid strands. In the encoding step, arbitrary information, including text, images, audio, and video, is first converted into binary strings (“0” and “1”) using standardized character encoding tables such as ASCII, UTF-8, or Unicode. These binary sequences are subsequently mapped onto nucleotide sequences97 or higher-order nanostructures. The encoded oligonucleotides are typically synthesized through non-enzymatic or enzymatic approaches and preserved either in vitro or in vivo to ensure long-term stability. Information retrieval is commonly performed via sequencing or non-sequencing techniques. Notably, due to inherent challenges associated with the direct synthesis and sequencing of ncNAs, their systems often require additional processing steps, including transcription and reverse transcription98.

Storage units

In canonical sequence-based storage, information is encoded using fixed binary-to-nucleotide conversion rules or advanced encoding schemes such as Goldman encoding99, Grass encoding, and DNA Fountain100. The simplest binary mapping (e.g., A = 00, C = 01, T = 10, G = 11) achieves a theoretical data storage capacity of 2 bits per nucleotide101.
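As a minimal sketch of the simplest mapping mentioned above (A = 00, C = 01, T = 10, G = 11), the following converts bytes to a nucleotide string and back. The function names are hypothetical, and the sketch deliberately omits what practical schemes such as Goldman encoding or DNA Fountain add (homopolymer avoidance, GC balancing, addressing, and error correction).

```python
# 2 bits per nucleotide, using the mapping quoted in the text
BITS_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Convert bytes to a nucleotide string, 4 nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(seq: str) -> bytes:
    """Invert encode(); assumes an error-free sequence whose length is a multiple of 4."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

message = "Hi".encode("utf-8")
strand = encode(message)
print(strand)                  # CATACTTC
print(decode(strand).decode()) # Hi
```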

While sequence-based approaches have demonstrated remarkable potential in molecular data storage, they remain constrained by several inherent limitations of canonical nucleic acids. These include susceptibility to enzymatic degradation, limited chemical diversity, and a restricted four-letter genetic alphabet, which historically capped the maximum theoretical storage capacity at 2 bits per base (Fig. 2a-i)102,103. However, recent advancements in encoding techniques, such as DNA-QLC104, have shown that it is possible to surpass the limit of 2 bits per base, enabling much higher storage densities.

Fig. 2: Nucleic acid-based storage units.

a Sequence-encoded strategies. i Canonical 4-letter DNA encodes 2 bits per nucleotide. ii Unnatural base pairs (UBPs), such as the 12-letter AEGIS system, increase the theoretical storage capacity to 3.58 bits per nucleotide according to log2(N). iii The relationship between alphabet size (N) and coding capacity is shown. iv Sugar-modified nucleic acids (e.g., TNA) improve biochemical stability and data integrity. v Mirror-image (L-) nucleic acids offer high bio-orthogonality and have been explored for steganographic applications. vi Epigenetic modifications enable multi-layered coding and parallelized storage. b Nanostructure-encoded strategies. Static DNA structures: i DNA hairpins, ii DNA origami, and iii DNA tile arrays provide robust and addressable frameworks. Dynamic DNA structures: iv temperature-responsive structures, v pH-responsive structures, and vi light-responsive azobenzene-modified systems enable reversible signal switching between distinct bit states. In this figure, (a) (vi, “Epigenetic modification”) is reproduced with permission from Zhang, C. et al., Nature 634, 824–832 (2024). © 2024 Springer Nature. All other graphical elements in this figure were originally created by the authors using Adobe Illustrator.

To address these challenges, engineered ncNAs are being applied to data storage. They offer expanded coding capacity105, enhanced chemical and biological stability83, and greater compatibility with parallelized storage106, showing attractive potential as high-capacity, high-stability, and secure molecular data storage candidates. The theoretical storage capacity reaches \(\log_{2} N\) bits per nucleotide for an alphabet of \(N\) letters, so enlarging the alphabet beyond the four canonical bases increases capacity logarithmically (Fig. 2a-ii, iii). For example, incorporation of UBPs such as the B-S and Z-P pairs expands the alphabet to eight letters and increases the theoretical data capacity to 3 bits per nucleotide (Fig. 2a-ii)107,108. In addition, sugar-modified systems such as TNA can stably encode binary data using canonical mapping strategies (Fig. 2a-iv)76,109,110. L-DNA, the mirror-image enantiomer of natural D-DNA, maintains equivalent theoretical storage capacity while offering high bio-orthogonality, which greatly enhances resistance to natural nuclease degradation (Fig. 2a-v)78. Furthermore, epigenetic modifications based on 5mC have been explored, in which self-assembled DNA scaffolds are enzymatically methylated using DNMT1, overcoming the slow kinetics and high cost of de novo synthesis (Fig. 2a-vi)106. Collectively, these strategies enable non-canonical systems to surpass canonical DNA in both security and durability, paving the way for robust, orthogonal molecular data storage platforms.
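The logarithmic relationship between alphabet size and capacity can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation; the helper names and the 1 KB payload are hypothetical, and real systems fall below these upper bounds because of sequence constraints and error-correction overhead.

```python
import math

def bits_per_nucleotide(alphabet_size: int) -> float:
    """Upper bound on information per position for an N-letter alphabet."""
    return math.log2(alphabet_size)

def nucleotides_needed(n_bits: int, alphabet_size: int) -> int:
    """Minimum strand length (in nucleotides) needed to hold n_bits."""
    return math.ceil(n_bits / bits_per_nucleotide(alphabet_size))

payload_bits = 8 * 1024  # a hypothetical 1 KB payload
for n in (4, 8, 12):     # canonical DNA, 8-letter, and 12-letter alphabets
    print(n, round(bits_per_nucleotide(n), 2), nucleotides_needed(payload_bits, n))
```

Going from 4 to 12 letters raises the bound from 2 to about 3.58 bits per nucleotide, so the same payload needs roughly 44% fewer positions.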

While sequence-based approaches primarily rely on one-dimensional arrangements of canonical or modified nucleotides to represent digital information, DNA nanostructures leverage spatial folding and higher-order assembly to encode data in multiple dimensions (Fig. 2b)111. This structural encoding can substantially increase the data storage capacity, introduce additional layers of security, and diversify retrieval modalities.

For example, DNA hairpin structures further enrich the encoding capacity by exploiting the formation or disruption of secondary motifs. The presence or absence of such hairpins can be detected by fluorescence resonance energy transfer or other optical readouts, enabling simple and robust binary encoding schemes (Fig. 2b-i).

DNA origami utilizes long single-stranded scaffolds folded by numerous short staple strands into well-defined two- and three-dimensional architectures (Fig. 2b-ii), which are capable of representing digital symbols, images, or complex patterns that are readily visualized by atomic force microscopy (AFM) or transmission electron microscopy (TEM)112. In contrast to linear sequences, such spatial configurations allow the direct storage of recognizable shapes or logos, circumventing the need for base-by-base sequencing during retrieval.

Similarly, tile-based DNA arrays self-assemble into nanoscale lattices in which the presence, absence, or arrangement of specific tiles encodes binary or multilevel information (Fig. 2b-iii)111. These lattice architectures support parallel data storage and can be interrogated through surface imaging techniques or hybridization-based probing.

In this context, recent advances in deep learning models capable of accurately predicting the folding free energy of DNA sequences offer powerful tools to preemptively eliminate high-propensity hairpin-forming regions113, thereby streamlining the design of reliable and structurally defined storage architectures.

Dynamic DNA devices extend this paradigm by incorporating responsive elements that reversibly switch between distinct structural states in response to external stimuli such as temperature (Fig. 2b-iv)114, pH (Fig. 2b-v)115,116, or light117,118 (Fig. 2b-vi). This dynamic behavior enables rewritable and reconfigurable data storage systems that are challenging to achieve with purely sequence-based approaches. Furthermore, studies of the kinetics and dynamics of oligonucleotide hybridization have elucidated how sequence composition, environmental conditions, and conformational transitions influence pairing rates and stability119, providing a fundamental framework for designing responsive and high-fidelity DNA storage architectures.

In contrast to DNA and RNA, ncNAs are typically used to assemble relatively small nanostructures because long ncNA strands, especially sugar-modified ones, are difficult to obtain. Most strands in these ncNA nanostructures are produced by solid-phase synthesis. Compared with DNA nanostructures, ncNA nanostructures display unique advantages; for example, double-crossover nanostructures constructed from five FANA strands show higher thermostability and acid resistance120. RNA oligonucleotides containing the nucleoside analogues floxuridine (FUDR) and gemcitabine (GEM) can be assembled into four-way junction structures, which show high efficacy in the treatment of triple-negative breast cancer121. Constructing large ncNA nanostructures requires long ncNA strands, which must be transcribed by polymerases. Taylor et al.122 constructed an all-FANA octahedron origami structure, in which the long FANA strand (~1.7 kb) was synthesized by transcription from a long single-stranded DNA template using Tgo-D4K polymerase. Recently, a ~2000-nt single-stranded RNA origami incorporating nucleoside analogues (m5C, s2U, 5FU, and pseudouridine) and synthesized by T7 polymerase was reported to induce epigenetic immunomodulation123. As the number of ncNA polymerases obtained through directed evolution grows, ncNA origami and other large-scale nanostructures are expected to emerge. Collectively, these strategies illustrate how the programmability and architectural versatility of DNA nanotechnology can transcend the limitations of linear nucleotide sequences, offering a promising platform for high-capacity, secure, and multidimensional molecular data storage.

Writing

Writing processes in nucleic acid-based systems are typically classified into non-enzymatic and enzymatic synthesis methods.

Solid-phase synthesis using phosphoramidite chemistry remains the most widely adopted technique for synthesizing non-canonical oligonucleotides, including PNA, LNA, and TNA variants124. This method involves the stepwise coupling of protected nucleoside phosphoramidites to a growing oligonucleotide anchored on a solid support (Fig. 3a-i). However, synthesis efficiency is limited by side reactions that introduce errors (e.g., deletions, insertions, and substitutions) and by the cumulative loss from imperfect coupling (~95–99.5% coupling efficiency per cycle35), which generally restrict product length to under 120 nt125,126. Additionally, the technique demands tightly controlled environmental conditions and specialized reagents, limiting scalability and increasing cost (Table 3)127,128. Microarray-based synthesis offers an alternative for high-throughput production, leveraging parallel phosphoramidite reactions across surface-bound arrays129.
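The length limit follows directly from compounding the per-cycle coupling efficiency. As a rough sketch, assuming every coupling step succeeds independently with the same efficiency and ignoring other loss mechanisms (depurination, incomplete deprotection, purification), the fraction of full-length product for an n-mer is the efficiency raised to the power n − 1. The function name is hypothetical.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Expected fraction of full-length strands after stepwise synthesis.

    An n-mer needs n - 1 coupling steps, because the first nucleoside
    is already attached to the solid support.
    """
    return coupling_efficiency ** (length_nt - 1)

# Compare the two ends of the efficiency range quoted in the text
for efficiency in (0.95, 0.995):
    for length in (20, 60, 120):
        y = full_length_yield(length, efficiency)
        print(f"eff={efficiency:.3f} length={length:>3} nt  full-length fraction={y:.4f}")
```

At 99.5% efficiency roughly half of the 120-nt strands are still full length, whereas at 95% the full-length fraction drops below 1%, which is why per-cycle efficiency dominates the practical length ceiling.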

Fig. 3: Data writing and reading of nucleic acid data storage.

a Strategies for strand synthesis through non-enzymatic and enzymatic approaches: i solid-phase synthesis, ii metal-catalyzed ligation, iii polymerase-mediated extension, iv ligase-assisted assembly, and v TdT-mediated synthesis. b Preservation formats: i in vitro storage via solution, powder, silica encapsulation, and fiber embedding; ii in vivo storage through cellular transfection and genomic integration. c Retrieval of stored data: i physical extraction, ii PCR amplification, and iii CRISPR-Cas-based programmable editing. d Readout technologies: i Sanger sequencing, ii next-generation sequencing, iii nanopore sequencing, iv atomic force microscopy, v gel electrophoresis, and vi fluorescence microscopy. In this figure, (d) (“Reading”) is reproduced with permission from Li, B. et al., Advanced Materials 37, 2412926 (2025). © 2025 Wiley-VCH GmbH. All other graphical elements in this figure were originally created by the authors using Adobe Illustrator.

Table 3 Cost of oligonucleotides, phosphoramidite monomers, and triphosphate monomers for canonical and non-canonical nucleic acids

Template-directed chemical ligation offers an alternative route, especially valuable for substrates incompatible with enzymatic systems. In this strategy, phosphate groups are activated by coordination with N-cyanoimidazole (CNIm) and Mn²⁺ ions, enabling nucleophilic attack and subsequent phosphodiester bond formation (Fig. 3a-ii)130. This method has been successfully applied to the synthesis of chemically diverse backbones such as L-αTNA and 3’-NP-DNA131, supporting the incorporation of non-standard linkages with high fidelity.

DNA and RNA polymerases can incorporate modified triphosphates (xNTPs) into growing strands under physiological conditions (Fig. 3a-iii)132. Through directed evolution and rational design, engineered polymerases have been developed to accommodate specific non-canonical substrates133,134 (Table 4). These enzymes support phosphodiester bond formation at the 3′-OH via sequential nucleotidyl transfer, and in specific mutant contexts, can even facilitate phosphoramidate bond formation135. Recent advances involving trivalent ions (e.g., Sc³⁺) have significantly accelerated reaction kinetics87.

Table 4 Polymerase compatibility of non-canonical nucleic acids synthesis

Template-directed ligation offers a complementary approach for synthesizing long non-canonical strands without the need for monomer synthesis (Fig. 3a-iv). T3, T4, and T7 DNA ligases have been employed to ligate various modified strands, including FANA136, TNA, and LNA137, with high efficiency and sequence specificity. This method allows modular assembly of oligonucleotide segments (20–120 nt), facilitating the construction of longer polymers from shorter synthetic units.

Template-independent enzymatic oligonucleotide synthesis (TiEOS) leverages the template-independent activity of terminal deoxynucleotidyl transferase (TdT), which adds modified dNTPs to the 3′ end of ssDNA without a templating strand (Fig. 3a-v)138. TdT exhibits remarkable substrate tolerance, enabling incorporation of LNA139, 2′-OMe133, 2′-F133, and dTPT3TP140. TiEOS provides a programmable, modular framework for synthesizing structurally diverse nucleic acid libraries suitable for data storage and encryption. In parallel, cap-free TdT synthesis strategies combined with trit-based encoding and enzymatic length control have demonstrated scalable, low-cost production of high-fidelity DNA data pools, further extending the practical potential of enzyme-driven storage architectures141.
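To illustrate what a trit-based code can look like, the sketch below implements a generic rotating ternary code in the style of the Goldman encoding mentioned earlier: each trit selects one of the three nucleotides that differ from the previous one, so no homopolymer runs are produced. This is an assumption made for illustration and is not necessarily the scheme used in the cited TdT work; the function names and the starting nucleotide are hypothetical.

```python
ORDER = "ACGT"  # fixed cyclic order used to pick the next nucleotide

def encode_trits(trits, previous="A"):
    """Map trits (0, 1, 2) to nucleotides so that no base repeats its predecessor."""
    strand = []
    for trit in trits:
        if trit not in (0, 1, 2):
            raise ValueError("trits must be 0, 1, or 2")
        # Step 1, 2, or 3 positions around the cycle: never lands on `previous`
        nxt = ORDER[(ORDER.index(previous) + trit + 1) % 4]
        strand.append(nxt)
        previous = nxt
    return "".join(strand)

def decode_trits(strand, previous="A"):
    """Invert encode_trits() given the same starting nucleotide."""
    trits = []
    for base in strand:
        trits.append((ORDER.index(base) - ORDER.index(previous) - 1) % 4)
        previous = base
    return trits

trits = [0, 0, 0, 1, 2]
strand = encode_trits(trits)
print(strand)                # CGTCA
print(decode_trits(strand))  # [0, 0, 0, 1, 2]
```

Because each position carries one trit, such a code stores log2(3) ≈ 1.58 bits per nucleotide, trading capacity for the absence of repeated bases, which eases both synthesis and sequencing.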

Preservation

Efficient preservation of nucleic acid molecules is a critical prerequisite for long-term data integrity in molecular data storage systems. Preservation strategies can be broadly classified into in vitro and in vivo preservation (Fig. 3b), each offering distinct advantages in terms of storage capacity, stability, scalability, and flexibility.

In vitro preservation methods protect DNA outside living systems through physical and chemical stabilization. Approaches such as silica encapsulation, glass immobilization, and polymer-based matrices have enabled long-term storage under ambient conditions by minimizing hydrolysis and oxidation (Fig. 3b, left)5. DNA embedded in electrospun and polymer fibers also offers a scalable, solid-state format with high capacity and mechanical stability142, enabling protection against environmental degradation while maintaining accessibility for downstream retrieval.

In contrast, in vivo preservation strategies integrate synthetic DNA constructs into biological carriers and hosts such as plasmids, artificial chromosomes, microbes, or plants143,144,145,146. These systems benefit from the host’s intrinsic replication and repair mechanisms147 (Fig. 3b, right), which not only amplify stored sequences but also correct base-level errors and support dynamic, programmable data editing148,149.

However, both in vitro and in vivo platforms must contend with the chemical vulnerability of natural DNA to environmental perturbations150,151. In this regard, ncNAs such as TNA and L-DNA exhibit markedly enhanced stability due to their resistance to enzymatic degradation and limited interaction with endogenous cellular machinery78,109. These properties render chemically modified systems particularly attractive for applications requiring long-term archival storage and data integrity in hostile environments.

Readout

The final stage of nucleic acid-based data storage involves the retrieval and decoding of stored information. This process typically comprises two sequential steps: random access retrieval from complex DNA pools, and readout of the encoded information, either by sequencing or structural analysis152.

Selective retrieval of target DNA molecules from complex DNA pools is commonly achieved through physical extraction, PCR via designed primers, CRISPR-based techniques153, or digital microfluidics154. Magnetic bead-based physical extraction utilizes sequence-specific hybridization and magnetic separation for high-throughput, lossless isolation (Fig. 3c-i). PCR amplification, a widely employed method due to its specificity and scalability, is capable of retrieving target sequences from DNA pools containing millions of distinct oligonucleotides (Fig. 3c-ii)155. However, PCR is limited by primer design constraints, susceptibility to non-specific amplification, and incompatibility with rewritable storage architectures129,156. Programmable CRISPR-based DNA storage systems enable precise data manipulation and retrieval (Fig. 3c-iii)157, exemplified by Cas12a-driven multiplexed searches and Cas9-mediated rewriting via homology-directed repair158,159. Digital microfluidics divide large DNA pools into spatially segregated subpools on microfluidic chips to minimize strand interactions and enhance retrieval efficiency27,160,161. Despite improvements, this strategy still faces limitations due to molecular crowding and amplification-induced errors162,163.
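The primer-based random access described above can be illustrated with a minimal sketch. It assumes an idealized pool in which every strand is laid out as forward primer, payload, then the reverse complement of the reverse primer, and it treats PCR as exact string matching; real amplification also involves mispriming, bias, and stochastic dropout, which are not modeled. The function names and the example primer sequences are hypothetical.

```python
# Idealized model of PCR-style random access: select strands by their primer pair.
# Assumption: exact primer matches only; no mispriming or amplification bias.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a canonical DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def make_strand(fwd_primer: str, rev_primer: str, payload: str) -> str:
    """Assemble a strand as: forward primer + payload + reverse-primer binding site."""
    return fwd_primer + payload + reverse_complement(rev_primer)


def pcr_select(pool, fwd_primer: str, rev_primer: str):
    """Return the payloads of all strands flanked by the given primer pair."""
    tail = reverse_complement(rev_primer)
    selected = []
    for strand in pool:
        if strand.startswith(fwd_primer) and strand.endswith(tail):
            selected.append(strand[len(fwd_primer):len(strand) - len(tail)])
    return selected


# Hypothetical primer pairs addressing two "files" in one pool.
FILE_A = ("ACGTAC", "TTGCAG")
FILE_B = ("GGATCC", "CATGAG")

pool = [
    make_strand(*FILE_A, "AAAA"),
    make_strand(*FILE_B, "CCCC"),
    make_strand(*FILE_A, "GGGG"),
]

print(pcr_select(pool, *FILE_A))
```

In this toy model each primer pair acts as a file address, which is also why primer design constraints limit how many files can share one pool.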

For non-canonical systems (e.g., TNA, L-DNA), retrieval is typically performed using PCR-based methods78,110. Nevertheless, CRISPR-guided retrieval remains a promising direction, particularly when adapted to recognize unnatural bases.

Readout of stored information is predominantly performed using sequencing or non-sequencing readout techniques depending on the storage units.

Canonical DNA storage systems rely heavily on first-generation (Sanger) sequencing (Fig. 3d-i), next-generation sequencing (NGS) (Fig. 3d-ii)164 and third-generation (Nanopore-based) sequencing (Fig. 3d-iii), which enables long-read, real-time monitoring of DNA strands. Nanopore sequencing, in particular, has emerged as a versatile platform for direct reading of chemically modified nucleic acids165. For example, FANA strands can be read using Nanopore Induced Phase-Shift Sequencing (NIPSS) after ligation with DNA drive strands166. Phosphorothioate (PS) modifications can be detected via the ONT/ELIGOS platform167. DeepMod, a deep learning framework built on ONT data, achieves high accuracy in detecting epigenetic marks such as 5-methylcytosine (5mC, ~99%) and N6-methyladenine (6mA, ~90%)39.

When direct sequencing is not feasible, many systems rely on reverse transcription (RT) of modified strands into canonical DNA or RNA88. TNA-to-cDNA conversion, followed by Illumina sequencing, has demonstrated high fidelity (~99.2%)110. Base substitutions during RT enable detection of Z-P168, Ds-Px169, and TPT3-NaM170 pairs using conventional sequencing platforms (Sanger, NGS, or Nanopore). Although these methods expand compatibility, base conversion strategies often introduce mutagenic noise and sequence bias, particularly during multiple RT-PCR cycles, thereby compromising accuracy and potentially obscuring encrypted information78. In Table 5, we summarize the sequencing methods of ncNAs.

Table 5 Sequencing methods for non-canonical nucleic acids

For storage platforms that employ structural or conformational encoding, information retrieval is typically achieved through microscopy-based readout methods. AFM provides nanometer-resolution imaging of nanoscale motifs, enabling direct detection of features such as bulges, nicks, or conformational switches (Fig. 3d-iv)171. Gel electrophoresis offers a sequencing-free readout modality by distinguishing structural variants according to their differential migration rates (Fig. 3d-v). More recently, super-resolution microscopy techniques, such as stochastic optical reconstruction microscopy and DNA points accumulation for imaging nanoscale topography (DNA-PAINT), have enabled multiplexed, high-throughput decoding of DNA nanostructures by resolving structural states at the single-molecule level (Fig. 3d-vi)172,173. These sequencing-independent strategies are particularly advantageous for chemically diverse or orthogonal nucleic acid systems, as they bypass the limitations of base-calling algorithms and sequence fidelity. Importantly, structure-based readout approaches provide unique opportunities for secure or tamper-proof storage, where critical information is embedded in physical topology and spatial configuration rather than in linear sequence, rendering it inaccessible to conventional sequencing platforms.

Molecular computation

Like silicon-based electronic computers, DNA can process the data it stores. Building on the molecular recognition capability and data storage capacity of DNA, DNA-based molecular computation has been realized. Since Adleman pioneered the use of DNA computing to solve the Hamiltonian Path Problem174 in 1994, DNA has been regarded as a powerful molecular computing tool for solving intricate problems. Based on Watson-Crick-Franklin base pairing, primary DNA logic gates such as YES, NOT, OR, and AND gates have been designed175,176,177, and more complex circuits and networks have been developed178. A landmark breakthrough was reported in 2006 by Erik Winfree179, in which toehold-mediated strand displacement enabled the cascading and scaling up of computations. Subsequently, Qian et al.180 designed seesaw gate-based logic circuits and achieved DNA-based neural networks. Cherry et al. even demonstrated the implementation of a DNA-based supervised learning network181, highlighting the powerful role of DNA as a programmable and dynamic material in molecular computation. A heat-rechargeable computation system for DNA logic circuits and neural networks was also established, suggesting that DNA-based computation may develop in a more sustainable manner than other artificial systems181,182. Recent advances in biophysical techniques, particularly single-molecule fluorescence and magnetic tweezers, have enabled DNA computation at the single-molecule level183,184, thereby permitting real-time and controllable monitoring of the computing process and enhancing its interactivity.

However, current DNA computing still faces challenges, particularly low orthogonality: as computation scales up, greater circuit or network complexity increases the likelihood of unwanted leakage reactions from mismatched interactions. A promising strategy is to exploit the intrinsically high orthogonality of ncNAs. For example, L-DNA does not directly recognize natural DNA or RNA185, but can communicate with them via PNA186 or chimeric D/L-oligonucleotides187. An amplification circuit composed of D-αTNA is orthogonal to L-αTNA, DNA and RNA, but can be initiated by itself or SNA188. Conceptually, introducing ncNAs enhances the parallelism of nucleic acid computing, accelerating computation and extending its scalability.
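To make the leakage argument concrete, the following is a rough sketch of a crosstalk screen for circuit domains. It flags pairs of domains that share a long complementary stretch, and it skips pairs from different chemistries on the idealized assumption that orthogonal chemistries (for example L-DNA versus D-DNA) do not hybridize at all. The `max_run` threshold, the domain names, and the sequences are hypothetical, and a real design workflow would rely on thermodynamic models rather than substring matching.

```python
# Toy crosstalk screen: flag domain pairs with long complementary stretches.
# Assumptions (illustrative only): hybridization risk is approximated by the
# longest exactly complementary run, and different chemistries never cross-pair.
from itertools import combinations

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence written in A/C/G/T letters."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def crosstalk_pairs(domains, max_run=4):
    """Return (name_a, name_b, run) for same-chemistry pairs exceeding max_run."""
    flagged = []
    for (name_a, (chem_a, seq_a)), (name_b, (chem_b, seq_b)) in combinations(domains.items(), 2):
        if chem_a != chem_b:
            continue  # idealized: orthogonal chemistries do not interact
        run = longest_shared_run(seq_a, reverse_complement(seq_b))
        if run > max_run:
            flagged.append((name_a, name_b, run))
    return flagged


all_d_dna = {
    "d1": ("D-DNA", "ATTCGGAC"),
    "d2": ("D-DNA", "TTGTCCGAAT"),  # contains the complement of d1
    "d3": ("D-DNA", "GGGGTTTT"),
}
mixed = dict(all_d_dna, d2=("L-DNA", "TTGTCCGAAT"))

print(crosstalk_pairs(all_d_dna))
print(crosstalk_pairs(mixed))
```

In the all-D-DNA set, `d1` and `d2` are flagged as a potential leakage pair; assigning `d2` to a second chemistry removes the conflict without redesigning its sequence, which is the sense in which ncNAs add orthogonal channels.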

Advantages of non-canonical nucleic acids-based data storage

NcNAs, owing to their diverse chemical structures and physicochemical properties, present several notable advantages over their canonical counterparts for digital data storage (Table 6). These include expanded genetic alphabets, enhanced chemical and biological stability, and parallelized data writing.

Table 6 Advantages and challenges of non-canonical nucleic acids data storage

Expanded genetic alphabet

A major limitation of canonical DNA storage lies in its dependence on four standard nucleotides (A, T, G, C), restricting the maximum theoretical data storage capacity to 2 bits per base. UBPs expand this alphabet and thus enable higher-capacity storage48,189. For instance, a 12-letter artificially expanded genetic information system (AEGIS), which adds synthetic bases such as P, Z, B, S, X, J, K, and V to the four standard nucleotides, can theoretically encode up to 3.58 bits per base (log₂12)44, offering a significant improvement over canonical systems. Hydrophobic base pairs such as NaM-TPT3 further contribute to this expansion and enhance the functional diversity of the storage system190.
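The capacity figures quoted above follow directly from log₂ of the alphabet size. The short sketch below reproduces them and estimates how many bases a given payload would need; it is an upper bound that ignores indexing, error correction, and sequence constraints, and the function names are illustrative.

```python
# Theoretical per-base capacity as a function of alphabet size.
# This is an upper bound: no index, error-correction, or homopolymer constraints.
import math


def bits_per_base(alphabet_size: int) -> float:
    """Maximum information per position for an alphabet of the given size."""
    return math.log2(alphabet_size)


def bases_needed(n_bits: int, alphabet_size: int) -> int:
    """Minimum number of bases required to hold n_bits at the theoretical limit."""
    return math.ceil(n_bits / bits_per_base(alphabet_size))


for letters in (4, 6, 8, 12):
    print(f"{letters:2d} letters: {bits_per_base(letters):.2f} bits/base, "
          f"{bases_needed(8192, letters)} bases for 1 KiB")
```

Moving from 4 to 12 letters shortens an ideal encoding by a factor of log₂12 / log₂4 ≈ 1.79, not by a factor of 3, because capacity grows only logarithmically with alphabet size.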

Enhanced stability for long-term preservation

The chemical and enzymatic stability of the storage medium is critical for data integrity. Sugar-modified nucleic acids such as TNA exhibit enhanced biochemical and thermal stability, remaining intact even after prolonged exposure to biological environments such as human serum, where canonical DNA undergoes rapid degradation110. Moreover, hybrid systems combining sugar modifications and UBPs (e.g., TNA + NaM-TPT376) not only boost storage capacity but also introduce multifunctional advantages, including environmental responsiveness and intrinsic data encryption.

Bio-orthogonality

Benefiting from alterations in spatial structure, ncNAs can evade recognition by biomacromolecules, thereby achieving bio-orthogonality and enabling data storage independently of biological systems. Mirror-image nucleic acids (L-DNA) maintain the same sequence information capacity as D-DNA but provide superior biological stability due to their resistance to natural nucleases such as DNase I78. This significantly prolongs their lifespan in biological environments191, making them ideal for secure storage. Their bio-orthogonality ensures that they are not misread by natural systems, offering intrinsic data protection. However, the lack of compatible polymerases and sequencing tools makes L-DNA systems cost-intensive and technically challenging.

Parallel storage

Sequence-independent writing represents a promising frontier for scalable storage. The use of epigenetic markers, such as 5-methylcytosine (5mC), allows binary data to be encoded orthogonally onto existing sequences, with methylated cytosine (5mC) denoting “1” and unmethylated cytosine denoting “0”. This strategy, akin to movable type printing, employs site-specific methylation via DNMT1 to inscribe data without altering the underlying sequence192. Such systems are inherently parallelizable and reversible, and offer a reconfigurable, reusable storage architecture.

Collectively, ncNAs display diverse advantages in data storage, and modifications at different positions suit different application scenarios. For instance, base modifications may expand the storage capacity but do not by themselves improve biological or chemical stability. In contrast, modifications on the sugar ring and phosphate backbone can enhance the stability and orthogonality of nucleic acids; however, these modifications often hinder polymerase recognition, making the synthesis of long strands more challenging, and data retrieval is even more difficult than for base-modified systems. Furthermore, combined modifications, such as TNA + UBP, pair the nuclease-resistant scaffold of TNA with the expanded storage capacity of UBP. Mirror-image nucleic acids (e.g., L-DNA) exhibit bio-orthogonality by evading recognition from natural biomacromolecules, offering enhanced stability and security for data storage, though challenges remain in polymerase compatibility and sequencing tools. Finally, the parallelization of data writing using epigenetic markers like 5mC enables efficient, reversible, and scalable encoding without altering the original sequence, providing a flexible and reusable storage architecture.

Current challenges and future perspectives

Despite their substantial potential, ncNAs face several technical bottlenecks that hinder their large-scale deployment. Chief among these are challenges in synthesis, readout technologies, and chemical diversification. Addressing these limitations is critical for the realization of scalable, robust, and cost-effective data storage platforms.

Writing: synthesis limitations and polymerase compatibility

Synthesis remains a primary obstacle. Both chemical and enzymatic synthesis routes are technically demanding and costly193,194. Solid-phase synthesis involves the use of harsh reagents and expensive nucleotide building blocks (Table 3)195, while enzymatic approaches are limited by the substrate recognition fidelity of natural polymerases, which often fail to process UBPs or modified backbones. These inefficiencies significantly drive up costs. Additionally, current synthesis technologies struggle to produce long, non-canonical sequences. In molecular data storage, longer nucleic acid strands require fewer index sequences, resulting in less redundancy and thus higher storage capacity. In contrast, short ncNA strands limit both data capacity and reliability. To overcome these limitations, more automated and scalable synthesis platforms are required. Enhancing coupling efficiency and minimizing synthesis errors in solid-phase synthesis, and reducing monomer costs through novel chemical or enzymatic production pathways, will be essential196. Meanwhile, polymerase engineering, accelerated by machine learning tools, holds promise for improving enzymatic compatibility with non-canonical substrates197,198. For instance, Holliger et al. engineered an archaeal polymerase with steric gate mutations to enable synthesis of long 2′-OMe RNA oligonucleotides up to 750 nt199.
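The relationship between strand length and index overhead can be sketched with a simple model. It assumes each strand carries a fixed flanking region (here 40 nt, standing in for two hypothetical 20-nt primers), an index just long enough to address every strand, and payload in the remainder; these parameters are illustrative assumptions, not values from the studies cited above.

```python
# Toy model: fraction of each strand available for payload versus strand length.
# Assumptions (illustrative): fixed_nt bases of flanking sequence per strand and
# a minimal-length index that uniquely addresses every strand in the pool.
import math


def index_length(n_strands: int, alphabet_size: int = 4) -> int:
    """Smallest number of bases that can uniquely address n_strands strands."""
    length = 1
    while alphabet_size ** length < n_strands:
        length += 1
    return length


def payload_fraction(strand_nt: int, payload_total_nt: int,
                     fixed_nt: int = 40, alphabet_size: int = 4) -> float:
    """Fraction of a strand left for payload once flanks and index are reserved."""
    for idx in range(1, strand_nt - fixed_nt):
        per_strand = strand_nt - fixed_nt - idx
        n_strands = math.ceil(payload_total_nt / per_strand)
        if index_length(n_strands, alphabet_size) <= idx:
            return per_strand / strand_nt
    raise ValueError("strand too short to hold flanks, index, and payload")


for length in (60, 100, 200, 750):
    print(f"{length:4d} nt strands: {payload_fraction(length, 10**6):.1%} payload")
```

Under these assumptions, short strands spend most of their length on addressing and flanking sequence, which is why the ability to synthesize long ncNA strands translates directly into usable capacity.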

Reading: sequencing limitations and optimization strategies

Accurate and efficient readout is another major challenge. Conventional sequencing platforms like Illumina and Sanger methods are often incompatible with non-canonical nucleotides, typically necessitating reverse transcription into canonical DNA, a step that adds complexity, time, and error potential200.

Recent experimental advances demonstrate that nanopore sequencing can be adapted to directly decode synthetic and ncNAs. Two landmark studies reported (i) high-throughput deconvolution of non-canonical bases201,202,203 and (ii) sequencing of DNA analogs with artificial bases203, both relying on bespoke training datasets and customized basecaller models. In parallel, methodological improvements, including Uncalled4204, Rockfish205, and the Bonito/Dorado/Remora toolchains206, as well as transformer-based architectures, have markedly enhanced the accuracy of modification detection and base analog discrimination. Advances in solid-state nanopores (e.g., graphene, MoS₂207) further underscore the theoretical potential for improved resolution and tunability, although robust superiority over biological pores remains to be experimentally validated. Collectively, these developments suggest that the gap between chemical innovation in ncNAs and their practical single-molecule readout is rapidly narrowing, contingent on continued progress in pore engineering, dataset curation, and machine-learning–driven basecalling.

Expanding the substrate landscape: chemical diversity and storage potential

The current landscape of ncNAs remains limited in scope, representing only a small fraction of their potential. Broadening the chemical repertoire, through the development of new bases, sugars, and backbone chemistry, will be crucial. For instance, Pol6G12 and RT521 have shown capability for transferring genetic information between DNA and HNA134, highlighting their potential in storage applications.

RNA, while structurally and functionally analogous to DNA, offers both distinct advantages and pronounced challenges in the context of molecular data storage. As a single-stranded polymer, RNA exhibits greater conformational flexibility and is amenable to a wide range of enzymatic manipulations, including transcription, reverse transcription, and programmable editing via CRISPR-Cas systems, making it a promising candidate for dynamic and rewritable data architectures208,209,210. However, its chemical instability, primarily due to the presence of a 2′-hydroxyl group that renders the phosphodiester backbone susceptible to base-catalyzed hydrolysis, significantly limits its utility for long-term storage applications211. Moreover, RNA is highly vulnerable to ubiquitous ribonucleases, which can rapidly degrade unprotected RNA molecules even under mild conditions212. To address these limitations, various chemical modification strategies have been developed to enhance RNA stability while preserving its informational and functional capabilities. Notable among these are 2′-O-methyl (2′-OMe), 2′-fluoro (2′-F), and LNA modifications, which confer increased resistance to nucleases and thermal denaturation without compromising base-pairing fidelity213,214. These modified RNAs have found extensive use in therapeutic and diagnostic applications, and their potential in data storage systems remains an emerging area of interest.

Beyond RNA itself, synthetic analogs with altered sugar-phosphate backbones offer a broader chemical space for engineering storage systems with superior stability and functionality. Several RNA-inspired ncNAs, such as GNA and HNA, have demonstrated the ability to form stable Watson-Crick base pairs with complementary DNA or RNA strands, while exhibiting enhanced resistance to hydrolytic and enzymatic degradation134. Importantly, polymerases capable of synthesizing and replicating TNA and HNA have been engineered, suggesting that these systems could support autonomous information copying and evolution in synthetic media215.

Looking ahead, combining multiple modifications (e.g., TNA + TPT376, PS + FANA75) offers a modular approach to building high-capacity, functional, and responsive storage systems. These hybrid platforms can encode data with enhanced fidelity, security, and environmental adaptability, features vital for the next generation of molecular data storage technologies76,216.

Ethical and biosafety considerations

Given their structural homology with natural DNA and RNA, ncNAs raise important ethical and biosafety concerns, especially as they move from in vitro to potential in vivo applications. Although ncNA-based data storage has been demonstrated only in vitro so far, advancements in DNA-based in vivo storage systems suggest that ncNAs could be explored for future in vivo data storage217,218. However, several biosafety and ethical considerations must be carefully addressed.

One concern arises from recent discussions about mirror-image life forms composed of L-DNA and other mirror-image biological molecules219. Although the synthesis of a mirror-image organism is beyond current technological capabilities, progress in synthetic biology could eventually enable the construction of a fully mirror-image bacterium. Preliminary assessments suggest that these organisms could evade host immune surveillance and resist degradation by natural enzymes, leading to uncontrolled proliferation and potential biosafety risks. To mitigate these concerns, constructing restriction systems that limit the replication of mirror-image life may be necessary.

Similarly, the application of ncNAs for data storage requires caution regarding their biosafety. While the evolution of polymerases capable of handling these modified bases has been a focal point, the development of highly specific nucleases targeting ncNAs is also critical. Furthermore, the self-replication capabilities of ncNAs must be tightly controlled. Prior to in vivo data storage, it is essential to evaluate the efficiency of genome integration, potential mutation risks, immune responses, and metabolic clearance pathways associated with ncNAs. These factors must be clearly defined to ensure safety in future applications.

Molecular security, encryption, and bio-orthogonal steganography

Beyond expanding data storage capacity and chemical diversity, ncNAs also open new horizons for molecular-level security and encryption. Incorporating modified or synthetic nucleotides such as 5mC or other unnatural bases can render encoded sequences readable only by specific biochemical or enzymatic processes220, ensuring that only authorized entities can decode the data. In parallel, embedding hidden data layers within DNA through chemical modifications221 or secondary structural encoding provides additional protection against unauthorized access, establishing a molecular equivalent of multi-factor authentication.

Furthermore, ncNA sequences can serve as unique molecular signatures, acting as molecular fingerprints or physical unclonable functions (PUFs). Their replication requires precise knowledge of both the sequence composition and the synthetic conditions222, thereby making counterfeiting or tampering extremely difficult. The inherent complexity of ncNA architectures, including expanded base alphabets, backbone modifications, and hybridization orthogonality, further enhances anti-counterfeiting capabilities and steganographic potential103.

A particularly exciting direction lies in bio-orthogonal steganography, which leverages the orthogonality between ncNAs and natural biomolecules to conceal data within biological systems without interfering with native functions. For instance, L-DNA, which exhibits no complementarity with natural D-DNA, has been used to create “mirror-encoded” information for covert communication and molecular authentication78. Similarly, PNA with charge-neutral backbones can evade detection by conventional hybridization probes, while highly stable systems like TNA offer durability for long-term encrypted data storage.

Building on this concept, DNA adducts represent a distinctive class of ncNAs in which DNA bases are covalently modified by small molecules such as psoralens or aldehydes223,224, enabling reversible chemical encryption. These adducts can be selectively formed or cleaved in response to external stimuli, including light, heat, or redox conditions, allowing dynamic data concealment and recovery. Collectively, such strategies point toward a future where ncNA-based data storage systems integrate multi-layered molecular encryption, steganography, and authentication, providing unprecedented levels of data security at the nanoscale.

Advanced encoding schemes for error minimization

In previous sections, we introduced the use of UBPs to expand the genetic alphabet, allowing for increased storage capacity in molecular data storage. However, as the alphabet expands, the potential for higher error rates during both data writing and reading processes increases. To address these challenges, several encoding schemes can be employed to minimize errors. Non-linear codes, such as run-length limited and Huffman coding, reduce homopolymer errors and improve error correction efficiency in large-alphabet systems225. Combinatorial encoding strategies, including DNA Fountain and overlapping pool designs, exploit redundancy and stochastic assembly to enhance robustness against synthesis and sequencing errors. More recently, 3D structural encoding using DNA origami and nanostructures enables storage in conformational states, bypassing sequencing altogether226. Within this toolbox, non-canonical nucleotides provide a unique opportunity: expanded alphabets (e.g., AEGIS or Hachimoji systems45) enable octal or higher-level coding schemes, thereby increasing per-base data capacity (up to 3.58 bits/base for a 12-letter alphabet). Moreover, their orthogonality and chemical tunability may synergize with combinatorial and structural encoding, offering parallel channels for error-resilient, multi-layered data representation.
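To illustrate how a constrained code suppresses homopolymers, the sketch below implements a rotation-style code of the kind used in early DNA storage work: each input digit selects one of the letters that differ from the previously written letter, so no two adjacent positions are ever identical. It is offered as an example of the general idea, not as the specific scheme of the references above; the function names and the virtual starting letter are assumptions. With the four-letter alphabet the digits are trits (log₂3 ≈ 1.58 bits per base), and passing a larger alphabet string extends the same rule to expanded alphabets.

```python
# Rotation-style constrained code: consecutive letters always differ.
# Each digit (0 .. k-2 for a k-letter alphabet) picks one of the k-1 letters
# that are not equal to the previous letter. Sketch only; no error correction.


def encode_digits(digits, alphabet: str = "ACGT") -> str:
    """Map base-(k-1) digits to a homopolymer-free sequence over the alphabet."""
    prev = alphabet[0]  # virtual predecessor; it is not written to the strand
    out = []
    for d in digits:
        choices = [letter for letter in alphabet if letter != prev]
        prev = choices[d]
        out.append(prev)
    return "".join(out)


def decode_digits(seq: str, alphabet: str = "ACGT"):
    """Invert encode_digits by replaying the same choice lists."""
    prev = alphabet[0]
    digits = []
    for letter in seq:
        choices = [c for c in alphabet if c != prev]
        digits.append(choices.index(letter))
        prev = letter
    return digits


trits = [0, 0, 0, 2, 1, 1]
strand = encode_digits(trits)
print(strand)
print(decode_digits(strand) == trits)
```

The price of the constraint is one alphabet letter per position: a k-letter alphabet carries log₂(k−1) bits per base under this rule, so the relative penalty shrinks as the alphabet expands.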

Summary and outlook

As both the types and volumes of data continue growing exponentially, DNA-based storage platforms have attracted considerable attention due to their high storage capacity, durability, and low energy requirements. However, the biochemical instability of canonical DNA poses significant limitations for practical application. NcNAs have emerged as attractive alternatives, with diverse applications in molecular medicine, synthetic biology, and nanotechnology. By introducing various modifications at the base, sugar, or backbone moieties, ncNAs offer several advantages, including enhanced biochemical stability, expanded storage capacity, and novel orthogonal functionalities. Hybrid approaches that integrate multiple modifications, such as epigenetic marks combined with sugar or backbone alterations, hold great promise for creating multi-layered, rewritable, and secure data encoding systems.

Despite the promising potential of ncNAs, several technical challenges hinder their practical application. First, large-scale synthesis remains a significant bottleneck, as current methods are not yet efficient in producing long ncNA sequences. Additionally, many synthetic nucleic acids are incompatible with the polymerases required for replication and transcription227, limiting their functionality. Sequencing also remains a critical challenge; while nanopore sequencing has made progress228,229, reliable single-molecule readouts of UBPs require further optimization, including advancements in pore engineering, signal calibration, and base-calling algorithms. Overcoming these obstacles will necessitate multidisciplinary collaboration, integrating expertise from synthetic chemistry, enzymology, materials science, and computational analysis. Key areas for advancement include expanding the chemical repertoire of nucleic acids, engineering more robust and versatile polymerases, and optimizing nanopore-based sequencing technologies. As these challenges are progressively addressed, ncNAs are poised to form the foundation for next-generation molecular data storage systems.

Outlook

Advances in ncNA-based systems are expected to bring significant improvements in writing, preservation, and reading (Fig. 4).

Fig. 4: Roadmap of Non-Canonical Nucleic Acids in Data Storage.

Schematic roadmap illustrating the promising development of ncNAs in molecular data storage, spanning information writing, preservation, and reading technologies.

In terms of writing, rapid progress will be made in ncNA synthesis, driven by technologies like TdT and polymerase variants, enabling the creation of long-stranded and chimeric ncNAs in the short term. In the medium to long term, high-throughput real-time synthesis, fully automated ultra-fast synthesis, and even portable micro-writing devices will emerge, drastically enhancing the throughput and flexibility of ncNA writing.

For preservation, on-chip ncNA storage will be achieved in the near future, followed by the integration of ncNA storage and computing systems. Innovations in encryption and steganography using ncNAs will also gain traction. Over time, smart, low-cost encapsulation systems for real-time monitoring of ncNA degradation, along with ultra-fast micro-encapsulation technologies, will be developed to ensure secure and stable long-term storage of ncNA data.

In reading, building on the rapid development of nanopore technologies, nanopore-based ncNA sequencing is expected to become available soon. Supported by advances in AI techniques and DNA-based neural networks, AI-assisted semantic retrieval and human-computer interaction systems may be realized. As technology progresses, we will see high-throughput real-time reading, fully automated ultra-fast ncNA reading, and portable micro storage-reading devices, greatly enhancing the speed and convenience of accessing ncNA-based data storage systems.