Nature Communications
Enabling global-scale nucleic acid repositories through versatile, scalable biochemical selection from room-temperature archives
  • Article
  • Open access
  • Published: 14 February 2026

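  • Joseph D. Berleant (ORCID: orcid.org/0000-0001-5672-4292; affiliation 1; contributed equally),
  • James L. Banal (ORCID: orcid.org/0000-0002-2364-4824; affiliation 1; contributed equally; present address listed under Author notes),
  • Dhriti K. Rao (ORCID: orcid.org/0009-0001-0690-2115; affiliation 2) &
  • Mark Bathe (ORCID: orcid.org/0000-0002-6199-6855; affiliations 1, 3)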

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biomineralization
  • DNA and RNA
  • Organizing materials with DNA

Abstract

Conventional storage and retrieval of nucleic acid specimens, particularly unstable RNA, rely on costly cold-chain infrastructure and inefficient robotic handling, inhibiting large-scale nucleic acid archives needed for global genomic biobanking. We introduce a scalable room-temperature storage system with minimal physical footprint that enables database-like queries on encapsulated, barcoded, and pooled nucleic acid samples. Queries incorporate numerical ranges, categorical filters, and combinations thereof, advancing beyond previous demonstrations of single-sample retrieval or Boolean classifiers. We evaluate this system on ninety-six mock SARS-CoV-2 genomic samples barcoded with theoretical patient data including age, location, and diagnostic state, demonstrating rapid, scalable retrieval. We further demonstrate storage and sequencing of human patient-derived nucleic acid samples, illustrating applicability to clinical genomic analysis. By avoiding freezer-based storage and retrieval, this approach scales to millions of samples without loss of fidelity or throughput, enabling large-scale pathogen and genomic repositories in under-resourced or isolated regions of the US and worldwide.
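As a purely in-silico illustration of the query types named in the abstract (numerical ranges, categorical filters, and combinations thereof), the sketch below filters a small pool of sample records. It is not the authors' BiosampleSQL code and does not model the biochemical selection itself. The `Sample` fields, the helper names (`age_between`, `location_in`, `diagnosis_is`, `all_of`, `select`), and the example records are all hypothetical and chosen only to mirror the age, location, and diagnostic-state metadata mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical metadata attached to one barcoded sample."""
    sample_id: str
    age: int
    location: str
    diagnosis: str


# A query is modeled as a predicate over one sample's metadata.
Predicate = Callable[[Sample], bool]


def age_between(lo: int, hi: int) -> Predicate:
    """Numerical range filter (inclusive on both ends)."""
    return lambda s: lo <= s.age <= hi


def location_in(*locations: str) -> Predicate:
    """Categorical filter: location must be one of the given values."""
    allowed = frozenset(locations)
    return lambda s: s.location in allowed


def diagnosis_is(state: str) -> Predicate:
    """Categorical filter on diagnostic state."""
    return lambda s: s.diagnosis == state


def all_of(*preds: Predicate) -> Predicate:
    """Combine filters so that every one must match."""
    return lambda s: all(p(s) for p in preds)


def select(pool: Iterable[Sample], query: Predicate) -> List[str]:
    """Return the IDs of the samples in the pool that satisfy the query."""
    return [s.sample_id for s in pool if query(s)]


# Made-up example records, for illustration only.
pool = [
    Sample("S01", 34, "MA", "positive"),
    Sample("S02", 71, "CA", "positive"),
    Sample("S03", 45, "MA", "negative"),
    Sample("S04", 52, "NY", "positive"),
]

query = all_of(
    age_between(30, 60),
    location_in("MA", "NY"),
    diagnosis_is("positive"),
)
selected = select(pool, query)
print(selected)
```

Modeling each filter as a predicate keeps range and categorical conditions interchangeable, so combinations are built by composition rather than by special cases.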

Data availability

Raw sequencing data from human-derived samples have been deposited in the NCBI BioProject database under accession number PRJNA1344794: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1344794. Processed match counts to each internal barcode for each experiment are available on Zenodo at https://doi.org/10.5281/zenodo.10501347 (ref. 63). Raw datasets are available on Zenodo at https://doi.org/10.5281/zenodo.17516191 (ref. 64). Figure source data are provided in this paper.

Code availability

Data analysis scripts with processed outputs are archived on Zenodo and are available at https://doi.org/10.5281/zenodo.10501347 (ref. 63) and on the GitHub repository https://github.com/lcbb/BiosampleSQL under the MIT license. The version of this repository associated with this publication is archived on Zenodo and is accessible at https://doi.org/10.5281/zenodo.17402438 (ref. 65).

References

  1. Kreier, F. The myriad ways sewage surveillance is helping fight COVID around the world. Nature https://doi.org/10.1038/d41586-021-01234-1 (2021).

  2. Collins, F. S. & Varmus, H. A New Initiative on Precision Medicine. N. Engl. J. Med. 372, 793–795 (2015).

  3. Vargas, A. J. & Harris, C. C. Biomarker development in the precision medicine era: lung cancer as a case study. Nat. Rev. Cancer 16, 525–537 (2016).

  4. Tarazona, S., Arzalluz-Luque, A. & Conesa, A. Undisclosed, unmet and neglected challenges in multi-omics studies. Nat. Comput. Sci. 1, 395–402 (2021).

  5. Lee, S. B. et al. Assessing a novel room temperature DNA storage medium for forensic biological samples. Forensic Sci. Int. Genet. 6, 31–40 (2012).

  6. Ryder, O. A., McLaren, A., Brenner, S., Zhang, Y.-P. & Benirschke, K. DNA Banks for endangered animal species. Science 288, 275–277 (2000).

  7. Brandies, P., Peel, E., Hogg, C. J. & Belov, K. The value of reference genomes in the conservation of threatened species. Genes 10, 846 (2019).

  8. Kieffer, C., Genot, A. J., Rondelez, Y. & Gines, G. Molecular computation for molecular classification. Adv. Biol. 7, 2200203 (2023).

  9. Zhang, D. Y. & Seelig, G. Dynamic DNA nanotechnology using strand-displacement reactions. Nat. Chem. 3, 103–113 (2011).

  10. Lopez, R., Wang, R. & Seelig, G. A molecular multi-gene classifier for disease diagnostics. Nat. Chem. 10, 746–754 (2018).

  11. Zhang, C. et al. Cancer diagnosis with DNA molecular computation. Nat. Nanotechnol. 15, 709–715 (2020).

  12. Yin, F. et al. DNA-framework-based multidimensional molecular classifiers for cancer diagnosis. Nat. Nanotechnol. 18, 677–686 (2023).

  13. Roundtree, I. A. & He, C. RNA epigenetics—chemical messages for posttranscriptional gene regulation. Curr. Opin. Chem. Biol. 30, 46–51 (2016).

  14. Kan, R. L., Chen, J. & Sallam, T. Crosstalk between epitranscriptomic and epigenetic mechanisms in gene regulation. Trends Genet. 38, 182–193 (2022).

  15. Helm, M. & Motorin, Y. Detecting RNA modifications in the epitranscriptome: predict and validate. Nat. Rev. Genet. 18, 275–291 (2017).

  16. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  17. Elliott, P., Peakman, T. C. & UK Biobank. The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int. J. Epidemiol. 37, 234–244 (2008).

  18. Bull, R. A. et al. Analytical validity of nanopore sequencing for rapid SARS-CoV-2 genome analysis. Nat. Commun. 11, 6272 (2020).

  19. Minogue, T. D., Koehler, J. W., Stefan, C. P. & Conrad, T. A. Next-generation sequencing for biodefense: biothreat detection, forensics, and the clinic. Clin. Chem. 65, 383–392 (2019).

  20. Whitmore, L. et al. Inadvertent human genomic bycatch and intentional capture raise beneficial applications and ethical concerns with environmental DNA. Nat. Ecol. Evol. 7, 873–888 (2023).

  21. Opitz, L. et al. Impact of RNA degradation on gene expression profiling. BMC Med. Genomics 3, 36 (2010).

  22. Gallego Romero, I., Pai, A. A., Tung, J. & Gilad, Y. RNA-seq: impact of RNA degradation on transcript quantification. BMC Biol. 12, 42 (2014).

  23. Mendy, M. et al. Biospecimens and Biobanking in Global Health. Glob. Health Pathol. 38, 183–207 (2018).

  24. Ziyatdinov, A. et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 622, 784–793 (2023).

  25. Wall, J. D. et al. The GenomeAsia 100K project enables genetic discoveries across Asia. Nature 576, 106–111 (2019).

  26. Naslavsky, M. S. et al. Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil. Nat. Commun. 13, 1004 (2022).

  27. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

  28. Bick, A. G. et al. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

  29. Organick, L. et al. Random access in large-scale DNA data storage. Nat. Biotechnol. 36, 242–248 (2018).

  30. Tomek, K. J. et al. Driving the scalability of DNA-based information storage systems. ACS Synth. Biol. 8, 1241–1248 (2019).

  31. Banal, J. L. & Bathe, M. Scalable nucleic acid storage and retrieval using barcoded microcapsules. ACS Appl. Mater. Interfaces 13, 49729–49736 (2021).

  32. Banal, J. L. et al. Random access DNA memory using Boolean search in an archival file storage system. Nat. Mater. 20, 1272–1280 (2021).

  33. Organick, L. et al. Probing the physical limits of reliable DNA data retrieval. Nat. Commun. 11, 616 (2020).

  34. Xu, Q., Schlabach, M. R., Hannon, G. J. & Elledge, S. J. Design of 240,000 orthogonal 25mer DNA barcode probes. Proc. Natl. Acad. Sci. USA 106, 2289–2294 (2009).

  35. Porichis, F. et al. High-throughput detection of miRNAs and gene-specific mRNA at the single-cell level by flow cytometry. Nat. Commun. 5, 5641 (2014).

  36. Goldstein, E., Lipsitch, M. & Cevik, M. On the effect of age on the transmission of SARS-CoV-2 in households, schools, and the community. J. Infect. Dis. 223, 362–369 (2021).

  37. Fauver, J. R. et al. Coast-to-coast spread of SARS-CoV-2 during the Early Epidemic in the United States. Cell 181, 990–996 (2020).

  38. Kishi, J. Y. et al. SABER amplifies FISH: enhanced multiplexed imaging of RNA and DNA in cells and tissues. Nat. Methods 16, 533–544 (2019).

  39. Player, A. N., Shen, L.-P., Kenny, D., Antao, V. P. & Kolberg, J. A. Single-copy gene detection using branched DNA (bDNA) in situ hybridization. J. Histochem. Cytochem. 49, 603–611 (2001).

  40. Tao, K. et al. The biological and clinical significance of emerging SARS-CoV-2 variants. Nat. Rev. Genet. 22, 757–773 (2021).

  41. Bei, Y. et al. Overcoming variant mutation-related impacts on viral sequencing and detection methodologies. Front. Med. 9, 989913 (2022).

  42. Karthikeyan, S. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022).

  43. Lagerborg, K. A. et al. Synthetic DNA spike-ins (SDSIs) enable sample tracking and detection of inter-sample contamination in SARS-CoV-2 sequencing workflows. Nat. Microbiol. 7, 108–119 (2022).

  44. Kubik, S. et al. Recommendations for accurate genotyping of SARS-CoV-2 using amplicon-based sequencing of clinical samples. Clin. Microbiol. Infect. 27, 1036.e1–1036.e8 (2021).

  45. Rosenthal, S. H. et al. Development and validation of a high throughput SARS-CoV-2 whole genome sequencing workflow in a clinical laboratory. Sci. Rep. 12, 2054 (2022).

  46. BigQuery public datasets. Google Cloud https://cloud.google.com/bigquery/public-data.

  47. Open Datasets Documentation - Tutorials, API reference - Azure - Azure Open Datasets. https://learn.microsoft.com/en-us/azure/open-datasets/.

  48. Open Data on AWS. https://aws.amazon.com/opendata/.

  49. The Nucleic Acid Observatory Consortium. A global nucleic acid observatory for biodefense and planetary health. Preprint at arXiv:2108.02678 (2021).

  50. Azenta Life Sciences. Cryogenic Storage Solutions in Life Sciences. https://www.azenta.com/learning-center/resources/cryogenic-storage-solutions-life-sciences-comprehensive-guide-decision-making (2024).

  51. Bee, C. et al. Molecular-level similarity search brings computing to DNA data storage. Nat. Commun. 12, 4764 (2021).

  52. Eldjarn, G. H. et al. Large-scale plasma proteomics comparisons through genetics and disease associations. Nature 622, 348–358 (2023).

  53. Zhao, T. et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91 (2022).

  54. Hunter, J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 9, 90–95 (2007).

  55. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  56. Knuth, D. E. The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations. (Addison-Wesley, 2005).

  57. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  58. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  59. Wilm, A. et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40, 11189–11201 (2012).

  60. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  61. McKenna, A. et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  62. Aksamentov, I., Roemer, C., Hodcroft, E. & Neher, R. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J. Open Source Softw. 6, 3773 (2021).

  63. Berleant, J. D., Banal, J. L., Rao, D. K. & Bathe, M. Enabling global-scale nucleic acid repositories through versatile, scalable biochemical selection from room-temperature archives. Zenodo https://doi.org/10.5281/ZENODO.10501347 (2025).

  64. Berleant, J. D., Banal, J. L., Rao, D. K. & Bathe, M. Full datasets from: Enabling global-scale nucleic acid repositories through versatile, scalable biochemical selection from room-temperature archives. Zenodo https://doi.org/10.5281/ZENODO.17516191 (2025).

  65. Berleant, J. D., Banal, J. L., Rao, D. K. & Bathe, M. lcbb/BiosampleSQL: Publication release. Zenodo https://doi.org/10.5281/ZENODO.17402438 (2025).

  66. NIAID Visual & Medical Arts. Eppendorf Tube. NIAID NIH BIOART Source. bioart.niaid.nih.gov/bioart/143 (2024).

  67. NIAID Visual & Medical Arts. 96 Well Plate. NIAID NIH BIOART source. bioart.niaid.nih.gov/bioart/7 (2024).

  68. NIAID Visual & Medical Arts. Next gen sequencer. NIAID NIH BIOART source. bioart.niaid.nih.gov/bioart/386 (2024).

Acknowledgements

M.B. and J.D.B. were supported by the Office of Naval Research (N00014-21-1-4013), the Army Research Office (ICB Subaward KK1954), and the National Science Foundation (CBET-1729397, OAC-1940231, and CCF-1956054). Additional funding to M.B. was provided through the National Science Foundation (CCF-2403100) and to J.D.B. through a National Science Foundation Graduate Research Fellowship (Grant No. 1122374). J.L.B. acknowledges support in part by the National Science Foundation SBIR Phase I 2136447, UCSF Parnassus Flow CoLab RRID:SCR_018206, DRC Center Grant NIH P30 DK063720, UCSF Center for Advanced Technology at Mission Bay, and Illumina. This research was also supported by a core center grant from the National Institute of Environmental Health Sciences, National Institutes of Health (P30-ES002109). We are grateful to T.B. Schardl and C.E. Leiserson (MIT CSAIL) for useful discussions on DNA barcoding. We thank G. Paradis, M. Jennings, and M. Griffin of the Flow Cytometry Core at the Koch Institute at the Massachusetts Institute of Technology (MIT) for flow sorting assistance. We are grateful to Delaware Diagnostics Labs for providing us with de-identified clinical SARS-CoV-2 samples. We thank G. Tikhorimov for providing access to a Beckman Coulter Labcyte Echo 550. We are grateful to Ella Maru Studio, Inc. for assistance in creating the airport schematic in Fig. 1a.

Author information

Author notes
  1. James L. Banal

    Present address: Cache DNA, Inc., San Carlos, CA, USA

  2. These authors contributed equally: Joseph D. Berleant, James L. Banal.

Authors and Affiliations

  1. Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

    Joseph D. Berleant, James L. Banal & Mark Bathe

  2. University of Cambridge, Cambridge, UK

    Dhriti K. Rao

  3. Broad Institute of MIT and Harvard, Cambridge, MA, USA

    Mark Bathe

Contributions

M.B., J.D.B., and J.L.B. conceived the sample storage system. J.D.B. designed the sample barcoding scheme and query language architecture. J.L.B. designed and implemented sample synthesis, FAS selection, and post-processing after selection of mock patient samples, and encapsulation and barcoding of clinical SARS-CoV-2 samples. D.K.R. prepared samples for sequencing and analyzed the data. J.L.B. and J.D.B. performed data analysis after querying and calculation of summary statistics. M.B. supervised the entire project. All authors contributed equally to the writing of the manuscript.

Corresponding author

Correspondence to Mark Bathe.

Ethics declarations

Competing interests

The Massachusetts Institute of Technology has filed a patent related to this work on behalf of J.L.B., M.B., J.D.B., and additional inventors (US Patent App. 17/836,726). J.L.B. and M.B. are co-founders and equity shareholders of Cache DNA, Inc. (Cache). J.L.B. is an employee of Cache and an independent contractor of OpenAI. D.K.R. was an intern at Cache for the period of this work.

Peer review

Peer review information

Nature Communications thanks Fajia Sun and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Dataset 1

Supplementary Dataset 2

Supplementary Dataset 3

Supplementary Dataset 4

Supplementary Dataset 5

Supplementary Dataset 6

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Berleant, J.D., Banal, J.L., Rao, D.K. et al. Enabling global-scale nucleic acid repositories through versatile, scalable biochemical selection from room-temperature archives. Nat Commun (2026). https://doi.org/10.1038/s41467-026-69402-3

  • Received: 02 April 2025

  • Accepted: 28 January 2026

  • Published: 14 February 2026

  • DOI: https://doi.org/10.1038/s41467-026-69402-3

Nature Communications (Nat Commun)

ISSN 2041-1723 (online)
