Scalable cyberinfrastructure for experimental NMR data

Hoch, Jeffrey C.; Henzler-Wildman, Katherine; Edison, Arthur S.; Rienstra, Chad M.; Bontempi, Christopher; Wedell, Jonathan R.; Weatherby, Gerard; Burr, Harrison; Pustovalova, Yulia; Thongdee, Seenat; Gryk, Michael R.; Pozhidaeva, Alexandra; Simon, Bernd; Cheng, Qi; Wilson, Michael P.; Moraru, Ion I.; Morris, Laura; Glushka, John N.; Uchimiya, Mario; Eletsky, Alexander; Moore, Abigail E.; Grimes, John H.; Paterson, Alexander L.; Wang, Songlin; Pinheiro, Paulo R.; Vanderloop, Boden H.; Maciejewski, Mark W.

doi:10.1038/s41597-025-06446-y

Article
Open access
Published: 17 December 2025

Jeffrey C. Hoch^1,2,
Katherine Henzler-Wildman ORCID: orcid.org/0000-0002-5295-2121^3,4,
Arthur S. Edison ORCID: orcid.org/0000-0002-5686-2350^5,6,7,
Chad M. Rienstra^3,4,
Christopher Bontempi ORCID: orcid.org/0009-0007-3783-6484¹,
Jonathan R. Wedell ORCID: orcid.org/0000-0002-2247-6259¹,
Gerard Weatherby ORCID: orcid.org/0000-0002-0462-4633¹,
Harrison Burr¹,
Yulia Pustovalova ORCID: orcid.org/0000-0003-3024-1764¹,
Seenat Thongdee¹,
Michael R. Gryk¹,
Alexandra Pozhidaeva¹,
Bernd Simon ORCID: orcid.org/0000-0003-0164-5516^1,2,
Qi Cheng¹,
Michael P. Wilson⁸,
Ion I. Moraru⁸,
Laura Morris⁵,
John N. Glushka⁶,
Mario Uchimiya⁶,
Alexander Eletsky⁶,
Abigail E. Moore⁵,
John H. Grimes Jr.⁶,
Alexander L. Paterson⁴,
Songlin Wang⁴,
Paulo R. Pinheiro⁴,
Boden H. Vanderloop⁴ &
Mark W. Maciejewski ORCID: orcid.org/0000-0003-1217-1571^1,2

Scientific Data (2025)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

The Network for Advanced NMR (NAN) is a novel distributed resource that connects Nuclear Magnetic Resonance (NMR) facilities via a scalable cyberinfrastructure supporting NMR data harvesting, interactive data management, and the discovery of instruments, methods, and data to enable emerging data standards in biomedicine, chemistry, and materials science. Anchored by the first open-access 1.1 GHz instruments in the USA, NAN integrates NMR facilities around a centralized hub for identity management, resource discovery, and access control. The system includes automated data harvesting through the NAN data transport system (NDTS), metadata-rich data archiving, and interactive web-based tools for data and metadata browsing, editing, and publishing, as well as tools for facility and laboratory data management by facility managers and principal investigators. NAN knowledgebases provide vetted, standardized pulse programs, protocols, parameters, and example datasets, along with processed data. Supported by the US National Science Foundation Mid-scale Research Infrastructure program, NAN helps to democratize access to NMR resources and fosters open, reproducible science.

Data availability

The NAN resource is available as a web portal (https://usnan.nmrhub.org), which provides interactive access to the data browser, sample browser, and associated management tools. All datasets originate from community users of NAN and are automatically harvested from NAN nodes and archived with rich metadata in the repository (see Data Harvesting). Publicly available datasets can be searched, filtered, and downloaded through the NAN data browser (see Data and Sample Browsers). Users without a NAN account can access all public datasets via the “Public & Knowledgebase Datasets” view located on the Resource Connector (https://usnan.nmrhub.org/resource-connector/public-datasets). Datasets become public three years after harvesting unless released earlier by the investigator. Immutable published versions are assigned persistent identifiers and are distributed under a Creative Commons Attribution (CC BY) license to support citation and reuse (see Publishing & Public Data). Access to embargoed, non-public datasets is governed by PI-controlled permissions (see Accounts & Permissions). In addition to the interactive web portal, NAN provides programmatic access through a Python software development kit (SDK) and a RESTful API, enabling automated queries, dataset downloads, and integration into external workflows.
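
The three-year embargo rule described above can be sketched in a few lines of Python. This is only an illustration of the stated policy, not NAN's implementation: the names `add_years`, `public_release_date`, and `is_public` are hypothetical, and the treatment of a 29 February harvest date (falling back to 28 February) is an assumption, since the text does not specify it.

```python
from datetime import date
from typing import Optional

# Embargo length stated in the Data availability section.
EMBARGO_YEARS = 3


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        # Falling back to 28 February is an assumption of this sketch.
        return d.replace(year=d.year + years, day=28)


def public_release_date(harvested: date,
                        early_release: Optional[date] = None) -> date:
    """Date on which a dataset becomes public under the stated rule."""
    default = add_years(harvested, EMBARGO_YEARS)
    # An investigator may release earlier than the default, never later.
    if early_release is not None and early_release < default:
        return early_release
    return default


def is_public(harvested: date, today: date,
              early_release: Optional[date] = None) -> bool:
    """True if the dataset is public on `today`."""
    return today >= public_release_date(harvested, early_release)
```

For example, under these assumptions a dataset harvested on 15 January 2025 with no early release would become public on 15 January 2028, while one released by the investigator on 1 June 2025 would be public from that date.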

Code availability

Because NAN is a research infrastructure system, most of its software has little relevance for individual investigators and is therefore not released publicly. The exception is the Python SDK, which provides programmatic access to the system and is available on GitHub (https://github.com/NanNMR/PythonSDK) and on the Python Package Index (https://pypi.org/project/usnan/). Internal components, such as the NDTS spectrometer components, gateway software, receiver and parser services, and back-end APIs, are restricted for security reasons but may be made available to responsible research organizations under conditions. Organizations interested in deploying an instance of the NAN cyberinfrastructure should contact the corresponding author.
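
Before scripting against the SDK, it can help to confirm that the `usnan` distribution from the Python Package Index is installed in the active environment. The sketch below uses only the Python standard library and assumes nothing about the SDK's own functions; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "usnan") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("usnan is not installed in this environment")
    else:
        print(f"usnan {version} is installed")
```

If the helper returns `None`, the package can be installed from the Python Package Index with `pip install usnan`.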

Acknowledgements

NAN is supported by the U.S. National Science Foundation (NSF) through the Mid-scale Research Infrastructure-2 program (Grant Number 1946970) and an Operations and Maintenance grant for sustaining the NAN cyberinfrastructure (Grant Number 2529058).

Author information

Authors and Affiliations

Department of Molecular Biology & Biophysics, UConn Health, Farmington, CT, 06030, USA
Jeffrey C. Hoch, Christopher Bontempi, Jonathan R. Wedell, Gerard Weatherby, Harrison Burr, Yulia Pustovalova, Seenat Thongdee, Michael R. Gryk, Alexandra Pozhidaeva, Bernd Simon, Qi Cheng & Mark W. Maciejewski
Gregory P. Mullen NMR Structural Biology Facility, UConn Health, Farmington, CT, 06030, USA
Jeffrey C. Hoch, Bernd Simon & Mark W. Maciejewski
Department of Biochemistry, University of Wisconsin–Madison, Madison, WI, 53706, USA
Katherine Henzler-Wildman & Chad M. Rienstra
National Magnetic Resonance Facility at Madison (NMRFAM), University of Wisconsin–Madison, Madison, Wisconsin, 53706, USA
Katherine Henzler-Wildman, Chad M. Rienstra, Alexander L. Paterson, Songlin Wang, Paulo R. Pinheiro & Boden H. Vanderloop
Department of Biochemistry & Molecular Biology, University of Georgia, Athens, Georgia, 30602, USA
Arthur S. Edison, Laura Morris & Abigail E. Moore
Complex Carbohydrate Research Center (CCRC), University of Georgia, Athens, Georgia, 30602, USA
Arthur S. Edison, John N. Glushka, Mario Uchimiya, Alexander Eletsky & John H. Grimes Jr.
Institute for Bioinformatics, University of Georgia, Athens, Georgia, 30602, USA
Arthur S. Edison
The Richard D. Berlin Center for Cell Analysis & Modeling, UConn Health, Farmington, CT, 06030, USA
Michael P. Wilson & Ion I. Moraru

Contributions

Conceptualization of NDTS and the NAN portal was carried out by M.W.M., J.C.H. & K.H.W. Technical requirements were defined by M.W.M., J.C.H., K.H.W., Y.P., S.T., M.R.G., M.P.W., I.I.M. & A.L.P. NDTS and web-portal software components were developed by C.B., J.R.W., G.W. & H.B. Validation and testing were performed by M.W.M., J.C.H., K.H.W., Y.P., S.T., M.R.G., A.P., B.S., L.M., J.N.G., M.U., A.E., A.E.M., J.H.G., A.L.P., S.W., P.R.P. and B.H.V. The original draft was prepared by M.W.M., J.C.H., C.B., and K.H.W. with all authors contributing to reviewing and editing. Project administration was undertaken by Q.C., J.C.H., M.W.M., K.H.W., A.S.E., C.M.R., L.M. and B.H.V.

Corresponding author

Correspondence to Mark W. Maciejewski.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hoch, J.C., Henzler-Wildman, K., Edison, A.S. et al. Scalable cyberinfrastructure for experimental NMR data. Sci Data (2025). https://doi.org/10.1038/s41597-025-06446-y

Received: 09 September 2025
Accepted: 11 December 2025
Published: 17 December 2025
DOI: https://doi.org/10.1038/s41597-025-06446-y