Abstract
Nuclear magnetic resonance (NMR) spectroscopy is one of the most powerful and widely used tools for molecular structure elucidation in organic chemistry. However, interpreting NMR spectra to determine unknown molecular structures remains a labor-intensive, expertise-dependent process, particularly for complex or novel compounds. Recently proposed methods for automated structure elucidation often underperform in real-world applications because of inherent algorithmic limitations and a shortage of high-quality data. Here, we present NMR-Solver, a practical and interpretable framework for the automated determination of small organic molecule structures from ¹H and ¹³C NMR spectra. The method integrates large-scale spectral matching with physics-guided molecular optimization that exploits atomic-level structure–spectrum relationships in NMR. We evaluate NMR-Solver on simulated benchmarks, curated experimental data from the literature, and real-world experiments, demonstrating strong generalization, robustness, and practical utility. By integrating computational NMR analysis, deep learning, and interpretable chemical reasoning into a unified system, NMR-Solver facilitates scalable, automated, and chemically meaningful molecular structure elucidation and establishes a generalizable paradigm for solving inverse problems in molecular science.
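To make the idea of spectral matching concrete, the following is a minimal sketch and not the NMR-Solver implementation. It assumes each candidate structure already has a list of predicted ¹³C shifts, rejects candidates whose peak count differs from the observed spectrum, and scores the rest by the mean absolute deviation between sorted shift lists. The function names, candidate labels, and shift values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def shift_mismatch(observed: List[float], predicted: List[float]) -> float:
    """Mean absolute deviation (ppm) between two equal-length shift lists.

    For one-dimensional values with an absolute-difference cost, pairing the
    sorted lists element by element gives a minimum-cost one-to-one matching.
    """
    if len(observed) != len(predicted):
        return float("inf")  # illustrative choice: reject unequal peak counts
    if not observed:
        return 0.0
    pairs = zip(sorted(observed), sorted(predicted))
    return sum(abs(o - p) for o, p in pairs) / len(observed)


def rank_candidates(
    observed: List[float], library: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [
        (name, shift_mismatch(observed, shifts))
        for name, shifts in library.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 13C shifts in ppm, invented for illustration only.
library = {
    "candidate_A": [14.2, 60.4, 171.0, 21.0],
    "candidate_B": [18.1, 58.3, 128.5, 205.0],
    "candidate_C": [30.8, 206.6],
}
observed = [171.4, 60.1, 20.8, 14.0]

ranking = rank_candidates(observed, library)
print(ranking[0][0])  # name of the best-scoring candidate
```

A practical system would also need to tolerate missing or overlapping peaks and to search a library far too large for a linear scan, which this sketch deliberately ignores.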
Data availability
The PubChem dataset (ref. 40), used to construct the SimNMR-PubChem Database, is publicly available at https://pubchem.ncbi.nlm.nih.gov. The processed dataset and database index of the SimNMR-PubChem Database are available on Hugging Face at https://huggingface.co/datasets/yqj01/SimNMR-PubChem. All processed NMR datasets used for testing are available via Zenodo at https://doi.org/10.5281/zenodo.16952024 (ref. 60). All datasets generated and analyzed in this study are publicly accessible and can be freely used for research purposes without restriction.
Code availability
All source code for NMR-Solver is publicly available at https://github.com/YongqiJin/NMR-Solver (ref. 61) under the open-source MIT License. The trained model weights for NMRNet are available via Zenodo at https://doi.org/10.5281/zenodo.16952024 (ref. 60).
References
Clayden, J., Greeves, N. & Warren, S. Organic Chemistry (Oxford University Press, 2012).
Skoog, D. A., Holler, F. J. & Crouch, S. R. Principles of Instrumental Analysis Vol. 6 (Cengage Learning, 2019).
Elyashberg, M., Williams, A. & Martin, G. Computer-assisted structure verification and elucidation tools in NMR-based structure elucidation. Prog. Nucl. Magn. Reson. Spectrosc. 53, 1–104 (2008).
Ermanis, K., Parkes, K. E., Agback, T. & Goodman, J. M. Doubling the power of DP4 for computational structure elucidation. Org. Biomol. Chem. 15, 8998–9007 (2017).
Howarth, A., Ermanis, K. & Goodman, J. M. DP4-AI automated NMR data analysis: straight from spectrometer to structure. Chem. Sci. 11, 4351–4359 (2020).
Marcarino, M. O., Zanardi, M. M., Cicetti, S. & Sarotti, A. M. NMR calculations with quantum methods: development of new tools for structural elucidation and beyond. Acc. Chem. Res. 53, 1922–1932 (2020).
NMR Workbook Suite. ACD Labs. https://www.acdlabs.com/products/spectrus-platform/nmr-workbook-suite. Accessed 30 August 2025.
MNova. MestreLab Research. https://mestrelab.com/software/mestrenova. Accessed 30 August 2025.
Klukowski, P., Riek, R. & Güntert, P. NMRtist: an online platform for automated biomolecular NMR spectra analysis. Bioinformatics 39, btad066 (2023).
Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
Buitrago Santanilla, A. et al. Nanomole-scale high-throughput chemistry for the synthesis of complex molecules. Science 347, 49–53 (2015).
Trobe, M. & Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57, 4192–4214 (2018).
Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).
Mehr, S. H. M., Craven, M., Leonov, A. I., Keenan, G. & Cronin, L. A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101–108 (2020).
Joung, J. F. et al. Electron flow matching for generative reaction mechanism prediction. Nature 645, 115–123 (2025).
Jia, Y. et al. Robot-assisted mapping of chemical reaction hyperspaces and networks. Nature 645, 922–931 (2025).
Liu, J. & Hein, J. E. Automation, analytics and artificial intelligence for chemical synthesis. Nat. Synth. 2, 464–466 (2023).
Dai, T. et al. Autonomous mobile robots for exploratory synthetic chemistry. Nature 635, 890–897 (2024).
Kozlov, K. S. et al. Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data. Nat. Commun. 16, 2587 (2025).
Jonas, E. & Kuhn, S. Rapid prediction of NMR spectral properties with quantified uncertainty. J. Cheminform. 11, 50 (2019).
Kwon, Y., Lee, D., Choi, Y.-S., Kang, M. & Kang, S. Neural message passing for NMR chemical shift prediction. J. Chem. Inf. Model. 60, 2024–2030 (2020).
Zou, Z. et al. A deep learning model for predicting selected organic molecular spectra. Nat. Comput. Sci. 3, 957–964 (2023).
Klukowski, P., Riek, R. & Güntert, P. Machine learning in NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc. 148, 101575 (2025).
Wolinski, K., Hinton, J. F. & Pulay, P. Efficient implementation of the gauge-independent atomic orbital method for NMR chemical shift calculations. J. Am. Chem. Soc. 112, 8251–8260 (1990).
Chen, H., Liang, T., Tan, K., Wu, A. & Lu, X. GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts. J. Cheminform. 16, 132 (2024).
Xu, F. et al. Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts. Nat. Comput. Sci. 5, 292–300 (2025).
Yao, L. et al. Conditional molecular generation net enables automated structure elucidation based on 13C NMR spectra and prior knowledge. Anal. Chem. 95, 5393–5401 (2023).
Hu, F., Chen, M. S., Rotskoff, G. M., Kanan, M. W. & Markland, T. E. Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning. ACS Cent. Sci. 10, 2162–2170 (2024).
Alberts, M., Zipoli, F. & Vaucher, A. C. Learning the language of NMR: Structure elucidation from NMR spectra using transformer models. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2023-8wxcz (2023).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Selfies: a robust representation of semantically constrained graphs with an example in chemistry. Mach. Learn. Sci. Technol. 1, 045024 (2020).
Brown, N., Fiscato, M., Segler, M. H. & Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).
Tripp, A. & Hernández-Lobato, J. M. Genetic algorithms are strong baselines for molecule generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.09267 (2023).
Jensen, J. H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 10, 3567–3572 (2019).
Mirza, A. & Jablonka, K. M. Elucidating structures from spectra using multimodal embeddings and discrete optimization. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2024-f3b18 (2024).
Burns, D. C., Mazzola, E. P. & Reynolds, W. F. The role of computer-assisted structure elucidation (CASE) programs in the structure elucidation of complex natural products. Nat. Prod. Rep. 36, 919–933 (2019).
Yang, Z. et al. Cross-modal retrieval between 13C NMR spectra and structures for compound identification using deep contrastive learning. Anal. Chem. 93, 16947–16955 (2021).
Sun, H. et al. Cross-modal retrieval between 13C NMR spectra and structures based on focused libraries. Anal. Chem. 96, 5763–5770 (2024).
Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems Vol. 30 (Curran Associates, Inc., 2017).
Kim, S. et al. PubChem 2025 update. Nucleic Acids Res. 53, D1516–D1525 (2025).
Kuhn, S. & Schlörer, N. E. Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2—a free in-house NMR database with integrated LIMS for academic service laboratories. Magn. Reson. Chem. 53, 582–589 (2015).
Wishart, D. S. et al. NP-MRD: the natural products magnetic resonance database. Nucleic Acids Res. 50, D665–D677 (2022).
Gupta, A., Chakraborty, S. & Ramakrishnan, R. Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules. Mach. Learn. Sci. Technol. 2, 035010 (2021).
Alberts, M., Schilter, O., Zipoli, F., Hartrampf, N. & Laino, T. Unraveling molecular structure: a multimodal spectroscopic dataset for chemistry. In Proc. Advances in Neural Information Processing Systems Vol. 37, 125780–125808 (Curran Associates, Inc., 2024).
Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).
Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113 (1965).
Wang, J.-J. et al. Mimicking hydrogen-atom-transfer-like reactivity in copper-catalysed olefin hydrofunctionalization. Nat. Catal. 7, 838–846 (2024).
Fu, Y. et al. Photocatalyzed dehydroxylative amination of phenols: A ring-expansion approach for medium-sized benzolactams. Org. Lett. 23, 8317–8321 (2021).
Cheng, D., Yu, C., Pu, Y. & Xu, X. DDQ-mediated oxidative coupling reaction of N,N-dimethyl enaminones with cycloheptatriene. Tetrahedron Lett. 90, 153609 (2022).
Novitskiy, I. M. & Kutateladze, A. G. Peculiar reaction products and mechanisms revisited with machine learning-augmented computational NMR. J. Org. Chem. 87, 8589–8598 (2022).
Landrum, G. et al. RDKit: 2025_03_1 (Q1 2025) Release. https://doi.org/10.5281/zenodo.15115844 (2025).
Halgren, T. A. Merck Molecular Force Field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490–519 (1996).
Kuhn, H. W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2, 83–97 (1955).
Munkres, J. Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 5, 32–38 (1957).
Crouse, D. F. On implementing 2D rectangular assignment algorithms. IEEE Trans. Aerosp. Electron. Syst. 52, 1679–1696 (2016).
Johnson, J., Douze, M. & Jégou, H. Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7, 535–547 (2019).
Malkov, Y. A. & Yashunin, D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 824–836 (2018).
Bremser, W. Hose—a novel substructure code. Anal. Chim. Acta 103, 355–365 (1978).
Keeler, J. Understanding NMR Spectroscopy (John Wiley & Sons, 2011).
Jin, Y. Datasets for NMR-solver. https://doi.org/10.5281/zenodo.16952024 (2025).
Jin, Y. NMR-Solver: v1.0. https://github.com/YongqiJin/NMR-Solver, https://doi.org/10.5281/zenodo.18450044 (2026).
Acknowledgements
The authors thank Shangqian Chen and Peng Jin for their contributions to the development of the web app. The authors are also grateful for the insightful discussions and suggestions from Hanzheng Li and Xi Wang. W.E. acknowledges the National Natural Science Foundation of China (grant nos. 92570001 and 12288101). R.Z. acknowledges the New Generation Artificial Intelligence-National Science and Technology Major Project (2025ZD0121905), the National Natural Science Foundation of China (22350006, T2521001, 22222101, 22171012), Beijing Natural Science Foundation (2242006), and the AISI-NUS joint research initiative.
Author information
Contributions
W.E., R.Z., and G.K. contributed to the design of the work. Y.J. designed and implemented the methods and conducted the analysis. Y.J. and J.W. performed data collection and preprocessing. J.W. conducted the wet-lab experiments. Y.J. and F.X. carried out the evaluation of the methods. X.J., Z.G., and L.Z. contributed to project coordination and platform support. All authors participated in the discussion and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jin, Y., Wang, JJ., Xu, F. et al. NMR-Solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71315-0
DOI: https://doi.org/10.1038/s41467-026-71315-0


