Nature Communications
NMR-Solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization
  • Article
  • Open access
  • Published: 02 April 2026

  • Yongqi Jin (affiliations 1, 2; ORCID: orcid.org/0009-0003-9468-5468),
  • Jun-Jie Wang (affiliations 2, 3),
  • Fanjie Xu (affiliations 2, 4; ORCID: orcid.org/0009-0007-1007-5427),
  • Xiaohong Ji (affiliation 2),
  • Zhifeng Gao (affiliation 2; ORCID: orcid.org/0000-0001-8433-999X),
  • Linfeng Zhang (affiliations 2, 5),
  • Guolin Ke (affiliation 2; ORCID: orcid.org/0000-0002-1227-7221),
  • Rong Zhu (affiliations 3, 5; ORCID: orcid.org/0000-0001-5035-3531) &
  • Weinan E (affiliations 1, 5, 6)

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational science
  • NMR spectroscopy
  • Structure elucidation

Abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most powerful and widely used tools for molecular structure elucidation in organic chemistry. However, interpreting NMR spectra to determine unknown molecular structures remains a labor-intensive and expertise-dependent process, particularly for complex or novel compounds. Although methods for automated structure elucidation have recently been proposed, they often underperform in real-world applications because of inherent algorithmic limitations and the scarcity of high-quality data. Here, we present NMR-Solver, a practical and interpretable framework for the automated determination of small organic molecule structures from ¹H and ¹³C NMR spectra. The method integrates large-scale spectral matching with physics-guided molecular optimization that exploits atomic-level structure–spectrum relationships in NMR. We evaluate NMR-Solver on simulated benchmarks, curated experimental data from the literature, and real-world experiments, demonstrating its strong generalization, robustness, and practical utility. By integrating computational NMR analysis, deep learning, and interpretable chemical reasoning into a unified system, NMR-Solver facilitates scalable, automated, and chemically meaningful molecular structure elucidation, establishing a generalizable paradigm for solving inverse problems in molecular science.
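To make the idea of spectral matching concrete, the following Python sketch ranks candidate structures by how closely their predicted chemical shifts agree with an observed peak list. It is a toy illustration only: the abstract does not specify NMR-Solver's scoring function, so the names `shift_distance` and `rank_candidates`, the rule that peak counts must agree, and every shift value in the example are assumptions made here, not the authors' method or data.

```python
from typing import Dict, List, Tuple


def shift_distance(observed: List[float], predicted: List[float]) -> float:
    """Mean absolute difference (ppm) between two equal-sized shift lists.

    For a 1D absolute-difference cost, pairing the two sorted lists is an
    optimal one-to-one assignment, so no general assignment solver is used.
    """
    if len(observed) != len(predicted):
        return float("inf")  # toy rule (assumption): peak counts must agree
    if not observed:
        return 0.0
    pairs = zip(sorted(observed), sorted(predicted))
    return sum(abs(o - p) for o, p in pairs) / len(observed)


def rank_candidates(
    observed: List[float], library: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(shift_distance(observed, shifts), name)
              for name, shifts in library.items()]
    return [(name, score) for score, name in sorted(scored)]


# Hypothetical example: invented shift values for three placeholder candidates.
observed_shifts = [18.0, 58.0]
toy_library = {
    "candidate_A": [58.5, 17.5],
    "candidate_B": [30.0, 207.0, 30.0],
    "candidate_C": [25.0, 64.0],
}
ranking = rank_candidates(observed_shifts, toy_library)
print(ranking)
```

In this toy setting `candidate_A` ranks first because its sorted shifts lie closest to the observed peaks. Real spectra involve overlapping or missing peaks and unequal peak counts, which a general assignment solver or a learned similarity measure would handle more gracefully than the strict count rule assumed here.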

Data availability

The PubChem dataset (ref. 40), used to construct the SimNMR-PubChem Database, is publicly available at https://pubchem.ncbi.nlm.nih.gov. The processed dataset and database index of the SimNMR-PubChem Database are available on Hugging Face at https://huggingface.co/datasets/yqj01/SimNMR-PubChem. All processed NMR datasets used for testing are available via Zenodo at https://doi.org/10.5281/zenodo.16952024 (ref. 60). All datasets generated and analyzed in this study are publicly accessible and can be freely used for research purposes without restriction.

Code availability

All source code for NMR-Solver is publicly available at https://github.com/YongqiJin/NMR-Solver (ref. 61) under the open-source MIT License. The trained model weights for NMRNet are available via Zenodo at https://doi.org/10.5281/zenodo.16952024 (ref. 60).

References

  1. Clayden, J., Greeves, N. & Warren, S. Organic Chemistry (Oxford University Press, 2012).

  2. Skoog, D. A., Holler, F. J. & Crouch, S. R. Textbook “Principles of Instrumental Analysis” Vol. 6 (Cengage Learning, 2019).

  3. Elyashberg, M., Williams, A. & Martin, G. Computer-assisted structure verification and elucidation tools in NMR-based structure elucidation. Prog. Nucl. Magn. Reson. Spectrosc. 53, 1–104 (2008).

  4. Ermanis, K., Parkes, K. E., Agback, T. & Goodman, J. M. Doubling the power of DP4 for computational structure elucidation. Org. Biomol. Chem. 15, 8998–9007 (2017).

  5. Howarth, A., Ermanis, K. & Goodman, J. M. DP4-AI automated NMR data analysis: straight from spectrometer to structure. Chem. Sci. 11, 4351–4359 (2020).

  6. Marcarino, M. O., Zanardi, M. M., Cicetti, S. & Sarotti, A. M. NMR calculations with quantum methods: development of new tools for structural elucidation and beyond. Acc. Chem. Res. 53, 1922–1932 (2020).

  7. NMR Workbook Suite. ACD Labs. https://www.acdlabs.com/products/spectrus-platform/nmr-workbook-suite. Accessed 30 August 2025.

  8. MNova. MestreLab Research. https://mestrelab.com/software/mestrenova. Accessed 30 August 2025.

  9. Klukowski, P., Riek, R. & Güntert, P. NMRtist: an online platform for automated biomolecular NMR spectra analysis. Bioinformatics 39, btad066 (2023).

  10. Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).

  11. Buitrago Santanilla, A. et al. Nanomole-scale high-throughput chemistry for the synthesis of complex molecules. Science 347, 49–53 (2015).

  12. Trobe, M. & Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57, 4192–4214 (2018).

  13. Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

  14. Mehr, S. H. M., Craven, M., Leonov, A. I., Keenan, G. & Cronin, L. A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101–108 (2020).

  15. Joung, J. F. et al. Electron flow matching for generative reaction mechanism prediction. Nature 645, 115–123 (2025).

  16. Jia, Y. et al. Robot-assisted mapping of chemical reaction hyperspaces and networks. Nature 645, 922–931 (2025).

  17. Liu, J. & Hein, J. E. Automation, analytics and artificial intelligence for chemical synthesis. Nat. Synth. 2, 464–466 (2023).

  18. Dai, T. et al. Autonomous mobile robots for exploratory synthetic chemistry. Nature 635, 890–897 (2024).

  19. Kozlov, K. S. et al. Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data. Nat. Commun. 16, 2587 (2025).

  20. Jonas, E. & Kuhn, S. Rapid prediction of NMR spectral properties with quantified uncertainty. J. Cheminform. 11, 50 (2019).

  21. Kwon, Y., Lee, D., Choi, Y.-S., Kang, M. & Kang, S. Neural message passing for NMR chemical shift prediction. J. Chem. Inf. Model. 60, 2024–2030 (2020).

  22. Zou, Z. et al. A deep learning model for predicting selected organic molecular spectra. Nat. Comput. Sci. 3, 957–964 (2023).

  23. Klukowski, P., Riek, R. & Güntert, P. Machine learning in NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc. 148, 101575 (2025).

  24. Wolinski, K., Hinton, J. F. & Pulay, P. Efficient implementation of the gauge-independent atomic orbital method for NMR chemical shift calculations. J. Am. Chem. Soc. 112, 8251–8260 (1990).

  25. Chen, H., Liang, T., Tan, K., Wu, A. & Lu, X. GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts. J. Cheminform. 16, 132 (2024).

  26. Xu, F. et al. Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts. Nat. Comput. Sci. 5, 292–300 (2025).

  27. Yao, L. et al. Conditional molecular generation net enables automated structure elucidation based on 13C NMR spectra and prior knowledge. Anal. Chem. 95, 5393–5401 (2023).

  28. Hu, F., Chen, M. S., Rotskoff, G. M., Kanan, M. W. & Markland, T. E. Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning. ACS Cent. Sci. 10, 2162–2170 (2024).

  29. Alberts, M., Zipoli, F. & Vaucher, A. C. Learning the language of NMR: Structure elucidation from NMR spectra using transformer models. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2023-8wxcz (2023).

  30. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).

  31. Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Selfies: a robust representation of semantically constrained graphs with an example in chemistry. Mach. Learn. Sci. Technol. 1, 045024 (2020).

  32. Brown, N., Fiscato, M., Segler, M. H. & Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).

  33. Tripp, A. & Hernández-Lobato, J. M. Genetic algorithms are strong baselines for molecule generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.09267 (2023).

  34. Jensen, J. H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 10, 3567–3572 (2019).

  35. Mirza, A. & Jablonka, K. M. Elucidating structures from spectra using multimodal embeddings and discrete optimization. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2024-f3b18 (2024).

  36. Burns, D. C., Mazzola, E. P. & Reynolds, W. F. The role of computer-assisted structure elucidation (CASE) programs in the structure elucidation of complex natural products. Nat. Prod. Rep. 36, 919–933 (2019).

  37. Yang, Z. et al. Cross-modal retrieval between 13C NMR spectra and structures for compound identification using deep contrastive learning. Anal. Chem. 93, 16947–16955 (2021).

  38. Sun, H. et al. Cross-modal retrieval between 13C NMR spectra and structures based on focused libraries. Anal. Chem. 96, 5763–5770 (2024).

  39. Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems Vol. 30 (Curran Associates, Inc., 2017).

  40. Kim, S. et al. PubChem 2025 update. Nucleic Acids Res. 53, D1516–D1525 (2025).

  41. Kuhn, S. & Schlörer, N. E. Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2—a free in-house NMR database with integrated LIMS for academic service laboratories. Magn. Reson. Chem. 53, 582–589 (2015).

  42. Wishart, D. S. et al. NP-MRD: the natural products magnetic resonance database. Nucleic Acids Res. 50, D665–D677 (2022).

  43. Gupta, A., Chakraborty, S. & Ramakrishnan, R. Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules. Mach. Learn. Sci. Technol. 2, 035010 (2021).

  44. Alberts, M., Schilter, O., Zipoli, F., Hartrampf, N. & Laino, T. Unraveling molecular structure: a multimodal spectroscopic dataset for chemistry. In Proc. Advances in Neural Information Processing Systems Vol. 37, 125780–125808 (Curran Associates, Inc., 2024).

  45. Bajusz, D., Rácz, A. & Héberger, K. Why is tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).

  46. Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113 (1965).

  47. Wang, J.-J. et al. Mimicking hydrogen-atom-transfer-like reactivity in copper-catalysed olefin hydrofunctionalization. Nat. Catal. 7, 838–846 (2024).

  48. Fu, Y. et al. Photocatalyzed dehydroxylative amination of phenols: A ring-expansion approach for medium-sized benzolactams. Org. Lett. 23, 8317–8321 (2021).

  49. Cheng, D., Yu, C., Pu, Y. & Xu, X. DDQ-mediated oxidative coupling reaction of N, N-dimethyl enaminones with cycloheptatriene. Tetrahedron Lett. 90, 153609 (2022).

  50. Novitskiy, I. M. & Kutateladze, A. G. Peculiar reaction products and mechanisms revisited with machine learning-augmented computational NMR. J. Org. Chem. 87, 8589–8598 (2022).

  51. Landrum, G. et al. RDKit: 2025_03_1 (Q1 2025) Release. https://doi.org/10.5281/zenodo.15115844 (2025).

  52. Halgren, T. A. Merck Molecular Force Field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490–519 (1996).

  53. Kuhn, H. W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2, 83–97 (1955).

  54. Munkres, J. Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 5, 32–38 (1957).

  55. Crouse, D. F. On implementing 2D rectangular assignment algorithms. IEEE Trans. Aerosp. Electron. Syst. 52, 1679–1696 (2016).

  56. Johnson, J., Douze, M. & Jégou, H. Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7, 535–547 (2019).

  57. Malkov, Y. A. & Yashunin, D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 824–836 (2018).

  58. Bremser, W. Hose—a novel substructure code. Anal. Chim. Acta 103, 355–365 (1978).

  59. Keeler, J. Understanding NMR Spectroscopy (John Wiley & Sons, 2011).

  60. Jin, Y. Datasets for NMR-solver. https://doi.org/10.5281/zenodo.16952024 (2025).

  61. Jin, Y. NMR-Solver: v1.0. https://github.com/YongqiJin/NMR-Solver, https://doi.org/10.5281/zenodo.18450044 (2026).

Acknowledgements

The authors thank Shangqian Chen and Peng Jin for their contributions to the development of the web app. The authors are also grateful for the insightful discussions and suggestions from Hanzheng Li and Xi Wang. W.E. acknowledges the National Natural Science Foundation of China (grant nos. 92570001 and 12288101). R.Z. acknowledges the New Generation Artificial Intelligence-National Science and Technology Major Project (2025ZD0121905), the National Natural Science Foundation of China (22350006, T2521001, 22222101, 22171012), Beijing Natural Science Foundation (2242006), and the AISI-NUS joint research initiative.

Author information

Authors and Affiliations

  1. School of Mathematical Sciences, Peking University, Beijing, China

    Yongqi Jin & Weinan E

  2. DP Technology, Beijing, China

    Yongqi Jin, Jun-Jie Wang, Fanjie Xu, Xiaohong Ji, Zhifeng Gao, Linfeng Zhang & Guolin Ke

  3. College of Chemistry and Molecular Engineering, Peking University, Beijing, China

    Jun-Jie Wang & Rong Zhu

  4. Institute of Artificial Intelligence, Xiamen University, Xiamen, China

    Fanjie Xu

  5. AI for Science Institute, Beijing, China

    Linfeng Zhang, Rong Zhu & Weinan E

  6. Center for Machine Learning Research, Peking University, Beijing, China

    Weinan E

Contributions

W.E., R.Z., and G.K. contributed to the design of the work. Y.J. designed and implemented the methods and conducted the analysis. Y.J. and J.W. performed data collection and preprocessing. J.W. conducted the wet-lab experiments. Y.J. and F.X. carried out the evaluation of the methods. X.J., Z.G., and L.Z. contributed to project coordination and platform support. All authors participated in the discussion and wrote the manuscript.

Corresponding authors

Correspondence to Guolin Ke, Rong Zhu or Weinan E.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jin, Y., Wang, JJ., Xu, F. et al. NMR-Solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71315-0

  • Received: 10 October 2025

  • Accepted: 19 March 2026

  • Published: 02 April 2026

  • DOI: https://doi.org/10.1038/s41467-026-71315-0

Nature Communications (Nat Commun)

ISSN 2041-1723 (online)
