Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation

Abstract

The asymmetric hydrogenation of olefins is one of the most important asymmetric transformations in molecular synthesis. While other machine learning models have successfully predicted stereoselectivity for reactions with a single prochiral site, existing models face limitations including narrow substrate–catalyst applicability, an inability to simultaneously predict stereoselectivity and absolute configurations in asymmetric hydrogenation of olefins with two prochiral sites, and a reliance on predefined descriptors. Here, to overcome these challenges, we introduce Chemistry-Informed Asymmetric Hydrogenation Network (ChemAHNet), a deep learning model based on the reaction mechanism of olefin asymmetric hydrogenation. By leveraging three structure-aware modules, ChemAHNet accurately predicts the absolute configuration of major enantiomers across diverse catalysts and substrates. It also defines the \(\Delta\Delta G^{\ddagger}\) of asymmetric hydrogenation via catalyst–olefin interactions, enabling concurrent prediction of stereoselectivity and absolute configuration. Notably, ChemAHNet extends to other asymmetric catalytic reactions. By operating solely on simplified molecular-input line-entry system inputs, it captures atomic-level spatial and electronic interactions, offering a robust tool for target-directed molecular engineering.
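For readers less familiar with how a free-energy difference between diastereomeric transition states relates to an observed enantiomeric excess (ee), the following is a minimal Python sketch of the standard textbook relation \(\Delta\Delta G^{\ddagger} = RT\ln[(1+\mathrm{ee})/(1-\mathrm{ee})]\). It is not taken from the ChemAHNet code. The function names, the default temperature of 298.15 K and the sign convention (a positive value favouring one enantiomer, a negative value the other) are assumptions made only for illustration.

```python
import math

# Gas constant in kJ mol^-1 K^-1.
R_KJ = 8.314462618e-3


def ddg_from_ee(ee_percent: float, temperature_k: float = 298.15) -> float:
    """Magnitude of ddG (kJ/mol) for an ee given in percent.

    Hypothetical helper: uses the textbook relation ddG = RT * ln(er),
    where er is the major:minor enantiomer ratio.
    """
    if not 0.0 <= ee_percent < 100.0:
        raise ValueError("ee must lie in [0, 100)")
    ee = ee_percent / 100.0
    er = (1.0 + ee) / (1.0 - ee)
    return R_KJ * temperature_k * math.log(er)


def ee_from_ddg(ddg_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Signed ee (percent) for a signed ddG in kJ/mol.

    Assumed convention for this sketch: the sign of ddG selects which
    enantiomer is in excess, the magnitude sets how large the excess is.
    """
    # (e^x - 1) / (e^x + 1) equals tanh(x / 2) with x = ddG / RT.
    return 100.0 * math.tanh(ddg_kj_mol / (2.0 * R_KJ * temperature_k))


if __name__ == "__main__":
    for ee in (50.0, 90.0, 99.0):
        print(f"ee = {ee:5.1f}%  ->  ddG = {ddg_from_ee(ee):.2f} kJ/mol")
```

Under this convention a single signed quantity carries both pieces of information: its sign indicates which enantiomer is favoured and its magnitude fixes the ee. This is one way to read the abstract's statement that predicting \(\Delta\Delta G^{\ddagger}\) enables concurrent prediction of stereoselectivity and absolute configuration.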

Fig. 1: ChemAHNet: a deep learning framework for asymmetric olefin hydrogenation.
Fig. 2: Key statistics of the created AHO database.
Fig. 3: Comparison of prediction results between the ChemAHNet model and other state-of-the-art models.
Fig. 4: Ablation study of ChemAHNet modules.
Fig. 5: Enantioselectivity prediction for the chiral phosphoric acid-catalyzed thiol addition to N-acylimines using ChemAHNet and SEMG-MIGNN.
Fig. 6: Chemical interpretation of absolute conformation and enantioselectivity determination by the ChemAHNet model.

Data availability

All datasets used in this study are publicly accessible. The original AHO dataset, including the Rh/BINOL-phosphite-catalyzed hydrogenation of tri-substituted olefins, was obtained from Hong et al. (ref. 43), available via GitHub at https://github.com/licheng-xu-echo/AHO.git. The datasets curated and analyzed in the current study, including CPA-catalyzed thiol addition to N-acylimine, organocatalytic conjugate addition reactions, photoredox-catalyzed asymmetric reactions and organocatalytic enamine reactions, are available via Zenodo at https://doi.org/10.5281/zenodo.17346605 (ref. 45). Source data are provided with this paper.

Code availability

All code needed to run this model is available via GitHub at https://github.com/CHENGLi-96/ChemAHNet/releases/tag/ChemAHNet–V1.0 and via Zenodo at https://doi.org/10.5281/zenodo.17346605 (ref. 45).

References

  1. von Lilienfeld, O. A., Müller, K.-R. & Tkatchenko, A. Exploring chemical compound space with quantum-based machine learning. Nat. Rev. Chem. 4, 347–358 (2020).

  2. Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323, 844–853 (2020).

  3. Gromski, P. S., Henson, A. B., Granda, J. M. & Cronin, L. How to explore chemical space using algorithms and automation. Nat. Rev. Chem. 3, 119–128 (2019).

  4. DiMasi, J. A., Grabowski, H. G. & Hansen, R. W. Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ. 47, 20–33 (2016).

  5. Pensak, D. A. & Corey, E. J. LHASA—logic and heuristics applied to synthetic analysis. ACS Symp. Ser. 61, 1–32 (1977).

  6. Wipke, W. T., Braun, H., Smith, G., Choplin, F. & Sieber, W. K. SECS—simulation and evaluation of chemical synthesis: strategy and planning. ACS Symp. Ser. 61, 92–127 (1977).

  7. Kayala, M. A., Azencott, C.-A., Chen, J. H. & Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 51, 2209–2222 (2011).

  8. Houk, K. N. & Cheong, P. H.-Y. Computational prediction of small-molecule catalysts. Nature 455, 309–313 (2008).

  9. Jin, W., Coley, C. W., Barzilay, R. & Jaakkola, T. Predicting organic reaction outcomes with Weisfeiler-Lehman network. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 2604–2613 (NeurIPS, 2017).

  10. Ahn, S., Hong, M., Sundararajan, M., Ess, D. H. & Baik, M.-H. Design and optimization of catalysts based on mechanistic insights derived from quantum chemical reaction modeling. Chem. Rev. 119, 6509–6560 (2019).

  11. Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

  12. Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).

  13. Neel, A. J., Milo, A., Sigman, M. S. & Toste, F. D. Enantiodivergent fluorination of allylic alcohols: data set design reveals structural interplay between achiral directing group and chiral anion. J. Am. Chem. Soc. 138, 3863–3875 (2016).

  14. Knowles, R. R. & Jacobsen, E. N. Attractive noncovalent interactions in asymmetric catalysis: Links between enzymes and small molecule catalysts. Proc. Natl Acad. Sci. USA 107, 20678–20685 (2010).

  15. Bi, H. et al. Non-autoregressive electron redistribution modeling for reaction prediction. In Proc. 38th International Conference on Machine Learning, Vol. 139 (eds Meila, M. & Zhang, T.) 904–913 (PMLR, 2021).

  16. Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2019).

  17. Keto, A. et al. Data-efficient, chemistry-aware machine learning predictions of Diels–Alder reaction outcomes. J. Am. Chem. Soc. 146, 16052–16061 (2024).

  18. Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).

  19. Lu, J. & Zhang, Y. Unified deep learning model for multitask reaction predictions with explanation. J. Chem. Inf. Model. 62, 1376–1387 (2022).

  20. Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).

  21. Pesciullesi, G., Schwaller, P., Laino, T. & Reymond, J.-L. Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates. Nat. Commun. 11, 4874 (2020).

  22. Tetko, I. V., Karpov, P., Van Deursen, R. & Godin, G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 11, 5575 (2020).

  23. Li, S.-W., Xu, L.-C., Zhang, C., Zhang, S.-Q. & Hong, X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat. Commun. 14, 3569 (2023).

  24. Knowles, W. S. Asymmetric hydrogenations (Nobel lecture). Angew. Chem. Int. Ed. 41, 1998–2007 (2002).

  25. Eberhardt, L., Armspach, D., Harrowfield, J. & Matt, D. BINOL-derived phosphoramidites in asymmetric hydrogenation: can the presence of a functionality in the amino group influence the catalytic outcome? Chem. Soc. Rev. 37, 839–864 (2008).

  26. Verendel, J. J., Pàmies, O., Diéguez, M. & Andersson, P. G. Asymmetric hydrogenation of olefins using chiral Crabtree-type catalysts: scope and limitations. Chem. Rev. 114, 2130–2169 (2014).

  27. Zhang, Z., Butt, N. A. & Zhang, W. Asymmetric hydrogenation of nonaromatic cyclic substrates. Chem. Rev. 116, 14769–14827 (2016).

  28. Massaro, L., Zheng, J., Margarita, C. & Andersson, P. G. Enantioconvergent and enantiodivergent catalytic hydrogenation of isomeric olefins. Chem. Soc. Rev. 49, 2504–2522 (2020).

  29. Janssen-Müller, D., Schlepphorst, C. & Glorius, F. Privileged chiral N-heterocyclic carbene ligands for asymmetric transition-metal catalysis. Chem. Soc. Rev. 46, 4845–4854 (2017).

  30. Wen, J., Wang, F. & Zhang, X. Asymmetric hydrogenation catalyzed by first-row transition metal complexes. Chem. Soc. Rev. 50, 3211–3237 (2021).

  31. Wang, Q. et al. Rhodium-catalyzed enantioselective hydrogenation of tetrasubstituted α-acetoxy β-enamido esters: a new approach to chiral α-hydroxyl-β-amino acid derivatives. J. Am. Chem. Soc. 136, 16120–16123 (2014).

  32. Yoshikai, Y., Mizuno, T., Nemoto, S. & Kusuhara, H. Difficulty in chirality recognition for transformer architectures learning chemical structures from string representations. Nat. Commun. 15, 1197 (2024).

  33. Sigman, M. S., Harper, K. C., Bess, E. N. & Milo, A. The development of multidimensional analysis tools for asymmetric catalysis and beyond. Acc. Chem. Res. 49, 1292–1301 (2016).

  34. Reid, J. P. & Sigman, M. S. Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts. Nat. Rev. Chem. 2, 290–305 (2018).

  35. Santiago, C. B., Guo, J.-Y. & Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 9, 2398–2412 (2018).

  36. Gallarati, S. et al. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 12, 6879–6889 (2021).

  37. Ravasco, J. M. J. M. & Coelho, J. A. S. Predictive multivariate models for bioorthogonal inverse-electron demand Diels–Alder reactions. J. Am. Chem. Soc. 142, 4235–4241 (2020).

  38. Singh, S. et al. A unified machine-learning protocol for asymmetric catalysis as a proof of concept demonstration using asymmetric hydrogenation. Proc. Natl Acad. Sci. USA 117, 1339–1345 (2020).

  39. Xu, L.-C. et al. Enantioselectivity prediction of pallada-electrocatalysed C–H activation using transition state knowledge in machine learning. Nat. Synth. 2, 321–330 (2023).

  40. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

  41. Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).

  42. Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).

  43. Xu, L.-C. et al. Towards data-driven design of asymmetric hydrogenation of olefins: database and hierarchical learning. Angew. Chem. Int. Ed. 60, 22804–22811 (2021).

  44. Moskal, M., Beker, W., Szymkuć, S. & Grzybowski, B. A. Scaffold-directed face selectivity machine-learned from vectors of non-covalent interactions. Angew. Chem. Int. Ed. 60, 15230–15235 (2021).

  45. Cheng, L. et al. Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation. Zenodo https://doi.org/10.5281/zenodo.17346605 (2025).

  46. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 4768–4777 (NeurIPS, 2017).

  47. Lundberg, S. M., Erion, G. G. & Lee, S.-I. Consistent individualized feature attribution for tree ensembles. Preprint at https://doi.org/10.48550/arXiv.1802.03888 (2019).

  48. Heberle, H., Zhao, L., Schmidt, S., Wolf, T. & Heinrich, J. XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J. Cheminform. 15, 2 (2023).

  49. Landis, C. R. & Halpern, J. Asymmetric hydrogenation of methyl (Z)-α-acetamidocinnamate catalyzed by [1,2-bis((phenyl-o-anisoyl)phosphino)ethane]rhodium(I): kinetics, mechanism and origin of enantioselection. J. Am. Chem. Soc. 109, 1746–1754 (1987).

  50. Mohar, B. & Stephan, M. Practical enantioselective hydrogenation of α-aryl- and α-carboxyamidoethylenes by rhodium(I)-1,2-bis[(o-tert-butoxyphenyl)(phenyl)phosphino]ethane. Adv. Synth. Catal. 355, 594–600 (2013).

  51. Li, C. et al. Stereoelectronic effects in ligand design: enantioselective rhodium-catalyzed hydrogenation of aliphatic cyclic tetrasubstituted enamides and concise synthesis of (R)-tofacitinib. Angew. Chem. Int. Ed. 58, 13573–13583 (2019).

  52. RDKit: open-source chemoinformatics and machine learning. RDKit.org (accessed 15 July 2024); https://rdkit.org/

  53. Ahmad, W., Simon, E., Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa-2: towards chemical foundation models. Preprint at https://doi.org/10.48550/arXiv.2209.01712 (2022).

  54. Kim, Y. Convolutional neural networks for sentence classification. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Moschitti, A., Pang, B. & Daelemans, W.) 1746–1751 (EMNLP, 2014).

  55. Wang, S., Huang, M. & Deng, Z. Densely connected CNN with multi-scale feature attention for text classification. In Proc. 27th International Joint Conference on Artificial Intelligence, 4468–4474 (IJCAI, 2018).

  56. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 6000–6010 (NeurIPS, 2017).

  57. Nam, J. & Kim, J. Linking the neural machine translation and the prediction of organic chemistry reactions. Preprint at https://doi.org/10.48550/arXiv.1612.09529 (2016).

  58. Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. “Found in translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).

  59. Gehring, J., Auli, M., Grangier, D., Yarats, D. & Dauphin, Y. N. Convolutional sequence to sequence learning. In Proc. 34th International Conference on Machine Learning, Vol. 70 (eds Precup, D. & Teh, Y. W.) 1243–1252 (PMLR, 2017).

  60. Zhang, X., Zhou, X., Lin, M. & Sun, J. ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 6848–6856 (IEEE, 2018).

  61. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016, 770–778 (IEEE, 2016).

  62. He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In Proc. 14th European Conference on Computer Vision (eds Leibe, B., Matas, J., Sebe, N. & Welling, M.) 630–645 (ECCV, 2016).

Acknowledgements

We gratefully acknowledge the financial support provided by the Shenzhen Medical Research Fund (grant no. B2402042 to B.Z.), the National Natural Science Foundation of China (grant nos. 22274069 and 22304070 to B.Z.), the Shenzhen Science and Technology Program project (grant nos. JCYJ20240813094504007 to B.Z. and JCYJ20210324104007020 to P.-L.S.), the Guangdong Provincial Key Laboratory of Advanced Biomaterials (grant no. 2022B1212010003 to B.Z.), the Shenzhen Intelligent Medical Engineering Laboratory (grant no. ZDSYS20200811144003009 to B.Z.), Macao SAR (file nos. 0046/2025/RIB1 and 0002/2024/TFP to G.X.), UM’s research fund (file nos. MYRG-GRG2023-00065-IAPME-UMDF and MYRG-GRG2024-00156-IAPME to G.X.) and the Natural Science Foundation of China (grant nos. 62175268, 62288102 and 22405010 to G.X.). This work was also supported by the Center for Computational Science and Engineering at Southern University of Science and Technology. We acknowledge the assistance of SUSTech Core Research Facilities.

Author information

Contributions

B.Z., L.C., P.-L.S. and G.X. conceived the project and designed the machine learning framework. L.C. performed the machine learning training. J. Lv, H.X. and Y.S. trained the baseline models. J.Y., Z.X. and M.L. performed the experimental verifications. G.W., S.Z., J. Li, Z.J. and X.T. contributed to data curation. All authors were involved in the discussions and in writing the paper.

Corresponding authors

Correspondence to Pan-Lin Shao, Guichuan Xing or Bo Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Xin Hong and Seonah Kim for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–22, Discussion and Tables 1–7.

Source data

Source Data Fig. 2

Source data (see Fig_2.zip file: dataset information in csv format).

Source Data Fig. 3

Source data (see Fig_3.zip file: model prediction accuracy data in csv format).

Source Data Fig. 4

Source data (see Fig_4.zip file: regression data in csv format).

Source Data Fig. 5

Source data (see Fig_5.zip file: regression data in csv format).

Source Data Fig. 6

Source data (see Fig_6.zip: atom-level contribution data generated by SHAP, provided in csv format); contained in Data_for_Fig6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cheng, L., Shao, PL., Lv, J. et al. Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation. Nat Comput Sci (2025). https://doi.org/10.1038/s43588-025-00920-8

  • DOI: https://doi.org/10.1038/s43588-025-00920-8
