Abstract
The asymmetric hydrogenation of olefins is one of the most important asymmetric transformations in molecular synthesis. Although machine learning models have successfully predicted stereoselectivity for reactions with a single prochiral site, existing models are limited by narrow substrate–catalyst applicability, by an inability to predict stereoselectivity and absolute configuration simultaneously for the asymmetric hydrogenation of olefins with two prochiral sites, and by a reliance on predefined descriptors. To overcome these challenges, we introduce the Chemistry-Informed Asymmetric Hydrogenation Network (ChemAHNet), a deep learning model grounded in the reaction mechanism of olefin asymmetric hydrogenation. Using three structure-aware modules, ChemAHNet accurately predicts the absolute configuration of the major enantiomer across diverse catalysts and substrates. It also defines the \(\Delta \Delta G^{\ddagger}\) of asymmetric hydrogenation in terms of catalyst–olefin interactions, which enables concurrent prediction of stereoselectivity and absolute configuration. Notably, ChemAHNet extends to other asymmetric catalytic reactions. Operating solely on simplified molecular-input line-entry system (SMILES) inputs, it captures atomic-level spatial and electronic interactions, offering a robust tool for target-directed molecular engineering.
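A worked example shows why a single \(\Delta \Delta G^{\ddagger}\) can carry both stereoselectivity and absolute configuration. Under kinetic control (Curtin–Hammett conditions) with transition-state theory, the enantiomeric ratio is er = exp(\(\Delta \Delta G^{\ddagger}\)/RT), so ee = tanh(\(\Delta \Delta G^{\ddagger}\)/2RT). The magnitude of a signed \(\Delta \Delta G^{\ddagger}\) then sets the ee, and its sign identifies the major enantiomer. The sketch below assumes that sign convention (\(\Delta \Delta G^{\ddagger}\) = G‡ of the pathway to enantiomer B minus G‡ of the pathway to enantiomer A). The function names and the labels "A" and "B" are hypothetical and are not taken from the ChemAHNet code.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def ddg_from_ee(ee: float, temperature: float = 298.15) -> float:
    """Signed ΔΔG‡ (kcal/mol) from a signed ee strictly between -1 and 1.

    Assumed (hypothetical) convention: ee > 0 means enantiomer A is major and
    ΔΔG‡ = G‡(pathway to B) - G‡(pathway to A), so ΔΔG‡ > 0 as well.
    """
    if not -1.0 < ee < 1.0:
        raise ValueError("ee must lie strictly between -1 and 1")
    er = (1.0 + ee) / (1.0 - ee)  # enantiomeric ratio A:B
    return R_KCAL * temperature * math.log(er)


def ee_from_ddg(ddg: float, temperature: float = 298.15) -> float:
    """Inverse mapping: ee = tanh(ΔΔG‡ / 2RT)."""
    return math.tanh(ddg / (2.0 * R_KCAL * temperature))


def major_enantiomer(ddg: float, label_a: str = "A", label_b: str = "B") -> str:
    """The sign of ΔΔG‡ selects the major enantiomer; zero means racemic."""
    if ddg > 0:
        return label_a
    if ddg < 0:
        return label_b
    return "racemic"


if __name__ == "__main__":
    ddg = ddg_from_ee(0.90)  # 90% ee favouring enantiomer A
    print(f"ddG = {ddg:.2f} kcal/mol, major = {major_enantiomer(ddg)}")
    print(f"back-converted ee = {ee_from_ddg(ddg):.3f}")
```

At 298.15 K, a 90% ee corresponds to roughly 1.7 kcal mol⁻¹. Which CIP descriptor (or pair of descriptors, for two stereocentres) belongs to "A" or "B" depends on the substrate, so the labels are left as parameters.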
Data availability
All datasets used in this study are publicly accessible. The original AHO dataset, including the Rh/BINOL-phosphite-catalyzed hydrogenation of tri-substituted olefins, was obtained from Hong and co-workers (ref. 43) and is available via GitHub at https://github.com/licheng-xu-echo/AHO.git. The datasets curated and analyzed in the current study are available via Zenodo at https://doi.org/10.5281/zenodo.17346605 (ref. 45). These cover chiral phosphoric acid (CPA)-catalyzed thiol addition to N-acylimines, organocatalytic conjugate addition reactions, photoredox-catalyzed asymmetric reactions and organocatalytic enamine reactions. Source data are provided with this paper.
Code availability
All code needed to run the model is available via GitHub at https://github.com/CHENGLi-96/ChemAHNet/releases/tag/ChemAHNet–V1.0 and via Zenodo at https://doi.org/10.5281/zenodo.17346605 (ref. 45).
References
von Lilienfeld, O. A., Müller, K.-R. & Tkatchenko, A. Exploring chemical compound space with quantum-based machine learning. Nat. Rev. Chem. 4, 347–358 (2020).
Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323, 844–853 (2020).
Gromski, P. S., Henson, A. B., Granda, J. M. & Cronin, L. How to explore chemical space using algorithms and automation. Nat. Rev. Chem. 3, 119–128 (2019).
DiMasi, J. A., Grabowski, H. G. & Hansen, R. W. Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ. 47, 20–33 (2016).
Pensak, D. A. & Corey, E. J. LHASA—logic and heuristics applied to synthetic analysis. ACS Symp. Ser. 61, 1–32 (1977).
Wipke, W. T., Braun, H., Smith, G., Choplin, F. & Sieber, W. K. SECS—simulation and evaluation of chemical synthesis: strategy and planning. ACS Symp. Ser. 61, 92–127 (1977).
Kayala, M. A., Azencott, C.-A., Chen, J. H. & Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 51, 2209–2222 (2011).
Houk, K. N. & Cheong, P. H.-Y. Computational prediction of small-molecule catalysts. Nature 455, 309–313 (2008).
Jin, W., Coley, C. W., Barzilay, R. & Jaakkola, T. Predicting organic reaction outcomes with Weisfeiler-Lehman network. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 2604–2613 (NeurIPS, 2017).
Ahn, S., Hong, M., Sundararajan, M., Ess, D. H. & Baik, M.-H. Design and optimization of catalysts based on mechanistic insights derived from quantum chemical reaction modeling. Chem. Rev. 119, 6509–6560 (2019).
Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).
Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).
Neel, A. J., Milo, A., Sigman, M. S. & Toste, F. D. Enantiodivergent fluorination of allylic alcohols: data set design reveals structural interplay between achiral directing group and chiral anion. J. Am. Chem. Soc. 138, 3863–3875 (2016).
Knowles, R. R. & Jacobsen, E. N. Attractive noncovalent interactions in asymmetric catalysis: links between enzymes and small molecule catalysts. Proc. Natl Acad. Sci. USA 107, 20678–20685 (2010).
Bi, H. et al. Non-autoregressive electron redistribution modeling for reaction prediction. In Proc. 38th International Conference on Machine Learning, Vol. 139 (eds Meila, M. & Zhang, T.) 904–913 (PMLR, 2021).
Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2019).
Keto, A. et al. Data-efficient, chemistry-aware machine learning predictions of Diels–Alder reaction outcomes. J. Am. Chem. Soc. 146, 16052–16061 (2024).
Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).
Lu, J. & Zhang, Y. Unified deep learning model for multitask reaction predictions with explanation. J. Chem. Inf. Model. 62, 1376–1387 (2022).
Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).
Pesciullesi, G., Schwaller, P., Laino, T. & Reymond, J.-L. Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates. Nat. Commun. 11, 4874 (2020).
Tetko, I. V., Karpov, P., Van Deursen, R. & Godin, G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 11, 5575 (2020).
Li, S.-W., Xu, L.-C., Zhang, C., Zhang, S.-Q. & Hong, X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat. Commun. 14, 3569 (2023).
Knowles, W. S. Asymmetric hydrogenations (Nobel lecture). Angew. Chem. Int. Ed. 41, 1998–2007 (2002).
Eberhardt, L., Armspach, D., Harrowfield, J. & Matt, D. BINOL-derived phosphoramidites in asymmetric hydrogenation: can the presence of a functionality in the amino group influence the catalytic outcome? Chem. Soc. Rev. 37, 839–864 (2008).
Verendel, J. J., Pàmies, O., Diéguez, M. & Andersson, P. G. Asymmetric hydrogenation of olefins using chiral Crabtree-type catalysts: scope and limitations. Chem. Rev. 114, 2130–2169 (2014).
Zhang, Z., Butt, N. A. & Zhang, W. Asymmetric hydrogenation of nonaromatic cyclic substrates. Chem. Rev. 116, 14769–14827 (2016).
Massaro, L., Zheng, J., Margarita, C. & Andersson, P. G. Enantioconvergent and enantiodivergent catalytic hydrogenation of isomeric olefins. Chem. Soc. Rev. 49, 2504–2522 (2020).
Janssen-Müller, D., Schlepphorst, C. & Glorius, F. Privileged chiral N-heterocyclic carbene ligands for asymmetric transition-metal catalysis. Chem. Soc. Rev. 46, 4845–4854 (2017).
Wen, J., Wang, F. & Zhang, X. Asymmetric hydrogenation catalyzed by first-row transition metal complexes. Chem. Soc. Rev. 50, 3211–3237 (2021).
Wang, Q. et al. Rhodium-catalyzed enantioselective hydrogenation of tetrasubstituted α-acetoxy β-enamido esters: a new approach to chiral α-hydroxyl-β-amino acid derivatives. J. Am. Chem. Soc. 136, 16120–16123 (2014).
Yoshikai, Y., Mizuno, T., Nemoto, S. & Kusuhara, H. Difficulty in chirality recognition for transformer architectures learning chemical structures from string representations. Nat. Commun. 15, 1197 (2024).
Sigman, M. S., Harper, K. C., Bess, E. N. & Milo, A. The development of multidimensional analysis tools for asymmetric catalysis and beyond. Acc. Chem. Res. 49, 1292–1301 (2016).
Reid, J. P. & Sigman, M. S. Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts. Nat. Rev. Chem. 2, 290–305 (2018).
Santiago, C. B., Guo, J.-Y. & Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 9, 2398–2412 (2018).
Gallarati, S. et al. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 12, 6879–6889 (2021).
Ravasco, J. M. J. M. & Coelho, J. A. S. Predictive multivariate models for bioorthogonal inverse-electron demand Diels–Alder reactions. J. Am. Chem. Soc. 142, 4235–4241 (2020).
Singh, S. et al. A unified machine-learning protocol for asymmetric catalysis as a proof of concept demonstration using asymmetric hydrogenation. Proc. Natl Acad. Sci. USA 117, 1339–1345 (2020).
Xu, L.-C. et al. Enantioselectivity prediction of pallada-electrocatalysed C–H activation using transition state knowledge in machine learning. Nat. Synth. 2, 321–330 (2023).
Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).
Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).
Xu, L.-C. et al. Towards data-driven design of asymmetric hydrogenation of olefins: database and hierarchical learning. Angew. Chem. Int. Ed. 60, 22804–22811 (2021).
Moskal, M., Beker, W., Szymkuć, S. & Grzybowski, B. A. Scaffold-directed face selectivity machine-learned from vectors of non-covalent interactions. Angew. Chem. Int. Ed. 60, 15230–15235 (2021).
Cheng, L. et al. Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation. Zenodo https://doi.org/10.5281/zenodo.17346605 (2025).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 4768–4777 (NeurIPS, 2017).
Lundberg, S. M., Erion, G. G. & Lee, S.-I. Consistent individualized feature attribution for tree ensembles. Preprint at https://doi.org/10.48550/arXiv.1802.03888 (2019).
Heberle, H., Zhao, L., Schmidt, S., Wolf, T. & Heinrich, J. XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J. Cheminform. 15, 2 (2023).
Landis, C. R. & Halpern, J. Asymmetric hydrogenation of methyl (Z)-α-acetamidocinnamate catalyzed by [1,2-bis((phenyl-o-anisyl)phosphino)ethane]rhodium(I): kinetics, mechanism and origin of enantioselection. J. Am. Chem. Soc. 109, 1746–1754 (1987).
Mohar, B. & Stephan, M. Practical enantioselective hydrogenation of α-aryl- and α-carboxyamidoethylenes by rhodium(I)-1,2-bis[(o-tert-butoxyphenyl)(phenyl)phosphino]ethane. Adv. Synth. Catal. 355, 594–600 (2013).
Li, C. et al. Stereoelectronic effects in ligand design: enantioselective rhodium-catalyzed hydrogenation of aliphatic cyclic tetrasubstituted enamides and concise synthesis of (R)-tofacitinib. Angew. Chem. Int. Ed. 58, 13573–13583 (2019).
RDKit: open-source chemoinformatics and machine learning. RDKit https://rdkit.org/ (accessed 15 July 2024).
Ahmad, W., Simon, E., Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa-2: towards chemical foundation models. Preprint at https://doi.org/10.48550/arXiv.2209.01712 (2022).
Kim, Y. Convolutional neural networks for sentence classification. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Moschitti, A., Pang, B. & Daelemans, W.) 1746–1751 (EMNLP, 2014).
Wang, S., Huang, M. & Deng, Z. Densely connected CNN with multi-scale feature attention for text classification. In Proc. 27th International Joint Conference on Artificial Intelligence, 4468–4474 (IJCAI, 2018).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 6000–6010 (NeurIPS, 2017).
Nam, J. & Kim, J. Linking the neural machine translation and the prediction of organic chemistry reactions. Preprint at https://doi.org/10.48550/arXiv.1612.09529 (2016).
Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. “Found in translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).
Gehring, J., Auli, M., Grangier, D., Yarats, D. & Dauphin, Y. N. Convolutional sequence to sequence learning. In Proc. 34th International Conference on Machine Learning, Vol. 70 (eds Precup, D. & Teh, Y. W.) 1243–1252 (PMLR, 2017).
Zhang, X., Zhou, X., Lin, M. & Sun, J. ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 6848–6856 (IEEE, 2018).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016, 770–778 (IEEE, 2016).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In Proc. 14th European Conference on Computer Vision (eds Leibe, B., Matas, J., Sebe, N. & Welling, M.) 630–645 (ECCV, 2016).
Acknowledgements
We gratefully acknowledge financial support from the Shenzhen Medical Research Fund (grant no. B2402042 to B.Z.), the National Natural Science Foundation of China (grant nos. 22274069 and 22304070 to B.Z.), the Shenzhen Science and Technology Program (grant nos. JCYJ20240813094504007 to B.Z. and JCYJ20210324104007020 to P.-L.S.), the Guangdong Provincial Key Laboratory of Advanced Biomaterials (grant no. 2022B1212010003 to B.Z.), the Shenzhen Intelligent Medical Engineering Laboratory (grant no. ZDSYS20200811144003009 to B.Z.), Macao SAR (file nos. 0046/2025/RIB1 and 0002/2024/TFP to G.X.), UM’s research fund (file nos. MYRG-GRG2023-00065-IAPME-UMDF and MYRG-GRG2024-00156-IAPME to G.X.) and the Natural Science Foundation of China (grant nos. 62175268, 62288102 and 22405010 to G.X.). This work was supported by the Center for Computational Science and Engineering at Southern University of Science and Technology. We acknowledge the assistance of SUSTech Core Research Facilities.
Author information
Contributions
B.Z., L.C., P.-L.S. and G.X. conceived the project and designed the machine learning framework. L.C. performed the machine learning training. J. Lv, H.X. and Y.S. trained the baseline models. J.Y., Z.X. and M.L. performed the experimental verifications. G.W., S.Z., J. Li, Z.J. and X.T. contributed to data curation. All authors were involved in discussions and in writing the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Xin Hong and Seonah Kim for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–22, Discussion and Tables 1–7.
Source data
Source Data Fig. 2
Source data (see Fig_2.zip file: dataset information in csv format).
Source Data Fig. 3
Source data (see Fig_3.zip file: model prediction accuracy data in csv format).
Source Data Fig. 4
Source data (see Fig_4.zip file: regression data in csv format).
Source Data Fig. 5
Source data (see Fig_5.zip file: regression data in csv format).
Source Data Fig. 6
Source data (see Fig_6.zip file, folder Data_for_Fig6: atom-level contribution data generated by SHAP in csv format).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cheng, L., Shao, PL., Lv, J. et al. Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation. Nat Comput Sci (2025). https://doi.org/10.1038/s43588-025-00920-8
DOI: https://doi.org/10.1038/s43588-025-00920-8


