Abstract
Identifying a catalyst class to optimize the enantioselectivity of a new reaction, either involving a different combination of known substrate types or an entirely unfamiliar class of compounds, is a formidable challenge. Statistical models trained on a reported set of reactions can help predict out-of-sample transformations1–5 but often face two obstacles: (1) only sparse data are available, i.e., limited information on catalyst–substrate interactions, and (2) simple stereoelectronic parameters may fail to describe mechanistically complex transformations.6,7 Here we report a descriptor generation strategy that accounts for changes in the enantiodetermining step with catalyst or substrate identity, allowing us to model reactions involving distinct ligand and substrate types. As validating case studies, we collected data on enantioselective nickel-catalyzed C(sp³)-couplings8 and trained statistical models with features extracted from the transition states and intermediates proposed to be involved in asymmetric induction. These models allow for the optimization of poorly performing examples reported in a substrate scope and are applicable to unseen ligands and reaction partners. This approach offers the opportunity to streamline catalyst and reaction development, quantitatively transferring knowledge learned on sparse data to novel chemical spaces.
Supplementary information
This file contains Supplementary Information, including the following eight sections: 1. General Information; 2. Computational Workflow and Benchmarking; 3. Case Study 1; 4. Case Study 2; 5. Case Study 3; 6. Additional Synthesis and Characterization of Ligands; 7. NMR Spectra; and 8. References.
About this article
Cite this article
Gallarati, S., Bucci, E.M., Doyle, A.G. et al. Transferable enantioselectivity models from sparse data. Nature (2026). https://doi.org/10.1038/s41586-026-10239-7
DOI: https://doi.org/10.1038/s41586-026-10239-7


