Abstract
Chemical synthesis, as a foundational methodology in the creation of transformative molecules, exerts substantial influence across diverse sectors from life sciences to materials and energy. Current chemical synthesis practices rely on laborious and costly trial-and-error workflows, underscoring the urgent need for advanced AI assistants. Recently, large language models, typified by GPT-4, have been introduced as an efficient tool to facilitate scientific research. Here we present Chemma, a large language model fully fine-tuned on 1.28 million question-and-answer pairs about reactions, as an assistant to accelerate organic chemistry synthesis. Chemma surpasses the best-known results in multiple chemical tasks, for example, single-step retrosynthesis and yield prediction, which highlights the potential of general artificial intelligence for organic chemistry. By predicting yields across the experimental reaction space, Chemma significantly improves the reaction exploration capability of Bayesian optimization. More importantly, integrated in an active learning framework, Chemma shows strong potential for autonomous experimental exploration and optimization in open reaction spaces. For an unreported Suzuki–Miyaura cross-coupling reaction of cyclic aminoboronates and aryl halides for the synthesis of α-aryl N-heterocycles, the human–artificial intelligence collaboration identified a suitable ligand (tri(1-adamantyl)phosphine) and solvent (1,4-dioxane) within only 15 runs, achieving an isolated yield of 67%. These results reveal that, without quantum-chemical calculations, Chemma can comprehend and extract chemical insights from reaction data, in a manner akin to human experts. This work opens avenues for accelerating organic chemistry synthesis with adapted large language models.
Data availability
All the source data used for model training were from ORD (ref. 42) and USPTO (ref. 44). The HTE data for the Suzuki–Miyaura reaction and the Pd-catalysed Buchwald–Hartwig reaction were collected from refs. 54,61, respectively. Literature data for the Pd-catalysed carbonylation reactions were released by ref. 58. Regioselectivity and enantioselectivity data were collected from refs. 59,60.
Code availability
The source and inference code for Chemma are available via Zenodo at https://doi.org/10.5281/zenodo.15295848 (ref. 67). Chemma is available for free use at https://ai4chem.sjtu.edu.cn/.
References
Mendoza, A., Ishihara, Y. & Baran, P. S. Scalable enantioselective total synthesis of taxanes. Nat. Chem. 4, 21–25 (2012).
Elvira, K. S., i Solvas, X. C., Wootton, R. C. & Demello, A. J. The past, present and potential for microfluidic reactor technology in chemical synthesis. Nat. Chem. 5, 905–915 (2013).
Ball, P. Chemistry: why synthesize? Nature 528, 327–329 (2015).
Newman-Stonebraker, S. H. et al. Univariate classification of phosphine ligation state and reactivity in cross-coupling catalysis. Science 374, 301–308 (2021).
Mikulak-Klucznik, B. et al. Computational planning of the synthesis of complex natural products. Nature 588, 83–88 (2020).
Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 6, 161–169 (2024).
Shen, Y. et al. Automation and computer-assisted planning for chemical synthesis. Nat. Rev. Methods Primers 1, 23 (2021).
Tao, H. et al. Nanoparticle synthesis assisted by machine learning. Nat. Rev. Mater. 6, 701–716 (2021).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).
Angello, N. H. et al. Closed-loop optimization of general reaction conditions for heteroaryl Suzuki–Miyaura coupling. Science 378, 399–405 (2022).
Betinol, I. O., Lai, J., Thakur, S. & Reid, J. P. A data-driven workflow for assigning and predicting generality in asymmetric catalysis. J. Am. Chem. Soc. 145, 12870–12883 (2023).
Rinehart, N. I. et al. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C–N couplings. Science 381, 965–972 (2023).
Granda, J. M., Donina, L., Dragone, V., Long, D.-L. & Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018).
Mehr, S. H. M., Craven, M., Leonov, A. I., Keenan, G. & Cronin, L. A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101–108 (2020).
Rohrbach, S. et al. Digitization and validation of a chemical synthesis literature database in the ChemPU. Science 377, 172–180 (2022).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Toniato, A., Schwaller, P., Cardinale, A., Geluykens, J. & Laino, T. Unassisted noise reduction of chemical reaction datasets. Nat. Mach. Intell. 3, 485–494 (2021).
Achiam, J. et al. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).
Lehr, S. A., Caliskan, A., Liyanage, S. & Banaji, M. R. ChatGPT as research scientist: probing GPT’s capabilities as a research librarian. Proc. Natl Acad. Sci. USA 121, e2404328121 (2024).
Kang, Y. & Kim, J. ChatMOF: an artificial intelligence system for predicting and generating metal–organic frameworks using large language models. Nat. Commun. 15, 4705 (2024).
Dagdelen, J. et al. Structured information extraction from scientific text with large language models. Nat. Commun. 15, 1418 (2024).
Hou, W. & Ji, Z. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. Nat. Methods 21, 1462–1465 (2024).
Zheng, Z. et al. A GPT-4 reticular chemist for guiding MOF discovery. Angew. Chem. Int. Ed. 62, e202311983 (2023).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Canty, R. B. & Abolhasani, M. Reproducibility in automated chemistry laboratories using computer science abstractions. Nat. Synth. 3, 1327–1339 (2024).
Ruan, Y. et al. An automatic end-to-end chemical synthesis development platform powered by large language models. Nat. Commun. 15, 10160 (2024).
Zheng, Z. et al. ChatGPT research group for optimizing the crystallinity of MOFs and COFs. ACS Cent. Sci. 9, 2161–2170 (2023).
Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).
Zheng, Z., Zhang, O., Borgs, C., Chayes, J. T. & Yaghi, O. M. ChatGPT chemistry assistant for text mining and the prediction of MOF synthesis. J. Am. Chem. Soc. 145, 18048–18062 (2023).
Antunes, L. M., Butler, K. T. & Grau-Crespo, R. Crystal structure generation with autoregressive large language modeling. Nat. Commun. 15, 10570 (2024).
Zheng, Z. et al. Integrating machine learning and large language models to advance exploration of electrochemical reactions. Angew. Chem. Int. Ed. 137, e202418074 (2024).
Ramos, M. C., Collison, C. J. & White, A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 16, 2514–2572 (2025).
Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).
Shields, B. J. et al. Bayesian reaction optimization as a tool for chemical synthesis. Nature 590, 89–96 (2021).
Tang, T. et al. Interrogating the mechanistic features of Ni(I)-mediated aryl iodide oxidative addition using electroanalytical and statistical modeling techniques. J. Am. Chem. Soc. 145, 8689–8699 (2023).
Wang, J. Y. et al. Identifying general reaction conditions by bandit optimization. Nature 626, 1025–1033 (2024).
Raghavan, P. et al. Dataset design for building models of chemical reactivity. ACS Cent. Sci. 9, 2196–2204 (2023).
Frey, N. C. et al. Neural scaling of deep chemical models. Nat. Mach. Intell. 5, 1297–1305 (2023).
Kearnes, S. M. et al. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021).
Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).
Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature. PhD thesis, University of Cambridge (2012).
Tu, Z. & Coley, C. W. Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. J. Chem. Inf. Model. 62, 3503–3513 (2022).
Sacha, M. et al. Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. J. Chem. Inf. Model. 61, 3273–3284 (2021).
Seo, S.-W. et al. GTA: graph truncated attention for retrosynthesis. In Proc. AAAI Conference on Artificial Intelligence Vol. 35, 531–539 (AAAI Press, 2021).
Somnath, V. R., Bunne, C., Coley, C., Krause, A. & Barzilay, R. Learning graph models for retrosynthesis prediction. Adv. Neural Inf. Process. Syst. 34, 9405–9415 (2021).
Wang, X. et al. RetroPrime: a diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chem. Eng. J. 420, 129845 (2021).
Wan, Y., Hsieh, C.-Y., Liao, B. & Zhang, S. Retroformer: pushing the limits of end-to-end retrosynthesis transformer. In International Conference on Machine Learning 22475–22490 (PMLR, 2022).
Chen, S. & Jung, Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1, 1612–1620 (2021).
Yao, L. et al. Node-aligned graph-to-graph: elevating template-free deep learning approaches in single-step retrosynthesis. JACS Au 4, 992–1003 (2024).
Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).
Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Li, S.-W., Xu, L.-C., Zhang, C., Zhang, S.-Q. & Hong, X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat. Commun. 14, 3569 (2023).
Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).
Saebi, M. et al. On the use of real-world datasets for reaction yield prediction. Chem. Sci. 14, 4997–5005 (2023).
Li, D.-Z. & Gong, X.-Q. Challenges with literature-derived data in machine learning for yield prediction: a case study on Pd-catalyzed carbonylation reactions. J. Phys. Chem. A 128, 10423–10430 (2024).
Li, X., Zhang, S.-Q., Xu, L.-C. & Hong, X. Predicting regioselectivity in radical C–H functionalization of heterocycles through machine learning. Angew. Chem. Int. Ed. 59, 13253–13259 (2020).
Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).
Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429–434 (2018).
Guo, T. et al. What can large language models do in chemistry? A comprehensive benchmark on eight tasks. Adv. Neural Inf. Process. Syst. 36, 59662–59688 (2023).
Taylor, R. D., MacCoss, M. & Lawson, A. D. Rings in drugs: miniperspective. J. Med. Chem. 57, 5845–5859 (2014).
Ma, X. et al. A general approach to stereospecific cross-coupling reactions of nitrogen-containing stereocenters. Chem 6, 781–791 (2020).
Shu, X., Zhong, D., Lin, Y., Qin, X. & Huo, H. Modular access to chiral α-(hetero)aryl amines via Ni/photoredox-catalyzed enantioselective cross-coupling. J. Am. Chem. Soc. 144, 8797–8806 (2022).
Sarkar, S., Wagulde, S., Jia, X. & Gevorgyan, V. General and selective metal-free radical α-C–H borylation of aliphatic amines. Chem 8, 3096–3108 (2022).
Zhang, Y. et al. Large language models to accelerate organic chemistry synthesis. Zenodo https://doi.org/10.5281/zenodo.15295848 (2025).
Ruiz-Castillo, P. & Buchwald, S. L. Applications of palladium-catalyzed C–N cross-coupling reactions. Chem. Rev. 116, 12564–12649 (2016).
Acknowledgements
We thank K. Ding for valuable discussion on the design of this work and the SJTU AI for Science platform for computing support. This work was jointly supported by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the National Natural Science Foundation of China (62102258) and the Fundamental Research Funds for the Central Universities.
Author information
Contributions
Y.X. and Y.Z. conceived of the research and designed the analyses. Y.Z. designed and implemented the Chemma model. Y.H., S.C., F.Z. and Y.Z. performed the wet experiments. Y.Z., Y.X., R.Y., X.Z., X.L. and K.Z. processed the data and performed the results analyses. M.Y. and J.T. discussed the model design. Y.J. and X.Y. built the computing platform for model training. Y.Z., Y.X. and F.Z. wrote the paper. Y.X., F.Z., Y.J. and X.Y. supervised the research.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Victor Batista and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Distribution of types of reactions in the USPTO-50k and ORD datasets.
(a) Data organization of the USPTO-50k and ORD datasets. All of the reactions in USPTO-50k are from patents in the United States; most of the reactions in ORD are from the literature. (b-c) Power-law fitting of the reactant distribution in USPTO-50k and the catalyst distribution in ORD, respectively, where the light points show the probability density and the dark dashed line shows the ideal power-law fit. (d-e) Bar charts of the fifteen most common reactants in USPTO-50k and catalysts in ORD, respectively. The light color shows the decimal-scale proportion and the dark color shows the log-scale count.
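The caption does not state how the power-law fit in panels (b-c) was obtained. As a minimal sketch, assuming the common approach of ordinary least squares in log-log space, a fit of the form p ≈ c · x^(-alpha) could look like the following. The function name `fit_power_law` and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], p: Sequence[float]) -> Tuple[float, float]:
    """Fit p ~ c * x**(-alpha) by least squares on (log x, log p).

    Returns (alpha, c). Assumes all x and p values are strictly positive.
    """
    log_x = [math.log(v) for v in x]
    log_p = [math.log(v) for v in p]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_p = sum(log_p) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxp = sum((a - mean_x) * (b - mean_p) for a, b in zip(log_x, log_p))
    slope = sxp / sxx                      # slope of the log-log line
    intercept = mean_p - slope * mean_x    # log of the prefactor c
    return -slope, math.exp(intercept)


# Synthetic, noise-free data (illustration only, not from the datasets).
ranks = list(range(1, 51))
density = [2.0 * r ** -1.5 for r in ranks]
alpha, c = fit_power_law(ranks, density)
```

A log-log regression is simple but can be biased on noisy tails; maximum-likelihood estimators are a frequent alternative when the exponent itself is the quantity of interest.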
Extended Data Fig. 2 Description of the three HTE datasets.
(a) Pd-catalysed Buchwald-Hartwig C-N coupling reaction: aryl halides, isoxazole additives, Pd precatalyst, ligands and bases. (b) Suzuki–Miyaura reaction, consisting of the reaction yield measured as a function of boronic acid derivative, aryl halide, ligand, base and solvent. (c) C-H arylation dataset components: ligands, imidazoles, aryl bromides.
Extended Data Fig. 3 Multi-step retrosynthesis routes of five drug molecules predicted by Chemma.
We select five drug molecules as the target products including Osimertinib, Vonoprazan, Mitapivat, Pirtobrutinib, and Ritlecitinib. The reaction centers and leaving groups are highlighted in different colors. ‘Rank-1’ in predicted synthetic routes indicates that our predicted step with the highest probability hits the synthetic route reported in the literature.
Extended Data Fig. 4 Assessment of yield prediction performance by RF model and Chemma on two HTE reactions: Suzuki-Miyaura and Buchwald-Hartwig.
(a-c) The distribution of yields for the Pd-catalysed Suzuki–Miyaura [HTE], Buchwald-Hartwig [ELN] and imidazole C-H arylation [HTE] reactions. (d, e) Test-set performance of the RF model and Chemma under a random split. Predictive accuracy erodes gradually as the training data shrink from 90% to 5% of the full dataset. (f, g) Test-set performance of the RF model and Chemma when training and test sets are split by substrate scope. For the Suzuki-Miyaura reaction, all reactions are split across twenty different substrates. For the Buchwald-Hartwig reaction, all reactions are split across fifteen aryl chloride substrates. We define four evaluation scenarios that differ in their training and testing substrates. For instance, one scenario uses reactions involving eight substrates for training and reactions involving four substrates for testing. (h, i) Test-set performance of the RF model and Chemma when condition sets are held out. For the Suzuki-Miyaura reaction, all reactions are split across eleven different ligands. For the Buchwald-Hartwig reaction, we select four previously tested case scenarios for evaluation, in which the training and testing sets are divided by additive scope. Accordingly, in Case-1, the tested additives are a10, a18, a15, a23 and a4; in Case-2, a11, a9, a1, a17 and a5; in Case-3, a14, a8, a21, a12 and a6; in Case-4, a16, a2, a22, a20 and a3. (j-k) The distribution of free energy barriers for two reactions: chiral phosphoric acid-catalyzed thiol addition and radical C-H functionalization. (l) Chemma’s performance on the two selectivity datasets. Training sets of different proportions are randomly sampled from the full dataset, and 30% of the full dataset is randomly selected as the test set.
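The additive-based split in panels (h, i) can be sketched as follows. This is a minimal illustration, assuming each reaction is stored as a dictionary with a hypothetical `additive` key; only the Case-1 additive indices come from the caption, while the record layout, function name and toy yields are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Held-out additive indices for Case-1, as listed in the caption.
CASE_1_TEST_ADDITIVES = {"a10", "a18", "a15", "a23", "a4"}

Reaction = Dict[str, object]


def split_by_additive(
    reactions: Sequence[Reaction], test_additives: Set[str]
) -> Tuple[List[Reaction], List[Reaction]]:
    """Send every reaction that uses a held-out additive to the test set."""
    train: List[Reaction] = []
    test: List[Reaction] = []
    for rxn in reactions:
        if rxn["additive"] in test_additives:
            test.append(rxn)
        else:
            train.append(rxn)
    return train, test


# Toy records with made-up yields (illustration only).
toy_reactions = [
    {"additive": "a1", "yield": 42.0},
    {"additive": "a10", "yield": 17.5},
    {"additive": "a4", "yield": 63.0},
    {"additive": "a7", "yield": 8.0},
]
train_set, test_set = split_by_additive(toy_reactions, CASE_1_TEST_ADDITIVES)
```

Splitting on the additive rather than on individual reactions ensures that no held-out additive appears in training, which is what makes the evaluation a test of extrapolation rather than interpolation.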
Extended Data Fig. 5 Performance evaluation of Chemma for yield prediction with literature-derived Pd-catalyzed homogeneous carbonylation reactions.
(a) Proportions of the top five most frequently used catalyst precursors, ligands, bases, oxidants, additives, and solvents in the data set. (b) Schematic representation and yield distribution of the Pd-catalyzed carbonylation reaction. (c) Predictive results of Chemma. We assess the generalization capability of Chemma with an out-of-sample testing strategy. The training set contains data with literature IDs < 100, while the testing set exclusively includes data from the original testing set with literature IDs > 100, challenging Chemma to predict reaction performance in chemical spaces it has not previously encountered.
Extended Data Fig. 6 Case study and visualization of selectivity prediction performance.
(a-c) Successful cases: comparison of the predicted and experimentally determined reactive sites of the products. For each reaction, the predicted site is exactly the same as the labeled one. Reactive sites of products are highlighted with a circle. (d) Illustration of a failed prediction. The measured probability of the target product with the N m-meta site is 0.7652, but the predicted result is 0.6016 with the N o-ortho site.
Extended Data Fig. 7 Detailed workflow of active learning framework for reaction exploration and optimization driven by Chemma.
In round 0, Chemma iteratively suggests the next reaction condition, taking into account the feedback from the last wet experiment. After a round of the ‘suggestion-feedback loop’, in rounds 1-N we fine-tune Chemma to adapt it to the reactions. The framework works not only on reaction spaces pre-defined by experts, but also on open reaction spaces where the conditions are not limited to experts’ prior knowledge. Credit: chemistry apparatus icons, Freepik.com.
Extended Data Fig. 8 Illustration of detailed prompts for optimizing reactions, including the Pd-catalysed imidazole C-H arylation reaction, the Pd-catalysed Buchwald-Hartwig reaction and our unreported synthesis of aryl-substituted nitrogen heterocycles.
(a) The detailed prompts used for optimizing the conditions of the imidazole C-H arylation and Buchwald-Hartwig reactions. (b-c) Details of the reaction optimization process. A total of 16 experiments are conducted continuously in a closed-loop fashion. Taking the imidazole C-H arylation reaction as an example, for each optimization process, the initial solvent-base variables are randomly selected from the entire reaction space. We interact with Chemma through zero-shot prompts to acquire a suitable ligand. This initial generation provides a preliminary exploration of the reaction space. Subsequently, we use both the observed yield from the initial exploration and the generated ligand to construct the ICL prompts and ask Chemma to suggest a ‘higher-yield’ ligand. If the suggested ligand has been tested in previous runs, we randomly change the reaction condition variables other than the ligand for the next group of zero-shot interactions and experimental runs.
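The closed-loop procedure described in this caption can be sketched roughly as follows. This is one possible reading under stated assumptions, not the authors' implementation: `suggest_ligand` stands in for a call to Chemma (zero-shot when the context is empty, in-context otherwise) and `run_experiment` stands in for the wet experiment. Both are hypothetical callables, as are the toy ligand names and yields.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Conditions = Dict[str, str]
History = List[Tuple[Conditions, float]]  # (conditions, observed yield)


def optimize(
    solvents: List[str],
    bases: List[str],
    suggest_ligand: Callable[[Conditions, History], str],  # hypothetical model call
    run_experiment: Callable[[Conditions], float],         # hypothetical wet-lab call
    budget: int = 16,
    seed: int = 0,
) -> History:
    rng = random.Random(seed)

    def draw() -> Conditions:
        # Randomly pick the non-ligand variables from the reaction space.
        return {"solvent": rng.choice(solvents), "base": rng.choice(bases)}

    history: History = []      # every experiment run so far
    context: History = []      # in-context examples for the current group
    tested: Set[str] = set()   # ligands already tried in the current group
    fixed = draw()
    while len(history) < budget:
        ligand = suggest_ligand(fixed, context)  # empty context = zero-shot
        if ligand in tested:
            # Repeated suggestion: change the other variables, restart zero-shot.
            fixed = draw()
            context, tested = [], set()
            continue
        conditions = {**fixed, "ligand": ligand}
        record = (conditions, run_experiment(conditions))
        history.append(record)
        context.append(record)
        tested.add(ligand)
    return history


# Toy stand-ins (illustration only; names and yields are made up).
TOY_LIGANDS = ["L1", "L2", "L3"]
TOY_YIELDS = {"L1": 10.0, "L2": 35.0, "L3": 60.0}


def toy_suggest(fixed: Conditions, context: History) -> str:
    return TOY_LIGANDS[min(len(context), len(TOY_LIGANDS) - 1)]


def toy_experiment(conditions: Conditions) -> float:
    return TOY_YIELDS[conditions["ligand"]]


runs = optimize(["dioxane", "toluene"], ["K3PO4", "Cs2CO3"], toy_suggest, toy_experiment)
best_conditions, best_yield = max(runs, key=lambda r: r[1])
```

In this sketch the set of tested ligands is reset whenever the solvent and base are re-drawn, so a repeated suggestion never stalls the loop and the budget of 16 experiments is always consumed; whether the original workflow tracks tested ligands per group or globally is not specified in the caption.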
Extended Data Fig. 9 The new ligands designed by Chemma for the Pd-catalysed Buchwald-Hartwig reaction.
For a reported HTE reaction, we ask Chemma to design new ligands (L1 to L7) that have not been explored. We synthesize L1, L3 and L6 and conduct wet experiments to evaluate the effectiveness of the generated ligands. The yields with L1 and L3 are 6% and 16%, respectively. L6 exhibits no reactivity.
Extended Data Fig. 10 Illustration of detailed prompts for the exploration of the new reaction.
We show the detailed prompts designed as input to Chemma for optimizing the unreported synthesis of aryl-substituted nitrogen heterocycles.
Supplementary information
Supplementary Figs. 1–4, Discussion, and Supplementary Tables 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Han, Y., Chen, S. et al. Large language models to accelerate organic chemistry synthesis. Nat Mach Intell 7, 1010–1022 (2025). https://doi.org/10.1038/s42256-025-01066-y
DOI: https://doi.org/10.1038/s42256-025-01066-y