  • Brief Communication
SynGFN: learning across chemical space with generative flow-based molecular discovery

Abstract

In recent years, artificial intelligence has advanced the design–make–test–analyze cycle, transforming molecular discovery. Despite these advances, the compartmentalized approach to computer-aided molecular design and synthesis remains a critical bottleneck, limiting further optimization of the design–make–test–analyze cycle. To address this bottleneck, we introduce SynGFN, which models molecular design as a cascade of simulated chemical reactions, enabling the assembly of molecules from synthesizable building blocks. SynGFN features two key ingredients: (1) a hierarchically pretrained policy network that accelerates learning across diverse distributions of desirable molecules in chemical spaces, and (2) a multifidelity acquisition framework that alleviates the cost of reward evaluations. Together, these developments enable SynGFN to explore a chemical space up to an order of magnitude larger (measured in terms of #Circles) than that of other synthesis-aware generative models, while identifying the most diverse, synthesizable and high-performance molecules. We demonstrate SynGFN’s potential impact by designing inhibitors for GluN1/GluN3A, a therapeutic target for neuropsychiatric disorders.

Fig. 1: Overview of the SynGFN algorithm.

Data availability

Source data are provided with this paper. The building block dataset is available via GitHub at https://github.com/ChemloverYuchen/SynGFN and via Zenodo at https://zenodo.org/records/17034917 (ref. 49).

Code availability

SynGFN is available via GitHub at https://github.com/ChemloverYuchen/SynGFN and via Zenodo at https://zenodo.org/records/17034917 (ref. 49).

References

  1. Ghiandoni, G. M., Evertsson, E., Riley, D. J., Tyrchan, C. & Rathi, P. C. Augmenting DMTA using predictive AI modelling at AstraZeneca. Drug Discov. Today 29, 103945 (2024).

  2. Stach, E. et al. Autonomous experimentation systems for materials development: a community perspective. Matter 4, 2702–2726 (2021).

  3. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  4. Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).

  5. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).

  6. Stanley, M. & Segler, M. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Curr. Opin. Struct. Biol. 82, 102658 (2023).

  7. Hassen, A. K. et al. Generate what you can make: achieving in-house synthesizability with readily available resources in de novo drug design. J. Cheminform. 17, 41 (2025).

  8. Walters, W. P. Virtual chemical libraries: miniperspective. J. Med. Chem. 62, 1116–1124 (2019).

  9. Swanson, K. et al. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics. Nat. Mach. Intell. 6, 338–353 (2024).

  10. Bengio, E., Jain, M., Korablyov, M., Precup, D. & Bengio, Y. Flow network based generative models for non-iterative diverse candidate generation. In Proc. 35th International Conference on Neural Information Processing Systems Vol. 34, 27381–27394 (Curran Associates, 2024).

  11. Zhu, Y. et al. Sample-efficient multi-objective molecular optimization with GFlowNets. In Proc. Advances in Neural Information Processing Systems (eds Oh, A. et al.) Vol. 36, 79667–79684 (Curran Associates, 2023).

  12. Zhang, B. et al. Phenyl fused ring compounds and their applications in the central nervous system. CN119371411A, China National Intellectual Property Administration (CNIPA) (2025).

  13. Enamine. Enamine https://enamine.net (2023).

  14. Thalji, R. K. et al. Discovery of 1-(1,3,5-triazin-2-yl)piperidine-4-carboxamides as inhibitors of soluble epoxide hydrolase. Bioorg. Med. Chem. Lett. 23, 3584–3588 (2013).

  15. Xie, Y., Xu, Z., Ma, J. & Mei, Q. How much space has been explored? Measuring the chemical space covered by databases and machine-generated molecules. In Proc. 11th International Conference on Learning Representations (OpenReview.net, 2023).

  16. Cretu, M. et al. SynFlowNet: design of diverse and novel molecules with synthesis constraints. In Proc. 13th International Conference on Learning Representations (ICLR 2025).

  17. Loeffler, H. H. et al. Reinvent 4: modern AI-driven generative molecule design. J. Cheminform. 16, 20 (2024).

  18. Browne, C. B. et al. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4, 1–43 (2012).

  19. Fancelli, D. et al. Potent and selective Aurora inhibitors identified by the expansion of a novel scaffold for protein kinase inhibition. J. Med. Chem. 48, 3080–3084 (2005).

  20. Wang, S. et al. Structure of the D2 dopamine receptor bound to the atypical antipsychotic drug risperidone. Nature 555, 269–273 (2018).

  21. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

  22. Ghose, A. K., Viswanadhan, V. N. & Wendoloski, J. J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1, 55–68 (1999).

  23. Wang, J. et al. Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning. Nat. Mach. Intell. 3, 914–922 (2021).

  24. Maurer, T. S., Edwards, M., Hepworth, D., Verhoest, P. & Allerton, C. M. N. Designing small molecules for therapeutic success: a contemporary perspective. Drug Discov. Today 27, 538–546 (2022).

  25. Podda, M., Bacciu, D. & Micheli, A. A deep generative model for fragment-based molecule generation. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (eds Chiappa, S. & Calandra, R.) Vol. 108 2240–2250 (PMLR, 2020).

  26. Mohammadi, S., O’Dowd, B., Paulitz-Erdmann, C. & Goerlitz, L. Penalized variational autoencoder for molecular design. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv.7977131.v1 (2019).

  27. Hopkins, A. L., Keserü, G. M., Leeson, P. D., Rees, D. C. & Reynolds, C. H. The role of ligand efficiency metrics in drug discovery. Nat. Rev. Drug Discov. 13, 105–121 (2014).

  28. Bossi, S., Pizzamiglio, L. & Paoletti, P. Excitatory GluN1/GluN3A glycine receptors (eGlyRs) in brain signaling. Trends Neurosci. 46, 667–681 (2023).

  29. Paoletti, P., Bellone, C. & Zhou, Q. NMDA receptor subunit diversity: impact on receptor properties, synaptic plasticity and disease. Nat. Rev. Neurosci. 14, 383–400 (2013).

  30. Zhu, W., Shenoy, A., Kundrotas, P. & Elofsson, A. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes. Bioinformatics 39, btad424 (2023).

  31. Zhu, Z. et al. Negative allosteric modulation of GluN1/GluN3 NMDA receptors. Neuropharmacology 176, 108117 (2020).

  32. Zeng, Y. et al. Identification of a subtype-selective allosteric inhibitor of GluN1/GluN3 NMDA receptors. Front. Pharmacol. 13, 888308 (2022).

  33. Dalke, A. The chemfp project. J. Cheminform. 11, 76 (2019).

  34. Hartenfeller, M. et al. A collection of robust organic synthesis reactions for in silico molecule design. J. Chem. Inf. Model. 51, 3093–3098 (2011).

  35. Button, A., Merk, D., Hiss, J. A. & Schneider, G. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat. Mach. Intell. 1, 307–315 (2019).

  36. Roughley, S. D. & Jordan, A. M. The Medicinal Chemist’s Toolbox: an analysis of reactions used in the pursuit of drug candidates. J. Med. Chem. 54, 3451–3479 (2011).

  37. Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).

  38. Yin, X. et al. CODD-Pred: a web server for efficient target identification and bioactivity prediction of small molecules. J. Chem. Inf. Model. 63, 6169–6176 (2023).

  39. Huang, K. et al. Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. In Proc. Neural Information Processing Systems Track on Datasets and Benchmarks (eds Vanschoren, J. & Yeung, S.) Vol. 1 (Curran Associates, 2021).

  40. GOSTAR. https://www.gostardb.com/

  41. Heid, E. et al. Chemprop: a machine learning package for chemical property prediction. J. Chem. Inf. Model. 64, 9–17 (2024).

  42. Sun, J. et al. ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics. J. Cheminform. 9, 17 (2017).

  43. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).

  44. Mehta, N. V. & Degani, M. S. The expanding repertoire of covalent warheads for drug discovery. Drug Discov. Today 28, 103799 (2023).

  45. Malkin, N., Jain, M., Bengio, E., Sun, C. & Bengio, Y. Trajectory balance: improved credit assignment in GFlowNets. In Proc. Advances in Neural Information Processing Systems (eds Koyejo, S. et al.) Vol. 35, 5955–5967 (Curran Associates, 2022).

  46. Hawkins, P. C. D., Skillman, A. G. & Nicholls, A. Comparison of shape-matching and docking as virtual screening tools. J. Med. Chem. 50, 74–82 (2007).

  47. Hawkins, P. C. D., Skillman, A. G., Warren, G. L., Ellingson, B. A. & Stahl, M. T. Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. J. Chem. Inf. Model. 50, 572–584 (2010).

  48. Kearnes, S. & Pande, V. ROCS-derived features for virtual screening. J. Comput. Aided Mol. Des. 30, 609–617 (2016).

  49. Zhu, Y. ChemloverYuchen/SynGFN: SynGFN-1.0. Zenodo https://doi.org/10.5281/ZENODO.17034917 (2025).

Acknowledgements

This work was financially supported by the National Key Research and Development Program of China (2024YFA1306400), the National Natural Science Foundation of China (grant number 22373085) and the Medical Interdisciplinary Innovation Program 2024, Zhejiang University School of Medicine. We are grateful to the organizer of the Shanghai International Computational Biology Challenge for conducting the bioactivity assays. We also express our gratitude to M. Wang from the First Affiliated Hospital of Shantou University Medical College for his assistance with the OpenEye software.

Author information

Contributions

Y.Z. developed the SynGFN code, analyzed the data and wrote the manuscript. J.C., D.Z., X.W., Y. Li and Y. Liu evaluated and interpreted the results. S.L., Y.K., B.Z. and C.L. conducted wet-lab experimental validation. T.H. and C.-Y.H. conceived and supervised the project, interpreted the results and wrote the manuscript. All authors participated in the discussion and approved the final manuscript.

Corresponding authors

Correspondence to Tingjun Hou or Chang-Yu Hsieh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Longyang Dian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Explanation of the double sub-flow design of SynGFN and the pre-training strategy.

(a) Explanation of the hierarchical action design adopted by SynGFN, illustrating the process of adding a new reactant in a single step. The current state S_t represents the building block or intermediate product. Policy model 1 and Policy model 2 predict actions based on the current state and previous actions. For Action 1, the model predicts a probability for each candidate reaction. For Action 2, the model predicts a probability for each candidate reactant in the same way. The sampled actions correspond to the selected reaction and reactant. (b) Explanation of the flow network behind SynGFN. The flow network is defined as a directed acyclic graph (DAG). Each node represents a state, with intermediate states denoted as x_0, x_1 and terminal states as t_0, t_1. Generating objects in the flow network is analogous to water flowing from a start point to an endpoint. The probability of taking an action at each state corresponds to the flow through a pipe. A flow-matching constraint ensures that the flow entering a state equals the flow leaving it. The flow through each terminal state is constrained to equal its reward R(x). We implement a double sub-flow design for state transitions, representing hierarchical actions. Before training, Policy model 2 is pretrained on a multi-label classification task to recognize and select reactants that match the specified reaction.
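The hierarchical action in panel (a) can be sketched as two consecutive sampling steps. The following is a minimal illustration, not the SynGFN implementation: the reaction and reactant names, the `COMPATIBLE` table and the raw scores are hypothetical stand-ins for the outputs of Policy model 1 and Policy model 2, and the hard compatibility filter is a simplifying assumption in place of the learned reactant matching described in the caption.

```python
import math
import random

# Hypothetical toy table: which reactants can take part in which reaction.
COMPATIBLE = {
    "amide_coupling": ["amine_A", "amine_B"],
    "suzuki_coupling": ["boronic_acid_C"],
}

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_step(reaction_scores, reactant_scores, rng):
    """Sample one hierarchical action: first a reaction, then a reactant.

    reaction_scores: dict reaction -> raw score (stand-in for Policy model 1)
    reactant_scores: dict reactant -> raw score (stand-in for Policy model 2)
    Returns (reaction, reactant, probability of the combined action).
    """
    # Action 1: choose the reaction.
    reactions = list(reaction_scores)
    p_rxn = softmax([reaction_scores[r] for r in reactions])
    i = rng.choices(range(len(reactions)), weights=p_rxn)[0]
    reaction = reactions[i]

    # Action 2: choose a reactant, conditioned on the chosen reaction.
    # Assumption: only reactants listed as compatible are scored.
    candidates = COMPATIBLE[reaction]
    p_react = softmax([reactant_scores[c] for c in candidates])
    j = rng.choices(range(len(candidates)), weights=p_react)[0]

    # The step probability is the product of the two sub-action probabilities.
    return reaction, candidates[j], p_rxn[i] * p_react[j]
```

Under these assumptions, the probability of a full step factorizes into the probability of the reaction times the probability of the reactant given that reaction, which is one way to read the double sub-flow design.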

Extended Data Fig. 2 Evaluation of SynGFN as a chemical space search algorithm.

(a) Illustration of the comparison between SynGFN and virtual screening. We enumerated the virtual chemical space constructed from the smallest reactant library using two-step reactions (a chemical space of 300 million molecules) to compare the search efficiency of SynGFN with that of traditional virtual screening. (b) Comparison of chemical space exploration coverage between the top 10,000 molecules out of 1 million screened by virtual screening and the top 10,000 molecules out of 100,000 searched by SynGFN. We considered how #Circles changes under different thresholds t and separately list the #Circles results for t = 0.7, 0.75 and 0.8. (c) A t-SNE visualization of the top 10,000 molecules out of 1 million screened by virtual screening and the top 10,000 molecules out of 100,000 searched by SynGFN. (d) Comparison of QSAR activity score distributions between 1 million molecules screened by virtual screening and 100,000 molecules searched by SynGFN. (e) Evaluation of search efficiency between virtual screening and SynGFN. The number of hits under different QSAR activity score thresholds was calculated. To ensure fairness, the number of molecules screened by virtual screening here is also 100,000.
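For readers unfamiliar with #Circles (ref. 15), the sketch below shows one way to estimate it. It assumes that #Circles counts the largest subset of molecules whose pairwise distances all exceed the threshold t, that distance is one minus Tanimoto similarity, and that fingerprints are represented as plain Python sets of on-bits. The greedy pass gives a lower bound on the exact value; the function names are hypothetical and this is not the evaluation code used for the figure.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def greedy_circles(fingerprints, t):
    """Greedy lower bound on #Circles at distance threshold t.

    A molecule is kept as a new circle center only if its distance to
    every previously kept center is strictly greater than t.
    """
    centers = []
    for fp in fingerprints:
        if all(tanimoto_distance(fp, c) > t for c in centers):
            centers.append(fp)
    return len(centers)
```

With this definition, a larger t makes it harder for two molecules to count as distinct circles, so the estimate can only stay the same or decrease as t increases for a fixed ordering of molecules.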

Extended Data Fig. 3 The evaluation of generated molecules.

(a) QED distributions, for three different targets, of the top 1,000 molecules ranked by QSAR activity score among 10,000 sampled molecules (upper) and of all 10,000 sampled molecules (lower), for SynGFN models at four different scales and the baseline models. (b) logP distributions, for three different targets, of the top 1,000 molecules ranked by QSAR activity score among 10,000 sampled molecules (upper) and of all 10,000 sampled molecules (lower), for SynGFN models at four different scales and the baseline models. (c) The t-SNE visualization of the top 1,000 molecules ranked by QSAR activity score among 10,000 molecules sampled by SynGFN models at four different scales and by the baseline models for three different targets, along with the existing active molecules for the corresponding targets.

Extended Data Fig. 4 Evaluation of generated molecules.

(a) Comparison of Ligand Binding Efficiency (LBE) score distributions between SynGFN-XL generated molecules (top 1,000 out of 10,000 samples) and existing active molecules under three different targets. (b) The distribution curves of minimum RMSD values between molecules generated by SynGFN and active inhibitors against different targets. (c) Binding mode of the top-1 molecule generated by SynGFN-XL (based on docking scores). The first row shows the position and orientation of the generated molecule in the active pocket of the corresponding protein. The second row shows the interactions between the generated molecule and the corresponding protein (at 3.0 Å cut-off).

Extended Data Fig. 5 SynGFN has substantially accelerated the drug DMTA cycle.

(a) Chemical structures of the candidate compounds synthesized and evaluated for biological activity. Each compound is annotated with its Combo Score and IC50 value. The overlays below depict the alignment of the generated molecules with the reference molecule: Gaussian surface overlap is shown with the corresponding Shape Score, and pharmacophore feature alignment is shown with the corresponding Color Score. (b) Comparison between the synthetic routes proposed by SynGFN and the actual synthetic routes used.

Supplementary information

Supplementary Information

Supplementary Figs. 1–50, Tables 1–3, Sections 1–3 and reference.

Reporting Summary

Supplementary Data 1

Source data for supplementary figures.

Source data

Source Data Fig. 1

SMILES of all molecules used for display, as well as the SMILES of reactants in their synthetic routes and the corresponding catalog numbers.

Source Data Extended Data Fig. 2

The molecular library from virtual screening and all molecules searched by SynGFN for comparison, along with their scores.

Source Data Extended Data Fig. 3

All molecules generated by SynGFN and all baseline models for testing and evaluation on three targets.

Source Data Extended Data Fig. 4

Generated molecules of SynGFN on three targets.

Source Data Extended Data Fig. 5

SMILES of the synthesized molecules D1–D10 involved in the wet-lab experiments, their three scoring results and the IC50 experimental values.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, Y., Li, S., Chen, J. et al. SynGFN: learning across chemical space with generative flow-based molecular discovery. Nat Comput Sci 6, 29–38 (2026). https://doi.org/10.1038/s43588-025-00902-w

  • DOI: https://doi.org/10.1038/s43588-025-00902-w
