Generating 3D small binding molecules using shape-conditioned diffusion models with guidance

Chen, Ziqi; Peng, Bo; Zhai, Tianhua; Adu-Ampratwum, Daniel; Ning, Xia

doi:10.1038/s42256-025-01030-w

Article
Published: 12 May 2025

Nature Machine Intelligence volume 7, pages 758–770 (2025)

A preprint version of the article is available at arXiv.

Abstract

Drug development is a critical but notoriously resource- and time-consuming process. Traditional methods, such as high-throughput screening, rely on opportunistic trial and error and cannot ensure optimal precision design. To overcome these challenges, generative artificial intelligence methods have emerged to directly design molecules with desired properties. Here we develop a generative artificial intelligence method, DiffSMol, for drug discovery that generates 3D small binding molecules based on known ligand shapes. DiffSMol encapsulates ligand shape details within pretrained, expressive shape embeddings and generates binding molecules through a diffusion model. DiffSMol further modifies the generated 3D structures iteratively, using shape guidance to better resemble ligand shapes and protein pocket guidance to optimize binding affinities. We show that DiffSMol outperforms state-of-the-art methods on benchmark datasets. When generating binding molecules resembling ligand shapes, DiffSMol with shape guidance achieves a success rate of 61.4%, substantially outperforming the best baseline (11.2%), while producing molecules with de novo graph structures. DiffSMol with pocket guidance also outperforms the best baseline in binding affinities by 13.2%, and by 17.7% when combined with shape guidance. Case studies for two critical drug targets demonstrate very favourable physicochemical and pharmacokinetic properties of the generated molecules, highlighting the potential of DiffSMol in developing promising drug candidates.
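The idea of iteratively nudging generated 3D structures towards a ligand shape can be illustrated with a toy sketch. This is not the DiffSMol implementation: the learned denoiser and the shape embeddings are omitted entirely, the guidance is assumed to be a simple pull towards the nearest point of a reference point cloud, and every name and parameter below (`toy_guided_sampling`, `guidance`, `n_steps`) is hypothetical.

```python
import math
import random


def nearest(point, shape_points):
    """Return the shape point closest to `point` (Euclidean distance)."""
    return min(shape_points, key=lambda s: math.dist(point, s))


def mean_shape_distance(atoms, shape_points):
    """Average distance from each atom to its nearest shape point."""
    return sum(math.dist(a, nearest(a, shape_points)) for a in atoms) / len(atoms)


def toy_guided_sampling(shape_points, n_atoms=8, n_steps=50, guidance=0.2, seed=0):
    """Toy reverse process: start from noise, then repeatedly apply a
    guidance pull towards the shape plus a shrinking random perturbation.
    A real diffusion model would also apply a learned denoising step here."""
    rng = random.Random(seed)
    # Initial "noisy" atom positions drawn from a wide Gaussian.
    atoms = [[rng.gauss(0.0, 3.0) for _ in range(3)] for _ in range(n_atoms)]
    for t in range(n_steps, 0, -1):
        noise_scale = 0.1 * t / n_steps  # noise decays as t approaches 0
        updated = []
        for atom in atoms:
            target = nearest(atom, shape_points)
            # Guidance: move a fraction of the way towards the nearest
            # shape point, then add the step's random perturbation.
            updated.append([
                x + guidance * (s - x) + rng.gauss(0.0, noise_scale)
                for x, s in zip(atom, target)
            ])
        atoms = updated
    return atoms
```

Calling `toy_guided_sampling(points)` on any list of 3D points returns atom positions that sit much closer to that point cloud than the initial noise does. In a guided diffusion model the strength of such a pull (here `guidance`) would trade off fidelity to the reference shape against the diversity of the generated structures.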

**Fig. 2: Generated drug candidates for CDK6.**

Data availability

The MOSES dataset is available via GitHub at https://github.com/molecularsets/moses, and the CrossDocked2020 dataset is available via GitHub at https://github.com/gnina/models/tree/master/data/CrossDocked2020. Additional data, including our generated molecules and trained models, are publicly available via GitHub at https://github.com/ninglab/DiffSMol.

Code availability

The code for DiffSMol is publicly available via GitHub at https://github.com/ninglab/DiffSMol.

Acknowledgements

This project was made possible, in part, by support from the National Science Foundation grant no. IIS-2133650 (X.N. and Z.C.), the National Library of Medicine grant no. 1R01LM014385 (X.N. and D.A.-A.) and the National Center for Advancing Translational Sciences grant no. UM1TR004548 (X.N.). Any opinions, findings, conclusions and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies. We thank P. J. Lawrence, F. N. Baker and V. Dey for their constructive comments.

Author information

Authors and Affiliations

Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Ziqi Chen, Bo Peng & Xia Ning
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Tianhua Zhai
Medicinal Chemistry and Pharmacognosy, The Ohio State University, Columbus, OH, USA
Daniel Adu-Ampratwum & Xia Ning
Biomedical Informatics, The Ohio State University, Columbus, OH, USA
Xia Ning
Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
Xia Ning

Contributions

X.N. conceived the research. X.N. obtained funding for the research. Z.C. and X.N. designed the research. Z.C. and X.N. conducted the research, including data curation, formal analysis, methodology design and implementation, result analysis and visualization. Z.C., B.P. and X.N. drafted the original paper. T.Z. provided comments on case studies for protein targets. D.A.-A. provided comments on case studies for low-quality examples. Z.C., B.P. and X.N. conducted the paper editing and revision. All authors reviewed the final paper.

Corresponding author

Correspondence to Xia Ning.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Auro Patnaik, Zhenqiao Song, Marinka Zitnik and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–19, discussion, Tables 1–21, Figs. 1–17, results and Algorithms 1–3.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Z., Peng, B., Zhai, T. et al. Generating 3D small binding molecules using shape-conditioned diffusion models with guidance. Nat Mach Intell 7, 758–770 (2025). https://doi.org/10.1038/s42256-025-01030-w

Received: 28 April 2024
Accepted: 31 March 2025
Published: 12 May 2025
Version of record: 12 May 2025
Issue date: May 2025
DOI: https://doi.org/10.1038/s42256-025-01030-w
