

Generating 3D small binding molecules using shape-conditioned diffusion models with guidance

A preprint version of the article is available at arXiv.

Abstract

Drug development is a critical but notoriously resource- and time-consuming process. Traditional methods, such as high-throughput screening, rely on opportunistic trial and error and cannot ensure precise, optimized design. To overcome these challenges, generative artificial intelligence methods have emerged that directly design molecules with desired properties. Here we develop DiffSMol, a generative artificial intelligence method for drug discovery that generates 3D small binding molecules based on known ligand shapes. DiffSMol encapsulates ligand shape details within pretrained, expressive shape embeddings and generates binding molecules through a diffusion model. DiffSMol further refines the generated 3D structures iteratively, using shape guidance so that they better resemble ligand shapes and protein pocket guidance to optimize binding affinities. We show that DiffSMol outperforms state-of-the-art methods on benchmark datasets. When generating binding molecules that resemble ligand shapes, DiffSMol with shape guidance achieves a success rate of 61.4%, substantially outperforming the best baseline (11.2%) while producing molecules with de novo graph structures. DiffSMol with pocket guidance also outperforms the best baseline in binding affinity by 13.2%, and by 17.7% when combined with shape guidance. Case studies on two critical drug targets demonstrate very favourable physicochemical and pharmacokinetic properties of the generated molecules, highlighting the potential of DiffSMol for developing promising drug candidates.
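The abstract states only that shape guidance iteratively adjusts the generated 3D structures so that they better resemble the ligand shape. The following is a hedged illustration of what one such adjustment could look like, not the DiffSMol implementation. It assumes that the ligand shape is represented as a 3D point cloud and that any atom lying farther than a threshold from that cloud is pulled toward the mean of its k nearest shape points. The function name `shape_guidance_step` and the parameters `threshold`, `k` and `sigma` are hypothetical. For the actual method, consult the code at https://github.com/ninglab/DiffSMol.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def shape_guidance_step(
    atoms: Sequence[Point],
    shape_points: Sequence[Point],
    threshold: float = 0.2,
    k: int = 8,
    sigma: float = 0.5,
) -> List[List[float]]:
    """Hypothetical shape-guidance step (an assumption, not DiffSMol's code).

    atoms: generated 3D atom coordinates.
    shape_points: non-empty 3D point cloud sampled from the reference ligand shape.
    threshold: an atom farther than this from every shape point counts as outside.
    k: number of nearest shape points whose mean is used as the pull target.
    sigma: weight kept on the atom's current position (0 = jump to the target).
    """
    guided: List[List[float]] = []
    for atom in atoms:
        # Euclidean distance from this atom to every point of the shape cloud
        dists = [math.dist(atom, p) for p in shape_points]
        if min(dists) <= threshold:
            # Atom already lies within the shape: keep its position unchanged
            guided.append(list(atom))
            continue
        # Indices of the k nearest shape points, closest first
        nearest = sorted(range(len(shape_points)), key=dists.__getitem__)[:k]
        # Mean position of those neighbours, coordinate by coordinate
        target = [
            sum(shape_points[i][d] for i in nearest) / len(nearest) for d in range(3)
        ]
        # Linear interpolation between the current position and the target
        guided.append([sigma * a + (1.0 - sigma) * t for a, t in zip(atom, target)])
    return guided
```

Under this assumption, an atom at (10, 0, 0) with the single nearest shape point (1, 0, 0) and `sigma=0.5` moves to (5.5, 0, 0). Calling the function repeatedly mirrors the iterative refinement described above. Atoms that are already inside the shape are never moved, so the guidance only corrects outliers.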


Fig. 1: Model overview of DiffSMol.
Fig. 2: Generated drug candidates for CDK6.


Data availability

The MOSES dataset is available via GitHub at https://github.com/molecularsets/moses, and the CrossDocked2020 dataset is available via GitHub at https://github.com/gnina/models/tree/master/data/CrossDocked2020. Additional data, including our generated molecules and trained models, are publicly available via GitHub at https://github.com/ninglab/DiffSMol.

Code availability

The code for DiffSMol is publicly available via GitHub at https://github.com/ninglab/DiffSMol.


Acknowledgements

This project was made possible, in part, by support from the National Science Foundation grant no. IIS-2133650 (X.N. and Z.C.), the National Library of Medicine grant no. 1R01LM014385 (X.N. and D.A.-A.) and the National Center for Advancing Translational Sciences grant no. UM1TR004548 (X.N.). Any opinions, findings, conclusions and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies. We thank P. J. Lawrence, F. N. Baker and V. Dey for their constructive comments.

Author information

Contributions

X.N. conceived the research. X.N. obtained funding for the research. Z.C. and X.N. designed the research. Z.C. and X.N. conducted the research, including data curation, formal analysis, methodology design and implementation, result analysis and visualization. Z.C., B.P. and X.N. drafted the original paper. T.Z. provided comments on case studies for protein targets. D.A.-A. provided comments on case studies for low-quality examples. Z.C., B.P. and X.N. conducted the paper editing and revision. All authors reviewed the final paper.

Corresponding author

Correspondence to Xia Ning.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Auro Patnaik, Zhenqiao Song, Marinka Zitnik and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–19, discussion, Tables 1–21, Figs. 1–17, results and Algorithms 1–3.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Z., Peng, B., Zhai, T. et al. Generating 3D small binding molecules using shape-conditioned diffusion models with guidance. Nat Mach Intell 7, 758–770 (2025). https://doi.org/10.1038/s42256-025-01030-w

  • DOI: https://doi.org/10.1038/s42256-025-01030-w
