Abstract
Efficient molecular design methods are crucial for accelerating early-stage drug discovery, potentially saving years of development time and billions of dollars in costs. Current molecular design methods rely on sequence-based or graph-based representations, which emphasize local features such as atoms and bonds but lack a comprehensive depiction of the overall molecular topology. Here we introduce SketchMol, an image-based molecular generation framework that combines visual understanding with molecular design. SketchMol leverages diffusion models and applies a refinement technique called reinforcement learning from molecular experts to improve the generation of viable molecules. It creates molecules through a painting-like approach that simultaneously depicts the local structures and the global layout of each molecule. By representing molecules as images, SketchMol unifies various design tasks within a single image-based framework: de novo design becomes sketching new molecular images, whereas editing tasks become filling in partially drawn images. Through extensive experiments, we demonstrate that SketchMol effectively handles a variety of molecular design tasks.
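The abstract frames molecule editing as filling in a partially drawn image with a diffusion model. As a rough illustration of that general idea only, the Python sketch below shows DDPM-style forward noising (Ho et al.) and a RePaint-style merge step, a common way to condition a diffusion sampler on pixels that are already known. It is not SketchMol's implementation: the function names `linear_alpha_bars`, `q_sample` and `inpaint_merge`, the linear noise schedule, the [-1, 1] pixel scaling and the 64 by 64 stand-in arrays are all assumptions made for illustration, and the trained denoiser that would produce the intermediate sample is omitted.

```python
import numpy as np


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> np.ndarray:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule.

    The 1e-4 to 0.02 schedule over 1000 steps follows the DDPM paper;
    it is an assumption here, not a documented SketchMol setting.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Forward diffusion: draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a_bar = alpha_bars[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


def inpaint_merge(x_generated: np.ndarray, x_known0: np.ndarray, mask: np.ndarray,
                  t: int, alpha_bars: np.ndarray,
                  rng: np.random.Generator) -> np.ndarray:
    """RePaint-style merge at timestep t (hypothetical helper).

    mask == 1 marks pixels already drawn in the partial image x_known0;
    mask == 0 marks pixels the model should fill in. The known region is
    re-noised to timestep t so it matches the noise level of x_generated,
    the denoiser's current sample (the denoiser itself is not shown).
    """
    x_known_t = q_sample(x_known0, t, alpha_bars, rng)
    return mask * x_known_t + (1.0 - mask) * x_generated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = linear_alpha_bars()
    partial = np.ones((64, 64))            # stand-in partial drawing, pixels in [-1, 1]
    mask = np.zeros((64, 64))
    mask[:, :32] = 1.0                     # keep the left half, fill in the right half
    x_t = rng.standard_normal((64, 64))    # stand-in for a denoiser's intermediate sample
    merged = inpaint_merge(x_t, partial, mask, t=10, alpha_bars=alpha_bars, rng=rng)
    print(merged.shape)
```

Re-noising the known region to the current timestep, rather than pasting in the clean partial drawing, keeps both regions at the same noise level, so the denoiser sees a consistent input at every sampling step.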
Data availability
The data used in this study are available via Figshare (https://doi.org/10.6084/m9.figshare.27210429.v1; ref. 64).
Code availability
The source code for SketchMol is available via GitHub (https://github.com/WangZiXubiubiu/SketchMol-v1) and Zenodo (https://doi.org/10.5281/zenodo.13937534; ref. 65).
References
Scannell, J. W., Blanckley, A., Boldon, H. & Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200 (2012).
von Delft, A. et al. Accelerating antiviral drug discovery: lessons from COVID-19. Nat. Rev. Drug Discov. 22, 585–603 (2023).
Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4, 211–221 (2022).
Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational methods in drug discovery. Pharmacol. Rev. 66, 334–395 (2014).
Tong, X. et al. Generative models for de novo drug design. J. Med. Chem. 64, 14011–14027 (2021).
Zeng, X. et al. Deep generative molecular design reshapes drug discovery. Cell Rep. Med. 3, 100794–100802 (2022).
Bian, Y. & Xie, X.-Q. Generative chemistry: drug discovery with deep learning generative models. J. Mol. Model. 27, 71–88 (2021).
Bagal, V., Aggarwal, R., Vinod, P. & Priyakumar, U. D. MolGPT: molecular generation using a transformer-decoder model. J. Chem. Inf. Model. 62, 2064–2076 (2021).
Wu, J.-N. et al. t-SMILES: a fragment-based molecular representation framework for de novo ligand design. Nat. Commun. 15, 4993 (2024).
Du, Y. et al. Machine learning-aided generative molecular design. Nat. Mach. Intell. 6, 589–604 (2024).
Li, Y. et al. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor. Nat. Commun. 13, 6891 (2022).
Liu, Q., Allamanis, M., Brockschmidt, M. & Gaunt, A. L. Constrained graph variational autoencoders for molecule design. Adv. Neural Inf. Process. Syst. 31, 7806–7815 (2018).
Jin, W., Barzilay, R. & Jaakkola, T. S. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning 2328–2337 (PMLR, 2018).
Jin, W., Barzilay, R. & Jaakkola, T. S. Hierarchical generation of molecular graphs using structural motifs. In Proc. 37th International Conference on Machine Learning 4839–4848 (PMLR, 2020).
Maziarz, K. et al. Learning to extend molecular scaffolds with structural motifs. In The Tenth International Conference on Learning Representations 1–22 (ICLR, 2022).
Vignac, C. et al. DiGress: discrete denoising diffusion for graph generation. In The Eleventh International Conference on Learning Representations 1–22 (ICLR, 2023).
Zhu, Y. et al. A survey on deep graph generation: methods and applications. In Proc. First Learning on Graphs Conference 47:1–47:21 (PMLR, 2022).
Vong, W. K., Wang, W., Orhan, A. E. & Lake, B. M. Grounded language acquisition through the eyes and ears of a single child. Science 383, 504–511 (2024).
Cheng, S. et al. EgoThink: evaluating first-person perspective thinking capability of vision-language models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition 14291–14302 (IEEE, 2024).
Song, W. et al. HOIAnimator: generating text-prompt human-object animations using novel perceptive diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition 811–820 (IEEE, 2024).
Fang, C., Hu, X., Luo, K. & Tan, P. Ctrl-Room: controllable text-to-3D room meshes generation with layout constraints. Preprint at https://arxiv.org/abs/2310.03602 (2023).
Chen, W., Gu, T., Xu, Y. & Chen, A. Magic clothing: controllable garment-driven image synthesis. In Proc. 32nd ACM International Conference on Multimedia 6939–6948 (Association for Computing Machinery, 2024).
Schneider, G. & Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discov. 4, 649–663 (2005).
Nicolaou, C. A., Brown, N. & Pattichis, C. S. Molecular optimization using computational multi-objective methods. Curr. Opin. Drug Discov. Dev. 10, 316–324 (2007).
Alizadeh, S. R. & Ebrahimzadeh, M. A. Quercetin derivatives: drug design, development, and biological activities, a review. Eur. J. Med. Chem. 229, 114068 (2022).
Hao, Y. et al. A review of the design and modification of lactoferricins and their derivatives. Biometals 31, 331–341 (2018).
Imrie, F., Bradley, A. R., van der Schaar, M. & Deane, C. M. Deep generative models for 3D linker design. J. Chem. Inf. Model. 60, 1983–1995 (2020).
Huang, Y., Peng, X., Ma, J. & Zhang, M. 3DLinker: an E(3) equivariant variational autoencoder for molecular linker design. In International Conference on Machine Learning 9280–9294 (PMLR, 2022).
Elharrouss, O., Almaadeed, N., Al-Maadeed, S. & Akbari, Y. Image inpainting: a review. Neural Process. Lett. 51, 2007–2028 (2020).
Tschannen, M., Bachem, O. & Lucic, M. Recent advances in autoencoder-based representation learning. Preprint at https://arxiv.org/abs/1812.05069 (2018).
Huang, H. et al. UNet 3+: a full-scale connected UNet for medical image segmentation. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing 1055–1059 (IEEE, 2020).
Croitoru, F.-A., Hondru, V., Ionescu, R. T. & Shah, M. Diffusion models in vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 10850–10869 (2023).
Uesato, J. et al. Solving math word problems with process- and outcome-based feedback. Preprint at https://arxiv.org/abs/2211.14275 (2022).
Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2021).
Karras, T., Aila, T., Laine, S. & Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In 6th International Conference on Learning Representations 1–26 (ICLR, 2018).
van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. Adv. Neural Inf. Process. Syst. 30, 6306–6315 (2017).
Esser, P., Rombach, R. & Ommer, B. Taming transformers for high-resolution image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition 12873–12883 (IEEE, 2021).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10684–10695 (IEEE, 2022).
Yang, L. et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput. Surv. 56, 105 (2023).
Peng, X. & Zhu, F. Hitting stride by degrees: fine grained molecular generation via diffusion model. Expert Syst. Appl. 244, 122949 (2024).
Weininger, D. SMILES, a chemical language and information system. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Panaretos, V. M. & Zemel, Y. Statistical aspects of Wasserstein distances. Annu. Rev. Stat. Appl. 6, 405–431 (2019).
Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S. & Klambauer, G. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. J. Chem. Inf. Model. 58, 1736–1741 (2018).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30, 6626–6637 (2017).
Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun. 15, 3408 (2024).
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).
Zeng, X. et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat. Mach. Intell. 4, 1004–1016 (2022).
Pourpanah, F. et al. A review of generalized zero-shot learning methods. IEEE Trans. Pattern Anal. Mach. Intell. 45, 4051–4070 (2023).
Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
Toyoda, Y. et al. Ligand binding to human prostaglandin E receptor EP4 at the lipid-bilayer interface. Nat. Chem. Biol. 15, 18–26 (2019).
Wu, W.-I. et al. Crystal structure of human AKT1 with an allosteric inhibitor reveals a new mode of kinase inhibition. PLoS ONE 5, e12913 (2010).
Gao, H. et al. ROCK inhibitors 2. Improving potency, selectivity and solubility through the application of rationally designed solubilizing groups. Bioorg. Med. Chem. Lett. 28, 2616–2621 (2018).
Salentin, S., Schreiber, S., Haupt, V. J., Adasme, M. F. & Schroeder, M. PLIP: fully automated protein–ligand interaction profiler. Nucleic Acids Res. 43, W443–W447 (2015).
DeLano, W. L. et al. PyMOL: an open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr. 40, 82–92 (2002).
Shang, E. et al. De novo design of multitarget ligands with an iterative fragment-growing strategy. J. Chem. Inf. Model. 54, 1235–1241 (2014).
Old, D. W. Therapeutic agents. US patent no. WO2014/179263 (2015).
Qian, Y. et al. MolScribe: robust molecular structure recognition with image-to-graph generation. J. Chem. Inf. Model. 63, 1925–1934 (2023).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. In 9th International Conference on Learning Representations 1–20 (ICLR, 2021).
Rawte, V., Sheth, A. P. & Das, A. A survey of hallucination in large foundation models. Preprint at https://arxiv.org/abs/2309.05922 (2023).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Landrum, G. et al. RDKit: a software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8, 5281 (2013).
Wang, Z. Data used for SketchMol training. Figshare https://doi.org/10.6084/m9.figshare.27210429.v1 (2024).
Wang, Z. WangZiXubiubiu/SketchMol-v1: SketchMol. Zenodo https://doi.org/10.5281/zenodo.13937534 (2024).
Acknowledgements
X.Z. acknowledges support from the National Natural Science Foundation of China (grant nos. 62425204, 62450002, 62122025, U22A2037 and 62432011) and the Beijing Natural Science Foundation (grant no. L248013). Y.L. acknowledges support from the National Natural Science Foundation of China (grant no. 62372159). T.S. acknowledges support from the Japan Society for the Promotion of Science (grant no. JP23H03411) and the Japan Science and Technology Agency (grant no. JPMJPF2017). X.Y. acknowledges support from the Japan Society for the Promotion of Science (grant no. JP22K12144). We extend our sincere gratitude to X.Y., Z.Y., T.S., Y.L. and Y.C. for their invaluable feedback on paper writing and figures.
Author information
Contributions
Z.W. and X.Z. conceived the research project. Z.W., Y.C. and X.Z. designed and implemented the framework. Z.W., Y.L., X.Y., T.S. and X.Z. designed the experiments. Z.W., Y.C., P.M., Z.Y. and J.W. conducted the experiments and result analyses. Y.C. conducted the molecular dynamics simulation. All authors discussed the experimental results and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Kenneth Atz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Sections 1–6, Figs. 1–13 and Tables 1–5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Z., Chen, Y., Ma, P. et al. Image-based generation for molecule design with SketchMol. Nat Mach Intell 7, 244–255 (2025). https://doi.org/10.1038/s42256-025-00982-3
DOI: https://doi.org/10.1038/s42256-025-00982-3