Abstract
Discovering novel drug candidate molecules is a fundamental step in drug development. Generative deep learning models can sample new molecular structures from learned probability distributions; however, their practical use in drug discovery hinges on generating compounds tailored to a specific target molecule. Here we introduce DrugGEN, an end-to-end generative system for the de novo design of drug candidate molecules that interact with a selected protein. The proposed method represents molecules as graphs and processes them using a generative adversarial network that comprises graph transformer layers. Trained on large datasets of drug-like compounds and target-specific bioactive molecules, DrugGEN designed candidate inhibitors for AKT1, a kinase crucial in many cancers. Docking and molecular dynamics simulations suggest that the generated compounds effectively bind to AKT1, and attention maps provide insights into the model’s reasoning. Furthermore, selected de novo molecules were synthesized and shown to inhibit AKT1 at low micromolar concentrations in the context of in vitro enzymatic assays. These results demonstrate the potential of DrugGEN for designing target-specific molecules. Using the open-access DrugGEN codebase, researchers can retrain the model for other druggable proteins, provided a dataset of known bioactive molecules is available.
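The abstract says that DrugGEN represents molecules as graphs and processes them with graph transformer layers. The sketch below shows one common way to do this and is not the DrugGEN implementation. It encodes a molecule as a one-hot node matrix plus a one-hot bond-type adjacency tensor, in the style of MolGAN (De Cao & Kipf). It then runs a single self-attention step in which each bond type adds a scalar bias to the attention score. Several parts are assumptions made for brevity: the atom and bond vocabularies, the bias values, and the omission of learned query/key/value projections. Each row of the resulting matrix is a per-atom attention distribution, the kind of quantity that attention-map analyses visualize.

```python
import math

# Hypothetical vocabularies, for illustration only; DrugGEN defines its own
# atom and bond vocabularies in its codebase.
ATOM_TYPES = ["C", "N", "O", "F"]
BOND_TYPES = ["none", "single", "double", "triple", "aromatic"]


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


def to_graph(atoms, bonds):
    """Encode a molecule as a node matrix X (n x len(ATOM_TYPES)) and an
    adjacency tensor A (n x n x len(BOND_TYPES)) of one-hot bond types."""
    n = len(atoms)
    X = [one_hot(ATOM_TYPES.index(a), len(ATOM_TYPES)) for a in atoms]
    # Every atom pair starts as "none" (no bond), including the diagonal.
    A = [[one_hot(0, len(BOND_TYPES)) for _ in range(n)] for _ in range(n)]
    for i, j, bond in bonds:
        k = BOND_TYPES.index(bond)
        A[i][j] = one_hot(k, len(BOND_TYPES))  # bonds are undirected,
        A[j][i] = one_hot(k, len(BOND_TYPES))  # so A is kept symmetric
    return X, A


def edge_biased_attention(X, A, edge_bias):
    """Single-head self-attention over atoms: each logit is the scaled dot
    product of two atoms' features plus a scalar bias for their bond type.
    Learned query/key/value projections are omitted in this sketch.
    Returns an n x n matrix whose rows are softmax distributions."""
    n, d = len(X), len(X[0])
    attn = []
    for i in range(n):
        logits = []
        for j in range(n):
            dot = sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
            bias = sum(a * b for a, b in zip(A[i][j], edge_bias))
            logits.append(dot + bias)
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


# Usage: the heavy-atom graph of ethanol (C-C-O), hydrogens implicit.
X, A = to_graph(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")])
edge_bias = [0.0, 1.0, 1.5, 2.0, 1.2]  # hypothetical per-bond-type biases
attn = edge_biased_attention(X, A, edge_bias)
for row in attn:
    print([round(w, 3) for w in row])
```

In a trained graph transformer, both the projections and the edge contributions are learned, and edge features are usually updated layer by layer. The fixed bias here only makes the role of bond information in the attention scores visible.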
Data availability
All resources required to reproduce this work are open source. The full training datasets (curated ChEMBL compound sets, AKT1 and CDK2 bioactivity records), the pretrained weight files for every DrugGEN model variant (targeted and non-targeted) and all result artefacts—including the complete sets of de novo-generated molecules, molecular docking output tables and MD trajectory analyses—are archived in the DrugGEN dataset repository and are available via figshare at https://doi.org/10.6084/m9.figshare.29119205.v3 (ref. 86). The data to reproduce the experiments and the output files are available via GitHub at https://github.com/HUBioDataLab/DrugGEN.
Code availability
The source code and ready-to-use trained models are available in the archived DrugGEN repository, which is available via GitHub at https://github.com/HUBioDataLab/DrugGEN and via Zenodo at https://doi.org/10.5281/zenodo.15014579 (ref. 87). DrugGEN is also available as an online tool with a graphical interface at https://huggingface.co/spaces/HUBioDataLab/DrugGEN, where users can generate de novo molecules with the model of their choice.
References
Rifaioglu, A. S. et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief. Bioinform. 20, 1878–1912 (2019).
Paul, S. M. et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat. Rev. Drug Discov. 9, 203–214 (2010).
Bhisetti, G. & Fang, C. Artificial intelligence–enabled de novo design of novel compounds that are synthesizable. Methods Mol. Biol. 2390, 409–419 (2022).
Elton, D. C., Boukouvalas, Z., Fuge, M. D. & Chung, P. W. Deep learning for molecular design—a review of the state of the art. Mol. Syst. Des. Eng. 4, 828–849 (2019).
Walters, W. P. Virtual chemical libraries: miniperspective. J. Med. Chem. 62, 1116–1124 (2018).
Mouchlis, V. D. et al. Advances in de novo drug design: from conventional to machine learning methods. Int. J. Mol. Sci. 22, 1676 (2021).
Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 307–392 (2019).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
De Cao, N. & Kipf, T. MolGAN: an implicit generative model for small molecular graphs. In ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models (2018).
Zou, J., Yu, J., Hu, P., Zhao, L. & Shi, S. STAGAN: an approach for improve the stability of molecular graph generation based on generative adversarial networks. Comput. Biol. Med. 167, 107691 (2023).
Mahmood, O., Mansimov, E., Bonneau, R. & Cho, K. Masked graph modeling for molecule generation. Nat. Commun. 12, 3156 (2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proc. 32nd International Conference on Machine Learning (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).
Peng, X. et al. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 17644–17655 (PMLR, 2022).
Schneuing, A. et al. Structure‑based drug design with equivariant diffusion models. Nat. Comput. Sci. 4, 899–909 (2024).
Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 8867–8887 (PMLR, 2022).
Mitton, J., Senn, H. M., Wynne, K. & Murray‑Smith, R. A graph VAE and graph transformer approach to generating molecular graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond (2020).
Richards, R. J. & Groener, A. M. Conditional β-VAE for de novo molecular generation. Preprint at https://arxiv.org/abs/2205.01592 (2022).
Nemoto, K. & Kaneko, H. De novo direct inverse QSPR/QSAR: chemical variational autoencoder and Gaussian mixture regression models. J. Chem. Inf. Model. 63, 794–805 (2023).
Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A. & Zhavoronkov, A. druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol. Pharm. 14, 3098–3104 (2017).
Xie, X., Valiente, P. A. & Kim, P. M. HelixGAN a deep-learning methodology for conditional de novo design of α-helix structures. Bioinformatics 39, btad036 (2023).
Arús-Pous, J. et al. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminform. 11, 71 (2019).
Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model. 60, 5918–5922 (2020).
Wang, X. et al. PETrans: de novo drug design with protein-specific encoding based on transfer learning. Int. J. Mol. Sci. 24, 1146 (2023).
Bagal, V., Aggarwal, R., Vinod, P. K. & Priyakumar, U. D. MolGPT: molecular generation using a transformer-decoder model. J. Chem. Inf. Model. 62, 2064–2076 (2022).
Yang, M. et al. CMGN: a conditional molecular generation net to design target-specific molecules with desired properties. Brief. Bioinform. 24, bbad185 (2023).
Zhang, O. et al. ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling. Nat. Mach. Intell. 5, 1020–1030 (2023).
Guan, J. et al. 3D equivariant diffusion for target-aware molecule generation and affinity prediction. In Eleventh International Conference on Learning Representations (OpenReview.net, 2023); https://openreview.net/forum?id=kJqXEPXMsE0
Perron, Q. et al. Deep generative models for ligand‐based de novo design applied to multi‐parametric optimization. J. Comput. Chem. 43, 692–703 (2022).
Fang, Y., Pan, X. & Shen, H.-B. De novo drug design by iterative multiobjective deep reinforcement learning with graph-based molecular quality assessment. Bioinformatics 39, btad157 (2023).
Zhou, Z., Kearnes, S., Li, L., Zare, R. N. & Riley, P. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752 (2019).
Liu, M., Luo, Y., Uchino, K., Maruhashi, K. & Ji, S. Generating 3D molecules for target protein binding. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 13912–13924 (PMLR, 2022).
Wang, M. et al. RELATION: a deep generative model for structure-based de novo drug design. J. Med. Chem. 65, 9478–9492 (2022).
Gebauer, N. W. A., Gastegger, M., Hessmann, S. S. P., Müller, K.-R. & Schütt, K. T. Inverse design of 3D molecular structures with conditional generative neural networks. Nat. Commun. 13, 973 (2022).
Shi, W. et al. Pocket2Drug: an encoder-decoder deep neural network for the target-based drug design. Front. Pharmacol. 13, 837715 (2022).
Uludoğan, G., Ozkirimli, E., Ulgen, K. O., Karalı, N. & Özgür, A. Exploiting pretrained biochemical language models for targeted drug design. Bioinformatics 38, ii155–ii161 (2022).
Rozenberg, E. & Freedman, D. Semi-equivariant conditional normalizing flows, with applications to target-aware molecule generation. Mach. Learn. Sci. Technol. 4, 035037 (2023).
Li, Y. et al. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor. Nat. Commun. 13, 6891 (2022).
Zhang, Y. et al. Universal approach to de novo drug design for target proteins using deep reinforcement learning. ACS Omega 8, 5464–5474 (2023).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 6000–6010 (2017).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proc. 5th International Conference on Learning Representations (OpenReview.net, 2017).
Li, P. et al. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief. Bioinform. 22, bbab109 (2021).
Rong, Y. et al. Self-supervised graph transformer on large-scale molecular data. Adv. Neural Inf. Process. Syst. 33, 12559–12571 (2020).
Li, H. et al. A knowledge-guided pre-training framework for improving molecular representation learning. Nat. Commun. 14, 7568 (2023).
Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. IEEE International Conference on Computer Vision 2223–2232 (IEEE, 2017).
Kim, T., Cha, M., Kim, H., Lee, J. K. & Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1857–1865 (PMLR, 2017).
Addie, M. et al. Discovery of 4-amino-N-[(1S)-1-(4-chlorophenyl)-3-hydroxypropyl]-1-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)piperidine-4-carboxamide (AZD5363), an orally bioavailable, potent inhibitor of Akt kinases. J. Med. Chem. 56, 2059–2073 (2013).
Polykovskiy, D. et al. Molecular Sets (MOSES): a benchmarking platform for molecular generation models. Front. Pharmacol. 11, 565644 (2020).
Tadesse, S. et al. Targeting CDK2 in cancer: challenges and opportunities for therapy. Drug Discov. Today 25, 406–413 (2020).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Rifaioglu, A. S. et al. DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chem. Sci. 11, 2531–2557 (2020).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
Abeer, A. N. M. N., Urban, N. M., Weil, M. R., Alexander, F. J. & Yoon, B.-J. Multi-objective latent space optimization of generative molecular design models. Patterns 5, 101042 (2024).
Jain, M. et al. Multi‑objective GFlowNets. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 14631–14653 (PMLR, 2023).
Monteiro, N. R. C. et al. FSM-DDTR: end-to-end feedback strategy for multi-objective de novo drug design using transformers. Comput. Biol. Med. 164, 107285 (2023).
Suzuki, T., Ma, D., Yasuo, N. & Sekijima, M. Mothra: multiobjective de novo molecular generation using Monte Carlo tree search. J. Chem. Inf. Model. 64, 7291–7302 (2024).
Ghosh, B., Dutta, I. K., Totaro, M. & Bayoumi, M. A survey on the progression and performance of generative adversarial networks. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) 1–8 (IEEE, 2020).
Gui, J., Sun, Z., Wen, Y., Tao, D. & Ye, J. A review on generative adversarial networks: algorithms, theory, and applications. IEEE Trans. Knowl. Data Eng. 35, 3313–3332 (2023).
Janson, G., Valdes-Garcia, G., Heo, L. & Feig, M. Direct generation of protein conformational ensembles via machine learning. Nat. Commun. 14, 774 (2023).
Arjovsky, M., Chintala, S. & Bottou, L. Wasserstein generative adversarial networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 214–223 (PMLR, 2017).
Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach. Learn. Sci. Technol. 1, 045024 (2020).
Yüksel, A., Ulusoy, E., Ünlü, A. & Doğan, T. Selformer: molecular representation learning via SELFIES language models. Mach. Learn. Sci. Technol. 4, 035014 (2023).
Doğan, T. et al. CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations. Nucleic Acids Res. 49, e96 (2021).
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
Landrum, G. et al. rdkit/rdkit: 2024_09_6 (Q3 2024) Release (Release_2024_09_6). Zenodo https://doi.org/10.5281/zenodo.14943932 (2025).
Dwivedi, V. P. & Bresson, X. A generalization of transformer networks to graphs. In AAAI Workshop on Deep Learning on Graphs: Methods and Applications (2021).
Vignac, C. et al. DiGress: discrete denoising diffusion for graph generation. In Eleventh International Conference on Learning Representations (OpenReview.net, 2023); https://openreview.net/forum?id=UaAD-Nu86WX
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. & Courville, A. C. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 30, 5769–5779 (2017).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (OpenReview.net, 2019); https://openreview.net/forum?id=Bkg6RiCqY7
Schoenmaker, L., Béquignon, O. J. M., Jespers, W. & Van Westen, G. J. P. UnCorrupt SMILES: a novel approach to de novo design. J. Cheminform. 15, 22 (2023).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).
Veber, D. F. et al. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 45, 2615–2623 (2002).
Baell, J. B. & Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 53, 2719–2740 (2010).
Madhavi Sastry, G., Adzhigirey, M., Day, T., Annabhimoju, R. & Sherman, W. Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27, 221–234 (2013).
Schrödinger Release 2022-1: Maestro (Schrödinger, 2022).
Friesner, R. A. et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein−ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).
Martin, M. P., Olesen, S. H., Georg, G. I. & Schönbrunn, E. Cyclin-dependent kinase inhibitor dinaciclib interacts with the acetyl-lysine recognition site of bromodomains. ACS Chem. Biol. 8, 2360–2365 (2013).
The PyMOL molecular graphics system (version 1.8) (Schrödinger, 2015).
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
Ünlü, A., Çevrim, E., Yiğit, M. G., Olğaç, A. & Doğan, T. DrugGEN resource collection: training data, model weights, generated molecules, docking and MD analyses (version 3). figshare https://doi.org/10.6084/m9.figshare.29119205.v3 (2025).
Ünlü, A., Çevrim, E., Yigit, M. G., Sarigun, A. & Dogan, T. HUBioDataLab/DrugGEN: DrugGEN v2.0 release (v2.0). Zenodo https://doi.org/10.5281/zenodo.15014579 (2025).
Guimaraes, G. L., Sanchez-Lengeling, B., Outeiral, C., Farias, P. L. C. & Aspuru-Guzik, A. Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. Preprint at https://arxiv.org/abs/1705.10843 (2018).
Grisoni, F., Moret, M., Lingwood, R. & Schneider, G. Bidirectional molecule generation with recurrent neural networks. J. Chem. Inf. Model. 60, 1175–1183 (2020).
Xie, Y. et al. MARS: Markov molecular sampling for multi-objective drug discovery. In International Conference on Learning Representations 1–19 (ICLR, 2021).
Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).
Matsukiyo, Y., Yamanaka, C. & Yamanishi, Y. De novo generation of chemical structures of inhibitor and activator candidates for therapeutic target proteins by a transformer-based variational autoencoder and Bayesian optimization. J. Chem. Inf. Model. 64, 2345–2355 (2024).
Acknowledgements
This project was supported by TUBITAK-BIDEB 2247-A National Outstanding Researchers Program under project no. 120C123. We thank H. A. Güvenilir for guidance during the preparation of datasets and A. Koyaş for aiding the target protein selection process.
Author information
Contributions
T.D. conceptualized the study and designed the general methodology. E.Ç. prepared the datasets. A.S., A.Ü., A.S.R. and T.D. determined the technical details of the fundamental model architecture. A.Ü. and A.S. prepared the original codebase and designed and implemented the initial models. A.Ü. and M.G.Y. designed, implemented, trained, tuned and evaluated numerous model variants and constructed the finalized DrugGEN models. A.O. and E.Ç. conducted the molecular filtering operations and physics-based (docking and MD) experiments. A.Ü. and H.Ç. analysed the de novo-generated molecules in the context of deep learning-based DTI prediction. M.G.Y., E.Ç. and T.D. conducted the attention map analysis. A.Ü., E.Ç., A.O., E.B. and T.D. evaluated and discussed the findings. E.Ç., A.Ü., A.S., A.O. and T.D. visualized the results and prepared the figures in the paper. A.Ü., E.Ç., M.G.Y., A.S., D.C.K., A.O. and T.D. wrote the paper. A.Ü., E.Ç., A.S., M.G.Y. and T.D. prepared the repository. O.B. and M.G.Y. constructed the online tool. T.D. supervised the overall study. All authors approved the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Artur Kadurin, Pengyong Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1
Thirty promising de novo molecules generated by the DrugGEN model to target the AKT1 protein. They were selected by expert curation from the set of molecules that both had sufficiently low binding free energies (< −8 kcal/mol) in the molecular docking experiment and were predicted as ‘active’ by deep learning-based DTI prediction (DEEPScreen; ref. 59).
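The selection described in this caption applies two computational filters before expert curation. The sketch below assumes that each generated molecule already has a docking score in kcal/mol and a binary DTI prediction. The `Candidate` record, its field names and the example scores are hypothetical. The manual expert-curation step is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mol_id: str           # hypothetical identifier for a generated molecule
    docking_score: float  # estimated binding free energy, kcal/mol (lower is better)
    dti_active: bool      # DTI model label, e.g. a DEEPScreen 'active' call


def shortlist(candidates, threshold=-8.0):
    """Keep molecules whose docking score is strictly below `threshold`
    and that the DTI model labels active, best (most negative) score first."""
    kept = [c for c in candidates if c.docking_score < threshold and c.dti_active]
    return sorted(kept, key=lambda c: c.docking_score)


# Usage with made-up scores.
pool = [
    Candidate("a", -9.1, True),
    Candidate("b", -8.0, True),    # not strictly below -8 kcal/mol
    Candidate("c", -10.2, False),  # strong docking score but predicted inactive
    Candidate("d", -8.5, True),
]
print([c.mol_id for c in shortlist(pool)])
```

Requiring both conditions means a molecule must pass a physics-based check and a data-driven check before a chemist reviews it. Either filter alone would let through candidates that only one method favours.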
Extended Data Fig. 2 Structural analysis of capivasertib (bound ligand in 4GV1) and five de novo generated molecules (Mol. ID 1–5) that were selected for experimental validation.
(a) Initial binding orientations of capivasertib, Molecule 1 and Molecule 2 at the start of the molecular dynamics (MD) simulations. (b) Key protein–ligand interactions observed during the MD simulations, shown with the interacting residues and interaction types; the depicted poses are the most populated conformations from each simulation. (c) Root-mean-square deviation (RMSD) values of capivasertib, Molecule 1 and Molecule 2 in complex with AKT1. (d) Root-mean-square fluctuation (RMSF) values of the ligand atoms in the same complexes. Abbreviations: I–VII, β-sheet numbers; g.l, glycine-rich loop; c.l, catalytic loop; GK, gatekeeper residue; xDFG, highly conserved kinase residues; linker, the loop that connects the hinge domain to the αC helix. Gray dashed lines represent van der Waals interactions, blue lines represent hydrogen bonds and water bridges, green lines indicate halogen bonds and yellow dashed lines represent salt bridges. Directional interactions were noted only when their occupancy exceeded 10%; however, for visual clarity, occupancy values of water bridges were stated only when they exceeded 30%.
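Panels (c) and (d) and the occupancy cut-offs rely on standard trajectory statistics. Consider a ligand with N atoms at positions x_i(t). Its RMSD at frame t relative to a reference is sqrt((1/N) Σ_i |x_i(t) − x_i^ref|²). The RMSF of atom i over T frames is sqrt((1/T) Σ_t |x_i(t) − ⟨x_i⟩|²). The occupancy of an interaction is the fraction of frames in which it is observed. The pure-Python sketch below rests on three assumptions. Frames are already superimposed, because no fitting is performed here. Coordinates are (x, y, z) tuples in a fixed atom order. The interaction labels are hypothetical. This is not the analysis pipeline used in the paper.

```python
import math


def rmsd(frame, ref):
    """RMSD between two equally ordered lists of (x, y, z) coordinates.
    Assumes both frames are already superimposed; no fitting is done."""
    sq = sum((a - b) ** 2 for p, q in zip(frame, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))


def rmsf(trajectory):
    """Per-atom RMSF: root-mean-square deviation of each atom from its
    time-averaged position over all frames of `trajectory`."""
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def reported_interactions(presence, threshold=0.10):
    """Return {label: occupancy} for interactions observed in more than
    `threshold` of the frames; `presence` maps a label to per-frame booleans."""
    result = {}
    for label, flags in presence.items():
        occ = sum(flags) / len(flags)
        if occ > threshold:
            result[label] = occ
    return result


# Usage on a toy two-atom, two-frame trajectory (coordinates in angstroms).
traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
]
print(rmsd(traj[1], traj[0]))  # deviation of frame 1 from frame 0
print(rmsf(traj))              # atom 0 fluctuates, atom 1 is static
presence = {
    "hbond_A": [True] * 2 + [False] * 8,  # observed in 20% of frames
    "hbond_B": [True] * 1 + [False] * 9,  # 10%: not above the cut-off
}
print(reported_interactions(presence))
```

Calling `reported_interactions(presence, threshold=0.30)` reproduces the stricter 30% cut-off that the caption applies when stating water-bridge occupancies.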
Supplementary information
Supplementary Sections 1–13, Figs. 1–17 and Tables 1–3.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ünlü, A., Çevrim, E., Yiğit, M.G. et al. Target-specific de novo design of drug candidate molecules with graph-transformer-based generative adversarial networks. Nat Mach Intell 7, 1524–1540 (2025). https://doi.org/10.1038/s42256-025-01082-y
DOI: https://doi.org/10.1038/s42256-025-01082-y