
Review

Advancing active compound discovery for novel drug targets: insights from AI-driven approaches

Abstract

The discovery of active compounds for novel, underexplored targets is essential for developing innovative therapeutics across a wide range of diseases. Recent advances in artificial intelligence (AI) are transforming active compound discovery by improving efficiency, accuracy, and scalability in areas where traditional methods have struggled. This review provides a comprehensive overview of AI-driven methodologies for active compound discovery, with a particular focus on their application to novel targets. First, we examine how AI overcomes traditional bottlenecks in molecular design, enabling precise characterization of target proteins through high-accuracy protein structure prediction and more accurate docking. Building on these target-focused capabilities, AI-driven approaches also advance ligand exploration, bridging biological and chemical spaces through data transfer techniques that maximize the utility of available activity data. By assessing overall cellular or organismal responses, AI also helps decode complex biological systems and drives phenotypic drug discovery (PDD) through multi-modal data integration. Finally, we discuss how AI is addressing the challenges of targeting previously undruggable proteins, exemplified by the development of protein degraders. By synthesizing these advances, this review is intended as a resource for researchers seeking to leverage AI in the discovery of next-generation therapeutics.
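Docking accuracy in this field is commonly reported as pose-prediction success: the fraction of predicted ligand poses whose heavy-atom RMSD to the experimentally determined pose is within 2 Å. The Python sketch below illustrates that metric under simplifying assumptions that are ours, not the review's: predicted and reference poses list the same atoms in the same order, both are already in the receptor's coordinate frame (no superposition), and molecular symmetry is ignored. The function names `pose_rmsd` and `success_rate` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]
Pose = Sequence[Coord]


def pose_rmsd(pred: Pose, ref: Pose) -> float:
    """Heavy-atom RMSD (in Å) between two poses of the same ligand.

    Assumes identical atom ordering and a shared receptor frame:
    no superposition and no symmetry correction are performed.
    """
    if not pred or len(pred) != len(ref):
        raise ValueError("poses must be non-empty and have the same atom count")
    sq_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pred, ref)
    )
    return math.sqrt(sq_sum / len(pred))


def success_rate(pairs: Sequence[Tuple[Pose, Pose]], cutoff: float = 2.0) -> float:
    """Fraction of (predicted, reference) pairs with RMSD <= cutoff (Å)."""
    if not pairs:
        raise ValueError("need at least one pose pair")
    hits = sum(1 for pred, ref in pairs if pose_rmsd(pred, ref) <= cutoff)
    return hits / len(pairs)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    shifted = [(x + 0.5, y, z) for x, y, z in ref]  # every atom moved 0.5 Å
    print(pose_rmsd(shifted, ref))  # 0.5
```

Published benchmarks usually compute symmetry-corrected RMSD by matching equivalent atoms, so this naive version can overestimate the error for symmetric ligands.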


Fig. 1
Fig. 2


Data availability

No new data were generated in this study; therefore, data sharing is not applicable.


Acknowledgements

This work was supported by the following grants: the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0830000), National Natural Science Foundation of China (82204278, T2225002 and 82273855), SIMM-SHUTCM Traditional Chinese Medicine Innovation Joint Research Program (E2G805H), Shanghai Municipal Science and Technology Major Project, National Key Research and Development Program of China (2023YFC2305904 and 2022YFC3400504), Key Technologies R&D Program of Guangdong Province (2023B1111030004), Shanghai Sailing Program (24YF2755600), and the China Postdoctoral Science Foundation (2024M763421).

Author information


Contributions

XYW: Writing – Original Draft. YChen: Writing – Original Draft. YFL: Writing – Original Draft. CYW: Writing – Original Draft. MYL: Literature Search and Screening. CXY: Writing – Original Draft. YYZ: Writing – Original Draft. MHQ: Writing – Original Draft. YFS: Writing – Original Draft. XCT: Conceptualization and Supervision. MYZ: Conceptualization and Supervision. XTL: Writing – Review and Editing, Conceptualization, and Supervision.

Corresponding authors

Correspondence to Ming-yue Zheng or Xu-tong Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Xy., Chen, Y., Li, Yf. et al. Advancing active compound discovery for novel drug targets: insights from AI-driven approaches. Acta Pharmacol Sin (2025). https://doi.org/10.1038/s41401-025-01591-x



  • DOI: https://doi.org/10.1038/s41401-025-01591-x
