  • Perspective

Machine learning in preclinical drug discovery

Abstract

Drug-discovery and drug-development endeavors are laborious, costly and time consuming. These programs can take upward of 12 years and cost US $2.5 billion, with a failure rate of more than 90%. Machine learning (ML) presents an opportunity to improve the drug-discovery process. Indeed, with the growing abundance of public and private large-scale biological and chemical datasets, ML techniques are becoming well positioned as useful tools that can augment the traditional drug-development process. In this Perspective, we discuss the integration of algorithmic methods throughout the preclinical phases of drug discovery. Specifically, we highlight an array of ML-based efforts, across diverse disease areas, to accelerate initial hit discovery, mechanism-of-action (MOA) elucidation and chemical property optimization. With advances in the application of ML across diverse therapeutic areas, we posit that fully ML-integrated drug-discovery pipelines will define the future of drug-development programs.

Fig. 1: Applications of ML in drug discovery.
Fig. 2: ML-guided virtual screening.
Fig. 3: VAEs and normalizing flows for molecular generation.
Fig. 4: Applications of CLMs in de novo molecular design.
Fig. 5: AF2 for MOA elucidation.
Fig. 6: Diffusion models for MOA elucidation.
Fig. 7: ML applications in translational investigations.

Acknowledgements

We thank G. Liu for his insightful comments during the preparation of the manuscript. This work was generously supported by the Weston Family Foundation, the Canadian Institutes of Health Research and the David Braley Centre for Antibiotic Discovery.

Author information

Contributions

D.B.C., J.A., A.A. and J.M.S. wrote and edited the paper.

Corresponding author

Correspondence to Jonathan M. Stokes.

Ethics declarations

Competing interests

J.M.S. is a founder of Stoked Bio. All other authors declare no competing interests.

Peer review

Peer review information

Nature Chemical Biology thanks Cesar de la Fuente, Daniel Reker and the other, anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Catacutan, D.B., Alexander, J., Arnold, A. et al. Machine learning in preclinical drug discovery. Nat Chem Biol 20, 960–973 (2024). https://doi.org/10.1038/s41589-024-01679-1

  • DOI: https://doi.org/10.1038/s41589-024-01679-1
