  • Review Article

Chemical predictive modelling to improve compound quality

Key Points

  • Chemical predictive modelling encompasses empirical computational methods based on observed patterns in data that guide the design of future compounds.

  • Simple physicochemical property-based guidelines and structure-based chemical filters, such as AstraZeneca's AZFilters, are used to identify poor-quality compounds in screening set selection and compound design.

  • Despite their limitations, quantitative structure–activity relationship (QSAR) models of ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are widely used in compound design; advice and guidance on the judicious use of QSAR methods has been published.

  • A key problem with QSAR methods is estimating confidence in the predictions, which is linked to the definition of the model's domain of applicability.

  • Project- or chemical series-specific QSAR models are one approach to solving the 'domain of applicability' problem, but this approach is practical for a large organization with multiple projects only if model building is automated.

  • Interpretable models and inverse QSAR methods provide additional information to inform the design of compounds with improved properties.

  • Matched molecular pair analysis is complementary to standard QSAR, is interpretable and can be used to propose new compounds.

  • Despite the progress in chemical predictive modelling techniques, their impact on improving compound quality is difficult to assess and is limited by cultural factors.

  • These include continued debate over the application of compound quality guidelines and the diversity of opinions among medicinal chemists on attractive versus unattractive structures.

  • Current techniques are most successful in modelling ADMET properties, whereas prediction of potency or efficacy is more challenging.

  • Areas of active research include descriptors to incorporate chirality, multi-objective optimization and expert systems for compound optimization.
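
The 'domain of applicability' idea in the key points above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each compound is represented by a set of integer structural-feature identifiers, and that a query compound counts as 'in domain' when its Tanimoto similarity to the nearest training compound reaches a cutoff. The function names, the toy feature sets and the 0.5 cutoff are hypothetical and are not taken from the Review; published ways of defining the domain differ considerably.

```python
from typing import FrozenSet, Iterable

Fingerprint = FrozenSet[int]  # assumed representation: a set of feature identifiers


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features present."""
    union = a | b
    if not union:
        # Two empty fingerprints carry no evidence of similarity.
        return 0.0
    return len(a & b) / len(union)


def nearest_training_similarity(query: Fingerprint,
                                training: Iterable[Fingerprint]) -> float:
    """Similarity of the query to its most similar training compound."""
    return max((tanimoto(query, t) for t in training), default=0.0)


def in_domain(query: Fingerprint,
              training: Iterable[Fingerprint],
              cutoff: float = 0.5) -> bool:
    """Flag a prediction as trustworthy only if a similar training compound exists.

    The cutoff of 0.5 is an arbitrary illustrative value, not a recommendation.
    """
    return nearest_training_similarity(query, training) >= cutoff


# Toy training set: two compounds described by hypothetical feature identifiers.
training_set = [frozenset({1, 2, 3, 4}), frozenset({10, 11})]

close_analogue = frozenset({1, 2, 3, 5})   # shares 3 of 5 features with the first compound
novel_scaffold = frozenset({20, 21})       # shares nothing with the training set

print(in_domain(close_analogue, training_set))  # similarity 0.6, above the cutoff
print(in_domain(novel_scaffold, training_set))  # similarity 0.0, below the cutoff
```

A nearest-neighbour check of this kind says only whether the model has seen similar chemistry; it does not by itself quantify the expected error of a prediction.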

Abstract

The 'quality' of small-molecule drug candidates, encompassing aspects including their potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics, is a key factor influencing the chances of success in clinical trials. Importantly, such characteristics are under the control of chemists during the identification and optimization of lead compounds. Here, we discuss the application of computational methods, particularly quantitative structure–activity relationships (QSARs), in guiding the selection of higher-quality drug candidates, as well as cultural factors that may have affected their use and impact.

Figure 1: Performance of automated QSAR modelling.
Figure 2: Matched molecular pair analysis.
Figure 3: Example of the influence of organizational factors on the uptake of chemical predictive modelling.
Figure 4: Lack of consistency in expert evaluations of chemical quality.

Acknowledgements

We thank H. Van de Waterbeemd and N. Blomberg for their input to the shaping of this Review, and E. Griffen for providing input on Table 4. We also thank P. Kocis and J. Li for their contributions to AZFilters, and the chemistry community of AstraZeneca for participating in the AZFilters crowdsourcing exercise. Finally we thank the reviewers for their helpful suggestions for improving the manuscript.

Author information

Corresponding authors

Correspondence to John G. Cumming or Andrew M. Davis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

cLogP

The calculated logarithm of the 1-octanol–water partition coefficient of the non-ionized molecule.

Congeneric series

A set of molecules belonging to the same class, usually with chemical changes limited to changes in substituents on a fixed chemical core.

LogD7.4

Log10 of the octanol–water distribution coefficient of a molecule (for example, a drug) at pH 7.4; unlike the partition coefficient, it accounts for both ionized and non-ionized forms.

Support vector machine

A machine learning method that uses kernel functions to map input data into high-dimensional feature space. Support vector machines can be used for classification or regression.

Random forest

A machine learning method that constructs a multitude of decision trees with a random selection of features to split each node. Random forests can be used for classification or regression.

Ames mutagenicity test

A biological assay that uses Salmonella bacteria to test the mutagenic potential of compounds and thereby assess their potential to cause cancer.

pKa

The pH at which a group would be protonated in 50% of molecules. More molecules will become protonated with decreasing pH, and vice versa.
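
The pKa and LogD7.4 definitions above can be connected with a short worked example. The sketch below uses the Henderson–Hasselbalch relationship for a single ionizable group and, as a simplifying assumption that is not stated in the glossary, treats only the neutral form as partitioning into octanol. The function names and the example values (logP 3.0, pKa 4.4) are hypothetical.

```python
import math


def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single ionizable group that is protonated at a given pH.

    Equals 0.5 when pH == pKa and rises as the pH decreases.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def logd_estimate(logp: float, pka: float, ph: float = 7.4,
                  is_acid: bool = True) -> float:
    """Rough logD for a monoprotic compound from logP and pKa.

    Assumes only the neutral species enters the octanol phase:
      acid (neutral when protonated):   logD = logP - log10(1 + 10**(pH - pKa))
      base (neutral when deprotonated): logD = logP - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if is_acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical carboxylic acid: logP 3.0, pKa 4.4, evaluated at pH 7.4.
print(fraction_protonated(7.4, 4.4))   # small: the acid is mostly ionized
print(logd_estimate(3.0, 4.4))         # roughly three log units below logP
```

Real compounds often have several ionizable groups and ion-pair partitioning, so measured LogD7.4 values can differ from this single-group estimate.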

About this article

Cite this article

Cumming, J., Davis, A., Muresan, S. et al. Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 12, 948–962 (2013). https://doi.org/10.1038/nrd4128

  • DOI: https://doi.org/10.1038/nrd4128
