Chemical predictive modelling to improve compound quality

Cumming, John G.; Davis, Andrew M.; Muresan, Sorel; Haeberlein, Markus; Chen, Hongming

doi:10.1038/nrd4128

Review Article
Published: 29 November 2013


Nature Reviews Drug Discovery volume 12, pages 948–962 (2013)

Key Points

Chemical predictive modelling encompasses empirical computational methods based on observed patterns in data that guide the design of future compounds.
Simple physicochemical property-based guidelines and structure-based chemical filters, such as AstraZeneca's AZFilters, are used to identify poor-quality compounds in screening set selection and compound design.
Despite their limitations, quantitative structure–activity relationship (QSAR) models of ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are widely used in compound design; advice and guidance on the judicious use of QSAR methods has been published.
A key problem with QSAR methods is estimating confidence in the predictions, which is linked to the definition of the model's domain of applicability.
Project- or chemical series-specific QSAR models are one approach to solving the 'domain of applicability' problem, but for a large organization with multiple projects this approach is practical only if model building is automated.
Interpretable models and inverse QSAR methods provide additional information to inform the design of compounds with improved properties.
Matched molecular pair analysis is complementary to standard QSAR, is interpretable and can be used to propose new compounds.
Despite the progress in chemical predictive modelling techniques, their impact on improving compound quality is difficult to assess and is limited by cultural factors.
These include continued debate over the application of compound quality guidelines and the diversity of opinions among medicinal chemists on attractive versus unattractive structures.
Current techniques are most successful in modelling ADMET properties, whereas prediction of potency or efficacy is more challenging.
Areas of active research include descriptors to incorporate chirality, multi-objective optimization and expert systems for compound optimization.
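The 'domain of applicability' idea in the key points above can be illustrated with a small sketch. The article preview does not specify a method, so the following assumes one common distance-based approach: a query compound is treated as inside the domain if its nearest training-set neighbour in descriptor space lies within a chosen cut-off. The descriptor values, the `threshold` parameter and the function names are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distance(query, training_set):
    """Distance from the query compound to its closest training compound."""
    return min(euclidean(query, t) for t in training_set)


def in_domain(query, training_set, threshold):
    """Assumed rule: inside the domain if the nearest neighbour is within threshold."""
    return nearest_distance(query, training_set) <= threshold


# Hypothetical training set described by two scaled descriptors per compound.
training = [(1.0, 2.0), (1.5, 2.5), (2.0, 3.0)]

print(in_domain((1.2, 2.2), training, threshold=0.5))  # close to the training data
print(in_domain((6.0, 9.0), training, threshold=0.5))  # far from the training data
```

In practice the cut-off would need to be calibrated against observed prediction errors, and other definitions of the domain (for example, leverage or ensemble variance) are equally plausible; this sketch only shows the general shape of the check.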

Abstract

The 'quality' of small-molecule drug candidates, encompassing aspects including their potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics, is a key factor influencing the chances of success in clinical trials. Importantly, such characteristics are under the control of chemists during the identification and optimization of lead compounds. Here, we discuss the application of computational methods, particularly quantitative structure–activity relationships (QSARs), in guiding the selection of higher-quality drug candidates, as well as cultural factors that may have affected their use and impact.

**Figure 1: Performance of automated QSAR modelling.**

**Figure 2: Matched molecular pair analysis.**

**Figure 3: Example of the influence of organizational factors on the uptake of chemical predictive modelling.**

**Figure 4: Lack of consistency in expert evaluations of chemical quality.**


Acknowledgements

We thank H. Van de Waterbeemd and N. Blomberg for their input to the shaping of this Review, and E. Griffen for providing input on Table 4. We also thank P. Kocis and J. Li for their contributions to AZFilters, and the chemistry community of AstraZeneca for participating in the AZFilters crowdsourcing exercise. Finally we thank the reviewers for their helpful suggestions for improving the manuscript.

Author information

Sorel Muresan
Present address: AkzoNobel Surface Chemistry, Hamnvägen 2, 444 85 Stenungsund, Sweden.
Markus Haeberlein
Present address: Proteostasis Therapeutics, 200 Technology Square, Cambridge, Massachusetts 02139, USA.

Authors and Affiliations

Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, SK10 4TG, UK
John G. Cumming
Respiratory, Inflammation and Autoimmunity Innovative Medicines Unit, AstraZeneca R&D, Pepparedsleden 1, 431 83, Mölndal, Sweden
Andrew M. Davis
Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Pepparedsleden 1, 431 83, Mölndal, Sweden
Sorel Muresan & Hongming Chen
CNS & Pain Innovative Medicines, AstraZeneca R&D, 151 85, Södertälje, Sweden
Markus Haeberlein

Corresponding authors

Correspondence to John G. Cumming or Andrew M. Davis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (box)

AZFilters (PDF 300 kb)

Glossary

cLogP: The calculated logarithm of the 1-octanol–water partition coefficient of the non-ionized molecule.
Congeneric series: A set of molecules belonging to the same class, usually with chemical changes limited to changes in substituents on a fixed chemical core.
LogD_7.4: Log₁₀ of the octanol–water distribution coefficient of a molecule (for example, a drug) at pH 7.4, which accounts for both ionized and non-ionized forms.
Support vector machine: A machine learning method that uses kernel functions to map input data into high-dimensional feature space. Support vector machines can be used for classification or regression.
Random forest: A machine learning method that constructs a multitude of decision trees with a random selection of features to split each node. Random forests can be used for classification or regression.
Ames mutagenicity test: A biological assay that uses Salmonella bacteria to test the mutagenic potential of compounds and thereby assess their potential to cause cancer.
pK_a: The pH at which a group would be protonated in 50% of molecules. More molecules will become protonated with decreasing pH, and vice versa.
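The pK_a and LogD entries above are linked by a standard textbook relationship, sketched below. The sketch assumes a molecule with a single ionizable group and assumes that only the neutral form partitions into octanol; neither simplification comes from the article, and the function names are illustrative. Under those assumptions the protonated fraction follows the Henderson–Hasselbalch equation, and LogD is LogP plus the log₁₀ of the neutral fraction.

```python
import math


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of groups carrying the proton at this pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def log_d(log_p, pka, ph=7.4, is_acid=True):
    """Estimate LogD for one ionizable group (assumption: only neutral form partitions).

    For an acid the protonated form is neutral; for a base the deprotonated
    form is neutral. Extreme pH - pKa gaps can drive the neutral fraction to
    zero in floating point, which this simple sketch does not guard against.
    """
    protonated = fraction_protonated(pka, ph)
    neutral = protonated if is_acid else 1.0 - protonated
    return log_p + math.log10(neutral)


# At pH equal to pK_a, half of the groups are protonated.
print(fraction_protonated(7.4, 7.4))
# Hypothetical base with LogP 2.0 and pK_a 9.4, evaluated at pH 7.4.
print(round(log_d(2.0, 9.4, ph=7.4, is_acid=False), 3))
```

Lowering the pH increases the protonated fraction, which matches the glossary statement; for a base this lowers LogD, whereas for an acid it raises LogD towards LogP.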

About this article

Cite this article

Cumming, J., Davis, A., Muresan, S. et al. Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 12, 948–962 (2013). https://doi.org/10.1038/nrd4128

Issue date: December 2013
DOI: https://doi.org/10.1038/nrd4128
