Abstract
Aim:
To develop a reliable computational approach for predicting potential drug targets based solely on protein sequence.
Methods:
Drug target and non-target datasets were prepared, and a multi-algorithm, multi-model strategy based on three classification algorithms (Support Vector Machine, Neural Network and Decision Tree) was employed to construct models for predicting potential drug targets.
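The abstract states only that prediction relies on protein sequence; it does not list the descriptors used. As a minimal sketch, assuming a simple sequence-only encoding, the function below turns a sequence into its 20 amino acid fractions plus the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. The function name `sequence_features` and this particular feature set are illustrative assumptions, not the paper's documented method.

```python
# Hypothetical sequence-only feature encoder (an assumption for illustration;
# the paper's actual feature set is not described in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> list[float]:
    """Return 20 amino acid fractions followed by the GRAVY score."""
    # Keep only standard residues; ambiguous codes (X, B, Z, ...) are dropped.
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    # Fraction of each residue type, in the fixed AMINO_ACIDS order.
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    # GRAVY: mean Kyte-Doolittle hydropathy over the sequence.
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    return composition + [gravy]
```

Each protein then becomes a fixed-length vector of 21 numbers, which is the kind of input that SVM, neural network and decision tree classifiers all accept.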
Results:
Twenty-one prediction models were developed for each of the three algorithms. Our evaluation results showed that ∼30% of human proteins were potential drug targets and that ∼40% of the putative targets of drugs undergoing phase II clinical trials were probably non-targets. A public web server, D3TPredictor (http://www.d3pharma.com/d3tpredictor), was constructed to provide easy access to the prediction models.
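The abstract reports 21 models per algorithm but does not say how their outputs are combined. The sketch below assumes a hypothetical two-level vote: within each algorithm, the fraction of models calling a protein a target is compared with a threshold, and the protein is called a target when enough algorithms agree. The names `algorithm_vote`, `consensus`, `model_threshold` and `min_algorithms`, and their default values, are assumptions for illustration, not the rule used by D3TPredictor.

```python
from typing import Callable, Mapping, Sequence

# A "model" is any callable mapping a feature vector to 1 (target) or 0 (non-target).
Model = Callable[[Sequence[float]], int]


def algorithm_vote(models: Sequence[Model], x: Sequence[float]) -> float:
    """Fraction of one algorithm's models that call x a target."""
    return sum(m(x) for m in models) / len(models)


def consensus(
    ensembles: Mapping[str, Sequence[Model]],
    x: Sequence[float],
    model_threshold: float = 0.5,  # hypothetical within-algorithm majority cut-off
    min_algorithms: int = 2,       # hypothetical number of algorithms that must agree
) -> bool:
    """Hypothetical two-level vote across algorithms and their models."""
    positive = sum(
        1 for models in ensembles.values()
        if algorithm_vote(models, x) > model_threshold
    )
    return positive >= min_algorithms


# Toy stand-ins for trained models: each ignores its input and returns a fixed call.
ensembles = {
    "svm": [lambda x: 1] * 21,
    "nn": [lambda x: 1] * 11 + [lambda x: 0] * 10,
    "dt": [lambda x: 0] * 21,
}
print(consensus(ensembles, [0.0]))  # True: SVM and NN majorities agree, DT does not
```

Raising `min_algorithms` to 3 in this toy case makes the call negative, which shows how a stricter agreement requirement trades recall for confidence.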
Conclusion:
Reliable and robust drug target prediction based on protein sequences is achieved using the multi-algorithm and multi-model strategy.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (81273435 and 21021063) and the National Science & Technology Projects (2012ZX09301001-004, 2012AA01A305, and 2013ZX09103001-001). Computational resources were provided by the TianHe-I supercomputer in Tianjin and the Shanghai Supercomputing Center (SCC). The authors thank the developers of free and/or open-source software for academic use, including SignalP-3.0, netOglyc-3.1d, netNglyc-1.0, tmhmm-2.0c, and EMBOSS-6.0.1.
Additional information
Dataset S1. Classification of predicted targets from 20 025 proteins in the human proteome using the multi-algorithm and multi-model strategy presented in this work (EXCEL).
Tables S1–S8. Quantitative results from Figures 2–7 and 9, presented as tables (DOC).
Supplementary information is available at Acta Pharmacologica Sinica's website.
Supplementary information
Dataset S1 (XLS)
The first sheet, Full Targets, lists the classification of predicted full targets; the second sheet, Quasi Targets, lists the classification of predicted quasi targets (XLS, 818 kB).
About this article
Cite this article
Liu, Yt., Li, Y., Huang, Zf. et al. Multi-algorithm and multi-model based drug target prediction and web server. Acta Pharmacol Sin 35, 419–431 (2014). https://doi.org/10.1038/aps.2013.153
DOI: https://doi.org/10.1038/aps.2013.153
This article is cited by
Rare Diseases: Drug Discovery and Informatics Resource. Interdisciplinary Sciences: Computational Life Sciences (2018)