Abstract
In recent years, several association analysis methods for case-control studies have been developed. However, as we turn towards the identification of single nucleotide polymorphisms (SNPs) for prognosis, there is a need for methods that identify SNPs in high-dimensional data with survival outcomes. Traditional methods for the identification of SNPs have two drawbacks. First, the majority of the approaches for case-control studies are based on single SNPs. Second, SNPs that are identified without incorporating biological knowledge are more difficult to interpret. Random forests have been found to perform well in gene expression analysis with survival outcomes. In this paper we present the first pathway-based method to correlate SNPs with survival outcomes using a machine learning algorithm. We illustrate the application of pathway-based analysis of SNPs predictive of survival with a data set of 192 multiple myeloma patients genotyped for 500 000 SNPs. We also present simulation studies showing that the random forests technique with the log-rank score split criterion outperforms several other machine learning algorithms. Thus, pathway-based survival analysis using machine learning tools represents a promising approach for the identification of biologically meaningful SNPs associated with disease.
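To make the idea of a survival-based split criterion concrete, the following is a minimal sketch of how a single tree node might choose a SNP split. It is an illustration under stated assumptions, not the method of the paper: it uses the ordinary two-sample log-rank statistic as a simplified stand-in for the log-rank score rule named in the abstract, it assumes genotypes are coded as minor-allele counts (0, 1, 2), and the function names and SNP labels are hypothetical.

```python
import math


def logrank_statistic(times, events, in_left):
    """Standard two-sample log-rank Z statistic (a stand-in, not the log-rank score rule).

    times   -- observed follow-up times
    events  -- 1 if the event (death) was observed, 0 if censored
    in_left -- True if the subject falls in the left child of the candidate split
    """
    num = 0.0  # sum over event times of (observed - expected) events in the left child
    var = 0.0  # sum of hypergeometric variances
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                          # at risk overall
        n1 = sum(1 for u, g in zip(times, in_left) if g and u >= t)  # at risk, left child
        d = sum(1 for u, e in zip(times, events) if e and u == t)    # events at time t
        d1 = sum(1 for u, e, g in zip(times, events, in_left)
                 if e and g and u == t)                              # events at t, left child
        num += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num / math.sqrt(var) if var > 0 else 0.0


def best_snp_split(times, events, genotypes):
    """Pick the (SNP, cut) pair with the largest |Z| among the SNPs of one pathway.

    genotypes -- dict mapping a SNP label to a list of 0/1/2 allele counts
                 (an assumed coding). Candidate cuts are g <= 0 and g <= 1.
    """
    best = (None, None, 0.0)
    for snp, g in genotypes.items():
        for cut in (0, 1):
            left = [x <= cut for x in g]
            if all(left) or not any(left):
                continue  # skip splits that leave one child empty
            z = abs(logrank_statistic(times, events, left))
            if z > best[2]:
                best = (snp, cut, z)
    return best


# Toy data: six patients, all events observed, two hypothetical SNPs.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
genotypes = {"rs_a": [0, 0, 0, 1, 1, 2], "rs_b": [0, 1, 0, 1, 0, 1]}
snp, cut, z = best_snp_split(times, events, genotypes)
```

In a forest, a search of this kind would be repeated at every node over a random subset of the pathway's SNPs and over bootstrap samples of patients. The quadratic-time counting here favors readability over speed.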
Acknowledgements
This study was supported by the National Institutes of Health (grant P01CA142538) and start-up funds from Duke University Medical Center.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the European Journal of Human Genetics website.
About this article
Cite this article
Pang, H., Hauser, M. & Minvielle, S. Pathway-based identification of SNPs predictive of survival. Eur J Hum Genet 19, 704–709 (2011). https://doi.org/10.1038/ejhg.2011.3
DOI: https://doi.org/10.1038/ejhg.2011.3