Abstract
Interactions between single nucleotide polymorphisms (SNPs) play a critical role in complex diseases. The primary limitation of logistic regression (LR) in testing SNP–SNP interactions is that coefficient estimates may not be valid when a model contains numerous terms. Multivariate adaptive regression splines (MARS) have features that effectively reduce the number of terms in a model. To study whether MARS can address this drawback better than LR, we compared the power of MARS and LR, with SNPs coded using the reference-coding and additive-mode schemes, on simulated data of ten SNPs for 400 subjects based on 1,000 replications for five interaction models. Overall, MARS performed better than LR. In the model with a dominant two-way interaction, the power range was 76–96% for MARS and 1–8% for LR in both coding schemes. In the dominant three-way interaction model, the power was 57–85% for MARS and less than 4% for LR. In the prostate cancer example, we evaluated the association between ten SNPs and prostate cancer risk in 649 Caucasians. The best model, selected using MARS, contained one two-way and one three-way interaction. These findings suggest that MARS may provide a useful tool for exploring SNP–SNP interactions.
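To make the two coding schemes and the "numerous terms" problem concrete, the following is a minimal Python sketch, not the authors' simulation code. It assumes that reference coding means two indicator variables per SNP (with the zero-minor-allele genotype as the reference) and that additive mode means the minor-allele count. The minor allele frequency (0.3), the disease risks (0.3 and 0.7), Hardy-Weinberg sampling, and all function names are hypothetical choices made for illustration; the abstract does not state the simulation parameters.

```python
import math
import random


def simulate_genotypes(n_subjects, n_snps, maf, rng):
    """Draw minor-allele counts (0, 1, 2), assuming Hardy-Weinberg equilibrium."""
    return [
        [sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
        for _ in range(n_subjects)
    ]


def additive_coding(genotype):
    """Additive mode: one term per SNP, the minor-allele count itself."""
    return [genotype]


def reference_coding(genotype):
    """Reference coding (assumed): two indicators, genotype 0 is the reference."""
    return [int(genotype == 1), int(genotype == 2)]


def dominant(genotype):
    """Dominant mode: 1 if the subject carries at least one minor allele."""
    return int(genotype >= 1)


def simulate_outcome(row, base_risk, interaction_risk, rng):
    """Case status under a dominant two-way interaction of the first two SNPs."""
    exposed = dominant(row[0]) and dominant(row[1])
    risk = interaction_risk if exposed else base_risk
    return int(rng.random() < risk)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
genotypes = simulate_genotypes(n_subjects=400, n_snps=10, maf=0.3, rng=rng)
outcome = [simulate_outcome(row, 0.3, 0.7, rng) for row in genotypes]

# Design matrices of main-effect terms under each coding scheme.
X_additive = [[v for g in row for v in additive_coding(g)] for row in genotypes]
X_reference = [[v for g in row for v in reference_coding(g)] for row in genotypes]

n_main_additive = len(X_additive[0])
n_main_reference = len(X_reference[0])

# Pairwise product terms a full two-way LR model would need:
# one per SNP pair (additive) or 2 x 2 indicator products per pair (reference).
n_twoway_additive = math.comb(10, 2)
n_twoway_reference = math.comb(10, 2) * 2 * 2

print(n_main_additive, n_main_reference, n_twoway_additive, n_twoway_reference)
```

Under these assumptions, a logistic model with all main effects and two-way products has 55 terms in additive mode and 200 in reference coding, before any three-way terms are added. This term count is the burden that an adaptive term-selection method such as MARS is meant to reduce.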
Acknowledgments
We thank Laura Gallitz for her help with manuscript editing, and we thank the anonymous reviewers for their very helpful suggestions and comments. This research was supported by National Cancer Institute grants CA73629 and CA090898 and by the American Cancer Society (CNE-101119) to J.J. Hu.
Cite this article
Lin, HY., Wang, W., Liu, YH. et al. Comparison of multivariate adaptive regression splines and logistic regression in detecting SNP–SNP interactions and their application in prostate cancer. J Hum Genet 53, 802–811 (2008). https://doi.org/10.1007/s10038-008-0313-z
DOI: https://doi.org/10.1007/s10038-008-0313-z