Abstract
With recent advances in sequencing, genotyping arrays, and imputation, genome-wide association studies (GWAS) now aim to identify associations with rare and uncommon genetic variants. Here, we describe and evaluate a class of statistics, generalized score statistics (GSS), that test for an association between a group of genetic variants and a phenotype. Each GSS is a simple weighted sum of single-variant statistics and their cross-products. We show that the majority of statistics currently used to detect associations with rare variants are equivalent to choosing a specific set of weights within this framework. We then evaluate the power of various weighting schemes as a function of variant characteristics, such as minor allele frequency (MAF), the proportion of variants associated with the phenotype, and the direction of effect. Ultimately, we find that two classical tests are robust and powerful, and we detail the conditions under which other GSS may perform favorably. The software package CRaVe is available at our website (http://dceg.cancer.gov/bb/tools/crave).
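To make the idea of a weighted sum of single-variant statistics and their cross-products concrete, the following minimal sketch may help. It is an illustration only and is not the CRaVe implementation: the function names, the unstandardized score U_j = sum_i (y_i - mean(y)) g_ij, the weight matrices, and the toy data are all assumptions chosen for the example. Under these assumptions, a burden-style statistic corresponds to a weight matrix of all ones (squares plus every cross-product), and a sum-of-squares statistic corresponds to the identity matrix (squares only).

```python
import numpy as np

def score_vector(genotypes, phenotype):
    """Single-variant scores U_j = sum_i (y_i - mean(y)) * g_ij (assumed form)."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    return g.T @ (y - y.mean())

def generalized_score_statistic(u, weights):
    """Weighted sum of squares and cross-products: sum_jk W[j, k] * U_j * U_k."""
    w = np.asarray(weights, dtype=float)
    return float(u @ w @ u)

# Toy data (hypothetical): 6 subjects, 3 variants, binary phenotype.
# Variant 0 is enriched in cases, variant 2 is enriched in controls.
genotypes = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [2, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
])
phenotype = np.array([0, 1, 0, 1, 0, 1])

u = score_vector(genotypes, phenotype)
n_variants = u.shape[0]

# Burden-style weights: all ones, so the statistic equals (sum_j U_j)^2.
burden = generalized_score_statistic(u, np.ones((n_variants, n_variants)))

# Sum-of-squares weights: identity, so the statistic equals sum_j U_j^2.
sum_sq = generalized_score_statistic(u, np.eye(n_variants))

print("scores:", u)
print("burden-style statistic:", burden)
print("sum-of-squares statistic:", sum_sq)
```

In this toy example the two variants act in opposite directions, so their scores partly cancel in the burden-style statistic while both contribute to the sum-of-squares statistic. This is one way to see why the direction of effect matters when choosing weights. Significance would still need to be assessed, for example by permutation or an asymptotic approximation, which the sketch does not attempt.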
Acknowledgements
The research of Dr Joshua Sampson was supported by the intramural program of the National Cancer Institute. This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov).
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on European Journal of Human Genetics website
Cite this article
Ferguson, J., Wheeler, W., Fu, Y. et al. Statistical tests for detecting associations with groups of genetic variants: generalization, evaluation, and implementation. Eur J Hum Genet 21, 680–686 (2013). https://doi.org/10.1038/ejhg.2012.220
DOI: https://doi.org/10.1038/ejhg.2012.220