Abstract
Current statistical methods for testing association between rare variants and phenotypes are essentially group-wise methods that collapse or aggregate all variants in a predefined group into a single variant. Compared with variant-by-variant methods, group-wise methods have advantages. However, two factors may reduce their power. First, some of the causal variants may be protective. When both risk and protective variants are present, collapsing or aggregating all variants loses power because the effects of risk and protective variants counteract each other. Second, not all variants in the group are causal; rather, a large proportion is believed to be neutral. When a large proportion of variants are neutral, collapsing or aggregating all variants may not be an optimal solution. We propose two alternative methods, the adaptive clustering (AC) method and the adaptive weighting (AW) method, that aim to test rare variant association in the presence of neutral and/or protective variants. Both AC and AW are applicable to quantitative as well as qualitative traits. Results of extensive simulation studies show that AC and AW have similar power and that both have clear advantages, in power as well as computational efficiency, over existing group-wise methods and existing data-driven methods that allow neutral and protective variants. We recommend the AW method because it is computationally more efficient than the AC method.
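The loss of power from collapsing can be illustrated with a small toy example. The sketch below is not the AC or AW method, whose details this abstract does not give; it only compares a plain burden score (the count of minor alleles per individual) with a sign-aware score. The genotypes, effect sizes, and function names are hypothetical, and the sign-aware weights are the simulated true effects, which would be unknown in practice and would have to be estimated from the data.

```python
# Toy illustration (assumed data): collapsing risk and protective variants
# into one burden score can cancel the association signal.

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical minor-allele counts: rows are individuals, columns are
# four rare variants in one group (for example, one gene).
genotypes = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Assumed effects: variant 0 is risk, variant 1 is protective,
# variants 2 and 3 are neutral.
effects = [1.0, -1.0, 0.0, 0.0]

# Noise-free quantitative trait, kept deterministic for clarity.
trait = [sum(g * e for g, e in zip(row, effects)) for row in genotypes]

# Plain burden score: every variant counts equally and with the same sign.
burden = [sum(row) for row in genotypes]

# Sign-aware score using the (in practice unknown) true effects as weights.
signed = [sum(g * w for g, w in zip(row, effects)) for row in genotypes]

r_burden = pearson(burden, trait)
r_signed = pearson(signed, trait)
print(f"burden score vs trait: r = {r_burden:.2f}")
print(f"signed score vs trait: r = {r_signed:.2f}")
```

In this constructed data set the opposite effects cancel in the burden score, so it is uncorrelated with the trait, while the sign-aware score tracks the trait exactly. Real data are noisier, and the neutral variants would further dilute a burden score.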
Acknowledgements
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org).
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the European Journal of Human Genetics website.
Cite this article
Sha, Q., Wang, S. & Zhang, S. Adaptive clustering and adaptive weighting methods to detect disease associated rare variants. Eur J Hum Genet 21, 332–337 (2013). https://doi.org/10.1038/ejhg.2012.143
DOI: https://doi.org/10.1038/ejhg.2012.143