Abstract
In most complex trait association studies that use next-generation sequencing, many clinically important secondary traits are available in addition to the primary phenotype of interest, and these can be analyzed to map susceptibility genes. Owing to high sequencing costs, most studies use selected samples, and their sampling mechanisms can be complicated. When the primary and secondary traits are correlated, analyses of secondary phenotypes in selected samples can produce spurious associations, and existing methods are inadequate to adjust for them. To address this problem, a likelihood-based method, MULTI-TRAIT-ASSOCIATION (MTA), was developed. MTA is flexible and can be applied to any study with a known sampling mechanism. It also allows efficient inference of genetic parameters. To investigate the power of MTA and of different study designs, extensive simulations were performed under rigorous population genetic and phenotypic models. The simulations demonstrate that analyzing secondary phenotypes in selected samples offers substantial benefits. In particular, using case–control samples and samples with extreme primary phenotypes can be more powerful than analyzing random samples of equivalent size. One major challenge for sequence-based association studies is that most data sets are too small to be adequately powered. By applying MTA, data sets ascertained under distinct mechanisms or targeted at different primary traits can be jointly analyzed to map common phenotypes, which greatly increases power. The combined analysis can be performed using freely available data sets from public repositories, for example, dbGaP. In conclusion, MTA will have an important role in dissecting the etiology of complex traits.
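The spurious-association problem described in the abstract can be illustrated with a toy simulation. The sketch below is not the MTA method and is not taken from the paper; every parameter value (carrier frequency, effect size `beta`, trait correlation `rho`, selection `threshold`) is an assumption chosen for illustration. A variant affects only the primary trait, the secondary trait is correlated with the primary trait through shared non-genetic noise, and a naive comparison of the secondary trait between carriers and non-carriers is computed first in the whole population and then in a sample selected for high primary-trait values.

```python
import random
import statistics


def simulate(n=100_000, carrier_freq=0.2, beta=1.0, rho=0.6,
             threshold=1.0, seed=1):
    """Return the naive carrier-vs-non-carrier difference in the secondary
    trait for (a) the unselected population and (b) a selected sample.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        g = 1 if rng.random() < carrier_freq else 0
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        y1 = beta * g + e1                        # primary trait: depends on genotype
        y2 = rho * e1 + (1 - rho ** 2) ** 0.5 * e2  # secondary trait: no genetic effect
        population.append((g, y1, y2))

    def mean_diff(sample):
        # Naive analysis: difference in mean secondary trait by carrier status.
        carriers = [y2 for g, _, y2 in sample if g == 1]
        others = [y2 for g, _, y2 in sample if g == 0]
        return statistics.fmean(carriers) - statistics.fmean(others)

    # Selected sample: only individuals with a high primary-trait value.
    selected = [row for row in population if row[1] > threshold]
    return mean_diff(population), mean_diff(selected)


diff_random, diff_selected = simulate()
print(f"unselected sample: {diff_random:+.3f}")
print(f"selected sample:   {diff_selected:+.3f}")
```

Under these assumptions the difference is close to zero in the unselected population but clearly negative in the selected sample, even though the variant has no effect on the secondary trait. Selection on the primary trait induces the dependence, which is the kind of bias a method that models the sampling mechanism is meant to remove.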
Acknowledgements
This research is supported by National Institutes of Health Grants 1RC4MD005964 and 1RC2HL102926 (SML). DJL is partially supported by a training fellowship from the Keck Center Pharmacoinformatics Training Program of the Gulf Coast Consortia (NIH Grant no. 5 R90 DK071505-04). We thank Drs Jonathan Cohen (JC) and Helen Hobbs for providing us with data from the Dallas Heart Study on the ANGPTL-family genes, which was supported by National Institutes of Health Grant RL1HL092550 (JC). Computation for this research was supported in part by the Shared University Grid at Rice, funded by NSF under Grant EIA-0216467, and by a partnership between Rice University, Sun Microsystems and Sigma Solutions Inc.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the European Journal of Human Genetics website.
About this article
Cite this article
Liu, D., Leal, S. A flexible likelihood framework for detecting associations with secondary phenotypes in genetic studies using selected samples: application to sequence data. Eur J Hum Genet 20, 449–456 (2012). https://doi.org/10.1038/ejhg.2011.211
DOI: https://doi.org/10.1038/ejhg.2011.211


