Abstract
Faster and cheaper next-generation sequencing technologies will generate unprecedentedly massive, high-dimensional genetic variation data that allow nearly complete evaluation of genetic variation, including both common and rare variants. There are two types of association tests: variant-by-variant tests and group tests. Variant-by-variant tests are designed to test the association of common variants, whereas group tests are suited to collectively testing the association of multiple rare variants. We propose here a smoothed functional principal component analysis (SFPCA) statistic as a general approach for testing association of the entire allelic spectrum of genetic variation (both common and rare variants), which combines the merits of both variant-by-variant analysis and group tests. Through extensive simulations, we demonstrate that the SFPCA statistic has correct type I error rates and much higher power than existing methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants and (4) variants with opposite directions of effect. To further evaluate its performance, we apply the SFPCA statistic to two real data sets: ANGPTL4 sequence data with six continuous phenotypes from the Dallas Heart Study, as an example of testing association of rare variants, and a schizophrenia GWAS data set, as an example of testing association of common variants. In both examples, the SFPCA statistic yields much smaller P-values than many existing statistics.
Acknowledgements
The project described was supported by Grants 1R01AR057120-01, 1R01HL106034-01 and 1U01HG005728-01 from the National Institutes of Health. Genome-Wide Association Study of Schizophrenia: funding support for the Genome-Wide Association of Schizophrenia Study was provided by the National Institute of Mental Health (R01 MH67257, R01 MH59588, R01 MH59571, R01 MH59565, R01 MH59587, R01 MH60870, R01 MH59566, R01 MH59586, R01 MH61675, R01 MH60879, R01 MH81800, U01 MH46276, U01 MH46289, U01 MH46318, U01 MH79469 and U01 MH79470), and the genotyping of samples was provided through the Genetic Association Information Network (GAIN). The data sets used for the analyses described in this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000021.v3.p2. Samples and associated phenotype data for the Genome-Wide Association of Schizophrenia Study were provided by the Molecular Genetics of Schizophrenia Collaboration (PI: Pablo V Gejman, Evanston Northwestern Healthcare (ENH) and Northwestern University, Evanston, IL, USA).
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the European Journal of Human Genetics website.
Appendix
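We define an extended inner product as

$$(f, g)_{\mu} = \int f(t)\,g(t)\,dt + \mu \int D^{2}f(t)\,D^{2}g(t)\,dt, \qquad \text{(A1)}$$

where $D^{2}f(t) = d^{2}f(t)/dt^{2}$ and $\mu \ge 0$ is a smoothing parameter that weights the roughness penalty. Similarly to equation (3), the penalized sample variance is defined as

$$F(\beta) = \frac{\iint \beta(s)\,R(s,t)\,\beta(t)\,ds\,dt}{\|\beta\|_{\mu}^{2}}, \qquad \text{(A2)}$$

where $R(s,t)$ is the sample covariance function of the genotype functions and $\|\beta\|_{\mu}^{2} = (\beta, \beta)_{\mu} = \int \beta^{2}(t)\,dt + \mu \int \left[D^{2}\beta(t)\right]^{2} dt$.
To find the functional principal component, we seek to maximize $F$ in equation (A2), which is equivalent to solving the following optimization problem:

$$\max_{\beta} \iint \beta(s)\,R(s,t)\,\beta(t)\,ds\,dt \quad \text{subject to} \quad \|\beta\|_{\mu}^{2} = 1. \qquad \text{(A3)}$$

Using the Lagrange multiplier method, we reformulate the constrained optimization problem (A3) as the following unconstrained problem:

$$\max_{\beta}\; J(\beta) = \iint \beta(s)\,R(s,t)\,\beta(t)\,ds\,dt + \lambda\left(1 - \|\beta\|_{\mu}^{2}\right), \qquad \text{(A4)}$$

where $\lambda$ is the Lagrange multiplier. Since $R(s,t)$ is symmetric, the first variation of $J$ in an arbitrary direction $h$ is

$$\delta J(\beta; h) = 2\iint h(s)\,R(s,t)\,\beta(t)\,ds\,dt - 2\lambda\left[\int h(s)\,\beta(s)\,ds + \mu\int D^{2}h(s)\,D^{2}\beta(s)\,ds\right]. \qquad \text{(A5)}$$

Setting (A5) to zero for every $h$ and integrating the penalty term by parts twice (assuming the boundary terms vanish, so that $\int D^{2}h\,D^{2}\beta = \int h\,D^{4}\beta$) yields the following integral equation:

$$\int R(s,t)\,\beta(t)\,dt = \lambda\left[\beta(s) + \mu\,D^{4}\beta(s)\right]. \qquad \text{(A6)}$$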
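To make the eigenvalue problem concrete, the following Python sketch discretizes it on a uniform grid of genomic positions. It is an illustration under stated assumptions, not the authors' implementation: genotype functions are taken to be observed directly as values on the grid (no basis-function smoothing), the covariance function $R(s,t)$ is estimated with divisor $n$, and the second derivative $D^2$ is replaced by second differences. Writing $\beta$ for the principal component function, $\lambda$ for the eigenvalue and $\mu$ for the roughness-penalty weight (our labels), the stationarity condition $\int R(s,t)\beta(t)\,dt = \lambda[\beta(s) + \mu D^4\beta(s)]$ becomes the symmetric generalized eigenproblem $A\beta = \lambda B\beta$ with $A = h^2 R$ and $B = h(I + \mu D_2^{\top}D_2)$, where $h$ is the grid spacing and $D_2^{\top}D_2$ stands in for $D^4$. The function names `smoothed_fpca` and `second_difference_matrix` are hypothetical.

```python
import numpy as np


def second_difference_matrix(m: int, h: float) -> np.ndarray:
    """Return the (m - 2) x m matrix that approximates D^2 on a uniform grid of spacing h."""
    d2 = np.zeros((m - 2, m))
    for i in range(m - 2):
        d2[i, i:i + 3] = [1.0, -2.0, 1.0]
    return d2 / h**2


def smoothed_fpca(x: np.ndarray, t: np.ndarray, mu: float, n_components: int = 2):
    """Discretized smoothed FPCA (hypothetical helper, illustration only).

    x  : (n, m) array; row i holds genotype function x_i(t) sampled on grid t.
    t  : (m,) uniform grid of positions.
    mu : weight of the roughness penalty in the extended inner product.
    Returns the leading eigenvalues (descending) and the matching eigenfunctions
    as columns, normalized so that the penalized norm ||beta||_mu^2 equals 1.
    """
    n, m = x.shape
    h = t[1] - t[0]                        # assumes a uniform grid
    xc = x - x.mean(axis=0)                # center each position
    r = xc.T @ xc / n                      # sample covariance R(s, t) on the grid
    d2 = second_difference_matrix(m, h)
    a = h**2 * r                           # quadrature for the double integral of beta R beta
    b = h * (np.eye(m) + mu * d2.T @ d2)   # Gram matrix of the extended inner product
    # Solve A beta = lambda B beta by Cholesky whitening, B = L L^T.
    l_inv = np.linalg.inv(np.linalg.cholesky(b))
    c = l_inv @ a @ l_inv.T
    c = 0.5 * (c + c.T)                    # remove round-off asymmetry
    evals, y = np.linalg.eigh(c)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    beta = l_inv.T @ y[:, order]           # back-transform so that beta^T B beta = 1
    return evals[order], beta


# Usage on synthetic 0/1/2 genotype codes (made-up data, for illustration only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
x = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
lam, beta = smoothed_fpca(x, t, mu=1e-4, n_components=3)
print("leading eigenvalues:", lam)
```

Cholesky whitening turns the generalized problem into a standard symmetric one that `numpy.linalg.eigh` can handle; `scipy.linalg.eigh(a, b)` solves the same generalized problem directly if SciPy is available. Setting `mu=0` recovers ordinary (unsmoothed) functional PCA on the grid, whose eigenvalues are $h$ times those of the sample covariance matrix, and any $\mu > 0$ can only lower the leading eigenvalue because it enlarges the penalized norm.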
About this article
Cite this article
Luo, L., Zhu, Y. & Xiong, M. Smoothed functional principal component analysis for testing association of the entire allelic spectrum of genetic variation. Eur J Hum Genet 21, 217–224 (2013). https://doi.org/10.1038/ejhg.2012.141


