Abstract
There is much recent interest in finding rare genetic variants associated with various diseases. Owing to the scarcity of rare mutations, single-variant analyses often lack power. To enable pooling of information across variants, we use a random-effects formulation within a retrospective modeling framework that respects the retrospective data collection mechanism of case–control studies. More concretely, we model the control allele frequencies of the variants as random effects and the systematic differences between the case and control frequencies as fixed effects, resulting in a mixed model. The use of a Poisson approximation and gamma-distributed random effects results in a generalized negative binomial distribution for the joint distribution of the control and case frequencies. Variants are selected by conducting stepwise likelihood ratio tests. A simulation study demonstrates the superiority of the proposed method over two existing variant selection methods, and also indicates that non-gamma random effects and correlated variants are not too detrimental to its performance. When the proposed procedure is applied to identify rare variants associated with obesity, it identifies one additional variant not picked up by existing methods.
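The Poisson-gamma construction and the likelihood ratio test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the details are assumptions made for illustration: for a single variant, a gamma random effect `lam` with mean `mu` and shape `v`, a control count `x` that is Poisson with mean `a * lam`, and a case count `y` that is Poisson with mean `b * exp(delta) * lam`. The names `a`, `b`, `joint_log_pmf`, `delta_mle` and `single_variant_lrt` are hypothetical. The test shown treats `mu` and `v` as known and examines one variant at a time, which is a simplification of the stepwise procedure.

```python
import math


def joint_log_pmf(x, y, a, b, mu, v, delta):
    """Log joint pmf of control count x and case count y for one variant.

    Assumed model (illustrative, not taken from the paper):
      lam ~ Gamma(shape=v, rate=v/mu)           random effect with mean mu
      x | lam ~ Poisson(a * lam)                controls
      y | lam ~ Poisson(b * exp(delta) * lam)   cases, fixed effect delta
    Integrating lam out gives the negative-binomial-type closed form below.
    """
    r = v / mu                  # rate of the gamma random effect
    c = b * math.exp(delta)     # case exposure scaled by the fixed effect
    return (
        math.lgamma(x + y + v) - math.lgamma(v)
        - math.lgamma(x + 1) - math.lgamma(y + 1)
        + x * math.log(a) + y * math.log(c) + v * math.log(r)
        - (x + y + v) * math.log(a + c + r)
    )


def delta_mle(x, y, a, b, mu, v):
    """Closed-form maximiser of joint_log_pmf in delta, with mu and v held fixed."""
    if y == 0:
        raise ValueError("delta has no finite maximiser when the case count is zero")
    r = v / mu
    return math.log(y * (a + r) / (b * (x + v)))


def single_variant_lrt(x, y, a, b, mu, v):
    """Likelihood ratio test of delta = 0 for one variant (mu and v assumed known)."""
    d_hat = delta_mle(x, y, a, b, mu, v)
    stat = 2.0 * (
        joint_log_pmf(x, y, a, b, mu, v, d_hat)
        - joint_log_pmf(x, y, a, b, mu, v, 0.0)
    )
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return d_hat, stat, p_value
```

A call such as `single_variant_lrt(x=1, y=8, a=1000, b=1000, mu=0.002, v=2.0)` returns the estimated effect, the test statistic and an asymptotic p-value. In a stepwise setting one would presumably re-estimate all parameters jointly at each step rather than fix `mu` and `v`.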
Acknowledgements
We would like to thank the referees for their helpful comments and suggestions. The research of the third author was supported by the National Science Foundation of China Grant 11271346.
Appendix: First and second derivatives of the log-likelihood function
It can be assumed without loss of generality that δ_j = 0 for k < j ⩽ J, hence the model involves only the parameters v, μ and δ_1, …, δ_k. The log-likelihood function can be written as l = l_1 + l_2, where

and

The first and second derivatives of l_1 can be obtained readily by symbolic differentiation. Note in particular that ∂²l_1/∂δ_i∂δ_j = 0 for i ≠ j. It remains to find the derivatives of l_2. As l_2 is a function of v only, its only non-zero derivatives are

and

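For orientation only, the following is a sketch of what such derivatives look like under one assumed form of l_2. It is not a reproduction of the expressions displayed in the article. The assumption is that l_2 collects the log-gamma terms produced by a gamma mixing density with shape v, where x_j and y_j (hypothetical notation) denote the control and case counts of variant j. Under that assumption the derivatives involve the digamma function ψ and the trigamma function ψ′:

```latex
% Assumed form, for illustration only; x_j and y_j are hypothetical notation
\[
  l_2 = \sum_{j=1}^{J} \bigl\{ \log\Gamma(x_j + y_j + v) - \log\Gamma(v) \bigr\},
\]
\[
  \frac{\partial l_2}{\partial v}
    = \sum_{j=1}^{J} \bigl\{ \psi(x_j + y_j + v) - \psi(v) \bigr\},
  \qquad
  \frac{\partial^2 l_2}{\partial v^2}
    = \sum_{j=1}^{J} \bigl\{ \psi'(x_j + y_j + v) - \psi'(v) \bigr\}.
\]
```

Terms of this kind cannot be simplified to elementary functions in general, which would explain why l_2 is handled separately from l_1.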
Cite this article
Kuk, A., Nott, D. & Yang, Y. A stepwise likelihood ratio test procedure for rare variant selection in case–control studies. J Hum Genet 59, 198–205 (2014). https://doi.org/10.1038/jhg.2014.1
DOI: https://doi.org/10.1038/jhg.2014.1


