Abstract
Genome-wide association studies (GWAS) are now being conducted to identify common genetic variants that predispose to human diseases and thereby to unravel the genetic etiology of complex diseases. Because of genotyping cost constraints, GWAS often follow a two-stage design: in stage 1, a large number of markers are genotyped in a proportion of the available samples, and in stage 2 the markers selected in stage 1 are examined in all the samples. In this paper, we introduce a nonlinear entropy-based statistic for joint analysis in two-stage genome-wide association studies. The type I error rates and power of the entropy-based statistic for single-locus association tests are validated using simulation studies, and the power of entropy-based joint analysis is investigated by simulation. The results suggest that, when detecting rare genetic variants, entropy-based joint analysis is always more powerful than linear joint analysis, which uses a linear function of risk allele frequencies in cases and controls; when detecting common genetic variants, the two joint analyses have comparable power. Furthermore, when the false discovery rate is controlled, entropy-based joint analysis is more powerful and requires fewer samples than linear joint analysis. We therefore recommend the entropy-based strategy for two-stage genome-wide association studies to detect rare and common genetic variants with moderate to large genetic effects underlying complex diseases.
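The linear joint analysis that serves as the comparison in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' entropy-based statistic, whose definition is not reproduced here. It computes a pooled allelic z-statistic from risk-allele frequencies in cases and controls, combines the stage 1 and stage 2 statistics with the weighted form z_joint = sqrt(pi) * z1 + sqrt(1 - pi) * z2 described by Skol et al. (2006, Nat Genet 38:209-213), where pi is the fraction of samples genotyped in stage 1 and the case:control ratio is assumed equal in both stages, and then applies the Benjamini-Hochberg step-up procedure to control the false discovery rate. The function names `allelic_z`, `joint_z`, `two_sided_p` and `benjamini_hochberg` are hypothetical.

```python
import math


def allelic_z(p_case, p_ctrl, n_case, n_ctrl):
    """Linear allelic test: z-statistic for the difference in risk-allele
    frequency between cases and controls. n_case and n_ctrl count
    individuals, so each group contributes 2n alleles."""
    # Pooled risk-allele frequency under the null hypothesis of no association
    p_bar = (p_case * n_case + p_ctrl * n_ctrl) / (n_case + n_ctrl)
    se = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 / (2 * n_case) + 1.0 / (2 * n_ctrl)))
    return (p_case - p_ctrl) / se


def joint_z(z1, z2, pi_samples):
    """Combine stage 1 and stage 2 statistics (Skol et al. 2006 form).
    pi_samples is the fraction of all samples genotyped in stage 1;
    assumes the same case:control ratio in both stages."""
    return math.sqrt(pi_samples) * z1 + math.sqrt(1.0 - pi_samples) * z2


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values, q):
    """Benjamini-Hochberg step-up procedure: return the sorted indices of
    the hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank k with p_(k) <= k * q / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    return sorted(order[:k_max])
```

For example, with hypothetical risk-allele frequencies of 0.30 in 500 cases and 0.25 in 500 controls, `allelic_z` returns about 2.50. In a two-stage scan, only markers passing a stage 1 threshold would be genotyped in stage 2, and their joint p-values would then be passed together to `benjamini_hochberg`.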
Acknowledgments
We would like to thank two referees for very helpful comments on an earlier draft. This work was supported by grant DMS 0234078 from the National Science Foundation to Y. Zuo.
About this article
Cite this article
Kang, G., Zuo, Y. Entropy-based joint analysis for two-stage genome-wide association studies. J Hum Genet 52, 747–756 (2007). https://doi.org/10.1007/s10038-007-0177-7
This article is cited by
- Multi-strategy genome-wide association studies identify the DCAF16-NCAPG region as a susceptibility locus for average daily gain in cattle. Scientific Reports (2016)
- Two-stage designs to identify the effects of SNP combinations on complex diseases. Journal of Human Genetics (2008)


