Abstract
In single nucleotide polymorphism (SNP) data analysis, the allelic odds ratio and its confidence interval (CI) are commonly used to evaluate the association between a disease and the alleles at each SNP. The usual formula for the CI of the allelic odds ratio assumes the Hardy–Weinberg equilibrium (HWE); if HWE does not hold, the error rate may exceed the level guaranteed by the nominal confidence level. We therefore present a generalized formula for the CI that does not assume HWE. CIs calculated by the generalized formula tend to be wider than those from the usual method when the Hardy–Weinberg disequilibrium (HWD) is toward a relative deficiency of heterozygotes (fixation index greater than 0), and narrower when HWD is toward a relative excess of heterozygotes (fixation index less than 0). A simulation experiment examining the influence of the generalization was performed for the case where 2% of SNPs had a fixation index greater than 0. The results showed that the generalized method slightly decreased the mean number of falsely detected SNPs.
References
Agresti A (2001) Categorical data analysis, 2nd edn. Wiley, New York
Balding D, Bishop M, Cannings C (2001) Handbook of statistical genetics. Wiley, New York
Bishop Y, Fienberg S, Holland P (1975) Discrete multivariate analysis: theory and practice. MIT Press, Cambridge
Gyorffy B, Kocsis I, Vasarhelyi B (2004) Biallelic genotype distributions in papers published in Gut between 1998 and 2003: altered conclusions after recalculating the Hardy–Weinberg equilibrium. Gut 53:614–616
Haga H, Yamada R, Ohnishi Y, Nakamura Y, Tanaka T (2002) Gene-based SNP discovery as part of the Japanese Millennium Genome Project: identification of 190,562 genetic variations in the human genome. Single-nucleotide polymorphism. J Hum Genet 47:605–610
Hirakawa M, Tanaka T, Hashimoto Y, Kuroda M, Takagi T, Nakamura Y (2002) JSNP: a database of common gene variations in the Japanese population. Nucleic Acids Res 30:158–162
Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6:95–108
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945
Kocsis I, Gyorffy B, Nemeth E, Vasarhelyi B (2004a) Examination of Hardy–Weinberg equilibrium in papers of Kidney International: an underused tool. Kidney Int 65:1956–1958
Kocsis I, Vasarhelyi B, Gyorffy A, Gyorffy B (2004b) Reanalysis of genotype distributions published in Neurology between 1999 and 2002. Neurology 63:357–358
Li CC, Horvitz DG (1953) Some methods of estimating the inbreeding coefficient. Am J Hum Genet 5:107–117
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York
Osawa H, Yamada K, Onuma H, Murakami A, Ochi M, Kawata H, Nishimiya T, Niiya T, Shimizu I, Nishida W, Hashiramoto M, Kanatsuka A, Fujii Y, Ohashi J, Makino H (2004) The G/G genotype of a resistin single-nucleotide polymorphism at −420 increases type 2 diabetes mellitus susceptibility by inducing promoter activity through specific binding of Sp1/3. Am J Hum Genet 75:678–686
Pharoah PD, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BA (2002) Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet 31:33–36
Ponder BA (2001) Cancer genetics. Nature 411:336–341
Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S et al (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928–933
Salanti G, Amountza G, Ntzani EE, Ioannidis JP (2005) Hardy–Weinberg equilibrium in genetic association studies: an empirical evaluation of reporting, deviations, and power. Eur J Hum Genet 13:840–848
Sasieni PD (1997) From genotypes to genes: doubling the sample size. Biometrics 53:1253–1261
Sato Y, Suganami H, Hamada C, Yoshimura I, Yoshida T, Yoshimura K (2004) Designing a multistage, SNP-based, genome screen for common diseases. J Hum Genet 49:669–676
Schaid DJ, Jacobsen SJ (1999) Biased tests of association: comparisons of allele frequencies when departing from Hardy–Weinberg proportions. Am J Epidemiol 149:706–711
Sing F, Haviland B, Reilly L (1996) Genetic architecture of common multifactorial diseases. In: Variation in the Human Genome (Ciba Foundation Symposium 1997). Wiley, Chichester, pp 211–229
Wittke-Thompson JK, Pluzhnikov A, Cox NJ (2005) Rational inferences about departures from Hardy–Weinberg equilibrium. Am J Hum Genet 76:967–986
Wright S (1951) The genetical structure of populations. Ann Eugen 15:323–354
Wright S (1965) The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395–420
Wright AF, Carothers AD, Pirastu M (1999) Population choice in mapping genes for complex diseases. Nat Genet 23:397–404
Xu J, Turner A, Little J, Bleecker ER, Meyers DA (2002) Positive results in association studies are associated with departure from Hardy–Weinberg equilibrium: hint for genotyping error? Hum Genet 111:573–574
Yoshida T, Yoshimura K (2003) Outline of disease gene hunting approaches in the Millennium Genome Project of Japan. Proc Jpn Acad Ser B 79:34–50
Zaykin DV, Meng Z, Ghosh SK (2004) Interval estimation of genetic susceptibility for retrospective case-control studies. BMC Genet 5:9
Acknowledgments
We thank Professor Toshiya Sato at Kyoto University and Dr. Takashi Sozu at Tokyo University of Science for their valuable advice in improving this paper. We are grateful to the anonymous reviewers for their useful comments, which greatly improved this paper. This study was supported by the Program for Promotion of Fundamental Studies in Health Sciences of the National Institute of Biomedical Innovation of Japan.
Appendix: Mathematical details
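Let the population probabilities of genotypes “XX”, “Xx”, and “xx” be π1, π2, and π3 (π1 + π2 + π3 = 1), respectively. The population probabilities of alleles “X” and “x” at a SNP are then \(P_{1} = \pi_{1} + \pi_{2}/2\) and \(P_{2} = \pi_{3} + \pi_{2}/2\), respectively (Eq. 10).
When we use the fixation index (Li and Horvitz 1953) defined by \(F = 1 - \pi_{2}/(2P_{1}P_{2})\), the genotype probabilities πi (i = 1, 2, 3) are expressed as \(\pi_{1} = P_{1}^{2} + FP_{1}P_{2}\), \(\pi_{2} = 2P_{1}P_{2} - 2FP_{1}P_{2}\), and \(\pi_{3} = P_{2}^{2} + FP_{1}P_{2}\) (Eq. 12).
Therefore, (π1, π2, π3) is equivalent to (P1, P2, F).
When the Hardy–Weinberg equilibrium (HWE) holds, F = 0 and Eq. 12 reduces to \(\pi_{1} = P_{1}^{2}\), \(\pi_{2} = 2P_{1}P_{2}\), and \(\pi_{3} = P_{2}^{2}\) (Eq. 13); that is, the second term on the right-hand side of each expression in Eq. 12 represents the degree of disequilibrium.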
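The mapping between (π1, π2, π3) and (P1, F) can be sketched in a few lines of Python. This is only an illustration: it assumes the standard parameterization \(\pi_{1} = P_{1}^{2} + FP_{1}P_{2}\), \(\pi_{2} = 2P_{1}P_{2}(1 - F)\), \(\pi_{3} = P_{2}^{2} + FP_{1}P_{2}\), and the function names `genotype_probs` and `fixation_index` are hypothetical, not taken from the paper.

```python
def genotype_probs(p1: float, f: float) -> tuple[float, float, float]:
    """Genotype probabilities (XX, Xx, xx) from the frequency p1 of allele X
    and the fixation index f (assumed parameterization, see lead-in)."""
    p2 = 1.0 - p1
    pi1 = p1 * p1 + f * p1 * p2        # homozygote XX
    pi2 = 2.0 * p1 * p2 * (1.0 - f)    # heterozygote Xx
    pi3 = p2 * p2 + f * p1 * p2        # homozygote xx
    return pi1, pi2, pi3


def fixation_index(pi1: float, pi2: float, pi3: float) -> float:
    """Fixation index recovered from genotype probabilities."""
    p1 = pi1 + pi2 / 2.0
    p2 = pi3 + pi2 / 2.0
    return 1.0 - pi2 / (2.0 * p1 * p2)
```

For example, `genotype_probs(0.3, 0.0)` gives the HWE proportions, while a positive `f` moves probability mass from the heterozygote to the two homozygotes without changing the allele frequency.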
For a random sample of size n, the observed frequencies (n1, n2, n3) (n1 + n2 + n3 = n) of genotypes (XX, Xx, xx) follow the trinomial distribution Tn(n; π1, π2, π3) and, therefore, the maximum likelihood estimator of πi is \(p_{i} = n_{i}/n\) (i = 1, 2, 3). Likewise, the maximum likelihood estimators of P1 and the allelic odds \(P_{1}/(1 - P_{1})\) are \(\hat{P}_{1} = p_{1} + p_{2}/2 = (2n_{1} + n_{2})/(2n)\) and \(\hat{P}_{1}/(1 - \hat{P}_{1})\), respectively.
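Since the means, variances, and covariance of p1 and p2 are given (Bishop et al. 1975; Agresti 2001) by \(E(p_{i}) = \pi_{i}\), \(V(p_{i}) = \pi_{i}(1 - \pi_{i})/n\) (i = 1, 2), and \(\mathrm{Cov}(p_{1}, p_{2}) = -\pi_{1}\pi_{2}/n\), the mean and variance of \(\hat{P}_{1}\) are, after simple but tedious algebra, derived as \(E(\hat{P}_{1}) = P_{1}\) (Eq. 15) and \(V(\hat{P}_{1}) = V(p_{1}) + V(p_{2})/4 + \mathrm{Cov}(p_{1}, p_{2}) = (1 + F)P_{1}(1 - P_{1})/(2n)\) (Eq. 16).
When F = 0, the last expression becomes \(P_{1}(1 - P_{1})/(2n)\), the well-known variance of a binomial proportion with size 2n and probability P1. This reflects that, under HWE, the frequency of allele X has the same distribution as that of allele X in 2n alleles chosen at random with P1 as the proportion of X.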
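The algebra behind the variance of \(\hat{P}_{1}\) can be checked numerically. The sketch below computes the variance in two ways from observed genotype counts: directly from the trinomial moments, and from the closed form \((1 + F)P_{1}(1 - P_{1})/(2n)\). It assumes the plug-in estimator \(\hat{F} = 1 - p_{2}/\{2\hat{P}_{1}(1 - \hat{P}_{1})\}\); the function name is hypothetical.

```python
def allele_frequency_variance(n1: int, n2: int, n3: int):
    """Return (p_hat, f_hat, direct, closed) for genotype counts (XX, Xx, xx).

    direct: V(p1) + V(p2)/4 + Cov(p1, p2) with trinomial plug-in moments
    closed: (1 + f_hat) * p_hat * (1 - p_hat) / (2n)
    """
    n = n1 + n2 + n3
    p1, p2 = n1 / n, n2 / n
    p_hat = p1 + p2 / 2.0
    # Plug-in fixation index (an assumption of this sketch).
    f_hat = 1.0 - p2 / (2.0 * p_hat * (1.0 - p_hat))
    direct = (p1 * (1.0 - p1) + p2 * (1.0 - p2) / 4.0 - p1 * p2) / n
    closed = (1.0 + f_hat) * p_hat * (1.0 - p_hat) / (2.0 * n)
    return p_hat, f_hat, direct, closed
```

With counts (40, 20, 40), both expressions give the same value, and that value exceeds the binomial variance \(\hat{P}_{1}(1 - \hat{P}_{1})/(2n)\) because the heterozygotes are deficient.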
Since \(\hat{P}_{1}\) converges to P1 in probability as n tends to infinity, the logarithm of the estimated allelic odds, \(\log\{\hat{P}_{1}/(1 - \hat{P}_{1})\}\), can be approximated by the first-order Taylor expansion as \(\log\{\hat{P}_{1}/(1 - \hat{P}_{1})\} \approx \log\{P_{1}/(1 - P_{1})\} + (\hat{P}_{1} - P_{1})/\{P_{1}(1 - P_{1})\}\) (Eq. 17).
Consequently, the mean and variance of \(\log\{\hat{P}_{1}/(1 - \hat{P}_{1})\}\) are asymptotically approximated by \(E[\log\{\hat{P}_{1}/(1 - \hat{P}_{1})\}] \approx \log\{P_{1}/(1 - P_{1})\}\) (Eq. 18) and \(V[\log\{\hat{P}_{1}/(1 - \hat{P}_{1})\}] \approx (1 + F)/\{2nP_{1}(1 - P_{1})\}\) (Eq. 19).
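A small Monte Carlo experiment can illustrate how well the first-order approximation of the variance of the log allelic odds holds. The sketch below assumes the asymptotic form \((1 + F)/\{2nP_{1}(1 - P_{1})\}\) and the genotype parameterization \(\pi_{1} = P_{1}^{2} + FP_{1}P_{2}\), \(\pi_{2} = 2P_{1}P_{2}(1 - F)\), \(\pi_{3} = P_{2}^{2} + FP_{1}P_{2}\); the function names, the seed, and the sample sizes are illustrative choices, not values from the paper.

```python
import math
import random


def asymptotic_log_odds_variance(p1: float, f: float, n: int) -> float:
    """Delta-method variance of log(P_hat / (1 - P_hat)) (assumed form)."""
    return (1.0 + f) / (2.0 * n * p1 * (1.0 - p1))


def simulated_log_odds_variance(p1: float, f: float, n: int,
                                reps: int, seed: int = 0) -> float:
    """Empirical variance of the log allelic odds over `reps` trinomial samples."""
    rng = random.Random(seed)
    p2 = 1.0 - p1
    weights = [p1 * p1 + f * p1 * p2,       # XX
               2.0 * p1 * p2 * (1.0 - f),   # Xx
               p2 * p2 + f * p1 * p2]       # xx
    values = []
    for _ in range(reps):
        sample = rng.choices((0, 1, 2), weights=weights, k=n)
        n1 = sample.count(0)
        n2 = sample.count(1)
        p_hat = (2 * n1 + n2) / (2 * n)
        values.append(math.log(p_hat / (1.0 - p_hat)))
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)
```

Comparing `simulated_log_odds_variance(0.3, 0.2, 400, 2000)` with `asymptotic_log_odds_variance(0.3, 0.2, 400)` should show close agreement for moderate n; for very small n or allele frequencies near 0 or 1, \(\hat{P}_{1}\) can hit the boundary and the logarithm is undefined, which this sketch does not handle.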
When we consider the populations of cases and controls of a disease, the association between allele and disease is conventionally represented by the allelic odds ratio ψ defined by \(\psi = \{P_{11}/(1 - P_{11})\}/\{P_{12}/(1 - P_{12})\}\) (Eq. 20), where cases and controls are distinguished by the second subscript, 1 (case) and 2 (control).
Suppose we have random samples of sizes \(n_{\cdot 1}\) and \(n_{\cdot 2}\) from cases and controls, respectively. Then the maximum likelihood estimator \(\hat{\psi}\) of ψ is given by \(\hat{\psi} = \{\hat{P}_{11}/(1 - \hat{P}_{11})\}/\{\hat{P}_{12}/(1 - \hat{P}_{12})\}\) (Eq. 21), where \(\hat{P}_{11}\) and \(\hat{P}_{12}\) are the maximum likelihood estimators based on the samples of cases and controls, respectively.
Since the sample of cases and that of controls can be assumed independent, we obtain \(E\{\log(\hat{\psi})\} \approx \log(\psi)\) (Eq. 22) and \(V\{\log(\hat{\psi})\} \approx (1 + F_{1})/\{2n_{\cdot 1}P_{11}(1 - P_{11})\} + (1 + F_{2})/\{2n_{\cdot 2}P_{12}(1 - P_{12})\}\) (Eq. 23), where F1 and F2 are the fixation indices of cases and controls, respectively.
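When we construct an asymptotic confidence interval of log(ψ) with confidence level 1−α, we should replace \(V\{\log(\hat{\psi})\}\) with its estimator \(\hat{V}\{\log(\hat{\psi})\} = (1 + \hat{F}_{1})/\{2n_{\cdot 1}\hat{P}_{11}(1 - \hat{P}_{11})\} + (1 + \hat{F}_{2})/\{2n_{\cdot 2}\hat{P}_{12}(1 - \hat{P}_{12})\}\) (Eq. 24), where \(\hat{F}_{1}\) and \(\hat{F}_{2}\) are obtained by substituting the estimators into the definition of the fixation index: \(\hat{F}_{j} = 1 - p_{2j}/\{2\hat{P}_{1j}(1 - \hat{P}_{1j})\}\) (j = 1, 2), with \(p_{2j}\) the observed proportion of heterozygotes in group j.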
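The whole procedure can be put together as a short Python sketch that returns the estimated allelic odds ratio with its asymptotic CI, either with the generalized variance or with the usual HWE-based variance (F set to 0). It assumes the plug-in estimator \(\hat{F} = 1 - p_{2}/\{2\hat{P}_{1}(1 - \hat{P}_{1})\}\) and a normal approximation for \(\log(\hat{\psi})\); the function names and the `assume_hwe` flag are hypothetical.

```python
import math
from statistics import NormalDist


def _allele_stats(n1: int, n2: int, n3: int):
    """Sample size, allele frequency of X, and plug-in fixation index."""
    n = n1 + n2 + n3
    p_hat = (2 * n1 + n2) / (2 * n)
    f_hat = 1.0 - (n2 / n) / (2.0 * p_hat * (1.0 - p_hat))
    return n, p_hat, f_hat


def allelic_or_ci(case, control, alpha: float = 0.05, assume_hwe: bool = False):
    """Return (odds ratio, lower limit, upper limit).

    case, control: genotype counts (XX, Xx, xx).
    assume_hwe=True sets both fixation indices to 0 (the usual formula).
    """
    n_a, p_a, f_a = _allele_stats(*case)
    n_b, p_b, f_b = _allele_stats(*control)
    if assume_hwe:
        f_a = f_b = 0.0
    log_or = math.log(p_a / (1.0 - p_a)) - math.log(p_b / (1.0 - p_b))
    var = ((1.0 + f_a) / (2.0 * n_a * p_a * (1.0 - p_a))
           + (1.0 + f_b) / (2.0 * n_b * p_b * (1.0 - p_b)))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # standard normal quantile
    half_width = z * math.sqrt(var)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))
```

For instance, `allelic_or_ci((40, 20, 40), (25, 50, 25))` yields a wider interval than the same call with `assume_hwe=True`, because the case sample shows a deficiency of heterozygotes; with an excess of heterozygotes, such as case counts (20, 60, 20), the generalized interval is narrower.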
Sato, Y., Suganami, H., Hamada, C. et al. The confidence interval of allelic odds ratios under the Hardy–Weinberg disequilibrium. J Hum Genet 51, 772–780 (2006). https://doi.org/10.1007/s10038-006-0020-6
DOI: https://doi.org/10.1007/s10038-006-0020-6