Abstract
Large-scale association studies hold substantial promise for unraveling the genetic basis of common human diseases. A well-known problem with such studies is the presence of undetected population structure, which can lead to both false positive results and failures to detect genuine associations. Here we examine ∼15,000 genome-wide single-nucleotide polymorphisms typed in three population groups to assess the effects of population structure on the coming generation of association studies. The effects of population structure on association results increase markedly with sample size. For studies large enough to detect typical genetic effects in common diseases, even the modest levels of population structure within population groups cannot safely be ignored. We also examine one method of correcting for population structure, Genomic Control. Although it often performs well, it may fail to correct for structure if too few loci are used, and it may overcorrect in other settings, leading to a substantial loss of power. The results of our analysis can guide the design of large-scale association studies.
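Genomic Control, the correction examined above, is usually described as estimating an inflation factor λ from the median of the genome-wide test statistics and dividing every statistic by it. The Python sketch below assumes a 1-df allelic χ² test, the commonly used constant of about 0.4549 for the median of χ²₁, and a floor of λ = 1; it is not the authors' analysis code. To illustrate why structure matters more as studies grow, it also includes a hypothetical toy simulation: two subpopulations whose allele frequencies are drawn under the Balding-Nichols model with an assumed F_ST of 0.01, cases drawn 60:40 and controls 40:60 from them, and no true disease association at any SNP. All function names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics

# Median of the chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def allelic_chi2(case_alt, n_case_alleles, ctrl_alt, n_ctrl_alleles):
    """Pearson chi-square statistic (1 df) for a 2x2 table of allele counts."""
    table = [[case_alt, n_case_alleles - case_alt],
             [ctrl_alt, n_ctrl_alleles - ctrl_alt]]
    row_totals = [n_case_alleles, n_ctrl_alleles]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = n_case_alleles + n_ctrl_alleles
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip empty columns (monomorphic SNP)
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def genomic_control(stats):
    """Estimate the inflation factor lambda (floored at 1) and rescale the statistics."""
    lam = max(1.0, statistics.median(stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in stats]


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a 1-df chi-square: P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def count_alt_alleles(n_individuals, freq, rng):
    """Alternative-allele count among diploid individuals under Hardy-Weinberg equilibrium."""
    return sum((rng.random() < freq) + (rng.random() < freq) for _ in range(n_individuals))


def simulate_null_stats(n_per_group, n_snps, fst, case_frac_pop1, ctrl_frac_pop1, rng):
    """Hypothetical toy model: null SNPs, two subpopulations, unequal case/control ancestry."""
    n_case1 = round(n_per_group * case_frac_pop1)
    n_ctrl1 = round(n_per_group * ctrl_frac_pop1)
    stats = []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)  # ancestral allele frequency
        shape_a = p * (1 - fst) / fst
        shape_b = (1 - p) * (1 - fst) / fst
        # Balding-Nichols model: subpopulation frequencies ~ Beta(shape_a, shape_b)
        p1 = rng.betavariate(shape_a, shape_b)
        p2 = rng.betavariate(shape_a, shape_b)
        case_alt = (count_alt_alleles(n_case1, p1, rng)
                    + count_alt_alleles(n_per_group - n_case1, p2, rng))
        ctrl_alt = (count_alt_alleles(n_ctrl1, p1, rng)
                    + count_alt_alleles(n_per_group - n_ctrl1, p2, rng))
        stats.append(allelic_chi2(case_alt, 2 * n_per_group, ctrl_alt, 2 * n_per_group))
    return stats


if __name__ == "__main__":
    rng = random.Random(2004)
    for n in (250, 1000, 2000):
        stats = simulate_null_stats(n, 200, fst=0.01, case_frac_pop1=0.6,
                                    ctrl_frac_pop1=0.4, rng=rng)
        lam, corrected = genomic_control(stats)
        print(f"n per group = {n}: lambda = {lam:.2f}, "
              f"smallest corrected p = {min(map(chi2_1df_pvalue, corrected)):.3g}")
```

In this toy model the ancestry imbalance adds a case-control allele-frequency difference whose size does not shrink with n, while sampling noise shrinks as 1/n, so λ should grow roughly linearly with sample size. Estimating λ from only 200 SNPs also leaves the median noticeably noisy, which is one way to see why too few loci can undermine the correction.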
Acknowledgements
L.R.C. and J.M. thank The Wellcome Trust for support. L.R.C. and P.D. acknowledge the US National Institutes of Health and The SNP Consortium.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Marchini, J., Cardon, L., Phillips, M. et al. The effects of human population structure on large genetic association studies. Nat Genet 36, 512–517 (2004). https://doi.org/10.1038/ng1337
DOI: https://doi.org/10.1038/ng1337