  • Letter

The effects of human population structure on large genetic association studies

Abstract

Large-scale association studies hold substantial promise for unraveling the genetic basis of common human diseases. A well-known problem with such studies is the presence of undetected population structure, which can lead to both false positive results and failures to detect genuine associations. Here we examine 15,000 genome-wide single-nucleotide polymorphisms typed in three population groups to assess the consequences of population structure on the coming generation of association studies. The consequences of population structure on association outcomes increase markedly with sample size. For the size of study needed to detect typical genetic effects in common diseases, even the modest levels of population structure within population groups cannot safely be ignored. We also examine one method for correcting for population structure (Genomic Control). Although it often performs well, it may not correct for structure if too few loci are used and may overcorrect in other settings, leading to substantial loss of power. The results of our analysis can guide the design of large-scale association studies.
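To make the Genomic Control idea concrete, here is a minimal sketch of its commonly described median-based form: an inflation factor λ is estimated from 1-df association statistics at presumed-null marker loci, and each test statistic is divided by λ before a P value is computed. The function names `inflation_factor` and `corrected_p_value` are hypothetical, and the abstract does not say which variant of the method the authors used, so this is an illustration under those assumptions rather than the study's procedure.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate lambda from 1-df chi-square statistics at presumed-null loci.

    Assumption: the median-based estimator, floored at 1 so that the
    correction never makes test statistics larger.
    """
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def corrected_p_value(stat, lam):
    """P value of a 1-df chi-square statistic after dividing it by lambda."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / lam / 2.0))
```

Because λ is a median taken over the null loci, an estimate based on only a handful of loci is noisy, which is consistent with the abstract's caution that the method may not correct for structure if too few loci are used.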

Figure 1: The effects of population structure at a SNP locus.
Figure 2: Differences between actual and predicted values (residuals) in data set II (the Asian sample).
Figure 3: Multiplicative change in P values due to population structure in small samples.
Figure 4: Multiplicative change in P values due to population structure in large samples (shown on log10 scale).
Figure 5: Multiplicative change in P values due to population structure after Genomic Control correction for scenario A2.
Figure 6: Multiplicative change in P values due to population structure after Genomic Control correction for scenarios B1 (a–c), B2 (d–f) and B3 (g–i).

Acknowledgements

L.R.C. and J.M. thank The Wellcome Trust for support. L.R.C. and P.D. acknowledge the US National Institutes of Health and The SNP Consortium.

Author information

Corresponding author

Correspondence to Peter Donnelly.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Marchini, J., Cardon, L., Phillips, M. et al. The effects of human population structure on large genetic association studies. Nat Genet 36, 512–517 (2004). https://doi.org/10.1038/ng1337

  • DOI: https://doi.org/10.1038/ng1337
