  • Review Article

Microbial genome-wide association studies: lessons from human GWAS

Key Points

  • Genome-wide association studies (GWAS) have been highly successful in the analysis of human genomic data. The growing availability of whole-genome sequences for microorganisms now makes microbial GWAS possible.

  • Initial microbial GWAS have successfully identified variants underlying traits under strong selection, such as drug resistance, in a range of bacteria, viruses and protozoa.

  • Several challenges could prevent microbial GWAS from identifying variants under moderate selection. The primary challenge is that microbial populations are more strongly stratified than human populations, owing to selection and complex patterns of recombination.

  • Novel software that is tailored to the needs of microbial GWAS would greatly expedite progress in the field. In particular, the application of polygenic methods has yet to be evaluated in microorganisms.

  • An exciting future area of research is the generation of host and microbial genomic data from the same samples. This will enable genome-to-genome analyses that test for host–microorganism interactions.
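The key points single out population stratification as the main obstacle for microbial GWAS. A correction widely used in human GWAS is to include the top principal components of the genotype matrix as covariates in each association test. The sketch below, in plain Python with hypothetical function names and toy data, computes the leading component for haploid 0/1 genotypes by power iteration on the sample relationship matrix. It illustrates the technique under these assumptions and is not a method proposed in this Review; whether a handful of components can capture strongly clonal microbial structure is exactly the open question raised above.

```python
import math
import random


def standardize(genotypes):
    """Centre and scale each variant (column) of a samples x variants 0/1 matrix.

    Monomorphic variants carry no information about structure and are dropped.
    """
    n = len(genotypes)
    columns = []
    for col in zip(*genotypes):
        p = sum(col) / n
        sd = math.sqrt(p * (1 - p))
        if sd > 0:
            columns.append([(x - p) / sd for x in col])
    return [list(row) for row in zip(*columns)]  # back to samples x variants


def top_principal_component(genotypes, iterations=200, seed=0):
    """Leading eigenvector of the relationship matrix X X^T / m, by power iteration.

    The sign is arbitrary; samples sharing a sign fall on the same side of PC1.
    """
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    grm = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    # A random start is almost surely not orthogonal to the leading eigenvector
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical toy data: two lineages of three genomes each, seven variants.
EXAMPLE_GENOTYPES = [
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 0],
]
```

On the toy data, PC1 separates the first three genomes from the last three. In an association scan, this vector (and further components) would enter each per-variant regression as a covariate.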

Abstract

Falling sequencing costs have produced whole-genome sequences for a large number of microorganisms, enabling microbial genome-wide association studies (GWAS). Given the successes of human GWAS in elucidating disease aetiology and identifying potential drug targets, microbial GWAS are likely to further advance our understanding of infectious diseases, including pressing global health problems such as antibiotic resistance and disease transmission. In this Review, we outline the methodologies of GWAS, the current state of the field of microbial GWAS, and how lessons from human GWAS can direct the future of the field.

Figure 1: Phenotype prediction as GWAS sample sizes increase.
Figure 2: Potential models for microbial GWAS.

Acknowledgements

Research supported by a South African MRC Flagship grant (MRC-RFA-UFSP-01-2013/UKZN HIVEPI), Wellcome Trust grants (098051 and 201355/Z/16/Z) and a Royal Society Newton Advanced Fellowship.

Author information

Corresponding author

Correspondence to Robert A. Power.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Genome-wide association studies

(GWAS). A hypothesis-free method that tests hundreds of thousands of variants across the genome to identify alleles that are associated with a phenotype.
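The definition above describes testing every variant in turn. A minimal sketch of that loop, assuming a binary phenotype and haploid 0/1 genotypes, applies a Pearson chi-square test (1 degree of freedom, no continuity correction) to the 2×2 table of allele by phenotype at each variant. The function names are hypothetical, and the sketch deliberately omits any correction for population structure, which the glossary entry on false positives identifies as a usual source of spurious hits.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    a: carriers with phenotype, b: carriers without,
    c: non-carriers with phenotype, d: non-carriers without.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty: no information
    return n * (a * d - b * c) ** 2 / denom


def association_scan(genotypes, phenotype):
    """Test each variant (column) for association with a binary phenotype.

    genotypes: list of samples, each a list of 0/1 allele calls (haploid coding).
    phenotype: list of 0/1 labels, one per sample.
    Returns a list of (variant_index, chi2, p_value).
    """
    results = []
    for j in range(len(genotypes[0])):
        a = b = c = d = 0
        for row, y in zip(genotypes, phenotype):
            if row[j] == 1 and y == 1:
                a += 1
            elif row[j] == 1:
                b += 1
            elif y == 1:
                c += 1
            else:
                d += 1
        chi2 = chi_square_2x2(a, b, c, d)
        # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(chi2 / 2))
        results.append((j, chi2, p_value))
    return results
```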

Single-nucleotide polymorphisms

(SNPs). A base position at which two alleles exist and the less common allele has a frequency of >1% in the population.
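The frequency criterion in this definition can be applied per site as a simple filter. In the sketch below (hypothetical function names), each site is a list of base calls, one per haploid genome, with missing calls given as None; a site passes if exactly two alleles are observed and the rarer one exceeds the 1% threshold. Treating multiallelic sites as failing is a simplifying assumption of the sketch.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the rarer allele at a site; 0.0 unless exactly two alleles are seen.

    alleles: one base call per haploid genome (e.g. "A", "G"), None for a missing call.
    """
    counts = Counter(a for a in alleles if a is not None)
    if len(counts) != 2:
        return 0.0  # monomorphic or multiallelic: treated as not a biallelic SNP
    return min(counts.values()) / sum(counts.values())


def passes_snp_filter(alleles, threshold=0.01):
    """True if the site is biallelic and its minor allele frequency exceeds threshold."""
    return minor_allele_frequency(alleles) > threshold
```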

Heritability

The proportion of phenotypic variance that is due to inherited genetic variation.

Beta

The standardized regression coefficient derived from linear regression in genome-wide association studies of continuous traits. It is reported as an estimate of the effect size of a single-nucleotide polymorphism (SNP), and reflects the expected change in phenotype per copy of the reference allele of the SNP.
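In a haploid microorganism each genome carries zero or one copy of an allele, so the per-allele slope reduces to the difference in mean phenotype between carriers and non-carriers. The sketch below (hypothetical function names, one continuous trait, no covariates) computes that unstandardized slope and, to match the "standardized" wording of this definition, the coefficient after both variables are scaled to unit variance. GWAS software often reports the unstandardized slope, so it is worth checking which quantity a given study reports.

```python
import math


def per_allele_beta(genotype, phenotype):
    """Least-squares slope of phenotype on allele count (0/1 for haploid genomes)."""
    n = len(genotype)
    mx = sum(genotype) / n
    my = sum(phenotype) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotype, phenotype))
    sxx = sum((x - mx) ** 2 for x in genotype)
    return sxy / sxx


def standardized_beta(genotype, phenotype):
    """Slope after scaling genotype and phenotype to unit variance (equals Pearson r)."""
    n = len(genotype)
    mx = sum(genotype) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in genotype)
    syy = sum((y - my) ** 2 for y in phenotype)
    return per_allele_beta(genotype, phenotype) * math.sqrt(sxx / syy)
```

For genotypes `[0, 0, 1, 1]` and phenotypes `[1.0, 3.0, 4.0, 6.0]`, carriers average 5 and non-carriers 2, so the per-allele slope is 3.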

Odds ratio

(OR). The typical means of reporting the effect size of a single-nucleotide polymorphism in a case–control (or other binary phenotype) genome-wide association study. It is derived from logistic regression, and is the ratio of the odds of the phenotype when carrying the reference allele to the odds of the phenotype in the absence of the reference allele.
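With a single binary predictor, the odds ratio estimated by logistic regression equals the cross-product ratio of the 2×2 table of allele carriage against phenotype, so it can be computed directly. The sketch below (hypothetical function names) does so and adds an approximate 95% confidence interval from Woolf's standard error of the log odds ratio. It assumes no empty cells, since a zero count makes the odds ratio undefined or infinite.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table of allele carriage by phenotype.

    a: carriers with phenotype, b: carriers without,
    c: non-carriers with phenotype, d: non-carriers without.
    """
    return (a / b) / (c / d)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval using Woolf's standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)
```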

Main effects

The effects of a variant on the phenotype without accounting for any possible interactions with other variants or environmental factors.

Epistatic interactions

Interactions between variants at different locations in the genome.

Power

The probability that an analysis will reject the null hypothesis when the alternative hypothesis is true. It is influenced by numerous factors, such as the effect size and sample size.
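The dependence of power on effect size and sample size can be made concrete with a standard normal approximation: for a continuous trait, a variant explaining a fraction q² of phenotypic variance gives a 1-df test with a non-centrality parameter of roughly N q² / (1 - q²) for N samples. The sketch below uses that approximation with a hypothetical function name. The default significance level of 5 × 10⁻⁸ is the conventional human genome-wide threshold and is only an assumption here, since the appropriate microbial threshold depends on the number of variants tested and the correlations between them; the sketch also ignores population structure.

```python
from statistics import NormalDist


def approximate_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a 1-df association test for a continuous trait.

    n: sample size (number of genomes).
    variance_explained: fraction q2 of phenotypic variance due to the variant (0 <= q2 < 1).
    alpha: two-sided significance threshold (5e-8 is an assumed default).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z statistic under the alternative: square root of the non-centrality parameter
    shift = (n * variance_explained / (1 - variance_explained)) ** 0.5
    # Probability that |Z + shift| exceeds the critical value
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
```

Under this approximation, a variant explaining 1% of variance is very unlikely to reach the threshold with 1,000 genomes but almost certain to with 10,000.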

Linkage disequilibrium

(LD). Correlations between variants due to co-inheritance. LD is usually higher between variants that are closer together, and is broken down by recombination.
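Because a haploid microbial genome carries one allele per site, each genome is already a single haplotype, so LD between two biallelic sites can be computed directly from allele calls without phasing. The sketch below (hypothetical function name, 0/1 allele coding assumed) computes the LD coefficient D and from it the squared correlation r², a measure widely used in GWAS.

```python
def ld_r2(site_a, site_b):
    """Squared correlation r^2 between two biallelic sites across haploid genomes.

    Each site is a list of 0/1 alleles, one per genome, in the same sample order.
    """
    n = len(site_a)
    pa = sum(site_a) / n
    pb = sum(site_b) / n
    # Frequency of genomes carrying allele 1 at both sites
    pab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n
    d = pab - pa * pb  # the LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0
```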

Phred scores

A measure of the quality of sequencing at a given locus, specifically the confidence in the calling of alleles at that locus.
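Phred scores are defined on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong, so Q = 30 corresponds to a 1 in 1,000 error rate. The sketch below converts in both directions and decodes a FASTQ quality string; it assumes the Phred+33 ASCII encoding standard in modern FASTQ files, and the function names are hypothetical.

```python
import math


def phred_to_error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string, offset=33):
    """Convert a FASTQ quality line to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]
```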

K-mers

A sequence of bases of length k that, in microbial genome-wide association studies, can be used as the genetic variant tested for association with the phenotype.
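In a k-mer-based analysis, each distinct k-mer is treated like a variant whose presence or absence in a genome is tested for association, which can capture differences in gene content as well as point mutations. The sketch below builds that presence/absence matrix for a toy set of genomes. The names are hypothetical, and the sketch ignores reverse complements and sequencing errors, which real k-mer pipelines must handle.

```python
def kmers(sequence, k):
    """Set of all distinct substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_presence_matrix(genomes, k):
    """Presence (1) or absence (0) of every k-mer across a collection of genomes.

    genomes: dict mapping a genome name to its sequence.
    Returns (sorted list of k-mers, dict of genome name -> list of 0/1 calls).
    """
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    all_kmers = sorted(set().union(*per_genome.values()))
    matrix = {
        name: [1 if km in found else 0 for km in all_kmers]
        for name, found in per_genome.items()
    }
    return all_kmers, matrix
```

Each column of the resulting matrix can then be tested for association in the same way as a SNP.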

Superinfection

When an individual is infected with multiple strains of the same microorganism.

False positives

Variants, or any other predictors, that are identified as significantly associated with a phenotype but that are not causal. In the case of genome-wide association studies, this is usually due to confounding from population structure or insufficient quality control.

Clonal

The case in which reproduction produces genetically identical organisms and so, apart from new mutations, introduces no novel variants and no recombination.

Panmictic

A population in which clonal structure has been lost due to frequent recombination.

Genome-wide significance

The P value cut-off for declaring a variant significantly associated with a phenotype, accounting for the number of variants tested and the correlations between them.
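The simplest way to account for the number of tests is a Bonferroni correction, which divides the target family-wise error rate by the number of effectively independent tests; the conventional human threshold of 5 × 10⁻⁸ corresponds to 0.05 divided by about one million independent common-variant tests. The sketch below computes the Bonferroni and the slightly less conservative Šidák thresholds with hypothetical function names. The effective number of tests is taken as an input, because estimating it from the correlations between variants, which may differ markedly in clonal microorganisms, is a separate step not shown here.

```python
import math


def bonferroni_threshold(alpha, n_effective_tests):
    """Per-variant P value cut-off controlling the family-wise error rate at alpha."""
    return alpha / n_effective_tests


def sidak_threshold(alpha, n_effective_tests):
    """Cut-off for independent tests, 1 - (1 - alpha) ** (1 / n), computed stably."""
    return -math.expm1(math.log1p(-alpha) / n_effective_tests)
```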

Effect size

The proportion of variance in a phenotype predicted by a variant.
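For a variant with allele frequency p coded 0/1 in haploid genomes, the genotype variance is p(1 - p), so the variance explained is β²p(1 - p)/Var(Y), which for a single predictor equals the squared correlation r². (In diploid humans under Hardy–Weinberg equilibrium the corresponding genotype variance is 2p(1 - p).) The sketch below computes this from raw data; the function name is hypothetical, and it assumes a continuous phenotype, no covariates and a polymorphic variant.

```python
def variance_explained(genotype, phenotype):
    """Fraction of phenotypic variance explained by one haploid 0/1 variant.

    Uses beta^2 * p * (1 - p) / Var(Y), with p the allele frequency and beta the
    least-squares slope; the variant must be polymorphic (0 < p < 1).
    """
    n = len(genotype)
    p = sum(genotype) / n
    mean_y = sum(phenotype) / n
    var_y = sum((y - mean_y) ** 2 for y in phenotype) / n
    cov_xy = sum((x - p) * (y - mean_y) for x, y in zip(genotype, phenotype)) / n
    beta = cov_xy / (p * (1 - p))  # genotype variance is p(1 - p) for 0/1 coding
    return beta ** 2 * p * (1 - p) / var_y
```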

Polygenic methods

Statistical approaches that focus on the combined effects of many genetic variants rather than on the effect of any individual variant.
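One of the simplest polygenic methods from human GWAS is the polygenic score: each individual's genotype is summarized as the sum of allele calls weighted by effect sizes estimated in an independent discovery sample, often keeping only variants below a P value threshold. The sketch below applies that idea to haploid 0/1 genotypes. The names are hypothetical, and, as the Key Points note, how well polygenic methods perform in microorganisms has yet to be evaluated.

```python
def polygenic_score(genotypes, weights, p_values, p_threshold=1.0):
    """Weighted allele sum per sample, using variants with P value <= p_threshold.

    genotypes: list of samples, each a list of 0/1 allele calls (haploid coding).
    weights: per-variant effect sizes (betas or log odds ratios) from a discovery GWAS.
    p_values: per-variant discovery P values, in the same order as weights.
    """
    kept = [j for j, p in enumerate(p_values) if p <= p_threshold]
    return [sum(weights[j] * row[j] for j in kept) for row in genotypes]
```

The scores would then be tested against the phenotype in a target sample that is independent of the discovery sample, to avoid overfitting.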

Pleiotropic

Pleiotropic variants are those that have an effect on multiple distinct phenotypes.

About this article

Cite this article

Power, R., Parkhill, J. & de Oliveira, T. Microbial genome-wide association studies: lessons from human GWAS. Nat Rev Genet 18, 41–50 (2017). https://doi.org/10.1038/nrg.2016.132
