Table 2. Sample sizes required to detect causal genes of different MAF and effect sizes in a pan-genome GWAS.

From: PowerBacGWAS: a computational pipeline to perform power calculations for bacterial genome-wide association studies

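| Bacterial species | Strain collection | Gene frequency (%) | Sample sizes, in order of increasing effect size (odds ratio): Small (1.5), Moderate (5), Large (10), Very large (100) |
| --- | --- | --- | --- |
| Enterococcus faecium (pan-genome GWAS) | Species-wide (n = 1432) | 1 | |
| | | 2.5 | 1100 |
| | | 5 | 1000, 600, 500 |
| | | 10 | 500, 400, 200 |
| | | 25 | 200, 200, 100 |
| | Single-clade (n = 1531) | 0–1 | |
| | | 2.5 | 1400, 1000 |
| | | 5 | |
| | | 10 | 600, 400, 300 |
| | | 25 | 300, 200, 100 |
| Klebsiella pneumoniae (pan-genome GWAS) | Species-wide (n = 2628) | 1 | |
| | | 2.5 | 2500, 1600, 1200 |
| | | 5 | 1500, 1000, 700 |
| | | 10 | 600, 400, 300 |
| | | 25 | 500, 400, 200 |
| | Single-clade (n = 1193) | 0–1 | |
| | | 2.5 | 1000 |
| | | 5 | 900, 700, 500 |
| | | 10 | 500, 300, 200 |
| | | 25 | 300, 200, 100 |
| Mycobacterium tuberculosis (pan-genome GWAS) | Species-wide (n = 2655) | 1 | |
| | | 2.5 | 2000, 1300, 1000 |
| | | 5 | 1100, 700, 500 |
| | | 10 | 900, 500 |
| | | 25 | 300, 200, 100 |
| | Single-clade (n = 1139) | 0–1 | |
| | | 2.5 | 1000 |
| | | 5 | 900, 700, 500 |
| | | 10 | 500, 300, 200 |
| | | 25 | 300, 200, 100 |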

  1. MAF, minor allele frequency; -, not detectable with 80% power.
  2. Results of GWAS power calculations using the phenotype simulation approach (binary phenotype, full heritability assumed). The table shows the minimum sample sizes required to detect acquired genes of different effect sizes (expressed as odds ratios) and gene frequencies in a pan-genome GWAS with 80% power, in both species-wide and single-clade populations.
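
To make the idea of a minimum sample size for 80% power concrete, the following is a minimal simulation sketch in Python. It is not the PowerBacGWAS pipeline: it assumes independent isolates (no population structure), a single causal gene, a plain 1-df chi-square test, and an uncorrected significance threshold, whereas a real bacterial GWAS would use a much stricter genome-wide threshold and a model that accounts for relatedness. All function names and parameters (`chi2_pvalue_2x2`, `simulate_power`, `min_sample_size`, `baseline_risk`, `alpha`, `reps`) are hypothetical and chosen only for illustration.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a 1-df Pearson chi-square test on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_power(n, gene_freq, odds_ratio, baseline_risk=0.5,
                   alpha=0.05, reps=200, rng=None):
    """Fraction of simulated data sets in which the gene is detected.

    Assumption: alpha is an illustrative, uncorrected threshold.
    """
    rng = rng or random.Random(0)
    # Convert the odds ratio into the phenotype risk among gene carriers.
    base_odds = baseline_risk / (1.0 - baseline_risk)
    carrier_odds = base_odds * odds_ratio
    carrier_risk = carrier_odds / (1.0 + carrier_odds)
    hits = 0
    for _ in range(reps):
        a = b = c = d = 0  # carrier/case, carrier/control, non-carrier/case, non-carrier/control
        for _ in range(n):
            carrier = rng.random() < gene_freq
            risk = carrier_risk if carrier else baseline_risk
            case = rng.random() < risk
            if carrier and case:
                a += 1
            elif carrier:
                b += 1
            elif case:
                c += 1
            else:
                d += 1
        if chi2_pvalue_2x2(a, b, c, d) < alpha:
            hits += 1
    return hits / reps


def min_sample_size(gene_freq, odds_ratio, sizes, target_power=0.8, **kwargs):
    """Smallest size in `sizes` reaching the target power, or None if none does."""
    for n in sizes:
        if simulate_power(n, gene_freq, odds_ratio, **kwargs) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Gene at 25% frequency with a very large effect (odds ratio 100).
    print(min_sample_size(0.25, 100.0, sizes=range(100, 1100, 100)))
```

In this sketch a return value of `None` plays the role of the "-" entries in the table: no tested sample size reached the target power. Because the simplifying assumptions above differ from those of the published analysis, the numbers it produces are not expected to match the table.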