Table 2 Clusters of orthologous genes (COGs) with pan-genome-wide significant associations for invasive vs. non-invasive uropathogenic E. coli (UPEC).

From: Horizontally acquired papGII-containing pathogenicity islands underlie the emergence of invasive uropathogenic Escherichia coli lineages

| Gene | Gene product | Associated locus | Frequency in invasive UPEC isolates (n = 385) | Frequency in non-invasive UPEC isolates (n = 337) | P | Adjusted P^a | Odds ratio (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| papGII^b | P fimbriae adhesin variant PapGII | papGII | 246 (63.9%) | 52 (15.4%) | 6.0E−42 | 1.7E−37 | 9.7 (6.7–14.2) |
| papF | P fimbriae minor subunit PapF | papGII | 230 (59.7%) | 49 (14.5%) | 2.0E−37 | 5.6E−33 | 8.7 (6.0–12.8) |
| papJ | P fimbriae assembly protein PapJ | various pap | 285 (74.0%) | 128 (38.0%) | 8.4E−23 | 2.4E−18 | 4.6 (3.3–6.5) |
| papD | P fimbriae chaperone PapD | various pap | 290 (75.3%) | 133 (39.5%) | 1.0E−22 | 2.9E−18 | 4.7 (3.4–6.5) |
| papC | P fimbriae outer membrane usher PapC | various pap | 288 (74.8%) | 133 (39.5%) | 5.1E−22 | 1.4E−17 | 4.5 (3.3–6.3) |
| papH | P fimbriae minor subunit PapH | various pap | 284 (73.8%) | 132 (39.2%) | 3.6E−21 | 1.0E−16 | 4.4 (3.1–6.1) |
| iucB | Aerobactin biosynthesis protein IucB | iuc | 283 (73.5%) | 132 (39.2%) | 7.6E−21 | 2.2E−16 | 4.3 (3.1–6.0) |
| papK | P fimbriae minor subunit PapK | various pap | 274 (71.2%) | 126 (37.4%) | 5.3E−20 | 1.5E−15 | 4.1 (3.0–5.7) |
| iucC | Aerobactin biosynthesis protein IucC | iuc | 284 (73.8%) | 136 (40.4%) | 9.4E−20 | 2.7E−15 | 4.1 (3.0–5.8) |
| iucA | Aerobactin biosynthesis protein IucA | iuc | 283 (73.5%) | 135 (40.1%) | 1.0E−19 | 2.9E−15 | 4.1 (3.0–5.8) |

  1. Frequencies, P values (Fisher’s exact test), and odds ratios with 95% confidence intervals (CI) are shown for COGs with P values below the simulation-inferred significance threshold (raw P = 1.42 × 10^−18, or Bonferroni-adjusted P = 4.04 × 10^−14). Frequencies of papGII, which was occasionally fragmented in assemblies due to the presence of multiple papG alleles, were corrected using read-mapping-based identification (Supplementary Data 9). Other pap genes (papIBAHCDJKEF) were identified at lower significance levels than papGII because they co-occurred with other, non-significant papG alleles (papGI, papGIII, papGIV, papGV) or occurred as different variants. The iuc locus in E. coli consists of six genes: shiF, iucABCD, and iutA. The genes shiF and iutA each occurred as two alleles (shiF, shiFp; iutA1, iutA2; Supplementary Fig. 15) assigned to distinct COGs, resulting in decreased pan-genome-wide significance. Additional COGs with P values below the Bonferroni-adjusted P = 0.05 (corresponding to a raw P = 1.76 × 10^−6) are provided in Supplementary Data 4.
  2. ^a Bonferroni-adjusted for comparisons of 28,468 candidate COGs.
  3. ^b Frequencies corrected by read mapping; uncorrected BLASTp-based frequencies: 60.5% (invasive UPEC isolates) and 13.4% (non-invasive UPEC isolates).
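
The arithmetic behind the papGII row can be retraced with a minimal sketch. It takes the counts in the table (246 of 385 invasive and 52 of 337 non-invasive isolates) and the 28,468 candidate COGs from the footnote; the function names are illustrative and not taken from the paper. The odds ratio here is the simple cross-product estimate and the two-sided Fisher's exact P sums all tables no more probable than the observed one, which is an assumption about the convention: the software behind the published values is not stated in this table, so the computed P need not match the reported 6.0E−42 exactly.

```python
from fractions import Fraction
from math import comb

N_COGS = 28_468  # candidate COGs compared (table footnote a)

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for [[a, b], [c, d]] (assumed convention:
    sum the probabilities of all tables no more likely than the observed one)."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    tail = sum(w for w in (weight(k) for k in range(lo, hi + 1)) if w <= observed)
    return float(Fraction(tail, total))

def bonferroni(p, n_tests=N_COGS):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)

# papGII: rows are gene present / absent, columns are invasive / non-invasive.
present_inv, present_non = 246, 52
absent_inv, absent_non = 385 - 246, 337 - 52

papgii_or = odds_ratio(present_inv, present_non, absent_inv, absent_non)
papgii_p = fisher_two_sided(present_inv, present_non, absent_inv, absent_non)
papgii_adj_reported = bonferroni(6.0e-42)  # adjust the reported raw P

print(f"odds ratio = {papgii_or:.1f}")
print(f"Fisher P (this sketch) = {papgii_p:.1e}")
print(f"Bonferroni-adjusted reported P = {papgii_adj_reported:.1e}")
```

The same multiplication links the two thresholds quoted in the footnote: 1.42 × 10^−18 × 28,468 ≈ 4.04 × 10^−14, and 0.05 / 28,468 ≈ 1.76 × 10^−6.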