Fig. 3: Pangenome representations of E. coli and Shigella.
From: Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups

a Circular bar plot in which the length of each bar represents the total number of proteins in a single genome; genomes are grouped by phylogroup. Proteins belonging to the TOTcore97 genome are shown in green, additional proteins shared within each PHYcore97 genome are shown in blue, and accessory proteins are shown in purple. b Principal coordinate analysis (PCoA) plot of 135,983 protein families across 10,667 assembled genomes. Phylogroups are indicated by the same color scheme used in Figs. 1 and 2. c Core genome matrix of 6719 phylogroup core clusters across 10,667 assembled genomes. Clusters are ordered in three tiers: the species core is placed first; the phylogroup core genes follow, taking the phylogroups in the same order as Fig. 1 and sorting the clusters for each phylogroup by their overall abundance in the species; finally, the remaining clusters are placed by overall abundance. Core genes unique to a phylogroup, which do not appear in other phylogroups, are indicated by purple blocks.
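
The column ordering described for panel c can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `order_clusters`, the input layout, and the 0.97 presence threshold (suggested by the "97" in TOTcore97 and PHYcore97, but not defined in this caption) are all assumptions. A cluster that is core in several phylogroups is placed with the first one in the given order, which is also an assumption.

```python
from collections import defaultdict


def order_clusters(presence, phylogroup_of, phylogroup_order, threshold=0.97):
    """Return cluster IDs in a three-tier column order (hypothetical helper).

    presence         -- dict: cluster ID -> set of genome IDs carrying it
    phylogroup_of    -- dict: genome ID -> phylogroup label
    phylogroup_order -- list of phylogroup labels in display order
    threshold        -- ASSUMED fraction of genomes needed to call a cluster core
    """
    genomes_by_group = defaultdict(set)
    for genome, group in phylogroup_of.items():
        genomes_by_group[group].add(genome)
    all_genomes = set(phylogroup_of)

    def is_core(cluster, genomes):
        # Core = present in at least `threshold` of the given genomes.
        hits = len(presence[cluster] & genomes)
        return bool(genomes) and hits >= threshold * len(genomes)

    # Overall abundance in the species, most abundant first (ties by ID).
    by_abundance = sorted(presence, key=lambda c: (-len(presence[c]), c))

    # Tier 1: species-wide core.
    ordered = [c for c in by_abundance if is_core(c, all_genomes)]
    placed = set(ordered)

    # Tier 2: phylogroup cores, one phylogroup at a time in display order.
    for group in phylogroup_order:
        for c in by_abundance:
            if c not in placed and is_core(c, genomes_by_group[group]):
                ordered.append(c)
                placed.add(c)

    # Tier 3: everything else, by overall abundance.
    ordered += [c for c in by_abundance if c not in placed]
    return ordered


# Toy data: two phylogroups with two genomes each (entirely made up).
toy_groups = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
toy_presence = {
    "core": {"g1", "g2", "g3", "g4"},
    "a_core": {"g1", "g2"},
    "b_core": {"g1", "g3", "g4"},
    "rare": {"g4"},
}
print(order_clusters(toy_presence, toy_groups, ["A", "B"]))
```

With the toy data, the species-wide cluster comes first, then the phylogroup A core, then the phylogroup B core, and the rare cluster last. Reversing `phylogroup_order` swaps the two middle entries.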