Fig. 5: Patterns of intraspecies genomic diversity.

a Normalized pan-genome size as a function of the number of genomes per species, with a regression curve fitted for each phylum. The coefficient of determination (R²) is shown next to each curve, and shaded areas represent 95% confidence intervals. b Fraction of the core genome for each species, shown as a function of the number of genomes per species (left) and as a histogram (right); colors indicate phyla. The horizontal dashed line marks the median across all species. c Proportion of core and accessory genes (n = 436 species) classified by various annotation schemes, along with the percentage of genes lacking functional annotation. Boxes span the interquartile range (IQR), and whiskers extend to the most extreme values within 1.5 × IQR of the first and third quartiles. Core and accessory genes were compared with a two-tailed Wilcoxon rank-sum test; ***p < 0.001. d Functional categories assigned to core and accessory genes, with only statistically significant differences (FDR < 0.05) shown. e Relationship between the number of single-nucleotide variants (SNVs) and the number of participating species. f Number of SNVs in the species-specific microbial genomes.
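
The box-and-whisker convention described for panel c can be illustrated with a minimal sketch. The function name `box_whiskers` and the example values are hypothetical, and the quartile method (linear interpolation between order statistics) is an assumption, since the caption does not state which one was used.

```python
from statistics import quantiles


def box_whiskers(values):
    """Return (q1, q3, lower_whisker, upper_whisker) for a Tukey-style box plot."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation over the observed range.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences;
    # anything beyond them would be drawn as an outlier.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    return q1, q3, lower_whisker, upper_whisker


# Hypothetical values: 100 lies beyond the upper fence, so the whisker ends at 9.
print(box_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

With these made-up values the box spans 3.25 to 7.75 and the whiskers end at 1 and 9, while 100 falls outside the upper fence.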