Fig. 5: Dynamics of selection forces. | Nature Communications

From: Freshwater genome-reduced bacteria exhibit pervasive episodes of adaptive stasis

a, b Genome-wide quantification of selection forces on genes coding for SPs and CPs. a Percentages of genes under negative (red) and positive (orange) selection within the SPs and CPs. b Number of sites per gene under negative selection. The statistical difference within the predefined species categories (i.e., small and large species) (Large genomes SP n = 1512, Large genomes CP n = 9390; Small genomes SP n = 512, Small genomes CP n = 6805) was determined by pairwise Wilcoxon rank-sum tests (Large genomes W = 7,774,646, P value = 2.437e-09; Small genomes W = 2,090,672, P value = 2.976e-14).
c Length of SPs and CPs (aa). Average protein length was calculated for both SPs and CPs and compared within and between groups (Large genomes SP n = 9147, Large genomes CP n = 32,288; Small genomes SP n = 12,602, Small genomes CP n = 1652). Statistical significance within and between categories was determined by pairwise Wilcoxon rank-sum tests (P values < 2.2e-16 across all).
d Length of protein subcellular localization fragments for the SPs and CPs (Large genomes SP: Cytoplasmic n = 916, Non-cytoplasmic n = 7313, Transmembrane n = 918; CP: Cytoplasmic n = 6185, Non-cytoplasmic n = 21,129, Transmembrane n = 4974; Small genomes SP: Cytoplasmic n = 319, Non-cytoplasmic n = 1011, Transmembrane n = 322; CP: Cytoplasmic n = 2220, Non-cytoplasmic n = 8573, Transmembrane n = 1809). The overall difference was determined by pairwise Wilcoxon rank-sum tests (Small genomes Cytoplasmic P value = 6.353e-15, Small genomes Transmembrane P value = 0.0035, Small genomes Non-cytoplasmic P value = 7.805e-11; Large genomes Cytoplasmic P value = 7.585e-10, Large genomes Transmembrane P value = 0.0002306, Large genomes Non-cytoplasmic P value < 2.2e-16; Large and Small genomes Non-cytoplasmic P value < 2.2e-16).
In the boxplots, the central line marks the median. The box spans the interquartile range, from the first quartile to the third quartile, and contains the central 50% of the data. The whiskers extend from the box to the furthest data points not categorized as outliers and show the spread of the main body of the dataset. Raw data are provided as a Source Data file.
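For readers who want to see how the quantities in this legend are typically computed, the following is a minimal Python sketch, not the authors' analysis code. It makes two assumptions that the caption does not state: that W follows the common convention of the rank sum of the first sample minus its minimum possible value (as reported by R's `wilcox.test`), and that outliers are points lying more than 1.5 times the interquartile range beyond the box (the usual Tukey rule). The function names `rank_sum_w` and `box_summary` and the input values are hypothetical.

```python
from statistics import quantiles


def rank_sum_w(x, y):
    """W for sample x versus y: rank sum of x minus its minimum possible value.

    Tied values receive the average of the ranks they span (midranks).
    Assumed convention; the caption does not define W.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Advance j past the run of values tied with pooled[i].
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * in_x
        i = j
    n = len(x)
    return rank_sum_x - n * (n + 1) / 2


def box_summary(values, whisker=1.5):
    """Median, quartiles and whisker ends for one box.

    The whisker multiplier of 1.5 is an assumption (Tukey rule).
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme points that are not outliers.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Toy values for illustration only; these are not data from the figure.
toy_sp = [4, 5, 6]
toy_cp = [1, 2, 3]
w = rank_sum_w(toy_sp, toy_cp)
box = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Under this convention W ranges from 0 to the product of the two sample sizes, so a value near half that product indicates little separation between the groups.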
