Fig. 5: A scaling law links copy number and plasmid size across bacterial phylogeny.

a Scatter plots of plasmid size (x-axis) against PCN (y-axis) for the analysed genera. Each point represents the median PCN and plasmid size of one PTU, and error bars indicate the standard deviation from the median. Grey lines show ordinary least squares regressions, with shaded areas indicating 95% confidence intervals. The scaling factor (slope), k, is given in each panel. b Distribution of the total DNA load per plasmid (x-axis) relative to chromosome size, by genus (y-axis). The DNA load of each plasmid is calculated as plasmid size multiplied by copy number and expressed as a proportion of chromosome size. The point inside each box marks the median; the lower and upper hinges correspond to the 25th and 75th percentiles, and whiskers extend to at most 1.5 times the interquartile range from the hinges. Only Escherichia and Salmonella differ significantly from All (Kruskal–Wallis test followed by Dunn’s test for pairwise multiple comparisons, p < 10⁻⁴; effect size = 0.006). c Observed (x-axis) versus expected (y-axis) relative plasmid DNA load per cell (%). The y-axis gives the expected plasmid DNA load (%) in a cell carrying one plasmid (1n), two plasmids (2n), and so on; these expected values were calculated by multiplying the integers 1 to 9 by the median DNA load per plasmid (2.49%). Each green point represents a single genome, and black points mark the median of each category. Shading indicates interquartile ranges. The Pearson correlation coefficient and its p value are shown for the correlation between expected and observed plasmid DNA load.