Fig. 1: Computing plasmid copy numbers across Bacteria and Archaea reveals that copy number inversely correlates with plasmid length.

A The computational pipeline. Suppose a genome has one chromosome and one plasmid with three copies relative to the chromosome. Dividing the amount of sequencing data mapped to each replicon (chromosome or plasmid) by that replicon's length gives its sequencing coverage, and the ratio of plasmid coverage to chromosome coverage estimates the plasmid copy number per chromosome in the sample. Pseudoalignment is used to rapidly estimate plasmid copy numbers, and Probabilistic Iterative Read Assignment (PIRA) is used to incorporate reads that map to multiple replicons (e.g., the chromosome and plasmid) to further improve plasmid copy number estimates. B Plasmid length inversely correlates with plasmid copy number. Rescaling plasmid length by the length of the largest chromosome in the cell reveals a scaling law. A segmented regression (in maroon) was fit to these normalized data on a log-log plot. The segmented regression has a first slope of –0.880, a breakpoint at a log10 normalized length of –1.735 (1.84% of the length of the chromosome), a second slope of –0.125, and an adjusted R² of 0.690. The marginal density distributions of plasmid copy number and normalized length are displayed on the axes. C The inverse correlation holds across diverse environments. The ecological provenance of each replicon was annotated per the method described in Maddamsetti et al.³ (Methods). Source data are provided as a Source Data file.
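The coverage-ratio calculation described in panel A can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the read depths below are hypothetical, and the sketch assumes every read maps to exactly one replicon, so it omits the pseudoalignment and PIRA steps.

```python
def coverage(mapped_bases: float, replicon_length: int) -> float:
    """Mean sequencing depth: bases mapped to a replicon divided by its length."""
    if replicon_length <= 0:
        raise ValueError("replicon length must be positive")
    return mapped_bases / replicon_length


def plasmid_copy_number(
    plasmid_bases: float,
    plasmid_length: int,
    chromosome_bases: float,
    chromosome_length: int,
) -> float:
    """Estimate plasmid copies per chromosome as a ratio of coverages."""
    chromosome_cov = coverage(chromosome_bases, chromosome_length)
    if chromosome_cov == 0:
        raise ValueError("chromosome has no mapped data")
    return coverage(plasmid_bases, plasmid_length) / chromosome_cov


# Hypothetical example: a 4 Mb chromosome sequenced at 50x depth and
# a 10 kb plasmid sequenced at 150x depth.
chrom_len = 4_000_000
plasmid_len = 10_000
pcn = plasmid_copy_number(
    plasmid_bases=150 * plasmid_len,
    plasmid_length=plasmid_len,
    chromosome_bases=50 * chrom_len,
    chromosome_length=chrom_len,
)
print(pcn)  # 3.0, matching the three-copy plasmid in the panel A example
```

Normalizing by replicon length is what makes the two replicons comparable: a 4 Mb chromosome attracts far more reads than a 10 kb plasmid even when the plasmid is present in more copies.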