Extended Data Fig. 5: Assessing the Impact of Sequencing Depth on Diversity Results.

(a, b) Correlations between the Shannon index obtained from subsampled reads and that obtained from all reads. Each dot represents a soil metagenome sample, colored by biome type. Lines denote the values predicted by the linear mixed model, and shaded areas indicate the 95% confidence intervals. Numbers in the lower right corner are the Spearman correlation results. (c) Viral Shannon index across sequencing depths, with second-order fits for all samples (upper left) and for sample subsets grouped by biome (upper) and continent (bottom). Lines represent the values predicted by the linear mixed model, and shaded regions indicate the 95% confidence intervals. (d) Correlation between microbial diversity and the viral Shannon index normalized by sample read number (Shannon per Read Count). Each dot represents a soil metagenome sample, colored by biome type. (e) Median and interquartile range of Shannon per Read Count, with whiskers extending to ≤1.5× the interquartile range. Significant differences were assessed using one-way ANOVA with an LSD test; biomes with different lowercase letters differ significantly at α = 0.05 (n = 620 (Agricultural Land), n = 42 (Artificial Surfaces), n = 40 (Bare Land), n = 310 (Wetland), n = 293 (Grassland), n = 56 (Tundra), n = 417 (Forest), n = 21 (Shrubland)). (f) Correlation between microbial diversity and the viral Shannon index for samples with sequencing depths ≥100 million reads. (g) Median and interquartile range of the species-level viral Shannon index for samples with sequencing depths ≥100 million reads, with whiskers extending to ≤1.5× the interquartile range. Significant differences were assessed using one-way ANOVA with an LSD test; biomes with different lowercase letters differ significantly at α = 0.05 (n as in (e)).
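
For readers who want to reproduce the quantities named in this legend, the following is a minimal Python sketch, not the authors' pipeline. It assumes the Shannon index is computed with the natural logarithm from per-taxon read counts, that subsampling draws reads without replacement, and that "Shannon per Read Count" is the index divided by the sample's total read number. The function names `shannon_index`, `subsample_counts`, and `shannon_per_read_count` are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def subsample_counts(counts, depth, seed=0):
    """Draw `depth` reads without replacement; return per-taxon counts.

    Assumption: reads are expanded into an explicit pool, which is only
    practical for small illustrative inputs.
    """
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of available reads")
    tally = Counter(random.Random(seed).sample(pool, depth))
    return [tally.get(taxon, 0) for taxon in range(len(counts))]


def shannon_per_read_count(counts, total_reads):
    """Assumed normalization: Shannon index divided by total read number."""
    return shannon_index(counts) / total_reads
```

Comparing `shannon_index(subsample_counts(counts, depth))` with `shannon_index(counts)` across samples gives the kind of subsampled-versus-full comparison shown in panels (a, b), under the assumptions stated above.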