Extended Data Fig. 1: Bioinformatics pipelines for HRGM2 construction.

a, The overall pipeline for HRGM2 construction. b, The pipeline for genome quality control. c, CheckM2 assessment of 10,172 MQ genomes previously assessed by CheckM (completeness underestimated). The 6,281 genomes that were added to the near-complete (NC) genome set based on assessment using universal bacterial markers and CPR markers are marked with blue dots. The vertical and horizontal dashed lines indicate the 90% completeness and 5% contamination thresholds, respectively. The grey area contains the genomes that meet the NC genome criteria according to CheckM2. d, Count and proportion of genomes filtered in and out by universal bacterial markers and CPR markers that meet the NC criteria under CheckM2.