Fig. 4: HiFi-assembled cMAGs retrieve regions that are hard to assemble by short-read assembly.

a Genome bin retrieval rate and genome region annotation plot of a cMAG for the uncultured species OMN01_HAM_0050. The inner circle annotates rRNA and genomic island (GI) regions. The outer circle represents the retrieval rate by conspecific HRGM MAGs for every 1-kbp genome bin. b Proportion of genome bins by retrieval rate. c, d Retrieval rate of genome bins that include rRNAs (c) or genomic islands (d) (n = 393 for 5S rRNA genome bins, n = 662 for 16S rRNA genome bins, n = 1258 for 23S rRNA genome bins, n = 283,267 for non-rRNA genome bins, n = 30,330 for GI genome bins, and n = 255,146 for non-GI genome bins). Retrieval rates were compared by the two-sided Mann–Whitney U test (P-value = 1.83e−97 for 5S rRNA, P-value = 6.39e−286 for 16S rRNA, and P-value < 1e−300 for 23S rRNA genome bins compared to non-rRNA genome bins; P-value < 1e−300 for GI genome bins compared to non-GI genome bins). e Proportion of genomic island proteins with an eggNOG ortholog according to the cultured status of the host genome (n = 63 cultured genomes, n = 38 uncultured genomes). The proportions were compared by the two-sided Mann–Whitney U test (P-value = 0.0009). f Read length of long-read human fecal metagenomic samples and length of entire genomic islands found among 102 cMAGs (n = 2,017,709 reads for KR001, n = 1,792,146 reads for m64011_210224_000525, n = 1,687,238 reads for m64011_210225_094432, n = 1,904,159 reads for m64011_210226_210143, n = 1,767,289 reads for m64011_210228_064650, n = 4,487,361 reads for SRR8427256, n = 3,404,101 reads for SRR8427257, n = 4,003,722 reads for SRR8427258, n = 11,016,028 reads for SRR9847854, n = 10,261,578 reads for SRR9847857). The upper and lower whiskers of the boxplots represent the 90th and 10th percentiles of the data, respectively. The upper and lower bounds of the box represent the 75th and 25th percentiles, respectively. The center bar represents the median. All outliers are shown in (c–e) and omitted in (f). (***P-value < 0.001).
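The boxplot convention and the two-sided Mann–Whitney U test described in this legend can be illustrated with a minimal Python sketch. The legend does not state which software was used, so everything here is an assumption for illustration: the function names `percentile`, `box_summary`, and `mann_whitney_u` are hypothetical, the percentiles use linear interpolation, and the P-value uses a normal approximation with tie correction and no continuity correction, which may differ from the authors' settings.

```python
from math import erfc, sqrt


def percentile(values, q):
    """Percentile q (0-100) of a non-empty sequence, by linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def box_summary(values):
    """Box statistics in the legend's convention: whiskers at the 10th and
    90th percentiles, box at the 25th and 75th, center bar at the median."""
    levels = [("whisker_low", 10), ("box_low", 25), ("median", 50),
              ("box_high", 75), ("whisker_high", 90)]
    return {name: percentile(values, q) for name, q in levels}


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (illustrative assumption: normal
    approximation with tie correction, no continuity correction).
    Returns (U statistic of x, two-sided P-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at index i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2.0))
```

For example, `box_summary(retrieval_rates)` on a list of per-bin retrieval rates would give the five values drawn for one box, and `mann_whitney_u(rrna_bins, non_rrna_bins)` would compare two groups of bins; both variable names are placeholders rather than data from the study.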