Fig. 2: Impact of repeats and heterozygosity on assembly quality. | Nature

From: Towards complete and error-free genome assemblies of all vertebrate species

a, Correlation between scaffold NG50 and genome size of the curated assemblies. b, Nonlinear correlation between contig NG50 and repeat content, before and after curation. c, Correlation between number of gaps per Gb assembled and repeat content. d, Correlation between primary assembly size relative to estimated genome size (y axis) and genome heterozygosity (x axis), before and after purging of false duplications. Assembly sizes above 100% indicate the presence of false duplications and those below 100% indicate collapsed repeats. e, f, Correlations between genome heterozygosity and genome duplication rate, estimated using k-mers (ref. 23) (e) and the conserved BUSCO vertebrate gene set (f), before and after purging of false duplications. g, h, As in e, f, but with whole-genome repeat content in place of heterozygosity, before and after purging of false duplications. Genome size, heterozygosity, and repeat content were estimated from 31-mer counts using GenomeScope (ref. 71), except for the channel bull blenny, for which the estimates were unreliable (see Methods). Repeat content was measured by modelling the k-mer multiplicity from sequencing reads. Sequence duplication rates were estimated with Merqury (ref. 23) using 21-mers. *P < 0.05; **P < 0.01; ***P < 0.001 for the correlation coefficient; P values and adjusted r² are from F-statistics. n = 17 assemblies of 16 species.
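
As a rough illustration of two quantities plotted in this figure, the sketch below computes NG50 (panels a, b) and assembly size as a percentage of estimated genome size (panel d). It uses the standard definition of NG50 and is not the authors' pipeline; the function names and the toy lengths are hypothetical and do not come from the paper.

```python
def ng50(lengths, genome_size):
    """Return NG50 for a list of contig or scaffold lengths.

    NG50 is the length L such that sequences of length >= L together
    cover at least half of the *estimated genome size* (not half of the
    assembly size, which would give N50). Returns None when the whole
    assembly covers less than half of the genome.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


def relative_assembly_size(assembly_size, genome_size):
    """Assembly size as a percentage of the estimated genome size."""
    return 100.0 * assembly_size / genome_size


def interpret_relative_size(pct):
    """Apply the reading given in the caption for panel d."""
    if pct > 100.0:
        return "above 100%: suggests false duplications"
    if pct < 100.0:
        return "below 100%: suggests collapsed repeats"
    return "matches the estimated genome size"


if __name__ == "__main__":
    # Hypothetical toy values, in arbitrary units (not data from the paper).
    toy_lengths = [50, 40, 30, 20, 10]
    toy_genome_size = 200

    pct = relative_assembly_size(sum(toy_lengths), toy_genome_size)
    print("NG50:", ng50(toy_lengths, toy_genome_size))
    print("Relative size (%):", pct)
    print(interpret_relative_size(pct))
```

In practice the genome size would come from a k-mer based estimate such as the GenomeScope estimate mentioned in the caption, rather than from a hard-coded value.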
