Fig. 1: Quality control metrics and description of the S. mediterranea genome and annotation.

a Hi-C contact map of the reads used for scaffolding on S3h1 (upper right) and S3h2 (lower left), showing high contact intensity in red and low contact intensity in blue. b Results of a Merqury analysis using four Illumina shotgun datasets not used for the assembly. c Dotplot showing Chromosome 4 of a whole-genome alignment, inferred with minimap2, between the previously scaffolded assembly (schMedS2, y-axis) and the assembly from this study (schMedS3h1, x-axis). Blue lines indicate scaffold gaps in schMedS2 and red lines indicate scaffold gaps in schMedS3h1. Numbered red bars indicate alignment gaps > 1 Mb, which contain highly repetitive satellite DNA absent from the previous assembly. d Self-similarity heatmap of the numbered gaps in c, calculated with StainedGlass; the high self-similarity is typical of centromeric or pericentromeric repeats. e Comparison between the two pseudohaplotypes of schMedS3. The chord diagram in the center indicates syntenic regions (grey) and inversions (yellow) between the haplotypes. The black ribbons within the large inversion on Chromosome 1 indicate a smaller inversion nested within it. Density plots in the outer three circles show the distribution of transposable elements (TE), genes, and heterozygosity. f Representation of the hybrid gene annotation workflow. g–i Completeness comparison of benchmarked annotations using BUSCO (g), the 1054 S. mediterranea transcripts deposited in GenBank (h), and the mappability of 13 publicly available RNA-seq datasets (i). Box plots show the interquartile range (IQR), with whiskers extending to 1.5 times the IQR. j ORF integrity comparison of benchmarked annotations by manual inspection of 96 gene models for the indicated error categories. The scores reflect only the best-predicted transcript per locus per benchmarked annotation. g–j Benchmarked gene annotations: S3h1, S3h2, S3BH: this study; dd_v1: the non-stranded dd_Smes_v1 assembly of the sexual strain of S. mediterranea (ref. 42); dd_v6: the dd_Smed_v6 assembly of the asexual strain of S. mediterranea (ref. 42); SMESG: the gene prediction based on the previous dd_Smes_g4 S. mediterranea genome assembly (ref. 42); Oxford_v1: a composite annotation of refs. 38,45 and SMESG. Source data are provided as a Source Data file.