Fig. 2

Genome assembly assessment and comparison. (a) k-mer frequency histogram generated by findGSE with a k-mer size of 21. The gray line shows the observed k-mer frequency, and the teal line shows the fitted model for the heterozygous k-mer peak. The blue line shows the fitted model without k-mer correction, and the red line shows the fitted model with k-mer correction, which is used to estimate the genome size. (b) Left panel: Hi-C contact map of the genome assembly, showing the proximity of genomic regions in three-dimensional space laid out as a contiguous linear arrangement. Each cell in the contact map represents sequencing data that confirm the linkage between two specific regions. Gray lines separate scaffolds, and the density of the map indicates the degree of fragmentation, with higher density indicating more fragmentation. Right panel: size distribution, in megabases (Mb), of the scaffolds in the genome. (c) Comparison of the rfMv2 genome from this study with a published RPW genome (GCA_014462685), plotting cumulative sequence length (y-axis) against the increasing number of scaffolds (x-axis). (d) Grouped bar charts comparing the rfMv2 genome from this study with the published genome (GCA_014462685), based on BUSCO analyses with the insecta_odb10 gene set. Bar height represents the percentage of genes found in each assembly relative to the total gene set. The x-axis of each chart is labeled with initials for the BUSCO status: M, missing genes; F, fragmented genes; CD, complete and duplicated genes; CS, complete and single-copy genes.
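
As an illustration of how a cumulative-length curve like the one in panel (c) can be derived, the following is a minimal sketch, not the procedure used in this study. It assumes scaffolds are ordered from largest to smallest before accumulation (the caption does not state the ordering), and the function name `cumulative_lengths` and the toy lengths are hypothetical.

```python
from itertools import accumulate


def cumulative_lengths(scaffold_lengths):
    """Return the running total of scaffold lengths.

    Assumption: scaffolds are sorted from largest to smallest first,
    a common convention for assembly contiguity curves.
    """
    ordered = sorted(scaffold_lengths, reverse=True)
    return list(accumulate(ordered))


# Toy scaffold lengths in Mb (made up for illustration, not real data).
toy_lengths_mb = [5, 40, 10, 25, 20]
curve = cumulative_lengths(toy_lengths_mb)

# x = number of scaffolds included, y = cumulative sequence length.
for n_scaffolds, total_mb in enumerate(curve, start=1):
    print(n_scaffolds, total_mb)
```

Under this convention, a more contiguous assembly reaches its total length with fewer scaffolds, so its curve rises more steeply.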