Extended Data Fig. 4: The cannabis pangenome and pangenes are high quality.
From: Domesticated cannabinoid synthases amid a wild mosaic cannabis pangenome

A) Benchmarking Universal Single-Copy Orthologs (BUSCO)19 scores for both the genome assemblies and the gene predictions indicate that both are high quality and complete. Gene models were predicted with TSEBRA from homology evidence and from expression data for multiple tissues, including flowers, leaves, and roots (Supplementary Table 2). BUSCO19 evaluation showed that the gene models were around 95% complete on average for all assembly types. The scaffolded genomes contained 35,000 genes on average, whereas in the contig-level genomes the number of genes scaled with the level of duplication detected by BUSCO (Fig. 1e). B) Number of predicted genes compared with the number of duplicated BUSCO genes, suggesting that the CCS and CLR contig-based assemblies retained substantial duplicated sequence from uncollapsed haplotypes. These haplotypes were deliberately not removed, so that their variation would be retained for downstream analyses. C) Scatter plot of chromosome length (x axis) against gene count per chromosome (y axis) for the nine autosomes and both sex chromosomes.