Extended Data Fig. 1: R570 genome assembly overview and analysis.
From: The complex polyploid genome architecture of sugarcane

A) Simplified genome assembly pipeline for sugarcane R570. The pipeline combines multiple sequencing technologies (HiFi contigs, optical map, genetic map, single chromosome libraries, Hi-C, Sorghum synteny), leveraging the strengths of each for phasing homeologous chromosomes, while using manual inspection and iterative steps to overcome each technology’s weaknesses. See ‘Supplemental Data’ for a full explanation and rationale for each step. B) Hi-C read heatmap for R570. Chromosomes are clustered based on homology, with Hi-C reads (~56X coverage) aligned to the finalized version of the assembly. C) Alignment of primary and alternate chromosomes. During genome construction and finalization, near-perfect duplicate contigs were discovered and binned into an alternate assembly to improve the utility of the genome (see Supplemental Data). When possible, these contigs were ordered and oriented into chromosomes based on their closest unique alignment in the primary assembly. For example, Chromosome 6E and its anchored alternate were aligned using nucmer (v4.0; -l 100 --maxmatch -b 400). Each dot represents a 1-1 alignment between the two sequences that is longer than 15 kilobases. Each point is coloured based on alignment percent identity. D) Haplotype collapse summary for R570. To quantify regions of haplotype collapse within the R570 genome, PacBio HiFi reads were aligned to the assembly, and read depth was used to determine haplotype copy number. 0X unique coverage regions represent genomic blocks where reads could not uniquely map between the primary and alternate assembly. Counted bases represent genome sequences where depth could be reliably calculated (0-4X). Represented bases account for collapsed bases in the assembly (for example, 1 Mb of 3X coverage = 3 Mb of represented sequence). E) Simplex marker densities in the sugarcane R570 genome.
Simplex markers (80 bp) from the R570 genetic map (obtained by testing for a 3:1 segregation pattern in 96 genotyped S1 progeny) were searched against the R570 genome assembly, retaining only exact, single-copy match locations. Position densities were then visualized by calculating the percentage of matched bases per 10 kb sliding window (1 kb step length). Lines underneath each chromosome correspond to progenitor and haplotype collapse block assignments.
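The arithmetic behind panels D and E can be illustrated with a minimal sketch. This is not the authors' code: the function names, the input layout (a mapping from depth to counted bases, and a list of 0-based match start positions) and the handling of the chromosome end are assumptions made for illustration. Only the 80 bp marker length, the 10 kb window, the 1 kb step and the "1 Mb of 3X = 3 Mb" example come from the legend.

```python
def represented_bases(counted_by_depth):
    """Sum depth * counted bases over all depth classes (panel D).

    counted_by_depth: hypothetical mapping {copy number in X: counted bases}.
    Assumption: 0X blocks contribute nothing under this simple formula.
    """
    return sum(depth * bases for depth, bases in counted_by_depth.items())


def window_density(match_starts, chrom_len, marker_len=80,
                   window=10_000, step=1_000):
    """Percentage of bases covered by marker matches per sliding window (panel E).

    match_starts: hypothetical list of 0-based start positions of exact,
    single-copy marker matches on one chromosome.
    Returns a list of (window_start, percent_matched) tuples.
    """
    if chrom_len <= 0:
        return []

    # Mark every base covered by at least one marker match.
    covered = [0] * chrom_len
    for start in match_starts:
        for pos in range(max(start, 0), min(start + marker_len, chrom_len)):
            covered[pos] = 1

    # Prefix sums let each window be evaluated in constant time.
    prefix = [0]
    for flag in covered:
        prefix.append(prefix[-1] + flag)

    densities = []
    # Assumption: only full-length windows are reported, except when the
    # chromosome is shorter than one window.
    last_start = max(chrom_len - window, 0)
    for w_start in range(0, last_start + 1, step):
        w_end = min(w_start + window, chrom_len)
        matched = prefix[w_end] - prefix[w_start]
        densities.append((w_start, 100.0 * matched / (w_end - w_start)))
    return densities


if __name__ == "__main__":
    # The legend's own example: 1 Mb at 3X represents 3 Mb of sequence.
    print(represented_bases({3: 1_000_000}))
    # Two toy marker matches on a 20 kb toy chromosome.
    for w_start, pct in window_density([0, 5_000], 20_000):
        print(w_start, round(pct, 2))
```

The prefix-sum array is a convenience so that overlapping windows do not recount the same bases. A per-window loop would give the same percentages.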