Supplementary Figure 5: SDA results for the CHM13 assembly.
From: Long-read sequence and assembly of segmental duplications

a) SDA analysis of the CHM13 FALCON assembly generates 1,848 PSV clusters. b) Cumulative distribution of the assembled sequences by percent identity to their best match in the reference. There are 40.4 Mb of diverged assembly (gray) and 43.0 Mb that map to the reference at high identity (black). c) A density plot of SDs plotted by length and percent identity. d) Copy number difference (CND) between CHM13 and the reference genome (CHM13 copy number minus reference genome copy number) comparing n = 186 SD regions that match (>99.8% identity) versus n = 374 diverged SD regions (<99.8% identity). The mean CND of the matched sequence is 1.61 and the mean CND of the diverged sequence is 5.98, indicating that the diverged sequences are much more likely to represent additional duplicate copies that are unrepresented in the reference genome (GRCh38) (two-sided Mann-Whitney test; P = 2.77 × 10⁻⁵). The boxes indicate the range between the first and third quartiles, with the bold line specifying the median. The whiskers show the minimum and maximum within 1.5 times the interquartile range extending from the first and third quartiles. (See Fig. 2 for more details.)
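
The CND definition and the box plot conventions described for panel d can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the toy copy numbers, and the choice of quartile method ("inclusive") are assumptions made for illustration, and plotting software may compute quartiles slightly differently.

```python
from statistics import mean, quantiles


def copy_number_difference(sample_cn, reference_cn):
    """CND per region: sample copy number minus reference copy number."""
    return [s - r for s, r in zip(sample_cn, reference_cn)]


def box_stats(values):
    """Quartiles, median, and whisker ends for a box plot.

    Whiskers are the most extreme data points that lie within
    1.5 x IQR of the first and third quartiles.
    """
    # Assumption: "inclusive" quartiles; other tools may use a different rule.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Hypothetical copy numbers for six SD regions (not data from the paper).
chm13_cn = [4, 6, 3, 10, 5, 30]
grch38_cn = [2, 2, 2, 4, 2, 2]

cnd = copy_number_difference(chm13_cn, grch38_cn)
stats = box_stats(cnd)
mean_cnd = mean(cnd)
```

In this toy example the region with a CND of 28 falls outside the upper fence, so it would be drawn as an outlier rather than extending the whisker. A group comparison such as the two-sided Mann-Whitney test reported in the caption would be applied to two such lists of CND values (matched versus diverged regions).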