Supplementary Figure 6: SDA results for the NA19240 (African Yoruban) assembly.
From: Long-read sequence and assembly of segmental duplications

a) SDA analysis of the NA19240 FALCON assembly generates 2,136 PSV clusters. b) Cumulative distribution of the assemblies by percent identity to their best match in the reference. There are 46.1 Mb of diverged assembly (gray) and 41.0 Mb that map to the reference at high identity (black). c) Density plot of SDs by length and percent identity. d) CND between NA19240 and the reference genome (NA19240 copy number − reference genome copy number), comparing n = 177 SD regions that match the reference (>99.8% identity) with n = 384 diverged SD regions (<99.8% identity). The mean CND of the matched sequence is 4.11 and the mean CND of the diverged sequence is 10.87, indicating that the diverged sequences are much more likely to represent additional duplicate copies that are unrepresented in the reference genome (GRCh38) (two-sided Mann-Whitney test; P = 1.88 × 10⁻⁴). The boxes indicate the range between the first and third quartiles, with the bold line specifying the median. The whiskers extend from the first and third quartiles to the minimum and maximum values within 1.5 times the interquartile range. (See Fig. 2 for more details.)
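
The CND definition and the box-and-whisker convention described for panel d can be illustrated with a minimal sketch. The function names and the copy numbers below are hypothetical, invented for illustration; they are not the authors' code or data. The quartile interpolation method (`inclusive`) is also an assumption, since the caption does not state which one was used.

```python
from statistics import quantiles


def copy_number_difference(sample_cn, reference_cn):
    """CND as defined in the caption: sample copy number minus reference copy number."""
    return sample_cn - reference_cn


def box_stats(values):
    """Quartiles plus whisker ends (most extreme values within 1.5 x IQR of the box)."""
    # Assumption: linear interpolation between data points ("inclusive" method).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Made-up (sample copy number, reference copy number) pairs, one per SD region.
pairs = [(4, 2), (6, 2), (3, 2), (10, 4), (40, 2)]
cnd = [copy_number_difference(s, r) for s, r in pairs]
stats = box_stats(cnd)
print(cnd)
print(stats)
```

In this toy input the last region lies beyond the upper fence, so it would be drawn as an outlier point rather than extending the whisker.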