Fig. 2: Benchmarking of SAVANA against existing algorithms using replicates.

a, A schematic representation of the replicate analysis strategy implemented to benchmark the performance of somatic SV detection algorithms. Created with BioRender.com. b, The distribution of the number of somatic SVs detected by each algorithm stratified based on whether they were detected in one (red) or both (green) replicates. Each point represents a tumor sample (n = 64) that has been split into two replicates. c, A comparison of the fraction of somatic SVs detected in both replicates by each algorithm. The bars report the result across all samples. The error bars report the 95% confidence interval. Significance with respect to SAVANA was assessed using the two-sided Student’s t-test (***P < 0.0001). The P values for SAVANA compared with all other algorithms were P < 2.2 × 10⁻¹⁶. d, The number of somatic SVs detected in both replicates divided by the total number of somatic SVs detected as a function of allele fraction. The results for the entire cohort are shown (n = 64). The size of the dots represents the number of somatic SVs in each group. Only algorithms that report the allele fraction or information that can be used to calculate the allele fraction were included in this analysis. e, A comparison of the count of somatic SVs detected in one (red) or both (green) replicates stratified by SV type. Note the different x-axis scales used to reflect the number of SVs reported by each algorithm. c–e show the aggregated results for the 64 samples with the highest sequencing depth. f, The fraction of deletions in replicates mapping to microsatellite regions. Each point represents a tumor sample (n = 64) that has been split into two replicates. The significance was assessed using the two-sided Wilcoxon’s rank test (****P < 0.00001).
The P values for the comparisons of SAVANA with SVIM, NanomonSV, cuteSV, Sniffles2, SVision-pro and Severus were P < 2.2 × 10⁻¹⁶, P = 4.9 × 10⁻¹⁰, P < 2.2 × 10⁻¹⁶, P < 2.2 × 10⁻¹⁶, P = 5.1 × 10⁻¹³ and P < 2.2 × 10⁻¹⁶, respectively. g, A haplotype consistency analysis of SV-supporting reads using read-backed phasing across the entire cohort. Each dot represents an SV. The x and y axes report the number of sequencing reads supporting each SV that are assigned to either parental allele (arbitrarily labeled as ‘allele 1’ and ‘allele 2’, respectively). h, The same data shown in g depicted in a stacked barplot format. In g and h, the SVs supported by sequencing reads assigned to only one parental allele are colored in green. The SVs with significant read support from both parental alleles are shown in red, and those with inconclusive results are shown in blue. The box plots in b and f show the median, first and third quartiles (boxes) and the whiskers encompass observations within 1.5× the interquartile range from the first and third quartiles.
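
The ratio plotted in c and d (SVs detected in both replicates divided by all SVs detected) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each replicate's calls have already been reduced to a set of comparable SV identifiers, and that the denominator is the union of calls from the two replicates. The function name `replicate_concordance` and the identifiers are hypothetical.

```python
def replicate_concordance(rep1, rep2):
    """Fraction of SVs seen in both replicates among all SVs seen in either.

    rep1, rep2: sets of hashable SV identifiers (assumed to be already
    matched between replicates; real breakpoint matching would need a
    positional tolerance, which is not modelled here).
    """
    total = rep1 | rep2   # SVs detected in at least one replicate
    both = rep1 & rep2    # SVs detected in both replicates
    if not total:
        return float("nan")  # undefined when no SVs were detected
    return len(both) / len(total)


# Toy example with hypothetical identifiers: 2 shared SVs out of 4 in total.
rep_a = {"sv1", "sv2", "sv3"}
rep_b = {"sv2", "sv3", "sv4"}
print(replicate_concordance(rep_a, rep_b))  # 0.5
```

For a per-allele-fraction view as in d, the same function could be applied to the subsets of SVs falling in each allele-fraction bin.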