Extended Data Fig. 3: Benchmarking CN estimation and breakend detection in cancer genomes.
From: Most large structural variants in cancer genomes can be detected without long reads

(a) Benchmarking pipeline for simulating SRS data and testing the accuracy of inferred CN profiles. Junctions are simulated to rearrange parental haplotypes of the phased NA12878 Platinum genome, yielding a rearranged and copy-altered cancer genome (top). Simulated haplotypes are sampled to generate 1 Kbp bins of total read depth, allelic counts at heterozygous SNPs, and a set of junctions according to a sampled purity and ploidy value (second from top). These data are then analyzed by the CN algorithms shown in the bottom panel, and results are compared to the ground truth allelic and total CN profile (bottom). See Methods for additional details.
(b) Example of a simulated locus, along with the total CN inferred by each algorithm (top panels), the simulated ground truth CN profile (second from bottom), and purity/ploidy-transformed tumor read depth (bottom, see Methods).
(c) Example of genome graphs inferred by JaBbA v1 and JaBbA v0.1 for another locus in one of the 500 simulated samples. Tracks from top to bottom show the genome graph inferred by JaBbA v1, the dryclean foreground, the graph inferred by JaBbA v0.1, the tumor/normal read depth ratio, the ground truth CN, and the location of CN-unmappable ranges masked in JaBbA v1. Both the dryclean foreground and the tumor/normal ratio are purity/ploidy transformed (see Methods). Genome graph legend same as for panel b.
(d) Comparison of graphs inferred by JaBbA v1 and JaBbA v0.1 for another simulated tumor sample and locus. Tracks from top to bottom show the graph inferred by JaBbA v1, the graph inferred by JaBbA v0.1, read depth, and ground truth CN. Both the dryclean foreground and the tumor/normal ratio are purity/ploidy transformed, that is, in CN units (see Methods). Genome graph legend same as for panel b.
(e) Precision and recall for detecting, as loose ends, SVs missing from the junction input to JaBbA (n = 500 simulated tumors). JaBbA v1 and v0.1 are compared, stratified by inferred sample purity. Points show median and error bars show IQR.
(f) Overall precision and recall of SV breakend calls by JaBbA and other CN inference algorithms for all simulated samples, including those with low (<0.5) purity (n = 500 simulated tumors). Points show median and error bars show IQR.
(g) Root mean square error for total (top) and allele-specific (bottom) CN by JaBbA and other CN inference algorithms (n = 500 simulated tumors). Box plots: line (median), body (IQR), whiskers (1.5 × IQR).
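
As a rough illustration of two quantities named in this legend, the Python sketch below converts relative read depth into CN units and computes a root mean square error against a ground truth profile. The Methods are not reproduced here, so the transformation shown is an assumption: it uses the common linear mixture model in which tumor cells (fraction = purity) are mixed with diploid normal cells. The function names `depth_to_cn` and `rmse` are hypothetical, and the RMSE is unweighted across bins, which may differ from the weighting used in the paper.

```python
import math


def depth_to_cn(rel_depth, purity, ploidy, normal_cn=2.0):
    """Convert relative read depth to tumor CN units.

    ASSUMPTION: linear tumor/normal mixture model, with rel_depth scaled
    so that 1.0 equals the genome-wide mean depth of the sample.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies per cell across the tumor/normal mixture.
    mean_copies = purity * ploidy + (1.0 - purity) * normal_cn
    # Subtract the normal-cell contribution, then rescale by purity.
    return (rel_depth * mean_copies - (1.0 - purity) * normal_cn) / purity


def rmse(inferred, truth):
    """Unweighted root mean square error between two per-bin CN profiles."""
    if len(inferred) != len(truth) or len(truth) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = [(x - t) ** 2 for x, t in zip(inferred, truth)]
    return math.sqrt(sum(squared) / len(squared))
```

Under this model, a bin at the genome-wide mean depth in a sample with purity 0.5 and ploidy 3 maps back to a CN of 3, and lower purity amplifies depth noise because the result is divided by the purity.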