Extended Data Fig. 10: Evaluation of barcode replacement in IronThrone GoT processing.
From: Somatic mutations and cell identity linked by Genotyping of Transcriptomes

a, Fraction of reads with cell barcodes that do not perfectly match the whitelisted cell barcodes from the species-mixing experiment. ‘>Hamm-1’ denotes filtered reads with barcodes at a Hamming distance greater than one from the whitelisted barcodes (n = 139,422 reads). ‘Not significant’ denotes filtered reads with barcodes that are one Hamming distance away from a whitelisted barcode but have a low probability of originating from that barcode (posterior probability < 0.99, n = 14,830 reads). ‘Replaced’ denotes rescued reads with barcodes that are one Hamming distance away from a whitelisted candidate barcode and meet the significance threshold (posterior probability ≥ 0.99, n = 224,085 reads). b, c, Number of supporting reads per candidate barcode and base quality at the differing base positions (b) and across base positions (c). Two-sided Wilcoxon rank-sum tests were applied to compare not significant (n = 14,830) and replaced (n = 224,085) barcodes. d, Correlation between the number of supporting reads per candidate barcode and median base quality at the differing base (two-tailed Pearson’s correlation, F-test). e, Distribution of prior and posterior probabilities from not significant (n = 14,830) and replaced (n = 224,085) barcodes. The dashed red line represents the posterior probability cut-off (0.99). f–h, To further evaluate the efficiency of barcode replacement, we generated synthetic cell barcodes by randomly changing one base in whitelisted cell barcodes (n = 100 iterations). f, Percentage of reads with cell barcodes that are not identical to the whitelisted cell barcodes (n = 100 iterations). Percentages of replaced reads were 99.1% ± 0.001% (median ± absolute deviation) in simulations with 1 base changed, 1.1% ± 0.002% in simulations with 2 bases changed and 0.7% ± 0.001% in simulations with 3 bases changed. g, Determination of whether replaced cell barcodes are identical to the original cell barcodes. In simulations with 1 base changed, the percentage of reads with replaced cell barcodes that were identical to the original cell barcodes was 100% ± 0% (median ± absolute deviation of 100 iterations). h, Estimation of prediction power for classifying cell barcodes from simulations with 1 base changed (n = 100 iterations).
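
The three read categories in a and the synthetic barcodes in f–h can be illustrated with a minimal Python sketch. It is not the IronThrone implementation. It assumes that the prior is the pseudocounted share of reads supporting each whitelisted candidate, that the likelihood is the Phred error probability at the differing base, and that the posterior is normalized over all candidates one Hamming distance away. The names `classify_barcode`, `mutate_barcode` and `whitelist_counts` are hypothetical.

```python
import random

BASES = "ACGT"


def hamming_neighbors(barcode):
    """Yield (position, neighbor) for every barcode one substitution away."""
    for i, base in enumerate(barcode):
        for alt in BASES:
            if alt != base:
                yield i, barcode[:i] + alt + barcode[i + 1:]


def classify_barcode(observed, quals, whitelist_counts, cutoff=0.99):
    """Return (label, corrected barcode or None, posterior of best candidate).

    Assumed model (not taken from the caption):
      prior      = pseudocounted share of reads supporting the candidate
      likelihood = Phred error probability at the differing base
      posterior  = prior * likelihood, normalized over all candidates
    """
    if observed in whitelist_counts:
        return "exact", observed, 1.0
    total = sum(whitelist_counts.values()) + len(whitelist_counts)
    weights = {}
    for pos, cand in hamming_neighbors(observed):
        if cand in whitelist_counts:
            prior = (whitelist_counts[cand] + 1) / total
            p_error = 10 ** (-quals[pos] / 10)
            weights[cand] = prior * p_error
    if not weights:
        # No whitelisted barcode within one substitution.
        return ">Hamm-1", None, 0.0
    best = max(weights, key=weights.get)
    posterior = weights[best] / sum(weights.values())
    if posterior >= cutoff:
        return "replaced", best, posterior
    return "not significant", None, posterior


def mutate_barcode(barcode, n_changes, rng):
    """Create a synthetic barcode by substituting n_changes distinct positions."""
    chars = list(barcode)
    for pos in rng.sample(range(len(chars)), n_changes):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {"AAAACCCC": 120, "GGGGTTTT": 80}  # toy whitelist with read counts
    synthetic = mutate_barcode("AAAACCCC", 1, rng)
    print(synthetic, classify_barcode(synthetic, [30] * 8, counts))
```

Under these assumptions, a barcode with a single whitelisted neighbor is always replaced, whereas one with several neighbors is replaced only when read support and base quality clearly favor one of them.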