Fig. 3

Development of the RePlow model. a The estimated proportion of background errors (BEs) among total mismatches, by substitution type. MOS values were measured for each substitution type from the total mismatches of matched control samples. Positions with germline variants were excluded so that all remaining mismatches could be assumed to originate from either sequencing or background errors. The ratio of the sum of MOS scores to the total mismatch count is taken as an estimate of the BE proportion. b VAF distribution of called mutation candidates from library replicates of sample B (1% VAF) for each platform. All candidates were called by MuTect in at least one replicate. True-positive and false-positive calls are colored in blue and red, respectively. c Empirical and fitted cumulative distributions of the VAFs of background errors. To estimate the PDF of background errors, VAF profiles based on the MOS value of each position (empirical cumulative distribution, black lines) were constructed and fitted with a cumulative exponential distribution (red lines) (see Methods). PDFs were then constructed for each substitution type from the estimated parameter of the cumulative exponential model. d Overview and examples of mutation detection by RePlow. Mapped sequencing data of the replicates and the matched control are taken as input. For each data set, VAF profiles of background errors per substitution type are first constructed to estimate the PDF. Each genomic position is then analyzed to calculate the probabilities of being a variant or an error, using estimated concordance models with the average VAF (normal distribution) and background error profiles (exponential distribution), respectively (see Methods). Both probabilities are jointly analyzed to estimate the likelihood of each in a sequence context. Sites with a C > A mutation (green-shaded area) show a higher VAF than A > G mutation sites (red-shaded area). However, because of the excessive occurrence of the context-specific error (C > A) and the VAF discordance between replicates, RePlow selects only the A > G mutation site as a final candidate based on the joint analysis result. MOS, mismatch over-representation score; PDF, probability density function
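
The curve fitting described for panel c can be illustrated with a minimal Python sketch. It is not the RePlow implementation: the function names are hypothetical, and the fitting procedure (least squares through the origin on the linearized form -ln(1 - F(x)) = rate * x) is an assumption chosen for brevity. The Methods define the procedure actually used.

```python
import math


def empirical_cdf(vafs):
    """Return the sorted VAFs and their empirical cumulative probabilities."""
    xs = sorted(vafs)
    n = len(xs)
    # Midpoint plotting positions keep every probability strictly below 1,
    # so the logarithm in the linearized fit is always defined.
    ps = [(i + 0.5) / n for i in range(n)]
    return xs, ps


def fit_exponential_rate(vafs):
    """Fit F(x) = 1 - exp(-rate * x) to the empirical CDF of the VAFs.

    Assumption: a linearized least-squares fit through the origin,
    used here only as an illustrative stand-in for the paper's method.
    """
    xs, ps = empirical_cdf(vafs)
    ys = [-math.log(1.0 - p) for p in ps]
    denominator = sum(x * x for x in xs)
    if denominator == 0.0:
        raise ValueError("at least one VAF must be greater than zero")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


def exponential_cdf(x, rate):
    """Cumulative exponential distribution (the fitted curve)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def exponential_pdf(x, rate):
    """Exponential PDF built from the fitted rate parameter."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


if __name__ == "__main__":
    # Hypothetical background-error VAFs for one substitution type.
    example_vafs = [0.001, 0.002, 0.002, 0.004, 0.005, 0.009, 0.015]
    rate = fit_exponential_rate(example_vafs)
    print("fitted rate:", rate)
    print("error density at VAF 0.01:", exponential_pdf(0.01, rate))
```

In this sketch one rate parameter would be fitted per substitution type, matching the caption's statement that a separate PDF is constructed for each type.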