Fig. 3: Prioritizing training combinations for precise out-of-sample modification basecalling using signal cover scores.

A Definition of the “signal cover score”. Q10 and Q90 mark the 10% and 90% signal quantiles, respectively. \(N\) and \(P\) denote the total numbers of training modification classes and sequence positions, respectively. B Signal cover scores of all possible combinations of training modifications, sorted in descending order, for the ac4C, Psi, and m1Psi test groups. For each combination, bars indicate which training modifications are included, and bar colors indicate how many modifications the combination contains. Among the four-modification combinations, those with the highest (Max) and lowest (Min) signal cover scores are marked. The basecallers trained on the Max and Min combinations were then evaluated by mappability (C) and per-read CIGAR match fraction (D). AllMod, basecaller trained on all modifications except the one being basecalled; UnMod, basecaller trained on unmodified reads only. E Mapped length distributions of the AllMod and Max basecallers.
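Panel A defines the signal cover score graphically; the exact formula is not restated in this caption. The Python sketch below only illustrates the kind of computation behind panel B, under an explicit assumption: at each of the \(P\) positions, the score is the fraction of the test modification's Q10-Q90 signal interval that is covered by the union of the Q10-Q90 intervals of the training modifications in a combination, averaged over positions. This assumed definition does not reproduce any normalization by \(N\) that panel A may use. The input layout and the function names (`covered_length`, `cover_score`, `rank_combinations`) are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical input layout: one (q10, q90) signal interval per sequence position.
Profile = list[tuple[float, float]]


def covered_length(target: tuple[float, float], intervals: list[tuple[float, float]]) -> float:
    """Length of `target` overlapped by the union of `intervals`."""
    lo, hi = target
    # Clip each interval to the target range and drop empty pieces.
    clipped = sorted(
        (max(a, lo), min(b, hi)) for a, b in intervals if min(b, hi) > max(a, lo)
    )
    total, cur_start, cur_end = 0.0, None, None
    for a, b in clipped:
        if cur_end is None or a > cur_end:
            # Disjoint from the current merged run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = a, b
        else:
            # Overlapping or touching: extend the current merged run.
            cur_end = max(cur_end, b)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def cover_score(test: Profile, train: dict[str, Profile], combo: tuple[str, ...]) -> float:
    """ASSUMED score: mean over positions of the fraction of the test
    modification's Q10-Q90 interval covered by the training classes in `combo`."""
    fractions = []
    for p, (lo, hi) in enumerate(test):
        width = hi - lo
        if width <= 0:
            continue  # skip degenerate intervals
        covered = covered_length((lo, hi), [train[m][p] for m in combo])
        fractions.append(covered / width)
    return mean(fractions) if fractions else 0.0


def rank_combinations(test: Profile, train: dict[str, Profile]) -> list[tuple[tuple[str, ...], float]]:
    """Score every non-empty combination of training modifications, highest first."""
    names = sorted(train)
    scored = [
        (combo, cover_score(test, train, combo))
        for k in range(1, len(names) + 1)
        for combo in combinations(names, k)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

To mirror the Max/Min selection in panel B, one could filter the ranked list to combinations of size four and take its first and last entries, e.g. `four = [r for r in ranked if len(r[0]) == 4]; max_combo, min_combo = four[0], four[-1]`.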