Fig. 1: Raw read accuracy of nanopore modified base detection using mouse whole-genome controls.

a Precision-recall curve for 5mC detection using unmodified (N = 2) and methylated (N = 2) controls. 5hmC detections are counted as errors because neither control sample contains 5hmC. Base-calls from both replicates of each control are pooled and downsampled to 100,000,000 base-calls. b False positive rate of modified base detection in the unmodified control as a function of local GC content. CpG base-calls are binned into non-overlapping 100 bp windows, and GC percentage is calculated from the mm39 (GRCm39) reference genome. Bands indicate the 95% confidence interval across replicates (N = 2). c, d Sequence logos of the 12-mer sequence context upstream and downstream of false positive modified base detections for 5mC (c) and 5hmC (d). Bases with higher probability are shaded and stacked from the top down. Logos include reads from both unmodified DNA standards (N = 2). e Rates of false positive modified base detection across classes of repetitive, low-complexity, or CpG island sequences, showing (top) 5mC and 5hmC false positives in the modification-negative control (N = 2) and (bottom) 5hmC false positives in the 5mC-positive control (N = 2). LC: low complexity. SR: simple repeat. Error bars indicate 1 s.d. f Confusion matrix of predicted versus ground-truth base-calls. Base-calls are pooled across all replicates, taking C as the ground-truth state of every base-call in the unmodified controls (N = 2) and 5mC as the ground-truth state in the methylated controls (N = 2). ****p < 0.0001.
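
The windowing described for panel b can be illustrated with a minimal Python sketch. It assumes the reference is available as an in-memory string and that base-calls from the unmodified control arrive as (position, is_modified_call) pairs. The function names `gc_percent` and `fpr_by_gc` and the 10-point GC bin width are hypothetical choices for illustration, not the pipeline used to produce the figure.

```python
from collections import defaultdict

WINDOW = 100  # bp; non-overlapping windows, as described for panel b


def gc_percent(seq):
    """Return the GC percentage of seq, ignoring non-ACGT bases (None if empty)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def fpr_by_gc(reference, calls, gc_bin_width=10):
    """Compute the false positive rate per GC bin (hypothetical helper).

    reference: reference sequence as a string (assumed single contig).
    calls: iterable of (position, is_modified_call) from an unmodified
           control, where every modified call is a false positive.
    Returns {gc_bin_start: false_positive_rate}.
    """
    totals = defaultdict(int)
    false_pos = defaultdict(int)
    for pos, is_mod in calls:
        # Assign the call to its non-overlapping 100 bp window.
        start = (pos // WINDOW) * WINDOW
        gc = gc_percent(reference[start:start + WINDOW])
        if gc is None:
            continue
        # Clamp 100% GC into the top bin so bins cover [0, 100].
        gc_bin = min(int(gc // gc_bin_width) * gc_bin_width, 100 - gc_bin_width)
        totals[gc_bin] += 1
        false_pos[gc_bin] += int(is_mod)
    return {b: false_pos[b] / totals[b] for b in sorted(totals)}
```

Per-replicate rates computed this way could then be summarized across replicates to draw the confidence bands. That step is omitted here.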