Fig. 6: Analysis of number of called sites. | Nature Communications

From: Rockfish: A transformer-based model for accurate 5-methylcytosine prediction from nanopore sequencing

a Complementary cumulative distribution function (CCDF) of the strand-specific calling coverage for each ONT-based method and for whole-genome bisulfite sequencing (WGBS) on the NA12878 dataset. Vertical lines mark the strand-specific sequencing coverage for ONT (~23x; red) and WGBS (~48x; blue). MG Remora denotes Megalodon Remora, and MG Rerio denotes Megalodon Rerio. b Stacked bar charts showing counts of highly confident positive and negative sites for WGBS (left) and Rockfish small (right; RF) on the R9.4.1 NA12878 dataset. A CpG site is defined as positive if its coverage is at least 5x and its methylation frequency is at least 50%. A CpG site is defined as negative if its coverage is at least 5x and its methylation frequency is less than or equal to 50%. Sites with coverage below 5x are labeled as not called. We define three categories of highly confident sites: (1) WGBS and Rockfish are concordant; (2) WGBS and Rockfish calls differ, with the target tool having support from at least one other ONT-based method; (3) WGBS without support and Rockfish without call, and vice versa. The numbers are given in millions (10⁶). Rockfish calls more highly confident sites than WGBS across the whole genome. c Stacked bar charts showing counts of highly confident positive and negative sites for different genic and intergenic regions. Rockfish calls more highly confident sites for all genic and intergenic types. d Stacked bar charts showing counts of highly confident positive and negative sites for different repetitive regions. Except for unmethylated sites in SINE, Rockfish calls more highly confident sites than WGBS. Source data are provided as a Source Data file.
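
The site-labeling rule and the CCDF described in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `classify_site` and `ccdf`, the toy input values, and the tie-breaking choice are assumptions made for illustration. In particular, the caption's two definitions both include a methylation frequency of exactly 50%, so the sketch assumes such ties are labeled positive, and it assumes the standard CCDF definition P(X > x).

```python
from collections import Counter

MIN_COVERAGE = 5       # "at least 5x", as stated in the caption
FREQ_THRESHOLD = 0.5   # 50% methylation frequency, as stated in the caption


def classify_site(coverage, methylation_frequency):
    """Label one CpG site as 'positive', 'negative', or 'not called'."""
    if coverage < MIN_COVERAGE:
        return "not called"
    # Assumption: a frequency of exactly 50% is treated as positive,
    # because the caption's two definitions overlap at that value.
    if methylation_frequency >= FREQ_THRESHOLD:
        return "positive"
    return "negative"


def ccdf(coverages, thresholds):
    """Fraction of sites whose coverage strictly exceeds each threshold.

    Assumption: the standard definition P(X > x) is used; the caption
    does not state which convention the figure follows.
    """
    n = len(coverages)
    return [sum(c > t for c in coverages) / n for t in thresholds]


# Hypothetical (coverage, methylation frequency) pairs, not real data.
sites = [(23, 0.91), (4, 1.0), (12, 0.08), (5, 0.5)]

labels = [classify_site(cov, freq) for cov, freq in sites]
counts = Counter(labels)
coverage_ccdf = ccdf([cov for cov, _ in sites], thresholds=[0, 5, 20])
```

With these toy values, two sites are labeled positive, one negative, and one not called. Tallies of this kind, computed per tool and per region, are what the stacked bars in panels b to d summarize.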
