Fig. 1: Selection of plant samples for CHH methylation training feature collection.
From: Accurate cross-species 5mC detection for Oxford Nanopore sequencing in plants with DeepPlant

CHH methylation sites, particularly those with high methylation levels (≥90%), are rare in plants. This figure summarizes the high-methylation CHH sites in previously published bisulfite sequencing (BS-seq) datasets (refs. 16, 22–35) and in those generated in this study. a Ratios of high-methylation CHH sites among quantified CHH motifs (≥5 read coverage) (left panel) and the number of covered 9-mer contexts (right panel) in BS-seq datasets from ten plant species. Only 9-mers observed at three or more high-methylation CHH sites were considered. Species other than A. thaliana and O. sativa were selected based on high CHH methylation ratios reported in previous studies (refs. 20, 21). b Ratios of high-methylation CHH sites in BS-seq datasets sequenced for this study, including A. thaliana, O. sativa, and species with abundant high-methylation CHH sites identified in (a), as well as Glycine max and Marchantia polymorpha. c Number of covered 9-mer contexts (top) and heatmap of context abundance (bottom), grouped by CHH motif, in nanopore datasets from six plant species. In the top panel, the top line of each bar marks the number (36,864) of all possible 9-mer sequences centered on a CHH motif. A 9-mer was considered covered if it was present in 50 or more positive training samples and had at least an equal number of negative samples. “Mixed” refers to combined samples from S. miltiorrhiza, R. communis, and S. tuberosum. Source data are provided as a Source Data file.
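
The thresholds in this caption can be checked with a short sketch. The Python below is illustrative only and is not the authors' pipeline. All function names are hypothetical. It assumes the cytosine sits at the middle position of the 9-mer and is followed by two H bases (A, C, or T), which leaves six unconstrained positions and gives 4^6 × 3^2 = 36,864 contexts. It also reads "at least an equal number of negative samples" as "negatives ≥ positives", which is one possible interpretation.

```python
from itertools import product

BASES = "ACGT"
H_BASES = "ACT"  # IUPAC code H: any base except G


def is_chh_9mer(kmer: str, c_index: int = 4) -> bool:
    """True if the 9-mer has C at c_index followed by two H bases (assumed layout)."""
    return (
        len(kmer) == 9
        and kmer[c_index] == "C"
        and kmer[c_index + 1] in H_BASES
        and kmer[c_index + 2] in H_BASES
    )


def count_chh_9mers() -> int:
    """Enumerate all 4^9 9-mers and count those centered on a CHH motif."""
    return sum(is_chh_9mer("".join(p)) for p in product(BASES, repeat=9))


def high_methylation_ratio(sites, min_cov: int = 5, threshold: float = 0.9) -> float:
    """Fraction of quantified sites (coverage >= min_cov) with methylation >= threshold.

    `sites` is an iterable of (methylated_reads, total_reads) pairs.
    """
    quantified = [(m, n) for m, n in sites if n >= min_cov]
    if not quantified:
        return 0.0
    high = sum(1 for m, n in quantified if m / n >= threshold)
    return high / len(quantified)


def covered_9mers(pos_counts: dict, neg_counts: dict, min_pos: int = 50) -> set:
    """9-mers with >= min_pos positive samples and at least as many negatives (assumed reading)."""
    return {
        kmer
        for kmer, n_pos in pos_counts.items()
        if n_pos >= min_pos and neg_counts.get(kmer, 0) >= n_pos
    }


if __name__ == "__main__":
    print(count_chh_9mers())  # matches 4**6 * 3**2
```

The enumeration agrees with the closed-form count 4^6 × 3^2, so the total does not depend on whether the six free positions are split four and two around the cytosine or three and three around the CHH motif.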