Abstract
DNA N6-methyladenine (6mA) serves as an intrinsic and principal epigenetic marker in prokaryotes, impacting various biological processes. To date, few advanced sequencing technologies and analysis tools are available for bacterial DNA 6mA. Here, we evaluate eight tools designed for 6mA identification or de novo methylation detection. This assessment includes Nanopore (R9 and R10) and Single-Molecule Real-Time (SMRT) sequencing, cross-referenced with 6mA-IP-seq and DR-6mA-seq. Our multi-dimensional evaluation encompasses motif discovery, site-level accuracy, single-molecule accuracy, and outlier detection across six bacterial strains. While most tools correctly identify motifs, their performance varies at single-base resolution, with SMRT and Dorado consistently delivering strong performance. Our study indicates that existing tools cannot accurately detect low-abundance methylation sites. Additionally, we introduce an optimized method for advancing 6mA prediction, which substantially improves the detection performance of Dorado. Overall, our study provides a robust and detailed examination of computational tools for bacterial 6mA profiling, highlighting insights for further tool enhancement and epigenetic research.
Introduction
The study of bacterial epigenetics has spanned nearly a century, beginning with the groundbreaking discovery of the first DNA methylation on cytosine (5mC) in Tubercle Bacillus in 19251. Subsequently, Dr. Waddington introduced the term “Epigenetics”2. In 1955, the first N6-methyladenine (6mA) was identified in Bacterium coli, highlighting that the dynamic nature of the bacterial genome extends beyond the four nucleotide bases3. Although these epigenetic markers, such as 6mA, were known to be linked to biological functions, the specific details remained unknown at that time4,5. As research progressed, Werner Arber proposed the “Restriction-Modification system” (R-M system) in Escherichia coli (E. coli), demonstrating that DNA modifications are caused by nucleotide methylation6. A key characteristic of the R-M system is the ability of the methyltransferases (MTases) to specifically recognize a DNA sequence7,8,9. The motifs recognized by MTases are categorized into three types: type 1 motifs are long and bipartite, type 2 motifs are usually short (4-6 base pairs) and palindromic, and type 3 motifs are short and non-palindromic10. MTases in bacteria identify unmodified bases of foreign phage DNA and recruit restriction endonuclease (REase) to combat phage11. In addition, MTases in the R-M system are responsible for protecting genetic stability and participating in gene expression regulation12,13,14. Certain MTases that operate independently of the R-M system play crucial roles in bacterial processes. For example, DNA adenine methyltransferase (dam) is prevalent in E. coli and regulates methylation at 99% of GATC motif sites15,16. The Bacteriophage Exclusion (BREX) system is another recently discovered mechanism by which bacteria defend against phages17. The proteins encoded by the gene clusters of BREX include a DNA MTase, which recognizes motifs similar to the type 1 motifs in the R-M system18,19.
Given the intricate composition and critical role of bacterial 6mA in functional biology, researchers are actively developing tools to analyze its distribution and motif patterns10,20,21. These methods have primarily evolved through three types of sequencing technologies. The first category includes qualitative and relative quantitative experimental methodologies. For example, Immunoblotting and Liquid Chromatography-Mass Spectrometry (LC-MS) can semi-quantitatively detect methylation but lack the precision to pinpoint specific methylated bases22,23. Another promising approach is Next Generation Sequencing, including methods such as 6mA immunoprecipitation sequencing (6mA-IP-seq) and Nitrite Sequencing, which offer improved resolution but can only identify methylation patterns down to segments of tens of base pairs and rely heavily on the quality of antibodies or chemicals24,25,26.
Significant progress has been made with the advent of third-generation sequencing (TGS). In 2010, PacBio introduced a method to detect DNA methylation without chemical conversion using SMRT sequencing27. This advanced technology helps researchers uncover the MTase recognition sequences and methylome of bacteria28. However, SMRT sequencing has a higher error rate and requires multiple sequencing passes of a molecule to produce high-quality consensus data29. These challenges have been mitigated by the updated long high-fidelity (HiFi) sequencing technique, which achieves an accuracy rate of up to 99.8%30. SMRT sequencing relies on the optical detection of fluorescence signals during DNA polymerase-mediated nucleotide incorporation, where modified bases alter the kinetics of incorporation. In contrast, Nanopore sequencing, another third-generation technology launched by Oxford Nanopore Technologies (ONT), employs electrical measurements, detecting characteristic changes in ionic current as modified DNA bases traverse protein nanopores. The foundational hypothesis of Nanopore sequencing, that individual nucleotides could be detected and differentiated as they pass through an alpha-hemolysin channel, originated in the 1990s31. Compared with SMRT sequencing, Nanopore sequencing offers several advantages, including its portability, which allows for sequencing in various settings outside traditional laboratories, from fieldwork to clinical environments32. Additionally, Nanopore supports various analytical tools suitable for different types of biological aims, enhancing its utility for complex genomic, transcriptomic, and epigenomic investigations33. Although Nanopore sequencing has been known for high error rates due to the challenges associated with basecalling and the low signal-to-noise ratio, the situation has improved with the introduction of a new generation of flow cells34.
Flow cell R9.4.1 achieves an accuracy of approximately Q13+, and flow cell R10.4.1 is claimed to achieve an accuracy of Q20+ of raw reads35. Several studies verify these advancements, and numerous tools developed for Nanopore have attempted to address the persistent challenges and advance various applications35,36,37.
Compared with the number of tools available for detecting 5mC and RNA modification, fewer tools are capable of detecting bacterial 6mA38,39. Available 6mA detection tools include mCaller, a neural network-based tool trained on the sequencing data of E. coli K-1240; Tombo, a comprehensive tool suite developed by ONT that provides one de novo tool (Tombo_denovo) and two comparison mode tools, namely model_sample_compare (Tombo_modelcom) and level_sample_compare (Tombo_levelcom)41; Nanodisco, a widely used tool for de novo modification detection and prediction of the methylation types in bacteria42; Dorado (https://github.com/nanoporetech/dorado), a deep-learning-based tool that provides highly accurate basecalling and modification detection for Nanopore sequencing data; and Hammerhead, which focuses on strand-specific mismatch patterns generated from the Nanopore sequencing reads and uses a statistical method to refine modification detection43. Figure 1a and Supplementary Fig. 1 summarize the history of bacterial methylation research and relevant tools.
a Timeline of important events in bacterial DNA modification studies throughout the past century. Key milestones are highlighted, with two major techniques of the TGS colored blue and orange. Other types of 6mA detection technologies are listed within the green dashed line box, while the orange dashed line box marks the five tools (Tombo has three sub-tools) for 6mA detection on Nanopore data that are evaluated in this research. b Diagram describing the processing workflow and benchmarking strategy. The left dashed line separates PacBio tools and ONT tools. The right dashed line separates the two types of sequencing files and corresponding tools from the flow cell R9.4.1 and R10.4.1. Created in BioRender. Huang, J. (2025) https://BioRender.com/x63f659. c Quality statistics of sequencing outputs based on four features. The gray color represents unmapped reads or bases. Centre line: median. Box bounds: 25th to 75th percentiles (interquartile range, IQR). Whiskers: extend to most extreme data points within 1.5 × IQR. WT wild type, WGA whole genome amplification. The n number for coverage is the genome size of Psph (n = 6538260).
The importance of bacterial 6mA modification has gradually become evident. Studying bacterial epigenetics is essential for comprehensively understanding bacterial growth, toxicology, and pathogenesis. However, the tool best suited for 6mA detection in bacteria in different biological contexts remains unclear. In this study, we comprehensively compare tools for detecting bacterial 6mA. The results of eight tools are assessed against known 6mA sites within canonical motifs, providing crucial insights into the strengths and limitations of each tool concerning various biological objectives. Specifically, the tools combined with data generated from the R10.4.1 flow cell exhibit higher accuracy at the motif level and at single-base resolution, as well as fewer false calls, than the tools combined with data generated from the older flow cell. Furthermore, we use the 6mA-deficient sample of DNA to further explore the implications of decreased 6mA, potentially affecting gene stability. Our findings highlight the complexity of SMRT and seven Nanopore tools, emphasizing the need for further development in methylation mapping technologies to better understand the intricate bacterial epigenetic systems.
Results
Benchmarking strategy for 6mA detection in bacteria
In this study, SMRT and seven tools currently available for bacterial DNA 6mA detection using Nanopore sequencing data were gathered for benchmarking (Fig. 1b). Five of the Nanopore tools (mCaller, Tombo_denovo, Tombo_modelcom, Tombo_levelcom, and Nanodisco) can only process the sequencing data from flow cell R9, and the other two Nanopore tools (Dorado and Hammerhead) are compatible with flow cell R10. We categorized tools that require both wild type (WT) data and sequencing data from low- or no-modification DNA as a control group as “comparison mode”, and tools that need only one experimental group of data as “single mode”. In most cases, the control group can be whole genome amplification (WGA) DNA, which is accepted as DNA with all modifications removed44.
To evaluate the computational tools, we expanded our analysis to Pseudomonas syringae pv. phaseolicola 1448A (Psph). Our previous study verified that the MTase HsdMSR belongs to the type 1 R-M system and is responsible for all methylation of the type 1 motif GAG-N6-GCTG in Psph45. The ∆hsdMSR variant, which lacks the primary 6mA MTase gene (hsdMSR), served as a 6mA-deficient control. Nanopore sequencing was conducted using both R9.4.1 and R10.4.1 flow cells to analyze native DNA from Psph WT and Psph ΔhsdMSR, alongside Psph WGA DNA. Each sample was analyzed for read length, read numbers, base numbers, and Q score (Fig. 1c and Supplementary Data 1). The average sequencing depth of the six sequencing results was at least 241×, and the average read length was at least 2579 bp, consistent with the characteristics of long-read TGS. Among the six samples, ∆hsdMSR_R9, WT_R10, ∆hsdMSR_R10, and WGA_R10 had more than 90% of reads and bases mapped to the genome. The average Q scores of all R10.4.1 sequencing results were 1.63-fold higher than those of all R9.4.1 sequencing data, sufficient for further analysis. Additionally, SMRT sequencing was conducted on WGA samples, yielding an average coverage depth of 297× (Supplementary Data 2). Outputs from all eight tools were standardized into unified assigned values, where each tool’s distinct metrics (response scores, modification fractions, or p values) for 6mA/A sites were ordered and normalized to a 0–1 scale (Fig. 1b). The assigned value facilitated subsequent analyses and provided robust cutoff selection, as described in detail in the “Methods” section.
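The idea of mapping heterogeneous tool outputs onto a common 0–1 assigned value can be illustrated with a minimal sketch. The exact procedure is given in the Methods and is not reproduced here; the min-max scaling, the function name `assigned_values`, and the `higher_is_modified` flag below are assumptions made purely for illustration.

```python
def assigned_values(scores, higher_is_modified=True):
    """Scale raw per-site scores to the 0-1 range (illustrative min-max scaling).

    scores: list of raw per-site metrics from one tool (hypothetical input).
    higher_is_modified: set to False for metrics such as p values, where a
    smaller number indicates stronger evidence of modification.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All sites share one score, so no ordering information is available.
        return [0.0 for _ in scores]
    scaled = [(s - lo) / (hi - lo) for s in scores]
    if not higher_is_modified:
        # Flip so that 1 always means "most likely modified".
        scaled = [1.0 - v for v in scaled]
    return scaled
```

With this convention, every tool's output can be ranked in the same direction before a cutoff is chosen, regardless of whether the tool reports fractions, scores, or p values.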
Three reliable ground truths and four sets of TGS data for Psph
Given our understanding of MTase motif specificity, we derived two sets of 6mA site-specific ground truths (WT set and ∆hsdMSR set, respectively)46,47. The ground truth for WT included 3198 methylation sites (including 2898 for type 1 motif and 300 for type 2 motif), while that for the ∆hsdMSR strain involved 300 sites that overlapped with those in WT (Fig. 2a). This phenomenon was consistent with the high conservation of this enzyme, specificity of motif recognition, and the irreversibility of bacterial DNA modification48. Consequently, the 2898 sites of the type 1 motif lost in the methylome due to knockout were defined as the third ground truth (LOST set).
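The relationship between the three ground truths is a simple set difference, which can be sketched as follows. The site identifiers are placeholders (real sites would be genomic coordinates and strands); only the set sizes, 2898 type 1 sites and 300 type 2 sites, are taken from the text.

```python
def lost_set(wt_sites, mutant_sites):
    """Sites methylated in WT but absent from the knockout methylome."""
    return wt_sites - mutant_sites


# Placeholder identifiers standing in for genomic positions.
type1_sites = {("type1", i) for i in range(2898)}
type2_sites = {("type2", i) for i in range(300)}

wt_truth = type1_sites | type2_sites   # WT ground truth: 3198 sites
mutant_truth = type2_sites             # knockout ground truth: 300 sites
lost_truth = lost_set(wt_truth, mutant_truth)

print(len(wt_truth), len(mutant_truth), len(lost_truth))
```

Because every ∆hsdMSR site is also a WT site, the LOST set contains exactly the type 1 sites.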
a Venn diagram comparing the 6mA motifs between WT and ΔhsdMSR. The LOST set is determined to exclusively contain the type 1 6mA motifs. b The number of modified bases covered in each tool in four sets, categorized based on their operational capabilities. The tools are classified into two main types: Single Mode and Comparison Mode. Single mode tools are designed to process one set of sequencing data at a time, whereas comparison mode tools can analyze and compare two different data sets simultaneously. Additionally, the tools are further divided based on the type of nucleotide predictions they offer. 6mA-specific Tools are specialized for predictions exclusively on adenine modifications, while All Bases Available Tools are equipped to provide predictions across all types of nucleotides. The data sets used in this analysis, namely WT, ∆hsdMSR, LOST, and WGA, are detailed in the “Methods” section. c Motif enrichment -log10(p value) and the number of enriched sites determined by MEME-Streme. The top 10,000 6mA sites called by each tool are included in this analysis. Dots are placed at zero if no corresponding motif is discovered. Fisher’s exact test was applied to calculate the p value. d Histogram displaying an example of 6mA sites in the ground truth and Tombo_levelcom results. e Summary of motif discovery results in the WT, ∆hsdMSR, and LOST sets.
To validate the TGS sequencing results, four datasets in Psph (WT, ∆hsdMSR, LOST, and WGA) were labeled. Comparison mode tools processed comparisons between WT-WGA, ∆hsdMSR-WGA, WT-∆hsdMSR, and WGA-WGA, while the “Single Mode” tools processed data from WT, ∆hsdMSR, and WGA individually (Fig. 1b). The LOST results for Single Mode tools required further comparative analysis, as described in the “Methods” section. Among all of the tools, Dorado and mCaller outputted 6mA calls that only involved adenine sites, while the other five tools provided modification fractions or p values at all bases. All Nanopore tools assigned predictions for at least 98% of the sites, except for Tombo_levelcom, a statistics-based tool requiring a sequencing depth of at least 50×, which only outputted 6,427,486 (52.58%) predictions from WGA sequencing results (Fig. 2b). A significant limitation of SMRT Link, the official analysis platform, was that it designated merely ~5% of all adenine sites as displaying optical signatures characteristic of modifications. While the Inter-Pulse Duration ratio (IPD ratio) and Phred-transformed p value (score) were available for the modified adenine sites, 6mA modification fraction measurements were available only for sites whose IPD ratios matched the distribution model28,49. The distributions of assigned values in each result varied, but all showed one or more peaks (Supplementary Fig. 2c).
Dorado, Hammerhead, Nanodisco, and Tombo_levelcom successfully unveil motifs
Psph has two types of 6mA motifs that mainly appear in chromosomes, accounting for 2800 (96.61%) and 290 (96.67%) of the sites (Supplementary Fig. 2a, b): the type 1 motif GAG-N6-GCTG is a long bipartite motif, while the type 2 motif RAGTACTY is a short palindromic motif. Both motifs showed significant enrichment near the beginning of the protein-coding region (Supplementary Fig. 3a). In bacterial genomes, the number of methylated adenines typically ranges from several thousand to ~30,000–40,000, varying according to the diversity and abundance of MTases50. Therefore, we selected the top 10,000 predicted A sites according to the assigned values for further analysis51,52. In the WT set, all tools yielded a near-even distribution on both the positive and negative strands (Supplementary Fig. 3b). Most of the sites were on the chromosomal DNA and enriched near the start codon (Supplementary Fig. 3c). The exception was Tombo_denovo, which had 4560 (45.6%) sites from the small plasmid (Supplementary Fig. 3b), indicating a bias towards the signals from the plasmids rather than the chromosomal region. MEME-Streme was used for motif discovery. SMRT, Dorado, Hammerhead, Nanodisco, and Tombo_levelcom identified both types of motifs with p < 10⁻² (Fig. 2c; Supplementary Fig. 3d). Interestingly, Tombo_modelcom detected only the short palindromic motifs.
The motif-level evaluation of the eight tools was also performed on the ∆hsdMSR set (Supplementary Fig. 4a). With the partially methylated DNA from ∆hsdMSR, SMRT, Dorado, Hammerhead, Nanodisco, and Tombo_levelcom maintained excellent performance, with the R9-exclusive tool Tombo_levelcom excelling in terms of significance level and the number of motif sites discovered (Fig. 2c). The number of sites enriched by Tombo_levelcom exceeded the true number of motif sites, suggesting that the current at adjacent A sites is influenced as the modified base passes through the nanopore, so that these sites can easily be called as 6mA33 (Fig. 2d). In the LOST set, SMRT, Dorado, Hammerhead, Nanodisco, and Tombo_levelcom performed consistently well, while mCaller also correctly identified the motif, with an enrichment percentage of 1%, indicating its potential for motif discovery (Fig. 2b; Supplementary Fig. 4b, c). Overall, these results highlight the remarkable capabilities of SMRT, Dorado, Hammerhead, Nanodisco, and Tombo_levelcom in motif finding (Fig. 2e).
SMRT, Dorado, and Tombo_levelcom outperform other tools in 6mA profiling
We performed a comparative analysis of six bioinformatics tools to evaluate their performance in identifying and profiling 6mA sites. Nanodisco and Hammerhead were excluded from the site-level evaluation as their primary function is motif categorization48. The assessment was based on three metrics: Precision-Recall Curve (PRC), Receiver Operating Characteristic (ROC), and F1 score at different cutoffs across two distinct conditions, designated as WT and ∆hsdMSR. Simultaneously, a 5-mer shift was implemented in Tombo_denovo, Tombo_modelcom, and Tombo_levelcom, considering the limitations of per-base current distribution in their algorithm53. All predictions were ordered based on the assigned values, and the F1 score was calculated at different cutoffs to provide a balanced measure by combining precision and recall54. To normalize for variations in base-calling frequency and assigned value distributions, the analysis was standardized using the count of adenine sites as the x-axis (Supplementary Fig. 5a, d).
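The rank-based evaluation described above can be sketched in a few lines. This is an illustrative reimplementation under simplifying assumptions, not the pipeline used in the study: sites are represented by arbitrary hashable identifiers, `ranked_sites` is assumed to be pre-sorted by descending assigned value, and the function name `f1_by_cutoff` is hypothetical.

```python
def f1_by_cutoff(ranked_sites, truth, cutoffs):
    """Precision, recall, and F1 when the top-k ranked sites are called 6mA.

    ranked_sites: site identifiers sorted by descending assigned value.
    truth: set of ground-truth 6mA site identifiers.
    cutoffs: iterable of k values (number of top adenine sites considered).
    """
    results = {}
    for k in cutoffs:
        called = set(ranked_sites[:k])
        tp = len(called & truth)
        precision = tp / len(called) if called else 0.0
        recall = tp / len(truth) if truth else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[k] = (precision, recall, f1)
    return results


# Toy example: four ranked sites, two of which are true 6mA sites.
toy = f1_by_cutoff(["a", "b", "c", "d"], {"a", "c"}, [1, 2, 4])
```

Scanning k over the ranked list and taking the maximum F1 gives the per-tool optimum that is later reused as the tool-specific threshold.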
In the WT dataset, SMRT demonstrated superior performance among all tools, achieving the highest average precision (AP; area under the PRC) and F1 score of 0.958 and 0.933, respectively (Fig. 3a, e). Notably, SMRT maintained the most stable precision score, while other tools exhibited significant variations with increasing numbers of analyzed adenine sites (Fig. 3b). Although all tools showed varying degrees of resistance to false positive accumulation, SMRT’s recall and false positive rate (FPR) did not approach 1, primarily due to its considerably more conservative prediction output (Fig. 3b–d). Dorado achieved the second-highest maximum F1 score of 0.412, more than 1.3 times higher than that of the next best tool, Tombo_levelcom (Fig. 3e, f). Notably, Dorado displayed the highest area under the ROC curve (0.992), indicating excellent separation between true positives (TP) and false positives (FP) (Fig. 3f). However, SMRT and Dorado both showed reduced performance when processing sequencing results of DNA with fewer modifications from the ∆hsdMSR strain, with maximum F1 scores of 0.707 and 0.026, respectively (Fig. 3i, m). Tombo_levelcom retained its advantage when analyzing samples with low 6mA site abundance, ranking second with an AP of 0.48 and an F1 score of 0.575 (Fig. 3h–m). Dorado’s suboptimal performance may be attributed to the reduced current signal variations caused by low-abundance 6mA modifications, which challenged the model’s discrimination capability55. Further analysis of specificity revealed that all tools were susceptible to FP predictions (Supplementary Fig. 5b, e). Overall, SMRT, Dorado, and Tombo_levelcom exhibited high reliability in 6mA detection at the single-nucleotide/5-mer level.
a Precision-Recall Curve (PRC) shows the overall detection performance of different tools. AP (average precision) values are indicated for each method. b Precision and recall values plotted against the logarithmic number of adenine sites for different detection tools. c Receiver operating characteristic (ROC) curve depicting the performance evaluation of six tools. Area under curve (AUC) values are shown for each tool. d True positive rate (TPR) and false positive rate (FPR) plotted against the logarithmic number of adenine sites. e The curve of the F1 score changes with the number of adenine sites included. f Heat map with the number indicated shows the optimal F1 score, the ROC value, and the AP value. Color intensity scales with numeric values, darker indicating higher values. g The curve of the F1 score changes with the modification fraction provided by SMRT and Dorado, indicating the single molecule level (per-read) accuracy. The ground truth dataset for Psph WT comprised all 6mA sites within type 1 and type 2 recognition motifs. h PRC with AP values. i Precision recall values with log-transformed adenine site count. j ROC curves with AUC values. k TPR and FPR values with log-transformed adenine site count. l The curve of the F1 score changes with the number of positive predictions. m Heat map with the number indicated shows the best F1 score reached, the ROC values, and the AP values. Color intensity scales with numeric values, darker indicating higher values. n The curve of the F1 score changes with the modification fraction provided by SMRT and Dorado, indicating the per-read accuracy. The ground truth dataset for Psph ∆hsdMSR comprised all 6mA sites within type 2 recognition motifs. The small line plots in (e) and (l) provide a zoom-in view of the F1 score change as the number of adenines increases, focusing on the top 10,000 predictions. 
All outcomes from Tombo_denovo, Tombo_levelcom, and Tombo_modelcom are processed with a 5-mer shift as indicated in Methods. Source data are provided as a Source Data file. The curves of different colors represent the results of predictions using different tools, marked at the bottom. TPR true positive rate. FPR false positive rate.
SMRT and Dorado predict methylation probability for each DNA molecule independently by analyzing the optical or current signal it produces49,56. We evaluated the accuracy of single-read methylation predictions directly against the established ground truth data (Fig. 3g, n; Supplementary Fig. 5c, f). SMRT demonstrated superior precision in both WT and ∆hsdMSR datasets, achieving F1 scores of 0.955 and 0.442, respectively. The corresponding modification fractions varied between the two strains, with 80.3% in WT and 69% in ∆hsdMSR. Notably, Dorado maintained more consistent modification fractions (85.86% and 78.18%), suggesting that the electrical signal perturbations caused by modified bases were relatively stable.
SMRT excels in site-level 6mA calling with low false calls
To assess 6mA detection accuracy at single-base resolution, we implemented tool-specific threshold values optimized from our preliminary analyses. These thresholds were established by identifying the assigned values that yielded maximal F1 scores for each respective tool39,57,58. The outliers for each tool were defined as predictions in WGA48. To assess the false-call performance of the six tools, we compared the overlaps of identified predictions from WT, outliers, and the true 6mA sites within canonical motifs. Notably, the proportion of features varied (Fig. 4a). SMRT, Dorado, Tombo_denovo, Tombo_levelcom, and Tombo_modelcom showed a reasonable number of filtered WT outputs, consistent with the low overall bacterial 6mA content (~0.19%) in the Pseudomonas genome59. However, using the established cutoff, Tombo_levelcom and Tombo_modelcom demonstrated higher numbers of outliers, which were 32.8 and 3.7 times those of WT, respectively. The false calling rate (FCR) was determined as the ratio of the number of predictions that overlapped between WT and outliers to the total number of outliers. Dorado, Tombo_denovo, and mCaller had higher FCRs, which suggested that these tools have a larger proportion of WGA predictions also present in WT (Fig. 4b). Comparing the ∆hsdMSR results, outliers, and true motif sites of ∆hsdMSR, we observed that the three Tombo tools without shifting were less effective than the same tools with the 5-mer shift (Supplementary Fig. 6a, b). Moreover, this indicated an overall decrease in performance for all tools with the methylation-deficient samples. Among the tools, SMRT exhibited remarkable performance characterized by its exceptional accuracy, minimal outlier detection, and small intersections between outliers and the ground truth data (Supplementary Fig. 6c). Furthermore, we observed that subtracting sites common to WT and WGA datasets can significantly reduce the FPR in Dorado, mCaller, and Tombo_denovo.
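The false calling rate as defined in the preceding paragraph (overlap between WT calls and outliers, divided by the total number of outliers) can be written as a short sketch. The function name and the use of plain Python sets of site identifiers are illustrative assumptions.

```python
def false_calling_rate(wt_calls, outlier_calls):
    """Fraction of WGA outliers that are also called in the WT sample.

    wt_calls: set of sites called 6mA in WT at the tool-specific cutoff.
    outlier_calls: set of sites called 6mA in WGA at the same cutoff.
    """
    if not outlier_calls:
        return 0.0
    return len(wt_calls & outlier_calls) / len(outlier_calls)


# Toy example: two of the four WGA outliers are also called in WT.
toy_fcr = false_calling_rate({1, 2, 3}, {2, 3, 4, 5})
```

A high value means that many calls in the modification-free control recur in the native sample, which points to systematic, sequence-dependent false calls rather than random noise.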
a Venn diagram illustrating the overlaps among the obtained results of WT, outliers, and the ground truth from canonical motifs. Each tool follows the cutoff determined by the best F1 score achieved in the previous analysis of the WT dataset. No shifts are assayed in the 5-mer region. b Bubble plot displaying the log-transformed number of outliers and the false calling ratio. The false calling ratio is calculated by dividing the number of outliers by the total number of sites called in the WT dataset. The size of the bubbles represents the intersection of outliers with ground truth. The outliers represent the results obtained from processing WGA sequencing files. c The bar plot illustrating the distribution of 87 methylation sites across chromosomes and their corresponding motif types. d Upset plot presenting the six tools’ results after being filtered with a determined cutoff. The vertical bars represent the size of each set, with the largest set on the left and the top 20 sets displayed. The horizontal bars indicate the number of features unique to each tool, and the dots and connecting lines show the intersections between different sets of tools but exclude the contents in the intersections of all tools. Sites colored in rose red represent the number of true 6mA sites that were not identified in all TGS tools without shifting. e Venn diagram shows the overlap between Psph WT 6mA-IP-seq results and the 87 sites undetected by TGS. f Peak defined by 6mA-IP-seq and 6mA sites detected by different tools are visualized by “r.trackplot”. g The Venn diagram depicting the overlapping methylation sites identified by 6mA-IP-seq, SMRT, Dorado, and canonical motif analysis in Psph WT. h Box plot showcasing the region encompassing 6mA motif GAG-N6-GCTG, with the assigned current at each position sourced from nanoCEM. Centre line: median. Box bounds: 25th to 75th percentiles (interquartile range, IQR). Whiskers: extend to most extreme data points within 1.5 × IQR.
6mA-IP-seq validation reveals 14 methylation sites undetected by third-generation sequencing tools
We compared the six tools against true motif sites and each other to identify the advantages and disadvantages of the two TGS technologies. Notably, 108 sites identified by SMRT were not detected by any Nanopore-based analytical tool (Fig. 4d). mCaller identified the highest number of unique 6mA sites, showing minimal overlap with other tools and indicating a substantial rate of erroneous calls. Intriguingly, 87 sites within established methylation motifs remained undetected by all TGS tools, including 60 type 1 motif and 27 type 2 motif sites (Fig. 4c). To validate the methylation status of these sites, we performed 6mA-IP-seq analysis on both WT and ∆hsdMSR mutant strains of Psph (Supplementary Data 4, 5)60. Analysis revealed 1814 and 543 peaks in the WT and ∆hsdMSR strains, with median peak widths of 122 and 142 bp, respectively (Supplementary Fig. 6d, e).
Among the 87 sites previously undetected by TGS, 14 were validated by 6mA-IP-seq analysis, with locations proximal to or within several genes, including smc and gabD2 (Fig. 4e; Supplementary Fig. 6f). For instance, a type 1 motif (positions 2,077,381–2,077,393) within the smc gene region showed methylation signals despite being undetected on the forward strand by SMRT, Dorado, and Tombo_levelcom (Fig. 4f). Further investigation of canonical motif sites and those detected by 6mA-IP-seq, Dorado, and SMRT revealed varying degrees of detection gaps across all methodologies (Fig. 4g). To elucidate the underlying factors affecting detection sensitivity, we employed nanoCEM for in-depth analysis of ionic current signals at sites undetected by Dorado or SMRT61. Two 6mA sites (positions 1,486,230 and 1,686,240) in the hrpS promoter region were detected by SMRT and 6mA-IP-seq but missed by Dorado (Fig. 4h; Supplementary Fig. 7a, b). Current signal analysis revealed subtle yet consistent differences between WT and WGA samples at these positions, suggesting that Dorado’s detection algorithm requires further optimization to enhance accuracy and sensitivity. Analysis of overlap patterns revealed 114 non-canonical 6mA sites detected by both Dorado and 6mA-IP-seq, supporting their authenticity as methylation sites. The mechanism underlying these modifications remains unclear; they may arise from complex in vivo regulation or from misdirected methyl transfer events caused by adjacent DNA structures.
R10.4.1 is superior in capturing base signal differences compared to R9.4.1 flow cell
Furthermore, we explored current differences between WT and WGA sequencing at specific sites with different types of flow cells. Within the promoter region of the hrpQ gene, a type 1 motif (GAG-N6-GCTG) lies 262 base pairs upstream of the start codon, featuring methylation on the N6 position of the adenine. This modification on the negative strand, shown by SMRT, was not detectable through Nanopore sequencing with any of the five tools (Fig. 4g). Specifically, at the adenine site and the contiguous 5-mer region, a discernible disparity was observed in dwell time, mean, and median current signals through the R10.4.1 flow cell, underscoring the impact of methylation, while R9.4.1 showed less difference in the same region (Supplementary Fig. 8a). Additionally, with both types of flow cells, basecalling mismatches occurred significantly more frequently in the 5-mer regions containing methylated adenine than in the corresponding WGA regions. In the 3,902,663–3,902,683 region on the chromosome, another type 1 motif located 644 base pairs upstream of filL was verified by both SMRT and Dorado. On tracing the signal back to its source, the current differences and mismatch errors in the 5-mer region could be visualized with both R10.4.1 and R9.4.1 flow cells (Supplementary Fig. 8b). Overall, the modification of one base may affect the current of multiple surrounding bases and cause errors during basecalling. Moreover, flow cell R10.4.1 exhibited enhanced sensitivity in detecting the methylation-induced current differences. Consequently, tools designed for R10.4.1 may demonstrate superior performance to those designed for R9.4.1, especially in discerning site variations between WT and WGA.
Optimized method and high consistency of Dorado
To test the base-resolution accuracy at the LOST set, we employed the single mode tools to call 6mA in both WT and ∆hsdMSR separately, subsequently masking identical predictions, which proved the continued high performance of SMRT and Dorado (Supplementary Fig. 9a–f). This highlighted a processing method: By applying the same cutoff for assigned values, sites with identical results to WGA can be removed from the WT dataset, potentially eliminating the intrinsic FP caused by the sequencing methods and analytical tools (Fig. 5a). To test our hypothesis, we applied this method to all six tools by subtracting WGA results from the WT/∆hsdMSR results (Supplementary Fig. 10). F1 scores at each cutoff were calculated for all tools, and the optimized method retained the ranking of the tools compared with the previous results (Fig. 5b, c). An overall improvement of the F1 score at each cutoff was observed (Fig. 5d). Dorado showed the most significant increase of the maximum F1 score (from 0.387 to 0.667) in the WT set, indicating a substantial functional improvement at single nucleotide resolution. Notably, for all six individual tools, a flat state or a minor reduction appeared in the assigned value at which the best F1 score was achieved, suggesting a lower threshold of assigned value when applying outliers subtraction (Fig. 5d).
a Schematic plot shows the optimized method. Created in BioRender. Huang, J. (2025) https://BioRender.com/g4dqgr0. b, c F1 score change curve depicting the variation in performance with different numbers of adenines considered. The small line plots zoom in on the top 20,000 predictions, providing a closer examination. d Heat map illustrating the best F1 score observed during the optimization process, the minimal assigned values among all predictions, precision values, and FPR values. Color intensity scales with numeric values, darker indicating higher values. e Distribution of assigned values across different outputs at varying coverage levels. f Correlation score calculated using Pearson’s method among six types of input coverage. g Venn diagram showing the intersection of candidate sites generated by Dorado. The cutoff is determined based on the best F1 score. Color intensity scales with numeric values, darker indicating higher values. h–j Evaluation across different input coverage through the presentation of the F1 score curve, PRC, and ROC analysis. The ground truth dataset comprised all 6mA sites within type 1 and type 2 recognition motifs. Source data are provided as a Source Data file.
Additionally, the performance across various subsampling levels is of particular interest. The variable sequencing characteristics at different genome coverage levels are essential to identify the best sequencing depth for further analysis62. To examine the consistency and output heterogeneity of Dorado, data from six coverages (10×, 20×, 50×, 100×, 200×, and 450×) were collected. The assigned values gradually increased as the input coverage decreased (Fig. 5e). The outputs were highly consistent once a coverage level of 50× was reached, demonstrating a Pearson correlation coefficient exceeding 0.88 (Fig. 5f). We compared the predictions under six different sequencing depths to the ground truth. The resulting F1 score curve shifted forward, and the margins decreased after achieving a depth of 50× or more (Fig. 5h). Using the assigned value of 0.872 as the cutoff (previously defined for Dorado in the WT set), the intersection across the six groups included 3794 predictions, with 1265 of them corresponding to true 6mA sites (Fig. 5g). Additionally, a progressive change was observed in the AP and ROC scores (Fig. 5i, j). At the highest coverage of 450×, the AP was 0.295 and the AUC reached 0.992, indicating robust model performance. As the coverage decreased to 200×, 100×, and 50×, the AP and the AUC values gradually decreased, and this trend was intensified at lower coverage levels of 20× and 10×. To achieve the best performance of Dorado, we emphasized the importance of sufficient sequencing depth to maintain predictive reliability.
Cross-validation across five bacterial strains
To establish a robust validation framework, we extended our analysis to five additional bacterial strains: Pseudomonas syringae pv. tomato DC3000 (Pst), Escherichia coli K-12, Escherichia coli 980-2, Klebsiella pneumoniae 17ZR-21, and Bacillus cereus ATCC 14579 (Supplementary Data 3). With both publicly available datasets and newly generated SMRT and Nanopore (R9/R10) sequencing data, an initial quality assessment was performed to evaluate sequencing quality metrics and coverage depth (Supplementary Data 1, 2)40,43,45,48,63,64. Nanopore sequencing achieved a mean coverage depth of 321×, while SMRT sequencing, excluding E. coli K-12 where raw data was unavailable, yielded an average coverage of 480×.
Based on the established analysis pipeline from Psph, we evaluated three top-performing tools: SMRT, Dorado, and Tombo_levelcom (referred to as Tombo). While Dorado and Tombo achieved > 86% adenine site coverage, SMRT implemented selective filtering based on modification characteristics, resulting in adenine site coverage below 6% (Supplementary Fig. 11a). Ten methylation motifs, validated by REBASE or published literature, were characterized across five bacterial species, including three Dam-dependent GATC motifs, four dual-recognition site motifs, and three single-recognition site motifs. Motif analyses revealed that SMRT identified all canonical motifs, while Dorado and Tombo detected 9/10 and 5/10 motifs, respectively (Fig. 6a). Single-read accuracy analysis of SMRT and Dorado across five bacterial strains revealed that SMRT consistently achieved higher F1 scores and precision scores, with particularly notable superiority in detecting modifications at lower fractions, indicating better signal-to-noise ratio (Supplementary Fig. 11b, c; Fig. 6b). However, SMRT performance showed notable deterioration in E. coli K-12 analysis, likely attributable to the implementation of early-generation SMRT algorithms with limited accuracy, consistent with previous studies27,65.
a Characteristic methylation motif identified in five bacterial strains. The left phylogenetic tree shows the relationships among five bacterial strains. Quantitative metrics displaying: modification fraction for SMRT and Dorado, number of motif occurrences detected by each tool, and motif enrichment significance for Tombo. The likelihood ratio test was applied to calculate the e-value. b Heatmap showing modification fraction corresponding to the optimal F1 scores in the single-molecule resolution comparison of Dorado and SMRT. c Heatmap showing the maximum F1 scores in the site-resolution evaluation. d Genomic coverage analysis showing the adenine site coverage by different detection tools. e Ratio of optimal predictions to validated 6mA sites in respective bacterial genomes. f Distribution of F1 scores across five bacterial strains displaying the performance of tools: Dorado, Dorado_optimized, and SMRT. g Comparative analysis in E. coli K-12 of SMRT predictions from WT sample, outliers (SMRT predictions from ∆dam/dcm sample), and two types of canonical motifs through Venn diagram representation. h Venn diagram showing the overlaps pattern of Dorado, optimized Dorado predictions, and canonical motif-based ground truth in E. coli K-12. i Venn diagram illustrating the overlap between DR-6mA-seq results, SMRT detection, Dorado predictions, and canonical motif-based ground truth in E. coli K-12. (b–d, g–i) Color intensity scales with numeric values, darker indicating higher values. Source data are provided as a Source Data file.
Implementation of our optimization methodology enabled comprehensive comparison among SMRT, Dorado, optimized Dorado, Tombo, and optimized Tombo analyses. By evaluating the profiling capabilities on the site level against the gold standard of true motif sites, SMRT demonstrated superior performance, with consistently high F1 scores, precision, AP, and ROC values across all tested bacterial strains, outperforming other detection methods (Fig. 6c; Supplementary Fig. 11d–f). B. cereus harbored a single MTase corresponding to 992 motif sites (Supplementary Fig. 11a). Dorado’s low accuracy in B. cereus aligned with our previous observations in Psph ΔhsdMSR, suggesting performance limitations in low-abundance methylation contexts. The optimized Dorado protocol exhibited significant improvements compared to the baseline version of Dorado, with higher F1 and precision scores across all strains, most notably in Pst, where the F1 score increased by 0.261 (Fig. 6c). The incorporation of WGA samples as control groups in Tombo’s native statistical framework precluded the need for additional optimization (Fig. 6c; Supplementary Fig. 11d, e). In Pst and Psph, both Dorado and the optimized method failed to identify all 6mA sites on the small plasmid (Supplementary Fig. 12a). This underscored the importance of tool optimization for specific genomic contexts and the potential need for further refinement to improve detection capabilities across diverse genomic elements.
Outlier analysis reveals SMRT motif overfitting in E. coli K-12
By quantifying the adenine site coverage across all tools, we compared the total adenine sites and canonical 6mA motif occurrences across five bacterial strains (Fig. 6d, e). We observed that SMRT predictions closely matched the number of canonical 6mA sites (mean fold change: 0.993) (Fig. 6e). While Dorado showed a slight over-prediction (mean fold change: 1.255), it demonstrated lower sensitivity at the WGA set, identifying an average 0.34-fold of the canonical sites (Fig. 6e). We integrated previously published SMRT sequencing data from E. coli K-12 ∆dam/dcm strain alongside WT SMRT sequencing results to evaluate the two established methylation motifs in E. coli K-1240. The overlap pattern revealed the loss of 27,770 GATC motifs after knocking out the orphan MTase Dam. Unexpectedly, 6mA motifs AAC-N6-GTGC predicted to be unaffected by dam knockout were also absent from the analysis (Fig. 6g). Dorado’s outlier analysis in E. coli K-12 demonstrated reduced methylation calls compared to WT samples. However, 1274 sites absent in WT samples were still detected (Fig. 6h). We incorporated recently published DR-6mA-seq data, which utilizes 2-thio-dTTP to generate distinctive mutation signatures at 6mA-modified positions66. Three-way comparative analysis in E. coli K-12 revealed complex intersection patterns (Fig. 6i). While 9536 methylation sites were consistently identified across all three approaches, each method also generated unique predictions: Dorado identified 9517 sites; SMRT detected 1143 sites; and DR-6mA-seq yielded 4238 sites that showed no overlap with other methods or established ground truth. The biological authenticity of these method-specific 6mA positions in E. coli K-12 required validation through advanced detection strategies.
Additionally, given that previous studies indicate bacterial 6mA motif methylation levels typically approach 95% instead of fully methylated, and despite methylation in bacteria being primarily MTase-specific motif driven, we investigated the potential impacts of these characteristics on our assessment methodology48,59,67. We compared two different ground truth datasets in E. coli K-12: MTase-corresponding motif sites and 6mA sites validated by triplicate DR-6mA-seq experiments. While the relative performance ranking of tools remained consistent, using DR-6mA-seq-verified sites as ground truth revealed lower overall F1 scores (Supplementary Fig. 12b). This evidence further supported our experimental design.
Discussion
Advanced technologies can identify the MTase system or classify it as an orphan MTase based on the sequence identity. However, more advanced sequencing techniques are required to clarify the methylation frequency and distribution that MTases brought to genomes8. In this study, we have summarized the tools currently available for bacterial 6mA research. In addition to seven Nanopore tools and SMRT sequencing highlighted in this work, we identified numerous tools for Nanopore that attempt to detect 6mA. However, many of these tools, such as SignalAlign and methBERT, are outdated or do not support the latest sequencing platforms, such as R9.4.1 or R10.4.168. Additionally, some tools such as DeepMP and DeepSignal are limited to detecting specific motifs (for example, GATC)69,70. These tools inherently lack the capability to detect MTase types other than the Dam, which restricts their utility in comprehensive 6mA detection.
Nanopore tools exhibit considerable diversity and can primarily be categorized into two types based on their inherent input requests: comparison mode tools and single mode tools71. Seven Nanopore tools were systematically evaluated to guide future research on bacterial 6mA profiling using Nanopore sequencing. Our results revealed variable performance in terms of the F1 score, two continuous evaluation metrics, PRC and ROC, and the outliers. The benchmarking results highlighted two top-performing Nanopore tools. One is Tombo_levelcom, a comparison mode tool designed for R9.4.1 sequencing data. Despite its strong performance, we did not consider Tombo_levelcom to be the best Nanopore tool due to its lack of support for R10 sequencing data, the low base resolution, and unstable performance across other strains. Nevertheless, its statistical comparison method, which objectively and accurately predicted true 6mA motifs and sites in 5-mer bases solely on the differences between two sets of current signals, underscored its potential for identifying modifications by detecting signal variations caused by modified bases. Another top-performing tool is Dorado, which supports R10.4.1 sequencing data and delivers highly accurate 6mA predictions. Dorado’s advanced 6mA monitoring model requires only the experimental DNA sequencing results and can achieve single-nucleotide resolution. By examining Dorado, we found that removing the intersection of WT/∆hsdMSR and WGA results led to better overlap with ground truth, forming the basis for our optimized method. This finding highlighted the ongoing need for eliminating FP and suggested that WGA sequencing should be used as a blank control to enhance the precision of 6mA detection tools. Additionally, consistency was observed in Dorado across six different sequencing depths. While we recommend a minimum sequencing depth of 50× based on our results, we found that Dorado achieved its peak F1 score at a maximum depth of 450×. 
This indicated that even deeper sequencing could potentially improve Dorado’s accuracy further, though the upper limit remains unknown. Moreover, Dorado failed to detect the E. coli K-12 AAC-N6-GTGC motif, instead identifying TCA-N6-TGCT, which lacks a corresponding MTase. This suggested limitations in Dorado’s basecalling models for distinguishing the ionic current signals of this motif.
However, a significant challenge with Nanopore sequencing is that the impact on the current signal is rarely at single-base resolution, often affecting an interval of three to six nucleotides33. Numerous studies have shown that the mismatches observed in Nanopore basecalling may be affected by methylation or other modifications, consistent with our results72,73. Therefore, we proposed the development of Nanopore analytical tools based on a comparison of the current signals and mismatches between WT and WGA to provide more accurate modification site information. A model based on machine learning or deep learning algorithms, such as Dorado, may provide better single-base resolution.
Additionally, we evaluated the performance of SMRT with seven Nanopore tools. Previous experimental results have demonstrated that SMRT sequencing can accurately detect 6mA in Pseudomonas45,59. In our systematic evaluation across six bacterial strains and one HsdMSR mutant strain, SMRT consistently demonstrated superior performance compared to other tools. These rigorous comparative studies established SMRT as the most reliable tool for bacterial 6mA detection currently.
However, our findings underscored the limitations of relying on a single technique for de novo research. The conducted comprehensive cross-validation studies compared SMRT sequencing with Dorado results from WT, ∆hsdMSR and WGA samples in Psph against established 6mA motifs and 6mA-IP data, revealing tool-specific limitations. Similarly, in E. coli K-12, we analyzed SMRT sequencing data from both WT and ∆dam/dcm strains, validating results against DR-6mA-seq data. The complex patterns emerging from these intersecting analyses highlighted the urgent need for establishing a gold standard in bacterial 6mA detection, particularly given the inherent uncertainty of results from any single tool without experimental validation. Thus, we generated this cross-comparison report and revealed that even the most accurate tools might miss certain signals or incorrectly identify others. Current methylation analysis methods (SMRT, Nanopore, or antibody-based approaches 6mA-IP and chemical-required experiment DR-6mA-seq) individually provide adequate motif identification and base-resolution detection for research applications. Some sites consistently undetected by all methods might represent unique genomic regions that are not fully methylated during the specific phase analyzed. Various hypotheses could explain these observations. We proposed that complex DNA structures may hinder MTases from accessing these sites and transferring methyl groups effectively. Additionally, the spatial configuration of DNA sequences could potentially lead to erroneous methylation patterns74,75. However, without spatial evidence, these interpretations remain speculative. Improving Nanopore’s 6mA detection accuracy is crucial for bacterial methylome research, given the costly and complex nature of alternative methods like SMRT sequencing, 6mA-IP-seq and DR-6mA-seq.
Current bacterial 6mA detection lacks a definitive gold standard for tool evaluation. The benchmark of methylation site detection tools has historically followed several approaches, including next-generation sequencing results, intersections of multiple detection tools, or motif-based identification38,39,40,71,76. While we utilized MTase-corresponding motif sites as reference points, which is supported by extensive research showing MTase-driven motif methylation at ~95% levels, the absolute methylation status remains difficult to determine. Furthermore, our comparison of two different ground truth sets in E. coli K-12 strengthened confidence in the MTase-based approach, given the consistently higher F1 scores observed77,78.
Overall, this study presents a comprehensive comparison of results from SMRT and seven Nanopore-focused computational tools for bacterial 6mA. Although current evaluation metrics may not be flawless to assess tool performance, the observed variations in data processing algorithms among these methods merit detailed investigation. Notably, our proposed optimized method remained applicable when fresh data were used to verify our conclusions. We also observed that 6mA sites consistently identified by both the ONT tools and SMRT can be visually distinguished after the visualization of Nanopore current signals and mismatch rates. This observation underscores the need for improvement of the 6mA detection algorithm in TGS and a comparative tool tailored for Nanopore R10.4.1 sequencing results. In summary, we conducted a comprehensive comparison of computational tools for 6mA mapping. The insights pertaining to motif discovery capabilities, 6mA profiling preferences, and inherent biases are expected to facilitate the development of detection strategies and workflows, aiding microbiologists in their 6mA detection efforts.
Methods
Bacteria culture, DNA extraction, WGA production, and Nanopore sequencing
Psph 1448A WT strain, ∆hsdMSR strain, and Pst DC3000 WT strain were cultured at 28 °C in King’s B (KB) medium with shaking at 220 rpm for 18 h. Rifampin (25 μg/mL) was added as the antibiotic. Escherichia coli 980-2, Klebsiella pneumoniae 17ZR-21, and Bacillus cereus ATCC 14579 were cultured at 37 °C in LB with shaking at 220 rpm overnight.
DNA was extracted with the TIANamp Bacteria DNA kit (Tiangen Biotech, Cat No. GDP302-02). WGA DNA was prepared with the REPLI-g Mini Kit (QIAGEN, Cat No. 150023) and purified with the QIAquick PCR Purification Kit (QIAGEN, Cat No. 28104). DNA quality was checked with Nanodrop and Qubit. DNA with an OD 260/280 ratio around 1.8 and an OD 260/230 ratio around 2.0–2.2 was required. For flow cell R9.4.1, the Rapid Sequencing Kit (SKU: SQK-RAD004) was used to construct the nanopore sequencing library, and the Flow Cell Wash Kit (SKU: EXP-WSH004) was applied to wash the chips. For flow cell R10.4.1, the Native Barcoding Kit 24 V14 (SKU: SQK-NBD114.24) was used to construct the library, and the Flow Cell Wash Kit (SKU: EXP-WSH004) was applied for flow cell washing.
Sequencing data and quality control
The Psph ∆hsdMSR reads used in this analysis were subsampled from the original sequencing results. Psph WGA DNA was sequenced twice; the estimated features of the two runs were compatible, so the runs were combined as WGA_merged. We used minimap2 (v2.17) and samtools (v1.17) to filter unaligned reads79,80. The basecalling results were stored as FASTQ files and were statistically analyzed with SeqKit v2.3.081. Raw read features, including read length and Q score, were extracted using Giraffe_View v0.1.0.1482.
SMRT sequencing and SMRT Link analysis
Single-molecule real-time sequencing was performed on the Sequel II system using standard SMRTbell prep kit 3.0 preparation protocols. HiFi reads were exclusively selected for 6mA detection analysis. Raw sequencing data were processed using SMRT Link (v13.0), and methylation sites were identified using ipdSummary with a significance threshold of p < 0.05. For a comprehensive assessment, we included predictions annotated as either “modified_base” or “m6A” in the motif-, site-, and false call-level evaluations. Single-molecule level analysis was performed exclusively on predictions labeled as “m6A”, as the modification fractions were only calculated after the identification of 6mA.
Tools for identifying 6mA modifications in Nanopore sequencing data
mCaller is a Python script that requires the Nanopolish (v0.14.1) processed file. First, basecalling was performed, and we then used Nanopolish to index the raw fast5 files with the fastq output. Next, Nanopolish eventalign aligned the signal to k-mers of the provided reference genome. Finally, mCaller.py was executed to compute the 6mA probability at all A sites.
Tombo (v1.5) requires raw sequencing fast5 of Nanopore R9.4.1. The fast5 files were transformed into single_read_fast5 files before further processing. This was done with a tool provided by ont_fast5_api (v4.1.1). Tombo requires three steps before calling the 6mA sites: (1) Basecalling the fast5s and getting the fastq results. (2) Tombo preprocess adding the fastq into fast5 files. (3) Tombo resquiggling the signals to reference. Afterward, three methods in Tombo were employed for DNA 6mA calling.
(a) Tombo_denovo is a method for the discovery of de novo motifs and any type of modified bases. The model of Tombo_denovo is designed for de novo detection of modified bases without the need for a pre-trained model. The resulting data file assigns a fractional value, ranging between 0 and 1, to each identified site.
(b) Tombo_modelcom provides a comparison method for all types of modified bases. Tombo_modelcom adjusts its model using a control set (usually non-modification DNA). The fresh model is then applied similarly to the Tombo_denovo methods, identifying deviations. WGA is recommended for control settings. The output file contains a fraction value for each site, and the fractions vary from 0 to 1.
(c) Tombo_levelcom is another comparison mode tool for all types of modifications. This method compares the difference at signal levels between the two sets of input. Also, Tombo_levelcom applies three types of test methods: KS-test, U-test, and T-test. We kept the p value of the KS test for further analysis.
Nanodisco (v1.0.3) is a de novo tool to identify methylation based on the current signal difference between whole-genome sequences and negative controls. To detect the methylation motifs within our bacteria samples, we used the functions “preprocess”, “chunk_info”, “difference”, “merge”, and “motif” of Nanodisco with default arguments on R9.4.1 datasets. The resulting files named “sampleID.RDS” from the “difference” function included the position, mean current difference (pA), and p values (for the t test and the Mann-Whitney U test) for each analyzed site. Specifically, we referred to the p value of the t test. The RDS files were utilized for the motif discovery study.
Dorado (v0.7.3) is a tool developed by ONT. For flow cell R10.4.1, the pod5 file format is recommended rather than fast5 in the Dorado pipelines. Modified base detection can be performed together with basecalling, and each type of modification has its corresponding model. After modified basecalling, the output bam file needs to be aligned to the reference genome and then sorted with samtools (v1.9). The sorted bam files can be converted into bed files with modkit (https://github.com/nanoporetech/modkit). Importantly, we used the “include-bed” option to export only A sites.
Hammerhead (v0.1.3) is a tool to de novo identify the methylations based on the specific strand errors on R10.4.1 reads. To get the difference index (a value to judge the probability of methylation ranging from 0 to 1), for total sites, we adjusted the “cutoff” argument from the default (0.35) to 0. Hammerhead developed the strand-comparison algorithm, considering the fraction of difference index as representing the modification probability of both bases on the two strands. To assess the performance of Hammerhead in motif detection, all top 10,000 A sites sorted in descending order of assigned values were utilized for downstream motif enrichment.
Four sets of data and the outliers
In this research, we constructed three sets of data for comparison to ground truth. For the WT and ∆hsdMSR sets, comparison mode tools (Tombo_modelcom, Tombo_levelcom, and Nanodisco) processed the WT DNA or ∆hsdMSR DNA sequencing results and compared them to the WGA DNA signal; single mode tools (mCaller, Tombo_denovo, Dorado, and Hammerhead) produced the results from only WT or ∆hsdMSR DNA sequencing data. For the LOST set, we obtained the results either by comparing the WT signal to the ∆hsdMSR signal with comparison mode tools, or by processing the WT and ∆hsdMSR signals separately with single mode tools and identifying the lost sites by matching the two results.
WGA sequencing results were analyzed using all seven tools and the predictions were named outliers, serving as false calls. Single mode tools (SMRT, mCaller, Tombo_denovo, Dorado, and Hammerhead) processed the WGA sequencing results as input directly. For comparison mode tools (Tombo_modelcom, Tombo_levelcom, and Nanodisco), we first separated the WGA raw sequencing results (pod5 files) into two parts randomly, then compared one half set of WGA to the other.
The optimized method
Experiment DNA (native DNA or ∆hsdMSR DNA in this research) and whole genome amplification (WGA) DNA should be sequenced consecutively. Comparison mode tools (Tombo_modelcom, Tombo_levelcom, and Nanodisco) should first produce results by comparing experiment DNA with WGA; secondly, the WGA sequencing files are separated randomly into two halves, which are compared to each other. Single mode tools (SMRT, mCaller, Tombo_denovo, Dorado, and Hammerhead) call methylation first on the experiment DNA sequencing result alone and then on WGA. After the two sets of predictions are obtained and the specific filtration is applied, sites called in WGA are deleted from the experiment DNA analysis result. Note that the same cutoff is applied to the WGA output as to the experiment DNA output.
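The subtraction step can be illustrated with a minimal Python sketch. It assumes that each tool's output has already been reduced to a mapping from a site key (contig, position, strand) to an assigned value; the function name `subtract_outliers`, the data layout, and the use of `>=` at the cutoff are illustrative assumptions and not part of any tool's interface.

```python
def subtract_outliers(experiment, wga, cutoff):
    """Keep experiment sites that pass the cutoff and are not called in WGA.

    experiment, wga: dicts mapping (contig, position, strand) -> assigned value.
    The same cutoff is applied to both inputs, as described in the text.
    """
    # Sites called in the methylation-free WGA control are treated as outliers.
    wga_calls = {site for site, value in wga.items() if value >= cutoff}
    return {
        site: value
        for site, value in experiment.items()
        if value >= cutoff and site not in wga_calls
    }


# Hypothetical toy data: three WT calls, one of which is also called in WGA.
wt = {("chr", 100, "+"): 0.95, ("chr", 250, "-"): 0.91, ("chr", 400, "+"): 0.40}
wga = {("chr", 250, "-"): 0.93, ("chr", 400, "+"): 0.10}
kept = subtract_outliers(wt, wga, cutoff=0.9)
```

In this toy example only the site at position 100 survives: position 250 is removed because it is also called in WGA, and position 400 falls below the cutoff.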
Dorado downsampling
The raw data from R10.4.1 Psph WT Nanopore sequencing exhibited an average coverage of 450×. We randomly selected subsets of the raw pod5 dataset, comprising 37, 18, 9, 4, and 2 files from a total of 74 files, by using the shell command “find fast5/ -type f -print0 | shuf -zn ${file_numbers} | xargs -0 cp -t dSample/”. Following selection, we conducted de novo basecalling for modification detection using Dorado and extracted the methylation bed files with modkit. The sequencing depth across all six coverage levels displayed the following characteristics: average coverages of 450×, 199×, 97×, 48×, 21×, and 10×, with median values of 362×, 167×, 86×, 34×, 14×, and 5×, respectively.
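For readers who prefer a scripted and seedable alternative to the shell pipeline above, the file-level subsampling can be sketched in Python as follows. This is only an equivalent under stated assumptions: the function name `copy_random_subset` and the optional `seed` argument are hypothetical, and files are copied flat into the destination directory, as `cp -t` does.

```python
import random
import shutil
from pathlib import Path


def copy_random_subset(src_dir, dst_dir, n_files, seed=None):
    """Copy n_files randomly chosen regular files from src_dir into dst_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Recursive search for regular files, mirroring `find <dir> -type f`.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    rng = random.Random(seed)
    # Sampling without replacement, mirroring `shuf -n`.
    chosen = rng.sample(files, n_files)
    for path in chosen:
        shutil.copy(path, dst / path.name)
    return chosen
```

Calling the function with `n_files` set to 37, 18, 9, 4, or 2 would reproduce the subset sizes listed above; fixing `seed` makes a given subset reproducible.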
Normalization of all result files and assigned value calculation
In different domains, tools can be divided into different groups. According to the adapted flow cell type, mCaller, Tombo, and Nanodisco belong to the R9.4.1 group; Dorado and Hammerhead are R10.4.1 tools. Based on the input sequencing files needed and the algorithm of each tool, mCaller, Tombo_denovo, Dorado, and Hammerhead were named as single mode tools; Tombo_modelcom, Tombo_levelcom, and Nanodisco were labeled as comparison mode tools as they needed additional sequencing file as control.
mCaller, Tombo_denovo, Tombo_modelcom, Dorado, and Hammerhead provide a fraction at each base, while Tombo_levelcom and Nanodisco provide a p value at each position. Initially, we transformed the p value to its negative logarithm, -log10(p value), to yield a positive value that suggests the probability of a modification. SMRT provides a Phred-transformed score of the p value for all predictions. Subsequently, we normalized the fractions, -log10(p value), and the score by performing min-max scaling with boundary protection. The assigned values were calculated according to the following equation:
where \({V}_{{norm}}\) represents the assigned value, \({V}_{{orig}}\) represents the original value, \({V}_{max }\) and \({V}_{min }\) are the maximum and minimum values in the dataset, respectively, and ε is set to 1×10^-6 to prevent boundary effects. This transformation mapped all values to the range \([\varepsilon,1-\varepsilon ]\), ensuring numerical stability in subsequent analyses.
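A minimal Python sketch of this normalization is given below. It assumes the affine form \({V}_{{norm}}=\varepsilon +(1-2\varepsilon )({V}_{{orig}}-{V}_{min })/({V}_{max }-{V}_{min })\), which is one formula consistent with the stated output range \([\varepsilon,1-\varepsilon ]\); clipping a plain min-max result to that range would also fit the description. The function names, the floor applied to p values of zero, and the handling of a constant input are illustrative assumptions.

```python
import math

EPS = 1e-6  # boundary-protection constant, as stated in the text


def neg_log10(p_value, floor=1e-300):
    """Return -log10(p); the floor (an assumption) avoids log(0)."""
    return -math.log10(max(p_value, floor))


def assign_values(raw_values, eps=EPS):
    """Min-max scale raw values into [eps, 1 - eps] (assumed affine form)."""
    v_min = min(raw_values)
    v_max = max(raw_values)
    if v_max == v_min:
        # Degenerate input: every value is identical (assumed handling).
        return [0.5 for _ in raw_values]
    span = v_max - v_min
    return [eps + (1 - 2 * eps) * (v - v_min) / span for v in raw_values]


# Hypothetical p values from a comparison mode tool.
p_values = [1e-10, 1e-5, 1.0]
assigned = assign_values([neg_log10(p) for p in p_values])
```

Here the smallest p value receives the largest assigned value, so all tools can be ranked in the same direction regardless of whether they report fractions, p values, or Phred-like scores.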
Motif discovery
The R version of MEME was used to identify the motifs. First, DNA sequences of length 30–40 bp were extracted, centered on the methylation site provided. We then used runStreme to discover motifs, and the output format followed “universalmotif_df”. Control was set to “shuffle”, as this will automatically select some input sequences as the background set. Motifs were visualized with “view_motifs”.
Additionally, methylation motif analysis was performed using three complementary approaches. ModKit (v0.3.2) was employed to identify methylation motifs from Dorado results using a high-confidence threshold of 0.8. MotifMaker (from SMRT Link 13.0) was used to detect methylation-associated sequence patterns by analyzing the GFF modification files against the reference sequence.
5-mer shift for Tombo_denovo, Tombo_modelcom and Tombo_levelcom
In Nanopore sequencing, a modification at any position can influence the ionic current measurements of neighboring nucleotides. To mitigate the influence of ionic current dispersion effects in tools based on direct signal comparison, we implemented a 5-mer shift strategy in our profile capabilities analysis of Tombo tools. For each putative modification site, we generated four additional data frames by shifting the base positions by -2, -1, +1, and +2 nucleotides relative to the original position. These five data frames (including the original) were merged to create a comprehensive 5-mer window analysis. The combined data was then sorted by modification probability values in descending order. For cases where multiple shifted positions identified the same site, we retained only the instance with the highest probability score and removed duplicates. This approach optimizes modification detection by considering the broader sequence context inherent to Nanopore current measurements.
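The shift-merge-deduplicate procedure can be sketched in plain Python as follows. The sketch assumes that predictions are held as a mapping from (contig, position, strand) to a probability-like score; the function name `five_mer_shift` is hypothetical, and the sketch does not check contig boundaries or whether a shifted position is an adenine.

```python
def five_mer_shift(sites):
    """Expand each site to its 5-mer window and keep the best score per position.

    sites: dict mapping (contig, position, strand) -> score.
    Returns a list of ((contig, position, strand), score) sorted by
    descending score.
    """
    best = {}
    for (contig, pos, strand), score in sites.items():
        # Offsets -2..+2 correspond to the original frame plus four shifted frames.
        for offset in (-2, -1, 0, 1, 2):
            key = (contig, pos + offset, strand)
            # When several shifted frames hit the same position, keep the highest score.
            if score > best.get(key, float("-inf")):
                best[key] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy input: two nearby calls with overlapping 5-mer windows.
calls = {("chr", 10, "+"): 0.9, ("chr", 12, "+"): 0.5}
shifted = five_mer_shift(calls)
```

In this toy input the two windows overlap at positions 10 to 12, which therefore keep the higher score of 0.9, while positions 13 and 14 carry 0.5.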
Evaluation matrix and F1 score calculation
In the three sets categorized before, results analyzed with SMRT and seven Nanopore tools were compared to the ground truth. We applied 200 distinct cutoff thresholds for data filtration, retaining predictions ranked above each threshold based on their assigned values. For single-molecule analysis, modification fraction values were used as filtering criteria. The following computation iterated over each cutoff value and performed the same assessments.
For Dorado and mCaller, no 5-mer shift was involved. The TP were identified as the intersection between the predictions and the ground truth. The true negatives (TN) were calculated as the difference between the set of all A sites and the union of predictions and ground truth, representing sites correctly identified as negative. The FP were the predicted positives that were not TP. The false negatives (FN) were the ground truth positives that were not predicted as positive. Precision was defined as the ratio of TP to all predicted positives. Recall was the ratio of TP to all actual positives.
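These set definitions translate directly into set operations. The sketch below is illustrative only: sites are represented as plain integers and the function names are hypothetical.

```python
def confusion_counts(predicted, truth, all_a_sites):
    """Count TP, FP, FN and TN from sets of site identifiers."""
    predicted, truth, all_a_sites = set(predicted), set(truth), set(all_a_sites)
    tp = predicted & truth                   # predicted and in the ground truth
    fp = predicted - tp                      # predicted but not true
    fn = truth - predicted                   # true but not predicted
    tn = all_a_sites - (predicted | truth)   # neither predicted nor true
    return len(tp), len(fp), len(fn), len(tn)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean, with zero-division guards."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


tp, fp, fn, tn = confusion_counts({1, 2, 3, 4}, {3, 4, 5}, range(1, 11))
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

In the evaluation described above, the same computation would be repeated once per cutoff, with `predicted` being the sites retained at that cutoff.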
For tools requiring a 5-mer shift (Tombo_denovo, Tombo_modelcom, and Tombo_levelcom), precision and recall were calculated differently. The TP for precision were the intersection of the predictions and the 5-mer context of the ground truth. The TP for recall were the intersection of the ground truth and the 5-mer context of the predictions. Accordingly, the FP were the predicted positives that were not TP for precision, and the FN were the ground truth sites that were not TP for recall.
The F1 score is the harmonic mean of precision and recall, \(F1=2\times \mathrm{Precision}\times \mathrm{Recall}/(\mathrm{Precision}+\mathrm{Recall})\). The definitions of TP, FP and FN with shifting follow previous studies83,84.
AP and AUC score calculation
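The asymmetric definitions for shifted tools can be sketched as below. Sites are assumed to be `(contig, strand, position)` tuples and the 5-mer context is taken as positions within ±2 nt; both the representation and the function names are illustrative assumptions.

```python
def five_mer_context(sites, half_width=2):
    """All positions within +/- half_width of any site, on the same contig and strand."""
    return {
        (contig, strand, pos + off)
        for contig, strand, pos in sites
        for off in range(-half_width, half_width + 1)
    }


def shifted_precision_recall(predicted, truth):
    """Precision and recall with separate TP definitions, plus the resulting F1."""
    predicted, truth = set(predicted), set(truth)
    # TP for precision: predictions falling in the 5-mer context of the truth.
    tp_precision = predicted & five_mer_context(truth)
    # TP for recall: true sites falling in the 5-mer context of the predictions.
    tp_recall = truth & five_mer_context(predicted)
    precision = len(tp_precision) / len(predicted) if predicted else 0.0
    recall = len(tp_recall) / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


pred = {("chr", "+", 12), ("chr", "+", 50)}
true = {("chr", "+", 10), ("chr", "+", 30)}
precision, recall, f1 = shifted_precision_recall(pred, true)
```

Because FP are the predictions that are not TP for precision, the precision denominator is simply the number of predictions; likewise the recall denominator is the number of ground truth sites.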
The AP was calculated as the area under the PRC. To achieve this, both the precision and recall vectors were padded with the value 0 at the beginning and 1 at the end. This adjustment ensured that the curve started at the origin (0,0) and ended at the point (1,1), which is essential for capturing the complete behavior of the model from the lowest to the highest threshold. A smooth approximation of the PRC was used to calculate the AP accurately. Similarly, the ROC curve was constructed from two key performance indicators: the True Positive Rate (TPR) and the FPR. The TPR, also known as sensitivity, is the ratio of TP to the total actual positives (ground truth), indicating the model’s ability to correctly identify positive instances. The FPR is the ratio of FP to the sum of TN and FP, representing the proportion of negative instances that are incorrectly classified as positive. The ROC curve plots TPR against FPR at various threshold settings, and the AUC score was derived by integrating the area under this curve using the same procedure.
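The padding and integration steps can be illustrated for the ROC curve as follows. The integration scheme is an assumption: the text mentions a smooth approximation without specifying it, so this sketch substitutes the plain trapezoidal rule. Function names are hypothetical.

```python
def trapezoid_area(x, y):
    """Area under a piecewise-linear curve through the (x, y) points."""
    points = sorted(zip(x, y))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def roc_auc(fpr, tpr):
    """AUC after padding the curve so it runs from (0, 0) to (1, 1)."""
    padded_fpr = [0.0] + list(fpr) + [1.0]
    padded_tpr = [0.0] + list(tpr) + [1.0]
    return trapezoid_area(padded_fpr, padded_tpr)


# One (FPR, TPR) pair per cutoff; two cutoffs in this toy example.
auc = roc_auc([0.0, 0.5], [0.5, 1.0])
```

The same helper could be applied to padded recall and precision vectors to approximate the AP, with the caveat that the trapezoidal rule may differ slightly from the smoothing used in the study.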
6mA-IP-seq
Genomic DNA was extracted from stationary-phase bacterial cultures (two single colonies) with RNase A treatment and sonicated to ~150 bp fragments. A 5× IP buffer (50 mM Tris-HCl, 750 mM NaCl, 0.5% v/v IGEPAL CA-630) was prepared. Dynabeads Protein A magnetic beads (Invitrogen, Cat. No. 10001D) were washed with 1× IP buffer and pre-blocked with bovine serum albumin (20 μg/μL; Sigma-Aldrich, CAS Number: 9048-46-8) at 4 °C for 6 h. The elution buffer was prepared by combining IP buffer, TE and Proteinase K. Genomic DNA (5 μg) was denatured in Tris-EDTA buffer at 95 °C for 10 min and then chilled on ice. A portion was reserved as the input control. The remaining DNA was incubated with anti-6mA antibody (15 μg; Synaptic Systems, Cat. No. 202 003) in 1× IP buffer at 4 °C for 6 h, followed by overnight incubation with pre-blocked Protein A beads at 4 °C with gentle rotation. After three washes with 1× IP buffer, methylated DNA was eluted twice using elution buffer containing 6mA monophosphate. The combined eluates were precipitated using sodium acetate, ethanol, and glycogen at −80 °C overnight. The library was prepared by PCR amplification (15–18 cycles) and purified for Illumina sequencing. Peak calling was performed using MACS2 (v2.2.6), with input DNA sequencing alignments serving as controls. Significant peaks were identified using a stringent q value threshold of 0.001.
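For readers reproducing the peak-calling step, the sketch below assembles a MACS2 `callpeak` command with the IP sample as treatment, the input as control and a q value cutoff of 0.001. The file names, sample name, genome size and output directory are hypothetical placeholders, and any further options used in the study are not stated in the text.

```python
def build_macs2_command(ip_bam, input_bam, genome_size, name, outdir="macs2_out"):
    """Return the argument list for a MACS2 callpeak run with an input control."""
    return [
        "macs2", "callpeak",
        "-t", ip_bam,            # treatment: 6mA-IP alignments
        "-c", input_bam,         # control: input DNA alignments
        "-f", "BAM",             # alignment file format
        "-g", str(genome_size),  # effective genome size in bp (placeholder value)
        "-n", name,              # prefix for output files
        "-q", "0.001",           # q value cutoff stated in the text
        "--outdir", outdir,
    ]


cmd = build_macs2_command("ip.bam", "input.bam", 6_000_000, "sample1")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where MACS2 is installed.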
Current feature showcase with nanoCEM
nanoCEM (version 0.0.5.8) was utilized for displaying Nanopore sequencing current. All sequencing files were transformed into blow5 format by blue-crab (https://github.com/Psy-Fer/blue-crab) and slow5tools (https://github.com/hasindu2008/slow5tools). We selected the f5c_ev mode with the options “--norm” and “--pore r9” for R9 data, and “--pore r10” for R10 data.
Statistics and reproducibility
The study was designed to evaluate the performance of third-generation sequencing tools for bacterial 6mA profiling, with all experiments conducted under controlled conditions to ensure consistency and reproducibility. Apart from the quality control process and random sampling mentioned above, no statistical method was used to predetermine the sample size. No data were excluded from the analyses.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw Nanopore sequencing, SMRT, and 6mA-IP-seq data generated in this study have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database with the reference code BioProject PRJNA1119015. The processed data are available at Zenodo [https://doi.org/10.5281/zenodo.15165726]85. The benchmark evaluation results generated in this study are provided in the Source Data file. Source data are provided with this paper.
Code availability
The main benchmark pipeline and scripts used for this paper are available at https://github.com/LuBeifang/TGS-comparison-for-bacteria-6mA (ref. 86).
References
Johnson, T. B. & Coghill, R. D. Researches on pyrimidines. CIII. The discovery of 5-methyl-cytosine in tuberculinic acid, the nucleic acid of the tubercle bacillus. J. Am. Chem. Soc. 47, 2838–2844 (1925).
Waddington, C. H. The epigenotype. Endeavour 1, 18–20 (1942).
Knelman, F., Dombrowski, N., Newitt, D. M. & Woodcock, A. H. Occurrence of a new base in the deoxyribonucleic acid of a strain of Bacterium coli. Nature 175, 336–337 (1955).
McCarthy et al. The enzymatic methylation of RNA and DNA, II. On the species specificity of the methylation enzymes. Proc. Natl. Acad. Sci. USA 50, 164–169 (1963).
Mattei, A. L., Bailly, N. & Meissner, A. DNA methylation: a historical perspective. Trends Genet. 38, 676–707 (2022).
Arber, W. Host-controlled modification of bacteriophage. Annu. Rev. Microbiol. 19, 365–378 (1965).
Vasu, K. & Nagaraja, V. Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol. Mol. Biol. Rev. 77, 53 (2013).
Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE—a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 43, D298–D299 (2015).
Rodic, A., Blagojevic, B., Zdobnov, E., Djordjevic, M. & Djordjevic, M. Understanding key features of bacterial restriction-modification systems through quantitative modeling. BMC Syst. Biol. 11, 1 (2017).
Beaulaurier, J., Schadt, E. E. & Fang, G. Deciphering bacterial epigenomes using modern sequencing technologies. Nat. Rev. Genet. 20, 157–172 (2018).
Enikeeva, F. N., Severinov, K. V. & Gelfand, M. S. Restriction–modification systems and bacteriophage invasion: who wins? J. Theor. Biol. 266, 550–559 (2010).
Shaw, L. P., Rocha, E. P. C. & Maclean, R. C. Restriction-modification systems have shaped the evolution and distribution of plasmids across bacteria. Nucleic Acids Res. 51, 6806–6818 (2023).
Gao, Q. et al. Bacterial DNA methyltransferase: A key to the epigenetic world with lessons learned from proteobacteria. Front. Microbiol. 14, 1129437 (2023).
Kusano, K., Naito, T., Handa, N. & Kobayashi, I. Restriction-modification systems as genomic parasites in competition for specific sequences. Proc. Natl. Acad. Sci. USA 92, 11095–11099 (1995).
Marinus, M. G. & Løbner-Olesen, A. DNA methylation. EcoSal Plus 6, 10–1128 (2014).
Low, D. A., Weyand, N. J. & Mahan, M. J. Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect. Immun. 69, 7197 (2001).
Goldfarb, T. et al. BREX is a novel phage resistance system widespread in microbial genomes. EMBO J. 34, 169 (2015).
Hui, W. et al. Functional analysis of the second methyltransferase in the bacteriophage exclusion system of Lactobacillus casei Zhang. J. Dairy Sci. 105, 2049–2057 (2022).
Zhao, J. et al. Roles of adenine methylation in the physiology of Lacticaseibacillus paracasei. Nat. Commun. 14, 1–12 (2023).
Lyu, C., Wang, H. D., Lai, W. & Wang, H. Identification and quantification of DNA N6-methyladenine modification in mammals: a challenge to modern analytical technologies. Curr. Opin. Chem. Biol. 73, 102259 (2023).
Lv, H., Dao, F. Y., Zhang, D., Yang, H. & Lin, H. Advances in mapping the epigenetic modifications of 5-methylcytosine (5mC), N6-methyladenine (6mA), and N4-methylcytosine (4mC). Biotechnol. Bioeng. 118, 4204–4216 (2021).
Stott, D. I. Immunoblotting and dot blotting. J. Immunol. Methods 119, 153–187 (1989).
Huang, W. et al. Determination of DNA adenine methylation in genomes of mammals and plants by liquid chromatography/mass spectrometry. RSC Adv. 5, 64046–64054 (2015).
Liu, J. et al. Abundant DNA 6mA methylation during early embryogenesis of zebrafish and pig. Nat. Commun. 7, 1–7 (2016).
Mahdavi-Amiri, Y., Chung Kim Chung, K. & Hili, R. Single-nucleotide resolution of N6-adenine methylation sites in DNA and RNA by nitrite sequencing. Chem. Sci. 12, 606–612 (2021).
Chen, X., Xu, H., Shu, X. & Song, C. X. Mapping epigenetic modifications by sequencing technologies. Cell Death Differ. 1–10. https://doi.org/10.1038/s41418-023-01213-1 (2023).
Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465 (2010).
Murray, I. A. et al. The methylomes of six bacteria. Nucleic Acids Res. 40, 11450 (2012).
Buck, D. et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 6, 100 (2017).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
Akeson, M., Branton, D., Kasianowicz, J. J., Brandin, E. & Deamer, D. W. Microsecond time-scale discrimination among polycytidylic acid, polyadenylic acid, and polyuridylic acid as homopolymers or as segments within single RNA molecules. Biophys. J. 77, 3227 (1999).
MacKenzie, M. & Argyropoulos, C. An introduction to nanopore sequencing: past, present, and future considerations. Micromachines 14, 459 (2023).
Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365 (2021).
Liang, S. et al. Noise in nanopore sensors: sources, models, reduction, and benchmarking. Nanotechnol. Precis. Eng. 3, 9–17 (2020).
Ferguson, S. et al. Species-specific basecallers improve actual accuracy of nanopore sequencing in plants. Plant Methods 18, 1–11 (2022).
Ni, Y., Liu, X., Simeneh, Z. M., Yang, M. & Li, R. Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing. Comput. Struct. Biotechnol. J. 21, 2352–2364 (2023).
Sereika, M. et al. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat. Methods 19, 823–826 (2022).
Yuen, Z. W. S. et al. Systematic benchmarking of tools for CpG methylation detection from nanopore sequencing. Nat. Commun. 12, 1–12 (2021).
Zhong, Z. D. et al. Systematic comparison of tools used for m6A mapping from nanopore direct RNA sequencing. Nat. Commun. 14, 1–14 (2023).
McIntyre, A. B. R. et al. Single-molecule sequencing detection of N6-methyladenine in microbial reference materials. Nat. Commun. 10, 1–11 (2019).
Stoiber, M. et al. De novo Identification of DNA modifications enabled by genome-guided nanopore signal processing. bioRxiv 094672. https://doi.org/10.1101/094672 (2017).
Rand, A. C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nat. Methods 14, 411–413 (2017).
Liu, X. et al. Nanopore strand-specific mismatch enables de novo detection of bacterial DNA modifications. Genome Res. https://doi.org/10.1101/GR.279012.124 (2024).
Wang, X. et al. Recent advances and application of whole genome amplification in molecular diagnosis and medicine. MedComm 3, e116 (2022).
Huang, J. et al. DNA methylome regulates virulence and metabolism in Pseudomonas syringae. Elife 13, RP96290 (2024).
Ginibre, N. et al. Diverse roles for a conserved DNA-methyltransferase in the entomopathogenic bacterium Xenorhabdus. Int. J. Mol. Sci. 23, 11981 (2022).
Gao, Q. et al. Bacterial DNA methyltransferase: a key to the epigenetic world with lessons learned from proteobacteria. Front. Microbiol. 14, 1129437 (2023).
Tourancheau, A., Mead, E. A., Zhang, X. S. & Fang, G. Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat. Methods 18, 491 (2021).
Clark, T. A. et al. Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucleic Acids Res. 40, e29 (2012).
O’Brown, Z. K. & Greer, E. L. N6-methyladenine: a conserved and dynamic DNA mark. Adv. Exp. Med Biol. 945, 213 (2016).
Li, X. et al. Quantitative profiling of DNA 6mA at single-base resolution using NAME-seq. https://doi.org/10.21203/RS.3.RS-2286301/V1 (2022).
Chen, L. Q. et al. High-precision mapping reveals rare N6-deoxyadenosine methylation in the mammalian genome. Cell Discov. 8, 1–11 (2022).
Liu, Q., Georgieva, D. C., Egli, D. & Wang, K. NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. BMC Genom. 20, 31–42 (2019).
Chicco, D. Ten quick tips for machine learning in computational biology. BioData Min. 10, 1–17 (2017).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
Bai, X. et al. DeepBAM: a high-accuracy single-molecule CpG methylation detection tool for Oxford nanopore sequencing. Brief Bioinform 25, bbae413 (2024).
Lou, R., Lalevic, D., Chambers, C., Zafar, H. M. & Cook, T. S. Automated detection of radiology reports that require follow-up imaging using natural language processing feature engineering and machine learning classification. J. Digit. Imaging 33, 131–136 (2020).
Zhang, Y. et al. Development of an automated screening system for retinopathy of prematurity using a deep neural network for wide-angle retinal images. IEEE Access 7, 10232–10241 (2019).
Doberenz, S. et al. Identification of a Pseudomonas aeruginosa PAO1 DNA methyltransferase, its targets, and physiological roles. mBio 8, e02312-16 (2017).
Fu, Y. et al. N6-methyldeoxyadenosine marks active transcription start sites in Chlamydomonas. Cell 161, 879 (2015).
Guo, Z. et al. Nanopore Current Events Magnifier (nanoCEM): a novel tool for visualizing current events at modification sites of nanopore sequencing. NAR Genom. Bioinform. 6, lqae052 (2024).
De La Cerda, G. Y. et al. Balancing read length and sequencing depth: Optimizing Nanopore long‐read sequencing for monocots with an emphasis on the Liliales. Appl. Plant Sci. 11, e11524 (2023).
Urel, H. et al. Nanopore- and AI-empowered metagenomic viability inference. bioRxiv 2024.06.10.598221 https://doi.org/10.1101/2024.06.10.598221 (2024).
Blow, M. J. et al. The epigenomic landscape of prokaryotes. PLoS Genet. 12, e1005854 (2016).
O’Brown, Z. K. et al. Sources of artifact in measurements of 6mA and 4mC abundance in eukaryotic genomic DNA. BMC Genom. 20, 1–15 (2019).
Feng, X. et al. Sequencing of N6-methyl-deoxyadenosine at single-base resolution across the mammalian genome. Mol. Cell 84, 596–610.e6 (2024).
Wang, M. X. & Church, G. M. A whole genome approach to in vivo DNA-protein interactions in E. coli. Nature 360, 606–610 (1992).
Zhang, Y. Z. et al. On the application of BERT models for nanopore methylation detection. Proceedings—2021 IEEE International Conference on Bioinformatics and Biomedicine BIBM 320–327. https://doi.org/10.1109/BIBM52615.2021.9669841 (2021).
Ni, P. et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics 35, 4586–4595 (2019).
Bonet, J. et al. DeepMP: a deep learning tool to detect DNA base modifications on Nanopore sequencing data. Bioinformatics 38, 1235–1243 (2022).
Liu, Y. et al. DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation. Genome Biol. 22, 1–33 (2021).
Rang, F. J., Kloosterman, W. P. & de Ridder, J. From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy. Genome Biol. 19, 1–11 (2018).
Wang, L., Qu, L., Yang, L., Wang, Y. & Zhu, H. NanoReviser: an error-correction tool for nanopore sequencing based on a deep learning algorithm. Front. Genet. 11, 900 (2020).
Stone, C. J., Boyer, G. F. & Behringer, M. G. Differential adenine methylation analysis reveals increased variability in 6mA in the absence of methyl-directed mismatch repair. mBio 14, e01289–23 (2023).
Li, L. et al. The epigenetic modification of DNA methylation in neurological diseases. Front. Immunol. 15, 1401962 (2024).
Li, X. et al. NT-seq: a chemical-based sequencing method for genomic methylome profiling. Genome Biol. 23, 1–20 (2022).
Rainio, O., Teuho, J. & Klén, R. Evaluation metrics and statistical tests for machine learning. Sci. Rep. 14, 1–14 (2024).
Christen, P., Hand, D. J. & Kirielle, N. A review of the F-measure: its history, properties, criticism, and alternatives. ACM Comput. Surv. 56, 1–24 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, 1–4 (2021).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11, e0163962 (2016).
Liu, X. et al. Giraffe: a tool for comprehensive processing and visualization of multiple long-read sequencing data. Comput. Struct. Biotechnol. J. 23, 3241–3246 (2024).
Yu, M. et al. SnapHiC: a computational pipeline to identify chromatin loops from single-cell Hi-C data. Nat. Methods 18, 1056–1059 (2021).
Wang, F. et al. A lightweight framework for chromatin loop detection at the single-cell level. Adv. Sci. 10, 2303502 (2023).
Deng, X. & Lu, B. Comprehensive comparison of the third-generation sequencing tools for bacterial 6mA profiling. https://doi.org/10.5281/ZENODO.15165726 (2025).
LucileLu, GuoZhihao, lab, D. & Raymond. LuBeifang/TGS-comparison-for-bacteria-6mA: v0.1.1. https://doi.org/10.5281/ZENODO.15171867 (2025).
Acknowledgements
This study was supported by grants from Theme-based Research Scheme (T11-104/22R, recipient: X.D.), Guangdong Major Project of Basic and Applied Basic Research (2020B0301030005, recipient: X.D.), the National Natural Science Foundation of China (32172358, recipient: X.D.), and General Research Funds of Hong Kong (11103221, recipient: X.D.; 11102223, recipient: X.D.; 11101722, recipient: X.D.). The funders were not involved in the study design, data collection, data interpretation, or the decision to submit the work for publication.
Author information
Contributions
B.L., R.L., and X.D. conceived and supervised the project. B.L. and Z.G. designed experiments and prepared figures and tables. B.L., Y.N., T.F., J.H., and T.L. carried out experiments. B.L., Z.G., X.L., and L.X. analyzed data. B.L., Z.G., X.L., R.L., and X.D. wrote and revised the manuscript. All the authors read the paper and agreed with the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Guan-Zheng Luo, Chuan-Le Xiao, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lu, B., Guo, Z., Liu, X. et al. Comprehensive comparison of the third-generation sequencing tools for bacterial 6mA profiling. Nat Commun 16, 3982 (2025). https://doi.org/10.1038/s41467-025-59187-2