Comprehensive mapping of RNA modification dynamics and crosstalk via deep learning and nanopore direct RNA-sequencing

Dong, Han; Gao, Yongsheng; Cai, Zhengyi; Li, Yi; Li, Xing; Zhao, Fangqing; Zhang, Jinyang

doi:10.1038/s41467-026-68419-y

Article
Open access
Published: 14 January 2026

Nature Communications volume 17, Article number: 1722 (2026)

Abstract

Despite extensive studies of individual RNA modifications, the lack of methods to detect multiple modification types simultaneously has left the global epitranscriptomic landscape and its underlying crosstalk largely unexplored. Here, we present ORCA (Omni-RNA modification Characterization and Annotation), a deep learning framework that enables comprehensive mapping of the RNA modification landscape using nanopore direct RNA sequencing. ORCA employs domain adversarial learning to detect and quantify a wide range of modifications by leveraging the signal and sequence variability that arises from the mixed stoichiometry of modified and unmodified nucleotides. It also incorporates a transfer learning module for accurate annotation of modification types with minimal prior knowledge. Applying ORCA to multiple human cell lines reveals widespread, isoform-specific modification patterns, as well as intricate cooperative and competitive interactions among neighboring modification sites. This approach substantially expands the repertoire of known RNA modification sites and elucidates their spatial organization, revealing emerging roles of RNA modifications in splicing regulation. ORCA thus provides an unbiased and generalizable framework for decoding RNA modification dynamics and their regulatory complexity across diverse biological contexts.

Introduction

RNA modifications represent a complex and dynamic layer of post-transcriptional regulation, with over 170 distinct chemical marks that regulate RNA stability¹, splicing^2,3, translation^4,5, and subcellular localization⁶. While the functions of individual modifications such as N6-methyladenosine (m6A), pseudouridine (Ψ), and 5-methylcytosine (m5C) have been extensively studied^7,8, the combinatorial effects and crosstalk among different RNA modifications remain largely unexplored. Recent studies have revealed coordinated roles of m6A and Ψ in modulating translation⁹ and a synergistic co-occurrence of m6A and m5C in plants under salt stress¹⁰, highlighting the emerging importance of interactions between different modifications. Deciphering these interactions is crucial for understanding the multilayered regulatory mechanisms governing RNA biogenesis and function. However, progress in this area has been hindered by the lack of transcriptome-wide tools capable of simultaneously detecting diverse RNA modifications and analyzing their interactions, limiting our ability to decode the full regulatory potential of the epitranscriptome.

Recent Illumina-based approaches using immunoprecipitation^11,12,13 or chemical treatment^14,15,16,17 have enabled transcriptome-wide profiling of individual RNA modification types, but they cannot capture the global epitranscriptomic landscape in a single experiment⁸. Nanopore direct RNA sequencing (DRS) overcomes this limitation by sequencing native RNA molecules directly and recording ionic current signals that reflect each nucleotide’s chemical structure^18,19. Modified and unmodified bases therefore produce distinct ionic current profiles and basecalling differences, encoding rich information about various RNA modifications within single molecules²⁰. However, most existing DRS-based tools are either trained on in vitro synthesized datasets restricted to a few well-characterized modifications such as m6A^9,10,21,22,23,24,25, m5C^10,23,26 and Ψ^9,10,20,27, or rely on comparative analyses to detect condition-specific modification changes^21,23,28,29. Neither strategy can resolve the full spectrum of endogenous RNA modifications or reveal their complex interactions. While a few attempts have been made to identify multiple RNA modifications and their associations simultaneously^9,10, these models remain constrained by the narrow scope of in vitro synthesized modification types, which limits their generalizability to unseen or uncharacterized modifications. Thus, the systematic and unbiased characterization of the full epitranscriptomic landscape and its underlying regulatory crosstalk remains a fundamental challenge.

To address these limitations, we present ORCA (Omni-RNA modification Characterization and Annotation), a deep-learning framework for comprehensive profiling of RNA modifications and their interactions at isoform and single-molecule resolution. ORCA employs an adversarial learning strategy to capture both signal- and sequence-level variations arising from the mixed stoichiometry of modified and unmodified nucleotides, thereby overcoming the limited detection scope of current DRS-based approaches. Extensive benchmarking shows that ORCA enables unbiased detection and stoichiometric quantification of RNA modifications, even for modification types absent from the training data, demonstrating its broad generalizability across diverse modification types. Applying ORCA to human cell lines, we expand the known repertoire of RNA modification sites and uncover widespread interactions among different modifications across transcript isoforms. Notably, ORCA reveals intricate cooperative and competitive relationships between neighboring modification sites, suggesting complex crosstalk between RNA modifications and splicing regulation. Collectively, ORCA provides a robust and versatile approach for mapping the full spectrum of RNA modifications, revealing the regulatory complexity and isoform-specific crosstalk of the eukaryotic epitranscriptome.

Results

Deep-learning-based detection and annotation of various RNA modifications from direct RNA-seq data

To enable generalized detection of diverse RNA modifications from nanopore direct RNA-seq data, we developed a deep-learning framework (ORCA) to systematically identify multiple RNA modification types. Briefly, ORCA first aggregates the raw current signals and basecalled sequences from all reads aligned to a given genomic region, focusing on a 9-nucleotide window centered on each candidate site (Fig. 1a and “Methods”). Because RNA modifications exhibit mixed stoichiometry^17,30, meaning that not all copies of a given base are modified, modified positions are expected to show elevated skewness in their signal intensity distributions and increased basecalling errors (Supplementary Fig. 1). ORCA therefore uses these polymorphic features to detect the presence of RNA modifications across the transcriptome. It then integrates prior knowledge from established RNA modification databases to annotate a wide range of modification types.
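The per-site polymorphic features described above can be illustrated with a short sketch. This is not ORCA's implementation: the input format (one mean current value and one basecalled base per read at each position), the use of Fisher-Pearson skewness, and the helper names `skewness`, `site_features` and `window_features` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness of per-read current levels."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def site_features(read_signals, read_bases, ref_base):
    """Summarise one position across all reads aligned to it (hypothetical format).

    read_signals: mean current of each read at this position
    read_bases:   basecalled base of each read at this position ('-' = deletion)
    ref_base:     reference base at this position
    """
    error_rate = sum(b != ref_base for b in read_bases) / len(read_bases)
    return {
        "signal_skew": skewness(read_signals),
        "signal_sd": pstdev(read_signals),
        "error_rate": error_rate,
    }


def window_features(per_position, center, half_width=4):
    """Features of the 9-nt window (half_width=4) centred on a candidate site."""
    return [per_position[i] for i in range(center - half_width, center + half_width + 1)]


# A 10% modified subpopulation with a shifted current level and a basecalling change.
signals = [100.0] * 90 + [110.0] * 10
bases = ["A"] * 90 + ["G"] * 10
print(site_features(signals, bases, "A"))
```

In this toy site the modified 10% of reads forms a tail at a higher current level, so the signal distribution becomes positively skewed and the mismatch rate rises to 10%, whereas a fully unmodified or fully modified position would show neither effect.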

**Fig. 1: Overview of the ORCA framework for RNA modification characterization and annotation.**

To accurately predict the presence of RNA modifications from signal- and base-level features, we first constructed a robust and diverse training set comprising six types of RNA modifications from the in vitro synthesized ELIGOS sequences³¹ (Fig. 1b). Synthetic transcripts containing one of the six modified bases (m6A, m5C, Ψ, m1A, hm5C, and 5fC) or only the four canonical bases were randomly sampled and combined to simulate varying stoichiometries and sequencing depths (“Methods”). To mitigate k-mer bias arising from the limited sequence diversity of the ELIGOS sequences¹⁰, raw sequence and absolute current-level features were deliberately excluded, so that the remaining features represent generalized modification status without sequence preference. In total, over 7,000,000 sites were generated, and positions containing >10% modified transcripts were designated as the positive set¹⁵. To develop a generalized model capable of accurately predicting diverse RNA modifications without being restricted to specific types, we implemented a domain adversarial learning framework³² (Fig. 1b). Here, a feature encoder comprising two LSTM layers that process the sequence in opposite directions was trained to capture contextual and sequential features and to predict modification presence (modScore) and stoichiometry. In parallel, a domain classifier was trained to discriminate between modification types from the encoder’s output, while the encoder was adversarially trained to minimize this discriminability (“Methods”). This adversarial training strategy forced the encoder to learn features shared across modifications, yielding a robust representation of modification status beyond the training types while suppressing overfitting to the modification types used for training.
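The stoichiometry and depth simulation can be pictured as mixing reads drawn from a fully modified and a fully unmodified synthetic pool. The sketch below is hypothetical: the read-pool representation, the depth choices, uniform sampling of stoichiometry and the function names are assumptions; only the >10% positive cutoff comes from the text.

```python
import random


def simulate_site(mod_reads, unmod_reads, depth, stoichiometry, rng, positive_cutoff=0.10):
    """Mix reads from a fully modified and a fully unmodified pool at one position.

    mod_reads / unmod_reads: per-read records from the two synthetic pools (hypothetical)
    depth: number of reads to draw; stoichiometry: target modified fraction in [0, 1]
    Returns (reads, realised_fraction, label); label is 1 when the fraction exceeds 10%.
    """
    n_mod = round(depth * stoichiometry)
    reads = rng.sample(mod_reads, n_mod) + rng.sample(unmod_reads, depth - n_mod)
    rng.shuffle(reads)
    fraction = n_mod / depth
    return reads, fraction, int(fraction > positive_cutoff)


def simulate_training_set(mod_reads, unmod_reads, n_sites, depths=(20, 50, 100), seed=0):
    """Draw n_sites simulated positions with random depth and uniform stoichiometry."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        depth = rng.choice(depths)
        sites.append(simulate_site(mod_reads, unmod_reads, depth, rng.random(), rng))
    return sites


mod_pool = [f"mod_read_{i}" for i in range(200)]  # placeholder read IDs
unmod_pool = [f"unmod_read_{i}" for i in range(200)]
training = simulate_training_set(mod_pool, unmod_pool, n_sites=1000)
print(sum(label for _, _, label in training), "positive sites")
```

Sampling at the read level, rather than labelling whole synthetic transcripts, is what produces the intermediate stoichiometries that the polymorphic features are meant to capture.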

Because occurrences of the same RNA modification at different transcriptomic positions often share conserved sequence contexts or signal patterns^15,33, we implemented a transfer-learning strategy for modification type annotation (Fig. 1c). First, an autoencoder was trained to project all predicted modification sites into a low-dimensional embedding space using signal- and base-level features, together with k-mer frequency profiles that capture the motif preferences of modification sites. These modification sites were then provisionally annotated using public RNA modification databases (RMBase v3.0³⁴ and DirectRMDB³⁵). Subsequently, the model was fine-tuned to predict the type of annotated modification sites, with unannotated sites randomly sampled as negative controls to reject low-confidence predictions and suppress false discoveries. Finally, the classifier was applied to all unannotated sites, enabling comprehensive and rigorous identification of previously unannotated RNA modification sites while minimizing dependence on pre-labeled training data. This framework allows ORCA to identify and annotate modification sites stringently, with the flexibility to integrate new modification types as RNA modification resources are updated^36,37.
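The k-mer frequency profile that accompanies the signal- and base-level features could be computed as below. The choice of k, the RNA alphabet, skipping k-mers that contain other characters (such as N) and the function name are assumptions for illustration; a fixed-length vector like this would then be concatenated with the other per-site features before embedding.

```python
from collections import Counter
from itertools import product


def kmer_profile(context, k=3, alphabet="ACGU"):
    """Normalised k-mer frequencies of the sequence context around a site.

    Returns a fixed-length vector (len(alphabet) ** k entries, lexicographic
    order) so profiles from different sites can be compared or embedded directly.
    k-mers containing characters outside `alphabet` (e.g. N) are skipped.
    """
    context = context.upper().replace("T", "U")
    valid = set(alphabet)
    kmers = [context[i:i + k] for i in range(len(context) - k + 1)
             if set(context[i:i + k]) <= valid]
    counts = Counter(kmers)
    total = len(kmers) or 1
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


profile = kmer_profile("GGACUGGACA", k=2)
print(len(profile), max(profile))
```

Using a fixed lexicographic ordering means two sites with similar motifs (for example, DRACH-like contexts for m6A) end up with nearby vectors, which is the property the embedding step relies on.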

Performance evaluation of RNA modification detection

To evaluate ORCA’s performance in detecting various RNA modifications, we first assessed its sensitivity and accuracy using the synthetic ELIGOS dataset³¹. Training datasets for the six in vitro synthesized modifications were generated as described above, and 5-fold cross-validation was applied to evaluate prediction accuracy across modification types. As shown in Fig. 2a and Supplementary Fig. 2a, ORCA achieved high recall and precision across all six modifications, with an average area under the precision–recall curve (AUPRC) of 0.95 and an average area under the receiver operating characteristic curve (AUROC) of 0.94. To summarize overall performance, we further calculated the F1-score, the harmonic mean of precision and recall, which balances sensitivity against false discoveries (Fig. 2b). Across all modification types, ORCA consistently attained high F1-scores (0.971–0.976), reflecting reliable and accurate detection. Because modified sites represent only a small fraction of the transcriptome, we further estimated ORCA’s false discovery rate using an in vitro transcribed (IVT) human mRNA transcriptome³¹ devoid of endogenous modifications. As shown in Fig. 2c, ORCA exhibited a low false discovery rate of 2.25% at the default threshold (modScore >0.9), lower than that of other multi-modification detection tools and comparable to several modification-specific methods (Supplementary Fig. 2b). In addition, ORCA produced robust stoichiometry predictions (Supplementary Fig. 2c), showing that the integrated signal- and base-level features can accurately quantify diverse RNA modifications. Together, these results suggest that ORCA provides accurate and unbiased prediction of multiple RNA modification types.
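The threshold-based metrics in this benchmark can be written out explicitly. In this sketch a site is called modified when its modScore exceeds 0.9, the default threshold stated above. Because every IVT site is unmodified, any call there is false; computing the IVT rate as the fraction of all tested IVT sites that are called modified is our assumption, and the function names are hypothetical.

```python
def precision_recall_f1(scores, labels, threshold=0.9):
    """Precision, recall and F1 for binary modification calls at a modScore threshold."""
    calls = [s > threshold for s in scores]
    tp = sum(c and y == 1 for c, y in zip(calls, labels))
    fp = sum(c and y == 0 for c, y in zip(calls, labels))
    fn = sum((not c) and y == 1 for c, y in zip(calls, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def ivt_false_call_rate(ivt_scores, threshold=0.9):
    """Fraction of IVT sites called modified; every IVT site is truly unmodified."""
    return sum(s > threshold for s in ivt_scores) / len(ivt_scores)


print(precision_recall_f1([0.95, 0.92, 0.50, 0.99], [1, 0, 1, 1]))
print(ivt_false_call_rate([0.10, 0.95, 0.30, 0.20]))
```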

**Fig. 2: Evaluation of the modification prediction performance of ORCA.**

To assess ORCA’s performance on real transcriptomes, we benchmarked its ability to detect m6A modifications using DRS data from Mettl3 knockout (KO) and wild-type (WT) mouse embryonic stem cells (mESCs)³¹. We applied ORCA to predict modification sites in individual samples and analyzed site-specific differential modification levels after Mettl3 knockout (“Methods”). As shown in Fig. 2d, Mettl3-KO cells exhibited a significant global reduction in RNA modification, with 17.58% of modification sites showing a stoichiometric reduction of >0.2, whereas only 5.78% of sites showed increased modification stoichiometry, consistent with Mettl3’s role as the primary m6A methyltransferase. For comparison, we evaluated both m6A-specific models (CHEUI-solo²³, EpiNano-SVM³¹, m6Anet²², and TandemMod¹⁰) and comparative-based methods (CHEUI-diff²³, EpiNano-Error²¹, Nanocompore²⁹, and xPore²⁸) for detecting differential m6A sites, using m6A sites identified by miCLIP2¹² and GLORI¹⁵ as ground truth. To ensure a fair comparison between m6A-specific and comparative-based tools, m6A prediction performance was evaluated at both the single-base level (only the adenosine within a DRACH motif was considered modified) and the 5-mer level (all nucleotides within the DRACH motif were treated as modified)³⁸. ORCA achieved an AUPRC of 0.42 at the single-base level (Supplementary Fig. 2d) and 0.43 at the 5-mer level (Fig. 2e), matching the performance of state-of-the-art m6A-specific and comparative-based algorithms. Among the top differentially modified sites (ranked by change in modScore), ORCA exhibited the highest proportion of sites overlapping DRACH motifs or modified 5-mers from m6A sequencing methods (Fig. 2f), confirming its accuracy in identifying biologically relevant m6A modifications.
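The two evaluation levels can be made concrete with a small labelling helper. This sketch assumes the usual IUPAC reading of DRACH on an RNA alphabet (D = A/G/U, R = A/G, H = A/C/U) and that a reference m6A site counts only when it is the central A of a DRACH motif; the function name and input format are hypothetical.

```python
import re

# DRACH with IUPAC codes on RNA: D = A/G/U, R = A/G, A, C, H = A/C/U.
# The zero-width lookahead lets overlapping motifs all be reported.
DRACH = re.compile(r"(?=[AGU][AG]AC[ACU])")


def benchmark_labels(seq, m6a_sites, level="base"):
    """Positions labelled as modified for evaluation.

    seq:       transcript sequence (0-based positions)
    m6a_sites: set of reference m6A positions (e.g. from miCLIP2 or GLORI)
    level:     "base" labels only the central A; "5mer" labels all five motif positions
    """
    seq = seq.upper().replace("T", "U")
    labels = set()
    for match in DRACH.finditer(seq):
        start = match.start()
        centre = start + 2  # the methylated A is the third base of DRACH
        if centre in m6a_sites:
            labels.update([centre] if level == "base" else range(start, start + 5))
    return labels


seq = "GGACUGGACA"
print(benchmark_labels(seq, {2, 7}, "base"), benchmark_labels(seq, {2}, "5mer"))
```

The 5-mer labelling gives comparative-based tools, which often report a signal shift somewhere within the k-mer rather than at the exact adenosine, a fair chance to score a hit.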

To further validate the versatility of ORCA, we applied it to detect m5C modifications in HeLa cells following NSUN2 knockout²³ and evaluated the performance of ORCA and other tools against m5C sites reported by BS-seq³⁹, bsRNA-seq⁴⁰, and RNA-BisSeq⁴¹. For each tool, changes in predicted stoichiometry were calculated, and a chi-squared test was used to assess the significance of m5C reductions. Among the m5C sites consistently detected across two biological replicates, ORCA identified a higher proportion of downregulated sites than most existing tools (Fig. 2g). Although CHEUI-diff reported a marginally higher fraction of downregulated sites, ORCA uncovered a significantly larger absolute number of m5C sites with statistically significant downregulation (Fig. 2h and Supplementary Fig. 2e), indicating its strong ability to detect NSUN2-dependent m5C alterations. Beyond m5C, we tested ORCA’s ability to detect Ψ modifications in a ribosomal RNA dataset²⁷. Compared with established tools (Tombo⁴², TandemMod¹⁰, and NanoPSU²⁷), ORCA identified a comparably high number of validated Ψ sites (73/89) to TandemMod (75/89, Fig. 2i) and showed strong orthogonal overlap with each approach (Supplementary Fig. 2f), supporting its accuracy in Ψ detection. Collectively, these results demonstrate that ORCA effectively detects a wide range of RNA modifications, including m6A, m5C and Ψ, across different experimental conditions, and support its utility for accurate and robust transcriptome-wide RNA modification discovery, with performance comparable or superior to canonical modification-specific and comparative-based tools.
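A per-site chi-squared test of this kind can be sketched as a 2 × 2 comparison of modified and unmodified read counts between WT and knockout samples. The text does not specify the contingency table, so using per-read counts (for example, predicted stoichiometry multiplied by coverage), the 0.05 significance level and the helper names are assumptions. For one degree of freedom, the chi-squared survival function equals erfc(sqrt(x/2)), which keeps the sketch dependency-free.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],    WT: modified reads, unmodified reads
         [c, d]]    KO: modified reads, unmodified reads
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, math.erfc(math.sqrt(stat / 2))


def is_downregulated(wt_mod, wt_unmod, ko_mod, ko_unmod, alpha=0.05):
    """Significant loss of modification in the knockout relative to WT."""
    _, p = chi2_2x2(wt_mod, wt_unmod, ko_mod, ko_unmod)
    if p >= alpha:
        return False
    wt_frac = wt_mod / (wt_mod + wt_unmod)
    ko_frac = ko_mod / (ko_mod + ko_unmod)
    return ko_frac < wt_frac


print(chi2_2x2(30, 70, 10, 90))  # WT 30% modified vs KO 10% modified
```

The direction check matters because the chi-squared statistic is symmetric: a significant gain of modification after knockout would otherwise be counted as a reduction.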

ORCA enables zero-shot detection of unseen RNA modification types

A key limitation of current DRS-based RNA modification detection tools is their reliance on elaborate training datasets, which are typically derived from in vitro-synthesized transcripts¹⁰. However, many modification types are challenging to synthesize in vitro⁴³, limiting the development of modification-specific models. To determine whether our strategy is applicable to modifications not included in the training set, we comprehensively benchmarked ORCA using three complementary approaches: (1) zero-shot prediction of unseen modifications using the ELIGOS dataset, (2) prediction of modifications absent from the training set, validated with reference RNA modification sequencing data, and (3) evaluation of ribosomal RNA modifications catalogued in mass spectrometry (MS)-based databases (Fig. 3a).

**Fig. 3: Evaluation of zero-shot detection capability for unseen RNA modification types.**

To systematically evaluate ORCA’s performance in detecting unseen RNA modifications, we first assessed precision and recall in zero-shot prediction using synthetic ELIGOS sequences³¹. For each target modification, we excluded it from the training set and trained ORCA on different combinations of the remaining modifications, to evaluate ORCA’s ability to generalize from arbitrary subsets to unseen targets. For zero-shot prediction of target modifications absent from the training set, ORCA achieved high prediction accuracy (~90%) across all modification types and maintained recall rates >30% for most modifications (Fig. 3b). Both accuracy and recall increased as additional modification types were included in training (Fig. 3b and Supplementary Fig. 3a), indicating that ORCA effectively extracts generalized cross-modification features. We further evaluated ORCA’s ability to estimate stoichiometry for previously unseen modifications. Our adversarial learning framework accurately estimated stoichiometry in zero-shot prediction for all six modification types, achieving a strong linear correlation (average Pearson’s correlation coefficient = 0.76) between predicted and ground-truth stoichiometries (Supplementary Fig. 3b). These results confirm that ORCA can detect and quantify unseen RNA modifications without prior training data for these modification types.
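The leave-one-type-out design can be sketched as an enumeration of training subsets that never contain the held-out target, together with the Pearson correlation used to compare predicted and ground-truth stoichiometries. The text says only that "different combinations" were used; enumerating every non-empty subset of the remaining five types is an assumption here, as are the names.

```python
import math
from itertools import combinations
from statistics import mean

TRAINING_MODS = ("m6A", "m5C", "Psi", "m1A", "hm5C", "5fC")


def zero_shot_splits(mods=TRAINING_MODS):
    """Yield (held_out_target, training_subset) for every non-empty subset of the
    other modification types; the target never appears in its own training set."""
    for target in mods:
        others = [m for m in mods if m != target]
        for k in range(1, len(others) + 1):
            for subset in combinations(others, k):
                yield target, subset


def pearson(x, y):
    """Pearson correlation between predicted and ground-truth stoichiometries."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


splits = list(zero_shot_splits())
print(len(splits))  # 186 = 6 targets x 31 non-empty subsets of the other five types
```

Grouping results by subset size is one way to reproduce the trend described above, where accuracy and recall rise as more modification types are included in training.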

To further evaluate ORCA’s ability to predict unseen RNA modifications transcriptome-wide, we employed three different high-throughput RNA modification sequencing datasets to validate its applicability. First, we performed ONT direct RNA-seq and m6A-SAC-seq¹⁷ on the same mouse brain sample. To test ORCA’s capacity for zero-shot prediction, m6A was excluded from the training set to construct an m6A-absent model, which was then used for transcriptome-wide modification prediction. In total, 1000 m6A sites detected by m6A-SAC-seq were covered in the nanopore DRS data, of which 70.5% were confidently predicted as modified (modScore >0.9) by the ORCA model trained without m6A-specific data. Moreover, these identified m6A sites showed significantly higher modification probabilities than randomly sampled DRACH motifs (P < 1 × 10⁻³⁰⁸, Wilcoxon rank-sum test; Fig. 3c), confirming ORCA’s capacity to detect transcriptome-wide m6A without prior training on this modification.
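The Wilcoxon rank-sum comparisons used in these benchmarks (for example, modScores of validated m6A sites versus random DRACH motifs) can be sketched without external dependencies. This version assigns average ranks to ties and uses a normal approximation without tie or continuity correction, so it is an illustration rather than a replacement for a statistics library; the example inputs are made up, not data from the study.

```python
import math


def rank_sum_test(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test that x tends to exceed y.

    Returns (U, p). Ties get average ranks; p uses the normal approximation
    without tie or continuity correction, so it is only reliable for large samples.
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n1, n2 = len(x), len(y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail P(Z >= z)


validated = [0.95, 0.97, 0.99, 0.92]  # illustrative modScores only
background = [0.10, 0.20, 0.30, 0.40]
print(rank_sum_test(validated, background))
```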

Similarly, we evaluated ORCA’s ability to detect 2’-O-methylation (Nm) and inosine (I), neither of which was included in the training set. For Nm detection, we analyzed public DRS data from mESCs³¹ and benchmarked ORCA predictions against Nm sites identified by 2’-OMe-seq⁴⁴. As shown in Fig. 3d, ORCA accurately predicted Nm modifications, successfully identifying 74.4% of reference Nm sites, and reference Nm sites were significantly enriched among ORCA’s predictions relative to random control sites (P = 1.30 × 10⁻²⁴, Wilcoxon rank-sum test). For inosine prediction, we used public DRS data from wild-type and FY-ADAR2 yeast strains engineered to express human ADAR2 (hADAR2), which introduces A-to-I editing in yeast, an organism without an endogenous ADAR system⁴⁵. Reference A-to-I editing sites were identified from Illumina RNA-seq data from the same project (“Methods”), and differentially modified sites between WT and hADAR2-expressing yeast were identified using ORCA. As expected, reference inosine sites showed significantly higher modification scores in hADAR2-expressing yeast, whereas no such pattern was observed for randomly sampled background adenosines (Fig. 3e). Taken together, these results indicate that ORCA can detect previously unseen RNA modifications at the transcriptome scale, even in the absence of modification-specific training datasets.

Because ribosomal RNA harbors a diverse array of RNA modifications, many of which are undetectable by either modification-specific or comparative-based tools (Fig. 3f), we used human and yeast ribosomal RNA sequencing datasets to evaluate ORCA’s performance in predicting this broader spectrum of RNA modifications. Across human 28S and 18S rRNAs, ORCA identified 78.1% of the 224 modification sites supported by SILNAS mass spectrometry⁴⁶, spanning 13 chemically distinct modification types (e.g., N4-acetylcytidine (ac4C) and m1acp3Ψ). In parallel, a low false discovery rate (10.0%) was observed for unmodified bases, indicating high specificity in distinguishing modified from unmodified sites. ORCA achieved comparable performance on yeast 18S and 25S rRNAs, accurately predicting 79.8% of orthogonally validated modifications and further supporting its robustness in identifying a wide range of RNA modification types across species (Supplementary Fig. 3c, d).

Furthermore, we evaluated ORCA’s ability to detect the non-natural modification 4-thiouridine (4sU). Applying ORCA to K562 4sU pulldown and DMSO control samples⁴⁷, we observed significantly elevated modification scores for U-containing 5-mers in 4sU pulldown samples compared with non-U-containing controls (P = 1.38 × 10⁻²⁰, one-sided Wilcoxon rank-sum test; Supplementary Fig. 4a). The predicted 4sU levels also correlated strongly with nascent RNA expression measured by orthogonal Illumina sequencing (P = 1.67 × 10⁻¹⁶, Supplementary Fig. 4b), demonstrating ORCA’s sensitivity to 4sU incorporation. Taken together, these results demonstrate that ORCA overcomes the limitations of modification-specific models, enabling transcriptome-wide discovery of unseen RNA modifications through its generalized adversarial training strategy.

ORCA uncovers the transcriptome-wide landscape of a broad range of modifications

The comprehensive identification of diverse RNA modifications is essential for understanding the post-transcriptional processing of RNAs. However, current DRS-based tools typically rely on modification-specific models that target a limited subset of modifications or use comparative strategies to detect changes between pairs of conditions. These limitations make it challenging to profile a broad spectrum of RNA modifications in individual samples. To demonstrate ORCA’s ability to detect multiple RNA modifications simultaneously in a single sample, we applied it to the previously described Mettl3-KO mESC dataset. All predicted modification sites were ranked by modScore and compared against known modification sites in the RMBase 3.0 database³⁴, as well as sites predicted by modification-specific tools. Among the top-ranked predictions, over 40% were supported by either a public database or ONT-based RNA modification detection tools (Fig. 4a), indicating the high reliability of ORCA’s predictions. Specifically, a broad catalog of well-characterized modifications, including m6A, m5C, inosine (I), Ψ, m7G, m1A and 2′-O-methylation (Nm), was detected within the top 10,000 predicted sites (Fig. 4b). For example, 41.9% and 30.9% of predicted sites in mESCs and HeLa cells, respectively, were supported by known m6A sites, while 16.1% and 5.2% were supported by known Ψ sites. This high concordance underscores ORCA’s ability to accurately resolve multiple RNA modifications in a single analysis.

**Fig. 4: Simultaneous detection of different RNA modification types in HeLa and mESC transcriptome.**

For instance, ORCA identified adjacent modification sites in the 3’ UTR of the Lars2 transcript, where one Nm, two m5C, and five Ψ sites were consistently detected in both WT and Mettl3-KO mESCs. Notably, a previously unannotated site exhibited a dramatic reduction in modScore upon Mettl3 knockout, suggesting that it may be a previously unreported m6A site (Fig. 4c). Similarly, two m6A sites in the 3’ UTR of the Ets2 transcript were robustly detected in WT mESCs but were absent in Mettl3-KO samples, consistent with substantial alterations in both sequence- and signal-level features following loss of the key m6A methyltransferase complex (Fig. 4d). These predicted sites were also supported by both modification-specific and comparative-based algorithms. We further extended this analysis to WT and NSUN2-KO HeLa cells. At exemplary modification sites in the 3’ UTR of RPL13A and in the 5.8S rRNA gene RNA5-8SN1, ORCA revealed a selective reduction in m5C, but not in m6A or Ψ levels, consistent with the specificity of NSUN2 as an m5C methyltransferase⁴¹ (Supplementary Fig. 5). Taken together, these results demonstrate that ORCA enables simultaneous detection of diverse RNA modifications while accurately resolving biologically relevant stoichiometry changes upon perturbation of specific modification writers.

Transfer learning of sequence and signal features enables accurate discovery of previously unannotated RNA modification sites

Building on accurate prediction of modification presence, we further developed a transfer learning framework to annotate modification types using sequence- and signal-level features of high-confidence sites curated from public databases (Fig. 5a and “Methods”). In addition to the features used for modification presence prediction, we incorporated k-mer occurrence frequencies to capture the sequence preferences specific to each modification type. Because a chemical modification affects current signals across a 5–6-nucleotide window as the strand passes through the nanopore⁴⁸, a multi-task learning model was implemented to predict both the modification type and its positional phase within the window. Predictions were then filtered for consistency between the predicted modification type and the nucleotide at the phase-corrected position, and only predictions in which the modification type matched the corresponding nucleotide were retained to ensure accurate modification assignment. To avoid over-assignment of uncharacterized modifications to known categories, unannotated sites were also sampled as negative controls during training, ensuring stringent and reliable identification of known modifications. Finally, the trained model was transferred to predict modification types for previously unannotated sites, enabling discovery of RNA modification sites absent from existing databases. Overall, this approach balances the accuracy of modification type prediction with the sensitivity needed to uncover previously unannotated sites.
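The consistency filter can be sketched as follows: each prediction carries a window start, a predicted type and a predicted phase (the offset of the modified base within the window), and it is kept only if the reference base at the phase-corrected position matches the canonical base of that type. The base mapping, the tuple layout, the type labels and the treatment of Nm as compatible with any base are assumptions for illustration.

```python
# Canonical base that each modification type is placed on (illustrative mapping).
# Nm (2'-O-methylation) modifies the ribose, so any base is accepted.
CANONICAL_BASE = {"m6A": "A", "m1A": "A", "I": "A", "m5C": "C",
                  "Psi": "U", "m7G": "G", "Nm": None}


def consistent_calls(calls, ref_seq):
    """Keep (position, type) for predictions whose type matches the phase-corrected base.

    calls:   iterable of (window_start, predicted_type, predicted_phase), with the
             phase given as a 0-based offset of the modified base inside the window
    ref_seq: reference sequence of the transcript (0-based)
    """
    kept = []
    for start, mod_type, phase in calls:
        pos = start + phase
        if mod_type not in CANONICAL_BASE or not 0 <= pos < len(ref_seq):
            continue  # unknown label or position outside the transcript
        expected = CANONICAL_BASE[mod_type]
        if expected is None or ref_seq[pos] == expected:
            kept.append((pos, mod_type))
    return kept


calls = [(0, "m6A", 2), (0, "m6A", 3), (2, "m5C", 1), (0, "Nm", 4)]
print(consistent_calls(calls, "GGACUAC"))
```

In the example, the second call is discarded because its phase points at a C, which cannot carry m6A, while the same C is accepted for the m5C call.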

**Fig. 5: Annotation of modification sites with a transfer-learning strategy.**

To rigorously evaluate RNA modification annotation, ORCA was used to predict modification presence in a K562 DRS dataset from the SGNex project⁴⁹. A total of 48,377 sites across 7 RNA modification types were then annotated with high-confidence NGS-supported sites from RMBase 3.0³⁴ and DirectRMDB³⁵, and the modification annotation model was trained as described above (Fig. 5a). In cross-validation, ORCA achieved >90% precision and >40% recall for most modification types (Fig. 5c, d), demonstrating robust accuracy and sensitivity. To further assess the specificity of modification type prediction, we iteratively masked each modification type during training and quantified the misclassification of masked sites into other categories. As shown in Fig. 5e and Supplementary Fig. 6a, ORCA maintained an average accuracy above 0.83 across all modification types. Moreover, removing the background negative-control class markedly increased false discoveries (Supplementary Fig. 7), confirming that this strategy ensures accurate and stringent modification type prediction while minimizing false-positive assignment of unannotated modifications.

The trained model was subsequently applied to predict modification types for all unannotated modification sites. In total, 42,449 previously unannotated RNA modification sites were identified, and their gene-body distributions closely matched those of sites in established databases (Fig. 5f). Notably, ORCA substantially expanded the current catalog of modification sites, annotating 29% additional m6A sites and markedly increasing the number of low-abundance modifications: >400% more m5C, Nm, Ψ and inosine, >1030% more m7G, and >178% more m1A sites compared with existing annotations (Fig. 5g). To validate these predictions, we performed de novo motif analysis using XSTREME⁵⁰ on the previously unannotated m5C sites. Two canonical m5C motifs, CUCC (88.3% of ORCA-annotated sites) and CGGG (8.8% of ORCA-annotated sites), were identified, aligning with known NSUN6-dependent⁵¹ and NSUN2-dependent³⁹ m5C sites. For example, ORCA predicted an m5C site in the CDT1 3’ UTR that was absent from both RMBase 3.0 and DirectRMDB but was independently validated by UBS-seq¹⁶ in HeLa cells (Fig. 5i). ORCA also showed high specificity in annotating m6A sites, with 26.9% of sites supported by GLORI¹⁵, comparable to the validation rate of database-curated m6A sites (37.6%, Supplementary Fig. 6b). For other modifications, high validation rates were also observed for Ψ (BID-seq¹⁴), m7G (20% overlap with QKI CLIP-seq¹¹ peaks), and m1A (18.5% by m1A-seq¹³) (Supplementary Fig. 6c–e). Taken together, these results demonstrate the effectiveness of ORCA’s label-transfer learning strategy in discovering and confidently annotating RNA modification sites absent from existing databases.

Furthermore, we evaluated whether database composition introduces biases associated with modification detection technologies or cell line origins. Although cross-technology overlap for the four major modification types was generally limited (Supplementary Fig. 8a, b), ORCA’s annotation model trained on individual assays achieved high recall within orthogonal datasets, and performance further improved when complementary assays were incorporated while maintaining low FDR (Supplementary Fig. 8c, d). Consistent results were observed in cross-cell-line validation, where restricting training to a single cell line reduced annotation sensitivity but did not affect precision or FDR (Supplementary Fig. 9). Together, these results demonstrate that limited overlap across technologies or cell lines does not compromise annotation accuracy, and that integrating multiple public resources effectively mitigates technology-specific biases and improves sensitivity without sacrificing precision.

Characterization of RNA modification landscape and its regulatory crosstalk in human cell lines

To demonstrate the applicability of our method, we applied ORCA to characterize the RNA modification landscape in human cell line DRS data from the SGNex project⁴⁹. In total, 98,586 sites were detected across all samples, with 10,954 modification sites per cell line. Notably, 70.2% of these sites were detected in at least two cell lines (Fig. 6a), consistent with reports of stable m6A modifications shared across human cell lines²⁸. To investigate the spatial associations between different modification types, we calculated the genomic distances between adjacent modifications. Strikingly, a substantial proportion of modifications (33%) occurred within 20 nt of each other (Supplementary Fig. 10a). We therefore clustered proximal sites using a 20-nt window, yielding 13,633 clusters with an average of 2.85 modifications per cluster (Fig. 6b, c). Because m6A was the most abundantly detected modification, most clusters were m6A-enriched; however, strong associations between m6A and other modifications, such as m5C and m1A, were also observed, highlighting the complex spatial organization and potential crosstalk among neighboring modification types (Fig. 6d).
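A 20-nt proximity clustering of this kind can be sketched as single-linkage chaining of sorted sites on the same chromosome and strand. The text specifies only the 20-nt window; chaining through consecutive neighbours, the tuple layout and dropping single-site groups (via the hypothetical `min_size` parameter) are assumptions.

```python
def cluster_sites(sites, max_gap=20, min_size=2):
    """Group modification sites into proximity clusters.

    sites:    iterable of (chrom, strand, position, mod_type) tuples
    max_gap:  maximum distance (nt) between consecutive sites in one cluster
    min_size: smallest cluster kept (2 drops isolated sites)
    """
    clusters = []
    for site in sorted(sites):
        chrom, strand, pos, _ = site
        if clusters:
            last_chrom, last_strand, last_pos, _ = clusters[-1][-1]
            if (chrom, strand) == (last_chrom, last_strand) and pos - last_pos <= max_gap:
                clusters[-1].append(site)  # within max_gap of the previous site: extend
                continue
        clusters.append([site])  # start a new cluster
    return [c for c in clusters if len(c) >= min_size]


sites = [("chr1", "+", 100, "m6A"), ("chr1", "+", 115, "m5C"), ("chr1", "+", 140, "m6A"),
         ("chr1", "-", 120, "Psi"), ("chr2", "+", 10, "m1A"), ("chr2", "+", 25, "m6A")]
for cluster in cluster_sites(sites):
    print([(pos, mod) for _, _, pos, mod in cluster])
```

Because linkage is through consecutive sites, a cluster can span more than 20 nt in total when several modifications are evenly spaced; a fixed-window scheme would split such runs differently.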

**Fig. 6: RNA modification landscape and interplay of neighboring RNA modification sites across human cell lines.**

To further investigate the regulatory interplay between these neighboring RNA modifications, we applied an expectation-maximization (EM)-based model to estimate single-molecule co-occurrence patterns within modification clusters (Fig. 6e and “Methods”). Among 443,361 modification clusters, 7719 exhibited significant co-modification, whereas 39,906 showed mutually exclusive modification indicative of competitive exclusion (Supplementary Fig. 6b). We first prioritized cooperatively modified clusters for downstream analysis. As shown in Fig. 6f, different modifications frequently co-occurred, with m5C and m6A forming the most prevalent combinatorial pattern. To further validate these predictions, we leveraged m6A-SAC-seq and m5C-TAC-seq⁵² datasets to extract short-read-level co-modification evidence. For instance, cooperative modification of a pair of m6A sites spaced 8 nucleotides apart in the 3’ UTR of DNAJB1 was identified in IM95 DRS data and independently confirmed in HeLa m6A-SAC-seq data (Fig. 6g). Similarly, two co-occurring adjacent m5C sites in the 3’ UTR of HDGF were detected in Hct116 DRS data and validated by HeLa m5C-TAC-seq (Supplementary Fig. 6c). These results support cooperative modification among neighboring sites and validate the accuracy of ORCA in resolving the co-existence of neighboring RNA modifications at single-molecule resolution.

To investigate the interplay between RNA modifications and splicing regulation, we then focused on mutually exclusive modification clusters that exhibited significant isoform-specific changes in K562 cells, and integrated ENCODE eCLIP-seq data⁵³ from K562 cells to assess regulatory associations. The binding patterns of RNA-binding proteins, including splicing factors and RNA modification-associated proteins (writers, erasers, and readers; WERs), within these modification clusters were further calculated. Notably, many isoform-specific modification clusters, particularly those involving m6A, showed significant enrichment of splicing factors and modification-associated WERs, suggesting widespread coupling between m6A modifications and alternative splicing events (Fig. 6h). For example, the two splicing factors ELAVL1⁵⁴ and FMR1⁵⁵ were significantly enriched in isoform-specific m6A/m5C modification clusters, consistent with FMR1’s preference for binding m6A-modified RNAs^56,57 (Fig. 6i, j). In one case, transcript-level analysis of RBIS revealed an isoform-specific exclusion pattern between neighboring m6A and m5C sites at exon 4 (Fig. 6k): m6A-modified reads were strongly associated with skipping of an upstream exon, whereas the exon was consistently retained in m5C-modified reads. Furthermore, strong eCLIP-seq peaks for the splicing factor MBNL1 and the RNA-binding protein FMR1 were detected in the same region, consistent with their established roles in alternative splicing and m6A-mediated splicing regulation⁵⁶. Together, these findings demonstrate that ORCA enables systematic characterization of the interactions between RNA modifications and splicing, offering a powerful platform for dissecting the multilayered regulation of the eukaryotic transcriptome.

Discussion

In this study, we present a comprehensive computational framework for mapping the global RNA modification landscape and regulatory crosstalk using nanopore direct RNA sequencing data. ORCA employs deep-learning algorithms for unbiased and generalizable detection of RNA modification presence and enables accurate modification-type annotation by incorporating prior knowledge of validated sites. Comprehensive evaluations demonstrated that ORCA reliably detects and quantifies previously uncharacterized modification sites, and revealed its applicability to uncovering complex interactions between neighboring modifications and isoform-specific RNA modification regulation.

Comprehensive detection of the full spectrum of RNA modifications is essential for understanding their roles in RNA biology and epitranscriptomic regulation^58,59. However, current high-throughput sequencing-based approaches rely on modification-specific antibodies or chemical reactivity, which substantially limits their generalizability^{11,12,13,14,15,16,17}. Despite rapid advances in DRS-based algorithms for modification detection and modification-aware basecalling, existing methods remain constrained by their dependence on modification-specific training sets^{9,10,21,22,23,24,25,26,27,31}. Meanwhile, comparative profiling of nanopore direct RNA-seq data across experimental conditions has enabled identification of RNA modification changes without restriction to particular modification types^21,28,29, but it overlooks unperturbed modifications and thus restricts analyses to condition-specific modification sites. Furthermore, modification types can only be inferred from the experimental setup, which risks bias owing to complex interactions between modifications⁶⁰.

To address these limitations, ORCA leverages the mixed-stoichiometry nature of RNA modifications and detects their presence based on variability in signal- and sequence-level features arising from the co-existence of modified and unmodified bases. Specifically, an adversarial learning strategy is employed to ensure unbiased detection of diverse modifications by preventing modification-specific overfitting. Through comprehensive evaluation, we demonstrated that ORCA enables accurate zero-shot detection and quantification of various RNA modifications without requiring a corresponding training dataset, highlighting its broad applicability for profiling the transcriptome-wide RNA modification landscape. Furthermore, a transfer-learning-based annotation module assigns modification types by aligning the signal- and sequence-level features of identified sites with prior knowledge from validated databases, enabling accurate co-profiling of multiple modification types. As metabolic labeling and chemical-based sequencing techniques continue to evolve, ORCA can be further extended to incorporate these reference sites, facilitating transcriptome-wide characterization of emerging RNA modifications without requiring extensive in vitro transcription experiments.

Nanopore-based full-length RNA sequencing approaches have been widely applied to resolve transcript isoform landscapes across diverse RNA classes^61,62,63. Beyond transcriptome-wide characterization of RNA modification sites, single-molecule RNA modification identification is critical for uncovering the regulatory crosstalk between different modifications^10,22,23. ORCA incorporates an expectation-maximization (EM)-based model to infer single-molecule modification states and assess competitive or cooperative interactions among neighboring modification sites. Applied to human cell lines, ORCA substantially expanded the known catalog of RNA modification sites, for both the well-characterized m6A and other low-abundance modifications. Notably, ORCA revealed widespread interplay between different RNA modifications and uncovered potential regulatory crosstalk between splicing factors and modification-associated RNA-binding proteins in shaping isoform-specific modification patterns. These findings highlight ORCA as a powerful platform for dissecting the complex regulatory architecture of the RNA epitranscriptome at isoform and single-molecule resolution.

Recent studies have employed deep-learning models to detect multiple RNA modifications. In particular, TandemMod uses deep-learning models to identify multiple RNA modifications (including m6A, m1A and m5C) at the single-read level, and further incorporates transfer learning to predict additional modifications such as m7G, hm5C and Ψ from limited training examples¹⁰. While this approach enables simultaneous detection of multiple RNA modification types, it still relies on IVT-derived training sites, which restricts its ability to capture the full RNA modification landscape. In contrast, ORCA leverages a domain-adversarial learning strategy to infer modification presence based on signal polymorphism, enabling the detection of a wide range of RNA modification types without requiring corresponding IVT training sets. However, this approach requires sufficient read depth and is less effective at low-coverage sites (<10 reads). Future work integrating both strategies may enable robust de novo modification detection at the single-molecule level.

In addition, recent advances in RNA004 chemistry have substantially improved ionic signal quality and basecalling accuracy⁶⁴. To assess ORCA’s compatibility with the new sequencing chemistry, we trained an RNA004-specific model using the IVT curlcake dataset⁶⁵ (Supplementary Fig. 11a). Compared with Dorado, ORCA achieved similarly high performance for the three basecallable modification types, while also maintaining high accuracy on the remaining four modification types that Dorado could not detect (Supplementary Fig. 11b, c). Furthermore, ORCA exhibited reliable zero-shot prediction performance for these modifications, consistent with the results obtained on RNA002 datasets (Supplementary Fig. 11d). We additionally generated a mouse brain RNA004 dataset and compared de novo m6A predictions between RNA002 and RNA004 using the corresponding models trained without m6A. ORCA produced highly concordant m6A signals across chemistries (Supplementary Fig. 11e, f), demonstrating stable and robust de novo detection under RNA004 chemistry (Supplementary Table 1).

Despite these advantages, ORCA has several limitations. First, ORCA requires sufficient sequencing depth to robustly estimate modification-induced feature variability. Although its performance becomes largely insensitive to coverage beyond a certain threshold (Supplementary Fig. 12), reliable detection remains challenging at very low read depths (<10 reads) or when attempting single-read inference. In addition, the ELIGOS training dataset exhibited limited 9-mer diversity, which might introduce sequence composition biases and affect generalization. Cross-dataset evaluation using the in vitro transcribed epitranscriptome (IVET¹⁰) showed that the IVET-derived model achieved superior cross-dataset prediction performance (Supplementary Fig. 13), indicating that greater sequence diversity in the training set improves ORCA’s prediction performance across diverse sequence contexts. Finally, each ELIGOS read contains only a single modification type, so no two modification types co-occur within the same read in the training dataset, which could affect the model’s ability to predict co-occurring modifications in very close proximity.

In summary, ORCA comprehensively captures the full RNA modification spectrum and reveals widespread crosstalk between different modifications and splicing regulation. This framework enables unbiased profiling of RNA modifications without requiring extensive IVT training data, providing robust identification of various RNA modifications within individual samples and detection of biologically relevant changes across experimental conditions. By facilitating simultaneous identification, quantification, and annotation of diverse RNA modifications at isoform and single-molecule resolution, ORCA uncovers cooperative modification patterns among neighboring modification sites and highlights the potential regulatory role of adjacent RNA modifications and RBPs in isoform-specific splicing and modification dynamics. Overall, ORCA provides a powerful computational strategy towards the comprehensive elucidation of the RNAome, offering a foundation for understanding RNA biology at unprecedented resolution.

Methods

Animal experiments

All mice used in this study were adult C57BL/6 mice and were purchased from SiPeiFu Biotechnology. Two adult mice were used for brain tissue dissection, with one mouse used for RNA002 sequencing and the other for RNA004 sequencing. Animals were maintained under conventional specific pathogen-free conditions and a 12-h light/12-h dark cycle at 25 °C and 40–60% humidity.

RNA isolation and nanopore direct RNA sequencing

Total RNA was extracted from the brains of two healthy adult mice using TRIzol (Invitrogen) according to the manufacturer’s instructions. RNA integrity and quality were assessed using the Agilent 5200 Fragment Analyzer System. A nanopore direct RNA-seq library was prepared using the Direct RNA Sequencing Kit (SQK-RNA002) from Oxford Nanopore Technologies following the manufacturer’s protocol and sequenced on an R9.4.1 flow cell (FLO-MIN106D) using a MinION Mk1B device for 72 h. An adult mouse brain direct RNA-seq library was also generated using the SQK-RNA004 sequencing kit and sequenced on a FLO-MIN004RA flow cell for 72 h according to the manufacturer’s instructions.

m6A-SAC-seq and data analysis

For m6A-SAC-seq experiments, 1 μg of total RNA was subjected to ribosomal RNA depletion using the RiboErase kit (human/mouse/rat, Kapa Biosystems). The rRNA-depleted total RNA was used directly for m6A-SAC-seq library preparation following the protocol described by He et al.¹⁷. Briefly, m6A modifications were selectively converted into allyl-labeled derivatives by MjDim1, followed by iodine-induced intramolecular cyclization. These modified sites were subsequently converted into sequence mutations during reverse transcription by HIV-1 RT and detected via Illumina sequencing.

Sequencing reads were trimmed using Cutadapt⁶⁶ (v2.10) and Fastp⁶⁷ (v0.23.4). Reads aligning to rRNA sequences were removed using Bowtie2⁶⁸ (v2.3.4.3) and Samtools⁶⁹ (v1.18). Cleaned reads were then mapped to the mm10 reference genome using STAR⁷⁰ (v2.7.10b). PCR duplicates were collapsed using UMICollapse⁷¹ (v1.0.0), and deduplicated BAM files from biological replicates were merged with Samtools. Strand-specific BAM files were generated and processed with Samtools mpileup. Somatic variants were called using VarScan⁷² (v2.3.9), and candidate m6A sites were identified based on mutation profiles and the presence of DRACH motifs.
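The DRACH filter in the last step can be sketched as a simple IUPAC motif check. This is a minimal illustration, not the authors' pipeline: the helper names `is_drach` and `candidate_m6a_sites` are hypothetical, and the sequence is assumed to be given in RNA-sense orientation (minus-strand sites would need reverse-complementing first).

```python
# Hypothetical helper: is a mutated position the central A of a DRACH motif?
# IUPAC codes (U written as T): D = A/G/U, R = A/G, A, C, H = A/C/U.
IUPAC = {"D": set("AGT"), "R": set("AG"), "A": {"A"}, "C": {"C"}, "H": set("ACT")}
MOTIF = "DRACH"


def is_drach(seq: str, pos: int) -> bool:
    """True if seq[pos] (0-based) is the central A of a DRACH motif.

    Assumes seq is already in RNA-sense orientation.
    """
    start = pos - 2
    if start < 0 or start + 5 > len(seq):
        return False                      # motif would run off the sequence
    window = seq[start:start + 5].upper().replace("U", "T")
    return all(base in IUPAC[code] for base, code in zip(window, MOTIF))


def candidate_m6a_sites(seq, mutated_positions):
    """Keep mutation-supported positions that sit on a DRACH adenosine."""
    return [p for p in sorted(mutated_positions) if is_drach(seq, p)]


print(candidate_m6a_sites("TTGGACTTT", {1, 4}))
```

Combining the two criteria in this way means a mutation is only reported as m6A when it falls in the sequence context targeted by the assay.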

Nanopore data preprocessing and feature extraction

The GRCh38 (human) and GRCm38 (mouse) reference transcriptomes were obtained from the Ensembl database. For RNA002 data, raw nanopore fast5 files were basecalled using Guppy (v6.3.8) with the rna_r9.4.1_70bps_hac model. For RNA004 data, raw nanopore pod5 files were basecalled using Dorado (v1.1.1 + e72f1492) with the rna004_130bps_hac@v5.2.0 model. POD5 files were converted to BLOW5 format using the bluecrab⁷³ (v0.4.0) p2s and slow5tools⁷⁴ (v0.8.0) merge commands. Basecalled reads were then aligned to the reference transcriptome using Minimap2⁷⁵ (v2.21, with the parameters “-ax splice -N 0 -uf -k14 --cs --secondary=no”). The alignment results were processed with samtools mpileup (v1.11) to generate per-base summary statistics. Ionic current signals were aligned to the reference sequence using the eventalign module of f5c⁷⁶ (v1.11), an accelerated implementation of Nanopolish⁷⁷, with the parameters ‘--min-mapq 0 --rna --signal-index --scale-events --secondary=no --collapse-events’ for RNA002 reads and ‘--pore RNA004 --min-mapq 0 --rna --signal-index --scale-events --secondary=no --collapse-events’ for RNA004 reads.

For modification presence prediction, both signal-level and sequence-level features were extracted within a ±2 k-mer window surrounding each candidate site. For signal-level features, raw electrical events from the eventalign output were standardized using the method defined in Nanopolish: each event’s mean signal level was normalized by subtracting the expected reference mean and dividing by the reference standard deviation. This normalization accounts for variation in signal intensity across different sequence contexts. The standardized signal values from all reads aligned to the same genomic position were then aggregated and interpolated into a fixed-length vector of 50 values to ensure consistent input dimensions for the model. For sequence-level features, the rates of insertions, deletions, and mismatches, as well as the mean, median, and standard deviation of base quality scores of all aligned reads at each position, were extracted from the ‘samtools mpileup’ output.
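The two feature families can be sketched as follows. This is a minimal illustration assuming NumPy. The function names are hypothetical, and the interpolation scheme is not specified in the text: here per-read values are sorted and linearly interpolated onto 50 evenly spaced quantile positions, which is an assumption. The pileup parser follows the standard samtools mpileup read-base encoding (`.`/`,` match, letters mismatch, `*`/`#` deletion, `+N`/`-N` indels, `^` plus mapping quality, `$`).

```python
import statistics

import numpy as np


def standardize_events(event_means, ref_mean, ref_std):
    """Nanopolish-style standardization: (observed - expected) / expected std."""
    return (np.asarray(event_means, dtype=float) - ref_mean) / ref_std


def fixed_length_signal(values, n_points=50):
    """Summarize a variable number of per-read values as a fixed-length vector.

    Assumption: values are sorted, then linearly interpolated onto n_points
    evenly spaced quantile positions, so sites of any depth share one shape.
    """
    v = np.sort(np.asarray(values, dtype=float))
    return np.interp(np.linspace(0, 1, n_points), np.linspace(0, 1, len(v)), v)


def pileup_features(bases, quals):
    """Indel/mismatch rates and base-quality stats from one mpileup row."""
    match = mismatch = deletion = insertion = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                      # read start; next char is mapping quality
            i += 2
            continue
        if c in "+-":                     # +N / -N followed by N bases
            j = i + 1
            while bases[j].isdigit():
                j += 1
            if c == "+":
                insertion += 1            # insertion after this position
            i = j + int(bases[i + 1:j])   # '-N' is counted later as '*' or '#'
            continue
        if c in ".,":
            match += 1
        elif c in "ACGTNacgtn":
            mismatch += 1
        elif c in "*#":
            deletion += 1
        # '$' (read end) and '<'/'>' (reference skips) are not counted
        i += 1
    depth = match + mismatch + deletion
    q = [ord(ch) - 33 for ch in quals]    # Phred+33 base qualities

    def rate(n):
        return n / depth if depth else 0.0

    return {
        "depth": depth,
        "mismatch_rate": rate(mismatch),
        "insertion_rate": rate(insertion),
        "deletion_rate": rate(deletion),
        "qual_mean": statistics.fmean(q) if q else 0.0,
        "qual_median": statistics.median(q) if q else 0.0,
        "qual_std": statistics.pstdev(q) if q else 0.0,
    }
```

Sorting before interpolation makes the signal vector independent of read order. Whether ORCA orders values this way is not stated.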

For modification-type annotation, three categories of features were used as input for the transfer-learning model. K-mer occurrence features were derived from the frequencies of all 256 possible 4-mer motifs within an 11-nucleotide window centered on each modification site. In addition, signal-level features were computed using the event-level means and standard deviations across the window surrounding each modification site; a Gaussian mixture model was then applied to partition each feature into modified and unmodified clusters, and the mean, variance, and covariance of each component were extracted as model input. Finally, sequence-level features were obtained using the same strategy as described above.
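The 4-mer occurrence feature can be sketched as a 256-dimensional vector. An 11-nt window contains 8 overlapping 4-mers. Normalizing counts to frequencies and skipping 4-mers that contain `N` are assumptions. The helper name `kmer_frequency` is hypothetical.

```python
from itertools import product

# All 256 DNA 4-mers in a fixed order (U is written as T).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}


def kmer_frequency(window: str):
    """256-dim 4-mer frequency vector for a sequence window (e.g. 11 nt)."""
    window = window.upper().replace("U", "T")
    counts = [0] * len(KMERS)
    n = 0
    for i in range(len(window) - 3):
        kmer = window[i:i + 4]
        if kmer in KMER_INDEX:            # skip 4-mers with ambiguous bases
            counts[KMER_INDEX[kmer]] += 1
            n += 1
    return [c / n for c in counts] if n else counts


v = kmer_frequency("ACGTACGTACG")       # 8 overlapping 4-mers, each seen twice
```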

ORCA model design

ORCA comprises two neural network models designed for predicting the presence of RNA modifications and inferring their types. For modification presence prediction, ORCA adopts a domain-adversarial neural network architecture composed of an encoder and two classifier branches. The encoder uses a bidirectional LSTM network to capture contextual and sequential dependencies from both sequence- and signal-level features within an 11-nucleotide window surrounding each candidate site. The encoded representations are then passed simultaneously into two parallel branches: (1) a modification predictor that predicts modification presence (modScore) along with an estimate of stoichiometry; and (2) a domain classifier, equipped with a gradient reversal layer, that aims to distinguish between different RNA modification types. The model is trained adversarially: the encoder and modification predictor are optimized to accurately detect modification presence, while the encoder is simultaneously trained to learn representations that minimize the performance of the domain classifier. This yields a generalizable representation of modification presence across diverse modification types without introducing modification-specific bias.
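The domain-adversarial design can be sketched in PyTorch as follows. This is not ORCA's released code. The hidden size, mean pooling over the window, six domain labels (one per ELIGOS modification type), λ = 1 and the class names are all assumptions. The loss terms and optimizer settings (NLLLoss, MSELoss, AdamW at 0.0005) follow the Model training section. The gradient reversal layer is the identity in the forward pass and negates the gradient in the backward pass, so one optimizer step trains the domain head to classify modification types while pushing the encoder to make them indistinguishable.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the incoming gradient by -lambd."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # no gradient for lambd


class PresenceDANN(nn.Module):
    """Hypothetical presence model: BiLSTM encoder, presence and stoichiometry
    heads, and a modification-type (domain) head behind gradient reversal."""

    def __init__(self, n_features, hidden=64, n_domains=6, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.presence_head = nn.Linear(2 * hidden, 2)         # unmodified / modified
        self.stoich_head = nn.Linear(2 * hidden, 1)           # modification rate
        self.domain_head = nn.Linear(2 * hidden, n_domains)   # modification type

    def forward(self, x):                       # x: (batch, window, n_features)
        out, _ = self.encoder(x)                # (batch, window, 2 * hidden)
        z = out.mean(dim=1)                     # mean-pool over window (assumption)
        presence = torch.log_softmax(self.presence_head(z), dim=-1)
        stoich = torch.sigmoid(self.stoich_head(z)).squeeze(-1)
        domain = torch.log_softmax(self.domain_head(GradReverse.apply(z, self.lambd)), dim=-1)
        return presence, stoich, domain


def train_step(model, optimizer, x, y_presence, y_stoich, y_domain):
    """One joint update; the reversal layer makes the encoder maximize domain loss."""
    nll, mse = nn.NLLLoss(), nn.MSELoss()
    presence, stoich, domain = model(x)
    loss = nll(presence, y_presence) + mse(stoich, y_stoich) + nll(domain, y_domain)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = PresenceDANN(n_features=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
x = torch.randn(4, 11, 8)                       # 4 sites, 11-nt window, 8 features
loss = train_step(model, optimizer, x,
                  y_presence=torch.tensor([0, 1, 1, 0]),
                  y_stoich=torch.tensor([0.0, 0.5, 0.9, 0.05]),
                  y_domain=torch.tensor([0, 1, 2, 3]))
```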

For modification type inference, ORCA employs a transfer-learning framework consisting of an autoencoder and two classifier modules. During the pretraining phase, the autoencoder learns a global low-dimensional representation of all predicted modification sites. The encoded features are subsequently passed to multi-task classifiers that predict the probabilities of different modification types and the modification phase, which denotes the exact modification position within the input window. Only modification type predictions that correspond to the base at the inferred modification position are retained as valid outputs.
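The final filtering rule can be sketched as follows: take the most probable phase, read the base at that window position, and keep only types that modify that base. The `TYPE_BASE` mapping and the function name are illustrative assumptions. Ψ is mapped to `T` because the reference is written in the DNA alphabet.

```python
# Hypothetical mapping from modification type to the canonical base it modifies.
TYPE_BASE = {"m6A": "A", "m1A": "A", "m5C": "C", "hm5C": "C", "5fC": "C",
             "Psi": "T", "m7G": "G"}


def annotate_site(window_seq, phase_probs, type_probs, type_base=TYPE_BASE):
    """Return (phase, type, probability) or None if no type matches the base.

    phase_probs: one probability per window position.
    type_probs: dict mapping modification type to predicted probability.
    """
    window_seq = window_seq.upper().replace("U", "T")
    phase = max(range(len(phase_probs)), key=phase_probs.__getitem__)
    base = window_seq[phase]
    valid = {t: p for t, p in type_probs.items() if type_base.get(t) == base}
    if not valid:
        return None                       # no base-compatible type: discard
    best = max(valid, key=valid.get)
    return phase, best, valid[best]


phases = [0.05, 0.05, 0.7, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01]
print(annotate_site("GGACTTTCCAG", phases, {"m6A": 0.3, "m5C": 0.6, "m1A": 0.1}))
```

In this example m5C has the highest raw probability, but it is discarded because the inferred position holds an A.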

Model training

To train the presence prediction model, we generated a labeled dataset based on the public ELIGOS resource. Specifically, six in vitro transcribed RNA samples, each containing an individual modification (m1A, m6A, m5C, hm5C, 5fC, or Ψ), and a control sample composed of only canonical bases were obtained. For each modification type, raw nanopore reads were aligned to the reference sequence using Minimap2. Modified and unmodified reads were then randomly sampled and combined to generate 3,000 training samples at varying sequencing depths (10–200) and modification rates (0–1) of individual modification sites. The sequence- and signal-level features were extracted as described above, resulting in a training set that simulates the realistic features of RNA modification types and stoichiometry levels. In total, over 7 million training samples were generated. Positions with >10% modified reads were labeled as positive samples. For each modification type, the dataset was randomly split into training and testing sets at a 4:1 ratio. Model optimization was conducted using the AdamW optimizer with a learning rate of 0.0005. The loss functions were defined as follows: negative log-likelihood loss (NLLLoss) for modification presence prediction, NLLLoss for domain label prediction, and mean squared error loss (MSELoss) for stoichiometry regression.
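The read-mixing step can be sketched as follows. This is a minimal illustration: the uniform draws for depth and rate, sampling without replacement, and the function name `simulate_site` are assumptions. Read pools must each hold at least `max_depth` reads. The label uses the >10% rule stated above.

```python
import random


def simulate_site(mod_reads, unmod_reads, rng, min_depth=10, max_depth=200,
                  pos_cutoff=0.10):
    """Mix modified/unmodified reads at a random depth and stoichiometry.

    Returns (reads, true_rate, label); label is 1 if >10% of reads are modified.
    """
    depth = rng.randint(min_depth, max_depth)       # inclusive bounds
    rate = rng.random()                              # target stoichiometry in [0, 1)
    n_mod = round(depth * rate)
    reads = rng.sample(mod_reads, n_mod) + rng.sample(unmod_reads, depth - n_mod)
    rng.shuffle(reads)
    true_rate = n_mod / depth                        # regression target
    return reads, true_rate, int(true_rate > pos_cutoff)


rng = random.Random(0)
mod = [("m", i) for i in range(300)]
unmod = [("u", i) for i in range(300)]
reads, rate, label = simulate_site(mod, unmod, rng)
```

Using the realized fraction `n_mod / depth`, rather than the drawn `rate`, keeps the stoichiometry target exactly consistent with the sampled reads.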

To train the modification type annotation model, we used 25 direct RNA-seq datasets generated on the MinION/GridION platforms, spanning nine human cell lines from the SGNex project⁴⁹. RNA modifications were first identified using the ORCA prediction module with a modScore threshold of 0.9 to ensure high-confidence predictions. Predicted modification sites were then annotated based on the presence of NGS-supported modifications from RMBase v3.0³⁴ and DirectRMDB³⁵ within a distance threshold of ±2 nucleotides. The autoencoder was initially trained on all predicted modification sites using the AdamW optimizer with a learning rate of 0.002 and MSELoss to learn global feature representations. The model was then fine-tuned on the annotated subset to enable prediction of both modification type and phase. These downstream tasks employed cross-entropy loss and were optimized with AdamW at the same learning rate. For each modification type, the labeled dataset was split into training and testing sets at a 1:4 ratio. To mitigate class imbalance, each mini-batch was constructed to contain similar numbers of samples from each modification type, preventing the loss function from being dominated by any single class. To avoid over-assignment of uncharacterized modifications to known categories, we also included 3,000 unannotated sites as negative controls during training, ensuring stringent and reliable identification of known modification types.
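Class-balanced mini-batches can be sketched with a simple sampler that draws the same number of indices from every class. Sampling with replacement, treating the unannotated negative controls as their own class, and sizing an epoch by the largest class are all assumptions. The generator name is hypothetical.

```python
import random
from collections import defaultdict


def balanced_batches(labels, per_class, rng):
    """Yield index batches holding per_class samples from every class.

    Indices are drawn with replacement, so rare classes are oversampled.
    The number of batches is sized by the largest class.
    """
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    n_batches = max(len(v) for v in by_class.values()) // per_class
    for _ in range(n_batches):
        batch = []
        for idxs in by_class.values():
            batch.extend(rng.choices(idxs, k=per_class))
        rng.shuffle(batch)
        yield batch


labels = ["m6A"] * 100 + ["m5C"] * 10 + ["unannotated"] * 30
batches = list(balanced_batches(labels, per_class=5, rng=random.Random(1)))
```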

EM-based prediction of modification interactions

ORCA outputs read-level modification predictions for neighboring modification pairs through the EM-based model, which enables assessment of read-level linkage or mutual exclusivity between modification events. After RNA modification presence and type prediction across all human cell line samples, only high-confidence sites detected in at least two samples were retained for downstream analysis. To define local modification clusters, transcriptomic distances between neighboring modification sites were calculated, and sites within 20 nucleotides were iteratively merged into the same cluster. To minimize potential signal interference caused by adjacent modifications, neighboring sites located within 4 nucleotides of each other were excluded from clustering.
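The clustering rule can be sketched for the sites on one transcript as follows. Assumptions: both members of a pair closer than or equal to 4 nt are dropped, and "within 20 nucleotides" is read as a gap of at most 20 nt between consecutive sites (single-linkage chaining, so a cluster can span more than 20 nt). The helper name is hypothetical.

```python
def cluster_sites(positions, merge_dist=20, min_gap=4):
    """Group modification sites on one transcript into local clusters.

    1. Drop every site that has a neighbour within min_gap nt.
    2. Chain the remaining sites while consecutive gaps are <= merge_dist.
    """
    pos = sorted(set(positions))
    too_close = set()
    for a, b in zip(pos, pos[1:]):
        if b - a <= min_gap:
            too_close.update((a, b))
    clusters = []
    for p in (p for p in pos if p not in too_close):
        if clusters and p - clusters[-1][-1] <= merge_dist:
            clusters[-1].append(p)        # extend the current cluster
        else:
            clusters.append([p])          # start a new cluster
    return clusters


print(cluster_sites([100, 102, 130, 145, 160, 300]))
```

Here sites 100 and 102 are removed for being too close, 130, 145 and 160 chain into one cluster, and 300 stays a singleton.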

To evaluate potential co-modification between modification sites within each cluster, we employed a Local Outlier Factor (LOF) score-based strategy to infer read-level modification states. For each site, raw signal features, including the event-level mean, standard deviation, and dwell time, were extracted from a ±5 k-mer (15-nucleotide) window. LOF scores were computed for each read and normalized to the range [0, 1]. For every pair of neighboring modification sites, each read was represented as a two-dimensional coordinate (x_i, y_i), where x_i and y_i are the normalized LOF scores at the two positions, respectively.
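A compact NumPy version of the read-level LOF step is sketched below, followed by min-max normalization and pairing of reads shared by two sites. This is a simplified LOF that uses exactly k neighbours (ties at the k-distance are not expanded). The neighbourhood size k is not given in the text, and all function names are hypothetical. scikit-learn's `LocalOutlierFactor` (via `-negative_outlier_factor_`) would be an off-the-shelf alternative.

```python
import numpy as np


def lof_scores(X, k=5):
    """Simplified Local Outlier Factor for rows of X (one row per read)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # a read is not its own neighbour
    nn_idx = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours
    k_dist = d[np.arange(n), nn_idx[:, -1]]          # distance to k-th neighbour
    # reachability distance of p from o: max(k_dist(o), d(p, o))
    reach = np.maximum(k_dist[nn_idx], d[np.arange(n)[:, None], nn_idx])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)         # local reachability density
    return lrd[nn_idx].mean(axis=1) / lrd


def normalized_lof(X, k=5):
    """LOF scores min-max scaled to [0, 1]."""
    s = lof_scores(X, k)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def paired_coordinates(scores_a, scores_b):
    """(x_i, y_i) for reads covering both sites; inputs map read_id to score."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    return [(scores_a[r], scores_b[r]) for r in shared]


X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
s = normalized_lof(X, k=3)                           # the last read is the outlier
```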

Given the site-specific distributions of LOF scores, we employed an expectation-maximization (EM) algorithm to classify each read into one of four canonical modification states: d₀, centered at (0,0), for unmodified reads; d₁, centered at (1,1), for dual modification; and d₂ and d₃, centered at (1,0) and (0,1), respectively, for single-site modifications. Each state d_j was assigned an initial mixing weight θ_j = 0.25. In the expectation step, the posterior probability that read r_i = (x_i, y_i) belongs to state d_j was calculated as:

$$\gamma_{ij}=\frac{\theta_j\, P(r_i \mid d_j)}{\sum_{k=0}^{3}\theta_k\, P(r_i \mid d_k)} \tag{1}$$

where the likelihood $P(r_i \mid d_j)=\mathrm{Euclidean}(r_i, d_j)\times s(r_i, d_j)$ was defined as the product of the Euclidean distance between r_i and the center of d_j and a dispersion score $s(r_i, d_j)$, such that:

$$\begin{aligned} s(r_i,d_0)&=s(r_i,d_1)=1-|x_i-y_i|\\ s(r_i,d_2)&=s(r_i,d_3)=1-|x_i+y_i-1| \end{aligned} \tag{2}$$

This formulation captures both the geometric proximity of the read to a canonical modification state and the consistency of the LOF scores across the two sites.

In the maximization step, the mixing weights were updated as the mean of the posterior probability across all N reads:

$$\theta_j=\frac{1}{N}\sum_{i=1}^{N}\gamma_{ij} \tag{3}$$

The EM process was iteratively repeated until convergence, defined by the change in posterior weights falling below a predefined threshold. After convergence, each read was assigned to the modification state with the highest posterior probability ${\gamma }_{{ij}}$, enabling stratified analysis of co-modified, mutually exclusive, and unmodified read populations within each cluster.
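The EM loop over Eqs. (1)–(3) can be sketched in NumPy as follows. Because the state likelihoods depend only on each read's fixed (x_i, y_i), only the four mixing weights are re-estimated. One deliberate assumption: read literally, a likelihood proportional to the raw Euclidean distance would favour the *farthest* state. This sketch therefore substitutes a proximity term 1 − dist/√2 (equal to 1 at a state centre and 0 at the opposite corner of the unit square), multiplied by the dispersion score of Eq. (2). This is an illustrative reading, not necessarily the authors' exact definition. Function names and the tolerance are hypothetical.

```python
import numpy as np

CENTERS = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # d0..d3


def state_likelihoods(xy):
    """(N, 4) matrix of P(r_i | d_j): proximity (assumption) x dispersion, Eq. (2)."""
    xy = np.asarray(xy, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    dist = np.linalg.norm(xy[:, None, :] - CENTERS[None, :, :], axis=-1)
    proximity = np.clip(1.0 - dist / np.sqrt(2.0), 0.0, None)
    s_same = 1.0 - np.abs(x - y)           # dispersion for d0 and d1
    s_single = 1.0 - np.abs(x + y - 1.0)   # dispersion for d2 and d3
    s = np.stack([s_same, s_same, s_single, s_single], axis=1)
    return proximity * s


def em_mixing_weights(xy, tol=1e-6, max_iter=500):
    """Return mixing weights theta (Eq. 3) and each read's most likely state."""
    lik = state_likelihoods(xy) + 1e-12    # guard against all-zero rows
    theta = np.full(4, 0.25)
    for _ in range(max_iter):
        weighted = lik * theta
        gamma = weighted / weighted.sum(axis=1, keepdims=True)   # E-step, Eq. (1)
        new_theta = gamma.mean(axis=0)                            # M-step, Eq. (3)
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta, gamma.argmax(axis=1)


xy = [[0.95, 0.9]] * 8 + [[0.05, 0.1]] * 2   # mostly dual-modified reads
theta, states = em_mixing_weights(xy)
```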

To further quantify the interaction between each pair of modification sites, we defined a linkage score for each candidate pair as:

$$\text{Linkage score}=(\theta_1-\theta_2)+(\theta_1-\theta_3) \tag{4}$$

A modification site pair was considered cooperatively modified if θ₁ > max(θ₂, θ₃), indicating an enrichment of simultaneously modified reads. Conversely, a pair was considered mutually exclusive if the linkage score was less than −0.2 and θ₁ < min(θ₂, θ₃), suggesting that the two modifications tend not to co-occur on the same read.
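Given converged mixing weights (θ₀, θ₁, θ₂, θ₃), the decision rules above reduce to a small function. The function name and the "undetermined" label for pairs meeting neither rule are illustrative assumptions. Any significance testing used in the Results is not modelled here.

```python
def classify_pair(theta, exclusive_cutoff=-0.2):
    """Apply Eq. (4) and the cooperative / mutually exclusive rules to (θ0..θ3)."""
    _, t1, t2, t3 = theta
    linkage = (t1 - t2) + (t1 - t3)
    if t1 > max(t2, t3):
        label = "cooperative"
    elif linkage < exclusive_cutoff and t1 < min(t2, t3):
        label = "mutually_exclusive"
    else:
        label = "undetermined"
    return linkage, label


print(classify_pair([0.2, 0.0, 0.4, 0.4]))   # dual state absent, singles dominate
```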

False positive rate evaluation

To evaluate the false discovery rate of ORCA, the IVT human mRNA transcriptome dataset was downloaded from the SRA database (accession number SRP166020). ORCA was used to predict modification presence in the IVT transcriptome. Among 7,576,597 sites with coverage greater than 10, only 2.25% (170,708) were predicted as modified using the default modScore threshold of 0.9. TandemMod, EpiNano-SVM, NanoPSU, and m6Anet were also applied to the same IVT transcriptome. For each tool, the cumulative distribution function of prediction scores was calculated, and false discovery performance was measured using the area under the cumulative distribution curve.
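For scores bounded in [0, 1], the area under the empirical CDF has a simple closed form: ∫₀¹ F(t) dt = 1 − mean(score). A larger area therefore means the scores on unmodified IVT RNA are concentrated near 0, that is, fewer false positives. The sketch below, assuming scores lie in [0, 1] and using a hypothetical helper name, integrates the ECDF numerically so the identity can be checked.

```python
from bisect import bisect_right


def ecdf_area(scores, n_grid=10_000):
    """Midpoint-rule area under the empirical CDF of scores on [0, 1]."""
    s = sorted(scores)
    n = len(s)
    h = 1.0 / n_grid
    # F(t) = fraction of scores <= t, evaluated at each grid-cell midpoint
    return sum(bisect_right(s, (g + 0.5) * h) / n for g in range(n_grid)) * h


scores = [0.0, 0.1, 0.9]
area = ecdf_area(scores)               # close to 1 - mean(scores) = 0.6667
```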

Model benchmarking

To evaluate ORCA’s performance, we benchmarked it against several representative tools for direct RNA modification detection using publicly available datasets. For m6A prediction, we included m6Anet²² (v2.0.1), TandemMod¹⁰ (v1.1.0), EpiNano²¹ (v1.2), CHEUI²³ (v0.1), xPore²⁸ (v2.1), and Nanocompore²⁹ (v1.0.4). For m5C detection, CHEUI, TandemMod, and xPore were also employed. For Ψ detection, Tombo (v1.5.1) and NanoPSU²⁷ (v1.0) were used. Benchmarking was mainly performed on three datasets: Mettl3 knockout and wild-type (KO/WT) mouse embryonic stem cell (mESC) samples for m6A detection (SRP166020), NSUN2 KO/WT HeLa samples for m5C detection (SRP393373), and a mixed rRNA sample for Ψ detection (SRP329477). All sequencing reads were aligned using Minimap2. Most tools used transcriptome-based alignment, except for EpiNano, which required genome-based alignment. For signal-to-reference alignment, most tools used the eventalign module of f5c⁷⁶ or Nanopolish⁷⁷ with tool-specific parameter configurations, whereas TandemMod used the “tombo resquiggle” command to map raw ionic signals to the basecalled sequences.

m6A detection in mESCs (Mettl3 KO/WT) samples

To benchmark m6A detection, we evaluated several tools using the Mettl3 knockout and wild-type mouse embryonic stem cell (mESC) samples. Experimentally detected m6A sites were obtained from the GEO database, using the union of the GLORI¹⁵ (GSE210563) and miCLIP2¹² (GSE163491) datasets as ground-truth references.

For EpiNano (v1.2 SVM & Error mode), we first converted BAM files into TSV format using sam2tsv from jvarkit (v2023.09.07) and extracted basecalling error features with the Epinano_Variants.py script. Then, Epinano_DiffErr.R and Epinano_Predict.py were used to identify differentially modified sites across samples and to directly predict m6A modifications, respectively. For CHEUI (v0.1 diff & solo mode), feature extraction was performed using ‘CHEUI preprocess --m6A’, followed by direct m6A prediction using CHEUI_predict_model1.py and CHEUI_predict_model2.py. In addition, signal-level differential analysis between samples was carried out using the CHEUI_differentialRNAMod command. For m6Anet (v2.0.1), input data were preprocessed using the “m6anet dataprep” command, and m6A sites were inferred with “m6anet inference”. For Nanocompore (v1.0.4), event-level signal alignments were first collapsed to the site level using the “nanocompore eventalign_collapse” command, and sample comparisons were performed using “nanocompore sampcomp”. For TandemMod, raw signal features were extracted from Tombo-resquiggled fast5 files using extract_signal_from_fast5.py and extract_feature_from_signal.py, and modification predictions were performed using “TandemMod.py --run_mode predict”. For xPore (v2.1), signal-level features were extracted using the “xpore dataprep” command, and differential modification analysis was performed with “xpore diffmod”. For ORCA, both wild-type and knockout samples were processed using the modification presence prediction model.

To evaluate m6A prediction performance, the difference in ORCA’s predicted modScore between paired samples was used to rank candidate m6A sites, and the overlap between the top differentially modified sites and the reference m6A dataset was calculated. For tools that directly provide differential predictions, such as xPore and CHEUI-diff, the reported difference in modification rate was used as the ranking metric. For EpiNano-Error and Nanocompore, we used the delta_sum_err and P values, respectively, as provided in their outputs. To reduce tool-specific biases and ensure fair comparisons, transcriptome coordinates were converted to genomic coordinates, and only genomic sites reported in both samples were retained for downstream analysis. When multiple predictions were assigned to a single genomic site, the prediction with the highest score was selected to represent that position.
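The ranking and overlap procedure can be sketched as follows. The steps are: collapse multiple predictions per genomic site to the highest score, keep sites present in both samples, rank by WT − KO score, and report the fraction of the top n found in the reference set. Representing sites as `(chrom, pos, strand)` tuples and the function names are assumptions. For P-value outputs such as Nanocompore's, one would rank by ascending P instead.

```python
def collapse_by_site(predictions):
    """Keep the highest score per genomic site; predictions: (site, score) pairs."""
    best = {}
    for site, score in predictions:
        if site not in best or score > best[site]:
            best[site] = score
    return best


def top_n_overlap(wt, ko, reference, n):
    """Fraction of the top-n sites (ranked by WT - KO score) found in reference."""
    wt_best, ko_best = collapse_by_site(wt), collapse_by_site(ko)
    shared = wt_best.keys() & ko_best.keys()            # sites reported in both
    ranked = sorted(shared, key=lambda s: wt_best[s] - ko_best[s], reverse=True)
    top = ranked[:n]
    return sum(s in reference for s in top) / len(top) if top else 0.0


wt = [(("chr1", 10, "+"), 0.9), (("chr1", 10, "+"), 0.95), (("chr1", 20, "+"), 0.8),
      (("chr1", 30, "+"), 0.5), (("chr1", 40, "+"), 0.99)]
ko = [(("chr1", 10, "+"), 0.1), (("chr1", 20, "+"), 0.7), (("chr1", 30, "+"), 0.45)]
reference = {("chr1", 10, "+"), ("chr1", 30, "+")}
print(top_n_overlap(wt, ko, reference, n=2))
```

In this toy example chr1:40 is dropped because it has no KO prediction.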

m5C detection in HeLa NSUN2 KO/WT samples

To evaluate m5C detection performance, CHEUI, TandemMod, xPore, and ORCA were applied to NSUN2 knockout and wild-type HeLa cell samples using the same preprocessing and prediction workflow described above. Reference m5C sites identified from HeLa BS-seq³⁹ (GSE122260), bsRNA-seq⁴⁰ (GSE140995), and RNA-BisSeq⁴¹ (GSE93751) were downloaded directly from the GEO database and merged to construct a unified reference set. For the WT/KO comparison, the same strategy was applied to ensure consistent genomic site-level comparisons, and only sites detected in both samples were retained for evaluation. Changes in predicted stoichiometry between WT and KO samples at m5C sites were calculated for comparison.

Ψ detection in mixed rRNA sequencing samples

For Ψ detection evaluation, we evaluated ORCA, Tombo, and NanoPSU using rRNA sequencing data. Reference Ψ sites supported by SILNAS-based mass spectrometry⁴⁶ were used as the ground truth. Basecalled reads were aligned to ribosomal RNAs from four species following the procedure described in NanoPSU²⁷. For downstream comparison, only reads aligned to human 18S (NR_003286.4) and 28S (NR_003287.4) rRNA sequences were retained. For Tombo, Ψ sites were identified from resquiggled fast5 files using the ‘tombo detect_modifications’ command in de_novo detection mode. For NanoPSU, the recommended pipeline was applied, in which the alignment, remove_intron, extract_features, and prediction commands were performed sequentially for Ψ site prediction. For TandemMod, raw signal features were extracted from Tombo-resquiggled fast5 files using extract_signal_from_fast5.py and extract_feature_from_signal.py, and modification predictions were performed using ‘TandemMod.py --run_mode predict’ with the Ψ detection model.

Benchmarking of zero-shot modification detection

To evaluate ORCA’s zero-shot detection capability, training datasets were constructed by iteratively selecting different combinations of 2 to 5 RNA modification types from the full set of six (m6A, m5C, Ψ, m1A, hm5C, 5fC). For each combination, the selected modifications were used to train the ORCA modification presence prediction model as described above. The trained models were then used for zero-shot prediction of the modification types excluded from training, and prediction accuracy and recall were calculated for each target modification type. In addition, the Pearson correlation between predicted stoichiometry and the simulated ground truth was calculated to evaluate how well modification identification and quantification generalize.

For zero-shot m6A prediction, an m6A-absent model was trained by excluding m6A from the training dataset. This model was then applied to direct RNA-seq data from mouse brain tissue, with m6A sites detected by mouse brain m6A-SAC-Seq used as the reference dataset. ORCA modScores were computed for all adenosines (A) located at the center of DRACH motifs across the transcriptome, and the modScore distributions of m6A-SAC-Seq-supported m6A sites and randomly selected background DRACH adenosines were compared using a two-sided Wilcoxon rank-sum test.
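A minimal sketch of locating DRACH-centered adenosines in a transcript sequence is shown below, assuming sense-strand transcript sequences as input (genomic minus-strand loci would need reverse complementing first). The regular expression uses a zero-width lookahead so that overlapping motifs are all reported; `drach_center_positions` is a hypothetical helper, not part of ORCA.

```python
import re

# DRACH in the DNA alphabet: D = A/G/T, R = A/G, then A, C, and H = A/C/T.
# The lookahead makes each match zero-width, so overlapping motifs are all found.
DRACH = re.compile(r"(?=[AGT][AG]AC[ACT])")


def drach_center_positions(seq: str) -> list:
    """Return 0-based positions of the central A of every DRACH motif in seq."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    return [m.start() + 2 for m in DRACH.finditer(seq)]
```

The returned positions are the candidate adenosines at which modScores would be collected.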

For Nm prediction, ORCA modScores were computed for sites supported by 2’-OMe-seq experiments⁴⁴, obtained from RMBase v3.0³⁴. All other transcriptomic sites with read coverage greater than 10 were used as background controls. The modScore distributions of reference Nm sites and background sites were compared using a two-sided Wilcoxon rank-sum test.
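The two-sided Wilcoxon rank-sum comparison used for the m6A and Nm analyses can be written with the standard library alone. This sketch follows the normal-approximation formula without tie correction that `scipy.stats.ranksums` also uses, which is what one would typically call in practice; the function names here are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (z statistic, P value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    s = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (s - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

A negative z indicates that the first group (for example, reference sites) tends to have lower scores than the second.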

For zero-shot prediction of inosine (A-to-I) RNA editing sites, a Schizosaccharomyces pombe direct RNA-seq dataset was downloaded from PRJEB46364. The S. pombe ASM294v2 genome and annotation were obtained, and the strategy recommended in DeepEdit⁴⁵ was followed to establish a high-confidence set of A-to-I editing sites. Illumina RNA-seq reads from two hADAR2⁺ samples and two control samples were aligned individually using HISAT2⁷⁸, and single-nucleotide variants were called with bcftools⁶⁹ mpileup. Candidate sites were defined as A-to-G substitutions with a read coverage of at least 50 and a variant allele frequency exceeding 10%. Sites present in both hADAR2⁺ samples and absent from both control samples were retained as the final ground-truth set for evaluation.
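The candidate and ground-truth filters above can be expressed as simple set operations. In this sketch the variant records are assumed to be dictionaries already parsed from the bcftools output and expressed on the transcribed strand, with hypothetical keys `chrom`, `pos`, `ref`, `alt`, `depth`, and `alt_count`; it also assumes that the same thresholds define presence in the control samples.

```python
from typing import List, Set, Tuple

Key = Tuple[str, int]  # (chrom, pos); a hypothetical site key


def candidate_sites(calls: List[dict], min_depth: int = 50, min_vaf: float = 0.10) -> Set[Key]:
    """A-to-G sites with depth >= min_depth and variant allele frequency > min_vaf."""
    keep = set()
    for c in calls:
        if c["ref"] == "A" and c["alt"] == "G" and c["depth"] >= min_depth:
            if c["alt_count"] / c["depth"] > min_vaf:
                keep.add((c["chrom"], c["pos"]))
    return keep


def ground_truth(adar_samples: List[Set[Key]], control_samples: List[Set[Key]]) -> Set[Key]:
    """Sites found in every hADAR2+ sample and in none of the control samples."""
    shared = set.intersection(*adar_samples)
    return shared - set().union(*control_samples)
```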

For the evaluation of rRNA modification predictions, high-confidence modification sites for human 18S/28S and yeast 18S/25S rRNAs were downloaded from a published SILNAS mass-spectrometry-based study⁴⁶. The mixed-species rRNA direct RNA-seq library was then analyzed, and ORCA-predicted modScores were benchmarked against the reference modification sites to calculate the true-positive rate and false-discovery rate.
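As an illustration, the true-positive rate and false-discovery rate at a given modScore cutoff can be computed as follows. The cutoff value and the function name `tpr_fdr` are assumptions for illustration; in practice the metrics would be traced over a range of cutoffs.

```python
from typing import Dict, Hashable, Set, Tuple


def tpr_fdr(scores: Dict[Hashable, float], reference: Set[Hashable],
            threshold: float) -> Tuple[float, float]:
    """TPR and FDR when every position with score >= threshold is called modified.

    `scores` covers all evaluated rRNA positions; `reference` holds the
    mass-spectrometry-supported modified positions.
    """
    called = {site for site, s in scores.items() if s >= threshold}
    tp = len(called & reference)
    fp = len(called - reference)
    positives = len(reference & set(scores))  # reference sites that were evaluated
    tpr = tp / positives if positives else float("nan")
    fdr = fp / len(called) if called else 0.0
    return tpr, fdr
```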

For the evaluation of non-natural modifications, publicly available 4sU-labeled K562 direct RNA-seq data⁴⁷ and a matched DMSO control sample were downloaded and analyzed using ORCA. The change in modScore between the 4sU-labeled and control samples was compared between U-containing and non-U-containing 5-mers to measure the enrichment of modification signal at 4sU-incorporated sites. In addition, a transcript-level 4sU load was defined as the sum of ORCA-estimated modified counts across U-centered sites per transcript, and was compared with transcript abundance in matched short-read 4sU pulldown RNA-seq data using Spearman correlation to evaluate concordance between ORCA-derived 4sU signals and orthogonal measurements of 4sU incorporation.

Benchmarking of simultaneous prediction of multiple RNA modification types

For both the mESC and HeLa datasets, RNA modification sites were predicted independently from two WT replicates. Only sites identified in both replicates were retained, and the mean modScore across replicates was used for site ranking. To establish a reference modification set, predicted modification sites were compared against public databases, including DirectRMDB³⁵ and RMBase v3.0³⁴, as well as against orthogonal long-read-based prediction tools: m6Anet (m6A), EpiNano-SVM (m6A), CHEUI-solo (m6A and m5C), and NanoPSU (Ψ). To exclude interference from single-nucleotide variants (SNVs), predicted sites with insertion, deletion, or mismatch rates exceeding 0.5 were removed. The top 10,000 ranked sites were then retained for downstream analysis.
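The replicate-consensus and SNV filtering steps can be sketched as below. The per-site field names (`modScore`, `ins`, `del`, `mis`) are hypothetical, and the sketch assumes a site is discarded when any error rate in either replicate exceeds 0.5; the comparison against databases and orthogonal tools is not shown.

```python
from typing import Dict, Hashable, List


def consensus_top_sites(rep1: Dict[Hashable, dict], rep2: Dict[Hashable, dict],
                        max_error: float = 0.5, top_n: int = 10_000) -> List[Hashable]:
    """Intersect replicates, drop likely SNV sites, and rank by mean modScore."""
    kept: Dict[Hashable, float] = {}
    for site in set(rep1) & set(rep2):
        a, b = rep1[site], rep2[site]
        # Treat the site as SNV-like if any error rate in either replicate is too high.
        if any(r[k] > max_error for r in (a, b) for k in ("ins", "del", "mis")):
            continue
        kept[site] = (a["modScore"] + b["modScore"]) / 2
    return sorted(kept, key=kept.get, reverse=True)[:top_n]
```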

Benchmarking of modification type annotation

To evaluate modification-type annotation, the K562 DRS dataset (K562_replicate6_run1) from the SGNex project was first used for modification site prediction. A total of 48,377 high-confidence modification sites were annotated using NGS-supported sites from RMBase and DirectRMDB. The modification annotation model was trained as described above using fivefold cross-validation. Annotation precision for each modification type was calculated as the number of correct annotations of that type divided by the total number of sites annotated as that type. Recall was calculated as the number of correct annotations divided by the total number of modification sites of that type in the validation set. The final trained model was applied to unannotated candidate sites to infer modification types, and the predictions were then filtered for consistency between the predicted modification type and the reference nucleotide. For m6A, an additional filter retained only sites located within the consensus DRACH motif.
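The per-type precision and recall defined above can be computed from (true type, predicted type) pairs as in the following sketch. The function name is hypothetical; under fivefold cross-validation these values would be computed on each held-out fold.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def per_type_precision_recall(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[float, float]]:
    """Per-type (precision, recall) from (true_type, predicted_type) pairs.

    precision(t) = correct t calls / all sites predicted as t
    recall(t)    = correct t calls / all validation sites whose true type is t
    """
    pairs = list(pairs)
    correct = Counter(t for t, p in pairs if t == p)
    predicted = Counter(p for _, p in pairs)
    actual = Counter(t for t, _ in pairs)
    out = {}
    for t in set(actual) | set(predicted):
        prec = correct[t] / predicted[t] if predicted[t] else float("nan")
        rec = correct[t] / actual[t] if actual[t] else float("nan")
        out[t] = (prec, rec)
    return out
```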

To evaluate the false discovery rate (FDR), each modification type was excluded in turn from the training dataset. The resulting model was then used to determine the fraction of sites of the excluded modification type that were incorrectly predicted as one of the included types, and the number of such misclassified sites was used to quantify the FDR for each modification type.

For validation of modification annotations, de novo motif analysis of ORCA-annotated m5C sites was performed using XSTREME⁵⁰. For m7G validation, QKI-CLIP peaks were downloaded from GSE193039 and intersected with the annotated m7G sites. For m1A validation, peaks identified by m1A-Seq (GSE70485) were downloaded, and each peak was expanded by ±150 nt around its center before intersection with predicted m1A sites. For Ψ validation, single-nucleotide-resolution Ψ sites from BID-seq (GSE179798) were downloaded and compared against predicted Ψ sites, predicted non-Ψ sites, and unmodified controls. For m6A validation, m6A sites detected by GLORI were downloaded from GSE210563 and compared against public databases and ORCA-annotated m6A sites. To account for the potential influence of neighboring nucleotides, BID-seq and GLORI coordinates were converted to 5-mer regions before comparison.
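The interval manipulations used in these validations can be sketched with plain tuples. This assumes BED-style 0-based half-open coordinates, ignores strand, and reads “expanded by ±150 nt around the center” as a window of 150 nt on either side of the peak midpoint; all helper names are hypothetical, and a production analysis would more likely use an interval tool such as bedtools.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def recenter(peak: Interval, flank: int = 150) -> Interval:
    """Replace a peak by the window of +/- flank nt around its midpoint."""
    chrom, start, end = peak
    center = (start + end) // 2
    return chrom, max(0, center - flank), center + flank


def five_mer(chrom: str, pos: int) -> Interval:
    """5-mer window centered on a single-nucleotide site (0-based position)."""
    return chrom, max(0, pos - 2), pos + 3


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sites_in_peaks(sites: List[Tuple[str, int]], peaks: List[Interval]) -> List[Tuple[str, int]]:
    """Predicted sites falling inside any re-centered peak (simple O(n*m) scan)."""
    windows = [recenter(p) for p in peaks]
    return [s for s in sites if any(overlaps((s[0], s[1], s[1] + 1), w) for w in windows)]
```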

Exclusive modification sites identification and RBP enrichment analysis

To quantify transcript-level variation in exclusively modified sites, each pair of unique genomic sites was treated as an individual unit of analysis. To focus on isoform-specific modifications, only modification pairs supported by reads assigned to more than one transcript isoform were retained. For each modification pair, an n × 3 contingency table was constructed, where n is the number of transcript isoforms and the three columns give the number of reads showing each modification pattern: d₁ (both sites modified), d₂ (only the upstream site modified), and d₃ (only the downstream site modified). A chi-square test was then applied to assess variation in modification patterns across transcript isoforms, and the resulting P values were used to rank site pairs by isoform-specific modification heterogeneity.
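A self-contained sketch of the isoform-by-pattern chi-square test follows. Because the table always has three columns, its degrees of freedom, 2(n − 1), are even, which permits an exact closed-form P value without special functions; for such tables `scipy.stats.chi2_contingency` should give the same result, since its Yates continuity correction applies only when there is a single degree of freedom. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_isoform_test(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence on an isoform-by-pattern table.

    Rows are isoforms; the three columns hold read counts for d1 (both sites
    modified), d2 (upstream only) and d3 (downstream only).
    Returns (statistic, degrees of freedom, P value).
    """
    n_rows, n_cols = len(table), len(table[0])
    if n_rows < 2:
        raise ValueError("need reads from at least two isoforms")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(n_cols)]
    if 0 in row_tot or 0 in col_tot:
        raise ValueError("every row and column needs at least one read")
    total = sum(row_tot)
    stat = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_tot[i] * col_tot[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    df = (n_rows - 1) * (n_cols - 1)
    if df % 2:
        raise ValueError("the closed-form P value below needs an even df")
    # For even df, the chi-square upper tail is exp(-x/2) * sum_{k<df/2} (x/2)^k / k!
    half = stat / 2
    p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, df, p
```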

To explore potential regulatory mechanisms underlying these isoform-specific modification patterns, we analyzed the binding of RNA-binding proteins (RBPs) near these modification sites. Genomic binding profiles from eCLIP-seq experiments for 139 RBPs in K562 cells were obtained from the ENCODE project in bedGraph format. All isoform-specific modification pairs were ranked by statistical significance, and enrichment of RBP binding around the top-ranked site pairs was assessed using a hypergeometric test; RBPs with P ≤ 0.01 were considered significantly enriched. To further investigate the connection between isoform-specific modifications and splicing regulation, we performed an in-depth analysis of a curated list of 21 splicing factors from the SpliceAid-F⁷⁹ database and 7 known RNA modification regulators, including writers (RBM15), erasers (FTO), and readers (HNRNPA1, IGF2BP1, IGF2BP2, FMR1).
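The hypergeometric enrichment test can be written directly with `math.comb`. How the counts are defined here (all isoform-specific pairs as the population, pairs overlapping an RBP's eCLIP signal as successes, and the top-ranked pairs as draws) is an assumption about the setup, not a detail stated above. The result equals `scipy.stats.hypergeom.sf(k - 1, M, n, N)` in SciPy's parameterization.

```python
from math import comb


def hypergeom_enrichment_p(total: int, bound_total: int, top_n: int, bound_in_top: int) -> float:
    """One-sided hypergeometric P(X >= bound_in_top).

    total        -- all isoform-specific site pairs considered
    bound_total  -- pairs overlapping a binding peak of the RBP
    top_n        -- number of top-ranked pairs
    bound_in_top -- top-ranked pairs overlapping the RBP's peaks
    """
    upper = min(bound_total, top_n)
    tail = sum(comb(bound_total, i) * comb(total - bound_total, top_n - i)
               for i in range(bound_in_top, upper + 1))
    return tail / comb(total, top_n)
```

Exact integer arithmetic avoids underflow in the individual terms; only the final ratio is converted to a float.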

For single-read-level validation, raw reads from m5C-TAC-Seq (SRP459299) and m6A-SAC-Seq (SRP295164) were downloaded and aligned to the hg38 reference genome with STAR⁷⁰ (v2.7.10b). Modification coordinates reported by m6A-SAC-Seq (GSE162356) and m5C-TAC-Seq (taken from the supplementary data of the original study) were used as reference loci to quantify and visualize single-molecule co-occurrence between adjacent modification sites. The same m6A and m5C loci were independently visualized using ORCA predictions from IM95 and HCT116 direct RNA-seq samples, respectively.

Benchmarking on RNA004 chemistry data

For RNA004 chemistry, a prediction model was trained on curlcake IVT libraries⁶⁵ containing seven synthetic RNA modification samples and an unmodified control. Training samples with defined modification fractions and read depths were generated using the same mixing strategy as for the RNA002 training sets, and the trained model was evaluated on held-out RNA004 test samples by ROC and precision–recall analysis for each modification type. For comparison with the vendor-provided state-of-the-art caller, Dorado RNA modification models (rna004_130bps_hac@v5.2.0 for m5C, Ψ, and inosine/m6A) were applied to the same curlcake reads. Read-level modification calls from Dorado were aligned to the reference using the Dorado aligner, and per-site modification statistics were obtained with modkit pileup. To place Dorado in the same simulated evaluation framework, synthetic sites spanning a range of sequencing depths and modification fractions were generated by random sampling, and the per-site modification fractions reported by Dorado were used as continuous prediction scores. Sites with a true simulated modification fraction of at least 0.1 were treated as positives, and ROC and precision–recall curves were computed using the Dorado scores.
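The simulated-site evaluation can be sketched as follows: reads are drawn from modified and unmodified pools at a chosen depth and modification fraction, and any per-site score (for example, Dorado's reported modification fraction) is then evaluated against labels defined by a true fraction of at least 0.1. The read pools, function names, and rounding rule are assumptions; AUROC is computed here via the Mann-Whitney formulation, counting ties as one half.

```python
import random
from typing import Sequence, Tuple


def simulate_site(mod_reads: Sequence, unmod_reads: Sequence, depth: int,
                  fraction: float, rng: random.Random) -> Tuple[list, float]:
    """Draw `depth` reads at the given modified fraction; return reads and the true fraction."""
    n_mod = round(depth * fraction)
    reads = rng.sample(list(mod_reads), n_mod) + rng.sample(list(unmod_reads), depth - n_mod)
    rng.shuffle(reads)
    return reads, n_mod / depth


def auroc(scores: Sequence[float], true_fractions: Sequence[float],
          min_fraction: float = 0.1) -> float:
    """AUROC with sites of true fraction >= min_fraction as positives (ties count 1/2)."""
    pos = [s for s, f in zip(scores, true_fractions) if f >= min_fraction]
    neg = [s for s, f in zip(scores, true_fractions) if f < min_fraction]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise comparison is quadratic in the number of sites, which is adequate for a sketch; a sort-based rank formula would scale better.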

To examine zero-shot detection on RNA004, leave-one-modification-out models were constructed by excluding each of the seven synthetic modification types in turn from the RNA004 curlcake training set and evaluating presence-prediction performance on the held-out type. De novo m6A detection across chemistries was further assessed on RNA002- and RNA004-based mouse brain direct RNA-seq libraries using models trained without m6A. For each library, ORCA modScores were calculated at adenosines located in DRACH motifs, and score distributions were compared between m6A-SAC-seq-supported sites and background DRACH positions. Overlaps of called m6A sites between the RNA002 and RNA004 datasets were summarized at the site level.

Benchmarking on ELIGOS and IVET training data

To assess the impact of training-set composition and sequencing chemistry on the presence-prediction model, additional models were trained on alternative IVT resources. One model was trained on the IVET library¹⁰ from the TandemMod study using the same simulation pipeline and neural network architecture as the model trained on the ELIGOS-derived RNA002 training set. The ELIGOS-trained and IVET-trained models were compared using ROC and precision–recall curves to evaluate cross-dataset generalization and the influence of sequence diversity in the training data.

Benchmarking the sequencing-depth dependence of the ORCA prediction model

The effect of read depth on presence-prediction performance was assessed by stratifying candidate sites into coverage bins and computing evaluation metrics within each bin. In the ELIGOS IVT datasets, sites with at least 10 supporting reads were grouped into four depth ranges. For each modification type and each depth bin, AUROC and AUPRC were calculated using the simulated modification labels as reference. A depth-stratified evaluation was also carried out for endogenous m6A detection in the mESC Mettl3 WT/KO datasets. Sites were grouped into the same four coverage bins, and precision–recall and ROC curves were computed separately for each bin.
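Depth stratification amounts to assigning each site to a coverage bin before computing AUROC and AUPRC within each bin. The bin edges below are hypothetical placeholders, since the four depth ranges are not specified here; only the 10-read minimum comes from the text.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical lower edges of the four coverage bins; only the 10-read floor is from the text.
BIN_EDGES = (10, 20, 50, 100)


def depth_bin(depth: int, edges: Sequence[int] = BIN_EDGES) -> int:
    """Index of the coverage bin for `depth`, or -1 if below the 10-read minimum."""
    return bisect_right(edges, depth) - 1


def stratify(sites: List[Tuple[int, float, int]]) -> Dict[int, List[Tuple[float, int]]]:
    """Group (depth, score, label) records into coverage bins, dropping low-depth sites."""
    bins: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for depth, score, label in sites:
        b = depth_bin(depth)
        if b >= 0:
            bins[b].append((score, label))
    return dict(bins)
```

Each bin's (score, label) list can then be passed to any ROC or precision-recall routine.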

Benchmarking on assay-specific and cell line-specific biases

Assay-specific and cell-line-specific biases were evaluated in the K562 cell line by partitioning database-supported sites for m6A, m5C, m1A, and Nm according to the profiling technology or cell line. For each modification type in turn, annotation models with the same architecture and training procedure were trained in a multi-class setting in which training examples for the focal modification were restricted to sites from a single technology or cell line, whereas training examples for all other modification types and the background class always included all available database-supported sites. For each training configuration, recall and 1 − FDR for the focal modification were quantified on sites detected by held-out technologies or cell lines.

Statistics and reproducibility

No statistical method was used to predetermine the sample size. The in vivo mouse experiment was performed once as a small proof-of-principle study using two adult C57BL/6 mice and was not used for formal statistical comparisons or sex-specific analyses. This sample size was chosen to obtain one high-depth RNA002 and one high-depth RNA004 direct RNA-sequencing library, in order to demonstrate technical feasibility rather than to estimate variability between animals. For all computational analyses, sample sizes were determined by the size of the available public datasets, and all reads or samples that passed predefined quality control criteria were used without additional subsampling. No data were excluded from the analyses.

Where statistical tests were applied, the specific test, the exact P values, and the definition of n are provided in the figure legends or Source Data. P values < 0.05 were considered statistically significant unless otherwise stated. All computational analyses were performed in Python using Jupyter Notebooks, with numpy, pandas, and scipy for numerical and statistical calculations and matplotlib/seaborn for plotting. All code needed to reproduce the analyses is available as described under “Code availability”.

Ethics statement

All experimental procedures involving animals in this study were carried out in accordance with the guidelines for the procurement and use of laboratory animals and were approved by the Institutional Animal Ethics Committee at the Institute of Zoology, Chinese Academy of Sciences.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The m6A-SAC-seq and ONT direct RNA-seq data generated in this study have been deposited in the Genome Sequence Archive⁸⁰ of the National Genomics Data Center, China National Center for Bioinformation (accession number PRJCA040561), and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. Publicly available nanopore direct RNA sequencing datasets used in this study were obtained from SRP166020, SRP393373, PRJEB46364, SRP329477, PRJEB82528, SRP426654, SRP171702 and the SGNex project⁴⁹. RNA modification sites detected by NGS-based sequencing technologies were collected from RMBase v3.0³⁴ and DirectRMDB³⁵. In addition, individual datasets were used for the detection of different modification types, including m6A (GSE210563, GSE163491), m5C (GSE140995, GSE93751, GSE122260), Ψ (GSE179798), m1A (GSE70485), m7G (GSE193039), and inosine (PRJEB46364). In particular, the m5C sites were identified as described in CHEUI²³, and inosine editing sites were identified de novo from RNA-seq data as in the DeepEdit⁴⁵ study. Source data are provided with this paper and are also available at Zenodo (https://doi.org/10.5281/zenodo.17960329).

Code availability

ORCA is implemented in Python and is freely available on GitHub at https://github.com/bioinfo-biols/ORCA; it is archived on Zenodo⁸¹ (https://zenodo.org/records/17949213). The software is packaged with sample datasets and has been extensively tested on Linux. A detailed installation guide is included in the GitHub repository. Code for data analysis has been deposited at Zenodo (https://doi.org/10.5281/zenodo.17785932).

References

1. Wang, X. et al. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117–120 (2014).
2. Haussmann, I. U. et al. m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540, 301–304 (2016).
3. Mendel, M. et al. Splice site m(6)A methylation prevents binding of U2AF35 to inhibit RNA splicing. Cell 184, 3125–3142 e3125 (2021).
4. Chen, T. et al. m(6)A modification plays an integral role in mRNA stability and translation during pattern-triggered immunity. Proc. Natl. Acad. Sci. USA 121, e2411100121 (2024).
5. Wang, X. et al. N(6)-methyladenosine modulates messenger RNA translation efficiency. Cell 161, 1388–1399 (2015).
6. Loedige, I. et al. mRNA stability and m(6)A are major determinants of subcellular mRNA localization in neurons. Mol. Cell 83, 2709–2725 e2710 (2023).
7. Zhang, Y., Lu, L. & Li, X. Detection technologies for RNA modifications. Exp. Mol. Med. 54, 1601–1616 (2022).
8. Lucas, M. C. & Novoa, E. M. Long-read sequencing in the era of epigenomics and epitranscriptomics. Nat. Methods 20, 25–29 (2023).
9. Huang, S., Wylder, A. C. & Pan, T. Simultaneous nanopore profiling of mRNA m(6)A and pseudouridine reveals translation coordination. Nat. Biotechnol. 42, 1831–1835 (2024).
10. Wu, Y. et al. Transfer learning enables identification of multiple types of RNA modifications using nanopore direct RNA sequencing. Nat. Commun. 15, 4049 (2024).
11. Zhao, Z. et al. QKI shuttles internal m(7)G-modified transcripts into stress granules and modulates mRNA metabolism. Cell 186, 3208–3226 e3227 (2023).
12. Kortel, N. et al. Deep and accurate detection of m6A RNA modifications using miCLIP2 and m6Aboost machine learning. Nucleic Acids Res. 49, e92 (2021).
13. Dominissini, D. et al. The dynamic N(1)-methyladenosine methylome in eukaryotic messenger RNA. Nature 530, 441–446 (2016).
14. Dai, Q. et al. Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution. Nat. Biotechnol. 41, 344–354 (2023).
15. Liu, C. et al. Absolute quantification of single-base m(6)A methylation in the mammalian transcriptome using GLORI. Nat. Biotechnol. 41, 355–366 (2023).
16. Dai, Q. et al. Ultrafast bisulfite sequencing detection of 5-methylcytosine in DNA and RNA. Nat. Biotechnol. 42, 1559–1570 (2024).
17. Hu, L. et al. m(6)A RNA modifications are measured at single-base resolution across the mammalian transcriptome. Nat. Biotechnol. 40, 1210–1219 (2022).
18. Jain, M., Abu-Shumays, R., Olsen, H. E. & Akeson, M. Advances in nanopore direct RNA sequencing. Nat. Methods 19, 1160–1164 (2022).
19. Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365 (2021).
20. Begik, O. et al. Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing. Nat. Biotechnol. 39, 1278–1291 (2021).
21. Liu, H. et al. Accurate detection of m(6)A RNA modifications in native RNA sequences. Nat. Commun. 10, 4079 (2019).
22. Hendra, C. et al. Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nat. Methods 19, 1590–1598 (2022).
23. Acera Mateos, P. et al. Prediction of m6A and m5C at single-molecule resolution reveals a transcriptome-wide co-occurrence of RNA modifications. Nat. Commun. 15, 3899 (2024).
24. Gao, Y. et al. Quantitative profiling of N(6)-methyladenosine at single-base resolution in stem-differentiating xylem of Populus trichocarpa using Nanopore direct RNA sequencing. Genome Biol. 22, 22 (2021).
25. Lorenz, D. A., Sathe, S., Einstein, J. M. & Yeo, G. W. Direct RNA sequencing enables m(6)A detection in endogenous transcript isoforms at base-specific resolution. RNA 26, 19–28 (2020).
26. Wu, Y. et al. Simultaneous profiling of ac(4)C and m(5)C modifications from nanopore direct RNA sequencing. Int. J. Biol. Macromol. 305, 140863 (2025).
27. Huang, S. et al. Interferon inducible pseudouridine modification in human mRNA by quantitative nanopore profiling. Genome Biol. 22, 330 (2021).
28. Pratanwanich, P. N. et al. Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat. Biotechnol. 39, 1394–1402 (2021).
29. Leger, A. et al. RNA modifications detection by comparative Nanopore direct RNA sequencing. Nat. Commun. 12, 7198 (2021).
30. Liu, N. et al. Probing N6-methyladenosine RNA modification status at single nucleotide resolution in mRNA and long noncoding RNA. RNA 19, 1848–1856 (2013).
31. Jenjaroenpun, P. et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 49, e7 (2021).
32. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Machine Learn. Res. 17, 1–35 (2015).
33. Tourancheau, A., Mead, E. A., Zhang, X. S. & Fang, G. Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat. Methods 18, 491–498 (2021).
34. Xuan, J. et al. RMBase v3.0: decode the landscape, mechanisms and functions of RNA modifications. Nucleic Acids Res. 52, D273–D284 (2024).
35. Zhang, Y. et al. DirectRMDB: a database of post-transcriptional RNA modifications unveiled from direct RNA sequencing technology. Nucleic Acids Res. 51, D106–D116 (2023).
36. Spangenberg, J. et al. The RMaP challenge of predicting RNA modifications by nanopore sequencing. Commun. Chem. 8, 115 (2025).
37. Cappannini, A. et al. MODOMICS: a database of RNA modifications and related information. 2023 update. Nucleic Acids Res. 52, D239–D244 (2024).
38. Zhong, Z. D. et al. Systematic comparison of tools used for m(6)A mapping from nanopore direct RNA sequencing. Nat. Commun. 14, 1906 (2023).
39. Huang, T., Chen, W., Liu, J., Gu, N. & Zhang, R. Genome-wide identification of mRNA 5-methylcytosine in mammals. Nat. Struct. Mol. Biol. 26, 380–388 (2019).
40. Schumann, U. et al. Multiple links between 5-methylcytosine content of mRNA and translation. BMC Biol. 18, 40 (2020).
41. Yang, X. et al. 5-methylcytosine promotes mRNA export - NSUN2 as the methyltransferase and ALYREF as an m⁵C reader. Cell Res. 27, 606–625 (2017).
42. Stoiber, M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. Preprint at https://www.biorxiv.org/content/10.1101/094672v2 (2017).
43. Flemmich, L., Bereiter, R. & Micura, R. Chemical synthesis of modified RNA. Angew. Chem. Int. Ed. Engl. 63, e202403063 (2024).
44. Incarnato, D. et al. High-throughput single-base resolution mapping of RNA 2’-O-methylated residues. Nucleic Acids Res. 45, 1433–1441 (2017).
45. Chen, L. et al. DeepEdit: single-molecule detection and phasing of A-to-I RNA editing events using nanopore direct RNA sequencing. Genome Biol. 24, 75 (2023).
46. Taoka, M. et al. Landscape of the complete RNA chemical modifications in the human 80S ribosome. Nucleic Acids Res. 46, 9289–9298 (2018).
47. Drexler, H. L., Choquet, K. & Churchman, L. S. Splicing kinetics and coordination revealed by direct nascent RNA sequencing through nanopores. Mol. Cell 77, 985–998 e988 (2020).
48. Georgieva, D., Liu, Q., Wang, K. & Egli, D. Detection of base analogs incorporated during DNA replication by nanopore sequencing. Nucleic Acids Res. 48, e88 (2020).
49. Chen, Y. et al. A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nat. Methods 22, 801–812 (2025).
50. Grant, C. E. & Bailey, T. L. XSTREME: comprehensive motif analysis of biological sequence datasets. Preprint at https://www.biorxiv.org/content/10.1101/2021.09.02.458722v1 (2021).
51. Selmi, T. et al. Sequence- and structure-specific cytosine-5 mRNA methylation by NSUN6. Nucleic Acids Res. 49, 1006–1022 (2021).
52. Lu, L. et al. Base-resolution m(5)C profiling across the mammalian transcriptome by bisulfite-free enzyme-assisted chemical labeling approach. Mol. Cell 84, 2984–3000 e2988 (2024).
53. Consortium, E. P. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
54. Chang, S. H. et al. ELAVL1 regulates alternative splicing of eIF4E transporter to promote postnatal angiogenesis. Proc. Natl. Acad. Sci. USA 111, 18309–18314 (2014).
55. Didiot, M. C. et al. The G-quartet containing FMRP binding site in FMR1 mRNA is a potent exonic splicing enhancer. Nucleic Acids Res. 36, 4902–4912 (2008).
56. Edens, B. M. et al. FMRP modulates neural differentiation through m(6)A-dependent mRNA nuclear export. Cell Rep. 28, 845–854 e845 (2019).
57. Arguello, A. E., DeLiberto, A. N. & Kleiner, R. E. RNA chemical proteomics reveals the N(6)-methyladenosine (m(6)A)-regulated protein-RNA interactome. J. Am. Chem. Soc. 139, 17249–17252 (2017).
58. Roundtree, I. A., Evans, M. E., Pan, T. & He, C. Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200 (2017).
59. Barbieri, I. & Kouzarides, T. Role of RNA modifications in cancer. Nat. Rev. Cancer 20, 303–322 (2020).
60. Wei, J. et al. Differential m(6)A, m(6)A(m), and m(1)A demethylation mediated by FTO in the cell nucleus and cytoplasm. Mol. Cell 71, 973–985 e975 (2018).
61. Zhang, J. et al. Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long. Nat. Biotechnol. 39, 836–845 (2021).
62. Zhang, J. et al. Real-time and programmable transcriptome sequencing with PROFIT-seq. Nat. Cell Biol. 26, 2183–2194 (2024).
63. Zhang, J. & Zhao, F. Circular RNA discovery with emerging sequencing and deep learning technologies. Nat. Genet. 57, 1089–1102 (2025).
64. Liu-Wei, W. et al. Sequencing accuracy and systematic errors of nanopore direct RNA sequencing. BMC Genomics 25, 528 (2024).
65. Cruciani, S. et al. De novo basecalling of RNA modifications at single molecule and nucleotide resolution. Genome Biol. 26, 38 (2025).
66. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 3 (2011).
67. Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta 2, e107 (2023).
68. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
69. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
70. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
71. Liu, D. Algorithms for efficiently collapsing reads with Unique Molecular Identifiers. PeerJ 7, e8275 (2019).
72. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
73. Gamaarachchi, H. et al. Fast nanopore sequencing data analysis with SLOW5. Nat. Biotechnol. 40, 1026–1029 (2022).
74. Samarakoon, H. et al. Flexible and efficient handling of nanopore sequencing signal data with slow5tools. Genome Biol. 24, 69 (2023).
75. Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574 (2021).
76. Gamaarachchi, H. et al. GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis. BMC Bioinforma. 21, 343 (2020).
77. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
78. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
79. Giulietti, M. et al. SpliceAid-F: a database of human splicing factors and their RNA-binding sites. Nucleic Acids Res. 41, D125–D131 (2013).
80. Chen, T. et al. The Genome sequence archive family: toward explosive data growth and diverse data types. Genomics Proteom. Bioinforma. 19, 578–583 (2021).
81. Dong, H., Zhao, F. & Zhang, J. Comprehensive mapping of RNA modification dynamics and crosstalk via deep learning and nanopore direct RNA-sequencing. Zenodo https://doi.org/10.5281/zenodo.17949213 (2025).

Acknowledgements

This work was supported by grants from the National Key R&D Project [2023YFC2604300, 2024YFF1206600, 2025YFF1207900, 2025YFC3409300, 2021YFA1302000], the National Natural Science Foundation of China [32422020, 32025009, 32130020, 92581121, T2521004], and the Zhejiang Province Vanguard Goose-Leading Initiative [2025C01114].

Author information

Authors and Affiliations

State Key Laboratory of Animal Biodiversity Conservation and Integrated Pest Management, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
Han Dong, Yongsheng Gao, Zhengyi Cai, Yi Li, Xing Li, Fangqing Zhao & Jinyang Zhang
University of Chinese Academy of Sciences, Beijing, China
Han Dong, Yongsheng Gao, Zhengyi Cai, Xing Li & Fangqing Zhao
Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
Fangqing Zhao


Contributions

F.Z. and J.Z. conceived the project. H.D. and J.Z. designed the method. H.D. implemented the model. H.D. and J.Z. performed data analysis. Y.G., Z.C., Y.L., and X.L. performed the experiments and generated sequencing data. H.D., J.Z., and F.Z. wrote the manuscript with contributions from all authors.

Corresponding authors

Correspondence to Xing Li, Fangqing Zhao or Jinyang Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Jia Meng and Mattia Pelizzola for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Reporting Summary (PDF)

Transparent Peer Review file (PDF)

Source data

Source Data Figure 2 (XLSX)

Source Data Figure 3 (XLSX)

Source Data Figure 4 (XLSX)

Source Data Figure 5 (XLSX)

Source Data Figure 6 (XLSX)

Source Data for Supplementary Figs. (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Dong, H., Gao, Y., Cai, Z. et al. Comprehensive mapping of RNA modification dynamics and crosstalk via deep learning and nanopore direct RNA-sequencing. Nat Commun 17, 1722 (2026). https://doi.org/10.1038/s41467-026-68419-y

Received: 05 August 2025
Accepted: 07 January 2026
Published: 14 January 2026
Version of record: 17 February 2026
DOI: https://doi.org/10.1038/s41467-026-68419-y