Abstract
The latent features of RNA sequences are crucial for understanding their functions, and Transformer-based nucleotide language models have therefore received widespread attention; however, the O(n²) complexity of the Transformer's attention mechanism limits their ability to process long sequences. In this work, we propose RNAret, an RNA language model based on the Retentive Network, whose retention mechanism provides training parallelism, low computational overhead, and long-sequence processing at O(n) complexity. We pretrain RNAret with self-supervised masked language modeling on 29.8 million RNA sequences. Experiments demonstrate the merit of RNAret as an RNA language model: it achieves superior performance on a range of tasks, including RNA–RNA interaction prediction, RNA secondary structure prediction, and mRNA/lncRNA classification. RNAret shows strong potential for extracting latent features from RNA sequences and advancing our understanding of RNA biology.
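To make the complexity claim concrete, the sketch below illustrates why retention admits both a parallel (training-time) form and an O(n) recurrent (inference-time) form, following the Retentive Network of Sun et al. This is a minimal single-head NumPy illustration, not RNAret's released implementation: it omits the multi-scale decay, position-dependent rotations, and normalization of the full mechanism, and the decay rate gamma and tensor shapes are arbitrary.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma=0.97):
    """Parallel form: (Q K^T * D) V, where D is a causal decay mask."""
    n = Q.shape[0]
    idx = np.arange(n)
    # D[t, m] = gamma**(t - m) for m <= t, and 0 otherwise
    D = np.tril(gamma ** (idx[:, None] - idx[None, :]))
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma=0.97):
    """Recurrent form: one d x d state update per token, so the cost is
    linear in sequence length, with no n x n attention matrix."""
    n, d = Q.shape
    S = np.zeros((d, d))                       # fixed-size running state
    out = np.zeros_like(V)
    for t in range(n):
        S = gamma * S + np.outer(K[t], V[t])   # exponential decay + rank-1 update
        out[t] = Q[t] @ S                      # readout for token t
    return out

# The two forms produce identical outputs on the same projections.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
assert np.allclose(retention_parallel(Q, K, V), retention_recurrent(Q, K, V))
```

The parallel form preserves the training parallelism of attention, while the recurrent form replaces the n × n attention matrix with a fixed-size d × d state; this equivalence is what underlies the low computational overhead and long-sequence processing noted above.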
Data availability
All the datasets used for analyses in this work are publicly available online. The RNAcentral dataset is available at https://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/21.0/. Datasets for downstream fine-tuning tasks are available at https://bis.zju.edu.cn/rnaret/download/ and have been deposited in Zenodo (https://doi.org/10.5281/zenodo.18313475)54. Source data for figures are provided in Supplementary Data 1–3.
Code availability
The RNAret source code, including scripts for pretraining, training, and inference, is available on GitHub (https://github.com/DrBlackZJU/RNAret/) and archived on Zenodo (https://doi.org/10.5281/zenodo.18271233)55. The RNAret web server is accessible at https://bis.zju.edu.cn/rnaret/. Model weights are available at the project website (https://bis.zju.edu.cn/rnaret/download/) and have also been deposited on Zenodo (https://doi.org/10.5281/zenodo.18313475)54.
References
Caprara, M. G. & Nilsen, T. W. RNA: versatility in form and function. Nat. Struct. Biol. 7, 831–833 (2000).
Holbrook, S. R. RNA structure: the long and the short of it. Curr. Opin. Struct. Biol. 15, 302–308 (2005).
Chen, Z., Ain, N. U., Zhao, Q. & Zhang, X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief. Bioinform. 25, bbae138 (2024).
Vaswani, A. et al. Attention is all you need. In Adv. Neural Inf. Process. Syst. 30, 5998–6008 (NIPS, 2017).
Min, B. et al. Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput. Surv. 56, 30 (2023).
Chen, J. et al. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. Preprint at https://arxiv.org/abs/2204.00300 (2022).
Zhang, Y. et al. Multiple sequence alignment-based RNA language model and its application to structural inference. Nucleic Acids Res. 52, e3 (2024).
Wang, N. et al. Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning. Nat. Mach. Intell. 6, 548–557 (2024).
Shen, T. et al. Accurate RNA 3D structure prediction using a language model-based deep learning approach. Nat. Methods 21, 2287–2298 (2024).
Wang, X. et al. Uni-RNA: universal pre-trained models revolutionize RNA research. Preprint at https://www.biorxiv.org/content/10.1101/2023.07.11.548588v1 (2023).
Penić, R. J., Vlašić, T., Huber, R. G., Wan, Y. & Šikić, M. RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks. Nat. Commun. 16, 5671 (2025).
Dao, T., Fu, D., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast and memory-efficient exact attention with IO-awareness. In Adv. Neural Inf. Process. Syst. 35, 16344–16359 (NeurIPS, 2022).
Dao, T. FlashAttention-2: faster attention with better parallelism and work partitioning. In International Conference on Learning Representations (ICLR, 2024).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).
Sun, Y. et al. Retentive Network: a successor to Transformer for large language models. Preprint at https://arxiv.org/abs/2307.08621 (2023).
Sweeney, B. et al. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 49, D212–D220 (2021).
Kirk, J. M. et al. Functional classification of long non-coding RNAs by k-mer content. Nat. Genet. 50, 1474–1482 (2018).
van der Maaten, L. & Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Wen, M., Cong, P., Zhang, Z., Lu, H. & Li, T. DeepMirTar: a deep-learning approach for predicting human miRNA targets. Bioinformatics 34, 3781–3787 (2018).
Pla, A., Zhong, X. & Rayner, S. miRAW: a deep learning-based approach to predict microRNA targets by analyzing whole microRNA transcripts. PLoS Comput. Biol. 14, e1006185 (2018).
Gu, T., Zhao, X., Barbazuk, W. B. & Lee, J. miTAR: a hybrid deep learning-based approach for predicting miRNA targets. BMC Bioinform. 22, 96 (2021).
Akiyama, M. & Sakakibara, Y. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom. Bioinform. 4, lqac012 (2022).
Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009).
Li, J., Liu, S., Zhou, H., Qu, L. & Yang, J. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 42, D92–D97 (2014).
Tan, Z., Fu, Y., Sharma, G. & Mathews, D. H. TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs. Nucleic Acids Res. 45, 11570–11581 (2017).
Sloma, M. F. & Mathews, D. H. Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures. RNA 22, 1808–1818 (2016).
Danaee, P. et al. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic Acids Res. 46, 5381–5394 (2018).
Szikszai, M., Wise, M., Datta, A., Ward, M. & Mathews, D. H. Deep learning models for RNA secondary structure prediction (probably) do not generalize across families. Bioinformatics 38, 3892–3899 (2022).
Fu, L. et al. UFold: fast and accurate RNA secondary structure prediction with deep learning. Nucleic Acids Res. 50, e14 (2022).
Gruber, A. R., Lorenz, R., Bernhart, S. H., Neuböck, R. & Hofacker, I. L. The Vienna RNA websuite. Nucleic Acids Res. 36, W70–W74 (2008).
Kerpedjiev, P., Hammer, S. & Hofacker, I. L. Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams. Bioinformatics 31, 3377–3379 (2015).
Zuker, M. & Stiegler, P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133–148 (1981).
Frankish, A. et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021).
Kang, Y. et al. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45, W12–W16 (2017).
Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013).
Subramanian, K., Payne, B., Feyertag, F. & Alvarez-Ponce, D. The codon statistics database: a database of codon usage bias. Mol. Biol. Evol. 39, msac157 (2022).
Dominguez, D. et al. Sequence, structure, and context preferences of human RNA binding proteins. Mol. Cell 70, 854–867 (2018).
Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).
Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Sun, Y. et al. A length-extrapolatable Transformer. In Proc. 61st Annual Meeting of the Association for Computational Linguistics 14590–14604 (ACL, 2023).
Fan, Q., Huang, H., Chen, M., Liu, H. & He, R. RMT: Retentive Networks meet Vision Transformers. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR, 2024).
Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at https://arxiv.org/abs/1606.08415 (2016).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. NAACL-HLT 2019 Vol. 1 (eds Burstein, J. et al.) 4171–4186 (Association for Computational Linguistics, 2019).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR, 2019).
Ning, W. CatIIIIIIII/RNAErnie: v.1.0. Zenodo https://doi.org/10.5281/zenodo.10847621 (2024).
Nowakowski, J. & Tinoco, I. RNA structure and stability. Semin. Virol. 8, 153–165 (1997).
Chen, X., Li, Y., Umarov, R., Gao, X. & Song, L. RNA secondary structure prediction by learning unrolled algorithms. In International Conference on Learning Representations (ICLR, 2020).
Kuhn, H. The Hungarian method for the assignment problem. Nav. Res. Logist. 52, 7–21 (2005).
Ventola, G. M. M. et al. Identification of long non-coding transcripts with feature selection: a comparative study. BMC Bioinform. 18, 187 (2017).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Shen, Y. RNAret - Datasets and Model Weights [Data set]. Zenodo https://doi.org/10.5281/zenodo.18313475 (2026).
Shen, Y. DrBlackZJU/RNAret: Retentive Network promotes efficient RNA language modeling of long sequences (v1.0). Zenodo https://doi.org/10.5281/zenodo.18271233 (2026).
Acknowledgements
This work was partially supported by the National Key Research and Development Program of China [2023YFE0112300]; the National Natural Science Foundation of China [32261133526; 32570787]; the Science and Technology Innovation Leading Scientist program [2022R52035]; the 151 Talent Project of Zhejiang Province (first level); and the Collaborative Innovation Center for Modern Crop Production co-sponsored by the province and the ministry. The authors are grateful to the members of Ming Chen's laboratory for helpful discussions and valuable comments, and to Jianghong Wu for assistance with computational resources.
Author information
Authors and Affiliations
Contributions
M.C. and D.C. supervised and designed the study. Y.S. designed the study, implemented the model, and performed the data analysis with support from J.W. and Y.H. Y.S. wrote the manuscript with input from G.C. and Y.H. S.Z. helped build the web server. All authors reviewed and approved the submitted manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Professor Maria Anisimova, Dr. Nilanjan Banerjee, Dr. Aylin Bircan and Dr. Kaliya Georgieva. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shen, Y., Cao, G., Hu, Y. et al. Retentive Network promotes efficient RNA language modeling of long sequences. Commun Biol (2026). https://doi.org/10.1038/s42003-026-09757-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s42003-026-09757-x