Abstract
In addition to encoding proteins, mRNAs have context-specific regulatory roles that contribute to many cellular processes. However, uncovering new mRNA functions is constrained by limitations of traditional biochemical and computational methods. In this Roadmap, we highlight how artificial intelligence can transform our understanding of RNA biology by fostering collaborations between RNA biologists and computational scientists to drive innovation in this fundamental field of research. We discuss how non-coding regions of the mRNA, including introns and 5′ and 3′ untranslated regions, regulate the metabolism and interactomes of mRNA, and the current challenges in characterizing these regions. We further discuss large language models, which can be used to learn biologically meaningful RNA sequence representations. We also provide a detailed roadmap for integrating large language models with graph neural networks to harness publicly available sequencing and knowledge data. Adopting this roadmap will allow us to predict RNA interactions with diverse molecules and to model context-specific mRNA interactomes.
Tieng, F. Y. F. et al. A Hitchhiker’s guide to RNA–RNA structure and interaction prediction tools. Brief. Bioinform. 25, bbad421 (2023).
Fang, Y., Pan, X. & Shen, H.-B. Recent deep learning methodology development for RNA–RNA interaction prediction. Symmetry 14, 1302 (2022).
Zhang, H. et al. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief. Bioinform. 23, bbac411 (2022).
Li, Y.-C. et al. DeepCMI: a graph-based model for accurate prediction of circRNA–miRNA interactions with multiple information. Brief. Funct. Genomics 23, 276–285 (2024).
Rasti, S. & Vogiatzis, C. A survey of computational methods in protein–protein interaction networks. Ann. Oper. Res. 276, 35–87 (2019).
Hu, L., Wang, X., Huang, Y.-A., Hu, P. & You, Z.-H. A survey on computational models for predicting protein–protein interactions. Brief. Bioinform. 22, bbab036 (2021).
Xu, M. et al. Graph neural networks for protein-protein interactions — a short survey. Preprint at https://doi.org/10.48550/arXiv.2404.10450 (2024).
Gao, Z. et al. Hierarchical graph learning for protein–protein interaction. Nat. Commun. 14, 1093 (2023).
Huang, K., Xiao, C., Glass, L. M., Zitnik, M. & Sun, J. SkipGNN: predicting molecular interactions with skip-graph networks. Sci. Rep. 10, 21092 (2020).
Jha, K., Saha, S. & Singh, H. Prediction of protein-protein interaction using graph neural networks. Sci. Rep. 12, 8360 (2022).
Nambiar, A. et al. Transforming the language of life: transformer neural networks for protein prediction tasks. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics https://doi.org/10.1145/3388440.34124 (2020).
Wang, Y. et al. RPI-GGCN: prediction of RNA–protein interaction based on interpretability gated graph convolution neural network and co-regularized variational autoencoders. IEEE Trans. Neural Netw. Learn. Syst. 36, 7681–7695 (2024).
Yu, B. et al. RPI-MDLStack: predicting RNA–protein interactions through deep learning with stacking strategy and LASSO. Appl. Soft Comput. 120, 108676 (2022).
Wang, Y. et al. RPI-CapsuleGAN: predicting RNA–protein interactions through an interpretable generative adversarial capsule network. Pattern Recognit. 141, 109626 (2023).
Zhou, J., Wang, X., Niu, R., Shang, X. & Wen, J. Predicting circRNA–miRNA interactions utilizing transformer-based RNA sequential learning and high-order proximity preserved embedding. iScience 27, 108592 (2024).
Singh, R., Xu, J. & Berger, B. Struct2net: Integrating structure into protein-protein interaction prediction. Pac. Symp. Biocomput. 2006, 403–414 (2006).
Yang, F., Fan, K., Song, D. & Lin, H. Graph-based prediction of protein–protein interactions with attributed signed graph embedding. BMC Bioinformatics 21, 323 (2020).
Yan, Z., Hamilton, W. L. & Blanchette, M. Graph neural representational learning of RNA secondary structures for predicting RNA–protein interactions. Bioinformatics 36, i276–i284 (2020).
Huang, Y.-A. et al. Predicting lncRNA–miRNA interaction via graph convolution auto-encoder. Front. Genet. 10, 758 (2019).
Zhao, C. et al. Graph embedding ensemble methods based on the heterogeneous network for lncRNA–miRNA interaction prediction. BMC Genomics 21, 867 (2020).
Dutta, P. & Saha, S. Amalgamation of protein sequence, structure and textual information for improving protein–protein interaction identification. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D., et al.) 6396–6407 (Association for Computational Linguistics, 2020).
Luo, X., Wang, L., Hu, P. & Hu, L. Predicting protein–protein interactions using sequence and network information via variational graph autoencoder. IEEE/ACM Trans. Comput. Biol. Bioinform. 20, 3182–3194 (2023).
Chen, L. et al. Graph optimal transport for cross-domain alignment. ICML 1542–1553 (2020).
Bing, R. et al. Heterogeneous graph neural networks analysis: a survey of techniques, evaluations and applications. Artif. Intell. Rev. 56, 8003–8042 (2023).
Li, D. et al. RNA–protein interaction prediction based on deep learning: a comprehensive survey. Preprint at https://doi.org/10.48550/arXiv.2410.00077 (2024).
Ying, R., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: generating explanations for graph neural networks. Adv. Neural Inf. Process. Syst. 32, 9240–9251 (2019).
Lv, G., Hu, Z., Bi, Y. & Zhang, S. Learning unknown from correlations: graph neural network for inter-novel-protein interaction prediction. in Proc. Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21 (ed. Zhou, Z.-H.) 3677–3683 (International Joint Conferences on Artificial Intelligence Organization, 2021).
Li, J. et al. Evaluating graph neural networks for link prediction: current pitfalls and new benchmarking. In Advances in Neural Information Processing Systems 36 (eds Oh, A. et al) 3853–3866 (NeurIPS, 2023).
Morris, C. et al. Position: future directions in the theory of graph machine learning. in Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) Vol. 235, 36294–36307 (PMLR, 2024).
Zhang, B. et al. The expressive power of graph neural networks: a survey. IEEE Trans. Knowl. Data Eng. 37, 1455–1474 (2025).
Papamarkou, T. et al. Position: topological deep learning is the new frontier for relational learning. in Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) Vol. 235, 39529–39555 (PMLR, 2024).
Zheng, X. et al. Graph neural networks for graphs with heterophily: a survey. Preprint at https://doi.org/10.48550/arXiv.2202.07082 (2022).
Chen, F., Cocaign-Bousquet, M., Girbal, L. & Nouaille, S. 5′UTR sequences influence protein levels in Escherichia coli by regulating translation initiation and mRNA stability. Front. Microbiol. 13, 1088941 (2022).
Lytle, J. R., Yario, T. A. & Steitz, J. A. Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5′ UTR as in the 3′ UTR. Proc. Natl Acad. Sci. USA 104, 9667–9672 (2007).
Zhu, H. et al. Dynamic characterization and interpretation for protein–RNA interactions across diverse cellular conditions using HDRNet. Nat. Commun. 14, 6824 (2023).
Li, M. M. et al. Contextual AI models for single-cell protein biology. Nat. Methods 21, 1546–1557 (2024).
Blakes, A. J. M. et al. A systematic analysis of splicing variants identifies new diagnoses in the 100,000 genomes project. Genome Med. 14, 1–11 (2022).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2014).
Whiffin, N. et al. Characterising the loss-of-function impact of 5′ untranslated region variants in 15,708 individuals. Nat. Commun. 11, 2523 (2020).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Cui, Y. et al. Alternative polyadenylation transcriptome-wide association study identifies APA-linked susceptibility genes in brain disorders. Nat. Commun. 14, 583 (2023).
Mai, J., Lu, M., Gao, Q., Zeng, J. & Xiao, J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun. Biol. 6, 899 (2023).
Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).
Fu, X.-D. & Ares, M. Jr Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701 (2014).
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Grønning, A. G. B. et al. DeepCLIP: predicting the effect of mutations on protein-RNA binding with deep learning. Nucleic Acids Res. 48, 7099–7118 (2020).
Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).
Cano-Gamez, E. & Trynka, G. From GWAS to function: using functional genomics to identify the mechanisms underlying complex diseases. Front. Genet. 11, 424 (2020).
Klim, J. R. et al. ALS-implicated protein TDP-43 sustains levels of STMN2, a mediator of motor neuron growth and repair. Nat. Neurosci. 22, 167–179 (2019).
Tyzack, G. E. et al. Widespread FUS mislocalization is a molecular hallmark of amyotrophic lateral sclerosis. Brain 142, 2572–2580 (2019).
Ziff, O. J. et al. Nucleocytoplasmic mRNA redistribution accompanies RNA binding protein mislocalization in ALS motor neurons and is restored by VCP ATPase inhibition. Neuron 111, 3011–3027.e7 (2023).
Wood, M. J. A., Talbot, K. & Bowerman, M. Spinal muscular atrophy: antisense oligonucleotide therapy opens the door to an integrated therapeutic landscape. Hum. Mol. Genet. 26, R151–R159 (2017).
Raal, F. J. et al. Mipomersen, an apolipoprotein B synthesis inhibitor, for lowering of LDL cholesterol concentrations in patients with homozygous familial hypercholesterolaemia: a randomised, double-blind, placebo-controlled trial. Lancet 375, 998–1006 (2010).
Mercuri, E. et al. Nusinersen versus sham control in later-onset spinal muscular atrophy. N. Engl. J. Med. 378, 625–635 (2018).
Finkel, R. S. et al. Nusinersen versus sham control in infantile-onset spinal muscular atrophy. N. Engl. J. Med. 377, 1723–1732 (2017).
Mendell, J. R. et al. Eteplirsen for the treatment of Duchenne muscular dystrophy. Ann. Neurol. 74, 637–647 (2013).
Benson, M. D. et al. Inotersen treatment for patients with hereditary transthyretin amyloidosis. N. Engl. J. Med. 379, 22–31 (2018).
Frank, D. E. et al. Increased dystrophin production with golodirsen in patients with Duchenne muscular dystrophy. Neurology 94, e2270–e2282 (2020).
Witztum, J. L. et al. Volanesorsen and triglyceride levels in familial chylomicronemia syndrome. N. Engl. J. Med. 381, 531–542 (2019).
Adams, D. et al. Patisiran, an RNAi therapeutic, for hereditary transthyretin amyloidosis. N. Engl. J. Med. 379, 11–21 (2018).
Balwani, M. et al. Phase 3 trial of RNAi therapeutic givosiran for acute intermittent porphyria. N. Engl. J. Med. 382, 2289–2301 (2020).
Garrelfs, S. F. et al. Lumasiran, an RNAi therapeutic for primary hyperoxaluria type 1. N. Engl. J. Med. 384, 1216–1226 (2021).
Ray, K. K. et al. Two phase 3 trials of inclisiran in patients with elevated LDL cholesterol. N. Engl. J. Med. 382, 1507–1519 (2020).
Clemens, P. R. et al. Long-term functional efficacy and safety of viltolarsen in patients with Duchenne muscular dystrophy. J. Neuromuscul. Dis. 9, 493–501 (2022).
Wagner, K. R. et al. Safety, tolerability, and pharmacokinetics of casimersen in patients with Duchenne muscular dystrophy amenable to exon 45 skipping: a randomized, double-blind, placebo-controlled, dose-titration trial. Muscle Nerve 64, 285–292 (2021).
Liu, A. et al. Nedosiran, a candidate siRNA drug for the treatment of primary hyperoxaluria: design, development, and clinical studies. ACS Pharmacol. Transl. Sci. 5, 1007–1016 (2022).
Miller, T. M. et al. Trial of antisense oligonucleotide tofersen for ALS. N. Engl. J. Med. 387, 1099–1110 (2022).
van Roon-Mom, W., Ferguson, C. & Aartsma-Rus, A. From failure to meet the clinical endpoint to US food and drug administration approval: 15th antisense oligonucleotide therapy approved qalsody (tofersen) for treatment of SOD1 mutated amyotrophic lateral sclerosis. Nucleic Acid. Ther. 33, 234–237 (2023).
Korobeynikov, V. A., Lyashchenko, A. K., Blanco-Redondo, B., Jafar-Nejad, P. & Shneider, N. A. Antisense oligonucleotide silencing of FUS expression as a therapeutic approach in amyotrophic lateral sclerosis. Nat. Med. 28, 104–116 (2022).
Musa, D. A., Raji, M. O., Sikiru, A. B., Aremu, K. H. & Aigboeghian, E. A. Promising RNA-based therapies for viral infections, genetic disorders and cancer. Acad. Mol. Biol. Genomics https://doi.org/10.20935/acadmolbiogen7329 (2024).
Wang, Q. et al. Cell cycle regulation by alternative polyadenylation of CCND1. Sci. Rep. 8, 6824 (2018).
Ng, K. P. et al. A common BIM deletion polymorphism mediates intrinsic resistance and inferior responses to tyrosine kinase inhibitors in cancer. Nat. Med. 18, 521–528 (2012).
Sotillo, E. et al. Convergence of acquired mutations and alternative splicing of CD19 enables resistance to CART-19 immunotherapy. Cancer Discov. 5, 1282–1295 (2015).
Sobczak, K. & Krzyzosiak, W. J. Structural determinants of BRCA1 translational regulation. J. Biol. Chem. 277, 17349–17358 (2002).
Mayr, C. & Bartel, D. P. Widespread shortening of 3′UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673–684 (2009).
Mitschka, S. & Mayr, C. Context-specific regulation and function of mRNA alternative polyadenylation. Nat. Rev. Mol. Cell Biol. 23, 779–796 (2022).
Liang, X.-H. et al. Translation efficiency of mRNAs is increased by antisense oligonucleotides targeting upstream open reading frames. Nat. Biotechnol. 34, 875–880 (2016).
Liang, X.-H. et al. Antisense oligonucleotides targeting translation inhibitory elements in 5′ UTRs can selectively increase protein levels. Nucleic Acids Res. 45, 9528–9546 (2017).
Zhao, Y., Oono, K., Takizawa, H. & Kotera, M. GenerRNA: a generative pre-trained language model for de novo RNA design. PLoS ONE 19, e0310814 (2024).
Holdt, L. M., Kohlmaier, A. & Teupser, D. Circular RNAs as therapeutic agents and targets. Front. Physiol. 9, 1262 (2018).
Touznik, A., Maruyama, R., Hosoki, K., Echigoya, Y. & Yokota, T. LNA/DNA mixmer-based antisense oligonucleotides correct alternative splicing of the SMN2 gene and restore SMN protein expression in type 1 SMA fibroblasts. Sci. Rep. 7, 3672 (2017).
Roux, B. T., Lindsay, M. A. & Heward, J. A. Knockdown of nuclear-located enhancer RNAs and long ncRNAs using locked nucleic acid GapmeRs. Methods Mol. Biol. 1468, 11–18 (2017).
Amodio, N. et al. Drugging the lncRNA MALAT1 via LNA gapmeR ASO inhibits gene expression of proteasome subunits and triggers anti-multiple myeloma activity. Leukemia 32, 1948–1957 (2018).
Chen, R. et al. Engineering circular RNA for enhanced protein production. Nat. Biotechnol. 41, 262–272 (2023).
Zhang, G. et al. KGANSynergy: knowledge graph attention network for drug synergy prediction. Brief. Bioinform. 24, bbad167 (2023).
Vignac, C. et al. DiGress: discrete denoising diffusion for graph generation. In Proceedings of the International Conference on Learning Representations (ICLR, 2023).
Zheng, S. et al. Predicting equilibrium distributions for molecular systems with deep learning. Nat. Mach. Intell. 6, 558–567 (2024).
Nguyen, E. et al. Sequence modelling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).
Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun. 15, 1–18 (2024).
Cacciarelli, D. & Kulahci, M. Active learning for data streams: a survey. Mach. Learn. https://doi.org/10.1007/s10994-023-06454-2 (2023).
Fournier, Q. et al. Protein language models: is scaling necessary? Preprint at bioRxiv https://doi.org/10.1101/2024.09.23.614603 (2024).
Outeiral, C. & Deane, C. M. Codon language embeddings provide strong signals for use in protein engineering. Nat. Mach. Intell. 6, 170–179 (2024).
Naghipourfar, M. et al. A suite of foundation models captures the contextual interplay between codons. Preprint at bioRxiv https://doi.org/10.1101/2024.10.10.617568 (2024).
Celaj, A. et al. An RNA foundation model enables discovery of disease mechanisms and candidate therapeutics. Preprint at bioRxiv https://doi.org/10.1101/2023.09.20.558508 (2023).
Ren, Z. et al. CodonBERT: a BERT-based architecture tailored for codon optimization using the cross-attention mechanism. Bioinformatics 40, btae330 (2024).
Boccaletto, P. et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 46, D303–D307 (2018).
Varani, L. et al. The NMR structure of the 38 kDa U1A protein–PIE RNA complex reveals the basis of cooperativity in regulation of polyadenylation by human U1A protein. Nat. Struct. Biol. 7, 329–335 (2000).
Hennig, J. et al. Structural basis for the assembly of the Sxl–Unr translation regulatory complex. Nature 515, 287–290 (2014).
Chen, S.-J. RNA folding: conformational statistics, folding kinetics, and ion electrostatics. Annu. Rev. Biophys. 37, 197–214 (2008).
Halvorsen, M., Martin, J. S., Broadaway, S. & Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 6, e1001074 (2010).
Mortimer, S. A. & Weeks, K. M. Time-resolved RNA SHAPE chemistry: quantitative RNA structure analysis in one-second snapshots and at single-nucleotide resolution. Nat. Protoc. 4, 1413–1421 (2009).
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
RNAcentral Consortium RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 49, D212–D220 (2021).
Harrison, P. W. et al. Ensembl 2024. Nucleic Acids Res. 52, D891–D899 (2024).
Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res. 51, D29–D38 (2023).
Kenton, J. D. M.-W. & Chang, L. K. Bert: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1, 4171–4186 (2019).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Celledoni, E. et al. Structure-preserving deep learning. Eur. J. Appl. Math. 32, 888–936 (2021).
Lai, P., Zhang, Z., Zhang, W., Fu, F. & Cui, B. Enhancing unsupervised sentence embeddings via knowledge-driven data augmentation and Gaussian-decayed contrastive learning. Preprint at https://doi.org/10.48550/arXiv.2409.12887 (2024).
Glauer, M., Neuhaus, F., Mossakowski, T. & Hastings, J. Ontology pre-training for poison prediction. In German Conference on Artificial Intelligence (Künstliche Intelligenz) 31–45 (Springer Nature, 2023).
Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).
Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).
Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at https://doi.org/10.48550/arXiv.2104.13478 (2021).
Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32, 4–24 (2021).
Chami, I., Abu-El-Haija, S., Perozzi, B., Ré, C. & Murphy, K. Machine learning on graphs: a model and comprehensive taxonomy. J. Mach. Learn. Res. 23, 1–64 (2022).
Müller, L., Galkin, M., Morris, C. & Rampášek, L. Attending to graph transformers. Trans. Mach. Learn. Res. 2835–8856 (2024).
Kipf, T. N. & Welling, M. Variational graph auto-encoders. In NIPS Workshop on Bayesian Deep Learning (2016).
Yang, Z. et al. Understanding negative sampling in graph representation learning. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1666–1676 (ACM, 2020).
Acknowledgements
This work is supported by the Swiss National Science Foundation (310030_207907 to Z.M.X.; 215906 to C.T. and J.H.; 1000144 to C.V.C.; and 205121_207437 to L.V.P.). A.R. and M.D. are supported by a Wellcome Trust Investigator Award (217213/Z/19/Z), Y.W. holds a Non-Clinical Junior Research Fellowship from the Motor Neurone Disease Association (Wang/Oct23/2324-799), and R.P. holds a Lister Research Prize Fellowship.
Author information
Contributions
All authors contributed substantially to discussion of the content. All authors wrote the article. All authors reviewed the manuscript before submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Molecular Cell Biology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Glossary
- Attention maps
Matrix of attention coefficients, often used in the context of self-attention in Transformers. A large attention coefficient between two tokens means that one of them will weigh heavily in the updated representation of the other.
- Contrastive learning
Machine learning technique in which a model learns to differentiate between similar and dissimilar pairs of data by bringing similar pairs closer in the representation space and pushing dissimilar pairs apart.
- Embeddings
Learned, vectorial abstract representations of data.
- Fine-tuning
A secondary training stage for foundation models or other models trained in multiple stages, usually performed on smaller, labelled datasets to specialize the model for a given task.
- Foundation model
A model trained on a large quantity of data to be general purpose and suitable for a variety of predictive tasks for a given data modality, in contrast to task-specific models.
- Graph neural networks (GNNs)
Deep learning models dedicated to learning on graphs.
- Heterogeneous
Graphs are homogeneous when their nodes and edges share comparable features regardless of entity or interaction type. If node or edge features originate from different spaces, the graph is heterogeneous.
- Hierarchical training
A machine learning strategy in which a deep learning model, composed of a succession of components dedicated to the encoding of specific modalities or concepts, is trained to achieve a given final task. Each intermediate component can be associated with a specific objective function to optimize.
- Knowledge graphs
Graph-structured representations of knowledge about a domain, typically comprising one or more ontologies associated with instance data of various types and their inter-relationships.
- Knowledge-based weighting strategies
Strategies in machine learning in which weights are assigned to training data points based on their biological or domain-specific importance rather than treating all data equally. These strategies help models learn from scarce but crucial data, which is particularly beneficial for training with imbalanced datasets.
- Large language models
Deep learning models that are trained in a self-supervised way on sequences. They often feature stacked transformer layers, resulting in millions to billions of parameters.
- Message passing algorithms
A computational method used in graph-based models, where each node exchanges information with other specific nodes in the graph. These nodes can be direct or indirect neighbours of the central node. This process is used in GNNs to iteratively update the representation of each node according to the structure of the graph and the learned representations of neighbouring nodes.
- Multimodal alignment
The process of establishing correspondences between modalities that do not represent the same object but are related in some way.
- Multimodal fusion
The process of combining multiple modalities representing the same object, for example, the sequence, structure and functional annotation of a given RNA sequence.
- Multitask training
A machine learning strategy in which a model is trained to perform multiple tasks simultaneously. The model shares a common underlying architecture for all tasks, but has specific outputs tailored to each task.
- Objective function
The function that is being optimized (minimized or maximized) during training, typically a loss function that represents the difference between what the model predicts and the ground truth.
- Ontology
A knowledge representation structure in which explicit knowledge about a topic is organized into classes and relationships, each of which is given a definition and associated synonyms and other metadata.
- Pretraining
Generally used in the context of foundation models or other transfer learning scenarios, pretraining is the first training step, which allows the model to build general-purpose representations. It is usually performed on large, often unlabelled datasets in a self-supervised manner.
- Self-attention mechanisms
Flexible mechanisms that allow a transformer model to learn the relative, context-specific relationships between input tokens. They calculate attention scores between all elements of the input and output weighted averages computed with those scores.
- Self-supervised learning
A paradigm in machine learning in which a model learns the structure of a dataset from the dataset itself without additional explicit labels, for example, by being tasked to learn to predict masked or missing parts of the data given surrounding elements.
- Semantic loss function
A type of loss function used in machine learning models to ensure that the predictions made by the model adhere to specific domain knowledge or logical constraints, in addition to minimizing the prediction error.
- Supervised deep learning
A machine learning paradigm in which an input object (for example, a sequence), together with a desired output value (for example, the type of sequence), is used to train a predictive model capable of inferring outputs of this type for new inputs.
- Tokenization
The process of splitting a sequence into defined subunits, called ‘tokens’. RNA sequences are often split into single nucleotides or into overlapping k-mers.
- Transfer learning
A paradigm in machine learning in which a model is trained in different stages, with the learned parameters from earlier stages in the training being retained or ‘transferred’ into later stages, where they benefit downstream predictions.
- Transformer architecture
A popular and powerful deep learning architecture characterized by an attention mechanism and positional encodings that allow complex contextual relationships to be learned. In its original form, the architecture is made up of transformer blocks, each consisting of a self-attention layer followed by a linear (feed-forward) transformation layer, with a residual connection and a layer normalization step after each of the two.
- Vectorized representations
Abstract, numerical representations of data in the form of vectors of (usually) continuous values.
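As an illustrative sketch that is not taken from the article, the glossary entries on tokenization, embeddings, attention maps and self-attention mechanisms can be connected in a few lines of Python. The function names (`kmer_tokenize`, `toy_embeddings`, `self_attention`), the example sequence, the choice of k = 3 and the random four-dimensional embeddings are all assumptions made for illustration. Real transformer models additionally apply learned query, key and value projections and positional encodings, which are omitted here for brevity.

```python
import math
import random


def kmer_tokenize(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embeddings(tokens, dim=4, seed=0):
    """Assign each distinct token a random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[tok] for tok in tokens]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections.

    Returns the attention map (one row of coefficients per token) and the
    updated token representations (attention-weighted averages).
    """
    dim = len(embeddings[0])
    attention_map, outputs = [], []
    for query in embeddings:
        # Score the query token against every token in the input.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        attention_map.append(weights)
        # Updated representation: weighted average of all token vectors.
        outputs.append(
            [sum(w * vec[j] for w, vec in zip(weights, embeddings)) for j in range(dim)]
        )
    return attention_map, outputs


tokens = kmer_tokenize("AUGGCUAA", k=3)
# tokens == ['AUG', 'UGG', 'GGC', 'GCU', 'CUA', 'UAA']
attn, out = self_attention(toy_embeddings(tokens))
print([round(w, 3) for w in attn[0]])  # attention coefficients for the first token
```

Each row of `attn` is one row of the attention map and sums to 1. A large entry in row i, column j means that token j weighs heavily in the updated representation of token i, as described in the attention maps entry.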
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jung, V., Vincent-Cuaz, C., Tumescheit, C. et al. Decoding the interactions and functions of non-coding RNA with artificial intelligence. Nat Rev Mol Cell Biol 26, 797–818 (2025). https://doi.org/10.1038/s41580-025-00857-w
DOI: https://doi.org/10.1038/s41580-025-00857-w


