Abstract
Despite recent advances in RNA-targeting drug discovery, the development of data-driven deep learning models remains challenging owing to limited validated RNA–small molecule interaction data and scarce known RNA structures. In this context, we introduce RNAsmol, a sequence-based deep learning framework that incorporates data perturbation with augmentation, graph-based molecular feature representation and attention-based feature fusion modules to predict RNA–small molecule interactions. RNAsmol employs perturbation strategies to balance the bias between the true negative and unknown interaction space, thereby elucidating the intrinsic binding patterns between RNA and small molecules. The resulting model demonstrates accurate predictions of the binding between RNA and small molecules, outperforming other methods in ten-fold cross-validation, unseen evaluation and decoy evaluation. Moreover, we use case studies to visualize molecular binding profiles and the distribution of learned weights, providing interpretable insights into RNAsmol’s predictions. In particular, without requiring structural input, RNAsmol can generate reliable predictions and be adapted to various drug design scenarios.
Data availability
All datasets used in this study are publicly available for academic use. The RCSB PDB is available at https://www.rcsb.org/. The ROBIN dataset is available at https://github.com/ky66/ROBIN. The non-canonical base-pairing files of structures in the PDB are available at https://www.bgsu.edu/research/rna/databases.html. The ZINC database subsets are available at https://zinc.docking.org/substances/subsets/. The BindingDB protein binder is available at https://www.bindingdb.org/rwd/bind/chemsearch/marvin/Download.jsp. The COCONUT database is available at https://coconut.naturalproducts.net/download. The ChemBridge BuildingBlocks (chbrbb) dataset is available at https://zinc12.docking.org/catalogs/chbrbb#download-vendor-info. Instructions for access and usage are provided via GitHub at https://github.com/hongli-ma/RNAsmol under a GNU General Public License v3.0 and via Zenodo at https://doi.org/10.5281/zenodo.15331739 (ref. 66). Source data are provided with this paper.
Code availability
Source code is available via GitHub at https://github.com/hongli-ma/RNAsmol under a GNU General Public License v3.0 and via Zenodo at https://doi.org/10.5281/zenodo.15331739 (ref. 66) with detailed instructions.
References
Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).
Schneider, G. Mind and machine in drug design. Nat. Mach. Intell. 1, 128–130 (2019).
Warner, K. D., Hajdin, C. E. & Weeks, K. M. Principles for targeting RNA with drug-like small molecules. Nat. Rev. Drug Discov. 17, 547–558 (2018).
Childs-Disney, J. L. et al. Targeting RNA structures with small molecules. Nat. Rev. Drug Discov. 21, 736–762 (2022).
Knox, C. et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res. 52, D1265–D1275 (2024).
Poehlsgaard, J. & Douthwaite, S. The bacterial ribosome as a target for antibiotics. Nat. Rev. Microbiol. 3, 870–881 (2005).
Davis, B. D. Mechanism of bactericidal action of aminoglycosides. Microbiol. Rev. 51, 341–350 (1987).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
Sheridan, C. First small-molecule drug targeting RNA gains momentum. Nat. Biotechnol. 39, 6–8 (2021).
Howe, J. A. et al. Selective small-molecule inhibition of an RNA structural element. Nature 526, 672–677 (2015).
Aguilar, R. et al. Targeting Xist with compounds that disrupt RNA structure and X inactivation. Nature 604, 160–166 (2022).
Kaur, J. et al. RNA–small-molecule interaction: challenging the “undruggable” tag. J. Med. Chem. 67, 4259–4297 (2024).
Ratni, H. et al. Discovery of risdiplam, a selective survival of motor neuron-2 (SMN2) gene splicing modifier for the treatment of spinal muscular atrophy (SMA). J. Med. Chem. 61, 6501–6517 (2018).
Rube, H. T. et al. Prediction of protein–ligand binding affinity from sequencing data with interpretable machine learning. Nat. Biotechnol. 40, 1520–1527 (2022).
Chen, L. et al. Sequence-based drug design as a concept in computational drug design. Nat. Commun. 14, 4217 (2023).
Kovachka, S. et al. Small molecule approaches to targeting RNA. Nat. Rev. Chem. 8, 120–135 (2024).
Velagapudi, S. P., Gallo, S. M. & Disney, M. D. Sequence-based design of bioactive small molecules that target precursor microRNAs. Nat. Chem. Biol. 10, 291–297 (2014).
Tong, Y. et al. Programming inactive RNA-binding small molecules into bioactive degraders. Nature 618, 169–179 (2023).
Yazdani, K. et al. Machine learning informs RNA-binding chemical space. Angew. Chem. Int. Ed. Engl. 62, e202211358 (2023).
Disney, M. D. et al. Inforna 2.0: a platform for the sequence-based design of small molecules targeting structured RNAs. ACS Chem. Biol. 11, 1720–1728 (2016).
Sun, S., Yang, J. & Zhang, Z. RNALigands: a database and web server for RNA–ligand interactions. RNA 28, 115–122 (2022).
Krishnan, S. R., Roy, A. & Gromiha, M. M. Reliable method for predicting the binding affinity of RNA–small molecule interactions using machine learning. Brief. Bioinform. 25, bbae002 (2024).
Ruiz-Carmona, S. et al. rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput. Biol. 10, e1003571 (2014).
Sun, L. Z. et al. RLDOCK: a new method for predicting RNA–ligand interactions. J. Chem. Theory Comput. 16, 7173–7183 (2020).
Eberhardt, J. et al. AutoDock Vina 1.2.0: new docking methods, expanded force field, and python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).
Su, H., Peng, Z. & Yang, J. Recognition of small molecule–RNA binding sites using RNA sequence and structure. Bioinformatics 37, 36–42 (2021).
Wang, K. et al. RBind: computational network method to predict RNA binding sites. Bioinformatics 34, 3131–3136 (2018).
Liu, H. et al. RNet: a network strategy to predict RNA binding preferences. Brief. Bioinform. 25, bbad482 (2023).
Oliver, C. et al. Augmented base pairing networks encode RNA–small molecule binding preferences. Nucleic Acids Res. 48, 7690–7699 (2020).
Deng, Z., Gu, R. & Bi, H. Predicting ligand–RNA binding using E3-equivariant network and pretraining. In Proc. Machine Learning in Structural Biology Workshop https://neurips.cc/media/PosterPDFs/NeurIPS%202022/59062.png?t=1669885458.1570177 (NeurIPS, 2022).
Carvajal-Patino, J. G. et al. RNAmigos2: accelerated structure-based RNA virtual screening with deep graph learning. Nat. Commun. https://doi.org/10.1038/s41467-025-57852-0 (2025).
Sun, S. & Gao, L. Contrastive pre-training and 3D convolution neural network for RNA and small molecule binding affinity prediction. Bioinformatics 40, btae155 (2024).
Wang, Y. et al. RNAincoder: a deep learning-based encoder for RNA and RNA-associated interaction. Nucleic Acids Res. 51, W509–W519 (2023).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Nguyen, T. et al. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).
Bai, P. Z. et al. Interpretable bilinear attention network with domain adaptation improves drug-target prediction. Nat. Mach. Intell. 5, 126–136 (2023).
Mastropietro, A., Pasculli, G. & Bajorath, J. Learning characteristics of graph neural networks predicting protein–ligand affinities. Nat. Mach. Intell. 5, 1427–1436 (2023).
Yang, Z. et al. MGraphDTA: deep multiscale graph neural network for explainable drug–target binding affinity prediction. Chem. Sci. 13, 816–833 (2022).
Cheng, Z. J. et al. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 38, 4153–4161 (2022).
Cao, D. H. et al. Generic protein–ligand interaction scoring by integrating physical prior knowledge and data augmentation modelling. Nat. Mach. Intell. 6, 688–700 (2024).
Alipanahi, B. et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Zhu, H. et al. Dynamic characterization and interpretation for protein–RNA interactions across diverse cellular conditions using HDRNet. Nat. Commun. 14, 6824 (2023).
Sweeney, B. A. et al. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 49, D212–D220 (2021).
Cherkasov, A. The ‘Big Bang’ of the chemical universe. Nat. Chem. Biol. 19, 667–668 (2023).
Lyu, J., Irwin, J. J. & Shoichet, B. K. Modeling the expansion of virtual screening libraries. Nat. Chem. Biol. 19, 712–718 (2023).
Cao, Z. & Zhang, S. Simple tricks of convolutional neural network architectures improve DNA–protein binding prediction. Bioinformatics 35, 1837–1843 (2019).
de Almeida, B. P. et al. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022).
Toneyan, S., Tang, Z. & Koo, P. K. Evaluating deep learning for predicting epigenomic profiles. Nat. Mach. Intell. 4, 1088–1100 (2022).
Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).
Duncan, A. G., Mitchell, J. A. & Moses, A. M. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics 40, btae190 (2024).
Hemmerich, J., Asilar, E. & Ecker, G. F. COVER: conformational oversampling as data augmentation for molecules. J. Cheminform. 12, 18 (2020).
Diao, Y. et al. Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery. Nat. Commun. 14, 4552 (2023).
Irwin, J. J. et al. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 52, 1757–1768 (2012).
Sorokina, M. et al. COCONUT online: Collection of Open Natural Products database. J. Cheminform. 13, 2 (2021).
Gilson, M. K. & Liu, T. BindingDB: Measured Binding Data for Protein–Ligand and Other Molecular Systems (Univ. California San Diego Library Digital Collections, 2023).
Donlic, A. et al. Discovery of small molecule ligands for MALAT1 by tuning an RNA-binding scaffold. Angew. Chem. Int. Ed. Engl. 57, 13242–13247 (2018).
Shen, C. et al. Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Brief. Bioinform. 22, bbaa070 (2021).
Volkov, M. et al. On the frustration to predict binding affinities from protein–ligand structures with deep neural networks. J. Med. Chem. 65, 7946–7958 (2022).
Moller, L. et al. Translating from proteins to ribonucleic acids for ligand-binding site detection. Mol. Inf. 41, e2200059 (2022).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Cereto-Massague, A. et al. DecoyFinder: an easy-to-use python GUI application for building target-specific decoy sets. Bioinformatics 28, 1661–1662 (2012).
Laskowski, R. A. & Swindells, M. B. LigPlot+: multiple ligand–protein interaction diagrams for drug discovery. J. Chem. Inf. Model. 51, 2778–2786 (2011).
PyMOL Molecular Graphics System v1.8 (Schrodinger LLC, 2015).
Ma, H. RNAsmol. Zenodo https://doi.org/10.5281/zenodo.15331739 (2025).
Acknowledgements
This work was supported by the National Key Research and Development Program of China (grant nos. 2024YFC2510300 and 2024YFC3405900 to Z.J.L. and 2022YFA1304200 and 2020YFA0509600 to Z.Z.X.), the National Natural Science Foundation of China (grant nos. 82371855, 82341101 and 32170671 to Z.J.L.) and the Tsinghua University Initiative Scientific Research Program of Precision Medicine (grant no. 2022ZLA003 to Z.J.L.). This study was also supported by the BioComputing Platform of the Tsinghua University Branch of China National Center for Protein Sciences.
Author information
Contributions
H.M., Z.J.L. and Z.Z.X. conceived and designed the project. H.M., Y.J. and K.L. completed the preprocessing of the data. H.M. and Y.B. developed the framework of the model and performed the experiments. H.M., L.G. and J.M. performed the evaluation of the model and analyses. H.M. wrote the manuscript. Z.J.L., Z.Z.X., Y.J., X.L., P.B. and J.M. revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–10, Figs. 1–21 and Tables 1 and 2.
Source data
Source Data Fig. 2.
Model performance and statistical source data.
Source Data Fig. 3.
Model performance and statistical source data.
Source Data Fig. 4.
Molecular properties, model performance and statistical source data.
Source Data Fig. 5.
Model performance and statistical source data.
Source Data Fig. 6.
Structural distance and Grad-CAM weights.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, H., Gao, L., Jin, Y. et al. RNA–ligand interaction scoring via data perturbation and augmentation modeling. Nat Comput Sci 5, 648–660 (2025). https://doi.org/10.1038/s43588-025-00820-x
DOI: https://doi.org/10.1038/s43588-025-00820-x