RNA–ligand interaction scoring via data perturbation and augmentation modeling

Abstract

Despite recent advances in RNA-targeting drug discovery, the development of data-driven deep learning models remains challenging owing to limited validated RNA–small molecule interaction data and scarce known RNA structures. In this context, we introduce RNAsmol, a sequence-based deep learning framework that incorporates data perturbation with augmentation, graph-based molecular feature representation and attention-based feature fusion modules to predict RNA–small molecule interactions. RNAsmol employs perturbation strategies to balance the bias between the true negative and unknown interaction space, thereby elucidating the intrinsic binding patterns between RNA and small molecules. The resulting model demonstrates accurate predictions of the binding between RNA and small molecules, outperforming other methods in ten-fold cross-validation, unseen evaluation and decoy evaluation. Moreover, we use case studies to visualize molecular binding profiles and the distribution of learned weights, providing interpretable insights into RNAsmol’s predictions. In particular, without requiring structural input, RNAsmol can generate reliable predictions and be adapted to various drug design scenarios.
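The abstract does not spell out how the perturbation strategies are implemented. As a purely illustrative sketch, and not the authors' method, one common way to build presumed-negative pairs is to keep one side of each known interaction and swap the other side for an entry from a background pool. The function name `perturb_pairs`, its parameters and the toy sequences and SMILES strings below are all hypothetical.

```python
import random


def perturb_pairs(positives, pool, side, n_per_pair=1, seed=0):
    """Build presumed-negative pairs by swapping one side of each positive pair.

    Assumption (not taken from the paper): side="mol" swaps the small
    molecule, side="rna" swaps the RNA. Pairs already known to interact
    are never emitted as negatives.
    """
    if side not in ("mol", "rna"):
        raise ValueError("side must be 'mol' or 'rna'")
    rng = random.Random(seed)  # local generator keeps the sampling reproducible
    known = set(positives)
    negatives = []
    for rna, mol in positives:
        if side == "mol":
            candidates = [(rna, m) for m in pool if (rna, m) not in known]
        else:
            candidates = [(r, mol) for r in pool if (r, mol) not in known]
        # Draw without replacement, capped by the number of candidates.
        negatives.extend(rng.sample(candidates, min(n_per_pair, len(candidates))))
    return negatives


# Toy data: (RNA sequence, SMILES) pairs, invented for illustration only.
positives = [("GGCACUGA", "CCO"), ("AUGGCUAC", "c1ccccc1")]
molecule_pool = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
negatives = perturb_pairs(positives, molecule_pool, side="mol", n_per_pair=2)
```

Pairs generated this way are only presumed negatives, since an untested pair may still bind; this gap between true negatives and the unknown interaction space is the bias the abstract refers to.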

Fig. 1: Overview of the RNAsmol framework.
Fig. 2: Performance comparison in predicting RNA–ligand interaction based on ten-fold CV and unseen evaluation.
Fig. 3: Applications of small molecule perturbation (ρm) and RNA perturbation (ρr) on the PDB and ROBIN datasets.
Fig. 4: Optimization of the small molecule perturbation (ρm) for decoy evaluation.
Fig. 5: Performance comparison in virtual screening based on decoy evaluation.
Fig. 6: Case study visualizations of molecular hotspots of RNAsmol predictions.

Data availability

All datasets used in this study are publicly available for academic use. The RCSB PDB is available at https://www.rcsb.org/. The ROBIN dataset is available at https://github.com/ky66/ROBIN. The non-canonical base-pairing files of structures in the PDB are available at https://www.bgsu.edu/research/rna/databases.html. The ZINC database subsets are available at https://zinc.docking.org/substances/subsets/. The BindingDB protein binder is available at https://www.bindingdb.org/rwd/bind/chemsearch/marvin/Download.jsp. The COCONUT database is available at https://coconut.naturalproducts.net/download. The ChemBridge BuildingBlocks (chbrbb) dataset is available at https://zinc12.docking.org/catalogs/chbrbb#download-vendor-info. Instructions for access and usage are provided via GitHub at https://github.com/hongli-ma/RNAsmol under a GNU General Public License v3.0 and via Zenodo at https://doi.org/10.5281/zenodo.15331739 (ref. 66). Source data are provided with this paper.

Code availability

Source code is available via GitHub at https://github.com/hongli-ma/RNAsmol under a GNU General Public License v3.0 and via Zenodo at https://doi.org/10.5281/zenodo.15331739 (ref. 66) with detailed instructions.

References

  1. Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).

  2. Schneider, G. Mind and machine in drug design. Nat. Mach. Intell. 1, 128–130 (2019).

  3. Warner, K. D., Hajdin, C. E. & Weeks, K. M. Principles for targeting RNA with drug-like small molecules. Nat. Rev. Drug Discov. 17, 547–558 (2018).

  4. Childs-Disney, J. L. et al. Targeting RNA structures with small molecules. Nat. Rev. Drug Discov. 21, 736–762 (2022).

  5. Knox, C. et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res. 52, D1265–D1275 (2024).

  6. Poehlsgaard, J. & Douthwaite, S. The bacterial ribosome as a target for antibiotics. Nat. Rev. Microbiol. 3, 870–881 (2005).

  7. Davis, B. D. Mechanism of bactericidal action of aminoglycosides. Microbiol Rev. 51, 341–350 (1987).

  8. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).

  9. Sheridan, C. First small-molecule drug targeting RNA gains momentum. Nat. Biotechnol. 39, 6–8 (2021).

  10. Howe, J. A. et al. Selective small-molecule inhibition of an RNA structural element. Nature 526, 672–677 (2015).

  11. Aguilar, R. et al. Targeting Xist with compounds that disrupt RNA structure and X inactivation. Nature 604, 160–166 (2022).

  12. Kaur, J. et al. RNA–small-molecule interaction: challenging the “undruggable” tag. J. Med. Chem. 67, 4259–4297 (2024).

  13. Ratni, H. et al. Discovery of risdiplam, a selective survival of motor neuron-2 (SMN2) gene splicing modifier for the treatment of spinal muscular atrophy (SMA). J. Med. Chem. 61, 6501–6517 (2018).

  14. Rube, H. T. et al. Prediction of protein–ligand binding affinity from sequencing data with interpretable machine learning. Nat. Biotechnol. 40, 1520–1527 (2022).

  15. Chen, L. et al. Sequence-based drug design as a concept in computational drug design. Nat. Commun. 14, 4217 (2023).

  16. Kovachka, S. et al. Small molecule approaches to targeting RNA. Nat. Rev. Chem. 8, 120–135 (2024).

  17. Velagapudi, S. P., Gallo, S. M. & Disney, M. D. Sequence-based design of bioactive small molecules that target precursor microRNAs. Nat. Chem. Biol. 10, 291–297 (2014).

  18. Tong, Y. et al. Programming inactive RNA-binding small molecules into bioactive degraders. Nature 618, 169–179 (2023).

  19. Yazdani, K. et al. Machine learning informs RNA-binding chemical space. Angew. Chem. Int. Ed. Engl. 62, e202211358 (2023).

  20. Disney, M. D. et al. Inforna 2.0: a platform for the sequence-based design of small molecules targeting structured RNAs. ACS Chem. Biol. 11, 1720–1728 (2016).

  21. Sun, S., Yang, J. & Zhang, Z. RNALigands: a database and web server for RNA–ligand interactions. RNA 28, 115–122 (2022).

  22. Krishnan, S. R., Roy, A. & Gromiha, M. M. Reliable method for predicting the binding affinity of RNA–small molecule interactions using machine learning. Brief. Bioinform 25, bbae002 (2024).

  23. Ruiz-Carmona, S. et al. rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput. Biol. 10, e1003571 (2014).

  24. Sun, L. Z. et al. RLDOCK: a new method for predicting RNA–ligand interactions. J. Chem. Theory Comput. 16, 7173–7183 (2020).

  25. Eberhardt, J. et al. AutoDock Vina 1.2.0: new docking methods, expanded force field, and python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).

  26. Su, H., Peng, Z. & Yang, J. Recognition of small molecule–RNA binding sites using RNA sequence and structure. Bioinformatics 37, 36–42 (2021).

  27. Wang, K. et al. RBind: computational network method to predict RNA binding sites. Bioinformatics 34, 3131–3136 (2018).

  28. Liu, H. et al. RNet: a network strategy to predict RNA binding preferences. Brief. Bioinform 25, bbad482 (2023).

  29. Oliver, C. et al. Augmented base pairing networks encode RNA–small molecule binding preferences. Nucleic Acids Res. 48, 7690–7699 (2020).

  30. Deng, Z., Gu, R. & Bi, H. Predicting ligand–RNA binding using E3-equivariant network and pretraining. In Proc. Machine Learning in Structural Biology Workshop https://neurips.cc/media/PosterPDFs/NeurIPS%202022/59062.png?t=1669885458.1570177 (NeurIPS, 2022).

  31. Carvajal-Patino, J. G. et al. RNAmigos2: accelerated structure-based RNA virtual screening with deep graph learning. Nat. Commun. https://doi.org/10.1038/s41467-025-57852-0 (2025).

  32. Sun, S. & Gao, L. Contrastive pre-training and 3D convolution neural network for RNA and small molecule binding affinity prediction. Bioinformatics 40, btae155 (2024).

  33. Wang, Y. et al. RNAincoder: a deep learning-based encoder for RNA and RNA-associated interaction. Nucleic Acids Res. 51, W509–W519 (2023).

  34. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

  35. Nguyen, T. et al. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).

  36. Bai, P. Z. et al. Interpretable bilinear attention network with domain adaptation improves drug-target prediction. Nat. Mach. Intell. 5, 126–136 (2023).

  37. Mastropietro, A., Pasculli, G. & Bajorath, J. Learning characteristics of graph neural networks predicting protein–ligand affinities. Nat. Mach. Intell. 5, 1427–1436 (2023).

  38. Yang, Z. et al. MGraphDTA: deep multiscale graph neural network for explainable drug–target binding affinity prediction. Chem. Sci. 13, 816–833 (2022).

  39. Cheng, Z. J. et al. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 38, 4153–4161 (2022).

  40. Cao, D. H. et al. Generic protein–ligand interaction scoring by integrating physical prior knowledge and data augmentation modelling. Nat. Mach. Intell. 6, 688–700 (2024).

  41. Alipanahi, B. et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).

  42. Zhu, H. et al. Dynamic characterization and interpretation for protein–RNA interactions across diverse cellular conditions using HDRNet. Nat. Commun. 14, 6824 (2023).

  43. Sweeney, B. A. et al. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 49, D212–D220 (2021).

  44. Cherkasov, A. The ‘Big Bang’ of the chemical universe. Nat. Chem. Biol. 19, 667–668 (2023).

  45. Lyu, J., Irwin, J. J. & Shoichet, B. K. Modeling the expansion of virtual screening libraries. Nat. Chem. Biol. 19, 712–718 (2023).

  46. Cao, Z. & Zhang, S. Simple tricks of convolutional neural network architectures improve DNA–protein binding prediction. Bioinformatics 35, 1837–1843 (2019).

  47. de Almeida, B. P. et al. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022).

  48. Toneyan, S., Tang, Z. & Koo, P. K. Evaluating deep learning for predicting epigenomic profiles. Nat. Mach. Intell. 4, 1088–1100 (2022).

  49. Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).

  50. Duncan, A. G., Mitchell, J. A. & Moses, A. M. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics 40, btae190 (2024).

  51. Hemmerich, J., Asilar, E. & Ecker, G. F. COVER: conformational oversampling as data augmentation for molecules. J. Cheminform 12, 18 (2020).

  52. Diao, Y. et al. Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery. Nat. Commun. 14, 4552 (2023).

  53. Irwin, J. J. et al. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 52, 1757–1768 (2012).

  54. Sorokina, M. et al. COCONUT online: Collection of Open Natural Products database. J. Cheminform 13, 2 (2021).

  55. Gilson, M. K. & Liu, T. BindingDB: Measured Binding Data for Protein–Ligand and Other Molecular Systems (Univ. California San Diego Library Digital Collections, 2023).

  56. Donlic, A. et al. Discovery of small molecule ligands for MALAT1 by tuning an RNA-binding scaffold. Angew. Chem. Int. Ed. Engl. 57, 13242–13247 (2018).

  57. Shen, C. et al. Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Brief. Bioinform 22, bbaa070 (2021).

  58. Volkov, M. et al. On the frustration to predict binding affinities from protein–ligand structures with deep neural networks. J. Med. Chem. 65, 7946–7958 (2022).

  59. Moller, L. et al. Translating from proteins to ribonucleic acids for ligand-binding site detection. Mol. Inf. 41, e2200059 (2022).

  60. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).

  61. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

  62. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).

  63. Cereto-Massague, A. et al. DecoyFinder: an easy-to-use python GUI application for building target-specific decoy sets. Bioinformatics 28, 1661–1662 (2012).

  64. Laskowski, R. A. & Swindells, M. B. LigPlot+: multiple ligand–protein interaction diagrams for drug discovery. J. Chem. Inf. Model. 51, 2778–2786 (2011).

  65. PyMOL Molecular Graphics System v1.8 (Schrodinger LLC, 2015).

  66. Ma, H. RNAsmol. Zenodo https://doi.org/10.5281/zenodo.15331739 (2025).

Acknowledgements

This work is supported by the National Key Research and Development Program of China (grant nos. 2024YFC2510300 and 2024YFC3405900 to Z.J.L. and 2022YFA1304200 and 2020YFA0509600 to Z.Z.X.), the National Natural Science Foundation of China (grant nos. 82371855, 82341101 and 32170671 to Z.J.L.) and the Tsinghua University Initiative Scientific Research Program of Precision Medicine (grant no. 2022ZLA003 to Z.J.L.). This study was also supported by the BioComputing Platform of the Tsinghua University Branch of China National Center for Protein Sciences.

Author information

Contributions

H.M., Z.J.L. and Z.Z.X. conceived and designed the project. H.M., Y.J. and K.L. completed the preprocessing of the data. H.M. and Y.B. developed the framework of the model and performed the experiments. H.M., L.G. and J.M. performed the evaluation of the model and analyses. H.M. wrote the manuscript. Z.J.L., Z.Z.X., Y.J., X.L., P.B. and J.M. revised the manuscript.

Corresponding authors

Correspondence to Zhenjiang Zech Xu or Zhi John Lu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–10, Figs. 1–21 and Tables 1 and 2.

Reporting Summary

Source data

Source Data Fig. 2.

Model performance and statistical source data.

Source Data Fig. 3.

Model performance and statistical source data.

Source Data Fig. 4.

Molecular properties, model performance and statistical source data.

Source Data Fig. 5.

Model performance and statistical source data.

Source Data Fig. 6.

Structural distance and Grad-CAM weights.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, H., Gao, L., Jin, Y. et al. RNA–ligand interaction scoring via data perturbation and augmentation modeling. Nat Comput Sci 5, 648–660 (2025). https://doi.org/10.1038/s43588-025-00820-x

  • DOI: https://doi.org/10.1038/s43588-025-00820-x
