Abstract
RNA plays a wide variety of roles in biological processes. In addition to serving as coding messenger RNA (mRNA), the vast majority of RNAs function as non-coding RNAs (ncRNAs), whose dynamic structural ensembles are critical for mediating diverse biological functions. However, traditional experimental techniques and molecular dynamics (MD) simulations face significant challenges in characterizing the conformational dynamics of RNA, owing to inherent methodological limitations and high computational cost. We herein present DynaRNA, a diffusion-based generative model for RNA conformational ensembles. DynaRNA employs a denoising diffusion probabilistic model (DDPM) with an equivariant graph neural network (EGNN) to directly model RNA 3D coordinates, enabling rapid exploration of RNA conformational space. DynaRNA enables end-to-end generation of RNA conformational ensembles that reproduce experimental geometries without the need for multiple sequence alignment (MSA) information. Our results demonstrate that DynaRNA generates tetranucleotide ensembles with a lower intercalation rate than molecular dynamics simulations. In addition, DynaRNA can capture the rare excited state of the HIV-1 trans-activation response (TAR) element and recapitulate de novo folding of tetraloops. DynaRNA can serve as a complement to current computational tools, such as molecular dynamics simulations, and as a versatile and efficient platform for modeling RNA structural dynamics, with broad implications for RNA structural biology, synthetic biology, and therapeutic development.
Introduction
Over 95% of the human genome is transcribed to non-coding RNA, which serves pivotal roles in biomolecular processes1. The intrinsic dynamic flexibility and pronounced conformational heterogeneity of RNA endow it with diverse functional capabilities2. Deciphering the conformational ensembles of RNA is fundamental for understanding its intricate mechanisms of action, advancing RNA-targeted drug discovery, and facilitating the design of RNA-based therapeutic strategies3. However, traditional experimental methods, including NMR, X-ray crystallography, and cryo-electron microscopy, encounter considerable limitations in resolving the complex conformational ensembles of RNA4. On the one hand, these methods often average signals from multiple conformations, making it difficult to accurately capture RNA’s highly heterogeneous structural characteristics5. On the other hand, the intrinsic properties of RNA structures further complicate their resolution by experimental approaches6. Conventional computational methods, such as molecular dynamics simulations (MDs), are too expensive to explore RNA’s vast conformational space7. In addition, inaccuracies in RNA force fields severely limit the application of MDs8,9,10,11.
Recently, the rapid advancement of artificial intelligence methodologies has provided novel opportunities for structural biology12. AlphaFold2 significantly improved the accuracy of protein structure prediction, but it does not include RNA13. AlphaFold3 introduced a substantially updated diffusion-based architecture that extends beyond proteins to the structure prediction of nucleic acids and other biomolecules14. However, AlphaFold3 is predominantly confined to predicting single stable conformations rather than generating a conformational ensemble of RNA, which is important for comprehensively characterizing the heterogeneity of RNA15. Diffusion models have shown promise in generating protein conformational ensembles16 but, to the best of our knowledge, have not been used to predict RNA conformational ensembles.
In this work, we developed DynaRNA, a diffusion-based generative model that produces RNA dynamic conformational ensembles. DynaRNA directly models the 3D coordinates of RNA and is orders of magnitude faster than MDs. We show that DynaRNA can generalize across various molecular systems and propose diverse structures that agree with experimental results. We employed several RNA molecular systems, including tetranucleotides17, tetraloops18,19, and HIV-1 TAR20,21, to demonstrate applications of DynaRNA. DynaRNA generated RNA dynamic conformational ensembles in agreement with experimental observations. In addition, DynaRNA can capture both the ground state (GS) and the excited state of HIV-1 TAR, as well as the de novo folding of tetraloops. These results demonstrate that DynaRNA extends the application of diffusion models to RNA conformational ensemble generation, offering a novel approach for exploring the vast dynamic conformational space of RNA.
Results
DynaRNA architecture
Diffusion models have been widely used and proven effective in molecule generation. In this study, we employed a denoising diffusion probabilistic model (DDPM) tailored for RNA conformational ensemble generation, which operates directly on the 3D atomic coordinates of a given input structure. Distinct from conventional DDPMs that gradually diffuse inputs into pure Gaussian noise22, our model adopts a partial noising scheme, where the diffusion is applied only up to an intermediate noise step rather than a full corruption. This enables a tunable balance between preserving structural information from the original input and introducing stochastic variability for sampling diverse conformations23. The generative pipeline of DynaRNA, shown in Fig. 1A, consists of two stages: a forward diffusion process that incrementally adds Gaussian noise to the coordinate space, and a reverse denoising process that iteratively reconstructs the conformation. The denoising network is implemented using equivariant graph neural networks (EGNNs)24, which are designed to respect the Euclidean symmetries (E(3)) of molecular structures, such as rotation and translation equivariance. By modeling the molecule as a spatial graph, EGNNs ensure that the generative process remains consistent with the physical geometry of the system25.
A The framework of DynaRNA comprises two processes: a forward noising process indicated by the blue solid line, in which Gaussian noise is progressively added to the input structure, and a reverse denoising process indicated by the green dashed line. B Distribution of adjacent C4’ distances of conformation ensembles generated by DynaRNA (blue) compared to PDB experimental structures (orange). C EGNNs are utilized to predict the noise and denoise. D Distribution of C4’ hyper bond angles of conformation ensembles generated by DynaRNA (blue) compared to PDB experimental structures (green).
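The equivariance property that motivates the use of EGNNs can be illustrated with a toy coordinate update. The sketch below is not the DynaRNA network: `edge_weight` is a hypothetical stand-in for a learned edge function, and `equivariant_update` and `rotate_z` are illustrative helper names. It only shows that an update built from relative vectors and distance-only weights commutes with rotation.

```python
import math

def edge_weight(sq_dist):
    # Hypothetical stand-in for a learned edge network. Any function of the
    # squared distance alone keeps the coordinate update equivariant.
    return 1.0 / (1.0 + sq_dist)

def equivariant_update(coords):
    """Shift each point along its relative vectors, weighted by distance only."""
    n = len(coords)
    updated = []
    for i in range(n):
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            w = edge_weight(sum(d * d for d in diff))
            for k in range(3):
                shift[k] += w * diff[k] / (n - 1)
        updated.append([coords[i][k] + shift[k] for k in range(3)])
    return updated

def rotate_z(coords, angle):
    """Rotate all points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]
```

Rotating the input and then updating gives the same coordinates as updating and then rotating, which is the property a denoiser needs so that generated conformations do not depend on the arbitrary orientation of the input structure.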
In our implementation of DynaRNA, we employed partial noising and denoising instead of a conventional fully noising process. This approach aimed to balance efficient sampling and the preservation of essential initial structural information of RNA conformation generation. With the complete diffusion process of 1024 steps, we systematically evaluated truncated versions at 200, 400, 600, 800, 1000, and 1024 steps for RNA conformation generation of U40. These generated conformational ensembles were rigorously compared against reference ensembles obtained from molecular dynamics simulations using the D. E. SHAW force field26. As shown in Fig. 2, we conducted a comprehensive analysis of distance map comparisons and Jensen–Shannon (JS) divergence calculations between DynaRNA-generated ensembles across different denoising steps and the MD reference ensemble for the U40 system. Our quantitative evaluation revealed that the 800-step implementation achieved the optimal trade-off between structural fidelity and computational efficiency, demonstrating superior agreement with the MD reference while maintaining reasonable sampling speed. Therefore, we implemented a partial noising process where the forward diffusion process, as well as the backward denoising process, spans 800 steps instead of the full 1024 steps in inference. The flexibility introduced by partial noising allows users to control how much structural prior is retained during sampling, enabling tailored conformational generation based on the desired balance between fidelity and diversity. This makes our approach particularly suitable for modeling RNA molecules, as RNA conformation ensembles are often more diverse than proteins. Further architectural and training details are provided in the “Materials and methods” section.
The first panel shows the nucleotide–nucleotide distance contact map of the MD simulation ensemble. The following panels present contact maps of ensembles generated by adding noise and denoising from 200 to 1024 steps. Each panel is annotated with the JS divergence relative to the MD ensemble distribution, with the ensemble at 800 steps showing the smallest JS divergence.
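The Jensen–Shannon comparison used to select the number of noising steps can be sketched as follows. This is a minimal illustration under stated assumptions: the binning, the base-2 logarithm, and the function names `histogram` and `js_divergence` are choices made here, not details given in the text.

```python
import math

def histogram(values, lo, hi, n_bins):
    """Normalized histogram of values on [lo, hi) with equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against a floating-point index equal to n_bins
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In use, `p` would be the histogram of a structural feature (for example, pairwise nucleotide distances) from the MD reference ensemble and `q` the same histogram from a generated ensemble at a given number of steps; the step count with the smallest divergence would be retained.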
General validation of DynaRNA
We first assessed the geometric fidelity of RNA conformations generated by DynaRNA by examining two key structural features: the distance between adjacent nucleotides and the hyper bond angles formed by three consecutive nucleotide C4’ atoms. These metrics serve as important indicators of RNA backbone integrity. We computed the distributions of these features in the DynaRNA-generated conformations and compared them with reference distributions derived from high-resolution RNA structures in the Protein Data Bank (PDB)27, as shown in Fig. 1B, D. The results demonstrated that the adjacent C4’–C4’ distances in the DynaRNA-generated ensemble are highly consistent with those observed in experimental structures, both peaking around 6 Å. Besides, the hyper bond angles defined by three consecutive C4’ atoms are centered around 40 degrees in both datasets, reinforcing the model’s ability to reproduce the local backbone geometry of native RNA conformations.
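The two backbone metrics can be computed directly from a C4’ trace. The sketch below assumes that the hyper bond angle is the angle at the middle atom of three consecutive C4’ atoms; the text does not spell out the convention, and the function names are illustrative.

```python
import math

def adjacent_distances(c4_coords):
    """Distances between consecutive C4' atoms, in the units of the input."""
    return [math.dist(a, b) for a, b in zip(c4_coords, c4_coords[1:])]

def hyper_bond_angles(c4_coords):
    """Angles in degrees at the middle atom of each consecutive C4' triplet."""
    angles = []
    for a, b, c in zip(c4_coords, c4_coords[1:], c4_coords[2:]):
        u = [a[k] - b[k] for k in range(3)]
        v = [c[k] - b[k] for k in range(3)]
        cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        # Clamp to [-1, 1] to absorb rounding error before acos
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles
```

Pooling these values over all generated conformations and over the PDB reference set gives the two distributions that are compared in Fig. 1B, D.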
Moreover, we systematically generated conformational ensembles for all RNA structures in the PDB training set and performed a comprehensive structural evaluation, including both fundamental geometric properties, such as bond lengths and bond angles, and global structural features, such as the radius of gyration (Rg). As shown in Table 1, the average bond lengths of C5′–C4′, C3′–C4′, and C4′–O4′ in the DynaRNA-generated ensembles are 1.509 Å, 1.520 Å, and 1.450 Å, respectively, whereas those in the PDB ensembles are 1.478 Å, 1.512 Å, and 1.467 Å, respectively. The mean absolute errors (MAEs) between the two ensembles are 0.031 Å, 0.008 Å, and 0.017 Å. Similarly, the average bond angles of C5′–C4′–C3′, C5′–C4′–O4′, and C3′–C4′–O4′ in the DynaRNA-generated ensembles are 115.73°, 109.80°, and 104.29°, respectively, compared with 113.57°, 111.12°, and 104.05° in the PDB ensembles, yielding MAEs of 2.16°, 1.32°, and 0.24°, respectively. These results indicate that the basic geometric features of the DynaRNA-generated ensembles are highly consistent with those of experimental PDB structures, supporting the geometric plausibility of our generated conformations. As shown in Fig. 3, the predicted and experimental Rg values exhibit a very strong correlation (R² = 0.982), with the regression line closely following the y = x reference, indicating that DynaRNA accurately reproduces the global structural properties of RNA.
Each green point represents an RNA entry, with the experimental Rg on the x-axis and the ensemble-averaged Rg from DynaRNA-generated conformations on the y-axis. The blue line is the linear fit (R² = 0.982), which closely follows the black dashed line (y = x), indicating that DynaRNA accurately reproduces global structural properties.
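The Rg comparison can be reproduced in outline with the sketch below. It assumes an unweighted Rg over the supplied atoms and takes R² to be the squared Pearson correlation of a simple linear fit; neither choice is stated explicitly in the text, and the function names are illustrative.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of 3D points."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

For each RNA entry, `radius_of_gyration` would be averaged over the generated ensemble and paired with the value from the experimental structure before calling `r_squared` on the two lists.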
Taken together, these observations suggest that DynaRNA is capable of generating RNA conformations with high geometric plausibility, closely matching the statistical features of experimentally resolved RNA structures. This level of agreement underscores the model’s fidelity and its potential utility in RNA structure modeling and related computational studies.
DynaRNA can capture the conformation ensemble of tetranucleotides
Tetranucleotide, consisting of four nucleotides, serves as a key benchmark system for RNA computational structure research5. Existing computational methods, such as molecular dynamics simulations, generate a large number of RNA intercalated conformations, which are in serious disagreement with the results of solution NMR experiments17. We systematically compared the performance of DynaRNA with MD simulations employing three distinct force fields (OL328, BSFF111, and BSFF210) initialized from canonical A-form structures and intercalated conformations. Quantitative analysis of intercalation propensity in the generated conformational ensembles is presented in Fig. 4. Notably, DynaRNA-predicted tetranucleotide ensembles exhibited substantially lower intercalation ratios compared to MD simulations with OL3. This improvement was particularly evident when starting from the intercalated conformation. For CAAU and CCCC systems, simulations with OL3 became trapped in the intercalated conformation, with intercalation rates of 97.3% and 90.7%, respectively. In contrast, DynaRNA yielded a conformation ensemble with intercalation rates of only 9.2% and 4.7%. Regardless of whether the simulations started from the A-form or the intercalated conformation, the intercalation rates in DynaRNA-generated ensembles remained below 10%, effectively demonstrating the robustness of DynaRNA. Besides, we conducted direct and quantitative comparisons of tetranucleotide ensembles with experimental data. Specifically, the radius of gyration (Rg) of tetranucleotide ensembles generated with DynaRNA and OL3 was calculated and compared with the previous experimental fit results29 (\({R}_{g}={10}^{-10}\times (4.06\pm 0.47)\times {N}^{0.38\pm 0.03}\), where N represents the number of nucleotides). Results shown in Supplementary Table 1 demonstrate that the DynaRNA-generated tetranucleotide ensembles are in closer agreement with experimental measurements, thereby providing more objective validation of the model.
These additional comparisons based on experimental observables strengthen the validation of DynaRNA. However, DynaRNA still has room for improvement. The Rg of the DynaRNA-generated ensembles for tetranucleotides remains partially underestimated. This discrepancy stems from the fact that DynaRNA’s training set is derived from the PDB database, in which the vast majority of experimentally resolved RNA structures are compact. Consequently, DynaRNA may have limited exposure to short, single-stranded RNA conformations that are largely unstructured, such as UUUU. This limitation could be addressed by expanding the training set to include more diverse RNA dynamic datasets for model training.
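As a worked example of the experimental power-law fit used for the Rg comparison, the sketch below evaluates the central values of the fit for a tetranucleotide. It assumes that the 10^−10 prefactor expresses the result in metres, so that the value is read here in ångströms; the function name is illustrative and the quoted uncertainties are ignored.

```python
def rg_power_law(n_nucleotides, prefactor=4.06, exponent=0.38):
    """Rg in angstroms from Rg = prefactor * N**exponent (central fit values only)."""
    return prefactor * n_nucleotides ** exponent

# Reference value for a tetranucleotide (N = 4)
rg_tetranucleotide = rg_power_law(4)
```

With N = 4 this gives roughly 6.9 Å, which is the scale against which the ensemble-averaged Rg of the generated and simulated tetranucleotide ensembles would be judged.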
Furthermore, we performed detailed structural analysis of the conformational ensembles. Previous studies have shown significant differences in the ζ/α dihedral distributions between intercalated and non-intercalated conformations11. Our results, shown in Fig. 5, revealed that in the ensemble simulated with OL3, the ζ/α dihedrals were predominantly concentrated in the intercalated conformation region (+30° to +90°). In contrast, DynaRNA-generated ensembles exhibited ζ/α dihedrals concentrated in the non-intercalated region (−30° to −90°), which was closer to experimental results.
A Dihedral statistical distribution of PDB experimental structures. B, C Dihedral statistical distribution of conformation ensemble generated by molecular dynamics simulations with OL3 initialized from A-form structures and intercalated structures. D, E Dihedral statistical distribution of conformation ensemble generated with DynaRNA initialized from A-form structures and intercalated structures. F–J Dihedral statistical distribution of conformation ensemble generated by molecular dynamics simulations with OL3 of AAAA, CAAU, CCCC, GACC, UUUU, respectively. K–O Dihedral statistical distribution of conformation ensemble generated with DynaRNA of AAAA, CAAU, CCCC, GACC, UUUU, respectively.
Structural clustering analysis shown in Supplementary Figs. 1 and 2 further confirmed that the major conformation in DynaRNA-generated ensembles was the non-intercalated A-form, whereas OL3 force field simulations predominantly yielded intercalated conformations. Besides, we also analyzed the dihedral distribution of the conformation ensemble generated by molecular dynamics simulations with BSFF1 and BSFF2. Results shown in Supplementary Figs. 3–6 demonstrated that DynaRNA attains accuracy on par with, or exceeding, that of MD simulations in RNA conformation ensemble generation, while also offering significantly faster computational speed due to its innovative approach of bypassing the need for step-by-step sampling.
DynaRNA can capture the excited state of RNA conformation
The HIV-1 trans-activation response (TAR) element has emerged as a highly promising therapeutic target30. Its structure, consisting of two helical regions connected by a bulge and a hairpin loop motif at the apex, has attracted extensive research attention31. Previous studies have revealed that besides the dominant GS, HIV-1 TAR also adopts low-populated excited states (ES)20,21. These ES play essential roles in biochemical reactions, disease mechanisms, and therapeutic development32. However, because they are rich in non-canonical mismatches and energetically unfavorable, they are sparse and short-lived, posing significant challenges for structural characterization. Conventional experimental techniques struggle to capture RNA excited states33. Traditional computational techniques, such as molecular dynamics simulations, face difficulties in overcoming the high energy barriers required to sample ES34. DynaRNA provides a way to bypass these energy barriers and directly explore the conformational ensemble. Recently, Geng et al. determined an HIV-1 TAR ES termed ES2 with a population of about 0.4% and a lifetime of ~2.1 ms21.
We herein generated the HIV-1 TAR conformation ensemble with DynaRNA initialized from both GS and ES2. The GS starting structure is taken from the PDB entry 8THV, and the ES2 starting structure from the PDB entry 8U3M. As shown in Fig. 6A, GS differs from ES2 in secondary structure, with differences involving six base pairs and more than fifteen nucleotides. Transitions between these states require crossing multiple potential energy barriers, presenting a formidable computational challenge for molecular dynamics simulations. This challenge is particularly serious for ES2, which exists at a higher potential energy level, making it especially difficult to access from GS. DynaRNA bridges this gap with the bidirectional conformation generation capability shown in Fig. 6B, C. When initiated from GS, DynaRNA generated conformational ensembles that captured ES2. Conversely, when initiated from ES2, DynaRNA also sampled GS. Results of principal component analysis (PCA) for the conformation ensembles generated by DynaRNA initialized from GS and ES2 are shown in Fig. 6D, E, respectively. The conformational landscapes exhibit two distinct clusters, corresponding to the ground state (GS) and the excited state 2 (ES2). Notably, initiated from either GS or ES2, DynaRNA could capture the alternate conformation, highlighting its ability to traverse the complex conformational landscape of RNA. We performed clustering analysis for the conformation ensembles generated by DynaRNA from both states. Specifically, for the ensemble initiated from GS, the clustering analysis identified a GS population of 48% and an ES2 population of 11%. Conversely, for the ensemble initiated from ES2, clustering yielded an ES2 population of 59% and a GS population of 16%.
These results underscore DynaRNA as a powerful tool for discovering and characterizing rare, transient RNA conformational states and imply the potential of DynaRNA as a valuable method for RNA structural plasticity and its functional implications.
A Secondary structure of HIV-1 TAR GS and ES2. B Tertiary structure of GS and ES2 generated by DynaRNA initialized from the other state. C Structural annotation for GS and ES2 generated by DynaRNA initialized from the other state. D PCA results of conformation ensemble generated by DynaRNA initialized from GS. E PCA results of the conformation ensemble generated by DynaRNA initialized from ES2.
DynaRNA can capture de novo folding of RNA tetraloops
Tetraloops, comprising a Watson–Crick base-paired stem and a loop of four nucleotides, represent one of the most ubiquitous and well-characterized RNA secondary structure motifs35. Tetraloops play critical roles in RNA folding, stability, and function, and often serve as nucleation sites in tertiary interactions36. De novo folding of tetraloops remains a challenge for RNA computational research. This challenge is further exacerbated by the inaccuracies of RNA force fields37. Recent RNA force fields such as gHBfix9, tHBfix8, and DE Shaw’s RNA force field26 partially revised these inaccuracies and can sample folded states with extensive simulations. Previous studies38 have shown that RNA hairpin folding is a hierarchical process, which makes it computationally expensive to de novo capture the tetraloop folded state using molecular dynamics simulations39. Despite these challenges, DynaRNA successfully generated native-like folded conformations of tetraloops starting from fully extended, single-stranded RNA sequences without any structural restraints or prior knowledge. Results of alignment between experimental structure and DynaRNA-predicted structure are shown in Fig. 7. DynaRNA achieved the minimum atom root-mean-square deviation (RMSD) of 0.9 Å and eRMSD of 1.01 for the UUCG tetraloop (PDB ID: 2KOC) and RMSD of 1.3 Å with eRMSD of 1.13 for the GAAA tetraloop (PDB ID: 8CLR) compared with the corresponding native structures, which recapitulated all of the Watson-Crick base pairs with no prior knowledge. We performed clustering analysis of the generated conformational ensembles, which showed native state cluster populations of 20.6% and 18%, respectively. DynaRNA is a generative model, and thus its ensemble distributions are not expected to converge to a single dominant conformation as in MD simulations. 
DynaRNA can serve as a complementary tool to molecular dynamics, for example, by generating near-native states that can then be refined through MD simulations to achieve converged sampling, thereby combining reduced computational cost with accurate convergence.
The green color represents experimental structure and the blue color represents the conformation generated by DynaRNA with structural annotation by DSSR and VARNA51.
Discussion
Deciphering the complex hierarchical structural dynamics of RNA is crucial for understanding its functional mechanisms40, but this remains highly challenging for both traditional experimental and computational approaches41. To bridge this gap, we employed a neural generator to directly sample RNA dynamic conformation (DynaRNA). DynaRNA represents a paradigm shift in computational modeling of RNA conformational dynamics by leveraging the power of diffusion-based generative models. Unlike traditional molecular dynamics simulations that rely on step-by-step sampling with physics-based force fields and are often limited by sampling inefficiencies and force field inaccuracies, DynaRNA efficiently generates diverse and physically plausible RNA conformations.
DynaRNA successfully reproduced experimental geometries (e.g., C4’–C4’ distances, hyper bond angles) and predicted folded tetraloop structure de novo from fully unfolded initial conformation. Besides, DynaRNA outperformed MD simulations in generating tetranucleotide conformation ensembles, achieving intercalation rates below 10% compared to >90% for OL3 force fields, and demonstrated orders-of-magnitude faster computational efficiency. While conventional MD simulations typically require weeks of intensive sampling to explore conformational landscapes (often hindered by energy barriers and force field inaccuracies), DynaRNA generates physically plausible ensembles in mere minutes to hours on a single GPU. Notably, DynaRNA achieved high efficiency in capturing rare excited states, capturing the HIV TAR’s low-population (~0.3%) conformation, highlighting its robustness in escaping energy traps—a critical limitation of physics-based simulations. These results underscore the ability of DynaRNA to accurately and efficiently resolve RNA’s intrinsic heterogeneity.
Furthermore, the framework of DynaRNA holds substantial potential for expansion through strategic avenues. First, the artificial intelligence generative network of RNA can be integrated with a physics-based model (e.g., force field). Refining generated conformations with short MD simulations for local energy minimization could reconcile data-driven efficiency with physical realism. Second, incorporating richer training data, including RNA molecular dynamics trajectories and multi-resolution atomic representations (e.g., explicit backbone atoms beyond C4’), would enhance the model’s ability to capture subtle conformational nuances. Third, the framework of DynaRNA can be expanded to model DNA dynamics42, enabling comparative studies of nucleic acid flexibility. Besides, we observed that DynaRNA still has room for improvement in faithfully reproducing certain RNA secondary structure features, particularly canonical Watson–Crick base pairs. One promising direction is to enhance the granularity of the coarse-grained representation—from the current C4′-only model to incorporating additional backbone and base atoms, or even transitioning to an all-atom representation—combined with improved back-mapping methods to recover full-atom detail. Another avenue is to incorporate secondary structure recovery, specifically the preservation of canonical Watson–Crick base pairs, directly into the loss function to guide model optimization. Furthermore, expanding the training dataset beyond static PDB structures to include dynamic molecular data, such as RNA molecular dynamics trajectories, could further improve DynaRNA’s performance in challenging systems, including short single-stranded RNA systems such as the UUUU tetranucleotide.
DynaRNA bridges critical gaps in RNA research by complementing both experimental and computational techniques, such as molecular dynamics simulations, accelerating RNA therapeutic development, and expanding the scope of AI-driven structural analysis. DynaRNA can be combined with traditional methods such as NMR and simulations, enabling cost-efficient, rapid, and accurate resolution of RNA conformational ensembles while also capturing rare excited and transient states, which are critical for function yet elude experimental and computational detection due to their low populations or short lifetimes. Future applications of DynaRNA range from supporting current molecular dynamics simulations and RNA enhanced sampling to interpreting RNA experiments, identifying RNA-targeted drug binding sites, and studying RNA-protein binding mechanisms. These capabilities make DynaRNA a powerful tool for future advances in RNA-targeted drug discovery and RNA therapy development, such as mRNA vaccine design. Finally, DynaRNA moves beyond the static structure prediction paradigm exemplified by AlphaFold3 toward generative modeling of RNA dynamic ensembles, a framework that aligns with RNA’s flexible nature, where biological functions emerge from continuous conformational transitions. By bridging RNA structure and dynamics, DynaRNA offers a scalable foundation for decoding the full complexity of RNA’s dynamic universe.
Materials and methods
Dataset
DynaRNA was trained on high-quality RNA PDB crystal structures. We extracted 14,632 experimentally determined 3D RNA structures from the RNAsolo database43. The training dataset was curated by removing entries that included non-RNA elements (e.g., DNAs and proteins), non-standard RNA elements (modified bases), or incomplete nucleotides. Structures containing 5–200 nucleotides were retained, producing 6,820 curated structures as the final training dataset. For the test dataset, MD trajectories of five tetranucleotides were derived from previous REST2 simulations10,11, which were extensively sampled starting from both the experimental A-form conformations and the intercalated conformations. MD trajectories of U40 were derived from DE Shaw’s research26.
Model
DynaRNA takes a single RNA structure as input and does not rely on sequence features such as MSAs. Each nucleotide is coarse-grained into one particle located at the C4’ atom, providing a minimal yet informative encoding of the RNA backbone geometry. The resulting representations are subsequently integrated into downstream modeling pipelines to facilitate structural learning. A DDPM44 is used for RNA conformation generation, which can be partitioned into a forward noising process and a symmetric backward denoising process. Both processes are defined on a discrete time space. The forward process gradually perturbs the original data with Gaussian noise and is defined by the following Itô stochastic differential equation (SDE):
where xt represents the noised data at the t-th step, βt represents the noise level at step t which is defined by Eq. (2), β0 is set as 0.0001, βT is set as 0.02, \({\epsilon }_{t}{{\mathscr{\sim }}}{{\mathscr{N}}}\left(0,I\right)\) is Gaussian noise.
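The forward noising process can be sketched as follows. The text gives only the end values β0 = 0.0001 and βT = 0.02, so the linear interpolation in `linear_betas` is an assumption, as is the use of the closed-form DDPM marginal to jump directly to step t; all function names are illustrative.

```python
import math
import random

def linear_betas(n_steps=1024, beta_start=1e-4, beta_end=0.02):
    # Assumed linear schedule: the source only states the two end values.
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta_s), i.e. the cumulative signal retention."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noise_to_step(coords, t, alpha_bars, rng):
    """Sample x_t from x_0 in one shot: sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t - 1]  # t is 1-indexed
    return [[math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in p]
            for p in coords]
```

Stopping the forward process at an intermediate step such as 800 rather than 1024 leaves a larger fraction of the input signal in place, which is the mechanism behind the partial noising scheme described in the Results.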
To reverse the noising process and recover original structures, we train an EGNN45 to predict the noise \({\epsilon }_{\theta }\left({x}_{t},t\right)\) given data xt at the time step t. The EGNN architecture is specifically designed to respect geometric symmetries such as translation and rotation equivariance, making it highly suitable for molecular or geometric data. Each layer of EGNN incorporates both node and edge updates to capture intricate geometric relationships among nucleotides. The node features consist of the 3D positions of C4’ atoms along with time-step embeddings, while edge features encode both the molecular connectivity and spatial distances. We used a hidden dimension of 128 across all layers, with LayerNorm and SiLU activation functions to enhance training stability and non-linearity. Temporal information is encoded using sinusoidal embeddings, following the standard approach in diffusion models. To reduce overfitting, dropout with a rate of 0.1 is applied after each EGNN layer. The final output of the network predicts the noise vector added at each time step, conditioned on both geometry and graph topology. During inference, the reverse (denoising) process is approximated by integrating the following formulation:
where \({\hat{x}}_{t-1}\) represents predicted data at the previous timestep t − 1, αt = 1 − βt represents the signal retention rate, \({\bar{\alpha }}_{t}={\prod }_{s=1}^{t}{\alpha }_{s}\) represents the cumulative signal retention, \({\epsilon }_{\theta }\left({x}_{t},t\right)\) represents the noise predicted by EGNN, and \(z\sim {{\mathscr{N}}}(0,I)\) represents the fresh Gaussian noise used during sampling.
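A single reverse step and the partial denoising loop can be sketched as below. This follows the standard DDPM ancestral sampler using the symbols defined above; the choice of sampling variance (`sigma = sqrt(beta_t)`, zero at the last step) is an assumption, and `predict_noise` is a hypothetical callable standing in for the trained EGNN.

```python
import math
import random

def reverse_step(x_t, t, betas, alpha_bars, predict_noise, rng):
    """One ancestral DDPM step from t to t - 1 (t is 1-indexed)."""
    beta = betas[t - 1]
    alpha = 1.0 - beta
    eps = predict_noise(x_t, t)  # same shape as x_t
    coef = beta / math.sqrt(1.0 - alpha_bars[t - 1])
    sigma = math.sqrt(beta) if t > 1 else 0.0  # assumed variance choice
    return [[(x - coef * e) / math.sqrt(alpha) + sigma * rng.gauss(0.0, 1.0)
             for x, e in zip(p, ep)]
            for p, ep in zip(x_t, eps)]

def partial_denoise(x_start, start_step, betas, alpha_bars, predict_noise, rng):
    """Run the reverse process from an intermediate step down to step 0."""
    x = x_start
    for t in range(start_step, 0, -1):
        x = reverse_step(x, t, betas, alpha_bars, predict_noise, rng)
    return x
```

Under the partial noising scheme, `x_start` would be the input structure noised to step 800 and `start_step` would be 800, so that denoising retraces the same number of steps that the forward process applied.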
The DDPM framework pursues a training objective distinct from that of many other neural networks. Instead of directly fitting RNA coordinates, the network is designed to estimate the noise in the perturbed data. We used an L2 loss on the noise, defined as follows:
where \(\epsilon\) represents the real noise and \({\epsilon }_{\theta }\left({x}_{t},t\right)\) represents the noise predicted given data xt at the time step t. The expectation \({{\mathbb{E}}}_{{x}_{0},\epsilon ,t}\) averages the squared error between them, giving the mean square error (MSE).
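For reference, the forward step, reverse step, and training loss described in this section take the following form in the standard DDPM formulation, written with the symbols defined above. This is a reconstruction under that assumption rather than a copy of the original displayed equations; \(\sigma_t\) is an assumed symbol for the sampling noise scale, and the noise schedule of Eq. (2) is not reproduced because its functional form is not stated in the text.

```latex
\begin{align}
x_t &= \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t,
  \qquad \epsilon_t \sim \mathcal{N}(0, I) \\
\hat{x}_{t-1} &= \frac{1}{\sqrt{\alpha_t}}
  \left( x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,
  \epsilon_\theta(x_t, t) \right) + \sigma_t z,
  \qquad z \sim \mathcal{N}(0, I) \\
\mathcal{L} &= \mathbb{E}_{x_0,\epsilon,t}
  \left[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \right]
\end{align}
```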
Training
DynaRNA was implemented using PyTorch and PyTorch-Lightning. All training was conducted on one NVIDIA 4090D GPU and took approximately 14 days. The model parameters were optimized with the Adam optimizer46, using a learning rate of 0.0001. To prevent gradient explosion and maintain numerical stability, gradient clipping with a maximum norm of 1.0 was applied. A weight decay of 1e-4 served as a regularization mechanism to mitigate overfitting. The training loss is shown in Supplementary Fig. 7.
Statistics and reproducibility
We generated 1000 RNA conformations for each system (tetranucleotides, tetraloops, and HIV-1 TAR states) and calculated structural features for analysis. The random seed used for generation is provided with the model for reproducibility.
Analysis
DSSR47 was used for RNA structure analysis. Arena48 was used to convert the coarse-grained RNA model to an all-atom structure. PyMOL was used for visualization and alignment. An intercalated conformation was defined as one in which nucleotide j is positioned between nucleotides i and i + 1 and forms stacking interactions with both, where j < i or j > i + 1. The DBSCAN algorithm was used for clustering based on the RMSD of all heavy atoms, with epsilon set to 1.2 and the minimum number of points set to 10. GS and ES2 conformations were obtained from the PDB (8U3M). Initial structures for de novo tetraloop folding were constructed with the NAB module, producing fully extended single-stranded conformations devoid of base pairing. Experimental structures from the PDB (2KOC and 8CLR) served as reference models for the tetraloops.
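The clustering step can be sketched as DBSCAN applied to a precomputed pairwise distance matrix. The dependency-free implementation below is illustrative only and is not the analysis code used in this work: the function name is hypothetical, one-dimensional toy positions stand in for heavy-atom RMSD values, and the minimum number of points is lowered from 10 to 3 so that the small example forms clusters, while epsilon is kept at 1.2.

```python
def dbscan_precomputed(dist, eps, min_samples):
    """Cluster points from a symmetric distance matrix; -1 marks noise."""
    n = len(dist)
    # Each neighborhood includes the point itself, as in the usual definition.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Toy data: two tight groups and one isolated outlier.
positions = [0.0, 0.5, 1.0, 1.5, 10.0, 10.4, 10.8, 50.0]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan_precomputed(dist, eps=1.2, min_samples=3)
```

With real ensembles the matrix entries would be pairwise RMSD values between generated conformations, and conformations labeled -1 would be treated as unclustered.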
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The figures are available at Figshare49 (https://doi.org/10.6084/m9.figshare.30021871; https://figshare.com/articles/figure/DynaRNA_figures_/30021871). The model file of DynaRNA is available at Zenodo50 (https://zenodo.org/records/15600148/files/DynaRNA.pkl).
Code availability
The DynaRNA code and software version information are available at https://github.com/lizxSJTU/DynaRNA.git.
References
Esteller, M. Non-coding RNAs in human disease. Nat. Rev. Genet. 12, 861–874 (2011).
Lee, Y.-T. et al. The conformational space of RNase P RNA in solution. Nature 637, 1244–1251 (2025).
Childs-Disney, J. L. et al. Targeting RNA structures with small molecules. Nat. Rev. Drug Discov. 21, 736–762 (2022).
Bonilla, S. L. & Jang, K. Challenges, advances, and opportunities in RNA structural biology by Cryo-EM. Curr. Opin. Struct. Biol. 88, 102894 (2024).
Šponer, J. et al. RNA structural dynamics as captured by molecular simulations: a comprehensive overview. Chem. Rev. 118, 4177–4338 (2018).
Zhang, J., Fei, Y., Sun, L. & Zhang, Q. C. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat. Methods 19, 1193–1207 (2022).
Jones, D. et al. Accelerators for classical molecular dynamics simulations of biomolecules. J. Chem. Theory Comput. 18, 4047–4069 (2022).
Mlýnský, V. et al. Fine-tuning of the AMBER RNA force field with a new term adjusting interactions of terminal nucleotides. J. Chem. Theory Comput. 16, 3936–3946 (2020).
Kührová, P. et al. Improving the performance of the Amber RNA force field by tuning the hydrogen-bonding interactions. J. Chem. Theory Comput. 15, 3288–3305 (2019).
Li, Z. et al. Excited-ground-state transition of the RNA strand slippage mechanism captured by the base-specific force field. J. Chem. Theory Comput. 20, 6082–6097 (2024).
Li, Z., Mu, J., Chen, J. & Chen, H. F. Base-specific RNA force field improving the dynamics conformation of nucleotide. Int. J. Biol. Macromol. 222, 680–690 (2022).
Subramaniam, S. Structural biology in the age of AI. Nat. Methods 21, 18–19 (2024).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Ding, J. et al. Visualizing RNA conformational and architectural heterogeneity in solution. Nat. Commun. 14, 714 (2023).
Zhu, J. et al. Precise generation of conformational ensembles for intrinsically disordered proteins using fine-tuned diffusion models. Preprint at bioRxiv https://doi.org/10.1101/2024.05.05.592611 (2024).
Condon, D. E. et al. Stacking in RNA: NMR of four tetramers benchmark molecular dynamics. J. Chem. Theory Comput. 11, 2729–2742 (2015).
Nozinovic, S., Fürtig, B., Jonker, H. R., Richter, C. & Schwalbe, H. High-resolution NMR structure of an RNA model system: the 14-mer cUUCGg tetraloop hairpin RNA. Nucleic Acids Res. 38, 683–694 (2010).
Oxenfarth, A. et al. Integrated NMR/molecular dynamics determination of the ensemble conformation of a thermodynamically stable CUUG RNA tetraloop. J. Am. Chem. Soc. 145, 16557–16572 (2023).
Roy, R. et al. Kinetic resolution of the atomic 3D structures formed by ground and excited conformational states in an RNA dynamic ensemble. J. Am. Chem. Soc. 145, 22964–22978 (2023).
Geng, A. et al. An RNA excited conformational state at atomic resolution. Nat. Commun. 14, 8432 (2023).
Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. Preprint at arXiv https://arxiv.org/abs/2206.04119 (2022).
Lu, J., Zhong, B., Zhang, Z. & Tang, J. Str2str: a score-based framework for zero-shot protein conformation sampling. Preprint at arXiv https://arxiv.org/abs/2306.03117 (2023).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Soleymani, F., Paquet, E., Viktor, H. L. & Michalowski, W. Structure-based protein and small molecule generation using EGNN and diffusion models: a comprehensive review. Comput. Struct. Biotechnol. J. 23, 2779–2797 (2024).
Tan, D., Piana, S., Dirks, R. M. & Shaw, D. E. RNA force field with accuracy comparable to state-of-the-art protein force fields. Proc. Natl. Acad. Sci. USA 115, E1346–E1355 (2018).
Burley, S. K. et al. Updated resources for exploring experimentally-determined PDB structures and computed structure models at the RCSB Protein Data Bank. Nucleic Acids Res. 53, D564–D574 (2025).
Zgarbová, M. et al. Refinement of the Cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles. J. Chem. Theory Comput. 7, 2886–2902 (2011).
Werner, A. Predicting translational diffusion of evolutionary conserved RNA structures by the nucleotide number. Nucleic Acids Res. 39, e17 (2011).
Bannwarth, S. & Gatignol, A. HIV-1 TAR RNA: the target of molecular interactions between the virus and its host. Curr. HIV Res. 3, 61–71 (2005).
Bou-Nader, C., Link, K. A., Suddala, K. C., Knutson, J. R. & Zhang, J. Structures of complete HIV-1 TAR RNA portray a dynamic platform poised for protein binding and structural remodeling. Nat. Commun. 16, 2252 (2025).
Chu, C. C., Plangger, R., Kreutz, C. & Al-Hashimi, H. M. Dynamic ensemble of HIV-1 RRE stem IIB reveals non-native conformations that disrupt the Rev-binding site. Nucleic Acids Res. 47, 7105–7117 (2019).
Xue, Y. et al. Characterizing RNA excited states using NMR relaxation dispersion. Methods Enzymol. 558, 39–73 (2015).
Han, G. & Xue, Y. Rational design of hairpin RNA excited states reveals multi-step transitions. Nat. Commun. 13, 1523 (2022).
Klosterman, P. S., Hendrix, D. K., Tamura, M., Holbrook, S. R. & Brenner, S. E. Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res. 32, 2342–2352 (2004).
Thapar, R., Denmon, A. P. & Nikonowicz, E. P. Recognition modes of RNA tetraloops and tetraloop-like motifs by RNA-binding proteins. Wiley Interdiscip. Rev. RNA 5, 49–67 (2014).
Kührová, P. et al. Computer folding of RNA tetraloops: identification of key force field deficiencies. J. Chem. Theory Comput. 12, 4534–4548 (2016).
Tinoco, I. & Bustamante, C. How RNA folds. J. Mol. Biol. 293, 271–281 (1999).
Chen, A. A. & García, A. E. High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 110, 16820–16825 (2013).
Mustoe, A. M., Brooks, C. L. & Al-Hashimi, H. M. Hierarchy of RNA functional dynamics. Annu. Rev. Biochem. 83, 441–466 (2014).
Ganser, L. R., Kelly, M. L., Herschlag, D. & Al-Hashimi, H. M. The roles of structural dynamics in the cellular functions of RNAs. Nat. Rev. Mol. Cell Biol. 20, 474–489 (2019).
Duzdevich, D., Redding, S. & Greene, E. C. DNA dynamics and single-molecule biology. Chem. Rev. 114, 3072–3086 (2014).
Adamczyk, B., Antczak, M. & Szachniuk, M. RNAsolo: a repository of cleaned PDB-derived RNA 3D structures. Bioinformatics 38, 3668–3670. https://rnasolo.cs.put.poznan.pl (2022).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning 9323–9332 (PMLR, 2021).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://arxiv.org/abs/1412.6980 (2014).
Lu, X. J., Bussemaker, H. J. & Olson, W. K. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 43, e142 (2015).
Perry, Z. R., Pyle, A. M. & Zhang, C. Arena: rapid and accurate reconstruction of full atomic RNA structures from coarse-grained models. J. Mol. Biol. 435, 168210 (2023).
Figshare. Figshare. 2011. [Data set]. Figshare https://figshare.com/articles/figure/DynaRNA_figures_/30021871 (2025).
Zenodo. Zenodo. 2013. [Data set]. Zenodo https://zenodo.org/records/15600148/files/DynaRNA.pkl (2025).
Darty, K., Denise, A. & Ponty, Y. VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25, 1974–1975 (2009).
Acknowledgements
This work was supported by the Shanghai Municipal Science and Technology Major Project, partially by SJTU Kunpeng & Ascend Center of Excellence, the Center for HPC at Shanghai Jiao Tong University, and the National Key Research and Development Program of China (2025YFA0921000 and 2023YFF1205102), the Fundamental Research Funds for the Central Universities (YG2023LC03), the National Natural Science Foundation of China (32571435 and 32171242), and the Fuzhou University scientific research Grant (XRC-23077).
Author information
Contributions
Zhengxin Li conceived the study, developed the DynaRNA framework, implemented the computational workflow, performed molecular dynamics simulations, analyzed structural ensembles, and wrote the initial draft of the manuscript. Junjie Zhu assisted with the visualization of RNA ensembles and model training. Xiaokun Hong assisted with the preprocessing of RNA structural data. Junxi Mu assisted with data interpretation and figure preparation. Zhuoqi Zheng contributed to model development and provided advice on machine learning implementation. Taeyoung Cui, Yutong Sun, and Ting Wei participated in the interpretation of results and manuscript editing. Prof. Hai-Feng Chen conceived and supervised the project, provided critical guidance throughout the study, secured funding, and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Alan Chen and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editors: Michal Kolar and Aylin Bircan. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, Z., Zhu, J., Hong, X. et al. DynaRNA: accurate dynamic RNA conformation ensemble generation with diffusion model. Commun Biol 8, 1472 (2025). https://doi.org/10.1038/s42003-025-08875-2
DOI: https://doi.org/10.1038/s42003-025-08875-2