DynaRNA: accurate dynamic RNA conformation ensemble generation with diffusion model

Li, Zhengxin; Zhu, Junjie; Hong, Xiaokun; Mu, Junxi; Zheng, Zhuoqi; Cui, Taeyoung; Sun, Yutong; Wei, Ting; Chen, Hai-Feng

doi:10.1038/s42003-025-08875-2

Article
Open access
Published: 15 October 2025

Zhengxin Li¹,
Junjie Zhu¹,
Xiaokun Hong ORCID: orcid.org/0009-0003-2030-3201²,
Junxi Mu¹,
Zhuoqi Zheng¹,
Taeyoung Cui¹,
Yutong Sun¹,
Ting Wei¹ &
Hai-Feng Chen ORCID: orcid.org/0000-0002-7496-4182¹

Communications Biology volume 8, Article number: 1472 (2025)

Abstract

RNA plays a wide variety of roles in biological processes. In addition to serving as the coding messenger RNA (mRNA), the vast majority of RNAs function as non-coding RNAs (ncRNAs), where their dynamic structural ensemble is critical for mediating diverse biological functions. However, traditional experimental techniques and molecular dynamics (MD) simulations face significant challenges in characterizing the conformational dynamics of RNA, due to inherent methodological limitations and high computational cost. We herein present DynaRNA, a diffusion-based generative model for RNA conformation ensemble. DynaRNA employs a denoising diffusion probabilistic model (DDPM) with an equivariant graph neural network (EGNN) to directly model RNA 3D coordinates, enabling rapid exploration of RNA conformational space. DynaRNA enables end-to-end generation of RNA conformation ensemble reproducing experimental geometries without the need for Multiple Sequence Alignments (MSA) information. Our results demonstrate that DynaRNA effectively and accurately generates a tetranucleotide ensemble with a lower intercalation rate than molecular dynamics simulations. Besides, DynaRNA has the ability to capture the rare excited state of HIV-1 Trans-Activation Response (TAR), and recapitulate de novo folding of tetraloops. DynaRNA can serve as a complementary tool with current computational tools, such as molecular dynamics simulations, and a versatile and efficient platform for modeling RNA structural dynamics with broad implications and potential in RNA structural biology, synthetic biology, and therapeutic development.

Introduction

Over 95% of the human genome is transcribed to non-coding RNA, which serves pivotal roles in biomolecular processes¹. The intrinsic dynamic flexibility and pronounced conformational heterogeneity of RNA endow it with diverse functional capabilities². Deciphering the conformational ensembles of RNA is fundamental for understanding its intricate mechanisms of action, advancing RNA-targeted drug discovery, and facilitating the design of RNA-based therapeutic strategies³. However, traditional experimental methods, including NMR, X-ray crystallography, and cryo-electron microscopy, encounter considerable limitations in resolving the complex conformational ensembles of RNA⁴. On the one hand, these methods often average signals from multiple conformations, making it difficult to accurately capture RNA’s highly heterogeneous structural characteristics⁵. On the other hand, the intrinsic properties of RNA structures further complicate their resolution by experimental approaches⁶. Conventional computational methods, such as molecular dynamics simulations (MDs), are too expensive to explore RNA’s vast conformational space⁷. In addition, inaccuracies in RNA force fields severely limit the application of MDs^8,9,10,11.

Recently, the rapid advancement of artificial intelligence methodologies has provided novel opportunities for structural biology¹². AlphaFold2 significantly improved the accuracy of protein structure prediction, but it does not include RNA¹³. AlphaFold3 introduced a substantially updated diffusion-based architecture that extends beyond proteins to the structure prediction of nucleic acids and other biomolecules¹⁴. However, AlphaFold3 is predominantly confined to predicting single stable conformations rather than generating a conformation ensemble of RNA, which is important for comprehensively characterizing the heterogeneity of RNA¹⁵. Diffusion models have shown promise in generating protein conformation ensembles¹⁶ but, to the best of our knowledge, have not been used to predict RNA conformation ensembles.

In this work, we developed DynaRNA, a diffusion-based generative model that produces dynamic RNA conformation ensembles. We demonstrate the capability of the diffusion model in RNA conformation generation by directly modeling the 3D coordinates of RNA, orders of magnitude faster than MDs. We show that DynaRNA can generalize across various molecular systems and propose diverse structures that agree with experimental results. We employed several RNA molecular systems, including tetranucleotides¹⁷, tetraloops^18,19, and HIV-1 TAR^20,21, to demonstrate applications of DynaRNA. DynaRNA generated dynamic RNA conformation ensembles in agreement with experimental observations. In addition, DynaRNA can capture both the ground state (GS) and the excited state of HIV-1 TAR, as well as the de novo folding of tetraloops. These results demonstrate that DynaRNA extends the application of diffusion models to RNA conformational ensemble generation, offering a novel approach for exploring the vast dynamic conformational space of RNA.

Results

DynaRNA architecture

Diffusion models have been widely used and proven effective in molecule generation. In this study, we employed a denoising diffusion probabilistic model (DDPM) tailored for RNA conformational ensemble generation, which operates directly on the 3D atomic coordinates of a given input structure. Distinct from conventional DDPMs that gradually diffuse inputs into pure Gaussian noise²², our model adopts a partial noising scheme, where the diffusion is applied only up to an intermediate noise step rather than a full corruption. This enables a tunable balance between preserving structural information from the original input and introducing stochastic variability for sampling diverse conformations²³. The generative pipeline of DynaRNA, shown in Fig. 1A, consists of two stages: a forward diffusion process that incrementally adds Gaussian noise to the coordinate space, and a reverse denoising process that iteratively reconstructs the conformation. The denoising network is implemented using equivariant graph neural networks (EGNNs)²⁴, which are designed to respect the Euclidean symmetries (E(3)) of molecular structures, such as rotation and translation equivariance. By modeling the molecule as a spatial graph, EGNNs ensure that the generative process remains consistent with the physical geometry of the system²⁵.
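The equivariance property described above can be illustrated with a minimal NumPy sketch. It applies one EGNN-style coordinate update to a toy set of beads and checks that rotating and translating the input transforms the output in the same way. The weight function `phi`, the toy coordinates, and the helper names are hypothetical stand-ins chosen for illustration; this is not the trained DynaRNA network.

```python
import numpy as np

def egnn_coord_update(x, phi):
    """One EGNN-style coordinate update: x_i += mean_j (x_i - x_j) * phi(d_ij^2)."""
    diff = x[:, None, :] - x[None, :, :]      # (N, N, 3) pairwise displacement vectors
    d2 = np.sum(diff ** 2, axis=-1)           # (N, N) squared distances, invariant under rotation
    w = phi(d2)                               # one scalar weight per pair
    np.fill_diagonal(w, 0.0)                  # exclude self-interaction
    return x + (diff * w[..., None]).sum(axis=1) / (len(x) - 1)

def random_rotation(rng):
    """Random proper rotation matrix built from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))               # fix column signs
    if np.linalg.det(q) < 0:                  # enforce det = +1
        q[:, 0] = -q[:, 0]
    return q

# Hypothetical weight function; a trained EGNN would learn this from data.
phi = lambda d2: np.exp(-d2 / 50.0)

rng = np.random.default_rng(0)
x = rng.normal(scale=6.0, size=(10, 3))       # toy stand-in for 10 C4' beads
R = random_rotation(rng)
shift = rng.normal(size=3)

update_then_transform = egnn_coord_update(x, phi) @ R.T + shift
transform_then_update = egnn_coord_update(x @ R.T + shift, phi)
is_equivariant = np.allclose(update_then_transform, transform_then_update)
```

Because the update depends on coordinates only through displacement vectors scaled by functions of distances, the two orders of operation agree, which is the property the denoising network relies on.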

DynaRNA uses partial noising and denoising instead of a conventional full noising process, aiming to balance efficient sampling against preservation of the essential structural information in the initial RNA conformation. Taking the complete diffusion process to be 1024 steps, we systematically evaluated truncated versions at 200, 400, 600, 800, 1000, and 1024 steps for RNA conformation generation of U40. The generated conformational ensembles were compared against reference ensembles obtained from molecular dynamics simulations using the D. E. Shaw force field²⁶. As shown in Fig. 2, we compared distance maps and calculated the Jensen–Shannon (JS) divergence between the DynaRNA-generated ensembles at different denoising steps and the MD reference ensemble for the U40 system. This quantitative evaluation revealed that the 800-step implementation achieved the best trade-off between structural fidelity and computational efficiency, demonstrating superior agreement with the MD reference while maintaining reasonable sampling speed. Therefore, at inference we use a partial noising process in which both the forward diffusion process and the backward denoising process span 800 steps instead of the full 1024 steps. The flexibility introduced by partial noising allows users to control how much structural prior is retained during sampling, enabling tailored conformational generation based on the desired balance between fidelity and diversity. This makes our approach particularly suitable for modeling RNA molecules, as RNA conformation ensembles are often more diverse than those of proteins. Further architectural and training details are provided in the “Materials and methods” section.
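As an illustration of the kind of comparison described above, the following sketch computes a JS divergence between two sets of scalar samples (for example, one inter-nucleotide distance collected over two ensembles). The histogram binning, the base-2 logarithm, and the synthetic samples are assumptions made for this example; the text does not specify how the divergence was discretized.

```python
import numpy as np

def js_divergence(samples_p, samples_q, bins=50):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two 1D sample sets."""
    samples_p = np.asarray(samples_p, dtype=float)
    samples_q = np.asarray(samples_q, dtype=float)
    # Use one shared range so both histograms have identical bin edges.
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    p, _ = np.histogram(samples_p, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                          # terms with a = 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Synthetic stand-ins for a reference ensemble and a generated ensemble.
rng = np.random.default_rng(0)
reference = rng.normal(loc=20.0, scale=3.0, size=5000)
generated = rng.normal(loc=21.0, scale=3.5, size=5000)
jsd = js_divergence(reference, generated)
```

A value near 0 indicates nearly identical distributions, while a value of 1 indicates distributions with no overlapping bins.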

**Fig. 2: JS divergence between MD simulation conformations (green) and generated conformations of different steps with DynaRNA (blue).**

General validation of DynaRNA

We first assessed the geometric fidelity of RNA conformations generated by DynaRNA by examining two key structural features: the distance between adjacent nucleotides and the hyper bond angles formed by three consecutive nucleotide C4’ atoms. These metrics serve as important indicators of RNA backbone integrity. We computed the distributions of these features in the DynaRNA-generated conformations and compared them with reference distributions derived from high-resolution RNA structures in the Protein Data Bank (PDB)²⁷, as shown in Fig. 1B, D. The results demonstrated that the adjacent C4’–C4’ distances in the DynaRNA-generated ensemble are highly consistent with those observed in experimental structures, both peaking around 6 Å. Besides, the hyper bond angles defined by three consecutive C4’ atoms are centered around 40 degrees in both datasets, reinforcing the model’s ability to reproduce the local backbone geometry of native RNA conformations.
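The two backbone descriptors can be computed from a coarse-grained trace as sketched below. The angle convention used here (the interior angle at the middle atom) is an assumption, since the text does not state which convention underlies the reported values, and the coordinates are a toy example.

```python
import numpy as np

def adjacent_distances(c4):
    """Distances between consecutive C4' beads, shape (N-1,)."""
    c4 = np.asarray(c4, dtype=float)
    return np.linalg.norm(c4[1:] - c4[:-1], axis=1)

def consecutive_angles(c4):
    """Interior angle (degrees) at the middle bead of each consecutive triplet, shape (N-2,)."""
    c4 = np.asarray(c4, dtype=float)
    a = c4[:-2] - c4[1:-1]                    # vector from middle bead to previous bead
    b = c4[2:] - c4[1:-1]                     # vector from middle bead to next bead
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Toy trace: three beads forming a right angle with unit spacing.
trace = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
dists = adjacent_distances(trace)
angles = consecutive_angles(trace)
```

Collecting these values over all conformations in an ensemble gives the distributions that are compared against the PDB reference.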

Moreover, we systematically generated conformational ensembles for all RNA structures in the PDB training set and performed a comprehensive structural evaluation, including both fundamental geometric properties, such as bond lengths and bond angles, and global structural features, such as the radius of gyration (Rg). As shown in Table 1, the average bond lengths of C5′–C4′, C3′–C4′, and C4′–O4′ in the DynaRNA-generated ensembles are 1.509 Å, 1.520 Å, and 1.450 Å, respectively, whereas those in the PDB ensembles are 1.478 Å, 1.512 Å, and 1.467 Å, respectively. The mean absolute errors (MAEs) between the two ensembles are 0.031 Å, 0.008 Å, and 0.017 Å. Similarly, the average bond angles of C5′–C4′–C3′, C5′–C4′–O4′, and C3′–C4′–O4′ in the DynaRNA-generated ensembles are 115.73°, 109.80°, and 104.29°, respectively, compared with 113.57°, 111.12°, and 104.05° in the PDB ensembles, yielding MAEs of 2.16°, 1.32°, and 0.24°, respectively. These results indicate that the basic geometric features of the DynaRNA-generated ensembles are highly consistent with those of experimental PDB structures, supporting the geometric plausibility of our generated conformations. As shown in Fig. 3, the predicted and experimental Rg values exhibit a very strong correlation (R² = 0.982), with the regression line closely following the y = x reference, indicating that DynaRNA accurately reproduces the global structural properties of RNA.
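For reference, the sketch below shows one common way to compute Rg from coordinates and reproduces the bond-length differences quoted above from the reported averages. The unweighted (no mass weighting) definition of Rg is an assumption for illustration; the text does not state which definition was used.

```python
import numpy as np

def radius_of_gyration(coords):
    """Unweighted radius of gyration: root-mean-square distance from the centroid."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))

# Average bond lengths (in angstroms) as reported in the text.
generated = {"C5'-C4'": 1.509, "C3'-C4'": 1.520, "C4'-O4'": 1.450}
pdb = {"C5'-C4'": 1.478, "C3'-C4'": 1.512, "C4'-O4'": 1.467}

# Absolute differences between the two sets of averages.
abs_diff = {bond: round(abs(generated[bond] - pdb[bond]), 3) for bond in pdb}

# Toy check of Rg: two points at x = -1 and x = +1 have Rg = 1.
rg_toy = radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```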

**Fig. 3: Correlation between the experimental and DynaRNA-predicted Rg for RNAs in the PDB training set.**

Table 1 Comparison of bond length and bond angle between experimental PDB and DynaRNA-generated conformation ensembles

Taken together, these observations suggest that DynaRNA is capable of generating RNA conformations with high geometric plausibility, closely matching the statistical features of experimentally resolved RNA structures. This level of agreement underscores the model’s fidelity and its potential utility in RNA structure modeling and related computational studies.

DynaRNA can capture the conformation ensemble of tetranucleotides

Tetranucleotides, each consisting of four nucleotides, serve as a key benchmark system for RNA computational structure research⁵. Existing computational methods, such as molecular dynamics simulations, generate a large number of intercalated RNA conformations, which are in serious disagreement with the results of solution NMR experiments¹⁷. We systematically compared the performance of DynaRNA with MD simulations employing three distinct force fields (OL3²⁸, BSFF1¹¹, and BSFF2¹⁰) initialized from canonical A-form structures and intercalated conformations. Quantitative analysis of intercalation propensity in the generated conformational ensembles is presented in Fig. 4. Notably, DynaRNA-predicted tetranucleotide ensembles exhibited substantially lower intercalation ratios than MD simulations with OL3. This improvement was particularly evident when starting from the intercalated conformation. For the CAAU and CCCC systems, simulations with OL3 became trapped in the intercalated conformation, with intercalation rates of 97.3% and 90.7%, respectively. In contrast, DynaRNA yielded conformation ensembles with intercalation rates of only 9.2% and 4.7%. Regardless of whether generation started from the A-form or the intercalated conformation, the intercalation rates in DynaRNA-generated ensembles remained below 10%, demonstrating the robustness of DynaRNA. In addition, we conducted direct and quantitative comparisons of tetranucleotide ensembles with experimental data. Specifically, the radius of gyration (Rg) of tetranucleotide ensembles generated with DynaRNA and OL3 was calculated and compared with the previous experimental fit²⁹ (Rg = 10⁻¹⁰ × (4.06 ± 0.47) × N^{(0.38 ± 0.03)}, where N is the number of nucleotides). Results shown in Supplementary Table 1 demonstrate that the DynaRNA-generated tetranucleotide ensembles are in closer agreement with experimental measurements, thereby providing more objective validation of the model.
However, DynaRNA still has room for improvement. The Rg of the DynaRNA-generated ensembles for tetranucleotides remains partially underestimated. This discrepancy stems from the fact that DynaRNA’s training set is derived from the PDB database, in which the vast majority of experimentally resolved RNA structures are compact. Consequently, DynaRNA may have limited exposure to short, single-stranded RNA conformations that are largely unstructured, such as UUUU. This limitation could be addressed by expanding the training set to include more diverse RNA dynamic datasets for model training.
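As a worked instance of the experimental fit quoted above, the sketch below evaluates the central estimate for a tetranucleotide (N = 4). Only the central values of the prefactor and exponent are used; the quoted uncertainties are not propagated, and the function name is illustrative.

```python
def empirical_rg_meters(n_nucleotides, prefactor=4.06, exponent=0.38):
    """Central estimate of Rg from the fit Rg = 1e-10 * prefactor * N**exponent (in meters)."""
    return 1e-10 * prefactor * n_nucleotides ** exponent

rg_m = empirical_rg_meters(4)
rg_angstrom = rg_m * 1e10  # about 6.9 angstroms for N = 4
```

Under this fit, the reference Rg for a four-nucleotide chain is therefore close to 6.9 Å, which is the scale against which the generated and simulated ensembles are compared.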

**Fig. 4: DynaRNA captures the experimental conformation ensemble of tetranucleotides.**

Furthermore, we performed a detailed structural analysis of the conformational ensembles. Previous studies have shown significant differences in the ζ/α dihedral distributions between intercalated and non-intercalated conformations¹¹. Our results, shown in Fig. 5, revealed that in the ensemble simulated with OL3, the ζ/α dihedrals were predominantly concentrated in the intercalated conformation region (+30° to +90°). In contrast, DynaRNA-generated ensembles exhibited ζ/α dihedrals concentrated in the non-intercalated region (−30° to −90°), which was closer to experimental results.
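A signed dihedral such as ζ or α can be computed from four atomic positions as in the following sketch. In the standard nucleic acid nomenclature, ζ is defined by C3′–O3′–P–O5′ and α by O3′–P–O5′–C5′; the coordinates below are a toy example rather than RNA atoms, and the function name is illustrative.

```python
import numpy as np

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1 = p1 - p0
    b2 = p2 - p1
    b3 = p3 - p2
    n1 = np.cross(b1, b2)                     # normal of the plane (p0, p1, p2)
    n2 = np.cross(b2, b3)                     # normal of the plane (p1, p2, p3)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))

# Toy geometry: a +90 degree twist and its mirror image.
plus_90 = dihedral_degrees([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1])
minus_90 = dihedral_degrees([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, -1])
trans = dihedral_degrees([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0])
```

Evaluating ζ and α for every backbone step in every conformation gives the two-dimensional distributions used to separate intercalated from non-intercalated regions.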

**Fig. 5: Results of ζ/α dihedral distributions analysis.**

Structural clustering analysis shown in Supplementary Figs. 1 and 2 further confirmed that the major conformation in DynaRNA-generated ensembles was the non-intercalated A-form, whereas OL3 force field simulations predominantly yielded intercalated conformations. Besides, we also analyzed the dihedral distribution of the conformation ensemble generated by molecular dynamics simulations with BSFF1 and BSFF2. Results shown in Supplementary Figs. 3–6 demonstrated that DynaRNA attains accuracy on par with, or exceeding, that of MD simulations in RNA conformation ensemble generation, while also offering significantly faster computational speed due to its innovative approach of bypassing the need for step-by-step sampling.

DynaRNA can capture the excited state of RNA conformation

The HIV-1 trans-activation response (TAR) element has emerged as a highly promising therapeutic target³⁰. Its structure, consisting of two helical regions connected by a bulge and a hairpin loop motif at the apex, has attracted extensive research attention³¹. Previous studies have revealed that besides the dominant GS, HIV-1 TAR also adopts low-populated excited states (ES)^20,21. These ES play essential roles in biochemical reactions, disease mechanisms, and therapeutic development³². However, because they are rich in non-canonical mismatches and energetically unfavorable, they are sparse and short-lived, posing significant challenges for structural characterization. Conventional experimental techniques struggle to capture the RNA excited state³³. Traditional computational techniques, such as molecular dynamics simulations, face difficulties in overcoming the high energy barriers required to sample ES³⁴. DynaRNA provides a way to bypass the energy barrier and directly explore the conformational ensemble. Recently, Geng et al. determined an HIV-1 TAR ES termed ES2 with a population of about 0.4% and a lifetime of ~2.1 ms²¹.

We generated the HIV-1 TAR conformation ensemble with DynaRNA initialized from both GS and ES2. The GS starting structure is taken from the PDB entry 8THV, and the ES2 starting structure from the PDB entry 8U3M. As shown in Fig. 6A, GS differs from ES2 in secondary structure, involving six base pairings and over fifteen nucleotides. Transitions between these states require crossing multiple potential energy barriers, presenting a formidable computational challenge for molecular dynamics simulations. This challenge is particularly serious for ES2, which exists at a higher potential energy level, making it especially difficult to access from GS. DynaRNA bridged this gap with the bidirectional conformation generation capability shown in Fig. 6B, C. When initiated from GS, DynaRNA generated conformational ensembles that captured ES2. Conversely, when initiated from ES2, DynaRNA also sampled GS. Results of principal component analysis (PCA) for the conformation ensembles generated by DynaRNA initialized from GS and ES2 are shown in Fig. 6D and E, respectively. The conformational landscapes exhibit two distinct clusters, corresponding to the ground state (GS) and the excited state 2 (ES2). Notably, whether initiated from GS or ES2, DynaRNA could capture the alternate conformation, highlighting its ability to traverse the complex conformational landscape of RNA. We performed clustering analysis for the conformation ensembles generated by DynaRNA from both states. Specifically, for the ensemble initiated from GS, the clustering analysis identified a GS population of 48% and an ES2 population of 11%. Conversely, for the ensemble initiated from ES2, clustering yielded an ES2 population of 59% and a GS population of 16%. 
These results underscore DynaRNA as a powerful tool for discovering and characterizing rare, transient RNA conformational states and imply the potential of DynaRNA as a valuable method for RNA structural plasticity and its functional implications.

**Fig. 6: Results of HIV-1 TAR conformation generation with DynaRNA.**

DynaRNA can capture de novo folding of RNA tetraloops

Tetraloops, comprising a Watson–Crick base-paired stem and a loop of four nucleotides, represent one of the most ubiquitous and well-characterized RNA secondary structure motifs³⁵. Tetraloops play critical roles in RNA folding, stability, and function, and often serve as nucleation sites in tertiary interactions³⁶. De novo folding of tetraloops remains a challenge for RNA computational research. This challenge is further exacerbated by the inaccuracies of RNA force fields³⁷. Recent RNA force fields such as gHBfix⁹, tHBfix⁸, and DE Shaw’s RNA force field²⁶ partially revised these inaccuracies and can sample folded states with extensive simulations. Previous studies³⁸ have shown that RNA hairpin folding is a hierarchical process, which makes it computationally expensive to de novo capture the tetraloop folded state using molecular dynamics simulations³⁹. Despite these challenges, DynaRNA successfully generated native-like folded conformations of tetraloops starting from fully extended, single-stranded RNA sequences without any structural restraints or prior knowledge. Results of alignment between experimental structure and DynaRNA-predicted structure are shown in Fig. 7. DynaRNA achieved the minimum atom root-mean-square deviation (RMSD) of 0.9 Å and eRMSD of 1.01 for the UUCG tetraloop (PDB ID: 2KOC) and RMSD of 1.3 Å with eRMSD of 1.13 for the GAAA tetraloop (PDB ID: 8CLR) compared with the corresponding native structures, which recapitulated all of the Watson-Crick base pairs with no prior knowledge. We performed clustering analysis of the generated conformational ensembles, which showed native state cluster populations of 20.6% and 18%, respectively. DynaRNA is a generative model, and thus its ensemble distributions are not expected to converge to a single dominant conformation as in MD simulations. 
DynaRNA can serve as a complementary tool to molecular dynamics, for example, by generating near-native states that can then be refined through MD simulations to achieve converged sampling, thereby combining reduced computational cost with accurate convergence.

**Fig. 7: Results of alignment of tetraloops structures.**

Discussion

Deciphering the complex hierarchical structural dynamics of RNA is crucial for understanding its functional mechanisms⁴⁰, but this remains highly challenging for both traditional experimental and computational approaches⁴¹. To bridge this gap, we employed a neural generator to directly sample RNA dynamic conformation (DynaRNA). DynaRNA represents a paradigm shift in computational modeling of RNA conformational dynamics by leveraging the power of diffusion-based generative models. Unlike traditional molecular dynamics simulations that rely on step-by-step sampling with physics-based force fields and are often limited by sampling inefficiencies and force field inaccuracies, DynaRNA efficiently generates diverse and physically plausible RNA conformations.

DynaRNA successfully reproduced experimental geometries (e.g., C4’–C4’ distances, hyper bond angles) and predicted folded tetraloop structures de novo from fully unfolded initial conformations. In addition, DynaRNA outperformed MD simulations in generating tetranucleotide conformation ensembles, achieving intercalation rates below 10% compared to >90% for the OL3 force field, and demonstrated orders-of-magnitude faster computational efficiency. While conventional MD simulations typically require weeks of intensive sampling to explore conformational landscapes (often hindered by energy barriers and force field inaccuracies), DynaRNA generates physically plausible ensembles in minutes to hours on a single GPU. Notably, DynaRNA efficiently captured rare excited states, including the low-population (~0.4%) ES2 conformation of HIV-1 TAR, highlighting its robustness in escaping energy traps, a critical limitation of physics-based simulations. These results underscore the ability of DynaRNA to accurately and efficiently resolve RNA’s intrinsic heterogeneity.

Furthermore, the framework of DynaRNA holds substantial potential for expansion through strategic avenues. First, the artificial intelligence generative network of RNA can be integrated with a physics-based model (e.g., force field). Refining generated conformations with short MD simulations for local energy minimization could reconcile data-driven efficiency with physical realism. Second, incorporating richer training data, including RNA molecular dynamics trajectories and multi-resolution atomic representations (e.g., explicit backbone atoms beyond C4’), would enhance the model’s ability to capture subtle conformational nuances. Third, the framework of DynaRNA can be expanded to model DNA dynamics⁴², enabling comparative studies of nucleic acid flexibility. Besides, we observed that DynaRNA still has room for improvement in faithfully reproducing certain RNA secondary structure features, particularly canonical Watson–Crick base pairs. One promising direction is to enhance the granularity of the coarse-grained representation—from the current C4′-only model to incorporating additional backbone and base atoms, or even transitioning to an all-atom representation—combined with improved back-mapping methods to recover full-atom detail. Another avenue is to incorporate secondary structure recovery, specifically the preservation of canonical Watson–Crick base pairs, directly into the loss function to guide model optimization. Furthermore, expanding the training dataset beyond static PDB structures to include dynamic molecular data, such as RNA molecular dynamics trajectories, could further improve DynaRNA’s performance in challenging systems, including short single-stranded RNA systems such as the UUUU tetranucleotide.

DynaRNA bridges critical gaps in RNA research by complementing both experimental and computational techniques, such as molecular dynamics simulations, accelerating RNA therapeutic development, and expanding the scope of AI-driven structural analysis. DynaRNA can be combined with traditional methods such as NMR and simulations, enabling cost-efficient, rapid, and accurate resolution of RNA conformational ensembles while also capturing rare excited and transient states, which are critical for function yet elude experimental and computational detection due to their low populations or short lifetimes. Future applications of DynaRNA include complementing current molecular dynamics simulations, RNA-enhanced sampling, interpreting RNA experiments, identifying RNA-targeted drug binding sites, and studying RNA-protein binding mechanisms. These capabilities make DynaRNA a powerful tool for future advances in RNA-targeted drug discovery and RNA therapy development, such as mRNA vaccine design. Finally, DynaRNA moves beyond the static structure prediction paradigm exemplified by AlphaFold3 toward the generative modeling of RNA dynamic ensembles, a framework that inherently aligns with RNA’s flexible nature, where biological functions emerge from continuous conformational transitions. By bridging RNA structure and dynamics, DynaRNA offers a scalable foundation for decoding the full complexity of RNA’s dynamic universe.

Materials and methods

Dataset

DynaRNA was trained on high-quality RNA PDB crystal structures. We extracted 14,632 experimentally determined 3D RNA structures from the RNAsolo database⁴³. The training dataset was curated by removing entries that included non-RNA elements (e.g., DNAs and proteins), non-standard RNA elements (modified bases), or incomplete nucleotides. Structures containing 5–200 nucleotides were retained, producing 6,820 curated structures as the final training dataset. For the test dataset, MD trajectories of five tetranucleotides were derived from previous REST2 simulations^10,11, which were extensively sampled starting from both the experimental A-form conformations and the intercalated conformations. MD trajectories of U40 were derived from DE Shaw’s research²⁶.

Model

DynaRNA takes a single RNA structure as input and does not rely on sequence features such as MSAs. Each nucleotide is coarse-grained into one particle located at the C4’ atom, providing a minimal yet informative encoding of the RNA backbone geometry. The resulting representations are subsequently integrated into downstream modeling pipelines to facilitate structural learning. A DDPM⁴⁴ is used for RNA conformation generation; it can be partitioned into a forward noising process and a symmetric backward denoising process. Both processes are defined on a discrete time space. The forward process gradually perturbs the original data with Gaussian noise and is defined by the following discretized Itô stochastic differential equation (SDE):

$${x}_{t}=\sqrt{1-{\beta }_{t}}\cdot {x}_{t-1}+\sqrt{{\beta }_{t}}\cdot {{\epsilon }}_{t}$$

(1)

$${\beta }_{t}={\beta }_{0}+\frac{t}{T-1}\cdot ({\beta }_{T}-{\beta }_{0}),t=0,1,\ldots ,T-1$$

(2)

where x_t represents the noised data at the t-th step, β_t represents the noise level at step t, defined by Eq. (2), β₀ is set to 0.0001, β_T is set to 0.02, and ${\epsilon }_{t}\sim \mathcal{N}\left(0,I\right)$ is Gaussian noise.
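Equations (1) and (2) can be sketched in NumPy as follows, including the truncation to an intermediate step that the Results section calls partial noising. The toy input coordinates, the function names, and the zero-based step indexing are assumptions for illustration. The quantity `alpha_bar` is the cumulative product of 1 − β_t, whose square root is the fraction of the original signal amplitude that survives after a given number of steps.

```python
import numpy as np

T = 1024                                      # total diffusion steps
BETA_0, BETA_T = 1e-4, 0.02                   # schedule endpoints from the text

# Eq. (2): linear noise schedule for t = 0, 1, ..., T-1.
beta = BETA_0 + np.arange(T) / (T - 1) * (BETA_T - BETA_0)
alpha_bar = np.cumprod(1.0 - beta)            # cumulative signal retention

def forward_step(x_prev, t, rng):
    """Eq. (1): one forward noising step."""
    eps = rng.normal(size=x_prev.shape)
    return np.sqrt(1.0 - beta[t]) * x_prev + np.sqrt(beta[t]) * eps

def partial_noise(x0, n_steps, rng):
    """Apply the first n_steps forward steps instead of all T steps."""
    x = np.array(x0, dtype=float)
    for t in range(n_steps):
        x = forward_step(x, t, rng)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(scale=6.0, size=(40, 3))      # toy stand-in for 40 C4' beads
x_800 = partial_noise(x0, 800, rng)

# Fraction of the original signal amplitude remaining after 800 vs. 1024 steps.
signal_800 = np.sqrt(alpha_bar[799])
signal_full = np.sqrt(alpha_bar[T - 1])
```

Stopping at an earlier step leaves a larger share of the input structure in the noised sample, which is what lets the number of steps act as a dial between fidelity and diversity.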

To reverse the noising process and recover original structures, we train an EGNN⁴⁵ to predict the noise ${\epsilon }_{\theta }\left({x}_{t},t\right)$ given data x_t at the time step t. The EGNN architecture is specifically designed to respect geometric symmetries such as translation and rotation equivariance, making it highly suitable for molecular or geometric data. Each layer of EGNN incorporates both node and edge updates to capture intricate geometric relationships among nucleotides. The node features consist of the 3D positions of C4’ atoms along with time-step embeddings, while edge features encode both the molecular connectivity and spatial distances. We used a hidden dimension of 128 across all layers, with LayerNorm and SiLU activation functions to enhance training stability and non-linearity. Temporal information is encoded using sinusoidal embeddings, following the standard approach in diffusion models. To reduce overfitting, dropout with a rate of 0.1 is applied after each EGNN layer. The final output of the network predicts the noise vector added at each time step, conditioned on both geometry and graph topology. During inference, the reverse (denoising) process is approximated by integrating the following formulation:

$${\hat{x}}_{t-1}=\frac{1}{\sqrt{{\alpha }_{t}}}\left({x}_{t}-\frac{1-{\alpha }_{t}}{\sqrt{1-{\bar{\alpha }}_{t}}} \, {\cdot } \, {\epsilon }_{\theta }\left({x}_{t},t\right)\right)+z \, {\cdot } \, \sqrt{{\beta }_{t}}$$

(3)

where ${\hat{x}}_{t-1}$ represents the predicted data at the previous time step t − 1, α_t = 1 − β_t represents the signal retention rate, ${\bar{\alpha }}_{t}={\prod }_{s=1}^{t}{\alpha }_{s}$ represents the cumulative signal retention, ${\epsilon }_{\theta }\left({x}_{t},t\right)$ represents the noise predicted by the EGNN, and $z\sim \mathcal{N}(0,I)$ represents the fresh Gaussian noise used during sampling.
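A single application of Eq. (3) can be sketched as below. The noise prediction is passed in as an array, so any denoiser can be plugged in; here the true noise is used as a hypothetical oracle to check the algebra. Omitting the fresh noise at the last step (t = 0) is a common DDPM convention and an assumption here, as are the zero-based indexing and the function names.

```python
import numpy as np

T = 1024
beta = 1e-4 + np.arange(T) / (T - 1) * (0.02 - 1e-4)   # Eq. (2)
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

def reverse_step(x_t, t, eps_pred, rng):
    """Eq. (3): estimate x_{t-1} from x_t and the predicted noise eps_pred."""
    mean = (x_t - (1.0 - alpha[t]) / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha[t])
    # Assumption: no fresh noise is added at the final step (t = 0).
    z = rng.normal(size=x_t.shape) if t > 0 else np.zeros_like(x_t)
    return mean + np.sqrt(beta[t]) * z

# Oracle check: noise x0 by one step, then denoise with the true noise.
rng = np.random.default_rng(0)
x0 = rng.normal(scale=6.0, size=(10, 3))
true_eps = rng.normal(size=x0.shape)
x1 = np.sqrt(alpha[0]) * x0 + np.sqrt(beta[0]) * true_eps
x0_recovered = reverse_step(x1, 0, true_eps, rng)
```

In actual sampling, `eps_pred` would come from the trained network and the step would be repeated from the chosen starting step down to zero.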

The DDPM framework pursues a training objective distinct from that of many other neural networks. Instead of directly fitting RNA coordinates, the network is trained to estimate the noise in the perturbed data. We used an L2 loss on the noise, defined as follows:

$$L={{\mathbb{E}}}_{{x}_{0},{\epsilon },t}[{\Vert {\epsilon }-{{\epsilon }}_{\theta }({x}_{t},t)\Vert }^{2}]$$

(4)

where $\epsilon$ represents the real noise and ${\epsilon }_{\theta }\left({x}_{t},t\right)$ represents the noise predicted by the network given data x_t at the time step t. ${{\mathbb{E}}}_{{x}_{0},\epsilon ,t}$ denotes the expectation over x₀, $\epsilon$, and t, so that L is the mean square error (MSE) between the real and predicted noise.
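Equation (4) can be estimated by Monte Carlo sampling over time steps and noise draws, as in the sketch below. It uses the standard closed form x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε to produce noised inputs. The two `model` callables are hypothetical stand-ins (one that always predicts zero, one oracle that recovers the exact noise), not the EGNN described in the text.

```python
import numpy as np

T = 1024
beta = 1e-4 + np.arange(T) / (T - 1) * (0.02 - 1e-4)   # Eq. (2)
alpha_bar = np.cumprod(1.0 - beta)

def ddpm_loss(x0, model, rng, n_samples=256):
    """Monte Carlo estimate of Eq. (4) for one structure x0."""
    losses = []
    for _ in range(n_samples):
        t = int(rng.integers(0, T))           # random time step
        eps = rng.normal(size=x0.shape)       # real noise
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        losses.append(np.sum((eps - model(x_t, t)) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
x0 = rng.normal(scale=6.0, size=(10, 3))      # toy stand-in for 10 C4' beads

# Hypothetical predictors used only to sanity-check the loss.
zero_model = lambda x_t, t: np.zeros_like(x_t)
oracle_model = lambda x_t, t: (x_t - np.sqrt(alpha_bar[t]) * x0) / np.sqrt(1.0 - alpha_bar[t])

loss_zero = ddpm_loss(x0, zero_model, rng)     # close to the number of coordinates (30)
loss_oracle = ddpm_loss(x0, oracle_model, rng) # essentially zero
```

A predictor that ignores its input scores about one unit of squared error per coordinate, while a perfect predictor drives the loss to zero; training moves the network between these two extremes.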

Training

DynaRNA was implemented using PyTorch and PyTorch-Lightning. All training was conducted on one NVIDIA 4090D GPU and took approximately 14 days. The model parameters were optimized with the Adam optimizer⁴⁶, using a learning rate of 0.0001. To prevent gradient explosion and maintain numerical stability, gradient clipping with a maximum norm of 1.0 was applied. A weight decay of 1e-4 served as a regularization mechanism to mitigate overfitting. The training loss is shown in Supplementary Fig. 7.

Statistics and reproducibility

We generated 1000 RNA conformations for each tetranucleotide, tetraloop, and HIV-1 TAR system, and calculated structural features for analysis. The random seed used for generation is provided with the model for reproducibility.

Analysis

DSSR⁴⁷ software was used for RNA structure analysis. Arena⁴⁸ was used to convert the RNA coarse-grained model to an all-atom structure. PyMOL was used for visualization and alignment. An intercalated conformation was defined as one in which nucleotide j is positioned between nucleotides i and i + 1 and forms stacking interactions with them, where j < i or j > i + 1. The DBSCAN algorithm was used for clustering based on the RMSD of all heavy atoms, with epsilon set to 1.2 and the minimum number of points set to 10. GS and ES2 conformations were obtained from the PDB (8THV and 8U3M, respectively). Initial structures for de novo tetraloop folding were constructed with the NAB module, producing fully extended single-stranded conformations devoid of base pairing. Experimental structures from the PDB (2KOC and 8CLR) served as reference models for tetraloops.
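RMSD-based clustering needs a pairwise RMSD matrix after optimal superposition. The sketch below computes it with the Kabsch algorithm in NumPy; whether the original analysis used this exact superposition procedure is an assumption, and the toy ensemble and function names are illustrative.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)   # remove translation
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)                     # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T               # optimal rotation taking P onto Q
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))

def pairwise_rmsd(ensemble):
    """Symmetric (M, M) RMSD matrix for an ensemble of shape (M, N, 3)."""
    m = len(ensemble)
    mat = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            mat[i, j] = mat[j, i] = kabsch_rmsd(ensemble[i], ensemble[j])
    return mat

# Toy ensemble: a base structure, a rotated and shifted copy, and a perturbed copy.
rng = np.random.default_rng(0)
base = rng.normal(scale=6.0, size=(12, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = base @ Rz.T + np.array([5.0, -3.0, 2.0])
perturbed = base + rng.normal(scale=0.5, size=base.shape)
rmsd_matrix = pairwise_rmsd(np.stack([base, moved, perturbed]))
```

A matrix of this form can be passed to a DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN(eps=1.2, min_samples=10, metric="precomputed")`, using the parameter values stated above.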

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The figures are available at Figshare⁴⁹ (https://doi.org/10.6084/m9.figshare.30021871; https://figshare.com/articles/figure/DynaRNA_figures_/30021871). The model file of DynaRNA is available at Zenodo⁵⁰ (https://zenodo.org/records/15600148/files/DynaRNA.pkl).

Code availability

The code of DynaRNA and the version of the software are available at https://github.com/lizxSJTU/DynaRNA.git.

References

1. Esteller, M. Non-coding RNAs in human disease. Nat. Rev. Genet. 12, 861–874 (2011).
2. Lee, Y.-T. et al. The conformational space of RNase P RNA in solution. Nature 637, 1244–1251 (2025).
3. Childs-Disney, J. L. et al. Targeting RNA structures with small molecules. Nat. Rev. Drug Discov. 21, 736–762 (2022).
4. Bonilla, S. L. & Jang, K. Challenges, advances, and opportunities in RNA structural biology by Cryo-EM. Curr. Opin. Struct. Biol. 88, 102894 (2024).
5. Šponer, J. et al. RNA structural dynamics as captured by molecular simulations: a comprehensive overview. Chem. Rev. 118, 4177–4338 (2018).
6. Zhang, J., Fei, Y., Sun, L. & Zhang, Q. C. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat. Methods 19, 1193–1207 (2022).
7. Jones, D. et al. Accelerators for classical molecular dynamics simulations of biomolecules. J. Chem. Theory Comput. 18, 4047–4069 (2022).
8. Mlýnský, V. et al. Fine-tuning of the AMBER RNA force field with a new term adjusting interactions of terminal nucleotides. J. Chem. Theory Comput. 16, 3936–3946 (2020).
9. Kührová, P. et al. Improving the performance of the Amber RNA force field by tuning the hydrogen-bonding interactions. J. Chem. Theory Comput. 15, 3288–3305 (2019).
10. Li, Z. et al. Excited-ground-state transition of the RNA strand slippage mechanism captured by the base-specific force field. J. Chem. Theory Comput. 20, 6082–6097 (2024).
11. Li, Z., Mu, J., Chen, J. & Chen, H. F. Base-specific RNA force field improving the dynamics conformation of nucleotide. Int J. Biol. Macromol. 222, 680–690 (2022).
12. Subramaniam, S. Structural biology in the age of AI. Nat. Methods 21, 18–19 (2024).
13. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
14. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
15. Ding, J. et al. Visualizing RNA conformational and architectural heterogeneity in solution. Nat. Commun. 14, 714 (2023).
16. Zhu, J. et al. Precise generation of conformational ensembles for intrinsically disordered proteins using fine-tuned diffusion models. Preprint at bioRxiv https://doi.org/10.1101/2024.05.05.592611 (2024).
17. Condon, D. E. et al. Stacking in RNA: NMR of four tetramers benchmark molecular dynamics. J. Chem. Theory Comput. 11, 2729–2742 (2015).
18. Nozinovic, S., Fürtig, B., Jonker, H. R., Richter, C. & Schwalbe, H. High-resolution NMR structure of an RNA model system: the 14-mer cUUCGg tetraloop hairpin RNA. Nucleic Acids Res. 38, 683–694 (2010).
19. Oxenfarth, A. et al. Integrated NMR/molecular dynamics determination of the ensemble conformation of a thermodynamically stable CUUG RNA tetraloop. J. Am. Chem. Soc. 145, 16557–16572 (2023).
20. Roy, R. et al. Kinetic resolution of the atomic 3D structures formed by ground and excited conformational states in an RNA dynamic ensemble. J. Am. Chem. Soc. 145, 22964–22978 (2023).
21. Geng, A. et al. An RNA excited conformational state at atomic resolution. Nat. Commun. 14, 8432 (2023).
22. Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. Preprint at arXiv https://arxiv.org/abs/2206.04119 (2022).
23. Lu, J., Zhong, B., Zhang, Z. & Tang, J. Str2str: a score-based framework for zero-shot protein conformation sampling. Preprint at arXiv https://arxiv.org/abs/2306.03117 (2023).
24. Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
25. Soleymani, F., Paquet, E., Viktor, H. L. & Michalowski, W. Structure-based protein and small molecule generation using EGNN and diffusion models: a comprehensive review. Comput. Struct. Biotechnol. J. 23, 2779–2797 (2024).
26. Tan, D., Piana, S., Dirks, R. M. & Shaw, D. E. RNA force field with accuracy comparable to state-of-the-art protein force fields. Proc. Natl. Acad. Sci. USA 115, E1346–E1355 (2018).
27. Burley, S. K. et al. Updated resources for exploring experimentally-determined PDB structures and computed structure models at the RCSB Protein Data Bank. Nucleic Acids Res. 53, D564–D574 (2025).
28. Zgarbová, M. et al. Refinement of the Cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles. J. Chem. Theory Comput. 7, 2886–2902 (2011).
29. Werner, A. Predicting translational diffusion of evolutionary conserved RNA structures by the nucleotide number. Nucleic Acids Res. 39, e17 (2011).
30. Bannwarth, S. & Gatignol, A. HIV-1 TAR RNA: the target of molecular interactions between the virus and its host. Curr. HIV Res. 3, 61–71 (2005).
31. Bou-Nader, C., Link, K. A., Suddala, K. C., Knutson, J. R. & Zhang, J. Structures of complete HIV-1 TAR RNA portray a dynamic platform poised for protein binding and structural remodeling. Nat. Commun. 16, 2252 (2025).
32. Chu, C. C., Plangger, R., Kreutz, C. & Al-Hashimi, H. M. Dynamic ensemble of HIV-1 RRE stem IIB reveals non-native conformations that disrupt the Rev-binding site. Nucleic Acids Res. 47, 7105–7117 (2019).
33. Xue, Y. et al. Characterizing RNA excited states using NMR relaxation dispersion. Methods Enzymol. 558, 39–73 (2015).
34. Han, G. & Xue, Y. Rational design of hairpin RNA excited states reveals multi-step transitions. Nat. Commun. 13, 1523 (2022).
35. Klosterman, P. S., Hendrix, D. K., Tamura, M., Holbrook, S. R. & Brenner, S. E. Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res. 32, 2342–2352 (2004).
36. Thapar, R., Denmon, A. P. & Nikonowicz, E. P. Recognition modes of RNA tetraloops and tetraloop-like motifs by RNA-binding proteins. Wiley Interdiscip. Rev. RNA 5, 49–67 (2014).
37. Kührová, P. et al. Computer folding of RNA tetraloops: identification of key force field deficiencies. J. Chem. Theory Comput. 12, 4534–4548 (2016).
38. Tinoco, I. & Bustamante, C. How RNA folds. J. Mol. Biol. 293, 271–281 (1999).
39. Chen, A. A. & García, A. E. High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 110, 16820–16825 (2013).
40. Mustoe, A. M., Brooks, C. L. & Al-Hashimi, H. M. Hierarchy of RNA functional dynamics. Annu Rev. Biochem. 83, 441–466 (2014).
41. Ganser, L. R., Kelly, M. L., Herschlag, D. & Al-Hashimi, H. M. The roles of structural dynamics in the cellular functions of RNAs. Nat. Rev. Mol. Cell Biol. 20, 474–489 (2019).
42. Duzdevich, D., Redding, S. & Greene, E. C. DNA dynamics and single-molecule biology. Chem. Rev. 114, 3072–3086 (2014).
43. Adamczyk, B., Antczak, M. & Szachniuk, M. RNAsolo: a repository of cleaned PDB-derived RNA 3D structures. Bioinformatics 38, 3668–3670. https://rnasolo.cs.put.poznan.pl (2022).
44. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
45. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning 9323–9332 (PMLR, 2021).
46. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://arxiv.org/abs/1412.6980 (2014).
47. Lu, X. J., Bussemaker, H. J. & Olson, W. K. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 43, e142 (2015).
48. Perry, Z. R., Pyle, A. M. & Zhang, C. Arena: rapid and accurate reconstruction of full atomic RNA structures from coarse-grained models. J. Mol. Biol. 435, 168210 (2023).
49. Figshare. Figshare. 2011. [Data set]. Figshare https://figshare.com/articles/figure/DynaRNA_figures_/30021871 (2025).
50. Zenodo. Zenodo. 2013. [Data set]. Zenodo https://zenodo.org/records/15600148/files/DynaRNA.pkl (2025).
51. Darty, K., Denise, A. & Ponty, Y. VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25, 1974–1975 (2009).

Acknowledgements

This work was supported by the Shanghai Municipal Science and Technology Major Project, partially by SJTU Kunpeng & Ascend Center of Excellence, the Center for HPC at Shanghai Jiao Tong University, and the National Key Research and Development Program of China (2025YFA0921000 and 2023YFF1205102), the Fundamental Research Funds for the Central Universities (YG2023LC03), the National Natural Science Foundation of China (32571435 and 32171242), and the Fuzhou University scientific research Grant (XRC-23077).

Author information

Authors and Affiliations

State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Zhengxin Li, Junjie Zhu, Junxi Mu, Zhuoqi Zheng, Taeyoung Cui, Yutong Sun, Ting Wei & Hai-Feng Chen
College of Biological Science and Engineering, Fuzhou University, Fuzhou, China
Xiaokun Hong

Contributions

Zhengxin Li conceived the study, developed the DynaRNA framework, implemented the computational workflow, performed molecular dynamics simulations, analyzed structural ensembles, and wrote the initial draft of the manuscript. Junjie Zhu assisted with the visualization of RNA ensembles and model training. Xiaokun Hong assisted with the preprocessing of RNA structural data. Junxi Mu assisted with data interpretation and figure preparation. Zhuoqi Zheng contributed to model development and provided advice on machine learning implementation. Taeyoung Cui, Yutong Sun, and Ting Wei participated in the interpretation of results and manuscript editing. Prof. Hai-Feng Chen conceived and supervised the project, provided critical guidance throughout the study, secured funding, and revised the manuscript.

Corresponding author

Correspondence to Hai-Feng Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Alan Chen and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editors: Michal Kolar and Aylin Bircan. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Z., Zhu, J., Hong, X. et al. DynaRNA: accurate dynamic RNA conformation ensemble generation with diffusion model. Commun Biol 8, 1472 (2025). https://doi.org/10.1038/s42003-025-08875-2

Received: 26 June 2025
Accepted: 09 September 2025
Published: 15 October 2025
DOI: https://doi.org/10.1038/s42003-025-08875-2
