Main

Advances in cryo-EM have enabled the visualization of biomolecular complexes in their near-native hydrated states. Water and ions are critical for maintaining the stability and functional effectiveness of biomolecules1,2,3. Unlike most protein or protein–RNA complexes, RNA forms well-defined structures whose cores are extensively solvated. Water has been implicated in the stability, catalysis and dynamics of RNA both independently and in collaboration with ions4,5,6,7,8,9,10,11. Both highly ordered and diffuse water and ions are involved in RNA folding and function19,20. Thus, RNA-only structures offer unique opportunities to understand how water interacts with and stabilizes biomolecules. Unfortunately, the flexibility of RNA raises challenges for experimental structure determination. Molecular dynamics simulations can suggest kinetic and structural information inaccessible to current cryo-EM and X-ray crystallographic methods12. For example, molecular dynamics studies have proposed binding sites for long-lived water6,13,14,15, Mg2+ ions14,21 and spines of fully hydrated metal ions16. Nevertheless, the sensitivity and potential inaccuracies of the parameterization and classical assumptions of molecular dynamics force fields have limited confidence in these inferences22,23,24.

Here we report cryo-EM maps of solvated and enzymatically active Tetrahymena ribozyme without substrate (387 nt, 128 kDa) using cryo-EM in two independent reconstructions at 2.2 Å and 2.3 Å resolution, enabling detailed analysis and confidence estimation of the interactions of ordered water in the context of an intricate RNA tertiary structure. Water is extensively observed in the interior of the ribozyme and can directly mediate interactions between RNA atoms without also coordinating site-bound ions. Unexpectedly, we found numerous sites where computationally identified waters in the 2.2 Å and 2.3 Å maps disagreed but still had cryo-EM densities that were highly similar. Many of these regions showed a high correlation between the cryo-EM densities and the density predicted by explicit solvent all-atom molecular dynamics simulations, suggesting that cryo-EM maps contain information on highly mobile waters in addition to ordered waters traditionally annotated in atomic models. We also observed regions where the two cryo-EM maps share diffuse water density features that do not agree with the predictions of molecular dynamics, encouraging further development and validation of molecular dynamics force fields, potentially through future blind prediction challenges using flash-frozen biomolecules.

Structure determination

Since 2018, cryo-EM has been applied to the study of several RNA-only 3D structures at subnanometre resolutions25, but obtaining sufficient resolution to resolve waters has remained out of reach. Meanwhile, cryo-EM single-particle analysis has been effective at determining atomic structures of large protein–nucleic acid complexes at approximately 2 Å resolution, enabling the visualization of ordered water and ion densities26,27, but nucleic acids in these complexes are generally not fully solvated because of their extensive interactions with proteins. This study extends the resolution of an RNA-only system, Tetrahymena ribozyme28, towards 2 Å using the same sample preparation for the apo-ribozyme as previously described29, but with more data and a next-generation electron detector (Extended Data Table 1 and Methods). Single-particle cryo-EM analysis yielded two maps after image classification at 2.2 Å and 2.3 Å resolution with Rosenthal–Henderson B-factor values30 of 63 Å2 and 66 Å2, respectively (Fig. 1a,b, Extended Data Fig. 1 and Methods). Overlaying the two models shows that the structural differences lie primarily in peripheral domains P9.2 and P6, which both point out into solution without interactions with the rest of the ribozyme (Fig. 1c–f and Methods).

Fig. 1: Map and model of Tetrahymena ribozyme at 2.2 Å and 2.3 Å resolution.
figure 1

a,b, 2.2 Å (a) and 2.3 Å (b) cryo-EM map at 3σ threshold. c,d, Maps at 2.2 Å (c) and 2.3 Å (d) with transparent surface and derived model in ribbon display, each domain is coloured. e, Models from 2.2 Å and 2.3 Å maps are overlaid. Deviations are labelled for peripheral domains that have considerable differences. f, Secondary structure diagram. All domains are labelled and coloured. g, Extracted densities around four nucleotides showing base resolvability and clear separation of stacked bases. The Q score is labelled. h,i, Density surrounding four nucleotides in the 2.2 Å (h) and 2.3 Å (i) maps showing similar density features surrounding the RNA. The map was segmented at 3σ using Segger59. Density segments at distances 1.8–5.0 Å from the RNA heavy atoms are displayed at 5σ (transparent teal) and 8σ (dark teal).

Structure quality assessment

In the cryo-EM maps, unambiguous separation between bases was visually evident and bases were well resolved, indicating that there is high confidence in the positioning of nucleotides (Fig. 1g and Extended Data Fig. 2a,b). When zooming out, most domains in the structure had a Q score31, a metric to assess the resolvability of atoms, above that expected at the overall resolution of the map (Extended Data Fig. 2c). Additional density peaks were observed around the RNA, potentially water, ions or experimental noise (Fig. 1h,i). Although water had been observed previously in X-ray crystal structures of the P4–P6 subdomain of the ribozyme8,32, only ions had been previously modelled in structures including the catalytic domain of the ribozyme28,29,33,34,35. The assignment of water and ion densities in X-ray diffraction models has been heavily scrutinized36,37, highlighting the need for rigorous analysis. Rigorous assessment of map quality and modelling of the RNA atoms had to be established to avoid modelling water into noise peaks, RNA density or ion density. As we observed that the cryo-EM density is not uniformly resolved across domains of the ribozyme (Extended Data Fig. 3), water could only be confidently modelled in select regions. These well-resolved regions were also conformationally very similar between the two models (root mean square deviation (RMSD) of 0.59 Å; Extended Data Fig. 3) and thus we reasoned that these regions would be well suited for modelling water with rigorous criteria and comparing these placements between the two maps as cross-validation.

Automated modelling of water with SWIM

For modelling water, we applied SWIM, which was originally developed for automated analysis of water and ions in atomic-resolution cryo-EM maps of proteins17,18. Several SWIM criteria were updated or introduced to be more stringent and reduce the likelihood of modelling water in noise peaks at the 2.2–2.3 Å resolution observed here, resulting in conservative peak identification (Extended Data Fig. 4a–i and Methods). Owing to resolution limitations, resolvability was only sufficient in the solvent shell directly adjacent to the RNA atoms, hence we strictly limited peak assignments to this shell. Using SWIM, 255 and 281 water molecules along with 47 and 47 Mg2+ ions were modelled in the 2.2 Å and 2.3 Å ribozyme maps, respectively (Figs. 2a,b and 3). A small number of sites are expected to be partially occupied by monovalent ions, but we expect the contribution to be minor compared with water and Mg2+ ions (see Extended Data Figs. 6k–m and 9e,f for detailed discussion). The modelled waters were restricted to those with high resolvability as indicated by the Q scores in the final maps of approximately 0.8, with a trend to higher Q score for waters bound to more RNA atoms (Extended Data Fig. 4j–m). The majority of waters were modelled closest to oxygen atoms, and distances to the nearest atom ranged between 2.5 Å and 3.5 Å, as per the SWIM criteria, with the most prevalent water–oxygen distance of approximately 2.8 Å (Extended Data Fig. 4n,o).

Fig. 2: Water and Mg2+ ion detection with SWIM and consensus between 2.2 Å and 2.3 Å maps.
figure 2

a,b, Detected waters (red) and Mg2+ ions (green) for the 2.2 Å (a) and 2.3 Å (b) model. Consensus water and Mg2+ ions coloured dark and counts are noted next to each model. ce, Distributions of water and Mg2+ ions separated by consensus and non-consensus types: Q score of water or Mg2+ ions (c; PMg = 5.7 × 10−2 and Pwater = 6.1 × 10−4), average Q score of bound RNA nucleotides (d; PMg = 2.2 × 10−3 and Pwater = 1.0 × 10−6) and RMSD of bound RNA nucleotides between the 2.2 Å and 2.3 Å models after alignment on all RNA heavy atoms (e; PMg = 2.6 × 10−2 and Pwater = 3.8 × 10−5). The horizontal line is the mean value. Pairwise significance was determined by a two-sided Mann–Whitney U-test: not significant (NS) P > 0.05, *P < 0.05 and **P < 10−4.

Fig. 3: Water and Mg2+ ion binding to nucleotides within and between domains.
figure 3

Ribbon display of the model built in the 2.2 Å map (centre), colour-coded by domain as in Fig. 1, along with water (red spheres) and Mg2+ ions (green spheres). aj, The panels on the outside highlight a selection of water and Mg2+ ions interactions, with nucleotide labels colour-coded by domain as in Fig. 1. All water and Mg2+ ions displayed are found in the 2.2 Å and 2.3 Å models; the same regions but with the 2.3 Å map and models can be found in Extended Data Fig. 6a–j. Distances (Å) from water and Mg2+ ions to RNA heavy atoms are labelled; only some of the contacts for each water and Mg2+ ion may be shown in each case. See Supplementary Video 1 to visualize panels c,e,f,h in 3D.

SWIM found many waters and Mg2+ ions that overlapped with those modelled in the lower-resolution cryo-EM structures of the full ribozyme29,33,34,35 and X-ray structures of subdomains of the ribozyme8,28,32 (Extended Data Fig. 5 and Supplementary Data, file 1). These previously modelled water and Mg2+ ions include those in the metal ion core of the well-characterized P4–P6 domain8,32 (Fig. 3a–c and Extended Data Fig. 6a–c). Generally, the modelled Mg2+ ions bind to similar parts of nucleotides as Mg2+ ions modelled in previous RNA studies38 (Extended Data Fig. 7a,b,d), except for more frequent binding of Mg2+ ions to sugars in our SWIM model (see example in Extended Data Fig. 6k,l).

SWIM modelled many novel waters in our new structures. To investigate the confidence in their positioning and assignment, we took advantage of the availability of two discrete, independent, high-resolution cryo-EM maps. We examined whether the water modelled in both maps share equivalent locations; shared waters are referred to below as ‘consensus’ waters (Methods). Of the SWIM waters, 134 were identified as consensus, 53% and 48% of all SWIM waters in the 2.2 Å and 2.3 Å models, respectively (Fig. 2a,b).

Although all waters modelled were well resolved (Q > 0.7), consensus waters had a statistically significant higher Q score than non-consensus waters (Fig. 2c) and were in regions of the RNA that were statistically significantly better resolved (higher Q score; Fig. 2d) and more similar (smaller RMSD; Fig. 2e). Hence, we first analysed the binding motif of just the consensus waters, found in both the 2.2 Å and the 2.3 Å models, and then analysed the nature of the other, non-consensus, positions.

Consensus waters were bound throughout all parts of nucleotides in our cryo-EM structures (Extended Data Fig. 7a,c–e). However, some regions were more or less densely hydrated. The catalytic active site of Tetrahymena ribozyme had a similar number of waters per nucleotide as the other regions of the ribozyme (Supplementary Table 1), indicating that ordered waters may be important for tertiary interactions generally and there is not a particularly elaborate water structure in the active site when substrate groups are absent, as in our sample.

Indeed, all resolved tertiary interactions in our structures, which link distant nucleotides in the sequence into stable junctions, were extensively bound to well-positioned waters. For example, the catalytic active site is embedded in a highly conserved core with numerous non-canonical interactions. Stabilizing this core, where the P3 and P6a domains come in close proximity, the Watson–Crick–Franklin edge of C255 and the sugar edge of G272 meet in an orientation that is not favourable for direct hydrogen bonding. We observed an ordered water that bridges these two bases stabilizing the tertiary contact (Fig. 3d and Extended Data Fig. 6d). This water was not previously observed, although some signal was visible at low contours in the 3.1 Å cryo-EM map (Extended Data Fig. 6n). Another previously unidentified water stabilized these domains by bridging the sugars of G254 and U273 (Fig. 3e and Extended Data Fig. 6e).

Elsewhere in the conserved catalytic core, the backbone of P6a comes in close proximity to the junction that connects P3 and P4. We observed a chain of waters mediating this interaction. As an example, a water, not previously modelled, bridges the sugar of C216 and the phosphate of U106 (Fig. 3f and Extended Data Fig. 6f). A newly identified water binds to the G215–U258 wobble base pair of P6a in the minor groove. It forms hydrogen bonds with the O2 and O2′ of the U258 and the N2 of the G215 (Fig. 3g and Extended Data Fig. 6g). A water with the same binding atoms was previously highlighted in an unrelated RNA system within an A-form helix where it was observed to make an additional contact with the O2 of the downstream cytidine39. Our structure also contains a downstream cytidine, C216, with a long bond length of 3.3 Å from the O2 to the modelled water. However, C216-O2 makes an angle of 79° with the water and the G215-N6, too small for the tetrahedral arrangement of water (109.5°)39. A hydrogen bond is more likely formed with the O2′ of A105 in the junction between P3 and P4. Hence, this water may have a role in stabilizing the tertiary interaction between P6a, P3 and P4. This example illustrates how, to stabilize tertiary interactions, water networks can be structured differently from what is expected of a classic A-form helix.

Water has an integral role in stabilizing the tertiary structure generally. For example, in the region where P4, P5c and P14 come together, we observed at least five waters that form a network stabilizing this complex junction (Fig. 3h and Extended Data Fig. 6h). In the junction between P3 and P7, A306 is adjacent and in the same plane as the A308–U267 base pair, but instead of A306 forming direct RNA–RNA hydrogen bonds with the A–U base pair, the interaction is mediated by two waters (Fig. 3i and Extended Data Fig. 6i).

SWIM modelled waters interact with multiple phosphates, potentially having the role of shielding the negative charge typically fulfilled by ions. In the highly solvated and well-resolved P3 domain, a density peak is visible bridging phosphates of two consecutive nucleotides with distances of 2.8 Å and 3.3 Å (Fig. 3j and Extended Data Fig. 6j). No previous structures modelled a water or an ion at this site. These distances are too long for a Mg2+ ion, which is typically approximately 2 Å away from a phosphate, and hence the atom is more likely a water. However, we could not rule out a monovalent cation at this position (further discussed in Extended Data Fig. 6j–m). To get a better sense of the limitations of our modelling at 2.2 Å and 2.3 Å resolution, we analysed the SWIM assignments that disagreed between our two independent maps.

Exploring non-consensus waters

We were left with roughly half of the automatically modelled waters that were identified by SWIM in only one of the two independently reconstructed cryo-EM maps. A similar percent overlap was observed in the two asymmetric units of the X-ray crystal structure of the P4–P6 subdomain8 (Extended Data Fig. 5g). We examined the non-consensus waters in more detail by placing them into the map that they were not modelled in and observing the model and map features. If the disagreement is explained by noise or difference in solvent structure, we expected the models and map features to differ significantly between the two independent maps. For 30% of the non-consensus waters, a peak was modelled nearby but in a slightly different position in the other map, sometimes leading to disagreement in the assignment (for example, water in one model, a Mg2+ ion in the other; Extended Data Fig. 8a–c). Of all waters, when placed in the other map, 84% still exhibited high cryo-EM density (more than 5σ) and resolvability (Q > 0.7; compared with 1.2% of randomly sampled positions in the solvent shell), and 75% of all waters additionally passed the half-map resolvability criteria (compared with 0.6% of randomly sampled positions) (Extended Data Figs. 4a–h and 8a,b). This suggested that many of the non-consensus waters had significant density features in both maps and generally that very few of the non-consensus waters originated from experimental noise or an actual difference in solvent structure. The above data led us to conclude that SWIM is reliable in locating density peaks and these peaks are unlikely to be noise.

Generally, non-consensus waters have a significantly lower peak density than consensus water (Fig. 4a), providing a simple explanation for the inconsistency of SWIM in modelling. To understand the origin of the lower peak densities, we reasoned that peak density can be reduced by two factors: occupancy (that is, how often water is present anywhere in the site) and high positional spread (that is, water diffuses within the binding sites and is not always localized to the peak coordinate; Fig. 4b). Cryo-EM maps at this resolution do not contain the information to neatly decompose these factors. To develop hypotheses for what accounts for lower peak densities of non-consensus SWIM-modelled waters, we carried out explicit solvent all-atom molecular dynamics (Extended Data Fig. 9 and Methods). The root mean square fluctuation (RMSF) of the RNA in 30 400-ns simulations was used to estimate the global flexibility of the ribozyme. A significant negative correlation exists between the Q score and RMSF (Extended Data Fig. 3e–h), in agreement with the premise that inherent flexibility is the primary attribute explaining poor resolvability in peripheral regions of the ribozyme. These results indicate that cryo-EM freezing artefacts are not the major determinant of poor resolvability and simultaneously offer support for the credibility of the simulations.

Fig. 4: Water dynamics in molecular dynamics simulation with reference to cryo-EM water-binding sites.
figure 4

a, Cryo-EM peak density of all SWIM-identified waters is plotted for waters in both cryo-EM maps (‘consensus’ (red) n = 268) or only one cryo-EM map (‘non-consensus’ (pink) n = 268). The violin plot displays the range, dotted lines are the quartiles, and pairwise significance was determined by two-sided Mann-Whitney U-test (P = 3.6 × 10−26). The black dashed line is the mean density of the map. b, Example of molecular dynamics water-binding sites displaying how low occupancy or high RMSF can reduce the peak density. For every molecular dynamics frame, the position of water, if present, is a small red dot. c,d, Water-binding sites from molecular dynamics (MD) simulations are grouped by whether they are found in both cryo-EM maps (red; n = 76), only one cryo-EM map (pink; n = 167) or no cryo-EM maps (yellow; n = 1,984). The molecular dynamics and EM (2.2 Å map) peak water density are the maximum density within 1 Å of the molecular dynamics peak coordinate. The molecular dynamics peak water density is displayed (c; one versus two cryo-EM maps: P = 1.0 × 10−4; one or two versus no cryo-EM maps: P = 1.2 × 10−14); statistics are as in panel a. The black dashed line is the density of bulk water. For each molecular dynamics water-binding site found in both or one cryo-EM map, the cryo-EM peak density is plotted against the molecular dynamics peak water density; the contours display the 5%, 25%, 50%, 75% and 95% probability densities from kernel density estimation (d). Pearson’s correlation coefficient and P value, two-sided hypothesis test for 0 correlation, are reported (P = 3.2 × 10−19). e,f, For each molecular dynamics water-binding site, the proportion of time that any water occupied that site (e; P = 2.9 × 10−4) and the RMSF of the waters found at that binding site (f; P = 2.5 × 10−8) across all simulations are plotted against the molecular dynamics peak water density, as displayed in panel d. Significance is reported as: *P < 0.05, ***P < 10−6 and ****P < 10−8.

Waters (simulated with the TIP4P-D water model40) were initially placed randomly and were not guided by the cryo-EM map during simulation to enable unbiased comparison between molecular dynamics and cryo-EM. To assess the reliability of making observations of water dynamics in the molecular dynamics simulations, we compared water-binding sites found in the molecular dynamics simulation to the SWIM-modelled cryo-EM waters. On average, the cryo-EM waters had a predicted molecular dynamics water concentration of 90 M (0.054 waters per Å3), and only 10% of cryo-EM waters had predicted molecular dynamics water density lower than bulk water (55 M, 0.033 waters per Å3; Fig. 4c and Extended Data Fig. 9e). As expected, molecular dynamics water-binding sites that were modelled by SWIM in both cryo-EM maps had higher water density in the molecular dynamics simulations than waters modelled in only one cryo-EM map or locations where SWIM did not model a water (Fig. 4d and Supplementary Data, file 2). In addition, the molecular dynamics peak density correlated with the cryo-EM peak density at molecular dynamics water-binding sites that were modelled by SWIM (Fig. 4c). Our observations thus increased confidence in the use of the simulation in interpreting cryo-EM data and suggesting hypotheses about the nature of water dynamics.

The molecular dynamics simulation enabled the examination of the two factors that we proposed could reduce peak density: occupancy and positional spread. The variation in the molecular dynamics peak density is partially accounted for by the variation in occupancy and positional spread (RMSF) of water (Fig. 4e,f), suggesting that both factors contribute to reduction in peak density, which results in the deviations in the SWIM-based water modelling for the two maps. The inspection of sites with higher positional spread led us to analyse water maps in a method independent of discrete peak identification by SWIM, described next.

Complex and diffuse water networks

On the basis of the results above, we expanded our analysis beyond highly ordered waters traditionally modelled in peak densities by SWIM and other algorithms41. We visually observed patches of density that diffuse into specific directions and are correlated between the two independent maps. We next explored whether the cryo-EM density could contain information about more diffuse and unmodelled water networks.

We first considered whether the diffuseness of density might be due to a lack of resolution. Comparison with a lower-resolution (3.1 Å) cryo-EM map showed that in many instances, although the density was too diffuse to model at 3.1 Å, the signal for waters now resolved at 2.2 Å was present (Extended Data Fig. 6n–p). We expected similar effects at a smaller distance scale for the 2.2 Å map, for example, blurring the density between the Mg2+ ion and coordinating waters (Extended Data Fig. 8c). The resolution-dependent diffuseness of cryo-EM density suggests caution on automatically assigning peaks to water or Mg2+ ions at 2.2 Å without cross-validation, as was done here, or manual inspection. However, the confirmation that diffuse density in the 3.1 Å map overlaps with modelled water, confirmed in our 2.2 Å and 2.3 Å maps, motivated further investigation of these diffuse densities.

Globally, we observed excellent agreement between the 2.2 Å and 2.3 Å maps in the cryo-EM density of the solvent shell (Fig. 5a; displayed 3σ above background). We observed wires of diffuse density that are present in the 2.2 Å and 2.3 Å maps, as well as the predicted molecular dynamics water density (Fig. 5a). Taking a slice of density, we observed water networks along the grooves of helices as well as around the backbone, particularly in the region of tertiary contacts (Fig. 5b). Zooming in further, around the backbone of P6a, we observed a complex network of waters (Fig. 5c). The placement of waters from SWIM matched well between the 2.2 Å and 2.3 Å maps: eight waters were modelled in similar positions. Six of the overlapping water positions further overlapped with a local maximum in the simulated molecular dynamics water density. This suggested that molecular dynamics is capable of predicting some locally ordered waters. The SWIM-assigned water positions disagreed in five positions between the two independent maps; however, at all of these positions, the cryo-EM maps in the solvent shell were highly similar. Moreover, the cryo-EM and simulated molecular dynamics densities agreed at low contours, suggesting that the cryo-EM density maps and molecular dynamics simulation agreed not only on many of the ordered water-binding sites but also on the diffuseness of water through a network. This was even the case for regions where map diffuseness precluded atomic water placement by SWIM (Fig. 5c).

Fig. 5: Diffuse cryo-EM densities and molecular dynamics water networks.
figure 5

Comparison of density features surrounding the RNA (red) between the cryo-EM maps and molecular dynamics. The first column labels the nucleotides in the region, coloured according to domains from Fig. 1. The second and third columns are the 2.2 Å and 2.3 Å cryo-EM densities, respectively, at 3σ at least 1.8 Å away from the RNA coloured in transparent green (within 2.5 Å of a Mg2+ ion) and red (remainder of density). The models are displayed in white, with the SWIM modelled Mg2+ ion as green spheres and water as red spheres. The final column is the predicted density from molecular dynamics after local alignment (Methods). Water (dark red at 101 M, light red at 68 M), Mg2+ ion (dark green at 46 M, light green at 14 M) and Na+ ion (dark purple at 28 M, light purple at 14 M) densities are displayed. The 2.2 Å RNA is pictured in white for reference. a, Densities of water and ions around the RNA for a global view of the ribozyme. bf, Specific regions of interest, showing water networks along the groove of helices and around the backbone (b), a network of waters around the backbone of P6a (c), water networks surrounding the catalytic core (d,e) and a water wire bridging the backbone of two RNA helices (f). The black circle (c) highlights a density feature in cryo-EM and molecular dynamics simulations whose irregular shape precluded SWIM assignment of an atomically ordered water. A density of unknown identity in the binding site for the guanine substrate is circled in black (d,e). See Supplementary Video 2 to visualize the 3D context of panels be.

Examining near the catalytic core, specifically, the cavity created by the major groove of P7, A261, C261 and A306, we saw diffuse densities that agree well between the two cryo-EM maps, including the pocket that would bind to the guanosine substrate ωG of the ribozyme (Fig. 5d). Turning towards the substrate-binding site, we again observed similar density features in the cryo-EM maps; diffuse densities, with the same structured network between the two maps (Fig. 5e). In both sites, molecular dynamics predicted less hydration density than at other sites; although the RNA is fully solvated at each point in the simulation, molecular dynamics predicted that this solvation shell is more dynamic. This cavity is directly exposed to bulk solvent, a potential explanation for why the molecular dynamics predicted these sites to have less-ordered waters than observed experimentally in flash-frozen samples. Analysing a nearby, less-solvent exposed, tertiary interaction between the backbone of P9.1 and P7, molecular dynamics predicted a water wire between these backbones, which is also supported in the cryo-EM densities (Fig. 5f). The SWIM models disagreed on some water placements in this region due to the diffuseness of the density. This region may be more accurately modelled by an ensemble of partially occupied water positions as opposed to a few specific positions.

These examples reveal that the cryo-EM maps contained information about the solvent structure that is not currently modelled in the traditional atomic structure representation underlying SWIM and depositions in the Protein Data Bank (PDB). We quantified the agreement by comparing the distribution of density in the solvent shell from SWIM models and molecular dynamics with the 2.2 Å map. Despite conservative modelling, the SWIM model obtained a better cross-correlation coefficient than molecular dynamics, although both are far from our level of experimental precision, here estimated by comparison between the 2.2 Å and 2.3 Å maps (Supplementary Table 2).

To delve into this comparison further, we treated the density as a classification task with the goal of identifying all the voxels with a significant density (more than 3σ) in the well-resolved (RNA Q > 0.6) solvent shell (1.8–3.5 Å away from RNA) of the 2.2 Å map. The 2.3 Å map at 3σ was able to match 71% of these solvent locations, also known as recall, with a low false positive rate of 5.6% and a high precision of 63% (Extended Data Fig. 8d–f). At the same recall (71%), the molecular dynamics water density falsely identified 35% of the voxels and had a much lower precision of 22%, revealing a large accuracy gap for molecular dynamics (Extended Data Fig. 8d–f). Furthermore, this analysis confirms that the SWIM-identified waters and Mg2+ ions only accounted for a minor proportion of the consensus density in the solvent shell. Comparing the SWIM-identified waters to the cryo-EM density, we observed high precision; SWIM-identified waters recovered 10% of the solvent positions with a precision of 96%, whereas molecular dynamics recovered this many with a precision of only 68%. However, there was a large proportion of density left unmodelled by SWIM, plateauing at approximately 25% solvent locations recovered (Extended Data Fig. 8d–f). This low recall was due to the SWIM criteria being designed to only detect rigidly ordered waters and ions, and not other solvent and ion densities. Overall, using classification performance summary metrics, we observed good experimental reproducibility, with reasonable performance for the SWIM model and poorer performance for molecular dynamics predictions. Finally, we used this metric to assess the local accuracy of the TIP4P-D model in predicting the water structure by direct comparison to the cryo-EM map (Extended Data Fig. 8g–i). The molecular dynamics-predicted water structure agrees with the cryo-EM map most in the peripheral tetraloop–tetraloop receptor, but performs poorly in other regions, most notably in the catalytic core (mean normalized nucleotide area under the precision-recall curve of 0.58 and 0.34, respectively; Supplementary Table 1).

Discussion

This investigation was made possible by the high quality of the cryo-EM map that we obtained, which, to our knowledge, is the highest resolution reported for an RNA-only system to date. In the future, with the development of newer methods in cryo-specimen preparation42,43, larger datasets and improved data-processing algorithms that account for local flexibility44,45, RNA cryo-EM maps have the potential to be extended closer to atomic resolution. With high-quality density maps and an optimized automated SWIM algorithm, water positions were automatically modelled based on the strength of the local signal in the map, concurrent resolvability in both full and half maps, and physically plausible distances from RNA atoms. The SWIM-modelled waters were included in the PDB-deposited model. The SWIM criteria were shown to be stringent, with only approximately 1% of randomly sampled positions in the solvent shell meeting the criteria. However, at approximately 2 Å resolution, there remained ambiguity in precise peak placement and molecular identification to be either water or cations in some instances. Therefore, a future challenge is to develop improved algorithms to reliably characterize the identity of bound water and ions in those instances at this resolution range. Beyond the geometry of coordination, such an algorithm may take biophysical properties of electron scattering of cations into account to differentiate between waters and ions46, especially monovalent ions. Fortunately, in our study, cross-validation was facilitated by two independent high-resolution maps of Tetrahymena ribozyme, which were highly similar in the well-resolved regions.

Contrary to the common portrayal of water and ions as molecules dispersed on the periphery of biomolecules, in the full ribozyme in vitreous ice, we observed water and Mg2+ ions principally in the interior, as also observed in the crystal structure of the P4–P6 domain8. We not only validated our finding for well-studied water and ions, such as those previously known in the metal ion core32, but also uncovered many other novel waters that formed bridges between RNA nucleotides stabilizing the compact fold of ribozymes (Fig. 3). We resolved a structured water network surrounding the active site of the ribozyme, despite the site being solvent exposed (Fig. 5d,e). These ordered waters could be important for the pre-structuring of catalytically important ions or to reduce the entropic penalty of desolvation upon substrate binding and may be important for understanding currently unexplained effects from nucleotide analogue interference mapping experiments47,48,49,50,51,52 (Supplementary Table 3). To fully appreciate the biophysical information that can be derived from the cryo-ensemble of water structure, future study of the effect of radiation damage and the freezing process on water bound to RNA is warranted53,54.

Owing to our conservative criteria in the models, there remained signals in the map that are not annotated in the PDB-deposited model. For example, we only modelled enough ions to neutralize around one-quarter of the RNA charge, implying there are some aspects of ionic composition not represented in our structural models. This missing representation of ions is pervasive in all experimental structures, which only model ordered waters and ions. Furthermore, our consensus criteria revealed that only half of the automatically modelled waters agree and have been modelled, despite high agreement between cryo-EM densities (Fig. 2a,b).

We turned to molecular dynamics simulation to develop hypotheses for the biophysical explanation of the unmodelled water cryo-EM densities. Comparisons with molecular dynamics simulations supported the hypothesis that the cryo-EM map SWIM-modelled peaks capture ordered waters that have high occupancy and low positional spread (Fig. 4). Furthermore, some diffuse cryo-EM densities matched diffuse water networks in the molecular dynamics simulations, supporting the hypothesis that cryo-EM may resolve diffuse density representing flexible water networks (Fig. 5). If this is true, even if RNA can be resolved at higher resolution, some or most water density will remain diffuse. In the future, we foresee two methods by which these networks may be understood in more detail. First, new methods of heterogeneity analysis of individual cryo-EM particles that contain high-resolution information should be developed and validated, enabling the visualization of water networks via the ensemble of water placements revealed. Second, although the simulation of water dynamics cannot yet accurately model experimental RNA and solvent structural fluctuations fully, the observed agreement warrants further investigation into the utility of molecular dynamics in interpreting and predicting water structure around RNA, including various catalytic states of the Tetrahymena ribozyme. In the future, cryo-EM data could be used to evaluate and compare simulation methods, including quantum mechanical simulations and alternative force fields, with particular focus on different parameterizations of water and ions55,56,57,58.

Methods

Cryo-EM sample preparation and data collection

Tetrahymena ribozyme RNA (125 kDa, the linear L-21 ScaI ribozyme spanning residues 22–409 without the P1 substrate or ωG) was prepared as previously described29. The ribozyme sample (approximately 25 μM, approximately 3 mg ml−1) in 50 mM Na-HEPES pH 8.0 was denatured at 90 °C for 3 min and cooled to room temperature for 10 min. MgCl2 was then added to a final concentration of 10 mM, and the samples were incubated at 50 °C for 30 min. The samples were cooled again to room temperature for 10 min. Three microlitre volumes were applied onto glow-discharged 200-mesh R2/1 Quantifoil copper grids. The grids were blotted for 4 s and rapidly cryocooled in liquid ethane using a Vitrobot Mark IV (Thermo Fisher Scientific) at 4 °C and approximately 100% humidity. The grids were screened using a Talos Arctica cryo-electron microscope (Thermo Fisher Scientific) operated at 200 kV. The grids were imaged in a Titan Krios G3i cryo-electron microscope (Thermo Fisher Scientific) operated at 300 kV at a magnification of ×105,000 (corresponding to a calibrated sampling of 0.82 Å per pixel). Micrographs were recorded by EPU software (Thermo Fisher Scientific, v2.7) using ‘Faster Acquisition’ with a throughput of approximately 420 movie stacks per hour with a Gatan K3 Summit direct electron detector, where each image was composed of 30 individual frames with an exposure time of 2.5 s and a dose rate of 22.9 e s−1 Å2. Finally, a total of 18,365 movie stacks was collected with a defocus range of −0.5 to −2.0 μm.

Image processing

All micrographs were motion-corrected using MotionCor2 (ref. 60), and the contrast transfer function (CTF) was determined using CTFFIND4 (ref. 61). All particles were autopicked using the NeuralNet option in EMAN2 (ref. 62) and further checked manually. The resulting number of boxed particles was 3,804,753. Then, particle coordinates were imported to Relion63, where three rounds of 2D classification were performed to remove 2D class averages with less resolved features. The selected 1,823,256 particles were imported to cryoSPARC64 for generating ab initio maps, and a good map with clear RNA features was derived. Then, starting with this map, non-uniform refinement65 together with local and global CTF refinement was performed, yielding a map with 2.2 Å resolution from 1,181,331 particles. Further 3D variability analysis66 was performed to classify slightly different conformations, and two conformations were obtained. Final maps were achieved after another round of non-uniform refinement65 for each of the two classes, 708,006 and 473,325 particles, respectively, with resolutions at 2.2 Å and 2.3 Å, respectively. The cited resolutions for the final maps were estimated by the 0.143 criterion of the FSC curve in cryoSPARC. The local resolution map was calculated using cryoSPARC local resolution estimation using default values. See more information in Extended Data Table 1 and Extended Data Fig. 1.

Map sharpening

The reconstructed maps for both 2.2 Å and 2.3 Å were further processed with phenix.auto_sharpen67. Half-maps were also sharpened for modelling waters and ions. In phenix.auto_sharpen, the amount of sharpening was reflected by the b_sharpen parameter. For the 2.2 Å map, the b_sharpen applied was 79.23, giving a final b_iso of 20.00 (initial 99.23), and for the 2.3 Å map, b_sharpen was 74.07, giving a b_iso of 30.45 (initial 94.07). The b_iso indicates how quickly amplitudes fall off with increasing frequencies in Fourier space. In theory, the higher b_iso of the 2.3 Å map could be attributed to (1) lower signal-to-noise ratio (SNR) due to fewer particles, and/or (2) more varied particle conformations with larger atomic displacements than in the 2.2 Å map.

Model building

The atomic model of full-length apo Tetrahymena ribozyme (PDB ID: 7ez0)29 was first rigidly fitted into 2.2 Å and 2.3 Å sharpened maps, respectively. The resultant models were refined into the sharpened maps using phenix.real_space_refine with secondary structure and geometry restraints, using the default parameters and five cycles68. The models were then further refined into the sharpened maps with ISOLDE in ChimeraX (v1.6.1)68,69,70, which improved model geometry and clashscore. SWIM was then applied to model ions and water molecules (see below). The quality of the final models was evaluated by MolProbity71 and Q score31. Statistics of the map reconstruction and model optimization are summarized in Extended Data Table 1. All figures were made using Chimera72 and ChimeraX70.

SWIM

The SWIM procedure, which is available on GitHub (https://github.com/gregdp/segger (v2.9.7); and scripts (https://github.com/DasLab/Water-CryoEM-ribozyme)), identified peaks in density and modelled these peaks as water, Mg2+ ion or no atom. Previously, SWIM was run on a 1.34 Å map of apoferritin, using a 2σ density threshold and with no Q score or half-map threshold17. In this study, the density threshold had been increased to 5σ and minimum Q score criteria were imposed, including Q scores in full and half-maps and Q scores of bound nucleotides in the full maps for increased stringency. Specifically, SWIM was applied to the 2.2 Å and 2.3 Å cryo-EM sharpened maps, using sharpened half-maps within the procedure, and the refined models with an additional guanine placed in the guanine-binding site, which was subsequently removed, as follows:

  1. (1)

    The average (avgD) and standard deviation (σ) of all density values in the full map were calculated.

  2. (2)

    The full map was segmented using Segger, with no grouping steps, at a threshold of 3σ above avgD. Each segment was then considered in order of decreasing volume.

  3. (3)

    For each segment, the point within the segment that had the highest interpolated density value (Pmax) was identified, using the Fitmap.locate_maximum function in Chimera.

  4. (4)

    The following rules were applied to determine whether to ignore the segment, or model a water molecule or an ion at Pmax:

    1. (i)

      The Q score at Pmax was calculated in the full map and the two half-maps; half-maps have previously been used to model water73. Only segments with a Q score at Pmax above the chosen value Q_peak_min of 0.7 in all three maps were further considered. Choosing a higher Q_peak_min reflects a better-resolved water or ion.

    2. (ii)

      If the atom nearest to Pmax was in a poorly resolved nucleotide (Q< 0.6), no water or ion was modelled. We refer to this parameter as Q_res_min.

    3. (iii)

      The density value at Pmax was calculated by tri-linear interpolation. If the density value was less than 5σ above avgD, no water or ion was modelled.

    4. (iv)

      If there were non-polar atoms (carbon atoms or phosphorus) that would clash with Pmax (within 3.2 Å), no water was modelled. Likewise, for ions, if there was a carbon atom that would clash with Pmax (within 3.0 Å), no ion was placed.

    5. (v)

      If, as expected for a cation, electronegative atoms neighboured Pmax but no electropositive atoms that would repel a cation neighboured Pmax, a Mg2+ ion was modelled. Electronegative atoms were all oxygen atoms and non-protonated nitrogen atoms within 1.8–2.5 Å of Pmax. Electropositive atoms were protonated nitrogen atoms within 1.8–3.4 Å of Pmax.

    6. (vi)

      Otherwise, if there were RNA atoms that could hydrogen bond with Pmax and/or ions close to Pmax, a water molecule was modelled. Specifically, any nitrogen or oxygen atoms 2.5–3.4 Å away from Pmax or an Mg2+ ion 1.8–2.5 Å away from Pmax.

    7. (vii)

      When a new water was modelled, if it was within minWaterD (2.5 Å) of an already modelled water molecule, it was ignored because it was too close. This water may be an ‘alternate conformer’ of the same water, which could be modelled in partial occupancy, but that was not done for this study. This was also done for Mg2+ ions, using minIonD (4.5 Å).

  5. (5)

    The method was repeated until no more water or Mg2+ ions were modelled; in total three iterations were performed.

Only Mg2+ ions bound directly to the RNA and water were modelled. It is conceivable that our models may have missed or mis-assigned monovalent ions, such as Na+ and Cl, as well as Mg2+ ions interacting with RNA through water (see discussion in Extended Data Figs. 6k–m and 9e,f).

Consensus analysis

We referred to high-confidence waters if they are found in both the 2.2 Å and the 2.3 Å models; there is consensus. Consensus was defined by passing two criteria concurrently: superimposable and the same binding site, ensuring we captured waters and Mg2+ ions that are spatially superimposable and have the same local chemical environment. Superimposable waters were within 1 Å of one another after the alignment of local RNA atoms. Local RNA atoms were all heavy atoms in nucleotides that contained an atom that was 10 Å or closer to either water. The same binding site was defined as follows. For each water in the pair of water being compared, the set of ‘close RNA atoms’, defined as RNA atoms 2.5–3.2 Å away from the water, was obtained, expanding by 0.3 Å if no RNA atoms were within 3.2 Å. If the same close RNA atoms were found within an expanded distance 2.5–3.5 Å of the other water and vice versa, then the pair of waters were said to have the same binding site. The distance was expanded to account for distance uncertainties. If there were a pair of waters, for example, one in the 2.2 Å model and another in the 2.3 Å model, that passed both criteria, the waters were considered consensus waters, otherwise, they were non-consensus waters. The same criteria were used for Mg2+ ions with a close binder distance of 1.8–2.2 Å and an expanded distance of 1.8–2.5 Å. Non-consensus waters and Mg2+ ions were modelled by SWIM but do not meet the very strict requirement of being superimposable and binding to the same RNA sites in the 2.2 Å and 2.3 Å maps.

Separate models were deposited in the PDB with consensus water and ions (9CBU and 9CBW for the 2.2 Å and 2.3 Å models, respectively) and a different model with all automated water and ions (9CBX and 9CBY for the 2.2 Å and 2.3 Å models, respectively). For confident water and ion placement only, the first model should be used. The B-factors reported are calculated according to the formula B = 150(1 − Q) (ref. 18).

These consensus criteria were used identically to assess overlap with previous X-ray and cryo-EM models (PDB IDs: 7EZ0, 7EZ2, 7R6L, 7XD5, 7XD6, 7XD7, 7YG8, 7YG9, 7YGA, 7YGB, 7YGB, 8I7N, 1GID, 1HR2 and 1X8W). All previous structures were compared when manually inspected for sequence alignment. For comparison to multiple models (Extended Data Fig. 5), waters and Mg2+ ions from all models were combined, including those that overlap. For the ‘2.2 Å and 2.3 Å’ category, only the consensus waters and Mg2+ ions were used.

Analysis of models

For all analysis and averaging, hydrogens were ignored, only heavy atoms were considered. Map values were calculated using tri-linear interpolation. Map resolvability was quantified by a map-to-model measure of atomic resolvability, the Q score, which is independent of the contour level31. All Q score calculations, including those with half-maps, were calculated using sharpened maps and default parameters in the command line tool (https://github.com/gregdp/mapq). Per-residue Q scores were calculated by averaging across all heavy atoms in each residue without mass-weighting as is the default in Q score. The expected Q score was calculated using the previously published linear relationship31. The RMSD between the two models was calculated after the superposition of all RNA-heavy atoms. Distances to RNA atoms were calculated with custom scripts.

Sampling of positions in the solvent shell

For assessment of the strictness of the SWIM criteria, positions in the solvent shell of the RNA were sampled and each position was assessed by the SWIM criteria. For sampling the positions, a cube containing the RNA with a point every 1.67 Å, only keeping points that were between 1.5–3.5 Å from an RNA heavy atom in a well-resolved nucleotide (Q > 0.6).

For assessing how well SWIM-modelled waters fit in the alternative map, the map they were not modelled in, the waters were placed in the other map after locally aligning the RNA (RNA nucleotides within 10 Å). The SWIM criteria were assessed for these positions as above.

Molecular dynamics

Details and input files are available on GitHub (https://github.com/DasLab/Water-CryoEM-ribozyme) and simulation files can be found in the Stanford Digital Repository (https://doi.org/10.25740/sw275qs6749). Atomic RNA coordinates were taken from the 2.2 Å model. To ensure RNA-folding stability during the simulation, 20 Mg2+ ions bound to specific RNA sites were pre-positioned in the initial conditions of the simulation; these were modelled by an earlier version of SWIM in both the 2.2 and the 2.3 Å maps. During the simulation, some of these 20 ions moved, whereas some stayed in their placed positions throughout the simulation. As Mg2+ ion placement by molecular dynamics is necessary but biased (that is, not randomly sampling Mg2+ ion-binding sites), the comparison between molecular dynamics and cryo-EM Mg2+ ion-binding sites was not analysed in detail (Supplementary Data, file 3). Seventy-five additional Mg2+ and 196 Na+ were added to neutralize the system. For six simulations of each force field, these additional ions were added randomly, whereas for four simulations, they were added using the Coulombic potential-guided placement method ‘addIons’ in LEaP74. Three force fields were used, DESRES75, parmBSC0χOL3 using mMg and parmBSC0χOL3 using nMg58,76,77,78,79. mMg and nMg were parameterized using an in-house parameter file and placed in a TIP4P-D octahedral box. For each independent simulation, the system was minimized with 500 steps of steepest descent followed by 500 steps of conjugate gradient descent three times. Harmonic restraints of 25, 20 and 15 kcal mol−1 Å−2 were used on the RNA for first, second and third minimization, respectively. The system was then heated from 0 K to 100 K over 12.5 ps, harmonically restraining the RNA with a restraint of 15 kcal mol−1 Å−2. The system was further heated with the same restraints from 100 K to 310 K over 125 ps. All simulations were run on a single graphical processing unit (GPU) using the Amber20 Compute Unified Device Architecture (CUDA) version of particle-mesh Ewald molecular dynamics (PMEMD)80. The system was equilibrated with harmonic restraints on RNA atoms for 30 ns. The restraint strength started at 15 kcal mol−1 Å−2, was reduced by 2 kcal mol−1 Å−2 every 1 ns for the first 5 ns, then by 1 kcal mol−1 Å−2 every 1 ns for the next 5 ns and finally it was reduced by 0.1 kcal mol−1 Å−2 every 2 ns for the final 20 ns. Production simulations were performed (without restraints) at 310 K and 1 bar using the NPT ensemble, a Berendsen thermostat and a barostat. Every 200 ps, snapshots were saved. All simulations were run for 400 ns. These simulations used a 2-fs step, and bond lengths to hydrogen atoms were constrained using SHAKE. The cut-off for non-bonded interactions was at 9 Å.

Molecular dynamics analysis

Scripts for all analyses are available on GitHub (https://github.com/DasLab/Water-CryoEM-ribozyme). The simulation was autoimaged, and 1-ns frames were aligned using all RNA heavy atoms using CPPTRAJ. Analysis was conducted on snapshots sampled every 1 ns; to reduce file size, water was cut-off at 3.5 Å. The simulations were further analysed using custom scripts with MDAnalysis81,82 including RMSF and distance calculations. Throughout calculations, OP1 and OP2 were considered OP, and hydrogens were ignored.

For assessing the stability of the simulations generally, the RMSD from the 2.2 Å and 2.3 Å model was calculated from all RNA atoms after aligning to the ribozyme core defined as residues 31–42, 46–56, 96–102, 107–112, 116–121, 200–205, 208–214, 262–268, 272–278, 307-315-316, 318–331 and 405–406. For assessing the stability of the secondary structure, PDB-formatted coordinates of each frame were extracted using MDAnalysis, and information regarding base pairing was obtained from Rosetta rna_motif. Watson–Crick–Franklin base pairs from the model were analysed and the percentage frames in which they were maintained were reported using an in-house script. Ribodraw83 was used to draw the secondary structure diagram of the starting coordinates. The interactions were manually coloured according to the percentage of frames in which the interaction was present.

RMSF and B-factor of RNA nucleotides were calculated per simulation by aligning to an average structure based on all RNA heavy atoms using MDAnalysis. The per nucleotide calculations were non-mass-weighted averages to match the Q score, which does not weight the average Q score by mass. Molecular dynamics-simulated maps were calculated using ChimeraX molmap with a resolution of 2.2 Å of all frames for each simulation independently. Q scores were also calculated in these molecular dynamics-simulated maps and the average structures, as used in RMSF calculation, for each simulation to obtain ‘simQ’.

Molecular dynamics binding sites were defined using the densities generated from all simulations (see below). Peaks in the molecular dynamics densities for water, Na+ and Mg2+ ions were identified with Segger. These were filtered to exclude peaks closer than 1 Å to another peak, choosing the peak with the highest density value. Every frame of the 30 simulations was analysed to identify waters and ions, whether they were in a binding site, and what their positions was in each binding site. First, for each nucleotide, a neighbourhood was defined as the set of nucleotides within 10 Å. Second, each frame of the simulation was locally aligned to each of these neighbourhoods, saving the water and ion positions in this locally aligned frame. The set of RNA atoms each water and ion was bound to in each frame was also saved. Third, for each frame, all neighbourhoods were combined. For waters present in more than one neighbourhood, the coordinates were selected from the neighbourhood where the water or ion was closest to the neighbourhood centre. Fourth, each water was labelled with a molecular dynamics binding peak if it was within 2 Å of the molecular dynamics peak coordinate and was bound to the same RNA atoms.

A molecular dynamics binding site was said to overlap with a cryo-EM SWIM-identified water or Mg2+ ion if they bound the same RNA atoms and were within 2 Å of the molecular dynamics peak position.

Using the list of molecular dynamics binding sites, water occupancy for each simulation was defined as the fraction of frames in which water was bound in that site, as defined above, after removing frames in which the RNA deviated more than 3.4 Å from either cryo-EM structure. Water RMSF of each molecular dynamics binding site was calculated by taking the aligned coordinates of all waters in that binding site and calculating the RMSF of that set of coordinates.

The summary statistics for RNA atom binders (Extended Data Fig. 7) were calculated using MDAnalysis to calculate the distance between all RNA heavy atoms and the ions or waters. Distance cut-offs of 1.8–2.5 Å and 2.5–3.5 Å were used for ions and waters, respectively. Only the binding sites that had a water or Mg2+ ion that was present for at least 10 frames (10 ns) were counted. The MgRNA values were counted (https://csgid.org/metalnas/), only counting inner-shell coordination.

For Mg2+ ions, the residence time was defined as the time a single Mg2+ ion was found at that binding site, allowing for 1 frame skip.

Density comparison

To obtain the solvent density of the molecular dynamics simulation, first, local regions of interest were isolated by obtaining the list of RNA nucleotides that were within 10 Å of each modelled water or ion in the 2.2 Å and 2.3 Å map. For each set of nucleotides, the simulation frames were aligned on the heavy atoms of that set of nucleotides, and then the RNA, water, Mg2+ ion and Na+ ion density were calculated separately using DensityAnalysis from MDAnalysis. The densities were averaged across the 30 simulations for each local region. Then, to combine all regions into a composite map, at each voxel, the density value for all submaps whose centre was within 10 Å were averaged, weighted by the inverse of the distance from the centre of the submap to the voxel whose density was being calculated.

To obtain the average density around each nucleotide type (A, C, G or U), the transformation to align all ribozyme nucleobases to their respective reference nucleobase was calculated. Then, for each transform, the 2.2 Å resolution map (zoned > 1.8 Å from RNA heavy atoms) and the molecular dynamics water density (averaged as described above) were transformed, and transforms across each nucleotide type (A, C, G or U) were averaged evenly.

To assess overall agreement of the densities studied, the 2.2 Å density in the region 1.8–3.5 Å from all well-resolved RNA atoms (Q > 0.6) was used as reference. The comparison maps were (1) the ‘independent map’, the 2.3 Å map after transforming according to the alignment of the well-resolved RNA atoms (Q > 0.6) of the 2.2 Å and 2.3 Å models, (2) the 2.2 Å model molmap calculated in ChimeraX using 2.2 Å as the resolution; the water density was added to the Mg2+ density weighted by 1.5, (3) the molecular dynamics density with water, Mg2+ and Na+ density summed with weight of 1.0, 1.5 and 1.3, respectively, densities are described above, and (4) the ‘random map’, the 2.2 Å map with all density values in the solvent shell randomly shuffled. The density was normalized by Z-score, and then the voxels in this region were isolated. The cross-correlation coefficient and mutual information were calculated as defined by Vasishtan and Topf84, using 20 density bins for mutual information.

For the global comparison of solvent density, the modelling of water and ions was cast as a classification task. Therefore, the prediction task becomes predicting when the density is above a certain threshold. For this study, the 2.2 Å map was used as the reference map. The region 1.8.–3.5 Å from all well-resolved RNA atoms (Q > 0.6) was taken as the data, and all voxels above 3σ were defined as ‘positives’ and all other voxels as ‘negatives’. The comparison maps were as above. The threshold of these comparison maps was varied to assess the classification accuracy. The precision (fraction of voxels above the current threshold for the comparison map that were correctly classified as positives), recall (fraction of all positives that were correctly classified above the current threshold for the comparison map) and false-positive rate (fraction of all negatives that were falsely classified as above the current threshold for the comparison map) at various thresholds were calculated. From these values, the precision-recall curve (PRC) and receiver operating characteristic (ROC) were plotted, and the area under the curve (AUC) was calculated. Owing to the imbalance of classes (most of the map is empty), the AU-PRC is the preferred measurement. Finally, the maximal Matthews correlation coefficient across thresholds was calculated.

For local accuracy measurements, the density in the 2.2 Å map was around 1.8.–3.5 Å, and individual nucleotides were used as the data using a 3σ threshold for classification. Owing to the variance in experimental uncertainty in different regions of the RNA, the AU-PRC was normalized to compare the accuracy of the molecular dynamics predictions across regions. A minimum–maximum normalization with the AU-PRC of the 2.3 Å map, AU-PRCexp, was the ceiling (1), and the AU-PRC of the randomly shuffled densities, AU-PRCrandom, was the floor (0). Any nucleotide with high experimental uncertainty (AU-PRCexp < 0.2) was assigned a score of 0.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.