Main

The importance of non-coding RNAs (ncRNAs) across biology is increasingly recognized, but only a small number have been functionally characterized, with studies revealing sophisticated catalytic and sensory functions in some cases5,6,7,8,9. Bacteria, archaea and their viruses are thought to possess numerous diverse and complex ncRNAs, but most of these have not been thoroughly studied1,2,3,4. Furthermore, there is a conspicuous shortage of data on the 3D structures of RNA molecules. Out of more than 4,000 RNA classes in the RNA Families (RFAM) database 15.0, only 143 have experimentally resolved tertiary structures10. For many of the remaining cases, it appears likely that structural characterization will depend on reconstitution of the RNA with small molecule, protein or nucleic acid partners, which are unknown in most cases.

The Breaker laboratory and collaborators have previously described three classes of bacterial and phage RNAs for which covariance analysis of genomic and metagenomic sequences revealed secondary structures that were so extensive and elaborate that ‘ornate’, ‘giant’ or ‘large’ were included in their names: GOLLD RNA3, ROOL RNA1,11 (concomitantly reported in ref. 12) and OLE RNA2. The functions of these three classes of large RNAs remain poorly understood.

Here, using cryo-EM, we show that OLE, ROOL and GOLLD all form atomically ordered 3D structures. Unexpectedly, the three structures are stabilized not by proteins but by other copies of the same RNA molecule in ornate quaternary assemblies with many intermolecular bridges, a phenomenon that has not previously been observed for natural RNA molecules13.

OLE forms an RNA-only dimer

OLE is a class of large RNAs with an ornate secondary structure that is conserved throughout evolution2. OLE is found mainly in extremophilic bacteria, and experimental characterization in Halalkalibacterium halodurans has demonstrated its involvement in integrating energy availability, metal ion homeostasis and drug treatment to mediate cellular adaptation, although the underlying molecular mechanisms remain unknown2,14,15,16. Cellular localization to the membrane, binding to at least six protein partners15,17,18,19,20,21,22 and evidence of alternative secondary structures17 suggested that OLE was unlikely to form a well-defined RNA-only 3D structure. However, our study showed that the 577-nucleotide (nt) OLE RNA from Clostridium acetobutylicum2,23 formed distinct, compact particles that were clearly visible in cryo-EM images (Fig. 1a). Furthermore, a 2.9 Å resolution 3D map of a dimeric OLE RNA could be reconstructed with two-fold imposed symmetry (Extended Data Fig. 1). A model of the each chain has been built for 308 nt in the OLE 5′ region, with Q-scores24 exceeding the expected score at this resolution (Fig. 1b, Extended Data Fig. 1e and Supplementary Video 1).

Fig. 1: Structure of OLE homodimer.
figure 1

a, Top, representative micrograph (6,752 micrographs total). Particles selected for reconstruction are circled in white. Bottom, 2D class averages. Scale bar, 50 nm. b, The cryo-EM reconstruction of the OLE dimer. The top left image depicts the separation of the two chains: one chain on the right and the other on the left. Each domain of both chains is coloured. To aid visualization, the flexible P9.3 domain (red) is displayed with the unsharpened map at 10σ contour. The proposed binding sites of previously described proteins (RpsU, OapC and OapA) are labelled. c, Secondary structure of OLE dimer. The domains are coloured as in b. df, The intermolecular bridge interactions B1 (d), B2 (e) and B3 (f), coloured by domain. In d, the domain colouring is darker for chain A to differentiate the chains. g, The kink-turn motif that may bind the OapC protein, identical for each monomer. The sharpened cryo-EM map is displayed at the following contours: 7σ (b), 15σ (d), 12σ (f) and 10σ (g).

Our OLE dimer map shows that it is organized as a series of parallel A-form helices, resembling a bundle of pipes. The exterior ends of these pipes from each chain are interconnected into a five-way junction, with a secondary structure that agrees with the previously proposed one for the observed domain with stems P3 to P9.3 (refs. 2,15) (Fig. 1c; hereafter, paired stems, hairpin loops and joining linkers are designated ‘P’, ‘L’ and ‘J’, respectively, following conventional RNA nomenclature). An unusual but highly conserved symmetric interaction comprised of four A–A base pairs between two chains (L4, Fig. 1d), intermolecular base pairing and stacking interactions connecting L5, L6 and L7 (Fig. 1e), and a kissing loop (L9.3, Fig. 1f) ‘weld’ the pipes together in the middle of the complex. We denote these intermolecular interactions ‘bridges’ B1–B3, as used in ribosome nomenclature25. An elaborative list of intramolecular motifs and intermolecular interactions is presented in Supplementary Tables 1 and 2, respectively. Beyond the 5′ region, other conserved parts of OLE were not resolved in the structure, suggesting flexibility.

Surprisingly, regions of OLE that were previously thought to adopt alternative structures upon protein binding are clearly resolved and solvent-accessible, suggesting that proteins may bind the OLE dimer in the pre-formed RNA conformation that we observed here. Our cryo-EM data show that these proteins are not required for the folding of the 5′ domain of OLE, and the RNA structure itself may have a crucial role in organizing these proteins. In particular, the protein OapC was previously hypothesized to bind a kink turn between J4a/5 and J5/6, and binding of OapC was thought to alter secondary structure, in particular increasing protection of J6/7 to in-line hydrolysis17. Our OLE dimer structure supports formation of a kink turn26,27 in J4a/5 at the base of P5, but this kink turn is formed with J6/7, not J5/6 (Fig. 1g). The previously observed protection of J6/7 may therefore be explained by direct binding to the protein, and not by a rearrangement of secondary structure. In addition, whereas the internal loop of the P6 stem is different from the previously proposed one, it exposes residues 163–165, which were proposed to bind the protein RpsU15. A163 is flipped out of the helix and docks into a pocket created by P5, P6, P7 and dimer interface. This OLE dimer pocket is reminiscent of the pocket RpsU occupies in the ribosome, supporting the previous hypothesis that OLE could sequester RpsU15.

ROOL assembles into an ordered nanocage

ROOL is a class of RNAs that is encoded in a wide variety of bacterial prophages and phages, often near tRNA islands1,11,12. The predicted secondary structure is highly complex with multiple pseudoknots, but no protein binding partners have been identified, leading to the hypothesis that ROOL may function as an RNA-only complex1. Although no function has been described for ROOL, it has been shown to be as abundant as 16S ribosomal RNA, but non-essential, in at least one strain of Ligilactobacillus salivarius12.

The 659-nt ROOL env-120, discovered in cow rumen1,28, produces visually clear, symmetric particles in cryo-EM micrographs (Fig. 2a and Extended Data Fig. 2). The 3.1 Å reconstructed map reveals a closed, hollow nanocage structure that comprises 8 chains with dihedral symmetry and a diameter of approximately 280 Å, larger than the maximal dimension of a bacterial ribosome (approximately 250 Å) (Fig. 2b and Supplementary Video 2). Each chain has a secondary structure that is consistent with the stems P1 to P19 proposed previously by covariation analysis12, including the pseudoknot P10 (Fig. 2c). Atomic models for each chain can be built with a good match to the map density as shown by the Q-scores24 (Extended Data Fig. 2e). Our model shows intramolecular tertiary interactions (Fig. 2d–i), which scaffold the flat monomer structure (Fig. 2f–i), including a set of non-canonical base pairs and stacking interactions that connect loops L3a and J6/7 (Fig. 2f), an A-minor interaction between L3c and P5 (Fig. 2g), an additional pseudoknot P13 (adjacent to P5 and P10, Fig. 2h), a complex set of non-canonical pairs between nucleotides that are already in stems P1, P2 and P3b (Fig. 2i), and other motifs (Supplementary Table 1).

Fig. 2: Atomically ordered structure of ROOL homo-octamer.
figure 2

a, Representative micrograph (top; 4,462 micrographs total) with 2D class averages (bottom). Scale bar, 50 nm. b, The 3.1 Å cryo-EM reconstruction of the ROOL complex with D4 symmetry. The map is coloured by 8 labelled chains. In the top view (left), the inner and second circle are labelled. c, Secondary structure of ROOL coloured by domain. Only one chain is shown, in full. Nucleotides involved in intermolecular interactions have been circled in light or dark grey. d,e, Chain 1 is coloured by domain with all other chains in grey. d, Cutaway view, showing chain 1 from the interior of the nanocage with the disordered linker labelled in pink (nucleotides 414–464). e, Intermolecular interactions or bridges of chain 1 are labelled, with kissing loops labelled in magenta, A-minor interactions in cyan and other interactions in lime. One interaction, B7 (grey), is not ordered in this cryo-EM map, but residues come in sufficiently close contact that interactions could form. The same interactions, but between different pairs of chains, share the same number. fi, Selected intramolecular interactions as labelled in d,e. jo, Intermolecular interactions as labelled in e. The sharpened cryo-EM map is displayed at the following contours: 6σ (b,d,e), 8σ (o), 16σ (g,in) and 20σ (f,h).

The ROOL quaternary complex is an octameric nanocage, with a top and bottom half-shells, each formed by 4 chains, hereafter labelled chains 1–4 and 1′−4′. Within a half-shell, each chain forms 8 bridges with its neighbours, 4 on each side, labelled B1–B4 (Fig. 2j–o). Starting from the top, the loop of stem P7 forms an isolated base pair with a bulged out base in stem 7 of the next chain (B3, Fig. 2l). This ‘daisy chain’ of interacting stem-loops forms an inner circle on the top of the half-shell approximately 36 Å in diameter (Fig. 2b). The P7 stem is not always conserved, but a second circle of RNA (Fig. 2b), involving a quaternary kissing loop (B4, Fig. 2m), is highly conserved in evolution, and was identified as tertiary interaction by previous covariation analysis1. An A-minor interaction (B2, Fig. 2k) and a novel quaternary kissing loop (B1, Fig. 2j) further glue together the chains in the half-shell. Between a novel intramolecular tertiary interaction (Fig. 2g) and the intermolecular kissing loop B1 (Fig. 2j), we identified a disordered region that appears to be located inside the nanocage, based on the position of flanking regions (Fig. 2d and Extended Data Fig. 2d,e). This region was previously identified as a linker with little to no sequence or structural similarity across homologues1.

As opposed to a simple dimer such as the OLE interface, where each chain interacts with a single partner, in the ROOL complex, each chain reaches over and interacts with two chains in the other half-shell. These interactions favour the full cage assembly, as opposed to isolated dimers. B5 and B6 are quaternary interactions in which the same sequences from different chains interact via adenosine stacking and Watson–Crick–Franklin base pairing, respectively (Fig. 2n,o). An additional interaction between the internal loop J17/18 of chain 1, previously proposed to form a pseudoknot with the flank of the linker region, and P19 of chain 1′ seems plausible given their proximity, but that region was not well-resolved in our structure.

GOLLD assembles into a distinct nanocage

GOLLD RNAs are the largest among the three RNA classes analysed here, with many members exceeding 800 nucleotides in length3,11,29. GOLLD, similar to ROOL, is a molecule of unknown function encoded in bacterial prophages and phages, often near tRNA islands, but with sequences and secondary structures that are distinct from those of ROOL3,11,29. GOLLD expression has been shown to increase during the lysis of bacterial cells infected by phage11. Unlike ROOL, the predicted secondary structures of GOLLD RNAs consist of a universally conserved 3′ region and a less conserved 5′ region3.

The GOLLD env-38 RNA, first identified in a marine metagenomic sample downstream of Met-tRNA3,30, produces visually striking flower-like particles in cryo-EM micrographs (Fig. 3a and Extended Data Fig. 3). The 3D reconstruction at 3.0 Å resolution shows that GOLLD forms a nanocage, similar to the one formed by ROOL, but larger. The GOLLD structure is a closed 14-mer with D7 quaternary symmetry, with a diameter of 380 Å and a completely empty interior except for a disordered loop (Fig. 3b,c and Supplementary Video 3). Models for each of the 14 chains were built with Q-score24 in accordance with the map resolution (Extended Data Fig. 3g). As with ROOL, kissing loops and A-minor interactions underlie the tertiary and quaternary structure of GOLLD in addition to other motifs (Fig. 3d–t and Supplementary Tables 1 and 2) but the specific interactions are distinct. Beyond confirming the accuracy of the previously predicted secondary structure with stems P1–P27 (Fig. 3d), the tertiary structure of GOLLD reveals prominent interactions, including A-minor interactions involving adenosines at the P3–P4–P5 junction (Fig. 3g), an A-minor interaction between adenosines in L26 and stem P14 (Fig. 3h) and a loop L22 that forms a pseudoknot with the nearby linker J17/22, in addition to an A-minor interaction with that pseudoknot (Fig. 3i). Furthermore, loop L27 brings together seven regions by forming base pairs with stem P23 and linker J24/26 as well as base-backbone interactions with two additional stems, P18 and P22, and linker J17/18 (Fig. 3j). Similar to ROOL, the variable linker within each chain is not resolved, but the positions of immediate flanking sequences in the 5′ and 3′ regions indicate that the linker resides in the interior of the cage (Fig. 3f and Extended Data Fig. 3f,g). Globally, the cryo-EM structure shows that the 5′ region and the 3′ region form separate domains in the 3D structure (Fig. 3c). This separation could explain why the 3′ and 5′ domains are divergent in GOLLD, whereas, in ROOL, the 5′ and 3′ regions are intertwined and hence have to co-evolve to maintain the tertiary and quaternary structure.

Fig. 3: Atomically ordered structure of GOLLD homo-14-mer.
figure 3

a, Representative micrograph (23,281 micrographs total) and 2D class averages. Scale bar, 50 nm. b, The 3.0 Å cryo-EM reconstruction of GOLLD with D7 symmetry, coloured by chain. In the top view (left), the inner circle is labelled. c, The 5′ (blue, residues 1–420) and 3′ (red, residues 421–833) regions of GOLLD organize into the cap and a ring of the half-shell of the nanocage, respectively. To demonstrate the separation of domains, the four regions are artificially moved apart. d, The secondary structure of GOLLD. Only one chain is displayed in full, nucleotides participating in intermolecular interactions that are from other chains are circled in light or dark grey. e, One chain of GOLLD is depicted in rainbow, intermolecular interactions or bridges are labelled, with kissing loops labelled in magenta, A-minor interactions in cyan and other interactions in lime. The same interaction, but between different pairs of chains, share the same number. Each chain interacts with four other chains. f, Same as e, but rotated and is cut away to show the rainbow-labelled chain from the interior of the nanocage with the disordered linker labelled in pink (nucleotides 497–538). gj, Select intramolecular interactions. kt, Intermolecular interactions as labelled in e. The sharpened cryo-EM map is displayed at the following contours: 12σ (e,f,i), 14σ (c,o), 16σ (b,n), 18σ (s,q), 20σ (km,p), 22σ (t), 25σ (h,i) and 30σ (g,j).

The 5′ domains of GOLLD form the cap of each half-shell of the nanocage. Within the cap of each half-shell, each of 7 monomers forms 8 quaternary bridges to other chains—4 on each side, including kissing loops, A-minor interactions and other interactions (B2–B5, Fig. 3e,l–o). B2 (Fig. 3l) closely resembles the daisy chain of interacting stem-loops from ROOL, except that the distance between the pairs of interacting residues is reduced from 9 nt to 4 nt. This compensates for the increased number of chains in GOLLD, resulting in an inner circle of roughly the same diameter as of ROOL. In GOLLD, the only non-interacting loop with a conserved sequence, L11a (sequence GAAA), points towards this inner circle. The 3′ regions complete the half-shell below this 5′ cap through two interactions: an A-minor interaction (B6, Fig. 3p) and a kissing loop between L20 and J13/15, which was previously identified by covariation analysis3 and here shown to be an intermolecular bridge (B7, Fig. 3q). Only a single intermolecular A-minor interaction, B1, glues the 3′ and 5′ regions from different chains together (Fig. 3k).

Finally, similar to the ROOL nanocage, the two half-shells come together with each chain in the top half-shell interacting with two chains in the bottom half-shell. In the GOLLD nanocage, these interactions consist of two self-interacting kissing-loop interactions (B8 and B9, Fig. 3r,s) and an A-minor interaction (B10, Fig. 3t) involving 3′ regions from different chains.

Biological relevance of homo-multimers

Symmetric multimers are common among proteins and rationally designed RNA molecules26,31,32,33, but observations of natural RNA multimers are rare. When observed, natural RNA homomeric interactions typically involve a single contact13. Further, with notable exceptions of viruses, such as HIV and other retroviruses34 and the Φ29 bacteriophage35,36,37, the biological relevance of RNA homomeric complexes has not been demonstrated, leaving the possibility that they form only at high RNA concentrations and extreme ionic conditions or in the context of the specific constructs chosen for in vitro structural characterization. By contrast, several lines of evidence support GOLLD, ROOL and OLE forming multimers in their biological contexts.

First, concomitant with the same set of cryo-EM studies presented above, we resolved a 2.9 Å resolution map of another large RNA molecule, the raiA motif from C. acetobutylicum23,38, as a pure monomer (Extended Data Figs. 4 and 5 and Supplementary Text 1), refuting the possibility that any large RNA would form a multimer in our experimental conditions. Independent studies have also resolved the raiA motif as a monomer39,40. Additionally, we characterized the 343-nt HNH endonuclease-associated RNA and open reading frame (HEARO)39 from Limnospira maxima, which is known to form a defined RNA structure that is involved in DNA nickase activity when bound to the protein IsrB41. Unlike the 5′ region of OLE, which is also known to bind proteins, the HEARO RNA was disordered in the absence of the protein (Extended Data Fig. 6), suggesting that multimer formation of protein-binding RNAs is not an artefact of cryo-EM experimental conditions.

Second, mass photometry, which gives high precision estimates of molecular weight but requires molecular binding to surfaces, confirms the stoichiometry of GOLLD, ROOL and OLE to be 14, 8 and 2, respectively, at RNA concentrations as low as 12.5 nM (Extended Data Fig. 7). This concentration is three orders of magnitude lower than the concentrations in our cryo-EM experiments and corresponds to a population of only around ten RNA molecules in a bacterial cell, substantially lower than what is expected from observed expression levels1.

Third, using dynamic light scattering (DLS; Extended Data Fig. 7), we confirmed that both ROOL and GOLLD primarily form thermostable multimers, with no detectable fraction of monomers, at temperatures up to 55 °C and concentrations as low as 110 nM.

Fourth, for all three structures each chain contains five or more conserved inter-subunit contacts, indicative of intricate arrangements that suggest selection pressure during the evolution of these RNAs.

Fifth, using comparative analysis of both sequences and secondary structures, we detected evolutionary conservation of structural elements and, in particular, the sites of intermolecular interactions supporting RNA homo-oligomerization (Supplementary Files 13 and Supplementary Table 3). Comparative analysis of OLE, ROOL and GOLLD showed that, although the sequences of these RNAs are not highly conserved, all intramolecular stems exhibit extensive base pairing supported by covariation analysis, including stems whose loops are involved in intermolecular bridges (Extended Data Fig. 8a–c, Supplementary Text 2 and Supplementary Table 3). The A positions in the OLE non-canonical A–A base-pair stem bridge B1 and the GOLLD A-minor interaction bridge B6 are highly conserved (Figs. 1d and 3p and Extended Data Fig. 8d,e). The intermolecular base pairs between ROOL J6/8 and L8 (bridge B4, Fig. 2m) were detected as a prominent, conserved quaternary interaction in prior covariation analysis40 (Extended Data Fig. 8b and Supplementary Table 3). In other bridges, we observed intermolecular symmetric kissing loops that had base pairs between the same loop from two different chains: nucleotides 315–318 in chain A and B of OLE (B3) and nucleotides 656–657 from chain 1 and 7′ of GOLLD (B8). Apparent covariance at immediately adjacent nucleotides in these loop sequences supports intermolecular base pairs because base pairing of adjacent nucleotides within the same chain is stereochemically precluded (Extended Data Fig. 8f,g). OLE L9.3 and GOLLD L21a were each found to covary in this manner, switching an internal tetranucleotide between palindromes GGCC to GAUC or AGCU and an internal dinucleotide between GC and CG, supporting bridges OLE B3 and GOLLD B8, respectively (Extended Data Fig. 8f,g). The other symmetric kissing loops in our structures, GOLLD L21b (bridge B9) and ROOL L18 (bridge B6), were highly conserved across the variants for which the loops could be confidently aligned (Extended Data Fig. 8g,h), precluding similar covariance analysis but consistent with the importance of the observed intermolecular interactions.

Discussion

Together, our cryo-EM data, biophysical experiments and evolutionary analyses show that GOLLD, ROOL and OLE each form not only ornate secondary structures but also symmetric quaternary assemblies stabilized by many intermolecular contacts. While this Article was being revised, a publication appeared reporting similar cryo-EM structures, supporting the reproducibility of cryo-EM42. These structures and their complex network of RNA structure motifs offer a rich source of data for RNA structure prediction and design efforts. OLE forms a dimer shaped like a bundle of pipes and exposes structured binding pockets for protein partners such as the membrane-associated OLE-associated protein A (OapA). After superimposing an OapA dimer to each P4a site (OapA is known to bind OLE in a 2:1 ratio41,42,43), we note that the RNA could induce the formation of an OapA tetramer. OapA is a membrane protein, and the tetramer is reminiscent of the double-stranded RNA transporter SID-143,44,45, suggesting that it may be able to accommodate RNA elements, such as the 3′ region of OLE, which was not resolved here (Fig. 4a). In contrast to OLE, and despite unrelated sequences and distinct secondary and tertiary structures, GOLLD and ROOL both form nanocages, suggesting that their function might involve encapsulating their internal disordered linkers and/or other molecules, analogous to proteinaceous microcompartments that are common in bacteria and archaea46. Although not large enough to enclose entire DNA genomes of their parent phages, these cages might contain macromolecules of significant size (Fig. 4b,c), such as phage-encoded tRNAs, which are sometimes present in the GOLLD linker region, bacterial ribosomes, which have been shown to bind GOLLD in pull-down assays45, metabolites, or stress response proteins. It remains to be determined whether nanocage formation is a common feature among large natural RNAs.

Fig. 4: Structure-guided hypotheses for homo-oligomeric RNAs.
figure 4

a, OLE dimer displayed with AlphaFold 347 models of OapC, OapA dimer and RpsU proteins at their proposed binding sites. b,c, The two half-shells of the RNA nanocages are held together by only a few interactions and hence could open up to encapsulate other biomolecules, such as protein, metabolites, nucleic acids or RNA–protein complexes. b, The RNA nanocage formed by ROOL is shown encapsulating a ribosomal large subunit. c, The RNA nanocage formed by GOLLD is shown exposing the covalently linked tRNAs when open. The sharpened cryo-EM maps are displayed at the following contours 7σ (a), 6σ (b) and 12σ (c).

Methods

In vitro RNA synthesis

DNA templates containing the RNA sequence of interest prepended with the T7 promoter (see Supplementary Table 4 for sequences) were ordered as gBlocks from IDT. Primers designed to amplify these sequences (see Supplementary Table 4 for sequences) were also from IDT. PCR amplification was carried out with NEBNext Ultra II Q5 Master Mix (NEB M0544S) using 10 ng of template per reaction. The thermocycler settings were: 98 °C for 30 s, then 35 cycles of 98 °C for 10 s, 55 °C for 30 s, then 72 °C for 30 s, and a final step of 72 °C for 5 min. The PCR products were then column purified using the QIAquick PCR Purification Kit (Qiagen 28104) and run on a 2% E-Gel agarose gel (Thermo Scientific A42135) to check DNA quality. DNA concentration was measured using a NanoDrop. Purified DNA smaller than 515 bp were in vitro transcribed using TranscriptAid T7 High Yield Transcription Kit (Thermo Scientific K0441) with 6 μl of DNA template per reaction. Purified DNA longer than 515 bp were in vitro transcribed using MEGAscript T7 Transcription Kit (Thermo Scientific AM1334) with 6−8 μl DNA template per reaction. These in vitro transcription reactions were incubated at 37 °C for 6 h, then held at 4 °C before DNase treatment. The RNA was then purified using the RNA Clean & Concentrator-25 Kit (Zymo Research R1017) and eluted in 30 μl water. The concentration of purified RNA was measured using a NanoDrop, and the quality was checked using the Agilent 2100 Bioanalyzer (Nano RNA Assay, run by the Stanford PAN Facility; Bioanlyzer 2100 Expert B.02.11.SI824), as shown in Extended Data Fig. 7a.

RNA folding

For all subsequent experiments, RNA was re-folded using the same basic protocol. RNA concentrations used and any other modifications to this standard protocol are mentioned in each section. RNA was denatured (90 °C for 3 min, room temperature for 10 min) in 50 mM Na-HEPES pH 8.0. RNA was then folded with 10 mM MgCl2 at 50 °C for 20 min, and cooled to room temperature for at least 10 min before taking measurements.

Mass photometry

Mass photometry data were collected using the Refeyn TwoMP, using AcquireMP version 2024-R1.1 and DiscoverMP version 2024-R1 to obtain histogram data. For the OLE data, coated glass slides from the MP Sample Preparation Pack (MP-CON-21014) were used, for the ROOL and GOLLD data Mass Glass UC slides (MP-CON-41001) were used after coating with poly-l-lysine. The instrument was focused using droplet-dilution. Data were collected for 1 min using the large image size. The contrast data were calibrated to nucleotide length using the Millennium RNA Markers (Thermo Scientific AM7150). Gaussians were fitted by the automatic analysis in DiscoverMP. The resulting data and plotting code can be found in the accompanying GitHub repository.

Mass photometry data were not reliable for the raiA motif. raiA motif RNA was folded at 1 µM following the standard procedure above. On the stage two dilutions were attempted, 15 µl buffer:2 µl sample for final concentration of 118 nM and 18 µl buffer:2 µl 10x diluted sample for a final concentration of 10 nM. There is a known issue with nucleic acid samples, whereby there are noisy low-mass peaks48 (communication with the company Refeyn). These are not present in the buffer alone. For this reason, raiA motif (205 nt) is below the recommended minimal size for mass photometry, and indeed when we attempted to collect data on the raiA motif we observed noise peaks, containing the size of raiA motif monomer but smaller than any multimer, in both binding and unbinding regimes, indicating unreliable results (data not shown).

OLE was folded at 0.25 µM and was diluted to 12.5 nM on the stage. OLE folded in various buffers, all including 50 mM Na-HEPES pH 8.0, with other components added at the time when MgCl2 is added in the standard protocol: (1) nothing added; (2) 1 mM MgCl2; (3) 10 mM MgCl2, standard; (4) 100 mM MgCl2; (5) 10 mM MgCl2 and 1% ethanol; (6) 10 mM MgCl2 and 5% ethanol; (7) 0 mM MgCl2 and 200 mM KCl; (8) 10 mM MgCl2 and 200 mM KCl; (9) 0 mM MgCl2 and 200 mM NaCl; and (10) 10 mM MgCl2 and 200 mM NaCl. Buffers with MnCl2 were attempted but the manganese saturated the detector.

ROOL and GOLLD were folded at 1 µM. The samples were diluted 10× prior to taking data. On the stage the samples were further diluted (10 µl buffer:10 µl sample) for a final concentration of 50 nM.

DLS of RNA nanocages

RNA was folded at 30 ng µl−1 using the standard folding protocol. DLS traces were collected using the Prometheus Panta. Two replicates (2 capillaries of 10 µl volume, NanoTemper PR-C002) for each RNA were obtained. DLS data of 10× 5 s acquisitions per capillary with laser power 100% were obtained using PR.PantaControl v.1.8.0. The auto-correlation function was calculated and size distribution was fitted using default parameters in PR.PantaAnalysis v.1.8.0. The resulting size distribution tables and plotting code can be found in the accompanying GitHub repository.

Cryo-electron microscopy grid preparation

For all samples, the RNA was frozen using a VitroBot Mark IV, using no. 542 filter paper and Quantifoil 1.2/1.3 200 mesh copper grids which were glow discharged for 30 s at 15 mA. GOLLD was folded at 8 µM, using the standard folding conditions except, after the 50 °C incubation, the temperature was lowered to 37 °C at a rate of 0.1 °C s−1, held at 37 °C for 2 min, and then reduced to room temperature at a rate of 0.1 °C s−1. To increase concentration of GOLLD in the ice, 4 cycles of applying 2 µl of sample and blotting for 3 s were performed before plunging. ROOL was folded at 9.1 µM with the standard folding protocol. The grid was coated with 2 µl of 100 mM NaCl which was blotted for 3 s. Then, 2 µl sample was immediately applied to the grid and blotted for 3 s before plunging into liquid ethane. OLE and raiA motif RNA were frozen with the standard folding protocol at 20 µM and 15 µM respectively; 2 µl of sample was applied to the grid, followed by 3 s blot and plunge into liquid ethane.

Cryo-electron microscopy data collection

All datasets were collected on Titan Krios G3 microscopes using a 50 μm C2 aperture and 100 μm objective aperture and EPU software (v.3.5). The OLE dataset was collected using a Falcon 4 camera with a 10 eV slit on a Selectris energy filter, while the other datasets were collected using a K3 camera with a 20 eV slit on a Bio Quantum energy filter and EPU software. Additional information on dose, magnification, and data collected for each RNA can be found in Extended Data Table 1.

Cryo-electron microscopy data processing

Data were processed live using CryoSparc (v.4.5.3)49 and then further refined, including non-uniform refinement50. For OLE and raiA motif per particle motion correction was performed51. For all datasets, symmetry was not applied until final refinement stages. For OLE, C2 symmetry was applied. For ROOL and GOLLD, D4 and D7 symmetry were applied, respectively, followed by symmetry expansion of the particles and local refinement for one asymmetric unit. Finally, for GOLLD and ROOL subdomains of one asymmetric unit were locally refined and composite maps, and half-maps were created for one asymmetric unit and then composited to the full symmetry using phenix.combine_focused_maps (v.1.21)52. Local resolution was estimated using CryoSparc. See Extended Data Figs. 15 for more details on processing pipelines.

Modelling

Maps were sharpened using phenix.auto_sharpen with half-maps (v.1.21). Initial models for a monomer were obtained from ModelAngelo (Relion-5.0)53; because current versions of ModelAngelo cannot be run on a pure RNA structure, EMDB-17659 was added to the corner of the map, the corresponding protein sequence (Protein Data Bank (PDB): 8PHE) was provided, and protein residues were subsequently deleted from the model. The RNA modelled chains were manually combined tracing the RNA sequences, adding and mutating residues when necessary (in particular, C to U mutations were commonly required). Manual model correction and refinement was accomplished in Coot (version 0.9.8)54. Manual refinement of the monomer was performed using Isolde and Coot54. Symmetry was applied to the model, from henceforth refinement was done asymmetrically due to limitations in refinement programs. Intermolecular interactions were analysed by hand and corrected using Isolde55 and Coot54. DRRAFTER56 (Rosetta 3.10 (2020.42)) was used to fill in low resolution areas. For symmetric kissing loops, these models were selected and fit into the map and refined more symmetrically by hand using Isolde55. Final refinement was first run through phenix.real_space_refine57 followed by piecewise corrections using ERRASER258 (Rosetta 3.10 (2020.42)), followed by manual refinement in Coot54 and Isolde55 when necessary. A protocol was created to enable refinement on the large ROOL and GOLLD complexes. First, the monomer was refined in ~30 sections splitting the model and map prior to using ERRASER2. These were then stitched together and regions encompassing the stitch sites were further refined. Finally, problematic regions of the monomer were refined further. Symmetry was applied to the monomer and the interaction sites were refined in parallel until interactions were sufficiently realistic with only minor clashes. Throughout, split points were manually edited if they caused minimization errors or to include interaction residues. The following ERRASER2 command was used, repeating if not yet converged:

$ERRASER -s $PDB -edensity:mapfile $MAP -edensity::mapreso $RESOLUTION -score:weights stepwise/rna/rna_res_level_energy7beta.wts -set_weights elec_dens_fast 40 cart_bonded 5.0 linear_chainbreak 10.0 chainbreak 10.0 fa_rep 1.5 fa_intra_rep 0.5 rna_torsion 10 suiteness_bonus 5 rna_sugar_close 10 -rmsd_screen 3.0 -mute core.scoring.CartesianBondedEnergy core.scoring.electron_density.xray_scattering -rounds 3 -fasta $FASTA -cryoem_scatterers -rna:erraser:fixed_res $FIXED.

Validation metrics were calculated using Phenix, including phenix.rna_validate59,60,61. ChimeraX (version 1.8)62 was used to calculate Q-score61 and for all visuals.

Base pairing and base stacking were identified using Rosetta rna_motif63. Kink turns and ribose zippers were identified using DSSR with the “–k-turns” flag (v.1.9.9)64. Z-anchors were manually labelled by aligning every 5-nt range of each structure to a representative Z-anchor (4E8Q residues 108–111) and manually inspected each region that had root mean squared deviation (rmsd) < 4 Å. Secondary structure was drawn using RiboDraw with manual manipulation63. For visualizing a hypothetical OLE RNA–protein complex, AlphaFold 3 (server version)47 was used to predict: (1) a OapA dimer with a OapC monomer; and (2) RpsU using the sequences in (Supplementary Table 4). The OapA dimer was fitted into the proposed RNA site manually. The OapC was close to its presumed binding site, but clashed with RNA and therefore its position was manually adjusted. RpsU was also placed manually in its proposed binding site. C2 symmetry was then applied to visualize the full complex.

Bioinformatic analysis

Bacterial genomes were downloaded from National Center for Biotechnology Information (NCBI) Genome database in February 2024 (https://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/). GenBank records for phage genomes were downloaded in March 2024 (https://millardlab.org/bacteriophage-genomics/phage-genomes-march-2024/). Sequence profiles of GOLLD, ROOL and OLE were downloaded from the Rfam database (ftp.ebi.ac.uk/pub/databases/Rfam/) on March 2024. A custom sequence profile of raiA motif was built using the reported alignment63. To retrieve ncRNAs from genome sequences, cmsearch was conducted using sequence profiles with a cutoff value of 10−5 (Infernal 1.15)65. The overall procedure yielded the following numbers of nonredundant ncRNA sequences: 806, GOLLD; 1,596, ROOL; 8,585, OLE; 4,875, raiA motif.

The Infernal software65 (v.1.1.2) was used to compare candidate RNA structures against Rfam models (cmscan), build and calibrate new covariance models (cmbuild, cmcalibrate) for separate clusters of RNAs, and perform structure-informed homology searches (cmsearch). Comparative analysis and multiple alignments for isolated RNA candidates were conducted using cmalign65 and MUSCLE (v.5)66 with pairwise comparisons refined using the OWEN program67. Evolutionary history was inferred via the Maximum Likelihood method with different models in MEGA X68. Evolutionary analysis of compensatory substitutions in isolated clusters was performed by DecipherSSC69.

RNAalifold (from ViennaRNA 2.7.0)70 applied to computationally fold multiple RNA alignments, and Afold/Hybrid71,72 were used to predict locally folded secondary structures or hybrid duplex elements within clusters. Covariation analysis was performed with R-scape (v.1.2.3)73, which annotates multiple structural alignments of RNAs using statistically significant covariations (E-value < 0.05) as base-pairing constraints.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.