RNA secondary structure packages evaluated and improved by high-throughput experiments

Wayment-Steele, Hannah K.; Kladwang, Wipapat; Strom, Alexandra I.; Lee, Jeehyung; Treuille, Adrien; Becka, Alex; Das, Rhiju

doi:10.1038/s41592-022-01605-0

Article
Published: 03 October 2022

Hannah K. Wayment-Steele^1,2,
Wipapat Kladwang^2,3,
Alexandra I. Strom^3,4,
Jeehyung Lee^2,5,
Adrien Treuille^2,5,
Alex Becka ORCID: orcid.org/0000-0002-1552-4881^3,
Eterna Participants &
…
Rhiju Das ORCID: orcid.org/0000-0001-7497-0972^2,3,6

Nature Methods volume 19, pages 1234–1242 (2022)

Abstract

Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.

**Fig. 1: Community-science-designed RNA datasets from the Eterna ‘Cloud Lab’ experiments identify consistent discrepancies in ensemble calculations from secondary structure packages.**

**Fig. 2: Riboswitch affinity predictions reveal similar package ranking.**

**Fig. 3: Multitask training using EternaBench datasets results in improved thermodynamic prediction.**

**Fig. 4: EternaFold improved prediction extends across diverse natural RNA contexts and experiments.**

Data availability

All datasets used here for evaluation are available at https://www.github.com/eternagame/EternaBench. The original Cloud Lab datasets are available at the RNA Mapping Database²⁸ under accession IDs ETERNA_R00_0000 (round 00), ETERNA_R69_0000 (round 01), ETERNA_R70_0000 (round 02), ETERNA_R71_0000 (round 03), ETERNA_R72_0000 (round 04), ETERNA_R73_0000 (round 05), ETERNA_R74_0000 (round 06), ETERNA_R75_0000 (round 07), ETERNA_R76_0000 (round 08), ETERNA_R77_0002 (round 09), ETERNA_R78_0001 (round 10), ETERNA_R79_0001 (round 11), ETERNA_R80_0001 (round 12), ETERNA_R81_0001 (round 13), ETERNA_R82_0001 (round 14), ETERNA_R83_0003 (round 15), ETERNA_R84_0000 (round 16), ETERNA_R85_0000 (round 17), ETERNA_R86_0000 (round 18), ETERNA_R87_0001 (round 19), ETERNA_R89_0000 (round 20), ETERNA_R91_0000 (round 21), ETERNA_R92_0000 (round 22) and ETERNA_R94_0000 (round 23). A list of RMDB accession IDs or URLs corresponding to the data used for benchmarking SHAPE-guided folding is in Supplementary Table 12. Source data are provided with this paper.

Code availability

The datasets used here for evaluation, as well as scripts and Python notebooks for reproducing the filtered datasets and the chemical mapping and riboswitch affinity calculations described here, are available at https://www.github.com/eternagame/EternaBench. The code for training EternaFold is available at https://www.github.com/eternagame/EternaFold. A server to run EternaFold is available at https://eternafold.eternagame.org/. The EternaFold code is derived from the CONTRAfold-SE³⁶ codebase, which is derived from the CONTRAfold¹¹ codebase.

References

Amaral, P. P., Dinger, M. E., Mercer, T. R. & Mattick, J. S. The eukaryotic genome as an RNA machine. Science 319, 1787–1789 (2008).
Singh, V., Braddick, D. & Dhar, P. K. Exploring the potential of genome editing CRISPR-Cas9 technology. Gene 599, 1–18 (2017).
Jaffrey, S. R. RNA-based fluorescent biosensors for detecting metabolites in vitro and in living cells. Adv. Pharm. 82, 187–203 (2018).
Kramps, T. & Elbers, K. Introduction to RNA Vaccines. In: Kramps, T., Elbers, K. (eds) RNA Vaccines. Methods Mol. Biol. Vol. 1499, 1–11 (2017).
Zuker, M. & Stiegler, P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133–148 (1981).
Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).
Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinf. 11, 129 (2010).
Xia, T. et al. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37, 14719–14735 (1998).
Andronescu, M., Condon, A., Hoos, H. H., Mathews, D. H. & Murphy, K. P. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 23, i19–i28 (2007).
Do, C. B., Woods, D. A. & Batzoglou, S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22, e90–e98 (2006).
Sloma, M. F. & Mathews, D. H. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs. PLoS Comput. Biol. 13, e1005827 (2017).
Rezaur Rahman Chowdhury, F.A., Zhang, H. & Huang, L. Learning to fold RNAs in linear time. Preprint at bioRxiv, 852871 (2019).
Akiyama, M., Sato, K. & Sakakibara, Y. A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model. J. Bioinform Comput Biol. 16, 1840025 (2018).
Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).
Puton, T., Kozlowski, L. P., Rother, K. M. & Bujnicki, J. M. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res. 41, 4307–4323 (2013).
Wayment-Steele, H., Wu, M., Gotrik, M. & Das, R. Evaluating riboswitch optimality. Methods Enzymol. 623, 417–450 (2019).
Berens, C. & Suess, B. Riboswitch engineering–making the all-important second and third steps. Curr. Opin. Biotechnol. 31, 10–15 (2015).
Mauger, D. M. et al. mRNA structure regulates protein expression through changes in functional half-life. Proc. Natl Acad. Sci. USA 116, 24075–24083 (2019).
Watters, K. E. & Lucks, J. B. Mapping RNA structure in vitro with SHAPE chemistry and next-generation sequencing (SHAPE-Seq). Methods Mol. Biol. 1490, 135–162 (2016).
Wilkinson, K. A., Merino, E. J. & Weeks, K. M. Selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc. 1, 1610–1616 (2006).
Tian, S. & Das, R. RNA structure through multidimensional chemical mapping. Q. Rev. Biophys. 49, e7 (2016).
Denny, S. K. et al. High-throughput investigation of diverse junction elements in RNA tertiary folding. Cell 174, 377–390 e320 (2018).
Buenrostro, J. D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568 (2014).
Lee, J. et al. RNA design rules from a massive open laboratory. Proc. Natl Acad. Sci. USA 111, 2122–2127 (2014).
Delli Ponti, R., Marti, S., Armaos, A. & Tartaglia, G. G. A high-throughput approach to profile RNA structure. Nucleic Acids Res. 45, e35 (2017).
Eddy, S. R. Analysis of conserved RNA secondary structure in transcriptomes and genomes. Annu. Rev. Biophys. 43, 433–456 (2014).
Cordero, P., Lucks, J. B. & Das, R. An RNA mapping database for curating RNA structure mapping experiments. Bioinformatics 28, 3006–3008 (2012).
Wellington-Oguri, R. et al. Evidence of an unusual Poly(A) RNA signature detected by high-throughput chemical mapping. Biochemistry 59, 2041–2046 (2020).
Anderson-Lee, J. et al. Principles for predicting RNA secondary structure design difficulty. J. Mol. Biol. 428, 748–757 (2016).
Beisel, C. L. & Smolke, C. D. Design principles for riboswitch function. PLoS Comput. Biol. 5, e1000363 (2009).
Breaker, R. R. Prospects for riboswitch discovery and analysis. Mol. Cell 43, 867–879 (2011).
Andreasson, J. O. L. et al. Crowdsourced RNA design discovers diverse, reversible, efficient, self-contained molecular switches. Proc. Natl Acad. Sci. USA 119, e2112979119 (2022).
Wu, M. J., Andreasson, J. O. L., Kladwang, W., Greenleaf, W. & Das, R. Automated design of diverse stand-alone riboswitches. ACS Synth. Biol. 8, 1838–1846 (2019).
Andronescu, M., Condon, A., Hoos, H. H., Mathews, D. H. & Murphy, K. P. Computational approaches for RNA energy parameter estimation. RNA 16, 2304–2318 (2010).
Foo, C.-S. & Pop, C. Learning RNA secondary structure (only) from structure probing data. Preprint at bioRxiv, 152629 (2017).
Andronescu, M., Bereg, V., Hoos, H. H. & Condon, A. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinf. 9, 340 (2008).
Sloma, M. F. & Mathews, D. H. Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures. RNA 22, 1808–1818 (2016).
Watters, K. E. et al. Probing of RNA structures in a positive sense RNA virus reveals selection pressures for structural elements. Nucleic Acids Res. 46, 2573–2584 (2018).
Watts, J. M. et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716 (2009).
Kutchko, K. M. et al. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res. 46, 3657–3670 (2018).
Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
Dadonaite, B. et al. The structure of the influenza A virus genome. Nat. Microbiol 4, 1781–1789 (2019).
Simon, L. M. et al. In vivo analysis of influenza A mRNA secondary structures identifies critical regulatory motifs. Nucleic Acids Res. 47, 7003–7017 (2019).
Huber, R. G. et al. Structure mapping of dengue and Zika viruses reveals functional long-range interactions. Nat. Commun. 10, 1408 (2019).
Huston, N. C. et al. Comprehensive in vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms. Mol. Cell 81, 584–598 e585 (2021).
Manfredonia, I. et al. Genome-wide mapping of SARS-CoV-2 RNA structures identifies therapeutically-relevant elements. Nucleic Acids Res. 48, 12436–12452 (2020).
Sun, L. et al. In vivo structural characterization of the SARS-CoV-2 RNA genome identifies host proteins vulnerable to repurposed drugs. Cell 184, 1865–1883 e1820 (2021).
Lavender, C. A., Gorelick, R. J. & Weeks, K. M. Structure-based alignment and consensus secondary structures for three HIV-related RNA genomes. PLoS Comput. Biol. 11, e1004230 (2015).
Deigan, K. E., Li, T. W., Mathews, D. H. & Weeks, K. M. Accurate SHAPE-directed RNA structure determination. Proc. Natl Acad. Sci. USA 106, 97–102 (2009).
McGinnis, J. L. & Weeks, K. M. Ribosome RNA assembly intermediates visualized in living cells. Biochemistry 53, 3237–3247 (2014).
Leppek, K. et al. Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics. Nat. Commun. 13, 1536 (2022).
Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26, 322–330 (2019).
Becker, W. R. et al. Quantitative high-throughput tests of ubiquitous RNA secondary structure prediction algorithms via RNA/protein binding. Preprint at bioRxiv, 571588 (2019).
Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).
Morandi, E. et al. Genome-scale deconvolution of RNA structure ensembles. Nat. Methods 18, 249–252 (2021).
Hajdin, C. E. et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl Acad. Sci. USA 110, 5498–5503 (2013).
Zarringhalam, K., Meyer, M. M., Dotu, I., Chuang, J. H. & Clote, P. Integrating chemical footprinting data into RNA secondary structure prediction. PLoS ONE 7, e45160 (2012).
Sato, K., Akiyama, M. & Sakakibara, Y. RNA secondary structure prediction using deep learning with thermodynamic integration. Nat. Commun. 12, 941 (2021).
Chen, X., Li, Y., Umarov, R., Gao, X. & Song, L. RNA secondary structure prediction by learning unrolled algorithms. In Proceedings of the 8th International Conference on Learning Representations (2020).
Ward, M., Datta, A., Wise, M. & Mathews, D. H. Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best. Nucleic Acids Res. 45, 8541–8550 (2017).
Zhao, B. S., Roundtree, I. A. & He, C. Post-transcriptional gene regulation by mRNA modifications. Nat. Rev. Mol. Cell Biol. 18, 31–42 (2017).
Rinnenthal, J. et al. Mapping the landscape of RNA dynamics with NMR spectroscopy. Acc. Chem. Res. 44, 1292–1301 (2011).
Kappel, K. et al. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699–707 (2020).
McCaskill, J. S. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29, 1105–1119 (1990).
Washietl, S., Hofacker, I. L., Stadler, P. F. & Kellis, M. RNA folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction. Nucleic Acids Res. 40, 4261–4272 (2012).
Deng, F., Ledda, M., Vaziri, S. & Aviran, S. Data-directed RNA secondary structure prediction using probabilistic modeling. RNA 22, 1109–1119 (2016).
Cordero, P. & Das, R. Rich RNA structure landscapes revealed by mutate-and-map analysis. PLoS Comput. Biol. 11, e1004473 (2015).
Xu, Y. et al. Hoogsteen base pairs increase the susceptibility of double-stranded DNA to cytotoxic damage. J. Biol. Chem. 295, 15933–15947 (2020).
Kladwang, W. et al. Standardization of RNA chemical mapping experiments. Biochemistry 53, 3063–3065 (2014).
Seetin, M. G., Kladwang, W., Bida, J. P. & Das, R. Massively parallel RNA chemical mapping with a reduced bias MAP-seq protocol. Methods Mol. Biol. 1086, 95–117 (2014).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Kladwang, W. et al. Anomalous reverse transcription through chemical modifications in polyadenosine stretches. Biochemistry 59, 2154–2170 (2020).
Zhang, H., Zhang, L., Mathews, D. H. & Huang, L. LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities. Bioinformatics 36, i258–i267 (2020).
Zou, G. Y. Toward using confidence intervals to compare correlations. Psychol. Methods 12, 399–413 (2007).
Diedenhofen, B. & Musch, J. cocor: a comprehensive solution for the statistical comparison of correlations. PLoS ONE 10, e0121945 (2015).

Acknowledgements

We thank members of the Das and Barna laboratories (Stanford University), C. Pop and C.-S. Foo for useful discussions. We thank I. Jarmoskaite, V.V. Topkar, R. Rangan and J. Townley for helpful comments on the manuscript. Calculations and model training were performed on the Stanford Sherlock cluster. We acknowledge funding from the National Science Foundation (GRFP to H.K.W.S.), the National Institutes of Health (grant no. R35 GM122579 to R.D.) and gifts to the Eterna OpenVaccine project from donors listed in Supplementary Table 13.

Author information

A list of members and their affiliations appears in the Supplementary Information.

Authors and Affiliations

Department of Chemistry, Stanford University, Stanford, CA, USA
Hannah K. Wayment-Steele
Eterna Massive Open Laboratory, Stanford, CA, USA
Hannah K. Wayment-Steele, Wipapat Kladwang, Jeehyung Lee, Adrien Treuille & Rhiju Das
Department of Biochemistry, Stanford University, Stanford, CA, USA
Wipapat Kladwang, Alexandra I. Strom, Alex Becka & Rhiju Das
Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA, USA
Alexandra I. Strom
Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Jeehyung Lee & Adrien Treuille
Department of Physics, Stanford University, Stanford, CA, USA
Rhiju Das

Consortia

Eterna Participants

Contributions

H.K.W.S. and R.D. designed the EternaBench benchmark approach and EternaFold multitask training method. H.K.W.S. prepared the EternaBench datasets, performed analyses and implemented and trained the EternaFold model. H.K.W.S. and R.D. wrote the manuscript. W.K. designed methods, acquired data for high-throughput chemical mapping experiments and reviewed the manuscript. A.I.S. performed data analyses and visualizations. W.K., J.L., A.T. and R.D. designed and implemented the Eterna Cloud Lab initiative. A.B. generated SHAPE and DMS data for RNAs of known structure used in SHAPE-directed folding benchmarking. Eterna participants created online design projects, provided RNA solutions and reviewed the manuscript (Supplementary Table 3).

Corresponding author

Correspondence to Rhiju Das.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Hashim Al-Hashimi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Extended analysis of package rankings based on Eterna Cloud Lab chemical mapping data.

a) Pearson correlation of all package options tested on Cloud Lab Round 1, which was also a holdout test set for EternaFold training studies. Mean ± SEM of Pearson correlation calculated via bootstrapping, n = 1088 independent constructs. b) ViennaRNA 2, NUPACK 1999, and RNAstructure show maximum Pearson correlation to chemical mapping data at 60 °C, 40 °C, and 60 °C, respectively, for Eterna Cloud Lab Round 1. Mean ± SEM of Pearson correlation calculated via bootstrapping, n = 1088 independent constructs. c) Ranking across Cloud Lab dataset rounds using Spearman rank correlation (compare to Fig. 1e, f). Error bars represent 95% confidence interval of the mean obtained over 1000 iterations of bootstrapping over 24 independent experiments, n = 12,711 independent constructs total. d) (Top) Mean Pearson correlations, calculated over each project (as opposed to each dataset), compared to sequence metrics of the Cloud Lab projects. The metric most strongly correlated with mean Pearson correlation was the signal-to-noise ratio. (Bottom) Z-score of CONTRAfold-2, calculated over each project, compared to sequence metrics of the Cloud Lab projects.
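
The "Mean ± SEM of Pearson correlation calculated via bootstrapping" reported in this caption can be illustrated with a minimal standard-library sketch. The function names `pearson` and `bootstrap_pearson`, the default of 1000 resamples, and resampling over individual data points (rather than over constructs) are assumptions made for illustration and are not taken from the EternaBench code.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either input is constant.
    return sxy / math.sqrt(sxx * syy)

def bootstrap_pearson(pred, obs, n_boot=1000, seed=0):
    """Return (mean, SEM) of the Pearson correlation over bootstrap resamples.

    pred/obs are hypothetical paired values, e.g. predicted p(unpaired)
    and measured reactivity.
    """
    rng = random.Random(seed)
    n = len(pred)
    corrs = []
    for _ in range(n_boot):
        # Resample paired observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        corrs.append(pearson([pred[i] for i in idx], [obs[i] for i in idx]))
    mean = sum(corrs) / n_boot
    var = sum((c - mean) ** 2 for c in corrs) / (n_boot - 1)
    # The spread of the bootstrap distribution serves as the SEM estimate.
    return mean, math.sqrt(var)
```

To match the caption's "n = 1088 independent constructs" more closely, one would likely resample whole constructs rather than single positions; that grouping is omitted here for brevity.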

Extended Data Fig. 2 Example chemical mapping predictions from all package options tested.

Example heatmaps of all package options tested for the ‘Aires’ project (compare to Fig. 1c).

Extended Data Fig. 3 Summary statistics for EternaBench datasets before and after performing CD-HIT filtering.

a) Distributions of sequence properties for chemical mapping data (n = 38,846 before filtering and n = 12,711 independent constructs after filtering, collected across 24 experiments), and b) riboswitch constructs (n = 19,016 independent constructs before filtering and n = 7,228 independent constructs after filtering, collected in 12 experiments). Dataset statistics of EternaBench train and test experimental rounds for (c) Chemical Mapping (Train set: n = 3,476 independent constructs collected over 6 experiments. Test set: n = 1,492 independent constructs collected over 18 experiments) and (d) Riboswitch data (Train set: n = 2,508 independent constructs collected over 3 experiments. Test set: n = 4,018 independent constructs collected over 9 experiments). For all subplots: center dot, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range.

Extended Data Fig. 4 Overview of all Cloud Lab data.

Example reactivity and p(unpaired) heatmaps from example packages for all 24 Cloud Lab rounds. Data have been filtered to exclude nucleotides with reactivity equal to zero or less.
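
The filtering step mentioned in this caption (excluding nucleotides with reactivity equal to zero or less) might look like the following sketch. The function name `keep_positive_reactivity` and the list-based inputs are hypothetical; the only behavior taken from the caption is the `reactivity > 0` criterion.

```python
def keep_positive_reactivity(reactivity, p_unpaired):
    """Drop positions whose reactivity is <= 0, keeping both profiles aligned.

    reactivity: per-nucleotide chemical mapping signal (hypothetical input).
    p_unpaired: per-nucleotide predicted unpaired probability, same length.
    """
    if len(reactivity) != len(p_unpaired):
        raise ValueError("profiles must have the same length")
    kept = [(r, p) for r, p in zip(reactivity, p_unpaired) if r > 0]
    # Unzip the surviving pairs back into two parallel lists.
    return [r for r, _ in kept], [p for _, p in kept]
```

Filtering both profiles together keeps each remaining reactivity value paired with the prediction for the same nucleotide, which any downstream correlation requires.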

Extended Data Fig. 5 Extended analysis of package rankings based on riboswitch activity predictions.

a) Example set of states for a riboswitch that toggles binding of the fluorescent MS2 protein as an output, controlled by binding the small molecule FMN. The equilibrium constant for forming the MS2 aptamer in the absence of ligand, \(K_{MS2}^{ - lig}\), is estimated using the probability of forming the closing base pair for all packages. b) Riboswitch Z-scores stratified by input ligand type. Error bars represent standard error on Z-score as calculated by bootstrapping from 6402, 440, and 386 constructs collected over 8, 2, and 2 experiments, respectively. c) Overall ranking of \(K_{MS2}^{ - lig}\) calculations using the Spearman correlation (no linear assumption, compare to Fig. 2b). Evaluating the Pearson correlation of package calculations for (d) \(K_{MS2}^{ + lig}\) as well as (e) riboswitch Activation Ratio results in a similar ranking. In c, d and e, error bars represent 95% confidence interval of the mean obtained over 1000 iterations of bootstrapping across datasets, n = 7,228 independent constructs collected over 12 experiments.
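
The Z-scores used to rank packages in these panels can be sketched as below, under the assumption (not confirmed by this caption) that each package's correlation on a dataset is standardized against the mean and standard deviation over all packages evaluated on that dataset. The function name and the use of the population standard deviation are illustrative choices.

```python
import math

def zscores_across_packages(corr_by_package):
    """Standardize one dataset's per-package correlations (mean 0, SD 1).

    corr_by_package: dict mapping a package name to its correlation
    coefficient on a single dataset (hypothetical input format).
    """
    values = list(corr_by_package.values())
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation over the packages on this dataset.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("all packages have identical scores")
    return {name: (v - mean) / sd for name, v in corr_by_package.items()}
```

Standardizing within each dataset puts easy and hard datasets on a common scale, so that averaging a package's Z-score across datasets is not dominated by datasets on which every package scores high or low.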

Extended Data Fig. 6 Example riboswitch predictions from all package options tested.

Scatterplots for all options tested for the RiboLogic dataset. Black solid line indicates line of best fit.

Extended Data Fig. 7 Example riboswitch predictions across all datasets.

Scatterplots for representative packages on all riboswitch datasets. Black solid line indicates line of best fit.

Extended Data Fig. 8 Effect of window size and Levenshtein distance filtering for independent chemical mapping test set.

a) Calculating p(unpaired) using sliding windows of size 300, 600, and 1200 does not change the overall ranking obtained across datasets; compare to Fig. 4b, which was calculated for window size 900 (n = 31 datasets for all). b) Package ranking is also consistent for a redundancy cutoff of 40% (n = 16 datasets included after filtering based on a 40% cutoff by windowed Levenshtein distance). Error bars in a and b represent 95% confidence interval for the mean Z-score as calculated by bootstrapping across the respective number of datasets for each.
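
Redundancy filtering by Levenshtein distance can be sketched as follows. This is a simplified whole-sequence version, not the windowed procedure named in the caption; interpreting the 40% cutoff as a maximum allowed similarity, the greedy keep-first strategy, and all function names are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalized similarity: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def filter_redundant(seqs, cutoff=0.4):
    """Greedily keep sequences that are < cutoff similar to all kept ones."""
    kept = []
    for s in seqs:
        if all(similarity(s, k) < cutoff for k in kept):
            kept.append(s)
    return kept
```

The dynamic program keeps only two rows at a time, so memory scales with the shorter dimension, but the pairwise comparisons remain quadratic in the number of sequences; for large sets a clustering tool such as CD-HIT, which the paper cites, would be the more practical choice.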

Extended Data Fig. 9 Extended data corresponding to EternaFold development and test set evaluation.

a) Comparing Vienna, CONTRAfold, and EternaFold in predicting the free energy of PUM binding. i) Replication of ddG_exp for both PUM WT and mutant binding from Becker et al. (2019). The same calculation in Vienna 2 at 37 °C shows lower root-mean-squared error (RMSE) (ii), but higher RMSE at 60 °C (iii). CONTRAfold 2 shows no improvement over Vienna at 37 °C (iv), but EternaFold shows modest improvement over both (v). b) Package performance for the S-Processed test set is qualitatively similar to results on the ArchiveII-NR test set (cf. Fig. 3b). Error bars represent 95% confidence interval of the mean calculated with 1000 iterations of bootstrapping over n = 6 independent datasets, which contain 974 independent constructs total. c) Evaluating SHAPE- and DMS-directed folding. Error bars represent 95% confidence interval of the mean calculated with 1000 iterations of bootstrapping over n = 5 independent datasets of RNAs with known secondary structures, which contain 47 constructs total. d) Potentials learned from EternaFold training and used in SHAPE-directed structure prediction.
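
The root-mean-squared error used in panel a to compare predicted and measured binding free energies follows the standard definition, sketched here. The function name `rmse` and the argument names are illustrative, and the units are whatever units the two input lists share.

```python
import math

def rmse(predicted, observed):
    """Root-mean-squared error between paired predictions and measurements."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    if not predicted:
        raise ValueError("inputs must be non-empty")
    # Mean of squared residuals, then square root.
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Unlike a correlation coefficient, RMSE penalizes a constant offset between predicted and measured values, so the two metrics can rank packages differently.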

Extended Data Fig. 10 Extended data corresponding to predicting riboswitch affinity in the presence of small molecule ligands.

a) \(\log K_{MS2}^{ - lig}\) and \(\log K_{MS2}^{ + lig}\) values of riboswitches included in filtered datasets. Black starred datapoint indicates reference value used for \(K_{obs}^{ref}\). b) Estimates for the RiboLogic FMN dataset for \(\log K_{MS2}^{ + lig}\) in all package options able to make estimates with constrained-partition functions.

Supplementary information

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–14.

Source data

Source Data Fig. 1

Raw source data in reactivity heatmap, raw project analysis source data, raw source P(unpaired) calculations, raw project analysis source data, raw z-scores and significant data across datasets.

Source Data Fig. 2

Raw source K_fold values and calculations, raw z-scores and significant data across datasets.

Source Data Fig. 3

Raw z-scores and significant data across datasets.

Source Data Fig. 4

Raw z-scores and significant data across datasets and raw windowed correlation traces.

Source Data Extended Data Fig. 1

Raw z-scores and significant data across datasets and raw project analysis source data.

Source Data Extended Data Fig. 2

Raw source P(unpaired) calculations.

Source Data Extended Data Fig. 3

EternaBench Chemical mapping summary statistics, EternaBench riboswitch summary statistics, chemical mapping train/test split statistics and riboswitch train/test split statistics.

Source Data Extended Data Fig. 5

Raw source K_fold values and calculations.

Source Data Extended Data Fig. 6

Raw source K_fold values and calculations.

Source Data Extended Data Fig. 7

Raw z-scores and significant data across datasets.

Source Data Extended Data Fig. 8

Raw z-scores and significant data across datasets.

Source Data Extended Data Fig. 9

Pumilio protein sequences and K_fold calculations, Raw STRAND test set structure prediction metrics, SHAPE-directed folding z-scores and significant data across datasets.

Source Data Extended Data Fig. 10

Raw source K_fold values and calculations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wayment-Steele, H.K., Kladwang, W., Strom, A.I. et al. RNA secondary structure packages evaluated and improved by high-throughput experiments. Nat Methods 19, 1234–1242 (2022). https://doi.org/10.1038/s41592-022-01605-0

Received: 29 May 2020
Accepted: 10 August 2022
Published: 03 October 2022
Version of record: 03 October 2022
Issue date: October 2022
DOI: https://doi.org/10.1038/s41592-022-01605-0
