RNA secondary structure packages evaluated and improved by high-throughput experiments

Abstract

Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.

Fig. 1: Community-science-designed RNA datasets from the Eterna ‘Cloud Lab’ experiments identify consistent discrepancies in ensemble calculations from secondary structure packages.
Fig. 2: Riboswitch affinity predictions reveal similar package ranking.
Fig. 3: Multitask training using EternaBench datasets results in improved thermodynamic prediction.
Fig. 4: EternaFold improved prediction extends across diverse natural RNA contexts and experiments.

Data availability

All datasets used here for evaluation are available at https://www.github.com/eternagame/EternaBench. The original Cloud Lab datasets are available at the RNA Mapping Database (RMDB; ref. 28) under accession IDs ETERNA_R00_0000 (round 00), ETERNA_R69_0000 (round 01), ETERNA_R70_0000 (round 02), ETERNA_R71_0000 (round 03), ETERNA_R72_0000 (round 04), ETERNA_R73_0000 (round 05), ETERNA_R74_0000 (round 06), ETERNA_R75_0000 (round 07), ETERNA_R76_0000 (round 08), ETERNA_R77_0002 (round 09), ETERNA_R78_0001 (round 10), ETERNA_R79_0001 (round 11), ETERNA_R80_0001 (round 12), ETERNA_R81_0001 (round 13), ETERNA_R82_0001 (round 14), ETERNA_R83_0003 (round 15), ETERNA_R84_0000 (round 16), ETERNA_R85_0000 (round 17), ETERNA_R86_0000 (round 18), ETERNA_R87_0001 (round 19), ETERNA_R89_0000 (round 20), ETERNA_R91_0000 (round 21), ETERNA_R92_0000 (round 22) and ETERNA_R94_0000 (round 23). A list of RMDB accession IDs or URLs corresponding to the data used for benchmarking SHAPE-guided folding is in Supplementary Table 12. Source data are provided with this paper.

Code availability

The datasets used here for evaluation, as well as scripts and Python notebooks for reproducing the filtered datasets and the chemical mapping and riboswitch affinity calculations described here, are available at https://www.github.com/eternagame/EternaBench. The code for training EternaFold is available at https://www.github.com/eternagame/EternaFold. A server to run EternaFold is available at https://eternafold.eternagame.org/. The EternaFold code is derived from the CONTRAfold-SE (ref. 36) codebase, which is in turn derived from the CONTRAfold (ref. 11) codebase.

References

  1. Amaral, P. P., Dinger, M. E., Mercer, T. R. & Mattick, J. S. The eukaryotic genome as an RNA machine. Science 319, 1787–1789 (2008).

  2. Singh, V., Braddick, D. & Dhar, P. K. Exploring the potential of genome editing CRISPR-Cas9 technology. Gene 599, 1–18 (2017).

  3. Jaffrey, S. R. RNA-based fluorescent biosensors for detecting metabolites in vitro and in living cells. Adv. Pharm. 82, 187–203 (2018).

  4. Kramps, T. & Elbers, K. Introduction to RNA Vaccines. In: Kramps, T., Elbers, K. (eds) RNA Vaccines. Methods Mol. Biol. Vol. 1499, 1–11 (2017).

  5. Zuker, M. & Stiegler, P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133–148 (1981).

  6. Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

  7. Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).

  8. Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinf. 11, 129 (2010).

  9. Xia, T. et al. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37, 14719–14735 (1998).

  10. Andronescu, M., Condon, A., Hoos, H. H., Mathews, D. H. & Murphy, K. P. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 23, i19–i28 (2007).

  11. Do, C. B., Woods, D. A. & Batzoglou, S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22, e90–e98 (2006).

  12. Sloma, M. F. & Mathews, D. H. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs. PLoS Comput. Biol. 13, e1005827 (2017).

  13. Rezaur Rahman Chowdhury, F.A., Zhang, H. & Huang, L. Learning to fold RNAs in linear time. Preprint at bioRxiv, 852871 (2019).

  14. Akiyama, M., Sato, K. & Sakakibara, Y. A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model. J. Bioinform Comput Biol. 16, 1840025 (2018).

  15. Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).

  16. Puton, T., Kozlowski, L. P., Rother, K. M. & Bujnicki, J. M. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res. 41, 4307–4323 (2013).

  17. Wayment-Steele, H., Wu, M., Gotrik, M. & Das, R. Evaluating riboswitch optimality. Methods Enzymol. 623, 417–450 (2019).

  18. Berens, C. & Suess, B. Riboswitch engineering–making the all-important second and third steps. Curr. Opin. Biotechnol. 31, 10–15 (2015).

  19. Mauger, D. M. et al. mRNA structure regulates protein expression through changes in functional half-life. Proc. Natl Acad. Sci. USA 116, 24075–24083 (2019).

  20. Watters, K. E. & Lucks, J. B. Mapping RNA structure in vitro with SHAPE chemistry and next-generation sequencing (SHAPE-Seq). Methods Mol. Biol. 1490, 135–162 (2016).

  21. Wilkinson, K. A., Merino, E. J. & Weeks, K. M. Selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc. 1, 1610–1616 (2006).

  22. Tian, S. & Das, R. RNA structure through multidimensional chemical mapping. Q. Rev. Biophys. 49, e7 (2016).

  23. Denny, S. K. et al. High-throughput investigation of diverse junction elements in RNA tertiary folding. Cell 174, 377–390 e320 (2018).

  24. Buenrostro, J. D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568 (2014).

  25. Lee, J. et al. RNA design rules from a massive open laboratory. Proc. Natl Acad. Sci. USA 111, 2122–2127 (2014).

  26. Delli Ponti, R., Marti, S., Armaos, A. & Tartaglia, G. G. A high-throughput approach to profile RNA structure. Nucleic Acids Res. 45, e35 (2017).

  27. Eddy, S. R. Analysis of conserved RNA secondary structure in transcriptomes and genomes. Annu. Rev. Biophys. 43, 433–456 (2014).

  28. Cordero, P., Lucks, J. B. & Das, R. An RNA mapping database for curating RNA structure mapping experiments. Bioinformatics 28, 3006–3008 (2012).

  29. Wellington-Oguri, R. et al. Evidence of an unusual Poly(A) RNA signature detected by high-throughput chemical mapping. Biochemistry 59, 2041–2046 (2020).

  30. Anderson-Lee, J. et al. Principles for predicting RNA secondary structure design difficulty. J. Mol. Biol. 428, 748–757 (2016).

  31. Beisel, C. L. & Smolke, C. D. Design principles for riboswitch function. PLoS Comput. Biol. 5, e1000363 (2009).

  32. Breaker, R. R. Prospects for riboswitch discovery and analysis. Mol. Cell 43, 867–879 (2011).

  33. Andreasson, J. O. L. et al. Crowdsourced RNA design discovers diverse, reversible, efficient, self-contained molecular switches. Proc. Natl Acad. Sci. USA 119, e2112979119 (2022).

  34. Wu, M. J., Andreasson, J. O. L., Kladwang, W., Greenleaf, W. & Das, R. Automated design of diverse stand-alone riboswitches. ACS Synth. Biol. 8, 1838–1846 (2019).

  35. Andronescu, M., Condon, A., Hoos, H. H., Mathews, D. H. & Murphy, K. P. Computational approaches for RNA energy parameter estimation. RNA 16, 2304–2318 (2010).

  36. Foo, C.-S. & Pop, C. Learning RNA secondary structure (only) from structure probing data. Preprint at bioRxiv, 152629 (2017).

  37. Andronescu, M., Bereg, V., Hoos, H. H. & Condon, A. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinf. 9, 340 (2008).

  38. Sloma, M. F. & Mathews, D. H. Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures. RNA 22, 1808–1818 (2016).

  39. Watters, K. E. et al. Probing of RNA structures in a positive sense RNA virus reveals selection pressures for structural elements. Nucleic Acids Res. 46, 2573–2584 (2018).

  40. Watts, J. M. et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716 (2009).

  41. Kutchko, K. M. et al. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res. 46, 3657–3670 (2018).

  42. Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).

  43. Dadonaite, B. et al. The structure of the influenza A virus genome. Nat. Microbiol 4, 1781–1789 (2019).

  44. Simon, L. M. et al. In vivo analysis of influenza A mRNA secondary structures identifies critical regulatory motifs. Nucleic Acids Res. 47, 7003–7017 (2019).

  45. Huber, R. G. et al. Structure mapping of dengue and Zika viruses reveals functional long-range interactions. Nat. Commun. 10, 1408 (2019).

  46. Huston, N. C. et al. Comprehensive in vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms. Mol. Cell 81, 584–598 e585 (2021).

  47. Manfredonia, I. et al. Genome-wide mapping of SARS-CoV-2 RNA structures identifies therapeutically-relevant elements. Nucleic Acids Res. 48, 12436–12452 (2020).

  48. Sun, L. et al. In vivo structural characterization of the SARS-CoV-2 RNA genome identifies host proteins vulnerable to repurposed drugs. Cell 184, 1865–1883 e1820 (2021).

  49. Lavender, C. A., Gorelick, R. J. & Weeks, K. M. Structure-based alignment and consensus secondary structures for three HIV-related RNA genomes. PLoS Comput. Biol. 11, e1004230 (2015).

  50. Deigan, K. E., Li, T. W., Mathews, D. H. & Weeks, K. M. Accurate SHAPE-directed RNA structure determination. Proc. Natl Acad. Sci. USA 106, 97–102 (2009).

  51. McGinnis, J. L. & Weeks, K. M. Ribosome RNA assembly intermediates visualized in living cells. Biochemistry 53, 3237–3247 (2014).

  52. Leppek, K. et al. Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics. Nat. Commun. 13, 1536 (2022).

  53. Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26, 322–330 (2019).

  54. Becker, W. R. et al. Quantitative high-throughput tests of ubiquitous RNA secondary structure prediction algorithms via RNA/protein binding. Preprint at bioRxiv, 571588 (2019).

  55. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).

  56. Morandi, E. et al. Genome-scale deconvolution of RNA structure ensembles. Nat. Methods 18, 249–252 (2021).

  57. Hajdin, C. E. et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl Acad. Sci. USA 110, 5498–5503 (2013).

  58. Zarringhalam, K., Meyer, M. M., Dotu, I., Chuang, J. H. & Clote, P. Integrating chemical footprinting data into RNA secondary structure prediction. PLoS ONE 7, e45160 (2012).

  59. Sato, K., Akiyama, M. & Sakakibara, Y. RNA secondary structure prediction using deep learning with thermodynamic integration. Nat. Commun. 12, 941 (2021).

  60. Chen, X., Li, Y., Umarov, R., Gao, X. & Song, L. RNA secondary structure prediction by learning unrolled algorithms. In Proceedings of the 8th International Conference on Learning Representations (2020).

  61. Ward, M., Datta, A., Wise, M. & Mathews, D. H. Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best. Nucleic Acids Res. 45, 8541–8550 (2017).

  62. Zhao, B. S., Roundtree, I. A. & He, C. Post-transcriptional gene regulation by mRNA modifications. Nat. Rev. Mol. Cell Biol. 18, 31–42 (2017).

  63. Rinnenthal, J. et al. Mapping the landscape of RNA dynamics with NMR spectroscopy. Acc. Chem. Res. 44, 1292–1301 (2011).

  64. Kappel, K. et al. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699–707 (2020).

  65. McCaskill, J. S. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29, 1105–1119 (1990).

  66. Washietl, S., Hofacker, I. L., Stadler, P. F. & Kellis, M. RNA folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction. Nucleic Acids Res. 40, 4261–4272 (2012).

  67. Deng, F., Ledda, M., Vaziri, S. & Aviran, S. Data-directed RNA secondary structure prediction using probabilistic modeling. RNA 22, 1109–1119 (2016).

  68. Cordero, P. & Das, R. Rich RNA structure landscapes revealed by mutate-and-map analysis. PLoS Comput. Biol. 11, e1004473 (2015).

  69. Xu, Y. et al. Hoogsteen base pairs increase the susceptibility of double-stranded DNA to cytotoxic damage. J. Biol. Chem. 295, 15933–15947 (2020).

  70. Kladwang, W. et al. Standardization of RNA chemical mapping experiments. Biochemistry 53, 3063–3065 (2014).

  71. Seetin, M. G., Kladwang, W., Bida, J. P. & Das, R. Massively parallel RNA chemical mapping with a reduced bias MAP-seq protocol. Methods Mol. Biol. 1086, 95–117 (2014).

  72. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).

  73. Kladwang, W. et al. Anomalous reverse transcription through chemical modifications in polyadenosine stretches. Biochemistry 59, 2154–2170 (2020).

  74. Zhang, H., Zhang, L., Mathews, D. H. & Huang, L. LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities. Bioinformatics 36, i258–i267 (2020).

  75. Zou, G. Y. Toward using confidence intervals to compare correlations. Psychol. Methods 12, 399–413 (2007).

  76. Diedenhofen, B. & Musch, J. cocor: a comprehensive solution for the statistical comparison of correlations. PLoS ONE 10, e0121945 (2015).

Acknowledgements

We thank members of the Das and Barna laboratories (Stanford University), C. Pop and C.-S. Foo for useful discussions. We thank I. Jarmoskaite, V.V. Topkar, R. Rangan and J. Townley for helpful comments on the manuscript. Calculations and model training were performed on the Stanford Sherlock cluster. We acknowledge funding from the National Science Foundation (GRFP to H.K.W.S.), the National Institutes of Health (grant no. R35 GM122579 to R.D.) and gifts to the Eterna OpenVaccine project from donors listed in Supplementary Table 13.

Author information

Contributions

H.K.W.S. and R.D. designed the EternaBench benchmark approach and EternaFold multitask training method. H.K.W.S. prepared the EternaBench datasets, performed analyses and implemented and trained the EternaFold model. H.K.W.S. and R.D. wrote the manuscript. W.K. designed methods, acquired data for high-throughput chemical mapping experiments and reviewed the manuscript. A.I.S. performed data analyses and visualizations. W.K., J.L., A.T. and R.D. designed and implemented the Eterna Cloud Lab initiative. A.B. generated SHAPE and DMS data for RNAs of known structure used in SHAPE-directed folding benchmarking. Eterna participants created online design projects, provided RNA solutions and reviewed the manuscript (Supplementary Table 3).

Corresponding author

Correspondence to Rhiju Das.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Hashim Al-Hashimi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Extended analysis of package rankings based on Eterna Cloud Lab chemical mapping data.

a) Pearson correlation of all package options tested on Cloud Lab Round 1, which was also a holdout test set for EternaFold training studies. Mean ± SEM of Pearson correlation calculated via bootstrapping, n = 1,088 independent constructs. b) ViennaRNA 2, NUPACK 1999, and RNAstructure show maximum Pearson correlation to chemical mapping data at 60 °C, 40 °C, and 60 °C, respectively, for Eterna Cloud Lab Round 1. Mean ± SEM of Pearson correlation calculated via bootstrapping, n = 1,088 independent constructs. c) Ranking across Cloud Lab dataset rounds using Spearman rank correlation (compare to Fig. 1e, f). Error bars represent 95% confidence interval of the mean obtained over 1000 iterations of bootstrapping over 24 independent experiments, n = 12,711 independent constructs total. d) (Top) Mean Pearson correlations, calculated over each project (as opposed to each dataset), compared to sequence metrics of the Cloud Lab projects. The metric most strongly correlated with the mean Pearson correlation was the signal-to-noise ratio. (Bottom) Z-score of CONTRAfold-2, calculated over each project, compared to sequence metrics of the Cloud Lab projects.
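The bootstrapped correlation statistics named in this legend can be illustrated with a minimal sketch. It is not the authors' analysis code (that is in the EternaBench repository); the function names, the choice to resample individual paired values rather than whole constructs, and the default of 1000 resamples are assumptions made for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def bootstrap_pearson(pred, obs, n_boot=1000, seed=0):
    """Resample paired values with replacement (assumed resampling unit).

    Returns the mean of the resampled correlations and their standard
    deviation, which serves as the bootstrap estimate of the SEM.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(pred)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        stats[b] = pearson(pred[idx], obs[idx])
    return float(stats.mean()), float(stats.std(ddof=1))

def z_scores(values):
    """Z-score of each package's metric relative to all packages compared."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=0)
```

Here `pred` would hold a package's p(unpaired) values and `obs` the measured reactivities; `z_scores` takes one correlation per package for a given dataset.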

Extended Data Fig. 2 Example chemical mapping predictions from all package options tested.

Example heatmaps of all package options tested for the ‘Aires’ project (compare to Fig. 1c).

Extended Data Fig. 3 Summary statistics for EternaBench datasets before and after performing CD-HIT filtering.

a) Distributions of sequence properties for chemical mapping data (n = 38,846 independent constructs before filtering and n = 12,711 after filtering, collected across 24 experiments) and b) riboswitch constructs (n = 19,016 independent constructs before filtering and n = 7,228 after filtering, collected in 12 experiments). Dataset statistics of EternaBench train and test experimental rounds for c) chemical mapping data (train set: n = 3,476 independent constructs collected over 6 experiments; test set: n = 1,492 independent constructs collected over 18 experiments) and d) riboswitch data (train set: n = 2,508 independent constructs collected over 3 experiments; test set: n = 4,018 independent constructs collected over 9 experiments). For all subplots: center dot, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range.
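The box-plot elements described in this legend can be computed as in the sketch below. It assumes the common convention that each whisker ends at the most extreme data point lying within 1.5x the interquartile range of the box; the legend does not state which whisker convention was used, and `box_summary` is a hypothetical helper name.

```python
import numpy as np

def box_summary(values):
    """Median, quartiles, and whisker ends for one box of a box plot."""
    v = np.sort(np.asarray(values, dtype=float))
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    # Assumed convention: whiskers stop at the last point inside the fences.
    inside = v[(v >= q1 - 1.5 * iqr) & (v <= q3 + 1.5 * iqr)]
    return {
        "median": float(med),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
    }

# Illustrative values only (not from the paper); 100 falls outside the fences.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```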

Extended Data Fig. 4 Overview of all Cloud Labs data.

Example reactivity and p(unpaired) heatmaps from example packages for all 24 Cloud Lab rounds. Data have been filtered to exclude nucleotides with reactivity equal to zero or less.
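As a minimal sketch of the two operations this legend mentions, the code below derives p(unpaired) from a base-pairing probability matrix and drops nucleotides whose reactivity is zero or less. It assumes a symmetric matrix `bpp` in which entry (i, j) is the probability that nucleotides i and j pair, as produced by a partition-function calculation; the helper names are hypothetical and are not taken from the EternaBench code.

```python
import numpy as np

def p_unpaired(bpp):
    """Per-nucleotide unpaired probability: 1 minus the summed pairing probability."""
    bpp = np.asarray(bpp, dtype=float)
    return 1.0 - bpp.sum(axis=1)

def drop_nonpositive(reactivity, *others):
    """Keep only positions with reactivity > 0, applying the same mask to
    any other per-nucleotide arrays passed in."""
    reactivity = np.asarray(reactivity, dtype=float)
    keep = reactivity > 0
    return (reactivity[keep],) + tuple(np.asarray(o)[keep] for o in others)

# Toy 3-nucleotide example (illustrative values): nucleotides 0 and 2 pair
# with probability 0.8.
bpp = np.array([[0.0, 0.0, 0.8],
                [0.0, 0.0, 0.0],
                [0.8, 0.0, 0.0]])
unpaired = p_unpaired(bpp)
```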

Extended Data Fig. 5 Extended analysis of package rankings based on riboswitch activity predictions.

a) Example set of states for a riboswitch that toggles binding of the fluorescent MS2 protein as an output, controlled by binding the small molecule FMN. The equilibrium constant for forming the MS2 aptamer in the absence of ligand, \(K_{MS2}^{ - lig}\), is estimated using the probability of forming the closing base pair for all packages. b) Riboswitch Z-scores stratified by input ligand type. Error bars represent standard error on Z-score as calculated by bootstrapping from 6,402, 440, and 386 constructs collected over 8, 2, and 2 experiments, respectively. c) Overall ranking of \(K_{MS2}^{ - lig}\) calculations using the Spearman correlation (no linear assumption, compare to Fig. 2b). Evaluating the Pearson correlation of package calculations for d) \(K_{MS2}^{ + lig}\) as well as e) riboswitch activation ratio results in a similar ranking. In c, d and e, error bars represent 95% confidence interval of the mean obtained over 1000 iterations of bootstrapping across datasets, n = 7,228 independent constructs collected over 12 experiments.
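One way to turn a base-pair probability into an equilibrium constant is the two-state relation K = p / (1 - p), with the corresponding free energy ΔG = -RT ln K. The sketch below uses that relation as an assumption; the legend says only that the closing base-pair probability is used, and the exact expression in the paper's Methods may differ. The function names and the clipping constant `eps` are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def k_from_pair_probability(p, eps=1e-12):
    """Two-state equilibrium constant for forming a base pair of probability p.

    p is clipped to [eps, 1 - eps] so the ratio stays finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return p / (1.0 - p)

def delta_g_kcal(k, temp_c=37.0):
    """Free energy in kcal/mol for equilibrium constant k at temp_c degrees Celsius."""
    return -R_KCAL * (temp_c + 273.15) * math.log(k)
```

Under this assumed relation, a closing pair formed half of the time gives K = 1 and ΔG = 0.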

Extended Data Fig. 6 Example riboswitch predictions from all package options tested.

Scatterplots for all options tested for Ribologic dataset. Black solid line indicates line of best fit.

Extended Data Fig. 7 Example riboswitch predictions across all datasets.

Scatterplots for representative packages on all riboswitch datasets. Black solid line indicates line of best fit.

Extended Data Fig. 8 Effect of window size and Levenshtein distance filtering for independent chemical mapping test set.

a) Calculating p(unpaired) using sliding windows of size 300, 600, and 1200 does not change the overall ranking obtained across datasets (compare to Fig. 4b, which was calculated for window size 900; n = 31 datasets for all). b) Package ranking is also consistent for a redundancy cutoff of 40% (n = 16 datasets included after filtering based on a 40% cutoff by windowed Levenshtein distance). Error bars in a and b represent the 95% confidence interval for the mean Z-score as calculated by bootstrapping across the respective number of datasets.
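A redundancy filter based on Levenshtein distance can be sketched as follows. The sketch makes several assumptions: it compares whole sequences rather than windows, it reads the 40% cutoff as a normalized similarity of 0.40, and it keeps sequences greedily in input order. None of these choices is specified in the legend, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """1 minus the edit distance normalized by the longer sequence length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def filter_redundant(seqs, cutoff=0.40):
    """Greedily keep sequences whose similarity to every kept one is below cutoff."""
    kept = []
    for s in seqs:
        if all(similarity(s, k) < cutoff for k in kept):
            kept.append(s)
    return kept
```

With these definitions, two sequences that differ at a single position out of nine count as redundant, and only the first is retained.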

Extended Data Fig. 9 Extended data corresponding to EternaFold development and test set evaluation.

a) Comparing Vienna, CONTRAfold, and EternaFold in predicting the free energy of PUM binding. i) Replication of ddG_exp for both PUM WT and mutant binding from Becker et al. (ref. 54). The same calculation in Vienna 2 at 37 °C shows lower root-mean-squared error (RMSE) (ii), but higher RMSE at 60 °C (iii). CONTRAfold 2 shows no improvement over Vienna at 37 °C (iv), but EternaFold shows modest improvement over both (v). b) Package performance for the S-Processed test set is qualitatively similar to results on the ArchiveII-NR test set (cf. Fig. 3b). Error bars represent 95% confidence interval of the mean calculated with 1000 iterations of bootstrapping over n = 6 independent datasets, which contain 974 independent constructs total. c) Evaluating SHAPE- and DMS-directed folding. Error bars represent 95% confidence interval of the mean calculated with 1000 iterations of bootstrapping over n = 5 independent datasets of RNAs with known secondary structures, which contain 47 constructs total. d) Potentials learned from EternaFold training and used in SHAPE-directed structure prediction.
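The RMSE used in panel a to compare predicted and measured binding free energies follows the standard definition, sketched below. The function name and argument names are illustrative; units are whatever the inputs carry (for example, kcal/mol).

```python
import math

def rmse(predicted, observed):
    """Root-mean-squared error between paired predictions and measurements."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower value means that a package's predicted free energies lie closer, on average, to the experimental values.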

Extended Data Fig. 10 Extended data corresponding to predicting riboswitch affinity in the presence of small molecule ligands.

a) \(\log K_{MS2}^{ - lig}\) and \(\log K_{MS2}^{ + lig}\) values of riboswitches included in filtered datasets. Black starred datapoint indicates reference value used for \(K_{obs}^{ref}\). b) Estimates for the RiboLogic FMN dataset for \(\log K_{MS2}^{ + lig}\) in all package options able to make estimates with constrained-partition functions.

Supplementary information

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–14.

Source data

Source Data Fig. 1

Raw source data in reactivity heatmap, raw project analysis source data, raw source P(unpaired) calculations, raw z-scores and significant data across datasets.

Source Data Fig. 2

Raw source Kfold values and calculations, raw z-scores and significant data across datasets.

Source Data Fig. 3

Raw z-scores and significant data across datasets.

Source Data Fig. 4

Raw z-scores and significant data across datasets and raw windowed correlation traces.

Source Data Extended Data Fig. 1

Raw z-scores and significant data across datasets and raw project analysis source data.

Source Data Extended Data Fig. 2

Raw source P(unpaired) calculations.

Source Data Extended Data Fig. 3

EternaBench Chemical mapping summary statistics, EternaBench riboswitch summary statistics, chemical mapping train/test split statistics and riboswitch train/test split statistics.

Source Data Extended Data Fig. 5

Raw source Kfold values and calculations.

Source Data Extended Data Fig. 6

Raw source Kfold values and calculations.

Source Data Extended Data Fig. 7

Raw z-scores and significant data across datasets.

Source Data Extended Data Fig. 8

Raw z-scores and significant data across datasets.

Source Data Extended Data Fig. 9

Pumilio protein sequences and Kfold calculations, Raw STRAND test set structure prediction metrics, SHAPE-directed folding z-scores and significant data across datasets.

Source Data Extended Data Fig. 10

Raw source Kfold values and calculations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wayment-Steele, H.K., Kladwang, W., Strom, A.I. et al. RNA secondary structure packages evaluated and improved by high-throughput experiments. Nat Methods 19, 1234–1242 (2022). https://doi.org/10.1038/s41592-022-01605-0

  • DOI: https://doi.org/10.1038/s41592-022-01605-0
