Fig. 5: Kaggle models demonstrate improved performance in independent test of degradation of full-length mRNAs.
From: Deep learning models for predicting RNA degradation via dual crowdsourcing

a, Overall mRNA degradation rate from PERSIST-seq is driven by mRNA length. Kaggle models were therefore tested on their ability to predict length-averaged mRNA degradation. Data are presented as mean values ± standard error estimated from the PERSIST-seq experiment, n = 3 biologically independent samples. b, Representative structures of two mRNAs of the same length that both encode nanoluciferase, one with high degradation (‘Yellowstone’, left) and one with low degradation (‘LinearDesign-1’, right). These mRNAs were designed by Eterna participants and were used as a negative control and a positive control of structured mRNA in ref. 13. c, Prediction vectors were summed over the nucleotides corresponding to the CDS region for comparison with PERSIST-seq degradation rates, which account for degradation between two RT-PCR primers designed to capture degradation in the CDS region. d, Length-normalized predictions from the Kaggle first-placed ‘Nullrecurrent’ model and the Kaggle second-placed ‘Kazuki2’ model show improved prediction over unpaired probabilities from ViennaRNA RNAfold (ref. 23), the DegScore linear regression model (ref. 13) and a version of the DegScore featurization with XGBoost (ref. 25) training. Data are presented as mean values ± standard error estimated from the PERSIST-seq experiment, n = 3 biologically independent samples. The significance test for each Spearman correlation value is a two-sided p-value for a hypothesis test whose null hypothesis is that the two sets of data are uncorrelated.
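
The comparison described in panels c and d can be illustrated with a minimal sketch. It sums a per-nucleotide prediction vector over the CDS, divides by the CDS length, and rank-correlates the result with measured degradation rates. This is not the authors' code: the function names, the half-open CDS indexing and the toy numbers are all assumptions made for illustration, and only the Spearman coefficient is computed here. A library routine such as `scipy.stats.spearmanr` would also report the two-sided p-value mentioned in the caption.

```python
from typing import Sequence


def cds_length_normalized(pred: Sequence[float], cds_start: int, cds_end: int) -> float:
    """Sum per-nucleotide predictions over [cds_start, cds_end) and divide by CDS length.

    The half-open, 0-based indexing is an assumption of this sketch.
    """
    if not 0 <= cds_start < cds_end <= len(pred):
        raise ValueError("CDS bounds must lie within the prediction vector")
    window = pred[cds_start:cds_end]
    return sum(window) / len(window)


def average_ranks(values: Sequence[float]) -> list:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: (per-nucleotide predictions, cds_start, cds_end).
toy_predictions = {
    "mrna_A": ([0.9, 0.8, 0.7, 0.6, 0.1, 0.1], 0, 4),
    "mrna_B": ([0.2, 0.3, 0.1, 0.2, 0.9, 0.9], 0, 4),
    "mrna_C": ([0.5, 0.5, 0.4, 0.6, 0.0, 0.0], 0, 4),
}
# Hypothetical measured degradation rates (arbitrary units).
toy_measured = {"mrna_A": 3.0, "mrna_B": 1.0, "mrna_C": 2.0}

names = sorted(toy_predictions)
scores = [cds_length_normalized(*toy_predictions[name]) for name in names]
measured = [toy_measured[name] for name in names]
rho = spearman_rho(scores, measured)
print(f"Spearman rho on toy data: {rho:.2f}")
```

Dividing by the CDS length keeps constructs of different lengths comparable, which matters because panel a shows that the overall degradation rate is driven by mRNA length.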