Fig. 5: Kaggle models demonstrate improved performance in independent test of degradation of full-length mRNAs. | Nature Machine Intelligence


From: Deep learning models for predicting RNA degradation via dual crowdsourcing

Fig. 5

a, The overall mRNA degradation rate measured by PERSIST-seq is driven by mRNA length, so the Kaggle models were tested on their ability to predict length-averaged mRNA degradation. Data are presented as mean values ± standard error estimated from the PERSIST-seq experiment; n = 3 biologically independent samples. b, Representative structures of two mRNAs of the same length that both encode nanoluciferase: one with high degradation (‘Yellowstone’, left) and one with low degradation (‘LinearDesign-1’, right). These mRNAs were designed by Eterna participants and were used as negative and positive controls, respectively, for structured mRNA in ref. 13. c, Prediction vectors were summed over the nucleotides of the CDS region for comparison with PERSIST-seq degradation rates, which reflect degradation between two RT-PCR primers designed to capture degradation in the CDS region. d, Length-normalized predictions from the first-place Kaggle model (‘Nullrecurrent’) and the second-place Kaggle model (‘Kazuki2’) show improved prediction over unpaired probabilities from ViennaRNA RNAfold (ref. 23), the DegScore linear regression model (ref. 13) and a version of the DegScore featurization trained with XGBoost (ref. 25). Data are presented as mean values ± standard error estimated from the PERSIST-seq experiment; n = 3 biologically independent samples. The significance of each Spearman correlation is reported as a two-sided p-value for a test of the null hypothesis that the two sets of data are uncorrelated.
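The evaluation described in panels c and d (collapse each per-nucleotide prediction vector to one CDS-level score, then rank-correlate those scores with measured degradation rates) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `cds_score`, `average_ranks` and `spearman_perm_test` are hypothetical, the CDS window is assumed to be 0-based and end-exclusive, and "length-normalized" is assumed to mean dividing the CDS sum by the number of CDS nucleotides. To stay dependency-free, the two-sided p-value comes from a permutation test, which is a swapped-in technique; library routines such as `scipy.stats.spearmanr` report an analytic two-sided p-value instead.

```python
import random
from typing import List, Sequence, Tuple


def cds_score(per_nt_pred: Sequence[float], cds_start: int, cds_end: int,
              normalize: bool = True) -> float:
    # Hypothetical helper: per_nt_pred holds one predicted degradation value per
    # nucleotide; [cds_start, cds_end) is an ASSUMED 0-based, end-exclusive CDS window.
    window = list(per_nt_pred[cds_start:cds_end])
    if not window:
        raise ValueError("empty CDS window")
    total = sum(window)  # sum of predictions over CDS nucleotides (panel c)
    # ASSUMED length normalization: divide by the number of CDS nucleotides.
    return total / len(window) if normalize else total


def average_ranks(values: Sequence[float]) -> List[float]:
    # 1-based ranks; tied values share the mean of the positions they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Plain Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input: correlation undefined")
    return sxy / (sxx * syy) ** 0.5


def spearman_perm_test(x: Sequence[float], y: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    # Spearman rho plus a two-sided PERMUTATION p-value (H0: x and y uncorrelated).
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    rx, ry = average_ranks(x), average_ranks(y)
    rho = _pearson(rx, ry)  # Spearman rho = Pearson r computed on ranks
    rng = random.Random(seed)
    shuffled = list(ry)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroy any x-y pairing, as under H0
        if abs(_pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1  # two-sided: count |r| at least as large as observed
    p_value = (extreme + 1) / (n_perm + 1)  # add-one correction keeps p > 0
    return rho, p_value


if __name__ == "__main__":
    # Toy, invented numbers for illustration only (NOT PERSIST-seq data).
    toy_preds = [
        [0.1, 0.2, 0.2, 0.3, 0.1],
        [0.4, 0.5, 0.6, 0.5, 0.4],
        [0.2, 0.3, 0.3, 0.2, 0.2],
        [0.7, 0.8, 0.6, 0.9, 0.7],
        [0.3, 0.4, 0.5, 0.4, 0.3],
    ]
    toy_rates = [0.9, 2.1, 1.2, 3.0, 1.5]
    scores = [cds_score(p, 1, 4) for p in toy_preds]  # hypothetical CDS = positions 1..3
    rho, p = spearman_perm_test(scores, toy_rates)
    print(f"Spearman rho = {rho:.3f}, two-sided permutation p = {p:.4f}")
```

Summing versus averaging is the design choice that matters here: because panel a shows that overall degradation tracks mRNA length, a raw CDS sum would partly reward a model for simply "counting nucleotides", whereas the averaged score isolates the per-nucleotide signal that the structure-based comparisons in panel d are meant to test.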
