Fig. 1: Machine-learning models predict tail-length changes during frog oocyte maturation.

a Experimental scheme for examining poly(A) tail-length changes of frog mRNAs during oocyte maturation. Total RNA was extracted from oocytes before and after progesterone-induced germinal vesicle breakdown (GVBD), and changes in poly(A)-tail lengths were measured. b Schematic of the 10-fold cross-validation strategy used to train and test different machine-learning models in this study. Data were partitioned into training/validation and test sets, and this partitioning was repeated across 10 different stratified folds. c Performance of the multiple linear regression model. Plotted are the tail-length changes predicted by the model as a function of the changes measured in frog oocytes between 0 h and 7 h post-progesterone treatment. Each point represents a unique poly(A) site of an endogenous mRNA. Colors indicate the density of points. C.V., cross-validation. d Diagram outlining the two machine-learning models developed to predict poly(A) tail-length changes from mRNA sequences: a multiple linear regression model and an integrated neural network (PAL-AI). e Performance of PAL-AI; otherwise as in (c). f Prediction performance of PAL-AI trained on different input regions of mRNAs or additional annotation features. Left: input sequence regions (bars) and additional features, i.e., coding sequence (CDS) or predicted pairing (fold), indicated by blue and orange dots, respectively. Right: distributions of Rp values observed when comparing predicted and measured tail-length changes for test data held out during training. Ten-fold cross-validation of the model was repeated five times, generating 50 Rp values. The red rectangle indicates the configuration chosen as the final model. Box and whiskers indicate the 10th, 25th, 50th, 75th, and 90th percentiles. g Pairwise comparison of input strategies used for PAL-AI based on prediction performance. Shown are binned P values from one-sided t-tests, testing the alternative hypothesis that the mean Rp value of the group indicated on the y axis is greater than that indicated on the x axis. The red rectangle indicates the configuration chosen as the final model.