Fig. 6: Predicting translation efficiency using UTR sequences alone.
From: Post-transcriptional reprogramming by thousands of mRNA untranslated regions in trypanosomes

a The plots show nucleobase composition for 5’-UTRs of >49 b, CDSs of >199 b, and 3’-UTRs of >99 b, relative to published measures of translation efficiency (ref. 5). b Machine learning model evaluation based on 3’-UTR sequences. The upper plot shows translation efficiency values for the test set (n = 2020 genes) compared with the model predictions. A linear regression line is shown. The lower plot shows the SHAP values for each gene and for the top eight features that contribute to the predictions. The colour scale reflects relative contribution to high (red) or low (blue) translation efficiency. The dots are jittered in the y-axis to illustrate the distribution of the SHAP values. Am2, A-tracts longer than 5 allowing 2 mismatches; Cm2, C-tracts longer than 5 allowing 2 mismatches; AGm2, AG-tracts longer than 5 allowing 2 mismatches. c Machine learning model evaluation based on 5’-UTR and 3’-UTR sequences. n = 2016 genes. Other details as in panel b.
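
The sequence features named in the legend (nucleobase composition in panel a; the Am2, Cm2 and AGm2 tract features in panels b and c) could be derived from a UTR sequence roughly as sketched below. This is an illustration only, not the authors' code. The function names `base_composition` and `longest_tract` are hypothetical, and two readings of the legend are assumed: a "tract allowing 2 mismatches" is taken to be a contiguous window containing at most two bases outside the tract alphabet, and an "AG-tract" is taken to be a run of A or G bases rather than an AG dinucleotide repeat.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the fraction of A, C, G and U in seq (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "ACGU")
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "ACGU"}


def longest_tract(seq: str, bases: str, max_mismatches: int = 2) -> int:
    """Length of the longest window with at most max_mismatches bases
    outside `bases` (assumed reading of 'allowing 2 mismatches')."""
    seq = seq.upper().replace("T", "U")
    allowed = set(bases.upper().replace("T", "U"))
    best = left = mismatches = 0
    for right, base in enumerate(seq):
        if base not in allowed:
            mismatches += 1
        # Shrink the window from the left until it is valid again.
        while mismatches > max_mismatches:
            if seq[left] not in allowed:
                mismatches -= 1
            left += 1
        best = max(best, right - left + 1)
    return best


def tract_features(seq: str, min_len: int = 6) -> dict[str, bool]:
    """Flag tracts 'longer than 5' (length >= 6) under the assumptions above."""
    return {
        "Am2": longest_tract(seq, "A") >= min_len,
        "Cm2": longest_tract(seq, "C") >= min_len,
        "AGm2": longest_tract(seq, "AG") >= min_len,
    }
```

Boolean flags are used here for simplicity; the legend does not say whether the model features are presence flags, counts or lengths, so a count of qualifying tracts per UTR would be an equally plausible encoding.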