Fig. 4: GA multivalency and exon architecture drive selective nuclear retention of mRNAs. | Nature

From: Collective homeostasis of condensation-prone proteins via their mRNAs

a, The proportion of proteins made up of highly charged regions (40-amino-acid regions in which more than 40% of residues are charged). Proteins encoded by genes that showed significant nuclear mRNA retention upon expression of PPIGLCD are compared with proteins encoded by expressed genes that did not exhibit retention (n = 6,543 and 531, respectively). b, Proportions of the same proteins as in a that are predicted to be disordered by AlphaFold2. c, DosPS scores for the same proteins as in a. d, Mean ROC curve (purple) of the balanced random forest model, trained on the indicated features, for distinguishing nuclear-retained mRNAs from mRNAs with unchanged nuclear:cytoplasmic ratios. Grey curves show the individual folds of fourfold cross-validation (Methods). The shaded area represents the standard deviation across the four folds. e, Schematic illustrating how exon-length-dependent mRNP packaging could influence the assembly of interstasis-promoting RBPs. f, Schematic of the assembly of the reporter library. g, The nuclear:cytoplasmic abundance ratio of reporter transcripts as a function of their sequence multivalency and the duration of doxycycline-induced expression. Ridgeline plots show the distribution of ratios, and dots show the group mean for each replicate (n = 3 independent replicates). h, The nuclear:cytoplasmic abundance ratio of reporter transcripts over time as a function of exon number and multivalency (n = 3 independent replicates). i, Pearson's correlations between the multivalency of reporter gene sequences and the binding scores of different RBPs. All pairwise statistical comparisons were performed using Welch's t-test with false discovery rate (FDR) correction for multiple testing; *P < 0.05, **P < 0.01 and ***P < 0.001. Precise P values can be found in the Source Data.
The boxplots show the median (centre), the upper and lower quartiles (hinges), and the nearest value within 1.5 times the interquartile range from the quartile (whiskers). Black dashed lines represent no change between conditions.
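The highly charged region criterion in panel a can be made concrete with a sliding window. The sketch below is illustrative and not the authors' code. It assumes that the charged residues are D, E, K and R, that a residue counts as "in" a highly charged region if any 40-residue window covering it has more than 40% charged residues, and that the per-protein value is the covered fraction of the sequence. The function name `highly_charged_fraction` is hypothetical.

```python
# Hypothetical sketch, not the published pipeline.
# ASSUMPTION: charged residues are D, E, K and R (histidine excluded).
CHARGED = set("DEKR")


def highly_charged_fraction(seq: str, window: int = 40, threshold: float = 0.40) -> float:
    """Fraction of residues in `seq` covered by at least one `window`-long
    stretch whose charged-residue fraction is strictly above `threshold`."""
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return 0.0  # no full window fits, so no highly charged region
    covered = [False] * n
    # charged-residue count in the first window
    count = sum(1 for aa in seq[:window] if aa in CHARGED)
    for start in range(n - window + 1):
        if start > 0:
            # slide by one residue: drop the left residue, add the new right one
            count -= seq[start - 1] in CHARGED
            count += seq[start + window - 1] in CHARGED
        if count / window > threshold:
            for i in range(start, start + window):
                covered[i] = True
    return sum(covered) / n


print(highly_charged_fraction("E" * 20 + "A" * 60))  # 0.5375
```

Because the comparison is strict ("greater than 40%"), a window with exactly 16 of 40 charged residues does not qualify.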
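Panel d summarizes four per-fold ROC curves as a mean curve with a standard-deviation band. One common way to do this is to interpolate each fold's true-positive rate onto a shared false-positive-rate grid and then average pointwise, as in scikit-learn's cross-validated ROC example. The stdlib sketch below assumes that approach and a sample standard deviation, although the Methods may differ. It covers only the aggregation step. The classifier scores would come from a model such as imbalanced-learn's `BalancedRandomForestClassifier`, which is not reproduced here. All function names are hypothetical.

```python
import statistics


def roc_points(labels, scores):
    """ROC curve as (fpr, tpr) lists; labels are 1 (retained) or 0 (unchanged).
    Thresholds sweep from the highest score down; tied scores move together."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # consume every example sharing this score before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


def interp_tpr(x, fpr, tpr):
    """TPR at false-positive rate x by linear interpolation; on vertical
    segments (repeated fpr values) the highest TPR at that fpr is used."""
    j = max(k for k, f in enumerate(fpr) if f <= x)
    if j == len(fpr) - 1:
        return tpr[j]
    return tpr[j] + (tpr[j + 1] - tpr[j]) * (x - fpr[j]) / (fpr[j + 1] - fpr[j])


def mean_roc(folds, grid_size=101):
    """Average per-fold ROC curves on a common FPR grid.
    folds: list of (labels, scores) pairs, one per cross-validation fold.
    Returns the grid, the mean TPR and the sample standard deviation of TPR
    (ASSUMPTION: sample rather than population SD)."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    curves = []
    for labels, scores in folds:
        fpr, tpr = roc_points(labels, scores)
        curves.append([interp_tpr(x, fpr, tpr) for x in grid])
    columns = list(zip(*curves))  # one tuple of fold TPRs per grid point
    mean = [statistics.mean(c) for c in columns]
    sd = [statistics.stdev(c) for c in columns]
    return grid, mean, sd
```

The shaded band in a plot like panel d would then span `mean - sd` to `mean + sd` at each grid point, clipped to [0, 1].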
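The legend states that pairwise comparisons use Welch's t-test with FDR correction but does not name the FDR procedure. The sketch below assumes Benjamini-Hochberg and computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom using only the standard library. Two-sided P values require the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`. SciPy 1.11 and later also provide `scipy.stats.false_discovery_control` for the adjustment. The helper names are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order.
    ASSUMPTION: BH is the FDR procedure used."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(q):
    """Significance labels matching the thresholds in the legend."""
    return "***" if q < 0.001 else "**" if q < 0.01 else "*" if q < 0.05 else "ns"
```

Taking the running minimum from the largest P value downwards keeps the adjusted values monotone in the original P values.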
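The boxplot convention above can be stated as a short computation. This sketch assumes linearly interpolated quartiles (`method="inclusive"` in Python's `statistics.quantiles`, equivalent to R's default type 7). It reads "the nearest value within 1.5 times the interquartile range" as the most extreme data point that still lies inside each fence. The function name `box_stats` is hypothetical.

```python
import statistics


def box_stats(values):
    """Median, hinges and whisker ends for a boxplot.
    ASSUMPTION: quartiles use linear interpolation (R type 7)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # whiskers stop at the most extreme observations inside the fences
    whisker_low = min(v for v in values if v >= lo_fence)
    whisker_high = max(v for v in values if v <= hi_fence)
    return {"median": med, "q1": q1, "q3": q3,
            "whisker_low": whisker_low, "whisker_high": whisker_high}


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

In this example the value 100 lies beyond the upper fence (7 + 1.5 × 4 = 13). The upper whisker therefore ends at 8, and 100 would be drawn as an outlier.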

Source data