Extended Data Fig. 2: Clusters of multivalent coding sequences and their coding biases. | Nature

From: Collective homeostasis of condensation-prone proteins via their mRNAs

a, The contribution of a given 5-mer to the total GeRM score in each GeRM cluster. Only the three most common 5-mers per cluster are shown. b, The occurrence of amino acids encoded by different GeRM regions relative to their occurrence in the entire proteome. Only amino acids with notable enrichments are shown. c, The number of GeRM regions per cluster encoding at least twofold more arginines than expected based on their occurrence in the proteome. d, The mean amino acid entropy, calculated in sliding windows, for amino acids encoded by different GeRM regions or for the amino acid sequences encoded by the rest of the CDS (grey line). The dashed line represents the mean amino acid entropy for the entire proteome. Statistical comparisons are Welch's t-tests between the GeRM regions and the non-GeRM regions from the same set of proteins, with FDR correction. e, The log2-transformed ratio between the total GeRM score calculated from the native sequence within a GeRM region and the mean total GeRM score in the same region when codons have been synonymously shuffled five times within each transcript. Statistical tests are one-sample t-tests, with FDR correction.
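
The quantities in panels d and e can be illustrated with a minimal Python sketch. It is not the authors' code. The window size, the use of Shannon entropy in bits, and the shuffling scheme (permuting codons among positions that encode the same amino acid, which preserves both the protein sequence and codon usage) are assumptions; the caption does not specify them. The `score` argument is a hypothetical stand-in for the GeRM score, which is not defined here. Sequences are assumed to be uppercase DNA read with the standard genetic code.

```python
import math
import random
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def split_codons(cds):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def translate(cds):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))

def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_window_entropy(protein, window=10):
    """Mean entropy over all sliding windows (window size is an assumption)."""
    if len(protein) < window:
        return shannon_entropy(protein)
    values = [
        shannon_entropy(protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]
    return sum(values) / len(values)

def synonymous_shuffle(cds, rng):
    """Permute codons among positions that encode the same amino acid."""
    codons = split_codons(cds)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)

def log2_native_vs_shuffled(cds, score, n_shuffles=5, seed=0):
    """log2 of the native score over the mean score of shuffled sequences.

    `score` is a hypothetical callable and must return positive values.
    """
    rng = random.Random(seed)
    shuffled = [score(synonymous_shuffle(cds, rng)) for _ in range(n_shuffles)]
    return math.log2(score(cds) / (sum(shuffled) / n_shuffles))
```

To restrict scoring to a GeRM region while shuffling the whole transcript, as in panel e, `score` can be a function that slices the region out of the sequence it receives before scoring it.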
