Fig. 1: Generalized RNA multivalency scoring identifies codon-biased multivalent regions that encode low-complexity domains. | Nature

From: Collective homeostasis of condensation-prone proteins via their mRNAs

a, Calculation of a GeRM score for an individual 5-mer. b, An example of two 5-mers with either high or low GeRM scores. c, An example transcript from the gene LUC7L3. The smoothed GeRM score is shown at the top (solid line), and the dashed line shows the average smoothed GeRM score after synonymous codon shuffling. The amino acid entropy of the encoded sequence is shown at the bottom (black and teal line), and the proportion of charged amino acids in that window is shown as the orange line. d, The native DNA and amino acid sequences of LUC7L3 within the GeRM peak and a synonymously codon-shuffled sequence. Conservation across 100 vertebrates (PhyloP) for each position that can tolerate synonymous mutation is shown by the height of the letter. Below, the ratio of the GeRM score for the native codon choice to the average score of any synonymous mutation is also shown. e, The mean entropy for amino acid sequences encoded inside high GeRM CDS regions (black line) or outside these regions but within the same protein (grey line). f, As in e, but comparing the AlphaFold-predicted pLDDT values inside (black line) and outside (grey line) protein regions encoded by high GeRM regions. g, The mean GeRM scores within CDS regions encoding LCDs (black lines) or the rest of the CDS (grey lines). The mean GeRM scores in those regions after synonymously reshuffling the codons across the transcriptome (dashed lines) are also shown. h, The normalized conservation across 100 vertebrates of synonymously mutable positions in coding sequences that either encode LCDs or do not. Codons are binned by the degree to which the native codon choice supports sequence multivalency, in which codons with the highest ratio support the multivalency the most. Unless otherwise stated, all pairwise significance testing was performed using FDR-corrected Welch t-tests, where *P < 1−15. Precise P values can be found in the Source Data.
The boxplots show the median (centre), upper and lower quartiles (hinges), and the nearest value within 1.5 times the interquartile range from the quartile (whiskers).
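
Two operations named in panels c, d and g, synonymous codon shuffling and windowed amino acid entropy, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the standard genetic code, a shuffle that permutes codons only among positions encoding the same amino acid within a single coding sequence (the caption describes reshuffling across the transcriptome), and Shannon entropy in bits over a sliding window whose size is a hypothetical parameter. The function names `translate`, `shuffle_synonymous` and `window_entropy` are illustrative and do not come from the article.

```python
import math
import random
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: aa
    for (b1, b2, b3), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO_ACIDS
    )
}


def split_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds):
    """Translate a DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    Assumption: shuffling within one sequence. The encoded protein and the
    codon usage of the sequence are both preserved; only codon order changes.
    """
    codons = split_codons(cds)
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def window_entropy(protein, window=5):
    """Shannon entropy (bits) of amino acid composition in each sliding window."""
    entropies = []
    for start in range(len(protein) - window + 1):
        counts = Counter(protein[start:start + window])
        entropies.append(
            -sum((n / window) * math.log2(n / window) for n in counts.values())
        )
    return entropies
```

Under these assumptions, `translate(shuffle_synonymous(cds, random.Random(0)))` equals `translate(cds)` for any in-frame `cds`, so a change in a sequence-based score after shuffling reflects codon arrangement and not protein sequence. Low values from `window_entropy` mark windows dominated by few amino acid types.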
