Extended Data Fig. 2: Proteome-wide conservation of composition vs. sequence.

a, ChIP scores for viable vs. inviable constructs confirm that variants can bind Abf1-specific genomic loci even when they are inviable. The distribution of ChIP scores for selected viable and all inviable rationally designed constructs (Supplementary Table 1) is plotted. Horizontal bars mark the median value. Boxed values of viable constructs were measured in the presence of untagged wild-type Abf1 (Methods). These two populations do not differ significantly (Mann–Whitney U-test; P = 0.24). b, Panels showing median sequence conservation as assessed by alignment (y axis in all panels) compared with the per-residue compositional conservation (x axis) for four different compositional groups: negative (E,D), positive (R,K), hydrophobic (I,L,V,M,Y,F,W) and polar (Q,S,N,H,G) residues. In each panel, we also show the compositional conservation and sequence conservation scores for the set of IDRs generated via random mutagenesis (Methods and Fig. 5f), which were either viable (green square) or inviable (red square). In addition, the position of IDR2 is noted (circles). These points are included only to be compared with one another and are not easily comparable with “natural” proteins. c–f, Histograms of compositional conservation for all IDRs that fall below the dashed line in b, with IDR2 from Abf1 shown as a red vertical line on each histogram. c, Positive amino acid composition in IDR2 is more conserved than 93% of other IDRs, but the number of positively charged residues is low, indicating that IDR2 is less likely to acquire positive residues. d, Negative amino acid composition in IDR2 is more conserved than 64% of other IDRs. e, Hydrophobic amino acid composition in IDR2 is more conserved than 93% of other IDRs. f, Polar amino acid composition in IDR2 is more conserved than 48% of other IDRs. From this, we naively conclude that charge and hydrophobicity are more constrained in IDR2 than polar amino acid content. g, Sequence conservation vs. compositional conservation compared with all other similarly conserved folded domains in the yeast proteome. Folded domains were identified using predicted pLDDT-based sequence analysis, analogous to the identification of disordered regions. Compositional and linear conservation analyses confirm that folded domains are substantially more conserved in both respects.
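
As a rough illustration of the comparisons in c–f, the sketch below computes the fraction of a sequence that falls into each of the four residue groups listed in b, and the percentage of a background set that a given score exceeds. It is not the analysis used for the figure: the legend does not specify how compositional conservation is calculated, so the function names (`group_fractions`, `percent_more_conserved`), the toy sequence, the toy scores, and the assumption that a higher score means more conserved are all hypothetical.

```python
# Residue groups exactly as listed in the legend for panel b.
GROUPS = {
    "negative": set("ED"),
    "positive": set("RK"),
    "hydrophobic": set("ILVMYFW"),
    "polar": set("QSNHG"),
}


def group_fractions(seq):
    """Return the fraction of residues in seq that belong to each group."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in GROUPS.items()
    }


def percent_more_conserved(score, background):
    """Percentage of background scores strictly below `score`.

    Assumes (hypothetically) that a higher score means more conserved.
    """
    if not background:
        raise ValueError("background must not be empty")
    return 100.0 * sum(b < score for b in background) / len(background)


if __name__ == "__main__":
    # Toy sequence, not a real IDR.
    print(group_fractions("EDRKILQS"))
    # Toy conservation scores, not data from the figure.
    print(percent_more_conserved(0.5, [0.1, 0.2, 0.3, 0.9]))
```

Under these assumptions, a statement such as "more conserved than 93% of other IDRs" corresponds to `percent_more_conserved` returning 93 for the IDR2 score against the scores of the other IDRs in the same compositional group.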