Extended Data Fig. 6: Analysis of the CD-CODE database, and relationships between sequence and conformational properties for IDRs in the pLDDT-based and SPOT-based sets.
From: Conformational ensembles of the human intrinsically disordered proteome

a,b, Analysis of the association between IDR conformational properties and protein localization in membraneless organelles (as reported in the CD-CODE database; ref. 27). Distributions of (a) ν and (b) Sconf/N for IDRs in ‘driver’ proteins (shaded bars) and in proteins that are not part of the examined subset (black lines). c,d, Distributions of (c) ν and (d) Sconf/N for IDRs in ‘member’ proteins (shaded bars) and in proteins that are not part of the examined subset (black lines). Histograms for proteins enriched in compact and expanded IDRs are shown in orange and teal, respectively. P values are estimated from one-sided Brunner–Munzel tests using t-distributions and the reported degrees of freedom (DoF). Standard errors of Cohen’s d values are estimated through 10⁵ bootstraps. e–q, Comparison between sequence features that affect compaction in pLDDT-based and SPOT-based sets of IDRs. We show NARDINI z-scores for (e) basic–acidic patterning, z(δ+−), (f) acidic patterning, z(Ω−), and (g) aromatic patterning, z(Ωπ); (h) sequence charge decoration, SCD; (l) charge segregation, κ; (m) Sconf/N; (n) sequence hydropathy decoration, SHD; (o) average stickiness, ⟨λ⟩; (p) fraction of charged residues, FCR; and (q) sequence length, N, as a function of ν. i–k, (i) z(δ+−), (j) z(Ω−), and (k) z(Ωπ) as a function of Sconf/N. Results are shown for all IDRs in the pLDDT-based (grey) and SPOT-based (red) sets, and for IDRs with fdomain = 0 in the pLDDT-based set (blue). r,s, Short IDRs are on average more highly charged and expanded than longer IDRs. (r) ν and (s) Sconf/N as a function of sequence length, N, for the human IDRs in the pLDDT-based (grey) and SPOT-based (red) sets. Data are displayed as mean ± s.e.m. t, Normalized distributions of the NCPR for IDRs with N ≤ 200 (full lines) and with N > 200 (dotted lines) in the pLDDT-based (grey) and SPOT-based (red) sets.
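
The statistical comparison described for panels a–d (a one-sided Brunner–Munzel test with a t-distribution, plus a bootstrapped standard error of Cohen's d) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that Cohen's d uses a pooled standard deviation and that each group is resampled independently with replacement; the function names (`cohens_d`, `bootstrap_se_cohens_d`, `compare_groups`), the synthetic ν values, the direction of the one-sided alternative, and the reduced number of bootstraps are all illustrative choices. The sketch reports only the test statistic and P value, not the degrees of freedom.

```python
import numpy as np
from scipy import stats


def cohens_d(x, y):
    """Cohen's d of x relative to y, using a pooled standard deviation (assumed definition)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)


def bootstrap_se_cohens_d(x, y, n_boot=1000, seed=0):
    """Standard error of Cohen's d from resampling each group with replacement."""
    rng = np.random.default_rng(seed)
    ds = np.empty(n_boot)
    for i in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        yb = rng.choice(y, size=len(y), replace=True)
        ds[i] = cohens_d(xb, yb)
    return ds.std(ddof=1)


def compare_groups(subset, rest, alternative="less", n_boot=1000, seed=0):
    """One-sided Brunner-Munzel test (t-distribution) plus Cohen's d and its bootstrap SE."""
    res = stats.brunnermunzel(subset, rest, alternative=alternative, distribution="t")
    return {
        "statistic": res.statistic,
        "pvalue": res.pvalue,
        "cohens_d": cohens_d(subset, rest),
        "cohens_d_se": bootstrap_se_cohens_d(subset, rest, n_boot=n_boot, seed=seed),
    }


# Synthetic example (not data from the paper): the hypothetical subset has lower nu values.
rng = np.random.default_rng(42)
nu_subset = rng.normal(loc=0.50, scale=0.03, size=200)
nu_rest = rng.normal(loc=0.53, scale=0.03, size=500)

# n_boot is kept small here for speed; the caption reports a much larger number of bootstraps.
result = compare_groups(nu_subset, nu_rest, alternative="less", n_boot=500)
print(result)
```

With `alternative="less"`, the test asks whether values in the first group tend to be smaller than those in the second, which suits a subset enriched in compact IDRs; `alternative="greater"` would suit a subset enriched in expanded IDRs.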