Extended Data Fig. 8: SVR models for ν and Sconf/N.
From: Conformational ensembles of the human intrinsically disordered proteome

a,g, Permutation importance of the sequence features used in the SVR models for the prediction of (a) ν and (g) Sconf/N. b–f, Average sequence features as a function of ν calculated from simulations (grey) and using the SVR model (red); data are shown for (b) sequence hydropathy decoration, SHD; (c) sequence charge decoration, SCD; (d) fraction of charged residues, FCR; (e) average stickiness, ⟨λ⟩; and (f) charge segregation, κ. h–j, Average sequence features as a function of Sconf/N calculated from simulations (grey) and using the SVR model (red); data are shown for (h) ⟨λ⟩; (i) SHD; and (j) SCD. Data are displayed as mean ± s.e.m. calculated within bins of width Δν = 0.015 and ΔSconf/N = 0.05 kB. The sample sizes in each bin are (b–f) n = 4; 3; 5; 3; 5; 6; 9; 11; 8; 16; 14; 26; 23; 28; 41; 49; 98; 135; 212; 473; 854; 1,921; 3,319; 4,724; 5,393; 4,987; 3,405; 1,461; 484; 182; 76; 43; 16; 9; 4; and (h–j) n = 5; 9; 4; 20; 29; 93; 406; 1,290; 2,555; 3,204; 3,520; 3,483; 3,415; 3,005; 2,517; 1,895; 1,208; 686; 320; 161; 90; 49; 29; 20; 12; 6; 8; 4; 3; 3; 3. k–n, Testing the SVR models. k,m, Correlation between (k) ν and (m) Sconf/N from simulations and corresponding predictions of the SVR models for a held-out test set of 2,795 distinct sequences from the set of 28,058 IDRs identified in this work. l,n, Correlation between (l) ν and (n) Sconf/N values from simulations and corresponding predictions of the SVR models for 611 IDRs in the 531 proteins that are unique to the SPOT-based set of IDRs.
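
The binned mean ± s.e.m. in panels b–f and h–j can be illustrated with a minimal sketch. It is not the authors' analysis code. The function name `binned_mean_sem`, the choice to start the first bin at the smallest x value, the use of the sample standard deviation (ddof = 1) for the s.e.m., and the synthetic input data are all assumptions made for illustration; only the bin width Δν = 0.015 is taken from the legend.

```python
import numpy as np

def binned_mean_sem(x, y, width):
    """Group y into fixed-width bins of x.

    Returns bin centres, per-bin mean, per-bin s.e.m. and per-bin sample size.
    Assumption: the first bin starts at min(x); the legend does not state the origin.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    origin = x.min()
    # Integer bin index for every data point
    idx = np.floor((x - origin) / width).astype(int)

    centres, means, sems, counts = [], [], [], []
    for b in np.unique(idx):  # only non-empty bins are reported
        vals = y[idx == b]
        n = vals.size
        centres.append(origin + (b + 0.5) * width)
        means.append(vals.mean())
        # s.e.m. = sample standard deviation / sqrt(n); undefined for n = 1
        sems.append(vals.std(ddof=1) / np.sqrt(n) if n > 1 else np.nan)
        counts.append(n)
    return (np.array(centres), np.array(means),
            np.array(sems), np.array(counts))

if __name__ == "__main__":
    # Synthetic stand-in data (hypothetical, not from the paper)
    rng = np.random.default_rng(0)
    nu = rng.normal(0.52, 0.03, size=1000)           # stand-in for ν
    feature = 2.0 * nu + rng.normal(0.0, 0.1, 1000)  # stand-in for a sequence feature
    centres, mean, sem, n = binned_mean_sem(nu, feature, width=0.015)
    for c, m, s, k in zip(centres, mean, sem, n):
        print(f"bin centre {c:.4f}: mean {m:.3f} +/- {s:.3f} (n = {k})")
```

As in the legend, the number of points per bin is returned alongside the mean and s.e.m., because the error bars in sparsely populated bins at the edges of the distribution rest on only a handful of sequences.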