Fig. 4: Machine learning uncovers features underlying protein degradation potential. | Nature Communications

From: Targeted protein degradation in mycobacteria uncovers antibacterial effects and potentiates antibiotic efficacy

a Schematic of our machine learning experiment encompassing protein feature extraction, feature selection, and Lasso regression. The accuracy of the Lasso regression model was estimated by calculating Pearson’s correlation between the measured and the predicted log2(degradation constant) of the 18 substrates used for model validation. r and P represent the coefficient and P-value of the Pearson’s correlation analysis, respectively. b–d The disorder propensity of the N-termini, but not the C-termini, of tested substrates associates positively with protein degradation potential. (b) depicts the linear association between the measured log2(degradation constant) and the average disorder propensity of the first (N-terminal) and the last (C-terminal) 30 amino acids, as measured by the flDPnn method (ref. 35). r and P represent the Pearson’s correlation coefficients and P-values, respectively, for the association between each protein feature and the measured log2(degradation constant). (c) depicts the flDPnn-predicted per-residue disorder propensity of the terminal 30 amino acids of each substrate. Here the terminal disorder propensity profiles of the 72 tested substrates (54 for model training and 18 for validation) are stacked and ordered by their degradation constant (top to bottom). For both termini, the per-residue values are color-coded and ordered by their distance from the start codon (left to right). (d) shows the full-length disorder propensity profiles of 5 representative proteins with varied degradation potential (degradation constant provided in parentheses). e Scatter plot depicting the predicted degradation potential of the 348 conserved essential Msm proteins along with their transcriptional vulnerability index as measured by a genome-wide CRISPRi screen (ref. 28). Protein candidates of interest, including those predicted to be highly susceptible to TPD (light blue) and those for which we have generated chromosomally tagged TPD strains (red), are highlighted as colored dots.
The centerlines (dashed black lines) in (a, b) denote the linear fit of the datapoints and are bounded by the 95% confidence interval. The correlation coefficients (r) and P-values in (a, b) were determined using a two-sided Pearson’s correlation test. Source data are provided with this paper.
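
The terminal-disorder analysis described for panel b can be illustrated with a minimal Python sketch. It assumes that a per-residue disorder profile is already available as a list of floats for each substrate; it does not call flDPnn, and the profiles and log2(degradation constant) values below are made-up toy numbers, not data from the paper. The function names `terminal_disorder` and `pearson_r` are hypothetical, and the P-value calculation is omitted.

```python
from math import sqrt
from statistics import fmean

WINDOW = 30  # number of terminal residues averaged, as stated for panel b


def terminal_disorder(profile, window=WINDOW):
    """Return (N-terminal mean, C-terminal mean) of a per-residue disorder profile."""
    if len(profile) < window:
        raise ValueError("profile is shorter than the terminal window")
    return fmean(profile[:window]), fmean(profile[-window:])


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy inputs (assumptions for illustration only): four hypothetical substrates,
# each 100 residues long, with a flat N-terminal level, a flat middle, and a
# flat C-terminal level of predicted disorder.
n_levels = [0.2, 0.4, 0.6, 0.8]
c_levels = [0.5, 0.1, 0.5, 0.1]
profiles = [[n] * 30 + [0.1] * 40 + [c] * 30 for n, c in zip(n_levels, c_levels)]
log2_k = [-2.0, -1.0, 0.0, 1.0]  # hypothetical log2(degradation constant) values

n_means, c_means = zip(*(terminal_disorder(p) for p in profiles))
r_n = pearson_r(n_means, log2_k)  # N-terminal disorder vs. log2(degradation constant)
r_c = pearson_r(c_means, log2_k)  # C-terminal disorder vs. log2(degradation constant)
print(f"N-terminal r = {r_n:.3f}, C-terminal r = {r_c:.3f}")
```

With real data, each profile would come from a disorder predictor and the two coefficients would be reported alongside their two-sided P-values.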
