Fig. 3: High accuracy of UniKP in enzyme kcat prediction. | Nature Communications


From: UniKP: a unified framework for the prediction of enzyme kinetic parameters


a Comparison of the average coefficient of determination (R²) values for DLKcat and UniKP after five rounds of random test set splitting (n = 1684). b Comparison of the root mean square error (RMSE) between experimentally measured kcat values and predicted kcat values of DLKcat and UniKP for the training (n = 15,154) and test sets (n = 1684). Dark bars represent the RMSE of DLKcat and light bars that of UniKP. c Scatter plot illustrating the Pearson correlation coefficient (PCC) between experimentally measured kcat values and predicted kcat values of UniKP for the test set (n = 1684), showing a strong linear correlation. The color gradient represents the density of data points, ranging from blue (0.02) to red (0.28). d Comparison of RMSE between experimentally measured kcat values and predicted kcat values of DLKcat and UniKP in various experimental kcat numerical intervals. Dark bars represent the RMSE of DLKcat and light bars that of UniKP. e Enzymes with significantly different kcat values between primary central and energy metabolism, and intermediary and secondary metabolism. An independent two-sided t-test was used to determine whether the means of the two independent samples differ significantly. Primary central and energy metabolism (n = 3098) and intermediary and secondary metabolism (n = 4201) were examined in this analysis. f Shapley additive explanations (SHAP) analysis for the top 20-feature Extra Trees model. The impact of each feature on kcat values is illustrated through a swarm plot of their corresponding SHAP values. The color of each dot represents the relative value of the feature in the dataset (high-to-low depicted as red-to-blue). The horizontal location of each dot (x-axis) shows whether that feature value contributed positively or negatively to the prediction in that instance.
In each box plot (a, e), the central band represents the median value, the box represents the upper and lower quartiles and the whiskers extend up to 1.5 times the interquartile range beyond the box range. Source data are provided as a Source Data file.
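
The three accuracy measures named in panels a to c (R², RMSE, and PCC) can be sketched from their standard textbook definitions. The caption does not give the formulas or the scale on which kcat values were compared, so the function name `regression_metrics` and the example numbers below are hypothetical and serve only as illustration, not as the authors' implementation.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and Pearson correlation for paired values.

    Hypothetical helper: uses the standard definitions, not code
    taken from the UniKP or DLKcat sources.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    # Residual sum of squares: measured minus predicted.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Spread of the predictions and their co-variation with the measurements.
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "pcc": cov / math.sqrt(ss_tot * ss_pred),
    }


# Made-up example values, not data from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(regression_metrics(measured, predicted))
```

In this made-up example the predictions are offset by a constant, so PCC stays high while R² and RMSE penalize the offset. This is one reason a caption may report all three measures.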
