Fig. 5: Comparative evaluation of CatPred against existing ML frameworks using CatPred-DB benchmark datasets. | Nature Communications

From: CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters

Four methods were evaluated: Baseline, DLKcat, UniKP and CatPred. The plots show the coefficient of determination (R²) obtained on held-out and out-of-distribution test sets at decreasing levels of enzyme sequence similarity to the training sequences, for (a) CatPred-DB-kcat, (b) CatPred-DB-Km and (c) CatPred-DB-Ki. Each group on the x-axis corresponds to the test set formed by applying a maximum percent sequence identity cutoff (Max. % seq. id. cutoff) relative to the training sequences. The set with the 100% cutoff is the held-out test set; the rest are out-of-distribution sets. For DLKcat, UniKP and CatPred, the height of each bar denotes the mean metric value over ten replicate models, and the overlaid points denote the metric values of the individual replicates. There are no replicates for the Baseline. For the CatPred-DB-Ki evaluation, ** indicates a statistically non-significant p-value (0.539) from a two-sided Welch's two-sample t-test, which assumes unequal variances: t(17.3) = 0.626, p = 0.539, 95% CI for the difference in means = (−0.0061, 0.0075). No adjustments were made for multiple comparisons. * is a placeholder for 'Baseline' bars that have a negative R² value (−0.047 and −0.106 in the out-of-distribution evaluation at the 60% and 40% Max. % seq. id. cutoffs, respectively).
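The Welch's t-test reported in the caption can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' analysis code. The function names (`welch_t`, `t_pdf`, `two_sided_p`) and the two lists of replicate R² values are hypothetical and made up for illustration; only t = 0.626 and 17.3 degrees of freedom are taken from the caption. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, a simple stand-in for a library CDF routine.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    # Squared standard errors of the two sample means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - P(|T| < |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


# Statistic and degrees of freedom as reported in the caption.
p_reported = two_sided_p(0.626, 17.3)
print(f"p from reported t and df: {p_reported:.3f}")

# Hypothetical replicate R^2 values (NOT data from the paper).
model_a = [0.451, 0.447, 0.460, 0.455, 0.449, 0.458, 0.452, 0.446, 0.457, 0.453]
model_b = [0.450, 0.455, 0.448, 0.459, 0.444, 0.456, 0.451, 0.447, 0.461, 0.449]
t_stat, dof = welch_t(model_a, model_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {two_sided_p(t_stat, dof):.3f}")
```

With ten replicates per method, the Welch-Satterthwaite degrees of freedom fall at or below 18, which is consistent with the non-integer value of 17.3 given in the caption.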
