Fig. 3: Performance comparison of kcat/Km models.
From: Robust enzyme discovery and engineering with deep learning using CataPro

a Distribution of kcat/Km values in the kcat/Km dataset. The kcat/Km dataset consists of samples for which both kcat and Km are reported, divided into ten groups using an enzyme sequence similarity threshold of 0.4. b Distribution of experimental kcat/Km values in each fold of the ten-fold unbiased dataset, shown in different colors; the white line in the body of each violin plot marks the median. Fold 0 contains 2,584 data points, while each of the other nine folds contains 2,583. c Performance of CataPro on the ten-fold unbiased kcat/Km dataset; the colorbar represents data density. d, e, and f show, respectively, the Pearson correlation coefficient (PCC), Spearman correlation coefficient (SCC), and root-mean-square error (RMSE) achieved by the kcat/Km prediction models. CataPro is highlighted in red in panels d-f. Source data are provided as a Source Data file.
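Panels d-f compare models using PCC, SCC, and RMSE between predicted and experimental kcat/Km values. The sketch below computes these three metrics with the Python standard library only. It assumes predictions and measurements are paired lists on the same scale (for example, log10 kcat/Km; the caption does not state the scale). The function names are hypothetical and illustrative, not CataPro's code. Spearman ties are handled with average ranks.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> list:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation (SCC): the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Bundle the three metrics shown in panels d-f."""
    return {
        "PCC": pearson(y_true, y_pred),
        "SCC": spearman(y_true, y_pred),
        "RMSE": rmse(y_true, y_pred),
    }


if __name__ == "__main__":
    # Toy values for illustration only, not data from the paper.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 1.5, 3.5, 4.5]
    print({k: round(v, 3) for k, v in evaluate(measured, predicted).items()})
    # prints {'PCC': 0.947, 'SCC': 0.949, 'RMSE': 0.5}
```

In a ten-fold setting like the one in panel b, `evaluate` would typically be applied to each held-out fold's predictions. The caption does not say whether the reported values are per-fold averages or are computed on pooled out-of-fold predictions. Those two choices can give slightly different numbers.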