Fig. 3: Schematic of CatPred-DB dataset splits and evaluation of CatPred models.
From: CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters

a CatPred-DB dataset sizes used for training, held-out testing and out-of-distribution testing, shown as Venn diagrams. b Coefficient of determination (R²) values obtained by trained CatPred models for kcat, Km and Ki prediction on held-out test sets (solid bars) and on out-of-distribution samples (patterned bars). The out-of-distribution samples are subsets of the full test sets, extracted so that no enzyme sequence in the subset is more than 99% similar to any training sequence. ‘Substrate Only’ refers to CatPred models trained using only the substrate features; ‘Substrate+Seq-Attn’ (Sequence Attention) refers to CatPred models trained using substrate features and the Seq-Attn features; ‘Substrate+Seq-Attn+pLM’ (protein Language Model) refers to CatPred models trained using substrate features along with both the Seq-Attn and pLM features; ‘Substrate+Seq-Attn+pLM+EGNN’ (Equivariant Graph Neural Network) refers to CatPred models trained using substrate features along with the Seq-Attn, pLM and EGNN features.
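
The out-of-distribution filter and the R² metric described in this caption can be illustrated with a minimal sketch. It is not the CatPred authors' code. The helper names (`similarity`, `ood_subset`, `r_squared`) are hypothetical, and the similarity measure is a stand-in built on Python's `difflib`, since the caption does not say how sequence similarity was computed. Only the 99% threshold and the standard definition R² = 1 − SS_res / SS_tot are taken as given.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1].

    ASSUMPTION: this is only a placeholder; the caption does not specify
    the similarity measure used to build the out-of-distribution subsets.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def ood_subset(test_seqs, train_seqs, threshold=0.99):
    """Return indices of test sequences that are at most `threshold`
    similar to every training sequence."""
    return [
        i
        for i, test_seq in enumerate(test_seqs)
        if all(similarity(test_seq, train_seq) <= threshold
               for train_seq in train_seqs)
    ]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data, invented purely for illustration.
train = ["MKTAYIAKQR"]
test = ["MKTAYIAKQR", "GGGGSSSSPP"]
ood_idx = ood_subset(test, train)  # the exact duplicate is excluded
```

In this sketch the held-out R² would be computed over all test samples and the out-of-distribution R² over only the samples indexed by `ood_idx`, which corresponds to the solid and patterned bars in panel b.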