Table 7 Comparison of F1 scores on the test set for the ESM-2 token classification model and the GNN model, alongside the F1 scores of ProsperousPlus, obtained by training on our dataset and evaluating on our test set. A dash (–) indicates that training failed for the corresponding protease because the ProsperousPlus code did not complete its calculations.

From: Prediction of peptide cleavage sites using protein language models and graph neural networks

| MEROPS ID | ESM-2 Token Classification F1 | Graph Neural Network F1 | ProsperousPlus F1 (1:1 undersampling training set) | ProsperousPlus F1 (no undersampling training set) |
|---|---|---|---|---|
| C14.001 | 0.3896 | 0.1613 | 0.0359 | 0.0000 |
| C14.003 | 0.5389 | 0.367 | 0.0837 | – |
| C14.004 | 0.3394 | 0.1393 | 0.0469 | 0.0563 |
| C14.005 | 0.8655 | 0.737 | 0.4547 | 0.6713 |
| M10.003 | 0.5966 | 0.2513 | 0.0853 | – |
| M10.005 | 0.2420 | 0.0887 | 0.0567 | – |
| S01.010 | 0.3456 | 0.1672 | 0.0438 | – |
| S01.217 | 0.7671 | 0.3692 | 0.1405 | 0.2083 |