Table 1 Breakdowns of Dataset 1 and Dataset 2.

From: PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features

 

| | Dataset 1: TE125 (test set) | Dataset 1: TR1115 (train set) | Dataset 2: TE639 (test set) | Dataset 2: TR640 (train set) |
|---|---|---|---|---|
| No. of proteins | 125 | 1115 | 639 | 640 |
| No. of residues | 30,870 | 266,712 | 150,330 | 157,362 |
| No. of non-binding residues | 29,154 | 251,770 | 141,840 | 149,103 |
| No. of binding residues | 1716 | 14,942 | 8490 | 8259 |
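
The counts in Table 1 can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original work: it copies the numbers from the table into a dictionary, confirms that binding and non-binding residues sum to the residue total for each set, and computes the fraction of binding residues. The names `TABLE_1`, `is_consistent`, and `binding_fraction` are hypothetical and chosen only for illustration.

```python
# Counts transcribed from Table 1; the dictionary layout is an assumption
# made for this sketch, not a format used by the source.
TABLE_1 = {
    "TE125":  {"proteins": 125,  "residues": 30870,  "non_binding": 29154,  "binding": 1716},
    "TR1115": {"proteins": 1115, "residues": 266712, "non_binding": 251770, "binding": 14942},
    "TE639":  {"proteins": 639,  "residues": 150330, "non_binding": 141840, "binding": 8490},
    "TR640":  {"proteins": 640,  "residues": 157362, "non_binding": 149103, "binding": 8259},
}


def is_consistent(name: str) -> bool:
    """Return True if binding + non-binding residues equal the residue total."""
    row = TABLE_1[name]
    return row["non_binding"] + row["binding"] == row["residues"]


def binding_fraction(name: str) -> float:
    """Return the fraction of residues labelled as binding in the given set."""
    row = TABLE_1[name]
    return row["binding"] / row["residues"]


if __name__ == "__main__":
    for name in TABLE_1:
        print(f"{name}: consistent={is_consistent(name)}, "
              f"binding fraction={binding_fraction(name):.2%}")
```

In every set the binding residues make up only a small share of all residues (roughly 5 to 6 percent), so the two classes are heavily imbalanced.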