Table 1 Summary of all datasets and tasks

From: Learning the language of protein-protein interactions

| Task | Reference | Train / validation / test size (as listed) | Task type | Projector used | Evaluation metric |
|---|---|---|---|---|---|
| General PPI prediction | | | | | |
| Gold-standard PPI | 22 | 163019 / 59260 / 52048 | Binary classification | MLP | AUPRC |
| Human-PPI | 67 | 26319 / 234 / 180 | Binary classification | MLP | Accuracy |
| Yeast-PPI | 12 | 4945 / 95 / 394 | Binary classification | MLP | Accuracy |
| PDB-Bind | 28 | 4945 / 95 / 394 | Regression | MLP | Pearson correlation |
| SKEMPI | 23 | 4777 / 1929 | Regression | MLP | Pearson correlation |
| MutationalPPI | 29 | 3406 | Binary classification | MLP | AUPRC |
| Antibody tasks | | | | | |
| FLAB (Binding 422) | 35 | 422 | Regression | Ridge regression | R² |
| FLAB (Binding 2048) | 36 | 2048 | Regression | Ridge regression | R² |
| FLAB (Binding 4275) | 37 | 4275 | Regression | Ridge regression | R² |
| FLAB (Expression 4275) | 37 | 4275 | Regression | Ridge regression | R² |
| SARS-CoV2 binding | 38 | 86929 | Regression | Ridge regression | Spearman rank correlation |
| TCR-Epitope-MHC tasks | | | | | |
| TDC-Tchard | 44 | 522239 / 71666 | Binary classification | MLP | AUROC |
| TCR-Epitope-HLA | 17 | 28144 / 71036 / 2806 | Binary classification | MLP | AUROC |
| TCR-epitope interface prediction | 46 | 122 | Interface prediction | CNN | AUPRC |
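
For readers less familiar with the evaluation metrics named in Table 1, the following is a minimal, dependency-free sketch of how accuracy, AUROC, AUPRC (computed here as average precision), Pearson correlation, Spearman rank correlation, and the coefficient of determination can be computed. The function names and the 0.5 decision threshold are illustrative assumptions, not taken from the article, and the article's own evaluation code may differ (for example in how tied scores are handled).

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions; the threshold is an assumed default."""
    return mean(int((s >= threshold) == bool(l)) for l, s in zip(labels, scores))


def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, l in zip(ranks, labels) if l == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auprc(labels, scores):
    """Average precision: mean precision at each positive, by descending score.

    Tied scores are ordered arbitrarily in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

The classification metrics take binary labels (0 or 1) and real-valued scores; the regression metrics take true and predicted values. AUPRC is often preferred over accuracy or AUROC when positives are rare, because its value depends directly on how many high-scoring predictions are true positives.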