Table 1 Performance comparison on the CATH 4.2 and CATH 4.3 datasets with topology classification split

From: Mask-prior-guided denoising diffusion improves inverse protein folding

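| Dataset | Models | External knowledge | Model parameters | Perplexity (↓), Short | Perplexity (↓), Single-chain | Perplexity (↓), Full | Median recovery rate (%, ↑), Short | Median recovery rate (%, ↑), Single-chain | Median recovery rate (%, ↑), Full |
|---|---|---|---|---|---|---|---|---|---|
| CATH 4.2 | ᵃStructGNN [26] | | 1.4M | 8.29 | 8.74 | 6.40 | 29.44 | 28.26 | 35.91 |
| CATH 4.2 | ᵃGraphTrans [26] | | 1.5M | 8.39 | 8.83 | 6.63 | 28.14 | 28.46 | 35.82 |
| CATH 4.2 | ᵃGVP [43] | | 2.0M | 7.09 | 7.49 | 6.05 | 32.62 | 31.10 | 37.64 |
| CATH 4.2 | ᵃAlphaDesign [44] | | 6.6M | 7.32 | 7.63 | 6.30 | 34.16 | 32.66 | 41.31 |
| CATH 4.2 | ProteinMPNN [1] | | 1.9M | 6.90 | 7.03 | 4.70 | 36.45 | 35.29 | 48.63 |
| CATH 4.2 | PiFold [13] | | 6.6M | 5.97 | 6.13 | 4.61 | 39.17 | 42.43 | 51.40 |
| CATH 4.2 | LM-Design [45] | | 659M | 6.86 | 6.82 | 4.55 | 37.66 | 38.94 | 53.19 |
| CATH 4.2 | GRADE-IF [38] | | 7.0M | 5.65 | 6.46 | 4.40 | 45.84 | 42.73 | 52.63 |
| CATH 4.2 | MapDiff (uniform prior) | | 14.7M | *3.99* | *4.43* | *3.46* | *52.85* | **50.00** | **61.03** |
| CATH 4.2 | MapDiff (marginal prior) | | 14.7M | **3.96** | **4.41** | **3.43** | **54.04** | *49.34* | *60.93* |
| CATH 4.3 | ᵃGVP-GNN-Large [27] | | 21M | 7.68 | 6.12 | 6.17 | 32.60 | 39.40 | 39.20 |
| CATH 4.3 | ᵃ+ AF2 predicted data | | 142M | 6.11 | 4.09 | 4.08 | 38.30 | 50.08 | 50.08 |
| CATH 4.3 | ᵃGVP-Transformer [27] | | 21M | 8.18 | 6.33 | 6.44 | 31.30 | 38.50 | 38.30 |
| CATH 4.3 | ᵃ+ AF2 predicted data | | 142M | 6.05 | 4.00 | 4.01 | 38.10 | 51.50 | 51.60 |
| CATH 4.3 | ProteinMPNN [1] | | 1.9M | 6.12 | 6.18 | 4.63 | 40.00 | 39.13 | 47.66 |
| CATH 4.3 | PiFold [13] | | 6.6M | 5.52 | 5.00 | 4.38 | 43.06 | 45.54 | 51.45 |
| CATH 4.3 | LM-Design [45] | | 659M | 6.01 | 5.73 | 4.47 | 44.44 | 45.31 | 53.66 |
| CATH 4.3 | GRADE-IF [38] | | 7.0M | 5.30 | 6.05 | 4.58 | 48.21 | 45.94 | 52.24 |
| CATH 4.3 | MapDiff (uniform prior) | | 14.7M | **3.88** | *3.85* | **3.48** | **55.95** | *54.65* | **60.86** |
| CATH 4.3 | MapDiff (marginal prior) | | 14.7M | *3.90* | **3.83** | *3.52* | *55.56* | **54.99** | *60.68* |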

  1. The results include the perplexity and median recovery rate on the full test set, as well as on the short and single-chain subsets. The external knowledge column indicates whether additional training data or protein language models are used. ᵃWe also quote partial baseline results from refs. 13 and 27 for comparative analysis.
  2. The best result for each dataset and metric is marked in bold and the second-best result is in italics.
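
The two metrics reported in the table can be illustrated with a minimal Python sketch. It assumes the standard definitions: recovery rate is the percentage of positions where the designed residue matches the native one (with the median taken over proteins), and perplexity is the exponential of the mean negative log-likelihood assigned to the native residues. The function names and the choice to pool log-probabilities over all residues are illustrative assumptions, not the evaluation code used for the table.

```python
import math
from statistics import median


def recovery_rate(native: str, designed: str) -> float:
    """Percentage of positions where the designed residue equals the native one."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(n == d for n, d in zip(native, designed))
    return 100.0 * matches / len(native)


def median_recovery(pairs) -> float:
    """Median of per-protein recovery rates over (native, designed) pairs."""
    return median(recovery_rate(n, d) for n, d in pairs)


def perplexity(native_log_probs) -> float:
    """exp(mean negative log-likelihood) of the native residues.

    `native_log_probs` holds natural-log probabilities that a model assigns
    to each native residue. Pooling them over all residues is an assumption;
    a per-protein average is an equally plausible convention.
    """
    values = list(native_log_probs)
    if not values:
        raise ValueError("at least one log-probability is required")
    return math.exp(-sum(values) / len(values))
```

As a sanity check, a model that assigns uniform probability 1/20 to every amino acid has a perplexity of about 20, so lower values indicate a more confident and more accurate predictor, which matches the ↓ direction in the table header.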