Table 1 Ablation study on various settings of visual encoder architectures

From: Large-scale long-tailed disease diagnosis on radiology images

| Visual Encoder | Architecture: Normalisation | Architecture: Shared Enc. | AUC | AP | F1 | MCC | R@0.01 | R@0.05 | R@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| ViT | 2-layer MLP | 6-layer ViT | 79.98/77.24 | 6.01/4.99 | 13.57/7.36 | 14.69/7.95 | 17.78/10.87 | 33.56/24.66 | 47.21/35.02 |
| | 4-layer MLP | 6-layer ViT | 80.57/77.74 | 6.13/5.20 | 13.49/8.41 | 14.78/9.53 | 17.68/11.48 | 34.01/25.79 | 47.44/36.12 |
| | 2-layer MLP | 12-layer ViT | 81.69/78.15 | 6.40/5.26 | 14.71/8.89 | 15.30/9.74 | 18.11/11.33 | 34.73/25.97 | 48.84/36.20 |
| | 4-layer MLP | 12-layer ViT | 82.03/78.59 | 6.67/5.37 | 14.94/8.87 | 15.66/9.77 | 18.20/12.09 | 34.99/26.58 | 49.52/36.74 |
| ResNet | ResNet-18 | ResNet-18 | 86.91/81.16 | 11.00/5.27 | 16.77/9.21 | 18.63/11.48 | 20.42/12.63 | 41.87/29.20 | 59.38/42.04 |
| | ResNet-34 | ResNet-18 | 86.99/81.75 | 11.15/5.70 | 17.14/10.06 | 19.21/11.47 | 20.82/13.61 | 44.67/30.73 | 61.13/43.54 |
| | ResNet-18 | ResNet-34 | 87.06/82.09 | 11.27/6.09 | 17.36/10.15 | 19.23/12.00 | 21.48/1.96 | 44.38/31.46 | 61.54/44.13 |
| | ResNet-34 | ResNet-34 | 87.10/82.44 | 11.31/6.32 | 17.66/10.06 | 19.41/12.34 | 21.33/13.88 | 44.19/31.60 | 62.25/44.24 |
| ResNet-ViT | ResNet-34 | 6-layer ViT | 88.74/84.02 | 11.52/7.07 | 17.86/11.30 | 20.05/14.11 | 21.92/15.10 | 44.63/33.38 | 63.09/47.81 |
| | ResNet-50 | 6-layer ViT | 89.53/84.76 | 11.75/7.74 | 19.59/12.51 | 20.61/15.01 | 23.18/15.33 | 51.34/33.92 | 67.39/48.67 |
| | ResNet-34 | 12-layer ViT | 88.93/84.23 | 11.4/7.52 | 18.07/11.86 | 20.09/14.49 | 22.38/15.19 | 45.23/33.65 | 65.04/48.07 |
| | ResNet-50 | 12-layer ViT | 89.56/84.95 | 11.73/7.72 | 19.73/12.36 | 21.16/14.97 | 22.58/15.81 | 51.64/34.35 | 67.92/48.98 |

  1. “Normalisation” denotes the separate part of the visual encoder that performs 2D and 3D normalisation, and “Shared Enc.” denotes the part of the encoder shared by 2D and 3D scans. In each cell, the value before ‘/’ is the result on the subset with the top 200 classes, and the value after ‘/’ is the result on the subset with 200 random classes.
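
The ‘/’ convention in the footnote can be illustrated with a minimal sketch that splits a cell into its two values and compares two rows. The `Cell` type and `parse_cell` helper are hypothetical names introduced here for illustration; they are not part of the paper or its code. The two AUC cells are copied from the table.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical container for one table cell of the form 'a/b'."""
    top200: float     # result on the subset with the top 200 classes
    random200: float  # result on the subset with 200 random classes


def parse_cell(text: str) -> Cell:
    """Split a 'top/random' cell into its two numeric values."""
    top, rand = text.strip().split("/")
    return Cell(float(top), float(rand))


# AUC cells copied from the table.
auc_vit = parse_cell("82.03/78.59")         # ViT: 4-layer MLP + 12-layer ViT
auc_resnet_vit = parse_cell("89.56/84.95")  # ResNet-ViT: ResNet-50 + 12-layer ViT

# Difference between the two configurations on each subset.
gain_top = auc_resnet_vit.top200 - auc_vit.top200
gain_random = auc_resnet_vit.random200 - auc_vit.random200
print(f"AUC gain: {gain_top:.2f} (top 200), {gain_random:.2f} (random 200)")
```

Keeping the two values in separate named fields avoids mixing the top-200 and random-200 subsets when rows are compared.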