Table 1 Ablation study on various settings of visual encoder architectures
From: Large-scale long-tailed disease diagnosis on radiology images
| Visual Encoder | Architecture: Normalisation | Architecture: Shared Enc | AUC | AP | F1 | MCC | R@0.01 | R@0.05 | R@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| ViT | 2-layer MLP | 6-layer ViT | 79.98/77.24 | 6.01/4.99 | 13.57/7.36 | 14.69/7.95 | 17.78/10.87 | 33.56/24.66 | 47.21/35.02 |
| ViT | 4-layer MLP | 6-layer ViT | 80.57/77.74 | 6.13/5.20 | 13.49/8.41 | 14.78/9.53 | 17.68/11.48 | 34.01/25.79 | 47.44/36.12 |
| ViT | 2-layer MLP | 12-layer ViT | 81.69/78.15 | 6.40/5.26 | 14.71/8.89 | 15.30/9.74 | 18.11/11.33 | 34.73/25.97 | 48.84/36.20 |
| ViT | 4-layer MLP | 12-layer ViT | 82.03/78.59 | 6.67/5.37 | 14.94/8.87 | 15.66/9.77 | 18.20/12.09 | 34.99/26.58 | 49.52/36.74 |
| ResNet | ResNet-18 | ResNet-18 | 86.91/81.16 | 11.00/5.27 | 16.77/9.21 | 18.63/11.48 | 20.42/12.63 | 41.87/29.20 | 59.38/42.04 |
| ResNet | ResNet-34 | ResNet-18 | 86.99/81.75 | 11.15/5.70 | 17.14/10.06 | 19.21/11.47 | 20.82/13.61 | 44.67/30.73 | 61.13/43.54 |
| ResNet | ResNet-18 | ResNet-34 | 87.06/82.09 | 11.27/6.09 | 17.36/10.15 | 19.23/12.00 | 21.48/1.96 | 44.38/31.46 | 61.54/44.13 |
| ResNet | ResNet-34 | ResNet-34 | 87.10/82.44 | 11.31/6.32 | 17.66/10.06 | 19.41/12.34 | 21.33/13.88 | 44.19/31.60 | 62.25/44.24 |
| ResNet-ViT | ResNet-34 | 6-layer ViT | 88.74/84.02 | 11.52/7.07 | 17.86/11.30 | 20.05/14.11 | 21.92/15.10 | 44.63/33.38 | 63.09/47.81 |
| ResNet-ViT | ResNet-50 | 6-layer ViT | 89.53/84.76 | 11.75/7.74 | 19.59/12.51 | 20.61/15.01 | 23.18/15.33 | 51.34/33.92 | 67.39/48.67 |
| ResNet-ViT | ResNet-34 | 12-layer ViT | 88.93/84.23 | 11.4/7.52 | 18.07/11.86 | 20.09/14.49 | 22.38/15.19 | 45.23/33.65 | 65.04/48.07 |
| ResNet-ViT | ResNet-50 | 12-layer ViT | 89.56/84.95 | 11.73/7.72 | 19.73/12.36 | 21.16/14.97 | 22.58/15.81 | 51.64/34.35 | 67.92/48.98 |