Table 8. Comparison of our proposed model against existing supervised baselines. All baselines are trained end-to-end without SSL or enhanced augmentation. For each metric, the highest and second-highest values are shown in bold.
| Model | ROI extraction method | Architecture | F1 | AUC | Accuracy |
|---|---|---|---|---|---|
| Jang et al.⁹ | Manual cropping | VGG16 | 0.43 ± 0.05 | 0.63 ± 0.03 | 0.62 ± 0.04 |
| Hsieh et al.⁸ | Landmark-based cropping | VGG16 | 0.48 ± 0.04 | 0.65 ± 0.04 | 0.65 ± 0.03 |
| Wang et al.²² | Landmark-based cropping | VGG16 + Transformer | 0.46 ± 0.06 | 0.64 ± 0.05 | 0.64 ± 0.03 |
| Ho et al. (DeepDXA)¹⁰ | Segmentation-based | ResNet18 | **0.51 ± 0.03** | **0.66 ± 0.02** | **0.67 ± 0.02** |
| Ours (full framework) | Segmentation-based + SSL + enhanced augmentation | ResNet50 | **0.68 ± 0.03** | **0.85 ± 0.01** | **0.82 ± 0.02** |
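The table does not state how F1, AUC, and accuracy are computed or what the "±" term denotes. As a reading aid only, the following is a minimal sketch under stated assumptions: binary labels, AUC taken to mean ROC AUC, and "±" read as the sample standard deviation across evaluation folds or runs. The helper names (`accuracy`, `f1_score`, `roc_auc`, `summarize`) are hypothetical. They are not the authors' code or any library's API.

```python
from statistics import mean, stdev

# Assumptions (not stated in Table 8): binary labels, ROC AUC,
# and "±" = sample standard deviation across folds or runs.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with ties counted as 1/2. Requires both classes to be present."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def summarize(values):
    """Format per-fold metric values as 'mean ± sample SD' to two decimals."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"
```

For example, with hypothetical labels `[1, 0, 1, 1, 0]`, predictions `[1, 0, 0, 1, 1]`, and scores `[0.9, 0.2, 0.4, 0.8, 0.6]`, these helpers give an accuracy of 0.6, an F1 of 2/3, and a ROC AUC of 5/6. The pairwise AUC is quadratic in the number of samples and is meant only to make the definition explicit.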