Table 5. Comparative performance of the baseline architectures (ResNet-50, EfficientNet-B0, ViT-B/16) and the proposed ResViT model on the MCI dataset. ResViT achieves the highest accuracy (74.09%) and the lowest loss (0.5260), which supports combining convolutional and transformer-based feature extraction.
From: MCI detection from handwritten drawing test using residual vision transformer
| Model | Parameters (M) | Precision | Recall | F1-Score | Accuracy (%) | Loss |
|---|---|---|---|---|---|---|
| ResNet-50 | 25.6 | 0.58 | 0.57 | 0.57 | 57.25 | 1.2706 |
| EfficientNet-B0 | 5.3 | 0.66 | 0.65 | 0.66 | 66.30 | 1.2281 |
| ViT-B/16 | 86.5 | 0.70 | 0.69 | 0.70 | 70.65 | 0.6053 |
| Proposed ResViT | 32.1 | 0.73 | 0.74 | 0.67 | 74.09 | 0.5260 |
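
A reader may want to check how the reported F1-scores relate to the reported precision and recall. The sketch below is an illustration only, not the authors' evaluation code: it assumes F1 is the harmonic mean of the tabulated precision and recall, which the source does not state. If the metrics are macro-averaged over classes, the mean of the per-class F1 values need not equal this figure. The names `rows`, `harmonic_f1`, and `f1_gaps` are hypothetical.

```python
# Values copied from Table 5: (precision, recall, reported F1-score).
rows = {
    "ResNet-50": (0.58, 0.57, 0.57),
    "EfficientNet-B0": (0.66, 0.65, 0.66),
    "ViT-B/16": (0.70, 0.69, 0.70),
    "Proposed ResViT": (0.73, 0.74, 0.67),
}


def harmonic_f1(precision, recall):
    """Return F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_gaps(table):
    """Map each model to (harmonic F1, reported F1 minus harmonic F1)."""
    gaps = {}
    for model, (precision, recall, reported) in table.items():
        f1 = harmonic_f1(precision, recall)
        gaps[model] = (f1, reported - f1)
    return gaps


if __name__ == "__main__":
    for model, (f1, gap) in f1_gaps(rows).items():
        print(f"{model:16s} harmonic F1 = {f1:.3f}  gap = {gap:+.3f}")
```

Under this assumption, the three baseline rows agree with the harmonic mean to within rounding, while the ResViT row differs by about 0.06. The source does not say whether this reflects per-class averaging or some other cause.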