Table 5 Comparative performance of baseline architectures (ResNet-50, ViT-B/16, EfficientNet-B0) and the proposed ResViT model on the MCI dataset. The ResViT model achieves the highest precision (0.73), recall (0.74), and accuracy (74.09%) and the lowest loss (0.5260), which supports combining convolutional and transformer-based feature extraction. Its reported F1-score (0.67), however, is below that of ViT-B/16 (0.70).

From: MCI detection from handwritten drawing test using residual vision transformer

| Model | Parameters (M) | Precision | Recall | F1-Score | Accuracy (%) | Loss |
|---|---|---|---|---|---|---|
| ResNet-50 | 25.6 | 0.58 | 0.57 | 0.57 | 57.25 | 1.2706 |
| EfficientNet-B0 | 5.3 | 0.66 | 0.65 | 0.66 | 66.30 | 1.2281 |
| ViT-B/16 | 86.5 | 0.70 | 0.69 | 0.70 | 70.65 | 0.6053 |
| Proposed ResViT | 32.1 | 0.73 | 0.74 | 0.67 | 74.09 | 0.5260 |
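
As a reading aid for the precision, recall, and F1-score columns, the sketch below recomputes F1 as the harmonic mean of each row's reported precision and recall and compares it with the reported F1. This is only a consistency check under an assumption: the table does not state how the metrics were averaged across classes, and a macro-averaged F1 need not equal the harmonic mean of macro-averaged precision and recall. The names `rows`, `harmonic_f1`, and `f1_gaps` are illustrative and do not come from the paper.

```python
# Values copied from Table 5: (precision, recall, reported F1-score).
rows = {
    "ResNet-50": (0.58, 0.57, 0.57),
    "EfficientNet-B0": (0.66, 0.65, 0.66),
    "ViT-B/16": (0.70, 0.69, 0.70),
    "Proposed ResViT": (0.73, 0.74, 0.67),
}


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of a single precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_gaps(table):
    """Return (harmonic-mean F1 minus reported F1) for every model.

    Assumption: a small gap is expected from rounding alone; a larger gap
    may reflect per-class averaging, which the table does not specify.
    """
    return {
        name: round(harmonic_f1(p, r) - reported, 3)
        for name, (p, r, reported) in table.items()
    }


if __name__ == "__main__":
    for name, gap in f1_gaps(rows).items():
        print(f"{name}: harmonic-mean F1 minus reported F1 = {gap:+.3f}")
```

For the three baselines the gap stays within rounding distance of the two-decimal values, while for ResViT the harmonic mean of the reported precision and recall is noticeably higher than the reported F1-score. Under the stated assumption this does not show which value is correct, only that the ResViT row depends on how the metrics were averaged.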