Table 4 Evaluation results of pre-trained Vision Transformers (ViTs) and their ensembles using weighted and soft voting.

Model/method	Accuracy	Precision	Recall	F1	MCC	Inference time
ViT-base³⁹	0.80	0.81	0.80	0.77	0.46	0.8 ms
ViT-large³⁹	0.87	0.87	0.87	0.87	0.62	15.5 ms
Swin-tiny⁴¹	0.85	0.85	0.85	0.84	0.65	4.7 ms
Swin-base⁴¹	0.90	0.90	0.90	0.89	0.77	3.1 ms
DeiT-small⁴⁰	0.83	0.82	0.83	0.82	0.60	10.7 ms
DeiT-base⁴⁰	0.87	0.87	0.87	0.87	0.64	4.7 ms
Vision transformer ensemble
Weighted voting ensemble	0.91	0.91	0.91	0.90	0.76	31.56 ms
Soft voting ensemble	0.93	0.94	0.93	0.93	0.77	7.0 ms
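
The two ensemble rows combine the class probabilities of the individual ViTs. The table does not say how the votes are combined or which weights were used, so the following is only a minimal sketch under one common reading: soft voting averages the per-model probabilities uniformly, and weighted voting takes a weighted average of them. The function name `soft_vote`, the toy probabilities, and the weights `[1, 3, 3]` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(
    probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Combine per-model class probabilities and return predicted class indices.

    probs[m][i][c] is the probability that model m assigns to class c for sample i.
    With weights=None every model counts equally (plain soft voting);
    otherwise each model's probabilities are scaled by its weight (weighted voting).
    """
    n_models = len(probs)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(probs[0])
    n_classes = len(probs[0][0])
    predictions = []
    for i in range(n_samples):
        # Weighted mean probability of each class across all models.
        combined = [
            sum(w * probs[m][i][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        # Predict the class with the highest combined probability.
        predictions.append(max(range(n_classes), key=combined.__getitem__))
    return predictions


# Hypothetical outputs of three models on two samples with two classes.
toy_probs = [
    [[0.9, 0.1], [0.4, 0.6]],  # model 0
    [[0.4, 0.6], [0.3, 0.7]],  # model 1
    [[0.4, 0.6], [0.7, 0.3]],  # model 2
]

print(soft_vote(toy_probs))             # uniform soft voting
print(soft_vote(toy_probs, [1, 3, 3]))  # weighted voting, assumed weights
```

In this toy case, uniform averaging lets the confident model 0 decide the first sample, while down-weighting model 0 flips that prediction. In practice the weights are often derived from each model's validation performance, but that choice is an assumption here.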
