Table 5. Hyperparameter configurations of the compared models.

Model	Batch size	Optimizer	Learning rate	Epochs	Activation	Dropout rate	Layers	Attention heads	Embedding dimension	Patch size
Baseline	16	Adam	1e-4	100	ReLU	0.1	12	8	384	16 × 16
CrossViT¹⁹	16	AdamW	3e-4	100	GELU	0.1	16	8	192 and 384	Multi-scale
ViTfSCD³⁰		Adam	1e-4		GELU	0.1	24	16	512	16 × 16
MedViT²⁰	32	Adam	1e-4	100	ReLU	0.1	12	–	768	16 × 16
FLATer¹⁶	16	Adam	1e-3	300	ReLU	0.1	12	–	512	16 × 16
CST³¹	8	SGD	1e-4	80	ReLU	0.1	12	–	512	16 × 16
AG-CNN¹⁸	32	SGD	1e-3	50	ReLU	0.1	–	–	–	–
GTCAD	16	AdamW	1e-4	100	ReLU	0.1	12	8	384 and 512	16 × 16
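As a minimal illustrative sketch, the GTCAD row can be captured as a configuration object, and two quantities that the table implies but does not state can be derived from it: the per-head dimension and the number of patch tokens. Two assumptions apply. First, the sketch follows the standard ViT convention that each embedding dimension is split evenly across the attention heads. Second, the input resolution of 224 × 224 is a hypothetical value that Table 5 does not report. The class name `GTCADConfig` and its method names are likewise hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GTCADConfig:
    # Values copied from the GTCAD row of Table 5.
    batch_size: int = 16
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    epochs: int = 100
    activation: str = "ReLU"
    dropout: float = 0.1
    num_layers: int = 12
    num_heads: int = 8
    embed_dims: Tuple[int, ...] = (384, 512)  # "384 and 512" in the table
    patch_size: int = 16  # 16 x 16 patches
    # ASSUMPTION: Table 5 gives no input resolution; 224 x 224 is hypothetical.
    image_size: int = 224

    def head_dims(self) -> Tuple[int, ...]:
        """Per-head dimension for each embedding width (standard ViT even split)."""
        for d in self.embed_dims:
            if d % self.num_heads != 0:
                raise ValueError(f"embedding dim {d} not divisible by {self.num_heads} heads")
        return tuple(d // self.num_heads for d in self.embed_dims)

    def num_patches(self) -> int:
        """Number of non-overlapping square patches per image."""
        if self.image_size % self.patch_size != 0:
            raise ValueError("image size must be a multiple of the patch size")
        side = self.image_size // self.patch_size
        return side * side


cfg = GTCADConfig()
print(cfg.head_dims())    # (48, 64)
print(cfg.num_patches())  # 196
```

Under these assumptions, both embedding widths split evenly across the 8 heads (384 / 8 = 48 and 512 / 8 = 64), and a 224 × 224 input yields (224 / 16)² = 196 patch tokens per image.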
