Table 3 Comparison of the methods' depth, size, parameters, and training and inference times [MB: megabyte; M: million; S: seconds per image (A6000 GPU time)].

Method	Depth	Size (MB)	Parameters (M)	Training time (S)	Inference time (S)
GoogleNet²⁴	22	27	5.631	0.0643	0.0433
VGG16²⁵	16	528	134.310	0.0857	0.0500
ResNet50²⁶	50	96	23.004	0.0700	0.0450
DenseNet201²⁷	201	77	20.037	0.0986	0.0550
Inceptionv3²⁸	48	89	24.377	0.8038	0.0513
Xception²⁹	71	85	37.916	0.0957	0.0517
InceptionResnetV2³⁰	164	209	54.325	0.1029	0.0533
NasnetLarge³¹	533	332	84.769	1.7736	0.4317
EfficientNetB7³²	438	256	63.818	2.7429	0.5083
Vision transformer¹⁷	225	327.366	85.817	0.1271	0.0817
CONVT³³	208	327.226	85.780	0.0950	0.0667
Beit_large³⁴	369	1354.662	304.662	7.7307	2.2033
RVT	340	74.965	19.626	4.3943	0.9200

The inference time includes both the validation and test datasets. Note that the parameter counts are the trainable parameters when each model is trained on our dataset, so they differ from the parameter counts of the original models.
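The short Python sketch below is a rough way to relate the Parameters and Size columns. It is not the authors' procedure, and it rests on stated assumptions: weights are stored as 32-bit floats (4 bytes each), and "MB" means 2^20 bytes. The fine-tuning case is also hypothetical. A 2048-feature backbone has its 1000-class ImageNet head replaced by a head for `NUM_CLASSES` classes, where `NUM_CLASSES` is an illustrative value not given in the table. This shows one common reason trainable counts differ from the original model. Under these assumptions the estimate reproduces the listed sizes for the Vision transformer and CONVT rows to within rounding. It does not hold for every row (e.g., RVT, Beit_large), so treat it only as a rough check.

```python
# Rough sketch relating the Parameters (M) and Size (MB) columns of Table 3.
# Assumptions (not stated in the table): weights are 32-bit floats
# (4 bytes each) and "MB" means 2**20 bytes (MiB).

BYTES_PER_FLOAT32 = 4
BYTES_PER_MIB = 2 ** 20


def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer: weight matrix plus bias."""
    return in_features * out_features + (out_features if bias else 0)


def float32_size_mib(num_params: float) -> float:
    """Approximate storage size in MiB for num_params float32 weights."""
    return num_params * BYTES_PER_FLOAT32 / BYTES_PER_MIB


# Hypothetical fine-tuning example: a 2048-feature backbone whose
# 1000-class ImageNet head is replaced by a head for NUM_CLASSES classes.
NUM_CLASSES = 4  # hypothetical; the class count is not given in this table
original_head = dense_params(2048, 1000)
new_head = dense_params(2048, NUM_CLASSES)
print(f"head parameters: {original_head:,} -> {new_head:,}")

# Size estimates from the parameter counts listed in Table 3 (millions).
rows = [("Vision transformer", 85.817), ("CONVT", 85.780), ("RVT", 19.626)]
for name, params_m in rows:
    print(f"{name}: ~{float32_size_mib(params_m * 1e6):.3f} MiB")
```

The estimate counts only the parameters passed in. Because the table's parameter column lists trainable parameters only, a model that also stores frozen weights or buffers can occupy more space than this estimate suggests.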
