Table 2 Performance comparison of HUNet with other efficient CNNs and ViTs on the MTACCR Test_1 set

Model	Top-1 (%)	Params (M)	Feature_dim	Total depths	Throughput, GPU (images/s)	Throughput, CPU (images/s)	Throughput, ONNX (images/s)
ShiftViT_32(Base)²³	91.09	12.81 (2.43)	1280	12 (2,2,6,2)	18,022	47.1	42.2
FasterNet_32(Base)²¹	86.37	12.95 (2.56)	1280	12 (2,2,6,2)	20,268	51.6	141.8
FasterNet_24_CA²¹,²⁹	89.78	11.92 (1.54)	1280	12 (2,2,6,2)	3124	20.7	—
MobileNet_V2_1.0×³⁴	89.63	12.61 (2.22)	1280	11	4893	30	93.7
MobileNet_V2_1.0×_CA²⁹,³⁴	93.61	13.05 (2.67)	1280	11	2638	21.8	—
MobileNet_V3_small²⁸	86.83	12.05 (1.67)	1280	11	16,904	111.8	322.6
MobileNet_V3_large²⁸	92.07	14.59 (4.20)	1280	15	9361	46.5	90.8
GhostNet_V2_1.0×³⁵	90.17	15.26 (4.88)	1280	16	4279	27.2	80.1
ShuffleNet_v2_x1.0³⁶	87.46	11.75 (1.38)	1280	16	4434	63.7	249
ShuffleNet_v2_x1.5³⁶	89.78	13.04 (2.66)	1280	16	2917	55	158.5
EfficientViT_M0³⁷	88.61	3.72 (2.16)	192	6 (1,2,3)	9471	49.2	330.3
HUNet_24_M0(Ours)	88.34	2.75 (1.19)	192(24 × 8)	12 (2,2,6,2)	17,318	64.9	117.7
HUNet_24(Ours)	92.59	11.82 (1.43)	1280	12 (2,2,6,2)	17,103	50.8	108.1
HUNet_32(Ours)	93.01	12.82 (2.43)	1280	12 (2,2,6,2)	17,777	39.1	81.4
HUNet_36(Ours)	93.22	13.42 (3.03)	1280	12 (2,2,6,2)	17,192	40.2	73.6
HUNet_48(Ours)	94.23	15.60 (5.22)	1280	12 (2,2,6,2)	14,806	32.7	53.9
Multi_HUNet_24(Ours)	93.28	13.89 (3.51)	1280	12 (2,2,6,2)	14,395	43.6	—

GPU and CPU throughput were measured on an Nvidia RTX 3090 GPU and an Intel(R) Core(TM) i9-10900K CPU @ 3.70 GHz, respectively. Higher throughput indicates faster inference. Values in parentheses in the Params column give the parameter count of the backbone network alone, and the number in a model name (e.g., the 24 in HUNet_24) is the number of channels C in the initial embedded feature maps.
Boldface indicates the best result on the test set, and underlining indicates the second-best result.
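
The footnote defines throughput as images processed per second and reports backbone parameters in parentheses, but it does not state the batch size, number of warm-up runs, or timing protocol used. The following is a minimal Python sketch under assumed settings: the helper names `measure_throughput` and `split_params`, and the `warmup`/`iters` defaults, are hypothetical and are not the procedure used to produce Table 2. The first function times repeated calls on one batch after untimed warm-up runs. The second splits a Params cell into total, backbone, and remaining (non-backbone) parameters.

```python
import time
from typing import Callable, Sequence


def measure_throughput(infer: Callable[[Sequence], object], batch: Sequence,
                       warmup: int = 5, iters: int = 20) -> float:
    """Return images/s for repeatedly running `infer` on one batch.

    Hypothetical protocol: warm-up and iteration counts are assumptions,
    not the settings used for Table 2.
    """
    for _ in range(warmup):          # untimed runs (caches, lazy initialisation)
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return len(batch) * iters / elapsed  # images processed per second


def split_params(cell: str) -> tuple[float, float, float]:
    """Parse a Params cell such as '12.81 (2.43)' into
    (total, backbone, non-backbone), all in millions."""
    total_str, backbone_str = cell.replace(")", "").split("(")
    total, backbone = float(total_str), float(backbone_str)
    return total, backbone, round(total - backbone, 2)


if __name__ == "__main__":
    # Dummy workload standing in for a model forward pass.
    fake_batch = list(range(256))
    ips = measure_throughput(lambda b: sum(x * x for x in b), fake_batch)
    print(f"{ips:,.0f} images/s")

    # Params cells copied from Table 2.
    for name, cell in [("HUNet_24", "11.82 (1.43)"), ("HUNet_24_M0", "2.75 (1.19)")]:
        total, backbone, rest = split_params(cell)
        print(f"{name}: {backbone} M backbone, {rest} M outside the backbone")
```

Applied to the table, this shows that most parameters of the 1280-dimensional models lie outside the backbone (for HUNet_24, 10.39 M of 11.82 M), which is why the parenthesised backbone sizes are the more direct comparison between architectures.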
