Table 5 Average throughput of our models (images processed per second) at training and inference time.

Model	BEiT	ViT	SWIN	ConvNeXt	Ensemble
Training (imgs/s) ↑	20.32	21.72	32.57	13.16	4.95
Inference (imgs/s) ↑	65.68	70.26	102.70	52.88	17.21

Values are averaged over 1000 iterations. Higher values indicate faster processing.
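The table reports throughput averaged over 1000 iterations but does not describe the timing protocol. The sketch below shows one common way to obtain such a number. It is a minimal illustration, not the authors' benchmark: the hypothetical `step` callable stands in for one training or inference step on a batch, and the warm-up iterations and batch handling are assumptions.

```python
import time
from typing import Callable, Sequence


def measure_throughput(step: Callable[[Sequence], None], batch: Sequence,
                       iterations: int = 1000, warmup: int = 10) -> float:
    """Return the average number of images processed per second.

    `step` is a hypothetical callable that runs one training or inference
    step on `batch`. The warm-up runs (an assumption, not stated in the
    source) are excluded from the timing. For GPU models, `step` should
    block until the device has finished (e.g. via torch.cuda.synchronize()),
    otherwise asynchronous kernels make the throughput look too high.
    """
    for _ in range(warmup):
        step(batch)
    start = time.perf_counter()
    for _ in range(iterations):
        step(batch)
    elapsed = time.perf_counter() - start
    # Total images processed divided by total wall-clock time.
    return iterations * len(batch) / elapsed
```

Averaging over the whole timed loop, rather than timing each iteration separately, keeps timer overhead out of the per-image cost.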

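The Ensemble column is consistent with the four models processing each image one after another: each member spends 1/t_i seconds per image, so the ensemble throughput is 1 / Σ 1/t_i. The source does not state how the ensemble is executed; the short check below only verifies that this sequential reading reproduces the reported values from the per-model throughputs in Table 5.

```python
# Per-model throughputs from Table 5 (images per second).
TRAINING = {"BEiT": 20.32, "ViT": 21.72, "SWIN": 32.57, "ConvNeXt": 13.16}
INFERENCE = {"BEiT": 65.68, "ViT": 70.26, "SWIN": 102.70, "ConvNeXt": 52.88}


def sequential_ensemble_throughput(throughputs: dict) -> float:
    """Throughput of an ensemble whose members run one after another.

    Per-image time is the sum of the members' per-image times (1 / t_i),
    and throughput is the reciprocal of that sum.
    """
    seconds_per_image = sum(1.0 / t for t in throughputs.values())
    return 1.0 / seconds_per_image


print(f"Training:  {sequential_ensemble_throughput(TRAINING):.2f} imgs/s")   # 4.95
print(f"Inference: {sequential_ensemble_throughput(INFERENCE):.2f} imgs/s")  # 17.21
```

Both values match the Ensemble column after rounding, which suggests that the ensemble's cost is roughly the sum of its members' costs, with no parallel execution across models.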