Table 6. Experimental results for different Transformer models.

From: Visual feature-based multi-scale hybrid attention network for fine-grained Hawthorn varieties identification

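| Model Name | Top-1 ACC (%) | Params (M) | FLOPs | FPS |
|---|---|---|---|---|
| ViT [40] | 70.3 | 86.57 | 16.86 | 17.17 |
| FocalNet [41] | 78.47 | 27.67 | 4.41 | 14.69 |
| Swin Transformer [29] | 85.67 | 28.29 | 4.36 | 24.29 |
| CMT [42] | 73.22 | 9.51 | 0.62 | 20.56 |
| CvT [43] | 73.76 | 20.23 | 4.53 | 22.83 |
| PVT [44] | 67.25 | 24.52 | 3.81 | 21.37 |
| MaxViT [51] | 75.42 | 30.92 | 5.48 | 14.73 |
| EfficientViT [52] | 83.52 | 2.3 | 79 | 49.62 |
| SwinFG [53] | 87.69 | 28.62 | 4.42 | 22.78 |
| Ours | 90.96 | 15.92 | 4.76 | 15.8 |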