Table 4. Information on the model backbones
Model | Architecture | Implementation | Version | Image Size | #Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|
DeiT [21] | Transformer | Distillation | Base | 384 | 86.10 | 55.65 |
ConvNeXt [22] | ConvNet | Hierarchy | Tiny | 384 | 27.83 | 13.14 |
EfficientNet [23] | ConvNet | Scaling | B4 | 380 | 17.56 | 4.51 |
Swin Transformer [24] | Transformer | Hierarchy | Base | 384 | 86.89 | 47.19 |
DINOv2 [25] | Transformer | Foundation Model | Base | 384 | 86.14 | 78.46 |
VisionFM [26] | Transformer | Foundation Model | Base | 384 | 86.46 | 55.54 |
RealMNet-Min (Ours) | Hybrid | Hierarchy, Pretraining, Distillation | 21M | 224 | 20.63 | 4.28 |
RealMNet (Ours) | Hybrid | Hierarchy, Pretraining, Distillation | 21M | 384 | 20.66 | 13.77 |
RealMNet-Max (Ours) | Hybrid | Hierarchy, Pretraining, Distillation | 21M | 512 | 20.70 | 27.02 |