Table 8. Comparison with transformer-based detectors.
Model | mAP (%) | F1-score (%) | FPS | Architecture Type |
---|---|---|---|---|
DETR (ResNet-50) | 94.8 | 93.2 | 7.1 | Transformer + CNN |
DINO (Swin-L) | 96.1 | 94.8 | 5.6 | Transformer (encoder-decoder) |
ViTDet (ViT-L) | 96.3 | 95.1 | 6.2 | Vision Transformer |
YOLOv10-ST (Proposed) | 97.62 | 96.58 | 48.7 | Swin Transformer |