Table 6 Efficiency comparison of different object detectors. In addition to the number of parameters (Params) and FLOPs, we report CUDA memory usage (Mem) and inference speed (FPS), measured on an NVIDIA A100 GPU (80GB). All models are evaluated with an input resolution of 1088\(\times\)800. T: teacher model, S: student model.
From: Instance mask alignment for object detection knowledge distillation
| Model | Params (M) | FLOPs (G) | Mem (MB) | FPS |
|---|---|---|---|---|
| Single-Stage Detectors (RetinaNet) | | | | |
| T: X101 | 95.86 | 424 | 367 | 29.4 |
| T: R101 | 56.96 | 283 | 220 | 30.7 |
| S: R50 | 37.97 | 215 | 148 | 41.9 |
| Two-Stage Detectors (Faster R-CNN) | | | | |
| T: X101 | 135.0 | 2014 | 528 | 20.6 |
| T: R101 | 60.75 | 255 | 244 | 31.1 |
| S: R50 | 41.75 | 187 | 171 | 42.1 |
| Anchor-Free Detectors (RepPoints) | | | | |
| T: X101 | 94.74 | 380 | 230 | 16.6 |
| T: R101 | 55.84 | 239 | 224 | 24.5 |
| S: R50 | 36.85 | 171 | 151 | 31.4 |
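
To make the teacher-to-student trade-off in the table easier to read, the following minimal sketch derives relative figures (parameter, FLOPs, and memory ratios, plus FPS speedup) from the reported values. It is an illustration only and not part of the paper's evaluation code: the names `Row`, `TABLE`, and `student_vs_teacher` are hypothetical, and the numbers are copied as printed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row: Params (M), FLOPs (G), Mem (MB), FPS."""
    params_m: float
    flops_g: float
    mem_mb: float
    fps: float


# Values copied as printed in Table 6 (hypothetical container, for illustration).
TABLE = {
    "RetinaNet": {
        "T: X101": Row(95.86, 424, 367, 29.4),
        "T: R101": Row(56.96, 283, 220, 30.7),
        "S: R50": Row(37.97, 215, 148, 41.9),
    },
    "Faster R-CNN": {
        "T: X101": Row(135.0, 2014, 528, 20.6),
        "T: R101": Row(60.75, 255, 244, 31.1),
        "S: R50": Row(41.75, 187, 171, 42.1),
    },
    "RepPoints": {
        "T: X101": Row(94.74, 380, 230, 16.6),
        "T: R101": Row(55.84, 239, 224, 24.5),
        "S: R50": Row(36.85, 171, 151, 31.4),
    },
}


def student_vs_teacher(detector: str, teacher: str, student: str = "S: R50") -> dict:
    """Return teacher/student cost ratios and the student's FPS speedup."""
    t = TABLE[detector][teacher]
    s = TABLE[detector][student]
    return {
        "params_ratio": t.params_m / s.params_m,  # > 1: teacher has more parameters
        "flops_ratio": t.flops_g / s.flops_g,     # > 1: teacher needs more compute
        "mem_ratio": t.mem_mb / s.mem_mb,         # > 1: teacher uses more CUDA memory
        "speedup": s.fps / t.fps,                 # > 1: student runs faster
    }


if __name__ == "__main__":
    for detector, rows in TABLE.items():
        for teacher in (name for name in rows if name.startswith("T:")):
            r = student_vs_teacher(detector, teacher)
            print(
                f"{detector:13s} {teacher}: "
                f"params x{r['params_ratio']:.2f}, FLOPs x{r['flops_ratio']:.2f}, "
                f"mem x{r['mem_ratio']:.2f}, speedup x{r['speedup']:.2f}"
            )
```

Ratios are used rather than absolute differences so that the three detector families, whose baseline costs differ, can be compared on the same scale.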