Table 6 Efficiency comparison of different object detectors. In addition to the number of parameters (Params) and FLOPs, we report CUDA memory usage (Mem) and inference speed (FPS), measured on an NVIDIA A100 GPU (80GB). All models are evaluated with an input resolution of \(1088 \times 800\). T: teacher model, S: student model.

From: Instance mask alignment for object detection knowledge distillation

| Model | Params (M) | FLOPs (G) | Mem (MB) | FPS |
|---|---|---|---|---|
| Single-Stage Detectors (RetinaNet) | | | | |
| T: X101 | 95.86 | 424 | 367 | 29.4 |
| T: R101 | 56.96 | 283 | 220 | 30.7 |
| S: R50 | 37.97 | 215 | 148 | 41.9 |
| Two-Stage Detectors (Faster R-CNN) | | | | |
| T: X101 | 135.0 | 2014 | 528 | 20.6 |
| T: R101 | 60.75 | 255 | 244 | 31.1 |
| S: R50 | 41.75 | 187 | 171 | 42.1 |
| Anchor-Free Detectors (RepPoints) | | | | |
| T: X101 | 94.74 | 380 | 230 | 16.6 |
| T: R101 | 55.84 | 239 | 224 | 24.5 |
| S: R50 | 36.85 | 171 | 151 | 31.4 |
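
To make the teacher-versus-student gap in Table 6 easier to read, the following minimal sketch computes, for each detector family, the student's cost as a fraction of a given teacher's cost, along with the FPS speedup. The values are copied from the table; the names `TABLE6` and `student_vs_teacher` are illustrative assumptions, not part of the original work.

```python
# Rows copied from Table 6: (Params in M, FLOPs in G, CUDA memory in MB, FPS).
# TABLE6 and student_vs_teacher are hypothetical names used for illustration.
TABLE6 = {
    "RetinaNet": {
        "T: X101": (95.86, 424, 367, 29.4),
        "T: R101": (56.96, 283, 220, 30.7),
        "S: R50": (37.97, 215, 148, 41.9),
    },
    "Faster R-CNN": {
        "T: X101": (135.0, 2014, 528, 20.6),
        "T: R101": (60.75, 255, 244, 31.1),
        "S: R50": (41.75, 187, 171, 42.1),
    },
    "RepPoints": {
        "T: X101": (94.74, 380, 230, 16.6),
        "T: R101": (55.84, 239, 224, 24.5),
        "S: R50": (36.85, 171, 151, 31.4),
    },
}


def student_vs_teacher(family, teacher):
    """Return the student's cost relative to the teacher, plus the FPS speedup."""
    t_params, t_flops, t_mem, t_fps = TABLE6[family][teacher]
    s_params, s_flops, s_mem, s_fps = TABLE6[family]["S: R50"]
    return {
        "params_ratio": s_params / t_params,  # below 1 means the student is smaller
        "flops_ratio": s_flops / t_flops,
        "mem_ratio": s_mem / t_mem,
        "speedup": s_fps / t_fps,  # above 1 means the student is faster
    }


if __name__ == "__main__":
    for family, rows in TABLE6.items():
        for teacher in ("T: X101", "T: R101"):
            r = student_vs_teacher(family, teacher)
            print(
                f"{family:13s} {teacher}: "
                f"params x{r['params_ratio']:.2f}, FLOPs x{r['flops_ratio']:.2f}, "
                f"mem x{r['mem_ratio']:.2f}, speedup x{r['speedup']:.2f}"
            )
```

For example, `student_vs_teacher("RetinaNet", "T: X101")` shows the R50 student using roughly 40% of the X101 teacher's parameters while running about 1.43 times faster.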