Table 3 Performance comparison with state-of-the-art computer vision backbones on the VisDrone 2019 validation set.
From: End to end polysemantic cooperative mixed task trainer for UAV target detection
Backbone | Res. | Params (M) | FLOPs | Accuracy (ALL-top-1) | Accuracy (ALL-top-5) |
---|---|---|---|---|---|
ResNet-50 | 224 | 24.1M | 4.0G | 31.39 | 48.23 |
LR-net-50 | 224 | 22.2M | 4.2G | 31.39 | 48.23 |
Stand-alone\(\uparrow\) | 224 | 16.9M | 3.5G | 31.42 | – |
AA-ResNet-50 | 224 | 24.7M | 4.1G | 31.43 | 48.25 |
BoTNet-S1-50 | 224 | 19.7M | 4.2G | 31.43 | – |
ViT-B/16 | 382 | – | – | 31.45 | – |
SAN19 | 224 | 19.4M | 3.2G | 31.48 | 48.53 |
Lambda-ResNet-50\(\uparrow\) | 224 | 13.9M | – | 31.5 | – |
PoT-50 | 224 | 21.1M | 3.2G | 33.83 | 49.13 |
PoT-50\(\uparrow\) | 224 | 21.1M | 3.2G | 33.89 | 49.17 |
SE-PoT-50 | 224 | 22.0M | 4.0G | 33.89 | 49.15 |
SE-PoT-50\(\uparrow\) | 224 | 22.0M | 4.0G | 33.96 | 49.83 |
ResNet-101 | 224 | 43.5M | 7.8G | 33.13 | 48.83 |
LR-net-101 | 224 | 40.9M | 7.9G | 33.13 | 48.84 |
AA-ResNet-101 | 224 | 44.3M | 8.0G | 33.15 | 48.85 |
PoT-101 | 224 | 37.2M | 6.0G | 34.63 | 49.53 |
PoT-101\(\uparrow\) | 224 | 37.2M | 6.0G | 34.72 | 49.57 |
SE-PoT-101 | 224 | 39.8M | 8.4G | 34.68 | 49.55 |
SE-PoT-101\(\uparrow\) | 224 | 39.8M | 8.4G | 36.03 | 50.23 |