Table 5 Performance comparison of different semantic alignment methods.
| Dataset | Method | MOTA (%)\(\uparrow\) | MOTP (%)\(\uparrow\) | IDF1 (%)\(\uparrow\) | IDSW\(\downarrow\) | MT (%)\(\uparrow\) | ML (%)\(\uparrow\) | FP\(\downarrow\) | FN\(\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|
| VisDrone | None | 36.4 | 73.8 | 55.8 | 1547 | 51.3 | 45.4 | 7236 | 11368 |
| VisDrone | Feature Fusion | 37.5 | 75.2 | 60.2 | 1266 | 53.6 | 53.6 | 6915 | 10762 |
| VisDrone | ESC | 38.9 | 76.9 | 65.7 | 981 | 54.8 | 62.9 | 6105 | 10093 |
| UAVDT | None | 57.1 | 73.7 | 63.8 | 1892 | 43.7 | 23.7 | 26207 | 64875 |
| UAVDT | Feature Fusion | 58.8 | 74.6 | 65.5 | 1521 | 47.1 | 25.4 | 23658 | 62673 |
| UAVDT | ESC | 61.9 | 75.3 | 68.1 | 1278 | 53.8 | 28.2 | 21478 | 51238 |
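To put the count-based columns of Table 5 on a common scale, the relative reduction of each error count (IDSW, FP, FN, all lower-is-better) for ESC against the "None" baseline can be computed directly from the table values. The sketch below only performs that arithmetic; the dictionary layout and the function names `relative_reduction` and `reductions` are hypothetical helpers introduced for illustration, not part of the described method.

```python
# Error counts copied from Table 5 (lower is better for IDSW, FP and FN).
TABLE5 = {
    "VisDrone": {
        "None": {"IDSW": 1547, "FP": 7236, "FN": 11368},
        "ESC": {"IDSW": 981, "FP": 6105, "FN": 10093},
    },
    "UAVDT": {
        "None": {"IDSW": 1892, "FP": 26207, "FN": 64875},
        "ESC": {"IDSW": 1278, "FP": 21478, "FN": 51238},
    },
}


def relative_reduction(baseline: int, method: int) -> float:
    """Fraction by which `method` lowers an error count relative to `baseline`."""
    return (baseline - method) / baseline


def reductions(dataset: str) -> dict[str, float]:
    """Relative reduction of each error count for ESC versus the 'None' baseline."""
    rows = TABLE5[dataset]
    return {
        metric: relative_reduction(rows["None"][metric], rows["ESC"][metric])
        for metric in ("IDSW", "FP", "FN")
    }


if __name__ == "__main__":
    for name in TABLE5:
        summary = ", ".join(f"{m} -{r:.1%}" for m, r in reductions(name).items())
        print(f"{name}: {summary}")
```

Running it prints `VisDrone: IDSW -36.6%, FP -15.6%, FN -11.2%` and `UAVDT: IDSW -32.5%, FP -18.0%, FN -21.0%`, i.e. on both datasets the largest relative drop among the error counts is in identity switches.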