Table 5 Performance comparison of different semantic alignment methods.

From: End-to-end multiple object tracking in high-resolution optical sensors of drones with transformer models

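| Dataset | Method | MOTA\(\uparrow\) | MOTP\(\uparrow\) | IDF1 (%)\(\uparrow\) | IDSW\(\downarrow\) | MT (%)\(\uparrow\) | ML (%)\(\uparrow\) | FP\(\downarrow\) | FN\(\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|
| VisDrone | None | 36.4 | 73.8 | 55.8 | 1547 | 51.3 | 45.4 | 7236 | 11368 |
| VisDrone | Feature Fusion | 37.5 | 75.2 | 60.2 | 1266 | 53.6 | 53.6 | 6915 | 10762 |
| VisDrone | ESC | **38.9** | **76.9** | **65.7** | **981** | **54.8** | **62.9** | **6105** | **10093** |
| UAVDT | None | 57.1 | 73.7 | 63.8 | 1892 | 43.7 | 23.7 | 26207 | 64875 |
| UAVDT | Feature Fusion | 58.8 | 74.6 | 65.5 | 1521 | 47.1 | 25.4 | 23658 | 62673 |
| UAVDT | ESC | **61.9** | **75.3** | **68.1** | **1278** | **53.8** | **28.2** | **21478** | **51238** |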

  1. Bold entries indicate the best result among the compared methods.
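
To make the comparison in Table 5 easier to read, the following minimal sketch computes how much ESC changes MOTA, IDF1, and IDSW relative to the "None" baseline on each dataset. The values are transcribed from the table; the dictionary layout and the helper name `esc_gain` are illustrative assumptions, not part of the original work.

```python
# Values transcribed from Table 5 for the "None" baseline and ESC rows.
# The data structure and function name are hypothetical, for illustration only.
table5 = {
    "VisDrone": {
        "None": {"MOTA": 36.4, "IDF1": 55.8, "IDSW": 1547},
        "ESC": {"MOTA": 38.9, "IDF1": 65.7, "IDSW": 981},
    },
    "UAVDT": {
        "None": {"MOTA": 57.1, "IDF1": 63.8, "IDSW": 1892},
        "ESC": {"MOTA": 61.9, "IDF1": 68.1, "IDSW": 1278},
    },
}


def esc_gain(dataset):
    """Return ESC's change over the baseline for one dataset."""
    base = table5[dataset]["None"]
    esc = table5[dataset]["ESC"]
    return {
        # Absolute gains in percentage points (higher is better).
        "MOTA_gain_pts": round(esc["MOTA"] - base["MOTA"], 1),
        "IDF1_gain_pts": round(esc["IDF1"] - base["IDF1"], 1),
        # Relative reduction in identity switches (lower IDSW is better).
        "IDSW_reduction_pct": round(
            100 * (base["IDSW"] - esc["IDSW"]) / base["IDSW"], 1
        ),
    }


for name in table5:
    print(name, esc_gain(name))
```

Under these assumptions, the sketch reports the MOTA and IDF1 gains in percentage points and the drop in identity switches as a percentage of the baseline count, which are the differences the table is meant to highlight.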