Table 4. Performance comparison of multi-object tracking methods, each combined with the RT-DETR-R50 detector and evaluated on an RTX 3090 GPU. Params (M) is the total model size in millions of parameters, including the detector. All baseline methods were reproduced from their official implementations or from the descriptions in their original papers to ensure a fair comparison.

From: A spatiotemporal transformer with cross-frame encoding and trajectory-aware decoding for multi-target fish tracking

| Method | MOTA | IDF1 | Recall | IDP | IDR | Params (M) | FPS |
|---|---|---|---|---|---|---|---|
| SORT [35] | 0.511 | 0.543 | 0.620 | 0.520 | 0.480 | 42.1 | 107.6 |
| DeepSORT [36] | 0.592 | 0.612 | 0.660 | 0.600 | 0.580 | 45.3 | 95.4 |
| ByteTrack [41] | 0.701 | 0.664 | 0.710 | 0.660 | 0.620 | 43.7 | 98.9 |
| CenterTrack [37] | 0.614 | 0.598 | 0.680 | 0.590 | 0.570 | 47.5 | 86.3 |
| FairMOT [38] | 0.655 | 0.642 | 0.700 | 0.630 | 0.610 | 49.2 | 83.7 |
| TraDes [46] | 0.628 | 0.606 | 0.690 | 0.600 | 0.580 | 50.6 | 79.4 |
| QDTrack [47] | 0.641 | 0.649 | 0.690 | 0.640 | 0.630 | 46.8 | 84.2 |
| TransTrack [39] | 0.663 | 0.661 | 0.710 | 0.650 | 0.620 | 53.9 | 68.1 |
| MOTR [7] | 0.676 | 0.668 | 0.720 | 0.660 | 0.640 | 55.4 | 61.7 |
| GTR [48] | 0.688 | 0.671 | 0.720 | 0.670 | 0.650 | 57.8 | 59.3 |
| Ours | 0.719 | 0.693 | 0.742 | 0.689 | 0.676 | 51.2 | 76.5 |