Table 2 Quantitative evaluations on the BABEL-TAL-20 (BT-20) dataset

From: Localization and recognition of human action in 3D using transformers

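| Method | tIoU=0.1 | tIoU=0.2 | tIoU=0.3 | tIoU=0.4 | tIoU=0.5 | tIoU=0.6 | tIoU=0.7 | tIoU=0.8 | tIoU=0.9 | mAP |
|---|---|---|---|---|---|---|---|---|---|---|
| Beyond-Joints [31] | 14.3 | 13.6 | 13.3 | 12.3 | 11.4 | 10.5 | 8.9 | 6.2 | 4.1 | 10.5 |
| ASFD [72] | 24.2 | 23.1 | 22.6 | 22.2 | 21.9 | 20.4 | 18.9 | 12.2 | 9.0 | 19.3 |
| SRN [51] | 25.1 | 24.0 | 22.7 | 21.7 | 20.1 | 18.2 | 16.7 | 15.9 | 10.4 | 19.4 |
| TSP [73] | 26.9 | 25.6 | 24.1 | 23.0 | 22.5 | 20.4 | 17.1 | 13.0 | 10.1 | 20.3 |
| G-TAD [37] | 25.1 | 24.1 | 23.9 | 23.0 | 22.1 | 21.1 | 18.5 | 14.1 | 11.8 | 20.4 |
| AGT [38] | 27.3 | 26.0 | 25.7 | 24.5 | 23.4 | 21.5 | 19.4 | 15.9 | 12.4 | 21.9 |
| ActionFormer [39] | 30.4 | 27.1 | 25.3 | 25.1 | 24.5 | 22.7 | 20.6 | 16.1 | 12.0 | 22.6 |
| LocATe | 43.5 | 41.1 | 41.0 | 38.2 | 35.1 | 30.5 | 23.7 | 16.4 | 9.99 | 31.1 |
| LocATe w/ tricks | 46.6 | 45.5 | 43.0 | 40.2 | 36.0 | 30.5 | 23.7 | 15.9 | 9.78 | 32.0 |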

We report the AP at tIoU thresholds from 0.1 to 0.9, together with the mAP. LocATe is our single-stage transformer-based approach; LocATe w/ tricks is our method enhanced with tricks, including iterative bounding box refinement and a two-stage decoder [40]. LocATe outperforms the previous method Beyond-Joints at every threshold, and its improvements over the other benchmark methods are largest at lower tIoU thresholds (for example, 43.5 vs. 30.4 for ActionFormer at tIoU 0.1). AP: Average Precision; tIoU: threshold IoU; mAP: mean Average Precision.
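To make the metrics in the table concrete, here is a minimal Python sketch of how a predicted action segment could be scored against a ground-truth segment at a given tIoU threshold, and how per-threshold AP values could be averaged into one number. It assumes the usual interval definition of IoU for temporal segments and assumes mAP is a plain mean over the nine thresholds; the function names `segment_iou`, `is_match`, and `mean_over_thresholds` are hypothetical and do not come from the paper. Under these assumptions the average reproduces the reported mAP for Beyond-Joints (10.5), but for some other rows it matches the mAP column only approximately, so the authors' exact averaging procedure may differ.

```python
from typing import Sequence, Tuple

# A temporal segment as (start, end), in seconds or frames (assumed representation).
Segment = Tuple[float, float]


def segment_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1D intervals; 0.0 if they do not overlap."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Segment, gt: Segment, threshold: float) -> bool:
    """A prediction counts as correct when its IoU reaches the threshold."""
    return segment_iou(pred, gt) >= threshold


def mean_over_thresholds(ap_per_threshold: Sequence[float]) -> float:
    """Plain average of per-threshold AP values (assumed definition of mAP)."""
    return sum(ap_per_threshold) / len(ap_per_threshold)


# Beyond-Joints row from the table, tIoU 0.1 through 0.9.
beyond_joints_ap = [14.3, 13.6, 13.3, 12.3, 11.4, 10.5, 8.9, 6.2, 4.1]
beyond_joints_map = round(mean_over_thresholds(beyond_joints_ap), 1)
```

A prediction spanning (0, 10) against a ground truth spanning (5, 15) overlaps for 5 units out of a union of 15, so it counts as a match at a threshold of 0.3 but not at 0.4. This is why AP falls as the threshold rises in every row of the table.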