Table 1 Comparison of existing 3D action localization datasets

From: Localization and recognition of human action in 3D using transformers

| Datasets | Classes | Sequences | Instances | Subjects | Modalities | Year | Duration |
|---|---|---|---|---|---|---|---|
| G3D46 | 20 | 210 | 1467 | 10 | RGB, D, S | 2012 | – |
| CAD-12076 | 20 | 120 | ~1200 | 4 | RGB, D, S | 2013 | – |
| Comp. Act.49 | 16 | 693 | 2529 | 14 | RGB, D, S | 2014 | – |
| Watch-N-Patch48 | 21 | 458 | ~2500 | 7 | RGB, D, S | 2015 | 230 min |
| OAD77 | 10 | 59 | ~700 | – | RGB, D, S | 2016 | 216 min |
| PKU-MMD32 | 51 | 1076 | 21,545 | 66 | RGB, D, IR, S | 2017 | 3000 min |
| Wei et al.51 | 35 | 201 | – | – | RGB, D, S | 2020 | – |
| BABEL-TAL-20 | 20 | 5727 | 6244 | 346 | 3D Mesh, S | 2024 | – |
| BABEL-TAL-60 | 60 | 6808 | 7332 | 584 | 3D Mesh, S | 2024 | – |
| BABEL-TAL-ALL | 102 | 8808 | 9617 | 925 | 3D Mesh, S | 2024 | 2580 min |

  1. The BABEL-TAL (BT) dataset stands out from existing 3D action localization datasets in several key ways. First, it is the first to use 3D motion-capture data, providing precise body-joint movements for temporal action localization. Second, it covers an extensive range of action labels with substantial intra-class diversity. Third, its actions follow a long-tailed distribution, mirroring real-world scenarios. Finally, it contains continuous actions in extended motion sequences, free from environmental or actor constraints. D: Depth, S: Skeleton, IR: Infrared.