Table 2 Convolutional neural network-based action recognition.

From: HARNet in deep learning approach—a systematic survey

| Method | Data type | Dataset | Performance | References |
| --- | --- | --- | --- | --- |
| PoseConv3D | RGB + Depth | NTU-RGBD | Accuracy: 69.4, 94.2 | 1 |
| Temporal difference networks | RGB | Something-SomethingV1, Kinetics | Accuracy: 68.2, 79.4 | 2 |
| CNN | RGB | UCF101, HMDB51, FCVID, Activity Net | Accuracy: 98.6, 84.3, 82.1, 84.4 | 3 |
| 2-Stream convolution network | RGB | UCF101, HMDB51 | Accuracy: 91.5, 65.9 | 4 |
| 3-Stream CNN | RGB | KTH, UCF101, HMDB51 | Accuracy: 96.8, 92.2, 65.2 | 5 |
| Multi-stream CNN | Skeleton | NTU-RGBD (CS), NTU-RGBD (CV), MSRC-12 (CS), Northwestern-UCLA | Accuracy: 80.03, 87.21, 96.62, 92.61 | 6 |
| 3D CNN | RGB | UCF101, HMDB51 | Accuracy: 90.2 | 7 |
| Actional-graph-based CNN | Skeleton | UCF50, UCF101, YouTube action, HMDB51 | Accuracy: 86.8, 94.2, Top-5 acc: 56.5, Top-1 acc: 34.8 | 8 |
| CNN | RGB | UCF50 | Accuracy: 92.5, 65.2 | 9 |
| CNN | RGB | UTD-MHAD, NTU-RGBD (CV), NTU-RGBD (CS) | Accuracy: 96.4, 94.33, 96.21, 70.33 | 10 |
| CNN-genetic algorithm | RGB | UCF50 | Accuracy: 99.98 | 11 |
| CNN | Skeleton | UTD-MHAD, NTU-RGBD (CV), NTU-RGBD (CS) | Accuracy: 88.10, 82.3, 76.2 | 12 |