Table 1 Summary of Early Activity Detection Methods.

From: Bi-directional ConvLSTM networks for early recognition of human activities and action prediction

| Paper (ref.) | Key Points | Methodology | Datasets | Results |
| --- | --- | --- | --- | --- |
| 11 | Early recognition of human actions using a depth camera; no progress-level assumption; soft-label learning for subsequences | Regression-based model with Local Accumulative Frame Feature (LAFF) and Joint Classification-Regression Recurrent Neural Network with deep LSTM subnetworks | New dataset, G3D dataset | Outperformed existing models on RGB-D sequences |
| 12 | Enhancing robot recognition of human activities from first-person videos; early recognition via the ‘onset’ concept | Combines event history and visual data | Not specified | Improved and sped up recognition |
| 13 | Recognizing human activities in partially observed videos | Segmentation of activities into spatiotemporal features with sparse coding; global posterior for activities | Real videos | Successful evaluation on both activity prediction and fully observed videos |
| 23 | Human behavior recognition in real videos; removal of non-action parts | Non-action classifier to reduce the importance of irrelevant segments; LSSVM | Action Thread dataset | Improved action detection performance |
| 24 | Learning models for human dynamics using switching linear dynamic system models | Variational inference method for mixed-state graphical models | Not specified | Effective in analyzing figure motion and gesture identification |
| 25 | Action anticipation with a low observation ratio | Sophisticated LSTM framework, innovative loss function | JHMDB-21, UT-Interaction, UCF-101 | Accuracy improvements of 22.0%, 14.0%, and 49.9%, respectively |
| 27 | Architectural framework with knowledge distillation for early detection | Semi-supervised learning, teacher-student model | NTU RGB-D dataset | AUC of 62.8%; outperformed LSTM and RNN methods |
| 28 | Knowledge distillation for action anticipation network training | Self-supervised learning, symmetric bidirectional attention loss | JHMDB dataset | Accuracy of 76.6%, surpassing the previous best result |
| 29 | Pinpointing the initiation of an action using a bidirectional RNN | Bidirectional LSTM for forward and backward information flow | Montalbano Gesture dataset | AUC of 61.2%; superior for ambiguous starting points |
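
Several entries above evaluate on partially observed videos or at a low observation ratio. As a minimal sketch of what that setting means, the snippet below keeps only the leading fraction of a frame sequence before it would be passed to a recognizer. The function name `observe_prefix` and the round-up convention are assumptions made for illustration; they are not taken from any of the cited papers, whose exact truncation rules may differ.

```python
import math
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def observe_prefix(frames: Sequence[Frame], ratio: float) -> List[Frame]:
    """Return the first `ratio` fraction of `frames` (hypothetical helper).

    Assumption: the observed length is rounded up, so at least one
    frame is always kept for any ratio in (0, 1].
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if len(frames) == 0:
        raise ValueError("frames must be non-empty")
    n_observed = max(1, math.ceil(ratio * len(frames)))
    return list(frames[:n_observed])


# Stand-in for a decoded video: integers play the role of frames.
video = list(range(10))
partial = observe_prefix(video, 0.5)  # keeps the first half of the sequence
```

An early-recognition model would then be scored on `partial` rather than on `video`, typically at several ratios, to show how accuracy grows as more of the action is seen.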