Table 1 Summary of Early Activity Detection Methods.
From: Bi-directional ConvLSTM networks for early recognition of human activities and action prediction
| Paper | Key Points | Methodology | Datasets | Results |
|---|---|---|---|---|
| | Early recognition of human actions using a depth camera, no progress level assumption, soft label learning for subsequences | Regression-based model with Local Accumulative Frame Feature (LAFF) and Joint Classification-Regression Recurrent Neural Network with deep LSTM subnetworks | New dataset, G3D dataset | Outperformed existing models on RGB-D sequences |
| | Enhancing robot recognition of human activities using first-person videos, early recognition via the 'onset' concept | Combines event history and visual data | Not specified | Improved and sped up recognition |
| | Recognizing human activities in partially observed videos | Segmentation of activities into spatiotemporal features with sparse coding, global posterior for activities | Real videos | Successful evaluation in activity prediction and fully observed videos |
| | Human action recognition in real videos, removal of non-action parts | Non-action classifier to reduce the importance of irrelevant segments, LSSVM | Action Thread dataset | Improved action detection performance |
| | Learning models for human dynamics using switching linear dynamic system models | Variational inference method for mixed-state graphical models | Not specified | Effective in analyzing figure motion and gesture identification |
| | Action anticipation with a low observation ratio | Sophisticated LSTM framework, innovative loss function | JHMDB-21, UT-Interaction, UCF-101 | Accuracy improvement of 22.0%, 14.0%, and 49.9%, respectively |
| | Architectural framework with knowledge distillation for early detection | Semi-supervised learning, teacher-student model | NTU RGB-D dataset | AUC of 62.8%, outperformed LSTM and RNN methods |
| | Knowledge distillation for action anticipation network training | Self-supervised learning, symmetric bidirectional attention loss | JHMDB dataset | Accuracy of 76.6%, surpassing the previous best result |
| | Pinpointing initiation of action using bidirectional RNN | Bidirectional LSTM for forward and backward information flow | Montalbano Gesture dataset | AUC of 61.2%, superior in ambiguous starting points |
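
Several rows above concern recognition from partially observed videos or at a low observation ratio. As a minimal sketch of that evaluation setting, assuming the observation ratio is the fraction of frames visible from the start of a clip, a sequence could be truncated as follows. The function name `observe_prefix` and the rounding rule are illustrative assumptions, not taken from any of the summarized papers.

```python
def observe_prefix(frames, observation_ratio):
    """Return the leading portion of a frame sequence.

    Assumption: the observation ratio is the fraction of frames,
    counted from the first frame, that the model is allowed to see.
    At least one frame is kept for any non-empty sequence.
    """
    if not 0.0 < observation_ratio <= 1.0:
        raise ValueError("observation_ratio must be in (0, 1]")
    # Round to the nearest whole frame, but never drop below one frame.
    n_observed = max(1, round(observation_ratio * len(frames)))
    return frames[:n_observed]


# Stand-in for a 50-frame clip; real frames would be image arrays.
video = list(range(50))
early_clip = observe_prefix(video, 0.2)
print(len(early_clip))
```

An early-recognition model would then be scored on `early_clip` rather than on the full `video`, and the ratio would be swept to see how accuracy changes as more of the action is observed.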