Fig. 1: Creating the Human Action Video Database.
From: Revealing Key Dimensions Underlying the Recognition of Dynamic Human Actions

a Starting from the Moments in Time database18, we used a multi-step process to generate a naturalistic, well-balanced dataset of 768 one-second videos spanning 256 action categories. To identify representative 1 s video segments and to assess the similarity structure of the action categories within the dataset, we combined a deep neural network (DNN) with a human-guided selection process. b To generate the database, we (1) removed non-human action categories from the initial database and performed a hierarchical clustering analysis (HCA, see corresponding dendrogram) on the softmax vectors of a DNN fed with hand-selected human action videos, identifying 256 categories; (2) manually selected three videos to represent each action category and evaluated them in a human rating experiment for understandability and interpretability; (3) selected a representative 1 s segment from each video while eliminating scene cuts; and (4) standardized resolution, aspect ratio and frame rate (fps) across all videos.
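The clustering step in (1) can be sketched as follows. This is a minimal illustration using SciPy on toy softmax vectors, not the authors' code; the array sizes, the Euclidean metric, and the average-linkage choice are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-in for DNN softmax outputs: rows = videos, columns = class probabilities.
# (In the paper, these come from a DNN fed with hand-selected human action videos.)
rng = np.random.default_rng(0)
logits = rng.normal(size=(30, 10))
softmax = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Average-linkage hierarchical clustering on the softmax vectors;
# the linkage matrix Z encodes the dendrogram shown in the figure.
Z = linkage(softmax, method="average", metric="euclidean")

# Cut the dendrogram into a fixed number of clusters
# (256 categories in the paper; 5 here for the toy data).
labels = fcluster(Z, t=5, criterion="maxclust")
print(len(set(labels)), "clusters")
```

Cutting the dendrogram with `criterion="maxclust"` yields at most `t` clusters, which mirrors how a target number of action categories could be read off the HCA.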