Extended Data Fig. 2: Extended embeddings and feature windows. | Nature Methods

From: A-SOiD, an active-learning platform for expert-guided, data-efficient discovery of behavior

a) Different embeddings of the features extracted from the CalMS21 dataset (feature window = 400 ms, or 12 frames). Left, UMAP embedding as seen in Fig. 1g. Middle, principal component analysis (PCA, n = 2). Right, t-distributed stochastic neighbor embedding (t-SNE, n = 2). The annotations for each underlying behavior are superimposed onto the embedding (attack = red, investigation = orange, mount = blue, other = dark gray). Insets show the underlying distribution separated by behavior (light gray).
b) 2D UMAP embedding of the CalMS21 features provided with the dataset. The features include the pose estimation of all body parts and 32 additional features extracted with TREBA[53].
c) 2D UMAP embeddings of features binned over windows of 2 to 150 frames. Note that 2 frames (top left) is the minimum at a 30 Hz frame rate, and 150 frames (bottom right) is considered very coarse for most observations.
d) Adjusted mutual information score (AMI, black line) between the cluster assignments and the original human annotations for minimum cluster sizes from 0.5% to 10.0% in 0.5% steps. An AMI score of 1.0 indicates a perfect overlap even if the number of groups differs, that is, if investigation were perfectly represented by two clusters instead of one. In addition, we investigated the total number of clusters (gray, dashed line) because we expected a finer split to be more likely to capture the subtle differences between the behaviors.
e) Selected HDBSCAN clusterings of the embedded features (400 ms) from d. Left, the highest AMI was reached with a minimum cluster size of 6.0% (AMI = 0.267). Right, the highest number of clusters was found with a minimum cluster size of 1.5% (AMI = 0.195). Colors show the identified clusters within each plot. Note that samples that cannot be confidently associated with a cluster are collectively annotated as noise (dark gray) and can therefore span the entire embedding.
f) 2D histogram of the assigned cluster groups (y-axis) in d in relation to their ground truth annotations. Each histogram is normalized so that the sum of each column (ground truth behavior, for example, attack) is 1.0.
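
The column normalization described in f can be illustrated with a minimal sketch. It is not the authors' code: the function name `column_normalized_table`, the use of -1 for the noise group, and the toy labels are all assumptions made for illustration, and the values are invented rather than taken from CalMS21.

```python
from collections import Counter


def column_normalized_table(cluster_ids, behaviors):
    """Cross-tabulate clusters (rows) against behaviors (columns).

    Every behavior column is divided by its own total, so the fractions
    in each column sum to 1.0, as described for panel f.
    """
    if len(cluster_ids) != len(behaviors):
        raise ValueError("cluster_ids and behaviors must have the same length")
    pair_counts = Counter(zip(cluster_ids, behaviors))  # counts per (cluster, behavior)
    behavior_totals = Counter(behaviors)                # column totals
    rows = sorted(set(cluster_ids))
    columns = sorted(behavior_totals)
    return {
        c: {b: pair_counts[(c, b)] / behavior_totals[b] for b in columns}
        for c in rows
    }


# Toy, made-up labels (hypothetical); -1 stands in for the noise group.
clusters = [0, 0, 1, 1, 1, -1, 2, 2]
behaviors = ["attack", "attack", "investigation", "investigation",
             "mount", "other", "other", "other"]
table = column_normalized_table(clusters, behaviors)

for cluster_id, row in table.items():
    print(cluster_id, {b: round(v, 2) for b, v in row.items()})
```

Normalizing per column rather than per row means each cell reads as "the fraction of this annotated behavior that fell into this cluster", which keeps rare behaviors such as attack comparable to frequent ones.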

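The window sizes in a and c and the parameter sweep in d come down to simple arithmetic, sketched below. The 30 Hz frame rate and the 0.5% to 10.0% range are taken from the caption; the function names are hypothetical, and `pct_to_samples` assumes that the percentage refers to the total number of embedded samples, which the caption does not state.

```python
FRAME_RATE_HZ = 30  # frame rate stated in the caption


def frames_to_ms(n_frames, frame_rate_hz=FRAME_RATE_HZ):
    """Duration of a feature window given in frames, in milliseconds."""
    return 1000.0 * n_frames / frame_rate_hz


def ms_to_frames(window_ms, frame_rate_hz=FRAME_RATE_HZ):
    """Number of frames covered by a feature window given in milliseconds."""
    return round(window_ms * frame_rate_hz / 1000.0)


def min_cluster_size_grid(start_pct=0.5, stop_pct=10.0, step_pct=0.5):
    """Minimum cluster sizes (in percent) swept in panel d, endpoints included."""
    n_steps = round((stop_pct - start_pct) / step_pct) + 1
    return [round(start_pct + i * step_pct, 10) for i in range(n_steps)]


def pct_to_samples(pct, n_samples):
    """Convert a percentage into an absolute sample count.

    Assumption: the percentage is relative to the number of embedded samples.
    The floor of 2 is also an assumption, since a cluster needs at least two points.
    """
    return max(2, round(n_samples * pct / 100.0))


print(ms_to_frames(400))             # frames in the 400 ms window
print(frames_to_ms(2))               # duration of the smallest window in c
print(frames_to_ms(150))             # duration of the largest window in c
print(len(min_cluster_size_grid()))  # number of settings swept in d
```

Each value in the grid would be converted with something like `pct_to_samples` before being passed to a clustering routine, so the same percentage yields a different absolute threshold for datasets of different size.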