Fig. 2: COCA as a data augmenter for multi-animal tracking.

a, Concept diagram of COCA. Background and animal instances extracted from the raw scenario are recombined, with occlusion, into new synthetic scenarios, generating a large dataset from a small one. b, Video capture of two freely moving animals. Two animals are placed in a transparent circular open field and video streams of their behaviour are captured by a camera array. c, COCA as a general augmenter for multi-animal patching from a small amount of manually labelled data. Behavioural video streams are separated into backgrounds (top left), trajectories (middle left) and manually labelled masks (bottom left). A self-training instance segmentation model is used to predict further masks for unlabelled data from the manually labelled masks. The masks are then combined with backgrounds and trajectories to generate new scenarios of two freely moving mice. d, Mask and pose prediction. Spatial-temporal learning is applied to the new scenarios and used to predict the masks of real mouse instances. The single-animal pose estimation model is then applied to each animal, and the resulting 2D poses are merged to achieve multi-animal pose estimation. e, 3D pose reconstruction. The camera array is calibrated from chessboard images using Zhang’s calibration. For each animal, the reprojection errors of all combination pairs of 2D poses are optimized for 3D reconstruction. The top right shows a 3D view of the 3D poses of two mice in this case. The bottom right shows a 2D view of the same 3D poses. f, Comparison of the number of manually labelled points for SBeA and maDLC. g, Distribution of the distance between two freely moving mice. Pink stems mark the distance boundaries clustered by k-means (close 60.69, interim 195.03, far 327.47). h, Comparison of prediction errors across all validation data.
The differences between all and close data are about ±2 pixels (two-way ANOVA followed by a Sidak multiple comparisons test, n1 (All) = 14,400, n2 (Close) = 4,602: the adjusted P values from nose to tip of tail are <0.0001, 0.0023, <0.0001, 0.0369, 0.1049, 0.0590, 0.0002, <0.0001, <0.0001, 0.2068, 0.0026, 0.0013, 0.4167, <0.0001, <0.0001 and <0.0001). Stems represent the mean values of each violin plot. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. NS, not significant.
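
The recombination step described in panels a and c can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compose_scene`, the input layout (a list of patch, mask and top-left position per animal) and the assumption that every patch lies fully inside the background are hypothetical choices made for illustration. Animals are pasted in order, so a later animal occludes an earlier one, and the visible mask of each instance is updated accordingly.

```python
import numpy as np

def compose_scene(background, instances):
    """Paste masked animal patches onto a background, later ones on top.

    background: H x W x C image array.
    instances:  list of (patch, mask, (top, left)); patch is h x w x C,
                mask is an h x w boolean array, (top, left) would come
                from a trajectory. Patches are assumed to fit in the image.
    Returns the synthetic image and one visible mask per instance.
    """
    image = background.copy()
    visible_masks = []
    for patch, mask, (top, left) in instances:
        h, w = mask.shape
        region = (slice(top, top + h), slice(left, left + w))
        # Pixels covered by the new animal are hidden in all earlier masks.
        for earlier in visible_masks:
            earlier[region][mask] = False
        # Copy only the pixels that belong to the animal.
        image[region][mask] = patch[mask]
        full = np.zeros(background.shape[:2], dtype=bool)
        full[region] = mask
        visible_masks.append(full)
    return image, visible_masks
```

Because occlusion is resolved while pasting, the returned masks describe only the visible part of each animal, which is the kind of label an instance segmentation model would need for overlapping animals.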