Table 2 Keypoint labeling performance among models.

Dataset	Full	Half	Quarter	Eighth
# Videos train	188	94	47	24
# Videos test	N/A	94	94	94
# Networks	1	10	10	10
Training RMSE, pixels (mean (95% CI))	7.82	9.29 (8.13–10.73)	9.84 (8.53–11.7)	11.02 (9.11–12.91)
Testing RMSE, pixels (mean (95% CI))	N/A	19.37 (16.92–22.28)	12.3 (10.51–14.4)	14.34 (12.66–15.98)
F1 score (mean (95% CI))	0.43 (0.21–0.57)	0.39 (0.17–0.54)	0.41 (0.19–0.56)	0.36 (0.14–0.52)
P-value (vs. full, vs. human)	N/A, 0.51	*0.03, ***1.7E−4	*0.03, ***1.4E−4	*0.02, ***3.9E−5

Subsets of our manually labeled frames were used to train different neural network models using DeepLabCut (DLC). All models were initialized from the pretrained ResNet50 model available through DLC and trained for up to 100,000 iterations at a learning rate of 0.001. Performance was assessed using the root-mean-squared error (RMSE), in pixels, between model-assigned and manually labeled snout and tailbase positions. *p < 0.05, ***p < 0.001.
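
As an illustration of the error metric reported in the table, the sketch below computes a root-mean-squared error in pixels between predicted and manually labeled keypoints. It assumes the error for each keypoint is the Euclidean distance between the two (x, y) positions and that the squared distances are averaged over all keypoints before taking the root. The source does not specify the exact aggregation, so this is one plausible reading, not the authors' implementation. The function name `keypoint_rmse` and the coordinates are hypothetical.

```python
import math


def keypoint_rmse(predicted, labeled):
    """Root-mean-squared Euclidean error, in pixels, over paired (x, y) keypoints.

    Assumption: squared distances are pooled across all keypoints
    (e.g. snout and tailbase) before averaging.
    """
    if len(predicted) != len(labeled) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        (px - lx) ** 2 + (py - ly) ** 2
        for (px, py), (lx, ly) in zip(predicted, labeled)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical snout and tailbase positions for one frame (not data from the study).
predicted = [(103.0, 204.0), (300.0, 410.0)]
labeled = [(100.0, 200.0), (300.0, 410.0)]
rmse = keypoint_rmse(predicted, labeled)
print(f"RMSE: {rmse:.2f} px")
```

Here the snout prediction is 5 pixels off and the tailbase prediction is exact, so the pooled squared errors are 25 and 0, and the result is the square root of their mean.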
