

Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling and cloud-native open-source tools

Abstract

Contemporary pose estimation methods enable precise measurements of behavior via supervised deep learning with hand-labeled video frames. Although effective in many cases, the supervised approach requires extensive labeling and often produces outputs that are unreliable for downstream analyses. Here, we introduce ‘Lightning Pose’, an efficient pose estimation package with three algorithmic contributions. First, in addition to training on a few labeled video frames, we use many unlabeled videos and penalize the network whenever its predictions violate motion continuity, multiple-view geometry and posture plausibility (semi-supervised learning). Second, we introduce a network architecture that resolves occlusions by predicting pose on any given frame using surrounding unlabeled frames. Third, we refine the pose predictions post hoc by combining ensembling and Kalman smoothing. Together, these components render pose trajectories more accurate and scientifically usable. We released a cloud application that allows users to label data, train networks and process new videos directly from the browser.
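To make the semi-supervised idea concrete, the sketch below shows one such unsupervised penalty, a temporal-continuity loss that charges the network for implausibly large frame-to-frame jumps. The function name and the 20-pixel threshold are illustrative only; Lightning Pose applies analogous penalties (plus multi-view geometry and posture-plausibility terms) inside its training loop, and this is not the package's actual implementation.

```python
import numpy as np

def temporal_continuity_penalty(preds, threshold=20.0):
    """Mean frame-to-frame keypoint displacement in excess of `threshold` pixels.

    preds: (T, K, 2) array of T frames, K keypoints, (x, y) coordinates.
    """
    disp = np.linalg.norm(np.diff(preds, axis=0), axis=-1)  # (T-1, K) jump sizes
    return np.maximum(disp - threshold, 0.0).mean()

# A smooth trajectory incurs no penalty; a sudden 100-pixel glitch does.
smooth = np.cumsum(np.ones((10, 1, 2)), axis=0)  # ~1.4-pixel steps per frame
glitchy = smooth.copy()
glitchy[5] += 100.0
print(temporal_continuity_penalty(smooth))   # 0.0
print(temporal_continuity_penalty(glitchy))  # > 0
```

In training, a penalty of this form is added to the supervised heatmap loss on unlabeled frames, so the network is discouraged from producing discontinuous trajectories even where no labels exist.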

Fig. 1: Fully supervised pose estimation often outputs unstable predictions and requires many labels to generalize to new animals.
Fig. 2: Lightning Pose exploits unlabeled data in pose estimation model training.
Fig. 3: Unsupervised losses complement model confidence for outlier detection.
Fig. 4: Unlabeled frames improve pose estimation (raw network predictions).
Fig. 5: The EKS post-processor.
Fig. 6: Lightning Pose models and EKS improve pose estimation on IBL-pupil data.

Data availability

All labeled data used in this paper are publicly available.

mirror-mouse https://doi.org/10.6084/m9.figshare.24993315.v1 (ref. 71)

mirror-fish https://doi.org/10.6084/m9.figshare.24993363.v1 (ref. 72)

CRIM13 https://doi.org/10.6084/m9.figshare.24993384.v1 (ref. 73)

IBL-paw https://ibl-brain-wide-map-public.s3.amazonaws.com/aggregates/Tags/2023_Q1_Biderman_Whiteway_et_al/_ibl_videoTracking.trainingDataPaw.7e79e865-f2fc-4709-b203-77dbdac6461f.zip

IBL-pupil https://ibl-brain-wide-map-public.s3.amazonaws.com/aggregates/Tags/2023_Q1_Biderman_Whiteway_et_al/_ibl_videoTracking.trainingDataPupil.27dcdbb6-3646-4a50-886d-03190db68af3.zip

All of the model predictions on labeled frames and unlabeled videos are available via Figshare at https://doi.org/10.6084/m9.figshare.25412248.v2 (ref. 74). These results, along with the labeled data, can be used to reproduce the main figures of the paper.

To access the IBL data analyzed in Fig. 6 and Extended Data Fig. 7, see the documentation at https://int-brain-lab.github.io/ONE/FAQ.html#how-do-i-download-the-datasets-cache-for-a-specific-ibl-paper-release and use the tag 2023_Q1_Biderman_Whiteway_et_al. This will provide access to spike-sorted neural activity, trial timing variables (stimulus onset, feedback delivery and so on), the original IBL DeepLabCut traces and the raw videos.

Code availability

The code for Lightning Pose is available at https://github.com/danbider/lightning-pose/ under the MIT license. The repository also contains a Google Colab tutorial notebook that trains a model, forms predictions on videos and visualizes the results. From the command-line interface, running pip install lightning-pose will install the latest release of Lightning Pose via the Python Package Index (PyPI).

The code for the EKS is available at https://github.com/paninski-lab/eks/ under the MIT license. The repository contains the core EKS code as well as scripts demonstrating how to use the code on several example datasets.
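The actual EKS implementation lives in the repository above. As a rough illustration of the idea only (not the paper's code), the sketch below averages an ensemble of 1-D keypoint traces and smooths the mean with a random-walk Kalman filter plus Rauch-Tung-Striebel (RTS) backward pass, using the per-frame ensemble variance as observation noise, so frames where the ensemble disagrees are trusted less.

```python
import numpy as np

def ensemble_kalman_smooth(tracks, process_var=0.01):
    """Toy 1-D illustration of ensemble smoothing (not the EKS package code).

    tracks: (E, T) array of E ensemble members' traces over T frames.
    Observation = per-frame ensemble mean; observation noise = ensemble variance.
    """
    obs = tracks.mean(axis=0)
    obs_var = tracks.var(axis=0) + 1e-6
    T = obs.shape[0]
    m = np.empty(T); P = np.empty(T)          # filtered mean / variance
    m[0], P[0] = obs[0], obs_var[0]
    for t in range(1, T):                     # forward Kalman filter
        P_pred = P[t - 1] + process_var       # random-walk dynamics
        K = P_pred / (P_pred + obs_var[t])    # Kalman gain
        m[t] = m[t - 1] + K * (obs[t] - m[t - 1])
        P[t] = (1 - K) * P_pred
    s = m.copy()
    for t in range(T - 2, -1, -1):            # backward RTS smoother
        J = P[t] / (P[t] + process_var)       # smoother gain
        s[t] = m[t] + J * (s[t + 1] - m[t])
    return s

# Hypothetical demo: five noisy copies of a smooth signal.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 4 * np.pi, 200))
tracks = truth + rng.normal(0.0, 1.0, size=(5, 200))
smoothed = ensemble_kalman_smooth(tracks)
```

With a smooth underlying signal, the smoothed trace tracks ground truth more closely than the raw ensemble mean, which is the intuition behind applying EKS as a post-processor.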

The code for the cloud-hosted application is available at https://github.com/Lightning-Universe/Pose-app/ under the Apache-2.0 license. This code enables launching our app locally or on cloud resources by creating a Lightning.ai account.

Code for reproducing the figures in the main text is available at https://github.com/themattinthehatt/lightning-pose-2024-nat-methods/ under the MIT license. This repository also includes a script for downloading all required data from the proper repositories.

The hardware and software used for IBL video collection are described elsewhere (ref. 41). The protocols used in the mirror-mouse and mirror-fish datasets (both have the same video acquisition pipeline) are also described elsewhere (ref. 14).

We used the following packages in our data analysis: CUDA toolkit (12.1.0), cuDNN (8.5.0.96), deeplabcut (2.3.5 for runtime benchmarking, 2.2.3 for everything else), ffmpeg (3.4.11), fiftyone (0.23.4), h5py (3.9.0), hydra-core (1.3.2), ibllib (2.32.3), imgaug (0.4.0), kaleido (0.2.1), kornia (0.6.12), lightning (2.1.0), lightning-pose (1.0.0), matplotlib (3.7.5), moviepy (1.0.3), numpy (1.24.4), nvidia-dali-cuda120 (1.28.0), opencv-python (4.9.0.80), pandas (2.0.3), pillow (9.5.0), plotly (5.15.0), scikit-learn (1.3.0), scipy (1.10.1), seaborn (0.12.2), streamlit (1.31.1), tensorboard (2.13.0) and torchvision (0.15.2).

References

  1. Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A. & Poeppel, D. Neuroscience needs behavior: correcting a reductionist bias. Neuron 93, 480–490 (2017).

  2. Branson, K., Robie, A. A., Bender, J., Perona, P. & Dickinson, M. H. High-throughput ethomics in large groups of Drosophila. Nat. Methods 6, 451–457 (2009).

  3. Berman, G. J., Choi, D. M., Bialek, W. & Shaevitz, J. W. Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11, 20140672 (2014).

  4. Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135 (2015).

  5. Wiltschko, A. B. et al. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat. Neurosci. 23, 1433–1443 (2020).

  6. Luxem, K. et al. Identifying behavioral structure from deep variational embeddings of animal motion. Commun. Biol. 5, 1267 (2022).

  7. Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).

  8. Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125 (2019).

  9. Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).

  10. Dunn, T. W. et al. Geometric deep learning enables 3D kinematic profiling across species and environments. Nat. Methods 18, 564–573 (2021).

  11. Chen, Z. et al. AlphaTracker: a multi-animal tracking and behavioral analysis tool. Front. Behav. Neurosci. 17, 1111908 (2023).

  12. Jones, J. M. et al. A machine-vision approach for automated pain measurement at millisecond timescales. eLife 9, e57258 (2020).

  13. Padilla-Coreano, N. et al. Cortical ensembles orchestrate social competition through hypothalamic outputs. Nature 603, 667–671 (2022).

  14. Warren, R. A. et al. A rapid whisker-based decision underlying skilled locomotion in mice. eLife 10, e63596 (2021).

  15. Hsu, A. I. & Yttri, E. A. B-SOiD, an open-source unsupervised algorithm for identification and fast prediction of behaviors. Nat. Commun. 12, 5188 (2021).

  16. Pereira, T. D. et al. SLEAP: a deep learning system for multi-animal pose tracking. Nat. Methods 19, 486–495 (2022).

  17. Weinreb, C. et al. Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. Preprint at bioRxiv https://doi.org/10.1101/2023.03.16.532307 (2023).

  18. Karashchuk, P. et al. Anipose: a toolkit for robust markerless 3D pose estimation. Cell Rep. 36, 109730 (2021).

  19. Monsees, A. et al. Estimation of skeletal kinematics in freely moving rodents. Nat. Methods 19, 1500–1509 (2022).

  20. Rodgers, C. C. A detailed behavioral, videographic, and neural dataset on object recognition in mice. Sci. Data 9, 620 (2022).

  21. Chapelle, O., Schölkopf, B. & Zien, A. (eds) Semi-Supervised Learning (The MIT Press, 2006).

  22. Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. in Advances in Neural Information Processing Systems vol. 30 (eds Guyon, I. et al.) (Curran Associates, 2017).

  23. Abe, T. et al. Neuroscience cloud analysis as a service: An open-source platform for scalable, reproducible data analysis. Neuron 110, 2771–2789 (2022).

  24. Falcon, W. et al. Pytorchlightning/pytorch-lightning: 0.7.6 release. Zenodo https://doi.org/10.5281/zenodo.3828935 (2020).

  25. Recht, B., Roelofs, R., Schmidt, L. & Shankar, V. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, 5389–5400 (PMLR, 2019).

  26. Tran, D. et al. Plex: Towards reliability using pretrained large model extensions. Preprint at https://arxiv.org/abs/2207.07411 (2022).

  27. Burgos-Artizzu, X. P., Dollár, P., Lin, D., Anderson, D. J. & Perona, P. Social behavior recognition in continuous video. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 1322–1329 (IEEE, 2012).

  28. Segalin, C. et al. The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice. eLife 10, e63720 (2021).

  29. IBL. Data release - Brainwide map - Q4 2022. Figshare https://doi.org/10.6084/m9.figshare.21400815.v6 (2023).

  30. Desai, N. et al. OpenApePose, a database of annotated ape photographs for pose estimation. eLife 12, RP86873 (2023).

  31. Syeda, A. et al. Facemap: a framework for modeling neural activity based on orofacial tracking. Nat. Neurosci. 27, 187–195 (2024).

  32. Spelke, E. S. Principles of object perception. Cogn. Sci. 14, 29–56 (1990).

  33. Wu, A. et al. Deep graph pose: a semi-supervised deep graphical model for improved animal pose tracking. in Advances in Neural Information Processing Systems (eds Larochelle, H. et al.) 6040–6052 (2020).

  34. Nath, T. et al. Using deeplabcut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14, 2152–2176 (2019).

  35. Zhang, Y. & Park, H. S. Multiview supervision by registration. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 420–428 (2020).

  36. He, Y., Yan, R., Fragkiadaki, K. & Yu, S.-I. Epipolar transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7779–7788 (2020).

  37. Hartley, R. & Zisserman, A. Multiple View Geometry in Computer Vision (Cambridge University Press, 2003).

  38. Bialek, W. On the dimensionality of behavior. Proc. Natl Acad. Sci. USA 119, e2021860119 (2022).

  39. Stephens, G. J., Johnson-Kerner, B., Bialek, W. & Ryu, W. S. From modes to movement in the behavior of Caenorhabditis elegans. PLoS ONE 5, e13914 (2010).

  40. Yan, Y., Goodman, J. M., Moore, D. D., Solla, S. A. & Bensmaia, S. J. Unexpected complexity of everyday manual behaviors. Nat. Commun. 11, 3564 (2020).

  41. IBL. Video hardware and software for the international brain laboratory. Figshare https://doi.org/10.6084/m9.figshare.19694452.v1 (2022).

  42. Li, T., Severson, K. S., Wang, F. & Dunn, T. W. Improved 3D markerless mouse pose estimation using temporal semi-supervision. Int. J. Comput. Vis. 131, 1389–1405 (2023).

  43. Beluch, W. H., Genewein, T., Nürnberger, A. & Köhler, J. M. The power of ensembles for active learning in image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9368–9377 (2018).

  44. Abe, T., Buchanan, E. K., Pleiss, G., Zemel, R. & Cunningham, J. P. Deep ensembles work, but are they necessary? in Advances in Neural Information Processing Systems 35, 33646–33660 (2022).

  45. Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning, vol. 4 (Springer, 2006).

  46. Yu, H. et al. AP-10K: a benchmark for animal pose estimation in the wild. Preprint at https://arxiv.org/abs/2108.12617 (2021).

  47. Ye, S. et al. SuperAnimal models pretrained for plug-and-play analysis of animal behavior. Preprint at https://arxiv.org/abs/2203.07436 (2022).

  48. Zheng, C. et al. Deep learning-based human pose estimation: a survey. ACM Computing Surveys 56, 1–37 (2023).

  49. Lin, T.-Y. et al. Microsoft COCO: common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings. Vol. 8693, 740–755 (Springer, 2014).

  50. Ionescu, C., Papava, D., Olaru, V. & Sminchisescu, C. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2013).

  51. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G. & Black, M. J. SMPL: a skinned multi-person linear model. In Seminal Graphics Papers: Pushing the Boundaries. Vol. 2, 851–866 (2023).

  52. Marshall, J. D., Li, T., Wu, J. H. & Dunn, T. W. Leaving flatland: advances in 3D behavioral measurement. Curr. Opin. Neurobiol. 73, 102522 (2022).

  53. Günel, S. et al. DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 8, e48571 (2019).

  54. Sun, J. J. et al. BKinD-3D: self-supervised 3D keypoint discovery from multi-view videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9001–9010 (2023).

  55. Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with openmonkeystudio. Nat. Commun. 11, 4560 (2020).

  56. Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at https://arxiv.org/abs/1503.02531 (2015).

  57. Lauer, J. et al. Multi-animal pose estimation, identification and tracking with DeepLabCut. Nat. Methods 19, 496–504 (2022).

  58. Chettih, S. N., Mackevicius, E. L., Hale, S. & Aronov, D. Barcoding of episodic memories in the hippocampus of a food-caching bird. Cell 187, 1922–1935 (2024).

  59. IBL et al. Standardized and reproducible measurement of decision-making in mice. eLife 10, e63711 (2021).

  60. IBL et al. Reproducibility of in vivo electrophysiological measurements in mice. Preprint at bioRxiv https://doi.org/10.1101/2022.05.09.491042 (2022).

  61. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. in Advances in Neural Information Processing Systems 32, 8024–8035 (2019).

  62. Jafarian, Y., Yao, Y. & Park, H. S. MONET: multiview semi-supervised keypoint via epipolar divergence. Preprint at https://arxiv.org/abs/1806.00104 (2018).

  63. Tresch, M. C. & Jarc, A. The case for and against muscle synergies. Curr. Opin. Neurobiol. 19, 601–607 (2009).

  64. Stephens, G. J., Johnson-Kerner, B., Bialek, W. & Ryu, W. S. Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput. Biol. 4, e1000028 (2008).

  65. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  66. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  67. IBL et al. A brain-wide map of neural activity during complex behaviour. Preprint at bioRxiv https://doi.org/10.1101/2023.07.04.547681 (2023).

  68. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  69. Zolnouri, M., Li, X. & Nia, V. P. Importance of data loading pipeline in training deep neural networks. Preprint at https://arxiv.org/abs/2005.02130 (2020).

  70. Yadan, O. Hydra - a framework for elegantly configuring complex applications. Github https://github.com/facebookresearch/hydra (2019).

  71. Whiteway, M., Biderman, D., Warren, R., Zhang, Q. & Sawtell, N. B. Lightning Pose dataset: mirror-mouse. Figshare https://doi.org/10.6084/m9.figshare.24993315.v1 (2024).

  72. Whiteway, M. et al. Lightning Pose dataset: mirror-fish. Figshare https://doi.org/10.6084/m9.figshare.24993363.v1 (2024).

  73. Whiteway, M. & Biderman, D. Lightning Pose dataset: CRIM13. Figshare https://doi.org/10.6084/m9.figshare.24993384.v1 (2024).

  74. Whiteway, M. & Biderman, D. Lightning Pose results: Nature Methods 2024. Figshare https://doi.org/10.6084/m9.figshare.25412248.v2 (2024).

Acknowledgements

We thank P. Dayan and N. Steinmetz for serving on our IBL-paper board, as well as two anonymous reviewers whose detailed comments considerably strengthened our paper. We are grateful to N. Biderman for productive discussions and help with visualization. We thank M. Carandini and J. Portes for helpful comments; T. Abe, K. Buchanan and G. Pleiss for helpful discussions on ensembling; and H. Xiang for conversations on active learning and outlier detection. We thank W. Falcon, L. Antiga, T. Chaton and A. Wälchi (Lightning AI) for their technical support and advice on implementing our package and the cloud application. This work was supported by the following grants: Gatsby Charitable Foundation GAT3708 (to D.B., M.R.W., C.H., N.R.G., A.V., J.P.C. and L.P.), German National Academy of Sciences Leopoldina (to A.E.U.), Irma T Hirschl Trust (to N.B.S.), Netherlands Organisation for Scientific Research (VI.Veni.212.184; to A.E.U.), NSF IOS-2115007 (to N.B.S.), National Institutes of Health (NIH) K99NS128075 (to J.P.N.), NIH NS075023 (to N.B.S.), NIH NS118448 (to N.B.S.), NIH DK131086-02 (to N.B.S.), NIH U19NS123716 (to M.R.W.) and NSF 1707398 (to D.B., M.R.W., C.H., N.R.G., A.V., J.P.C. and L.P.), funds provided by the National Science Foundation and by DoD OUSD (R&E) under Cooperative Agreement PHY-2229929, The NSF AI Institute for Artificial and Natural Intelligence (to D.B., M.R.W., C.H., A.V., J.P.C. and L.P.), Simons Foundation (to M.R.W., M.M.S., J.M.H., A.K., G.T.M., J.P.N., A.P.V. and K.Z.S.) and the Wellcome Trust 216324 (M.M.S., J.M.H., A.K., G.T.M., J.P.N., A.P.V. and K.Z.S.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

Conceptualization: D.B., M.R.W. and L.P.; software package—core development: D.B., M.R.W. and N.R.G.; software package—contribution: C.H. and A.V.; cloud application—development: M.R.W., D.B., R.L. and A.V.; first draft—writing: D.B., M.R.W. and L.P.; first draft—editing: D.B., M.R.W., C.H. and L.P.; data collection: D.B., M.S., J.M.H., A.K., G.T.M., J.P.N., A.P.V., K.Z.S., A.E.U., R.W., D.N. and F.P.; funding: J.P.C., N.S. and L.P.; semi-supervised learning algorithms: D.B., M.R.W., N.R.G. and L.P.; deep ensembling: D.B., M.R.W., C.H. and L.P.; EKS: C.H. and L.P.; TCN: C.H., D.B., M.R.W. and L.P.; diagnostic tools and visualization: D.B., M.R.W. and A.V.; neural network experiments and analysis: D.B. and M.R.W.

Corresponding authors

Correspondence to Dan Biderman or Matthew R. Whiteway.

Ethics declarations

Competing interests

R.S.L. assisted in the initial development of the cloud application as a solution architect at Lightning AI in Spring/Summer 2022. R.S.L. left the company in August 2022 and continues to hold shares. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Nina Vogt, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Unsupervised losses complement model confidence for outlier detection on mirror-fish dataset.

Example traces, unsupervised metrics, and predictions from a DeepLabCut model (trained on 354 frames) on held-out videos. Conventions for panels A-D as in Fig. 3. A: Example frame sequence. B: Example traces from the same video. C: Total number of keypoints flagged as outliers by each metric, and their overlap. D: Area under the receiver operating characteristic curve for several body parts. We define a ‘true outlier’ as any frame in which the horizontal displacement between top and bottom predictions, or the vertical displacement between top and right predictions, exceeds 20 pixels. AUROC values are only shown for the three body parts that have corresponding keypoints across all three views included in the Pose PCA computation (many keypoints are excluded from the Pose PCA subspace due to missing hand labels). AUROC values are computed across frames from 10 test videos; boxplot variability is over n=5 random subsets of training data. The same subset of keypoints is used for panel C. Boxes in panel D show the 25th, 50th and 75th percentiles (lower edge, center line and upper edge); whiskers extend to 1.5 × IQR.
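The outlier-evaluation recipe above can be sketched as follows. The arrays and the 20-pixel threshold are hypothetical, and AUROC is computed here with the Mann-Whitney rank-sum identity rather than the paper's actual analysis code; the point is only to show how a binary 'true outlier' label and a continuous unsupervised metric combine into an AUROC score.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank-sum identity (ties ignored)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = np.asarray(labels, dtype=bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical example: flag a frame as a true outlier when the predicted
# displacement between two views exceeds 20 pixels, then ask how well a
# continuous metric (e.g. a multi-view PCA error) ranks those frames.
displacement = np.array([2.0, 5.0, 30.0, 8.0, 45.0, 12.0])
true_outlier = displacement > 20.0            # [F, F, T, F, T, F]
metric = np.array([1.0, 2.0, 9.0, 3.0, 8.0, 2.5])
print(auroc(metric, true_outlier))            # 1.0: metric ranks both outliers on top
```

An AUROC of 0.5 corresponds to a metric no better than chance at flagging the true outliers; 1.0 means the metric ranks every outlier above every inlier.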

Extended Data Fig. 2 Unsupervised losses complement model confidence for outlier detection on CRIM13 dataset.

Example traces, unsupervised metrics, and predictions from a DeepLabCut model (trained on 800 frames) on held-out videos. Conventions for panels A-C as in Fig. 3. A: Example frame sequence. B: Example traces from the same video. Because CRIM13 frames are larger than those of the mirror-mouse and mirror-fish datasets, we use a threshold of 50 pixels instead of 20 to define outliers through the unsupervised losses. C: Total number of keypoints flagged as outliers by each metric, and their overlap. Outliers are collected from predictions across frames from 18 test videos and across predictions from five different networks trained on random subsets of labeled data.

Extended Data Fig. 3 PCA-derived losses drive most improvements in semi-supervised models.

For each model type we train three networks with different random seeds controlling the data presentation order. The models train on 75 labeled frames and unlabeled videos. We plot the mean pixel error and 95% CI across keypoints and OOD frames, as a function of ensemble standard deviation, as in Fig. 4. At the 100% vertical line, n=17150 keypoints for mirror-mouse, n=18180 for mirror-fish, and n=89180 for CRIM13.

Extended Data Fig. 4 Unlabeled frames improve pose estimation in mirror-fish dataset.

Conventions as in Fig. 4. A. Example traces from the baseline model and the semi-supervised TCN model (trained with 75 labeled frames) for a single keypoint on a held-out video (Supplementary Video 6). B. A sequence of frames corresponding to the grey shaded region in panel (A). C. Pixel error as a function of ensemble standard deviation for scarce (top) and abundant (bottom) labeling regimes. D. Individual unsupervised loss terms plotted as a function of ensemble standard deviation for the scarce (top) and abundant (bottom) labeling regimes.

Extended Data Fig. 5 Unlabeled frames improve pose estimation in CRIM13 dataset.

Conventions as in Fig. 4. A. Example traces from the baseline model and the semi-supervised TCN model (trained with 800 labeled frames) for a single keypoint on a held-out video (Supplementary Video 7). B. A sequence of frames corresponding to the grey shaded region in panel (A). C. Pixel error as a function of ensemble standard deviation for scarce (top) and abundant (bottom) labeling regimes. D. Individual unsupervised loss terms plotted as a function of ensemble standard deviation for the scarce (top) and abundant (bottom) labeling regimes.

Extended Data Fig. 6 The Ensemble Kalman Smoother improves pose estimation across datasets.

We trained an ensemble of five semi-supervised TCN models on the same training data. The networks differed in the order of data presentation and in the random weight initializations for their ‘head’. This figure complements Fig. 5, which uses an ensemble of DeepLabCut models as input to EKS. A. Mean OOD pixel error over frames and keypoints as a function of ensemble standard deviation (as in Fig. 4). B. Time series of predictions (x and y coordinates on top and bottom, respectively) from the five individual semi-supervised TCN models (75 labeled training frames; blue lines) and EKS-temporal (brown lines). Ground truth labels are shown as green dots. C,D. Identical to A,B but for the mirror-fish dataset. E,F. Identical to A,B but for the CRIM13 dataset.

Extended Data Fig. 7 Lightning Pose models and ensemble smoothing improve pose estimation on IBL paw data.

A. Sample frames from each camera view overlaid with a subset of paw markers estimated from DeepLabCut (left), Lightning Pose using a semi-supervised TCN model (center), and a 5-member ensemble using semi-supervised TCN models (right). B. Example left view frames from a subset of 44 IBL sessions. C. The empirical distribution of the right paw position from each view projected onto the 1D subspace of maximal correlation in a canonical correlation analysis (CCA). Column arrangement as in A. D. Correlation in the CCA subspace is computed across n=44 sessions for each model and paw. The LP+EKS model has a correlation of 1.0 by construction. E. Median right paw speed plotted across correct trials aligned to first movement onset of the wheel; error bars show 95% confidence interval across n=273 trials. The same trial consistency metric from Fig. 6 is computed. F. Trial consistency computed across n=44 sessions. G. Example traces of Kalman smoothed right paw speed (blue) and predictions from neural activity (orange) for several trials using cross-validated, regularized linear regression. H. Neural decoding performance across n=44 sessions. Panels D, F, and H use a one-sided Wilcoxon signed-rank test; boxes show the 25th, 50th and 75th percentiles (lower edge, center line and upper edge); whiskers extend to 1.5 × IQR. See Supplementary Table 2 for further quantification of boxes.
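As a numpy-only illustration of the CCA-based consistency metric in panels C and D (hypothetical data, not the paper's analysis code): each view's paw trace is projected onto the direction of maximal cross-view correlation, and the top canonical correlation summarizes how well the two views agree.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-8):
    """Top canonical correlation between two multivariate views.

    X: (n, p), Y: (n, q). Whitens each view, then takes the largest
    singular value of the whitened cross-covariance.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)[0]

# Two 'views' related by a fixed linear map agree almost perfectly;
# independent noise added to one view lowers the correlation.
rng = np.random.default_rng(0)
left = rng.normal(size=(500, 2))               # hypothetical (x, y) trace, left view
right = left @ np.array([[2.0, 0.3], [-0.5, 1.0]])
print(first_canonical_correlation(left, right))
```

This is why the LP+EKS correlation is 1.0 by construction: the smoother outputs a single 3D-consistent trajectory, so its two view projections are exact linear functions of each other.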

Extended Data Fig. 8 Lightning Pose enables easy model development, fast training, and is accessible via a cloud application.

A. Our software package outsources many tasks to existing tools within the deep learning ecosystem, resulting in a lighter, modular package that is easy to maintain and extend. The innermost purple box indicates the core components: accelerated video reading (via NVIDIA DALI), modular network design, and our general-purpose loss factory. The middle purple box denotes the training and logging operations which we outsource to PyTorch Lightning, and the outermost purple box denotes our use of the Hydra job manager. The right box depicts a rich set of interactive diagnostic metrics which are served via Streamlit and FiftyOne GUIs. B. A diagram of our cloud application. The application’s critical components are dataset curation, parallel model training, interactive performance diagnostics, and parallel prediction of new videos. C. Screenshots from our cloud application. From left to right: Label Studio GUI for frame labeling, TensorBoard monitoring of training performance overlaying two different networks, FiftyOne GUI for comparing these two networks’ predictions on a video, and a Streamlit application that shows these two networks’ time series of predictions, confidences, and spatiotemporal constraint violations.

Supplementary information

Supplementary Information

Supplementary Methods, Tables 1–4 and Figs. 1–10.

Reporting Summary

Peer Review File

Supplementary Video 1

DeepLabCut model predictions (631 labeled frames), mirror-mouse dataset, corresponding to traces in Fig. 1b. See Supplementary Fig. 5 for a detailed caption.

Supplementary Video 2

Selected frames with high Pose PCA errors, mirror-mouse dataset. DLC model trained with 631 labeled frames. See Supplementary Fig. 6 for a detailed caption.

Supplementary Video 3

Selected frames with high Pose PCA errors, mirror-fish dataset. DLC model trained with 354 labeled frames. See Supplementary Fig. 6 for a detailed caption.

Supplementary Video 4

Selected frames with high Pose PCA errors, CRIM13 dataset. DLC model trained with 800 labeled frames. See Supplementary Fig. 6 for a detailed caption.

Supplementary Video 5

Baseline versus semi-supervised TCN model predictions (75 labeled frames), mirror-mouse dataset, corresponding to traces in Fig. 4a. See Supplementary Fig. 5 for a detailed caption.

Supplementary Video 6

Baseline versus semi-supervised TCN model predictions (75 labeled frames), mirror-fish dataset, corresponding to traces in Extended Data Fig. 4a. See Supplementary Fig. 5 for a detailed caption.

Supplementary Video 7

Baseline versus semi-supervised TCN model predictions (800 labeled frames), CRIM13 dataset, corresponding to traces in Extended Data Fig. 5a. See Supplementary Fig. 5 for a detailed caption.

Supplementary Video 8

EKS predictions (multi-view PCA), mirror-mouse dataset, corresponding to the session in Supplementary Fig. 2. See Supplementary Fig. 7 for a detailed caption.

Supplementary Video 9

EKS predictions (Pose PCA), IBL-pupil dataset, corresponding to the session in Supplementary Fig. 3. See Supplementary Fig. 7 for a detailed caption.

Supplementary Video 10

EKS predictions (multi-view PCA), IBL-paw dataset, corresponding to the session in Supplementary Fig. 4. See Supplementary Fig. 7 for a detailed caption.

Supplementary Video 11

EKS predictions (multi-view PCA), mirror-fish dataset. See Supplementary Fig. 7 for a detailed caption.

Supplementary Video 12

EKS predictions (temporal), CRIM13 dataset. See Supplementary Fig. 7 for a detailed caption.

Supplementary Video 13

Trial-by-trial markers and traces for IBL-pupil dataset, corresponding to the example session in Fig. 6. See Supplementary Fig. 8 for a detailed caption.

Supplementary Video 14

Trial-by-trial markers and traces for IBL-paw dataset, corresponding to the example session in Extended Data Fig. 7. See Supplementary Fig. 8 for a detailed caption.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Biderman, D., Whiteway, M.R., Hurwitz, C. et al. Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling and cloud-native open-source tools. Nat Methods 21, 1316–1328 (2024). https://doi.org/10.1038/s41592-024-02319-1
