Perspective

General-purpose foundation models for increased autonomy in robot-assisted surgery

Abstract

The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem, such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise when such models are trained on large collections of diverse, task-agnostic datasets of video demonstrations. These models generalize impressively to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: large-scale open-source data on which to train models are lacking; the soft-body deformations these robots encounter during surgery are difficult to model, because simulation cannot match the physical and visual complexity of biological tissue; and surgical robots risk harming patients when tested in clinical trials, so they require more extensive safety measures. This Perspective charts a path towards increased robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision–language–action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models, and we provide four guiding actions towards increased autonomy in robot-assisted surgery.
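To make the proposed interface concrete, the sketch below shows, in Python, the shape of a vision–language–action (VLA) policy: a camera frame and a language instruction go in, discretized action tokens come out and are decoded into a continuous robot command. This is a minimal illustration, not the architecture proposed here (see Fig. 1); the class name, token vocabulary, action dimensionality and the example instruction are all assumptions made for the sketch.

```python
# Illustrative sketch of a vision-language-action (VLA) policy interface.
# All names, sizes and the instruction below are hypothetical assumptions;
# only the overall pattern (image + text in, action tokens out) follows
# the VLA paradigm discussed in the text.
import numpy as np

VOCAB_SIZE = 256   # assumed size of the discretized action vocabulary
ACTION_DIM = 7     # assumed: 6-DoF end-effector delta + gripper state


def discretize_action(action, low=-1.0, high=1.0, bins=VOCAB_SIZE):
    """Map a continuous action vector to integer tokens, so that actions
    can share a transformer's output vocabulary."""
    clipped = np.clip(action, low, high)
    return ((clipped - low) / (high - low) * (bins - 1)).astype(np.int64)


def detokenize_action(tokens, low=-1.0, high=1.0, bins=VOCAB_SIZE):
    """Invert the discretization back into a continuous robot command."""
    return tokens.astype(np.float64) / (bins - 1) * (high - low) + low


class DummyVLAPolicy:
    """Stand-in for a trained VLA transformer. A real model would attend
    over image patches and instruction tokens; this stub returns random
    tokens purely to exercise the interface."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def predict_tokens(self, image, instruction):
        assert image.ndim == 3, "expects an H x W x 3 endoscope frame"
        return self.rng.integers(0, VOCAB_SIZE, size=ACTION_DIM)


# Training side: a demonstration action is tokenized into the shared vocabulary.
demo_action = np.array([0.1, -0.2, 0.0, 0.0, 0.0, 0.0, 1.0])
demo_tokens = discretize_action(demo_action)

# Inference side: frame + instruction in, continuous command out.
policy = DummyVLAPolicy()
frame = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder camera frame
tokens = policy.predict_tokens(frame, "clip the cystic duct")
command = detokenize_action(tokens)               # would go to the controller
print(demo_tokens, command)
```

Discretizing actions into tokens is what lets a single sequence model treat vision, language and action uniformly, which in turn is what makes training on large, task-agnostic demonstration corpora possible.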


Fig. 1: An architecture diagram of the proposed vision–language–action robot transformer.
Fig. 2: A proposed control loop for the autonomous RT-RAS.
Fig. 3: Outline of the two-step pre-training for the RT-RAS.
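As a companion to the control loop of Fig. 2, the sketch below illustrates one plausible supervised-autonomy loop: the policy executes actions while it is confident and hands authority back to the surgeon otherwise. The gating rule, the threshold and the stub classes are assumptions made for this illustration and are not taken from the figure.

```python
# Illustrative supervised-autonomy loop in the spirit of Fig. 2. The
# confidence gate, threshold and stubs below are hypothetical; a deployed
# system would need a calibrated uncertainty estimate (e.g. conformal
# prediction) and a safe authority-transfer protocol.
import numpy as np

CONFIDENCE_THRESHOLD = 0.8   # assumed cut-off for autonomous execution


class StubPolicy:
    """Stands in for the RT-RAS: returns a continuous action together
    with a scalar confidence in [0, 1]."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def predict(self, frame, instruction):
        action = self.rng.uniform(-1.0, 1.0, size=7)  # 6-DoF delta + gripper
        confidence = float(self.rng.uniform(0.0, 1.0))
        return action, confidence


def control_step(policy, frame, instruction, execute, teleoperate):
    """One tick of the loop: act autonomously if confident, else defer."""
    action, confidence = policy.predict(frame, instruction)
    if confidence >= CONFIDENCE_THRESHOLD:
        execute(action)        # the model keeps control
    else:
        teleoperate()          # authority transfers back to the surgeon


policy = StubPolicy()
frame = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder endoscope frame
control_step(policy, frame, "retract the gallbladder",
             execute=lambda a: print("autonomous step:", np.round(a, 2)),
             teleoperate=lambda: print("deferring to the surgeon"))
```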



Acknowledgements

This material is based on work supported by the National Science Foundation under grant numbers DGE 2139757, NSF/FRR 2144348, NIH R56EB033807 and ARPA-H AY1AX000023.

Author information

Corresponding author

Correspondence to Samuel Schmidgall.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Francesco Stella and Zhen Li for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Schmidgall, S., Kim, J.W., Kuntz, A. et al. General-purpose foundation models for increased autonomy in robot-assisted surgery. Nat Mach Intell 6, 1275–1283 (2024). https://doi.org/10.1038/s42256-024-00917-4

