
npj Digital Medicine
  • Article
  • Open access
  • Published: 03 February 2026

Towards accurate and interpretable competency-based assessment: enhancing clinical competency assessment through multimodal AI and anomaly detection

  • Sapir Gershov1,2,
  • Fadi Mahameed3,4,
  • Aeyal Raz3,5 &
  • Shlomi Laufer1,4 

npj Digital Medicine, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Health care
  • Mathematics and computing
  • Medical research

Abstract

Artificial Intelligence (AI) is reshaping medical education, particularly in the domain of competency-based assessment, where current methods remain subjective and resource-intensive. We introduce a multimodal AI framework that integrates video, audio, and patient monitor data to provide objective and interpretable competency assessments. Using data from 90 anesthesia residents, we established “ideal” performance benchmarks and trained an anomaly detection model (MEMTO) to quantify deviations from these benchmarks. Competency scores derived from these deviations showed strong alignment with expert ratings (Spearman’s ρ = 0.78; ICC = 0.75) and demonstrated high ranking precision (Relative L2-distance = 0.12). SHAP analysis revealed that communication and eye contact with the patient monitor are key drivers of variability. By linking AI-assisted anomaly detection with interpretable feedback, our framework addresses critical challenges of fairness, reliability, and transparency in simulation-based education. This work provides actionable evidence for integrating AI into medical training and advancing scalable, equitable evaluation of competence.
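The scoring idea in the abstract — turning each resident's deviation from an "ideal" benchmark into a competency score and checking rank agreement with expert ratings — can be illustrated with a minimal, self-contained sketch. All numbers, the six-resident cohort, and the min-max inversion used here are hypothetical; this is not the authors' MEMTO pipeline, only an illustration of the deviation-to-score-to-correlation logic.

```python
def _ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def competency_scores(deviations):
    """Map deviation-from-benchmark to a 0-1 competency score.

    Illustrative min-max inversion: larger deviation -> lower score.
    """
    lo, hi = min(deviations), max(deviations)
    return [1.0 - (d - lo) / (hi - lo) for d in deviations]

# Hypothetical anomaly-detector deviations for six residents,
# and hypothetical expert ratings on a 1-5 scale.
deviations = [0.12, 0.45, 0.30, 0.08, 0.60, 0.25]
expert = [4.5, 2.0, 3.0, 4.8, 1.5, 3.5]

scores = competency_scores(deviations)
rho = spearman_rho(scores, expert)
print(f"Spearman's rho = {rho:.2f}")
```

In this toy data the expert ratings are perfectly rank-consistent with the inverted deviations, so ρ comes out at 1.0; on real assessments the agreement is of course partial (the paper reports ρ = 0.78). In practice one would also report an intraclass correlation (e.g. via `pingouin.intraclass_corr`) rather than rank correlation alone.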

Data availability

All requests for raw and analyzed data, as well as related materials, will be reviewed by our legal department (Technion–Israel Institute of Technology) to verify whether the request is subject to any intellectual property or confidentiality constraints. Any data and materials that can be shared will be released via a material transfer agreement for noncommercial research purposes. The request should be addressed to S.L.

Code availability

All requests for programming code should be addressed to S.L. Any materials that can be shared will be released via a material transfer agreement for noncommercial research purposes. All analyses were conducted on a computing cluster with two NVIDIA RTX A6000 48GB GPUs. The software environment consisted of Ubuntu 20.04 LTS, Python (v3.11), and PyTorch (v2.0.0).


Acknowledgements

We thank the Technion Autonomous Systems Program for support.

Author information

Authors and Affiliations

  1. Technion Autonomous Systems Program, Technion - Israel Institute of Technology, Haifa, Israel

    Sapir Gershov & Shlomi Laufer

  2. Department of Psychiatry, NYU Grossman School of Medicine, New York, NY, USA

    Sapir Gershov

  3. Rambam Health Care Campus, Haifa, Israel

    Fadi Mahameed & Aeyal Raz

  4. Faculty of Data and Decision Sciences, Technion - Israel Institute of Technology, Haifa, Israel

    Fadi Mahameed & Shlomi Laufer

  5. Rappaport Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel

    Aeyal Raz


Contributions

S.L. was responsible for the conceptualization and overall supervision of the study, as well as securing the funding. S.G. led the formal analysis, methodology development, and visualization, and drafted the original manuscript. S.G. also performed validation and final editing under the guidance of S.L. and A.R., who provided critical feedback and contributed to the review and editing process. F.M. conducted data curation and was responsible for data collection and project administration. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Shlomi Laufer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Gershov, S., Mahameed, F., Raz, A. et al. Towards accurate and interpretable competency-based assessment: enhancing clinical competency assessment through multimodal AI and anomaly detection. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-025-02299-2


  • Received: 11 September 2025

  • Accepted: 17 December 2025

  • Published: 03 February 2026

  • DOI: https://doi.org/10.1038/s41746-025-02299-2


Associated content

Collection

Transforming Medical Education through Artificial Intelligence
