Abstract
Most ambient AI medical scribes process audio only, omitting clinically important visual details. We developed a vision-enabled AI scribe using Google’s Gemini model and Ray-Ban Meta smart glasses to document medication histories, a task requiring both audio and visual input. Ten clinical pharmacists video-recorded 110 simulated medication history interviews. Following iterative prompt engineering on 10 training recordings, the scribe was evaluated on 100 test recordings (2,160 data points) across patient details and medication-specific fields. The vision-enabled scribe achieved 98% overall accuracy (2,114/2,160 data points), ranging from 96% for patient details to 99% for dosing directions and indication. Video input significantly outperformed audio-only processing (98% vs 81%, P < 0.001), primarily through reduced omissions (10 vs 358 errors). Vision-enabled AI scribes substantially improved documentation accuracy for tasks requiring visual input, demonstrating potential to markedly reduce omission errors in clinical documentation.
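As a quick arithmetic check on the headline figure, the sketch below recomputes overall accuracy from the counts reported in the abstract (2,114 correct out of 2,160 data points). The function and variable names are illustrative assumptions and are not taken from the authors' code.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage of correctly documented data points."""
    if total <= 0:
        raise ValueError("total must be a positive integer")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


# Counts reported in the abstract for the vision-enabled scribe.
TOTAL_DATA_POINTS = 2160
VISION_CORRECT = 2114

vision_accuracy = accuracy_percent(VISION_CORRECT, TOTAL_DATA_POINTS)
vision_errors = TOTAL_DATA_POINTS - VISION_CORRECT  # all error types combined

print(f"Vision-enabled accuracy: {vision_accuracy:.1f}%")  # 97.9%, reported as 98%
print(f"Vision-enabled errors (all types): {vision_errors}")  # 46
```

The 10 omission errors quoted for video input are a subset of these 46 total errors. The abstract does not give the audio-only correct count, so it is not recomputed here.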
Data availability
All data, including the patient and medication details used in the simulated cases and the corresponding AI outputs, are available at https://zenodo.org/records/17032178.
Code availability
The Python code used to create the vision-enabled AI scribe and the R scripts used to generate the manuscript and supplementary tables and figures are available in the Vision-Enabled-AI-Scribe repository (https://github.com/MenzBD/Vision-Enabled-AI-Scribe/tree/main). The data file is restricted to public access, and is available to the reviewers or editor(s) while under peer review.
Acknowledgements
The PhD scholarship of B.D.M. is supported by the National Health and Medical Research Council, Australia (APP2030913). A.M.H. holds an Emerging Leader Investigator Fellowship from the National Health and Medical Research Council, Australia (APP2008119). M.J.S. is supported by a Beat Cancer Research Fellowship from the Cancer Council South Australia. S.B. is supported by a Fulbright Scholarship. The funding organisations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. Disclosures: The authors used ChatGPT, Claude, and Grammarly AI to assist in formatting and editing the manuscript. Figures 1 and 3 were generated using BioRender.
Author information
Contributions
Concept and study design—B.M., A.M.H., M.J.S. Creation of AI scribe—B.M. Data collection—B.M., N.L.S., N.D.M., E.C., L.X.L., J.Q.E.T., J.G., D.M., D.K., K.D., V.M. Statistical analysis—B.M., A.M.H., M.J.S. Generation of figures—B.M. Interpretation of results—B.M., N.L.S., N.D.M., E.C., L.X.L., J.Q.E.T., J.G., D.M., D.K., K.D., V.M., S.B., R.A.M., M.D.W., A.R., M.J.S., A.M.H. Manuscript writing—B.M., A.M.H. Manuscript editing and review—B.M., N.L.S., N.D.M., E.C., L.X.L., J.Q.E.T., J.G., D.M., D.K., K.D., V.M., S.B., R.A.M., M.D.W., A.R., M.J.S., A.M.H. Supervision—A.M.H., A.R., M.J.S. Funding acquisition—B.M., A.M.H. All authors approved submission of the final manuscript.
Ethics declarations
Competing interests
A.M.H. is a recipient of investigator-initiated funding for research outside the scope of the current study from Boehringer Ingelheim. A.R. and M.J.S. are recipients of investigator-initiated funding for research outside the scope of the current study from AstraZeneca, Boehringer Ingelheim, Pfizer and Takeda. A.R. is a recipient of speaker fees from Boehringer Ingelheim and Genentech. The author team have no other potential conflicts of interest with respect to this research and/or publication to declare.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Menz, B.D., Scarfo, N.L., Modi, N.D. et al. Vision-Enabled AI scribes reduce omissions in clinical conversations: evidence from simulated medication histories. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02494-9
DOI: https://doi.org/10.1038/s41746-026-02494-9