Beyond translation: a patient-centered research agenda for artificial intelligence interpreter services in healthcare

Lynch, Olivia F.; Witt, Emily E.; Fernandez, Alicia; Rodriguez, Jeslyn A.; Montoya Rubiano, Maria Alejandra; Burstin, Helen; Sheridan, Susan; Hightower, Maia; Bates, David W.; Johnson, Kevin B.; Bergmark, Regan W.; Ortega, Gezzer

doi:10.1038/s41746-026-02764-6

Comment
Open access
Published: 15 May 2026

Olivia F. Lynch^1,2,
Emily E. Witt¹,
Alicia Fernandez³,
Jeslyn A. Rodriguez⁴,
Maria Alejandra Montoya Rubiano¹,
Helen Burstin⁵,
Susan Sheridan⁶,
Maia Hightower^7,8,
David W. Bates^9,10,11,
Kevin B. Johnson¹²,
Regan W. Bergmark^1,13 &
Gezzer Ortega^1,13,14

npj Digital Medicine volume 9, Article number: 376 (2026)

Artificial intelligence (AI) is increasingly utilized in healthcare, including in language access services, but certain aspects remain understudied. We offer a research agenda to guide the development of evidence on how AI language access services are perceived by patients and how they impact trust and comprehension in clinical encounters, and to inform implementation strategies. We recommend a governance system to mitigate potential harm and capitalize on benefits for patients with a non-English language preference.

Introduction

Artificial intelligence (AI) is rapidly reshaping medicine. From guiding diagnostic decisions to supporting patient-facing applications to generating clinical documentation, AI systems now influence multiple facets of healthcare delivery^1,2,3. This expansion reflects the rapid maturation of machine-learning models and their integration into routine clinical workflows. Interpreter services are a key area of healthcare ripe for implementation of AI tools. However, it remains unclear whether academic research and rigorous evaluations have kept pace with the emergence of new applications of AI, particularly those that are patient-facing.

Using four AI-enabled knowledge platforms, ChatGPT, Perplexity, Gemini, and OpenEvidence, we ran structured exploratory queries to generate estimates of the number of publications from the past five years addressing AI in healthcare or AI in health services. We then narrowed our query to those addressing interpreter services, and finally to those including patient perspectives, patient experience, or patient-reported outcomes. Outputs meeting all three concepts were classified as “all criteria.” Results were reported verbatim as returned by each platform. We found that AI-related healthcare publications increased from ~11,500 in 2019 to over 28,000 in 2024 (Fig. 1). Despite this rapid expansion, research specifically exploring the impact of AI on services that shape the patient experience, including interpreter services for patients with a non-English language preference (NELP), remains scarce (Table 1).

Table 1 Evidence gap: publications on AI interpreter services and inclusion of patient perspectives (2020–2025^a)

Lessons from algorithmic bias in healthcare

Our results in Table 1 highlight a pressing research and policy need. When deployed without thorough evaluation across diverse linguistic and cultural groups, AI-supported communication tools may introduce or amplify inequities.

Recent work has documented communication-specific risks across language technologies. Automated speech recognition (ASR) systems show reduced accuracy for certain accents and dialects, while neural machine translation (NMT) models demonstrate variable performance across languages and clinical contexts^4,5,6,7. A recent systematic review of NMT technologies in healthcare settings reported substantial variability in translation accuracy and emphasized the need for systematic evaluation before clinical implementation⁸. The limitations of existing language technologies underscore the need for context-specific assessment of AI-based interpreter systems prior to use in high-stakes communication.

Currently, there is no established methodology to ensure the accuracy of AI-based interpretation. As new technologies emerge that promise to aid interpretation, it is essential that we develop clear metrics with which to assess their accuracy. Without careful evaluation of these novel technologies’ performance across languages and cultural contexts, they risk increasing communication-based disparities in clinical encounters.

The emerging role of AI in interpreter services

AI-powered language access tools, including ASR, real-time translation or interpretation systems, and video-based avatars, are increasingly being considered as alternatives to traditional interpretive services, which typically rely on in-person or phone/video-based live human interpretation.

To better understand the role of these tools in clinical communication, it is important to distinguish between translation and interpretation, as they represent different communication tasks. Translation refers primarily to the conversion of written text from one language to another, whereas interpretation refers to the real-time conversion of spoken language during an interaction between individuals who speak different languages. Although emerging AI systems can approximate real-time speech-to-speech communication by combining speech recognition, machine translation, and audio synthesis, these systems largely rely on machine translation pipelines rather than human interpretive processes used in professional medical interpretation^6,7,8,9. The technology may lack some of the benefits afforded by in-person interpretation, which is particularly valuable in dynamic, high-stakes clinical encounters where meaning must be conveyed accurately while accounting for tone, ambiguity, emotional nuance, and conversational context^6,7.

In parallel, a two-wave cross-sectional survey of clinicians demonstrated growing expectations for diagnostic AI tools and emphasized the importance of usability and workflow integration¹⁰. Although these studies do not evaluate AI-based interpretation directly, they underscore the accelerating pace of technological adoption and the need for careful evaluation frameworks before deploying such tools in interpreter-mediated, high-stakes communication. AI interpretation tools promise cost savings and expanded access in under-resourced clinical settings, especially for less common languages for which human interpreters may not readily be available. However, their performance in complex, high-stakes clinical encounters remains poorly understood. A 2024 systematic review found that AI interpretation tools performed best in simple, low-risk interactions¹¹. Real-world medicine often involves emotionally complex discussions around diagnoses, treatment options, or end-of-life care, in which appreciating cultural nuance and maintaining patient trust are essential. Using physician-coded clinical severity ratings, in which harm was defined as a mistranslation that could cause a patient to take or fail to take an action that would delay or endanger care, a recent study found that ~5% of discharge instruction translations contained at least one error with potential for clinically significant or life-threatening harm, with high inter-rater reliability (kappa 86–97%)¹². Based on these findings, continual assessment and oversight are necessary when AI is used in higher-risk healthcare communications.
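For readers less familiar with the agreement statistic mentioned above, the following is a minimal Python sketch of Cohen's kappa for two raters. The function name `cohens_kappa` and the severity codes are hypothetical and invented for illustration; this is not the cited study's data or analysis code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented severity codes for ten translated instructions (illustration only).
rater_a = ["none", "none", "harm", "none", "harm", "none", "none", "harm", "none", "none"]
rater_b = ["none", "none", "harm", "none", "none", "none", "none", "harm", "none", "none"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw percent agreement for the agreement two raters would reach by chance, which matters when one category (here, "none") dominates.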

As they exist now, AI-based language tools may be most appropriate for a limited set of scenarios: low-risk written translation of patient-facing documents (e.g., hospital navigation instructions or general patient education materials); communication in languages that current AI systems commonly support, when certified interpreters are not immediately available; temporary support while connection to a live interpreter is being arranged; and, as the technology continues to evolve, encounters involving rare languages. Even in these settings, their use should be approached cautiously and evaluated for accuracy, safety, and patient acceptability^11,12.

Current gaps in the literature

Despite the explosion of AI-related healthcare research, studies that investigate the patient perspective are exceedingly rare. The few that have explored this topic have generally found patient apprehension regarding AI’s safety and efficacy¹³. While patients appear to be enthusiastic about AI’s potential, they continue to have reservations about particular use cases^13,14. As our query found, fewer than 0.4 percent of AI-in-healthcare publications mention patient perspectives and no publications about interpreter services do so. Querying ChatGPT¹⁵ yielded only 20–30 papers in 2024 that mention AI and interpreter services, and only one or two discussing patient experience (Table 1). Using the same search criteria, OpenEvidence¹⁶ surfaced just a single 2024 article (Barwise et al.)¹⁷ that focused on patient perspectives but did not include patient-reported outcomes. Importantly, patient-reported outcome measures (PROMs) require formal processes of translation, back-translation, and psychometric validation, and therefore represent a distinct methodological consideration from interpretation or translation in clinical encounters. Gemini¹⁸ and Perplexity¹⁹ returned no papers meeting all three criteria. Table 1 displays the output from this search. Notably, the models differ in their levels of certainty, displaying numerical ranges or using the words “sparse,” “none identified,” or “no match.” The convergence of these independent sources points to a systemic blind spot: the rapid technical development of AI interpreter tools is not being matched by research evaluating their impact on communities with NELP. Addressing this research gap is a prerequisite for optimal performance and equitable adoption. Further, using AI-based research assistants (ChatGPT, Perplexity, Gemini, and OpenEvidence) to generate comparative literature estimates represents a novel methodological approach for assessing gaps in emerging fields.
As more researchers employ AI for these purposes, we believe transparency about these methods is critical for evaluating the accuracy of their results against traditional research methods. Our current understanding of whether AI interpretation helps or hinders patient comprehension, enhances or erodes provider trust, and improves or worsens disparities in care delivery is lacking.

Language access as a health equity imperative

Language represents a foundational component of effective communication and health equity. Patients with NELP experience higher rates of misdiagnosis, poor comprehension of treatment plans, and dissatisfaction with care²⁰. Language barriers are linked to increased emergency room use, lower adherence to medications, and reduced engagement in shared decision-making^20,21. Interpreter services are integral to safe, equitable, high-quality care. Furthermore, interpreter services are federally mandated. Compliance with the Emergency Medical Treatment and Labor Act (EMTALA) necessitates access to interpreters, and Section 1557 of the Affordable Care Act requires “meaningful access” to medical care for patients with limited English proficiency, which necessitates available interpreters^22,23.

Extensive evidence demonstrates that certified medical interpreters improve communication accuracy, reduce clinical errors, enhance shared decision-making, and in some cases decrease hospital length of stay and readmission rates^24,25. These well-documented benefits underscore the risks of substituting trained interpreters with emerging AI-based tools that have not yet undergone validation in high-stakes clinical settings. The adoption of AI tools should not be driven solely by cost or availability. Replacing trained interpreters with unvalidated technologies risks miscommunication in critical moments, especially if the technology is not perceived by patients as trustworthy or accurate.

A patient-centered research agenda

A robust patient-centered research agenda is essential to guide the successful utilization of AI interpreter services.

First, careful evaluation of accuracy and safety is critical. Comparative-effectiveness studies should benchmark AI systems against certified human interpreters on clinical-communication accuracy, shared-decision-making scores, visit length, cost-utility, and performance in simulated high-stakes scenarios^24,25,26. Accuracy alone, however, is insufficient to determine whether these tools are appropriate for real-world clinical environments.

Second, research must examine patient perception and trust. Mixed-methods studies should capture how patients with NELP perceive and trust these tools, such as combining post-visit patient experience surveys with qualitative interviews, near-real-time smartphone experience-sampling, and multilingual sentiment analysis. All outcomes should be stratified by language, health-literacy level, and socioeconomic status to illuminate intersectional inequities.

Third, feasibility and usability must be evaluated. Early assessments should examine ease of use, learnability, efficiency within clinical workflows, perceived usefulness among clinicians and patients, acceptability, task-completion error rates, and integration with existing communication practices.

Fourth, systems for continuous error monitoring are needed. This may include human-in-the-loop annotation pipelines through which bilingual clinicians flag clinically significant mistranslations, real-time confidence scoring that escalates uncertain outputs to live interpreters, and electronic health record (EHR)-embedded safety dashboards that track usage, error type, and override rates, mirroring established pharmacovigilance models^27,28,29.
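One way to picture the confidence-based escalation described above is the following minimal Python sketch. The `Segment` fields, the `route` function, and the 0.90 threshold are hypothetical choices made purely for illustration; they do not describe any existing product, and no validated cutoff is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    confidence: float          # hypothetical model-reported score in [0, 1]
    high_stakes: bool = False  # e.g., consent or medication dosing (assumed flag)


def route(segment, threshold=0.90):
    """Return 'ai' or 'live_interpreter' for one utterance.

    The threshold is an arbitrary placeholder, not a validated cutoff.
    """
    if not 0.0 <= segment.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # High-stakes content is always escalated, regardless of the score.
    if segment.high_stakes or segment.confidence < threshold:
        return "live_interpreter"
    return "ai"


# Invented utterances for illustration only.
segments = [
    Segment("The cafeteria is on the second floor.", 0.97),
    Segment("Take two tablets every six hours.", 0.95, high_stakes=True),
    Segment("Do you have any allergies?", 0.72),
]
routes = [route(s) for s in segments]
# Share of utterances escalated: one candidate metric for a safety dashboard.
escalation_rate = routes.count("live_interpreter") / len(routes)
print(routes, round(escalation_rate, 2))
```

In this sketch the high-stakes flag overrides the score, on the assumption that a confident but wrong output is most costly in exactly those exchanges.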

Fifth, equity-focused implementation science, including interrupted time-series analyses of quality metrics and geographically diverse pragmatic studies, must evaluate whether AI interpreter services narrow or widen disparities in adverse events, readmissions, and patient-reported outcomes^24,25,30.
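As a rough illustration of the interrupted time-series idea, the Python sketch below fits separate straight lines before and after a rollout month and reports the change in level and slope at the interruption. The function names, the rollout month, and the monthly values are all synthetic and hypothetical. A real analysis would also need to address autocorrelation, seasonality, and comparison groups, which this sketch omits.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one segment."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def segmented_fit(times, values, start):
    """Level and slope change at `start`, from separate pre/post line fits."""
    pre = [(t, v) for t, v in zip(times, values) if t < start]
    post = [(t, v) for t, v in zip(times, values) if t >= start]
    if len(pre) < 2 or len(post) < 2:
        raise ValueError("need at least two observations on each side")
    pre_slope, pre_int = fit_line(*zip(*pre))
    post_slope, post_int = fit_line(*zip(*post))
    # Level change: gap between the two fitted lines at the rollout time.
    level_change = (post_int + post_slope * start) - (pre_int + pre_slope * start)
    return level_change, post_slope - pre_slope


# Synthetic monthly metric (illustration only): rising trend, then a drop of 2
# and a slope reduction of 0.25 once a hypothetical tool is rolled out at month 6.
months = list(range(12))
values = [10.0 + 0.5 * t + ((-2.0 - 0.25 * (t - 6)) if t >= 6 else 0.0) for t in months]
level_change, slope_change = segmented_fit(months, values, start=6)
print(level_change, slope_change)
```

Running the same fit separately for each language group would be one way to examine whether a change narrows or widens disparities, in line with the stratification proposed above.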

Finally, participatory design and strong governance frameworks are key. Establishing community advisory boards of patients with NELP, professional interpreters, and cultural brokers can ensure that system development reflects community needs. As these technologies evolve, health-system oversight councils with diverse representation and authority over high-risk deployments will be necessary, alongside policy advocacy for modality-neutral interpreter reimbursement to prevent premature substitution of certified interpreters^27,28,29,30.

Ultimately, successful implementation will depend not only on patient-centered evidence, but also on institutional governance that proactively identifies and mitigates algorithmic risk, an area where physician-informaticists and health-system leaders play a pivotal role³¹ (Table 2).

Table 2 Patient-centered research agenda items

Conclusion

AI shows significant promise in interpreter services but also carries risk. As healthcare systems increasingly consider AI-mediated language tools, patient-centered evidence will be essential to understand their impact on patient experience and clinical outcomes. We must prioritize high-quality, inclusive research that assesses the impact of these technologies for patients with NELP and centers the voices of those patients.

Importantly, discussion about AI interpreter services should also include the perspectives of professional medical interpreters, whose expertise in linguistic nuance, cultural mediation, and clinical communication is essential to safe and equitable care.

Finally, it will be critical to evaluate how accurate interpretation in clinical encounters influences trust and engagement in the healthcare system. Will these technologies ultimately strengthen or weaken trust in healthcare? Only through careful research can we ensure that emerging language technologies advance, rather than erode, equitable care.

Data availability

No datasets were generated or analyzed during the current study.

References

1. Sahni, N. R. & Carrus, B. Artificial intelligence in U.S. health care delivery. N. Engl. J. Med. 389, 348–358 (2023).
2. Haug, C. J. & Drazen, J. M. Artificial intelligence and machine learning in clinical medicine, 2023. N. Engl. J. Med. 388, 1201–1208 (2023).
3. Brunyé, T. T., Mitroff, S. R. & Elmore, J. G. Artificial intelligence and computer-aided diagnosis in diagnostic decisions: 5 questions for medical informatics and human-computer interface research. J. Am. Med. Inform. Assoc. ocaf123, https://doi.org/10.1093/jamia/ocaf123 (2025).
4. Colacci, M. et al. Sociodemographic bias in clinical machine learning models: a scoping review of algorithmic bias instances and mechanisms. J. Clin. Epidemiol. 178, 111606 (2025).
5. Ferryman, K., Mackintosh, M. & Ghassemi, M. Considering biased data as informative artifacts in AI-assisted health care. N. Engl. J. Med. 389, 833–838 (2023).
6. Ng, J. J. W. et al. Evaluating the performance of artificial intelligence-based speech recognition for clinical documentation: a systematic review. BMC Med. Inf. Decis. Mak. 25, 236 (2025).
7. Xu, Z. et al. Voice for all: evaluating the accuracy and equity of automatic speech recognition systems in transcribing patient communications in home healthcare. Stud. Health Technol. Inf. 329, 1904–1906 (2025).
8. Karakus, I. et al. Bridging language gaps in healthcare: a systematic review of the practical implementation of neural machine translation technologies in clinical settings. J. Am. Med. Inform. Assoc. ocaf150, https://doi.org/10.1093/jamia/ocaf150 (2025).
9. Singh, K., Prabhu, A. & Kaur, N. The impact and role of artificial intelligence (AI) in healthcare: a systematic review. Curr. Top. Med. Chem. CTMC-EPUB-146975, https://doi.org/10.2174/0115680266339394250225112747 (2025).
10. Cabral, B. P. et al. Future use of AI in diagnostic medicine: 2-wave cross-sectional survey study. J. Med. Internet Res. 27, e53892 (2025).
11. Genovese, A. et al. Artificial intelligence in clinical settings: a systematic review of its role in language translation and interpretation. Ann. Transl. Med. 12, 117 (2024).
12. Kong, M. et al. Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis. BMJ Qual. Saf. 0, 1–9 (2025).
13. Richardson, J. P. et al. Patient apprehensions about the use of artificial intelligence in healthcare. NPJ Digit Med. 4, 140 (2021).
14. Young, A. T. et al. Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digit Health 3, e599–e611 (2021).
15. OpenAI. ChatGPT (August 6 version) [Large Language Model] (OpenAI, 2025).
16. OpenEvidence. OpenEvidence [AI research assistant] (OpenEvidence, 2025).
17. Barwise, A. K. et al. Using artificial intelligence to promote equitable care for inpatients with language barriers and complex medical needs: clinical stakeholder perspectives. J. Am. Med. Inform. Assoc. 31, 611–621 (2024).
18. Google. Gemini (Aug 6 version) [Large language model] (Google, 2025).
19. Perplexity. Perplexity.ai (Aug 6 version) [AI search engine] (Perplexity, 2025).
20. Pandey, M. et al. Impacts of English language proficiency on healthcare access, use, and outcomes among immigrants: a qualitative study. BMC Health Serv. Res 21, 741 (2021).
21. Sarver, J. & Baker, D. W. Effect of language barriers on follow-up appointments after an emergency department visit. J. Gen. Intern. Med. 15, 256–264 (2000).
22. State Operations Manual, Appendix V - Interpretive Guidelines - Responsibilities of Medicare Participating Hospitals in Emergency Cases (Centers for Medicare and Medicaid Services, 2019).
23. Rainer, M. F. Language Access Provisions of the Final Rule Implementing Section 1557 of the Affordable Care Act (Department of Health and Human Services, Office for Civil Rights, 2024).
24. Karliner, L. S. et al. Do professional interpreters improve clinical care for patients with limited English proficiency? A systematic review of the literature. Health Serv. Res. 42, 727–754 (2007).
25. Lindholm, M. et al. Professional language interpretation and inpatient length of stay and readmission rates. J. Gen. Intern. Med. 27, 1294–1299 (2012).
26. Radu, I. et al. Digital health for migrants, ethnic and cultural minorities and the role of participatory development: a scoping review. Int. J. Environ. Res. Public Health 20, 6962 (2023).
27. Selbst, A. D. & Barocas, S. The intuitive appeal of explainable machines. Fordham Law Rev. 87, 1085–1139 (2019).
28. American Medical Association. CPT Code Set: Language Interpreter Services Proposed (American Medical Association, 2024).
29. OSTP, U. S. Blueprint for an AI Bill of Rights: Technical Companion (OSTP, U. S., 2022).
30. Beaton, D. E. et al. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 25, 3186–3191 (2000).
31. Obermeyer, Z. et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).

Acknowledgements

G.O. is supported by the NIH NIMHD under award number K23MD016129, the Brigham and Women’s Hospital Center for Academic Development and Enrichment Faculty Career Development Award, the H. Richard Nesson Fellowship at Brigham and Women’s Hospital, and the Gordon and Betty Moore Foundation in partnership with the Council of Medical Specialty Societies through the National Academy of Medicine Scholars in Diagnostic Excellence program. This work was supported by G.O.’s funding. G.O. and R.W.B. also disclosed grant funding from Mass General Brigham to support language-concordant surgical care in otolaryngology–head and neck surgery, which is unrelated to the submitted work. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Center for Surgery and Public Health, Mass General Brigham, Boston, MA, USA
Olivia F. Lynch, Emily E. Witt, Maria Alejandra Montoya Rubiano, Regan W. Bergmark & Gezzer Ortega
National Clinician Scholars Program, Yale School of Medicine, New Haven, CT, USA
Olivia F. Lynch
University of California San Francisco, San Francisco, CA, USA
Alicia Fernandez
Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Jeslyn A. Rodriguez
Council of Medical Specialty Societies, Washington, DC, USA
Helen Burstin
Patients for Patient Safety, Atlanta, GA, USA
Susan Sheridan
University of Chicago Medicine, Chicago, IL, USA
Maia Hightower
Equality AI, Salt Lake City, UT, USA
Maia Hightower
Division of General Internal Medicine, Department of Medicine, Mass General Brigham, Boston, MA, USA
David W. Bates
Department of Medicine, Harvard Medical School, Boston, MA, USA
David W. Bates
Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, MA, USA
David W. Bates
University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Kevin B. Johnson
Department of Surgery, Harvard Medical School, Boston, MA, USA
Regan W. Bergmark & Gezzer Ortega
Patient Reported Outcomes, Value, and Experience (PROVE) Center, Mass General Brigham, Boston, MA, USA
Gezzer Ortega

Contributions

Conception (O.F.L., E.E.W., A.F., J.R., H.B., S.S., M.H., D.W.B., K.J., R.W.B., G.O.); study design (O.F.L., E.E.W., A.F., J.R., H.B., S.S., M.H., D.W.B., K.J., R.W.B., G.O.); data collection (O.F.L., E.E.W., A.F., J.R., H.B., S.S., M.H., D.W.B., K.J., R.W.B., G.O.); data analysis (O.F.L., E.E.W., A.F., J.R., H.B., S.S., M.H., D.W.B., K.J., R.W.B., G.O.); manuscript drafting (O.F.L., E.E.W., A.F., J.R., M.A.M.R., H.B., S.S., M.H., D.W.B., K.J., R.W.B., G.O.). All authors critically reviewed the manuscript.

Corresponding author

Correspondence to Gezzer Ortega.

Ethics declarations

Competing interests

R.W.B. discloses unrelated clinical trial grant funding from I-Mab Biopharma and unrelated research consulting funding from Analysis Group. D.W.B. has received consulting fees and/or stock options from AESOP, FeelBetter, Guided Clinical Solution, ValeraHealth, Clew, and MDClone, as well as consulting fees from Relyens, all outside the submitted work. All other authors declare no competing interests. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lynch, O.F., Witt, E.E., Fernandez, A. et al. Beyond translation: a patient-centered research agenda for artificial intelligence interpreter services in healthcare. npj Digit. Med. 9, 376 (2026). https://doi.org/10.1038/s41746-026-02764-6

Received: 24 September 2025
Accepted: 06 May 2026
Published: 15 May 2026
Version of record: 15 May 2026
DOI: https://doi.org/10.1038/s41746-026-02764-6