Unequal mentorship: an overlooked driver of inequity

Mentorship is part of the “hidden curriculum”1,2,3. It reflects the institutional and interpersonal influences that shape learning beyond formal instruction, encompassing values, norms, and opportunities embedded in the culture and organization of medical training4. Mentorship occurs in multiple forms—academic, clinical, research, and psychosocial—whose meanings and expectations differ across contexts and stages of education3,5. In this paper, the term refers mainly to formative academic and professional mentorship that supports learners’ reasoning, research engagement, and career development during medical education.

Mentorship can influence who feels confident, who tries research, and whom teachers notice as “promising”2,3,6,7. In our view, it matters as much as lectures or exams, but we rarely treat it that way. In many low- and middle-income countries (LMICs), one clinical teacher may supervise forty or more students, making individual attention impossible8,9,10. This normalizes inequity: students get used to learning without personal guidance.

This gap is not random. Faculty shortages, heavy service duties, and unequal resources make mentorship scarce6,11,12,13,14,15,16. When mentorship is rare, the system tends to reward those who are already visible: students at elite schools, those fluent in dominant languages, or those with high exam scores, while LMIC institutions with fewer resources fall further behind. Traditional exams and Objective Structured Clinical Examinations (OSCEs) reinforce this inequity because they measure only current performance, not potential for growth6,17,18,19. For medical trainees, this means that critical skills like diagnostic reasoning and bedside communication are often shaped by access to mentors rather than by curriculum quality alone. This may contribute to a reinforcing structural pattern in which underserved regions produce fewer specialists and academic leaders over time.

What LLMs make visible

Research shows even childhood essays can predict learning and life outcomes decades later20,21,22,23. This suggests that language is more than words—it is a signal of how people think and learn. In medical education, reflective writing and case-based reasoning notes serve a similar function: they reveal how students synthesize evidence, empathy, and uncertainty in clinical contexts24,25. In universities, LLM-based tools can predict student performance early, even when no past grades exist26,27,28,29. This means they can give teachers a head start in noticing who might need help. For LMIC schools that cannot rely on long-term data systems, this ability to make early predictions from simple student writing is especially valuable.

Fine-tuned models perform as well as knowledge-tracing systems that track learning over time20,30,31. Multi-agent LLM systems can read reflective writing and flag students likely to struggle long before exams show it32,33. By proactively identifying learners at risk of disengagement or dropout, institutions could intervene early and provide timely support.

Even in adult literacy programs, GPT-4 has shown accuracy as good as or better than traditional benchmarks8. This suggests that curiosity, reasoning style, and engagement leave a trace in language across all ages. Yet most debates on LLMs in education still focus on plagiarism concerns, overlooking their potential to reveal strengths as well as risks.

From prediction to redistribution

Prediction alone is not enough. If we see hidden potential but do nothing, frustration only grows. The real challenge is to use insights for action: to give mentorship where it is most needed. We refer to this process as predictive redistribution—the purposeful use of model-derived insights to guide the fair reallocation of human mentorship and institutional attention. A possible pipeline could operate as a continuous feedback loop20,26,32:

  • Input: Collect standardized multimodal learner data encompassing both quantitative indicators and qualitative narratives (e.g., surveys, grades, engagement analytics, mentorship records, reflective essays, and structured case summaries) drawn from regular coursework and institutional systems, providing a consistent and equitable foundation for analysis (see Supplementary Information).

  • Analysis: LLMs extract indicators of reasoning, curiosity, collaboration, and communication that underpin safe clinical judgment.

  • Prediction: Generate interpretable profiles showing growth potential and learning risks.

  • Prescription: Match each profile with tailored mentorship strategies and resources. This stage is guided by human mentors through a dashboard that summarizes each learner’s growth and risk profile, enabling educators to craft and deliver appropriate mentorship actions.

  • Feedback: Mentor actions and learner outcomes feed back into the system, refining both model performance and human mentorship quality.
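As a concrete illustration, the five phases above can be sketched as a minimal loop over simple data structures. This is a hypothetical sketch only: the field names, the text-statistics stand-in for LLM analysis, and every threshold and weighting are illustrative assumptions, not a specification of any real system.

```python
from dataclasses import dataclass

@dataclass
class LearnerRecord:          # Input: standardized multimodal learner data
    learner_id: str
    grades: list[float]       # 0-100 scale, assumed
    reflective_text: str

@dataclass
class Profile:                # Prediction: interpretable learner profile
    learner_id: str
    growth_potential: float   # 0..1, higher = more untapped potential
    learning_risk: float      # 0..1, higher = greater risk

def analyze(record: LearnerRecord) -> dict:
    """Analysis: stand-in for LLM extraction of reasoning and engagement
    cues; here faked with trivial text statistics."""
    words = record.reflective_text.split()
    return {"reasoning_cues": sum(w.endswith("because") for w in words),
            "engagement": min(len(words) / 100, 1.0)}

def predict(record: LearnerRecord, cues: dict) -> Profile:
    avg = sum(record.grades) / len(record.grades)
    growth = round(min(1.0, 0.7 * cues["engagement"]
                        + 0.3 * min(cues["reasoning_cues"] / 10, 1.0)), 2)
    return Profile(record.learner_id,
                   growth_potential=growth,
                   learning_risk=round(max(0.0, 1.0 - avg / 100), 2))

def prescribe(profile: Profile) -> str:
    """Prescription: a suggestion a human mentor reviews on a dashboard."""
    if profile.learning_risk > 0.4:
        return "weekly check-in with clinical mentor"
    if profile.growth_potential > 0.5:
        return "invite to research mentorship track"
    return "standard cohort support"

def feedback(profile: Profile, outcome_grade: float) -> Profile:
    """Feedback: observed outcomes refine the next cycle's risk estimate."""
    profile.learning_risk = round((profile.learning_risk
                                   + max(0.0, 1.0 - outcome_grade / 100)) / 2, 2)
    return profile

record = LearnerRecord("s001", grades=[40.0, 60.0],
                       reflective_text="I struggled because " * 30)
profile = predict(record, analyze(record))
action = prescribe(profile)
profile = feedback(profile, outcome_grade=70.0)
```

In this toy run, a low grade average triggers a mentor check-in, and an improved end-of-term outcome lowers the risk estimate for the next cycle, which is the essential shape of the closed loop the article describes.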

To visualize the conceptual flow, Fig. 1 presents the predictive mentorship redistribution loop, illustrating how LLM-based prediction links data interpretation to equitable mentorship actions. The framework functions as a closed equity feedback system, where each phase—Input, Analysis, Prediction, Prescription, and Feedback—continuously refines both algorithmic insight and human mentoring practice.

Fig. 1: Predictive mentorship redistribution loop.

This conceptual model illustrates how large language models (LLMs) can support equity-oriented mentorship through a continuous five-phase cycle. Input collects standardized multimodal learner data (e.g., surveys, reflective texts, performance metrics, and mentorship records). Analysis extracts reasoning and collaboration cues. Prediction generates interpretable learner profiles indicating potential and risk. Prescription guides tailored mentorship interventions, and Feedback captures human and learner responses to refine subsequent predictions. The labeled transitions—Visibility, Interpretation, Action, Support, and Reflection—represent the pedagogical bridges between these phases: Visibility makes hidden potential observable; Interpretation converts linguistic patterns into meaningful educational insight; Action transforms prediction into mentorship decisions; Support reflects the human process of guidance and engagement; and Reflection closes the loop by feeding experiential learning back into the system, sustaining continuous improvement and equity. Mentorship decisions are always crafted and delivered by educators using dashboard-based summaries, ensuring that AI insights inform—but never replace—human judgment.

While some may perceive this approach as overly structured, it is worth asking whether it is, in fact, more structured than existing systems in which mentorship opportunities often hinge on chance, personal confidence, or social connections.

From a practical standpoint, this pipeline could be implemented using open-weight, instruction-tuned transformer models that support secure on-premise deployment, such as Llama 3, Mistral, Qwen 2, or Gemma 2. These models balance performance and privacy, allowing institutions to analyze sensitive educational data without external data transfer34,35,36,37. Comparable frameworks have already proven effective for educational reflection analysis and performance prediction: fine-tuned LLMs have achieved performance comparable to knowledge-tracing baselines across multiple datasets20; multi-agent models have successfully assessed student reflections in real-time feedback environments32; and early-prediction systems such as LLM-EPSP have demonstrated robust integration with institutional learning platforms26. Studies of adult-literacy prediction similarly confirm that modest text datasets (a few thousand short entries) are sufficient for reliable inference38. Importantly, empirical validation has already emerged across educational and clinical domains: fine-tuned and multi-agent LLM frameworks have demonstrated reliable prediction and assessment of student reflections6,20,26,30,31, AI-based evaluation has improved clinical-reasoning assessment6, and LLM-driven tutoring pilots and case studies have shown promising outcomes across diverse learning environments8,39. Notably, recent studies further demonstrate that LLM-based systems can identify hidden potential and flag at-risk learners early26,32,33,38, enabling proactive mentorship interventions and equitable resource allocation.

Collectively, these findings indicate the technical and ethical feasibility of implementing predictive mentorship within existing infrastructure—running on institutional Graphics Processing Unit (GPU) servers or secure cloud Application Programming Interface (API) services—to generate interpretable feedback dashboards that augment rather than replace human mentors, thereby enhancing both efficiency and equity in the distribution of mentorship resources.

To operationalize this concept, the framework could be piloted in a cohort of medical students using an integrated dataset that combines institutional data (e.g., course grades and learning analytics) with a brief self-report survey on study strategies, motivation, and perceived mentorship, along with a short reflective paragraph on learning challenges. The pilot could involve approximately 300–500 learners over one semester. A large language model (e.g., Llama-3 8B Instruct, Mistral 7B Instruct, or Qwen-2 7B) could be prompted to predict end-of-term performance and flag students at risk of low achievement, while a baseline regression model (e.g., linear or random-forest) trained on the same dataset would serve as a quantitative benchmark. Evaluation would compare predictive reliability across metrics of accuracy and fairness, and faculty mentors could qualitatively review early “risk” flags to assess interpretability and fairness. Such a lightweight pilot could provide initial evidence of feasibility and offer a transparent template for developing AI-assisted early-warning systems in medical education (see Supplementary Information: Predictive Mentorship Dataset Specification for data items and survey instrument).
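A minimal version of the pilot's fairness comparison could look like the following sketch. All records, subgroup labels, the predicted "at-risk" flags (which stand in for either LLM or baseline-model output), and the audit tolerance are synthetic placeholders chosen for illustration.

```python
from collections import defaultdict

# Each record: (student_id, subgroup, predicted_at_risk, actually_low_achieving).
# Subgroups might be primary-language groups, per the multilingual-fairness
# concern discussed later; the values here are invented for illustration.
records = [
    ("s01", "lang_A", True,  True),
    ("s02", "lang_A", False, False),
    ("s03", "lang_A", True,  True),
    ("s04", "lang_B", True,  True),
    ("s05", "lang_B", False, True),   # missed at-risk learner
    ("s06", "lang_B", False, False),
]

def accuracy(pairs):
    """Fraction of records where the predicted flag matched the outcome."""
    return sum(pred == actual for pred, actual in pairs) / len(pairs)

overall = accuracy([(p, a) for _, _, p, a in records])

# Group predictions by subgroup to compare accuracy across cohorts.
by_group = defaultdict(list)
for _, group, pred, actual in records:
    by_group[group].append((pred, actual))
group_acc = {g: accuracy(pairs) for g, pairs in by_group.items()}

# A simple equity check: flag the model for human review if accuracy
# differs across subgroups by more than an agreed tolerance (0.15 here,
# an arbitrary placeholder a real pilot would set deliberately).
fairness_gap = max(group_acc.values()) - min(group_acc.values())
needs_audit = fairness_gap > 0.15
```

In this toy dataset the model is perfectly accurate for one language group but misses an at-risk learner in the other, so the gap exceeds the tolerance and the run is flagged for faculty review, mirroring the proposed role of mentors in qualitatively auditing early risk flags.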

In medical training, such systems could help educators triage mentorship support by identifying learners struggling with diagnostic reasoning or professionalism early enough for formative feedback rather than remediation. Yet predictive mentorship cannot replace the human subtleties of encouragement and belonging. Its role is to surface opportunities, not to automate empathy.

While predictive mentorship frameworks hold particular promise for LMICs, their realization remains constrained by infrastructural and sociotechnical challenges. Empirical analyses across Southeast Asia reveal that unstable internet connectivity, limited GPU and cloud access, and weak digital governance continue to hinder sustainable AI integration in medical education and health-training40,41. Evidence from simulation-based and generative-AI learning programs further shows that cost, hardware dependency, and language diversity restrict scalability and long-term maintenance6,11,42,43,44,45,46,47,48. Implementation success in LMICs depends not only on technology but also on local capacity for model adaptation, educator training, and ethical oversight6,11,40,47,49,50,51. To ensure linguistic fairness, LLMs should be fine-tuned or prompted in the learners’ primary language where possible, and evaluated across multilingual samples to avoid penalizing students for non-dominant language styles. Addressing these barriers requires co-designed, resource-adaptive systems that can operate under intermittent connectivity, leverage multilingual datasets, and empower local educators to maintain and audit AI tools sustainably.

Equity by design

No algorithm is neutral8,52,53,54. Models trained on data from specific regions or institutions inevitably inherit the assumptions and biases of those contexts11,55,56. When applied across settings, especially from high-income to low- and middle-income environments, such models can inadvertently amplify existing inequities rather than correct them11,43,52,57. Models trained on limited or homogeneous data may disadvantage students whose writing reflects local dialects or culturally distinct styles7,8,29,52,56,58,59. And even the most advanced systems still struggle with balancing fairness and accuracy26,32,52,60,61,62. This is not a reason to give up but a call to design carefully: achieving fairness requires continuous local adaptation, bias auditing, and human-mentorship oversight—embedding transparency, diversity, and ethical review at every stage—so that predictive tools highlight potential rather than reproduce privilege. This principle of equity by design means embedding fairness, transparency, and human oversight into every stage of model development and educational use. In our view, humility is key. LLMs suggest probabilities, not truths. They should guide teachers, not replace them.

Beyond technical fairness, ethical safeguards are equally vital. Responsible predictive systems in medical education and healthcare training must protect data privacy, obtain informed consent, and ensure transparency in how outputs are generated and interpreted8,11,44,52,56,63. Learners’ writings should be anonymized and reviewed only within approved ethical frameworks, following clear communication about how the analysis will be used to guide—not judge—their development43,52. Because predictive assessment can evoke discomfort or a sense of surveillance, systems must provide interpretable feedback and retain human mentorship oversight at every stage. Recent evidence shows that both students and faculty value guidance on ethical and human-centered AI use, expressing concern that excessive reliance on algorithms could erode trust, empathy, and critical thinking in medical education40,43,44,56,64,65,66,67,68,69. Embedding these safeguards of privacy, consent, transparency, and empathy can help predictive mentorship advance equity while maintaining human dignity and confidence in the learning process.

Toward workforce equity

The effects of missed mentorship last for decades. Students who never receive guidance are less likely to publish, specialize, or become mentors themselves3,6,70,71,72. Over time, this builds into workforce gaps. Policymakers often talk about shortages in numbers. But behind the numbers is an upstream filter: who got attention early on. These shortages in LMICs translate directly into fewer clinicians in rural or primary-care settings, slower specialist training, and weaker academic pipelines14,73,74. This pattern is compounded by the fact that primary-care systems in many LMICs remain underdeveloped, limiting the structural capacity to retain and distribute trained professionals equitably75,76. Predictive mentorship thus provides a conceptual link between educational equity and the broader conditions that shape future workforce distribution.

In places with too few human mentors, predictive tools could help stretch limited resources, pointing support to the students who need it most26. In this sense, predictive mentorship remains primarily an educational strategy that leverages large language models to personalize learning and expand access to academic support—an approach consistent with recent theoretical frameworks for integrating LLMs into education8. By promoting fairer and more adaptive mentorship processes, such tools may indirectly influence who advances, specializes, and contributes back as future mentors. Table 1 shows some key application areas where LLM prediction can be leveraged to promote mentorship equity, detailing current challenges, potential solutions, and anticipated equity impacts. It focuses on why and where predictive mentorship may promote workforce equity.

Table 1 Key application areas where LLM-based prediction can promote mentorship equity, summarizing current challenges, predictive mechanisms, and expected equity impacts

Together, these five applications illustrate how LLMs might help shift mentorship from a matter of luck into a more deliberate and fair process. By revealing hidden strengths, spotting risks earlier, guiding mentor–student matches, widening access to leadership roles, and informing workforce planning, predictive mentorship could help make better use of scarce resources. The idea is straightforward: instead of waiting for students to either thrive or fail on their own, we can use signals in their language to guide support where it will have the most impact. Rather than replacing human judgment, such tools can guide support where it matters most, fostering a more equitable and sustainable professional pipeline over time.

Conclusion

Mentorship, when considered through the lens of predictive LLMs, invites reflection on how opportunities in medical education are recognized and shared. These systems may help surface learning needs and potential that traditional assessments overlook, yet they also risk reproducing existing inequities if applied without care. The medical workforce of the future may depend as much on how mentorship is supported and distributed as on curricula or examinations. The value of predictive mentorship lies less in technological novelty than in its capacity, when responsibly designed and governed, to inform fairer and more transparent educational support.