Using predictive large language models (LLMs) to inform mentorship presents both opportunities and risks of inequity if these systems are not designed with care. Mentorship quietly shapes medical careers, yet opportunities for it remain unequal. LLMs can analyze student writing and may reveal potential that exams overlook. This predictive capacity could inform fairer mentorship and direct support to students who need it most, helping build a more equitable medical workforce.
Unequal mentorship: an overlooked driver of inequity
Mentorship is part of the “hidden curriculum”1,2,3. It reflects the institutional and interpersonal influences that shape learning beyond formal instruction, encompassing values, norms, and opportunities embedded in the culture and organization of medical training4. Mentorship occurs in multiple forms—academic, clinical, research, and psychosocial—whose meanings and expectations differ across contexts and stages of education3,5. In this paper, the term refers mainly to formative academic and professional mentorship that supports learners’ reasoning, research engagement, and career development during medical education.
Mentorship can influence who feels confident, who tries research, and who teachers notice as “promising”2,3,6,7. In our view, it matters as much as lectures or exams, but we rarely treat it that way. In many low- and middle-income countries (LMICs), one clinical teacher may supervise forty or more students, making individual attention impossible8,9,10. This normalizes inequity: students get used to learning without personal guidance.
This gap is not random. Faculty shortages, heavy service duties, and unequal resources mean mentorship becomes scarce6,11,12,13,14,15,16. When mentorship is rare, the system tends to reward those already visible, such as students at elite schools, those fluent in dominant languages, or those who score highly on exams, while LMIC institutions with fewer resources fall further behind. Traditional exams and Objective Structured Clinical Examinations (OSCEs) reinforce this inequity because they measure only current performance, not potential for growth6,17,18,19. For medical trainees, this means that critical skills like diagnostic reasoning and bedside communication are often shaped by access to mentors rather than curriculum quality alone. This may contribute to a reinforcing structural pattern in which underserved regions produce fewer specialists and academic leaders over time.
What LLMs make visible
Research shows even childhood essays can predict learning and life outcomes decades later20,21,22,23. This suggests that language is more than words—it is a signal of how people think and learn. In medical education, reflective writing and case-based reasoning notes serve a similar function: they reveal how students synthesize evidence, empathy, and uncertainty in clinical contexts24,25. In universities, LLM-based tools can predict student performance early, even when no past grades exist26,27,28,29. This means they can give teachers a head start in noticing who might need help. For LMIC schools that cannot rely on long-term data systems, this ability to make early predictions from simple student writing is especially valuable.
Fine-tuned models perform as well as knowledge-tracing systems that track learning over time20,30,31. Multi-agent LLM systems can read reflective writing and flag students likely to struggle long before exams show it32,33. By proactively identifying learners at risk of disengagement or dropout, institutions could intervene early and provide timely support.
Even in adult literacy programs, GPT-4 has shown accuracy as good as or better than traditional benchmarks8. This suggests that curiosity, reasoning style, and engagement leave a trace in language across all ages. Yet most debates on LLMs in education still focus on plagiarism concerns, overlooking their potential to reveal strengths as well as risks.
From prediction to redistribution
Prediction alone is not enough. If we see hidden potential but do nothing, frustration only grows. The real challenge is to use insights for action: to give mentorship where it is most needed. We refer to this process as predictive redistribution—the purposeful use of model-derived insights to guide the fair reallocation of human mentorship and institutional attention. A possible pipeline could operate as a continuous feedback loop20,26,32 (a minimal code sketch follows the list below):
- Input: Collect standardized multimodal learner data encompassing both quantitative indicators and qualitative narratives (e.g., surveys, grades, engagement analytics, mentorship records, reflective essays, and structured case summaries) drawn from regular coursework and institutional systems, providing a consistent and equitable foundation for analysis (see Supplementary Information).
- Analysis: LLMs extract indicators of reasoning, curiosity, collaboration, and communication that underpin safe clinical judgment.
- Prediction: Generate interpretable profiles showing growth potential and learning risks.
- Prescription: Match each profile with tailored mentorship strategies and resources. This stage is guided by human mentors through a dashboard that summarizes each learner’s growth and risk profile, enabling educators to craft and deliver appropriate mentorship actions.
- Feedback: Mentor actions and learner outcomes feed back into the system, refining both model performance and human mentorship quality.
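As a purely illustrative sketch rather than a reference implementation, the fragment below shows how these five phases might be wired together in software. All names, data fields, and the stubbed scoring heuristics are hypothetical; a real system would substitute LLM-based analysis and institution-specific data.

```python
from dataclasses import dataclass, field

# Hypothetical data structures and stub functions for the five-phase loop.
# The analysis and prediction steps are placeholders, not working models.

@dataclass
class LearnerRecord:                      # Input: multimodal learner data
    learner_id: str
    grades: list[float]
    reflective_texts: list[str]
    mentorship_notes: list[str] = field(default_factory=list)

@dataclass
class LearnerProfile:                     # Prediction: interpretable profile
    learner_id: str
    growth_potential: float               # 0-1, higher = more latent potential
    risk_score: float                     # 0-1, higher = more likely to struggle
    evidence: list[str]                   # cues supporting the scores

def analyze(record: LearnerRecord) -> dict:
    """Analysis: extract reasoning/collaboration cues (stubbed here)."""
    mean_grade = sum(record.grades) / max(len(record.grades), 1)
    return {"mean_grade": mean_grade, "n_texts": len(record.reflective_texts)}

def predict(record: LearnerRecord, cues: dict) -> LearnerProfile:
    """Prediction: turn cues into a profile (toy heuristic, not an LLM)."""
    risk = 1.0 - min(cues["mean_grade"] / 10.0, 1.0)   # assumes grades on a 0-10 scale
    return LearnerProfile(record.learner_id, growth_potential=0.5,
                          risk_score=risk, evidence=["placeholder cue"])

def prescribe(profile: LearnerProfile) -> str:
    """Prescription: a mentor-facing suggestion, always reviewed by humans."""
    return ("priority mentorship meeting" if profile.risk_score > 0.6
            else "routine check-in")

def run_cycle(records: list[LearnerRecord]) -> list[tuple[str, str]]:
    """One Input -> Analysis -> Prediction -> Prescription pass; Feedback
    (mentor actions and learner outcomes) would later be appended to records."""
    return [(r.learner_id, prescribe(predict(r, analyze(r)))) for r in records]
```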
To visualize the conceptual flow, Fig. 1 presents the predictive mentorship redistribution loop, illustrating how LLM-based prediction links data interpretation to equitable mentorship actions. The framework functions as a closed equity feedback system, where each phase—Input, Analysis, Prediction, Prescription, and Feedback—continuously refines both algorithmic insight and human mentoring practice.
Fig. 1: The predictive mentorship redistribution loop. This conceptual model illustrates how large language models (LLMs) can support equity-oriented mentorship through a continuous five-phase cycle. Input collects standardized multimodal learner data (e.g., surveys, reflective texts, performance metrics, and mentorship records). Analysis extracts reasoning and collaboration cues. Prediction generates interpretable learner profiles indicating potential and risk. Prescription guides tailored mentorship interventions, and Feedback captures human and learner responses to refine subsequent predictions. The labeled transitions—Visibility, Interpretation, Action, Support, and Reflection—represent the pedagogical bridges between these phases: Visibility makes hidden potential observable; Interpretation converts linguistic patterns into meaningful educational insight; Action transforms prediction into mentorship decisions; Support reflects the human process of guidance and engagement; and Reflection closes the loop by feeding experiential learning back into the system, sustaining continuous improvement and equity. Mentorship decisions are always crafted and delivered by educators using dashboard-based summaries, ensuring that AI insights inform—but never replace—human judgment.
While some may perceive this approach as overly structured, it is worth considering whether it is, in fact, more structured than existing systems in which mentorship opportunities are often influenced by chance, personal confidence, or social connections.
From a practical standpoint, this pipeline could be implemented using open-weight, instruction-tuned transformer models that support secure on-premise deployment, such as Llama 3, Mistral, Qwen 2, or Gemma 2. These models balance performance and privacy, allowing institutions to analyze sensitive educational data without external data transfer34,35,36,37. Comparable frameworks have already proven effective for educational reflection analysis and performance prediction: fine-tuned LLMs have achieved performance comparable to knowledge-tracing baselines across multiple datasets20; multi-agent models have successfully assessed student reflections in real-time feedback environments32; and early-prediction systems such as LLM-EPSP have demonstrated robust integration with institutional learning platforms26. Studies of adult-literacy prediction similarly confirm that modest text datasets (a few thousand short entries) are sufficient for reliable inference38. Importantly, empirical validation has already emerged across educational and clinical domains: fine-tuned and multi-agent LLM frameworks have demonstrated reliable prediction and assessment of student reflections6,20,26,30,31, AI-based evaluation has improved clinical-reasoning assessment6, and LLM-driven tutoring pilots and case studies have shown promising outcomes across diverse learning environments8,39. Notably, recent studies further demonstrate that LLM-based systems can identify hidden potential and flag at-risk learners early26,32,33,38, enabling proactive mentorship interventions and equitable resource allocation.
Collectively, these findings indicate the technical and ethical feasibility of implementing predictive mentorship within existing infrastructure, running on institutional Graphics Processing Unit (GPU) servers or secure cloud Application Programming Interfaces (APIs). Such systems could generate interpretable feedback dashboards that augment rather than replace human mentors, enhancing both efficiency and equity in the distribution of mentorship resources.
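For concreteness, the snippet below sketches what on-premise inference with an open-weight instruction-tuned model might look like using the Hugging Face transformers library. The model identifier, prompt, and reflection text are illustrative; gated models such as Llama 3 require licence acceptance, and the chat-style pipeline input shown assumes a recent transformers release.

```python
# Illustrative on-premise inference with an open-weight instruction-tuned model.
# Model ID, prompt, and reflection text are placeholders; weights for gated
# models (e.g., Llama 3) require accepting the provider's licence first.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",   # or a Mistral / Qwen2 / Gemma 2 variant
    device_map="auto",                             # run on local GPU(s) if available
)

reflection = (
    "This week I struggled to connect the patient's electrolyte results with "
    "her medication history, but talking it through with a peer helped."
)

messages = [
    {"role": "system",
     "content": ("You assist medical educators. In three short bullet points, "
                 "summarise the reasoning, collaboration, and uncertainty cues "
                 "in the student's reflection. Do not assign a grade.")},
    {"role": "user", "content": reflection},
]

result = generator(messages, max_new_tokens=200)
print(result[0]["generated_text"][-1]["content"])   # the model's summary turn
```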
To operationalize this concept, the framework could be piloted in a cohort of medical students using an integrated dataset that combines institutional data (e.g., course grades, learning analytics…) with a brief self-report survey on study strategies, motivation, and perceived mentorship, along with a short reflective paragraph on learning challenges. The pilot could involve approximately 300–500 learners over one semester. A large language model (e.g., Llama 3 8B Instruct, Mistral 7B Instruct, or Qwen2 7B) could be prompted to predict end-of-term performance and flag students at risk of low achievement, while a baseline regression model (e.g., linear or random-forest) trained on the same dataset would serve as a quantitative benchmark. Evaluation would compare predictive reliability across accuracy and fairness metrics, and faculty mentors could qualitatively review early “risk” flags to assess their interpretability. Such a lightweight pilot could provide initial evidence of feasibility and offer a transparent template for developing AI-assisted early-warning systems in medical education. (See Supplementary Information: Predictive Mentorship Dataset Specification for data items and the survey instrument.)
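A minimal sketch of the proposed quantitative benchmark and a basic subgroup check is shown below. The file name, feature columns, outcome label, and subgroup variable are hypothetical placeholders for whatever the institutional dataset actually contains.

```python
# Sketch of the baseline benchmark (random forest) and a simple subgroup check
# for the pilot. File name, columns, and the "language_group" label are
# hypothetical placeholders for the institution's integrated dataset.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("pilot_cohort.csv")
features = ["prior_gpa", "lms_logins", "survey_motivation", "survey_mentorship"]
X, y = df[features], df["at_risk"]            # 1 = finished the term below threshold

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

baseline = RandomForestClassifier(n_estimators=300, random_state=0)
baseline.fit(X_tr, y_tr)
probs = baseline.predict_proba(X_te)[:, 1]
print("Baseline AUROC:", round(roc_auc_score(y_te, probs), 3))

# Subgroup check: does the sensitivity of risk flags differ across a group
# such as language background? Large gaps would warrant auditing both the
# baseline and the LLM before any flag informs mentorship decisions.
flags = pd.Series((probs >= 0.5).astype(int), index=X_te.index)
groups = df.loc[X_te.index, "language_group"]
for group in groups.unique():
    mask = groups == group
    print(group, "recall:", round(recall_score(y_te[mask], flags[mask]), 3))
```

The same AUROC and per-group recall could then be computed for the LLM’s flags so that the two approaches are compared on identical splits.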
In medical training, such systems could help educators triage mentorship support by identifying learners struggling with diagnostic reasoning or professionalism early enough for formative feedback rather than remediation. Yet predictive mentorship cannot replace the human subtleties of encouragement and belonging. Its role is to surface opportunities, not to automate empathy.
While predictive mentorship frameworks hold particular promise for LMICs, their realization remains constrained by infrastructural and sociotechnical challenges. Empirical analyses across Southeast Asia reveal that unstable internet connectivity, limited GPU and cloud access, and weak digital governance continue to hinder sustainable AI integration in medical education and health-training40,41. Evidence from simulation-based and generative-AI learning programs further shows that cost, hardware dependency, and language diversity restrict scalability and long-term maintenance6,11,42,43,44,45,46,47,48. Implementation success in LMICs depends not only on technology but also on local capacity for model adaptation, educator training, and ethical oversight6,11,40,47,49,50,51. To ensure linguistic fairness, LLMs should be fine-tuned or prompted in the learners’ primary language where possible, and evaluated across multilingual samples to avoid penalizing students for non-dominant language styles. Addressing these barriers requires co-designed, resource-adaptive systems that can operate under intermittent connectivity, leverage multilingual datasets, and empower local educators to maintain and audit AI tools sustainably.
Equity by design
No algorithm is neutral8,52,53,54. Models trained on data from specific regions or institutions inevitably inherit the assumptions and biases of those contexts11,55,56. When applied across settings, especially from high-income to low- and middle-income environments, such models can inadvertently amplify existing inequities rather than correct them11,43,52,57. Models trained on limited or homogeneous data may disadvantage students whose writing reflects local dialects or culturally distinct styles7,8,29,52,56,58,59. And even the most advanced systems still struggle with balancing fairness and accuracy26,32,52,60,61,62. This is not a reason to give up but a call to design carefully: achieving fairness requires continuous local adaptation, bias auditing, and human-mentorship oversight—embedding transparency, diversity, and ethical review at every stage—so that predictive tools highlight potential rather than reproduce privilege. This principle of equity by design means embedding fairness, transparency, and human oversight into every stage of model development and educational use. In our view, humility is key. LLMs suggest probabilities, not truths. They should guide teachers, not replace them.
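As one concrete and deliberately simple illustration of such bias auditing, the sketch below compares at-risk flag rates across a hypothetical subgroup label. The subgroup, example data, and tolerance threshold are assumptions, and any disparity should trigger human review rather than automatic model adjustment.

```python
# Minimal bias-audit sketch: compare at-risk flag rates across a subgroup.
# The subgroup label, example data, and 0.10 tolerance are illustrative only.
import pandas as pd

def flag_rate_gaps(flags: pd.Series, groups: pd.Series) -> pd.Series:
    """Each group's flag rate minus the overall flag rate."""
    return flags.groupby(groups).mean() - flags.mean()

audit = pd.DataFrame({
    "flagged":  [1, 0, 0, 1, 1, 0, 0, 0, 1, 0],          # model's at-risk flags
    "language": ["vi", "vi", "en", "en", "vi",
                 "en", "vi", "en", "vi", "en"],           # illustrative subgroup
})

gaps = flag_rate_gaps(audit["flagged"], audit["language"])
print(gaps)

# A gap beyond a pre-agreed tolerance prompts review of prompts, data, and the
# affected students' texts by educators and an ethics body, not automatic retraining.
if (gaps.abs() > 0.10).any():
    print("Disparity exceeds tolerance; escalate for human review.")
```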
Beyond technical fairness, ethical safeguards are equally vital. Responsible predictive systems in medical education and healthcare training must protect data privacy, obtain informed consent, and ensure transparency in how outputs are generated and interpreted8,11,44,52,56,63. Learners’ writings should be anonymized and reviewed only within approved ethical frameworks, following clear communication about how the analysis will be used to guide—not judge—their development43,52. Because predictive assessment can evoke discomfort or a sense of surveillance, systems must provide interpretable feedback and retain human mentorship oversight at every stage. Recent evidence shows that both students and faculty value guidance on ethical and human-centered AI use, expressing concern that excessive reliance on algorithms could erode trust, empathy, and critical thinking in medical education40,43,44,56,64,65,66,67,68,69. Embedding these safeguards of privacy, consent, transparency, and empathy can help predictive mentorship advance equity while maintaining human dignity and confidence in the learning process.
Toward workforce equity
The effects of missed mentorship last for decades. Students who never receive guidance are less likely to publish, specialize, or become mentors themselves3,6,70,71,72. Over time, this builds into workforce gaps. Policymakers often talk about shortages in numbers. But behind the numbers is an upstream filter: who got attention early on. These shortages in LMICs translate directly into fewer clinicians in rural or primary-care settings, slower specialist training, and weaker academic pipelines14,73,74. This pattern is compounded by the fact that primary-care systems in many LMICs remain underdeveloped, limiting the structural capacity to retain and distribute trained professionals equitably75,76. Predictive mentorship thus provides a conceptual link between educational equity and the broader conditions that shape future workforce distribution.
In places with too few human mentors, predictive tools could help stretch limited resources, pointing support to the students who need it most26. In this sense, predictive mentorship remains primarily an educational strategy that leverages large language models to personalize learning and expand access to academic support—an approach consistent with recent theoretical frameworks for integrating LLMs into education8. By promoting fairer and more adaptive mentorship processes, such tools may indirectly influence who advances, specializes, and contributes back as future mentors. Table 1 shows some key application areas where LLM prediction can be leveraged to promote mentorship equity, detailing current challenges, potential solutions, and anticipated equity impacts. It focuses on why and where predictive mentorship may promote workforce equity.
Together, these five applications illustrate how LLMs might help shift mentorship from a matter of luck into a more deliberate and fair process. By revealing hidden strengths, spotting risks earlier, guiding mentor–student matches, widening access to leadership roles, and informing workforce planning, predictive mentorship could help make better use of scarce resources. The idea is straightforward: instead of waiting for students to either thrive or fail on their own, we can use signals in their language to guide support where it will have the most impact. Rather than replacing human judgment, such tools can guide support where it matters most, fostering a more equitable and sustainable professional pipeline over time.
Conclusion
Mentorship, when considered through the lens of predictive LLMs, invites reflection on how opportunities in medical education are recognized and shared. These systems may help surface learning needs and potential that traditional assessments overlook, yet they also risk reproducing existing inequities if applied without care. The medical workforce of the future may depend as much on how mentorship is supported and distributed as on curricula or examinations. The value of predictive mentorship lies less in technological novelty than in its capacity, when responsibly designed and governed, to inform fairer and more transparent educational support.
Data availability
No empirical datasets were generated or analysed during the current study. The study introduces the Predictive Mentorship Dataset Specification, which outlines the structure and variables of an integrated mentorship dataset combining institutional analytics and learner self-reports. The full dataset specification and survey instrument developed by the authors are available in the Supplementary Information file.
References
Hafferty, F. W. Beyond curriculum reform: confronting medicine’s hidden curriculum. Acad. Med. 73, 403–407 (1998).
Jackson, V. A. et al. Having the right chemistry: a qualitative study of mentoring in academic medicine. Acad. Med. 78, 328–334 (2003).
Sambunjak, D., Straus, S. E. & Marušić, A. Mentoring in academic medicine: a systematic review. JAMA 296, 1103 (2006).
Lawrence, C. et al. The hidden curricula of medical education: a scoping review. Acad. Med. 93, 648–656 (2018).
Ellis, M., Wilson, G., Nulan, E., Day, M. & McElroy, J. Mentoring, coaching and peer-support programs promoting well-being for physicians: a systematic review. MRAJ 12 (2024).
Schaye, V. et al. Artificial intelligence based assessment of clinical reasoning documentation: an observational study of the impact of the clinical learning environment on resident documentation quality. BMC Med. Educ. 25, 591 (2025).
Wolfram, T. Large language models predict cognition and education close to or better than genomics or expert assessment. Commun. Psychol. 3, 95 (2025).
Shahzad, T. et al. A comprehensive review of large language models: issues and solutions in learning environments. Discov. Sustain 6, 27 (2025).
Mkony, C. A., Kaaya, E. E., Goodell, A. J. & Macfarlane, S. B. Where teachers are few: documenting available faculty in five Tanzanian medical schools. Glob. Health Action 9, 32717 (2016).
De Villiers, M. et al. Decentralised training for medical students: a scoping review. BMC Med Educ. 17, 196 (2017).
Feigerlova, E., Hani, H. & Hothersall-Davies, E. A systematic review of the impact of artificial intelligence on educational outcomes in health professions education. BMC Med. Educ. 25, 129 (2025).
Schaye, V. et al. Development of a Clinical Reasoning Documentation Assessment Tool for Resident and Fellow Admission Notes: a Shared Mental Model for Feedback. J. Gen. Intern. Med. 37, 507–512 (2022).
Bennett, S., Paina, L., Ssengooba, F., Waswa, D. & M’Imunya, J. Mentorship in African health research training programs: an exploratory study of Fogarty International Center programs in Kenya and Uganda. Educ. Health 26, 183 (2013).
Lescano, A. G. et al. Strengthening mentoring in low- and middle-income countries to advance global health research: an overview. Am. J. Trop. Med. Hyg. 100, 3–8 (2019).
Nakanjako, D. et al. Doctoral training in Uganda: evaluation of mentoring best practices at Makerere university college of health sciences. BMC Med. Educ. 14, 9 (2014).
Schwerdtle, P., Morphet, J. & Hall, H. A scoping review of mentorship of health personnel to improve the quality of health care in low and middle-income countries. Glob. Health 13, 77 (2017).
Reid, H., Gormley, G. J., Dornan, T. & Johnston, J. L. Harnessing insights from an activity system – OSCEs past and present expanding future assessments. Med. Teach. 43, 44–49 (2021).
Malau-Aduli, B. S., Jones, K., Saad, S. & Richmond, C. Has the OSCE Met Its Final Demise? Rebalancing Clinical Assessment Approaches in the Peri-Pandemic World. Front. Med. 9, 825502 (2022).
Newble, D. Techniques for measuring clinical competence: objective structured clinical examinations. Med. Educ. 38, 199–203 (2004).
Neshaei, S. P. et al. Towards Modeling Learner Performance with Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2403.14661 (2024).
Boehm, J. K., Qureshi, F. & Kubzansky, L. D. In the words of early adolescents: a novel assessment of positive psychological well-being predicts young adult depressive symptoms. J. Adolesc. Health 74, 713–719 (2024).
Radford, K. et al. Can adult mental health be predicted by childhood future-self narratives? Insights from the CLPsych 2018 Shared Task. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic 126–135 (Association for Computational Linguistics, New Orleans, LA, 2018). https://doi.org/10.18653/v1/W18-0614.
Laurin, K., Engstrom, H. R. & Huang, M. What will my life be like when I am 25? How do children’s social class contexts predict their imagined and actual futures? J. Soc. Issues 80, 1433–1459 (2024).
Mamede, S. & Schmidt, H. G. Deliberate reflection and clinical reasoning: Founding ideas and empirical findings. Med. Educ. 57, 76–85 (2023).
Lim, J. Y. et al. A systematic scoping review of reflective writing in medical education. BMC Med Educ. 23, 12 (2023).
Zhou, H. et al. LLM-EPSP: Large language model empowered early prediction of student performance. Inf. Process. Manag. 63, 104351 (2026).
Kalita, E. et al. Predicting student academic performance using Bi-LSTM: a deep learning framework with SHAP-based interpretability and statistical validation. Front. Educ. 10, 1581247 (2025).
Turkmenbayev, A., Abdykerimova, E., Nurgozhayev, S., Karabassova, G. & Baigozhanova, D. The application of machine learning in predicting student performance in university engineering programs: a rapid review. Front. Educ. 10, 1562586 (2025).
Ahmed, W. et al. Machine learning-based academic performance prediction with explainability for enhanced decision-making in educational institutions. Sci. Rep. 15, 26879 (2025).
Wang, D., Chen, G. & Lu, Y. Fine-Tuning Large Language Models for Knowledge Tracing Harnessing Insights from Explainable AI. In Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED (eds Cristea, A. I., Walker, E., Lu, Y., Santos, O. C. & Isotani, S.) vol. 2590, 297–302 (Springer Nature Switzerland, Cham, 2025).
Wang, Z. et al. LLM-KT: Aligning Large Language Models with Knowledge Tracing using a Plug-and-Play Instruction. Preprint at https://doi.org/10.48550/ARXIV.2502.02945 (2025).
Li, G. et al. Single-agent vs. Multi-agent LLM Strategies for Automated Student Reflection Assessment. In Advances in Knowledge Discovery and Data Mining (eds Wu, X. et al.) vol. 15874, 300–311 (Springer Nature Singapore, Singapore, 2025).
Lin, C.-C., Cheng, E. S. J., Huang, A. Y. Q. & Yang, S. J. H. DNA of learning behaviors: A novel approach of learning performance prediction by NLP. Computers Educ.: Artif. Intell. 6, 100227 (2024).
Jiang, A. Q. et al. Mixtral of Experts. Preprint at https://doi.org/10.48550/ARXIV.2401.04088 (2024).
Gemma Team et al. Gemma 2: Improving Open Language Models at a Practical Size. Preprint at https://doi.org/10.48550/ARXIV.2408.00118 (2024).
Grattafiori, A. et al. The Llama 3 Herd of Models. Preprint at https://doi.org/10.48550/ARXIV.2407.21783 (2024).
Yang, A. et al. Qwen2 Technical Report. Preprint at https://doi.org/10.48550/ARXIV.2407.10671 (2024).
Zhang, L. et al. Predicting Learning Performance with Large Language Models: A Study in Adult Literacy. Preprint at https://doi.org/10.48550/arXiv.2403.14668 (2024).
García-Méndez, S., Arriba-Pérez, F. D. & Somoza-López, M. D. C. A review on the use of large language models as virtual tutors. Sci. Educ. 34, 877–892 (2025).
Ahsan, Z. Integrating artificial intelligence into medical education: a narrative systematic review of current applications, challenges, and future directions. BMC Med. Educ. 25, 1187 (2025).
Wibowo, M. F. et al. Insights into the current and future state of AI adoption within health systems in Southeast Asia: cross-sectional qualitative study. J. Med. Internet Res. 27, e71591 (2025).
Li, X., Elnagar, D., Song, G. & Ghannam, R. Advancing medical education using virtual and augmented reality in low- and middle-income countries: a systematic and critical review. Virtual Worlds 3, 384–403 (2024).
Duan, S., Liu, C., Rong, T., Zhao, Y. & Liu, B. Integrating AI in medical education: a comprehensive study of medical students’ attitudes, concerns, and behavioral intentions. BMC Med. Educ. 25, 599 (2025).
Weidener, L. & Fischer, M. Artificial intelligence in medicine: cross-sectional study among medical students on application, education, and ethical aspects. JMIR Med. Educ. 10, e51247 (2024).
Ferreira, J. M. G. et al. Effectiveness of low-cost, technology-enhanced simulation training for healthcare training in low- and middle-income countries (LMICs): a systematic literature review. J. Gen. Intern. Med. https://doi.org/10.1007/s11606-025-09794-y.
Nag, A., Mukherjee, A., Ganguly, N. & Chakrabarti, S. Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2024, 15681–15701 (Association for Computational Linguistics, Miami, Florida, USA, 2024). https://doi.org/10.18653/v1/2024.findings-emnlp.920.
Robinson, S. J. et al. A guide to outcome evaluation of simulation-based education programmes in low and middle-income countries. ANZ J. Surg. 94, 1011–1020 (2024).
Chandran, V. P. et al. Mobile applications in medical education: A systematic review and meta-analysis. PLoS ONE 17, e0265927 (2022).
Rincón, E. H. H. et al. Mapping the use of artificial intelligence in medical education: a scoping review. BMC Med Educ. 25, 526 (2025).
Robinson, S. J. A. et al. Simulation-based education of health workers in low- and middle-income countries: a systematic review. Glob. Health Sci. Pract. 12, e2400187 (2024).
Shen, M. et al. Development and implementation of a multiple stage emergency care training program in Kono, Sierra Leone: a clinician-educator curriculum. BMC Med. Educ. 25, 1411 (2025).
Ethics and Governance of Artificial Intelligence for Health: Large Multi-Modal Models. WHO Guidance. (World Health Organization, Geneva, 2024).
Stinson, C. Algorithms are not neutral: Bias in collaborative filtering. AI Ethics 2, 763–770 (2022).
Phillips-Brown, M. Algorithmic neutrality. Preprint at https://doi.org/10.48550/ARXIV.2303.05103 (2023).
Templin, T. et al. Framework for bias evaluation in large language models in healthcare settings. npj Digit. Med. 8, 414 (2025).
Gordon, M. et al. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. Med. Teach. 46, 446–470 (2024).
Yang, J. et al. Mitigating machine learning bias between high income and low–middle income countries for enhanced model fairness and generalizability. Sci. Rep. 14, 13318 (2024).
Joshi, A. et al. Natural language processing for dialects of a language: a survey. ACM Comput. Surv. 57, 1–37 (2025).
Fleisig, E. et al. Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 13541–13564 (Association for Computational Linguistics, Miami, Florida, USA, 2024). https://doi.org/10.18653/v1/2024.emnlp-main.750.
Liu, W. et al. Fairness identification of large language models in recommendation. Sci. Rep. 15, 5516 (2025).
Plecko, D. & Bareinboim, E. Fairness-accuracy trade-offs: a causal perspective. AAAI 39, 26344–26353 (2025).
Buijsman, S. Navigating fairness measures and trade-offs. AI Ethics 4, 1323–1334 (2024).
Jobin, A., Ienca, M. & Vayena, E. The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1, 389–399 (2019).
Blanco, M. A. et al. Integrating artificial intelligence into medical education: a roadmap informed by a survey of faculty and students. Med. Educ. Online 30, 2531177 (2025).
Salih, S. M. Perceptions of faculty and students about use of artificial intelligence in medical education: a qualitative study. Cureus https://doi.org/10.7759/cureus.57605 (2024).
Sami, A. et al. Medical students’ attitudes toward AI in education: perception, effectiveness, and its credibility. BMC Med Educ. 25, 82 (2025).
Jackson, P. et al. Artificial intelligence in medical education - perception among medical students. BMC Med Educ. 24, 804 (2024).
Zheng, L. & Xiao, Y. Refining AI perspectives: assessing the impact of ai curricular on medical students’ attitudes towards artificial intelligence. BMC Med. Educ. 25, 1115 (2025).
Abouammoh, N. et al. Perceptions and earliest experiences of medical students and faculty with ChatGPT in medical education: qualitative study. JMIR Med. Educ. 11, e63400 (2025).
Straus, S. E., Johnson, M. O., Marquez, C. & Feldman, M. D. Characteristics of successful and failed mentoring relationships: a qualitative study across two academic health centers. Acad. Med. 88, 82–89 (2013).
Wu, J. & Olagunju, A. T. Mentorship in medical education: reflections on the importance of both unofficial and official mentorship programs. BMC Med Educ. 24, 1233 (2024).
Ren, M. et al. Optimizing a mentorship program from the perspective of academic medicine leadership – a qualitative study. BMC Med Educ. 24, 530 (2024).
Hamid, M. & Rasheed, M. A. A new path to mentorship for emerging global health leaders in low-income and middle-income countries. Lancet Glob. Health 10, e946–e948 (2022).
Kpokiri, E. E. et al. Health research mentorship in low-income and middle-income countries: a global qualitative evidence synthesis of data from a crowdsourcing open call and scoping review. BMJ Glob. Health 9, e011166 (2024).
Druetz, T. Integrated primary health care in low- and middle-income countries: a double challenge. BMC Med Ethics 19, 48 (2018).
Alegre, J. C., Sharma, S., Cleghorn, F. & Avila, C. Strengthening primary health care in low- and middle-income countries: furthering structural changes in the post-pandemic era. Front. Public Health 11, 1270510 (2024).
Macfadyen, L. P. & Dawson, S. Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers Educ. 54, 588–599 (2010).
Da Silva Souza, R. C., Bersaneti, M. D. R., Dos Santos Yamaguti, W. P. & Baia, W. R. M. Mentoring in research: development of competencies for health professionals. BMC Nurs. 22, 244 (2023).
Armijo, I. Balanced profiles: the role of cognitive and non-cognitive competencies in Chilean higher education academic achievement. Discov. Educ. 4, 302 (2025).
Lai, J. W., Zhang, L., Sze, C. C. & Lim, F. S. Learning analytics for bridging the skills gap: a data-driven study of undergraduate aspirations and skills awareness for career preparedness. Educ. Sci. 15, 40 (2025).
Ogurek, B. & Harendza, S. Medical students’ leadership competence in health care: development of a self-assessment scale. BMC Med Educ. 24, 1275 (2024).
Lee, I. R., Jung, H., Lee, Y., Shin, J. I. & An, S. An analysis of student essays on medical leadership and its educational implications in South Korea. Sci. Rep. 12, 5788 (2022).
Sebok-Syer, S. S. et al. Sharing is caring: helping institutions and health organizations leverage data for educational improvement. Perspect. Med. Educ. 13, 486–495 (2024).
Chan, T. et al. Learning analytics in medical education assessment: the past, the present, and the future. AEM Educ. Train. 2, 178–187 (2018).
Acknowledgements
Not applicable.
Author information
Contributions
NN conceived the concept and drafted the manuscript. PT contributed to the conceptual development and critically revised the manuscript for important intellectual content. All authors read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Nguyen, N.P., Tran, P. Bridging the mentorship divide: how large language models could reshape medical workforce equity. npj Digit. Med. 9, 29 (2026). https://doi.org/10.1038/s41746-025-02167-z
DOI: https://doi.org/10.1038/s41746-025-02167-z
