Unequal mentorship: an overlooked driver of inequity

Mentorship is part of the “hidden curriculum”1,2,3. It reflects the institutional and interpersonal influences that shape learning beyond formal instruction, encompassing values, norms, and opportunities embedded in the culture and organization of medical training4. Mentorship occurs in multiple forms—academic, clinical, research, and psychosocial—whose meanings and expectations differ across contexts and stages of education3,5. In this paper, the term refers mainly to formative academic and professional mentorship that supports learners’ reasoning, research engagement, and career development during medical education.

Mentorship can influence who feels confident, who tries research, and whom teachers notice as “promising”2,3,6,7. In our view, it matters as much as lectures or exams, but we rarely treat it that way. In many low- and middle-income countries (LMICs), one clinical teacher may supervise forty or more students, making individual attention impossible8,9,10. This normalizes inequity: students get used to learning without personal guidance.

This gap is not random. Faculty shortages, heavy service duties, and unequal resources make mentorship scarce6,11,12,13,14,15,16. When mentorship is rare, the system tends to reward those who are already visible: students at elite schools, those fluent in dominant languages, or those with high exam scores, while LMIC institutions with fewer resources fall further behind. Traditional exams and Objective Structured Clinical Examinations (OSCEs) reinforce this inequity because they measure only current performance, not potential for growth6,17,18,19. For medical trainees, this means that critical skills like diagnostic reasoning and bedside communication are often shaped by access to mentors rather than by curriculum quality alone. This may contribute to a reinforcing structural pattern in which underserved regions produce fewer specialists and academic leaders over time.

What LLMs make visible

Research shows even childhood essays can predict learning and life outcomes decades later20,21,22,23. This suggests that language is more than words—it is a signal of how people think and learn. In medical education, reflective writing and case-based reasoning notes serve a similar function: they reveal how students synthesize evidence, empathy, and uncertainty in clinical contexts24,25. In universities, LLM-based tools can predict student performance early, even when no past grades exist26,27,28,29. This means they can give teachers a head start in noticing who might need help. For LMIC schools that cannot rely on long-term data systems, this ability to make early predictions from simple student writing is especially valuable.

Fine-tuned models perform as well as knowledge-tracing systems that track learning over time20,30,31. Multi-agent LLM systems can read reflective writing and flag students likely to struggle long before exams show it32,33. By proactively identifying learners at risk of disengagement or dropout, institutions could intervene early and provide timely support.

Even in adult literacy programs, GPT-4 has shown accuracy as good as or better than traditional benchmarks8. This suggests that curiosity, reasoning style, and engagement leave a trace in language across all ages. Yet most debates on LLMs in education still focus on plagiarism concerns, overlooking their potential to reveal strengths as well as risks.

From prediction to redistribution

Prediction alone is not enough. If we see hidden potential but do nothing, frustration only grows. The real challenge is to use insights for action: to give mentorship where it is most needed. We refer to this process as predictive redistribution—the purposeful use of model-derived insights to guide the fair reallocation of human mentorship and institutional attention. A possible pipeline could operate as a continuous feedback loop20,26,32:

  • Input: Collect standardized multimodal learner data encompassing both quantitative indicators and qualitative narratives (e.g., surveys, grades, engagement analytics, mentorship records, reflective essays, and structured case summaries) drawn from regular coursework and institutional systems, providing a consistent and equitable foundation for analysis (see Supplementary Information).

  • Analysis: LLMs extract indicators of reasoning, curiosity, collaboration, and communication that underpin safe clinical judgment.

  • Prediction: Generate interpretable profiles showing growth potential and learning risks.

  • Prescription: Match each profile with tailored mentorship strategies and resources. This stage is guided by human mentors through a dashboard that summarizes each learner’s growth and risk profile, enabling educators to craft and deliver appropriate mentorship actions.

  • Feedback: Mentor actions and learner outcomes feed back into the system, refining both model performance and human mentorship quality.
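As a concrete illustration, the five phases above can be sketched as a minimal loop over simple data structures. This is a hypothetical sketch only: the field names, the text-statistics stand-in for LLM analysis, and every threshold and weighting are illustrative assumptions, not a specification of any real system.

```python
from dataclasses import dataclass

@dataclass
class LearnerRecord:          # Input: standardized multimodal learner data
    learner_id: str
    grades: list[float]       # 0-100 scale, assumed
    reflective_text: str

@dataclass
class Profile:                # Prediction: interpretable learner profile
    learner_id: str
    growth_potential: float   # 0..1, higher = more untapped potential
    learning_risk: float      # 0..1, higher = greater risk

def analyze(record: LearnerRecord) -> dict:
    """Analysis: stand-in for LLM extraction of reasoning and engagement
    cues; here faked with trivial text statistics."""
    words = record.reflective_text.split()
    return {"reasoning_cues": sum(w.endswith("because") for w in words),
            "engagement": min(len(words) / 100, 1.0)}

def predict(record: LearnerRecord, cues: dict) -> Profile:
    avg = sum(record.grades) / len(record.grades)
    growth = round(min(1.0, 0.7 * cues["engagement"]
                        + 0.3 * min(cues["reasoning_cues"] / 10, 1.0)), 2)
    return Profile(record.learner_id,
                   growth_potential=growth,
                   learning_risk=round(max(0.0, 1.0 - avg / 100), 2))

def prescribe(profile: Profile) -> str:
    """Prescription: a suggestion a human mentor reviews on a dashboard."""
    if profile.learning_risk > 0.4:
        return "weekly check-in with clinical mentor"
    if profile.growth_potential > 0.5:
        return "invite to research mentorship track"
    return "standard cohort support"

def feedback(profile: Profile, outcome_grade: float) -> Profile:
    """Feedback: observed outcomes refine the next cycle's risk estimate."""
    profile.learning_risk = round((profile.learning_risk
                                   + max(0.0, 1.0 - outcome_grade / 100)) / 2, 2)
    return profile

record = LearnerRecord("s001", grades=[40.0, 60.0],
                       reflective_text="I struggled because " * 30)
profile = predict(record, analyze(record))
action = prescribe(profile)
profile = feedback(profile, outcome_grade=70.0)
```

In this toy run, a low grade average triggers a mentor check-in, and an improved end-of-term outcome lowers the risk estimate for the next cycle, which is the essential shape of the closed loop the article describes.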

To visualize the conceptual flow, Fig. 1 presents the predictive mentorship redistribution loop, illustrating how LLM-based prediction links data interpretation to equitable mentorship actions. The framework functions as a closed equity feedback system, where each phase—Input, Analysis, Prediction, Prescription, and Feedback—continuously refines both algorithmic insight and human mentoring practice.

Fig. 1: Predictive mentorship redistribution loop.

This conceptual model illustrates how large language models (LLMs) can support equity-oriented mentorship through a continuous five-phase cycle. Input collects standardized multimodal learner data (e.g., surveys, reflective texts, performance metrics, and mentorship records). Analysis extracts reasoning and collaboration cues. Prediction generates interpretable learner profiles indicating potential and risk. Prescription guides tailored mentorship interventions, and Feedback captures human and learner responses to refine subsequent predictions. The labeled transitions—Visibility, Interpretation, Action, Support, and Reflection—represent the pedagogical bridges between these phases: Visibility makes hidden potential observable; Interpretation converts linguistic patterns into meaningful educational insight; Action transforms prediction into mentorship decisions; Support reflects the human process of guidance and engagement; and Reflection closes the loop by feeding experiential learning back into the system, sustaining continuous improvement and equity. Mentorship decisions are always crafted and delivered by educators using dashboard-based summaries, ensuring that AI insights inform—but never replace—human judgment.

While some may perceive this approach as overly structured, it is worth asking whether it is, in fact, more structured than existing systems in which mentorship opportunities often hinge on chance, personal confidence, or social connections.

From a practical standpoint, this pipeline could be implemented using open-weight, instruction-tuned transformer models that support secure on-premise deployment, such as Llama 3, Mistral, Qwen 2, or Gemma 2. These models balance performance and privacy, allowing institutions to analyze sensitive educational data without external data transfer34,35,36,37. Comparable frameworks have already proven effective for educational reflection analysis and performance prediction: fine-tuned LLMs have achieved performance comparable to knowledge-tracing baselines across multiple datasets20; multi-agent models have successfully assessed student reflections in real-time feedback environments32; and early-prediction systems such as LLM-EPSP have demonstrated robust integration with institutional learning platforms26. Studies of adult-literacy prediction similarly confirm that modest text datasets (a few thousand short entries) are sufficient for reliable inference38. Importantly, empirical validation has already emerged across educational and clinical domains: fine-tuned and multi-agent LLM frameworks have demonstrated reliable prediction and assessment of student reflections6,20,26,30,31, AI-based evaluation has improved clinical-reasoning assessment6, and LLM-driven tutoring pilots and case studies have shown promising outcomes across diverse learning environments8,39. Notably, recent studies further demonstrate that LLM-based systems can identify hidden potential and flag at-risk learners early26,32,33,38, enabling proactive mentorship interventions and equitable resource allocation.

Collectively, these findings indicate the technical and ethical feasibility of implementing predictive mentorship within existing infrastructure—running on institutional Graphics Processing Unit (GPU) servers or secure cloud Application Programming Interface (API) services—to generate interpretable feedback dashboards that augment rather than replace human mentors, thereby enhancing both efficiency and equity in the distribution of mentorship resources.

To operationalize this concept, the framework could be piloted in a cohort of medical students using an integrated dataset that combines institutional data (e.g., course grades and learning analytics) with a brief self-report survey on study strategies, motivation, and perceived mentorship, along with a short reflective paragraph on learning challenges. The pilot could involve approximately 300–500 learners over one semester. A large language model (e.g., Llama-3 8B Instruct, Mistral 7B Instruct, or Qwen-2 7B) could be prompted to predict end-of-term performance and flag students at risk of low achievement, while a baseline regression model (e.g., linear or random-forest) trained on the same dataset would serve as a quantitative benchmark. Evaluation would compare predictive reliability across metrics of accuracy and fairness, and faculty mentors could qualitatively review early “risk” flags to assess interpretability and fairness. Such a lightweight pilot could provide initial evidence of feasibility and offer a transparent template for developing AI-assisted early-warning systems in medical education (see Supplementary Information: Predictive Mentorship Dataset Specification for data items and survey instrument).
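A minimal version of the pilot's fairness comparison could look like the following sketch. All records, subgroup labels, the predicted "at-risk" flags (which stand in for either LLM or baseline-model output), and the audit tolerance are synthetic placeholders chosen for illustration.

```python
from collections import defaultdict

# Each record: (student_id, subgroup, predicted_at_risk, actually_low_achieving).
# Subgroups might be primary-language groups, per the multilingual-fairness
# concern discussed later; the values here are invented for illustration.
records = [
    ("s01", "lang_A", True,  True),
    ("s02", "lang_A", False, False),
    ("s03", "lang_A", True,  True),
    ("s04", "lang_B", True,  True),
    ("s05", "lang_B", False, True),   # missed at-risk learner
    ("s06", "lang_B", False, False),
]

def accuracy(pairs):
    """Fraction of records where the predicted flag matched the outcome."""
    return sum(pred == actual for pred, actual in pairs) / len(pairs)

overall = accuracy([(p, a) for _, _, p, a in records])

# Group predictions by subgroup to compare accuracy across cohorts.
by_group = defaultdict(list)
for _, group, pred, actual in records:
    by_group[group].append((pred, actual))
group_acc = {g: accuracy(pairs) for g, pairs in by_group.items()}

# A simple equity check: flag the model for human review if accuracy
# differs across subgroups by more than an agreed tolerance (0.15 here,
# an arbitrary placeholder a real pilot would set deliberately).
fairness_gap = max(group_acc.values()) - min(group_acc.values())
needs_audit = fairness_gap > 0.15
```

In this toy dataset the model is perfectly accurate for one language group but misses an at-risk learner in the other, so the gap exceeds the tolerance and the run is flagged for faculty review, mirroring the proposed role of mentors in qualitatively auditing early risk flags.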

In medical training, such systems could help educators triage mentorship support by identifying learners struggling with diagnostic reasoning or professionalism early enough for formative feedback rather than remediation. Yet predictive mentorship cannot replace the human subtleties of encouragement and belonging. Its role is to surface opportunities, not to automate empathy.

While predictive mentorship frameworks hold particular promise for LMICs, their realization remains constrained by infrastructural and sociotechnical challenges. Empirical analyses across Southeast Asia reveal that unstable internet connectivity, limited GPU and cloud access, and weak digital governance continue to hinder sustainable AI integration in medical education and health-training40,41. Evidence from simulation-based and generative-AI learning programs further shows that cost, hardware dependency, and language diversity restrict scalability and long-term maintenance6,11,42,43,44,45,46,47,48. Implementation success in LMICs depends not only on technology but also on local capacity for model adaptation, educator training, and ethical oversight6,11,40,47,49,50,51. To ensure linguistic fairness, LLMs should be fine-tuned or prompted in the learners’ primary language where possible, and evaluated across multilingual samples to avoid penalizing students for non-dominant language styles. Addressing these barriers requires co-designed, resource-adaptive systems that can operate under intermittent connectivity, leverage multilingual datasets, and empower local educators to maintain and audit AI tools sustainably.

Equity by design

No algorithm is neutral8,52,53,54. Models trained on data from specific regions or institutions inevitably inherit the assumptions and biases of those contexts11,55,56. When applied across settings, especially from high-income to low- and middle-income environments, such models can inadvertently amplify existing inequities rather than correct them11,43,52,57. Models trained on limited or homogeneous data may disadvantage students whose writing reflects local dialects or culturally distinct styles7,8,29,52,56,58,59. And even the most advanced systems still struggle with balancing fairness and accuracy26,32,52,60,61,62. This is not a reason to give up but a call to design carefully: achieving fairness requires continuous local adaptation, bias auditing, and human-mentorship oversight—embedding transparency, diversity, and ethical review at every stage—so that predictive tools highlight potential rather than reproduce privilege. This principle of equity by design means embedding fairness, transparency, and human oversight into every stage of model development and educational use. In our view, humility is key. LLMs suggest probabilities, not truths. They should guide teachers, not replace them.

Beyond technical fairness, ethical safeguards are equally vital. Responsible predictive systems in medical education and healthcare training must protect data privacy, obtain informed consent, and ensure transparency in how outputs are generated and interpreted8,11,44,52,56,63. Learners’ writings should be anonymized and reviewed only within approved ethical frameworks, following clear communication about how the analysis will be used to guide—not judge—their development43,52. Because predictive assessment can evoke discomfort or a sense of surveillance, systems must provide interpretable feedback and retain human mentorship oversight at every stage. Recent evidence shows that both students and faculty value guidance on ethical and human-centered AI use, expressing concern that excessive reliance on algorithms could erode trust, empathy, and critical thinking in medical education40,43,44,56,64,65,66,67,68,69. Embedding these safeguards of privacy, consent, transparency, and empathy can help predictive mentorship advance equity while maintaining human dignity and confidence in the learning process.

Toward workforce equity

The effects of missed mentorship last for decades. Students who never receive guidance are less likely to publish, specialize, or become mentors themselves3,6,70,71,72. Over time, this builds into workforce gaps. Policymakers often talk about shortages in numbers. But behind the numbers is an upstream filter: who got attention early on. These shortages in LMICs translate directly into fewer clinicians in rural or primary-care settings, slower specialist training, and weaker academic pipelines14,73,74. This pattern is compounded by the fact that primary-care systems in many LMICs remain underdeveloped, limiting the structural capacity to retain and distribute trained professionals equitably75,76. Predictive mentorship thus provides a conceptual link between educational equity and the broader conditions that shape future workforce distribution.

In places with too few human mentors, predictive tools could help stretch limited resources, pointing support to the students who need it most26. In this sense, predictive mentorship remains primarily an educational strategy that leverages large language models to personalize learning and expand access to academic support—an approach consistent with recent theoretical frameworks for integrating LLMs into education8. By promoting fairer and more adaptive mentorship processes, such tools may indirectly influence who advances, specializes, and contributes back as future mentors. Table 1 shows some key application areas where LLM prediction can be leveraged to promote mentorship equity, detailing current challenges, potential solutions, and anticipated equity impacts. It focuses on why and where predictive mentorship may promote workforce equity.

Table 1 Key application areas where LLM-based prediction can promote mentorship equity, summarizing current challenges, predictive mechanisms, and expected equity impacts

Together, these five applications illustrate how LLMs might help shift mentorship from a matter of luck into a more deliberate and fair process. By revealing hidden strengths, spotting risks earlier, guiding mentor–student matches, widening access to leadership roles, and informing workforce planning, predictive mentorship could help make better use of scarce resources. The idea is straightforward: instead of waiting for students to either thrive or fail on their own, we can use signals in their language to guide support where it will have the most impact. Rather than replacing human judgment, such tools can guide support where it matters most, fostering a more equitable and sustainable professional pipeline over time.

Conclusion

Mentorship, when considered through the lens of predictive LLMs, invites reflection on how opportunities in medical education are recognized and shared. These systems may help surface learning needs and potential that traditional assessments overlook, yet they also risk reproducing existing inequities if applied without care. The medical workforce of the future may depend as much on how mentorship is supported and distributed as on curricula or examinations. The value of predictive mentorship lies less in technological novelty than in its capacity, when responsibly designed and governed, to inform fairer and more transparent educational support.