Abstract
Large language models (LLMs) are emerging as powerful tools in healthcare, with a growing role in global health, particularly in low- and middle-income countries (LMICs). This Perspective examines the current progress, challenges and prospects of LLMs in addressing health system disparities and supporting the achievement of the Sustainable Development Goals (SDGs). While high-income countries dominate the development and deployment of LLMs, LMICs face substantial barriers. These include limited digital infrastructure, a scarcity of locally relevant data, regulatory gaps, under-representation of local languages and dialects, and challenges related to privacy and data security. The limited availability of local expertise, capacity building programmes and sustained technical support remains a key barrier to scaling LLMs in LMICs. Nonetheless, case studies highlight how mobile-based LLM applications, hybrid artificial intelligence systems and open-weight models like DeepSeek are enhancing access to care, improving diagnostics and supporting clinical decision-making in resource-limited settings. Key risks include model hallucinations, equity concerns and environmental impacts. These underscore the need for rigorous validation, localized fine-tuning and global governance frameworks. The implementation of LLMs with contextual sensitivity, responsible oversight and codevelopment partnerships is important to avoid perpetuating health inequities. With the right safeguards and strategic investments in capacity building, LLMs have the potential to transform global health by bridging divides in access, augmenting overburdened health workforces, and enabling scalable and cost-effective innovations for the most underserved communities.
Main
Large language models (LLMs) have evolved rapidly from text generation tools to engines of knowledge retrieval, integration and decision support. LLMs have now surpassed the capabilities of their predecessors in natural language understanding1. In healthcare, LLMs are being actively integrated into clinical and operational workflows in hospital systems, including their use in clinical documentation to alleviate clinician burden2, as well as in medical education3, biomedical research4 and even clinical decision support as a ‘copilot’ to physicians5.
In contrast to the relatively rapid progress in some contexts, the role and use of LLMs in the broader landscape of global health remain at a nascent stage6,7,8. Few studies delve into the practical, technical and economic levers needed to translate LLMs’ potential into real-world impact, especially in low- and middle-income countries (LMICs), where resource constraints, such as the lack of experts in local implementation and inadequate regulatory frameworks, present additional layers of complexity. Existing discussions often remain general6,7,8, without grounding the opportunities and risks of LLMs in the concrete development needs of health systems across different contexts, particularly in LMICs where these challenges are more pronounced.
To address this gap, this Perspective synthesizes recent progress in LLM applications through a global health lens, explicitly linking the roles of LLMs with the United Nations’ Sustainable Development Goal (SDG) 3 (that is, ensure healthy lives and promote well-being for all at all ages). By explicitly mapping the opportunities and risks presented by LLMs to each target of SDG 3, we provide a practical framework for policymakers and researchers to assess trade-offs and identify gaps in LLM applications. Anchored in these goals, we explore recent innovations that may lower barriers to LLM deployment, particularly in resource-limited settings. Finally, the existing known risks of LLMs are highlighted, and a practical road map for realizing the benefits of LLMs in global health is proposed, focusing on equitable deployment, contextual fine-tuning and collaborative human–artificial intelligence (AI) integration.
Evolution of LLM technologies and applications in global health
The development and adoption of generative AI, including LLMs, are rapidly accelerating. The introduction of the transformer architecture sparked a rapid surge in capabilities within just a few years9. Models have since evolved from accepting text-only inputs to accepting multimodal inputs (for example, images and audio), and from models that generate an output directly to models capable of generating tokens that reflect their reasoning process10,11 (Fig. 1 and Box 1).
Fig. 1 | The timeline begins with the release of the transformer architecture, which was originally designed for machine translation and is based on an encoder–decoder architecture. This architecture formed the foundation of LLMs, although many later models adopted a variation: the decoder-only architecture. We categorize LLMs into the following types: open-weight LLMs, proprietary LLMs, multimodal LLMs, reasoning LLMs and LLMs developed for medical applications. This diagram illustrates the most popular models and current development trends but is not intended to be an exhaustive list.
Despite their rapid development, a large disparity in the usage and adoption of LLMs exists among countries. Commonly used proprietary models, when tested, demonstrate systematic bias towards retrieving and reflecting the origins of the data on which they were developed, typically US-based contexts12. While high- and middle-income countries account for >90% of generative AI traffic, low-income countries account for <1% (ref. 13). This traffic includes the usage of LLMs as well as image and digital art generation tools like Midjourney and Stable Diffusion. More broadly, the intensity of generative AI usage per internet user remains substantially higher in high-income nations, highlighting a persistent digital divide. Economic and social conditions remain obstacles to the adoption of advanced technology, mirroring the technological gaps observed in poorly resourced regions13.
Open-source or open-weight models show promise as a powerful catalyst for change in global health innovation14,15,16. Openly released models present advantages for data privacy, as they can be downloaded and run locally, reducing the risk of exposing sensitive patient information to model creators17. However, while AI models labelled as ‘open source’ are available for use and distribution, the lack of access to their training data and source code may limit in-depth analysis and customization18. These are more accurately referred to as open-weight models. For instance, users of models commonly labelled as open source, such as Llama 3, Grok and Phi-2, do not have access to information about their training data. In LMICs, limited access to training data constrains customization to local health priorities and may reinforce existing global data biases. By contrast, some models, such as the OLMo family developed by the Allen Institute for AI19, provide full transparency, making training data, training code and evaluation methodologies openly available.
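To make the privacy argument concrete, the minimal sketch below shows how an open-weight model can be run entirely on local hardware, so that patient queries never leave the institution. It assumes the Hugging Face transformers library; the checkpoint identifier and prompt are illustrative rather than recommendations.

```python
# Local inference with an open-weight model: no query or output is sent to an
# external provider. The checkpoint name is illustrative; any locally
# downloaded open-weight model could be substituted.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/OLMo-7B"  # illustrative openly released checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Warning signs in pregnancy that require urgent referral include"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```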
The value proposition of LLMs in global health
LLMs can act as technology integrators and multipliers by addressing fundamental gaps in global health. These tools may support clinical decision-making, automate tasks and expand access to care where specialists are scarce. In particular, utility may be greatest in primary healthcare, where minimally trained community workers are often the first line of support20. The potential impact of LLM-based enhancements to LMIC healthcare systems spans lower costs, improved efficiency and enhanced productivity, in both better-resourced contexts and more remote, less well-resourced settings.
The value of LLMs is most clearly demonstrated through practical case studies and ‘pilot’ projects. These projects either build on existing digital platforms to enable rapid and scalable deployment, enhance generalizability and reduce the cost limitations typically associated with traditional AI tools, or improve service delivery by integrating into existing healthcare workflows. For example, MomConnect is a national flagship programme in South Africa that provides free health and pregnancy-related information to pregnant mothers and answers health enquiries through automated chatbots delivered by Short Messaging Service (SMS) or a smartphone app, especially in areas with limited internet connectivity. With over 5 million users since 2014, this platform is leveraging LLMs to flag urgent enquiries and reduce the number of unresolved pressing health issues21,22. Programmes that are integrated within an already established information and communication system tend to follow a more predictable and efficient trajectory towards rapid expansion21. In another example, transformer-based architectures similar to LLMs have been applied to smartphone app-based malaria detection, providing a scalable and practical alternative to conventional computer vision methods on microscopy platforms23. Finally, a hybrid AI system, DeepDR-LLM, which combines language and image models, has demonstrated improved diabetes care outcomes in China by supporting primary care physicians with personalized recommendations, leading to better medication adherence and patient self-management24. Heterogeneity in the quality of care exists across different providers in China, with online platforms showing strong potential to improve healthcare access in underserved rural areas25. Collectively, these examples underscore the pragmatic ways in which LLMs can be leveraged to expand access, improve efficiency and enhance health outcomes (Fig. 2). These tools may help to address health inequities not only in resource-constrained countries but also between wealthy and underserved communities within the same country.
The success of AI, including LLM-based systems, depends heavily on local context. Variations between global and regional settings exist, from healthcare infrastructure and workforce training to technological maturity26. For instance, maternal mortality remains disproportionately high in low-income countries, particularly in sub-Saharan Africa and South Asia, due to disparities in healthcare access and quality (https://www.who.int/health-topics/maternal-health). Increasingly, there is a realization that the value of AI-driven technologies in real-world settings faces similar disparities9. Models trained and validated on datasets from a single institution often demonstrate poor generalizability. In particular, deterioration in model performance and the introduction of bias have been observed when training datasets poorly represent the population diversity of other datasets27,28. For LLMs, performance may change when a model is asked to perform a task it was not explicitly trained for, without being shown sample input–output pairs (known as zero-shot learning), or when it is fine-tuned on local training data. Locally fine-tuning or adapting LLMs in LMICs offers a pathway to improve model generalizability and mitigate bias, particularly for under-represented populations, by incorporating context-specific data and perspectives to promote more equitable and trustworthy outputs. The advent of more representative benchmarking datasets presents a critical step forward in helping to identify bias and generalizability issues before real-world deployment. One example is AfriMed-QA29, a dataset of 25,000 Africa-focused question–answer pairs spanning 32 clinical specialties, contributed by more than 1,000 African clinicians across 15 countries. This benchmark allows for a more comprehensive assessment of LLM capabilities within the varied contexts of African healthcare.
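One plausible route to such local adaptation is parameter-efficient fine-tuning, sketched below with LoRA via the Hugging Face peft library; the base model, target modules and hyperparameters are illustrative assumptions, not a validated recipe.

```python
# Parameter-efficient local adaptation with LoRA: only small adapter matrices
# are trained, keeping compute requirements modest relative to full fine-tuning.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")  # illustrative

config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Training would then proceed with a standard causal-language-modelling loss on
# locally curated question-answer pairs (for example, items in the style of
# AfriMed-QA), followed by re-evaluation on a representative benchmark.
```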
LLMs can serve as both a facilitator and a barrier to achieving the SDGs. Table 1 provides a list of specific targets for the United Nations’ SDG 3 (ensure healthy lives and promote well-being for all at all ages) and examples of how LLMs can be an enabling force or pose a deterrent to achieving these targets. For example, AI has the potential to promote good health and well-being through AI-powered healthcare solutions. However, if inadequately evaluated and monitored, AI may negatively affect other forms of equity, such as socioeconomic equity by disadvantaging lower-income individuals, as well as linguistic or cultural equity by marginalizing speakers of less-represented languages or cultures, especially when model biases or unequal data are present30,31.
Current barriers and enablers of LLM adoption
Critical domains, such as people, processes, policies, platforms and products, need to be optimized for the successful large-scale implementation of health AI32. In this section, we focus on key barriers, such as limited human and technical resources, the lack of clear governance for LLMs, and the potential impact of LLM adoption on the healthcare workforce and environment. We also briefly highlight the enablers that may help to address these limitations. Concrete proposals to overcome these challenges are elaborated in the next section.
Healthcare resource and data limitations
Infrastructure limitations, including inconsistent electricity supply and unreliable internet connectivity, remain critical barriers to LLM implementation in resource-limited regions7. Critical shortages of skilled professionals, ranging from developers to software engineers and cybersecurity experts, are exacerbated by ‘brain drain’ through migration to higher-income countries33. This loss of technical talent directly undermines the local capacity to develop, test and maintain LLM-based systems. Digital medical records remain fragmented in many countries, limiting the availability of complete and high-quality training datasets34. Facilities without electronic health records are unable to generate the patient-level data that would support context-specific applications35. Beyond these general challenges, LLM deployment also requires specialized resources, such as high-performance computing, large-scale storage and expertise in data annotation, which are often lacking in the local context36.
Existing pretrained models often lack a contextual understanding of local practices, indigenous languages and dialects, and societal or cultural norms in LMICs, limiting their effectiveness and accessibility37,38. With approximately 7,000 languages spoken worldwide and around 800 of them spoken by at least 1.5 million people39, the demand for inclusive language technologies is substantial, and unmet demand further widens existing resource gaps. Commercial models, such as those developed by OpenAI, often demonstrate stronger performance in English than in non-English languages, largely because languages from lower-resourced regions are under-represented in the training data40,41. This disparity can worsen health inequities, as weaker performance increases the risk of miscommunication, reduces user engagement and limits the availability of tailored digital tools, ultimately contributing to digital exclusion. Addressing these challenges is crucial for the equitable and effective implementation of LLMs and generative AI in global health.
Global coordination for the safe use of LLMs
Despite the potential of generative AI and LLMs, greater efforts are needed to ensure the responsible and safe use of LLMs globally. A major concern is the proliferation of open-weight LLMs without adequate guardrails42. Guardrails typically include postprocessing output screening, whereby safety protocols filter the outputs of LLMs and enforce safety actions43. Open-weight LLMs may be more susceptible to targeted attacks due to their model weights being accessible to both good and bad actors, although closed-source models are not exempt from these attacks. In one study, system-level instructions to produce incorrect responses to health queries were given to five different LLMs44. Both open-weight and closed-source models generated misleading health information, delivered in an authoritative, scientific tone. LLM-based systems that are vulnerable may proliferate misinformation and influence medical diagnoses or patient behaviours. Although open-weight models offer promise for localization and cost-efficiency, they introduce additional challenges, including integration complexity, the need for continuous model updates and a lack of built-in safety mechanisms. The absence of version control and update tracking further complicates safety monitoring across implementations. It is imperative for the global community, including developers, health product regulators, ministries of health and health institutions, to coalesce in addressing the unique and disproportionate risks these technologies pose.
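As a simple illustration of postprocessing output screening, the sketch below filters model outputs against safety rules before they reach the user; the patterns and fallback message are illustrative only, and production guardrails would use far richer classifiers.

```python
# A minimal output-screening guardrail: responses matching unsafe patterns are
# replaced with a safe fallback. Patterns shown are illustrative only.
import re

UNSAFE_PATTERNS = [
    r"stop taking your medication",
    r"no need to see a (doctor|clinician|health worker)",
]
FALLBACK = "I cannot provide that advice. Please consult a qualified health worker."

def screen_output(text: str) -> str:
    """Return the model output, or the fallback if a safety rule is triggered."""
    for pattern in UNSAFE_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return FALLBACK
    return text

print(screen_output("You can stop taking your medication once you feel better."))
# -> prints the safe fallback message instead of the unsafe advice
```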
The lack of robust frameworks and internationally recognized benchmarks or guidelines to evaluate clinical safety and effectiveness is another critical concern45. Hallucinations, instances in which LLMs confidently generate inaccurate or fabricated content, are prevalent and can have serious consequences in some clinical contexts. Preimplementation assessments can identify potential safety implications to better design safe implementation strategies46. DeepSeek, for example, relies on inference-chain reasoning that enhances overall performance but has also raised considerable concerns owing to its tendency to produce hallucinations46,47. In addition, reasoning and explanations can sometimes increase automation bias, whereby the user relies excessively on automated recommendations48. The rapid expansion of open-weight tools among underserved populations poses serious safety risks due to limited capacity for clinical oversight and low health literacy rates, as well as limited access to accurate medical information for cross-referencing. In regions where regulatory frameworks and technical capacity lag behind, the unchecked deployment of open models may result in the unsafe provision of medical advice49. The ability of users, whether patients or healthcare professionals, to detect hallucinations will become increasingly crucial50. Regular audits of model outputs against verified medical sources could help to identify and mitigate the risks of hallucination.
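One way such audits might be structured is sketched below: model answers to a fixed set of clinical questions are periodically compared with verified reference answers, and low-agreement responses are flagged for clinician review. The token-overlap score is deliberately simple, the audit item is a placeholder, and real audits would rely on expert review or stronger semantic checks.

```python
# Periodic hallucination audit: flag answers that diverge from verified sources.
def token_overlap(answer: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the model answer."""
    ref_tokens = set(reference.lower().split())
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / max(len(ref_tokens), 1)

AUDIT_SET = {  # question -> verified reference answer (placeholder item)
    "First-line treatment for uncomplicated malaria?":
        "artemisinin-based combination therapy",
}

def audit(model_answer_fn, threshold: float = 0.5) -> list:
    """Return the questions whose model answers need clinician review."""
    flagged = []
    for question, reference in AUDIT_SET.items():
        if token_overlap(model_answer_fn(question), reference) < threshold:
            flagged.append(question)
    return flagged
```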
The safe and responsible development and deployment of LLMs for global health require the scrutiny of global governance frameworks and their impact on equity and justice. A coordinated international effort is needed to establish minimum safety standards, equitable and representative data practices, fair labour protections, and support for regulatory capacity building. Without such collaboration, the negative impact of AI on global health is likely to outweigh the potential benefits49. Establishing an international oversight body with representation from diverse healthcare systems could help to standardize safety protocols while respecting local contexts. We discuss some existing initiatives in the sections below.
Labour and environmental implications
The adoption of LLMs to automate healthcare processes may have labour implications. LLMs deployed in triage, documentation and basic patient communication may displace lower-wage healthcare and administrative jobs, particularly roles filled by community health workers or call centre agents. This raises ethical concerns about the reallocation of benefits from AI, where job loss may occur in resource-constrained settings while value is captured elsewhere50. AI exposure has been observed to decrease as country income levels decline, with only 12% of workers in low-income countries experiencing a high level of exposure51. In the immediate and short term, the impact of AI on the labour market in developing countries is likely to be less pronounced than in high-income nations due to a lower level of exposure. However, in the long run, a widened income disparity may be observed, despite fewer job displacements52. Since the pandemic, the use of telemedicine has surged, driving up remuneration for telehealth specialists in the USA53. By contrast, adoption in LMICs remains limited, largely because of inadequate digital infrastructure and lower patient acceptance54,55. Estimating the full impact remains challenging, as projections for AI utilization are highly uncertain, especially in LMICs.
Beyond healthcare, LLMs raise questions about environmental sustainability. Training and running large models consume vast computational resources and electricity, contributing to carbon emissions56. The environmental impact encompasses electricity use, water consumption and hardware-related emissions57. At the level of individual queries, the energy and water consumption may appear minimal. However, cumulative use in healthcare settings (for example, triage, ambient scribing and documentation, and electronic health record queries) can scale dramatically, with daily emissions comparable to those of hundreds of households58. Additionally, the hardware needed to run these models incurs carbon and environmental costs, including rare-earth mining and water use57, all of which are currently under-reported.
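A back-of-the-envelope calculation illustrates how modest per-query figures scale with cumulative use; every number below is an assumption for illustration, not a measured value.

```python
# Scaling assumed per-query energy to health-system-wide daily use.
ENERGY_PER_QUERY_WH = 3.0      # assumed energy per LLM query (Wh)
QUERIES_PER_DAY = 2_500_000    # assumed system-wide daily queries
HOUSEHOLD_KWH_PER_DAY = 30.0   # rough daily household consumption (kWh)

daily_kwh = ENERGY_PER_QUERY_WH * QUERIES_PER_DAY / 1000
households = daily_kwh / HOUSEHOLD_KWH_PER_DAY
print(f"{daily_kwh:,.0f} kWh per day, roughly {households:,.0f} households")
# -> 7,500 kWh per day, roughly 250 households
```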
Privacy, cybersecurity and data security
When deploying LLMs in global health contexts, privacy protection, cybersecurity and data security represent critical challenges that cannot be overlooked. These concerns are particularly pronounced in LMICs, where legal frameworks, cybersecurity capacity and resource allocation are often underdeveloped7,46. In the absence of transparent data governance mechanisms and adequate technical safeguards, patient queries, clinical information and system interaction logs generated during LLM use may be misused, thereby undermining patient trust and weakening the long-term resilience of health systems59.
These risks manifest across several dimensions. In terms of data privacy, during interactions with LLMs, user queries and the clinical information required from local health systems may be collected and stored by model providers or external servers. Without stringent regulation and protection, such information could be inappropriately exploited60. Moreover, as LLMs are integrated into health information systems, the risks of hacking, data interception and malicious code injection are increasing—particularly in the case of open-weight LLMs that lack ongoing security auditing61. In addition, many LMICs face structural vulnerabilities, including underdeveloped data protection laws, weak cybersecurity infrastructure and insufficient oversight of cross-border data transfers59. These institutional gaps not only amplify the potential harm of external threats but may also lead to an over-reliance of local health systems on international vendors.
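One client-side safeguard, sketched below, is to redact obvious personal identifiers from queries before they reach an externally hosted model; the patterns are illustrative, and robust de-identification requires dedicated tooling and human review.

```python
# Redacting obvious identifiers before a query leaves the facility.
import re

REDACTIONS = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),    # dates, e.g. of birth
    (re.compile(r"\+?\d[\d\s-]{7,}\d\b"), "[PHONE]"),    # phone numbers
    (re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"), "[EMAIL]"),  # email addresses
]

def redact(query: str) -> str:
    """Replace obvious personal identifiers with placeholders."""
    for pattern, placeholder in REDACTIONS:
        query = pattern.sub(placeholder, query)
    return query

print(redact("Patient born 01/02/1985, phone +254 700 123456, reports fever."))
# -> "Patient born [DATE], phone [PHONE], reports fever."
```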
Proposals for advancing LLMs in global health
Although resource, technical, social and cultural barriers continue to pose challenges, we propose a few practical pathways to advance LLMs’ contributions to global health goals. Table 2 presents the key considerations and strategies that support implementation. This begins with a strategic, high-level appraisal of key domains, including the sociotechnical contexts in which LLM-based tools are developed, tested and deployed. This should be followed by rigorous real-world evaluations to assess health outcomes, along with coordinated development and harmonization of governance and regulatory frameworks to ensure safe, equitable and sustainable implementation.
Adapting global technologies for local adoption
Open-source platforms democratize access to advanced models such as DeepSeek, Llama (by Meta) and Mistral. These models have achieved performance comparable to that of leading proprietary models on some medical benchmarks62. One such example, DeepSeek, was developed under hardware access limitations, achieving notable reductions in computational requirements for both training and inference62. This has challenged conventional assumptions about the computational efficiency and sustainability of generative AI63. Early evidence suggests that the performance of the DeepSeek-R1 model is comparable to state-of-the-art proprietary models in making medical diagnoses and providing treatment recommendations15,64. Its efficient architecture may benefit resource-constrained healthcare settings.
Benchmarking studies have demonstrated the potential of open-source models to match the performance of closed-source models. In a systematic comparison of open-weight and closed-source systems, open models achieved an accuracy as high as 0.92 on the US Medical Licensing Examination, close to the 0.95 reported for OpenAI o1 (ref. 64). Meanwhile, in a specialized evaluation in ophthalmology, DeepSeek-R1 and OpenAI o1 performed equivalently on 300 multiple-choice clinical cases across ten ophthalmic subspecialties, with DeepSeek-R1 having an estimated cost of only 6.71% of that of OpenAI o1 (ref. 65). In another benchmarking exercise, the diagnostic accuracy of Meta’s open-weight Llama 3-70B model was comparable to that of GPT-4o when tested on a series of peer-reviewed clinical radiology case reports66. Although useful, benchmarks are not substitutes for peer-reviewed clinical validation. To date, few randomized trials67 have examined the benefits of DeepSeek or similar open-weight models in clinical settings. Despite this, we are already seeing large-scale adoption of open-weight models in clinical practice. More than 300 hospitals in China have integrated DeepSeek locally into various clinical and administrative functions, including decision support, patient communication and hospital management47. However, the clinical impact of this strategy has not been studied or reported68.
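For context, benchmark figures of the kind cited above are typically computed as simple exact-match accuracy over multiple-choice items, as in the minimal harness below; the single item and stub model are placeholders.

```python
# Exact-match accuracy over multiple-choice items, as used in MCQ benchmarks.
def evaluate_mcq(model_choice_fn, items):
    """items: iterable of (question, options_dict, correct_letter) tuples."""
    results = [
        model_choice_fn(question, options) == answer
        for question, options, answer in items
    ]
    return sum(results) / len(results)

ITEMS = [
    ("Which vitamin deficiency causes scurvy?",
     {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"},
     "B"),
]
print(evaluate_mcq(lambda question, options: "B", ITEMS))  # stub model -> 1.0
```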
In the biotechnology and drug discovery field, the cost of drug development may decrease as AI-powered discovery and candidate refinement processes become more efficient. Applications leveraging open-weight tools are already emerging, for instance, in solid dosage formulation design and development69. The pharmaceutical industry has seen substantial transformation through the adoption of generative AI and LLMs. These technologies have accelerated the discovery of new drug targets70, improved the process of de novo drug design71,72 and automated the matching of eligible patients to clinical trials73. However, due to computational constraints, the implementation of high-end AI models is generally not feasible for smaller research entities74. DeepSeek’s demonstration of novel engineering and algorithmic innovation suggests that even organizations with limited resources can participate competitively and purposefully in delivering impactful AI initiatives. This could lower barriers and strengthen incentives for developing or repurposing drugs for rare and neglected diseases that are prevalent, or difficult to treat, in developing regions75,76. The model’s efficiency could particularly benefit research into tropical and infectious diseases that predominantly affect resource-limited settings.
A call for global impact assessment studies
To date, most of the published work assessing medical LLMs involves case scenarios, patient actors and other methods that simulate medical practice. Only a small number of randomized clinical trials have evaluated LLMs, and these offer limited insight into the real-world impact of LLMs on patient care77. This is beginning to change, with several trials having been initiated in LMICs78, including the world’s first phase 3 randomized clinical trial of an LLM-based clinical decision support system79. There is growing evidence that LLMs exhibit demographic biases in healthcare applications; however, evidence on the true extent of their impact on health disparities remains limited80. This lack of robust real-world evidence from LMICs highlights the urgent need for rigorous clinical effectiveness evaluations, particularly randomized clinical trials, beyond model validation. In addition, cost-effectiveness studies of LLMs in health systems are lacking. Systematic documentation of implementation challenges and successes across diverse healthcare settings could guide future deployments.
To realize the promise of LLMs equitably, implementation science principles must guide their deployment through local partnerships and healthcare networks that assess real-world impact within diverse LMIC contexts81,82. To support this, the establishment of centres of excellence in AI for global health equity within LMICs has been proposed83. These centres can be operated in collaboration with global technology and health leaders, tailored to local needs and supported by international financing. They would not only help to harness AI for local health priorities and innovation but also ensure culturally adapted policy guidance, ethical oversight, education and local ownership, aligning technological advances with sustainable and equitable health outcomes. Importantly, they could also support in-depth evaluations of LLMs that incorporate human factors, as well as the contextual and cultural environments in which LLMs are deployed, examining how these elements shape the effectiveness and reliability of LLM applications.
Various global initiatives have been launched to advance AI in global health. The Global Initiative on AI for Health (GI-AI4H) was launched in July 2023 by the World Health Organization (WHO), the International Telecommunication Union and the World Intellectual Property Organization84. Designed as a long-term institutional framework, GI-AI4H aims to support the responsible development and deployment of AI in healthcare84. It provides a global platform to enable collaboration, facilitate the sharing of best practices and implement AI-driven health solutions that align with global health goals. On the academic front, the Lancet Global Health Commission on AI and HIV was launched to synthesize evidence on AI’s economic and health impacts across different settings, guide responsible AI model development, and create actionable guidance for stakeholders in the responsible regulation and adoption of AI applications85. Philanthropy groups such as the Gates Foundation are focusing on funding projects that promote AI equity and language inclusivity to ensure equitable access to AI in all settings86,87.
Skills training and workforce redesign
An immediate priority for integrating LLMs into healthcare is to strengthen digital literacy and support healthcare professionals in using LLMs effectively. Disparities in digital infrastructure and AI access between high-income countries and LMICs not only affect the use of health AI but also influence how healthcare workers in LMICs perceive the value of AI in medical education and practice. In addition, many healthcare professionals may already be using generative AI tools in their medical practice but often lack a sufficient understanding of the associated risks and best practices for their effective application. These gaps call for tailored and flexible educational approaches to help clinicians to critically assess AI-generated content and apply it responsibly within clinical workflows88. For instance, the Lao People’s Democratic Republic Digital Health Strategy 2023–2027 outlines workforce capacity building by integrating digital health literacy into medical curricula, training pipelines and continuous professional development89. At the institutional level, the University of Bordeaux and African partner institutions offer a joint postgraduate programme in digital health that has trained hundreds of healthcare professionals across West and Central Africa90. Massive open online courses on health AI and digital health also provide scalable resources, although LMICs would benefit most from locally designed or adapted courses.
This skill development is also important for positioning underprivileged workers to navigate the long-term impact of AI on the healthcare workforce91. As described in a World Bank report51, the low AI exposure in LMICs offers a window of opportunity to prepare these health systems adequately before widespread AI adoption. Proactively training the healthcare workforce now can help to mitigate disruptions and ensure equitable adaptation to an AI-augmented healthcare future. However, it is critical to acknowledge that assigning humans to supervise LLMs through constant vigilance is suboptimal. Research from other high-risk industries shows that humans are poor at sustained monitoring, which can increase rather than reduce errors92. The design of AI–clinician partnerships must go beyond passive oversight to ensure safety and reliability93.
Improving AI literacy among the general public is equally essential for democratizing access to healthcare information. Public awareness of the strengths and limitations of LLMs, along with recognition of potential risks such as privacy breaches and biases in AI-generated content, can foster effective and appropriate LLM use and build justified trust94. Moreover, prompt phrasing influences LLM responses95, yet crafting effective prompts can be challenging for lay users, especially non-native English speakers. Public seminars or workshops that introduce basic prompting strategies for common health-related enquiries may help to support safer and more productive daily use.
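As a concrete example of the kind of template such workshops might teach, the sketch below encodes three basic strategies: state the context, request plain language and ask the model to flag uncertainty. The wording is an assumption for illustration, not a validated clinical template.

```python
# A basic health-query prompt template a workshop might teach lay users.
HEALTH_PROMPT = (
    "Context: I am asking for general health information, not a diagnosis.\n"
    "Question: {question}\n"
    "Please answer in plain language, list any warning signs that mean I "
    "should see a health worker, and say clearly if you are unsure."
)

print(HEALTH_PROMPT.format(
    question="What can I do at home for a child's mild fever?"))
```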
The long-term integration of LLMs in healthcare systems also requires the redesign of education and training programmes for future healthcare professionals, researchers and technology developers. AI research today is largely driven by contributions from high-income countries96,97, raising concerns that the resulting innovations may be better suited to those contexts. Although AI has not yet been widely adopted to reshape medical curricula, some institutions (primarily in high-income countries) have begun integrating AI into medical education to prepare students for its growing role in healthcare. Some examples include incorporating AI training into the general curriculum to help students to critically assess AI tools and offering graduate-level courses and programmes focused on health AI98,99. In Europe, the Sustainable Healthcare with Digital Health Data Competence initiative supports the development of foundational competencies for working effectively with AI in clinical care, including digital and data-driven skills100.
Although reforming medical education is challenging, particularly in resource-constrained settings, there are promising starting points that can be further expanded. For instance, Data Science Africa builds grassroots capacity through workshops and summer schools focused on machine learning and AI (https://www.datascienceafrica.org/about-2). These efforts can be strengthened through additional resources and global collaboration. The AI4D (AI for Development) initiative is one such example, supporting AI research and innovation in areas such as public health, responsible AI use and context-appropriate technologies across regions including Africa and Asia (https://www.ai4d.ai/about). Another example is the African Master’s in Machine Intelligence, a 1-year, fully funded programme launched in 2019 to provide top-tier AI training in Africa and build a strong ecosystem of socially engaged AI practitioners (https://aims.edu.gh/african-masters-in-machine-learning). These efforts offer adaptable models that can help LMICs to strengthen their local capacity for AI integration in health and beyond.
Global generative AI and LLM regulation and governance
Particularly in LMICs, inadequate regulatory capacity raises concerns about relying on AI approvals from the USA or the European Union, as the standards and data used may not be directly applicable. This contextual bias can lead to algorithms recommending treatments that are unsuitable or not cost-effective for low-resource settings101. The lack of appropriate regulatory oversight in LMICs may allow companies to market AI-based health solutions that do not meet regulatory standards in high-income countries. Some argue that, given the limited healthcare access, lowering quality and safety requirements could be justified if imperfect tools still represent an improvement in LMIC contexts102. However, without sufficient oversight, such pragmatic approaches risk institutionalizing regulatory double standards that undermine the rights of LMIC populations to safe and equitable healthcare. Repeated testing has shown gender and racial biases in LLMs, which may not be evident in small-scale pilot trials but emerge in real-world settings. This necessitates robust monitoring to prevent harm and ensure equitable healthcare implementation.
AI governance frameworks are slowly taking shape, with an increasing number of nations across Africa, South Asia and South America introducing national AI strategies, while several others are currently drafting guidance103,104. The African Union is leading the African Medicines Regulatory Harmonization initiative, which aims to harmonize regulatory approval for medical devices, including AI-based tools, across the continent105. At the global level, the United Nations Educational, Scientific and Cultural Organization’s recommendation on the ethics of AI and WHO’s ethical guidance on health AI and large multimodal models set out normative principles for governance. However, these frameworks tend to emphasize high-level strategic goals and ethical principles while typically lacking binding legal mechanisms or technical details for enforcement. This lack of a clear regulatory pathway may hinder the development and deployment of health AI in two opposite yet equally harmful ways. On the one hand, it may inadvertently allow industry, nongovernmental organizations and private funders to drive commercial agendas rapidly without adequate safety guardrails. On the other hand, overly stringent regulatory frameworks may pose barriers to the meaningful implementation of LLMs, particularly where viable alternatives are lacking. Regulatory requirements developed by high-income countries require adaptation before being imposed on LMICs. For example, strict data privacy frameworks such as the GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act) are poorly applicable to data-scarce settings. Addressing this tension should be guided by local governments, which are best positioned to balance regulatory rigour with contextual needs.
It is worth noting that there is a trend towards the development of more formal enforcement infrastructure. In Africa, a collaborative effort to harmonize regulatory approval for medical devices is underway, and an extension to the African Union Model Law, which includes an annex on AI, is expected in late 2025. The recent WHO publication on the Global Benchmarking Tool for medical devices106 provides a robust framework for the maturity assessment of national regulatory authorities. Yet, there is not a single maturity level 3 (that is, able to license for export) medical device (including software or AI as a medical device) regulator on the African continent. That said, Ghana has begun registering devices for local use, and other countries, such as South Africa, have recently announced their intention to regulate AI-based medical devices.
Conclusion
LLMs offer an unprecedented opportunity to influence and possibly even transform global health, particularly in LMICs, by addressing fundamental global health challenges, closing healthcare access gaps, empowering frontline primary and community healthcare, and enabling scalable and cost-effective innovations. In this Perspective, we frame these opportunities and risks through the lens of the SDGs, providing a structured link between LLM development and global health priorities. However, fully realizing this promise requires joint efforts led by a broad range of stakeholders in global health. International organizations and global health agencies should coordinate proactive policies and promote shared standards. National governments need to establish context-sensitive guardrails and invest in local healthcare capacity. Funding bodies and academic–clinical partnerships should prioritize the generation of real-world evidence, particularly in LMICs, to guide the safe and effective deployment of LLMs. The path forward requires clarity, and the global health community must work collectively and with some urgency for LLMs to become not just a tool with transformative potential in clinical medicine but also a trusted partner in achieving more equitable, effective and resilient healthcare systems worldwide.
References
Tarabanis, C. et al. Performance of publicly available large language models on internal medicine board-style questions. PLOS Digit. Health 3, e0000604 (2024).
Tierney, A. A. et al. Ambient artificial intelligence scribes: learnings after 1 year and over 2.5 million uses. NEJM Catal. Innov. Care Deliv. 6, CAT.25.0040 (2025).
Rao, A. S. et al. Synthetic medical education in dermatology leveraging generative artificial intelligence. NPJ Digit. Med. 8, 247 (2025).
Yang, R. et al. Retrieval-augmented generation for generative artificial intelligence in health care. NPJ Health Syst. 2, 2 (2025).
Omiye, J. A., Gui, H., Rezaei, S. J., Zou, J. & Daneshjou, R. Large language models in medicine: the potentials and pitfalls: a narrative review. Ann. Intern. Med. 177, 210–220 (2024).
Khan, M. S., Umer, H. & Faruqe, F. Artificial intelligence for low income countries. Humanit. Soc. Sci. Commun. 11, 1422 (2024).
Ong, J. C. L. et al. Artificial intelligence, ChatGPT, and other large language models for social determinants of health: current state and future directions. Cell Rep. Med. 5, 101356 (2024).
Akbarialiabad, H. et al. The utility of generative AI in advancing global health. NEJM AI 2, AIp2400875 (2025).
Vaswani, A. et al. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 6000–6010 (Curran Associates, 2017).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems 1800, 24824–24837 (Curran Associates, 2022).
Raiaan, M. A. K. et al. A review on large language models: architectures, applications, taxonomies, open issues and challenges. IEEE Access 12, 26839–26874 (2024).
Wu, K. et al. An automated framework for assessing how well LLMs cite relevant medical references. Nat. Commun. 16, 3615 (2025).
Liu, Y. & Wang, H. Who on Earth is using generative AI? World Dev. 199, 107260 (2026).
Gibney, E. Scientists flock to DeepSeek: how they’re using the blockbuster AI model. Nature https://doi.org/10.1038/d41586-025-00275-0 (2025).
Sandmann, S. et al. Benchmark evaluation of DeepSeek large language models in clinical decision-making. Nat. Med. https://doi.org/10.1038/s41591-025-03727-2 (2025).
Gibney, E. China’s cheap, open AI model DeepSeek thrills scientists. Nature 638, 13–14 (2025).
Ritoré, Á. et al. The role of open access data in democratizing healthcare AI: a pathway to research enhancement, patient well-being and treatment equity in Andalusia, Spain. PLOS Digit. Health 3, e0000599 (2024).
Maffulli, S. ‘Open source’ AI isn’t truly open—here’s how researchers can reclaim the term. Nature 640, 9 (2025).
Groeneveld, D. et al. OLMo: accelerating the science of language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Ku, L.-W. et al.) 15789–15809 (Association for Computational Linguistics, 2024).
Smithwick, J. et al. “Community health workers bring value and deserve to be valued too:” key considerations in improving CHW career advancement opportunities. Front. Public Health 11, 1036481 (2023).
Stanford Center for Digital Health. Generative AI for Health in Low & Middle Income Countries (Stanford Center for Digital Health, 2025).
Ochieng, S. et al. Exploring the implementation of an SMS-based digital health tool on maternal and infant health in informal settlements. BMC Pregnancy Childbirth 24, 222 (2024).
Liu, R. et al. AIDMAN: an AI-based object detection system for malaria diagnosis from smartphone thin-blood-smear image. Patterns 4, 100806 (2023).
Li, J. et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 30, 2886–2896 (2024).
Huang, M. et al. Primary care quality and provider disparities in China: a standardized-patient-based study. Lancet Reg. Health West. Pac. 50, 101161 (2024).
Yang, J. et al. Generalizability assessment of AI models across hospitals in a low–middle and high income country. Nat. Commun. 15, 8270 (2024).
Liu, X., Alderman, J. & Laws, E. A global health data divide. NEJM AI 1, AIe2400388 (2024).
Alderman, J. E. et al. Tackling algorithmic bias and promoting transparency in health datasets: the STANDING Together consensus recommendations. Lancet Digit. Health 7, e64–e88 (2025).
Olatunji, T. et al. AfriMed-QA: a pan-African, multi-specialty, medical question-answering benchmark dataset. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 1948–1973 (Association for Computational Linguistics, 2025).
Adams, R. et al. Mapping the potentials and limitations of using generative AI technologies to address socio-economic challenges in LMICs. Preprint at VeriXiv https://verixiv.org/articles/2-57/v1 (2025).
Vinuesa, R. et al. The role of artificial intelligence in achieving the Sustainable Development Goals. Nat. Commun. 11, 233 (2020).
Gunasekeran, D. V. et al. National use of artificial intelligence for eye screening in Singapore. NEJM AI 1, AIcs2400404 (2024).
Thakur, R. Unraveling the brain drain dilemma: analysis among skilled information technology professionals of Nepal. Preprint at SSRN https://doi.org/10.2139/ssrn.4778684 (2024).
Ahmed, M. I. et al. A systematic review of the barriers to the implementation of artificial intelligence in healthcare. Cureus 15, e46454 (2023).
Eisinger-Mathason, T. S. K. et al. Data linkage multiplies research insights across diverse healthcare sectors. Commun. Med. 5, 58 (2025).
Woldemariam, M. T. & Jimma, W. Adoption of electronic health record systems to enhance the quality of healthcare in low-income countries: a systematic review. BMJ Health Care Inform. 30, e100704 (2023).
Ullah, E., Parwani, A., Baig, M. M. & Singh, R. Challenges and barriers of using large language models (LLM) such as ChatGPT for diagnostic medicine with a focus on digital pathology—a recent scoping review. Diagn. Pathol. 19, 43 (2024).
Park, P. S., Schoenegger, P. & Zhu, C. Diminished diversity-of-thought in a standard large language model. Behav. Res. Methods 56, 5754–5770 (2024).
Yang, Y., Liu, X., Jin, Q., Huang, F. & Lu, Z. Unmasking and quantifying racial bias of large language models in medical report generation. Commun. Med. 4, 176 (2024).
Ahia, O. et al. Do all languages cost the same? Tokenization in the era of commercial language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.13707 (2023).
Alhanai, T. et al. Bridging the gap: enhancing LLM performance for low-resource African languages with new benchmarks, fine-tuning, and cultural adjustments. In The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25) 27802–27812 (AAAI, 2025).
Han, T. et al. Medical large language models are susceptible to targeted misinformation attacks. NPJ Digit. Med. 7, 288 (2024).
Dong, Y. et al. Position: building guardrails for large language models requires systematic design. In Proceedings of the 41st International Conference on Machine Learning 451, https://openreview.net/forum?id=JvMLkGF2Ms (JMLR.org, 2024).
Modi, N. D. et al. Assessing the system-instruction vulnerabilities of large language models to malicious conversion into health disinformation chatbots. Ann. Intern. Med. 178, 1172–1180 (2025).
Hartman, V. et al. Developing and evaluating large language model-generated emergency medicine handoff notes. JAMA Netw. Open 7, e2448723 (2024).
Peng, Y. et al. From GPT to DeepSeek: significant gaps remain in realizing AI in healthcare. J. Biomed. Inform. 163, 104791 (2025).
Zeng, D., Qin, Y., Sheng, B. & Wong, T. Y. DeepSeek’s “low-cost” adoption across China’s hospital systems: too fast, too soon? JAMA https://doi.org/10.1001/jama.2025.6571 (2025).
Vered, M., Livni, T., Howe, P. D. L., Miller, T. & Sonenberg, L. The effects of explanations on automation bias. Artif. Intell. 322, 103952 (2023).
WHO. Leading the future of global health with responsible artificial intelligence. World Health Organization https://www.who.int/publications/m/item/leading-the-future-of-global-health-with-responsible-artificial-intelligence (2024).
McKinsey & Company. The economic potential of generative AI: the next productivity frontier. McKinsey & Company https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#introduction (2023).
Demombynes, G., Langbein, J. & Weber, M. The Exposure of Workers to Artificial Intelligence in Low- and Middle-Income Countries. Policy Research Working Paper (World Bank Group, 2025).
Ernst, E., Berg, J. & Moore, P. V. Editorial: Artificial intelligence and the future of work: humans in control. Front. Artif. Intell. 7, 1378893 (2024).
Gage, A. D. et al. Disparities in telemedicine use and payment policies in the United States between 2019 and 2023. Commun. Med. 5, 52 (2025).
Mahmoud, K., Jaramillo, C. & Barteit, S. Telemedicine in low- and middle-income countries during the COVID-19 pandemic: a scoping review. Front. Public Health 10, 914423 (2022).
Ye, J., He, L. & Beestrum, M. Implications for implementation and adoption of telehealth in developing countries: a systematic review of China’s practices and experiences. NPJ Digit. Med. 6, 174 (2023).
Kleinig, O. et al. Environmental impact of large language models in medicine. Intern. Med. J. 54, 2083–2086 (2024).
Luccioni, S., Jernite, Y. & Strubell, E. Power hungry processing: watts driving the cost of AI deployment? In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency 85–99 (Association for Computing Machinery, 2024).
Chauhan, D., Bahad, P. & Jain, J. K. Sustainable AI: environmental implications, challenges, and opportunities. In Explainable AI (XAI) for Sustainable Development (eds Lakshmi, D. et al.) 1–15 (Association for Computing Machinery, 2024).
Alami, H. et al. Artificial intelligence in health care: laying the foundation for responsible, sustainable, and inclusive innovation in low- and middle-income countries. Global Health 16, 52 (2020).
Jonnagaddala, J. & Wong, Z. S.-Y. Privacy preserving strategies for electronic health records in the era of large language models. NPJ Digit. Med. 8, 34 (2025).
Greshake, K. et al. Not what you’ve signed up for: compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security 79–90 (Association for Computing Machinery, 2023).
Normile, D. Chinese firm’s large language model makes a splash. Science 387, 238 (2025).
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Tordjman, M. et al. Comparative benchmarking of the DeepSeek large language model on medical tasks and clinical reasoning. Nat. Med. 31, 2550–2555 (2025).
Mikhail, D. et al. Performance of DeepSeek-R1 in ophthalmology: an evaluation of clinical decision-making and cost-effectiveness. Br. J. Ophthalmol. 109, 976–981 (2025).
Kim, S. H. et al. Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports. NPJ Digit. Med. 8, 97 (2025).
Wu, Y. et al. An eyecare foundation model for clinical assistance: a randomized controlled trial. Nat. Med. https://doi.org/10.1038/s41591-025-03900-7 (2025).
Yuan, M. et al. Large-scale local deployment of DeepSeek-R1 in pilot hospitals in China: a nationwide cross-sectional survey. Preprint at medRxiv https://doi.org/10.1101/2025.05.15.25326843 (2025).
Lin, L., Zhou, X., Yang, K. & Chen, X. DeepSeek powered solid dosage formulation design and development. Preprint at arXiv https://doi.org/10.48550/arXiv.2503.11068 (2025).
Bordukova, M., Makarov, N., Rodriguez-Esteban, R., Schmich, F. & Menden, M. P. Generative artificial intelligence empowers digital twins in drug discovery and clinical trials. Expert Opin. Drug Discov. 19, 33–42 (2024).
Gangwal, A. & Lavecchia, A. Unleashing the power of generative AI in drug discovery. Drug Discov. Today 29, 103992 (2024).
Namba-Nzanguim, C. T. Artificial intelligence for antiviral drug discovery in low resourced settings: a perspective. Front. Drug Discov. 2, 1013285 (2022).
Nievas, M., Basu, A., Wang, Y. & Singh, H. Distilling large language models for matching patients to clinical trials. J. Am. Med. Inform. Assoc. 31, 1953–1963 (2024).
Chakraborty, C., Bhattacharya, M., Lee, S.-S., Wen, Z.-H. & Lo, Y.-H. The changing scenario of drug discovery using AI to deep learning: recent advancement, success stories, collaborations, and challenges. Mol. Ther. Nucleic Acids 35, 102295 (2024).
Nishan, M. D. N. H. AI-powered drug discovery for neglected diseases: accelerating public health solutions in the developing world. J. Glob. Health 15, 03002 (2025).
Eisenstein, M. Overlooked and underfunded: neglected diseases exert a toll. Nature 598, S20–S22 (2021).
Omar, M., Nadkarni, G. N., Klang, E. & Glicksberg, B. S. Large language models in medicine: a review of current clinical trials across healthcare applications. PLOS Digit. Health 3, e0000662 (2024).
PATH. PATH launches clinical trial on the use of artificial intelligence in primary health care. PATH https://www.path.org/our-impact/media-center/path-launches-artifical-intelligence-clinical-trial (2025).
Agweyu, A. et al. Large language model-assisted clinicians versus unassisted clinicians in clinical decision making: a multi-centre randomized controlled trial in Nairobi, Kenya. Preprint at Zenodo https://doi.org/10.5281/zenodo.15266188 (2025).
Omar, M. et al. Evaluating and addressing demographic disparities in medical large language models: a systematic review. Int. J. Equity Health 24, 57 (2025).
Beste, J. et al. Working towards a decolonized, longitudinal, and equitable global health training and partnerships program. J. Med. Educ. Curric. Dev. 12, 23821205251324297 (2025).
Longhurst, C. A., Singh, K., Chopra, A., Atreja, A. & Brownstein, J. S. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI 1, AIp2400223 (2024).
Akbarialiabad, H. & Sewankambo, N. K. Centres of excellence in AI for global health equity—a strategic vision for LMICs. Nature 625, 450 (2024).
WHO. Global Initiative on AI for Health. World Health Organization https://www.who.int/initiatives/global-initiative-on-ai-for-health (2025).
Reid, M. J. A. et al. Announcing the Lancet Global Health Commission on artificial intelligence (AI) and HIV: leveraging AI for equitable and sustainable impact. Lancet Glob. Health 13, e611–e612 (2025).
Cheney, C. Exclusive: donors commit $10M to include African languages in AI models. Devex https://www.devex.com/news/sponsored/exclusive-donors-commit-10m-to-include-african-languages-in-ai-models-109044 (2025).
Gates Foundation. AI equity: ensuring access to AI for all. Gates Foundation https://www.gatesfoundation.org/ideas/science-innovation-technology/artificial-intelligence (2025).
Ong, Q. C., Ang, C.-S., Lai, N. M., Atun, R. & Car, J. Differences in expert perspectives on AI training in medical education: secondary analysis of a multinational Delphi study. J. Med. Internet Res. 27, e72186 (2025).
Ministry of Health of Lao People’s Democratic Republic. Lao People’s Democratic Republic: Digital Health Strategy, 2023–2027 (Ministry of Health of Lao People’s Democratic Republic, 2023).
Fondation Pierre Fabre. Training digital healthcare professionals in Africa: 6th class of graduates for the eHealth inter-university diploma. Fondation Pierre Fabre https://www.fondationpierrefabre.org/en/current-initiatives/training-digital-healthcare-professionals-in-africa-6th-class-of-graduates-for-the-ehealth-inter-university-diploma (2024).
Edzie, E. K. M. et al. Perspectives of radiologists in Ghana about the emerging role of artificial intelligence in radiology. Heliyon 9, e15558 (2023).
Stewart, J. Tesla’s autopilot was involved in another deadly car crash. Wired https://www.wired.com/story/tesla-autopilot-self-driving-crash-california/ (2018).
Adler-Milstein, J., Redelmeier, D. A. & Wachter, R. M. The limits of clinician vigilance as an AI safety bulwark. JAMA 331, 1173–1174 (2024).
Long, D. & Magerko, B. What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems 1–16 (Association for Computing Machinery, 2020); https://doi.org/10.1145/3313831.3376727
Wang, L. et al. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit. Med. 7, 41 (2024).
Celi, L. A. et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—a global review. PLOS Digit. Health 1, e0000022 (2022).
Yang, R. et al. Disparities in clinical studies of AI enabled applications from a global perspective. NPJ Digit. Med. 7, 209 (2024).
Gehrman, E. How generative AI is transforming medical education. Harvard Medicine Magazine https://magazine.hms.harvard.edu/articles/how-generative-ai-transforming-medical-education (2024).
College of Medicine Rockford. UICOMR students participate in AI curriculum. College of Medicine Rockford, University of Illinois College of Medicine https://rockford.medicine.uic.edu/news-stories/uicomr-students-participate-in-ai-curriculum (2024).
SUSA. Sustainable healthcare with digital health data competence. University of Oulu https://www.oulu.fi/en/projects/sustainable-healthcare-digital-health-data-competence (2025).
Zhou, K. & Gattinger, G. The evolving regulatory paradigm of AI in MedTech: a review of perspectives and where we are today. Ther. Innov. Regul. Sci. 58, 456–464 (2024).
WHO. Ethics and governance of artificial intelligence for health: WHO guidance. Executive summary. World Health Organization https://www.who.int/publications/i/item/9789240037403 (2021).
Digital Watch Observatory. Kenya launches project to develop National AI Strategy in collaboration with German and EU partners. Digital Watch Observatory https://dig.watch/updates/kenya-launches-project-to-develop-national-ai-strategy-in-collaboration-with-german-and-eu-partners (2024).
Luminate. Partnerships will ensure inclusivity for Nigeria’s AI strategy. Luminate https://www.luminategroup.com/posts/news/partnerships-nigeria-ai-strategy (2024).
Wairagkar, N. et al. The African Medicines Agency—a potential gamechanger that requires strategic focus. PLOS Glob. Public Health 5, e0004276 (2025).
WHO. WHO Global Benchmarking Tool + Medical Devices (GBT + medical devices) for evaluation of national regulatory systems of medical devices including in-vitro diagnostics. World Health Organization https://www.who.int/tools/global-benchmarking-tools/evaluation-of-national-regulatory-systems-of-medical-devices-in-vitro-diagnostics (2024).
Martinson, S., Kong, L., Kim, C. W., Taneja, A. & Tambe, M. LLM-based agent simulation for maternal health interventions: uncertainty estimation and decision-focused evaluation. Preprint at arXiv https://doi.org/10.48550/arXiv.2503.22719 (2025).
Gates Foundation. Large language model (LLM)-based conversational agent for women from prenatal to postnatal care. Gates Foundation: Global Grand Challenges https://gcgh.grandchallenges.org/grant/large-language-model-llm-based-conversational-agent-women-prenatal-postnatal-care (2024).
Gumilar, K. E. et al. Artificial intelligence–large language models (AI–LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice. Comput. Struct. Biotechnol. J. 27, 1140–1147 (2025).
Broad, A. et al. Factors associated with abusive head trauma in young children presenting to emergency medical services using a large language model. Prehosp. Emerg. Care 29, 227–237 (2025).
Liu, W. et al. Bridging the gap in neonatal care: evaluating AI chatbots for chronic neonatal lung disease and home oxygen therapy management. Pediatr. Pulmonol. 60, e71020 (2025).
Levin, C., Kagan, T., Rosen, S. & Saban, M. An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support. Int. J. Nurs. Stud. 155, 104771 (2024).
Yang, J. et al. RDmaster: a novel phenotype-oriented dialogue system supporting differential diagnosis of rare disease. Comput. Biol. Med. 169, 107924 (2024).
Beam, K. et al. Performance of a large language model on practice questions for the neonatal board examination. JAMA Pediatr. 177, 977–979 (2023).
Li, Y. et al. Exploring the performance of large language models on hepatitis B infection-related questions: a comparative study. World J. Gastroenterol. 31, 101092 (2025).
Wang, Y., Chen, Y. & Sheng, J. Assessing ChatGPT as a medical consultation assistant for chronic hepatitis B: cross-language study of English and Chinese. JMIR Med. Inform. 12, e56426 (2024).
Wu, C. et al. The large language model diagnoses tuberculous pleural effusion in pleural effusion patients through clinical feature landscapes. Respir. Res. 26, 52 (2025).
Busch, D. et al. A blueprint for large language model-augmented telehealth for HIV mitigation in Indonesia: a scoping review of a novel therapeutic modality. Health Informatics J. 31, 14604582251315595 (2025).
De Vito, A. et al. Assessing ChatGPT’s potential in HIV prevention communication: a comprehensive evaluation of accuracy, completeness, and inclusivity. AIDS Behav. 28, 2746–2754 (2024).
Hua, Y. et al. A scoping review of large language models for generative tasks in mental health care. NPJ Digit. Med. 8, 230 (2025).
Akdogan, O. et al. Effect of a ChatGPT-based digital counseling intervention on anxiety and depression in patients with cancer: a prospective, randomized trial. Eur. J. Cancer 221, 115408 (2025).
Lauderdale, S. A. et al. Effectiveness of generative AI-large language models’ recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model. Front. Psychiatry 16, 1544951 (2025).
Lara-Abelenda, F. J. et al. Personalized glucose forecasting for people with type 1 diabetes using large language models. Comput. Methods Programs Biomed. 265, 108737 (2025).
Giorgi, S. et al. Evaluating generative AI responses to real-world drug-related questions. Psychiatry Res. 339, 116058 (2024).
Russell, A. M., Acuff, S. F., Kelly, J. F., Allem, J.-P. & Bergman, B. G. ChatGPT-4: alcohol use disorder responses. Addiction 119, 2205–2210 (2024).
Gabriel, R. A., Park, B. H., Hsu, C.-N. & Macias, A. A. A review of leveraging artificial intelligence to predict persistent postoperative opioid use and opioid use disorder and its ethical considerations. Curr. Pain Headache Rep. 29, 30 (2025).
Zhang, K. et al. Integrating visual large language model and reasoning chain for driver behavior analysis and risk assessment. Accid. Anal. Prev. 198, 107497 (2024).
Burns, C. et al. Use of generative AI for improving health literacy in reproductive health: case study. JMIR Form. Res. 8, e59434 (2024).
Swisher, A. R. et al. Enhancing health literacy: evaluating the readability of patient handouts revised by ChatGPT’s large language model. Otolaryngol. Head Neck Surg. https://doi.org/10.1002/ohn.927 (2024).
Oniani, D. et al. Emerging opportunities of using large language models for translation between drug molecules and indications. Sci. Rep. 14, 10738 (2024).
Li, S. et al. CodonBERT large language model for mRNA vaccines. Genome Res. 34, 1027–1035 (2024).
Consens, M. E., Li, B., Poetsch, A. R. & Gilbert, S. Genomic language models could transform medicine but not yet. NPJ Digit. Med. 8, 212 (2025).
Ng, F. Y. C. et al. Artificial intelligence education: an evidence-based medicine approach for consumers, translators, and developers. Cell Rep. Med. 4, 101230 (2023).
Wang, X. et al. ChatGPT: promise and challenges for deployment in low- and middle-income countries. Lancet Reg. Health West. Pac. 41, 100905 (2023).
van Hoek, A. J. et al. Importance of investing time and money in integrating large language model-based agents into outbreak analytics pipelines. Lancet Microbe 5, 100881 (2024).
Zhu, K. et al. Evaluating the accuracy of responses by large language models for information on disease epidemiology. Pharmacoepidemiol. Drug Saf. 34, e70111 (2025).
Acknowledgements
D.S.B. receives funding support from the American Cancer Society and the American Society for Radiation Oncology (ASTRO-CSDG-24-1244514-01-CTPS grant; https://doi.org/10.53354/ACS.ASTRO-CSDG-24-1244514-01-CTPS.pc.gr.222210), a Patient-Centered Outcomes Research Institute (PCORI) Project Program Award (ME-2024C2-37484), and the Woods Foundation. All statements in this article, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of PCORI, its Board of Governors or its Methodology Committee. B.S. is supported by the National Natural Science Foundation of China (grant nos. T2525004 and 62272298). This work was supported by the Duke-NUS Signature Research Programmes, funded by the Ministry of Health of Singapore. The funder had no role in study design or conduct, or in data analysis and interpretation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Singapore Ministry of Health.
Ethics declarations
Competing interests
D.S.B. is an associate editor of JCO Clinical Cancer Informatics, an associate editor of the Radiation Oncology section of HemOnc.Org and a member of the scientific advisory board of Mercurial AI; none of these roles is related to the submitted work. The other authors declare no competing interests.
Peer review
Peer review information
Nature Health thanks Sandra Barteit and Seyi Soremekun for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Health team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Table 1: SDG 3-specific targets and search criteria in PubMed for examples and evidence.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ong, J.C.L., Ning, Y., Yang, R. et al. Large language models in global health. Nat. Health 1, 35–47 (2026). https://doi.org/10.1038/s44360-025-00024-7