npj Digital Medicine
Enhanced language models for predicting and understanding HIV care disengagement: a case study in Tanzania
  • Article
  • Open access
  • Published: 21 January 2026

  • Waverly Wei¹ (equal contribution),
  • Junzhe Shao² (equal contribution),
  • Rita Qiuran Lyu² (equal contribution),
  • Rebecca Hemono²,
  • Xinwei Ma³,
  • Joseph Giorgio⁴,
  • Zeyu Zheng⁵,
  • Feng Ji⁶,
  • Xiaoya Zhang⁷,
  • Emmanuel Katabaro⁸,
  • Matilda Mlowe⁸,
  • Amon Sabasaba⁸,
  • Caroline Lister⁸,
  • Siraji Shabani⁹,
  • Prosper Njau⁹,
  • Sandra I. McCoy² &
  • Jingshen Wang²

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Disease prevention
  • Epidemiology
  • HIV infections

Abstract

Sustained engagement in HIV care and adherence to antiretroviral therapy (ART) are crucial for meeting the UNAIDS “95-95-95” targets. Disengagement from care remains a significant issue, especially in sub-Saharan Africa. Traditional machine learning (ML) models have had moderate success in predicting disengagement, which enables early intervention. We developed an enhanced large language model (LLM), fine-tuned on electronic medical records (EMRs), to predict which individuals are at risk of disengaging from HIV care in Tanzania. Using 4.8 million EMR records from the National HIV Care and Treatment Program (2018–2023), we identified risks of ART non-adherence, non-suppressed viral load, and loss to follow-up. Our enhanced LLM may outperform traditional ML models and zero-shot LLMs. HIV physicians in Tanzania evaluated the model’s predictions and justifications: 65% aligned with expert assessments, and 92.3% of the aligned cases were considered clinically relevant. This model can support data-driven decisions and may improve patient outcomes and reduce HIV transmission.
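
As a rough reading of the physician evaluation figures, the two percentages can be chained to estimate what share of all evaluated cases were both aligned with expert assessment and judged clinically relevant. The sketch below assumes, as the abstract states, that the 92.3% is a share of the aligned cases only; the function and variable names are illustrative and do not come from the paper.

```python
def joint_share(aligned: float, relevant_given_aligned: float) -> float:
    """Share of all evaluated cases that are both aligned and clinically relevant.

    aligned: fraction of cases where the model matched the expert assessment.
    relevant_given_aligned: fraction of those aligned cases judged clinically relevant.
    """
    return aligned * relevant_given_aligned


# Figures quoted in the abstract.
aligned = 0.65
relevant_given_aligned = 0.923

share = joint_share(aligned, relevant_given_aligned)
print(f"Aligned and clinically relevant: {share:.1%} of evaluated cases")
```

Under that assumption, roughly 60% of the evaluated cases were both aligned and clinically relevant. The abstract does not say how the non-aligned cases were rated, so this figure says nothing about them.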

Data availability

The electronic medical records data from Tanzania’s National HIV Care and Treatment Program used in this study contains protected health information and, therefore, cannot be shared publicly.

Acknowledgements

The study was supported by a grant from the US National Institutes of Health (NIH): NIH 1R01MH125746. We thank the review team for their valuable suggestions and comments, which significantly improved our paper.

Author information

Author notes
  1. These authors contributed equally: Waverly Wei, Junzhe Shao, Rita Qiuran Lyu.

Authors and Affiliations

  1. Marshall School of Business, University of Southern California, Los Angeles, CA, USA

    Waverly Wei

  2. School of Public Health, University of California, Berkeley, CA, USA

    Junzhe Shao, Rita Qiuran Lyu, Rebecca Hemono, Sandra I. McCoy & Jingshen Wang

  3. Department of Economics, University of California, San Diego, CA, USA

    Xinwei Ma

  4. Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA

    Joseph Giorgio

  5. College of Engineering, University of California, Berkeley, CA, USA

    Zeyu Zheng

  6. Department of Applied Psychology, University of Toronto, Toronto, ON, Canada

    Feng Ji

  7. Department of Family, Youth and Community Sciences, University of Florida, Gainesville, FL, USA

    Xiaoya Zhang

  8. Health for a Prosperous Nation, Dar es Salaam, Tanzania

    Emmanuel Katabaro, Matilda Mlowe, Amon Sabasaba & Caroline Lister

  9. Ministry of Health, Dodoma, Tanzania

    Siraji Shabani & Prosper Njau

Contributions

W.W., J.S., and R.Q.L. conducted the model training and data analysis and organized the study results, including figures and tables. X.M., J.G., Z.Z., X.Z., R.H., and F.J. provided guidance on methodology development. E.K., M.M., A.S., C.L., S.S., P.N., and S.I.M. facilitated data access, offered administrative support, and contributed domain knowledge and insights during manuscript development. J.W. drafted the manuscript, guided the study design, and supervised the project. All authors contributed to study results interpretation and major revision of the manuscript. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Jingshen Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wei, W., Shao, J., Lyu, R.Q. et al. Enhanced language models for predicting and understanding HIV care disengagement: a case study in Tanzania. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02349-3

  • Received: 25 February 2025

  • Accepted: 07 January 2026

  • Published: 21 January 2026

  • DOI: https://doi.org/10.1038/s41746-026-02349-3

Associated content

Collection

AI for Population Medicine and Public Health
