  • Article
Access to care affects electronic health record reliability and AI-driven disease prediction

Abstract

Despite well-documented healthcare access disparities, their impact on electronic health record reliability and resulting clinical prediction models remains poorly understood. Here, analysing 205,186 participants from the All of Us Research Program, we found that participants with cost-constrained or delayed care had worse electronic health record reliability for 73% of examined medical conditions as measured by participant self-reported conditions, driven in part by lower visit rates. In a type 2 diabetes prediction task, including participant self-reported conditions significantly improved the predictive performance for participants with lower access to care and improved targeting of low-access patients who would go on to develop type 2 diabetes. This study demonstrates that healthcare access systematically affects both data quality and clinical prediction performance, suggesting that improving health equity requires addressing both data collection biases and algorithmic limitations. Our findings provide an empirical foundation for developing clinical prediction systems that work effectively for all patients regardless of barriers to access.
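One way to read "electronic health record reliability ... as measured by participant self-reported conditions" is as the share of self-reported conditions that also appear in the participant's EHR, compared across access groups. The sketch below assumes that reading; the abstract does not give the exact definition, and the function name, group labels and toy records are hypothetical illustrations rather than the authors' code or data.

```python
from collections import defaultdict


def reliability_by_group(records):
    """Share of self-reported conditions that are also found in the EHR.

    `records` is an iterable of (access_group, self_reported, in_ehr)
    tuples, one per participant-condition pair. This layout is an
    assumption made for illustration only.
    """
    reported = defaultdict(int)   # self-reported conditions per group
    confirmed = defaultdict(int)  # of those, how many the EHR also records
    for group, self_reported, in_ehr in records:
        if not self_reported:
            continue  # only self-reported conditions enter the denominator
        reported[group] += 1
        if in_ehr:
            confirmed[group] += 1
    return {group: confirmed[group] / reported[group] for group in reported}


# Hypothetical toy records, not taken from the study.
toy = [
    ("standard", True, True),
    ("standard", True, True),
    ("standard", True, False),
    ("standard", False, True),   # not self-reported, so ignored
    ("low_access", True, True),
    ("low_access", True, False),
    ("low_access", True, False),
    ("low_access", True, False),
]
rates = reliability_by_group(toy)
print(rates)
```

Under this reading, a lower rate for a group means the EHR misses more of the conditions that participants in that group report, which is the direction the abstract describes for participants with cost-constrained or delayed care.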

Fig. 1: Study overview and data cohort.
Fig. 2: EHR reliability rate for conditions.
Fig. 3: Cumulative incidence of time to first visit.
Fig. 4: Diabetes task model performance.

Data availability

Owing to data use agreement restrictions to protect health information, the data are available only to registered users of the All of Us Researcher Workbench.

Code availability

Relevant code is hosted on the All of Us platform; the program files are also publicly available in a GitHub repository at https://github.com/zinka88/access.

Acknowledgements

We gratefully acknowledge the All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study. This study used data from the All of Us Research Program’s controlled tier dataset v8, available to authorized users on the Researcher Workbench. I.Y.C., H.L. and A.Z. were funded by a Google Research Scholar award for this work. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. I.Y.C. receives additional support from an Apple Machine Learning Faculty Research Award.

Author information

Contributions

A.Z. and I.Y.C. conceived and designed the study. A.Z. and H.L. analysed and interpreted the data. A.Z. wrote the paper. A.Z., H.L. and I.Y.C. edited the paper. All authors read and approved the final paper.

Corresponding author

Correspondence to Anna Zink.

Ethics declarations

Competing interests

I.Y.C. consults for Flatiron Health. A.Z. has equity in Knit Health Technologies Inc. H.L. declares no competing interests.

Peer review

Peer review information

Nature Health thanks Nicholas Conway and Christopher Sauer for their contribution to the peer review of this work. Primary Handling Editor: Ben Johnson, in collaboration with the Nature Health team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Share of Participants who Responded Yes to Access-Specific Questions.

Responses from the 305,857 participants who responded to the Health Care Access & Utilization survey. We define participants as having low access if they indicated that they had cost-constrained or delayed care for any of the listed reasons.

Extended Data Table 1 Sample Characteristics Compared to the U.S. Population
Extended Data Table 2 Age-Adjusted Self-Reported Condition Prevalence Rate
Extended Data Table 3 Diabetes Sample Characteristics
Extended Data Table 4 Comparing AUC Values for Standard Care versus Other Access Groups

Supplementary information

Supplementary Information

Supplementary Tables 1 and 2.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zink, A., Luan, H. & Chen, I.Y. Access to care affects electronic health record reliability and AI-driven disease prediction. Nat. Health (2026). https://doi.org/10.1038/s44360-026-00054-9

  • DOI: https://doi.org/10.1038/s44360-026-00054-9
