Abstract
Despite well-documented healthcare access disparities, their impact on electronic health record reliability and resulting clinical prediction models remains poorly understood. Here, analysing 205,186 participants from the All of Us Research Program, we found that participants with cost-constrained or delayed care had worse electronic health record reliability for 73% of examined medical conditions as measured by participant self-reported conditions, driven in part by lower visit rates. In a type 2 diabetes prediction task, including participant self-reported conditions significantly improved the predictive performance for participants with lower access to care and improved targeting of low-access patients who would go on to develop type 2 diabetes. This study demonstrates that healthcare access systematically affects both data quality and clinical prediction performance, suggesting that improving health equity requires addressing both data collection biases and algorithmic limitations. Our findings provide an empirical foundation for developing clinical prediction systems that work effectively for all patients regardless of barriers to access.
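As a rough illustration of the reliability comparison described above, the sketch below computes, for each access group, the share of self-reported conditions that also appear in the EHR. This is an assumption about how "reliability as measured by participant self-reported conditions" could be operationalized; the exact metric is not stated in this excerpt. The record fields (`low_access`, `self_reported`, `in_ehr`) and the toy data are hypothetical and are not drawn from All of Us.

```python
from collections import defaultdict


def ehr_capture_rate(records):
    """Share of self-reported conditions also recorded in the EHR, per access group.

    Each record is a hypothetical dict with boolean keys:
      'low_access'    - participant reported cost-constrained or delayed care
      'self_reported' - participant self-reported the condition
      'in_ehr'        - the condition appears in the participant's EHR
    """
    reported = defaultdict(int)
    captured = defaultdict(int)
    for r in records:
        if r["self_reported"]:
            reported[r["low_access"]] += 1
            if r["in_ehr"]:
                captured[r["low_access"]] += 1
    # Only groups with at least one self-reported condition are returned.
    return {group: captured[group] / reported[group] for group in reported}


# Toy, made-up records for a single condition (not study data).
toy = [
    {"low_access": True, "self_reported": True, "in_ehr": False},
    {"low_access": True, "self_reported": True, "in_ehr": True},
    {"low_access": False, "self_reported": True, "in_ehr": True},
    {"low_access": False, "self_reported": True, "in_ehr": True},
    {"low_access": False, "self_reported": False, "in_ehr": False},
]
rates = ehr_capture_rate(toy)
```

In this toy example the low-access group has a lower capture rate than the higher-access group, which is the direction of the gap the abstract reports for most conditions.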
Data availability
Owing to data use agreement restrictions to protect health information, the data are available only to registered users of the All of Us Researcher Workbench.
Code availability
Relevant code is hosted on the All of Us platform; the program files are also publicly available on GitHub at https://github.com/zinka88/access.
Acknowledgements
We gratefully acknowledge the All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study. This study used data from the All of Us Research Program’s controlled tier dataset v8, available to authorized users on the Researcher Workbench. I.Y.C., H.L. and A.Z. were funded by a Google Research Scholar award for this work. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. I.Y.C. receives additional support from an Apple Machine Learning Faculty Research Award.
Author information
Contributions
A.Z. and I.Y.C. conceived of the study and study design. A.Z. and H.L. analysed and interpreted the data. A.Z. wrote the paper. A.Z., H.L. and I.Y.C. edited the paper. All authors read and approved the final paper.
Ethics declarations
Competing interests
I.Y.C. consults for Flatiron Health. A.Z. has equity in Knit Health Technologies Inc. H.L. declares no competing interests.
Peer review
Peer review information
Nature Health thanks Nicholas Conway and Christopher Sauer for their contribution to the peer review of this work. Primary Handling Editor: Ben Johnson, in collaboration with the Nature Health team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Share of Participants who Responded Yes to Access-Specific Questions.
Responses from the 305,857 participants who completed the Health Care Access & Utilization survey. We define participants as having low access if they reported cost-constrained or delayed care for any of the listed reasons.
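The low-access definition above can be sketched as a simple "any yes" rule over the access-specific survey questions. This is a minimal sketch, not the authors' code: the question keys are hypothetical placeholders (the actual survey items are not listed in this excerpt), and treating skipped questions as "not yes" is an assumption.

```python
def has_low_access(responses):
    """Return True if any access-barrier question was answered 'yes'.

    `responses` maps a (hypothetical) question key to True (yes),
    False (no) or None (skipped). Skipped answers are assumed not to
    count as 'yes'.
    """
    return any(answer is True for answer in responses.values())


# Made-up participants with placeholder question keys.
participant_a = {"could_not_afford_care": False, "delayed_care_transportation": True}
participant_b = {"could_not_afford_care": False, "delayed_care_transportation": None}

flag_a = has_low_access(participant_a)
flag_b = has_low_access(participant_b)
```

Here `participant_a` is flagged as low access because one barrier was reported, while `participant_b` is not.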
Supplementary information
Supplementary Tables 1 and 2.
Source data
Statistical source data are provided for Figs. 1–4, Extended Data Fig. 1 and Extended Data Tables 1–4.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zink, A., Luan, H. & Chen, I.Y. Access to care affects electronic health record reliability and AI-driven disease prediction. Nat. Health (2026). https://doi.org/10.1038/s44360-026-00054-9
DOI: https://doi.org/10.1038/s44360-026-00054-9