Machine learning models are increasingly being deployed in real-world clinical settings and have shown promise in patient diagnosis, treatment and outcome prediction tasks. However, such models have also been shown to exhibit biases against specific demographic groups, leading to inequitable outcomes for under-represented or historically marginalized communities.
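For intuition about what a demographic bias of this kind looks like in practice, the sketch below (not part of the article; the data, variable names and choice of metric are illustrative assumptions) computes the gap in true-positive rate between two synthetic groups, one common equal-opportunity notion of group fairness.

```python
# Illustrative only: quantify group bias as the gap in true-positive rate
# (equal opportunity) between two demographic groups. All data are synthetic.
import numpy as np

def true_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of actual positives that the model correctly flags."""
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")
    return float((y_pred[positives] == 1).mean())

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)    # hypothetical demographic attribute
y_true = rng.integers(0, 2, size=n)   # hypothetical ground-truth outcome
# Simulate a model that misses more true positives in group 1 than in group 0.
miss_prob = np.where(group == 0, 0.1, 0.3)
y_pred = ((y_true == 1) & (rng.random(n) > miss_prob)).astype(int)

tpr = {g: true_positive_rate(y_true[group == g], y_pred[group == g]) for g in (0, 1)}
print(f"TPR by group: {tpr}, equal-opportunity gap: {abs(tpr[0] - tpr[1]):.2f}")
```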
Acknowledgements
This work was supported in part by a National Science Foundation (NSF) 22-586 Faculty Early Career Development Award (no. 2339381), a Gordon & Betty Moore Foundation award and a Google Research Scholar award.
Author information
Contributions
The authors contributed equally to all aspects of the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Cite this article
Zhang, H., Gerych, W. & Ghassemi, M. A data-centric perspective to fair machine learning for healthcare. Nat Rev Methods Primers 4, 86 (2024). https://doi.org/10.1038/s43586-024-00371-x