  • Perspective

Machine learning and algorithmic fairness in public and population health

Abstract

Until now, much of the work on machine learning and health has focused on processes inside the hospital or clinic. However, this represents only a narrow set of tasks and challenges related to health; there is greater potential for impact by leveraging machine learning in health tasks more broadly. In this Perspective we aim to highlight potential opportunities and challenges for machine learning within a holistic view of health and its influences. To do so, we build on research in population and public health that focuses on the mechanisms between different cultural, social and environmental factors and their effect on the health of individuals and communities. We present a brief introduction to research in these fields, data sources and types of tasks, and use these to identify settings where machine learning is relevant and can contribute to new knowledge. Given the key foci of health equity and disparities within public and population health, we juxtapose these topics with the machine learning subfield of algorithmic fairness to highlight specific opportunities where machine learning, public and population health may synergize to achieve health equity.

Fig. 1: The socio-ecological model of health.
Fig. 2: Illustration of sources of bias at different stages of data and algorithm use.
Fig. 3: Abstract illustration of challenge in algorithmic fairness due to unexplained variance or proxy variables.
Fig. 4: Abstract illustration of challenge in algorithmic fairness owing to who the data represents.

Author information

Corresponding author

Correspondence to Rumi Chunara.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Melissa McCradden, Marcello Ienca and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mhasawade, V., Zhao, Y. & Chunara, R. Machine learning and algorithmic fairness in public and population health. Nat Mach Intell 3, 659–666 (2021). https://doi.org/10.1038/s42256-021-00373-4

  • DOI: https://doi.org/10.1038/s42256-021-00373-4
