  • Review Article

Methodological opportunities in genomic data analysis to advance health equity

Abstract

The causes and consequences of inequities in genomic research and medicine are complex and widespread. However, it is widely acknowledged that underrepresentation of diverse populations in human genetics research risks exacerbating existing health disparities. Efforts to improve diversity are ongoing, but an often-overlooked source of inequity is the choice of analytical methods used to process, analyse and interpret genomic data. This choice can influence all areas of genomic research, from genome-wide association studies and polygenic score development to variant prioritization and functional genomics. New statistical and machine learning techniques to understand, quantify and correct for the impact of biases in genomic data are emerging within the wider genomic research and genomic medicine ecosystems. At this crucial time point, it is important to clarify where improvements in methods and practices can, or cannot, have a role in improving equity in genomics. Here, we review existing approaches to promote equity and fairness in statistical analysis for genomics, and propose future methodological developments that are likely to yield the most impact for equity.

Fig. 1: A conceptual framework for a general genomic data analysis task.
Fig. 2: Pathways to equity.

References

  1. Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243–250 (2022). This paper reports the persistent lack of diversity across genetic ancestry for participants in genome-wide association studies and discusses strategies to enhance inclusion.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Need, A. C. & Goldstein, D. B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 25, 489–494 (2009).

    Article  CAS  PubMed  Google Scholar 

  4. Bustamante, C. D., De La Vega, F. M. & Burchard, E. G. Genomics for the world. Nature 475, 163–165 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Carrot-Zhang, J. et al. Comprehensive analysis of genetic ancestry and its molecular correlates in cancer. Cancer Cell 37, 639–654.e6 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Bentley, A. R., Callier, S. & Rotimi, C. N. Diversity and inclusion in genomic research: why the uneven progress? J. Community Genet. 8, 255–266 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  7. Atutornu, J., Milne, R., Costa, A., Patch, C. & Middleton, A. Towards equitable and trustworthy genomics research. eBioMedicine 76, 103879 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  8. World Health Organization. A Conceptual Framework for Action on the Social Determinants of Health (WHO, 2010).

  9. Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Ding, Y. et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774–781 (2023). This paper shows that the predictive accuracy of polygenic scores declines gradually across the continuum of genetic ancestry.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).

  13. Kullo, I. et al. Polygenic scores in biomedical research. Nat. Rev. Genet. 23, 524–532 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019). This paper explores the potential clinical implications of the limited generalizability of polygenic scores across populations.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. The All of Us Research Program Investigators. The “All of Us” research program. N. Engl. J. Med. 381, 668–676 (2019).

  16. The H3Africa Consortium et al. Enabling the genomic revolution in Africa. Science 344, 1346–1348 (2014).

    Article  PubMed Central  Google Scholar 

  17. Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016). This study shows that several genetic variants previously thought to cause hypertrophic cardiomyopathy were misclassified due to limited ancestral diversity in reference datasets, highlighting the need for more inclusive genomic data.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Manolio, T. A. Using the data we have: improving diversity in genomic research. Am. J. Hum. Genet. 105, 233–236 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  21. Schoeler, T. et al. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat. Hum. Behav. 7, 1216–1227 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  22. Wang, Y., Tsuo, K., Kanai, M., Neale, B. & Martin, A. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci. 5, 293–320 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  23. Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019). This paper provides methodological guidance to support the analysis of genome-wide association studies in populations of diverse genetic ancestry.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Kachuri, L. et al. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 25, 8–25 (2024). This review describes the factors limiting the generalizability of polygenic scores across populations and explores the merits of currently available methods to improve generalizability.

    Article  CAS  PubMed  Google Scholar 

  25. Chen, I. Y. et al. Ethical machine learning in healthcare. Annu. Rev. Biomed. Data Sci. 4, 123–144 (2021). This paper presents challenges and recommendations for developing equitable machine learning systems in healthcare across the entire development pipeline — from problem selection to deployment.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Burr, C. & Leslie, D. Ethical assurance: a practical approach to the responsible design, development, and deployment of data-driven technologies. AI Ethics 3, 73–98 (2023).

    Article  Google Scholar 

  27. Martin, A. R. et al. Increasing diversity in genomics requires investment in equitable partnerships and capacity building. Nat. Genet. 54, 740–745 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Purcell, S., Cherny, S. S. & Sham, P. C. Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19, 149–150 (2003).

    Article  CAS  PubMed  Google Scholar 

  30. Wang, G. T., Li, B., Lyn Santos-Cortez, R. P., Peng, B. & Leal, S. M. Power analysis and sample size estimation for sequence-based association studies. Bioinformatics 30, 2377–2378 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. National Academies of Sciences, Engineering, and Medicine. Using population descriptors in genetics and genomics research: a new framework for an evolving field. https://doi.org/10.17226/26902 (National Academies Press, 2023). This National Academies report explores how population descriptors are currently used in genomics research, outlining best practices for researchers within the biomedical and scientific communities.

  32. Dickman, S. L., Himmelstein, D. U. & Woolhandler, S. Inequality and the health-care system in the USA. Lancet 389, 1431–1441 (2017).

    Article  PubMed  Google Scholar 

  33. Richmond, J., Anderson, A., Cunningham-Erves, J., Ozawa, S. & Wilkins, C. H. Conceptualizing and measuring trust, mistrust, and distrust: implications for advancing health equity and building trustworthiness. Annu. Rev. Public. Health 45, 465–484 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  34. Hughson, J. et al. A review of approaches to improve participation of culturally and linguistically diverse populations in clinical trials. Trials 17, 263 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Kraft, S. A. & Doerr, M. Engaging populations underrepresented in research through novel approaches to consent. Am. J. Med. Genet. C. Semin. Med. Genet. 178, 75–80 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  36. Hemstrom, W., Grummer, J. A., Luikart, G. & Christie, M. R. Next-generation data filtering in the genomics era. Nat. Rev. Genet. 25, 750–767 (2024).

    Article  CAS  PubMed  Google Scholar 

  37. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).

    Article  CAS  PubMed  Google Scholar 

  38. Kowal, E., Greenwood, A. & McWhirter, R. E. All in the blood: a review of Aboriginal Australians’ cultural beliefs about blood and implications for biospecimen research. J. Empir. Res. Hum. Res. Ethics 10, 347–359 (2015).

    Article  PubMed  Google Scholar 

  39. Yao, R. A., Akinrinade, O., Chaix, M. & Mital, S. Quality of whole genome sequencing from blood versus saliva derived DNA in cardiac patients. BMC Med. Genomics 13, 11 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Yancey, A. K., Ortega, A. N. & Kumanyika, S. K. Effective recruitment and retention of minority research participants. Annu. Rev. Public. Health 27, 1–28 (2006).

    Article  PubMed  Google Scholar 

  41. Wojcik, G. L. et al. Opportunities and challenges for the use of common controls in sequencing studies. Nat. Rev. Genet. 23, 665–679 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Byrd, J. B., Greene, A. C., Prasad, D. V., Jiang, X. & Greene, C. S. Responsible, practical genomic data sharing that accelerates research. Nat. Rev. Genet. 21, 615–629 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Boscarino, N., Cartwright, R. A., Fox, K. & Tsosie, K. S. Federated learning and Indigenous genomic data sovereignty. Nat. Mach. Intell. 4, 909–911 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  44. Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Ballouz, S., Dobin, A. & Gillis, J. A. Is it time to change the reference genome? Genome Biol. 20, 159 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  46. Chen, N.-C., Solomon, B., Mun, T., Iyer, S. & Langmead, B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol. 22, 8 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  47. Huang, L. et al. Genotype-imputation accuracy across worldwide human populations. Am. J. Hum. Genet. 84, 235–250 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Xiang, R. et al. A comparison for dimensionality reduction methods of single-cell RNA-seq data. Front. Genet. 12, 646936 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Diaz-Papkovich, A., Anderson-Trocmé, L. & Gravel, S. A review of UMAP in population genetics. J. Hum. Genet. 66, 85–91 (2021).

    Article  PubMed  Google Scholar 

  50. Kozlov, M. ‘All of Us’ genetics chart stirs unease over controversial depiction of race. Nature https://doi.org/10.1038/d41586-024-00568-w (2024).

  51. Lin, P.-I., Vance, J. M., Pericak-Vance, M. A. & Martin, E. R. No gene is an island: the flip-flop phenomenon. Am. J. Hum. Genet. 80, 531–538 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Kim, M. S., Patel, K. P., Teng, A. K., Berens, A. J. & Lachance, J. Genetic disease risks can be misestimated across global populations. Genome Biol. 19, 179 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  53. Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy disparities in commercial gender classification. in Proceedings of the 1st Conference on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).

  54. Kamiza, A. B. et al. Transferability of genetic risk scores in African populations. Nat. Med. 28, 1163–1166 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Payne, K., Gavan, S. P., Wright, S. J. & Thompson, A. J. Cost-effectiveness analyses of genetic and genomic diagnostic tests. Nat. Rev. Genet. 19, 235–246 (2018).

    Article  CAS  PubMed  Google Scholar 

  56. Khoury, M. J., Iademarco, M. F. & Riley, W. T. Precision public health for the era of precision medicine. Am. J. Prev. Med. 50, 398–401 (2016).

    Article  PubMed  Google Scholar 

  57. LaVeist, T. A. et al. The economic burden of racial, ethnic, and educational health inequities in the US. JAMA 329, 1682–1692 (2023).

    Article  PubMed  Google Scholar 

  58. Cookson, R. et al. Using cost-effectiveness analysis to address health equity concerns. Value Health 20, 206–212 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  59. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).

    Article  CAS  PubMed  Google Scholar 

  60. Liu, X. et al. The medical algorithmic audit. Lancet Digit. Health 4, e384–e397 (2022).

    Article  CAS  PubMed  Google Scholar 

  61. Tian, P. et al. Multiethnic polygenic risk prediction in diverse populations through transfer learning. Front. Genet. 13, 906965 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  62. Zhao, Z., Fritsche, L. G., Smith, J. A., Mukherjee, B. & Lee, S. The construction of cross-population polygenic risk scores using transfer learning. Am. J. Hum. Genet. 109, 1998–2008 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Zhao, H., Rebbeck, T. R. & Mitra, N. A propensity score approach to correction for bias due to population stratification using genetic and non‐genetic factors. Genet. Epidemiol. 33, 679–690 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  64. Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Lehmann, B., Mackintosh, M., McVean, G. & Holmes, C. Optimal strategies for learning multi-ancestry polygenic scores vary across traits. Nat. Commun. 14, 4023 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Cai, W. et al. Adaptive sampling strategies to construct equitable training datasets. in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency 1467–1478 (Association for Computing Machinery, 2022).

  67. Jimenez-Kaufmann, A. et al. Imputation performance in Latin American populations: improving rare variants representation with the inclusion of Native American genomes. Front. Genet. 12, 719791 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  68. Yu, K. et al. Meta-imputation: an efficient method to combine genotype data after imputation with multiple reference panels. Am. J. Hum. Genet. 109, 1007–1015 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Arriaga-MacKenzie, I. et al. Summix: a method for detecting and adjusting for population structure in genetic summary data. Am. J. Hum. Genet. 108, 1270–1282 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Martin, E. R. et al. Properties of global- and local-ancestry adjustments in genetic association tests in admixed populations. Genet. Epidemiol. 42, 214–229 (2018).

    Article  PubMed  Google Scholar 

  71. Gay, N. R. et al. Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome Biol. 21, 233 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Natri, H. M. et al. Genetic architecture of gene regulation in Indonesian populations identifies QTLs associated with global and local ancestries. Am. J. Hum. Genet. 109, 50–65 (2022).

    Article  CAS  PubMed  Google Scholar 

  73. Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Mägi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  75. Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  76. Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLOS Genet. 9, e1003264 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Chen, H. et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet. 98, 653–666 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Atkinson, E. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet. 53, 195–204 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Heckerman, D. et al. Linear mixed model for heritability estimation that explicitly addresses environmental variation. Proc. Natl Acad. Sci. USA 113, 7377–7382 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol. 82, 1273–1300 (2020).

    Article  Google Scholar 

  83. Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Nat. Genet. 56, 1841–1850 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Gao, B. & Zhou, X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat. Genet. 56, 170–179 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  87. Jin, J. et al. MUSSEL: enhanced Bayesian polygenic risk prediction leveraging information across multiple ancestry groups. Cell Genomics 4, 100539 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Zhang, H. et al. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat. Genet. 55, 1757–1768 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Zhang, J. et al. An ensemble penalized regression method for multi-ancestry polygenic risk prediction. Nat. Commun. 15, 3238 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Coram, M. A., Fang, H., Candille, S. I., Assimes, T. L. & Tang, H. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. Am. J. Hum. Genet. 101, 218–226 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Sun, Q. et al. Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI. Nat. Commun. 15, 1016 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3 GenesGenomesGenetics 10, 4027–4036 (2020).

    Article  CAS  Google Scholar 

  93. Tanigawa, Y. et al. Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology. Nat. Commun. 10, 4064 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  94. Kim, J., Bai, Y. & Pan, W. An adaptive association test for multiple phenotypes with GWAS summary statistics. Genet. Epidemiol. 39, 651–663 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  95. Xiao, J. et al. XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics 38, 1947–1955 (2022).

    Article  CAS  PubMed  Google Scholar 

  96. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Weissbrod, O. et al. Leveraging fine-mapping and multi-population training data to improve cross-population polygenic risk scores. Nat. Genet. 54, 450–458 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Amariuta, T. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Smith, S. P. et al. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am. J. Hum. Genet. 109, 871–884 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Hujoel, M. L. A., Loh, P.-R., Neale, B. M. & Price, A. L. Incorporating family history of disease improves polygenic risk scores in diverse populations. Cell Genomics 2, 100152 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Weale, M. E. et al. Validation of an integrated risk tool, including polygenic risk score, for atherosclerotic cardiovascular disease in multiple ethnicities and ancestries. Am. J. Cardiol. 148, 157–164 (2021). This study validates a new integrated risk tool that combines a traditional clinical risk scores with a polygenic score to improve prediction of atherosclerotic cardiovascular disease across diverse ethnic and ancestry groups.

    Article  PubMed  Google Scholar 

  102. National Academies of Sciences, Engineering, and Medicine. Improving representation in clinical trials and research: building research equity for women and underrepresented groups. https://doi.org/10.17226/26479 (National Academies Press, 2022).

  103. Haynes, W. A., Tomczak, A. & Khatri, P. Gene annotation bias impedes biomedical research. Sci. Rep. 8, 1362 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  104. Mitra, R. et al. Learning from data with structured missingness. Nat. Mach. Intell. 5, 13–23 (2023).

    Article  Google Scholar 

  105. Long, E. et al. The case for increasing diversity in tissue-based functional genomics datasets to understand human disease susceptibility. Nat. Commun. 13, 2907 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Breeze, C. E., Beck, S., Berndt, S. I. & Franceschini, N. The missing diversity in human epigenomic studies. Nat. Genet. 54, 737–739 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Sofer, T. et al. A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. Genet. Epidemiol. 41, 251–258 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  108. Huang, Q. Q. et al. Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistani and Bangladeshi individuals. Nat. Commun. 13, 4664 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. O’Connor, T. D. et al. Rare variation facilitates inferences of fine-scale population structure in humans. Mol. Biol. Evol. 32, 653–660 (2015).

    Article  PubMed  Google Scholar 

  110. Belbin, G. M. et al. Toward a fine-scale population health monitoring system. Cell 184, 2068–2083.e11 (2021).

    Article  CAS  PubMed  Google Scholar 

  111. Fan, C., Mancuso, N. & Chiang, C. W. K. A genealogical estimate of genetic relationships. Am. J. Hum. Genet. 109, 812–824 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Visscher, P. M., Hill, W. G. & Wray, N. R. Heritability in the genomics era — concepts and misconceptions. Nat. Rev. Genet. 9, 255–266 (2008).

    Article  CAS  PubMed  Google Scholar 

  113. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Luo, Y. et al. Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations. Hum. Mol. Genet. 30, 1521–1534 (2021).

    CAS  PubMed  PubMed Central  Google Scholar 

  115. Shi, H. et al. Localizing components of shared transethnic genetic architecture of complex traits from GWAS summary data. Am. J. Hum. Genet. 106, 805–817 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  116. Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Lu, H. et al. Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations. Hum. Genet. 140, 1285–1297 (2021).

    Article  CAS  PubMed  Google Scholar 

  118. Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. 108, 632–655 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  120. Tan, T. & Atkinson, E. G. Strategies for the genomic analysis of admixed populations. Annu. Rev. Biomed. Data Sci. 6, 105–127 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  121. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLOS Genet. 2, e190 (2006).

    Article  PubMed  PubMed Central  Google Scholar 

  124. Conomos, M. P., Miller, M. B. & Thornton, T. A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39, 276–293 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  125. Wu, J., Liu, Y. & Zhao, Y. Systematic review on local ancestor inference from a mathematical and algorithmic perspective. Front. Genet. 12, 639877 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  126. Salter-Townshend, M. & Myers, S. Fine-scale inference of ancestry segments without prior knowledge of admixing groups. Genetics 212, 869–889 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  127. Li, N. & Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Khramtsova, E. A., Davis, L. K. & Stranger, B. E. The role of sex in the genomics of human complex traits. Nat. Rev. Genet. 20, 173–190 (2019).

    Article  CAS  PubMed  Google Scholar 

  130. Accounting for sex in the genome. Nat. Med. 23, 1243–1243 (2017).

  131. Sun, L., Wang, Z., Lu, T., Manolio, T. A. & Paterson, A. D. eXclusionarY: 10 years later, where are the sex chromosomes in GWASs? Am. J. Hum. Genet. 110, 903–912 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Khramtsova, E. A. et al. Quality control and analytic best practices for testing genetic models of sex differences in large populations. Cell 186, 2044–2061 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Clayton, D. Testing for association on the X chromosome. Biostatistics 9, 593–600 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  134. Loley, C., Ziegler, A. & König, I. R. Association tests for X-chromosomal markers — a comparison of different test statistics. Hum. Hered. 71, 23–36 (2011).

  135. Gao, F. et al. XWAS: a software toolset for genetic data analysis and association studies of the X chromosome. J. Hered. 106, 666–671 (2015).

  136. Webster, T. H. et al. Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. GigaScience 8, giz074 (2019).

  137. Tallman, S. et al. Missing genetic diversity impacts variant prioritisation for rare disorders. Preprint at medRxiv https://doi.org/10.1101/2024.08.12.24311664 (2024).

  138. Schrijver, I. et al. The spectrum of CFTR variants in nonwhite cystic fibrosis patients: implications for molecular diagnostic testing. J. Mol. Diagn. 18, 39–50 (2016).

  139. Kaseniit, K. E., Haque, I. S., Goldberg, J. D., Shulman, L. P. & Muzzey, D. Genetic ancestry analysis on >93,000 individuals undergoing expanded carrier screening reveals limitations of ethnicity-based medical guidelines. Genet. Med. 22, 1694–1702 (2020).

  140. Khan, A. T. et al. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: experiences from the NHLBI TOPMed program. Cell Genomics 2, 100155 (2022).

  141. Peterson, R. E. et al. The utility of empirically assigning ancestry groups in cross-population genetic studies of addiction. Am. J. Addict. 26, 494–501 (2017).

  142. Martschenko, D. O., Wand, H., Young, J. L. & Wojcik, G. L. Including multiracial individuals is crucial for race, ethnicity and ancestry frameworks in genetics and genomics. Nat. Genet. 55, 895–900 (2023).

  143. Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376, 250–252 (2022). This paper argues for a move away from discrete continental labels towards a multidimensional, continuous view to characterise genetic ancestry.

  144. Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019).

  145. Kelleher, J. et al. Inferring whole-genome histories in large population datasets. Nat. Genet. 51, 1330–1338 (2019).

  146. Zhang, B. C., Biddanda, A., Gunnarsson, Á. F., Cooper, F. & Palamara, P. F. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat. Genet. 55, 768–776 (2023).

  147. Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

  148. Eizenga, J. M. et al. Pangenome graphs. Annu. Rev. Genomics Hum. Genet. 21, 139–162 (2020).

  149. The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Brief. Bioinform. 19, 118–135 (2018).

  150. Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022). This paper introduces the Human Pangenome Reference Consortium’s effort to build a high-quality, graph-based human reference genome that better captures global genetic diversity.

  151. Bonomi, L., Huang, Y. & Ohno-Machado, L. Privacy challenges and research opportunities for genomic data sharing. Nat. Genet. 52, 646–654 (2020).

  152. Arora, A. Synthetic data: the future of open-access health-care datasets? Lancet 401, 997 (2023).

  153. Ghalebikesabi, S. et al. Mitigating statistical bias within differentially private synthetic data. in Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence 696–705 (PMLR, 2022).

  154. Bak, M. et al. Federated learning is not a cure-all for data ethics. Nat. Mach. Intell. 6, 370–372 (2024).

  155. Marmot, M. Social determinants of health inequalities. Lancet 365, 1099–1104 (2005).

  156. Marmot, M. & Allen, J. J. Social determinants of health equity. Am. J. Public. Health 104, S517–S519 (2014).

  157. Sanderson, E. et al. Mendelian randomization. Nat. Rev. Methods Prim. 2, 1–21 (2022).

  158. Burgess, S., Foley, C. N., Allara, E., Staley, J. R. & Howson, J. M. M. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat. Commun. 11, 376 (2020).

  159. Salas, L. A. et al. A transdisciplinary approach to understand the epigenetic basis of race/ethnicity health disparities. Epigenomics 13, 1761–1770 (2021).

  160. Cerutti, J., Lussier, A. A., Zhu, Y., Liu, J. & Dunn, E. C. Associations between indicators of socioeconomic position and DNA methylation: a scoping review. Clin. Epigenetics 13, 221 (2021).

  161. Yousefi, P. D. et al. DNA methylation-based predictors of health: applications and statistical considerations. Nat. Rev. Genet. 23, 369–383 (2022).

  162. Rattray, N. J. W. et al. Beyond genomics: understanding exposotypes through metabolomics. Hum. Genomics 12, 4 (2018).

  163. Yang, G., Mishra, M. & Perera, M. A. Multi-omics studies in historically excluded populations: the road to equity. Clin. Pharmacol. Ther. 113, 541–556 (2023).

  164. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

  165. Vandereyken, K., Sifrim, A., Thienpont, B. & Voet, T. Methods and applications for single-cell and spatial multi-omics. Nat. Rev. Genet. 24, 494–515 (2023).

  166. Thomas, C. E. & Peters, U. Genomic landscape of cancer in racially and ethnically diverse populations. Nat. Rev. Genet. 12, 946625 (2024). This review highlights the need for more inclusive cancer genomics research across racial and ethnic groups to better understand population-specific genetic factors and reduce disparities in cancer outcomes.

  167. Alderman, J. E. et al. Tackling algorithmic bias and promoting transparency in health datasets: the STANDING Together consensus recommendations. Lancet Digit. Health 7, e64–e88 (2025). This paper introduces the STANDING Together recommendations, developed through international consultation, to promote transparency and proactive evaluation of health datasets in artificial intelligence technologies, aiming to identify and reduce biases that could exacerbate health inequalities.

  168. Mitchell, S., Potash, E., Barocas, S., D’Amour, A. & Lum, K. Algorithmic fairness: choices, assumptions, and definitions. Annu. Rev. Stat. Its Appl. 8, 141–163 (2021).

  169. Pfohl, S. R. et al. A toolbox for surfacing health equity harms and biases in large language models. Nat. Med. 30, 3590–3600 (2024).

  170. Hindorff, L. A. et al. Prioritizing diversity in human genomics research. Nat. Rev. Genet. 19, 175–185 (2018).

  171. Mello, M. M. & Wolf, L. E. The Havasupai Indian tribe case — lessons for research involving stored biologic samples. N. Engl. J. Med. 363, 204–207 (2010).

  172. Lee, S. S.-J. et al. “I don’t want to be Henrietta Lacks”: diverse patient perspectives on donating biospecimens for precision medicine research. Genet. Med. 21, 107–113 (2019).

  173. Kaye, J. The tension between data sharing and the protection of privacy in genomics research. Annu. Rev. Genomics Hum. Genet. 13, 415–431 (2012).

  174. Israel, B. A. et al. Community-based participatory research: a capacity-building approach for policy advocacy aimed at eliminating health disparities. Am. J. Public. Health 100, 2094–2102 (2010).

  175. Rebbeck, T. R. et al. A framework for promoting diversity, equity, and inclusion in genetics and genomics research. JAMA Health Forum 3, e220603 (2022).

  176. Pereira, L., Mutesa, L., Tindana, P. & Ramsay, M. African genetic diversity and adaptation inform a precision medicine agenda. Nat. Rev. Genet. 22, 284–306 (2021).

  177. Mathieson, I. & Scally, A. What is ancestry? PLoS Genet. 16, e1008624 (2020).

  178. Nielsen, R., Vaughn, A. H. & Deng, Y. Inference and applications of ancestral recombination graphs. Nat. Rev. Genet. 26, 47–58 (2025).

  179. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

  180. Márquez-Luna, C. et al. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 12, 6052 (2021).

  181. Busby, G. B. et al. Ancestry-specific polygenic risk scores are risk enhancers for clinical cardiovascular disease assessments. Nat. Commun. 14, 7105 (2023).

  182. Fuat, A. et al. A polygenic risk score added to a QRISK®2 cardiovascular disease risk calculator demonstrated robust clinical acceptance and clinical utility in the primary care setting. Eur. J. Prev. Cardiol. 31, 716–722 (2024).

  183. Samani, N. J. et al. Polygenic risk score adds to a clinical risk score in the prediction of cardiovascular disease in a clinical setting. Eur. Heart J. 45, 3152–3160 (2024).

Acknowledgements

The authors gratefully acknowledge the speakers and attendees of the joint Data Science for Health Equity workshops on ‘Challenges to statistical approaches for fairness in genomics’ and ‘Challenges to statistical approaches for health equity’ held in January 2022. They also thank C. Harbron, G. McVean, S. Walker, D. Deen, A. Shalek and H. Martin for comments on an earlier version of this manuscript. F.F. acknowledges the receipt of studentship awards from the Health Data Research UK-The Alan Turing Institute Wellcome PhD Programme (218529/Z/19/Z), and the Enrichment Scheme of The Alan Turing Institute under an Engineering & Physical Sciences Research Council grant (EP/N510129/1). K.K. was supported by the European Research Council under the European Union Horizon 2020 research and innovation programme (948561). N.C. was supported by US National Institutes of Health grants (R01HG013137, R01HG010480, U01HG011719).

Author information

Contributions

B.L., L.B. and F.F. researched the literature. B.L. and L.B. wrote the article. All authors contributed substantially to discussions of the content, and reviewed and/or edited the manuscript.

Corresponding author

Correspondence to Brieuc Lehmann.

Ethics declarations

Competing interests

This manuscript was informed by a project commissioned by the Diverse Data (DD) initiative at Genomics England (GEL) in December 2022 to explore the use of statistical and machine learning methods to improve fairness and equity in genomics. K.K. is the Scientific Lead for DD. S.T., T.N. and Y.C. are Genomic Data Scientists at GEL. M.S. was the Lead Genomic Data Scientist for DD, and M.M. was the Programme Lead for DD. B.L. and L.B. were paid consultants to GEL for the project. M.M. is Director of One HealthTech, which provides the secretariat for the Data Science for Health Equity community, of which B.L. is also the co-founder. B.L. and L.B. have acted as consultants for Google DeepMind in relation to other research in this field; however, Google DeepMind was not involved in this project or this publication. F.F. is an employee and shareholder of Microsoft Corporation.

Peer review

Peer review information

Nature Reviews Genetics thanks Anna C. F. Lewis, Cheryl L. Willman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Catalogue of Bias: https://catalogofbias.org/

Our Future Health: https://ourfuturehealth.org.uk/

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lehmann, B., Bräuninger, L., Cho, Y. et al. Methodological opportunities in genomic data analysis to advance health equity. Nat Rev Genet 26, 635–649 (2025). https://doi.org/10.1038/s41576-025-00839-w

  • DOI: https://doi.org/10.1038/s41576-025-00839-w
