Methodological opportunities in genomic data analysis to advance health equity

Lehmann, Brieuc; Bräuninger, Leandra; Cho, Yoonsu; Falck, Fabian; Jayadeva, Smera; Katell, Michael; Nguyen, Thuy; Perini, Antonella; Tallman, Sam; Mackintosh, Maxine; Silver, Matt; Kuchenbäcker, Karoline; Leslie, David; Chatterjee, Nilanjan; Holmes, Chris

doi:10.1038/s41576-025-00839-w

Review Article
Published: 15 May 2025

Brieuc Lehmann (ORCID: orcid.org/0000-0002-7302-4391)^1,
Leandra Bräuninger (ORCID: orcid.org/0009-0005-5137-0217)^1,2,
Yoonsu Cho^3,4,
Fabian Falck^2,5,
Smera Jayadeva^2,
Michael Katell^2,
Thuy Nguyen^3,
Antonella Perini^2,
Sam Tallman^3,
Maxine Mackintosh^2,3,
Matt Silver^3,6,
Karoline Kuchenbäcker^3,7^na1,
David Leslie^2^na1,
Nilanjan Chatterjee (ORCID: orcid.org/0000-0002-9060-008X)^8,9^na1 &
Chris Holmes^5,10^na1

Nature Reviews Genetics volume 26, pages 635–649 (2025)

Abstract

The causes and consequences of inequities in genomic research and medicine are complex and widespread. However, it is widely acknowledged that underrepresentation of diverse populations in human genetics research risks exacerbating existing health disparities. Efforts to improve diversity are ongoing, but an often-overlooked source of inequity is the choice of analytical methods used to process, analyse and interpret genomic data. This choice can influence all areas of genomic research, from genome-wide association studies and polygenic score development to variant prioritization and functional genomics. New statistical and machine learning techniques to understand, quantify and correct for the impact of biases in genomic data are emerging within the wider genomic research and genomic medicine ecosystems. At this crucial time point, it is important to clarify where improvements in methods and practices can, or cannot, have a role in improving equity in genomics. Here, we review existing approaches to promote equity and fairness in statistical analysis for genomics, and propose future methodological developments that are likely to yield the most impact for equity.

**Fig. 1: A conceptual framework for a general genomic data analysis task.**


References

Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243–250 (2022). This paper reports the persistent lack of diversity across genetic ancestry for participants in genome-wide association studies and discusses strategies to enhance inclusion.
Article CAS PubMed PubMed Central Google Scholar
Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).
Article CAS PubMed PubMed Central Google Scholar
Need, A. C. & Goldstein, D. B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 25, 489–494 (2009).
Article CAS PubMed Google Scholar
Bustamante, C. D., De La Vega, F. M. & Burchard, E. G. Genomics for the world. Nature 475, 163–165 (2011).
Article CAS PubMed PubMed Central Google Scholar
Carrot-Zhang, J. et al. Comprehensive analysis of genetic ancestry and its molecular correlates in cancer. Cancer Cell 37, 639–654.e6 (2020).
Article CAS PubMed PubMed Central Google Scholar
Bentley, A. R., Callier, S. & Rotimi, C. N. Diversity and inclusion in genomic research: why the uneven progress? J. Community Genet. 8, 255–266 (2017).
Article PubMed PubMed Central Google Scholar
Atutornu, J., Milne, R., Costa, A., Patch, C. & Middleton, A. Towards equitable and trustworthy genomics research. eBioMedicine 76, 103879 (2022).
Article PubMed PubMed Central Google Scholar
World Health Organization. A Conceptual Framework for Action on the Social Determinants of Health (WHO, 2010).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Article CAS PubMed PubMed Central Google Scholar
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ding, Y. et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774–781 (2023). This paper shows that the predictive accuracy of polygenic scores declines gradually across the continuum of genetic ancestry.
Article CAS PubMed PubMed Central Google Scholar
Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).
Kullo, I. et al. Polygenic scores in biomedical research. Nat. Rev. Genet. 23, 524–532 (2022).
Article CAS PubMed PubMed Central Google Scholar
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019). This paper explores the potential clinical implications of the limited generalizability of polygenic scores across populations.
Article CAS PubMed PubMed Central Google Scholar
The All of Us Research Program Investigators. The “All of Us” research program. N. Engl. J. Med. 381, 668–676 (2019).
The H3Africa Consortium et al. Enabling the genomic revolution in Africa. Science 344, 1346–1348 (2014).
Article PubMed Central Google Scholar
Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016). This study shows that several genetic variants previously thought to cause hypertrophic cardiomyopathy were misclassified due to limited ancestral diversity in reference datasets, highlighting the need for more inclusive genomic data.
Article PubMed PubMed Central Google Scholar
Manolio, T. A. Using the data we have: improving diversity in genomic research. Am. J. Hum. Genet. 105, 233–236 (2019).
Article CAS PubMed PubMed Central Google Scholar
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Article CAS PubMed PubMed Central Google Scholar
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Article PubMed PubMed Central Google Scholar
Schoeler, T. et al. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat. Hum. Behav. 7, 1216–1227 (2023).
Article PubMed PubMed Central Google Scholar
Wang, Y., Tsuo, K., Kanai, M., Neale, B. & Martin, A. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci. 5, 293–320 (2022).
Article PubMed PubMed Central Google Scholar
Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019). This paper provides methodological guidance to support the analysis of genome-wide association studies in populations of diverse genetic ancestry.
Article CAS PubMed PubMed Central Google Scholar
Kachuri, L. et al. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 25, 8–25 (2024). This review describes the factors limiting the generalizability of polygenic scores across populations and explores the merits of currently available methods to improve generalizability.
Article CAS PubMed Google Scholar
Chen, I. Y. et al. Ethical machine learning in healthcare. Annu. Rev. Biomed. Data Sci. 4, 123–144 (2021). This paper presents challenges and recommendations for developing equitable machine learning systems in healthcare across the entire development pipeline — from problem selection to deployment.
Article PubMed PubMed Central Google Scholar
Burr, C. & Leslie, D. Ethical assurance: a practical approach to the responsible design, development, and deployment of data-driven technologies. AI Ethics 3, 73–98 (2023).
Article Google Scholar
Martin, A. R. et al. Increasing diversity in genomics requires investment in equitable partnerships and capacity building. Nat. Genet. 54, 740–745 (2022).
Article CAS PubMed PubMed Central Google Scholar
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
Article CAS PubMed PubMed Central Google Scholar
Purcell, S., Cherny, S. S. & Sham, P. C. Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19, 149–150 (2003).
Article CAS PubMed Google Scholar
Wang, G. T., Li, B., Lyn Santos-Cortez, R. P., Peng, B. & Leal, S. M. Power analysis and sample size estimation for sequence-based association studies. Bioinformatics 30, 2377–2378 (2014).
Article CAS PubMed PubMed Central Google Scholar
National Academies of Sciences, Engineering, and Medicine. Using population descriptors in genetics and genomics research: a new framework for an evolving field. https://doi.org/10.17226/26902 (National Academies Press, 2023). This National Academies report explores how population descriptors are currently used in genomics research, outlining best practices for researchers within the biomedical and scientific communities.
Dickman, S. L., Himmelstein, D. U. & Woolhandler, S. Inequality and the health-care system in the USA. Lancet 389, 1431–1441 (2017).
Article PubMed Google Scholar
Richmond, J., Anderson, A., Cunningham-Erves, J., Ozawa, S. & Wilkins, C. H. Conceptualizing and measuring trust, mistrust, and distrust: implications for advancing health equity and building trustworthiness. Annu. Rev. Public. Health 45, 465–484 (2024).
Article PubMed PubMed Central Google Scholar
Hughson, J. et al. A review of approaches to improve participation of culturally and linguistically diverse populations in clinical trials. Trials 17, 263 (2016).
Article PubMed PubMed Central Google Scholar
Kraft, S. A. & Doerr, M. Engaging populations underrepresented in research through novel approaches to consent. Am. J. Med. Genet. C. Semin. Med. Genet. 178, 75–80 (2018).
Article PubMed PubMed Central Google Scholar
Hemstrom, W., Grummer, J. A., Luikart, G. & Christie, M. R. Next-generation data filtering in the genomics era. Nat. Rev. Genet. 25, 750–767 (2024).
Article CAS PubMed Google Scholar
Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).
Article CAS PubMed Google Scholar
Kowal, E., Greenwood, A. & McWhirter, R. E. All in the blood: a review of Aboriginal Australians’ cultural beliefs about blood and implications for biospecimen research. J. Empir. Res. Hum. Res. Ethics 10, 347–359 (2015).
Article PubMed Google Scholar
Yao, R. A., Akinrinade, O., Chaix, M. & Mital, S. Quality of whole genome sequencing from blood versus saliva derived DNA in cardiac patients. BMC Med. Genomics 13, 11 (2020).
Article CAS PubMed PubMed Central Google Scholar
Yancey, A. K., Ortega, A. N. & Kumanyika, S. K. Effective recruitment and retention of minority research participants. Annu. Rev. Public. Health 27, 1–28 (2006).
Article PubMed Google Scholar
Wojcik, G. L. et al. Opportunities and challenges for the use of common controls in sequencing studies. Nat. Rev. Genet. 23, 665–679 (2022).
Article CAS PubMed PubMed Central Google Scholar
Byrd, J. B., Greene, A. C., Prasad, D. V., Jiang, X. & Greene, C. S. Responsible, practical genomic data sharing that accelerates research. Nat. Rev. Genet. 21, 615–629 (2020).
Article CAS PubMed PubMed Central Google Scholar
Boscarino, N., Cartwright, R. A., Fox, K. & Tsosie, K. S. Federated learning and Indigenous genomic data sovereignty. Nat. Mach. Intell. 4, 909–911 (2022).
Article PubMed PubMed Central Google Scholar
Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
Article CAS PubMed PubMed Central Google Scholar
Ballouz, S., Dobin, A. & Gillis, J. A. Is it time to change the reference genome? Genome Biol. 20, 159 (2019).
Article PubMed PubMed Central Google Scholar
Chen, N.-C., Solomon, B., Mun, T., Iyer, S. & Langmead, B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol. 22, 8 (2021).
Article PubMed PubMed Central Google Scholar
Huang, L. et al. Genotype-imputation accuracy across worldwide human populations. Am. J. Hum. Genet. 84, 235–250 (2009).
Article CAS PubMed PubMed Central Google Scholar
Xiang, R. et al. A comparison for dimensionality reduction methods of single-cell RNA-seq data. Front. Genet. 12, 646936 (2021).
Article CAS PubMed PubMed Central Google Scholar
Diaz-Papkovich, A., Anderson-Trocmé, L. & Gravel, S. A review of UMAP in population genetics. J. Hum. Genet. 66, 85–91 (2021).
Article PubMed Google Scholar
Kozlov, M. ‘All of Us’ genetics chart stirs unease over controversial depiction of race. Nature https://doi.org/10.1038/d41586-024-00568-w (2024).
Lin, P.-I., Vance, J. M., Pericak-Vance, M. A. & Martin, E. R. No gene is an island: the flip-flop phenomenon. Am. J. Hum. Genet. 80, 531–538 (2007).
Article CAS PubMed PubMed Central Google Scholar
Kim, M. S., Patel, K. P., Teng, A. K., Berens, A. J. & Lachance, J. Genetic disease risks can be misestimated across global populations. Genome Biol. 19, 179 (2018).
Article PubMed PubMed Central Google Scholar
Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy disparities in commercial gender classification. in Proceedings of the 1st Conference on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).
Kamiza, A. B. et al. Transferability of genetic risk scores in African populations. Nat. Med. 28, 1163–1166 (2022).
Article CAS PubMed PubMed Central Google Scholar
Payne, K., Gavan, S. P., Wright, S. J. & Thompson, A. J. Cost-effectiveness analyses of genetic and genomic diagnostic tests. Nat. Rev. Genet. 19, 235–246 (2018).
Article CAS PubMed Google Scholar
Khoury, M. J., Iademarco, M. F. & Riley, W. T. Precision public health for the era of precision medicine. Am. J. Prev. Med. 50, 398–401 (2016).
Article PubMed Google Scholar
LaVeist, T. A. et al. The economic burden of racial, ethnic, and educational health inequities in the US. JAMA 329, 1682–1692 (2023).
Article PubMed Google Scholar
Cookson, R. et al. Using cost-effectiveness analysis to address health equity concerns. Value Health 20, 206–212 (2017).
Article PubMed PubMed Central Google Scholar
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Article CAS PubMed Google Scholar
Liu, X. et al. The medical algorithmic audit. Lancet Digit. Health 4, e384–e397 (2022).
Article CAS PubMed Google Scholar
Tian, P. et al. Multiethnic polygenic risk prediction in diverse populations through transfer learning. Front. Genet. 13, 906965 (2022).
Article PubMed PubMed Central Google Scholar
Zhao, Z., Fritsche, L. G., Smith, J. A., Mukherjee, B. & Lee, S. The construction of cross-population polygenic risk scores using transfer learning. Am. J. Hum. Genet. 109, 1998–2008 (2022).
Article CAS PubMed PubMed Central Google Scholar
Zhao, H., Rebbeck, T. R. & Mitra, N. A propensity score approach to correction for bias due to population stratification using genetic and non‐genetic factors. Genet. Epidemiol. 33, 679–690 (2009).
Article PubMed PubMed Central Google Scholar
Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).
Article CAS PubMed PubMed Central Google Scholar
Lehmann, B., Mackintosh, M., McVean, G. & Holmes, C. Optimal strategies for learning multi-ancestry polygenic scores vary across traits. Nat. Commun. 14, 4023 (2023).
Article CAS PubMed PubMed Central Google Scholar
Cai, W. et al. Adaptive sampling strategies to construct equitable training datasets. in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency 1467–1478 (Association for Computing Machinery, 2022).
Jimenez-Kaufmann, A. et al. Imputation performance in Latin American populations: improving rare variants representation with the inclusion of Native American genomes. Front. Genet. 12, 719791 (2022).
Article PubMed PubMed Central Google Scholar
Yu, K. et al. Meta-imputation: an efficient method to combine genotype data after imputation with multiple reference panels. Am. J. Hum. Genet. 109, 1007–1015 (2022).
Article CAS PubMed PubMed Central Google Scholar
Arriaga-MacKenzie, I. et al. Summix: a method for detecting and adjusting for population structure in genetic summary data. Am. J. Hum. Genet. 108, 1270–1282 (2021).
Article CAS PubMed PubMed Central Google Scholar
Martin, E. R. et al. Properties of global- and local-ancestry adjustments in genetic association tests in admixed populations. Genet. Epidemiol. 42, 214–229 (2018).
Article PubMed Google Scholar
Gay, N. R. et al. Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome Biol. 21, 233 (2020).
Article CAS PubMed PubMed Central Google Scholar
Natri, H. M. et al. Genetic architecture of gene regulation in Indonesian populations identifies QTLs associated with global and local ancestries. Am. J. Hum. Genet. 109, 50–65 (2022).
Article CAS PubMed Google Scholar
Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).
Article CAS PubMed PubMed Central Google Scholar
Mägi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).
Article PubMed PubMed Central Google Scholar
Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011).
Article PubMed PubMed Central Google Scholar
Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Article CAS PubMed PubMed Central Google Scholar
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLOS Genet. 9, e1003264 (2013).
Article CAS PubMed PubMed Central Google Scholar
Chen, H. et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet. 98, 653–666 (2016).
Article CAS PubMed PubMed Central Google Scholar
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Article CAS PubMed PubMed Central Google Scholar
Atkinson, E. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet. 53, 195–204 (2021).
Article CAS PubMed PubMed Central Google Scholar
Heckerman, D. et al. Linear mixed model for heritability estimation that explicitly addresses environmental variation. Proc. Natl Acad. Sci. USA 113, 7377–7382 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol. 82, 1273–1300 (2020).
Article Google Scholar
Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Nat. Genet. 56, 1841–1850 (2024).
Article CAS PubMed PubMed Central Google Scholar
Gao, B. & Zhou, X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat. Genet. 56, 170–179 (2024).
Article CAS PubMed PubMed Central Google Scholar
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Article PubMed PubMed Central Google Scholar
Jin, J. et al. MUSSEL: enhanced Bayesian polygenic risk prediction leveraging information across multiple ancestry groups. Cell Genomics 4, 100539 (2024).
Article CAS PubMed PubMed Central Google Scholar
Zhang, H. et al. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat. Genet. 55, 1757–1768 (2023).
Article CAS PubMed PubMed Central Google Scholar
Zhang, J. et al. An ensemble penalized regression method for multi-ancestry polygenic risk prediction. Nat. Commun. 15, 3238 (2024).
Article CAS PubMed PubMed Central Google Scholar
Coram, M. A., Fang, H., Candille, S. I., Assimes, T. L. & Tang, H. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. Am. J. Hum. Genet. 101, 218–226 (2017).
Article CAS PubMed PubMed Central Google Scholar
Sun, Q. et al. Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI. Nat. Commun. 15, 1016 (2024).
Article CAS PubMed PubMed Central Google Scholar
Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3 GenesGenomesGenetics 10, 4027–4036 (2020).
Article CAS Google Scholar
Tanigawa, Y. et al. Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology. Nat. Commun. 10, 4064 (2019).
Article PubMed PubMed Central Google Scholar
Kim, J., Bai, Y. & Pan, W. An adaptive association test for multiple phenotypes with GWAS summary statistics. Genet. Epidemiol. 39, 651–663 (2015).
Article PubMed PubMed Central Google Scholar
Xiao, J. et al. XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics 38, 1947–1955 (2022).
Article CAS PubMed Google Scholar
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
Article CAS PubMed PubMed Central Google Scholar
Weissbrod, O. et al. Leveraging fine-mapping and multi-population training data to improve cross-population polygenic risk scores. Nat. Genet. 54, 450–458 (2022).
Article CAS PubMed PubMed Central Google Scholar
Amariuta, T. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354 (2020).
Article CAS PubMed PubMed Central Google Scholar
Smith, S. P. et al. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am. J. Hum. Genet. 109, 871–884 (2022).
Article CAS PubMed PubMed Central Google Scholar
Hujoel, M. L. A., Loh, P.-R., Neale, B. M. & Price, A. L. Incorporating family history of disease improves polygenic risk scores in diverse populations. Cell Genomics 2, 100152 (2022).
Article CAS PubMed PubMed Central Google Scholar
Weale, M. E. et al. Validation of an integrated risk tool, including polygenic risk score, for atherosclerotic cardiovascular disease in multiple ethnicities and ancestries. Am. J. Cardiol. 148, 157–164 (2021). This study validates a new integrated risk tool that combines a traditional clinical risk scores with a polygenic score to improve prediction of atherosclerotic cardiovascular disease across diverse ethnic and ancestry groups.
Article PubMed Google Scholar
National Academies of Sciences, Engineering, and Medicine. Improving representation in clinical trials and research: building research equity for women and underrepresented groups. https://doi.org/10.17226/26479 (National Academies Press, 2022).
Haynes, W. A., Tomczak, A. & Khatri, P. Gene annotation bias impedes biomedical research. Sci. Rep. 8, 1362 (2018).
Article PubMed PubMed Central Google Scholar
Mitra, R. et al. Learning from data with structured missingness. Nat. Mach. Intell. 5, 13–23 (2023).
Article Google Scholar
Long, E. et al. The case for increasing diversity in tissue-based functional genomics datasets to understand human disease susceptibility. Nat. Commun. 13, 2907 (2022).
Article CAS PubMed PubMed Central Google Scholar
Breeze, C. E., Beck, S., Berndt, S. I. & Franceschini, N. The missing diversity in human epigenomic studies. Nat. Genet. 54, 737–739 (2022).
Article CAS PubMed PubMed Central Google Scholar
Sofer, T. et al. A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. Genet. Epidemiol. 41, 251–258 (2017).
Article PubMed PubMed Central Google Scholar
Huang, Q. Q. et al. Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistani and Bangladeshi individuals. Nat. Commun. 13, 4664 (2022).
Article CAS PubMed PubMed Central Google Scholar
O’Connor, T. D. et al. Rare variation facilitates inferences of fine-scale population structure in humans. Mol. Biol. Evol. 32, 653–660 (2015).
Article PubMed Google Scholar
Belbin, G. M. et al. Toward a fine-scale population health monitoring system. Cell 184, 2068–2083.e11 (2021).
Article CAS PubMed Google Scholar
Fan, C., Mancuso, N. & Chiang, C. W. K. A genealogical estimate of genetic relationships. Am. J. Hum. Genet. 109, 812–824 (2022).
Article CAS PubMed PubMed Central Google Scholar
Visscher, P. M., Hill, W. G. & Wray, N. R. Heritability in the genomics era — concepts and misconceptions. Nat. Rev. Genet. 9, 255–266 (2008).
Article CAS PubMed Google Scholar
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Article CAS PubMed PubMed Central Google Scholar
Luo, Y. et al. Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations. Hum. Mol. Genet. 30, 1521–1534 (2021).
CAS PubMed PubMed Central Google Scholar
Shi, H. et al. Localizing components of shared transethnic genetic architecture of complex traits from GWAS summary data. Am. J. Hum. Genet. 106, 805–817 (2020).
Article CAS PubMed PubMed Central Google Scholar
Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Article CAS PubMed PubMed Central Google Scholar
Lu, H. et al. Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations. Hum. Genet. 140, 1285–1297 (2021).
Article CAS PubMed Google Scholar
Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. 108, 632–655 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).
Article PubMed PubMed Central Google Scholar
Tan, T. & Atkinson, E. G. Strategies for the genomic analysis of admixed populations. Annu. Rev. Biomed. Data Sci. 6, 105–127 (2023).
Article PubMed PubMed Central Google Scholar
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
Article CAS PubMed PubMed Central Google Scholar
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Article CAS PubMed PubMed Central Google Scholar
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLOS Genet. 2, e190 (2006).
Article PubMed PubMed Central Google Scholar
Conomos, M. P., Miller, M. B. & Thornton, T. A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39, 276–293 (2015).
Article PubMed PubMed Central Google Scholar
Wu, J., Liu, Y. & Zhao, Y. Systematic review on local ancestor inference from a mathematical and algorithmic perspective. Front. Genet. 12, 639877 (2021).
Article PubMed PubMed Central Google Scholar
Salter-Townshend, M. & Myers, S. Fine-scale inference of ancestry segments without prior knowledge of admixing groups. Genetics 212, 869–889 (2019).
Article PubMed PubMed Central Google Scholar
Li, N. & Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003).
Article CAS PubMed PubMed Central Google Scholar
Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).
Article CAS PubMed PubMed Central Google Scholar
Khramtsova, E. A., Davis, L. K. & Stranger, B. E. The role of sex in the genomics of human complex traits. Nat. Rev. Genet. 20, 173–190 (2019).
Article CAS PubMed Google Scholar
Accounting for sex in the genome. Nat. Med. 23, 1243–1243 (2017).
Sun, L., Wang, Z., Lu, T., Manolio, T. A. & Paterson, A. D. eXclusionarY: 10 years later, where are the sex chromosomes in GWASs? Am. J. Hum. Genet. 110, 903–912 (2023).
Article CAS PubMed PubMed Central Google Scholar
Khramtsova, E. A. et al. Quality control and analytic best practices for testing genetic models of sex differences in large populations. Cell 186, 2044–2061 (2023).
Article CAS PubMed PubMed Central Google Scholar
Clayton, D. Testing for association on the X chromosome. Biostatistics 9, 593–600 (2008).
Article PubMed PubMed Central Google Scholar
Loley, C., Ziegler, A. & König, I. R. Association tests for X-chromosomal markers — a comparison of different test statistics. Hum. Hered. 71, 23–36 (2011).
Article PubMed PubMed Central Google Scholar
Gao, F. et al. XWAS: a software toolset for genetic data analysis and association studies of the X chromosome. J. Hered. 106, 666–671 (2015).
Article CAS PubMed PubMed Central Google Scholar
Webster, T. H. et al. Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. GigaScience 8, giz074 (2019).
Article PubMed PubMed Central Google Scholar
Tallman, S. et al. Missing genetic diversity impacts variant prioritisation for rare disorders. Preprint at medRxiv https://doi.org/10.1101/2024.08.12.24311664 (2024).
Schrijver, I. et al. The spectrum of CFTR variants in nonwhite cystic fibrosis patients: implications for molecular diagnostic testing. J. Mol. Diagn. 18, 39–50 (2016).
Article CAS PubMed Google Scholar
Kaseniit, K. E., Haque, I. S., Goldberg, J. D., Shulman, L. P. & Muzzey, D. Genetic ancestry analysis on >93,000 individuals undergoing expanded carrier screening reveals limitations of ethnicity-based medical guidelines. Genet. Med. 22, 1694–1702 (2020).
Article PubMed PubMed Central Google Scholar
Khan, A. T. et al. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: experiences from the NHLBI TOPMed program. Cell Genomics 2, 100155 (2022).
Peterson, R. E. et al. The utility of empirically assigning ancestry groups in cross-population genetic studies of addiction. Am. J. Addict. 26, 494–501 (2017).
Martschenko, D. O., Wand, H., Young, J. L. & Wojcik, G. L. Including multiracial individuals is crucial for race, ethnicity and ancestry frameworks in genetics and genomics. Nat. Genet. 55, 895–900 (2023).
Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376, 250–252 (2022). This paper argues for a move away from discrete continental labels towards a multidimensional, continuous view to characterise genetic ancestry.
Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019).
Kelleher, J. et al. Inferring whole-genome histories in large population datasets. Nat. Genet. 51, 1330–1338 (2019).
Zhang, B. C., Biddanda, A., Gunnarsson, Á. F., Cooper, F. & Palamara, P. F. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat. Genet. 55, 768–776 (2023).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Eizenga, J. M. et al. Pangenome graphs. Annu. Rev. Genomics Hum. Genet. 21, 139–162 (2020).
The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Brief. Bioinform. 19, 118–135 (2018).
Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022). This paper introduces the Human Pangenome Reference Consortium’s effort to build a high-quality, graph-based human reference genome that better captures global genetic diversity.
Bonomi, L., Huang, Y. & Ohno-Machado, L. Privacy challenges and research opportunities for genomic data sharing. Nat. Genet. 52, 646–654 (2020).
Arora, A. Synthetic data: the future of open-access health-care datasets? Lancet 401, 997 (2023).
Ghalebikesabi, S. et al. Mitigating statistical bias within differentially private synthetic data. in Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence 696–705 (PMLR, 2022).
Bak, M. et al. Federated learning is not a cure-all for data ethics. Nat. Mach. Intell. 6, 370–372 (2024).
Marmot, M. Social determinants of health inequalities. Lancet 365, 1099–1104 (2005).
Marmot, M. & Allen, J. J. Social determinants of health equity. Am. J. Public. Health 104, S517–S519 (2014).
Sanderson, E. et al. Mendelian randomization. Nat. Rev. Methods Prim. 2, 1–21 (2022).
Burgess, S., Foley, C. N., Allara, E., Staley, J. R. & Howson, J. M. M. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat. Commun. 11, 376 (2020).
Salas, L. A. et al. A transdisciplinary approach to understand the epigenetic basis of race/ethnicity health disparities. Epigenomics 13, 1761–1770 (2021).
Cerutti, J., Lussier, A. A., Zhu, Y., Liu, J. & Dunn, E. C. Associations between indicators of socioeconomic position and DNA methylation: a scoping review. Clin. Epigenetics 13, 221 (2021).
Yousefi, P. D. et al. DNA methylation-based predictors of health: applications and statistical considerations. Nat. Rev. Genet. 23, 369–383 (2022).
Rattray, N. J. W. et al. Beyond genomics: understanding exposotypes through metabolomics. Hum. Genomics 12, 4 (2018).
Yang, G., Mishra, M. & Perera, M. A. Multi-omics studies in historically excluded populations: the road to equity. Clin. Pharmacol. Ther. 113, 541–556 (2023).
Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Vandereyken, K., Sifrim, A., Thienpont, B. & Voet, T. Methods and applications for single-cell and spatial multi-omics. Nat. Rev. Genet. 24, 494–515 (2023).
Thomas, C. E. & Peters, U. Genomic landscape of cancer in racially and ethnically diverse populations. Nat. Rev. Genet. 12, 946625 (2024). This review highlights the need for more inclusive cancer genomics research across racial and ethnic groups to better understand population-specific genetic factors and reduce disparities in cancer outcomes.
Alderman, J. E. et al. Tackling algorithmic bias and promoting transparency in health datasets: the STANDING Together consensus recommendations. Lancet Digit. Health 7, e64–e88 (2025). This paper introduces the STANDING Together recommendations, developed through international consultation, to promote transparency and proactive evaluation of health datasets in artificial intelligence technologies, aiming to identify and reduce biases that could exacerbate health inequalities.
Mitchell, S., Potash, E., Barocas, S., D’Amour, A. & Lum, K. Algorithmic fairness: choices, assumptions, and definitions. Annu. Rev. Stat. Its Appl. 8, 141–163 (2021).
Pfohl, S. R. et al. A toolbox for surfacing health equity harms and biases in large language models. Nat. Med. 30, 3590–3600 (2024).
Hindorff, L. A. et al. Prioritizing diversity in human genomics research. Nat. Rev. Genet. 19, 175–185 (2018).
Mello, M. M. & Wolf, L. E. The Havasupai Indian tribe case — lessons for research involving stored biologic samples. N. Engl. J. Med. 363, 204–207 (2010).
Lee, S. S.-J. et al. “I don’t want to be Henrietta Lacks”: diverse patient perspectives on donating biospecimens for precision medicine research. Genet. Med. 21, 107–113 (2019).
Kaye, J. The tension between data sharing and the protection of privacy in genomics research. Annu. Rev. Genomics Hum. Genet. 13, 415–431 (2012).
Israel, B. A. et al. Community-based participatory research: a capacity-building approach for policy advocacy aimed at eliminating health disparities. Am. J. Public. Health 100, 2094–2102 (2010).
Rebbeck, T. R. et al. A framework for promoting diversity, equity, and inclusion in genetics and genomics research. JAMA Health Forum 3, e220603 (2022).
Pereira, L., Mutesa, L., Tindana, P. & Ramsay, M. African genetic diversity and adaptation inform a precision medicine agenda. Nat. Rev. Genet. 22, 284–306 (2021).
Mathieson, I. & Scally, A. What is ancestry? PLoS Genet. 16, e1008624 (2020).
Nielsen, R., Vaughn, A. H. & Deng, Y. Inference and applications of ancestral recombination graphs. Nat. Rev. Genet. 26, 47–58 (2025).
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Márquez-Luna, C. et al. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 12, 6052 (2021).
Busby, G. B. et al. Ancestry-specific polygenic risk scores are risk enhancers for clinical cardiovascular disease assessments. Nat. Commun. 14, 7105 (2023).
Fuat, A. et al. A polygenic risk score added to a QRISK®2 cardiovascular disease risk calculator demonstrated robust clinical acceptance and clinical utility in the primary care setting. Eur. J. Prev. Cardiol. 31, 716–722 (2024).
Samani, N. J. et al. Polygenic risk score adds to a clinical risk score in the prediction of cardiovascular disease in a clinical setting. Eur. Heart J. 45, 3152–3160 (2024).

Acknowledgements

The authors gratefully acknowledge the speakers and attendees of the joint Data Science for Health Equity workshops on ‘Challenges to statistical approaches for fairness in genomics’ and ‘Challenges to statistical approaches for health equity’ held in January 2022. They also thank C. Harbron, G. McVean, S. Walker, D. Deen, A. Shalek and H. Martin for comments on an earlier version of this manuscript. F.F. acknowledges the receipt of studentship awards from the Health Data Research UK-The Alan Turing Institute Wellcome PhD Programme (218529/Z/19/Z), and the Enrichment Scheme of The Alan Turing Institute under an Engineering & Physical Sciences Research Council grant (EP/N510129/1). K.K. was supported by the European Research Council under the European Union Horizon 2020 research and innovation programme (948561). N.C. was supported by US National Institutes of Health grants (R01HG013137, R01HG010480, U01HG011719).

Author information

These authors contributed equally: Karoline Kuchenbäcker, David Leslie, Nilanjan Chatterjee, Chris Holmes.

Authors and Affiliations

Department of Statistical Science, University College London, London, UK
Brieuc Lehmann & Leandra Bräuninger
The Alan Turing Institute, London, UK
Leandra Bräuninger, Fabian Falck, Smera Jayadeva, Michael Katell, Antonella Perini, Maxine Mackintosh & David Leslie
Genomics England, London, UK
Yoonsu Cho, Thuy Nguyen, Sam Tallman, Maxine Mackintosh, Matt Silver & Karoline Kuchenbäcker
Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
Yoonsu Cho
Department of Statistics, University of Oxford, Oxford, UK
Fabian Falck & Chris Holmes
Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Banjul, The Gambia
Matt Silver
Division of Psychiatry, University College London, London, UK
Karoline Kuchenbäcker
Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
Nilanjan Chatterjee
Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
Nilanjan Chatterjee
Nuffield Department of Medicine, University of Oxford, Oxford, UK
Chris Holmes

Authors

Brieuc Lehmann
Leandra Bräuninger
Yoonsu Cho
Fabian Falck
Smera Jayadeva
Michael Katell
Thuy Nguyen
Antonella Perini
Sam Tallman
Maxine Mackintosh
Matt Silver
Karoline Kuchenbäcker
David Leslie
Nilanjan Chatterjee
Chris Holmes

Contributions

B.L., L.B. and F.F. researched the literature. B.L. and L.B. wrote the article. All authors contributed substantially to discussions of the content, and reviewed and/or edited the manuscript.

Corresponding author

Correspondence to Brieuc Lehmann.

Ethics declarations

Competing interests

This manuscript was informed by a project commissioned by the Diverse Data (DD) initiative at Genomics England (GEL) in December 2022 to explore the use of statistical and machine learning methods to improve fairness and equity in genomics. K.K. is the Scientific Lead for DD. S.T., T.N. and Y.C. are Genomic Data Scientists at GEL. M.S. was the Lead Genomic Data Scientist for DD, and M.M. was the Programme Lead for DD. B.L. and L.B. were paid consultants to GEL for the project. M.M. is Director of One HealthTech, which provides the secretariat for the Data Science for Health Equity community, of which B.L. is also the co-founder. B.L. and L.B. have acted as consultants for Google DeepMind in relation to other research in this field; however, Google DeepMind was not involved in this project or this publication. F.F. is an employee and shareholder of Microsoft Corporation.

Peer review

Peer review information

Nature Reviews Genetics thanks Anna C. F. Lewis, Cheryl L. Willman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lehmann, B., Bräuninger, L., Cho, Y. et al. Methodological opportunities in genomic data analysis to advance health equity. Nat Rev Genet 26, 635–649 (2025). https://doi.org/10.1038/s41576-025-00839-w

Accepted: 27 March 2025
Published: 15 May 2025
Version of record: 15 May 2025
Issue date: September 2025
DOI: https://doi.org/10.1038/s41576-025-00839-w
