Abstract
The causes and consequences of inequities in genomic research and medicine are complex and widespread. However, it is widely acknowledged that underrepresentation of diverse populations in human genetics research risks exacerbating existing health disparities. Efforts to improve diversity are ongoing, but an often-overlooked source of inequity is the choice of analytical methods used to process, analyse and interpret genomic data. This choice can influence all areas of genomic research, from genome-wide association studies and polygenic score development to variant prioritization and functional genomics. New statistical and machine learning techniques to understand, quantify and correct for the impact of biases in genomic data are emerging within the wider genomic research and genomic medicine ecosystems. At this crucial time point, it is important to clarify where improvements in methods and practices can, or cannot, have a role in improving equity in genomics. Here, we review existing approaches to promote equity and fairness in statistical analysis for genomics, and propose future methodological developments that are likely to yield the most impact for equity.
Acknowledgements
The authors gratefully acknowledge the speakers and attendees of the joint Data Science for Health Equity workshops on ‘Challenges to statistical approaches for fairness in genomics’ and ‘Challenges to statistical approaches for health equity’ held in January 2022. They also thank C. Harbron, G. McVean, S. Walker, D. Deen, A. Shalek and H. Martin for comments on an earlier version of this manuscript. F.F. acknowledges the receipt of studentship awards from the Health Data Research UK-The Alan Turing Institute Wellcome PhD Programme (218529/Z/19/Z), and the Enrichment Scheme of The Alan Turing Institute under an Engineering & Physical Sciences Research Council grant (EP/N510129/1). K.K. was supported by the European Research Council under the European Union Horizon 2020 research and innovation programme (948561). N.C. was supported by US National Institutes of Health grants (R01HG013137, R01HG010480, U01HG011719).
Author information
Contributions
B.L., L.B. and F.F. researched the literature. B.L. and L.B. wrote the article. All authors contributed substantially to discussions of the content, and reviewed and/or edited the manuscript.
Ethics declarations
Competing interests
This manuscript was informed by a project commissioned by the Diverse Data (DD) initiative at Genomics England (GEL) in December 2022 to explore the use of statistical and machine learning methods to improve fairness and equity in genomics. K.K. is the Scientific Lead for DD. S.T., T.N. and Y.C. are Genomic Data Scientists at GEL. M.S. was the Lead Genomic Data Scientist for DD, and M.M. was the Programme Lead for DD. B.L. and L.B. were paid consultants to GEL for the project. M.M. is Director of One HealthTech, which provides the secretariat for the Data Science for Health Equity community, which B.L. is also the co-founder of. B.L. and L.B. have acted as consultants for Google DeepMind in relation to other research in this field; however, Google DeepMind was not involved in this project or this publication. F.F. is an employee and shareholder of Microsoft Corporation.
Peer review
Peer review information
Nature Reviews Genetics thanks Anna C. F. Lewis, Cheryl L. Willman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Catalogue of Bias: https://catalogofbias.org/
Our Future Health: https://ourfuturehealth.org.uk/
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lehmann, B., Bräuninger, L., Cho, Y. et al. Methodological opportunities in genomic data analysis to advance health equity. Nat Rev Genet 26, 635–649 (2025). https://doi.org/10.1038/s41576-025-00839-w
DOI: https://doi.org/10.1038/s41576-025-00839-w