Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects

Karczewski, Konrad J.; Gupta, Rahul; Kanai, Masahiro; Lu, Wenhan; Tsuo, Kristin; Wang, Ying; Walters, Raymond K.; Turley, Patrick; Callier, Shawneequa; Shah, Nirav N.; Baya, Nikolas; Palmer, Duncan S.; Goldstein, Jacqueline I.; Sarma, Gopal; Solomonson, Matthew; Cheng, Nathan; Bryant, Sam; Churchhouse, Claire; Cusick, Caroline M.; Poterba, Timothy; Compitello, John; King, Daniel; Zhou, Wei; Seed, Cotton; Finucane, Hilary K.; Daly, Mark J.; Neale, Benjamin M.; Atkinson, Elizabeth G.; Martin, Alicia R.

doi:10.1038/s41588-025-02335-7

Article
Published: 18 September 2025

Nature Genetics volume 57, pages 2408–2417 (2025)

Abstract

Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, people from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UK Biobank than previous efforts, to produce freely available summary statistics for 7,266 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci (P < 5 × 10⁻⁸) in the meta-analysis that were not found in the EUR genetic ancestry group alone, including new associations, for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside frequently asked questions that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations.

**Fig. 1: Genetic ancestry in the Pan-UKB.**

**Fig. 2: Pan-UKB GWAS resource facilitates multiancestry multitrait analyses.**

**Fig. 3: Heritability informs robustness of GWAS across ancestry–trait pairs.**

**Fig. 4: UKB-wide analysis improves genetic discovery.**

**Fig. 5: Differences in allele frequencies across ancestries yield new genetic discoveries.**

**Fig. 6: Meta-analysis identifies pleiotropic signals from non-European populations.**

Data availability

All data are available at https://pan.ukbb.broadinstitute.org/, as well as on the AWS Open Data program (https://aws.amazon.com/marketplace/pp/prodview-2efssfw2ezyq6). Sample metadata is available in the UKB showcase under https://biobank.ndph.ox.ac.uk/ukb/dset.cgi?id=2442.

Code availability

All analysis code is available via GitHub at https://github.com/atgu/ukbb_pan_ancestry and via Zenodo at https://doi.org/10.5281/zenodo.15420125 (ref. ⁷³).

References

Abul-Husn, N. S. & Kenny, E. E. Personalized medicine and the power of electronic health records. Cell 177, 58–69 (2019).
Zhou, W. et al. Global Biobank meta-analysis initiative: powering genetic discovery across human disease. Cell Genom. 2, 100192 (2022).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Morales, J. et al. A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog. Genome Biol. 19, 21 (2018).
Sinnott-Armstrong, N. et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat. Genet. 53, 185–194 (2021).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Chen, J. et al. The trans-ancestral genomic architecture of glycemic traits. Nat. Genet. 53, 840–860 (2021).
Hou, K. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55, 549–558 (2023).
Hu, S. et al. Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits. Nat. Genet. 57, 379–389 (2025).
SIGMA Type 2 Diabetes Consortium et al. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. JAMA 311, 2305–2314 (2014).
Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165 (2005).
Liu, Z. et al. Genetic architecture of the inflammatory bowel diseases across East Asian and European ancestries. Nat. Genet. 55, 796–806 (2023).
Miller, L. H., Mason, S. J., Clyde, D. F. & McGinniss, M. H. The resistance factor to Plasmodium vivax in blacks. The Duffy-blood-group genotype, FyFy. N. Engl. J. Med. 295, 302–304 (1976).
Genovese, G. et al. Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science 329, 841–845 (2010).
Ross, M. J. New insights into APOL1 and kidney disease in African children and Brazilians living with end-stage kidney disease. Kidney Int. Rep. 4, 908–910 (2019).
Genovese, G., Friedman, D. J. & Pollak, M. R. APOL1 variants and kidney disease in people of recent African ancestry. Nat. Rev. Nephrol. 9, 240–244 (2013).
Mägi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).
Asimit, J. L., Hatzikotoulas, K., McCarthy, M., Morris, A. P. & Zeggini, E. Trans-ethnic study design approaches for fine-mapping. Eur. J. Hum. Genet. 24, 1330–1336 (2016).
Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572 (2022).
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
Graff, M. et al. Discovery and fine-mapping of height loci via high-density imputation of GWASs in individuals of African ancestry. Am. J. Hum. Genet. 108, 564–582 (2021).
Luo, Y. et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ethnic fine-mapping in HIV host response. Nat. Genet. 53, 1504–1516 (2021).
Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Scutari, M., Mackay, I. & Balding, D. Using genetic distance to infer the accuracy of genomic prediction. PLoS Genet. 12, e1006288 (2016).
Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).
Ding, Y. et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774–781 (2023).
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Bigdeli, T. B. et al. Contributions of common genetic variants to risk of schizophrenia among individuals of African and Latino ancestry. Mol. Psychiatry 25, 2455–2467 (2020).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
National Academies of Sciences, Engineering, and Medicine. Using Population Descriptors in Genetics and Genomics Research: a New Framework for an Evolving Field (National Academies Press, 2023).
Ben-Eghan, C. et al. Don’t ignore genetic data from minority populations. Nature 585, 184–186 (2020).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Li, J. Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008).
Mathieson, I. & Scally, A. What is ancestry? PLoS Genet. 16, e1008624 (2020).
Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376, 250–252 (2022).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–976 (2011).
Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1110 (2013).
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Howrigan, D. Details and considerations of the UK Biobank GWAS. https://www.nealelab.is/blog/2017/9/11/details-and-considerations-of-the-uk-biobank-gwas (2017).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Pazokitoroudi, A. et al. Efficient variance components analysis across millions of genomes. Nat. Commun. 11, 4020 (2020).
Ghoussaini, M. et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res. 49, D1311–D1320 (2020).
Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. & Smoller, J. W. Pleiotropy in complex traits: challenges and strategies. Nat. Rev. Genet. 14, 483–495 (2013).
Sun, L., Wang, Z., Lu, T., Manolio, T. A. & Paterson, A. D. eXclusionarY: 10 years later, where are the sex chromosomes in GWASs? Am. J. Hum. Genet. 110, 903–912 (2023).
Rasooly, D. et al. Genome-wide association analysis and Mendelian randomization proteomics identify drug targets for heart failure. Nat. Commun. 14, 3826 (2023).
Gage, P. J., Suh, H. & Camper, S. A. Dosage requirement of Pitx2 for development of multiple organs. Development 126, 4643–4651 (1999).
Tümer, Z. & Bach-Holm, D. Axenfeld-Rieger syndrome and spectrum of PITX2 and FOXC1 mutations. Eur. J. Hum. Genet. 17, 1527–1539 (2009).
Berry, F. B. et al. Functional interactions between FOXC1 and PITX2 underlie the sensitivity to FOXC1 gene dose in Axenfeld–Rieger syndrome and anterior segment dysgenesis. Hum. Mol. Genet. 15, 905–919 (2006).
Gibson, G. Population genetics and GWAS: a primer. PLoS Biol. 16, e2005485 (2018).
Martin, A. R., Daly, M. J., Robinson, E. B., Hyman, S. E. & Neale, B. M. Predicting polygenic risk of psychiatric disorders. Biol. Psychiatry 86, 97–109 (2019).
Liu, D. J. & Leal, S. M. Estimating genetic effects and quantifying missing heritability explained by identified rare-variant associations. Am. J. Hum. Genet. 91, 585–596 (2012).
Sarnowski, C. et al. Impact of rare and common genetic variants on diabetes diagnosis by hemoglobin A1c in multi-ancestry cohorts: the Trans-Omics for Precision Medicine Program. Am. J. Hum. Genet. 105, 706–718 (2019).
Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2, 100210 (2022).
Atkinson, E. G. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet. 53, 195–204 (2021).
Zhou, W. et al. Global Biobank Meta-analysis Initiative: powering genetic discovery across human diseases. Cell Genom. 2, 100192 (2022).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
Breeyear, J. H. et al. Adaptive selection at G6PD and disparities in diabetes complications. Nat. Med. 30, 2480–2488 (2024).
All of Us Research Program Genomics Investigators. Genomic data in the All of Us research program. Nature 627, 340–346 (2024).
Panagiotou, O. A., Willer, C. J., Hirschhorn, J. N. & Ioannidis, J. P. A. The power of meta-analysis in genome-wide association studies. Annu. Rev. Genomics Hum. Genet. 14, 441–465 (2013).
Lin, D. Y. & Zeng, D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet. Epidemiol. 34, 60–66 (2010).
Balding, D. J. A tutorial on statistical methods for population association studies. Nat. Rev. Genet. 7, 781–791 (2006).
Witherspoon, D. J. et al. Genetic similarities within and between human populations. Genetics 176, 351–359 (2007).
Henn, B. M., Cavalli-Sforza, L. L. & Feldman, M. W. The great human expansion. Proc. Natl Acad. Sci. USA 109, 17758–17764 (2012).
Bamshad, M., Wooding, S., Salisbury, B. A. & Stephens, J. C. Deconstructing the relationship between genetics and race. Nat. Rev. Genet. 5, 598–609 (2004).
Meyer, M. N. et al. Wrestling with social and behavioral genomics: risks, potential benefits, and ethical responsibility. Hastings Cent. Rep. 53, S2–S49 (2023).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Karczewski, K. et al. atgu/ukbb_pan_ancestry: figure release v.1.0. Zenodo https://doi.org/10.5281/zenodo.15420124 (2025).
Zhang, X. et al. Whole genome sequencing analysis of body mass index identifies novel African ancestry-specific risk allele. Preprint at medRxiv https://doi.org/10.1101/2023.08.21.23293271 (2023).

Acknowledgements

We thank P. Kraft and J.-A. Dias for helpful discussions. This work was supported by the Novo Nordisk Foundation (NNF21SA0072102; K.J.K., B.M.N.), NIH grants R37MH107649 (B.M.N.), R00MH117229 (A.R.M.), K01MH121659 (E.G.A.), F31HL167378 (K.T.) and F30AG074507 (R.G.) and BroadIgnite funding (A.R.M.).

Author information

These authors contributed equally: Konrad J. Karczewski, Rahul Gupta, Masahiro Kanai.

Authors and Affiliations

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Konrad J. Karczewski, Rahul Gupta, Masahiro Kanai, Wenhan Lu, Kristin Tsuo, Ying Wang, Nikolas Baya, Duncan S. Palmer, Jacqueline I. Goldstein, Gopal Sarma, Matthew Solomonson, Nathan Cheng, Sam Bryant, Claire Churchhouse, Caroline M. Cusick, Timothy Poterba, John Compitello, Daniel King, Wei Zhou, Cotton Seed, Hilary K. Finucane, Mark J. Daly, Benjamin M. Neale & Alicia R. Martin
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Konrad J. Karczewski, Rahul Gupta, Masahiro Kanai, Wenhan Lu, Kristin Tsuo, Ying Wang, Raymond K. Walters, Nikolas Baya, Duncan S. Palmer, Jacqueline I. Goldstein, Gopal Sarma, Matthew Solomonson, Nathan Cheng, Sam Bryant, Claire Churchhouse, Caroline M. Cusick, Timothy Poterba, John Compitello, Daniel King, Wei Zhou, Cotton Seed, Hilary K. Finucane, Mark J. Daly, Benjamin M. Neale & Alicia R. Martin
Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Konrad J. Karczewski & Benjamin M. Neale
Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
Rahul Gupta & Kristin Tsuo
Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
Masahiro Kanai
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Wenhan Lu, Kristin Tsuo, Ying Wang, Raymond K. Walters, Nikolas Baya, Duncan S. Palmer, Jacqueline I. Goldstein, Gopal Sarma, Nathan Cheng, Sam Bryant, Claire Churchhouse, Caroline M. Cusick, Timothy Poterba, John Compitello, Daniel King, Wei Zhou, Cotton Seed, Hilary K. Finucane, Mark J. Daly, Benjamin M. Neale & Alicia R. Martin
Department of Economics, University of Southern California, Los Angeles, CA, USA
Patrick Turley
Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
Patrick Turley
Department of Clinical Research and Leadership, The George Washington University, Washington, DC, USA
Shawneequa Callier
Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Shawneequa Callier
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Nirav N. Shah & Elizabeth G. Atkinson
Institute for Molecular Medicine, Helsinki, Finland
Mark J. Daly

Contributions

K.J.K. developed pipelines and performed association analysis, variant and association QC, and summary statistics analyses. K.J.K., R.G., M.K., W.L., N.N.S. and A.R.M. generated figures. R.G. performed heritability analysis and QC. M.K. created LD matrices and LD scores, and performed locus definition analysis, meta-analysis and fine-mapping. W.L. performed additional association analyses. K.T. and N.B. performed LD analyses including clumping. Y.W. performed polygenicity analyses. R.K.W. performed sample QC and advised on association analyses. P.T., S.C., E.G.A. and A.R.M. wrote the FAQs. N.N.S. and E.G.A. performed Tractor analyses. D.S.P. and E.G.A. performed phenotype curation, processing and QC. J.I.G., T.P., J.C., D.K. and C.S. built the Hail infrastructure that enabled association analysis. G.S. curated prescription data. M.S. developed the website. N.C. performed initial comparisons of meta- and mega-analysis. S.B., C.C. and C.M.C. provided data and project management. W.Z. aided in development of association methods. A.R.M., H.K.F., M.J.D., B.M.N. and E.G.A. provided oversight and direction of the project. A.R.M. performed ancestry assignment and pruning analysis. A.R.M., E.G.A., B.M.N. and K.J.K. conceived the study. K.J.K., R.G., M.K., E.G.A. and A.R.M. wrote the manuscript. All authors reviewed and approved the manuscript.

Corresponding author

Correspondence to Alicia R. Martin.

Ethics declarations

Competing interests

K.J.K. is a consultant for Tome Biosciences, AlloDx and Vor Biosciences, and a member of the scientific advisory board of Nurture Genomics. M.J.D. is a founder of Maze Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and Neumora. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Paul Auer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Global and subcontinental PCA.

a, Global PCA projection of UKBB into PCs 1-2 defined by HGDP and 1000 Genomes Project reference panel, which are shown in colored dots on top of UKBB in black. b, Global PCA density plot of UKBB points only, excluding reference panel. c, Map of HGDP, 1000 Genomes Project, and AGVP reference used to define AFR PC space. d, PCs 1-2 within AFR, reference panel colored, UKBB in grey. Inset shows density of UKBB samples assigned to AFR using a random forest. In c and d, colors and shapes are consistent across panels. e, Map of HGDP and 1000 Genomes Project reference used to define CSA PC space. f, PCs 1-2 within CSA, reference panel colored, UKBB in grey. Inset shows density of UKBB samples assigned to CSA using a random forest. In e and f, colors and shapes are consistent across panels. g, Map of HGDP and 1000 Genomes Project reference used to define EAS PC space. h, PCs 1-2 within EAS, reference panel colored, UKBB in grey. Inset shows density of UKBB samples assigned to EAS using a random forest. In g and h, colors and shapes are consistent across panels. i, Map of HGDP and 1000 Genomes Project reference used to define EUR PC space. j, PCs 1-2 within EUR, reference panel colored, UKBB in grey. Inset shows density of UKBB samples assigned to EUR using a random forest. In i and j, colors and shapes are consistent across panels. k, Map of HGDP and 1000 Genomes Project reference used to define MID PC space. l, PCs 1-2 within MID, reference panel colored, UKBB in grey. Inset shows density of UKBB samples assigned to MID using a random forest. In k and l, colors and shapes are consistent across panels.

Extended Data Fig. 2 Heritability informs the robustness of GWAS across ancestry-trait pairs.

a, Heritability estimates are generally concordant in the EUR genetic ancestry group across 64 pilot phenotypes (Supplementary Table 9) and two statistical methods. RHE-mc uses a randomized multi-component version of classical Haseman-Elston regression with a genetic relatedness matrix⁵³, whereas S-LDSC uses GWAS summary statistics⁵². For binary phenotypes, heritability estimates are reported on the liability scale. All pilot phenotypes are shown, except for sepsis, which had negative heritability estimates by both methods. The dotted line shows y = x, while the dashed line is a fitted linear regression (slope = 0.87, intercept = 0.05, P = 7 × 10⁻¹³). Error bars indicate one standard error. b, Across the same non-EUR ancestry-trait pairs, heritability estimates from RHE-mc have higher z scores than those from S-LDSC because of their smaller standard errors. The dashed line at z = 4 was used as a QC filter. c, As in Fig. 2b, without filtering to phenotypes passing QC, but instead only filtering to EUR z > 4 and defined heritability in both genetic ancestry groups. Dotted line shows y = x and dashed line shows York regression fit (n phenotypes = 147, slope = 0.66, intercept = 0.17, P < 10⁻¹⁰⁰). Points indicate the point estimate of heritability, and error bars indicate one standard error.

Extended Data Fig. 3 Heritability summaries across trait types and genetic ancestry groups.

a, The confidence metrics (heritability z score) across traits (columns) and ancestry groups (rows) are shown for the final heritability metrics used (S-LDSC for EUR, otherwise RHE-mc). Dashed line indicates inclusion criteria (z ≥ 4). b, The mean observed heritability (h²) is plotted by ancestry group and trait type. For ancestry groups with smaller sample sizes, heritabilities are likely inflated due to a combination of residual stratification and winner’s curse, as only significantly heritable phenotypes in each ancestry group are shown. Error bars are standard deviations of the distribution of the heritability point estimates.
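
The inclusion rule in panel a can be sketched in a few lines of Python. This is a minimal illustration, assuming the z score is the heritability point estimate divided by its standard error. The `H2Estimate` class, the function names and the example numbers are hypothetical, and the study's wider QC framework is not reproduced here.

```python
from dataclasses import dataclass

Z_THRESHOLD = 4.0  # inclusion criterion from the caption: z >= 4


@dataclass(frozen=True)
class H2Estimate:
    """Heritability point estimate and standard error (hypothetical container)."""
    h2: float
    se: float

    @property
    def z(self) -> float:
        # Assumption: z score = point estimate / standard error.
        return self.h2 / self.se


def final_method(ancestry: str) -> str:
    """Method used for the final metric: S-LDSC for EUR, otherwise RHE-mc."""
    return "S-LDSC" if ancestry == "EUR" else "RHE-mc"


def passes_z_filter(estimates: dict, ancestry: str) -> bool:
    """Apply the z >= 4 rule to the estimate from the ancestry's final method."""
    return estimates[final_method(ancestry)].z >= Z_THRESHOLD


# Made-up numbers, for illustration only.
example = {
    "S-LDSC": H2Estimate(h2=0.20, se=0.04),  # z is about 5
    "RHE-mc": H2Estimate(h2=0.10, se=0.05),  # z is about 2
}
print(passes_z_filter(example, "EUR"))  # evaluated on the S-LDSC estimate
print(passes_z_filter(example, "AFR"))  # evaluated on the RHE-mc estimate
```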

Extended Data Fig. 4 Improved identification of associations by EFO category.

Number (left) and percentage (right) of known and novel variants identified in this study compared to the GWAS Catalog across EFO categories.

Extended Data Fig. 5 GWAS hits near haploinsufficient genes.

a, The percentage of novel associations by gene category. 66% of haploinsufficient genes have a novel significant hit nearby, compared to 34% of all genes. b, Locuszoom plots of a 1-Mb region around rs1379871 (purple diamond; DMD), for whole body fat mass (P = 1.84 × 10⁻⁴¹; n = 431,792). The −log₁₀(P-value) is plotted along chromosomal position, with neighboring variants colored by sample-size weighted LD (with lead SNP) for ancestries included in meta-analysis (gray: LD not defined for at least one ancestry group). This variant has recently been identified in a larger study of BMI⁷⁴.

Extended Data Fig. 6 Comparison of meta-analysis and EUR summary statistics.

a, As in Fig. 3c, the P-value in EUR is plotted compared to the P-value in the meta-analysis, as a density plot to indicate the relative number of points in each region of the plot. Three quadrants are highlighted for significant in meta-analysis only (green), both meta-analysis and EUR (purple), and EUR-only (blue). b, Summaries and meta-data of the variants in each of these three quadrants are shown. Heterogeneous is defined as Cochran’s Q P < 0.01, low INFO score is defined as INFO < 0.9, and low quality is defined as failing quality filters from gnomAD or allele frequency significantly differing between gnomAD and Pan-UKB in at least one ancestry group (see Supplementary Note, QC of summary statistics). Common is defined as frequency > 1%.
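
The threshold-based flags defined in panel b can be expressed directly in code. The sketch below assumes a fixed-effects, inverse-variance-weighted meta-analysis when computing Cochran's Q, which this caption does not specify. The function names and the example effect sizes are hypothetical, and the gnomAD-based low-quality flag is not shown. The chi-square tail probability uses only the standard library, via the recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: df = 1 (a = 1/2) or df = 2 (a = 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def cochran_q(betas, ses):
    """Return (meta_beta, meta_se, Q, P) under inverse-variance weighting."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching betas and SEs from at least two groups")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    meta_beta = sum(w * b for w, b in zip(weights, betas)) / total
    q = sum(w * (b - meta_beta) ** 2 for w, b in zip(weights, betas))
    return meta_beta, math.sqrt(1.0 / total), q, chi2_sf(q, len(betas) - 1)


def variant_flags(q_pvalue: float, info: float, freq: float) -> dict:
    """Thresholds taken from the caption."""
    return {
        "heterogeneous": q_pvalue < 0.01,  # Cochran's Q P < 0.01
        "low_info": info < 0.9,            # INFO < 0.9
        "common": freq > 0.01,             # frequency > 1%
    }


# Hypothetical per-ancestry effects for one variant.
_, _, q_stat, q_p = cochran_q([0.0, 1.0], [0.1, 0.1])
print(variant_flags(q_p, info=0.95, freq=0.2))
```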

Extended Data Fig. 7 Fine-mapping of the G6PD locus.

a,b, Fine-mapping results for rs1050828 (G6PD) in AFR (a) and meta-analysis (b). a, AFR fine-mapping results highlight the missense variant (rs1050828) in a credible set, with a second independent signal for some phenotypes. b, Meta-analysis fine-mapped results show instability as the major signal at rs1050828 is discovered in a group with a relatively small sample size, which results in a small contribution to the LD panel and thus, poor performance in fine-mapping. Detailed results are shown in Supplementary Table 12.

Extended Data Fig. 8 Manhattan and QQ plot comparison for SAIGE and Tractor GWAS for mean corpuscular hemoglobin concentration.

a,b, Original GWAS performed using SAIGE for AFR. c-f, Among AFR individuals, Tractor GWAS results are shown for AFR haplotypes (c,d) and EUR haplotypes (e,f). QQ plots in b, d and f include bands indicating the confidence bounds based on the normal distribution.

Extended Data Table 1 Comparison of λ₁₀₀₀ and λ_GC for five phenotypes across three association study paradigms
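As a rough illustration of the two inflation statistics named in this table title, the sketch below computes λ_GC from a list of P-values and rescales it to λ₁₀₀₀. It assumes the conventional definitions (median 1-d.f. χ² statistic divided by its null median, and rescaling to 1,000 cases and 1,000 controls); the exact definitions used for the table are not restated here, and the function names are hypothetical.

```python
from statistics import NormalDist, median

_ND = NormalDist()
# Null median of a chi-square statistic with 1 d.f. (about 0.4549)
CHI2_1DF_MEDIAN = _ND.inv_cdf(0.25) ** 2


def lambda_gc(p_values):
    """Genomic inflation factor: observed median chi-square over null median.

    P-values must lie in (0, 1]; each is converted to a 1-d.f. chi-square
    statistic via the squared normal quantile of p / 2.
    """
    chi2 = [_ND.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda_GC to a study of 1,000 cases and 1,000 controls
    (conventional case-control rescaling; an assumption in this sketch)."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale
```

Because the excess over 1 shrinks in proportion to sample size under this rescaling, λ₁₀₀₀ allows inflation to be compared across phenotypes and ancestry groups with very different numbers of samples.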


Extended Data Table 2 Number of significant associations for height in AFR and CSA


Supplementary information

Supplementary Information

Supplementary Note, Figs. 1–34 and Tables 1–12.

Reporting Summary

Peer Review File

Supplementary Data 1

Assigned genetic ancestry labels correlate with the country of birth or known migration events. The number of people by genetic ancestry and country of birth (non-UK) are shown.

Supplementary Data 2

Summary of all phenotypes in Pan-UKB. Phenotypes are keyed by five keys: trait type, phenocode, pheno_sex, coding and modifier. Where available, description and coding_description are provided from the UKB showcase. For each ancestry group, we include the number of cases, heritability estimates (observed, liability, standard errors and z scores), whether the phenotype passes QC, and lambda GC. We provide QC flags, whether the phenotype is in the maximal independent set, and filename information, including a download link for the phenotype-specific file and tabix index on Amazon S3 and md5 checksums for each.

Supplementary Data 3

Summary of all heritability metrics. Phenotypes are keyed as in Supplementary Data 2. For each ancestry group, we provide heritability estimates (observed, liability, standard errors and z scores) for LDSC and S-LDSC, and for ancestry groups other than EUR, also RHE-mc, as well as details of QC flags.

Supplementary Data 4

Pairwise genetic correlations. Genetic correlations (rg) from S-LDSC are computed for pairs of 528 phenotypes (phenotype_code_1 and phenotype_code_2) using summary statistics from EUR.

Supplementary Data 5

Pairwise phenotypic correlations. Covariates were regressed out from each of the 452 high-quality phenotypes, and pairwise correlations (entry) were then computed between the resulting residuals for each pair of phenotypes i (phenotype identifier in i_data) and j (identifier in j_data). The correlations for all phenotypes are available at gs://ukb-diverse-pops-public/misc/pairwise/pairwise_correlations_regressed.txt.bgz.

Supplementary Data 6

Polygenicity estimates. Polygenicity estimates (mean and s.d.) from SBayesS for 451 phenotypes, along with convergence criteria (R_GelmanRubin).

Supplementary Data 7

Summary statistics for key loci across GWAS methods. SAIGE AFR and SAIGE EUR refer to the SAIGE analyses performed on the AFR and EUR genetically inferred ancestry groups of UKB. Tractor AFR and Tractor EUR indicate the Tractor GWAS conducted on the AFR or EUR haplotype tracts, respectively, within the AFR group. Variants are filtered as described above in Tractor GWAS analysis.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Karczewski, K.J., Gupta, R., Kanai, M. et al. Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects. Nat Genet 57, 2408–2417 (2025). https://doi.org/10.1038/s41588-025-02335-7


Received: 13 March 2024
Accepted: 13 August 2025
Published: 18 September 2025
Version of record: 18 September 2025
Issue date: October 2025
DOI: https://doi.org/10.1038/s41588-025-02335-7
