  • Article
  • Published:

Noncoding rare variant associations with blood traits in 166,740 UK Biobank genomes

Abstract

Large biobanks with whole-genome sequencing (WGS) now enable the association of noncoding rare variants with complex human traits. Given that >98% of the genome is available for exploration, the selection of noncoding variants remains a critical yet unresolved challenge in these analyses. Here we leverage knowledge of blood gene regulation and deleteriousness scores to select noncoding variants pertinent for association with blood-related traits. Integrating WGS and 42 blood cell count and biomarker measurements for 166,740 UK Biobank samples, we perform variant collapsing tests, identifying hundreds of gene–trait associations involving noncoding variants. However, we demonstrate that most of these noncoding rare variant associations (1) reproduce associations known from previous studies and (2) are driven by linkage disequilibrium between nearby common and rare variants. This study underscores the prevailing challenges in rare variant analysis and the need for caution when interpreting noncoding rare variant association results.
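
The variant-selection and collapsing step summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, whose code is in the GitHub repository listed under Code availability. The `Variant` record, the list of gene-linked regulatory element intervals (standing in for links such as ABC, promoter capture Hi-C or CRDs), the MAF cutoff of 0.01, the CADD PHRED cutoff of 15 and the "any qualifying allele" collapsing rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant record; not a real file format."""
    pos: int           # genomic position on the gene's chromosome
    maf: float         # minor allele frequency in the cohort
    cadd_phred: float  # CADD PHRED-scaled deleteriousness score


def select_qualifying(variants, elements, max_maf=0.01, min_cadd=15.0):
    """Indices of rare, high-CADD variants inside any regulatory element
    linked to the gene. `elements` is a list of inclusive (start, end)
    intervals; both thresholds are illustrative defaults."""
    keep = []
    for i, v in enumerate(variants):
        in_element = any(start <= v.pos <= end for start, end in elements)
        if in_element and v.maf < max_maf and v.cadd_phred >= min_cadd:
            keep.append(i)
    return keep


def collapse(genotypes, keep):
    """Collapse the selected columns of an (individuals x variants) matrix of
    0/1/2 allele counts into a carrier indicator: 1 if an individual carries
    at least one qualifying allele, else 0."""
    if not keep:
        return np.zeros(genotypes.shape[0], dtype=int)
    return (genotypes[:, keep].sum(axis=1) > 0).astype(int)


# Toy example: two hypothetical regulatory elements linked to one gene.
elements = [(1_000, 1_200), (2_000, 2_100)]
variants = [
    Variant(pos=1_050, maf=0.002, cadd_phred=22.0),  # qualifies
    Variant(pos=1_080, maf=0.150, cadd_phred=25.0),  # too common
    Variant(pos=3_000, maf=0.001, cadd_phred=30.0),  # outside all elements
    Variant(pos=2_010, maf=0.004, cadd_phred=5.0),   # CADD below cutoff
]
genotypes = np.array([[1, 0, 0, 0],
                      [0, 2, 1, 1],
                      [0, 1, 0, 0]])

keep = select_qualifying(variants, elements)
print(keep)                        # [0]
print(collapse(genotypes, keep))   # [1 0 0]
```

The resulting carrier indicator is the per-individual score that a collapsing test then regresses the trait on; summing allele counts instead of taking an indicator is an alternative collapsing rule.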

Fig. 1: Schematic representation of the study approach.
Fig. 2: Manhattan plot for gene-based rare variant associations across 42 traits and multiple annotations.
Fig. 3: Known gene–trait associations.
Fig. 4: Rare variant association type and conditioning analysis.
Fig. 5: Replication of discovered gene–trait associations.
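
The conditioning analysis named in Fig. 4 asks whether a rare variant association persists once nearby common variants are included as covariates. The following minimal Python/NumPy sketch simulates a hypothetical locus in which only a common variant affects the trait, but rare-variant carriers are drawn from homozygotes for its effect allele (an assumed, exaggerated form of linkage disequilibrium). It then tests the burden indicator with and without the common variant in an ordinary least-squares model. The simulation settings, the `ols_test` helper and its normal-approximation p-value are assumptions for illustration, not the study's models or covariates.

```python
import math

import numpy as np


def ols_test(y, X, col):
    """Fit y ~ X by ordinary least squares and return (beta, se, p) for column
    `col`. The two-sided p-value uses a normal approximation to the t statistic,
    which is only reasonable for large samples."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - k)   # residual variance
    se = math.sqrt(sigma2 * xtx_inv[col, col])
    t = beta[col] / se
    p = math.erfc(abs(t) / math.sqrt(2))       # P(|Z| > |t|) for Z ~ N(0, 1)
    return float(beta[col]), se, p


rng = np.random.default_rng(1)
n = 5_000
common = rng.binomial(2, 0.3, size=n).astype(float)   # common-variant allele counts
# Hypothetical LD: rare carriers are drawn only from effect-allele homozygotes.
burden = ((common == 2) & (rng.random(n) < 0.5)).astype(float)
trait = 0.5 * common + rng.normal(size=n)             # only the common variant is causal

ones = np.ones(n)
marginal = ols_test(trait, np.column_stack([ones, burden]), col=1)
conditional = ols_test(trait, np.column_stack([ones, burden, common]), col=1)
print(f"marginal:    beta={marginal[0]:.3f}  p={marginal[2]:.1e}")
print(f"conditional: beta={conditional[0]:.3f}  p={conditional[2]:.1e}")
```

In this setup the marginal burden effect is strongly significant but shrinks towards zero after conditioning on the common variant, which is the pattern the abstract describes for most of the noncoding rare variant associations.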

Data availability

The lists of gene–trait associations identified here are available as Supplementary Tables 1–12. These lists and other source data are available via Zenodo at https://doi.org/10.5281/zenodo.15546894 (ref. 68). The WGS data and individual phenotypes used for association testing can be accessed via the UKBB Research Analysis Platform: https://ukbiobank.dnanexus.com/landing. This platform is open to researchers who are listed as collaborators on UKBB-approved access applications. Public regulatory annotation datasets used include: the ABC models, available at https://www.engreitzlab.org/resources; promoter capture Hi-C data, obtained from ref. 28 and accessible at https://osf.io/u8tzp; and CRDs, obtained from the original publications (refs. 29,30). CADD deleteriousness scores are available at https://cadd.gs.washington.edu. The known gene–trait associations used are available in the GWAS Catalog (https://www.ebi.ac.uk/gwas) and the Genebass database (https://app.genebass.org). Source data are provided with this paper.

Code availability

Code to reproduce the analyses and plots is available via GitHub at https://github.com/diogomribeiro/noncoding_rarevariant and via Zenodo at https://doi.org/10.5281/zenodo.15546894.

References

  1. Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).

  2. Rubinacci, S., Delaneau, O. & Marchini, J. Genotype imputation using the positional Burrows Wheeler transform. PLoS Genet. 16, e1009049 (2020).

  3. Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annu. Rev. Genom. Hum. Genet. 10, 387–406 (2009).

  4. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

  5. Bocher, O. & Génin, E. Rare variant association testing in the non-coding genome. Hum. Genet. 139, 1345–1362 (2020).

  6. Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).

  7. Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–E464 (2014).

  8. Wang, Q. et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).

  9. Momozawa, Y. & Mizukami, K. Unique roles of rare variants in the genetics of complex diseases in humans. J. Hum. Genet. 66, 11–23 (2021).

  10. Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).

  11. Hall, S. S. Genetics: a gene of rare effect. Nature 496, 152–155 (2013).

  12. Sabatine, M. S. et al. Evolocumab and clinical outcomes in patients with cardiovascular disease. N. Engl. J. Med. 376, 1713–1722 (2017).

  13. Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom. 2, 100168 (2022).

  14. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

  15. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  16. Hawkes, G. et al. Whole genome sequencing analysis identifies rare, large-effect noncoding variants and regulatory regions associated with circulating protein levels. Nat. Genet. 57, 626–634 (2025).

  17. Hawkes, G. et al. Whole-genome sequencing in 333,100 individuals reveals rare non-coding single variant and aggregate associations with height. Nat. Commun. 15, 8549 (2024).

  18. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  19. Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).

  20. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).

  21. All of Us Research Program Investigators et al. The ‘All of Us’ research program. N. Engl. J. Med. 381, 668–676 (2019).

  22. Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).

  23. Claringbould, A. & Zaugg, J. B. Enhancers in disease: molecular basis and emerging treatment strategies. Trends Mol. Med. 27, 1060–1073 (2021).

  24. Ribeiro, D. M. et al. The molecular basis, genetic control and pleiotropic effects of local gene co-expression. Nat. Commun. 12, 4842 (2021).

  25. Hoellinger, T. et al. Enhancer/gene relationships: need for more reliable genome-wide reference sets. Front. Bioinform. 3, 1092853 (2023).

  26. Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Rep. 21, 1077–1088 (2017).

  27. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  28. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).

  29. Delaneau, O. et al. Chromatin three-dimensional interactions mediate genetic effects on gene expression. Science 364, eaat8266 (2019).

  30. Avalos, D. et al. Genetic variation in cis-regulatory domains suggests cell type-specific regulatory mechanisms in immunity. Commun. Biol. 6, 335 (2023).

  31. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  32. Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).

  33. Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  34. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).

  35. Gazal, S. et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat. Genet. 54, 827–836 (2022).

  36. Dey, K. K. et al. SNP-to-gene linking strategies reveal contributions of enhancer-related and candidate master-regulator genes to autoimmune disease. Cell Genom. 2, 100145 (2022).

  37. Bocher, O. et al. Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score. PLoS Genet. 18, e1009923 (2022).

  38. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).

  39. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).

  40. Ribeiro, D. M., Ziyani, C. & Delaneau, O. Shared regulation and functional relevance of local gene co-expression revealed by single cell analysis. Commun. Biol. 5, 876 (2022).

  41. Hambleton, S. et al. IRF8 mutations and human dendritic-cell immunodeficiency. N. Engl. J. Med. 365, 127–138 (2011).

  42. Tamura, T., Kurotaki, D. & Koizumi, S.-I. Regulation of myelopoiesis by the transcription factor IRF8. Int. J. Hematol. 101, 342–351 (2015).

  43. Karczewski, K. J. et al. Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects. Preprint at bioRxiv https://doi.org/10.1101/2024.03.13.24303864 (2024).

  44. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).

  45. Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231 (2020).

  46. Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).

  47. Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).

  48. Li, Z. et al. Dynamic scan procedure for detecting rare-variant association regions in whole-genome sequencing studies. Am. J. Hum. Genet. 104, 802–814 (2019).

  49. Yang, Y. et al. eSCAN: scan regulatory regions for aggregate association testing using whole-genome sequencing data. Brief. Bioinform. 23, bbab497 (2022).

  50. Li, Z. et al. A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nat. Methods 19, 1599–1611 (2022).

  51. Li, X. et al. Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nat. Genet. 55, 154–164 (2023).

  52. Hu, Y. et al. Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: the NHLBI TOPMed program. Am. J. Hum. Genet. 108, 874–893 (2021).

  53. Gaynor, S. M. et al. Yield of genetic association signals from genomes, exomes, and imputation in the UK Biobank. Nat. Genet. 56, 2345–2351 (2024).

  54. Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).

  55. Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).

  56. Dickson, S. P., Wang, K., Krantz, I., Hakonarson, H. & Goldstein, D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).

  57. Anderson, C. A., Soranzo, N., Zeggini, E. & Barrett, J. C. Synthetic associations are unlikely to account for many common disease genome-wide association signals. PLoS Biol. 9, e1000580 (2011).

  58. Wray, N. R., Purcell, S. M. & Visscher, P. M. Synthetic associations created by rare variants do not explain most GWAS results. PLoS Biol. 9, e1000579 (2011).

  59. Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).

  60. Dutta, D. et al. Meta-MultiSKAT: multiple phenotype meta-analysis for region-based association test. Genet. Epidemiol. 43, 800–814 (2019).

  61. Smedley, D. et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 99, 595–606 (2016).

  62. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).

  63. Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J. D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214–220 (2016).

  64. Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).

  65. Lee, B. T. et al. The UCSC Genome Browser database: 2022 update. Nucleic Acids Res. 50, D1115–D1122 (2022).

  66. Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).

  67. Sollis, E. et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).

  68. Ribeiro, D. diogomribeiro/noncoding_rarevariant: v2. Zenodo https://doi.org/10.5281/zenodo.15546894 (2025).

Acknowledgements

This work was funded by a Swiss National Science Foundation project grant (no. PP00P3_176977) to O.D. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank Z. Kutalik for fruitful discussions and support.

Author information

Contributions

D.M.R. and R.J.H. performed the experiments. D.M.R. analyzed the data and wrote the manuscript with input from O.D. D.M.R., S.R. and O.D. conceived, designed and managed the study.

Corresponding authors

Correspondence to Diogo M. Ribeiro or Olivier Delaneau.

Ethics declarations

Competing interests

O.D. is a current employee of Regeneron Genetics Center, which is a subsidiary of Regeneron Pharmaceuticals. D.M.R. currently works as a contractor for F. Hoffmann-La Roche AG. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Andrew Johnson, Jacob Ulirsch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–20.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–12.

Source data

Source Data Fig. 2

Statistical source data for Fig. 2.

Source Data Fig. 3

Statistical source data for Fig. 3.

Source Data Fig. 4

Statistical source data for Fig. 4.

Source Data Fig. 5

Statistical source data for Fig. 5.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ribeiro, D.M., Hofmeister, R.J., Rubinacci, S. et al. Noncoding rare variant associations with blood traits in 166,740 UK Biobank genomes. Nat Genet 57, 2146–2155 (2025). https://doi.org/10.1038/s41588-025-02288-x

  • DOI: https://doi.org/10.1038/s41588-025-02288-x
