Abstract
Large biobanks with whole-genome sequencing (WGS) now enable the association of noncoding rare variants with complex human traits. Given that >98% of the genome is available for exploration, the selection of noncoding variants remains a critical yet unresolved challenge in these analyses. Here we leverage knowledge of blood gene regulation and deleteriousness scores to select noncoding variants pertinent for association with blood-related traits. Integrating WGS and 42 blood cell count and biomarker measurements for 166,740 UK Biobank samples, we perform variant collapsing tests, identifying hundreds of gene–trait associations involving noncoding variants. However, we demonstrate that most of these noncoding rare variant associations (1) reproduce associations known from previous studies and (2) are driven by linkage disequilibrium between nearby common and rare variants. This study underscores the prevailing challenges in rare variant analysis and the need for caution when interpreting noncoding rare variant association results.
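The confounding described in point (2) can be illustrated with a toy simulation. The sketch below is not the pipeline used in this study; the sample size, allele frequencies, effect size and function names are all hypothetical choices made for illustration. It simulates a trait influenced only by a common variant, lets rare qualifying variants occur preferentially in carriers of the common allele (mimicking linkage disequilibrium), collapses the rare variants into a carrier indicator, and compares the collapsing test before and after conditioning on the common variant.

```python
import math
import random


def residualize(y, x):
    """Residuals of a least-squares regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def t_stat(y, x, n_covariates=0):
    """t statistic for the slope of y on x, computed from their correlation."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    dof = n - 2 - n_covariates  # one extra degree of freedom lost per covariate
    return r * math.sqrt(dof / (1.0 - r * r))


def simulate(n=50000, seed=1):
    """Toy cohort: all numeric settings here are illustrative assumptions."""
    rng = random.Random(seed)
    common, burden, trait = [], [], []
    for _ in range(n):
        # Common variant dosage (0, 1 or 2) with allele frequency 0.2.
        g = (rng.random() < 0.2) + (rng.random() < 0.2)
        # Collapsed rare-variant carrier status; carriers are enriched among
        # individuals with the common allele, which induces LD-like correlation.
        carrier = 1 if rng.random() < 0.001 + 0.03 * g else 0
        # Only the common variant affects the trait; rare variants have no effect.
        y = 0.5 * g + rng.gauss(0.0, 1.0)
        common.append(g)
        burden.append(carrier)
        trait.append(y)
    return common, burden, trait


common, burden, trait = simulate()

# Marginal collapsing test: trait against rare-variant carrier status.
t_marginal = t_stat(trait, burden)

# Conditional test: remove the common variant from both trait and burden
# (Frisch-Waugh-Lovell), then test the residuals against each other.
t_conditional = t_stat(
    residualize(trait, common), residualize(burden, common), n_covariates=1
)

print(f"marginal t = {t_marginal:.2f}, conditional t = {t_conditional:.2f}")
```

In this setup the marginal statistic is expected to be large even though no rare variant is causal, whereas the conditional statistic should fall back towards zero. This is the pattern expected when a rare variant signal is explained by a nearby common variant.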
Data availability
The lists of gene–trait associations identified here are available as Supplementary Tables 1–12. These lists and other source data are available via Zenodo at https://doi.org/10.5281/zenodo.15546894 (ref. 68). The WGS data and individual phenotypes used for association testing can be accessed via the UKBB Research Analysis Platform: https://ukbiobank.dnanexus.com/landing. This platform is open to researchers who are listed as collaborators on UKBB-approved access applications. Public regulatory annotation datasets used include: the ABC models, available at https://www.engreitzlab.org/resources; promoter capture Hi-C data, obtained from ref. 28 and accessible at https://osf.io/u8tzp; and CRDs, obtained from the original publications (refs. 29 and 30). CADD scores of deleteriousness are available at https://cadd.gs.washington.edu. Known gene–trait associations used are available in the GWAS catalog (https://www.ebi.ac.uk/gwas) and Genebass (https://app.genebass.org) databases. Source data are provided with this paper.
Code availability
Code to reproduce the analyses and plots is available via GitHub at https://github.com/diogomribeiro/noncoding_rarevariant and via Zenodo at https://doi.org/10.5281/zenodo.15546894.
References
Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
Rubinacci, S., Delaneau, O. & Marchini, J. Genotype imputation using the positional Burrows Wheeler transform. PLoS Genet. 16, e1009049 (2020).
Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annu. Rev. Genom. Hum. Genet. 10, 387–406 (2009).
Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).
Bocher, O. & Génin, E. Rare variant association testing in the non-coding genome. Hum. Genet. 139, 1345–1362 (2020).
Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).
Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–E464 (2014).
Wang, Q. et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).
Momozawa, Y. & Mizukami, K. Unique roles of rare variants in the genetics of complex diseases in humans. J. Hum. Genet. 66, 11–23 (2021).
Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).
Hall, S. S. Genetics: a gene of rare effect. Nature 496, 152–155 (2013).
Sabatine, M. S. et al. Evolocumab and clinical outcomes in patients with cardiovascular disease. N. Engl. J. Med. 376, 1713–1722 (2017).
Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom. 2, 100168 (2022).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Hawkes, G. et al. Whole genome sequencing analysis identifies rare, large-effect noncoding variants and regulatory regions associated with circulating protein levels. Nat. Genet. 57, 626–634 (2025).
Hawkes, G. et al. Whole-genome sequencing in 333,100 individuals reveals rare non-coding single variant and aggregate associations with height. Nat. Commun. 15, 8549 (2024).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
All of Us Research Program Investigators et al. The ‘All of Us’ research program. N. Engl. J. Med. 381, 668–676 (2019).
Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).
Claringbould, A. & Zaugg, J. B. Enhancers in disease: molecular basis and emerging treatment strategies. Trends Mol. Med. 27, 1060–1073 (2021).
Ribeiro, D. M. et al. The molecular basis, genetic control and pleiotropic effects of local gene co-expression. Nat. Commun. 12, 4842 (2021).
Hoellinger, T. et al. Enhancer/gene relationships: need for more reliable genome-wide reference sets. Front. Bioinform. 3, 1092853 (2023).
Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Rep. 21, 1077–1088 (2017).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Delaneau, O. et al. Chromatin three-dimensional interactions mediate genetic effects on gene expression. Science 364, eaat8266 (2019).
Avalos, D. et al. Genetic variation in cis-regulatory domains suggests cell type-specific regulatory mechanisms in immunity. Commun. Biol. 6, 335 (2023).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Gazal, S. et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat. Genet. 54, 827–836 (2022).
Dey, K. K. et al. SNP-to-gene linking strategies reveal contributions of enhancer-related and candidate master-regulator genes to autoimmune disease. Cell Genom. 2, 100145 (2022).
Bocher, O. et al. Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score. PLoS Genet. 18, e1009923 (2022).
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
Ribeiro, D. M., Ziyani, C. & Delaneau, O. Shared regulation and functional relevance of local gene co-expression revealed by single cell analysis. Commun. Biol. 5, 876 (2022).
Hambleton, S. et al. IRF8 mutations and human dendritic-cell immunodeficiency. N. Engl. J. Med. 365, 127–138 (2011).
Tamura, T., Kurotaki, D. & Koizumi, S.-I. Regulation of myelopoiesis by the transcription factor IRF8. Int. J. Hematol. 101, 342–351 (2015).
Karczewski, K. J. et al. Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects. Preprint at medRxiv https://doi.org/10.1101/2024.03.13.24303864 (2024).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231 (2020).
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).
Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).
Li, Z. et al. Dynamic scan procedure for detecting rare-variant association regions in whole-genome sequencing studies. Am. J. Hum. Genet. 104, 802–814 (2019).
Yang, Y. et al. eSCAN: scan regulatory regions for aggregate association testing using whole-genome sequencing data. Brief. Bioinform. 23, bbab497 (2022).
Li, Z. et al. A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nat. Methods 19, 1599–1611 (2022).
Li, X. et al. Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nat. Genet. 55, 154–164 (2023).
Hu, Y. et al. Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: the NHLBI TOPMed program. Am. J. Hum. Genet. 108, 874–893 (2021).
Gaynor, S. M. et al. Yield of genetic association signals from genomes, exomes, and imputation in the UK Biobank. Nat. Genet. 56, 2345–2351 (2024).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Dickson, S. P., Wang, K., Krantz, I., Hakonarson, H. & Goldstein, D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).
Anderson, C. A., Soranzo, N., Zeggini, E. & Barrett, J. C. Synthetic associations are unlikely to account for many common disease genome-wide association signals. PLoS Biol. 9, e1000580 (2011).
Wray, N. R., Purcell, S. M. & Visscher, P. M. Synthetic associations created by rare variants do not explain most GWAS results. PLoS Biol. 9, e1000579 (2011).
Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).
Dutta, D. et al. Meta-MultiSKAT: multiple phenotype meta-analysis for region-based association test. Genet. Epidemiol. 43, 800–814 (2019).
Smedley, D. et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 99, 595–606 (2016).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J. D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214–220 (2016).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Lee, B. T. et al. The UCSC Genome Browser database: 2022 update. Nucleic Acids Res. 50, D1115–D1122 (2022).
Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
Sollis, E. et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
Ribeiro, D. diogomribeiro/noncoding_rarevariant: v2. Zenodo https://doi.org/10.5281/zenodo.15546894 (2025).
Acknowledgements
This work was funded by a Swiss National Science Foundation project grant (no. PP00P3_176977) to O.D. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank Z. Kutalik for fruitful discussions and support.
Author information
Contributions
D.M.R. and R.J.H. performed the experiments. D.M.R. analyzed the data and wrote the manuscript with input from O.D. D.M.R., S.R. and O.D. conceived, designed and managed the study.
Ethics declarations
Competing interests
O.D. is a current employee of Regeneron Genetics Center, which is a subsidiary of Regeneron Pharmaceuticals. D.M.R. currently works as a contractor for F. Hoffmann-La Roche AG. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Andrew Johnson, Jacob Ulirsch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–20.
Supplementary Tables
Supplementary Tables 1–12.
Source data
Source Data Fig. 2
Statistical source data for Fig. 2.
Source Data Fig. 3
Statistical source data for Fig. 3.
Source Data Fig. 4
Statistical source data for Fig. 4.
Source Data Fig. 5
Statistical source data for Fig. 5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ribeiro, D.M., Hofmeister, R.J., Rubinacci, S. et al. Noncoding rare variant associations with blood traits in 166,740 UK Biobank genomes. Nat Genet 57, 2146–2155 (2025). https://doi.org/10.1038/s41588-025-02288-x
DOI: https://doi.org/10.1038/s41588-025-02288-x