  • Article

cellSTAAR: incorporating single-cell-sequencing-based functional data to boost power in rare variant association testing of noncoding regions

Abstract

Understanding how rare genetic variants influence complex traits remains a major challenge, particularly when these variants lie in noncoding regions of the genome. The effects of variants within candidate cis-regulatory elements (cCREs) often depend on cell type, which makes interpretation difficult. Here we introduce cellSTAAR, which integrates whole-genome sequencing (WGS) data with single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data. It captures how chromatin accessibility varies across cell types by constructing cell-type-specific functional annotations and regulatory elements. To reflect the uncertainty in cCRE–gene linking, cellSTAAR uses a comprehensive strategy to link cCREs to their target genes. We applied cellSTAAR to data from the Trans-Omics for Precision Medicine (TOPMed) consortium (n ≈ 60,000) and replicated our findings using the UK Biobank (n ≈ 190,000). Across four lipid traits, cellSTAAR improved the detection of biologically meaningful associations and enhanced biological interpretability. These results demonstrate the potential of cell-type-aware approaches to boost discovery in rare variant WGS association studies.
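The abstract notes that cellSTAAR links cCREs to target genes in several ways to reflect linking uncertainty, and Fig. 2 shows that associations vary across linking approaches and cell types. This excerpt does not say how per-approach results are aggregated. Because the reference list cites the aggregated Cauchy association test (ACAT; refs. 10 and 62), the Python sketch below shows that test as one plausible way to merge a gene's P values across linking approaches or cell types into a single P value. The function name `acat`, the equal default weights and the 1e-16 and 1e15 numerical cut-offs are illustrative assumptions, not the API of the cellSTAAR R package.

```python
import math
from typing import Optional, Sequence


def acat(pvalues: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Combine P values with the Cauchy combination (ACAT) test.

    Hypothetical helper for illustration only; not part of cellSTAAR.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one P value is required")
    if weights is None:
        weights = [1.0] * n  # equal weights (assumption)
    if len(weights) != n:
        raise ValueError("pvalues and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)

    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("P values must lie strictly between 0 and 1")
        w_norm = w / total  # normalize so the weights sum to 1
        if p < 1e-16:
            # For tiny p, tan((0.5 - p) * pi) is approximately 1 / (p * pi);
            # the approximation avoids precision loss near pi / 2.
            stat += w_norm / (p * math.pi)
        else:
            stat += w_norm * math.tan((0.5 - p) * math.pi)

    # Under the null the statistic is approximately standard Cauchy,
    # so the combined P value is its upper-tail probability.
    if stat > 1e15:
        return 1.0 / (stat * math.pi)  # tail approximation for huge statistics
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical P values for one gene under three cCRE-gene linking approaches.
print(acat([0.3, 1e-6, 0.6]))
```

With one strong signal (P = 1e-6) among three equally weighted approaches, the combined P value is about 3 × 10^-6, roughly the smallest P value multiplied by the number of approaches. ACAT is driven mainly by the smallest P values and, as described in ref. 62, remains approximately calibrated under dependence between the combined tests, which matters because different linking approaches for the same gene share variants.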

Fig. 1: cellSTAAR overview.
Fig. 2: Variability in association across linking approaches and cell types (including three bulk comparisons from liver).
Fig. 3: Percentage enrichment by cell type.

Data availability

This paper used the TOPMed Freeze 8 WGS data and lipid phenotype data. Genotype and phenotype data are both available in the database of Genotypes and Phenotypes (dbGaP). The TOPMed WGS data were from the following 20 study phases (accession numbers provided in parentheses): Old Order Amish (phs000956.v1.p1), Atherosclerosis Risk in Communities Study (phs001211), Mt Sinai BioMe Biobank (phs001644), Coronary Artery Risk Development in Young Adults (phs001612), Cleveland Family Study (phs000954), Cardiovascular Health Study (phs001368), Diabetes Heart Study (phs001412), Framingham Heart Study (FHS; phs000974), Genetic Study of Atherosclerosis Risk (phs001218), Genetic Epidemiology Network of Arteriopathy (phs001345), Genetic Epidemiology Network of Salt Sensitivity (phs001217), Genetics of Lipid Lowering Drugs and Diet Network (phs001359), Hispanic Community Health Study/Study of Latinos (phs001395), Hypertension Genetic Epidemiology Network and Genetic Epidemiology Network of Arteriopathy (phs001293), Jackson Heart Study (JHS; phs000964), Multi-Ethnic Study of Atherosclerosis (phs001416), San Antonio Family Heart Study (phs001215), Genome-wide Association Study of Adiposity in Samoans (phs000972), Taiwan Study of Hypertension using Rare Variants (phs001387) and Women’s Health Initiative (phs001237). UK Biobank (UKB) WGS data are available from the UKB Research Analysis Platform (RAP), and the UKB analyses were conducted using the UKB resource under application 52008. The single-cell ATAC-seq data used from CATlas are publicly available at http://catlas.org/humanenhancer/.

Code availability

cellSTAAR is freely available as an R package at https://github.com/edvanburen/cellSTAAR/. vcf2agds (ref. 68), which was used to preprocess the UKB WGS data, is freely available as a collection of applets in the UKB RAP at https://github.com/drarwood/vcf2agds_overview. GENESIS (ref. 18), available at https://bioconductor.org/packages/release/bioc/html/GENESIS.html, and FastSparseGRM (ref. 69), available at https://github.com/rounakdey/FastSparseGRM, are freely available R packages and were used to calculate the ancestral principal components and sparse genetic relatedness matrices (GRMs) for the TOPMed and UKB data, respectively. Code used in the analysis has been archived on Zenodo at https://doi.org/10.5281/zenodo.16113567 (ref. 70).

References

  1. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).

  2. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  3. All of Us Research Program Investigators; Denny, J. C. et al. The “All of Us” Research Program. N. Engl. J. Med. 381, 668–676 (2019).

  4. Li, Z. et al. A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nat. Methods 19, 1599–1611 (2022).

  5. Bansal, V., Libiger, O., Torkamani, A. & Schork, N. J. Statistical analysis strategies for association studies involving rare variants. Nat. Rev. Genet. 11, 773–785 (2010).

  6. Kiezun, A. et al. Exome sequencing and the genetic basis of complex traits. Nat. Genet. 44, 623–630 (2012).

  7. Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95, 5–23 (2014).

  8. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

  9. Morris, A. P. & Zeggini, E. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet. Epidemiol. 34, 188–193 (2010).

  10. Liu, Y. et al. ACAT: a fast and powerful P value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 104, 410–421 (2019).

  11. Lee, S., Wu, M. C. & Lin, X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–775 (2012).

  12. Sun, J., Zheng, Y. & Hsu, L. A unified mixed-effects model for rare-variant association in sequencing studies. Genet. Epidemiol. 37, 334–344 (2013).

  13. Li, X. et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet. 52, 969–983 (2020).

  14. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

  15. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).

  16. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  17. Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).

  18. Gogarten, S. M. et al. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics 35, 5346–5348 (2019).

  19. Zhou, H. et al. FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Res. 51, D1300–D1311 (2023).

  20. Preissl, S., Gaulton, K. J. & Ren, B. Characterizing cis-regulatory elements using single-cell epigenomics. Nat. Rev. Genet. https://doi.org/10.1038/s41576-022-00509-1 (2022).

  21. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).

  22. Gasperini, M., Tome, J. M. & Shendure, J. Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat. Rev. Genet. 21, 292–310 (2020).

  23. Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).

  24. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

  25. Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat. Genet. 52, 1158–1168 (2020).

  26. Schilder, B. M. & Raj, T. Fine-mapping of Parkinson’s disease susceptibility loci identifies putative causal variants. Hum. Mol. Genet. 31, 888–900 (2022).

  27. Selvaraj, M. S. et al. Whole genome sequence analysis of blood lipid levels in >66,000 individuals. Nat. Commun. 13, 5995 (2022).

  28. Moore, J. E., Pratt, H. E., Purcaro, M. J. & Weng, Z. A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods. Genome Biol. 21, 17 (2020).

  29. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  30. Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).

  31. Abascal, F. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

  32. Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. https://doi.org/10.1038/s41588-022-01187-9 (2022).

  33. Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. https://doi.org/10.1038/s41588-022-01167-z (2022).

  34. Hamel, A. R. et al. Integrating genetic regulation and single-cell expression with GWAS prioritizes causal genes and cell types for glaucoma. Nat. Commun. 15, 396 (2024).

  35. Yin, M. et al. sc2GWAS: a comprehensive platform linking single cell and GWAS traits of human. Nucleic Acids Res. https://doi.org/10.1093/nar/gkae1008 (2024).

  36. Das, A. C. et al. Single-cell chromatin accessibility data combined with GWAS improves detection of relevant cell types in 59 complex phenotypes. Int. J. Mol. Sci. 23, 11456 (2022).

  37. Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database 2017, bax028 (2017).

  38. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  39. Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

  40. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  41. IGVF Consortium. Deciphering the impact of genomic variation on function. Nature 633, 47–57 (2024).

  42. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).

  43. Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A. & Teichmann, S. A. The Human Cell Atlas: from vision to reality. Nature 550, 451–453 (2017).

  44. Chen, C.-H. et al. Determinants of transcription factor regulatory range. Nat. Commun. 11, 2472 (2020).

  45. Schaffner, S. F. et al. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576–1583 (2005).

  46. Li, X. et al. A statistical framework for multi-trait rare variant analysis in large-scale whole-genome sequencing studies. Nat. Comput. Sci. https://doi.org/10.1038/s43588-024-00764-8 (2025).

  47. Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).

  48. Klarin, D. et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat. Genet. 50, 1514–1523 (2018).

  49. Maestri, A. et al. Lipid droplets, autophagy, and ageing: a cell-specific tale. Ageing Res. Rev. 94, 102194 (2024).

  50. Molenaar, M. R., Penning, L. C. & Helms, J. B. Playing Jekyll and Hyde—the dual role of lipids in fatty liver disease. Cells 9, 2244 (2020).

  51. Schulze, R. J., Schott, M. B., Casey, C. A., Tuma, P. L. & McNiven, M. A. The cell biology of the hepatocyte: a membrane trafficking machine. J. Cell Biol. 218, 2096–2112 (2019).

  52. Rutkowski, J. M., Stern, J. H. & Scherer, P. E. The cell biology of fat expansion. J. Cell Biol. 208, 501–512 (2015).

  53. He, Q. et al. Role of liver sinusoidal endothelial cell in metabolic dysfunction-associated fatty liver disease. Cell Commun. Signal. 22, 346 (2024).

  54. Hussain, M. M. Intestinal lipid absorption and lipoprotein formation. Curr. Opin. Lipidol. 25, 200–206 (2014).

  55. Jones, R. C. et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science https://doi.org/10.1126/science.abl4896 (2022).

  56. Ignatiadis, N. & Huber, W. Covariate powered cross-weighted multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 83, 720–751 (2021).

  57. Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).

  58. Engreitz, J. M. et al. Deciphering the impact of genomic variation on function. Nature 633, 47–57 (2024).

  59. Hindy, G. et al. Rare coding variants in 35 genes associate with circulating lipid levels—a multi-ancestry analysis of 170,000 exomes. Am. J. Hum. Genet. 109, 81–96 (2022).

  60. Safarova, M. et al. Advances in targeting LDL cholesterol: PCSK9 inhibitors and beyond. Am. J. Prev. Cardiol. 19, 100701 (2024).

  61. Wadhera, R. K., Steen, D. L., Khan, I., Giugliano, R. P. & Foody, J. M. A review of low-density lipoprotein cholesterol, treatment strategies, and its impact on cardiovascular disease morbidity and mortality. J. Clin. Lipidol. 10, 472–489 (2016).

  62. Liu, Y. & Xie, J. Cauchy combination test: a powerful test with analytic P-value calculation under arbitrary dependency structures. J. Am. Stat. Assoc. 115, 393–402 (2020).

  63. Breslow, N. E. & Clayton, D. G. Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993).

  64. Chen, H. et al. Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. Am. J. Hum. Genet. 104, 260–274 (2019).

  65. Chen, H. et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet. 98, 653–666 (2016).

  66. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

  67. Conomos, M. P., Reiner, A. P., Weir, B. S. & Thornton, T. A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98, 127–148 (2016).

  68. Li, X. et al. Streamlining large-scale genomic data management: Insights from the UK Biobank whole-genome sequencing data. Cell Genom. https://doi.org/10.1016/j.xgen.2025.101009 (2025).

  69. Lin, X. et al. Scalable analysis of large multi-ancestry biobanks by leveraging sparse ancestry-adjusted sample-relatedness. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-5343361/v1 (2024).

  70. Van Buren, E. cellSTAAR paper analysis code. Zenodo https://doi.org/10.5281/zenodo.16113567 (2025).

Acknowledgements

This work was supported by grants R35-CA197449, U19-CA203654, U01-HG012064 and U01-HG009088 (to X. Lin); R01-HL142711 and R01-HL127564 (to P.N. and G.M.P.); 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR001881, DK063491, R01-HL071051, R01-HL071205, R01-HL071250, R01-HL071251, R01-HL071258, R01-HL071259 and UL1-RR033176 (to J.R. and Y.C.); 1R35-HL135818, R01-HL113338 and HL046389 (to S.R.); HL105756 (to B.P.); HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C and HHSN268201600004C (to C.K.); R01-MD012765 and R01-DK117445 (to N.F.); R01-HL153805 and R03-HL154284 (to B.E.C.); HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700005I and HHSN268201700004I (to E.B.); U01-HL072524, R01-HL104135-04S1, U01-HL054472, U01-HL054473, U01-HL054495, U01-HL054509 and R01-HL055673-18S1 (to D.K.A.); U01-HL72518, HL087698, HL49762, HL59684, HL58625, HL071025, HL112064, NR0224103 and M01-RR000052 (to the Johns Hopkins General Clinical Research Center); R01-HL133040 (to R.L.M.); R01-HL093093 (to S. T. McGarvey); R01-HL173044 and R01-AG085581 (to X. Li); and NHLBI TOPMed Fellowship 75N92021F00229 (to X. Li and M.S.S.). The Cardiovascular Health Study research was supported by NHLBI contracts HHSN268201200036C, HHSN268200800007C, HHSN268201800001C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086 and 75N92021D00006; and NHLBI grants R01-HL172803, U01HL080295, R01-HL087652, R01-HL105756, R01-HL103612, R01-HL120393 and U01HL130114, with additional contribution from the National Institute of Neurological Disorders and Stroke. Additional support was provided through R01AG023629 from the National Institute on Aging.
A full list of principal CHS investigators and institutions can be found at CHS-NHLBI (https://chs-nhlbi.org/). This work was also supported by R01-HL92301, R01-HL67348, R01-NS058700, R01-AR48797, R01-DK071891, R01-AG058921, the General Clinical Research Center of the Wake Forest University School of Medicine (M01-RR07122, F32 HL085989), the American Diabetes Association and a pilot grant from the Claude Pepper Older Americans Independence Center of Wake Forest University Health Sciences (P60 AG10484). The Coronary Artery Risk Development in Young Adults Study (CARDIA) is conducted and supported by the NHLBI in collaboration with the University of Alabama at Birmingham (75N92023D00002, 75N92023D00005), Northwestern University (75N92023D00004), University of Minnesota (75N92023D00006) and Kaiser Foundation Research Institute (75N92023D00003). The FHS acknowledges the support of contracts NO1-HC-25195, HHSN268201500001I and 75N92019D00031 from the NHLBI and grant supplement R01-HL092577-06S1 for this research. We also acknowledge the dedication of the FHS study participants without whom this research would not be possible. R.S.V. is supported in part by the Evans Medical Foundation and the Jay and Louis Coffman Endowment from the Department of Medicine, Boston University School of Medicine. The JHS is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the NHLBI and the National Institute on Minority Health and Health Disparities. We also thank the staff and participants of the JHS. Support for GENOA was provided by the NHLBI (U01HL054457, U01HL054464, U01HL054481, R01-HL119443 and R01-HL087660) of the National Institutes of Health. 
Collection of the San Antonio Family Study data was supported in part by National Institutes of Health grants P01 HL045522, MH078143, MH078111 and MH083824; and WGS of SAFS participants was supported by U01 DK085524 and R01-HL113323. The Diabetes Heart Study was supported by R01-HL92301, R01-HL67348, R01-NS058700, R01-AR48797, R01-DK071891, R01-AG058921, the General Clinical Research Center of the Wake Forest University School of Medicine (M01-RR07122, F32 HL085989), the American Diabetes Association and a pilot grant from the Claude Pepper Older Americans Independence Center of Wake Forest University Health Sciences (P60 AG10484). Molecular data for the TOPMed program was supported by the NHLBI. Genome sequencing for ‘NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA)’ (phs001612.v1.p1) was performed at the Baylor Sequencing Center (HHSN268201600033I). Core support, including centralized genomic read mapping and genotype calling, variant quality metrics and filtering, was provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support, including phenotype harmonization, data management, sample-identity quality control and general program coordination, was provided by the TOPMed Data Coordinating Center (R01-HL120393, U01HL-120393; contract HHSN268201800001I). Support for the Multi-Ethnic Study of Atherosclerosis was provided by contracts 75N92025D00022, 75N92025D00026, 75N92025D00024, 75N92025D00027, 75N92025D00025 and 75N92025D00028. We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The full study-specific acknowledgements are detailed in the Supplementary Note.

Author information

Contributions

E.V.B., Y.Z., X. Li, Z. Li and X. Lin designed the experiments. E.V.B., Y.Z., X. Li and X. Lin performed the experiments. E.V.B., Y.Z., X. Li, Z. Li, H.Z., M.S.S., N.D.P., D.K.A., J.B., E.B., B.E.C., J.C.C., A.P.C., Y.D.I.C., J.C., R.D., M.F., N.F., M.G., C.G., X.G., J.H., N.H.C., L.H., Y.J.H., R.R.K., S.L.R.K., E.K., C.K., B.G.K., L.L., D.L., C.L., S.L., D.L.J., R.J.F.L., A.W.M., L.M., R.L.M., R.J.M., B.M., J.C.M., T.N., K.N., J.O., J.P., P.P., B.P., L.R., R.S.V., S.R., A.R., S.S.R., J.S., B.S., H.T., K.D.T., R.T., S.V., L.Y., W.Z., J.R., G.M.P., P.N. and X. Lin acquired, analyzed or interpreted data. J.R., G.M.P., P.N. and the NHLBI TOPMed Lipids Working Group provided administrative, technical or material support. E.V.B. and X. Lin drafted the manuscript and revised it according to suggestions by the coauthors. All authors critically reviewed the manuscript, suggested revisions as needed and approved the final version.

Corresponding author

Correspondence to Xihong Lin.

Ethics declarations

Competing interests

E.K. has received personal fees from Regeneron Pharmaceuticals, 23andMe, Allelica and Illumina; has received research funding from Allelica; and serves on the advisory boards for Encompass Biosciences, Overtone and Galateo Bio. P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Cleerly, Genentech/Roche, Ionis, Novartis and Silence Therapeutics, personal fees from AIRNA, Allelica, Apple, AstraZeneca, Bain Capital, Blackstone Life Sciences, Bristol Myers Squibb, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Capital, Foresite Labs, Genentech/Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, Novo Nordisk, TenSixteen Bio and Tourmaline Bio; equity in Bolt, Candela, Mercury, MyOme, Parameter Health, Preciseli and TenSixteen Bio; royalties from Recora for intensive cardiac rehabilitation; and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. B.M.P. serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. L.M.R. is a consultant for the TOPMed Administrative Coordinating Center (ACC) through Westat. X. Lin is a consultant of AbbVie Pharmaceuticals and Verily Life Sciences. The other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Van Buren, E., Zhang, Y., Li, X. et al. cellSTAAR: incorporating single-cell-sequencing-based functional data to boost power in rare variant association testing of noncoding regions. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02919-5

  • DOI: https://doi.org/10.1038/s41592-025-02919-5
