Abstract
Multiancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. Here we present the sum of shared single effects (SuShiE) model, which leverages linkage disequilibrium heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations and estimate ancestry-specific expression prediction weights. Through extensive simulations, we find that SuShiE consistently outperforms existing methods. We apply SuShiE to 36,907 molecular phenotypes including mRNA expression and protein levels from individuals of diverse ancestries in the TOPMed-MESA and GENOA studies. SuShiE fine-maps cis-molQTLs for 18.2% more genes compared with existing methods while prioritizing fewer variants and exhibiting greater functional enrichment. While SuShiE infers highly consistent cis-molQTL architectures across ancestries, it finds evidence of heterogeneity at genes with predicted loss-of-function intolerance. Lastly, using SuShiE-derived cis-molQTL effect sizes, we perform transcriptome- and proteome-wide association studies on six white blood cell-related traits in the All of Us biobank and identify 25.4% more genes compared with existing methods. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
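As a rough illustration of why linkage disequilibrium heterogeneity can sharpen fine-mapping, the sketch below simulates one shared causal variant that is tagged by a different variant in each of two ancestries. It is not the SuShiE inference scheme: it fits a single effect only, uses Wakefield-style approximate Bayes factors from marginal regressions, and treats effect sizes as independent across ancestries rather than modeling their correlation. All function names, sample sizes, the effect size and the prior variance are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_ancestry(rng, n, tag_snp, causal=2, p=5, beta=0.5, r=0.95):
    """Simulate genotypes in which `tag_snp` is in strong LD with the causal SNP."""
    X = rng.standard_normal((n, p))
    # Mix the causal SNP into one ancestry-specific tag SNP to induce LD of about r.
    X[:, tag_snp] = r * X[:, causal] + np.sqrt(1.0 - r**2) * X[:, tag_snp]
    y = beta * X[:, causal] + rng.standard_normal(n)
    return X, y

def log_abf(X, y, prior_var=0.2):
    """Per-SNP log approximate Bayes factors from marginal regressions."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n = X.shape[0]
    xtx = np.sum(X**2, axis=0)
    bhat = X.T @ y / xtx                    # marginal effect estimates
    rss = np.sum(y**2) - bhat**2 * xtx      # residual sum of squares per SNP
    V = rss / (n - 2) / xtx                 # squared standard errors
    shrink = prior_var / (V + prior_var)
    return 0.5 * np.log(1.0 - shrink) + 0.5 * (bhat**2 / V) * shrink

def posterior(log_bf):
    """Single-effect posterior over SNPs under a uniform prior."""
    w = np.exp(log_bf - log_bf.max())
    return w / w.sum()

rng = np.random.default_rng(0)
X_a, y_a = simulate_ancestry(rng, n=500, tag_snp=1)  # ancestry A: SNP 1 tags the causal SNP
X_b, y_b = simulate_ancestry(rng, n=500, tag_snp=3)  # ancestry B: SNP 3 tags the causal SNP
lbf_a, lbf_b = log_abf(X_a, y_a), log_abf(X_b, y_b)

pip_a, pip_b = posterior(lbf_a), posterior(lbf_b)
# Assumption: independent effects across ancestries, so log Bayes factors add.
pip_joint = posterior(lbf_a + lbf_b)

print(np.round(np.vstack([pip_a, pip_b, pip_joint]), 3))
```

Under these assumptions the joint posterior is proportional to the product of the per-ancestry posteriors, so a variant keeps substantial probability only if it is supported in both ancestries. A variant that tags the causal variant in just one ancestry is therefore down-weighted.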
Data availability
SuShiE-derived prediction models (in both tsv format and FUSION format) for TWAS, PWAS, fine-mapping and other analyzed results across cis-molQTL datasets are available via Zenodo at https://doi.org/10.5281/zenodo.10963033 (ref. 69). The TOPMed-MESA data can be found and requested at dbGaP: phs000209.v13.p3, phs001416.v3.p1 and phs001416.v1.p1. The GENOA data can be found and requested at dbGaP: phs001238.v2.p1 and GEO: GSE138914. The GEUVADIS data can be found at https://www.internationalgenome.org/data-portal/data-collection/geuvadis. The INTERVAL data can be found and requested at https://ega-archive.org/datasets/EGAD00001004080. The summary statistics in Chen et al. can be found at https://doi.org/10.1016/j.cell.2020.06.045. The LDSC annotation files can be found at https://console.cloud.google.com/storage/browser/broad-alkesgroup-public-requester-pays/. The ENCODE cCRE v3 can be found at https://screen.encodeproject.org/index/cversions. The snATAC-seq cCRE can be found in ref. 24. The scATAC-seq cCRE can be found in ref. 25. The All of Us data can be requested through https://allofus.nih.gov. The 1000G project data can be found at https://www.internationalgenome.org. The gnomAD v4.0 dataset for pLI and LOEUF is available at https://gnomad.broadinstitute.org/news/2023-11-gnomad-v4-0/. The RVIS dataset can be found in ref. 39. The s_het dataset can be found in ref. 70. The EDS dataset can be found in ref. 38.
Code availability
SuShiE v0.16 software is available via GitHub at https://github.com/mancusolab/sushie. The analysis codes for simulation and real-data analysis of this Article are available via GitHub at https://github.com/mancusolab/sushie-project-codes and https://doi.org/10.5281/zenodo.10963033 (ref. 69). The twas_sim software is available via GitHub at https://github.com/mancusolab/twas_sim. TOPMed RNA-seq Harmonization pipeline instructions are available via GitHub at https://github.com/broadinstitute/gtex-pipeline/blob/master/TOPMed_RNAseq_pipeline.md. The GTEx eQTL analysis pipeline is available at https://www.gtexportal.org/home/methods. The PLINK2 software is available at https://www.cog-genomics.org/plink/2.0. The BCFTOOLS v1.21 software is available at https://samtools.github.io/bcftools/bcftools.html. The FUSION pipeline is available at http://gusevlab.org/projects/fusion/. The LiftOver software is available at https://genome.ucsc.edu/cgi-bin/hgLiftOver. The WashU Epigenome Browser is available at https://epigenomegateway.wustl.edu/. The Plotgardener v1.8.3 software is available via GitHub at https://github.com/PhanstielLab/plotgardener/. The AnnoQ is available at http://annoq.org. The SuSiEx v1.1.2 software is available via GitHub at https://github.com/getian107/SuSiEx. The MESuSiE software is available via GitHub at https://github.com/borangao/MESuSiE. The XMAP v1.0.1 software is available via GitHub at https://github.com/YangLabHKUST/XMAP.
References
Cheung, V. G. et al. Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365–1369 (2005).
Aguet, F. et al. Molecular quantitative trait loci. Nat. Rev. Methods Prim. 3, 4 (2023).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B 82, 1273–1300 (2020).
Wen, X., Luca, F. & Pique-Regi, R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 11, e1005176 (2015).
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
LaPierre, N. et al. Identifying causal variants by fine mapping across multiple studies. PLoS Genet. 17, e1009733 (2021).
Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Nat. Genet. 56, 1841–1850 (2024).
Gao, B. & Zhou, X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat. Genet. 56, 170–179 (2024).
Cai, M. et al. XMAP: cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nat. Commun. 14, 6870 (2023).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Bild, D. E. et al. Ethnic differences in coronary calcification: the Multi-Ethnic Study of Atherosclerosis (MESA). Circulation 111, 1313–1320 (2005).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Shang, L. et al. Genetic architecture of gene expression in European and African Americans: an eQTL mapping study in GENOA. Am. J. Hum. Genet. 106, 496–512 (2020).
The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘Sum of Single Effects’ model. PLoS Genet. 18, e1010299 (2022).
Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005).
Clark, S. A. & van der Werf, J. Genomic best linear unbiased prediction (gBLUP) for the estimation of genomic breeding values. Methods Mol. Biol. 1019, 321–330 (2013).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594, 398–402 (2021).
Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).
Cai, J. et al. URGCP promotes non-small cell lung cancer invasiveness by activating the NF-κB–MMP-9 pathway. Oncotarget 6, 36489–36504 (2015).
Kachuri, L. et al. Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. Nat. Genet. 55, 952–963 (2023).
Shi, H. et al. Localizing components of shared transethnic genetic architecture of complex traits from GWAS summary data. Am. J. Hum. Genet. 106, 805–817 (2020).
Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1098 (2021).
Hou, K. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55, 549–558 (2023).
Saito, S. et al. Gene-specific somatic epigenetic mosaicism of FDFT1 underlies a non-hereditary localized form of porokeratosis. Am. J. Hum. Genet. 111, 896–912 (2024).
Taylor, D. J. et al. Sources of gene expression variation in a globally diverse human cohort. Nature 632, 122–130 (2024).
Brown, B. C., Asian Genetic Epidemiology Network Type 2 Diabetes Consortium, Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Zeng, T., Spence, J. P., Mostafavi, H. & Pritchard, J. K. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat. Genet. 56, 1632–1643 (2024).
Lu, Z. et al. Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. Am. J. Hum. Genet. 109, 1388–1404 (2022).
Tapia, A. L. et al. A large-scale transcriptome-wide association study (TWAS) of 10 blood cell phenotypes reveals complexities of TWAS fine-mapping. Genet. Epidemiol. 46, 3–16 (2022).
Wen, J. et al. Transcriptome-wide association study of blood cell traits in African ancestry and Hispanic/Latino populations. Genes 12, 1049 (2021).
Rowland, B. et al. Transcriptome-wide association study in UK Biobank Europeans identifies associations with blood cell traits. Hum. Mol. Genet. 31, 2333–2347 (2022).
Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213.e14 (2020).
Ding, Y. et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774–781 (2023).
Mester, R. et al. Impact of cross-ancestry genetic architecture on GWASs in admixed populations. Am. J. Hum. Genet. 110, 927–939 (2023).
Zhang, J. & Stram, D. O. The role of local ancestry adjustment in association studies using admixed populations. Genet. Epidemiol. 38, 502–515 (2014).
Wang, L. et al. Integrating single cell expression quantitative trait loci summary statistics to understand complex trait risk genes. Nat. Commun. 15, 4260 (2024).
Bhattacharya, A. et al. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: lessons from the Global Biobank Meta-analysis Initiative. Cell Genom. 2, 100180 (2022).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Mancuso, N. et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet. 51, 675–682 (2019).
Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112, 859–877 (2017).
Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet Allocation. In Advances in Neural Information Processing Systems Vol. 14, 601–608 (MIT Press, 2002).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Frankish, A. et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Hujoel, M. L. A., Gazal, S., Hormozdiari, F., van de Geijn, B. & Price, A. L. Disease heritability enrichment of regulatory elements is concentrated in elements with ancient sequence age and conserved function across species. Am. J. Hum. Genet. 104, 611–624 (2019).
Wen, X. Molecular QTL discovery incorporating genomic annotations using Bayesian false discovery rate control. Ann. Appl. Stat. 10, 1619–1638 (2016).
Liu, Z. et al. Annotation Query (AnnoQ): an integrated and interactive platform for large-scale genetic variant annotation. Nucleic Acids Res. 50, W57–W65 (2022).
Li, D. et al. WashU Epigenome Browser update 2022. Nucleic Acids Res. 50, W774–W781 (2022).
Kramer, N. E. et al. Plotgardener: cultivating precise multi-panel figures in R. Bioinformatics 38, 2042–2045 (2022).
Hudson, R. R., Slatkin, M. & Maddison, W. P. Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589 (1992).
Bhatia, G., Patterson, N., Sankararaman, S. & Price, A. L. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23, 1514–1521 (2013).
Lu, Z. et al. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. Zenodo https://doi.org/10.5281/zenodo.10963033 (2025).
Zeng, T., Spence, J. P., Mostafavi, H. & Pritchard, J. K. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat. Genet. 56, 1632–1643 (2024).
Acknowledgements
We thank members of the Mancuso and Gazal laboratories for fruitful discussions regarding this Article. We also thank M. D. Edge for his thoughtful comments and suggestions. This work was funded in part by the National Institutes of Health (NIH) under awards R01HG012133 (N.M.), R01CA258808 (N.M.), R01GM140287 (P.M.), R35GM142783 (N.M.), U54HG013243 (L.W.), R35GM147789 (S.G.), K08HL159346 (J.P.), R00CA246076 (L.K.) and R01MH125252 (A.G.). MESA phenotypes (dbGaP: phs000209.v13.p3): MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-001079, UL1-TR-000040, UL1-TR-001420, UL1-TR-001881 and DK063491. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. TOPMed-MESA WGS genotype, mRNA and protein expression data (dbGaP: phs001416.v3.p1): molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS genotype data for NHLBI TOPMed: MESA (phs001416.v3.p1) was performed at Broad Genomics (HHSN268201600034I). mRNA expression data for NHLBI TOPMed: MESA (phs001416.v3.p1) was performed at NWGC (HHSN268201600032I). SOMAscan proteomics for NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA) (phs001416.v1.p1) was performed at the Broad Institute and Beth Israel Proteomics Platform (HHSN268201600034I). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I).
Core support including phenotype harmonization, data management, sample-identity quality control, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. GENOA genotype (dbGaP: phs001238.v2.p1) and gene expression (GEO: GSE138914) data were supported by grants from NIH NHLBI (HL054457, HL054464, HL054481, HL119443 and HL087660). We acknowledge S. Kardia and J. Smith in preparing GENOA eQTL data. The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.
Author information
Contributions
Z.L. and N.M. developed the model and study design. Z.L. performed simulations and fine-mapping analyses. Z.L., X.W., J.P. and L.K. performed TWAS and AoU analyses. Z.L., M.C. and N.M. developed the model and inference scheme. Z.L. and A.K. prepared functional genomic annotations and performed heritability enrichment analyses. Z.L. and N.M. wrote the initial paper. Z.L., X.W., M.C., A.K., S.G., P.M., L.W., J.P., L.K., A.G. and N.M. edited the final paper.
Ethics declarations
Competing interests
L.W. provided consulting services to Pupil Bio Inc. and reviewed manuscripts for Gastroenterology Report, unrelated to this study, and received an honorarium. S.G. received consulting fees from Eleven Therapeutics unrelated to this work. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Biao Zeng and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1 and 2, and Figs. 1–39.
Supplementary Tables
Supplementary Tables 1–15.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, Z., Wang, X., Carr, M. et al. Improved multiancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. Nat Genet 57, 1881–1889 (2025). https://doi.org/10.1038/s41588-025-02262-7
DOI: https://doi.org/10.1038/s41588-025-02262-7