Abstract
Microbial genome-wide association studies (GWAS) have uncovered numerous host genetic variants associated with gut microbiota. However, links between host genetics, the gut microbiome and specific cellular contexts remain unclear. Here we use a computational framework, scBPS (single-cell Bacteria Polygenic Score), to integrate existing microbial GWAS and single-cell RNA-sequencing profiles of 24 human organs, including the liver, pancreas, lung and intestine, to identify host tissues and cell types relevant to gut microbes. Analysing 207 microbial taxa and 254 host cell types, scBPS-inferred cellular enrichments confirmed known biology such as dominant communications between gut microbes and the digestive tissue module and liver epithelial cell compartment. scBPS also identified a robust association between Collinsella and the central-veinal hepatocyte subpopulation. We experimentally validated the causal effects of Collinsella on cholesterol metabolism in mice through single-nuclei RNA sequencing on liver tissue to identify relevant cell subpopulations. Mechanistically, oral gavage of Collinsella modulated cholesterol pathway gene expression in central-veinal hepatocytes. We further validated our approach using independent microbial GWAS data, alongside single-cell and bulk transcriptomic analyses, demonstrating its robustness and reproducibility. Together, scBPS enables a systematic mapping of the host–microbe crosstalk by linking cell populations to their interacting gut microbes.
Data availability
The microbial GWAS summary data of the Dutch Microbiome Project were downloaded from https://dutchmicrobiomeproject.molgeniscloud.org. The Tabula Sapiens human single-cell transcriptome data were downloaded from https://tabula-sapiens-portal.ds.czbiohub.org/. The GWAS summary data of MiBioGen project were downloaded from https://www.mibiogen.org/. The GWAS summary data of 10 liver-associated diseases were downloaded from the FinnGen database at https://r8.risteys.finngen.fi/ (accession phenocodes: T2D, T2D_WIDE, NAFLD, K11_TOXLIV, K11_FIBROCHIRLIV, FIBROLIV, E4_HYPERCHOL, E4_FH, E4_FH_IHD, CHIRHEP_NAS and C3_LIVER_INTRAHEPATIC_BILE_DUCTS_EXALLC). The KEGG pathways were downloaded from https://www.genome.jp/kegg/. The snRNA-seq data of mice livers are deposited in the Gene Expression Omnibus (GEO) database under accession number GSE289267. Source data are provided with this paper.
Code availability
The code used for the analyses is available in Zenodo at https://doi.org/10.5281/zenodo.15073160 (ref. 97).
References
Brandl, K., Kumar, V. & Eckmann, L. Gut–liver axis at the frontier of host–microbial interactions. Am. J. Physiol. Gastrointest. Liver Physiol. 312, G413–G419 (2017).
Tang, W. W., Li, D. Y. & Hazen, S. L. Dietary metabolism, the gut microbiome, and heart failure. Nat. Rev. Cardiol. 16, 137–154 (2019).
Schuit, F. C., Huypens, P., Heimberg, H. & Pipeleers, D. G. Glucose sensing in pancreatic β-cells: a model for the study of other glucose-regulated cells in gut, pancreas, and hypothalamus. Diabetes 50, 1–11 (2001).
Mayer, E. A., Nance, K. & Chen, S. The gut–brain axis. Annu. Rev. Med. 73, 439–453 (2022).
Yang, T., Richards, E. M., Pepine, C. J. & Raizada, M. K. The gut microbiota and the brain–gut–kidney axis in hypertension and chronic kidney disease. Nat. Rev. Nephrol. 14, 442–456 (2018).
Budden, K. F. et al. Emerging pathogenic links between microbiota and the gut–lung axis. Nat. Rev. Microbiol. 15, 55–63 (2017).
Floyd, J. L. & Grant, M. B. The gut–eye axis: lessons learned from murine models. Ophthalmol. Ther. 9, 499–513 (2020).
Org, E. et al. Genetic and environmental control of host–gut microbiota interactions. Genome Res. 25, 1558–1569 (2015).
Leamy, L. J. et al. Host genetics and diet, but not immunoglobulin A expression, converge to shape compositional features of the gut microbiome in an advanced intercross population of mice. Genome Biol. 15, 552 (2014).
Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).
Goodrich, J. K. et al. Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 19, 731–743 (2016).
Lopera-Maya, E. A. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat. Genet. 54, 143–151 (2022).
Rühlemann, M. C. et al. Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome. Nat. Genet. 53, 147–155 (2021).
Wang, J. et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat. Genet. 48, 1396–1406 (2016).
Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407–1412 (2016).
Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat. Genet. 53, 156–165 (2021).
Srinivas, G. et al. Genome-wide mapping of gene–microbiota interactions in susceptibility to autoimmune skin blistering. Nat. Commun. 4, 2462 (2013).
Parks, B. W. et al. Genetic control of obesity and gut microbiota composition in response to high-fat, high-sucrose diet in mice. Cell Metab. 17, 141–152 (2013).
McKnite, A. M. et al. Murine gut microbiota is defined by host genetics and modulates variation of metabolic traits. PLoS ONE 7, e39191 (2012).
Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Zhang, Y. et al. Benchmarking algorithms for pathway activity transformation of single-cell RNA-seq data. Comput. Struct. Biotechnol. J. 18, 2953–2961 (2020).
Xiang, B. et al. Single cell sequencing analysis identifies genetics-modulated ORMDL3+ cholangiocytes having higher metabolic effects on primary biliary cholangitis. J. Nanobiotechnol. 19, 406 (2021).
Elmentaite, R., Domínguez Conde, C., Yang, L. & Teichmann, S. A. Single-cell atlases: shared and tissue-specific cell types across human organs. Nat. Rev. Genet. 23, 395–410 (2022).
Ma, Y. et al. Systematic dissection of pleiotropic loci and critical regulons in excitatory neurons and microglia relevant to neuropsychiatric and ocular diseases. Transl. Psychiatry 15, 24 (2025).
Ma, Y. et al. Integrating single-cell sequencing data with GWAS summary statistics reveals CD16+ monocytes and memory CD8+ T cells involved in severe COVID-19. Genome Med. 14, 16 (2022).
Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).
Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell gene expression. Am. J. Hum. Genet. 101, 686–699 (2017).
Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).
Ma, Y. et al. Polygenic regression uncovers trait-relevant cellular contexts through pathway activation transformation of single-cell RNA sequencing data. Cell Genom. 3, 100383 (2023).
Bryois, J. et al. Genetic identification of cell types underlying brain complex traits yields insights into the etiology of Parkinson’s disease. Nat. Genet. 52, 482–493 (2020).
Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022).
Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Frost, H. R. Variance-adjusted Mahalanobis (VAM): a fast and accurate method for cell-specific gene set scoring. Nucleic Acids Res. 48, e94 (2020).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Louis, P., Hold, G. L. & Flint, H. J. The gut microbiota, bacterial metabolites and colorectal cancer. Nat. Rev. Microbiol. 12, 661–672 (2014).
Schroeder, B. O. & Bäckhed, F. Signals from the gut microbiota to distant organs in physiology and disease. Nat. Med. 22, 1079–1089 (2016).
Bravo González-Blas, C. et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods 20, 1355–1367 (2023).
Astbury, S. et al. Lower gut microbiome diversity and higher abundance of proinflammatory genus Collinsella are associated with biopsy-proven nonalcoholic steatohepatitis. Gut Microbes 11, 569–580 (2020).
Lee, N. Y. et al. Lactobacillus attenuates progression of nonalcoholic fatty liver disease by lowering cholesterol and steatosis. Clin. Mol. Hepatol. 27, 110–124 (2021).
Zhang, X. et al. Dietary cholesterol drives fatty liver-associated liver cancer by modulating gut microbiota and metabolites. Gut 70, 761–774 (2021).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
McLoughlin, K. et al. Host selection of microbiota via differential adhesion. Cell Host Microbe 19, 550–559 (2016).
Schluter, J. & Foster, K. R. The evolution of mutualism in gut microbiota via host epithelial selection. PLoS Biol. 10, e1001424 (2012).
Pettersen, V. K. & Arrieta, M.-C. Host–microbiome intestinal interactions during early life: considerations for atopy and asthma development. Curr. Opin. Allergy Clin. Immunol. 20, 138–148 (2020).
Tripathi, A. et al. The gut–liver axis and the intersection with the microbiome. Nat. Rev. Gastroenterol. Hepatol. 15, 397–411 (2018).
Choi, W. et al. Serotonin signals through a gut–liver axis to regulate hepatic steatosis. Nat. Commun. 9, 4824 (2018).
Delzenne, N. M. et al. Contribution of the gut microbiota to the regulation of host metabolism and energy balance: a focus on the gut–liver axis. Proc. Nutr. Soc. 78, 319–328 (2019).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Bernstein, B. E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
Rozowsky, J. et al. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 186, 1493–1511.e40 (2023).
Liu, Z. et al. Network analyses in microbiome based on high-throughput multi-omics data. Brief. Bioinform. 22, 1639–1655 (2021).
Matchado, M. S. et al. Network analysis methods for studying microbial communities: a mini review. Comput. Struct. Biotechnol. J. 19, 2687–2698 (2021).
Ma, Y. & Li, M. D. Establishment of a strong link between smoking and cancer pathogenesis through DNA methylation analysis. Sci. Rep. 7, 1811 (2017).
Ma, Y. et al. Integration of human organoids single‐cell transcriptomic profiles and human genetics repurposes critical cell type‐specific drug targets for severe COVID‐19. Cell Prolif. 57, e13558 (2023).
Kriaa, A. et al. Microbial impact on cholesterol and bile acid metabolism: current status and future prospects. J. Lipid Res. 60, 323–332 (2019).
Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature 572, 199–204 (2019).
Halpern, K. B. et al. Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature 542, 352–356 (2017).
Benito-Vicente, A. et al. Familial hypercholesterolemia: the most frequent cholesterol metabolism disorder caused disease. Int. J. Mol. Sci. 19, 3426 (2018).
Parham, J. S. & Goldberg, A. C. Review of recent clinical trials and their impact on the treatment of hypercholesterolemia. Prog. Cardiovasc. Dis. 75, 90–96 (2022).
Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669.e53 (2021).
Li, C. et al. Gut microbiome and metabolome profiling in Framingham heart study reveals cholesterol-metabolizing bacteria. Cell 187, 1834–1852.e19 (2024).
Richter, M. L. et al. Single-nucleus RNA-seq2 reveals functional crosstalk between liver zonation and ploidy. Nat. Commun. 12, 4264 (2021).
Wang, X. et al. Comparative analysis of cell lineage differentiation during hepatogenesis in humans and mice at the single-cell transcriptome level. Cell Res. 30, 1109–1126 (2020).
Gury-BenAri, M. et al. The spectrum and regulatory landscape of intestinal innate lymphoid cells are shaped by the microbiome. Cell 166, 1231–1246.e13 (2016).
Andrlová, H. et al. MAIT and Vδ2 unconventional T cells are supported by a diverse intestinal microbiome and correlate with favorable patient outcome after allogeneic HCT. Sci. Transl. Med. 14, eabj2829 (2022).
Banerjee, A. et al. Succinate produced by intestinal microbes promotes specification of tuft cells to suppress ileal inflammation. Gastroenterology 159, 2101–2115.e5 (2020).
Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).
Zhao, R. et al. Single-cell and spatiotemporal transcriptomic analyses reveal the effects of microorganisms on immunity and metabolism in the mouse liver. Comput. Struct. Biotechnol. J. 21, 3466–3477 (2023).
Kovatcheva-Datchary, P. et al. Simplified intestinal microbiota to study microbe–diet–host interactions in a mouse model. Cell Rep. 26, 3772–3783.e6 (2019).
Mager, L. F. et al. Microbiome-derived inosine modulates response to checkpoint inhibitor immunotherapy. Science 369, 1481–1489 (2020).
Drokhlyansky, E. et al. The human and mouse enteric nervous system at single-cell resolution. Cell 182, 1606–1622.e23 (2020).
Gomez-Arango, L. F. et al. Low dietary fiber intake increases Collinsella abundance in the gut microbiota of overweight and obese pregnant women. Gut Microbes 9, 189–201 (2018).
Sui, G., Jia, L., Quan, D., Zhao, N. & Yang, G. Activation of the gut microbiota–kynurenine–liver axis contributes to the development of nonalcoholic hepatic steatosis in nondiabetic adults. Aging 13, 21309 (2021).
Wang, C. et al. Integrated microbiome and metabolome analysis reveals correlations between gut microbiota components and metabolic profiles in mice with methotrexate-induced hepatoxicity. Drug Des. Devel. Ther. 16, 3877–3891 (2022).
Yin, X. et al. Structural changes of gut microbiota in a rat non-alcoholic fatty liver disease model treated with a Chinese herbal formula. Syst. Appl. Microbiol. 36, 188–196 (2013).
Yang, X. et al. Alleviating effects of noni fruit polysaccharide on hepatic oxidative stress and inflammation in rats under a high-fat diet and its possible mechanisms. Food Funct. 11, 2953–2968 (2020).
Khan, T. J. et al. Atorvastatin treatment modulates the gut microbiota of the hypercholesterolemic patients. Omics 22, 154–163 (2018).
Martínez, I. et al. Diet-induced metabolic improvements in a hamster model of hypercholesterolemia are strongly linked to alterations of the gut microbiota. Appl. Environ. Microbiol. 75, 4175–4184 (2009).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Ma, Y. et al. Integrative genomics analysis reveals a 21q22.11 locus contributing risk to COVID-19. Hum. Mol. Genet. 30, 1247–1258 (2021).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Skene, N. G. et al. Genetic identification of brain cell types underlying schizophrenia. Nat. Genet. 50, 825–833 (2018).
Skene, N. G. & Grant, S. G. Identification of vulnerable cell types in major brain disorders using single cell transcriptomes and expression weighted cell type enrichment. Front. Neurosci. 10, 16 (2016).
Liao, Y., Wang, J., Jaehnig, E. J., Shi, Z. & Zhang, B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 47, W199–W205 (2019).
Alghamdi, N. et al. A graph neural network model to estimate cell-wise metabolic flux using single-cell RNA-seq data. Genome Res. 31, 1867–1884 (2021).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
Lin, L. et al. The airway microbiome mediates the interaction between environmental exposure and respiratory health in humans. Nat. Med. 29, 1750–1759 (2023).
Ong, J.-S. et al. A comprehensive re-assessment of the association between vitamin D and cancer susceptibility using Mendelian randomization. Nat. Commun. 12, 246 (2021).
Zackular, J. P. et al. Dietary zinc alters the microbiota and decreases resistance to Clostridium difficile infection. Nat. Med. 22, 1330–1334 (2016).
Smith, A. B. et al. Enterococci enhance Clostridioides difficile pathogenesis. Nature 611, 780–786 (2022).
Yu, J. et al. Bifidobacterium longum promotes postoperative liver function recovery in patients with hepatocellular carcinoma. Cell Host Microbe 32, 131–144.e6 (2024).
Paik, D. et al. Human gut bacteria produce TH17-modulating bile acid metabolites. Nature 603, 907–912 (2022).
Li, J. et al. scBPS (version 1.0.0). Zenodo https://doi.org/10.5281/zenodo.15073160 (2025).
Acknowledgements
We thank J. Chen and H. Liu for technical assistance, and W. Pan for support in molecular experiments. This study was funded by the National Natural Science Foundation of China (32200535 to Y.M.), the Zhejiang Provincial Natural Science Foundation of China (2025C02153 to J.S.), and the China Postdoctoral Science Foundation (2023M732679 to J.L.).
Author information
Contributions
J.S., J.L. and Y.M. designed the study and developed statistical methodologies. W.-H.C., Y.C., Q.R., Q.Z. and Y.L. designed the animal study and conducted wet-lab experiments. J.L., Y.M., G.Z., C.C., Y. Zhou, Y. Zhang and C.D. performed data analysis and visualization. Y.M., W.-H.C. and J.S. provided guidance on data analysis and biological interpretations. J.L., Y.M., W.-H.C. and J.S. wrote the paper and response letters.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Alexander Kurilshikov, Qingbo Wang, Martin Zhang and Tao Zhang for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance comparison for scBPS.
a, The quantile-quantile plot illustrates results from null simulations. We randomly selected 1,000 genes as putative disease genes, with random GWAS gene weights matching the MAGMA z-score distributions of the gut microbes. b, Power analysis for causal simulations. Expression levels of causal genes were increased by factors of 1.1, 1.2, 1.3, 1.4, and 1.5 in the target cell types. We assessed the power to identify these target cell types at an FDR of 0.05 across these fold changes. c, Impact of GWAS power on results for scBPS, scDRS, scPagwas, LDSC-SEG and MAGMA Celltyping. We analyzed GWAS summary statistics of 12 UK Biobank traits across varying subsample sizes (5 K, 10 K, 20 K, 50 K, and 80 K samples) coupled with the human single-cell atlas using different methods. The median numbers of discovered cell types for the 12 traits were grouped by GWAS sample size. Dots represent traits and error bars denote upper and lower quantiles. The 12 traits are ASM (Asthma), BMD-HT (Heel Test), CLC (Clinical LDL Cholesterol), ECOL (College Education), Eczema, HDL (Free Cholesterol in HDL), LDL (Free Cholesterol in LDL), RBC (Red Blood Cell Count), RDW (Red Cell Distribution Width), SBP (Systolic Blood Pressure), Smoking, VLDL (Free Cholesterol in VLDL).
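The power calculation in panel b can be illustrated with a minimal sketch. It assumes that the FDR is controlled with the Benjamini-Hochberg procedure, which the caption does not state, and the p-values and target labels below are toy values rather than results from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def power_at_fdr(pvalues, is_target, alpha=0.05):
    """Fraction of simulated target cell types recovered at the given FDR."""
    qvalues = benjamini_hochberg(pvalues)
    hits = sum(1 for q, target in zip(qvalues, is_target) if target and q < alpha)
    return hits / sum(is_target)


# Toy p-values for five cell types; the first three are the simulated targets.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_targets = [True, True, True, False, False]
toy_power = power_at_fdr(toy_p, toy_targets)
```

In this toy case two of the three targets pass the adjusted threshold, so the estimated power is two thirds. Repeating the calculation for each fold change would give one point per fold change on a power curve.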
Extended Data Fig. 2 Associations between gut microbial clusters and human tissue modules.
a, Representative taxa at different taxonomic levels are displayed for each tissue-related bacteria cluster. The color of the dots indicates the percentage of taxa within the taxonomic level that were assigned to the respective bacteria clusters. b, Barplot summarizing the fraction of associations with FDR < 0.05 for each module. c, Representation of interaction strengths between the four tissue modules and the three bacteria clusters, highlighting the top three associations. The width of each line corresponds to the median BPSAUC value of the organ-taxon pairs within the respective module pairs. d, The top ten microbial taxa within clusters T1 and T2 in association with tissue module M1. e, Heatmap exhibiting the association strengths between the three bacteria clusters and the 24 tissues. f, Boxplot summarizing the distribution of BPSAUC values for bacteria clusters T1, T2, and T3 in association with the 24 tissues. The central lines indicate the median values. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinge to the smallest and largest values no further than 1.5× the interquartile range from the hinge. g, Histograms of the BPSAUC scores of all the organ-bacterial taxon associations. The vertical red line indicates the cutoff for outliers of the distribution, as identified by the exploreThresholds function in the AUCell R package. h, The associations of the organs with the bacterial taxa at all taxonomic levels that met the stringent threshold described in Fig. 1d. The colors of the boxes correspond to the BPSAUC values of the respective organ-taxon pairs. i, Boxplot summarizing the distribution of BPSAUC values for each tissue.
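The boxplot convention described in panel f (hinges at the quartiles, whiskers reaching the most extreme values within 1.5× the interquartile range) can be made concrete with a short sketch. The quartile interpolation used here (the "inclusive" method of Python's statistics module) is an assumption and may differ slightly from the hinges computed by the plotting software used in the study; the input values are toy scores.

```python
import statistics


def boxplot_stats(values):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Toy scores with one extreme value that falls beyond the upper fence.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 2.5]
toy_box = boxplot_stats(toy_scores)
```

Here the upper whisker ends at 0.8 and the value 2.5 would be drawn as an individual outlier point.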
Extended Data Fig. 3 Associations of gut microbial taxa with human tissues from validation datasets.
a, Venn plot showing the overlap of microbial taxa associated with kidney, heart, pancreas, and eye, as identified by our computational framework, with those identified from the GTEx and the Franke lab datasets using LDSC-SEG. The names of the overlapping taxa are labeled. b, Dot plots showing correlation of the number of taxa (stringent criterion) for individual organs, as identified by our computational framework, with that identified from the GTEx dataset using a significance threshold of p < 0.05 (left) and p < 0.1 (right). Significance was measured by Spearman correlation analysis. c, Same as panel (b) for results from the Franke lab dataset. d, Independent validation using GWAS summary data from the MiBioGen project. Correlation between the BPSAUC values of all organ-microbe pairs using GWAS data from the Lifelines project and those using GWAS data from the MiBioGen project. P-values were calculated by Pearson correlation analysis. e, Same as panel (d) for all cell type-microbe pairs.
Extended Data Fig. 4 Associations between gut microbiome and host cell types.
a, Representative taxa at the family level are displayed for each cell-type-related bacteria cluster from Fig. 2a. The color of the dots indicates the percentage of taxa within the taxonomic level that were assigned to the respective bacteria clusters. b, Overlap of the cell-type-related bacteria clusters (C1-C9, as shown in Fig. 3a) with the tissue-related bacteria clusters (T1-T3, as shown in Fig. 2a). The color of the dots represents the percentage of taxa within cell-type-related clusters that belong to each tissue-related cluster. c, Comparison of BPSAUC values across four cell type compartments. Boxplots inside the violin plots show the distribution of the BPSAUC values. The central line indicates the median. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinge to the smallest and largest values no further than 1.5× the interquartile range from the hinge. Significance was tested using Wilcoxon rank-sum tests. d, e, Barplots summarizing the fraction of associations with FDR < 0.05 for each cell-type module (d) and for cell types from the target organs within the four compartments (e). f, The top 10 interactions between the five cell type modules and the nine bacteria clusters. g, The connectivity among the taxa in terms of their BPSAUC values at the cell type level. Node size is relative to the centrality of the taxon. The edges indicate strong correlation (coefficient > 0.8) between the connected taxa. h, Histograms of the BPSAUC scores of all the cell type-bacterial taxon associations. The vertical red line indicates the cutoff for outliers of the distribution, as identified by the exploreThresholds function in the AUCell R package. i, Bar plot summarizing the cell type profiles of significant associations that reached the stringent threshold. The numbers of significant associations for individual cell types are shown in stacked bars. All taxonomic levels are summarized. j, Comparison of the bacterial taxa within the Stringent, Moderate and Nonsig groups regarding their significance of association with liver or hepatocyte terms in the Roadmap dataset and the EN-TEx dataset. P-values were determined using Wilcoxon rank-sum tests. k, Bar plots showing weighted degree, degree, and PageRank scores of the 44 taxa in the network in Fig. 2h.
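The network construction in panel g and the degree summaries in panel k can be sketched as follows. The sketch assumes Pearson correlation between taxon score profiles across cell types and defines weighted degree as the sum of the correlation coefficients on a node's edges; the caption specifies only the 0.8 cutoff, and the taxon names and profiles below are hypothetical toy values. PageRank is omitted for brevity.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_network(profiles, threshold=0.8):
    """Connect taxa whose profiles correlate above the threshold."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            edges[(a, b)] = r
    degree = {taxon: 0 for taxon in profiles}
    weighted_degree = {taxon: 0.0 for taxon in profiles}
    for (a, b), r in edges.items():
        for taxon in (a, b):
            degree[taxon] += 1
            weighted_degree[taxon] += r
    return edges, degree, weighted_degree


# Toy score profiles of five hypothetical taxa across four cell types.
toy_profiles = {
    "taxon_A": [1, 2, 3, 4],
    "taxon_B": [2, 4, 6, 8.5],
    "taxon_C": [4, 3, 2, 1],
    "taxon_D": [2, 1, 4, 3],
    "taxon_E": [1, 2, 3, 5],
}
toy_edges, toy_degree, toy_weighted = correlation_network(toy_profiles)
```

In the toy data, taxa A, B and E form a fully connected triangle, whereas C and D remain isolated because their correlations with the other taxa do not exceed 0.8.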
Extended Data Fig. 5 Heterogeneity of hepatocytes in gene expression and metabolism profiles.
a, Results of gene set enrichment analysis (GSEA) on genes correlated with Collinsella BPS scores among the hepatocytes. Significant pathways that reached the threshold of FDR < 0.05 are displayed. The pathways involved in cholesterol biosynthesis and metabolism are highlighted in red. b, Correlation of cholesterol pathways with Collinsella compared to random pathways. P-values were calculated using a Monte Carlo simulation approach. c, Correlations of Collinsella BPS quintiles with mean pathway scores across all KEGG pathways. Significance was assessed using a linear regression model. d, Comparisons of mean pathway scores between cells with the top and bottom 20% (left panel), 5% (middle panel) and 1% (right panel) Collinsella BPS scores. Significance was assessed using a t-test. e, Marker genes for hepatocyte zonation and their expression levels in the three subpopulations. f, Dot plots showing differentially expressed genes for the three hepatocyte subpopulations. The numbers of up-regulated (red) and down-regulated (blue) genes are summarized. g, Differential functions among the three hepatocyte subpopulations. Gene set enrichment analysis (GSEA) was performed to identify distinctive functional profiles for the three subpopulations. h, Expression levels of the genes CYP7A1 (encoding the rate-limiting enzyme of the bile-acid biosynthetic pathway), HMGCR (encoding the rate-limiting enzyme of cholesterol synthesis), and NR1H4 (encoding the FXR receptor) in the subpopulations of hepatocytes. i, Rankings of the top 30 marker metabolic reactions for each hepatocyte subpopulation. j, Mean level of metabolic flux for the reaction “Cholesterol −> Chenodeoxycholate” in the three hepatocyte subpopulations. k, Correlation of the reaction flux for “Cholesterol −> Chenodeoxycholate” and the Collinsella BPS values among the hepatocytes, as stratified by zonation. Significance of the correlations was estimated by Pearson correlation analysis.
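The Monte Carlo comparison in panel b can be sketched as an empirical p-value: the correlation observed for a pathway of interest is ranked against correlations obtained from randomly drawn gene sets of the same size. The scoring rule (mean expression per cell), the one-sided test and the add-one correction are assumptions of this sketch, not details given in the caption, and the gene names and values are toy data.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def pathway_score(expr, genes):
    """Mean expression of the given genes in each cell (gene -> per-cell list)."""
    n_cells = len(next(iter(expr.values())))
    return [mean(expr[g][c] for g in genes) for c in range(n_cells)]


def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value with the add-one correction."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def monte_carlo_p(expr, bps, pathway, n_draws=199, seed=0):
    """Compare the pathway-BPS correlation with same-size random gene sets."""
    rng = random.Random(seed)
    observed = pearson(pathway_score(expr, pathway), bps)
    background = sorted(expr)
    null = [
        pearson(pathway_score(expr, rng.sample(background, len(pathway))), bps)
        for _ in range(n_draws)
    ]
    return observed, empirical_p_value(observed, null)


# Toy data: 30 cells, 20 noise genes, and 3 genes that track the BPS exactly.
toy_rng = random.Random(1)
toy_bps = [toy_rng.gauss(0, 1) for _ in range(30)]
toy_expr = {f"noise_{i}": [toy_rng.gauss(0, 1) for _ in range(30)] for i in range(20)}
for k in (1, 2, 3):
    toy_expr[f"chol_{k}"] = [k * b for b in toy_bps]
toy_r, toy_p = monte_carlo_p(toy_expr, toy_bps, ["chol_1", "chol_2", "chol_3"])
```

With the add-one correction, the smallest attainable p-value is 1/(n_draws + 1), so the number of random draws sets the resolution of the test.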
Extended Data Fig. 6 Association of Collinsella with liver-related diseases.
a, Distribution of DPS values of the 10 liver-associated diseases among the 24 tissues. The central line indicates the median. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinge to the smallest and largest values no further than 1.5× the interquartile range from the hinge. b, Distribution of DPS values of the 10 liver-associated diseases among the 16 liver cell types. The central line indicates the median. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinge to the smallest and largest values no further than 1.5× the interquartile range from the hinge. c, Associations of all (upper) and top 1,000 (bottom) MAGMA z-scores derived from Collinsella with disease GWAS data. Association coefficients and p-values were generated by linear regression analyses. d, Mendelian randomization (MR) analysis inferring causal relationships of Collinsella with both HYPERCHOL and FH. The forest plot shows the MR estimates and 95% CI values of the bi-directional causal effects between Collinsella and both HYPERCHOL and FH, as estimated using eight different two-sample MR methods. The P values calculated by each MR method are listed. e, Multivariate linear regression model estimating the association between Collinsella and HYPERCHOL among the hepatocytes, stratified by zonation, while adjusting for the effect of cholesterol biosynthesis. f, g, Multivariate linear regression model estimating the association between Collinsella and FH among the hepatocytes, stratified by zonation, while adjusting for the effect of the cholesterol metabolism pathway (f) and the cholesterol biosynthesis pathway (g). h, Predicted PROSE distance of three Collinsella isolate proteins with key proteins involved in cholesterol metabolism. Collinsella proteins with both sequence and structure similarity are highlighted in black. Those with only structure similarity are shown in gray.
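As an illustration of how a two-sample MR estimate and its 95% CI in panel d can be obtained, the following sketch implements the fixed-effect inverse-variance weighted (IVW) estimator. Whether IVW is among the eight methods used is an assumption here, since the caption does not list them, and the per-variant effect sizes are toy values.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate, standard error and 95% CI."""
    # Each variant is weighted by its squared exposure effect over the
    # variance of its outcome effect.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / sum(weights)
    std_error = sqrt(1 / sum(weights))
    ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
    return estimate, std_error, ci


# Toy instruments: every variant implies the same ratio beta_outcome / beta_exposure = 0.5.
toy_bx = [0.10, 0.20, 0.15]
toy_by = [0.05, 0.10, 0.075]
toy_se = [0.01, 0.02, 0.01]
toy_est, toy_se_est, toy_ci = ivw_estimate(toy_bx, toy_by, toy_se)
```

Because all three toy variants share the same per-variant ratio, the pooled estimate equals that ratio; with real instruments the estimate is a weighted average of heterogeneous ratios.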
Extended Data Fig. 7 Normalized t-statistics between cells from the expected and unexpected cell types for the eight hepatocyte-associated bacteria.
a, Normalized t-statistics for differences in scBPS values between expected (hepatocytes) and unexpected (other) cells for each taxon, applying different gene selection thresholds. b, Normalized t-statistics for differences in BPSAUC values between hepatocytes and other cell types.
Supplementary information
Supplementary Information
Supplementary Figs. 1–9, Methods and Results.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Unprocessed western blots and/or gels.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Ma, Y., Cao, Y. et al. Integrating microbial GWAS and single-cell transcriptomics reveals associations between host cell populations and the gut microbiome. Nat Microbiol 10, 1210–1226 (2025). https://doi.org/10.1038/s41564-025-01978-w
DOI: https://doi.org/10.1038/s41564-025-01978-w


