Integrating microbial GWAS and single-cell transcriptomics reveals associations between host cell populations and the gut microbiome

Abstract

Microbial genome-wide association studies (GWAS) have uncovered numerous host genetic variants associated with gut microbiota. However, links between host genetics, the gut microbiome and specific cellular contexts remain unclear. Here we use a computational framework, scBPS (single-cell Bacteria Polygenic Score), to integrate existing microbial GWAS and single-cell RNA-sequencing profiles of 24 human organs, including the liver, pancreas, lung and intestine, to identify host tissues and cell types relevant to gut microbes. Analysing 207 microbial taxa and 254 host cell types, scBPS-inferred cellular enrichments confirmed known biology such as dominant communications between gut microbes and the digestive tissue module and liver epithelial cell compartment. scBPS also identified a robust association between Collinsella and the central-veinal hepatocyte subpopulation. We experimentally validated the causal effects of Collinsella on cholesterol metabolism in mice through single-nuclei RNA sequencing on liver tissue to identify relevant cell subpopulations. Mechanistically, oral gavage of Collinsella modulated cholesterol pathway gene expression in central-veinal hepatocytes. We further validated our approach using independent microbial GWAS data, alongside single-cell and bulk transcriptomic analyses, demonstrating its robustness and reproducibility. Together, scBPS enables a systematic mapping of the host–microbe crosstalk by linking cell populations to their interacting gut microbes.

Fig. 1: Association of gut microbiome with host tissues.
Fig. 2: Associations of gut microbiome with cell types.
Fig. 3: Genus Collinsella shows a robust correlation with hepatocytes.
Fig. 4: Heterogeneous hepatocyte subpopulations interact with Collinsella.
Fig. 5: Cholesterol metabolism in hepatocytes mediated correlation between Collinsella and hypercholesterolaemia.
Fig. 6: Effect of C. aero on cholesterol metabolism in vivo.

Data availability

The microbial GWAS summary data of the Dutch Microbiome Project were downloaded from https://dutchmicrobiomeproject.molgeniscloud.org. The Tabula Sapiens human single-cell transcriptome data were downloaded from https://tabula-sapiens-portal.ds.czbiohub.org/. The GWAS summary data of the MiBioGen project were downloaded from https://www.mibiogen.org/. The GWAS summary data of 10 liver-associated diseases were downloaded from the FinnGen database at https://r8.risteys.finngen.fi/ (accession phenocodes: T2D, T2D_WIDE, NAFLD, K11_TOXLIV, K11_FIBROCHIRLIV, FIBROLIV, E4_HYPERCHOL, E4_FH, E4_FH_IHD, CHIRHEP_NAS and C3_LIVER_INTRAHEPATIC_BILE_DUCTS_EXALLC). The KEGG pathways were downloaded from https://www.genome.jp/kegg/. The snRNA-seq data of mouse livers are deposited in the Gene Expression Omnibus (GEO) database under accession number GSE289267. Source data are provided with this paper.

Code availability

Code used for the analyses is available from Zenodo at https://doi.org/10.5281/zenodo.15073160 (ref. 97).

References

  1. Brandl, K., Kumar, V. & Eckmann, L. Gut–liver axis at the frontier of host–microbial interactions. Am. J. Physiol. Gastrointest. Liver Physiol. 312, G413–G419 (2017).

  2. Tang, W. W., Li, D. Y. & Hazen, S. L. Dietary metabolism, the gut microbiome, and heart failure. Nat. Rev. Cardiol. 16, 137–154 (2019).

  3. Schuit, F. C., Huypens, P., Heimberg, H. & Pipeleers, D. G. Glucose sensing in pancreatic β-cells: a model for the study of other glucose-regulated cells in gut, pancreas, and hypothalamus. Diabetes 50, 1–11 (2001).

  4. Mayer, E. A., Nance, K. & Chen, S. The gut–brain axis. Annu. Rev. Med. 73, 439–453 (2022).

  5. Yang, T., Richards, E. M., Pepine, C. J. & Raizada, M. K. The gut microbiota and the brain–gut–kidney axis in hypertension and chronic kidney disease. Nat. Rev. Nephrol. 14, 442–456 (2018).

  6. Budden, K. F. et al. Emerging pathogenic links between microbiota and the gut–lung axis. Nat. Rev. Microbiol. 15, 55–63 (2017).

  7. Floyd, J. L. & Grant, M. B. The gut–eye axis: lessons learned from murine models. Ophthalmol. Ther. 9, 499–513 (2020).

  8. Org, E. et al. Genetic and environmental control of host–gut microbiota interactions. Genome Res. 25, 1558–1569 (2015).

  9. Leamy, L. J. et al. Host genetics and diet, but not immunoglobulin A expression, converge to shape compositional features of the gut microbiome in an advanced intercross population of mice. Genome Biol. 15, 552 (2014).

  10. Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).

  11. Goodrich, J. K. et al. Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 19, 731–743 (2016).

  12. Lopera-Maya, E. A. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat. Genet. 54, 143–151 (2022).

  13. Ruhlemann, M. C. et al. Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome. Nat. Genet. 53, 147–155 (2021).

  14. Wang, J. et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat. Genet. 48, 1396–1406 (2016).

  15. Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407–1412 (2016).

  16. Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat. Genet. 53, 156–165 (2021).

  17. Srinivas, G. et al. Genome-wide mapping of gene–microbiota interactions in susceptibility to autoimmune skin blistering. Nat. Commun. 4, 2462 (2013).

  18. Parks, B. W. et al. Genetic control of obesity and gut microbiota composition in response to high-fat, high-sucrose diet in mice. Cell Metab. 17, 141–152 (2013).

  19. McKnite, A. M. et al. Murine gut microbiota is defined by host genetics and modulates variation of metabolic traits. PLoS ONE 7, e39191 (2012).

  20. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  21. Zhang, Y. et al. Benchmarking algorithms for pathway activity transformation of single-cell RNA-seq data. Comput. Struct. Biotechnol. J. 18, 2953–2961 (2020).

  22. Xiang, B. et al. Single cell sequencing analysis identifies genetics-modulated ORMDL3+ cholangiocytes having higher metabolic effects on primary biliary cholangitis. J. Nanobiotechnol. 19, 406 (2021).

  23. Elmentaite, R., Domínguez Conde, C., Yang, L. & Teichmann, S. A. Single-cell atlases: shared and tissue-specific cell types across human organs. Nat. Rev. Genet. 23, 395–410 (2022).

  24. Ma, Y. et al. Systematic dissection of pleiotropic loci and critical regulons in excitatory neurons and microglia relevant to neuropsychiatric and ocular diseases. Transl. Psychiatry 15, 24 (2025).

  25. Ma, Y. et al. Integrating single-cell sequencing data with GWAS summary statistics reveals CD16+ monocytes and memory CD8+ T cells involved in severe COVID-19. Genome Med. 14, 16 (2022).

  26. Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).

  27. Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell gene expression. Am. J. Hum. Genet. 101, 686–699 (2017).

  28. Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).

  29. Ma, Y. et al. Polygenic regression uncovers trait-relevant cellular contexts through pathway activation transformation of single-cell RNA sequencing data. Cell Genom. 3, 100383 (2023).

  30. Bryois, J. et al. Genetic identification of cell types underlying brain complex traits yields insights into the etiology of Parkinson’s disease. Nat. Genet. 52, 482–493 (2020).

  31. Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022).

  32. Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).

  33. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

  34. Frost, H. R. Variance-adjusted Mahalanobis (VAM): a fast and accurate method for cell-specific gene set scoring. Nucleic Acids Res. 48, e94 (2020).

  35. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  36. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

  37. Louis, P., Hold, G. L. & Flint, H. J. The gut microbiota, bacterial metabolites and colorectal cancer. Nat. Rev. Microbiol. 12, 661–672 (2014).

  38. Schroeder, B. O. & Bäckhed, F. Signals from the gut microbiota to distant organs in physiology and disease. Nat. Med. 22, 1079–1089 (2016).

  39. Bravo González-Blas, C. et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods 20, 1355–1367 (2023).

  40. Astbury, S. et al. Lower gut microbiome diversity and higher abundance of proinflammatory genus Collinsella are associated with biopsy-proven nonalcoholic steatohepatitis. Gut Microbes 11, 569–580 (2020).

  41. Lee, N. Y. et al. Lactobacillus attenuates progression of nonalcoholic fatty liver disease by lowering cholesterol and steatosis. Clin. Mol. Hepatol. 27, 110–124 (2021).

  42. Zhang, X. et al. Dietary cholesterol drives fatty liver-associated liver cancer by modulating gut microbiota and metabolites. Gut 70, 761–774 (2021).

  43. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).

  44. Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

  45. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).

  46. McLoughlin, K. et al. Host selection of microbiota via differential adhesion. Cell Host Microbe 19, 550–559 (2016).

  47. Schluter, J. & Foster, K. R. The evolution of mutualism in gut microbiota via host epithelial selection. PLoS Biol. 10, e1001424 (2012).

  48. Pettersen, V. K. & Arrieta, M.-C. Host–microbiome intestinal interactions during early life: considerations for atopy and asthma development. Curr. Opin. Allergy Clin. Immunol. 20, 138–148 (2020).

  49. Tripathi, A. et al. The gut–liver axis and the intersection with the microbiome. Nat. Rev. Gastroenterol. Hepatol. 15, 397–411 (2018).

  50. Choi, W. et al. Serotonin signals through a gut–liver axis to regulate hepatic steatosis. Nat. Commun. 9, 4824 (2018).

  51. Delzenne, N. M. et al. Contribution of the gut microbiota to the regulation of host metabolism and energy balance: a focus on the gut–liver axis. Proc. Nutr. Soc. 78, 319–328 (2019).

  52. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  53. Bernstein, B. E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).

  54. Rozowsky, J. et al. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 186, 1493–1511.e40 (2023).

  55. Liu, Z. et al. Network analyses in microbiome based on high-throughput multi-omics data. Brief. Bioinform. 22, 1639–1655 (2021).

  56. Matchado, M. S. et al. Network analysis methods for studying microbial communities: a mini review. Comput. Struct. Biotechnol. J. 19, 2687–2698 (2021).

  57. Ma, Y. & Li, M. D. Establishment of a strong link between smoking and cancer pathogenesis through DNA methylation analysis. Sci. Rep. 7, 1811 (2017).

  58. Ma, Y. et al. Integration of human organoids single‐cell transcriptomic profiles and human genetics repurposes critical cell type‐specific drug targets for severe COVID‐19. Cell Prolif. 57, e13558 (2023).

  59. Kriaa, A. et al. Microbial impact on cholesterol and bile acid metabolism: current status and future prospects. J. Lipid Res. 60, 323–332 (2019).

  60. Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature 572, 199–204 (2019).

  61. Halpern, K. B. et al. Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature 542, 352–356 (2017).

  62. Benito-Vicente, A. et al. Familial hypercholesterolemia: the most frequent cholesterol metabolism disorder caused disease. Int. J. Mol. Sci. 19, 3426 (2018).

  63. Parham, J. S. & Goldberg, A. C. Review of recent clinical trials and their impact on the treatment of hypercholesterolemia. Prog. Cardiovasc. Dis. 75, 90–96 (2022).

  64. Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669.e53 (2021).

  65. Li, C. et al. Gut microbiome and metabolome profiling in Framingham heart study reveals cholesterol-metabolizing bacteria. Cell 187, 1834–1852.e19 (2024).

  66. Richter, M. L. et al. Single-nucleus RNA-seq2 reveals functional crosstalk between liver zonation and ploidy. Nat. Commun. 12, 4264 (2021).

  67. Wang, X. et al. Comparative analysis of cell lineage differentiation during hepatogenesis in humans and mice at the single-cell transcriptome level. Cell Res. 30, 1109–1126 (2020).

  68. Gury-BenAri, M. et al. The spectrum and regulatory landscape of intestinal innate lymphoid cells are shaped by the microbiome. Cell 166, 1231–1246.e13 (2016).

  69. Andrlová, H. et al. MAIT and Vδ2 unconventional T cells are supported by a diverse intestinal microbiome and correlate with favorable patient outcome after allogeneic HCT. Sci. Transl. Med. 14, eabj2829 (2022).

  70. Banerjee, A. et al. Succinate produced by intestinal microbes promotes specification of tuft cells to suppress ileal inflammation. Gastroenterology 159, 2101–2115.e5 (2020).

  71. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).

  72. Zhao, R. et al. Single-cell and spatiotemporal transcriptomic analyses reveal the effects of microorganisms on immunity and metabolism in the mouse liver. Comput. Struct. Biotechnol. J. 21, 3466–3477 (2023).

  73. Kovatcheva-Datchary, P. et al. Simplified intestinal microbiota to study microbe–diet–host interactions in a mouse model. Cell Rep. 26, 3772–3783.e6 (2019).

  74. Mager, L. F. et al. Microbiome-derived inosine modulates response to checkpoint inhibitor immunotherapy. Science 369, 1481–1489 (2020).

  75. Drokhlyansky, E. et al. The human and mouse enteric nervous system at single-cell resolution. Cell 182, 1606–1622.e23 (2020).

  76. Gomez-Arango, L. F. et al. Low dietary fiber intake increases Collinsella abundance in the gut microbiota of overweight and obese pregnant women. Gut Microbes 9, 189–201 (2018).

  77. Sui, G., Jia, L., Quan, D., Zhao, N. & Yang, G. Activation of the gut microbiota–kynurenine–liver axis contributes to the development of nonalcoholic hepatic steatosis in nondiabetic adults. Aging 13, 21309 (2021).

  78. Wang, C. et al. Integrated microbiome and metabolome analysis reveals correlations between gut microbiota components and metabolic profiles in mice with methotrexate-induced hepatoxicity. Drug Des. Devel. Ther. 16, 3877–3891 (2022).

  79. Yin, X. et al. Structural changes of gut microbiota in a rat non-alcoholic fatty liver disease model treated with a Chinese herbal formula. Syst. Appl. Microbiol. 36, 188–196 (2013).

  80. Yang, X. et al. Alleviating effects of noni fruit polysaccharide on hepatic oxidative stress and inflammation in rats under a high-fat diet and its possible mechanisms. Food Funct. 11, 2953–2968 (2020).

  81. Khan, T. J. et al. Atorvastatin treatment modulates the gut microbiota of the hypercholesterolemic patients. Omics 22, 154–163 (2018).

  82. Martínez, I. et al. Diet-induced metabolic improvements in a hamster model of hypercholesterolemia are strongly linked to alterations of the gut microbiota. Appl. Environ. Microbiol. 75, 4175–4184 (2009).

  83. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  84. Ma, Y. et al. Integrative genomics analysis reveals a 21q22.11 locus contributing risk to COVID-19. Hum. Mol. Genet. 30, 1247–1258 (2021).

  85. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

  86. Skene, N. G. et al. Genetic identification of brain cell types underlying schizophrenia. Nat. Genet. 50, 825–833 (2018).

  87. Skene, N. G. & Grant, S. G. Identification of vulnerable cell types in major brain disorders using single cell transcriptomes and expression weighted cell type enrichment. Front. Neurosci. 10, 16 (2016).

  88. Liao, Y., Wang, J., Jaehnig, E. J., Shi, Z. & Zhang, B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 47, W199–W205 (2019).

  89. Alghamdi, N. et al. A graph neural network model to estimate cell-wise metabolic flux using single-cell RNA-seq data. Genome Res. 31, 1867–1884 (2021).

  90. Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).

  91. Lin, L. et al. The airway microbiome mediates the interaction between environmental exposure and respiratory health in humans. Nat. Med. 29, 1750–1759 (2023).

  92. Ong, J.-S. et al. A comprehensive re-assessment of the association between vitamin D and cancer susceptibility using Mendelian randomization. Nat. Commun. 12, 246 (2021).

  93. Zackular, J. P. et al. Dietary zinc alters the microbiota and decreases resistance to Clostridium difficile infection. Nat. Med. 22, 1330–1334 (2016).

  94. Smith, A. B. et al. Enterococci enhance Clostridioides difficile pathogenesis. Nature 611, 780–786 (2022).

  95. Yu, J. et al. Bifidobacterium longum promotes postoperative liver function recovery in patients with hepatocellular carcinoma. Cell Host Microbe 32, 131–144.e6 (2024).

  96. Paik, D. et al. Human gut bacteria produce TH17-modulating bile acid metabolites. Nature 603, 907–912 (2022).

  97. Li, J. et al. scBPS (version 1.0.0). Zenodo https://doi.org/10.5281/zenodo.15073160 (2025).

Acknowledgements

We thank J. Chen and H. Liu for technical assistance, and W. Pan for support in molecular experiments. This study was funded by the National Natural Science Foundation of China (32200535 to Y.M.), the Zhejiang Provincial Natural Science Foundation of China (2025C02153 to J.S.), and the China Postdoctoral Science Foundation (2023M732679 to J.L.).

Author information

Contributions

J.S., J.L. and Y.M. designed the study and developed statistical methodologies. W.-H.C., Y.C., Q.R., Q.Z. and Y.L. designed the animal study and conducted wet-lab experiments. J.L., Y.M., G.Z., C.C., Y. Zhou, Y. Zhang and C.D. performed data analysis and visualization. Y.M., W.-H.C. and J.S. provided guidance on data analysis and biological interpretations. J.L., Y.M., W.-H.C. and J.S. wrote the paper and response letters.

Corresponding authors

Correspondence to Wei-Hua Chen or Jianzhong Su.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks Alexander Kurilshikov, Qingbo Wang, Martin Zhang and Tao Zhang for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance comparison for scBPS.

a, The quantile-quantile plot illustrates results from null simulations. We randomly selected 1,000 genes as putative disease genes, with random GWAS gene weights matching the MAGMA z-score distributions of the gut microbes. b, Power analysis for causal simulations. Expression levels of causal genes were increased by factors of 1.1, 1.2, 1.3, 1.4 and 1.5 in the target cell types. We assessed the power to identify these target cell types at an FDR of 0.05 across these fold changes. c, Impact of GWAS power on results for scBPS, scDRS, scPagwas, LDSC-SEG and MAGMA Celltyping. We analyzed GWAS summary statistics of 12 UK Biobank traits across varying subsample sizes (5K, 10K, 20K, 50K and 80K samples) coupled with the human single-cell atlas using different methods. The median number of discovered cell types for the 12 traits was grouped by GWAS sample size. Dots represent traits and error bars denote upper and lower quantiles. The 12 traits are ASM (Asthma), BMD-HT (Heel Test), CLC (Clinical LDL Cholesterol), ECOL (College Education), Eczema, HDL (Free Cholesterol in HDL), LDL (Free Cholesterol in LDL), RBC (Red Blood Cell Count), RDW (Red Cell Distribution Width), SBP (Systolic Blood Pressure), Smoking and VLDL (Free Cholesterol in VLDL).
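As an illustration of the power calculation in panel b, the following is a minimal sketch that assumes the FDR is controlled with the Benjamini-Hochberg procedure (the caption does not name the procedure). The function names and the example p-values are hypothetical and are not taken from the scBPS code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all ranks at or above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def power_at_fdr(pvalues, is_target, alpha=0.05):
    """Fraction of simulated target cell types with adjusted p < alpha."""
    adjusted = benjamini_hochberg(pvalues)
    target_q = [q for q, t in zip(adjusted, is_target) if t]
    if not target_q:
        raise ValueError("at least one target cell type is required")
    return sum(q < alpha for q in target_q) / len(target_q)


if __name__ == "__main__":
    # Hypothetical p-values for five cell types; the first three are targets.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
    targets = [True, True, True, False, False]
    print(power_at_fdr(pvals, targets))
```

In a simulation such as the one described, this fraction would be computed once per fold change and then compared across methods.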

Extended Data Fig. 2 Associations between gut microbial clusters and human tissue modules.

a, Representative taxa at different taxonomic levels are displayed for each tissue-related bacteria cluster. The color of the dots indicates the percentage of taxa within the taxonomic level that were assigned to the respective bacteria clusters. b, Barplot summarizing the fraction of associations with FDR < 0.05 for each module. c, Representation of interaction strengths between the four tissue modules and the three bacteria clusters, highlighting the top three associations. The width of each line corresponds to the median BPSAUC value of the organ-taxon pairs within the respective module pairs. d, The top ten microbial taxa within clusters T1 and T2 in association with tissue module M1. e, Heatmap exhibiting the association strengths between the three bacteria clusters and the 24 tissues. f, Boxplot summarizing the distribution of BPSAUC values for bacteria clusters T1, T2 and T3 in association with the 24 tissues. The central lines indicate the median values. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinge to the smallest and largest values no further than 1.5× the interquartile range from the hinge. g, Histograms of the BPSAUC scores of all the organ-bacterial taxon associations. The vertical red line indicates the cutoff for outliers of the distribution, as identified by the exploreThresholds function in the AUCell R package. h, The associations of the organs with the bacterial taxa at all taxonomic levels that met the stringent threshold described in Fig. 1d. The colors of the boxes correspond to the BPSAUC values of the respective organ-taxon pairs. i, Boxplot summarizing the distribution of BPSAUC values for each tissue.
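The hinge and whisker definitions used for the boxplots in this and later captions can be made concrete with a small sketch. It assumes quartiles are computed by linear interpolation between order statistics (the "inclusive" method of Python's statistics module); the plotting software used for the figures may compute hinges slightly differently. The function name box_stats is hypothetical.

```python
from statistics import median, quantiles


def box_stats(values):
    """Median, hinges, whisker ends and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("at least two values are required")
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical scores with one extreme value.
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Note that the whiskers stop at observed values, not at the fences themselves, which is why an isolated extreme value is drawn as a separate point.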

Extended Data Fig. 3 Associations of gut microbial taxa with human tissues from validation datasets.

a, Venn plot showing the overlap of microbial taxa associated with kidney, heart, pancreas and eye, as identified by our computational framework, with those identified from the GTEx and the Franke lab datasets using LDSC-SEG. The names of the overlapping taxa are labeled. b, Dot plots showing the correlation of the number of taxa (stringent criterion) for individual organs, as identified by our computational framework, with that identified from the GTEx dataset using a significance threshold of p < 0.05 (left) and p < 0.1 (right). Significance was measured by Spearman correlation analysis. c, Same as panel (b) for results from the Franke lab dataset. d, Independent validation using GWAS summary data from the MiBioGen project. Correlation between the BPSAUC values of all organ-microbe pairs using GWAS data from the Lifelines project and those using GWAS data from the MiBioGen project. P-values were calculated by Pearson correlation analysis. e, Same as panel (d) for all cell type-microbe pairs.

Extended Data Fig. 4 Associations between gut microbiome and host cell types.

a, Representative taxa at the family level are displayed for each cell-type-related bacteria cluster from Fig. 2a. The color of the dots indicates the percentage of taxa within the taxonomic level that were assigned to the respective bacteria clusters. b, Overlap of the cell-type-related bacteria clusters (C1-C9, as shown in Fig. 3a) with the tissue-related bacteria clusters (T1-T3, as shown in Fig. 2a). The color of the dots represents the percentage of taxa within cell-type-related clusters that belong to each tissue-related cluster. c, Comparison of BPSAUC values across four cell type compartments. Boxplots inside the violin plots show the distribution of the BPSAUC values. The central line indicates the median. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinge to the smallest and largest values no further than 1.5× the interquartile range from the hinge. Significance was tested using Wilcoxon rank-sum tests. d, e, Barplots summarizing the fraction of associations with FDR < 0.05 for each cell-type module (d) and for cell types from the target organs within the four compartments (e). f, The top 10 interactions between the five cell type modules and the nine bacteria clusters. g, The connectivity among the taxa in terms of their BPSAUC values at the cell type level. Node size is relative to the centrality of the taxon. The edges indicate strong correlation (coefficient > 0.8) between the connected taxa. h, Histograms of the BPSAUC scores of all the cell type-bacterial taxon associations. The vertical red line indicates the cutoff for outliers of the distribution, as identified by the exploreThresholds function in the AUCell R package. i, Bar plot summarizing the cell type profiles of significant associations that reached the stringent threshold. The numbers of significant associations for individual cell types are shown in stacked bars. All taxonomic levels are summarized. j, Comparison of the bacterial taxa within the Stringent, Moderate and Nonsig groups regarding their significance of association with liver or hepatocyte terms in the Roadmap dataset and the EN-TEx dataset. P-values were determined using Wilcoxon rank-sum tests. k, Bar plots showing weighted degree, degree and PageRank scores of the 44 taxa in the network in Fig. 2h.

Extended Data Fig. 5 Heterogeneity of hepatocytes in gene expression and metabolism profiles.

a, Results of gene set enrichment analysis (GSEA) on genes correlated with Collinsella BPS scores among the hepatocytes. Significant pathways that reached the threshold of FDR < 0.05 are displayed. The pathways involved in cholesterol biosynthesis and metabolism are highlighted in red. b, Correlation of cholesterol pathways with Collinsella compared to random pathways. P-values were calculated using a Monte Carlo simulation approach. c, Correlations of Collinsella BPS quintiles with mean pathway scores across all KEGG pathways. Significance was assessed using a linear regression model. d, Comparisons of mean pathway scores between cells with the top and bottom 20% (left panel), 5% (middle panel) and 1% (right panel) Collinsella BPS scores. Significance was assessed using t-tests. e, Marker genes for hepatocyte zonation and their expression levels in the three subpopulations. f, Dot plots showing differentially expressed genes for the three hepatocyte subpopulations. The numbers of up-regulated (red) and down-regulated (blue) genes are summarized. g, Differential functions among the three hepatocyte subpopulations. Gene set enrichment analysis (GSEA) was performed to identify distinctive functional profiles for the three subpopulations. h, Expression levels of the genes CYP7A1 (encoding the rate-limiting enzyme of the bile-acid biosynthetic pathway), HMGCR (encoding the rate-limiting enzyme of cholesterol synthesis) and NR1H4 (encoding the FXR receptor) in the hepatocyte subpopulations. i, Rankings of the top 30 marker metabolic reactions for each hepatocyte subpopulation. j, Mean level of metabolic flux for the reaction “Cholesterol −> Chenodeoxycholate” in the three hepatocyte subpopulations. k, Correlation between the reaction flux for “Cholesterol −> Chenodeoxycholate” and the Collinsella BPS values among the hepatocytes, stratified by zonation. Significance of the correlations was estimated by Pearson correlation analyses.
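The Monte Carlo comparison in panel b can be sketched as follows. This is a generic illustration, not the authors' procedure: it assumes that a random pathway is a random gene set of fixed size, that a pathway score is the mean expression of its genes per cell, and that the empirical p-value uses the common (exceedances + 1) / (draws + 1) form. All function names and the synthetic data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def random_set_null(expr, scores, set_size, n_draws, rng):
    """Correlations between per-cell scores and random gene-set means.

    expr maps gene name -> list of per-cell expression values.
    """
    genes = list(expr)
    null = []
    for _ in range(n_draws):
        chosen = rng.sample(genes, set_size)
        pathway = [
            sum(expr[g][c] for g in chosen) / set_size
            for c in range(len(scores))
        ]
        null.append(pearson(pathway, scores))
    return null


def monte_carlo_p(observed, null_values):
    """One-sided empirical p-value with the +1 correction."""
    exceed = sum(v >= observed for v in null_values)
    return (exceed + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 50 genes measured in 30 cells, plus per-cell scores.
    expr = {f"g{i}": [rng.random() for _ in range(30)] for i in range(50)}
    scores = [rng.random() for _ in range(30)]
    null = random_set_null(expr, scores, set_size=5, n_draws=200, rng=rng)
    print(monte_carlo_p(0.3, null))
```

The +1 correction keeps the estimate away from zero, so the smallest reportable p-value is bounded by the number of random draws.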

Extended Data Fig. 6 Association of Collinsella with liver-related diseases.

a, Distribution of DPS values of the 10 liver-associated diseases among the 24 tissues. The central line indicates the median. The lower and upper hinges indicate the first and third quartiles. The lower and upper whiskers extend from the hinges to the smallest and largest values no further than 1.5× the interquartile range from the hinge. b, Distribution of DPS values of the 10 liver-associated diseases among the 16 liver cell types. Box plot elements are defined as in a. c, Associations of all (upper) and the top 1,000 (bottom) MAGMA z-scores derived from Collinsella with disease GWAS data. Association coefficients and P values were generated by linear regression analyses. d, Mendelian randomization (MR) analysis inferring causal relationships of Collinsella with both HYPERCHOL and FH. The forest plot shows the MR estimates and 95% CI values of the bidirectional causal effects between Collinsella and both HYPERCHOL and FH, as estimated using eight different two-sample MR methods. The P values calculated by each MR method are listed. e, Multivariate linear regression model estimating the association between Collinsella and HYPERCHOL among hepatocytes, stratified by zonation, while adjusting for the effect of cholesterol biosynthesis. f, g, Multivariate linear regression model estimating the association between Collinsella and FH among hepatocytes, stratified by zonation, while adjusting for the effect of the cholesterol metabolism pathway (f) and the cholesterol biosynthesis pathway (g). h, Predicted PROSE distance of three Collinsella isolate proteins with key proteins involved in cholesterol metabolism. Collinsella proteins with both sequence and structure similarity are highlighted in black. Those with only structure similarity are shown in gray.
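The box plot convention described for panels a and b (hinges at the quartiles, whiskers reaching the most extreme values within 1.5× the interquartile range of the hinges) can be illustrated with a short sketch. The quartile interpolation used here (Python's "inclusive" method) is an assumption, since plotting tools differ slightly in how they compute hinges, and the input values are invented for illustration.

```python
import statistics


def box_stats(values):
    """Median, hinges, whisker ends and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Illustrative values only; 100 lies beyond the upper fence.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this example the upper whisker ends at 9 rather than 100, because 100 lies more than 1.5× the interquartile range above the third quartile and would be drawn as an individual point.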

Extended Data Fig. 7 Normalized t-statistics between cells from the expected and unexpected cell types for the eight hepatocyte-associated bacteria.

a, Normalized t-statistics for differences in scBPS values between expected (hepatocytes) and unexpected (other) cells for each taxon, applying different gene selection thresholds. b, Normalized t-statistics for differences in BPSAUC values between hepatocytes and other cell types.

Supplementary information

Source data

Source Data Fig. 1 (XLSX)

Statistical source data.

Source Data Fig. 2 (XLSX)

Statistical source data.

Source Data Fig. 3 (XLSX)

Statistical source data.

Source Data Fig. 4 (XLSX)

Statistical source data.

Source Data Fig. 5 (XLSX)

Statistical source data.

Source Data Fig. 6 (PDF)

Unprocessed western blots and/or gels.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Ma, Y., Cao, Y. et al. Integrating microbial GWAS and single-cell transcriptomics reveals associations between host cell populations and the gut microbiome. Nat Microbiol 10, 1210–1226 (2025). https://doi.org/10.1038/s41564-025-01978-w

  • DOI: https://doi.org/10.1038/s41564-025-01978-w
