Scientific Reports
A diagnostic model based on pulmonary microbiota and host gene expression to distinguish colonization from pneumonia
  • Article
  • Open access
  • Published: 03 April 2026

  • Zhangfan Fu1 na1,
  • Yuhan Sun1 na1,
  • Haijun Yao3,4,5 na1,
  • Qihui Liu1,
  • Qiran Zhang1,
  • Jin Hu3,4,5,
  • Yang Zhou1,
  • Ning Jiang1,
  • Jingwen Ai1,
  • Jialin Jin1 &
  • Wenhong Zhang1,2 

Scientific Reports, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biomarkers
  • Computational biology and bioinformatics
  • Diseases
  • Medical research
  • Microbiology
  • Pathogenesis

Abstract

Pneumonia remains a leading cause of death worldwide. Conventional diagnostic approaches frequently fail to distinguish microbial colonization from true infection in the lower respiratory tract, which complicates clinical decision-making and contributes to antibiotic overuse. Improved diagnostic strategies are urgently needed. In this prospective, single-center study, deep sputum specimens were collected from patients with respiratory colonization (n = 17) and infectious pneumonia (n = 27) admitted to the neurosurgical ICU of Huashan Hospital. Metagenomic next-generation sequencing (mNGS) and metatranscriptomic profiling were performed to characterize both the pulmonary microbiota and the host immune response, and these features were then integrated to construct a diagnostic model. Microbial community profiling revealed reduced alpha diversity and enrichment of metabolically active pathogenic taxa in the infection group, consistent with a dysbiotic state permissive to invasion. In contrast, the colonization group showed a more balanced microbial ecosystem. Transcriptomic analyses identified 2232 host genes that were differentially expressed between the two groups. The colonization group showed marked activation of the Wnt, MAPK, chemokine, and focal adhesion pathways, which are functionally implicated in epithelial barrier maintenance and early immune homeostasis. A multi-omics diagnostic model incorporating seven gene features (ANKRD52, ZC3HAV1L, SERPINE3, CDPF1, ZNF720, TAGLN3, and LRRC15) discriminated between colonization and infection with an AUC of 0.951 in the training cohort and 0.875 in the validation set. By jointly analyzing the pulmonary microbiome and host transcriptome, this study provides insight into the host–microbe interactions that distinguish colonization from infection and presents a predictive model with potential clinical relevance.
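
For readers less familiar with the two summary metrics reported in the abstract, the following minimal Python sketch illustrates how an alpha diversity index and an AUC can be computed. It is not the authors' pipeline (their analysis scripts are in R), and the abstract does not state which alpha diversity index was used; the Shannon index is assumed here purely for illustration. The function names and all numeric inputs are hypothetical toy values, not study data.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon alpha diversity, H = -sum(p_i * ln p_i), over taxa with nonzero counts.

    Assumption: Shannon is used only as an example index; the source
    abstract does not name the index the authors computed.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive sample
    (e.g. infection) receives a higher model score than a randomly chosen
    negative sample (e.g. colonization); ties count as one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical read counts per taxon: an even community versus one
    # dominated by a single taxon (toy values, not study data).
    even_community = [25, 25, 25, 25]
    dominated_community = [97, 1, 1, 1]
    print("Shannon, even community:     ", round(shannon_index(even_community), 3))
    print("Shannon, dominated community:", round(shannon_index(dominated_community), 3))

    # Hypothetical model scores for the two groups (toy values).
    infection_scores = [0.9, 0.8, 0.4]
    colonization_scores = [0.7, 0.3, 0.2]
    print("AUC:", round(auc(infection_scores, colonization_scores), 3))
```

In this toy example the community dominated by one taxon yields a lower Shannon index than the even community, which is the direction of change the abstract describes for the infection group. An AUC of 1.0 would mean every infection sample scored above every colonization sample, and 0.5 would mean no discrimination.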

Data availability

The raw sequencing data generated in this study have been deposited in the Genome Sequence Archive (GSA) under accession number CRA032964. The data can be accessed via the following link: https://ngdc.cncb.ac.cn/gsa/browse/CRA032964. All analysis scripts (R code) used for statistical analysis, machine learning modeling, and figure generation are publicly available on GitHub at https://github.com/fuzhangfan-lgtm/colonization-and-infectious-pneumonia.git. The study protocol is available from the corresponding author upon reasonable request.

Funding

This work was supported by research grants from the Shanghai Science and Technology Committee (20Y11900400) and the National Key Research and Development Program of China (2022YFC2009802, 2022YFC2009800, 2022YFC2009801).

Author information

Author notes
  1. Zhangfan Fu, Yuhan Sun and Haijun Yao have contributed equally to this work.

Authors and Affiliations

  1. Department of Infectious Diseases, Shanghai Key Laboratory of Infectious Diseases and Biosafety Emergency Response, National Medical Center for Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China

    Zhangfan Fu, Yuhan Sun, Qihui Liu, Qiran Zhang, Yang Zhou, Ning Jiang, Jingwen Ai, Jialin Jin & Wenhong Zhang

  2. Shanghai Sci-Tech Inno Center for Infection & Immunity, Shanghai, 200052, China

    Wenhong Zhang

  3. Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200040, China

    Haijun Yao & Jin Hu

  4. National Center for Neurological Disorders, Shanghai, 200040, China

    Haijun Yao & Jin Hu

  5. Department of Neurosurgery and Neurocritical Care, Huashan Hospital, Fudan University, Shanghai, 200040, China

    Haijun Yao & Jin Hu

Contributions

Z.F.F. and Y.H.S. wrote the main manuscript text. H.J.Y. and J.H. prepared Figs. 1, 2 and 3. Q.H.L., Q.Z., Y.Z., N.J., J.W.A., J.L.J., and W.H.Z. contributed to data analysis and interpretation. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Ning Jiang, Jingwen Ai or Jialin Jin.

Ethics declarations

Competing interests

All authors have read the journal’s policy on disclosure of potential conflicts of interest and have none to declare.

Ethical approval and consent to participate

All participants in the parent trial agreed to participate in this study and provided written informed consent (Ky2020-1304).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1 (DOCX)

Supplementary Information 2 (XLSX)

Supplementary Information 3 (XLSX)

Supplementary Information 4 (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fu, Z., Sun, Y., Yao, H. et al. A diagnostic model based on pulmonary microbiota and host gene expression to distinguish colonization from pneumonia. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44972-w

  • Received: 20 February 2025

  • Accepted: 16 March 2026

  • Published: 03 April 2026

  • DOI: https://doi.org/10.1038/s41598-026-44972-w

Keywords

  • Respiratory colonization
  • Infectious pneumonia
  • Metagenomic next-generation sequencing (mNGS)
  • Host gene expression
  • Diagnostic model