Introduction

Community-acquired pneumonia (CAP) is a globally prevalent respiratory infectious disease and remains a major cause of morbidity and mortality worldwide1,2. The annual incidence of CAP is approximately 649 to 847 adults per 100,000 population, leading to around 1.6 million hospitalizations every year in the United States2. Severe CAP (SCAP) is common in the intensive care unit (ICU), accounting for 17–21% of hospitalized CAP patients, and the 30-day and 6-month case fatality rates of SCAP are as high as 27% and 39%, respectively3,4. Timely and accurate identification of pathogens is critical for proper diagnosis and effective treatment of CAP, especially in SCAP cases. Currently, traditional etiological detection methods, such as culture, smear microscopy, antigen or antibody testing, and nucleic acid amplification testing (NAAT), are commonly used for the clinical diagnosis of CAP5. However, the overall pathogen detection rate in CAP is less than 50% when only these routine tests are used5,6. An unknown etiology may lead to delayed or insufficient anti-infective therapy and worsen the prognosis. Therefore, early, rapid, and accurate pathogen diagnostic tools are needed for SCAP.

In recent years, metagenomic next-generation sequencing (mNGS), an unbiased high-throughput nucleic acid sequencing technology, has been widely adopted in clinical settings for pathogen detection7,8. mNGS offers higher efficiency, increased sensitivity, faster turnaround time, and a broader pathogen spectrum, including bacteria, fungi, viruses, Mycobacterium tuberculosis, parasites, and atypical pathogens7,9. Studies have shown that, compared to conventional microbiological tests (CMTs) alone, mNGS combined with CMTs reduced the time to clinical improvement in SCAP8. Many studies have confirmed that mNGS is both feasible and effective in SCAP patients7,10,11. For example, mNGS of bronchoalveolar lavage fluid (BALF) samples from SCAP patients showed significantly higher sensitivity (97.54% vs 28.68%), coincidence (90.34% vs 35.17%), and negative predictive value (80.00% vs 13.21%) than CMTs7. However, few studies have compared the diagnostic efficiency and pathogen spectrum of BALF mNGS, blood mNGS, and CMTs together in SCAP patients11,12,13. In addition, the efficiency of mNGS in pathogen detection has rarely been compared among mild CAP, SCAP (survivor), and SCAP (death) patients. Moreover, whether SCAP (death) patients have a specific pathogen profile that contributes to their worse prognosis deserves further investigation.

The CURB-65 and PSI scoring systems are widely used to evaluate the severity of CAP, which helps predict patients’ prognosis and survival4,5. Although it is relatively easy to distinguish mild CAP from SCAP with clinical scoring systems, the outcome of SCAP patients remains difficult to predict because of the similarity of their clinical symptoms and laboratory indicators14. Thus, it is crucial to better understand the heterogeneity among SCAP patients and to identify better prognostic biomarkers. Bulk RNA-seq of peripheral blood mononuclear cells (PBMC) combined with bioinformatic analysis enables dynamic tracing of the immune system and inflammatory host responses in SCAP patients14,15. The construction of key gene regulatory networks based on RNA-sequencing data is well suited to identifying biomarkers of disease severity and prognosis14,16. Most previous studies have concentrated on identifying biomarkers for mild or severe CAP, but there has been limited investigation into the host PBMC transcriptome signature associated with poor outcomes in the SCAP population, with the notable exception of studies on severe COVID-19 cases during the pandemic14,17.

This study seeks to provide a comprehensive landscape of the microbiome, transcriptome, clinical characteristics, and laboratory markers in SCAP, with the objective of identifying novel biomarkers associated with disease severity and prognosis in SCAP patients. We therefore conducted a prospective multicenter study to explore the pathogen distribution in mild CAP, SCAP (survivor), and SCAP (death) patients using BALF and PBMC DNA + RNA mNGS as well as CMTs. We further analyzed the diagnostic performance of blood and BALF DNA + RNA mNGS in CAP cases with different outcomes or severity of illness. PBMC samples from 89 patients were collected for bulk RNA-seq, and changes in host gene networks were analyzed using bioinformatics.

Methods

Study design and participant enrollment

During the COVID-19 management period, a prospective multi-center study was conducted in 8 hospitals in Shanghai from January 1, 2022, to December 31, 2023, to evaluate the clinical, microbiological, and transcriptomic characteristics of mild CAP and SCAP. We obtained written informed consent from participants or their legal representative within 24 h of hospital admission. This study was approved by the Ruijin Hospital Ethics Committee, Shanghai Jiao Tong University School of Medicine (Ethics Approval No.: Ruijin Hospital Ethics Committee 2021–73); Shanghai General Hospital, Shanghai Jiao Tong University (Ethics Approval No.: 2021KY041); Zhongshan Hospital, Fudan University (Ethics Approval No.: B2021-183R); Xinhua Hospital, Shanghai Jiao Tong University School of Medicine (Ethics Approval No.: XHEC-C-2021-038-1); Shanghai Chest Hospital, Shanghai Jiao Tong University (Ethics Approval No.: 2021-130); Shanghai Public Health Clinical Center (Ethics Approval No.: 2021-S035-02); Renji Hospital, Shanghai Jiao Tong University School of Medicine (Ethics Approval No.: KY2021-103-B); and Shanghai Sixth People’s Hospital, Shanghai Jiao Tong University (Ethics Approval No.: IS2147). Inclusion criteria included: adults over 18 years of age; no gender limitations; patients with a diagnosis of CAP or SCAP; and newly diagnosed pneumonia patients. 
Exclusion criteria included: being under 18 years of age; active infection with SARS-CoV-2 (confirmed by routine PCR or nasopharyngeal swab at admission) or diagnosed with COVID-19 pneumonia; immunosuppressed status due to other diseases or treatments, such as HIV infection or ongoing immunosuppressive therapy (e.g., organ transplant recipients, long-term use of corticosteroids or other immunosuppressive medications), which results in severe and sustained immunodeficiency; active massive hemoptysis, severe bleeding disorders, and coagulation disorders; hospitalized for more than two consecutive days in the past three months; and refusal to undergo BAL.

CAP was defined as the presence of one or more acute clinical symptoms or signs compatible with CAP, including dyspnea, cough, purulent sputum, inspiratory crackles, hypoxia, fever (> 38 °C), and radiologic features of pneumonia on CT or X-ray. According to the guidelines of the American Thoracic Society and Infectious Diseases Society of America, SCAP was defined as CAP in patients who met either one major criterion or three or more minor criteria. The major criteria are septic shock requiring vasopressor therapy and respiratory failure requiring mechanical ventilation. The minor criteria are a respiratory rate greater than 30 breaths per minute, a PaO2/FIO2 ratio of less than 250, multilobar infiltrates, confusion or disorientation, uremia (defined as a blood urea nitrogen level exceeding 20 mg/dl), leukopenia, thrombocytopenia, hypothermia, and hypotension that necessitates aggressive fluid resuscitation. Mild community-acquired pneumonia (mild CAP) refers to patients who meet the general criteria for CAP but do not fulfill the diagnostic criteria for SCAP. These patients exhibit typical CAP symptoms, such as cough, dyspnea, and fever, but do not require intensive interventions such as mechanical ventilation or vasopressor therapy, which are characteristic of SCAP.
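The severity rule in this definition (one major criterion, or three or more minor criteria) can be sketched as a small function. This is only an illustration and not code used in the study: the `Assessment` fields and the `classify_cap` helper are hypothetical names, the criteria not labeled as major in the text are treated as the minor criteria, and leukopenia, thrombocytopenia, and hypothermia are passed in as clinician-assessed flags because the text gives no numeric cut-offs for them.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Admission findings for one CAP patient (hypothetical field names)."""
    # Major criteria
    septic_shock_on_vasopressors: bool = False
    mechanical_ventilation: bool = False
    # Minor criteria with thresholds stated in the text
    respiratory_rate: float = 0.0   # breaths per minute
    pao2_fio2: float = 500.0        # PaO2/FIO2 ratio
    bun_mg_dl: float = 0.0          # blood urea nitrogen, mg/dl
    # Minor criteria recorded as clinician-assessed flags (no cut-off in the text)
    multilobar_infiltrates: bool = False
    confusion: bool = False
    leukopenia: bool = False
    thrombocytopenia: bool = False
    hypothermia: bool = False
    hypotension_needing_fluids: bool = False


def count_minor(a: Assessment) -> int:
    """Count how many minor criteria are met."""
    return sum([
        a.respiratory_rate > 30,
        a.pao2_fio2 < 250,
        a.bun_mg_dl > 20,
        a.multilobar_infiltrates,
        a.confusion,
        a.leukopenia,
        a.thrombocytopenia,
        a.hypothermia,
        a.hypotension_needing_fluids,
    ])


def classify_cap(a: Assessment) -> str:
    """Return 'SCAP' for one major or at least three minor criteria, else 'mild CAP'."""
    major = a.septic_shock_on_vasopressors or a.mechanical_ventilation
    return "SCAP" if major or count_minor(a) >= 3 else "mild CAP"
```

For instance, a patient with a respiratory rate of 34, a PaO2/FIO2 ratio of 200, and confusion meets three minor criteria and would be classified as SCAP, whereas the first two findings alone would leave the patient in the mild CAP group.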

In total, our study enrolled 89 CAP patients, including 14 mild CAP and 75 SCAP cases. All patients received treatment according to the CAP consensus guidelines in China. PBMC samples from all patients were collected within 48 h of admission and sent to the Beijing Genomics Institute for bulk RNA-seq. At the same time, BALF and PBMC samples from 14 mild CAP and 65 SCAP patients were also collected for DNA + RNA mNGS. Baseline data at admission were collected, including demographic characteristics (sex, age, BMI, smoking or drinking history), medical history, comorbidities, laboratory indicators (blood routine, blood biochemistry, electrolytes, arterial blood gas analysis, immune indicators), microbiologic results of CMTs, and all variables required for calculating CURB-65 and PSI scores. The primary outcome was 30-day all-cause mortality.

Conventional microbiological tests (CMTs)

Bacterial and fungal cultures, microscopic smears of respiratory specimens, and polymerase chain reaction (PCR) assays of blood and throat swabs for virus detection (including influenza virus, parainfluenza virus, coronavirus, respiratory syncytial virus, adenovirus, human metapneumovirus, rhinovirus and enterovirus) were routinely conducted. Urine antigen tests (for Streptococcus pneumoniae and Legionella pneumophila) and serology tests (including β-D-1,3 glucan detection, aspergillus galactomannan detection, and pathogen specific antibody detection) were also performed after admission.

Sample collection

BALF samples were collected from the most severe lesion sites through bedside fibreoptic bronchoscopy. Each BALF sample was divided into three tubes: two were sent to the Beijing Genomics Institute for DNA and RNA mNGS, and one was used for bacterial and fungal culture in the hospital’s microbiological laboratory. Disposable sterile needles were used to collect whole blood samples into three specialized nucleic acid-free tubes (10 ml each) within 48 h of admission. Specimen quality was strictly controlled, and hemolysis was carefully avoided. Two blood samples were sent for DNA and RNA mNGS, while the third was used for PBMC separation by density gradient centrifugation. Subsequently, 1 ml of TRIzol reagent was added to each PBMC sample, which was then stored at -80 °C before being sent to the Beijing Genomics Institute for bulk RNA-seq.

DNA + RNA mNGS of BALF and blood

The DNA and RNA mNGS workflow comprised the following steps: DNA extraction, RNA extraction, metagenomic library preparation, sequencing, and data analysis. All samples followed a standardized protocol: 300 µL for plasma and 450 µL for BALF, with 300 µL used after lysis. Quality control ensured consistent DNA/RNA extraction and minimized dilution effects. The details were as follows: (1) DNA extraction: DNA was extracted from BALF and blood samples using the QIAamp® UCP Pathogen DNA Kit (Qiagen) according to the manufacturer’s instructions. Benzonase (Qiagen) and Tween20 were used to remove human DNA. (2) RNA extraction: The QIAamp® Viral RNA Kit (Qiagen) and the Ribo-Zero rRNA Removal Kit (Illumina) were used to extract RNA and remove ribosomal RNA. (3) Metagenomic library preparation and sequencing: cDNA was generated from the RNA template by reverse transcription before library preparation. DNA and cDNA were then used separately for library construction with the Nextera XT DNA Library Prep Kit (Illumina, San Diego, CA). All libraries were quantified with the Agilent 2100 (Agilent Technologies, Santa Clara, CA) and Qubit 2.0 (Invitrogen, USA). Each qualified DNA or cDNA library was converted into a single-stranded circular DNA library by denaturation and circularization. The single-stranded circular DNA molecules were replicated by rolling circle amplification to form DNA nanoballs (DNBs), each containing multiple copies. The DNBs were loaded into the patterned wells of the chip using high-density DNA nanochip technology and sequenced by combinatorial probe-anchor synthesis (cPAS) on the MGISEQ-2000 platform (MGI, China). (4) Data analysis: First, raw data were quality-filtered using SOAPnuke (v1.5.0); adapter sequences, low-quality reads (Q < 30), short reads (length < 36 bp), duplicate reads, and low-complexity reads were removed before subsequent analysis.
Trimmed reads were aligned against the human reference genome (hs37d5) using Bowtie2 to remove human reads. The remaining non-human sequences were then aligned to microbial genome databases of bacteria, viruses, fungi, parasites, and specific pathogens (Genome Database: http://ftp.ncbi.nlm.nih.gov/genomes/genbank/). Species annotation of the mapped data was conducted with Kraken2, and species-level abundance was estimated with Bracken, which applies a Bayesian algorithm to the Kraken classification results. Finally, suspected pathogens were listed together with their strictly mapped read counts, coverage rates, and depths.
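The read-level filters named in the data analysis step (short reads, low-quality reads, duplicate reads, low-complexity reads) can be illustrated with a toy filter. This is a sketch under stated assumptions and does not reproduce SOAPnuke: the text does not say how the Q < 30 rule is applied, so it is applied here to the mean base quality of each read, and the low-complexity rule (one base making up more than 80% of the read) is an arbitrary illustrative choice. The function names are hypothetical, and adapter trimming is omitted.

```python
from collections import Counter

MIN_LENGTH = 36           # from the text: reads shorter than 36 bp are removed
MIN_MEAN_Q = 30           # from the text: Q < 30 is low quality (per-read mean is an assumption)
MAX_BASE_FRACTION = 0.8   # assumption: crude low-complexity cut-off


def keep_read(seq, quals):
    """Return True if a read passes the length, quality, and complexity checks."""
    if len(seq) < MIN_LENGTH:
        return False
    if sum(quals) / len(quals) < MIN_MEAN_Q:
        return False
    # Low complexity: a single base dominates the read.
    top_count = Counter(seq).most_common(1)[0][1]
    if top_count / len(seq) > MAX_BASE_FRACTION:
        return False
    return True


def filter_reads(reads):
    """Drop exact duplicate sequences, then apply keep_read to the rest.

    `reads` is an iterable of (sequence, list_of_phred_scores) pairs.
    """
    seen = set()
    kept = []
    for seq, quals in reads:
        if seq in seen:
            continue  # duplicate read
        seen.add(seq)
        if keep_read(seq, quals):
            kept.append((seq, quals))
    return kept
```

Only reads that survive such filtering would then be passed to host removal and taxonomic classification.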

In each mNGS sequencing batch, two negative controls (sterile water) were included to monitor contamination, and bioinformatics pipelines were used to remove common contaminants. Strict aseptic procedures and dual-index barcoding minimized cross-contamination. Sequencing was performed on the MGISEQ-2000 platform, processing 48 samples per chip with a depth of 40–100 M reads per sample. Quality control was implemented at multiple stages: reference materials with known pathogens were used during reagent production, and bacterial-spiked positive controls were included in routine testing to ensure accuracy.

Pathogen classification and mNGS vs. CMT comparison

Pneumonia-associated microorganisms, referred to as pathogens, are defined as microorganisms that can contribute to the onset or exacerbation of disease under specific clinical conditions. While Corynebacterium18, Enterococcus19, TTV20, and Candida21 are typically considered part of the normal flora in healthy individuals, they may act as opportunistic pathogens in patients with severe pneumonia, particularly those with compromised immune systems or lung tissue damage. In clinical reports provided by BGI, these microorganisms were identified as pneumonia-associated microorganisms rather than background flora due to their potential role in disease progression. The classification of background microorganisms takes into account microbial colonization characteristics in specific body sites, such as the skin, oral cavity, and gut, as well as their pathogenicity and virulence. Microorganisms that are commonly present, have stable abundance, and are not typically associated with disease in healthy individuals are considered background microorganisms. In contrast, pathogens are identified based on their potential to cause disease under certain clinical conditions. This classification, guided by mNGS technology, epidemiological data, and clinical experience, ensures that relevant microorganisms are appropriately categorized in the context of severe pneumonia. We compared CMT and mNGS results based on clinical judgment. Physicians integrated clinical, laboratory, and imaging data for pathogen identification. While results may differ, the two methods are complementary, with final identification determined by physician expertise.

RNA-seq of PBMC and bioinformatics analysis

Total RNA was first extracted from the PBMC of each CAP patient using TRIzol (Invitrogen). The quality, integrity, and concentration of the RNA were assessed. A transcriptome library was constructed from 2 µg of RNA using KAPA RNA HyperPrep with Ribo-Zero rRNA Removal Kit (Roche). RNA-sequencing was then performed on the DNBSEQ platform to obtain raw data. Raw data were filtered using SOAPnuke (v1.5.6) to produce clean data. The following reads were removed: (1) reads containing adapter sequences; (2) reads with an unknown base (N) content of ≥ 5%; (3) low-quality reads (i.e., reads in which bases with a quality score below 15 account for more than 20% of all bases).

Clean data were aligned to the reference genome using HISAT2 (v2.1.0) and Bowtie2 (v2.3.4.3). Gene expression quantification was performed using RSEM (v1.3.1), and the clustering heatmap of gene expression across samples was generated using pheatmap (v1.0.8). Read count normalization and differential gene expression analysis were performed using DESeq2 (v1.4.5) with a Q value ≤ 0.05 or FDR ≤ 0.001 for three pairwise comparisons. Mild CAP was compared with SCAP (survivor) and with SCAP (death) to assess the transcriptomic differences between mild and severe pneumonia, and SCAP (survivor) was compared with SCAP (death) to investigate gene expression patterns associated with patient prognosis. GO and KEGG pathway enrichment analyses of differentially expressed genes (DEGs) were performed using Phyper, with the threshold set at Q value ≤ 0.05. The pathway analysis was performed using the KEGG database22,23,24.
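Enrichment testing of this kind is usually a hypergeometric test (the name Phyper matches the hypergeometric distribution function `phyper` in R). The sketch below shows the calculation in plain Python and is not the authors' pipeline. The function names are hypothetical, and because the text does not name the multiple-testing correction behind the Q value, the Benjamini-Hochberg procedure is assumed.

```python
from math import comb


def enrichment_p(k, n, M, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: background genes, M: background genes annotated to the pathway,
    n: DEGs tested, k: DEGs annotated to the pathway.
    """
    upper = min(n, M)
    total = comb(N, n)
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Return BH-adjusted values (assumed here to correspond to the Q value)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_pathways(counts, n_degs, n_background, q_cutoff=0.05):
    """counts maps pathway name -> (k, M); returns names with adjusted value <= q_cutoff."""
    names = list(counts)
    pvals = [enrichment_p(counts[p][0], n_degs, counts[p][1], n_background) for p in names]
    qvals = benjamini_hochberg(pvals)
    return [p for p, q in zip(names, qvals) if q <= q_cutoff]
```

In this sketch a pathway is reported only if its adjusted value is at or below 0.05, which mirrors the threshold stated above.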

Statistical analysis

Continuous variables conforming to a normal distribution were presented as the mean ± standard deviation (x ± s), and an unpaired t-test was used for group comparisons. Clinical indicators that exhibited a skewed distribution were reported as the median with the interquartile range [M (P25, P75)], and the groups were compared using the Mann–Whitney U test. Categorical variables were represented as absolute numbers and percentages [N(%)] and were compared using the Chi-square test or Fisher’s exact test. Statistical analysis and plotting were performed using SPSS 22.0 and Graphpad Prism 8 software. A P value of ≤ 0.05 was considered statistically significant.
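For small groups such as the mild CAP cohort, comparisons of categorical variables rely on Fisher's exact test. The following is a minimal pure-Python sketch of the two-sided test for a 2 × 2 table, given only to make the calculation concrete; the study itself used SPSS and GraphPad Prism, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)
```

As a usage example, a detection rate of 82.5% in 40 patients against 53.85% in 13 patients corresponds to counts of 33/40 and 7/13 (back-calculated from the percentages, so an assumption), which would be tested with `fisher_exact_two_sided(33, 7, 7, 6)`. The text does not state whether a given P value came from the chi-square or Fisher's exact test, so the result need not match a published value exactly.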

Results

Baseline characteristics of enrolled patients

A total of 89 patients were enrolled in our study, comprising 14 with mild CAP and 75 with SCAP, of whom 32 SCAP patients (42.67%, 32/75) died during the 30-day follow-up period. Their demographic and clinical characteristics are detailed in Table 1. Patients in the SCAP (death) group were older than those in the SCAP (survivor) group (69.00 ± 15.64 vs 58.56 ± 17.57, P = 0.0095). Compared to mild CAP patients, a higher proportion of males was observed in both the SCAP (survivor) (86.05% vs 42.86%, P = 0.0011) and SCAP (death) (75% vs 42.86%, P = 0.0352) groups, although gender was not significantly correlated with SCAP prognosis. Interestingly, allergic history was more prevalent in the SCAP (death) group than in both mild CAP (28.13% vs 15.38%) and SCAP (survivor) patients (28.13% vs 9.52%, P = 0.0372). The incidences of comorbidities (93.75% vs 61.54%, P = 0.0069) and cerebrovascular disease (31.25% vs 7.69%, P = 0.0956) were higher in the SCAP (death) group than in mild CAP cases, although the latter difference did not reach statistical significance. Additionally, SCAP (death) patients had a higher rate of cerebrovascular disease (31.25% vs 7.14%, P = 0.0022) than SCAP (survivor) patients, suggesting it may be an indicator of worse outcomes in SCAP patients. Both the SCAP (survivor) (36.59% vs 0%, P = 0.0080) and SCAP (death) (33.33% vs 0%, P = 0.0140) groups were more likely to use non-invasive mechanical ventilation (NIPPV) than the mild CAP group.

Table 1 Demographic and clinical characteristics of the 89 patients enrolled with mild CAP or SCAP.

Laboratory findings at admission

As shown in Table 2, blood routine, blood biochemistry, blood electrolytes, arterial blood gas (ABG), coagulation indicators, serum immunoglobulin, inflammatory indicators, and other clinical laboratory indicators of CAP and SCAP patients were collected upon admission. As the severity of CAP increased [mild CAP → SCAP (survivor) → SCAP (death)], white blood cell count, neutrophil count, lactic acid in ABG, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and CRP first increased and then decreased, while lymphocyte count and total protein (TP) continuously declined (P < 0.05). Additionally, fibrin degradation product (FDP), D-dimer, PRO-BNP, CK-MB, myoglobin, serum IgE, and β-D-1,3-glucan (fungus) were positively correlated with the severity of CAP and poor prognosis (P < 0.05). SCAP patients had lower eosinophil counts, Ca2+, and serum IgM than mild CAP patients, but these indicators had no significant association with SCAP prognosis. The SCAP (death) group had higher hypersensitive troponin I levels than the SCAP (survivor) group.

Table 2 Laboratory Findings at Admission.

Association between CAP severity and pathogen detection rates by mNGS and CMTs

BALF mNGS, blood mNGS, and CMTs were all performed in mild CAP (n = 13), SCAP (survivor) (n = 40), and SCAP (death) (n = 24) patients to compare detection performance (Fig. 1). The positive rates of pathogens in mild CAP and SCAP patients by blood mNGS, BALF mNGS, and CMTs are shown in Fig. 2 and Appendix Table 1. Based on BALF mNGS data, SCAP (survivor) patients had a higher positive rate of bacteria compared to both mild CAP (82.5% vs 53.85%, P = 0.037) and SCAP (death) patients (82.5% vs 58.33%, P = 0.0341). In contrast, the positive rates of bacteria detected by CMTs increased with CAP severity [mild CAP vs SCAP (survivor) vs SCAP (death): 23.08% vs 42.5% vs 62.5%]. The detection rate of fungi by CMTs (7.69% vs 20% vs 29.17%) or BALF mNGS (15.38% vs 42.5% vs 50%) also showed a gradual increase as CAP severity worsened. Furthermore, the detection rate of fungi by blood mNGS was low and showed no significant differences among the three CAP groups (7.69% vs 7.5% vs 8.33%). The positive rate of DNA viruses detected by blood mNGS increased with disease severity (61.54% vs 80% vs 91.67%), whereas the rate of DNA viruses detected by BALF mNGS only increased in SCAP (death) patients (69.23% vs 67.5% vs 87.5%). In summary, SCAP patients, particularly SCAP (death) patients, had higher rates of bacteria, fungi, and DNA viruses according to a comprehensive analysis of BALF mNGS, blood mNGS, and CMTs data.

Fig. 1 Flow chart of the study.

Fig. 2 Comparison of Pathogen Detection Rates using DNA + RNA mNGS in Blood and BALF Samples Versus CMTs in Mild and Severe CAP Cases.

Comparison of detection performance: BALF mNGS, blood mNGS, and CMTs

As shown in Fig. 2 and Table 3, BALF mNGS significantly improved bacteria, fungi, and DNA virus detection in SCAP (survivor) patients compared to CMTs. When comparing the performance of mNGS across sample types, we found that BALF mNGS detected more bacteria than blood mNGS in both SCAP (survivor) (82.5% vs 47.5%, P = 0.001) and SCAP (death) patients (58.33% vs 33.33%, P = 0.0822), although the difference in the SCAP (death) group did not reach statistical significance. Similarly, the detection rate of fungi by BALF mNGS was much higher than that by blood mNGS in both the SCAP (survivor) (42.5% vs 7.5%, P = 0.0003) and SCAP (death) (50% vs 8.33%, P = 0.0015) groups. The detection performance of BALF mNGS for DNA viruses was much better than that of CMTs, but slightly worse than that of blood mNGS. Blood mNGS might be more suitable than BALF mNGS for the detection of DNA viruses, particularly in SCAP (death) patients (87.5% vs 67.5%, P = 0.0322), but it had poor detection efficiency for fungi in all CAP patients. Notably, the combined application of BALF and blood mNGS did not perform better than BALF mNGS alone for detecting bacteria and fungi among CAP or SCAP patients. In conclusion, pathogen detection rates were influenced by both the testing method and disease severity stratification. BALF mNGS and CMTs significantly improved pathogen detection rates in CAP and SCAP patients. Given the different sample sources and patient conditions, the varying detection rates between methods were expected.

Table 3 Comparison of pathogen detection performance between DNA + RNA mNGS in BALF and blood samples versus CMTs among mild CAP, SCAP (survivor), SCAP (death) patients.

Different pathogen spectra in mild CAP, SCAP (survivor), and SCAP (death) patients

As shown in Fig. 3 and Appendix Table 2, the pathogen spectra identified in mild CAP, SCAP (survivor), and SCAP (death) patients varied across different diagnostic methods (BALF mNGS, blood mNGS, and CMTs). In mild CAP patients, BALF mNGS detected a broader range of bacteria, including Acinetobacter baumannii (23.08%) and Klebsiella pneumoniae (15.38%), compared to blood mNGS and CMTs. Among SCAP (survivor) patients, K. pneumoniae and A. baumannii were most frequently identified, with BALF mNGS showing the highest detection rates. In SCAP (death) patients, BALF mNGS revealed a higher prevalence of bacteria such as A. baumannii (20.83%) and Corynebacterium striatum (20.83%), while blood mNGS and CMTs detected K. pneumoniae (20.83%) and A. baumannii (25%) with a slightly different distribution. Fungal and viral spectra also differed by disease severity. In SCAP (survivor) patients, Candida glabrata (17.5%) and Candida albicans (15%) were the most common fungi detected by BALF mNGS, whereas in SCAP (death) patients, C. albicans (29.17%) and Pneumocystis jiroveci (16.67%) were predominant. Viral detection showed that Epstein-Barr virus (EBV) and Torque teno viruses (TTVs) were the most frequent in both SCAP groups, with higher detection rates in blood mNGS compared to BALF mNGS.

Fig. 3 Pathogen Spectrum in Mild CAP, SCAP (survivor), and SCAP (death) Patients Identified by Blood and BALF DNA + RNA mNGS Versus CMTs.

DNA + RNA mNGS analysis identified pathogen-associated risk factors for severe CAP and poor prognosis (Table 4). In BALF samples, P. jiroveci (16.67% vs 2.5%, P = 0.0409) was more prevalent in fatal cases. Human cytomegalovirus (HCMV) in blood samples showed a higher detection rate in severe CAP cases (29.17% vs 7.69%, P = 0.0982), although this difference did not reach statistical significance. C. striatum, C. albicans, EBV, and TTVs also had higher detection rates in severe cases, but these differences were not statistically significant. The results for DNA mNGS and RNA mNGS are provided separately in Appendix Table 2. Additionally, information about co-infection conditions, read counts, and other details can be found in Appendix Tables 5–7.

Table 4 Pathogen-Associated Risk Factors for Severe CAP and Prognosis: An RNA + DNA mNGS Analysis of BALF and Blood Samples.

Concordance of mNGS with CMTs in mild CAP and SCAP patients

The positive and negative concordance rates between mNGS and CMTs are presented in Appendix Table 3. Concordance here refers to the agreement between mNGS and CMTs in detecting pathogens. BALF mNGS exhibited a higher positive concordance rate with CMTs than blood mNGS across all three CAP groups; however, it demonstrated a lower negative concordance rate than blood mNGS in SCAP patients. The positive concordance rates of BALF mNGS with CMTs were 100%, 76.47%, and 66.67% for bacterial detection, and 100%, 62.50%, and 57.14% for fungal detection in mild CAP, SCAP (survivor), and SCAP (death) patients, respectively. Additionally, the negative concordance rates of BALF mNGS with CMTs were 60%, 13.04%, and 55.56% for bacterial detection and 91.67%, 65.71%, and 52.94% for fungal detection across the three groups. In summary, BALF mNGS demonstrates greater sensitivity in pathogen detection, albeit with reduced specificity in severe cases.
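The concordance rates can be made concrete with a short calculation. The sketch below assumes that CMTs serve as the comparator, so that positive concordance is the share of CMT-positive patients who are also mNGS-positive and negative concordance is the share of CMT-negative patients who are also mNGS-negative. That definition is not spelled out in the text, although it is consistent with the reported percentages. The function name and the per-cell counts in the example are back-calculated for illustration and are not taken from Appendix Table 3.

```python
def concordance_rates(both_pos, cmt_pos_only, mngs_pos_only, both_neg):
    """Return (positive, negative) concordance of mNGS with CMTs, in percent.

    both_pos:      positive by mNGS and CMTs
    cmt_pos_only:  positive by CMTs, negative by mNGS
    mngs_pos_only: positive by mNGS, negative by CMTs
    both_neg:      negative by both
    """
    cmt_pos = both_pos + cmt_pos_only
    cmt_neg = mngs_pos_only + both_neg
    positive = 100 * both_pos / cmt_pos if cmt_pos else float("nan")
    negative = 100 * both_neg / cmt_neg if cmt_neg else float("nan")
    return round(positive, 2), round(negative, 2)


# Illustrative counts for bacterial detection in 40 SCAP (survivor) patients,
# inferred from the reported rates (17 CMT-positive, 33 BALF mNGS-positive).
print(concordance_rates(both_pos=13, cmt_pos_only=4, mngs_pos_only=20, both_neg=3))
# prints (76.47, 13.04)
```

Under this reading, a low negative concordance mainly reflects patients who were positive by BALF mNGS but negative by CMTs.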

Transcriptomic differences among mild CAP, SCAP (survivor), and SCAP (death) patients

PBMC from all mild CAP, SCAP (survivor), and SCAP (death) patients at admission were used for bulk RNA-sequencing to identify potential biomarkers of CAP severity and prognosis (Fig. 1). As illustrated in Fig. 4, DEGs were analyzed through pairwise comparisons of the CAP groups. In SCAP (death) versus mild CAP, 261 DEGs were up-regulated and 170 were down-regulated, while SCAP (survivor) versus mild CAP yielded only two up-regulated DEGs. The comparison between SCAP (death) and SCAP (survivor) revealed three up-regulated DEGs and one down-regulated DEG. FOLR3 (Folate Receptor 3) and ITGA7 (integrin subunit alpha 7) were up-regulated DEGs in both SCAP groups compared to mild CAP patients, with log2FoldChange values of 4.41 and 3.39 for FOLR3 and 3.02 and 2.89 for ITGA7, respectively (adjusted P-value < 0.001). The expression of OTOF (Otoferlin), SIGLEC1 (Sialic Acid Binding Ig Like Lectin 1), CXCL10 (C-X-C motif chemokine ligand 10), and MS4A4A (Membrane Spanning 4-Domains A4A) was significantly elevated in SCAP (death) compared to both mild CAP and SCAP (survivor). Cyclin A1 (CCNA1) expression was higher in SCAP (death) than in SCAP (survivor) and mild CAP but did not differ between SCAP (survivor) and mild CAP. In conclusion, FOLR3 and ITGA7 may serve as biomarkers of CAP severity, while OTOF, SIGLEC1, MS4A4A, and CXCL10 may be valuable for predicting both disease severity and poor outcomes in CAP.

Fig. 4 Differentially Expressed Genes (DEGs) Analysis Among Mild CAP, SCAP (survivor), and SCAP (death) Patients. (A) Bar chart showing the number of down-regulated and up-regulated DEGs in comparisons of SCAP (Survivor) vs. Mild CAP, SCAP (Non-Survivor) vs. Mild CAP, and SCAP (Non-Survivor) vs. SCAP (Survivor). (B) Venn diagram illustrating the overlap in the number of DEGs across pairwise comparisons: SCAP (Survivor) vs. Mild CAP, SCAP (Non-Survivor) vs. Mild CAP, and SCAP (Non-Survivor) vs. SCAP (Survivor). (C) Volcano plot depicting the distribution of DEGs between SCAP (Survivor) and Mild CAP patients. (D) Volcano plot showing the distribution of DEGs between SCAP (Non-Survivor) and Mild CAP patients. (E) Volcano plot presenting the distribution of DEGs between SCAP (Non-Survivor) and SCAP (Survivor) patients.

DEGs-based GO and KEGG pathway enrichment analysis

To identify key biological or molecular processes influencing CAP severity and prognosis, the identified DEGs were used to perform gene functional pathway enrichment analysis (Figs. 5 and 6). GO analysis was divided into three categories: biological process (BP), cellular component (CC), and molecular function (MF). The top up-regulated GO pathways in SCAP (survivor) patients compared to mild CAP patients included folic acid transport, leukocyte migration, external side of plasma membrane, folic acid binding, and signaling receptor activity (Fig. 5A). Compared to mild CAP patients, the commonly up-regulated GO pathways in SCAP (death) patients primarily involved inflammatory response, extracellular region, collagen-containing extracellular matrix, extracellular space, extracellular matrix organization, and RNA nuclease activity. The main down-regulated GO pathways included positive regulation of natural killer cell mediated cytotoxicity, adaptive immune response, T cell activation, immune response, external side of plasma membrane, and MHC class I protein complex binding (Fig. 5B, C). In addition, GO functional pathways including cellular response to interleukin-17, endothelial cell activation, plasma membrane raft, CXCR3 chemokine receptor binding, and virion binding were up-regulated in SCAP (death) compared to SCAP (survivor) patients (Fig. 5D), while the down-regulated pathway was the response to corticosterone.

Fig. 5 GO Functional Pathway Enrichment Analysis in Mild CAP, SCAP (survivor), and SCAP (death) Patients. (A) Top 30 upregulated GO functional pathways in SCAP (survivor) compared to Mild CAP patients. (B) Top 30 upregulated GO functional pathways in SCAP (death) compared to Mild CAP patients. (C) Top 30 downregulated GO functional pathways in SCAP (death) compared to Mild CAP patients. (D) Top 30 upregulated GO functional pathways in SCAP (death) compared to SCAP (survivor) patients.

Fig. 6 KEGG Pathway Enrichment Analysis of PBMC Transcriptomes in Mild CAP, SCAP (survivor), and SCAP (death) Patients. (A) Top 20 upregulated KEGG functional pathways in SCAP (survivor) compared to Mild CAP patients. (B) Top 20 upregulated KEGG functional pathways in SCAP (death) compared to Mild CAP patients. (C) Top 20 downregulated KEGG functional pathways in SCAP (death) compared to Mild CAP patients. (D) Top 20 upregulated KEGG functional pathways in SCAP (death) compared to SCAP (survivor) patients.

The most impacted pathways associated with these common DEGs were also identified based on the KEGG database. KEGG pathways such as antifolate resistance, ECM-receptor interaction, and focal adhesion were activated in SCAP (survivor) patients compared to mild cases (Fig. 6A). Complement and coagulation cascades, nitrogen metabolism, and ECM-receptor interaction were the top up-regulated KEGG pathways enriched in SCAP (death) patients compared to mild cases, while the top down-regulated KEGG pathways included hematopoietic cell lineage, cell adhesion molecules, antigen processing and presentation, primary immunodeficiency, cytokine-cytokine receptor interaction, and T cell receptor signaling pathway (Fig. 6B, C). The up-regulated genes in SCAP (death) patients were mainly associated with Epstein-Barr virus infection, RIG-I-like receptor signaling pathway, and cytosolic DNA-sensing pathway when compared to SCAP (survivor) patients (Fig. 6D). To further substantiate the conclusions of the pathway enrichment analysis, detailed statistical data, including p-values, adjusted p-values, gene ratios, and background ratios, have been compiled in Appendix Tables 8 and 9.

Gene set enrichment analysis (GSEA) of KEGG and GO pathways

GSEA KEGG pathway analysis, as shown in Appendix Table 10, revealed significant pathway enrichment in SCAP (death) and SCAP (survivor) groups compared to mild CAP. Immune-related pathways, including complement activation and neutrophil extracellular trap formation, were enriched in both groups, underscoring the crucial role of immune activation in severe pneumonia. In the SCAP (death) group, pathways related to energy metabolism, such as oxidative phosphorylation and lysosome function, were significantly upregulated, indicating increased cellular energy demand and stress response. In contrast, the SCAP (survivor) group showed downregulation of immune-related pathways, such as T cell receptor signaling and FoxO signaling, pointing to the importance of immune suppression and metabolic regulation for survival. The upregulation of oxidative phosphorylation, protein synthesis, and immune regulation pathways in SCAP (death) links metabolic dysregulation and excessive immune activation to mortality.

GSEA GO analysis of SCAP (death) vs. mild CAP revealed immune suppression and metabolic dysregulation as key features of severe pneumonia, including downregulation of T cell receptor signaling and T cell co-stimulation and upregulation of proton-driven ATP synthesis, highlighting the role of immune decline and metabolic imbalance. The death group also showed significant upregulation of stress response pathways such as complement activation and protein folding, while downregulation of RNA transport and chromatin remodeling suggests that immune and repair dysfunction are key factors in mortality. In SCAP (survivor) vs. mild CAP, the survivor group exhibited significant enrichment in acute phase response and cell cycle pathways, indicating a role for immune regulation and cell repair in survival. Upregulation of low-density lipoprotein binding and carrier receptor activity may support the energy supply needed for survival. Comparing SCAP (death) with SCAP (survivor), the death group showed significant upregulation of key pathways related to energy metabolism and immune response, such as proton-driven ATP synthesis, mitochondrial electron transport, and aerobic respiration, suggesting that metabolic dysregulation may contribute directly to mortality. Enrichment of complement activation and telomere organization pathways in the death group points to immune overactivation that may accelerate death.
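For readers less familiar with GSEA, the sketch below shows the running-sum enrichment score that underlies results of this kind: genes are ranked by a differential expression statistic, the score steps up at each gene in the set and down otherwise, and the signed maximum deviation from zero is reported. This is a simplified version of the commonly described weighted Kolmogorov-Smirnov-like statistic, not necessarily the implementation used in this study; permutation-based significance and normalization are omitted, and the gene names and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation of the GSEA-style running sum.

    ranked_genes: gene identifiers sorted from most up- to most down-regulated.
    scores: ranking statistic for each gene, in the same order.
    gene_set: collection of gene identifiers belonging to the pathway.
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_weights = [abs(s) ** weight if h else 0.0
                   for s, h in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")

    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, in_set):
        # Step up (weighted) on a hit, step down (uniform) on a miss.
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: positive scores = higher in the first group.
genes = ["g1", "g2", "g3", "g4"]
stats = [3.0, 2.0, -1.0, -2.0]
es_up = enrichment_score(genes, stats, {"g1", "g2"})
es_down = enrichment_score(genes, stats, {"g3", "g4"})
print(es_up, es_down)
```

A positive score indicates that the set is concentrated among the up-regulated genes and a negative score that it is concentrated among the down-regulated genes, which corresponds to the upregulation and downregulation described above.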

Discussion

In this study, we characterized the pathogen spectra of mild CAP, SCAP (survivor) and SCAP (death) patients as identified by CMTs and DNA + RNA mNGS of BALF and blood, and compared their detection performance. The microbiome in BALF and blood, PBMC transcriptome profiles, along with clinical and laboratory characteristics of mild CAP and SCAP patients, were also analyzed to identify the potential biomarkers, clinical indicators or key functional pathways influencing disease severity and prognosis.

Overall, we found that incorporating BALF mNGS, as opposed to blood mNGS, into routine diagnostic workflows significantly enhanced the detection of bacteria, fungi and viruses. Timely and accurate identification of pathogens in CAP, particularly SCAP, is essential for determining the appropriate treatment plan and improving prognosis9,10. Although few previous studies have compared the performance of blood and BALF mNGS in the same SCAP population, numerous studies have shown that BALF mNGS alone exhibits high diagnostic performance in LRTI patients and can provide clues for identifying rare pathogens such as Mycobacterium abscessus, P. jirovecii, Orientia tsutsugamushi, and Chlamydia psittaci9,25. A meta-analysis reported that the pooled sensitivity and specificity of BALF mNGS for pneumonia were 78% (95% confidence interval [CI] 67–87%; I2 = 92%) and 77% (95% CI 64–94%; I2 = 74%), respectively26. Another study found that the sensitivity of BALF mNGS for viruses was much higher than that of CMTs (92.31% versus 7.69%, P < 0.01)27. Similarly, our study found that both blood DNA + RNA mNGS and BALF DNA + RNA mNGS performed excellently for DNA viruses, but the detection rate of RNA viruses was very low. The small genomes of most RNA viruses, their low abundance, the high degradation rate of viral RNA relative to host RNA, and inappropriate RNA library preparation methods may contribute to the low detection rate of RNA viruses by mNGS28,29. While previous studies have reported a positive detection rate of up to 92% (95% CI 78–100%) for pathogens using mNGS in BALF samples from severely ill or immunocompromised patients with pulmonary infection26, our subgroup analyses demonstrated that the diagnostic performance of mNGS in blood or BALF for bacterial pathogens was influenced by disease severity and declined among critically ill CAP patients, such as those in the SCAP (death) group. BALF or blood samples from critically ill patients are often characterized by significant infiltration of host inflammatory and tissue cells, leading to a high burden of host DNA30. This substantial presence of host DNA can markedly impede the accurate detection of bacterial DNA31,32. Consequently, the combination of CMTs and BALF mNGS is strongly recommended.
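The sensitivity and specificity figures discussed above are derived from a 2x2 table of test results against a reference diagnosis. The minimal sketch below shows that arithmetic; the counts are hypothetical and are not taken from this study or from the cited meta-analysis.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # detected among true infections
        "specificity": ratio(tn, tn + fp),  # negative among non-infections
        "ppv": ratio(tp, tp + fp),          # true among positive results
        "npv": ratio(tn, tn + fn),          # true among negative results
    }


# Hypothetical counts for one assay compared with a reference diagnosis.
metrics = diagnostic_metrics(tp=45, fp=10, fn=5, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because host DNA burden mainly reduces the number of true positives, its effect in critically ill patients would appear in such a table as lower sensitivity and lower negative predictive value.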

We identified the most common pathogens in SCAP as bacteria, including A. baumannii, K. pneumoniae, Pseudomonas aeruginosa, and C. striatum; fungi, including C. albicans, P. jirovecii, C. glabrata, Candida tropicalis, and Aspergillus; and viruses, including EBV, TTVs, HCMV and human polyomavirus type 2 (PyV-2). A national surveillance study in China reported that the most frequent pathogens associated with SCAP among older adults were influenza virus (10.94%) and P. aeruginosa (15.37%). In adults, SCAP was significantly associated with infection by P. aeruginosa, K. pneumoniae, or S. pneumoniae, or co-infection with P. aeruginosa and K. pneumoniae33. Our findings underscore the diverse pathogen landscape in SCAP and the critical need for targeted surveillance and management strategies, particularly for bacterial and viral co-infections in high-risk populations.

Contrary to previous research suggesting that pathogen category or microbial diversity might not be linked to 30-day mortality in SCAP14, we found that infection with P. jirovecii, C. striatum, C. albicans, or HCMV may indicate a worse prognosis in pneumonia patients. The poor prognosis of P. jirovecii-induced SCAP is primarily due to impaired host immunity, severe pulmonary pathology, delayed diagnosis, treatment challenges, and the presence of comorbidities and complications34. C. striatum detected by mNGS was once considered a contaminant because it is commonly found in the normal flora of the skin and oropharynx. However, recent studies have identified C. striatum as a pathogen in various infections including prosthetic joint infection, central line-associated bacteremia, endocarditis, and pleuropulmonary infections18,35. A retrospective study conducted in Korea from 2014 to 2019 indicated that C. striatum is emerging as a potential cause of severe pneumonia, particularly in immunocompromised patients18. To date, few studies have focused on C. striatum pneumonia, and the pathogenic mechanisms underlying SCAP caused by C. striatum remain to be explored. Severe pneumonia itself can lead to immune dysfunction, increasing susceptibility to opportunistic infections such as C. albicans and P. jirovecii, especially in critically ill patients36,37. Establishing causality between severe pneumonia and its exacerbation by pathogens like C. striatum and C. albicans is therefore challenging, which highlights the complexity of distinguishing primary pathogens from secondary infections during pneumonia progression. Accelerated T-cell immunosenescence has been reported in HCMV-seropositive individuals38. Immune deficiency and severe infection can trigger the reactivation of latent HCMV infection, and the emergence of HCMV DNAemia in mild CAP patients is a rare event39. In general, HCMV DNAemia is detected predominantly in patients with PSI classes indicating more severe pneumonia39. Overall, our findings suggest that the detection of C. striatum, C. albicans, P. jirovecii, and HCMV by BALF or blood mNGS in SCAP patients generally indicates immune dysfunction and is associated with a worse prognosis.

According to RNA-seq analysis of PBMCs, there were 431 DEGs in SCAP (death) compared to mild patients; however, few DEGs were found between SCAP (survivor) and mild CAP or between SCAP (death) and SCAP (survivor). Genes such as OTOF, MS4A4A, CXCL10, and SIGLEC1 may serve as potential biomarkers for SCAP and its poorer outcomes. OTOF, involved in calcium ion signaling, may influence cellular responses to infection40. MS4A4A, known for its role in immune cell function modulation, could impact the immune response in severe CAP41. CXCL10 plays a critical role in recruiting immune cells to sites of inflammation, making it a candidate marker for predicting worse outcomes in SCAP42. SIGLEC1, expressed as CD169 on circulating monocytes, is involved in immune regulation and has been linked to disease severity in COVID-19, illustrating its potential as a biomarker for assessing severity and prognosis in SCAP43,44. So far, CXCL10 and SIGLEC1 have been more extensively studied and have established connections to severe inflammatory and infectious diseases, whereas the roles of MS4A4A and OTOF in SCAP remain underexplored40,41,42,44.

In SCAP (death) patients, GO pathways such as extracellular region and collagen-containing extracellular matrix were up-regulated, while immune response, adaptive immune response, and dendritic cell chemotaxis pathways were down-regulated. Abnormal remodeling of the extracellular matrix (ECM) in the lung has been associated with inflammatory response and fibrosis, which may contribute to severe COVID-19, tuberculosis, and invasive pneumococcal disease (IPD)45,46,47. In addition, KEGG pathways such as complement and coagulation cascades, nitrogen metabolism, ECM-receptor interaction, and the EBV infection pathway were significantly up-regulated, while pathways involved in hematopoietic cell lineage, cell adhesion molecules, and antigen processing and presentation were down-regulated. Previously, dysregulation of the complement system, coagulation cascade, and platelet function has been found to be potentially associated with multi-organ injury or failure in severe COVID-19 or pediatric SCAP48,49,50. Metabolic dysregulation, such as altered nitrogen metabolism, affects multiple organs. The versatile chemistry of nitrogen is important for pulmonary physiology and is involved in a range of pulmonary diseases, including asthma, cystic fibrosis, and adult respiratory distress syndrome51,52. EBV, a member of the herpesvirus family, is generally considered the cause of infectious mononucleosis (IM), lymphomas, and nasopharyngeal carcinomas53. EBV reactivation has been found to be common in sepsis patients and is associated with longer ICU stays (12.9 vs. 9.2 days) and increased organ failure (day 1 SOFA score 6.9 vs. 5.9)54. Similarly, we observed a higher proportion of EBV infections in SCAP compared to mild CAP, and the pathway associated with EBV infection was up-regulated in both SCAP survivors and non-survivors. Both GO and KEGG analyses suggest that a secondary immunosuppressive state and a weakened innate and adaptive immune system might be common features of SCAP (death), indicating a poorer outcome. In summary, RNA-seq analysis of PBMCs provided molecular insights into the potential mechanisms of SCAP, identifying new biomarkers of disease severity and prognosis for early and precise intervention.

GSEA KEGG analysis revealed a significant increase in immune-related pathways, such as complement activation and neutrophil extracellular trap formation, in the SCAP (death) group, suggesting that excessive immune activation may accelerate disease progression and contribute to mortality55. Furthermore, GSEA revealed heightened activity in energy metabolism pathways, including oxidative phosphorylation and lysosomal function, in the SCAP (death) group, which reflects increased cellular energy demands and is consistent with the metabolic dysregulation observed in traditional analyses56. The interaction between metabolic dysregulation and immune overactivation likely causes cellular damage and exacerbates the disease. GSEA GO analysis further highlighted a reduction in T cell receptor signaling alongside increased activity in metabolic pathways (e.g., proton-driven ATP synthesis) in the SCAP (death) group, while stress response pathways, including complement activation and protein folding, were notably increased, underlining the critical role of immune suppression and repair dysfunction in mortality57,58. In contrast, the SCAP (survivor) group showed enrichment in acute phase response and cell cycle pathways, emphasizing the importance of immune regulation and cellular repair for survival. By integrating GSEA with traditional GO/KEGG analysis, we gained a more comprehensive understanding of the roles of immune and metabolic dysregulation in severe pneumonia and identified potential therapeutic targets and prognostic markers for precision treatment.

Older age, a history of allergies, and cerebrovascular disease were clinical markers associated with a poorer outcome in SCAP patients. Laboratory indicators such as WBC count, neutrophil count, ALT, AST, and CRP initially increased and then decreased with the progression of CAP, warranting attention when assessing disease severity and predicting prognosis. An acute and critical condition should be considered in SCAP patients who, despite being identified as SCAP by scoring systems such as the PSI score, CURB-65 score, or SOFA score, present with unusually low WBC count, neutrophil count, ALT, AST, and CRP. This phenomenon is akin to the “silent chest” observed in critical asthma patients, and may reflect a state of multiple organ failure and myelosuppression. Additionally, laboratory indicators such as FDP, D-dimer, PRO-BNP, CK-MB, myoglobin, serum IgE, and β-D-1,3-glucan (fungus) gradually increased with disease severity and may be positively correlated with poorer outcomes. We hypothesize that coagulation abnormalities, cardiac insufficiency, fungal co-infection, and allergic status might have a mutual causal relationship with SCAP severity and play a critical role in the progression of CAP.

Our study has several strengths. First, this research was a prospective multicenter study, which enhances its reliability and applicability. Second, enrolled patients received both blood and BALF mNGS simultaneously, enabling a pairwise comparison of the performance of blood mNGS, BALF mNGS, and CMTs among SCAP patients. Third, our study linked detailed clinical features, laboratory indicators, and multi-omics bio-signatures, including DNA + RNA mNGS and PBMC RNA-seq analysis, to the differential severity and prognosis of SCAP patients. Our study also has some limitations. First, the relatively small size of the cohort may limit the generalizability of our results. Larger multicenter studies will be needed to validate these findings across broader clinical settings. Second, the detection rate of RNA viruses was low; therefore, the current library preparation and sequencing processes for DNA + RNA mNGS should be optimized. Third, dynamic changes in multi-omics bio-signatures during SCAP progression were not assessed due to the difficulty of specimen acquisition. Finally, our sample collection occurred during the period of COVID-19 management and excluded COVID-19 patients. With shifting global policies, changes in pathogen spectra and immune responses could affect the applicability of our findings. Future studies should account for these factors.

Conclusions

This study investigated the performance of BALF and blood DNA + RNA mNGS and CMTs in pathogen detection among patients with mild CAP and SCAP. Furthermore, the clinical features, laboratory indicators, and multi-omics bio-signatures of the microbiome and PBMC transcriptome were analyzed to identify biomarkers for disease severity and prognosis. The performance of BALF mNGS was superior to that of blood mNGS, and the detection efficiency of mNGS for bacteria was influenced by disease severity. Therefore, we strongly recommend the combination of BALF mNGS and CMTs for SCAP diagnosis. Microbiome biomarkers such as P. jirovecii, C. striatum, C. albicans, HCMV, and EBV, as well as transcriptome bio-signatures such as OTOF, MS4A4A, CXCL10, and SIGLEC1, were associated with poorer outcomes in SCAP. GSEA and traditional GO/KEGG analyses revealed key immune and metabolic dysregulation pathways in SCAP (death) patients, including upregulated pathways related to complement and coagulation cascades, oxidative phosphorylation, nitrogen metabolism, and EBV infection, while adaptive immune response, hematopoietic cell lineage, and antigen processing pathways were downregulated. Overall, microbiomic and transcriptomic analyses provide a molecular pathological basis for identifying biomarkers associated with CAP severity and prognosis, which may benefit individualized treatment.