Introduction

Non-small-cell lung cancer (NSCLC) accounts for more than 80% of all lung cancers. In the last decade, the emergence of immune checkpoint inhibitors (ICIs) targeting programmed cell death 1 (PD-1) or its ligand (PD-L1) has significantly prolonged survival for NSCLC patients1,2,3. Microorganisms in the human gastrointestinal tract shape a dynamically balanced and healthy internal microenvironment4. The gut microbiota is essential for maintaining gut barrier integrity, regulating insulin sensitivity, and supporting immune system functions. More specifically, studies have shown that gut dysbiosis has a complex correlation with several lung diseases, with alteration of the microbiota composition triggering immune dysregulation and affecting other organs through the release of its metabolites into the circulation, inducing systemic inflammation and tumor development5,6,7. Recently, considerable attention has been focused on the association between gut microbiota and resistance to immunotherapy3,8,9. For example, Jiang et al.10 found that lung cancer patients exhibited reduced and more homogeneous gut microbiota diversity, characterized by a significant decrease in short-chain fatty acid (SCFA)-producing bacteria such as Firmicutes and Actinobacteria when compared to healthy individuals. This dysbiosis may play a key role in the development of lung cancer. Additionally, Ren et al.11 categorized lung cancer patients into responders and non-responders to immunotherapy, and found that responders had a significant increase in Faecalibacterium along with elevated levels of SCFAs. Intestinal bacteria have been reported to profoundly influence the clinical benefit of immunotherapy, not only in lung cancer, but also in melanoma, epithelial tumors, and other cancers. Administering antibiotics prior to the initiation of immunotherapy may impair its efficacy, potentially because of antibiotic-induced gut dysbiosis. A recent study conducted by Cortellini et al.12 demonstrated that exposure to antibiotics affected the efficacy of first-line chemo-immunotherapy. Furthermore, successful fecal microbiota transplantation (FMT) can alter the gut microbiota profile and reverse the resistance to ICIs in patients with melanoma3,8,13,14.

The mechanisms underlying the efficacy of combination chemotherapy and immunotherapy are complex. While existing studies suggest that the gut microbiota may enhance the effects of immunotherapy by modulating immune function, the specific role of gut microbiota in influencing the synergistic effects of chemotherapy and immunotherapy remains unclear. Unlike other cancers, the combination of cisplatin-based chemotherapy and immunotherapy has been shown to improve survival in NSCLC patients by enhancing immune responses and targeting tumor cells15. Given the critical role of the gut microbiota in immune regulation, it is essential to understand its impact on chemo-immunotherapy combinations. Through clinical and animal studies, we aim to explore how the gut microbiota modulates the synergistic effects of chemotherapy and immunotherapy, offering potential therapeutic strategies to enhance outcomes for NSCLC patients.

Results

Patient characteristics

A total of 106 patients were enrolled, including 12 females and 94 males, with a mean age of 65.7 ± 7.6 years. Of these, 58 had adenocarcinoma and 48 had squamous cell carcinoma. Fifty-three patients (responders, R group) had progression-free survival (PFS) ≥ 6 months, while 53 patients (non-responders, NR group) had PFS < 6 months. No significant differences in comorbidities, age, sex, smoking status, histological subtypes, tumor stage, or lymph node stage were observed between the two groups (Supplementary Table 1). The ICIs used included pembrolizumab (n = 65), nivolumab (n = 14), and sintilimab (n = 27). For cisplatin-based chemotherapy, 55 patients received pemetrexed and 51 patients received paclitaxel.

Analysis of gut microbiota diversity alterations and compositional differences between responders and non-responders based on 16S rRNA sequencing

First, 16S rRNA sequencing was used to examine gut microbiota diversity in the R and NR groups. Shannon (p = 0.01205), Simpson (p = 0.04385), and Chao (p = 0.009348) analyses revealed significantly lower richness and evenness of operational taxonomic units (OTUs) in the NR group compared to the R group (Fig. 1A–C). Additional analyses, including Sobs (p = 0.013), Ace (p = 0.007786), Bootstrap (p = 0.01382), Coverage (p = 0.02978), Boneh (p = 0.04051), and Shannoneven (p = 0.03256), also showed a significant difference between the R and NR groups (Supplementary Fig. 1A–F). The similarity of the microbial communities between the two groups was examined by principal coordinate analysis (PCoA) based on Bray-Curtis (R = 0.046, p = 0.002) (Fig. 1D), partial least squares discriminant analysis (PLS-DA) (Fig. 1E), PCoA based on unweighted-UniFrac (R = 0.0489, p = 0.004) (Supplementary Fig. 1G), non-metric multidimensional scaling (NMDS) (R = 0.046, p = 0.002) (Supplementary Fig. 1H), and analysis of similarities (ANOSIM) (R = 0.0461, p = 0.002) (Supplementary Fig. 1I). The results confirmed that microbial community composition differed significantly between the two groups. The Venn diagram (Fig. 1F) revealed 47 unique genera in the R group, 32 in the NR group, and 279 shared genera. In both groups, the dominant genera included Bacteroides, Escherichia-Shigella, Blautia, Faecalibacterium, Prevotella, and Subdoligranulum (Fig. 1G), whereas the predominant phyla were Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria, and Verrucomicrobiota (Supplementary Fig. 1J). Notably, Faecalibacterium and Subdoligranulum at the genus level, along with Firmicutes at the phylum level, were significantly higher in the R group compared to the NR group (Fig. 1H, Supplementary Fig. 1K). In summary, the findings suggest that the R group has higher and more heterogeneous gut microbiota diversity compared to the NR group.
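For readers less familiar with the diversity measures named above, the following is a minimal sketch of how the Shannon, Simpson, and Chao indices and the Bray-Curtis dissimilarity can be computed from a vector of OTU counts. It is illustrative only: the exact estimator variants used in the study are not stated here, so the dominance form of Simpson and the bias-corrected form of Chao1 are assumptions, and all function names are hypothetical.

```python
from math import log


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson dominance D = sum(p_i^2); a lower D means a more even community.
    (Assumed variant; other tools report 1 - D or a finite-sample form.)"""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same OTU order."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


if __name__ == "__main__":
    # Hypothetical OTU count vectors, not data from the study.
    sample_r = [12, 8, 5, 2, 1, 1]
    sample_nr = [25, 3, 1, 0, 0, 0]
    for name, sample in (("R-like", sample_r), ("NR-like", sample_nr)):
        print(name, shannon_index(sample), simpson_index(sample), chao1(sample))
    print("Bray-Curtis:", bray_curtis(sample_r, sample_nr))
```

Alpha-diversity indices such as these are computed per sample and then compared between groups, whereas Bray-Curtis is computed per pair of samples and feeds ordination methods such as PCoA and NMDS.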

Fig. 1: Significant difference in diversity between responders (R) and non-responders (NR) groups.

The Shannon (A), Simpson (B), and Chao (C) analyses showed significantly lower OTU richness and evenness in the NR group compared to the R group. Principal coordinates analysis (PCoA) based on Bray-Curtis (D) and partial least squares discriminant analysis (PLS-DA) (E) indicated that the microbiota composition of the R group differed significantly from that of the NR group. F The Venn diagram highlighted the differences between the two groups at the genus level. G Genera identified via 16S rRNA sequencing in the R and NR groups. H Analysis of the differences in gut microbiota composition and dominant microbiota between the two groups at the genus level. *P < 0.05, **P < 0.01.

Association between microbiota and clinical features of patients

To further investigate the impact of gut microbiota on peripheral immunity and cytokine production in patients undergoing chemo-immunotherapy combinations, Spearman correlation analysis was performed in 65 patients. At the genus level, the heatmap revealed that Faecalibacterium and Butyricicoccus were positively correlated with CD8+ cell levels, while Lachnospiraceae_NK4A136_group was positively correlated with CD4+ cell levels. Patients with a high abundance of Faecalibacterium also exhibited elevated levels of CD56+ B cells and lymphocytes (Fig. 2A), suggesting that gut microbiota may play an important role in modulating the immune system.
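As a brief illustration of the Spearman analysis mentioned above, the sketch below computes the rank correlation coefficient between a genus abundance and an immune-cell measurement. It assumes average ranks for ties and uses hypothetical function names and made-up values; it is not the study's analysis code and does not compute p-values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical relative abundances and CD8+ percentages, not study data.
    abundance = [0.02, 0.05, 0.01, 0.08, 0.04]
    cd8_percent = [21.0, 27.5, 19.0, 30.1, 24.2]
    print(spearman_rho(abundance, cd8_percent))
```

Because the coefficient depends only on ranks, it captures monotonic associations without assuming that abundances or cell counts are normally distributed, which suits compositional microbiome data.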

Fig. 2: Correlation between clinical parameters and gut microbiota.

A Relationship between immune cell levels and gut microbiota. B Relationship between cytokine levels and gut microbiota. Colors: red represents a high coefficient and positive correlation; green represents a low coefficient and negative correlation. Values: correlation coefficients between the dominant microbiota and clinical parameters, with a higher coefficient indicating higher correlation and a lower coefficient indicating lower correlation. C Detailed annotation of clinical parameters. *P < 0.05, **P < 0.01.

Subsequently, cytokine levels were measured to assess their relationship with gut microbiota. The cytokine interleukin-2 receptor (IL-2R) was positively correlated with Barnesiella, whereas it was negatively correlated with the abundance of Butyricicoccus, Romboutsia, and other gut microbiota (Fig. 2B). Additionally, IL-1b levels were negatively correlated with the abundances of Akkermansia and Eubacterium_hallii_group. These results suggest that gut microbiota may influence the secretion of cytokines that are essential for immunotherapy. Further analysis revealed that bacterial profiles varied based on age, sex, smoking status, TNM stage, and other clinical parameters (Supplementary Fig. 2A, B).

To gain a deeper understanding of the role of cytokines in response to chemo-immunotherapy combinations, diverse cytokine levels were compared between the R and NR groups (Supplementary Fig. 2D–I). Notably, IL-2R levels, which are inversely correlated with treatment efficacy, were lower in patients with higher gut microbiota richness. A similar trend was observed for IL-1b levels, although the difference was not statistically significant.

Overall, gut microbiota was associated with cytokine production and peripheral immunity, suggesting a potential connection to treatment efficacy.

Difference of gut microbiota composition between responders and non-responders according to metagenomic sequencing

To analyze differences in gut microbiota composition between the R and NR groups more precisely, metagenomic sequencing was performed on 26 fecal samples. Linear discriminant analysis effect size (LEfSe) revealed significant treatment-associated differences in fecal microbiota between the R and NR groups (Fig. 3A). As shown in the bacterial community profiles at the genus level, Bacteroides, Faecalibacterium, unclassified_o_Bacteroidales, and other genera were significantly enriched in the R group, whereas Limosilactobacillus was more abundant in the NR group (Fig. 3B–M). These findings highlight the distinct microbial profiles of the two groups. Compositional differences were also evident at the species level (Supplementary Fig. 3A). Several species, including unclassified_g_Bacteroides, Bacteroides_caccae, and Faecalibacterium_prausnitzii, were more abundant in the R group than in the NR group. Therefore, the composition of the gut microbiota community of the R group is distinct from that of the NR group.

Fig. 3: Different bacterial composition between responders (R) and non-responders (NR) groups.

A Linear discriminant analysis effect size (LEfSe) analysis revealed different gut microbiota detected in the R and NR groups. B–M Comparative analysis of gut bacterial genera between the R and NR groups (*P < 0.05, **P < 0.01).

Predictive value of gut microbiota for treatment prognosis in lung cancer patients

To evaluate the predictive potential of gut microbiota for the efficacy of chemo-immunotherapy combination, a support vector machine (SVM) model was built using the top 20 most important microbiota at the genus and species levels. The SVM model was trained and tested through 5-fold cross-validation, in which the dataset was divided into five subsets. In each iteration, four subsets were used for training, and one was reserved for validation. This process was repeated five times, with a different subset serving as the validation set each time. This approach provides more robust performance estimates and reduces the risk of overfitting. Performance metrics for classification tasks were calculated, including accuracy, precision, recall, and F1 score (Supplementary Table 2). (1) Accuracy measures the ratio of correct predictions (true positives and true negatives) to total predictions; however, it may be biased in imbalanced datasets. (2) Precision measures the accuracy of positive predictions. It is defined as the proportion of true positive predictions relative to all positive predictions (true positives plus false positives); high precision indicates a low false positive rate. (3) Recall (sensitivity) indicates the model’s ability to identify all positive instances. It is defined as the number of true positives divided by the number of true positives plus the number of false negatives. High recall corresponds to a low false negative rate. (4) F1 score, the harmonic mean of precision and recall, balances these metrics and is especially useful in scenarios with unequal costs for false positives and false negatives. The area under the curve (AUC) for the top 20 genera was 0.763 (Fig. 4A, B), and for the top 20 species, it was 0.855 (Fig. 4C, D). Collectively, these results suggest that gut microbiota profiles, especially the top 20 species, could be used as biomarkers for predicting the efficacy of chemo-immunotherapy combinations.
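The fold construction and the four metrics defined above can be sketched as follows. This is a hedged illustration rather than the study's pipeline: it does not train an SVM, the function names are hypothetical, samples are split in their given order (shuffling and stratification, which are common in practice, are omitted), and the labels in the usage example are invented.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices into k (train, validation) pairs.
    Each index appears in exactly one validation fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # 26 is used only because it matches the number of metagenomic samples.
    for train, validation in k_fold_indices(26, k=5):
        print(len(train), "training samples,", len(validation), "validation")
    # Hypothetical labels: 1 = responder, 0 = non-responder.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

In a full workflow, a classifier would be fitted on each training split and scored on the matching validation split, and the per-fold metrics would then be averaged.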

Fig. 4: Receiver operating characteristic (ROC) curves for gut microbiota predicting chemo-immunotherapy combination treatment efficacy.

A Top 20 bacteria of importance at the genus level. B ROC curve of the top 20 bacteria of importance at the genus level. C Top 20 bacteria of importance at the species level. D ROC curve of the top 20 bacteria of importance at the species level.
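The AUC summarizing each ROC curve can be read as the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder. The sketch below computes it directly from that pairwise definition; the function name and the scores are hypothetical and are not taken from the study.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied scores count as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical decision scores: 1 = responder, 0 = non-responder.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for interpreting the reported values of 0.763 and 0.855.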

Metagenomic sequencing revealed significantly different metabolic functional pathways between responding and non-responding patients

To further investigate the mechanisms by which the gut microbiota influences the response to chemo-immunotherapy combinations, we examined the functional pathways of the R and NR groups. PCoA, ANOSIM, and NMDS analyses revealed significant differences in the microbial community composition between the two groups (P = 0.044, stress = 0.032; Fig. 5A, Supplementary Fig. 4A, B). Specifically, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and KEGG orthologs (KO) analyses revealed significant functional differences, with notable alterations in specific pathways across various levels between the R and NR groups (Fig. 5B–D).

Fig. 5: Metagenomic analysis showing the difference in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and KEGG Orthologous (KO) between responders (R) and non-responders (NR) groups.

A Principal coordinate analysis (PCoA) showing distinct distribution of gut functional differences between the R and NR groups. B–D Differences in KEGG pathway levels 1, 2, and 3 (KO) between the R and NR groups. *P < 0.05, **P < 0.01.

Interestingly, glycan biosynthesis and metabolism pathways, involved in level 2 of the KEGG pathway, including galactose metabolism, and other glycan degradation pathways involved in level 3 of the KEGG pathway, were elevated in the R group compared to the NR group. Additionally, glycogen degradation (M00855) and the incomplete reductive citrate cycle (TCA cycle) (M00620) were enriched in the R group. These results suggested that glycolysis is the principal metabolic pathway of glucose in the R group, likely leading to increased production of SCFAs. Consistently, significant differences were noted in the Clusters of Orthologous Groups of proteins (COG) between the two groups (Supplementary Fig. 4C, D). In summary, the R group exhibited a distinct metabolic profile compared to the NR group, with glycolysis as the dominant pathway for glucose metabolism, potentially linked to higher SCFA production.

The SCFAs differed significantly in responders and non-responders

To assess whether SCFA concentrations varied between the R and NR groups, eight different SCFAs (acetic acid, propionic acid, isobutyric acid, butyric acid, isovaleric acid, valeric acid, hexanoic acid, and isohexanoic acid) were measured in the 26 fecal samples using gas chromatography-mass spectrometry (GC-MS). The results revealed a distinct difference in SCFA composition between the two groups. The Z-score of each SCFA varied distinctly, with higher scores observed in the R group (Fig. 6A). The abundance of each SCFA was then evaluated separately. The results indicated that the levels of total SCFAs, butanoic acid, and propanoic acid were significantly increased in the R group, whereas the levels of acetic, isovaleric, valeric, hexanoic, isohexanoic, and isobutyric acid showed trends toward differences, but without statistical significance (Fig. 6B–J, Supplementary Fig. 5B, C). PCoA further confirmed a compositional difference between the R and NR groups (Supplementary Fig. 5A). Heatmap analysis of the microbiota profile (Fig. 6K) identified correlations between SCFA production and the abundance of taxa such as Escherichia, Faecalibacterium, Bacteroides, Odoribacter, Clostridium, Dialister, Dorea, Prevotella, Roseburia, and Ruminococcus, which were enriched in the R group.
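The Z-score used to compare SCFAs on a common scale can be illustrated with a short sketch. It assumes standardization of one SCFA across all samples using the sample standard deviation; the study's exact scaling is not described here, and the function name and concentrations are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and unit sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("Z-scores are undefined when all values are equal")
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical butanoic acid concentrations for six samples.
    butanoic = [410.0, 520.0, 385.0, 610.0, 450.0, 300.0]
    print([round(z, 3) for z in z_scores(butanoic)])
```

Standardizing each SCFA in this way lets metabolites with very different absolute concentrations be displayed side by side, as in Fig. 6A.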

Fig. 6: The difference in abundance of short-chain fatty acids (SCFAs) in responders (R) and non-responders (NR) groups and its association with gut microbiota.

A Z-score of each SCFA in the R and NR groups. B–J The abundance of acetic, isovaleric, valeric, hexanoic, and isobutyric acid in the two groups; the levels of total SCFAs, butanoic acid, and propanoic acid were significantly elevated in the R group compared with the NR group. K The relationship between each SCFA and gut microbiota. *P < 0.05, **P < 0.01.

Furthermore, the effects of SCFAs on functional cytokine production and immune cells in peripheral blood were evaluated. The results suggested that the concentrations of butanoic and hexanoic acid were significantly correlated with the numbers of lymphocytes, CD3+ T cells, CD4+ T cells, and CD8+ T cell populations, whereas valeric acid was correlated with lymphocytes, CD19+ B cells, CD3+ T cells, and CD4+ T cell populations (Table 1). For cytokines, acetic acid, propanoic acid, butanoic acid, and total SCFAs were negatively correlated with IL-2R levels, while valeric acid was positively correlated with IL-10 levels (Table 2). Taken together, these findings suggest that higher SCFA levels in the R group may contribute to the activation of effector T cells and lymphocytes and the suppression of IL-2R production.

Table 1 Correlation between short-chain fatty acids (SCFAs) and immune cell population in peripheral blood
Table 2 Correlation between short-chain fatty acids (SCFAs) and cytokine levels in peripheral blood

Gut microbiota and SCFAs enhance anti-tumor efficacy of combination therapy in tumor-bearing mice

To investigate the role of gut microbiota and its metabolites, SCFAs, in the treatment efficacy of chemo-immunotherapy combination, in vivo experiments were conducted using tumor-bearing mice. As depicted in Fig. 7A, the mice were divided into three groups: (1) antibiotic treatment (ABX) group, (2) ABX + fecal microbiota transplantation (FMT) group, and (3) ABX + SCFAs group. Tumor growth and progression were monitored by measuring tumor volume every three days starting on Day 9 after Lewis lung carcinoma (LLC) inoculation. Compared to the ABX group, both the ABX + FMT and ABX+SCFAs groups had significantly reduced tumor volume, particularly from Day 15 onwards. On Day 21, the tumor volumes in the ABX + FMT group and the ABX+SCFAs group were reduced by 78.91% and 61.77%, respectively (Fig. 7B). In addition, both the ABX + FMT and ABX+SCFAs groups displayed decreased tumor weight compared to the ABX group (Fig. 7C). These findings demonstrate that gut microbiota and its metabolites, SCFAs, can significantly enhance the anti-tumor efficacy of combination therapy in tumor-bearing mice, as evidenced by both reduced tumor volume and weight.

Fig. 7: Enhanced anti-tumor efficacy of combined therapy by gut microbiota and short-chain fatty acids (SCFAs) in tumor-bearing mice.

A Schematic diagram of mouse experiments. B Tumor volume changes in the antibiotic treatment (ABX) group, ABX + fecal microbiota transplantation (FMT) group, and ABX+SCFAs group (n = 5 per group). Tumor volume was measured every three days starting on Day 9 after tumor inoculation. Statistical differences were examined using two-way analysis of variance (ANOVA). *P < 0.05, ****P < 0.0001, ####P < 0.0001. C Changes in tumor weight of mice in the ABX group, ABX + FMT group, and ABX+SCFAs group. Statistical differences were examined using one-way ANOVA. *P < 0.05, **P < 0.01.

Gut microbiota and SCFAs modulate immune response and cytokine profiles to enhance anti-tumor effects

To explore how gut microbiota and its metabolites contribute to the anti-tumor effect, the immunological changes and cytokine levels in the spleen and tumors were examined using flow cytometry (Fig. 8A) and ELISA. In the ABX + FMT group, there were increased CD3+ and CD8+ T cells in the spleen, while there were increased CD45+, CD3+, and CD8+ T cells in the tumor tissues, when compared with the ABX group. Consistently, in the ABX+SCFAs group, there was a significant increase in CD45+, CD3+, and CD8+ T cells, with increased expression of granzyme B (GZMB) and tumor necrosis factor-α (TNF-α) in the CD8+ T cells, in both the spleen and tumor tissues when compared with the ABX group (Fig. 8B). Notably, SCFA administration in the ABX+SCFAs group could potentially promote higher GZMB and TNF-α expression in CD8+ T cells than that observed in the ABX + FMT group. Subsequently, cytokine levels in the peripheral blood were analyzed. In comparison to the ABX group, an increased IFN-γ level and a decreased IL-6 level were observed in the ABX + FMT group. In the ABX+SCFAs group, the IFN-γ levels were increased, but IL-2R levels were decreased, relative to the ABX group. These findings suggest that the anti-tumor effect is correlated with an increase in IFN-γ levels and a decrease in IL-6 and IL-2R levels. For IL-10, neither of the groups showed a significant difference (Fig. 8C). Taken together, the results from the clinical specimen analysis and the animal experiments indicate that gut microbiota and its metabolites, SCFAs, could promote the anti-tumor effect by enhancing the infiltration and activation of effector T cells and regulating cytokines.

Fig. 8: Enhanced anti-tumor immune response by gut microbiota and short-chain fatty acids (SCFAs).

A, B Flow cytometry to detect the immune cells in the spleen and tumors (n = 5 per group). Infiltration of CD45+ T cells, CD3+ T cells, CD8+ T cells, and the expression of tumor necrosis factor-α (TNF-α) and granzyme B (GZMB) among CD8+ T cells in the spleen and tumor tissue. C Interferon-γ (IFN-γ), interleukin-2 receptor (IL-2Rα), interleukin-6 (IL-6), and interleukin-10 (IL-10) levels in blood were detected by enzyme-linked immunosorbent assay (ELISA). Data represent the mean ± standard deviation (SD) and were analyzed by one-way analysis of variance (ANOVA). *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Gut microbiota and SCFAs maintain gut mucosal integrity

According to the International Harmonization of Nomenclature and Diagnostic Criteria (INHAND) for Rats and Mice Pathology16, colon lesions were graded from 0 to 4 based on severity of erosion, ulceration, and inflammatory cell infiltration. Grade 0 indicates a normal mucosa within the physiological range. Grade 1 represents a slight lesion that exceeds the normal range but remains minimal. Grade 2 refers to a visible lesion that is moderate in severity but not yet severe. Grade 3 denotes a significant lesion with increased severity, and Grade 4 refers to a lesion that has occupied the entire colon and is classified as very severe. Mice in the ABX group displayed small areas of ulceration in the intestinal mucosal layer. The ulcerated surfaces showed mucosal epithelial shedding and minor gland necrosis (black arrows) with nuclear fragmentation, extending deep into the submucosa. Scattered lymphocyte infiltration was present in the lamina propria and submucosa (red arrows). Surrounding the ulcers, the glands appeared irregularly shaped with dark-stained nuclei (orange arrows), reduced goblet cells, uneven muscle layer thickness, and loosely arranged muscle cells with lightly stained cytoplasm (blue arrows). In the ABX + FMT group, FMT demonstrated a restorative effect on the gut mucosal structure, characterized by abundant mucosal folds and intact epithelial structure. Occasional epithelial degeneration was noted (orange arrows), with loose and lightly stained cytoplasm. The lamina propria contained numerous regularly arranged glands with abundant goblet cells and occasional focal lymphocyte aggregation (yellow arrows). The muscle layer showed muscle cells with loose, lightly stained cytoplasm. In the ABX+SCFAs group, the gut mucosal epithelial structure was intact, consisting of a single layer of columnar epithelium with normal cell morphology. The lamina propria contained numerous regularly arranged glands with abundant goblet cells. Of note, there was no significant inflammatory cell infiltration. Pathological scoring of colons revealed that ABX + FMT and ABX+SCFAs restored the structure of gut mucosal epithelium and reduced the infiltration of inflammatory cells. Interestingly, the restorative effect was more pronounced in the ABX+SCFAs group (Fig. 9A).

Fig. 9: Gut microbiota and short-chain fatty acids (SCFAs) enhance anti-tumor effects by modulating the gut microbial environment.

A Histopathological changes of colon tissues were observed and scored under 5× and 20×. Based on the International Harmonization of Nomenclature and Diagnostic Criteria (INHAND) for Rats and Mice Pathology, the colon tissue pathology of the three groups was evaluated from two aspects: erosion or ulceration and inflammatory cell infiltration. Heatmap (B) and histogram (C) depict the gut microbiota composition across the three groups of mice. D, E Analysis of the differences in gut microbiota composition and dominant microbiota among the three groups. F Linear discriminant analysis effect size (LEfSe) analysis, with a linear discriminant analysis (LDA) threshold of 4, identifies key microbiota taxa enriched in each group, visualized as colored nodes. These taxa significantly contribute to the intergroup differences. G The LDA discriminant histogram quantifies microbial taxa with significant effects across groups, with larger LDA scores indicating a greater influence of species abundance on group differences. Statistical differences were examined using one-way analysis of variance (ANOVA). *P < 0.05, **P < 0.01, ***P < 0.001.

Longitudinal analysis of fecal samples from mice in the control (CON), antibiotic-treated (ABX), and fecal microbiota transplantation (FMT) groups revealed that most of the microbiota was eradicated following antibiotic treatment, resulting in a near-sterile state. However, following FMT, a marked recovery of the microbiota diversity was observed. Supplementary Fig. 6A shows a Venn diagram of microbiota composition at the genus level for the three groups. Composition analysis revealed that in the ABX group, Proteobacteria predominated at the phylum level (Supplementary Fig. 6B, D), and unclassified_f_Enterobacteriaceae at the genus level (Supplementary Fig. 6C, E), indicating a relatively simple microbiota. This suggests that the antibiotic regimen significantly depleted the mouse gut microbiota. After FMT, there was a notable improvement in microbiota richness at both the phylum (Supplementary Fig. 6B, D) and genus levels (Supplementary Fig. 6C, E). Differential analysis revealed significantly higher levels of Bacteroidota and Firmicutes at the phylum level (Supplementary Fig. 6F), and increased abundances of Bacteroides, Lactobacillus, Erysipelatoclostridium, Enterococcus, Blautia, and Parasutterella at the genus level (Supplementary Fig. 6G) in the FMT group compared to the ABX group. To investigate the potential impact of gut microbiota on tumor growth, 16S rRNA sequencing was conducted on mouse fecal samples on Day 21. A heatmap summarizing the top 30 genera and a bar graph depicting the top 10 genera by relative abundance in each group are presented in Fig. 9B, C, respectively. Differential analysis revealed that Blautia, Escherichia-Shigella, and Parasutterella exhibited significant differences in the ABX+SCFAs group, suggesting their potential role as the key microbiota in tumor treatment (Fig. 9D, E). Furthermore, Bacteroides and norank_f_Muribaculaceae were widely present in the ABX + FMT group, highlighting their potential significance in lung cancer treatment (Fig. 9D, E). LEfSe analysis, performed at a linear discriminant analysis (LDA) threshold of 4, identified dominant taxa from the family to species level, with Bacteroides and Blautia being the most abundant genera in the ABX + FMT and ABX+SCFAs groups, respectively (Fig. 9F, G). Interestingly, we found that the abundance of Bacteroides was significantly increased in the ABX + FMT group at the genus level both after FMT and at the end of the experiment, indicating that there was no significant difference in the microbiota during the combined treatment, and Bacteroides may be the key microbiota affecting the treatment. These results suggest that gut microbiota and SCFAs could restore the gut structure of ABX mice, and the altered gut microbiota may influence tumor occurrence and development.

It was observed that SCFAs enhanced the anti-tumor effects in tumor-bearing mice. To further investigate the specific role of SCFAs, targeted metabolomics analysis of fecal samples was performed using GC-MS in the ABX + FMT group. Expression pattern clustering analysis revealed higher relative levels of propanoic acid, butanoic acid, valeric acid, and hexanoic acid in the ABX + FMT group than in the ABX group (Supplementary Fig. 7A), with a correspondingly higher Z-score (Supplementary Fig. 7B). The distribution of various SCFAs is shown in Supplementary Fig. 7C, and the variance analysis showed that propanoic acid and valeric acid levels were significantly elevated in the ABX + FMT group (Supplementary Fig. 7D, E). Metabolic pathway analysis indicated that propanoic acid plays a major role in the anti-tumor process. Propanoic acid can be converted into succinyl-CoA through the propanoic acid metabolism pathway, ultimately entering the tricarboxylic acid (TCA) cycle for energy production. Additionally, pyruvate produced by glycolysis was reduced to lactate, regenerating NAD+ to sustain glycolysis (Supplementary Fig. 7F). These findings align with the metagenomic sequencing results described above, which identified glycolysis as the main metabolic pathway of glucose in the R group, potentially linked to increased SCFA production. In summary, the targeted metabolic findings highlight the pivotal role of gut microbiota in the anti-tumor process, primarily through SCFA production.

Discussion

According to current guidelines for lung cancer treatment, the first-line therapy for advanced NSCLC without gene mutations remains the combination of cisplatin-based chemotherapy and immunotherapy15. However, most studies to date have predominantly focused on strategies to improve the response to immunotherapy only, often disregarding the impact of chemotherapy. In this study, we observed that the gut microbiota may play an important role in the response to chemo-immunotherapy combinations through the production of SCFAs.

In this prospective study of 106 patients with advanced NSCLC undergoing cisplatin-based chemotherapy combined with immunotherapy, 16S rRNA sequencing revealed distinct gut microbiota profiles between the R and NR groups. Responders exhibited higher α-diversity, and β-diversity differed significantly from non-responders. This is consistent with several immunotherapy studies in which higher gut microbiota diversity was associated with improved outcomes across various tumor histologies and distributions, diets, and ICIs3,17,18. Furthermore, the composition of gut microbiota varied significantly between the R and NR groups. Bacteroides, Faecalibacterium, Subdoligranulum, and Butyricimonas were overrepresented in the R group, while Limosilactobacillus was underrepresented. Similarly, Vernocchi et al.19 and Haberman et al.20 found a significant reduction in the abundance of both Bacteroides and Faecalibacterium in NSCLC patients compared to healthy individuals, highlighting their protective role against lung cancer21. Given the positive correlation between elevated levels of Faecalibacterium and improved outcomes in ICI therapy, Bredon et al.22 demonstrated that administering the EXL01 strain to tumor-bearing mice treated with ABX enhanced T cell activation induced by ICIs in vitro. Interestingly, oral EXL01 administration did not alter fecal microbiota diversity or composition, suggesting a direct immunomodulatory effect. This strain is being evaluated as a potential adjunct to ICI therapy for various cancers. Collectively, these results reinforce the critical role of gut microbiota in lung cancer therapy.

In addition, the gut microbiota differs across several pathological conditions and can influence the production of cytokines, which play a critical role in disease progression and exacerbation23,24. In this study, patients with higher Bacteroides abundance exhibited elevated CD3+ T cell percentages, CD4/CD8 ratios, and IL-1β levels. Similarly, the abundance of Faecalibacterium was significantly associated with elevated CD8+ T cells, CD19+ B cells, CD3+ T cells, and lymphocytes, findings consistent with recent research on ICIs. For instance, Gopalakrishnan et al.3 observed that melanoma patients with enriched Faecalibacterium exhibited improved ICI responses and enhanced CD8+ T cell infiltration. Parallel work by Routy et al.8 revealed that Akkermansia muciniphila abundance positively predicted ICI efficacy in epithelial tumors, with supplementation shown to restore PD-1 blockade effects via IL-12–dependent T cell recruitment to tumors. IL-2R serves as a pivotal immune regulator in cancer immunotherapy, where elevated signaling drives Treg cell expansion to suppress anti-tumor immunity and promote tumor progression18,25. In our study, serum IL-2R levels were negatively correlated with α-diversity, suggesting that responders with higher gut microbiota diversity had lower IL-2R levels. These findings align with the animal experiments, which showed reduced IL-2R expression in antibiotic-treated mice receiving FMT or SCFAs, corroborating the clinical observations. Additionally, prior research showed that lower baseline IL-2R levels were linked with longer PFS in lung cancer patients receiving ICIs26. These findings underscore the potential role of IL-2R in the efficacy of chemo-immunotherapy combinations, possibly via modulation by the gut microbiota.

In our study, metagenomic sequencing revealed distinct pathway enrichment profiles between the R and NR groups. Metabolic functions related to carbohydrate metabolism, including pyruvate metabolism and glycan biosynthesis, were significantly enriched in the R group. Moreover, the incomplete reductive citrate cycle (TCA cycle) module was more active in the R group, indicating enhanced anaerobic glycolysis in these patients. Analysis of the KEGG reaction network showed that pyruvate produced by glycolysis can be converted into lactic acid through the propionate metabolism pathway. This conversion regenerates NAD+ and thereby supports the continuation of glycolysis, which aligns with the results of our clinical patient analysis.

The observed increase in CD3+ and CD8+ T cells in the spleen and tumor tissues suggests that FMT may enhance systemic immune responses, thereby improving tumor surveillance and clearance. CD8+ T cells, in particular, play a critical role in recognizing and eliminating tumor cells. The concurrent increase in CD45+ cells further indicates that FMT may stimulate the expansion of immune cell populations involved in adaptive immunity. Moreover, elevated levels of IFN-γ, a key anti-tumor cytokine predominantly secreted by CD8+ T cells, significantly enhanced anti-tumor immunity by activating macrophages and upregulating tumor-suppressive molecules. On the other hand, the reduction in IL-6 levels suggests that FMT may mitigate chronic inflammation and inhibit pathways that promote tumor growth and metastasis. In summary, FMT not only reshapes the gut microbiota but also enhances immune responses, which may explain its therapeutic synergy with chemo-immunotherapy.

SCFAs, defined as fatty acids with fewer than six carbon atoms27, primarily include acetate, propionate, and butyrate, which are produced by gut bacterial fermentation28. Notably, fecal SCFA abundance was significantly higher in the R group than in the NR group. According to previous reports, acetate, a metabolite of Blautia, plays an important role in regulating the immune response in breast cancer, and its elevation has been shown to reduce chronic stress, thereby inhibiting cancer progression29. Parallel findings in colon cancer demonstrate that Blautia boosts CD8+ T cell-mediated immune surveillance by metabolizing immunosuppressive lysoglycerophospholipids30. While Blautia shows significant associations with colon cancer, its relationship with lung cancer remains underexplored. In our study, the marked increase in Blautia abundance in the ABX+SCFAs group suggests that SCFA supplementation promotes Blautia proliferation, which may in turn be associated with enhanced metabolic and immune regulation. In addition to Blautia, SCFA-producing genera such as Bacteroides, Faecalibacterium, Subdoligranulum, and Butyricimonas have been associated with colonic epithelium repair and disease prevention. Consistent with these findings, SCFA supplementation in this study not only altered gut microbial composition but also promoted intestinal mucosal restoration. In mice treated with ABX, a small ulcerated area was observed in the mucosal folds of the intestinal tissue, and a small number of lymphocytes were scattered in the lamina propria and submucosa. After SCFA supplementation, however, the mucosal epithelial structure was restored, with abundant goblet cells and minimal inflammatory cell infiltration.

SCFAs generated by specific gut microbiota affect not only genetic and epigenetic regulation but also the metabolism of immune cells, including both immunosuppressive and inflammatory populations, via their cell receptors31. Butanoic, hexanoic, and valeric acids were significantly correlated with immune cell populations, whereas acetic, propanoic, butanoic, and valeric acids and total SCFAs were negatively correlated with several cytokine levels. These results support a key role of SCFAs in chemo-immunotherapy efficacy through their direct modulation of T cell function, particularly the induction of Tregs in the gut32. For instance, butyrate can condition dendritic cells (DCs) to facilitate Foxp3+ Treg differentiation through indoleamine 2,3-dioxygenase 1 (IDO1) and aldehyde dehydrogenase 1A2 (Aldh1A2)33. Butyrate can also negatively regulate the inflammatory signaling pathway mediated by NLRP3 (NOD-like receptor thermal protein domain-associated protein 3) to inhibit the activation of macrophages34. SCFAs also suppress the proliferation and cytokine production of Th1, Th17, and Th22 cells via histone acetylation and the GPR43 signaling pathway35. In addition, pentanoate and butyrate have been reported to boost the anti-tumor activity of cytotoxic T lymphocytes by upregulating effector molecules, including CD25, IFN-γ, and TNF-α27. CD8+ T cells are crucial for the efficacy of immunotherapy, and butyrate has been reported to enhance the IL-12 signaling pathway, promoting anti-tumor effects via ID-2 dependent regulation of CD8+ T cells36,37. In vivo and in vitro experiments in gastric cancer (GC) demonstrated that butyrate enhanced CD8+ T cell cytotoxicity via G protein-coupled receptor 109A (GPR109A)/homologous domain protein homologous box (HOPX), thus inhibiting GC carcinogenesis and offering a novel direction for cancer treatment38. Similarly, in the animal experiments, treating ABX-treated mice with FMT or SCFAs led to a significant increase in CD8+ T cells in both spleens and tumor tissues, accompanied by increased expression of GZMB and TNF-α in CD8+ T cells.

Our study establishes the gut microbiota as a critical determinant of chemo-immunotherapy response in NSCLC, mechanistically linking microbial-derived SCFAs to enhanced anti-tumor immunity through effector T cell activation in the tumor microenvironment (Fig. 10). This research provides valuable insights into improving treatment for advanced NSCLC and positions the gut microbiota as a potential therapeutic target in lung cancer, yet several key questions remain unanswered. The collective influence of host factors, including diet, lifestyle, and genetic background, on microbial diversity and therapeutic response requires systematic investigation. Building on our longitudinal mouse data, clinical studies incorporating serial microbiota analysis in human patients throughout treatment cycles are now needed to validate these findings. To strengthen these investigations, future work should prioritize multi-center collaborations with expanded patient cohorts to enhance demographic diversity and statistical power, coupled with mechanistic studies to define how microbial metabolites mediate treatment effects through specific molecular pathways.

Fig. 10: Gut microbiota influences the efficacy of chemo-immunotherapy combination through the production of short-chain fatty acids (SCFAs).

The composition of the gut microbiota differed between the responder (R) and non-responder (NR) groups. In the R group, gut flora, including Escherichia, Bacteroides, Faecalibacterium, etc., produced more SCFAs through enhanced glycolysis. SCFAs promoted the induction of CD19+ B cells, CD3+ T cells, CD4+ T cells, CD8+ T cells, and lymphocytes and reduced the level of interleukin-2 receptor (IL-2R), thereby augmenting the curative effect of the chemo-immunotherapy combination on non-small cell lung cancer (NSCLC).

Methods

Patient enrollment and sample collection

In this prospective cohort study, 236 patients with advanced NSCLC were initially identified, and 106 patients who visited Zhongshan Hospital, Fudan University, from June 2020 to January 2022 were ultimately enrolled. The inclusion criteria were as follows: (1) unresectable primary tumor and histopathological confirmation of stage IV NSCLC; (2) at least one measurable target lesion; (3) no history of prior treatment for NSCLC or other types of cancer. The exclusion criteria were as follows: (1) pregnancy; (2) antibiotic use within the last 2 weeks; (3) previous NSCLC treatment; (4) history of cancer in other organ systems. All patients undergoing first-line chemo-immunotherapy were stratified into responders (R) and non-responders (NR) according to RECIST 1.1 criteria39. Patients who showed tumor progression within 6 months of treatment initiation were classified as NR, whereas those with PFS longer than 6 months were classified as R.

Fecal samples were collected from patients to analyze the levels of SCFAs. Blood samples were drawn from the same patients to measure cytokine levels and immune cell populations. All samples were processed and stored according to standard protocols for microbiota and immune profiling. All participants were informed of the study, and written consent was obtained prior to enrollment in the study. This study was approved by the Ethics Committee of Zhongshan Hospital, Fudan University (Ethics NO. B2020-019R). This study was conducted in compliance with the principles of the Declaration of Helsinki.

Mouse experiments

Male C57BL/6 mice, aged 6–8 weeks, were procured from Vital River and housed in a specific-pathogen-free facility at Zhongshan Hospital, Fudan University. All experimental procedures were approved by the Animal Care and Use Committee of Zhongshan Hospital (Approval Number: 2023-290) and adhered to ethical guidelines and regulations.

A two-week antibiotic treatment (ABX) regimen was administered prior to tumor inoculation. From Day -23 to Day -17, ampicillin (1 mg/mL), neomycin (1 mg/mL), metronidazole (1 mg/mL), and vancomycin (0.5 mg/mL) were added to the drinking water. From Day -16 to Day -10, ampicillin 1 mg/mL was added to the drinking water, and the mixture of metronidazole 10 mg/mL, neomycin 10 mg/mL, vancomycin 5 mg/mL, and amphotericin B 0.1 mg/mL was administered by gavage once daily at 200 µl per mouse.

Fecal samples were collected from NSCLC patients who responded to chemotherapy combined with immunotherapy. Informed consent was obtained from all patients. The samples were processed within 2 h of collection. First, the samples were mixed thoroughly with sterile saline at a ratio of 1:10 (mass:volume) and stirred until no large particles were visible. Second, the suspension was passed through sterile mesh filters of decreasing pore size: a 100 µm filter to remove solid particles, followed by 70 µm and 40 µm filters to remove undigested food and smaller particulates. The filtrate was collected into a sterile centrifuge tube, vortexed for 5 min to resuspend it, and then centrifuged at 600 × g for 5 min to remove any remaining insoluble material. The supernatant was retained and dispensed into 1.5 ml centrifuge tubes. Then, 10% sterile glycerol was added, and samples were frozen at −80 °C for subsequent use in FMT.

Fifteen mice were randomly divided into three groups: ABX group, ABX + FMT group, and ABX+SCFAs group. Antibiotics were administered from Day -23 to Day -10 to deplete gut microbiota. All mice were subcutaneously injected with 1 × 10^6 Lewis lung cancer (LLC) cells at Day 0. Chemotherapy (pemetrexed 100 mg/kg and cisplatin 5 mg/kg) combined with immunotherapy (PD-1 antibody 200 µg/mouse) was administered every three days starting from Day 9 after tumor formation until Day 18.

In the ABX + FMT group, the fecal suspension was administered by gavage at a concentration of 10 µl/g every three days from Day -9 to Day 0 (a total of 10 days) after ABX. Additionally, 100 µl of the suspension was applied topically to the lower abdomen or back of the mice. Longitudinal fecal microbiota changes were monitored in this group at four time points: before ABX treatment on Day -23 (control group, CON), after ABX treatment on Day -10 (ABX group), after FMT on Day 0 (FMT group), and at the end of the experiment on Day 21. Fecal samples were collected at each time point for 16S rRNA sequencing.

In the ABX+SCFAs group, a cocktail of SCFAs (sodium acetate, 67.5 mM; sodium propionate, 25.9 mM; and sodium butyrate, 40 mM) dissolved in sterile phosphate-buffered saline (PBS) was administered orally at a dose of 10 µl/g once daily from Day 0 to Day 21 (22 days). On Day 21, tumor volume was calculated using the formula: \({\rm{Volume}}={\rm{Length}}\times {{\rm{Width}}}^{2}\times 0.52\). Samples, including tumor tissue, spleen, blood, colon, and feces, were collected for further analysis.
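The volume formula and the weight-based gavage dose can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the millimetre and gram units, and the example measurements are assumptions that do not appear in the study.

```python
def tumor_volume(length: float, width: float) -> float:
    """Ellipsoid approximation from the text: Volume = Length x Width^2 x 0.52.

    Units are an assumption: with caliper readings in mm the result is in mm^3.
    """
    if length < 0 or width < 0:
        raise ValueError("caliper measurements must be non-negative")
    return length * width ** 2 * 0.52


def gavage_volume_ul(body_weight_g: float, dose_ul_per_g: float = 10.0) -> float:
    """Per-animal gavage volume at the 10 ul/g dose stated for FMT and SCFAs."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_g * dose_ul_per_g


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(tumor_volume(10.0, 5.0))   # length 10, width 5
    print(gavage_volume_ul(20.0))    # a 20 g mouse
```

Because the width is squared, a small error in the width measurement changes the estimate more than the same error in the length.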

Mice were anesthetized with Avertin at a dose of 250 mg per kg body weight via intraperitoneal injection, ensuring a pain-free state throughout the procedure. The appropriate dose of Avertin was calculated based on each mouse’s body weight prior to anesthesia. Upon completion of the experiments, anesthetized mice were euthanized by cervical dislocation. All procedures were in full compliance with relevant animal research guidelines and regulations.

Flow cytometry

Peripheral blood from patients was used to assess immune cell populations. Antibodies used for staining included Alexa Fluor® 700 anti-human CD3 (BioLegend, 317340), Spark Violet™ 538 anti-human CD4 (BioLegend, 344674), PerCP/Cyanine5.5 anti-human CD8a (BioLegend, 301032), APC anti-human TCR γ/δ (BioLegend, 331212), PE/Fire™ 640 anti-human CD16 (BioLegend, 302068), Brilliant Violet 750™ anti-human CD56 (NCAM) (BioLegend, 362556), Brilliant Violet 570™ anti-human CD19 (BioLegend, 302236), and PE/Cyanine7 anti-human CD1c (BioLegend, 331516). CD3TCRγδ was used to analyze NK cells, B cells, and lymphocytes. CD16 and CD56 were used to distinguish NK cells at different stages, and CD19 and CD1c were used to gate B cells. For staining, 75 μl of anticoagulated blood was added to each tube, followed by 5 μl of Human TruStain FcX (BioLegend, 422301); the tubes were mixed and incubated at room temperature in the dark for 10 min. Then, 75 µl of a mixture containing pre-titrated antibodies, True-Stain Monocyte Blocker (BioLegend, 426101), and Brilliant Stain Buffer Plus (BD Biosciences, 566385) was added. After mixing, the samples were incubated at room temperature in the dark for 20 min. The 10× red blood cell lysis solution (BioLegend, 422401) was diluted to a 1× working solution with deionized water and equilibrated at room temperature before use. Then, 2 ml of the 1× lysis solution was added to each tube, mixed gently at once, and incubated at room temperature in the dark for 15 min. The samples were centrifuged at 350 × g for 5 min, and the supernatant was discarded. The cells were washed once with at least 2 mL of Cell Staining Buffer (BioLegend, 420201). The stained cells were resuspended in 0.5 mL Cell Staining Buffer and then analyzed on a Cytek Aurora 5L (Cytek, China).

Mouse spleens and tumors were harvested for flow cytometry analysis. Antibodies used for staining included APC-cy7 anti-mouse CD45 (BD Biosciences, 557659), Percp-cy5.5 anti-mouse CD3e (BD Biosciences, 551163), FITC anti-mouse CD8a (BD Biosciences, 553030), and BV510 Zombie Aqua™ Fixable Viability Kit (BioLegend, 423101). For cell surface staining, cells were incubated with the antibodies in the dark at room temperature for 15 min. Subsequently, the cells were washed with 1 mL of Stain Buffer (BD Biosciences, 554656) and centrifuged at 500 × g for 5 min. For intracellular cytokine staining, cells were stimulated for 4 h with PMA (50 ng/mL; Sigma, P8139) and ionomycin (1 μg/mL; Sigma, I9657) in the presence of Golgi Stop (BD Biosciences, 554724). After fixation and permeabilization with the Fixation/Permeabilization Kit (BD Biosciences, 554714), cells were stained with BV421 anti-mouse TNF-α (BioLegend, 506328) and PE-cy7 anti-mouse GZMB (BioLegend, 372213) and incubated at 4 °C for 30 min. Flow cytometry was performed using the BD FACSCelesta, and data were analyzed using FlowJo software (version 10.8.1).

Chemiluminescence immunoassay (CLIA)

In patients, the levels of cytokines in peripheral blood, including IL-2R (Orienter, DC0901), IL-1β (Orienter, DC0801), IL-8 (Orienter, DC1001), IL-10 (Orienter, DC0701), IL-6 (Orienter, DC0402), and TNF-α (Orienter, DC0601), were measured by CLIA (Siemens, IMMULITE 1000). Blood samples were collected in anticoagulant tubes. The test unit was spun at high speed. A commercial cytokine detection reagent was added, followed by a series of washes to remove unbound substances. A chemiluminescent substrate was then introduced to generate a light signal. The signal was measured with a high-sensitivity chemiluminescence detector, and cytokine concentrations were calculated using a standard curve.

Enzyme-linked immunosorbent assay (ELISA)

On Day 21, blood samples were collected from tumor-bearing mice and centrifuged at 1000 × g for 10 min to isolate the supernatants for cytokine analysis. The concentrations of IFN-γ (Multi sciences, EK280), IL-2Rα (Multi sciences, EK291), IL-6 (Multi sciences, EK206), and IL-10 (Multi sciences, EK210) were measured with ELISA kits according to the manufacturer’s instructions.

Pathological score

The colon was fixed with 4% paraformaldehyde (PFA) for 24 h. Each segment of colonic tissue was divided into three equal sections, embedded in paraffin, and stained with hematoxylin and eosin (HE). Pathological changes in the mouse colon were observed under a microscope after the sections were fixed and dried.

DNA extraction and 16S rRNA gene sequencing

Fecal samples were collected from all human participants during hospitalization and from mice on Day 21 at the end of the experiment. The feces were immediately delivered on dry ice and frozen at −80 °C until processing. Bacterial DNA was extracted from fecal samples using the E.Z.N.A.® soil DNA Kit (Omega Bio-Tek, USA). DNA purity and concentration were examined using a NanoDrop 2000 UV-vis spectrophotometer (Thermo Scientific, USA), and DNA quality was checked by 1% agarose gel electrophoresis. The V3–V4 regions of the 16S rRNA gene were amplified using a PCR system (GeneAmp 9700, ABI, USA) with primers 338F (5′-ACTCCTACGGGAGGCAGCAG-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′). The amplicons were then quantified, pooled, and used to construct sequencing libraries (PE 2*300), which were sequenced on an Illumina MiSeq platform (Illumina, USA).

The raw sequencing data were processed using Trimmomatic (version 0.36) for quality control and FLASH (version 1.2.11) for merging. A 50 bp sliding window was applied, and reads were trimmed where the average quality score within the window fell below 20. Sequences <50 bp after trimming were removed. Overlapping sequences were merged with a maximum mismatch rate of 0.2, and those that could not be merged were discarded. Samples were assigned based on barcodes and primers, with exact matching for barcodes and up to two mismatches allowed for primers. Sequences with ambiguous bases were removed. The optimized sequences were then clustered into operational taxonomic units (OTUs) using UPARSE (version 7.1) at a 97% sequence similarity threshold. The most abundant sequence within each OTU was selected as the representative sequence. Taxonomic classification of each OTU representative was performed using the RDP Classifier (version 2.13) against the 16S rRNA gene database (Silva v138), with a confidence threshold set at 0.7. The sequence data were deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) at the National Genomics Data Center (Nucleic Acids Res 2022), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA-Human: HRA004798).
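The sliding-window trimming rule described above can be illustrated with a simplified sketch. It is not Trimmomatic's implementation. It assumes that a read is cut at the start of the first 50 bp window whose mean Phred score is below 20, and that reads left shorter than 50 bp are discarded. The function name and the example quality profiles are hypothetical.

```python
from typing import List, Optional


def sliding_window_trim(
    qualities: List[int],
    window: int = 50,
    min_mean_q: float = 20.0,
    min_len: int = 50,
) -> Optional[int]:
    """Return how many leading bases to keep, or None if the read is discarded.

    The read is scanned from the 5' end and cut at the start of the first
    window whose mean quality is below min_mean_q (simplifying assumption).
    """
    keep = len(qualities)
    n_windows = max(len(qualities) - window + 1, 0)
    for start in range(n_windows):
        win = qualities[start:start + window]
        if sum(win) / window < min_mean_q:
            keep = start
            break
    # Reads shorter than min_len after trimming are removed.
    return keep if keep >= min_len else None


if __name__ == "__main__":
    good_then_bad = [35] * 100 + [2] * 50  # hypothetical quality profile
    print(sliding_window_trim(good_then_bad))
```

With a 50 bp window, a read shorter than 50 bp contains no complete window and is removed by the length filter alone.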

Metagenomic sequencing and genome assembly

Before metagenomic sequencing, representative samples were selected from the larger pool of microbiome diversity samples using microPITA (version 3.3.1). The selection criteria include: (1) Maximum diversity, selecting samples with the highest α-diversity; (2) Most dissimilar, choosing the most extreme samples based on β-diversity; (3) Most representative, identifying samples that best reflect overall diversity characteristics according to β-diversity; (4) Targeted feature, selecting samples based on specific target species or features; (5) Distinct, picking samples with the largest β-diversity distance between phenotypes or groups; and (6) Discriminant, selecting samples that exhibit the smallest β-diversity distance between groups. Based on these criteria, fecal samples from 26 patients were randomly selected for metagenomic sequencing. DNA was extracted using the E.Z.N.A.® Soil DNA Kit (Omega Bio-Tek, USA) and fragmented to approximately 400 base pairs (bp) using Covaris M220 (Gene Company Limited, China). Library construction was performed using the NEXTFLEX Rapid DNA-Seq Kit (Bioo Scientific, USA). Adapters were ligated to the fragments first, and self-ligated fragments were removed by magnetic bead screening. The libraries were sequenced with paired-end sequencing (PE300) on an Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, USA) at Majorbio Bio-Pharm Technology Co., Ltd. (Shanghai, China).

The raw sequencing data were processed using fastp (https://github.com/OpenGene/fastp) (version 0.20.0) to trim adapter sequences and filter low-quality reads. Reads that were shorter than 50 bp, had a quality score below 20, or contained N bases40 were excluded. Reads were aligned to the human genome with BWA (http://bio-bwa.sourceforge.net) (version 0.7.17), and any reads that aligned, together with their mates, were removed. The quality-filtered data were assembled using MEGAHIT (https://github.com/voutcn/megahit) (version 1.1.2). Contigs longer than 300 bp were selected as the final assembly and used for gene prediction and annotation. Raw sequence data were deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) in the National Genomics Data Center (Nucleic Acids Res 2022), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA-Human: HRA004812).

Open reading frames (ORFs) were predicted from contigs using MetaGene (http://metagene.cb.k.u-tokyo.ac.jp/). Predicted ORFs with a length of at least 100 bp were retrieved and translated into amino acid sequences. Representative sequences from the non-redundant gene catalog were aligned to the NCBI NR database using DIAMOND (http://www.diamondsearch.org/index.php) (version 0.8.35) for taxonomic annotation. Non-redundant amino acid sequences were aligned against the Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups (eggNOG) and KEGG databases using DIAMOND, with an e-value cutoff of 1e-5.

Analysis of SCFAs in feces

Fecal samples from both patients and mice (20 mg) were weighed and dissolved in 800 µL of phosphate:water (5:1000, v/v) solution. The samples were homogenized using a high-throughput crusher at 50 Hz, followed by centrifugation at 13,000 × g at 4 °C for 15 min. The supernatant was carefully transferred to sample vials for GC-MS analysis. The samples were analyzed using an 8890B-7000D GC/MSD mass spectrometer equipped with an HP FFAP capillary column (30 m × 0.25 mm × 0.25 μm, Agilent J&W Scientific, Folsom, CA, USA). High-purity helium (99.999%) was used as the carrier gas at a flow rate of 1.0 mL/min, with the injection port temperature set to 180 °C. The injection volume was 1 μL with a split ratio of 10:1. The initial temperature was set at 80 °C, increased at a rate of 20 °C/min to 120 °C, then at 5 °C/min to 160 °C, and finally maintained at 220 °C for 3 min. The mass spectrometry conditions were as follows: electron ionization (EI) source, ion source temperature of 230 °C, quadrupole temperature of 150 °C, and electron energy of 70 eV. Quantification was performed using the MassHunter quantitative software (Agilent Technologies, v10.0.707.0), with parameters automatically identified and adjusted for SCFA analysis using artificial intelligence. Internal standards were added during the experiment to evaluate system stability. A mixed standard solution at 25 μg/mL was used as a quality control (QC) sample, and the relative standard deviation (RSD) of all target compounds was less than 15%, indicating reliable stability for quantitative analysis. Low, medium, and high concentration spiked samples (3 replicates each) were prepared for accuracy testing using GC-MS. For stability testing, low and high concentration QC samples were prepared in triplicate and analyzed under different conditions (0, 2, 6, 12, 24, 48 h at room temperature; −20 °C for 24 h; 4 °C for 24 h; and diluted samples at 10× and 20× concentrations above the upper limit of quantification). The RSD for all SCFAs was less than 15%, with accuracy within 80–120%, confirming the reliability of the method for quantitative analysis. The concentration of each SCFA was then calculated from a standard curve.
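The acceptance checks and the standard-curve step can be sketched as follows. The sketch assumes a linear calibration fitted by ordinary least squares and ignores internal-standard normalization. The function names and all numeric values are hypothetical, and the actual quantification was done in MassHunter.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation: sample SD divided by the mean, in percent."""
    return stdev(values) / mean(values) * 100.0


def fit_standard_curve(conc: Sequence[float], resp: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line resp = slope * conc + intercept (linearity is assumed)."""
    mx, my = mean(conc), mean(resp)
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, resp))
    slope = sxy / sxx
    return slope, my - slope * mx


def concentration(response: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate a concentration."""
    return (response - intercept) / slope


def passes_qc(measured: Sequence[float], nominal: float,
              max_rsd: float = 15.0, accuracy: Tuple[float, float] = (80.0, 120.0)) -> bool:
    """Apply the stated criteria: RSD < 15% and accuracy within 80-120%."""
    acc = mean(measured) / nominal * 100.0
    return rsd_percent(measured) < max_rsd and accuracy[0] <= acc <= accuracy[1]


if __name__ == "__main__":
    # Hypothetical calibration points (ug/mL vs. peak response).
    slope, intercept = fit_standard_curve([1, 2, 4, 8], [2.5, 4.5, 8.5, 16.5])
    print(concentration(10.5, slope, intercept))
    print(passes_qc([9.0, 10.0, 11.0], nominal=10.0))
```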

Diagnostic model and data processing

In this study, a random forest algorithm, a classifier consisting of multiple decision trees, was used to select discriminative microbial taxa. Model performance was evaluated using the out-of-bag (OOB) error rather than separate training and test sets. The construction process involved bootstrap sampling (with replacement) to generate N datasets, each used to build one decision tree. Samples not drawn into a given bootstrap dataset were designated as OOB data for that tree. Each sample was classified from its attributes across the measured dimensions, and the final class was the one with the highest probability under the majority vote of the individual trees. The analysis was performed using R (Random Forest package, version 3.3.1), which enabled efficient identification of key species for sample classification.
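The bootstrap and OOB bookkeeping behind this evaluation can be shown in a short standard-library sketch. It covers only the sampling and voting logic and does not train any decision trees. The function names and toy predictions are hypothetical, and the study itself used the R package.

```python
import random
from collections import Counter
from typing import List, Sequence, Set, Tuple


def bootstrap_split(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_samples indices with replacement; unsampled indices are OOB."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def majority_vote(votes: Sequence[str]) -> str:
    """Return the most frequent class label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def oob_error(labels: Sequence[str],
              tree_predictions: Sequence[Sequence[str]],
              oob_sets: Sequence[Set[int]]) -> float:
    """OOB error: each sample is scored only by trees that did not train on it.

    tree_predictions[t][i] is the class that tree t predicts for sample i.
    """
    errors, scored = 0, 0
    for i, truth in enumerate(labels):
        votes = [pred[i] for pred, oob in zip(tree_predictions, oob_sets) if i in oob]
        if not votes:
            continue  # sample was in-bag for every tree
        scored += 1
        errors += majority_vote(votes) != truth
    return errors / scored if scored else float("nan")


if __name__ == "__main__":
    in_bag, oob = bootstrap_split(106, random.Random(0))
    print(len(in_bag), len(oob))
```

On average roughly one third of the samples fall outside any given bootstrap draw, which is why the OOB error can stand in for a held-out test set.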

Furthermore, a support vector machine (SVM) model, which is suited to small sample sizes and binary classification tasks, was applied to examine the predictive value of gut microbiota at the genus and species levels in determining the treatment efficacy of chemo-immunotherapy combinations for NSCLC. To evaluate the model, samples were randomly assigned to training and testing sets at a ratio of 8:2. The SVM model was then evaluated separately through 5-fold cross-validation, which assesses how well the results of a statistical analysis generalize to an independent dataset.

The 5-fold cross-validation method was selected for several reasons: (1) Bias-Variance Trade-off: It provides a good balance between bias and variance, as increasing the number of folds reduces bias but may increase variance, while fewer folds lower variance but can increase bias. (2) Computational Efficiency: Five folds represent a reasonable compromise between achieving a reliable assessment and minimizing computational cost. (3) Data Utilization: It ensures that each data point is used for both training and validation, which is particularly important in studies with limited sample sizes.

The 5-fold cross-validation process worked as follows: (1) Data Division: The dataset was randomly divided into 5 folds of equal or near-equal size. (2) Model Training and Validation: In each iteration, one fold was used as the validation set, while the remaining 4 folds were used for training the model. This process was repeated 5 times, so that each fold was used once as the validation set. (3) Performance Metrics Calculation: For each fold, performance metrics, including accuracy, precision, recall, and F1 score, were calculated. (4) Aggregation of Results: The results from all 5 folds were averaged to produce a single estimate.
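The four steps above can be sketched with the standard library alone. The sketch covers only the index bookkeeping, and any classifier could be fitted on each training split. The function names, the seed, and the per-fold scores are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # (1) random division
    folds = [indices[i::k] for i in range(k)]     # near-equal fold sizes
    for i in range(k):                            # (2) rotate the validation fold
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def aggregate(per_fold_scores: Sequence[float]) -> float:
    """(4) Average a metric over the folds to obtain a single estimate."""
    return sum(per_fold_scores) / len(per_fold_scores)


if __name__ == "__main__":
    for train, validation in k_fold_indices(106, k=5):
        print(len(train), len(validation))
    # Hypothetical per-fold accuracies from step (3).
    print(aggregate([0.80, 0.85, 0.75, 0.90, 0.80]))
```

When the sample count is not divisible by 5, as with 106 samples, the folds differ in size by at most one sample.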

Based on the preliminary analysis and previous findings, the following parameters were adopted for the SVM models in each experiment: a radial basis function (RBF) kernel, a polynomial order of 3, and a regularization strength of 1.0. Five performance metrics (accuracy, precision, recall, F1 score, and AUC) were employed for quantitative evaluation of different aspects of the model. To ensure the robustness and generalization ability of the model, the mean values from five different experiments were taken to represent the final predictive performance.

$${\rm{Accuracy}}=\frac{TP+TN}{TP+TN+FP+FN}$$
(1)
$${\rm{Precision}}=\frac{TP}{TP+FP}$$
(2)
$${\rm{Recall}}=\frac{TP}{TP+FN}$$
(3)
$${\rm{F1}}\text{-}{\rm{score}}=\frac{2TP}{2TP+FN+FP}$$
(4)

TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative, respectively.

(1) Accuracy: the proportion of all predictions that are correct. While it is useful when the classes are balanced, it can be misleading in clinical settings where the costs of false negatives and false positives may differ significantly.

(2) Precision: the proportion of positive predictions that were actually correct. High precision is crucial in clinical settings where a false positive could lead to unnecessary treatment or anxiety.

(3) Recall: the proportion of actual positives that were correctly identified. In clinical settings, high recall is important to ensure that as many true cases as possible are detected, even at the risk of some false positives.

(4) F1 score: the harmonic mean of precision and recall, providing a balance between the two. It is particularly useful when there is an uneven cost associated with different types of errors.
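Equations (1) to (4) translate directly into code. The following sketch assumes that responders ("R") are treated as the positive class and that a metric with a zero denominator is reported as 0.0. Both choices, as well as the function names and example labels, are illustrative assumptions rather than details taken from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "R") -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 score as in Eqs. (1)-(4)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fn + fp) if (2 * tp + fn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = ["R", "R", "NR", "NR"]
    predicted = ["R", "NR", "NR", "R"]
    print(classification_metrics(*confusion_counts(truth, predicted)))
```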

Data analysis

The Wilcoxon rank-sum test was used to analyze intergroup differences in α-diversity. Principal coordinate analysis (PCoA), based on Bray-Curtis distance, was applied to assess the similarity in microbial community structures between samples, and the non-parametric Permutational Multivariate Analysis of Variance (PERMANOVA) test was used to evaluate the statistical significance of differences in microbial community structures between sample groups. To identify taxa and functional modules differentially represented between the two groups, linear discriminant analysis effect size (LEfSe) was performed. Differential abundances of phyla, genera, species, and functional modules between the two groups were tested using the Wilcoxon rank-sum test. One-way analysis of variance (ANOVA) was used to compare the microbiota among more than two groups. All data are presented as mean ± SD. The relationship between SCFAs and cytokines was evaluated using Spearman correlation analysis. The correlation coefficient (R value) reflects the strength and direction of the relationship, with higher absolute values indicating stronger correlations. Statistical analyses were performed using SPSS 25.0 and GraphPad Prism 8.0 software. Significance levels are denoted as *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
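Two of the quantities named above, the Bray-Curtis dissimilarity and the Spearman correlation coefficient, can be computed with a short standard-library sketch. It is illustrative only: the analyses in this study were run in SPSS and GraphPad Prism, and the function names and example vectors are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = sqrt(sum((p - mx) ** 2 for p in rx))
    sy = sqrt(sum((q - my) ** 2 for q in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical genus counts for two samples, then SCFA vs. cytokine values.
    print(bray_curtis([10, 0, 5], [0, 10, 5]))
    print(spearman_rho([1.2, 2.5, 3.1, 4.8], [40.0, 31.0, 22.0, 15.0]))
```

Because Spearman's coefficient depends only on ranks, it captures monotonic relationships between SCFA and cytokine levels without assuming linearity.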