Introduction

Sepsis is characterized by organ dysfunction resulting from a dysregulated host response to infection1. Despite significant advancements in diagnosis and treatment, it remains a leading cause of mortality worldwide2, with an estimated 11 million deaths attributed to sepsis in 20173. Acute respiratory distress syndrome (ARDS) is a frequent and severe complication of sepsis4. Sepsis-associated ARDS is more severe than non-sepsis-associated ARDS and carries higher morbidity and mortality, with a reported mortality of approximately 40%5. However, targeted and effective pharmacological treatments for sepsis-associated ARDS are currently lacking6. Therefore, identifying and analyzing key molecular biomarkers for early diagnosis and timely intervention is crucial to reducing the mortality associated with this condition.

In recent years, endoplasmic reticulum stress (ERS) has emerged as a prominent area of research interest. The endoplasmic reticulum (ER) is a vital organelle responsible for protein transport, folding, and post-translational modification7,8. Pathological conditions such as sepsis and ischemia can lead to the accumulation of unfolded or misfolded proteins, disrupting ER homeostasis and triggering ERS9,10,11,12. Prolonged or excessive ERS can overwhelm the unfolded protein response (UPR), ultimately leading to apoptosis, tissue damage, and organ dysfunction13. Several studies have demonstrated that ERS plays a significant role in the pathogenesis of sepsis-associated ARDS. Therefore, exploring effective molecular biomarkers related to ERS holds great potential for the early identification and targeted treatment of sepsis-associated ARDS.

In this study, we employed a range of bioinformatics approaches and machine learning algorithms to identify ERS-related hub genes in sepsis-associated ARDS, and validated their expression using RT-qPCR. A schematic overview of the study workflow is presented in Fig. 1.

Fig. 1

Flow chart of the study.

Materials and methods

Data sources

The GSE32707 dataset, annotated with platform GPL10558, was downloaded from the Gene Expression Omnibus (GEO) database. This dataset comprises 144 human peripheral blood samples, from which 31 patients with sepsis-associated ARDS and 34 healthy controls were selected for analysis.

DEGs analysis in sepsis-associated ARDS

The expression data from the two groups were standardized and normalized. Differentially expressed genes (DEGs) between sepsis-associated ARDS patients and healthy controls were identified using the limma package in R14, with thresholds of |logFC| > 1 and P-value < 0.05. The ggplot2 and pheatmap packages were employed to generate volcano plots and heatmaps, respectively, for result visualization.
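The thresholds above can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study; the gene names and statistics are hypothetical placeholders, and `classify_deg` is an illustrative helper that assumes logFC and P-values have already been computed.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log_fc: float   # log2 fold change (patients vs. controls)
    p_value: float  # P-value from the differential expression test


def classify_deg(stat, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' using |logFC| > 1 and P < 0.05."""
    if stat.p_value < p_cutoff and abs(stat.log_fc) > lfc_cutoff:
        return "up" if stat.log_fc > 0 else "down"
    return "ns"


# Hypothetical example values, not taken from GSE32707.
stats = [
    GeneStat("GENE_A", 1.8, 0.001),
    GeneStat("GENE_B", -1.3, 0.020),
    GeneStat("GENE_C", 0.4, 0.001),   # significant but small effect
    GeneStat("GENE_D", 2.1, 0.200),   # large effect but not significant
]
labels = {s.gene: classify_deg(s) for s in stats}
```

Only genes that pass both criteria are labelled as up- or downregulated; a gene failing either one is treated as not significant.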

Functional enrichment analysis of DEGs

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted on the DEGs using the clusterProfiler and org.Hs.eg.db packages15. GO enrichment analysis included three categories: molecular function (MF), biological process (BP), and cellular component (CC). KEGG pathway analysis was performed to identify significantly enriched signaling pathways. A P-value < 0.05 was considered statistically significant.
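Over-representation analyses of this kind are commonly based on a hypergeometric test. The following Python sketch shows that calculation for a single term; the function name and the example counts are hypothetical, and multiple-testing adjustment is omitted.

```python
from math import comb


def enrichment_p_value(universe_size, term_size, list_size, overlap):
    """Upper-tail hypergeometric P-value for one GO/KEGG term.

    universe_size: number of background genes
    term_size:     background genes annotated to the term
    list_size:     number of genes in the query list (e.g. DEGs)
    overlap:       query genes annotated to the term
    """
    upper = min(term_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 in the term,
# 5 query genes, all 5 of which fall in the term.
p_example = enrichment_p_value(10, 5, 5, 5)
```

The P-value is the probability of observing at least `overlap` annotated genes when `list_size` genes are drawn at random from the background.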

Immunoinfiltration analysis

The CIBERSORT algorithm was applied to evaluate the relative abundance of 22 immune cell types in each sample16. Differences in immune cell infiltration between sepsis-associated ARDS patients and healthy controls were assessed and visualized using the ggpubr package’s ggboxplot function. Correlations among infiltrating immune cells were further analyzed using the corrplot package in R.

WGCNA and screening of ERS differential genes in sepsis-associated ARDS

Weighted gene co-expression network analysis (WGCNA) was conducted on the GSE32707 dataset17. Hierarchical clustering was first applied to detect and exclude outlier samples. Subsequently, an appropriate soft-thresholding power (β) was selected to construct a scale-free network. The weighted adjacency matrix was then transformed into a topological overlap matrix (TOM) to estimate network connectivity. Hierarchical clustering was used to construct a dendrogram of gene modules based on TOM dissimilarity. Modules showing the highest correlation with sepsis-associated ARDS were selected as key modules.
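The adjacency and TOM steps can be sketched in Python as follows. This is an illustration rather than the WGCNA R package itself: it assumes an unsigned network, in which adjacency is the absolute correlation raised to the power β, and the small correlation matrix is hypothetical.

```python
def soft_threshold_adjacency(cor, beta):
    """Unsigned adjacency: a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def tom_dissimilarity(adj):
    """Return 1 - TOM, where
    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    and k_i is the connectivity (row sum) of gene i."""
    n = len(adj)
    k = [sum(row) for row in adj]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j)
            )
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = 1.0 - tom
    return diss


# Hypothetical gene-gene correlations for three genes.
cor = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
adj = soft_threshold_adjacency(cor, beta=2)
diss = tom_dissimilarity(adj)
```

Raising correlations to a power suppresses weak correlations more than strong ones, so strongly co-expressed gene pairs have a small TOM dissimilarity and cluster into the same module.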

A total of 1301 ERS-related genes (ERGs) with relevance scores > 7 were downloaded from the GeneCards database (https://www.genecards.org/). The intersection of DEGs, WGCNA key module genes, and ERGs was taken to identify ERS-related differential genes in sepsis-associated ARDS18.

Identification of hub genes for ERS in sepsis-associated ARDS

The ERS-related DEGs identified above were further analyzed using three machine learning algorithms: least absolute shrinkage and selection operator (LASSO) regression using the glmnet package, random forest (RF) using the randomForest package19, and support vector machine (SVM)20. Hub genes were defined as those selected by all three algorithms.
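The consensus step amounts to a set intersection, as in the minimal Python sketch below. The function name is illustrative and the gene identifiers are hypothetical placeholders, not the gene lists produced in this study.

```python
def consensus_genes(*selections):
    """Return the genes present in every selection, sorted by name."""
    if not selections:
        return []
    common = set(selections[0]).intersection(*selections[1:])
    return sorted(common)


# Hypothetical outputs of three feature-selection methods.
lasso_selected = ["G1", "G2", "G3"]
rf_selected = ["G2", "G3", "G4"]
svm_selected = ["G3", "G2", "G5"]

hub_candidates = consensus_genes(lasso_selected, rf_selected, svm_selected)
```

Requiring agreement among all methods is a conservative choice: a gene favored by only one algorithm is excluded.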

Analysis of the diagnostic value of hub genes in sepsis-associated ARDS

The diagnostic performance of the identified hub genes was assessed using receiver operating characteristic (ROC) curve analysis. The area under the curve (AUC) was calculated using the pROC package in R21, and an AUC greater than 0.7 was considered indicative of good diagnostic performance.
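For a single marker, the AUC equals the probability that a randomly chosen patient has a higher value than a randomly chosen control, with ties counted as one half. The Python sketch below computes this directly; it is not the pROC implementation, and the expression values are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values for one gene.
auc_example = auc_from_scores([3, 4, 5], [1, 2, 3])
```

For a marker that is lower in patients, such as a downregulated gene, this value falls below 0.5, and 1 − AUC gives the equivalent discriminatory ability.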

Validation of ERS hub genes expression by RT-qPCR

A total of 19 patients with sepsis-associated ARDS hospitalized between May 2022 and August 2023 were enrolled as the test group. Nineteen healthy individuals undergoing routine physical examinations during the same period were recruited as the control group. Inclusion criteria: Test group: (i) met diagnostic criteria for sepsis-associated ARDS; (ii) age ≥ 18 years; (iii) provided informed consent. Healthy control group: (i) confirmed to be in good health by the hospital’s physical examination center; (ii) age ≥ 18 years; (iii) provided informed consent. Exclusion criteria: Age < 18 years; Patients who died on admission or had incomplete clinical data; Patients with organic heart diseases, including ischemic cardiomyopathy, congenital heart disease, or myocarditis; Patients with end-stage chronic illnesses or malignancies; Individuals who refused to participate or did not provide informed consent; Patients with a history of psychiatric illness or cognitive dysfunction. Peripheral venous blood samples were collected, centrifuged at 4 °C, and the supernatant was aliquoted and stored at −80 °C. Informed consent was obtained from all participants, and the study was approved by the Clinical Research Ethics Committee of Shanxi Bethune Hospital (Approval No. SBQLL-2020-009).

Gene-specific primers were designed, and total RNA was extracted from serum samples of both groups. cDNA synthesis was performed using the HiScript II Q RT SuperMix kit. RT-qPCR was conducted using the Hieff qPCR SYBR Green Master Mix on a QuantStudio 6 Flex system. GAPDH was used as the internal control gene. The relative expression levels of ERS hub genes were calculated using the 2^−ΔΔCT method.
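The relative quantification step can be written out as a short Python sketch. The CT values are hypothetical, and using the mean ΔCT of the control group as the calibrator is an assumption about how the reference level was defined.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, control_delta_ct_mean):
    """Relative expression by the 2^-ddCT method.

    dCT  = CT(target gene) - CT(reference gene, e.g. GAPDH)
    ddCT = dCT(sample) - mean dCT(control group)
    """
    delta_ct = ct_target - ct_reference
    delta_delta_ct = delta_ct - control_delta_ct_mean
    return 2.0 ** (-delta_delta_ct)


# Hypothetical (target CT, reference CT) pairs for two control samples.
control_pairs = [(25.0, 18.0), (26.0, 18.0)]
calibrator = mean(target - ref for target, ref in control_pairs)  # 7.5

# Hypothetical patient sample: dCT = 6.5, so ddCT = -1.0.
fold_change = relative_expression(24.5, 18.0, calibrator)
```

A ΔΔCT of −1 corresponds to a twofold increase relative to the control group, because each PCR cycle represents an approximate doubling of product.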

Statistical analysis

All statistical analyses were performed using R software (version 4.3.1) and GraphPad Prism (version 9.0). Normally distributed continuous variables were expressed as mean ± standard deviation (X ± SD) and compared using independent samples t-tests. Non-normally distributed variables were presented as median and interquartile range [M (IQR)] and compared using the Mann–Whitney U test. The diagnostic efficacy of hub genes was evaluated by ROC curve analysis. A two-tailed P-value < 0.05 was considered statistically significant.

Results

DEGs in sepsis-associated ARDS

DEGs associated with sepsis-associated ARDS were identified using the limma package in R. A total of 438 DEGs were screened, including 226 upregulated and 212 downregulated genes (Fig. 2A). A heatmap was generated to visualize the top 25 upregulated and top 25 downregulated DEGs (Fig. 2B).

Fig. 2

Identification of DEGs in sepsis-associated ARDS. (A) Volcano plot displaying 438 DEGs. Red dots represent significantly upregulated genes, while blue dots represent significantly downregulated genes. (B) Heatmap showing the top 25 upregulated and top 25 downregulated DEGs.

Functional enrichment analysis of DEGs

GO enrichment analysis revealed that the BP terms were primarily enriched in leukocyte activation and migration. The CC terms were mainly associated with specific granules, ribosomal subunits, and cytoplasmic ribosomes. The MF terms were predominantly enriched in structural constituents of ribosomes, haptoglobin binding, and antioxidant activity (Fig. 3A). KEGG pathway analysis indicated significant enrichment of several inflammatory signaling pathways, including the neutrophil extracellular trap (NET) formation pathway, the NOD-like receptor signaling pathway, and the NF-κB signaling pathway (Fig. 3B,C)22,23,24.

Fig. 3

Functional enrichment of the DEGs. (A) GO enrichment analysis showing significantly enriched terms in BP, CC, and MF categories. (B) Bubble plot of KEGG pathway enrichment analysis. (C) Network plot of KEGG pathway enrichment analysis.

Immunoinfiltration analysis

Immunoinfiltration between the two groups was evaluated using the CIBERSORT algorithm based on the DEGs. Samples with a total immune cell abundance value of zero were excluded, resulting in the inclusion of 22 immune cell types. The distribution of immune cell infiltration across individual samples in both groups is shown in Fig. 4A. Compared to the healthy control group, the sepsis-associated ARDS group exhibited higher infiltration levels of activated CD4⁺ T memory cells, γδ T cells, resting NK cells, and activated mast cells, while activated NK cells showed lower infiltration (Fig. 4B). Correlation analysis among immune cell populations revealed that eosinophils were positively correlated with activated dendritic cells and negatively correlated with neutrophils (Fig. 4C).

Fig. 4

Immune cell subtype distribution based on DEGs. (A) Bar plot showing the relative proportion of immune cell infiltration across samples in the two groups. (B) Comparison of immune cell infiltration levels between the sepsis-associated ARDS and healthy control groups. (C) Heatmap of correlation analysis among immune cell types. *P < 0.05, **P < 0.01, ***P < 0.001; ns indicates no statistically significant difference.

WGCNA and identification of key ERS-related differential genes

The WGCNA R package was applied to perform clustering and establish a scale-free co-expression network. A soft-thresholding power of 15, corresponding to a scale-free fit index of 0.85, was selected for analysis (Fig. 5A,B). This analysis resulted in the identification of 11 distinct gene modules (Fig. 5C). The correlation between each module and sepsis-associated ARDS was assessed. The turquoise module, which showed the strongest positive correlation with sepsis-associated ARDS (r = 0.43, P = 3e−4), was selected as the key module and contained 2093 genes (Fig. 5D). To identify ERS-related differential genes, the intersection of three gene sets (DEGs, key module genes, and ERGs) was taken, yielding a total of 31 key ERS-associated differential genes (Fig. 5E).

Fig. 5

WGCNA and identification of key ERS-related differential genes. (A, B) Analysis of soft-thresholding power for scale-free topology in WGCNA. (C) Clustering dendrogram of gene modules identified by WGCNA. (D) Heatmap showing the correlation between gene modules and sepsis-associated ARDS. (E) Venn diagram showing the intersection of DEGs, WGCNA key module genes, and ERGs, resulting in 31 key ERS-related differential genes.

Screening of ERS hub genes in sepsis-associated ARDS

Three machine learning algorithms (LASSO, RF, and SVM) were applied to identify hub genes related to ERS in sepsis-associated ARDS. LASSO regression analysis identified 12 candidate genes with non-zero coefficients (Fig. 6A,B). Random forest analysis selected 16 genes with a Gini index greater than 0.5 (Fig. 6C,D). The SVM algorithm identified 28 relevant genes based on recursive feature elimination (Fig. 6E,F). By intersecting the results from all three algorithms, five overlapping ERS-related hub genes were identified: HSPB1, LCN2, SGK1, STAT3, and YWHAQ (Fig. 6G).

Fig. 6

Screening of ERS hub genes in sepsis-associated ARDS. (A) LASSO coefficient profiles of the 30 candidate genes. (B) Ten-fold cross-validation for LASSO regression with the optimal lambda value (λ = 0.0134), identifying 12 characteristic genes. (C) Error rate curve with confidence intervals for the random forest model (ntree = 500). (D) Variable importance plot showing the top genes ranked by mean decrease in Gini index in the random forest model. (E–F) Results of the SVM algorithm: the x-axis represents the number of selected features, with a cross-validation precision of 0.802 and error of 0.198. (G) Venn diagram showing the intersection of hub genes identified by LASSO, RF and SVM algorithms.

Analysis of ERS hub gene expression and diagnostic performance

The ggboxplot function in R was used to generate box plots illustrating the expression levels of endoplasmic reticulum stress-related hub genes in the sepsis-associated ARDS group versus healthy controls from the GSE32707 dataset. Among these, LCN2 and STAT3 were significantly upregulated in sepsis-associated ARDS, while HSPB1, SGK1, and YWHAQ were significantly downregulated (P < 0.05 for all; Fig. 7A–E). ROC curves were plotted using R to evaluate the diagnostic performance of each gene, and the AUC was calculated. The AUC values were as follows: HSPB1, 0.741; LCN2, 0.824; SGK1, 0.791; STAT3, 0.817; and YWHAQ, 0.798 (Fig. 7F).

Fig. 7

Expression of ERS hub genes and their diagnostic efficacy in sepsis-associated ARDS. (A–E) Box plots showing the expression levels of HSPB1, LCN2, SGK1, STAT3, and YWHAQ in sepsis-associated ARDS patients and healthy controls from the GSE32707 dataset. (F) ROC curves illustrating the diagnostic performance of the five ERS-related hub genes.

Validation of ERS hub genes by RT-qPCR

To validate the expression of the identified ERS hub genes, RT-qPCR was performed on peripheral blood samples collected from patients with sepsis-associated ARDS and control samples. The results revealed that STAT3 expression was significantly elevated in the sepsis-associated ARDS samples (2.435 ± 1.147) compared to control samples (0.979 ± 0.319, P < 0.0001). Conversely, YWHAQ expression was significantly decreased [0.300 (0.170, 0.347)] compared to control samples [1.342 (0.470, 1.934), P < 0.001]. No statistically significant differences in expression were observed for HSPB1 [1.191 (0.641, 1.563) vs. 1.210 (0.649, 1.438)], SGK1 [1.587 (0.444, 2.284) vs. 1.289 (0.506, 1.221)], and LCN2 [0.764 (0.483, 1.054) vs. 1.243 (0.489, 1.594)] between the two groups (P > 0.05 for all) (Fig. 8A–E).

Fig. 8

Expression of ERS hub genes in patients with sepsis-associated ARDS and healthy controls. (A–E) Relative mRNA expression levels of HSPB1, LCN2, SGK1, STAT3, and YWHAQ in peripheral blood samples measured by RT-qPCR. Data for STAT3 are presented as mean ± SD, while data for HSPB1, LCN2, SGK1, and YWHAQ are expressed as median [interquartile range, IQR]. Statistical significance: **P < 0.001, ***P < 0.0001; ns indicates no statistically significant difference.

Discussion

In this study, we systematically screened and validated ERS-related key genes in sepsis-associated ARDS by integrating differential expression analysis, immunoinfiltration, WGCNA, and multiple machine learning algorithms using the GSE32707 dataset from the GEO database. Five ERS-related hub genes were ultimately identified: STAT3, HSPB1, YWHAQ, LCN2, and SGK1. Among them, STAT3 was significantly upregulated and YWHAQ significantly downregulated in patients, a finding further validated by RT-qPCR analysis. These results suggest that STAT3 and YWHAQ may serve as potential diagnostic biomarkers for sepsis-associated ARDS.

Differential expression analysis using R software identified a total of 438 DEGs, including 226 upregulated and 212 downregulated genes. Functional enrichment analysis indicated that these DEGs were significantly enriched in biological pathways related to leukocyte activation and migration, ribosomal subunits, antioxidant activity, and the nuclear factor kappa B (NF-κB) signaling pathway. These findings suggest that the biological roles of DEGs in sepsis-associated ARDS are closely related to inflammatory responses, ERS, and NF-κB pathway activation. Previous studies have demonstrated that 4-phenylbutyric acid (4-PBA) alleviates lipopolysaccharide (LPS)-induced lung inflammation by inhibiting ERS and suppressing activation of the NF-κB signaling pathway. Specifically, 4-PBA was shown to reduce the expression of unfolded protein response (UPR)-related proteins in the lungs of LPS-treated mice and to significantly decrease the number of inflammatory cells in bronchoalveolar lavage fluid (BALF), as well as lower the levels of inflammatory cytokines such as interferon-gamma (IFN-γ), tumor necrosis factor-alpha (TNF-α), interleukin-1 beta (IL-1β), and intercellular adhesion molecule 1 (ICAM-1)9. These results are consistent with the findings of the present study.

Immunoinfiltration analysis revealed a higher proportion of activated CD4⁺ T memory cells, γδ T cells, and mast cells in sepsis-associated ARDS compared to healthy controls. In addition, eosinophil infiltration showed a positive correlation with activated dendritic cells, and a negative correlation with neutrophil infiltration. Osorio et al. previously reported that the IRE1α–XBP1 axis of the UPR plays a crucial role in regulating dendritic cell function, and that deletion of XBP1 impairs antigen presentation by CD8α⁺ dendritic cells25. These findings suggest that ERS may influence the differentiation and functional regulation of both innate and adaptive immune cells involved in the pathogenesis of sepsis-associated ARDS.

WGCNA identified the gene module most strongly associated with sepsis-associated ARDS, which contained a total of 2093 genes. The intersection of DEGs, key module genes, and ERGs yielded 31 key ERS-related differential genes. Five ERS hub genes, STAT3, HSPB1, YWHAQ, LCN2, and SGK1, were identified through three distinct machine learning algorithms. ROC analysis was performed to evaluate diagnostic efficacy, revealing that the AUC for all five hub genes was greater than 0.74, suggesting favorable diagnostic potential. The expression levels of the identified hub genes were validated in clinical samples using RT-qPCR. Compared with the healthy control group, the expression of STAT3 was significantly upregulated, while YWHAQ was significantly downregulated in the sepsis-associated ARDS group. In contrast, the expression levels of HSPB1, SGK1, and LCN2 showed no statistically significant differences between the two groups.

STAT3 is a member of the STAT protein family. The JAK/STAT pathway is a major signaling pathway for various key cytokines and initiates the expression of NF-κB, IL-1β, TNF-α, IL-6, and others. It is widely involved in various biological processes, including inflammation and apoptosis26,27,28. In animal models of sepsis-induced lung injury, STAT3 expression is positively correlated with the expression of these inflammatory factors and the severity of lung damage. Tofacitinib, a JAK inhibitor, blocks JAK-STAT3 pathway activation, markedly alleviating lung tissue injury and pulmonary edema severity in sepsis models29. In addition, methotrexate (MTX), an anti-inflammatory agent, inhibits the JAK2/STAT3 signaling pathway and has been proven to reduce systemic inflammation and pulmonary injury in CLP-induced sepsis rat models30. Forkhead box protein A2 (FOXA2) mitigates LPS-induced ERS in part by suppressing the p38/STAT3 signaling cascade31.

YWHAQ, a member of the 14-3-3 protein family, plays key roles in processes such as protein transport, signal transduction, and cell apoptosis. The encoded protein 14-3-3θ is a negative regulator of apoptosis signal-regulating kinase 1 (ASK1). Experimental evidence further suggests that regulated IRE1-dependent decay (RIDD) in the IRE1α pathway induces apoptosis by downregulating 14-3-3θ expression and activating the ASK1/JNK signaling pathway32. 14-3-3 proteins can inhibit NF-κB activation mediated by Toll-like receptor 2 (TLR2), while enhancing the activation of transcription factors dependent on TLR433. In addition, 14-3-3 proteins can bind to exosomes and play a vital role in sepsis-associated acute lung injury (ALI)34.

HSPB1, a prominent member of the heat shock protein (HSP) family, plays a vital role in the host response to various pathophysiological stresses, including injury, stress, hypoxia, and infection35. In cecal ligation and puncture (CLP) models, HSPB1 knockout (HSPB1−/−) mice exhibited nearly twice the mortality rate compared to wild-type (WT) mice. SGK1, a member of the protein kinase subfamily, is a serine/threonine kinase that serves as a key regulatory node in multiple signaling pathways and phosphorylation processes. It plays essential roles in cell proliferation, ion channel regulation, and signal transduction and is considered a critical player in inflammatory responses36,37. LCN2, a member of the lipocalin family, is a key regulator of iron metabolism, oxidative stress, and inflammation in mammals. Experimental data indicate that LCN2 levels in plasma-derived extracellular vesicles (EVs) are significantly elevated in sepsis-associated ARDS patients compared to those without ARDS38.

However, this study has several limitations. First, the clinical sample size was small, which may affect the stability and statistical power of the results. Second, although the ERS hub genes were identified and analyzed, the exact mechanisms by which STAT3 and YWHAQ act in sepsis-associated ARDS were not elucidated. Further in-depth investigation is therefore needed to provide a theoretical basis for translational clinical applications.

In summary, this study systematically identified five ERS-related hub genes—STAT3, HSPB1, YWHAQ, LCN2, and SGK1—in sepsis-associated ARDS through integrated bioinformatics analysis and multiple machine learning algorithms. Diagnostic efficacy analysis demonstrated favorable discriminatory power for all five genes. Subsequent RT-qPCR validation further confirmed that STAT3 was significantly upregulated and YWHAQ was downregulated in samples. These findings suggest that STAT3 and YWHAQ play pivotal roles in the pathogenesis of sepsis-associated ARDS and may serve as promising molecular markers for early diagnosis and potential targets for precision intervention.