Abstract
Necrotizing enterocolitis (NEC) and neonatal sepsis (NS) are major causes of morbidity and mortality in preterm infants, yet their shared molecular basis remains poorly defined. In this study, we integrated two public transcriptomic datasets and applied differential expression analysis, weighted gene co-expression network analysis (WGCNA), and three machine learning algorithms (LASSO, random forest, and XGBoost) to identify shared biomarkers. Four immune-related biomarkers (MAP2K6, CHKA, CA4, and ENTPD7) were identified and used to construct diagnostic models with high performance (AUC = 0.864 for NS; 1.000 for NEC). Immune infiltration analysis revealed distinct immune cell signatures and strong correlations with the selected biomarkers. Regulatory network construction further uncovered potential transcriptional and post-transcriptional regulatory mechanisms. These findings suggest a common immune-related pathogenesis underlying NEC and NS and highlight shared biomarkers with strong diagnostic potential. This integrative analysis provides a foundation for improved early diagnosis and targeted interventions in neonatal care.
Introduction
Necrotizing enterocolitis (NEC) and neonatal sepsis (NS) are significant contributors to both morbidity and mortality in neonates, especially in preterm infants1. NEC is the most common and most lethal gastrointestinal disorder that predominantly affects premature infants2,3,4,5. Epidemiological studies indicate that NEC affects 1–5% of neonatal intensive care unit patients2, with mortality rates reaching up to 50% in preterm infants undergoing surgical intervention6,7. Survivors face complications including nutritional deficiencies, hepatic failure, short-bowel syndrome, and neurological deficits8,9,10. The etiology of NEC is complex and not fully understood. Its pathological hallmark is ischemic necrosis of the intestinal tissue, which permits invasion by intestinal microorganisms and can ultimately lead to severe sepsis and death7,11,12. NS therefore warrants particular attention in preterm infants who are also affected by NEC.
NS is the leading cause of neonatal mortality globally13, with an incidence ranging from 1.5 to 21% in infants with very low birth weight14. NS is characterized by systemic infection and widespread inflammation, with a mortality rate of up to 15%15. Diagnosis is challenging because the onset of infection in preterm infants is insidious, and rapid disease progression often leads to severe complications, including neonatal septic shock, purulent meningitis, disseminated intravascular coagulation, and death16. Early identification and diagnosis of NS are therefore essential to allow timely and appropriate treatment.
The relationship between NEC and NS is complex, as both conditions share common risk factors and can occur concurrently in vulnerable populations, particularly in premature infants. One study reported that the mortality rate of premature infants who develop NS within three days of being diagnosed with NEC is significantly greater than that of infants with NEC who do not develop NS17. In addition, systemic infection and inflammatory responses caused by NS may also affect intestinal blood perfusion and immune function, potentially resulting in intestinal ischemia and the onset of NEC18. However, the pathogenesis and potential molecular markers common to these two diseases are still not fully understood.
Advances in bioinformatics have enabled rigorous exploration of transcriptional landscapes and the decoding of gene-centric disease mechanisms. Previous studies have focused primarily on comparing infants with NEC or NS against healthy infants19,20, but biomarkers involved in the shared progression of NEC and NS remain largely unexplored. This study combines integrative bioinformatics and machine learning to identify biomarkers common to NEC and NS and to explore the molecular pathways that may underlie their comorbidity.
Methods
Data collection
Gene expression datasets for NEC (GSE64801) and NS (GSE69686) were retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/)21. The GSE69686 dataset22,23, derived from the GPL20292 platform, consists of 64 neonates diagnosed with NS and 85 control samples. According to the original study, NS was diagnosed based on clinical signs (e.g., temperature instability, apnea, respiratory distress) supported by laboratory findings and/or positive blood cultures. The GSE64801 dataset24,25, derived from the GPL11154 platform, includes 9 neonates diagnosed with NEC and 5 control samples. NEC was defined as Bell stage II or higher based on clinical symptoms and radiological findings, as reported in the original GEO submission. Control samples in GSE64801 were obtained from preterm infants undergoing intestinal surgery for non-NEC conditions, such as bowel perforation, bowel obstruction, or intestinal atresia. For external validation in NS, we used data from the GPL570 and GPL13667 platforms within the GSE25504 dataset26,27. Because publicly available NEC datasets are scarce, independent validation for NEC could not be performed. Detailed sample information, including disease classification, sample type, and platform, is provided in Table 1.
After the expression matrices were downloaded and annotated, probes mapping to the same gene symbol were collapsed so that a single representative value was retained per gene, ensuring a one-to-one correspondence between rows and genes. The expression data were then log₂-transformed where necessary and normalized. To correct platform-specific batch effects, we applied the ComBat function from the sva package to integrate data from the GPL570 and GPL13667 platforms within the GSE25504 dataset.
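The probe-collapsing and conditional log₂ steps can be illustrated with a minimal Python sketch. The text does not state which probe was kept per gene or how the need for log₂ transformation was judged, so two assumptions are made here: the probe with the highest mean expression is retained, and the "where necessary" check follows the quantile heuristic used in R scripts generated by GEO2R. All function names and toy values are hypothetical, and ComBat batch correction is not reproduced.

```python
import math
from statistics import mean


def collapse_probes(expr, probe_to_gene):
    """Keep one row per gene symbol: the probe with the highest mean expression.

    expr: dict probe_id -> list of per-sample values
    probe_to_gene: dict probe_id -> gene symbol ("" or missing = unannotated)
    The highest-mean rule is an assumed choice for illustration.
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe, "")
        if not gene:
            continue  # drop probes that carry no gene symbol
        m = mean(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, values)
    return {gene: values for gene, (_, values) in best.items()}


def _quantile(sorted_vals, p):
    # Linear interpolation between order statistics (R's default type 7).
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def needs_log2(values):
    """GEO2R-style check: True if the values look as if they are not yet log-scaled."""
    v = sorted(values)
    q0, q25, q99, q100 = (_quantile(v, p) for p in (0.0, 0.25, 0.99, 1.0))
    return q99 > 100 or (q100 - q0 > 50 and q25 > 0)


def maybe_log2(expr):
    """Apply log2 only when needs_log2() fires; non-positive values become NaN."""
    all_values = [x for row in expr.values() for x in row]
    if not needs_log2(all_values):
        return expr
    return {
        gene: [math.log2(x) if x > 0 else float("nan") for x in row]
        for gene, row in expr.items()
    }


# Toy example (hypothetical probes and genes, not study data).
toy = {"p1": [10.0, 20.0], "p2": [300.0, 500.0], "p3": [5.0, 5.0]}
toy_map = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
collapsed = collapse_probes(toy, toy_map)  # GENE_A keeps p2, the higher-mean probe
logged = maybe_log2(collapsed)  # the 99th percentile exceeds 100, so log2 is applied
```

Other collapsing rules (for example, averaging probes or keeping the most variable probe) fit the same structure by changing the comparison inside `collapse_probes`.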
Analysis of differentially expressed genes (DEGs)
DEGs in the NEC and NS datasets were identified using the “limma” package in R28, with thresholds of |log2FC| > 0.5 and P < 0.05. Volcano plots of the DEGs were generated with the ggplot2 package in R (version 4.3.3).
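The DEG thresholds can be sketched in Python as follows. A Benjamini–Hochberg adjustment is included because the cutoff may be applied either to raw P-values or to FDR-adjusted values, and the toy example shows how that choice changes the labels. This sketch does not reproduce limma's moderated t-statistics, which supply the P-values in the actual analysis; function names and values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(log2fc, pvalues, fc_cut=0.5, p_cut=0.05):
    """Label each gene 'up', 'down' or 'ns' using |log2FC| > fc_cut and p < p_cut."""
    labels = []
    for fc, p in zip(log2fc, pvalues):
        if p < p_cut and fc > fc_cut:
            labels.append("up")
        elif p < p_cut and fc < -fc_cut:
            labels.append("down")
        else:
            labels.append("ns")
    return labels


# Hypothetical values for four genes.
log2fc = [1.0, -0.8, 0.3, 2.0]
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw_p)  # approximately [0.04, 0.0533, 0.0533, 0.2]
by_raw_p = classify_degs(log2fc, raw_p)  # ['up', 'down', 'ns', 'ns']
by_fdr = classify_degs(log2fc, fdr)  # ['up', 'ns', 'ns', 'ns']
```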
Functional enrichment analysis
Gene Ontology (GO) enrichment analysis was performed using the org.Hs.eg.db, clusterProfiler, and enrichplot packages in R (version 4.3.3) to functionally annotate the DEGs across the biological process, cellular component, and molecular function categories. To identify pathway-level perturbations, we also performed Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/kegg/kegg1.html) enrichment analysis29, mapping DEGs to canonical signaling and metabolic pathways. Terms with a false discovery rate (FDR) < 0.01 and an adjusted P-value < 0.05 were considered statistically significant. Results were visualized using the ‘ggplot2’ and ‘GOplot’ packages.
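Over-representation analysis of the kind performed by clusterProfiler rests on a one-sided hypergeometric test: given N background genes of which K belong to a term, how surprising is it to find k or more term genes among n DEGs? The Python sketch below, with hypothetical function names and toy counts, computes that tail probability exactly and the fold enrichment implied by clusterProfiler's GeneRatio and BgRatio columns. The multiple-testing correction applied across terms is omitted.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n drawn).

    N: size of the background gene universe
    K: background genes annotated to the GO term or KEGG pathway
    n: DEGs that are part of the universe
    k: DEGs annotated to the term
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k, n, K, N):
    """GeneRatio (k/n) divided by BgRatio (K/N)."""
    return (k / n) / (K / N)


# Hypothetical counts: 482 DEGs, a 200-gene term, a 20,000-gene universe,
# and 25 DEGs in the term (about 4.8 would be expected by chance).
p_term = hypergeom_upper_tail(25, 482, 200, 20000)
fe_term = fold_enrichment(25, 482, 200, 20000)  # about 5.19
```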
WGCNA
WGCNA was performed on the transcriptomic profiles of the GSE69686 cohort in R. The soft-thresholding power was set to β = 7 based on the scale-free topology criterion (fit index R² > 0.75). Co-expression modules were delineated by dynamic hybrid tree cutting (minimum module size = 50), and module eigengenes (MEs) were computed as the first principal component of each module to summarize its transcriptional activity. Module–trait associations were assessed using Spearman’s correlation analysis, and the relationship between gene significance and module membership was displayed in scatter plots. Results were visualized using dendrograms, heatmaps, and scatter plots. Finally, the VennDiagram R package was used to identify and visualize genes shared between the WGCNA module genes and the DEGs.
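Soft-threshold selection can be sketched as follows, assuming an unsigned network, since the network type is not stated above. Adjacency is |cor|^β, connectivity is the row sum, and the fit index regresses log10 p(k) on log10 k over binned connectivity, reporting R² with its sign flipped so that the negative slope expected of a scale-free network yields a positive value. This is a simplified Python re-implementation written for illustration, not the WGCNA package's code; bin edges and empty-bin handling are approximations, and `pick_power` is a hypothetical helper.

```python
import math


def unsigned_adjacency(corr, beta):
    """Unsigned adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)] for i in range(n)]


def connectivity(adj):
    """Whole-network connectivity k_i = sum_j a_ij."""
    return [sum(row) for row in adj]


def scale_free_fit(k, n_breaks=10):
    """Signed R^2 of log10 p(k) on log10 k over equal-width connectivity bins."""
    lo, hi = min(k), max(k)
    if hi == lo:
        raise ValueError("connectivity has no spread")
    width = (hi - lo) / n_breaks
    bins = [[] for _ in range(n_breaks)]
    for x in k:
        bins[min(int((x - lo) / width), n_breaks - 1)].append(x)
    xs, ys = [], []
    for b, members in enumerate(bins):
        midpoint = lo + (b + 0.5) * width
        dk = sum(members) / len(members) if members else midpoint
        if dk <= 0:
            dk = midpoint
        xs.append(math.log10(dk))
        ys.append(math.log10(len(members) / len(k) + 1e-9))  # pseudo-count for empty bins
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r2 = sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 else 0.0
    slope = sxy / sxx if sxx > 0 else 0.0
    # Flip the sign so that a negative (scale-free) slope gives a positive index.
    return -math.copysign(1.0, slope) * r2


def pick_power(corr, powers, cutoff=0.75):
    """Lowest power whose signed fit index reaches the cutoff (None if none does)."""
    for beta in powers:
        if scale_free_fit(connectivity(unsigned_adjacency(corr, beta))) >= cutoff:
            return beta
    return None
```

With a real gene-by-gene correlation matrix, scanning candidate powers in this way corresponds to the selection that produced β = 7 in the study.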
Machine learning algorithms
We used three machine learning algorithms to identify candidate genes: least absolute shrinkage and selection operator (LASSO), random forest (RF), and extreme gradient boosting (XGBoost). First, LASSO regression was performed with the ‘glmnet’ package using 10-fold cross-validation, and the optimal lambda was chosen as the value that minimized the cross-validated binomial deviance. RF analysis was then conducted using the “randomForest” package, followed by XGBoost analysis with the “xgboost” package. Genes selected by all three algorithms were designated key biomarkers.
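The λ selection rule and the final consensus step can be sketched in Python. `lambda_min` follows the rule described above; `lambda_1se`, the one-standard-error alternative that cv.glmnet also reports, is shown only for contrast. The function names, cross-validation curve, and gene names are hypothetical, and fitting the LASSO, random forest, and XGBoost models themselves is outside the scope of this sketch.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validated deviance curve.

    lambda_min: largest lambda attaining the minimum mean CV deviance.
    lambda_1se: largest lambda whose mean deviance is within one standard
    error of that minimum (a sparser, more conservative alternative).
    """
    best = min(cv_mean)
    lambda_min = max(lam for lam, m in zip(lambdas, cv_mean) if m == best)
    i_min = lambdas.index(lambda_min)
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambda_min, lambda_1se


def consensus_genes(*selections):
    """Genes selected by every method (set intersection of all selections)."""
    sets = [set(s) for s in selections]
    return set.intersection(*sets) if sets else set()


# Hypothetical CV curve (decreasing lambdas, mean deviance, standard error).
lam_min, lam_1se = select_lambda(
    [1.0, 0.5, 0.1, 0.05], [0.9, 0.7, 0.6, 0.65], [0.05, 0.05, 0.15, 0.1]
)  # (0.1, 0.5)

# Hypothetical gene lists from three selectors.
shared = consensus_genes(
    ["GENE_A", "GENE_B", "GENE_C"],  # e.g. LASSO
    ["GENE_B", "GENE_C", "GENE_D"],  # e.g. random forest
    ["GENE_C", "GENE_B"],            # e.g. XGBoost
)  # {'GENE_B', 'GENE_C'}
```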
Protein–protein interaction (PPI) network analysis
To explore the relationships among the identified common genes, a PPI network was constructed using GeneMANIA (http://genemania.org/).
Diagnostic model construction based on key biomarkers
A diagnostic model based on the key biomarkers was developed by logistic regression using the ‘rms’ package in R (version 4.3.3). Model performance was evaluated with receiver operating characteristic (ROC) curves and the area under the curve (AUC), computed with the “pROC” package. Calibration curves were plotted to compare predicted probabilities with observed outcomes. GSE25504 served as an external dataset for validating the model built on the shared hub genes.
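For a binary outcome, the AUC reported by pROC equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name and scores are hypothetical. As a worked example of scale, GSE64801 contains 9 × 5 = 45 case–control pairs, so an AUC of 1.000 means every one of those 45 pairs was ranked correctly.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise-comparison formula.

    scores: predicted probabilities (e.g. from a logistic regression model)
    labels: 1 for cases, 0 for controls
    Ties between a case and a control count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities, not study data.
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs correct -> 0.75
```

The pairwise loop is quadratic in sample size, which is negligible for cohorts of the size analyzed here.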
Immune cell infiltration analysis
CIBERSORT with the LM22 signature matrix was used to estimate the proportions of 22 immune cell subtypes. In addition, single-sample gene set enrichment analysis (ssGSEA) was performed to estimate immune cell abundance in disease and control samples of each dataset, and the Wilcoxon test was used to compare immune cell enrichment scores between groups. Associations between key genes and immune cell types were quantified using Spearman correlation analysis and visualized as correlation heatmaps with the ‘ggcorrplot’ package.
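The gene–immune cell correlations rely on Spearman's rank correlation, which is Pearson's correlation applied to ranks, with tied values sharing their average rank. A dependency-free Python sketch follows, with hypothetical function names. In practice the inputs would be one gene's expression and one cell type's ssGSEA score across the same samples, and the resulting P-values would still require multiple-testing adjustment.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 0-based positions i..j, made 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```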
TF-miRNA-mRNA regulatory network
Potential transcription factors (TFs) and microRNAs regulating key genes were predicted using NetworkAnalyst, and the networks were visualized with Cytoscape (version 3.9.1).
Ethics statement
This study was reviewed and approved by the Ethics Committee of the Second Affiliated Hospital of Guangxi Medical University (Guangxi, China). All methods were performed in accordance with the relevant guidelines and regulations, and the study complied with the principles of the Declaration of Helsinki. For the neonatal samples included in the original GEO datasets (GSE64801, GSE69686, and GSE25504), sample collection was approved by the institutional review boards of the original studies, and informed consent was obtained from the infants’ legal guardians.
Results
Identification of DEGs in NS and NEC
DEG expression in the GSE64801 and GSE69686 datasets is shown in Fig. 1a and b, respectively. The volcano plots highlight upregulated genes in red (log2FC > 0.5, FDR < 0.05) and downregulated genes in blue (log2FC < −0.5, FDR < 0.05). In the GSE69686 dataset, 482 DEGs were detected, comprising 377 upregulated and 105 downregulated genes. Similarly, the GSE64801 dataset yielded 724 DEGs, including 270 upregulated and 454 downregulated genes. The top 20 DEGs in the GSE64801 and GSE69686 datasets are shown in heatmaps (Fig. 1c and d, respectively), which illustrate expression differences between the control and disease groups.
Visualization of DEGs in NS and NEC. (a) Volcano plot showing DEGs in the NEC dataset. Upregulated genes (red) and downregulated genes (blue) are based on |log2FC| > 0.5 and p-value < 0.05. (b) Volcano plot for DEGs in the NS dataset, using the same criteria as in panel a. (c) Heatmap of the top 20 DEGs in the NEC dataset. (d) Heatmap of the top 20 DEGs in the NS dataset.
Functional enrichment analysis of DEGs in NS and NEC
To better understand the biological functions of the DEGs, GO term enrichment and KEGG pathway analyses were performed. In the GSE69686 dataset, GO enrichment indicated significant involvement of immune-related processes, including immune response-regulating pathways, myeloid leukocyte activation, and regulation of cytokine production (Fig. 2a), whereas KEGG analysis highlighted pathways such as T cell receptor signaling and neutrophil extracellular trap formation (Fig. 2b). A circular plot illustrated the upregulated and downregulated genes contributing to these pathways (Fig. 2c). Similarly, the GSE64801 dataset showed enriched terms related to B cell-mediated immunity, immunoglobulin-mediated immune response, and production of molecular mediator of immune response (Fig. 2d), and a corresponding circular plot showed the expression trends of key genes in these immune-related processes (Fig. 2e). These results indicate the involvement of immune-related pathways and biological processes in both datasets and suggest their potential roles in the mechanisms of NEC and NS.
Identification of key modules through WGCNA
To investigate co-expression patterns of genes associated with NS, we constructed a weighted gene co-expression network using the WGCNA framework. Hierarchical clustering of samples was first performed to detect and remove outliers (Fig. 3a). A soft-thresholding power of 7 was selected because the scale-free topology fit index (R²) reached 0.75 at this value, indicating a good approximation to a scale-free network (Fig. 3b). With this parameter, hierarchical clustering and dynamic tree cutting identified 8 co-expression modules (Fig. 3c, d). The blue module showed the strongest positive correlation with NS (r = 0.46, P = 4.16e-09; Fig. 3e), and within this module, module membership and gene significance were strongly correlated (r = 0.55, P = 5.2e-118; Fig. 3f). Intersecting the DEGs with the blue module genes in the GSE69686 dataset identified 346 common genes (Fig. 4a). These findings point to gene networks potentially involved in sepsis pathogenesis and identify candidate genes for further exploration.
Weighted gene co-expression network analysis (WGCNA). (a) A cluster tree of 149 samples. (b) Scale-free topology fit index as a function of soft-thresholding power (signed R²). (c) Dendrogram of module eigengenes showing their clustering relationships. (d) Gene clustering dendrogram with module colors assigned by dynamic tree cut, based on topological overlap dissimilarity. (e) Heatmap of module-trait correlations for two subtypes (control vs. sepsis). Each cell reports Pearson’s r and corresponding P value. (f) Scatterplot for the blue module displaying the relationship between module membership and gene significance for the trait.
Selection and evaluation of candidate hub genes by integrative machine learning. (a) Venn diagram showing the overlap between WGCNA module genes and DEGs in the NS dataset. (b) Venn diagram of the DEGs in the NEC and NS datasets. (c, d) LASSO regression: (c) coefficient profiles across log(λ) values and (d) ten-fold cross-validation of binomial deviance, identifying five key genes. (e, f) Random forest screening of key genes. (g–i) Seven key genes identified by the XGBoost algorithm. (j) Venn diagram of the four genes common to all three algorithms.
Screening hub genes through machine learning
To identify key biomarkers associated with NS and NEC, the DEGs of the two datasets were intersected in a Venn diagram, yielding 10 candidate genes (Fig. 4b). These candidates were further screened with three machine learning algorithms to determine the final key biomarkers. First, LASSO regression selected 5 key genes (Fig. 4c, d). Next, a random forest model identified 5 key genes; its performance is shown in error-rate plots and feature importance rankings (Fig. 4e, f). XGBoost ranked genes by their contribution to model performance, and SHapley Additive exPlanations (SHAP) values summarized the direction and magnitude of each gene’s impact on predictions through an overall summary plot and feature-specific importance measures, identifying 7 key genes (Fig. 4g–i). Finally, intersecting the genes selected by LASSO, RF, and XGBoost yielded 4 overlapping hub genes (Fig. 4j), which were carried forward for further analysis.
Diagnostic model performance and calibration
The diagnostic models for both NS and NEC showed high discrimination, with an AUC of 0.864 (95% CI: 0.804–0.924) for NS (Fig. 5a) and an AUC of 1.000 for NEC (Fig. 5b). In the external NS validation dataset, the model achieved an AUC of 0.909 (95% CI: 0.731–1.000), supporting its robustness (Fig. 5c). The calibration curves showed close agreement between predicted and observed outcomes for both diseases (Fig. 5d, e), supporting the reliability of these predictive models.
Performance and interaction analysis of the four-gene diagnostic model. (a) ROC curve of the model in the GSE69686 cohort (AUC = 0.864). (b) ROC curve of the model in the GSE64801 cohort (AUC = 1.000). (c) ROC curve of the model in the GSE25504 validation dataset (AUC = 0.909). (d) Calibration curve of the GSE69686 model (solid line = apparent performance; dashed line = bias-corrected). (e) Calibration curve of the GSE64801 model (same conventions as panel d). (f) PPI network of the four common genes, constructed by GeneMANIA.
Immune cell infiltration in NS and NEC cohorts. (a) Relative proportions of 22 immune cell types in NS versus control samples. (b) Relative proportions of 22 immune cell types in NEC versus control samples. (c) Boxplots comparing immune cell abundances between NS and control groups. (d) Boxplots comparing immune cell abundances between NEC and control groups. ns, not significant; * p < 0.05; ** p < 0.01; *** p < 0.001.
Functional interaction analysis using GeneMANIA
Network analysis of the four biomarkers (MAP2K6, CHKA, CA4, and ENTPD7) using GeneMANIA showed that physical interactions (77.64%) dominated the gene–gene relationships, followed by co-expression and predicted functional links (Fig. 5f). Key enriched functions included the stress-activated MAPK cascade, phosphatidylcholine metabolism, and regulation of protein serine/threonine kinase activity, suggesting that these genes may play central roles in immune signaling and metabolic processes associated with disease pathogenesis.
Immune infiltration and gene correlation analysis
CIBERSORT-based deconvolution revealed distinct immune cell composition profiles in NS and NEC relative to controls, as visualized in the stacked bar plots (Fig. 6a, b). Subsequent ssGSEA further demonstrated disease-specific alterations in immune cell subpopulations. The NS group showed significantly lower enrichment scores for activated B cells and activated CD8+ T cells, whereas activated dendritic cells, central memory CD8+ T cells, macrophages, and plasmacytoid dendritic cells showed significantly higher scores (all P < 0.001) (Fig. 6c). In contrast, the NEC group showed a selective increase in type 2 T helper (Th2) cells (P < 0.01) (Fig. 6d).
Correlation heatmap analysis highlighted key associations between candidate genes and immune cell subsets across both the NEC and NS datasets (Fig. 7a and b). CHKA exhibited significant negative correlations with activated CD8+ T cells, effector memory CD8+ T cells, and immature B cells. Similarly, CA4 was inversely correlated with effector memory CD4+ T cells, whereas ENTPD7 was negatively associated with both activated B cells and activated CD8+ T cells. All reported correlations reached statistical significance (adjusted P < 0.05). These observations highlight the potential roles of these genes in modulating disease-specific immune pathways and underscore the importance of immune-mediated mechanisms in the pathogenesis of NEC and NS.
TF–miRNA–mRNA regulatory network
A TF–miRNA–mRNA regulatory network was generated to examine the upstream regulators of MAP2K6, ENTPD7, CHKA, and CA4 (Fig. 7c). The network revealed numerous transcription factors and miRNAs converging on each key gene, highlighting a complex interplay between transcriptional and post-transcriptional processes. MAP2K6 was linked to TFs such as RELA, STAT1, and POU5F1, and miRNAs in the hsa-miR-29 family also appeared to target this gene. Similarly, ENTPD7 and CHKA each showed associations with several TFs (e.g., MYC, E2F1, REST) and multiple miRNAs, including hsa-miR-203, hsa-miR-30a, and hsa-miR-25. CA4 was found to be regulated primarily by AP-1 components (e.g., FOS, JUN, JUNB) and select miRNAs (e.g., hsa-miR-485-5p, hsa-miR-190).
These findings underscore the multifaceted nature of the regulation of these key genes and their potential roles in disease pathogenesis.
Discussion
NEC and NS are life-threatening inflammatory diseases in neonates, associated with high morbidity and mortality rates30,31,32. Early diagnosis remains challenging due to the lack of sensitive and specific biomarkers, coupled with nonspecific clinical presentations. In this study, we systematically applied advanced bioinformatics and machine learning techniques to elucidate the molecular mechanisms underlying NEC and NS. By integrating transcriptomic datasets, immune infiltration analysis, PPI network construction, TF–miRNA–mRNA regulatory mapping, and predictive modeling, we identified four key biomarkers (MAP2K6, ENTPD7, CHKA, and CA4) and delineated their regulatory networks. The diagnostic models developed based on these biomarkers demonstrated strong predictive performance. These findings provide novel insights into the shared immunopathogenesis of NEC and NS and offer potential targets for early diagnosis and therapeutic intervention.
The identification of 482 DEGs in NS and 724 DEGs in NEC, enriched in immune-related pathways (T cell receptor signaling, B cell-mediated immunity), underscores the centrality of immune dysfunction in both diseases. The upregulation of myeloid leukocyte activation pathways in NS aligns with its systemic inflammatory profile33, in which excessive neutrophil extracellular traps (NETs) exacerbate tissue damage, as observed in a mouse model of infant sepsis34. In contrast, the enrichment of B cell-mediated immunity in NEC suggests that local mucosal immune responses may be driven by dysbiosis-induced IgA overproduction, consistent with recent single-cell studies showing that exposure to intestinal microbiota triggers clonal expansion and repertoire remodeling of IgA-producing B cells35.
The four hub genes (MAP2K6, ENTPD7, CHKA, and CA4) identified through WGCNA and machine learning further point to immune–metabolic crosstalk. MAP2K6, a member of the MAP kinase kinase family, predominantly activates the p38 MAPK signaling cascade and modulates diverse physiological processes, including inflammatory responses, stress adaptation, programmed cell death, and the maintenance of metabolic equilibrium36,37,38. MAP2K6 activation facilitates NF-κB nuclear translocation and the subsequent release of pro-inflammatory cytokines (e.g., TNF-α, IL-6), amplifying inflammatory cascades in a kinase activity-dependent manner39. The upregulation of MAP2K6 in NEC and NS observed in this study is consistent with a transcriptomic study of adult sepsis40, suggesting overactivation of MAPK signaling in both diseases; however, the precise mechanism requires further investigation.
ENTPD7, a member of the ectonucleotidase family, hydrolyzes extracellular ATP to regulate purinergic signaling, a critical modulator of inflammation, senescence, and tumor immunity41,42. Previous studies have shown that ENTPD7 downregulation inhibits Ras/Raf/MEK/ERK signaling, suppressing proliferation and promoting apoptosis41. In addition, Kusu et al. reported that ENTPD7 is selectively highly expressed in intestinal epithelial cells, where its protein product, ENTPDase 7, regulates luminal ATP levels and thereby controls intestinal immune responses. Entpd7−/− mice with autoimmune encephalomyelitis exhibit a significantly increased number of IL-17/IFN-γ dual-producing CD4+ T cells, which is associated with more severe clinical symptoms42. These results support a possible role for ENTPD7 in the progression of NEC.
CHKA encodes choline kinase alpha (ChoKα), the core rate-limiting enzyme of choline metabolism, which catalyzes the phosphorylation of choline to phosphocholine43,44. It also serves as a multifunctional signaling node involved in tumorigenesis, metabolic regulation, and immune evasion45,46,47. Reduction of ChoKα decreases the activity of the MAPK and PI3K/AKT signaling pathways, both of which play crucial roles in regulating cell proliferation48. Existing studies have shown that IL-1β levels increase significantly following LPS stimulation of macrophages, and that knockout or inhibition of ChoKα reduces both mitochondrial reactive oxygen species (mtROS) production and IL-1β production in macrophages stimulated with NLRP3 inflammasome activators. Moreover, ChoKα inhibition prevented death in mice with LPS-induced sepsis47. This evidence suggests that CHKA may play an important role in acute and chronic inflammatory diseases associated with NLRP3 inflammasome overactivation, possibly including NEC and NS, although the mechanism requires further investigation.
CA4 belongs to the carbonic anhydrase family and catalyzes the reversible hydration of carbon dioxide (CO₂) to bicarbonate (HCO₃⁻) and protons (H⁺), a reaction essential for maintaining pH balance and for CO₂ transport49,50. CA4 is abundantly expressed on the apical membrane of the epithelium of the distal small intestine and large intestine and is also present in the endothelial cells of submucosal capillaries throughout the gastrointestinal tract49. CA4 is also highly expressed in the intestinal epithelial layer in a rat model of colitis; 14 days after the induction of inflammatory damage, CA4 levels and enzymatic activity in the rat colon increased, suggesting that CA4 overactivity is a pathological feature of colitis51. CA4 inhibitors can enhance recovery from colitis in mice by directly promoting colonic epithelial cell proliferation, thereby protecting against intestinal damage52. We therefore propose that CA4 may play an important role in the pathogenesis of NEC and could serve as a therapeutic target. However, no study has yet examined the connection between CA4 and NS, and further mechanistic studies are needed to clarify this possible relationship.
Our immune profiling revealed distinct immunopathological signatures in NEC and NS, characterized by disease-specific alterations in immune cell dynamics. In NS, the pronounced depletion of activated B cells and CD8+ T cells is consistent with sepsis-induced adaptive immune suppression53,54,55, a phenomenon also observed in adult sepsis56. This suppression is a hallmark of systemic inflammatory response syndrome, in which persistent pathogen exposure leads to exhaustion of effector lymphocytes57,58,59. Conversely, the expansion of macrophages and dendritic cells suggests compensatory innate immune hyperactivation, likely driven by unresolved microbial translocation or damage-associated molecular patterns60. The increase in macrophages is consistent with the immune cell infiltration findings in pediatric sepsis reported by Zhang et al.61. In addition, Th2 cell abundance was significantly greater in the NEC group than in the control group, consistent with previous studies62. The selective Th2 skew in NEC underscores its unique mucosal immune context63, in which IL-4/IL-13-driven responses may disrupt the intestinal barrier and lead to intestinal tissue damage62,64.
The analysis of immune cell involvement in the context of the identified candidate genes reveals significant insights into their potential roles in immune signaling pathways. The inverse association of CHKA with CD8+ T cells may reflect its role in choline metabolism, where excessive phosphatidylcholine synthesis impairs lipid raft-dependent TCR signaling, dampening cytotoxic activity65. Similarly, the ecto-ATPase activity of ENTPD7 likely depletes extracellular ATP, reducing the purinergic (P2X/P2Y) signaling essential for B cell activation and antibody production66. The CA4-CD4+ T cell correlation suggests pH-dependent modulation of T cell receptor sensitivity, as carbonic anhydrase IV regulates extracellular acidification in inflamed tissues67. These findings position these genes as molecular rheostats balancing immune activation and metabolic stress, offering mechanistic insights into NEC/NS pathogenesis and nominating immunometabolic pathways for therapeutic targeting.
The TF–miRNA network identified putative master regulators such as RELA (NF-κB) and the miR-29 family targeting MAP2K6. The role of NF-κB in NEC and NS is well established, as it promotes the production of inflammatory factors, leading to intestinal damage and endothelial dysfunction68,69. Meanwhile, miR-29 has been shown to regulate both innate and adaptive immune responses to intracellular bacterial infection by targeting interferon-γ70, suggesting that miR-29 may also be important in infection-associated diseases such as NEC and NS. Notably, the predicted regulation of CA4 by AP-1 (FOS/JUN) suggests acidosis-induced transcriptional adaptation, a potentially novel observation in neonatal immunology. However, the specific mechanisms underlying these predicted interactions remain to be elucidated.
Unlike previous studies focusing on either NEC71 or NS56,72,73 alone, our identified biomarkers (MAP2K6, CHKA, CA4, and ENTPD7) are shared across both diseases and reflect immune–metabolic interactions. This cross-disease relevance, combined with our integrated transcriptomic and machine learning approach, may offer broader diagnostic utility and new insights into common inflammatory mechanisms. To our knowledge, this is the first study to construct diagnostic models for both NEC and NS using shared biomarkers and to investigate their association with the immune microenvironment. These findings provide a new perspective on the molecular link between the two conditions. Despite these insights, this study has several limitations. It was based solely on public datasets and did not account for potential heterogeneity across patient populations and experimental platforms. The NEC cohort was small (9 cases and 5 controls) and could not be validated in an independent dataset. The lack of single-cell resolution precluded cell-type-specific expression analysis. In addition, our findings stem from secondary transcriptomic analyses and lack validation in clinical samples or functional experiments. Future studies should incorporate scRNA-seq and functional experiments to confirm the diagnostic and mechanistic relevance of the identified biomarkers.
Conclusion
In this study, we identified four shared biomarkers between NEC and NS through integrative bioinformatics and machine learning analyses. These biomarkers exhibited high diagnostic value and were associated with distinct immune cell infiltration patterns. Our results enhance the understanding of common molecular mechanisms underlying NEC and NS and may inform future strategies for early diagnosis and personalized therapeutic interventions in preterm infants.
Data availability
The gene expression datasets analyzed in this study are publicly available in the Gene Expression Omnibus (GEO) database under accession numbers GSE64801, GSE69686, and GSE25504 (https://www.ncbi.nlm.nih.gov/geo/). All rights and licenses for these data are retained by the original authors.
References
Shane, A. L., Sánchez, P. J. & Stoll, B. J. Neonatal sepsis. Lancet (London England). 390, 1770–1780. https://doi.org/10.1016/s0140-6736(17)31002-4 (2017).
Lin, P. W. & Stoll, B. J. Necrotising enterocolitis. Lancet (London England). 368, 1271–1283. https://doi.org/10.1016/s0140-6736(06)69525-1 (2006).
Neu, J. & Walker, W. A. Necrotizing enterocolitis. N Engl. J. Med. 364, 255–264. https://doi.org/10.1056/NEJMra1005408 (2011).
Neu, J. Necrotizing enterocolitis. World Rev. Nutr. Diet. 110, 253–263. https://doi.org/10.1159/000358474 (2014).
Stoll, B. J. et al. Trends in care practices, morbidity, and mortality of extremely preterm neonates, 1993–2012. JAMA 314, 1039–1051. https://doi.org/10.1001/jama.2015.10244 (2015).
Meister, A. L., Doheny, K. K. & Travagli, R. A. Necrotizing enterocolitis: it’s not all in the gut. Exp. Biol. Med. (Maywood). 245, 85–95. https://doi.org/10.1177/1535370219891971 (2020).
Niño, D. F., Sodhi, C. P. & Hackam, D. J. Necrotizing enterocolitis: new insights into pathogenesis and mechanisms. Nat. Rev. Gastroenterol. Hepatol. 13, 590–600. https://doi.org/10.1038/nrgastro.2016.119 (2016).
Shah, T. A. et al. Hospital and neurodevelopmental outcomes of extremely low-birth-weight infants with necrotizing enterocolitis and spontaneous intestinal perforation. J. Perinatol. 32, 552–558. https://doi.org/10.1038/jp.2011.176 (2012).
Han, S. M. et al. Long-term outcomes of severe surgical necrotizing enterocolitis. J. Pediatr. Surg. 55, 848–851. https://doi.org/10.1016/j.jpedsurg.2020.01.019 (2020).
Bazacliu, C. & Neu, J. Necrotizing enterocolitis: long term complications. Curr. Pediatr. Rev. 15, 115–124. https://doi.org/10.2174/1573396315666190312093119 (2019).
Yazji, I. et al. Endothelial TLR4 activation impairs intestinal microcirculatory perfusion in necrotizing enterocolitis via eNOS-NO-nitrite signaling. Proc. Natl. Acad. Sci. U.S.A. 110, 9451–9456. https://doi.org/10.1073/pnas.1219997110 (2013).
Nolan, L. S., Wynn, J. L. & Good, M. Exploring clinically-relevant experimental models of neonatal shock and necrotizing enterocolitis. Shock 53, 596–604. https://doi.org/10.1097/shk.0000000000001507 (2020).
Lawn, J. E., Wilczynska-Ketende, K. & Cousens, S. N. Estimating the causes of 4 million neonatal deaths in the year 2000. Int. J. Epidemiol. 35, 706–718. https://doi.org/10.1093/ije/dyl043 (2006).
Pammi, M. & Suresh, G. Enteral lactoferrin supplementation for prevention of sepsis and necrotizing enterocolitis in preterm infants. Cochrane Database Syst. Rev. 3, CD007137. https://doi.org/10.1002/14651858.CD007137.pub6 (2020).
Weitkamp, J. H. The role of biomarkers in suspected neonatal sepsis. Clin. Infect. Dis. 73, e391–e393. https://doi.org/10.1093/cid/ciaa869 (2021).
Strunk, T., Molloy, E. J., Mishra, A. & Bhutta, Z. A. Neonatal bacterial sepsis. Lancet (London England). 404, 277–293. https://doi.org/10.1016/s0140-6736(24)00495-1 (2024).
Bizzarro, M. J., Ehrenkranz, R. A. & Gallagher, P. G. Concurrent bloodstream infections in infants with necrotizing enterocolitis. J. Pediatr. 164, 61–66. https://doi.org/10.1016/j.jpeds.2013.09.020 (2014).
Duess, J. W. et al. Necrotizing enterocolitis, gut microbes, and sepsis. Gut Microbes. 15, 2221470. https://doi.org/10.1080/19490976.2023.2221470 (2023).
Liao, W. et al. Identification and verification of feature biomarkers associated with immune cells in neonatal sepsis. Eur. J. Med. Res. 28, 105. https://doi.org/10.1186/s40001-023-01061-2 (2023).
Chen, G. et al. Identification of candidate genes for necrotizing enterocolitis based on microarray data. Gene 661, 152–159. https://doi.org/10.1016/j.gene.2018.03.088 (2018).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41, D991–995. https://doi.org/10.1093/nar/gks1193 (2013).
Wynn, J. L. et al. Postnatal age is a critical determinant of the neonatal host response to sepsis. Mol. Med. 21, 496–504. https://doi.org/10.2119/molmed.2015.00064 (2015).
Wynn, J. L. et al. Targeting IL-17A attenuates neonatal sepsis mortality induced by IL-18. Proc. Natl. Acad. Sci. U.S.A. 113, E2627–2635. https://doi.org/10.1073/pnas.1515793113 (2016).
Tremblay, É. et al. Gene expression profiling in necrotizing enterocolitis reveals pathways common to those reported in Crohn’s disease. BMC Med. Genomics. 9, 6. https://doi.org/10.1186/s12920-016-0166-9 (2016).
Tremblay, É. et al. IL-17-related signature genes linked to human necrotizing enterocolitis. BMC Res. Notes. 14, 82. https://doi.org/10.1186/s13104-021-05489-9 (2021).
Dickinson, P. et al. Whole blood gene expression profiling of neonates with confirmed bacterial sepsis. Genom Data. 3, 41–48. https://doi.org/10.1016/j.gdata.2014.11.003 (2015).
Smith, C. L. et al. Identification of a human neonatal immune-metabolic network associated with bacterial infection. Nat. Commun. 5, 4649. https://doi.org/10.1038/ncomms5649 (2014).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47. https://doi.org/10.1093/nar/gkv007 (2015).
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51, D587–D592. https://doi.org/10.1093/nar/gkac963 (2023).
Alsaied, A., Islam, N. & Thalib, L. Global incidence of necrotizing enterocolitis: a systematic review and meta-analysis. BMC Pediatr. 20, 344. https://doi.org/10.1186/s12887-020-02231-5 (2020).
Zhang, D., Xie, D., Qu, Y., Mu, D. & Wang, S. Digging deeper into necrotizing enterocolitis: bridging clinical, microbial, and molecular perspectives. Gut Microbes. 17, 2451071. https://doi.org/10.1080/19490976.2025.2451071 (2025).
Bakoush, F. B. A., Azab, A. E. & Yahya, R. A. M. Neonatal sepsis: insight into incidence, classification, risk factors, causative organisms, pathophysiology, prognosis, clinical manifestations, complications, systemic examination, and treatment. South. Asian Res. J. Appl. Med. Sci. 5, 136–157. https://doi.org/10.36346/sarjams.2023.v05i06.004 (2023).
Wennekamp, J. & Henneke, P. Induction and termination of inflammatory signaling in group B Streptococcal sepsis. Immunol. Rev. 225, 114–127. https://doi.org/10.1111/j.1600-065X.2008.00673.x (2008).
Colón, D. F. et al. Neutrophil extracellular traps (NETs) exacerbate severity of infant sepsis. Crit. Care. 23, 113. https://doi.org/10.1186/s13054-019-2407-8 (2019).
Li, H. et al. Mucosal or systemic microbiota exposures shape the B cell repertoire. Nature 584, 274–278. https://doi.org/10.1038/s41586-020-2564-6 (2020).
Matsumoto, T. et al. Crystal structure of non-phosphorylated MAP2K6 in a putative auto-inhibition state. J. Biochem. 151, 541–549. https://doi.org/10.1093/jb/mvs023 (2012).
Li, Z., Fu, J., Li, N. & Shen, L. Quantitative proteome analysis identifies MAP2K6 as potential regulator of LIFR-induced radioresistance in nasopharyngeal carcinoma cells. Biochem. Biophys. Res. Commun. 505, 274–281. https://doi.org/10.1016/j.bbrc.2018.09.020 (2018).
Li, Z., Li, N. & Shen, L. MAP2K6 is associated with radiation resistance and adverse prognosis for locally advanced nasopharyngeal carcinoma patients. Cancer Manag Res. 10, 6905–6912. https://doi.org/10.2147/cmar.S184689 (2018).
Kyriakis, J. M. & Avruch, J. Mammalian MAPK signal transduction pathways activated by stress and inflammation: a 10-year update. Physiol. Rev. 92, 689–737. https://doi.org/10.1152/physrev.00028.2011 (2012).
Jiang, C. et al. Integrated analysis reveals NLRC4 as a potential biomarker in sepsis pathogenesis. Genes Immun. 25, 397–408. https://doi.org/10.1038/s41435-024-00293-4 (2024).
Wen, Z. et al. Inhibition of lung cancer cells and Ras/Raf/MEK/ERK signal transduction by ectonucleoside triphosphate phosphohydrolase-7 (ENTPD7). Respir Res. 20, 194. https://doi.org/10.1186/s12931-019-1165-0 (2019).
Kusu, T. et al. Ecto-nucleoside triphosphate diphosphohydrolase 7 controls Th17 cell responses through regulation of luminal ATP in the small intestine. J. Immunol. 190, 774–783. https://doi.org/10.4049/jimmunol.1103067 (2013).
Lacal, J. C., Zimmerman, T. & Campos, J. M. Choline kinase: An unexpected journey for a precision medicine strategy in human diseases. Pharmaceutics 13, 788. https://doi.org/10.3390/pharmaceutics13060788 (2021).
Lacal, J. C. Choline kinase as a precision medicine target for therapy in cancer, autoimmune diseases and malaria. Precis Med. 2, e980 (2015).
Sola-Leyva, A. et al. Choline kinase inhibitors EB-3D and EB-3P interferes with lipid homeostasis in HepG2 cells. Sci. Rep. 9, 5109. https://doi.org/10.1038/s41598-019-40885-z (2019).
Hu, L. et al. Overexpression of CHKA contributes to tumor progression and metastasis and predicts poor prognosis in colorectal carcinoma. Oncotarget 7, 66660–66678. https://doi.org/10.18632/oncotarget.11433 (2016).
Sanchez-Lopez, E. et al. Choline uptake and metabolism modulate macrophage IL-1β and IL-18 production. Cell Metab. 29, 1350–1362.e7. https://doi.org/10.1016/j.cmet.2019.03.011 (2019).
Yalcin, A. et al. Selective inhibition of choline kinase simultaneously attenuates MAPK and PI3K/AKT signaling. Oncogene 29, 139–149. https://doi.org/10.1038/onc.2009.317 (2010).
Fleming, R. E. et al. Carbonic anhydrase IV expression in rat and human gastrointestinal tract regional, cellular, and subcellular localization. J. Clin. Investig. 96, 2907–2913 (1995).
Supuran, C. T. & Capasso, C. An overview of the bacterial carbonic anhydrases. Metabolites 7, 56. https://doi.org/10.3390/metabo7040056 (2017).
Lucarini, E. et al. Carbonic anhydrase IV selective inhibitors counteract the development of colitis-associated visceral pain in rats. Cells 10, 2540. https://doi.org/10.3390/cells10102540 (2021).
Mizoguchi, E. et al. Colonic epithelial functional phenotype varies with type and phase of experimental colitis. Gastroenterology 125, 148–161. https://doi.org/10.1016/s0016-5085(03)00665-6 (2003).
Yang, J. et al. Soluble CD72 concurrently impairs T cell functions while enhances inflammatory response in sepsis. Int. Immunopharmacol. 147, 113981. https://doi.org/10.1016/j.intimp.2024.113981 (2025).
Islam, M. M. et al. Immunoadjuvant therapy in the regulation of cell death in sepsis: recent advances and future directions. Front. Immunol. 15, 1493214. https://doi.org/10.3389/fimmu.2024.1493214 (2024).
Peng, Y. et al. Identification of immune-related genes concurrently involved in critical illnesses across different etiologies: a data-driven analysis. Front. Immunol. 13, 858864. https://doi.org/10.3389/fimmu.2022.858864 (2022).
Ning, J. et al. Use of machine learning-based integration to develop a monocyte differentiation-related signature for improving prognosis in patients with sepsis. Mol. Med. 29, 37. https://doi.org/10.1186/s10020-023-00634-5 (2023).
de Pablo, R., Monserrat, J., Prieto, A. & Alvarez-Mon, M. Role of circulating lymphocytes in patients with sepsis. Biomed. Res. Int. 2014, 671087. https://doi.org/10.1155/2014/671087 (2014).
Mulder, P. P. G. et al. Persistent systemic inflammation in patients with severe burn injury is accompanied by influx of immature neutrophils and shifts in T cell subsets and cytokine profiles. Front. Immunol. 11, 621222. https://doi.org/10.3389/fimmu.2020.621222 (2020).
Beyer, K., Menges, P., Keßler, W. & Heidecke, C. D. Pathophysiology of peritonitis. Chirurg 87, 5–12. https://doi.org/10.1007/s00104-015-0117-6 (2016).
Adib-Conquy, M. & Cavaillon, J. M. Stress molecules in sepsis and systemic inflammatory response syndrome. FEBS Lett. 581, 3723–3733. https://doi.org/10.1016/j.febslet.2007.03.074 (2007).
Zhang, W. Y. et al. Analysis and validation of diagnostic biomarkers and immune cell infiltration characteristics in pediatric sepsis by integrating bioinformatics and machine learning. World J. Pediatr. 19, 1094–1103. https://doi.org/10.1007/s12519-023-00717-7 (2023).
Liu, Y. et al. Changes in intestinal Toll-like receptors and cytokines precede histological injury in a rat model of necrotizing enterocolitis. Am. J. Physiol. Gastrointest. Liver Physiol. 297, G442–450. https://doi.org/10.1152/ajpgi.00182.2009 (2009).
Bulek, K., Swaidani, S., Aronica, M. & Li, X. Epithelium: the interplay between innate and Th2 immunity. Immunol. Cell Biol. 88, 257–268. https://doi.org/10.1038/icb.2009.113 (2010).
Chuang, S. L. et al. Cow’s milk protein-specific T-helper type I/II cytokine responses in infants with necrotizing enterocolitis. Pediatr. Allergy Immunol. 20, 45–52. https://doi.org/10.1111/j.1399-3038.2008.00729.x (2009).
Zech, T. et al. Accumulation of raft lipids in T-cell plasma membrane domains engaged in TCR signalling. EMBO J. 28, 466–476. https://doi.org/10.1038/emboj.2009.6 (2009).
Lee, D. H., Park, K. S., Kong, I. D., Kim, J. W. & Han, B. G. Expression of P2 receptors in human B cells and Epstein-Barr virus-transformed lymphoblastoid cell lines. BMC Immunol. 7, 22. https://doi.org/10.1186/1471-2172-7-22 (2006).
Khan, I. & Khan, K. Uncoupling of carbonic anhydrase from Na-H exchanger-1 in experimental colitis: A possible mechanistic link with Na-H exchanger. Biomolecules 9, 700. https://doi.org/10.3390/biom9110700 (2019).
Huang, L. et al. PCSK9 promotes endothelial dysfunction during sepsis via the TLR4/MyD88/NF-κB and NLRP3 pathways. Inflammation 46, 115–128. https://doi.org/10.1007/s10753-022-01715-z (2023).
Zhang, X. et al. β-glucan protects against necrotizing enterocolitis in mice by inhibiting intestinal inflammation, improving the gut barrier, and modulating gut microbiota. J. Transl Med. 21, 14. https://doi.org/10.1186/s12967-022-03866-x (2023).
Ma, F. et al. The microRNA miR-29 controls innate and adaptive immune responses to intracellular bacterial infection by targeting interferon-γ. Nat. Immunol. 12, 861–869. https://doi.org/10.1038/ni.2073 (2011).
Xie, Z., Kang, Q., Shi, Y., Du, J. & Jiang, H. A transcriptomic landscape analysis of human necrotizing enterocolitis: important roles of immune infiltration. Pediatr. Discov. 1, e1. https://doi.org/10.1002/pdi3.1 (2023).
Wang, X., Li, J. & Ning, J. Comprehensive multi-omics integration reveals B cell‐derived ELL2 as a novel diagnostic and prognostic biomarker in sepsis. Med. Res. https://doi.org/10.1002/mdr2.70010 (2025).
Ning, J. et al. Single-cell sequence analysis combined with multiple machine learning to identify markers in sepsis patients: LILRA5. Inflammation 46, 1236–1254. https://doi.org/10.1007/s10753-023-01803-8 (2023).
Acknowledgements
The authors thank the contributors of the Gene Expression Omnibus (GEO) datasets used in this study for making their data publicly available.
Funding
This work was supported by the Natural Science Foundation of Guangxi Province, China (grant number 2024GXNSFBA010274) and the First Batch of Medical Young Reserve Talents Training Program of Guangxi, China (2025–2027).
Author information
Contributions
QH and QM contributed equally to this work. They were responsible for designing the research, analyzing the data, and drafting the manuscript. QW, RL, LQ, and QC participated in the data analysis and interpretation. YC supervised the study, contributed to the study design and data analysis, and revised the manuscript critically. All authors have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of the Second Affiliated Hospital of Guangxi Medical University (Guangxi, China). All methods were conducted in accordance with the relevant guidelines and regulations.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, Q., Mai, Q., Wei, Q. et al. Shared immune biomarkers in necrotizing enterocolitis and neonatal sepsis identified via bioinformatics and machine learning. Sci Rep 15, 33142 (2025). https://doi.org/10.1038/s41598-025-18435-7
DOI: https://doi.org/10.1038/s41598-025-18435-7