Abstract
Liver fibrosis (LF) is a medical disorder caused by prolonged chronic liver injury, which, if left untreated, can progress to cirrhosis or liver cancer, posing significant risks to patient health. In recent years, the increase in liver diseases, including alcoholic liver disease, non-alcoholic fatty liver disease, and viral hepatitis, has significantly heightened the prevalence of LF. SUMOylation, an important post-translational modification, is essential for regulating cellular functions and may play a critical role in the progression of LF; however, its exact mechanisms remain poorly understood. This study conducted a thorough examination of the expression patterns of SUMOylation-related genes in patients with LF for the first time. We obtained two LF datasets (GSE130970 and GSE84044) from the GEO database, integrated the data for DEG analysis and functional enrichment analysis, and employed machine learning techniques to identify pivotal genes. Furthermore, we utilized ssGSEA and immune cell infiltration analysis to evaluate the roles of these genes within the immunological context of LF. To validate the bioinformatics findings, we established a CCl₄-induced C57BL/6 mouse model of LF to investigate the expression of relevant genes. A total of 1,583 differentially expressed genes were identified, 13 of which were associated with SUMOylation. These genes were primarily enriched in biological processes related to cell signal transduction, cell adhesion, and inflammatory responses. Utilizing machine learning approaches, we found eight crucial genes (NR3C2, PCNA, THRB, CDKN2A, DNMT1, MDM2, SMC6, and RXRA) that have significant diagnostic potential in the progression of LF. Additionally, we observed a significant increase in the infiltration of several immune cell types, with evident correlations between the expression of SUMOylation-related genes and specific immune cell types. 
The results of the animal experiments validated the bioinformatics analysis, as key SUMOylation-related genes exhibited expression patterns consistent with our expectations in the CCl₄-induced LF mouse model. This study elucidates the critical roles of SUMOylation-related genes in LF, highlighting their influence on liver damage and the progression of fibrosis through the regulation of cytokine synthesis, facilitation of hepatic stellate cell activation, and enhancement of immune cell infiltration. The identified significant genes exhibit potential as novel biomarkers for therapeutic applications. These findings clarify the pathogenic mechanisms of SUMOylation in LF and establish a foundation for the development of innovative therapeutic targets and diagnostic markers, thereby aiding in the prevention and treatment of LF.
Introduction
Liver fibrosis (LF) is a pathological condition resulting from prolonged chronic liver injury, which, if left untreated, can progress to cirrhosis or hepatocellular carcinoma, posing significant risks to patient health and survival. In recent years, the rising prevalence of liver diseases, such as alcoholic liver disease, non-alcoholic fatty liver disease, and viral hepatitis, has led to a marked increase in the incidence of hepatic fibrosis, making it a global public health concern1,2. The pathophysiological mechanisms underlying LF are intricate and are primarily characterized by the aberrant accumulation of extracellular matrix in the liver. This process is regulated by various cellular and molecular mechanisms, including the activation of hepatic stellate cells, infiltration of inflammatory cells, and cytokine secretion3. Because therapeutic options for LF remain limited, advancing our understanding of its pathophysiological mechanisms and identifying effective diagnostic markers and therapeutic targets are crucial and urgent4,5.
SUMOylation is a vital post-translational protein modification that plays a significant role in various biological functions, including the regulation of protein stability, cell cycle progression, and DNA repair6. This process involves the covalent attachment of small ubiquitin-like modifiers to target proteins, thereby altering their function, localization, and interactions, and exerting a profound impact on numerous cellular processes7. Studies have shown that SUMOylation significantly affects cellular stress responses, transcriptional regulation, signal transduction, and apoptosis, thereby influencing cellular fate and functionality8,9. In the context of liver fibrosis (LF), SUMOylation may substantially affect key factors related to inflammation, cellular proliferation, and fibrosis, consequently impacting liver injury and fibrosis progression10. Furthermore, SUMOylation may facilitate the activation of hepatic stellate cells, which play a crucial role in LF due to their cytokine release in the damaged environment, contributing to the advancement of LF11. Although current research indicates a correlation between SUMOylation and various disease conditions, the specific roles of SUMOylation-related genes in LF and their potential molecular mechanisms remain to be elucidated.
The advancement of high-throughput genomics technologies has significantly enhanced the bioinformatics analysis of RNA sequencing data, offering novel insights and methodologies for the identification of disease-associated genes12. In this study, two LF datasets were retrieved from the GEO database and integrated; differentially expressed gene (DEG) analysis and functional enrichment analysis were then performed, and machine learning algorithms were applied to identify core genes. This approach was employed to investigate the expression characteristics and potential molecular mechanisms of SUMOylation-related genes in LF patients. Furthermore, single-sample gene set enrichment analysis (ssGSEA) and immune cell infiltration analysis were utilized to thoroughly assess the roles and interrelationships of these genes within the immunological context of LF.
This study represents a pioneering effort in systematically analyzing the expression patterns of SUMOylation-related genes in patients with LF, utilizing both bioinformatics analyses and experimental validation to evaluate the roles of these genes within the immune environment of LF. The findings offer novel insights into the function of SUMOylation in LF, establishing a foundation for the identification of potential therapeutic targets in this domain. It is anticipated that the outcomes of this research will significantly advance LF studies and promote the exploration and clinical application of innovative therapeutic strategies.
Methods
Identification of differentially expressed genes between mild and advanced LF
We acquired two datasets, GSE130970 and GSE84044, from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The GSE130970 dataset comprises 53 samples of mild fibrosis and 25 samples of advanced fibrosis, whereas the GSE84044 dataset includes 63 samples of mild fibrosis and 61 samples of advanced fibrosis. These GEO datasets employed distinct histological assessment methodologies based on their respective disease backgrounds. Specifically, the GSE130970 dataset utilized the NASH-CRN scoring system, while the GSE84044 dataset employed the Scheuer scoring system13,14. To ensure clinical consistency in fibrosis classification, two senior pathologists conducted double-blind independent evaluations of these datasets. The gene expression data utilized in this study are publicly accessible from the GEO database, and all datasets have received ethical approval from the original research institutions’ committees. Detailed information regarding the datasets is presented in Table 1.
In the course of preparing the two datasets, the Principal Component Analysis (PCA) plot indicated a pronounced batch effect between the two datasets. To address this issue, we used the sva R package to identify and construct surrogate variables suitable for high-dimensional datasets, thereby mitigating the batch effect. Following the removal of the batch effect, we visualized the PCA plot using the FactoMineR and factoextra R packages. Differentially expressed genes (DEGs) between the mild-fibrosis and advanced-fibrosis groups were subsequently identified using the limma package in R, based on criteria of an adjusted P-value less than 0.05 and an absolute log-fold change greater than 0.2 across the two datasets.
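The selection rule described above (adjusted P < 0.05 and |log fold change| > 0.2) can be illustrated with a minimal Python sketch. It assumes the Benjamini-Hochberg procedure as the adjustment method, which the text does not state explicitly, and it does not reproduce limma's moderated statistics. The gene names and numbers are hypothetical toy values, not data from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, pvalues, alpha=0.05, min_abs_lfc=0.2):
    """Keep genes with adjusted p < alpha and |logFC| > min_abs_lfc."""
    adjusted = benjamini_hochberg(pvalues)
    selected = {}
    for gene, lfc, q in zip(genes, log_fc, adjusted):
        if q < alpha and abs(lfc) > min_abs_lfc:
            selected[gene] = "up" if lfc > 0 else "down"
    return selected


# Hypothetical toy input (assumption: values invented for illustration).
genes = ["geneA", "geneB", "geneC", "geneD"]
log_fc = [0.85, -0.40, 0.10, -0.05]
pvalues = [0.001, 0.004, 0.003, 0.60]

degs = select_degs(genes, log_fc, pvalues)
print(degs)
```

In this toy input, geneC passes the adjusted P-value cut-off but is dropped because its fold change is too small, which is the behaviour the two-part criterion is meant to enforce.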
Gene ontology and pathway enrichment analyses
Gene ontology and pathway enrichment analyses were conducted to elucidate the functional characteristics of the DEGs. We performed comprehensive enrichment analyses using the clusterProfiler R package, covering Gene Ontology (GO) categories including biological process, molecular function, and cellular component, alongside Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation15,16. To ensure analytical rigor and reduce the risk of false positive results, we applied a standard multiple-test correction method to calculate adjusted P-values, with statistical significance defined as an adjusted p < 0.05; additionally, a GeneRatio ≥ 0.1 was set to filter out weakly enriched terms or pathways, thereby prioritizing those with stronger associations with the DEGs and greater potential biological relevance to the initiation and progression of LF.
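As an illustration of the over-representation logic behind this kind of enrichment analysis, the following Python sketch computes a hypergeometric tail probability and a GeneRatio for one term. It is a simplified stand-in under stated assumptions, not the clusterProfiler implementation: the function names `hypergeom_sf` and `enrich_term` are hypothetical, GeneRatio is taken here as overlap divided by the number of input genes present in the background, and the gene identifiers are invented.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrich_term(deg_set, term_genes, background):
    """Over-representation statistics for a single GO term or pathway."""
    degs = deg_set & background          # input genes that can be tested
    term = term_genes & background       # annotated genes in the background
    overlap = degs & term
    k, n, K, N = len(overlap), len(degs), len(term), len(background)
    return {
        "overlap": k,
        "gene_ratio": k / n if n else 0.0,
        "p_value": hypergeom_sf(k, N, K, n),
    }


# Hypothetical toy universe of 20 genes (assumption: invented identifiers).
background = {f"g{i}" for i in range(20)}
term_genes = {"g0", "g1", "g2", "g3", "g4"}
deg_set = {"g0", "g1", "g2", "g10"}

result = enrich_term(deg_set, term_genes, background)
print(result)
# A term would be retained only if gene_ratio >= 0.1 and the p-value,
# after multiple-testing adjustment across all terms, is below 0.05.
```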
Feature selection by three well-established machine learning algorithms
We input the 13 identified genes into the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm for the LF group. We developed a regression model using the glmnet R package with 10-fold cross-validation, set the model family to binomial, and selected the optimal lambda value using lambda.min. We plotted the partial likelihood deviance curve against log(λ). We then determined the λ value within one standard error of the minimum (lambda.1se). Following this, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) was employed to eliminate redundant features. The e1071 and MSVM-RFE R packages were used for SVM modeling, with SVM-RFE utilizing sequential backward feature elimination to identify the most significant hub genes. The corresponding gene sets were identified as the most reliable diagnostic markers at the minimal 5-fold cross-validation (5×CV) error and maximal 5×CV accuracy. Finally, we employed the Random Forest (RF) algorithm to rank the important genes using the randomForest R package. The RF analysis, which is based on an ensemble of decision trees, identified the most significant factors. The RF model comprising 500 trees was developed using the discovery cohorts, and the optimal number of trees was determined through cross-validation errors. Subsequently, genes were prioritized based on their importance, and the ten most consequential genes were highlighted. An importance threshold of 0.5 was established to determine the outcomes of gene selection.
Following the evaluation of the three aforementioned techniques, the intersection of their respective findings was obtained. Receiver Operating Characteristic (ROC) curves were generated using the pROC R package and visualized with the ggplot2 R package to evaluate the accuracy of the diagnostic genes.
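The two steps in this paragraph, intersecting the feature sets and scoring each gene by its area under the ROC curve, can be sketched in Python as follows. This is an illustrative sketch rather than the pROC workflow: the AUC is computed through its equivalence with the Mann-Whitney statistic, the helper names are hypothetical, and all feature names, scores, and labels are invented. For a gene that is lower in advanced fibrosis, the raw value would fall below 0.5 and is usually reported as 1 minus that value.

```python
def shared_features(*selections):
    """Features selected by every algorithm (set intersection)."""
    return set.intersection(*(set(s) for s in selections))


def auc_from_scores(scores, labels):
    """AUC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical selections from three algorithms (assumption: invented names).
lasso_pick = ["g1", "g2", "g3"]
svm_pick = ["g2", "g3", "g4"]
rf_pick = ["g3", "g2", "g5"]
core = shared_features(lasso_pick, svm_pick, rf_pick)

# Hypothetical expression of one gene; label 1 = advanced, 0 = mild fibrosis.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)

print(sorted(core), round(auc, 3))
```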
Assessment of immune cell infiltration via ssGSEA and deconvolution algorithms
In a recent investigation, gene sets specific to immune cell markers were compiled and utilized in single sample gene set enrichment analysis (ssGSEA) to evaluate the relative abundance of 24 immune cell types within the LF environment, expressed as enrichment scores. Following this, the Estimation of Stromal and Immune cells in Malignant Tumors using Expression data (EPIC) and Tumor Immune Estimation Resource (TIMER) algorithms were applied to quantify the infiltration levels of each immune cell type within the fibrotic tissue. Notably, fibrotic tissue characterized by substantial immune cell infiltration is associated with increased immune scores and decreased fibrotic tissue. To elucidate the relationship between core SUMOylation-related genes and inflammatory cell infiltration, we initially quantified the infiltration levels of key inflammatory cell types using ssGSEA. Subsequently, Spearman rank correlation analysis was conducted between the expression levels of each core gene and the ssGSEA enrichment scores of each inflammatory cell type, employing the corrplot R package. Statistical significance was determined by an adjusted p < 0.05.
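The Spearman rank correlation used to relate core-gene expression to ssGSEA enrichment scores can be sketched in a few lines of Python. The sketch ranks both variables (with average ranks for ties) and then applies the Pearson formula to the ranks; it does not compute p-values or the multiple-testing adjustment. The expression values and enrichment scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values (assumption: invented for illustration).
gene_expression = [1.2, 2.5, 3.1, 4.8, 5.0]
cell_score = [0.10, 0.30, 0.20, 0.50, 0.40]

rho = spearman(gene_expression, cell_score)
print(round(rho, 3))
```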
Gene set enrichment analysis
This work employed gene set enrichment analysis (GSEA) utilizing the clusterProfiler R package to clarify the key functional and pathway disparities among the DEGs. We conducted the GSEA with key parameters as follows: 1,000 gene set permutations to reduce false positives, CP gene sets from the MSigDB C2 subset as the pathway source, enrichment score calculated via the weighted Kolmogorov-Smirnov test, statistical significance defined as nominal p < 0.05 and FDR < 0.25, and background genes as expressed genes (FPKM > 1 in ≥ 80% of samples) from the merged, normalized datasets. Subsequently, we analyzed the correlation between the core genes and all other genes, and used heatmaps to display the expression of the top 50 positively correlated genes.
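The weighted Kolmogorov-Smirnov running-sum statistic mentioned above can be sketched as follows. This is a minimal illustration of the enrichment score only, assuming the usual weight exponent of 1; it leaves out the permutation step, normalization, and FDR estimation that a full GSEA run performs. The function name, gene names, and ranking metric are hypothetical.

```python
def enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """Signed maximum deviation of the weighted running sum from zero.

    ranked_genes must already be sorted by metric in descending order.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_total = sum(abs(m) ** weight for m, h in zip(metric, hits) if h)
    if hit_total == 0:
        raise ValueError("ranking metric is zero for every gene in the set")

    running, best = 0.0, 0.0
    for m, h in zip(metric, hits):
        if h:
            running += abs(m) ** weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss                   # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (assumption: invented genes and metric values).
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(ranked_genes, metric, {"g1", "g2"})
es_bottom = enrichment_score(ranked_genes, metric, {"g5", "g6"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, which is why the sign of the enrichment score indicates the direction of regulation.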
Consensus clustering
To identify molecular subtypes of LF with distinct SUMOylation-related gene expression patterns and explore potential associations between these subtypes and clinical/pathological features, consensus clustering was performed on the integrated GEO dataset using the ConsensusClusterPlus R package. To determine the optimal number of clusters, we tested cluster counts ranging from k = 2 to k = 10, and for each k value, the clustering procedure was repeated 1000 times with 80% sample resampling to minimize random bias and ensure the robustness and reproducibility of results.
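The quantity that consensus clustering summarizes is the consensus matrix: for each pair of samples, the fraction of resampling runs in which both were drawn and fell into the same cluster. The Python sketch below computes that matrix from a list of per-run cluster assignments. It assumes the assignments are already available; the resampling and the clustering algorithm itself, which ConsensusClusterPlus handles, are not reproduced, and the four toy runs are hypothetical.

```python
def consensus_matrix(n_samples, runs):
    """Pairwise co-clustering frequency across resampling runs.

    runs is a list of dicts, each mapping a sampled index to its cluster
    label in that run; samples left out of a run are simply absent.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for assignment in runs:
        items = list(assignment.items())
        for i, label_i in items:
            for j, label_j in items:
                sampled[i][j] += 1
                if label_i == label_j:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Hypothetical runs over 4 samples, each leaving one sample out
# (assumption: labels invented for illustration).
runs = [
    {0: 0, 1: 0, 2: 1},
    {0: 0, 1: 0, 3: 1},
    {0: 0, 2: 1, 3: 1},
    {1: 0, 2: 1, 3: 0},
]

M = consensus_matrix(4, runs)
for row in M:
    print(row)
```

Entries close to 0 or 1 indicate stable assignments; a value of k whose matrix is dominated by such entries is generally preferred when comparing candidate cluster numbers.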
Transcription factor-gene interactions
Transcription factors (TFs) are pivotal regulatory proteins that bind to specific DNA regions, such as gene promoters or enhancers, to modulate gene expression, thereby playing a critical role in the regulation of molecular networks involved in the progression of LF. To investigate the regulatory interactions between transcription factors and the core SUMOylation-related genes identified through machine learning, we utilized NetworkAnalyst 3.0 (https://www.networkanalyst.ca), a specialized bioinformatics platform. This platform facilitated the examination of transcription factor-gene interactions and the assessment of how transcription factors influence the expression of these core genes and their roles in LF-related functional pathways. Data on transcription factors and target genes were sourced from the ENCODE ChIP-seq dataset, which provides reliable genome-wide binding profiles across various cell types and tissues. Exclusive peak intensity signals were employed to filter out non-specific binding events, ensuring that only high-confidence transcription factor-gene interactions were included for subsequent analysis.
Transcription factor–miRNA coregulatory network
MicroRNAs (miRNAs) constitute a class of endogenous small non-coding RNAs that play a crucial role in mediating the degradation of target mRNA or inhibiting translation. To understand the dysregulation of gene expression across diverse physiological and pathological contexts, it is essential to analyze the transcriptional regulatory networks involving transcription factors and microRNAs. The 8 core genes were input into NetworkAnalyst 3.0 to construct a transcription factor-microRNA coregulatory network. Regulatory interaction data, curated from existing literature, were sourced from RegNetwork (http://www.regnetworkweb.org). The relevant findings were further visualized using Cytoscape.
Deeper-level network analysis based on Cytoscape
After employing machine learning feature selection techniques to identify key genes, a protein interaction network was constructed by querying the STRING database with these genes. The MCODE plugin in Cytoscape was then utilized to analyze densely connected clusters within the network and to identify pivotal genes.
Verification of SUMOylation-related gene expression in a murine model of carbon tetrachloride-induced LF
Male SPF-grade C57BL/6 mice, aged 6 to 8 weeks and with an average weight of 20 to 25 g, were obtained from Hubei Bante Biotechnology Co., Ltd. in Wuhan, China. The mice were randomly assigned to either a control group or an LF model group, with each group consisting of six mice. The model group underwent LF induction through intraperitoneal injections of carbon tetrachloride (CCl₄, Aladdin) at a dose of 0.5 mL/kg, diluted in olive oil (O108686, Aladdin) at a 1:4 ratio, administered twice weekly over a period of eight weeks. In contrast, the control group received intraperitoneal injections of an equivalent volume of olive oil. Following the induction period, the mice were anesthetized via intraperitoneal injection of tribromoethanol (T48402-5G, Sigma) at a standard dosage of 250 mg/kg body weight. Euthanasia was subsequently conducted through cervical dislocation, and liver tissues were promptly collected. A portion of the collected tissue was used for Western blotting and reverse transcription quantitative polymerase chain reaction (RT-qPCR) analyses, while the remaining tissue was fixed in 4% paraformaldehyde and processed into paraffin-embedded sections. Immunohistochemical analyses were performed in accordance with the kit’s instructions to evaluate the expression levels of relevant marker genes. The Animal Ethics Committee of Hubei University of Chinese Medicine reviewed and approved this animal experiment under the ethics approval number HUCMS20240615. This study is reported in compliance with the ARRIVE guidelines (https://arriveguidelines.org/).
Total RNA extraction and RT-qPCR assay
We weighed an appropriate amount of frozen liver tissue, extracted total RNA using the Trizol method, and measured the absorbance. We then performed reverse transcription and amplification by preparing the reaction system according to the SYBR qPCR master mix protocols (R222-01, Q331-02, Vazyme). GAPDH was used as the internal control, and the mRNA expression levels of the target genes were analyzed using the 2^−ΔΔCt method. The primer sequences are provided in Table 2.
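The relative quantification step can be written out as a short Python sketch. It assumes the common convention of normalizing each sample to the reference gene (ΔCt = Ct of target minus Ct of GAPDH) and then to the mean ΔCt of the control group; the function names are hypothetical and the Ct values are invented for illustration.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize the target-gene Ct to the reference-gene Ct."""
    return ct_target - ct_reference


def fold_change(sample_delta_ct, control_delta_cts):
    """Relative expression 2^-(delta-delta Ct) versus the control-group mean."""
    calibrator = sum(control_delta_cts) / len(control_delta_cts)
    delta_delta_ct = sample_delta_ct - calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values (assumption: invented numbers, not study data).
control_dcts = [
    delta_ct(26.0, 18.0),   # control mouse 1: delta Ct = 8.0
    delta_ct(26.5, 18.5),   # control mouse 2: delta Ct = 8.0
]
model_dct = delta_ct(24.0, 18.0)   # model mouse: delta Ct = 6.0

fc = fold_change(model_dct, control_dcts)
print(fc)
```

A ΔΔCt of −2 corresponds to a four-fold increase in the model sample, since each cycle of difference reflects a doubling under the assumption of near-100% amplification efficiency.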
Protein extraction and Western blot analysis
Frozen liver tissue samples were pulverized and lysed in RIPA lysis buffer (G2002, Servicebio) supplemented with protease (G2008, Servicebio) and phosphatase inhibitors (G2007, Servicebio). After lysis, the supernatant was collected by centrifugation at 12,000 × g for 15 min at 4 °C, and protein concentration was determined using a BCA assay kit (P0012S, Beyotime). An equal amount of protein (30 µg) from each sample was resolved via SDS-PAGE and subsequently transferred onto a PVDF membrane (IPVH00010, MerckMillipore). The membrane was blocked with 5% skimmed milk powder (BS102, Biosharp) for 1 h at room temperature. The membrane was then incubated overnight at 4 °C with primary antibodies against the target proteins DNMT1 (1:1000, ab188453, Abcam), NR3C2 (1:2000, K106612, Solarbio), THRB (1:2000, AF8157, Beyotime), and CDKN2A (1:2000, 10883-1-AP, Proteintech), and against the internal reference protein GAPDH (1:5000, ab8245, Abcam). On the next day, the HRP-conjugated secondary antibody (ab6721, Abcam) was added and incubated at room temperature for 1 h. The bands were visualized using ECL Chemiluminescent Reagent (HY-K1005, MedChemExpress) and photographed, and their grey values were analyzed.
Immunohistochemical assessments
We cut paraffin-embedded liver tissue into 4 μm sections, heated them at 65 °C for an hour, then dewaxed and hydrated them. We performed antigen retrieval using pH 6.0 sodium citrate buffer under high pressure. After cooling, we rinsed the sections with PBS, treated them with 3% hydrogen peroxide for 10 min to block endogenous peroxidase, and incubated them with 5% BSA for 30 min to prevent non-specific binding. The primary antibodies (DNMT1, 1:500; CDKN2A, 1:300; NR3C2, 1:200; THRB, 1:300) were added and incubated overnight at 4 °C. The next day, sections were rinsed with PBS, treated with HRP-conjugated secondary antibody (1:500) for 30 min, and developed with DAB. The reaction was stopped with tap water, and sections were counterstained with haematoxylin, dehydrated, cleared, and mounted with neutral resin. We examined and photographed specimens using an Olympus BX53 light microscope (200 × magnification), then assessed staining intensity and positive area with Image Pro Plus software.
Statistical analyses
Data analyses were performed using GraphPad Prism 8.0 and R software, with results presented as mean ± standard deviation. For group comparisons, independent samples t-test was used for normally distributed data, Mann-Whitney U test for non-normally distributed data, and one-way analysis of variance with Tukey post hoc test for multiple groups. Results of immunohistochemistry and Western blot were quantitatively analyzed via Image-Pro Plus software. Spearman correlation analysis was applied to assess the association between SUMOylation-related gene expression and immune cell infiltration, and ROC curves were used to determine the diagnostic efficiency of key genes. Differences were considered statistically significant at p < 0.05.
Results
Differential expression of SUMOylation-related genes
Upon merging the GSE130970 and GSE84044 datasets and eliminating batch effects, we obtained 16,243 genes, encompassing expression profiles from 86 patients with advanced LF and 116 patients with mild fibrosis. The PCA findings demonstrated strong data consistency (Fig. 1A-B). We used the preprocessCore package in R to normalize the dataset; the data before and after normalization are shown in Fig. 1C-D. We found 1,583 DEGs using the limma approach, comprising 1,099 upregulated genes and 484 downregulated genes. There were notable disparities in gene expression between patients with advanced and mild fibrosis. Figure 1E and F depict the volcano plot and heatmap of DEGs.
Data Processing and Differential Expression Analysis. (A–B) Batch correction of the GSE130970 and GSE84044 datasets: (A) prior to and (B) following correction. (C–D) Normalisation of the dataset: (C) prior to normalisation and (D) following normalisation. (E–F) Differential expression analysis: (E) a volcano plot illustrating upregulated (red) and downregulated (blue) genes, and (F) a heatmap depicting differential expression between the disease group and the control group.
GO and KEGG pathway enrichment analyses of DEGs
The clusterProfiler package was used to determine the potential functions of the 1,583 DEGs and to identify over-represented GO categories. As shown in Fig. 2A–C, for biological processes, DEGs were mainly enriched in positive regulation of response to external stimulus, regulation of cell–cell adhesion, cytokine-mediated signaling pathway, and negative regulation of immune system process. For cellular component, DEGs were mainly enriched in collagen-containing extracellular matrix, external side of plasma membrane, cytoplasmic vesicle lumen, and cell–substrate junction. For molecular function, DEGs were mainly enriched in terms such as actin binding, amide binding, extracellular matrix structural constituent, glycosaminoglycan binding, and sulfur compound binding. KEGG pathway analysis showed that the DEGs participated in cytokine–cytokine receptor interaction, phagosome, cell adhesion molecules, and viral protein interaction with cytokine and cytokine receptor (Fig. 2D). A total of 20 significantly enriched pathways were identified at an adjusted p-value < 0.05.
Functional annotation and enrichment analysis of DEGs using GO and KEGG pathways. (A–C) Enriched pathways in biological processes, cellular components, and molecular functions. (D) KEGG pathway enrichment analysis of DEGs (http://www.KEGG.jp/KEGG/kegg1.html).
The enrichment results and expression patterns of the SUMOylation-related differentially expressed genes
Subsequently, the upregulated and downregulated differentially expressed genes were intersected with the SUMOylation-related genes to generate Venn diagrams. A total of 13 intersecting genes were identified, with 5 genes in the upregulated group (PCNA, MDM2, TOP2A, DNMT1, CDKN2A) and 8 genes in the downregulated group (RXRA, NR3C2, THRB, NR1I2, AR, SMC6, CAPN3, BRCA1) (Fig. 3A-B). The 13 intersecting differentially expressed genes were subjected to functional enrichment analysis, including the BP, CC, and MF categories (Fig. 3C-E). The results showed that these genes were mainly enriched in the intracellular receptor signaling pathway, hormone-mediated signaling pathway, response to steroid hormone, germ cell nucleus, chromosomal region, male germ cell nucleus, nuclear receptor activity, ligand-activated transcription factor activity, and DNA-binding transcription factor binding functions. Pathway enrichment analysis revealed that the intersecting differentially expressed genes were enriched in platinum drug resistance, the thyroid hormone signaling pathway, and the PI3K–Akt signaling pathway (Fig. 3F).
Enrichment analysis of SUMOylation-related DEGs. The Venn diagrams (A–B) show the overlap between SUMOylation-related genes and DEGs. (C–E) GO annotation of 13 overlapping DEGs in (C) biological processes, (D) cellular components, and (E) molecular functions highlights relevant pathways. (F) The KEGG pathway enrichment of overlapping DEGs is shown (http://www.KEGG.jp/KEGG/kegg1.html).
We identified 13 differentially expressed SUMOylation-related genes in the cohort from the GEO database, of which 5 genes were up-regulated and 8 genes were down-regulated (Fig. 4A). The expression of the 13 SUMOylation-associated genes in patients with advanced fibrosis and mild fibrosis is displayed in the heatmap (Fig. 4B). Significant expression of genes related to the regulation of SUMOylation was observed in both fibrosis clusters and was consistent with the expected results for a subset of fibrosis (Fig. 4C).
Expression analysis of SUMOylation-related genes in fibrosis. (A) Identification of 13 differentially expressed SUMOylation-related genes. (B) Heatmap of gene expression in advanced vs. mild fibrotic patients. (C) Expression of SUMOylation genes across fibrosis clusters, consistent with expected patterns (ns p > 0.05, * p < 0.05, ** p < 0.01, *** p < 0.001).
Eight genes were identified and verified as diagnostic biomarkers by machine learning algorithms
Candidate diagnostic biomarkers were screened using three different algorithms. We used the LASSO logistic regression algorithm to identify nine meaningful feature variables related to fibrosis from the 13 SUMOylation-related DEGs. The random forest approach identified 10 hub genes, while the SVM-RFE algorithm ranked the 13 candidate features and retained all 13 as significant (Fig. 5A–C). Thus, eight overlapping features, that is, the genes identified by all three algorithms, were finally selected and are indicated in the Venn plot (Fig. 5D). As shown in Fig. 5E, most SUMOylation-signature genes are highly correlated with each other. Furthermore, we calculated the area under the receiver operating characteristic curve (AUC) for each hub gene, resulting in values of 0.713 for NR3C2, 0.783 for PCNA, 0.721 for RXRA, 0.695 for THRB, 0.677 for CDKN2A, 0.64 for SMC6, 0.71 for DNMT1, and 0.765 for MDM2 (Fig. 5F). With AUC values between 0.64 and 0.78, the eight genes show moderate diagnostic value as individual markers.
Machine learning-based key gene selection for diagnostic biomarkers. (A) LASSO regression identified nine significant genes. (B) Random Forest selected the top ten important genes. (C) SVM-RFE identified thirteen significant genes. (D) Venn diagram showing eight core genes shared by all methods. (E) Correlation plot of the eight core genes (red: positive, green: negative). (F) ROC curves for the eight core genes, indicating strong diagnostic potential.
Analysis of immune infiltration and correlation of hub genes with infiltrating immune cells in fibrosis
Since we found that immune-related genes might affect fibrosis, we conducted an immune cell infiltration investigation to shed further light on the immunological regulation of fibrosis. The associations among 23 categories of immune cells demonstrated that most infiltrating immune cells were closely connected (Fig. 6A). The boxplot showed that patients with advanced fibrosis had a higher proportion of activated CD4 T cells, activated dendritic cells, gamma delta T cells, myeloid-derived suppressor cells, mast cells, NK cells, T helper 1 cells, T helper 17 cells, and T helper 2 cells (Fig. 6B). Diverse types of immune cells were uniquely infiltrated in fibrosis patients, which might serve as possible therapeutic targets for fibrosis. The analysis of immune cell infiltration revealed interesting associations between the hub genes and specific immune cell types. CDKN2A exhibited positive correlations with the infiltration of mast cells, activated dendritic cells, activated CD8 T cells, and MDSCs. Similarly, DNMT1 showed positive correlations with the infiltration of activated CD4 T cells, natural killer T cells, gamma delta T cells, and regulatory T cells. However, it displayed a negative correlation with the infiltration of neutrophils. Moreover, MDM2 exhibited positive correlations with the infiltration of natural killer T cells, plasmacytoid dendritic cells, and activated CD4 T cells, while showing negative correlations with the infiltration of neutrophils. THRB and NR3C2 showed positive correlations with the infiltration of activated dendritic cells and MDSCs. SMC6 exhibited positive correlations with the infiltration of type 1 T helper cells and natural killer T cells. RXRA displayed positive correlations with the infiltration of gamma delta T cells. 
Lastly, PCNA displayed positive correlations with the infiltration of activated dendritic cells, activated mast cells, and monocytes, but showed a negative correlation with the infiltration of gamma delta T cells and mast cells (Fig. 6C). Correlation analysis between the eight genes and all other genes was performed, and the expression of the top 50 positively correlated genes was displayed using heatmaps (Fig. 7). As shown in Fig. 8, NR3C2, PCNA, RXRA, THRB, CDKN2A, SMC6, DNMT1, and MDM2 were strongly enriched in metabolic and immune-related pathways.
Immune cell infiltration and correlation with hub genes in fibrosis. (A) Immune cell infiltration levels in patients with advanced fibrosis. (B) Differences in immune cell infiltration between disease and control groups. (C) Correlations between eight core genes and infiltrating immune cells (ns p > 0.05, * p < 0.05, **p < 0.01, ***p < 0.001).
Consensus clustering analysis of SUMOylation gene clusters
Consensus clustering based on the expression levels of the eight SUMOylation-related genes identified k = 2 as the optimal number of clusters, owing to the highest consensus index and minimal delta area of the cumulative distribution function curves (Fig. 9A). Subsequently, the samples obtained from the GEO database were divided into two distinct categories in the consensus clustering analysis: cluster A and cluster B. The expression levels of the SUMOylation-related genes in the two subtypes were visualized using heatmaps and boxplots (Fig. 9B–C). Notably, most SUMOylation-related genes, including PCNA, MDM2, CDKN2A, and DNMT1, exhibited higher expression levels in cluster B than in cluster A.
GSVA of biological pathways between subclusters of SUMOylation
Through GSVA, several pathways with differential activity between the clusters were identified and displayed in a heatmap. Compared with cluster A, the hallmark activities of PI3K/AKT/mTOR signaling, TNFα signaling via NF-κB, apoptosis, and the P53 pathway were higher in cluster B. In cluster A, the enriched KEGG pathways were associated with glycine, serine, and threonine metabolism, as well as lysine degradation. The results of the Reactome pathway analysis showed that MHC class II antigen presentation and cell cycle checkpoints were mainly enriched in cluster B (Fig. 10).
Results of predictions from the NetworkAnalyst visualization database
We predicted 34 TFs capable of interacting with the eight hub genes using the TRRUST database and constructed a TF–hub gene regulatory network. The NetworkAnalyst database was used to predict the target miRNAs of the key genes. A potential miRNA–hub gene network was constructed to investigate the molecular mechanisms underlying the eight SUMOylation-related genes. miR-495 and miR-506 interacted with most of the SUMOylation-related genes, namely, THRB, RXRA, DNMT1, and PCNA (Fig. 11).
The network analysis in Fig. 12, generated using the NetworkAnalyst database, reveals key interactions between specific compounds and target genes. Among the compounds analyzed, Benzo(a)pyrene displays the most prominent associations, interacting with multiple target genes, including MDM2, PCNA, and CDKN2A. This suggests that Benzo(a)pyrene may have a broader impact on gene expression, potentially influencing pathways associated with toxicity and adversely affecting the liver, lungs, and immune system. In addition, acetaminophen demonstrates a targeted connection with SMC6, a more specific interaction that could indicate a specialized effect within particular cellular pathways. The density of connections around Benzo(a)pyrene implies its central role in the network, possibly marking it as a primary candidate for therapeutic exploration. This network structure provides a foundation for future studies to validate these predicted interactions and explore their potential implications in fibrosis.
MCODE module screening hub genes
To obtain a more representative set of SUMOylation-related genes, we used the MCODE module to identify four central genes, NR3C2, CDKN2A, THRB, and DNMT1, which exhibit high interconnectivity and occupy central positions in the network. NR3C2 (mineralocorticoid receptor) plays a crucial role in maintaining electrolyte and fluid balance by interacting with corticosteroids, which is essential for renal homeostasis. CDKN2A, as a tumor suppressor, regulates cell cycle progression and serves as a key barrier to uncontrolled cell proliferation. THRB mediates the effects of thyroid hormones on development, growth, and metabolism, which are essential to various physiological processes. DNMT1 is responsible for establishing DNA methylation patterns that regulate gene expression, ensuring appropriate cell differentiation and function. These findings highlight the biological significance of these genes and indicate their potential as targets for future research and therapeutic development (Fig. 12).
Validation of SUMOylation-related gene expression in a mouse model of carbon tetrachloride (CCl₄)-induced LF
To further validate the bioinformatics results and simulate the pathological process of LF, we constructed a CCl₄-induced C57BL/6 mouse model of LF for in vivo experiments. This model enabled us to study the expression patterns of SUMOylation-related genes during the progression of LF. RT-qPCR analysis showed that, compared with the control group, the mRNA levels of the key SUMOylation-related genes DNMT1 and CDKN2A were significantly upregulated in the CCl₄ group, whereas the mRNA levels of NR3C2 and THRB were significantly decreased (Fig. 13A). In addition, Western blot analysis confirmed that the protein expression levels of these genes were significantly altered in fibrotic liver tissue (Fig. 13B–C). Immunohistochemistry further validated these results: the expression of DNMT1 and CDKN2A in the liver tissue of the CCl₄ group was significantly enhanced, showing stronger positive staining signals, whereas the expression of NR3C2 and THRB was significantly reduced (Fig. 13D). Overall, these findings support the reliability of the bioinformatics analysis and emphasize the potential role of SUMOylation-related genes in LF.
Validation of SUMO-related gene expression in a CCl₄-induced LF mouse model. (A) RT-qPCR further confirmed significant differences in the mRNA expression of SUMOylation-related genes. (B–C) Western blot analysis results. (D) HE, Masson’s trichrome staining, Sirius Red staining, and immunohistochemistry all demonstrated the fibrosis process induced by CCl₄ in the samples. (* p < 0.05, **p < 0.01, ***p < 0.001).
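Relative mRNA levels from RT-qPCR of the kind shown in Fig. 13A are commonly derived with the 2^−ΔΔCt method. The sketch below assumes that method and a single reference gene; neither is specified in this section, and the Ct values and function names are hypothetical.

```python
from statistics import mean

def delta_ct(target_ct, reference_ct):
    """Per-sample delta Ct: target gene Ct minus reference gene Ct."""
    return [t - r for t, r in zip(target_ct, reference_ct)]

def fold_changes(sample_delta_ct, control_delta_ct):
    """2^-ddCt for each sample, calibrated to the mean delta Ct of controls."""
    calibrator = mean(control_delta_ct)
    return [2 ** -(d - calibrator) for d in sample_delta_ct]

# Toy Ct values (assumed): three control and three CCl4-treated animals.
control_dct = delta_ct([25.0, 25.5, 24.5], [18.0, 18.5, 17.5])
treated_dct = delta_ct([23.0, 23.5, 22.5], [18.0, 18.5, 17.5])

control_fold = fold_changes(control_dct, control_dct)
treated_fold = fold_changes(treated_dct, control_dct)
mean_treated_fold = mean(treated_fold)
```

With these toy numbers the target gene's delta Ct is two cycles lower in the treated group, which corresponds to a four-fold higher relative expression; a downregulated gene would give a fold change below 1.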
Discussion
LF, a pathological condition resulting from prolonged chronic liver injury, may progress to cirrhosis or liver cancer if left untreated, posing a significant threat to patients’ health and survival17. The recent increase in alcoholic liver disease, non-alcoholic fatty liver disease, viral hepatitis, and other hepatic disorders has markedly raised the incidence of LF, presenting a global public health challenge18. The pathophysiological mechanism of LF is intricate, primarily characterized by the aberrant accumulation of extracellular matrix within hepatic tissue, governed by diverse cell types and molecular processes, including the activation of hepatic stellate cells, infiltration of inflammatory cells, and cytokine secretion19. Due to the limited treatment options for LF, a comprehensive understanding of its pathological mechanisms, together with the identification of effective diagnostic markers and therapeutic targets, is both critical and urgent.
This study conducted a thorough examination of the expression patterns of SUMOylation-related genes in patients with LF for the first time and assessed the functions of these genes in the immunological environment of LF through bioinformatics analysis and experimental validation. Our results indicate that alterations in SUMOylation substantially influence liver injury and the progression of fibrosis by modulating cytokine release, activating hepatic stellate cells, and facilitating immune cell infiltration. This finding aligns with current literature and underscores the significance of SUMOylation changes in LF20. However, the precise functions and possible mechanisms of SUMOylation-modified genes in this disease process remain inadequately elucidated, which limits a comprehensive understanding of their significance in the pathophysiology of LF21.
Studies indicate that SUMOylation is crucial for cellular stress responses, transcriptional control, and signal transduction22,23. In LF, SUMOylation may modulate the inflammatory response by influencing the activation of hepatic stellate cells24. Hepatic stellate cells, the primary fibrogenic cells in the liver, are activated by hepatic damage and release substantial quantities of cytokines that promote the buildup of extracellular matrix25. Our data demonstrate that the expression levels of several SUMOylation-related genes are markedly elevated in individuals with LF, implying that these genes may facilitate the progression of fibrosis.
This study applied statistical analysis to high-throughput RNA sequencing data and identified 1,583 differentially expressed genes, 13 of which were linked to SUMOylation. These genes demonstrate significant enrichment in various biological processes, such as cell signal transduction, cell adhesion, and inflammatory responses, indicating their potential relevance in the immunological context of LF. The cytokine-mediated signaling pathways and the regulation of immune system functions associated with these genes may be intricately linked to the onset and progression of LF26.
The infiltration of immune cells is essential for the advancement of LF27,28. This study’s assessment of immune cell infiltration revealed a significant increase in many immune cell types in patients with advanced fibrosis, including activated CD4+ T cells, activated dendritic cells, and MDSCs. The presence of these immune cells indicates an elevated inflammatory response in the liver and may influence the pathological progression of fibrosis. Additionally, we noted a substantial association between the expression of SUMOylation-related genes, including CDKN2A and DNMT1, and particular immune cell types, highlighting the significance of these genes in shaping the immunological milieu. CDKN2A expression is positively associated with MDSC infiltration, suggesting that it may facilitate the development of an immunosuppressive environment and thereby worsen LF.
SUMOylation may, moreover, influence the advancement of LF through interactions with various signalling pathways29,30. Prior research indicates that SUMOylation is intricately linked to the NF-κB signalling system and the Wnt/β-catenin pathway, both of which are vital in LF31,32. The results indicate that no genes related to the SUMOylation regulatory pathway (such as FXR, VEGFR, Smad) were found in our gene set, which reflects the targeted design of our study. Our research focuses on SUMOylation-related genes with direct differential expression between mild and advanced LF, and prioritizes genes with diagnostic value for fibrosis staging through machine learning. The genes in the aforementioned pathways may be indirectly regulated by SUMOylation or only weakly associated with fibrosis progression; therefore, they are excluded. This does not negate previous research, but rather strengthens our focus on clinically relevant genes for the diagnosis of LF, and the functional enrichment of our core genes also supplements previous studies on the SUMOylation pathway. Consequently, subsequent studies ought to concentrate on the impact of SUMOylation changes on the mechanisms of LF by modulating these signalling pathways.
This study identified eight critical genes through machine learning approaches that may significantly impact the course of LF and exhibit considerable diagnostic potential. The expression levels of genes such as NR3C2, PCNA, and THRB show a significant association with the severity of fibrosis. These results suggest that genes associated with SUMOylation are essential in the pathological progression of LF and may serve as potential targets for future LF diagnosis. The outcomes of the implemented machine learning methodologies, including LASSO, random forest, and SVM-RFE, demonstrate that the gene combination exhibits considerable specificity and sensitivity in categorizing LF patients, thus affirming its potential as a biomarker.
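The sensitivity and specificity referred to above, together with the area under the ROC curve, can be computed from a classifier's scores as in the following sketch. The scores, labels, threshold, and function names are hypothetical toy inputs chosen for illustration and do not reproduce the models or results of this study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one
    (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy data (assumed): label 1 = advanced fibrosis, 0 = mild fibrosis;
# scores stand in for the output of a model built on the gene panel.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds, which is why both are usually reported for a candidate biomarker panel.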
This study offers novel insights; however, it also has certain limitations. First, the limited sample size of the included GEO datasets may hinder the generalizability of the results. Future research should prioritize increasing the sample size, particularly for validation across patients with various liver conditions, to enhance the external validity of the findings. Second, while we validated the expression patterns of SUMOylation-related genes in murine models, these findings require further confirmation in larger animal models and clinical data to elucidate their precise roles and mechanisms in LF. We acknowledge the limitation of the CCl₄-induced LF model: as a chemically induced model, it differs from human LF, and fibrosis resolution in this model does not directly translate to human outcomes. In this study, we used it primarily for preliminary validation of core SUMOylation-related gene expression trends, leveraging its well-established reproducibility in reflecting fibrotic pathology and gene expression changes to confirm consistency with our human dataset-based bioinformatics findings. Moving forward, we plan to employ disease-specific models that better mimic human etiologies for further validation to strengthen translational relevance. Third, the current biomarker analysis is cross-sectional: it associates SUMOylation patterns with known fibrosis stages rather than predicting progression. While this validates the correlation between SUMOylation-related genes and fibrosis severity, exploring their ability to forecast aggressive progression would enhance clinical relevance. Future studies will establish a longitudinal cohort to follow early-stage fibrosis patients, track SUMOylation gene expression changes, and evaluate these patterns as predictors of advanced fibrosis progression, further confirming their potential as prognostic biomarkers.
The interplay between SUMOylation and other signalling pathways remains inadequately investigated. Future studies should concentrate on the association between SUMOylation and inflammatory responses, fibrosis progression, and immunological modulation. This will enhance comprehension of the intricate role of SUMOylation in LF and establish a foundation for the development of novel therapeutic options.
This research elucidates the possible involvement of SUMOylation-related genes in LF and introduces a novel approach that integrates bioinformatics analysis with experimental validation. Our research offers new insights into the pathogenic mechanisms and potential treatment strategies for LF. A comprehensive understanding of the correlation between SUMOylation and LF could yield novel therapeutic strategies and improve outcomes for patients with LF.
Moreover, subsequent research may investigate the role of SUMOylation in other hepatic conditions, including hepatitis and liver cancer, to assess its viability as a therapeutic target33,34. The simultaneous investigation of the relationship between SUMOylation and small-molecule therapies could yield significant advancements in the treatment of LF35. In therapeutic contexts, delineating the specific regulatory network of SUMOylation-related genes, in conjunction with personalized medication, can provide patients with more precise treatment protocols36.
This study offers novel insights into the function of SUMOylation in LF and lays the groundwork for subsequent research initiatives. Through the examination of the correlation between SUMOylation and hepatic disorders, we aim to provide novel insights and strategies for the prevention and management of liver diseases.
Conclusion
This study analyzed the expression patterns of SUMOylation-related genes in LF patients, emphasizing the important role of these genes in the immune microenvironment of LF. The results indicate that SUMOylation significantly affects the progression of liver injury and fibrosis by regulating cytokine secretion, promoting hepatic stellate cell activation, and enhancing immune cell infiltration. In addition, the key genes identified through machine learning methods have shown great diagnostic potential and may provide new biomarkers for clinical applications. These findings elucidate the role of SUMOylation in the pathogenesis of LF, providing a foundation for exploring novel therapeutic strategies and valuable diagnostic biomarkers.
Data availability
The data that support the findings of this study are available on request from the corresponding author. R codes used for data analyses and visualization are available upon request by contacting the corresponding author.
References
Vilar-Gomez, E. et al. Fibrosis severity as a determinant of cause-specific mortality in patients with advanced nonalcoholic fatty liver disease: A Multi-National cohort study. Gastroenterology 155 (2), 443–457 (2018).
Heyens, L. J. M., Busschots, D., Koek, G. H., Robaeys, G. & Francque, S. Liver fibrosis in non-alcoholic fatty liver disease: from liver biopsy to non-invasive biomarkers in diagnosis and treatment. Front. Med. 8, 615978 (2021).
Yan, Y., Zeng, J., Xing, L. & Li, C. Extra- and Intra-Cellular mechanisms of hepatic stellate cell activation. Biomedicines 9(8), 1014 (2021).
Geervliet, E. & Bansal, R. Matrix metalloproteinases as potential biomarkers and therapeutic targets in liver diseases. Cells 9(5), 1212 (2020).
Tadokoro, T., Morishita, A. & Masaki, T. Diagnosis and therapeutic management of liver fibrosis by MicroRNA. Int. J. Mol. Sci. 22(15), 8139 (2021).
Chang, Y., Oram, M. K. & Bielinsky, A. SUMO-Targeted ubiquitin ligases and their functions in maintaining genome stability. Int. J. Mol. Sci. 22(10), 5391 (2021).
Hotz, P. W., Müller, S. & Mendler, L. SUMO-specific isopeptidases tuning cardiac sumoylation in health and disease. Front. Mol. Biosci. 8, 786136 (2021).
Sengupta, A. et al. Sumoylation and its regulation in testicular Sertoli cells. Biochem. Biophys. Res. Commun. 580, 56–62 (2021).
Nie, Q. et al. The e3 ligase PIAS1 regulates p53 sumoylation to control Stress-Induced apoptosis of lens epithelial cells through the proapoptotic regulator Bax. Front. Cell. Dev. Biol. 9, 660494 (2021).
Tomasi, M., Cossu, C. & Ramani, K. Role of sumoylation in Alcohol-induced liver fibrosis. The FASEB Journal. 31 (S1), 602.7 (2017).
Bu, F. et al. SENP2 alleviates CCl(4)-induced liver fibrosis by promoting activated hepatic stellate cell apoptosis and reversion. Toxicol. Lett. 289, 86–98 (2018).
Duan, Y., Zhang, W., Cheng, Y., Shi, M. & Xia, X. A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs. RNA 27 (1), 80–98 (2021).
Goodman, Z. D. Grading and staging systems for inflammation and fibrosis in chronic liver diseases. J. Hepatol. 47 (4), 598–607 (2007).
Brunt, E. M., Kleiner, D. E., Wilson, L. A., Belt, P. & Neuschwander-Tetri, B. A. Nonalcoholic fatty liver disease (NAFLD) activity score and the histopathologic diagnosis in NAFLD: distinct clinicopathologic meanings. Hepatology 53 (3), 810–820 (2011).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44 (D1), D457–D462 (2016).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28 (1), 27–30 (2000).
Caligiuri, A., Gentilini, A., Pastore, M., Gitto, S. & Marra, F. Cellular and molecular mechanisms underlying liver fibrosis regression. Cells 10(10), 2759 (2021).
Tourkochristou, E., Assimakopoulos, S. F., Thomopoulos, K., Marangos, M. & Triantos, C. NAFLD and HBV interplay - related mechanisms underlying liver disease progression. Front. Immunol. 13, 965548 (2022).
Garbuzenko, D. V. Pathophysiological mechanisms of hepatic stellate cells activation in liver fibrosis. World J. Clin. Cases. 10 (12), 3662–3676 (2022).
Zhang, N. et al. Ongoing involvers and promising therapeutic targets of hepatic fibrosis: the hepatic immune microenvironment. Front. Immunol. 14, 1131588 (2023).
Zeng, M., Liu, W., Hu, Y. & Fu, N. Sumoylation in liver disease. Clin. Chim. Acta. 510, 347–353 (2020).
Enserink, J. M. The SUMO stress response in transcriptional regulation: causal relationships or secondary bystander effects? Bioessays 44 (7), e2200065 (2022).
Li, N., Zhang, S., Xiong, F., Eizirik, D. L. & Wang, C. SUMOylation, a multifaceted regulatory mechanism in the pancreatic beta cells. Semin Cell. Dev. Biol. 103, 51–58 (2020).
Dai, J. et al. Role of PML sumoylation in arsenic trioxide-induced fibrosis in HSCs. Life Sci. 251, 117607 (2020).
Tsuchida, T. & Friedman, S. L. Mechanisms of hepatic stellate cell activation. Nat. Rev. Gastroenterol. Hepatol. 14 (7), 397–411 (2017).
Zhangdi, H. et al. Crosstalk network among multiple inflammatory mediators in liver fibrosis. World J. Gastroenterol. 25 (33), 4835–4849 (2019).
Koda, Y., Nakamoto, N. & Kanai, T. Regulation of progression and resolution of liver fibrosis by immune cells. Semin Liver Dis. 42 (4), 475–488 (2022).
Pan, J. et al. Exploration of immune infiltration and feature genes in viral hepatitis-associated liver fibrosis using transcriptome data. Ann. Transl Med. 10 (19), 1051 (2022).
Tomasi, M. L. & Ramani, K. SUMOylation and phosphorylation cross-talk in hepatocellular carcinoma. Transl Gastrointest. Cancer. 3, 20 (2018).
Zhou, J. et al. SUMOylation inhibitors synergize with FXR agonists in combating liver fibrosis. Nat. Commun. 11 (1), 240 (2020).
Fan, L. et al. Regulation of sumoylation targets associated with Wnt/β-Catenin pathway. Front. Oncol. 12, 943683 (2022).
Ma, B. & Hottiger, M. O. Crosstalk between Wnt/β-Catenin and NF-κB signaling pathway during inflammation. Front. Immunol. 7, 378 (2016).
Zhang, C. et al. Saikosaponin-d inhibits the hepatoma cells and enhances chemosensitivity through SENP5-Dependent Inhibition of gli1 sumoylation under hypoxia. Front. Pharmacol. 10, 1039 (2019).
Yuan, H. et al. The role of protein sumoylation in human hepatocellular carcinoma: A potential target of new drug discovery and development. Cancers 13(22), 5700 (2021).
Yang, Y. et al. Small-Molecule inhibitors targeting protein sumoylation as novel anticancer compounds. Mol. Pharmacol. 94 (2), 885–894 (2018).
Kukkula, A. et al. Therapeutic potential of targeting the SUMO pathway in cancer. Cancers 13(17), 4402 (2021).
Acknowledgements
We acknowledge the GEO database for providing public gene expression datasets used in this research.
Author information
Contributions
Z.S. and J.X. designed the study, performed bioinformatics analyses, and drafted the manuscript. Y.D. conducted experimental validation and analyzed experimental data. J.S. assisted with animal model establishment and histopathological staining. C.J. supervised tntent. All authors reviewed and approved the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Su, Z., Ding, Y., Xue, J. et al. Identifying SUMOylation-related genes in liver fibrosis with bioinformatics and experimental models for diagnostic insights. Sci Rep 15, 39783 (2025). https://doi.org/10.1038/s41598-025-23516-8
DOI: https://doi.org/10.1038/s41598-025-23516-8