Introduction

Liver cancer is a leading cause of cancer-related mortality worldwide, with approximately 870,000 new cases and 760,000 deaths reported in 20221. Among its various subtypes, hepatocellular carcinoma (HCC) represents the predominant form, accounting for ~80% of all cases2. Despite advancements in diagnostic methods, the prognosis for HCC remains poor, with a 5-year survival rate below 20%3. This is partly due to the diagnostic and therapeutic challenges posed by tumor heterogeneity4,5. The complex molecular and histological variability within and between tumors complicates early detection and effective treatment6, underscoring the need for a deeper understanding of its underlying mechanisms.

Tumor heterogeneity in HCC manifests at multiple levels, including genetic, epigenetic, and phenotypic diversity7,8. This variability is driven by factors such as genetic mutations, tumor microenvironment interactions, and the dynamic behavior of cancer stem cells (CSCs)9,10. The presence of distinct tumor subpopulations with differing molecular profiles complicates treatment, as certain cell populations may exhibit resistance to conventional therapies11. Additionally, the process of lineage transition, where cancer cells evolve to adopt drug-resistant or metastasis-prone phenotypes, further exacerbates treatment failure and tumor recurrence12. Understanding these mechanisms is crucial for developing more precise therapeutic strategies.

Sialylation, a glycosylation modification that adds sialic acid residues to proteins and lipids, plays a critical role in regulating tumor progression13. In HCC, aberrant sialylation influences several key processes, including immune evasion, metastasis, and resistance to therapy14,15,16. Elevated expression of sialyltransferases, such as St3gal3, has been linked to poor prognosis and aggressive tumor phenotypes17. Sialylated glycans interact with Siglec receptors on immune cells, promoting immune suppression and creating an immunosuppressive tumor microenvironment (TME)15,18. These modifications not only contribute to immune evasion but also affect tumor cell signaling pathways, thus influencing tumor survival and progression. Targeting sialylation presents a promising therapeutic approach to overcome these barriers.

The complexity and high-dimensional nature of single-cell and bulk transcriptomic data pose a significant challenge to deciphering the cellular and molecular underpinnings of tumor heterogeneity. Traditional analytical methods often fall short in capturing the nonlinear relationships and intricate patterns within such data. To this end, machine learning approaches have emerged as powerful tools for unbiased exploration and integration of multi-omics data19. In the context of HCC, machine learning algorithms can deconvolute the tumor microenvironment, reconstruct cellular trajectories, and infer cell-cell communication networks, thereby revealing previously unappreciated layers of heterogeneity20.

This study aims to explore how tumor heterogeneity and sialylation modifications regulate HCC progression and influence treatment outcomes. Building on machine learning-based analyses, we seek to develop a prognostic model that combines tumor molecular profiles with sialylation-related signatures, which could improve patient stratification and guide therapeutic decision-making.

Results

Comprehensive single-cell profiling reveals key cellular heterogeneity in HCC

To explore the cellular diversity within HCC, we performed single-cell RNA sequencing on a cohort of 32,247 cells. Through rigorous quality control and dimensionality reduction, we identified eight major cell types based on well-established marker genes: hepatocytes, fibroblasts, endothelial cells, T/NK cells, B cells, plasma cells, myeloid cells, and mast cells (Fig. 1A, B). These classifications were further validated by analyzing the cell cycle and the distribution of cell types across early and advanced tumor stages. Hepatocytes from advanced-stage tumors showed a significant reduction in proportion compared to early-stage tumors, highlighting the dynamic changes in tumor cell composition as the disease progresses (Fig. 1C, D). Marker expression for each cell type was subsequently visualized (Fig. 1E, F), ensuring accurate classification. Hepatocytes were characterized by high ALB expression, fibroblasts by high DCN expression, endothelial cells by high PECAM1 expression, T/NK cells by high CD3D and NKG7 expression, B cells by high CD79A expression, plasma cells by high JCHAIN expression, myeloid cells by high LYZ expression, and mast cells by high KIT expression. Notably, we also observed distinct distributions of cell cycle scores (G2M, S) and RNA features (nFeature_RNA and nCount_RNA) across the different cell types (Fig. 1G), providing further insight into the functional state of these cells. Hepatocytes exhibited the highest number of features and counts, consistent with high transcriptional activity and with their contribution to cellular heterogeneity within the tumor microenvironment. KEGG pathway enrichment analysis highlighted critical biological processes associated with each cell type. Hepatocytes, in particular, showed significant enrichment for pathways related to “Chemical carcinogenesis - reactive oxygen species,” suggesting a key role of oxidative stress in driving liver cancer progression (Fig. 1H).
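The marker-based annotation described above can be illustrated with a minimal sketch. It assumes that cluster-level mean expression values are already available and assigns each cluster to the cell type whose canonical markers show the highest average expression. The function name `annotate_cluster` and the example values are hypothetical and serve only as illustration; the actual annotation workflow may have differed.

```python
# Canonical markers as listed in the text above.
MARKERS = {
    "hepatocytes": ["ALB"],
    "fibroblasts": ["DCN"],
    "endothelial cells": ["PECAM1"],
    "T/NK cells": ["CD3D", "NKG7"],
    "B cells": ["CD79A"],
    "plasma cells": ["JCHAIN"],
    "myeloid cells": ["LYZ"],
    "mast cells": ["KIT"],
}


def annotate_cluster(mean_expr):
    """Pick the cell type whose markers have the highest average expression.

    mean_expr maps gene symbol -> mean normalized expression in one cluster.
    Genes missing from mean_expr are treated as not expressed (0.0).
    """
    scores = {
        cell_type: sum(mean_expr.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cluster averages (illustrative values, not taken from the study).
cluster_means = {"ALB": 0.2, "CD3D": 2.9, "NKG7": 2.1, "LYZ": 0.4}
label, scores = annotate_cluster(cluster_means)
print(label)
```

In practice such a rule would only be a first pass; ambiguous clusters typically need inspection of additional markers.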

Fig. 1: Single-cell expression atlas of HCC.

A, B UMAP plots showing the clustering of single cells from HCC tumors into eight distinct cell types: hepatocytes, fibroblasts, endothelial cells, T/NK cells, B cells, plasma cells, myeloid cells, and mast cells. C Proportion of different cell types across the G1, S, and G2M phases of the cell cycle. D Proportion of cell types across early and advanced tumor stages. E, F Expression of cell type-specific markers in each cluster shown in bubble plots and UMAP plots. G Distribution of cell cycle scores and RNA metrics across different cell types. H KEGG enrichment analysis for cell type-specific markers.

Identification of distinct malignant subpopulations with stage-specific molecular profiles

To further analyze the heterogeneity of hepatocytes, we performed secondary dimensionality reduction and clustering, identifying five distinct hepatocyte subpopulations (Supplementary Fig. 1A). When cross-referenced with the TNM staging system, we observed that cluster 4 was predominantly composed of stage IV hepatocytes, underscoring its relevance in advanced tumor progression (Supplementary Fig. 1B). We assessed copy number variation (CNV) in hepatocytes using inferCNV, with fibroblasts and endothelial cells serving as reference populations. Cluster 4 exhibited the highest CNV scores, consistent with its association with stage IV tumors, which also showed the highest CNV scores (Supplementary Fig. 1C, D). Based on the gene expression profiles, we labeled the five subpopulations of malignant cells as C0 (CYP2D6+ malignant), C1 (S100A6+ malignant), C2 (UBE2C+ malignant), C3 (HULC+ malignant), and C4 (S100A9+ malignant) (Fig. 2A, B). Subpopulations C0, C2, and C3 were predominantly present in early-stage tumors, while C1 and C4 were enriched in advanced-stage tumors, with C4 almost exclusively observed in advanced samples (Fig. 2B–E). These findings were further supported by the Ro/e heatmap analysis, which confirmed C1 and C4 as the dominant populations in advanced tumors (Fig. 2F). Differential marker expression was visualized, revealing key markers associated with each subpopulation. For example, C4 cells exhibited high expression of S100A8, S100A9, SLPI, S100P, and CCL26, highlighting their distinct molecular profile (Fig. 2G, H). CNV scores were significantly higher in C4 compared to other subpopulations (Fig. 2I, J), supporting its role as a prominent malignant cell type. Additionally, GO enrichment analysis revealed that C0 was enriched for metabolic pathways, C1 for ribosomal functions, C2 for nuclear-related processes, C3 for mitochondrial energy metabolism, and C4 for protein translation (Fig. 2K–M). Notably, pathway analysis for metabolic processes revealed that, compared to the other four subpopulations, C4 exhibited significantly enhanced oxidative phosphorylation scores, suggesting a heightened metabolic activity associated with its malignant phenotype (Fig. 2N).
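The Ro/e analysis referred to above compares the observed number of cells of a subpopulation in each stage with the number expected if subpopulation and stage were independent. The following is a minimal sketch under that standard definition (expected count = row total × column total / grand total). The function name `ro_e` and the toy counts are hypothetical and do not reproduce the study's data.

```python
def ro_e(counts):
    """Ratio of observed to expected cell counts per cluster and stage.

    counts maps cluster -> {stage: observed cell count}.
    Expected counts assume independence of cluster and stage:
    expected = row_total * column_total / grand_total.
    """
    stages = sorted({stage for row in counts.values() for stage in row})
    row_tot = {c: sum(row.get(s, 0) for s in stages) for c, row in counts.items()}
    col_tot = {s: sum(row.get(s, 0) for row in counts.values()) for s in stages}
    total = sum(row_tot.values())

    result = {}
    for cluster, row in counts.items():
        result[cluster] = {}
        for stage in stages:
            expected = row_tot[cluster] * col_tot[stage] / total
            observed = row.get(stage, 0)
            result[cluster][stage] = observed / expected if expected > 0 else float("nan")
    return result


# Toy counts for two clusters (illustrative only).
toy = {
    "C0": {"early": 80, "advanced": 20},
    "C4": {"early": 5, "advanced": 95},
}
ratios = ro_e(toy)
for cluster, row in ratios.items():
    print(cluster, {stage: round(value, 2) for stage, value in row.items()})
```

Values above 1 indicate enrichment of a subpopulation in a stage, and values below 1 indicate depletion.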

Fig. 2: Tumor cell analysis and malignant subpopulation identification.

A–C UMAP plots of tumor cells showing five distinct malignant subpopulations. D, E Proportion of tumor cells from different stages (early vs advanced) within each malignant cluster. F Ro/e heatmap showing the association between stages and tumor subpopulations. G Bubble plots of marker expression across malignant cell clusters. H Volcano plots showing differential gene expression between tumor subpopulations. I, J Comparison of CNV scores between different subpopulations. K–M GO enrichment analysis of differentially expressed genes in malignant subpopulations. L Shared enrichment results across all tumor subpopulations. M Unique enrichment results for each tumor subpopulation. N ssGSEA scores for KEGG metabolic pathways in malignant subpopulations.

Tumor cell differentiation trajectories define early and late-stage progression in HCC

To investigate the differentiation trajectories of tumor cells, we performed Monocle2 trajectory analysis, which revealed distinct differentiation paths. C1 was primarily located at the early stages of the trajectory, while C0, C2, and C3 were situated in the later stages. Interestingly, C4 formed a separate branch, indicating a unique differentiation pathway (Fig. 3A–E). Slingshot trajectory analysis further defined two main differentiation paths, with C4 occupying an intermediate position in the trajectory. Taken together with the CytoTRACE scores, this intermediate location suggests that C4 may represent a pivotal population in driving disease progression (Fig. 3F–I). Marker gene expression along pseudotime demonstrated distinct functional changes across cell populations, providing further insights into the dynamic changes in gene regulation as tumor cells differentiate (Fig. 3J). Genes correlated with pseudotime were identified, which allowed us to investigate the transcriptional dynamics underlying tumor progression (Fig. 3K).

Fig. 3: Trajectory analysis of tumor cell subpopulations.

A Monocle2 trajectory plot showing the progression of tumor cells along a pseudotime axis. B, C Distribution of different tumor cell subpopulations across the pseudotime axis in Monocle2, with each subpopulation colored accordingly. D Monocle2 trajectory plot colored by state. E Proportion of cell subpopulations within each state along the pseudotime trajectory. F CytoTRACE analysis of tumor cell differentiation potential. G–I Slingshot trajectory analysis, showing two distinct differentiation pathways colored by tumor stage, cell type, and state. J Expression of top markers in each subpopulation across pseudotime. K Heatmap of genes correlated with pseudotime.

Malignant cell communication networks highlight key signaling interactions driving tumor progression

Cell-cell communication analysis was performed using CellChat, revealing significant interactions between malignant tumor subpopulations (C1 and C4) and other cell types, particularly endothelial and myeloid cells (Fig. 4A–C). Upon comparing the incoming and outgoing communication patterns, we found that tumor cells exhibited significant activity in several signaling pathways, including MIF, MK, SPP1, and PARs, suggesting their critical role in tumor microenvironment modulation (Fig. 4D). We identified key signaling pathways involved in these interactions, with a focus on the MK pathway, which was highly active in both C1 and C4 (Fig. 4E, F). C1 cells exhibited increased release of MK signaling molecules, influencing endothelial cells and fibroblasts (Fig. 4E). Additionally, the MIF-CD74/CXCR4 signaling axis was strongly active between C1 malignant cells and myeloid cells (Fig. 4G), highlighting a potential immune evasion mechanism. Expression of key ligand genes in these pathways, including MIF and MDK, was significantly elevated in C1 cells (Fig. 4H), further supporting the importance of these interactions in tumor progression. The MDK-NCL communication network was also visualized, suggesting potential therapeutic targets (Fig. 4I).

Fig. 4: Cell-cell communication analysis in HCC.

A–C Cell-cell communication network maps for tumor cells, showing interactions between C1 (malignant) and C4 cells with other cell types. Line thickness represents the number of interactions, and weight indicates the strength of communication. D Bubble plots of incoming and outgoing signaling pathways for each cell type. E, F MK signaling pathway communication heatmap. G Bubble plot showing the communication from C1 malignant cells to other cell types. H Bubble plot showing the expression of key ligand genes in different cell types. I Cell-cell communication network between MDK and NCL receptors.

PGAM2 acts as a pivotal regulator in the transcriptional landscape of HCC

We identified key transcription factors regulating tumor progression by analyzing the CSI matrix and performing clustering. Three major transcriptional modules (M1, M2, and M3) were identified (Fig. 5A–C). By integrating transcription factor activity scores (AUC values), we found that module M1 exhibited high activity across C0, C1, C2, and C3 populations, whereas module M3 showed pronounced activation specifically in C1 and C4 cells (Fig. 5D, E). These findings indicate distinct transcriptional regulatory programs underlying different malignant subpopulations. At the individual transcription factor level, XBP1 activity was predominant in C0, C2, and C3 subpopulations, while PGAM2 was most active in C1 cells (Fig. 5F). Furthermore, XBP1 exhibited the highest activity in early-stage tumors, whereas PGAM2 and ELF3 were markedly activated in advanced-stage tumors, reflecting a stage-dependent shift in transcriptional regulation (Fig. 5G). Across all five malignant cell subpopulations, PGAM2 showed the highest AUC score in C1, ELF3 exhibited strong activity in both C1 and C4, and XBP1 maintained consistently high activity across C0–C3 (Fig. 5H, I). Collectively, these results suggest that PGAM2, ELF3, and XBP1 function as key transcriptional regulators orchestrating stage-specific tumor progression and malignant phenotype transitions.
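The connection specificity index (CSI) used to group regulons into modules is not defined in the text. One common formulation, assumed here and not confirmed as the authors' exact procedure, scores a regulon pair (A, B) by the fraction of regulons whose Pearson correlation with both A and B falls below PCC(A, B) minus a fixed offset of 0.05. The sketch below implements that formulation on toy regulon activity vectors; the regulon names `R1` to `R4` and their values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def csi_matrix(auc, offset=0.05):
    """Connection specificity index for every regulon pair.

    auc maps regulon name -> list of activity (AUC) scores across cells.
    CSI(A, B) is the fraction of regulons X with PCC(A, X) and PCC(B, X)
    both below PCC(A, B) - offset (assumed formulation).
    """
    names = list(auc)
    pcc = {a: {b: pearson(auc[a], auc[b]) for b in names} for a in names}
    csi = {}
    for a in names:
        csi[a] = {}
        for b in names:
            threshold = pcc[a][b] - offset
            weaker = sum(
                1 for x in names
                if pcc[a][x] < threshold and pcc[b][x] < threshold
            )
            csi[a][b] = weaker / len(names)
    return csi


# Toy regulon activity across four cells (illustrative only).
toy_auc = {
    "R1": [1.0, 2.0, 3.0, 4.0],
    "R2": [1.1, 2.0, 3.2, 3.9],
    "R3": [4.0, 3.0, 2.0, 1.0],
    "R4": [1.0, 3.0, 2.0, 4.0],
}
csi = csi_matrix(toy_auc)
print(csi["R1"]["R2"], csi["R1"]["R3"])
```

Hierarchical clustering of the resulting matrix would then yield modules such as M1 to M3; that step is omitted here.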

Fig. 5: Transcription factor activity in tumor cell subpopulations.

A Heatmap of transcription factor activity in tumor cell subpopulations, based on CSI matrix and clustering of regulatory modules (M1, M2, M3). B UMAP plot showing the distribution of five distinct tumor subpopulations. C UMAP plot showing the distribution of early and advanced stage tumor cells. D, E AUC scores for transcription factors in each cluster. F Scatter plot showing RSS for different transcription factors in the five tumor subpopulations. G Scatter plot showing RSS for different transcription factors in early and advanced tumor subpopulations. H, I Visualization of four transcription factors (PGAM2, ELF3, XBP1, JUNB) using AUC bar plots and UMAP plots.

We investigated the relationship between PGAM2 expression and sialylation (Supplementary Fig. 2A, B), a post-translational modification often associated with cancer progression. Using ssGSEA, we calculated sialylation scores and found that C1 cells, which are in the early stage of tumor differentiation, exhibited significantly higher sialylation scores compared to late-stage cells (C2, C3) (Supplementary Fig. 2C). A positive correlation between PGAM2 expression and sialylation was confirmed, suggesting that PGAM2 may regulate sialylation pathways, which are critical for tumor cell metastasis and immune evasion (Supplementary Fig. 2D).

Development of a prognostic model based on PGAM2 and sialylation-related genes in HCC

To further evaluate the clinical significance of PGAM2 and its associated sialylation-related pathways, we constructed a prognostic model integrating genes linked to both PGAM2 expression and sialylation activity. Candidate genes were first identified through univariate Cox regression analysis, yielding 12 prognosis-related genes, among which ST6GALNAC4, SIGLECL1, GALNTL5, B4GALT5, and AGRN were identified as prominent risk factors associated with poor survival outcomes (Fig. 6A). Subsequently, LASSO regression and multivariate Cox analysis refined the model to nine key prognostic genes, with GALNTL5, B4GALT5, and SIGLECL1 emerging as the most statistically significant contributors to patient prognosis (Fig. 6B, C). The derived risk score model demonstrated strong predictive capacity. Kaplan–Meier survival analysis revealed that patients in the high-risk group exhibited significantly shorter overall survival compared to those in the low-risk group (Fig. 6D–F), indicating the clinical relevance of the gene signature. PCA further showed a clear separation between high- and low-risk groups, reflecting distinct transcriptional profiles underlying the two risk categories (Fig. 6G). In addition, time-dependent ROC curve analysis showed that the model achieved robust predictive accuracy, with AUC values exceeding 0.7 for 1-, 3-, and 5-year survival (Fig. 6H). Together, these findings demonstrate that the PGAM2- and sialylation-associated gene signature provides an effective tool for prognostic stratification in HCC, highlighting its potential utility for risk assessment and clinical decision-making (Fig. 6I).
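As an illustration of how a risk score of this kind is typically computed, the sketch below forms a weighted sum of gene expression values and splits patients into high- and low-risk groups. The coefficients and expression values are hypothetical placeholders (the fitted coefficients are those shown in Fig. 6D), and the median cutoff is an assumption, since the text does not state how the groups were defined.

```python
from statistics import median

# Hypothetical coefficients for three of the model genes (illustration only).
coefficients = {"AGRN": 0.21, "B4GALT5": 0.35, "GALNTL5": 0.48}


def risk_score(expr, coefs):
    """Linear predictor: sum over genes of coefficient * expression."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


# Hypothetical log-scale expression values for four patients.
patients = {
    "P1": {"AGRN": 1.0, "B4GALT5": 2.0, "GALNTL5": 0.0},
    "P2": {"AGRN": 3.0, "B4GALT5": 1.0, "GALNTL5": 1.0},
    "P3": {"AGRN": 0.0, "B4GALT5": 0.0, "GALNTL5": 1.0},
    "P4": {"AGRN": 2.0, "B4GALT5": 3.0, "GALNTL5": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}

# Assumed grouping rule: scores above the cohort median are "high" risk.
cutoff = median(scores.values())
groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
print(groups)
```

The resulting group labels are what the survival, PCA, and ROC analyses described above would then compare.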

Fig. 6: Prognostic model based on PGAM2-regulated genes and sialylation.

A Forest plot showing the results of univariate Cox regression analysis for 12 genes associated with survival, selected from the PGAM2-related transcription factor target gene chain (importance > 10) and sialylation-related genes. B LASSO-Cox regression analysis to select prognostic genes, leading to the identification of a risk model based on 9 genes. C Multivariate Cox regression analysis and forest plot showing the hazard ratios for each prognostic gene. D Bar plot displaying the coefficients of the 9 prognostic genes used to calculate risk scores. E, F Risk score vs survival time scatter plot and heatmap showing the expression of prognostic genes across different risk groups. G PCA plot showing dimensionality reduction of prognostic genes, visualizing the separation between high- and low-risk groups. H Time-dependent ROC curves for 1, 3, and 5 years, demonstrating the predictive accuracy of the risk model. I Kaplan–Meier survival curve comparing high-risk and low-risk groups.
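The Kaplan–Meier curves in panel I rest on the product-limit estimator, which can be sketched in a few lines. The example below is a minimal standard-library implementation applied to hypothetical follow-up times; it does not reproduce the study's data, and a dedicated survival analysis package would normally be used in practice.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up durations (any consistent unit).
    events : 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (illustrative only).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per risk group and comparing them, for example with a log-rank test, gives the kind of comparison summarized in the panel.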

High-risk and low-risk subgroups exhibit distinct functional signatures linked to tumor progression

Differential expression analysis between the high-risk and low-risk groups revealed a distinct transcriptional landscape that underpins their divergent clinical outcomes. Several genes, including CA9, PAEP, EPO, and SLC10A1, were markedly upregulated in the high-risk subgroup (Fig. 7A, B), suggesting their potential involvement in promoting tumor aggressiveness and metabolic reprogramming. To further elucidate the biological processes associated with these differential expression patterns, KEGG pathway enrichment analysis demonstrated a significant enrichment of the IL-17 signaling pathway (Fig. 7C), a well-established mediator of tumor-associated inflammation and immune modulation. In parallel, GO enrichment analysis revealed that high-risk group–associated genes were predominantly linked to cell cycle–related processes, including mitotic nuclear division and chromatid segregation (Fig. 7D), indicating that enhanced proliferative activity is a hallmark of tumors with poor prognosis. Additionally, GSEA analysis identified key pathways related to cell chemotaxis and sphingolipid metabolism that were upregulated in high-risk tumors, providing further insights into the molecular mechanisms driving tumor progression (Fig. 7E, F).

Fig. 7: Functional analysis of high- and low-risk groups.

A Heatmap of the top 30 differentially expressed genes between high-risk and low-risk groups, sorted by logFC. B Volcano plot showing the differential gene expression analysis of high- and low-risk groups. C, D KEGG and GO enrichment analysis of upregulated genes in the high-risk group. E Heatmap of pathway scores from MSigDB. F GSEA results for five key biological pathways.

Immune microenvironment, mutation patterns, and drug sensitivity further define high-risk tumor characteristics

Immune infiltration analysis revealed a higher proportion of Tregs in the high-risk group, along with decreased infiltration of anti-tumor immune cells (Fig. 8A, B). Correlation analysis between immune cell populations and risk scores identified significant associations with macrophages and Tregs (Fig. 8C, D). Additionally, the TIDE algorithm predicted poorer responses to immunotherapy in the high-risk group (Fig. 8E). Further comprehensive assessment of immune infiltration levels in HCC using the ESTIMATE, CIBERSORT, and xCell algorithms consistently demonstrated an immunosuppressive microenvironment in the high-risk group (Fig. 8F). Mutation frequency analysis and drug sensitivity predictions further emphasized the clinical relevance of the prognostic model, with high-risk patients exhibiting poorer responses to common therapies (Fig. 8G–I).

Fig. 8: Immune infiltration, mutation landscape, and drug sensitivity.

A, B Bar and box plots showing immune cell composition predicted by the CIBERSORT algorithm in high-risk and low-risk groups. C Correlation between immune cell infiltration and risk scores. D Heatmap showing the correlation between immune cell infiltration and prognostic genes. E TIDE analysis showing significantly higher scores in the high-risk group. F Heatmap of immune infiltration levels in HCC based on ESTIMATE, CIBERSORT, and xCell algorithms. G Mutation waterfall plot comparing the top 20 mutations in high- and low-risk groups. H Correlation analysis between immune checkpoint genes and prognostic genes. I Box plots comparing drug sensitivity between high-risk and low-risk groups for 12 chemotherapy agents.

Functional validation of AGRN and its role in liver cancer progression

The role of AGRN as a prognostic marker in HCC was further explored through experimental validation. Comparison of mRNA expression levels between HCC tumor and adjacent normal tissues revealed significantly higher AGRN expression in the tumors (Fig. 9A). Among several HCC cell lines, Hep 3B and HuH-6 exhibited the highest AGRN levels and were selected for subsequent experiments (Fig. 9B). To investigate the functional role of AGRN, siRNA targeting AGRN was transfected into Hep 3B and HuH-6 cells, resulting in a substantial reduction in AGRN expression (Fig. 9C). Cell proliferation, measured by the CCK-8 assay, showed a significant decrease in growth rates in AGRN-depleted cells, suggesting that AGRN promotes HCC cell proliferation (Fig. 9D, E). Flow cytometry-based apoptosis assays revealed a marked increase in apoptosis in AGRN-knockdown Hep 3B cells compared to controls (Fig. 9F, G). Further functional assays using transwell chambers demonstrated that AGRN knockdown significantly reduced cell migration and invasion, supporting the notion that AGRN enhances the aggressive behavior of HCC cells (Fig. 9H, I).
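Relative mRNA levels from qRT-PCR, such as the knockdown efficiency reported above, are commonly quantified with the 2^-ΔΔCt method. The text does not state which quantification method or reference gene was used, so the sketch below is an assumption-based illustration with hypothetical Ct values and an unspecified reference gene.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumed method).

    Each dCt is the target Ct minus the reference-gene Ct in the same sample;
    ddCt is the treated dCt minus the control dCt.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: siRNA-treated cells vs. negative-control cells.
fold = relative_expression(
    ct_target_sample=26.0,   # AGRN, siRNA-treated
    ct_ref_sample=18.0,      # reference gene, siRNA-treated
    ct_target_control=24.0,  # AGRN, control
    ct_ref_control=18.0,     # reference gene, control
)
print(fold)
```

With these illustrative inputs the target transcript is two cycles later relative to the reference gene in treated cells, which corresponds to a fourfold reduction.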

Fig. 9: Expression and functional analysis of AGRN in HCC.

A Relative AGRN mRNA expression in HCC tumors and adjacent normal tissues. B AGRN mRNA levels in LO2 and various HCC cell lines. C qRT-PCR validation of AGRN knockdown efficiency in Hep 3B and HuH-6 cells post-siRNA transfection. D, E CCK-8 assays evaluating cell proliferation over 4 days in control and AGRN knockdown groups (n = 5 per group). F, G Flow cytometric analysis of apoptosis in AGRN-silenced Hep 3B cells. H, I Transwell migration and invasion assays in Hep 3B cells with AGRN knockdown.

Discussion

Our single-cell transcriptomic analysis of HCC delineates five distinct malignant cell subpopulations with stage-specific dominance. The advanced-stage associated C1 and C4 subpopulations engage in specific pro-tumorigenic crosstalk, such as MDK and MIF signaling, within the tumor microenvironment. Building on this, we established a potent prognostic model centered on PGAM2 and sialylation-related genes, which effectively stratifies patients and reflects an immunosuppressive phenotype in high-risk cases. Furthermore, functional validation confirmed AGRN, a key model component, as a bona fide oncoprotein driving HCC proliferation and invasion.

Our findings reinforce the concept of cellular heterogeneity in HCC, which is central to understanding its aggressive nature and therapeutic resistance. Previous studies have suggested that hepatocellular carcinoma, like many cancers, is driven by a subpopulation of tumor-initiating cells (TICs) with stem-like properties and immune evasion capabilities21,22. Our trajectory analysis revealed distinct differentiation pathways for tumor cells, suggesting that different subpopulations, particularly C1 and C4, may drive early and late-stage progression, respectively. This is consistent with the concept of early dissemination and high neoantigen intratumor heterogeneity observed in metastases, which complicates immune recognition and treatment efficacy23.

Our results underscore the cellular diversity within HCC, where distinct tumor subpopulations contribute to various stages of tumor progression. The C1 subpopulation, predominantly found in early-stage tumors, exhibited elevated expression of S100A11 and increased secretion of MDK—findings consistent with previous literature linking MDK to tumor progression and metastasis24,25. Specifically, circMDK, derived from the MDK gene, has been identified as an oncogenic circRNA that activates the PI3K/AKT/mTOR pathway, promoting tumor cell proliferation, migration, and invasion26. This observation aligns with our finding that C1 cells secrete higher levels of MDK, suggesting a potential link between PGAM2-driven metabolic changes and MDK signaling in HCC progression.

Additionally, the elevated expression of S100A11 in C1 tumor cells is consistent with prior studies highlighting its role in inflammation and tumor progression27. S100A11 is often deregulated early in hepatocellular carcinoma, particularly in the context of steatosis, and is involved in multiple stages of cancer development28. Our findings suggest that S100A11, in conjunction with PGAM2 and MDK, may contribute to the aggressive nature of early-stage tumors, making it a potential therapeutic target for early intervention.

Beyond the early-stage PGAM2-driven dynamics, our study highlights the C4 subpopulation (S100A9⁺ malignant) as a metabolic engine propelling advanced HCC. Characterized by the highest CNV burden and near-exclusive enrichment in late-stage tumors, C4 cells undergo profound metabolic rewiring, transitioning from glycolysis to a predominant reliance on oxidative phosphorylation (OXPHOS). Importantly, this OXPHOS surge is not merely a metabolic adaptation but is intimately coupled to elevated reactive oxygen species (ROS) production, positioning ROS as a central signaling mediator and driver of malignancy. This OXPHOS–ROS axis establishes a self-reinforcing loop that fuels tumor aggressiveness. The enhanced OXPHOS satisfies the substantial energy and biosynthetic demands of rapidly proliferating and invasive cells29, while the concomitant ROS burst functions as a potent intracellular signal. Elevated ROS can stabilize hypoxia-inducible factors (HIFs) even under normoxic conditions, reinforcing a pro-tumorigenic transcriptional program30,31. Concurrently, ROS promotes degradation of key tumor suppressors such as E-cadherin—via mechanisms including RNF25-mediated ubiquitination—thereby facilitating EMT and metastatic potential32.

The marked enrichment of S100A9 in C4 provides a plausible upstream mechanism for this vicious cycle. S100A9, often induced under TACE-associated hypoxia, can scaffold PGAM5, enhancing its stability and promoting mitochondrial fission33. The resulting fragmented mitochondria are predisposed to ROS generation, locking C4 cells in a state of sustained OXPHOS-driven oxidative stress. Moreover, this ROS-enriched environment functions as a selective pressure favoring cancer stem-like properties34, activating stress-response pathways such as Nrf2 and triggering mitophagy35, processes that are closely linked to stemness maintenance and therapy resistance in HCC, as observed in ADAR1-enriched LCSCs36. Positioned at a critical branch point in the differentiation trajectory, C4 cells leverage their OXPHOS–ROS core not only to meet energetic demands but also to orchestrate a pro-metastatic signaling network, sustain stem-like plasticity, and resist therapeutic pressures. Together, these features underscore C4 as a pivotal driver of late-stage HCC malignancy.

Sialylation modifications have long been implicated in cancer progression, particularly in immune evasion and metastasis. Our study found a significant correlation between PGAM2 expression and increased sialylation in early-stage tumor cells, supporting the idea that sialylation is critical for metastasis and immune evasion. The HBV X protein has been reported to regulate sialyl Lewis A synthesis, suggesting that viral proteins modulate glycosylation events to enhance metastatic potential37. Interestingly, our study also revealed that higher sialylation scores were associated with early-stage C1 cells, which further supports the hypothesis that sialylation is not only a feature of late-stage disease but also a key factor in early-stage tumor progression. These findings contribute to the growing body of evidence linking sialylation to immune escape in liver cancer.

Beyond sialylation, we observed significant enrichment in sphingolipid metabolism pathways in high-risk tumor subgroups. This is in line with literature reports linking sphingolipid metabolism in tumor-associated macrophages to resistance to immunotherapy in hepatocellular carcinoma38. Our study’s finding that high-risk tumors exhibit altered sphingolipid metabolism suggests that metabolic reprogramming in the TME plays a crucial role in conferring resistance to treatment, particularly immunotherapy. This highlights the potential for targeting metabolic pathways such as sphingolipid metabolism to improve therapeutic outcomes in HCC.

PGAM2, identified as a key regulator of early-stage HCC, exhibited the highest expression in C1 tumor cells, which are associated with early tumor stages. This is consistent with existing literature, which implicates PGAM2 in promoting glycolysis—a hallmark of cancer metabolism39. Our results further support the notion that PGAM2 plays a pivotal role in metabolic reprogramming during tumor progression, particularly in early-stage HCC, where high glycolytic activity is essential for tumor growth and survival40. Although our study did not directly investigate the regulatory mechanisms of PGAM2, existing literature suggests that PGAM2 activity is regulated by post-translational modifications, such as acetylation at lysine 10041. The acetylation status of PGAM2 affects its enzymatic activity, and enzymes like SIRT2 can regulate this process39. Oxidative stress has been shown to promote PGAM2 activation, thereby enhancing glycolysis in tumor cells. This suggests that PGAM2 may function as a metabolic switch, responding to the tumor microenvironment to support cancer cell proliferation. Furthermore, PGAM2’s involvement in glycolysis, nucleotide biosynthesis, and redox homeostasis highlights its broad role in cancer metabolism.

Our functional validation supports AGRN as an oncoprotein in HCC that drives proliferation and invasion and suppresses apoptosis. Although studies on AGRN in liver cancer remain scarce, the convergence of evidence from other malignancies underscores its potential as a central orchestrator of HCC malignancy, extending beyond a prognostic marker. Mechanistically, AGRN appears to operate through two interconnected axes: tumor-intrinsic signaling and tumor-immune modulation. Intrinsically, the pronounced pro-invasive phenotype associated with AGRN expression aligns with its capacity to activate the NOTCH1 pathway, a well-established driver of EMT and invasion in HCC42. This positions the AGRN–NOTCH1 axis as a plausible mechanistic core underlying the aggressive traits observed in our high-risk subgroup. Extrinsically, the immunosuppressive microenvironment characterized by Treg enrichment and predicted poor response to immunotherapy in high-risk patients may be partially explained by AGRN-mediated immune modulation. Evidence from breast cancer indicates that AGRN can scaffold interactions stabilizing PD-L1, and its knockdown enhances CD8⁺ T cell infiltration43. By analogy, elevated AGRN in our high-risk HCC cohort may actively sculpt an immune-cold microenvironment, contributing to therapy resistance. Taken together, AGRN emerges not as a passive component but as a nexus protein, potentially coordinating a dual assault on tumor progression: promoting aggressive tumor cell behavior via pathways such as NOTCH1, while simultaneously suppressing anti-tumor immunity. This integrated model, grounded in our functional data and informed by cross-cancer insights, elevates AGRN from a gene signature component to a candidate therapeutic target, offering a potential strategy to simultaneously curb tumor growth and restore immune surveillance in HCC.

While this study provides valuable insights into the cellular heterogeneity of HCC, there are several limitations. Although the sample size of 32,247 cells is substantial, it may not fully capture the diversity of rare tumor or immune cell subpopulations. Future studies with larger cohorts could provide a more comprehensive view of the tumor microenvironment. Additionally, while PGAM2 was identified as a key regulator, functional validation using techniques such as CRISPR-Cas9 is necessary to confirm its role in tumor progression and therapy response. Lastly, while our prognostic model showed strong predictive power, it requires validation in independent cohorts to ensure its clinical applicability.

In conclusion, our work delineates an association between metabolic reprogramming, immune evasion, and clinical prognosis in HCC. Our data suggest that PGAM2 may serve as a key node, connecting glycolytic flux to sialylation, which is correlated with an immunosuppressive microenvironment and aggressive tumor behavior. The prognostic model we developed, rooted in these observations, effectively stratified patients in our cohort and offers a candidate tool for outcome prediction. Future work focused on validating these targets in preclinical models is essential to determine the translational potential of these insights for HCC patients.

Materials and methods

Acquisition and processing of transcriptomic data

RNA expression data for HCC and corresponding clinical information were obtained from the Xena database (https://xenabrowser.net/) for a cohort of 368 patients. The transcriptomic data were pre-processed and converted to transcripts per million (TPM) format. Log2 transformation was applied for downstream analyses. A validation cohort was used to assess the stability and accuracy of the predictive model. Gene expression data were normalized and subsequently analyzed for model development and validation.
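The TPM conversion and log2 transformation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes raw counts and gene lengths are available, the gene names and values are placeholders, and the pseudocount of 1 is an assumption, since the text does not state one.

```python
import math

def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts: dict gene -> raw read count
    lengths_kb: dict gene -> gene length in kilobases
    """
    # Step 1: length-normalize each gene (reads per kilobase).
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: scale so that all values in the sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}

def log2_transform(tpm, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount (assumed here) avoids log2(0)."""
    return {gene: math.log2(value + pseudocount) for gene, value in tpm.items()}

# Placeholder data for illustration only.
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths_kb = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 2.0}

tpm = counts_to_tpm(counts, lengths_kb)
log_tpm = log2_transform(tpm)
```

Because TPM values sum to one million within each sample, they are comparable across samples in terms of relative transcript abundance, which is why this unit is commonly used before log transformation.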

Acquisition and processing of single-cell RNA sequencing data

Single-cell RNA sequencing (scRNA-seq) data were downloaded from the GEO database (GSE149614), containing 10 HCC tumor samples (4 early-stage and 6 advanced-stage). The original data were generated by aligning sequencing reads to the human reference genome GRCh38 (hg38) using the Cell Ranger pipeline. Data analysis was performed using R software (version 4.1.3) and the Seurat package (version 4.0.0). Initial preprocessing consisted of cell-level quality control based on mitochondrial gene content (less than 20%), blood cell contamination (less than 3%), total UMI count (200–40,000), and gene count (200–7000). Normalization was performed using the NormalizeData function, and highly variable genes were selected using FindVariableFeatures (top 2000 genes). Cell cycle effects were regressed out with the parameter vars.to.regress = c("S.Score", "G2M.Score"). Batch effect correction was performed using the Harmony algorithm. Dimensionality reduction and clustering were conducted using UMAP (RunUMAP) and Louvain clustering (FindClusters), respectively. Differential expression analysis between clusters was performed using FindAllMarkers, with thresholds of p-value < 0.05, log2 fold change >0.25, and expression detected in >10% of cells.
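The cell-level quality control thresholds listed above can be expressed as a simple filter. The sketch below is written in Python for illustration only (the study itself used Seurat in R); the `CellQC` record, its field names, and the treatment of range bounds as inclusive are assumptions.

```python
from dataclasses import dataclass

@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record; field names are illustrative)."""
    barcode: str
    n_umi: int        # total UMI count
    n_genes: int      # number of detected genes
    pct_mito: float   # percent of UMIs from mitochondrial genes
    pct_blood: float  # percent of UMIs from blood-cell (e.g., hemoglobin) genes

def passes_qc(cell: CellQC) -> bool:
    """Apply the thresholds stated in the text; inclusive bounds are assumed."""
    return (
        cell.pct_mito < 20.0
        and cell.pct_blood < 3.0
        and 200 <= cell.n_umi <= 40_000
        and 200 <= cell.n_genes <= 7_000
    )

# Placeholder cells for illustration only.
cells = [
    CellQC("cell_1", n_umi=5_000, n_genes=1_800, pct_mito=4.2, pct_blood=0.1),
    CellQC("cell_2", n_umi=150, n_genes=90, pct_mito=2.0, pct_blood=0.0),       # too few UMIs
    CellQC("cell_3", n_umi=12_000, n_genes=3_500, pct_mito=35.0, pct_blood=0.0),  # high mito
    CellQC("cell_4", n_umi=9_000, n_genes=2_700, pct_mito=6.0, pct_blood=8.5),    # blood signal
]

kept = [cell.barcode for cell in cells if passes_qc(cell)]
```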

Acquisition of sialylation-related genes

Sialylation-related genes were curated from the Molecular Signatures Database (MSigDB), as previously described44. The gene set encompasses key enzymes involved in sialylation metabolism, including sialyltransferases, transporters, and neuraminidases. This comprehensive set was used for downstream analyses, including the calculation of the sialylation signature score.

Cell annotation and classification

Cell type annotations were performed using established markers for various cell types, as follows: Hepatocytes: EPCAM, KRT18, KRT19, ALB; Fibroblasts: DCN, THY1, COL1A1, COL1A2; Endothelial cells: PECAM1, CLDN5, FLT1, RAMP2; T cells: CD3D, CD3E, CD3G, TRAC; NK cells: NKG7, GNLY, NCAM1, KLRD1; B cells: CD79A, IGHM, IGHG3, IGHA2; Plasma cells: JCHAIN; Myeloid cells: LYZ, MARCO, CD68, FCGR3A; Mast cells: KIT, MS4A2, GATA2. These markers were used for cell classification, and visualizations were created to ensure accurate cell type annotations.
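To illustrate how the marker panel above can guide annotation, the following sketch assigns a cluster to the cell type whose markers show the highest mean expression. The scoring rule is an assumption made for illustration; annotation in this study relied on inspection of marker expression, and the example expression values are placeholders.

```python
# Marker panel as listed in the text.
MARKERS = {
    "Hepatocytes": ["EPCAM", "KRT18", "KRT19", "ALB"],
    "Fibroblasts": ["DCN", "THY1", "COL1A1", "COL1A2"],
    "Endothelial cells": ["PECAM1", "CLDN5", "FLT1", "RAMP2"],
    "T cells": ["CD3D", "CD3E", "CD3G", "TRAC"],
    "NK cells": ["NKG7", "GNLY", "NCAM1", "KLRD1"],
    "B cells": ["CD79A", "IGHM", "IGHG3", "IGHA2"],
    "Plasma cells": ["JCHAIN"],
    "Myeloid cells": ["LYZ", "MARCO", "CD68", "FCGR3A"],
    "Mast cells": ["KIT", "MS4A2", "GATA2"],
}

def marker_scores(cluster_expr):
    """Mean expression of each cell type's markers in one cluster.

    cluster_expr: dict gene -> average normalized expression in the cluster.
    Genes missing from the dict are treated as not expressed (0.0).
    """
    return {
        cell_type: sum(cluster_expr.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in MARKERS.items()
    }

def annotate_cluster(cluster_expr):
    """Return the cell type with the highest mean marker expression (assumed rule)."""
    scores = marker_scores(cluster_expr)
    return max(scores, key=scores.get)

# Placeholder cluster averages for illustration only.
cluster_7 = {"CD3D": 2.5, "CD3E": 2.1, "CD3G": 1.8, "TRAC": 2.9, "NKG7": 0.6, "ALB": 0.1}
label = annotate_cluster(cluster_7)
```

Panels of different sizes (for example, the single plasma cell marker JCHAIN) are not directly comparable under a mean-based score, so a rule like this is best treated as a first pass that is then checked visually.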

CNV analysis of single-cell data

The InferCNV package was used to analyze copy number variation (CNV) across the tumor cell subpopulations45. Fibroblasts and endothelial cells were selected as reference cell populations to help detect CNVs in tumor cells. The CNV scores were calculated using default settings in InferCNV and were visualized on a heatmap to identify malignant subpopulations exhibiting copy number alterations. A threshold of 0.3 was set for CNV significance to call a region as a gain or loss.

Pseudotime analysis of tumor cell differentiation

Pseudotime analysis was performed using the Monocle2 package (version 2.18.0)46. The DDRTree algorithm was employed for dimensionality reduction, and the default parameters were used for trajectory analysis. This method allowed us to map the differentiation trajectories of tumor cells, providing insights into tumor progression from early to late stages.

Transcription factor analysis in HCC tumor cells

The SCENIC package (version 1.1.2) was used to identify key transcription factors regulating gene expression in tumor cell subpopulations47. The analysis used GRNBoost for co-expression network inference and the RcisTarget motif databases for motif enrichment. Transcription factor activity scores for each cell type were computed using the AUCell function in SCENIC. These scores allowed us to identify differentially active transcription factors across tumor cell subpopulations.

Cell-cell communication analysis

The CellChat package (version 1.1.2) was used to investigate potential cell-cell communication within the tumor microenvironment48. A normalized gene expression matrix was imported into CellChat, and communication interactions were analyzed using the functions identifyOverExpressedGenes, identifyOverExpressedInteractions, and projectData. Cell-cell communication networks were constructed using the aggregateNet function, and interactions between tumor cells and immune or stromal cells were identified.

Calculation of sialylation signature scores

Sialylation-related genes were used to calculate a sialylation signature score for each cell using the ssGSEA method implemented in the GSVA package (version 1.34.0)49. The sialylation signature score was calculated for each tumor cell, and its correlation with tumor progression and PGAM2 expression was assessed.

Immune infiltration analysis

Immune infiltration levels in high-risk and low-risk tumor groups were evaluated using three different algorithms: ESTIMATE, CIBERSORT, and xCell50. The IOBR package (version 1.0.0) was used for immune cell deconvolution, and the immune infiltration profiles of each group were visualized using heatmaps and bar plots51. The results were used to explore the relationship between immune cell composition and tumor progression.

Differential gene expression and pathway enrichment analysis

Differential gene expression analysis between high-risk and low-risk groups was performed using the limma package (version 3.46.0)52. The threshold for differential expression was set to |log2FoldChange| > 1.5 and adjusted p-value < 0.05. Pathway enrichment analysis was performed using the clusterProfiler package (version 4.0) for KEGG and GO databases53. Enriched pathways were visualized using ggplot2 (version 3.3.5).
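The differential expression thresholds above can be illustrated with a short sketch. The text does not name the multiple-testing correction; the Benjamini-Hochberg procedure shown here is limma's default and is assumed for illustration. The function names and input values are placeholders.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for offset, idx in enumerate(reversed(order)):
        rank = m - offset  # 1-based rank of this p-value
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def classify_gene(log2fc, adj_p, fc_cutoff=1.5, p_cutoff=0.05):
    """Label a gene using |log2FC| > 1.5 and adjusted p < 0.05, as in the text."""
    if adj_p < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"

# Placeholder statistics for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.005]
log2fc = [2.1, -1.8, 0.4, -2.6]
adj_p = bh_adjust(raw_p)
labels = [classify_gene(fc, p) for fc, p in zip(log2fc, adj_p)]
```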

Genomic variation landscape analysis between risk groups

The genomic variation landscape was compared between high-risk and low-risk groups using the “maftools” package (version 2.8.0) in R54. This package was used to examine mutational burden and create mutation waterfall plots. The mutational profiles of the two risk groups were compared to identify significant differences in mutation frequencies.

Development of a prognostic model based on PGAM2 and sialylation-related genes

A prognostic model was developed using a set of 116 genes, including PGAM2 and sialylation-related genes, identified through univariate Cox analysis55. The model was constructed using LASSO (least absolute shrinkage and selection operator) regression combined with Cox regression (glmnet package, version 4.1). The accuracy of the prognostic model was evaluated using the timeROC package, with AUC values calculated for 1-year, 3-year, and 5-year survival predictions.
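Once a LASSO-Cox model has been fitted, each patient's risk score is typically the linear predictor, that is, the sum of each retained gene's expression weighted by its coefficient. The sketch below shows that calculation and a split into risk groups. The gene names, coefficients, and expression values are hypothetical placeholders, and the median cutoff is an assumption used only to keep the example self-contained.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Linear predictor: sum over genes of coefficient * expression.

    expression: dict gene -> expression value for one patient
    coefficients: dict gene -> model coefficient (non-zero genes only)
    """
    return sum(coef * expression.get(gene, 0.0) for gene, coef in coefficients.items())

def stratify(scores, cutoff):
    """Assign 'high' risk to scores strictly above the cutoff, 'low' otherwise."""
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}

# Hypothetical coefficients and expression values for illustration only.
coefficients = {"GENE_1": 0.5, "GENE_2": -0.2}
patients = {
    "P1": {"GENE_1": 2.0, "GENE_2": 1.0},
    "P2": {"GENE_1": 1.0, "GENE_2": 3.0},
    "P3": {"GENE_1": 3.0, "GENE_2": 0.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
cutoff = median(scores.values())  # assumed cutoff for this sketch
groups = stratify(scores, cutoff)
```

Any cutoff can be passed to `stratify`, so a data-driven threshold can replace the median without changing the rest of the calculation.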

Clinical sample collection and preparation

Tumor and adjacent normal tissues were collected from 15 liver metastasis patients undergoing surgery at Chongqing General Hospital between May 2019 and April 2024. Normal tissues were obtained at least 3 cm from the tumor margins. Immediately after excision, the samples were snap-frozen in liquid nitrogen and stored at −80 °C to preserve RNA integrity for subsequent analysis. Ethical approval was obtained from the Institutional Review Board (IRB) of Chongqing General Hospital, and informed consent was acquired from all participants.

RNA extraction and quantitative real-time PCR (qRT-PCR)

Total RNA was extracted from tissue and cell samples using TRIzol reagent (Invitrogen, USA) according to the manufacturer’s protocol. RNA quality was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA), with only samples having an A260/A280 ratio between 1.8 and 2.0 included. cDNA synthesis was carried out using 1 µg of RNA per sample with the PrimeScript RT Kit (Takara, Japan). AGRN expression was quantified by qRT-PCR using SYBR Premix Ex Taq (Takara, Japan) on the QuantStudio 5 PCR system (Applied Biosystems, USA). The following cycling conditions were used: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. Relative gene expression was calculated using the 2^(−ΔΔCt) method, normalized to GAPDH. All experiments were performed in triplicate.
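The relative quantification step can be illustrated with a short worked example. The sketch below computes 2^(−ΔΔCt) for a target gene normalized to a reference gene (GAPDH in this study), after averaging technical replicates. The Ct values and function names are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean

def delta_ct(ct_target, ct_reference):
    """ΔCt: mean target Ct minus mean reference-gene Ct across replicates."""
    return mean(ct_target) - mean(ct_reference)

def relative_expression(sample_target, sample_ref, control_target, control_ref):
    """Fold change of the sample relative to the control: 2^(−ΔΔCt)."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2 ** (-ddct)

# Hypothetical triplicate Ct values for illustration only.
tumor_agrn = [24.0, 24.0, 24.0]
tumor_gapdh = [18.0, 18.0, 18.0]
normal_agrn = [26.0, 26.0, 26.0]
normal_gapdh = [18.0, 18.0, 18.0]

# ΔCt(tumor) = 6, ΔCt(normal) = 8, so ΔΔCt = -2 and the fold change is 2^2.
fold_change = relative_expression(tumor_agrn, tumor_gapdh, normal_agrn, normal_gapdh)
```

A lower Ct means more template, so a negative ΔΔCt corresponds to higher expression in the sample than in the control.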

Cultivation and validation of cell lines

Cell lines used in this study were authenticated via short tandem repeat (STR) profiling and routinely tested for mycoplasma contamination. Cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM; Gibco, USA) supplemented with 10% fetal bovine serum (FBS; Gibco, USA) and 1% penicillin-streptomycin (Gibco, USA) at 37 °C in a 5% CO₂ incubator. Upon reaching ~80% confluence, RNA was extracted, and AGRN expression was evaluated by qRT-PCR as described for tissue samples. All experiments were conducted in triplicate.

Procedure for siRNA transfection

Small interfering RNAs (siRNAs) targeting AGRN and control siRNAs were synthesized by TsingKe Biotechnology and dissolved in nuclease-free water at a concentration of 10 µM. HCC cell lines (Hep 3B and HuH-6) were seeded in 6-well plates (2 × 10⁵ cells/well) and incubated overnight. For transfection, siRNAs (50 nM) were mixed with 5 µL of Lipofectamine 3000 (Invitrogen, USA) in Opti-MEM (Gibco, USA) and incubated for 15 min at room temperature before adding to the cells. After 6 h, the medium was replaced with fresh DMEM containing 10% FBS. Cells were harvested 48 h post transfection for RNA extraction and AGRN expression analysis by qRT-PCR. AGRN expression was reduced by over 70% compared to controls (p < 0.01, Student’s t-test). Each experiment was performed in triplicate.

Cell proliferation evaluation using CCK-8 assay

Cell proliferation was assessed using the Cell Counting Kit-8 (CCK-8; Dojindo, Japan). Cells were seeded in 96-well plates (3000 cells/well) and transfected with siRNAs as described earlier. Proliferation was measured at 24, 48, 72, and 96 h post transfection by adding 10 µL of CCK-8 reagent to each well and incubating for 2 h at 37 °C. Absorbance at 450 nm was measured using a Synergy H1 microplate reader (BioTek, USA). Data were presented as fold-change in absorbance relative to baseline (0-h) measurements, with five replicates per condition.
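The fold-change normalization of CCK-8 readings can be sketched as follows: replicate absorbances at each time point are averaged and divided by the mean baseline (0-h) absorbance. The absorbance values below are placeholders, and blank subtraction is omitted because the text does not describe it.

```python
from statistics import mean

def fold_change_curve(od_by_hour, baseline_hour=0):
    """Mean absorbance at each time point relative to the baseline mean.

    od_by_hour: dict hour -> list of replicate OD450 readings
    """
    baseline = mean(od_by_hour[baseline_hour])
    return {hour: mean(readings) / baseline for hour, readings in sorted(od_by_hour.items())}

# Placeholder OD450 readings (five replicates per time point), illustration only.
control_od = {
    0: [0.5, 0.5, 0.5, 0.5, 0.5],
    24: [1.0, 1.0, 1.0, 1.0, 1.0],
    48: [1.5, 1.5, 1.5, 1.5, 1.5],
}

curve = fold_change_curve(control_od)
```

Normalizing each condition to its own baseline corrects for small differences in seeding density between wells, so curves for knockdown and control cells start from the same value.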

Measurement of cell apoptosis using flow cytometry

Apoptosis was analyzed 48 h post transfection using the Annexin V-FITC/PI apoptosis detection kit (BD Biosciences, USA). Cells were harvested, washed twice with PBS, and stained according to the manufacturer’s protocol. Flow cytometric analysis was conducted on a BD FACSVerse instrument to quantify early apoptotic (Annexin V+/PI−) and late apoptotic (Annexin V+/PI+) cells. Data were analyzed using FlowJo software (version 10.0).

Evaluation of cell migration and invasion

Cell migration was assessed using Transwell chambers (Corning, USA) with 8 μm pore size membranes. Cells (5 × 10⁴) in serum-free medium were seeded into the upper chamber, while the lower chamber contained medium with 10% FBS as a chemoattractant. After 24 h of incubation, non-migrated cells on the upper surface were removed, and migrated cells on the lower surface were fixed with 4% paraformaldehyde and stained with 0.1% crystal violet. Cells were counted from five random visual fields per chamber. For the invasion assay, Transwell membranes were pre-coated with Matrigel (BD Biosciences, USA) before cell seeding. Migration and invasion assays were repeated in triplicate.

Statistical analysis

All statistical analyses and visualizations were performed using R (version 4.1.3). Pearson’s correlation coefficient was used to assess correlations between continuous variables. Chi-square tests were applied to compare categorical variables, and Wilcoxon rank-sum tests were used for continuous variables. The survminer package was used to determine the optimal cutoff value for survival analysis. Cox regression and Kaplan–Meier survival analysis were performed using the survival package (version 3.2-13).
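For readers less familiar with the Kaplan–Meier estimator, the sketch below shows the underlying product-limit calculation in plain Python. It is illustrative only: the analyses in this study were run with the survival package in R, and the follow-up times and event indicators here are placeholders.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time for each patient
    events: 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve

# Placeholder follow-up data (months) for illustration only.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
```

Censored patients contribute to the risk set up to their last follow-up but do not cause a drop in the curve, which is what distinguishes this estimate from a simple proportion of survivors.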