Introduction

Intrahepatic cholangiocarcinoma (ICCA), the second most prevalent primary liver cancer after hepatocellular carcinoma (HCC), has shown rising global incidence and mortality in recent decades1,2. Because prominent symptoms are lacking in the early stages, most ICCA cases are diagnosed at an advanced stage, which often correlates with a poor prognosis. Even after radical surgery, the 5-year survival rate remains below 30%, with recurrence rates ranging from 60 to 70%3. The pathogenesis of ICCA is multifaceted, involving various genetic and epigenetic alterations as well as complex interactions within the tumor microenvironment (TME)4. For example, ICCA arising in the context of primary sclerosing cholangitis (PSC) is frequently associated with mutations in the KRAS and TP53 genes5. Chronic cholangitis is also a critical pathogenic factor in ICCA, as fibrosis and inflammatory responses around the bile ducts lead to biliary stricture and liver cirrhosis, significantly increasing the risk of ICCA6,7. Despite therapeutic options such as surgical resection, radiation, chemotherapy, and targeted therapies, their efficacy remains limited, especially in patients with advanced-stage ICCA8. These limitations underscore the pressing need to uncover new prognostic indicators and innovative treatment modalities. Previous studies have identified multiple signaling pathways involved in ICCA pathophysiology, but its pathogenesis remains incompletely understood9,10. Identifying prognostic biomarkers and key regulatory genes is essential for a deeper understanding of tumor progression and for the development of personalized treatment approaches11,12.

Lactate, once considered merely a metabolic by-product, is now recognized as a signaling molecule that links glycolysis with epigenetic regulation. Its accumulation in tumors, driven by enhanced glycolysis, contributes to tumor progression through histone lactylation, a novel post-translational modification that regulates gene expression13,14. In ICCA, lactate-induced histone lactylation has been implicated in promoting cell proliferation, immune evasion, and stromal remodeling15. Although the precise mechanisms of lactylation are not yet fully understood, growing evidence points to its role in metabolic–epigenetic crosstalk in tumors16.

The TME, comprising malignant cells, immune components, fibroblasts, and other stromal elements, plays a pivotal role in tumor progression17. Cancer-associated fibroblasts (CAFs), as key components of the tumor stroma, have a profound impact on neoplastic proliferation, metastatic behavior, and treatment resistance18. CAFs are heterogeneous, with distinct subpopulations exhibiting either pro- or anti-tumor effects19,20. Notably, antigen-presenting CAFs (apCAFs) can activate CD4⁺ T cells and influence immune responses21,22. Therefore, a comprehensive understanding of the diverse roles of fibroblasts in ICCA is crucial for developing innovative therapeutic approaches to combat this highly aggressive cancer. Recent pan-cancer single-cell transcriptomic analyses of basal cell carcinoma, melanoma, and head and neck squamous cell carcinoma have demonstrated that CAFs can be subdivided into multiple clusters through unsupervised clustering, revealing tumor type–specific subpopulations (such as the C0 subtype unique to basal cell carcinoma). This subtype exhibits invasive and destructive phenotypes that are strongly associated with cancer progression, further supporting the functional specificity of CAF subsets and their critical regulatory roles in tumor development23.

This study integrates single-cell sequencing data with TCGA transcriptomic profiles to investigate lactylation-related regulatory mechanisms and fibroblast subpopulations linked to ICCA prognosis. By employing cell pseudotime analysis, intercellular communication analysis, prognostic modeling, immune infiltration analysis, regulatory network construction, drug sensitivity analysis, and RT-qPCR validation, this study aims to identify novel therapeutic and prognostic targets for ICCA.

Results

Acquisition of 5 fibroblast subpopulations

Following initial screening, a total of 33,991 cells and 19,813 genes were obtained from the scRNA-seq data (Fig. 1A). After normalization and dimensionality reduction, 50 principal components (PCs) and 12 distinct cell subpopulations were identified (Fig. 1B). The 12 subpopulations were annotated into 8 distinct cell types, including fibroblasts (Fig. 1C). The histogram analysis indicated a higher prevalence of malignant cells/cholangiocytes in the ICCA group, whereas T cells and NK cells were more abundant in the normal group (Fig. 1C). The UMAP plot showed that fibroblasts were divided into five subpopulations, which were annotated as CAFs, myofibroblastic CAFs (mCAFs), vascular CAFs (vCAFs), inflammatory CAFs (iCAFs), and apCAFs24 (Fig. 1D).

Fig. 1

(a) Single-cell dataset GSE138709 and the filtered results from GSE138709 (ICCA: n = 5, control: n = 3); (b) dimensionality reduction Jackstraw plot and Elbow plot; (c) cell clustering and annotation results, with a percentage bar chart showing the distribution of each cell type in different disease states; (d) fibroblast subpopulation analysis (CAF: n = 151; mCAFs: n = 137; vCAFs: n = 121; iCAFs: n = 54; apCAFs: n = 41).

Cell communication and identification of 906 DEGs1 in ApCAFs

Enrichment analysis, performed to identify the biological pathways through which ApCAFs function, showed that ApCAFs were predominantly associated with cytochrome c oxidase (COX)-related biological processes, among others (Fig. 2A). Cell–cell communication analysis demonstrated interactions across the five fibroblast subpopulations, with overall signaling intensity significantly elevated in the ICCA group (Fig. 2B). Furthermore, ligand-receptor interaction analysis revealed a higher abundance of intercellular signaling events in ICCA samples than in normal controls (Fig. 2C). Notably, the CXCL12-CXCR4 ligand-receptor pair between ApCAFs and vCAFs exhibited the most significant difference between the ICCA and normal groups (p < 0.01).

Fig. 2

(a) Fibroblast subpopulation enrichment analysis results; (b) cell communication analysis between fibroblast subpopulations; (c) receptor-ligand bubble plot of cell communication; (d) volcano plot of DEGs1 (|log₂FC| > 0.5, p < 0.05).

A total of 906 differentially expressed genes (DEGs1) were identified between the ICCA and normal groups in ApCAFs (avg |log2 fold change (FC)| > 0.5 and p value < 0.05), with 860 upregulated and 46 downregulated genes (Fig. 2D).

Acquisition of 6790 DEGs2 in TCGA-ICCA and 588 module genes

Additionally, 6790 DEGs2 were identified between the ICCA and normal groups in TCGA-ICCA (|log2 FC| > 1 and p.adj < 0.05), with 4138 upregulated and 2652 downregulated genes (Fig. 3A).
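The thresholding step described above can be sketched in a few lines. This is a minimal illustration, assuming per-gene log2 fold changes and adjusted p values are already available from an upstream differential expression tool; the `GeneStat` record, the `classify_degs` helper, and the toy values are hypothetical and are not the pipeline or data used in this study.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str       # gene symbol
    log2_fc: float  # log2 fold change, ICCA versus normal
    p_adj: float    # multiple-testing adjusted p value


def classify_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    A gene is kept only if |log2FC| > lfc_cutoff and p_adj < p_cutoff.
    """
    up, down = [], []
    for s in stats:
        if s.p_adj >= p_cutoff or abs(s.log2_fc) <= lfc_cutoff:
            continue  # fails at least one threshold
        (up if s.log2_fc > 0 else down).append(s.gene)
    return up, down


# Made-up values for illustration only; these are not results from the study.
toy = [
    GeneStat("GENE_A", 2.3, 0.001),   # passes, upregulated
    GeneStat("GENE_B", -1.7, 0.010),  # passes, downregulated
    GeneStat("GENE_C", 0.4, 0.001),   # fold change too small
    GeneStat("GENE_D", 3.0, 0.200),   # not significant
]
up, down = classify_degs(toy)
print(len(up), "upregulated;", len(down), "downregulated")
```

The same function with `lfc_cutoff=0.5` and unadjusted p values would mirror the looser criteria reported for DEGs1.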

Fig. 3

(a) Differential gene density heatmap and DEGs2 volcano plot (p.adj < 0.05 and |log₂FC| > 1); (b) survival difference analysis based on different LRG scores (p = 0.017, log-rank test); (c) sample hierarchical clustering analysis: no significant outliers in the samples; (d) soft-threshold selection; (e) co-expression module identification; (f) heatmap of module correlations with LRGs.

Kaplan–Meier (KM) survival analysis showed a significant difference in survival between the two groups defined by lactylation-related gene (LRG) score (cut-off value = 1.0404; p < 0.05) (Fig. 3B). These findings provided a basis for the subsequent weighted gene co-expression network analysis (WGCNA).
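For readers unfamiliar with the KM estimator, the following sketch shows how a survival curve is computed for each score group. It is an illustration under stated assumptions: samples with a score above the cut-off are treated as the high-score group, the follow-up times, event flags, and scores are invented, and the function names are hypothetical. The log-rank test itself is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


def split_by_cutoff(scores, cutoff):
    """Indices of samples above the cut-off, and at or below it (assumed convention)."""
    high = [i for i, s in enumerate(scores) if s > cutoff]
    low = [i for i, s in enumerate(scores) if s <= cutoff]
    return high, low


# Invented follow-up data (months) and scores, for illustration only.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
scores = [1.30, 0.90, 1.10, 1.50, 0.70]

curve = kaplan_meier(times, events)
high, low = split_by_cutoff(scores, cutoff=1.0404)  # cut-off value from the text
high_curve = kaplan_meier([times[i] for i in high], [events[i] for i in high])
print(curve, high_curve)
```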

The sample clustering plot revealed no outlier samples in TCGA-ICCA, so all samples were retained for WGCNA (Fig. 3C). At a soft threshold (β) of 5, the scale-free fit R² approached the threshold red line (0.9) and the mean connectivity approached zero, so β = 5 was selected as the optimal soft threshold (Fig. 3D). WGCNA identified 8 gene modules (Fig. 3E); the red module showed the highest absolute correlation with LRG scores (r = −0.744, p < 0.001) and was therefore considered the key module. Finally, 588 genes from this module were identified as LRG-related genes (Fig. 3F).

Enrichment and PPI analysis in 14 candidate genes

Based on the Venn diagram analysis, 14 candidate genes were identified (Fig. 4A). Gene Ontology (GO) analysis showed enrichment in 156 biological processes (BP), chiefly those related to mitotic cell cycle phase transition; 13 cellular components (CC), mainly microtubules; and 19 molecular functions (MF), including tubulin binding (Fig. 4B). Additionally, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis revealed 22 pathways (p < 0.05), including the p53 signaling pathway (Fig. 4C). The STRING online network analysis tool was used to assess interactions among the 14 candidate genes (confidence > 0.15), yielding a protein–protein interaction (PPI) network with 12 nodes and 46 edges that included STMN1, FAM72C, and TOP2A (Fig. 4D). Correlation analysis between the 14 lactylation-related genes and CAF marker genes showed that most were positively correlated with CAF marker genes (Fig. 4E).
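A Venn-style selection of candidate genes amounts to a set intersection. The sketch below assumes the candidates are the genes shared by the three lists described in the preceding sections (DEGs1, DEGs2, and the key-module genes); that reading is an assumption, and the `candidate_genes` helper and placeholder gene names are hypothetical.

```python
def candidate_genes(*gene_sets):
    """Genes present in every supplied collection, sorted for stable output."""
    if not gene_sets:
        return []
    common = set(gene_sets[0]).intersection(*gene_sets[1:])
    return sorted(common)


# Placeholder gene names, for illustration only.
degs1 = {"GENE_1", "GENE_2", "GENE_3"}
degs2 = {"GENE_1", "GENE_2", "GENE_4"}
module_genes = {"GENE_1", "GENE_2", "GENE_5"}

candidates = candidate_genes(degs1, degs2, module_genes)
print(candidates)
```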

Fig. 4

(a) Venn diagram of candidate genes; (b) GO enrichment dendrogram of candidate genes; (c) KEGG enrichment dendrogram of candidate genes; (d) PPI network construction; (e) correlation heatmap between 14 lactylation-related genes (LRGs) and CAF marker genes.

Construction and validation of prognostic risk models

This study aimed to identify prognostic genes among the candidate genes and to develop a risk model using 36 ICCA samples from the TCGA-ICCA cohort. Univariate Cox regression analysis identified seven candidate genes associated with patient survival at the screening threshold (p < 0.2): TOP2A, STMN1, UBE2T, CENPF, KIF20A, C5orf34, and FAM72C (Table 1). LASSO regression analysis of the same 36 samples then selected five prognostic genes (lambda.min = 0.03050406): STMN1, UBE2T, CENPF, C5orf34, and FAM72C (Fig. 5A). The regression coefficients of these five prognostic genes were further estimated using multivariate Cox regression (Table 2). In both the TCGA-ICCA and GSE107943 datasets, these five genes were highly expressed in the ICCA group, as shown in the box plots (Fig. 5B).

Table 1 Univariate Cox regression of candidate genes associated with overall survival in ICCA (TCGA-ICCA cohort, n = 36). HR, hazard ratio; PH, P value of the proportional hazards test.
Fig. 5

(a) LASSO regression analysis results; (b) expression of prognostic genes in the TCGA dataset (p < 0.0001, Wilcoxon rank-sum test); (c) grouping of high-risk and low-risk groups, with survival status plots; (d) heatmap of prognostic gene expression; (e) survival curve of the training set (p < 0.033, log-rank test); (f) ROC curve of the prognostic model in the training set.

Table 2 Multivariate Cox regression of five prognostic genes in ICCA (TCGA-ICCA cohort). Coef, regression coefficient; PH Test, P value of the proportional hazards test.

The risk score was defined as follows: risk score = STMN1 × (0.1991) + UBE2T × (0.2246) + CENPF × (−0.0374) + C5orf34 × (0.5614) + FAM72C × (0.8908). As the risk score increased, the number of deaths rose markedly, as shown in the scatter plots (Fig. 5C). Furthermore, high expression of the five prognostic genes was observed in the high-risk group (Fig. 5D). KM survival analysis revealed significantly worse outcomes for patients in the high-risk group (Fig. 5E). The predictive accuracy of the risk model was supported by area under the curve (AUC) values exceeding 0.6 at the 1-, 2-, 3-, and 5-year time points (Fig. 5F). Comparison of the results from 1000 bootstrap resampling validations with the AUC of the original model showed that the risk model had good reliability. No obvious model bias or overfitting was identified through bootstrap validation, and the model could be effectively used for prognostic risk stratification of ICCA patients (Supplementary Table S4). The model demonstrated consistent prognostic performance in an independent cohort of 30 ICCA samples, stratified into two risk groups using a cut-off value of 2.9705 (Supplementary Fig. S2), supporting the model's stability and its potential for forecasting clinical outcomes in patients with ICCA.
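The risk score formula above is a weighted sum of gene expression values, which can be sketched as follows. The coefficients are those reported in the text; the expression values, the `risk_score` and `risk_group` helpers, and the convention that scores above the cut-off are "high" are illustrative assumptions, and the expression scale used in the study is not specified here.

```python
# Coefficients as reported in the text for the five-gene model.
COEFFICIENTS = {
    "STMN1": 0.1991,
    "UBE2T": 0.2246,
    "CENPF": -0.0374,
    "C5orf34": 0.5614,
    "FAM72C": 0.8908,
}


def risk_score(expression):
    """Weighted sum of expression values for the five prognostic genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score, cutoff):
    """Assumed convention: scores above the cut-off are high risk."""
    return "high" if score > cutoff else "low"


# Hypothetical expression values, not taken from any dataset.
sample = {"STMN1": 1.0, "UBE2T": 1.0, "CENPF": 1.0, "C5orf34": 1.0, "FAM72C": 1.0}
score = risk_score(sample)

# 2.9705 is the cut-off reported for the independent cohort; applying it to
# this toy sample is purely illustrative.
group = risk_group(score, cutoff=2.9705)
print(round(score, 4), group)
```

Because CENPF carries a small negative coefficient, raising its expression alone lowers the score slightly, whereas FAM72C and C5orf34 dominate the sum.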

GSEA and GSVA enrichment analysis in 5 prognostic genes

Gene Set Enrichment Analysis (GSEA), performed to explore the signaling pathways associated with the five prognostic genes, revealed enrichment in the cell cycle pathway. Notably, C5orf34, CENPF, FAM72C, and STMN1 were enriched in pathways associated with Parkinson’s disease and oxidative phosphorylation (Fig. 6A).

GSVA, performed to compare pathway activity between the two risk groups, indicated that downregulated HALLMARK pathways were primarily involved in inflammatory response processes, whereas upregulated HALLMARK pathways showed significant enrichment in E2F targets (Fig. 6B). Similarly, downregulated KEGG pathways were mainly involved in leukocyte transendothelial migration, while upregulated KEGG pathways were enriched in DNA replication (Fig. 6C).

Fig. 6

(a) GSEA plots of representative HALLMARK and KEGG pathways between the high- and low-risk groups; (b) HALLMARK gene set enrichment analysis between the high- and low-risk groups; (c) KEGG pathway enrichment analysis between the high- and low-risk groups.

Analysis of prognostic genes with differential immune cells

The immune microenvironment alterations associated with ICCA were further explored25. In the analysis of 36 ICCA samples from the TCGA-ICCA cohort, immune cell infiltration was notably lower in the high-risk group (Fig. 7A). The Wilcoxon rank-sum test identified ten immune cell populations with differential infiltration (p < 0.05), including activated B cells (Fig. 7B). Additionally, Spearman correlation analysis revealed a negative correlation between FAM72C and plasmacytoid dendritic cells (r = −0.355, p < 0.05), as well as between CENPF and mast cells (r = −0.386, p < 0.05) (Fig. 7C).
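The Spearman coefficient reported above is the Pearson correlation of rank-transformed values. The sketch below shows that calculation in plain Python, with ties given their average rank; the function names and the toy expression and infiltration values are hypothetical and are not data from this study, and no p value is computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: gene expression versus an immune infiltration score.
gene_expr = [2.1, 3.4, 1.2, 5.0, 4.4]
cell_score = [0.9, 0.5, 0.8, 0.1, 0.3]
rho = spearman(gene_expr, cell_score)
print(round(rho, 3))
```

A negative coefficient, as in this toy example, means that samples with higher expression of the gene tend to have lower infiltration scores.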

Fig. 7

(a) Heatmap of immune cell infiltration in high- and low-risk groups based on ssGSEA scores; (b) boxplots comparing immune cell infiltration between high- and low-risk groups (Wilcoxon rank-sum test, p < 0.05); (c) correlation heatmap between prognostic genes and infiltrating immune cell subtypes.

Construction of TF-prognostic gene and GeneMANIA networks to explore molecular regulatory mechanisms

To explore the molecular regulatory mechanisms underlying the five prognostic genes, 25 transcription factors (TFs) were predicted for C5orf34, CENPF, UBE2T, and STMN1, with CREB1 and FOXC1 being common to all four genes (Fig. 8A). Furthermore, the GeneMANIA network predicted 20 genes related to the function of the prognostic genes, including FANCL (Fig. 8B).

Fig. 8

(a) TF-regulatory network of prognostic genes; (b) GeneMANIA analysis.

Ten drugs were obtained based on the drug sensitivity analysis

Drug sensitivity analysis of the 36 ICCA samples from TCGA-ICCA revealed significant differences in the IC50 values of 10 chemotherapeutic agents between the two risk groups (p < 0.05) (Fig. 9). Specifically, seven drugs (AZD6482, BX-795, Bicalutamide, CHIR-99021, GDC0941, PAC-1, and SB216763) showed significantly higher IC50 values in the high-risk group than in the low-risk group. In contrast, three drugs (CCT018159, GW-441756, and S-Trityl-L-cysteine) showed significantly lower IC50 values in the high-risk group (Fig. 9).

Fig. 9

Drug sensitivity analysis (p < 0.05), Wilcoxon rank-sum test.

Cell trajectory analysis of fibroblasts

Pseudotime analysis revealed differences in differentiation timing among fibroblasts and in the states they occupy (Fig. 10A). In the graph, color intensity reflected the timing of cell differentiation, with darker hues indicating earlier differentiation stages. Based on the distribution patterns of the five prognostic genes, C5orf34 was predominantly located in State1, while CENPF and FAM72C were mostly found in State4 (Fig. 10B). As shown in Fig. 10C, State1 was enriched in biological pathways related to ECM organization, assembly of extracellular structures, and the development of external encapsulating frameworks. In contrast, State4 was mainly associated with cellular detoxification processes and responses to toxic compounds. Finally, the temporal distribution scatter plot revealed that ApCAFs and CAFs were primarily located in the early stages of cell differentiation, while vCAFs were mainly distributed in the later stages (Fig. 10D).

Fig. 10

(a) Fibroblast pseudotime analysis; (b) pseudotime distribution of prognostic genes; (c) enrichment analysis of pseudotime analysis; (d) spatiotemporal distribution of fibroblast subpopulations.

Expression validation analysis

Gene expression validation revealed distinct patterns among the prognostic markers (STMN1, UBE2T, CENPF, C5orf34, and FAM72C) in the control and ICCA groups. The results indicated upregulation of STMN1, UBE2T, CENPF, and FAM72C in the ICCA group compared to the controls (Fig. 11A–C and E). Conversely, C5orf34 exhibited higher expression levels in the control group (Fig. 11D). Notably, significant differential expression of STMN1, UBE2T, and FAM72C was observed between the two groups (p < 0.05) (Fig. 11A, B, and E). The expression patterns of STMN1, UBE2T, CENPF, and FAM72C were consistent with the expression differences identified by the Wilcoxon rank-sum test in the transcriptomic datasets, providing valuable insights for future research.

Fig. 11

(a–e) Clinical sample qPCR results of prognostic genes (p < 0.05), Student’s t test.

Discussion

ICCA is a highly aggressive malignancy originating from the epithelial lining of the bile ducts within the liver. It is typically characterized by delayed diagnosis, rapid disease progression, and poor clinical prognosis. While several risk factors, including liver fluke infection, chronic biliary inflammation, and viral hepatitis, have been identified, the molecular pathogenesis of ICCA remains incompletely understood11,18. Thus, elucidating the molecular mechanisms driving ICCA progression and identifying reliable prognostic biomarkers are critical for improving clinical outcomes. This study integrated single-cell RNA sequencing with bulk transcriptomic data to comprehensively analyze LRGs and fibroblast subpopulations in ICCA, providing new insights into tumor biology, the immune microenvironment, and potential therapeutic strategies.

Our analysis identified five prognostic genes, one of which is STMN1, a cytoplasmic phosphoprotein that regulates microtubule dynamics. Upregulation of STMN1 has been linked to malignant behavior and poor prognosis in various cancers26. Additionally, STMN1 is overexpressed in HCC, correlates with clinicopathological features, and affects patient outcomes27. Moreover, silencing circRNA cPKM has been shown to reduce TGFB1 release and mesenchymal fibrosis in ICCA cells, inhibit STMN1 expression, and suppress ICCA growth and metastasis while overcoming paclitaxel resistance28. Previous studies have demonstrated that histone lactylation (H3K18la) can specifically bind to the promoters of M-phase cell cycle genes to enhance transcription29. As a core regulatory factor of the cell cycle, the high expression of STMN1 in ICCA may depend on epigenetic activation mediated by H3K18la through the recruitment of P300 to form a transcriptional complex, a mechanism that has been validated in pancreatic cancer. Our RT-qPCR results further corroborate the significant upregulation of STMN1 in ICCA samples, reinforcing its potential role in ICCA diagnosis and prognosis. Therefore, STMN1 emerges as a potential diagnostic and prognostic biomarker for ICCA and a promising therapeutic target.

CENPF is a microtubule-associated protein that dynamically localizes during the cell cycle. It first appears in the nuclear matrix during the G2 phase, then redistributes to the spindle midzone and midbody during anaphase and telophase, playing a pivotal role in chromosome segregation and cytokinesis30. Recent studies have shown that CENPF may activate TERT transcription by regulating histone H3 methylation31. Given the regulatory crosstalk between histone lactylation and methylation, H3K18la can competitively occupy histone modification sites, suggesting that it may modulate this process to accelerate cell cycle progression and enhance CENPF expression32. In various cancers, including breast and prostate cancer, aberrantly high CENPF expression is associated with poor prognosis and promotes malignant traits such as increased proliferation, metastasis, apoptosis resistance, and immune evasion33,34. Although no direct evidence has yet established CENPF’s role in ICCA, our study demonstrates significantly elevated CENPF expression in ICCA samples compared to controls. CENPF overexpression may contribute to ICCA proliferation, invasion, and metastatic potential by accelerating cell cycle progression, enhancing chromosomal instability, and modulating associated signaling pathways.

UBE2T, a ubiquitin-conjugating enzyme, plays a critical role in the initiation and progression of various cancers. As a member of the ubiquitin-conjugating enzyme family, UBE2T expression is modulated by a lactylation-regulated histone modification network, which contributes to its pivotal role in the development and progression of multiple malignancies35. In breast, ovarian, and cervical cancers, upregulation of UBE2T enhances cancer cell invasion and metastasis. Specifically, UBE2T overexpression in breast cancer promotes migration and invasion of cancer cells36. In ovarian cancer, its upregulation is associated with poor prognosis and drives malignant progression37. Previous studies have demonstrated a direct regulatory link between UBE2T and glycolytic metabolism, showing that UBE2T promotes glucose uptake, lactate production, and ATP generation in breast cancer cells by modulating the PI3K/AKT signaling pathway. This directly enhances tumor cell glycolytic activity and consequently accelerates malignant phenotypic progression38. In the context of ICCA, UBE2T expression was significantly higher in ICC tissues compared to those from benign biliary conditions, with overexpression linked to an unfavorable prognosis in patients with ICCA39. Our study corroborates these findings, showing significant UBE2T elevation in the ICCA group, providing a valuable reference for future research.

FAM72C, a protein-coding gene, has been identified as a poor-prognosis predictor in several cancer types40. Overexpression of the related family member FAM72A predicts poor prognosis in lung adenocarcinoma and contributes to tumorigenesis and progression in various cancers by influencing cell proliferation and genomic instability40. Although the role of FAM72C in ICCA has not been previously documented, our experimental results demonstrate significant upregulation of FAM72C in the ICCA group. This suggests that FAM72C may be involved in ICCA tumorigenesis and progression, highlighting the need for further investigation into its underlying mechanisms and clinical relevance.

C5orf34 is notably overexpressed in many malignancies compared to normal tissues, with elevated levels linked to poor patient prognosis. It has been shown to regulate the immune microenvironment in several cancers41. C5orf34 may promote lung cancer development by regulating gene expression and signaling pathways related to cell proliferation, such as the MAPK pathway41. Although no studies have explored the role of C5orf34 in ICCA, our findings suggest higher expression of C5orf34 in the control group, possibly indicating a tumor-suppressive role in ICCA or a negative correlation with ICCA development. The tumor-suppressive function of C5orf34 may be inhibited by lactylation-mediated transcriptional repression. The precise mechanisms and clinical significance of C5orf34 in ICCA warrant further exploration. Based on the optimal cutoff value of the risk score, ICCA patients in the TCGA-ICCA and GSE107943 datasets were divided into high-risk and low-risk groups. The analysis revealed that patients with higher risk scores had a greater number of deaths and shorter overall survival times. Further Kaplan–Meier survival curve analysis indicated that, in both datasets, the high-risk group exhibited significantly poorer overall survival (OS) compared with the low-risk group. Collectively, these findings demonstrate that the prognostic risk model constructed from these five genes provides strong predictive value for the prognosis and survival outcomes of ICCA patients, further supporting its potential as a prognostic biomarker for ICCA.

To elucidate the biological significance of these genes, GSEA analysis was performed. STMN1, UBE2T, CENPF, C5orf34 and FAM72C were co-enriched in the cell cycle and oxidative phosphorylation pathways, suggesting that these genes may synergistically regulate core processes of cell proliferation and metabolic reprogramming, thereby jointly driving ICCA development and progression. This finding provides critical clues for understanding the molecular pathogenesis of ICCA. Cell cycle dysregulation represents a hallmark of uncontrolled tumor proliferation, and aberrant activation of this pathway has been recognized as a key molecular basis of ICC progression42,43. Among these genes, STMN1 has been reported to promote proliferation in multiple cancers by regulating the G2/M transition44; its overexpression in ICCA may accelerate cell cycle progression by destabilizing microtubules. UBE2T, as a ubiquitin-conjugating enzyme, mediates the ubiquitination and degradation of key cell cycle regulators, facilitating the G1/S transition. It has been shown to drive aberrant cell cycle progression in hepatocellular carcinoma45, and its high expression in ICCA has been associated with poor prognosis39, further supporting its role in ICCA cell cycle control. CENPF, localized to the centromere–kinetochore complex, ensures proper chromosome segregation and cytokinesis, and its aberrant expression in cholangiocarcinoma has been shown to promote tumor cell proliferation46. C5orf34, though less studied, has been reported to affect cell cycle progression in lung cancer by regulating the MAPK signaling pathway47; our findings suggest it may participate in ICCA cell cycle regulation through similar mechanisms. FAM72C, a member of the FAM72 family, shares homology with FAM72A, which has been shown to promote cell cycle dysregulation by enhancing genomic instability48. 
Thus, FAM72C may exert conserved functions in regulating ICCA cell cycle progression, a hypothesis that warrants further functional validation.

Oxidative phosphorylation (OXPHOS) serves as a core pathway of cellular energy metabolism. Its aberrant remodeling plays a key role in ICC metabolic reprogramming, enabling tumor cells to balance energy production and reactive oxygen species (ROS) generation under stress conditions, thereby supporting malignant proliferation49. Loss of STMN1 has been shown to impair autophagy and induce mitochondrial morphological abnormalities50. UBE2T regulates tumor cell OXPHOS levels by mediating the ubiquitination of mitochondrial proteins, influencing invasive potential51. C5orf34 affects tumor progression via the MAPK signaling pathway, which is tightly linked to mitochondrial function and OXPHOS regulation, implying that it may serve as a signaling bridge between these two essential pathways48,52. Although FAM72C has not yet been directly associated with OXPHOS regulation, other family members have been implicated in tumor metabolic reprogramming53, suggesting potential cooperative roles in ICCA energy metabolism.

In summary, STMN1, UBE2T, CENPF, C5orf34, and FAM72C may jointly accelerate tumor cell proliferation by co-activating cell cycle pathways, while simultaneously optimizing energy supply through OXPHOS regulation. Together, these mechanisms drive ICCA progression and reveal functional interconnections among these genes. Further in vitro and in vivo studies are required to delineate the precise molecular mechanisms underlying their regulation of the cell cycle and OXPHOS pathways, as well as to evaluate their potential as therapeutic targets in ICCA.

Although direct evidence of oxidative phosphorylation in ICCA is limited, its established role in other malignancies suggests its potential involvement in ICCA pathophysiology. For instance, studies have demonstrated that mitochondrial reprogramming in cancer cells supports stemness, invasiveness, and immune evasion. Thus, the enrichment of prognostic LRGs in oxidative phosphorylation pathways implies that these genes may regulate mitochondrial function, sustaining tumor growth and contributing to therapeutic resistance in ICCA. Similarly, the association with the Parkinson’s disease pathway, which shares common molecular features such as mitochondrial dysfunction and protein aggregation, indicates altered metabolic and proteostatic regulation in ICCA cells.

In addition to metabolic changes, this study examined the immune microenvironment through immune infiltration analysis. Notably, CENPF was negatively correlated with mast cell infiltration. Traditionally known for their role in allergic reactions, mast cells are now recognized as key immune components in the TME. In ICCA, tumor cells and cholangiocytes secrete stem cell factor (SCF), which binds to c-KIT receptors on mast cells, thereby recruiting them into the tumor environment. Activated mast cells release pro-inflammatory and pro-angiogenic mediators, such as histamine, which promote tumor progression and neovascularization54,55. The inverse correlation between CENPF expression and mast cell infiltration suggests that CENPF may inhibit SCF expression or disrupt c-KIT signaling, thereby limiting mast cell-mediated immunomodulation. This finding implicates CENPF in shaping the immune microenvironment of ICCA and potentially mediating immune escape.

This study further investigated transcriptional regulatory mechanisms and identified CREB1 and FOXC1 as potential common TFs associated with the five prognostic genes. CREB1 (cyclic AMP response element-binding protein 1) is a widely studied TF that plays a pivotal role in regulating cell growth, survival, and metabolic processes through the cAMP/PKA signaling cascade. It has been implicated in the pathogenesis of several malignancies, including HCC and pancreatic adenocarcinoma, and is known to regulate genes involved in glycolysis and angiogenesis8,9. FOXC1, a member of the Forkhead box (FOX) family, is essential for epithelial-mesenchymal transition (EMT), stemness, and tumor invasiveness. It has been shown to contribute to malignant progression and correlate with poor clinical outcomes in breast cancer, hepatic carcinomas, and gliomas1,10,56. While their roles in ICCA remain incompletely understood, the association between LRGs and these TFs suggests that they may function as downstream effectors of lactylation-mediated transcriptional programs, thereby contributing to ICCA development. Further experimental validation of these regulatory networks may reveal novel targets for transcription-based therapies.

Drug sensitivity prediction analysis identified ten compounds whose predicted sensitivity differed between the high- and low-risk groups: AZD6482, BX-795, Bicalutamide, CCT018159, CHIR-99021, GDC0941, GW-441756, PAC-1, S-Trityl-L-cysteine, and SB216763. Among them, several agents have potential therapeutic relevance to ICCA. AZD6482, a selective PI3Kβ inhibitor, has been reported to suppress tumor growth and sensitize cancer cells to chemotherapeutic agents through inhibition of the PI3K/AKT signaling cascade, which is frequently hyperactivated in ICCA3,11. CHIR-99021, a potent and selective GSK-3β inhibitor, modulates Wnt/β-catenin signaling and promotes mitochondrial homeostasis, suggesting a potential role in targeting metabolic reprogramming and cell cycle dysregulation in ICCA. PAC-1, a procaspase-3 activator, induces apoptosis by directly activating executioner caspases, representing a promising approach to overcome apoptosis resistance, a well-recognized hallmark of ICCA. GDC0941, another PI3K pathway inhibitor, has shown preclinical efficacy in biliary tract cancers by attenuating cell proliferation and survival signaling.

Collectively, these findings highlight that agents targeting the PI3K/AKT, Wnt/β-catenin, and apoptosis-related pathways may exhibit therapeutic potential for ICCA patients with high-risk gene signatures. The identification of these compounds provides a foundation for future preclinical validation and may contribute to the development of precision therapeutic strategies in ICCA.

This study integrated differential expression analysis, Cox regression, and single-cell sequencing data to identify and validate a set of potential prognostic genes in ICCA that are associated with histone lactylation and fibroblast activity. Functional enrichment analyses revealed that these genes synergistically participate in cell cycle regulation and oxidative phosphorylation pathways, elucidating the core molecular mechanisms underlying ICCA malignancy and highlighting their potential for clinical translation. Emerging evidence indicates that cancer-associated fibroblasts (CAFs) remodel the tumor microenvironment by secreting lactate and other metabolites57, while lactate-derived histone lactylation serves as an epigenetic bridge linking metabolic states to gene expression. Therapeutic strategies targeting CAFs—such as inhibition of CAF-derived exosomes58—or the use of oxidative phosphorylation inhibitors like metformin59 have shown promising potential in ICCA treatment. Therefore, the findings of this study not only provide novel prognostic biomarkers for ICCA but also lay a theoretical foundation for the development of precision therapeutic strategies targeting the metabolic–epigenetic crosstalk in the ICCA microenvironment. However, this study has limitations. First, most of the data were obtained from public databases, in which the number of ICCA samples was relatively small. This limitation may have resulted in insufficient statistical power, potentially affecting the stability and external generalizability of the prognostic model and hindering a comprehensive representation of the molecular heterogeneity of ICCA. Future studies should integrate multicenter and large-sample datasets to expand the analytical scope, enhance the reliability of statistical outcomes, and more precisely validate the clinical relevance and regulatory mechanisms of the identified prognostic genes. 
Second, the number of clinical samples was limited, necessitating further validation through additional research. In conclusion, the identification of LRGs and CAF subpopulations in ICCA offers new insights into the pathogenesis of the disease and potential therapeutic strategies. Further validation in clinical settings is essential to confirm the relevance and utility of these findings in patient management.

Materials and methods

Data collection

Single-cell RNA sequencing (scRNA-seq) data from patients with ICCA (GSE138709, platform: GPL20795), comprising five ICCA tumor specimens and three adjacent non-tumorous tissue samples, were retrieved from the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/). Transcriptomic data, clinical characteristics, and survival outcomes of TCGA-ICCA cases were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). This dataset contained 36 ICCA tumor tissues and 9 adjacent non-cancerous liver tissue samples as controls. Additionally, GSE107943 (GPL18573) included 30 ICCA tumor samples and 27 adjacent normal tissue samples (clinical information is provided in Supplementary Table 1). Furthermore, 22 genes associated with histone lactylation were curated from prior publications56.

Processing of the scRNA-seq data

Quality control of the GSE138709 dataset was performed using the Seurat package (version 5.0.1)60. Genes detected in fewer than three cells were removed, and cells were excluded if they had fewer than 200 or more than 6000 detected genes, transcript counts exceeding 40,000, or mitochondrial gene content greater than 15%. Highly variable genes were then identified using the variance-stabilizing transformation (VST) approach (Supplementary Fig. S1). Principal component analysis (PCA) was performed for dimensionality reduction, and a scree plot was generated using Seurat to visualize the proportion of variance attributed to the top-ranking principal components (PCs) (Supplementary Fig. S1). Cell clustering was achieved using the FindNeighbors and FindClusters functions (resolution set to 0.2), followed by visualization using the uniform manifold approximation and projection (UMAP) method. Distinct cellular subsets were identified based on canonical marker gene profiles and established classification frameworks. Subsequent analyses focused on fibroblasts, given their recognized functional significance in ICCA development and progression61,62,63. The RunPCA function was applied to fibroblasts for dimensionality reduction, and clustering analysis was performed on the resulting PCs using FindNeighbors and FindClusters (resolution = 0.8). The fibroblast subpopulations were annotated according to previous literature24. The abundance of CAFs in ICCA has been increasingly recognized for its dual role in prognosis and therapy64. Specifically, antigen-presenting CAFs (apCAFs), which stimulate CD4⁺ T cells via antigen-dependent mechanisms, have been shown to possess immunomodulatory capabilities65. Therefore, the present study aimed to identify differential genes associated with apCAFs for further analysis.
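The cell-level thresholds above can be illustrated with a short sketch. It is written in plain Python for readability and is not the authors' Seurat/R code. The `CellQC` record, the `passes_qc` helper, and the four example cells are hypothetical; only the numeric cut-offs (200 to 6000 detected genes, at most 40,000 transcripts, at most 15% mitochondrial content) come from the text.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell quality metrics (hypothetical container for illustration)."""
    n_genes: int      # number of genes detected in the cell
    n_counts: int     # total transcript count for the cell
    pct_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, min_genes=200, max_genes=6000,
              max_counts=40_000, max_pct_mito=15.0):
    """Return True if the cell survives all cell-level filters."""
    return (min_genes <= cell.n_genes <= max_genes
            and cell.n_counts <= max_counts
            and cell.pct_mito <= max_pct_mito)


# Toy cells, invented purely to exercise each filter once.
cells = {
    "cell_a": CellQC(n_genes=1500, n_counts=8_000, pct_mito=4.2),
    "cell_b": CellQC(n_genes=150, n_counts=400, pct_mito=2.0),       # too few genes
    "cell_c": CellQC(n_genes=3000, n_counts=52_000, pct_mito=5.0),   # too many transcripts
    "cell_d": CellQC(n_genes=2500, n_counts=12_000, pct_mito=22.5),  # high mitochondrial content
}

kept = [name for name, cell in cells.items() if passes_qc(cell)]
print(kept)
```

In this toy set only `cell_a` is retained; each of the other cells fails exactly one filter.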

Pathway enrichment analysis of cell clusters in GSE138709 was conducted using the ReactomeGSA package (version 1.12.0)66. Additionally, CellChat (v1.1.1) was employed to infer and map intercellular communication networks67, and intercellular ligand-receptor interactions were analyzed using the celltalker package (v0.0.7.9000)25. In the resulting network, thicker lines between ligands and receptors indicate stronger potential intercellular interactions.

Identification of single-cell and TCGA-ICCA differentially expressed genes (DEGs)

In the GSE138709 dataset, the Seurat (v5.0.1) FindMarkers function was used to identify DEGs1 in apCAFs by comparing ICCA tissues with adjacent normal tissues, applying a significance threshold of |log2 fold change| > 0.5 and p < 0.05. To further investigate gene expression differences in the TCGA-ICCA cohort, the DESeq2 package (v1.38.0)68 was utilized to detect DEGs2 with an adjusted p-value < 0.05 and |log2FC| > 1. Visualization of DEGs1 and DEGs2 was conducted using the ggpubr package (v3.3.6)69 to generate volcano plots, while the top 10 DEGs2 with the highest expression variability were illustrated through a heatmap created by the ComplexHeatmap package (v2.14.0)70.
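As a rough illustration of how the two threshold sets differ, the sketch below applies them to toy result tables. It is plain Python rather than the Seurat or DESeq2 calls used in the study. The `select_degs` helper, the gene names, and all statistics are hypothetical; only the cut-offs are taken from the text.

```python
def select_degs(results, lfc_cutoff, p_cutoff, p_key):
    """Return genes with |log2FC| above lfc_cutoff and a p-value below p_cutoff.

    `results` maps gene name -> dict of statistics; `p_key` names the field
    holding the raw ("p") or adjusted ("padj") p-value.
    """
    return sorted(
        gene for gene, stats in results.items()
        if abs(stats["log2fc"]) > lfc_cutoff and stats[p_key] < p_cutoff
    )


# Toy single-cell comparison (apCAFs, tumor vs. adjacent tissue); values invented.
sc_results = {
    "GENE_A": {"log2fc": 0.8, "p": 0.01},
    "GENE_B": {"log2fc": -0.3, "p": 0.001},  # effect size too small
    "GENE_C": {"log2fc": -1.4, "p": 0.2},    # not significant
}

# Toy bulk comparison (tumor vs. adjacent tissue); values invented.
bulk_results = {
    "GENE_A": {"log2fc": 1.6, "padj": 0.004},
    "GENE_D": {"log2fc": -2.1, "padj": 0.03},
    "GENE_E": {"log2fc": 0.9, "padj": 0.001},  # |log2FC| not above 1
}

degs1 = select_degs(sc_results, lfc_cutoff=0.5, p_cutoff=0.05, p_key="p")
degs2 = select_degs(bulk_results, lfc_cutoff=1.0, p_cutoff=0.05, p_key="padj")
print(degs1, degs2)
```

The single-cell filter uses the raw p-value with a looser fold-change cut-off, whereas the bulk filter uses the adjusted p-value with a stricter one.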

Screening of LRGs-related module genes

In the TCGA-ICCA cohort, LRG scores were calculated using the single-sample GSEA (ssGSEA) algorithm in the GSVA package (v1.46.0)71. An optimal threshold was determined based on the distribution of these scores to stratify the samples into high- and low-score groups. The prognostic significance of LRGs was assessed by constructing KM survival curves for the two groups using the survminer package (v0.4.9)72.

To further explore gene co-expression, the WGCNA package (v1.71)73 was used to build co-expression networks and identify gene modules highly correlated with ICCA. Hierarchical clustering based on Euclidean distance of gene expression data was performed to identify and exclude outlier samples. The soft-thresholding power β was selected as the point where the scale-free topology fit index (R²) approached 0.90, ensuring optimal network construction. Gene adjacency was computed to assess topological overlap, which was then transformed into a dissimilarity matrix to construct a gene clustering dendrogram. Gene co-expression modules were delineated using the dynamic tree cut method with a minimum module size of 300 genes and a module merging threshold (cut height) of 0.25. A correlation matrix was then generated to quantify the associations between module eigengenes and the ssGSEA-derived LRG scores, and the results were visualized in a heatmap. Modules showing the strongest absolute correlation with LRG scores (p < 0.05) were designated as key modules, and the genes within these modules were defined as module-associated genes.
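Two selection rules in this paragraph can be sketched in a few lines of plain Python. This is not WGCNA itself. Both helpers, the module names, and all numbers are hypothetical, and reading "approached 0.90" as "the smallest power whose fit index reaches 0.90" is an assumption made for the example.

```python
def pick_soft_threshold(fit_by_power, target_r2=0.90):
    """Return the smallest power whose scale-free fit index reaches target_r2."""
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= target_r2:
            return power
    return None  # no candidate power reached the target


def pick_key_module(module_stats, p_cutoff=0.05):
    """Return the significant module with the largest absolute correlation."""
    significant = {m: s for m, s in module_stats.items() if s["p"] < p_cutoff}
    if not significant:
        return None
    return max(significant, key=lambda m: abs(significant[m]["cor"]))


# Toy fit indices per candidate power (invented values).
fit_by_power = {2: 0.41, 4: 0.72, 6: 0.86, 8: 0.91, 10: 0.93}

# Toy module-to-LRG-score correlations (invented values).
module_stats = {
    "blue": {"cor": 0.35, "p": 0.03},
    "brown": {"cor": -0.62, "p": 0.001},
    "grey": {"cor": 0.70, "p": 0.20},  # largest correlation, but not significant
}

beta = pick_soft_threshold(fit_by_power)
key_module = pick_key_module(module_stats)
print(beta, key_module)
```

Ranking by absolute correlation means that a strongly negative module (here `brown`) can be selected over a weaker positive one.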

Identification of candidate genes and pathway analyses

Candidate genes were identified by intersecting DEGs1, DEGs2, and genes from WGCNA-identified modules. Functional enrichment analysis was performed using the clusterProfiler package (v4.7.1.3)74, incorporating GO terms and KEGG75,76 pathway data, with gene annotations sourced from the org.Hs.eg.db database (v3.16.0). A p-value threshold of < 0.05 was applied to determine statistical significance. Enrichment results were visualized using the enrichplot package (v1.18.0)77. To investigate potential protein-level interactions, a PPI network was constructed using data from the STRING database (https://string-db.org) with a minimum interaction confidence score of 0.15. The final PPI network was visualized using Cytoscape software (v3.5.2). To assess the relationship between the lactylation-related candidate genes and CAF marker genes, and thereby further examine the mechanistic link between lactylation and CAF heterogeneity, Spearman correlation coefficients between the candidate genes and 12 CAF marker genes were calculated using the Hmisc package (v5.0.1)78, with |cor| > 0.3 and p < 0.05 as the significance criteria.

Screening of prognosis genes

To identify prognostic genes with clinical relevance, expression and clinical data for the candidate genes were analyzed in 36 ICCA samples from the TCGA-ICCA cohort. Univariate Cox proportional hazards (PH) regression was first performed using the survival package (v3.5.3)79, with a significance threshold of p < 0.2. The PH assumption was assessed (p > 0.05) to confirm the suitability of the model. Subsequently, least absolute shrinkage and selection operator (LASSO) regression (family = “cox”) was applied using the glmnet package (v4.1.4)80 to refine the selection of prognostic variables. Multivariate Cox regression analysis was then performed to identify genes independently associated with clinical outcomes in ICCA.

The differential expression of these prognostic genes between ICCA and normal liver tissues in the TCGA-ICCA dataset was analyzed using the Wilcoxon rank-sum test (p < 0.05), and the results were visualized using boxplots generated with the ggplot2 package (v3.4.1)81.

Prognostic modeling and assessment

Based on these findings, a risk score was calculated for each patient using the following formula:

$$\text{Riskscore} = \sum_{i=1}^{n} \text{coef}(\text{gene}_i) \times \text{expr}(\text{gene}_i)$$

In this formula, “risk score” represents the composite prognostic score, where “coef” refers to the regression coefficient of each prognostic gene, and “expr” corresponds to the gene’s expression level. Using the calculated risk scores, the 36 TCGA-ICCA samples were classified into high- and low-risk groups based on an optimal cut-off point. To evaluate the prognostic model’s effectiveness, risk score distribution, survival status, and expression patterns of the selected prognostic genes were compared between the two subgroups. KM survival curves were generated using the survminer package (v0.4.9) to assess differences in overall survival between the two risk groups. Model performance was further assessed by constructing receiver operating characteristic (ROC) curves with the survivalROC package (v1.18.0)82. For external validation, risk scores for 30 ICCA samples in the GSE107943 dataset were computed and similarly divided into high- and low-risk groups using the same cut-off value. The robustness of the model was validated through analyses of risk score distribution, survival outcomes, gene expression profiles, KM survival comparisons, and ROC curve-based predictive evaluation.
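A minimal sketch of the risk score and the resulting stratification is given below in plain Python. The gene names, coefficients, expression values, and cut-off are all hypothetical placeholders and are not the fitted values from this study. Assigning samples strictly above the cut-off to the high-risk group is also an assumption.

```python
def risk_score(coefs, expr):
    """Sum of coefficient * expression over the prognostic genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def stratify(score, cutoff):
    """Label a sample as high- or low-risk relative to the cut-off."""
    return "high" if score > cutoff else "low"


# Hypothetical regression coefficients for two placeholder genes.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical expression values for two samples.
samples = {
    "patient_1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "patient_2": {"GENE_A": 4.0, "GENE_B": 2.0},
}

cutoff = 1.0  # placeholder; the study derives an optimal cut-off from the data

scores = {name: risk_score(coefs, expr) for name, expr in samples.items()}
groups = {name: stratify(score, cutoff) for name, score in scores.items()}
print(scores, groups)
```

For external validation, the same coefficients and the same cut-off would be reused on the new cohort rather than refitted, which is what the text describes for GSE107943.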

Gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) of prognostic genes

To further explore functional and pathway-level distinctions associated with prognostic genes in ICCA, GSEA was conducted using the clusterProfiler package (v4.7.1.3), referencing the C2: KEGG curated gene sets from the Molecular Signatures Database (MSigDB) (https://www.gsea-msigdb.org/gsea/msigdb), with a significance threshold of p < 0.05. Following this, gene set variation analysis (GSVA) was performed on 36 TCGA-ICCA samples using the GSVA package (v1.46.0)83 to identify differentially enriched pathways between high- and low-risk groups, based on HALLMARK and KEGG gene sets, with criteria set to |t| > 2 and p < 0.05.

Immune microenvironment analysis

The ssGSEA method was applied to quantify the infiltration levels of 28 immune cell types within the high- and low-risk groups. Statistical comparisons of immune cell infiltration between these groups were conducted using the Wilcoxon rank-sum test, and the results were visualized through boxplots. Immune cell subsets showing significant differences between groups were defined as differentially infiltrated immune cells. To investigate potential relationships between prognostic genes and these differential immune cell subsets, Spearman correlation analysis was performed using the psych package (v3.4.4)84, and the correlation results were visualized as a heatmap.

Construction of prognostic gene regulatory networks

To examine the upstream regulatory mechanisms of the prognostic genes, TFs potentially regulating these genes were predicted via the NetworkAnalyst platform (https://www.networkanalyst.ca/). The resulting TF–gene interaction network was then assembled and visualized using Cytoscape software (v3.10.0)85.

The GeneMANIA database (http://www.genemania.org) was used to identify genes functionally associated with the prognostic genes. A network of interactions between the prognostic genes and their predicted associated genes was mapped through the platform.

Analysis of drug sensitivity

In this study, the pRRophetic package (v0.5)86 was used to predict the half-maximal inhibitory concentrations (IC50) of standard chemotherapeutic drugs across 36 ICCA samples from the TCGA-ICCA dataset. Differences in estimated drug sensitivities between high- and low-risk groups were assessed for statistical significance using the Wilcoxon rank-sum test (p < 0.05). Drug sensitivity results were visualized using HIPLOT (https://hiplot.com.cn/).

Cell trajectory analysis

To investigate the role of the prognostic genes within fibroblasts, fibroblasts were classified into distinct subpopulations following UMAP dimensionality reduction. Additionally, the Monocle package (v2.26.0)87 was employed to study fibroblast differentiation and the distribution and enrichment of prognostic genes along the differentiation trajectory.

Reverse transcription-quantitative polymerase chain reaction (RT-qPCR)

Total RNA was isolated from 10 tissue specimens using TRIzol reagent: samples 1–5 were ICCA tumor tissues and samples 6–10 were matched adjacent normal tissues. All specimens were obtained from patients who underwent curative (R0) resection; the tumor/adjacent pairs for cases 1–5 derived from a 70-year-old woman, a 68-year-old man, a 62-year-old man, a 54-year-old man, and a 63-year-old woman, respectively. All tissues were freshly resected intraoperatively rather than archived. All samples were collected at the Department of Hepatobiliary Surgery, Second Affiliated Hospital of Kunming Medical University. This research was approved by the Ethics Committee of Kunming Medical University (approval No. PJ-2021-216). Written informed consent was obtained from all patients, and all experiments were conducted in accordance with relevant guidelines and the Declaration of Helsinki. RNA extraction followed the manufacturer’s instructions, and RNA concentration and purity were assessed using a NanoPhotometer N50 (Implen), with 1 µL of RNA used for quantification to determine the input amount for reverse transcription.

Complementary DNA (cDNA) synthesis was carried out using the SweScript First Strand cDNA Synthesis Kit (QP056, GeneCopoeia), following the manufacturer’s protocol. The synthesized cDNA was diluted 5- to 20-fold with RNase-/DNase-free distilled water. For quantitative PCR (qPCR) reactions, 3 µL of the diluted cDNA was mixed with 5 µL of 2 × Universal Blue SYBR Green qPCR Master Mix (No. 11141ES, YEASEN) and 1 µL each of forward and reverse primers (10 µM). qPCR amplification was performed on a Bio-Rad CFX96 Real-Time PCR Detection System, using 40 cycles under conditions specified in Supplementary Table S1. Primer sequences are provided in Supplementary Table S3. GAPDH was used as the reference gene. For each sample, ΔCt was calculated as ΔCt = Ct (target gene) − Ct (GAPDH). The ΔΔCt between groups was then calculated as ΔΔCt = ΔCt (treatment group) − ΔCt (control group), and the relative expression level of each target gene in the treatment group relative to the control group was obtained using the 2^(−ΔΔCt) method.
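The calculation can be followed step by step in the short worked example below, written in plain Python. The Ct values are invented for illustration and are not measurements from this study. Treating tumor tissue as the "treatment" group and adjacent normal tissue as the "control" group is an assumption based on the sample description.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt = Ct(target gene) − Ct(reference gene, here GAPDH)."""
    return ct_target - ct_reference


def relative_expression(dct_treatment, dct_control):
    """Relative expression by the 2^(−ΔΔCt) method."""
    ddct = dct_treatment - dct_control
    return 2 ** (-ddct)


# Invented Ct values for one target gene.
dct_tumor = delta_ct(ct_target=24.0, ct_reference=18.0)    # ΔCt = 6.0
dct_normal = delta_ct(ct_target=26.0, ct_reference=18.0)   # ΔCt = 8.0

fold_change = relative_expression(dct_tumor, dct_normal)   # ΔΔCt = −2.0
print(fold_change)
```

A ΔΔCt of −2 corresponds to a fourfold higher relative expression in the tumor sample, because each cycle of difference represents an approximate doubling of template.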

Statistical analysis

Statistical analyses were conducted using R software (version 4.2.3) and Cytoscape (v3.10.0). Data are expressed as mean ± standard deviation (SD). The Wilcoxon rank-sum test was used to assess differences between two groups, with p-values < 0.05 considered statistically significant.