Introduction

Liver cancer is one of the most common malignant tumors worldwide and the third leading cause of cancer-related death. In 2022, liver cancer accounted for about 865,000 new cases and 757,948 deaths1. About 90% of primary liver cancer cases are hepatocellular carcinoma (HCC), which frequently arises in patients with cirrhosis2. The main risk factors for developing HCC include hepatitis B virus (HBV) and hepatitis C virus (HCV) infection2,3,4, aflatoxin exposure, heavy alcohol intake, excess body weight, type 2 diabetes, smoking, and alcohol-associated and non-alcohol-associated steatohepatitis2,5. A broad range of treatment options is available for patients with different tumor stages, liver function, and performance status6,7. Early-stage HCC can be treated by local ablation, surgical resection, or liver transplantation. Intermediate-stage HCC can be treated curatively by percutaneous ablation, radiation, transarterial, or systemic therapies. Advanced-stage HCC can be treated with targeted therapy or immunotherapy. Despite advances in early diagnosis and treatment, the long-term prognosis remains unsatisfactory8,9. Therefore, there is an urgent need to explore potential molecular biomarkers and relevant mechanisms in order to develop reliable diagnostic and therapeutic targets for HCC.

Phase separation, or liquid-liquid phase separation (LLPS), is a ubiquitous and important mechanism that drives proteins, RNA, and other biomolecules to form membraneless condensates in cells10. These condensates participate in a range of biological processes, including chromatin organization, maintenance of genomic stability, DNA damage response and repair, RNA metabolism, and intracellular signal transduction11,12. Increasing evidence has demonstrated that aberrant formation or regulation of LLPS contributes to the initiation and progression of cancers13 and is widely linked to hallmarks of cancer, such as sustained cell proliferation, cell death resistance, telomere maintenance, DNA damage repair, cancer metabolism, and immune escape14,15,16. Understanding LLPS-related molecular biomarkers and regulatory mechanisms is therefore critical for developing novel therapeutic strategies against HCC tumorigenesis.

In this study, we identified key cells predominantly regulated by LLPS and investigated their molecular mechanisms within the tumor microenvironment (TME) of HCC using single-cell RNA sequencing (scRNA-seq). Additionally, we performed a comprehensive integration of scRNA-seq and bulk RNA-seq data to identify and validate LLPS-related prognostic biomarkers in HCC. Furthermore, we explored the functions of the hub LLPS-related genes (LLPSRGs) through in vitro experiments.

Materials and methods

Single-cell RNA-sequencing data acquisition and processing

The scRNA-seq data for 10 HCC samples and 10 normal samples were retrieved from the GSE149614 dataset in the Gene Expression Omnibus (GEO), which was generated on the GPL24676 platform17. The Seurat 4.0 package18 was used to process and analyze the scRNA-seq data. Quality control (QC) of the samples was performed using the Seurat package, retaining cells with fewer than 40,000 unique molecular identifiers (UMIs; nCount), fewer than 6,000 expressed genes, a ribosomal gene proportion below 50%, and a mitochondrial gene proportion below 15%. After QC, a total of 51,890 cells, including 29,052 cells from HCC tumor samples and 22,838 cells from normal samples, were used for further analyses.
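The four QC thresholds can be read as one combined filter applied to every cell. The following is a minimal Python sketch of that logic only, not the Seurat code used in the study; the function name `passes_qc` and the per-cell metric values are hypothetical.

```python
def passes_qc(n_umi, n_genes, pct_ribo, pct_mito):
    """Return True if one cell meets all four QC thresholds stated in the text."""
    return (
        n_umi < 40_000        # fewer than 40,000 UMIs (nCount)
        and n_genes < 6_000   # fewer than 6,000 expressed genes
        and pct_ribo < 50.0   # ribosomal gene proportion below 50%
        and pct_mito < 15.0   # mitochondrial gene proportion below 15%
    )

# Hypothetical per-cell metrics: (UMIs, expressed genes, % ribosomal, % mitochondrial)
cells = [
    (12_000, 3_000, 20.0, 5.0),   # passes every threshold
    (45_000, 5_000, 10.0, 3.0),   # too many UMIs
    (8_000, 6_500, 30.0, 2.0),    # too many expressed genes
    (30_000, 2_000, 55.0, 20.0),  # ribosomal and mitochondrial proportions too high
]
kept = [passes_qc(*cell) for cell in cells]
print(kept)
```

A cell is retained only when all four conditions hold, so failing any single threshold is enough to exclude it.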

Dimensionality reduction, clustering, and annotation

The “FindVariableFeatures” function was used to select the top 3,000 highly variable genes (HVGs). The “CellCycleScoring” function was used to calculate the cell cycle score for each phase (G1, S, and G2/M). After the data were integrated by the canonical correlation analysis (CCA) method18 using the “IntegrateData” function, the “SCTransform” function was used to perform linear regression based on the normalized gene expression, the UMI count of each cell, and the cell cycle, and the genes were then processed by principal component analysis (PCA) using the “RunPCA” function. The “FindNeighbors” and “FindClusters” functions were used to identify the cell clusters. Then, UMAP dimensional reduction was performed based on the top 30 principal components (PCs) using the “RunUMAP” function. Finally, manual cell annotation was performed using well-known marker genes17.

Calculation of the LLPS score using AUCell

A total of 3,600 LLPS-related genes (LLPSRGs) were included in this study, which were downloaded from the Liquid-Liquid Phase Separation Data Resource (DrLLPS, https://llps.biocuckoo.cn/). The AUCell algorithm19 was used to calculate the LLPS score and the area under the curve (AUC) value for each cell using the “AUCell” package, available on Bioconductor. First, AUCell ranks genes based on their expression levels in each cell. Then, an activity score for each gene set is calculated based on the top-ranked genes for that cell. Finally, cells are classified into high-LLPS and low-LLPS groups according to the median AUC value.
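The ranking, scoring, and median-split steps can be illustrated with a simplified sketch. The Python code below is an AUCell-style approximation written for illustration and is not the Bioconductor “AUCell” implementation; the function names, the `top_fraction` parameter, the tie-breaking rule, and the toy data are assumptions. Assigning cells exactly at the median to the low group is also an assumption, since the text does not specify it.

```python
import math
from statistics import median

def aucell_like_score(expression, gene_set, top_fraction=0.05):
    """Score one cell. expression maps gene -> value; gene_set is a set of genes."""
    present = set(gene_set) & set(expression)
    if not present:
        raise ValueError("no gene from gene_set is present in the expression profile")
    max_rank = max(1, math.ceil(top_fraction * len(expression)))
    # Rank genes from highest to lowest expression (ties broken by gene name).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))[:max_rank]
    # Area under the recovery curve: running count of set genes at each rank.
    found, area = 0, 0
    for gene in ranked:
        found += gene in present
        area += found
    # Best possible area: every set gene sits at the very top of the ranking.
    n_best = min(len(present), max_rank)
    best_area = sum(min(rank, n_best) for rank in range(1, max_rank + 1))
    return area / best_area

def split_by_median(scores):
    """Label each score 'high' if it exceeds the median, otherwise 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Toy data: 4 cells, 10 genes, and a 2-gene set standing in for the LLPSRGs.
genes = [f"g{i}" for i in range(10)]
llps_set = {"g0", "g1"}
profiles = [
    [10, 9, 1, 2, 3, 4, 5, 6, 7, 8],   # both set genes ranked first
    [0, 0, 1, 2, 3, 4, 5, 6, 7, 8],    # neither set gene in the top ranks
    [10, 0, 1, 2, 3, 4, 5, 6, 7, 8],   # one set gene ranked first
    [0, 6.5, 1, 2, 3, 4, 5, 6, 7, 8],  # one set gene ranked third
]
scores = [
    aucell_like_score(dict(zip(genes, p)), llps_set, top_fraction=0.5)
    for p in profiles
]
groups = split_by_median(scores)
print(scores, groups)
```

Because the score depends only on within-cell gene ranks, it is insensitive to differences in sequencing depth between cells, which is the main reason rank-based scoring is attractive for single-cell data.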

Cell-to-cell communication

The “CellChat” package (version 1.5.0)20 was used to identify ligand-receptor interactions and to construct the intercellular communication networks between the cell cluster with the highest LLPS score and the other cell clusters, as well as between hepatocytes in the high-LLPS and low-LLPS groups and the other cell clusters. The “CellChatDB.human” database was used to assess the major signaling inputs and outputs among all cell clusters. Then, the “computeCommunProb”, “computeCommunProbPathway”, and “aggregateNet” functions were used to infer the intercellular communication networks. Dot plots showed the interactions between distinct cell clusters by putative ligand–receptor pairs, and hierarchical plots visualized the interactions between distinct cell clusters by putative signaling pathways.

Trajectory analysis

The “Monocle 2” package (v.2.0)21 was used to construct the differentiation trajectory of the cell cluster with the highest LLPS score. First, the top 2,000 HVGs were selected as feature genes and used to order the cells. Dimensionality reduction was performed by the reversed graph embedding algorithm using the “reduceDimension” function. Then, the cell trajectory was constructed and visualized using the “plot_cell_trajectory” function. The DDRTree algorithm was used to calculate pseudotime along cell differentiation and transformation. Branch expression analysis modeling (BEAM) was used to identify the key genes that separate cells into different branches. These genes were then divided into groups and visualized using the “plot_genes_branched_heatmap” function. The “clusterProfiler” package was used to perform functional enrichment for each cluster. We also evaluated the direction of differentiation by Pearson correlation analysis between tumor-related functions and pseudotime. In addition, the correlations between the LLPS score and the direction of differentiation, and between the LLPS score and tumor-related functions, were assessed by Pearson correlation analysis.
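For readers unfamiliar with the correlation step, the Pearson coefficient between two per-cell quantities (for example, pseudotime and LLPS score) can be computed as in the sketch below. This is a plain-Python illustration under stated assumptions, not the code used in the study; the function name `pearson_r` and the toy values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-cell values: pseudotime and LLPS score for five cells.
pseudotime = [0.0, 1.0, 2.0, 3.0, 4.0]
llps_score = [0.10, 0.12, 0.15, 0.14, 0.20]
r = pearson_r(pseudotime, llps_score)
print(round(r, 3))
```

A coefficient close to 1 indicates that the score rises roughly linearly with pseudotime; it does not by itself establish a causal or temporal relationship.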

Bulk RNA-sequencing data acquisition and identification of the differentially expressed genes (DEGs) between HCC tumoral and non-tumoral samples in bulk RNA-seq

The bulk RNA-seq data, somatic mutation data, and corresponding clinical information for the GSE14520 dataset, which included 225 HCC tumor tissues and 220 paired non-tumor tissues, were obtained from the GEO database. The bulk RNA-seq data and corresponding clinical information for 371 HCC tumor tissues and 50 normal tissues (TCGA-LIHC cohort) were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/). The “limma” package22 was used to identify the DEGs between HCC tumor and non-tumor tissues in the GSE14520 dataset according to the thresholds of |log2 (fold change, FC)| > 0.5 and p-value < 0.05.
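The DEG thresholds amount to a two-condition filter on each gene's test result. The sketch below shows that filter in Python on a hypothetical result table; it does not reproduce the limma model fitting, and the function name `filter_degs` and the gene names are placeholders.

```python
def filter_degs(results, lfc_cutoff=0.5, p_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    results is an iterable of (gene, log2_fold_change, p_value) tuples.
    """
    up, down = [], []
    for gene, log2fc, p_value in results:
        # A gene is a DEG only if both the effect size and the p-value pass.
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return up, down

# Hypothetical differential expression results.
results = [
    ("GENE_A", 1.2, 0.001),   # upregulated DEG
    ("GENE_B", -0.8, 0.01),   # downregulated DEG
    ("GENE_C", 0.3, 0.0001),  # significant, but fold change too small
    ("GENE_D", 2.0, 0.2),     # large fold change, but not significant
]
up, down = filter_degs(results)
print(up, down)
```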

Screening LLPSRGs and exploration of their biological functions

The LLPSRGs were obtained by intersecting the 3,086 DEGs from the GSE14520 dataset, the 1,737 DEGs with |log (FC)| > 0.1 and adjusted p-value < 0.05 in the cell cluster with the highest LLPS score from the scRNA-seq data, and the 3,600 LLPSRGs from the DrLLPS database. The “clusterProfiler” package23 was used to explore the biological functions and related pathways of these intersected LLPSRGs, including Gene Ontology (GO) function (biological process, BP; cellular component, CC; molecular function, MF) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses24,25,26. In addition, pathway analyses were also performed based on the Hallmark gene set from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb).
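The three-way intersection can be expressed directly with set operations. The sketch below uses small hypothetical gene lists in place of the bulk DEGs, the scRNA-seq DEGs, and the DrLLPS gene list; the variable names and gene names are placeholders.

```python
# Hypothetical stand-ins for the three gene lists described in the text.
bulk_degs = {"GENE_A", "GENE_B", "GENE_C"}
single_cell_degs = {"GENE_B", "GENE_C", "GENE_D"}
drllps_genes = {"GENE_B", "GENE_C", "GENE_E"}

# Keep only genes that appear in all three lists.
intersected = sorted(bulk_degs & single_cell_degs & drllps_genes)
print(intersected)
```

Sorting the result gives a reproducible order for downstream enrichment analysis, since Python sets are unordered.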

Univariate Cox and LASSO regression analyses

The TCGA-LIHC cohort was used as the training set. Univariate Cox regression analysis was performed using the “survival” package to identify genes significantly correlated with survival (p-value < 0.05). The selected genes were incorporated into the least absolute shrinkage and selection operator (LASSO) regression analysis using the “glmnet” package27. The “ggrisk” package28 was used to construct a prognostic model, and the risk score for each sample was calculated according to the formula \(\text{risk score} = \sum_{i=1}^{n}\left(exp_i \times \beta_i\right)\), where \(exp_i\) represents the expression of gene \(i\) and \(\beta_i\) represents the coefficient of that gene. All HCC patients were stratified into high-risk and low-risk groups based on the median value of the risk score. Differences in overall survival between the high-risk and low-risk groups were assessed by Kaplan-Meier curves and the log-rank test, and survival curves were drawn using the “survminer” package29. The “timeROC” package30 was used to draw time-dependent receiver operating characteristic (ROC) curves for 1, 3, and 5 years. Moreover, the GSE14520 dataset was used as an external validation cohort to validate the prognostic model. The correlation between the risk score and clinicopathological features was also determined.
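The risk score is a weighted sum of expression values, followed by a median split. The sketch below works through that arithmetic in Python with hypothetical gene names, coefficients, and expression values; none of the numbers are taken from the study. Placing patients exactly at the median in the low-risk group is an assumption.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Weighted sum of expression values: sum over genes of exp_i * beta_i."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())

def stratify(scores):
    """Label each patient 'high' if the score exceeds the median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Hypothetical LASSO coefficients and per-patient expression values.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.2, "GENE_C": 0.1}
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
    {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": 2.0},
    {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 1.0},
    {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 4.0},
]
scores = [risk_score(p, coefficients) for p in patients]
groups = stratify(scores)
print(scores, groups)
```

Genes with positive coefficients raise the score as their expression increases, while genes with negative coefficients lower it.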

Developing a predictive nomogram

A nomogram is a widely used tool for estimating prognosis in oncology and medicine by integrating various prognostic and determinant variables. The development of a predictive nomogram is a multidisciplinary process that requires expertise in statistics, clinical research, and domain-specific knowledge to ensure the model is both accurate and clinically useful31. The workflow includes data collection, identification of variables, construction of regression models, and evaluation and validation of the model. Univariate and multivariate Cox regression analyses were performed on clinical characteristics (T/N/M pathological stages, tumor stages, and age) and the risk score to identify independent risk factors for HCC patients using the “survival” package32. Then, a predictive nomogram was constructed using the “rms” package33. Calibration curves were drawn to assess the reliability of the nomogram, and decision curves were used to evaluate its sensitivity.

Gene set variation analysis (GSVA) and tumor mutation analysis

Single-sample gene set enrichment analysis (ssGSEA) was used for pathway enrichment analysis with the “GSVA” package34, and the “limma” package was used to identify pathways significantly enriched between the high-risk and low-risk groups according to an adjusted p-value < 0.05. A log FC > 0 represents significant enrichment in the high-risk group, and a log FC < 0 indicates significant enrichment in the low-risk group. The “maftools” package35 was used to compare tumor mutations between the high-risk and low-risk groups.

Exploration of the potential drugs for risk gene signature and molecular docking

Potential drugs targeting the prognostic signature genes were retrieved from the Drug Gene Interaction Database (DGIdb, https://www.dgidb.org/)36. The 3D structures of the potential drugs were obtained from PubChem (https://pubchem.ncbi.nlm.nih.gov/)37, and the 3D structures of the prognostic signature gene products were obtained from the Protein Data Bank (PDB, https://www.rcsb.org/) archive38. Then, molecular docking was performed using AutoDockTools software39 and visualized using PyMOL software40. The structure with the lowest binding free energy was taken to indicate the highest binding affinity.

GeneMANIA and functional similarity analysis

The interconnections among genes were determined using GeneMANIA (http://genemania.org/)41; the top 20 genes were identified as key nodes, and the top 5 related pathways were identified. Then, the “GOSemSim” package42 was used to perform functional similarity analysis of the prognostic genes.

Validation of the expression of risk signature genes in scRNA-seq and bulk RNA-seq data

The survival status of patients in the high-risk and low-risk groups was compared across neoplasm histological grades, tumor stages, and fibrosis scores. The differential expression of the prognostic signature genes was assessed by the Wilcoxon test. The association between the expression of the prognostic signature genes and survival was also explored and validated by ROC curves. Moreover, we explored the expression of the prognostic genes in different cell clusters.

Cell line and cell transfection

The HepG2 and Huh7 cells were obtained from Yunnan Wosai Biotechnology Co., Ltd. and were grown in high-glucose Dulbecco’s Modified Eagle Medium (DMEM; Invitrogen, CA, USA) supplemented with 10% fetal bovine serum (FBS; Invitrogen) and 1% penicillin/streptomycin at 37 °C and 5% CO2 in a humidified incubator. The identity of all cell lines was verified before the experiments. Cells were then plated in 6-well plates (1 × 10^5 per well) and incubated at 37 °C for 24 h. The cells were transiently transfected with LGALS3 small interfering RNA (siRNA; Genomeditech Co., Ltd., Shanghai, China) or siRNA-con (Genomeditech Co., Ltd., Shanghai, China) at a final concentration of 50 nM using the riboFECT CP Transfection Kit (cat. C10511-05, RiboBio Co., Ltd., Guangzhou, China) according to the manufacturer’s protocol. Subsequent analysis of the different cell groups was performed after incubation for 48 h.

RNA isolation and real-time polymerase chain reaction (RT-PCR) analysis

Trizol reagent (Life Technologies, CA, USA) was used to isolate total RNA from cells according to the manufacturer’s protocol. Aliquots were converted to cDNA using the FastKing RT Kit (With gDNase) (TIANGEN, Beijing, China). RNA expression was quantified by qRT-PCR, which was performed with Taq Pro Universal SYBR qPCR Master Mix (Vazyme, Nanjing, China) on an ABI 7500 real-time PCR system (Applied Biosystems, CA, USA). Each experiment was performed in triplicate. GAPDH was used to normalize the expression levels of LGALS3. The results were analyzed with the 2^−ΔΔCt method. The primers used were as follows: GAPDH, forward, 5’-TTGCCCTCAACGACCACTTT-3’, and reverse, 5’-TGGTCCAGGGGTCTTACTCC-3’; LGALS3, forward, 5’-AACAACAGGAGAGTCATT-3’, and reverse, 5’-AACAAGTGAGCATCATTC-3’.
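The relative-expression calculation used for qRT-PCR data can be illustrated with a short worked example. The Python sketch below applies the standard ΔΔCt arithmetic to hypothetical Ct values; the function name `relative_expression` and all Ct values are illustrative and are not measurements from this study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene (treated vs. control), reference-normalized."""
    # Delta Ct: target gene Ct minus reference gene Ct, within each group.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Delta-delta Ct: difference between the treated and control groups.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: target gene and GAPDH in siRNA-treated and control cells.
fold_change = relative_expression(
    ct_target_treated=26.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold_change)
```

In this toy case the target gene crosses the threshold two cycles later in the treated cells at equal GAPDH levels, which corresponds to one quarter of the control expression.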

Scratch wound healing assay

Cell migration ability was assessed using a scratch wound healing assay. After transfection, 1 × 10^6 cells were collected, seeded into six-well plates, and grown to 80–90% confluence. Scratches were made in the cell monolayer using a 200 µL pipette tip (Qiagen, CA, USA). Non-adherent cells were removed, serum-free medium was added, and the cells were incubated at 37 °C for 24 h and 48 h. Wound closure by migrating cells was observed and imaged under a light microscope (Leica, Wetzlar, Germany).

Transwell assay

Cell invasion was investigated using Transwell chambers coated with Matrigel (Corning, Corning, NY). A total of 1 × 10^5 cells in serum-free medium were added to the upper chamber, and the lower chamber was supplied with complete medium. After 48 h of incubation, cells on the lower chamber were fixed and stained with crystal violet. Invading cells were counted in 3 random fields and imaged under a microscope (Olympus, Tokyo, Japan).

Statistical analysis

All experimental data are expressed as mean ± standard deviation (SD). Significant differences were determined by two-tailed Student’s t test using GraphPad Prism 8 (GraphPad Software Inc., La Jolla, CA). P < 0.05 was considered statistically significant.

Results

Exploration of the cell composition in primary liver tumors

The flowchart of this study is shown in Fig. 1. The scRNA-seq data for 10 HCC samples and 10 normal samples from the GSE149614 dataset were obtained and processed. After quality control, a total of 51,890 cells were used for subsequent analysis (Figure S1A-B). The cells were then grouped into 22 clusters after integration and clustering (Figure S1C-E, Fig. 2A-B). Afterwards, 15 distinct cell types were identified based on marker gene expression17 (Fig. 2C), including malignant hepatocytes, non-malignant hepatocytes, fibroblasts, endothelial cells, CD4+ T cells, Tregs, CD8+ T cells, natural killer cells, macrophages, monocytes, plasmacytoid dendritic cells (pDC), lymphoid dendritic cells (DCs), mast cells, plasma B cells, and mature B cells (Fig. 2D-E). Among these, malignant hepatocytes, fibroblasts, Tregs, macrophages, mast cells, and mature B cells had high cell numbers in tumor samples (Fig. 2F).

Fig. 1

Workflow of this study.

Fig. 2

Exploration of the cell composition in primary liver tumors. (A) UMAP plots showing 22 cell clusters after dimensionality reduction. Different colors represent distinct cell clusters. (B) UMAP plots illustrating the distribution of cell clusters between tumor and normal samples. Different colors represent distinct cell clusters in both tumor and normal samples. (C) Dot plots displaying the expression of the top 3 marker genes across the 15 cell types. (D) UMAP plots of 15 distinct cell clusters. Different colors represent the various annotated cell types. (E) UMAP plots showing the distribution of 15 distinct cell clusters between tumor and normal samples. Different colors represent the annotated cell types in both tumor and normal samples. (F) Histogram presenting the ratio of different cell clusters in tumor versus normal samples.

Malignant hepatocytes exhibit the highest LLPS scores

To investigate the impact of phase separation on different cell types, AUCell was employed to calculate the LLPS score for each cell. The LLPS score was notably higher in malignant hepatocytes (Fig. 3A). Each cell was then assigned an AUC value based on its LLPS activity, and cells were categorized into high-LLPS and low-LLPS groups according to the median AUC value (Fig. 3B). As illustrated in Fig. 3C-D, malignant hepatocytes, fibroblasts, and endothelial cells exhibited elevated LLPS scores, with the score in malignant hepatocytes being significantly higher than in non-malignant hepatocytes (Fig. 3E). These findings suggest that LLPS may drive several key biological processes in malignant hepatocytes.

Fig. 3

Malignant hepatocytes exhibit the highest LLPS scores. (A) UMAP plots showing LLPS scores in individual cells. (B) UMAP plots depicting cells grouped by high and low LLPS scores. (C) Violin plot comparing LLPS scores across all cells. (D) Histogram showing the ratio of different cell clusters in high-LLPS and low-LLPS score groups. (E) Violin plot comparing LLPS scores between high-LLPS and low-LLPS score groups.

Interactions between malignant hepatocytes and other cell populations

We further explored cell-cell communication in relation to LLPS, focusing on specific signaling pathways and ligand-receptor interactions. Our primary focus was on the interactions between malignant hepatocytes and other cell types (Fig. 4A). Compared to non-malignant hepatocytes, malignant hepatocytes exhibited stronger connections with other cell populations (Fig. 4B). Notably, malignant hepatocytes predominantly communicated through 12 signaling pathways (Fig. 4C), including Macrophage Migration Inhibitory Factor (MIF), VISFATIN, Midkine (MK), COMPLEMENT, Protease-Activated Receptors (PARs), Chemokine (C-C motif) Ligand (CCL), Angiopoietin-like (ANGPTL), Vascular Endothelial Growth Factor (VEGF), Calcitonin Receptor (CALCR), Pleiotrophin (PTN), PROS, and Growth/Differentiation Factor (GDF).

Fig. 4

Interactions between malignant hepatocytes and other cell populations. (A) Number of interactions in the cell-cell communication network between malignant hepatocytes and other cell types. (B) Circle plots illustrating the intensity of outgoing interactions from malignant and non-malignant hepatocytes to other cell types. (C) Heatmap showing enriched signaling interaction intensities between hepatocytes and other cell types. (D) Number of interactions in the cell-cell communication network between hepatocytes with high-LLPS and low-LLPS scores. (E) Circle plots depicting the intensity of outgoing interactions from high-LLPS and low-LLPS score hepatocytes to other cell types. (F) Heatmap of enriched outgoing signaling interaction intensities for high-LLPS and low-LLPS score hepatocytes with other cell types. (G) Dot plots displaying enriched outgoing ligand-receptor interaction intensities for high-LLPS and low-LLPS score hepatocytes with other cell types. (H) Cell-cell communication hierarchy diagram for the EGF and MIF signaling pathways. Solid circles and hollow circles represent source cells and target cells, respectively. The size of each circle is proportional to the number of cells in each group, and the thickness of the lines indicates the strength of the interaction signals. In the left panel, the “target” in the center represents the selected target cells, while the right panel shows another group of cells selected as target cells, with their interactions observed by placing them in the center.

Furthermore, hepatocytes were divided into high-LLPS and low-LLPS groups, and we examined their communication with other cell types (Fig. 4D). As depicted in Fig. 4E, hepatocytes in the high-LLPS group exhibited stronger interactions with other cell populations. These high-LLPS hepatocytes predominantly utilized 15 signaling pathways (Fig. 4F), including MIF, CXCL, VISFATIN, MK, COMPLEMENT, PARs, CCL, ANGPTL, EGF, VEGF, CALCR, PTN, ANGPT, PROS, and GDF. Ligand-receptor interactions are detailed in Fig. 4G, where ligand gene expression was higher in high-LLPS hepatocytes compared to low-LLPS ones.

Importantly, we identified key roles for EGFR-EREG and EGFR-AREG in the EGF signaling pathway network, as well as for MIF-CD44 and MIF-CXCR4 in the MIF signaling pathway, particularly in the high-LLPS context (Fig. 4H). These findings underscore the critical role of LLPS in modulating signaling networks in malignant hepatocytes.

Trajectory analysis of the malignant hepatocytes

We used Monocle 2 to investigate the differentiation trajectory of malignant hepatocytes, the cell type with the highest LLPS scores. As shown in Fig. 5A-B, malignant cells were classified into three distinct cell states (1, 2, and 3). The LLPS scores increased progressively across these cell states (Fig. 5C). Pearson correlation analysis further revealed a positive association between LLPS scores and pseudotime (Fig. 5D). Epithelial-mesenchymal transition (EMT), angiogenesis, E2F targets, and MYC targets are recognized hallmarks of cancer43. We observed that tumor-related functions such as EMT and angiogenesis were positively correlated with both pseudotime and LLPS scores (Fig. 5E). E2F and MYC targets also showed positive associations with pseudotime and LLPS scores (Fig. 5E-G). These findings suggest that the LLPS score is linked to the malignant differentiation of hepatocytes.

Fig. 5

Trajectory analysis of the malignant hepatocytes. (A) Visualization of cell development time for hepatocytes in HCC. (B) Distribution of different cell states in the trajectory of hepatocytes in HCC. (C) Distribution of different cell states in hepatocytes with varying LLPS scores. (D, F, G) Scatter plots illustrating the correlations between (D) LLPS score and pseudotime, (F) LLPS score and E2F targets, and (G) LLPS score and MYC targets. (E) Heatmap showing the correlation between biological functions and pseudotime in relation to the LLPS score.

Identification of the LLPSRGs by integrating scRNA-seq and bulk RNA-seq data

To identify key LLPSRGs and explore their biological functions, we integrated scRNA-seq and bulk RNA-seq data. From the GSE14520 dataset, we identified a total of 3,086 differentially expressed genes (DEGs) between HCC tumor tissues and normal tissues, with 1,673 upregulated and 1,413 downregulated genes (Fig. 6A-B, Table S1). Additionally, 1,737 DEGs were obtained from scRNA-seq data (Table S2), and 3,600 LLPSRGs were sourced from the DrLLPS database. By intersecting DEGs from both bulk RNA-seq and scRNA-seq data with LLPSRGs from the DrLLPS database, we identified 207 differentially expressed LLPSRGs (Table S3, Fig. 6C). GO enrichment analysis revealed that these genes were associated with several biological processes, including protein folding, RNA localization, and focal adhesion. They were also linked to cellular components such as cell-substrate junctions, pigment granules, and melanosomes, and were involved in unfolded protein binding (Fig. 6D). KEGG enrichment analysis indicated that these genes were primarily involved in metabolism-related pathways, including fatty acid metabolism, the pentose phosphate pathway, arginine and proline metabolism, glycolysis/gluconeogenesis, and glutathione metabolism. They were also related to the cell cycle pathway (Fig. 6E). Furthermore, these genes were significantly associated with the MYC targets signaling pathway (Fig. 6F). These findings suggest that LLPSRGs play a role in tumorigenesis and progression-related pathways in HCC.

Fig. 6

Identification of the LLPSRGs by integrating scRNA-seq and bulk RNA-seq data. (A) Volcano plot showing differentially expressed genes (DEGs) between HCC tumor and normal samples in the GSE14520 dataset. (B) Heatmap of the top 10 DEGs between HCC tumor and normal samples in the GSE14520 dataset. (C) Venn diagram showing intersected LLPSRGs from scRNA-seq DEGs, bulk RNA-seq DEGs, and LLPSRGs from the DrLLPS database. (D–F) GO, KEGG, and Hallmark pathway enrichment of the intersected LLPSRGs.

Construction and validation of a LLPS-related risk model for HCC

We identified 37 LLPS-related genes associated with survival using univariate Cox regression analysis (Fig. 7A). Using the TCGA-LIHC cohort as the training set, we constructed a prognostic model and identified 10 LLPS-related signature genes with optimal parameters: HSPD1, ENO1, TARS, RANBP1, MARCKSL1, G6PD, LGALS3, KLF4, HAGH, and GOT2 (Fig. 7B). The risk score was calculated based on the expression and coefficient of each signature gene, and patients were categorized into high-risk and low-risk groups based on the median risk score (Fig. 7C). Patients in the high-risk group exhibited significantly poorer overall survival (OS) compared to those in the low-risk group (Fig. 7D). The area under the curve (AUC) values for 1, 3, and 5 years were 0.690, 0.666, and 0.681, respectively, indicating moderate predictive ability of the prognostic model (Fig. 7E). The external cohort demonstrated similar trends (Fig. 7F-H). Additionally, we investigated the correlation between the LLPS score and clinicopathological features. As shown in Figure S2A-D, the LLPS score was associated with advanced neoplasm histological grade, advanced T stage, advanced tumor stage, and high fibrosis score. Furthermore, patients with high-risk scores exhibited poorer OS across different neoplasm histological grades compared to those with low-risk scores (Figure S2E). Similar trends were observed for patients with high-risk scores in different tumor stages (Figure S2F), T stages (Figure S2G), and fibrosis scores (Figure S2H). These findings suggest that LLPS-related risk is linked to poor prognosis in HCC patients.

Fig. 7

Construction and validation of a LLPS-related risk model for HCC. (A) Forest plot showing survival-related LLPSRGs from univariate Cox analysis. (B) LASSO regression of 37 genes with cross-validation for parameter optimization. (C) Distribution of risk score and survival status, with a heatmap of gene expression in the TCGA-LIHC cohort. (D) Kaplan-Meier survival curves based on risk scores in the TCGA-LIHC cohort. (E) ROC curves showing the AUC for predicting OS at 1, 3, and 5 years in the TCGA-LIHC cohort. (F) Distribution of risk score and survival status, with a heatmap of gene expression in the GSE14520 cohort. (G) Kaplan-Meier survival curves based on risk scores in the GSE14520 cohort. (H) ROC curves showing the AUC for predicting OS at 1, 3, and 5 years in the GSE14520 cohort.

Development of a predictive nomogram

We incorporated the clinicopathological features (age, T/N/M stage, tumor stage) and the risk score into univariate and multivariate Cox regression analyses to identify independent prognostic factors (P < 0.05, Fig. 8A-B). T stage, tumor stage, and risk score were identified as independent prognostic factors and used to construct a predictive nomogram (Fig. 8C). The calibration and decision curves demonstrated the reliability and sensitivity of the nomogram (Fig. 8D-E).

Fig. 8

Development of a predictive nomogram. (A,B) Forest plots showing independent variables from univariate and multivariate Cox analyses. (C) A nomogram for predicting 1-, 3-, and 5-year survival in HCC patients based on the LLPS-related prognostic gene risk score and clinicopathologic features. (D) Calibration curves showing agreement between predicted and observed 3-year survival rates. (E) Decision curves showing the benefit of the prognostic nomogram.

LLPS-related functional enrichment and tumor mutation analysis

We investigated pathways associated with the LLPS-related risk score. As shown in Fig. 9A-B, cell cycle-related pathways, such as the G2M checkpoint and E2F targets, were predominantly enriched in the high-risk group. Additionally, stemness-related pathways, including MYC targets v1 and v2, were significantly enriched in the high-risk group. Conversely, metabolism-related pathways, such as fatty acid metabolism, xenobiotic metabolism, and bile acid metabolism, were primarily enriched in the low-risk group. Furthermore, we compared tumor mutations between the high-risk and low-risk groups. We observed a higher frequency of mutations in TP53, TTN, and CTNNB1 in HCC (Fig. 9C). Notably, TP53 mutations were found to be significantly associated with the LLPS-related risk score, whereas no such correlation was observed for TTN and CTNNB1 (Fig. 9C-D). Furthermore, the high-risk group exhibited a greater number of in-frame insertions/deletions, missense mutations, multi-hit mutations, and truncating mutations in TP53 compared to the low-risk group (Fig. 9E).

Fig. 9

LLPS-related functional enrichment and tumor mutation analysis. (A,B) GSVA showing enriched pathways associated with the LLPS-related risk score. (C) Waterfall plot displaying the frequency of mutations in several genes. (D) Differences in mutation frequency of TP53 between the high-risk and low-risk groups. (E) Histogram showing the ratio of in-frame insertions/deletions, missense mutations, multi-hit mutations, and truncating mutations in TP53 in relation to the LLPS-related risk score.

Prediction of the potential drugs for LLPS-related gene signature

Based on the identified prognostic signature genes, we predicted potential therapeutic agents using the DGIdb database. We identified 33 potential agents (Table S4, Fig. 10A). The 3D structures of these agents and the prognostic signature genes were retrieved from the PubChem and PDB databases, respectively. Molecular docking results indicated that LGALS3 had a strong interaction with Olitigaltin (free energy: -4.6 kcal/mol) and Lactose (free energy: -4.7 kcal/mol). G6PD demonstrated strong interactions with Sitamaquine (free energy: -4.13 kcal/mol), Phenazopyridine (free energy: -5.4 kcal/mol), Sulfanilamide (free energy: -5.01 kcal/mol), Pamaquine (free energy: -4.3 kcal/mol), Sodium ascorbate (free energy: -3.5 kcal/mol), and Co-Trimoxazole (free energy: -5.81 kcal/mol) (Fig. 10B-I).

Fig. 10

Prediction of the potential drugs for LLPS-related gene signature. (A) Network of prognostic signature genes and corresponding agents. (B,C) Molecular docking of LGALS3 with Olitigaltin and Lactose. (D–I) Molecular docking of G6PD with Sitamaquine, Phenazopyridine, Sulfanilamide, Pamaquine, Sodium ascorbate, and Co-Trimoxazole.
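
To make the docking results easier to compare, the reported free energies can be ranked per target. The sketch below uses only the values stated in the text for Fig. 10B-I; the function name `rank_by_target` is a hypothetical helper introduced here. A more negative free energy indicates a more favorable predicted binding pose, which by itself does not establish biological activity.

```python
# Binding free energies (kcal/mol) as reported in the text for Fig. 10B-I.
docking_energies = {
    ("LGALS3", "Olitigaltin"): -4.6,
    ("LGALS3", "Lactose"): -4.7,
    ("G6PD", "Sitamaquine"): -4.13,
    ("G6PD", "Phenazopyridine"): -5.4,
    ("G6PD", "Sulfanilamide"): -5.01,
    ("G6PD", "Pamaquine"): -4.3,
    ("G6PD", "Sodium ascorbate"): -3.5,
    ("G6PD", "Co-Trimoxazole"): -5.81,
}


def rank_by_target(energies):
    """Group ligands by target and sort them from most to least negative energy."""
    ranked = {}
    for (target, ligand), energy in energies.items():
        ranked.setdefault(target, []).append((ligand, energy))
    for ligands in ranked.values():
        ligands.sort(key=lambda pair: pair[1])
    return ranked


ranking = rank_by_target(docking_energies)
for target, ligands in ranking.items():
    for ligand, energy in ligands:
        print(f"{target}\t{ligand}\t{energy:.2f} kcal/mol")
```

Under this ordering, Co-Trimoxazole has the most negative reported energy for G6PD and Lactose the most negative for LGALS3.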

Validation of the expression of risk signature genes at single-cell and transcriptomic levels

To validate the expression of risk signature genes, we analyzed data from the GSE14520 dataset. We observed increased levels of ENO1, G6PD, HSPD1, LGALS3, MARCKSL1, RANBP1, and TARS in HCC tumor tissues compared to normal tissues. Conversely, GOT2, HAGH, and KLF4 were expressed at lower levels in HCC tumor tissues (Fig. 11A). Additionally, in the TCGA-LIHC cohort, high expression of ENO1, G6PD, HSPD1, LGALS3, MARCKSL1, RANBP1, and TARS was associated with poorer overall survival (OS), while high expression of GOT2 and HAGH was linked to better OS (Fig. 11B). ROC curves demonstrated the predictive ability of these genes, except for GOT2 and HAGH (Fig. 11C). Furthermore, we confirmed the expression of these genes in hepatocytes (Fig. 11D). In summary, seven LLPS-related genes, ENO1, G6PD, HSPD1, LGALS3, MARCKSL1, RANBP1, and TARS, were selected for further analysis. GeneMANIA results showed that these genes interact with other genes (Fig. 11E). Functional similarity analysis revealed that LGALS3 and G6PD had the highest similarity scores (Fig. 11F), indicating that they are central hub genes.

Fig. 11

Validation of the expression of risk signature genes at single-cell and transcriptomic levels. (A) Expression of ENO1, G6PD, GOT2, HAGH, HSPD1, KLF4, LGALS3, MARCKSL1, RANBP1, and TARS in HCC tumor and normal tissues in the GSE14520 cohort. (B) Kaplan-Meier survival curves based on high and low expression of the signature genes in the TCGA-LIHC cohort. (C) ROC curves showing the AUC for predicting OS at 1, 3, and 5 years in the TCGA-LIHC cohort. (D) UMAP plots of the expression of the signature genes in different cell types. (E) Prognostic gene-protein interaction network based on GeneMANIA. (F) Functional similarity analysis by GOSemSim.
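
The survival comparison in Fig. 11B rests on splitting patients into high- and low-expression groups and estimating a survival curve for each. The following is a minimal sketch of that idea, assuming a median cutoff (the cutoff is not stated in this section) and using hypothetical toy data rather than TCGA-LIHC values. The names `kaplan_meier` and `split_by_median` are introduced here for illustration.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up durations. events: 1 if death was observed, 0 if censored.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def split_by_median(expression):
    """Label each value 'high' if above the median, otherwise 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


# Hypothetical toy cohort: expression level, follow-up in months, death indicator.
expression = [2.1, 8.4, 3.3, 9.0, 1.5, 7.2, 6.8, 2.9]
months = [60, 12, 48, 8, 72, 20, 30, 55]
died = [0, 1, 1, 1, 0, 1, 0, 0]

groups = split_by_median(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(label, curves[label])
```

A formal comparison of the two curves would additionally need a test such as the log-rank test, which this sketch omits.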

Experimental validation of the biological function of LGALS3 in HCC

LGALS3 is known as an oncogene in various tumors and is associated with bone metastasis in HCC44. In this study, we investigated the role of LGALS3 in HCC through in vitro experiments. HepG2 and Huh7 cells were transfected with siRNA targeting LGALS3, resulting in a significant reduction in LGALS3 expression compared to the control group (Fig. 12A-B). The most effective siRNA for LGALS3 silencing was selected for subsequent analyses. Our results showed that silencing LGALS3 significantly inhibited cell migration (Fig. 12C-E) and invasion (Fig. 12F-H). These findings suggest that LGALS3 functions as an oncogene by promoting cell migration and invasion in HCC.

Fig. 12

Experimental validation of the biological function of LGALS3 in HCC. (A,B) qRT-PCR analysis of LGALS3 expression in HepG2 and Huh7 cells after LGALS3 silencing. (C–E) Scratch wound healing assay showing cell migration after 24 h and 48 h of LGALS3 silencing. (F–H) Transwell assay showing cell invasion after 48 h of LGALS3 silencing (scale bar = 200 μm). *P < 0.05, **P < 0.01, ***P < 0.0001.

Discussion

In recent years, LLPS and its related mechanisms have been reported in various tumors, where they regulate a broad range of biological processes45,46. Notably, LLPS plays a critical role in HCC and is regulated by various molecules. For example, circVAMP3 drives phase separation of CAPRIN1 and promotes stress granule formation, thereby inhibiting HCC cell proliferation and metastasis47. The long noncoding RNA URB1-antisense RNA 1 (URB1-AS1) drives phase separation of ferritin to prevent sorafenib-induced ferroptosis in HCC48. Additionally, circASH2 enhances the phase separation of nuclear Y-box binding protein 1 (YBX1), which promotes the decay of tropomyosin 4 (TPM4) transcripts and ultimately suppresses HCC metastasis49. The coiled-coil (CC) domain of RNF214 mediates its phase separation to promote the proliferation, migration, and metastasis of HCC50. Moreover, inhibiting MLL1 activity or mutating Borealin K143 to arginine disrupts the phase separation of the chromosome passenger complex (CPC), which ultimately increases chromosome instability and aneuploidy in HCC51. Therefore, identifying a reliable LLPS-related risk signature for HCC is crucial for distinguishing patients who are more likely to benefit from the corresponding therapy.

In recent years, bioinformatics analysis has been widely used to identify key biomarkers and investigate their underlying mechanisms in various tumors. For example, bulk RNA-seq data have revealed that the copper metabolism-related COMMD family serves as potential diagnostic and therapeutic markers for gastric cancer52. Similarly, sphingolipid metabolic pathways have been identified as independent diagnostic and prognostic indicators for glioma patients based on bulk RNA-seq analysis53. In addition, the ferroptosis-related gene CISD2 has been recognized as a biomarker for early detection, prognosis, and prediction in patients with head and neck squamous cell carcinoma (HNSCC)54. Moreover, multi-omics integration has enabled the identification of novel proliferative cell types, such as Prol cells, which influence overall survival (OS) and progression-free survival (PFS) in patients with pancreatic ductal adenocarcinoma (PDAC)55.

In this study, we integrated bulk RNA-seq and scRNA-seq data to identify LLPS-related biomarkers and investigate their molecular mechanisms within the tumor microenvironment (TME) of HCC. We identified 14 distinct cell clusters in HCC: malignant hepatocytes, non-malignant hepatocytes, fibroblasts, endothelial cells, CD4+ T cells, Tregs, CD8+ T cells, NK cells, macrophages, pDCs, lymphoid DCs, mast cells, plasma B cells, and mature B cells. Among these, malignant hepatocytes showed the highest LLPS score and acted as signaling senders that were strongly connected with other cell types through 12 signaling pathways: MIF, VISFATIN, MK, COMPLEMENT, PARs, CCL, ANGPTL, VEGF, CALCR, PTN, PROS, and GDF. We further classified hepatocytes into high-LLPS and low-LLPS groups and found that hepatocytes in the high-LLPS group communicated with other cell types through 15 pathways, including MIF56, CXCL57, VISFATIN58, MK59, COMPLEMENT60, PARs61, CCL62, ANGPTL63, EGF64, VEGF65, CALCR66, PTN67, ANGPT68, PROS69, and GDF70, which play critical roles in tumor progression. Moreover, in the high-LLPS group, EGFR-EGF and EGFR-AREG played crucial roles in the EGF signaling pathway network, and MIF-CD44 and MIF-CXCR4 played crucial roles in the MIF signaling pathway network.

EGFR, also known as ERBB1 or HER1, is a member of the receptor tyrosine kinase (RTK) subfamily, which regulates various biological processes including cell proliferation, apoptosis inhibition, metastasis, and metabolism71,72. EGFR signaling is frequently activated in HCC and serves as a therapeutic target, either as a monotherapy or in combination with other treatments73,74. Additionally, EGFR has been shown to stabilize the oncogene MYC, contributing to resistance against antitumor therapies, particularly when treated with BET (bromodomain and extra-terminal domain) inhibitors in HCC75. MIF, or macrophage migration inhibitory factor, is a proinflammatory cytokine that also functions as an oncogene76,77,78. The interaction between MIF and CD74 promotes HCC cell proliferation and resistance to cell death79. Moreover, the molecular interaction between MIF and CD44 has been identified in acral melanoma80, while the MIF-CXCR4 interaction has been associated with the activity of functional NK cells in HCC81. These studies suggest that targeting the interactions of EGFR-EGF, EGFR-AREG, MIF-CD44, and MIF-CXCR4 could represent promising therapeutic strategies for treating HCC.

Additionally, the differentiation trajectory analysis revealed that the LLPS score is associated with the malignant differentiation of hepatocytes. E2F and MYC target genes showed a positive correlation with both pseudotime and the LLPS score. The E2F family, which consists of eight members (E2F1-E2F8), is broadly categorized into activators (E2F1-E2F3) and repressors (E2F4-E2F8). These genes primarily regulate the cell cycle and are implicated in the majority of human malignancies82,83. Notably, a high E2F score has been linked to poorer survival outcomes in HCC patients84. E2F members and related genes are involved in the development and progression of HCC and could serve as potential targets for its prevention and treatment85,86,87. The MYC gene family, comprising three paralogs (C-MYC, N-MYC, and L-MYC), is commonly oncogenic in human cancers and plays a crucial role in cancer initiation, maintenance, progression, and immune evasion88,89. Increased MYC activation is often observed in HCC, where it regulates macrophage polarization90, immune evasion91, cell proliferation, metastasis, and resistance to sorafenib92. In this study, we found that activation of E2F and MYC targets increased together with the LLPS score along the hepatocyte differentiation trajectory. Future experimental studies are needed to validate the functional roles of these key genes.

We integrated scRNA-seq and bulk RNA-seq data to construct an LLPS-related risk signature for HCC. This analysis identified ten potential prognostic genes for HCC: HSPD1, ENO1, TARS, RANBP1, MARCKSL1, G6PD, LGALS3, KLF4, HAGH, and GOT2. We also predicted potential therapeutic agents targeting these risk signature genes. Specifically, Olitigaltin and Lactose were identified as potential agents for LGALS3, while Sitamaquine, Phenazopyridine, Sulfanilamide, Pamaquine, Sodium ascorbate, and Co-trimoxazole were identified for G6PD. After validating the expression of the risk signature genes at both single-cell and transcriptomic levels, LGALS3 and G6PD emerged as the key LLPS-related genes for HCC. Targeting LGALS3 and G6PD has been proposed as a promising strategy93,94. However, no definitive in vivo or in vitro studies have yet validated the inhibitory effects of these predicted agents in HCC.

Glucose-6-phosphate dehydrogenase (G6PD) plays a key redox role in cancer, influencing cell growth and death95 and metabolic reprogramming96. Elevated G6PD expression has been shown to accelerate HCC cell migration and invasion97 while also inhibiting ferroptosis in HCC cells94. LGALS3, also known as galectin-3, is a member of the galectin family and is involved in various physiological processes, including cell proliferation, differentiation, apoptosis, and immune response98. It plays a pivotal role in tumor cell proliferation, adhesion, angiogenesis, migration, invasion, and metastasis99. In HCC, LGALS3 functions as an oncogene that promotes cell migration and invasion100, tumor aggressiveness and metastasis, is associated with poor prognosis101, and contributes to immune evasion and metabolic reprogramming102. In this study, we also conducted experiments to validate the role of LGALS3 in the migration and invasion of HCC cells.

In this study, we integrated scRNA-seq and bulk RNA-seq data to construct an LLPS-related prognostic risk signature for HCC, identifying key genes such as LGALS3 and G6PD. AUCell scoring and cell-cell communication analyses revealed the critical role of LLPS in malignant hepatocytes and their interactions with other cell types. Experimental validation confirmed LGALS3’s role in promoting HCC cell migration and invasion, highlighting its potential as a therapeutic target. However, several limitations should be noted. First, malignant hepatocytes were classified primarily based on marker genes, which may overlook subtle malignant subpopulations; future studies could incorporate CNV inference tools such as inferCNV or CopyKAT to enhance classification reliability. Second, AUCell-based LLPS scoring may be affected by median dichotomization and gene set selection; permutation testing and evidence-based gene sets could enhance its robustness. Third, while AUCell scores provide a quantitative measure of LLPS activity, their correlation with cellular functional states requires further experimental validation. Fourth, the predicted therapeutic agents for LLPS-related genes have not yet been tested in vitro or in vivo, and their clinical feasibility remains uncertain. Finally, although the prognostic risk signature was validated in an independent cohort (GSE14520), future studies with larger and multi-center cohorts are needed to confirm its predictive performance and clinical utility.
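
The permutation testing mentioned among the limitations can be sketched as follows: the score of the gene set in one cell is compared with the scores of random gene sets of the same size. This is an illustrative sketch only. It uses mean expression as a simple stand-in for the score, whereas AUCell itself uses a rank-based AUC that is not reproduced here, and the gene names and expression values are hypothetical.

```python
import random


def gene_set_score(expression, gene_set):
    """Mean expression of the gene set in one cell (a stand-in for AUCell)."""
    return sum(expression[gene] for gene in gene_set) / len(gene_set)


def permutation_p_value(expression, gene_set, n_perm=1000, seed=0):
    """Empirical p-value: share of random same-size gene sets scoring >= observed."""
    rng = random.Random(seed)
    genes = sorted(expression)
    observed = gene_set_score(expression, gene_set)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, len(gene_set))
        if gene_set_score(expression, random_set) >= observed:
            hits += 1
    # The add-one correction keeps the estimate from being exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical single-cell profile: 100 genes with increasing expression.
cell_expression = {f"gene{i}": float(i) for i in range(100)}
high_set = [f"gene{i}" for i in range(95, 100)]  # five most expressed genes
low_set = [f"gene{i}" for i in range(5)]  # five least expressed genes

p_high = permutation_p_value(cell_expression, high_set)
p_low = permutation_p_value(cell_expression, low_set)
print(f"high set p = {p_high:.4f}, low set p = {p_low:.4f}")
```

Reporting an empirical p-value alongside each score would indicate whether a high LLPS score reflects the gene set itself rather than the overall expression level of the cell, independently of where the median cutoff falls.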

Conclusion

In summary, through bioinformatics analyses and experimental validation, we successfully developed an LLPS-related prognostic risk signature and predicted potential therapeutic agents. Additionally, we identified and validated the critical role of LGALS3 in regulating HCC cell migration and invasion. Our findings offer novel insights into HCC progression and present a potential therapeutic target for HCC treatment.