Introduction

Hepatocellular carcinoma (HCC) is a major contributor to cancer-related deaths globally1, with alarmingly high postoperative recurrence rates of 50% to 70% within five years2. While the clinical characteristics influencing HCC recurrence risk are well-documented3,4,5, the molecular mechanisms remain poorly understood. This gap in knowledge impedes the development of effective postoperative strategies to reduce recurrence risk. Current adjuvant targeted therapies, including sorafenib, have failed to improve recurrence-free survival (RFS) in HCC6. Similarly, transcatheter arterial chemoembolization has shown limited efficacy in preventing postoperative recurrence7. Although combining atezolizumab and bevacizumab has significantly extended overall survival (OS) in advanced HCC8,9, it has been less effective in prolonging RFS in the whole population of resectable HCC tumors with high recurrence risk10. Traditional immunotherapy biomarkers, such as programmed cell death 1 ligand 1 (PD-L1) expression, tumor mutational burden, and microsatellite instability, offer limited predictive value for HCC treatment outcomes11. This underscores an urgent need for novel molecular panels to better predict postoperative recurrence and inform treatment strategies.

P53 plays a crucial role in the initiation and progression of tumors12,13,14, primarily through the activation of DNA damage and repair (DDR), inhibition of the cell cycle, and promotion of apoptosis15. P53 mutations are associated with HCC recurrence, leading to shorter OS and RFS among patients16. DDR is essential for repairing gene mutations and maintaining genomic stability17,18,19,20. Limited evidence suggests that DDR promotes tumor recurrence. For example, DDR-induced reactivation of octamer-binding transcription factor 4 led to tumor recurrence21, and upregulation of DDR genes in glioblastoma promoted tumor progression and recurrence22. Interestingly, defects in DDR can activate anti-tumor effects via various mechanisms23,24,25. For instance, the accumulation of mutations caused by DDR defects generates tumor-specific neoantigens, which in turn activate anti-tumor immunity26. DNA damage-induced activation of the cGAS-STING pathway fosters type I IFN signaling and enhances anti-tumor immune responses27,28. DNA damage can also increase PD-L1 expression29,30 and trigger immune responses through cell death signals31. Therefore, molecules in the DDR pathway have the potential to serve as predictive markers for immunotherapy32.

Molecular models are widely used for predicting cancer prognosis and guiding treatment options33,34,35. With advancements in machine learning, combining various algorithms and selecting the most effective ones is crucial for enhancing model performance36,37. In this study, we applied 173 distinct combinations of machine learning algorithms to datasets from TCGA-LIHC, PLANET, GSE76427, GSE14520, and the Xiangya cohort. Our model was designed to predict HCC recurrence and treatment response by leveraging DDR signatures associated with P53 mutations. We explored the relationship between DDR signatures and the immune microenvironment, drug sensitivity, and responsiveness to immunotherapy. These findings highlight the potential of DDR signatures as prognostic and therapeutic biomarkers in HCC.

Materials and methods

Collection and sequencing of clinical HCC samples

In this study, we collected frozen tumor tissues from 53 HCC patients who underwent resection at Xiangya Hospital, Central South University, between 2017 and 2020. All patients had a confirmed diagnosis of HCC without any other primary tumors, had not received any treatment prior to surgery, and had not undergone postoperative immunotherapy. Follow-up was conducted to monitor RFS, with the follow-up endpoint being August 2024. These patients served as a validation cohort for our model. Additionally, we collected HCC tissues for Cytometry by Time-Of-Flight (CyTOF) and RNA sequencing from 16 patients, as reported in our previous article38. The RNA sequencing data were used to calculate the DDR model-derived risk score for each sample, and the matched CyTOF data were used to assess intra-tumoral immune cell infiltration in the high and low-risk groups. The collection and handling of clinical specimens were approved by the Ethics Committee of Xiangya Hospital, Central South University (approval number: 202401014). Samples meeting quality control standards were sequenced on the BGI platform, with a sequencing depth of 10G, producing raw FASTQ data. The data were processed using fastp (version 0.20.1) to remove adapters and low-quality reads, yielding clean data. Hisat2 (version 2.2.1) was used to align reads to the GRCh38 reference genome, and featureCounts (version 2.6.0) was employed to obtain read counts, which were then normalized to TPM and log2 transformed.
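The final normalization step described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name, the toy gene names and lengths, and the pseudocount of 1 added before the log2 transform are all assumptions, since the text does not state them.

```python
import math

def counts_to_log2_tpm(counts, lengths_bp, pseudocount=1.0):
    """Convert raw read counts for one sample to log2(TPM + pseudocount).

    counts: {gene: read count}; lengths_bp: {gene: gene length in base pairs}.
    The pseudocount is an assumption, not a value given in the text.
    """
    # Reads per kilobase: divide each count by the gene length in kb.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Per-million scaling factor so that TPM values sum to 1e6.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        raise ValueError("sample has no counted reads")
    return {g: math.log2(v / scale + pseudocount) for g, v in rpk.items()}

# Hypothetical toy sample with two genes.
counts = {"GENE_A": 100, "GENE_B": 300}
lengths = {"GENE_A": 1000, "GENE_B": 3000}
log2_tpm = counts_to_log2_tpm(counts, lengths)
```

In this toy sample both genes have the same read density per kilobase, so they receive the same TPM even though their raw counts differ by a factor of three.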

Public data collection and batch effect removal

For recurrence analysis, publicly available transcriptomic sequencing data were gathered from tumor resection specimens of HCC patients, including datasets from the Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC), PLANET, GSE76427, and GSE14520. Clinical data, mutation information, and bulk RNA sequencing raw expression files for the TCGA-LIHC cohort were retrieved from https://portal.gdc.cancer.gov. The gene expression matrix was normalized to TPM and log2 transformed. Samples with P53 mutations, identified as single nucleotide polymorphisms or deletions, were annotated as P53-mutant, while others were labeled as P53-wild type. From the initial 377 samples, those without recurrence time and those from non-primary tumors were excluded, resulting in 364 samples for further analysis. For the PLANET cohort39, clinical information and raw expression files were acquired from https://figshare.com/articles/dataset/PLANET_cohort/23732370. Average expression was calculated when multiple tumor samples were available for a single patient, followed by TPM normalization and log2 transformation. Samples missing recurrence time and those categorized as non-primary tumors were excluded, resulting in 55 samples for downstream analysis. Gene Expression Omnibus (GEO) data with the accession numbers GSE76427 and GSE14520 were downloaded from the GEO database40,41. After excluding samples with missing recurrence times, 108 and 242 samples were retained from GSE76427 and GSE14520, respectively. In each dataset, low-variance genes (inter-sample variance below 0.2) were removed. Batch effects were assessed using principal component analysis (PCA) and removed using the ComBat method from the sva package to minimize technical variation and enhance data consistency and comparability.
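Two of the preprocessing steps above, averaging multiple tumor samples per patient and dropping low-variance genes, can be sketched as below. This is an illustrative sketch only: the helper names and toy values are hypothetical, the use of the sample (n - 1) variance is an assumption because the text does not say which estimator was used, and the ComBat batch correction step is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, variance

def average_by_patient(expr, sample_to_patient):
    """expr: {sample_id: {gene: value}} -> {patient_id: {gene: mean value}}."""
    grouped = defaultdict(list)
    for sample, profile in expr.items():
        grouped[sample_to_patient[sample]].append(profile)
    return {
        patient: {g: mean(p[g] for p in profiles) for g in profiles[0]}
        for patient, profiles in grouped.items()
    }

def drop_low_variance_genes(expr, threshold=0.2):
    """Remove genes whose variance across samples is below the threshold."""
    samples = list(expr)
    genes = list(expr[samples[0]])
    # Sample variance (n - 1 denominator) is an assumption.
    keep = [g for g in genes
            if variance([expr[s][g] for s in samples]) >= threshold]
    return {s: {g: expr[s][g] for g in keep} for s in samples}

# Hypothetical toy data: patient P1 contributes two tumor samples.
expr = {
    "S1": {"GENE_X": 1.0, "GENE_Y": 2.0},
    "S2": {"GENE_X": 3.0, "GENE_Y": 2.0},
    "S3": {"GENE_X": 5.0, "GENE_Y": 2.1},
}
sample_to_patient = {"S1": "P1", "S2": "P1", "S3": "P2"}
per_patient = average_by_patient(expr, sample_to_patient)
filtered = drop_low_variance_genes(per_patient, threshold=0.2)
```

Here GENE_Y barely varies between the two patients and is removed, while GENE_X is retained.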

For analysis of therapy response, we used HCC transcriptomic sequencing data with therapy response information from the European Genome-phenome Archive42 (dataset IDs EGAD00001008128 and EGAD00001008130). EGAD00001008128 contains raw RNA sequencing data from tumor samples of 358 patients who participated in the GO30140 (NCT02715531) phase 1b or IMbrave150 (NCT03434379) phase 3 trials and received atezolizumab combined with bevacizumab, atezolizumab alone, or sorafenib. EGAD00001008130 contains therapy response and survival outcome data. The raw data were processed using fastp (version 0.20.1) to remove adapters and low-quality reads, yielding clean data. Hisat2 (version 2.2.1) was used to align reads to the GRCh38 reference genome, and featureCounts (version 2.6.0) was employed to obtain read counts, which were then normalized to TPM and log2 transformed.

Enrichment analysis and survival analysis

All analyses were performed using R software (version 4.3.2). Gene sets related to DNA damage repair were obtained from MSigDB43, enriched through Gene Ontology44, Kyoto Encyclopedia of Genes and Genomes45, Reactome46, and WikiPathways, along with gene sets summarized by Michele Olivieri et al.47, aggregating to a total of 430 gene sets. GSEA enrichment analysis was conducted using GSEA (version 4.3.3). Gene sets with an FDR < 25% and nominal p-value < 5% were considered significantly enriched, and the genes within these significant gene sets were subjected to univariate Cox regression analysis. RFS-related univariate Cox regression analysis was performed using the survival package. Genes with a Cox regression p-value < 0.05 in both the training and independent validation cohorts were intersected to identify prognostic genes significantly associated with RFS for subsequent model development. The survival and survminer packages were used to generate Kaplan–Meier plots illustrating the relationship between model risk scores and survival probability. For significant genes in the model, enrichment analysis was conducted using Enrichr-KG (maayanlab.cloud), and chord diagrams were drawn using the circlize package to depict the pathway-gene relationships and biological functions.
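The gene selection logic described above (keep gene sets passing the GSEA thresholds, then keep genes whose univariate Cox p-value is below 0.05 in every cohort) can be sketched as follows. The function names, dictionary layout, and toy values are hypothetical and serve only to illustrate the filtering and intersection; the Cox regression itself is assumed to have been run elsewhere.

```python
def significant_gene_sets(gsea_results, fdr_cutoff=0.25, p_cutoff=0.05):
    """Names of gene sets with FDR < 25% and nominal p-value < 5%."""
    return [r["name"] for r in gsea_results
            if r["fdr"] < fdr_cutoff and r["nom_p"] < p_cutoff]

def prognostic_genes(cox_p_by_cohort, candidates, p_cutoff=0.05):
    """Candidate genes with a Cox p-value below the cutoff in every cohort."""
    selected = set(candidates)
    for pvals in cox_p_by_cohort.values():
        selected &= {g for g, p in pvals.items() if p < p_cutoff}
    return sorted(selected)

# Hypothetical GSEA output and gene set membership.
gsea = [
    {"name": "SET_1", "fdr": 0.10, "nom_p": 0.01},
    {"name": "SET_2", "fdr": 0.30, "nom_p": 0.01},  # fails the FDR cutoff
    {"name": "SET_3", "fdr": 0.20, "nom_p": 0.08},  # fails the p-value cutoff
]
set_members = {
    "SET_1": {"GENE_A", "GENE_B", "GENE_C"},
    "SET_2": {"GENE_D"},
    "SET_3": {"GENE_E"},
}
kept_sets = significant_gene_sets(gsea)
candidates = set().union(*(set_members[name] for name in kept_sets))

# Hypothetical univariate Cox p-values per cohort.
cox_p = {
    "training": {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.03, "GENE_D": 0.001},
    "validation": {"GENE_A": 0.04, "GENE_B": 0.01, "GENE_C": 0.30},
}
selected = prognostic_genes(cox_p, candidates)
```

Only GENE_A survives: it belongs to a significant gene set and is significant in both cohorts.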

Model training and validation

The TCGA-LIHC cohort was used as the training set, PLANET as validation set 1, GSE76427 and GSE14520 combined as validation set 2, and the Xiangya cohort as an independent validation set. A total of 10 machine learning algorithms were integrated: CoxBoost, random survival forest (RSF), Lasso, stepwise Cox, gradient boosting machine (GBM), Ridge, elastic net (Enet), SuperPC, survival support vector machine (survival-SVM), and plsRcox. These algorithms have been widely used in published survival models37,48,49,50. By adjusting cross-validation folds, stepwise regression directions, and penalty terms, 173 algorithm combinations were generated and used to train 173 prognostic models on the training set. The pROC and ggplot2 packages were used to compute and visualize AUC heatmaps for 1 to 5 years, and the concordance index (C-index) of each model was calculated. Risk scores were standardized within each cohort using the zscore function of the mosaic package, which rescales the scores to a mean of 0 and a standard deviation of 1 without changing the shape of their distribution. Samples with risk scores higher than the cohort mean were assigned to the high-risk group, and the others to the low-risk group. The survminer package was employed to illustrate survival probability differences between the high and low-risk groups using Kaplan–Meier (KM) plots. Univariate and multivariate Cox regression analyses were conducted using the survival package, with forest plots drawn using the forestplot package to evaluate the independent predictive effect of risk scores on RFS. The ggplot2 package was used to generate bar plots showing the most important features in the random forest model.
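The within-cohort standardization and risk grouping can be sketched as below. This is a minimal sketch, assuming the z-score uses the sample standard deviation; the function names and toy scores are hypothetical and the code does not reproduce the mosaic package itself.

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def assign_risk_groups(raw_scores):
    """Label each sample 'high' if its score exceeds the cohort mean."""
    z = zscore(raw_scores)
    # After standardization the cohort mean is 0, so the cut point is 0.
    return ["high" if v > 0 else "low" for v in z]

# Hypothetical raw model scores for one cohort.
raw_scores = [1.0, 2.0, 3.0, 6.0]
z_scores = zscore(raw_scores)
groups = assign_risk_groups(raw_scores)
```

Because the cut point is the mean rather than the median, the two groups need not be equal in size; in this toy cohort a single large score places only one sample in the high-risk group.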

Immune infiltration algorithms and CyTOF analysis

Four immune microenvironment scoring methods—ESTIMATE51, xCELL52, IPS53, and TIDE54—were applied to score the training and validation sets. Five immune infiltration algorithms—xCELL, CIBERSORT55, CIBERSORT absolute, MCPcounter, and TIDE—were used for differential analysis of immune-infiltrating cells between high and low-risk groups in the training set. CyTOF was conducted by SendiBio, as described in our previous publication38. Cell clustering and t-SNE dimensionality reduction of CyTOF data were performed using FCS Express 7.

Drug sensitivity analysis

The oncoPredict R package was used to predict drug responses for each sample56. CTRP was used for RNA-seq data, and GDSC2 for microarray data. Drug sensitivity scoring was performed for both high and low-risk groups in the training and validation sets, with significance tested using the Wilcoxon test. Drugs that differed significantly between groups in every cohort were intersected to obtain representative differential drugs. Violin plots illustrated the inter-group differences in drug sensitivity scores, while scatter plots showed the correlation between drug sensitivity scores and risk scores.

Cell lines and cell cultures

LO2 (normal liver cell line), Hep3B, HCCLM3, HepG2, and MHCC97H (human liver cancer cell lines) were purchased from the Chinese Academy of Sciences Typical Culture Preservation Center Cell Bank. All cell lines were maintained in Dulbecco's Modified Eagle Medium (DMEM) (10-013-CVRC, Corning, Suzhou, Jiangsu, China) containing 10% FBS (SA301.02, Cellmax, Beijing, China) in a humidified incubator containing 5% CO2 at 37 °C. All cells were free of mycoplasma contamination.

Cell Counting Kit-8 (CCK-8) proliferation assay

Cell proliferation was measured with the CCK-8 assay (BS350B, Biosharp, Beijing, China). Cells in the logarithmic growth phase were seeded in 96-well plates at 2000 cells per well. At each time point (24, 48, and 72 h), 10 μL of CCK-8 solution was added to each well and the plates were incubated at 37 ℃ for two hours. The absorbance at 450 nm was measured by spectrophotometry three times every 24 h.

RNA isolation and real-time quantitative polymerase chain reaction (PCR)

Total RNA was extracted using AG RNAex Pro reagent (AG21101, AG, Changsha, Hunan, China). The RNA concentration was quantified using a NanoDrop™ One. A total of 1.5 μg of RNA was reverse-transcribed into cDNA using the Evo M-MLV RT Kit (AG11728, AG, Changsha, Hunan, China). PCR amplification was performed with the SYBR Green PCR Mastermix Kit (AG11718, AG, Changsha, Hunan, China) on a QuantStudio™ real-time fluorescent quantitative PCR system. Primers were as follows:

POLR3G:
F: TAGGGAGCAGTGCCTTTCAG
R: GTGGGGGTGGTTTCAACACT

GAPDH:
F: CAGGAGGCATTGCTGATGAT
R: GAAGGCTGGGGCTCATTT

Using GAPDH as the internal control, relative expression was calculated with the 2^−ΔΔCt method.
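The relative quantification can be illustrated with a short worked example. The Ct values below are hypothetical and are not data from this study; the function and argument names are likewise illustrative.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the 2^-ddCt method.

    The reference gene (GAPDH in the text) normalizes each condition first;
    the control condition then serves as the baseline.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: target gene and reference gene in a test cell line
# and in the control cell line.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
```

With these values the target gene crosses the threshold two cycles earlier in the test cell line after normalization, which corresponds to a four-fold higher relative expression.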

Western blotting

Total protein was extracted with radioimmunoprecipitation assay buffer containing phosphatase and protease inhibitors. After separation by SDS-PAGE, the proteins were transferred to a polyvinylidene fluoride membrane at 250 mA for 90 min and blocked with 5% skim milk. The membrane was then incubated with primary antibody at 4 ℃ for 12 h, followed by incubation with the secondary antibody at 37 ℃ for 60 min. After three washes with TBST, the signal was detected with the ChemiDoc XRS imaging system. Finally, the protein was quantified using ImageJ software. Antibodies used were as follows: POLR3G (03G240603, 1:1000, AWA40802, Abiowell, Changsha, Hunan, China), β-Tubulin (21,000,383, 1:20,000, HRP-66240, Proteintech, Wuhan, Hubei, China).

EdU detection

3000 cells were seeded in a 96-well plate. Cells were incubated with 20 μM EdU (C0078S, Beyotime, Shanghai, China) for 2 h and fixed with 4% paraformaldehyde. The cells were then treated with 50 μl Click Additive Solution for 30 min, washed with 0.5% Triton X-100 (AWH0299a, Abiowell, Changsha, Hunan, China), and stained with DAPI solution (AWC0292, Abiowell, Changsha, Hunan, China), and images were collected under the microscope.

Small interference RNA (siRNA) transfection

siRNA sequences targeting POLR3G were ordered from OBIO (https://www.obiosh.com/), and Lipo3000 (L3000-015, Invitrogen, California, USA) was used for transfection. The specific sequences for each siRNA are as follows:

Si-NC:
Forward: UUCUCCGAACGUGUCACGUTT
Reverse: ACGUGACACGUUCGGAGAATT

Si-POLR3G:
Forward: UAAAGGAAGAGGACGUGCUGCUUAUTT
Reverse: AUAAGCAGCACGUCCUCUUCCUUUATT

Transwell migration

Cell migration experiments were performed in a 24-well transwell plate (8.0 µm pore size, Corning Life Sciences, Costar, USA). Stably transduced cells were treated with trypsin and adjusted to 3 × 10^5 cells/mL after counting. Then, 600 µL of complete medium containing 30% (v/v) serum was added to the lower chamber, 200 µL of the cell suspension was added to the upper chamber, and the cells were cultured for 48 h. The cells in the upper chamber were removed, and the cells remaining on the membrane were fixed with 4% paraformaldehyde solution (AWI0056b, Abiowell, Changsha, Hunan, China). After staining with 0.1% crystal violet solution (AWI0364a, Abiowell, Changsha, Hunan, China), the cells were observed under a microscope and imaged. All experiments were repeated three times.

Statistical analysis

All analyses were performed using R software (version 4.3.2) and GraphPad Prism 10. The Kruskal–Wallis test was utilized for multiple group comparisons. This was followed by Dunn’s post-hoc comparisons between subgroups, whose p-value was adjusted by the Bonferroni correction. The Wilcoxon rank-sum test was used for two-group comparisons. The Kaplan–Meier method was applied to estimate survival probabilities, with the log-rank test used for comparing survival curves between different groups. Univariate and multivariate Cox regression analyses were performed using the Cox proportional hazards model to evaluate the impact of variables on survival time. The Pearson correlation coefficient was calculated to assess the linear relationship between variables and quantify the strength of their association. A p-value of less than 0.05 was considered indicative of statistical significance.
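As an illustration of the Kaplan–Meier method named above, the product-limit estimate can be computed in a few lines. This is a minimal sketch with hypothetical follow-up data, not the R survival package implementation used in the study; the function name and input layout are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per sample; events: 1 = recurrence observed,
    0 = censored. Returns [(time, survival probability)] at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all samples sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored samples leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored samples do not lower the curve themselves, but they shrink the risk set, so later events produce larger drops.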

Results

DDR genes associated with P53 mutations in HCC recurrence

Our methodology is illustrated in Fig. 1. Five pathway enrichment methods were applied: Kyoto Encyclopedia of Genes and Genomes, WikiPathways, Reactome, Gene Ontology, and the gene sets summarized by Michele Olivieri et al. The findings consistently indicated that P53-mutant HCC exhibited higher enrichment in DDR pathways, particularly in mismatch repair, homologous recombination, base excision repair, nucleotide excision repair, and the regulation of DNA repair gene transcription by P53 (Supplementary table 1). In contrast, there was no notable enrichment of DDR-related genes in wild-type P53 HCC (Fig. 2A). After batch effects were removed, the samples from each dataset were evenly scattered, indicating that the datasets were consistent and comparable and that the downstream analyses were more reliable (Fig. 2B). Univariate Cox regression analysis was performed on genes that were significantly enriched in DDR pathways in both the training and validation datasets. Subsequently, 106 prognostic genes significantly associated with RFS were identified and incorporated into model construction (Fig. 2C; Supplementary table 2).

Fig. 1

Study design flowchart. The TCGA-LIHC dataset was utilized as the training set, while the PLANET, GEO, and the Xiangya cohorts were employed as the validation sets. To identify prognostic genes associated with RFS, univariate Cox regression analysis was conducted on DDR pathway genes notably enriched in P53-mutated samples, identifying 106 candidate genes. Evaluation metrics, such as the AUC and C-index, were used to assess the performance of 173 algorithm and parameter combinations. The optimal algorithm combination was selected to generate DDR signatures with the highest predictive significance for RFS. Subsequently, the HCC patient cohort was stratified into high and low DDR groups for comparative analysis. Differences in survival probability, clinical characteristics, microenvironment scores, immune cell infiltration, drug sensitivity patterns, and response to immunotherapy were assessed between these two groups.

Fig. 2

DDR genes associated with P53 mutations in HCC recurrence. (A) GSEA plots from five pathway enrichment methods, grouped by P53 mutation status, in the training and validation sets. (B) PCA plots before and after batch effect removal in the training and validation sets. (C) Venn diagram of genes significantly associated with RFS in the training and validation sets.

Development and validation of DDR model for HCC recurrence

We identified 106 DDR genes in the training and validation sets through univariate Cox regression analysis based on RFS. We employed 173 combinations of algorithms and parameters to construct predictive models and evaluated the AUC and C-index over 1 to 5 years (Fig. 3A-B; Supplementary table 3). CoxBoost + RSF, Lasso [fold = 10] + RSF, and Lasso [fold = 50] + RSF were among the best models, with an average AUC and average C-index around 0.7. The DDR gene set included BCL7A, HDAC2, PRKCQ, NPM1, ERCC6, UBB, RAD54B, CCT2, MAPKAPK5, PSMD9, NUP85, CCNB1, POLR3G, EEF1E1, CHD1L, GTF3C3, SFN, THOC5, and TTF2. The weight of each gene in the model was examined using the top three combinations of algorithms and parameters, and the results were consistent across all three, supporting the relative reliability of our findings (Fig. 3C). Pathway analysis of the DDR gene set indicated that the primary enriched pathways were the P53 signaling and nucleotide excision repair pathways (Fig. 3D). Furthermore, we summarized published models and AUC values for predicting HCC recurrence using various molecular features (Supplementary table 4).
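For readers unfamiliar with the C-index reported above, a minimal sketch of Harrell's concordance index is given below. It is an illustration under simplifying assumptions (pairs with tied times are skipped, tied risk scores count as half), uses hypothetical data, and is not the implementation used to evaluate the 173 models.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when sample i had an observed event and a
    strictly shorter time than sample j. It is concordant when sample i also
    has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort: time to recurrence, event indicator, model risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported average of around 0.7.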

Fig. 3

Development and validation of DDR model for HCC recurrence. (A) Heat map of AUC values and 3-year average AUC values of the top 60 algorithms and parameter combinations in the training and validation sets. (B) Heat maps of 1-, 2-, 4-, and 5-year AUC and c-index values for the top 10 algorithms and parameter combinations in the training and validation sets. (C) The lollipop plot showed the three optimal model combinations to score the importance of DDR genes. (D) The chord plot showed the related pathways enriched by DDR genes.

The high-risk group presented worse clinical prognosis

To better understand the influence of risk scores on clinical features and prognosis, we analyzed RFS and clinical characteristics in the high and low-risk groups. Using the model built with the Lasso [fold = 10] + RSF algorithm, we performed Kaplan–Meier curve analysis and log-rank tests across the TCGA, PLANET, and GEO cohorts. Consistently, the high-risk group showed a lower survival probability in all three cohorts. The hazard ratios were 115, 3.5, and 2.3 in the TCGA, PLANET, and GEO cohorts, respectively, and the log-rank tests were significant in each cohort (log-rank P < 0.0001, log-rank P = 0.0030, and log-rank P < 0.0001). This implied that our risk score was predictive of HCC prognosis (Fig. 4A). Univariate Cox regression analysis showed that the high-risk group was a significant risk factor (Fig. 4B), and multivariate Cox regression analysis confirmed that the high-risk group was an independent poor prognostic factor (hazard ratio (95% CI): 100.9 (48.79, 208.7)) (Fig. 4C). Further analysis of clinical features showed that the high-risk group was associated with recurrence type, pathological grade, stage, T stage, and the presence of P53 mutation, but showed no significant association with recurrence site, N stage, M stage, age, or gender (Fig. 4D-L).

Fig. 4

The high-risk group presented worse clinical prognosis. (A) Kaplan–Meier curves of RFS based on the log-rank test in TCGA, PLANET, and GEO for the high and low-risk groups. (B) Forest plot of univariate Cox regression of risk scores and clinical parameters. (C) Forest plot of multivariate Cox regression of risk scores and clinical parameters. (D) Combined violin and box plots of risk scores and type of recurrence. (E) Combined violin and box plots of risk scores and recurrence tissue. (F) Combined violin and box plots of risk scores and pathological grades. (G) Combined violin and box plots of risk scores and stages. (H) Combined violin and box plots of risk scores and T.stage. (I) Combined violin and box plots of risk scores and N.stage. (J) Combined violin and box plots of risk scores and P53 mutation type. (K) Combined violin and box plots of risk scores and gender. (L) Combined violin and box plots of risk scores and age.

The high-risk group demonstrated lower microenvironment score and decreased CD8+ T cell infiltration

ESTIMATE, xCELL, IPS, and TIDE scores were used to evaluate immune infiltration in the high and low-risk groups (Fig. 5A; Supplementary table 5). Elevated ESTIMATE, xCELL, and IPS scores are linked to increased immune infiltration, whereas higher TIDE scores are associated with a poorer response to immunotherapy. In the training set and both validation sets, the high-risk group was associated with low StromalScore (ESTIMATE), ESTIMATEScore (ESTIMATE), StromaScore (xCELL), and MicroenvironmentScore (xCELL) and with high Dysfunction (TIDE). In the training set and only one validation set, the high-risk group was associated with low ImmuneScore (xCELL) and with high Exclusion (TIDE) and TIDE (TIDE) scores. In the training set alone, the high-risk group was associated with low ImmuneScore (ESTIMATE), MHC (IPS), and IPS (IPS) scores. In the PLANET cohort, we observed no significant differences in Exclusion and TIDE scores between the high and low-risk groups, which may be attributed to the cohort's limited sample size (N = 55). However, analysis of the two larger cohorts (TCGA and GEO) revealed significant differences in Exclusion and TIDE scores, suggesting enhanced immune exclusion in the high-risk groups. This indicates that the high-risk group resembled cold tumors and may exhibit limited responsiveness to immune checkpoint monoclonal antibodies such as anti-PD1, whereas the low-risk group may benefit from immunotherapy. In the TCGA cohort, we applied five immune infiltration algorithms: xCELL, CIBERSORT, CIBERSORT absolute, MCPcounter, and TIDE. Four of these algorithms consistently indicated lower infiltration of CD8+ T cells in the high-risk group (Fig. 5B; Supplementary table 5). RNA sequencing data from 16 HCC patients in the Xiangya cohort were used to calculate the DDR model-derived risk score for each sample. Matched CyTOF data were then used to assess the difference in immune infiltration between the high and low-risk groups. CyTOF analysis demonstrated decreased immune cell infiltration in the high-risk group. Examination of cell subsets revealed reduced CD8+ T cell infiltration and an elevated level of exhausted CD8+ T cells in the high-risk group (Fig. 5C-F; Supplementary table 5). These results further suggest that the low-risk group could respond well to immunotherapy, whereas the high-risk group may exhibit a cold tumor phenotype.

Fig. 5

The high-risk group demonstrated lower microenvironment score and decreased CD8+ T cell infiltration. (A) Box plots of four immune infiltration scores in the high and low-risk groups in the training and validation sets. The asterisks represent the statistical P-value (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). (B) Box plot of the scores of five immune infiltration algorithms in the high and low-risk groups of the TCGA cohort. The asterisks represent the statistical P-value (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). (C) CyTOF plot of immune infiltration in the high and low-risk groups of the Xiangya cohort. (D) Quantitative analysis of the proportion of immune cells in the high and low-risk groups of the Xiangya cohort. The asterisks represent the statistical P-value (*P < 0.05). (E) CyTOF plot of CD8+ T cell infiltration and exhausted CD8+ T cell infiltration in the high and low-risk groups of the Xiangya cohort. (F) Quantitative analysis of the proportion of CD8+ T cells and the proportion of exhausted CD8+ T cells in the high and low-risk groups of the Xiangya cohort. The asterisks represent the statistical P-value (*P < 0.05).

The high-risk group exhibited a poorer response to immunotherapy using Atezolizumab + Bevacizumab

Drug sensitivity scores were calculated for the high-risk and low-risk groups in the training set and two validation sets. The overlap of drugs with significant differences across the cohorts pinpointed two specific drugs, MK-2206 and BI-2536 (Fig. 6A). The drug sensitivity scores revealed that the high-risk group in all three cohorts exhibited higher sensitivity to MK-2206, whereas the low-risk group demonstrated stronger sensitivity to BI-2536. The sensitivity to MK-2206 was positively correlated with the risk scores in all three cohorts, suggesting that MK-2206 might improve treatment efficacy in high-risk patients; however, the correlation was weak (R² = 0.08). The inverse correlation between BI-2536 sensitivity and risk score indicated its potential to improve treatment effectiveness in low-risk patients, although the correlation was only moderate (R² = 0.36) (Fig. 6B-C). The recurrence risk score model was then used to evaluate the three drug regimens within the IMbrave150 and GO30140 cohorts. The analysis indicated a poorer prognosis for the high-risk group in the IMbrave150 immunotherapy cohort (receiving Atezolizumab + Bevacizumab) (Fig. 6D-E; Supplementary table 5). This observation implied potential resistance to Atezolizumab + Bevacizumab immunotherapy within the high-risk group.

Fig. 6

The high-risk group exhibited a poorer response to immunotherapy using Atezolizumab + Bevacizumab. (A) Drugs with significant differences in drug sensitivity scores between the high and low-risk groups were intersected in the three cohorts. (B) The drug sensitivity scores of MK-2206 and BI-2536 were compared between the high and low-risk groups of the three cohorts. (C) Correlation analysis of risk score and drug sensitivity score between the high and low-risk groups. (D) Kaplan–Meier curves showed progression-free survival probability in the high and low-risk groups of the IMbrave150 and GO30140 cohorts. (E) Kaplan–Meier curves showed overall survival probability in the high and low-risk groups of the IMbrave150 and GO30140 cohorts.

POLR3G promoted the proliferation and migration of HCC cells in vitro

One of the core genes in the DDR gene set is POLR3G. We explored the expression and function of POLR3G in HCC in vitro. PCR and western blot results showed that POLR3G levels were significantly higher in HCC cell lines than in LO2 cells (Fig. 7A-B). We chose the Hep3B and HCCLM3 cell lines, which exhibited relatively high POLR3G expression, for gene silencing and functional experiments. CCK-8 and EdU assays in both cell lines revealed decreased cell proliferation capacity upon POLR3G silencing (Fig. 7C-F). Transwell migration assays were carried out to validate the role of POLR3G in promoting migration in HCC. Consistently, these assays demonstrated a notable decrease in the migratory capacity of Si-POLR3G-Hep3B and Si-POLR3G-HCCLM3 cells compared to the control group (Fig. 7G-H). In summary, our study revealed that POLR3G enhances the proliferation and migration of HCC cells in vitro.

Fig. 7

POLR3G promoted the proliferation and migration of HCC cells in vitro. (A) POLR3G mRNA expression in normal liver cells and HCC cell lines. (B) POLR3G protein expression in normal liver cells and HCC cell lines. (C-D) CCK-8 suggested that the knockdown of POLR3G suppressed the proliferation of HCC cells. The asterisks represented the statistical P-value (*P < 0.05; **P < 0.01). (E–F) EdU suggested that the knockdown of POLR3G suppressed the proliferation of HCC cells. The asterisks represented the statistical P-value (*P < 0.05). (G-H) Transwell migration tests suggested that the knockdown of POLR3G suppressed the migration of HCC cells. The asterisks represented the statistical P-value (*P < 0.05).

Discussion

HCC is an aggressive and recurrent malignant tumor that often responds poorly to immunotherapy57. Identifying biomarkers that predict recurrence and immune efficacy is therefore essential to optimizing treatment strategies for those most likely to benefit.

In our study, we examined the role of DDR signatures related to P53 mutations in predicting HCC recurrence and response to immunotherapy. Using datasets from TCGA-LIHC, PLANET, GSE76427, GSE14520, and the Xiangya cohort, we applied 173 combinations of machine learning algorithms to construct predictive DDR signatures. We then analyzed microenvironment scores, immune cell infiltration, drug sensitivity, and immunotherapy prognosis in the high- and low-risk patient groups.

POLR3G is a protein-coding gene that encodes a critical component of the RNA polymerase III complex and possesses chromatin-binding activity58. It localizes primarily to the cytoplasm and nucleolus59. Recent studies have established POLR3G as a critical regulator of genomic stability through its direct involvement in homologous recombination-mediated DNA double-strand break repair60,61. These findings indicate that POLR3G is essential for maintaining DNA repair fidelity. Accumulating evidence also indicates that POLR3G plays a pivotal role in driving the progression of multiple cancers, including bladder, breast, lung, and prostate cancer62,63,64,65. Given these dual roles of POLR3G in the DNA damage and repair response and in oncogenesis, we investigated its expression and function in HCC in vitro.

Our analysis of public transcriptomic datasets revealed higher immune microenvironment and CD8+ T cell infiltration scores in the low-risk group than in the high-risk group. These findings were confirmed by CyTOF results from the Xiangya cohort. Previous studies have shown that lower levels of tumor-infiltrating CD8+ T cells are associated with a worse immunotherapy response66,67. Our analysis of the IMbrave150 immunotherapy cohort revealed poorer OS in the high-risk group, suggesting an association between the DDR signature and resistance to immunotherapy (atezolizumab + bevacizumab)9. Targeting DDR might therefore be a potential combination strategy for activating the tumor immune microenvironment and enhancing immunotherapy in HCC. For example, inhibitors of poly ADP-ribose polymerase (PARP) and ataxia telangiectasia-mutated (ATM) significantly upregulated the STING pathway to activate CD8+ T cells and enhanced the efficacy of anti-PD-L1 therapy68,69,70,71,72. An inhibitor of ataxia telangiectasia and Rad3-related (ATR) induced synthetic lethality in mismatch repair-deficient cells to augment the immunotherapy response73.

Current postoperative targeted therapies and immunotherapies for HCC with a high risk of recurrence remain ineffective. Trials such as IMbrave050 and STORM failed to demonstrate significant prolongation of survival6,10. In our study, MK-2206 and BI-2536 showed potential as therapeutic agents, with MK-2206 being more effective in the high-risk group and BI-2536 in the low-risk group; however, this correlation needs validation in larger cohorts74,75.

Our model has several distinct advantages over existing molecular models in predicting HCC recurrence and therapy response. For instance, in predicting 1-year recurrence, our model achieved an average AUC of 0.752, surpassing earlier models whose AUC values ranged from 0.598 to 0.661 (Tang et al.76 microvascular invasion-related genes model, AUC = 0.655; Long et al.77 DNA methylation driver genes model, AUC = 0.661; Kong et al.78 recurrence-related genes model, AUC = 0.598; Wang et al.79 seven core genes model, AUC = 0.616). Similar results were observed for the 2- to 5-year AUC values76,77,78,79. Another key strength of our approach lies in the application of 173 machine learning algorithm combinations, which offered more robust and reliable predictions. Moreover, unlike models validated solely on publicly available transcriptome data80,81,82, our model was also validated in the Xiangya cohort, which strengthened its clinical relevance and stability. Furthermore, unlike other models that rely on the TIDE algorithm rather than HCC immunotherapy cohorts to predict immunotherapy response83,84, we used real HCC immunotherapy cohorts to demonstrate the predictive value of our model.
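To make the AUC comparison above concrete, the following is a minimal sketch of how a 1-year recurrence AUC could be computed for a risk score and averaged across cohorts. It is an illustration under stated assumptions, not the pipeline used in this study: recurrence status at 1 year is treated as a fully observed binary label (censoring, which a time-dependent ROC analysis would handle, is ignored), and the function names `roc_auc` and `mean_auc` and all toy values are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen patient with
    recurrence (label 1) has a higher risk score than a randomly chosen
    patient without recurrence (label 0). Ties count as 0.5.
    Assumption: labels are fully observed (no censoring)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one recurrence and one non-recurrence case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(cohorts):
    """Average the AUC over several cohorts, each given as (labels, scores)."""
    return sum(roc_auc(y, s) for y, s in cohorts) / len(cohorts)


if __name__ == "__main__":
    # Hypothetical toy cohorts; these are NOT data from the study.
    cohort_a = ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.85])
    cohort_b = ([1, 1, 0, 0], [0.7, 0.6, 0.2, 0.3])
    print(roc_auc(*cohort_a))
    print(mean_auc([cohort_a, cohort_b]))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, which is the reference point against which values such as 0.598 and 0.752 are read.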

While our study presented promising findings, it also had several limitations. Firstly, the analysis at the single-cell level was constrained by the lack of single-cell transcriptome data with RFS information. Subsequent studies should collect RFS information in single-cell sequencing experiments. Secondly, the limited scope of the study restricted our exploration of the functional roles of the genes included in the models. Future research could investigate how these genes contribute to HCC recurrence and response to immunotherapy and elucidate their mechanisms of action. Thirdly, while MK-2206 and BI-2536 showed potential as therapeutic agents for HCC, their roles were not comprehensively studied here. Future investigations should examine their mechanisms and efficacy in more depth to establish whether they are viable options for HCC treatment.

Conclusion

In conclusion, our study identified DDR signatures associated with P53 mutations to predict HCC recurrence and treatment response, highlighting their potential as prognostic and therapeutic biomarkers.