Introduction

Hepatocellular carcinoma (HCC) is one of the most common and aggressive malignancies worldwide, with high incidence and mortality rates [1]. In most cases, HCC is diagnosed at an advanced stage, when surgical resection is no longer feasible because of tumor size, location, or impaired liver function [2]. Transarterial chemoembolization (TACE) has become the most widely used local treatment for patients with unresectable HCC [3]. It is effective in controlling tumor growth and improving survival in many patients, making it a cornerstone in the management of intermediate-stage HCC [4].

However, despite its widespread use, TACE is not suitable for all patients. Treatment options for liver cancer are diverse and include radiotherapy and systemic therapies. Combining local and systemic treatments has been shown to significantly improve survival compared with systemic treatment alone [5,6,7]. A patient who does not benefit from TACE may therefore miss the opportunity to receive other potentially effective treatments. The benefit of TACE is not uniform across patients, which highlights the need for precise patient selection [8]. Identifying which patients are likely to benefit from TACE remains a critical challenge, because inappropriate treatment not only fails to improve survival but may also delay or prevent the use of alternative therapies.

In recent years, artificial intelligence (AI), particularly machine learning, has emerged as a powerful tool in medical research, especially in the development of predictive models for clinical decision-making [9,10]. Machine learning algorithms excel at analyzing complex patterns within large datasets, allowing for more accurate predictions of patient outcomes [11]. However, despite advances in liver cancer diagnosis, identifying reliable biomarkers for alpha-fetoprotein (AFP)-negative tumors, which account for 30–40% of pathologically diagnosed HCC cases, remains a significant challenge [12].

Heat shock protein 90 alpha (HSP90α) has recently emerged as a promising circulating biomarker in oncology, including HCC. HSP90α is a molecular chaperone that stabilizes and activates a wide range of oncogenic client proteins and regulates multiple cancer-related pathways, such as cell growth and survival, DNA damage response, angiogenesis, epithelial–mesenchymal transition, and cell migration. Tumor cells can also secrete extracellular HSP90α, which further promotes invasion, metastasis, and neovascularization in the tumor microenvironment [13]. In HCC, several clinical studies have shown that plasma HSP90α levels are significantly higher in patients with liver cancer than in healthy individuals or those with benign liver diseases, and that HSP90α levels correlate positively with tumor burden and disease severity, including larger tumor size, multiple lesions, portal vein tumor thrombosis, extrahepatic metastasis, advanced Barcelona Clinic Liver Cancer (BCLC) stage, elevated AFP, and impaired liver function [14]. Moreover, dynamic changes in plasma HSP90α have been reported to reflect tumor load and treatment response, with levels decreasing after tumor resection or effective therapy, and higher baseline HSP90α being associated with worse prognosis [15]. Our previous multicenter study also demonstrated that elevated HSP90α is independently associated with poorer survival and can serve as both a prognostic and predictive biomarker in HCC, making it an attractive candidate for integration into AI-based prediction models [16].

This study aims to leverage machine learning techniques to identify patients who are most likely to benefit from TACE and predict the survival outcomes of patients with unresectable HCC undergoing TACE. By integrating machine learning with biomarker data, we hope to provide a more personalized approach to TACE treatment, improving patient outcomes and minimizing unnecessary interventions.

Results

Patient characteristics

The technical workflow of this study is shown in Fig. 1. Before propensity score matching (PSM), there were significant differences between the non-TACE (n = 1126) and TACE (n = 1429) groups in age, hepatitis B virus (HBV) infection, diabetes mellitus, Child-Pugh class, albumin-bilirubin (ALBI) grade, AFP, alkaline phosphatase (ALP), alanine aminotransferase (ALT), Barcelona Clinic Liver Cancer (BCLC) stage, tumor number, tumor size, portal vein tumor thrombosis (PVTT), and metastasis (M) (all P < 0.05). After 1:1 PSM, 666 matched pairs of patients were identified, and no significant differences in baseline characteristics were observed between the two groups (Table 1).

Fig. 1: Workflow diagram of this study.

Table 1: Baseline characteristics of the patients before and after PSM.

Survival

Before propensity score matching (PSM), the median overall survival (mOS) in the TACE group was significantly longer than that in the non-TACE group (19.6 [18.4–21.0] vs. 11.3 [10.3–12.9] months, P < 0.001). The 1-, 2-, and 3-year OS rates in the TACE group were 62.3%, 42.2%, and 34.4%, respectively, whereas the corresponding rates in the non-TACE group were 47.8%, 37.3%, and 32.9% (Fig. 2A). After PSM, the TACE group still showed a longer mOS than the non-TACE group (19.1 [17.1–21.3] vs. 12.0 [10.8–14.4] months, P < 0.001; Fig. 2B).

Fig. 2: Kaplan–Meier curves for OS in different patient groups.

(A) OS before PSM, comparing the TACE and non-TACE groups. (B) OS after PSM, comparing the TACE and non-TACE groups. (C) OS by HSP90α level (low vs. high). (D) OS by TACE benefit (benefit vs. no benefit). OS, overall survival; PSM, propensity score matching; TACE, transarterial chemoembolization; HSP90α, heat shock protein 90 alpha.

Based on our previous research [16], patients with HSP90α levels ≥ 143.5 ng/mL were assigned to the high-expression group, and those with levels below this threshold to the low-expression group. The mOS was 7.3 (6.3–8.0) months in the high-expression group and 24.7 (22.5–27.7) months in the low-expression group (P < 0.001, Fig. 2C).

Machine learning prediction of TACE-benefit in the TACE group

Initially, the TACE benefit group was defined using the Cox-residual method described in the Methods. The TACE benefit group had a significantly longer mOS than the non-benefit group (32.2 [27.1–NA] vs. 11.0 [9.7–12.4] months, P < 0.001, Fig. 2D).

In the TACE group, eight algorithms were used to select factors associated with TACE benefit: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), Support Vector Machine (SVM), Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), K-Nearest Neighbors (KNN), Neural Network (NNET), and Decision Tree (DT). The RF model performed best, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.897 (Fig. 3A).

Fig. 3: Development and validation of the TACE benefit prediction model.

(A) ROC curves showing the RF model’s superior performance, with an AUC-ROC of 0.897. (B) Residual boxplot indicating the smallest residuals in the RF model, reflecting better predictive performance. (C) Reverse cumulative distribution curve demonstrating the RF model’s superior prediction accuracy compared with the other models. (D) UpSet plot illustrating the intersection of the top 10 predictive factors from the eight models, from which three key factors were selected: HSP90α, BCLC stage, and tumor size. (E) Online nomogram based on HSP90α, BCLC stage, and tumor size, developed in the training set to predict TACE benefit. (F) ROC curve in the validation set, with an AUC-ROC of 0.901 for predicting TACE benefit. (G) Calibration curve showing the model’s stability and prediction accuracy. (H) Decision curve analysis supporting the model’s clinical utility. AUC-ROC, area under the receiver operating characteristic curve; HSP90α, heat shock protein 90 alpha; BCLC, Barcelona Clinic Liver Cancer; TACE, transarterial chemoembolization.

The residual boxplot shows that the RF model has the smallest residuals, indicating better predictive performance (Fig. 3B). The reverse cumulative distribution curve further demonstrates that the RF model has superior prediction accuracy compared with the other models (Fig. 3C). The variable importance plot reveals that HSP90α is the most influential factor in predicting TACE benefit within the RF model (Supplementary Fig. 1). The UpSet plot shows the intersection of the top 10 predictive factors from the eight models, from which three factors were ultimately selected: HSP90α, BCLC stage, and tumor size (Fig. 3D).

A total of 1429 patients who underwent TACE were randomly divided into a training set (n = 999) and a validation set (n = 430) in a 7:3 ratio. Baseline characteristics are shown in Supplementary Table 1. There were no statistically significant differences in baseline characteristics between the training and validation sets. An online nomogram (https://kesu.shinyapps.io/DynNomapp/) was developed in the training set to predict TACE benefit, based on three factors: HSP90α, BCLC stage, and tumor size (Fig. 3E). In the validation set, the model achieved an AUC-ROC of 0.901 for predicting TACE benefit (Fig. 3F), and both the calibration curve (Fig. 3G) and the decision curve analysis (DCA) curve (Fig. 3H) supported the model’s stability and clinical utility.
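For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard net-benefit definition for a binary outcome (here, TACE benefit vs. no benefit); the function names and the toy outcome and probability values are illustrative assumptions and are not the study's code or data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold.

    Standard definition: TP/n - FP/n * pt / (1 - pt), where pt is the threshold.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy in which every patient is treated."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data, invented purely for illustration (1 = benefit, 0 = no benefit).
toy_outcomes = [1, 1, 0, 0, 1, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7, 0.1, 0.05]

model_nb = net_benefit(toy_outcomes, toy_probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(toy_outcomes, threshold=0.5)
print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}, treat none: 0.000")
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies; the "treat none" strategy always has a net benefit of zero.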

Machine learning-based prediction of OS in the TACE group

In a cohort of 1429 unresectable HCC patients treated with TACE, no significant baseline differences were observed among the three groups: the training set (n = 790), the internal validation set (n = 340), and the external validation set (n = 299) (Table 2).

Table 2: Baseline characteristics of patients in the training, internal validation, and external validation sets.

First, univariate Cox regression analysis identified 13 prognostic factors associated with OS: HSP90α, hypertension, Child-Pugh class, ALBI grade, AFP, ALP, ALT, BCLC stage, number of tumors, tumor size, PVTT, N, and M. Subsequently, 101 machine learning models were developed to predict OS in the TACE group. Among these models, the StepCox[forward] + RSF model had the highest C-index, with values of 0.84, 0.70, and 0.78 in the training set, internal validation set, and external validation set, respectively (Fig. 4). The StepCox[forward] + RSF model first uses forward stepwise Cox regression to select significant prognostic factors and then applies a random survival forest (RSF) to predict OS.

Fig. 4: C-index of 101 machine learning models constructed in the training set, internal validation set, and external validation set.

Risk scores for the TACE group were calculated using the StepCox[forward] + RSF model. As shown in Supplementary Fig. 2, patients in the high-risk group had significantly shorter OS than those in the low-risk group (all P < 0.001), indicating that the model discriminates well between risk groups. The five variables most strongly associated with OS were HSP90α, BCLC stage, number of tumors, tumor size, and ALP.

The StepCox[forward] + RSF model was further validated in both the internal and external validation sets. The AUC-ROC values for predicting 1-, 2-, and 3-year OS in the internal validation set were 0.835, 0.821, and 0.776, respectively (Fig. 5A). In the external validation set, the corresponding AUC-ROC values were 0.854, 0.790, and 0.804 (Fig. 5B). The DCA showed a higher net benefit than both the “All treatment” and “No treatment” strategies at low threshold probabilities (0–20%) (Fig. 5C, D). The calibration plots showed that observed survival closely matched the predictions of the StepCox[forward] + RSF model for 1-, 2-, and 3-year OS, supporting the model’s consistency and reliability (Fig. 5E, F).

Fig. 5: Performance evaluation of the StepCox[forward] + RSF model for predicting OS.

ROC curves for the internal (A) and external (B) validation sets, showing the AUC values for 1-, 2-, and 3-year OS. Decision curve analysis for the internal (C) and external (D) validation sets, with net benefits at varying threshold probabilities. Calibration plots for the internal (E) and external (F) validation sets, comparing predicted OS with observed OS. ROC, receiver operating characteristic curve; OS, overall survival.

HSP90α expression and clinical factors

In analyses stratified by baseline characteristics, higher HSP90α expression was associated with higher AFP (Fig. 6A), worse Child-Pugh class (Fig. 6B), a greater number of tumors (Fig. 6C), positive N (Fig. 6D) and M (Fig. 6E) stages, and more advanced BCLC stage (Fig. 6F).

Fig. 6: Association between HSP90α expression and key clinical characteristics.

The relationship between HSP90α expression and AFP (A), Child-Pugh class (B), tumor number (C), N stage (D), M stage (E), and BCLC stage (F). HSP90α, heat shock protein 90 alpha; BCLC, Barcelona Clinic Liver Cancer.

Discussion

TACE is widely used in the treatment of unresectable HCC. To our knowledge, this study is the first to combine machine learning and HSP90α to identify patients who are likely to benefit from TACE and to predict their OS, providing a more personalized approach to clinical decision-making.

Our models achieved high AUC-ROC values, exceeding those reported in recent studies on HCC [17,18]. This suggests that our machine learning-based model, which incorporates HSP90α as a predictive factor, is effective in identifying TACE benefit. Compared with other models in the literature, our model showed higher accuracy in predicting OS at multiple time points, which may be attributable to the inclusion of HSP90α, a promising biomarker in HCC [19,20].

Given the relatively low positive rate of AFP in HCC, and the existence of AFP-negative patients in particular, exploring alternative biomarkers is essential [12,13]. This study confirms that high HSP90α expression is associated with worse OS and more advanced disease stages. Elevated HSP90α levels correlate with poorer prognosis in HCC, and this protein plays a significant role in promoting cancer progression and resistance to treatment. It stabilizes various client proteins that are critical for tumor survival and metastasis, making it a potential therapeutic target for future treatment strategies [21,22,23].

In this study, we first identified HSP90α, BCLC stage, and tumor size as predictive factors for TACE benefit using eight different machine learning models. Based on these features, we constructed an online nomogram, which achieved an AUC-ROC of 0.901 in the validation set, indicating strong predictive performance. By combining machine learning-based feature selection with the nomogram, we developed a tool that allows for personalized risk assessment and helps clinicians identify patients most likely to benefit from TACE. This approach may improve the accuracy of treatment decisions and reduce the risk of administering ineffective treatment, thereby avoiding unnecessary interventions and helping patients receive the therapy most appropriate for their condition [24,25,26].

Building upon this, we developed 101 machine learning models to further assess OS in patients with unresectable HCC undergoing TACE. Among these models, the StepCox[forward] + RSF model demonstrated the best predictive performance, with a C-index of 0.84 in the training set, 0.70 in the internal validation set, and 0.78 in the external validation set. In the internal and external validation cohorts, the AUC-ROC values for predicting 1-, 2-, and 3-year OS ranged from 0.776 to 0.854. The predictive accuracy of the StepCox[forward] + RSF model may be attributed to its ability to handle complex relationships between variables and to its good calibration. The forward selection process in StepCox retains only the most significant predictors, while the RSF model captures non-linear effects and interactions between factors, making the combination particularly effective in heterogeneous data sets such as those in HCC [27,28].

Moreover, HSP90α was confirmed as one of the most important factors influencing OS in our model. Its inclusion not only improved the accuracy of survival prediction, but also highlighted the critical role of HSP90α in HCC prognosis. Elevated HSP90α expression has been consistently linked to worse survival and more advanced disease stages, supporting its potential as both a predictive biomarker and a therapeutic target. In our cohort, higher HSP90α levels were associated with larger tumors, more advanced stage, PVTT, metastasis, and impaired liver function. This pattern is biologically plausible, as elevated HSP90α likely reflects greater tumor burden, more aggressive tumor biology, and systemic inflammation in advanced HCC, and thus serves as an integrated marker of overall disease severity [29]. These findings reinforce the value of incorporating molecular markers such as HSP90α into predictive models to refine individualized treatment planning and risk stratification.

Numerous studies have shown that combining TACE with systemic treatments can provide superior survival benefits compared with either modality alone [30,31,32]. Because TACE remains the most commonly used locoregional treatment for unresectable HCC, accurately identifying which patients are likely to benefit from it is crucial. By integrating our TACE-benefit nomogram with the StepCox[forward] + RSF survival model, we can both identify patients who are more likely to derive benefit from TACE and quantify their expected OS under a TACE-based strategy. This combined approach has substantial potential to enhance the clinical management of unresectable HCC. By avoiding TACE in patients with a low predicted benefit, clinicians may reduce unnecessary procedures and facilitate timely transition to alternative systemic or trial-based therapies, thereby improving overall treatment efficiency and outcomes.

From a clinical perspective, the TACE-benefit nomogram and the StepCox[forward] + RSF survival model are designed to complement, rather than replace, multidisciplinary decision-making. The nomogram, which is available as an online calculator, can be applied at the time of initial treatment planning to estimate an individual patient’s probability of benefit from TACE based on HSP90α, BCLC stage, and tumor size. In practice, patients with a high predicted probability of TACE benefit could be prioritized for TACE-based strategies, whereas those with a low predicted probability might be considered for alternative locoregional approaches, upfront systemic therapy, or early enrolment in clinical trials. Patients with intermediate probabilities would be suitable for shared decision-making, taking into account liver function, comorbidities, patient preferences, and institutional expertise.

The StepCox[forward] + RSF survival model further refines risk stratification among TACE-treated patients by providing an individualized survival prediction and a model-based risk score. Using the predefined cut-offs employed in this study, patients in the low-risk group achieved substantially longer OS than those in the high-risk group. In real-world practice, low-risk patients with a high predicted probability of TACE benefit might be managed with TACE-based regimens combined with targeted and/or immunotherapy and standard surveillance intervals. In contrast, high-risk patients, particularly those with a low predicted TACE benefit, could be considered for earlier transition to systemic therapy, more intensive follow-up, or enrolment in trials testing novel combinations rather than repeated TACE. For example, a patient with intermediate-stage disease, limited tumor burden, preserved liver function, and low HSP90α levels may have both a high predicted probability of TACE benefit and a low-risk survival score, supporting TACE combined with targeted therapy as the initial strategy. Conversely, a patient with multifocal advanced-stage HCC, high HSP90α levels, and a high-risk score may have a low predicted TACE benefit, and in such a case the model outputs would favor upfront systemic therapy or clinical trial enrolment instead of repeated TACE.

However, this study has several limitations. First, although it was conducted across multiple centers, variations in clinical practice patterns and technical expertise between institutions may have affected the consistency of the findings. Second, although all patients received TACE-based therapy combined with targeted agents and, in a subset of cases, immunotherapy, the specific systemic regimens (including drug type, combination strategy, and dosing schedules) were not fully standardized across centers. We did not stratify our analyses according to individual targeted or immunotherapy regimens, and residual confounding from this treatment heterogeneity may have influenced effect estimates and model performance; however, this variation also reflects real-world clinical practice and may, to some extent, enhance the generalizability of our models. Third, the retrospective design of the study could introduce selection bias, despite the use of propensity score matching to control for confounding variables. Future prospective, multicenter studies are needed to validate our findings and further refine the predictive model. Moreover, the mechanism by which HSP90α influences TACE outcomes requires further investigation through laboratory studies to better understand its role in tumor biology and treatment resistance. Finally, all patients in this study were treated at tertiary centers in China, where the etiology of HCC is predominantly HBV-related and local practice patterns for TACE and systemic therapy may differ from those in Western countries or regions. As a result, the performance and clinical utility of our models may not be fully generalizable to populations with different etiologic profiles, liver function reserves, or treatment availability. External validation in independent cohorts from other geographic regions and healthcare systems, and, if necessary, model recalibration or retraining, will be essential before these tools can be widely adopted in non-Chinese settings [33,34,35].

This study demonstrates that integrating AI with HSP90α expression can identify unresectable HCC patients who are more likely to benefit from TACE and provide robust prediction of OS. The TACE-benefit nomogram and StepCox[forward] + RSF survival model offer a practical framework to support personalized treatment selection and to avoid unnecessary or ineffective TACE in real-world practice.

Methods

Patients

A total of 2555 unresectable HCC patients were initially enrolled at seven Chinese tertiary hospitals between 2016 and 2021. Inclusion criteria were: (1) clinically or pathologically diagnosed HCC, (2) Child-Pugh class A or B, (3) at least one measurable lesion according to the Response Evaluation Criteria in Solid Tumors (RECIST), and (4) available pre-treatment HSP90α data. Exclusion criteria included: (1) presence of other malignant tumors, (2) contraindications to TACE, such as severe cirrhotic ascites, active infections, or poor liver function, (3) incomplete clinical information, and (4) loss to follow-up.

Serum HSP90α levels were measured independently at the clinical laboratories of each participating hospital according to routine institutional procedures.

HCC was clinically diagnosed by experienced clinicians when the following criteria were met: (1) presence of liver cirrhosis with chronic hepatitis B and/or hepatitis C virus infection and a newly detected hepatic mass on imaging; (2) a hepatic lesion ≥ 2 cm in diameter showing typical HCC enhancement on contrast-enhanced CT or MRI (heterogeneous arterial phase hyperenhancement with rapid washout in the portal venous or delayed phase); (3) for hepatic lesions < 2 cm, the same typical enhancement pattern observed on both contrast-enhanced CT and MRI; or (4) persistently elevated serum AFP, defined as ≥ 400 μg/L for ≥ 1 month or ≥ 200 μg/L for ≥ 2 months, in the presence of a compatible hepatic mass on imaging.

This study was approved by the Ethics Committee of Chongqing People’s Hospital (KY S2025-029-01) and was conducted in accordance with the principles outlined in the Declaration of Helsinki. All patients provided written informed consent prior to receiving treatment, and the study adhered to ethical guidelines for medical research.

Treatment and follow-up

TACE was performed under digital subtraction angiography (DSA) guidance. The procedure began with the Seldinger technique to access the celiac trunk and superior mesenteric artery, allowing catheter-based angiography to assess the tumor’s vascular supply and extent. A microcatheter was then advanced into the tumor-feeding artery, where a combination of chemotherapy agents and embolic materials was delivered. Once successful embolization was verified by angiography, the catheter and sheath were removed, and hemostatic pressure was applied to the puncture site using bandages. All patients received targeted therapy, and a subset of patients also received immunotherapy as part of their treatment regimen.

The primary endpoint was OS, which was measured from the start of treatment until death from any cause or the last follow-up for censored individuals.
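Under this endpoint definition, each patient contributes a follow-up time and an event indicator (1 for death, 0 for censoring at last follow-up). As a minimal sketch, the product-limit (Kaplan-Meier) estimate and the median OS could be computed as follows. The function names and the toy follow-up data are illustrative assumptions and do not come from the study, which was analysed in R.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival probability) at each death time."""
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under follow-up just before t
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy follow-up data in months, invented for illustration (event 1 = death, 0 = censored).
toy_times = [3, 5, 5, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0, 1]

toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve)
print("median OS (months):", median_survival(toy_curve))
```

A return value of None from median_survival corresponds to a median that has not been reached during follow-up.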

Cox-residual modeling

To identify patients who were most likely to benefit from TACE, we used a Cox–residual strategy. First, a multivariable Cox proportional hazards model for OS was fitted in the non-TACE cohort. For each patient in the TACE cohort, this Cox model was then used to calculate an individual linear predictor based on their baseline characteristics and to derive a model-based expected OS. The Cox residual for each TACE-treated patient was defined as the difference between the observed OS and the expected OS (residual = observed OS − expected OS). Patients with positive residuals, indicating that their observed survival exceeded the model-predicted survival, were classified as the “TACE benefit” group, whereas those with zero or negative residuals were classified as the “non-benefit” group.
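The labelling rule in this step can be sketched as follows. The sketch assumes that the expected OS for each TACE-treated patient has already been derived from the Cox model fitted in the non-TACE cohort; how that expected value is computed, and how censored observations are handled, is not specified here and is left outside the example. The class name, field names, and toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TacePatient:
    patient_id: str
    observed_os_months: float
    # Assumed to come from the Cox model fitted in the non-TACE cohort.
    expected_os_months: float


def cox_residual(patient):
    """Residual = observed OS - expected OS, as defined in the text."""
    return patient.observed_os_months - patient.expected_os_months


def benefit_group(patient):
    """Positive residual -> 'TACE benefit'; zero or negative -> 'non-benefit'."""
    return "TACE benefit" if cox_residual(patient) > 0 else "non-benefit"


# Toy patients, invented for illustration only.
toy_patients = [
    TacePatient("P1", observed_os_months=30.0, expected_os_months=18.0),
    TacePatient("P2", observed_os_months=9.0, expected_os_months=14.0),
    TacePatient("P3", observed_os_months=12.0, expected_os_months=12.0),
]

for p in toy_patients:
    print(p.patient_id, cox_residual(p), benefit_group(p))
```

The resulting binary label is what the downstream classifiers are trained to predict.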

Machine learning models for TACE benefit

A total of 1429 patients who underwent TACE were randomly divided into a training set (n = 999) and a validation set (n = 430) in a 7:3 ratio. Eight machine learning algorithms (LASSO, RF, SVM, GLM, GBM, KNN, NNET, and DT) were applied to identify predictive indicators of TACE benefit. For all machine learning models, hyperparameters were selected using 10-fold cross-validation.

An UpSet plot was then used to extract the intersecting indicators across the eight models, thereby screening out the most robust predictors. Based on these intersection indicators, an online nomogram was developed in the training set. The performance of the nomogram was subsequently evaluated in the validation set using ROC curves, calibration curves, and DCA.
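The intersection step that the UpSet plot visualizes amounts to keeping the features that appear among the top-ranked predictors of every model. A minimal sketch is shown below; the function name, the model names, and the placeholder feature rankings are hypothetical and chosen only to show the mechanics.

```python
def shared_top_features(rankings, k=10):
    """Features that appear in the top-k list of every model.

    rankings maps a model name to its features ordered from most to least important.
    """
    top_sets = [set(features[:k]) for features in rankings.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Placeholder rankings, invented for illustration (not the study's variables).
toy_rankings = {
    "model_a": ["f1", "f2", "f3", "f4"],
    "model_b": ["f2", "f1", "f5", "f3"],
    "model_c": ["f3", "f2", "f1", "f6"],
}

shared = shared_top_features(toy_rankings, k=3)
print(sorted(shared))
```

With k = 3, only the features ranked in the top three by all three toy models are retained; the study applied the same idea with the top 10 factors from eight models.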

Machine learning models for OS in TACE group

A total of 1130 HCC patients treated with TACE at six hospitals were randomly split into a training set (n = 790) and an internal validation set (n = 340) in a 7:3 ratio. In addition, data from 299 unresectable HCC patients who underwent TACE at the Affiliated Hospital of Southwest Medical University were used as an independent external validation set.

In the training set, univariate Cox regression analysis was first performed to identify prognostic indicators in unresectable HCC patients receiving TACE. Subsequently, a total of 101 machine learning models were constructed to predict OS, and the model with the highest concordance index (C-index) was selected. In both the internal and external validation sets, ROC curves, calibration curves, and DCA were employed to evaluate the predictive performance of the selected model for 1-, 2-, and 3-year OS.
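The C-index used for model selection measures how often, among comparable patient pairs, the patient who died earlier was assigned the higher risk score. The sketch below is a simplified version of Harrell's C-index: pairs with tied survival times are skipped, and tied risk scores count as half-concordant. The function name and the toy data are illustrative assumptions, not the implementation used in the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (times in months; event 1 = death, 0 = censored).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
good_scores = [0.9, 0.7, 0.4, 0.1]  # higher risk assigned to earlier deaths
bad_scores = [0.1, 0.4, 0.7, 0.9]   # ranking reversed

print(harrell_c_index(toy_times, toy_events, good_scores))
print(harrell_c_index(toy_times, toy_events, bad_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.84, 0.70, and 0.78.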

Statistical analysis

Categorical variables were compared using chi-square tests or Fisher’s exact tests, while continuous variables were compared using t-tests for normally distributed data or Mann-Whitney U tests for non-normally distributed data. PSM was used to balance baseline characteristics between the non-TACE and TACE groups. Propensity scores were estimated using a logistic regression model based on baseline covariates, and patients in the two groups were then matched in a 1:1 ratio using nearest-neighbor matching without replacement with a caliper of 0.1 on the propensity score. Kaplan-Meier (KM) curves were used to assess OS, and differences between groups were assessed using the log-rank test. All statistical analyses were conducted using R software. A P value of < 0.05 was considered statistically significant.
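The matching step can be illustrated with a greedy 1:1 nearest-neighbor procedure without replacement. The sketch below assumes that propensity scores have already been estimated, that the caliper of 0.1 applies on the raw propensity-score scale, and that treated patients are processed in the order given; these details, along with the function name and the toy scores, are assumptions for illustration rather than a description of the study's R code.

```python
def match_nearest_neighbor(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated and control map patient IDs to propensity scores.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls not yet used
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


# Toy propensity scores, invented for illustration.
toy_treated = {"T1": 0.30, "T2": 0.62, "T3": 0.95}
toy_control = {"C1": 0.28, "C2": 0.65, "C3": 0.50}

toy_pairs = match_nearest_neighbor(toy_treated, toy_control, caliper=0.1)
print(toy_pairs)
```

In the toy example, T3 stays unmatched because no remaining control lies within the caliper, which mirrors how matching reduces the analysed cohort to the 666 matched pairs reported in the Results.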