Introduction

In 2020, there were approximately 841,080 new cases of hepatocellular carcinoma (HCC) and 781,631 related deaths worldwide [1]. Unfortunately, more than 70% of HCC patients are already in the advanced stage upon initial diagnosis, leading to dismal overall survival (OS) rates [2, 3]. Among the limited treatment options, multitargeted tyrosine kinase inhibitors (TKIs), such as sorafenib and lenvatinib, are recommended as first-line treatment for advanced HCC (Ad-HCC) patients with good hepatic functional reserve according to the National Comprehensive Cancer Network guidelines [4]. In addition, transarterial chemoembolization (TACE) and hepatic arterial infusion chemotherapy (HAIC) are the current standard intra-arterial therapies (IATs) for patients with unresectable HCC [5, 6]. The combination of IATs with TKIs or immune checkpoint inhibitors (ICIs) has led to significant improvements in the survival of Ad-HCC patients [7,8,9].

Ad-HCC often presents greater tumour invasiveness than early-stage HCC, and the emergence of a high tumour burden and cancer thrombi are common phenomena [3, 10]. Given their high heterogeneity, the optimal IAT strategy for Ad-HCC patients remains a controversial issue in clinical practice. Previous studies have reported that imaging features play an important role in the decision between TACE and HAIC [11, 12]. However, interventional radiologists decide whether to use TACE or HAIC based on their own assessment of the patient’s physical condition and the blood supply status of the tumour in clinical practice. Although intra-arterial combination therapy yields significant objective response rates (ORRs), not all Ad-HCC patients will achieve good outcomes, and the incidence of adverse effects also increases accordingly. Therefore, an inappropriate IAT combination scheme may not only affect the quality of life but also increase the economic burden. An accurate and noninvasive method for the preoperative estimation of HCC patients’ status is urgently needed to identify the optimal candidates who can obtain survival benefits from TACE, HAIC, and the combination scheme.

Machine learning (ML) is a branch of artificial intelligence that employs statistical, probabilistic and optimisation techniques to enable machines to learn from data [13,14,15]. By automating analytical model building, ML algorithms can identify patterns and make decisions with minimal human intervention. In the medical field, many ML algorithms are commonly used for the prediction of oncological outcomes. For example, An C et al. used five ML algorithms to establish risk prediction models for early recurrence after microwave ablation of HCC based on multicentre clinical data [14]. Their ML-based model was shown to be robust and had a high predictive value, thus providing guidance and assistance to physicians. In the current study, we aimed to develop and validate an ML-based decision support model to help interventional radiologists make decisions regarding IAT schemes in unresectable HCC.

Materials and Methods

The protocol of this retrospective, multi-institutional study was approved by the Institutional Review Boards of all participating hospitals (NCC-010298), and the study was conducted in accordance with the principles of the 1975 Declaration of Helsinki. Due to the retrospective nature of this study, the requirement for written informed consent was waived. This study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines [16]. No formal sample-size calculation was performed beforehand, but the number of death events in the training cohort was large relative to the number of variables analysed in the multivariable Cox regression analysis. The “ten events per variable” rule of thumb was therefore satisfied, implying sufficient accuracy of the regression estimates.
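The “ten events per variable” rule of thumb is simple arithmetic: the number of outcome events divided by the number of candidate variables should be at least 10. The following is a minimal sketch in Python, not the authors' calculation. It assumes, for illustration only, the total deaths (352 + 936) and the 32 candidate variables reported in the Results; the training-cohort event count is not stated here and would be smaller. The function name is hypothetical.

```python
def events_per_variable(n_events: int, n_variables: int) -> float:
    """Return the number of outcome events available per candidate variable."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables


# Illustrative inputs only (assumption): total deaths across both groups and
# the number of candidate variables, both taken from the Results.
total_deaths = 352 + 936
candidate_variables = 32

epv = events_per_variable(total_deaths, candidate_variables)
meets_rule = epv >= 10  # "ten events per variable" rule of thumb
print(f"EPV = {epv:.2f}, rule satisfied: {meets_rule}")
```

With these illustrative inputs the ratio is well above 10; the same check would need to be repeated with the actual training-cohort event count.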

Patient enrolment

Between October 2014 and October 2022, a total of 4879 consecutive patients with HCC who underwent initial IATs were reviewed in 13 tertiary hospitals in China. The distribution of clinical data sources is shown in Supplementary Table 1. All of the patients were diagnosed based on the European Association for the Study of the Liver and the American Association for the Study of Liver Diseases guidelines [17, 18]. The inclusion criteria of this study were as follows: (a) age 18–75 years; (b) Eastern Cooperative Oncology Group (ECOG) performance status < 2; (c) Child-Pugh class A or B liver function; (d) treatment with conventional TACE or HAIC with the FOLFOX regimen (oxaliplatin plus fluorouracil and leucovorin). The exclusion criteria were as follows: (a) any therapeutic protocol received before IATs; (b) HCC combined with other malignancies; (c) missing image data during the perioperative period; (d) lost to follow-up > 12 months. The IAT procedures and the criteria for protocol treatment discontinuation are shown in Supplementary Methods E1.1–2. The IAT combination therapy protocol is described in Supplementary Methods E1.3.

Study design

Figure 1 shows the patient enrolment pathway in this study. A total of 2959 unresectable HCC patients (240 females and 2719 males; mean age, 52.8 ± 11.5 years) were enrolled. To build an ML model adapted to clinical decision-making for IATs, the design scheme was as follows. First, the population was divided into two groups (intra-arterial combined therapy [I-C] group and intra-arterial therapies alone [I-A] group) in order to support decision-making regarding the IAT combination scheme. Second, the patients were further divided into two groups (TACE group and HAIC group) in order to support decision-making regarding whether patients receive TACE or HAIC. We aimed to build an ML-based decision support model (MLDSM) to provide physicians with decision support before administering IAT. To develop and validate the MLDSM, eligible patients were divided into three cohorts: a training cohort, a validation cohort and a test cohort.

Fig. 1: Enrolment pathway of unresectable HCC patients who underwent various IAT schemes.

HCC patients were enrolled to build an ML model adapted to clinical decision-making for IATs. The design scheme was as follows: first, the population was divided into two groups (intra-arterial combined therapy group and intra-arterial therapies alone group) in order to support decision-making regarding the IAT combination scheme; second, the patients were further divided into two groups (TACE group and HAIC group) in order to provide support for decision-making regarding whether patients received TACE or HAIC. Clinical applications in guiding the IAT scheme were established based on the prediction results of the ML model with the best performance. HCC hepatocellular carcinoma, ML machine learning, HAIC hepatic arterial infusion chemotherapy, TACE transarterial chemoembolization, IAT intra-arterial therapy.

Follow-up protocol and endpoint definition

All eligible patients were censored at the last follow-up date (May 31, 2023). After the IAT procedures were completed, serum alpha-fetoprotein (AFP) and dynamic contrast-enhanced images (computed tomography [CT] or magnetic resonance imaging [MRI]) were examined again at 3–6 month intervals during IATs and every 6 months thereafter. The contrast-enhanced CT images before and after IAT were reviewed independently by two fellowship-trained abdominal radiologists at each participating centre who had 5 to 10 years of experience in liver imaging. The reviewers were aware that all patients had HCC but were blinded to the remaining clinical, histopathologic, therapeutic, and follow-up information. We selected 12-month OS as the primary endpoint, with follow-up for this endpoint ending at death from any cause or at 12 months, whichever came first.

The development and comparison of the ML model

We collected 32 clinical variables, and their definitions are described in Supplementary Methods E1.5 and Supplementary Table 2. Following the previous report by Liu WD et al. [19], five representative supervised ML algorithms (eXtreme Gradient Boosting [XGBoost], Categorical Gradient Boosting [CatBoost], Gradient Boosting Decision Tree [GBDT], Light Gradient Boosting Machine [LGBM] and Random Forest [RF]) were included in this study, and their parameters are shown in Supplementary Tables 3–6. In the optimal ML-based model, all clinical variables were used for training and ranked according to feature class discrimination importance. The 3 least important variables were removed, and each model was retrained on the remaining 29 feature variables. Based on recursive feature elimination with cross-validation (RFECV), we repeated this elimination of the three least important features ten times (3/6/9/12/15/18/21/24/27/30 features) to evaluate the predictive performance of the risk features.
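The elimination schedule described above can be sketched as a simple loop. This is a simplified illustration in plain Python, not the authors' pipeline: it shows only the "drop the three least important features per round" step, assumes 32 starting features so that the first round leaves 29, and uses a toy importance function in place of model refitting. The names `eliminate_stepwise`, `rank_features` and `toy_rank` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def eliminate_stepwise(
    features: Sequence[str],
    rank_features: Callable[[Sequence[str]], Dict[str, float]],
    step: int = 3,
    rounds: int = 10,
) -> List[List[str]]:
    """Drop the `step` least important features per round.

    `rank_features` is a caller-supplied (hypothetical) function that refits a
    model on the given subset and returns an importance score per feature.
    Returns the surviving feature subset after each round.
    """
    current = list(features)
    history = []
    for _ in range(rounds):
        if len(current) <= step:
            break  # not enough features left to remove another batch
        importance = rank_features(current)
        # Sort ascending by importance; the first `step` names are dropped.
        to_drop = set(sorted(current, key=lambda f: importance[f])[:step])
        current = [f for f in current if f not in to_drop]
        history.append(list(current))
    return history


# Toy stand-in for model refitting (assumption): importance grows with index.
all_features = [f"x{i}" for i in range(32)]
toy_rank = lambda subset: {f: float(f[1:]) for f in subset}

subsets = eliminate_stepwise(all_features, toy_rank, step=3, rounds=10)
subset_sizes = [len(s) for s in subsets]
print(subset_sizes)
```

In a full RFECV procedure, each surviving subset would additionally be scored by cross-validation (for example by AUC) and the best-scoring subset kept. Library routines such as scikit-learn's `RFECV` with `step=3` offer this; whether the authors used that routine is not stated.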

ML-based model for decision support of the IAT scheme

The MLDSM was developed through the integration of clinical information for the prediction of 12-month OS using ML classifiers. The construction of the five ML models is described in detail in Supplementary Methods E1.6. The ML models with the optimal performance in each test cohort (Best I-C model, Best I-A model, Best TACE model, and Best HAIC model) were connected in parallel to build the MLDSM. Implementing the MLDSM was a two-step process. First, the MLDSM was used to predict the 12-month OS probability of patients receiving I-C as the first treatment for HCC and the 12-month OS probability of patients receiving I-A as the first treatment. Second, the MLDSM was used to predict the 12-month OS probability of patients receiving TACE as the first treatment for HCC and the 12-month OS probability of patients receiving HAIC as the first treatment. Decision trees were constructed based on clinical information to predict the high- or low-risk group assigned by the MLDSM; the details of this process are shown in Supplementary Methods E1.6 and Supplementary Table 7.

Statistical analysis

Statistical analysis was performed using SPSS version 26.0 (IBM Corp., NY, USA) and the rms package of R software version 3.5.1 (http://www.r-project.org/). We compared the performance of the five ML-based models for the prediction of 12-month mortality using the area under the receiver operating characteristic curve (AUC) with the DeLong test. The Hosmer‒Lemeshow test was applied to assess the quality of calibration. To interpret the predictions of the ML models, we used the Shapley Additive exPlanations (SHAP) method [20, 21]. The SHAP algorithm calculates the Shapley value of each variable based on game theory to determine the relative importance of each variable in the best-performing ML model. ML model-based trees relax the assumption that the treatment effect is the same for all individuals and can estimate stratified treatment effects. The strata are identified in a data-driven fashion and rely on the features of the individuals.
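The game-theoretic idea behind Shapley values can be illustrated with a brute-force calculation on a toy model. The sketch below is not the SHAP library and not the authors' analysis: the risk function, its coefficients and the feature names are invented for illustration, and exact enumeration is only feasible for a handful of features (SHAP implementations such as its tree explainer rely on faster algorithms).

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each player for the coalition function `value`."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical toy "model" (all numbers invented): additive risk terms plus an
# interaction between tumour diameter and BCLC stage.
def toy_risk(known: FrozenSet[str]) -> float:
    risk = 0.0
    if "diameter" in known:
        risk += 0.20
    if "bclc" in known:
        risk += 0.10
    if "albumin" in known:
        risk -= 0.05
    if {"diameter", "bclc"} <= known:
        risk += 0.06  # interaction term, shared equally by the two features
    return risk


phi = shapley_values(["diameter", "bclc", "albumin"], toy_risk)
print({k: round(v, 3) for k, v in phi.items()})
```

The three values sum to the difference between the toy model's output with all features known and with none known, which is the additivity property that lets SHAP attribute a prediction to individual variables.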

All significance tests were two-sided, and a P value < 0.05 was considered to indicate statistical significance.
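The AUC used to compare models has a direct interpretation: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. A minimal pure-Python sketch of that pairwise definition follows; the labels and scores are invented, and the function name `auc_rank` is hypothetical. The DeLong test itself, which estimates the variance of the difference between two such AUCs, is not implemented here.

```python
from typing import Sequence


def auc_rank(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels (1 = death within 12 months) and predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.7]
model_b = [0.6, 0.4, 0.5, 0.5, 0.3, 0.7, 0.2, 0.8]

auc_a = auc_rank(labels, model_a)
auc_b = auc_rank(labels, model_b)
print(auc_a, auc_b)
```

The quadratic pairwise loop is chosen for clarity; for cohorts of thousands of patients a rank-based implementation from a statistics library would be preferable.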

Results

Baseline Characteristics

In this study, 1856 patients were assigned to the I-A group, and 1103 patients were assigned to the I-C group. For the total cohort, the 12-month death rates were 31.9% (352/1103) in the I-C group and 50.4% (936/1856) in the I-A group; this difference was significant (P < 0.001). The baseline characteristics of the different IAT schemes are shown in Table 1. Age, cirrhosis, ALBI grade, tumour diameter and number, vascular invasion, metastasis, Barcelona Clinic Liver Cancer (BCLC) stage, albumin, aspartate aminotransferase (AST), platelet count, C-reactive protein (CRP), treatment modality and local therapy had significantly different distributions between the two groups (all, P < 0.05). In addition, the baseline characteristics of HCC patients who received IATs in the training cohort, validation cohort and test cohort are outlined in Supplementary Tables 8–11. A total of 32 risk factors were examined via correlation analysis; the correlation coefficient matrix heatmap showed that the clinical features were negatively or positively correlated with 12-month OS (Supplementary Fig. 1).

Table 1 Comparisons of baseline characteristics of HCC patients who received IATs.

ML model building and selection

The line charts show how the predictive ability of each ML model changed with the number of input clinical variables (Supplementary Fig. 2a–d). The AUC, sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), and negative predictive value (NPV) of the five ML models are outlined and compared in Supplementary Tables 12–19, Fig. 2 and Supplementary Fig. 3. Ultimately, the LGBM model achieved the best performance in the I-A group, and the CatBoost model achieved the best performance in the I-C group. The LGBM model trained on 24 variables had AUCs of 0.812 (95% CI, 0.768–0.862) in the training cohort, 0.771 (95% CI, 0.759–0.832) in the validation cohort, and 0.759 (95% CI, 0.728–0.834) in the test cohort. The top six features were maximum tumour diameter, BCLC stage, tumour number, platelet count, neutrophils and TBIL (Supplementary Fig. 4a). The CatBoost model trained on 30 variables had AUCs of 0.810 (95% CI, 0.783–0.868) in the training cohort, 0.764 (95% CI, 0.719–0.827) in the validation cohort, and 0.774 (95% CI, 0.718–0.832) in the test cohort. The top six features were local therapy, maximum tumour diameter, albumin, AFP, treatment modality, and ALBI grade (Supplementary Fig. 4b). Similarly, the CatBoost model achieved the best prediction performance in the HAIC group, and the LGBM model achieved the best prediction performance in the TACE group (Supplementary Fig. 3a–f, Supplementary Fig. 4c, d).

Fig. 2: ROC comparison between five ML models.

ROC curves were established to evaluate the performance of the ML models, including XGBoost, CatBoost, GBDT, LGBM and RF, in the training cohort (a), testing cohort (b) and validation cohort (c) in the I-A group. ROC curves were established to evaluate the performance of the ML models, including XGBoost, CatBoost, GBDT, LGBM and RF, in the training cohort (d), testing cohort (e) and validation cohort (f) in the I-C group. DeLong test; *, P value < 0.05; **, P value < 0.01; ***, P value < 0.001. ML machine learning, RF random forest, XGBoost extreme gradient boosting, CatBoost categorical gradient boosting, GBDT gradient boosting decision tree, LGBM light gradient boosting machine, ROC receiver operating characteristic.

Interpretation methods for the ML model

We used the SHAP method to explain the prediction results of these ML-based models (Fig. 3), and the nonlinear impact of each variable on the 12-month OS is illustrated for the five ML-based models. For HCC patients who underwent HAIC (Supplementary Fig. 5), our results show that local therapy was the strongest prognostic factor and that local therapy had a higher survival benefit than HAIC alone. Vascular invasion and metastasis were the second and third strongest risk factors for death, respectively, relative to their absence. BCLC stage C was associated with a higher risk of poor prognosis than BCLC stages A and B. The relationship between laboratory findings, including alanine aminotransferase (ALT), albumin, total bilirubin (TBIL), lymphocytes, neutrophils, AST, creatinine (Cre), AFP and CRP, and the SHAP value was an S-shaped curve with clear turning points. The SHAP analysis shows that the death risk of HCC patients who received IAT increased significantly when certain laboratory variables, including albumin, ALT, TBIL, AFP, and CRP, were elevated or when lymphocyte and neutrophil values were decreased.

Fig. 3: SHAP values of individual features of ML models.

a Bar plot of the average SHAP values for the top predicted features to illustrate global feature importance in class 1 (high risk) and class 0 (low risk) for the LGBM model in the IAT combination therapy group. b SHAP values of the top features of the CatBoost model in the IAT alone therapy group. The plot sorts features by the sum of SHAP value magnitudes over all samples. The colour represents the feature value (red high, blue low). c Bar plot of the average SHAP values for the top predicted features to illustrate global feature importance for the LGBM model in the TACE therapy group. d SHAP values of the top features of the CatBoost model in the HAIC group. SHAP Shapley Additive exPlanations, XGBoost eXtreme Gradient Boosting, CatBoost Categorical Gradient Boosting, GBDT Gradient Boosting Decision Tree, LGBM Light Gradient Boosting Machine, RF Random Forest, HCC hepatocellular carcinoma, HAIC hepatic arterial infusion chemotherapy, TACE transarterial chemoembolization, IAT intra-arterial therapy.

The development and validation of the MLDSM

In total, 1288 deaths were recorded in this study, corresponding to approximately 40 events per candidate variable and indicating sufficient power of estimation. The stratification thresholds generated with the median risk score in the test cohort were 0.54 for the I-C group, 0.31 for the I-A group, 0.87 for the TACE group, and 0.49 for the HAIC group. An online web tool based on the MLDSM is accessible for public use (http://106.63.4.6:8086/index). Based on these cutoff values, HCC patients were classified as low- or high-risk patients. The Kaplan‒Meier curves revealed that low-risk patients in the four groups had a significant OS benefit (log-rank test, P < 0.05) in the three cohorts (Fig. 4). The calibration curves demonstrated good agreement between the 12-month death probabilities predicted by the five ML-based models and the actual 12-month death probabilities (Supplementary Fig. 6).
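Stratifying patients at the median risk score, as described above, can be sketched in a few lines of Python. The scores below are invented, the function name is hypothetical, and treating a score strictly above the threshold as high risk is an assumption, since the handling of scores equal to the cutoff is not specified here.

```python
from statistics import median
from typing import List, Sequence, Tuple


def stratify_by_median(risk_scores: Sequence[float]) -> Tuple[float, List[str]]:
    """Split patients into risk groups at the median predicted risk score."""
    threshold = median(risk_scores)
    # Assumption: scores strictly above the median are labelled high risk.
    groups = ["high" if s > threshold else "low" for s in risk_scores]
    return threshold, groups


# Invented predicted 12-month death probabilities for six patients.
scores = [0.12, 0.48, 0.60, 0.91, 0.33, 0.75]
threshold, groups = stratify_by_median(scores)
print(threshold, groups)
```

The resulting labels are what a Kaplan‒Meier comparison with a log-rank test would then be run on.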

Fig. 4: Risk strata for predicting the prognosis of HCC patients who underwent IATs.

Comparing the OS between HCC patients who received the IAT scheme in a the training cohort, b testing cohort, and c validation cohort. Kaplan–Meier plots for patients who received IAT alone therapy in d the training set, e testing set, and f validation set. Kaplan–Meier plots for patients who received HAIC therapy in g the training cohort, h testing cohort, and i validation cohort. Comparing the OS between HCC patients who received TACE and HAIC therapy in j the training cohort, k testing cohort, and l validation cohort. OS overall survival, HCC hepatocellular carcinoma, HAIC hepatic arterial infusion chemotherapy, TACE transarterial chemoembolization, IAT intra-arterial therapy.

Fig. 5: Top divergence variables used by the MLDSM in classifying treatment selection.

a Intra-arterial alone therapy or Intra-arterial combined; b TACE therapy or HAIC therapy.

Clinical application in guiding the IAT protocol

We simulated a clinical scenario, which assumed that all patients in the test cohort were in an untreated state. When a hypothetical untreated HCC patient needed to make an IAT decision, the MLDSM was used to predict the 12-month death probability of receiving I-C or I-A as the first treatment for the patient (Supplementary Methods E1.6). A similar scenario was simulated for TACE versus HAIC. The IAT scheme with the lower death probability would be recommended. The ML model-based trees were used to classify patients into different effect subgroups based on the prognostic score derived from the best ML models. Then, we used clinical factors to establish the decision tree based on the high- or low-risk group from the MLDSM. As illustrated by the decision tree, BCLC stage, local therapy, tumour diameter, albumin level, platelet level, AST level and CRP level are the top divergence variables in classifying treatment selection (Supplementary Fig. 7a). The results suggest that I-C treatment should be recommended for patients who had higher albumin, platelet, AST and CRP levels and did not receive local therapy, while I-A treatment should be recommended for patients at BCLC stage A or B and those with a tumour diameter < 5 cm. Similarly, we constructed a decision tree to help patients choose between HAIC and TACE therapy (Supplementary Fig. 7b). Consequently, an alternative noninvasive guideline could be used to guide the use of IAT for unresectable HCC.
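The two-step recommendation rule in this scenario (predict the 12-month death probability under each option, then recommend the option with the lower probability) can be sketched as follows. This is an illustration only: the four `toy_*` functions are hypothetical stand-ins for the fitted best models, their coefficients are invented and carry no clinical meaning, and the feature name `tumour_diameter_cm` is an assumption.

```python
from typing import Callable, Dict, Mapping

Patient = Mapping[str, float]
RiskModel = Callable[[Patient], float]  # returns 12-month death probability


def recommend(patient: Patient, models: Dict[str, RiskModel]) -> str:
    """Return the scheme whose model predicts the lowest death probability."""
    predictions = {name: model(patient) for name, model in models.items()}
    return min(predictions, key=predictions.get)


# Hypothetical stand-ins for the fitted best models; coefficients are invented.
def toy_ic(p: Patient) -> float:
    return min(1.0, 0.10 + 0.04 * p["tumour_diameter_cm"])


def toy_ia(p: Patient) -> float:
    return min(1.0, 0.02 + 0.06 * p["tumour_diameter_cm"])


def toy_tace(p: Patient) -> float:
    return min(1.0, 0.05 + 0.05 * p["tumour_diameter_cm"])


def toy_haic(p: Patient) -> float:
    return min(1.0, 0.20 + 0.03 * p["tumour_diameter_cm"])


patient = {"tumour_diameter_cm": 10.0}

# Step 1: combination therapy (I-C) versus IAT alone (I-A).
step1 = recommend(patient, {"I-C": toy_ic, "I-A": toy_ia})
# Step 2: TACE versus HAIC.
step2 = recommend(patient, {"TACE": toy_tace, "HAIC": toy_haic})
print(step1, step2)
```

Keeping each option as a separate model mirrors the parallel arrangement of the best I-C, I-A, TACE and HAIC models; in practice each stand-in would be replaced by the corresponding trained classifier's predicted probability.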

Discussion

Previously, Shi Ming et al. reported the effectiveness and safety of TACE with HAIC treatment in large HCC (diameter > 7 cm) in a phase III clinical trial, revealing that tumour burden plays a crucial role in the selection between HAIC and TACE [22]. In addition, the CHANCE2211 trial reported that TACE plus camrelizumab and apatinib showed outstanding effectiveness compared with TACE alone for locoregional HCC [23]. However, high-level tumour heterogeneity at the histologic, genomic, and molecular levels results in a certain range of individualised differences in the prognosis of HCC patients who receive IAT. To date, multiple risk factors affecting the prognosis of HCC patients have been identified, including AFP level, tumour diameter, pathological classification, albumin-bilirubin (ALBI) score and BCLC stage [24,25,26,27]. Various predictive models based on these variables have been established and applied in clinical practice. However, these models are limited by relatively few variables, small derivation sample sizes, varying pathologic scoring standards, and poor clinical applicability.

In this study, we used ML algorithms to analyse clinical information from 2959 HCC patients who underwent IATs at 13 hospitals. Furthermore, we built and validated the MLDSM, which comprises 32 widely accepted clinical variables, to quickly and accurately guide physicians in making decisions before administering IAT. The LGBM model (using 24 variables in the I-A group) and the CatBoost model (using 30 variables in the I-C group) achieved the best predictive performance among all ML models. The CatBoost model uses perfect hashing to store the values of categorical features, which reduces memory usage; it requires little hyperparameter tuning and is built on symmetric decision trees, which reduces the likelihood of overfitting and makes the model more generalisable. Moreover, the LGBM model has an extremely fast training speed and is very suitable for classification problems on high-dimensional datasets. Among all variables, the importance score of local therapy far exceeded that of the others, which suggests that sequential local therapy after the IAT combination scheme plays an important role in the survival benefit [28,29,30].

As the ML process lacks interpretability, we explained the relationship between these variables and death using the SHAP method. Tumour diameter and BCLC stage were two important factors in the ranking of feature importance, which mainly reflect tumour characteristics defined by the guidelines. In clinical practice, tumour size and clinical stage also determine the formulation of treatment schemes. Indicators such as albumin, ALT, and CRP can be understood as closely related to the status of liver function, the coagulation mechanism, and tumour microenvironment (TME) components, including the extracellular matrix (ECM), proteoglycans, immune cells, and hypoxia. A plethora of studies have clarified that the TME plays an important role in the initiation and progression of carcinoma during the invasion-metastasis cascade [31, 32]. However, these specific mechanisms need to be further verified through clinical experiments.

Because previous studies focused on only a single treatment modality, they could not solve the key problem of how to provide optimal decision support. Liu et al. built two models based on semantic information and ultrasound image information to predict early recurrence for HCC patients receiving radiofrequency ablation (RFA) or surgery and recommend suitable candidates [33]. Ding et al. built a hybrid model that accurately predicted the early recurrence probability of microwave ablation (MWA) and surgery and provided reliable evidence to make optimal treatment decisions for patients with single 3- to 5-cm HCC [34]. These decision support models were based on each individual and were cross-stratified twice regarding different treatments. Our MLDSM was developed by another method (i.e., a decision tree based on the high- or low-risk group from the MLDSM). In this study, clinical stages, tumour burden, and liver function were shown to affect treatment selections in HCC patients who underwent IATs. Our findings were consistent with BCLC guidelines and those reported in previous studies [22, 25]: tumour diameter and number were critical factors affecting IAT selection. In our series, we observed that patients treated with IAT combined with sequential local therapy, including ablation, surgery, and stereotactic body radiotherapy (SBRT), were more likely to receive IAT combination therapy due to the significant ORR. The inaccessibility of IAT conversion therapy among these patients might ultimately lead to their worse survival outcomes.

Limitations

Our study had several limitations. First, this study enrolled patients from 13 hospitals in China, and each hospital had different treatment habits and operation techniques. Moreover, there were also some differences in the selection of TACE drugs, which may affect the final outcomes. Second, most patients enrolled in this study had large HCC, with HBV infection as the predominant aetiology of HCC in China. It remains unclear whether the results could be widely applied in Western countries, where the majority of patients have a low tumour burden or alcoholic liver cirrhosis as the predominant aetiology. Third, information regarding complications during and after IAT, TKI, and ICI treatment was not analysed, warranting further investigation. Given these limitations, the application of our proposed MLDSM as a decision-support tool in prospective clinical trials needs further validation.

Conclusions

In conclusion, the MLDSM can recommend accurate and reasonable schemes for patients with unresectable HCC during the IAT and follow-up process, and it is accessible online, providing both an assessment and an interpretation of death risk. The MLDSM could be easily implemented by physicians, thereby serving as a more favourable tool to strengthen individualised IAT schemes. The application of our proposed MLDSM as a decision-support tool in prospective clinical trials needs further validation.