Extended Data Fig. 3: TIMES demonstrates superior performance in predicting HCC recurrence compared to existing prognostic markers.
From: Spatial immune scoring system predicts hepatocellular carcinoma recurrence

a, Comparison of the prediction accuracies of ZFP36L2, ZFP36, VIM and HLA-DRB1, alone and in combination with SPON2. Among extreme gradient boosting models built on mIHC data labeled by each of the five identified biomarkers (SPON2, ZFP36L2, ZFP36, VIM and HLA-DRB1), the model built on SPON2-labeled data achieved the highest prediction accuracy (89.77%). SPON2’s accuracy was therefore used as a baseline and subtracted from the prediction accuracies of the other four biomarkers and of their combinations with SPON2, to highlight decreases or increases relative to SPON2 alone. The X-axis lists the gene names (“+” indicates a gene combination), and the Y-axis shows the resulting differences in prediction accuracy. b, Comparison of TIMES scores assigned to patients at different TNM stages. The bar chart compares TIMES scores of patients at early (I and II; n = 124 patients) and late (III and IV; n = 31 patients) stages, as defined by the Tumor Node Metastasis (TNM) system. Colored dots represent the scores of individual patients, and data are means ± 2 s.e.m. Unpaired two-tailed Student’s t-test, P = 0.04. c, Comparison of TIMES scores assigned to non-REC and REC patients. Left to right, panels show the ‘unstratified’ group (non-REC n = 91 patients, REC n = 72 patients), the TNM-stratified subgroups ‘TNM < II’ (non-REC n = 22 patients, REC n = 16 patients) and ‘TNM ≥ II’ (non-REC n = 46 patients, REC n = 43 patients), and the BCLC-stratified subgroups ‘BCLC = A’ (non-REC n = 17 patients, REC n = 9 patients) and ‘BCLC = B or C’ (non-REC n = 53 patients, REC n = 51 patients). Colored dots represent the scores of individual patients, and data are means ± 2 s.e.m. Unpaired two-tailed Student’s t-test, P = 7.3 × 10⁻¹⁶ for the ‘unstratified’ group, P = 7.1 × 10⁻⁶ for the ‘TNM < II’ subgroup, P = 7.4 × 10⁻¹⁴ for the ‘TNM ≥ II’ subgroup, P = 0.036 for the ‘BCLC = A’ subgroup and P < 2.2 × 10⁻¹⁶ for the ‘BCLC = B or C’ subgroup. d, Left: DFS curves stratified by TNM stage, with patients categorized into stage I (n = 53 patients), II (n = 71 patients), III (n = 16 patients) and IV (n = 15 patients) subgroups. Right: DFS curves stratified by BCLC stage, with patients categorized into A (n = 26 patients), B (n = 19 patients) and C (n = 85 patients) subgroups. Shaded areas correspond to 95% confidence intervals and central lines indicate medians. P values for comparisons between DFS curves were determined by log-rank tests, and only significant comparisons are marked (*P < 0.05, **P < 0.01): P = 0.002 for ‘TNMI’ versus ‘TNMIII’, P = 0.04 for ‘TNMI’ versus ‘TNMIV’ and P = 0.05 for ‘TNMII’ versus ‘TNMIII’; P = 0.03 for ‘BCLCA’ versus ‘BCLCB’ and P = 0.005 for ‘BCLCA’ versus ‘BCLCC’. e, Univariate Cox regression results for DFS. First column: TIMES and 17 clinical factors whose natural-log hazard ratios, ln(HR), differ significantly from 0 (P < 0.05 in these univariate regressions). Second column: ln(HR) values. Third column: 95% confidence intervals, shown as horizontal bars (mean ln(HR) ± 2 s.e.m.). Last column: P values from the Wald test, indicating the statistical significance of the corresponding ln(HR). f, DFS multivariate Cox regression on TIMES and the 17 clinical factors identified as significant (P < 0.05, Wald test) in the univariate Cox regressions.
Tumor differentiation grade: no higher than Grade 2, between Grade 2 and Grade 3 (G2-G3), and no lower than Grade 3 (G3). Second column: ln(HR) values. Third column: 95% confidence intervals, shown as horizontal bars (mean ln(HR) ± 2 s.e.m.). Last column: P values from the Wald test; *P < 0.05, ***P < 0.001 and ****P < 0.0001. n = 254 patients. g, TIMES outperforms natural killer (NK) cell subsets, including CD3−CD57+ mature NK, CD3−CD16+CD56+ cytotoxic NK and CD3−CD56+ ordinary NK cells, in predicting HCC recurrence. On a separate mIHC staining dataset, the receiver operating characteristic (ROC) curve and area under the curve (AUC) of the TIMES model (AUC = 0.996) were compared with those of models built on the abundances of the three NK cell subsets in the IF or TC compartments. The TIMES scoring system, which incorporates spatial information from multiple biomarkers, outperformed these prediction models based solely on NK cell subsets. h, Comparison of TIMES and SPON2 expression at TC in predicting HCC recurrence. The ROC curve and AUC of the TIMES model (AUC = 0.82) were compared with those of the model built on SPON2 expression at TC (AUC = 0.59). i, Distributions of TIMES scores among anti-PD1 immunotherapy recipients. By response to immunotherapy, patients were categorized into progressive disease (PD, blue; n = 12 patients) and partial response (PR, red; n = 13 patients) subgroups according to RECIST (Response Evaluation Criteria in Solid Tumors). The X-axis represents TIMES scores, and the Y-axis the counts of the corresponding scores. Differences in score distributions were assessed with an unpaired two-tailed Wilcoxon rank-sum test; the median difference (MD) between the scores of PR and PD patients was 0.267 (P = 1.61 × 10⁻³).
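The rank-based comparison in panel i and the AUC values in panels g and h rest on the same quantity: the Mann–Whitney U statistic, which, divided by the product of the two group sizes, equals the AUC of a single continuous score (such as a model's predicted probability) separating two groups. The following is a minimal Python sketch, not the authors' pipeline. The function names and toy scores are hypothetical. P is computed with the tie-corrected normal approximation, so it may differ from an exact test on small samples. Because the legend does not define MD, both the difference of group medians and the Hodges–Lehmann shift estimate are shown.

```python
import math
from collections import Counter
from statistics import NormalDist, median


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; tied pairs count as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the tie-corrected normal
    approximation (no continuity correction). Returns (U, AUC, P)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = mann_whitney_u(x, y)
    # AUC = probability that a random x score exceeds a random y score
    auc = u / (n1 * n2)
    # Tie correction: t is the size of each group of identical values
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a shift
        return u, auc, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, auc, p


def median_difference(x, y):
    """Two common readings of a 'median difference' between groups."""
    diff_of_medians = median(x) - median(y)
    # Hodges-Lehmann shift: median of all pairwise differences x_i - y_j
    hodges_lehmann = median([a - b for a in x for b in y])
    return diff_of_medians, hodges_lehmann


# Hypothetical toy scores, not data from the figure
pr_scores = [0.90, 0.80, 0.75, 0.70]
pd_scores = [0.30, 0.40, 0.50, 0.60]
u, auc, p = rank_sum_test(pr_scores, pd_scores)
md, hl = median_difference(pr_scores, pd_scores)
print(f"U = {u}, AUC = {auc:.2f}, P = {p:.4f}, MD = {md:.3f}, HL = {hl:.3f}")
```

Counting pairs directly keeps the U/AUC link explicit; for large cohorts a rank-based computation would be faster.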
Statistical significance: *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.
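The star notation above, together with the Wald P values and ln(HR) intervals in panels e and f, can be derived from a Cox coefficient ln(HR) and its standard error: the Wald statistic is z = ln(HR)/s.e., and the two-sided P value comes from the standard normal distribution. The following is a minimal Python sketch, assuming ln(HR) and its s.e. are already available from a fitted Cox model (the fitting step is not shown). The function names and toy values are hypothetical. Note that ± 2 s.e. approximates the conventional 95% Wald interval, which uses 1.96.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and s.d. 1


def wald_test(log_hr, se):
    """Wald z statistic and two-sided P value for H0: ln(HR) = 0."""
    z = log_hr / se
    p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return z, p


def log_hr_interval(log_hr, se, k=2.0):
    """ln(HR) +/- k standard errors. k = 2 mirrors the '± 2 s.e.m.' wording;
    k = 1.96 gives the conventional 95% Wald interval."""
    return log_hr - k * se, log_hr + k * se


def significance_stars(p):
    """Map a P value to the star notation in the figure key ('' if P >= 0.05)."""
    for threshold, stars in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (0.05, "*")):
        if p < threshold:
            return stars
    return ""


# Hypothetical coefficient and standard error, not values from the figure
log_hr, se = 0.8, 0.25
z, p = wald_test(log_hr, se)
low, high = log_hr_interval(log_hr, se)
print(f"z = {z:.2f}, P = {p:.2e} {significance_stars(p)}")
print(f"HR = {math.exp(log_hr):.2f}, interval on HR scale: {math.exp(low):.2f} to {math.exp(high):.2f}")
```

Exponentiating the interval endpoints converts it from the ln(HR) scale plotted in panels e and f to the hazard-ratio scale.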