Table 2. Model reduction and the partial likelihood ratio (LR) test revealed that a baseline model with the novel risk groups is statistically comparable to the full model in predicting cancer-specific survival.

From: Artificial intelligence unravels interpretable malignancy grades of prostate cancer on histology images

| Model | Nested partial likelihood ratio test, LR (p-value) | AIC | BIC |
|---|---|---|---|
| 1st external validation set (CPCBN) | | | |
| Baseline model + risk group + GG | Reference (full model) | 167.5161 | 169.6403 |
| Baseline model + risk group | 1.560 (0.213) | 167.07 | 168.4861 |
| Baseline model + GG | 4.114 (0.036) | 169.6098 | 171.0259 |
| Baseline model (pT) | 7.557 (0.027) | 171.0623 | 171.7703 |
| 2nd external validation set (PROCURE) | | | |
| Baseline model + risk group + GG | Reference (full model) | 170.6459 | 174.824 |
| Baseline model + risk group | 4.094 (1.2e−1) | 172.7398 | 175.8733 |
| Baseline model + GG | 10.056 (1.8e−3) | 178.7017 | 181.8353 |
| Baseline model (pT + pN) | 27.777 (6.8e−5) | 194.4228 | 196.5119 |
| 3rd external validation set (PLCO) [a] | | | |
| Baseline model + risk group + GS | Reference (full model) | 298.0581 | 301.8323 |
| Baseline model + risk group | 14.849 (1e−4) | 310.9075 | 313.4237 |
| Baseline model + GS | 5.15 (0.023) | 301.2158 | 303.732 |
| Baseline model (prostate pathologic stage) | 26.769 (1.54e−06) | 320.8274 | 322.0855 |

  1. In contrast, a baseline model with GG (Gleason score/ISUP grade groups) was not comparable to the full model. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) support this finding, since a baseline model with the novel risk groups fits better than a baseline model with GG. pT: pathologic tumor stage; pN: pathologic nodal stage. pN was excluded in the CPCBN external validation set because it was not a significant predictor of cancer-specific survival. For the PLCO external validation set, we used the GS provided by the study instead of GG, and the prostate pathologic stage (which considers T, N, and M stages), due to the study history. The best-performing models are highlighted in bold. Higher AIC and BIC values indicate worse model fit.
  2. [a] Since both the GS model and the risk group model were significantly inferior to the full model, we applied the non-nested partial likelihood ratio test to compare the GS Cox model with the risk group Cox model; our risk group demonstrated non-inferiority to GS, indicating comparable goodness of fit (z = 1.091, p = 0.138). No significant multicollinearity between variables was identified (VIF < 2).
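
The relationship between the LR statistic, its p-value, and AIC in the table can be sketched in a few lines of Python. This is only an illustrative check, not the authors' analysis code. It assumes the p-value comes from a chi-square reference distribution and that AIC = −2·logL + 2k, neither of which the table states explicitly. The helper names `chi2_sf` and `implied_df` are hypothetical. The degrees of freedom are not reported, so the sketch infers them from the AIC difference for the PLCO baseline row.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def implied_df(lr: float, aic_reduced: float, aic_full: float) -> float:
    """Parameter difference implied by AIC_reduced - AIC_full = LR - 2*df.

    Assumes AIC = -2*logL + 2k for both models (an assumption, see lead-in).
    """
    return (lr - (aic_reduced - aic_full)) / 2.0


# PLCO rows from the table: full model versus baseline model.
lr_plco = 26.769
aic_full_plco = 298.0581
aic_base_plco = 320.8274

df_plco = round(implied_df(lr_plco, aic_base_plco, aic_full_plco))
p_plco = chi2_sf(lr_plco, df_plco)
print(f"implied df = {df_plco}, p = {p_plco:.3g}")
```

Under these assumptions, the PLCO baseline row implies two degrees of freedom, and the resulting p-value agrees with the 1.54e−06 reported in the table. The same check can be run on the other rows, keeping in mind that the source does not state how each p-value was computed.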