Fig. 4: Drivers of the PDT-DFS.

a–d Importance of drivers in controlling the PDT-DFS. DFS ≤ DFS10th (n = 125,488), DFS ≤ DFS20th (n = 125,918), DFS ≤ DFS30th (n = 126,428), and DFS ≤ DFS40th (n = 126,957) in a–d denote DFS at or below the 10th, 20th, 30th, and 40th percentiles, respectively. Numbers in parentheses are the sample sizes for each group. The box plots in a–d represent the range of each variable across 100 cross-validation runs. The lower dashed line indicates the 25th percentile, and the upper dashed line indicates the 75th percentile. The black line within each box denotes the median, and the dots on the boxes show the mean absolute SHAP value from each random forest iteration. The x-axes in a–d are uniformly truncated to the range [0.23, 0.52] to better display the data distribution. The labels on the y-axes in a–d denote the drivers (Supplementary Table S3). R² and RMSE in a–d denote the coefficient of determination and the root mean square error, respectively. e–h Partial dependence plots of the top 3 drivers. The shaded areas of the partial dependence plots represent the mean ± SD of SHAP values for DFS ≤ DFS10th to DFS ≤ DFS40th. Source data are provided as a Source Data file.