Fig. 3: Patient stratification and model interpretability.

a Global feature permutation importance from the Clinical Transformer applied to the Chowell et al. (ref. 3) dataset. N = 10 permutation test data splits. b KM curves (on 80% train splits) for population groups defined by four quantile cutoffs of the Clinical Transformer survival scores (Q1: n = 241, Q2: n = 304, Q3: n = 330, Q4: n = 303). Dotted lines: median survival time and probability from the 10 models on the train splits. c Raw feature value enrichment in the four populations used to stratify the patients (from the 10 models on 20% test splits; n = 296). Numerical features: y-axis in units of each variable; binary features: y-axis is the proportion of patients. Bonferroni-corrected two-sided t-test P values: ns, 5.00e-02 < P ≤ 1.00e+00; *, 1.00e-02 < P ≤ 5.00e-02; **, 1.00e-03 < P ≤ 1.00e-02; ***, 1.00e-04 < P ≤ 1.00e-03; ****, P ≤ 1.00e-04. d Feature interaction graph derived from cosine interaction scores. Each color depicts one of the four functional groups identified by clustering the pairwise feature cosine similarities in the test sets. Black lines: intracluster interactions; light blue lines: intercluster interactions; line thickness: cosine similarity magnitude. e CoxPH HR (error bars: 95% CI) stratifying by each of the top 10 functional groups in the Samstein et al. (ref. 33) dataset. For visualization, at most 10 genes are displayed per functional group. f KM curves of patients with at least one mutation in group C8 (Mut) vs. no mutation (Wt) in the discovery dataset (Samstein et al.) and two independent validation datasets (Miao et al. (ref. 73) and TCGA). Boxplots in (a, c): centerline, median; box limits, quartiles 1 and 3; whiskers, 1.5× interquartile range; diamonds, outliers. Shaded regions or error bars in (b, e): 95% confidence interval. HR P values in (f) are from the Wald test. KM, Kaplan-Meier; CoxPH, Cox proportional hazards; HR, hazard ratio; CI, confidence interval; Chemo, chemotherapy; MSS, microsatellite stable. Source data are provided in the Source Data file.
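
The significance annotation used in panel c can be written out as a short Python sketch. It is illustrative only and not the authors' code: the function names `bonferroni` and `significance_label` are hypothetical, and the number of comparisons passed to `bonferroni` is an assumption, since the caption does not state how many tests were corrected for.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-correct a raw P value (n_tests is an assumed input)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    # Multiply by the number of comparisons and cap at 1.
    return min(1.0, p_value * n_tests)


def significance_label(p_corrected: float) -> str:
    """Map a corrected P value to the star notation listed in the caption."""
    if not 0.0 <= p_corrected <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if p_corrected <= 1e-4:
        return "****"
    if p_corrected <= 1e-3:
        return "***"
    if p_corrected <= 1e-2:
        return "**"
    if p_corrected <= 5e-2:
        return "*"
    return "ns"
```

For example, a raw P value of 0.02 corrected for four hypothetical comparisons becomes 0.08 and would be labeled ns, whereas the same raw value without correction would be labeled *.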