Fig. 4

Age-stratified feature importance, differential expression, learning curves, and permutation tests for the Therapeutic gene set. (A, B) Top 10 Random Forest (RF) features ranked by impurity-based importance in the Young (< 70) and Old (≥ 70) age strata. Gene symbols correspond to the RF-selected probes with the highest predictive contribution. (C, D) Volcano plots showing age-associated differential expression (log2 fold-change Old/Young vs. -log10(p)) across the transcriptome within each age group. The top 10 RF-selected genes are highlighted. (E, F) Learning curves for RF models trained on the top three features (RF Top-3) in each age stratum. Curves show mean balanced accuracy across outer folds (orange, ± SD) and training performance (blue) as a function of training set size, based on 10 repeats of fivefold nested stratified cross-validation. (G, H) Permutation test results for the RF Top-3 models in each stratum. Histograms show the null distribution of balanced accuracies obtained from 100 label-shuffled permutations, with the observed score marked by a red dashed line. P-values were calculated as the fraction of permuted balanced accuracies equal to or exceeding the observed value. The resulting P-values support the statistical significance of the observed scores.
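
The P-value definition used in panels G and H can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`balanced_accuracy`, `permutation_p_value`, `permutation_test`) and the `score_fn` callable are hypothetical, and `score_fn` stands in for whatever cross-validated pipeline returns a balanced accuracy for a given label vector. The toy usage at the end scores fixed predictions against shuffled labels, which is a simplification of refitting the model on each permutation.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def permutation_p_value(observed, null_scores):
    """Fraction of null scores equal to or exceeding the observed score."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return exceed / len(null_scores)


def permutation_test(y_true, score_fn, n_permutations=100, seed=0):
    """Build a null distribution by shuffling labels and rescoring.

    score_fn is a hypothetical callable: labels -> balanced accuracy.
    In a real analysis it would refit and cross-validate the model.
    """
    rng = random.Random(seed)
    observed = score_fn(list(y_true))
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(y_true)
        rng.shuffle(shuffled)  # breaks any label-feature association
        null_scores.append(score_fn(shuffled))
    return observed, null_scores, permutation_p_value(observed, null_scores)


if __name__ == "__main__":
    # Toy data (assumption): fixed predictions instead of a refitted model.
    y_true = [0] * 20 + [1] * 20
    y_pred = [0] * 17 + [1] * 3 + [1] * 16 + [0] * 4
    observed, null_scores, p = permutation_test(
        y_true, lambda labels: balanced_accuracy(labels, y_pred)
    )
    print(f"observed={observed:.3f}, p={p:.2f}")
```

With 100 permutations this fraction can only take values in steps of 0.01, and it equals 0 when no permuted score reaches the observed one. Some implementations, such as scikit-learn's `permutation_test_score`, instead report (C + 1) / (n_permutations + 1), where C is the number of permuted scores at or above the observed score, which avoids a P-value of exactly zero.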