Fig. 1: Prediction of ICI response using XGBoost and Boruta feature selection per cell type.

a Schematic workflow of the study. (1) After preprocessing and quality control, the input data undergo cell-type annotation to produce a clean, well-annotated dataset. Each cell is then labeled with its sample’s response status, and a classifier is trained at the single-cell level to distinguish responder cells from non-responder cells. The proportion of cells classified as responder within a sample constitutes the sample score, which serves as the predicted likelihood of response. This base model is then used along two main axes: (2 top, 3) interpretation of gene importance and behavior, using feature selection, feature importance analysis, and SHAP values; and (2 bottom, 4) analysis of cell importance, through cell-type-specific prediction and a reinforcement learning framework that quantifies cell predictivity. (5) Finally, results from both axes are validated on independent datasets. b ROC curve for the base model predicting response to ICI using all cells and genes in the cohort. c Bar plot of XGBoost feature importance for the 25 most important genes. d AUC scores showing the predictive performance of the base model for each immune cell type. e Box plots comparing the scores produced by the base XGBoost model between responders (R) and non-responders (NR) for the most accurately predicted cell subtypes; significance was assessed with the Mann–Whitney U test. f ROC curve for the Boruta-selected model predicting response to ICI using all cells in the cohort. g AUC scores showing the predictive performance of the Boruta-selected model for each immune cell type. h Bar plot of the number of leave-one-out (LOO) folds in which each gene was selected by Boruta, showing the most robustly selected genes. i Heatmap of the top genes selected by Boruta for each immune cell type, showing the number of folds in which each gene was selected for that cell type.
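The sample score described in panel a can be illustrated with a minimal sketch. It assumes each cell already carries a binary prediction from the single-cell classifier (1 for responder, 0 for non-responder); the function name `sample_scores`, the sample identifiers, and the toy predictions are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def sample_scores(sample_ids, cell_predictions):
    """Return, for each sample, the fraction of its cells predicted as responder (1)."""
    totals = defaultdict(int)      # number of cells seen per sample
    responders = defaultdict(int)  # number of cells predicted as responder per sample
    for sid, pred in zip(sample_ids, cell_predictions):
        totals[sid] += 1
        responders[sid] += int(pred == 1)
    return {sid: responders[sid] / totals[sid] for sid in totals}


# Made-up per-cell predictions for two hypothetical samples (illustration only).
sample_ids = ["P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2"]
cell_predictions = [1, 1, 1, 0, 0, 0, 1, 0]

scores = sample_scores(sample_ids, cell_predictions)
print(scores)  # {'P1': 0.75, 'P2': 0.25}
```

In a leave-one-out setting, the predictions for a sample's cells would come from a model trained on all other samples, so that the score is not influenced by the sample's own label.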