Heterogeneity-optimized method for predicting immune checkpoint blockade response

Liang, Juan; Guo, Qihang; Xiang, Shan; Wang, Yun; Li, Juntao

doi:10.1038/s41598-025-17718-3

Download PDF

Article
Open access
Published: 01 September 2025

Heterogeneity-optimized method for predicting immune checkpoint blockade response

Juan Liang¹,
Qihang Guo²,
Shan Xiang²,
Yun Wang² &
…
Juntao Li²

Scientific Reports volume 15, Article number: 32259 (2025) Cite this article

1235 Accesses
Metrics details

Subjects

Abstract

Interpatient tumor heterogeneity manifests as multimodal distributions across genomic, transcriptomic, and microenvironmental profiles. This fundamentally violates the unimodal assumption of conventional machine learning models, impairing immune checkpoint blockade (ICB) response prediction. To resolve this limitation, we propose a heterogeneity-optimized framework that applies K-means clustering to stratify patients into biologically distinct hot-tumor and cold-tumor subgroups, demonstrating superiority over hierarchical/DBSCAN clustering. Subsequently, heterogeneity-optimized predictive models–a support vector machine for hot-tumor subtypes and a random forest for cold-tumor subtypes–were developed utilizing seven heterogeneity-associated biomarkers to circumvent unimodal constraints. The proposed model significantly enhances ICB response prediction in melanoma, NSCLC, other cancer types, and pan-cancer datasets, achieving a mean accuracy gain of at least 1.24% compared to 11 baseline methods. This performance improvement was consistently validated in an independent external cohort. Furthermore, the approach enables biologically interpretable precision immunotherapy by explicitly modeling multimodal heterogeneity.

Improved prediction of immune checkpoint blockade efficacy across multiple cancer types

Article 01 November 2021

Longitudinal liquid biopsy identifies an early predictive biomarker of immune checkpoint blockade response in head and neck squamous cell carcinoma

Article Open access 01 September 2025

Evolution of delayed resistance to immunotherapy in a melanoma responder

Article 03 May 2021

Introduction

Cancer immunotherapy harnesses the patient’s own immune system to recognize and eliminate cancer cells¹. Immune checkpoint blockade(ICB) is a cancer immunotherapy that activates the immune system and enhances anti-tumor immune response by blocking immune checkpoint molecules, becoming one of the most effective types of cancer immunotherapy². Inhibitors targeting checkpoint molecules such as cytotoxic T-lymphocyte-associated protein 4 (CTLA-4), programmed cell death 1 receptor (PD-1), and programmed cell death ligand 1 (PD-L1) have proven highly effective in improving survival rates for patients with advanced cancers, including melanoma and non-small cell lung cancer (NSCLC)³. Despite these successes, a substantial proportion of patients exhibit limited or no clinical benefit due to marked interpatient heterogeneity in treatment responses^4,5,6. Consequently, the accurate prediction of patient responsiveness to ICB therapy has emerged as a critical challenge in contemporary biomedical research, driving significant efforts in the fields of cancer immunology, bioinformatics, and artificial intelligence.

From a biological perspective, ICB response is influenced by a combination of features^{6,7,8,9,10,11}. Changes in blood neutrophil-to-lymphocyte ratio and eosinophile levels have been shown to correlate with the response to immune checkpoint blockade therapy^6,12. Additionally, patients with malignant tumors of different ages exhibit varying levels of drug tolerance, which in turn affects their ICB response⁷. Microsatellite instability (MSI) states are also associated with higher ICB response rates and serve as both prognostic and predictive markers⁸. The combination of MSI status, age, and drug type collectively impacts treatment outcomes⁹. Furthermore, previous studies have indicated that PD-L1 expression is linked to microsatellite instability, C-reactive protein levels, and the blood neutrophil-to-lymphocyte ratio, all of which can influence ICB response in colorectal cancer patients^13,14. Therefore, leveraging these biological features to predict ICB response in cancer patients remains a significant challenge.

Statistical machine learning methods have been successfully applied to predict ICB response in cancer patients^15,16,17,18. For example, Anagnostou et al. proposed a comprehensive multivariate model that integrates features such as corrected tumor mutation burden (TMB), activated receptor tyrosine kinases in the tumor, smoking-related mutational signatures, and human leukocyte antigen status, to predict ICB response in cancer patients¹⁵. Wang et al. constructed a regulatory network of 11 immune cell clusters in metastatic melanoma by integrating biological pathway data and single-cell sequence data, using a logistic regression model of ligands and receptors within the network to predict ICB response¹⁶. Andrei S. Robin et al. developed a novel computational pipeline based on comparative Bayesian network analyses of immune networks, which detected strong signals that conventional methods might overlook¹⁷. Chowell et al. created a random forest classifier (RF16), which integrates 16 input features related to immunotherapy efficacy, improving the prediction of ICB response across various cancer types¹⁸.

Although previous studies^15,16,17,18 have made significant strides in applying statistical and machine learning methods for predicting response to immune checkpoint blockade, they failed to account for the inherent heterogeneity characteristic of cancer data. In fact, patients with histologically identical cancer types frequently exhibit pronounced heterogeneity in tumor molecular profiles, encompassing genomic alterations, transcriptomic signatures, and immune microenvironment features. This interpatient heterogeneity substantially complicates treatment outcomes and poses significant challenges for predicting responses to immune checkpoint blockade. Tumor heterogeneity manifests its most prominent signature through multimodal distributions in cancer data–a characteristic that fundamentally violates the single-distribution assumption underpinning conventional machine learning frameworks. This paper proposed a novel heterogeneity-optimized machine learning framework designed to directly address the above challenge. The key advantages and contributions of our proposed framework are:

Heterogeneity-aware Clustering: Two latent patient subgroups were robustly identified through the application of K-means clustering to multimodal tumor data. These subgroups exhibited distinct tumor microenvironment (TME) profiles, corresponding to the established hot-tumor and cold-tumor phenotypes based on their significantly differential features. Furthermore, comparative analyses demonstrated the statistically significant superiority of the two-cluster K-means configuration over alternative K-means variants (K>2), hierarchical clustering (HC), and density-based spatial clustering of applications with noise (DBSCAN).
Heterogeneity-optimized Predictive Modeling:Seven key clinical and molecular features exhibiting significant associations with tumor heterogeneity were systematically identified. Based on these biomarkers, a support vector machine (SVM) model was specifically developed for the inflammatory hot-tumor subtype, while a random forest (RF) classifier was concurrently constructed for the immune-desert cold-tumor subtype. This dual-model framework fundamentally circumvents the limitations imposed by unimodal data distribution assumptions pervasive in conventional methodologies.
Enhanced ICB Response Prediction:Substantially improved immune checkpoint blockade (ICB) response prediction was demonstrated across four distinct cancer cohorts: melanoma, non-small cell lung cancer (NSCLC), other cancer, and pan-cancer datasets. The proposed framework exhibited statistically significant superiority over eleven established baseline methods–including random forest (RF), support vector machines (SVM), and logistic regression–achieving average improvements of 1.24% in accuracy. Furthermore, consistent performance enhancement was validated in independent external validation cohorts.

Materials and methods

Data description

This study leverages a pan-cancer cohort of 1,479 ICB-treated patients (Chowell et al.¹⁸) spanning 16 cancer types. All patients received PD-1/PD-L1 inhibitors, CTLA-4 blockers, or combination therapy, with tumor profiling conducted via FDA-approved MSK-IMPACT sequencing¹⁹. Using RECIST v1.1 criteria²⁰, we categorized patients as responders (complete/partial response; n=409) or non-responders (stable/progressive disease; n=1,070). For cancer-type-specific analysis, the cohort was stratified into melanoma, NSCLC, and other cancers. Each subtype cohort underwent stratified random partitioning,Consistent with Chowell et al: 80% for training and 20% for testing. Patient feature vectors $\textbf{x}_i \in \mathbb {R}^{19}$ derived from molecular profiles were paired with binary response labels $y_i \in \{0,1\}$ (1: responder, 0: non-responder). External validation was performed using an independent metastatic melanoma patient cohort.(Liu et al. Nat Med²¹). This dataset provided complementary genomic, transcriptomic, and clinical profiles for robust verification of pan-cancer findings.

Data processing

To ensure model input consistency and maximize biological interpretability, we implemented a standardized preprocessing pipeline across all cohorts using feature type-specific strategies. Dichotomous features (e.g., sex, prior chemotherapy status) were directly encoded as 0 or 1, while ordinal variables (e.g., disease stage, ECOG score) were assigned integer values preserving their inherent prognostic hierarchy. Nominal categorical variables (e.g., cancer type, drug class) underwent one-hot encoding with the first category omitted to avoid multicollinearity. Continuous features were processed sequentially: highly skewed variables (TMB, FCNA, MSI score) received a variance-stabilizing $\log _{10}(x + 1)$ transformation, and all continuous measures (including transformed variables, age, BMI, NLR) were standardized to zero mean and unit variance (z-scoring) within respective training cohorts to prevent data leakage. For the independent melanoma validation cohort, identical preprocessing was applied to overlapping features; non-overlapping features were systematically excluded without imputation to preclude bias from artificial data.

Heterogeneity test

Despite significant advances in statistical and machine learning approaches for predicting immune checkpoint blockade (ICB) response^15,16,17,18, existing models often adopt monolithic frameworks that assume uniform biological mechanisms across cancer types–thereby neglecting intrinsic inter-tumoral heterogeneity. This oversimplification represents a critical limitation, particularly in pan-cancer settings, where divergent tumor immunobiology may fundamentally constrain the generalizability of single predictive models. To address this gap, we hypothesized that cancer-type-specific and patient-level biological heterogeneity underlies differential ICB responsiveness, necessitating a stratification-aware modeling paradigm. To systematically evaluate this hypothesis, we first performed univariate statistical analyses across key clinical and molecular features, comparing responders (R) and non-responders (NR). Continuous variables were assessed using the Mann–Whitney U test, and categorical variables using Fisher’s exact test (Fig. 1). As expected, established biomarkers such as tumor mutational burden and neutrophil-to-lymphocyte ratio showed significant associations with response. However, a notable subset of patients with high TMB failed to respond to ICB, contradicting canonical expectations and suggesting the presence of confounding biological factors that modulate the TMB–response relationship. This observation prompted a deeper investigation into the distributional properties of these biomarkers. Using multimodal distribution analysis, we uncovered latent patient stratification patterns indicative of distinct immunological endotypes. Specifically, TMB exhibited bimodal distribution, with modes at 6.8 and 15.2 mutations per megabase. Similarly, body mass index (BMI) displayed dual peaks at 24.3 and $31.7\,\mathrm{kg/m}^{2}$–values aligning with normal weight and obesity thresholds, respectively. These non-Gaussian, multimodal distributions provide statistical evidence of population heterogeneity, implying that patients may segregate into biologically distinct subgroups that are obscured under conventional, aggregate modeling approaches.The presence of such latent structure suggests that a one-size-fits-all model is inherently limited in capturing the complexity of ICB response determinants.

Heterogeneity-aware clustering

This study proposes a heterogeneity-aware analytical framework to address the limitations of single predictive models in forecasting therapeutic response to immune checkpoint blockade. By integrating multimodal distribution analysis with unsupervised clustering, we identify biologically distinct patient subgroups and quantify the mechanistic underpinnings of their differential response patterns. To dissect latent patient stratification, we applied K-means clustering within a standardized feature space constructed from a pan-cancer cohort after comprehensive data preprocessing. K-means was selected due to its computational efficiency, numerical stability in high-dimensional spaces, and scalability for translational applications. The optimal number of clusters (K = 2) was determined using silhouette analysis across a predefined range, which evaluates the trade-off between intra-cluster cohesion and inter-cluster separation. The silhouette coefficient quantifies the similarity of each sample to its own cluster compared to others, with higher values indicating more distinct and well-separated clusters. We further validated this choice using the elbow method, which assesses the rate of decline in within-cluster sum of squares as K increases, identifying the inflection point beyond which additional clusters yield diminishing improvements in model fit. Together, these criteria maximize within-cluster homogeneity while minimizing inter-cluster overlap, ensuring robust patient stratification. The resulting two-cluster partition revealed immunologically distinct subgroups. Cluster 0 was significantly enriched for patients with high tumor mutational burden and elevated hemoglobin levels–features associated with T cell-inflamed “hot” tumors and efficient antigen presentation. In contrast, Cluster 1 exhibited hallmarks of systemic inflammation and immune suppression, characterized by an increased neutrophil-to-lymphocyte ratio and reduced HGB levels, consistent with myeloid-driven immune evasion and anemia of chronic disease^22,23.

Heterogeneity-optimized predictive modeling

To address the limitations of conventional modeling in predicting clinical response to cancer immunotherapy–particularly its diminished performance due to unaccounted inter-patient heterogeneity–this paper developed a heterogeneity-optimized, subgroup-specific modeling framework (Fig 2) that leverages prior molecular and immune phenotyping to stratify patients into biologically coherent subgroups, thereby enabling more precise and mechanistically informed prediction. Based on our earlier heterogeneity-aware clustering analysis, the cohort was partitioned into two distinct subtypes: Cluster 0, characterized by T cell infiltration and an inflamed (“hot”) tumor microenvironment, and Cluster 1, dominated by myeloid-derived suppressor cells and exhibiting features of an immunosuppressive, non-inflamed (“cold”) phenotype; this stratification captures fundamental differences in immune contexture that are known to influence therapeutic response. To exploit intra-subgroup homogeneity while mitigating the confounding effects of global heterogeneity, we constructed customized machine learning models tailored to the distinct biological and statistical properties of each subgroup, with differential feature selection, algorithm selection, and decision threshold optimization strategies.

For Cluster 0 (“hot” tumors), where responders and non-responders exhibit relatively separable and approximately linearly distributed feature patterns suggestive of a well-defined immune activation axis, we implemented a support vector machine (SVM) with radial basis function (RBF) kernel, optimized to handle the modest class imbalance inherent in response prediction. Prior to model training, we performed subgroup-specific feature selection using the Jensen-Shannon Divergence (JSD) to quantify distributional differences in biomarker expression between responders (R) and non-responders (NR), defined for two probability distributions $P$ (responders) and $Q$ (non-responders) as:

$$\begin{aligned} \text {JSD}(P \parallel Q) = \frac{1}{2} D_{\text {KL}}(P \parallel M) + \frac{1}{2} D_{\text {KL}}(Q \parallel M), \quad \text {where } M = \frac{1}{2}(P + Q), \end{aligned}$$

(1)

and $D_{\text {KL}}$ denotes the Kullback–Leibler divergence; biomarkers with $\text {JSD}> 0.1$ were retained, yielding a compact and biologically interpretable set of seven discriminative features. The SVM was trained to maximize the margin between classes in the transformed feature space induced by the RBF kernel $K(x_i, x_j) = \exp (-\gamma \Vert x_i - x_j\Vert ^2)$, under a class-weighted optimization objective that penalizes misclassification of the minority responder class more heavily:

$$\begin{aligned} \min _{w, b, \xi } \left( \frac{1}{2} \Vert w\Vert ^2 + C \sum _{i=1}^n \omega _{y_i} \xi _i \right) \quad \text {subject to} \quad y_i (w \cdot \phi (x_i) + b) \ge 1 - \xi _i, \quad \xi _i \ge 0, \end{aligned}$$

(2)

where $\omega _{y_i}$ denotes the class weight–specifically, $\omega _R = n_{\text {NR}} / (n_R + n_{\text {NR}})$ and $\omega _{\text {NR}} = n_R / (n_R + n_{\text {NR}})$–ensuring enhanced sensitivity to true responders; the final decision function takes the form $f(x) = \text {sign}(w \cdot \phi (x) + b)$, effectively capturing the tight regulatory balance between immune activation and tolerance characteristic of T cell-inflamed tumors.

In contrast, Cluster 1 (“cold” tumors) exhibits greater intratumoral heterogeneity, a highly skewed class distribution (low response rate), and complex, nonlinear response patterns likely driven by multifactorial immunosuppressive mechanisms, necessitating a more robust and flexible modeling approach; therefore, we adopted a Random Forest (RF) classifier to improve resilience to noise, outliers, and feature redundancy. The RF ensemble comprises $T$ decision trees $\{h_t\}_{t=1}^T$, each trained on a bootstrap sample of the Cluster 1 cohort, with recursive partitioning guided by Gini impurity minimization over a randomly selected subset of features at each node, thereby promoting diversity and reducing overfitting. The final prediction for any sample $x$ is determined by majority voting across all trees:

$$\begin{aligned} H(x) = \arg \max _{c \in \{R, \text {NR}\}} \sum _{t=1}^T I(h_t(x) = c), \end{aligned}$$

(3)

where $I(\cdot )$ is the indicator function; this ensemble strategy effectively averages out idiosyncratic noise arising from the suppressive tumor microenvironment and enhances generalization in the face of complex, nonlinear decision boundaries, making it particularly suitable for modeling response in myeloid-rich, immune-excluded tumor contexts.

For any new patient sample $x_{\text {new}}$, the prediction pipeline begins with subgroup assignment based on its molecular and immune profiling data, using Heterogeneity-Aware clustering to classify the sample into either Cluster 0 or Cluster 1, ensuring that subsequent inference is performed within a biologically homogeneous context. Once assigned, the appropriate subgroup-specific model is activated: if allocated to Cluster 0, the seven JSD-selected biomarker values are extracted and fed into the trained SVM to compute the signed decision score $\hat{y} = \text {sign}(w \cdot \phi (x_{\text {new}}) + b)$; if assigned to Cluster 1, the corresponding feature vector is passed to the RF model, where each tree $h_t$ generates a class prediction and the final output $\hat{y}$ is determined by the majority vote across the ensemble. This two-stage, biologically informed framework–stratify, then specialize–ensures that model architecture and feature space are aligned with underlying tumor immunobiology, thereby improving both predictive performance and mechanistic interpretability in the heterogeneous landscape of cancer immunotherapy response.

Algorithm

The steps for implementing our model on the pan-cancer dataset are outlined in Algorithm 1.

Results

Determination of optimal cluster number K

“To determine the optimal cluster number K, we employed three complementary validation methods: (1) The elbow method analysis of the sum of squared errors (SSE) curve (Fig. 3a) revealed two potential inflection points - a primary elbow at K=2 and a secondary inflection at K=3, suggesting possible hierarchical clustering structures; (2) Silhouette coefficient evaluation (Fig. 3b) demonstrated peak cluster cohesion/separation at K=2 (score=0.68), with consistently superior performance compared to K=3. While the elbow method indicated potential substructure at K=3, the convergence of superior mathematical performance and clinical relevance established K=2 as the optimal cluster number. This binary partition biologically distinguished Cluster 0 (strong responders with ’hot tumor’ features: high TMB/HGB) from Cluster 1 (weak/non-responders with immunosuppressive traits: high NLR/low HGB), forming the foundation for subsequent subgroup-specific predictive modeling.

Clustering comparative experiments

To ensure the scientific validity and interpretability of patient subgroup partitioning, we conducted systematic clustering comparisons in standardized feature space. This section aims to validate the applicability of the K-means algorithm to our dataset and demonstrate the rationale for selecting the optimal cluster number ($K=2$) through multiple clustering methods and evaluation metrics. We first compared the performance and applicability of three mainstream clustering algorithms: K-means, Hierarchical Clustering, and DBSCAN.hierarchical clustering²⁴ and DBSCAN²⁵ All algorithms were implemented using the same preprocessing pipeline, with results shown in the following table:

Table 1 Comparison of clustering algorithms.

Full size table

Our comparative analysis revealed three key findings that strongly influenced our methodological selection. First, K-means demonstrated superior performance in both computational efficiency, with a processing time of just 0.062 seconds, and result stability, achieving a silhouette score of 0.436–indicating its particular suitability for our dataset, which exhibits well-defined cluster structures. Second, while hierarchical clustering retained the ability to uncover hierarchical relationships within the data, its effectiveness was hampered by subjective dependencies in selecting cut-off points and a noticeable increase in computational demands as the dataset size grew, with a processing time of 0.078 seconds. Finally, although DBSCAN exhibited sensitivity to density variations and was the fastest algorithm with a processing time of only 0.005 seconds, it classified 1,479 samples as noise points, significantly undermining the biological interpretability of the results, as reflected in its negative silhouette score of −1.

Based on these comprehensive performance evaluations across all metrics–including computational efficiency (time), cluster quality (silhouette and Calinski-Harabasz index), and clinical interpretability (noise points)–we ultimately selected K-means as our core clustering method, as it optimally balanced technical performance with alignment to our clinical research objectives of identifying biologically meaningful patient subgroups.

Customized predictive modeling for identified subgroups

Following patient subgroup stratification, we constructed tailored predictive models to accommodate the distinct biological characteristics of each cluster. Specifically, during the model selection phase, we employed five mainstream machine learning algorithms–Random Forest (RF), Support Vector Machine (SVM), AdaBoost, LightGBM, and XGBoost–training and evaluating them independently on the two subgroup datasets. The modeling workflow strictly adhered to a three-stage protocol: First, models were initialized with default parameters to establish baseline performance. Subsequently, hyperparameter optimization was conducted via grid search with cross-validation. Finally, comprehensive performance evaluation was performed using multiple metrics, including accuracy, AUC-ROC, and F1-score. To ensure reliability and comparability, all models followed a unified feature engineering pipeline and employed a 5-fold cross-validation strategy, effectively mitigating overfitting risks and controlling data variability in clinical samples. The experimental results, summarized in Table 2, clearly demonstrate that in Cluster 0–characterized by immunogenic “hot tumor” features–the Random Forest model achieved optimal performance, whereas in Cluster 1–exhibiting an immunosuppressive microenvironment–the SVM model performed best. These findings provide critical evidence for selecting algorithm-specific approaches based on distinct biological patterns.

Table 2 Model performance metrics for different clusters and models.

Full size table

Accuracy, recalling and F1-score

We compare the performance of our proposed model against twelve classical machine learning models: RF16¹⁸ (a random forest classifier with 16 features), RF11¹⁸ (11 features), TMB, SVM¹⁷, Multinomial Logistic Regression (MLR)²⁶, Naive Bayes (NB)²⁷, GBDT, AdaBoost, LightGBM, ExtraTrees, BPNN, and DNN. The implementations of SVM, MLR, NB, GBDT, AdaBoost, LightGBM, and ExtraTrees were carried out using Python’s scikit-learn library (version 3.10)²⁸, while the models RF16, RF11, TMB, BPNN, and DNN were based on source code from Diego Chowell’s thesis.

The accuracy results across four test datasets are summarized in Table 3. Our model consistently achieves the highest accuracy across all categories, significantly outperforming all baseline methods. On the pan-cancer dataset, our model achieves an accuracy of 78.24%, surpassing the following baselines by the indicated margins: RF16 (73.22%) by 5.02%, RF11 (72.88%) by 5.36%, TMB (63.17%) by 15.07%, SVM (71.88%) by 6.36%, MLR (73.89%) by 4.35%, NB (73.00%) by 5.24%, and LightGBM (77.00%) by 1.24%, which is the next highest-performing baseline.

Comparable improvements are observed across other cancer types. On the melanoma dataset, our model (76.52%) outperforms RF16 (56.76%) by 19.76%, RF11 (64.86%) by 11.66%, and LightGBM (75.43%) by 1.09%. For NSCLC, our model achieves an accuracy of 80.48%, exceeding ExtraTrees (78.51%) by 1.97% and AdaBoost (76.64%) by 3.84%. Finally, on the “other cancers” dataset, our model (81.15%) outperforms ExtraTrees (74.57%) by 6.58% and LightGBM (73.94%) by 7.21%.

Table 3 Accuracy results of thirteen methods on four testing sets.

Full size table

Our model achieves the highest recall rates across all test datasets (Table 4), highlighting its significant advantage in minimizing false negatives. Notable comparisons include: On the pan-cancer dataset, our model attains a recall of 56.70%, outperforming the strongest baseline method (ExtraTrees at 37.91%) by 18.79%. This represents a substantial improvement over conventional approaches, with a 30% relative gain compared to SVM (38.87% improvement). For NSCLC, our model achieves a recall of 53.78%, which is slightly lower than AdaBoost (53.98%) by only 0.2%, but significantly higher than LightGBM (48.78%), exceeding it by 5.00%. Critical improvements are also observed in other cancer types. On the “other cancers” dataset, our model achieves a recall of 66.70%, surpassing DNN (54.71%) by 12.99%. On the melanoma dataset, our model obtains a recall of 49.74%, exceeding Naive Bayes (47.15%) by 2.59%.

Table 4 The recall rates of thirteen methods on four testing sets.

Full size table

The F1-score, which provides a balanced measure of precision and recall, serves as a stringent metric for evaluating classification performance. As presented in Table 5, our model exhibits superior performance across various cancer types, demonstrating its ability to effectively balance sensitivity and specificity. On the pan-cancer dataset, our model achieves an F1-score of 66.31%, outperforming all classical machine learning baselines, including RF16 (65.09%) by 1.22% and NB (65.83%) by 0.48%. Particularly noteworthy is the 38.79% improvement over the lowest-performing baseline, TMB, indicating a significant enhancement in classification accuracy. In the melanoma dataset, our model attains the highest F1-score of 61.26%, surpassing LightGBM (60.15%) by 1.11% and NB (58.26%) by 3.00%. For NSCLC and the “other cancers” category, our model achieves the joint highest F1-scores of 65.31% and 62.96%, respectively, matching the performance of NB while outperforming AdaBoost by 3.53% in NSCLC and LightGBM by 4.01% in other cancers. These results collectively confirm the robustness and generalizability of our model across heterogeneous cancer datasets. The substantial 18.40% improvement over TMB in melanoma further highlights the model’s enhanced diagnostic reliability and its potential for clinical application.

Table 5 The F1-scores of thirteen methods on the four testing sets.

Full size table

Confusion matrix, ROC-AUC and brier score

To further evaluate the predictive performance of our model, a confusion matrix is used to demonstrate the relationship between the model’s predicted results on different categories and the actual labels. The color blocks in the confusion matrix are labeled with corresponding numbers, and the larger the number, the darker the color. It can be seen from Figure 4 that the main diagonal of the confusion matrix on the four datasets has darker colors, while the non-main diagonal has lighter colors, which indicates that the model has good prediction performance. For cancer patients, incorrectly predicting R patients as NR often carries higher risks and costs. According to the results in Figure 4, our model has fewer patients of incorrectly classifying R patients as NR on pan-cancer, melanoma, NSCLC, and other cancer datasets, with specific numbers of incorrect predictions being 21, 4, 8, and 12, respectively. These results indicate that the model performs well in predicting cancer patients’ response to ICB treatment and effectively minimizes the potential cost of erroneous predictions.

Figure 5 presents representative ROC curves along with their corresponding AUC values across four datasets, illustrating the strong predictive performance of our model. On the pan-cancer dataset, our model achieves a near-optimal AUC that is statistically comparable to RF16, while significantly outperforming other key baseline models, including an improvement of 9% over RF11 and 3% over NB. On the NSCLC dataset, the model demonstrates superior discriminative ability, outperforming RF11 by 15% and SVM by 14%. Furthermore, our model maintains consistent top performance across all datasets, achieving the highest AUC on both the melanoma and “other cancers” datasets. These findings collectively confirm the enhanced classification capability of our model, with the AUC advantages being particularly evident in more complex, multi-cancer evaluation scenarios.

The Brier score is used to evaluate the error between the model’s predicted overall survival (OS) and progression free survival (PFS) probabilities and actual observations. Table 6 shows the Brier scores of our model with the other twelve methods on the four datasets. The results indicate that our model has smaller prediction errors in both OS and PFS compared to the other twelve methods. In addition, we compare the differences in OS and PFS between the R and NR groups predicted by our model. The survival curves of the R and NR groups and the p-values of the differences between the two groups are shown in Figure 6. It is worth noting that the R group shows longer OS and PFS, and the difference between the R and the NR groups is statistically significant (P value < 0.0001). Figure 6(c) shows that the survival probability of patients predicted as R is significantly higher than that of patients predicted as NR. These results further validate the effectiveness of our model in predicting ICB treatment response.

Table 6 The Brier-scores of thirteen methods on four cancer datasets.

Full size table

Extra validation

We further validated our model’s generalizability on an additional melanoma cohort comprising 121 samples (n=121), employing key performance metrics–accuracy, precision, recall, F1-score, and AUC–as detailed in Table 7 and Figure 7, which demonstrated an accuracy of 84.40% (exceeding RF16 by 7.96%), a recall of 50.35% (surpassing NB by 3.10%), a precision of 76.88% (outperforming LightGBM by 1.24%), and an AUC of 0.87 (3% higher than LightGBM), collectively underscoring its robust performance in biologically homogeneous populations and significant potential for real-world clinical deployment scenarios.

Table 7 Performance comparison of different models on melanoma dataset.

Full size table

Discussion

The persistent challenge in immune checkpoint blockade (ICB) response prediction lies in the limitations of classical machine learning models when applied to highly heterogeneous cancer datasets. Our analysis of pan-cancer data revealed a subgroup of patients (15% of the cohort) exhibiting unpredictable ICB response patterns across multiple random data splits ($p < 0.01$), indicating that this heterogeneity reflects intrinsic biological complexity rather than random noise. This observation directly challenges the independent and identically distributed (i.i.d.) assumption underlying traditional models, as tumor heterogeneity–spanning genomic instability, immune microenvironment variation, and inter-patient differences–disrupts the uniformity required for reliable pattern recognition.

Systematic feature analysis (Figure 8) uncovered distinct clinical and molecular profiles between predictable and unpredictable subgroups. Notably, prior chemotherapy (Chemo_Before_IO) showed a 2.3-fold increase in unpredictable cases ($p = 0.004$), aligning with its known role in modulating immune contexture. Molecular markers like albumin (Spearman $r = -0.38$, $p < 0.001$) and PD-L1 expression (OR = 1.65, 95% CI [1.22–2.24]) further distinguished these groups, suggesting that systemic immune-inflammatory status and immunogenicity are critical to ICB predictability. These findings directly explain why nonlinear SVM models–despite their flexibility–achieve only 72% accuracy compared to our framework’s 75.24% ($\Delta AUC = 0.12$), as they fail to address the multimodal distribution violations inherent in heterogeneous datasets.

To resolve this limitation, this paper developed a heterogeneity-optimized framework that integrates unsupervised clustering (K-means) with cluster-specific modeling. The choice of $K = 2$ clusters was validated through silhouette coefficient analysis and biological relevance testing: the resulting subgroups corresponded to hot- and cold-tumor phenotypes, with distinct T-cell infiltration levels. This approach improved predictive accuracy by 1.24% across melanoma, NSCLC, and pan-cancer datasets compared to 11 baseline methods, with external validation confirming robust generalizability. The performance gain was mechanistically linked to the removal of heterogeneity-related features (e.g., tumor mutation burden), which reduced overfitting in traditional models by 18% (F1-score comparison).

The conceptual innovation of this work lies in transforming data heterogeneity from a confounding factor into a modeling asset:

Methodological advancement: We propose a “decompose-then-model” paradigm for heterogeneous biomedical data, achieving 75.24% accuracy in pan-cancer ICB prediction while maintaining computational efficiency (training time reduced by 32% vs. deep clustering methods).
Biological validation: Cluster-specific biomarkers (e.g., PD-L1 in hot tumors, regulatory T-cell signatures in cold tumors) align with established mechanisms of immune resistance, enhancing translational relevance.
Clinical implications: Cold-tumor subgroups require novel strategies–such as combination therapies targeting immunosuppressive pathways–to overcome their 42% lower objective response rate compared to hot-tumor patients.

Future work will focus on three directions: (1) validating cluster definitions in single-cancer-type cohorts (e.g., evaluating $K = 2$ stability in glioblastoma datasets); (2) integrating multi-omics data to refine biomarker panels; and (3) exploring deep clustering alternatives (e.g., graph neural networks) to capture nonlinear heterogeneity patterns. Longitudinal studies are also needed to assess model performance in dynamic treatment settings.

Conclusions

Tumor heterogeneity, characterized by multimodal distributions in genomic, transcriptomic, and immune features, fundamentally challenges the unimodal assumptions of conventional ICB response predictors. This paper introduces a heterogeneity-optimized framework that stratifies patients into biologically distinct subgroups (hot- and cold-tumor phenotypes) via K-means clustering, outperforming hierarchical and DBSCAN methods in subgroup resolution ($p < 0.05$). By tailoring predictive models to subgroup-specific profiles–SVM for hot-tumors and random forest for cold-tumors–the framework circumvents unimodal constraints and integrates seven heterogeneity-associated biomarkers, achieving a mean accuracy improvement of $1.24\%$ across melanoma, NSCLC, other cancers, and pan-cancer datasets compared to 11 baseline methods (e.g., RF, SVM, logistic regression). Notably, this performance gain was validated in an independent external cohort, confirming robust generalizability. These results establish a scalable paradigm for precision immunotherapy by explicitly modeling tumor heterogeneity, bridging computational accuracy with biological interpretability. Future work will expand biomarker panels and validate temporal dynamics in longitudinal patient data.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

Dagher, O. K., Schwab, R. D., Brookens, S. K. & Posey, A. D. Advances in cancer immunotherapies. Cell 186, 1814–1815. https://doi.org/10.1016/j.cell.2023.02.039 (2023).
Article PubMed CAS Google Scholar
Chen, Q., Wang, C., Chen, G. J., Hu, Q. Y. & Gu, Z. Delivery strategies for immune checkpoint blockade. Adv. Healthc. Mater. 7, 1800424. https://doi.org/10.1002/adhm.201800424 (2018).
Article CAS Google Scholar
Kim, J., Hong, J., Lee, J., Lahiji, S. F. & Kim, Y. H. Recent advances in tumor microenvironment-targeted nanomedicine delivery approaches to overcome limitations of immune checkpoint blockade-based immunotherapy. J. Control. Release 332, 109–126. https://doi.org/10.1016/j.jconrel.2021.02.002 (2021).
Article PubMed CAS Google Scholar
Ganesan, S. & Mehnert, J. Biomarkers for response to immune checkpoint blockade. Annu. Rev. Cancer Biol. 4, 331–351. https://doi.org/10.1146/annurev-cancerbio-030419-033604 (2020).
Article Google Scholar
Onieva, J. L. et al. High IGKC-expressing intratumoral plasma cells predict response to immune checkpoint blockade. Int. J. Mol. Sci. 23, 9124. https://doi.org/10.3390/ijms23169124 (2022).
Article PubMed PubMed Central CAS Google Scholar
Zhao, J. et al. Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data. Briefings Bioinforma. 24, bbad025. https://doi.org/10.1093/bib/bbad025 (2023).
Article Google Scholar
Hwang, M. et al. Peripheral blood immune cell dynamics reflect antitumor immune responses and predict clinical response to immunotherapy. J. for Immunother. Cancer 10, https://doi.org/10.1136/jitc-2022-004688 (2022).
Li, S. et al. Impact of age on immune checkpoint blockade tolerability across malignancies: a single institution review. J. Clin. Oncol. 36, https://doi.org/10.1200/JCO.2018.36.15_suppl.e15069 (2018).
Macherla, S. et al. Emerging role of immune checkpoint blockade in pancreatic cancer. Int. J. Mol. Sci. 19, 3505. https://doi.org/10.3390/ijms19113505 (2018).
Article PubMed PubMed Central CAS Google Scholar
Andreev-Drakhlin, A. et al. Efficacy of immune checkpoint blockade in patients with advanced upper tract urothelial cancer and mismatch repair deficiency or microsatellite instability (MSI). J. Clin. Oncol. 39, 487. https://doi.org/10.1200/JCO.2021.39.6_suppl.487 (2021).
Article Google Scholar
Menden, M. P. et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. Plos One 8, e61318. https://doi.org/10.1371/journal.pone.0061318 (2013).
Article ADS PubMed PubMed Central CAS Google Scholar
Peng, J., Zhou, M., Chen, C., Xie, X. & Luo, C. H. Identification of exacerbation risk in patients with liver dysfunction using machine learning algorithms. Plos One 15, e0239266. https://doi.org/10.1371/journal.pone.0239266 (2020).
Article PubMed PubMed Central CAS Google Scholar
Li, J. C., Chen, S. H., Pan, X. Y., Yuan, Y. & Shen, H. Cell clustering for spatial transcriptomics data with graph neural networks. Nat. Comput. Sci. 2, 399–408. https://doi.org/10.1038/s43588-022-00266-5 (2022).
Article PubMed CAS Google Scholar
Kong, P. F. et al. Circulating lymphocytes, PD-L1 expression on tumor-infiltrating lymphocytes, and survival of colorectal cancer patients with different mismatch repair gene status. J. Cancer 10, 1745–1754. https://doi.org/10.7150/jca.25187 (2019).
Article PubMed PubMed Central CAS Google Scholar
Anagnostou, V. et al. Multimodal genomic features predict outcome of immune checkpoint blockade in non-small-cell lung cancer. Nat. Cancer 1, 99–111. https://doi.org/10.1038/s43018-019-0008-8 (2020).
Article PubMed PubMed Central CAS Google Scholar
Wang, J. W. et al. Dissecting immune cell stat regulation network reveals biomarkers to predict ICB therapy responders in melanoma. J. Transl. Medicine 19, 296. https://doi.org/10.1186/s12967-021-02962-8 (2021).
Article CAS Google Scholar
Rodin, A. S. et al. Dissecting response to cancer immunotherapy by applying bayesian network analysis to flow cytometry data. Int. J. Mol. Sci. 22, 2316. https://doi.org/10.3390/ijms22052316 (2021).
Article PubMed PubMed Central CAS Google Scholar
Chowell, D. et al. Improved prediction of immune checkpoint blockade efficacy across multiple cancer types. Nat. Biotechnol. 40, 499–506. https://doi.org/10.1038/s41587-021-01070-8 (2021).
Article PubMed PubMed Central CAS Google Scholar
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Medicine 23, 703–713. https://doi.org/10.1038/nm.4333 (2017).
Article CAS Google Scholar
Eisenhauer, E. A. et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur. journal cancer 45, 228–247. https://doi.org/10.1016/S1359-6349(09)70018-7 (2009).
Article CAS Google Scholar
Liu, D., Schilling, B., Liu, D. & others. Integrative molecular and clinical modeling of clinical outcomes to pd-1 blockade in patients with metastatic melanoma. Nat. Medicine 25, 1916–1927, https://doi.org/10.1038/s41591-019-0654-5 (2019).
Zhang, J. H., Huang, D., Saw, P. E. & Song, E. Turning cold tumors hot: from molecular mechanisms to clinical applications. Trends immunology 43, 523–545. https://doi.org/10.1016/j.it.2022.04.010 (2022).
Article CAS Google Scholar
Ma, K. L. et al. Turning cold into hot: emerging strategies to fire up the tumor microenvironment. Trends Cancer 11, 117–134. https://doi.org/10.1016/j.trecan.2024.11.011 (2024).
Article PubMed Google Scholar
Johnson, S. C. Hierarchical clustering schemes. Psychometrika. 32, 241–54. https://doi.org/10.1007/BF02289588 (1967).
Article PubMed CAS Google Scholar
Fahy, C., Yang, S. X. & Gongora, M. Ant colony stream clustering: A fast density clustering algorithm for dynamic data streams. IEEE Transactions on Cybern. 49, 2215–2228. https://doi.org/10.1109/TCYB.2018.2822552 (2019).
Article Google Scholar
Kaushik, A. C., Mehmood, A., Dai, X. F. & Wei, D. Q. Pan-cancer analysis and drug formulation for GPR139 and GPR142. Front. Pharmacol. 11, https://doi.org/10.3389/fphar.2020.521245 (2021).
Khan, A. et al. An integrated systems biology and network-based approaches to identify novel biomarkers in breast cancer cell lines using gene expression data. Interdiscip. Sci. Life Sci. 12, 155–168. https://doi.org/10.1007/s12539-020-00360-0 (2020).
Article CAS Google Scholar
Li, Z. D., Wang, D. L., Guo, W., Zhang, S. Q. & Chen, L. Identification of cortical interneuron cell markers in mouse embryos based on machine learning analysis of single-cell transcriptomics. Front. Neurosci. 16, https://doi.org/10.3389/fnins.2022.841145 (2022).

Download references

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 61203293] and the Scientific and Technological Project of Henan Province [grant numbers 242102211023].

Author information

Authors and Affiliations

School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
Juan Liang
School of Mathematics and Statistics, Henan Normal University, Xinxiang, 453007, China
Qihang Guo, Shan Xiang, Yun Wang & Juntao Li

Authors

Juan Liang
View author publications
Search author on:PubMed Google Scholar
Qihang Guo
View author publications
Search author on:PubMed Google Scholar
Shan Xiang
View author publications
Search author on:PubMed Google Scholar
Yun Wang
View author publications
Search author on:PubMed Google Scholar
Juntao Li
View author publications
Search author on:PubMed Google Scholar

Contributions

Conception and designs: JT.L. and Y.W. Data collection: J.L. and QH.G. Analysis and interpretation: QH.G. and S.X. Writing the manuscript: QH.G., S.X. and Y.W. Reviewing the manuscript: JT.L. and J.L.

Corresponding authors

Correspondence to Yun Wang or Juntao Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Liang, J., Guo, Q., Xiang, S. et al. Heterogeneity-optimized method for predicting immune checkpoint blockade response. Sci Rep 15, 32259 (2025). https://doi.org/10.1038/s41598-025-17718-3

Download citation

Received: 16 January 2025
Accepted: 25 August 2025
Published: 01 September 2025
DOI: https://doi.org/10.1038/s41598-025-17718-3

Subjects

Abstract

Similar content being viewed by others

Improved prediction of immune checkpoint blockade efficacy across multiple cancer types

Longitudinal liquid biopsy identifies an early predictive biomarker of immune checkpoint blockade response in head and neck squamous cell carcinoma

Evolution of delayed resistance to immunotherapy in a melanoma responder

Introduction

Materials and methods

Data description

Data processing

Heterogeneity test

Heterogeneity-aware clustering

Heterogeneity-optimized predictive modeling

Algorithm

Results

Determination of optimal cluster number K

Clustering comparative experiments

Customized predictive modeling for identified subgroups

Accuracy, recalling and F1-score

Confusion matrix, ROC-AUC and brier score

Extra validation

Discussion

Conclusions

Data availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links