Accurate prediction of disease-free and overall survival in non-small cell lung cancer using patient-level multimodal weakly supervised learning

Li, Yongmeng; Chai, Xiaodong; Yang, Moxuan; Xiong, Jiahang; Zeng, Junyang; Chen, Yun; Xu, Gang; Lin, Haifeng; Wang, Wei; Wang, Shuhao; Che, Nanying

doi:10.1038/s41698-025-00981-y

Article
Open access
Published: 19 June 2025

Yongmeng Li^1,2^na1,
Xiaodong Chai^1,2^na1,
Moxuan Yang³,
Jiahang Xiong⁴,
Junyang Zeng⁵,
Yun Chen⁶,
Gang Xu⁷,
Haifeng Lin^1,2,
Wei Wang⁴,
Shuhao Wang ORCID: orcid.org/0000-0002-5467-3548⁴^na2 &
…
Nanying Che ORCID: orcid.org/0000-0003-4179-1737^1,2^na2

npj Precision Oncology volume 9, Article number: 197 (2025)

Abstract

With the rapid progress in artificial intelligence (AI) and digital pathology, prognosis prediction for non-small cell lung cancer (NSCLC) patients has become a critical component of personalized medicine. In this study, we developed a multimodal AI model that integrated whole-slide images and dense clinical data to predict disease-free survival (DFS) and overall survival (OS) with high accuracy for NSCLC patients undergoing surgery. Utilizing data from 618 patients at Beijing Chest Hospital, the model achieved areas under the curve (AUC) of 0.8084 for predicting progression and 0.8021 for predicting death in the test set. Importantly, the model attained balanced accuracies of 0.7047 for predicting progression and 0.6884 for predicting death. By categorizing patients into high-risk and low-risk groups, the model identified significant differences in survival outcomes, with hazard ratios of 4.85 for progression and 4.57 for death, both with p values below 0.0001. Additionally, it uncovered novel digital biomarkers associated with poor prognosis, offering further insights into NSCLC treatment. This model has the potential to revolutionize postoperative decision-making by providing clinicians with a precise tool for predicting DFS and OS, thereby improving patient outcomes.

Introduction

Lung cancer is the leading cause of cancer-related death and the second most commonly diagnosed cancer, accounting for approximately one in five (18.0%) cancer deaths and one in ten (11.4%) cancer diagnoses¹. Non-small cell lung cancer (NSCLC) represents 85% of all lung cancer cases². In clinical practice, NSCLC treatment is primarily guided by TNM staging³. Early-stage NSCLC patients (stage 0, stage IA, stage IB and stage IIA without high-risk factors) generally do not require postoperative interventions, whereas patients with stage IB or IIA with high-risk factors, stage IIB, and stage III typically require postoperative treatment. However, patients within the same stage often exhibit different clinical outcomes, posing challenges in determining the need for postoperative interventions based solely on TNM staging.

In early-stage NSCLC patients, the risk of disease progression or cancer-related death is not entirely eliminated. For stage IA NSCLC patients undergoing surgery alone, the 5-year disease-free survival (DFS) rate is 84.5%, and the 5-year overall survival (OS) rate is 96.8%⁴. Conversely, within the group of NSCLC patients who require postoperative interventions but do not receive them, some remain free from disease progression or cancer-related death. For stage IIB/IIIA NSCLC patients who undergo surgery alone, the 3-year progression-free survival rate is 36.1%⁵, and the 5-year OS rate for stage IIIA patients is 26%⁶. Prognostic prediction is crucial for determining whether postoperative interventions are necessary. Accurate tools that predict DFS and OS for NSCLC patients are essential for personalized treatment and improved disease management.

Artificial intelligence (AI)-based pathology has advanced significantly in its application to NSCLC, particularly in pathological diagnosis^7,8, molecular phenotype prediction^9,10, gene mutation prediction⁷, and prognostic prediction^{11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}. Among these applications, prognostic prediction holds the greatest clinical importance for NSCLC patients. Previous studies on prognostic prediction have either excluded clinical data or incorporated only minimal clinical information. Although these studies successfully distinguished different prognostic groups, their predictions did not correlate strongly with actual survival outcomes. Additionally, they did not effectively predict DFS and OS, which are critical in NSCLC prognosis.

Several digital biomarkers have emerged from these studies. For example, the density of tumor-infiltrating lymphocytes (TILs) has been identified as a biomarker associated with worse prognosis¹⁴, and the growth pattern of adenocarcinoma has also been linked to prognosis¹⁵. Moreover, a recent study developed and validated four digital biomarkers based on tertiary lymphoid structures and necrosis¹⁹. However, the digital biomarkers associated with prognosis have not been fully elucidated.

In this work, we developed a multimodal AI model, referred to as AIM-LCpro, for prognostic prediction in NSCLC patients undergoing surgery. Our model uses a patient-level weakly supervised learning approach that integrates whole-slide images (WSIs) with dense clinical data (Fig. 1). It not only categorizes patients into high-risk and low-risk groups but also predicts DFS and OS for each patient. Through model visualizations, we identified several novel digital biomarkers associated with poor prognosis in NSCLC patients. This model has the potential to guide decisions regarding the need for postoperative interventions and improve overall prognosis in NSCLC patients.

Results

Baseline characteristics of the study cohort

In the study cohort, 173 patients (27.99%) experienced disease progression within 5 years, and 121 patients (19.58%) died of NSCLC within the same period. Of the total cohort, 353 patients (57.12%) did not require postoperative interventions, while 265 patients (42.88%) did. The median follow-up period of the study cohort was 73 months (Supplementary Table 1). The baseline characteristics of the study cohort are presented in Table 1.

Table 1 Characteristics of the study cohort

Performance of AIM-LCpro in predicting prognosis of NSCLC patients

In the training, validation, and test sets, the areas under the ROC curves (AUCs) for predicting progression within 5 years were 0.9925, 0.8801, and 0.8084, respectively (Fig. 2a–c). Similarly, the AUCs for predicting death within 5 years were 0.9826, 0.8477, and 0.8021, respectively (Fig. 2d–f). These results suggest that our model can discriminate between patients who will experience progression or death within 5 years and those who will not.

**Fig. 2: ROC curves for predicting prognosis of NSCLC patients using AIM-LCpro model.**

We found that the performance of unimodal models was inferior to that of the multimodal model. When predicting progression and death in the test set, the AUCs of the unimodal model using clinical data were 0.7597 and 0.6733, respectively (Supplementary Fig. 1a, c). The AUCs of the unimodal model using pathological images were 0.6562 and 0.7082, respectively (Supplementary Fig. 1b, d).

We applied the selected thresholds to predict outcomes in the training, validation, and test sets, achieving strong performance (Supplementary Tables 2–10). Specifically, when predicting whether patients’ disease would progress within 5 years or whether they would die within 5 years in the test set, our model demonstrated high balanced accuracies (0.7047 and 0.6884, respectively). In the test set, the model exhibited a sensitivity of 0.5556 for predicting progression within 5 years and 0.5385 for predicting death within the same period. Additionally, the model’s specificity for predicting progression and death within 5 years was 0.8539 and 0.8384, respectively.
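As a consistency check, the reported test-set balanced accuracies can be reproduced from the sensitivities and specificities above. This assumes the standard definition of balanced accuracy as the mean of sensitivity and specificity, which the text does not restate:

$$\text{Balanced accuracy}_{\text{progression}}=\frac{0.5556+0.8539}{2}=0.70475$$

$$\text{Balanced accuracy}_{\text{death}}=\frac{0.5385+0.8384}{2}=0.68845$$

Both values agree with the reported 0.7047 and 0.6884 up to rounding of the inputs.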

Harrell’s C-index was also used to evaluate the performance of AIM-LCpro. In the test set, the Harrell’s C-index for predicting progression and death within 5 years was 0.7748 and 0.7775, respectively (Supplementary Table 11).
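For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of Harrell's C-index for right-censored data. It is not the Hmisc routine used in the study; the function name `harrell_c_index` and its arguments are hypothetical, and pairs with tied observed times are simply skipped, which is a simplification.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Illustrative Harrell's C-index for right-censored data.

    times: observed follow-up time per patient
    events: 1 if the event (progression or death) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip pairs tied in time (simplification) or whose earlier patient is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.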

High-risk and low-risk groups

The AIM-LCpro model was able to categorize patients into high-risk or low-risk groups based on two criteria: predicting progression within 5 years and death within 5 years (Fig. 3). For instance, if the model predicted that a patient’s disease would progress within 5 years, the patient was categorized as high-risk; otherwise, they were categorized as low-risk. In the test set, there was a statistically significant difference between high-risk and low-risk groups for all patients, with p values less than 0.0001 and Hazard Ratios (HR) of 4.85 for progression and 4.57 for death (Fig. 3a, d).

**Fig. 3: Comparison between the K-M curves of high-risk groups and low-risk groups.**

Among patients who did not require postoperative interventions, the difference between high-risk and low-risk groups remained significant, with a p value of 0.0030 and HR of 5.01 for progression, and a p value of 0.0443 and HR of 4.10 for death (Fig. 3b, e). Similarly, for patients who required postoperative interventions, the high-risk group demonstrated a significant difference compared to the low-risk group, with p values less than 0.0001 and HR of 4.34 for progression, and a p value of 0.0036 and HR of 3.51 for death (Fig. 3c, f).

Consistency between predicted and actual K-M curves

The AIM-LCpro model’s predictive outcomes for both 5-year progression and death aligned with actual survival data, with no statistically significant discrepancies observed (for progression: p = 0.5029, HR = 0.85; for death: p = 0.2321, HR = 1.10), as shown in Fig. 4a, d.

**Fig. 4: Comparison between the predicted and actual K-M curves in the test set.**

For patients who did not require postoperative interventions, the model’s survival predictions were also consistent with actual outcomes, with no statistically significant differences (p = 0.4636, HR = 1.48 for progression; p = 0.3091, HR = 1.76 for death), as illustrated in Fig. 4b, e.

Similarly, for patients requiring postoperative interventions, the model maintained its accuracy, showing no significant variance between predicted and actual survival (p = 0.0580, HR = 0.56 for progression; p = 0.5253, HR = 0.81 for death), as depicted in Fig. 4c, f.

We also conducted the patient-level evaluation. As shown in Supplementary Fig. 2, the errors of DFS and OS between the predicted values and the actual values in the test set were 11.34 ± 17.23 months and 7.76 ± 13.82 months, respectively.

Feature importance analysis of the clinical data modality

In the clinical data modality, the top three features with the highest weights for predicting progression were the proportion of lepidic adenocarcinoma, the number of metastatic lymph nodes in the 2nd and 4th groups, and tumor location (Supplementary Table 12). Correspondingly, for predicting death, the top three features with the highest weights were pleural effusion, family history, and lymph node dissection (Supplementary Table 13).

Investigation of prognostic digital biomarkers through AIM-LCpro

To intuitively display the pathological features associated with prognosis, we mapped the prognostic-related features extracted by the AIM-LCpro model onto WSIs in the form of heatmaps. As shown in Fig. 5a, when comparing the heatmaps of progression and death in the test set, the number of hotspots for progression was greater than for death, both at the patient level and the slide level. Moreover, the hotspots for patients who died within 5 years were largely contained within the progression hotspots, particularly in patients or slides with more 5-year progression hotspots than 5-year death hotspots. Given that fewer patients died within 5 years compared to those with progression, it was possible that the model learned fewer features for death prediction. The consistency in the distribution of risk hotspots for progression and death highlighted the model’s predictive capabilities. By analyzing these heatmaps, we can better understand the model’s predictions, identify areas that contribute to these predictions, and potentially uncover new digital biomarkers.

**Fig. 5: Interpreted pathological features.**

The test set included 84 patients with non-mucinous adenocarcinoma (NMA), 36 patients with squamous cell carcinoma (SCC), and 5 patients with other NSCLC types (Fig. 5b). In SCC, risk hotspots were predominantly concentrated in the tumor regions, where tumor cells were disorderly arranged with enlarged and bizarre nuclei, and frequent mitotic figures were observed (Fig. 5c). Similar to SCC, in NMA, risk hotspots also tended to localize within the tumor areas (Fig. 5e–j). We further analyzed these regions covered by hotspots.

Of the 84 patients with NMA, risk hotspots were found to be distributed in micropapillary adenocarcinoma (MPA) and solid adenocarcinoma (SPA). As shown in Fig. 5d, e, MPA was present in 11 patients, of which 5/11 and 3/11 patients had risk hotspots in the 5-year progression and 5-year death heatmaps, respectively. Surprisingly, 30 patients had SPA, and all of them had risk hotspots in their SPA areas in both the 5-year progression and 5-year death heatmaps, although the instance-level hotspots did not cover all SPA regions (Fig. 5d, f). These two histological subtypes are coincidentally classified as high-grade patterns in the 5th edition of the WHO classification of thoracic tumors. The most common histological type, lepidic adenocarcinoma (LPA), exhibited only minimal coverage by risk hotspots (Fig. 5d, g). Interestingly, LPA was considered a low-grade histology in the 5th edition of the WHO classification of thoracic tumors.

Regarding the other two NMA histological subtypes, acinar adenocarcinoma (APA) and papillary adenocarcinoma (PPA), the distribution of risk hotspots was uneven. For APA, we identified two types of glands more likely to be covered by risk hotspots. As shown in Fig. 5h, the first type consisted of small, irregular glands made up of pleomorphic cells, surrounded by desmoplastic stroma, which was often hypovascular and composed of collagen fibers interspersed with fibroblasts and lymphocytes. The second type consisted of large, irregular glands with multilayered cells, characterized by significant cellular and nuclear pleomorphism. These cells were crowded, and some protruded into the glandular lumen, forming structures similar to a “papillary” pattern without a central axis (Fig. 5h). The stroma in these areas was loose and rich in neomicrovessels, consistent with the pure stromal regions identified as risk hotspots, as demonstrated in Fig. 5i. Fewer areas of PPA were identified as risk hotspots, with the model appearing to recognize regions with crowded cell arrangements as high-risk (Fig. 5j). The pathological features related to prognosis identified by our model may serve as digital biomarkers and warrant further validation in future studies.

Discussion

We demonstrated that a multimodal model combining dense clinical data with WSIs can successfully predict the prognosis of NSCLC patients undergoing surgery. AIM-LCpro effectively screens for and utilizes prognostic information, achieving a high level of balanced accuracies. The features of the clinical data modality enhanced the performance of the model. To our knowledge, no other prognostic prediction models for surgical NSCLC patients have yet entered clinical application. Our model’s ability to predict which patients do or do not require postoperative treatment aligns closely with clinical application scenarios.

Previous studies relied heavily on manual annotation or predefined image features^{11,12,13,14,15,16,17,18,19}. In contrast, our model does not require manual WSI annotation, significantly reducing the manpower involved. Additionally, it does not rely on predefined image features. Instead, it uses CAMEL2 to automatically screen and extract regions associated with prognosis²⁶. By avoiding predefined features, the model is free to search for prognostic regions across the entire WSI without limitations. It can categorize patients into high-risk and low-risk groups while predicting 5-year DFS and OS. Moreover, there is no statistically significant difference between the predicted and actual survival outcomes, which strengthens the validity of stratifying patients into high-risk and low-risk groups.

In clinical practice, physicians need tools to predict patient outcomes. If the model predicts NSCLC patients are at risk of progression or death, they can be recommended for postoperative interventions. Conversely, if patients are predicted to remain free from progression or death, chemotherapy can be avoided, aiding in more personalized treatment plans.

NSCLC exhibits significant tumor heterogeneity^27,28. This heterogeneity applies not only to tumor epithelial cells but also to the various microenvironments interacting with tumor cells²⁹. The digital biomarkers identified by our model from WSIs may reflect this heterogeneity and aid in personalizing treatment for NSCLC patients.

Deep learning (DL)-based computational pathology enables automated and high-throughput extraction of features from histopathological images, with high sensitivity to subtle characteristics^30,31. We developed AIM-LCpro to assist prognostic prediction in NSCLC patients undergoing surgery. The model presents the prognosis-related features it extracts visually and objectively in the form of heatmaps, allowing pathologists to review and analyze them carefully to mine digital biomarkers. Similar to traditional biomarkers, digital biomarkers serve as indicators for diagnosis, prognosis, and therapeutic responses and should demonstrate clinical validity^32,33. The clinical utility of new biomarkers can be evaluated by their association with existing biomarkers or by directly proving their usefulness³⁴. In SCC, our model identified areas with a high mitotic index as risk hotspots, consistent with previous findings that associated a high mitotic index with poor prognosis³⁵. Additionally, our model placed varying degrees of emphasis on different histologic subtypes within NMA. MPA and SPA were identified as risk areas, in line with the high-grade growth patterns defined by the latest WHO classification of thoracic tumors. Furthermore, LPA, known for having the best prognosis, was not recognized as a risk area. APA, associated with intermediate prognosis, was widely distributed across the slides. Through heatmap analysis, we identified histological characteristics in APA that indicated poor prognosis. This finding might lead to the discovery of novel biomarkers, thereby improving classification systems. Further evidence and additional data are needed to verify this.

In addition to the conventional histological features, our model also identified some stromal areas as risk hotspots, including microvessels, fibroblasts, and extracellular matrix (ECM). These are primary components of the tumor microenvironment and are considered to exert important effects on tumor progression, metastasis, and prognosis³⁶. Angiogenesis is a complex process and a key hallmark of cancer, and many studies have confirmed that it is crucial for the growth and metastasis of lung cancers³⁷. Human NSCLC commonly exhibits desmoplasia, characterized by cancer-associated fibroblasts (CAFs). CAFs also influence cancer cell proliferation, invasion, and drug resistance³⁸. For example, CAFs have been associated with T-cell exclusion in human lung tumors, contributing to immune suppression and tumor growth³⁹. The ECM plays key roles in establishing and maintaining the tissue architecture of tumors. It has been reported that Tenascin-C can mediate lung adenocarcinoma metastasis^40,41. Therefore, some of the risk hotspots identified by our model may help us to better understand the role of the tumor microenvironment in the occurrence and development of tumors, providing new support and ideas for future research.

Predicting the prognosis of NSCLC surgery patients raises ethical concerns. For example, knowing a poor prognosis in advance may affect patients’ quality of life. Additionally, the question of who bears responsibility for harm caused by incorrect predictions remains unanswered.

Our study has several limitations. First, to avoid potential information leakage, we did not include information about subsequent treatments after progression, which could have compromised the model’s credibility. Second, the clinical benefits of altering postoperative intervention strategies based on the model’s predictions have not yet been validated. It remains to be seen how much patients would benefit from such an approach. Finally, our model is based on a relatively small cohort. Further studies with larger sample sizes are needed to enhance the model’s ability to predict NSCLC prognosis.

Methods

Study population and inclusion/exclusion criteria

We enrolled 641 NSCLC patients who underwent lung surgery at Beijing Chest Hospital between January 2016 and November 2017. After excluding 23 patients, 618 patients (BCH study cohort) were ultimately included in the study.

The inclusion criteria were: (1) NSCLC patients who underwent radical surgery, or NSCLC patients who underwent pulmonary surgery but did not receive lymph node dissection due to poor pulmonary function; (2) NSCLC patients who agreed to follow-up. Patients with stages 0 to IIIB were all potentially included. The exclusion criteria were: (1) patients with other incurable malignant tumors; (2) patients who died from other diseases before progression within 5 years after surgery; (3) cases where all primary tumor tissues were frozen prior to being fixed in formalin.

The study was conducted in accordance with the principles of the Declaration of Helsinki and approved by the Ethics Committee of Beijing Chest Hospital, Capital Medical University (YJS-2023-16).

WSI acquisition

All hematoxylin and eosin (H&E)-stained slides of primary tumor tissues were selected, excluding frozen slides and frozen paraffin slides. All WSIs in this study were therefore formalin-fixed paraffin-embedded (FFPE) H&E-stained whole-slide images of primary tumor tissues. A total of 2629 WSIs were acquired for the BCH study cohort, scanned using the KFBio KF-PRO-400 scanner, and saved at magnifications of ×400, ×200, ×100, and ×50.

Clinical data acquisition

Clinical variables were collected from inpatient medical records and included age, gender, smoking history, family history, TNM stages, lymph node dissection and metastasis, tumor size, CT data, postoperative treatment, and risk factors (Supplementary Table 14). Risk factors included poorly differentiated tumors, vascular invasion, wedge resection, visceral pleural involvement, and unknown lymph node status. All patients were followed up through telephone and outpatient services, with a postoperative follow-up period of over 5 years for all patients.

TNM staging was based on postoperative pathology reports and performed according to the 8th UICC/AJCC TNM edition for non-small cell lung cancer staging.

Dataset division for training, validation, and testing

The BCH study cohort was divided into training (428 patients), validation (62 patients), and test sets (125 patients) for predicting 5-year progression (Supplementary Table 15), and into training (426 patients), validation (62 patients), and test sets (125 patients) for predicting 5-year death (Supplementary Table 16). The training, validation, and test sets were comparable (Supplementary Tables 17 and 18). In the cohort, there were 3 patients with unknown progression status and 8 patients with unknown vital status. These patients were included in the training set and participated in a portion of the training process. Analysis was performed only when evaluating the specific segments of the training process in which they were involved.

Image segmentation and feature extraction

Glass regions were filtered out using RGB channel pixel variance calculations, and tissue regions were extracted and cut into 2048 × 2048-pixel image patches at ×20 magnification. Each patch was then divided into 64 instances, each measuring 256 × 256 pixels.
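The tiling step can be sketched with NumPy as follows. The patch and instance sizes come from the text; the function names, the use of a single whole-patch variance as the glass criterion, and the threshold value are assumptions for illustration only, since the exact variance rule is not specified.

```python
import numpy as np

PATCH_SIZE = 2048     # patch edge length in pixels (from the text)
INSTANCE_SIZE = 256   # instance edge length in pixels (from the text)

def is_glass(patch, variance_threshold=5.0):
    """Hypothetical background filter: flag a patch whose pixel values barely vary.

    Both the statistic (variance over all RGB values) and the threshold are assumptions.
    """
    return float(patch.astype(np.float32).var()) < variance_threshold

def split_into_instances(patch):
    """Split one (2048, 2048, 3) patch into 64 non-overlapping (256, 256, 3) instances."""
    if patch.shape != (PATCH_SIZE, PATCH_SIZE, 3):
        raise ValueError("expected a 2048 x 2048 RGB patch")
    n = PATCH_SIZE // INSTANCE_SIZE  # 8 instances per side
    grid = patch.reshape(n, INSTANCE_SIZE, n, INSTANCE_SIZE, 3)
    # Reorder to (row, col, y, x, channel), then flatten the 8 x 8 grid row by row.
    return grid.transpose(0, 2, 1, 3, 4).reshape(n * n, INSTANCE_SIZE, INSTANCE_SIZE, 3)
```

With this layout, instance index `8 * row + col` corresponds to the tile at grid position `(row, col)` inside the patch.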

We performed the pre-training of CAMEL2 based on whether the patient progressed within 5 years. Similarly, we also performed the pre-training of CAMEL2 based on whether the patient died within 5 years. These two processes were independent of each other. We obtained two pre-trained CAMEL2 models.

For image features, we used the pre-trained CAMEL2 weakly supervised framework to extract features from patches in the training set, extracting intermediate features from CAMEL2 as the patch’s image feature representation. The core of this enhanced framework was an instance-level classifier, which served as the foundation for model interpretability and visualization. Each patch had two image feature representations: one for progression and one for death. To obtain patient-level image feature representations, we sorted all patch image feature representations in descending order based on the prediction probability output by CAMEL2, then averaged the top 10% of patch-level features to generate the patient-level image feature representation.
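The pooling step described above can be illustrated with a short NumPy sketch. The function name `patient_level_feature` is hypothetical, the inputs stand in for CAMEL2 outputs, and rounding the top-10% count up to at least one patch is an assumption.

```python
import numpy as np

def patient_level_feature(patch_features, patch_probs, top_percent=10):
    """Average the features of the top-scoring patches for one patient.

    patch_features: array of shape (n_patches, feature_dim)
    patch_probs: predicted probability per patch, shape (n_patches,)
    """
    patch_features = np.asarray(patch_features, dtype=float)
    patch_probs = np.asarray(patch_probs, dtype=float)
    n = patch_probs.shape[0]
    if patch_features.shape[0] != n or n == 0:
        raise ValueError("features and probabilities must describe the same patches")
    # Integer ceiling of n * top_percent / 100, with at least one patch kept (assumption).
    k = max(1, (n * top_percent + 99) // 100)
    top_idx = np.argsort(-patch_probs)[:k]  # indices sorted by descending probability
    return patch_features[top_idx].mean(axis=0)
```

Because each patch has one representation for progression and one for death, this function would be applied once per endpoint.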

Clinical data standardization and normalization

Clinical data contained discrete and categorical variables. Discrete variables were normalized by scaling values between 0 and 1. Categorical variables, such as gender and disease type, were one-hot encoded using positional coding, where a one-dimensional vector represented two-dimensional information.
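A minimal sketch of these two preprocessing steps is given below. It shows plain min-max scaling and standard one-hot encoding; the function names are hypothetical, and the "positional coding" variant mentioned above is not reproduced because its details are not given.

```python
import numpy as np

def min_max_scale(values):
    """Scale a numeric variable linearly to the interval [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)  # constant column: map everything to 0
    return (values - lo) / (hi - lo)

def one_hot(labels, categories):
    """Encode each label as a vector with a single 1 at its category index."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)), dtype=float)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded
```

In practice the minimum and maximum would normally be taken from the training set and reused for the validation and test sets, so that no test-set information enters the scaling.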

Architecture of the multimodal AI model

The workflow is shown in Supplementary Fig. 3. We employed a two-stage training strategy: first, classification training for prognostic metrics (progression and death), followed by regression training for time prediction (progression time and death time) based on the classification model weights.

Training procedure and algorithm selection

Preprocessed clinical features from the training set patients were passed through a clinical feature network to obtain clinical feature representations. This network consisted of linear layers, Batch Normalization layers, and ReLU layers. The patient-level image and clinical feature representations were concatenated and fed into the classification head network, which output the probability of progression or death. The classification head network comprised linear layers, Batch Normalization layers, and ReLU layers, with two independent linear layers in the final stage. The network was trained using cross-entropy loss.

For regression training, the clinical feature network weights were frozen, and two separate time prediction head networks were trained to output progression and death times. The time prediction head consisted of linear layers, Batch Normalization layers, ReLU layers, and a final Sigmoid layer. The output was multiplied by 60 to obtain the specific progression or death month. The network was trained using the L1 loss function. During inference, the network simultaneously output classification results and time predictions. For patients classified as negative samples, the corresponding time was set to 60 months; otherwise, the network’s original predicted output was retained.
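The inference-time conversion from network output to months can be sketched as follows. The 60-month horizon and the rule for negative samples come from the text; the function names and the assumption that the time head exposes a pre-Sigmoid value are illustrative.

```python
import numpy as np

HORIZON_MONTHS = 60  # 5-year window used to rescale the Sigmoid output (from the text)

def sigmoid(x):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def predicted_months(time_logits, predicted_positive):
    """Convert time-head outputs to months.

    time_logits: pre-Sigmoid outputs of the time prediction head (assumed interface)
    predicted_positive: True where the classifier predicts progression or death
    """
    months = HORIZON_MONTHS * sigmoid(time_logits)
    positive = np.asarray(predicted_positive, dtype=bool)
    # Patients classified as negative are assigned the full 60 months.
    return np.where(positive, months, float(HORIZON_MONTHS))
```

For example, a pre-Sigmoid output of 0 for a patient predicted to progress corresponds to 30 months.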

Linear layers, also known as fully connected or dense layers, are among the most fundamental layers in neural networks. Stacked with nonlinear activations, they capture complex relationships in the data through learned linear transformations. Batch Normalization normalizes layer inputs using the mean and variance computed along the batch dimension, reducing internal covariate shift and thereby accelerating training and enhancing model stability. The ReLU activation function introduces nonlinearity, enabling neural networks to learn and represent more complex functional relationships. By stacking multiple layers with ReLU activations, neural networks can construct highly nonlinear mappings and thus better fit complex data distributions. The Sigmoid layer is a commonly used activation function layer that maps any real number to the interval (0,1).

The model assigned one probability of progression or death within 5 years to each patient. In the training and validation sets, thresholds were selected based on sensitivity, specificity, accuracy, and the Youden index. These thresholds were then applied to the test set to evaluate the performance of the model. For patients who did not require postoperative treatment, thresholds of 0.1461 and 0.2092 were selected for progression and death within 5 years, respectively. For patients who required postoperative treatment, thresholds of 0.3123 and 0.3391 were used.
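Applying the subgroup-specific thresholds can be sketched as a simple lookup. The four threshold values are those reported above; the function name, the dictionary layout, and the choice to treat a probability equal to the threshold as high-risk are assumptions.

```python
# Thresholds reported in the text, keyed by (endpoint, requires_postoperative_treatment).
THRESHOLDS = {
    ("progression", False): 0.1461,
    ("death", False): 0.2092,
    ("progression", True): 0.3123,
    ("death", True): 0.3391,
}

def risk_group(probability, endpoint, requires_postop_treatment):
    """Assign a patient to the high-risk or low-risk group for one endpoint."""
    threshold = THRESHOLDS[(endpoint, requires_postop_treatment)]
    # Assumption: a probability equal to the threshold counts as high-risk.
    return "high-risk" if probability >= threshold else "low-risk"
```

Under this sketch, the same predicted probability of 0.25 for progression would place a patient who does not require postoperative treatment in the high-risk group, and a patient who does require it in the low-risk group.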

To ensure the reliability, reproducibility, and fairness of model evaluation during training, testing, and validation, we utilized standardized metrics. A key focus was analyzing performance using the receiver operating characteristic (ROC) curve and its associated metrics.

Area under the ROC curve (AUC): Quantifies model performance across all classification thresholds. AUC ranges from 0 to 1, with 0.5 corresponding to random ranking and values closer to 1 indicating superior discriminative power. This metric is useful in imbalanced datasets because it is computed from the true positive and false positive rates, which do not depend directly on class prevalence.
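One way to make the AUC concrete is its rank interpretation: it equals the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pair counting; the function name `roc_auc` is hypothetical and this is not necessarily how the study computed its AUCs.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred.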

The following metrics were derived under a certain threshold chosen from the ROC curve.

Sensitivity (true positive rate): measures the proportion of actual positive samples correctly identified:

$$\text{Sensitivity}=\frac{TP}{TP+FN}\tag{1}$$

where true positive (TP) denotes the number of correctly classified positive samples, and false negative (FN) represents the number of positive samples that were incorrectly classified as negative.

Specificity (true negative rate): measures the proportion of actual negative samples correctly identified:

$$\text{Specificity}=\frac{TN}{TN+FP}\tag{2}$$

where true negative (TN) is the count of correctly classified negative samples, and false positive (FP) refers to the number of negative samples incorrectly classified as positive.

Accuracy: evaluates overall prediction correctness:

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{3}$$
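The threshold-dependent metrics above can be computed from confusion-matrix counts as in the following sketch. The Youden index and balanced accuracy use their standard definitions (sensitivity + specificity − 1, and the mean of sensitivity and specificity), which are not spelled out in the text; the function name is hypothetical, and both classes are assumed to be present.

```python
def threshold_metrics(labels, scores, threshold):
    """Sensitivity, specificity, accuracy and derived indices at one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)   # Eq. (1)
    specificity = tn / (tn + fp)   # Eq. (2)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (3)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_index": sensitivity + specificity - 1.0,
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
    }
```

Evaluating this function over a grid of candidate thresholds on the training and validation sets, and keeping the threshold with the largest Youden index, is one common selection strategy.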

Heatmap visualization and evaluation

To achieve precise identification of biomarkers, the heatmap visualization in this research used the prognosis prediction probability of each instance, mapping these values to heat intensity. Additionally, a sliding window-based inference strategy was employed, with finer-grained steps during inference to enhance representation accuracy beyond the instance level, enabling more nuanced feature detection. Instances with probabilities lower than 0.3 were not displayed on the heatmap. Pathologists then reviewed and evaluated these heatmaps and interpreted the digital pathological biomarkers.
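The display rule can be illustrated with a NumPy masked array. The 0.3 cut-off comes from the text; the function name and the 2D grid layout of instance probabilities are assumptions, and the sliding-window refinement is not reproduced here.

```python
import numpy as np

DISPLAY_THRESHOLD = 0.3  # instances below this probability are hidden (from the text)

def heatmap_from_probabilities(instance_probs):
    """Return a masked grid in which sub-threshold instances are hidden.

    instance_probs: 2D array of per-instance prognosis probabilities,
    arranged on the slide grid (assumed layout).
    """
    probs = np.asarray(instance_probs, dtype=float)
    # Values strictly below the threshold are masked and would not be drawn.
    return np.ma.masked_less(probs, DISPLAY_THRESHOLD)
```

Plotting libraries that accept masked arrays typically leave the masked cells transparent, so the underlying tissue image remains visible outside the hotspots.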

Statistical analysis

Categorical data were evaluated using Pearson’s chi-squared test or Fisher’s exact test. Measurement data were expressed as mean ± standard deviation and analyzed using the independent samples t-test or analysis of variance. Survival curves were generated using the Kaplan–Meier method. When survival curves did not intersect, they were compared using the log-rank test. When survival curves intersected, the Rényi test was utilized to make comparisons. Harrell’s C-index was computed in R using the Hmisc package. All tests were two-tailed, and a p value less than 0.05 was considered statistically significant. Statistical analysis was performed using SPSS software 26.0 or GraphPad Prism 10.

Data availability

Data are available upon reasonable request.

Code availability

The code can be accessed online: https://github.com/ThoroughFuture.

References

Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
Article PubMed Google Scholar
Reck, M. & Rabe, K. F. Precision diagnosis and treatment for advanced non-small-cell lung cancer. N. Engl. J. Med. 377, 849–861 (2017).
Article CAS PubMed Google Scholar
Ettinger, D. S. et al. NCCN Guidelines® Insights: non-small cell lung cancer, version 2.2023. J. Natl Compr. Cancer Netw.21, 340–350 (2023).
Article CAS Google Scholar
Jiang, Y. et al. The impact of adjuvant EGFR-TKIs and 14-gene molecular assay on stage I non-small cell lung cancer with sensitive EGFR mutations. EClinicalMedicine 64, 102205 (2023).
Article PubMed PubMed Central Google Scholar
Scagliotti, G. V. et al. Randomized phase III study of surgery alone or surgery plus preoperative cisplatin and gemcitabine in stages IB to IIIA non-small-cell lung cancer. J. Clin. Oncol. 30, 172–178 (2012).
Article CAS PubMed Google Scholar
Douillard, J. Y. et al. Adjuvant vinorelbine plus cisplatin versus observation in patients with completely resected stage IB-IIIA non-small-cell lung cancer (Adjuvant Navelbine International Trialist Association [ANITA]): a randomised controlled trial. Lancet Oncol.7, 719–727 (2006).
Article CAS PubMed Google Scholar
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Article CAS PubMed PubMed Central Google Scholar
Chen, C. L. et al. An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning. Nat. Commun. 12, 1193 (2021).
Article CAS PubMed PubMed Central Google Scholar
Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wu, J. et al. Artificial intelligence-assisted system for precision diagnosis of PD-L1 expression in non-small cell lung cancer. Mod. Pathol. 35, 403–411 (2022).
Article CAS PubMed Google Scholar
Yu, K. H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
Article CAS PubMed PubMed Central Google Scholar
Luo, X. et al. Comprehensive computational pathological image analysis predicts lung cancer prognosis. J. Thorac. Oncol.12, 501–509 (2017).
Article PubMed Google Scholar
Wang, Y. et al. Multi-scale pathology image texture signature is a prognostic factor for resectable lung adenocarcinoma: a multi-center, retrospective study. J. Transl. Med. 20, 595 (2022).
Article CAS PubMed PubMed Central Google Scholar
Pan, X. et al. Computerized tumor-infiltrating lymphocytes density score predicts survival of patients with resectable lung adenocarcinoma. iScience 25, 105605 (2022).
Article CAS PubMed PubMed Central Google Scholar
Alsubaie, N., Raza, S. E. A., Snead, D. & Rajpoot, N. M. Growth pattern fingerprinting for automatic analysis of lung adenocarcinoma overall survival. IEEE Access 11, 23335–23346 (2023).
Article Google Scholar
Wang, H., Xing, F., Su, H., Stromberg, A. & Yang, L. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinforma. 15, 310 (2014).
Article Google Scholar
Wang, X. et al. Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H&E images. Sci. Rep. 7, 13543 (2017).
Article PubMed PubMed Central Google Scholar
Alsubaie, N. M., Snead, D. & Rajpoot, N. M. Tumour nuclear morphometrics predict survival in lung adenocarcinoma. IEEE Access 9, 12322–12331 (2021).
Article Google Scholar
Kludt, C. et al. Next-generation lung cancer pathology: development and validation of diagnostic and prognostic algorithms. Cell Rep. Med. 5, 101697 (2024).
Article CAS PubMed PubMed Central Google Scholar
Zhao, L. et al. CoADS: cross attention based dual-space graph network for survival prediction of lung cancer using whole slide images. Comput. Methods Prog. Biomed. 236, 107559 (2023).
Article Google Scholar
Diao, S. et al. Automated cellular-level dual global fusion of whole-slide imaging for lung adenocarcinoma prognosis. Cancers 15, 4824 (2023).
Article PubMed PubMed Central Google Scholar
Shim, W. S. et al. DeepRePath: identifying the prognostic features of early-stage lung adenocarcinoma using multi-scale pathology images and deep convolutional neural networks. Cancers 13, 3308 (2021).
Zheng, Y. et al. Graph attention-based fusion of pathology images and gene expression for prediction of cancer survival. IEEE Trans. Med. Imaging 43, 3085–3097 (2024).
Article PubMed PubMed Central Google Scholar
Hattori, H., Sakashita, S., Tsuboi, M., Ishii, G. & Tanaka, T. Tumor-identification method for predicting recurrence of early-stage lung adenocarcinoma using digital pathology images by machine learning. J. Pathol. Inform. 14, 100175 (2023).
Article PubMed Google Scholar
Kim, P. J. et al. A new model using deep learning to predict recurrence after surgical resection of lung adenocarcinoma. Sci. Rep. 14, 6366 (2024).
Article CAS PubMed PubMed Central Google Scholar
Xu, G. et al. CAMEL2: enhancing weakly supervised learning for histopathology images by incorporating the significance ratio. Adv. Intell. Syst. 6, 12 (2024).
Article Google Scholar
Gridelli, C. et al. Non-small-cell lung cancer. Nat. Rev. Dis. Prim. 1, 15009 (2015).
Article PubMed Google Scholar
Chen, Z., Fillmore, C. M., Hammerman, P. S., Kim, C. F. & Wong, K. K. Non-small-cell lung cancers: a heterogeneous set of diseases. Nat. Rev. Cancer 14, 535–546 (2014).
Article CAS PubMed PubMed Central Google Scholar
Quail, D. F. & Joyce, J. A. Microenvironmental regulation of tumor progression and metastasis. Nat. Med. 19, 1423–1437 (2013).
Article CAS PubMed PubMed Central Google Scholar
Ramesh, S. et al. Artificial intelligence-based morphologic classification and molecular characterization of neuroblastic tumors from digital histopathology. NPJ Precision Oncol. 8, 255 (2024).
Article CAS Google Scholar
Liang, J. et al. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat. Mach. Intell. 5, 408–420 (2023).
Article Google Scholar
Arya, S. S., Dias, S. B., Jelinek, H. F., Hadjileontiadis, L. J. & Pappa, A. M. The convergence of traditional and digital biomarkers through AI-assisted biosensing: a new era in translational diagnostics? Biosens. Bioelectron. 235, 115387 (2023).
Article CAS PubMed Google Scholar
Montag, C., Elhai, J. D. & Dagum, P. On blurry boundaries when defining digital biomarkers: how much biology needs to be in a digital biomarker?. Front. Psychiatry 12, 740292 (2021).
Article PubMed PubMed Central Google Scholar
Song, Y., Kang, K., Kim, I. & Kim, T. J. Pathological digital biomarkers: validation and application. Appl. Sci.12, 13 (2022).
Article Google Scholar
Gürel, D. et al. The prognostic value of morphologic findings for lung squamous cell carcinoma patients. Pathol. Res. Pract. 212, 1–9 (2016).
Article PubMed Google Scholar
Altorki, N. K. et al. The lung microenvironment: an important regulator of tumour growth and metastasis. Nat. Rev. Cancer 19, 9–31 (2019).
Article CAS PubMed PubMed Central Google Scholar
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Article CAS PubMed Google Scholar
Chhabra, Y. & Weeraratna, A. T. Fibroblasts in cancer: unity in heterogeneity. Cell 186, 1580–1609 (2023).
Article CAS PubMed PubMed Central Google Scholar
Grout, J. A. et al. Spatial positioning and matrix programs of cancer-associated fibroblasts promote T-cell exclusion in human lung tumors. Cancer Discov. 12, 2606–2625 (2022).
Article CAS PubMed PubMed Central Google Scholar
Paolillo, M. & Schinelli, S. Extracellular matrix alterations in metastatic processes. Int. J. Mol. Sci. 20, 4947 (2019).
Article CAS PubMed PubMed Central Google Scholar
Gocheva, V. et al. Quantitative proteomics identify Tenascin-C as a promoter of lung cancer progression and contributor to a signature prognostic of patient survival. Proc. Natl Acad. Sci. USA 114, E5625–e5634 (2017).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by the Beijing AI+Health Cultivation Innovation Project (No. Z241100007724001), the Beijing Municipal Public Welfare Development and Reform Pilot Project for Medical Research Institutes (No. JYY2023-15), the Beijing Nova Program, and the 2023 Science and Technology Projects of Qinghai Province, China (Basic Research Program, No. 2023-ZJ-732).

Author information

These authors contributed equally: Yongmeng Li, Xiaodong Chai.
These authors jointly supervised this work: Shuhao Wang, Nanying Che.

Authors and Affiliations

Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China
Yongmeng Li, Xiaodong Chai, Haifeng Lin & Nanying Che
Department of Pathology, Beijing Chest Hospital, Capital Medical University, Beijing, China
Yongmeng Li, Xiaodong Chai, Haifeng Lin & Nanying Che
Department of Physics, Capital Normal University, Beijing, China
Moxuan Yang
Thorough Lab, Thorough Future, Beijing, China
Jiahang Xiong, Wei Wang & Shuhao Wang
College of Light Industry Science and Engineering, Tianjin University of Science and Technology, Tianjin, China
Junyang Zeng
School of Technology, Beijing Forestry University, Beijing, China
Yun Chen
Multiscale Research Institute of Complex Systems, Fudan University, Shanghai, China
Gang Xu

Contributions

N.C. and S.W. conceived and designed the study. Y.L. collected clinical data, conducted the analyses and wrote the manuscript. X.C. participated in the interpretation of digital biomarkers and wrote the manuscript. M.Y., J.X., J.Z. and Y.C. participated in the establishment of the model. G.X. and W.W. provided assistance in establishing the model. H.L. provided assistance in the interpretation of digital biomarkers.

Corresponding authors

Correspondence to Shuhao Wang or Nanying Che.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Y., Chai, X., Yang, M. et al. Accurate prediction of disease-free and overall survival in non-small cell lung cancer using patient-level multimodal weakly supervised learning. npj Precis. Onc. 9, 197 (2025). https://doi.org/10.1038/s41698-025-00981-y

Received: 29 October 2024
Accepted: 28 May 2025
Published: 19 June 2025
Version of record: 19 June 2025
DOI: https://doi.org/10.1038/s41698-025-00981-y
