Introduction

Estimating prognosis is essential for adjuvant treatment decision making and follow-up strategy selection for lung adenocarcinoma patients after surgery1. Some pathological factors, such as visceral pleural invasion (VPI)2, spread through air spaces (STAS)3, and lymphovascular invasion (LVI)4, have been reported to be associated with patient outcomes. Compared with these factors, the grading system proposed by the International Association for the Study of Lung Cancer (IASLC) has been shown to be more efficient and robust for patient stratification5,6. However, although these factors may affect the prognosis of patients with stage I tumors, their effect on those with stage II or III tumors requires further investigation. Furthermore, the TNM staging system can be used to categorize patients into several groups with distinct survival outcomes7. Nevertheless, patient outcomes often vary even within a specific TNM stage. Overall, there remains room for improvement in precise risk stratification to improve patient management and disease outcomes.

Recent advances in artificial intelligence (AI) have enabled the use of quantitative data derived from whole slide images (WSIs) to predict patient outcomes directly8,9,10. Histopathology images contain prognostically important information such as tumor-infiltrating lymphocytes11,12, and proportions of tissue types13, each of which can be quantified by specific digital pathology approaches. The hidden information in routine haematoxylin and eosin (H&E)-stained images may help to stratify prognosis from a different dimension, and may serve as a complementary biomarker to the current clinical variables.

In the present study, we developed a WSI-based deep learning model to predict the recurrence risk of resected lung adenocarcinoma without any annotations from pathologists. We then assessed the ability of our model to stratify patients according to prognosis and investigated whether it could help refine the populations of patients likely to benefit from adjuvant chemotherapy. Finally, we obtained WSI heatmaps to explore the pathological features that may contribute to the predictive value of the model, and we explored the model's underlying biological basis.

Results

Patient characteristics

With the constructed model, a WSI-based score was calculated for each patient in the two validation sets (Fig. 1). Patients in both validation sets were divided into low- and high-risk groups according to the median WSI-based score. In validation cohort 1, more patients in the high-risk group had STAS (36.3% vs. 29.3%, p = 0.048), IASLC grade III tumors (61.5% vs. 45.5%, p < 0.001), and adjuvant chemotherapy (39.7% vs. 30.4%, p = 0.010). In addition, more patients in the high-risk group had VPI (26.6% vs. 21.3%, p = 0.098), LVI (15.9% vs. 12.2%, p = 0.163), and TNM stage III tumors (17.3% vs. 12.8%, p = 0.234), although these differences were not statistically significant (Table 1). In validation cohort 2, the proportions of patients with VPI (35.6% vs. 23.9%, p = 0.089), STAS (39.1% vs. 34.1%, p = 0.493), LVI (21.8% vs. 12.5%, p = 0.101), IASLC grade III tumors (57.5% vs. 53.4%, p = 0.826), TNM stage III tumors (20.7% vs. 14.8%, p = 0.382), and adjuvant chemotherapy (34.5% vs. 29.5%, p = 0.484) were higher in the high-risk group, but none of the differences were statistically significant (Table 1).

Fig. 1: Schematic illustration of the overall study design.
a, b Model construction; c efficiency validation; d heatmap visualization. WSI whole slide image, ROC receiver operating characteristic, ACT adjuvant chemotherapy, TLS tertiary lymphoid structure.

Table 1 Baseline characteristics of patients in validation cohort 1 and validation cohort 2

Survival analysis of DFS

In validation cohort 1, the model successfully stratified patients into high- and low-risk groups according to prognosis in the entire cohort (hazard ratio [HR] 1.95, 95% confidence interval [CI] 1.46–2.62, p < 0.001) and in most of the prespecified subgroups (Fig. 2a). Similar results were found in the analysis of validation cohort 2 (Fig. 2b).

Fig. 2: Analysis of DFS in specified subgroups.
a Validation cohort 1; b validation cohort 2. EGFR epidermal growth factor receptor, KRAS Kirsten rat sarcoma viral oncogene, VPI visceral pleural invasion, STAS spread through air space, LVI lymph-vascular invasion, IASLC International Association for the Study of Lung Cancer, HR hazard ratio, CI confidence interval, DFS disease-free survival.

The incremental prognostic value of the WSI-based score

Univariable analysis of validation cohort 1 revealed that patient outcomes were strongly associated with pathological factors, including VPI (p < 0.001), STAS (p < 0.001), LVI status (p < 0.001), IASLC grade (p < 0.001), and TNM stage (p < 0.001), as well as with the WSI-based score (p < 0.001) (Table 2). Before the WSI-based score was incorporated into the multivariable Cox model, the IASLC grade (grade II vs. grade I, HR 13.17, 95% CI 3.16–54.96, p < 0.001; grade III vs. grade I, HR 26.99, 95% CI 6.50–112.13, p < 0.001) and TNM stage (stage II vs. stage I, HR 1.59, 95% CI 0.98–2.57, p = 0.061; stage III vs. stage I, HR 4.04, 95% CI 2.83–5.76, p < 0.001) were identified as independent predictors of DFS. After the WSI-based score was incorporated, the IASLC grade (grade II vs. grade I, HR 13.54, 95% CI 3.24–56.52, p < 0.001; grade III vs. grade I, HR 25.69, 95% CI 6.19–106.63, p < 0.001), TNM stage (stage II vs. stage I, HR 1.67, 95% CI 1.03–2.71, p = 0.037; stage III vs. stage I, HR 4.22, 95% CI 2.95–6.05, p < 0.001), and the constructed model (HR 1.82, 95% CI 1.35–2.44, p < 0.001) were all independent predictors of DFS (Table 2). Similar results were found in the analysis of validation cohort 2 (Table 3).

Table 2 Multivariable Cox analysis of disease-free survival in validation cohort 1
Table 3 Multivariable Cox analysis of disease-free survival in validation cohort 2

We then used the C-index to compare the performance of each variable for predicting DFS. For variables significantly associated with DFS according to univariable Cox regression analysis, the WSI-based score did not outperform several pathological factors in either validation cohort 1 (C-index [WSI-based score] = 0.586; C-index [IASLC grade] = 0.674; C-index [TNM stage] = 0.665) or validation cohort 2 (C-index [WSI-based score] = 0.643; C-index [VPI] = 0.654; C-index [STAS] = 0.651; C-index [IASLC grade] = 0.718; C-index [TNM stage] = 0.699) (Supplementary Table 1; Supplementary Fig. 1). Regarding the 3-year and 5-year AUCs for predicting DFS, our model did not show an advantage over some other pathological factors (Supplementary Table 1; Supplementary Fig. 1).

For variables independently predicting DFS in the multivariable Cox regression analysis, we compared the predictive performance of their combinations. The results showed that the combination of IASLC grade, TNM stage and WSI-based score (C-index [WSI-based score & IASLC grade & TNM stage] = 0.753) outperformed any combination of two variables in validation cohort 1 (C-index [IASLC grade & TNM stage] = 0.737, p < 0.001; C-index [WSI-based score & IASLC grade] = 0.708, p < 0.001; C-index [WSI-based score & TNM stage] = 0.706, p < 0.001; Fig. 3a; Table 4). In validation cohort 2, the combination of three variables (C-index [WSI-based score & IASLC grade & TNM stage] = 0.811) also outperformed any combination of two variables (C-index [IASLC grade & TNM stage] = 0.777, p < 0.001; C-index [WSI-based score & IASLC grade] = 0.786, p < 0.001; C-index [WSI-based score & TNM stage] = 0.763, p < 0.001; Fig. 3d; Table 4). The combined model also showed advantages with respect to the 3-year and 5-year AUCs for predicting DFS (Fig. 3b, c, e, f; Table 4). These results collectively demonstrated the added value of the constructed model to the existing clinical models.

Fig. 3: C-index and time-dependent ROC curves at 3 and 5 years for the combination of variables independently predicting DFS according to multivariable Cox regression analysis.

a–c Validation cohort 1; d–f validation cohort 2. ROC receiver operating characteristic, AUC area under the curve, DFS disease-free survival.

Table 4 Performance of the constructed model combined with existing pathological variables according to multivariable analysis

The WSI-based score refines patient selection for adjuvant chemotherapy

We then investigated whether our model could help refine the subgroups of patients most likely to benefit from adjuvant chemotherapy. In validation cohort 1, adjuvant chemotherapy did not confer a significant benefit in patients with stage IB (p = 0.551), stage II (p = 0.116), or stage III (p = 0.068) disease (Supplementary Fig. 2). Further analysis incorporating the constructed model revealed no survival benefit in the low-risk groups of patients with stage IB (p = 0.974, Fig. 4a), stage II (p = 0.800, Fig. 4b), or stage III (p = 0.464, Fig. 4c) disease. In the high-risk groups, a survival advantage was observed for patients with stage III disease (p = 0.030, Fig. 4f) and potentially for patients with stage II disease (p = 0.077, Fig. 4e), but no survival benefit was observed for patients with stage IB disease (p = 0.367, Fig. 4d). Similar results were obtained for validation cohort 2 (Supplementary Fig. 2; Fig. 4g–l).

Fig. 4: The combination of TNM stage and the constructed model helps to identify patient subgroups who may particularly benefit from postoperative chemotherapy.

a–f Validation cohort 1; g–l validation cohort 2. ACT adjuvant chemotherapy.

Interpretation of the deep learning model

To better understand the pathological basis of the predictions, we used heatmap visualization to explore the pathomorphological features captured by our model. As illustrated in Fig. 1d, micropapillary components were identified in the ‘high-risk’ region of a patient with a stage IA tumor. Moreover, acinar patterns with tertiary lymphoid structures were observed in the ‘low-risk’ region of a patient with a stage III tumor. These findings reflect the substantial associations of the constructed model with the current pathological factors and its ability to serve as a complementary biomarker.

Patients in different risk groups showed significant heterogeneity in gene expression patterns (Fig. 5a). In GO analyses (Fig. 5b), the model-defined high-risk group was associated with pathways representing tumor metabolism and proliferation, such as cellular metabolic process, protein metabolic process, cellular component organization, and cellular component organization or biogenesis. Furthermore, as shown in Fig. 5c, tumors in the two groups were characterized by distinct immune infiltration patterns. According to the ssGSEA results (Fig. 5d), patients in the model-defined high-risk group showed significantly lower infiltration of activated CD4 T cells, activated dendritic cells, central memory CD4 T cells, central memory CD8 T cells, effector memory CD4 T cells, immature B cells, immature dendritic cells, macrophages, MDSCs, natural killer T cells, and T follicular helper cells.

Fig. 5: Biological basis of the deep learning model.

a Radar charts illustrating the top 30 differential genes between low-risk and high-risk patients; b dot plots showing the top 20 upregulated molecular pathways in high-risk patients; c heat map illustrating immune infiltration patterns in low-risk and high-risk patients; d boxplots comparing proportions of infiltrated immune cells between low-risk and high-risk patients. GO gene ontology, FDR false discovery rate, MDSC myeloid-derived suppressor cells.

Discussion

Recently, the development of digital pathology has provided important information for precise risk stratification and treatment planning. However, predicting prognosis (time-to-event) is considered a more sophisticated problem than a conventional regression task, mainly because some patients have not experienced the outcome of interest (death, recurrence, etc.). For this reason, the number of studies using WSIs for predicting prognosis is relatively small9,10,14,15,16,17. From a technical perspective, these studies have mostly used convolutional neural networks9,10,15,16,17 to automatically extract features. However, WSIs have many pixels that carry uninformative content, which seriously affects the accuracy and efficiency of convolutional neural networks. Lee et al.14 used an aggregation algorithm to obtain a superpatch graph before applying a graph neural network (GNN), which greatly affects the working efficiency of the prediction system. Further, Mobadersany et al.16 used regions of interest from WSIs as input, which increased the workload of clinicians. Compared with these methods, the proposed model converts WSIs into graph-based data and introduces an attention mechanism to assign different weights to different nodes, which reduces the computational complexity and improves the prediction accuracy. Moreover, the proposed model was established without the assistance of pathologists, which to a certain extent overcomes the reliance of deep learning algorithms on manual annotations and pathologist expertise18,19 and may help to improve the generalization of the model.

On the one hand, although the predictive performance for prognosis of our model did not outperform some of the pathological factors, it remained statistically significant in the multivariable analysis when it was combined with TNM stage and IASLC grade, indicating the added value of the model and its ability to serve as a complementary biomarker for survival stratification. On the other hand, the presence of high-risk pathological factors, including VPI, STAS, LVI, and high-grade tumors, was more common in the model-defined high-risk groups, suggesting substantial associations between the constructed model and the current well-defined pathological factors.

A large meta-analysis revealed that adjuvant chemotherapy could yield an overall survival (OS) benefit of 5% at 5 years; however, statistical significance was not reached (HR 0.87, p = 0.08)20. Following this study, randomized trials evaluating the efficacy of adjuvant chemotherapy were conducted for a decade. Some of the large trials successfully demonstrated an OS benefit21,22, while others did not23. Afterwards, the Lung Adjuvant Cisplatin Evaluation (LACE) study further confirmed the effect of adjuvant chemotherapy on both OS and DFS24. Nevertheless, the overall benefit from adjuvant chemotherapy is limited: stage II-III patients may benefit most, stage IB patients may show only a trend toward benefit, and stage IA patients may experience a deleterious effect. It is therefore necessary to identify subgroups of patients who may particularly benefit from adjuvant chemotherapy. In the current study, no significant survival benefit from adjuvant chemotherapy was observed across the overall population of patients with stage IB, stage II, or stage III disease. However, combining TNM stage with our constructed model revealed a survival advantage for high-risk stage III patients and a potential survival advantage for high-risk stage II patients (statistical significance was not reached for this group, perhaps because of the relatively small sample size). On the basis of our results, we advocate adjuvant chemotherapy for high-risk stage II-III patients and the avoidance of unnecessary chemotherapy for other patients.

The results demonstrated that our established model exhibits significant biological relevance. The model outputs are likely associated with genes and molecular pathways that promote tumor proliferation, and high-risk patients show significantly lower levels of immune cell infiltration. This partly explains the model’s predictive capability for prognosis and adjuvant chemotherapy decision-making in lung cancer patients.

Despite the promising results obtained in the present study, several limitations should be acknowledged. First, the retrospective nature of the study may limit the statistical power and hinder the generalization of the results to other centers and regions, especially the results regarding adjuvant therapy; prospective validation with a larger sample size is warranted. Second, although our model could be used as a complement to the existing prognostic models of lung adenocarcinoma, there remains much room for improvement in its ability to predict prognosis. Multiomics data integrating radiology, pathology, molecular, and other modalities are needed to establish more efficient and robust models in the future.

In summary, our constructed model can predict the recurrence risk of resected lung adenocarcinoma without the need for annotations from pathologists and can complement the current prognostic models. Moreover, the model-defined high- and low-risk groups may help to guide adjuvant therapy strategies in clinical practice.

Methods

Participants and study design

This multicenter study was approved by the Ethics Committee and Institutional Review Board of Shanghai Pulmonary Hospital (No. K23-292), the First Affiliated Hospital of Nanchang University, Ningbo Hwamei Hospital, and the First Affiliated Hospital of Lanzhou University, and followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement25 (Supplementary Note 1). Informed consent was waived because this was a retrospective study.

We used 3712 H&E-stained, formalin-fixed and paraffin-embedded (FFPE) tumor tissue sections from 1705 patients with surgically resected lung adenocarcinoma. Patients with stage I-III disease and available clinicopathological data and follow-up information were included. Patients with stage IV disease, a history of neoadjuvant therapy, or no available follow-up information or tumor tissue sections were excluded. For each patient in the training and validation sets, digital WSIs were scanned from the corresponding H&E-stained tumor tissue sections.

To train the model, we used a dataset of 1889 sections from 825 patients who underwent surgery at Shanghai Pulmonary Hospital between January 2012 and December 2012 (Supplementary Table 2). We applied deep learning techniques to develop a histopathological model, the patient-level WSI-based score, to predict the disease-free survival (DFS) of patients with resected lung adenocarcinoma. We then evaluated the capability of our model for survival stratification and investigated whether it could help refine the populations of patients likely to benefit from adjuvant chemotherapy in two separate validation sets. Finally, WSI heatmaps were obtained to explore the pathological features underlying the predictions (see Fig. 1 for the study design), and the underlying biological basis of the model was explored to enhance its interpretability. Validation cohort 1 included 1516 sections from 705 patients who underwent surgery at Shanghai Pulmonary Hospital between January 2015 and June 2015. Validation cohort 2 included 307 sections from 175 patients who underwent surgery between January 2015 and December 2015 at three departments of thoracic surgery: the First Affiliated Hospital of Nanchang University, Ningbo Hwamei Hospital, and the First Affiliated Hospital of Lanzhou University.

Clinical data, including age, sex, smoking history, tumor location, surgery type, and TNM stage, were available for both the training and validation sets. To compare the performance of the constructed model in prognosis prediction with that of the current clinical models, the VPI, STAS, and LVI status and IASLC tumor grade26 were re-evaluated by two of our experienced pathologists (C.W., L.H.) for patients in the validation sets.

Four 21-day cycles of intravenous chemotherapy with cisplatin 75 mg/m2 or carboplatin AUC 5 on day 1 plus pemetrexed 500 mg/m2 on day 1 were administered after thorough evaluation of the patients’ conditions and discussion among a group of surgeons and oncologists at our centers.

WSI-based score for recurrence risk prediction

The patient-level survival prediction model in this paper is a multiple-classification model based on variable-length input. Because the number of WSIs obtained for each patient varied, and the effective area of different WSIs also varied considerably, we needed a model that can handle inputs of variable length. In addition, because WSIs contain hundreds of millions of pixels, efficient compression of the input data was also considered crucial. Thus, this paper presents a graph-attention-based multiple-instance neural network (GAMINN) for processing a variable number of WSIs for survival prediction. The analysis system comprises the following modules, which are detailed below.

Data preprocessing, tissue segmentation and feature compression

Because each WSI contains a large number of pixels, it is difficult for a deep learning model to process the WSI directly and obtain good prediction results. Moreover, each WSI contains a large amount of uninformative content, which not only degrades subsequent analysis but also consumes computing resources. Therefore, in this paper, the CLAM model27 was used to classify the tissue regions of pathological images; it can efficiently and accurately identify regions with high diagnostic value at a low computational cost. Figure 1a shows the tissue profile extracted by the CLAM model, which was used to segment the tissue region and eliminate interference from uninformative regions. Thereafter, we partitioned the extracted regions into patches of 256 × 256 pixels. We then used the ResNet50 model pretrained on the ImageNet2012 dataset to extract the morphological features of each patch, yielding a 1024-dimensional feature vector per patch.
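The tiling step can be illustrated with a minimal sketch. It assumes that a binary tissue mask is already available (a nested list stands in here for the segmentation output) and keeps a patch only when its tissue fraction reaches a hypothetical threshold `min_tissue`. The function name, the threshold, and the toy mask are illustrative assumptions and are not taken from the study's code.

```python
def tile_coordinates(mask, patch=256, min_tissue=0.5):
    """Return top-left (x, y) coordinates of patches that contain enough tissue.

    mask: 2D list of 0/1 values (1 = tissue), indexed as mask[y][x].
    min_tissue: hypothetical minimum tissue fraction for keeping a patch.
    """
    height, width = len(mask), len(mask[0])
    coords = []
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            # Count tissue pixels inside the current patch window.
            tissue = sum(sum(row[x:x + patch]) for row in mask[y:y + patch])
            if tissue / (patch * patch) >= min_tissue:
                coords.append((x, y))
    return coords


# Toy example: an 8 x 8 mask tiled with 4 x 4 patches; only the left half is tissue.
toy_mask = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
toy_coords = tile_coordinates(toy_mask, patch=4, min_tissue=0.5)
```

In this toy case only the two patches in the left column are retained; each retained coordinate would then be passed to the feature extractor.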

WSI graph construction

For each patch, we saved its position coordinates in the pathological image from the tissue segmentation and built an adjacency matrix Aj using a fast approximate k-nearest-neighbor (KNN) search (k = 8). With k = 8, each patch is connected to its eight surrounding patches, so the adjacency matrix Aj models a 3 × 3 image receptive field in the pathological image. Finally, we combined the feature matrix Xj and adjacency matrix Aj to construct the subgraph Gj = (Xj, Aj) for the jth WSI, and all WSIs for the same patient were collected as G = {Gj}, j = 1, …, M, where M denotes the number of WSIs for that patient. Figure 1a shows the process of WSI graph construction.
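As a rough illustration of the graph construction, the sketch below builds an adjacency matrix from patch coordinates. It uses an exact brute-force nearest-neighbor search rather than the fast approximate KNN described above, and it returns a directed adjacency matrix; both choices, along with the function name, are assumptions made for brevity.

```python
def knn_adjacency(coords, k=8):
    """Return a 0/1 adjacency matrix where adj[i][j] = 1 if j is among the k nearest patches to i."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Squared Euclidean distance from patch i to every other patch.
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = 1
    return adj


# A 5 x 5 grid of 256-pixel patches, indexed row by row (index = 5 * row + column).
grid = [(x * 256, y * 256) for y in range(5) for x in range(5)]
adjacency = knn_adjacency(grid, k=8)
centre = 12  # patch at grid position (2, 2)
centre_neighbours = [j for j, flag in enumerate(adjacency[centre]) if flag]
```

For the central patch of the toy grid, the eight selected neighbors are exactly the surrounding patches of its 3 × 3 window, which is the receptive field the text refers to.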

Feature process module

We combined a graph convolutional neural network and a self-attention mechanism to process the feature input of the graph structure, effectively grasp the implicit relationships between slices, and obtain an effective representation for prognostic risk assessment. Moreover, according to the different numbers of patient-level input WSIs, a multi-instance pooling method was designed to effectively obtain the features of different WSIs of the same patient, thereby improving the final prognosis prediction performance.

Graph-attention-based network

To better handle graph-structured input data, we used a graph neural network to automatically extract features. Similar to convolutional neural networks, graph convolutional neural networks (GCNs) have powerful feature learning capabilities, in which the convolution at a given node can be viewed as a weighted sum over the neighbors of that node. However, the GCN treats all neighboring nodes equally during convolution and cannot assign different weights according to the importance of the nodes. In practice, for a graph with N nodes, different adjacent nodes should contribute differently to the target node. To better distribute weights among different nodes, we used an attention mechanism that normalizes the correlation calculated between the target node and all its neighbors.

$${a}_{i,j}=\frac{\exp (\mathrm{LeakyReLU}({a}^{T}[W{X}_{i}\parallel W{X}_{j}]))}{{\sum }_{k\in {N}_{i}}\exp (\mathrm{LeakyReLU}({a}^{T}[W{X}_{i}\parallel W{X}_{k}]))}$$
(1)

Here, || is the concatenation operation, W is the linear transformation matrix, a is a learnable weight vector, Ni is the set of neighbors of node i, and ai,j is the attention coefficient describing the degree of connection of node j to node i. By combining the GCN with the attention mechanism, we constructed a graph-attention-based (GAT) layer for subsequent analysis.
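Equation (1) can be made concrete with a small sketch that computes the attention coefficients of one node over its neighbors. The LeakyReLU negative slope of 0.2, the function names, and the toy weights are assumptions chosen for illustration; the study's actual layer was implemented in PyTorch and may differ in detail.

```python
import math


def leaky_relu(x, slope=0.2):
    # The negative slope of 0.2 is an assumed value.
    return x if x > 0 else slope * x


def attention_coefficients(features, neighbours, W, a, i):
    """Softmax-normalised attention of node i over its neighbours, following Eq. (1)."""
    def project(x):
        # W is given as a list of rows; this returns the product W x.
        return [sum(w * v for w, v in zip(row, x)) for row in W]

    h_i = project(features[i])
    scores = []
    for j in neighbours[i]:
        concat = h_i + project(features[j])  # [W X_i || W X_j]
        scores.append(leaky_relu(sum(p * q for p, q in zip(a, concat))))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbours[i], exps)}


# Toy example: identity W, and a weight vector that reads the first feature of the neighbour.
toy_features = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
toy_neighbours = {0: [1, 2]}
toy_W = [[1.0, 0.0], [0.0, 1.0]]
toy_a = [0.0, 0.0, 1.0, 0.0]
alpha = attention_coefficients(toy_features, toy_neighbours, toy_W, toy_a, 0)
```

The coefficients sum to one over the neighborhood, and the neighbor with the larger raw score receives the larger weight, which is the behavior that distinguishes the GAT layer from a plain GCN.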

Learning global features

We built an end-to-end differentiable function FGAT, using a GAT layer to mine the features of each node's neighbors in the graph. To further learn the global morphological features of pathological images and avoid vanishing gradients in the network, we drew on the idea of residual learning and used FGAT (l) as a residual mapping, in which the output of FGAT (l) is added to its input, allowing multiple FGAT (l) layers to be stacked.

$${G}^{l+1}={F}_{GAT}^{l}({G}^{l};{\Phi }^{l},{\rho }^{l},{\zeta }^{l})+{G}^{l}$$
(2)

where Φl is a message construction function that calculates the association characteristics between node u and its neighbor node v, ρl is an aggregation function that aggregates all the features passed to v into the message mlv, and ζl is an update function that combines the existing node features at node v with the aggregated message to give the updated features Xl+1v.

$${m}_{v}^{l}={\rho }^{l}({\Phi }^{l}({X}_{v}^{l},{X}_{u}^{l}))$$
(3)
$${X}_{v}^{l+1}={\zeta }^{l}({X}_{v}^{l},{m}_{v}^{l})$$
(4)

We implemented the main model structure of GAMINN using a 3-layer residual GAT model. In addition, we output the last GAT layer to the fully connected layer and aggregated the different WSI features in the same patient to achieve better patient-level feature expression (Fig. 1b).
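The residual stacking in Eq. (2) and the patient-level aggregation can be sketched as follows. For brevity, the sketch replaces the learned GAT layer with a plain neighbor-mean aggregation and uses mean pooling both within and across WSIs; these stand-ins, the omission of the fully connected layer, and all function names are assumptions, because the text does not specify the pooling operator.

```python
def aggregate_layer(features, neighbours):
    """Stand-in for F_GAT: each node receives the mean of its neighbours' features."""
    out = []
    for v, x_v in enumerate(features):
        nbrs = neighbours[v]
        if not nbrs:
            out.append([0.0] * len(x_v))  # isolated node: empty message
            continue
        out.append([sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(len(x_v))])
    return out


def residual_stack(features, neighbours, num_layers=3):
    """Apply Eq. (2) num_layers times: G^{l+1} = F(G^l) + G^l."""
    for _ in range(num_layers):
        update = aggregate_layer(features, neighbours)
        features = [[x + m for x, m in zip(x_v, m_v)] for x_v, m_v in zip(features, update)]
    return features


def patient_embedding(wsi_graphs, num_layers=3):
    """Mean-pool node features within each WSI, then average over the patient's WSIs."""
    pooled = []
    for features, neighbours in wsi_graphs:
        nodes = residual_stack(features, neighbours, num_layers)
        dim = len(nodes[0])
        pooled.append([sum(n[d] for n in nodes) / len(nodes) for d in range(dim)])
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(len(pooled[0]))]


# Toy patient with two WSIs of different sizes (two nodes and one node).
wsi_a = ([[1.0], [3.0]], [[1], [0]])
wsi_b = ([[2.0]], [[]])
embedding = patient_embedding([wsi_a, wsi_b])
```

Because pooling is applied per WSI before averaging, the same function accepts any number of WSIs per patient, which is the property required for variable-length input.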

Details on network training

We used an NVIDIA GeForce RTX 3070 GPU with 16 GB of memory to train the model, and the PyTorch library (version 1.12.1) for training and evaluation. The Adam optimizer was selected as the model optimizer, with an initial weight of 0.0002, and each batch contained multiple pathological images from patients. The model was trained for 100 epochs using the Cox likelihood function as the loss function, which is given as follows:

$$Loss=-{\sum }_{i}\left({h}_{\theta }({x}_{i})-\log \sum _{j\in R({T}_{i})}{e}^{{h}_{\theta }({x}_{j})}\right)$$
(5)

Here, xi represents the ith case, hθ(·) is the risk score output by the proposed model, and R(Ti) is the risk set at time Ti, that is, the patients whose survival time is at least as long as that of the ith patient.
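A minimal sketch of the loss in Eq. (5) is given below. It takes the risk set as the patients still under observation at Ti (time at least Ti), which is the standard convention for the Cox partial likelihood, and it sums only over patients with an observed event, which is also standard but is not shown explicitly in Eq. (5). The function name and the toy inputs are assumptions; in training, the same quantity would be computed on PyTorch tensors so that gradients can flow.

```python
import math


def cox_loss(risk, time, event):
    """Negative log partial likelihood for risk scores h_theta(x_i)."""
    loss = 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients contribute only through the risk sets
        # Risk set: patients whose follow-up time is at least time[i].
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        peak = max(at_risk)  # log-sum-exp with the maximum subtracted for stability
        log_sum = peak + math.log(sum(math.exp(r - peak) for r in at_risk))
        loss -= risk[i] - log_sum
    return loss


# Toy check: assigning the higher risk to the earlier recurrence gives the lower loss.
loss_good = cox_loss([2.0, 0.0], [1, 2], [1, 1])
loss_bad = cox_loss([0.0, 2.0], [1, 2], [1, 1])
```

The loss is small when patients who recur earlier receive higher risk scores than those still at risk, so minimizing it orders patients by recurrence risk rather than predicting survival time directly.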

Biological basis of deep learning model

RNA sequencing was performed for 112 patients in validation cohort 1. Libraries were generated with the TruSeq RNA Access Library Prep Kit (Illumina), and paired-end sequencing was subsequently conducted on an Illumina Novaseq™ 6000. Among these patients, 63 were classified as low-risk and 49 as high-risk. We used the edgeR package to identify differentially expressed genes between the two groups, with thresholds of a log fold change greater than 1 and an adjusted p value less than 0.05. Subsequently, Gene Ontology (GO) pathway analysis was performed to determine pathways related to the model-defined risk groups. Additionally, single-sample gene set enrichment analysis (ssGSEA) was conducted with the GSVA package to quantify the relative infiltration of immune cell types in the tumor microenvironment.

Statistical analysis

DFS was defined as the time from surgery to the first confirmed lung cancer recurrence. The Kaplan–Meier method and log-rank test were used to compare survival outcomes between groups. Cox regression analysis was performed to identify independent predictors of survival. The predictive performance of each model was assessed via the Harrell concordance index (C-index), time-dependent receiver operating characteristic (ROC) curves, and area under the curve (AUC) values at 3 and 5 years. Missing data were handled by single imputation. Statistical analysis was performed with R software (version 4.3.1). A two-sided p value less than 0.05 was considered to indicate statistical significance.
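For readers unfamiliar with the Harrell C-index, the following sketch shows how it is computed from risk scores and censored follow-up data. It is written in Python purely for illustration (the study's analyses were run in R), counts ties in the risk score as half-concordant, and uses an assumed function name.

```python
def harrell_c_index(risk, time, event):
    """Fraction of comparable patient pairs in which the higher risk score recurs earlier."""
    concordant = 0.0
    comparable = 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event before time[j].
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: three patients with observed recurrences at times 1, 2 and 3.
c_perfect = harrell_c_index([3.0, 2.0, 1.0], [1, 2, 3], [1, 1, 1])
c_partial = harrell_c_index([3.0, 1.0, 2.0], [1, 2, 3], [1, 1, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the C-index values reported for the individual variables and their combinations.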