Introduction

Gastric neuroendocrine carcinoma (G-NEC) is a distinct and increasingly recognized pathological subtype of gastric malignancies1,2, with a rising incidence in recent years3,4,5. Characterized by aggressive biological behavior, high proliferative indices, and a tendency for early metastasis, G-NEC is associated with significantly poorer outcomes compared to the more common gastric adenocarcinoma (GC)6,7,8. Consequently, early and accurate identification of G-NEC is critical for selecting appropriate therapeutic strategies, which often involve more aggressive treatment regimens.

Despite its clinical importance, the histopathological distinction between G-NEC and GC remains challenging. Many G-NEC cases demonstrate morphological features that closely mimic those of poorly differentiated GC, making reliable diagnosis based on hematoxylin and eosin (H&E)-stained slides alone difficult9. As a result, current diagnostic protocols frequently rely on immunohistochemistry (IHC) biomarkers and molecular testing to confirm neuroendocrine differentiation10. Unfortunately, these confirmatory methods are both time-consuming and resource-intensive, limiting their availability and scalability, especially in low-resource or high-throughput settings. Thus, there is an urgent need for a rapid, accessible, and cost-effective diagnostic tool to support the early identification and stratification of G-NEC patients.

Recent advances in artificial intelligence (AI), particularly deep learning, have shown transformative potential in medical image analysis. Neural network models, especially convolutional neural networks (CNNs), have demonstrated exceptional performance in tasks such as tumor detection, classification, and molecular typing of various tumors11,12,13. By learning complex hierarchical patterns in large-scale medical data, AI-based models can augment pathologists’ diagnostic capabilities14,15, offering improvements in efficiency, accuracy, and reproducibility16,17. Notably, deep learning has already shown potential in clinical applications such as breast cancer18, lung cancer19, and colorectal cancer20. Our previous work further validated the feasibility of this approach by successfully predicting Epstein–Barr virus (EBV) status in gastric cancer from histopathological images, achieving an AUROC between 0.895 and 0.969 (ref. 21).

In current clinical practice, definitive diagnosis of G-NEC typically integrates morphological assessment with immunohistochemistry and, where necessary, molecular profiling22,23,24. However, these conventional approaches may overlook subtle but discriminative image-based features that are amenable to computational detection. In this study, we present G-NECNet, a deep learning-based diagnostic model designed for the image-based identification of G-NEC. Our aim was to develop a robust, scalable tool that can serve as a rapid, low-cost screening solution to aid pathologists in the differential diagnosis of gastric tumors. By improving diagnostic accuracy and reducing dependence on resource-intensive techniques, G-NECNet has the potential to streamline clinical workflows and enhance precision oncology efforts for patients with gastric cancer.

Methods

Study participants

To ensure comprehensive patient representation and generalizability of our findings, we employed three distinct pathological image datasets to develop an automated tumor detector for histological images of G-NEC. We included biopsy and surgical specimens from G-NEC patients and surgical specimens from GC patients collected during the retrospective study period from 2013 to 2023.

Slides from patients younger than 18 years of age and unsatisfactory slides were excluded. The datasets comprised the Internal cohort from the Sun Yat-sen University Cancer Center (SYSUCC), the External-cohort from the Sixth Affiliated Hospital of Sun Yat-sen University (SYSU-6th Hospital), and the Consultation-cohort from an external consultation database. Although SYSUCC and SYSU-6th Hospital are both affiliated with Sun Yat-sen University, they are independent institutions with separate patient registries, and no patient overlap occurred across any cohorts or subsets. The Internal cohort encompassed all available slides from G-NEC patients, alongside randomly selected slides from all gastric cancer (GC) patients within the same medical center. This cohort was randomly partitioned into two subsets, one allocated for model training (Training Cohort) and the other for model testing (Internal test cohort). The External-cohort and Consultation-cohort involved the random inclusion of G-NEC patients to ensure diverse representation. All histopathological diagnoses were independently reviewed and confirmed by two board-certified gastrointestinal pathologists according to the 2022 WHO Classification of Tumors of the Digestive System (WHO 5th Edition)25. To ensure diagnostic purity, we excluded the following categories: (1) gastric neuroendocrine tumors (G-NET, grades 1 and 2); (2) mixed neuroendocrine-non-neuroendocrine neoplasms (MiNEN); (3) other rare histologic subtypes such as lymphoma, gastrointestinal stromal tumor (GIST), and metastatic tumors; (4) non-neoplastic gastric lesions (e.g., chronic gastritis, ulcer, or hyperplasia). Ethical approval for this study was obtained from the Institutional Review Board of Sun Yat-sen University Cancer Center (project number: B2022-614). Informed consent was waived due to the retrospective nature of the study, with patients not directly recruited for the research.

Slide scanning and annotations

One or two representative H&E-stained tumor slides from each patient’s resection from the Internal cohort, the External-cohort, and the Consultation-cohort were scanned at ×40 magnification (0.25 μm/pixel) on an Aperio AT2 scanner (Leica Biosystems, Wetzlar, Germany) to generate one or two WSIs in SVS format. Blinded to the diagnosis and patients’ information, pathologists used the open-source software QuPath25 (version 0.2.3) to annotate the slides by drawing regions of interest around the tumor area. The annotations created by the senior pathologists served as the reference standard for tumor region identification.

Determination of G-NEC

According to the WHO Classification of Tumours of the Digestive System (5th Edition, 2022)25, the diagnosis of G-NEC requires histological evidence of poorly differentiated carcinoma, typically demonstrating solid, trabecular, or sheet-like growth patterns, extensive necrosis, and marked nuclear atypia with a high nuclear-to-cytoplasmic ratio. Neuroendocrine differentiation must be confirmed by positive immunohistochemical staining for at least one neuroendocrine marker, such as synaptophysin (Syn), chromogranin A (CgA), or insulinoma-associated protein 1 (INSM1). A high proliferative rate (defined as a mitotic count exceeding 20 mitoses per 10 high-power fields and/or a Ki-67 proliferation index >20%) is also required for diagnosis. Accordingly, the diagnosis of G-NEC for patients in the Internal cohort, External-cohort, and Consultation-cohort was based on tumor morphology, mitotic count, and Ki-67 proliferation index, supported by IHC for neuroendocrine markers such as Syn, CgA, and INSM1 (refs. 1, 10).

Whole slide images (WSIs) preprocessing

Before prediction, we preprocessed all WSIs uniformly to reduce the impact of image quality. First, each slide was analyzed at ×10 magnification, a lower magnification than the ×40 scan, to better identify G-NEC-related features. Then, the original red-green-blue (RGB) image of each slide was converted to grayscale, and a threshold was applied to the grayscale image to extract the tissue area. Tissue areas were also color-normalized to eliminate potential color variations due to differences in histological staining between medical centers. Each preprocessed slide was tiled into non-overlapping tiles of 512 × 512 pixels for the next step of the analysis.
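The tissue-extraction and tiling steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the luminance weights, the threshold value of 220, and the function names are assumptions, since the text does not state them, and color normalization is omitted.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_gray(rgb: Pixel) -> float:
    """Convert one RGB pixel to a grayscale intensity (ITU-R BT.601 weights, assumed)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def tissue_mask(image: List[List[Pixel]], threshold: float = 220.0) -> List[List[bool]]:
    """Mark pixels darker than `threshold` as tissue; bright background is excluded.

    The threshold of 220 is a hypothetical value chosen for illustration only.
    """
    return [[to_gray(px) < threshold for px in row] for row in image]


def tile_origins(width: int, height: int, tile: int = 512) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping tile x tile squares that fit fully in the image."""
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


# A 1024 x 1100 pixel region yields four complete 512 x 512 tiles.
origins = tile_origins(1024, 1100)
mask = tissue_mask([[(255, 255, 255), (120, 60, 130)]])
```

In practice a tile would be kept only if enough of its pixels fall inside the tissue mask; the cut-off for that fraction is likewise not given in the text.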

G-NECNet development

G-NECNet consists of two sequential components: a tumor detector and a G-NEC classifier (Fig. 1). To achieve full automation of G-NEC diagnostics, we first developed a tumor detector based primarily on the internal dataset, which was then used to automatically locate the tumor regions in each slide from the external validation dataset. We used the well-known convolutional neural network ResNet50 as the classifier backbone of the tumor detector. For training, tumor tissue came from 145 GC slides of the internal cohort in which pathologists had manually outlined the tumor, and normal tissue came from 145 randomly selected tumor-free gastric tissues. Each tile measured 512 × 512 pixels at ×10 magnification. We split the dataset into five folds and trained five individual tumor classifiers using a five-fold cross-validation strategy: each classifier was trained on four folds, and the remaining fold was used for internal validation. We used the stochastic gradient descent (SGD) optimizer with a batch size of 64 and weight decay of 0.0005 to train each classifier for up to 50 epochs. The learning rate started at 0.001 and followed a cosine annealing schedule. After training, for any input tile from a new slide in the external dataset, the probability outputs of the five classifiers were averaged to give the final output of the integrated tumor detector. An input tile with an average probability >0.5 was classified as ‘tumor’ and was then further analyzed by G-NECNet.
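As a rough illustration of two details in this paragraph, the sketch below computes a cosine-annealed learning rate and the ensemble decision for one tile. The minimum learning rate of 0 and the annealing period equal to the 50-epoch cap are assumptions, because the text gives only the starting rate and the schedule type; the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_annealing_lr(epoch: int, total_epochs: int = 50,
                        base_lr: float = 0.001, min_lr: float = 0.0) -> float:
    """Learning rate at `epoch` under a single cosine annealing cycle.

    min_lr = 0 and a period equal to total_epochs are assumed values.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def ensemble_probability(fold_probs: Sequence[float]) -> float:
    """Average the tumor probabilities produced by the five fold-specific classifiers."""
    return sum(fold_probs) / len(fold_probs)


def is_tumor(fold_probs: Sequence[float], cutoff: float = 0.5) -> bool:
    """A tile is called 'tumor' when the averaged probability exceeds the cutoff."""
    return ensemble_probability(fold_probs) > cutoff


# The rate starts at 0.001, is halved at the midpoint, and reaches the minimum at the end.
schedule = [cosine_annealing_lr(e) for e in (0, 25, 50)]
tile_call = is_tumor([0.9, 0.8, 0.4, 0.7, 0.6])  # mean 0.68
```

Note that the comparison is strict, so a tile with an averaged probability of exactly 0.5 is not passed on to the classifier.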

Fig. 1: The workflow of G-NECNet for predicting G-NEC with WSIs.

Each WSI was preprocessed and segmented into tiles of 512 × 512 pixels. These tiles were then input to the tumor detector. Only tiles identified as tumor regions were fed to G-NECNet to obtain tile-level probabilities for G-NEC predictions. Five well-trained individual classifiers were integrated to form G-NECNet. The average probability output of the five independent classifiers was used as the tile-level prediction of G-NEC. Tile-level probabilities were averaged to generate slide-level probabilities for G-NEC. WSI, whole slide image. Scale bars are indicated in the figures.

To predict whether a patient belongs to the G-NEC subgroup, we trained classifiers with a ResNet50 backbone on the internal dataset to develop G-NECNet. During training, the dataset was divided into five folds at the slide level, and five separate classifiers were trained using a five-fold cross-validation strategy. For each classifier, four folds were used for training and the remaining fold served as an internal validation set, giving about 61 G-NEC slides and 340 GC slides for training and about 16 G-NEC slides and 85 GC slides for internal validation. The tumor-region tiles selected by the tumor detector and their corresponding labels (1 for “G-NEC” and 0 for “GC”) were used as the inputs and the target outputs of G-NECNet, respectively. We used the stochastic gradient descent (SGD) optimizer with a batch size of 64 and weight decay of 0.0005 to train each classifier for no more than 150 epochs. The learning rate started at 0.001 and followed a cosine annealing schedule. Training was terminated when the classifier's performance on the internal validation set had not improved over five consecutive epochs. This process was repeated five times, once per classifier, each time using a different fold as the internal validation set. The five trained classifiers were then ensembled to form G-NECNet; that is, the average probability output of the five classifiers was used as the prediction of the ensemble model. This output represents the G-NEC prediction for a single tile, and the final G-NEC prediction for a slide was the average of the G-NECNet predictions over all tumor-region tiles in that slide.
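The training protocol above combines three simple mechanisms: a slide-level fold split, early stopping with a patience of five epochs, and averaging from classifiers to tiles to slides. The following sketch shows one way these could look, under stated assumptions: the split is a plain shuffle (the text does not say whether folds were stratified by class), the validation metric is any score where higher is better, and all function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def slide_level_folds(slide_ids: Sequence, k: int = 5, seed: int = 0) -> List[list]:
    """Shuffle slide identifiers and deal them into k folds, so that all tiles
    of a slide end up in the same fold (no tile-level leakage between folds)."""
    ids = list(slide_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def should_stop(val_scores: Sequence[float], patience: int = 5) -> bool:
    """True when none of the last `patience` epochs beat the best earlier score."""
    if len(val_scores) <= patience:
        return False
    best_before = max(val_scores[:-patience])
    return max(val_scores[-patience:]) <= best_before


def slide_probability(tile_probs_per_classifier: Sequence[Sequence[float]]) -> float:
    """tile_probs_per_classifier[c][t] is classifier c's G-NEC probability for tile t.

    Classifiers are averaged per tile first, then tiles are averaged per slide.
    """
    n_tiles = len(tile_probs_per_classifier[0])
    tile_level = [mean(clf[t] for clf in tile_probs_per_classifier)
                  for t in range(n_tiles)]
    return mean(tile_level)


folds = slide_level_folds(range(10))
stalled = should_stop([0.90, 0.95, 0.94, 0.93, 0.95, 0.92, 0.91])
slide_score = slide_probability([[0.2, 0.8], [0.4, 0.6]])
```

Because both averaging steps are unweighted means, the order in which they are applied does not change the slide-level score.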

Reader study

To compare the performance of G-NECNet with that of pathologists, three pathologists (a junior pathologist with 5 years of experience, a senior pathologist with about 10 years of experience, and an expert pathologist with 15 years of experience specializing in gastrointestinal tumors) were presented with slides from the three cohorts. Blinded to the diagnosis and clinical information, the pathologists reviewed only H&E-stained histopathological slides and classified each case as G-NEC or gastric adenocarcinoma based on the histopathological features of G-NEC and their expertise. We then compared the sensitivity and specificity of the three pathologists with those of G-NECNet for the diagnosis of G-NEC.

Statistics and reproducibility

The cut-off threshold on G-NECNet’s receiver operating characteristic curve was defined by Youden’s J statistic31 to dichotomize G-NECNet’s probabilities into binary predictions for calculating the sensitivity, specificity, and NPV. This cut-off threshold was predefined on the Internal cohort before the evaluation of the external datasets. The AUROCs of G-NECNet were calculated from its prediction scores, using the ground-truth diagnosis as the reference standard. The AUROCs were compared using DeLong’s test. Differences in sensitivity and specificity were tested by using the McNemar test. The baseline data of study participants from different datasets were compared by variance analysis or the Chi-square test. Differences in sensitivity and specificity between each pathologist and G-NECNet were tested by using the Chi-square test. The 95% CIs of AUROC were calculated using bootstrapping. A two-tailed α criterion of 0.05 was used for significance. Analyses were done in IBM SPSS Statistics (version 20.0) and MedCalc (version 15.2.2). Python (version 3.9.6) and the deep learning platform PyTorch (version 1.9) were used to build the model and analyze the data.
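For readers less familiar with these statistics, the sketch below shows how a Youden-based cut-off and a bootstrapped AUROC interval can be computed. It is an illustration on made-up scores, not the study's analysis code: the percentile bootstrap, the number of resamples, and the rule "score at or above the cut-off is positive" are assumptions, as the text does not specify them.

```python
import random
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive case outscores a negative one
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return the cut-off maximizing J = sensitivity + specificity - 1, and J itself."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def bootstrap_ci(scores: Sequence[float], labels: Sequence[int],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUROC (resampling cases with replacement)."""
    rng = random.Random(seed)
    indices = range(len(scores))
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # a resample needs both classes
            stats.append(auroc([scores[i] for i in sample], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical slide-level scores and labels (1 = G-NEC, 0 = GC), for illustration only.
demo_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
demo_labels = [0, 0, 0, 1, 0, 1, 1, 1]
demo_auc = auroc(demo_scores, demo_labels)
demo_cutoff, demo_j = youden_cutoff(demo_scores, demo_labels)
demo_ci = bootstrap_ci(demo_scores, demo_labels, n_boot=200)
```

Fixing the cut-off on the Internal cohort and reusing it unchanged on the external cohorts, as described above, keeps the external sensitivity and specificity free of threshold tuning.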

Human ethics and Consent to participate declarations

Ethical approval for this study was obtained from the Institutional Review Board of Sun Yat-sen University Cancer Center (project number: B2022-614). Informed consent was waived due to the retrospective nature of the study, with patients not directly recruited for the research.

Results

Patient cohorts

The study utilized three independent cohorts to develop, test, and externally validate the performance of the G-NECNet model (Supplementary Fig. 1). The Internal cohort served as the primary dataset for model development and internal validation. It included a total of 1177 whole-slide images (WSIs) derived from 826 patients (171 WSIs from 99 patients diagnosed with G-NEC and 1006 WSIs from 727 patients with GC), all collected from a single medical center. This dataset was randomly split into training (Training Cohort) and testing (Internal test cohort) subsets. The External-cohort, which served as an independent validation set, consisted of 344 WSIs (65 WSIs from 61 patients with G-NEC and 279 WSIs from 279 patients with GC) obtained from the Sixth Affiliated Hospital of Sun Yat-sen University. The Consultation-cohort, also designated for independent validation, included 113 WSIs (21 WSIs from 21 patients with G-NEC and 92 WSIs from 92 patients with GC) collected from multiple external medical institutions. The details of these three datasets are summarized in Supplementary Table 1.

Diagnostic performance of G-NECNet

G-NECNet was developed using the ResNet50 architecture as its backbone (Fig. 1), aiming to accurately differentiate G-NEC from GC based on H&E-stained histopathological slides. Within the Training cohort, the model achieved area under the receiver operating characteristic curve (AUROC) values ranging from 0.978 to 1.000 across cross-validation folds (Supplementary Table 2). On the Internal test cohort, G-NECNet achieved an AUROC of 0.993 (95% confidence interval (CI):0.99–1.00), a sensitivity of 0.952 (95% CI:0.88–0.99), a specificity of 0.986 (95% CI:0.97–0.99), and a negative predictive value (NPV) of 0.993 (95% CI:0.98–1.00). In the External-cohort, G-NECNet maintained its robust performance, yielding an AUROC of 0.985 (95% CI:0.98–0.99), a sensitivity of 0.923 (95% CI:0.83–0.97), a specificity of 0.942 (95% CI:0.91–0.97), and an NPV of 0.981 (95% CI:0.96–0.99). In the Consultation-cohort, G-NECNet achieved an AUROC of 1.000 (95% CI:1.00–1.00), alongside a sensitivity of 0.952 (95% CI:0.76–1.00), a specificity of 1.000 (95% CI:0.96–1.00), and an NPV of 0.989 (95% CI:0.94–1.00). The consistent performance across internal and external datasets underscores the reliability and clinical applicability of G-NECNet (Fig. 2). Representative correctly classified and misclassified WSIs, together with their differentiation grades, are shown in Fig. 3, Supplementary Fig. 2, and Supplementary Table 3, illustrating the model’s interpretability and diagnostic potential.
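As a worked check of how these metrics relate, the sketch below recomputes sensitivity, specificity, and NPV for the Consultation-cohort. The confusion-matrix counts are not reported in the text; they are inferred here from the cohort size (21 G-NEC and 92 GC WSIs) and the reported rates, and should be read as an assumption.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, NPV, and false omission rate from a 2 x 2 table."""
    npv = tn / (tn + fn)
    return {
        "sensitivity": tp / (tp + fn),   # G-NEC slides correctly flagged
        "specificity": tn / (tn + fp),   # GC slides correctly cleared
        "npv": npv,                      # negative calls that are truly GC
        "false_omission_rate": 1.0 - npv,
    }


# Inferred counts (assumption): 20 of 21 G-NEC slides detected, all 92 GC slides cleared.
consultation = diagnostic_metrics(tp=20, fn=1, tn=92, fp=0)
rounded = {name: round(value, 3) for name, value in consultation.items()}
```

Under these inferred counts the rounded values are 0.952, 1.000, and 0.989, matching the reported sensitivity, specificity, and NPV, with a false omission rate of about 0.011.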

Fig. 2: Diagnostic performances of G-NECNet and pathologists.

a On the Internal test cohort, G-NECNet achieved an AUROC of 0.993, outperforming all three pathologists, whose AUROCs ranged from 0.677 to 0.903 (the p-value is marked in the figure, DeLong’s test, two-sided). b On the External-cohort, G-NECNet achieved an AUROC of 0.985, outperforming all pathologists, whose AUROCs ranged from 0.712 to 0.882 (the p-value is marked in the figure, DeLong’s test, two-sided). c On the Consultation-cohort, G-NECNet achieved an AUROC of 1.000, outperforming all pathologists, whose AUROCs ranged from 0.777 to 0.876 (the p-value is marked in the figure, DeLong’s test, two-sided).

Fig. 3: Successful cases predicted by G-NECNet.

Histological images (left column) of patients with G-NEC in a–c were from Internal test cohort, External-cohort, and Consultation-cohort, respectively. The heatmaps (middle column) showed that tumor tiles were mainly predicted as G-NEC with a high score (reddish color). Four randomly selected tiles with high G-NEC scores are displayed in the right column (at ×10 magnification). Histological images (left column) of patients with GC in d–f were from the Internal test cohort, External-cohort, and Consultation-cohort, respectively. The heatmaps (middle column) showed that tumor tiles were mainly predicted to be GC with a low score (bluish color). Four randomly selected tiles with low G-NEC scores are displayed in the right column (at ×10 magnification). Scale bars are indicated in the figures.

Comparative evaluation with pathologists

To benchmark G-NECNet’s performance, we conducted a reader study involving three board-certified pathologists with 5–15 years of experience. Diagnostic performance metrics were calculated across all three cohorts. Within the Internal test cohort, the AUROC values for the junior, senior, and expert pathologists were 0.677 (95% CI:0.61–0.75), 0.825 (95% CI:0.76–0.89), and 0.903 (95% CI:0.85–0.95), respectively. Sensitivity ranged from 0.405 to 0.821, while specificity ranged from 0.948 to 0.985.

Likewise, in the External-cohort, the AUROC values were 0.712 (95% CI:0.64–0.79), 0.739 (95% CI:0.66–0.82), and 0.882 (95% CI:0.82–0.94) for the junior, senior, and expert pathologists, respectively. Sensitivity ranged from 0.538 to 0.800, while specificity ranged from 0.825 to 0.964. In the Consultation-cohort, AUROC values were 0.777 (95% CI:0.65–0.91), 0.857 (95% CI:0.74–0.98), and 0.876 (95% CI:0.77–0.99), respectively. Sensitivity scores ranged from 0.619 to 0.762, while specificity ranged from 0.935 to 1.000.
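A reader who gives a binary call contributes a single operating point to the ROC plane, so the reader's AUROC reduces to the area under the two line segments joining (0, 0), (1 − specificity, sensitivity), and (1, 1), which equals (sensitivity + specificity) / 2. The sketch below checks this on the Internal test cohort. Pairing the lowest sensitivity and specificity with the junior reader, and the highest with the expert, is an assumption, since the text reports only ranges.

```python
def single_point_auroc(sensitivity: float, specificity: float) -> float:
    """Trapezoidal area under the ROC curve defined by one binary operating point."""
    fpr = 1.0 - specificity
    points = [(0.0, 0.0), (fpr, sensitivity), (1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Internal test cohort, assumed pairing of the reported range endpoints.
junior_auc = single_point_auroc(sensitivity=0.405, specificity=0.948)
expert_auc = single_point_auroc(sensitivity=0.821, specificity=0.985)
```

Under this pairing the areas are 0.6765 and 0.903, in line with the reported AUROCs of 0.677 and 0.903 for the junior and expert pathologists.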

Across all three datasets, G-NECNet significantly surpassed all three pathologists in AUROC, particularly excelling in sensitivity while maintaining comparable specificity (Fig. 2, Table 1). Inter-rater agreement, measured using Cohen’s Kappa, revealed moderate to substantial agreement among the pathologists, with values of 0.363, 0.694, and 0.752 (Internal test cohort), 0.373, 0.517, and 0.778 (External-cohort), and 0.575, 0.803, and 0.811 (Consultation-cohort) for the junior, senior, and expert readers, respectively.
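Cohen's kappa corrects the observed agreement between two sets of binary calls for the agreement expected by chance. The sketch below computes it on made-up calls. Which pairs of ratings were compared in the study is not stated, so treating one list as a reader and the other as the reference diagnosis is an assumption, as are the example data.

```python
from typing import Sequence


def cohen_kappa(calls_a: Sequence[int], calls_b: Sequence[int]) -> float:
    """Cohen's kappa for two equally long sequences of binary (0/1) calls."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    p_a = sum(calls_a) / n  # fraction of positive calls in the first sequence
    p_b = sum(calls_b) / n  # fraction of positive calls in the second sequence
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)


# Hypothetical calls for ten slides (1 = G-NEC, 0 = GC), for illustration only.
reader = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
reference = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(reader, reference)
```

Here the observed agreement is 0.80 and the chance agreement is 0.58, giving a kappa of about 0.52.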

Table 1 Performance comparison between G-NECNet and pathologists

Discussion

G-NEC shares morphological similarities with conventional GC, yet the two entities diverge markedly in terms of treatment strategies and prognostic outcomes. As such, accurate differentiation is essential and traditionally relies heavily on expert pathological interpretation, a process that is both resource-intensive and time-consuming. Despite advances in medical imaging and molecular diagnostics, progress in the efficient diagnosis of G-NEC has remained limited over the past decade26.

To address these challenges, we developed G-NECNet, a deep learning-based diagnostic model designed to distinguish G-NEC from GC using H&E-stained WSIs. G-NECNet was trained using a data-efficient architecture that not only maintains high diagnostic accuracy but also exhibits excellent generalizability across multiple independent datasets. Our results demonstrate that G-NECNet matches or surpasses the performance of senior pathologists, while significantly reducing the cost and turnaround time of diagnosis.

A notable advantage of G-NECNet lies in its potential application as a pre-screening tool. By autonomously flagging cases with a high likelihood of G-NEC and reserving costly immunohistochemical or molecular testing for uncertain instances, the model facilitates a more streamlined diagnostic workflow. Across the Internal test cohort, External-cohort, and Consultation-cohort, G-NECNet achieved AUROC values of 0.993, 0.985, and 1.000, respectively, attesting to its robustness in heterogeneous clinical environments. These results are particularly significant in light of known challenges associated with cross-institutional variability in slide preparation, staining techniques, and scanning instruments27,28.

From a clinical utility perspective, G-NECNet’s NPV of 98.9% implies a false omission rate of only 1.1%. This high NPV allows clinicians to confidently rule out G-NEC in negative cases, thereby avoiding unnecessary confirmatory testing and reducing the diagnostic burden on both healthcare systems and patients. In comparative terms, G-NECNet demonstrates superior performance relative to prior AI-related gastrointestinal cancer diagnosis models29,30,31,32, suggesting its applicability as a reliable first-line screening mechanism in routine pathology practice. This approach could significantly reduce the occurrence of misdiagnoses and associated diagnostic costs. By sparing patients in whom G-NEC has been ruled out from immunohistochemistry and molecular testing, substantial savings in both labor and testing expenses could be achieved.

Despite these promising findings, it is essential to acknowledge certain limitations inherent in our study. First, although this was a multicenter study, the overall sample size remained relatively limited, and the dataset included only histologically confirmed G-NEC and GC cases. This design was chosen because differentiating G-NEC from poorly differentiated GC represents the most challenging and clinically relevant diagnostic scenario in gastric pathology. However, this binary classification setting does not encompass the full spectrum of gastric lesions, such as MiNEN, other rare histological types, and non-neoplastic conditions. Expanding future studies to include a broader population would enable true “detection” of G-NEC and enhance model generalizability. Second, the deep learning model was developed and assessed retrospectively. To establish its clinical application, a rigorous prospective clinical study is warranted. Finally, several avenues may be explored to potentially enhance G-NECNet’s performance. These include the integration of current imaging data with clinical information or other relevant features, the implementation of human–machine synergy, and the consideration of alternative network architectures as replacements for existing backbones.

In conclusion, we developed a deep learning model to distinguish between G-NEC and GC by analyzing H&E-stained pathological images. G-NECNet demonstrated diagnostic performance comparable to that of experienced pathologists and maintained high accuracy across independent external datasets. By significantly reducing reliance on ancillary immunohistochemical testing, the model has the potential to streamline diagnostic workflows, lower diagnostic costs, and expedite clinical decision-making.