Introduction

Endometrial cancer, a malignancy of the uterine lining, is the sixth most commonly diagnosed cancer in women, with an estimated 420,000 new cases reported globally in 20221. Its mortality burden has also grown steadily, from approximately 74,000 deaths in 2008 to nearly 97,000 in 20221,2,3. Although early-stage disease is associated with favorable outcomes, patients with advanced or recurrent disease fare poorly, with a 5-year overall survival rate of only 20–25%4,5.

Histological assessment remains central to endometrial cancer (EC) diagnosis when biopsy is feasible, guiding treatment and prognosis6. Endometrioid carcinoma, the most common histological type, typically arises from endometrial hyperplasia and is often low-grade with favorable outcomes. In contrast, serous carcinoma arises from atrophic endometrium and is associated with a high risk of extrauterine spread and poor prognosis7. Given the limited reproducibility and prognostic precision of histology alone, molecular subtyping now complements traditional classification and enhances risk stratification8. The National Comprehensive Cancer Network (NCCN) guidelines recommend molecular testing for POLE mutation (POLEmut), mismatch repair (MMR)/microsatellite instability (MSI), and p53 abnormalities to complement histological tumor classification9. Under the FIGO 2023 staging system, molecular subtype information (particularly POLEmut and p53abn status) can modify staging in FIGO stage I and II endometrial cancers10. For advanced-stage disease (FIGO III and IV), molecular classification provides important prognostic and therapeutic information but does not alter the anatomical stage4,11,12,13. Despite its clinical value, comprehensive molecular classification remains inaccessible in many settings because of cost, technical requirements, and limited availability of sequencing. This is particularly true for detecting the POLEmut subtype, which requires targeted sequencing methods not routinely available in low-resource laboratories14.

To address this, digital pathology and deep learning have emerged as promising tools for predicting molecular features directly from whole slide images (WSIs)15. High-resolution WSIs capture morphological cues associated with genomic alterations and the tumor microenvironment, and deep learning models have successfully identified mutation profiles in various cancers16,17,18. However, in the field of endometrial cancer, only a few studies have used deep learning to predict molecular characteristics from histological images. Wang et al. proposed a model based on ResNet-18 and XGBoost to predict the microsatellite status of endometrial cancer patients, but interpretability analysis was not performed19. Hong et al. proposed a multiresolution classification model based on the Inception network capable of predicting the four subtypes defined by TCGA and the mutation status of 18 genes6. However, the authors treated the classification task as multiple binary classification subtasks and neglected to assess the correlations between different subtypes. Additionally, the models did not exclude nontumor tissues such as the myometrium and were more likely to yield ambiguous prediction scores for nontumor tissue tiles, potentially affecting the overall prediction performance. Fremond et al. proposed a workflow based on attention-based multi-instance learning (AB-MIL) called im4MEC, which achieved competitive performance in a classification task and demonstrated reproducibility11. Building upon this, further improvements could focus on deepening cellular-level insights and enhancing prediction for challenging subtypes such as POLEmut.

In this study, we trained an interpretable deep learning model for predicting the molecular subtype of endometrial cancer using 364 WSIs from 324 patients admitted to Fudan University Shanghai Cancer Center. We then validated this model using 296 WSIs from 274 patients in the TCGA cohort and 36 WSIs from 33 patients from the First Affiliated Hospital of Soochow University. The model predicts the molecular subtype of endometrial cancer directly from features extracted from WSIs, without gene sequencing. We designed an end-to-end subtype prediction network intended to keep computational costs low and to improve model interpretability. We also evaluated current mainstream subtyping networks with 5-fold cross-validation in the Fudan cohort and compared their performance with that of our model. Additionally, we explored morphological features potentially related to the molecular subtypes of endometrial cancer. Our study offers a proof of concept for morphology-based molecular subtyping, with potential implications for future clinical decision-making in endometrial cancer.

Results

Model performance and cohort overview

After excluding patients with incomplete information, slides with scanning errors, and cases with other data issues, the Fudan cohort included 364 slides from 324 patients: 76 patients with MSI-H, 145 with NSMP, 56 with p53abn, and 47 with POLEmut (Supplementary Fig. S1, Supplementary Table S1). Within the Fudan cohort, Supplementary Table S2 presents the cross-tabulation of age group, histological type, FIGO stage, grade, and recurrence by molecular subtype. Similarly, after screening, the TCGA cohort included 296 slides from 274 patients: 80 patients with MSI-H, 113 with NSMP, 60 with p53abn, and 21 with POLEmut. The Suzhou cohort ultimately included 36 slides from 33 patients: 4 patients with MSI-H, 23 with NSMP, and 6 with p53abn; no patients with POLEmut were identified in this cohort (Supplementary Table S1). Notably, the Fudan cohort had a higher proportion of POLEmut cases than the other cohorts, which may reflect institutional referral patterns, regional genetic variation, or minor differences in sequencing pipelines4.

In 5-fold cross-validation in the Fudan cohort, the proposed model achieved an area under the ROC curve (AUROC) of 0.846 (95% CI: 0.798–0.894) for MSI-H, 0.876 (95% CI: 0.831–0.921) for NSMP, 0.910 (95% CI: 0.818–1.003) for p53abn, and 0.835 (95% CI: 0.784–0.886) for POLEmut, with a macro-average AUROC of 0.867 (95% CI: 0.823–0.911) (Fig. 1A, Supplementary Tables S10–S11). In external validation in the TCGA cohort, the model achieved an AUROC of 0.775 for MSI-H, 0.844 for NSMP, 0.95 for p53abn, and 0.798 for POLEmut, with a macro-average AUROC of 0.844 (Fig. 1C, Supplementary Tables S7–S9). In external validation in the Suzhou cohort, the model achieved an AUROC of 0.761 for MSI-H, 0.873 for NSMP, and 0.862 for p53abn, with a macro-average AUROC of 0.847 (Fig. 1B, Supplementary Tables S12–S13). Compared with 5-fold cross-validation in the Fudan cohort, performance decreased modestly in the external cohorts. The exception was p53abn in the TCGA cohort, where the AUROC (0.95) exceeded the cross-validation value (0.910). The accuracy, precision, recall, F1 score, sensitivity, specificity, and negative predictive value (NPV) of the model in each cohort are reported in Supplementary Table S3.
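The macro-average AUROC is defined in the Fig. 1 caption as the unweighted mean of the subtype-wise AUROCs. The sketch below shows one way to compute it from slide-level probabilities. It assumes one-vs-rest AUROCs computed with the rank (Mann-Whitney) formulation. The function names are hypothetical. Skipping subtypes with no positive cases, such as POLEmut in the Suzhou cohort, is an assumption about how a three-subtype average would be formed.

```python
import numpy as np

SUBTYPES = ["MSI-H", "NSMP", "p53abn", "POLEmut"]


def binary_auroc(y_true, scores):
    """AUROC as P(random positive outscores random negative); ties count as 0.5."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined when one class is absent
    # Compare every positive with every negative (adequate for cohort-sized data).
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def subtype_aurocs(labels, probs):
    """One-vs-rest AUROC per subtype from an (n_slides, n_subtypes) probability matrix."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    return {name: binary_auroc(labels == k, probs[:, k]) for k, name in enumerate(SUBTYPES)}


def macro_auroc(per_class):
    """Unweighted mean over subtypes whose AUROC is defined (absent subtypes skipped)."""
    values = [v for v in per_class.values() if not np.isnan(v)]
    return float(np.mean(values))
```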

Fig. 1: Model performance in predicting the molecular subtype in endometrial cancer.

Macro-average and subtype-wise receiver operating characteristic (ROC) curves in the 5-fold cross-validation of the Fudan cohort (A), and in the external validation cohorts of Suzhou (B) and TCGA (C). The true positive rate represents sensitivity and the false positive rate represents 1 minus the specificity. D Comparison of methods by subtypes with standard deviations. Baselines (AB-MIL, CLAM, MIL) use a frozen ImageNet-pretrained ResNet-50 backbone, while our method uses a partially fine-tuned ImageNet-initialized EfficientNet-V2 backbone. “Macro” is the unweighted mean AUROC across subtypes; error bars show the standard deviation over 5 folds. AB-MIL attention-based multi-instance learning. CLAM clustering-constrained-attention multiple instance learning. MIL multiple instance learning. AUROC area under the receiver operating characteristic curve. MSI-H microsatellite instability–high. NSMP no specific molecular profile. p53abn abnormal cellular tumor antigen p53 expression. TCGA The Cancer Genome Atlas.

To further assess the performance of the proposed network, we conducted 5-fold cross-validation within the Fudan cohort and compared it with representative weakly supervised frameworks: TransMIL, AB-MIL, Max-Pooling, Mean-Pooling, CLAM-SB, and CLAM-MB11. All baseline models used the UNI pathology foundation encoder for feature extraction, ensuring domain-specific and consistent representations across methods11,20,21. On comparative evaluation metrics including the AUROC, sensitivity, and specificity, the proposed network outperformed these baseline frameworks (Fig. 1D, Supplementary Table S4).

Patch-level visualization revealed the morpho-molecular correlates

Grad-CAM was used to generate heatmaps for the input patches. We selected representative high-confidence patches from different WSIs and compared the histological features of their high-activation regions. From this comparison, we summarized the morphological features associated with each molecular subtype (Fig. 2). Our results suggested that most MSI-H and NSMP tumors were low-grade endometrioid carcinomas, with MSI-H tumors having a greater density of stromal lymphocytes than NSMP tumors. POLEmut endometrial cancer more frequently showed solid growth with more evident cellular atypia than the MSI-H and NSMP subtypes. Finally, p53abn endometrial cancer often displayed papillary growth and features of serous carcinoma, such as marked cellular atypia, pleomorphism, and a high nucleus-to-cytoplasm ratio.

Fig. 2: Patch-level visualization of the morpho-molecular correlates.

The corresponding heatmaps were generated using the Grad-CAM method based on the representative patches with high confidence. The left heatmap shows the high (red)- and low (blue)-activation regions predicted to be associated with the molecular subtypes. The right patch shows the original morphological features. MSI-H microsatellite instability–high. NSMP no specific molecular profile. p53abn abnormal cellular tumor antigen p53 expression.

Extraction and analysis of single-cell nuclear features

We used Hover-Net to perform cell segmentation on the regions of interest (ROIs) of the 364 WSIs from the Fudan cohort. On average, each WSI contained approximately 50,000 inflammatory cells, 75,000 stromal cells, and 550,000 tumor cells. By analyzing the segmentation cell masks with PyPathomics, we extracted 84 features for each segmented cell, covering shape, texture, and spatial distribution (Supplementary Fig. S3, Supplementary Table S5). In total, approximately 245 million cells were analyzed (Fig. 3A).
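The reported total follows from the per-slide averages. As a rough consistency check using the approximate figures above:

```latex
\[
(50{,}000 + 75{,}000 + 550{,}000) \times 364 = 675{,}000 \times 364 = 245{,}700{,}000 \approx 2.5 \times 10^{8}
\]
```

The total is therefore on the order of 10^8 segmented cells.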

Fig. 3: Correlations between single-cell nuclear features and molecular subtypes.

A Numbers of tumor, stromal, and inflammatory cells within the ROIs of the 364 WSIs from the Fudan cohort, by molecular subtype. B Heatmaps of Spearman correlations between nuclear features and molecular subtypes. C Box plots of Spearman correlation with molecular subtypes. ROI region of interest. MSI-H microsatellite instability–high. NSMP no specific molecular profile. p53abn abnormal cellular tumor antigen p53 expression.

For each WSI, we calculated the mean and standard deviation of every single-cell nuclear feature extracted above. These statistics summarize the overall characteristics of the cell populations within the regions of interest at the WSI level. We then performed Spearman correlation analysis between these WSI-level features and each of the four subtypes. The correlation heatmaps and box plots are shown in Fig. 3B, C.
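The sketch below illustrates the WSI-level aggregation and correlation step. It assumes each subtype is encoded as a one-vs-rest 0/1 indicator for the Spearman correlation, since the coding is not specified above. It also assumes standard deviations use the population formula. The code relies on `scipy.stats.spearmanr`. The helper names are hypothetical, and no multiple-testing correction is applied here.

```python
import numpy as np
from scipy.stats import spearmanr


def aggregate_wsi_features(cell_features, slide_ids):
    """Summarize per-cell features into per-slide means and population SDs (ddof=0).

    cell_features: (n_cells, n_features); slide_ids: length-n_cells identifiers.
    Returns (slide_order, matrix) where matrix has 2 * n_features columns: means, then SDs.
    """
    cell_features = np.asarray(cell_features, dtype=float)
    slide_ids = np.asarray(slide_ids)
    slide_order = np.unique(slide_ids)
    rows = []
    for sid in slide_order:
        block = cell_features[slide_ids == sid]
        rows.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return slide_order, np.vstack(rows)


def one_vs_rest_spearman(wsi_features, wsi_labels, subtype):
    """Spearman rho and p-value of each WSI-level feature against a 0/1 subtype indicator."""
    indicator = (np.asarray(wsi_labels) == subtype).astype(float)
    results = []
    for j in range(wsi_features.shape[1]):
        rho, p = spearmanr(wsi_features[:, j], indicator)
        results.append((rho, p))
    return results
```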

For the MSI-H subtype, no significant correlations were found with any of the nuclear features. The NSMP subtype correlated positively with the proportion of stromal cells (p < 0.0001). It correlated negatively with the Delaunay triangle perimeter of stromal cells (p = 0.00104) and with the area of inflammatory cells (p < 0.0001). These findings suggest that NSMP tumor regions contain many densely packed stromal cells and relatively small inflammatory cells. The p53abn subtype correlated positively with the mean areas of both tumor cells and inflammatory cells (both p < 0.0001) and with their standard deviations (both p < 0.0001). This suggests that in p53abn tumors, tumor cells and inflammatory cells are both larger and more variable in size. The POLEmut subtype correlated positively with the mean difference between the maximum and minimum angles of Delaunay triangles for stromal cells (p < 0.0005) and with the area of Delaunay triangles (p = 0.0112). Larger triangle areas correspond to greater internuclear distances, and larger angle differences indicate less regular triangles. These findings therefore suggest that stromal cells in POLEmut endometrial cancer are arranged less densely and more irregularly. This observation was further supported by the Grad-CAM heatmaps for the POLEmut subtype, in which the high-activation regions were dispersed across the patches (Fig. 3C).

Discussion

Currently, the molecular classification of endometrial cancer relies primarily on immunohistochemistry and gene sequencing22. Interpretation of immunohistochemistry results often depends on the pathologist's experience and subjective judgment, which limits reproducibility23. Furthermore, the expression levels of certain protein markers may overlap among molecular subtypes, resulting in insufficient specificity and sensitivity. The specificity and quality of antibodies also affect the accuracy of immunohistochemistry results. Gene sequencing can determine the subtype precisely, but it is time-consuming, costly, and technically complex, and it requires demanding data analysis. Together, these factors substantially reduce its accessibility in economically underdeveloped regions24. The deep learning model in this study infers the likely molecular subtype directly from H&E whole-slide images. We do not propose it as a replacement for guideline-recommended molecular testing; rather, we position it as an H&E-first, human-in-the-loop triage tool. Slide-level probabilities and patch-level heatmaps can help pathologists prioritize confirmatory assays (e.g., MSI testing, POLE sequencing, p53 IHC). They can also guide selective reflex testing when tissue is limited, for example by flagging cases for MMR IHC/MSI workup when the model assigns a high MSI-H probability. By prioritizing and accelerating confirmatory testing, this workflow may reduce turnaround time and cost, particularly in resource-limited settings.
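The triage idea can be made concrete with a small sketch that maps slide-level probabilities to the confirmatory assays named above. The 0.5 threshold, the mapping, and the function name are hypothetical illustrations, not validated operating points. NSMP has no entry because it is assigned by exclusion in the molecular algorithm.

```python
# Hypothetical mapping from predicted subtype to the confirmatory assays named in the text.
CONFIRMATORY_TESTS = {
    "MSI-H": "MMR IHC / MSI testing",
    "POLEmut": "POLE sequencing",
    "p53abn": "p53 IHC",
}


def triage_flags(slide_probs, threshold=0.5):
    """Return assays to prioritize, ordered by model probability (threshold is illustrative)."""
    flagged = [
        (prob, CONFIRMATORY_TESTS[subtype])
        for subtype, prob in slide_probs.items()
        if subtype in CONFIRMATORY_TESTS and prob >= threshold
    ]
    return [test for _, test in sorted(flagged, reverse=True)]
```

For example, `triage_flags({"MSI-H": 0.7, "NSMP": 0.2, "p53abn": 0.05, "POLEmut": 0.05})` flags only MMR IHC/MSI testing. Any clinical use would require thresholds calibrated against confirmatory results.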

We trained, tested, and validated the model with 364 WSIs from 324 patients in the Fudan cohort and performed external validation with 296 WSIs from 274 patients in the TCGA cohort and 36 WSIs from 33 patients in the Suzhou cohort. Using an end-to-end predictive network with low computational overhead and good interpretability, we developed a model that outperformed representative MIL and pooling frameworks (TransMIL, AB-MIL, CLAM-SB, CLAM-MB, Max-Pooling, and Mean-Pooling). All of these baselines were trained on features extracted with the UNI pathology foundation model for consistent histopathologic representation.

We developed a replicable and efficient method for molecular subtype classification by formulating the problem as soft voting across tumor-region patches. Instead of using more complex or data-intensive models, we adopted EfficientNetV2, a modern convolutional neural network (CNN) backbone, for patch-level classification. This choice leveraged the inherent strengths of CNNs, including rapid convergence, strong local feature extraction, and efficient processing of high-resolution inputs25. These characteristics are particularly advantageous in medical imaging tasks involving limited datasets and large histopathological slides. Vision transformer-based architectures such as ViT often require large-scale training data and operate on fixed-size patch tokens. We therefore considered a CNN better suited to our dataset for capturing both fine-grained morphological and broader contextual patterns. This design also allowed efficient extraction of informative regions and facilitated incremental learning, avoiding the need to retrain the entire feature extractor. Because the network was trained end to end, we could identify the regions most informative for subtyping directly from the gradients of the CNN using Grad-CAM. This facilitated the exploration of subtype-specific morphological characteristics and offered finer-grained interpretability than conventional MIL attention maps. Additionally, a tumor segmentation network was used during preprocessing to ensure that the included tiles belonged to tumor areas, minimizing the influence of nontumor tissue on predictive performance. Beyond endometrioid carcinoma, the most common histological type, we also included samples of other histological subtypes, increasing the representativeness of the samples and the interpretability of the model.

By analyzing approximately 245 million cells segmented from the 364 WSIs in the dataset, we reproduced two observations reported by Fremond et al.: NSMP tiles contained a relatively high proportion of stromal cells, and p53abn tiles displayed relatively large tumor cells11. Furthermore, our analysis covered larger WSI-level areas and a richer feature set. It revealed correlations between the four molecular subtypes and morphological features (e.g., boundary curvature) as well as distribution characteristics (e.g., density, cell arrangement) of tumor cells, stromal cells, and inflammatory cells. These correlation analyses provide a basis for exploring how the tumor microenvironment of each subtype may relate to its pathogenic mechanisms and prognosis.

This study has several limitations. First, the proposed model achieved high AUROC, accuracy, specificity, and negative predictive value (NPV) in predicting the POLEmut subtype, but its precision, recall (sensitivity), and F1 score were relatively poor. This may be attributed to the low prevalence of the POLEmut subtype in the general population and the limited number of such cases in our dataset. Second, the relatively high POLEmut frequency in our cohort may reflect referral bias and enriched molecular testing in a tertiary care setting, and thus may not fully represent population-level distributions. Third, the follow-up period was relatively short and outcomes were generally favorable for the patients included in this study. As a result, we could not assess the association between molecular subtype and prognosis.

To enhance the clinical utility of the model, future work should address several key areas. First, improving the identification of the POLEmut subtype will require the expansion of training datasets to include a larger and more representative sample of such cases, thereby mitigating class imbalance and improving statistical power. Second, given the morphological overlap between POLEmut and MSI-H tumors, additional strategies such as feature disentanglement may help the network better distinguish these subtypes. Third, while the current model focuses on four-class molecular subtype prediction, future efforts could explore clinically meaningful binary tasks—such as distinguishing immune-active versus immune-inactive tumors—with particular relevance to the MSI-H subgroup, which is closely associated with immunotherapy response5. Moreover, large-scale clinical validation across diverse cohorts will be essential before the model can be translated into real-world clinical practice.

In summary, this proof-of-concept study introduces a morphology-based framework for molecular subtyping of endometrial cancer. Rather than replacing genomic assays, it provides a scalable and interpretable basis for future clinical translation, especially in settings where molecular testing is limited or unavailable.

Methods

Data preparation and molecular classification

We used diagnostic formalin-fixed paraffin-embedded (FFPE), H&E-stained tumor slides with complete molecular and clinicopathological data from three cohorts to train, test, and validate our models (Fig. 4). In both the Fudan and Suzhou cohorts, molecular classification was performed using a hierarchical algorithm based on targeted next-generation sequencing (NGS) with the ColonCore 46-gene panel (Burning Rock Biotech, Guangzhou, China)4. This panel was designed to assess microsatellite instability (MSI) status via a read-based algorithm. It also detects mutations in Lynch syndrome (LS)-associated genes (e.g., MLH1, MSH2, MSH6, PMS2), POLE, TP53, and other genes frequently altered in endometrial cancer (e.g., PTEN, CTNNB1, ARID1A, PIK3CA), as well as genes relevant to hereditary cancer syndromes and targeted therapy (Supplementary Table S6). For MSI calling, a locus was deemed unstable when the coverage ratio of pre-specified repeat-length bins fell below a locus-specific threshold. Samples were labeled MSI-H if >30% of marker loci were unstable and microsatellite stable (MSS) otherwise. Sequencing was performed on the Illumina NextSeq platform (Illumina Inc., San Diego, CA) at a mean depth of 1000× following the manufacturer's instructions. Genomic loci meeting the depth threshold (≥100×) were retained, and variants were called using the VarScan pipeline26. Variant calls in tissue samples required ≥8 supporting reads for single-nucleotide variants (SNVs) and ≥2 and ≥5 supporting reads for insertions and deletions, respectively. Common polymorphisms (population frequency >0.1% in ExAC, 1000 Genomes, dbSNP, or ESP6500SI-V2) were excluded. Variants that passed these thresholds were retained for downstream molecular classification. Molecular subtypes were assigned with a mutually exclusive algorithm consistent with The Cancer Genome Atlas (TCGA) framework. This algorithm uses a fixed hierarchy to assign tumors to the four WHO molecular classes (Supplementary Fig. S1).
First, cases with pathogenic/likely pathogenic POLE exonuclease-domain mutations were classified as POLE ultramutated (POLEmut). Among POLE-wild-type tumors, those classified as high microsatellite instability (MSI-H) by our NGS-based algorithm were labeled MSI-H and treated as a distinct group throughout the study. Next, among microsatellite-stable, POLE-wild-type tumors, those harboring pathogenic/likely pathogenic, protein-altering TP53 variants were classified as p53-abnormal (p53abn). The remaining tumors were assigned to no specific molecular profile (NSMP). As an orthogonal check, IHC for MLH1, MSH2, MSH6, and PMS2 was performed. IHC results were concordant with our sequencing-based assignments (NGS–IHC concordance for MMR status of 91.8%, as previously reported by our group4), and IHC findings did not change any of the final molecular labels used for analysis. Representative IHC results are shown in Supplementary Fig. S2. The TCGA cohort was classified according to published molecular subtype annotations based on multi-omics profiling27.
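The MSI rule, the variant filters, and the fixed classification hierarchy can be sketched in plain Python as below. This is a simplified illustration. Per-locus instability calls, pathogenicity assessment, and exonuclease-domain annotation are assumed to be available as inputs. All names are hypothetical and are not part of the sequencing vendor's pipeline.

```python
from dataclasses import dataclass

# Minimum supporting reads per variant type, as stated in the Methods.
MIN_READS = {"SNV": 8, "insertion": 2, "deletion": 5}


def variant_passes(kind, supporting_reads, locus_depth, population_af):
    """Depth (>=100x), read-support, and common-polymorphism (>0.1%) filters."""
    return (
        locus_depth >= 100
        and supporting_reads >= MIN_READS[kind]
        and population_af <= 0.001
    )


def msi_status(n_unstable_loci, n_marker_loci, cutoff=0.30):
    """MSI-H if more than 30% of marker loci are unstable, otherwise MSS."""
    return "MSI-H" if n_unstable_loci / n_marker_loci > cutoff else "MSS"


@dataclass
class TumorCalls:
    pole_edm_pathogenic: bool  # pathogenic/likely pathogenic POLE exonuclease-domain mutation
    msi: str                   # "MSI-H" or "MSS", e.g. from msi_status
    tp53_pathogenic: bool      # pathogenic/likely pathogenic, protein-altering TP53 variant


def molecular_subtype(calls):
    """Mutually exclusive hierarchy: POLEmut, then MSI-H, then p53abn, else NSMP."""
    if calls.pole_edm_pathogenic:
        return "POLEmut"
    if calls.msi == "MSI-H":
        return "MSI-H"
    if calls.tp53_pathogenic:
        return "p53abn"
    return "NSMP"
```

The order of the `if` statements encodes the hierarchy. For example, a tumor carrying both a pathogenic POLE exonuclease-domain mutation and a TP53 variant is labeled POLEmut.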

Fig. 4: Schematic diagram of data preparation.

Partitions of the dataset used for molecular subtype prediction are shown.

Deep learning pipeline

To process the very large WSIs (hundreds of millions of pixels) efficiently, we first cropped the scanned H&E-stained slides into small nonoverlapping patches. Patches were extracted at 0.549 microns per pixel with a size of 1024 × 1024 pixels. At this scale, they contain rich detail of local cellular structures while retaining tissue-level architectural information. Next, to remove regions lacking biological content, we applied a brightness-based filter: patches with a mean RGB intensity below 100 or above 200 were excluded, as they typically represented blank slide borders or overstained background. This step focused computation on informative areas, improving both efficiency and accuracy. We then performed stain normalization using the Vahadane method, applied independently to each patch with a single reference slide as the template. Following stain normalization, each patch was scaled to the [0, 1] range and normalized channel-wise with the ImageNet statistics. This normalization subtracts the mean \(\mathrm{mean}_{(r,g,b)} = (0.485, 0.456, 0.406)\) and divides by the standard deviation \(\mathrm{std}_{(r,g,b)} = (0.229, 0.224, 0.225)\).
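The brightness filter and ImageNet normalization can be sketched in NumPy as below. The sketch assumes 8-bit RGB patches, so the 100 and 200 cutoffs are on a 0 to 255 scale, and it treats the cutoffs as inclusive. Vahadane stain normalization is omitted because it depends on a separate stain-separation implementation. The function names are hypothetical.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def keep_patch(patch_rgb_uint8, low=100, high=200):
    """Keep a patch only if its mean RGB intensity (0-255 scale) lies within [low, high]."""
    mean_intensity = float(np.asarray(patch_rgb_uint8, dtype=float).mean())
    return low <= mean_intensity <= high


def imagenet_normalize(patch_rgb_uint8):
    """Scale an (H, W, 3) patch to [0, 1], subtract ImageNet channel means, divide by SDs."""
    x = np.asarray(patch_rgb_uint8, dtype=float) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```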

Next, to enrich the dataset with tumor-relevant regions, we applied a pretrained tumor segmentation model based on the DeepLabv3 model28. A patch was considered tumor if >50% of its pixels were labeled as tumor; only tumor patches were retained for all experiments. During 5-fold cross-validation in the Fudan cohort, we mitigated class imbalance by subtype-stratified retention after tumor filtering. We retained 50% of tumor patches from MSI-H cases, 30% from NSMP, and 100% from p53abn and POLEmut (MSI-H:NSMP:p53abn:POLEmut = 0.5:0.3:1:1), with a fixed random seed per fold. For the external cohorts, we did not use stratified retention. Instead, to standardize each slide's contribution and control computational cost, we capped each WSI at 200 tumor patches. All patches were kept when ≤200 were available; otherwise, 200 were sampled uniformly without replacement. All training, validation, and test patches originated from tumor regions; normal-only and stroma-only patches were excluded.
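The tumor-patch rule, the cross-validation retention ratios, and the external-cohort cap can be sketched as below. This assumes retention is applied per slide, with the kept count rounded to the nearest patch, and that patch identifiers are simple integers. The study may have sampled pooled patches differently. The function names are hypothetical.

```python
import numpy as np

# Retention fractions used during cross-validation in the Fudan cohort, as stated in the Methods.
RETENTION = {"MSI-H": 0.5, "NSMP": 0.3, "p53abn": 1.0, "POLEmut": 1.0}


def is_tumor_patch(tumor_mask):
    """A patch counts as tumor if more than 50% of its pixels are labeled tumor (1)."""
    return float(np.mean(tumor_mask)) > 0.5


def retain_for_cv(patch_ids, subtype, rng):
    """Subtype-stratified retention: keep a random fraction of a slide's tumor patches."""
    n_keep = int(round(RETENTION[subtype] * len(patch_ids)))
    return list(rng.choice(patch_ids, size=n_keep, replace=False))


def cap_for_external(patch_ids, rng, cap=200):
    """Keep all patches if there are at most `cap`; otherwise sample `cap` without replacement."""
    if len(patch_ids) <= cap:
        return list(patch_ids)
    return list(rng.choice(patch_ids, size=cap, replace=False))
```

Seeding `np.random.default_rng` once per fold mirrors the fixed per-fold seed described above.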

The filtered and normalized patches were then fed into a classical weakly supervised learning framework. In this end-to-end paradigm, the feature extractor and classifier were trained jointly using WSI-level molecular subtype labels. Specifically, each patch was assigned the ground truth label of its parent WSI, and the model was optimized using a categorical cross-entropy loss computed at the patch level. No attention-based weighting or instance selection was applied—each patch contributed equally to the final prediction.
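The patch-level training objective can be sketched in NumPy as below. Each patch inherits its slide's label, and the loss is the mean categorical cross-entropy over patches. The names are hypothetical. In practice, the logits would come from the EfficientNetV2 backbone within a deep learning framework, which would also compute the gradients.

```python
import numpy as np


def patch_labels_from_slides(patch_slide_ids, slide_labels):
    """Each patch inherits the integer subtype label of its parent WSI."""
    return np.array([slide_labels[sid] for sid in patch_slide_ids])


def patch_cross_entropy(logits, labels):
    """Mean categorical cross-entropy over patches, via a numerically stable log-softmax."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```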

The backbone architecture was EfficientNetV229, initialized with ImageNet-pretrained weights. The first five blocks (shallowest layers) were frozen during training, while the remaining layers were fine-tuned on our histopathology data. This configuration preserved generic low-level feature extraction while allowing deeper layers to adapt to the histopathological domain. Training used the AdamW optimizer (learning rate = 2 × 10−5, weight decay = 1 × 10−3, batch size = 10, dropout = 0.25, epochs = 80). Hyperparameters were selected based on 5-fold cross-validation within the Fudan training cohort; neither external cohort (TCGA or Suzhou) was used for hyperparameter tuning.

For inference, the model output softmax probabilities across the four molecular subtypes for each patch. Slide-level classification was performed using soft voting: all patch-level probabilities from a given WSI were averaged, and the subtype with the highest mean probability was assigned using argmax. This straightforward aggregation strategy enabled interpretable and reproducible decision-making.
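Slide-level soft voting reduces to averaging patch probabilities and taking the argmax. The minimal NumPy sketch below assumes the subtype order shown, and the function name is hypothetical.

```python
import numpy as np

SUBTYPES = ["MSI-H", "NSMP", "p53abn", "POLEmut"]


def soft_vote(patch_probs):
    """Average (n_patches, 4) softmax outputs; return slide probabilities and the argmax subtype."""
    slide_probs = np.asarray(patch_probs, dtype=float).mean(axis=0)
    return slide_probs, SUBTYPES[int(np.argmax(slide_probs))]


probs, subtype = soft_vote([[0.9, 0.05, 0.03, 0.02],
                            [0.3, 0.4, 0.2, 0.1],
                            [0.3, 0.4, 0.2, 0.1]])
```

In this toy example, two of three patches favor NSMP, so a hard majority vote would return NSMP. Averaging probabilities instead returns MSI-H (mean probability 0.5), because one patch is highly confident. This is how soft voting lets confident patches carry more weight.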

We evaluated six representative weakly supervised frameworks under a unified protocol: TransMIL30, AB-MIL31, Max-Pooling, Mean-Pooling, CLAM-SB, and CLAM-MB20. For fair comparison, all models used patch-level features extracted from the UNI pathology foundation model, which provides pathology-specific representations with improved consistency across cohorts21. The extracted features were kept frozen, and only the respective aggregation or attention heads were trained on the Fudan cohort with slide-level supervision using cross-entropy loss. Preprocessing, tiling, stain normalization, tumor-region filtering (DeepLabv3), data partitioning, and evaluation metrics were identical across all models (Supplementary Table S5).
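The Max-Pooling and Mean-Pooling baselines can be contrasted with the attention pooling used in AB-MIL through a minimal NumPy forward pass over frozen patch embeddings. This sketch assumes the non-gated attention form, with illustrative parameters. It is not the CLAM or TransMIL implementation, and the trainable classifier applied to the pooled embedding is omitted.

```python
import numpy as np


def mean_pool(h):
    """Mean-Pooling: average the frozen patch embeddings of one slide, shape (n_patches, d)."""
    return h.mean(axis=0)


def max_pool(h):
    """Max-Pooling: element-wise maximum over patches."""
    return h.max(axis=0)


def attention_pool(h, V, w):
    """Attention pooling (non-gated AB-MIL form).

    a_k = softmax_k(w^T tanh(V h_k)); slide embedding z = sum_k a_k h_k.
    V: (hidden, d) and w: (hidden,) are the trainable parameters of the head.
    """
    scores = np.tanh(h @ V.T) @ w      # one scalar score per patch
    scores = scores - scores.max()     # numerical stability before exponentiation
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ h, a
```

When all attention scores are equal, attention pooling reduces to Mean-Pooling. Learned attention instead lets some patches dominate the slide embedding without updating the frozen encoder. By contrast, the proposed model fine-tunes the encoder itself.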

In contrast to these pooling- and MIL-based baselines that operate on frozen patch embeddings, our model fine-tunes the feature encoder jointly with the classifier under WSI-level supervision. This joint learning strategy has been described as “classical weakly supervised learning” by Laleh et al.32 and has been shown to yield biologically relevant predictions even without explicit patch-level annotations. Moreover, our approach enables downstream interpretability through Grad-CAM visualization, highlighting tissue regions most influential for the final subtype prediction and providing finer, pathology-aligned insights beyond those offered by decoupled MIL frameworks.

By leveraging both microscopic cellular morphology and macroscopic tissue patterns in a unified pipeline, our model captures multi-scale histopathological features critical for endometrial cancer molecular subtyping. This interpretable design supports integration into future workflows where explainability and traceability are paramount (Fig. 5).

Fig. 5: Deep learning pipeline.

The whole slide image (WSI) was cut into 1024 × 1024-pixel tiles, and non-tissue tiles were excluded using the histogram method. The tissue tiles were passed to a DeepLabv3 model and classified into tumor regions (red) and normal regions (blue). The tumor tiles were color-normalized using the Vahadane method and grouped into mini-batches. Our proposed prediction network, based on EfficientNetV2, was trained to capture global shape and local texture features from the tiles and outputs tile-level prediction probabilities for the four molecular subtypes. Hover-Net, a deep learning model for nuclear segmentation and classification, was used to analyze multiple features within the ROIs of the 364 WSIs from the Fudan cohort. A WSI-level feature set was established by aggregating the tile-level features, including basic information, morphological features, texture features, and spatial distribution. MSI-H microsatellite instability–high. NSMP no specific molecular profile. p53abn abnormal cellular tumor antigen p53 expression.

Patch-level visualization of the morpho-molecular correlates

To visualize the morphological features related to molecular subtype, we used gradient-weighted class activation mapping (Grad-CAM). Grad-CAM weights the feature maps of a convolutional layer by the spatially averaged gradients of the class score with respect to those maps. It then combines the weighted maps into a class-specific heatmap for each input patch. These heatmaps show the areas of the patch that the CNN relied on when predicting the molecular subtype, with red indicating high activation and blue indicating low activation. We selected representative high-confidence patches from different WSIs and compared the histological features of their high-activation areas, linking the molecular subtypes to the morphological features of endometrial cancer.
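The Grad-CAM computation can be sketched in NumPy as below. The sketch assumes that the feature-map activations and the gradients of the class score have already been obtained from an autograd framework for the chosen layer. The specific layer used in this study is not stated here. Upsampling the map to the 1024 × 1024 patch and color-mapping it are omitted.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map for one class.

    activations: (K, H, W) feature maps A^k of the chosen convolutional layer.
    gradients:   (K, H, W) gradients dy^c / dA^k of the class score y^c.
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))                              # alpha_k^c: averaged gradients
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k^c A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The ReLU keeps only locations that increase the class score, which is why the heatmaps highlight evidence for the predicted subtype rather than against it.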

Correlations between single-cell nuclear features and molecular subtypes

In addition to these patch-level heatmaps, we used Hover-Net to segment and classify the nuclei of all cells (tumor, inflammatory, and stromal cells) within the tumor-enriched areas of each WSI. Fremond et al.11 focused on nuclear features from the 20 most-attended patches. In contrast, our approach analyzed all cells across the entire tumor compartment, encompassing a much larger and more representative cellular population per case. From the segmented nuclei, we extracted 84 interpretable features per cell, organized into three categories: morphological, texture, and spatial distribution metrics (Supplementary Fig. S3). Morphological features included area, perimeter, eccentricity, elongation, and boundary curvature, which capture aspects of cellular growth, shape regularity, and surface smoothness. Texture features such as angular second moment (ASM), contrast, entropy, and correlation quantified intranuclear homogeneity or heterogeneity. To assess spatial architecture, we applied Delaunay triangulation to nuclear centroids. From the triangulation, we calculated metrics such as triangle area, perimeter, and angular variance to reflect cell packing density and spatial uniformity (Supplementary Table S5).
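The Delaunay-based spatial metrics can be sketched with `scipy.spatial.Delaunay`. The sketch computes per-triangle area, perimeter, and the difference between the largest and smallest interior angles, which is the angle metric cited in the Results. It is an illustration assuming centroids in pixel coordinates. It is not the PyPathomics implementation, and the function name is hypothetical. Slide-level features would then be summaries such as the mean of each array.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_features(centroids):
    """Per-triangle area, perimeter, and (max - min) interior angle in degrees.

    centroids: (n, 2) nucleus centers; needs at least 3 non-collinear points.
    """
    pts = np.asarray(centroids, dtype=float)
    tri = pts[Delaunay(pts).simplices]                  # (n_triangles, 3, 2)
    a = np.linalg.norm(tri[:, 1] - tri[:, 2], axis=1)   # side opposite vertex 0
    b = np.linalg.norm(tri[:, 0] - tri[:, 2], axis=1)   # side opposite vertex 1
    c = np.linalg.norm(tri[:, 0] - tri[:, 1], axis=1)   # side opposite vertex 2
    v1, v2 = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    area = 0.5 * np.abs(v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0])
    # Interior angles from the law of cosines, clipped to guard against rounding error.
    ang_a = np.degrees(np.arccos(np.clip((b**2 + c**2 - a**2) / (2 * b * c), -1, 1)))
    ang_b = np.degrees(np.arccos(np.clip((a**2 + c**2 - b**2) / (2 * a * c), -1, 1)))
    ang_c = 180.0 - ang_a - ang_b
    angles = np.stack([ang_a, ang_b, ang_c], axis=1)
    return {
        "area": area,
        "perimeter": a + b + c,
        "angle_range": angles.max(axis=1) - angles.min(axis=1),
    }
```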

We computed molecular subtype-specific correlations between these cell-level features and the predicted labels across the entire training dataset. For clarity and conciseness in the main text, we reported the top 20 features showing the strongest correlations with specific molecular subtypes. However, the full feature matrix across all cases is available in the supplementary appendix, ensuring transparency and enabling downstream reproducibility.

For interpretability, we considered correctly classified WSIs and computed a patch score as the softmax probability for the WSI’s ground-truth subtype. Within each subtype, patches with score ≥0.95 were retained; if fewer were available, additional patches were added in descending score order to reach the target, enforcing ≤2 patches per WSI. Two board-certified gynecologic pathologists (≥5 years’ experience) then independently reviewed the candidates (blinded; structured checklist with optional free-text) and selected 20 representative patches per subtype for qualitative illustration. These qualitative evaluations were used to cross-check whether the model-prioritized features aligned with human-observed subtype distinctions.
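The candidate-selection rule can be sketched as follows. `target` is a hypothetical parameter because the target count is not stated above. The function name and the tuple layout are illustrative, and the inputs are assumed to come from correctly classified WSIs of a single subtype.

```python
from collections import defaultdict


def select_candidate_patches(patches, target, threshold=0.95, max_per_wsi=2):
    """Pick illustration candidates for one subtype.

    patches: iterable of (wsi_id, patch_id, score), where score is the softmax probability
    of the WSI's ground-truth subtype. All patches scoring >= threshold are kept; then
    lower-scoring patches are added in descending order until `target` is reached.
    No WSI contributes more than `max_per_wsi` patches.
    """
    per_wsi = defaultdict(int)
    selected = []
    for wsi_id, patch_id, score in sorted(patches, key=lambda p: p[2], reverse=True):
        if per_wsi[wsi_id] >= max_per_wsi:
            continue  # this slide already contributed its maximum
        if score >= threshold or len(selected) < target:
            selected.append((wsi_id, patch_id, score))
            per_wsi[wsi_id] += 1
        else:
            break  # remaining patches are below threshold and the target is met
    return selected
```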

Together, this whole-slide, single-cell feature analysis provides a reproducible framework for exploring morpho-molecular correlates at scale. While individual correlations may remain modest, the breadth and transparency of the extracted features enable future meta-analyses and facilitate integration with prospective biological validation efforts.