Abstract
Contemporary glioma diagnosis integrates molecular features (e.g., IDH, 1p/19q) with histopathology to guide clinical decision-making. However, divergent imaging protocols and variable molecular testing standards across institutions result in pervasive data heterogeneity in multi-center studies. These inconsistencies manifest as incomplete imaging sequences and missing annotations, hindering the development of robust AI-driven diagnostic frameworks. To address this, we propose SSL-MISS-Net (Self-Supervised Learning with MIssing-label encoding and Semantic Synthesis), a unified framework that simultaneously tackles input-side modality incompleteness via cross-modal self-supervised learning and output-side annotation deficiencies through a missing-label synergistic strategy, thereby reducing reliance on complete data. To our knowledge, this is the first study to jointly address both challenges, effectively unlocking the diagnostic potential of imperfect clinical data. We evaluated SSL-MISS-Net with five-fold cross-validation and two independent test sets on multi-center cohorts (six in-house datasets, three public repositories; N = 2238). Compared with AHI, the second-best method, SSL-MISS-Net achieved significant accuracy gains of 4% (validation) and 10% (test) for integrated glioma diagnosis. Moreover, the framework expanded the amount of clinically usable data by 256% and consistently outperformed state-of-the-art methods trained on complete data. These results demonstrate SSL-MISS-Net’s clinical translatability and exceptional resilience to data imperfections in neuro-oncology AI diagnostics.
Introduction
Gliomas are the most common malignant tumors of the Central Nervous System (CNS), accounting for approximately 80% of all CNS malignancies1. Diffuse gliomas represent the majority of adult primary brain tumors. The fifth edition (2021) of the World Health Organization Classification of Tumors of the Central Nervous System (WHO CNS5) stratifies diffuse gliomas into three categories: (1) oligodendroglioma, IDH-mutant and 1p/19q co-deleted; (2) astrocytoma, IDH-mutant; and (3) glioblastoma, IDH wildtype2. WHO CNS5 establishes integrated diagnostics combining specific molecular features (IDH mutation status, 1p/19q co-deletion) with histopathological diagnostics, emphasizing the significance of molecular features in tumor progression, therapeutic strategies, and prognosis. This shift marks the era of integrated diagnosis in the clinical evaluation of gliomas.
Existing studies have demonstrated marked differences in overall survival and therapy response among gliomas of different CNS5 categories3,4. Clinically, integrated diagnosis of gliomas is a time-consuming, labor-intensive, and costly task. The diagnostic process demands significant expertise and advanced equipment, which may be unavailable in resource-limited settings.
MRI is the most commonly used imaging modality for gliomas5, providing high-resolution, non-invasive imaging with substantial clinical diagnostic value. Among the MRI sequences commonly employed for preoperative evaluation, T1-weighted contrast-enhanced (T1C) imaging and T2-fluid-attenuated inversion recovery (FLAIR) are the most frequently utilized. T1C sequences help visualize the tumor core, whereas FLAIR sequences effectively delineate the tumor infiltration zone.
Most current methods6,7,8,9,10 are optimized under the assumption of complete image sequences and labels. In clinical practice, variations in imaging protocols and inconsistent molecular testing standards have resulted in incomplete image sequences and missing labels across datasets from different institutions. These deficiencies result in the underutilization of valuable clinical samples, substantially hindering the development of robust predictive models and their translation into clinical practice.
To address these challenges and explore potential solutions, we present a novel Self-Supervised Learning with MIssing label and Semantic Synthesis Network (SSL-MISS-Net). This is the first unified architecture addressing the challenges of partial imaging sequences and missing labels in glioma MRI diagnosis, ensuring that imperfect data can still contribute to model training, as illustrated in Fig. 1a–e. Our primary objective is to enhance AI-driven integrated diagnosis of gliomas by unlocking the diagnostic potential of imperfect clinical data. To validate methodological efficacy, we implemented cross-validation and independent testing on nine-center cohorts (six in-house datasets, three public repositories; N = 2238). Experimental results demonstrate that our framework systematically transforms imperfect clinical data into valuable network training resources, significantly enhancing model generalization capability and diagnostic robustness.
a Illustrates the proposed SSL-MISS-Net architecture. Self-supervised and Self-attention Learning (SSL) consists of two main components: the pretrained Self-Supervised Learning (b) and the Transformer-based Bimodal Feature Coupling Module (c). The Self-Supervised Learning employs a dynamic full-masking strategy to effectively explore inter-modality correlations and establish robust cross-modal mapping networks. The Transformer-based Bimodal Feature Coupling Module (BFCM) dynamically regulates modality feature weights via attention mechanisms, reinforcing salient bimodal features. SSL-MISS-Net establishes a novel Missing-label Synergistic-optimized Strategy (MISS) through the Labeling prior knowledge-based GCN Learning Module (d) and a multi-task loss function based on missing-label encoding (e). The multi-task loss function based on missing-label encoding (\({{\mathcal{L}}}_{{ML}}\)) optimizes missing-label samples through adaptive recalibration of their loss weights; the Labeling prior knowledge-based GCN Learning Module (LGLM) explores the dependency relationships between labels, enabling the network to accurately infer missing annotations. f Illustrates the number and distribution ratio of the nine data centers. g Shows the division of the dataset in this study.
Results
Study population
We collected MRI data from 2238 patients across nine data centers, including six in-house medical institutions and three public repositories. The public repositories include The Cancer Genome Atlas11 (TCGA), the Erasmus Glioma Database12 (EGD), and Brain Tumor Segmentation (BraTS) Challenge 2020 with molecular features and histopathological diagnostics information6. Meanwhile, we retrospectively collected data from six in-house medical institutions: Ruijin Hospital, Shanghai Jiao Tong University (RJ), Xinhua Hospital, Shanghai Jiao Tong University (XH), Tongji Hospital, Tongji University (TH), Huashan Hospital, Fudan University (HS), the First Affiliated Hospital of Fujian Medical University (FH), and the Affiliated Hospital of Southwest Medical University (SH). The characteristics of patient cohorts are summarized in Table 1. Figure 2 illustrates the patient inclusion criteria.
Image data acquisition and preprocessing
The data distribution across the nine centers is illustrated in Fig. 1f. To rigorously evaluate the robustness and generalizability of our method, we divided the nine datasets into a five-fold cross-validation set and two independent test sets in an 8:2 ratio. The five-fold cross-validation set consisted of 1791 patients used for model development and optimization. We established two independent test sets: (1) the Public Test Set, comprising BraTS (N = 40 cases), TCGA (N = 83 cases), and EGD (N = 97 cases), totaling 220 cases; and (2) the In-house Test Set, comprising Huashan Hospital, Fudan University (HS; N = 189 cases) and the remaining five smaller-sample centers (FH, TH, RJ, XH, SH; N = 38 cases), totaling 227 cases, as illustrated in Fig. 1g.
The MRI images used in this study underwent a series of preprocessing steps. First, all FLAIR and T1C sequences underwent skull stripping and bias field correction to remove radiofrequency pulse inhomogeneity. Next, we applied UDA-GS13, our previously developed framework, to segment the MRI data of each patient and obtain segmentation masks. Since multi-center imaging heterogeneity is most pronounced in the background regions, we selected the tumor region of the corresponding axial slice, based on the segmentation masks, as the model input to reduce the impact of this heterogeneity on downstream prediction tasks. For computational efficiency, all images were resized to 256 × 256, ensuring uniform aspect ratios. We then applied data augmentation techniques, including random scaling and rotation, random expansion, and normalization, to improve the robustness of the model.
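As a concrete illustration of these slice-level steps, the sketch below crops the tumor ROI from the UDA-GS mask, resizes it to 256 × 256, and normalizes intensities; the function name, crop logic, and channel ordering are illustrative assumptions rather than the released implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess_slice(flair_slice, t1c_slice, mask_slice, out_size=256):
    """Crop the tumor region of an axial slice using the UDA-GS segmentation
    mask, resize to out_size x out_size, and z-score normalize each channel.
    Illustrative sketch; not the authors' released code."""
    ys, xs = np.nonzero(mask_slice)                      # tumor-region bounding box
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    channels = []
    for img in (flair_slice, t1c_slice):
        roi = img[y0:y1, x0:x1].astype(np.float32)
        roi = (roi - roi.mean()) / (roi.std() + 1e-8)    # intensity normalization
        roi = torch.from_numpy(roi)[None, None]          # 1 x 1 x H x W for interpolation
        roi = F.interpolate(roi, size=(out_size, out_size), mode="bilinear",
                            align_corners=False)
        channels.append(roi[0])
    return torch.cat(channels, dim=0)                    # 2 x 256 x 256 (FLAIR, T1C)
```

Random scaling, rotation, and expansion would then be applied on top of this output, e.g., via standard torchvision transforms.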
Integrated diagnostic results for gliomas
The diagnostic performance of our method on the cross-validation set is presented in Table 2: the prediction tasks for molecular features (IDH mutation status, 1p/19q co-deletion) and pathology types all achieved AUC values of 0.96. Figure 3a demonstrates that our method achieves superior overall performance compared with competing approaches, exhibiting the most extensive coverage area in the performance distribution. By contrast, the comparison methods exhibit imbalances between SPE and SEN. This indicates limited adaptability to complex data distributions and compromises their discriminative capacity between positive and negative samples (IDH mutant vs. IDH wild-type and 1p/19q co-deleted vs. 1p/19q intact). Furthermore, Fig. 3b reveals that our method achieves superior predictive performance for pathology types compared with other popular approaches. Notably, our method achieves 93% accuracy for glioblastoma with minimal confusion against astrocytoma and oligodendroglioma, further demonstrating its efficacy in pathology-type classification.
a Provides a multidimensional performance comparison of different deep learning models in predicting molecular features and pathology types. b Displays the confusion matrices for pathology types diagnosis, where diagonal elements represent the proportion of correctly classified samples, and off-diagonal elements indicate the proportion of misclassified samples.
Furthermore, our method consistently achieved high discriminative capability across the two independent test sets. As shown in Table 2, the AUC values for the three tasks reached 0.93, 0.92, and 0.91 in the Public Test Set, and the corresponding AUC values were 0.85, 0.84, and 0.81 in the In-house Test Set, significantly outperforming other methods. Additionally, as shown in Figs. 4b and 5b, our method achieved superior SPE-SEN performance over the entire range on both test sets compared with competing methods. From Figs. 4c and 5c, we observe that competing methods exhibit lower accuracy for predicting astrocytoma on both the Public Test Set and the In-house Test Set and experience higher confusion rates with the other two classes, indicating weaker discriminative ability. By contrast, our method significantly reduces inter-class confusion, attaining higher classification accuracy across all three classes. This demonstrates the best balance and the lowest inter-class confusion rate.
a Provides a multidimensional performance comparison of different deep learning models in predicting molecular features and pathology types. b Presents the ROC curves and AUC for molecular features diagnosis using different deep learning models. c Shows the confusion matrices for pathology types diagnosis.
a Provides a multidimensional performance comparison of different deep learning models in predicting molecular features and pathology types. b Presents the ROC curves and AUC for molecular features diagnosis using different deep learning models. c Shows the confusion matrices for pathology types diagnosis.
Analyzing results from both the validation and test sets, we conclude that our method demonstrates significant superiority in most prediction tasks. Although image sequences were incomplete during training, our approach effectively mitigates this through cross-modal self-supervised learning, enhancing adaptability to complex data distributions. Additionally, by employing a missing-label synergistic-optimized strategy (MISS), the method significantly improves the model’s ability to learn from samples with complex labels. Consequently, our model consistently outperforms competing methods. These findings indicate that our approach not only utilizes imperfect data more efficiently but also enhances robustness and effectiveness in glioma classification tasks.
Contribution of imperfect data
To demonstrate our study’s effectiveness, we trained and validated our model on four datasets: (1) a complete dataset (CD); (2) a dataset containing only complete labels (CL); (3) a dataset containing only complete modalities (CM); and (4) the full dataset, including samples with incomplete sequences or labels, as used by SSL-MISS-Net.
As illustrated in Fig. 6a, our method increased the amount of clinically available data by roughly 256% compared to the complete dataset. Moreover, compared to CL or CM, the available data volume increased by about 70%. The five-fold cross-validation results of our method across the different datasets are shown in Table 3. The results indicate that on the CD, our method outperforms the competing methods described earlier. Moreover, when applied to the full imperfect dataset, our method achieves significant improvements over its performance on the CD: the AUC values for molecular feature prediction increased by 5% and 3%, and the AUC for pathology types increased by 5%. As illustrated in Fig. 6b, our method improves classification balance and stability, achieving the best overall performance, particularly for pathology types. Even with an approximately 256% increase in data volume relative to the CD, our method effectively captures critical features from incomplete data, thereby enhancing its robustness and generalization ability.
a Illustrates the clinically available data volume under four conditions: CD, CL, CM, and SSL-MISS-Net. b Compares the multidimensional performance of our method in predicting molecular features and pathology types on the cross-validation dataset across various data distributions. c Presents the confusion matrix for pathology types predictions.
We randomly selected 165 cases from the Public Test Set and the In-house Test Set as the test set; the experimental results are shown in Table 3. Even with the substantial increase in incomplete data, our method effectively utilizes incomplete image sequences and labels through SSL and MISS, thereby significantly improving diagnostic accuracy. For molecular feature and pathology-type prediction, our model achieved AUC values of 0.89, 0.91, and 0.90, reflecting improvements of 6%, 7%, and 4%, respectively, compared to its performance on the CD. As shown in Fig. 7a, our method achieves superior stability and consistency across most prediction tasks. The AUC curves and corresponding enlarged subgraphs in Fig. 7b further confirm that our method significantly outperforms competing approaches. Notably, in the high-sensitivity region of the ROC curve for 1p/19q prediction, our method’s curve is positioned closer to the ideal point than those of competing methods. Furthermore, Fig. 7c indicates that our method achieves the best classification balance for pathology types, improving the classification performance for astrocytoma and glioblastoma.
a Shows the AUC distribution of the five-fold cross-validation models for predicting molecular features and pathology types. b Presents the ROC curves for molecular biomarker diagnosis along with a zoomed-in view of specific regions. c Displays the confusion matrices for pathology types diagnosis in the test set.
Contribution of different reconstruction losses
To evaluate the impact of different reconstruction losses on SSL-MISS-Net, and following prior studies on image reconstruction, we selected the L1 loss14,15,16, the SSIM loss17,18 and the SSIM-L1 joint loss19 for comparison against the MSE loss employed in SSL-MISS-Net. All networks were initialized identically to ensure fairness in comparison. We assessed the influence of these reconstruction losses on the final predictive tasks using both the Public Test Set and the In-house Test Set, with the experimental results summarized in Table 4 and Figs. 8, 9.
As shown in Table 4 and Figs. 8, 9a, b, MSE loss consistently outperforms methods based on alternative reconstruction losses across all three predictive tasks on both independent test sets. From the attention score distributions in Figs. 8, 9c, it can be observed that the reconstructed sequence features of MSE loss are more semantically aligned with the original sequences compared with those obtained using other reconstruction losses. The primary focus of this study is on capturing the structural-semantic features of missing sequences within multi-center data, rather than solely pursuing visual fidelity. By employing the MSE loss, SSL-MISS-Net places greater emphasis on structural consistency, while partially disregarding inter-center distributional differences, thereby enabling the pretrained Self-Supervised Learning to learn structural–semantic representations most relevant to the final predictive tasks.
Contribution of different methodology module
To evaluate the effectiveness of SSL-MISS-Net and the contribution of its modules, we performed ablation studies by removing individual modules. We tested the following model variations: (1) ONF: the method without the BFCM, where feature fusion is performed using a simple addition operation, to evaluate the importance of BFCM; (2) ONG: the method without the LGLM, to assess the impact of LGLM; and (3) the full SSL-MISS-Net. All networks were initialized identically to ensure fairness in comparison. The experimental results on five-fold cross-validation are presented in Fig. 10 and Table 5.
As illustrated in Table 5, the ONF exhibits a significant drop in predictive performance on the test sets. This decline is particularly evident in the In-house Test Set, where the AUC decreases by 4–9%. Similarly, the ONG also exhibits a significant drop in predictive performance, with the AUC decreasing by 2–7% across two independent test sets. As shown in Figs. 11a and 12a, the absence of either BFCM or LGLM not only reduces overall AUC values but also makes its AUC distribution more volatile. Figures 11b, c and 12b, c further demonstrate that SSL-MISS-Net significantly improves predictive performance across all tasks and achieves strong classification balance and robustness.
Discussion
Recent studies6,7,8,9,10 have achieved notable progress in predicting key molecular markers. Study6 utilized four imaging sequences (T1, T1C, T2, FLAIR) from a single-center dataset comprising 439 patients to predict IDH status, achieving an AUC of 0.9. In a multi-center study, study7 utilized dual imaging sequences (T1C, FLAIR) of 1166 patients from three institutions for IDH status prediction, with AUCs ranging from 0.86 to 0.96. Study9 utilized four imaging sequences of 1748 patients from sixteen institutions for IDH status prediction, achieving an AUC of 0.9 on an independent test set. Prior studies20,21,22 have employed multi-task networks to predict glioma genetic and histologic features. Study20 utilized four modalities from a single-center dataset comprising 120 patients to develop a multi-task network for glioblastoma molecular feature prediction (IDH, 1p/19q, etc.), achieving a mean prediction accuracy of 0.819. Study21 developed a 3D UNet-based multi-task network utilizing four imaging sequences from a multi-center dataset containing 348 patients to predict tumor grade and IDH status, achieving a mean prediction accuracy of 93.55%; however, this work does not predict the 1p/19q co-deletion status required by WHO CNS5. Meanwhile, study22 utilized four modalities from a multi-center dataset comprising 738 patients to predict glioma genetics (IDH and 1p/19q status), achieving a mean prediction AUC of 0.89. While these studies demonstrate significant advances in predicting the molecular features of gliomas, they universally depend on complete imaging sequences and label information, presenting two critical limitations: (1) Data Scarcity Constraints: few studies exceed 2000 cases due to glioma’s low incidence rate, resulting in inherent data scarcity that constrains model generalizability; (2) Integrated Diagnostic Complexity: WHO CNS5 mandates integrated diagnostics combining specific molecular features with histopathological diagnostics, posing significant challenges for clinical data acquisition and AI model development.
Extensive efforts have been devoted to addressing the challenge of incomplete image sequences. One straightforward approach involves training a “dedicated” sub-model for each possible subset of sequences23,24,25,26. To enhance model performance, a common strategy is co-training27, wherein features from a full-modality network are distilled into networks designed for incomplete image sequences. Another approach involves synthesizing the missing image sequences to enable full-modality processing28,29; these approaches often leverage generative adversarial networks (GANs)30. A further line of work projects the available sequences into a common latent feature space to learn shared feature representations. Nevertheless, the above approaches often require designing separate network architectures for each available modality and implementing complex feature interaction mechanisms31,32,33,34, leading to high computational and spatial overhead. To address the issue of missing labels, the primary approach has been to leverage semi-supervised learning to improve model training with incompletely labeled data. Semi-supervised learning methods for medical imaging can generally be categorized into self-training-based methods34,35 and graph-based methods36. Recent studies have largely focused on single recognition tasks, yet clinical diagnostics frequently require concurrent prediction of multiple clinical indicators. Existing studies have primarily focused on addressing either input-side (imaging sequences) or output-side (annotation) challenges in isolation. To date, no unified framework addresses concurrently incomplete imaging sequences and imperfect annotations. We therefore propose SSL-MISS-Net to synergistically resolve these dual challenges while exploiting the modeling potential of imperfect data, transforming it into a valuable training resource.
In this study, we introduce SSL-MISS-Net (Self-Supervised Learning with MIssing-label encoding and Semantic Synthesis), a novel framework for glioma diagnosis designed to address two challenges: incomplete imaging sequences and imperfect annotations, enabling incomplete clinical data to contribute effectively to model training. SSL-MISS-Net significantly increases clinically available data by 256% (compared to complete data) through SSL and MISS, making it, to our knowledge, the only AI study achieving integrated diagnosis (IDH mutation status, 1p/19q co-deletion, and pathology types) of glioma on a dataset of over 2000 cases. Validated on a multi-center cohort (N = 2238 patients from nine institutions), our method achieved an average AUC of 0.96 for molecular features and pathology types in cross-validation. On the two independent test sets, our method achieved average AUCs of 0.92 and 0.83. The experimental results demonstrate that the proposed method effectively captures key features from both complete and incomplete data, enabling incomplete data to contribute to model training and thereby significantly enhancing the model’s generalization capacity and robustness.
Although SSL-MISS-Net demonstrates overall superior performance compared with current state-of-the-art methods, prediction errors still occur in cases with low image quality or complex tumor characteristics, as illustrated in Fig. 13. Such cases typically suffer from motion artifacts, low signal-to-noise ratios, or image corruption, which result in blurred images or reduced contrast between the tumor and surrounding tissue, making it difficult for the model to capture critical discriminative features. Moreover, the current gold standard for molecular features and pathology types is established through histological and molecular analyses of biopsy samples from the tumor region. Given intratumoral heterogeneity, however, biopsy samples may not fully represent the molecular characteristics of the entire tumor. As a result, in certain cases the gold standard itself may contain a degree of uncertainty, which could influence the evaluation of the model.
Despite these challenges, SSL-MISS-Net achieves stable and high-accuracy predictions of molecular subtypes and pathological types under real clinical conditions characterized by multi-center data, missing modalities, and incomplete labels. This has important implications for the clinical management of gliomas. Traditional methods generally require complete modality inputs for prediction, whereas SSL-MISS-Net is designed to maintain robust performance even when modalities are missing. Furthermore, SSL-MISS-Net enables the inclusion of a large amount of clinical data that would otherwise be excluded due to incomplete modalities or annotations, thereby substantially expanding the pool of clinically usable data.
By simultaneously addressing the issues of missing modalities and limited data utilization, SSL-MISS-Net not only improves algorithmic performance on ideal datasets but also enhances generalization and deployment value in real-world clinical settings, making the overall modeling strategy more aligned with clinical practice needs. In actual workflows, SSL-MISS-Net can be integrated as a downstream module of existing automatic segmentation models, directly using their tumor ROI outputs to perform molecular and pathological subtype predictions. Importantly, it can still provide reliable results even in the presence of missing modalities. This feature enables seamless integration into existing hospital intelligent diagnostic systems, allowing for rapid preoperative predictions immediately after imaging acquisition, thereby assisting radiologists and neurosurgeons in accelerating diagnostic decision-making.
It is worth discussing the limitations of this study: Firstly, our method requires preprocessing steps to extract tumor-region slices from MRI images before inputting them into the model. In future work, we aim to further refine our approach by developing an end-to-end model that can simultaneously perform tumor segmentation. In addition, we plan to incorporate more advanced imaging techniques and explore the integration of heterogeneous data sources, such as additional MRI sequences, to further enrich the model’s inputs. Finally, we propose to integrate our method into foundation models to further enhance the generalization ability and clinical application of foundation models in complex healthcare environments.
Methods
Method overview
The structure of SSL-MISS-Net is illustrated in Fig. 1a. SSL-MISS-Net simultaneously predicts IDH mutation status, 1p/19q co-deletion, and pathology types, in line with the WHO CNS5 diagnostic criteria. SSL-MISS-Net is composed of a ResNet-based image feature extractor, a Self-Supervised and Self-Attention Learning (SSL) module, and a Missing-label Synergistic-optimized Strategy (MISS), jointly designed to address the challenges of incomplete imaging sequences and missing annotations in multi-center glioma diagnosis. SSL consists of two main components: the pretrained Self-Supervised Learning and the Transformer-based Bimodal Feature Coupling Module (BFCM). Building on the proven success of self-supervised learning paradigms in foundation models, the Self-Supervised Learning employs a dynamic full-masking strategy during pretraining to enable robust feature learning from complete imaging sequences. Specifically, it randomly selects either the T1C or the FLAIR sequence from multi-center data for dynamic full masking and applies a Mean Squared Error (MSE) loss to force the network to reconstruct the masked features from the unmasked sequence. This strategy enhances the model’s ability to capture cross-modal structural consistency within the multi-center semantic space, thereby facilitating the learning and optimization of structural-semantic representations most relevant to the final prediction task, as illustrated in Fig. 1b.
When encountering missing imaging sequences in the input data, SSL-MISS-Net activates the pretrained Self-Supervised Learning to achieve real-time completion of absent sequences. Notably, during the training of SSL-MISS-Net, the pretrained Self-Supervised Learning remains frozen. This design preserves its strong cross-modal modeling capability and prevents the loss of cross-modal consistency learned during pretraining. This adaptive compensation mechanism enables unified learning of deep feature representations from both complete and partial-sequence data, ensuring anatomical consistency in diagnostic decisions even under missing-data scenarios. During image feature extraction, we designed a Transformer-based BFCM to more effectively couple multi-center bimodal image features. Utilizing the self-attention mechanism37,38, BFCM explores inter-modality correlations, adaptively adjusts the contribution of each modality, reinforces salient bimodal regions with strong structural consistency, and attenuates non-salient representations.
To address missing labels, SSL-MISS-Net establishes a novel MISS by multi-task loss function based on missing-label encoding with the Labeling prior knowledge-based GCN Learning Module (LGLM). LGLM constructs word embeddings and adjacency matrices for molecular features and pathology types labels, and employs a GCN to learn and optimize dependencies between the labels. The multi-task loss function based on missing-label encoding enhances model robustness to partially annotated samples by assigning designated label encodings to missing-label instances and adaptively reweighting negative sample losses. This loss function reduces penalties for potential false-negative cases, thereby strengthening the network’s capacity to leverage label-deficient data while maintaining diagnostic decision consistency.
Self-supervised learning
In the pretraining phase, SSL-MISS-Net optimizes only the Self-Supervised Learning, which adopts a cascaded architecture of a ResNet34 encoder and a UNet decoder. To address the practical challenge of randomly missing T1C or FLAIR sequences in clinical imaging, and inspired by recent advances in self-supervised learning for visual foundation model representation learning, we design a dynamic full-masking strategy. This strategy randomly selects and completely masks one sequence, thereby forcing the network to reconstruct the missing features by leveraging spatial correlations and structural-semantic relationships across sequences. In addition, we employ the MSE loss39 to guide optimization, encouraging the network to focus more on spatial features and significantly enhancing its ability to model structural consistency between reconstructed features and the target modality. Consequently, the pretrained Self-Supervised Learning is able to partially disregard inter-center distribution discrepancies and instead learn structural-semantic representations more closely aligned with the target task, as illustrated in Fig. 1b.
During the SSL-MISS-Net training phase, whenever missing modalities are detected in the input data, the system activates the frozen pretrained Self-Supervised Learning to achieve real-time sequence completion. To ensure stability and robustness, the Self-Supervised Learning is kept frozen rather than fine-tuned. This design preserves the cross-modal structural features learned during pretraining, prevents the loss of anatomical representations, avoids overfitting under limited complete data, and simultaneously reduces training complexity by lowering computational cost and parameter updates.
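A minimal sketch of one pretraining step under the dynamic full-masking strategy is given below, assuming `model` is the ResNet34-encoder/UNet-decoder mapping a two-channel (T1C, FLAIR) input to a two-channel reconstruction; the names and channel ordering are illustrative.

```python
import torch
import torch.nn.functional as F

def pretrain_step(model, batch, optimizer):
    """One self-supervised step: fully mask a randomly chosen sequence and
    reconstruct it from the remaining one with MSE. `batch` has shape
    (B, 2, H, W) with channel 0 = T1C and channel 1 = FLAIR (assumed order)."""
    idx = torch.arange(batch.size(0))
    drop = torch.randint(0, 2, (batch.size(0),))     # per-sample masked channel
    masked = batch.clone()
    masked[idx, drop] = 0.0                          # dynamic full masking
    recon = model(masked)
    # MSE on the masked channel only, forcing a cross-modal mapping
    loss = F.mse_loss(recon[idx, drop], batch[idx, drop])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```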
Transformer-based bimodal feature coupling module
The restored dual-modality data is subsequently processed through an encoder for foundational feature extraction, then fed into a Transformer-based BFCM. This module explores inter-modality correlations, adaptively adjusts modality contributions, reinforces salient bimodal regions with strong structural consistency, and attenuates non-salient representations.
The BFCM consists of two main components: the correlated feature extraction layer and the modality attention layer. First, the current image features are passed into the correlated feature extraction layer, where features are mapped into tokens and fed into a self-attention layer to learn the correlations between modalities. These features are then passed to the modality attention layer, where a Softmax operation generates modality-specific weight maps. These weight maps are used to couple with the corresponding features, constructing a shared feature representation. This process is shown in Fig. 1c.
Correlated feature extraction layer
This layer primarily employs a self-attention mechanism. Given the input features \({f}_{k}\), we first reshape them into a one-dimensional format, yielding the corresponding token set \({t}_{k}\in {R}^{B\times 512\times {HW}}\). The tokens are then concatenated to construct a unified token representation \(t\in {R}^{B\times 2{HW}\times 512}\).
After obtaining the unified token representation \(t\), we apply multi-layer self-attention to capture underlying modality correlations. Each self-attention layer comprises a multi-head self-attention (MHSA) and a fully connected feed-forward network (FFN). The multi-layer self-attention can be described as Eqs. (1–2).
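Consistent with these definitions, each layer takes the standard residual form (whether normalization precedes or follows each sub-layer is an assumption here):

\({t}_{z}^{\prime}=\mathrm{MHSA}({LN}({t}_{z-1}))+{t}_{z-1}\)  (1)

\({t}_{z}=\mathrm{FFN}({LN}({t}_{z}^{\prime}))+{t}_{z}^{\prime}\)  (2)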
where \(z\) denotes the \(z\)-th self-attention layer and \({LN}\) stands for Layer Normalization. Finally, we obtain the output \({f}_{{SA}}={t}_{Z}\in {R}^{B\times 2{HW}\times 512}\) from the multi-layer self-attention. We then map \({f}_{{SA}}\) back to the image feature space, yielding the output of the correlated feature extraction layer, denoted \({I}^{\prime}\). The corresponding formula is Eq. (3).
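Given the operations defined below, Eq. (3) plausibly takes the form:

\({I}^{\prime}=\mathrm{Re}({Split}({f}_{{SA}}))\)  (3)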
where \(\mathrm{Re}\left(\cdot \right)\) and \({Split}\left(\cdot \right)\) denote the reshape and split operations, respectively. \({I}^{\prime}\) is the feature set \({f}_{k}^{\prime}\in {R}^{B\times 512\times H\times W}\) computed by the correlated feature extraction layer, which is of the same size as the original input \({f}_{k}\).
Modality attention layer
After obtaining the feature set \({I}^{\prime}\) from the correlated feature extraction layer, we apply an attention mechanism to produce modality-specific weight maps \({W}_{k}\). Because feature representations extracted from different modalities warrant different fusion weights, we introduce a dual-modal Softmax function to generate the weight maps, as shown in Fig. 3.
Finally, we multiply the input features by their corresponding weight maps \({W}_{k}\) and sum over all modalities to obtain the fused feature map \({f}_{{im}}\). The corresponding formula is Eq. (4).
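Consistent with this description, Eq. (4) can be written as an element-wise weighted sum over the two modalities:

\({f}_{{im}}={\sum }_{k}{W}_{k}\odot {f}_{k}^{\prime}\)  (4)

where \(\odot\) denotes element-wise multiplication.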
Since the \({W}_{k}\) sum to 1, the value range of \({f}_{{im}}\) remains stable. Additionally, we retain \({v}_{k}\) in the corresponding weights to indicate the relative magnitude of the latent multi-modal correlations derived from the modality attention layer. This step further enhances the robustness of the data fusion process.
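An illustrative PyTorch sketch of the BFCM forward pass (correlated feature extraction via multi-layer self-attention, followed by modality-wise Softmax weighting) is shown below; the layer count, head count, and the exact derivation of the weight maps are assumptions rather than the released configuration.

```python
import torch
import torch.nn as nn

class BFCM(nn.Module):
    """Bimodal Feature Coupling Module (illustrative sketch)."""
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, f_t1c, f_flair):
        B, C, H, W = f_t1c.shape
        # Tokenize: (B, 2HW, C), concatenating both modalities' spatial tokens
        t = torch.cat([f.flatten(2).transpose(1, 2) for f in (f_t1c, f_flair)], dim=1)
        t = self.attn(t)                               # correlated feature extraction
        f1, f2 = t.split(H * W, dim=1)                 # back to per-modality features
        f1 = f1.transpose(1, 2).reshape(B, C, H, W)
        f2 = f2.transpose(1, 2).reshape(B, C, H, W)
        # Modality attention: Softmax over the two modalities at each location,
        # so the weight maps W_k sum to 1
        w = torch.softmax(torch.stack([f1, f2]), dim=0)
        return w[0] * f1 + w[1] * f2                   # fused feature map f_im
```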
Labeling prior knowledge-based GCN learning module
By analyzing the latest WHO CNS5 classification criteria, we observe a mutual relationship between molecular features and pathology types. Based on this analysis, and inspired by research40,41,42, we propose the LGLM, as shown in Fig. 1d. We construct a directed graph derived from existing data on molecular features and pathology types. Each graph node (label) is represented by a label word embedding. We construct the adjacency matrix by capturing mutual exclusivity and dependency relationships among labels. Notably, label nodes of the same subtype exhibit mutual exclusivity, whereas label nodes of different subtypes display both exclusivity and dependency. Subsequently, we apply a GCN to map these labels into a set of mutually dependent label features. Finally, these features are coupled with the fused image features to enhance diagnostic accuracy.
Correlation matrix
GCN works by propagating information between nodes based on the correlation matrix. Therefore, constructing an effective correlation matrix is a key challenge for GCN. In this study, we define label dependencies by mining the co-occurrence patterns of molecular features and pathology types. This approach captures the relationships among different labels, ensuring that the correlation matrix effectively represents their interactions and dependencies.
We model the label dependencies using conditional probabilities, where \(P({d}_{j}\mid {d}_{i})\) denotes the probability that label \({d}_{j}\) is present given that label \({d}_{i}\) exists, as shown in Fig. 4. Notably, since certain labels exhibit mutual exclusivity, \(P({d}_{j}\mid {d}_{i})\ne P({d}_{i}\mid {d}_{j})\), rendering the resulting correlation matrix non-symmetric.
To build the correlation matrix, we first count the occurrences of paired labels in the dataset to obtain the label co-occurrence matrix \(M\in {R}^{l\times l}\), where \(l\) represents the number of classes and \({M}_{{ij}}\) denotes the number of co-occurrences of \({d}_{i}\) and \({d}_{j}\). We then derive the conditional probability matrix from \(M\). The corresponding formula is Eq. (5).
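From these definitions, Eq. (5) is the row normalization of the co-occurrence matrix:

\({P}_{{ij}}={M}_{{ij}}/{N}_{i}\)  (5)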
where \({N}_{i}\) represents the occurrence count of \({d}_{i}\), and \({P}_{{ij}}=P({d}_{j}\mid {d}_{i})\) denotes the probability of label \({d}_{j}\) being present given that label \({d}_{i}\) exists. Therefore, even if certain labels appear much more frequently in the dataset than others, their influence in the adjacency matrix is normalized. This prevents the relationships of less frequent labels from being overshadowed, ensuring a more balanced representation of label dependencies. This approach effectively alleviates the bias introduced by sample imbalance and enables the constructed graph to express label dependencies in a balanced and meaningful manner.
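A small sketch of this construction is given below, assuming a label matrix encoded with 1 = present, 0 = absent, and −1 = missing (the missing-label encoding value is an assumption):

```python
import numpy as np

def build_correlation_matrix(labels: np.ndarray) -> np.ndarray:
    """labels: (num_samples, num_classes) with 1 = present, 0 = absent,
    -1 = missing. Returns P with P[i, j] = P(d_j | d_i) = M_ij / N_i;
    missing entries simply do not contribute to the counts."""
    present = (labels == 1).astype(np.float64)
    M = present.T @ present                      # co-occurrence counts M_ij
    N = present.sum(axis=0)                      # occurrence count N_i of each label
    P = M / np.maximum(N[:, None], 1.0)          # row-normalize: P_ij = M_ij / N_i
    return P
```

Note that \(P\) is generally non-symmetric, matching the directed graph described above.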
GCN-based classifier learning
We construct a GCN to learn interdependent label features from the label space, denoted \(W=\{{w}_{i}{\}}_{i=1}^{l}\), where \(l\) denotes the number of classification categories. Specifically, we employ a simple two-layer GCN to yield the learned label features. The first GCN layer receives an input \(y\in {R}^{l\times d}\), where \(d\) denotes the dimensionality of the label word embeddings. After two GCN layers, the final output is the learned label features \({f}_{{la}}\in {R}^{l\times D}\), where \(D\) is the dimensionality of the extracted image features. Finally, the image features and label features are combined to yield the integrated diagnostic result. The corresponding formula is Eq. (6).
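Following the ML-GCN formulation cited above, Eq. (6) is plausibly an inner product of the learned label features with the (pooled) fused image feature:

\(\hat{y}={f}_{{la}}\cdot {f}_{{im}}\)  (6)

where \({f}_{{la}}\in {R}^{l\times D}\) acts as a set of label-aware classifiers applied to the image representation \({f}_{{im}}\in {R}^{D}\).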
Multi-task loss function based on missing-label encoding
In the field of multi-label learning on natural images, several previous works43,44,45 have adopted the strategy of treating missing labels directly as negative samples during training. Following this convention, we encode missing labels as negatives; however, such samples are potential False Negatives (FN), and it is important to distinguish them from True Negatives (TN)46. Typically, FN samples exhibit high confidence (prediction probability close to 1) in the early stages of training45,47. Based on these findings, we introduce a weighted MSE loss function based on missing-label encoding (\({{\mathcal{L}}}_{{WMSE}}\)) on the negative-sample side. \({{\mathcal{L}}}_{{WMSE}}\) adaptively re-weights negative samples according to their confidence, reducing the penalty on high-confidence negatives and encouraging the network to focus on “semi-hard negatives”. This weighting scheme suppresses the adverse influence of missing-label samples on model optimization, thereby improving discriminative capability and enhancing robustness to missing labels. In contrast to popular label correction strategies (e.g., pseudo-labeling48, regularized online label estimation49), \({{\mathcal{L}}}_{{WMSE}}\) provides a smooth and continuous adjustment of loss gradients for negative samples, which prevents the network from overfitting incorrect labels or collapsing during the early training phase. The corresponding formula is Eq. (7).
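With the weight coefficient defined below, Eq. (7) plausibly takes the weighted-MSE form of ref. 45, applied to samples encoded as negative (including missing-label instances):

\({{\mathcal{L}}}_{{WMSE}}(p)=w(p)\,{p}^{2}=(\lambda -p)\,{p}^{2}\)  (7)

where \(p=\sigma (x)\) is the predicted probability. Its gradient with respect to the logit peaks at intermediate \(p\) and vanishes as \(p\to 1\), which focuses learning on semi-hard negatives.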
where \(w(p)=\lambda -p\) is the weight coefficient, designed to dynamically adjust the loss weight for FN samples. Considering that the typical threshold for sigmoid-based binary classification is 0.5, we determine \(\lambda =1.5\) from the condition \(\frac{{\partial }^{2}{{\mathcal{L}}}_{{WMSE}}}{\partial {x}^{2}}=0\) at \(p=\sigma (x)=0.5\), thereby deriving \({{\mathcal{L}}}_{{WMSE}}\).
On the positive sample side, we likewise expect the network to focus on semi-hard samples in order to enhance its generalization and discriminative ability. Therefore, we extend the standard Focal Loss50 (\({{\mathcal{L}}}_{{Focal}}\)) by introducing a margin parameter, which guides the network to emphasize semi-hard positive samples while suppressing extreme outliers. This stabilizes gradient updates and reduces sensitivity to early prediction errors. Finally, we combine \({{\mathcal{L}}}_{{WMSE}}\) with \({{\mathcal{L}}}_{{Focal}}\) as the total loss function \({{\mathcal{L}}}_{{ML}}\), ensuring that the model achieves better robustness in multi-label learning, as illustrated in Fig. 1e. During training, \({{\mathcal{L}}}_{{ML}}\) regulates loss gradients from both the negative and positive sides, thereby enhancing training stability and generalization in scenarios with missing labels. This approach substantially improves the model’s generalization capability on complex data. The corresponding formulas are provided in Eqs. (8–9).
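With the margin-shifted probability defined below, a formulation of Eqs. (8–9) consistent with this description is (the focusing parameter \(\gamma\) follows the standard Focal Loss and is an assumption here):

\({{\mathcal{L}}}_{{Focal}}({p}_{m})=-{(1-{p}_{m})}^{\gamma }\log ({p}_{m})\)  (8)

\({{\mathcal{L}}}_{{ML}}={\sum }_{i=1}^{l}[{y}_{i}\,{{\mathcal{L}}}_{{Focal}}({p}_{m,i})+(1-{y}_{i})\,{{\mathcal{L}}}_{{WMSE}}({p}_{i})]\)  (9)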
where \({p}_{m}={\rm{\sigma }}\left({\rm{x}}-{\rm{m}}\right)\) and \(m\) is a margin parameter. Finally, the training procedure is detailed in Algorithm 1.
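An illustrative PyTorch implementation of \({{\mathcal{L}}}_{{ML}}\) under these definitions is sketched below, with \(\lambda =1.5\) as derived above; the values \(\gamma =2\) and \(m=1\), and the −1 encoding for missing labels, are assumptions.

```python
import torch

def ml_loss(logits, targets, lam=1.5, gamma=2.0, margin=1.0):
    """Multi-task loss with missing-label encoding (illustrative sketch).
    logits, targets: (B, num_labels); targets use 1 = positive, 0 = negative,
    -1 = missing (encoded as negative and down-weighted by the WMSE term)."""
    p = torch.sigmoid(logits)
    p_m = torch.sigmoid(logits - margin)              # margin-shifted probability
    pos = (targets == 1).float()
    neg = 1.0 - pos                                   # true negatives and missing labels
    # Positive side: focal loss with margin, emphasizing semi-hard positives
    loss_pos = -(1.0 - p_m).pow(gamma) * torch.log(p_m.clamp_min(1e-8))
    # Negative side: weighted MSE (lambda - p) * p^2, whose gradient vanishes for
    # high-confidence negatives that may be false negatives from missing labels
    loss_neg = (lam - p) * p.pow(2)
    return (pos * loss_pos + neg * loss_neg).mean()
```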
Algorithm 1
SSL-MISS-Net algorithm
Require: Pre-processed images \({x}_{k}\), ground truth \(g\), pretrained Self-Supervised Learning \({PSSL}\), label prior knowledge \({x}_{l}\), model parameters φ
1: for each epoch do
2:  if \({x}_{k}\) has missing image sequences then
3:   \({x}_{k}={PSSL}({x}_{k})\) ▷ Missing image sequence replacement
4:  end if
5:  \({f}_{k}=\mathrm{ResNet34}({x}_{k})\) ▷ Image feature extraction
6:  \({f}_{{im}}=\mathrm{BFCM}({f}_{k})\) ▷ Image feature fusion
7:  \({f}_{{la}}=\mathrm{LGLM}({x}_{l})\) ▷ Label dependency capture
8:  \(f={f}_{{im}}\times {f}_{{la}}\) ▷ Integrated diagnostics
9:  Compute the loss \({{\mathcal{L}}}_{{ML}}(f,g)\) ▷ Missing-label loss calculation
10: Update the network parameters φ with the Adam optimizer
11: end for
Study implementation and statistical methods
We implemented our approach in PyTorch 1.8 and trained and tested all models on a single Nvidia A10 Tensor Core GPU. During training, we used the Adam optimizer with a learning rate of 5e-5 and a batch size of 32. To validate the effectiveness of the IDH and 1p/19q predictions, we evaluated discriminative ability using four metrics: accuracy (\({ACC}\)), the area under the receiver operating characteristic curve (\({AUC}\)), specificity (\({SPE}\)), and sensitivity (\({SEN}\)). The corresponding formulas are Eqs. (10–13).
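These metrics take the standard definitions in this notation:

\({ACC}=\frac{{TP}+{TN}}{{TP}+{TN}+{FP}+{FN}}\)  (10)

\({AUC}=\frac{1}{{mn}}{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}\prod [{\hat{y}}_{i}^{+} > {\hat{y}}_{j}^{-}]\)  (11)

\({SPE}=\frac{{TN}}{{TN}+{FP}}\)  (12)

\({SEN}=\frac{{TP}}{{TP}+{FN}}\)  (13)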
where \(m\) and \(n\) denote the numbers of positive and negative samples, respectively. TP, TN, FP, and FN denote True Positives, True Negatives, False Positives, and False Negatives, respectively. \({\hat{y}}_{i}^{+}\) represents the predicted score of the \(i\)-th positive sample, \({\hat{y}}_{j}^{-}\) represents the predicted score of the \(j\)-th negative sample, and \(\prod\) is the indicator function, which equals 1 if the condition inside the brackets is true and 0 otherwise. Furthermore, since the pathology-type task involves three classes, we calculated \({{AUC}}_{p}\), \({{SPE}}_{p}\), and \({{SEN}}_{p}\) by iteratively treating each class as the positive class and all other classes as the negative class51. We implemented this using library functions from TorchMetrics, with the computational process detailed in Eqs. (14–16).
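Given that \({N}_{c}\) appears in the definitions, Eqs. (14–16) plausibly take the sample-weighted one-vs-rest form:

\({{AUC}}_{p}=\frac{{\sum }_{c}{N}_{c}\,{{AUC}}_{c}}{{\sum }_{c}{N}_{c}}\)  (14)

\({{SPE}}_{p}=\frac{{\sum }_{c}{N}_{c}\,{{SPE}}_{c}}{{\sum }_{c}{N}_{c}}\)  (15)

\({{SEN}}_{p}=\frac{{\sum }_{c}{N}_{c}\,{{SEN}}_{c}}{{\sum }_{c}{N}_{c}}\)  (16)

where the subscript \(c\) indexes the one-vs-rest metric computed with class \(c\) as the positive class.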
where \({N}_{c}\) denotes the sample count of the \(c\)-th class in the dataset.
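A hedged TorchMetrics example of these one-vs-rest computations follows; whether the 'weighted' (i.e., \({N}_{c}\)-weighted) or 'macro' averaging mode was used is an assumption:

```python
import torch
from torchmetrics.classification import (MulticlassAUROC,
                                         MulticlassSpecificity,
                                         MulticlassRecall)

probs = torch.softmax(torch.randn(227, 3), dim=1)   # placeholder class probabilities
labels = torch.randint(0, 3, (227,))                # placeholder ground-truth classes

auc_p = MulticlassAUROC(num_classes=3, average="weighted")(probs, labels)
spe_p = MulticlassSpecificity(num_classes=3, average="weighted")(probs, labels)
sen_p = MulticlassRecall(num_classes=3, average="weighted")(probs, labels)  # sensitivity
```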
We intuitively demonstrate diagnostic performance through multi-dimensional visualization analysis: radar charts reflect the balance across metrics; confusion matrices reveal class-specific misclassification rates; and integrated ROC curves evaluate diagnostic efficacy.
To verify the effectiveness and robustness of our method, we compared it with state-of-the-art approaches (AHI7, MTTU6, ST8, PSNet9 and MDL10) on both the cross-validation set and the two multi-center test sets. AHI7 utilizes ResNet34 to extract image features, integrating radiomic signatures and patient age for IDH and 1p/19q prediction. MTTU6 constructs multi-scale fusion networks using two distinct high-level features for IDH status prediction. ST8 leverages Swin Transformers to predict IDH status. PSNet9 employs nnU-Net for multi-scale feature extraction and integrates features across all levels to predict IDH status, 1p/19q status, and tumor grade. MDL10 modifies the original 3D ResNet34 architecture by replacing its final classification head with three task-specific classification heads to simultaneously predict IDH status, 1p/19q status, and tumor grade. All of these methods are representative approaches from recent years and rely on complete datasets.
We trained and validated our method on incomplete datasets, whereas the compared methods were trained and validated on complete datasets. This comparison not only highlights the necessity of our study but also verifies the effectiveness of the proposed approach. In addition, we performed ablation studies across diverse data distributions and framework components to demonstrate the robustness of the proposed method.
Data availability
All reasonable requests for academic use of preprocessed in-house and analyzed data can be addressed to the corresponding author. All requests are promptly reviewed to determine whether the request is subject to patient-confidentiality obligations, processed in concordance with institutional guidelines, and requires a material transfer agreement. The public datasets are available at the following links: 1. TCGA: https://www.cancerimagingarchive.net/ 2. BraTS: https://www.med.upenn.edu/cbica/brats2020/ 3. EGD: https://xnat.bmia.nl/REST/projects/egd
Code availability
The code used in this paper is available on GitHub under an Apache-2.0 license at: https://github.com/sd-spf/SSL-MISS-Net.
References
Messali, A., Villacorta, R. & Hay, J. W. A review of the economic burden of glioblastoma and the cost effectiveness of pharmacologic treatments. Pharmacoeconomics 32, 1201–1212, https://doi.org/10.1007/s40273-014-0198-y (2014).
Louis, D. N. et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro-Oncol. 23, 1231–1251, https://doi.org/10.1093/neuonc/noab106 (2021).
Eckel-Passow, J. E. et al. Glioma groups based on 1p/19q, IDH, and TERT promoter mutations in tumors. N. Engl. J. Med. 372, 2499–2508, https://doi.org/10.1056/NEJMoa1407279 (2015).
Nobusawa, S., Watanabe, T., Kleihues, P. & Ohgaki, H. IDH1 mutations as molecular signature and predictive factor of secondary glioblastomas. Clin. Cancer Res. 15, 6002–6007, https://doi.org/10.1158/1078-0432.Ccr-09-0715 (2009).
Cheng, J. H. et al. Multimodal disentangled variational autoencoder with game theoretic interpretability for glioma grading. IEEE J. Biomed. Health Inform. 26, 673–684, https://doi.org/10.1109/Jbhi.2021.3095476 (2022).
Cheng, J. H., Liu, J., Kuang, H. L. & Wang, J. X. A fully automated multimodal mri-based multi-task learning for glioma segmentation and IDH Genotyping. IEEE Trans. Med. Imaging 41, 1520–1532, https://doi.org/10.1109/Tmi.2022.3142321 (2022).
Choi, Y. S. et al. Fully automated hybrid approach to predict the mutation status of gliomas via deep learning and radiomics. Neuro-Oncol. 23, 304–313, https://doi.org/10.1093/neuonc/noaa177 (2021).
Wu, J. F. et al. Swin Transformer improves the IDH mutation status prediction of gliomas free of MRI-based tumor segmentation. J. Clin. Med. 11, 4625 https://doi.org/10.3390/jcm11154625 (2022).
van der Voort, S. R. et al. Combined molecular subtyping, grading, and segmentation of glioma using multi-task deep learning. Neuro-Oncol. 25, 279–289, https://doi.org/10.1093/neuonc/noac166 (2023).
Wu, X. W. et al. Biologically interpretable multi-task deep learning pipeline predicts molecular alterations, grade, and prognosis in glioma patients. Npj Precision Oncol. 8, 181 https://doi.org/10.1038/s41698-024-00670-2 (2024).
Bakas, S. et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 4, 1–13 (2017).
van der Voort, S. R. et al. The Erasmus Glioma Database (EGD): structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma. Data Brief. 37, 107191, https://doi.org/10.1016/j.dib.2021.107191 (2021).
Hu, Z. et al. UDA-GS: a cross-center multimodal unsupervised domain adaptation framework for glioma segmentation. Comput. Biol. Med. 185, 109472 (2025).
Sun, H. Z. et al. Fourier convolution block with global receptive field for MRI reconstruction. Med Image Anal. 99, 103349, https://doi.org/10.1016/j.media.2024.103349 (2025).
Zou, J. et al. MMR-Mamba: multi-modal MRI reconstruction with Mamba and spatial-frequency information fusion. Med. Image Anal. 102, 103549, https://doi.org/10.1016/j.media.2025.103549 (2025).
Zhang, Y. et al. Unified multi-modal image synthesis for missing modality imputation. IEEE Trans. Med. Imaging 44, 4–18, https://doi.org/10.1109/Tmi.2024.3424785 (2025).
Zhou, X. Y., Zhang, Z. X., Du, H. W. & Qiu, B. S. MLMFNet: a multi-level modality fusion network for multi-modal accelerated MRI reconstruction. Magn. Reson. Imaging 111, 246–255, https://doi.org/10.1016/j.mri.2024.04.028 (2024).
Xuan, K. et al. Multimodal MRI reconstruction assisted with spatial alignment network. IEEE Trans. Med. Imaging 41, 2499–2509, https://doi.org/10.1109/Tmi.2022.3164050 (2022).
Fang, F. M. et al. HFGN: high-frequency residual feature guided network for fast MRI reconstruction. Pattern Recogn. 156, 110801, https://doi.org/10.1016/j.patcog.2024.110801 (2024).
Tang, Z. Y. et al. Pre-operative overall survival time prediction for glioblastoma patients using deep learning on both imaging phenotype and genotype. Lect. Notes Comput Sc. 11764, 415–422, https://doi.org/10.1007/978-3-030-32239-7_46 (2019).
Xue, Z., Xin, B., Wang, D. & Wang, X. Radiomics-enhanced multi-task neural network for non-invasive glioma subtyping and segmentation. in International Workshop on Radiomics and Radiogenomics in Neuro-oncology, 81–90 (Springer International Publishing, 2019).
Decuyper, M., Bonte, S., Deblaere, K. & Van Holen, R. Automated MRI based pipeline for segmentation and prediction of grade, IDH mutation and 1p19q co-deletion in glioma. Comput. Med. Imaging Graph. 88, 101831, https://doi.org/10.1016/j.compmedimag.2020.101831 (2021).
Azad, R., Khosravi, N. & Merhof, D. SMU-Net: style matching U-Net for brain tumor segmentation with missing modalities. Proc. Mach. Learn Res. 172, 48–62 (2022).
Chen, C., Dou, Q., Jin, Y. M., Liu, Q. D. & Heng, P. A. Learning with privileged multimodal knowledge for unimodal segmentation. IEEE Trans. Med. Imaging 41, 621–632, https://doi.org/10.1109/Tmi.2021.3119385 (2022).
Hu, M. et al. Knowledge distillation from multi-modal to mono-modal segmentation networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention 772–781 (Springer International Publishing, 2020).
Wang, Y. X. et al. ACN: adversarial co-training network for brain tumor segmentation with missing modalities. In International conference on medical image computing and computer-assisted intervention, 410–420 (Springer International Publishing, 2021).
Blum, A. & Mitchell, T. Combining labeled and unlabeled data with co-training. Proceedings of the eleventh annual conference on Computational learning theory, 92-100, https://doi.org/10.1145/279943.279962 (1998).
Lee, D., Moon, W. J. & Ye, J. C. Assessing the importance of magnetic resonance contrasts using collaborative generative adversarial networks. Nat. Mach. Intell. 2, 34–42, https://doi.org/10.1038/s42256-019-0137-x (2020).
Yu, B. T. et al. Ea-GANs: edge-aware generative adversarial networks for cross-modality MR image synthesis. IEEE Trans. Med. Imaging 38, 1750–1762, https://doi.org/10.1109/Tmi.2019.2895894 (2019).
Goodfellow, I. J. et al. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014).
Zhou, T. X., Ruan, S. & Hu, H. G. A literature survey of MR-based brain tumor segmentation with missing modalities. Comput. Med. Imaging Graph. 104, 102167, https://doi.org/10.1016/j.compmedimag.2022.102167 (2023).
Zhou, T. X., Canu, S., Vera, P. & Ruan, S. Feature-enhanced generation and multi-modality fusion based deep neural network for brain tumor segmentation with missing MR modalities. Neurocomputing 466, 102–112, https://doi.org/10.1016/j.neucom.2021.09.032 (2021).
Zhou, T. X., Canu, S., Vera, P. & Ruan, S. Latent correlation representation learning for brain tumor segmentation with missing MRI Modalities. IEEE Trans. Image Process. 30, 4263–4274, https://doi.org/10.1109/Tip.2021.3070752 (2021).
Singh, S. et al. Identifying nuclear phenotypes using semi-supervised metric learning. Inf. Process. Med. Imaging 6801, 398–410 (2011).
Gu, L. et al. Semi-supervised learning for biomedical image segmentation via forest oriented super pixels (voxels). In International Conference on Medical Image Computing and Computer-Assisted Intervention, 702–710 (Springer International Publishing, 2017).
Mahapatra, D., Vos, F. M. & Buhmann, J. M. Active learning based segmentation of Crohns disease from abdominal MRI. Comput. Methods Prog. Biomed. 128, 75–85, https://doi.org/10.1016/j.cmpb.2016.01.014 (2016).
Liu, Z. C., Wei, J., Li, R. & Zhou, J. L. SFusion: self-attention based N-to-One multimodal fusion block. Med. Image Comput. Computer Assist. Intervention, Miccai 2023 14221, 159–169, https://doi.org/10.1007/978-3-031-43895-0_15 (2023).
Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV), 9992–10002, https://doi.org/10.1109/Iccv48922.2021.00986 (2021).
Rastogi, A. et al. Deep-learning-based reconstruction of undersampled MRI to reduce scan times: a multicentre, retrospective, cohort study. Lancet Oncol. 25, 400–410, https://doi.org/10.1016/S1470-2045(23)00641-1 (2024).
Chen, Z.-M., Wei, X.-S., Wang, P. & Guo, Y. Multi-Label Image Recognition With Graph Convolutional Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 5172–5181 (2019).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://arxiv.org/abs/1609.02907 (2016).
Lee, C.-W., Fang, W., Yeh, C.-K. & Wang, Y.-C. F. Multi-label zero-shot learning with structured knowledge graphs. In Proc. IEEE conference on computer vision and pattern recognition, 1576–1585 (IEEE, 2018).
Chen, G., Song, Y., Wang, F. & Zhang, C. Semi-supervised multi-label learning by solving a sylvester equation. Proceedings of the 2008 SIAM international conference on data mining, 410–419, https://doi.org/10.1137/1.9781611972788.37 (2008).
Sun, Y.-Y., Zhang, Y. & Zhou, Z.-H. Multi-label learning with weak label. Proc. AAAI Conf. Artif. Intell. 24, 593–598 (2010).
Zhang, Y. et al. Simple and robust loss design for multi-label learning with missing labels. Preprint at https://arxiv.org/abs/2112.07368 (2021).
Kundu, K. & Tighe, J. Exploiting weakly supervised visual patterns to learn from partial annotations. Adv. Neural Inform. Process. Syst. 33, 561–572 (2020).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570, https://doi.org/10.1038/s41551-020-00682-w (2021).
Lee, D.-H. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. Workshop Chall. Represent. Learn. 3, 896 (2013).
Cole, E. et al. Multi-label learning from single positive labels. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 933–942 (IEEE, 2021).
Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42, 318–327, https://doi.org/10.1109/TPAMI.2018.2858826 (2020).
Van Calster, B. et al. Multi-class AUC metrics and weighted alternatives. 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), 1390-1396 (IEEE, 2008).
Acknowledgements
This work was supported by National natural science foundation of China (82372096), National Key R&D Program of China, MOST (2023YFC2510000), AI for Science Foundation of Fudan University (FudanX24AI038), Shanghai Science and Technology Commission Explorer Project (24TS1410600).
Author information
Authors and Affiliations
Contributions
Writing and approving of paper: P.S., C.S., J.Y., G.W., Z.F., Z.S., B.Y., F.L., F.Y., C.Z., Y.W. Design of the work: P.S., C.S., J.Y. Acquisition of data: C.S., Z.F., Z.S., B.Y. Analysis of data: P.S., C.S., J.Y., Z.S., XX. Method design and implementation: P.S., J.Y., G.W. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Ethics Committee of Huashan Hospital, Fudan University (ethics number: 2015-256). The requirement for informed consent was waived by the committee due to the retrospective nature of the study.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Song, P., Shen, C., Fan, Z. et al. An integrated MRI-based diagnostic framework for glioma with incomplete imaging sequences and imperfect annotations. npj Precis. Onc. 9, 328 (2025). https://doi.org/10.1038/s41698-025-01112-3