End-to-end deep learning for the diagnosis of pelvic and sacral tumors using non-enhanced MRI: a multi-center study

Yin, Ping; Liu, Ke; Chen, Runrong; Liu, Yang; Lu, Lin; Sun, Chao; Liu, Ying; Zhang, Tianyu; Zhong, Junwen; Chen, Weidao; Yu, Ruize; Wang, Dawei; Liu, Xia; Hong, Nan

doi:10.1038/s41698-025-01077-3


Article
Open access
Published: 15 August 2025


npj Precision Oncology volume 9, Article number: 286 (2025)


Abstract

This study developed an end-to-end deep learning (DL) model using non-enhanced MRI to diagnose benign and malignant pelvic and sacral tumors (PSTs). Retrospective data from 835 patients across four hospitals were employed to train, validate, and test the models. Six diagnostic models with varied input sources were compared. Performance (AUC, accuracy/ACC) and reading times of three radiologists were compared. The proposed Model SEG-CL-NC achieved AUC/ACC of 0.823/0.776 (Internal Test Set 1) and 0.836/0.781 (Internal Test Set 2). In External Dataset Centers 2, 3, and 4, its ACC was 0.714, 0.740, and 0.756, comparable to contrast-enhanced models and radiologists (P > 0.05), while its diagnosis time was significantly shorter than radiologists (P < 0.01). Our results suggested that the proposed Model SEG-CL-NC could achieve comparable performance to contrast-enhanced models and radiologists in diagnosing benign and malignant PSTs, offering an accurate, efficient, and cost-effective tool for clinical practice.


Introduction

Pelvic and sacral tumors (PSTs) are rare, and metastatic tumor is the most common type due to the prominent hematopoietic function of this site^1,2,3. Primary benign PSTs mainly include giant cell tumors, schwannoma, neurofibroma, osteoid osteoma, and osteoblastoma^4,5,6. Primary malignant PSTs mainly include chordoma, chondrosarcoma, osteosarcoma, Ewing’s sarcoma, and lymphoma^7,8,9. Because PSTs are rare and share similar clinical and imaging features, radiologists have difficulty acquiring sufficient clinical experience to make a definite diagnosis¹⁰. In the early stage, PSTs are usually small and asymptomatic. By the time they are detected, they are usually large and compress surrounding organs, which often requires surgical intervention. For all primary malignant sacral tumors and benign lesions involving lower segments when preservation of both S3 roots is possible, wide resection should be selected³. However, the prognoses of patients with PSTs are poor due to complex anatomical structures, multiple surrounding organs, and difficulty in operating on this site^2,11. Consequently, the diagnostic challenges posed by PSTs, including their rarity, symptom latency, imaging similarities, and the critical need for early detection to enable potentially curative (but complex) surgery, underscore the urgent requirement for accurate and efficient diagnostic tools¹².

In recent years, deep learning (DL) has shown great potential in exploring the nature of tumors and has been extensively used in bone tumor diagnosis, efficacy evaluation, and prognosis prediction^13,14,15,16,17,18. Few studies have used DL models to distinguish between benign and malignant bone tumors, and those that did were mainly based on plain films^12,19,20,21. Compared with plain films, multi-sequence magnetic resonance imaging (MRI) can better display the bone marrow infiltration and surrounding soft tissue involvement of PSTs. Owing to the large sizes of PSTs, the manual segmentation of lesions is time-consuming^4,22. An MRI-based DL segmentation model may be able to automatically segment PST lesions and reduce the tedious process of manually delineating lesions. In addition, attention-based DL models have been applied to medical image classification problems and have shown better aggregation and representation capabilities²³. Crucially, non-enhanced MRI is the cornerstone of initial bone lesion evaluation in clinical practice due to its wide availability, absence of contrast-related risks (e.g., nephrogenic systemic fibrosis, allergic reactions), and lower cost compared to contrast-enhanced protocols. However, interpreting complex non-enhanced MRI studies for rare PSTs remains challenging, particularly for less experienced radiologists. Therefore, developing a robust DL model capable of automatically diagnosing PSTs directly on routine non-enhanced MRI sequences holds significant promise for addressing the aforementioned diagnostic challenges. Such a tool could potentially: (1) augment radiologists’ diagnostic confidence and accuracy, especially in settings with limited PST expertise; (2) expedite the diagnostic workflow by automating lesion segmentation and analysis, reducing time-to-diagnosis; and (3) leverage the most accessible and safest initial MRI protocol, maximizing clinical utility and impact.

The aim of our study was to develop an end-to-end DL model for the diagnosis of benign and malignant PSTs using non-enhanced MRI.

Results

Patient characteristics

A total of 835 patients (441 males and 394 females; median age 45.0 years [interquartile range: 29.0–58.0]; overall age range 3 to 83 years) were included in this study (see Table 1). This cohort comprised 621 malignant tumors and 214 benign tumors. Centers 2, 3, and 4 had 17, 19, and 18 benign tumors and 46, 58, and 23 malignant tumors, respectively. Clinical data for patients in the different sets are detailed in Supplementary Table 1S.

Table 1 Clinical characteristics of patients

Age, sex, tumor size, and tumor location differed significantly between patients with benign and malignant tumors (P < 0.01). The median age of patients with malignant tumors was 48.0 (29.0, 60.0) years, which was significantly higher than that of patients with benign tumors, 38.0 (28.0, 51.0) years (Z = −4.483; P < 0.001). The difference in the sex ratio between the two groups was significant (χ² = 14.55; P < 0.001). Among patients with malignant tumors, the proportion of males was higher than that of females, whereas among patients with benign tumors, the proportion of females was higher than that of males. In addition, malignant tumors were significantly larger than benign tumors (Z = −3.431; P = 0.001). Benign tumors were most often located in the sacrum (168 cases; 78.5%), followed by the ilium (19 cases; 8.9%). Malignant tumors were also most often located in the sacrum (289 cases; 46.5%), followed by the ilium (136 cases; 21.9%). The distribution of tumor location differed significantly between the benign and malignant groups (χ² = 69.259; P < 0.001).

Performance of different models

The average Dice score and IoU value of the segmentation model were 0.758 and 0.610, respectively. For T1-w, T2-w, DWI, and CET1-w sequences, Dice scores were 0.606, 0.792, 0.694, and 0.728, and IoU values were 0.472, 0.678, 0.573, and 0.598, respectively. As shown in Fig. 1, the segmentation model achieved a relatively good segmentation effect.
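For readers less familiar with the two overlap metrics reported above, the following is a minimal sketch of how a Dice score and an IoU value can be computed for one pair of binary masks. It is not the authors' evaluation code. The function name `dice_and_iou`, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration, and the convention for two empty masks is a common choice rather than something stated in the paper.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


# Toy masks (hypothetical): 4 predicted voxels, 4 true voxels, 3 shared.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
truth = [0, 1, 1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single mask pair the two metrics are tied by IoU = Dice / (2 − Dice), so Dice is never smaller than IoU. Averages taken over many cases need not satisfy this identity exactly.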

**Fig. 1: Demonstration of the segmentation model’s predictions.**

Among Models ORI, MAN, and SEG, Model SEG had the best performance (Fig. 2; Table 2). Model ORI achieved an AUC of 0.735 and ACC of 0.759 in the Internal Test Set 1 and an AUC of 0.697 and ACC of 0.686 in the Internal Test Set 2. Model MAN achieved an AUC of 0.728 and ACC of 0.716 in the Internal Test Set 1. Model SEG had an AUC of 0.852 and ACC of 0.767 in the Internal Test Set 1 and an AUC of 0.736 and ACC of 0.743 in the Internal Test Set 2. The DeLong test comparing AUCs showed that Model SEG was significantly better than Model ORI (P = 0.01) and Model MAN (P = 0.02) in the Internal Test Set 1. However, no significant difference between Model ORI and Model SEG was found in the Internal Test Set 2 (P = 0.58).
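As background for the AUC values compared above, the sketch below computes an AUC as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case, with ties counted as one half. It illustrates the metric only; it does not implement the DeLong test, and the function name `roc_auc`, the scores, and the labels (1 = malignant, an assumed coding) are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs for eight cases.
scores = [0.10, 0.35, 0.40, 0.55, 0.62, 0.70, 0.81, 0.93]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for test sets of the size used here; rank-based formulas give the same value more efficiently.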

Table 2 Performance of different models


Model SEG-NC achieved an AUC of 0.825 and ACC of 0.750 in the Internal Test Set 1 and an AUC of 0.735 and ACC of 0.724 in the Internal Test Set 2. The DeLong test showed no significant difference between Models SEG and SEG-NC in the Internal Test Set 1 (P = 0.06) and Internal Test Set 2 (P = 0.92). Model SEG-CL had an AUC of 0.852 and ACC of 0.784 in the Internal Test Set 1 and an AUC of 0.840 and ACC of 0.800 in the Internal Test Set 2. Model SEG-CL-NC achieved 82.3% AUC (95% confidence interval [CI]: 72.6, 90.1), 77.6% ACC (95% CI: 69.8, 84.5), 82.7% sensitivity (95% CI: 74.0, 90.6), and 65.7% specificity (95% CI: 48.6, 81.2) in the Internal Test Set 1 and 83.6% AUC (95% CI: 74.9, 90.7), 78.1% ACC (95% CI: 70.5, 85.7), 82.5% sensitivity (95% CI: 74.1, 90.7), and 64.0% specificity (95% CI: 44.8, 82.4) in the Internal Test Set 2. The DeLong test showed no significant difference between Models SEG-CL and SEG-CL-NC in the Internal Test Set 1 (P = 0.22) and Internal Test Set 2 (P = 0.82).

In addition, we found no difference between the AUCs and ACCs of Models SEG and SEG-CL in the Internal Test Set 1 (AUC P = 0.99; ACC P = 0.625) but a significant difference in the Internal Test Set 2 (AUC P = 0.004; ACC P = 0.03). Similarly, the AUCs and ACCs of Models SEG-NC and SEG-CL-NC did not differ in the Internal Test Set 1 (AUC P = 0.94; ACC P = 0.25) but significantly differed in the Internal Test Set 2 (AUC P = 0.01; ACC P = 0.03). Figure 3 shows confusion matrices for differentiating between benign and malignant PSTs in the test sets. Figure S1 shows the nomogram of Model SEG-CL-NC.
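The sensitivity, specificity, and ACC values reported above are all derived from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical: they were chosen only because they sum to the sizes of the two internal test sets (116 and 105 patients) and reproduce the reported Model SEG-CL-NC percentages after rounding; they are not read from Fig. 3. Treating malignant as the positive class is also an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # malignant cases correctly flagged
        "specificity": tn / (tn + fp),   # benign cases correctly cleared
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts consistent with the reported rates (not taken from Fig. 3).
set1 = binary_metrics(tp=67, fn=14, tn=23, fp=12)   # 116 patients in total
set2 = binary_metrics(tp=66, fn=14, tn=16, fp=9)    # 105 patients in total

for name, metrics in (("Internal Test Set 1", set1), ("Internal Test Set 2", set2)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Because benign cases are the minority in both sets, a handful of misclassified benign tumors moves specificity by several percentage points, which is consistent with the wide confidence intervals reported for specificity.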

**Fig. 3: Confusion matrix for Model SEG-CL-NC in distinguishing benign from malignant PSTs.**

On the pooled External Dataset, Model SEG-CL-NC achieved an ACC of 0.734, a sensitivity of 0.680, and a specificity of 0.847. For Centers 2, 3, and 4, ACC was 0.714, 0.740, and 0.756; sensitivity was 0.690, 0.660, and 0.704; and specificity was 0.762, 0.917, and 0.857, respectively.

Performance of radiologist’s diagnosis

The diagnostic ACC values of the two residents and one junior attending physician were 0.819, 0.771, and 0.790; their sensitivity values were 0.9, 0.925, and 0.862; and their specificity values were 0.56, 0.28, and 0.56, respectively. Although the diagnostic ACC values of Models SEG-CL and SEG-CL-NC were slightly lower than those of the two residents and junior attending physician, the difference was nonsignificant (P > 0.05). The average time the three physicians needed to diagnose one patient was 5.61, 4.42, and 2.94 min, respectively. In contrast, Models SEG-CL and SEG-CL-NC required only 2.8 and 2.1 s to provide segmentation and classification results, significantly less than the time required by the radiologists (Table 3).

Table 3 Results of physician-reading experiment


Discussion

In this study, we developed an end-to-end DL model (Model SEG-CL-NC) for diagnosing benign and malignant PSTs using non-enhanced MRI. We evaluated its efficacy by comparing it with five other diagnostic models and with radiologists. Our findings demonstrated that Model SEG-CL-NC achieved comparable diagnostic accuracy to contrast-enhanced MRI and radiologists, with the added benefit of significantly shorter reading times compared to radiologists.

Patients with PSTs exhibit similar clinical and imaging features, posing challenges for preoperative diagnosis. Our study identified significant differences in sex, age, tumor size, and location between benign and malignant PSTs, consistent with previous studies^2,3,24. Compared with patients with benign tumors, patients with malignant PSTs were older, were more often male, and had larger tumors, which were mostly located in the sacrum and ilium. These distinctions likely correlate with the diverse pathological profiles of benign and malignant PSTs. Malignant PSTs commonly include metastatic tumors, bone sarcomas, and chordomas, whereas benign tumors typically encompass giant cell tumors of the bone and neurogenic tumors. Our study highlights the enhanced performance of models that integrate clinical information. By integrating clinical data, our model aligns more closely with real-world clinical practice, potentially improving overall diagnostic accuracy and utility in clinical settings²⁵.

Owing to the large sizes of PSTs, the manual segmentation of lesions is time-consuming and is susceptible to interobserver variability^4,26,27. In this study, we used coarse labeled ROIs to train the model and refined labeled data to test the model. Our study demonstrated that the segmentation-based diagnostic model (Model SEG) outperformed both the diagnostic model based solely on original images (Model ORI) and the model relying on manual lesion delineation (Model MAN). Model SEG seamlessly integrated segmentation with diagnosis, eliminating the need for manual lesion delineation. This approach sets a precedent for future research, indicating that training models in this manner can enhance algorithm efficiency, reduce manual annotation costs, improve accessibility, and ensure ease of use in clinical applications.

Our results showed that the model based on non-enhanced MR images obtained an ACC comparable to that of the enhanced model. CET1-w may generate high-quality images of pelvic tumors, showing enhanced regions within tumors and distinguishing between necrotic tissues and solid tumors^4,28. However, the utility of MR-enhanced images in bone tumor treatment is primarily limited to guiding biopsy and planning tumor resection²⁹. In contrast, non-enhanced MRI scans are more commonly employed in clinical practice for diagnosing bone lesions. This approach is advantageous for patients who may be unwilling to undergo enhanced MRI due to factors such as fear of injections (especially children) or allergies to contrast media. Additionally, utilizing non-enhanced MRI can potentially reduce medical costs and shorten examination times, thereby enhancing overall efficiency.

Our proposed non-enhanced MRI-based model (Model SEG-CL-NC) achieves automatic lesion segmentation and diagnosis, providing an end-to-end diagnostic solution. Following a non-enhanced MR scan for patients suspected of PSTs in clinical settings, images are automatically transmitted to the radiologist’s diagnostic system. Our model then autonomously identifies and segments the lesion, providing a benign or malignant diagnosis promptly. This capability assists radiologists in making accurate diagnoses efficiently. Furthermore, our proposed Model SEG-CL-NC demonstrated performance comparable to that of radiologists while significantly reducing diagnostic time. In our hospital, which manages a substantial number of PST cases, the implementation of our model has enhanced physician efficiency, minimized the risk of misdiagnosing primary bone tumors, and facilitated personalized patient treatment. Malignant cases particularly benefit from the model by enabling more aggressive treatment strategies in clinical practice. Moreover, our model’s potential for generalization to other medical centers is promising, offering utility to physicians with varying levels of expertise in bone tumor imaging, including those in smaller hospitals¹². Although our model performed less effectively at other centers compared to our own, this discrepancy may be due to differences in scanners, acquisition parameters and inconsistent scanning protocols across centers. Specifically, our model integrates T1-w, T2-w, and DWI sequences, which enhances its performance. However, data from other centers might not include all three sequences simultaneously (e.g., Center 3 provided only T2-w and T1 TSE), leading to incomplete input sequences. Future research will address multicenter protocol variability through: 1) deep harmonization networks for parameter standardization, 2) sequence-robust transformers to handle partial inputs, and 3) federated calibration systems^30,31,32,33. 
Additionally, prospective multicenter trials will further validate these solutions for broader clinical implementation.

This study has several limitations. First, excluding patients with incomplete or poor-quality images may introduce selection bias, potentially compromising real-world generalizability. Second, the radiologist comparison was restricted to readers with 4–6 years of experience, which precludes benchmarking against senior expertise; future trials will implement multi-tier radiologist assessment. Third, we did not perform a formal cost-effectiveness analysis or quantify segmentation stability through repeated measurements. Fourth, restricting the analysis to single primary PSTs overlooks multifocal malignancies. Multiple PSTs are more common in malignancies (such as metastases, multiple myeloma, and lymphoma) and are easier to diagnose. Finally, the inadequate assessment of cross-center imaging variability constrains the model’s generalizability. Future research will focus on addressing protocol variability across multicenter settings and validating the proposed approaches through prospective multicenter trials.

In conclusion, our end-to-end DL Model SEG-CL-NC exhibited diagnostic performance comparable to contrast-enhanced models and radiologists in distinguishing benign and malignant PSTs, which may provide an accurate, efficient, and cost-effective tool for clinical practice.

Methods

Patients and data acquisition

A total of 1211 patients with pathologically confirmed benign or malignant PSTs, treated at four hospitals between April 2011 and August 2024, were retrospectively analyzed.

Initially, we examined data from 1021 PST patients at our hospital (Center 1) for the period from April 2011 to May 2022. Patients from April 2011 to June 2021 were included in Internal Dataset 1, while those from July 2021 to May 2022 were included in Internal Dataset 2. Dataset 2, with its more recent data, provides a better foundation for the application of the model.

The inclusion criteria for Center 1 were as follows: 1) a single lesion was found on MRI; 2) preoperative MRI was complete, including T1-w, T2-w, diffusion-weighted imaging (DWI), and contrast-enhanced T1-weighted (CET1-w) images; 3) the PST was pathologically confirmed as benign or malignant. Tumors classified as intermediate according to the WHO classification criteria were grouped as benign in this study^5,14. Exclusion criteria for Center 1 were as follows: 1) multiple lesions (n = 50); 2) incomplete enhanced MR sequence (n = 254); 3) postoperative recurrence and severe image artifacts (n = 63).

Ultimately, 654 patients with PSTs from Center 1 were included in this study. Of these, 549 patients from Center 1 were assigned to Internal Dataset 1, which was further divided into 346 patients for the training set, 87 for the validation set, and 116 for Internal Test Set 1. Additionally, 105 patients from Center 1 were included in Internal Dataset 2, designated for Internal Test Set 2.

To further validate our model, data from 190 patients with PSTs at Centers 2, 3, and 4 were used as external test sets. All datasets adhered to the same inclusion and exclusion criteria as Center 1, except that incomplete MR sequences were not considered an exclusion criterion. This adjustment was made due to significant variations in scanning sequences between centers. Nine patients were excluded due to the presence of multiple lesions. Finally, 181 patients were included for external validation from Centers 2, 3, and 4, with 63 from Center 2, 77 from Center 3, and 41 from Center 4 (Fig. 4). Sex, age, tumor location, and maximal tumor size of the patients were also analyzed.
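The patient counts given in this subsection can be cross-checked with simple arithmetic. The sketch below encodes only numbers stated in the text; the variable names are illustrative.

```python
# Center 1: screened patients and exclusions, as stated in the text.
center1_screened = 1021
center1_excluded = {
    "multiple lesions": 50,
    "incomplete enhanced MR sequence": 254,
    "postoperative recurrence or severe artifacts": 63,
}
center1_included = center1_screened - sum(center1_excluded.values())

# Split of the Center 1 cohort.
internal_dataset_1 = {"training": 346, "validation": 87, "internal test set 1": 116}
internal_test_set_2 = 105

# External centers: screened patients, exclusions, and per-center totals.
external_screened = 190
external_excluded_multiple_lesions = 9
external_by_center = {"Center 2": 63, "Center 3": 77, "Center 4": 41}
external_included = external_screened - external_excluded_multiple_lesions

total_screened = center1_screened + external_screened
total_included = center1_included + external_included

print("Center 1 included:", center1_included)
print("External included:", external_included)
print("Total screened / included:", total_screened, "/", total_included)
```

The totals agree with the 1211 screened patients reported at the start of this section and the 835 included patients reported in the Results.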

**Fig. 4: Patient selection flowchart.**

This retrospective study was approved by institutional review boards at four institutions, including Peking University People’s Hospital (Approval No.2020PHB293), Peking University Third Hospital (Approval No.M2023827), The First Affiliated Hospital of Guangxi Medical University (Approval No.2025-E0250), and The First Affiliated Hospital of Chongqing Medical University (Approval No.2023-139). Given the study’s retrospective nature and reliance on standard clinical protocols, the requirement for informed consent was waived by the Institutional Review Board. The study was conducted following the Declaration of Helsinki.

All images from Center 1 were acquired on the Signa HDxt 3.0 T (GE Healthcare), Signa EXCITE 1.5 T (GE Healthcare), and Discovery 750 3.0 T (GE Healthcare) MR image scanner. The acquisition parameters were as follows: axial T1-w liver acquisition with volume acceleration-flexible (LAVA-Flex) or axial T1-w FSE fs, repetition time (TR) = 3.8 ~ 700 ms, echo time (TE) = 1.7 ~ 7.8 ms, matrix = 288× 224 ~ 320 × 224, slice thickness = 4 ~ 7 mm, and field of view (FOV) = 38 × 38 cm ~ 42 × 42 cm. T2-w, TR = 2300 ~ 5119 ms, TE = 84.1 ~ 102.5 ms, matrix = 288 × 224 ~ 320 × 224, slice thickness = 6 ~ 7 mm, and FOV = 38 × 38 cm ~ 44 × 44 cm. DWI, b value = 1000, TR = 4800 ~ 5000 ms, TE = 59.2 ~ 60 ms, matrix = 128 × 128 ~ 160 × 160, slice thickness = 6 ~ 7 mm, and FOV = 36 × 36 cm ~ 44 × 44 cm. Axial CET1-w was performed following the intravenous injection of 0.2 mL/kg contrast medium (gadopentetate dimeglumine injection) with a manual push or high-pressure syringe, TR = 3.8 ~ 700 ms, TE = 1.7 ~ 7.8 ms, matrix = 288 × 224 ~ 320 × 224, slice thickness = 4 ~ 7 mm, and FOV = 38 × 38 cm ~ 42 × 42 cm.

All images from Center 2 were acquired on the Signa HDxt 1.5 T (GE Healthcare), Discovery 750 3.0 T (GE Healthcare), Discovery 750w 3.0 T (GE Healthcare), and uMR780 3.0 T (United Imaging Healthcare) MR image scanner from December 2014 to July 2024. The acquisition parameters were as follows: Axial T1 TSE: TR = 631 ms, TE = 11.1 ms, matrix = 320 × 256, slice thickness = 5 mm, FOV = 36 × 36 cm. Axial T2-w, TR = 2700 ~ 4939 ms, TE = 58 ~ 100 ms, matrix = 288 × 224 ~ 320 × 256, slice thickness = 4 ~ 7 mm, and FOV = 24 × 20 cm ~ 42 cm × 42 cm. DWI, b value = 800, TR = 3000 ~ 6650 ms, TE = 62.9 ~ 65 ms, matrix = 128 × 64 ~ 128 × 128, slice thickness = 4 ~ 6 mm, and FOV = 24 × 20 cm ~ 44 × 44 cm.

All images from Center 3 were acquired on the Signa HDxt 1.5 T (GE Healthcare), Signa Premier 3.0 T (GE Healthcare), Siemens Verio / Prisma 3.0 T (SIEMENS Healthcare), Siemens Altea 1.5 T (SIEMENS Healthcare), and Philips Achieva 3.0 T (Philips Healthcare) MR scanners from May 2014 to June 2024. The acquisition parameters were as follows: axial T2-w, TR = 3040 ~ 5240 ms, TE = 64 ~ 130 ms, matrix = 288 × 192 ~ 400 × 306, slice thickness = 4 ~ 8 mm, and FOV = 32 × 32 cm ~ 64 × 64 cm. Axial T1 TSE: TR = 500 ~ 544 ms, TE = 8.6 ms, matrix = 280 × 312 ~ 512 × 370, slice thickness = 5 ~ 8 mm, and FOV = 52.8 × 51.2 cm ~ 64 × 64 cm.

All images from Center 4 were acquired on the Signa HDxt 1.5 T (GE Healthcare), Discovery 750w 3.0 T (GE Healthcare), MAGNETOM_ESSENZA 1.5 T (SIEMENS Healthcare), and Skyra 3.0 T (SIEMENS Healthcare) MR scanners from April 2013 to August 2024. The acquisition parameters were as follows: axial T2-w, TR = 2000 ~ 4970 ms, TE = 83 ~ 131 ms, matrix = 256 × 179 ~ 288 × 288, slice thickness = 4 ~ 5 mm, and FOV = 15 × 10.5 cm ~ 40 × 40 cm. Axial T1-w: TR = 150 ~ 630 ms, TE = 1.5 ~ 14 ms, matrix = 256 × 230 ~ 320 × 272, slice thickness = 5 ~ 7 mm, and FOV = 19.1 × 12.7 cm ~ 28 × 19.6 cm. DWI, b value = 1000, TR = 3689 ms, TE = 75.4 ms, matrix = 112 × 114, slice thickness = 5 mm, and FOV = 11.2 × 11.4 cm.

Radiologist’s segmentation

The PSTs of the retrospective dataset were manually segmented using ITK-SNAP software version 3.6.0 (www.itksnap.org)³⁴. All regions of interest (ROIs) in the training set were coarsely labeled (5 slices above and below the slice showing the largest extent of the lesion), and ROIs in the internal test set were finely labeled (all slices of the lesion). All lesions in the retrospective dataset were carefully delineated along the edge of the lesion in each sequence by a musculoskeletal radiologist with 5 years of experience and validated by a senior musculoskeletal radiologist with 15 years of experience. No ROIs in the Internal Test Set 2 were manually segmented; the finely labeled ROIs of each sequence in the Internal Test Set 1 served as the ground truth.

Model preprocessing

For segmentation model preprocessing, we first normalized the input MRI series and padded images with a width or height smaller than 224 to meet the crop requirement in the subsequent step. Then, we randomly selected eight slices with and without segmentation. After acquiring the slices, we used the slices above and below these images to produce three-channel images. Within each three-channel image, we randomly cropped it to 3 × 224 × 224 and then performed random flipping and rotation to decrease overfitting. The transformed image patches were fed into the segmentation model. During testing, each image slice was input into the segmentation model with its above and below slices without cropping.
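The padding, three-channel (2.5D) stacking, and random cropping steps described above can be sketched as follows. This is not the authors' pipeline. It uses plain nested lists instead of an array library, omits normalization, flipping, and rotation, and makes three assumptions that the text does not specify: padding is zero-filled on the bottom and right, the first and last slices reuse themselves as the missing neighbor, and the channel order is (above, current, below). All function names are hypothetical.

```python
import random


def stack_2p5d(volume, index):
    """Return [slice above, slice, slice below] as a three-channel image.

    Edge slices reuse the nearest valid slice (an assumption)."""
    last = len(volume) - 1
    above = volume[max(index - 1, 0)]
    below = volume[min(index + 1, last)]
    return [above, volume[index], below]


def pad_to(image, size, fill=0):
    """Pad a 2D list-of-lists on the bottom/right so both sides are >= size."""
    height, width = len(image), len(image[0])
    new_width = max(width, size)
    rows = [row + [fill] * (new_width - width) for row in image]
    rows += [[fill] * new_width for _ in range(max(size - height, 0))]
    return rows


def random_crop(channels, size, rng):
    """Cut the same size x size window out of every channel."""
    height, width = len(channels[0]), len(channels[0][0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [[row[left:left + size] for row in ch[top:top + size]] for ch in channels]


rng = random.Random(0)
# Toy volume: 5 slices of 200 x 240 pixels; every pixel holds its slice index.
volume = [[[s] * 240 for _ in range(200)] for s in range(5)]
padded = [pad_to(sl, 224) for sl in volume]          # height 200 -> 224
patch = random_crop(stack_2p5d(padded, 0), 224, rng)  # 3 x 224 x 224
print(len(patch), len(patch[0]), len(patch[0][0]))
```

Applying one crop window to all three channels keeps the neighboring slices spatially aligned, which is the point of the 2.5D input.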

For diagnostic model preprocessing, we first normalized the input images and then cropped the cuboid region with ROI from the MRI series to reduce irrelevant information. After determining the ROI, we randomly selected 12 slices and cropped them into 12 × 169 × 169 slices. Data augmentation techniques including random flipping and rotation were then performed on the cropped patches and finally sent to the classification model.
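Cropping the cuboid region that contains the ROI amounts to taking the bounding box of the mask along each axis. The sketch below shows that step only, under the assumption that the mask is a nested list indexed as [slice][row][column]; it does not cover the selection of 12 slices, the resizing to 169 × 169, or the augmentation. The function names are hypothetical.

```python
def roi_bounding_box(mask):
    """Return inclusive (min, max) index pairs for the slice, row, and column axes."""
    coords = [
        (z, y, x)
        for z, sl in enumerate(mask)
        for y, row in enumerate(sl)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("mask contains no ROI voxels")
    zs, ys, xs = zip(*coords)
    return (min(zs), max(zs)), (min(ys), max(ys)), (min(xs), max(xs))


def crop_cuboid(volume, box):
    """Cut the bounding cuboid out of a volume indexed like the mask."""
    (z0, z1), (y0, y1), (x0, x1) = box
    return [[row[x0:x1 + 1] for row in sl[y0:y1 + 1]] for sl in volume[z0:z1 + 1]]


# Toy 4 x 5 x 6 mask with two ROI voxels, and a volume of running voxel numbers.
mask = [[[0] * 6 for _ in range(5)] for _ in range(4)]
mask[1][1][2] = 1
mask[2][3][4] = 1
volume = [[[z * 100 + y * 10 + x for x in range(6)] for y in range(5)] for z in range(4)]

box = roi_bounding_box(mask)
cuboid = crop_cuboid(volume, box)
print(box, len(cuboid), len(cuboid[0]), len(cuboid[0][0]))
```

In the pipeline described here the mask would come from the segmentation model's prediction (or from the manual delineation for Model MAN), so the diagnostic model never sees tissue far from the lesion.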

DL model development

This study compared six diagnostic models differentiated by input data sources: Original Image-based model (ORI, 1) trained directly on original images; Manual Annotation-based model (MAN, 2) utilizing radiologist annotations (manual lesion delineations); Segmentation-based model (SEG, 3) employing an automated segmentation model to eliminate manual delineation; Non-Contrast Segmentation-based model (SEG-NC, 4) applying the automated segmentation model to non-contrast MR sequences (excluding CET1-w); Clinical-feature augmented Segmentation-based model (SEG-CL, 5) combining the automated segmentation model with clinical features; and Non-Contrast Clinical-feature augmented Segmentation-based model (SEG-CL-NC, 6) integrating the automated segmentation model, clinical features, and non-contrast MR sequences (excluding CET1-w). See Table 4 for details.

Table 4 The inputs to different models


The segmentation and diagnostic models were trained separately, and the optimal diagnostic model was trained according to the segmentation model’s predicted mask. In the segmentation stage, we adopted U-Net as the architecture of the segmentation model because it has shown promising segmentation performance in a recent study³⁵. We also used MobileNetV2 initialized with ImageNet pre-trained weights as the encoder to improve model performance. The 2.5D training method proposed by Lv et al.³⁶ was applied to train our segmentation model instead of 2D or 3D training. The 2.5D segmentation method captures more spatial information than the 2D segmentation method. Given that some ROIs were large, 3D segmentation would have required a large amount of memory, which restricted model performance. All MRI sequences were input into the segmentation model for training after preprocessing.

In the diagnostic stage, given that all patients had undergone multi-sequence MRI scans, we selected a state-of-the-art CoaT-based transformer network together with the multiple instance learning strategy proposed by Huang et al.³⁷ as our diagnostic model, which has proven effective in multi-modality input classification problems. In Models SEG-CL and SEG-CL-NC, late fusion was used to combine the DL output with clinical information. Clinical information included age, sex, tumor size, and tumor location, and we used logistic regression to fit the DL and clinical features. A nomogram was drawn to visualize the logistic regression model. Figure 5 shows the schematic diagram of Model SEG-CL-NC. Detailed U-Net-based network and CoaT-based network structures are shown in Supplementary Figures S2, S3, and S4.
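To make the late-fusion step concrete, the sketch below shows how a logistic regression combines a DL output with clinical variables into one malignancy probability. Every coefficient is hypothetical and was chosen for illustration only; the fitted values are not reported in the text. The feature encoding (sex as a 0/1 indicator, location reduced to a single sacral indicator, size in centimeters) is also an assumption.

```python
import math

# Hypothetical coefficients for illustration only; not the fitted model.
COEFFICIENTS = {
    "intercept": -2.0,
    "dl_probability": 3.0,   # output of the image-based diagnostic model
    "age_years": 0.02,
    "is_male": 0.4,
    "tumor_size_cm": 0.1,
    "is_sacral": -0.5,       # assumed one-indicator encoding of tumor location
}


def fused_malignancy_probability(features):
    """Logistic-regression late fusion of DL output and clinical features."""
    logit = COEFFICIENTS["intercept"]
    for name, value in features.items():
        logit += COEFFICIENTS[name] * value
    return 1.0 / (1.0 + math.exp(-logit))


patient = {
    "dl_probability": 0.8,
    "age_years": 60,
    "is_male": 1,
    "tumor_size_cm": 8.0,
    "is_sacral": 1,
}
probability = fused_malignancy_probability(patient)
print(f"Fused malignancy probability: {probability:.3f}")
```

A nomogram is a graphical rendering of the same linear predictor: each feature contributes points proportional to its coefficient times its value, and the point total maps to a probability through the logistic function.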

**Fig. 5: Schematic diagram of Model SEG-CL-NC.**

Radiologist’s diagnosis

To compare the model’s diagnostic accuracy (ACC) with that of radiologists, two senior residents (YL and JWZ) and one junior attending physician (TYZ) independently diagnosed 105 patients in the prospective dataset without using our DL model. They had 4, 5, and 6 years of clinical experience, respectively. The original images of each patient were loaded into RadiAnt DICOM Viewer 2021.2.2. To be consistent with the input information available to the model, the three readers saw only the patient’s images, sex, and age; the rest of the clinical information was hidden. In the DICOM viewer, the readers could also measure the maximum diameter of the lesion. After viewing the images, each reader recorded the diagnostic result (benign or malignant) and the time spent. Diagnostic time was measured from the moment the reader opened the patient’s images to the moment the diagnosis was made.

Statistical analysis

The performance of different models was assessed using the area under the receiver operating characteristic curve (AUC), ACC, sensitivity, and specificity values. The diagnostic time, ACC, sensitivity, and specificity were used as evaluation indexes for the radiologists’ diagnoses. All models’ prediction probabilities were transformed to binary labels according to Youden’s index in the validation sets³⁸. Statistical analysis was performed in R software (R Core Team) version 3.4.3. The two-sample t-test was used to compare continuous variables, and the chi-squared test was used to compare categorical variables between groups. The Dice score and the intersection over union (IoU) were used to evaluate segmentations. All statistical tests were two-sided, and a P value less than 0.05 was considered statistically significant.
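Choosing a cut-off by Youden's index, as described above, means scanning candidate thresholds on the validation set and keeping the one that maximizes J = sensitivity + specificity − 1. The sketch below is a minimal illustration rather than the authors' R code. The function name, the rule that a case is positive when its score is at or above the threshold, and the toy validation scores are assumptions.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:   # keeps the lowest threshold among ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical validation-set probabilities and labels (1 = malignant).
val_scores = [0.10, 0.35, 0.40, 0.55, 0.62, 0.70, 0.81, 0.93]
val_labels = [0, 0, 1, 0, 1, 1, 1, 1]
threshold, j = youden_threshold(val_scores, val_labels)
print(f"threshold = {threshold}, J = {j:.2f}")
```

Fixing the threshold on the validation set and then applying it unchanged to the test sets avoids tuning the cut-off on the data used to report performance.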

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Code availability

The code required to reproduce these findings is available for download from https://github.com/chencancan1018/BoneTumorRecognition.

References

1. Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 72, 7–33 (2022).
2. Yin, P. et al. Radiomics models for the preoperative prediction of pelvic and sacral tumor types: a single-center retrospective study of 795 cases. Front. Oncol. 11, 709659 (2021).
3. Yin, P. et al. Machine and deep learning based radiomics models for preoperative prediction of benign and malignant sacral tumors. Front. Oncol. 10, 564725 (2020).
4. Yin, P. et al. A triple-classification radiomics model for the differentiation of primary chordoma, giant cell tumor, and metastatic tumor of sacrum based on T2-weighted and contrast-enhanced T1-weighted MRI. J. Magn. Reson. Imaging 49, 752–759 (2019).
5. WHO Classification of Tumours Editorial Board (eds.) World Health Organization classification of soft tissue and bone tumours. 5th ed. (IARC Press, 2020).
6. Si, M. J. et al. Differentiation of primary chordoma, giant cell tumor and schwannoma of the sacrum by CT and MRI. Eur. J. Radiol. 82, 2309–2315 (2013).
7. Thornton, E. et al. Imaging features of primary and secondary malignant tumours of the sacrum. Br. J. Radiol. 85, 279–286 (2012).
8. Gerber, S. et al. Imaging of sacral tumours. Skelet. Radiol. 37, 277–289 (2008).
9. Olson, J. T., Wenger, D. E., Rose, P. S., Petersen, I. A. & Broski, S. M. Chordoma: 18F-FDG PET/CT and MRI imaging features. Skelet. Radiol. 50, 1657–1666 (2021).
10. Sambri, A. et al. Primary tumors of the sacrum: imaging findings. Curr. Med. Imaging 18, 170–186 (2022).
11. Marmouset, D. et al. Characteristics, survivals and risk factors of surgical site infections after En Bloc sacrectomy for primary malignant sacral tumors at a single center. Orthop. Traumatol. Surg. Res. 108, 103197 (2022).
12. von Schacky, C. E. et al. Multitask deep learning for segmentation and classification of primary bone tumors on radiographs. Radiology 301, 398–406 (2021).
13. Zhao, W. et al. GMILT: a novel transformer network that can noninvasively predict EGFR mutation status. IEEE Trans. Neural Netw. Learn. Syst. 35, 7324–7338 (2024).
14. Eweje, F. R. et al. Deep learning for classification of bone lesions on routine MRI. EBioMedicine 68, 103402 (2021).
15. Liu, H. et al. Benign and malignant diagnosis of spinal tumors based on deep learning and weighted fusion framework on MRI. Insights Imaging 13, 87 (2022).
16. Li, M. D. et al. Artificial intelligence applied to musculoskeletal oncology: a systematic review. Skelet. Radiol. 51, 245–256 (2022).
17. He, Y. et al. Convolutional neural network to predict the local recurrence of giant cell tumor of bone after curettage based on pre-surgery magnetic resonance images. Eur. Radiol. 29, 5441–5451 (2019).
18. Vogrin, M., Trojner, T. & Kelc, R. Artificial intelligence in musculoskeletal oncological radiology. Radiol. Oncol. 55, 1–6 (2020).
19. Liu, R. et al. A deep learning-machine learning fusion approach for the classification of benign, malignant, and intermediate bone tumors. Eur. Radiol. 32, 1371–1383 (2022).
20. He, Y. et al. Deep learning-based classification of primary bone tumors on radiographs: a preliminary study. EBioMedicine 62, 103121 (2020).
21. Huang, P. Y. et al. Osteomyelitis of the femur mimicking bone tumors: a review of 10 cases. World J. Surg. Oncol. 11, 283 (2013).
22. Langevelde, K. V., Vucht, N. V., Tsukamoto, S., Mavrogenis, A. F. & Errani, C. Radiological assessment of giant cell tumour of bone in the sacrum: from diagnosis to treatment response evaluation. Curr. Med. Imaging 18, 162–169 (2022).
23. Ilse, M., Tomczak, J. M. & Welling, M. Attention-based deep multiple instance learning. Proc. 35th Int. Conf. Mach. Learn. PMLR 80, 2127–2136 (2018).
24. Yin, P., Sun, C., Wang, S., Chen, L. & Hong, N. Clinical-deep neural network and clinical-radiomics nomograms for predicting the intraoperative massive blood loss of pelvic and sacral tumors. Front. Oncol. 11, 752672 (2021).
25. Murphey, M. D. & Kransdorf, M. J. Staging and classification of primary musculoskeletal bone and soft-tissue tumors according to the 2020 WHO update, from the AJR special series on cancer staging. AJR Am. J. Roentgenol. 217, 1038–1052 (2021).
26. Avanzo, M., Stancanello, J. & El Naqa, I. Beyond imaging: the promise of radiomics. Phys. Med. 38, 122–139 (2017).
27. Larue, R. T., Defraene, G., De Ruysscher, D., Lambin, P. & van Elmpt, W. Quantitative radiomics studies for tissue characterization: a review of technology and methodological procedures. Br. J. Radiol. 90, 20160665 (2017).
28. Samji, K. et al. Comparison of high-resolution T1W 3D GRE (LAVA) with 2-point Dixon fat/water separation (FLEX) to T1W fast spin echo (FSE) in prostate cancer (PCa). Clin. Imaging 40, 407–413 (2016).
29. Sundaram, M. The use of gadolinium in the MR imaging of bone tumors. Semin. Ultrasound CT MR 18, 307–311 (1997).
30. Zhou, Y. et al. Development and validation of a deep learning-based framework for automated lung CT segmentation and acute respiratory distress syndrome prediction: a multicenter cohort study. EClinicalMedicine 75, 102772 (2024).
31. Ye, Z. et al. Deep learning algorithms for melanoma detection using dermoscopic images: a systematic review and meta-analysis. Artif. Intell. Med. 155, 102934 (2024).
32. Morelli, L. et al. Addressing intra- and inter-institution variability of a radiomic framework based on apparent diffusion coefficient in prostate cancer. Med. Phys. 51, 8096–8107 (2024).
33. Lew, C. O. et al. Artificial intelligence outcome prediction in neonates with encephalopathy (AI-OPiNE). Radiol. Artif. Intell. 6, e240076 (2024).
34. Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128 (2006).
35. Chen, W. et al. Improving the diagnosis of acute ischemic stroke on non-contrast CT using deep learning: a multicenter study. Insights Imaging 13, 184 (2022).
36. Lv, P., Wang, J. & Wang, H. 2.5D lightweight RIU-Net for automatic liver and tumor segmentation from CT. Biomed. Signal Process. Control 75, 103567 (2022).
37. Huang, C. et al. Transformer-based deep-learning algorithm for discriminating demyelinating diseases of the central nervous system with neuroimaging. Front. Immunol. 13, 897959 (2022).
38. Ruopp, M. D., Perkins, N. J., Whitcomb, B. W. & Schisterman, E. F. Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom. J. 50, 419–430 (2008).

Acknowledgements

This study was funded by the National Natural Science Foundation of China (No. 82001764), Peking University People’s Hospital Scientific Research Development Funds (RDY2020-08, RS2021-10), and Beijing United Imaging Research Institute of Intelligent Imaging Foundation (CRIBJQY202105).

Author information

These authors contributed equally: Ping Yin, Ke Liu, Runrong Chen.

Authors and Affiliations

Department of Radiology, Peking University People’s Hospital, 11 Xizhimen Nandajie, Xicheng District, Beijing, China
Ping Yin, Chao Sun, Ying Liu, Tianyu Zhang, Junwen Zhong, Xia Liu & Nan Hong
Department of Radiology, Peking University Third Hospital, Beijing, China
Ke Liu
Department of Radiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
Runrong Chen
Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
Yang Liu
Department of Radiology, Memorial Sloan-Kettering Cancer Center, 1275 York Ave, New York, NY, USA
Lin Lu
Infervision Medical Technology Co., Ltd., Ocean International Center, Beijing, China
Weidao Chen, Ruize Yu & Dawei Wang

Contributions

P.Y. designed the study, collected and analyzed the data, and prepared and edited the paper. K.L. participated in the clinical research process and edited the paper. R.R.C. participated in the clinical research process and edited the paper. Y.L. participated in the clinical research process. L.L. revised the paper. C.S. participated in the clinical research process. Y.L. participated in the clinical research process. T.Y.Z. participated in the clinical research process. J.W.Z. participated in the clinical research process. W.D.C. contributed to data analysis and statistics. R.Z.Y. contributed to data analysis and statistics. D.W.W. contributed to data analysis and statistics. X.L. contributed to clinical guidance and article preparation. N.H. designed the study, ensured the integrity of the whole study, and revised the paper. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Ping Yin or Nan Hong.

Ethics declarations

Competing interests

All authors declare no financial or non-financial competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yin, P., Liu, K., Chen, R. et al. End-to-end deep learning for the diagnosis of pelvic and sacral tumors using non-enhanced MRI: a multi-center study. npj Precis. Onc. 9, 286 (2025). https://doi.org/10.1038/s41698-025-01077-3

Received: 24 November 2024
Accepted: 02 August 2025
Published: 15 August 2025
Version of record: 15 August 2025
DOI: https://doi.org/10.1038/s41698-025-01077-3