Introduction

Prostate cancer (PCa) is the most common cancer and the second most deadly cancer in males worldwide1. Early diagnosis of prostate cancer is crucial for improving its prognosis. Currently, prostate-specific antigen (PSA) testing is the most widely used screening approach, while invasive prostate biopsy remains the gold standard for diagnosing PCa2. However, although PSA screening has significantly improved the diagnosis of PCa, it has limitations. Clinical studies have shown that PSA testing has a predictive value of 25–40%, with limited specificity and sensitivity, resulting in overdiagnosis and overtreatment3,4. Moreover, many of the detected lesions are less invasive and their clinical significance remains unclear. PCa is characterized by strong heterogeneity and multifocal features, and treatment decisions are typically based on the lesion with the largest volume or the highest Gleason score5. Therefore, early and accurate detection of clinically significant PCa (CsPCa) is important for effective treatment5.

With the continuous advancement of imaging technology, it has become possible to improve on the specificity and sensitivity of PSA testing. Multiparametric magnetic resonance imaging (mpMRI) has been widely used in the detection and staging of prostate lesions, as well as in guiding prostate biopsies, informing treatment options, and facilitating active surveillance7. Currently, mpMRI plays a central role in the diagnostic pathway for suspected PCa8. The Prostate Imaging Reporting and Data System (PI-RADS) has greatly contributed to achieving these goals by enabling reliable identification of CsPCa requiring biopsy and facilitating lesion localization9. PI-RADS 2.1 serves as the reference for risk stratification of PCa based on mpMRI. Suspected lesions are assigned scores ranging from 1 to 5 based on lesion location and image characteristics10. However, the interpretation of images based on the PI-RADS guidelines remains challenging due to interobserver variability, particularly for PI-RADS-3 lesions. Lesions with a PI-RADS score of 3 or higher usually undergo biopsy, yet PI-RADS-3 corresponds to CsPCa in less than 15% of patients10. Therefore, using mpMRI alone to determine which patients should undergo biopsy is suboptimal11. Previous studies have shown that biparametric MRI, with its lower cost, no need for a contrast agent, and shorter scanning time, is non-inferior or even superior to mpMRI in detecting PCa12,13. Consequently, there is an urgent need for clinical tools that can accurately identify PCa and minimize unnecessary biopsies.

Tumor risk stratification remains a challenging task due to the difficulties in interpreting mpMRI images. Machine learning (ML) has the potential to assist radiologists in assessing the invasiveness of indistinct lesions, reducing variability between observers. Previous studies have demonstrated the successful utilization of ML in prostate volume segmentation, lesion segmentation, and detection14. Accurate segmentation and volume estimation of the prostate can provide valuable information for the diagnosis and clinical treatment of benign prostatic hyperplasia (BPH) and PCa. This can improve the treatment of BPH as well as the surgical planning and prognosis of PCa. Prostate segmentation is increasingly utilized for the diagnosis of PCa, particularly for MRI transrectal ultrasound (MRI/TRUS) fusion biopsy, as accurate prostate segmentation on MRI images is crucial for the interpretation of MRI/TRUS fusion biopsy results15. In addition to segmentation, prostate volume estimation is a useful indicator, especially in the context of BPH treatment, surgical planning, and PCa prognosis16. ML can serve as a valuable tool to address the high variability among readers in certain areas, such as the transition zone (TZ). Previous studies have compared ML models with the PI-RADS score in evaluating the performance of lesion classification, but consensus has not been reached17,18,19. Some previous studies have combined the results of the PI-RADS score with ML models to distinguish PCa in a clinical setting, but these approaches still rely on subjective PI-RADS scoring and are not sufficient for clinical practice20,21,22.

Radiomics enables the extraction of high-throughput, quantitative features from medical images. By employing ML algorithms, the radiomic features can be utilized to construct models that uncover information pertaining to tumor pathophysiology. This, in turn, aids in medical decision-making and enhances diagnostic capabilities. This study aimed to develop a clinical-radiomic model that integrates clinical variables and radiomic features to differentiate between BPH and malignant PCa nodules.

Materials and methods

Patients

This retrospective study was approved by the Human Research Ethics Committee of the Zhongshan City People’s Hospital, which also waived the requirement for written informed consent. A total of 617 patients with prostatic nodules who underwent contrast-enhanced MRI at two tertiary hospitals between January 2018 and December 2020 were reviewed. The inclusion criteria were as follows: (1) patients with histologically confirmed hyperplasia or PCa identified through systematic needle biopsy performed with cognitive fusion, a technique in which the urologist performing the biopsy estimates the location of regions of interest (ROIs) based on the imaging reviewed during the procedure; (2) patients who underwent contrast-enhanced MRI within one week prior to surgery or biopsy; and (3) patients who did not receive any preoperative cancer-related treatments, such as radiotherapy, endocrine therapy, or chemotherapy. The exclusion criteria were as follows: (1) incomplete clinical data (n = 89); (2) patients who received radiotherapy, chemotherapy, or other treatments before contrast-enhanced MRI scans (n = 157); (3) MRI images with poor quality (n = 51); and (4) cases where the puncture site did not correspond well with the image (n = 69). Finally, 251 patients (mean age, 68.1 ± 9.1 years) were included. The dataset comprised two distinct groups: 171 patients from our institution, who were randomly assigned to a training cohort (70%, n = 119; 48 malignant and 71 benign cases) and an internal validation cohort (30%, n = 52; 21 malignant and 31 benign cases); and an additional 80 patients from other collaborating hospitals, who formed an external validation cohort (29 malignant and 51 benign cases). Figure 1 shows the patient recruitment pathway and the inclusion and exclusion criteria.

Fig. 1 Pathway for patient recruitment and inclusion/exclusion criteria.

Clinical characteristics

Clinical variables were collected from the medical record system and mainly consisted of age, prostate volume (PV), PSA value, total PSA (tPSA), free PSA (fPSA), and the ratio of fPSA to tPSA (fPSA/tPSA). In addition, the length, width, and height of the entire prostate and the TZ were measured on the mpMRI. The transverse diameter (A) and anteroposterior diameter (B) of the TZ, as well as the transverse diameter (C) and anteroposterior diameter (D) of the entire prostate, were measured on a horizontal section. The superoinferior diameters of the TZ (E) and the entire prostate (F) were measured on the sagittal plane. The PV was measured at the boundary of the prostate capsule, and the TZ volume (TZV) was measured at the boundary of the fibrous layer of the TZ. The PV and TZV were calculated as follows: (π/6) × anteroposterior diameter (cm) × transverse diameter (cm) × superoinferior diameter (cm). The peripheral zone volume (PZV) was calculated as the difference between the PV and TZV. PSA density (PSAD) was calculated as tPSA/PV, TZ-PSAD as tPSA/TZV, and PZ-PSAD as tPSA/PZV, that is, tPSA/(PV − TZV).
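The volume and density definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the example measurements are hypothetical, and the only formulas used are the ellipsoid volume and the three density ratios stated in the text.

```python
import math


def ellipsoid_volume(ap_cm, transverse_cm, si_cm):
    """Ellipsoid volume in cm^3: (pi/6) x AP x transverse x superoinferior."""
    return math.pi / 6.0 * ap_cm * transverse_cm * si_cm


def psa_densities(tpsa, pv, tzv):
    """Return PZV and the three PSA densities defined in the text.

    tpsa is total PSA; pv and tzv are whole-gland and transition-zone volumes.
    """
    pzv = pv - tzv  # peripheral zone volume = PV - TZV
    if pv <= 0 or tzv <= 0 or pzv <= 0:
        raise ValueError("volumes must be positive and PV must exceed TZV")
    return {
        "PZV": pzv,
        "PSAD": tpsa / pv,
        "TZ-PSAD": tpsa / tzv,
        "PZ-PSAD": tpsa / pzv,
    }


# Hypothetical measurements (cm), using the letters from the text:
# TZ: A = transverse, B = anteroposterior, E = superoinferior
# Whole prostate: C = transverse, D = anteroposterior, F = superoinferior
tzv = ellipsoid_volume(ap_cm=3.0, transverse_cm=3.5, si_cm=3.2)
pv = ellipsoid_volume(ap_cm=4.0, transverse_cm=5.0, si_cm=4.5)
print(psa_densities(tpsa=8.0, pv=pv, tzv=tzv))
```

The explicit check on PV − TZV guards against a measurement error producing a zero or negative peripheral zone volume, which would make PZ-PSAD meaningless.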

MR imaging and image interpretation

All patients in the two centers were scanned using a 3.0T MR system (Achieva, Philips Medical Systems, Best, the Netherlands) with a 16-channel Sense Torso XL coil. The protocol included axial, coronal, and sagittal T2-weighted imaging (T2WI), axial T1-weighted imaging (T1WI), axial diffusion-weighted imaging (DWI), THRIVE, and post-contrast axial breath-hold dynamic contrast-enhanced (DCE) imaging performed with fat-suppressed e-THRIVE. A total of 20 dynamic enhanced prostate scans were performed, with a scanning time of two minutes. Contrast agent (Gadodiamide, MEDRAD Healthcare, 0.2 mmol/kg body weight) was administered intravenously at a rate of 3.0 ml/s at the end of the first scan, followed by a 20 ml saline flush at the same rate. The detailed acquisition parameters are presented in Table 1.

Table 1 The detailed acquisition parameters of mpMRI.

The images were independently evaluated by two radiologists, each with 8 years of experience in prostate MRI, using the PI-RADS V2.1 criteria. The radiologists were unaware of the histopathology results. In cases of disagreement between the two radiologists, a third radiologist with 10 years of experience in prostate MRI was consulted to reach a consensus on the final PI-RADS V2.1 score. Specifically, the third radiologist's assessment was discussed with the two radiologists and served as the consensus vote. The PI-RADS V2.1 scores were assessed based on the T2WI, DWI, and DCE-MRI sequences. If multiple lesions were present, the PI-RADS V2.1 score was determined based on the largest or most aggressive lesion.

Lesion segmentation

Figure 2 illustrates the workflow of this study. Manual segmentation of the prostatic nodule was carried out by an experienced radiologist (with 8 years of experience in prostatic disease diagnosis) using ITK-SNAP software. The region of interest (ROI) was manually delineated slice-by-slice on axial T2WI and apparent diffusion coefficient (ADC) images, encompassing the entire suspicious lesion. For PCa, the entire lesion area, including the peripheral and transitional areas of cancer, was demarcated. For BPH, the complete hyperplasia area was outlined, while avoiding the surrounding prostate capsule, peripheral blood vessels, seminal vesicle root, bleeding, calcification, and urethra. Afterwards, the delineated ROIs were transformed into three-dimensional volumes of interest (VOIs). To minimize potential bias, the segmentation results were independently validated by a radiologist with 10 years of experience in prostatic disease diagnosis.

Fig. 2 Workflow of the study.

Image preprocessing

The N4 correction algorithm in the 3D Slicer software was used to eliminate MRI bias field artifacts and minimize the impact of RF field inhomogeneities and the MRI equipment itself. Then, the grayscale values of the MRI were normalized to a range of 0–255 by min–max scaling. This step was implemented to mitigate variations in grayscale between different patients, acquisition times, and parameter settings, ensuring precise and dependable texture analysis. Lastly, the B-spline interpolation algorithm was used to resample the ROI to a uniform size (1 × 1 × 1).
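The min–max normalization step can be illustrated with NumPy. This is a minimal sketch under stated assumptions: the function name is hypothetical, the scaling is applied per volume using that volume's own minimum and maximum, and a constant image is mapped to zero. The text does not specify these details, and the bias-field correction and resampling steps are not reproduced here.

```python
import numpy as np


def minmax_scale_to_255(volume):
    """Linearly rescale intensities so the minimum maps to 0 and the maximum to 255."""
    volume = np.asarray(volume, dtype=np.float64)
    vmin, vmax = volume.min(), volume.max()
    if vmax == vmin:
        # Assumption: a constant image carries no contrast, so return zeros.
        return np.zeros_like(volume)
    return (volume - vmin) / (vmax - vmin) * 255.0


# Toy 2 x 2 "image" with arbitrary scanner intensities.
toy = np.array([[120.0, 480.0], [300.0, 840.0]])
print(minmax_scale_to_255(toy))
```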

Radiomic feature extraction

Radiomic features were extracted from the segmented VOI in original images, Laplacian-of-Gaussian (LoG) filter images, and wavelet filter images using the PyRadiomics v3.0 open-source package. For the LoG filter, the sigma parameter was set to emphasize different levels of texture roughness, with sigma values of 1, 3, and 5 used to obtain filtered images with different textures. A bin width of 10 was selected for the wavelet filtering. The types of radiomic features were as follows: (1) shape-based features, (2) gray-level histogram-based features, (3) texture features, including gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighborhood gray-tone difference matrix (NGTDM), and (4) wavelet features. A total of 1130 radiomic features were extracted from each MRI sequence.
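An extraction setup along these lines could be expressed as a PyRadiomics parameter dictionary. The sketch below is an assumption-laden illustration, not the authors' configuration: the helper names are hypothetical, the bin width of 10 is applied as a global setting (the text mentions it only for wavelet filtering), and any settings not named in the text are left at the library defaults. The extractor is built lazily so the parameter dictionary can be inspected without PyRadiomics installed.

```python
def build_extractor_params(log_sigmas=(1.0, 3.0, 5.0), bin_width=10):
    """Parameter dictionary in the layout used by PyRadiomics parameter files."""
    feature_classes = ("shape", "firstorder", "glcm", "glrlm", "glszm", "gldm", "ngtdm")
    return {
        # Image types: original, Laplacian-of-Gaussian, and wavelet-filtered images.
        "imageType": {
            "Original": {},
            "LoG": {"sigma": list(log_sigmas)},
            "Wavelet": {},
        },
        # None means "enable every feature in this class".
        "featureClass": {name: None for name in feature_classes},
        # Assumption: bin width applied globally rather than per filter.
        "setting": {"binWidth": bin_width},
    }


def make_extractor(params):
    """Create a feature extractor from the dictionary (requires pyradiomics)."""
    from radiomics import featureextractor

    return featureextractor.RadiomicsFeatureExtractor(params)


params = build_extractor_params()
print(sorted(params["imageType"]))

# Hypothetical usage, with placeholder file names:
# extractor = make_extractor(params)
# features = extractor.execute("t2wi.nii.gz", "voi_mask.nii.gz")
```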

Feature selection and development of radiomic models

Feature selection and model construction were conducted exclusively on the training cohort. Initially, Pearson correlation coefficient (PCC) analysis was employed to obtain a feature set with minimal redundancy (correlation coefficient threshold set at 0.99). The resulting radiomic feature values were subsequently normalized using the Z-score. To further improve the model’s generalization ability and avoid overfitting, we then applied the Recursive Feature Elimination (RFE) or Relief algorithm to obtain a subset of stable and reproducible radiomic features. After that, seven ML classifiers were compared to construct the radiomic models: Random Forest (RF), Support Vector Machine (SVM), Least Absolute Shrinkage and Selection Operator (LASSO), Linear Discriminant Analysis (LDA), Naive Bayes (NB), AdaBoost, and XGBoost. Five-fold cross-validation was used for feature selection and optimization of the classification algorithm to identify the optimal radiomics model from the 14 combinations (two feature selection methods × seven classifiers). The model was evaluated using both the internal validation and external validation cohorts. The radiomic signature (i.e., rad-score) was obtained by performing logistic regression analysis on the predicted probabilities generated by the radiomic model.
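The first two steps of this pipeline, redundancy filtering by Pearson correlation and Z-score normalization, can be sketched with NumPy. Several details here are assumptions, since the text gives only the 0.99 threshold: the filter compares absolute correlations, keeps the earlier feature of each highly correlated pair, and the Z-score statistics are estimated on the training cohort and reused for other cohorts. All function names are hypothetical.

```python
import numpy as np


def drop_correlated_features(X, threshold=0.99):
    """Return indices of columns to keep after greedy Pearson-correlation filtering."""
    X = np.asarray(X, dtype=np.float64)
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.abs(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr, nan=0.0)  # constant columns yield NaN correlations
    keep = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not highly correlated with any kept feature.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


def zscore_fit(X_train):
    """Estimate per-feature mean and standard deviation on the training data."""
    X_train = np.asarray(X_train, dtype=np.float64)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return mean, std


def zscore_apply(X, mean, std):
    """Normalize any cohort with the training-cohort statistics."""
    return (np.asarray(X, dtype=np.float64) - mean) / std


# Toy feature matrix: column 1 is an exact linear function of column 0.
X_toy = np.array([[1.0, 3.0, 4.0], [2.0, 5.0, 1.0], [3.0, 7.0, 3.0], [4.0, 9.0, 2.0]])
kept = drop_correlated_features(X_toy)
mean, std = zscore_fit(X_toy[:, kept])
print(kept, zscore_apply(X_toy[:, kept], mean, std).shape)
```

Fitting the normalization on the training cohort only, and then applying it unchanged to the validation cohorts, keeps information from the validation data out of model construction.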

Combined model construction and evaluation

Univariate and multivariate analyses were conducted to identify independent risk factors (P < 0.05 in both analyses), which were then used to develop a clinical model. The combined model (or clinical-radiomic model) was established and visualized as a nomogram by incorporating significant clinical factors and the rad-score in a stepwise logistic regression model, with the Akaike information criterion (AIC) as the criterion for model selection. The models were developed on the training cohort and verified in both the internal validation and external validation cohorts. The performance of the combined model was evaluated from three aspects: discrimination, calibration, and clinical validity. Discrimination was evaluated through receiver operating characteristic (ROC) analysis, which included calculations of the area under the curve (AUC), sensitivity, and specificity. Calibration curves with the Hosmer–Lemeshow (H–L) test were applied to assess the goodness of fit between the model-predicted probabilities and the observed event proportions. Clinical usefulness was evaluated using decision curve analysis (DCA) and the clinical impact curve (CIC)23. DCA provides a visual representation of the net benefit of the model at various thresholds. The CIC was used to assess the clinical impact of the combined model by estimating the proportion of patients whose treatment plan would be altered based on the predictive results.
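The net benefit plotted in a decision curve can be computed directly from predicted probabilities. The sketch below uses the standard DCA definition, net benefit = TP/n − (FP/n) × pt/(1 − pt) for a threshold probability pt, together with the "treat all" reference strategy. The function names and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=np.float64)
    n = y_true.size
    predicted_positive = y_prob >= threshold
    tp = np.sum(predicted_positive & y_true)   # true positives
    fp = np.sum(predicted_positive & ~y_true)  # false positives
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = float(np.mean(np.asarray(y_true, dtype=np.float64)))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels (1 = malignant) and model probabilities.
labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.55]
for pt in (0.2, 0.5):
    print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the net benefit of treating no one.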

Statistical analysis

The characteristics of the patients were compared between the training and internal validation cohorts. Statistical differences were assessed using the Student’s t-test or the Mann–Whitney U test for normally distributed or non-normally distributed continuous variables, respectively. Categorical variables were analyzed using the Chi-squared test or Fisher’s exact test. The DeLong test was used to compare AUCs between models. All statistical analyses were conducted using SPSS (version 25.0; IBM, Armonk, NY, USA) and Python 3.7. A two-sided P-value less than 0.05 was considered statistically significant.

Ethics approval and consent to participate

This study complied with the Declaration of Helsinki and was approved by the Human Research Ethics Committee of the Zhongshan City People’s Hospital. The requirement for informed consent was waived by the same committee owing to the retrospective nature of the study.

Results

Patient characteristics

A total of 251 patients diagnosed with BPH or PCa were included in the study. Among them, 153 patients (61%) had benign target lesions, with an average age of 67.1 ± 8.8 years. The remaining 98 patients (39%) had malignant target lesions, with an average age of 69.7 ± 9.4 years. Table 2 shows that there were no significant differences between the training and internal validation cohorts (all P > 0.05).

Table 2 Comparison of clinical features between the training and internal validation cohorts.

Clinical and radiomic models

Univariate logistic regression analysis identified PSA level (P = 0.004), fPSA (P = 0.044), fPSA/tPSA (P = 0.006), PZV (P = 0.024), PSAD (P = 0.001), TZ-PSAD (P = 0.001), PZ-PSAD (P = 0.031), and PI-RADS score (P = 0.002) as potential factors. After multivariate analysis, fPSA/tPSA (P = 0.045), PZV (P = 0.041), PSAD (P = 0.002), and PI-RADS score (P = 0.003) were determined to be independent risk factors for malignant nodules. The clinical model constructed from these independent factors achieved an AUC of 0.857 (95% CI: 0.812–0.902) in the training cohort, 0.814 (95% CI: 0.763–0.865) in the internal validation cohort, and 0.791 (95% CI: 0.742–0.840) in the external validation cohort (Table 3 and Fig. 3).

Table 3 Performance comparison of the clinical, radiomic, and clinical-radiomic models.
Fig. 3 Performance of clinical, radiomic, and clinical-radiomic models for the diagnosis of malignant prostate nodules. (a) T2WI radiomic model and ADC radiomic model based on various combinations of feature selection and classification methods. (b–d) ROC curves of the clinical model, radiomic model, and clinical-radiomic model in the training cohort, internal validation cohort, and external validation cohort, respectively.

A total of 705 radiomic features derived from T2WI images were retained after PCC analysis. The optimal T2WI-based radiomic model was constructed by combining RFE feature selection with the AdaBoost classifier, utilizing 12 selected features (Table 4). In the internal and external validation cohorts, the T2WI radiomic model yielded an AUC of 0.825 (95% CI: 0.768–0.882) and 0.753 (95% CI: 0.700–0.806), respectively (Table 3 and Fig. 3). Similarly, for ADC-derived features, 706 features were retained after PCC analysis. The best-performing ADC-based radiomic model was built by combining Relief feature selection with the LDA classifier, using 16 selected features (Table 4). This ADC-based model achieved an AUC of 0.896 (95% CI: 0.859–0.933) and 0.818 (95% CI: 0.773–0.863) in the internal and external validation cohorts, respectively. The fusion radiomic model, which integrated the T2WI and ADC rad-scores, performed better, with an AUC of 0.908 (95% CI: 0.871–0.945) and 0.841 (95% CI: 0.800–0.882) in the internal and external validation cohorts, respectively (Table 3 and Fig. 3).

Table 4 The radiomic features included in the ADC- and T2WI-based models.

Evaluation of the clinical-radiomic model

Multivariate analysis using the stepwise regression method was conducted on fPSA/tPSA, PZV, PSAD, PI-RADS score, and fusion rad-score. The results demonstrated that fPSA/tPSA (P = 0.023), PSAD (P = 0.016), PI-RADS score (P = 0.048), and fusion rad-score (P = 1.38e-07) were identified as independent risk factors (Table 5). The clinical-radiomic nomogram was developed using the following formula:

Table 5 Univariate and multivariate logistic regression analysis of risk factors for malignant nodules.

\(-4.541 - 5.512 \times \text{fPSA/tPSA} + 0.996 \times \text{PSAD} + 0.546 \times \text{PI-RADS score} + 5.785 \times \text{fusion rad-score}\)

The combined model produced an AUC of 0.925 (95% CI: 0.894–0.956) and 0.872 (95% CI: 0.837–0.907) in the internal and external validation cohorts, respectively (Table 3). In the internal validation cohort, the sensitivity and specificity were 76.2% and 90.3%, while in the external validation cohort, they were 82.8% and 76.5% (Table 3). The calibration curves suggested strong concordance between the predicted probabilities from the model and the actual event proportions in both the internal validation cohort (H–L test, P = 0.476) and the external validation cohort (H–L test, P = 0.210) (Fig. 4). In Fig. 5a, the curve of our model lies above the “All” line across a wide range of threshold probabilities, which indicates a greater net benefit than the treat-all strategy, that is, a better distinction between those who will benefit from the intervention and those who will not. The CIC plot in Fig. 5b showed that at the selected risk threshold of 0.49, 419 out of 1000 patients were classified as high risk, of whom 310 were true positives. These results further confirmed the clinical usefulness of the nomogram.
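The nomogram formula can be evaluated numerically as follows. This is a hedged sketch: the coefficients are those reported in the formula, but treating the result as the logit of a logistic regression model and converting it to a probability with the sigmoid function is an assumption, as are the function names, the input values, and the scales of the inputs (the units of PSAD and the range of the fusion rad-score are not stated in this section).

```python
import math

# Coefficients as reported in the nomogram formula.
COEFFICIENTS = {
    "intercept": -4.541,
    "fpsa_tpsa": -5.512,
    "psad": 0.996,
    "pirads": 0.546,
    "fusion_rad_score": 5.785,
}


def linear_predictor(fpsa_tpsa, psad, pirads, fusion_rad_score):
    """Evaluate the reported linear combination of the four predictors."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["fpsa_tpsa"] * fpsa_tpsa
        + c["psad"] * psad
        + c["pirads"] * pirads
        + c["fusion_rad_score"] * fusion_rad_score
    )


def predicted_probability(fpsa_tpsa, psad, pirads, fusion_rad_score):
    """Assumption: the formula is a logit, so apply the logistic (sigmoid) function."""
    lp = linear_predictor(fpsa_tpsa, psad, pirads, fusion_rad_score)
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical patient; all input values are illustrative only.
risk = predicted_probability(fpsa_tpsa=0.12, psad=0.35, pirads=4, fusion_rad_score=0.7)
print(f"predicted risk: {risk:.3f}, high risk at 0.49 threshold: {risk >= 0.49}")
```

The signs of the coefficients match the clinical expectation: a higher fPSA/tPSA ratio lowers the predicted risk, while higher PSAD, PI-RADS score, and fusion rad-score raise it.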

Fig. 4 The clinical-radiomic nomogram and calibration curves. (a) Nomogram integrating fPSA/tPSA, PSAD, PI-RADS score, and rad-score; (b–d) calibration curves for the training, internal validation, and external validation cohorts.

Fig. 5 Clinical usefulness of the clinical-radiomic nomogram. (a) Clinical decision curve; (b) clinical impact curve.

Discussion

In this study, we developed multiple models for diagnosing malignant prostate nodules, including a clinical model, ADC-radiomic model, T2WI-radiomic model, fusion radiomic model, and clinical-radiomic model. Ultimately, the clinical-radiomic model, which incorporated fPSA/tPSA, PSAD, PI-RADS score, and fusion rad-score (combining the ADC- and T2WI-based rad-scores), demonstrated the highest performance. The model yielded an AUC of 0.925, sensitivity of 76.2%, and specificity of 90.3% in the internal validation cohort. In the external validation cohort, the combined model achieved an AUC of 0.872, sensitivity of 82.8%, and specificity of 76.5%.

Several studies have used clinical, radiological, or radiomic features to diagnose prostate lesions. However, consensus on clinical and radiographic indicators has not been reached in previous studies8,9,17,18,19,20,21,22. This lack of consensus may be attributed to different target populations, random sampling bias, and limited sample sizes. In our work, we have identified four clinico-radiological features that aid in diagnosing PCa, which can help mitigate overdiagnosis and overtreatment24,25. Accumulated evidence suggests that tumors derived from the TZ exhibit distinct cytohistological characteristics, with lower Gleason scores and a lower incidence of extraprostatic extension, seminal vesicle invasion, and biochemical recurrence26,27,28,29,30. TZ-PSAD relates the PSA level to the TZV rather than to the volume of the whole gland. Aurelia F et al. concluded that TZ-PSAD was more closely associated with tumor aggressiveness than PSAD31. In our study, TZ-PSAD had higher diagnostic efficiency compared to PSAD and PZ-PSAD. PI-RADS is a standardized MRI assessment method widely utilized for evaluating prostate lesions and is highly effective in diagnosing PCa32. While the PI-RADS score is commonly used in clinical practice, it cannot be used as the sole basis for biopsy or follow-up evaluation. The PI-RADS score has advantages in diagnosing CsPCa, but its accuracy depends on observer experience and expertise. Therefore, relying solely on the PI-RADS score for predicting biopsy results has certain limitations33.

With the advancements in computer-assisted methods, radiomic analysis has been employed to diagnose prostate lesions. MRI allows for multimodal and multidirectional evaluation of prostate lesions. Compared to CT, MRI can provide a more comprehensive description of soft tissue characteristics, atomic density, and lesion enhancement. Contrast-enhanced imaging can offer additional functional information. Min et al. utilized a radiomic signature to differentiate between CsPCa and clinically insignificant PCa34. Woźnicki et al. added the PI-RADS score to a radiomic model for PCa detection and classification20. Despite the high diagnostic performance of the final models derived from these studies, achieving standardization and uniformity remains challenging due to the diversity of research methods. This is one of the major obstacles currently faced by radiomics. Until the issue of standardization is resolved, widespread implementation of radiomics for disease diagnosis will be highly challenging.

Optimizing the diagnostic efficacy of the radiomic model is crucial for accurately distinguishing BPH from PCa lesions preoperatively, which holds great significance for PCa patients and directly impacts the disease prognosis. The primary advantage of radiomics lies in its ability to reduce subjectivity and reliance on empirical knowledge, enabling efficient automatic identification of benign and malignant prostate nodules. To construct the radiomic model, only MRI images and other relevant variables are required as inputs. This efficacy, coupled with high efficiency, serves as the primary driving force behind the integration of artificial intelligence in the field of medicine. Additionally, we assessed the added value of clinical variables and radiological features to the fusion radiomic model based on T2WI and ADC, and the results were satisfactory. This indicates that the explainable features utilized in routine clinical practice provide valuable information for diagnosing PCa.

Some limitations of this study should be noted. First, due to its retrospective design, patients who were clinically suspected to have “malignant nodules” in the prostate but did not undergo needle biopsy were excluded, which may introduce selection bias and compromise the reproducibility and comparability of the results. Larger cohorts are needed to validate our findings. Second, the Gleason score was not considered in this study, but it is the focus of our future research. Third, this study analyzed only several commonly used dimensionality reduction and modeling methods and did not comprehensively investigate the existing ones.

Conclusion

In conclusion, we have established several models for preoperative diagnosis of prostate lesions and have compared the diagnostic effects of these models, thereby providing a preferred method for clinical application. Additionally, we have derived an optimal rad-score for the clinical-radiomic nomogram. This study not only demonstrates the feasibility of applying radiomics to noninvasive preoperative diagnosis of the prostate but also aims to determine the best modeling method and systematic research approach in radiomic research, thereby providing a foundation for the standardization of radiomics. Moving forward, further relevant studies are needed to explore the standardization of radiomics, enabling the translation of radiomics as a non-invasive and useful tool into clinical practice.