Table 2 Performance comparison of radiomics and deep learning models across training, testing, and multicenter validation cohorts
| Model | Dataset | AUC (95% CI) | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|
| Radiomics_Peritumor | Training Cohort | 0.992 (0.987–0.998) | 0.783 | 0.986 | 0.966 |
| | Testing Cohort | 0.709 (0.605–0.814) | 0.154 | 0.988 | 0.895 |
| | Validation Cohort Ⅰ | 0.659 (0.554–0.763) | 0.095 | 0.956 | 0.895 |
| | Validation Cohort Ⅱ | 0.633 (0.533–0.732) | 0.070 | 0.993 | 0.781 |
| | Validation Cohort Ⅲ | 0.602 (0.489–0.716) | 0.080 | 0.943 | 0.850 |
| | Validation Cohort Ⅳ | 0.685 (0.558–0.812) | 0.130 | 0.992 | 0.865 |
| Radiomics_Tumor | Training Cohort | 0.995 (0.991–0.999) | 0.783 | 0.957 | 0.966 |
| | Testing Cohort | 0.787 (0.71–0.863) | 0.154 | 0.988 | 0.869 |
| | Validation Cohort Ⅰ | 0.677 (0.552–0.801) | 0.286 | 0.931 | 0.885 |
| | Validation Cohort Ⅱ | 0.641 (0.537–0.744) | 0.116 | 0.986 | 0.786 |
| | Validation Cohort Ⅲ | 0.656 (0.54–0.773) | 0.120 | 0.928 | 0.842 |
| | Validation Cohort Ⅳ | 0.664 (0.546–0.783) | 0.130 | 0.977 | 0.853 |
| Radiomics_Combined | Training Cohort | 0.996 (0.993–0.999) | 0.750 | 0.991 | 0.967 |
| | Testing Cohort | 0.800 (0.709–0.892) | 0.154 | 0.994 | 0.899 |
| | Validation Cohort Ⅰ | 0.707 (0.588–0.826) | 0.095 | 0.985 | 0.922 |
| | Validation Cohort Ⅱ | 0.693 (0.596–0.789) | 0.140 | 0.993 | 0.797 |
| | Validation Cohort Ⅲ | 0.676 (0.568–0.784) | 0.200 | 1.000 | 0.915 |
| | Validation Cohort Ⅳ | 0.751 (0.633–0.868) | 0.130 | 1.000 | 0.872 |
| Deep learning | Training Cohort | 1.000 | 1.000 | 0.891 | 0.995 |
| | Testing Cohort | 0.818 (0.703–0.932) | 0.538 | 0.994 | 0.852 |
| | Validation Cohort Ⅰ | 0.732 (0.626–0.838) | 0.333 | 0.927 | 0.885 |
| | Validation Cohort Ⅱ | 0.764 (0.684–0.844) | 0.465 | 0.861 | 0.770 |
| | Validation Cohort Ⅲ | 0.696 (0.594–0.798) | 0.400 | 0.804 | 0.761 |
| | Validation Cohort Ⅳ | 0.720 (0.596–0.843) | 0.391 | 0.910 | 0.833 |
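
For readers interpreting the sensitivity, specificity, and accuracy columns, the relationship between them can be sketched as follows. This is a minimal illustration that assumes the standard confusion-matrix definitions; the table itself does not state how the values were computed or which decision threshold was used. The helper name `diagnostic_metrics` and the counts are hypothetical and are not taken from any cohort in Table 2.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, accuracy, and prevalence from confusion-matrix counts."""
    positives = tp + fn   # all truly positive cases
    negatives = tn + fp   # all truly negative cases
    total = positives + negatives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / positives,   # true positive rate
        "specificity": tn / negatives,   # true negative rate
        "accuracy": (tp + tn) / total,   # fraction of all cases classified correctly
        "prevalence": positives / total, # fraction of cases that are positive
    }


# Hypothetical counts for illustration only (not data from Table 2).
m = diagnostic_metrics(tp=8, fn=12, tn=90, fp=10)

# Accuracy equals the prevalence-weighted mean of sensitivity and specificity.
weighted = m["prevalence"] * m["sensitivity"] + (1 - m["prevalence"]) * m["specificity"]
assert abs(weighted - m["accuracy"]) < 1e-12

print({name: round(value, 3) for name, value in m.items()})
```

Because accuracy is a prevalence-weighted mean, it always lies between sensitivity and specificity, and when positive cases are rare a model can report high accuracy alongside low sensitivity.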