Table 2 Comparison of patient-level diagnostic performance between the AI model and PI-RADS in all datasets
Dataset | AUC (95% CI) | Z value | P value^a | Sensitivity | Specificity | Accuracy | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
AI model | ||||||||
Training set | 0.94(0.93–0.95) | - | - | 0.86(1345/1567) | 0.92(2519/2727) | 0.90(3864/4294) | 0.87(1345/1553) | 0.92(2519/2741) |
Validation set | 0.88(0.85–0.91) | - | - | 0.90(239/267) | 0.74(157/211) | 0.83(396/478) | 0.82(239/293) | 0.85(157/185) |
Test set 1-5 | 0.93(0.91–0.95) | - | - | 0.90(319/355) | 0.87(419/484) | 0.88(738/839) | 0.83(319/384) | 0.92(419/455) |
Test set 1 | 0.93(0.90–0.96) | - | - | 0.93(111/119) | 0.81(103/127) | 0.87(214/246) | 0.82(111/135) | 0.93(103/111) |
Test set 2 | 0.86(0.81–0.92) | - | - | 0.83(64/77) | 0.76(60/79) | 0.79(124/156) | 0.77(64/83) | 0.82(60/73) |
Test set 3 | 0.90(0.79–1.0) | - | - | 0.83(10/12) | 0.86(25/29) | 0.85(35/41) | 0.71(10/14) | 0.93(25/27) |
Test set 4 | 0.92(0.84–0.99) | - | - | 0.79(15/19) | 0.91(29/32) | 0.86(44/51) | 0.83(15/18) | 0.88(29/33) |
Test set 5 | 0.96(0.94–0.99) | - | - | 0.93(119/128) | 0.93(202/217) | 0.93(321/345) | 0.89(119/134) | 0.96(202/211) |
Test set TCIA | 0.83(0.78–0.88) | - | - | 0.75(109/146) | 0.76(81/106) | 0.75(190/252) | 0.81(109/134) | 0.69(81/118) |
PI-RADS | ||||||||
Training set | 0.90(0.89–0.91) | 8.674 | <0.001 | 0.91(1423/1567) | 0.74(2028/2727) | 0.80(3451/4294) | 0.67(1423/2122) | 0.93(2028/2172) |
Validation set | 0.85(0.81–0.88) | 1.878 | 0.060 | 0.93(247/267) | 0.47(99/211) | 0.72(346/478) | 0.69(247/359) | 0.83(99/119) |
Test set 1-5 | 0.93(0.92–0.95) | 0.274 | 0.784 | 0.98(347/355) | 0.65(316/484) | 0.79(663/839) | 0.67(347/515) | 0.98(316/324) |
Test set 1 | 0.91(0.88–0.95) | 0.827 | 0.408 | 0.97(116/119) | 0.64(81/127) | 0.80(197/246) | 0.72(116/162) | 0.96(81/84) |
Test set 2 | 0.90(0.85–0.94) | 1.343 | 0.179 | 0.99(76/77) | 0.41(32/79) | 0.69(108/156) | 0.62(76/123) | 0.97(32/33) |
Test set 3 | 0.93(0.87–1.0) | 0.662 | 0.508 | 1.00(12/12) | 0.62(18/29) | 0.73(30/41) | 0.52(12/23) | 1.00(18/18) |
Test set 4 | 0.93(0.86–1.0) | 0.403 | 0.687 | 1.00(19/19) | 0.62(20/32) | 0.76(39/51) | 0.61(19/31) | 1.00(20/20) |
Test set 5 | 0.96(0.94–0.98) | 0.003 | 0.998 | 0.97(124/128) | 0.76(165/217) | 0.84(289/345) | 0.70(124/176) | 0.98(165/169) |
Test set TCIA | 0.85(0.80–0.89) | 1.153 | 0.249 | 0.94(143/152) | 0.42(45/108) | 0.72(188/260) | 0.69(143/206) | 0.83(45/54) |
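
The proportions in Table 2 can be recomputed from the counts given in parentheses. The sketch below is illustrative and not the authors' code. It derives sensitivity, specificity, accuracy, PPV, and NPV from the four confusion-matrix counts, using the AI model's pooled Test set 1-5 row as input. It also converts a Z value to a P value on the assumption that the reported P values are two-sided under a standard normal distribution. The footnote defining the test is not reproduced here, so this is an assumption, although it agrees with the listed Z and P pairs. The names `diagnostic_metrics` and `two_sided_p` are hypothetical.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Return patient-level metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # TP / all positives
        "specificity": tn / (tn + fp),               # TN / all negatives
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                       # TP / all predicted positive
        "npv": tn / (tn + fn),                       # TN / all predicted negative
    }


def two_sided_p(z):
    """Two-sided P value for a Z statistic (assumes a standard normal null)."""
    return math.erfc(abs(z) / math.sqrt(2))


# AI model, Test set 1-5: sensitivity 319/355, specificity 419/484.
ai_pooled = diagnostic_metrics(tp=319, fn=355 - 319, tn=419, fp=484 - 419)
for name, value in ai_pooled.items():
    print(f"{name}: {value:.2f}")

# PI-RADS vs AI model, validation set: Z = 1.878.
print(f"P = {two_sided_p(1.878):.3f}")
```

With these inputs the script reproduces the tabulated 0.90, 0.87, 0.88, 0.83, and 0.92 for the pooled test row, and P = 0.060 for the validation-set comparison.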