Table 3 Diagnostic performance of OvcaFinder and human readers using O-RADS

From: Development and validation of an interpretable model integrating multimodal information for improving ovarian cancer diagnosis

Internal dataset

| Metric | Reader A: O-RADS | Reader A: OvcaFinder | Reader B: O-RADS | Reader B: OvcaFinder | Reader C: O-RADS | Reader C: OvcaFinder | Reader D: O-RADS | Reader D: OvcaFinder | Reader E: O-RADS | Reader E: OvcaFinder |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | 0.907 (0.860, 0.946) | 0.971 (0.943, 0.993) | 0.900 (0.838, 0.946) | 0.980 (0.957, 0.999) | 0.958 (0.919, 0.987) | 0.981 (0.949, 0.998) | 0.947 (0.907, 0.978) | 0.978 (0.949, 0.994) | 0.924 (0.874, 0.966) | 0.976 (0.951, 0.999) |
| p (O-RADS vs. OvcaFinder) | 0.002 | | 1.50 × 10⁻³ | | 0.120 | | 0.056 | | 0.007 | |
| Sensitivity (%) | 97.3 (93.3, 100.0) | 97.3 (93.3, 100.0) | 96.0 (90.7, 100.0) | 96.0 (89.3, 100.0) | 93.3 (86.7, 98.7) | 97.3 (93.3, 100.0) | 97.3 (94.7, 100.0) | 97.3 (92.0, 100.0) | 97.3 (93.3, 100.0) | 97.3 (93.3, 100.0) |
| p (O-RADS vs. OvcaFinder) | 1.00 | | 1.00 | | 0.375 | | 1.00 | | 1.00 | |
| Specificity (%) | 61.1 (49.1, 75.5) | 81.5 (71.7, 92.5) | 72.2 (60.4, 81.1) | 92.6 (83.0, 98.1) | 87.0 (79.3, 94.3) | 92.6 (83.0, 98.1) | 77.8 (66.0, 88.7) | 83.3 (73.6, 90.6) | 68.5 (54.7, 81.1) | 83.3 (71.7, 94.3) |
| p (O-RADS vs. OvcaFinder) | 0.013 | | 9.77 × 10⁻⁴ | | 0.375 | | 0.549 | | 0.057 | |
| Accuracy (%) | 82.2 (77.3, 88.3) | 90.7 (85.9, 94.5) | 86.1 (81.3, 90.6) | 94.6 (89.8, 97.7) | 90.7 (85.2, 95.3) | 95.4 (91.4, 97.7) | 89.2 (84.4, 94.5) | 91.5 (86.7, 95.3) | 85.3 (80.5, 90.6) | 91.5 (86.7, 95.3) |
| PPV (%) | 77.7 (73.3, 85.2) | 88.0 (83.0, 94.9) | 82.8 (77.9, 88.1) | 94.7 (88.9, 98.7) | 90.9 (86.3, 96.0) | 94.8 (88.9, 98.7) | 85.9 (80.2, 92.4) | 89.0 (84.1, 93.7) | 81.1 (75.5, 87.5) | 89.0 (83.3, 95.9) |
| NPV (%) | 94.3 (86.5, 100.0) | 95.7 (89.6, 100.0) | 92.9 (84.4, 100.0) | 94.3 (86.2, 100.0) | 90.4 (82.0, 97.9) | 96.2 (90.9, 100.0) | 95.5 (90.0, 100.0) | 95.7 (88.5, 100.0) | 94.9 (87.8, 100.0) | 95.7 (89.1, 100.0) |

External test dataset

| Metric | Reader A: O-RADS | Reader A: OvcaFinder | Reader B: O-RADS | Reader B: OvcaFinder | Reader C: O-RADS | Reader C: OvcaFinder | Reader D: O-RADS | Reader D: OvcaFinder | Reader E: O-RADS | Reader E: OvcaFinder |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | 0.888 (0.847, 0.928) | 0.941 (0.902, 0.966) | 0.894 (0.854, 0.922) | 0.935 (0.902, 0.967) | 0.894 (0.855, 0.932) | 0.942 (0.913, 0.971) | 0.915 (0.882, 0.945) | 0.943 (0.909, 0.968) | 0.927 (0.890, 0.954) | 0.946 (0.911, 0.971) |
| p (O-RADS vs. OvcaFinder) | 0.006 | | 0.005 | | 0.008 | | 0.025 | | 0.149 | |
| Sensitivity (%) | 87.7 (79.0, 93.8) | 88.9 (81.5, 96.3) | 86.4 (79.0, 92.6) | 87.7 (81.5, 92.6) | 77.8 (69.1, 86.4) | 87.7 (77.8, 95.1) | 87.7 (80.3, 95.1) | 88.9 (81.5, 95.1) | 88.9 (79.0, 95.1) | 88.9 (81.5, 93.8) |
| p (O-RADS vs. OvcaFinder) | 1.00 | | 1.00 | | 0.077 | | 1.00 | | 1.00 | |
| Specificity (%) | 70.6 (65.0, 75.5) | 87.6 (83.7, 91.2) | 81.4 (76.1, 85.6) | 90.5 (86.9, 93.5) | 89.2 (85.3, 92.5) | 91.8 (88.6, 94.8) | 81.7 (78.4, 86.0) | 89.5 (85.6, 92.5) | 86.0 (82.0, 89.2) | 90.9 (87.3, 93.5) |
| p (O-RADS vs. OvcaFinder) | 3.07 × 10⁻¹² | | 8.36 × 10⁻⁶ | | 0.185 | | 6.96 × 10⁻⁵ | | 0.004 | |
| Accuracy (%) | 74.2 (69.8, 78.0) | 87.9 (83.7, 91.0) | 82.4 (77.8, 86.1) | 89.9 (86.1, 93.3) | 86.8 (83.7, 89.9) | 91.0 (87.9, 94.1) | 83.0 (79.6, 86.6) | 89.4 (85.8, 91.7) | 86.6 (83.2, 89.4) | 90.4 (87.6, 92.8) |
| PPV (%) | 44.1 (39.7, 48.7) | 65.5 (57.6, 72.6) | 55.1 (48.0, 61.5) | 71.0 (62.6, 78.4) | 65.6 (58.5, 74.2) | 74.0 (67.0, 82.0) | 55.9 (50.8, 62.0) | 69.2 (61.4, 75.3) | 62.6 (56.5, 68.9) | 72.0 (65.2, 78.3) |
| NPV (%) | 95.6 (92.8, 97.8) | 96.8 (94.6, 98.9) | 95.8 (93.6, 97.7) | 96.5 (94.6, 98.0) | 93.8 (91.6, 96.1) | 96.6 (94.1, 98.6) | 96.2 (94.0, 98.4) | 96.8 (94.7, 98.6) | 96.7 (94.1, 98.5) | 96.8 (95.0, 98.3) |

Data in parentheses are 95% confidence intervals. O-RADS Ovarian-Adnexal Reporting and Data System, AUC Area under the receiver operating characteristic curve, PPV Positive predictive value, NPV Negative predictive value. The p-values for AUC were calculated using the function ‘roc_test’ in the Python package of pROC. The p-values for sensitivity and specificity were calculated using a two-sided McNemar test.
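To make the paired comparison concrete, the following is a minimal sketch of a two-sided McNemar test for sensitivity or specificity. It assumes the exact (binomial) form of the test; the table does not state whether the exact or the chi-square variant was used. The function names `discordant_counts` and `mcnemar_exact_p` and the toy labels are hypothetical and are not taken from the study.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(
    truth: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
    target: int,
) -> Tuple[int, int]:
    """Count discordant pairs among cases whose true label equals `target`.

    Use target=1 (malignant) to compare sensitivities and target=0 (benign)
    to compare specificities. Returns (b, c), where b counts cases that only
    method A classified correctly and c counts cases that only method B did.
    """
    b = c = 0
    for t, a, o in zip(truth, pred_a, pred_b):
        if t != target:
            continue  # this case does not contribute to the metric
        a_ok, o_ok = (a == t), (o == t)
        if a_ok and not o_ok:
            b += 1
        elif o_ok and not a_ok:
            c += 1
    return b, c


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis each discordant pair favours either method
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical): 1 = malignant, 0 = benign.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 1, 1, 1]  # e.g. a reader using method A
pred_b = [1, 1, 1, 0, 0, 0, 0, 1]  # e.g. the same reader using method B

b, c = discordant_counts(truth, pred_a, pred_b, target=0)
p_specificity = mcnemar_exact_p(b, c)
```

Only discordant cases carry information in this test, which is why two methods with identical sensitivity give p = 1.00. As a purely arithmetic illustration, 0 versus 11 discordant cases gives p = 2 × 0.5¹¹ ≈ 9.77 × 10⁻⁴, and 1 versus 4 gives p = 0.375.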