Table 3 Outcome of secondary diagnoses in a blind, randomized trial

From: Deep learning approach for screening neonatal cerebral lesions on ultrasound in China

| Outcome | AI group (N = 757) | Junior radiologist group (N = 762) | Mean difference (95% CI) | P value |
| --- | --- | --- | --- | --- |
| Initial result change | 42 (5.5%) | 158 (20.7%) | −15.2% (−18.5% to −11.9%) | <0.001 |
| Gold standard consistency | 738 (97.5%) | 746 (97.9%) | −0.4% (−2.1% to 1.2%) | 0.595 |
| Secondary diagnosis time (s), median (IQR) | 6.00 (5.20–7.05) | 8.00 (6.45–11.55) | −2.79 (−3.87 to −1.70) | <0.001 |

  1. Initial result change refers to a discrepancy between the secondary diagnosis by senior radiologists and the initial diagnosis made by the AI or the junior radiologist group. Gold standard consistency refers to agreement between the secondary diagnosis by senior radiologists and the gold standard diagnosis for each case. Secondary diagnosis time refers to the time, in seconds, that senior radiologists spent making a diagnosis based on the provided materials. Superiority for initial result change and gold standard consistency was assessed using two-sided Pearson’s chi-square tests (df = 1). Secondary diagnosis time was compared using a one-sided Welch’s t-test under the directional hypothesis that AI assistance reduces diagnostic time. Source data are provided as a Source data file.
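As a rough check of the two proportion rows, the counts in the table can be run through a Pearson chi-square test with df = 1. The sketch below is not the authors' analysis code. It assumes no continuity correction and a simple Wald interval for the difference in proportions, neither of which the footnote specifies, so the results may differ slightly from the published figures. The function names are hypothetical. The Welch's t-test row is left out because it needs the per-case times from the Source data file.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], df = 1.

    Assumption: no continuity correction (the footnote does not say).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def wald_diff_ci(x1, n1, x2, n2, z=1.96):
    """Difference in proportions (group 1 minus group 2) with a Wald 95% CI.

    Assumption: the interval method is not stated in the source.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


N_AI, N_JUNIOR = 757, 762  # group sizes from the table header

# Event counts (AI group, junior radiologist group) taken from the table.
rows = {
    "Initial result change": (42, 158),
    "Gold standard consistency": (738, 746),
}

results = {}
for name, (x_ai, x_jr) in rows.items():
    chi2, p = pearson_chi2_2x2(x_ai, N_AI - x_ai, x_jr, N_JUNIOR - x_jr)
    diff, lo, hi = wald_diff_ci(x_ai, N_AI, x_jr, N_JUNIOR)
    results[name] = (chi2, p, diff, lo, hi)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3g}, "
          f"diff = {diff:.1%} ({lo:.1%} to {hi:.1%})")
```

Under these assumptions the initial result change row should give a difference near −15.2% with a very small P value, and the gold standard consistency row a P value close to the reported 0.595. Small discrepancies in the interval bounds would point to a different interval method in the original analysis.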