Table 2. AI performance on the internal test set across different subpopulations. We report the AUROC of the AI system with 95% confidence intervals on the internal test set. The biopsied population includes only exams in which at least one biopsy was recommended. We stratified exams by patient age, mammographic breast density, and the manufacturer of the ultrasound (US) device. Mammographic breast density was categorized according to the BI-RADS standards (ref. 69).

From: Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams

| Population | AUROC (95% CI) | No. of breasts | No. of cancers |
|---|---|---|---|
| Overall population | 0.976 (0.972, 0.980) | 79,078 | 1248 |
| Biopsied population | 0.940 (0.934, 0.947) | 12,973 | 1248 |
| Age | | | |
| <40 yrs old | 0.969 (0.955, 0.982) | 5176 | 72 |
| 40–49 yrs old | 0.970 (0.955, 0.986) | 19,677 | 160 |
| 50–59 yrs old | 0.981 (0.975, 0.986) | 24,142 | 292 |
| 60–69 yrs old | 0.980 (0.973, 0.985) | 19,039 | 326 |
| ≥70 yrs old | 0.969 (0.958, 0.981) | 11,044 | 398 |
| Breast density | | | |
| Entirely fatty | 0.964 (0.942, 0.983) | 1157 | 54 |
| Scattered fibroglandular densities | 0.975 (0.961, 0.982) | 19,199 | 441 |
| Heterogeneously dense | 0.979 (0.974, 0.981) | 47,255 | 610 |
| Extremely dense | 0.964 (0.932, 0.973) | 9398 | 90 |
| Unknown | 0.970 (0.955, 0.983) | 2069 | 53 |
| Manufacturer | | | |
| GE | 0.984 (0.968, 0.993) | 5708 | 47 |
| Medison | 0.990 (0.974, 0.996) | 2673 | 13 |
| Philips | 0.977 (0.970, 0.982) | 28,943 | 412 |
| Siemens | 0.974 (0.968, 0.980) | 37,572 | 699 |
| Toshiba | 0.986 (0.978, 0.992) | 4180 | 77 |
| Other | | 2 | 0 |
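
The stratified analysis described in the caption can be illustrated with a minimal sketch. It assumes each breast is a record with a binary cancer label and an AI score, and it assumes a percentile bootstrap for the 95% confidence interval, since the caption does not say how the intervals were computed. The function names and the record keys (`label`, `score`, `maker`) are hypothetical and are not taken from the paper.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney) statistic; ties get average ranks.

    Returns None when only one class is present, because the AUROC is
    undefined in that case (for example, a stratum with no cancers).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i, n = 0, len(pairs)
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = auroc([labels[k] for k in idx], [scores[k] for k in idx])
        if value is not None:  # skip resamples that contain a single class
            stats.append(value)
    if not stats:
        return None
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


def stratified_auroc(records, key):
    """Group records by `key`; return {group: (AUROC, n_breasts, n_cancers)}."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    out = {}
    for name, rows in groups.items():
        labels = [r["label"] for r in rows]
        scores = [r["score"] for r in rows]
        out[name] = (auroc(labels, scores), len(rows), sum(labels))
    return out
```

In this sketch a stratum with no positive cases returns `None` for the AUROC, which is consistent with the "Other" manufacturer row (2 breasts, 0 cancers) having no AUROC entry.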