Table 5. Comparison of performance between physicians and the algorithm on hip fracture and pelvic fracture.

From: A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs

 

| | Hip fracture: Acc. | Hip fracture: Sen. | Hip fracture: Spe. | Hip fracture: Misdiagnosis detected | Pelvic fracture: Acc. | Pelvic fracture: Sen. | Pelvic fracture: Spe. | Pelvic fracture: Misdiagnosis detected |
|---|---|---|---|---|---|---|---|---|
| Our algorithm | 0.995 | 1.00 | 0.99 | | 0.945 | 0.92 | 0.97 | |
| ER physician 1ᵃ | 0.915 | 0.94 | 0.89 | 0 (0%) | 0.925 | 0.90 | 0.95 | 2 (4%) |
| ER physician 2 | 0.955 | 0.98 | 0.93 | 1 (2%) | 0.865 | 0.78 | 0.95 | 8 (16%) |
| ER physician 3 | 0.970 | 1.00 | 0.94 | 0 (0%) | 0.915 | 0.86 | 0.97 | 4 (8%) |
| ER physician 4 | 0.960 | 0.98 | 0.94 | 0 (0%) | 0.905 | 0.86 | 0.95 | 4 (8%) |
| Mean (ER physicians) | 0.950 | 0.975 | 0.925 | 0.25 (0.5%) | 0.903 | 0.850 | 0.955 | 4.5 (9.0%) |
| Radiologist 1 | 0.985 | 1.00 | 0.97 | 0 (0%) | 0.940 | 0.88 | 1.00 | 3 (6%) |
| Radiologist 2 | 0.970 | 0.98 | 0.96 | 1 (2%) | 0.925 | 0.86 | 0.99 | 4 (8%) |
| Orthopedics 1 | 0.960 | 1.00 | 0.92 | 0 (0%) | 0.970 | 0.94 | 1.00 | 0 (0%) |
| Orthopedics 2 | 0.980 | 1.00 | 0.96 | 0 (0%) | 0.925 | 0.86 | 0.99 | 4 (8%) |
| Mean (radiologists and orthopedics) | 0.974 | 0.995 | 0.953 | 0.25 (0.5%) | 0.940 | 0.885 | 0.995 | 2.75 (5.5%) |
| Resident 1ᵃ | 0.955 | 0.98 | 0.93 | 0 (0%) | 0.800 | 0.62 | 0.98 | 16 (32%) |
| Resident 2 | 0.970 | 1.00 | 0.94 | 0 (0%) | 0.880 | 0.86 | 0.90 | 4 (8%) |
| Resident 3 | 0.995 | 1.00 | 0.99 | 0 (0%) | 0.920 | 0.86 | 0.98 | 4 (8%) |
| Resident 4ᵃ | 0.890 | 0.94 | 0.84 | 1 (2%) | 0.810 | 0.68 | 0.94 | 13 (26%) |
| Resident 5 | 0.980 | 0.96 | 1.00 | 2 (4%) | 0.910 | 0.86 | 0.96 | 4 (8%) |
| Resident 6ᵃ | 0.930 | 0.94 | 0.92 | 0 (0%) | 0.825 | 0.90 | 0.75 | 2 (4%) |
| Resident 7 | 0.930 | 0.92 | 0.94 | 3 (6%) | 0.935 | 0.92 | 0.95 | 1 (2%) |
| Resident 8 | 0.940 | 0.92 | 0.96 | 1 (2%) | 0.880 | 0.86 | 0.90 | 4 (8%) |
| Resident 9ᵃ | 0.945 | 1.00 | 0.89 | 0 (0%) | 0.920 | 0.90 | 0.94 | 2 (4%) |
| Resident 10 | 0.965 | 0.96 | 0.97 | 1 (2%) | 0.910 | 0.86 | 0.96 | 5 (10%) |
| Resident 11 | 0.935 | 0.90 | 0.97 | 5 (10%) | 0.940 | 0.90 | 0.98 | 2 (4%) |
| Resident 12ᵃ | 0.885 | 0.78 | 0.99 | 10 (20%) | 0.875 | 0.78 | 0.97 | 8 (16%) |
| Resident 13 | 0.975 | 1.00 | 0.95 | 0 (0%) | 0.945 | 0.90 | 0.99 | 2 (4%) |
| Resident 14 | 0.990 | 1.00 | 0.98 | 0 (0%) | 0.950 | 0.92 | 0.98 | 1 (2%) |
| Mean (residents) | 0.949 | 0.950 | 0.948 | 1.64 (3.3%) | 0.893 | 0.844 | 0.941 | 4.86 (9.7%) |

ᵃ Significant difference between a physician and the algorithm on McNemar’s test. Abbreviations: Acc., accuracy; Sen., sensitivity; Spe., specificity.
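
For readers who want to see how the quantities named in the footnote are typically computed, the following is a minimal sketch, not the authors' analysis code. It assumes the standard confusion-matrix definitions of accuracy, sensitivity, and specificity, and it uses the exact (binomial) form of McNemar's test; the table does not state which variant of the test was applied. The function names and all counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: fracture cases read correctly / missed
    tn/fp: normal cases read correctly / flagged as fracture
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired readings of the same cases.

    b: cases the algorithm read correctly and the physician did not
    c: cases the physician read correctly and the algorithm did not
    Cases where both agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    acc, sen, spe = diagnostic_metrics(tp=46, fn=4, tn=145, fp=5)
    print(f"Acc. {acc:.3f}, Sen. {sen:.2f}, Spe. {spe:.2f}")
    print(f"McNemar exact p = {mcnemar_exact(b=10, c=2):.4f}")
```

The test depends only on the discordant pairs, which is why two readers with similar accuracy can still differ significantly from the algorithm if their errors fall on different cases.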