Table 1 Performance of the uncalibrated base model (Deeplasia) and the Georgia-specific calibrated version (Deeplasia-GE) on the test set of the Georgian bone age dataset. Previously reported results on the RSNA, DHA, and GDBD datasets⁵ are provided for reference. DHA: Los Angeles Digital Hand Atlas; GDBD: German Dysplastic Bone Dataset; MAD: mean absolute difference; RMSE: root mean squared error; RSNA: Radiological Society of North America. Lower MAD and RMSE indicate higher accuracy. ᵇ Estimated range for the accuracies of the assessed single raters.

From: Population-specific calibration and validation of an open-source bone age AI

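| Dataset | No. Ref. Ratings | n | Deeplasia MAD (months) | Deeplasia RMSE (months) | Inter-rater MAD (months) | Inter-rater RMSE (months) |
|---|---|---|---|---|---|---|
| Georgian | 7 | 260 | 6.6 (base) | 8.8 ([8.1, 9.6]) (base) | 7.9 | 10.6 |
| Georgian | 7 | 260 | 5.7 (calibrated) | 7.4 ([6.8, 8.1]) (calibrated) | | |
| RSNA¹¹ | 6 | 200 | 3.9 | 5.1 ([4.7, 5.7]) | 4.8–7.0ᵇ | - |
| DHA²³ | 2 | 1383 | 5.8 | 7.7 ([7.4, 8.0]) | 4.4 | 7.0 |
| GDBD⁵ | 2 | 702 | 6.0 | 7.7 ([7.3, 8.1]) | 9.5 | 12.8 |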
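
The two error metrics reported in the table, MAD and RMSE, can be illustrated with a minimal sketch. The function names and the bone age values below are hypothetical and chosen only for illustration; they are not taken from the study, and this is not the authors' evaluation code.

```python
import math


def mad(predicted, reference):
    """Mean absolute difference, in the units of the inputs (here: months)."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def rmse(predicted, reference):
    """Root mean squared error, in the units of the inputs (here: months)."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical bone ages in months (illustrative values, not study data).
predicted = [120.0, 96.0, 150.0, 60.0]
reference = [114.0, 100.0, 150.0, 70.0]

print(f"MAD:  {mad(predicted, reference):.2f} months")   # MAD:  5.00 months
print(f"RMSE: {rmse(predicted, reference):.2f} months")  # RMSE: 6.16 months
```

Because RMSE squares each difference before averaging, it weights large errors more heavily and is never smaller than MAD on the same data, which matches every row of the table.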