Fig. 9: The creation process of the CXR-AL14 dataset via the human-in-the-loop approach.

In the first stage, six expert radiologists (each with more than 20 years of experience) jointly annotated the GT bounding boxes of all abnormalities on 8,000 CXRs; these annotated CXRs were used to train a preliminary deep learning model and were added to the CXR-AL14 dataset. In the second stage, the preliminary model generated predicted annotations for new, unannotated CXRs, and these predictions were randomly assigned to twelve senior radiologists (each with 10-20 years of experience), who corrected the categories, sizes and locations of the predicted bounding boxes. After this first correction, the annotations were randomly assigned to the six expert radiologists for a second correction. This two-level review mirrors the routine workflow in which radiologists interpret CXRs. After expert correction, the annotations of each CXR were considered GT-level annotations. The CXRs with corrected annotations were then added to the training set to retrain the model and were included in the final dataset. After seven iterations, the CXR-AL14 dataset was complete. GT: ground-truth.
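The iterative workflow in the caption can be summarized as a minimal Python sketch. It is illustrative only and is not the authors' implementation. Every function here (`train_model`, `predict`, `correct`, `build_dataset`) and the `Annotation` record are hypothetical placeholders. Model training, box prediction and radiologist correction are stubbed out so that the sketch shows only the data flow: expert seed annotations, prediction, senior correction, expert correction, then retraining on the growing set. The caption does not say whether the first stage counts as one of the seven iterations, so the sketch simply runs one loop pass per supplied batch of new CXRs.

```python
import random
from dataclasses import dataclass


@dataclass
class Annotation:
    image_id: int
    boxes: list   # hypothetical format: (category, x, y, w, h) tuples
    stage: str    # which step produced this annotation


def train_model(training_set):
    # Placeholder: a real pipeline would train a detection model here.
    return {"trained_on": len(training_set)}


def predict(model, image_id):
    # Placeholder: the model proposes bounding boxes for an unannotated CXR.
    return Annotation(image_id, [("placeholder", 0, 0, 1, 1)], "predicted")


def correct(annotation, reviewers, stage, rng):
    # A randomly chosen reviewer would fix categories, sizes and locations.
    # Placeholder: the boxes are passed through unchanged.
    _reviewer = rng.choice(reviewers)
    return Annotation(annotation.image_id, list(annotation.boxes), stage)


def build_dataset(seed_ids, batches, experts, seniors, seed=0):
    rng = random.Random(seed)
    # Stage 1: experts annotate the seed CXRs directly (boxes not modeled here).
    dataset = [Annotation(i, [], "expert_gt") for i in seed_ids]
    training_set = list(dataset)
    model = train_model(training_set)
    # Stage 2: one pass per batch of new, unannotated CXRs.
    for batch in batches:
        corrected = []
        for image_id in batch:
            draft = predict(model, image_id)
            first = correct(draft, seniors, "senior_corrected", rng)
            final = correct(first, experts, "expert_corrected", rng)
            corrected.append(final)
        # Expert-corrected annotations are treated as GT-level, so they
        # join both the training set and the final dataset.
        training_set.extend(corrected)
        dataset.extend(corrected)
        model = train_model(training_set)
    return dataset, model


# Toy usage: 8 seed images and 7 batches of 10 new images each.
experts = [f"expert_{k}" for k in range(6)]
seniors = [f"senior_{k}" for k in range(12)]
batches = [list(range(100 + 10 * k, 110 + 10 * k)) for k in range(7)]
dataset, model = build_dataset(range(8), batches, experts, seniors)
```

The design choice worth noting is that the model is retrained after every batch on all GT-level annotations gathered so far. As a result, each round of predictions sent to the radiologists comes from a model trained on a larger corrected set.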