Fig. 4: Tuberculosis diagnosis results compared with those of other models and methods.
From: Self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation

a Compared with other convolutional neural network (CNN)-based models, the Vision Transformer (ViT) model showed a linear increase in AUC with increasing time T, as well as the best performance among the models. b Unlike the model trained with the proposed framework, none of the existing self-supervised and semi-supervised methods showed a prominent improvement in performance with increasing time T. Data are presented as calculated areas under the receiver operating characteristic curve (AUCs) in the study population (center lines) ±95% confidence intervals calculated with DeLong's method (shaded areas). The AUCs of the proposed method were compared with those of the other models or methods at each time point T using the DeLong test to evaluate statistical significance, except at T = initial in b, where all compared methods start from the same baseline. * denotes statistically significant (p < 0.050) superiority of the proposed framework. All statistical tests were two-sided.
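
For reference, the sketch below illustrates the statistics used in this figure: an AUC with a 95% confidence interval from DeLong's method, and a two-sided DeLong test for comparing the correlated AUCs of two models evaluated on the same study population. This is a minimal illustration, not the authors' code; the fast DeLong formulation (Sun and Xu, 2014) and all function names (`delong`, `auc_ci`, `delong_test`) are assumptions made here for illustration only.

```python
# Minimal sketch (not the authors' implementation) of a fast DeLong procedure:
# AUC with a DeLong 95% CI for one model, and a two-sided DeLong test comparing
# the correlated AUCs of two models scored on the same patients.
import numpy as np
from scipy import stats


def _midrank(x):
    """Midranks of x (tied values share the average rank), 1-based."""
    order = np.argsort(x)
    xs = x[order]
    n = len(x)
    ranks = np.zeros(n)
    i = 0
    while i < n:
        j = i
        while j < n and xs[j] == xs[i]:
            j += 1
        ranks[i:j] = 0.5 * (i + j - 1) + 1.0
        i = j
    out = np.empty(n)
    out[order] = ranks
    return out


def delong(y_true, scores):
    """AUCs and DeLong covariance for score vectors of shape (k, n_samples)."""
    scores = np.atleast_2d(np.asarray(scores, dtype=float))
    pos = np.asarray(y_true) == 1
    x, y = scores[:, pos], scores[:, ~pos]      # positive / negative class scores
    m, n, k = x.shape[1], y.shape[1], scores.shape[0]
    tx = np.array([_midrank(x[r]) for r in range(k)])
    ty = np.array([_midrank(y[r]) for r in range(k)])
    tz = np.array([_midrank(np.concatenate([x[r], y[r]])) for r in range(k)])
    aucs = tz[:, :m].sum(axis=1) / (m * n) - (m + 1.0) / (2.0 * n)
    v01 = (tz[:, :m] - tx) / n                  # structural components (positives)
    v10 = 1.0 - (tz[:, m:] - ty) / m            # structural components (negatives)
    cov = np.cov(v01) / m + np.cov(v10) / n
    return aucs, np.atleast_2d(cov)


def auc_ci(y_true, scores, level=0.95):
    """Point AUC and DeLong confidence interval for a single model."""
    aucs, cov = delong(y_true, scores)
    se = np.sqrt(cov[0, 0])
    z = stats.norm.ppf(0.5 + level / 2.0)
    return aucs[0], (aucs[0] - z * se, aucs[0] + z * se)


def delong_test(y_true, scores_a, scores_b):
    """AUC difference and two-sided p-value for two correlated AUCs."""
    aucs, cov = delong(y_true, np.vstack([scores_a, scores_b]))
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs[0] - aucs[1], 2.0 * stats.norm.sf(abs(z))


# Example usage with hypothetical labels and model scores:
# auc, (lo, hi) = auc_ci(y, vit_scores)
# diff, p = delong_test(y, vit_scores, cnn_scores)   # p < 0.050 -> '*' in the figure
```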