Fig. 2: Training indicators under the default configuration on the MIT-BIH dataset.
From: Bridging clinical knowledge and AI: an interpretable transformer framework for ECG diagnosis

a Learning Rate: This plot displays the learning rate schedule used during training. It begins with a warm-up phase where the learning rate gradually increases, followed by a decay phase where it is slowly reduced. This strategy helps to ensure stable convergence.

b Total Loss: The stacked area chart illustrates the composition of the total loss, which is a combination of Focal Loss and Knowledge Distillation (KD) Loss. Both loss components, and therefore the total loss, decrease steadily over the training epochs, indicating that the model is effectively learning to minimize its errors.

c Training Accuracy: This plot shows a smooth and consistent increase in the model’s accuracy on the training data, eventually approaching saturation. This upward trend signifies that the model is successfully learning the patterns within the training dataset.

d Testing Metrics: The final subplot presents a suite of evaluation metrics on the test set, including accuracy, precision, sensitivity, specificity, and the F1-score. All these metrics show a clear upward trend and converge to high values, demonstrating the model’s strong generalization capabilities and robust performance on unseen data.
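
The quantities plotted in panels a and b can be illustrated with a minimal Python sketch. The caption gives neither the shape of the schedule nor the weighting of the two loss terms, so everything specific here is an assumption made for illustration: linear warm-up followed by cosine decay, a focal loss with `gamma = 2.0`, a temperature-softened KL-divergence form of the KD loss, and a hypothetical `kd_weight` that scales the KD term. The function names are likewise illustrative and are not taken from the paper.

```python
import math


def warmup_decay_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at `step`: linear warm-up, then cosine decay.

    Assumption: the caption only says "warm-up, then decay"; the linear and
    cosine shapes are illustrative choices.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warm-up step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: a standard soft-target distillation term; the paper's exact
    KD formulation is not stated in the caption.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def total_loss(student_logits, teacher_logits, target, kd_weight=0.5):
    """Return (total, focal, kd); total = focal + kd_weight * kd.

    `kd_weight` is a hypothetical hyperparameter, not a value from the paper.
    """
    focal = focal_loss(student_logits, target)
    kd = kd_loss(student_logits, teacher_logits)
    return focal + kd_weight * kd, focal, kd
```

Under these assumptions, logging `focal` and `kd_weight * kd` separately at each epoch would give the two layers of a stacked area chart like the one in panel b, and evaluating `warmup_decay_lr` over all steps would give a curve of the kind shown in panel a.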