Fig. 1: SleepXViT Overview.
From: Explainable vision transformer for automatic visual sleep staging on multimodal PSG signals

Raw PSG signal values are converted into standardized PSG images. The Intra-epoch ViT learns features from these images and includes a sleep stage classifier. The Inter-epoch ViT processes sequences of feature embeddings produced by the Intra-epoch ViT for l consecutive epochs, analyzing the relationships between epochs within each sequence to predict the sleep stage of every epoch. Beyond predicting sleep stages, SleepXViT explains its predictions in three ways: 1) a confidence score indicating the certainty of each prediction, 2) a heatmap-based visual explanation that highlights the parts of the PSG image most important to the decision, and 3) the impact of the multi-epoch sequence on the prediction.
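The data flow between the two stages can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the authors' implementation: it assumes that the per-epoch embeddings are grouped into stride-1 sliding windows of l consecutive epochs, that the confidence score is the maximum softmax probability, and that the classifier uses a five-class W/N1/N2/N3/REM label order. The caption specifies none of these details, and the names `make_sequences`, `softmax`, `predict_with_confidence`, and `STAGES` are invented for this sketch.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical 5-class label order (assumption; not specified in the caption).
STAGES = ["W", "N1", "N2", "N3", "REM"]


def make_sequences(embeddings: Sequence[Sequence[float]], seq_len: int) -> List[List[Sequence[float]]]:
    """Group per-epoch embeddings into windows of seq_len (l in the caption)
    consecutive epochs. Stride-1 sliding windows are an assumption."""
    if seq_len < 1 or seq_len > len(embeddings):
        raise ValueError("seq_len must be between 1 and the number of epochs")
    return [list(embeddings[i:i + seq_len]) for i in range(len(embeddings) - seq_len + 1)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert classifier logits to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the predicted stage and, as an assumed confidence score,
    the maximum softmax probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return STAGES[idx], probs[idx]


# Usage with dummy data: six epochs with 2-d embeddings, windows of l = 3.
epochs = [[0.1 * i, 0.2 * i] for i in range(6)]
windows = make_sequences(epochs, seq_len=3)
print(len(windows))  # 4 windows: epochs 0-2, 1-3, 2-4, 3-5
stage, conf = predict_with_confidence([0.0, 0.0, 2.0, 0.0, 0.0])
print(stage, f"{conf:.2f}")  # N2 0.65
```

With stride 1, most epochs fall into several windows, so an implementation would need a rule for combining overlapping per-epoch predictions. That choice is also an assumption here and is not described in the figure.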