Abstract
Distinguishing epileptic seizures from parasomnias is challenging due to overlapping motor features. This study evaluated a SlowFast deep learning model using video recordings of 167 individuals to classify Sleep-Related Hypermotor Epilepsy, Disorders of Arousal, and REM Sleep Behavior Disorder. The model achieved a mean accuracy of 83.3% across three data splits. This work represents an initial step toward developing automated tools to support clinicians in assessing sleep-related motor events.
Diagnosing sleep-related paroxysmal motor events accurately remains a significant clinical challenge, particularly when differentiating epileptic seizures from parasomnias1. Although these conditions are distinct in terms of underlying mechanisms, their external manifestations during sleep often share overlapping motor characteristics, leading to potential diagnostic confusion. Experienced clinicians rely on comprehensive clinical history, video-polysomnography, and prolonged video-EEG recordings to make accurate distinctions. However, these methods can be resource-intensive, time-consuming, and prone to variability between observers, particularly in borderline cases or in institutions lacking subspecialty expertise1.
The clinical overlap between disorders such as Sleep-Related Hypermotor Epilepsy (SHE), Disorders of Arousal (DOA), and REM Sleep Behavior Disorder (RBD) has been well documented. For example, episodes in both SHE and parasomnias may present with complex motor behaviors, including sudden arousals, limb movements, or vocalizations, complicating the diagnostic process2,3,4. This is particularly relevant in children and young adults, where semiologic differences can be subtle.
Recent advances in artificial intelligence have introduced the possibility of supporting this diagnostic process. Video-based action recognition methods have gained traction, leveraging deep learning to extract motion patterns from raw video data without the need for wearable sensors or external markers5. These approaches offer the potential to streamline diagnostic workflows, enhance reproducibility, and support clinicians, especially in environments lacking full neurophysiological monitoring5,6,7,8,9,10.
Building on our earlier pilot work, which highlighted that SHE and DOA could be distinguished using automated video classification5, we now extend this framework by incorporating REM Sleep Behavior Disorder (RBD) alongside SHE and DOA and leveraging a larger and more heterogeneous dataset. In this multicenter study, we analyzed a dataset comprising 253 annotated video recordings from 167 participants. The recordings were acquired under heterogeneous conditions, further reflecting real-world clinical variability. As an additional advancement over our previous work, we employed the SlowFast neural network architecture, which combines dual temporal resolution pathways to analyze both fast and slow visual cues, thereby capturing a wide range of motor patterns11. This approach was evaluated as a fully automated video-based classifier of SHE, DOA, and RBD. The complete overview of the workflow for this study is shown in Fig. 1.
A Sketch of the video acquisition setup. B Schematic representation of the SlowFast network, illustrating its dual-pathway design: the slow pathway processes a temporally down-sampled sequence to capture overall spatial context (height H, width W, time T, channels C), while the fast pathway operates at a higher temporal resolution to capture more rapid motion patterns. Features from both pathways are fused to generate the final classification output. C Test procedure, showing how the trained network provides the final classification (SHE/DOA/RBD) for each input video.
To determine the most effective model architecture for this classification task, we first benchmarked several leading video classification networks, including Temporal Segment Networks (TSN)12 and the R(2 + 1)D model13. However, these models achieved limited accuracy, generally around 50%, and were therefore deemed inadequate.
The SlowFast model11 was selected as the final architecture based on its performance. It was evaluated on three independently constructed data splits, in which no individual’s data appeared in more than one set (train/validation/test); in particular, each test set contained a single video per participant. All reported metrics therefore reflect patient-level classification performance. These partitions ensured that the evaluation was robust against overfitting and participant-specific bias.
As shown in Table 1, across the three data splits the model achieved a mean classification accuracy of 83% ± 3.6% (95% Wilson confidence interval: 73–90%), with consistently high performance in identifying SHE (mean F1 = 88%) and slightly lower but comparable performance for DOA (F1 = 79%) and RBD (F1 = 83%). The confusion matrix in Fig. 2 highlights this pattern, showing that most errors occurred between DOA and RBD, reflecting their clinical and motor overlap. Performance was most stable across splits for SHE (recall = 92%), while greater variability was observed for RBD (recall range 62–100%). In addition to these recall and F1 trends, the model achieved consistent overall specificity across splits (Split 1: 93.7%, Split 2: 91.7%, Split 3: 89.6%; overall: 91.7%), indicating stable performance in correctly rejecting non-target classes. A slight reduction in overall accuracy was observed in Split 3 (79%), mainly due to misclassifications between DOA and RBD, likely related to borderline or atypical examples within this subset. Both DOA and RBD can present with overlapping or ambiguous motor manifestations, particularly when dream-enactment-like or subtle motor behaviors occur. In Split 3, the test data included cases with greater variability in movement patterns: several DOA episodes displayed complex motor behaviors partly resembling RBD, while some RBD cases were characterized by limited or less distinctive activity. This heterogeneity likely contributed to the reduced discriminability between the two classes. Nevertheless, performance remained stable across the other splits, supporting the robustness of the proposed model despite interindividual variability in behavioral expression.
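The Wilson score interval quoted for the accuracy estimate follows the standard closed-form expression. A minimal sketch in pure Python is shown below; the counts used here (20 of 24 test videos correct, roughly 83%) are illustrative, since the exact evaluation count behind the paper’s 73–90% interval is not restated here.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Illustrative only: 20 of 24 test videos classified correctly (~83% accuracy).
lo, hi = wilson_interval(20, 24)
print(f"95% Wilson CI: {lo:.2f}-{hi:.2f}")
```

Unlike the normal-approximation interval, the Wilson interval stays within [0, 1] and remains informative at the small per-split sample sizes used here.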
Rows represent the actual classes, and columns represent the predicted classes. SHE: Sleep-Related Hypermotor Epilepsy, DOA: Disorders of Arousal, RBD: REM Sleep Behavior Disorder.
Two false-negative SHE cases were identified, both in Split 2: one from a very young participant misclassified as RBD, and another misclassified as DOA. The first involved brief myoclonic jerks resembling RBD-like twitches, while the second showed agitated movements followed by sitting up and an attempt to get out of bed, mimicking a confusional arousal or sleepwalking episode, as shown in Fig. 3. This qualitative visualization provides an example of overlap in motor patterns, such as partial arousals or complex motor sequences, that can lead to model confusion between epileptic and parasomnic events. Despite these challenges, the model demonstrated robust and generalizable performance across all splits, particularly in distinguishing SHE from parasomnias.
Representative anonymized frames show a Sleep-Related Hypermotor Epilepsy (SHE) episode characterized by slow, agitated movements followed by partial rising and an attempt to leave the bed.
To assess inter-center generalization, we conducted an additional experiment excluding all RBD videos from one of the centers during training and validation, with 8 of them randomly selected solely for testing. In this configuration, the model achieved an overall accuracy of 83% (20/24 videos correctly classified). All SHE cases were correctly identified, while two RBD episodes were misclassified as DOA, and two DOA episodes were misclassified, one as RBD and one as SHE. This finding indicates that the model retains good generalization capability when applied to data from an unseen clinical site.
This study highlights that deep learning, when applied to nocturnal video recordings, can offer a reliable, automated method for classifying three major categories of sleep-related motor disorders: SHE, DOA, and RBD. The application of the SlowFast architecture, with its dual temporal pathway design, was especially effective in extracting complex motor features spanning multiple time scales. Compared to other 3D CNNs tested, the SlowFast model delivered superior performance and generalizability.
One of the model’s strongest results was in the classification of SHE, which was consistently identified with high precision across all test splits. This is notable because SHE is often difficult to diagnose due to its behavioral overlap with parasomnias. The model’s accuracy in this regard underscores its potential role as a diagnostic aid, especially in cases where expert neurophysiologic interpretation may not be available. However, the model showed reduced accuracy in distinguishing between DOA and RBD, particularly in Split 3. This limitation mirrors clinical challenges, where these parasomnia types often require careful consideration of contextual factors such as sleep stage, age, comorbidities, or even associated vocalizations, none of which were available to the model in this study. This underscores the value of a multi-modal approach and highlights opportunities for future development.
To further explore model robustness across acquisition sites, we performed a complementary analysis excluding RBD recordings from one center (Bellaria Hospital in Bologna) during training and validation and using them only for testing. The model maintained satisfactory performance (83% accuracy), comparable to the overall performance across the three original splits, correctly identifying all SHE events. This result supports the potential generalizability of video-based models to unseen clinical environments, while emphasizing the need for more balanced multi-center datasets to fully assess inter-site performance. A complete leave-one-center-out analysis was not feasible at this stage due to the imbalanced distribution of participants across classes among centers, an aspect that we plan to address in future work. Nonetheless, our dataset, drawn from multiple sleep centers using heterogeneous recording protocols and equipment, provides a realistic and ecologically valid testbed for evaluating generalizability. The variability in video quality, lighting, and resolution adds robustness to our findings, suggesting that similar models could be deployed across diverse clinical settings without extensive recalibration.
Future work will expand the dataset to include additional paroxysmal events and continuous overnight recordings, enabling assessment of age-related variability, event detection performance, and false positive rates across entire nights. Additional data acquisition will also be necessary to balance the number of individuals per class across centers, ensuring a more even class distribution and allowing for a meaningful leave-one-center-out analysis. It will also be of interest to explore multimodal approaches: first by integrating audio signals to capture vocalizations, then by incorporating textual information such as demographic data and physicians’ reports, and ultimately by extending the analysis to include EEG recordings. The integration of these complementary data sources is expected to enhance the model’s accuracy and overall diagnostic reliability. Finally, future work should also investigate the impact of varying the dimensions of the two pathways in the SlowFast network on model accuracy. For prospective applications, automated anonymization and controlled access pipelines will be implemented to ensure data privacy and reproducibility across centers.
In summary, our findings represent a promising proof of concept that warrants prospective and on-site validation. When validated further, such tools could assist in triage, diagnosis, or longitudinal monitoring of people with suspected nocturnal motor events, reducing diagnostic delays and relieving the burden on expert clinical teams.
Methods
Dataset
This retrospective study was conducted using video recordings acquired from five centers: Niguarda Hospital and IRCCS San Raffaele Hospital in Milan, Giannina Gaslini Hospital in Genoa, the Neurocenter of Southern Switzerland in Lugano, and Bellaria Hospital in Bologna. Ethical approval was granted by the Niguarda Hospital ethics committee (ID 939–12.12.2013), and all participants or their guardians provided written informed consent to the use of their recordings for research purposes.
The dataset included 253 video clips from 167 participants: 73 diagnosed with SHE, 53 with DOA, and 41 with RBD. Recordings were acquired using a variety of video-polysomnographic setups and reflect wide heterogeneity in temporal resolution (24–30 frames per second), camera angle, lighting, and background. Event durations ranged from 3 to 138 s, with a mean duration ± standard deviation of 28 ± 22 s. Event annotation was performed independently by two experienced experts at each participating center. A third senior expert subsequently reviewed all annotated videos across centers to ensure inter-center consistency. Only unequivocal events from patients with confirmed diagnoses, based on comprehensive clinical, neurophysiological, neuroradiological (when needed), and follow-up data, were included. All annotations corresponded to diagnostically certain events, and full agreement was reached among raters; therefore, no formal inter-rater reliability statistics were computed.
Pre-processing
Only minimal preprocessing was applied: videos were resized to 224 × 224 pixels to match the SlowFast input size and kept at their original frame rates; each clip was then uniformly subsampled to 32 and 8 frames to feed the fast and slow pathways of the network, respectively, while preserving the original temporal dynamics. This minimal pipeline was intended to assess model robustness under heterogeneous recording conditions.
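The dual-rate subsampling described above can be sketched in a few lines. The fragment below is an illustration, not the authors’ code: it draws 32 evenly spaced frame indices for the fast pathway and keeps every fourth of them for the slow pathway, matching the 32/8 ratio stated in the text.

```python
import numpy as np

def sample_pathway_indices(n_frames: int, n_fast: int = 32, alpha: int = 4):
    """Spread n_fast indices uniformly over the clip for the fast pathway,
    then keep every alpha-th of them (n_fast / alpha = 8) for the slow pathway."""
    fast = np.linspace(0, n_frames - 1, n_fast).round().astype(int)
    slow = fast[::alpha]  # temporally strided subset -> 8 frames
    return fast, slow

# Example: a ~30 s clip recorded at 25 fps (750 frames).
fast, slow = sample_pathway_indices(n_frames=750)
```

Because indices are spread over the whole clip rather than taken from a fixed window, the sampling adapts to each event’s duration, consistent with the dynamic frame selection described in the Methods.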
Deep learning models
We treated the classification task as a multiclass action recognition problem14. Several deep learning architectures were evaluated. At first, the Temporal Segment Network (TSN)12 served as a baseline 2D CNN model, aggregating frame-level features over time to capture coarse motion dynamics. We next tested the R(2 + 1)D architecture13, a 3D convolutional model that decomposes spatiotemporal filters into separate spatial and temporal components, allowing finer motion modelling. Finally, due to poor performance from baseline models, we adopted the SlowFast network11, which employs two parallel pathways operating at different temporal resolutions: a slow branch for detailed spatial semantics and a fast branch for rapid motion cues. The slow pathway processed low-frequency spatial patterns (i.e., contextual clues) by sampling 8 temporally spaced frames, while the fast pathway handled 32 densely sampled frames to capture short-term motion. Frame selection was dynamically adapted to each video’s duration. The network was initialized with pretrained weights from the Kinetics-400 dataset15.
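As a shape-level illustration of the dual-pathway design (a sketch, not the authors’ implementation: real SlowFast uses time-strided convolutions for its lateral connections, and the channel sizes here are arbitrary), the fragment below shows how fast-pathway features can be aligned in time and concatenated onto the slow pathway before classification.

```python
import numpy as np

# Toy feature maps in (channels, time, height, width) layout.
T, alpha = 32, 4
slow_feat = np.random.rand(64, T // alpha, 7, 7)   # slow pathway: 8 time steps
fast_feat = np.random.rand(8, T, 7, 7)             # fast pathway: 32 time steps

# Lateral connection (sketch): stride the fast features in time so the two
# pathways align, then fuse by channel concatenation.
fast_aligned = fast_feat[:, ::alpha]                        # (8, 8, 7, 7)
fused = np.concatenate([slow_feat, fast_aligned], axis=0)   # (72, 8, 7, 7)

# Global average pooling; a linear head would then map this vector
# to the three classes (SHE / DOA / RBD).
pooled = fused.mean(axis=(1, 2, 3))                         # (72,)
```

The key design point is that the fast pathway can stay lightweight (few channels, many frames) while the slow pathway carries most of the spatial semantics, so fusing the two is cheap relative to running a single high-frame-rate 3D CNN.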
Data splitting strategy
To ensure unbiased generalization, we adopted a three-split cross-validation design at the participant level. For each split, data were divided into training, validation, and test sets, ensuring that no participant contributed to more than one set. As shown in Table 2, the training set comprised 119 participants (205 videos), with each individual contributing 1–4 recordings (median = 1), while the validation and test sets each contained 8 participants (8 unique videos), with one video per participant and no repetition of clips across or within splits. This approach produced three independent train–validation–test configurations, each with a distinct validation and test cohort, allowing us to evaluate model stability and generalization across different participant compositions. Hyperparameters were tuned using the validation performance, while the final classification accuracy was computed as the average across the three test splits. To mitigate class imbalance, we employed a class-weighted focal loss during training.
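A class-weighted focal loss of the kind mentioned above can be written compactly. The sketch below uses NumPy with illustrative weights and focusing parameter (the study’s actual values are not stated); with gamma = 0 and unit weights it reduces to plain cross-entropy, which is a useful sanity check.

```python
import numpy as np

def focal_loss(probs: np.ndarray, labels: np.ndarray,
               class_weights: np.ndarray, gamma: float = 2.0) -> float:
    """Mean class-weighted focal loss.
    probs:  (N, C) predicted class probabilities (rows sum to 1)
    labels: (N,) integer class labels
    """
    p_t = probs[np.arange(len(labels)), labels]  # probability of the true class
    w = class_weights[labels]                    # per-sample class weight
    # The (1 - p_t)^gamma factor down-weights easy, well-classified samples,
    # focusing the gradient on hard or minority-class examples.
    return float(np.mean(-w * (1 - p_t) ** gamma * np.log(p_t)))

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
weights = np.ones(3)
ce = focal_loss(probs, labels, weights, gamma=0.0)   # = cross-entropy
fl = focal_loss(probs, labels, weights, gamma=2.0)   # down-weighted
```

In practice the class weights would be set inversely proportional to class frequency in the training set, so that the under-represented RBD class contributes comparably to the loss.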
To further assess inter-center generalization, we conducted an additional preliminary leave-one-center-out (LOCO) experiment. Although a full LOCO analysis is not feasible at this stage due to the strong imbalance in class distributions across centers (an aspect we plan to address in future work), we nonetheless performed a targeted experiment to obtain an initial indication of cross-site generalizability. In this experiment, all RBD videos from one center (Bellaria Hospital in Bologna, 14 clips from 14 unique participants) were excluded from training and validation, with 8 of them randomly selected and used exclusively for testing. This configuration simulated a previously unseen acquisition environment, providing an independent evaluation of the model’s robustness across clinical sites.
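The targeted hold-out described above amounts to filtering clip metadata by center and class. A schematic sketch in pure Python is shown below; the record fields and the tiny clip list are hypothetical, and only the selection logic mirrors the experiment.

```python
import random

# Hypothetical clip metadata: (participant_id, center, label).
clips = [("p01", "Bologna", "RBD"), ("p02", "Bologna", "RBD"),
         ("p03", "Milan", "SHE"), ("p04", "Genoa", "DOA"),
         ("p05", "Bologna", "DOA")]

# Exclude the target center/class pair from training and validation entirely.
held_out = [c for c in clips if c[1] == "Bologna" and c[2] == "RBD"]
train_pool = [c for c in clips if c not in held_out]

# The study sampled 8 of the 14 held-out RBD clips for testing; here we
# sample min(len, 8) from the toy list for illustration.
rng = random.Random(0)
test_rbd = rng.sample(held_out, min(len(held_out), 8))
```

Holding out an entire center-class combination, rather than random clips, is what makes this a test of cross-site generalization: no recording from that acquisition environment can leak into training.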
Data availability
Raw video data cannot be shared publicly due to regulations imposed by the Ethical Committee. Weights of the trained deep learning architectures and the code can be made available upon reasonable request to the corresponding author (matteo.moro@unige.it).
Code availability
Weights of the trained deep learning architectures and the code can be made available upon reasonable request to the corresponding author (matteo.moro@unige.it).
References
Montini, A., Loddo, G., Baldelli, L., Cilea, R. & Provini, F. Sleep-related hypermotor epilepsy vs disorders of arousal in adults: a step-wise approach to diagnosis. Chest 160, 319–329 (2021).
Vignatelli, L. et al. Interobserver reliability of video recording in the diagnosis of nocturnal frontal lobe seizures. Epilepsia 48, 1506–1511 (2007).
Derry, C. P., Harvey, A. S., Walker, M. C., Duncan, J. S. & Berkovic, S. F. NREM arousal parasomnias and their distinction from nocturnal frontal lobe epilepsy: a video EEG analysis. Sleep 32, 1637–1644 (2009).
Loddo, G. et al. Seizures with paroxysmal arousals in sleep-related hypermotor epilepsy (SHE): dissecting epilepsy from NREM parasomnias. Epilepsia 61, 2194–2202 (2020).
Moro, M. et al. Automatic video analysis and classification of sleep-related hypermotor seizures and disorders of arousal. Epilepsia 64, 1653–1662 (2023).
Ahmedt-Aristizabal, D. et al. Understanding patients’ behavior: vision-based analysis of seizure disorders. IEEE J. Biomed. Health Inform. 23, 2583–2591 (2019).
Ahmedt-Aristizabal, D. et al. Automated analysis of seizure semiology and brain electrical activity in presurgery evaluation of epilepsy: a focused survey. Epilepsia 58, 1817–1831 (2017).
Abbasi, B. & Goldenholz, D. M. Machine learning applications in epilepsy. Epilepsia 60, 2037–2047 (2019).
Karácsony, T. et al. Novel 3D video action recognition deep learning approach for near real time epileptic seizure classification. Sci. Rep. 12, 19571 (2022).
Boyne, A. et al. Video-based detection of tonic–clonic seizures using a three-dimensional convolutional neural network. Epilepsia 66, 2495–2506 (2025).
Feichtenhofer, C., Fan, H., Malik, J. & He, K. SlowFast networks for video recognition. In Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 6201–6210 (2019).
Wang, L. et al. Temporal segment networks: towards good practices for deep action recognition. In Eur. Conf. Comput. Vis. (ECCV), 20–36 (2016).
Tran, D. et al. A closer look at spatiotemporal convolutions for action recognition. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 6450–6459 (2018).
Carreira, J. & Zisserman, A. Quo Vadis, action recognition? A new model and the Kinetics dataset. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 4724–4733 (2017).
Kay, W. et al. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950 https://doi.org/10.48550/arXiv.1705.06950 (2017).
Acknowledgements
This work was supported by the European Union — NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, projects “RAISE — Robotics and AI for Socio-economic Empowerment” (ECS00000035) and “MNESYS — A Multiscale Integrated Approach to the Study of the Nervous System in Health and Disease” (PE0000006; DN. 1553 11.10.2022) and by the Italian Ministry of Health, 5 x 1000 project 2017, “SOLAR: Sleep disorders in children: an innovative clinical research perspective”.
The VISTA (VIdeo-baSed identificaTion of pArasomnias and seizures) group is an interdisciplinary, multi-centered research team focused on the development of video-based methodologies for the detection and analysis of motor manifestations in sleep and neurological disorders. Members: Lino Nobili, Matteo Moro, Federica Sassi, Ramona Cordani, Anna Castelnovo, Mauro Manconi, Paola Proserpio, Laura Tassi, Federica Provini, Francesca Odone, Maura Casadio, Pietro Mattioli, Dario Arnaldi, Valentina Marazzotta, Marco Veneruso, Luca Baldelli, Greta Mainieri, Stefano Francione, Luca Bosisio, Alessandro Consales.
Author information
Contributions
Matteo M. and F.S. contributed equally to the analysis of data, to the methodology, and to drafting the text. R.C. and A.C. contributed to the data acquisition and to drafting the text. Mauro M., P.P., L.T., and F.P. contributed to the data acquisition. F.O. contributed to the methodology. M.C. and L.N. contributed to the conceptualization of the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Moro, M., Sassi, F., Cordani, R. et al. Automated video-based differentiation of sleep-related hypermotor epilepsy and parasomnia episodes. npj Digit. Med. 9, 144 (2026). https://doi.org/10.1038/s41746-025-02326-2