Diagnosing sleep-related paroxysmal motor events accurately remains a significant clinical challenge, particularly when differentiating epileptic seizures from parasomnias1. Although these conditions are distinct in terms of underlying mechanisms, their external manifestations during sleep often share overlapping motor characteristics, leading to potential diagnostic confusion. Experienced clinicians rely on comprehensive clinical history, video-polysomnography, and prolonged video-EEG recordings to make accurate distinctions. However, these methods can be resource-intensive, time-consuming, and prone to variability between observers, particularly in borderline cases or in institutions lacking subspecialty expertise1.

The clinical overlap between disorders such as Sleep-Related Hypermotor Epilepsy (SHE), Disorders of Arousal (DOA), and REM Sleep Behavior Disorder (RBD) has been well documented. For example, episodes in both SHE and parasomnias may present with complex motor behaviors, including sudden arousals, limb movements, or vocalizations, complicating the diagnostic process2,3,4. This is particularly relevant in children and young adults, where semiologic differences can be subtle.

Recent advances in artificial intelligence have introduced the possibility of supporting this diagnostic process. Video-based action recognition methods have gained traction, leveraging deep learning to extract motion patterns from raw video data without the need for wearable sensors or external markers5. These approaches offer the potential to streamline diagnostic workflows, enhance reproducibility, and support clinicians, especially in environments lacking full neurophysiological monitoring5,6,7,8,9,10.

Building on our earlier pilot work, which showed that SHE and DOA could be distinguished using automated video classification5, we now extend this framework by incorporating RBD alongside SHE and DOA and by leveraging a larger and more heterogeneous dataset. In this multicenter study, we analyzed 253 annotated video recordings from 167 participants. The recordings were acquired under heterogeneous conditions, reflecting real-world clinical variability. As a further advance over our previous work, we employed the SlowFast neural network architecture, which combines dual temporal-resolution pathways to analyze both fast and slow visual cues, thereby capturing a wide range of motor patterns11. This approach was evaluated as a fully automated video-based classifier of SHE, DOA, and RBD. A complete overview of the study workflow is shown in Fig. 1.

Fig. 1: Overview of the workflow.

A Sketch of the video acquisition setup. B Schematic representation of the SlowFast network, illustrating its dual-pathway design: the slow pathway processes a temporally down-sampled sequence to capture overall spatial context (height H, width W, time T, channels C), while the fast pathway operates at a higher temporal resolution to capture more rapid motion patterns. Features from both pathways are fused to generate the final classification output. C Test procedure, showing how the trained network provides the final classification (SHE/DOA/RBD) for each input video.

To determine the most effective model architecture for this classification task, we first benchmarked several leading video action-recognition networks, including Temporal Segment Networks (TSN)12 and the R(2 + 1)D model13. However, these models demonstrated limited accuracy, generally around 50%, and were therefore deemed inadequate.

The SlowFast model11 was selected as the optimal solution based on its performance. It was tested across three independently constructed data splits, in which no individual’s data appeared in more than one set (train/validation/test); in particular, each test set contained a single video per participant. All reported metrics therefore reflect patient-level classification performance. These partitions ensured that the evaluation was robust against overfitting and participant-specific bias.
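
The participant-level partitioning described above can be sketched with a small standard-library helper. This is an illustrative reconstruction, not the authors' code: the function name, the ID-to-clip mapping, and the seed are hypothetical, but the invariant it enforces (each participant wholly assigned to one set, with a single clip per participant in validation and test) matches the split design described here.

```python
import random

def participant_level_split(videos_by_participant, n_val=8, n_test=8, seed=0):
    """Partition videos so that each participant appears in exactly one set.

    `videos_by_participant` maps a participant ID to that person's clips.
    Validation and test each receive `n_val` / `n_test` participants with
    one clip per participant; all remaining participants (and all of their
    clips) go to training.
    """
    rng = random.Random(seed)
    ids = sorted(videos_by_participant)
    rng.shuffle(ids)
    test_ids = ids[:n_test]
    val_ids = ids[n_test:n_test + n_val]
    train_ids = ids[n_test + n_val:]
    return {
        # One clip per participant in val/test, as in the splits above.
        "test": [videos_by_participant[i][0] for i in test_ids],
        "val": [videos_by_participant[i][0] for i in val_ids],
        # Training keeps every clip from its participants.
        "train": [v for i in train_ids for v in videos_by_participant[i]],
    }
```

A split produced this way can be checked by mapping each clip back to its participant and verifying that the three sets of participant IDs are pairwise disjoint.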

As can be seen in Table 1, across the three validation splits the model achieved a mean classification accuracy of 83% ± 3.6% (95% Wilson confidence interval: 73–90%), with consistently high performance in identifying SHE (mean F1 = 88%) and slightly lower but comparable performance for DOA (F1 = 79%) and RBD (F1 = 83%). The confusion matrix in Fig. 2 highlights this pattern, showing that most errors occurred between DOA and RBD, reflecting their clinical and motor overlap. Performance was most stable across splits for SHE (recall = 92%), while greater variability was observed for RBD (recall range 62–100%). In addition to the recall and F1 trends, the model achieved consistent overall specificity across splits (Split 1: 93.7%, Split 2: 91.7%, Split 3: 89.6%; overall: 91.7%), indicating stable performance in correctly rejecting non-target classes.

A slight reduction in overall accuracy was observed in Split 3 (79%), mainly due to misclassifications between DOA and RBD, likely related to borderline or atypical examples within this subset. Both DOA and RBD can present with overlapping or ambiguous motor manifestations, particularly when dream-enactment-like or subtle motor behaviors occur. In Split 3, the test data included cases with greater variability in movement patterns: several DOA episodes displayed complex motor behaviors partly resembling RBD, while some RBD cases were characterized by limited or less distinctive activity. This heterogeneity likely contributed to the reduced discriminability between the two classes. Nevertheless, performance remained stable across the other splits, supporting the robustness of the proposed model despite interindividual variability in behavioral expression.

Fig. 2: General confusion matrix combining the results of all three splits.

Rows represent the actual classes, and columns represent the predicted classes. SHE: Sleep-Related Hypermotor Epilepsy, DOA: Disorders of Arousal, RBD: REM Sleep Behavior Disorder.

Table 1 Precision (P), Recall (R) and F1-score for each diagnostic group (Sleep-Related Hypermotor Epilepsy (SHE), Disorders of Arousal (DOA), and REM Sleep Behavior Disorder (RBD)) in each split and overall

Two false-negative SHE cases were identified, both in Split 2: one from a very young participant misclassified as RBD, and another misclassified as DOA. The first involved brief myoclonic jerks resembling RBD-like twitches, while the second showed agitated movements followed by sitting up and an attempt to get out of bed, mimicking a confusional arousal or sleepwalking episode, as shown in Fig. 3. This qualitative visualization provides an example of overlap in motor patterns, such as partial arousals or complex motor sequences, that can lead to model confusion between epileptic and parasomnic events. Despite these challenges, the model demonstrated robust and generalizable performance across all splits, particularly in distinguishing SHE from parasomnias.

Fig. 3: Example of a misclassified event (SHE predicted as DOA).

Representative anonymized frames show a Sleep-Related Hypermotor Epilepsy (SHE) episode characterized by slow, agitated movements followed by partial rising and an attempt to leave the bed.

To assess inter-center generalization, we conducted an additional experiment excluding all RBD videos from one of the centers during training and validation, with 8 of them randomly selected and used solely for testing. In this configuration, the model achieved an overall accuracy of 83% (20/24 videos correctly classified). All SHE cases were correctly identified, while two RBD episodes were misclassified as DOA and two DOA episodes were misclassified (one as RBD, one as SHE). This finding indicates that the model retains good generalization capability when applied to data from an unseen clinical site.

This study highlights that deep learning, when applied to nocturnal video recordings, can offer a reliable, automated method for classifying three major categories of sleep-related motor disorders: SHE, DOA, and RBD. The application of the SlowFast architecture, with its dual temporal pathway design, was especially effective in extracting complex motor features spanning multiple time scales. Compared to other 3D CNNs tested, the SlowFast model delivered superior performance and generalizability.

One of the model’s strongest results was in the classification of SHE, which was consistently identified with high precision across all test splits. This is notable because SHE is often difficult to diagnose due to its behavioral overlap with parasomnias. The model’s accuracy in this regard underscores its potential role as a diagnostic aid, especially in cases where expert neurophysiologic interpretation may not be available. However, the model showed reduced accuracy in distinguishing between DOA and RBD, particularly in Split 3. This limitation mirrors clinical challenges, where these parasomnia types often require careful consideration of contextual factors such as sleep stage, age, comorbidities, or even associated vocalizations, none of which were available to the model in this study. This underscores the value of a multi-modal approach and highlights opportunities for future development.

To further explore model robustness across acquisition sites, we performed a complementary analysis excluding RBD recordings from one center (Bellaria Hospital in Bologna) during training and validation and using them only at test time. The model maintained satisfactory performance (83% accuracy), comparable to the overall performance of the three original splits, correctly identifying all SHE events. This result supports the potential generalizability of video-based models to unseen clinical environments, while emphasizing the need for more balanced multi-center datasets to fully assess inter-site performance. A complete leave-one-center-out analysis was not feasible at this stage due to the imbalanced distribution of participants across classes among centers, an aspect that we plan to address in future work. Nonetheless, our dataset, drawn from multiple sleep centers using heterogeneous recording protocols and equipment, provides a realistic and ecologically valid testbed for evaluating generalizability. The variability in video quality, lighting, and resolution adds robustness to our findings, suggesting that similar models could be deployed across diverse clinical settings without extensive recalibration.

Future work will expand the dataset to include additional paroxysmal events and continuous overnight recordings, enabling assessment of age-related variability, event detection performance, and false positive rates across entire nights. Additional data acquisition will also be necessary to balance the number of individuals per class across centers, ensuring a more even class distribution and allowing for a meaningful leave-one-center-out analysis. It will also be of interest to explore multimodal approaches: first by integrating audio signals to capture vocalizations, then by incorporating textual information such as demographic data and physicians’ reports, and ultimately by extending the analysis to include EEG recordings. The integration of these complementary data sources is expected to enhance the model’s accuracy and overall diagnostic reliability. Finally, future work should also investigate the impact of varying the dimensions of the two pathways in the SlowFast network on model accuracy. For prospective applications, automated anonymization and controlled access pipelines will be implemented to ensure data privacy and reproducibility across centers.

In summary, our findings represent a promising proof of concept that warrants prospective and on-site validation. When validated further, such tools could assist in triage, diagnosis, or longitudinal monitoring of people with suspected nocturnal motor events, reducing diagnostic delays and relieving the burden on expert clinical teams.

Methods

Dataset

This retrospective study was conducted using video recordings acquired from five centers: Niguarda Hospital and IRCCS San Raffaele Hospital in Milan, Giannina Gaslini Hospital in Genoa, the Neurocenter of Southern Switzerland in Lugano, and Bellaria Hospital in Bologna. Ethical approval was granted by the Niguarda Hospital ethics committee (ID 939–12.12.2013), and all participants or their guardians provided written informed consent to the use of their recordings for research purposes.

The dataset included 253 video clips from 167 participants: 73 diagnosed with SHE, 53 with DOA, and 41 with RBD. Recordings were acquired using a variety of video-polysomnographic setups and reflect wide heterogeneity in temporal resolution (24–30 frames per second), camera angle, lighting, and background. Event durations ranged from 3 to 138 s, with a mean duration ± standard deviation of 28 ± 22 s. Event annotation was performed independently by two experienced experts at each participating center. A third senior expert subsequently reviewed all annotated videos across centers to ensure inter-center consistency. Only unequivocal events from patients with confirmed diagnoses, based on comprehensive clinical, neurophysiological, neuroradiological (when needed), and follow-up data, were included. All annotations corresponded to diagnostically certain events, and full agreement was reached among raters; therefore, no formal inter-rater reliability statistics were computed.

Pre-processing

Only minimal preprocessing was applied: videos were resized to 224 × 224 pixels to match the SlowFast input size, kept at their original frame rates, and uniformly subsampled to 32 and 8 frames to feed, respectively, the fast and slow pathways of the SlowFast network, while preserving the original temporal dynamics. This approach was intended to assess model robustness under heterogeneous recording conditions.
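
The subsampling step can be sketched as follows. This is a simplified illustration under our reading of the setup, not the authors' implementation: it picks 32 evenly spaced frame indices for the fast pathway and keeps every fourth of those for the slow pathway, so both pathways span the same clip at different temporal rates.

```python
def pathway_indices(num_frames, fast_len=32, slow_len=8):
    """Uniformly spaced frame indices for the two SlowFast input pathways.

    The fast pathway receives `fast_len` frames spread over the whole clip;
    the slow pathway keeps every (fast_len // slow_len)-th of them, so both
    cover the same time window at different sampling rates.
    """
    fast = [min(round(i * (num_frames - 1) / (fast_len - 1)), num_frames - 1)
            for i in range(fast_len)]
    slow = fast[:: fast_len // slow_len]
    return slow, fast
```

For example, a 30 s clip at 24 fps (720 frames) yields 32 fast indices from frame 0 to frame 719, with the slow pathway reading every fourth of them.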

Deep learning models

We treated the classification task as a multiclass action recognition problem14. Several deep learning architectures were evaluated. At first, the Temporal Segment Network (TSN)12 served as a baseline 2D CNN model, aggregating frame-level features over time to capture coarse motion dynamics. We next tested the R(2 + 1)D architecture13, a 3D convolutional model that decomposes spatiotemporal filters into separate spatial and temporal components, allowing finer motion modelling. Finally, given the poor performance of these baseline models, we adopted the SlowFast network11, which employs two parallel pathways operating at different temporal resolutions: a slow branch for detailed spatial semantics and a fast branch for rapid motion cues. The slow pathway processed low-frequency spatial patterns (i.e., contextual cues) by sampling 8 temporally spaced frames, while the fast pathway handled 32 densely sampled frames to capture short-term motion. Frame selection was dynamically adapted to each video’s duration. The network was initialized with pretrained weights from the Kinetics-400 dataset15.

Data splitting strategy

To ensure unbiased generalization, we adopted a three-split cross-validation design at the participant level. For each split, data were divided into training, validation, and test sets, ensuring that no participant contributed to more than one set. As detailed in Table 2, the training set comprised 119 participants (205 videos), with each individual contributing between 1 and 4 recordings (median = 1, range = 1–4), while the validation and test sets contained 8 participants each (8 unique videos), with one video per participant and no repetition of clips across or within splits. This approach produced three independent train–validation–test configurations, each with a distinct validation and test cohort, allowing us to evaluate model stability and generalization across different participant compositions. Hyperparameters were tuned using validation performance, while the final classification accuracy was computed as the average across the three test splits. To mitigate class imbalance, we employed a class-weighted focal loss during training.
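
The class-weighted focal loss combines a per-class weight with focal down-weighting of easy examples. A minimal scalar version of the standard formulation is sketched below; the default γ and the unit class weight are illustrative, not the values used in training.

```python
from math import log

def focal_loss(prob_true_class, class_weight=1.0, gamma=2.0):
    """Class-weighted focal loss for a single example.

    `prob_true_class` is the model's softmax probability for the correct
    class. The (1 - p)^gamma factor shrinks the loss contribution of
    well-classified examples, while `class_weight` counteracts class
    imbalance by up-weighting under-represented classes.
    """
    # Clamp to avoid log(0) for saturated predictions.
    p = min(max(prob_true_class, 1e-7), 1 - 1e-7)
    return -class_weight * (1 - p) ** gamma * log(p)
```

With γ = 0 the expression reduces to weighted cross-entropy; with γ > 0, a confidently correct prediction (p = 0.9) contributes far less loss than an uncertain one (p = 0.5), which keeps training focused on the harder minority-class episodes.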

Table 2 Demographic and clinical characteristics of the study cohort

To further assess inter-center generalization, we conducted an additional preliminary leave-one-center-out (LOCO) experiment. Although a full LOCO analysis is not feasible at this stage due to the strong imbalance in class distributions across centers (an aspect we plan to address in future work), we nonetheless performed a targeted experiment to obtain an initial indication of cross-site generalizability. In this experiment, all RBD videos from one center (Bellaria Hospital in Bologna, 14 clips from 14 unique participants) were excluded from training and validation, with 8 of them randomly selected and used exclusively for testing. This configuration simulated a previously unseen acquisition environment, providing an independent evaluation of the model’s robustness across clinical sites.