Abstract
The translation of automated seizure detection from controlled clinical units to real-world settings is hindered by heterogeneous recording conditions and limited expert monitoring. We introduce EpiVLM, a multimodal vision–language system that combines clinically structured prompts with video reasoning for cross-environment seizure monitoring. We evaluated EpiVLM on a diverse dataset of 232 video recordings from 127 patients, comprising 11,666 expert-annotated segments drawn from two tertiary centers, unconstrained home recordings, and an independent public dataset. EpiVLM recognized five major semiologies with accuracy of 0.795–0.947 and sensitivity of 0.842–0.957. With prompts and decision thresholds fixed a priori, performance remained consistent across diverse real-world acquisition conditions without site-specific recalibration. In external validation sets, EpiVLM sustained strong recognition while maintaining low video-level false detection rates (0.47–2.45%) and timely detection (mean onset-to-detection delay <6 s). EpiVLM also outperformed standard video deep-learning baselines overall. These results support scalable seizure recognition from routine video and motivate prospective evaluation for remote outcome monitoring.
Acknowledgements
The authors thank the clinical collaborators and research assistants who contributed to data curation, interpretation, and manuscript preparation. L.C. was supported by the Brain Science and Brain-like Intelligence Technology-National Science and Technology Major Project (2021ZD0204300), the Sichuan Science and Technology Program (2025NSFTD0027), and the 1.3.5 project for disciplines of excellence, West China Hospital, Sichuan University (ZYYC23011). P.W. was supported by the Shenzhen Science and Technology Innovation Committee (JCYJ20220818100213029).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
He, M., Sha, L., Tang, G. et al. Towards generalizable seizure monitoring: EpiVLM for cross-environment detection and classification. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02810-3
DOI: https://doi.org/10.1038/s41746-026-02810-3


