Abstract
Sleep decoding is key to revealing sleep architecture and its links to health, yet prevailing deep-learning models rely on supervised, task-specific designs and dual encoders that isolate time-domain and frequency-domain information, limiting generalizability and scalability. We introduce SleepGPT, a time-frequency foundation model for sleep decoding based on a generative pretrained transformer, developed with a multi-pretext pretraining strategy on 86,335 hours of polysomnography (PSG) from 8,377 subjects. SleepGPT includes a channel-adaptive mechanism for variable channel configurations and a unified time-frequency fusion module that enables deep cross-domain interaction. Evaluations across diverse PSG datasets demonstrate that SleepGPT sets a new benchmark for sleep decoding, achieving superior performance in sleep staging, sleep-related pathology classification, sleep data generation, and sleep spindle detection. Moreover, it reveals channel- and stage-specific physiological patterns underlying sleep decoding. In sum, SleepGPT is an all-in-one method with exceptional generalizability and scalability, offering transformative potential for addressing sleep decoding challenges.
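The unified time-frequency fusion described above can be illustrated with a minimal sketch: a single PSG epoch is tokenized twice, once as raw time-domain patches and once as spectrogram frames, and the two token streams are embedded into a shared space and concatenated into one sequence, so that a downstream transformer's self-attention can mix the domains directly. This is not the authors' implementation (see the GitHub repository for that); the window length, patch size, and embedding dimension below are illustrative assumptions, and random projections stand in for learned embedding layers.

```python
import numpy as np

def stft_mag(x, win=256, hop=128):
    """Magnitude spectrogram via a sliding Hann-window FFT."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))  # shape (T, F)

rng = np.random.default_rng(0)
fs = 100                                   # sampling rate (Hz), assumed
epoch = rng.standard_normal(30 * fs)       # one 30-s single-channel epoch

spec = stft_mag(epoch)                     # frequency-domain view
time_patches = epoch.reshape(-1, fs)       # 1-s time-domain patches

d = 64                                     # shared embedding dim, assumed
W_t = rng.standard_normal((time_patches.shape[1], d)) / np.sqrt(fs)
W_f = rng.standard_normal((spec.shape[1], d)) / np.sqrt(spec.shape[1])

tok_t = time_patches @ W_t                 # time-domain tokens
tok_f = spec @ W_f                         # frequency-domain tokens

# Unified fusion: a single joint token sequence, so one encoder's
# self-attention can attend across both domains (rather than two
# isolated encoders fused only at the output).
tokens = np.concatenate([tok_t, tok_f], axis=0)
print(tokens.shape)                        # (30 + 22, 64) = (52, 64)
```

The design point the sketch makes concrete is the contrast drawn in the abstract: dual-encoder models process `tok_t` and `tok_f` separately and merge late, whereas a unified sequence lets every layer exchange information between domains.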
Data availability
All databases used in this study are publicly available databases. The CAP database is available at https://physionet.org/content/capslpdb/1.0.0/. The MASS database is available at http://ceams-carsm.ca/mass/. The PhysioNet2018 database is available at https://physionet.org/content/challenge-2018/1.0.0/. Access to SHHS can be requested at https://sleepdata.org/datasets/shhs/. The Sleep-EDF database is available at https://physionet.org/content/sleep-edfx/1.0.0/. The SleepEEGfMRI database is available from the last author upon request, accompanied by a short description of the project, the reason, and the intended use of the data. The UMS database is available from Dr. Hanrong Cheng upon request, accompanied by a short description of the project, the reason, and the intended use of the data. The pretrained model checkpoint is provided on Figshare at https://doi.org/10.6084/m9.figshare.30626870. Source data are provided with this paper.
Code availability
The code supporting the conclusions of this study is available on GitHub at https://github.com/LordXX505/SleepGPT and in the Zenodo repository78 (https://doi.org/10.5281/zenodo.17432722). This repository contains the SleepGPT environment configuration, pretraining and fine-tuning code, as well as scripts for weight visualization and multi-task testing.
References
Xie, L. et al. Sleep drives metabolite clearance from the adult brain. Science 342, 373–377 (2013).
Krause, A. J. et al. The sleep-deprived human brain. Nat. Rev. Neurosci. 18, 404–418 (2017).
Horikawa, T., Tamaki, M., Miyawaki, Y. & Kamitani, Y. Neural decoding of visual imagery during sleep. Science 340, 639–642 (2013).
Schönauer, M. et al. Decoding material-specific memory reprocessing during sleep in humans. Nat. Commun. 8, 15404 (2017).
Yin, Z. et al. Generalized sleep decoding with basal ganglia signals in multiple movement disorders. NPJ Digit. Med. 7, 122 (2024).
Phan, H. et al. XSleepNet: multi-view sequential model for automatic sleep staging. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5903–5915 (2021).
Perslev, M. et al. U-sleep: resilient high-frequency sleep staging. NPJ Digit. Med. 4, 72 (2021).
Perslev, M., Jensen, M., Darkner, S., Jennum, P. J. & Igel, C. U-time: a fully convolutional network for time series segmentation applied to sleep staging. Adv. Neural Inf. Process. Syst. 32, 4392–4403 (2019).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning 1597–1607 (PMLR, 2020).
Chen, X., Xie, S. & He, K. An empirical study of training self-supervised vision transformers. In Proc. of the IEEE/CVF International Conference on Computer Vision 9640–9649 (IEEE, 2021).
Bao, H., Dong, L., Piao, S. & Wei, F. BEiT: bert pre-training of image transformers. In International Conference on Learning Representations (2021).
Mohamed, A. et al. Self-supervised speech representation learning: a review. IEEE J. Sel. Top. Signal Process. 16, 1179–1210 (2022).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Pai, S. et al. Foundation model for cancer imaging biomarkers. Nat. Mach. Intell. 6, 354–367 (2024).
Feng, B. et al. A bioactivity foundation model using pairwise meta-learning. Nat. Mach. Intell. 6, 962–974 (2024).
Cui, H. et al. ScGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).
Huang, K. et al. A foundation model for clinician-centered drug repurposing. Nat. Med. 30, 3601–3613 (2024).
Hanna, J. & Flöel, A. An accessible and versatile deep learning-based sleep stage classifier. Front. Neuroinform. 17, 1086634 (2023).
Fiorillo, L. et al. U-sleep’s resilience to AASM guidelines. NPJ Digit. Med. 6, 33 (2023).
Zapata, I. A., Wen, P., Jones, E., Fjaagesund, S. & Li, Y. Automatic sleep spindles identification and classification with multitapers and convolution. Sleep 47, zsad159 (2024).
Zhang, Z., Lin, B.-S., Peng, C.-W. & Lin, B.-S. Multi-modal sleep stage classification with two-stream encoder-decoder. IEEE Trans. Neural Syst. Rehabil. Eng. 32, 2096–2105 (2024).
Zou, B. et al. A multi-modal deep language model for contaminant removal from metagenome-assembled genomes. Nat. Mach. Intell. 6, 1245–1255 (2024).
Yang, M. et al. Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale. Nat. Mach. Intell. 4, 696–709 (2022).
He, K. et al. Masked autoencoders are scalable vision learners. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 (2022).
Chen, D., Liu, J. & Wei, G.-W. Multiscale topology-enabled structure-to-sequence transformer for protein–ligand interaction predictions. Nat. Mach. Intell. 6, 799–810 (2024).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Bao, H. et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts. Adv. Neural Inf. Process. Syst. 35, 32897–32912 (2022).
Ghassemi, M. et al. You snooze, you win: the PhysioNet/Computing in Cardiology Challenge 2018. In 2018 Computing in Cardiology Conference (CinC) 45, 1–4 (IEEE, 2018).
Zhang, G.-Q. et al. The National Sleep Research Resource: towards a sleep data commons. J. Am. Med. Inform. Assoc. 25, 1351–1358 (2018).
Quan, S. F. et al. The sleep heart health study: design, rationale, and methods. Sleep 20, 1077–1085 (1997).
Liu, J. et al. State-dependent and region-specific alterations of cerebellar connectivity across stable human wakefulness and NREM sleep states. NeuroImage 266, 119823 (2023).
Zou, G., Liu, J., Zou, Q. & Gao, J.-H. A-pass: an automated pipeline to analyze simultaneously acquired EEG-fMRI data for studying brain activities during sleep. J. Neural Eng. 19, 046031 (2022).
Terzano, M. G. et al. Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep. Sleep Med. 2, 537–554 (2001).
O’Reilly, C., Gosselin, N., Carrier, J. & Nielsen, T. Montreal archive of sleep studies: an open-access resource for instrument benchmarking and exploratory research. J. Sleep Res. 23, 628–635 (2014).
Kemp, B., Zwinderman, A. H., Tuk, B., Kamphuisen, H. A. C. & Oberye, J. J. L. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Trans. Biomed. Eng. 47, 1185–1194 (2000).
Chen, X. et al. Validation of a wearable forehead sleep recorder against polysomnography in sleep staging and desaturation events in a clinical sample. J. Clin. Sleep Med. 19, 711–718 (2023).
Berry, R. B. et al. Rules for scoring respiratory events in sleep: update of the 2007 aasm manual for the scoring of sleep and associated events: deliberations of the sleep apnea definitions task force of the american academy of sleep medicine. J. Clin. Sleep Med. 8, 597–619 (2012).
Mostafaei, S. H., Tanha, J. & Sharafkhaneh, A. A novel deep learning model based on transformer and cross modality attention for classification of sleep stages. J. Biomed. Inform. 157, 104689 (2024).
Lee, H. et al. Explainable vision transformer for automatic visual sleep staging on multimodal PSG signals. NPJ Digit. Med. 8, 55 (2025).
Lee, S., Yu, Y., Back, S., Seo, H. & Lee, K. SleePyCo: automatic sleep scoring with feature pyramid and contrastive learning. Expert Syst. Appl. 240, 122551 (2024).
Phan, H. et al. L-seqsleepnet: whole-cycle long sequence modelling for automatic sleep staging. IEEE J. Biomed. Health Inform. 27, 4748–4757 (2023).
Liu, P. et al. Automatic sleep stage classification using deep learning: signals, data representation, and neural networks. Artif. Intell. Rev. 57, 301 (2024).
Zhang, X., Zhang, X., Huang, Q., Lv, Y. & Chen, F. A review of automated sleep stage based on EEG signals. Biocybern. Biomed. Eng. 44, 651–673 (2024).
Yang, Y. & Liu, X. A re-examination of text categorization methods. In Proc. of the 22nd annual International ACM SIGIR Conference on Research and Development in Information Retrieval 42–49 (ACM, Berkeley, California, USA, 1999).
Sokolova, M. & Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45, 427–437 (2009).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at http://arxiv.org/abs/1802.03426 (2020).
Wu, D., Li, S., Yang, J. & Sawan, M. Neuro-bert: rethinking masked autoencoding for self-supervised neurological pretraining. IEEE J. Biomed. Health Inform. https://doi.org/10.1109/JBHI.2024.3415959 (2024).
Yang, C. et al. Self-supervised electroencephalogram representation learning for automatic sleep staging: model development and evaluation study. JMIR AI 2, e46769 (2023).
Kumar, V. et al. MulEEG: a multi-view representation learning on EEG signals. In International Conference on Medical Image Computing and Computer-Assisted Intervention 398–407 (Springer, 2022).
Eldele, E. et al. Self-supervised contrastive representation learning for semi-supervised time-series classification. IEEE Trans. Pattern Anal. Mach. Intell. 45, 15604–15618 (2023).
Sarkar, P. & Etemad, A. Self-supervised ECG representation learning for emotion recognition. IEEE Trans. Affective Comput. 13, 1541–1554 (2022).
van den Oord, A., Li, Y. & Vinyals, O. Representation learning with contrastive predictive coding. Preprint at http://arxiv.org/abs/1807.03748 (2019).
Yue, Z. et al. TS2Vec: towards universal representation of time series. Proc. AAAI Conference on Artificial Intelligence 36, 8980–8987 (2022).
Zerveas, G., Jayaraman, S., Patel, D., Bhamidipaty, A. & Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proc. of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining 2114–2124 (ACM, Virtual Event, Singapore, 2021).
Kong, X. & Zhang, X. Understanding masked image modeling via learning occlusion invariant feature. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6241–6251 (2023).
Mahowald, M. W. & Schenck, C. H. Insights from studying human sleep disorders. Nature 437, 1279–1285 (2005).
Arnardottir, E. S., Thorleifsdottir, B., Svanborg, E., Olafsson, I. & Gislason, T. Sleep-related sweating in obstructive sleep apnoea: association with sleep stages and blood pressure. J. Sleep Res. 19, 122–130 (2010).
Miettinen, T. et al. Success rate and technical quality of home polysomnography with self-applicable electrode set in subjects with possible sleep bruxism. IEEE J. Biomed. Health Inform. 22, 1124–1132 (2018).
Chien, H. Y. S. et al. MAEEG: Masked Auto-encoder for EEG Representation Learning. In NeurIPS 2022 Workshop on Learning from Time Series for Health (New Orleans, LA, USA, 2022).
Zhang, R. et al. ERP-WGAN: a data augmentation method for EEG single-trial detection. J. Neurosci. Methods 376, 109621 (2022).
Tosato, G., Dalbagno, C. M. & Fumagalli, F. EEG synthetic data generation using probabilistic diffusion models. Preprint at http://arxiv.org/abs/2303.06068 (2023).
Aristimunha, B. et al. Synthetic sleep EEG signal generation using latent diffusion models. In NeurIPS 2023 Deep Generative Models for Health Workshop (2023).
Warby, S. C. et al. Sleep-spindle detection: crowdsourcing and evaluating performance of experts, non-experts and automated methods. Nat. Methods 11, 385–392 (2014).
You, J., Jiang, D., Ma, Y. & Wang, Y. SpindleU-net: an adaptive U-net framework for sleep spindle detection in single-channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 1614–1623 (2021).
Kinoshita, T. et al. Sleep spindle detection using Rusboost and synchrosqueezed wavelet transform. IEEE Trans. Neural Syst. Rehabil. Eng. 28, 390–398 (2020).
Buckland, M. & Gey, F. The relationship between recall and precision. J. Am. Soc. Inf. Sci. 45, 12–19 (1994).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Vig, J. A multiscale visualization of attention in the transformer model. In Proc. of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 37–42 (2019).
Kapishnikov, A. et al. Guided integrated gradients: an adaptive path method for removing noise. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 5050–5059 (2021).
Shao, M., Bao, Z., Liu, W., Qiao, Y. & Wan, Y. Frequency domain-enhanced transformer for single image deraining. Vis. Comput. 40, 6723–6738 (2024).
Zhuang, X., Li, Y. & Peng, N. Enhanced automatic sleep spindle detection: a sliding window-based wavelet analysis and comparison using a proposal assessment method. Appl. Inform. 3, 11 (2016).
Jiang, D., Ma, Y. & Wang, Y. A robust two-stage sleep spindle detection approach using single-channel EEG. J. Neural Eng. 18, 026026 (2021).
Tapia, N. I. & Estévez, P. A. RED: deep recurrent neural networks for sleep EEG event detection. In 2020 International Joint Conference on Neural Networks (IJCNN) 1–8 (2020).
Kales, A., Rechtschaffen, A., University of California, Los Angeles Brain Information Service & Neurological Information Network (U.S.). A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects (U.S. National Institute of Neurological Diseases and Blindness, Neurological Information Network, 1968).
Zou, Q. et al. Cortical hierarchy underlying homeostatic sleep pressure alleviation. Nat. Commun. 16, 10014 (2025).
Huang, W. A unified time-frequency foundation model for sleep decoding. Zenodo. https://doi.org/10.5281/zenodo.17432722 (2025).
Acknowledgements
This work was supported by the STI2030-Major Projects (2021ZD0200800 to Q.Z.; 2021ZD0200500, 2021ZD0200506 and 2022ZD0206000 to J.H.G.); the National Natural Science Foundation of China (Grants w2431053, 81790650, 81727808, and 82327806 to J.H.G.; 82372034 and 81871427 to Q.Z.); the Beijing United Imaging Research Institute of Intelligent Imaging Foundation (CRIBJZD202101 to Q.Z.); and the Non-profit Central Research Institute Fund of the Chinese Academy of Medical Sciences (2024-RC416-02 to Z.C.). We thank the National Center for Protein Sciences at Peking University in Beijing, China, for assistance with data acquisition. This study was also supported by the High-performance Computing Platform of Peking University.
Author information
Authors and Affiliations
Contributions
W.H., Q.Z. and J.G. conceived the research idea. W.H. designed the study, implemented the model, performed all analyses, and prepared the figures. H.C. collected and provided the UMS dataset. Z.C. contributed partial computational resources for model training. Y.W., Q.Z., and J.G. provided guidance on study design and analyses. Y.W., H.X., T.L., X.W., H.C., P.L., Z.C., W.X., Q.Z., and J.G. contributed to manuscript drafting, review, and revision. Q.Z. and J.G. supervised the study and provided critical feedback.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Sahar Hassanzadeh Mostafaei, Xiang-Dong Tang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, W., Wang, Y., Cheng, H. et al. A unified time-frequency foundation model for sleep decoding. Nat Commun (2026). https://doi.org/10.1038/s41467-025-67970-4


