Novel digital markers of sleep dynamics: causal inference approach revealing age and gender phenotypes in obstructive sleep apnea

Bechny, Michal; Kishi, Akifumi; Fiorillo, Luigi; van der Meer, Julia; Schmidt, Markus; Bassetti, Claudio; Tzovara, Athina; Faraci, Francesca

doi:10.1038/s41598-025-97172-3


Article
Open access
Published: 08 April 2025


Michal Bechny^1,2,
Akifumi Kishi³,
Luigi Fiorillo^2,4,
Julia van der Meer⁵,
Markus Schmidt^5,6,
Claudio Bassetti^5,6,
Athina Tzovara^1,5 &
Francesca Faraci²

Scientific Reports volume 15, Article number: 12016 (2025)

Abstract

Despite evidence that sleep-disorders alter sleep-stage dynamics, only a limited number of these parameters are included and interpreted in clinical practice, mainly due to unintuitive methodologies or lacking normative values. Leveraging the matrix of sleep-stage transition proportions, we (i) propose a general framework to quantify sleep-dynamics, (ii) introduce several novel markers of their alterations, and (iii) demonstrate our approach using obstructive sleep apnea (OSA), one of the most prevalent sleep disorders and a significant risk factor. Using causal inference techniques, we address confounding in an observational clinical database and estimate markers personalized by age, gender, and OSA-severity. Importantly, our approach adjusts for five categories of sleep-wake-related comorbidities, a factor overlooked in existing research but present in 48.6% of OSA-subjects in our high-quality dataset. Key markers, such as NREM-REM-oscillations and sleep-stage-specific fragmentations, were increased across all OSA-severities and demographic groups. Additionally, we identified distinct gender-phenotypes, suggesting that females may be more vulnerable to awakenings and REM-sleep-disruptions. External validation of the transition markers on the SHHS database confirmed their robustness in detecting sleep-disordered-breathing (average AUROC = 66.4%). With advancements in automated sleep-scoring and wearable devices, our approach holds promise for developing low-cost screening tools for sleep-, neurodegenerative-, and psychiatric-disorders exhibiting altered sleep patterns.

Introduction

The clinical sleep study (polysomnography, PSG) involves comprehensive overnight monitoring of body biosignals, including electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), and others. Medical personnel evaluate the PSG following guidelines of the American Academy of Sleep Medicine (AASM)¹, focusing on the detection of complete and partial breathing arrests (i.e., apneas and hypopneas), movement events, and notably, categorizing stages of sleep. Sleep scoring - conventionally done manually for each 30-second window (epoch) of the biosignals recorded - differentiates between five sleep-wake stages: wakefulness (W), rapid-eye-movement (REM) sleep, and three other non-REM (N1, N2, N3) sleep-states. Such a structured sleep-scoring (hypnogram) forms a basis for the PSG report, providing information on basic markers (e.g., sleep efficiency, % of sleep-stages, REM latency) that relate to sleep quality and may also indicate certain sleep disorders^2,3,4.

Sleep and its markers have a complex relationship with individuals’ age and may vary by gender⁵. Several meta-analyses have made considerable efforts to establish normative values of sleep markers in healthy individuals^6,7. However, the validity of certain estimates might be questionable due to inappropriate statistical evaluations of the individual studies whose results were pooled⁸. For instance, REM latency, as a time-to-event phenomenon subject to censoring, is best quantified using survival techniques rather than mean comparisons. Similarly, the % of sleep-stages, which are interdependent, should be assessed by compositional methods. Proper techniques enabling unbiased estimation are however rarely applied. Quantification of normative ranges and changes in sleep markers in diseased subjects is even more challenging. The observational study design of PSG databases, typically including non-randomized symptomatic subjects, introduces a high degree of confounding⁹. This results in an imbalanced prevalence of individuals with different clinical statuses and distributional shifts in their demographic characteristics. These factors make it difficult to separate the effects of natural ageing from the effects of particular disorders on sleep parameters. The unaddressed confounding, difficulty in assessing data of patients who often suffer from several sleep disorders simultaneously, and the use of not always appropriate statistical approaches are major challenges that increase the risk of biased conclusions even in the analysis of well-established PSG markers.

While differences in sleep-stage dynamics are evident for certain sleep disorders, such as increased sleep fragmentation in Obstructive Sleep Apnea (OSA)^10,11, or a short REM latency in narcoleptic patients¹², the clinical PSG report has, so far, included only a limited number of dynamics-related markers. These include sleep and REM latencies and the absolute counts of sleep-stage transitions or awakenings¹. While latencies target the first (tens of) minutes of the night, the overall numbers of transitions/awakenings are proportional to sleep duration and may not sufficiently capture more complex patterns of sleep dynamics that may be specific to individual sleep disorders. Although counts of transitions and awakenings are sometimes normalized as indices per hour, stage-specific dynamics (such as REM-related continuity and transitions) are typically overlooked, despite their potential to reveal disorder-specific patterns of sleep disruption. Their limited incorporation into clinical PSG reports is largely due to the absence of standardized methodologies, normative values, and intuitive frameworks to support clinical interpretation. Recognizing these limitations, researchers have comprehensively explored sleep-stage dynamics in various modalities. These studies, which date back to the 1980s, exhibit heterogeneity in terms of subject demographics, clinical diagnoses, and the methodologies employed¹³. Two main investigative directions have emerged: (i) focusing on the transitions between sleep stages, and (ii) focusing on the duration of sleep stages. The perspectives of these two seemingly distinct but strongly interrelated areas are discussed in the following two paragraphs, highlighting the contribution of the most impactful studies.

Research on sleep-stage transitions has evolved rapidly, beginning with one of the earliest mathematical models by Kemp (1986), who quantified transition intensities in 23 healthy males aged 18-30¹⁴. Yassouridis (1999) followed by exploring the relationship between transition intensities and plasma cortisol levels in 30 males aged 20–30¹⁵. Several studies identified associations between transition rates and clinical symptoms. For instance, Burns (2008) observed increased sleep fragmentation and transitions into N3 in 15 females with fibromyalgia syndrome (mean ± standard deviation (SD) age of 42.5 ± 12.9), contrasting with age- and gender-matched controls¹⁶. Laffan (2010) found a significant association between transition rates and self-reported sleep quality in a large cohort from the Sleep Heart Health Study (SHHS) database, consisting of 5684 participants (47.2% males, all aged over 40)¹⁷. The existing research extends to specific conditions such as chronic fatigue syndrome, where Kishi (2008) reported abnormal REM transitions in 22 female patients (aged 42 ± 8) in comparison to healthy controls of similar demographics¹⁸. Further exploring clinical implications, Kim (2009) found differences in sleep-stage dynamics between nights with and without CPAP therapy in 113 OSA subjects (aged 54.0 ± 11.7, 16 females)¹⁹. Wei (2017) documented increased N2-to-W/N1 transitions in 46 insomnia patients (aged 50.3 ± 13.6, 8 males) compared to age- and gender-matched controls, indicating altered sleep patterns²⁰. In addition, Schlemmer (2015) analyzed first- and second-order sleep-stage transitions across 4 groups of subjects (young vs old, healthy vs disorder), highlighting the varied impacts of ageing and pathological conditions²¹. Yet, the disordered subjects represented a pool of various sleep and psychological conditions, and the findings cannot be attributed to a specific diagnosis. 
Recently, Wachter (2020) utilized MANOVA adjusted for age, gender, and BMI, to evaluate differences in the 25 most common second-order transitions in different severities of OSA compared to healthy subjects, demonstrating associations with demographic and clinical factors²². The significant findings primarily related to wake and light-sleep (N1, N2) oscillations, when comparing severe-OSA and healthy subjects. An innovative yet not diagnosis-oriented approach by Yetton (2018) applied a Bayesian network to model transitions as well as stage durations in 3202 subjects (mean age of 62.5, 60% males) who were healthy according to the exclusion criteria. The prediction-oriented results demonstrated the highest accuracy (62.3%) in the identification of the current stage when using the previous 2 stages and the duration of the last stage, without considering age, gender, or BMI²³.

Another perspective in understanding sleep dynamics focuses on the quantification of sleep stage durations, providing insights into the temporal characteristics of individual sleep-wake periods. Lo (2002) initiated this research direction by examining sleep-wake dynamics in 20 healthy subjects (aged 23–57, 9 males), revealing different characteristics between sleep and wake periods’ duration and advocating for their modelling using power law distributions²⁴. Building on this, Penzel (2003) applied power-law models to quantify sleep-stage durations in both healthy and disordered subjects, identifying reduced duration and hence more fragmented sleep in sleep-apnea subjects²⁵ (with no specific demographics details provided). Following that, Norman (2006) exploited survival techniques and revealed decreased sleep continuity when comparing 10 mild and 10 moderate/severe subjects with sleep-disordered-breathing (SDB) against 10 normal subjects²⁶. The analysis did not consider subjects’ age, which was significantly higher in disordered subjects. Chervin (2009) compared sleep architecture in 48 children (aged 5-12.9) with sleep-disordered breathing to healthy controls, finding a significant decrease in the duration of N2 and REM²⁷. Bianchi (2010) employed multi-exponential fitting to analyze sleep-stage durations across 376 predefined controls (aged 68.2 ± 6.3, 35.6% males), in comparison to 496 mild-OSA (aged 63.8 ± 0.3, 60% males), and 338 severe-OSA (aged 63.7 ± 10.5, 70.7% males) subjects from the SHHS database²⁸. They report accelerated decay rates in W, NREM, and REM among OSA subjects, suggesting a larger sleep fragmentation and shorter stage bouts. Notably, despite considerable age and gender differences within its sample (35.6% vs 70.7% males in healthy vs severe-OSA), the study did not adjust for them. Klerman (2013) investigated durations of sleep-wake states in healthy subjects and identified an age-related decline of NREM-sleep continuity²⁹. 
A comparison of sleep-stage duration by Kishi (2020) in sleep bruxism (SB) patients (aged 23.3 ± 1.1, 6 males) and matched controls showed that despite no differences in the prevalence of sleep-stages (except for N1), the SB subjects differed in several parameters describing their dynamics, particularly related to an increased REM fragmentation and hence reduced duration of REM-bouts³⁰.

By analysing sleep-stage transitions^{14,15,16,17,18,19,20,21,22,23} or by characterizing their duration^{24,25,26,27,28,29,30}, all of these studies highlight the importance and clinical utility of analysing sleep dynamics across a wide range of disorders. Although most of the studies focus on one of these two aspects, it is important to point out that the two are functionally linked, as a lower transition probability relates to an increased bout duration^31,32. The existing research works have variously addressed the complexities of confounding and the selection of appropriate statistical models. The majority of studies concurred on the need to control for age and gender or limit the demographic ranges to ensure a homogeneous group of study participants. In existing studies, this is achieved by using stratified analysis with (M)ANOVA (e.g.,^21,22,28), regression adjustment (e.g.,¹⁷), or selecting matched individuals (e.g.,^16,20,30). The simplicity of the first two approaches, typically comparing the effect of exposure (such as OSA) on the outcome (e.g., sleep dynamics) against unexposed healthy controls, is offset by their susceptibility to confounding bias³³. Analyzing non-randomized observational PSG databases, which typically include older, symptomatic individuals, complicates the separation of confounder effects (of age, gender) from the exposure (disorder). In contrast, while the matching approach substantially reduces the bias³⁴, it is generally applied within smaller subject cohorts. This limitation arises from the challenges of finding individuals with matched characteristics within typically imbalanced clinical databases of limited sizes.

Our study introduces a comprehensive framework for quantifying sleep dynamics, demonstrated on OSA but applicable to other (sleep) disorders. OSA, one of the most prevalent sleep disorders and a significant risk factor affecting up to 17% of the general adult population³⁵, serves as a use-case to showcase the framework’s versatility. Building on existing research and addressing its limitations, our framework (depicted in Fig. 1 and detailed in Methods) fulfils several key objectives:

Data acquisition, Fig. 1a: Leveraging a high-quality, heterogeneous observational clinical database, we identified OSA and healthy subjects (aged 6–91 years) based on the clinical gold standard of a conclusive diagnosis. Consistent with the literature (e.g.,^{17,21,22,28,35}), we identified age and gender as the primary confounders. The subjects’ sleep was summarized through AASM-scored hypnograms, forming the basis for proposing and deriving novel digital markers of sleep and its dynamics. Information about sleep comorbidities was also considered, to adjust our framework for additional possible confounders. The importance of comorbidity adjustment is underscored by the fact that 48.6% of OSA subjects in our dataset had at least one sleep-wake comorbidity among their conclusive diagnoses.
Balancing confounders, Fig. 1b: To address confounding by age and gender, which exhibit distributional overlap between OSA and healthy subjects, we applied Inverse Probability Weighting (IPW) (c.f.,^36,37,38), which ensured balanced comparisons between the OSA and healthy groups regarding the main confounding factors. In short, IPW mathematically re-weights the original dataset as if it were matched on the confounders considered (such as age).
Sleep dynamics modeling, Fig. 1c: Utilizing hypnograms, we propose a novel “sleep fingerprint”, a matrix $\textbf{P}$ of sleep-stage transition proportions. As the first in the field to respect the interdependencies between individual dimensions of the transition proportions $\textbf{P}$ (which sum up to 100%), we quantified them jointly using Dirichlet regression³⁹, a method well-suited for the compositional nature of $\textbf{P}$, within a causal S-Learner framework⁴⁰ applied to IPW-balanced data. The idea of the causal S-Learner is to extrapolate outcomes for “conditioned” (OSA) vs control (healthy) subjects for arbitrary values of predictors. This approach enables the estimation of changes in sleep (dynamics) across different ages, OSA-severities (AHI), and the previously understudied interplay of OSA with gender and sleep-wake-related comorbidities.
Digital marker quantification, Fig. 1d: Finally, by exploiting the estimated model (Fig. 1c), we quantify not only the estimated effects of OSA on $\textbf{P}$ but also derive several novel digital markers. These markers capture the disorder’s impact on sleep, sleep-stage dynamics and also durations, personalized for arbitrary values of predictors (such as age, gender, apnea-severity). They are presented in terms of the Conditional Average Treatment Effect (CATE) and the Risk-Ratio CATE (RR-CATE)⁴¹, which stand for absolute and relative comparisons, respectively, of expected outcomes (such as specific stage-transitions) for OSA and healthy subjects.
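To make the "sleep fingerprint" concrete, the following minimal Python sketch counts epoch-to-epoch transitions in a hypnogram and expresses them as percentages of all transitions, so that the 25 entries sum to 100%. The function name `transition_proportions` and the toy hypnogram are illustrative assumptions; the exact definition of $\textbf{P}$ (Eq. 1), including which part of the recording is used, is given in Methods and may differ in detail.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]

def transition_proportions(hypnogram):
    """Return nested dict P[a][b] = % of all epoch-to-epoch transitions a -> b.

    Self-transitions (e.g. N2 -> N2) are counted too, so the 25 entries sum to 100.
    """
    pairs = Counter(zip(hypnogram[:-1], hypnogram[1:]))
    total = sum(pairs.values())
    return {a: {b: 100.0 * pairs[(a, b)] / total for b in STAGES} for a in STAGES}

# Toy hypnogram (illustrative only, not study data): one label per 30-second epoch.
toy = ["W", "W", "N1", "N2", "N2", "N2", "N3", "N3", "N2", "REM", "REM", "W"]
P = transition_proportions(toy)
print(round(P["N2"]["N2"], 2))  # share of N2 -> N2 among the 11 transitions
```

Because every night is reduced to the same 25 proportions regardless of its length, subjects with different sleep durations become directly comparable, which is the property the compositional modelling step relies on.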

Our framework integrates the two main branches of sleep dynamics research (quantification of sleep-stage transitions and of stage durations) by demonstrating their interconnectedness and enabling their simultaneous quantification. Our study is the first in the field to rigorously account for the interactions between OSA, gender, and a wide range of comorbidities, providing a deeper understanding and less biased estimates of how OSA impacts sleep across various ages, genders, and apnea-severity levels. As demonstrated in our results, the quantified effects and markers of OSA can be leveraged to: (i) explain, by establishing normative values for sleep parameters tailored to different demographic profiles and OSA severity; and also (ii) predict, by training models capable of identifying OSA subjects based solely on observed demographics and sleep-stage dynamics. The results are publicly accessible through an interactive online app, fostering a broader scientific exploration and discussion.

Results

The main findings of our study are presented in the four subsections:

Modelling of sleep-stage transition matrix, following Fig. 1a–c, presents the estimation of causal S-learner quantifying the matrix of sleep-stage transition proportions $\textbf{P}$, and the impact of predictors, on IPW-balanced data.
Personalized digital markers of sleep dynamics and the effects of OSA, following Fig. 1d, introduces principal findings on OSA-markers based on: 1. raw matrix $\textbf{P}$ exploring the overall prevalence of individual transitions; 2. derived markers capturing certain clinical properties by summing up relevant dimensions of $\textbf{P}$; and 3. derived Markovian matrix $\textbf{P}^M$ investigating sleep-stage-specific transition mechanisms related to stage durations. The personalization of markers refers to the estimation of the OSA impact for various levels of age, genders, and apnea-severity, helping to understand how OSA alters sleep and its dynamics across different subpopulations.
Predictive Performance of $\textbf{P}$ Markers on External Data evaluates the utility of each of the 25 possible sleep-stage transition proportions in identifying subjects with moderate sleep-disordered breathing (AHI > 15). A logistic regression model was trained on the study dataset (BSDB) and applied to the large open-access dataset (SHHS), using only age, gender, and the specific transition proportion as predictors.
The final part introduces our app, which lets users interactively explore results beyond those shown in this paper (e.g., interactions of OSA with arbitrary comorbidities, evaluation of extreme OSA with AHI $\gg 30$, etc.).
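The external-validation setting described above (a logistic model with age, gender, and one transition proportion, evaluated by AUROC) can be sketched as follows. The coefficient values and the helper names `predict_sdb_probability` and `auroc` are hypothetical and serve only to illustrate the mechanics; they are not the fitted values from the study.

```python
import math

def predict_sdb_probability(age, is_male, transition_pct, coef):
    """Logistic model: P(AHI > 15) from age, gender and one transition proportion."""
    z = (coef["intercept"] + coef["age"] * age
         + coef["male"] * is_male + coef["transition"] * transition_pct)
    return 1.0 / (1.0 + math.exp(-z))

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients, for illustration only.
coef = {"intercept": -4.0, "age": 0.04, "male": 0.8, "transition": 0.5}
p = predict_sdb_probability(age=60, is_male=1, transition_pct=2.0, coef=coef)

# Toy scores and labels (1 = AHI > 15), not study data.
print(auroc([0.2, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUROC of 50% corresponds to chance-level ranking, which gives context for reading the single-marker performance reported on SHHS.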

Modelling of sleep-stage transition matrix

Propensity score model and IPW balancing

To balance the Berner Sleep Data Base (BSDB) study dataset for the main confounders of gender and age, we used the Inverse Probability Weighting (IPW) strategy, c.f., Fig. 1a, b. Propensity scores introduced in Eq. 24 were used to calculate weights according to Eq. 25. The estimates of propensity scores were based on the logistic regression model from Eq. 29. The choice of gender and age as the inputs for the IPW was driven by the evidence of existing studies that control for them^17,22 and clinical evidence that OSA is more prevalent in males and at older ages³⁵. In the BSDB exploited, both OSA and healthy subjects can be observed across the entire range of age and genders, thus satisfying the assumption of overlap and positivity³⁷. After re-weighting the dataset, the characteristics of age and gender were balanced, which was evidenced by a t-test based on IPW-reweighted means and standard deviations that failed to reject (p-val > 0.05) the null hypothesis of equality of variable means between the OSA and healthy subjects. The weights were subsequently used within the outcome model, enforcing the balanced impact of age and gender, across OSA and healthy subjects.
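The re-weighting idea can be illustrated with a small Python sketch. It assumes the standard IPW form, in which exposed subjects receive weight $1/e(x)$ and unexposed subjects $1/(1-e(x))$; Eqs. 24 and 25 are not reproduced here and may differ, for example by using stabilized weights. For brevity, the propensity score is estimated from stratum frequencies of a toy dataset rather than from the logistic regression used in the study.

```python
from collections import defaultdict

# Toy records (illustrative, not study data): (age_group, is_male, has_osa)
records = [
    ("young", 0, 0), ("young", 0, 0), ("young", 0, 1),
    ("young", 1, 0), ("young", 1, 1),
    ("old", 0, 0), ("old", 0, 1), ("old", 0, 1),
    ("old", 1, 0), ("old", 1, 1), ("old", 1, 1), ("old", 1, 1),
]

def propensity_by_stratum(rows):
    """Estimate e(x) = P(OSA | age group, gender) as the OSA share in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n, n_osa]
    for age, male, osa in rows:
        counts[(age, male)][0] += 1
        counts[(age, male)][1] += osa
    return {k: n_osa / n for k, (n, n_osa) in counts.items()}

def ipw_weights(rows):
    """Assumed standard IPW: 1/e for OSA subjects, 1/(1-e) for healthy subjects."""
    e = propensity_by_stratum(rows)
    return [1 / e[(a, m)] if osa else 1 / (1 - e[(a, m)]) for a, m, osa in rows]

def weighted_share_male(rows, weights, group):
    """Weighted proportion of males within the OSA (1) or healthy (0) group."""
    num = sum(w * m for (a, m, osa), w in zip(rows, weights) if osa == group)
    den = sum(w for (a, m, osa), w in zip(rows, weights) if osa == group)
    return num / den

w = ipw_weights(records)
unweighted = [1.0] * len(records)
print(weighted_share_male(records, unweighted, 1), weighted_share_male(records, unweighted, 0))
print(weighted_share_male(records, w, 1), weighted_share_male(records, w, 0))
```

In the toy data the raw male share differs between groups (4 of 7 OSA subjects vs 2 of 5 healthy subjects), whereas after weighting both groups show the same share. Note that every stratum contains both OSA and healthy subjects, which is the overlap (positivity) condition mentioned above; without it the weights would be undefined.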

Outcome model

The proportions of the 25 possible sleep-stage transitions in $\textbf{P}$ were modeled using Dirichlet regression (c.f., Fig. 1c) applied to IPW-balanced data. The model specification followed Eq. 30, and the inclusion of the OSA indicator as one of its predictors exploited the causal S-learner framework, enabling a straightforward quantification of (age, gender, apnea-severity)-heterogeneous OSA-effects in terms of Conditional Average Treatment Effect (CATE) and Risk-Ratio CATE (RR-CATE) (c.f., Eqs. 27, 28). Simplistically, the CATE and RR-CATE refer to absolute and relative differences between conditioned (i.e., OSA-affected) and control (i.e., healthy) subjects, respectively. The model estimation followed the implementation of Dirichlet regression in R³⁹. To assess uncertainty, both in the model coefficients and derived effects, the nonparametric bootstrap with 200 repetitions was used to calculate 95% confidence intervals (CI) based on 2.5% and 97.5% bootstrapped quantiles.
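The percentile bootstrap used for the confidence intervals can be sketched in a few lines of Python. This is a simplified illustration: the statistic here is a plain mean of toy values, whereas in the study each bootstrap repetition involves re-estimating the model. The function name `bootstrap_ci` and the nearest-rank quantile rule are assumptions made for the example.

```python
import random

def bootstrap_ci(values, statistic, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Simple nearest-rank quantiles of the sorted bootstrap statistics.
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi

def mean(xs):
    return sum(xs) / len(xs)

# Toy values (illustrative only), e.g. a transition proportion in % for ten subjects.
toy = [3.1, 4.0, 2.7, 5.2, 3.8, 4.4, 3.0, 4.9, 3.6, 4.1]
lo, hi = bootstrap_ci(toy, mean)
print(lo, hi)
```

With 200 repetitions the 2.5% and 97.5% quantiles rest on only a handful of extreme resamples, so the interval endpoints carry some Monte Carlo noise of their own.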

A summary of estimated regression coefficients together with CI for each predictor and transition proportion is provided in Supplementary Table 1. The estimates indicate a significant influence of both demographics (age and gender), OSA, and its severity (AHI) on sleep-stage dynamics, as at least one of them had a significant impact on each of the transition proportions. The significant interactions of OSA with gender point to the presence of possible gender-specific OSA phenotypes. The adjustment for comorbidities appears to be essential as the comorbidity indicators influenced most of the transitions.

Given the complex relationship of the marginal effect on the outcome (i.e., transition %’s) with individual coefficients and the actual predictors’ value (c.f., Eq. 18), we detail results in the intuitive scales of expected percentages, differences (CATE, Eq. 27), and risk-ratios (RR-CATE, Eq. 28), below.
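A small sketch may help to see why individual coefficients are hard to read directly. It assumes a log link for the Dirichlet parameters, $\alpha_j = \exp(x^\top \beta_j)$, as in the common parametrization of Dirichlet regression, so that the expected proportion of component $j$ is $\alpha_j / \sum_k \alpha_k$; Eq. 18 is not reproduced here and may be written differently. The three components and all coefficient values are hypothetical. The S-learner logic appears in the last two lines: a single model is evaluated twice, with the OSA indicator switched off and on.

```python
import math

# Hypothetical coefficients for 3 toy components (the study models 25 transitions).
# Predictor order: intercept, age in decades, OSA indicator.
BETA = {
    "N2->N2":   [3.0, -0.05, -0.10],
    "REM->REM": [2.0, -0.10, -0.40],
    "REM->W":   [0.0,  0.05,  0.60],
}

def expected_proportions(age_decades, osa):
    """Expected composition (%) under a Dirichlet with log-linked alpha parameters."""
    x = [1.0, age_decades, float(osa)]
    alpha = {k: math.exp(sum(b * xi for b, xi in zip(beta, x)))
             for k, beta in BETA.items()}
    total = sum(alpha.values())
    return {k: 100.0 * a / total for k, a in alpha.items()}

# S-learner style comparison: same model, OSA indicator toggled.
healthy = expected_proportions(5.0, osa=0)
with_osa = expected_proportions(5.0, osa=1)
print(healthy, with_osa)
```

Because every expected proportion shares the same denominator, the change in one component depends on the coefficients of all components, not only on its own.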

Personalized digital markers of sleep dynamics and the effects of OSA

The estimated outcome model enables various scenarios of comparisons of OSA vs healthy, including the raw matrix $\textbf{P}$, derived markers (e.g., % of sleep-stages), and Markovian transition matrix $\textbf{P}^M$, c.f., Fig. 1d. All this, for arbitrary values of predictors, provides a wide range of results that can inspire new investigative directions. Since all of our results refer to (possibly derived) transition probabilities (%), we present them in RR-CATE (CATE)% format, indicating the amount of relative (absolute)% changes, respectively.
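As a reading aid for the RR-CATE (CATE)% notation, the sketch below computes both quantities from a pair of expected percentages and also inverts them. The helper names are illustrative. The input pair (14.458% vs 8.928%) is a toy value chosen so that the output matches the format of a reported result, 161.94 (5.53)%; because published figures are rounded, baselines recovered this way are only approximate.

```python
def cate(p_osa, p_healthy):
    """Absolute difference in percentage points."""
    return p_osa - p_healthy

def rr_cate(p_osa, p_healthy):
    """Relative comparison in %, where 100 means no change."""
    return 100.0 * p_osa / p_healthy

def report(p_osa, p_healthy):
    """Format a result as 'RR-CATE (CATE)%'."""
    return f"{rr_cate(p_osa, p_healthy):.2f} ({cate(p_osa, p_healthy):.2f})%"

def implied_healthy(rr, diff):
    """Recover the healthy expectation from RR-CATE (%) and CATE (points)."""
    return diff / (rr / 100.0 - 1.0)

print(report(14.458, 8.928))          # 161.94 (5.53)%
print(implied_healthy(161.94, 5.53))  # roughly 8.93
```

Reading both numbers together matters: a large RR-CATE on a rare transition (for example, a value above 200% with a CATE below one percentage point) describes a big relative but small absolute change.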

Since our model (Eq. 30) can extrapolate OSA-effects for arbitrary values of predictors, we showcase the results for three OSA severities, O1: mild (AHI = 5), O2: moderate (AHI = 15), and O3: severe (AHI = 30); three ages, A1: young (30 years), A2: middle-aged (50 years), and A3: older (70 years); and for females (F) and males (M), all without comorbidities. When selecting the most prominent effect in a group, we choose it according to RR-CATE.

Matrix $\textbf{P}$ of sleep-stage transition proportions

The heatmap in Fig. 2 shows whether individual transition proportions in $\textbf{P}$ (Eq. 1) were significantly altered due to specific OSA conditions across different ages and genders. All these aggregated findings are based on detailed results depicted as supplementary heatmap figures supplemented with respective estimates and CI. Supplementary Figures 1 and 4 depict expected $\textbf{P}$ for different ages and OSA-severities for F and M, respectively. Based on that, Supplementary Figures 2 and 5 present CATE comparisons between different levels of OSA and healthy individuals of the same demographics, and Supplementary Figures 3 and 6 depict the respective RR-CATE.

Notably, except for $\text {N2} \rightarrow \text {N3}$ and $\text {N3} \rightarrow \text {N2}$ of A3-F, each significant effect identified for O1 or O2 of both genders was accompanied by a significant effect in the corresponding more severe OSA group. This follows the intuition that the sleep-stage dynamics, and hence also $\textbf{P}$, change gradually with increasing prevalence of apnea events (i.e., AHI). The exception of older F is explained by a significantly lower % of N3, 70.04 (-5.6)% in A3-O3 (c.f., Supplementary Table 4).

As the entire $\textbf{P}$ sums up to 100%, each decrease in a certain proportion is compensated with an increase in one or more other ones. For F, a major decrease is observed in $\text {REM} \rightarrow \text {REM}$, with RR-CATE of about 60% across all ages and OSA severities, and the most prominent drop, 55.55 (-4.85)%, in older. This suggests significant REM sleep instability, which could impact cognitive health⁴². The O2- and O3-F also show significantly decreased $\text {N3} \rightarrow \text {N3}$, as low as 57.08 (-6.97)% in A3, indicating disrupted deep-sleep continuity, which may affect physical restoration and memory consolidation⁴³. For A1-M, $\text {REM} \rightarrow \text {REM}$ decreased for all OSA severities, down to 67.5 (-6.19)%, and for A2-(O2,O3), 66.93 (-4.73)%, with the largest declines always in O3. The decreases in all A3-M-OSA groups were not significant, likely due to a larger variance in estimates caused by the limited number of healthy older M in the data. Contrary to F, a decrease in $\text {N3} \rightarrow \text {N3}$ was not significant in M, but a significant decrease in $\text {N2} \rightarrow \text {N2}$ was noted for (A1, A2) O3-OSA, as low as 91.09 (-3.22)%.

For both genders of all ages and OSA severities, several significantly increased transition proportions were identified, distinguishing them from healthy subjects. The most pronounced effects were found in A1-O3-F. The increased $\text {W} \rightarrow \text {(N2, N3)}$ transitions, up to 234.6 (0.4)%, indicate more frequent arousals attributable to apneic events and subsequent attempts to quickly regain restorative sleep. Increased transitions $\text {N1} \rightarrow \text {N3}$, up to 241.0 (0.4)%, suggest a compensatory mechanism where the body attempts to achieve the restorative effects of deep sleep, bypassing intermediate stages due to frequent sleep disruptions. The increase in $\text {N3} \rightarrow \text {(N1, REM)}$ transitions, up to 245.5 (0.3)%, indicates rather infrequent compensatory transitions for reduced N3-continuity, related to a regression to lighter sleep or irregular shifts to REM sleep. Lastly, elevated $\text {REM} \rightarrow \text {(N1, N3)}$ transitions, up to 261.6 (0.6)%, reflect REM stage instability, with more frequent abrupt changes in sleep depth. Particularly, the atypical transitions between N3 and REM may reflect a build-up of sleep pressure associated with OSA. While such transitions are uncommon under normal conditions, their presence may indicate a compensatory mechanism triggered by long-term disrupted or unrefreshing sleep.

Interestingly, all OSA-F showed a significant increase in awakenings from all sleep stages, $\text {(N1, N2, N3, REM)} \rightarrow \text {W}$. For M, there was no increase in $\text {REM} \rightarrow \text {W}$ in any OSA group, and increases in $\text {(N1, N2, N3)} \rightarrow \text {W}$ were observed only for O2 and O3. This suggests that in comparison to M, the OSA-F may experience more fragmented sleep due to frequent awakenings from all stages, potentially leading to greater daytime sleepiness, and the presence of insomnia symptoms.

PSG markers derived from $\textbf{P}$

The heatmap in Fig. 3 aggregates the OSA-effects identified for different PSG markers (c.f., Eqs. 2, 13) derived from $\textbf{P}$. Detailed results concerning expected probabilities (%) of their occurrence following Eqs. 18, 19, CATE, and RR-CATE for individual age and OSA categories are provided in Supplementary Tables 2-4 for F, and Tables 5-7 for M, respectively.

Regarding the percentages of individual sleep-stages, the main effect of OSA shared between both genders of all ages is the increase in N1 in O3, with the largest increase of 161.94 (5.53)% in A1-F. The increase also affected all O2-M, up to 122.36 (2.57)% in A1, and A1-O2-F, 134.41 (3.07)%. Sleep macro-architecture seems to be more affected by OSA in F than in M: for all OSA-severities of (A1, A2)-F, we additionally identified an increase in W%, up to 185.63 (3.57)% in A1-O3, suggesting a reduced sleep-efficiency, and a decrease in REM%, as low as 74.9 (-4.25)% in A2-O3. Except for reduced REM% in A1-O3-M, 79.54 (-4.46)%, these changes were identified only in F.

In addition to the increased N3- and REM-awakenings from Eqs. 5, 6 already discussed above, increased aggregates of total-awakenings (Eq. 3), up to 192.55 (2.89)%, and of light-sleep-awakenings (Eq. 4), up to 200.35 (2.01)%, were observed in all age and OSA categories with the exception of O1-M, with the largest effects in A1-O3-F.

A particularly sensitive marker of OSA for all severities appears to be NREM-and-REM oscillations (Eq. 7), which were identified as significantly increased across all groups, peaking at 212.48 (3.59)% in A1-O3-F. This marker is elaborated in detail in Fig. 4, showcasing the expected outcome for F. The upper plots (1a-c) depict the expected probability (%), CATE, and RR-CATE with corresponding CIs for varying age (and fixed AHI), whereas the bottom plots (2a-c) show the same for varying AHI (and fixed age). One can observe that the effect of OSA remains significant over the entire range of both age and AHI. The magnitude of the difference tends to decrease with age (c.f., 1b-c), from a CATE of about 4.5% in children to 1.5% in older age, likely due to generally shorter sleep with decreasing REM% and a lower number of sleep cycles. The effect’s size increases rapidly with AHI (c.f., 2b-c), which typically increases with age. The outcomes for M are illustrated in Supplementary Figure 13.

Two other highly sensitive derived markers of OSA are sleep- and sleep-stage-fragmentation from Eqs. 10 and 12, referring to probabilities of transitions between wakefulness and sleep, and of switching from one non-W stage to another, respectively. The effect on sleep-fragmentation was significant across all groups except O1-M and peaked at 192.33 (5.66)% for A1-O3-F. The sleep-stage-fragmentation was increased in all groups, peaking at 174.94 (10.42)% in A1-O3-F. The sleep-stage-fragmentation marker is elaborated in depth in Supplementary Figures 14 and 15, for F and M, respectively.

The increased fragmentation is reflected in decreased sleep- and sleep-stage-compactness from Eqs. 9 and 11, referring to staying in uninterrupted sleep and in the same sleep-stage, respectively. Reduced sleep-compactness, down to 88.42 (-9.65)% in A3-O3-F, seems specific to F, suggesting that they experience more frequent apnea-related arousals than M. The sleep-stage-compactness was reduced in all categories of F, down to 76.77 (-16.54)% in A3-O3. This decrease, however, was not present for A3-M and A2-O1-M.
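The derived markers are sums over selected entries of $\textbf{P}$. The sketch below shows one plausible reading of the verbal definitions given above; the exact index sets of Eqs. 9 to 12 are not reproduced in this section, so the groupings used in `derived_markers` are assumptions, and the transition counts are toy values.

```python
STAGES = ["W", "N1", "N2", "N3", "REM"]
SLEEP = [s for s in STAGES if s != "W"]

# Toy transition counts (illustrative only), converted to proportions in %.
counts = {("W", "W"): 1, ("W", "N1"): 1, ("N1", "N2"): 1, ("N2", "N2"): 2,
          ("N2", "N3"): 1, ("N3", "N3"): 1, ("N3", "N2"): 1, ("N2", "REM"): 1,
          ("REM", "REM"): 1, ("REM", "W"): 1}
total = sum(counts.values())
P = {a: {b: 100.0 * counts.get((a, b), 0) / total for b in STAGES} for a in STAGES}

def derived_markers(P):
    """Aggregate entries of P into four markers (assumed index sets, see lead-in)."""
    sleep_frag = sum(P["W"][s] + P[s]["W"] for s in SLEEP)   # W <-> sleep switches
    sleep_comp = sum(P[a][b] for a in SLEEP for b in SLEEP)  # sleep stays sleep
    stage_comp = sum(P[s][s] for s in SLEEP)                 # same non-W stage kept
    stage_frag = sleep_comp - stage_comp                     # non-W stage changes
    return {"sleep_fragmentation": sleep_frag, "sleep_compactness": sleep_comp,
            "stage_compactness": stage_comp, "stage_fragmentation": stage_frag}

markers = derived_markers(P)
print(markers)
```

Under this reading, sleep-fragmentation, sleep-compactness, and the $\text{W} \rightarrow \text{W}$ proportion add up to 100%, which makes explicit why an increase in fragmentation must be offset by a decrease elsewhere.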

The reduced stage-specific-compactness metrics (e.g., REM $\rightarrow$ REM) were already elaborated in the section on $\textbf{P}$-specific transition %’s. Yet, the stage-specific-fragmentation markers (Eq. 13) show significant alterations due to OSA across almost all demographic groups. The only gender-specific difference can be observed in wake-fragmentation, which is increased in all cases of F (likely due to more frequent awakenings), up to 192.1 (2.77)% in A1-O3, but not for O1- and A3-O2-M. The fragmentation related to non-REM (N1, N2, N3) stages increased in all OSA and demographic groups, ranging from 118.29 (1.18)% in N1-fragmentation in A1-O1-M to 178.61 (4.05)% in A1-O3-F. The most pronounced effects were visible in REM-fragmentation, up to 219.51 (2.32)% in A1-O3-F, corresponding to more than twice as many transitions leaving REM sleep.

Markovian transition matrix $\textbf{P}^M$ derived from $\textbf{P}$

Finally, we present the main findings based on $\textbf{P}^M$, derived from $\textbf{P}$ through row normalization as shown in Eq. 14. While $\textbf{P}$ quantifies the overall probabilities (%) of the 25 sleep-stage transitions, $\textbf{P}^M$ conditions on the presence of a specific stage, summing to 100% per row. Therefore, whereas $\textbf{P}$ evaluates the overall chances of observing specific transitions in the hypnogram during the night (e.g., 36.4% of $\text {N2} \rightarrow \text {N2}$ in healthy A1-F), $\textbf{P}^M$ evaluates the distribution of the next sleep stage given the current stage (e.g., 84.3% to stay in N2 in healthy A1-F), offering another perspective on the underlying mechanisms of sleep dynamics. The heatmap in Fig. 5 depicts how individual transitions of $\textbf{P}^M$ (Eq. 14) are altered by specific OSA conditions across different ages and genders. Detailed results on expected transition probabilities of $\textbf{P}^M$, CATE, and RR-CATE for comparisons of OSA vs healthy are provided as heatmaps in Supplementary Figures 7-9 and 10-12 for F and M, respectively.
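The row normalization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `row_normalise`, the stage ordering, and the toy matrix are assumptions made only for illustration, and the numbers are not study data.

```python
STAGES = ["W", "N1", "N2", "N3", "REM"]  # assumed row/column order

def row_normalise(P):
    """Turn joint transition proportions P into conditional probabilities P^M.

    P is a 5x5 nested list whose entries sum to 1 over the whole matrix.
    Each returned row sums to 1 (next-stage distribution given the current stage).
    """
    PM = []
    for row in P:
        total = sum(row)
        if total > 0:
            PM.append([value / total for value in row])
        else:
            # A stage that never occurs has no defined next-stage distribution.
            PM.append([float("nan")] * len(row))
    return PM

# Toy matrix (illustrative only): rows = current stage, columns = next stage.
P_toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],  # W
    [0.0, 0.0, 0.1, 0.0, 0.0],  # N1
    [0.0, 0.0, 0.2, 0.1, 0.1],  # N2
    [0.0, 0.0, 0.1, 0.1, 0.0],  # N3
    [0.1, 0.0, 0.0, 0.0, 0.1],  # REM
]
PM_toy = row_normalise(P_toy)
```

In the toy matrix, N2 is the current stage in 40% of transitions and N2 $\rightarrow$ N2 accounts for 20%, so the conditional probability of staying in N2 is 50%. By the same arithmetic, the quoted values of 36.4% in $\textbf{P}$ and 84.3% in $\textbf{P}^M$ for healthy A1-F would imply that N2 is the current stage in roughly 43% of transitions.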

W-transitions: Despite increased occurrences of $\textbf{P}$-transitions from W in F, the respective $\textbf{P}^M$-dynamic was not significantly altered, indicating that the mechanism of the W-transitions remains similar to healthy subjects, but those transitions tend to occur more often. This suggests that for OSA-F, the overall increased W% is the main trigger of the W-related transitions in $\textbf{P}$. Conversely, M exhibit increased $\text {W} \rightarrow \text {(N2, N3, REM)}$ transitions, up to 250.5 (1.3)% in A3-O3 for $\text {W} \rightarrow \text {N2}$, across all ages and OSA severities, suggesting an increased sleep pressure due to its disruption induced by apneic events.

N1-transitions: Both genders showed increased $\text {N1} \rightarrow \text {N3}$, up to 169.4 (0.9)% in A3-O1-F. Only F experience increased $\text {N1} \rightarrow \text {W}$, up to 156.5 (4.7)% in A3-O1, and decreased $\text {N1} \rightarrow \text {N1}$, as low as 76.5 (-10.3)% in A1-O1. Increased $\text {N1} \rightarrow \text {REM}$ transitions were present in all F, up to 201.1 (3.4)% in A3-O1, but only in some of the O1-M, up to 122.7 (1.1)% in A3.

N2-transitions: All groups have decreased $\text {N2} \rightarrow \text {N2}$, down to 88.4 (-9.8)% in A1-O3-F, and, except for A1-O3-F, significantly increased $\text {N2} \rightarrow \text {N3}$ transitions, up to 145.6 (2.2)% in A3-O3-F. All F groups have increased $\text {N2} \rightarrow \text {W}$ transitions, up to 177.6 (1.6)% in A3-O3-F, which is also present in all O3-M. $\text {N2} \rightarrow \text {N1}$ increased for all O2 and O3 groups, up to 179.8 (4.2)% in A3-O3-F, and $\text {N2} \rightarrow \text {REM}$ increased for all O3.

N3-transitions: Across all groups, the N3 dynamic had significantly increased transitions into REM, peaking at 293.1 (2.1)% in A3-O3-F, pointing to an almost three times higher occurrence of these atypical transitions in OSA. Additionally, decreased $\text {N3} \rightarrow \text {N3}$, as low as 77.9 (-18.0)% in A1-O3-M, and increased $\text {N3} \rightarrow \text {N1}$, up to 316.8 (2.5)% in A3-O3-F, were noted for all except O1-M. Transitions $\text {N3} \rightarrow \text {W}$ increased in all (A2, A3)-F, up to 214.1 (2.6)% in A3, and only in O3-M of the same demographics.

REM-transitions: The most prominent effects of OSA are visible in changed REM dynamics. The decrease in $\text {REM} \rightarrow \text {REM}$ in both genders of all ages, down to 77.1 (-20.4)% in A3-O3-F, is compensated by increased transitions into all NREM-stages, up to 345.8 (5.9)% in $\text {REM} \rightarrow \text {N1}$ for A1-O3-F. The increased $\text {REM} \rightarrow \text {W}$ is present in all F, up to 254.8 (5.3)% in A1-O3-F. For M, these transitions are increased only partially, for all O3 and A3-O2, up to 180.0 (2.8)% in A3-O3.

Stage-survival: Finally, following Eq. 15, the diagonal elements of $\textbf{P}^M$ (i.e., probabilities of $\text {W} \rightarrow \text {W}$, $\text {N1} \rightarrow \text {N1}$, etc.) simplistically approximate the average expected duration of individual sleep stages, bridging transition dynamics with investigations modelling the sleep-bout durations. Here, naturally, significantly decreased probabilities of staying in a given stage introduced above are equivalent to significantly decreased stage durations.
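To make the link between diagonal elements and stage duration concrete, the following sketch assumes a first-order, time-homogeneous Markov chain, under which the length of a bout in a stage is geometrically distributed. This is a textbook simplification and is not necessarily the exact form of Eq. 15; the function names are hypothetical.

```python
def expected_bout_epochs(p_stay):
    """Mean bout length in epochs when the stage is kept with probability p_stay."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must lie in [0, 1)")
    return 1.0 / (1.0 - p_stay)

def survival_curve(p_stay, max_epochs):
    """Probability that a bout lasts at least k epochs, for k = 1..max_epochs."""
    return [p_stay ** (k - 1) for k in range(1, max_epochs + 1)]

# Example with the N2 self-transition probability quoted for healthy A1-F (84.3%).
mean_epochs = expected_bout_epochs(0.843)
mean_minutes = mean_epochs * 0.5  # assumes 30-second epochs
```

Under these assumptions, a probability of 84.3% of staying in N2 corresponds to a mean bout of about 6.4 epochs, or roughly 3.2 minutes, and any decrease in a diagonal element of $\textbf{P}^M$ shortens the expected bout accordingly.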

Predictive performance of $\textbf{P}$ markers on external data

The previous sections focused on quantifying the effects of OSA, specifically explaining how OSA impacts sleep dynamics and its markers. To illustrate the informativeness of these markers, we developed a simple logistic regression model for each of the 25 transition proportions in $\textbf{P}$. The binary outcome variable was defined as moderate sleep-disordered breathing, indicated by AHI > 15, and the predictions were based on three predictors: age, gender, and the percentage of a specific transition. The inclusion of age and gender was motivated by the observed heterogeneity of OSA effects with respect to these factors. Each of these logistic regression models, trained on the study dataset (BSDB), was used to make predictions on the SHHS1 and SHHS2 subsets of SHHS, which contain observational data on subjects who underwent baseline PSG (SHHS1) and follow-up PSG several years later (SHHS2, N = 2621).

Table 1 AUROC with 95% CI for predicting moderate sleep-disordered breathing (AHI > 15) using individual sleep-stage transition proportions. Results are shown for SHHS1 (Sleep Heart Health Study baseline) and SHHS2 (follow-up), where N is the number of subjects after applying the exclusion criteria. Mean ± SD (standard deviation) summarizes performance across all transitions. An asterisk (*) denotes significant predictive power (AUROC > 50%).

The results in Table 1 indicate that each transition proportion included in the simple predictive model demonstrated significant predictive power, as all AUROCs and their confidence intervals were much greater than 50%. The average AUROC performance across proportions was 66.77% for SHHS1 and 65.98% for SHHS2, with standard deviations of 1.79% and 1.82%, respectively, highlighting practical equivalence in generalizations between SHHS1 and SHHS2. This robustness is particularly notable given that (i) we used a simple logistic regression model that assumes monotonic effects of individual predictors and no interactions between them, (ii) we predicted moderate sleep-disordered-breathing (AHI > 15) using only age, gender, and the percentage of a single transition, (iii) the models were trained on a relatively small dataset of 622 patients, and (iv) the generalization was performed from the clinical population of BSDB to the broader public represented by SHHS. All AUROCs are very similar, which can be explained by the fact that all transitions in $\textbf{P}$ are interdependent and numerically share related information.
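For readers less familiar with the metric, the sketch below shows one standard way to compute an AUROC from predicted scores: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. It covers only the evaluation step, not the logistic regression fit or the confidence intervals, and the function name and toy values are assumptions rather than study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes (1 = AHI > 15 in the setting described above).
    scores: predicted probabilities or any monotone risk score.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))

# Toy example (illustrative only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs are correctly ordered
```

An AUROC of 50% corresponds to random ordering, which is why the significance criterion in Table 1 is stated relative to that value.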

Interactive R shiny app

The results presented above focused on three categories of age (30, 50, 70 years), three OSA severities (mild, moderate, severe), and both genders, considering a case without sleep comorbidities. For a deeper exploration of our findings, the volume of which is beyond the scope of this paper, we created a freely accessible app (https://mystatsapps.shinyapps.io/Causal_Sleep_Dynamics/) that interactively displays results for arbitrary values of predictors. As input, the user specifies the transition(s) of interest by selecting some of the 25 (5$\times$5) dimensions, together with age, OSA severity (AHI), and the presence of comorbidities (as indicated in Eq. 30). Additionally, the user chooses whether CATE and RR-CATE should be displayed as a function of age or AHI (= CATE-variable).

As an output, the app displays a total of six panels. The most important one, Effects of OSA, displays expected probabilities (%) of selected transitions for healthy vs OSA together with corresponding CATE and RR-CATE. All these outputs are supplemented by 95% CI and are depicted for selected age (range 0-100 years) or AHI (range 5-100), and both genders.

The Percentual Transition Matrix and Markovian Transition Matrix tabs show the expected matrix of sleep-stage transitions $\textbf{P}$ and the derived row-normalized $\textbf{P}^M$ for healthy and OSA subjects of both genders and specified characteristics. In addition, each tab shows matrices of CATE and RR-CATE depicted as heatmaps supplemented with 95% CI.

The Dirichlet Regression Coefficients tab summarizes regression coefficients as presented in Supplementary Table 1. The dimensions of specified transitions of interest from the input are highlighted.

The Marginal Effects of All Predictors tab approximates Eq. 18 by calculating the difference in the outcome for a row-indicated change in a predictor’s value. The marginal effects, supplemented with 95% CI, are shown for four baselines, (healthy, OSA) $\times$ (female, male), with the characteristics specified in the input. Because a marginal effect depends in a complex way on all Dirichlet dimensions, its value changes with the values of the predictors (c.f., Eq. 18). Hence, the marginal effects can be particularly useful for understanding the interplay between different levels of demographics, OSA severity, and particularly their interactions with comorbidities, which have so far been understudied.

Finally, the Sleep Stage Survival tab depicts survival curves of individual sleep stages, based on the diagonal elements of $\textbf{P}^M$ and Eq. 15. Notably, as this quantity is based on the whole-night $\textbf{P}^M$, the survival curves illustrate the overall average duration of individual stages.

Discussion

Sleep is a complex phenomenon whose finest mechanisms are yet to be fully deciphered. Scoring sleep into a hypnogram of five sleep-wake stages translates it into a simplified, human-readable code, enabling the calculation of PSG markers and their interpretation by clinical personnel. Currently, likely due to non-standardized methodologies and reliance on aggregate counts or summary indices, the representation of sleep-stage dynamics in clinical PSG reports remains limited^1,44. Although such markers are reported, they often lack normative values and standardized interpretation guidelines, which may limit their full clinical potential. Yet, existing studies provide strong evidence that more granular characteristics of sleep-stage transitions^{14,15,16,17,18,19,20,21,22,23} or sleep-stage duration/survival^{24,25,26,27,28,29,30} can be specific to various sleep conditions and age. For clinical, economic, and ethical reasons, most of the related research has in common that PSG data were collected in a non-randomised way and analysed retrospectively, and are hence subject to considerable confounding³³. A minority of studies investigating sleep dynamics addressed confounding either by analyzing subjects with restricted demographic ranges (e.g.,^14,15,24), or by selecting typically age- and gender-matched controls (e.g.,^16,18,20,30). This may limit the generalizability of the findings or fail to capture age- and gender-specific phenotypes.

By exploiting techniques of causal inference (IPW-balancing from Eq. 25; S-Learner from Eq. 30), our study presents a novel and highly flexible approach to jointly (i) quantify sleep-stage dynamics, (ii) quantify the effect of a disorder, and (iii) derive several established as well as novel digital markers of sleep. We demonstrate our approach on OSA, one of the most prevalent sleep conditions and a significant risk factor, evidenced to impact sleep macro-structure and dynamics^{19,22,25,26,27,28}.

Working with the observational BSDB database, we initially balanced the dataset using IPW-reweighting and addressed the confounding of age and gender, whose distributions differed between healthy and OSA-affected subjects. Without this step, it would be challenging to separate the effects of demographics (e.g., of ageing) from those of OSA, since its prevalence and severity increase with age²⁸. To quantify sleep-stage dynamics, we proposed to exploit the matrix $\textbf{P}$ (Eq. 1), consisting of 25 (5 $\times$ 5) interdependent transition proportions. Because $\textbf{P}$ is flexible enough to quantify the dynamics, the derived markers, and the Markovian $\textbf{P}^M$, we suggest considering it as a simple digital sleep-fingerprint. All dimensions of $\textbf{P}$ were modelled jointly as an outcome of Dirichlet regression (Eq. 17, 30), respecting their compositional nature (summing to 100%) and allowing their straightforward aggregation to derive many established and novel PSG markers (c.f., Eqs. 3-13). In contrast, analyzing dependent outcomes, e.g., % of sleep stages and their transitions, separately, such as using (M)ANOVA²², would lead to biases and disregard constraints on value ranges and cumulative sums. Including age and gender as predictors allowed the outcome model (Eq. 30) to adapt to nonlinear changes in sleep due to ageing and to quantify possible gender phenotypes^2,4,5. Most importantly, the inclusion of the OSA indicator followed the causal S-learner framework⁴⁰, allowing direct quantification of OSA effects in terms of CATE and RR-CATE (c.f., Eqs. 27, 28) by comparing expected outcomes for healthy individuals of given demographics with those of hypothetically matched OSA subjects of specified OSA-severity (AHI). Our modelling approach avoids discretization of age and AHI, and hence allows quantification of personalized (up to OSA-severity and demographics) effects/markers, aligning closely with the needs of precision medicine.
Even so, it is important to recall that the BSDB dataset contained only 2 cases of pediatric OSA (age < 18 years); therefore, the conclusions should be generalized to the pediatric OSA population with care. Uniquely, the richness of BSDB allowed us to account for interactions between OSA and several other sleep comorbidities, a clinically well-known and relevant fact (c.f., ^{45,46,47,48,49}) that has so far been either overlooked (e.g.,^19,25), acknowledged but not handled (c.f., ²⁸), or sidestepped by analyzing only subjects with no sleep comorbidities (e.g., ^22,26). With 48.6% of OSA subjects in our observational dataset having at least one additional sleep comorbidity, addressing these interactions is crucial for reducing bias and accurately separating the impact of OSA from that of other conditions.

The estimated outcome model provides three main dimensions of our results. First, the quantification of the sleep fingerprint $\textbf{P}$ provides information on the % of time spent in individual transitions and on the compactness of sleep stages. Several transitions were significantly increased by OSA for all demographics and AHI-severity groups: W $\rightarrow$ (N2, N3), N1 $\rightarrow$ N3, N3 $\rightarrow$ (N1, REM), and REM $\rightarrow$ (N1, N3), all peaking with RR-CATE >200%. Despite their rare presence in healthy subjects, our findings suggest they may be a sensitive marker of OSA. In addition, all OSA-F had significantly increased (N1, N2, N3, REM) $\rightarrow$ W, W $\rightarrow$ REM, N1 $\rightarrow$ REM, REM $\rightarrow$ (W, N2), and decreased REM $\rightarrow$ REM, suggesting their higher vulnerability to awakenings and REM-disruptions in comparison to M, for whom these effects were observed only partially. This finding may also be linked to REM-OSA being more likely in F⁵⁰. These results suggest that female OSA patients may experience subtler forms of sleep disruption, such as increased REM instability and awakenings, which could contribute to the under-recognition of OSA burden in women if relying solely on oxygen desaturation metrics. Secondly, by aggregating dimensions of $\textbf{P}$, one can derive standard PSG markers (e.g., % of sleep-stages) and many novel proposed ones that may be specific to particular conditions. For all demographic and AHI groups, OSA significantly increased NREM-REM oscillations (c.f., Eq. 7), overall sleep-stage fragmentation (c.f., Eq. 12), and (N1, N2, N3, REM)-specific fragmentations (c.f., Eq. 13). In addition, sleep-, light-sleep-, and deep-sleep-awakenings (c.f., Eqs. 3-5) were all increased for all moderate and severe-OSA groups.
Finally, row-normalizing $\textbf{P}$ yields the Markovian $\textbf{P}^M$, which quantifies the probabilistic distribution of the next stage given the current stage, thus investigating deeper dynamic mechanisms. For all age and AHI groups, OSA increased N1 $\rightarrow$ N3, N3 $\rightarrow$ REM, REM $\rightarrow$ (N1, N2, N3), and decreased REM $\rightarrow$ REM and N2 $\rightarrow$ N2. All moderate and severe OSA groups also had increased N3 $\rightarrow$ N1 and decreased N3 $\rightarrow$ N3. For all OSA-M, an additional increase in W $\rightarrow$ (N2, N3, REM) was observed, and for all OSA-F an increase in N1 $\rightarrow$ (W, REM) and (N2, REM) $\rightarrow$ W and a decrease in N1 $\rightarrow$ N1. Furthermore, we demonstrated that $\textbf{P}^M$ can also be used to model sleep-stage survival (Eq. 15), bridging the two principal directions of sleep dynamics research: sleep-transitions^{14,15,16,17,18,19,20,21,22,23} and sleep-stage bout duration quantification^{24,25,26,27,28,29,30}. A merit of stage-survival analysis is that it allows evaluating the functional form of the bout-duration distribution, whose statistical properties provide insights into the underlying mechanisms.

To underscore the diagnostic utility of our findings, we evaluated the predictive power of individual transition proportions in $\textbf{P}$ on external data from SHHS, containing a broad population of subjects from the general public. For each transition, we developed a simple logistic regression model using age, gender, and the specific transition percentage as predictors, and assessed its ability to identify moderate sleep-disordered breathing (AHI > 15). Results showed significant predictive utility across all 25 transitions, with all AUROC values exceeding 50% (range of 59.82-69.28), and their average of 66.77% for SHHS1 and 65.98% for SHHS2, with respective standard deviations of 1.79% and 1.82%. This robust performance highlights the generalizability of the derived markers from the BSDB dataset to a broader population while confirming the informativeness of individual transitions as predictors of sleep-disordered breathing. Higher predictive performance can be expected when including additional predictors not reflected in $\textbf{P}$ (e.g., total-sleep-time, sleep-latency), their interactions with specific proportion, using all proportions jointly, or using a more complex predictive model than logistic regression.

In summary, our findings from different perspectives confirm that OSA is associated with reduced continuity of N2, N3, and REM sleep, reflected by increased sleep fragmentation^{19,22,25,26,27,28} at both the conventional sleep-to-wake level and in the proposed markers of stage-specific dynamics. By exploiting the matrices $\textbf{P}$ and $\textbf{P}^M$, we identified OSA-specific transitions contributing to these alterations, particularly atypical transitions from light to deep sleep and oscillations between N3 and REM. These transitions, though rare in healthy individuals, may serve as sensitive markers of OSA, possibly reflecting compensatory mechanisms where the body attempts to regain restorative states, either after their frequent disruption by apneic events or following long-term accumulation of disrupted and unrefreshing sleep. These findings contribute to growing evidence that OSA phenotypes vary across demographic groups and may benefit from personalized clinical interpretation. Our results suggest that females with OSA exhibit increased REM-stage instability and a higher frequency of awakenings from multiple sleep stages, despite often presenting with milder oxygen-desaturation patterns that may elude detection by AHI alone when compared to males^51,52,53. This aligns with prior reports of REM- or arousal-dominant OSA profiles in women, often accompanied by insomnia-like symptoms^2,4. By quantifying stage-specific dynamics, our framework may support more refined diagnostic stratification and treatment decisions, especially in subgroups historically underrepresented or mischaracterized by standard PSG indices.
For instance, women with OSA, who often present with insomnia-like symptoms and have a higher risk of comorbid depression, may benefit from personalized treatment approaches that contextualize available markers, such as REM-related instability, frequent awakenings, or shorter apneas⁵³, to more precisely identify the plausible contributing factors, be it OSA, insomnia, depression, or others. This may guide the use of CPAP together with cognitive behavioral therapy for insomnia (CBT-I), or the consideration of oral appliance therapy (OAT) in milder-AHI cases^52,53. The results of our work are also available as an interactive app, allowing in-depth exploration of results and proposed markers for arbitrary demographics, OSA severity, and their interactions with other sleep comorbidities.

Our approach to supporting diagnostics has broader applicability beyond the OSA use case, as sleep dynamics and their markers can be specific to other sleep disorders, such as narcolepsy, insomnia, periodic limb movement disorder, and others. With the rise of telemedicine and the increasing use of wearables, investigating sleep dynamics and its markers could become a valuable screening tool for assessing the risk of psychiatric (e.g., depression, schizophrenia) and neurodegenerative disorders (e.g., Parkinson’s disease, Alzheimer’s disease), which are evidenced to be associated with disrupted sleep^54,55,56. Even though consumer devices provide lower-quality signals and hypnograms than clinical PSG, adapting and re-estimating our approach on these data still has great potential to provide valuable insights. Furthermore, with advances in automatic sleep-scoring tools that offer hypnodensity beyond the standard hypnogram⁵⁷, our framework could enhance the understanding of sleep micro-events and more granular sleep dynamics if hypnograms scored on windows shorter than 30 seconds were used as data for our model.

Limitations & future work

Our future work will extend our approach to address several of its limitations. Following the ideas of Schlemmer et al. (2015)²¹, we aim to extend it to second-order sleep-stage transitions, which would require quantifying a 125 (= 5 $\times$ 25) dimensional transition cube. Next, we plan to account for time spent asleep and investigate dynamics at different times of the night. Currently, we have focused on transitions aggregated over the entire sleep period, but recognizing the non-stationary nature of sleep offers opportunities for identifying even more specific markers. This would also concern the quantification of sleep-stage survival or duration, which our current work approximated by an overall night expectation. Additionally, we plan to consider whether the subject’s apnea events are REM- or NREM-dominant, which may reveal additional phenotypes, and to assess whether the proposed markers can also help distinguish obstructive sleep apnea (OSA) from central sleep apnea (CSA). Further, we plan to investigate in greater detail the interaction of OSA with comorbidities, which can already be explored in our app. Finally, our current framework captures sleep dynamics at the macro-structural (sleep-stage) level, relying on stage annotations from standard hypnograms¹. It does not directly account for EEG-based micro-events such as brief arousals or microstructural instability captured by the Cyclic Alternating Pattern^{58,59,60,61,62}, which has shown clinical relevance in characterizing sleep instability, evaluating treatment response (e.g., to CPAP), and supporting diagnostics. With the rise of automatic sleep-scoring algorithms that produce hypnodensity outputs⁵⁷ (i.e., probabilistic stage predictions at sub-30-second resolution), there is growing potential to adapt our framework to this finer temporal scale. The domains of sleep-stage and micro-structural dynamics can now be seen as complementary, and our future work will aim to bridge them.

Methods

This section details the study data, introduces the matrix of sleep-stage transition proportions as a foundational digital marker, and explores its properties alongside several novel sleep markers. Additionally, we outline the technical framework, which leverages causal inference tools to minimize bias in the conclusions of this observational study, and present a use case examining the effects of OSA.

Data

Berner sleep data base (BSDB)

For the primary evaluations of our study (such as estimating the effects), we exploited the clinical Berner Sleep Data Base (BSDB) from Inselspital, University Hospital Bern. We considered a subset of 62 healthy subjects (aged 0-71 years), in whom existing clinical conditions had been excluded and who underwent PSG as controls in several historical studies, and a total of 560 individuals (aged 2-81 years, including 2 pediatric cases aged < 18 years) having OSA as one of their conclusive diagnoses, made by physicians considering all test-based diagnoses (e.g., actigraphy- or PSG-based), clinical anamnesis, and the context. The PSG signals were recorded at 200 Hz and scored manually according to the AASM rules¹. To align older recordings scored by the Rechtschaffen and Kales⁶³ rules with the AASM standard, N3 and N4 stages were merged into N3. To prevent bias due to possibly longer sleep onset in the unfamiliar clinical setting, the part of the PSG recording and hypnogram before the first sleep epoch was cut off. Further, we excluded recordings with total sleep time <180 minutes, with >5% of the time with lights on, or with no sleep-stage transitions, as well as subjects with breath control or ventilation therapy introduced, or undergoing split-night PSG evaluations. For the basic statistical description of BSDB in Table 2, we considered 3 groups of OSA subjects: mild (O1) with AHI $\in [5,15)$, moderate (O2) with AHI $\in [15,30)$, and severe (O3) with AHI $\ge 30$.

Table 2 Comparison of demographics, sleep metrics, and prevalence of sleep comorbidities among healthy and (mild, moderate, severe) OSA subjects in the BSDB dataset. Variables denoted with $^*$ are binary, summarized as count (percentage), N (%), and significantly different pairs are listed, following a significant chi-squared independence test and pairwise posthoc proportions test. Healthy subjects were excluded from comorbidities comparisons as they had no comorbidities. Variables denoted with $^\dagger$ are continuous, summarized as mean (standard deviation), $\mu (\sigma )$, and significant pairs are listed following a significant Kruskal–Wallis test and pairwise Wilcoxon posthoc test. All posthoc pairwise comparisons were performed with Bonferroni corrections at the significance level of 0.05.

Most sleep metrics and demographics differ significantly between healthy individuals and OSA groups, as well as across different OSA severity levels. There is a clear trend of increasing age and % of males from healthy to more severe OSA, which is also associated with changes in sleep architecture, such as decreased sleep efficiency and reduced N3 and REM %. Separating the effects of these demographic shifts from the effects of OSA is a key challenge, which we address below using causal inference.

Ethics approval and consent

The secondary usage of the dataset was approved by the local ethics committee (Kantonale Ethikkommission Bern [KEK]-Nr. 2022-00415), ensuring compliance with the Human Research Act (HRA) and Ordinance on Human Research with the Exception of Clinical Trials (HRO). All methods were carried out in accordance with relevant guidelines and regulations. Written informed consent was obtained from all participants, as part of the general consent process introduced at Inselspital in 2015. Data were maintained with confidentiality throughout the study.

Sleep heart health study (SHHS)

The Sleep Heart Health Study (SHHS) is a large, multi-centre cohort study designed to investigate the relationship between sleep-disordered breathing and cardiovascular outcomes^64,65. SHHS1 includes baseline polysomnography (PSG) data collected from 5804 unique subjects aged 39-90 years, while SHHS2 provides follow-up PSG data for 2651 subjects aged 44-90 years. Following the same criteria as in BSDB, we included only subjects with total sleep time (TST) > 180 minutes. After this selection, SHHS1 retained 5734 subjects (mean age 63.1 ± 11.2 years, 47.6% male), and SHHS2 included 2621 subjects (mean age 67.5 ± 10.3 years, 46.1% male).

SHHS1 and SHHS2 were utilized to independently evaluate the predictive power of individual sleep-stage transition proportions, forming the foundation for deriving novel sleep markers, in identifying subjects with sleep-disordered breathing. These analyses provide robust external validation of the effectiveness of these transition proportions in the predictive task, which underscores their clinical relevance.

For both BSDB and SHHS datasets, the definition of the Apnea-Hypopnea Index (AHI) used aligns with the National Sleep Research Resource (NSRR) harmonization⁶⁵: AHI = (All apneas + hypopneas with $\ge$30% nasal cannula [or alternative sensor] reduction and $\ge$3% oxygen desaturation or with arousal) per hour of sleep, which follows clinical guidelines¹.

Matrix $\textbf{P}$ of sleep-stage transition proportions: a basic sleep marker

Our framework proposes the use of a flexible digital marker, a sleep fingerprint, that, based on the observed sleep stages of a subject, enables the derivation of both established and novel PSG parameters, quantifying various sleep characteristics that may be specific to different sleep conditions. The basis for achieving this is the hypnogram, which represents the sequence of sleep-wake stages (W, N1, N2, N3, REM) throughout the night. While sleep dynamics in clinical PSG reports are currently limited to the total counts of transitions and awakenings, this can be easily extended by the $5 \times 5$ matrix of sleep-stage transition proportions $\textbf{P}$. Let us denote the total number of epochs in the patient’s hypnogram (starting from sleep-onset) as $N^E$, and the number of transitions from stage i to j as $N^{ij}$. Each cell $p_{ij}$ of $\textbf{P}$ can then be expressed as:

$$\begin{aligned} p_{ij} = \frac{N^{ij}}{N^E} = P(\text {next stage = { j}, current stage = { i}}) = P(i \rightarrow j) \hspace{5mm} \forall i, j \in \{\text {W, N1, N2, N3, REM} \}, \end{aligned}$$

(1)

indicating the empirical probability (proportion, %) of observing a transition from stage i to j $(i \rightarrow j)$, relative to all the transitions observed in the hypnogram. In the following, we highlight three main dimensions of the clinical relevance of $\textbf{P}$.
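As a minimal sketch of Eq. 1, the matrix can be computed by counting consecutive epoch pairs in a hypnogram. The function name `transition_proportions` and the toy hypnogram are assumptions for illustration, not the authors' code or data. One further assumption is labeled in the code: the counts are divided by the number of observed epoch pairs (one fewer than the number of epochs), so that the 25 proportions sum exactly to 1.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]

def transition_proportions(hypnogram):
    """Return P as a dict {(i, j): proportion of i -> j transitions}.

    hypnogram: sequence of stage labels, one per 30-second epoch,
    already trimmed so that it starts at sleep onset.
    """
    pairs = list(zip(hypnogram[:-1], hypnogram[1:]))
    if not pairs:
        raise ValueError("need at least two epochs to observe a transition")
    counts = Counter(pairs)
    total = len(pairs)  # assumption: normalise by the number of epoch pairs
    return {(i, j): counts[(i, j)] / total for i in STAGES for j in STAGES}

# Toy hypnogram (illustrative only, not study data).
toy_hypnogram = ["N1", "N2", "N2", "N2", "N3", "N3", "N2", "REM", "REM", "W", "N1"]
P = transition_proportions(toy_hypnogram)
```

In this toy sequence there are 10 epoch pairs, two of which are N2 $\rightarrow$ N2, so the corresponding cell equals 0.2. Self-transitions (staying in a stage) are counted in the same way as changes of stage.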

$\textbf{P}$ recovers the majority of clinically established PSG markers

For example, summing up the column transition proportions of $\textbf{P}$ yields the overall percentage of sleep stages:

$$\begin{aligned} \text {stage { j} \%} = p_{*,j} = \sum _{i \in \{\text {W, N1, N2, N3, REM}\}} p_{i,j} \quad \forall j \in \{\text {W, N1, N2, N3, REM}\}. \end{aligned}$$

(2)

In addition, other PSG markers commonly used in clinical practice can be easily derived by considering relevant proportions and the Total Sleep Time (TST), $\text {TST} = \frac{N^E}{2}$, in minutes. For example, Sleep Efficiency (SE), quantifying the percentage of sleep after its onset, can be calculated as $\text {SE} = \sum _{j \in \{\text {N1, N2, N3, REM}\}} p_{*,j} = 1 - p_{*,W}$. The Wake After Sleep Onset (WASO) minutes can be computed as $\text {WASO} = \frac{N^E}{2} p_{*,W}$. The Number of Awakenings (NoA) can be determined by $\text {NoA} = N^E \sum _{i \in \{\text {N1, N2, N3, REM}\}} p_{i,W}$. Finally, the Number of Transitions (NoT) between different stages is given by the off-diagonal mass of $\textbf{P}$, $\text {NoT} = N^E \left( 1 - \sum _{i \in \{\text {W, N1, N2, N3, REM}\}} p_{i,i} \right)$.
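The relations above can be sketched in a few lines of Python. The helper names, the toy run-length hypnogram, and the use of the transition count as the epoch count are assumptions made for illustration; NoT is computed here as the number of stage-changing transitions, i.e. the off-diagonal mass of $\textbf{P}$ scaled by the epoch count.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]
SLEEP = ["N1", "N2", "N3", "REM"]


def transition_proportions(hypnogram):
    pairs = list(zip(hypnogram[:-1], hypnogram[1:]))
    counts = Counter(pairs)
    return {(i, j): counts[(i, j)] / len(pairs) for i in STAGES for j in STAGES}


def clinical_markers(P, n_epochs, epoch_minutes=0.5):
    """Recover standard PSG summaries from P and the epoch count."""
    stage_pct = {j: sum(P[(i, j)] for i in STAGES) for j in STAGES}  # Eq. 2
    return {
        "stage_pct": stage_pct,
        "SE": 1.0 - stage_pct["W"],
        "WASO_min": n_epochs * epoch_minutes * stage_pct["W"],
        "NoA": n_epochs * sum(P[(i, "W")] for i in SLEEP),
        # Stage-changing transitions are the off-diagonal mass of P.
        "NoT": n_epochs * (1.0 - sum(P[(i, i)] for i in STAGES)),
    }


# Toy hypnogram (hypothetical) given as run lengths: 41 epochs, 40 transitions.
runs = [("N1", 2), ("N2", 10), ("N3", 8), ("N2", 4), ("REM", 6),
        ("W", 2), ("N1", 1), ("N2", 7), ("W", 1)]
hypnogram = [stage for stage, length in runs for _ in range(length)]
P = transition_proportions(hypnogram)
markers = clinical_markers(P, n_epochs=len(hypnogram) - 1)
print(markers)
```

In this toy night the sleeper wakes twice and changes stage eight times, which the sketch recovers from $\textbf{P}$ alone.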

$\textbf{P}$ allows derivation of novel PSG markers

The aggregation of $\textbf{P}$-dimensions offers great flexibility to derive several novel and highly intuitive digital markers of sleep and its dynamics. Considering a set of sleep-states, $\mathcal {S} = \{ \text {N1, N2, N3, REM} \}$, we propose the following markers, which are also evaluated in the results.

Total Awakenings, the probability of transitioning from any sleep-state ($\mathcal {S}$) to wakefulness:

$$\begin{aligned} P(\mathcal {S} \rightarrow \text {W}) = \sum _{i \in \mathcal {S}} p_{i,W} = p_{\text {N1,W}} + p_{\text {N2,W}} + p_{\text {N3,W}} + p_{\text {REM,W}}, \end{aligned}$$

(3)

Light-sleep Awakenings, the probability of transitioning from light sleep (N1, N2) to wakefulness:

$$\begin{aligned} P(\text {Light-sleep} \rightarrow \text {W}) = p_{\text {N1,W}} + p_{\text {N2,W}}, \end{aligned}$$

(4)

Deep-sleep Awakenings, the probability of transitioning from deep sleep (N3) to wakefulness:

$$\begin{aligned} P(\text {N3} \rightarrow \text {W}) = p_{\text {N3,W}}, \end{aligned}$$

(5)

REM Awakenings, the probability of transitioning from REM sleep to wakefulness:

$$\begin{aligned} P(\text {REM} \rightarrow \text {W}) = p_{\text {REM,W}}, \end{aligned}$$

(6)

NREM-REM Oscillations, sum of probabilities for transitions between NREM sleep stages and REM sleep:

$$\begin{aligned} P(\text {NREM} \rightleftarrows \text {REM}) = \sum _{i \in \{\text {N1, N2, N3}\}} (p_{i,\text {REM}} + p_{\text {REM},i}), \end{aligned}$$

(7)

Light-sleep Oscillations, sum of probabilities for transitions between the light sleep stages (N1 and N2):

$$\begin{aligned} P(\text {N1} \rightleftarrows \text {N2}) = p_{N1, N2} + p_{N2, N1}, \end{aligned}$$

(8)

Sleep Compactness, the total probability of staying within any (non-wake) sleep stages:

$$\begin{aligned} P(\text {Sleep Compactness}) = \sum _{(i,j) \in \mathcal {S} \times \mathcal {S}} p_{i,j}, \end{aligned}$$

(9)

Sleep Fragmentation, the total probability of switching between wakefulness and sleep states:

$$\begin{aligned} P(\text {Sleep Fragmentation}) = \sum _{\begin{array}{c} i \in \mathcal {S} \end{array}} (p_{W,i} + p_{i,W}), \end{aligned}$$

(10)

Sleep-stage Compactness, the sum of probabilities of staying within the same (non-wake) sleep stages:

$$\begin{aligned} P(\text {Sleep-stage Compactness}) = \sum _{i \in \mathcal {S}} p_{i,i}, \end{aligned}$$

(11)

Sleep-stage Fragmentation, the probability of transitioning from one (non-wake) sleep stage to a different one:

$$\begin{aligned} P(\text {Sleep-stage Fragmentation}) = \sum _{\begin{array}{c} (i,j) \in \mathcal {S} \times \mathcal {S} \\ i \ne j \end{array}} p_{i,j} \end{aligned}$$

(12)

Stage-specific Compactness and Fragmentation, for each stage i, the probability of staying in the same stage and the probability of switching to any other stage, respectively:

$$\begin{aligned} P(i\text {-th stage Compactness}) = p_{i,i}, \quad P(i\text {-th stage Fragmentation}) = \sum _{\begin{array}{c} j: i \ne j \end{array}} p_{i,j} \quad \forall i \in \{\text {W, N1, N2, N3, REM}\} \end{aligned}$$

(13)

Each metric from Eqs. 3–13 expands the standard clinical PSG markers and focuses on a specific sleep pattern. Their quantification requires no additional effort once the subject has undergone the PSG study and the hypnogram is available.
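The markers of Eqs. 3–13 reduce to sums over cells of $\textbf{P}$, as the following sketch shows. The function and key names are hypothetical, the toy hypnogram is invented, and NREM-REM oscillations are counted in both directions, in line with the verbal definition.

```python
from collections import Counter
from itertools import product

STAGES = ["W", "N1", "N2", "N3", "REM"]
SLEEP = ["N1", "N2", "N3", "REM"]
NREM = ["N1", "N2", "N3"]


def transition_proportions(hypnogram):
    pairs = list(zip(hypnogram[:-1], hypnogram[1:]))
    counts = Counter(pairs)
    return {(i, j): counts[(i, j)] / len(pairs) for i in STAGES for j in STAGES}


def novel_markers(P):
    S = SLEEP
    m = {
        "total_awakenings": sum(P[(i, "W")] for i in S),                        # Eq. 3
        "light_sleep_awakenings": P[("N1", "W")] + P[("N2", "W")],              # Eq. 4
        "deep_sleep_awakenings": P[("N3", "W")],                                # Eq. 5
        "rem_awakenings": P[("REM", "W")],                                      # Eq. 6
        "nrem_rem_oscillations": sum(P[(i, "REM")] + P[("REM", i)] for i in NREM),  # Eq. 7
        "light_sleep_oscillations": P[("N1", "N2")] + P[("N2", "N1")],          # Eq. 8
        "sleep_compactness": sum(P[c] for c in product(S, S)),                  # Eq. 9
        "sleep_fragmentation": sum(P[("W", i)] + P[(i, "W")] for i in S),       # Eq. 10
        "sleep_stage_compactness": sum(P[(i, i)] for i in S),                   # Eq. 11
        "sleep_stage_fragmentation": sum(P[(i, j)] for i, j in product(S, S) if i != j),  # Eq. 12
    }
    for i in STAGES:                                                            # Eq. 13
        m[i + "_compactness"] = P[(i, i)]
        m[i + "_fragmentation"] = sum(P[(i, j)] for j in STAGES if j != i)
    return m


# Toy hypnogram (hypothetical) given as run lengths.
runs = [("N1", 2), ("N2", 10), ("N3", 8), ("N2", 4), ("REM", 6),
        ("W", 2), ("N1", 1), ("N2", 7), ("W", 1)]
hypnogram = [stage for stage, length in runs for _ in range(length)]
markers = novel_markers(transition_proportions(hypnogram))
for name, value in markers.items():
    print(f"{name}: {value:.3f}")
```

Two identities serve as a sanity check: Sleep Compactness equals Sleep-stage Compactness plus Sleep-stage Fragmentation, and Sleep Compactness, Sleep Fragmentation and $p_{W,W}$ together sum to 1.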

$\textbf{P}$ bridges stage-transitions and durations-oriented sleep dynamics research

Normalizing $\textbf{P}$ so that each row sums to 1 (100%) yields a standard transition matrix, often utilized in Markovian models. We denote this matrix as $\textbf{P}^{M}$, where M indicates it is Markovian. Each cell, $p_{i,j}^M$, corresponds to the conditional probability of transitioning to stage j after being in stage i:

$$\begin{aligned} p_{i,j}^M = P(\text {next stage = { j} | current stage = { i}}) = \frac{p_{i,j}}{p_{i,*}} = \frac{p_{i,j}}{ \sum _{k \in \{\text {W, N1, N2, N3, REM}\}} p_{i,k}} \hspace{5mm} \forall i, j \in \{\text {W, N1, N2, N3, REM}\}. \end{aligned}$$

(14)

The key difference is that while $\textbf{P}$ provides an overall view of the plausibility of individual transitions, $\textbf{P}^{M}$ operates under the assumption that a given state has occurred and probabilistically evaluates the chances of (not-)switching the sleep-stage in the next epoch. $\textbf{P}$ and $\textbf{P}^{M}$ are interconnected, offering two perspectives on sleep-stage dynamics. Notably, the diagonal elements of $\textbf{P}^{M}$ enable straightforward quantification of the sleep-stage durations, as they are exponentially distributed, $\mathcal {E}(\lambda ) = \mathcal {E}(1 - p_{i,i}^M)$, with the expected duration (ED) of each stage (over entire night):

$$\begin{aligned} \text {ED}_i = E(\text {duration of stage { i}}) = \frac{1}{\lambda } = \frac{1}{1 - p_{i,i}^M}\quad \forall i \in \{\text {W, N1, N2, N3, REM}\}, \end{aligned}$$

(15)

known as the mean sojourn time. Due to the scoring of sleep in 30-second windows, these durations are measured in epochs.
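A short sketch of Eqs. 14 and 15 follows. The helper names and the toy hypnogram are assumptions; stages that never occur as a source, or that are never left, are returned as `None` because the expected duration is undefined for them.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]


def transition_proportions(hypnogram):
    pairs = list(zip(hypnogram[:-1], hypnogram[1:]))
    counts = Counter(pairs)
    return {(i, j): counts[(i, j)] / len(pairs) for i in STAGES for j in STAGES}


def markov_matrix(P):
    """Row-normalize P (Eq. 14); rows of unvisited stages are left as None."""
    PM = {}
    for i in STAGES:
        row_total = sum(P[(i, j)] for j in STAGES)
        for j in STAGES:
            PM[(i, j)] = P[(i, j)] / row_total if row_total > 0 else None
    return PM


def expected_durations(PM):
    """Mean sojourn time per stage, in epochs (Eq. 15)."""
    durations = {}
    for i in STAGES:
        stay = PM[(i, i)]
        undefined = stay is None or stay >= 1.0
        durations[i] = None if undefined else 1.0 / (1.0 - stay)
    return durations


# Toy hypnogram (hypothetical): N2 occurs in runs of 10, 4 and 7 epochs.
runs = [("N1", 2), ("N2", 10), ("N3", 8), ("N2", 4), ("REM", 6),
        ("W", 2), ("N1", 1), ("N2", 7), ("W", 1)]
hypnogram = [stage for stage, length in runs for _ in range(length)]
PM = markov_matrix(transition_proportions(hypnogram))
ED = expected_durations(PM)
print({stage: ED[stage] for stage in STAGES})
```

For N2 the sketch returns about 7 epochs (3.5 minutes), which coincides with the average length of the three N2 runs in the toy hypnogram.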

Causal framework to quantify sleep-stage transition matrix $\textbf{P}$ and effects of a disorder

The preceding sections have highlighted the utility of investigating the matrix $\textbf{P}$ as a sleep-fingerprint, showing its relation to several clinically established PSG markers and its connection between stage-transition and stage-duration sleep dynamics research. Moreover, we introduced several novel markers derived from $\textbf{P}$. To quantify $\textbf{P}$ and the derived markers, the next sections will present an approach that combines Dirichlet regression, well-suited for the compositional data of $\textbf{P}$, with elements of causal inference to address confounding. The key challenge in modeling $\textbf{P}$ lies in respecting the compositional nature of the data, where the total of all percentages must sum to 100%. Ignoring this constraint, such as analyzing particular proportions separately with ANOVA, can lead to significant bias and counterintuitive outcomes. This issue is evident in some meta-analyses where, for example, aggregated percentages of sleep stages do not sum to 100%, as seen in Table 2 of ⁷. This challenge must be addressed when modeling the proportions of sleep-stage transitions in $\textbf{P}$, which involve 25 compositional dimensions. Ensuring the outcomes are intuitive and correct is crucial for enabling their interpretation by medical professionals.

Dirichlet regression: model formulation and properties

The Dirichlet distribution is well-suited for modeling compositional data, such as percentages or the elements of $\textbf{P}$. For a random variable $Y = (Y_1, Y_2,..., Y_D)$ representing proportions over D dimensions, the probability density function of the Dirichlet distribution is parameterized by a vector of positive reals $\alpha = (\alpha _1,...,\alpha _D)$ and given by:

$$\begin{aligned} Dir(Y; \alpha ) = \frac{1}{B(\alpha )} \prod _{d=1}^D Y_d^{\alpha _d - 1}, \end{aligned}$$

(16)

where $B(\alpha )$ is the multivariate beta function ensuring normalization³⁹. In Dirichlet regression, the logarithms of $\alpha$ are modeled as functions of covariates, adapting the distribution’s characteristics based on predictor values:

$$\begin{aligned} \log (\alpha _d) = \beta _{d0} + \beta _{d1}X_1 + \cdots + \beta _{dK}X_K, \end{aligned}$$

(17)

where $X = (X_1,...,X_K)$ is a set of K covariates and $\beta _d = (\beta _{d0},...,\beta _{dK})$ a vector of regression coefficients for the d-th dimension. The expectation of each component $Y_d$, $E[Y_d]$, and the marginal effect of $X_k$ on $E[Y_d]$, $\frac{\partial E[Y_d]}{\partial X_k}$, are directly influenced by all elements of X and $\alpha$, reflecting the interdependencies of compositional data:

$$\begin{aligned} E[Y_d]&= \frac{\alpha _d}{\sum _{j=1}^D \alpha _j} = \frac{\exp (\beta _{d0} + \beta _{d1}X_1 + \cdots + \beta _{dK}X_K)}{\sum _{j=1}^D \exp (\beta _{j0} + \beta _{j1}X_1 + \cdots + \beta _{jK}X_K)}, \nonumber \\ \frac{\partial E[Y_d]}{\partial X_k}&= E[Y_d] \left( \beta _{dk} - \sum _{j=1}^D \beta _{jk} E[Y_j] \right) . \end{aligned}$$

(18)
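Eq. 18 can be checked numerically with a small sketch. The coefficient values below are hypothetical and only three dimensions are used; the analytic marginal effect is compared against a finite-difference approximation.

```python
import math


def expected(betas, x):
    """E[Y_d] for every dimension d under the log-linear model of Eq. 17."""
    alpha = [math.exp(b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))) for b in betas]
    total = sum(alpha)
    return [a / total for a in alpha]


def marginal_effect(betas, x, k):
    """dE[Y_d]/dX_k for every d (Eq. 18); k is the 1-based covariate index."""
    e = expected(betas, x)
    weighted_beta = sum(b[k] * ej for b, ej in zip(betas, e))
    return [ed * (b[k] - weighted_beta) for b, ed in zip(betas, e)]


# Hypothetical coefficients: D = 3 dimensions, intercept plus K = 2 covariates.
betas = [[0.5, 0.2, -0.1],
         [1.0, -0.3, 0.4],
         [0.0, 0.1, 0.2]]
x = [1.0, 0.5]

analytic = marginal_effect(betas, x, k=1)

# Central finite difference in the first covariate as a numerical check.
h = 1e-6
up = expected(betas, [x[0] + h, x[1]])
down = expected(betas, [x[0] - h, x[1]])
numeric = [(u - d) / (2 * h) for u, d in zip(up, down)]

print("analytic:", analytic)
print("numeric: ", numeric)
```

Because the expected proportions always sum to 1, the marginal effects across dimensions sum to 0: an increase in one proportion is necessarily offset by decreases elsewhere.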

A convenient property of the Dirichlet distribution is its ability to aggregate over several dimensions, allowing flexible quantification of measures based on the elements’ summation. For example, aggregating dimensions i and j yields:

$$\begin{aligned} Y' = (Y_1,..., Y_i + Y_j,...,Y_D) \sim Dir(Y'; (\alpha _1,..., \alpha _i + \alpha _j,...,\alpha _D)). \end{aligned}$$

(19)

Thus, Dirichlet regression is suitable for modelling $\textbf{P}$, and its aggregation property facilitates straightforward quantification of all markers derived from it (c.f., Eqs. 2–13).
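The aggregation property of Eq. 19 can be illustrated with a small simulation. The concentration values are hypothetical, and the sketch relies on the standard fact that the sum of a subset of Dirichlet components follows a Beta distribution whose parameters are the summed concentrations inside and outside the subset.

```python
import random


def aggregate_alpha(alpha, group):
    """Beta parameters (a, b) for the sum of the components listed in `group`."""
    a = sum(alpha[d] for d in group)
    return a, sum(alpha) - a


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet vector by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


alpha = [2.0, 3.0, 5.0, 10.0]   # hypothetical concentration parameters
group = [0, 1]                  # dimensions aggregated into one marker

a, b = aggregate_alpha(alpha, group)
analytic_mean = a / (a + b)

rng = random.Random(0)
draws = [sum(sample_dirichlet(alpha, rng)[d] for d in group) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)

print("analytic mean:", analytic_mean)
print("Monte Carlo mean:", round(mc_mean, 4))
```

In the same way, a marker defined as a sum over cells of $\textbf{P}$ can be quantified directly from the fitted concentration parameters, without refitting a separate model per marker.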

Causal elements

In contrast to randomized experiments, the analysis of observational data, such as those from PSG databases, is susceptible to confounding, due to varying distributions of characteristics (e.g., age), between treated/exposed/conditioned and healthy-control subjects. Our study, which aims to quantify changes in sleep parameters resulting from a sleep disorder, adopts the principles and standard notation of causal inference⁴¹. We define the treatment/exposure/condition variable T as an indicator of whether a subject suffers from a particular disorder of interest ($T = 1$), or is a healthy control ($T = 0$). In line with the language of causal inference, the treatment within our study corresponds to the presence of OSA. The outcome (Y) represents the sleep parameter investigated, such as $\textbf{P}$, while subject characteristics and potential confounders are denoted as X.

Potential outcomes framework and causal estimands. The potential outcomes framework assigns to each individual two hypothetical outcomes: Y(1), under $T = 1$, and Y(0), without exposure, $T = 0$. The Individual Treatment Effect (ITE), $\tau _i$, is the difference between these outcomes, evaluating the causal effect of exposure (e.g., OSA) on subject’s outcome (e.g., sleep):

$$\begin{aligned} \text {ITE} = \tau _i = Y_i(1) - Y_i(0). \end{aligned}$$

(20)

The Average Treatment Effect (ATE) is the expected ITE, assessing the effect of T across the entire population:

$$\begin{aligned} \text {ATE} = E[\tau ] = E[Y(1) - Y(0)]. \end{aligned}$$

(21)

The Conditional Average Treatment Effect (CATE), $\tau (x)$, is the treatment effect within a specific subgroup of the population characterized by covariates $X = x$, making it suitable to quantify personalized markers for different conditions:

$$\begin{aligned} \text {CATE}(x) = \tau (x) = E[Y(1) - Y(0) \mid X = x]. \end{aligned}$$

(22)

The fundamental problem of causal inference is that only one of the two potential outcomes is observed for each individual, according to their treatment/exposure assignment $T_i$:

$$\begin{aligned} Y^{obs}_i = Y_i(T_i) = T_iY_i(1) + (1 - T_i)Y_i(0), \end{aligned}$$

(23)

making it impossible to directly calculate all hypothetical estimands (ITE/ATE/CATE) from observed data $(Y^{obs}_i, T_i, X_i)$.

Personalized markers using CATE estimates. To estimate (C)ATE from observational data, advanced techniques are required to adjust for confounders and mimic a randomized experiment setting. One method exploits Propensity Scores (PS):

$$\begin{aligned} \pi (X_i) = P(T = 1 | X_i), \end{aligned}$$

(24)

assessing the probability of receiving treatment given the individual’s characteristics X. Adjusting for PS removes biases associated with included covariates ³⁶. In addition, by assuming positivity (i.e., all confounder values can be observed in both treated and controls) and no unobserved confounders, the treatment and potential outcomes become independent conditional on $\pi (X_i)$, $T \perp Y(0), Y(1) | \pi (X)$, allowing straightforward effect estimation by matching or regressing the outcome on PS ³⁷.

Another approach, Inverse Probability Weighting (IPW), balances the distribution of X across treated and controls by creating a pseudo-population where each original subject is re-weighted using weights:

$$\begin{aligned} w_i = \frac{T_i}{\pi (X_i)} + \frac{1 - T_i}{1 - \pi (X_i)}. \end{aligned}$$

(25)

The weights can be, for example, incorporated into flexible, even machine-learning-based, outcome models (e.g., weighted regression) to estimate the treatment effect while mitigating selection bias ³⁸.
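The re-weighting of Eq. 25 can be sketched on a tiny hypothetical sample in which older subjects are more likely to be exposed. The data, propensity values and helper names are invented for illustration only.

```python
def ipw_weights(treated, propensity):
    """Inverse probability weights of Eq. 25."""
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity must lie strictly between 0 and 1 (positivity)")
        weights.append(t / p + (1 - t) / (1.0 - p))
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 3 of 4 older subjects are exposed, 1 of 4 younger ones.
treated = [1, 1, 1, 0, 1, 0, 0, 0]
age = [70, 70, 70, 70, 40, 40, 40, 40]
propensity = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]
w = ipw_weights(treated, propensity)


def group_mean_age(t, weights):
    idx = [k for k, tk in enumerate(treated) if tk == t]
    return weighted_mean([age[k] for k in idx], [weights[k] for k in idx])


unweighted = [1.0] * len(age)
print("raw mean age:     ", group_mean_age(1, unweighted), group_mean_age(0, unweighted))
print("weighted mean age:", group_mean_age(1, w), group_mean_age(0, w))
```

Before weighting, the exposed group is on average 15 years older than the controls; after weighting, both groups have the same mean age, which is the balance the pseudo-population is meant to achieve.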

In our study, focusing on quantifying the effects of OSA ($T = 1$) on $\textbf{P}$, we employ IPW within the S-learner framework ⁴⁰. The S-learner is a baseline approach of meta-learners, enabling flexible estimation of heterogeneous CATE. The S-learner quantifies the outcome using a single model (hence S-Learner), including the treatment indicator T as one of its predictors:

$$\begin{aligned} \mu (x,t) = E[Y^{obs}|X = x, T = t], \end{aligned}$$

(26)

allowing straightforward estimation of CATE from Eq. 22 that is easily extrapolated over the entire range of X:

$$\begin{aligned} \widehat{\text {CATE}}(x) = \hat{\mu }(x,1) - \hat{\mu }(x,0). \end{aligned}$$

(27)

For probabilistic outcomes, the Risk-Ratio CATE (RR-CATE) is preferred as it naturally compares the chances of an event:

$$\begin{aligned} \hat{\text {RR-CATE}}(x) = \frac{\hat{\mu }(x,1)}{\hat{\mu }(x,0)}. \end{aligned}$$

(28)

One of the key benefits of the S-learner is its simplicity in extrapolating the (RR-)CATE estimates over and beyond the observed values of X. Unlike other meta-learners (e.g., T- or X-learner ⁴⁰) that fit separate response functions for exposed ($T=1$) and control ($T=0$) subjects, the S-learner estimates a single model and thus requires less data, while assuming that the effects of the other (non-treatment) variables are shared across both groups.
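Once a single outcome model is available, Eqs. 27 and 28 amount to evaluating it twice per subject profile. In the sketch below, `mu_hat` is a hypothetical stand-in for a fitted model (a logistic curve with invented coefficients), not the Dirichlet regression used in the study.

```python
import math


def mu_hat(x, t):
    """Hypothetical fitted S-learner: expected proportion for covariate x
    (age in decades above 50) and exposure indicator t."""
    eta = -3.0 + 0.2 * x + 0.5 * t + 0.1 * x * t
    return 1.0 / (1.0 + math.exp(-eta))


def cate(mu, x):
    """Difference of predictions under exposure and control (Eq. 27)."""
    return mu(x, 1) - mu(x, 0)


def rr_cate(mu, x):
    """Ratio of predictions under exposure and control (Eq. 28)."""
    return mu(x, 1) / mu(x, 0)


# Evaluate over a grid of covariate values, including extrapolated ones.
for x in [-2.0, 0.0, 2.0, 4.0]:
    print(f"x={x:+.1f}  CATE={cate(mu_hat, x):.4f}  RR-CATE={rr_cate(mu_hat, x):.3f}")
```

Because the treatment indicator is just another input of the single model, the same two function calls work for any covariate profile, which is what makes the extrapolation over X straightforward.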

Practical considerations. Care must be taken in interpreting causal effects due to assumptions underlying PS (and so IPW), such as no unobserved confounders and positivity. These assumptions are challenging to validate rigorously. In summary, addressing confounding is better than ignoring it, but interpretations should consider the assumptions made.

Study use case: effects of OSA on sleep-stage transitions matrix $\textbf{P}$ and derived markers

The practical part of our study links the proposed sleep fingerprint $\textbf{P}$ (c.f. Eq. 1) and derived markers (c.f., Eqs. 2–13 and 14) to a causal framework for their efficient quantification and estimation of the disorder effect. We demonstrate our approach on OSA, one of the most prevalent sleep disorders and a significant risk factor, and use the study dataset from the BSDB.

To model the PS from Eq. 24, we applied logistic regression with the confounders most frequently reported in the literature: age and gender. Both factors are known to affect the risk of OSA and, at the same time, their value ranges overlap between OSA and healthy subjects, thus meeting the positivity assumption. The PS model included separate predictors for the scaled age above 50 years in decades ($X_{(\text {Age} - 50)/10}$), a gender indicator ($\mathbb {I}_{\text {male}}$), and their interaction:

$$\begin{aligned} \pi (X) = P(\text {OSA} = 1 \mid X) = \frac{1}{1 + e^{-(\beta _0 + \beta _1 \mathbb {I}_{\text {male}} + \beta _2 X_{(\text {Age} - 50)/10} + \beta _3 \mathbb {I}_{\text {male}} \times X_{(\text {Age} - 50)/10})}}. \end{aligned}$$

(29)

The IPW weights from Eq. 25 were then used to balance the data with respect to these shared main confounders.
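The functional form of Eq. 29 and the resulting weight can be sketched as follows. The coefficient values are hypothetical placeholders, not estimates from the BSDB.

```python
import math


def propensity(age_years, is_male, beta):
    """P(OSA = 1 | age, gender) following the form of Eq. 29."""
    b0, b1, b2, b3 = beta
    age_dec = (age_years - 50.0) / 10.0   # decades above 50 years
    eta = b0 + b1 * is_male + b2 * age_dec + b3 * is_male * age_dec
    return 1.0 / (1.0 + math.exp(-eta))


def ipw_weight(has_osa, ps):
    """Weight of Eq. 25 for a single subject."""
    return has_osa / ps + (1 - has_osa) / (1.0 - ps)


beta = (0.0, 0.8, 0.4, -0.1)   # hypothetical coefficients (b0, b1, b2, b3)

for age, male in [(50, 0), (50, 1), (70, 0), (70, 1)]:
    ps = propensity(age, male, beta)
    print(f"age={age} male={male} PS={ps:.3f} "
          f"w(OSA)={ipw_weight(1, ps):.2f} w(healthy)={ipw_weight(0, ps):.2f}")
```

Centering age at 50 years makes the intercept interpretable as the log-odds of OSA for a 50-year-old female, which equals even odds under the placeholder values used here.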

To estimate the effects, i.e., (RR)-CATE from Eqs. 27, 28, of OSA on the compositional outcome of $\textbf{P}$, the Dirichlet regression, as introduced in Eqs. 16, 17, was exploited to model the response within the S-learner framework from Eq. 26. Each of the 25 possible transition proportions captured in $\textbf{P}$ and indexed as $(i,j)\quad \forall i, j \in \{\text {W, N1, N2, N3, REM} \}$, was modelled using the predictor specific for the corresponding dimension characterized by $\alpha _{(i,j)}$:

$$\begin{aligned} \begin{aligned} \log (\alpha _{(i,j)}) = \beta _{(i,j),0} + \beta _{(i,j),1} {\mathbb {I}}_{\text {male}} + \beta _{(i,j),2} X_{(\text {Age} - 50)/10} + \beta _{(i,j),3} {\mathbb {I}}_{\text {OSA}} + \beta _{(i,j),4} ({\mathbb {I}}_{\text {OSA}} \times {\mathbb {I}}_{\text {male}}) + \\ \beta _{(i,j),5} ({\mathbb {I}}_{\text {OSA}} \times X_{(\text {AHI} - 5)/10}) + \beta _{(i,j),6} ({\mathbb {I}}_{\text {OSA}} \times {\mathbb {I}}_{\text {Insomnia\_Com}}) + \beta _{(i,j),7} ({\mathbb {I}}_{\text {OSA}} \times {\mathbb {I}}_{\text {NT1\_Com}}) + \\ \beta _{(i,j),8} ({\mathbb {I}}_{\text {OSA}} \times {\mathbb {I}}_{\text {OtherHyp\_Com}}) + \beta _{(i,j),9} ({\mathbb {I}}_{\text {OSA}} \times {\mathbb {I}}_{\text {Parasomnia\_Com}}) + \beta _{(i,j),10} ({\mathbb {I}}_{\text {OSA}} \times {\mathbb {I}}_{\text {Movement\_Com}}). \end{aligned} \end{aligned}$$

(30)

This log-transformed $\alpha _{(i,j)}$ was regressed on several covariates and interaction terms with a primary goal to separate and quantify the effect of OSA, present as an indicator variable $\mathbb {I}_{\text {OSA}}$. Although this S-learner model was estimated on IPW-balanced data (c.f., Eq. 29), the inclusion of age and gender was justified by the necessary adjustment due to their known influence on sleep manifestation. Next, the interaction of OSA with gender was also included, to investigate potential gender-specific phenotypes. In addition, several variables that violate the positivity assumption were included, as they could not be utilized within the PS model due to their disjoint distributions among healthy and OSA subjects. These comprised the interaction of OSA with the scaled Apnea-Hypopnea Index (AHI), $X_{(\text {AHI} - 5)/10}$, which expresses the AHI in excess of 5 in units of ten events per hour and captures the apnea severity as the number of complete or partial breath-arrests per hour. Uniquely, our model adjusts for a comprehensive range of comorbidities present as indicator variables: insomnia ($\mathbb {I}_{\text {Insomnia\_Com}}$), Narcolepsy Type 1 (NT1, $\mathbb {I}_{\text {NT1\_Com}}$), other hypersomnolence except NT1 ($\mathbb {I}_{\text {OtherHyp\_Com}}$), parasomnias ($\mathbb {I}_{\text {Parasomnia\_Com}}$), and movement-related sleep-disorders ($\mathbb {I}_{\text {Movement\_Com}}$). The distributions of AHI and of all the comorbidities are completely disjoint between the groups, as healthy subjects do not suffer from any disorder/comorbidity and AHI values in OSA subjects are always greater than 5.
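To make the structure of Eq. 30 concrete, the sketch below builds the covariate vector for one subject profile with and without OSA and compares the expected proportions. All coefficients are invented, and only three of the 25 cells of $\textbf{P}$ are included to keep the example short, so the resulting numbers have no clinical meaning.

```python
import math

COMORBIDITIES = ["Insomnia", "NT1", "OtherHyp", "Parasomnia", "Movement"]


def design_vector(is_male, age_years, osa, ahi=5.0, comorbid=()):
    """Covariates of Eq. 30: intercept, male, scaled age, OSA, OSA x male,
    OSA x scaled AHI, and OSA x each comorbidity indicator."""
    age_dec = (age_years - 50.0) / 10.0
    ahi_sc = (ahi - 5.0) / 10.0
    base = [1.0, float(is_male), age_dec, float(osa),
            float(osa * is_male), osa * ahi_sc]
    return base + [float(bool(osa) and name in comorbid) for name in COMORBIDITIES]


def expected_proportions(betas, x):
    """Dirichlet mean: alpha of each cell divided by the sum of all alphas."""
    alpha = {cell: math.exp(sum(b * v for b, v in zip(coefs, x)))
             for cell, coefs in betas.items()}
    total = sum(alpha.values())
    return {cell: a / total for cell, a in alpha.items()}


# Hypothetical coefficients for a reduced three-cell composition.
betas = {
    ("N2", "N2"): [3.0, 0.0, -0.05, -0.1, 0.0, -0.05, 0, 0, 0, 0, 0],
    ("N2", "W"):  [0.0, 0.1, 0.10, 0.3, -0.1, 0.15, 0.2, 0, 0, 0, 0],
    ("W", "N2"):  [0.0, 0.1, 0.10, 0.3, -0.1, 0.15, 0.2, 0, 0, 0, 0],
}

# 60-year-old male with AHI 25 and no comorbidity, versus the same profile
# without OSA (all OSA interaction terms vanish in the counterfactual).
x1 = design_vector(is_male=1, age_years=60, osa=1, ahi=25.0)
x0 = design_vector(is_male=1, age_years=60, osa=0)
e1, e0 = expected_proportions(betas, x1), expected_proportions(betas, x0)
rr = {cell: e1[cell] / e0[cell] for cell in betas}
print(rr)
```

Setting the OSA indicator to zero switches off every severity and comorbidity term at once, which is how a healthy counterfactual is obtained from the single model.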

To assess uncertainty and calculate confidence intervals (CI) in all strands of our investigations, including the PS model, IPW-balanced S-learner with Dirichlet regression, and subsequent quantification of $\textbf{P}$-derived markers using (RR)-CATE, we implemented a non-parametric bootstrap procedure with 200 repetitions, inspired by⁶⁶.
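A non-parametric bootstrap of this kind can be sketched as follows. The percentile interval, the toy data and the function name are assumptions; in the study the whole pipeline (PS model, weighting and Dirichlet regression) would be refitted on every resample, whereas the sketch only recomputes a simple statistic.

```python
import random


def bootstrap_ci(data, statistic, n_boot=200, level=0.95, seed=0):
    """Percentile confidence interval from subject-level resampling."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower_idx = round((1.0 - level) / 2.0 * n_boot)
    upper_idx = round((1.0 + level) / 2.0 * n_boot) - 1
    return estimates[lower_idx], estimates[upper_idx]


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-subject marker values (e.g., a transition proportion).
marker = [0.020 + 0.001 * k for k in range(30)]

low, high = bootstrap_ci(marker, mean, n_boot=200)
print(f"mean={mean(marker):.4f}  95% CI=({low:.4f}, {high:.4f})")
```

Resampling whole subjects rather than individual epochs keeps each hypnogram intact, so the within-night dependence of sleep stages does not distort the interval.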

Data Availability

The datasets analyzed in this study are subject to different accessibility conditions: Bern sleep data base (BSDB): Due to patient confidentiality and ethical restrictions, the BSDB dataset is not publicly available. However, de-identified data may be obtained from the corresponding author upon reasonable request, subject to data sharing agreement and approval by the relevant ethics committees. Sleep heart health study (SHHS): This dataset is publicly accessible through the National Sleep Research Resource (NSRR) at https://sleepdata.org/datasets/shhs. Researchers can request access by creating an NSRR account and agreeing to the data use terms and conditions.

References

Berry, R. B. et al. AASM scoring manual version 2.2 updates: new chapters for scoring infant sleep staging and home sleep apnea testing (2015).
Redline, S. et al. The effects of age, sex, ethnicity, and sleep-disordered breathing on sleep architecture. Arch. Intern. Med. 164, 406–418 (2004).
Carskadon, M. A. et al. Normal human sleep: An overview. Prin. Pract. Sleep Med. 4, 13–23 (2005).
Sahlin, C., Franklin, K. A., Stenlund, H. & Lindberg, E. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Med. 10, 1025–1030 (2009).
Luca, G. et al. Age and gender variations of sleep in subjects without sleep disorders. Ann. Med. 47, 482–491 (2015).
Ohayon, M. M., Carskadon, M. A., Guilleminault, C. & Vitiello, M. V. Meta-analysis of quantitative sleep parameters from childhood to old age in healthy individuals: developing normative sleep values across the human lifespan. Sleep 27, 1255–1273 (2004).
Boulos, M. I. et al. Normal polysomnography parameters in healthy adults: a systematic review and meta-analysis. Lancet Respir. Med. 7, 533–543 (2019).
Egger, M., Schneider, M. & Smith, G. D. Meta-analysis spurious precision? Meta-analysis of observational studies. BMJ 316, 140–144 (1998).
Cochran, W. G. & Rubin, D. B. Controlling bias in observational studies: A review. Sankhyā Indian J. Stat. Ser. A 417–446 (1973).
Penzel, T. et al. Analysis of sleep fragmentation and sleep structure in patients with sleep apnea and normal volunteers. In 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, 2591–2594 (IEEE, 2006).
Kimoff, R. J. Sleep fragmentation in obstructive sleep apnea. Sleep 19, S61–S66 (1996).
Andlauer, O. et al. Nocturnal rapid eye movement sleep latency for identifying patients with narcolepsy/hypocretin deficiency. JAMA Neurol. 70, 891–902 (2013).
Hermans, L. W. et al. Representations of temporal sleep dynamics: Review and synthesis of the literature. Sleep Med. Rev. 63, 101611 (2022).
Kemp, B. & Kamphuisen, H. A. Simulation of human hypnograms using a markov chain model. Sleep 9, 405–414 (1986).
Yassouridis, A., Steiger, A., Klinger, A. & Fahrmeir, L. Modelling and exploring human sleep with event history analysis. J. Sleep Res. 8, 25–36 (1999).
Burns, J. W., Crofford, L. J. & Chervin, R. D. Sleep stage dynamics in fibromyalgia patients and controls. Sleep Med. 9, 689–696 (2008).
Laffan, A., Caffo, B., Swihart, B. J. & Punjabi, N. M. Utility of sleep stage transitions in assessing sleep continuity. Sleep 33, 1681–1686 (2010).
Kishi, A., Struzik, Z. R., Natelson, B. H., Togo, F. & Yamamoto, Y. Dynamics of sleep stage transitions in healthy humans and patients with chronic fatigue syndrome. Am. J. Physiol. Regul. Integr. Comp. Physiol. 294, R1980–R1987 (2008).
Kim, J., Lee, J.-S., Robinson, P. & Jeong, D.-U. Markov analysis of sleep dynamics. Phys. Rev. Lett. 102, 178104 (2009).
Wei, Y. et al. Sleep stage transition dynamics reveal specific stage 2 vulnerability in insomnia. Sleep 40, zsx117 (2017).
Schlemmer, A., Parlitz, U., Luther, S., Wessel, N. & Penzel, T. Changes of sleep-stage transitions due to ageing and sleep disorder. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 373, 20140093 (2015).
Wächter, M. et al. Unique sleep-stage transitions determined by obstructive sleep apnea severity, age and gender. J. Sleep Res. 29, e12895 (2020).
Yetton, B. D., McDevitt, E. A., Cellini, N., Shelton, C. & Mednick, S. C. Quantifying sleep architecture dynamics and individual differences using big data and Bayesian networks. PLoS ONE 13, e0194604 (2018).
Lo, C.-C. et al. Dynamics of sleep-wake transitions during sleep. Europhys. Lett. 57, 625 (2002).
Penzel, T., Kantelhardt, J. W., Lo, C.-C., Voigt, K. & Vogelmeier, C. Dynamics of heart rate and sleep stages in normals and patients with sleep apnea. Neuropsychopharmacology 28, S48–S53 (2003).
Norman, R. G., Scott, M. A., Ayappa, I., Walsleben, J. A. & Rapoport, D. M. Sleep continuity measured by survival curve analysis. Sleep 29, 1625–1631 (2006).
Chervin, R. D., Fetterolf, J. L., Ruzicka, D. L., Thelen, B. J. & Burns, J. W. Sleep stage dynamics differ between children with and without obstructive sleep apnea. Sleep 32, 1325–1332 (2009).
Bianchi, M. T., Cash, S. S., Mietus, J., Peng, C.-K. & Thomas, R. Obstructive sleep apnea alters sleep stage transition dynamics. PLoS ONE 5, e11356 (2010).
Klerman, E. B. et al. Survival analysis indicates that age-related decline in sleep continuity occurs exclusively during nrem sleep. Neurobiol. Aging 34, 309–318 (2013).
Kishi, A. et al. Sleep stage dynamics in young patients with sleep bruxism. Sleep 43, zsz202 (2020).
Jackson, C. Multi-state models for panel data: The msm package for r. J. Stat. Softw. 38, 1–28 (2011).
Kalbfleisch, J. & Lawless, J. F. The analysis of panel data under a markov assumption. J. Am. Stat. Assoc. 80, 863–871 (1985).
Ellenberg, J. H. Selection bias in observational and experimental studies. Stat. Med. 13, 557–567 (1994).
Rubin, D. B. The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics 185–203 (1973).
Senaratna, C. V. et al. Prevalence of obstructive sleep apnea in the general population: A systematic review. Sleep Med. Rev. 34, 70–81 (2017).
Rosenbaum, P. R. & Rubin, D. B. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983).
Hirano, K. & Imbens, G. W. Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Serv. Outcomes Res. Method. 2, 259–278 (2001).
Chesnaye, N. C. et al. An introduction to inverse probability of treatment weighting in observational research. Clin. Kidney J. 15, 14–20 (2022).
Maier, M. J. DirichletReg: Dirichlet Regression (2021). R package version 0.7-1.
Künzel, S. R., Sekhon, J. S., Bickel, P. J. & Yu, B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proc. Natl. Acad. Sci. 116, 4156–4165 (2019).
Imbens, G. W. & Rubin, D. B. Causal inference in statistics, social, and biomedical sciences (Cambridge University Press, 2015).
Lal, C., Strange, C. & Bachman, D. Neurocognitive impairment in obstructive sleep apnea. Chest 141, 1601–1610 (2012).
Stickgold, R. & Walker, M. P. Sleep-dependent memory consolidation and reconsolidation. Sleep Med. 8, 331–343 (2007).
Plazzi, G. & Pizza, F. Sleep dynamics beyond traditional sleep macrostructure. Sleep 36, 1123–1124 (2013).
Kapur, V. K. et al. Clinical practice guideline for diagnostic testing for adult obstructive sleep apnea: An American academy of sleep medicine clinical practice guideline. J. Clin. Sleep Med. 13, 479–504 (2017).
Sweetman, A. M. et al. Developing a successful treatment for co-morbid insomnia and sleep apnoea. Sleep Med. Rev. 33, 28–38 (2017).
Winkelman, J. W., Shahar, E., Sharief, I. & Gottlieb, D. J. Association of restless legs syndrome and cardiovascular disease in the sleep heart health study. Neurology 70, 35–42 (2008).
Benetó, A., Gomez-Siurana, E. & Rubio-Sanchez, P. Comorbidity between sleep apnea and insomnia. Sleep Med. Rev. 13, 287–293 (2009).
Luyster, F. S., Buysse, D. J. & Strollo, P. J. Jr. Comorbid insomnia and obstructive sleep apnea: Challenges for clinical practice and research. J. Clin. Sleep Med. 6, 196–204 (2010).
Koo, B. B., Patel, S. R., Strohl, K. & Hoffstein, V. Rapid eye movement-related sleep-disordered breathing: Influence of age and gender. Chest 134, 1156–1161 (2008).
Shepertycky, M. R., Banno, K. & Kryger, M. H. Differences between men and women in the clinical presentation of patients diagnosed with obstructive sleep apnea syndrome. Sleep 28, 309–314 (2005).
Valipour, A. et al. Gender-related differences in symptoms of patients with suspected breathing disorders in sleep: A clinical population study using the sleep disorders questionnaire. Sleep 30, 312–319 (2007).
Bublitz, M. et al. A narrative review of sex and gender differences in sleep disordered breathing: Gaps and opportunities. Life 12, 2003 (2022).
Malhotra, R. K. Neurodegenerative disorders and sleep. Sleep Med. Clin. 13, 63–70 (2018).
Freeman, D., Sheaves, B., Waite, F., Harvey, A. G. & Harrison, P. J. Sleep disturbance and psychiatric disorders. Lancet Psychiat. 7, 628–637 (2020).
Krystal, A. D. Psychiatric disorders and sleep. Neurol. Clin. 30, 1389–1413 (2012).
Stephansen, J. B. et al. Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy. Nat. Commun. 9, 5229 (2018).
Terzano, M. G., Parrino, L., Boselli, M., Spaggiari, M. C. & Di Giovanni, G. Polysomnographic analysis of arousal responses in obstructive sleep apnea syndrome by means of the cyclic alternating pattern. J. Clin. Neurophysiol. 13, 145–155 (1996).
Parrino, L., Smerieri, A., Boselli, M., Spaggiari, M. & Terzano, M. G. Sleep reactivity during acute nasal cpap in obstructive sleep apnea syndrome. Neurology 54, 1633–1640 (2000).
Parrino, L. et al. Reorganization of sleep patterns in severe osas under prolonged cpap treatment. Clin. Neurophysiol. 116, 2228–2239 (2005).
Milioli, G. et al. Can sleep microstructure improve diagnosis of osas? integrative information from cap parameters. Arch. Ital. Biol. 153, 194–203 (2015).
Mutti, C. et al. The contribution of sleep texture in the characterization of sleep apnea. Diagnostics 13, 2217 (2023).
Rechtschaffen, A. & Kales, A. (eds.) A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects. NIH publication (U. S. National Institute of Neurological Diseases and Blindness, Neurological Information Network, 1968).
Quan, S. F. et al. The sleep heart health study: Design, rationale, and methods. Sleep 20, 1077–1085 (1997).
Zhang, G.-Q. et al. The national sleep research resource: Towards a sleep data commons. J. Am. Med. Inform. Assoc. 25, 1351–1358 (2018).
Austin, P. C. Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis. Stat. Med. 35, 5642–5655 (2016).


Acknowledgements

The secondary use of the Berner Sleep Data Base (BSDB) from Inselspital, University Hospital Bern, was approved by the local ethics committee (KEK-Nr. 2022-00415), ensuring compliance with the Human Research Act (HRA) and the Ordinance on Human Research with the Exception of Clinical Trials (HRO). The data were analyzed in the framework of the E12034 - SPAS (Sleep Physician Assistant System) Eurostar-Horizon 2020 program. Access to the BSDB dataset may be granted upon individual request, once data transfer agreements are in place.

Author information

Authors and Affiliations

Institute of Computer Science, University of Bern, Bern, 3012, Switzerland
Michal Bechny & Athina Tzovara
Institute of Digital Technologies for Personalized Healthcare (MeDiTech), University of Applied Sciences and Arts of Southern Switzerland (SUPSI), Lugano-Viganello, 6962, Switzerland
Michal Bechny, Luigi Fiorillo & Francesca Faraci
Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
Akifumi Kishi
Parkinson Disease and Movement Disorder Center, Neurocenter of Southern Switzerland, Ente Ospedaliero Cantonale, Lugano, 6903, Switzerland
Luigi Fiorillo
Department of Neurology, Bern University Hospital (Inselspital) and University of Bern, Bern, 3010, Switzerland
Julia van der Meer, Markus Schmidt, Claudio Bassetti & Athina Tzovara
Sleep-Wake-Epilepsy Centre, Bern University Hospital (Inselspital) and University of Bern, Bern, 3010, Switzerland
Markus Schmidt & Claudio Bassetti

Contributions

M.B. conceptualized the study, developed the methodology, performed the analysis, drafted the manuscript, and incorporated feedback from all co-authors. A.K. provided detailed feedback on related work and on the clinical interpretation of the results, and contributed to the discussion section. J.v.d.M. assisted with data curation and provided detailed feedback on the introduction and discussion sections. L.F., M.S., C.B., A.T., and F.F. read the manuscript and provided feedback. All co-authors approved the final manuscript and agreed to be listed as co-authors.

Corresponding author

Correspondence to Michal Bechny.

Ethics declarations

Competing interest

Mr Akifumi Kishi is supported by JST FORESTO program (grant no. JPMJFR2156), outside the submitted work. All authors declare no financial or non-financial competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bechny, M., Kishi, A., Fiorillo, L. et al. Novel digital markers of sleep dynamics: causal inference approach revealing age and gender phenotypes in obstructive sleep apnea. Sci Rep 15, 12016 (2025). https://doi.org/10.1038/s41598-025-97172-3

Received: 17 December 2024
Accepted: 02 April 2025
Published: 08 April 2025
Version of record: 08 April 2025
DOI: https://doi.org/10.1038/s41598-025-97172-3