Introduction

Mood and anxiety disorders are highly prevalent and comorbid in the United States1,2. Mood disorders such as Major Depressive Disorder (MDD) and anxiety disorders such as Generalized Anxiety Disorder (GAD) are characterized by a defined set of psychiatric symptoms. However, both categorical diagnoses share substantial overlap in their clinical presentation. For example, both can be characterized by executive function deficits, sleep disturbances, fatigue, maladaptive arousal, and psychomotor abnormalities3. Yet recent work demonstrates that, even among patients who share a diagnosis and present with the same level of depression severity, there are underlying differences in how symptoms interact with one another to drive this severity4. This overlap, co-occurrence5, and mutual exacerbation3 can complicate the specificity of diagnoses and the measurement of treatment outcomes. Measurement of symptom severity and symptom interactions rather than categorical diagnoses can thus be useful in clinical characterization and the development of novel treatment targets.

Current tools for measuring and diagnosing mood and anxiety disorders include clinical interview, often via the Diagnostic and Statistical Manual of Mental Disorders (DSM), which facilitates categorical diagnosis of disorders. The self-reported Mood and Anxiety Symptom Questionnaire (MASQ)6 can also measure symptom severity for separate domains of anxious arousal, general distress, and anhedonic depression on a dimensional continuum. While this assessment can discriminate anxiety and depressive symptom severity within psychiatric patient populations6, much like many other self-report questionnaires, the MASQ requires that a patient recall a week’s worth of experiences and distill complex emotional and mental states into a single number. This presents a recall bias that may lead to an inaccurate representation of symptom severity that has been averaged over a long period of time and potentially distorted by any underlying cognitive dysfunction, which is highly prevalent in mood and anxiety disorders7,8,9,10,11. Determining diagnosis and symptom severity in this way overlooks the intricacies and nuances of symptom profiles that can fluctuate on a day-to-day basis within patients12,13.

Digital phenotyping – i.e., momentary assessment of symptom profiles in the real-world through personal digital devices – can counteract some of these limitations in symptom profiling by assessing symptom severity across multiple timepoints, with low patient burden14. Approximately 90% of Americans own a smartphone15 with a projected increase to over 5 billion people owning smartphones globally by 203016,17. This ubiquity of smartphones enables novel quantification of patients’ behaviors and symptoms in real life and real-time through digital phenotyping. Since mood and anxiety symptom severity can fluctuate over time, digital phenotyping tools may detect vulnerability towards mood and anxiety disorders before they become taxing on an individual’s life, and indicate behavioral markers that are associated with symptom changes18. Digital phenotyping tools may also highlight how different symptoms interact with one another4, and how underlying cognitive changes, such as changes in motivation, might impact those relationships19. Maladaptive changes in motivation are consistently observed in individuals with mood and anxiety disorders20,21,22,23, and recent evidence suggests that both intrinsic motivation (actions driven by ‘internal drivers’) and extrinsic motivation (actions driven by tangible external stimuli or outcomes) may be differentially impaired24. While motivational deficits are commonly observed within psychiatric populations, understanding how different kinds of motivation may differentially impact symptom severity in the real-world remains limited.

Over the past decade, research on digital phenotyping has accelerated18 by providing active data (captured via ecological momentary assessment (EMAs))25 and passive data from sensors including accelerometer (for movement/physical activity), GPS, and screentime (for sedentary behavior). These methods enhance ecological validity and reduce patient recall bias. For instance, a recent study employing actigraphic assessments of step counts found a negative association between depression and physical activity26 such that even low to moderate levels of physical activity were associated with reduced depression scores. Meanwhile, studies assessing the relationship between screentime and mental health have produced mixed findings27,28,29 and focused primarily on adolescents and young adults given they have grown up entirely in the digital age and are more prone to long-term negative effects30,31,32. Interestingly, there is an established inverse relationship between physical activity and sedentary behaviors, such as screentime, suggesting that increased screentime can impede physical activity levels33,34. However, research on the combined effects of phone usage and physical activity on health outcomes remains limited and often relies on self-report data31,33,34. Therefore, there is a need to objectively quantify screentime in a broad age range to better understand how these competing sedentary and active behaviors affect mood and anxiety symptoms across the lifespan given evidence that sedentary behaviors can influence risk of disease, independent of physical activity levels.

The current study introduces a new set of single-item, self-reported EMAs examining anxiety, distress, and depression based on the gold-standard MASQ35, similar to prior work36,37,38,39. These electronic MASQ (eMASQ) EMAs were captured via a research-based open-source smartphone application (mindLAMP)40 in individuals with mood and anxiety (MA) disorders and healthy controls (HC). Clinical assessments of mood and anxiety symptom severity were also captured in the laboratory (in-lab MASQ) in order to test whether anxiety, distress, and depression eMASQ-EMAs related to their corresponding in-lab MASQ measures. Additional EMAs of intrinsic and extrinsic motivation were also collected to assess their relationship with each eMASQ-EMA. Passive activity data were collected in tandem to obtain objective measures of physical activity (operationalized as steps taken per day) and sedentary activity (operationalized as screentime, or time spent on one’s mobile device) to enable detection of behaviors that may inform risk prediction. Finally, to understand how each of these active and passive activity measures might work together to drive mood/anxiety disorders, a network model based on a graph theory method was constructed using these data. We primarily hypothesized that each eMASQ-EMA of mood and anxiety symptoms would be significantly associated with its corresponding in-lab MASQ measure and would show high variability over time.

Results

Participants

A total of N = 114 individuals participated in the study. Following data cleaning and preprocessing (see Methods: Preprocessing, Fig. 1), a total of N = 101 participants remained, including N = 49 HC and N = 52 in the MA group. The MA group included N = 32 individuals with primary MDD and N = 20 with a primary anxiety or stress-related disorder, although most participants showed high levels of comorbidity (63.5%) (see Table 1).

Fig. 1: Flowchart illustrating phases of data preprocessing and participants included in each set of analyses.

Bold text indicates datasets in which analyses were performed. HC Healthy Control group, MA Mood/Anxiety disorder group. Created in BioRender. Beltran, J. (2025) https://BioRender.com/p43h413.

Table 1 Demographics reported for healthy controls (HC) and participants with a mood or anxiety disorder (MA) (N = 32 MDD, 15 GAD, 5 PTSD)

Adherence

Over the 30-day period, HCs completed an average of 21.3 ± 6.23 total days of EMAs while the MA group completed an average of 20.9 ± 5.72 total days, with no significant difference in days of surveys completed between groups (Fig. 2a, N = 101, p = 0.687). There was no significant difference in survey completion by day between the two groups (Fig. 2b, N = 101, p = 0.386), suggesting overall good adherence in line with prior EMA studies conducted in participants with MDD showing completion rates ranging from 65 to 85%41.

Fig. 2: Study Adherence between groups.

a Box plots illustrate the percentage of days out of the 30-day requirement on which participants completed at least one survey. b Line graph illustrates the percentage of subjects who completed at least one survey on a given day out of the total number enrolled to determine study adherence. Error bars standard deviation of a proportion, HC Healthy Control group, MA Mood/Anxiety disorder group.
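The two adherence quantities shown in Fig. 2 can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names are hypothetical, the input values are invented, and the error-bar formula assumes the usual binomial expression sqrt(p(1 - p)/n) for the standard deviation of a proportion.

```python
from math import sqrt

REQUIRED_DAYS = 30  # length of the monitoring period described in the text


def percent_days_completed(survey_days, required_days=REQUIRED_DAYS):
    """Percentage of required days with at least one completed survey.

    survey_days holds one day index per completed survey; a day with
    several surveys is counted once.
    """
    return 100.0 * len(set(survey_days)) / required_days


def proportion_sd(p, n):
    """Standard deviation of a proportion p observed among n subjects.

    Assumed formula (binomial): sqrt(p * (1 - p) / n).
    """
    return sqrt(p * (1.0 - p) / n)


# Hypothetical participant who answered on day 1, twice on day 2, and on day 3.
print(percent_days_completed([1, 2, 2, 3]))

# Hypothetical study day on which half of 100 enrolled subjects responded.
print(proportion_sd(0.5, 100))
```

For scale, the reported group means of 21.3 and 20.9 completed days correspond to roughly 71% and 70% of the 30-day requirement.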

Relationships between eMASQ-EMAs and in-lab measures of symptom severity

In the full cohort (i.e., MA + HC), there was a significant association between the MASQ Anxious Arousal and anxiety eMASQ-EMA (Fig. 3a, N = 80, p < 0.001). Similarly, a significant association was observed between the MASQ General Distress and distress eMASQ-EMA (Fig. 3b, N = 80, p < 0.001). Finally, a significant association was observed between the MASQ Anhedonic Depression and depression eMASQ-EMA (Fig. 3c, N = 80, p < 0.001), together demonstrating the convergence between EMAs and in-lab clinical ratings. The same associations were observed in the stratified analyses for each group individually, suggesting the utility of EMAs for measuring mood and anxiety symptoms even within individuals with low symptom variability (i.e., the HC group; see Fig. 3d, Supplementary Figs. 1 and 2, Supplementary Results 1 and 2, and Supplementary Tables 1–4).

Fig. 3: Results from assessing the reliability of eMASQ-EMAs for anxiety, distress and depression.

a–c Scatter plots showcasing significant relationships between the in-lab MASQ for Anxious Arousal, General Distress, and Anhedonic Depression against anxiety, distress, and depression eMASQ-EMA scores alongside their score distributions. Individual data points represent survey responses per participant. d Lollipop plots depicting results from regression models assessing the relationship between the in-lab MASQ for Anxiety, Distress and Depression and their corresponding eMASQ-EMAs in the full cohort (HC + MA). MASQ total scores represent the sum of each MASQ subscale and were correlated against the sum of each eMASQ-EMA. In the full cohort, significant associations were found between each MASQ subscale and its corresponding eMASQ-EMA demonstrating the reliability of these scales (N = 80, β’s > 0.0792, t’s > 2.28, p’s < 0.05). HC Healthy Control group, MA Mood/Anxiety disorder group, MASQ Mood and Anxiety Symptom Questionnaire.

Effects of intrinsic motivation on anxiety, distress and depression

We next assessed the relationship between intrinsic (Fig. 4a–c) and extrinsic (Fig. 4d–f) motivation EMAs, and anxiety, distress, and depression eMASQ-EMAs on an exploratory basis.

Fig. 4: Regression results illustrating marginal effects plots of the predicted values of anxiety, distress and depression severity across different levels of intrinsic and extrinsic motivation upon adjusting for covariates.

Zero-inflated Poisson models for each anxiety/distress/depression eMASQ-EMA demonstrated there were main effects of intrinsic motivation (a–c) (N = 101, IRRs > 0.82, p’s < 0.001) and extrinsic motivation (d–f) (N = 101, IRRs > 0.88, p’s ≤ 0.001) on each eMASQ-EMA. There were also significant interactions between group and intrinsic motivation on anxiety/distress/depression (IRRs > 1.08, p’s < 0.001) and group and extrinsic motivation on depression (N = 101, IRR = 1.06, p = 0.018). Data points are based on marginal effects from the fitted ZIP models using the GLMMadaptive package’s ‘effectPlotData’ function in R. HC Healthy Control group, MA Mood/Anxiety disorder group.

For each individual set of models, there was a significant main effect of intrinsic motivation on anxiety (Fig. 4a), distress (Fig. 4b) and depression (Fig. 4c; N = 101, p’s < 0.001), whereby lower intrinsic motivation was associated with higher incidence and severity of each symptom. There was also a significant interaction between intrinsic motivation and group for each symptom domain (count model: N = 101, p’s < 0.001), whereby as intrinsic motivation increased, groups differed more in symptom severity. This was driven by a stronger relationship between intrinsic motivation and symptom severity in the MA group, suggesting that symptom severity was more affected by changes in intrinsic motivation in the MA group compared to controls. There was also a significant effect of weekend on distress but no significant effects of group, day, age or sex on symptom severity (see Supplementary Tables 5–7 for full results and model residuals).
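The incidence rate ratios (IRRs) from the count part of these models can be read as percentage changes in the expected symptom count per one-unit increase in the predictor. The sketch below is illustrative only: it uses the boundary values quoted in the Fig. 4 legend (0.82 and 1.08) as example inputs, the function names are hypothetical, and it assumes that HC is the reference group, so that the IRR within the MA group is the product of the main-effect and interaction IRRs.

```python
def irr_to_percent_change(irr):
    """Percentage change in the expected count implied by an IRR."""
    return (irr - 1.0) * 100.0


def combined_irr(main_irr, interaction_irr):
    """IRR in the non-reference group under a log link.

    Assumption: effects multiply on the IRR scale, so the slope in the
    non-reference group is main effect times interaction.
    """
    return main_irr * interaction_irr


main = 0.82         # example main-effect IRR (boundary value from the legend)
interaction = 1.08  # example group-by-motivation IRR (boundary value)

print(round(irr_to_percent_change(main), 1))
print(round(irr_to_percent_change(combined_irr(main, interaction)), 1))
```

Under these assumptions, an IRR of 0.82 corresponds to an 18% lower expected count in the reference group, and multiplying by an interaction IRR above 1 moves the non-reference group's IRR closer to 1.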

Effects of extrinsic motivation on anxiety, distress and depression

Again, for each model, there was a significant main effect of extrinsic motivation on anxiety (Fig. 4d), distress (Fig. 4e) and depression (Fig. 4f; N = 101, p’s < 0.015), whereby lower extrinsic motivation was associated with higher incidence and severity of each symptom. There was also a significant interaction between extrinsic motivation and group for anxiety (count model: N = 101, p = 0.025), whereby as extrinsic motivation increased, groups differed more in anxiety severity. There were also main effects of group and weekend on symptom severity, and a main effect of day on depression, with no significant effects of age or sex. See Supplementary Tables 8–10 for full results and model residuals.

Effects of physical activity on anxiety, distress and depression

We next assessed the relationship between physical activity (Fig. 5) and anxiety, distress, and depression eMASQ-EMAs.

Fig. 5: Regression results illustrating marginal effects plots of the predicted values of anxiety, distress and depression severity across steps taken per day upon adjusting for covariates.

Zero-inflated Poisson models demonstrated a main effect of steps on anxiety (a), distress (b) and depression (c) (N = 70, IRRs > 0.87, p’s ≤ 0.05). Across all three measures, participants in the MA group experienced greater symptom severity (N = 70, IRRs > 2, p’s < 0.001) in comparison to HC. Meanwhile, only in the anxiety model was there an effect of sex, such that females demonstrated higher anxiety scores than males after accounting for step counts. Steps data were scaled within-participants. Data points are based on marginal effects from the fitted ZIP models using the GLMMadaptive package’s ‘effectPlotData’ function in R. HC Healthy Control group, MA Mood/Anxiety disorder group.

There was a main effect of steps on anxiety (Fig. 5a), distress (Fig. 5b), and depression (Fig. 5c; count model: N = 70, p’s < 0.05), within each individual model, whereby higher physical activity was associated with lower symptom severity. There was also an interaction between group and steps on depression (N = 70, p = 0.052), whereby as steps taken per day increased, groups differed more in depression severity. Again, this was driven by a stronger relationship between physical activity and depression severity in the MA group, suggesting that symptom severity was more affected by changes in physical activity in the MA group compared to controls. There were main effects of group and sex on anxiety; group, weekend and day on distress; and group and day on depression. See Supplementary Tables 11–13 for full results and model residuals.

Results from the linear mixed effects model with a 1-period lag demonstrated marginal effects of physical activity on depression at the following timepoint (N = 34 MA, p = 0.062, pFDR = 0.299). Meanwhile, a 7-period lag analysis demonstrated that on average, higher physical activity (steps) was associated with lower depression and anxiety approximately one week later (N = 34 MA, Depression: p = 0.004, pFDR = 0.043, Anxiety: p = 0.004, pFDR = 0.043) (Supplementary Fig. 3b). When scaling step counts data within-participants to explore how an individual’s step counts affect their symptoms of depression and anxiety, there were similar effects of steps on depression and anxiety symptoms approximately one week later (N = 34 MA, Depression: p = 0.010, pFDR = 0.211, Anxiety: p = 0.031, pFDR = 0.329) (Supplementary Fig. 3a). There was no significant effect of steps on a 1 or 7-period lag of distress symptoms. See Supplementary Fig. 3 and Supplementary Data 1 and 2 for additional time-lag results.
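A k-period lag pairs each predictor value with the outcome reported k observations later by the same participant. The sketch below illustrates only that alignment step, under stated assumptions: the record layout and the function name are hypothetical, the data are invented, and lags are counted in completed reports rather than calendar days, so irregular reporting stretches the real time that a given lag spans.

```python
def lagged_pairs(records, lag):
    """Align a predictor with an outcome reported `lag` observations later.

    records: list of (day, steps, symptom_score) tuples for ONE participant.
    Returns a list of (steps_at_t, symptom_at_t_plus_lag, days_elapsed).
    """
    if lag < 1:
        raise ValueError("lag must be a positive number of reporting periods")
    ordered = sorted(records, key=lambda r: r[0])  # order by day index
    pairs = []
    for earlier, later in zip(ordered, ordered[lag:]):
        pairs.append((earlier[1], later[2], later[0] - earlier[0]))
    return pairs


# Invented participant with a missing report on day 3.
toy_records = [(4, 8000, 1), (1, 4000, 3), (2, 6000, 2)]

for steps, symptom, gap in lagged_pairs(toy_records, lag=1):
    print(steps, symptom, gap)
```

In the toy data, the second pair spans two calendar days even though the lag is one period, which is the kind of irregularity that calls for caution when interpreting a 7-period lag as "one week".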

Effects of digital activity on anxiety, distress and depression

We next assessed the relationship between screentime (Supplementary Fig. 4) and anxiety, distress, and depression eMASQ-EMAs.

There was a main effect of screentime on incidence of distress (Supplementary Fig. 4b, N = 80, p = 0.005), whereby the odds of observing zero symptoms of distress increased with higher screentime (OR = 1.63, small effect). There were no main effects of screentime on anxiety (Supplementary Fig. 4a) or depression (Supplementary Fig. 4c) symptom severity or incidence. There were main effects of group on anxiety, and weekend, day and group on distress and depression. See Supplementary Tables 14–16 for full results and model residuals.
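A zero-inflated Poisson model has two parts, which is why incidence (the logit part) and severity (the count part) are reported separately. The following is a generic textbook sketch rather than the fitted models: the function names are hypothetical, the parameter values are made up, and applying the reported odds ratio of 1.63 to a baseline zero probability of 0.5 is purely illustrative.

```python
from math import exp, factorial


def zip_pmf(k, lam, pi_zero):
    """Probability of count k under a zero-inflated Poisson distribution.

    pi_zero: probability of a structural (excess) zero, from the logit part.
    lam:     Poisson mean of the count part.
    """
    poisson = exp(-lam) * lam ** k / factorial(k)
    structural = pi_zero if k == 0 else 0.0
    return structural + (1.0 - pi_zero) * poisson


def apply_odds_ratio(prob, odds_ratio):
    """Shift a probability by multiplying its odds by an odds ratio."""
    odds = prob / (1.0 - prob) * odds_ratio
    return odds / (1.0 + odds)


# Made-up parameters: 30% structural zeros, mean count of 2 otherwise.
print(zip_pmf(0, lam=2.0, pi_zero=0.3))

# Illustrative only: an OR of 1.63 applied to a 50% baseline zero probability.
print(apply_odds_ratio(0.5, 1.63))
```

An odds ratio above 1 in the logit part therefore raises the probability of reporting no symptoms at all, without saying anything about severity among those who do report symptoms.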

Results from the linear mixed effects models with 1- and 7-period lags demonstrated no significant lagged effect of screentime on symptoms of anxiety, distress, or depression. See Supplementary Fig. 5 and Supplementary Data 3 and 4 for full time-lag results.

Assessing interactions over time: The DEPNA Model

The Dependency Network Analysis (DEPNA) method (Fig. 6) provided an estimation of the influence of each individual symptom measure on the relationships between all other symptom and activity measures over time (Fig. 7a–c). In terms of overall influence, intrinsic motivation and extrinsic motivation had the highest influence (Influencing Degree) on the rest of the symptom network in the MA group (Table 2). The MA group exhibited significantly higher influence of extrinsic motivation and intrinsic motivation, as compared to HC (N = 48, p < 0.02, q FDR < 0.05 and N = 48, p < 0.01, q FDR < 0.05, respectively) (Table 2, Fig. 7b). Depression and extrinsic motivation were more influenced by other measures in the MA group compared to HC (N = 48, p = 0.01, q FDR = 0.07 and N = 48, p = 0.02, q FDR = 0.07, respectively) (Table 2, Fig. 7c), although these differences did not survive FDR correction. In terms of specific, directed influence, intrinsic motivation significantly influenced steps (N = 48, p < 0.003, q FDR < 0.05), to a greater extent among the MA group compared to HC (Fig. 7a). On the other hand, extrinsic motivation influenced distress to a greater extent among the MA group compared to HC (N = 48, p < 0.004, q FDR < 0.05) (Fig. 7a). We did not find any significant FDR-corrected results among HC compared to MA.
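FDR-corrected values such as those above are commonly obtained with the Benjamini-Hochberg step-up procedure. This section does not state which FDR method was used, so the sketch below is an assumption made for illustration, the function name is hypothetical, and the p-values in it are invented rather than taken from Table 2.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for a family of four tests.
toy_pvals = [0.04, 0.01, 0.03, 0.02]
print([round(q, 3) for q in bh_adjust(toy_pvals)])
```

The practical consequence is that a raw p-value below 0.05 can still yield a q-value above 0.05 when it belongs to a larger family of tests, which is the pattern described for the Influenced Degree comparisons.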

Fig. 6: The DEPNA.

a The correlation influence of node j on the pair of nodes i and k is defined as the difference between the correlation, C(i,k), and the partial correlation, PC(i,k | j). The partial correlation coefficient captures the effect (or contribution) of one node on the correlation between a pair of nodes. When this coefficient is large, it means a significant fraction of the correlation between a pair of nodes can be explained by the effect of a third node (Jacob et al., 2019). b A dependency matrix is created by calculating the partial correlation effect for each node on all other pairwise correlations in the network. The total influence of node j on node i, D(i,j) is defined as the average influence of node j on the correlations C(i,k), over all nodes k. The node dependencies define a dependency matrix D, whose (i,j) element is the influence of node j on node i. c The ‘Influencing Degree’ of node j is defined as the sum of the influence of node j on all other nodes i. The larger this measure is, the greater its impact on all other connections in the network and the more likely it is to generate the information flow in the network. d A graph visualization based on the pair-wise dependency connectivity matrix. The graph is color-coded according to its influencing degree. Pair-wise nodes with dependencies that were significantly different between the two groups at the p < 0.05 (FDR corrected) level are plotted as edges. Each edge is color-coded according to the t-test sign as light or dark gray with the arrows representing the direction of influence.
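The definitions in panels a to c translate into a small computation. The sketch below is a minimal pure-Python reading of those definitions and not the authors' implementation: the function names are hypothetical, partial correlations are first-order and derived from Pearson coefficients, the average over k excludes i and j (an assumption), and the toy series are invented so that z drives both x and y.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation C between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(r_ik, r_ij, r_kj):
    """First-order partial correlation PC(i,k | j) from pairwise correlations."""
    return (r_ik - r_ij * r_kj) / sqrt((1.0 - r_ij ** 2) * (1.0 - r_kj ** 2))


def dependency_matrix(series):
    """D[i][j]: average over k of C(i,k) - PC(i,k | j), i.e. influence of j on i."""
    names = list(series)
    r = {(a, b): pearson(series[a], series[b])
         for a in names for b in names if a != b}
    dep = {i: {} for i in names}
    for i in names:
        for j in names:
            if i == j:
                continue
            influences = [
                r[i, k] - partial_corr(r[i, k], r[i, j], r[k, j])
                for k in names if k not in (i, j)
            ]
            dep[i][j] = sum(influences) / len(influences)
    return dep


def influencing_degree(dep, j):
    """Sum of the influence of node j on all other nodes i."""
    return sum(dep[i][j] for i in dep if i != j)


# Toy data: z is a common driver, x and y are z plus small perturbations.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [v + e for v, e in zip(z, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [v + e for v, e in zip(z, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

toy_dep = dependency_matrix({"x": x, "y": y, "z": z})
for node in ("x", "y", "z"):
    print(node, round(influencing_degree(toy_dep, node), 3))
```

In this toy network the common driver z receives the largest Influencing Degree, because removing its effect collapses the correlation between x and y. As with the published method, this ranks correlational influence and does not establish causation.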

Fig. 7: DEPNA results.

a A network illustration and graph visualization of the ‘influencing degree’ of symptoms in the MA group against healthy controls. Each region is color-coded according to the t statistic value from the t-test between the ‘Influencing Degree’ of the two groups. All pair-wise ROIs with connections, significant at the p < 0.05 level, are plotted as edges. Each edge is color-coded according to the t-test sign as light or dark gray with the arrows representing the direction of influence. b The nodes’ averaged ‘Influencing Degree’ and (c) ‘Influenced Degree’. The total influence of both extrinsic and intrinsic motivation was significantly higher among the MA group compared to healthy controls. *p < 0.05, **p < 0.05 FDR corrected. HC Healthy Control group, MA Mood/Anxiety disorder group.

Table 2 Influencing and influenced degree of symptoms and activity measures as estimated by the DEPNA

Discussion

This study presents a new set of single-item self-reported anxiety, distress and depression eMASQ-EMAs that were significantly related to established in-lab measures of these symptom domains. Novel, exploratory intrinsic and extrinsic motivation EMAs were also significantly related to mood and anxiety symptoms in the real-world. Both types of motivation significantly influenced all other symptom and activity measures to a greater extent in the mood and anxiety group compared to the HC group. Finally, higher within-subject physical activity (steps) was significantly associated with lower severity of anxiety, distress and depression, with a variable temporal association with depression and anxiety symptoms in the MA group. Meanwhile, digital activity (screentime) was only associated with the absence of distress.

Overall, the current results demonstrate the utility and feasibility of digital phenotyping for accurately monitoring symptoms in participants with psychiatric conditions that have been associated with high burden and drop-out rates42,43. This utility is in line with prior work leveraging EMAs to measure mood and anxiety symptoms38,39,44 and in relation to gold-standard assessments to determine their utility36,37. Further, in this work we found similarly good adherence levels between groups, suggesting that the use of a few (≥5) single-item daily surveys over 30 days, alongside weekly check-ins, provides good adherence within a feasible, low-burden framework. Indeed, previous work demonstrates that providing some form of digital feedback can enhance adherence45,46. Maintaining good adherence over time is critical given the significant variability in symptom severity demonstrated in individuals with mood and anxiety disorders.

Previous work assessing the relationship between intrinsic and extrinsic motivation and anxiety, distress, and depression EMAs has been limited, in part due to a lack of consensus on the precise definition of intrinsic motivation, and how it can be distinguished from extrinsic motivation24,47. Reduced extrinsic motivation and sensitivity to extrinsic rewards has been consistently measured in mood disorders such as MDD22,48,49,50, but the impact of intrinsic motivation is unclear. Here, both extrinsic and intrinsic motivation showed significant negative relationships with anxiety, distress, and depression EMAs such that higher intrinsic motivation was associated with a 13–18% decrease in symptom severity while higher extrinsic motivation was associated with a 7–12% decrease in symptom severity across both MA and HC groups. However, interactions between group and intrinsic motivation revealed that groups were differentially impacted by changes in intrinsic motivation. Specifically, in individuals with mood and anxiety disorders the protective effects of intrinsic motivation are 8–12% weaker across each symptom domain, suggesting that the relationship between intrinsic motivation and symptom severity is more nuanced and may be impacted by differences in cognitive flexibility that engaging intrinsic motivational processes may require. Indeed, prior works demonstrate evidence suggestive of a general difficulty with engaging intrinsic motivational processes across depressive phenotypes51,52,53. Meanwhile in anxiety disorders the development of high intrinsic motivation might derive from maladaptive uncertainty learning that problematically drives elevated avoidance behaviors21,53. Notably, intrinsic motivation seemed to have a greater impact on reducing overall depression severity in comparison to extrinsic motivation (18% decrease versus 12% decrease, respectively). 
This finding is in line with prior work, suggesting that when individuals can effectively engage intrinsic motivational processes such as working for personal growth or for personal satisfaction, they may be more protected against depression54. Interestingly, group-level effects were only observed in the extrinsic motivation models whereby participants with a mood or anxiety disorder showed approximately 2 times the incidence of anxiety, distress, and depression in comparison to healthy controls, when extrinsic motivation was held constant. Day of the week also had an effect on the relationship between extrinsic motivation and each eMASQ-EMA, indicating that symptom severity was 6–10% lower on weekends versus weekdays. Similar effects were observed between intrinsic motivation and distress only, suggesting that external factors such as engaging in rest or non-work-related activities during weekends may contribute to symptom reduction. Taken together these findings suggest that the impact of intrinsic motivation is more variable based on group membership but overall, may have more protective effects on depression than extrinsic motivation regardless of weekday. While the current study cannot explicitly demonstrate which kinds of intrinsic or extrinsic motivators participants drew from when completing assessments, the examples provided in the EMAs aligned with a mix of previous reports of intrinsic and extrinsic rewards in order to capture the entire scope of the phenotype24,55,56,57. It is therefore not clear from this study whether intrinsic versus extrinsic reward sensitivity versus motivational tone were important, as both outcome sensitivity and internal drive or vigor could differentially contribute to the self-assessment of intrinsic or extrinsic motivation. Future work should more precisely characterize the array of intrinsic and extrinsic factors that could contribute to motivation in the real-world. 
Nonetheless, these findings underscore the importance of assessing the kinds of motivators individuals experience and how they may relate to affective processes.

In terms of digital activity (screentime), we did not observe an effect of screentime on anxiety, distress, or depression in the count models. However, there was a small effect of screentime on distress in the logit model suggesting that increased screentime was associated with higher odds of not experiencing distress. While concerns regarding the psychological and cognitive impacts of screentime persist58,59, this latter finding may be more reflective of the positive impact of smartphone usage27,60. Recent work emphasizes the importance of considering the type of motivation behind smartphone usage when assessing its impact on health outcomes61. For instance, smartphones might be increasingly used to monitor one’s fitness levels and maintain social connections which can have positive outcomes62, or more negatively used to passively monitor others’ online lives. Although this dichotomous view has been recently challenged63, motivations behind use, and a range of other variables, can all influence the effect of screentime between users and within users over time29,64,65. Indeed, there was an effect of weekend on the relationship between screentime and symptoms of distress/depression but not anxiety. This suggests that distress and depression symptom severity were 9–12% lower on the weekend when accounting for screentime, and further supports the nuanced relationship between screentime and mental health. The present study did not assess the type of usage nor motivations for personal digital device usage, which could contribute to the null findings in the count models. However, we did explore whether steps taken per day was associated with anxiety, distress, and depression given the established relationship between screentime, physical activity, and cognition27,66,67.

Overall, we found that when individuals engage in greater levels of physical activity than their typical baseline, there is a 7–12% decrease in anxiety, distress and depression symptom severity, in line with previous work on the mental health benefits of exercise68,69,70. However, our time-lagged linear mixed effects model revealed that the benefits of physical activity on depression and anxiety levels are temporally delayed in psychiatric groups and manifest themselves most prominently about a week later. While a marginal effect of physical activity on depression was observed at a 1-period lag, whereby higher step counts were associated with lower depression at the next time point, this result did not reach statistical significance. Taken together, these findings suggest that while there is an apparent reduction in symptom severity associated with increased physical activity, the benefits of physical activity on mental health may take longer to manifest (i.e., the effect of steps on depression 1 day post-exercise is weaker in comparison to its effects 7 days post-exercise). Importantly, we note that there was a lack of regular interval reporting periods in our data and caution this interpretation. Furthermore, there was a main effect of sex on the relationship between anxiety and steps taken per day, suggesting that females report approximately 2 times the incidence of anxiety symptoms compared to males when accounting for physical activity. However, little emphasis has been placed on the association between physical activity and anxiety71,72. Importantly, a significant association between steps and distress was not present in the best-fitting model; therefore, results from this model should be interpreted with caution. 
Future research should consider not only the quantity of physical activity in relation to psychiatric symptoms but also the type and intensity in relation to other behaviors such as screentime and socioeconomic factors which may require high occupational physical activity with minimal time for leisure physical activity.

Having established that there is a relationship between physical activity, changes in motivation and mood and anxiety symptoms, the present study used the DEPNA model to evaluate which feature in this network of symptoms and behaviors most impacts all other features in the network. This is the first application of this type of directed graph network analysis to digital phenotyping data in mood and anxiety disorders. The DEPNA model uniquely captures the interactions between measurements of symptom severity and behavior by leveraging partial correlation effects. Partial correlations determine how a third variable affects the correlation between two other variables. For example, this application would capture how depression severity influences the correlation between intrinsic motivation and steps. In this way, a correlational influence, but not a causal influence, can be determined based on the network’s hierarchy of influence. The DEPNA revealed that, rather than symptom severity per se, it was the putative underlying measures of intrinsic and extrinsic motivation that had the greatest influence over all symptoms and activity measures in the MA group. This suggests that measures of cognitive constructs related to drive and activity may be more useful in characterizing phenotypes in the real-world. Further work should explore other cognitive measures that have been linked to mood and anxiety disorders such as executive function or sleep disturbance. In the MA group, depression was also the symptom domain that was most influenced by the other measures, suggesting that this particular symptom domain is most malleable or receptive to change. This coincides with the high variability we observed of this measure over time (see Supplementary Fig. 6). Interestingly, intrinsic motivation had amongst the greatest impact (‘influence’) on all other measures over time and seemed to influence physical activity more in the MA group. 
Previous work demonstrates a link between higher physical activity and lower mood and anxiety symptoms68,69,73. Therefore this suggests that in this population, intrinsic motivation might act through physical activity to modulate symptom severity and may be what aggravates the disorder4. The current findings also highlight the importance of assessing motivational changes in relation to mood and anxiety disorders. Prior work demonstrates that motivational deficits undermine functioning in patients with depression19 and that secondary effects on specific symptoms can occur through changes in other symptoms74,75. Therefore, the assessment of motivation levels could (1) serve as an important risk factor for mood and anxiety disorders, (2) aid in understanding symptoms that are prone to exacerbating into a depressive episode, and (3) serve as a surrogate endpoint in clinical trials where the primary endpoint is unmet. This latter point speaks to the need to investigate the effect of treatments on individual symptoms as opposed to a summed score.

There are several limitations of this study that could be addressed in future research. First, there was missing data across EMAs that could be attributed to participants either skipping certain surveys or a technical issue with the application failing to administer surveys on a given day. Despite this missingness, we still observed a good level of adherence to the study and were able to maintain at least 3 days of survey data per participant. In line with this limitation, our exploratory analysis assessing whether changes in physical/digital activity may precede decreases in mood and anxiety symptoms was confounded by a lack of regular interval reporting periods, which is why we primarily focus on results from a 1- and 7-period lag. To minimize missingness, future work aimed at exploring these associations may benefit from providing daily digital feedback or offering additional monetary incentives to reward participants’ completion of EMAs in full accordance with study protocols.

In conducting the DEPNA, we introduce a new analytical method that overcomes limitations of other modeling approaches requiring high temporal resolution or predefined assumptions of directionality. However, the DEPNA does not take into account any information on temporal lags and was conducted in a smaller sample size to ensure sufficient active, steps, and screentime data points for a reliable correlation analysis between each feature. Nonetheless, the DEPNA and ZIP models remain useful analytical methods that may help overcome certain limitations with the kinds of large-scale, time-series, and highly interrelated data structures that come from digital phenotyping17. While ZIP models rely on a Poisson process, which assumes that the outcome variable is measured as ‘counts’, our EMAs used a 0–10 Likert scale that produced “pseudo-count” distributed data; taking a data-driven approach, we found that ZIP models were nonetheless the best-fitting models based on the AIC. Additionally, the zero-inflation observed in our data was inherently due to the inclusion of a control group that reported majority zero symptoms (Supplementary Fig. 7). While the logit model accounts for excess zeros and only provides a measure of the likelihood of an individual reporting zero, we do find that this joint modeling approach allows for a better quantification of differences in not only symptom severity but also behaviors that may be associated with the absence of symptoms independent of diagnosis. This is evident in our steps model specification, whereby group membership alone is enough to predict the absence of depression, whereas additional information on steps taken per day and survey day is necessary to predict the absence of distress.
Further, upon repeating each set of analyses within the MA group using standard linear mixed-effects models adjusted for age, sex, survey day, and weekend with a random effect of participant, the general direction of the associations was consistent across all models, with the exception of steps, for which there was a significant association only between steps and depression (see Supplementary Tables 22–33 for full results). All in all, we call for future researchers to visually inspect their data and assess for zero-inflation when there are control groups that may inevitably report zero symptoms and thereby compromise model performance. We also suggest the use of DEPNA or network models to assess how different variables interact with one another given the dynamics of mood.

Second, models assessing the relationship of extrinsic and intrinsic motivation with anxiety indicated residual dispersion, as did the models for screentime and steps (see Supplementary Tables 5–16 for results). Therefore, while we include these results, we advise caution in their interpretation. Third, the steps data downloaded from the mindLAMP server came from three different sources: (1) a pedometer, (2) Apple Health and (3) Google Mobile Services. Within these data, there were instances in which the source was not clearly specified, and null values were assumed to come from the pedometer after consulting with the application’s platform developers. It is unclear whether the pedometer or Digital Health Ecosystems data had greater sensitivity; however, we did find a moderate correlation between the two sources (see Supplementary Fig. 8). Therefore, we do not expect that the choice of data source would alter the presented results. Further, we recognize that smartphone-derived step counts may not capture all physical activity data reliably and are limited to capturing instances in which participants carry their phone on their person. Future researchers may capture greater sensitivity of movement by integrating data from an accelerometer or wearable device and pairing it with location data for further contextualization. Finally, we acknowledge the lack of diversity of our study sample, which may raise concerns about the lack of representation of ethnic minorities in research studies. Although digital phenotyping aims to offer insights into real-world patient populations, its effectiveness is compromised when certain groups are excluded. This exclusion can stem from factors such as lower smartphone ownership, digital literacy, or limited access to healthcare, which in turn exacerbates these groups’ vulnerability to mental health issues.
To ensure greater access and a greater representation of ethnic diversity, we encourage researchers to supply digital devices and engage in efforts to actively recruit participants from underserved communities.

In conclusion, this study presents EMAs of anxiety, distress and depression symptom severity in individuals with mood and anxiety disorders that corresponded well to gold-standard in-lab measures. These findings highlight the potential of digital phenotyping for accurately assessing and monitoring psychiatric conditions with good adherence. Furthermore, using a combination of ZIP models, network analysis, and time-lagged linear mixed effects models, the presented work highlights how underlying cognitive measures such as intrinsic and extrinsic motivation may be most influential in predicting symptom severity and physical/digital activity.

Methods

Participants

Adult volunteer research participants (ages 18–75) were recruited from the greater New York City area through the Depression and Anxiety Center at the Icahn School of Medicine at Mount Sinai (ISMMS). This age range was selected to capture a broad spectrum of individuals with behaviors relevant to mental health and phone usage across the lifespan. Given the level of comorbidity and overlapping symptom profiles of anxiety, depressive or stress-related disorders in our study sample (63.5% comorbidity), participants were included in a single mood and anxiety (MA) disorders group if they met DSM-V criteria for MDD, post-traumatic stress disorder (PTSD), or an anxiety disorder (including GAD, Social Anxiety Disorder, and Panic Disorder) as determined by the Structured Clinical Interview for DSM-V Axis Disorders (SCID)76. Grouping participants in this manner allowed for a transdiagnostic approach and dimensional analyses of symptom severity, rather than categorical diagnoses, which might overlap across individuals and heterogeneous presentations. Healthy control (HC) participants free from any current or past psychiatric diagnoses as determined by the SCID or the MINI were also enrolled. Participants were excluded if they did not speak English or own a smartphone that could run the study applications. After screening, the full MASQ was completed in-lab to assess Anxious Arousal, General Distress, and Anhedonic Depression to capture shared underlying dysfunctions that might contribute to both mood and anxiety disorders6. All study procedures were conducted in accordance with the guidelines and regulations set by the Program for Protection of Human Subjects and Institutional Review Board at the ISMMS. Participants provided written informed consent and were compensated for their time.

Digital phenotyping

The smartphone application, mindLAMP, was utilized to capture active data and passive data on both Apple and Android personal smartphone devices over a 30-day study period. Active data included daily single-item measures of Anxious Arousal (anxiety), General Distress (distress), and Anhedonic Depression (depression) (see Table 3). These single-item scales were developed by summarizing questions that constitute the MASQ35 tripartite subscores for Anxious Arousal, General Distress, and Anhedonic Depression (see Table 3). Participants also completed two exploratory novel measures of intrinsic and extrinsic motivation on the same daily basis (see Table 3). The intrinsic motivation measure (see Table 3) was developed by summarizing themes from the interest/enjoyment subscale of the Intrinsic Motivation Inventory77 and the Work Extrinsic and Intrinsic Motivation Scale78. Meanwhile the extrinsic motivation measure (see Table 3) was developed by summarizing themes surrounding work motivation from the Work Extrinsic and Intrinsic Motivation Scale78. Participants provided responses to each measure on a 0–10 point Likert scale, with 0 indicating ‘Strongly Disagree’ and 10 indicating ‘Strongly Agree.’ Passive data were also collected continuously in the background via smartphone sensors and served to monitor screentime and steps taken per day through a pedometer, Apple Health application79, and Google Mobile Services. Participants were guided through download of mindLAMP onto their own smartphone. Each week, the research team assessed adherence and provided feedback to the participant via a standardized email template to encourage adherence80.

Table 3 EMAs administered to participants via the mindLAMP application

Preprocessing

Active and passive data were preprocessed in Python. Survey days were coded as chronological days since enrollment and stratified to begin at 6 am to account for potential duplicates from the built-in sensors. Survey responses that occurred on the weekend were determined based on the date on which a survey was completed and coded as ‘1’ for weekend and ‘0’ for weekday. A flow chart illustrating the breakdown of participants is shown in Fig. 1. Briefly, while N = 114 participants were enrolled in the study, data from N = 8 participants could not be preprocessed due to technical issues resulting in no data or a complete lack of participant engagement. Therefore, data from N = 106 participants were successfully obtained and preprocessed. Of these, N = 101 participants (49 HC, 52 MA) met criteria for inclusion based on completing at least 3 days of active data (see Table 1 for participant demographics). These participants were included in an adherence analysis given the study requirements for completing 30 days of surveys. Of the N = 101 participants with active data, N = 80 had MASQ scores available and were included in the analysis for assessing the reliability of the eMASQ-EMAs.

Passive data were obtained from a total of N = 106 participants. However, only N = 73 participants (38 HC, 35 MA) had steps sensor data available from the mindLAMP app and N = 81 participants (43 HC, 38 MA) had screentime sensor data available and met criteria for inclusion based on having at least 3 days of data (see Fig. 1 for information on missingness and exclusion criteria). Steps sensor data were collected from three distinct sources via mindLAMP: (1) a pedometer within the mindLAMP platform, which was used to calculate daily steps by taking the maximum step count on a given day, (2) Apple Health, which was used to calculate steps taken per day by taking the cumulative sum of step counts on a given day, and (3) Google Mobile Services, which was used to calculate steps taken per day similarly to the Apple Health source. Apple Health and Google Mobile Services data were considered Digital Health Ecosystems data and selected as the primary source of steps data. To account for days with missing Digital Health Ecosystems steps data, we interpolated values from the pedometer source on days where this source had data available, given the moderate correlation that exists across values between these two data sources (see Supplementary Fig. 8). Finally, to account for a sensor-related error margin, values recorded within 30 s of each other and within 10% magnitude of each other were considered duplicates and only the first entry was included (see Supplementary Fig. 9).
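
The steps preprocessing described above can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function names, the `(unix_seconds, step_value)` tuple layout, the comparison against the last retained reading, and the definition of "within 10% magnitude" as a relative difference against the larger value are all assumptions.

```python
def drop_near_duplicates(readings, max_gap_s=30, max_rel_diff=0.10):
    """Keep only the first of readings that are close in time and magnitude.

    `readings` is a list of (unix_seconds, step_value) tuples (assumed layout).
    A reading is treated as a duplicate of the last kept reading when it falls
    within `max_gap_s` seconds and within `max_rel_diff` relative magnitude.
    """
    kept = []
    for ts, value in sorted(readings):
        if kept:
            prev_ts, prev_value = kept[-1]
            close_in_time = (ts - prev_ts) <= max_gap_s
            scale = max(abs(prev_value), abs(value), 1)
            close_in_size = abs(value - prev_value) <= max_rel_diff * scale
            if close_in_time and close_in_size:
                continue  # duplicate: the first entry is already kept
        kept.append((ts, value))
    return kept


def daily_steps(day_values, source):
    """Pedometer values are running totals (take the max);
    ecosystem values are increments (take the sum)."""
    if source == "pedometer":
        return max(day_values)
    return sum(day_values)


def merge_sources(ecosystem_daily, pedometer_daily):
    """Prefer ecosystem values; fill days they lack from the pedometer."""
    merged = dict(pedometer_daily)
    merged.update(ecosystem_daily)
    return merged
```

For example, `merge_sources({"d1": 5000}, {"d1": 4800, "d2": 3000})` keeps the ecosystem value for `d1` and falls back to the pedometer for `d2`.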

Screentime sensor data were subset into epochs of morning (6 am–12 pm), afternoon (12 pm–6 pm), evening (6 pm–12 am), and overnight (12 am–6 am). The raw screentime data consisted of timestamps corresponding to each instance of change in “device state,” including “screen on,” “screen off,” “device locked,” and “device unlocked,” with each timestamp denoting the transition between these states. Preprocessing involved segmenting the data by calculating the duration between each instance of “screen on” and either “screen off” or “device locked,” accounting for instances where the device transitioned directly to “device locked.” Subsequently, the total screentime for each quadrant of the day was computed and days with missing screentime data were removed. Segments with durations of less than 30 s were excluded to minimize the influence of brief screen activations, often attributed to notifications rather than active use. Taking the raw data, we separately subset participants with a minimum of 5 days of active, steps, and screentime data to retain sufficient timepoints for a reliable correlation analysis between each feature. This resulted in a sample of N = 48 participants (27 HC, 21 MA). Using this dataset, we extracted the time course of each of the measures to apply the Dependency Network Analysis (DEPNA) method described in detail in the following section (Integrating data over time: Dependency Network Analysis (DEPNA)).
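
As an illustration of the screentime segmentation, the sketch below pairs each "screen on" event with the next "screen off" or "device locked" event, drops segments shorter than 30 s, and sums durations per epoch. The function names, the `(unix_seconds, state)` event layout, the use of UTC, and the choice to attribute a whole segment to the epoch in which it starts are assumptions made for the example.

```python
from datetime import datetime, timezone


def screen_segments(events, min_duration_s=30):
    """Return (start_ts, duration_s) for each screen-on period of at least
    `min_duration_s` seconds. `events` is a list of (unix_seconds, state)."""
    segments = []
    start = None
    for ts, state in sorted(events):
        if state == "screen on":
            if start is None:
                start = ts
        elif state in ("screen off", "device locked") and start is not None:
            duration = ts - start
            if duration >= min_duration_s:
                segments.append((start, duration))
            start = None
    return segments


def epoch_of(hour):
    """Map an hour of day (0-23) to the four epochs used in the text."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "overnight"


def screentime_by_epoch(segments, tz=timezone.utc):
    """Sum segment durations per epoch, keyed by the segment's start time."""
    totals = {"morning": 0, "afternoon": 0, "evening": 0, "overnight": 0}
    for start, duration in segments:
        hour = datetime.fromtimestamp(start, tz).hour
        totals[epoch_of(hour)] += duration
    return totals
```

A real pipeline would also need the participant's local time zone and a rule for segments that span an epoch boundary, neither of which is specified here.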

Participant study adherence

To assess how well participants engaged with the application and determine the feasibility of digital phenotyping studies within a psychiatric patient population, we used simple, unpaired two-sample t-tests to determine if there was a significant difference in the average number of days on which participants from each group completed at least one EMA. Additionally, given the requirement to complete surveys daily for 30 days, an unpaired two-sample t-test was used to assess if there is a significant difference in survey completion by day between our two groups.

Assessing reliability and consistency of eMASQ-EMAs

To determine the reliability of each eMASQ-EMA, we constructed models with the eMASQ-EMA as the dependent variable, the corresponding MASQ score and subgroup classification (MA vs. HC) as independent variables, a covariate for survey day and a random effect for participant. Additionally, we conducted a stratified analysis to examine associations across each group. In this analysis, we refitted three similar regression models without the group term as a predictor for different cohorts (i.e., MA, HC, and combined MA + HC). We evaluated and visualized the differences in regression coefficients for each model using lollipop plots. Finally, to assess the internal consistency of the eMASQ-EMA over time, we extracted intraclass correlation coefficients (ICC) from the models, which were also visualized via lollipop plots. The ICC was interpreted using standard nomenclature where values below 0.5 indicate poor reliability, between 0.5 and 0.75 moderate reliability, and any value above 0.75 indicates good-to-excellent reliability81 (see Supplementary Fig. 1 and Supplementary Result 1). Additional linear mixed effects models adjusted for age, sex, survey day and weekend, with a random effect for participant were also tested to assess the reliability of each eMASQ-EMA within the MA group (see Supplementary Result 2, Supplementary Tables 2–4, Supplementary Fig. 2). To examine variability of these three measures in the real world, the standard deviation over time was computed for each participant in each group and entered into independent-samples t-tests, or Welch tests where appropriate.
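
For orientation, the ICC interpretation bands can be written as a small helper. The sketch assumes the ICC is the share of total variance attributable to the participant random intercept, which is a common definition for random-intercept models but is not confirmed by the text as the exact computation used; the function names are hypothetical.

```python
def icc_from_variances(var_participant, var_residual):
    """ICC as between-participant variance over total variance (assumed form)."""
    return var_participant / (var_participant + var_residual)


def interpret_icc(icc):
    """Apply the bands stated in the text: <0.5 poor, 0.5-0.75 moderate,
    >0.75 good-to-excellent."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    return "good-to-excellent"
```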

Exploring the effects of intrinsic and extrinsic motivation on anxiety, distress and depression

To assess the relationship between intrinsic and extrinsic motivation and eMASQ-EMA symptoms in the whole sample (HC and MA), a series of mixed-effects models were fitted and compared based on the AIC metric (see Supplementary Result 3 and Supplementary Tables 17–20 for results). Zero-inflated Poisson (ZIP) mixed-effects models were determined to be the best fitting models based on AIC (see Supplementary Tables 17–20) and in line with our observation that healthy controls consistently reported on the lower end of the eMASQ-EMA scales, i.e., they produced zero-inflated datasets (see Supplementary Fig. 3). ZIP models are mixture models that consist of two parts: (1) a Poisson count model which serves to estimate the incidence rate ratio (IRR) and (2) a logit model for estimating an odds ratio (OR) and predicting excess zeros82,83. These models were fitted using the GLMMadaptive package’s ‘zi.poisson()’ family in R to assess the relationship between variability in eMASQ-EMA symptoms and intrinsic/extrinsic motivation. All models included group, survey day and a random effect for participant. Additional models were estimated including interaction terms and covariates of age, sex, and weekend. Models were compared using the AIC metric and the most frequently best-fitting model was selected for main results reporting (see Supplementary Table 21). Results tables reported in the supplement include both standardized and unstandardized IRRs and ORs that were generated using the sjPlot package’s ‘tab_model’ function in R with the parameter show.std = “std2” given our models include continuous and binary predictor variables. To aid in interpretation, unstandardized ORs are reported as effect sizes whereby OR < 1.5 and OR > 5 are indicative of small and large effect sizes, respectively84. Anything in between these bounds was considered a medium effect. Meanwhile unstandardized IRRs are reported in the main text and interpreted as percent changes (Percent Change = (IRR − 1) × 100).
To illustrate the results from these models we used the ‘effectPlotData’ function in R to generate partial effects plots with robust (sandwich) standard errors to visualize the fitted model’s effects, which include adjusting for covariates. Visualizing the data in this way enables us to display the predicted relationships between key predictors and the outcome while adjusting for covariates. Importantly, the plots account for model uncertainty, which is especially relevant given the use of a zero-inflated Poisson model to account for the distribution of the outcome which consisted of pseudo count data bounded between 0 and 10 with excess zeros.
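
To make the two-part structure of a ZIP model concrete, the sketch below evaluates the standard zero-inflated Poisson probability mass function. It is an illustration of the distribution only, not of the fitted mixed-effects models; `lam` (the Poisson rate) and `pi_zero` (the probability of a structural zero from the logit part) are hypothetical parameter names.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) for a zero-inflated Poisson.

    With probability `pi_zero` the outcome is a structural zero; otherwise it
    is drawn from a Poisson distribution with rate `lam`.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson
```

Setting `pi_zero` above zero raises the probability of observing 0 relative to a plain Poisson with the same rate, which is the excess-zero behavior the logit component is meant to capture. Note that this distribution is unbounded above, whereas the EMA responses are capped at 10.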

Results from the model adjusting for age, sex, survey day and weekend with a random effect for participant are reported in the main text given this model was most frequently determined to be the best fit. See Supplementary Materials (Supplementary Result 3 and Supplementary Table 21) for full details including results from the model comparison. The IRR and OR for each predictor variable were calculated by taking the exponential of each regression coefficient. Analyses were conducted at the level of the individual as opposed to the observation given the longitudinal nature of the data. In interpreting the results from this model, a significant effect of group, age, and sex would indicate there were differences in symptom severity between HC and MA, different age groups, or males versus females. A significant effect of day would indicate there was a change in symptom severity over the course of the study. Finally, a significant effect of weekend would indicate there were differences in symptom severity on weekends in comparison to weekdays.
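
The conversion from a regression coefficient to a ratio and then to a percent change is simple arithmetic, sketched below with hypothetical function names. The same exponentiation applies to the count part (giving an IRR) and the logit part (giving an OR).

```python
import math


def ratio_from_coefficient(coef):
    """Exponentiate a log-scale coefficient to obtain an IRR or OR."""
    return math.exp(coef)


def percent_change(irr):
    """Percent Change = (IRR - 1) x 100, as defined in the text."""
    return (irr - 1.0) * 100.0
```

For instance, an IRR of 1.25 corresponds to a 25% increase in the expected symptom score per unit increase in the predictor, and an IRR of 0.80 to a 20% decrease.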

Exploring the effects of physical activity and screentime on symptoms

Similarly to the motivation analysis, separate ZIP models were tested and compared to assess the relationship between variability in eMASQ-EMA symptoms and physical activity (steps taken per day) and between eMASQ-EMA symptoms and digital activity (screentime) within the whole sample. Again, results from the model adjusting for age, sex, survey day and weekend with a random effect for participant are reported in the main text with results from competing models available upon request.

Steps and screentime data were scaled within-participants to assess how fluctuations in a participant’s own step count/screentime were associated with their respective symptom severity over time. Therefore, model coefficients reported from these data are standardized. Datapoints greater than 3 standard deviations from the mean were considered outliers and removed from the following analysis. The IRR and OR for each predictor variable were calculated by taking the exponential of each regression coefficient. Analyses were conducted at the level of the individual as opposed to the observation given the longitudinal nature of the data. Separate analyses assessing how differences in step counts and screentime across participants relate to symptom severity were conducted by scaling between participants and are available upon request.
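
A minimal sketch of within-participant scaling with a 3 SD outlier rule is shown below. Several details are assumptions, since the text does not fix them: outliers are identified against each participant's own mean and sample standard deviation, they are removed before z-scoring, and the function names and dictionary layout are hypothetical.

```python
from statistics import mean, stdev


def drop_outliers(values, threshold=3.0):
    """Remove values more than `threshold` sample SDs from the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return list(values)
    return [v for v in values if abs(v - m) <= threshold * s]


def zscore_within(values):
    """Z-score a single participant's series against its own mean and SD."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def scale_by_participant(series_by_participant):
    """Apply outlier removal and z-scoring separately for each participant."""
    return {
        pid: zscore_within(drop_outliers(values))
        for pid, values in series_by_participant.items()
    }
```

Because each series is centered on the participant's own mean, a scaled value of +1 means "one SD above this person's typical day" rather than above the sample average.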

To explore the timing of associations between physical activity/digital activity and symptoms on subsequent time-points, a time-lagged linear mixed effects model was fitted and adjusted for survey day with a random effect for participant, using the ‘lead’ function from the dplyr package in R. Each eMASQ-EMA served as the dependent variable and steps/screentime served as the independent variable. A time lag varying from 1 to 7 time-periods was placed on each eMASQ-EMA to determine how physical activity/digital activity can affect mood and anxiety symptoms within psychiatric groups at subsequent time-points. We primarily focus on a 1- and 7-period lag given the lack of regular interval reporting periods in our data.
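
The lagging step can be illustrated with a small Python analogue of dplyr's `lead`, applied to one participant's ordered series. The helper names are hypothetical, and the sketch shifts by observation periods rather than calendar days, which matches the "time-periods" wording but is otherwise an assumption.

```python
def lead(values, n=1):
    """Shift a series forward by `n` positions, padding the tail with None,
    so that position t holds the value observed at t + n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return list(values)
    return list(values[n:]) + [None] * min(n, len(values))


def lagged_pairs(predictor, outcome, n=1):
    """Pair activity at period t with the symptom score at period t + n,
    dropping pairs where either value is missing."""
    led = lead(outcome, n)
    return [
        (x, y) for x, y in zip(predictor, led)
        if x is not None and y is not None
    ]
```

With irregular reporting, "n periods later" can correspond to different numbers of calendar days across participants, which is the confound noted in the limitations.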

Integrating data over time: Dependency Network Analysis (DEPNA)

To assess how all measures interacted with one another over time and evaluate which measure (symptom/activity) most impacts individuals with a mood/anxiety disorder, we applied the DEPNA model to the full dataset85,86,87,88,89,90,91,92. DEPNA is a graph theory network method for constructing a directed graph. It takes as input the time course of each variable (i.e., node) such that participants’ symptom scores as they occurred from Days 1–30 are maintained. Given participants’ lack of regular interval reporting periods, DEPNA proves to be an advantageous method for modeling digital phenotyping data as it does not require a high temporal resolution, unlike other methods such as Granger causal modeling. Further, unlike structural equation models, DEPNA does not require any specific a-priori direction of influence. In this way, the application of the DEPNA model allows us to conduct a data-driven examination of a large network, illustrate the connections between different digital phenotyping variables (i.e., nodes) and use arrows to show the direction in which each variable impacts other variables or is impacted by other variables.

DEPNA was originally introduced for the study of financial data90,93 and has since been extended and applied to other systems, such as the immune system92, semantic networks91 and functional brain networks87. Briefly, DEPNA assesses a node’s centrality in a network based on its correlational influence, which is derived from the partial correlation coefficient. This coefficient quantifies the correlation that remains between two nodes after the effect of a third node has been removed. In simpler terms, the partial correlation reflects the direct relationship between two variables while controlling for the influence of a third. Correlational influence, then, is the difference between the overall correlation of two nodes and their partial correlation when accounting for a third node. When this difference is large, it means a significant fraction of the correlation between a pair of nodes can be explained by the effect of a third node. Given correlation does not imply causation, we cannot infer causal influence from this method, and only infer the network’s hierarchy of influence based on correlational influences87. In this way, DEPNA offers a new computational model for quantifying and comparing directed graphs based on timeseries data.

In this approach, each of the active and passive data measures represents a node in the graph. First, the time course of every node (i.e., measure) was normalized within participants using z-scores. Then, the pairwise node–node connectivity matrix was calculated using Pearson correlations and normalized using a Fisher r-to-Z transformation. Next, we calculated the partial correlation between two nodes, i and k, with respect to a third node, j – PC(i,k | j). The influence of node j on the pair of nodes i and k was then defined as the difference between the correlation, C(i,k), and the partial correlation, PC(i,k | j), given by Eq. (1) and illustrated in Fig. 6a:

$$d(i,k|j)\equiv C(i,k)-PC(i,k|j)$$
(1)
$$PC(i,k|j)=\frac{C(i,k)-C(i,j)C(k,j)}{\sqrt{[1-{C}^{2}(i,j)][1-{C}^{2}(k,j)]}}$$

This coefficient is large only when a significant fraction of the correlation between nodes i and k can be explained in terms of node j. We then repeated this process to calculate the partial correlation effect for each node on all other pairwise correlations in the network. The total influence of node j on node i, D(i,j) is defined as the average influence of node j on the correlations C(i,k), over all nodes k, given by Eq. (2) and Fig. 6b:

$$D(i,j)=\frac{1}{N-1}\mathop{\sum }\limits_{k\ne j}^{N-1}d(i,k|j)$$
(2)

The node dependencies define a dependency matrix D, whose (i,j) element is the influence of node j on node i. Particularly, the dependency matrix is nonsymmetrical since the influence of node j on node i is not equal to the influence of node i on node j.
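
Equations (1) and (2) can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the input is a precomputed symmetric correlation matrix of raw r values (the text does not say whether r or Fisher-transformed values enter the partial correlation), and the term k = i is skipped because it contributes exactly zero analytically.

```python
import math


def partial_corr(c_ik, c_ij, c_kj):
    """PC(i,k|j): correlation of i and k after removing the effect of j."""
    return (c_ik - c_ij * c_kj) / math.sqrt((1 - c_ij ** 2) * (1 - c_kj ** 2))


def influence(c_ik, c_ij, c_kj):
    """d(i,k|j) = C(i,k) - PC(i,k|j), as in Eq. (1)."""
    return c_ik - partial_corr(c_ik, c_ij, c_kj)


def dependency_matrix(C):
    """D[i][j]: average influence of node j on the correlations of node i,
    following Eq. (2). `C` is a square correlation matrix (list of lists)."""
    n = len(C)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            total = 0.0
            for k in range(n):
                if k == j or k == i:
                    continue  # k = i would add zero, since C(i,i) = PC(i,i|j) = 1
                total += influence(C[i][k], C[i][j], C[k][j])
            D[i][j] = total / (n - 1)
    return D
```

As a worked check, with C(i,k) = 0.5, C(i,j) = 0.6 and C(k,j) = 0.8, the partial correlation is 0.02 / 0.48 ≈ 0.042, so d(i,k|j) ≈ 0.458: most of the i–k correlation is attributable to j. The sketch would fail with a division by zero if any off-diagonal correlation were exactly ±1.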

The ‘Influencing Degree’ of node j is defined as the sum of the influence of node j on all other nodes i, given by Eq. (3) and Fig. 6c:

$$Influencing\,Degree(j)=\mathop{\sum }\limits_{i\ne j}^{N-1}D(i,j)$$
(3)

The ‘Influencing Degree’ measure indicates the hierarchy of efferent (out-degree) influence of the node on the entire network. The higher this measure, the greater its impact on all other connections in the network and the more likely it is to generate the information flow in the network. The influence of the network on node j is termed the ‘Influenced Degree’ and is defined as the sum of the influences (or dependencies) of all other nodes i in the network on node j, given by Eq. (4):

$$Influenced\,Degree(j)=\mathop{\sum }\limits_{i\ne j}^{N-1}D(j,i)$$
(4)

The higher the ‘Influenced Degree’ measure, the more this node was dependent or influenced by all the other nodes in the network.
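
Given a dependency matrix D, Eqs. (3) and (4) reduce to column and row sums that exclude the diagonal. The sketch below uses hypothetical function names and made-up matrix values purely for illustration.

```python
def influencing_degree(D, j):
    """Sum of D[i][j] over i != j: how strongly node j influences the others."""
    return sum(D[i][j] for i in range(len(D)) if i != j)


def influenced_degree(D, j):
    """Sum of D[j][i] over i != j: how strongly node j is influenced by the others."""
    return sum(D[j][i] for i in range(len(D)) if i != j)


def rank_by_influence(D, names):
    """Order node names from highest to lowest Influencing Degree."""
    scores = {name: influencing_degree(D, j) for j, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative values only; D[i][j] is the influence of node j on node i.
example_D = [
    [0.00, 0.20, 0.10],
    [0.05, 0.00, 0.30],
    [0.00, 0.40, 0.00],
]
example_ranking = rank_by_influence(example_D, ["a", "b", "c"])
```

In this toy matrix node "b" has the largest column sum, so it sits at the top of the hierarchy of influence.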

Next, we conducted a between-group two-sample t-test for each node’s degree of influence. All influencing or influenced t-test results were corrected for multiple comparisons (N = 8, number of nodes in the network) using false discovery rate (FDR) correction with a p < 0.05 threshold. To create the network graph visualization (Fig. 6d), we used the pair-wise dependency connectivity matrix. A two-tailed t-statistic was computed to compare the two groups. We then connected only pair-wise nodes with dependencies that were significantly different between the two groups (p < 0.05, FDR corrected for number of nodes), creating a simple graph visualization of the differences between the groups. Graph visualization was conducted using the NetworkX library in Python94.
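
The text does not name the FDR procedure used. Assuming the common Benjamini–Hochberg step-up method, adjusted p-values for a set of node-level tests could be computed as in the sketch below; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as a representative FDR procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A node or edge would then be retained when its adjusted p-value falls below 0.05.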