Introduction

Mental health care systems face a multitude of serious challenges, including an ever-increasing demand for mental health support, insufficient funding, employment insecurity, high staff turnover, understaffing, and clinician burnout1,2,3,4. Concurrently, people seeking mental health support frequently encounter barriers to care such as long waitlists, reduced appointment availability, prohibitive financial costs, and limited access to ongoing sessions5,6,7.

The growing need for mental health care, alongside the proliferation of smartphone ownership, has catalysed the development of innovative digital interventions designed to address unmet mental health needs2,8. These technologies represent a paradigm shift in mental health care delivery9 by improving treatment accessibility and offering the potential to mitigate pressure on overburdened in-person services. By providing scalable, cost-effective, and evidence-based treatment options, digital interventions present a promising approach to meeting the increasing demand for mental health support10,11,12.

Digital mental health apps can be self-guided and fully automated, allowing users to access support independent of a clinician13. These apps offer the potential to increase access to support, reduce costs11,14,15, and help manage wait times for clinical services16. Digital interventions can also complement traditional face-to-face care through ‘blended care’ models, in which clinicians integrate technology into their practice to enhance both the online and offline components of care17,18,19,20. For example, clinicians may incorporate digital mood tracking between sessions, provide access to complementary online psychosocial content to reinforce therapeutic concepts, and use synchronous (e.g., real-time chat or video calls) or asynchronous (e.g., secure messaging) online communication to maintain engagement and support outside of scheduled appointments. Moreover, emerging digital interventions have the potential to address critical gaps across various phases of care—such as waitlists, discharge, and relapse prevention—thereby enhancing their value throughout the treatment journey21. The scalability of digital interventions holds promise for delivering personalised treatments to a much larger population than traditional face-to-face services can accommodate2,22. Consequently, evidence-based digital technology has emerged as a valuable resource for addressing the disparity between the demand for mental health care and the available supply1.

Digital mental health interventions are generally well-regarded by clinicians and young people23. They have demonstrated efficacy both in high-prevalence disorders such as depression and anxiety12,24,25 and in complex, severe disorders such as psychosis26. Additionally, there are indications that digital interventions can be cost-effective compared with alternative treatments11,27. However, low user engagement remains a significant and long-standing issue3,28,29,30. In a review of real-world user engagement with popular mental health apps, Baumel et al. 31 reported an engagement rate of 3.9% by day 15, which declined to 3.3% by day 30. In a systematic review by Fleming et al.29 examining user engagement in samples of people with depression and anxiety, sustained usage varied widely, ranging from 0.5% to 28.6%. These low and inconsistent engagement rates, despite the positive outcomes demonstrated in controlled environments12, raise concerns about the clinical utility and effectiveness of digital mental health interventions in real-world scenarios3.

Overcoming the engagement challenge is an essential component in bridging the gap between potential effectiveness and practical impact2. To be efficacious, digital interventions must enable offline therapeutic action—desired health behaviours initiated within a digital mental health app that are subsequently adopted into real-world settings32. This aligns with Cole-Lewis et al.'s 33 concept of Big E (health behaviour engagement) proposed in the framework for digital behaviour change interventions, which emphasises the importance of engagement in leading to actual health behaviour changes. According to this model, achieving Big E—the real-world adoption of health behaviours—is contingent on “Little e” (user engagement with the app’s features), which includes interactions with both app elements (e.g., games) and embedded behaviour change techniques designed to influence health outcomes (e.g., providing choice to support autonomy, as informed by Self-Determination Theory)47. The framework underscores that meaningful interaction with a digital intervention is a necessary precursor to real-world change, reinforcing that without engagement, even the most well-designed interventions are unlikely to achieve their intended impact.

In digital health literature, there is no universally accepted definition of user engagement3. This review adopts Borghouts et al.'s 34 comprehensive definition of engagement, which encompasses both initial adoption and continued interaction with the digital intervention, evidenced by behaviours such as signing up, using its features, and sustained use over time.

While the causal link between engagement and intervention efficacy remains unclear, it is broadly acknowledged that some degree of engagement is necessary for users to benefit from an intervention. Further, a lack of engagement complicates attributing positive outcomes to the intervention3. The relationship between user engagement and intervention effectiveness is complex and presumed to be influenced by various factors, such as type of engagement, the intervention itself, and individual user characteristics24. A nuanced understanding of the diverse and potentially interconnected factors influencing engagement is essential for shaping user behaviour and translating clinical efficacy into real-world outcomes.

The exploration of persuasive design holds promise for enhancing user engagement and supporting improved intervention outcomes30,35,36. The concept of persuasive design was developed to leverage technology to positively influence behaviour change and discourage harmful behaviour at an individual level37. This method is based on the concept that technology can serve more than just a functional purpose; it can also act as a catalyst to promote and support targeted behaviours, emotions and mental states35,37. Through the strategic application of persuasive design principles and strategies, these systems aim to motivate and support users in fostering positive shifts in attitudes and behaviours35,36,37,38.

The persuasive systems design (PSD) framework, proposed by Oinas-Kukkonen and Harjumaa38, consists of 28 principles categorised into four domains: (1) primary task support, which facilitates the primary goal of the intervention, (2) dialogue support, which enables communication between the intervention and the user, (3) system credibility support, which enhances the trustworthiness and credibility of the intervention; and (4) social support, which leverages a social experience within the intervention. Each domain is further broken down into 7 distinct persuasive principles. These principles serve as a roadmap for developers to craft persuasive systems that produce more compelling products to engage users and foster positive behaviour change over time39. See Table 1 for more information.
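
For illustration, the framework's structure lends itself to a simple lookup representation. The sketch below (Python, chosen purely for illustration) enumerates the four domains and the seven principles each contains, using the principle names from the original framework38; the dictionary representation itself is ours, not part of the framework.

```python
# The PSD framework's four domains, each with seven principles (28 in total),
# as named by Oinas-Kukkonen and Harjumaa38. Representation for illustration only.
PSD_FRAMEWORK = {
    "primary task support": [
        "reduction", "tunnelling", "tailoring", "personalisation",
        "self-monitoring", "simulation", "rehearsal",
    ],
    "dialogue support": [
        "praise", "rewards", "reminders", "suggestion",
        "similarity", "liking", "social role",
    ],
    "system credibility support": [
        "trustworthiness", "expertise", "surface credibility",
        "real-world feel", "authority", "third-party endorsements",
        "verifiability",
    ],
    "social support": [
        "social learning", "social comparison", "normative influence",
        "social facilitation", "cooperation", "competition", "recognition",
    ],
}

# Sanity check: four domains of seven principles each.
assert sum(len(v) for v in PSD_FRAMEWORK.values()) == 28
```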

Table 1 Persuasive systems design framework38

Although limited research has examined the role of PSD principles in digital interventions for mental health, Kelders et al. 30 identified a significant relationship between user adherence to web-based health interventions and the application of persuasive design elements. While the study did not focus solely on mental health, it found that design principles within the dialogue support domain—which aim to facilitate effective communication between the system and the user—were associated with increased adherence. As Kelders et al. 30 note, the study was constrained by a lack of usable reported usage data, as well as by the coding process, which relied solely on published descriptions. The authors recommend that future research further investigate the relationship between persuasive technology—particularly primary task support—and clinical outcomes in digital interventions.

Orji and Moffatt36 reviewed 16 years of literature on persuasive technology in health and wellness. The authors reported that although 92% of the studies reported positive results, the review did not identify a statistically significant correlation between the use of persuasive principles and the intervention outcomes. In 2021, McCall et al. 25 conducted a systematic review, meta-analysis, and meta-regression to examine the impact of persuasive design principles in self-directed eHealth interventions. Their findings provided modest preliminary support that interventions utilising more persuasive elements in the primary task domain were more effective for treating depression, but not anxiety. In contrast, Wu et al. 40, in a separate meta-analysis of persuasive design in smartphone apps for anxiety and depression, reported a negative association between persuasive design principles and engagement, as measured by completion rates, further highlighting the inconsistent evidence surrounding the role of persuasive design in digital interventions.

Given the rapid proliferation of mental health apps and the persistent challenge of sustaining user engagement, reviewing the application of persuasive design principles, and their impact on user engagement rates and overall effectiveness, is both relevant and timely. Previous research has often focused on broader eHealth interventions beyond mental health, restricted the scope to web-based interventions, or concentrated on specific conditions such as depression and anxiety. In contrast, this review takes a platform-specific approach by examining smartphone apps designed to address a broad range of mental health conditions. This focus helps address existing gaps and provides a more comprehensive understanding of how persuasive design principles influence both engagement and intervention efficacy across diverse mental health domains. Specifically, this review aims to (1) conduct a systematic review and meta-analysis of randomised controlled trials of smartphone mental health apps, (2) systematically assess the prevalence and types of persuasive design principles used in these interventions, and (3), via meta-regression, examine the relationships between persuasive design principles and the efficacy and engagement levels of digital mental health apps. Comprehensively exploring these factors will provide valuable insights to guide the development and refinement of mental health apps, ultimately contributing to enhanced mental health outcomes for users.

Results

The search identified 5030 records, with 4028 remaining after duplicate removal. Following title and abstract screening, 390 articles proceeded to full-text review. Of these, 119 met the inclusion criteria for the systematic review, with 92 providing sufficient data for meta-analysis. The remaining 27 studies were included in the narrative synthesis but excluded from meta-analysis due to insufficient pre/post-outcome data or limited intervention details. The PRISMA 2020 flow diagram outlining the selection process is presented in Fig. 1.

Fig. 1: A total of 5030 records were identified via database searches.

After removing 1002 duplicates, 4028 articles underwent title and abstract screening, with 3638 excluded. Full-text eligibility assessment was conducted for 390 articles, of which 271 were excluded. The final review included 119 studies.

Screening was conducted independently by two authors (L.V. and J.D.X.H.), with proportional agreement rates of 89.81% (3615/4025) for title and abstract screening and 77.24% (275/356) for full-text screening.

Study characteristics

A total of 119 studies, comprising 30,251 participants, were included in the systematic review. Studies were published between 2016 and 2022 and spanned 27 countries, with the largest proportion of participants from the United States (31.1%), followed by the United Kingdom and Germany (8.4% each) and Australia (7.6%). Sample types varied, with 36 studies (30.3%) including non-clinical populations, 48 (40.3%) targeting subclinical populations, and 35 (29.4%) focusing on clinical samples. Mean participant age ranged from 14 years to 60 years (median = 34 years). Most interventions (n = 80, 67%) were fully self-guided, while 39 (33%) incorporated human support. Apps were more likely to include human support in clinical (60%, 21/35) and subclinical (29%, 14/48) populations than in non-clinical samples (11%, 4/36). Intervention duration ranged from 10 days to 18 months.

Depression was the most commonly targeted mental health condition, accounting for 25.64% of the studies. A total of 11% of the studies focused on a combination of depression, anxiety, and stress, whereas 9.4% of the studies focused on stress alone. Other mental health conditions targeted included eating disorders and body dissatisfaction/dysmorphic disorder (11%), generalised anxiety (7.69%), post-traumatic stress disorder (PTSD; 5.98%), general mental health and wellness (4.27%), psychotic spectrum disorders (3.4%), postnatal depression (3.4%), psychological distress (3.4%), a combination of psychotic spectrum and bipolar disorder (2.56%), non-suicidal self-injury (1.7%), obsessive-compulsive disorder (OCD; 1.7%), agoraphobia and panic disorder (1.7%), suicidal ideation (0.85%), sleep disturbance (0.85%), resilience (0.85%), loneliness (0.85%), burnout (0.85%), and bipolar disorder (0.85%).

Most interventions were grounded in one or more psychological frameworks. Cognitive-behavioural therapy (CBT) was the most commonly used (60.5%, n = 72), followed by mindfulness-based approaches (23.5%, n = 28), Acceptance and Commitment Therapy (ACT) (4.2%, n = 5), and Dialectical Behaviour Therapy (DBT) (2.5%, n = 3). A small proportion of studies (5%, n = 6) did not specify a therapeutic framework. A detailed summary of study characteristics is presented in the supplementary information.

Risk of bias

The risk of bias was assessed using the Cochrane Risk of Bias Tool version 241 across five domains. Of the 119 studies included, 7% were classified as having a high risk of bias in at least one domain, while 13% were rated as having a low risk of bias across all domains. The majority (80%) were rated as having a mix of low and unclear risk across domains. A comprehensive assessment of the risk of bias across the five categories for each study is presented in Table 1 of the supplementary information.

Publication bias

Publication bias for the overall effects meta-analysis was explored via visual inspection of the funnel plot of standard errors and trim and fill estimates. As shown in Fig. 2, the funnel plot appears symmetrical across both sides of the final estimate. Furthermore, trim and fill analyses revealed no imputed studies on the left- or right-hand side of this estimate. Hence, we noted limited evidence of publication bias.

Fig. 2: The funnel plot visualises the distribution of standard errors for primary studies included in the overall effects meta-analysis.

Blue dots represent individual studies, while dashed lines indicate the 95% pseudo-confidence intervals. The estimated overall effect size (observed studies only) is shown as a dashed line, and the estimated effect size with imputed studies (if any) is shown as a solid black line. The plot appears symmetrical, and trim-and-fill analyses identified no imputed studies, suggesting limited evidence of publication bias in the meta-analysis.

Efficacy: overall and sub-group effects of intervention outcomes

Of all studies included in this review, 92 (N = 16,782 total participants) provided sufficient information for effect sizes to be calculated for intervention efficacy and were therefore included in the meta-analysis. Across these studies, sample sizes ranged from 16 to 2271 participants (M = 182.41, SD = 266.53) at study intake. There was an overall significant effect of intervention efficacy of medium size (Hedges’ g = −0.43, 95% CI [−0.53, −0.34], p < 0.001), indicating that at the post-intervention time point across studies, those in the intervention groups showed significant improvements in mental health outcomes compared with those in the control groups. Notably, however, there was substantial heterogeneity present within this effect (I2 = 83.4, range 46.3–98.7). Thus, we examined potential moderators to explain some of this variance. Table 2 displays the meta-analytic effects and heterogeneity estimates for both the total effect and sub-group moderation analyses, and Fig. 3 presents the forest plot of effect estimates per study.

Fig. 3: The figure displays individual study effect size estimates (Hedges’ g) with corresponding 95% confidence intervals.

Squares represent the point estimates of intervention efficacy. Horizontal lines indicate confidence intervals, and the vertical line at zero represents no effect. Studies to the left of zero reflect a reduction in symptoms and thus favour the intervention, while those to the right suggest a negative or null effect.

Table 2 Meta-analytic and heterogeneity estimates for the overall effect, and sub-group moderation effects, of intervention efficacy

The sub-group moderation analysis revealed that the overall effect of intervention efficacy did not significantly differ according to sample type (Q(2) = 3.86, p = 0.145), outcome type (Q(9) = 14.48, p = 0.106), the presence of human support in interventions (Q(1) = 0.07, p = 0.789), or intervention type (Q(4) = 3.40, p = 0.493) (Table 2).

The analysis demonstrated that digital interventions were significantly more effective than control conditions in reducing symptoms of depression, anxiety, stress, PTSD, and body image/eating disorders. No significant effects were observed for social anxiety, psychosis, suicide/self-harm, broad mental health outcomes, or postnatal depression. However, it is important to note that these outcomes were each represented by only 2 to 4 studies, limiting the strength of the pooled estimates. In contrast, depression, anxiety, and stress were represented by 8 to 29 studies each, allowing for more robust and reliable effect size estimates.

Engagement: overview and key metrics across studies

Seventy-six percent (n = 90) of studies included in the systematic review provided data on engagement, while the remaining 24% (n = 29) did not. Among the papers reporting engagement data, a total of 25 distinct engagement indicators were identified. These indicators were grouped into ten overall engagement metrics: (1) rate of uptake, (2) time (min/h) spent on the app, (3) days of active use, (4) logins, (5) modules completed, (6) study metrics, (7) messages sent and received, (8) posts and comments made, (9) participant self-reports, and (10) miscellaneous; see Table 3 for details. The most commonly reported user engagement metrics were: the percentage of participants who completed the entire programme or completed the programme per protocol, ranging from 8%42 to 100%43; the average percentage of modules completed, ranging from 30.95%44 to 98%45; and the mean number of logins or visits to the app during the intervention period, ranging from 11.52 logins46 to 106.84 logins26.

Table 3 Range of user engagement metrics across digital mental health interventions

A correlation analysis was conducted to examine the relationship between engagement and efficacy. Due to substantial variation in engagement measurement across studies, the analysis focused on the percentage of programme or per-protocol completion (n = 17), the most commonly reported metric, alongside pre- and post-intervention effectiveness data. No significant relationship was identified (Table 4).

Table 4 Correlation between engagement and intervention efficacy

Persuasive design principles: identification, frequency, and additional design features

None of the studies included in this review explicitly identified the use of specific persuasive design principles. However, 88% (n = 81) of the studies included in the meta-analysis provided sufficient descriptions of the app intervention, or referenced protocol papers, development papers, websites, or app store information, which allowed for deductive coding based on the PSD framework38.

A total of 22 out of the 28 persuasive principles in the PSD framework were identified across the reviewed apps. The number of principles per app ranged from 1 to 12, with a mode of 5. Table 5 presents the persuasive principles identified in the review, the percentage of apps that incorporated each principle, and updated examples of how these principles were operationalised in the reviewed apps. Among the four domains, principles from the primary task support domain, which facilitates the completion of primary tasks or goals that the user seeks to achieve through the system, were the most frequently coded, accounting for 50% of the persuasive principles identified. This was followed by dialogue support (23%) and credibility support (22%). Only 5% of the persuasive principles were derived from the social support domain, which includes principles such as social learning and cooperation. Tunnelling, which is intended to guide the user through a process or experience in a manageable way, was the most frequently identified principle overall and was present in 88% of the apps. Rehearsal, which provides opportunities to practise behaviours to help users prepare for real-world situations, was identified in 84% of the apps.

Table 5 Prevalence and examples of persuasive design principles used in digital mental health apps: percentage of studies reporting each principle

In addition to the persuasive principles outlined in the PSD framework, this study identified additional recurring design features in the evaluated apps that may influence user behaviour and motivation. These features, summarised in Table 6, include: (1) staged disclosure, (2) goal setting, (3) explicit self-pacing, and (4) limited use.

Table 6 Prevalence and application of novel persuasive design principles in digital mental health apps

The identified features are grounded in motivational theories, including self-determination theory47 and goal setting theory48. These features promote progression, autonomy, and competence, enhancing motivation and engagement by balancing user control with elements of novelty, competition, and exclusivity.

Figure 4 uses a heatmap to illustrate the percentage of digital mental health interventions incorporating PSD features across specific mental health conditions. To reduce heterogeneity, only interventions targeting a single mental health condition with more than one intervention per condition were included, resulting in the analysis of 61 apps. Each cell represents the percentage of interventions using a specific PSD principle, with red tones indicating higher prevalence and green tones representing lower prevalence.
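
To make the construction of this prevalence matrix concrete, the sketch below (Python with pandas, purely illustrative; the `coding` table is a hypothetical stand-in for our coding data) filters to conditions represented by more than one intervention and computes the percentage of apps per condition using each principle.

```python
import pandas as pd

# Hypothetical coding table: one row per app, a condition label, and one
# 0/1 indicator column per PSD principle (two principles shown for brevity).
coding = pd.DataFrame({
    "condition":  ["depression", "depression", "anxiety", "anxiety", "stress"],
    "tunnelling": [1, 1, 0, 1, 1],
    "rehearsal":  [1, 0, 1, 1, 1],
})

# Keep only conditions represented by more than one intervention,
# mirroring the filtering step described above ("stress" drops out here).
counts = coding["condition"].value_counts()
subset = coding[coding["condition"].isin(counts[counts > 1].index)]

# Each heatmap cell: percentage of apps within a condition using a principle.
heatmap_data = subset.groupby("condition").mean() * 100
print(heatmap_data.round(1))
```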

Fig. 4: This figure presents a heatmap illustrating the percentage of digital mental health apps within each mental health condition category that incorporate specific PSD principles.

To reduce heterogeneity, only interventions targeting a single mental health condition with more than one intervention per condition were included, resulting in the analysis of 61 apps. Each cell shows the percentage of apps within the specific mental health category that incorporated a given PSD principle. Colour intensity reflects prevalence, with red tones indicating higher usage and green tones indicating lower usage.

Notable patterns include a higher prevalence of primary task support and dialogue support principles, with limited use of social support features. Rehearsal, Trustworthiness, and Tunnelling were the most commonly applied principles. In contrast, social features such as Social Comparison, Competition, Cooperation, and Social Learning were rarely applied, highlighting their limited current uptake in digital mental health apps.

Meta-regression: persuasive design principles and intervention efficacy

To explore the relationship between overall intervention efficacy and persuasive design principles, we conducted a meta-regression in which the number of persuasive design principles used across studies was regressed onto effect size estimates (Hedges’ g). Figure 5 displays a scatterplot illustrating the relationship between these variables. Overall, the analysis revealed no significant relationship between the number of persuasive design principles and intervention efficacy across studies, b = 0.01, SE = 0.02, 95% CI [−0.04, 0.05], p = 0.804. Similarly, we found no significant association between intervention efficacy and any of the four persuasive design domains (primary task: estimate = −0.08, 95% CI [−0.17, 0.01], p = 0.078; dialogue: estimate = 0.06, 95% CI [−0.03, 0.14], p = 0.184; system credibility: estimate = 0.001, 95% CI [−0.11, 0.11], p = 0.981; social support: estimate = 0.05, 95% CI [−0.06, 0.16], p = 0.375).

Fig. 5: This figure illustrates the relationship between the total number of persuasive design principles implemented in digital mental health interventions (PD_TOTAL) and intervention efficacy, measured by Hedges’ g effect sizes.

Each point represents an individual study, with point size reflecting study weight in the meta-regression. The shaded region represents the 95% confidence interval around the regression line. The analysis found no significant relationship between the number of persuasive design principles and intervention efficacy, b = 0.01, SE = 0.02, 95% CI [−0.04, 0.05], p = 0.804.

Relationship between persuasive design principles and engagement: correlation and meta-analysis findings

The intended meta-regression exploring the relationship between PSD principles and engagement could not be conducted due to substantial variation in how engagement was conceptualised, measured, and reported across studies. Instead, a two-tailed Pearson’s correlation analysis was performed, focusing on the most consistently reported engagement metric: the percentage of users who completed the whole programme or who completed the programme per protocol (n = 17). No significant relationship was found between the number of persuasive design principles used and programme completion, r(17) = 0.21, p = 0.43.

A separate random-effects meta-analysis was conducted to test between-group differences (intervention vs control) at post-intervention for each individual PSD principle. The analysis revealed significant effects in favour of digital interventions compared with control conditions for the majority of PSD principles (see Table 7). However, no significant effects were observed for the use of social role, surface credibility, real-world feel, third-party endorsement, normative influence, or cooperation. Importantly, these latter outcomes were represented by only 3–9 studies each, whereas the others were represented by up to 81 studies, providing more robust pooled effect sizes in most cases.

Table 7 Random-effects meta-analysis of between-group differences (intervention vs control) at post-intervention, moderated by the use of individual persuasive design principles

Discussion

This study found that mental health apps significantly improved clinical outcomes compared with control groups, with a medium effect size (N = 16,728, g = −0.43). Significantly positive effects were identified for interventions targeting depression, anxiety, stress, PTSD, and body image/eating disorders, whereas no significant effects were observed for interventions addressing social anxiety, psychosis, suicide/self-harm, postnatal depression, or overall mental health. These results may reflect the much larger body of research focused on depression, anxiety, and stress, underscoring the need for further investigation into digital interventions for less-studied conditions. Notably, no significant differences in outcomes were identified on the basis of sample type (clinical, subthreshold, or non-clinical), the presence of human support, or the clinical approach (e.g., CBT, mindfulness, psychoeducation).

Notably, none of the studies included in the meta-analysis explicitly reported the use of persuasive design principles in their app descriptions. Through deductive coding, it was determined that 79% of the principles from the PSD framework were present across the apps included in the meta-analysis. We found that apps used between 1 and 12 persuasive design principles, with a mode of 5. This aligns closely with the findings of McCall et al.’s 25 systematic review and meta-analysis, which identified between 1 and 13 principles per intervention, with a mean of 4.95. The most commonly implemented principles were tunnelling (88%), rehearsal (84%), trustworthiness (80%), reminders (55%), and personalisation (50%). Principles from the primary task support domain were the most frequently employed, a finding consistent with previous research25,30.

In contrast to previous research, however, this study did not find any significant association between the number or domain of persuasive design principles (e.g., primary task, dialogue) and app efficacy. Similarly, no significant relationship was observed between engagement and either individual persuasive principles or domains. These findings differ from earlier studies, which have reported both positive25,30,36,40 and negative40 associations between PSD principles and engagement or efficacy outcomes.

These discrepancies may stem from several factors, including differences in study scope, coding practices, methodological approaches, and analytical techniques. For instance, some previous studies focused on digital interventions within the broader digital health field rather than specifically on mental health30,36, while others examined web-based interventions rather than smartphone apps30 or exclusively investigated interventions targeting depression and anxiety25,40. Variations in scope, mental health focus, and intervention modalities may partly explain the divergent findings across studies.

Another factor could relate to the variability in coding practices across research teams. As noted, none of the included studies explicitly reported their use of persuasive design principles, necessitating reliance on subjective coding decisions made by individual research teams. This process introduces variability, as coding decisions are subject to subjective interpretations that may differ across research groups. These challenges highlight the need for more transparent and standardised reporting of persuasive principles to minimise bias and improve consistency and comparability across studies.

Methodological differences may also account for the observed discrepancies across reviews. Previous reviews typically coded interventions for PSD principles based solely on the descriptions provided in study outcome publications30,40. In contrast, our approach sought to enhance the accuracy of PSD coding by incorporating supplementary materials, such as study protocols, development papers, and publicly available websites. This broader range of source material was intended to enable a more comprehensive assessment of the PSD principles operating within each intervention. However, this more expansive approach to coding may have contributed to the different findings relative to earlier reviews.

Finally, differences in study design and analytical approaches may have also contributed to the difference in findings. Our study utilised a methodology that focused exclusively on RCT-tested apps across all mental health conditions and a total participant sample size of 16,728, providing greater statistical power and the opportunity for subgroup analysis to enhance the reliability and precision of effect estimates.

Inconsistent reporting of PSD principles poses a significant challenge in evaluating their presence, implementation, and impact in digital mental health interventions. Without explicit disclosure from authors, it remains difficult to determine whether these principles have been incorporated, to what extent, or with what level of fidelity. This lack of transparency reflects a broader issue: the absence or inconsistent application of standardised frameworks for detailing persuasive features in digital interventions. Such gaps result in key design elements being either unreported or varying widely in their quality and integration. Addressing these challenges requires the adoption of standardised frameworks, such as the PSD framework, alongside consensus on best practices for documenting and evaluating engagement-enhancing features. Clear and consistent reporting throughout the app development and evaluation process is essential to improve transparency, reduce variability, and ensure that the potential impacts of persuasive design in digital interventions can be rigorously assessed.

Building on these findings, we now turn to potential reasons why no significant relationship between PSD principles and engagement was identified in the current study. While coding practices and inconsistent reporting may have contributed to this outcome, several other factors may also explain the limited evidence for PSD principles enhancing engagement. These include variability in the application of PSD principles, challenges in reliably measuring engagement, and the influence of external variables that extend beyond the scope of the PSD framework.

As mentioned, inadequate reporting of how persuasive design principles are incorporated into interventions may have led to principles being missed or misattributed. There may also be a threshold issue: one app may apply a principle only minimally and be marked as present, while another implements the same principle more thoroughly or effectively, yet both receive the same score. Second, substantial variability in how engagement is measured across studies limits our ability to comprehensively explore the relationship between PSD principles and engagement. Third, using more PSD principles does not necessarily lead to proportionally higher engagement or efficacy. This non-linear relationship can make it harder to detect associations. For example, certain combinations of PSD principles may be more effective in fostering engagement and efficacy for specific subgroups of users. Fourth, other variables influencing engagement (e.g., overall app aesthetics and design, human support, user characteristics, social reciprocity, social ranking, variable unpredictable rewards) may mask the relationship between the PSD principles researched in this study and engagement.

Finally, while it is possible that PSD principles do not directly influence engagement or efficacy, this explanation appears less plausible. Given the well-established role of user experience and design in shaping interactions with technology, it is improbable that persuasive design principles have no effect. Therefore, it is more reasonable to consider that other factors, such as variability in application or measurement, may account for the lack of a clear relationship in this study.

To shift focus from the relationship between PSD principles and engagement specifically, we now address the significant variability in how engagement was defined and reported across studies. This variability hindered the ability to test associations through meta-analysis and identify possible patterns of engagement. Notably, 24% of the included studies did not report engagement metrics at all, a considerable proportion given the importance of the engagement challenge in the digital intervention field. This aligns with previous findings by Lipschitz et al. 2, who highlighted the low rates of engagement reporting in digital mental health interventions for depression, where only 64% of studies reported daily usage and just 23% provided retention rates for the final treatment week. The present study extends these concerns to mental health apps more broadly, suggesting that inconsistent and selective reporting of engagement metrics may skew perceived levels of user interaction. In some cases, this may reflect a form of selective outcome reporting, whereby studies present only the most favourable engagement metrics—those that cast the intervention in a positive light—while omitting others that may be less compelling. Such selective reporting can artificially inflate engagement data, potentially overestimating the use, relevance, and even efficacy of interventions. These discrepancies underscore the need for standardised engagement metrics to ensure accurate, comprehensive reporting and a clearer understanding of user interaction with digital mental health interventions.

We found significant variability in the engagement metrics reported across studies, identifying 25 distinct metrics, which we grouped into 10 categories: (1) rate of uptake, (2) time spent on the app (min/h), (3) days of active use, (4) logins, (5) modules completed, (6) study metrics, (7) messages sent and received, (8) posts and comments made, (9) participant self-reports, and (10) miscellaneous. Among all the engagement metrics, the most commonly reported were: (1) the percentage of users who completed the entire intervention or who completed the intervention per protocol (14% of studies), (2) self-reported engagement (12% of studies), and (3) the mean percentage of modules completed (11% of studies). To support comparability across studies, we recommend using either the percentage of users who completed the entire intervention or completed the intervention per protocol, and/or the mean percentage of modules completed. Given its susceptibility to recall bias and other potential sources of error, we advise interpreting self-report data with caution and avoiding it when objective metrics are available.

The focus on engagement metrics in digital interventions stems from the understanding that engagement is necessary for an intervention to be effective. Engagement metrics are, therefore, crucial for ensuring that users are meaningfully exposed to the intervention content and essential for attributing positive outcomes to the intervention in question3. However, to obtain an accurate understanding of the relationship between engagement and effectiveness, we must also understand the behaviour change mechanisms of the intervention, that is, how the intervention drives real-world change. The factors influencing engagement with a digital intervention, such as persuasive design, do not fully align with those driving real-world behaviour change, which are often rooted in deeper motivations, self-efficacy, and the perceived relevance and feasibility of recommended behaviours49. Despite the aim of promoting behaviour change, Orji and Moffatt36 reported that 55% of studies did not link mental health interventions to any behaviour change theory. This absence of a theoretical underpinning was also observed in the present study, where the extraction of behaviour change theories underlying the reviewed studies was halted due to insufficient reporting.

Therefore, to advance the field, we propose several recommendations aimed at addressing gaps in the development, reporting, and evaluation of digital mental health interventions. Establishing standardised engagement metrics is crucial for enabling reliable comparisons across studies and facilitating future meta-analyses. We recommend adopting guidelines such as CONSORT-EHEALTH50 to improve reporting consistency and to provide clearer insights into how persuasive design principles influence engagement and outcomes. We also endorse, and propose extending, the five-point checklist developed by Lipschitz et al. 2, which includes metrics such as adherence, rate of uptake, level-of-use, duration-of-use, and number of completers. We suggest incorporating two additional measures aimed at providing a more comprehensive understanding of intervention impact. First, the description of interventions should include detailed information on the design principles and features of the platform that promote engagement; authors should explicitly state the frameworks (e.g., PSD) and specific principles employed. Second, authors should explicitly link the intervention to an overarching model of behaviour change. By systematically reporting these principles alongside consistent engagement metrics, researchers can better establish possible links between design elements, user engagement outcomes, real-world behaviour change, theoretical models, and clinical outcomes. This approach will facilitate a deeper understanding of user engagement and what drives therapeutic change.

To build on these recommendations, it is also important to acknowledge that while persuasive design principles aim to enhance engagement, therapeutic change requires a nuanced understanding of how engagement interacts with psychological mechanisms specific to mental health conditions. Engagement alone may not lead to behaviour change unless paired with mechanisms targeting the underlying psychological processes relevant to a given condition. The negative association Wu et al.40 found between PSD features and engagement (as measured by completion rate) points to the complexity of this relationship and suggests that other factors may moderate the impact of persuasive techniques on outcomes; indeed, Wu et al. 40 speculate that engagement may not be the primary factor driving the association with effectiveness. We recognise that behaviour change in digital mental health interventions is likely influenced by a complex interplay of factors, including both engagement-driven processes and condition-specific mechanisms of action.

Despite its strengths, this study had several notable limitations. First, the use of persuasive principles was not explicitly reported in any of the studies included in the analysis. Although coding was guided by a well-defined framework and conducted by an expert team of user experience designers, many of the reviewed apps were not commercially available, limiting their accessibility for analysis. As a result, we were unable to directly access or interact with these interventions and had to rely on published descriptions and available supplementary materials to code for persuasive system design features. This method raises concerns that some apps may have employed persuasive principles not described in their app descriptions, potentially affecting assessments of effectiveness and outcomes. Additionally, there is uncertainty about the extent to which persuasive principles were actually incorporated, even among those apps that were coded for them. This limitation highlights the urgent need for greater transparency in intervention reporting, especially for apps that are not commercially available, to facilitate more detailed evaluations of how persuasive design principles impact mental health app outcomes.

Additionally, we could only conduct a single correlation analysis between PSD principles and one engagement metric. With only 76% of studies reporting on engagement or adherence, and those metrics being highly heterogeneous, a meta-analysis was not feasible. This inconsistency in reporting and operationalising engagement metrics significantly hinders progress in understanding app and digital intervention engagement. Consequently, these limitations constrained our ability to comprehensively evaluate the relationships between persuasive design principles and app engagement and effectiveness. Despite these limitations, to our knowledge, this is the most extensive examination to date on the relationships between PSD principles, engagement and efficacy in digital mental health. We included 119 RCTs, coded all persuasive design principles with high inter-rater reliability, reviewed all protocols and associated reports, including downloading the apps when available, and categorised all engagement data reported into meaningful categories.

In conclusion, this study provides the most extensive and systematic evaluation to date of the relationship between persuasive design principles, engagement, and efficacy in digital mental health apps. The results indicate that mental health apps are moderately effective for depression, anxiety, stress, PTSD, and body image/eating disorders but show limited efficacy for psychosis, suicide/self-harm, and postnatal depression, highlighting the need for further research in these areas. While persuasive design principles are intended to enhance user engagement, our study found limited evidence of their direct impact, likely due to inconsistent reporting, variability in their application, and a lack of standardised engagement metrics. Further, owing to the rapid development of technology and its capabilities, persuasive principle frameworks require revision and updating to include strategies such as staged disclosure, as identified in this study. These findings underscore critical limitations in the field that require immediate attention. To address the ongoing issue of poor engagement with digital mental health interventions, we need a concerted effort to establish a uniform understanding of engagement as a foundation for harmonising the reporting of engagement metrics. Additionally, it is essential to align descriptions of the persuasive strategies and behaviour change frameworks used to underpin engagement. Addressing these methodological gaps is critical for enabling large-scale data pooling, which will facilitate a more nuanced understanding of the factors influencing engagement, tailored to specific populations and contexts. Without such foundational improvements, the key challenge of engagement in digital mental health will remain intractable, limiting its real-world impact.

Methods

This study comprised a systematic review, meta-analyses, and meta-regression. Details on the conduct of the study in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020)51 guidelines are provided in the supplementary information. Additionally, the study adhered to a protocol prospectively registered in PROSPERO in August 2022 (CRD42022352123), which can be accessed at https://www.crd.york.ac.uk/prospero/.

Eligibility criteria

To be included, studies had to meet the following criteria: (1) were randomised controlled trials; (2) delivered a digital mental health intervention; (3) were delivered via a smartphone app; (4) aimed at addressing a mental health condition with psychological/psychosocial approaches; (5) delivered the therapeutic intervention primarily via the app; (6) reported quantifiable measurements of user engagement, intervention adherence, or effectiveness data; (7) contained an app description adequate to allow coding of its persuasive design principles; (8) were published in peer-reviewed academic journals; and (9) were written in English.

The exclusion criteria were as follows: (1) case studies and feasibility studies without a control group, reviews, theses, or book chapters and (2) data reported in another study (i.e., conference abstracts where data are subsequently published elsewhere).

Synthesis approach

To examine our primary aims, we conducted a systematic narrative synthesis and meta-analysis of the available literature. The available studies that were identified as meeting criteria via review were systematically synthesised and reported by examining (a) the types of samples, interventions, and mental health contexts in which smartphone-based therapeutic interventions have been applied; (b) which persuasive principles were present in each app description and the number of overall principles present in each app; (c) the type of metrics used to report on intervention engagement and level of engagement (if available); and (d) pre and post-intervention outcome data.

Search strategy

A systematic search was conducted via the Ovid platform to identify relevant studies. The databases searched included PsycINFO, PubMed, and Embase, as they provide a well-established body of research on engagement and clinical outcomes in the clinical context, which was the focus of our review. The search strategy targeted four key concepts: (1) app (app OR smartphone OR iphone OR android OR mobile OR “mobile phone” OR “mobile app” OR “mobile application” OR “mobile based” OR “smartphone app” OR “smartphone application”); (2) digital intervention (“digital intervention” OR “mobile intervention” OR “smartphone intervention” OR “digital mental health” OR “mobile health intervention” OR “digital technology” OR “mHealth” OR “eHealth” OR “mobile health”); (3) mental ill health (“mental ill health” OR disorder OR “psychiatric disorder” OR well-being OR wellbeing OR depress* OR psycho* OR bipolar OR anxiety OR schizophrenia OR affective OR self-harm OR self-injury OR distress OR mood OR body image OR eating disorder OR suicid* OR “posttraumatic stress” OR PTSD OR agoraphobia OR phobia* OR panic OR funct* OR OCD OR stress OR symptom*); and (4) randomised controlled trial (randomised OR randomized OR RCT OR waitlist OR allocate*). Records matching protocol OR “single arm” OR “systematic review” OR “scoping review” were excluded via NOT.

Study selection and data collection

The searches were completed in August 2022, after which all retrieved articles identified by the search strategy were downloaded and then uploaded to Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia; www.covidence.org). After automated identification and removal of duplicates, two reviewers (L.V., J.D.X.H.) independently screened the titles and abstracts of all articles according to the eligibility criteria. Once consensus on eligibility was reached in the screening phase, the same two researchers independently conducted full-text reviews. Conflicts arising in the screening or full-text review phases, flagged via the Covidence inter-rater feature, were resolved through discussion.

Data extraction

Four authors (L.V., M.S.V., S.O., and R.S.) independently extracted data related to the study, sample, and intervention characteristics, as well as engagement and efficacy rates (i–v below). Fifty percent of the full dataset was double-extracted to ensure reliability (W.P., R.P.S., and L.V.). Any discrepancies were resolved by consensus with a third author (J.N.). The following data were extracted:

(i) Study characteristics: authors, year of publication, and country of participant recruitment;

(ii) Sample characteristics: total sample size, number of participants in the control and intervention group(s), primary mental health condition, age, gender, and population type (clinical/sub-clinical/non-clinical);

(iii) Intervention characteristics: app name, description, theoretical treatment model (e.g., CBT), intervention length (in days), presence of human support, control conditions (e.g., waitlist, treatment as usual, active control), and primary mental health target, including discrete symptoms (e.g., stress), symptom clusters (e.g., a depression diagnosis), and measures of behaviour (e.g., social functioning);

(iv) Engagement rates: whether engagement data were reported (yes/no), the engagement/adherence metric (e.g., number of logins, modules completed), and engagement rates;

(v) Efficacy data: means, standard deviations (SDs), and sample sizes (N) of the primary outcome measures for both the intervention and control group(s) at the pre- and post-intervention time points. If certain metrics (e.g., SDs) were missing, they were either calculated from available data or the authors were contacted for additional information. When multiple primary outcome measures were presented, the first listed measure was considered the primary outcome for extraction; an exception was made for subsequent outcome measures pertaining to depression or anxiety, which were also extracted; and

(vi) Persuasive principles: the presence or absence of each of the 28 persuasive design principles outlined in the Persuasive Systems Design (PSD) framework in each app description. Each principle was coded dichotomously, with 1 indicating presence and 0 absence, giving a possible total of 28 per app. Data were extracted from descriptions of apps within the included studies and, whenever possible, supplemented with app information from protocols, development papers, app stores, or websites to provide more context for each app.

These data were extracted by three authors (K.B., L.B., and L.V.), who first collaborated to operationalise the PSD framework (Table 1) and apply its principles to contemporary app design. K.B., a senior UX designer, and L.B., a senior product designer, are both experts in product design and user needs. Initially, the three authors coded 20% of the apps collaboratively to ensure consistency in their approach before coding independently once they were confident that they were aligned in their deductive approach. This collaborative effort aimed to maintain accuracy and consistency in applying the PSD framework to the app descriptions. In cases of discrepancies requiring resolution, a fourth author (J.N.) was consulted. During this coding process, recurring principles that were not part of the PSD framework were identified and discussed among the reviewers. These principles were then added to the deductive coding framework for later analysis.
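
As a concrete illustration of this coding scheme, the sketch below (Python, illustrative only; the principle list is abbreviated and the coded app is hypothetical) shows how each app description reduces to a 0/1 vector whose sum gives the per-app principle count.

```python
# Minimal sketch of the dichotomous coding: each app receives a 0/1 value per
# PSD principle. The principle list is abbreviated here; the full set has 28.
ALL_PRINCIPLES = ["tunnelling", "rehearsal", "trustworthiness",
                  "reminders", "personalisation"]

def code_app(principles_present):
    """Return the presence (1) / absence (0) vector for one app description."""
    return {p: int(p in principles_present) for p in ALL_PRINCIPLES}

vector = code_app({"tunnelling", "rehearsal"})  # hypothetical app description
total = sum(vector.values())  # per-app totals ranged from 1 to 12 in this review
```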

Risk of bias assessment of individual studies

Six reviewers independently evaluated the risk of bias in each study via the Cochrane risk of bias (RoB2) tool. Each of the five risk domains of the RoB2 was rated on a three-point scale: low risk, some concerns, or high risk of bias.

Meta-analysis, meta-regression, and correlation analysis

A meta-regression analysis was conducted to investigate the relationship between intervention efficacy and the implementation of persuasive design principles. Additionally, analyses examined the associations between intervention efficacy and sample type (e.g., clinical, sub-threshold, non-clinical), outcome type (e.g., depression, anxiety), the presence of human support (blended vs self-guided), and intervention type (e.g., CBT, mindfulness). A two-tailed Pearson’s correlation analysis was also performed to examine the relationship between the number of persuasive design principles and app engagement. All analyses were conducted in SPSS (v29).

The meta-analysis and meta-regression used random-effects models with maximum likelihood estimation. Initially, we computed Hedges’ g effect sizes from available estimates (e.g., M, SD) reported within studies to reflect the size of the between-group (control vs intervention) effects at the post-intervention time point, and then meta-analysed them. The primary outcome variable (e.g., depression) as reported within each study was used across all analyses. To maintain coding consistency, estimates calculated on the basis of positive-valued outcomes (i.e., where higher scores indicate better mental health) were reverse-coded to align with the majority of outcomes. Therefore, negative Hedges’ g values represent cases where the intervention group was more effective than the control group. To gain a deeper understanding of intervention efficacy, we conducted sub-group moderator analyses across various variables, including sample type (non-clinical, sub-clinical, clinical), outcome type (depression, anxiety, stress, social anxiety, PTSD, body image/eating disorder, psychosis, suicide/self-harm, general mental health, postnatal depression), presence of human support (blended vs self-guided), and type of intervention (CBT, CBT+, mindfulness-based, and psychoeducation). This approach allowed us to identify specific factors that may influence the overall effectiveness of digital mental health interventions.
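
To make the effect-size step concrete, the following sketch (Python, illustrative; the analyses themselves were run in SPSS, and the summary statistics shown are hypothetical) computes a bias-corrected Hedges’ g from group means, SDs, and sample sizes, including the reverse-coding step for positive-valued outcomes.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardised mean difference (intervention = group 1,
    control = group 2) at the post-intervention time point."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * df - 1)           # Hedges' small-sample correction factor
    return j * d

# Hypothetical symptom scores (lower = better); negative g favours intervention.
g = hedges_g(m1=10.2, sd1=4.1, n1=50, m2=13.5, sd2=4.4, n2=48)

# For positive-valued outcomes (higher = better mental health), reverse-code so
# that negative g consistently indicates the intervention outperformed control.
positive_valued_outcome = False
if positive_valued_outcome:
    g = -g
```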

Additionally, a meta-regression was performed, regressing the efficacy of the interventions (Hedges’ g) on the total number of persuasive design principles identified in each study, to determine the potential relationship between persuasive design principles and intervention efficacy. Throughout all analyses, we assessed heterogeneity using common metrics (e.g., Cochran’s Q, I²)52,53 and examined potential publication bias by (a) visually inspecting the funnel plot of standard errors by Hedges’ g to assess for asymmetry, and (b) examining trim and fill estimates to determine whether potentially missing estimates affected the overall effect size54.
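
The sketch below illustrates the pooling and heterogeneity quantities involved (Python, illustrative; it uses the simpler DerSimonian–Laird estimator as a stand-in for the maximum likelihood estimation used in our analyses, and the study data are hypothetical).

```python
import numpy as np

def random_effects_pool(g, se):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I² (%).

    Illustrative stand-in for the maximum likelihood estimation reported here;
    returns the pooled Hedges' g, Q, I², and the between-study variance tau².
    """
    g, se = np.asarray(g, float), np.asarray(se, float)
    k = len(g)
    w = 1.0 / se**2                           # inverse-variance (fixed) weights
    fe_mean = np.sum(w * g) / np.sum(w)       # fixed-effect pooled estimate
    q = float(np.sum(w * (g - fe_mean)**2))   # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)        # between-study variance
    w_re = 1.0 / (se**2 + tau2)               # random-effects weights
    pooled = float(np.sum(w_re * g) / np.sum(w_re))
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, q, i2, tau2

# Hypothetical per-study effect sizes and standard errors:
pooled, q, i2, tau2 = random_effects_pool([-0.5, -0.3, -0.6], [0.12, 0.15, 0.20])
```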

Finally, because the literature presents conflicting findings regarding the relationship between persuasive design principles and user engagement levels, the two-tailed Pearson’s correlation analysis described above was used to test for an association between the number of persuasive design principles used and app engagement.
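
For illustration, this correlation step reduces to the following (Python with SciPy as a stand-in for SPSS; all values are hypothetical and chosen only to mirror the n = 17 studies analysed).

```python
from scipy.stats import pearsonr

# Hypothetical per-study data: number of PSD principles coded per app and the
# corresponding programme-completion percentage (the engagement metric used).
n_principles = [3, 5, 7, 4, 6, 9, 2, 5, 8, 6, 4, 7, 5, 3, 10, 6, 5]
completion_pct = [42, 55, 38, 61, 47, 52, 30, 58, 49,
                  44, 65, 40, 53, 36, 48, 57, 45]

r, p = pearsonr(n_principles, completion_pct)  # two-tailed by default
print(f"r = {r:.2f}, p = {p:.2f}")
```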