Introduction

Increasing physical activity (PA)1, reducing sedentary behavior (SB)1, and optimizing physiological outcomes2,3 (e.g., weight and body mass index [BMI]) are crucial for preventing non-communicable diseases (NCDs) and even death due to NCDs in adults. The prevalence of physical inactivity increased from 23.4% in 2001 to 26.4% in 20164. The latest report revealed that this trend has continued: approximately 31.3% of adults did not meet the 2010 World Health Organization (WHO) recommendations for PA in 20224.

According to the International Labour Organization estimates, in 2023, approximately 60% of adults worldwide were workers5. Changes in industrial structure have shifted the workplace from one that requires physical labor to one that requires sedentary labor6. Moreover, physical inactivity is present in both white-collar and blue-collar workers7. Notably, the prevalence of moderate-intensity occupations decreased from 48% in 1960 to 20% in 2008, whereas that of sedentary and light-intensity occupations steadily increased6. Workers in France sit for approximately 4.2 h daily, and those engaged in sedentary work sit for 6.2 h8. Thus, workers are at greater risk for developing NCDs, including obesity.

Interventions that use digital technology (d-Health), defined as “the field of knowledge and practice associated with the development and use of digital technologies to improve health,”9 are an emerging area for health promotion in the workplace. d-Health can potentially overcome barriers to implementing lifestyle interventions in the workplace, such as time, support, and information10,11. In fact, several systematic reviews (SRs) suggested the potential impact of d-Health interventions on PA and SB in workers12,13,14,15. Additionally, several SRs16,17,18,19,20 and a few umbrella reviews21,22 suggested the efficacy of d-Health interventions on PA, SB, and physiological outcomes in various populations. However, the characteristics of target populations and the definitions of d-Health interventions differ considerably across reviews, making it difficult to draw conclusions about the efficacy of d-Health interventions23.

Given that the number of SRs of d-Health interventions on workers12,13,14,15 has increased, it is timely to clarify the scientific evidence on the efficacy of d-Health interventions in increasing PA and reducing SB in workers. Further, sedentary workers are likely to be insufficiently physically active, a scientifically proven cardiovascular risk factor1. Hence, we aim to conduct an overview of SRs with or without meta-analysis (an umbrella review) assessing the efficacy of d-Health interventions to optimize PA, SB, and physiological outcomes (e.g., weight, BMI, fat mass).

Results

Search strategy and selection of systematic reviews and individual studies

The search strategy for the selection of SRs is shown in Supplementary Table 1. Among 6479 records initially screened, we obtained 72 full-text articles assessed for eligibility after title and abstract search. Eventually, this umbrella review included 24 SRs12,13,14,15,19,20,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41. The PRISMA Flow diagram is shown in Fig. 1. The list of excluded studies for SRs is presented in Supplementary Table 2.

Fig. 1: PRISMA flow diagram.

a PRISMA flow diagram of the systematic review, and b individual studies. D design, I intervention, O outcome, P population, PT publication type.

To increase the directness between the evidence and our research question, we selected individual studies that were included in the 24 SRs and met all inclusion criteria except for publication type. The 24 SRs included 683 individual studies; we screened these and identified 179 that met the inclusion criteria for this umbrella review. After removing duplicate studies, 133 individual studies42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174 were left. Furthermore, among the 133 individual studies, two individual studies77,78 reported the same trial but different outcomes. One study reported results from the full sample of a trial108, while the other reported findings from a subsample of the same trial (Dekkers, 2011)173. One study (De Cocker, 2017)174 was a sub-analysis of an RCT136. Therefore, we treated each of these pairs as one study, resulting in a total of 130 studies42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172 included in our analysis.

Characteristics of included systematic reviews

Table 1 summarizes the characteristics of the 24 SRs12,13,14,15,19,20,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41. The number of individual studies in each SR ranged from 6 to 70, totaling 683 in the 24 SRs. Thirteen SRs12,13,14,15,26,27,28,29,31,32,33,36,38 out of 24 (54%) were limited to workers, whereas the others19,20,24,25,30,34,35,37,39,40,41 focused on various and broader populations (e.g., not specified, working-age women). Two SRs limited their scope to women19,32. Among the 13 SRs12,13,14,15,26,27,28,29,31,32,33,36,38 focused on workers, the setting was restricted to workplaces in 9 (69%) SRs12,13,26,29,31,32,33,36,38 and not specified in the rest14,15,27,28.

Table 1 Characteristics of 24 included systematic reviews

A total of 17 (71%) SRs12,13,14,15,19,20,24,25,27,28,34,35,36,37,39,40,41 only included individual studies that used d-Health as a stand-alone intervention or a part of multicomponent interventions. Otherwise, the definition of d-Health varied across the reviews. For example, most SRs defined their interventions without restrictions on digital technologies used; others limited it to a smartphone application24,34 or a computer prompt36. A total of 15 (63%) SRs12,13,19,20,24,25,28,31,32,34,35,37,38,39,41 included any types of control conditions (e.g., no control, no intervention, usual care, waitlist control, or non-d-Health intervention). Regarding outcomes, 19 (79%) SRs12,13,14,19,20,25,26,27,28,29,30,31,32,33,34,36,38,39,41 used PA-related outcomes; 11 (46%) SRs12,13,14,15,26,27,28,29,33,36,39 used SB-related outcomes; and 18 (75%) SRs12,13,14,19,20,24,27,28,29,32,33,34,35,37,38,39,40,41 utilized physiological outcomes (i.e., weight, BMI, and fat mass).

Quality of included systematic reviews

The methodological quality scores of the SRs, measured by the revised A MeaSurement Tool to Assess systematic Reviews (AMSTAR 2), are presented in Table 1, and their details are shown in Supplementary Table 3. Of the 24 SRs, 17 (71%) were rated as critically low, 4 (17%) as low, 2 (8%) as moderate, and 1 (4%) as high. Most SRs did not meet several criteria: 21 of 24 (88%) SRs did not explain their selection of study designs for inclusion (“Did the review authors explain their selection of the study designs for inclusion in the review?”)175. Similarly, 22 of 24 (92%) SRs did not report on the sources of funding for the included studies (“Did the review authors report on the sources of funding for the studies included in the review?”)175. Additionally, 19 of 24 (79%) SRs did not provide a list of excluded studies and justify the exclusions (“Did the review authors provide a list of excluded studies and justify the exclusions?”)175.

Overlap in individual studies in systematic reviews

The 24 SRs comprised 179 individual studies that met the inclusion criteria for this umbrella review. Among these, we identified 130 unique individual studies42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172 cited across the 24 reviews12,13,14,15,19,20,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41. The corrected covered area (CCA) was 1.6%, which indicates slight overlap. The data used to calculate the CCA are provided in Supplementary Table 4.
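The CCA reported above can be reproduced from the counts given in the text. The sketch below assumes the commonly used formula CCA = (N − r) / (r × c − r), where N is the number of included publications counted with repeats across reviews (179), r is the number of unique studies (130), and c is the number of reviews (24). The function names and the interpretation bands (0–5% slight, 6–10% moderate, 11–15% high, >15% very high) are assumptions for illustration, not taken from this article.

```python
def corrected_covered_area(total_inclusions: int, unique_studies: int, n_reviews: int) -> float:
    """Return the CCA as a proportion: (N - r) / (r * c - r)."""
    if unique_studies <= 0 or n_reviews <= 1:
        raise ValueError("need at least one unique study and two reviews")
    return (total_inclusions - unique_studies) / (
        unique_studies * n_reviews - unique_studies
    )


def interpret_cca(cca: float) -> str:
    """Map a CCA proportion to an overlap label (assumed conventional bands)."""
    pct = cca * 100
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"


# Counts taken from the text: 179 inclusions, 130 unique studies, 24 SRs.
cca = corrected_covered_area(179, 130, 24)
print(f"CCA = {cca:.1%} ({interpret_cca(cca)})")  # CCA = 1.6% (slight)
```

With these inputs, the numerator is 49 repeated inclusions and the denominator is 2990 possible repeated cells, which gives about 1.6%.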

Characteristics of individual studies in systematic reviews by outcome

Among the 130 individual studies included in the 24 SRs, 108 (83%) individual studies43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,99,103,104,106,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,131,132,133,134,135,137,138,139,140,144,146,147,148,149,150,151,152,153,154,155,156,157,158,159,161,166,168,170,171 used PA-related outcomes; 50 (38%) studies43,44,45,46,48,51,52,62,64,65,67,72,74,76,82,84,90,94,97,110,111,112,113,115,116,129,130,131,134,135,136,137,138,139,140,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157 used SB as an outcome; and 65 (50%) studies42,44,45,48,49,51,57,58,62,63,64,67,69,72,76,77,78,79,80,81,83,86,87,88,92,93,95,96,98,99,100,101,102,103,104,105,106,107,108,109,111,112,116,117,118,119,127,141,142,144,152,154,155,159,160,161,162,163,164,165,167,169,170,171,172 used physiological outcomes (Table 2). Detailed descriptions of the 130 individual studies are shown in Supplementary Data 1.

Table 2 Characteristics of individual studies included in the 24 systematic reviews

Approximately 60% to 70% of the individual studies were conducted in North America or Europe. Female participants accounted for 60% or more of the sample in about half or more of the individual studies. The dropout rate was less than 25% in approximately half of the individual studies, although it varied according to the types of outcomes and study designs. Despite the relatively high dropout rates, 40% to 50% of the individual studies did not report an intention-to-treat (ITT) analysis or describe how missing data were handled. Eighteen of 130 (14%) individual studies used last- or baseline-observation-carried-forward analysis (a method that imputes a participant’s missing data with the last observation before dropout or with the baseline observation)176. Further, 13 of 130 (10%) used a linear mixed-effects model to account for missing observations.
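To make the carried-forward approaches mentioned above concrete, the following minimal Python sketch fills missing follow-up values for a single participant. The function names and the step-count values are hypothetical and serve only as an illustration; they do not reproduce the procedure of any included study.

```python
from typing import List, Optional, Sequence


def locf(observations: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value. Leading gaps stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in observations:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(observations: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Baseline observation carried forward: replace each missing value
    with the first (baseline) observation."""
    baseline = observations[0]
    return [baseline if value is None else value for value in observations]


# Hypothetical daily step counts at baseline, month 1, month 2, and month 3;
# the participant dropped out after month 1.
steps = [6200.0, 6900.0, None, None]
print(locf(steps))  # [6200.0, 6900.0, 6900.0, 6900.0]
print(bocf(steps))  # [6200.0, 6900.0, 6200.0, 6200.0]
```

Both approaches assume that the outcome does not change after dropout, which is one reason the handling of missing data matters when dropout rates are high.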

Study design

Of the 108 individual studies with at least one PA-related outcome, 83 studies43,44,47,49,53,54,55,56,57,58,60,62,63,65,67,68,69,71,73,74,76,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,99,103,104,106,108,110,111,115,116,118,119,120,122,123,126,127,128,129,131,132,133,134,135,137,138,139,140,144,147,148,149,150,151,152,153,154,155,156,157,158,159,170,171 were randomized controlled trials (RCTs), and 25 were non-RCTs (i.e., non-RCT interventional study and cohort study) including 17 non-RCT interventional studies48,50,51,59,64,66,70,72,75,109,112,113,114,121,124,125,146 and 8 cohort studies45,46,52,61,117,161,166,168. Of 50 studies with at least one SB-related outcome, 39 studies43,44,62,65,67,74,76,82,84,90,94,97,110,111,115,116,129,130,131,134,135,136,137,138,139,140,144,145,147,148,149,150,151,152,153,154,155,156,157 were RCTs, and 11 were non-RCTs including 8 non-RCT interventional studies48,51,64,72,112,113,143,146 and 3 cohort studies45,46,52. Of 65 studies with at least one physiological outcome, 52 studies42,44,49,57,58,62,63,67,69,83,92,93,95,96,111,116,118,119,127,141,142,144,152,154,155,159,169,170,171,172 were RCTs, and 13 were non-RCTs including 7 non-RCT interventional studies48,51,64,72,105,109,112 and 6 cohort studies45,117,167.

Participants

Most studies placed some restrictions on their participants regarding job type (e.g., blue-collar workers, desk-based workers, nurses), PA level (e.g., sedentary, not physically active), and physiological status (e.g., obese, overweight, metabolic syndrome). The median and interquartile range (IQR) of the total sample size was 141 (67–305) for studies with PA-related outcomes, 93 (46–278) for studies that measured SB-related outcomes, and 185 (97–421) for studies that utilized physiological outcomes.

Interventions

Median (IQR) duration of intervention and follow-up were 3.0 (2.0–5.0) months and 3.5 (2.0–6.0) months for studies with PA-related outcomes; 3.0 (2.0–3.9) months and 3.2 (2.1–6.4) months for studies with SB-related outcomes; and 3.5 (3.0–6.0) months and 6.0 (3.0–6.0) months for studies that selected physiological outcomes. Approximately 40% to 50% of individual studies used mobile health (m-Health), and approximately 90% employed d-Health (except for m-Health) as a stand-alone intervention or a part of interventions. About 70% to 90% of individual studies that used PA-related outcomes or SB-related outcomes focused on only PA and SB. In contrast, ~65% of studies that measured physiological outcomes focused on multiple domains, including PA and SB, Diet/Nutrition, and others (e.g., smoking, drinking, sleep). Moreover, over 50% of individual studies combined d-Health and non-d-Health interventions (e.g., printed materials, face-to-face counseling, on-site educational sessions, and supervised exercise classes). Approximately 10% to 15% of individual studies used financial or non-financial incentives as an intervention component. Substantial heterogeneity was observed in types of d-Health interventions.

Controls

Over 40% of RCTs employed the “d-Health vs. no intervention” comparison. The “d-Health vs. non-d-Health” and “d-Health vs. other d-Health” comparisons were used in approximately 10% to 30% of RCTs. Several RCTs used the “d-Health + non-d-Health vs. same d-Health only” or other types of comparison. By contrast, most non-RCTs had no control group(s) and performed only pre-post comparisons.

Methods for measuring PA- and SB-related outcomes

Numerous types of metrics were used as outcomes. One hundred twelve individual studies measured at least one PA- or SB-related outcome. Among them, 45 studies43,44,45,46,47,48,49,50,53,54,55,56,58,63,64,65,66,76,80,82,83,84,85,94,106,117,120,130,134,137,139,140,143,144,145,147,148,152,153,154,155,156,157,168,171 measured PA/SB objectively (i.e., using a device). Forty-nine studies57,61,62,68,69,70,71,72,73,74,75,81,86,87,88,89,90,91,92,93,95,96,97,99,103,104,108,109,110,114,118,119,123,124,125,126,127,128,129,131,132,133,138,150,158,159,161,166,170 used subjective methods (i.e., self-report questionnaires) for measuring PA and SB. Eighteen studies51,52,59,60,67,79,111,112,113,115,116,121,122,135,136,146,149,151 measured PA and SB using both methods. Various devices or self-report questionnaires were used to measure PA and SB. For example, of the 63 studies that objectively measured PA or SB, 16 studies43,84,94,135,136,137,139,144,145,147,148,153,154,155,156,157 used activPAL (PAL Technologies Ltd., UK); 12 studies44,51,52,60,64,67,82,85,134,146,152,156 used validated devices such as ActiGraph (ActiGraph, USA); 8 studies47,48,50,59,63,66,151,168 used Fitbit (Fitbit Inc., USA); 4 studies79,116,117,121 used Yamax (Yamasa Tokei Keiki Co., Ltd., Japan); 4 studies80,83,120,171 used Omron interactive pedometer (Omron Healthcare, Japan); and one study115 used SenseWear (BodyMedia Inc., USA). The remaining studies measured PA or SB using non-validated devices. Frequently used questionnaires were as follows: the International Physical Activity Questionnaire (IPAQ) long177 or short form178 (23 studies57,60,67,72,73,74,75,79,95,104,109,111,112,114,115,118,121,123,131,132,136,149,170) and the Godin-Shephard Leisure-Time Physical Activity Questionnaire179 (7 studies81,86,87,88,124,125,128).
A few studies used other validated questionnaires (e.g., the Global Physical Activity Questionnaire180 and the Occupational Physical Activity Questionnaire181) or non-validated questionnaires developed for the studies.

Outcomes

One hundred eight individual studies used at least one PA-related outcome. Among them, 37 studies57,60,67,69,72,73,75,79,81,86,87,88,90,92,95,108,109,111,112,113,115,118,121,122,123,124,125,126,127,128,129,131,135,140,147,149,150 selected total PA; 36 studies47,48,51,58,60,63,64,66,79,80,83,94,106,111,113,120,121,135,140,144,148,151,152,154,168,171 measured step counts; 30 studies43,49,61,72,74,79,90,94,97,112,115,126,127,129,132,135,137,138,140,144,146,148,149,153,155,156,170 used walking or stepping (time or energy spent in walking or stepping); 27 studies44,62,64,67,82,85,96,113,115,119,125,126,133,148,156,157,170,171 utilized moderate to vigorous PA (MVPA); 22 studies76,87,88,96,99,110,112,113,115,128,129,132,135,149,158,159 measured vigorous PA (VPA); 21 studies60,62,68,76,87,88,99,110,112,113,115,128,129,132,135,149,158,159 employed moderate PA (MPA); 17 studies52,60,62,64,67,76,82,87,88,113,115,128,149,151,153,157,159 used light PA (LPA); 8 studies84,113,135,139,144,148,152,157 measured sit-to-stand transition. Several studies used other types of outcomes as PA-related outcomes. Regardless of outcomes used, 86 studies85,89,99,103,104,106,108,109,129,137,144,146,161,166,170,171 measured all-day PA; 30 studies43,51,60,65,73,74,81,84,90,97,110,113,115,121,126,135,144,155,156,157 used workplace, workday, or worktime PA (workday PA); 25 studies51,60,69,73,74,91,96,110,111,113,115,121,124,135,147,152,153,155,159 selected non-workplace, non-workday, non-worktime, or leisure time PA (non-workday PA).

Fifty individual studies measured at least one SB-related outcome. Sitting time was used in 34 studies43,45,52,67,72,74,84,90,94,97,112,113,115,116,130,131,148,154,155,156,157; sedentary time in 17 studies44,46,48,51,52,62,64,67,76,82,110,111,115,129,147,149,153; all-day SB in 28 studies48,51,52,62,64,67,76,82,94,110,111,113,115,129,134,136,137,144,145,149,150,154,155,156,157; workday SB in 33 studies43,46,52,65,72,74,84,90,97,116,130,131,151,152,155,156,157; non-workday SB in 11 studies52,74,111,113,115,136,147,149,152,153,155.

Sixty-five individual studies used at least one physiological outcome. Frequently used outcomes were as follows: weight in 51 studies44,45,48,49,51,57,58,62,63,64,67,72,76,79,80,81,83,86,88,92,93,95,96,98,99,100,101,102,103,104,105,107,108,109,111,112,117,119,127,141,142,144,155,159,161,162,164,169,170,171,172; BMI in 39 studies48,49,51,58,63,64,67,69,72,83,87,93,96,99,103,104,106,107,111,112,116,117,119,142,144,155,159,160,169,170,172; fat mass or rate in 19 studies57,58,79,83,95,99,100,101,104,106,109,117,142,144,152,155,159,169,172; and waist circumference (WC) in 35 studies42,57,58,62,64,72,76,79,81,83,92,96,98,100,101,104,106,107,108,109,111,112,117,119,141,144,152,154,155,159,160,161,162,164,169.

Risk of biases in individual studies

Approximately 50% of studies reporting at least one PA-related outcome, approximately 65% of studies reporting at least one SB-related outcome, and approximately 40% of studies reporting at least one physiological outcome exhibited a high risk of bias (Table 2). The risk of bias for each study is presented in Supplementary Data 1.

Narrative synthesis results

The revised effect direction plot provides a comprehensive summary of the efficacy of the d-Health intervention on PA-related outcomes (i.e., all types, both objectively and subjectively measured), SB-related outcomes (i.e., all types, both objectively and subjectively measured), and physiological outcomes. This summary is organized according to study designs, including RCTs and non-RCTs, as well as by types of comparisons (Table 3).

Table 3 Effect direction plot summarizing the impact of the d-Health intervention on physical activity-related, sedentary behavior-related, and physiological outcomes

Efficacy of d-Health intervention on PA-related outcomes

One hundred eight individual studies investigated the efficacy of d-Health intervention on PA-related outcomes. First, 39 RCTs56,68,74,76,80,81,85,87,88,89,90,91,94,96,97,104,111,115,116,119,129,132,133,134,135,137,139,140,144,147,148,149,150,152,155,156,159,170,171 employed no intervention, including usual care and waitlist control, as a control group. Five (13%) RCTs56,89,129,140,170 showed significantly favorable results for the intervention groups compared with the control groups for all types of outcomes (Table 3). The percentage favoring the intervention groups differed between objectively measured PA (9.5%) and subjectively measured PA (18%). Second, 16 RCTs44,57,60,62,67,79,82,86,92,93,95,108,118,123,126,127 compared a d-Health intervention with a non-d-Health intervention (e.g., printed materials, supervised exercise sessions, face-to-face counseling). Four (25%) RCTs44,92,118,123 significantly favored the intervention groups compared with the control groups (Table 3). The proportion of RCTs favoring the intervention groups varied between objectively measured PA (40%) and subjectively measured PA (21%). Third, 24 RCTs43,47,49,53,54,55,65,71,73,83,84,99,103,106,110,120,122,128,131,138,151,153,154,158 compared d-Health interventions and other d-Health interventions (e.g., multi-digital technologies vs. single digital technology, personalized intervention vs. non-personalized intervention). Four49,53,55,120 RCTs (17%) presented significantly favorable results for the intervention groups compared with other d-Health interventions (Table 3). The percentage favoring the intervention was 27% in objectively measured PA; in contrast, the percentage was 0% in subjectively measured PA. Fourth, 20 non-RCTs45,46,50,51,52,59,61,64,66,70,72,112,113,114,117,124,125,161,166,168 that did not have a control group reported pre-post comparison results. Eight (40%) studies45,66,70,114,117,124,125,161 showed significant improvements in PA. 
Lastly, 4 RCTs employed other types of comparison (i.e., late vs. early initiation of an identical intervention63,69; a similar intervention with feedback messages from an individual, friend, or team58; and a similar d-Health intervention with vs. without a sit-to-stand desk157). Overall, these RCTs showed non-significant results.

Efficacy of d-Health intervention on SB-related outcomes

Fifty individual studies evaluated the efficacy of d-Health intervention on SB-related outcomes. First, 23 (46%) RCTs74,76,90,94,97,111,115,116,129,134,135,136,137,139,140,144,147,148,149,150,152,155,156 employed no intervention as a control group. Seven (30%) RCTs116,129,137,140,150,155,156 showed significantly favorable results for the intervention groups. The percentages favoring the intervention group were 38% in objectively measured SB and 27% in subjectively measured SB (Table 3). Second, six RCTs44,62,67,82,130,145 used non-d-Health interventions as control groups. One (17%) RCT130 showed significantly favorable results. The proportion of RCTs favoring the intervention groups varied between objectively measured SB (20%) and subjectively measured SB (0%) (Table 3). Third, 9 RCTs43,65,84,110,131,138,151,153,154 compared a d-Health intervention with other d-Health interventions. One (11%) study65 presented significantly favorable results for the activity monitor and text message group compared with the activity monitor only group (Table 3). Finally, among 9 non-RCTs45,46,51,52,64,72,112,113,143 that did not have control groups, 4 non-RCTs45,51,72,112 showed a significant reduction in SB in pre-post comparisons. The percentages showing a significant reduction in SB were 29% for objectively measured SB and 60% for subjectively measured SB (Table 3).

Efficacy of digital health intervention on physiological outcomes

Sixty-five individual studies assessed the efficacy of d-Health intervention on physiological outcomes. First, 25 RCTs42,76,77,78,80,81,87,88,96,100,102,104,107,111,116,119,141,144,152,155,159,169,170,171,172 employed no intervention as a control group. Thirteen (52%) RCTs42,80,81,96,100,102,107,116,140,141,169,170,171 showed significantly favorable results for the intervention groups compared with no intervention. Second, 16 RCTs44,57,62,67,79,86,92,93,95,98,108,118,127,142,163,164 used non-d-Health interventions as a control group. Two (13%) RCTs92,127 showed significantly favorable results for the intervention groups compared with the control groups. Third, 8 RCTs49,83,99,101,103,106,154,165 compared d-Health interventions and other d-Health interventions. Only 1 (13%) study154 showed significantly favorable results for activity monitor plus offline intervention compared with activity monitor only. Fourth, of 10 individual non-RCTs45,51,64,72,112,117,160,161,162,167 that did not have control groups, 5 (50%) non-RCTs45,51,117,161,162 showed significant improvements in physiological outcomes. Lastly, 3 RCTs employed other types of comparison (i.e., early vs. late initiation with an identical intervention63,69 and interventions performed by an individual vs. friend vs. team58). None of these showed significantly favorable results for the intervention groups.

Meta-analysis results

Thirteen RCTs44,56,60,79,80,111,115,116,135,144,148,152,171 reported sufficient data for meta-analysis on step counts; 12 RCTs44,60,62,67,85,96,115,133,152,156,170,171 provided it on MVPA; 23 RCTs62,67,74,76,90,97,111,115,116,130,134,135,136,137,139,144,145,148,149,150,152,155,156 supplied it on sedentary or sitting time; and 26 RCTs44,57,62,67,76,79,80,81,92,93,95,96,100,102,104,108,111,119,141,142,144,155,159,164,170,171 presented it on weight. First, regarding step counts, 10 RCTs with 13 arms used no intervention as a control and included 1689 workers (intervention group: n = 961; control group: n = 728). Workers who received d-Health interventions exhibited a small but significant improvement in step counts compared with no intervention (standardized mean difference [SMD]: +0.25; 95% confidence interval [CI]: +0.07 to +0.43; I2 = 56.9%, Fig. 2a), with significant moderate heterogeneity (p = 0.0058). Three RCTs with 5 arms utilized a non-d-Health intervention as a control and included 1390 workers (intervention group: n = 688; control group: n = 702). The d-Health intervention showed a slight and nonsignificant increase in step counts compared with non-d-Health intervention (SMD: +0.44; 95% CI: −0.25 to +1.12; I2 = 88.1%, Fig. 2b), with significant high heterogeneity (p < 0.0001).
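The pooled SMDs, 95% CIs, and I2 values reported in this section come from meta-analyses whose pooling model is not specified here. As a rough illustration only, the following Python sketch pools per-arm SMDs with a DerSimonian-Laird random-effects model, which is an assumption rather than the confirmed method of this review. The function name and the input values are hypothetical and are not data from the included RCTs.

```python
import math
from typing import Dict, Sequence


def pool_random_effects(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """Pool effect sizes (e.g., SMDs) with a DerSimonian-Laird random-effects model.

    effects   -- per-study (or per-arm) standardized mean differences
    variances -- their sampling variances (squared standard errors)
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate, and 95% CI.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: percentage of total variability attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "smd": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical SMDs and variances for four trial arms (illustration only).
result = pool_random_effects([0.10, 0.35, 0.50, 0.05], [0.02, 0.03, 0.04, 0.01])
print({key: round(value, 3) for key, value in result.items()})
```

A pooled estimate is conventionally read as statistically significant when its 95% CI excludes zero, which is how the significant and nonsignificant results in this section can be distinguished.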

Fig. 2: Effects of d-Health interventions on objectively measured step counts.

a Forest plot of d-Health interventions on objectively measured step counts in RCTs that used no intervention as the control, and b forest plot of those in RCTs that used non-d-Health interventions as the control. The error bars in the plot represent the 95% CI. Each square within the plot corresponds to the standardized mean difference in step counts observed across different studies. The size of the squares reflects the relative weight of each study in the analysis. The hollow diamond displays the overall effect size. CI confidence interval, RCTs randomized controlled trials, SD standard deviation, Std. standardized.

Second, among 12 RCTs that assessed MVPA, 8 RCTs used no intervention as a control and included 1499 workers (intervention group: n = 872; control group: n = 627). Participation in the d-Health intervention led to a nonsignificant improvement in MVPA compared with no intervention (SMD: +0.08; 95% CI: −0.07 to +0.23; I2 = 48.2%, Fig. 3a), with significant moderate heterogeneity (p = 0.0603). Three RCTs with 5 arms used a non-d-Health intervention as a comparator and included 1485 workers (intervention group: n = 736; control group: n = 749). The d-Health intervention led to a small but significant increase in MVPA compared to non-d-Health intervention (SMD: +0.24; 95% CI: +0.13 to +0.35; I2 = 0.0%, Fig. 3b), with low and nonsignificant heterogeneity (p = 0.4177). We found no heterogeneity in measurement methods (objectively vs. subjectively measured MVPA), but the 95% CIs tended to be broader for subjectively measured MVPA than for objectively measured MVPA.

Fig. 3: Effects of d-Health interventions on moderate to vigorous physical activity.

a Forest plot of d-Health interventions on MVPA in RCTs that used no intervention as the control, and b forest plot of those in RCTs that used non-d-Health interventions as the control. The error bars in the plot represent the 95% CI. Each square within the plot corresponds to the standardized mean difference in MVPA observed across different studies. The size of the squares reflects the relative weight of each study in the analysis. The hollow diamond displays the overall effect size. CI confidence interval, MVPA moderate to vigorous physical activity, RCTs randomized controlled trials, SD standard deviation, Std. standardized.

Third, of 23 RCTs that measured sedentary time or sitting time, 19 RCTs with 25 arms used no intervention as a control and included 4000 workers (intervention group: n = 2308; control group: n = 1280). Meta-analysis showed that the d-Health intervention led to a slight but significant reduction in sedentary or sitting time compared to no intervention (SMD: −0.44; 95% CI: −0.61 to −0.28; I2 = 81.3%, Fig. 4a), with high and significant heterogeneity (p < 0.0001). Four RCTs with 5 arms used a non-d-Health intervention as a control and included 248 workers (intervention group: n = 125; control group: n = 123). The d-Health intervention resulted in a small and nonsignificant reduction in sedentary or sitting time compared to the non-d-Health intervention (SMD: −0.25; 95% CI: −0.50 to 0.00; I2 = 0%, Fig. 4b), with low and nonsignificant heterogeneity (p = 0.7550). Moreover, objectively measured sedentary or sitting time tended to yield a slightly larger SMD than subjectively measured sedentary or sitting time in both control conditions.

Fig. 4: Effects of d-Health interventions on sedentary behavior.

a Forest plot of d-Health interventions on SB in RCTs that used no intervention as the control, and b forest plot of those in RCTs that used non-d-Health interventions as the control. Sitting time and sedentary time were combined in the analysis. The error bars in the plot represent the 95% CI. Each square within the plot corresponds to the standardized mean difference in SB observed across different studies. The size of the squares reflects the relative weight of each study in the analysis. The hollow diamond displays the overall effect size. CI confidence interval, RCTs randomized controlled trials, SB sedentary behaviour, SD standard deviation, Std. standardized.

Finally, among 26 RCTs that utilized weight, 15 RCTs with 18 arms comparing a d-Health intervention to no intervention included 3836 workers (intervention group: n = 2072; control group: n = 1764). We found that the d-Health intervention led to a small but significant reduction in weight compared to no intervention (SMD: −0.24; 95% CI: −0.41 to −0.07; I2 = 75.2%, Fig. 5a), with significant moderate heterogeneity (p < 0.0001). Eleven RCTs with 16 arms comparing a d-Health intervention to a non-d-Health intervention included 8599 workers (intervention group: n = 4353; control group: n = 4247). The d-Health intervention resulted in a nonsignificant weight change (SMD: −0.01; 95% CI: −0.06 to 0.04; I2 = 0%, Fig. 5b), with nonsignificant low heterogeneity (p = 0.6883).

Fig. 5: Effects of d-Health interventions on weight.

a Forest plot of d-Health interventions on weight in RCTs that used no intervention as the control, and b forest plot of those in RCTs that used non-d-Health interventions as the control. The error bars in the plot represent the 95% CI. Each square within the plot corresponds to the standardized mean difference in weight observed across different studies. The size of the squares reflects the relative weight of each study in the analysis. The hollow diamond displays the overall effect size. CI confidence interval, RCTs randomized controlled trials, SD standard deviation, Std. standardized.

Results of subgroup analysis

Subgroup analyses were conducted for all four outcomes based on risk of bias, intervention duration (<3 months vs. ≥3 months), m-Health use, multielement use, offline intervention use, outcome type (all day, workday, or non-workday), and sex (female percentage <60% vs. ≥60%). The results of subgroup analyses for objectively measured step counts are presented in Supplementary Figs. 1–6. RCTs with a high risk of bias showed larger SMDs than those with a low risk of bias or some concerns; however, the direction of effects remained consistent. Multielement use led to a small but significant increase in step counts, whereas single-element use resulted in a moderate but nonsignificant increase. Additionally, d-Health interventions combined with offline interventions significantly increased step counts to a moderate degree, while those without offline interventions did not lead to a significant change. No apparent differences were observed in subgroup analyses for the other variables.

Regarding objectively measured MVPA (Supplementary Figs. 7–13), RCTs with a high risk of bias or some concerns showed larger SMDs than those with a low risk of bias. RCTs using m-Health interventions led to a small but significant increase in MVPA, whereas those without m-Health interventions showed a nonsignificant change. d-Health interventions combined with offline interventions resulted in a nonsignificant change in MVPA, whereas those without offline interventions led to a slight but significant increase. Overall MVPA tended to increase more than non-workday MVPA. Additionally, RCTs with less than 60% female participants showed larger SMDs than those with 60% or more. Subgroup analysis by multielement use was not performed for MVPA, as all RCTs included in the meta-analysis employed two or more elements.

Supplementary Figs. 14–20 present the results of subgroup analyses for sedentary or sitting time. Workday sedentary or sitting time resulted in larger SMDs than overall sedentary or sitting time. RCTs with 60% or more female participants showed a small but significant reduction in sedentary or sitting time, whereas those with less than 60% showed a nonsignificant change. No apparent heterogeneity was observed in subgroup analyses for the remaining variables.

The results of subgroup analyses for weight are shown in Supplementary Figs. 21–26. A small but significant effect was observed in RCTs with an intervention duration of 3 months or more, whereas a moderate but nonsignificant effect was found in those with an intervention duration of less than 3 months. RCTs that incorporated m-Health, multielement, or offline interventions demonstrated a greater reduction in weight compared to those that did not include these components. Greater weight reduction was observed in RCTs with less than 60% female participants compared to those with 60% or more. The effects of d-Health interventions on weight were consistent across all four risk-of-bias categories.

Publication bias

We assessed potential publication bias using funnel plots and Egger’s test for step counts in RCTs with no-intervention controls (Fig. 6a), sedentary or sitting time in RCTs with no-intervention controls (Fig. 6b), and weight in RCTs with either no-intervention (Fig. 6c) or non-d-Health intervention controls (Fig. 6d). While the funnel plots appeared symmetric, Egger’s test indicated asymmetry for weight in RCTs with no-intervention controls (p = 0.022).

Fig. 6: Funnel plots for step counts, sedentary behavior, and weight.

a Funnel plot for step counts in RCTs that employed no intervention as the control, b Sedentary behavior in RCTs that employed no intervention as the control, and c, d Weight in RCTs that employed no intervention and non-d-Health interventions, respectively, as the control. The x-axis represents the standardized mean difference, and the y-axis represents the standard error. Funnel plot asymmetry was tested using Egger’s test. RCTs randomized controlled trials.

Confidence in cumulative evidence and summary of findings

Figure 7 presents the confidence in cumulative evidence and a summary of findings on the efficacy of d-Health interventions for step counts, MVPA, sedentary or sitting time, and weight in RCTs included in the meta-analysis. Low to very low-quality evidence suggested that d-Health interventions significantly, albeit slightly, outperformed controls in step counts, sedentary or sitting time, and weight when compared to no intervention. In contrast, d-Health interventions had minor positive but nonsignificant effects on MVPA. Similarly, low to very low-quality evidence indicated no clear advantage of d-Health interventions over non-d-Health interventions. Furthermore, our narrative synthesis found no evidence that any specific type of d-Health intervention (e.g., email vs. phone, multi-device vs. single device, or frequent vs. less frequent use) was more effective than another.

Fig. 7: Summary of and confidence in cumulative evidence for the efficacy of d-Health interventions on step counts, MVPA, sedentary or sitting time, and weight in RCTs included in the meta-analysis.

Indirectness, risk of bias, inconsistency, and imprecision were rated as no serious (green), serious (yellow), or very serious (red). Publication bias was rated as unlikely (green), likely (yellow), and very likely (red). Confidence in cumulative evidence begins at high and then addresses five domains that possibly rate down the confidence in cumulative evidence from high to very low. C control, I intervention, NA not applicable, O outcome, P population.

Discussion

To the best of our knowledge, this is the first umbrella review to summarize the efficacy of d-Health interventions on PA-related, SB-related, and physiological outcomes in workers. We included 24 SRs and 130 individual studies. Low to very low-quality evidence suggested that d-Health interventions had small but significant effects on PA-related, SB-related, and physiological outcomes compared to no intervention. However, there was no clear evidence supporting the superiority of d-Health interventions over non-d-Health interventions, such as exercise sessions and face-to-face counseling. Additionally, insufficient evidence was available to determine whether one type of d-Health intervention was more effective than another. Subgroup analyses identified potential sources of heterogeneity in d-Health interventions, which varied by outcome domain and type (e.g., step counts, MVPA).

The study design, population, intervention, control, and outcome measures varied significantly among the 24 SRs included in this umbrella review. For instance, 63% of the reviews incorporated diverse control conditions (e.g., no intervention, usual care, waitlist control, non-d-Health interventions, and other d-Health interventions), though most did not analyze them separately. Additionally, the risk of bias was rated as ‘critically low confidence’ in 71% of the reviews and ‘low confidence’ in 17%. These findings are consistent with previous umbrella reviews on activity trackers as interventions182 and smartphone applications21 in mixed populations. Given the broad scope of interventions covered in this review, we emphasize the need for future SRs to adopt higher-quality methodologies to better assess the efficacy of d-Health interventions on PA-related, SB-related, and physiological outcomes in workers.

This umbrella review analyzed 130 individual studies included in the 24 SRs. Similar to the SRs, we observed considerable variability among individual studies, particularly in interventions, control conditions, and outcomes. For example, a wide range of digital technologies were used as interventions, including static and dynamic websites, computer prompts, emails, text messages, smartphone applications, and interactive wearable devices. The most common comparisons in individual studies included d-Health interventions versus no intervention (including usual care and waitlist control), d-Health versus non-d-Health interventions, one d-Health intervention versus another, and pre- versus post-intervention comparisons. Additionally, numerous outcomes were assessed using different measurement devices, questionnaires, and methodologies. Due to this substantial heterogeneity, we conducted meta-analyses for only four relevant outcomes: step counts, MVPA, sedentary or sitting time, and weight.

The risk of bias was generally high in approximately 50% of individual studies. Among these, selection bias may have led to an overestimation of d-Health intervention efficacy for several reasons. First, dropout rates were high, with intervention groups being more likely to lose participants than control groups. Second, 40% to 50% of individual studies did not report on ITT and methods for handling missing data, and 14% relied on the last- or baseline-observation-carried-forward analysis, which tends to overestimate intervention effects183. These methodological flaws can introduce bias and affect the reliability of findings. Therefore, future studies should aim to minimize dropout rates and adopt appropriate ITT approaches, combined with more rigorous analytical methods such as multiple imputation and sensitivity analyses under alternative plausible assumptions for missing data mechanisms184,185,186,187.

Two previous SRs examined the efficacy of m-Health interventions in promoting PA among workers and found that these interventions can increase PA12,27. In this umbrella review, 5 of 39 RCTs (13%) reported significantly favorable results for d-Health interventions compared with no intervention, while 4 of 25 RCTs (25%) showed favorable results compared to non-d-Health interventions. In contrast, no study reported significantly unfavorable results under either control condition. These narratively synthesized findings suggest an effect of d-Health interventions on PA-related outcomes, as a ratio of favorable to unfavorable results of 0.5 or higher indicates intervention efficacy188,189. Additionally, the effectiveness varied depending on whether PA was measured objectively or subjectively. Low-quality evidence indicated that d-Health interventions had a small but significant effect on step counts compared to no intervention (SMD: +0.25; 95% CI: +0.07 to +0.43; I² = 56.9%), with moderate heterogeneity. In contrast, very low-quality evidence suggested that d-Health interventions resulted in a nonsignificant increase in step counts compared with non-d-Health interventions. Very low-quality evidence suggested that, in contrast to step counts, MVPA showed no significant effect compared to no intervention but a significant effect compared to non-d-Health interventions. Moreover, potential sources of heterogeneity appeared to vary by outcome type. These results support previous findings12,27 and expand our understanding of the efficacy of d-Health interventions, including m-Health, for promoting PA. The efficacy of these interventions may depend on control conditions, measurement methods, outcome types, and interventional characteristics. Furthermore, our umbrella review found no clear evidence indicating that any specific type of d-Health intervention is superior to others.

Regarding SB, the percentage of individual studies reporting significantly favorable results in the intervention group ranged from 27% to 38% when compared with no intervention. In contrast, when compared with non-d-Health interventions, the percentage dropped to between 0% and 20% for any type of SB (only 1 of 6 RCTs), objectively measured SB (only 1 of 5 RCTs), and subjectively measured SB (0 of 2 RCTs). Meta-analysis results supported these findings, as the efficacy of d-Health interventions was greater in RCTs that used no intervention as a control (SMD: −0.44; 95% CI: −0.61 to −0.28; I2 = 81.3%) than in those that used a non-d-Health intervention as a control (SMD: −0.25; 95% CI: −0.50 to 0.00; I2 = 0%). These findings provide new insights into the efficacy of m-Health interventions on SB-related outcomes in workers12. Notably, the previous review pooled various control conditions (e.g., no intervention, educational booklet, and activity monitor without motivational messages)12, whereas we analyzed them separately. This distinction suggests that d-Health interventions may be more beneficial when compared with no intervention than when compared with non-d-Health interventions. Moreover, subgroup analyses identified outcome types (e.g., overall SB, workday SB) and participants’ sex as potential sources of heterogeneity. Additionally, similar to PA-related outcomes, it remains unclear which type of d-Health intervention is more effective than others.

Regarding physiological outcomes, 13 of 25 RCTs (52%) reported significantly favorable results in the intervention group compared with no intervention. By contrast, the percentage was 13% when compared with non-d-Health interventions. Meta-analysis results support these findings. d-Health interventions led to a small but significant weight reduction compared with no intervention (SMD: −0.24; 95% CI: −0.41 to −0.07; I2 = 75.2%), while no significant weight change was observed compared with non-d-Health interventions (SMD: −0.01; 95% CI: −0.06 to 0.04; I2 = 0%). Although previous SRs showed inconsistent results24,27,28, our umbrella review supports the efficacy of d-Health interventions in improving physiological outcomes compared with no intervention. However, we did not find clear evidence regarding whether d-Health interventions outperform non-d-Health interventions or which type of d-Health intervention is most effective. Further research is needed to obtain conclusive evidence.

This umbrella review has several limitations that should be considered when interpreting the results. First, although this umbrella review was conducted following a pre-determined protocol, the protocol is not publicly available. To address this limitation, we reported deviations from the protocol in the Methods section. Second, we found moderate to high heterogeneity in most meta-analysis results. Meta-analyses were performed using a random-effects model, and subgroup analyses identified potential sources of heterogeneity (e.g., control conditions, risk of bias, m-Health use, multielement use, combination with offline interventions, and sex). Future studies that account for these factors are needed. Third, the meta-analysis and subgroup analysis were added during the peer review process. Thus, these results should be interpreted with caution. SRs with pre-planned meta-analyses and subgroup analyses are warranted. Finally, we could not perform a sensitivity analysis including only individual studies with a low risk of bias because most studies had a high or moderate risk. We are concerned about the potential overestimation of the efficacy of d-Health interventions due to high dropout rates and flaws in handling missing data in individual studies.

Considering these limitations, we propose several recommendations for further research to advance d-Health interventions in workers. These recommendations aim to assist employers in deciding whether to implement d-Health as a health promotion tool for their employees.

  i. Clearly define the research question and study design (i.e., population, intervention, control, and outcome) and report them in accordance with relevant reporting guidelines to facilitate quantitative synthesis. This allows readers to assess whether the magnitude of benefits is clinically meaningful.

  ii. Evaluate the efficacy of d-Health interventions compared with non-d-Health interventions and identify which d-Health interventions are more effective than others.

  iii. Select at least one key PA- and SB-related outcome (e.g., step counts, MVPA, sedentary time, and sitting time) measured using a validated and reliable method. Ideally, researchers should use activity monitors to measure PA and SB, as questionnaires are less accurate in measuring these behaviors190,191.

  iv. Design studies to minimize the loss to follow-up and apply appropriate statistical methods to handle missing data (e.g., multiple imputation and mixed-effects models).

  v. Conduct SRs in accordance with the latest Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement158 and refer to the AMSTAR 2 tool175 for more rigorous evidence synthesis and reporting.

This umbrella review briefly assessed 24 SRs and the 130 individual studies included in these reviews and summarized the efficacy of d-Health interventions on PA-related, SB-related, and physiological outcomes in workers while considering comparison groups. Current evidence suggests that d-Health interventions, compared with no intervention, can improve PA-related, SB-related, and physiological outcomes. However, there is insufficient evidence to establish the superiority of d-Health interventions over non-d-Health interventions or to determine which type of d-Health intervention is most effective. Future high-quality studies are needed to strengthen the scientific evidence on the efficacy of d-Health interventions in promoting PA, reducing SB, and improving physiological outcomes in workers.

Methods

This review was reported in accordance with the PRISMA 2020 statement192 (Supplementary Table 5) and an approach proposed by Aromataris et al.193. Although the review protocol was not prospectively registered, we conducted the review according to a pre-determined protocol. During the peer review process, we modified or added to our review methods as follows: adding two databases (i.e., Cochrane Library and Google Scholar) for literature searching; using a vote-counting system188,189 and the revised effect direction plot189,194 for the narrative synthesis of individual study results; and performing meta-analysis and subgroup analysis on several outcomes for quantitative synthesis and the investigation of heterogeneity.

Eligibility criteria

We included SRs, with or without meta-analysis, of interventional studies (RCTs and non-RCTs) and cohort studies. Reviews were excluded if they had an inappropriate study design, such as scoping reviews, literature reviews, individual studies (e.g., RCTs), or gray literature (e.g., conference abstracts and protocols).

We included reviews that contained at least one individual study involving workers aged 18 or older. No restrictions were placed on job type, employment status, shift type, or disease risk factors; however, studies involving trained athletes were excluded.

This review defined d-Health as medical and public healthcare practices that utilize digital technologies, including phone calls, short messaging services, emails, static and dynamic websites, computer applications, and simple computer prompts9,195. We also considered m-Health as a subcomponent of d-Health. In this umbrella review, m-Health refers to medical and public healthcare practices using smartphone applications, interactive wearable devices (e.g., activity tracker, heart rate monitor), and accelerometers integrated into smartphones, following the WHO’s 2019 definition195 and a review by Nomura et al.196. Traditional non-interactive pedometers (i.e., those without data distribution to a user interface) were not classified as either m-Health or d-Health technologies, consistent with a previous SR12.

Interventions must incorporate at least one d-Health technology, either alone or as a part of a multicomponent intervention, to optimize PA, SB, and physiological outcomes. We included multicomponent interventions because many d-Health interventions targeting these outcomes were delivered within such frameworks17. Including these interventions broadens the range of eligible studies and the review’s scope while ensuring the inclusion of relevant research. However, we excluded interventions that used d-Health devices solely for outcome monitoring rather than as an active component of the intervention.

Eligible control conditions included no intervention, usual care, waitlist groups, active control groups, pre-post data, or non-d-Health interventions. Additionally, we included studies that used d-Health interventions in both the intervention and control conditions, as well as those comparing different d-Health interventions.

SRs were required to include individual studies that assessed at least one PA-related, SB-related, and physiological outcome (i.e., outcomes influenced by PA, such as body weight, BMI, fat mass or rate, and WC). All types and units of PA- and SB-related outcomes were eligible, including both subjective and objective measures (e.g., self-reported PA level, step counts, daily or weekly time in MVPA, sedentary time, and sitting time). We included only articles written in English, as restricting language during study selection rather than during literature searching introduces minimal language bias, which is considered negligible197.

Information sources and search strategy

We searched PubMed, Cochrane Library, and Google Scholar for SRs, with or without meta-analysis, from database inception to October 25, 2024. No language restrictions were applied at the search stage. The search strategy was developed using medical subject headings and text words related to the following categories: Population (“occupation”), Intervention (“d-Health”), Outcomes (“PA,” “SB,” “body weight,” “BMI,” “fat percentage,” and “waist circumference”), and Publication Type (“SR”). Full search strategies for all databases are provided in Supplementary Table 1.

Review selection process

Three reviewer pairs (MI and CO; MI and SJ; MI and TY) independently screened titles and abstracts. Full texts were obtained for all studies that appeared to meet the inclusion criteria or where eligibility was uncertain. The pairs then screened the full texts to determine inclusion. Disagreements were resolved through discussion, and unresolved disputes were adjudicated by an arbitrator (KN). All selection processes were performed using Rayyan198, a web application designed to support these screening processes.

Data extraction from systematic reviews

Two reviewers (MI and CO) extracted data using a standardized extraction form, following a pilot evaluation by the first author (MI) to ensure adequate data capture. Extracted information included the first author’s name, publication year, total number of individual studies in each review, number and percentage of individual studies meeting the current umbrella review’s inclusion criteria, population, setting (e.g., not restricted or workplace), characteristics of included interventions, measured outcomes (PA-related, SB-related, and physiological outcomes), tools and results of risk of bias assessment, and study findings.

Data extraction from individual studies

In only 6 of the 24 SRs did more than 80% of the included individual studies meet the umbrella review’s inclusion criteria. Therefore, in the same manner as data extraction from SRs, two reviewer pairs (MI and CO; MI and KN) extracted additional information from the individual studies that met the inclusion criteria to improve the directness between our PICO (population, intervention, control, and outcome) and the PICO of previous reviews199,200. The following items were extracted if the corresponding SR did not report them: the country where the individual study was conducted; details of participant characteristics, including sample size, age, percentage of females, and dropout rate; whether intention-to-treat analysis was performed; the method for handling missing data; details of intervention and control conditions; the technique for assessing PA and SB; and the study results.

Intervention details included intervention duration (from baseline to the end of the intervention) and follow-up duration (from baseline to follow-up measurement). We also extracted comparison types, categorized as follows:

  • d-Health vs. no intervention: Compared d-Health interventions to no intervention, including usual care and wait-list control.

  • d-Health vs. non-d-Health: Compared d-Health intervention to non-d-Health intervention (e.g., printed materials, supervised exercise sessions, face-to-face counseling).

  • d-Health vs. other d-Health: Compared a d-Health intervention to other d-Health interventions (e.g., multiple technologies vs. single technology, personalized vs. non-personalized, frequent vs. less frequent).

  • Other comparisons.

Additional extracted data included the digital technologies used (d-Health and m-Health), intervention focus (i.e., PA and SB, diet/nutrition, or others including smoking, alcohol, stress management), and whether the intervention incorporated multielement approaches (e.g., use of website and email; use of application and email), non-d-Health (e.g., face-to-face counseling, supervised exercise sessions, educational lectures, and printed material), or incentives. We also recorded the type (objective, subjective, or both) and details of PA and SB measurement methods. Individual study results at both the end of the intervention and follow-up were extracted.

Risk of bias assessment for systematic reviews

Reviewer pairs (MI and TM; MI and MN; MI and SJ; MI and TY) independently assessed the risk of bias for each included SR using the AMSTAR 2 tool175. The tool comprises 16 items, each rated as “no,” “partial yes,” or “yes.” Seven critical items can critically affect the validity of a review, while the remaining nine are considered non-critical. The seven critical domains are as follows: Item 1, protocol registration; Item 2, adequacy of the search strategy; Item 3, risk-of-bias assessment; Item 4, appropriateness of meta-analytical methods; Item 5, use of the risk-of-bias assessments in interpretation; Item 6, evaluation of publication bias; and Item 7, justification for excluding individual studies. SRs were classified as follows:

  • High confidence: 0 critical weaknesses and 1 non-critical weakness

  • Moderate confidence: 0 critical weaknesses and 2 to 3 non-critical weaknesses

  • Low confidence: 1 critical weakness

  • Critically low confidence: >1 critical weakness

This umbrella review defined more than three non-critical weaknesses as multiple non-critical weaknesses, which may reduce confidence in the review. If an SR had no critical weaknesses but more than three non-critical weaknesses, its overall rating was downgraded from moderate to low confidence.
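The classification rule above can be sketched as a small function. This is an illustrative sketch only, not the authors' code: the function name and argument names are hypothetical, and the treatment of a review with zero non-critical weaknesses (rated "high" here) is an assumption, since the text does not state it explicitly.

```python
def amstar2_confidence(critical: int, non_critical: int) -> str:
    """Overall confidence rating from counts of AMSTAR 2 weaknesses."""
    if critical < 0 or non_critical < 0:
        raise ValueError("weakness counts must be non-negative")
    if critical > 1:
        return "critically low"
    if critical == 1:
        return "low"
    # No critical weaknesses from here on.
    if non_critical > 3:
        # Multiple non-critical weaknesses: downgraded from moderate to low.
        return "low"
    if non_critical >= 2:
        return "moderate"
    # Assumption: 0 non-critical weaknesses is treated like 1 (high).
    return "high"


print(amstar2_confidence(critical=0, non_critical=4))
```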

Overlap in individual studies in included systematic reviews

The degree of overlap among individual studies included in the SRs captured in the current umbrella review was assessed using the CCA method201. A CCA of 100% indicates that all SRs in this umbrella review contain exactly the same individual studies, whereas a CCA of 0% indicates that each SR includes entirely unique individual studies. A CCA of 5% or less suggests slight overlap, 6%–10% moderate overlap, 11%–15% high overlap, and more than 15% very high overlap201.
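As a hedged illustration of how such an overlap index is commonly computed, the sketch below uses the usual corrected covered area formula, CCA = (N − r) / (r × c − r), where N is the total number of study inclusions across reviews (counting repeats), r is the number of unique individual studies, and c is the number of reviews. The function name and the toy review contents are hypothetical and are not data from this umbrella review.

```python
from typing import Mapping, Set


def corrected_covered_area(reviews: Mapping[str, Set[str]]) -> float:
    """Return the CCA (as a proportion) for a review -> study-ID mapping."""
    c = len(reviews)                             # number of reviews
    unique_studies = set().union(*reviews.values())
    r = len(unique_studies)                      # unique individual studies
    n = sum(len(s) for s in reviews.values())    # inclusions, repeats counted
    if c < 2 or r == 0:
        raise ValueError("need at least two reviews and one study")
    return (n - r) / (r * c - r)


# Toy example (hypothetical study IDs).
toy = {
    "SR-A": {"s1", "s2", "s3"},
    "SR-B": {"s2", "s3", "s4"},
    "SR-C": {"s4", "s5"},
}
print(f"CCA = {100 * corrected_covered_area(toy):.1f}%")
```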

Narrative data synthesis methods

Given the overlap among some of the component studies included in various meta-analyses captured in this umbrella review, as well as heterogeneities in study designs, interventions, and outcomes across SRs and their individual studies, we primarily chose narrative synthesis to summarize the results of the SRs and individual studies. We summarized the risk of bias for each study assessed using the Cochrane risk-of-bias tool for randomized trials (Cochrane RoB)202 or the revised Cochrane risk-of-bias tool for randomized trials (Cochrane RoB2)203 as “High,” “Some concerns,” or “Low.” For studies evaluated using the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I)204 or the Effective Public Health Practice Project (EPHPP)205, we converted “Serious” or “Weak” ratings to “High,” “Moderate” ratings to “Some concerns,” and “Low” or “Strong” ratings to “Low.” In SRs that applied the Joanna Briggs Institute (JBI) Checklist206, we followed Wonggom’s approach207 and categorized the risk of bias as either “High” or “Low.” For SRs using the Standard Quality Assessment Criteria for Evaluating Primary Research Papers from a Variety of Fields (QUALSYST), we followed the QUALSYST manual208 to classify studies with a total score greater than 0.75 as “Low” risk, those with a score between 0.55 and 0.75 as “Some concerns,” and those with a score below 0.55 as “High” risk. When other tools or specific assessment items were used, we conservatively rated the risk of bias as “Some concerns.” If an SR did not report individual risk-of-bias results, we categorized the study’s risk of bias as “Not reported.” Finally, when an individual study was included in multiple SRs, we assigned its risk of bias based on the worst rating among them.
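The harmonization described above amounts to a lookup table plus a "worst rating wins" rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: all names are hypothetical, Cochrane and JBI ratings are assumed to arrive already expressed in the common categories, and "Not reported" is assumed to be ignored whenever at least one other SR reported a rating for the same study.

```python
from typing import Iterable

# Severity order used to pick the worst rating across SRs.
SEVERITY = {"Low": 0, "Some concerns": 1, "High": 2}

# Conversions stated in the text for ROBINS-I and EPHPP.
TOOL_MAP = {
    "ROBINS-I": {"Serious": "High", "Moderate": "Some concerns", "Low": "Low"},
    "EPHPP": {"Weak": "High", "Moderate": "Some concerns", "Strong": "Low"},
}


def qualsyst_category(score: float) -> str:
    """Map a QUALSYST total score (0 to 1) to a common category."""
    if score > 0.75:
        return "Low"
    if score >= 0.55:
        return "Some concerns"
    return "High"


def harmonize(tool: str, rating: str) -> str:
    """Convert a tool-specific rating to High / Some concerns / Low."""
    if tool in ("Cochrane RoB", "Cochrane RoB2", "JBI"):
        return rating  # assumed to be in the common categories already
    if tool in TOOL_MAP:
        return TOOL_MAP[tool][rating]
    return "Some concerns"  # other tools: conservative default


def worst_rating(ratings: Iterable[str]) -> str:
    """Worst rating for a study that appears in several SRs."""
    reported = [r for r in ratings if r in SEVERITY]
    if not reported:
        return "Not reported"
    return max(reported, key=SEVERITY.__getitem__)


print(worst_rating([harmonize("EPHPP", "Strong"), harmonize("ROBINS-I", "Serious")]))
```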

We narratively synthesized the direction of results of individual studies using the vote-counting system188,189 and visualized them using the modified effect direction plot189,194. For RCTs and non-RCTs (i.e., non-RCT interventional studies or cohort studies) with the control group(s), we defined the direction of the results as follows:

  • Significantly favorable: The outcome significantly favored the intervention group at the end of the intervention or follow-up (e.g., p-value less than 0.05 or 95% CIs that did not include 0).

  • Significantly unfavorable: The outcome significantly favored the control groups.

  • Mixed: At least one outcome significantly favored the intervention group, while at least one other outcome significantly favored the control groups.

  • Non-significant: No outcome showed a significant difference between the intervention and control groups.

  • No statistical test performed: Group comparisons were not performed for any outcomes, including cases where only pre-post comparison results were reported.

Where multiple outcomes and/or measurements were reported within the same outcome domain (e.g., PA-related, SB-related, and physiological outcomes) in a single study, the results were classified as non-significant or mixed if less than 70% of the outcomes and measurements reported a consistent direction188,189.

For non-RCTs without control groups, the direction of the results was defined as follows:

  • Significantly favorable: The outcome significantly improved at the end of the intervention or follow-up assessment.

  • Significantly unfavorable: The outcome significantly worsened from baseline.

  • Mixed: At least one outcome significantly improved, while at least one other outcome significantly worsened.

  • Non-significant: No outcome showed a significant difference from baseline.

  • No statistical test performed: No pre-post comparison was performed for any outcomes.

Where multiple outcomes and/or measurements were reported within the same outcome domain in one study, the results were categorized as non-significant or mixed if less than 70% of the outcomes and measurements showed a consistent direction188,189. Then, we calculated the number of outcome-specific studies with significantly favorable, non-significant/mixed/no-statistical test performed, or significantly unfavorable results. These totals were expressed as a percentage of the total number of individual studies. Additionally, findings were summarized using the revised effect direction plot189,194.
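The vote-counting steps can be sketched as follows. This is one possible reading of the 70% rule, offered as an assumption rather than the authors' exact procedure: a study is labeled significantly favorable (or unfavorable) only when at least 70% of its outcomes in a domain point that way, "mixed" when both significant directions occur without reaching 70%, and non-significant otherwise. Function names and labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def study_direction(outcomes: Sequence[str], threshold_pct: int = 70) -> str:
    """Classify one study within one outcome domain.

    `outcomes` holds one label per outcome or measurement:
    "favorable", "unfavorable", or "nonsignificant".
    """
    if not outcomes:
        return "no statistical test performed"
    counts = Counter(outcomes)
    total = len(outcomes)
    # Integer arithmetic avoids floating-point edge cases at exactly 70%.
    if 100 * counts["favorable"] >= threshold_pct * total:
        return "significantly favorable"
    if 100 * counts["unfavorable"] >= threshold_pct * total:
        return "significantly unfavorable"
    if counts["favorable"] and counts["unfavorable"]:
        return "mixed"
    return "non-significant"


def percent_favorable(n_favorable: int, n_studies: int) -> int:
    """Share of studies with significantly favorable results, in percent."""
    return round(100 * n_favorable / n_studies)


print(study_direction(["favorable", "favorable", "favorable", "nonsignificant"]))
print(percent_favorable(5, 39))  # 5 of 39 RCTs, as reported in the Discussion
```

With these definitions, three favorable outcomes out of four (75%) yield a significantly favorable study, whereas two out of three (67%) do not.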

Meta-analysis

All statistical analyses were performed using R (version 4.4.1). The pooled effect was estimated for step counts, MVPA, SB (combined sedentary time and sitting time), and weight from RCTs that used no intervention or a non-d-Health intervention as a control. When a study measured an outcome across different time frames (i.e., overall, workday, non-workday), we extracted data based on the following priority order for meta-analysis: overall, workday, and non-workday. For studies that measured an outcome at multiple time points (e.g., at the end of the intervention, three months post-intervention), we used the latest measurement as the post-intervention value.

We estimated the SMD in change scores (the post-intervention value minus the baseline value) because most individual studies reported change scores or provided baseline and post-intervention values. For studies that reported the mean change score without its standard deviation (SD), we estimated the missing SD from the sample size and either the standard error (SE) or the 95% CI209. For studies that did not report the change score but provided baseline and post-intervention means and SDs, the SD of the change score was calculated using the following formula209:

$SD_{I\ \text{or}\ C,\ \text{change}} = \sqrt{SD_{I\ \text{or}\ C,\ \text{baseline}}^{2} + SD_{I\ \text{or}\ C,\ \text{post-intervention}}^{2} - 2 \times Corr \times SD_{I\ \text{or}\ C,\ \text{baseline}} \times SD_{I\ \text{or}\ C,\ \text{post-intervention}}}$

where I denotes the intervention group, C denotes the control group, and Corr represents the correlation coefficient between baseline and post-intervention values. We assumed a correlation of 0.5 as a conservative choice209. When a study reported baseline and post-intervention medians with the 25th and 75th percentiles, we calculated the means and SDs of baseline and post-intervention values using Wan’s method210. For studies with multiple intervention or control groups, we paired each intervention group with each control group. For example, if a study had three intervention groups and one control group, we created three pairs: I1 vs. C, I2 vs. C, and I3 vs. C. Missing data were handled using listwise deletion.
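The SD conversions described in this paragraph can be illustrated with a short Python sketch. It is not the authors' R code. The function names are hypothetical, the CI conversion assumes a normal approximation (small samples would normally use a t-quantile instead), the change-score SD uses the standard product form of the correlation term, and the quartile conversion uses the median-and-quartiles scenario of Wan's method, which is an assumption because the text does not say which scenario was applied.

```python
import math
from statistics import NormalDist


def sd_from_se(se, n):
    """SD of a mean recovered from its standard error: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """SD recovered from a confidence interval around a mean.

    Assumption: normal approximation (z is about 1.96 for a 95% CI).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.sqrt(n) * (upper - lower) / (2 * z)


def sd_change(sd_baseline, sd_post, corr=0.5):
    """SD of the change score from baseline and post-intervention SDs."""
    variance = sd_baseline**2 + sd_post**2 - 2 * corr * sd_baseline * sd_post
    return math.sqrt(variance)


def mean_sd_from_quartiles(q1, median, q3, n):
    """Approximate mean and SD from the median and quartiles.

    Assumption: the scenario in which only q1, median, q3, and the
    sample size n are available.
    """
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd
```

With a correlation of 0.5 and equal baseline and post-intervention SDs, `sd_change` returns that same SD, which shows why 0.5 is a convenient default: the change score is assumed to be exactly as variable as either measurement.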

Random-effects models were employed to combine the results. Using the inverse variance method, we calculated a pooled SMD and 95% CI for each outcome. In this umbrella review, an SMD of 0.2 or more but less than 0.5 was interpreted as a small effect; an SMD of 0.5 or more but less than 0.8 as a moderate effect; and an SMD of 0.8 or more as a large effect211. Statistical heterogeneity was assessed using the chi-squared test and the I² statistic. For the chi-squared test, a P-value of less than 0.10 was considered statistically significant209. We interpreted the I² statistic as follows: <25%, low heterogeneity; 25–75%, moderate heterogeneity; >75%, high heterogeneity212. These analyses were performed separately according to the control conditions (i.e., no intervention, non-d-Health intervention). Additionally, we analyzed MVPA and SB separately by measurement method (i.e., objectively or subjectively measured) because subjectively measured PA using questionnaires has several limitations (e.g., recall bias, missing data, and less precision)213.
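To make the pooling step concrete, the following Python sketch shows inverse-variance random-effects pooling together with the interpretation thresholds given above. It is illustrative only. The text does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator here is an assumption, as are the function names, the use of a normal-approximation 95% CI, and the use of the absolute SMD when labeling effect size.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study-level SMDs with inverse-variance random-effects weights.

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q, the statistic behind the chi-squared heterogeneity test.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"smd": pooled, "ci": ci, "q": q, "df": df, "tau2": tau2, "i2": i2}


def describe_smd(smd):
    """Label effect size using the thresholds in the text (absolute value assumed)."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "not classified"  # the text gives no label below 0.2


def describe_i2(i2):
    """Label heterogeneity using the I² bands in the text."""
    if i2 < 25:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"
```

In practice, a dedicated meta-analysis package would also report the P-value for Q against a chi-squared distribution with `df` degrees of freedom, which this sketch leaves out to stay within the standard library.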

To examine the source of heterogeneity, we performed subgroup analyses for all four outcomes based on risk of bias, intervention duration (<3 months or 3 months or more), m-Health use, multielement use, offline intervention use, outcome type (i.e., overall, workday, non-workday), and sex (percentage of female participants <60% or 60% or more). In the subgroup analyses, we combined the two control conditions (i.e., no intervention and non-d-Health intervention) to increase the sample size. Additionally, only objectively measured results were used for MVPA because subjectively measured MVPA had a very wide 95% CI.

To assess publication bias, we conducted a funnel plot analysis and Egger’s test214 for step counts and SB in RCTs that employed no intervention as a control, as well as weight in RCTs that utilized either no intervention or a non-d-Health intervention as a control, since these methods are valid when there are at least 10 studies.
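As a rough illustration of the regression behind Egger's test, the Python sketch below regresses each study's standardized effect (effect divided by its SE) on its precision (1 divided by its SE) and returns the intercept with its t statistic. This is a simplified sketch, not the authors' procedure: the function name and return format are hypothetical, the P-value lookup against a t distribution with k − 2 degrees of freedom is omitted, and the minimum of 10 studies is enforced as a guard to mirror the condition stated above.

```python
import math


def egger_intercept(effects, ses, min_studies=10):
    """Egger-style regression of (effect / SE) on (1 / SE).

    Returns (intercept, se_intercept, t_statistic, degrees_of_freedom).
    An intercept far from zero suggests funnel plot asymmetry.
    """
    k = len(effects)
    if k < min_studies:
        raise ValueError("too few studies for a funnel plot asymmetry test")
    x = [1 / s for s in ses]                   # precision
    y = [e / s for e, s in zip(effects, ses)]  # standardized effect
    mx = sum(x) / k
    my = sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance and the standard error of the intercept (ordinary least squares).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r**2 for r in residuals) / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + mx**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat, k - 2
```

A statistical package would normally supply the P-value directly; the sketch stops at the t statistic so that it needs nothing beyond the standard library.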

Confidence in cumulative evidence

The confidence in the cumulative evidence on the efficacy of the d-Health intervention for step counts, MVPA, sedentary/sitting time, and weight was assessed across the following domains for each outcome: risk of bias215, inconsistency216, imprecision217, indirectness218, and publication bias219, following the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group approach220.

Risk of bias assesses the overall risk across a group of studies for each outcome215. The judgment is typically based on the risk of bias in individual studies (e.g., Cochrane RoB2203, ROBINS-tool204). Therefore, we referred to the results of risk of bias assessment to evaluate the risk of bias among individual studies for each outcome category. Inconsistency refers to variability in treatment effects216. We assessed the similarity of point estimates and CIs and the results of statistical tests for heterogeneity. Indirectness indicates a mismatch between the research question and the available evidence218. Assessment of indirectness is based on discrepancies in population, intervention, control, and outcome. Imprecision primarily considers the width of 95% CIs and sample size, with at least 400 participants required for continuous outcomes217. Publication bias was evaluated using a funnel plot and Egger’s test219. Finally, the confidence in the cumulative evidence was summarized in a figure, with a red, yellow, and green color-coding system to aid visual interpretation193.