Introduction

Physical activity is an important factor in the etiology of chronic diseases,1 whereas less is known about the association between activity-related energy expenditure (AEE) and chronic diseases.2 AEE can be defined as the component of total energy expenditure that is caused by any kind of body movement produced by skeletal muscles.3 Although AEE is primarily determined by physical activity, it also depends on other individual characteristics, such as sex, age, height or body composition.4, 5, 6, 7, 8 Measuring AEE under free-living conditions is challenging, because AEE is derived as total energy expenditure (TEE) minus resting metabolic rate (that is, the energy needed to maintain basal metabolic functions) and minus diet-induced thermogenesis.9 The gold standard for measuring TEE under free-living conditions is the doubly labeled water (DLW) method, and resting metabolic rate can be obtained by indirect calorimetry, which measures an individual’s oxygen consumption and carbon dioxide production under fasting and resting conditions.10
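The subtraction described above can be written as a short calculation. The following is a minimal sketch, not a method taken from the included studies: it assumes that diet-induced thermogenesis is approximated as a fixed 10% of TEE, a convention often applied in DLW studies, and the function name `activity_energy_expenditure` and the example values are hypothetical.

```python
def activity_energy_expenditure(tee_kcal, rmr_kcal, dit_fraction=0.10):
    """Return AEE (kcal/day) as TEE minus resting metabolic rate minus DIT.

    Assumption: diet-induced thermogenesis (DIT) is not measured but
    approximated as a fixed fraction of TEE (default 10%).
    """
    if tee_kcal <= 0 or rmr_kcal <= 0:
        raise ValueError("TEE and RMR must be positive")
    dit_kcal = dit_fraction * tee_kcal
    return tee_kcal - rmr_kcal - dit_kcal


# Hypothetical individual: TEE 2600 kcal/day (DLW), RMR 1600 kcal/day (indirect calorimetry)
aee = activity_energy_expenditure(2600, 1600)  # 2600 - 1600 - 260
print(round(aee, 1))
```

Because the DIT term is a fixed proportion here, any error in that assumption passes directly into AEE, which is one reason the measured components matter.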

Nevertheless, these methods are too time- and cost-intensive to be used in large-scale epidemiological studies. AEE has therefore traditionally been estimated from questionnaire-based physical activity information, which is linked to MET (metabolic equivalent) intensity levels to derive an individual’s energy expenditure.11
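A questionnaire-based estimate of this kind can be sketched as follows. The sketch assumes the common convention that 1 MET corresponds to roughly 1 kcal per kg body weight per hour, and it subtracts 1 MET so that only energy above rest counts toward AEE. Questionnaire instruments differ in how they handle the resting component, and the function name and MET values in the example are illustrative assumptions.

```python
def met_based_aee(body_mass_kg, activities):
    """Estimate daily AEE (kcal/day) from questionnaire-reported activities.

    activities: iterable of (met_value, hours_per_day) tuples.
    Assumes 1 MET ~ 1 kcal/kg/h and counts only energy above rest
    (MET - 1), so the resting component is excluded from AEE.
    """
    total = 0.0
    for met_value, hours in activities:
        total += max(met_value - 1.0, 0.0) * body_mass_kg * hours
    return total


# Hypothetical 70-kg adult: 1 h walking (3.5 MET) and 0.5 h cycling (6.8 MET) per day
print(met_based_aee(70, [(3.5, 1.0), (6.8, 0.5)]))
```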

Recently, accelerometry has been introduced into the field of physical activity measurement, and several devices are currently used in epidemiological studies such as the German National Cohort.12 Accelerometers are small devices that can be attached to the body for several days to objectively capture the acceleration of body movement along up to three axes; they can therefore provide information on the frequency, intensity and duration of physical activity.13 As with MET values, accelerometer output might be used to estimate an individual’s AEE. It is, however, unclear to what extent these devices explain the variance in AEE under free-living conditions.
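To illustrate how triaxial output can be condensed into a single summary value, the sketch below computes a per-epoch vector magnitude (the Euclidean norm of the three axis counts) and averages it per day. This is an illustration under stated assumptions, not any manufacturer’s algorithm: devices filter and scale raw signals differently, and the function names and the notion of a “valid day” are hypothetical.

```python
import math


def vector_magnitude(x, y, z):
    """Euclidean norm of one epoch's counts on the three axes."""
    return math.sqrt(x * x + y * y + z * z)


def counts_per_day(epochs, valid_days):
    """Average daily vector-magnitude counts over the recording period.

    epochs: (x, y, z) count tuples for every epoch in the recording.
    valid_days: number of days meeting a wear-time criterion (hypothetical).
    """
    if valid_days <= 0:
        raise ValueError("valid_days must be positive")
    return sum(vector_magnitude(x, y, z) for x, y, z in epochs) / valid_days


print(vector_magnitude(3, 4, 12))  # 13.0
```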

Therefore, the aim of this systematic review was to summarize studies that predict AEE from accelerometry-assessed physical activity data in adults under free-living conditions, and to examine to what extent these predictions explain the variance in AEE. The second aim was to examine to what extent other factors, such as study design, accelerometer device properties or individual characteristics (for example, age, sex, body composition), influence these predictions, and to what extent adding such factors improves prediction models.

Materials and methods

Search strategy and study selection

A comprehensive literature search was performed in the MEDLINE database (Medical Literature Analysis and Retrieval System Online) from inception until 31 December 2014 using the following keywords and operators: energy expenditure AND (prediction OR estimation OR validation OR regression OR model) AND (accelerometry OR accelerometer OR motion sensor OR activity monitor) AND (activity OR exercise). The following filters were set: species: humans; ages: adult 19+ years; languages: English, German; text availability: full text; publication date: to 31 December 2014. In addition, reference lists of included articles, and references of reviews and meta-analyses on this topic, were hand-searched for further eligible articles.

During the first selection step, based on title and abstract, articles were excluded if (1) AEE was not examined, (2) no accelerometer was used to measure physical activity or (3) no adult population aged at least 18 years was examined. All further selection steps were based on full-text screening (if available). Articles were excluded for the following reasons: (1) examining only TEE and not AEE, (2) using pedometers or other devices (for example, heart rate monitors) instead of accelerometers, (3) examining only children or adolescents, (4) not conducting the study under free-living conditions (which requires the use of DLW for measuring TEE) and (5) reporting no original data (review articles). One person conducted the literature search and the initial screening of titles and abstracts, and two people performed the full-text screening. Any disagreements were resolved by consensus.

Several of the studies that we identified did not aim to predict AEE from accelerometry-derived physical activity data, but instead compared AEE estimates calculated automatically by the accelerometers, using underlying algorithms that in most instances are not freely available, with DLW-derived AEE. As the focus of our review was to evaluate the prediction of AEE, we excluded these studies from our analysis. A summary of these studies can be found in Supplementary Table S1.

We considered DLW as the gold standard for assessing TEE under free-living conditions. Compared with indirect calorimetry or controlled food intake, DLW-derived TEE has been reported to show a mean difference ranging from −2% to 6%, with an s.d. of the difference ranging from 1% to 8%.14
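Agreement statistics of this kind are usually computed from paired measurements in the same individuals. A minimal sketch, assuming the percent difference is defined relative to the reference method as (DLW − reference) / reference × 100; the function name and the paired values are hypothetical.

```python
import statistics


def percent_difference_summary(dlw, reference):
    """Mean and sample s.d. of the percent difference between paired values."""
    if len(dlw) != len(reference) or len(dlw) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [100.0 * (d - r) / r for d, r in zip(dlw, reference)]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Hypothetical paired TEE values (kcal/day): DLW vs a reference method
mean_pct, sd_pct = percent_difference_summary([2020, 2550, 3090], [2000, 2500, 3000])
print(mean_pct, sd_pct)
```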

Data extraction

From each included article, two reviewers extracted the following information: first author’s name and year of publication; study population characteristics (type of population, sample size, sex, age range or mean, body mass index range or mean (or, alternatively, weight)); accelerometer characteristics (device name, accelerometer type, body position, recording period, wear time); energy expenditure measurement features (period of DLW measurement, measurement or calculation of resting/basal/sleeping metabolic rate, diet-induced thermogenesis, AEE calculation); measured accelerometer output metric; results of association and prediction (if reported): crude explained variance (crude R²), that is, the variance in AEE explained solely by accelerometer output; type of prediction model; predictors of final model(s); total R², that is, the variance in AEE explained by accelerometer output and additional predictors; partial R², that is, the variance in AEE explained by accelerometer output when other predictors are included in the model; and other factors not included in the model. Where only the correlation coefficient R was reported, it was squared to obtain R².
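The three explained-variance measures can be illustrated with ordinary least squares. This sketch, written in plain Python with hypothetical data, computes crude R² (accelerometer counts only), total R² (counts plus body weight) and a partial R² for counts. It assumes one common definition of partial R², (total R² − R² without counts) / (1 − R² without counts); the included studies may have used other definitions, such as the simple increment in R².

```python
def ols_r_squared(y, predictors):
    """R² of an OLS fit of y on the given predictor columns (intercept added).

    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting; adequate for a few predictors, not for large problems.
    """
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data for six participants
counts = [100, 150, 200, 250, 300, 350]    # accelerometer counts per day (scaled)
weight = [60, 80, 65, 90, 70, 85]          # body weight, kg
aee = [500, 700, 720, 950, 850, 1000]      # DLW-derived AEE, kcal/day

crude = ols_r_squared(aee, [counts])
total = ols_r_squared(aee, [counts, weight])
weight_only = ols_r_squared(aee, [weight])
partial_counts = (total - weight_only) / (1 - weight_only)
crude_from_r = 0.6 ** 2  # a reported R of 0.6 corresponds to an R² of 0.36
```

Adding a predictor can never lower R² in the same sample, so total R² is always at least as large as crude R².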

If multiple accelerometer devices were used in one article15, 16, 17, 18 or if various subpopulations19 were examined, each device or population group was regarded as a separate study in our analysis. Among studies that added predictors beyond accelerometer output, some reported several prediction models.19, 20, 21, 22, 23, 24 From these studies, we considered only the model that explained the largest variance (total R²) in the relation analyses. Where different models explained the same variance, we used the model with the fewest added predictors (the most parsimonious model). In sex-stratified analyses, sex was counted as an additional factor,25 as were interaction terms.19 If mean age was not reported, it was approximated as the mean of the minimum and maximum age of the population.17, 19, 26 Some studies reported several accelerometer output metrics from the same device; in these cases, the primary output metric we considered was counts per day16, 18, 23 or vector magnitude counts per day.18, 24 We did not consider time spent in intensity categories as an accelerometer output. Some studies used AEE relative to body weight (AEE per kg) instead of, or in addition to, absolute AEE.16, 20, 21, 24, 27, 28, 29, 30, 31, 32 If both absolute and relative AEE were reported, we used the absolute value for result description and relation analyses.
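The model-selection and age-imputation rules described above amount to a small decision procedure. A minimal sketch, assuming each reported model is stored as a dictionary with the hypothetical keys `total_r2` and `n_added_predictors` (sex in sex-stratified analyses and interaction terms would each count as one added predictor); the example models are illustrative only.

```python
def select_model(models):
    """Return the model with the largest total R².

    Ties are broken in favour of the fewest added predictors
    (the most parsimonious model).
    """
    return max(models, key=lambda m: (m["total_r2"], -m["n_added_predictors"]))


def approximate_mean_age(min_age, max_age):
    """Midpoint of the reported age range, used when mean age is missing."""
    return (min_age + max_age) / 2


# Hypothetical models reported by one study
models = [
    {"name": "counts + weight", "total_r2": 0.41, "n_added_predictors": 1},
    {"name": "counts + weight + height", "total_r2": 0.41, "n_added_predictors": 2},
    {"name": "counts + age", "total_r2": 0.38, "n_added_predictors": 1},
]
best = select_model(models)
print(best["name"], approximate_mean_age(20, 60))
```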

Statistical analysis

To assess whether study or accelerometer device characteristics influence the association between accelerometer-derived physical activity output and DLW-derived AEE, we examined the relation between study characteristics and the explained variance in AEE, both in univariable models (crude R²) and in multivariable models (total R²), graphically and with nonparametric tests. Scatter plots and Spearman’s rank correlation were used for continuous characteristics (that is, sample size, mean age), whereas boxplots and the Mann–Whitney U-test or Kruskal–Wallis test were used for categorical characteristics (that is, accelerometer body position, recording period, wear time, accelerometer output type, accelerometer output metric).
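Spearman’s rank correlation is Pearson’s correlation computed on ranks, with tied values receiving the average of their ranks. A minimal sketch in plain Python for illustration only (the analyses in this review were run in SAS); the function names and the five-study example are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical example: sample size vs crude R² for five studies
rho = spearman_rho([10, 25, 40, 80, 149], [0.62, 0.41, 0.30, 0.22, 0.25])
```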

For the subset of studies that added predictors to the model to improve the explained variance in AEE, we depicted crude and total R² with boxplots. We tested the improvement in R² with the Wilcoxon signed-rank test, and used the Kruskal–Wallis test to examine the relation of the number of additional predictors with both total R² and the improvement in R².
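The Kruskal–Wallis test compares rank sums across groups. The sketch below computes the H statistic without the correction for ties and, for exactly three groups, the asymptotic chi-square P value with 2 degrees of freedom, which simplifies to exp(−H/2). Both simplifications are assumptions of this illustration; with groups as small as those in this review, exact or tie-corrected procedures (as offered by statistical packages) are preferable. Function names are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks with ties assigned the average of their positions."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 * weighted / (n_total * (n_total + 1)) - 3 * (n_total + 1)


def chi2_sf_df2(h):
    """Upper-tail chi-square probability with 2 df (valid for 3 groups only)."""
    return math.exp(-h / 2)


# Hypothetical total R² values for studies with 1, 2 or 3 added predictors
h = kruskal_wallis_h([[0.30, 0.42, 0.55], [0.20, 0.37, 0.48], [0.45, 0.80]])
p = chi2_sf_df2(h)
```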

In sensitivity analyses, we compared crude R² between the group of studies that reported absolute AEE and the group that reported AEE relative to body weight (AEE per kg) using the Mann–Whitney U-test. Studies that reported both values were included in both groups. We also compared crude R² within the group of studies that reported both values (AEE and AEE per kg) using the Wilcoxon signed-rank test.
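The Mann–Whitney U statistic counts, over all pairs of studies from the two groups, how often a value in the first group exceeds one in the second, with ties counted as one half. A minimal sketch, assuming a normal approximation without tie or continuity correction for the two-sided P value; function names are hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def mann_whitney_p_normal(a, b):
    """Two-sided P value from the normal approximation to U."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```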

All analyses were performed using SAS Enterprise Guide, version 4.3 (SAS Institute Inc., Cary, NC, USA). Figures (boxplots, scatter plots) were produced with Microsoft Excel 2010 (Microsoft Corporation). All P-values are two-tailed, and P<0.05 was considered statistically significant.

Results

Search results and study selection

Of the 299 articles retrieved by the MEDLINE search, 64 remained after screening of titles and abstracts (Figure 1). After full-text reading (full text was not available for 2 articles), 43 articles were excluded, leaving 21 articles that met the inclusion criteria. A further 16 articles were identified by hand-searching the reference lists of included articles and of reviews or meta-analyses on this topic. Of these 37 articles, 18 were excluded because they did not aim to predict AEE but instead compared accelerometry-derived AEE with DLW-derived AEE (listed in Supplementary Table S1). Thus, our analysis includes 19 articles that reported on the prediction of DLW-derived AEE from accelerometry-derived physical activity data.
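The article flow above can be checked with simple arithmetic, using only the counts stated in the text (variable names are illustrative):

```python
# Counts of articles as reported in the text
retrieved = 299
after_title_abstract = 64
excluded_after_full_text = 43
from_hand_search = 16
comparison_only = 18  # compared device-derived AEE with DLW-derived AEE, no prediction

excluded_title_abstract = retrieved - after_title_abstract
eligible_from_database = after_title_abstract - excluded_after_full_text
candidates = eligible_from_database + from_hand_search
included = candidates - comparison_only
print(excluded_title_abstract, eligible_from_database, candidates, included)
```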

Figure 1

Flowchart of article selection. EE, energy expenditure.

Characteristics of included articles

The characteristics of the 19 included articles are provided in Table 1. Because multiple accelerometers were used in one population15, 16, 17, 18 or various subpopulations were examined,19 we extracted 28 separate studies on the associations between accelerometer-derived physical activity output and measured AEE under free-living conditions. All studies had a cross-sectional design.

Table 1 Characteristics and results of 19 included articles

Of these 28 studies, 10 included additional factors beyond accelerometry into the prediction model. Table 2 gives a summary of study characteristics of the 28 studies. Most of the studies were conducted in the general population with sample sizes ranging from 10 to 149 individuals. A total of 19 different accelerometer devices were used (7 uniaxial accelerometers, 11 triaxial and 1 biaxial) from a total of 15 different manufacturers. Of the 28 studies, 12 studies applied uniaxial accelerometers, 15 studies applied triaxial accelerometers and 1 study a biaxial accelerometer. Recording periods ranged from 5 to 15 days (Table 2). In most of the studies the accelerometer was worn at the trunk (that is, hip, lower back, waist or chest; n=20 studies), and the wear time was limited to waking hours (n=21 studies). The most frequently reported accelerometer output metric was uniaxial or triaxial activity counts per time interval.

Table 2 Summary of characteristics derived from the articles in Table 1 concerning study population, accelerometer and prediction model for all studies and subset of studies with additional prediction model

The measurement period of TEE using the DLW method ranged from 7 to 14 days, and most studies measured resting, basal or sleeping metabolic rate with indirect calorimetry techniques (ventilated hood, handheld mask, respiration chamber) instead of using estimation formulas (Table 1).

Explained variance (R2) and its relation to study characteristics

The studies used linear (Pearson’s correlation, linear regression) and nonlinear approaches (Spearman’s rank correlation, log-linear regression) to calculate the variance in DLW-derived AEE explained by accelerometer output. We included 24 studies in this analysis, as 4 studies did not report crude R². Crude R² ranged from 0.043 to 0.80 (Table 1), with a median of 0.26 (Figure 2). Crude R² did not differ significantly by accelerometer recording period (1 week vs >1 week), body position (trunk vs limbs), wear time (waking hours vs 24 h), accelerometer output type (uniaxial vs triaxial outputs) or accelerometer output metric (counts vs steps vs other) (all P-values of Mann–Whitney U-test and Kruskal–Wallis test >0.05; Figure 2). There was a significant inverse correlation between crude R² and sample size (r=−0.45, P=0.03; Figure 3), but no significant correlation between crude R² and mean age of participants (r=0.16, P=0.44; Figure 3).
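P values for Spearman’s correlation are often obtained from the large-sample approximation t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below applies it, integrating the t density numerically with Simpson’s rule; we have not confirmed that this is exactly the procedure used by SAS in this review, and the function names are hypothetical. With the rounded values reported above for sample size (r=−0.45, n=24), it gives a two-sided P of about 0.03.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def spearman_p_t_approx(r, n, steps=2000):
    """Two-sided P value for a correlation r from n pairs via the t approximation.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even) and returns (t, p).
    """
    if not -1 < r < 1 or n < 3:
        raise ValueError("need -1 < r < 1 and n >= 3")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    b = abs(t)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # probability that 0 < T < |t|
    return t, max(0.0, 1.0 - 2.0 * area)


t_size, p_size = spearman_p_t_approx(-0.45, 24)
```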

Figure 2

Crude R² of all 24 studies, and stratified by accelerometer characteristics (recording period, body position, wear time, accelerometer output type, accelerometer output metric); circle = outlier. ᵃIncludes the categories ‘acceleration’ and ‘other’ provided in Table 2.

Figure 3

Scatter plots of crude R² against sample size and against mean age, based on 24 studies.

Explained variance (R²) in studies with additional predictors beyond accelerometer output

Ten studies added further predictors to the association of accelerometer output with DLW-derived AEE. The characteristics of these 10 studies were similar to those of all 28 studies (Table 2). Information about the prediction models of the 10 studies and their performance is listed in Table 3 and summarized in Table 4. The studies by Pomeroy et al.17 were not taken into account in summary Tables 2 and 4 or in the relation analysis because they reported neither crude R² nor total R².

Table 3 Additional study characteristics and effects of added predictors on total R² in subset of studies with prediction model (n=10)
Table 4 Summary of characteristics derived from articles in Table 3 concerning additional predictors for studies with additional prediction model

The studies included between one and three predictors in addition to the accelerometer output metric. Weight and fat-free mass were the most frequently added predictors (Table 4). The explained variance (total R²) of the multivariable models that included other predictors in addition to accelerometer output ranged from 0.125 to 0.86 (median 0.41), and partial R² for accelerometer output ranged from 0.04 to 0.41 (Table 3). Interestingly, in four studies presenting several models,21, 22, 23, 24 the models that included fat-free mass explained a higher proportion of variance in AEE than the models that included weight instead.

When stratified by the number of additional predictors, total R² did not differ between studies with 1 (n=4, median total R²=0.42), 2 (n=4, median total R²=0.37) or 3 additional predictors (n=2, median total R²=0.63; P=0.56). Total R² was inversely correlated with sample size, similar to crude R², but this correlation was not statistically significant (r=−0.53, P=0.12).

Eight studies provided information on both crude R² (accelerometer output without additional predictors) and total R² (accelerometer output with additional predictors).19, 20, 21, 23, 24, 25, 33 Among the 3 studies that added 1 predictor to accelerometer output, the explained variance increased from 0.21 to 0.31 (P=0.25). Among the 3 studies that added 2 predictors, it increased from 0.08 to 0.33 (P=0.25), and among the 2 studies that added 3 predictors, it increased from 0.37 to 0.63 (P=0.50). The improvement in R² did not differ between studies with 1 (n=3, median R² increase=0.10), 2 (n=3, median R² increase=0.26) or 3 additional predictors (n=2, median R² increase=0.26; P=0.16). Considering these 8 studies together, the explained variance increased significantly from 0.16 without additional predictors to 0.37 with the largest number of predictors available in each study (P=0.008, Figure 4).
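With only two or three paired studies per stratum, an exact Wilcoxon signed-rank test has a limited set of attainable P values. The sketch below enumerates the exact null distribution, assuming no zero differences and no ties among absolute differences; the function name is hypothetical. When all n differences are positive, the two-sided P value is 2/2ⁿ: 0.50 for n=2, 0.25 for n=3 and about 0.008 for n=8.

```python
from itertools import product


def wilcoxon_exact_p(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired data.

    Assumes no ties among the absolute differences; zero differences
    are dropped. Returns (W+, p), where W+ is the sum of ranks of
    positive differences.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    # rank absolute differences from smallest (rank 1) to largest (rank n)
    ranks = [0] * n
    for rank, idx in enumerate(sorted(range(n), key=lambda i: abs(diffs[i])), start=1):
        ranks[idx] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # under H0 each rank carries a + or - sign with probability 1/2
    null = [sum(r for r, s in zip(range(1, n + 1), signs) if s)
            for signs in product((0, 1), repeat=n)]
    lower = sum(w <= w_plus for w in null) / len(null)
    upper = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(lower, upper))
```

The enumeration grows as 2ⁿ, which is unproblematic for subsets of a handful of studies.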

Figure 4

Crude and total R² for the subset of studies adding predictors (n=8); circle = outlier.

Sensitivity analysis concerning AEE character

Eleven studies reported only absolute AEE as the dependent (outcome) variable,15, 18, 23, 25, 33 6 studies reported only AEE relative to weight (AEE per kg)20, 24, 27, 28, 29, 31 and 7 studies reported both AEE and AEE per kg.16, 19, 21, 30, 32 Crude R² did not differ between studies using AEE and studies using AEE per kg, either in the between-group comparison (AEE studies: n=18, crude R² range 0.043–0.49, median 0.23; AEE per kg studies: n=13, crude R² range 0.05–0.80, median 0.35; P=0.09) or in the within-group comparison (n=7; AEE: crude R² range 0.09–0.46, median 0.29; AEE per kg: crude R² range 0.05–0.62, median 0.35; P=0.08). Furthermore, as in our main analysis, crude R² was not related to recording period, body position, wear time or accelerometer output type in either group, nor was it correlated with mean age of participants. Crude R² was inversely correlated with sample size in both groups, although neither correlation was statistically significant at the 5% level (AEE studies: r=−0.37, P=0.13; AEE per kg studies: r=−0.51, P=0.07).

Discussion

In this systematic review, we identified 19 articles, comprising 28 underlying studies, that estimated AEE from accelerometry-assessed physical activity data under free-living conditions in a general adult population. The variance in DLW-derived AEE explained by accelerometer output alone varied widely, from 4 to 80%. Sample size was the only parameter related to the explained variance across studies, with lower explained variance in studies with larger samples. Parameters such as accelerometer output type or output metric, recording period, body position of the accelerometer, wear time or age did not systematically explain this heterogeneity. Inclusion of predictors other than accelerometry significantly improved the explained variance in AEE, although the improvement did not depend on the number of predictors included.

We speculate that the heterogeneity in explained variance across studies is partly due to differences in study design. This is consistent with our observation that smaller studies yielded higher explained variances than larger studies. Smaller studies often include volunteers from a special group (for example, conscripts26 or elderly people15, 16), in whom the range of personal characteristics that may affect energy expenditure (such as age, weight, height, body composition or activity patterns) is likely to be narrower than in larger studies. Accordingly, the variance in AEE due to these personal characteristics will be smaller and, as a consequence, the proportion of variance explained by accelerometry-assessed physical activity will be greater in smaller studies than in larger studies.
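The restriction-of-range argument can be illustrated with a toy simulation. All coefficients below are arbitrary assumptions, not estimates from the included studies: AEE is generated from activity counts plus a body-weight effect plus noise, and the crude R² of counts alone is compared between a sample with a narrow weight range and one with a wide weight range. Function names are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. crude R² of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulated_crude_r2(weight_sd, n=2000, seed=1):
    """Crude R² of counts for simulated AEE; all coefficients are arbitrary."""
    rng = random.Random(seed)
    counts, aee = [], []
    for _ in range(n):
        c = rng.gauss(300.0, 60.0)      # activity counts, arbitrary units
        w = rng.gauss(75.0, weight_sd)  # body weight, kg
        noise = rng.gauss(0.0, 100.0)   # unexplained variation, kcal/day
        counts.append(c)
        aee.append(2.0 * c + 10.0 * w + noise)
    return r_squared(counts, aee)


homogeneous = simulated_crude_r2(weight_sd=3.0)     # narrow weight range
heterogeneous = simulated_crude_r2(weight_sd=15.0)  # wide weight range
```

In expectation, the crude R² in this toy model is 14 400/25 300 ≈ 0.57 for the narrow weight range and 14 400/46 900 ≈ 0.31 for the wide range, although the activity-AEE relation is identical in both.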

Furthermore, accelerometers may detect different types of activities with different accuracy;13 for example, accelerometers worn at the hip may be more likely to detect activities that involve movement of the trunk but less likely to detect activities that involve movement of the arms. We speculate that the types of activities in larger studies might be more heterogeneous than in smaller studies, which could lower the proportion of variance in AEE explained by accelerometry in larger studies. Unfortunately, the data available from the published studies included in our systematic review did not allow us to investigate the effect of activity type on the explained variance in more detail.

In addition, differences in data processing are likely to contribute further to heterogeneity across studies. A large variety of accelerometer devices is available, and each manufacturer has its own approach to filtering, amplifying and converting the acceleration signals into an output value, commonly activity counts, which are not comparable between different accelerometers.34 Unfortunately, this information on data processing was not readily available from the publications included in our analysis, so we were not able to investigate whether such differences systematically account for the differences in explained variance across studies. Nor could we analyze data by manufacturer or device model, which might have served as a proxy for different data processing techniques,35, 36 because the large number of manufacturers and device models did not allow an aggregated analysis.

Furthermore, the daily wear time may influence the variance of AEE explained by accelerometry-assessed physical activity. Overall, we found no significant differences between studies that applied accelerometry 24 h per day and those that recorded physical activity during waking hours only. However, the definition of waking hours differed across studies, which may further contribute to heterogeneity. For example, Herrmann et al.37 showed that estimating daily physical activity by extrapolating step counts recorded for <12 h per day may underestimate ‘true’ average daily physical activity. The placement of the accelerometer on the body could determine whether, and how validly, the different activities are detected, and the recording period should be long enough to cover an individual’s typical activity pattern without compromising compliance.38 Nevertheless, we found no significant differences in the explained variance of AEE between studies with different body placements or recording periods.

We found no significant differences in the explained variance of AEE between studies that used triaxial and those that used uniaxial accelerometer outputs. Theoretically, triaxial accelerometers may record physical activity with higher validity than uniaxial devices.6, 38, 39 Furthermore, some studies suggest that activities in sedentary or standing postures may be detected better with triaxial than with uniaxial accelerometers.9, 38 In our review, two articles analyzed both the uniaxial and the triaxial output of the same monitor.18, 24 In both, the explained variance of AEE was slightly higher for the triaxial than for the uniaxial output (R²=0.81 vs 0.77 (ref. 24); R²=0.29 vs 0.27 (ref. 18)). As these differences are rather small, we speculate that both accelerometer types provide comparable information about AEE in free-living subjects.38

After inclusion of predictors other than accelerometer output in the prediction models, the explained variance in AEE generally increased but still spanned a broad range, from 12.5 to 86%. We found no clear association between the number of additional predictors and the explained variance of the model. Similarly, the improvement in explained variance after including additional predictors did not differ by the number of additional predictors. The partial explained variance for accelerometer output ranged from 4 to 41%. We therefore speculate that which predictors are included matters more than how many. However, the number of studies in this analysis was low, and the results should therefore be interpreted cautiously.

Body weight and fat-free mass are physiologically plausible predictors, as they are associated with energy expenditure and physical activity.7 In our review, these were the most frequently used predictors. Interestingly, the explained variance increased slightly when fat-free mass was included in the model instead of weight.21, 22, 23, 24 This could be explained by the different impacts of fat-free mass and of fat mass or weight on AEE and physical activity.24, 40 It again illustrates the difference between AEE and physical activity: two individuals with the same physical activity and the same weight may differ in AEE because of differences in body composition, as fat-free mass, the metabolically active component of weight, has a greater impact on AEE than fat mass.7 Other factors also associated with energy expenditure, such as height, age and sex, were less often included in the prediction models.

As already mentioned, comparisons of accelerometer outputs between studies are limited by the arbitrary character of ‘counts’.13 Comparisons of the improvement achieved by adding predictors should therefore be interpreted cautiously.

A strength of this review is that we focused only on studies conducted under free-living conditions using DLW that examined the association of accelerometer-derived physical activity output with AEE. Other reviews also included studies that compared accelerometry-derived AEE with DLW-derived AEE, or studies under laboratory conditions with predefined activity protocols,4, 5, 6, 41 which could mislead the interpretation of the association and prediction of AEE by accelerometry in free-living populations. For comparison, the comparison studies identified by our search are listed in Supplementary Table S1.

Our review also has some limitations. Overall, the number of studies we identified was relatively small, and there was substantial heterogeneity in study characteristics. For example, most studies included in our review provided information on absolute AEE, but we also included the few studies that reported only AEE relative to body weight. In sensitivity analyses, the explained variance of AEE was slightly but not significantly higher for studies on AEE relative to body weight than for studies on absolute AEE. This is in line with our speculation that body weight is an important factor to consider when predicting AEE, either by including it as a predictor or by standardizing AEE on body weight.

Furthermore, two articles did not report accelerometer type, body position or wear time,16, 31 and we therefore made assumptions based on other references using the same device to compensate for the missing information. In addition, in three studies the periods of DLW measurement and accelerometry recording did not strictly overlap.22, 30, 33 Overall, these often subtle differences across studies may have increased the heterogeneity that we observed in the variance of AEE explained by accelerometry. Finally, pooling heterogeneous studies into groups may attenuate between-group differences toward the null.

Another methodological limitation is the restrictive choice of some search terms, as a result of which many studies were identified only by hand-searching reference lists.

To improve the prediction of free-living AEE from accelerometry-derived physical activity, there are, in theory, at least two major approaches: improving the assessment of physical activity with technically more advanced accelerometers,34 and considering factors other than physical activity that may affect AEE. Unfortunately, the studies published so far do not allow evidence-based recommendations on how to improve prediction. Future studies are therefore urgently needed to investigate in detail how AEE can best be predicted from accelerometry and which factors should be considered.

In conclusion, we found large heterogeneity across studies in the explained variance of models that predict AEE from accelerometry data assessed in individuals under free-living conditions. The explained variance was smaller in studies with larger sample size. Adding factors other than accelerometry significantly improved the prediction, but this improvement did not depend on the number of factors added. These data indicate that AEE estimated from accelerometry needs to be interpreted cautiously. Further development of prediction models in population-based studies under free-living conditions is needed, with a focus on improved and comparable measurement of physical activity by accelerometry.