Abstract
Measurement data serve as an objective basis for scientific findings. Therefore, their reliability in repeated measurements is a crucial prerequisite. The repeatability of measurements is quantified using various reliability parameters. Often, data are subjected to normalization procedures to reduce inter-individual variability or to improve interpretability. In our study, we aimed to investigate the extent to which normalization affects the determined reliability parameters. This was examined using the example of maximum force values for trunk extension and flexion. For this purpose, 85 healthy individuals (42 women) performed maximum isometric force tests of the trunk muscles on two occasions two weeks apart. The calculated reliability and agreement parameters included the intraclass correlation coefficient (ICC), the standard error of measurement (SEm), the standard error of the mean (SEM), and the coefficient of variation of the method error (CVME). The ICC values consistently indicated good reliability for the original data (Extension: 0.889, Flexion: 0.882). Within the subpopulations of women and men, lower ICC levels in the moderate to good range were observed (Women Extension: 0.671, Flexion: 0.826; Men Extension: 0.819, Flexion: 0.680). For the anthropometrically normalized force values, lower ICC levels were found for the entire group (Extension: 0.803, Flexion: 0.768), although the deviations in the subpopulations differed (Women Extension: 0.747, Flexion: 0.805; Men Extension: 0.852, Flexion: 0.642). Thus, normalization of measured values leads to varying deviations of calculated ICC levels compared to the original data, which is why the use of normalized data is not recommended for reliability calculations.
Introduction
In order to generate or expand knowledge, findings must be collected. Depending on the field of research, this is achieved with different instruments or via corresponding parameters. In psychological research, for example, subjective information on emotional state1, stress2, state of mind3 or quality of life4 is among the main target variables. In the life sciences, including biology, physiology, medicine, biomedicine, nutritional sciences and veterinary medicine (this list of disciplines is not exhaustive), measurements of the corresponding so-called objective parameters form the basis of all research. These are then crucial for the identification of physiological parameters, which are used diagnostically, particularly with regard to their deviation from normal ranges. Furthermore, corresponding parameters are recorded and assessed over time, for example over periods of life or in the short term as part of therapeutic interventions5.
However, any diagnostic interpretation of data is tied to compliance with the basic quality criteria of measurements: Validity, reliability, and objectivity6. Related to this are the sensitivity and specificity of the measurement method6, which should ensure that changes are reliably recorded (sensitivity) and can be assigned to the cause as clearly as possible (specificity).
This rather general scientific-theoretical introduction has a direct impact on the possibility of interpreting the obtained data. Particularly in the case of physiological parameters, their suitability for diagnostic statements is determined, among other things, by whether these data are stable over uninfluenced periods of time or can be measured with appropriate repeatability. Any interpretation of changes depends on this. In the context of human studies, however, various external influences must be taken into account7. This is all the more true for studies that i) are subject to motivational influences8 and ii) show large inter-individual variability9. Force values determined during maximum voluntary contraction (MVC) can serve as an example9,10. These MVC values are used to diagnose individuals with respect to their physical performance, but also to monitor the success of training programs11. In sports science, MVC tests are still the gold standard for performance testing12,13. Corresponding data are collected and used for training management. However, whether such values are transferable to naive individuals is still the subject of scientific research and debate. In addition to motivational influences, the very large inter-individual variability of the data also plays an important role in their interpretation14. Ultimately, data that vary widely between individuals are difficult to interpret. This is where standardization procedures come into play: the aforementioned maximum force, for example, can be related to anthropometric parameters such as body weight15,16. This offers a considerable advantage for the evaluation: on the one hand, anthropometric standardization makes force data individually assessable; on the other hand, standardization should always lead to a reduction in inter-individual variability.
Although numerous studies have investigated MVC, its clinical relevance is limited17,18,19,20. In order to move MVC measurements beyond their traditional role in assessing training efficiency, we have chosen to focus on maximal force of the trunk muscles. The importance of adequate muscular force for maintaining spinal integrity is well established21,22,23,24. While the back muscles are naturally central to diagnostic evaluations in this context, the abdominal muscles also play a significant role. Specifically, the abdominal muscles biomechanically contribute to spinal stability by generating adequate intra-abdominal pressure25. This has been associated with back pain in cases of dysfunction26,27.
Consequently, functional testing of the abdominal muscles is an essential component of therapy monitoring within targeted training programs28,29. The roles of both psychological30,31,32 and neuromuscular coordination factors33,34 in the development of acute and chronic back pain are not to be discounted; however, this study focuses specifically on performance-related aspects of trunk musculature.
To our knowledge, although maximum force performance is frequently examined, established normative values remain scarce35. This is particularly relevant to the functional interpretation of the considerable interindividual differences observed in pure force capacity. As previously mentioned, standardization appears to be a reasonable approach to enhance the individual assessment and interpretation of collected data.
In general and for practical purposes, the question therefore arises as to which values should be used for reliability measures. As explained, the usefulness of standardization procedures is undisputed, as they allow interesting and practically relevant additional parameters, such as the physiological force reserve at maximum force, to be used more reasonably for diagnostic purposes than data without an anthropometric context.
Therefore, we investigated maximal force testing of the trunk musculature in a practically relevant measurement scenario, conducting repeated assessments in individuals who had no prior experience with such tests. The primary aim was to directly compare reliability and agreement parameters between the non-normalized and anthropometrically normalized MVC data, as such a comparison has not yet been published. The secondary aim was to determine the respective value ranges to ensure that the obtained data correspond to data found in the literature.
Methods
Population
Participants were recruited through regional press announcements and electronic media. For this study, a total of 85 healthy individuals (42 women) aged 24 to 52 years were examined. The anthropometric data of the population can be found in Table 1. The data presented here are part of a larger study, partial results of which have already been published36.
Exclusion criteria for the study included age < 20 or > 55 years (to avoid influences of ongoing adolescence and also to avoid involution effects), active sports participation > 3 h per week (questioned during recruitment and again immediately prior to investigation), current pain in the spine or trunk region, and a history of spinal surgery. Additionally, all participants underwent a brief clinical-orthopedic screening examination (conducted by C.An.). Written informed consent for voluntary participation was obtained from all participants. The study was reviewed and approved by the Ethics Committee of Friedrich Schiller University Jena (2021–2373-BO, 2021-2373_1-BO). Thus, the study complies with ethical standards for research involving human participants and adheres to the current version of the Declaration of Helsinki.
Investigation/data
Device
The subjects were positioned in a computerized trunk muscle testing and training device (CTT Centaur, BfmC Leipzig, Germany). The reliability of data recorded with the device has been demonstrated37. In the device, they stood in an upright body position, secured from the hips downward while the upper body remained movable. A bar equipped with force sensors in the x- and y-directions was positioned over the participants’ shoulders and captured the force data during the MVC tests (see below). All tests were performed in this device. The device can be tilted from 0° to 90°, and any rotational angle between −180° and +180° can be set. As the device tilts the subject as a whole, the subject always remained in an upright body position relative to the device, regardless of its orientation in space.
Determination of upper body weight
At the beginning of the examination, the upper body weight (UBW) was determined by tilting the device 90° forward (horizontal position), allowing the participants to relax into the shoulder bar (Fig. 1). Visual and tactile assessment of the back muscles ensured their full relaxation. If residual contractions were detected, the subjects received appropriate feedback to help them relax completely. The measurement was carried out three times, and the highest reliable value was recorded to later determine the ratio between the maximal force and UBW (anthropometric normalization of force data). Immediately thereafter, the participants performed a series of defined isometric submaximal extension and flexion tests in the device. These data were analyzed for further research questions and simultaneously served as a warm-up before the maximal force tests.
Maximum voluntary contraction test
The isometric maximal force tests were conducted three times in the flexion direction and three times in the extension direction, with a measurement duration of five seconds per trial and a five-second rest period between repetitions. Participants were instructed to reach their maximal force within one second and maintain it for approximately three seconds. The first attempt was performed as a familiarization trial with approximately 50% of the self-estimated maximal force35, while the subsequent two trials were executed with maximal effort. The better of these two trials was used for further analysis. During the maximal force trials, verbal encouragement was consistently provided38. Testing always began with extension, followed by flexion. During the tests, subjects kept their arms crossed in front of their chests.
Subsequently, the force data were extracted from the analysis software of the CTT Centaur and included in the analysis. The entire examination (T1) was repeated in an identical manner after 14 days (T2).
Analysis parameters
For the reliability analysis, both the measured force values and force values normalized to the upper body weight were used. The following reliability and agreement parameters were applied: the Intraclass Correlation Coefficient (ICC), the Standard Error of Measurement (SEm), the Standard Error of the Mean (SEM) and the Coefficient of Variation of the Method Errors (CVME). They are briefly explained below based on36.
Intraclass Correlation Coefficient (ICC)
We used the ICC as ICC(2,1)39 (alternatively ICC(A,1)40) for comparing data from two measurement occasions. The settings for the calculation in SPSS (IBM, Chicago, USA) were as follows: single rater, absolute agreement, two-way mixed effects40,41. The ICC value can range from 0 to 1, with values close to 1 indicating excellent reliability and values below 0.5 indicating poor reliability42.
Standard error of measurement (SEm)
The SEm represents the standard deviation (SD) of the mean of an infinitely repeated measurement (“true value”). However, infinitely repeated measurement is practically unfeasible. Therefore, each individual result is considered the best estimate of the true value, with some sampling error. This sampling error can be assumed to be normally distributed with the corresponding SD. For two measurement occasions (m = 2), the SD of all observation points (m*n with n = 85 in the actual case) is multiplied by the square root of (1 − ICC)43,44. Since the SEm takes the ICC into account for its calculation, it depends directly on the level of the ICC and becomes smaller as the ICC increases. No normative values are available for the SEm.
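As a minimal sketch of the calculation described above, the SEm could be computed as follows. The function name, the choice of the sample SD (n − 1 denominator) and the example force values are illustrative assumptions and are not taken from the study.

```python
import math
import statistics


def standard_error_of_measurement(t1, t2, icc):
    """SEm = SD of all m*n observation points * sqrt(1 - ICC).

    Assumption: the sample SD (n - 1 denominator) is used; the text
    does not state which SD variant was applied.
    """
    pooled = list(t1) + list(t2)  # all observation points (m * n)
    sd_all = statistics.stdev(pooled)
    return sd_all * math.sqrt(1.0 - icc)


# Hypothetical force values (N) for three subjects at T1 and T2.
t1 = [1000.0, 1200.0, 1400.0]
t2 = [1050.0, 1180.0, 1450.0]
print(standard_error_of_measurement(t1, t2, icc=0.9))
```

With a fixed SD, the result shrinks towards zero as the ICC approaches 1, which mirrors the dependence on the ICC noted in the text.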
Standard error of the mean (SEM)
The SEM provides information about the variance in repeated measurements. These means themselves have a mean, for which an SD can be calculated. This SD of such a distribution of means is called the SEM and indicates the distribution of random fluctuations in the estimation of the mean of several samples. Its boundaries correspond to the well-known limits of the SD, i.e., 68% within ± 1SEM and 95% within ± 2SEM. This allows for the definition of the uncertainty range of a measurement. Values outside this range, with the corresponding probability, are unlikely to be random fluctuations. The SEM is calculated as the quotient of the SD of the difference between the two measurement occasions and √n45 (n = 85) and is therefore a parameter of agreement.
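The SEM calculation described above can be sketched in a few lines. This is an illustration under stated assumptions: the function name and the example values are hypothetical, and the sample SD (n − 1 denominator) of the paired differences is assumed.

```python
import math
import statistics


def standard_error_of_the_mean(t1, t2):
    """SEM = SD of the paired differences (T2 - T1) divided by sqrt(n)."""
    diffs = [b - a for a, b in zip(t1, t2)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


# Hypothetical values for four subjects at two measurement occasions.
t1 = [10.0, 20.0, 30.0, 40.0]
t2 = [12.0, 21.0, 33.0, 40.0]
sem = standard_error_of_the_mean(t1, t2)
# Approximate 95% uncertainty range of the mean difference (+/- 2 SEM).
print(sem, (-2 * sem, 2 * sem))
```

Because of the division by √n, the same spread of differences yields a smaller SEM in a larger sample.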
Coefficient of variation of the method errors (CVME)
The CVME is the method error (ME) normalized to the mean of the two measurements. The ME is conceptually similar to the SEM but uses √2 in the denominator instead of √n for calculation. Its interpretation differs from that of the SEM and primarily indicates the systematic error of the measurement methodology. The values are presented as percentages due to the normalization, which facilitates their interpretation44,46. A review of the reliability of ultrasound-based determination of quadriceps femoris muscle thickness revealed a relative error of 6.5%47. Given that MVC measurements are subject to both external and particularly internal influences, we anticipated that the CVME would be approximately twice as high, around 13%. Values above 15% would therefore be considered unacceptable, i.e. of low agreement.
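The following sketch illustrates the CVME as described above. It assumes that the normalizing mean is the average of the two occasion means and that the sample SD of the paired differences is used; the function name and data are hypothetical.

```python
import math
import statistics


def cvme_percent(t1, t2):
    """CVME (%) = (SD of paired differences / sqrt(2)) / mean of both occasions * 100."""
    diffs = [b - a for a, b in zip(t1, t2)]
    method_error = statistics.stdev(diffs) / math.sqrt(2)
    # Assumption: the reference mean is the average of the T1 and T2 means.
    grand_mean = (statistics.mean(t1) + statistics.mean(t2)) / 2
    return 100.0 * method_error / grand_mean


# Hypothetical values for four subjects at two measurement occasions.
t1 = [10.0, 20.0, 30.0, 40.0]
t2 = [12.0, 21.0, 33.0, 40.0]
print(cvme_percent(t1, t2))
# Rescaling all values by a common factor leaves the CVME unchanged.
print(cvme_percent([2 * v for v in t1], [2 * v for v in t2]))
```

Since sample size does not enter the formula and the result is relative to the mean, the CVME is insensitive to the overall value level at comparable repeatability.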
Minimal detectable difference
We also calculated the minimal detectable difference (MDD) to identify the threshold of clinically relevant changes in the parameter expression5,48,49. This value is provided primarily for the sake of completeness, as it is mainly used in the clinical context to decide whether observed changes in parameters over the course of a disease are to be considered relevant. It is calculated as the product SEm × z-score × √2, and is therefore directly dependent on the level of the ICC49.
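A minimal sketch of the MDD calculation is given below. The z-score of 1.96 (95% confidence level) is an assumption, as the text does not state which z-score was used; function names and input values are hypothetical.

```python
import math


def minimal_detectable_difference(sem_measurement, z=1.96):
    """MDD = SEm * z * sqrt(2); z = 1.96 is an assumed 95% level."""
    return sem_measurement * z * math.sqrt(2)


def mdd_from_sd_and_icc(sd_all, icc, z=1.96):
    """Chain SEm = SD * sqrt(1 - ICC) into the MDD to show the ICC dependence."""
    return minimal_detectable_difference(sd_all * math.sqrt(1.0 - icc), z)


# Hypothetical SD of 10 units: a higher ICC yields a smaller MDD.
print(mdd_from_sd_and_icc(10.0, icc=0.84))
print(mdd_from_sd_and_icc(10.0, icc=0.64))
```

The second function makes explicit why a lower ICC raises the threshold that a change must exceed before it is regarded as more than measurement noise.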
Statistics
First, a test for normality was conducted (Shapiro–Wilk). Since normal distribution was confirmed, the values from both days were compared using the t-test for dependent samples. A similar comparison, but for independent samples, was made between the sexes. The respective effect sizes for dependent samples were calculated for all tests, i.e. by dividing the mean difference by its standard deviation50.
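The effect size for dependent samples and the corresponding paired t statistic can be sketched as follows. Only the standard library is used, so the Shapiro–Wilk test and p-values are deliberately omitted; function names and data are hypothetical.

```python
import math
import statistics


def paired_effect_size(t1, t2):
    """Effect size for dependent samples: mean difference / SD of the differences."""
    diffs = [b - a for a, b in zip(t1, t2)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def paired_t_statistic(t1, t2):
    """t statistic of the dependent-samples t-test (n - 1 degrees of freedom)."""
    diffs = [b - a for a, b in zip(t1, t2)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Hypothetical values for four subjects at T1 and T2.
t1 = [10.0, 20.0, 30.0, 40.0]
t2 = [12.0, 21.0, 33.0, 40.0]
print(paired_effect_size(t1, t2), paired_t_statistic(t1, t2))
```

The two quantities differ only by the factor √n, which is why the effect size, unlike the t statistic, does not grow with sample size.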
Results
For both sexes, significantly higher mean values were achieved at T2, except for maximal extension in the female subgroup (Fig. 2). The respective effect sizes can be found in Table 2.
The measured force values differed highly significantly between the sexes, regardless of the time point. When the normalized data were analyzed, both significance levels and effect sizes decreased (Table 3).
Reliability statistics across the entire group showed ICC values > 0.767. When examining the ICC levels separately by sex, generally lower values were found, with values > 0.641 observed (Table 4).
Discussion
In the present study, MVC force values of the trunk for extension and flexion were analyzed for their reliability in both women and men, following a two-week measurement repetition. Since such values are often characterized by large interindividual variability and therefore are difficult to interpret without anthropometric reference, they are normalized accordingly. This allows for the effective representation of key metrics, such as physiological reserve, and enables statements about individual performance capacity. It is essential, of course, to ensure the reliability of such normalized data as well.
MVC data
As expected, women exhibited significantly lower MVC values compared to men (see Table 4). This can be plausibly explained by the markedly different muscle mass between sexes51,52. This difference pertains not only to overall muscle mass but also suggests sex-specific variations in muscle fiber architecture. Although the proportions of type I and type II fibers are similar between women and men53, notable differences exist in fiber cross-sectional areas. Specifically, men display significantly larger functional cross-sectional areas of their type II fibers, a characteristic not observed in women54.
However, when normalizing the force data to UBW, the pronounced differences in absolute force output were substantially reduced (see Table 3). This convergence in normalized force values can be attributed to the significantly higher upper body weight in men. As such, normalized force values allow for a more accurate interpretation than absolute values55. Particularly, the derived physiological force reserve serves as an important diagnostic parameter for assessing physical performance capacity. The present data are suitable for use as general normative values for the investigated population and the applied test conditions.
Despite the improved interpretability of normalized maximal force data, the analysis clearly demonstrates that such data are not suitable for reliability analyses. Among other reasons, this is due to the intentionally reduced interindividual variability resulting from normalization.
It is also noteworthy that the recorded maximal force values improved in extension for men and in flexion for both sexes. Specifically, men showed increases of 4.9% in extension and 5.9% in flexion, while women improved their flexion force by 6.4%. These results demonstrate a maximum force gain that can be interpreted as a learning effect, despite ICC levels > 0.64 within the subgroups56. Interestingly, this effect was more pronounced for flexion than for extension. This may be due to the relatively unfamiliar nature of flexion exercises, whereas extension loads are more likely to be familiar. It can therefore be concluded that unfamiliar tasks are accompanied by learning effects upon repeated execution, which in turn improve performance57. However, the duration of such learning effects cannot be determined from the present data, and further targeted studies would be necessary to address this question. On the other hand, the calculated MDD levels argue against interpreting these changes as relevant, as the observed differences are all well below the determined MDD values.
Regardless of the reported values and observed sex differences, previous studies have demonstrated high reliability for non-standardized measurements of maximal trunk muscle force58. Furthermore, the values obtained in the present study are in good agreement with previously published reference data35,59.
General statements regarding the ICC
The calculation algorithm for the primary reliability parameter, the ICC, has specific considerations that have to be taken into account. First, the correct model for its calculation should be selected. In the present case (Eq. (1)), this would be the ICC(2,1)39 or ICC(A,1)40.
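$$\mathrm{ICC}(2,1) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k - 1)\,\mathrm{EMS} + \dfrac{k\,(\mathrm{JMS} - \mathrm{EMS})}{n}} \qquad (1)$$
Equation (1): calculation of the ICC(2,1) according to39. BMS: between targets mean square (variance between subjects), EMS: error mean square (residual variance), JMS: between judges mean square (variance between observations), k: number of observations, n: number of subjects.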
Without the appropriate mathematical background, the components of the equation may cause confusion rather than being helpful, which is why their influence on the ICC is briefly explained here. The so-called "between-subject variability" (BMS), or the variability of values between individuals, plays a crucial role in determining the final ICC value. When this variability increases while all other parts of the equation remain constant, the ICC will also increase, as BMS is located in the numerator of the equation. At the same time, the repeatability (JMS), essentially the target measure of such calculations, affects the ICC value. As expected, high variability between measurements leads to a decrease in the ICC value.
Additionally, there is another influencing factor, the so-called residual variability (EMS), which results from the spread of values per individual and the differences between observation time points. Furthermore, sample size plays an important role as well: the more individuals are considered, the higher the ICC values tend to be. Therefore, it is difficult to predict the impact that normalization of the force values will have on the ICC and, consequently, on the other reliability parameters.
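To make the interplay of these components more tangible, the following sketch computes the ICC(2,1) from the mean squares of a two-way layout, using the formula ICC(2,1) = (BMS − EMS) / (BMS + (k − 1)·EMS + k·(JMS − EMS)/n). It is an illustration only and not the SPSS procedure used in the study; the function name and the example data are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1) for a table with one row per subject and one column per occasion."""
    n = len(data)       # number of subjects
    k = len(data[0])    # number of observations per subject
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in col_means)
    ss_residual = ss_total - ss_subjects - ss_occasions

    bms = ss_subjects / (n - 1)                # between-subject mean square
    jms = ss_occasions / (k - 1)               # between-occasion mean square
    ems = ss_residual / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical data: every subject scores exactly one unit higher at T2.
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_2_1(shifted), 3))  # 0.667
```

In this toy example the residual variance is zero, yet the ICC stays below 1 because the systematic shift between occasions enters through JMS; this is the behaviour expected of an absolute-agreement coefficient.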
Effect of the normalization
The goals of any applied normalization procedure can be divided into i) reducing the interindividual variance of the values being assessed or ii) relating the values to individual characteristics, as in the anthropometric normalization applied here, for a more meaningful and therefore improved interpretation of data. While i) inevitably leads to a reduction of the ICC, the consequences of ii) for ICC levels are not obvious. Therefore, the global means are presented in Table 3, and the dispersions per subgroup and measurement time point are shown in Fig. 2. However, it is specifically observed for the force data analyzed here that normalization to the upper body weight improves the interpretability of the maximal force values significantly but does not lead to a consistent reduction in the variation between individuals. This can most clearly be seen in the variation coefficients, which are also displayed: here, the SD is expressed as a percentage relative to the mean value level. It can be observed that normalization resulted in an increase of the variability for extension in the female participants, but a decrease for flexion. In the male participants, the opposite occurred: normalization led to a reduction of the variability for flexion and an increase for extension. Seemingly paradoxically, these changes in interindividual variability of the values were accompanied by corresponding changes in ICC levels. Thus, the otherwise meaningful anthropometric normalization led to an ambiguous distortion of the ICC values, making them no longer unequivocally interpretable. By extension, these considerations also apply to the standard error of measurement (SEm), as the ICC is part of its calculation.
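That normalization does not necessarily reduce the variation between individuals can be illustrated with a small sketch. It assumes that normalization means dividing each MVC value by the corresponding UBW (both in the same force unit); the function names and all numbers are hypothetical and chosen only to show both directions of the effect.

```python
import statistics


def coefficient_of_variation(values):
    """SD expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def normalize_to_ubw(forces, upper_body_weights):
    """Assumed normalization: MVC force divided by upper body weight."""
    return [f / w for f, w in zip(forces, upper_body_weights)]


# Case A: force scales with UBW, so normalization removes the spread.
forces_a, ubw_a = [400.0, 500.0, 600.0], [400.0, 500.0, 600.0]
# Case B: identical forces but different UBW, so normalization adds spread.
forces_b, ubw_b = [500.0, 500.0, 500.0], [300.0, 400.0, 500.0]

for forces, ubw in ((forces_a, ubw_a), (forces_b, ubw_b)):
    raw_cv = coefficient_of_variation(forces)
    norm_cv = coefficient_of_variation(normalize_to_ubw(forces, ubw))
    print(raw_cv, norm_cv)
```

Whether the variation coefficient rises or falls therefore depends on how closely force and UBW covary in the group considered, which is consistent with the inconsistent direction of the changes reported here.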
Further influencing factors: value level and sample size
The mean MVC level clearly differed between women and men, with the sex differences in the normalized values decreasing significantly in magnitude and, consequently, in the clarity of these differences. This further highlights the relevance of anthropometric normalization, which demonstrates a strong convergence between the sexes in the force ratio between MVC and UBW. Through anthropometric normalization, the values of the physiological reserve converge despite clear force differences, so that for extension, at least at T1, no sex differences were detectable (Table 3). In other words, although there are large differences in absolute force levels between the sexes, these are much smaller in relation to the physiological force reserve and tend to be negligible. This effect becomes particularly evident when considering the respective effect sizes (see Table 3), which decreased to moderate levels60 although the statistical comparisons remained significant. Another effect of MVC levels, independent of their influence on the ICC, was observed for the standard error of the mean (SEM), which, in turn, directly affected the coefficient of variation of the method error (CVME). At comparable repeatability, the values of the differences between the repeated measurements showed a direct correlation, initially leading to a corresponding change in SEM. However, this change was inversely correlated with sample size. Consequently, the effects of the individual calculation components on SEM are complex and, therefore, difficult to interpret.
In contrast, the influence of value level and sample size on CVME is less complex, as sample size does not affect its calculation. Thus, at comparable repeatability, no systematic changes in CVME are expected across different MVC levels. This was confirmed by the data of the present study: while CVME values showed only minor differences between the original force values and the normalized values, only the distinctly different reliability levels for flexion and extension in women and men were associated with corresponding changes in CVME values.
Limitations
The investigation was conducted using a specific test and training device (CTT Centaur, BfmC Leipzig, Germany), which limits the mobility of the upper body in an upright position. Since the bar for force measurement was positioned cranially over the shoulders, it cannot be excluded that forces were also produced in the cranial direction due to possible contact during the maximum force measurement. Since investigators were aware of this potential error in task execution, they closely monitored the performance of the exercises. In case of implausibly low values, instructions were repeated, the exercise was practiced with submaximal force, and finally the MVC test was repeated.
Another limiting factor arises from the studied population. Participants were individuals who were not engaged in intensive sports activities in their leisure time, and thus had little to no experience with maximum force exercises. The systematically higher maximum force values at time T2 suggest a habituation or learning effect56. Although this is a commonly observed effect in practice when investigating naïve persons, references to it in the literature are unfortunately sparse. We found only one specific study, but this relates to muscle activation, which was not reported here61.
To compensate for this, the so-called familiarization trial with submaximal force was always conducted beforehand35. Only the two remaining trials were included in the analysis.
Another influencing factor for maximum force measurements arises from motivation of the participants and the experimental conditions in terms of verbal encouragement. Since it is known that verbal encouragement positively affects performance38,62, all participants were encouraged accordingly to ensure this effect was consistent for everyone.
Summary
Reliability analyses should always be conducted using non-normalized original values. Any normalization of measurement values, which may be meaningful for diagnostic interpretation or for assessing group variability, influences reliability metrics in a complex and thus difficult-to-trace manner. Consequently, normalized data are not suitable for deriving reliability indices. This was demonstrated in the present study using the example of MVC measurements of the trunk for flexion and extension.
Conclusion
Maximal force measurements of trunk muscles show good intersession reliability. However, when analyzed by sex, reliability decreases to moderate levels for women in extension and for men in flexion. With respect to force levels, men demonstrate approximately 1.5 times higher absolute force values than women, independent of force direction. This ratio, however, reduces to approximately 1.1 when anthropometrically normalized values are used. In general, anthropometrically normalized force values enhance comparability of MVC data. However, the impact of normalization on reliability values is variable and can lead to both poorer and better reliability levels. These deviations are not consistent. Therefore, for reliability studies, original (non-normalized) values should always be used to avoid unsystematic distortions of reliability outcomes.
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.
References
Liu, S., Zhu, M., Yu, D. J., Rasin, A. & Young, S. D. Using real-time social media technologies to monitor levels of perceived stress and emotional state in college students: a web-based questionnaire study. JMIR Ment. Health 4, e2. https://doi.org/10.2196/mental.5626 (2017).
Lavreysen, O. et al. An overview of work-related stress assessment. J. Affect Disord. 383, 240–259. https://doi.org/10.1016/j.jad.2025.04.076 (2025).
Lorentz, W. J., Scanlan, J. M. & Borson, S. Brief screening tests for dementia. Can. J. Psychiatry 47, 723–733. https://doi.org/10.1177/070674370204700803 (2002).
Baalmann, A. K. et al. Patient-reported outcome measures for post-COVID-19 condition: a systematic review of instruments and measurement properties. BMJ Open 14, e084202. https://doi.org/10.1136/bmjopen-2024-084202 (2024).
MacDermid, J. C. & Stratford, P. Applying evidence on outcome measures to hand therapy practice. J Hand Ther 17, 165–173. https://doi.org/10.1197/j.jht.2004.02.005 (2004).
Currell, K. & Jeukendrup, A. E. Validity, reliability and sensitivity of measures of sporting performance. Sports Med 38, 297–316. https://doi.org/10.2165/00007256-200838040-00003 (2008).
Hopkins, W. G. Measures of reliability in sports medicine and science. Sports Med 30, 1–15. https://doi.org/10.2165/00007256-200030010-00001 (2000).
Takarada, Y. & Nozaki, D. Maximal voluntary force strengthened by the enhancement of motor system state through barely visible priming words with reward. PLoS ONE 9, e109422. https://doi.org/10.1371/journal.pone.0109422 (2014).
Salonikidis, K. et al. Force variability during isometric wrist flexion in highly skilled and sedentary individuals. Eur. J. Appl. Physiol. 107, 715–722. https://doi.org/10.1007/s00421-009-1184-5 (2009).
Faulks, T., Sansone, P. & Walter, S. A Systematic Review of Lower Limb Strength Tests Used in Elite Basketball. Sports (Basel) 12, https://doi.org/10.3390/sports12090262 (2024).
Ansdell, P. et al. Task-specific strength increases after lower-limb compound resistance training occurred in the absence of corticospinal changes in vastus lateralis. Exp. Physiol. 105, 1132–1150. https://doi.org/10.1113/EP088629 (2020).
Moore, D., Semciw, A. I. & Pizzari, T. A Systematic Review and Meta-Analysis of Common Therapeutic Exercises That Generate Highest Muscle Activity in the Gluteus Medius and Gluteus Minimus Segments. Int. J. Sports Phys. Ther. 15, 856–881. https://doi.org/10.26603/ijspt20200856 (2020).
Wilson, G. J. & Murphy, A. J. The use of isometric tests of muscular function in athletic assessment. Sports Med 22, 19–37. https://doi.org/10.2165/00007256-199622010-00003 (1996).
Sarabon, N., Kozinc, Z. & Perman, M. Establishing Reference Values for Isometric Knee Extension and Flexion Strength. Front. Physiol. 12, 767941. https://doi.org/10.3389/fphys.2021.767941 (2021).
Burden, A. How should we normalize electromyograms obtained from healthy participants? What we have learned from over 25 years of research. J. Electromyogr. Kinesiol. 20, 1023–1035. https://doi.org/10.1016/j.jelekin.2010.07.004 (2010).
Halaki, M. & Ginn, K. in Computational Intelligence in Electromyography Analysis – A Perspective on Current Applications and Future Challenges (ed. Naik, G. R.) (InTech, Rijeka, 2012).
Roman-Liu, D., Kaminska, J. & Tokarski, T. M. Population-specific equations of age-related maximum handgrip force: a comprehensive review. PeerJ 12, e17703. https://doi.org/10.7717/peerj.17703 (2024).
Jaafar, M. H. et al. Normative reference values and predicting factors of handgrip strength for dominant and non-dominant hands among healthy Malay adults in Malaysia. BMC Musculoskelet. Disord. 24, 74. https://doi.org/10.1186/s12891-023-06181-8 (2023).
Kim, J., Hegland, K., Vann, W., Berry, R. & Davenport, P. W. Measurement of maximum tongue protrusion force (MTPF) in healthy young adults. Physiol. Rep. 7, e14175. https://doi.org/10.14814/phy2.14175 (2019).
Or, C., Lin, J. H., Wang, H. & McGorry, R. W. Normative data on the one-handed static pull strength of a Chinese population and a comparison with American data. Ergonomics 59, 526–533. https://doi.org/10.1080/00140139.2015.1073793 (2016).
Gracovetsky, S., Farfan, H. & Helleur, C. The abdominal mechanism. Spine 10, 317–324 (1985).
Cholewicki, J., Juluru, K., Radebold, A., Panjabi, M. M. & McGill, S. M. Lumbar spine stability can be augmented with an abdominal belt and/or increased intra-abdominal pressure. Eur. Spine J. 8, 388–395 (1999).
Hodges, P., Cresswell, A. G., Daggfeldt, K. & Thorstensson, A. In vivo measurement of the effect of intra-abdominal pressure on the human spine. J. Biomech. 34, 347–353 (2001).
Hodges, P. & Gandevia, S. C. Changes in intra-abdominal pressure during postural and respiratory activation of the human diaphragm. J. Appl. Physiol. 89, 967–976 (2000).
Cholewicki, J., Juluru, K. & McGill, S. M. Intra-abdominal pressure mechanism for stabilizing the lumbar spine. J. Biomech. 32, 13–17. https://doi.org/10.1016/S0021-9290(98)00129-8 (1999).
Fairbank, J. C., O’Brien, J. P. & Davis, P. R. Intraabdominal pressure rise during weight lifting as an objective measure of low-back pain. Spine 5, 179–184 (1980).
Helewa, A., Goldsmith, C. H., Lee, P., Smythe, H. A. & Forwell, L. Does strengthening the abdominal muscles prevent low back pain – a randomized controlled trial. J. Rheumatol. 26, 1808–1815 (1999).
Arokoski, J. P., Valta, T., Kankaanpaa, M. & Airaksinen, O. Activation of lumbar paraspinal and abdominal muscles during therapeutic exercises in chronic low back pain patients. Arch. Phys. Med. Rehabil. 85, 823–832 (2004).
Ebrahimi, H., Blaouchi, R., Eslami, R. & Shahrokhi, M. Effect of 8-week core stabilization exercises on low back pain, abdominal and back muscle endurance in patients with chronic low back pain due to disc herniation. Phys. Treat.-Specific Phys. Therapy J. 4, 25–32 (2014).
Hoogendoorn, W. E., van Poppel, M. N., Bongers, P. M., Koes, B. W. & Bouter, L. M. Systematic review of psychosocial factors at work and private life as risk factors for back pain. Spine 25, 2114–2125. https://doi.org/10.1097/00007632-200008150-00017 (2000).
Kamper, S. J. et al. Multidisciplinary biopsychosocial rehabilitation for chronic low back pain. Cochrane Database Syst. Rev. 9, CD000963. https://doi.org/10.1002/14651858.CD000963.pub3 (2014).
McCarthy, C. J., Arnall, F. A., Strimpakos, N., Freemont, A. & Oldham, J. A. The biopsychosocial classification of non-specific low back pain: a systematic review. Phys. Therapy Rev. 9, 17–30. https://doi.org/10.1179/108331904225003955 (2004).
Panjabi, M. M. The stabilizing system of the spine. Part I. Function, dysfunction, adaptation, and enhancement. J. Spinal Disord. 5, 383–389 (1992).
Panjabi, M. M. in IV World Congress of Biomechanics (2002).
Kurz, E., Anders, C., Walther, M., Schenk, P. & Scholle, H. C. Force capacity of back extensor muscles in healthy males - effects of age and recovery time. J. Appl. Biomech. 30, 713–721. https://doi.org/10.1123/jab.2013-0308 (2014).
Mader, L., Herzberg, M. & Anders, C. Reliability of sEMG data of back muscles during static submaximal loading situations − Values and pitfalls. J. Electromyogr. Kinesiol. 79, 102947. https://doi.org/10.1016/j.jelekin.2024.102947 (2024).
Pfeifle, C., Edel, M., Schleifenbaum, S., Kühnapfel, A. & Heyde, C.-E. The reliability of a restraint sensor system for the computer-supported detection of spinal stabilizing muscle deficiencies. BMC Musculoskelet. Disord. 21, 597. https://doi.org/10.1186/s12891-020-03597-4 (2020).
McNair, P. J., Depledge, J., Brettkelly, M. & Stanley, S. N. Verbal encouragement: effects on maximum effort voluntary muscle action. Br. J. Sports Med. 30, 243–245 (1996).
Shrout, P. E. & Fleiss, J. L. Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86, 420–428 (1979).
McGraw, K. O. & Wong, S. P. Forming Inferences About Some Intraclass Correlation Coefficients. Psychol. Methods 1, 30–46 (1996).
Asendorpf, J. & Wallbott, H. G. Maße der Beobachterübereinstimmung: ein systematischer Vergleich. Zeitschrift für Sozialpsychologie 10, 243–252 (1979).
Koo, T. K. & Li, M. Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 15, 155–163. https://doi.org/10.1016/j.jcm.2016.02.012 (2016).
Brown, J. D. Standard error vs. Standard error of measurement. Shiken Research Bulletin 3, 20–25 (1999).
Lexell, J. E. & Downham, D. Y. How to assess the reliability of measurements in rehabilitation. Am. J. Phys. Med. Rehabil. 84, 719–723 (2005).
Rasch, B., Friese, M., Hofmann, W. & Naumann, E. Quantitative Methoden 1: Einführung in die Statistik für Psychologie, Sozial-& Erziehungswissenschaften. (Springer, 2021).
Losa-Iglesias, M. E., Becerro-de-Bengoa-Vallejo, R. & Becerro-de-Bengoa-Losa, K. R. Reliability and concurrent validity of a peripheral pulse oximeter and health-app system for the quantification of heart rate in healthy adults. Health Inf. J. 22, 151–159. https://doi.org/10.1177/1460458214540909 (2016).
Soares, A. L. C., Carvalho, R. F., Mogami, R., Meirelles, C. d. M. & Gomes, P. S. C. Validity, reliability and measurement error of quadriceps femoris muscle thickness obtained by ultrasound in healthy adults: a systematic review. Rev. Bras. Cineantropom. Desempenho Hum. 25, e93936 (2023).
Portney, L. G. Foundations of clinical research: applications to evidence-based practice. 4th edn, (FA Davis, 2020).
Beckerman, H. et al. Smallest real difference, a link between reproducibility and responsiveness. Qual. Life Res. 10, 571–578 (2001).
Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front. Psychol. 4, https://doi.org/10.3389/fpsyg.2013.00863 (2013).
Frontera, W. R., Hughes, V. A., Lutz, K. J. & Evans, W. J. A cross-sectional study of muscle strength and mass in 45- to 78-yr-old men and women. J. Appl. Physiol. 71, 644–650 (1991).
Janssen, I., Heymsfield, S. B., Wang, Z. M. & Ross, R. Skeletal muscle mass and distribution in 468 men and women aged 18–88 yr. J. Appl. Physiol. 89, 81–88 (2000).
Thorstensson, A. & Carlson, H. Fibre types in human lumbar back muscles. Acta Physiol. Scand. 131, 195–202 (1987).
Mannion, A. F. Fibre type characteristics and function of the human paraspinal muscles: normal values and changes in association with low back pain. J. Electromyogr. Kinesiol. 9, 363–377. https://doi.org/10.1016/S1050-6411(99)00010-3 (1999).
Cuddigan, J. H. Quadriceps femoris strength. Rheumatol. Rehabil. 12, 77–83. https://doi.org/10.1093/rheumatology/12.2.77 (1973).
Gabriel, D. A., Kamen, G. & Frost, G. Neural adaptations to resistive exercise: mechanisms and recommendations for training practices. Sports Med. 36, 133–149. https://doi.org/10.2165/00007256-200636020-00004 (2006).
Solum, M., Loras, H. & Pedersen, A. V. A Golden Age for Motor Skill Learning? Learning of an Unfamiliar Motor Task in 10-Year-Olds, Young Adults, and Adults, When Starting From Similar Baselines. Front. Psychol. 11, 538. https://doi.org/10.3389/fpsyg.2020.00538 (2020).
Bak, P., Anders, C., Bocker, B. & Smolenski, U. C. Reliability of the measurement of isometric maximal voluntary contraction of trunk muscles in healthy subjects. Phys. Med. Rehab. Kuror. 13, 28–34. https://doi.org/10.1055/S-2003-37669 (2003).
Anders, C., Ludwig, F., Sänger, F. & Marks, M. Eight Weeks Sit-Up versus Isometric Abdominal Training: Effects on Abdominal Muscles Strength Capacity. Arch. Sports Med. 4, 198–204. https://doi.org/10.36959/987/252 (2020).
Sink, C. A. & Mvududu, N. H. Statistical power, sampling, and effect sizes three keys to research relevancy. Counseling Outcome Res. Evaluat. 1, 1–18 (2010).
Frost, L. R., Gerling, M. E., Markic, J. L. & Brown, S. H. M. Exploring the effect of repeated-day familiarization on the ability to generate reliable maximum voluntary muscle activation. J. Electromyogr. Kinesiol. 22, 886–892. https://doi.org/10.1016/j.jelekin.2012.05.005 (2012).
Argus, C. K., Gill, N. D., Keogh, J. W. L. & Hopkins, W. G. Acute Effects of Verbal Feedback on Upper-Body Performance in Elite Athletes. J. Strength Cond. Res. 25, 3282–3287. https://doi.org/10.1519/JSC.0b013e3182133b8c (2011).
Funding
Open Access funding enabled and organized by Projekt DEAL. Berufsgenossenschaft Nahrungsmittel und Gastgewerbe, 2.11.11.
Author information
Contributions
C.An. designed the study. C.A., M.H. and L.M. collected the data. C.An., C.A. and L.M. performed the statistical analysis. C.A., M.H. and L.M. analyzed the data. C.An. wrote the main manuscript text. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
Hereby all authors disclose any competing financial and non-financial interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Anders, C., Alex, C., Herzberg, M. et al. Normalized data may have limitations in determining the reliability of MVC measurements. Sci Rep 15, 32930 (2025). https://doi.org/10.1038/s41598-025-20014-9
DOI: https://doi.org/10.1038/s41598-025-20014-9