Introduction

In recent years, increasing evidence has demonstrated the critical role of cardiorespiratory fitness (CRF) in reducing both cardiovascular and all-cause mortality1,2,3, and its prognostic value is firmly established across various cardiovascular diseases4,5. CRF is recognized as a vital indicator of cardiovascular health, and the American Heart Association has recommended it as a novel vital sign6. Peak oxygen consumption (V̇O2peak) is considered the “gold standard” for laboratory-based CRF measurement and is the most important indicator in cardiopulmonary exercise testing (CPET)7. V̇O2peak reflects an individual’s maximal capacity to take up, transport, and utilize oxygen, serving both as a prognostic marker of preoperative risk and as a tool for evaluating responses to exercise training or treatment7,8. CPET is currently the only assessment tool that provides a comprehensive, objective, and quantitative evaluation of overall cardiorespiratory function in a single measurement9. An individual’s V̇O2peak is meaningful only when referenced against the normal values of healthy individuals. Accurately quantifying CRF through CPET, especially in apparently healthy men and women, has therefore become essential for improving individualized risk assessment, guiding clinical decisions, and enhancing prognoses10.

It is well known that reference values for V̇O2peak vary with region, race, sex, age, body type, lifestyle, and exercise habits11. V̇O2peak tends to decrease with age, and men generally show higher values than women. Most commonly used equations provide inaccurate estimates of CRF in obese patients, especially those with severe obesity and low CRF12. Because athletes typically have higher V̇O2peak values, comparing their measured V̇O2peak with the values predicted for the general active population could lead to misdiagnosis13. Therefore, each exercise laboratory should select a set of reference values that best reflects the characteristics of the tested population. Despite numerous studies in this field, the majority have been concentrated in Europe and the Americas14,15,16. CPET-related research among the Chinese population has increased rapidly in recent years, but research on reference values and prediction equations for V̇O2peak remains limited.

Therefore, the primary objective of this study is to establish a prediction equation for V̇O2peak in Chinese adults using an electronically braked cycle ergometer for CPET and to assess the impact of sex, age, and body mass index (BMI) on reference values. Previous research has indicated that CRF in obese populations is often underestimated, and as such, we incorporate both lean body mass (LBM) and body surface area (BSA) to construct the equation17.

Method

Participants

This retrospective study encompassed 22,943 CPET data sets collected at the cardiovascular rehabilitation center of Peking University Third Hospital (PUTH) from September 1, 2017, to September 1, 2023. All participants were evaluated by an experienced physician prior to exercise. The pre-exercise assessment included anamnesis, physical examination, resting electrocardiogram, exercise habits, and medication use. Height and body weight were recorded to the nearest 0.5 cm and 0.1 kg, respectively, with participants dressed in light clothes and no shoes. The inclusion criteria were: (1) age ≥ 18 years; (2) tested on electronically braked cycle ergometers; (3) in the case of multiple tests on the same participant, only the first test result was considered to avoid bias due to familiarity with CPET. Participants with a history of cardiovascular or pulmonary diseases, or of musculoskeletal or neurological disorders, were excluded from the cohort. Participants whose test was terminated because of abnormal clinical findings or a positive electrocardiographic response before reaching voluntary maximal effort were also excluded. A total of 9,155 participants were included; additional participant information is illustrated in Fig. 1.

Fig. 1

Flowchart of the inclusion profile in the study. CPET cardiopulmonary exercise testing.

This study adhered to the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines18 and was carried out in compliance with the amended Declaration of Helsinki19. Due to the retrospective nature of the study, the Peking University Third Hospital Medical Science Research Ethics Committee (M2023790) waived the need to obtain informed consent.

Cardiopulmonary exercise testing

We used an electronically braked cycle ergometer for symptom-limited exercise testing to evaluate V̇O2peak. Gas exchange and ventilatory variables were analyzed continuously on a breath-by-breath basis using the computer-based MedGraphics system (Minnesota, USA) and the Madecare system (Beijing, China). Despite differences in CPET equipment, all operators underwent uniform training to maintain consistent procedures, following the guidelines20. The gas exchange measurement systems were calibrated daily before testing to ensure the accuracy of gas exchange determinations. V̇O2peak was defined as the highest average value during a sequential 30-s period.
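The 30-s averaging rule can be illustrated with a minimal sketch. It assumes a rolling window that ends at each breath and a plain (not time-weighted) mean of the breaths inside it; the text does not say whether the 30-s periods were rolling or consecutive bins, and the function name `vo2_peak_30s` and its parameters are hypothetical.

```python
from collections import deque


def vo2_peak_30s(times_s, vo2_ml_min, window_s=30.0):
    """Return the highest mean VO2 over any rolling window of `window_s` seconds.

    times_s: breath timestamps in seconds, in ascending order.
    vo2_ml_min: breath-by-breath VO2 values (mL/min), one per timestamp.
    Returns None if the recording is shorter than one full window.
    Assumption: each window ends at a breath and covers the preceding
    `window_s` seconds; the mean is an unweighted mean of those breaths.
    """
    if len(times_s) != len(vo2_ml_min):
        raise ValueError("times_s and vo2_ml_min must have the same length")
    window = deque()  # (time, vo2) pairs currently inside the window
    running_sum = 0.0
    best = None
    for t, v in zip(times_s, vo2_ml_min):
        window.append((t, v))
        running_sum += v
        # Drop breaths that are window_s or more older than the current breath.
        while t - window[0][0] >= window_s:
            running_sum -= window.popleft()[1]
        # Score a window only once a full window_s of data has elapsed.
        if t - times_s[0] >= window_s:
            avg = running_sum / len(window)
            if best is None or avg > best:
                best = avg
    return best
```

A rolling window is one reasonable reading of "sequential 30-s period"; a laboratory that uses fixed consecutive 30-s bins would obtain slightly different peaks from the same data.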

Certified and trained clinical exercise physiologists performed the CPET. Standardized procedures and instructions were followed to ensure consistent testing conditions for all participants, with a physician available at all times upon request. Before each test, the equipment was set up in a standardized manner, and reference gas calibration and volume calibration were performed. In the absence of chest pain and electrocardiogram abnormalities, participants were encouraged to exercise continuously until reaching maximal fatigue, which was defined as voluntary exhaustion, dyspnea, or fatigue.

The protocol included a 3 min resting period, followed by a 3 min unloaded warm-up, with a pedaling cadence of 60–70 revolutions per minute. Individualized continuous incremental load programs were devised based on the protocol, with a preset load increment rate of 10–40 W/min16. The increment rate was chosen so that participants would complete the symptom-limited CPET within 8–12 min. Upon self-perceived fatigue (such as shortness of breath and leg fatigue), participants entered the recovery period, during which data were continuously recorded for 6 min. A 12-lead electrocardiogram was monitored continuously. Blood pressure was recorded every 3 min using an automatic digital sphygmomanometer. Oxygen saturation (SpO2) was continuously recorded by a pulse oximeter. A CPET was deemed valid if at least two of the following criteria were met: (1) RER ≥ 1.0; (2) rating of perceived exertion > 17; (3) V̇O2 reached a plateau or declined slightly despite increased load.
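The "at least two of three" validity rule above can be written as a short check. This is a minimal sketch: the function name and arguments are hypothetical, and the plateau judgement is taken as a precomputed boolean because the text does not give a numerical plateau definition.

```python
def is_valid_cpet(peak_rer, rpe, vo2_plateau):
    """Return True if at least two of the three validity criteria are met.

    peak_rer: peak respiratory exchange ratio.
    rpe: rating of perceived exertion at the end of exercise.
    vo2_plateau: True if VO2 plateaued or declined slightly despite rising load
                 (how the plateau is detected is outside this sketch).
    """
    criteria = [
        peak_rer >= 1.0,    # criterion (1)
        rpe > 17,           # criterion (2)
        bool(vo2_plateau),  # criterion (3)
    ]
    return sum(criteria) >= 2
```

For example, a test with a peak RER of 1.12, an RPE of 16, and a V̇O2 plateau meets criteria (1) and (3) and is therefore counted as valid.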

Development of the PUTH equation and comparison with other prediction equations

Data from participants between September 1, 2017, and August 31, 2021 (N = 4,531) constituted the derivation cohort, while data from September 1, 2021, to September 1, 2023 (N = 4,624) were used as a temporal validation cohort for external validation.
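The temporal split described above amounts to assigning each test to a cohort by its date. The sketch below uses the date boundaries stated in the text; the function and constant names are hypothetical.

```python
from datetime import date

DERIVATION_START = date(2017, 9, 1)
DERIVATION_END = date(2021, 8, 31)
VALIDATION_END = date(2023, 9, 1)


def assign_cohort(test_date):
    """Return 'derivation', 'validation', or None for dates outside the study."""
    if DERIVATION_START <= test_date <= DERIVATION_END:
        return "derivation"
    if DERIVATION_END < test_date <= VALIDATION_END:
        return "validation"
    return None
```

Splitting by calendar time rather than at random means the validation cohort is tested after the equation's data were collected, which is what makes it a temporal validation.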

Backward multivariate regression analysis was applied to the derivation cohort to generate the V̇O2peak prediction equation. Considering the correlation of BSA and LBM with CRF21,22, we attempted to incorporate these indicators into the V̇O2peak equation. These indicators were calculated using formulas derived from Chinese data23,24, rather than being directly measured by the device. In the validation cohort, we evaluated the predicted V̇O2peak and the percentage of predicted V̇O2peak achieved using the PUTH equation. For comparison with the measured V̇O2peak, the predicted V̇O2peak for each participant was determined using the PUTH equation, Xiangya equation25, Wasserman equation16, and FRIEND equation15. The comparison was conducted both overall and stratified by sex, age groups, and BMI categories (underweight < 18.5 kg·m−2, normal weight 18.5–23.9 kg·m−2, overweight 24.0–27.9 kg·m−2, obese ≥ 28.0 kg·m−2)26.
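The BMI categories used for stratification can be sketched as a small classifier. The cut-offs are those listed above; treating values between the printed bounds (for example 23.95 kg·m−2) as belonging to the lower category is an assumption of this sketch, and the function name is hypothetical.

```python
def bmi_category(weight_kg, height_cm):
    """Classify BMI (kg/m^2) into the four categories used for stratification."""
    height_m = height_cm / 100.0
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 24.0:  # printed as 18.5-23.9
        return "normal weight"
    if bmi < 28.0:  # printed as 24.0-27.9
        return "overweight"
    return "obese"
```

For instance, a participant weighing 70 kg at 170 cm has a BMI of about 24.2 kg·m−2 and falls in the overweight category.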

Statistical analysis

Data analysis was conducted using SPSS 26.0 (IBM, Armonk, New York). Continuous variables are reported as mean ± standard deviation, and categorical variables as frequency (percentage). One-way analysis of variance and chi-square tests were used for comparisons between the derivation and validation cohorts. Box plots were generated to analyze trends in V̇O2peak across age groups. Bland–Altman plots and scatter plots were constructed to illustrate differences between the measured values and the values predicted by the PUTH, Xiangya, Wasserman, and FRIEND equations. A P value < 0.05 was considered statistically significant.
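The quantities behind a Bland–Altman plot can be sketched as follows. This is an illustration rather than the SPSS procedure used in the study: it assumes the difference is defined as measured minus predicted and uses the conventional limits of agreement of bias ± 1.96 standard deviations, which the text does not state explicitly. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(measured, predicted):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (measured - predicted); the limits of agreement are
    bias +/- 1.96 * sample standard deviation of the differences.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    diffs = [m - p for m, p in zip(measured, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

With this sign convention, a positive bias means the equation underestimates measured V̇O2peak on average, and a negative bias means it overestimates.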

Results

Baseline characteristics of PUTH derivation cohort and PUTH validation cohort

The baseline characteristics of the PUTH derivation and validation cohorts are presented in Table 1. The PUTH derivation cohort included 4,531 participants with a mean age of 50.7 years, of whom 48.8% were men. The PUTH validation cohort included 4,624 participants with a mean age of 46.1 years, of whom 48.5% were men. Figure 2 illustrates the distribution of V̇O2peak in the derivation cohort by sex and age group. Overall, V̇O2peak declined with age in both men and women. At all ages, men had higher V̇O2peak than women.

Table 1 Baseline characteristics of the derivation cohort and validation cohort.
Fig. 2

The distribution of V̇O2peak in derivation cohort by sex and age groups.

There were significant differences in age, height, weight, and BMI between the PUTH derivation and validation cohorts (p < 0.05). Overall, participants in the PUTH validation cohort were younger (46.1 ± 14.4 vs. 50.7 ± 14.6 years), taller (167.3 ± 8.4 vs. 166.6 ± 8.4 cm), lighter (69.0 ± 13.9 vs. 69.7 ± 13.5 kg), and had a lower BMI (24.5 ± 3.8 vs. 25.0 ± 3.8 kg·m−2) compared to the derivation cohort. However, there were no significant differences in sex, BSA, and LBM between the two cohorts. Additionally, in terms of CPET parameters, compared to the PUTH derivation cohort, the PUTH validation cohort had a lower peak RER (1.17 ± 0.10 vs. 1.18 ± 0.11, p < 0.001) and a higher V̇O2peak (23.0 ± 6.6 vs. 21.6 ± 6.1 mL·kg−1·min−1, p < 0.001).

PUTH prediction equation

The backward multivariate regression analysis revealed that sex, age, square of height, logarithmic transformations of BMI, logarithmic transformations of BSA, and logarithmic transformations of LBM were significant predictors of V̇O2peak (adjusted R2 = 0.624, p < 0.001). The equation is as follows:

V̇O2peak (mL·min−1) = −24364.9 − 621.3 × Sex (Women = 1, Men = 2) − 10.7 × Age + 0.2 × Height² (height in cm) + 6464.7 × Log(BMI) − 24997.2 × Log(BSA) + 12388.6 × Log(LBM).
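The equation can be transcribed into a small function for illustration. This is a sketch under stated assumptions, not a validated calculator: the text does not state the logarithm base (base 10 is assumed here and exposed as a parameter), and BSA and LBM must be supplied by the caller because the Chinese-data formulas used to compute them (refs. 23, 24) are not reproduced in the text. The function and parameter names are hypothetical.

```python
import math


def puth_vo2peak(sex, age_years, height_cm, bmi, bsa_m2, lbm_kg, log=math.log10):
    """Transcription of the printed PUTH equation; returns V̇O2peak in mL/min.

    sex: "women" or "men" (coded 1 and 2, as in the equation).
    bsa_m2, lbm_kg: body surface area and lean body mass, computed elsewhere.
    log: logarithm function; base 10 is an ASSUMPTION, the text only says "Log".
    """
    sex_code = {"women": 1, "men": 2}[sex]
    return (
        -24364.9
        - 621.3 * sex_code
        - 10.7 * age_years
        + 0.2 * height_cm ** 2
        + 6464.7 * log(bmi)
        - 24997.2 * log(bsa_m2)
        + 12388.6 * log(lbm_kg)
    )
```

Because the prediction depends strongly on the logarithmic BSA and LBM terms, results are only meaningful when those inputs come from the same formulas the authors used.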

Comparison of measured V̇O2peak and predicted V̇O2peak by the PUTH equation in the PUTH validation cohort

We calculated the predicted V̇O2peak using the PUTH equation in the validation cohort and compared it with the measured V̇O2peak, stratifying participants by sex, age groups, and BMI categories, as shown in Table 2. Measured and predicted V̇O2peak differed significantly in both men and women, in participants aged 30–39, 40–49, and 50–59 years, and in individuals with BMI values of 18.5–23.9 and 24.0–27.9 kg·m−2 (p < 0.001). Overall, measured V̇O2peak was 102.5% of the predicted value, indicating minimal differences between measured and predicted values in the entire cohort. This finding supports the accuracy and applicability of the PUTH equation in predicting V̇O2peak.

Table 2 Measured and predicted V̇O2peak values by the PUTH equation and percentage of predicted V̇O2peak in the validation cohort, stratified by sex, age groups, and BMI categories.

Comparison of the mean difference between measured and predicted V̇O2peak of different equations in the PUTH validation cohort

We compared the mean differences between measured and predicted V̇O2peak using the PUTH, Xiangya, Wasserman, and FRIEND equations in the PUTH validation cohort. In the total population, the absolute mean difference between the measured and predicted V̇O2peak was smallest for the PUTH equation (37.0 ± 349.7 mL·min−1), followed by the Xiangya (209.6 ± 367.0 mL·min−1), Wasserman (456.5 ± 426.5 mL·min−1), and FRIEND equations (739.7 ± 487.6 mL·min−1). The differences among the equations were statistically significant (p < 0.001).

When the study population was further divided by sex, age, and BMI, the mean differences between measured and predicted V̇O2peak were significantly smaller with the PUTH equation than with the other three equations in most subgroups (p < 0.001). Except for the 80–89 age group, where the Xiangya equation showed the smallest difference (0.5 ± 173.2 mL·min−1), the PUTH equation generally had the smallest absolute differences across sex, age groups, and BMI categories, followed by the Xiangya, Wasserman, and FRIEND equations (see Table 3).

Table 3 Differences between measured and predicted V̇O2peak values by multiple equations in the validation cohort, stratified by sex, age groups, and BMI categories.

Comparison of people whose absolute value of (measured V̇O2peak − predicted V̇O2peak)/measured V̇O2peak is > 20% in different prediction equations

Table 4 presents the proportion of individuals in the PUTH validation cohort whose absolute value of (measured V̇O2peak − predicted V̇O2peak)/measured V̇O2peak exceeded 20%, based on different equations and stratified by sex, age groups, and BMI categories. In the validation cohort, the proportions exceeding 20% were 33.0, 39.5, 62.9, and 81.4% for the PUTH, Xiangya, Wasserman, and FRIEND equations, respectively (p < 0.001). The PUTH equation had the fewest individuals exceeding 20% in both men and women. Among age groups, those under 70 had the fewest instances exceeding 20% with the PUTH equation, while those over 70 had the fewest with the Xiangya equation. In all BMI categories, the PUTH equation resulted in the lowest number of individuals exceeding the 20% threshold. These differences were statistically significant across all equations (p < 0.001).

Table 4 Comparison of people whose absolute value of (measured V̇O2peak − predicted V̇O2peak)/measured V̇O2peak is > 20% in different prediction equations.
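The 20% criterion reported in Table 4 can be computed as the share of participants whose relative error exceeds the threshold. The sketch below is illustrative only; the function name and the example numbers in the usage note are hypothetical and are not study data.

```python
def proportion_over_threshold(measured, predicted, threshold=0.20):
    """Fraction of pairs with |measured - predicted| / measured > threshold."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not measured:
        raise ValueError("at least one pair is required")
    flags = [abs(m - p) / m > threshold for m, p in zip(measured, predicted)]
    return sum(flags) / len(flags)
```

For made-up values measured = [2000, 2000, 1000, 1500] and predicted = [2500, 2100, 1000, 1100], the relative errors are 25%, 5%, 0%, and about 26.7%, so two of the four pairs exceed the threshold and the function returns 0.5.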

Consistency evaluation and accuracy between prediction equations

The consistency evaluation between measured and predicted results using Bland–Altman analysis for different equations is shown in Fig. 3. Among sex-specific evaluations, the PUTH equation exhibited the least bias for both men (50.2 mL·min−1) and women (24.6 mL·min−1). The Xiangya equation showed a slightly higher bias (Men = 284.9 mL·min−1; Women = 138.6 mL·min−1), while the FRIEND equation exhibited the highest bias (Men = −1021.6 mL·min−1; Women = −473.7 mL·min−1).

Fig. 3

The consistency evaluation between measured and predicted results for different equations. Bland–Altman plots were utilized to depict differences obtained from the PUTH, Xiangya, Wasserman, and FRIEND equations.

Figure 4 presents scatter plots comparing the measured V̇O2peak with the predicted values from four equations for men and women. The PUTH equation aligns most closely with the 45-degree reference line, demonstrating the highest accuracy. In contrast, the Xiangya equation systematically underestimates V̇O2peak across the entire range, while the Wasserman and FRIEND equations overestimate V̇O2peak, particularly at higher values. These findings indicate that the PUTH equation provides the most accurate predictions across different sex groups and V̇O2peak levels.

Fig. 4

Scatter plots comparing measured V̇O2peak values with predicted values from four equations (Xiangya, Wasserman, FRIEND, and PUTH). The dashed 45-degree line represents the line of identity, where perfect agreement between measured and predicted values would lie. Points deviating from the identity line indicate discrepancies between measured and predicted values.

Discussion

This study presents a new reference equation for V̇O2peak in Chinese adults, derived from a large cohort tested with electronically braked cycle ergometers. The PUTH equation, which incorporates sex, age, height, BMI, LBM, and BSA, outperforms existing equations in predicting V̇O2peak, providing more accurate results across different BMI categories. Notably, it highlights the importance of body composition factors, especially LBM and BSA, in estimating CRF. The PUTH equation demonstrates minimal bias and better accuracy, making it highly applicable in clinical settings for diverse age and BMI groups.

Factors influencing V̇O2peak

Previous research has indicated that V̇O2peak is associated with genetics, age, sex, body type, exercise habits, lifestyle, and cardiovascular condition14,25,27,28,29,30. A systematic review suggested that V̇O2peak is higher in men than in women and decreases with age30. Consistent with these findings, our study based on a Chinese cohort also revealed higher measured V̇O2peak in men and a gradual decrease with age (see Fig. 2). However, our study provides some additional insights, especially regarding body composition and BMI. It emphasizes the role of LBM and BSA in predicting V̇O2peak, and including them improved the accuracy of the prediction. Ethnic differences also play a crucial role in physiological function, influenced by genetics, environment, and social factors31, and they affect the overall process of oxygen intake through respiration, transport by the cardiovascular system to organs and skeletal muscles, and eventual elimination of carbon dioxide32,33. A study conducted in China by Yan et al. demonstrated significant differences in forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) among different ethnic groups, even after adjusting for differences in age, height, sex, and residence34. These findings underscore the complex interplay between body composition, genetic background, and environmental factors, and suggest that multiple factors should be considered together when assessing V̇O2peak, particularly across different national and ethnic groups.

Significance of establishing V̇O2peak prediction equation in China

V̇O2peak is a key indicator of CRF and has established reference values based on cohort studies in the United States and Europe. However, as shown by the results of this study, widely used foreign equations are not applicable to the Chinese population, possibly due to significant differences in genetics, lifestyle, body type, and environmental factors. Most of the existing V̇O2peak prediction equations for Chinese adults have been derived from small sample sizes, which may not fully reflect the physiological characteristics of the Chinese population25,35,36. Knowing an individual’s CRF as a percentage of the normal range for their age and sex helps classify their fitness level, providing a basis for activity counseling, risk stratification, and therapy evaluation. Establishing a reference equation tailored to the Chinese population is crucial for clinical practice and public health, as it will provide a reliable tool for evaluating CRF across diverse age groups and health conditions in China.

Advantages of the PUTH equation

By including 9,155 participants from China and performing temporal validation in a separate validation cohort, we developed the PUTH equation to predict V̇O2peak reference values for the Chinese population. Consistent with most studies in Western countries, sex, age, and height were identified as important variables in calculating V̇O2peak reference values. In addition, the PUTH equation includes BMI, BSA, and LBM as independent variables. These variables have not previously been considered in Chinese population-based cohort studies, and accounting for them is essential for personalized health assessments, disease prevention, and the development of targeted interventions.

The derivation cohort of the PUTH equation is broadly representative, with participants spanning a wide range of ages (18–88 years), weights (40.0–186.9 kg), and heights (140.0–207.0 cm), and including a substantial number of women (Women = 2321, 51.2%). The performance of the PUTH equation was also assessed in a temporal validation cohort within the Chinese population. In the validation sample, women accounted for 51.5%, and the age (18–89 years), height (143.0–196.0 cm), and weight (38.4–171.5 kg) ranges were similarly broad, covering various subgroups, including women, youth, elderly, and obese individuals (see Table 1). This is uncommon in previous equations and enables a more precise assessment of CRF in these particular populations. Furthermore, in the validation cohort, the ratio of measured to predicted V̇O2peak values was 102.5%, indicating minimal differences between measured and predicted V̇O2peak values across sex, age groups, and BMI categories. In particular, there were no statistically significant differences in young (< 30 years old) and elderly (> 60 years old) individuals, or in underweight (BMI < 18.5 kg·m−2) and obese individuals (BMI ≥ 28 kg·m−2) (see Table 2).

Comparison of the PUTH prediction equation with other prediction equations

Our study yielded several key findings. First, compared to the Xiangya, Wasserman, and FRIEND equations used for calculating V̇O2peak reference values, the PUTH equation produced values that remained more stable across sex, age groups, and BMI categories. Specifically, the mean difference between measured and predicted V̇O2peak values using the PUTH equation was minimal (37.0 mL·min−1), whereas the mean differences for the Xiangya, Wasserman, and FRIEND equations were considerably larger (209.6, −456.5, and −739.7 mL·min−1, respectively; see Table 3). The negative differences for the Wasserman and FRIEND equations indicate that the V̇O2peak levels observed in our participants were lower than those predicted from the Wasserman and FRIEND cohorts.

Second, when comparing the proportion of participants with an absolute value of (measured V̇O2peak − predicted V̇O2peak)/measured V̇O2peak > 20% among the prediction equations, the PUTH equation showed the lowest percentage (33.0%), compared to 39.5, 62.9, and 81.4% for the Xiangya, Wasserman, and FRIEND equations, respectively. This trend was consistent across sex, age, and BMI subgroups (see Table 4).

Third, Bland–Altman analyses revealed that, for both males and females, the PUTH equation exhibited smaller deviations than the other equations (see Fig. 3). These findings were corroborated by the scatter plots shown in Fig. 4.

Regarding the disparities in predicting V̇O2peak reference values in the Chinese population using different equations, we observed that the Xiangya equation underestimated these values, while the Wasserman and FRIEND equations overestimated them, with the greatest disparities found in the Wasserman and FRIEND equations (see Table 3). The primary reason for the disparities in the Xiangya equation may be its smaller sample size (964), leading to limited representativeness.

Several factors may contribute to our V̇O2peak reference values being significantly lower than those of other countries. Firstly, this difference may be associated with the overall body type, weight, and height of the study population. Whether compared to the participants of the Wasserman and FRIEND equations in this study or to those in relevant Western studies, the Chinese population generally exhibits lower BMI, body weight, and height14,15,16,37. Secondly, differences in predicted values may also be attributed to ethnicity. Asian populations demonstrate distinct differences in skeletal muscle physiology, lung function, and chest wall anatomical structure compared to Western populations, potentially resulting in further differences in exercise capacity38,39. Additionally, variations in heart structure and function may also contribute to disparities in V̇O2peak reference values. Lastly, differences in methodology of testing15,28, participant selection bias14,40, habitual activity levels40, and obesity levels15,21 may all contribute to variations in predicted V̇O2peak. When the cohort used to develop the prediction equation includes a substantial number of individuals whose characteristics closely resemble those of the target population, the resulting prediction equation is preferable. Based on these results, we propose that the PUTH equation may outperform traditional equations in predicting V̇O2peak reference values within the Chinese population, thereby providing a more accurate standard for assessing CRF.

Strengths and limitations

The primary strengths of this study include its large sample size and the inclusion of specific populations, namely women, older adults, and individuals with obesity. Additionally, this study emphasizes the impact of body composition and BMI, which has not been adequately considered in previous prediction equations for the Chinese population.

This study has several limitations. First, not all participants reached V̇O2peak, as some did not attain the plateau phase of oxygen consumption, potentially leading to an underestimation of V̇O2peak. Second, participants were unevenly distributed across age and BMI groups, with few individuals aged 70 and older and limited representation in the underweight and obese groups. This imbalance may limit the generalizability of the findings, especially to older adults and individuals with extreme BMI values. Finally, because this was a single-center study, selection bias cannot be fully excluded despite nationwide recruitment, which may affect the external validity of the findings.

For future research, a multi-center prospective design is recommended to achieve a more balanced age and BMI distribution. We will also explore indicators like ventilation and oxygen uptake efficiency slope to better understand cardiopulmonary function and overall health.

Conclusion

This study introduces the PUTH equation, a new V̇O2peak prediction equation for Chinese adults. The equation shows reduced bias and superior accuracy compared to existing equations and highlights the role of body composition (such as LBM and BSA) in predicting CRF, suggesting potential for broad clinical application. However, further validation through large-scale, multi-center studies is required to confirm its broader applicability.