Introduction

Accurate interpretation of pulmonary function tests is critical in the diagnosis and management of various respiratory conditions, including chronic obstructive pulmonary disease (COPD) and restrictive lung diseases. Spirometry, which measures airflow and volume during inhalation and exhalation, is a standard physiological tool for assessing lung function in clinical practice. In 2012, the Global Lung Function Initiative (GLI) developed prediction equations for spirometric indices and their lower limits of normal for worldwide application, with predicted values calculated from an individual’s age, sex, height, and race1. These race-specific 2012-GLI spirometry reference equations have been widely used in clinical care and research over the past decade.

However, there is accumulating evidence of potential biases and health disparities arising from the use of the race-specific 2012-GLI reference equations for spirometry interpretation2,3,4,5. Race adjustment may cause inequities by underdiagnosing or misclassifying respiratory conditions in certain racial groups. In 2022, the GLI subsequently introduced race-neutral reference equations to provide a more equitable approach to interpreting spirometry results across diverse populations6. Given their potential benefits, a growing body of literature has evaluated the clinical implications of switching from the race-specific 2012-GLI to the race-neutral 2022-GLI reference equations in different patient cohorts, although non-Western populations have been underrepresented5,6,7,8,9,10,11. In particular, the ramifications of this shift to a race-neutral approach for interpreting lung function in Northeast Asians, who account for approximately 20% of the worldwide population12,13,14, remain unexplored.

This study aimed to investigate the impact of changing from the race-specific 2012-GLI to the race-neutral 2022-GLI reference equations on spirometry interpretation (including obstruction, possible restriction, and preserved ratio impaired spirometry [PRISm]) and medical disability classification. Additionally, we evaluated the relationships of lung function percent-predicted values derived from both GLI reference equations with individuals’ respiratory symptoms, as well as their longitudinal trajectories. We hypothesized that adopting the race-neutral 2022-GLI spirometry reference equations would affect the frequency of lung function impairment, the direction of medical disability classification, and the magnitude of longitudinal changes in Northeast Asian populations, despite consistent associations with respiratory symptoms.

Methods

Study population

We used data from the Korean Genome Epidemiology Study (KoGES) Ansung-Ansan cohort, the methods of which have been described in detail previously15. This population-based longitudinal study enrolled adults aged 40–69 years between 2001 and 2002 from Ansung and Ansan, a rural and an urban community, respectively, and followed them up biennially until 2018. Ethical approval was obtained from the Ethics Committee of Incheon St. Mary’s Hospital (Institutional Review Board number: OC23ZISI0033). Written informed consent was obtained from all participants.

Clinical variables

During the baseline assessment, demographic data were collected, including age, sex, body mass index (BMI), smoking history, and biomass exposure. A structured questionnaire was also used to evaluate medical history, from which the following data were included in this study: respiratory symptoms (chronic bronchitis and modified Medical Research Council [mMRC] dyspnea score) and comorbidities (hypertension, diabetes mellitus, coronary artery disease [CAD], congestive heart failure [CHF], kidney disease, and cerebrovascular disease). Chronic bronchitis was defined as the presence of cough and sputum for at least three months in each of two consecutive years16. In addition, serial spirometry measurements of forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) were extracted from baseline to the 7th follow-up in 2013–2014.

Lung function measurements

Pulmonary function tests were performed by a trained technician using a standard spirometer (Vmax-2130; Sensor Medics, Yorba Linda, CA, USA) according to the American Thoracic Society and European Respiratory Society guidelines for calibration and quality assurance17. Pre-bronchodilator FEV1, FVC, and FEV1/FVC ratio were obtained. The lower limit of normal (LLN) and percent-predicted values for FEV1 and FVC were calculated with both the race-specific 2012-GLI and the race-neutral 2022-GLI reference equations using the ‘rspiro’ package in R software18.
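
As an illustration only, the GLI equations follow the LMS (lambda-mu-sigma) approach: for a given age, sex, and height they yield a predicted median (M), a coefficient of variation (S), and a skewness parameter (L), from which percent-predicted values, z-scores, and the LLN (the 5th percentile, z = −1.645) follow. The Python sketch below assumes M, S, and L have already been obtained (in this study, via ‘rspiro’); the function names and the numbers in the usage lines are hypothetical placeholders, not GLI coefficients or rspiro output.

```python
import math

# The LLN is conventionally the 5th percentile, i.e. z = -1.645.
Z_LLN = -1.645


def percent_predicted(measured, mu):
    """Measured value as a percentage of the predicted median (M)."""
    return 100.0 * measured / mu


def lms_zscore(measured, mu, sigma, lam):
    """z-score under the LMS (Box-Cox) model; lam == 0 uses the log limit."""
    if lam == 0:
        return math.log(measured / mu) / sigma
    return ((measured / mu) ** lam - 1.0) / (lam * sigma)


def lms_value_at_z(mu, sigma, lam, z=Z_LLN):
    """Inverse of lms_zscore: the value at z-score z (the LLN by default)."""
    if lam == 0:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + lam * sigma * z) ** (1.0 / lam)


# Hypothetical M, S, L for one person's FEV1 (placeholders, not GLI output).
mu, sigma, lam = 3.00, 0.12, 1.0
fev1 = 2.50
print(f"percent-predicted: {percent_predicted(fev1, mu):.1f}%")
print(f"z-score: {lms_zscore(fev1, mu, sigma, lam):.2f}")
print(f"LLN: {lms_value_at_z(mu, sigma, lam):.3f} L")
```

Because percent-predicted depends only on M, whereas the z-score and LLN also depend on S and L, a fixed cut-off such as 80% predicted and the LLN need not coincide.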

Lung function interpretation

Spirometry measurements were categorized into the following lung function physiologic groups: normal (FEV1/FVC, FEV1, and FVC ≥ LLN), obstruction (FEV1/FVC ratio < LLN, irrespective of whether FEV1 and FVC are above or below the LLN), possible restriction excluding PRISm (FEV1/FVC ratio ≥ LLN, with FVC < LLN and FEV1 ≥ LLN), and PRISm (FEV1/FVC ratio ≥ LLN, with FEV1 < LLN). The severity of FEV1 impairment was categorized as Group 1 (FEV1 ≥ 80% predicted), Group 2 (50% ≤ FEV1 < 80% predicted), Group 3 (30% ≤ FEV1 < 50% predicted), and Group 4 (FEV1 < 30% predicted), adopted from the thresholds used in the 2024 Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines to stratify the severity of airflow obstruction in patients with COPD19.
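
The grouping rules above can be written as an ordered sequence of checks. In this minimal Python sketch (function names are hypothetical), the observed values and their LLNs are assumed to be precomputed, obstruction is checked first so that the groups are mutually exclusive, and each GOLD-style band is assumed to include its lower bound.

```python
def classify_physiology(ratio, ratio_lln, fev1, fev1_lln, fvc, fvc_lln):
    """Assign one of the four lung function physiologic groups.

    Checks run in order, so the groups are mutually exclusive:
    obstruction first, then PRISm, then possible restriction.
    """
    if ratio < ratio_lln:
        return "obstruction"           # low ratio, whatever FEV1 and FVC are
    if fev1 < fev1_lln:
        return "PRISm"                 # preserved ratio, low FEV1
    if fvc < fvc_lln:
        return "possible restriction"  # preserved ratio and FEV1, low FVC
    return "normal"                    # all three at or above the LLN


def fev1_severity_group(fev1_pp):
    """GOLD-style FEV1 group (1 = mildest) from FEV1 percent-predicted."""
    if fev1_pp >= 80:
        return 1
    if fev1_pp >= 50:
        return 2
    if fev1_pp >= 30:
        return 3
    return 4


# Hypothetical measurements (litres) and LLNs for one individual.
print(classify_physiology(0.78, 0.70, 2.1, 2.3, 2.8, 2.9))  # -> PRISm
```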

Medical disability classification

We categorized individuals’ disability class according to the spirometric criteria of the 6th edition of the American Medical Association Guides to the Evaluation of Permanent Impairment20. There are five classes of disability: Class 0 for both FEV₁ and FVC ≥ 80% predicted, Class 1 for FVC 70–79% predicted or FEV₁ 65–79% predicted, Class 2 for FVC 60–69% predicted or FEV₁ 55–64% predicted, Class 3 for FVC 50–59% predicted or FEV₁ 45–54% predicted, and Class 4 for FVC < 50% predicted or FEV₁ < 45% predicted.
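
A minimal Python sketch of the disability classes described above, with hypothetical function names. It assumes that the worse (higher) of the FVC-based and FEV₁-based classes is assigned, which is one reading of the "or" in each class definition, and that percent-predicted values are continuous with inclusive lower bounds (so, for example, an FEV₁ of 79.5% predicted falls in Class 1).

```python
# Lower bounds (% predicted) for Classes 0, 1, 2, and 3; anything lower is Class 4.
FVC_LOWER_BOUNDS = (80, 70, 60, 50)
FEV1_LOWER_BOUNDS = (80, 65, 55, 45)


def _class_for(value_pp, lower_bounds):
    """Return the first class whose lower bound the value reaches."""
    for disability_class, bound in enumerate(lower_bounds):
        if value_pp >= bound:
            return disability_class
    return len(lower_bounds)  # below every bound: Class 4


def ama_disability_class(fvc_pp, fev1_pp):
    """Spirometric disability class (0-4); the worse of FVC and FEV1 wins."""
    return max(
        _class_for(fvc_pp, FVC_LOWER_BOUNDS),
        _class_for(fev1_pp, FEV1_LOWER_BOUNDS),
    )


print(ama_disability_class(fvc_pp=82.0, fev1_pp=62.0))  # -> 2
```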

Statistical analysis

Statistical analyses were performed using R software (version 4.3.2; R Development Core Team, Vienna, Austria). Categorical data are presented as counts (percentages), and continuous data as mean ± standard deviation. A P-value < 0.05 was considered statistically significant.

Lung function physiologic, FEV1 impairment, and medical disability classification

The distributions of lung function physiologic, FEV1 impairment, and medical disability groups were determined separately using each GLI reference equation and compared. Agreement in the classification of lung physiologic, FEV1 impairment, and medical disability groups between the two GLI reference equations was assessed using Cohen’s weighted kappa with squared (quadratic) weights21. Cohen’s kappa values were interpreted as follows: ≤ 0, no agreement; 0.01 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, strong; and 0.81 to 1.00, almost perfect agreement.
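
For readers unfamiliar with the weighted statistic, the sketch below computes Cohen’s kappa with squared (quadratic) disagreement weights, w_ij = (i − j)² / (k − 1)², for ordinal categories coded 0 to k − 1, and maps the result onto the interpretation scale above. It is a plain-Python illustration with hypothetical function names, not the code used in the study.

```python
from collections import Counter


def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa with squared disagreement weights for ordinal ratings.

    Ratings must be integers 0..n_categories-1 (e.g. disability Class 0-4).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n, k = len(ratings_a), n_categories
    joint = Counter(zip(ratings_a, ratings_b))  # observed cell counts
    marg_a, marg_b = Counter(ratings_a), Counter(ratings_b)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal
            observed_disagreement += weight * joint[(i, j)] / n
            expected_disagreement += weight * (marg_a[i] / n) * (marg_b[j] / n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


def interpret_kappa(kappa):
    """Map kappa onto the interpretation scale described above."""
    for upper, label in [
        (0.0, "no agreement"),
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "strong"),
    ]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical classes for four individuals under two reference equations.
kappa = quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], n_categories=3)
print(round(kappa, 2))
```

Squared weights penalize a two-class disagreement four times as heavily as a one-class disagreement, which suits ordinal scales such as the disability classes. scikit-learn’s `cohen_kappa_score(y1, y2, weights="quadratic")` should give the same value when all k categories are passed via `labels=list(range(k))`.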

Association of baseline FEV1 and FVC percent-predicted and respiratory symptoms

Multivariable linear regression models were used to evaluate the relationships between baseline FEV1 and FVC percent-predicted values, derived using each GLI reference equation, and the mMRC dyspnea score. Additionally, associations between baseline FEV1 and FVC percent-predicted values and the presence of chronic bronchitis were analyzed using multivariable logistic regression models. Both analyses were adjusted for age, sex, BMI, and smoking status.

Longitudinal lung function trajectories

Changes in FEV1 and FVC percent-predicted values derived using the race-specific 2012-GLI and the race-neutral 2022-GLI reference equations over 12 years were analyzed using linear mixed models, with adjustment for age, sex, BMI, and smoking status. A subgroup analysis by sex was also performed.
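
The study fitted linear mixed models. As a simpler, openly swapped-in illustration of what an annualized change in percent-predicted means, the sketch below uses a two-stage approach instead: a least-squares slope per participant, then the average slope. Unlike a mixed model, it applies no covariate adjustment, weights every participant equally regardless of the number of visits, and drops participants with fewer than two distinct visit times; the data layout, IDs, and names are hypothetical.

```python
from statistics import mean


def ols_slope(years, values):
    """Least-squares slope of values on time (units per year)."""
    t_bar, v_bar = mean(years), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in years)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(years, values))
    return sxy / sxx


def mean_annual_change(visits):
    """Average per-participant slope (two-stage approach, unadjusted).

    `visits` maps a participant ID to (years_since_baseline, percent_predicted)
    sequences. Participants with fewer than two distinct visit times are
    skipped because their slope is undefined.
    """
    slopes = [ols_slope(t, v) for t, v in visits.values() if len(set(t)) >= 2]
    return mean(slopes)


# Hypothetical participants with biennial visits (FEV1 percent-predicted).
visits = {
    "P001": ([0, 2, 4], [100.0, 98.0, 96.0]),
    "P002": ([0, 2], [90.0, 89.0]),
    "P003": ([0], [95.0]),  # single visit: excluded
}
print(f"{mean_annual_change(visits):.2f} percentage points per year")
```

Running the same function on percent-predicted series computed with each reference equation would give two annualized estimates to compare, although without the within-participant correlation structure and covariate adjustment that the mixed models provide.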

Results

Baseline characteristics

A total of 10,030 individuals were enrolled in this study (Table 1). Their mean age was 52.3 ± 8.9 years; 47% were male and 41% were ever-smokers. Their mean mMRC score was 1.7 ± 1.6. At baseline, mean FEV1 was 2.9 ± 0.7 L, mean FVC was 3.6 ± 0.9 L, and mean FEV1/FVC ratio was 0.80 ± 0.08.

Table 1 Baseline characteristics.

Lung function physiologic, FEV1 impairment, and medical disability classification

Agreement in lung physiologic groups between the two GLI reference equations was moderate (Cohen’s kappa, 0.54; Figure S1). A total of 8164 individuals (83.5%) were classified as normal and 749 (7.7%) as having obstruction by both reference equations (Table 2). An additional 289 individuals (3.0%) had obstruction only when the race-specific 2012-GLI reference equations were used. Most individuals (n = 513, 88.3%) classified as having PRISm or possible restriction using the race-specific 2012-GLI reference equations were reclassified as normal when the race-neutral 2022-GLI reference equations were applied.

Table 2 Differential distribution of patients with normal spirometry and obstruction using the race-specific 2012-GLI and race-neutral 2022-GLI reference equations.

The distribution of FEV1 impairment groups using both GLI reference equations is presented in Figure S2. Agreement in FEV1 impairment classification between the two GLI reference equations was almost perfect (Cohen’s kappa, 0.81); 1.8% (n = 181) of individuals were classified into a worse group using the race-specific 2012-GLI reference equations than using the race-neutral 2022-GLI reference equations.

For medical disability, agreement in classification between the two GLI reference equations was almost perfect (Cohen’s kappa, 0.89). Most of the population (93.4%) was classified as normal (Class 0) by both GLI reference equations (Fig. 1). Among individuals with disability due to lung function impairment (Class 1–4) based on the race-specific 2012-GLI reference equations, 40% (n = 258) were reclassified to a less severe disability class using the race-neutral 2022-GLI reference equations. No individuals showed the opposite shift.

Fig. 1

Sankey plot of differences in medical disability classification derived using the race-specific 2012-GLI and race-neutral 2022-GLI reference equations. GLI, Global Lung Function Initiative.

Association of baseline FEV1 and FVC percent-predicted values with respiratory symptoms

Relationships between FEV1 percent-predicted values and respiratory symptoms were consistent for both GLI reference equations (Tables 3 and 4). Lower FEV1 percent-predicted was associated with worse dyspnea on the mMRC scale (p < 0.002 for both) and with the presence of chronic bronchitis (p < 0.001 for both). FVC percent-predicted values from both GLI reference equations were consistently associated with the mMRC scale, but neither was associated with the presence of chronic bronchitis.

Table 3 Association of FEV1 percent-predicted using race-specific 2012-GLI and race-neutral 2022-GLI reference equations and dyspnea.
Table 4 Association of FEV1 percent-predicted using race-specific 2012-GLI and race-neutral 2022-GLI reference equations and chronic bronchitis.

Longitudinal lung function decline rates

The absolute annualized decline in lung function was −39.3 mL/yr for FEV₁ and −34.4 mL/yr for FVC. Annualized declines in FEV1 and FVC percent-predicted were more rapid using the race-neutral 2022-GLI reference equations than using the race-specific 2012-GLI reference equations (FEV1: −1.05%/yr vs. −0.99%/yr, p < 0.001; FVC: −0.74%/yr vs. −0.68%/yr, p < 0.001) (Fig. 2). Similar findings were observed in subgroup analyses by sex.

Fig. 2

Longitudinal trajectories of (a) FEV1 and (b) FVC using the race-specific 2012-GLI and race-neutral 2022-GLI reference equations. GLI, Global Lung Function Initiative; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; yr, year.

Discussion

Using the KoGES cohort of > 10,000 participants, our study is the first to longitudinally evaluate the impact of changing from the race-specific 2012-GLI to the race-neutral 2022-GLI reference equations on spirometry interpretation in individuals of Northeast Asian origin. Agreement between the two GLI reference equations was at least moderate for both lung function physiologic and medical disability classification. Of note, applying the race-neutral 2022-GLI reference equations reclassified some individuals to normal physiology and to non-disabled status. While lower FEV1 and FVC percent-predicted values were independently associated with worse respiratory symptoms (for FVC, dyspnea only) irrespective of the reference equations used, larger annualized declines in both lung function parameters were observed using the race-neutral 2022-GLI reference equations.

The impact of removing race adjustment from spirometry interpretation varies across racial and ethnic groups. Our study addresses a gap in the existing literature by focusing on Northeast Asians. Among Black individuals, adoption of the race-neutral 2022-GLI reference equations consistently increases the prevalence and severity of respiratory impairment, as well as the degree of medical disability5,7,9,11,22,23. In contrast, changes in the opposite direction were observed in Northeast Asian individuals, similar to those reported in individuals of White and Hispanic backgrounds7,9,23: spirometry interpretation changed from abnormal to normal in approximately one of every 13 assessments. Several mechanisms may underlie this opposite direction. First, when all racial groups are pooled in the race-neutral 2022-GLI equations, the predicted and LLN values gravitate toward the overall population mean. This lowers the thresholds for groups, such as Northeast Asians, whose lung volumes are relatively large for their standing height, while raising them for groups with smaller volumes, such as Black individuals24,25. Second, because the reference equations rely on standing height alone, they overlook inter-ethnic differences in body proportions. Northeast Asians typically have a higher sitting-height-to-stature ratio, which corresponds to greater thoracic volume for the same stature, and this characteristic further lowers their predicted values when all racial groups are pooled24,26.

On a population scale, this would affect hundreds of millions of people. At the individual level, it has substantial implications for disease diagnosis, treatment decisions, clinical trial eligibility, and access to disability compensation and health insurance coverage. If the prevalence of airflow obstruction is revised downward, airway diseases such as COPD may be underdiagnosed, resulting in missed opportunities for timely interventions, including smoking cessation, pulmonary rehabilitation, and vaccination.

Nevertheless, FEV1 and FVC percent-predicted values derived using the race-specific 2012-GLI and race-neutral 2022-GLI reference equations had comparable inverse associations with respiratory symptoms. The lack of association between FVC percent-predicted and chronic bronchitis may be attributable to the small number of individuals with advanced lung disease. Access to spirometry testing remains challenging, particularly in regional or rural areas and in low- and middle-income countries27,28,29. Targeted case finding in symptomatic individuals, followed by spirometry testing for the diagnosis of airway disease and specialist-delivered interventions, reduces subsequent healthcare utilization30. Given that the relationship between spirometry parameters and dyspnea appears unaffected by the reference equations used, symptom assessment remains an important and relevant clinical indicator of possible underlying respiratory disorders19,30,31.

While z-scores are recommended for interpreting lung function to remove bias attributable to age, height, and sex32, percent-predicted values continue to be widely used in clinical practice. The longitudinal changes in FEV1 and FVC percent-predicted values based on the race-specific 2012-GLI and race-neutral 2022-GLI reference equations were significantly different, although the absolute differences were small. Although percent-predicted values from both GLI equations are derived from the same absolute decline in lung function, switching reference equations can affect the evaluation of serial FEV1 and FVC percent-predicted values and may overestimate the apparent change33. Given the small magnitude of the differences, however, their clinical significance may be debatable and context-dependent, with greater relevance for individuals whose spirometry measurements are borderline normal or abnormal. During the transition between reference equations, heightened awareness is essential to avoid erroneous interpretation and its downstream consequences.
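
A worked arithmetic sketch of this point, using hypothetical numbers: percent-predicted is 100 × observed / predicted, so the same observed FEV1 decline (here 39.3 mL over one year, matching the cohort’s annualized estimate) translates into a different percent-predicted change depending on the level of the predicted values, and switching equations between visits adds an artefactual step. The predicted values are placeholders chosen for illustration, not GLI outputs, and no assumption is made about which GLI equation yields the lower prediction for a given individual.

```python
def change_in_percent_predicted(obs_start, obs_end, pred_start, pred_end):
    """Change in percent-predicted between two visits (percentage points)."""
    return 100.0 * obs_end / pred_end - 100.0 * obs_start / pred_start


# Hypothetical participant whose FEV1 falls by 39.3 mL over one year (litres).
obs_start = 2.900
obs_end = obs_start - 0.0393

# Two hypothetical reference sets; both predictions fall by 1% over the year
# and differ only in level.
higher_pred = (3.00, 3.00 * 0.99)
lower_pred = (2.80, 2.80 * 0.99)

same_ref_higher = change_in_percent_predicted(obs_start, obs_end, *higher_pred)
same_ref_lower = change_in_percent_predicted(obs_start, obs_end, *lower_pred)
# Baseline interpreted with one reference, follow-up with the other.
switched_ref = change_in_percent_predicted(
    obs_start, obs_end, higher_pred[0], lower_pred[1]
)

for label, value in [
    ("higher predicted, same reference", same_ref_higher),
    ("lower predicted, same reference", same_ref_lower),
    ("reference switched between visits", switched_ref),
]:
    print(f"{label}: {value:+.2f} percentage points")
```

With these placeholders, the lower predicted values give a slightly steeper percent-predicted decline for an identical absolute decline, and mixing references across visits produces a spurious apparent improvement far larger in magnitude than the true yearly change.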

Our study has a few limitations. Because this was a population-based cohort, the proportions of individuals with lung function impairment were low. Still, the large sample size allowed us to add to the understanding of the implications of applying the race-neutral 2022-GLI reference equations in Northeast Asians, which is lacking in the literature. As in other longitudinal cohort studies, there were dropouts during follow-up in the KoGES cohort, although the retention rate was above 62%15. Airway obstruction was evaluated using pre-bronchodilator values because post-bronchodilator measurements were not obtained in this cohort; however, this approach has been used in previous population studies3,34. As the last spirometry follow-ups were performed in 2013–2014, no more recent data were available for evaluation. As survival data were not available, the relationship between survival and spirometry interpretations using the different GLI reference equations could not be evaluated.

Conclusions

Application of the race-neutral 2022-GLI spirometry reference equations has substantial clinical implications for an estimated 27 to 40 million Northeast Asians, reducing the diagnosis of lung function impairment and medical disability, despite adequate agreement with the race-specific 2012-GLI spirometry reference equations in the majority of individuals. Nevertheless, respiratory symptoms, particularly dyspnea, remain a useful surrogate marker for impaired spirometry measurements. Our results highlight the importance of evaluating the consequences of switching reference equations for lung function interpretation in populations of different racial and ethnic backgrounds, which is critical to inform clinical adoption and patient care.