Background

Chronic obstructive pulmonary disease (COPD) remains one of the most significant causes of death and disability worldwide1,2. While the utility of primary prevention through smoking cessation and avoidance of inhaled particulates and toxins is well established, significant gaps in prevention of COPD remain3. Specifically, while both environmental and genetic risk factors for COPD have been identified, these insights have not yet resulted in individualized prevention4,5,6,7.

Recent work has advanced this effort by assessing COPD risk with polygenic scores (PS) rather than individual genetic variants. Specifically, a PS developed by Moll et al. predicted COPD risk earlier than the clinical risk factors of age and cigarette smoking, with stronger effects in patients of European ancestry (EUR)8. This PS used data from a genome-wide association study of lung function from the UK Biobank and SpiroMeta, with the final PS including 1.7 million variants associated with forced expiratory volume in one second (FEV1) and 1.2 million variants associated with FEV1/forced vital capacity (FVC). The same PS was found to be associated with reduced lung growth patterns in children with asthma and lower lung function attainment in former premature infants8,9,10. The ancestry-specific effects may be due to EUR ancestry being the predominant ancestry in both the UK Biobank and SpiroMeta datasets, which limits the generalizability of PS performance in non-European populations.

These promising findings raise the question of whether this PS constitutes a population-wide measure of genetic lung function potential or instead captures genetic risk in specific subpopulations. Addressing this question is essential to translating previous findings into COPD prevention. If this PS captures population-wide genetic lung function potential, the effect on COPD risk from the interaction between PS and harmful exposures may be additive. Conversely, if the PS captures population-specific risk, some subpopulations may experience exponentially higher COPD risk that is potentially amenable to intervention. In this study, we address this question by examining the effect of this PS on lung function in a large cohort of pediatric patients with asthma, specifically exploring sex- and ancestry-specific effects.

Methods

Population

Subjects were drawn from The Children’s Hospital of Philadelphia (CHOP) biorepository at the Center for Applied Genomics (CAG). All subjects provided informed consent, and the study protocol was approved by the Children’s Hospital of Philadelphia institutional review board (IRB protocol 16-013,278)11. A previously validated reproducible phenotype was used to define subjects with asthma12. This phenotype defines asthma cases as patients above the age of four with at least two asthma-related ICD-10 codes in at least two different and independent clinical encounters, and at least one asthma-related prescription. Patients with other significant pulmonary diseases, including cystic fibrosis and bronchopulmonary dysplasia, were excluded.
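The case definition above can be read as a simple rule over a patient's record. The following is a minimal Python sketch of that rule, not the validated phenotype algorithm itself. The `PatientRecord` class and its field names are hypothetical and only illustrate the logic described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    age_years: float
    # One entry per clinical encounter that carried an asthma-related ICD-10 code.
    asthma_encounter_ids: list = field(default_factory=list)
    asthma_prescriptions: int = 0
    has_excluded_disease: bool = False  # e.g. cystic fibrosis, bronchopulmonary dysplasia


def meets_asthma_phenotype(p: PatientRecord) -> bool:
    """Apply the case definition as described in the text (illustrative only)."""
    if p.has_excluded_disease:
        return False
    if p.age_years <= 4:
        return False
    # Codes must come from at least two distinct encounters, which also
    # guarantees at least two codes in total.
    distinct_encounters = len(set(p.asthma_encounter_ids))
    return distinct_encounters >= 2 and p.asthma_prescriptions >= 1
```

Counting distinct encounter identifiers, rather than raw codes, is what enforces the "different and independent encounters" requirement in this sketch.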

Spirometry data

For patients with asthma, all spirometry results obtained between ages six and 21 years from 01/01/2000 to 12/31/2022 were collected. These tests were completed as part of routine clinical care. Data obtained included FEV1 in liters, FEV1/FVC, sex at birth, height at time of spirometry, and age in years. Only tests with both FEV1 and FVC values available were included in the study. The race-neutral Global Lung Function Initiative calculator was used to calculate FEV1 and FEV1/FVC z-scores13. A COPD phenotype was defined as having an FEV1/FVC ratio below 0.7. During quality control, spirometry measurements recorded as occurring on a weekend or associated with physiologically improbable z-scores (z-score < −5 or > 5) were removed. Similarly, patients unlikely to have reached adult height (recorded height < 140 cm) were excluded if their height was not updated, defined as three or more spirometry occurrences with the same recorded height.
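The quality-control rules above can be sketched as two small checks, one per test and one per patient. This is an illustrative Python reading of the text, not the study's actual pipeline. The function names are hypothetical, and the height rule reflects one interpretation (the repeated height itself is below 140 cm).

```python
from collections import Counter
from datetime import date


def keep_test(test_date: date, fev1_z: float, ratio_z: float) -> bool:
    """Return True if a single spirometry test passes the per-test filters."""
    if test_date.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    # Drop physiologically improbable z-scores (below -5 or above 5).
    return all(-5 <= z <= 5 for z in (fev1_z, ratio_z))


def height_not_updated(heights_cm: list) -> bool:
    """Flag a patient with three or more tests sharing one recorded height below 140 cm."""
    counts = Counter(heights_cm)
    return any(n >= 3 and h < 140 for h, n in counts.items())
```

A patient flagged by `height_not_updated` would be excluded entirely, whereas `keep_test` removes individual measurements only.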

Genetic data

Genotype data were generated on four major genotyping array families from Illumina (HumanHapMap550/610Q, OMNI2.5M, OmniExpress, and the GSA array). Array versions within families were merged on common SNPs and filtered for genotype missingness (geno 0.1), individual missingness (mind 0.02), and minor allele frequency (MAF ≥ 0.01), in that order, using PLINK v1.914. Data were imputed using the TOPMed v2 reference panel on the TOPMed Imputation Server15,16,17. Each imputed file set was filtered for imputation quality on a combination of R-squared (R2) and MAF (SNPs with MAF ≥ 0.05 were kept if R2 ≥ 0.3; SNPs with MAF < 0.05 were kept if R2 ≥ 0.5). File sets were merged, and variants present in 95% of samples were retained. Ancestry was assigned based on the results of principal component analysis (PCA). PCA was performed using flashpca on approximately 2.4 million imputed SNPs with MAF > 0.05 that had been pruned for linkage disequilibrium (LD) using PLINK v1.914,18. The first three principal components were plotted, and ancestry designation was performed by comparison to the reference genotypes from the HapMap consortium19. After splitting of ancestries, ancestry-specific PCAs were performed using SNPs heavily pruned for LD and filtered for MAF ≥ 0.05. The African ancestry-specific PCA contained approximately 180,000 variants and the European ancestry-specific PCA included approximately 130,000 variants.
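The MAF-dependent imputation quality filter described above amounts to a two-branch threshold. The short Python sketch below illustrates that rule only. The function name is hypothetical, and the actual filtering in the study was applied to imputed file sets with dedicated tooling.

```python
def passes_imputation_qc(maf: float, r2: float) -> bool:
    """Apply the MAF-dependent R-squared thresholds described in the text."""
    if maf >= 0.05:
        # Common variants tolerate a lower imputation R2.
        return r2 >= 0.3
    # Rarer variants require a stricter R2.
    return r2 >= 0.5


# Example: filter a small list of hypothetical (variant_id, maf, r2) tuples.
variants = [("v1", 0.10, 0.35), ("v2", 0.01, 0.35), ("v3", 0.01, 0.60)]
kept = [vid for vid, maf, r2 in variants if passes_imputation_qc(maf, r2)]
```

The stricter threshold for rarer variants reflects that imputation tends to be less reliable at low allele frequencies.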

Polygenic score

A PS was calculated separately for FEV1 and FEV1/FVC, and the final composite PS was calculated as (0.43847 × PS FEV1) + (0.58833 × PS FEV1/FVC), as described by Moll et al.8 All samples used for PS calculation had the PS SNPs successfully genotyped or imputed. Allelic scoring was done using PLINK v1.914,18.
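As a worked illustration, the composite score is a fixed weighted sum of the two trait-specific scores. The Python sketch below uses the weights quoted in the text. The function and constant names are hypothetical, and the inputs are assumed to be per-individual scores already produced by allelic scoring.

```python
# Weights quoted in the text, as described by Moll et al.
W_FEV1 = 0.43847
W_RATIO = 0.58833


def composite_ps(ps_fev1: float, ps_ratio: float) -> float:
    """Combine the FEV1 and FEV1/FVC polygenic scores into one composite PS."""
    return W_FEV1 * ps_fev1 + W_RATIO * ps_ratio
```

Because the weights are constants, the composite preserves the ordering implied by each component score when the other is held fixed.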

Statistical analysis

To account for ancestry-specific differences in PS distribution, we standardized the PS within each ancestry group by computing ancestry-specific z-scores. This was done by calculating the mean and standard deviation of the PS within each ancestry group and then converting individual scores to z-scores. This approach allowed for within-ancestry comparisons while preserving the original PS structure. To optimize power, we analyzed the PS as a categorical variable, using an ancestry-specific PS z-score cut-off of 1 to divide the population into patients with a ‘high’ and a ‘low’ PS. T-tests were used to assess whether spirometry results differed between these two groups. If significant (P < 0.05), mixed-effects linear regression models were used to assess whether the effect of PS remained significant after accounting for clustering at the subject level and including ancestry as a covariate. For spirometry measures associated with PS at a population level, we further examined subpopulation effects with subpopulation-specific regression models. Of note, in this study, ‘subpopulation’ refers to any distinct group within the total study population, including stratifications by sex, genetic ancestry, or PS group.
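The within-ancestry standardization and grouping can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' analysis code. It assumes the sample standard deviation is used and that a z-score strictly above 1 defines the ‘high’ group, neither of which is specified in the text, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def assign_ps_groups(records, cutoff=1.0):
    """Standardize PS within ancestry and label each patient 'high' or 'low'.

    records: list of (patient_id, ancestry, composite_ps) tuples.
    Returns {patient_id: (z_score, group)}.
    """
    by_ancestry = defaultdict(list)
    for _, ancestry, ps in records:
        by_ancestry[ancestry].append(ps)

    # Mean and sample SD are computed separately within each ancestry group
    # (each group needs at least two patients with differing scores).
    stats = {a: (mean(v), stdev(v)) for a, v in by_ancestry.items()}

    groups = {}
    for pid, ancestry, ps in records:
        mu, sd = stats[ancestry]
        z = (ps - mu) / sd
        groups[pid] = (z, "high" if z > cutoff else "low")
    return groups
```

With this approach, a raw score that is large relative to one ancestry group can still fall in the ‘low’ category if it is typical for the patient's own group, which is the intended effect of standardizing within ancestry.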

Finally, in the most sensitive subpopulation we used a logistic regression model to predict if patients met criteria for COPD on at least one spirometry test. This model included age at last recorded spirometry in years, ancestry, and PS group.

The vif function from the car package was used to test for multicollinearity throughout these models, and the variance inflation factor for all variables was below 1.5 in all models20.
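For intuition, the variance inflation factor of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the other predictors. The Python sketch below covers only the simplified two-predictor case, where R² equals the squared Pearson correlation. It is not a reimplementation of the car package's vif function, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for either predictor in a model with exactly two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)
```

Uncorrelated predictors give a VIF of 1, and the value grows as the predictors become more strongly correlated, so values below 1.5 indicate little shared variance among covariates.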

Results

Population

There were 6,336 patients with asthma, including 3,441 with recorded male sex and 2,895 with female sex, who had spirometry and PS data available (Table 1). At a genetic ancestry subpopulation level, this population included 3,713 patients with African (AFR) ancestry, 1,536 with EUR ancestry, 107 with South Asian (SAS) ancestry, 81 with East Asian (EAS) ancestry, 23 with Admixed American (AMR) ancestry, and 876 with ‘Other’ ancestry (Table 2)21. For these patients, a total of 25,477 spirometry tests met inclusion criteria. Mean age at time of spirometry was 12.33 ± 3.65 years. Using the GLI Global race-neutral reference equations, the mean FEV1 z-score was −0.51 ± 1.60 and the mean FEV1/FVC z-score was −0.68 ± 1.25.

Table 1 Overview of population characteristics.
Table 2 Spirometry characteristics of the study population, stratified by ancestry and sex.

Population wide effects

Population wide, patients in the high PS group had significantly lower FEV1 and FEV1/FVC z-scores than those in the low PS group (Fig. 1). The relationship between FEV1/FVC z-scores and PS remained significant in a linear regression model including PS as a categorical variable and adjusting for sex, age, and race as recorded during spirometry as confounders (P = 0.0018, beta = −0.12, 95% confidence interval [CI] −0.19 to −0.044).

Fig. 1

Population wide relationship between polygenic score (PS) and spirometry results. Boxplots of median and interquartile range (IQR) FEV1 (A) or FEV1/FVC (B) z-score by PS group (defined by PS z-score above or below 1). Whiskers extend to 1.5 times the IQR, and datapoints outside this range are shown as points. The P-value reported here was calculated using a two-sided t-test. While a statistically significant difference was observed in both measures, the magnitude of the difference in FEV1 z-score was small and unlikely to be clinically meaningful. Please refer to the text for P-values from the regression model adjusting for sex, age, and self-reported race as confounders.

Effects in sub-populations

When stratified by sex, no differences by age were detected (Supplemental Fig. 1). Conversely, a high as opposed to a low PS was associated with lower FEV1 and FEV1/FVC z-scores in patients with male sex and with a lower FEV1/FVC z-score in patients with female sex (Fig. 2). When using a mixed-effects linear regression model including age and self-reported race as covariates, only the relationship between PS and FEV1/FVC z-score in male patients remained significant (P = 0.00824, beta = −0.14, 95% CI −0.24 to −0.036). To examine ancestry-specific effects, we explored these same associations in our two largest ancestry groups (‘AFR’ and ‘EUR’). In ancestry-specific linear regression models including age and self-reported race as covariates, there was a significant relationship between a high PS and lower FEV1 and FEV1/FVC z-scores in males of EUR ancestry (P = 0.01, beta = −0.34, 95% CI −0.59 to −0.079 and P = 0.00085, beta = −0.35, 95% CI −0.54 to −0.14, respectively; Fig. 3). No significant relationships were found in females with EUR ancestry or patients with AFR ancestry (Supplemental Fig. 2). The lack of significant associations in patients with AFR ancestry is unsurprising given that the cohorts used to create the PS contained mostly subjects of EUR ancestry.

Fig. 2

Sex-specific relationship between polygenic score (PS) and spirometry results. Boxplots of median and interquartile range (IQR) FEV1 (A) or FEV1/FVC (B) z-score by PS group (defined by PS z-score above or below 1). Whiskers extend to 1.5 times the IQR, and datapoints outside this range are shown as points. The P-value reported here was calculated using a two-sided t-test. Please refer to the text for P-values from the regression model adjusting for age and self-reported race as confounders.

Fig. 3

Relationship between age and spirometry results by polygenic score (PS) and sex for patients with European ancestry. Scatterplots showing the relationship between age and FEV1 z-score (A) and FEV1/FVC ratio (B), with points colored by the interaction of sex and PS status. Trendlines indicate linear relationships between age and results within each group, with confidence intervals around the trendlines included. Colors represent different combinations of sex and PS status: black for males with High PS, orange for males with Low PS, coral for females with High PS, and blue for females with Low PS.

Finally, we explored if having a high PS increased the odds of having a FEV1/FVC ratio below 0.7. To this end, we created population wide, male only, and male of EUR ancestry logistic regression models (Supplemental Table 2). While significant effects were found when using population wide and male only data (P = 0.024, odds ratio [OR] = 1.26, 95% CI 1.03 to 1.54 and P = 0.017, OR = 1.37, 95% CI 1.05 to 1.77 respectively), the strongest effect was identified in males of EUR ancestry (P = 0.007, OR = 2.01, 95% CI 1.19 to 3.32).
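Odds ratios such as those reported above are commonly obtained by exponentiating a logistic regression coefficient, with a Wald confidence interval built on the log-odds scale. The Python sketch below shows that conversion. The coefficient and standard error in the example are hypothetical values chosen for illustration, not estimates from this study, and the source does not state which interval method was used.

```python
from math import exp


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Convert a log-odds coefficient and its standard error to an OR and Wald CI."""
    odds_ratio = exp(beta)
    lower = exp(beta - z * se)
    upper = exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical inputs for illustration only.
or_, lo, hi = odds_ratio_ci(beta=0.7, se=0.25)
```

An interval that excludes 1 on the odds ratio scale corresponds to a coefficient whose Wald interval excludes 0 on the log-odds scale.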

Discussion

Our data are the first to demonstrate that a previously validated COPD PS appears to have sex-specific effects in pediatric patients with asthma. We also redemonstrated that this PS has stronger effects in patients of EUR ancestry, likely due to their overrepresentation in the UK Biobank and SpiroMeta consortium cohorts. Additionally, it is important to note that our study population, which consists predominantly of children with AFR ancestry, differs in ancestry from the EUR-dominated cohorts used in the original GWAS of lung function in COPD patients. This ancestry mismatch may contribute to the observed differences in PS performance and underscores the need for developing ancestry-specific polygenic scores to improve accuracy and equity in genetic risk prediction. We observed that males with a high PS and European ancestry had significantly lower FEV1 and FEV1/FVC than patients with a low PS. For patients in this group, having a high PS increased the odds of having ‘COPD’ as defined by a FEV1/FVC < 0.7.

Sex differences in disease genetics have been well described and understanding these differences is a prerequisite to operationalizing a PS as a biomarker22,23. In our study, the association between COPD PS and spirometry values was specific to males. This is consistent with the previous association between this PS and the male predominant (71% of patients) ‘reduced lung growth’ category in the CAMP pediatric asthma cohort8,10.

Specifically in COPD, sex has been associated with disease severity and phenotype, with women with COPD being generally younger and having better lung function than men, but experiencing more frequent dyspnea and exacerbations24. Some of these differences may be associated with differential gene regulatory patterns, especially of genes involved in the extracellular matrix25. In this context, it is worth noting that sex hormones have been previously linked with pulmonary inflammation and COPD26,27. However, a recent meta-analysis and Mendelian randomization study appears to support an indirect, rather than direct, relationship between sex hormones and COPD28.

When genetic effects manifest is always an intriguing question. In Fig. 3, we explored the relationship between PS and spirometry stratified by age to assess whether children with a high PS start off with lower lung function early in life or fail to keep up with expected lung function growth over time. Our findings suggest the latter, but prospective studies will be needed to confirm this more definitively.

Our study has some limitations. Specifically, retrospective data collected during routine clinical care may result in selection bias. While it appears unlikely that associations between genetic risk and spirometry values were driven by this, future studies validating our results in longitudinal cohorts are required. Furthermore, our center is a quaternary referral center located in an urban center in the North-Eastern United States. Replication studies in lower acuity settings and geographically diverse populations are needed to ensure generalizability of our findings. Additionally, our study relies on an EMR-derived asthma phenotype, which is validated as a binary ‘Yes/No’ variable and does not capture potential changes in asthma status over time. This limits our ability to assess the impact of temporal variations in asthma diagnosis, symptoms, or treatment on lung function trajectories.

Conclusion

Our study demonstrates that a specific subpopulation of pediatric patients with asthma, identified by ancestry, sex, and PS, appeared to have significantly lower FEV1 and FEV1/FVC. Further studies are needed to assess if and how this early-life sensitivity relates to development of COPD and COPD mortality later in life. Additionally, future studies should incorporate a more dynamic assessment of asthma status over time, utilizing repeated clinical evaluations or EMR-based longitudinal tracking. Furthermore, exploring sex-specific trajectories in lung function with advanced longitudinal modeling approaches may help uncover subtle differences not detectable in our cross-sectional analysis. If follow-up studies support our initial findings, examining how medical and environmental factors interact with lung function in this especially vulnerable subset of patients may open up COPD prevention pathways.