Introduction

An estimated 11% of babies worldwide are born prematurely,1 defined by WHO as delivery before 37 weeks of gestation2. Yet, as survival rates for premature infants improve and the limits of viability are pushed earlier, gains in long-term outcomes have lagged3. Survivors remain disproportionately affected by neurodevelopmental differences when compared to their term counterparts4.

Predicting an individual patient’s risk for a poor outcome is challenging, with experienced neonatologists predicting significantly different patient outcomes based on the same standardized patient scenario5. Many studies have classified preterm neonates at highest risk for poor outcomes based on clinical and epidemiological predictors6,7,8,9. While several well-established prognostic models exist, none integrate genetic predictors,9 representing an untapped opportunity to improve individualized risk assessment.

Autism spectrum disorder (ASD) is a neurodevelopmental disorder linked to prematurity. In a national Swedish cohort of over 4 million people, 6.1% born extremely preterm (22–27 weeks) vs. 1.4% born at term (39–41 weeks) were diagnosed with autism, with each additional week of gestation associated with a ~5% lower prevalence of ASD10. Similarly, in a US-based cohort of extremely premature infants (ELGAN), autism prevalence was 4 times higher than in the general population at 10 years of age11. Prior research demonstrates that some significant early-life clinical events, such as intraventricular hemorrhage,12 postnatal steroid exposure,13 and intrauterine growth restriction,14 are associated with developing autism in premature infants.

Separately, there is also a strong genetic component to the development of ASD. Based on a meta-analysis of twin studies, the heritability of autism is estimated to be 64–91%15. While genome-wide association studies (GWAS) have identified numerous loci contributing to ASD susceptibility,16 the cumulative contribution of common variants, quantified as polygenic risk scores (PRS), has emerged as a tool for assessing individual-level genetic liability to ASD. Higher autism PRS is associated with ASD traits in cohort studies17. While both higher PRS and preterm birth are independent risk factors for ASD, the extent to which preterm birth interacts with or modifies underlying genetic liability remains unknown.

A recent study by Zhang et al. analyzed large cohorts of children with ASD and reported that children with ASD who were born preterm exhibited more severe phenotypic profiles despite showing comparable levels of ASD polygenic risk as term-born individuals with ASD18. These findings suggest that preterm birth may act as an independent risk modifier or influence phenotypic expression without significantly altering polygenic burden. In a separate study, autism risk from a history of mental disorders in the immediate family was not explained by individual PRS, indicating that family history and PRS are best viewed as complementary measures of family-based ASD risk19.

To further explore the relationship between genetic liability and gestational age at birth, we conducted an independent analysis of ASD polygenic risk scores across four groups stratified by both ASD diagnosis and gestational age: preterm with ASD, preterm without ASD, term with ASD, and term without ASD. We hypothesized that polygenic liability would differ by diagnosis status, with the highest PRS average observed in our cohort of term children diagnosed with autism, replicating prior findings.

Methods

Subjects and genotyping

Study participants were recruited through the Center for Applied Genomics (CAG) at Children’s Hospital of Philadelphia (CHOP), using the CHOP Health Care Network. All protocols were approved by the CHOP Institutional Review Board, and written informed consent was obtained from participants or their legal guardians by trained medical personnel under physician supervision. Inclusion criteria for control subjects required the absence of any major medical conditions, with exclusion of individuals with a personal history or current diagnosis of cancer. Only participants with genetic similarity to a European referent group (as determined by the first two components of principal components analysis) were included to match the patient population in which the existing autism PRS was originally validated. Birth history, including gestational age at delivery, was documented through clinical records, allowing classification into preterm (gestational age < 37 weeks) and term (≥ 37 weeks) subgroups. ICD-10 codes (defined as any code containing F84) in the medical chart were used to identify individuals with an ASD diagnosis. We included a total of 588 preterm (70 with ASD and 518 without ASD) and 1,799 term (461 with ASD and 1,388 without ASD) children in our study (Table 1).
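The grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the function names and the example participants are hypothetical, and only the 37-week cutoff and the "any code containing F84" rule come from the text.

```python
from typing import Iterable

PRETERM_CUTOFF_WEEKS = 37  # preterm: gestational age < 37 weeks (from the text)


def has_asd_code(icd10_codes: Iterable[str]) -> bool:
    """True if any ICD-10 code in the chart contains 'F84'."""
    return any("F84" in code.upper() for code in icd10_codes)


def assign_group(gestational_age_weeks: float, icd10_codes: Iterable[str]) -> str:
    """Return one of the four study groups for a participant."""
    birth = "preterm" if gestational_age_weeks < PRETERM_CUTOFF_WEEKS else "term"
    status = "with ASD" if has_asd_code(icd10_codes) else "without ASD"
    return f"{birth} {status}"


# Hypothetical participants, for illustration only.
print(assign_group(29.0, ["F84.0", "P07.30"]))  # preterm with ASD
print(assign_group(39.5, ["J45.909"]))          # term without ASD
```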

Table 1 Cohort sample size by group accompanied by median PRS score and IQR.

Genomic DNA from enrolled individuals was genotyped using high-density SNP arrays, either the Illumina HumanHap550 or HumanHap610 platforms. Rigorous quality control procedures were implemented: samples with genotype call rates below 95% were excluded, and SNPs were removed if they exhibited a minor allele frequency (MAF) less than 1%, call rate below 98%, or deviated significantly from Hardy–Weinberg equilibrium (P < 1 × 10⁻⁶). Genotype imputation was performed using the TOPMed Imputation Server20 and the minimac4 algorithm, referencing a comprehensive panel derived from over 100,000 whole-genome sequences. Post-imputation, only variants with MAF > 1% and imputation quality score (Rsq) > 0.5 were retained to ensure robust downstream analyses.
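The pre-imputation thresholds above can be illustrated with a short Python sketch. It assumes genotypes coded as allele counts (0, 1, 2) with None for a missing call. It uses a 1-degree-of-freedom chi-square test for Hardy–Weinberg equilibrium as a stand-in, since the text does not state which test was applied. All function names are hypothetical.

```python
import math


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic SNP: the test is undefined, MAF filter handles it
        return 1.0
    stat = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square with 1 df


def snp_passes_qc(genotypes) -> bool:
    """Apply the SNP-level thresholds: call rate >= 98%, MAF >= 1%, HWE P >= 1e-6."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1.0 - freq)
    counts = (called.count(0), called.count(1), called.count(2))
    return call_rate >= 0.98 and maf >= 0.01 and hwe_p_value(*counts) >= 1e-6


def sample_passes_qc(genotypes) -> bool:
    """Apply the sample-level threshold: genotype call rate >= 95%."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes) >= 0.95
```

An excess of heterozygotes, for example a SNP where every sample is called as heterozygous, fails the Hardy–Weinberg filter in this sketch, which is the kind of genotyping artifact the filter is meant to catch.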

Polygenic risk score analysis

To quantify individual-level common variant burden for ASD, we computed PRS using the PRS-CS (Polygenic Risk Score-Continuous Shrinkage) method21. PRS-CS is a Bayesian regression framework that infers posterior SNP effect sizes under continuous shrinkage priors, incorporating linkage disequilibrium (LD) patterns from a reference panel to improve the accuracy of effect size estimation.

ASD GWAS summary statistics used for PRS calculation were derived from the Psychiatric Genomics Consortium (PGC) ASD meta-analysis (2017 release),16 comprising 46,351 individuals. PRS-CS was run with default parameters using the European LD reference panel from the 1000 Genomes Project (Phase 3). The resulting posterior SNP effect sizes were used to compute individual PRS via the --score function in PLINK v1.9.
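Conceptually, the scoring step is a weighted sum: for individual i, PRS_i = Σ_j β_j · x_ij, where β_j is the posterior effect size for SNP j and x_ij is the number of effect alleles carried. The Python sketch below shows that arithmetic only. It does not reproduce PLINK's output conventions or its handling of missing genotypes, and the SNP identifiers and weights are made up for illustration.

```python
def polygenic_score(dosages: dict, weights: dict) -> float:
    """Sum of effect-allele dosage times effect size over the scored SNPs."""
    total = 0.0
    for snp_id, beta in weights.items():
        dosage = dosages.get(snp_id)
        if dosage is None:  # SNP not available for this individual: skipped in this sketch
            continue
        total += beta * dosage
    return total


# Hypothetical posterior effect sizes and one individual's effect-allele dosages.
weights = {"snp_a": 0.020, "snp_b": -0.010, "snp_c": 0.005}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(round(polygenic_score(dosages, weights), 3))  # 0.03
```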

To account for population structure, we performed principal component analysis (PCA) on genotyped variants using PLINK. The top 10 genetic-similarity principal components and sex assigned at birth were included as covariates in regression models.

Statistical analysis

PRS values were converted into z-scores prior to visualization and statistical testing. We visualized the PRS distribution across four stratified groups (preterm children with ASD, preterm children without ASD, term children with ASD, and term children without ASD) using a violin plot, which displays the median, interquartile range, and kernel density of PRS within each group. We then fitted logistic regression models to evaluate differences in common variant burden across groups, including the top 10 ancestry principal components and sex assigned at birth as covariates. Our threshold for statistical significance was p < 0.05.
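To make the two steps concrete, the sketch below standardizes a score and fits a logistic regression by Newton-Raphson, then reports the odds ratio per standard deviation. It is a simplified illustration under stated assumptions: it has a single predictor and an intercept, so the principal components and sex covariates used in the study are omitted, the data are hypothetical toy values, and the function names are not taken from the study's code.

```python
import math
import statistics


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0  # gradient and information-matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: solve the 2x2 system in closed form
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical raw scores and case status (1 = ASD, 0 = no ASD).
raw_prs = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
status = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(z_scores(raw_prs), status)
print("odds ratio per 1 SD of PRS:", math.exp(b1))
```

Because the predictor is a z-score, exp(b1) reads directly as the odds ratio per standard deviation, which is the scale on which the results are reported.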

Finally, we performed an exploratory linear regression of PRS by gestational age. Because autism diagnosis status was not included in this model, we standardized the proportion of individuals with ASD across the term and preterm cohorts by including only a random sample of 214 individuals born at term with ASD.

Ethics statement

All methods were performed in accordance with the relevant guidelines and regulations. The study was approved by the Institutional Review Board of the Children’s Hospital of Philadelphia, and written informed consent (or assent with parental consent, as appropriate) was obtained from all participants or their legal guardians in accordance with the Declaration of Helsinki.

Results

We examined a cohort of 2,387 individuals to compare genetic liability for ASD stratified by gestational age and autism diagnosis status (Table 1). Figure 1 shows a violin plot of PRS distributions across cohorts, displaying the median, IQR, and density. Overall, individuals with ASD exhibited higher scores than their counterparts without ASD. Notably, a modest but statistically significant difference in PRS was observed between term ASD and term non-ASD groups (logistic regression; OR 1.17, 95% CI 1.03–1.32, p = 0.017), indicating that common genetic variant burden is increased in individuals diagnosed with autism when born at term. Each 1-standard-deviation increase in autism PRS was associated with 17% higher odds of an autism diagnosis.
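The conversion from an odds ratio per standard deviation to a percent change in odds is simple arithmetic, sketched below in Python. The helper name is hypothetical. The two-standard-deviation figure assumes the logistic model's linearity on the log-odds scale and is an extrapolation for illustration, not a result reported by the study.

```python
def percent_change_in_odds(odds_ratio: float, sd_units: float = 1.0) -> float:
    """Percent change in odds for a given shift in PRS, in standard-deviation units."""
    return (odds_ratio ** sd_units - 1.0) * 100.0


print(round(percent_change_in_odds(1.17), 1))       # 17.0
print(round(percent_change_in_odds(1.17, 2.0), 1))  # 36.9
```

Note that these are changes in odds, not in the probability of diagnosis.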

Fig. 1

Violin plot showing the distribution of standardized polygenic risk scores (PRS) for ASD across four groups defined by ASD status and birth term: Preterm with ASD, Preterm without ASD, Term with ASD, and Term without ASD. Statistical significance was assessed using generalized linear models to control for ancestry. A significant PRS elevation was observed in the Term-ASD group compared to Term-Non-ASD (P = 0.017), whereas comparisons involving preterm subgroups were not significant (P > 0.25).

In contrast, PRS did not significantly differ between preterm ASD and preterm non-ASD groups (OR 1.18, 95% CI 0.88–1.60, p = 0.26) or between preterm ASD and term ASD groups (OR 0.84, 95% CI 0.62–1.14, p = 0.28) in logistic regression models. These findings align with previous observations from Zhang et al.,18 who reported that ASD PRS is not significantly elevated in preterm-born ASD individuals compared to their non-ASD counterparts, despite marked phenotypic severity in the former group.

Interestingly, we found that PRS was overall lower in the preterm cohort compared to the term cohort when controlling for ASD prevalence (linear regression model; coefficient for preterm: −0.14, 95% CI −0.21 to −0.06, p = 0.00095). Being born prematurely was associated with a 0.14 reduction in standardized ASD PRS, indicating lower genetic liability from identified common variants. Similarly, unaffected preterm individuals had significantly lower PRS compared to unaffected term-born individuals (logistic regression; OR 0.83, 95% CI 0.74–0.94, p = 0.0026) (Fig. 2).
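A coefficient on a binary preterm indicator has a direct reading: in a linear regression with no other covariates, it equals the difference in mean standardized PRS between the preterm and term groups. The Python sketch below demonstrates that identity on hypothetical toy values. It is a simplification and does not reproduce the study's model or its subsampling step.

```python
import statistics


def ols_fit(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: 1 = preterm, 0 = term, with standardized PRS values.
preterm = [1, 1, 1, 0, 0, 0, 0]
prs_z = [-0.4, 0.1, -0.3, 0.2, -0.1, 0.5, 0.0]

intercept, slope = ols_fit(preterm, prs_z)
print(round(intercept, 2))  # 0.15, the mean PRS in the term group
print(round(slope, 2))      # -0.35, preterm mean minus term mean
```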

Fig. 2

Forest plot of regression model results by comparison. (A) Results of logistic regression from the four-group analysis. Of note, PRS reached statistical significance in two out of four models: the model comparing term-born children with ASD to term-born children without ASD, and the model comparing preterm-born children without ASD to term-born children without ASD. (B) Results from an exploratory linear regression demonstrating overall lower ASD PRS in the preterm cohort compared to the term cohort after adjusting for ASD prevalence.

Discussion

In this study, we explored the distribution of ASD PRS scores across four subgroups stratified by both gestational age and ASD diagnosis. Our findings contribute to the growing body of literature that seeks to clarify the relationship between genetic predisposition and clinical factors, particularly prematurity, in the etiology of ASD18.

We found that PRS for ASD is significantly higher among our cohort of term children diagnosed with ASD compared to our cohort of term children not diagnosed with ASD. Interestingly, this genetic signal was not observed among preterm individuals: neither the comparison between preterm ASD and preterm non-ASD, nor that between preterm ASD and term ASD, yielded significant differences. These results indicate that the polygenic architecture of ASD may be more detectable in term-born populations, whereas the genetic liability in preterm-born individuals may be obscured or modified by other mechanisms.

Conversely, there was also a small but statistically significant difference in ASD PRS between preterm vs. term-born individuals without an ASD diagnosis (OR 0.83, 95% CI 0.74–0.94), indicating that common variant burden is lower in preterm individuals without a diagnosis compared to term individuals without a diagnosis. This lower polygenic burden observed in unaffected preterm individuals compared to unaffected term individuals may reflect a protective role of low PRS score in this environmentally vulnerable population, though studies with larger sample sizes are needed to validate this effect.

Our observations align with the findings of Zhang et al.,18 who reported that preterm-born individuals with ASD exhibited greater phenotypic severity and multimorbidity despite showing similar levels of PRS as term ASD individuals. They also align with Cullen et al., who found that ASD PRS did not have a significant interaction effect on cognition in preterm individuals22. The utility of adding PRS, gestational age, and male gender into models to predict autism risk with 90% success, as described by Zhang et al.,18 may be explained by a protective factor of low PRS burden in preterm individuals at highest risk for ASD.

One possible explanation for the lack of genetic signal in our preterm cohort with ASD is that the etiological basis of ASD in preterm individuals may involve a distinct contribution from rare de novo variants; one study demonstrated an elevated prevalence of pathogenic copy number variants in preterm patients compared to their parents or population databases23. In another study, genomes of preterm-born individuals had a significant increase in de novo mutation burden, and many of the genes affected were involved in fetal brain development24. As such, preterm individuals may be more likely to derive their genetic propensity for autism from rare variation, as opposed to the common variation quantified by ASD PRS.

Furthermore, early-life complications associated with prematurity, including hypoxia, neuroinflammation,25 postnatal steroid treatment,13 and neonatal intensive care interventions such as high frequency ventilation,26 may independently elevate risk for ASD in premature patients. In such cases, common variant burden as captured by PRS may play a comparatively minor role in driving neurodevelopmental outcomes. Finally, we applied an autism PRS derived from a presumed term cohort. It is possible that different common variation drives ASD risk in premature children than in term populations.

In line with possible differences in common variation distribution by gestational age, we found that preterm birth was associated with a 0.14 reduction in standardized ASD PRS, indicating that the distribution of PRS in the preterm cohort was shifted toward lower values compared to the term cohort. This difference may reflect higher autism PRS protecting against preterm birth through other associated traits. For example, the positive association of autism PRS with educational attainment16 may make mothers with higher genetic liability less likely to deliver prematurely, given that lower maternal educational level is a well-known risk factor for preterm birth27. Further research to understand the differences in common variant distribution in preterm vs. term populations is needed to inform the application of term-derived PRS for predictive modeling in preterm populations.

Overall, our findings suggest that the predictive utility of PRS in ASD risk stratification may be contingent on gestational age context. While PRS could serve as a useful risk indicator among term-born neonates, it appears less informative in preterm populations. This highlights the importance of integrating genetic and clinical context when interpreting polygenic scores in neurodevelopmental research and risk modeling. It also highlights the opportunity to develop novel PRS models in preterm cohorts to better capture genetic liability unique to this population.

Future studies with larger sample sizes are needed to further examine potential interaction effects between polygenic burden, gestational age, and environmental stressors. Longitudinal follow-up of preterm cohorts with and without ASD will also be critical to disentangle the timing and nature of neurodevelopmental divergence. Integrating PRS with rare variant burden, epigenetic marks, clinical comorbidity, and brain imaging may provide a more comprehensive understanding of ASD liability across different birth conditions.