Introduction

Many individuals who contract SARS-CoV-2 experience new, persistent, or exacerbated symptoms for months, or even years, afterward, often referred to as post-acute sequelae of SARS-CoV-2 infection (PASC), or Long COVID1,2. Existing knowledge on Long COVID, including its incidence, risk factors, subtypes, treatment, and pathophysiology, has been derived mostly from non-pregnant adult populations1,2,3,4,5,6,7,8,9,10. Little is known about Long COVID after SARS-CoV-2 infection during pregnancy.

SARS-CoV-2 infection in pregnancy presents a unique set of challenges spanning virology, obstetrics, pediatrics, and public health11,12. Acquiring SARS-CoV-2 infection during pregnancy is associated with an increased risk of mortality and obstetric complications11,13,14,15. These adverse pregnancy outcomes can extend beyond maternal health to affect the short- and long-term quality of life of the offspring16,17,18. The immune response and proteomic changes of COVID-19 during pregnancy differ from those in non-pregnant individuals, suggesting a nuanced balance between maternal protection of the fetus and susceptibility to severe disease12. While SARS-CoV-2 infection acquired in pregnancy is associated with worse perinatal outcomes, infection during pregnancy has been described as protective against Long COVID17. However, prior studies have been conducted on relatively small pregnancy cohorts17, limiting the generalizability of their results. Further, knowledge gaps relevant to patient counseling remain, including how gestational age at the time of infection relates to subsequent Long COVID risk and how pre-existing comorbid health conditions influence that risk.

In this study, within the National Institutes of Health (NIH) Researching COVID to Enhance Recovery (RECOVER) initiative19, we analyzed electronic health record (EHR) data from 29 sites in the National Patient-Centered Clinical Research Network (PCORnet) and 65 sites in the National COVID Cohort Collaborative (N3C) to build one of the largest retrospective cohorts of females with SARS-CoV-2 infection during pregnancy. The objective was to estimate Long COVID risk in individuals who acquired SARS-CoV-2 infection during pregnancy compared with a similar cohort of reproductive-age females who acquired SARS-CoV-2 outside of pregnancy. The secondary aim was to evaluate how other variables, such as race/ethnicity, trimester of infection, SARS-CoV-2 variant, body mass index, baseline comorbid health conditions, and vaccination status, influence the risk of developing Long COVID. Long COVID outcomes were assessed using a PCORnet rule-based Long COVID computable phenotype (CP), an N3C machine learning (ML) Long COVID CP, unspecified PASC diagnoses (ICD-10 codes U09.9 or B94.8), and diagnoses of cognitive, fatigue, and respiratory conditions.

Results

A total of 492,325 and 1,019,180 eligible reproductive-age females with documented SARS-CoV-2 infection between March 1, 2020, and October 31, 2022, follow-up to June 1, 2023, and a connection to the healthcare network before infection were identified from PCORnet and N3C, respectively. Of these, 29,975 (PCORnet) and 42,176 (N3C) were pregnant when they acquired SARS-CoV-2 infection. Each pregnant individual was exactly matched to non-pregnant females on region, age, infection time, acute severity, and baseline comorbidities (see Methods) at a ratio of 1:3, yielding 87,127 matched non-pregnant females in PCORnet and 120,732 in N3C. The patient selection flow and population characteristics are presented in Fig. 1 and Table 1 (additional covariates in Supplementary Table 1), respectively. Population characteristics before matching are shown in Supplementary Table 2.

Fig. 1: Cohort selection.

a Selection of females with SARS-CoV-2 infection during or outside of pregnancy from the PCORnet and N3C cohorts. SARS-CoV-2 infection occurred between March 1, 2020, and October 31, 2022, with follow-up to June 1, 2023. b Study design. Post-acute sequelae of SARS-CoV-2 infection (PASC), or Long COVID, outcomes were ascertained from day 30 after SARS-CoV-2 infection, and the adjusted risk was computed at 180 days after infection. The two exposure groups were individuals who acquired SARS-CoV-2 during pregnancy, as illustrated in b, and females who acquired it outside of pregnancy. The pregnant group was compared with non-pregnant females exactly matched on site region, age, infection time, acute severity, and selected baseline comorbidities (diabetes, hypertension, autoimmune disease or immune suppression, mental health disorders, severe obesity, and asthma) at a ratio of 1:3.

Table 1 Baseline characteristics of SARS-CoV-2 positive pregnant females and matched SARS-CoV-2 positive non-pregnant females from PCORnet and N3C, March 2020 to October 2022a

Before matching (Supplementary Table 2), the pregnant group was younger than the non-pregnant group, with a median age of 30 years (interquartile range [IQR], 26–34) vs 35 years (IQR, 27–43) in PCORnet and 30 years (IQR, 26–34) vs 36 years (IQR, 27–44) in N3C. Compared with non-pregnant females, pregnant females were less likely to have cancer, chronic kidney disease, chronic pulmonary disorders, hypertension, mental health disorders, or class III obesity, or to be fully vaccinated at baseline. By contrast, pregnant females in both cohorts were more likely than non-pregnant females to have anemia or coagulopathy and to be overweight. After matching (Table 1; additional covariates in Supplementary Table 1), the two comparison groups were more comparable on these baseline covariates. To adjust for residual differences, inverse probability of treatment weighting (IPTW) was applied to the matched cohorts (see Methods) when estimating relative risks. All measured variables were well balanced between the two comparison groups in both PCORnet and N3C, as summarized in Supplementary Table 3.
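The IPTW balance check described above can be illustrated with a minimal Python sketch. It assumes that propensity scores (the estimated probability of infection during pregnancy given baseline covariates) have already been estimated, for example by logistic regression; the study's propensity model is not specified in this section. The function names `stabilized_iptw` and `weighted_smd_binary` are hypothetical, and the |SMD| < 0.1 threshold mentioned in a comment is a common convention, not a value stated here.

```python
import math


def stabilized_iptw(treated, propensity):
    """Stabilized IPTW weights.

    treated: 0/1 exposure indicators (1 = infected during pregnancy).
    propensity: estimated P(treated = 1 | covariates), each strictly in (0, 1).
    """
    p_treated = sum(treated) / len(treated)  # marginal probability of exposure
    weights = []
    for t, ps in zip(treated, propensity):
        if t == 1:
            weights.append(p_treated / ps)
        else:
            weights.append((1 - p_treated) / (1 - ps))
    return weights


def weighted_smd_binary(x, treated, weights):
    """Weighted standardized mean difference (SMD) for a 0/1 covariate x.

    By common convention (an assumption here), |SMD| < 0.1 is read as balanced.
    """
    def weighted_mean(group):
        num = sum(w * xi for xi, t, w in zip(x, treated, weights) if t == group)
        den = sum(w for t, w in zip(treated, weights) if t == group)
        return num / den

    p1, p0 = weighted_mean(1), weighted_mean(0)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return 0.0 if pooled_sd == 0 else (p1 - p0) / pooled_sd


# Toy data: a binary covariate that is imbalanced before weighting.
x = [1, 1, 1, 0, 1, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
propensity = [0.75 if xi == 1 else 0.25 for xi in x]  # true P(treated | x)

unweighted = weighted_smd_binary(x, treated, [1.0] * len(x))
weighted = weighted_smd_binary(x, treated, stabilized_iptw(treated, propensity))
```

In the toy data the unweighted SMD is large, while the weighted SMD is essentially zero because the propensity scores exactly reflect the covariate; real data would only approximate this.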

Four Long COVID definitions were examined: a PCORnet rule-based Long COVID definition comprising 15 incident conditions across multiple organ systems, applied to the PCORnet cohort5,20; an N3C machine learning (ML) Long COVID phenotype designed to identify missed or under-diagnosed PASC (diagnosis code U09.9), applied to the N3C cohort21,22; the unspecified PASC ICD-10 diagnosis codes U09.9/B94.8; and a sub-cluster of cognitive, fatigue, and respiratory diagnoses23. As a sensitivity analysis, the latter two definitions were applied to both cohorts.

Long COVID risk in the PCORnet cohort

At 180 days of follow-up, the estimated risk of Long COVID was 16.47 events per 100 persons (95% confidence interval [CI], 16.00–16.95) in the pregnant group and 18.88 (95% CI, 18.59–19.17) in the non-pregnant group (Fig. 2). Compared with non-pregnant females, pregnant females had a lower risk of Long COVID, with a hazard ratio (HR) of 0.86 (95% CI, 0.83–0.90) and a risk reduction of 2.41 events per 100 persons (95% CI, 1.85–2.96).
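The 180-day absolute risks and the risk reduction above can be illustrated with a minimal sketch. It assumes that cumulative incidence is estimated as one minus the Kaplan–Meier survival at day 180 (a competing-risks estimator could be used instead), with optional per-person weights such as IPTW weights; the Cox model that produced the hazard ratios is not reproduced. The names `km_cumulative_incidence` and `risk_difference_per_100` are hypothetical.

```python
def km_cumulative_incidence(times, events, horizon, weights=None):
    """One minus the (optionally weighted) Kaplan-Meier survival at `horizon`.

    times: days from the index date to the event or censoring.
    events: 1 if the outcome occurred at that time, 0 if censored.
    weights: optional per-person weights (e.g., IPTW); default 1 each.
    """
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d = 0.0        # weighted events at time t
        leaving = 0.0  # weighted persons leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            d += w * event
            leaving += w
            i += 1
        if at_risk > 0:
            survival *= 1 - d / at_risk
        at_risk -= leaving
    return 1 - survival


def risk_difference_per_100(risk_reference, risk_exposed):
    """Difference in cumulative incidence, expressed per 100 persons."""
    return 100 * (risk_reference - risk_exposed)
```

For example, `risk_difference_per_100(0.1888, 0.1647)` gives about 2.41, matching the reported PCORnet risk reduction computed from the two point estimates.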

Fig. 2: Long COVID risks in the SARS-CoV-2 infected pregnant women versus the matched infected non-pregnant women in PCORnet and N3C.

Outcomes were ascertained from 30 days after the first documented evidence of SARS-CoV-2 infection until the end of follow-up. The absolute risk, risk ratio, and risk difference were captured by the cumulative incidence (CIF), the hazard ratio (HR), and the difference in cumulative incidence per 100 persons (DIFF/100), respectively, each estimated at 180 days after the infection index date. The centers of the error bars are adjusted hazard ratios from the Cox proportional hazards model, and the error bars indicate two-sided 95% confidence intervals (95% CI).

As shown in Fig. 2, a lower risk of incident Long COVID in the pregnant group was observed across organ systems, including post-acute neurological conditions (sleep disorders, cognitive problems, encephalopathy), post-acute pulmonary conditions (pulmonary fibrosis, acute pharyngitis, shortness of breath), a post-acute circulatory condition (chest pain), and several general post-acute conditions (e.g., malaise and fatigue, unspecified post-COVID-19 diagnostic codes U099/B948, and smell and taste disturbances). The exceptions, which showed no significant difference between the two groups, were post-acute metabolic conditions (edema, diabetes, malnutrition), post-acute musculoskeletal conditions (joint pain), pulmonary fibrosis, and fever.

Comparison with the N3C cohort

In the N3C cohort, using the N3C ML phenotype, the estimated 180-day risk of Long COVID was 4.37 events per 100 persons (95% CI, 4.18–4.57) in the pregnant group and 6.21 (95% CI, 6.07–6.35) in the non-pregnant group. As in PCORnet, the pregnant group had a lower risk of Long COVID than the non-pregnant group (Fig. 2), with an HR of 0.70 (95% CI, 0.66–0.74) and a risk reduction of 1.84 events per 100 persons (95% CI, 1.60–2.08).

Long COVID risk in sub-populations

As shown in Fig. 3, absolute Long COVID risks within the pregnant group were higher in several subgroups: self-reported Black individuals compared with White individuals; individuals of advanced maternal age (≥35 years) compared with those aged <35 years; those infected during the first two trimesters compared with the third trimester; those infected during the Delta and Omicron periods compared with earlier variant periods; individuals with obesity compared with those who were overweight or of normal weight; and those with baseline chronic medical conditions compared with those without. Absolute risks were similar regardless of vaccination status.

Fig. 3: Long COVID risk in different sub-populations in the PCORnet and N3C cohorts.

Corresponding sub-populations of SARS-CoV-2-infected pregnant women and infected non-pregnant women were compared. Outcomes were ascertained from 30 days after the first documented evidence of SARS-CoV-2 infection until the end of follow-up. The absolute risk, risk ratio, and risk difference were captured by the cumulative incidence (CIF), the hazard ratio (HR), and the difference in cumulative incidence per 100 persons (DIFF/100), respectively, each estimated at 180 days after the infection index date. The centers of the error bars are adjusted hazard ratios from the Cox proportional hazards model, and the error bars indicate two-sided 95% confidence intervals (95% CI). Having co-existing risk factors was defined as having any of hypertension, diabetes, class III obesity, or asthma at baseline.

Compared with the corresponding non-pregnant subgroups, pregnant females had a consistently lower relative risk of Long COVID across subpopulations stratified by self-reported race/ethnicity (White, Black), age (<35 years, ≥35 years), SARS-CoV-2 variant of concern (ancestral, Alpha, Delta, and Omicron), body mass index (normal, overweight, and obese), baseline chronic medical conditions (yes or no), vaccination status (fully vaccinated, any vaccine records, or no vaccine records), and infection during the third trimester, in both cohorts (Fig. 3). The exceptions were infections during the first trimester (HR 1.07 [95% CI, 0.97–1.19] in PCORnet; HR 1.17 [95% CI, 1.03–1.34] in N3C) and the second trimester (HR 1.15 [95% CI, 1.08–1.23] in PCORnet; HR 0.89 [95% CI, 0.81–0.97] in N3C), for which the risk was not significantly different or was moderately higher, except for the lower risk with second-trimester infection in N3C.

Sensitivity analyses

We further cross-checked the risk of Long COVID, defined by the unspecified PASC ICD-10 diagnostic codes U099 or B948 and by a subcluster of post-acute cognitive, fatigue, and respiratory conditions, in both the PCORnet and N3C cohorts (Fig. 4).

Fig. 4: Risks of unspecified PASC diagnoses and Cognitive, Fatigue, and Respiratory symptom cluster among the SARS-CoV-2 infected pregnant women versus the matched infected non-pregnant women, in PCORnet and N3C cohorts.

Outcomes were ascertained from 30 days after the first documented evidence of SARS-CoV-2 infection until the end of follow-up. The absolute risk, risk ratio, and risk difference were captured by the cumulative incidence (CIF), the hazard ratio (HR), and the difference in cumulative incidence per 100 persons (DIFF/100), respectively, each estimated at 180 days after the infection index date. The centers of the error bars are adjusted hazard ratios from the Cox proportional hazards model, and the error bars indicate two-sided 95% confidence intervals (95% CI).

For the unspecified PASC ICD-10 diagnostic codes U099 or B948, the estimated 180-day risk in the PCORnet cohort was 0.19 events per 100 persons (95% CI, 0.14–0.25) in the pregnant group and 0.60 (0.55–0.66) in the non-pregnant group. In the N3C cohort, the estimated risk was 0.23 (0.19–0.28) events per 100 persons in the pregnant group and 0.44 (0.40–0.48) in the non-pregnant group. Thus, in both cohorts the risk in the pregnant group was roughly one-half to one-third of that in the matched non-pregnant group.

For any post-acute cognitive, fatigue, or respiratory condition, the estimated risk in the PCORnet cohort was 4.86 (95% CI, 4.59–5.14) events per 100 persons in the pregnant group and 6.79 (95% CI, 6.60–6.97) in the non-pregnant group. In the N3C cohort, the estimated risk was 6.83 (95% CI, 6.59–7.08) events per 100 persons in the pregnant group and 9.54 (95% CI, 9.37–9.71) in the non-pregnant group.

Absolute and relative risks were consistent when these two Long COVID definitions were applied across the two cohorts. For these outcomes in various subpopulations (Figs. 5 and 6), we observed a consistent pattern of lower relative risk in pregnant compared with non-pregnant females, along with similar gradients of absolute risk across subgroups within the pregnant group. One exception was a higher incidence of unspecified Long COVID diagnoses among pregnant individuals infected during the Delta era compared with other periods.

Fig. 5: Risks of Cognitive, Fatigue, and Respiratory symptoms cluster in different sub-populations from the PCORnet cohort and N3C cohort.

Corresponding sub-populations of SARS-CoV-2-infected pregnant women and infected non-pregnant women were compared. Outcomes were ascertained from 30 days after the first documented evidence of SARS-CoV-2 infection until the end of follow-up. The absolute risk, risk ratio, and risk difference were captured by the cumulative incidence (CIF), the hazard ratio (HR), and the difference in cumulative incidence per 100 persons (DIFF/100), respectively, each estimated at 180 days after the infection index date. The centers of the error bars are adjusted hazard ratios from the Cox proportional hazards model, and the error bars indicate two-sided 95% confidence intervals (95% CI). Having co-existing risk factors was defined as having any of hypertension, diabetes, class III obesity, or asthma at baseline.

Fig. 6: Risks of unspecified PASC diagnoses U099/B948 in different sub-populations from the PCORnet cohort and N3C cohort.

Corresponding sub-populations of SARS-CoV-2-infected pregnant women and infected non-pregnant women were compared. Outcomes were ascertained from 30 days after the first documented evidence of SARS-CoV-2 infection until the end of follow-up. The absolute risk, risk ratio, and risk difference were captured by the cumulative incidence (CIF), the hazard ratio (HR), and the difference in cumulative incidence per 100 persons (DIFF/100), respectively, each estimated at 180 days after the infection index date. The centers of the error bars are adjusted hazard ratios from the Cox proportional hazards model, and the error bars indicate two-sided 95% confidence intervals (95% CI). Having co-existing risk factors was defined as having any of hypertension, diabetes, class III obesity, or asthma at baseline.

Our findings remained consistent across alternative SARS-CoV-2 identification methods, cohort selection criteria, and a modified rule-based Long COVID phenotype. First, when SARS-CoV-2-infected patients in the PCORnet cohort were identified using only laboratory tests and diagnoses, excluding Paxlovid or Remdesivir prescriptions (see Methods, Study cohort), the risks and relative risks were largely the same as in the primary results (Figs. 2 and 4; Supplementary Fig. 1). Second, requiring at least two visits during the baseline period and at least one visit during follow-up increased the cumulative incidence of Long COVID in both the pregnant and non-pregnant groups relative to the primary analysis (Supplementary Fig. 2; Figs. 2 and 4). However, the risk of Long COVID in the pregnant group remained lower than in the non-pregnant group, with adjusted hazard ratios even lower than in the primary analysis. Finally, we examined a variant of the PCORnet rule-based phenotype that excluded the edema condition. As shown in Supplementary Fig. 3, excluding edema lowered the cumulative incidence of any Long COVID condition in both groups, but the primary finding of a lower Long COVID risk in the pregnant group than in the non-pregnant group remained robust.

Discussion

In this retrospective cohort study involving 29 PCORnet sites and 65 N3C sites within the RECOVER initiative, we estimated the risk of Long COVID in females who acquired SARS-CoV-2 infection during pregnancy. The long-term implications of COVID-19 in pregnancy are substantial, as reflected in the Long COVID outcomes captured across the two cohorts. In the PCORnet cohort, the estimated risk of Long COVID at 180 days of follow-up was 16.47 events per 100 persons (95% CI, 16.00–16.95) based on a rule-based Long COVID phenotype. In the N3C cohort, the estimated risk was 4.37 events per 100 persons (95% CI, 4.18–4.57) using a machine learning-based approach. The risks of unspecified PASC diagnostic codes U099 or B948 were 0.19 events per 100 persons (95% CI, 0.14–0.25) in PCORnet and 0.23 events per 100 persons (95% CI, 0.19–0.28) in N3C. The risks of post-acute cognitive, fatigue, and respiratory conditions were 4.86 events per 100 persons (95% CI, 4.59–5.14) in PCORnet and 6.83 events per 100 persons (95% CI, 6.59–7.08) in N3C. Among pregnant individuals, a higher incidence of Long COVID was observed in self-reported Black patients, patients of advanced maternal age, those infected during the first two trimesters, individuals with obesity, and those with baseline chronic conditions.

Notably, pregnant individuals had a lower risk of Long COVID than SARS-CoV-2-infected non-pregnant females exactly matched on region, age, infection time, acute severity, and baseline comorbidities. This pattern was consistent across Long COVID phenotype methods in both the PCORnet and N3C cohorts: an adjusted hazard ratio (aHR) of 0.86 (95% CI, 0.83–0.90) and a risk reduction of 2.41 events per 100 persons (95% CI, 1.85–2.96) in the PCORnet cohort with its rule-based phenotype; an aHR of 0.70 (95% CI, 0.66–0.75) and a risk reduction of 1.84 events per 100 persons (95% CI, 1.60–2.08) in the N3C cohort with its ML-based phenotype; aHRs of 0.32 (95% CI, 0.22–0.46) and 0.53 (95% CI, 0.41–0.68) for the unspecified PASC ICD-10-CM diagnostic codes U099 or B948 in PCORnet and N3C, respectively; and aHRs of 0.70 (95% CI, 0.65–0.76) and 0.70 (95% CI, 0.67–0.74) for the cognitive, fatigue, and respiratory diagnosis cluster in PCORnet and N3C, respectively. Furthermore, the lower relative risk of Long COVID in pregnant individuals was largely consistent across subpopulations and robust to sensitivity analyses using different Long COVID definitions in both cohorts.

Pregnancy is a period of physiologic immune tolerance that accommodates fetal development. Differences in regulatory T cells, cytokines, and other immune cells have been described during pregnancy and are thought to prevent maternal immune rejection of the fetus24. More severe disease courses from other viruses, such as influenza, have been described during pregnancy and attributed to these immune alterations25. We hypothesize that the altered immune and inflammatory environment during the puerperium may contribute to the lower risk of Long COVID observed in the pregnant compared with the non-pregnant cohort. The observed risk differences suggest that dedicated pathophysiologic and immunologic studies of Long COVID in pregnant individuals are warranted. In particular, studying differences in Long COVID by trimester may inform patient counseling. The higher risk of Long COVID in self-reported Black females draws attention to racial and ethnic disparities in Long COVID among individuals who acquired SARS-CoV-2 infection during pregnancy, which may be related to inequitable healthcare access, socioeconomic factors, and structural racism.

This study has several strengths. First, the use of two large-scale clinical data networks, comprising 73 unique hospital systems, allowed comprehensive analyses with substantial statistical power, particularly for the pregnant groups. A prior publication17 reported a subset of 5,397 eligible pregnant females who acquired COVID-19 during pregnancy from 19 PCORnet sites; that sample size precluded adequately powered subgroup analyses. Through the collaborative efforts of PCORnet, N3C, and the RECOVER-Pregnancy Cohort within RECOVER, this analysis identified 72,151 eligible females infected during pregnancy and 207,859 exactly matched infected non-pregnant females (matching ratio 1:3). Second, detailed subgroup analyses were performed, stratified by self-reported race/ethnicity, maternal age, variant of concern, BMI, baseline comorbid health conditions, and trimester of infection. Third, we characterized and cross-checked Long COVID risk using four definitions: a rule-based definition organized by multi-organ systems in PCORnet5,20, a machine learning Long COVID phenotype in N3C21, the unspecified PASC diagnoses U099/B948, and a sub-cluster of cognitive, fatigue, and respiratory diagnoses23. The similar patterns obtained by triangulating different Long COVID definitions across two cohorts strengthen confidence in these findings.

There are also several limitations. First, this is a retrospective observational study based on electronic health records and may be subject to residual confounding, missing data, and misclassification of pregnancy and other study variables. Second, because the two networks use different common data models, we did not apply the PCORnet Long COVID definition to the N3C cohort or the N3C Long COVID predictive model to the PCORnet cohort. However, unspecified PASC diagnoses and the cognitive, fatigue, and respiratory conditions were cross-checked in both cohorts, with consistent results. Third, the association between vaccination status and Long COVID requires dedicated investigation. More than 82% of individuals in the pregnant group had no recorded vaccination data (Table 1), compared with nearly 77% in the infected non-pregnant group. Missing vaccination data may reflect both poor capture of vaccinations in EHRs and low vaccination rates among pregnant individuals, stemming from initially low public confidence in COVID-19 vaccination during pregnancy (pregnant people were not enrolled in the early vaccine trials). Fourth, although we adjusted for baseline healthcare utilization, pregnant individuals usually have frequent prenatal care visits (particularly after first- and second-trimester infections), which may increase detection of Long COVID outcomes in these populations. Finally, although we cross-checked Long COVID risks among pregnant individuals using different Long COVID modeling approaches across two large clinical research networks, validation in external and prospective cohorts is still needed.

Methods

Data

This study used electronic health record (EHR) data from two clinical research networks (CRNs) in the U.S., the National Patient-Centered CRN (PCORnet) and the National COVID Cohort Collaborative (N3C), within the RECOVER initiative. Analyses were conducted separately for each cohort, following a similar experimental protocol and the same statistical analyses.

The PCORnet RECOVER infrastructure leveraged PCORnet to develop a single, unified EHR and real-world data (RWD) repository for studying PASC across ~28.25 million patients (18.75 million adult and 9.5 million pediatric) from 40 adult and pediatric health systems nationwide, which refresh their data at least quarterly. The source data include patients tested for COVID-19 (regardless of result), those diagnosed with COVID-19, those who received COVID-19 vaccines or therapeutics (e.g., Remdesivir and Paxlovid), and/or those who received a respiratory diagnosis since 2019. The enclave contains structured EHR data on inpatient and ambulatory encounters, laboratory results, vital signs, medications, diagnoses, procedures, birth dates, sex, and race/ethnicity. The EHR data are geocoded to the level of the census tract, block group, and/or 9-digit zip code, allowing linkage to exposome information to assess the influence of social determinants of health (SDoH) and environmental exposures on COVID-19 outcomes. In addition, the data enclave includes clinical notes for natural language processing (NLP), vaccine registries, and death registries.

Individual EHR data is stored in the N3C Data Enclave, which provides access to harmonized EHRs from 84 health sites with data from over 22.8 million patients (as of August 1st, 2024). For the current investigation, we used N3C data from version 152 (2023-12-07), and our final cohort encompasses contributions from 65 sites that had individuals who met our inclusion criteria. The N3C Data Enclave uses the Palantir Foundry platform (2021, Denver, CO), a secure analytics platform, for data access and analysis. N3C’s methods for patient identification, data acquisition, ingestion, data quality assessment, and harmonization have been described previously26,27. The N3C EHR data is structured in a similar way to PCORnet, consisting of inpatient and ambulatory encounters, laboratory results, vital signs, medications, diagnoses, procedures, birth dates, sex, and race/ethnicity information. Data for individuals is geocoded at the 9-digit zip code level, and sites are linked to vaccine registries, as well as a privacy-preserving record linkage to mortality and CMS (Medicare and Medicaid) claims data.

Ethics oversight

The use of the PCORnet data was approved by the Institutional Review Board (IRB) of the Biomedical Research Alliance of New York (BRANY) under protocol #21-08-508. As part of the BRANY IRB process, the protocol was reviewed in accordance with institutional guidelines, and BRANY waived the need for consent and HIPAA authorization. IRB oversight was provided by BRANY, protocol #21-08-508-380. The N3C data transfer is performed under Johns Hopkins University Reliance Protocol #IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. This work was conducted under DUR RP-5677B5. The N3C received a waiver of consent from the NIH Institutional Review Board under the 1996 Health Insurance Portability and Accountability Act privacy regulations for a Limited Data Set.

Study cohort

For the base cohort in PCORnet, we included patients with at least one positive SARS-CoV-2 polymerase chain reaction (PCR) or antigen laboratory test, a COVID-19 diagnosis code U07.1, or a prescription of Paxlovid or Remdesivir between March 01, 2020, and June 30, 2023. The COVID-19 index date was defined as the date of the first documented record of (a) a positive SARS-CoV-2 PCR or antigen laboratory test; (b) the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis code U07.1 for COVID-19; or (c) a Paxlovid (nirmatrelvir/ritonavir) or Remdesivir prescription, whichever occurred first. We included female patients aged 18 to 50 years with at least one diagnosis code recorded between three years and seven days before the index date. The baseline period was defined as the three years before the index date, and the post-acute phase, or follow-up period, as 31 to 180 days after the index date. We further required the index date to be before October 31, 2022, to guarantee a follow-up period of at least 180 days.
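The PCORnet index-date and eligibility rules above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function names, the input format (plain lists of `datetime.date` values), and the sex code `"F"` are hypothetical; "three years" is approximated as 3 × 365 days; and the October 31, 2022 cutoff is treated as exclusive, following the wording "before October 31, 2022".

```python
from datetime import date, timedelta

RECORD_START = date(2020, 3, 1)
RECORD_END = date(2023, 6, 30)
INDEX_CUTOFF = date(2022, 10, 31)  # treated as exclusive ("before")


def pcornet_index_date(positive_test_dates, u071_dates, antiviral_rx_dates):
    """Earliest qualifying COVID-19 record (positive PCR/antigen test, U07.1
    diagnosis, or Paxlovid/Remdesivir prescription) in the record window."""
    candidates = [
        d
        for d in (*positive_test_dates, *u071_dates, *antiviral_rx_dates)
        if RECORD_START <= d <= RECORD_END
    ]
    return min(candidates) if candidates else None


def pcornet_eligible(sex, age_at_index, index_date, diagnosis_dates):
    """Female, aged 18-50, index before the cutoff, and at least one diagnosis
    recorded between ~3 years and 7 days before the index date."""
    if index_date is None or sex != "F":  # "F" is a hypothetical sex code
        return False
    if not (18 <= age_at_index <= 50) or index_date >= INDEX_CUTOFF:
        return False
    window_start = index_date - timedelta(days=3 * 365)  # ~3 years
    window_end = index_date - timedelta(days=7)
    return any(window_start <= d <= window_end for d in diagnosis_dates)
```

Taking the minimum over all three evidence types implements "whichever occurred first"; records dated before March 1, 2020 are ignored.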

For the base cohort in N3C, we included patients with at least one positive SARS-CoV-2 PCR or antigen laboratory test, or a COVID-19 diagnosis code U07.1, before October 31, 2022. The COVID-19 index date was defined as the date of the first documented positive COVID-19 lab test or diagnosis. The baseline period included all individual records going back to 2018, and we required at least two visits within one year before the index date. We further required at least one visit more than 100 days after the index date to ensure that individuals had not left the data sample.
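The two N3C visit requirements can be expressed as a short check. This sketch assumes visits are available as a plain list of `datetime.date` values and approximates "one year" as 365 days; the function name `n3c_visit_criteria` is hypothetical.

```python
from datetime import date, timedelta


def n3c_visit_criteria(index_date, visit_dates):
    """At least two visits in the ~1 year before the index date and at least
    one visit more than 100 days after it."""
    one_year_before = index_date - timedelta(days=365)  # ~1 year
    baseline = [d for d in visit_dates if one_year_before <= d < index_date]
    later = [d for d in visit_dates if d > index_date + timedelta(days=100)]
    return len(baseline) >= 2 and len(later) >= 1


# Example: two baseline visits plus one visit ~4 months after infection.
example = n3c_visit_criteria(
    date(2021, 6, 1), [date(2021, 1, 10), date(2021, 3, 5), date(2021, 10, 1)]
)
```

The post-index visit acts as evidence that the individual remained observable in the network during follow-up.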

The primary exposure was SARS-CoV-2 infection during pregnancy, compared with infection outside of pregnancy. We therefore identified two comparison groups, females acquiring SARS-CoV-2 during pregnancy versus outside of pregnancy, with an additional eligibility criterion for the pregnant group requiring infection within the gestational period. Infection during pregnancy was defined as the first documented SARS-CoV-2 infection occurring between the start of pregnancy and the date of delivery. Delivery events were ascertained from diagnosis codes for delivery outcomes or delivery-related procedures28 after March 01, 2020. In PCORnet, the start of pregnancy and gestational age were approximated using the Z3A codes associated with the delivery date29. In N3C, pregnancies were identified using a previously described hierarchical rules-based algorithm that also uses Z3A codes to define gestational age30. The gestational period was defined as the interval from the start of pregnancy to the delivery event. In both PCORnet and N3C, the SARS-CoV-2-infected pregnant group comprised females with an identified delivery event and SARS-CoV-2 infection within the gestational period. The SARS-CoV-2-infected non-pregnant group consisted of individuals without any identified delivery event within the study windows.
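The PCORnet approximation of the gestational period can be sketched as: pregnancy start equals the delivery date minus the weeks of gestation encoded in the Z3A code at delivery. ICD-10-CM codes Z3A.08 through Z3A.42 encode weeks of gestation, while Z3A.00, Z3A.01, and Z3A.49 do not give an exact week count; this sketch simply returns `None` for them. The trimester cutoffs (<14, 14–27, and ≥28 completed weeks) are a common clinical convention assumed here, not taken from the text, and the N3C hierarchical algorithm is not reproduced. Function names are hypothetical.

```python
from datetime import date, timedelta


def gestational_weeks_from_z3a(code):
    """Weeks of gestation from an ICD-10-CM Z3A code such as 'Z3A.39'.
    Returns None for codes without an exact week count in this sketch."""
    prefix = "Z3A."
    if not code.startswith(prefix):
        return None
    try:
        weeks = int(code[len(prefix):])
    except ValueError:
        return None
    return weeks if 8 <= weeks <= 42 else None


def infection_trimester(delivery_date, z3a_code, infection_date):
    """Trimester (1, 2, or 3) of infection, or None if the infection falls
    outside the approximated gestational period."""
    weeks = gestational_weeks_from_z3a(z3a_code)
    if weeks is None:
        return None
    pregnancy_start = delivery_date - timedelta(weeks=weeks)
    if not (pregnancy_start <= infection_date <= delivery_date):
        return None
    ga_weeks = (infection_date - pregnancy_start).days / 7
    if ga_weeks < 14:  # assumed first-trimester cutoff
        return 1
    if ga_weeks < 28:  # assumed second-trimester cutoff
        return 2
    return 3
```

An infection returning `None` would not qualify for the pregnant exposure group under this approximation.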

Pregnant individuals were exactly matched 1:3 to non-pregnant females on site region, age, infection time, acute severity, and selected baseline comorbidities (diabetes, hypertension, autoimmune disease or immunosuppression, mental health disorders, severe obesity, and asthma). The cohort selection flow is illustrated in Fig. 1a.
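Exact 1:3 matching of this kind can be sketched by grouping non-pregnant comparators on the full matching key and drawing three per pregnant individual without replacement. The field names below are hypothetical, and two choices are assumptions rather than details reported here: comparators are drawn at random with a fixed seed, and pregnant individuals with fewer than three available matches are left unmatched.

```python
import random
from collections import defaultdict
from typing import Dict, List

# Hypothetical field names for the matching variables listed in the text.
MATCH_KEYS = (
    "site_region", "age", "infection_time", "acute_severity",
    "diabetes", "hypertension", "autoimmune_or_immunosuppression",
    "mental_health", "severe_obesity", "asthma",
)


def exact_match(cases: List[dict], controls: List[dict], ratio: int = 3,
                seed: int = 0) -> Dict[str, List[str]]:
    """Map each matched case id to `ratio` control ids sharing every matching key."""
    rng = random.Random(seed)
    pool: Dict[tuple, List[str]] = defaultdict(list)
    for c in controls:
        pool[tuple(c[k] for k in MATCH_KEYS)].append(c["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random draw order within each stratum
    matched: Dict[str, List[str]] = {}
    for case in cases:
        ids = pool.get(tuple(case[k] for k in MATCH_KEYS), [])
        if len(ids) >= ratio:
            # pop() removes controls so each is used at most once (matching without replacement)
            matched[case["id"]] = [ids.pop() for _ in range(ratio)]
    return matched
```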

Outcomes

The definition of post-acute sequelae of SARS-CoV-2 infection (PASC), or Long COVID, differed between PCORnet and N3C. In PCORnet, Long COVID in pregnant females was defined by a rules-based computable phenotyping algorithm using International Classification of Diseases, Tenth Revision (ICD-10) codes for 15 incident conditions: cognitive problems, encephalopathy, sleep disorders, acute pharyngitis, shortness of breath (dyspnea), pulmonary fibrosis, chest pain, diabetes, edema, malnutrition, joint pain, fever, malaise and fatigue, unspecified PASC (ICD-10-CM diagnosis codes U099/B948), and smell and taste disturbances. These conditions were selected based on previous studies5,20 and evidence from the literature20,31,32, and were tailored for pregnant females17. A condition was considered incident if a SARS-CoV-2-infected patient developed it between 31 and 180 days after the acute infection and did not have it from three years to seven days before the acute infection. Long COVID was defined as having any incident condition from this list.
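The incident-condition rule can be written as a small function. This sketch assumes the mapping from ICD-10 codes to the 15 conditions has already been applied upstream, so each patient arrives as a dictionary from condition name to the dates on which it was recorded; the function names are hypothetical, and three years is again approximated as 3 * 365 days.

```python
from datetime import date
from typing import Dict, List, Set


def incident_conditions(index: date, condition_dates: Dict[str, List[date]]) -> Set[str]:
    """Conditions recorded 31-180 days after infection and absent 3 years to 7 days before."""
    incident = set()
    for condition, dates in condition_dates.items():
        offsets = [(d - index).days for d in dates]
        post_acute = any(31 <= o <= 180 for o in offsets)
        baseline = any(-3 * 365 <= o <= -7 for o in offsets)
        if post_acute and not baseline:
            incident.add(condition)
    return incident


def has_long_covid(index: date, condition_dates: Dict[str, List[date]]) -> bool:
    """Rules-based definition: any incident condition from the list."""
    return bool(incident_conditions(index, condition_dates))
```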

In the N3C cohort, by contrast, Long COVID was defined primarily by a machine learning algorithm, the PASC Machine Learning 2.0 model (LCM 2.0)21,22. This pipeline predicts the presence of Long COVID from information extracted from EHR data, creating a computable phenotype for Long COVID. The model was designed to address challenges inherent in EHRs, such as missing data and idiosyncratic coding practices. Unlike its predecessor, LCM 1.0, which used the acute COVID-19 date as an anchor point for analysis, LCM 2.0 uses fixed time windows applicable to all patients regardless of their COVID-19 index dates. These windows advance through overlapping 100-day periods, allowing the model to estimate the probability of Long COVID across diverse patient populations, including those with suspected or untested COVID-19 and individuals with multiple SARS-CoV-2 reinfections.

Two alternative definitions of Long COVID were additionally examined in both PCORnet and N3C: (a) the unspecified PASC ICD-10-CM diagnosis codes U099 (Post COVID-19 condition, unspecified) or B948 (Sequelae of other specified infectious and parasitic diseases), and (b) a cluster of cognitive, fatigue, and respiratory diagnoses23.
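Alternative definition (a) reduces to a code lookup. The sketch below assumes codes may arrive with or without the decimal point (U09.9 versus U099) and that the caller passes only codes recorded in the post-acute window; the function names are hypothetical, and the cognitive, fatigue, and respiratory cluster is omitted because its code list is not given here.

```python
from typing import Iterable

UNSPECIFIED_PASC_CODES = {"U099", "B948"}


def normalize_icd10(code: str) -> str:
    """Drop the dot and surrounding whitespace so 'U09.9' and 'u099' compare equal."""
    return code.replace(".", "").strip().upper()


def has_unspecified_pasc(post_acute_codes: Iterable[str]) -> bool:
    return any(normalize_icd10(c) in UNSPECIFIED_PASC_CODES for c in post_acute_codes)
```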

Baseline covariates

A broad range of potential confounders collected at the time of infection were considered in the adjusted analyses. These covariates included age at infection, self-reported race and ethnicity, national-level Area Deprivation Index (ADI)33, healthcare utilization, time of infection, the most recent body mass index (BMI), smoking status, ICU admission or ventilation during acute infection, COVID-19 vaccination status, and a range of baseline comorbidities. Age was categorized as 18–24, 25–29, 30–34, 35–39, 40–44, and 45–50 years. Self-reported race was categorized as Asian, Black or African American, White, other (grouping American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, Multiple race, and other categories in the PCORnet Common Data Model34), or missing, and self-reported ethnicity as Hispanic, not Hispanic, or other/missing. The ADI, which ranges from 1 to 100, was used to capture the socioeconomic disadvantage of patients’ residential neighborhoods, with 1 indicating the lowest level of disadvantage33. We used geocodes or 9-digit ZIP codes to link patients to national ADI percentiles. Healthcare utilization was measured as the number of inpatient and emergency encounters, each categorized as 0, 1–2, 3–4, or 5 or more visits. Infection time was categorized into four-month bins starting in March 2020 to account for different periods of the pandemic. BMI was categorized as underweight (<18.5 kg/m2), normal weight (18.5–24.9 kg/m2), overweight (25.0–29.9 kg/m2), obese (≥30.0 kg/m2), or missing, following the Centers for Disease Control and Prevention guideline for adults35. Severe acute infection was approximated by ventilation status and critical care use during the infection.
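The covariate categories above translate directly into binning functions. This is a sketch with hypothetical function names; two details are assumptions: the four-month infection-time bins are aligned to calendar months starting March 1, 2020, and the BMI cut-points are applied as half-open intervals so that values such as 24.95 kg/m2 still fall into a category.

```python
from datetime import date
from typing import Optional

AGE_BINS = [(18, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 50)]


def age_group(age: int) -> str:
    for lo, hi in AGE_BINS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"age {age} is outside the 18-50 study range")


def bmi_category(bmi: Optional[float]) -> str:
    """CDC adult categories plus an explicit missing level."""
    if bmi is None:
        return "missing"
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def utilization_level(n_encounters: int) -> str:
    """Applied separately to inpatient and emergency encounter counts."""
    if n_encounters == 0:
        return "0"
    if n_encounters <= 2:
        return "1-2"
    if n_encounters <= 4:
        return "3-4"
    return "5+"


def infection_period(infection: date) -> int:
    """Index of the four-month bin counting from March 2020 (bin 0 = March to June 2020)."""
    months_since_start = (infection.year - 2020) * 12 + (infection.month - 3)
    return months_since_start // 4
```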

We collected a range of baseline comorbidities based on a tailored list of the Elixhauser comorbidities36 and related drug categories, including alcohol abuse, anemia, arrhythmia, asthma, cancer, chronic kidney disease, chronic pulmonary disorders, cirrhosis, coagulopathy, congestive heart failure, chronic obstructive pulmonary disease, coronary artery disease, dementia, diabetes (type 1 or 2), end-stage renal disease on dialysis, hemiplegia, HIV, hypertension, inflammatory bowel disorder, lupus or systemic lupus erythematosus, mental health disorders, multiple sclerosis, Parkinson’s disease, peripheral vascular disorders, pulmonary circulation disorder, rheumatoid arthritis, seizure/epilepsy, severe obesity (BMI ≥ 40 kg/m2), weight loss, Down syndrome, other substance abuse, cystic fibrosis, autism, sickle cell disease, obstructive sleep apnea, Epstein-Barr virus infection and infectious mononucleosis, herpes zoster, corticosteroid prescriptions, and immunosuppressant prescriptions. Patients in PCORnet were considered to have a condition if they had at least one corresponding diagnosis or medication record in the three years before the COVID-19 index date; in N3C, a condition was defined as any corresponding diagnosis or medication in the data (starting in 2018) before the COVID-19 index date. N3C used OMOP concept sets to match the corresponding PCORnet variables but did not include cirrhosis, multiple sclerosis, lupus, Parkinson’s disease, seizure/epilepsy, cystic fibrosis, autism, Epstein-Barr virus infection and infectious mononucleosis, or herpes zoster as conditions. Corticosteroid and immunosuppressant prescription variables were created using the same drug codes as in PCORnet.
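The difference in comorbidity look-back between the two networks can be made explicit in a small helper. This is a hypothetical sketch: only records strictly before the index date are counted, the PCORnet window is approximated as 3 * 365 days, and N3C data are assumed to begin on January 1, 2018.

```python
from datetime import date, timedelta
from typing import Iterable

N3C_DATA_START = date(2018, 1, 1)  # assumption: "starting in 2018" read as January 1, 2018


def has_baseline_condition(index: date, record_dates: Iterable[date], network: str) -> bool:
    """True if any diagnosis or medication record for the condition falls in the look-back window."""
    if network == "PCORnet":
        start = index - timedelta(days=3 * 365)
    elif network == "N3C":
        start = N3C_DATA_START
    else:
        raise ValueError(f"unknown network: {network}")
    return any(start <= d < index for d in record_dates)
```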

Follow-up period

We followed each patient from 30 days after the index date until the first occurrence of the target outcome, documented death, loss to follow-up in the database, 180 days after the index date, or the end of the observation window (June 30, 2023), whichever came first.
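The censoring rule can be sketched as a function returning a follow-up time and an event status, with death coded separately so that it can be handled as a competing risk. The status codes, argument names, and tie-breaking order (outcome before death before censoring on the same day) are assumptions for illustration; time is counted in days from day 30 after the index date, and the outcome date passed in is assumed to be the first qualifying post-acute outcome.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

OBSERVATION_END = date(2023, 6, 30)
CENSORED, OUTCOME, DEATH = 0, 1, 2
_TIE_ORDER = {OUTCOME: 0, DEATH: 1, CENSORED: 2}  # assumed priority for same-day ties


def follow_up(index: date, outcome: Optional[date] = None, death: Optional[date] = None,
              last_seen: Optional[date] = None) -> Tuple[int, int]:
    """Return (days from day 30 to the end of follow-up, status code)."""
    start = index + timedelta(days=30)
    candidates = [(index + timedelta(days=180), CENSORED), (OBSERVATION_END, CENSORED)]
    if last_seen is not None:
        candidates.append((last_seen, CENSORED))  # loss to follow-up
    if death is not None and death > start:
        candidates.append((death, DEATH))
    if outcome is not None and outcome > start:
        candidates.append((outcome, OUTCOME))
    end, status = min(candidates, key=lambda c: (c[0], _TIE_ORDER[c[1]]))
    return (end - start).days, status
```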

Statistical analyses

For each individual in the pregnant group, SARS-CoV-2-infected non-pregnant comparators were exactly matched 1:3 on site region, age, infection time, acute severity, and selected baseline comorbidities (diabetes, hypertension, autoimmune disease or immunosuppression, mental health disorders, severe obesity, and asthma). In the matched pregnant and non-pregnant cohorts, relative risks were further adjusted via inverse probability of treatment weighting (IPTW) using a broader range of baseline covariates. Propensity scores were estimated with L2-regularized logistic regression using all baseline covariates as independent variables20,37. Stabilized IPTW weights were used, and extreme weights below the 1st or above the 99th percentile were trimmed to reduce variability38. Covariate balance was assessed using standardized mean differences (SMD), with an absolute SMD below 0.1 considered balanced. Cumulative incidence in each group was estimated with the Aalen-Johansen estimator in the matched and reweighted population, treating death as a competing risk39. Hazard ratios were estimated with Cox proportional hazards models in the matched and reweighted population, and two-sided 95% confidence intervals were calculated using a robust variance estimator to account for the stabilized IPTW weights. The absolute risk reduction was the difference in cumulative incidence at 180 days of follow-up between the pregnant and non-pregnant groups.
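The weighting, balance, and cumulative-incidence steps can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: propensity scores are taken as given (for example, from an L2-penalized logistic regression such as scikit-learn's `LogisticRegression`, whose default penalty is L2), "trimming" is read here as capping weights at the 1st and 99th percentiles (removing those individuals is another common reading), and the function names are hypothetical. The Aalen-Johansen function returns the weighted cumulative incidence of the outcome (status 1) by a horizon, treating death (status 2) as a competing event and status 0 as censoring.

```python
import math
import statistics
from typing import List, Sequence


def stabilized_iptw(ps: Sequence[float], treated: Sequence[int]) -> List[float]:
    """Stabilized IPTW weights, capped at their 1st and 99th percentiles."""
    p_treated = sum(treated) / len(treated)  # marginal probability of pregnancy
    raw = [p_treated / e if t == 1 else (1 - p_treated) / (1 - e)
           for e, t in zip(ps, treated)]
    cuts = statistics.quantiles(raw, n=100, method="inclusive")
    lo, hi = cuts[0], cuts[-1]  # 1st and 99th percentiles
    return [min(max(w, lo), hi) for w in raw]


def weighted_smd(x: Sequence[float], treated: Sequence[int], w: Sequence[float]) -> float:
    """Weighted standardized mean difference; |SMD| < 0.1 is read as balanced."""
    def mean_var(group: int):
        pairs = [(xi, wi) for xi, t, wi in zip(x, treated, w) if t == group]
        total = sum(wi for _, wi in pairs)
        mean = sum(xi * wi for xi, wi in pairs) / total
        var = sum(wi * (xi - mean) ** 2 for xi, wi in pairs) / total
        return mean, var

    m1, v1 = mean_var(1)
    m0, v0 = mean_var(0)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


def weighted_aalen_johansen(time: Sequence[float], status: Sequence[int],
                            w: Sequence[float], horizon: float) -> float:
    """Weighted cumulative incidence of status 1 by `horizon`, with status 2 competing."""
    surv, cif = 1.0, 0.0  # all-cause event-free probability just before t, and the CIF
    event_times = sorted({t for t, s in zip(time, status) if s != 0 and t <= horizon})
    for t in event_times:
        at_risk = sum(wi for ti, wi in zip(time, w) if ti >= t)
        d1 = sum(wi for ti, s, wi in zip(time, status, w) if ti == t and s == 1)
        d2 = sum(wi for ti, s, wi in zip(time, status, w) if ti == t and s == 2)
        cif += surv * d1 / at_risk
        surv *= 1 - (d1 + d2) / at_risk
    return cif
```

The absolute risk difference is then the difference between the two groups' `weighted_aalen_johansen` values at the 180-day horizon, expressed in the same time units as `time`. For the hazard ratios, one established option (not necessarily the one the authors used) is lifelines' `CoxPHFitter.fit` with `weights_col` and `robust=True`, which yields a sandwich variance estimate suited to non-integer weights.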

Subgroup analyses were conducted by stratifying patients in both the pregnant and non-pregnant groups by self-reported race/ethnicity, maternal age, trimester of infection, variant period approximated by infection time [ancestral strain (March 2020–September 2020), Alpha (October 2020–May 2021), Delta (June 2021–November 2021), Omicron BA.1 and BA.2 (December 2021–March 2022), and other Omicron sub-variants (April 2022–October 2022)], body mass index, baseline comorbidities (diabetes, hypertension, asthma, and class III obesity), and vaccination status. For the analyses stratified by variant period, we further adjusted for infection time categorized into four-month bins. To assess the robustness of results in the two cohorts, the unspecified PASC diagnosis codes U099 or B948 and the post-acute cognitive, fatigue, and respiratory conditions were examined in both PCORnet and N3C, for the overall population and for the different subpopulations.
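Assigning each infection to a variant period is a date lookup. In this sketch the period boundaries are taken from the subgroup definitions, with the ancestral period assumed to run through September 2020 so that it ends where the Alpha period begins; the labels and function name are hypothetical, and infections outside all periods return `None`.

```python
from datetime import date
from typing import Optional

# (label, first day, last day); labels are hypothetical
VARIANT_PERIODS = [
    ("ancestral", date(2020, 3, 1), date(2020, 9, 30)),
    ("alpha", date(2020, 10, 1), date(2021, 5, 31)),
    ("delta", date(2021, 6, 1), date(2021, 11, 30)),
    ("omicron_ba1_ba2", date(2021, 12, 1), date(2022, 3, 31)),
    ("omicron_other", date(2022, 4, 1), date(2022, 10, 31)),
]


def variant_period(infection: date) -> Optional[str]:
    """Variant period containing the infection date, or None if outside all periods."""
    for label, first, last in VARIANT_PERIODS:
        if first <= infection <= last:
            return label
    return None
```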

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.