Introduction

Increased neuropsychiatric sequelae associated with the COVID-19 pandemic have been reported worldwide1,2. However, it remains unclear to what extent these effects are attributable to SARS-CoV-2 infection itself versus broader pandemic-related stressors and mitigation strategies2,3,4. Similar to adults, children and youths are also susceptible to experiencing enduring neuropsychiatric and related conditions after an acute COVID-19 infection5,6. Although significant research has been conducted on Post-acute Sequelae of SARS-CoV-2 Infection (PASC) in the adult population, there remains a notable gap in studies pertaining to pediatric cases7,8. Children and youths often exhibit distinct symptoms compared to adults and typically experience a milder acute disease trajectory, with a reduced risk of hospitalization or mortality, especially in cases where pre-existing conditions are absent9,10,11,12. Given these variations in acute infection profiles and prevalence in children and youths as compared with adults, it is imperative to separately investigate the characteristics of PASC in the pediatric population in well-controlled studies.

There are existing studies with large pediatric samples investigating neuropsychiatric conditions in pediatric populations with and without COVID-19 infection13,14,15. However, the results remain inconclusive due to limitations such as the reliance solely on clinical diagnoses to identify COVID-19 positive and negative cohorts, with only a subset being confirmed with testing14,15. Given that COVID-19 symptoms are often mild or absent in children, some infected individuals may have been misclassified13,14,15. These studies likely underestimated the prevalence of mental health conditions, as many DSM-5-based diagnoses used by clinicians cannot be fully matched to ICD-10-CM codes13,14,15.

In our study, the large electronic health record (EHR) data set allowed us to construct COVID-19 negative cohorts of sufficient size, matched for risk factors and stratified by age. We used both diagnoses and polymerase chain reaction (PCR), antigen, or serology tests to reliably identify COVID-19 positive and negative groups16. Neuropsychiatric and related conditions were identified by a typology developed to query EHR data for the full spectrum of DSM-5 disorders17. The primary objective of this retrospective cohort study was to ascertain the risk of developing neuropsychiatric and related conditions during the pandemic in children and youths who had tested positive for COVID-19, compared to those who tested negative and never had a positive test over the same time interval. To achieve this, we utilized EHR data collected from twenty-five children’s hospitals and healthcare institutions across the United States from the RECOVER program. First, we calculated the raw frequency of any neuropsychiatric and related conditions, both before and after the onset of the pandemic. We then conducted a difference-in-differences analysis to determine whether contracting SARS-CoV-2 increased the risk of being diagnosed with neuropsychiatric and related conditions compared to the COVID-19 negative group, with both groups being exposed to the psychosocial stressors of the pandemic.

Results

Frequency of post-acute neuropsychiatric related events for COVID-19-positive and COVID-19-negative patients

The detailed cohort construction procedure is shown in Fig. 1 and the Methods section. The baseline description of covariates in both cohorts is presented in Table 1. As shown in Tables 2 and 3, there were small increases in the frequency of any neuropsychiatric and related condition in the post-COVID phase (compared to pre-COVID) for both COVID-19 positive and COVID-19 negative groups, among children (COVID-19 positive cohort: 12·45% to 14·01%; COVID-19 negative cohort: 11·6% to 12·48%) as well as youths (COVID-19 positive cohort: 16·0% to 17·86%; COVID-19 negative cohort: 15·55% to 16·76%).

Fig. 1: Selection of participants for both COVID-19-positive and COVID-19-negative patients, stratified by age (children and youths).

Table 1 Baseline demographic and health characteristics of COVID-19 positive and negative groups, stratified by age into children (5 to 11 years) and youths (12 to 20 years)
Table 2 Raw frequency of individual and composite neuropsychiatric and related conditions before and after the index date in the children cohort (5–11 years)
Table 3 Raw frequency of individual and composite neuropsychiatric and related conditions before and after the index date in the youths cohort (12–20 years)

During the post-acute phase, both the child and youth COVID-19 positive groups displayed a higher frequency than their respective COVID-19 negative groups in the composite outcome and across various categories, including adverse childhood experience, anxiety disorders, mood disorders, neurocognitive disorders, neurodevelopmental disorders, sleep-wake disorders, standalone symptoms, and substance use and dependence. Additionally, the child COVID-19 positive group has a higher prevalence than the COVID-19 negative group in eating and feeding disorders, intentional self-harm/suicidality, personality disorders, psychotic disorders, and tic disorders.

Risk difference of post-acute neuropsychiatric outcomes after SARS-CoV-2 infection

As shown in Figs. 2 and 3, after propensity score matching and interrupted time-series analysis, both the child and youth COVID-19 positive groups retained significant risk differences compared to their respective negative groups in the composite outcome (children: 0·96%, 95% CI [0·75%, 1·16%]; youths: 0·84%, [0·53%, 1·15%]). The child COVID-19 positive group also exhibited significant risk differences for anxiety disorder (0·26%, [0·19%, 0·33%]), OCD (0·02%, [0·00%, 0·04%]), somatoform disorder (0·03%, [0·00%, 0·05%]), stress disorder (0·08%, [0·02%, 0·14%]), avoidant/restrictive food intake (0·07%, [0·03%, 0·11%]), bipolar disorder (0·01%, [0·00%, 0·02%]), delirium (0·04%, [0·02%, 0·06%]), ADHD (0·11%, [0·02%, 0·21%]), autism spectrum disorder (0·10%, [0·02%, 0·18%]), communication/motor disorder (0·38%, [0·25%, 0·52%]), intellectual disability (0·12%, [0·05%, 0·20%]), and tic disorder (0·05%, [0·02%, 0·08%]).

Fig. 2: Risk difference of post-acute COVID-19 neuropsychiatric and related conditions in children aged 5–11, compared to the COVID-19-negative cohort.

Outcomes include cluster-level conditions across adverse childhood experiences, anxiety disorders, disruptive behavior disorders, eating and feeding disorders, elimination disorders, gender dysphoria/sexual dysfunction, intentional self-harm/suicidality, mood disorders, neurocognitive disorders, neurodevelopmental disorders, personality disorders, psychotic disorders, sleep-wake disorders, standalone symptoms, substance use and dependence, and tic disorders. The composite outcome refers to the occurrence of any listed neuropsychiatric or related condition. The sample size was 141,349 for the COVID-19 positive group and 441,790 for the COVID-19 negative group. Risk differences and 95% confidence intervals are shown. Red lines indicate statistically significant differences (p < 0.05), while gray lines indicate non-significant findings. P-values were calculated using two-sided t-tests; no adjustments were made for multiple comparisons.

Fig. 3: Risk difference of post-acute COVID-19 neuropsychiatric and related conditions in youths aged 12 to 20, compared to the COVID-19-negative cohort.

Outcomes include cluster-level conditions across adverse childhood experiences, anxiety disorders, disruptive behavior disorders, eating and feeding disorders, elimination disorders, gender dysphoria/sexual dysfunction, intentional self-harm/suicidality, mood disorders, neurocognitive disorders, neurodevelopmental disorders, personality disorders, psychotic disorders, sleep-wake disorders, standalone symptoms, substance use and dependence, and tic disorders. The composite outcome refers to the occurrence of any listed neuropsychiatric or related condition. The sample size was 184,725 for the COVID-19 positive group and 445,524 for the COVID-19 negative group. Risk differences and 95% confidence intervals are shown. Red lines indicate statistically significant differences (p < 0.05), while gray lines indicate non-significant findings. P values were calculated using two-sided t tests; no adjustments were made for multiple comparisons.

For the youth cohorts, the COVID-19 positive group had significantly higher risk than the COVID-19 negative cohort for anxiety disorder (0·26%, [0·05%, 0·48%]), suicidality (0·11%, [0·02%, 0·19%]), minor depression (0·21%, [0·05%, 0·37%]), delirium (0·08%, [0·03%, 0·14%]), ADHD (0·33%, [0·16%, 0·50%]), intellectual disability (0·09%, [0·01%, 0·17%]), insomnia (0·13%, [0·06%, 0·21%]), anxiety standalone symptoms (0·05%, [0·00%, 0·10%]), attention standalone symptoms (0·08%, [0·03%, 0·14%]), and depressive standalone symptoms (0·02%, [0·00%, 0·04%]). Note that Figs. 2 and 3 display model-adjusted estimates, whereas Tables 2 and 3 show raw, unadjusted frequencies; differences between them reflect adjustment for baseline risk and time trends.
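To make the distinction between raw and adjusted estimates concrete, the following minimal sketch computes an unadjusted difference-in-differences for the composite outcome directly from the raw frequencies quoted in the Results text. It is illustrative arithmetic only, not the adjusted model behind Figs. 2 and 3; the names `RAW` and `raw_did` are hypothetical.

```python
# Raw composite-outcome frequencies (%) quoted in the Results text,
# stored as (pre-COVID, post-COVID) for each group.
RAW = {
    "children": {"positive": (12.45, 14.01), "negative": (11.6, 12.48)},
    "youths": {"positive": (16.0, 17.86), "negative": (15.55, 16.76)},
}


def raw_did(group):
    """Unadjusted difference-in-differences, in percentage points:
    (change in the positive group) minus (change in the negative group)."""
    pos_pre, pos_post = group["positive"]
    neg_pre, neg_post = group["negative"]
    return round((pos_post - pos_pre) - (neg_post - neg_pre), 2)


children_did = raw_did(RAW["children"])  # (14.01 - 12.45) - (12.48 - 11.6)
youths_did = raw_did(RAW["youths"])      # (17.86 - 16.0) - (16.76 - 15.55)
```

Under these inputs the raw contrasts are 0·68 percentage points for children and 0·65 for youths, which differ from the adjusted composite estimates of 0·96% and 0·84% because the latter also account for propensity score matching and time trends.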

Selective psychotropic medications with the potential to decrease susceptibility to SARS-CoV-2 infection were used by 0·68% of COVID-19 positive children and 0·75% of COVID-19 negative children aged 5–11 years. Among youths, these medications were used by 5·09% of COVID-19 positive patients and 5·36% of COVID-19 negative patients. Detailed results can be found in Supplementary Note 3.

Discussion

Infections have long been linked to neuropsychiatric disorders, as evidenced by reports from the 1890 influenza epidemic, the 1918 Spanish flu, and more recently, a Danish nationwide study18. This study found that children and adolescents who were hospitalized for infections faced an increased risk of subsequent diagnoses of neuropsychiatric disorders and higher rates of psychotropic medication prescriptions. The highest risks following infections were associated with conditions such as schizophrenia, OCD, personality and behavioral disorders, intellectual disability, autism, ADHD, ODD, conduct disorders, and tic disorders18.

In this study, the primary objective was to investigate the impact of COVID-19 infection on the risk of post-acute neuropsychiatric and related conditions in both children and youths. Using real-world EHR data from twenty-five health institutions in the RECOVER program, we conducted a retrospective cohort study of patients 5 to 20 years of age with documented SARS-CoV-2 infection compared to those with a negative test. The findings demonstrate that children and youths with a history of COVID-19 infection exhibited a consistent increase in risk for multiple neuropsychiatric conditions compared to their matched counterparts. Although the effect sizes across multiple outcomes were small, they remained statistically significant, suggesting a pattern of elevated risk that is unlikely to be due to chance alone. These observations align with global reports highlighting the combined effects of SARS-CoV-2 infection and broader pandemic stressors19. Similarly, the larger increases in frequency observed in the older age group in both COVID-19 positive and negative cohorts (1·56% and 0·88%, respectively, for ages 5–11, and 1·86% and 1·21%, respectively, for ages 12–20) echo prior studies suggesting that adolescents and young adults may be disproportionately affected by both the viral infection and pandemic stress compared to younger children19. Recent large-scale studies using EHR data further support this, reporting a higher likelihood of developing new neuropsychiatric and related conditions in both COVID-19 positive and negative adolescents compared to younger children15.

The key findings from our study show that both children and youths in the COVID-19 positive groups retained significant risk differences compared to their respective negative groups for the composite neuropsychiatric outcome (as shown in Figs. 2 and 3). The risk difference was slightly higher in children than in youths. Additionally, differences across diagnostic categories were observed between the two age groups. Among children with infection, the highest risk difference was seen for communication/motor disorders, followed by anxiety, intellectual disability, ADHD, and autism spectrum disorder. Other conditions, such as stress-related disorders, avoidant/restrictive food intake, tics, delirium, somatoform disorders, OCD, and bipolar disorder, had risk differences ranging from 0·08% to 0·01%. In youths with infection, the highest significant risk difference was for ADHD, followed by anxiety disorders, minor depression, insomnia, and suicidality. Intellectual disability, delirium, and standalone symptoms of attention, anxiety, and depression had risk differences ranging from 0·09% to 0·02%. The small increases in risk found in our study are consistent with studies indicating that infections may account for only a small proportion of the risk for neuropsychiatric and related conditions20. That same study also showed that polygenic risk scores for infections were associated with a modest increase in risk for ADHD, major depression, and schizophrenia. In our study, increased risk for ADHD was found in both the child and youth COVID-19 positive cohorts, and increased risk for minor depression was found in the youth cohort, while risks for disorders that are more common in older age ranges would be less likely to be detected. Although the absolute differences in risk were small, they may still hold relevance in a public health context, as even slight increases in childhood neuropsychiatric conditions could have broader implications for healthcare burden and developmental trajectories.

Our study has several notable strengths. Firstly, by leveraging EHR data from over twenty clinical institutions nationwide as part of the RECOVER program, our research presents the most comprehensive investigation to date of the impact of SARS-CoV-2 infection on neuropsychiatric and related conditions in U.S. children and youths21. Secondly, our approach included a longer follow-up period than most existing studies. Specifically, our follow-up extended until December 2022, encompassing the emergence of the Omicron variant. Thirdly, we accounted for pre-infection differences in the risk of neuropsychiatric and related conditions by employing the difference-in-differences method. This approach allowed us to estimate the additional contribution of SARS-CoV-2 infection beyond general pandemic effects while accounting for baseline disparities in neuropsychiatric and related conditions. Additionally, we enhanced our analysis by adjusting for over 200 potential confounders through propensity score stratification. This method ensured a balanced comparison between the SARS-CoV-2-infected and non-infected groups22. Lastly, our study’s comprehensive scope, examining 50 neuropsychiatric and related conditions at both individual disorder and category levels, enabled a more nuanced assessment of the impact of SARS-CoV-2 infection on mental health. While the inability of previous studies using limited ICD codes to detect increased outcomes during the pandemic cannot be attributed solely to their reliance on ICD codes22, limited outcome definitions and reduced diagnostic granularity likely contributed. By integrating SNOMED CT, we aimed to reduce the risk of missed diagnoses and improve detection sensitivity, while acknowledging that all code-based EHR analyses remain constrained by practitioner documentation practices. SNOMED CT provides detailed, structured input during patient care, while ICD codes enable standardized data retrieval and secondary analysis23.
Our EHR-based pediatric mental health typology identified 4047 SNOMED CT codes, covering 49 diagnostic clusters and one composite outcome, mapped to ICD-CM for billing and administrative compatibility24. This integration improves diagnostic granularity, addressing ICD-based limitations that may underreport neuropsychiatric and related conditions due to clinical documentation variability. While SNOMED CT broadens neuropsychiatric condition capture, it cannot resolve under-detection if symptoms go undocumented due to diagnostic uncertainty, stigma, or system constraints. Ultimately, SNOMED CT enhances diagnostic precision but remains dependent on clinical documentation.

Our study is subject to several limitations that can be considered for future studies. Firstly, identifying a high-quality COVID-19 negative group presents a significant challenge. To mitigate potential misclassification of negative status, we have utilized multiple tests, including PCR, antigen, and serology test results, in addition to diagnosis codes for COVID-19 and long COVID, to refine our definition of the COVID-19 negative group. Despite these efforts, the rapid and dynamic developmental changes experienced by children and youths, such as the physical growth and changes in physiological, cognitive, emotional, and social domains, suggest that further enhancements in control selection methods could improve the reliability of our findings. We also acknowledge that, in addition to potential misclassification due to asymptomatic infections, true infection status may also have been misclassified due to community-level testing constraints, particularly as widespread, no-cost testing became less available over time. Despite our efforts to define the COVID-19 negative group using multiple test types and diagnosis codes, some individuals may have experienced asymptomatic or undiagnosed infections—particularly during later stages of the pandemic when pediatric exposure to SARS-CoV-2 was widespread. This misclassification may bias our results toward the null, suggesting that the observed risk differences could underestimate the true impact of infection. Secondly, although we implemented rigorous methods to ensure comprehensive data collection, certain biases may be intrinsic to our study. For example, in youths with more severe symptoms, parents may have been more likely to disclose additional health-related information, potentially leading to reporting biases. 
Also, while the EHR data used in this study capture a wide range of care settings—including primary care, specialty care, and hospital-based services—they are primarily derived from large academic and nonprofit health systems. As such, healthcare encounters that occur in unaffiliated community practices or smaller clinics may be underrepresented, potentially limiting the generalizability of findings to populations served outside these networks. Differential access to clinicians with the appropriate expertise to evaluate neuropsychiatric issues could also have contributed to the underascertainment of such conditions. Additionally, while our analysis incorporated an extensive list of potential confounders available within the EHR database, the inherent limitations of EHR data completeness may still introduce potential confounding bias. Thirdly, while both COVID-19 positive and negative cohorts were exposed to broader societal stressors of the pandemic, including disruptions in daily life, school closures, and healthcare access, direct measures of these stressors were not available in our dataset. Moreover, such variables are generally not captured in EHR data, making it challenging for EHR-based studies to fully disentangle infection-related risks from concurrent pandemic-related exposures. To address this limitation, we used calendar time as a proxy for pandemic-related stressors, ensuring that both groups were compared within the same broader environmental context. This approach allows us to estimate the additional contribution of SARS-CoV-2 infection beyond general pandemic effects, though future research incorporating external data sources on societal factors would provide a more comprehensive understanding of these complex relationships. Moreover, our analysis did not account for participants who may have been infected several times during the study period, a factor that could become increasingly relevant in the later stages of the pandemic.

In summary, in both COVID positive and negative cohorts, we found small increases in frequency in composite neuropsychiatric and related outcomes, slightly higher in the COVID positive group and in the older age groups. These small increases are similar to those reported in other studies and attributed to the combined COVID-19 viral infection and broad pandemic stressors19,25.

Although both the increase in frequency attributed to the combined viral infection and pandemic stress and the risk attributed to the viral infection itself may be small, they raise concern in a pediatric population, given that childhood conditions often have lifelong consequences26,27.

Our results, therefore, indicate an urgent need for well-controlled studies that investigate not only COVID-19 but also other infections known to affect the CNS. Pediatric studies also require cohorts with narrower age stratification, cohorts that include the prenatal period, and adequate follow-up to control for rapid neurodevelopmental changes.

Methods

Inclusion and ethics

This study was conducted using de-identified electronic health record (EHR) data from 25 diverse pediatric healthcare institutions participating in the NIH RECOVER Initiative. Institutional Review Board approval was obtained under a central protocol with waiver of consent and HIPAA authorization, in accordance with all applicable ethical guidelines. The study population includes children and youths from a wide range of racial, ethnic, socioeconomic, and geographic backgrounds, enhancing generalizability and equity considerations. Our analyses sought to understand the neuropsychiatric impacts of SARS-CoV-2 infection while accounting for potential disparities in healthcare access, data availability, and diagnostic practices across sites.

Study design and participants

We conducted a retrospective cohort study using the pediatric EHR cohort of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, which seeks to understand, treat, and prevent long COVID (more information on RECOVER: https://recoverCOVID.org/). The pediatric RECOVER EHR network spans 38 health systems across the United States, of which 25 were included in the study. Institutional Review Board (IRB) approval was obtained under Biomedical Research Alliance of New York (BRANY) protocol #21-08-508, with a waiver of consent and HIPAA authorization. The participating institutions in this study include Ann & Robert H. Lurie Children’s Hospital of Chicago, Children’s Hospital Colorado, Children’s Hospital of Philadelphia, Children’s National Medical Center, Cincinnati Children’s Hospital Medical Center, Duke University, Medical College of Wisconsin, Medical University of South Carolina (MUSC), Montefiore, Nationwide Children’s Hospital, Nemours Children’s Health System (inclusive of the Delaware and Florida health systems), New York University School of Medicine, Northwestern University, OCHIN, Seattle Children’s Hospital, Stanford Children’s Health, University of California, San Francisco, University of Iowa Healthcare, University of Michigan, University of Missouri, University of Nebraska Medical Center, University of Pittsburgh, Vanderbilt University Medical Center, Wake Forest Baptist Health, and Weill Cornell Medical College. These sites were selected based on data completeness and quality, including sufficient follow-up time, documented COVID-19 testing, and complete information on key covariates such as race/ethnicity and obesity. The participating institutions represent a mix of public and private healthcare systems and collectively capture a broad and diverse pediatric population across racial, ethnic, socioeconomic, and geographic backgrounds, enhancing the generalizability of our findings.
Detailed data description can be found in Supplementary Note 1.

In the construction of our COVID-19 positive cohort, we began by identifying individuals who received their first positive COVID-19 PCR, antigen, or serology test and a diagnosis of COVID-19/PASC within the study period from March 1st, 2020, to December 3rd, 2022 (N = 1,017,542). From this initial group, we subsequently filtered for those with at least one medical visit occurring between 28 and 179 days after the index date (follow-up interval)11,28,29,30,31,32,33,34,35,36,37 (N = 787,370) and at least one visit within the 7 days to 24 months leading up to the index date (baseline interval) (N = 676,582). We included only the patients with complete variable records (N = 488,606), and we then restricted the positive cohort to patients aged between five and twenty years at the start of the study period (N = 326,074). Among these individuals, we identified a child cohort with ages 5–11 years (N = 141,349) and a youth cohort with ages 12–20 years (N = 184,725).
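The two visit requirements above can be sketched as a simple eligibility check. This is a minimal illustration rather than the study's actual pipeline: the function names are hypothetical, and "24 months" is approximated as 730 days, an assumption the source does not specify.

```python
from datetime import date, timedelta

# Assumption: "24 months" is approximated as 730 days.
BASELINE_START_DAYS = 730
BASELINE_END_DAYS = 7
FOLLOWUP_START_DAYS = 28
FOLLOWUP_END_DAYS = 179


def has_visit_in_window(visits, index_date, start_offset, end_offset):
    """True if any visit date lies in [index + start_offset, index + end_offset] days."""
    lo = index_date + timedelta(days=start_offset)
    hi = index_date + timedelta(days=end_offset)
    return any(lo <= v <= hi for v in visits)


def is_eligible(visits, index_date):
    """Require one follow-up visit (28 to 179 days after the index date)
    and one baseline visit (24 months to 7 days before the index date)."""
    followup = has_visit_in_window(
        visits, index_date, FOLLOWUP_START_DAYS, FOLLOWUP_END_DAYS
    )
    baseline = has_visit_in_window(
        visits, index_date, -BASELINE_START_DAYS, -BASELINE_END_DAYS
    )
    return followup and baseline
```

For example, a patient with an index date of 1 June 2021 and visits on 1 May 2021 and 1 August 2021 satisfies both windows, whereas a patient whose only prior visit was two days before the index date does not meet the baseline requirement.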

We then constructed a COVID-19 negative cohort consisting of individuals who were not included in the COVID-19 positive group. Specifically, these individuals had no record of a positive COVID-19 test, had at least one documented negative PCR, antigen, or serology test during the study period, and had no recorded diagnoses of COVID-19 or PASC (N = 3,030,550). For this COVID-19 negative group, we imputed index dates randomly from the distribution of index dates observed in the COVID-19 cohort, ensuring that both cohorts shared a similar distribution of follow-up times. We further required that patients in the COVID-19 negative cohort must have had at least one visit between 28 and 179 days after the imputed index date as the follow up period (N = 2,172,217) and at least one visit occurring between 7 days to 24 months before the imputed index date as the baseline period (N = 1,766,033). Similar to the COVID-19 positive cohort, we only included patients with complete variable records (N = 1,416,069) and satisfying age constraints between five and twenty at the start of the study period (N = 887,314). We further stratified the children cohort with ages from five to eleven (N = 441,790) and the youth cohort with ages from twelve to twenty (N = 445,524). Figure 1 displays attrition tables for both COVID-19 positive and negative cohorts.
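One way to read the index-date imputation step is as sampling, with replacement, from the empirical distribution of index dates in the positive cohort. The sketch below assumes that reading; the function name `impute_index_dates` and the fixed seed are illustrative choices, not details taken from the study.

```python
import random
from datetime import date


def impute_index_dates(positive_index_dates, n_negative, seed=0):
    """Draw one index date per COVID-19 negative patient, with replacement,
    from the observed index dates of the COVID-19 positive cohort."""
    # A fixed seed is used only so that this sketch is reproducible.
    rng = random.Random(seed)
    return rng.choices(positive_index_dates, k=n_negative)


# Hypothetical positive-cohort index dates, for illustration only.
example_positive_dates = [date(2020, 11, 15), date(2021, 8, 2), date(2022, 1, 10)]
example_imputed = impute_index_dates(example_positive_dates, n_negative=1000)
```

Because each imputed date is drawn from the positive cohort's own dates, the negative cohort inherits approximately the same calendar-time distribution, which is what allows both groups to be compared under similar pandemic conditions.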

In this research, we utilized covariates assessed before the index date. The predefined covariates were determined based on prior knowledge38,39. They included age; race (Asian/PI, Black/AA, Hispanic, White, multiple, and other); gender (male, female, and other); hospital; body mass index; and hospital utilization, including the number of ED visits, inpatient encounters, and outpatient encounters. We also included the Pediatric Medical Complexity Algorithm (PMCA) index40,41, which classifies children’s chronic disease complexity based on diagnosis codes; the number of negative tests prior to cohort entry; and medical history. To adjust for the timing of the COVID-19 test for the cohorts, we additionally included the calendar month in which a patient tested positive for COVID-19 and entered the cohort.

We also evaluated the use of selective psychotropic medications reported to be sigma-1 receptor ligands of varying affinity, as some prior data suggested their potential capacity to decrease susceptibility to SARS-CoV-2 infection. These included SSRIs (fluvoxamine, fluoxetine, citalopram, and escitalopram) and antipsychotics (haloperidol, chlorpromazine, and fluphenazine)42,43. We evaluated the prevalence of use of these medications in both the COVID-19 positive and the COVID-19 negative cohorts to ensure that their use did not introduce imbalance or bias into our study results.

Outcomes

The outcomes were predetermined based on our prior research on systematically characterizing the post-acute effects of SARS-CoV-2 infection44. We specify our outcomes based on Systematized Nomenclature of Medicine (SNOMED)45, and a typology developed to query aggregated, standardized EHR data for the full spectrum of neuropsychiatric and related conditions. This typology included the pediatric DSM-5 disorder categories including anxiety, OCD, somatic, stress, disruptive behavior, feeding and eating, elimination, gender dysphoria/sexual dysfunction, mood, neurocognitive, neurodevelopmental, personality, psychotic, sleep-wake, substance use, and dependence disorders46. Expansion beyond DSM-5 disorders included intentional self-harm, catatonia, encephalopathies, standalone symptoms, tic disorders, and adverse childhood experiences17.

We also specified a composite outcome of any neuropsychiatric and related condition. Supp Table 1 in Supplementary Note 2 details the definition of the outcomes. To illustrate the granularity offered by SNOMED CT in defining these outcomes, we include a comparative table in Supplementary Note 2, which contrasts SNOMED CT and ICD coding granularity for neuropsychiatric and related conditions and supports our use of detailed clinical terminology. Frequencies of each outcome were assessed 24 months to 7 days before and 28 days to 179 days after the index date for children and youths, respectively (Tables 2, 3).

Statistical analyses

We defined the pre-COVID period as the span from 24 months to 7 days before the index date and the post-COVID period as the period from 28 to 179 days after the index date (the post-acute phase). For each neuropsychiatric and related condition, we calculated its frequency in each of the defined periods by dividing the number of patients who were diagnosed during that period by the total number of patients in the cohort.
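The frequency calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (a list of index dates paired with diagnosis dates) and the function names are hypothetical, and 24 months is approximated as 730 days.

```python
from datetime import date, timedelta

PRE_WINDOW = (-730, -7)  # assumption: 24 months taken as 730 days
POST_WINDOW = (28, 179)  # post-acute phase, in days after the index date


def in_window(day, index_date, window):
    """True if `day` falls inside the window of day offsets around the index date."""
    start, end = window
    return index_date + timedelta(days=start) <= day <= index_date + timedelta(days=end)


def period_frequencies(patients):
    """patients: list of (index_date, [diagnosis dates]) tuples.
    Returns (pre %, post %): the share of patients with at least one
    diagnosis in each window, with the whole cohort as the denominator."""
    n = len(patients)
    if n == 0:
        raise ValueError("empty cohort")
    pre = sum(any(in_window(d, idx, PRE_WINDOW) for d in dx) for idx, dx in patients)
    post = sum(any(in_window(d, idx, POST_WINDOW) for d in dx) for idx, dx in patients)
    return 100.0 * pre / n, 100.0 * post / n
```

A patient diagnosed in both windows contributes to both the pre-COVID and the post-COVID numerator, while a patient with no diagnosis still counts in the denominator.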

To assess differences in the risk of neuropsychiatric and related conditions between COVID-19 positive and negative patients, we conducted an interrupted time-series analysis using a two-sample proportion test with stratified cohorts of children and youths. To mitigate the potential impact of measured confounding factors, we employed a propensity score matching method with the covariates outlined in the Covariates section. After matching, we assessed the standardized mean difference (SMD) for each covariate, employing a cutoff value of 0·1. Subsequently, we compared the risk difference in neuropsychiatric and related conditions between the COVID-19 positive and the COVID-19 negative cohort. The characteristic balance results before and after propensity score matching are presented in Supplementary Note 4.
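As an illustration of the balance check and the risk comparison described above, the sketch below computes a standardized mean difference for a binary covariate and a risk difference with a Wald 95% confidence interval. The formulas are standard textbook forms and are assumptions here; the source does not state the exact estimators used, and the function names are hypothetical.

```python
import math


def smd_binary(p_treated, p_control):
    """Standardized mean difference for a binary covariate:
    (p1 - p0) / sqrt((p1(1 - p1) + p0(1 - p0)) / 2)."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    if pooled_sd == 0:
        return 0.0
    return (p_treated - p_control) / pooled_sd


def risk_difference_ci(events_pos, n_pos, events_neg, n_neg, z=1.96):
    """Risk difference (positive minus negative) with a Wald interval
    based on the unpooled variance of the two proportions."""
    p1 = events_pos / n_pos
    p0 = events_neg / n_neg
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n_pos + p0 * (1 - p0) / n_neg)
    return rd, rd - z * se, rd + z * se


def is_balanced(p_treated, p_control, cutoff=0.1):
    """Apply the SMD cutoff of 0.1 mentioned in the text."""
    return abs(smd_binary(p_treated, p_control)) < cutoff
```

With hypothetical prevalences of 52% and 50% for a covariate, the SMD is about 0.04 and the covariate would be considered balanced under the 0·1 cutoff, whereas prevalences of 60% and 50% give an SMD of about 0.20 and would not.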

Sensitivity analysis

We performed comprehensive sensitivity analyses to assess the robustness of our findings. First, we conducted an analysis without age stratification and documented the results in Supplementary Note 5. We also performed an analysis with a different control group, which was defined as patients with at least one negative test and one non-COVID respiratory disease diagnosis within 30 days of the negative test. Details of the study design and results are documented in Supplementary Note 6. Furthermore, our sensitivity analysis included subgroup analyses, presented in Supplementary Notes 7–12, based on gender (male and female), race/ethnicity (Asian/Pacific Islander (PI), Black/African-American (AA), Hispanic, and White), obesity, hospitalization status (non-hospitalized, hospitalized, and admitted to ICU), severity of symptoms (asymptomatic, mild, moderate, and severe), and time frames corresponding to predominant virus variants (pre-Delta, Delta, and Omicron). Additionally, we evaluated the robustness of our inference to the specification of the variance-covariance structure by comparing the model-based standard errors with those estimated using a heteroskedasticity-consistent (robust sandwich) estimator. Results of this analysis, presented in Supplementary Note 13, showed minimal differences in confidence intervals and supported the stability of our findings.
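The comparison between model-based and robust sandwich standard errors can be illustrated with a deliberately simple case. The sketch below assumes a linear probability model with a single binary exposure and uses the HC0 form of the sandwich estimator; the source does not specify the model or the HC variant, so both are assumptions, and `slope_variances` is a hypothetical name.

```python
def slope_variances(x, y):
    """Fit y = a + b*x by ordinary least squares and return
    (b, model-based variance of b, HC0 robust variance of b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Model-based: assumes a single residual variance shared by all observations.
    model_var = (sum(e ** 2 for e in resid) / (n - 2)) / sxx
    # HC0 sandwich: each squared residual enters with its own (x - x_bar)^2 weight.
    robust_var = sum(
        ((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, resid)
    ) / sxx ** 2
    return b, model_var, robust_var


# Hypothetical data: 100 exposed patients (20 events), 300 unexposed (30 events).
x_demo = [1] * 100 + [0] * 300
y_demo = [1] * 20 + [0] * 80 + [1] * 30 + [0] * 270
b_demo, model_var_demo, robust_var_demo = slope_variances(x_demo, y_demo)
```

With a binary exposure, the slope equals the risk difference and the HC0 variance reduces to the unpooled variance of a difference in proportions, p1(1 − p1)/n1 + p0(1 − p0)/n0. Comparing the square roots of the two variances is one way to gauge how much the confidence intervals depend on the variance specification.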

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.