Introduction

SARS-CoV-2 infection in children usually leads to a mild or even asymptomatic illness followed by full recovery, although severe cases have been described throughout the pandemic.1 However, SARS-CoV-2 infection can also be followed by post-acute outcomes. Multisystem Inflammatory Syndrome in Children (MIS-C) is a severe, hyperinflammatory reaction that usually begins two to eight weeks after infection and frequently requires intensive care support.2 Our understanding and management of this condition have since improved, and its incidence has dramatically decreased since the beginning of the pandemic, probably owing to viral changes but also to widespread community immunity from previous infections or vaccination.3

The other post-acute outcome of pediatric SARS-CoV-2 infection is Long COVID (LC). This condition, also known as Post COVID Condition or Post-Acute Sequelae of SARS-CoV-2, is characterized by the persistence of signs and symptoms that were not present before SARS-CoV-2 infection, last at least 8–12 weeks and negatively impact daily life (https://www.who.int/publications/i/item/WHO-2019-nCoV-Post-COVID-19-condition-CA-Clinical-case-definition-2023-1). Long COVID has been reported worldwide in almost all age groups; among pediatric patients, those older than 10 years seem more affected, regardless of the severity of the initial infection.4,5 Although the causes of Long COVID are still unknown, several studies in adults have documented multiple biological abnormalities in Long COVID patients compared with healthy controls, including events associated with thromboinflammation, persistent immune activation and dysregulation.6 However, no study has so far attempted to provide a detailed inflammatory profile of children with Long COVID. This is a major gap in our understanding of this condition in children and, consequently, in how we can advance care for these patients.

Therefore, in this study we aimed to perform an in-depth proteomic assessment of a prospectively characterized cohort of children with Long COVID compared with control groups.

Methods

Patients were enrolled from two prospective cohorts followed in two Pediatric University Departments in Rome, Italy.

Patients with Long COVID were enrolled from the pediatric Post-COVID Unit of Gemelli University Hospital in Rome, Italy, an outpatient clinic that prospectively follows up children with microbiologically confirmed SARS-CoV-2 infection. Children younger than 19 years with laboratory-confirmed SARS-CoV-2 infection (between 01/02/2020 and 15/11/2022), referred to the public post-COVID outpatient unit by the Emergency Department, inpatient wards or family pediatricians in the Rome region, are evaluated in person by a pediatrician specialized in Pediatric Infectious Diseases, Long COVID and Chronic Fatigue Syndrome, as previously published.4,5 In line with the WHO definition (https://www.who.int/publications/i/item/WHO-2019-nCoV-Post-COVID-19-condition-CA-Clinical-case-definition-2023-1), children were diagnosed with Long COVID if symptoms persisted for at least eight weeks after the initial infection, had a negative impact on daily life by impairing the usual routine (e.g., playing the usual sport, attending school, eating habits), and other possible diagnoses had been excluded (anemia, hypothyroidism, autoimmune diseases, celiac disease, parasitic infection, type 1 diabetes).

Patients with MIS-C (sampled during the acute phase of the disease and without post-MIS-C sequelae), children with acute COVID-19 and healthy controls were enrolled at Bambino Gesù Children's Hospital.7,8 Patients with MIS-C fulfilled the WHO MIS-C criteria (https://www.who.int/news-room/commentaries/detail/multisystem-inflammatory-syndrome-in-children-and-adolescents-with-COVID-19) within the CACTUS study, and their full laboratory and clinical characteristics have been described recently.7,9

For each patient, we collected information on demographics (age, sex), severity of acute infection (classified according to the WHO classification, as reported in a previous study3), time between initial infection and first blood sampling, and main persisting symptoms, including fatigue, cardiovascular (e.g., tachycardia and palpitations), respiratory (post-exertional malaise (PEM), chronic cough) and neurocognitive symptoms (including headache), musculoskeletal pain, gastrointestinal problems (e.g., nausea, vomiting, diarrhea) and skin rashes. Data on the dominant circulating variant at the time of infection were collected from the report coordinated by the Italian Superior Health Institute (https://www.epicentro.iss.it/coronavirus/SARS-CoV-2-monitoraggio-varianti).

Proteomic studies

Plasma samples were analyzed using the Olink Inflammation 96-plex panel, based on the highly sensitive and specific proximity extension assay technology.10,11,12 We chose this inflammatory panel in light of growing evidence of chronic inflammation and immune dysregulation in adult patients with Long COVID.6 We hypothesized that Long COVID may also be characterized by chronic inflammation in the pediatric population. To reduce intra- and inter-run variation, data underwent normalization. The pre-processed data are reported as Normalized Protein Expression (NPX), an arbitrary unit on a log2 scale that enables analysis of individual proteins across a sample set. Proteins with values below the limit of detection (LOD) in over 80% of samples were excluded from the dataset.
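The exclusion rule and the log2 meaning of NPX can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory mapping from protein to NPX values and a per-protein LOD (placeholder names and numbers, not study data and not Olink's own software). A protein is dropped only when more than 80% of its values fall below the LOD. Because NPX is log2-scaled, a difference of d NPX units between two groups corresponds to a 2^d-fold difference in relative concentration.

```python
from statistics import mean

def fraction_below_lod(values, lod):
    """Fraction of samples whose NPX value is below the protein's limit of detection."""
    return sum(v < lod for v in values) / len(values)

def filter_proteins(npx, lod, max_below=0.80):
    """Keep proteins with at most `max_below` of samples under LOD ("over 80%" are dropped)."""
    return {p: vals for p, vals in npx.items()
            if fraction_below_lod(vals, lod[p]) <= max_below}

def fold_change(npx_a, npx_b):
    """NPX is log2-scaled, so a mean NPX difference of d equals a 2**d fold change."""
    return 2 ** (mean(npx_a) - mean(npx_b))

# Hypothetical values for illustration only (not study data).
npx = {
    "PROT_A": [5.1, 5.4, 4.9, 5.2, 5.0],  # always above LOD: kept
    "PROT_B": [0.2, 0.1, 0.3, 0.2, 0.4],  # always below LOD: dropped
}
lod = {"PROT_A": 1.0, "PROT_B": 1.0}

kept = filter_proteins(npx, lod)
print(sorted(kept))                         # ['PROT_A']
print(fold_change([6.0, 6.5], [5.0, 5.5]))  # 2.0
```

A protein with exactly 80% of values below the LOD is retained here, since the text excludes only proteins above that threshold.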

Statistical analysis

The proteomics data were provided as NPX on a log2 scale, where a higher NPX value indicates a higher protein concentration. For each continuous variable, normality was assessed using the Shapiro–Wilk test. Homogeneity of variances across groups was evaluated using the Bartlett test if all distributions were normal, and the Fligner–Killeen test otherwise. For comparisons of proteomic data among more than two groups, one-way ANOVA followed by Tukey–Kramer's post hoc test was used for normally distributed data with homogeneous variances; Welch's ANOVA followed by the Games–Howell post hoc test was used when distributions were normal but variances were unequal; and the Kruskal–Wallis test followed by Dunn's post hoc test was used when distributions were not normal. To correct for multiple comparisons, p-values were adjusted using the False Discovery Rate (FDR), and results were deemed statistically significant if the adjusted p-value was less than 0.05. Post hoc tests were performed only on variables with an FDR-adjusted p-value < 0.05. Additionally, we conducted a Principal Component Analysis (PCA) on the proteomics data to provide an overview of the distribution of patients based on their plasma proteomic profile. To mitigate the influence of age, we included only participants aged between 2 and 18 years in each group and ensured that age distributions were evenly spread across groups, with no statistically significant differences between them. Statistical analyses were conducted using R (version 4.3.2).
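The test-selection rule and the FDR gating of post hoc tests can be sketched in Python with SciPy. This is an illustration on simulated data, not the authors' R code. Welch's ANOVA is written out from its standard formula because SciPy has no dedicated function. Benjamini–Hochberg is assumed as the FDR method, since the text does not name the procedure. The post hoc tests themselves (Tukey–Kramer, Games–Howell, Dunn) are omitted.

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's one-way ANOVA (unequal variances). Returns (F, p)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    w = n / np.array([np.var(g, ddof=1) for g in groups])  # precision weights
    grand = np.sum(w * means) / np.sum(w)
    a = np.sum(w * (means - grand) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f = a / (1 + 2 * (k - 2) / (k ** 2 - 1) * tmp)
    df2 = (k ** 2 - 1) / (3 * tmp)
    return f, stats.f.sf(f, k - 1, df2)

def omnibus_test(groups, alpha=0.05):
    """Choose the omnibus test from Shapiro-Wilk and Bartlett/Fligner-Killeen checks."""
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    # Bartlett if every group looks normal, otherwise Fligner-Killeen (informative only:
    # non-normal data go to Kruskal-Wallis regardless of the variance result).
    equal_var = (stats.bartlett if normal else stats.fligner)(*groups).pvalue > alpha
    if not normal:
        return "kruskal", stats.kruskal(*groups).pvalue
    if equal_var:
        return "anova", stats.f_oneway(*groups).pvalue
    return "welch", welch_anova(groups)[1]

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment (step-up, monotone, capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Hypothetical NPX values for 5 proteins in 4 groups sized like the cohort (not study data).
rng = np.random.default_rng(0)
sizes = [34, 32, 27, 19]  # LC, SARS-CoV-2+, MIS-C, HC
results = []
for shift in [1.5, 0.0, 0.0, 0.8, 0.0]:  # shift applied to the first group only
    groups = [rng.normal(5.0 + (shift if i == 0 else 0.0), 1.0, size=s)
              for i, s in enumerate(sizes)]
    results.append(omnibus_test(groups))

adj = bh_adjust([p for _, p in results])
needs_post_hoc = adj < 0.05  # post hoc tests only where the FDR-adjusted p < 0.05
print([t for t, _ in results], np.round(adj, 4), needs_post_hoc)
```

In R, `p.adjust(p, method = "BH")` performs the same adjustment.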

Machine learning

We used a proteomics dataset comprising 92 features. To ensure data integrity, variables with null values were excluded, reducing the feature count to 87. The final dataset of 112 patients (34 LC, 32 SARS-CoV-2+, 27 MIS-C, 19 HC) was divided into training (60%, n = 67) and testing (40%, n = 45) datasets. Using the Python scikit-learn library, we then applied Recursive Feature Elimination with a Random Forest estimator (RFE-RF), with 10-fold cross-validation on the training dataset, to reduce the variable set to 20 candidate model inputs. Through iterative experimentation, we identified the 7 of these 20 variables that yielded the best-performing model. We evaluated six machine learning algorithms: Logistic Regression, k-Nearest Neighbors (k-NN), Linear Support Vector Machine (SVM), Radial Basis Function Support Vector Machine (RBF-SVM), Random Forest and eXtreme Gradient Boosting (XGBoost). During training, the hyperparameters of each model were optimized with the GridSearchCV class, using the StratifiedKFold class for cross-validation. Model performance on the testing dataset was evaluated using accuracy, sensitivity and specificity derived from the confusion matrix (Fig. 3e). Based on these metrics, the Logistic Regression model showed the best performance, with an accuracy of 0.93, a specificity of 0.86 and a sensitivity of 0.97.
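The pipeline above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. It uses a synthetic matrix from `make_classification` in place of the NPX data and treats the task as LC versus non-LC. A second RFE pass from 20 to 7 features stands in for the unspecified "iterative experimentation", and the grid of regularization strengths `C` is hypothetical. Only Logistic Regression is shown; the other algorithms, including XGBoost's scikit-learn-compatible classifier, would plug into the same `GridSearchCV` call.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 112 x 87 NPX matrix; y = 1 for LC, 0 otherwise (hypothetical).
X, y = make_classification(n_samples=112, n_features=87, n_informative=7,
                           weights=[0.7, 0.3], random_state=0)

# 60/40 stratified split: 67 training and 45 testing samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)

# RFE with a Random Forest estimator, fitted on the training set only: 87 -> 20 -> 7 features.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rfe20 = RFE(rf, n_features_to_select=20, step=0.1).fit(X_tr, y_tr)
idx20 = np.flatnonzero(rfe20.support_)
rfe7 = RFE(rf, n_features_to_select=7, step=1).fit(X_tr[:, idx20], y_tr)
idx7 = idx20[rfe7.support_]

# Hyperparameter search for Logistic Regression with stratified 10-fold CV.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(model, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
                    cv=cv, scoring="accuracy")
grid.fit(X_tr[:, idx7], y_tr)

# Accuracy, sensitivity and specificity from the test-set confusion matrix (LC = positive).
y_pred = grid.predict(X_te[:, idx7])
tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Because RFE is fitted once on the whole training set in this sketch, the subsequent cross-validated scores can be optimistic. Placing the RFE step inside the pipeline, so that it is refitted within each fold, avoids this leakage at a higher computational cost.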

Results

One hundred and twelve children were included in the study and underwent proteomic analyses: 34 children fulfilling the clinical criteria for Long COVID, 32 with acute SARS-CoV-2 infection, 27 with MIS-C and 19 healthy controls. Demographic characteristics were similar across study groups. Children with Long COVID were mostly infected during the Omicron wave and had a mild acute infection. Fatigue, neurocognitive problems and post-exertional malaise were the most common persisting symptoms reported by patients with Long COVID. Further details are available in Table 1.

Table 1 Study population

Children with Long COVID present a distinct proteomic profile compared with HC, SARS-CoV-2-infected children and MIS-C

To explore the proteomic profile of LC, we analyzed 92 plasma proteins, mainly involved in inflammation, in LC compared with SARS-CoV-2-infected children (n = 40), HC (n = 36) and children with multisystem inflammatory syndrome (MIS-C) (n = 27). To avoid age- and sex-related proteomic biases,13 groups were harmonized so that the age of the control groups matched that of the LC group (see Supplementary Fig. 1A), leaving 32 SARS-CoV-2-infected children and 19 HC in the age-matched analysis. Clinical characteristics are provided in Table 1. Principal Component Analysis of the whole proteomic profile separated LC from the age-matched groups of HC, MIS-C and SARS-CoV-2+ children (Fig. 1a). Whereas no clear clustering by sex was observed, PC2 (Fig. 1a) particularly separated LC from the other groups. The top contributing proteins for PC2 (Fig. 1b) were mainly proinflammatory chemokines (CXCL1, CXCL5, CXCL6, CXCL8) together with TNFSF11, all showing higher values in LC than in the other groups. Other proteins influencing the clustering, suggestive of an ongoing inflammatory storm, were mainly driven by the MIS-C group, which showed higher values of CXCL10, CDCP1, IL18R1, CCL19 and CCL23. In addition, IL-10, an inhibitory cytokine, was significantly higher in MIS-C and SARS-CoV-2-infected children, suggesting a compensatory effect during the acute phase of infection or inflammation. The time from infection to sampling for the SARS-CoV-2, MIS-C and LC groups is reported in Supplementary Fig. 1C–E.
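A PCA and the per-protein contributions used to rank proteins for a given component can be sketched in NumPy. This is an illustration on hypothetical data. The source does not state whether the NPX matrix was scaled before PCA, so the sketch assumes centering only. The contribution of a variable to a component is taken as 100 times its squared loading, a common convention under which contributions to each component sum to 100%.

```python
import numpy as np

def pca_with_contributions(X, n_components=2):
    """PCA by SVD of the column-centered matrix.

    Returns sample scores, the % contribution of each variable to each PC
    (100 * squared loading, since each loading vector has unit length),
    and the fraction of variance explained by each PC."""
    Xc = X - X.mean(axis=0)  # assumption: centering only, no scaling
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    contributions = 100 * vt[:n_components].T ** 2  # shape: (n_variables, n_components)
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, contributions, explained

# Hypothetical NPX matrix: 60 samples x 8 proteins, with a correlated block
# (PROT_0..PROT_2) raised in a subgroup of 20 samples (not study data).
rng = np.random.default_rng(1)
names = [f"PROT_{j}" for j in range(8)]
X = rng.normal(5.0, 1.0, size=(60, 8))
X[:20, :3] += 2.5

scores, contrib, explained = pca_with_contributions(X)
pc = 1  # index 1 = PC2
top = [names[j] for j in np.argsort(contrib[:, pc])[::-1][:3]]
print("explained variance:", np.round(explained, 2))
print("top contributors to PC2:", top)
```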

Fig. 1: Proteomic profile of the study population.

Panel a shows the Principal Component Analysis of the proteomic profile across all groups. The lollipop graph in panel b shows the top contributing proteins to PC2. Panel c depicts a Venn diagram showing the number of DEPs across the analyzed comparisons. Notes on the left side of the graph indicate DEPs overlapping across comparisons.

This observation was further supported by differential analysis, which identified differentially expressed proteins (DEPs) when comparing LC with age-matched SARS-CoV-2-infected children (DEPs = 24), MIS-C (DEPs = 18) and HC (DEPs = 16) (Venn diagram in Fig. 1c). LC showed a distinct protein pattern, as only a few DEPs (n = 5) were shared by all group comparisons (Fig. 1c).

The features distinguishing LC from the other groups were mainly higher levels of proinflammatory proteins, including OSM, a growth regulator involved in IL-6 production, and STAMBP, a mediator of cytokine signal transduction in the JAK-STAT cascade (violin plots in Fig. 2a). Among the top-ranking DEPs distinguishing LC from the other conditions, several CXCL family members, including CXCL8, CXCL11 and CXCL5, were higher in LC than in the other groups; these chemokines are particularly involved in inflammation and angiogenesis (Fig. 2b). Conversely, CXCL9 and CXCL10 were higher in MIS-C than in LC. Both proteins lack the ELR (Glu-Leu-Arg) motif and are therefore angiostatic chemokines;14 in MIS-C they may reflect a counter-regulatory response to the cytokine storm, consistent with the higher IL-10 levels.

Fig. 2: Cytokine expression and gene ontology in the study groups.

Violin plots in panel a show the proteins distinguishing LC from the other groups. The lollipop graph in panel b shows the most informative DEPs for the comparisons of LC vs MIS-C, SARS-CoV-2 and HC. Panel c shows the KEGG enrichment analysis, depicting the number of DEPs enriched per pathway.

Enrichment analysis of the DEPs using KEGG pathways further supported these results, showing enrichment in the cytokine-cytokine receptor interaction pathway (Fig. 2c). Further suggesting a role of these chemokines in inflammation and angiogenesis, DEPs enriched in the JAK-STAT and TNF signaling pathways were found, especially among the DEPs between LC and SARS-CoV-2-infected children.
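Over-representation of DEPs in a pathway is commonly assessed with a one-sided hypergeometric test. The source does not state which enrichment test or background set was used, so the following Python sketch is an assumption. It takes the proteins measured on the panel as the background, a choice that matters for targeted panels: testing against the whole genome would make almost any cytokine pathway look enriched. All counts are hypothetical.

```python
from math import comb
from scipy.stats import hypergeom

def enrichment_p(overlap, n_deps, n_pathway, n_background):
    """One-sided hypergeometric P(X >= overlap): chance of at least `overlap`
    pathway members among `n_deps` proteins drawn from `n_background` proteins."""
    return hypergeom.sf(overlap - 1, n_background, n_pathway, n_deps)

def enrichment_p_exact(overlap, n_deps, n_pathway, n_background):
    """Same probability summed explicitly with binomial coefficients (stdlib only)."""
    total = comb(n_background, n_deps)
    upper = min(n_pathway, n_deps)
    return sum(comb(n_pathway, x) * comb(n_background - n_pathway, n_deps - x)
               for x in range(overlap, upper + 1)) / total

# Hypothetical counts for illustration only (not study results):
# 90 panel proteins, 15 in the pathway, 20 DEPs, 8 of which fall in the pathway.
p = enrichment_p(overlap=8, n_deps=20, n_pathway=15, n_background=90)
print(p)
```

When several pathways are tested, their p-values would then be FDR-adjusted in the same way as the protein-level p-values.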

Proteomic analysis is able to segregate LC patients with chronic fatigue

We then related clinical involvement in LC (shown in Table 1) to the proteomic data to investigate whether specific signatures reflected clinical characteristics. Overall, the proteomic profile did not reflect the cumulative number of systems involved in LC, as shown by the PCA in Fig. 3a. However, when clinical involvement was analyzed system by system, PCA separated LC patients presenting with chronic fatigue from LC patients without fatigue (Fig. 3b). A few specific proteins, including FGF21, a member of the fibroblast growth factor family, were significantly higher in patients with fatigue. Conversely, proinflammatory members of the TNF pathway involved in apoptosis and cell proliferation, such as TNFRSF9 and TNFSF12, as well as a modulator of NK cell cytolytic activity (CD244), were higher in LC patients without fatigue (Fig. 3c, d). A similar pattern was found in LC patients with respiratory involvement, in whom TNFRSF9 and TNFSF12 levels were significantly higher than in other LC patients (Supplementary Fig. 2A–C).

Fig. 3: Proteomic profile in children with Long COVID according to main clinical presentation and diagnostic performance of the proteomic signature.

Panel a shows the PCA of LC patients annotated by the number of clinical systems involved; panel b shows the PCA annotated by presence or absence of fatigue. Panels c and d show, respectively, a lollipop graph and violin plots of the DEPs found between LC with and without fatigue. Panel e shows the confusion matrix. Panel f shows the ROC curve of the ML model, with the accuracy, sensitivity and specificity of the model in predicting LC from proteomic data.

Machine learning analysis informed by proteomics identifies LC patients

To define the ability of the proteomic profile to identify LC, we employed a Machine Learning (ML) approach (Fig. 3e, f). Building on the differential analysis described above, we first performed feature selection. After training the model on proteomic data from 60% of patients (n = 67), we tested its performance on the remaining 40% of the cohort (n = 45). The model identified LC with an accuracy of 0.93, a specificity of 0.86 and a sensitivity of 0.97. To further test the model, we assessed its ability to identify LC in 25 patients (17 HC, 8 SARS-CoV-2+) who had initially been excluded because their age differed from that of the LC group.

When tested on this additional group, the model correctly classified all patients as not having Long COVID, suggesting that the score, trained on a distinct cohort, could also be useful in other age groups. These results, albeit limited by the small sample size, which affects the stability of the model, suggest that a score informed by proteomic data may help identify post-acute chronic conditions such as LC.

Discussion

This is the first study providing an extensive proteomic analysis of pediatric patients with Long COVID compared with control groups. Overall, we found that children with Long COVID present distinct proteomic signatures, with preferential overexpression of pro-inflammatory and pro-angiogenic cytokines compared with controls. These data suggest, for the first time, that Long COVID in children is not merely a mental health condition associated with pandemic restrictions, but a specific condition characterized by a pro-inflammatory signature, in line with several studies performed in adult patients.

Although Long COVID in children was first documented following infections in 2020,15 and its presence has been consistently observed by healthcare providers and family organizations globally, there has been ongoing discussion about whether it is rooted in physical health or is a psychological repercussion of enforced social restrictions.16 The main reason for questioning the reality of pediatric Long COVID was that several studies including control groups found that symptoms such as fatigue, headache and pain were relatively common in children regardless of previous SARS-CoV-2 infection, suggesting that social restrictions rather than the virus might have caused these symptoms.16 However, these studies were mostly based on online surveys of self-reported symptoms, and we have recently documented that symptoms were also commonly self-reported by children classified as completely recovered from COVID-19 after careful medical evaluation.17 In fact, the WHO Long COVID definition requires that symptoms have a negative impact on daily life (https://www.who.int/publications/i/item/WHO-2019-nCoV-Post-COVID-19-condition-CA-Clinical-case-definition-2023-1), a detail usually missed by survey-based studies.

To help resolve this debate, we performed in-depth immunological profiling of patients. For our study, we carefully selected children with Long COVID among those reporting persisting symptoms, including only those whose symptoms impacted daily life. This well-characterized cohort of children with Long COVID formed a clearly distinct immunological group compared with healthy controls and the other control groups, all similar in age and sex distribution. In this regard, the impact of age on the proteome has been shown previously, highlighting the importance of comparing groups of similar age.18,19 In particular, four inflammatory cytokines (CXCL8, CCL7, CXCL11 and OSM) and two angiogenic factors (VEGF, TNFSF14) were particularly differentially expressed in patients with Long COVID compared with healthy controls.

CXCL8 is the most potent human neutrophil-attracting chemokine; it plays crucial roles in the response to infection and regulates endothelial adhesion, chemotaxis and activation of other leukocytes and mast cells.20 Interestingly, a neutrophil-associated inflammatory phenotype has been shown in adults with Long COVID, and evidence of neutrophil extracellular traps (NETs) was found in the blood of these individuals, as reflected by higher levels of cytokines including CXCL8,21 which was also differentially expressed in our pediatric cohort. These data suggest that neutrophil-mediated inflammation persists in the long-term phase after COVID-19, including in children, opening new avenues to understand this condition and explore potential therapeutics. NETs are highly immunogenic and lead to inflammation and epithelial/endothelial cell death,22 theoretically causing, or at least contributing to, the thromboinflammation that in adults has been shown to be a major event in Long COVID.23 Notably, neutrophil priming and NET formation are poorly responsive to corticosteroids,21 emphasizing that alternative approaches may be needed.

In further support of ongoing inflammation, children with Long COVID exhibited higher levels of CCL7, a potent chemoattractant for a variety of leukocytes, including monocytes, eosinophils, basophils, dendritic cells (DCs), NK cells and activated T lymphocytes.24 CXCL11 levels were also increased. CXCL11 is chemotactic for activated T cells (https://www.ncbi.nlm.nih.gov/gene/6373#:~:text=CXCL11%20promotes%20tumor%20progression%20by,aggressiveness%20of%20breast%20cancer%20cells). This chemokine, along with the others overexpressed in our pediatric Long COVID cohort, was also previously found to be increased in adults with Long COVID with respiratory symptoms.25,26,27,28 In addition, Oncostatin M (OSM) is a pleiotropic cytokine of the IL-6 family that participates in the regulation of cell growth and differentiation during hematopoiesis, neurogenesis and osteogenesis (https://www.sciencedirect.com/topics/neuroscience/oncostatin-m#:~:text=Oncostatin%20M%20(OSM)%20is%20a,during%20haematopoiesis%2C%20neurogenesis%20and%20osteogenesis). Upregulation of OSM has been reported in many pathological conditions characterized by chronic inflammation, vascular injury and fibrosis29 and, more importantly, in adult patients with Long COVID compared with healthy controls.30

These findings, along with the other inflammatory cytokines overexpressed in children with Long COVID compared with healthy controls, confirm previous adult studies suggesting that immune dysregulation and inflammation are associated with Long COVID,31 and this signature could also serve as a diagnostic test to recognize these children, who may otherwise be clinically misclassified as having psychiatric conditions. In this regard, our ML model was highly accurate and, if confirmed in other cohorts, may become a promising test with potential routine clinical applications. It would be of particular interest to test it in other post-viral chronic fatigue syndromes with pediatric onset, conditions that remain neglected and lack diagnostic tests.

Interestingly, proteomic profiling also revealed that, among the proteins that most influenced the principal components, pro-angiogenic factors such as VEGF-A, EGF and TNFSF12 were upregulated.32,33,34 Although we did not specifically test for thromboinflammation or circulating microclots, the upregulation of pro-angiogenic factors we found may justify future pediatric studies specifically investigating these events in children. Indeed, activated circulating platelets were found in children with Long COVID in two smaller independent pediatric cohorts (https://www.croiconference.org/wp-content/uploads/sites/2/posters/2024/347.pdf).35 Confirmation of these events in children with Long COVID, through deeper immunological investigations in larger cohorts, may open new avenues for potential treatments in this group of patients.

Of note, several pro-inflammatory cytokines were similarly expressed in children with Long COVID, acute COVID and MIS-C, further suggesting that SARS-CoV-2 infection can lead to a long-term microinflammatory environment, while differences from healthy controls were strongly significant. From an immunological perspective, it is interesting that children with Long COVID clustered more distinctly than those with MIS-C, a hyperinflammatory condition that is theoretically easier to recognize. Again, these findings suggest that Long COVID is an organic, immunological condition.

We also aimed to determine whether different clinical phenotypes within the group of children with Long COVID shared similar immune profiles. It is increasingly recognized that Long COVID does not manifest as a single condition but rather exhibits a variety of phenotypes. Despite the lower homogeneity of these subgroups, both clinically and immunologically, we observed that phenotypes such as fatigue and respiratory involvement were more discernible, even from an immunological standpoint. Notably, children experiencing persistent fatigue following COVID-19 showed significantly elevated expression of FGF21, a key player in metabolic regulation whose functions include enhancing glucose uptake, promoting gluconeogenesis, increasing free fatty acid oxidation, fostering ketogenesis, and boosting energy production and utilization.36 FGF21 also acts as a stress-induced myokine released during different pathological muscle states, including mitochondrial dysfunction.37 Strikingly, mitochondrial dysfunction and muscle pathology have been found to be a hallmark of Long COVID fatigue in adults,38 and FGF21 levels have already been linked to chronic fatigue in adults.39 Altogether, these findings reinforce the organic nature of Long COVID in children, through mechanisms and pathways very similar to those of adult Long COVID, and contribute to a better understanding of Long COVID and other post-viral fatigue syndromes, opening new hopes for diagnostic and therapeutic strategies, also for pediatric patients.

Our study has limitations. The cohorts were relatively small, although Long COVID is rarer in children than in adults. Also, we were not able to profile thromboinflammatory events and metabolomic profiles in our patients; therefore, we cannot exclude that these mechanisms, which are major in adults, are also central in children. In this regard, it has recently been shown in adults that low serotonin represents a metabolite signature associated with Long COVID. Indeed, viral inflammation drives serotonin depletion by reducing tryptophan absorption and increasing monoamine oxidase (MAO) expression.40 This inflammatory response is TLR3-, IFNAR- and STAT1-dependent and results in decreased vagal and hippocampal activation as well as cognitive impairment. We cannot exclude that the severe reduction in daily activity experienced by our patients could play a role in this biological loop. This represents an important area for future investigation, as these events are theoretically treatable. Lastly, we did not assess longitudinal blood samples in this study; therefore, we are not able to determine whether children who recover from Long COVID also return to an immune signature resembling that of healthy controls. However, since obtaining the preliminary results of this study, we have started follow-up sampling and hope to fill this gap in future. Nevertheless, the rigorous classification of our patients, the availability of pre-pandemic healthy controls and the in-depth immunological profile of our cohorts represent unique strengths in this field of research.

In conclusion, our study suggests that pediatric Long COVID is characterized by a blood protein signature marked by increased ongoing general and endothelial inflammation, clearly distinct from that of healthy controls. These data strongly suggest an immune-mediated nature of Long COVID in children similar to that in adults, offering a basis for new diagnostics but also opening new scenarios in our understanding of Long COVID and other post-viral chronic fatigue syndromes with onset in pediatric age.