Abstract
Rapid and accurate diagnosis of emerging inflammatory illnesses is challenging due to overlapping clinical features with existing conditions. We demonstrate an approach that integrates proteomic analysis with machine learning to identify diagnostic protein signatures, using the example of SARS-CoV-2-induced multisystem inflammatory syndrome in children (MIS-C). We used plasma samples collected from subjects diagnosed with MIS-C and compared them first to controls with asymptomatic/mild SARS-CoV-2 infection and then to controls with pneumonia or Kawasaki disease. We used mass spectrometry to identify proteins and support vector machine (SVM) algorithm-based classification schemes to identify protein signatures. Diagnostic accuracy was assessed by calculating sensitivity, specificity, and area under the ROC curve (AUC), and corrected for overfitting by cross-validation. Proteomic analysis of a training dataset containing MIS-C (N = 17), and asymptomatic/mild SARS-CoV-2 infected control samples (N = 20) identified 643 proteins, of which 101 were differentially expressed. Plasma proteins associated with inflammation increased, and those associated with metabolism and coagulation decreased in MIS-C relative to controls. The SVM machine learning algorithm identified a three-protein model (ORM1, AZGP1, SERPINA3) that achieved 90.0% specificity, 88.2% sensitivity, and 93.5% AUC, distinguishing MIS-C from controls in the training set. Performance was retained in the validation dataset utilizing MIS-C (N = 19) and asymptomatic/mild SARS-CoV-2 infected control samples (N = 10) (90.0% specificity, 84.2% sensitivity, 87.4% AUC). We next replicated our approach to compare MIS-C with similarly presenting syndromes, such as pneumonia (N = 17) and Kawasaki disease (N = 13), and found a distinct three-protein signature (VWF, FCGBP, and SERPINA3) that accurately distinguished MIS-C from the other conditions (97.5% specificity, 89.5% sensitivity, 95.6% AUC). 
A software tool was also developed that may be used to evaluate other protein signatures using our data. These results demonstrate that the use of mass spectrometry to identify candidate plasma proteins followed by machine learning, specifically SVM, is an efficient strategy for identifying and evaluating biomarker signatures for disease classification.
Introduction
Timely diagnosis enables the prompt delivery of effective treatment. When a new condition such as the multisystem inflammatory syndrome in children (MIS-C) emerges, researchers and clinicians seek reliable paths to rapid diagnosis and effective treatments. MIS-C, which was first identified at the outset of the COVID-19 pandemic, is a rare, severe, and at times fatal condition characterized by fever, systemic hyperinflammation, and multi-organ dysfunction that can develop 2–6 weeks after SARS-CoV-2 infection1,2. As with many emerging conditions, MIS-C is a diagnosis of exclusion requiring a multi-tiered search for an alternative explanation before diagnosis is established3, a process that has proven expensive in terms of resources and time to diagnosis. Despite the massive effort of scientific teams to elucidate the underlying pathophysiology, to establish paths to timely diagnosis, and to develop effective treatments, we are still seeking an accurate diagnosis and differentiation of MIS-C from other hyperinflammatory syndromes such as Kawasaki disease (KD).
Several studies have investigated the landscape of plasma proteins in MIS-C to gain insight into its pathogenesis and identify biomarkers that are distinctive for MIS-C4,5,6. Most were structured toward discovering candidate proteins rather than assessing their ability to discriminate MIS-C from comparator diseases. In the present study, we integrated data-independent acquisition mass spectrometry (DIA-MS)7 and artificial intelligence to develop an analytical framework for biomarker selection and validation. We used mass spectrometry to identify proteins; a support vector machine (SVM)8, a machine learning approach, to identify proteins distinguishing subjects as having MIS-C or an identified alternative disease; and receiver operating characteristic (ROC) curves to assess the resulting model’s discrimination accuracy. Our work has also resulted in an open-access SVM-based analytical tool and a robust dataset that enable the validation of protein biomarker signatures for MIS-C.
Materials and methods
Study participants
We enrolled participants ≤ 21 years old (Table 1) and collected blood samples at nine sites from four states (CT, NJ, NY, OH). Children and youth with MIS-C were classified in accordance with the 2020 U.S. Centers for Disease Control criteria, which include a recent history of SARS-CoV-2 infection, signs of inflammation and involvement of at least two organ systems, and no alternative plausible diagnosis3. Pneumonia was defined by the presence of an infiltrative process in the lung parenchyma on chest radiography secondary to infection (viral or bacterial), without evidence of concurrent SARS-CoV-2 infection. Diagnosis of Kawasaki disease (KD) was based on established criteria9. All pneumonia and KD participants tested negative for SARS-CoV-2 at enrollment. For all disease conditions, blood for proteomics analysis was collected during hospitalization. Controls in our study were subjects with a history of mild or asymptomatic SARS-CoV-2 infection who were defined as having a positive SARS-CoV-2 test and presenting in the outpatient setting with no symptoms or symptoms not requiring inpatient care before sample collection.
Sample preparation for mass spectrometry
Plasma (10 µg per sample) was diluted in 50 mM HEPES, 50 mM EDTA, and 2% SDS. It was reduced with 5 mM DTT for 30 min at 60 °C and alkylated with 20 mM iodoacetamide for 1 h at room temperature and in the dark. The sample was then subjected to SP3 beads digestion with trypsin (sequencing grade, Thermo Fisher Scientific) in 100 mM ammonium bicarbonate, 2 mM CaCl2, and incubated at 37 °C overnight, as described10. Peptides were acidified with formic acid, and 10.0% of each sample was analyzed by liquid chromatography–tandem mass spectrometry (LC–MS/MS).
Liquid chromatography–tandem mass spectrometry
Samples were analyzed by data-independent acquisition mass spectrometry (DIA-MS)7 using a Dionex Ultimate 3000 RSLCnano System (Thermo Fisher Scientific) interfaced with an Orbitrap Eclipse Tribrid mass spectrometer (Thermo Fisher Scientific). Briefly, samples were loaded onto a fused silica trap column, Acclaim PepMap 100, 75 µm × 2 cm (Thermo Fisher Scientific). After washing with 0.1% trifluoroacetic acid, the trap column was brought in-line with an analytical column (Nanoease MZ peptide BEH C18, 130 Å, 1.7 µm, 75 µm × 250 mm, Waters) for LC–MS/MS. Peptides were fractionated at 300 nL/min using a segmented linear gradient of 4–15% A in 30 min (where A: 0.16% formic acid, 80% acetonitrile, and B: 0.2% formic acid), 15–25% A in 40 min, 25–50% A in 44 min, and 50–90% A in 11 min. Solution A was then returned to 4% for 5 min for the subsequent run. Samples of different groups were loaded in alternating fashion. Pooled samples were digested alongside the individual samples, loaded three times at the start to condition the column, and then run after every 10 samples. The MS scan range was set at 400–1200 m/z, resolution 12,000, with AGC set at 3E6 and ion time set as auto. An 8 m/z window was set to sequentially isolate and fragment the ions in the C-trap with a relative collision energy of 30. The MS/MS data were recorded with a resolution of 30,000.
Raw data were analyzed using an in-silico predicted peptide library generated from the UniProt human reference proteome for library-free database searching using DIA-NN 1.8.111. Results were filtered for posterior error probability for the precursor identification of < 1% and Protein Group Q value, also < 1%. Protein abundance was quantified using the MaxLFQ method and used for downstream analysis12.
Mass spectrometry data analysis
MaxLFQ values were log2-transformed before statistical analysis. We fitted a linear model comparing diseased to control for each protein. In our initial protein screen, we included only proteins with < 50% missing values across samples. The result of these fitted models was, for each protein, an estimate of the log2 abundance ratio and its standard error, from which we calculated a p value. To adjust for multiple comparisons, we converted the raw p values to Holm p values13. To control the false discovery rate, we calculated q values from the raw p values using the Benjamini–Hochberg method. All calculations were made using the R system and package libraries (https://www.R-project.org/).
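The two adjustments described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the analysis itself was carried out in R), the raw p values are hypothetical, and the function names `holm_adjust` and `bh_adjust` are our own. The sketch shows why the Holm correction, which controls the family-wise error rate, retains fewer proteins than the Benjamini–Hochberg q value, which controls the false discovery rate.

```python
from typing import List


def holm_adjust(pvals: List[float]) -> List[float]:
    """Holm step-down adjusted p values (family-wise error rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by m - k + 1.
        value = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg q values (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p value in ascending order
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five proteins (not taken from the study data).
raw_p = [0.001, 0.01, 0.02, 0.03, 0.30]
holm_p = holm_adjust(raw_p)
q_vals = bh_adjust(raw_p)

n_holm = sum(p <= 0.05 for p in holm_p)  # proteins passing the Holm threshold
n_bh = sum(q < 0.05 for q in q_vals)     # proteins passing the q value threshold
```

In this toy example two proteins pass the Holm threshold while four pass the q value threshold, mirroring the use of the stricter Holm criterion for signature selection and the q value for the broader differential abundance screen.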
Classification model building
The study included (1) a training set comprising participants with MIS-C (n = 17) and control participants with a history of mild or asymptomatic SARS-CoV-2 infection (n = 20), and (2) a pre-defined independent validation set comprising participants with MIS-C (n = 19), mild/asymptomatic participants (n = 10), pneumonia (n = 17), and KD (n = 13). The samples in the independent validation set were collected from different sites and/or at a different stage of the study. We used the SVM classifier (R function “svm” in the “e1071” package) to develop models using the DIA-MS protein data to distinguish MIS-C from other conditions. Data for the SVM procedure were scaled to zero mean and unit variance to ensure consistency across datasets. We tuned the SVM model by testing the linear and radial kernels over a range of the gamma and cost hyperparameters and settled on a linear kernel with cost = 1. To assess the accuracy of the SVM models, we calculated sensitivity, specificity, and area under the ROC curve (AUC)14. We next built a classifier model based on the training set of participants and applied it to the independent validation set to obtain a validated AUC. Five random repetitions of five-fold cross-validation (R package “crossval”), with feature selection applied in each fold, were used to correct for overfitting and to calculate 95% confidence intervals for the sensitivity, specificity, and AUC. ROC curves and bootstrap-generated 95% confidence intervals were plotted using the pROC R package. A summary of the analytical workflow is presented in Fig. 1. We also developed an R package “miscClassify” that allows researchers to input candidate protein signatures and determine their performance with our validation data set. The supplemental material describes how to install and use the package, which is available on GitHub (https://github.com/mooredf22/miscClassify/).
Support vector machine (SVM) model building and cross-validation. The figure shows the steps of SVM model development used in this work and the sample sets utilized at each step.
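The accuracy measures used to assess the models (sensitivity, specificity, and AUC) can be sketched as follows. This is an illustrative Python example, not the e1071 or pROC code used in the study: the labels and decision scores are hypothetical, the threshold of zero is an assumption, and the AUC is computed with the pairwise (Mann–Whitney) formulation, which equals the area under the empirical ROC curve.

```python
from typing import List, Tuple


def sensitivity_specificity(labels: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random case scores higher than a random control."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical classifier scores: 1 = MIS-C, 0 = comparator (not study data).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [2.1, 1.3, 0.4, -0.2, 0.6, -0.5, -1.1, -1.8]
predicted = [1 if s > 0 else 0 for s in scores]  # assumed threshold of zero

sens, spec = sensitivity_specificity(labels, predicted)
area = auc(labels, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds, which is why the three measures are reported together.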
Term enrichment analysis
Pathway analysis was conducted on a protein list derived from differential expression analysis based on Benjamini–Hochberg adjusted q value (q < 0.05), using the R package enrichR (version 3.2), an interface to Enrichr15. Enrichment analysis was performed using the Reactome and Gene Ontology (GO) Biological Processes databases. Each protein set enrichment was assessed by Fisher’s Exact Test, and results were filtered by requiring a false discovery rate (FDR) < 0.05.
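The enrichment test can be illustrated with a small worked example. The sketch below computes a right-tailed Fisher’s exact (hypergeometric) p value in Python; the counts are hypothetical, the choice of a one-sided test and of the background size are assumptions made for illustration, and the function name `enrichment_p_value` is our own rather than part of Enrichr.

```python
from math import comb


def enrichment_p_value(overlap: int, list_size: int, term_size: int, background_size: int) -> float:
    """Right-tailed Fisher's exact p value: P(X >= overlap).

    X is the number of term members found in a list of `list_size` proteins
    drawn without replacement from `background_size` proteins, of which
    `term_size` belong to the term.
    """
    total = comb(background_size, list_size)
    max_overlap = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 background proteins, 5 annotated to a term,
# 6 proteins in the differential list, 4 of which carry the annotation.
p_enrich = enrichment_p_value(overlap=4, list_size=6, term_size=5, background_size=20)
```

Here the p value is (5 × 105 + 1 × 15) / 38,760 ≈ 0.014. In a full analysis, the p values from all tested terms would then be adjusted for multiple testing before applying the FDR < 0.05 filter.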
Results
The MIS-C plasma proteome
Proteomic analysis was conducted with a training set comprising 17 MIS-C and 20 mild/asymptomatic SARS-CoV-2 infection control samples. A total of 1675 proteins were identified by DIA-MS. After removing the proteins with > 50% missing values, we retained 643 proteins. Of these, 101 were found to be differentially abundant between the MIS-C and control groups based on Benjamini–Hochberg adjusted q value (q < 0.05), which corresponds to an FDR of < 5%. Of the 101 differentially expressed proteins, 41 were more abundant and 60 were less abundant in MIS-C than in control samples (Fig. 2). We performed gene ontology and pathway enrichment analysis and found that the top 20 enriched terms for the differentially increased proteins included terms related to immune function (Fig. 3A,B). The top 20 enriched terms for the differentially decreased proteins include lipid metabolism, coagulation, and protein metabolism (Fig. 3C,D). These findings emphasize the involvement of immune dysregulation, lipid metabolism, and coagulation pathways in MIS-C pathophysiology.
Volcano plot of differentially abundant proteins between MIS-C and mild/asymptomatic SARS-CoV-2. Proteins in the volcano plot are shown as red circles (Holm corrected p value ≤ 0.05), green circles (Holm corrected p value ≤ 0.05 and ≥ twofold change) and orange circles (Holm corrected p value > 0.05 and ≥ twofold change).
Pathway enrichment analysis of differentially abundant proteins. Pathway analysis was conducted on lists of proteins differentially expressed in MIS-C vs. mild/asymptomatic SARS-CoV-2 infection (FDR < 0.05 and p value < 0.05) using the Reactome and Gene Ontology (GO) Biological Processes databases. The top 20 pathways ranked by p value are shown for differentially increased proteins in MIS-C (panel A, Reactome; panel B, GO) and differentially decreased proteins in MIS-C (panel C, Reactome; panel D, GO).
Development of a support vector machine (SVM) model
To develop a plasma protein signature, we first used the Holm correction13, which resulted in 34 proteins having a corrected p value of ≤ 0.05. To evaluate the ability of these proteins to distinguish MIS-C from mild/asymptomatic cases, we employed an SVM machine-learning algorithm. We selected proteins using three criteria: (1) Holm corrected p value, (2) intercept, a coefficient that accounts for protein abundance levels, and (3) increased abundance in MIS-C relative to controls. The last criterion was applied because an increase in biomarker level may be more suitable for the downstream development of immunodiagnostic assays for clinical use. We ranked the proteins by intercept and used the top three (ORM1, SERPINA3, AZGP1) (Table 2) to build an SVM classifier model. This model exhibited high specificity (90.0%) and sensitivity (88.2%), and an area under the ROC curve (AUC) of 93.5% (CI 84.8–100%). Using two proteins yielded a lower AUC (90.4%; CI 86.8–94.0%), while adding more proteins to the model did not improve its characteristics. We next validated the model using an independent set of 19 MIS-C and 10 mild/asymptomatic infection control samples, matching the groups used in the training set. On this validation set, the model had a specificity of 90.0%, a sensitivity of 84.2%, and an AUC of 87.4% (CI 74.1–100%) (Fig. 4 and Table 3), showing that our SVM algorithm can predict MIS-C with high accuracy.
External validation of an SVM model. The figure shows a receiver operating characteristic (ROC) curve visualizing the performance of three proteins (ORM1, SERPINA3, and AZGP1) applied to the validation dataset (MIS-C vs. mild/asymptomatic SARS-CoV-2).
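With groups of this size, each reported percentage corresponds to a small whole number of samples, which is useful to keep in mind when comparing training and validation performance. The Python sketch below back-calculates those counts from the reported sensitivity and specificity. The counts are inferred here as the nearest integers consistent with the percentages; they are an assumption and are not stated in the text.

```python
def implied_count(percent: float, group_size: int) -> int:
    """Nearest whole number of correctly classified samples for a percentage."""
    return round(percent / 100 * group_size)


def as_percent(count: int, group_size: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / group_size, 1)


# (reported percentage, group size) for the three-protein model
# ORM1, SERPINA3, AZGP1; group sizes are those given for each set.
reported = {
    "training sensitivity": (88.2, 17),    # MIS-C, training set
    "training specificity": (90.0, 20),    # controls, training set
    "validation sensitivity": (84.2, 19),  # MIS-C, validation set
    "validation specificity": (90.0, 10),  # controls, validation set
}

# Inferred counts (assumption): correctly classified samples per group.
implied = {name: implied_count(pct, n) for name, (pct, n) in reported.items()}

# Round-trip check: the inferred counts reproduce the reported percentages.
round_trip = {name: as_percent(implied[name], n) for name, (pct, n) in reported.items()}
```

Under this reading, a single additional misclassified sample shifts sensitivity by roughly 5 to 6 percentage points and validation specificity by 10 points, which is consistent with the wide confidence intervals reported for the AUC.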
Identification and validation of a multi-protein signature of MIS-C
In the clinical setting, it is necessary to distinguish MIS-C from other pathologies presenting similar signs and symptoms. Therefore, we used samples obtained from pneumonia and KD patients to identify protein biomarkers that can accurately differentiate these conditions from MIS-C. We first performed pairwise comparisons between MIS-C and each comparator condition to identify distinct protein expression patterns (Fig. 5A–C). We observed that von Willebrand factor (VWF) was significantly increased in MIS-C in all comparisons (Fig. 5D), and this was the only protein that reached statistical significance in the comparison between MIS-C and KD (Fig. 5A,D). We also observed that the proteins FCGBP, VWF, F11, BCHE, KLKB1, ATRN, SERPINA3, A2M, and PGLYRP2 were shared when comparing MIS-C against pneumonia and mild/asymptomatic SARS-CoV-2 infection (Fig. 5B–D). When we compared MIS-C to all groups in a multi-disease comparison, we observed that, out of 33 proteins that had a Holm p value ≤ 0.05, VWF, FCGBP, and SERPINA3 were the top three upregulated proteins in MIS-C (Fig. 6A and Table 4). An SVM model utilizing these three proteins showed a sensitivity of 89.5%, a specificity of 97.5%, and an AUC of 95.6% (CI 89.6–100%) (Fig. 6B). To correct the three-protein model for overfitting, we carried out five-fold cross-validation using the “crossval” R library, and found a sensitivity of 75.3%, a specificity of 92.0%, and an AUC of 93.4% (CI 90.0–100%). While these cross-validated estimates are lower than the uncorrected ones, as expected, they remained high, showing that the model is robust. Slight modifications of the model did not significantly improve its accuracy. A two-protein model gave a slightly lower AUC; a fourth protein, VCAM1, which slightly increased the model’s accuracy, exhibited only a 1.6-fold change and was not included in the protein signature (Supplemental Fig. 1).
Moreover, to examine the potential effects of demographic variables (age, sex-at-birth, and race/ethnicity) on classification accuracy, we added these variables to the three-protein signature in the SVM model. The resulting AUC was 96.9% (CI 92.8–100%), close to the 95.6% obtained with the proteins alone, showing that including these variables did not materially change model accuracy. These results indicate that, despite the relatively small size of our sample, a plasma protein set comprising VWF, FCGBP, and SERPINA3 exhibits a strong predictive capability for distinguishing MIS-C from pneumonia, KD, and mild/asymptomatic cases in children.
Differentially abundant proteins between MIS-C, pneumonia, Kawasaki disease, and mild/asymptomatic SARS-CoV-2. Volcano plots of differentially abundant proteins between (A) MIS-C and Kawasaki disease, (B) MIS-C and pneumonia, and (C) MIS-C and mild/asymptomatic SARS-CoV-2 infection. Green and red circles are proteins having Holm corrected p value ≤ 0.05. Green and orange circles are proteins with ≥ twofold change. (D) UpSet plot showing the shared proteins between all pairwise comparisons. MK MIS-C vs. Kawasaki disease, MP MIS-C vs. pneumonia, MA MIS-C vs. mild/asymptomatic SARS-CoV-2 infection.
Multi-disease comparison and SVM model. (A) Volcano plot of differentially abundant proteins between MIS-C and Kawasaki disease (K), pneumonia (P), and mild/asymptomatic SARS-CoV-2 (M/A). Proteins are depicted as green, red, or orange circles, based on Holm adjusted p value and fold-change, as in Fig. 2. (B) ROC curve visualizing the performance of a 3-protein signature (VWF, FCGBP, and SERPINA3). K Kawasaki disease, P pneumonia, M/A mild/asymptomatic SARS-CoV-2 infection.
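The overfitting correction rests on repeating feature selection inside every cross-validation fold, so that held-out samples never influence which proteins are chosen. The Python sketch below illustrates that structure only. It is not the “crossval” or e1071 code used in the study: a nearest-centroid rule stands in for the SVM, ranking by difference in class means stands in for the selection criteria, accuracy is reported instead of sensitivity, specificity, and AUC, and the data are synthetic. All function names are hypothetical.

```python
import random
from typing import List


def stratified_folds(labels: List[int], k: int, rng: random.Random) -> List[List[int]]:
    """Split sample indices into k folds with both classes in every fold."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def class_means(X, y, rows, cls, feats) -> List[float]:
    """Mean of each listed feature over the given rows of one class."""
    members = [i for i in rows if y[i] == cls]
    return [sum(X[i][f] for i in members) / len(members) for f in feats]


def select_features(X, y, rows, n_keep: int) -> List[int]:
    """Rank features by absolute difference in class means (stand-in criterion)."""
    feats = list(range(len(X[0])))
    m1 = class_means(X, y, rows, 1, feats)
    m0 = class_means(X, y, rows, 0, feats)
    return sorted(feats, key=lambda f: abs(m1[f] - m0[f]), reverse=True)[:n_keep]


def repeated_cv_accuracy(X, y, k=5, repeats=5, n_keep=1, seed=0) -> List[float]:
    """Accuracy from each repetition of k-fold cross-validation."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for test_rows in stratified_folds(y, k, rng):
            held_out = set(test_rows)
            train_rows = [i for i in range(len(y)) if i not in held_out]
            # Feature selection sees only the training rows of this fold.
            feats = select_features(X, y, train_rows, n_keep)
            c1 = class_means(X, y, train_rows, 1, feats)
            c0 = class_means(X, y, train_rows, 0, feats)
            for i in test_rows:
                d1 = sum((X[i][f] - m) ** 2 for f, m in zip(feats, c1))
                d0 = sum((X[i][f] - m) ** 2 for f, m in zip(feats, c0))
                correct += int((1 if d1 < d0 else 0) == y[i])
        accuracies.append(correct / len(y))
    return accuracies


# Synthetic data: feature 0 separates the classes, features 1-4 are noise.
data_rng = random.Random(1)
y = [1] * 10 + [0] * 10
X = [[(10.0 if label == 1 else 0.0) + data_rng.uniform(-1, 1)]
     + [data_rng.uniform(-1, 1) for _ in range(4)] for label in y]
accs = repeated_cv_accuracy(X, y)
```

If features were instead selected once on the full dataset and only the classifier were cross-validated, the held-out samples would already have shaped the signature and the estimate would remain optimistic.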
Model validation using external protein markers
Various groups have studied the plasma proteome of MIS-C patients for biomarker discovery. However, very few have applied classification models to their differentially expressed proteins. To address this gap, we applied the analytical tool described in this work to independently evaluate the performance of two multi-protein signatures for which an AUC was calculated in recent publications (Table 5). Nygaard et al.16 proposed a signature including “FCGR3A”, “LCP1”, “SERPINA3”, and “BCHE”, distinguishing MIS-C from KD and bacterial and viral infections. They reported an AUC of 95.0% based on internal validation and 87.0% on an external validation16. When we used our dataset and SVM model to assess their biomarker panel, we achieved comparable AUC values of 95.8% (uncorrected) and 89.7% (cross-validated corrected estimate). Similarly, Yeoh et al.17 described a signature based on “CD163”, “PCSK9”, and “CXCL9”, which also distinguished MIS-C from KD and bacterial and viral infections, with an originally reported uncorrected AUC of 85.7%. We built an SVM model with two of these proteins (chemokines such as CXCL9 are not detected by our DIA-MS method) and achieved an AUC of 94.1% (uncorrected) and 82.4% (cross-validated correction) (Table 5). These results show that the analysis pipeline and the proteomics dataset generated in the present study can be applied to the evaluation of the classification performance of differentially expressed proteins identified in independent studies of MIS-C.
Discussion
We describe an empirical approach to identify biomarkers for the classification of inflammatory illnesses, using the example of MIS-C, a syndrome for which specific biomarkers accelerating the diagnostic process are still missing. By integrating clinical epidemiology, mass spectrometry, and SVM machine learning, we identified a small set of proteins for MIS-C classification. We also developed an SVM-based classification tool for the refinement and validation of previously identified MIS-C biomarkers that were not externally validated.
The three-protein diagnostic biosignature we identified is consistent with MIS-C pathogenesis. SERPINA3 is a member of the serine protease inhibitor superfamily and is one of the major positive acute-phase proteins secreted by the liver18. Its increase in plasma during acute inflammation serves to prevent tissue damage caused by neutrophils, leukocytes, and phagocytic cells19. Moreover, SERPINA3 is elevated during viral infections such as human rhinovirus20 and SARS-CoV-2 infection in vivo21 and in vitro22. Elevated levels of SERPINA3 are also found in some cancers, where they are associated with poor outcomes19. This protein has also been shown to increase during LPS-induced acute lung injury and COVID-19-induced acute respiratory distress syndrome23. Thus, elevated plasma levels correlate with inflammation and multiorgan damage, which are hallmarks of MIS-C. Increased plasma levels of IgG Fc-binding protein (FCGBP), a mucin-like protein that mediates transport of serum IgG to mucosal surfaces and contributes to mucosal immunity24, can be explained by the gut mucosal barrier damage observed in MIS-C25,26. It has been suggested that SARS-CoV-2 antigen persistence in the gut can trigger heightened immune responses by disrupting intestinal homeostasis26. Furthermore, it has been proposed that a dysregulation of FCGBP may further increase exposure to antigens, as immune complexes are no longer entrapped by this protein and are instead phagocytosed, which can cause an overproduction of cytokines24. Elevated levels of von Willebrand factor (VWF), which promotes hemostasis and platelet adhesion, indicate endothelial activation and damage leading to the formation of microvascular thrombi and disseminated intravascular coagulation27,28, consistent with the hypercoagulable state observed in MIS-C29. Together with a disruption of coagulation pathways, our proteomics data highlight the close interconnection between coagulation and immune activation in MIS-C.
Our analysis also identified altered lipid metabolism, including the downregulation of cholesterol transport, HDL remodeling, and plasma lipoprotein assembly. These alterations mirror findings from severe COVID-19 cases, where persistent hypolipidemia and reduced HDL-C and ApoA-I levels correlate with poorer clinical outcomes30,31. Such metabolic disruptions indicate the need for a deeper exploration of lipid-related biomarkers in diagnosing and managing MIS-C. Together, our results connect biomarker discovery with unraveling MIS-C pathogenesis.
Our study has limitations. Because plasma protein levels vary with demographic and geographic factors, the biosignature requires further validation in diverse populations. Moreover, a definitive catalog of MIS-C biomarkers would benefit from expanding the control conditions to other hyperinflammatory syndromes (e.g., systemic juvenile idiopathic arthritis and hemophagocytic lymphohistiocytosis) and conditions affecting intestinal permeability, such as inflammatory bowel disease. Additionally, our data were collected during disease management, and we were unable to account for patient treatment, which might impact plasma protein levels. Finally, our cross-sectional design did not include the collection of longitudinal samples, precluding the temporal analysis of biomarker dynamics.
In conclusion, our approach, which integrates clinical epidemiology, mass spectrometry, and artificial intelligence, shifts the focus of MIS-C biomarker research from discovery to differential diagnosis. Further work will help expand the evaluation of our markers of MIS-C to more complex clinical settings and the application of our modeling tools to find classification biomarkers for other challenging hyperinflammatory syndromes, including KD, macrophage activation syndrome, and hemophagocytic lymphohistiocytosis. Additional measures of predictive accuracy, such as the Brier score32, alternative classification approaches, and other machine learning methodologies, may become applicable when our protein signature is further tested in clinical settings. We also expect our work to contribute to preparedness for the potential resurgence of MIS-C and the advent of new syndromes.
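The Brier measure mentioned above is the mean squared difference between a predicted probability and the observed outcome, with lower values indicating better calibrated and more accurate predictions. A minimal Python sketch follows. The probabilities are hypothetical; how predicted probabilities would be obtained from the SVM is not addressed in this work and is left as an assumption.

```python
from typing import List


def brier_score(labels: List[int], probabilities: List[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Hypothetical predicted probabilities of MIS-C for four subjects
# (1 = MIS-C, 0 = comparator); not derived from the study data.
labels = [1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.2, 0.1]
score = brier_score(labels, probabilities)
```

For these values the score is (0.01 + 0.16 + 0.04 + 0.01) / 4 = 0.055. Unlike the AUC, which depends only on how subjects are ranked, the Brier score also penalizes probabilities that are poorly calibrated.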
Data availability
Data are available on the GitHub page (https://github.com/mooredf22/miscClassify).
References
Riphagen, S., Gomez, X., Gonzalez-Martinez, C., Wilkinson, N. & Theocharis, P. Hyperinflammatory shock in children during COVID-19 pandemic. Lancet 395(10237), 1607–1608 (2020).
Whittaker, E. et al. Clinical characteristics of 58 children with a pediatric inflammatory multisystem syndrome temporally associated with SARS-CoV-2. JAMA 324(3), 259–269 (2020).
The Children’s Hospital of Philadelphia. Multisystem inflammatory syndrome (MIS-C) clinical pathway (2021). Available from: https://pathways.chop.edu/clinical-pathway/multisystem-inflammatory-syndrome-mis-c-clinical-pathway.
Porritt, R. A. et al. The autoimmune signature of hyperinflammatory multisystem inflammatory syndrome in children. J. Clin. Investig. 131(20), e151520 (2021).
Reiter, A. et al. Proteomic mapping identifies serum marker signatures associated with MIS-C specific hyperinflammation and cardiovascular manifestation. Clin. Immunol. 264, 110237 (2024).
Sacco, K. et al. Immunopathological signatures in multisystem inflammatory syndrome in children and pediatric COVID-19. Nat. Med. 28(5), 1050–1062 (2022).
Doerr, A. DIA mass spectrometry. Nat. Methods 12(1), 35 (2015).
Statnikov, A., Aliferis, C. F., Hardin, D. P. & Guyon, I. A Gentle Introduction to Support Vector Machines in Biomedicine (World Scientific Publishing Co, 2011).
McCrindle, B. W. et al. Diagnosis, treatment, and long-term management of Kawasaki disease: A scientific statement for health professionals from the American Heart Association. Circulation 135(17), e927–e999 (2017).
Hughes, C. S. et al. Single-pot, solid-phase-enhanced sample preparation for proteomics experiments. Nat. Protoc. 14(1), 68–85 (2019).
Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: Neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17(1), 41–44 (2020).
Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteom. 13(9), 2513–2526 (2014).
Menyhart, O., Weltz, B. & Győrffy, B. MultipleTesting.com: A tool for life science researchers for multiple hypothesis testing correction. PLoS ONE 16(6), e0245824 (2021).
Pepe, M. S., Cai, T. & Longton, G. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics 62(1), 221–229 (2006).
Kuleshov, M. V. et al. Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44(W1), W90–W97 (2016).
Nygaard, U. et al. Proteomic profiling reveals diagnostic signatures and pathogenic insights in multisystem inflammatory syndrome in children. Commun. Biol. 7(1), 688 (2024).
Yeoh, S. et al. Plasma protein biomarkers distinguish multisystem inflammatory syndrome in children from other pediatric infectious and inflammatory diseases. Pediatr. Infect. Dis. J. 43(5), 444–453 (2024).
Soman, A. & Asha, N. S. Unfolding the cascade of SERPINA3: Inflammation to cancer. Biochim. Biophys. Acta Rev. Cancer 1877(5), 188760 (2022).
de Mezer, M. et al. SERPINA3: Stimulator or inhibitor of pathological changes. Biomedicines 11(1), 156 (2023).
Abbasi, S. et al. Impact of human rhinoviruses on gene expression in pediatric patients with severe acute respiratory infection. Virus Res. 300, 198408 (2021).
Suvarna, K. et al. Proteomics and machine learning approaches reveal a set of prognostic markers for COVID-19 severity with drug repurposing potential. Front. Physiol. 12, 652799 (2021).
Ferrarini, M. G. et al. Genome-wide bioinformatic analyses predict key host and viral factors in SARS-CoV-2 pathogenesis. Commun. Biol. 4(1), 590 (2021).
Gong, R. et al. Integrative proteomic profiling of lung tissues and blood in acute respiratory distress syndrome. Front. Immunol. 14, 1158951 (2023).
Kobayashi, K., Tachibana, M. & Tsutsumi, Y. Neglected roles of IgG Fc-binding protein secreted from airway mucin-producing cells in protecting against SARS-CoV-2 infection. Innate Immun. 27(6), 423–436 (2021).
Khan, R. et al. A genetically modulated Toll-like receptor-tolerant phenotype in peripheral blood cells of children with multisystem inflammatory syndrome. J. Immunol. 214(3), 373–383 (2025).
Yonker, L. M. et al. Multisystem inflammatory syndrome in children is driven by zonulin-dependent loss of gut mucosal barrier. J. Clin. Investig. 131(14), e149633 (2021).
Diorio, C. et al. Evidence of thrombotic microangiopathy in children with SARS-CoV-2 across the spectrum of clinical presentations. Blood Adv. 4(23), 6051–6063 (2020).
Diorio, C. et al. Proteomic profiling of MIS-C patients indicates heterogeneity relating to interferon gamma dysregulation and vascular endothelial dysfunction. Nat. Commun. 12(1), 7222 (2021).
Boucher, A. A. et al. Prolonged elevations of factor VIII and von Willebrand factor antigen after multisystem inflammatory syndrome in children. J. Pediatr. Hematol. Oncol. 45(4), e427–e432 (2023).
Li, Y. et al. Lipid metabolism changes in patients with severe COVID-19. Clin. Chim. Acta 517, 66–73 (2021).
Mietus-Snyder, M. et al. Changes in HDL cholesterol, particles, and function associate with pediatric COVID-19 severity. Front. Cardiovasc. Med. 9, 1033660 (2022).
Stehouwer, N., Rowland-Seymour, A., Gruppen, L., Albert, J. M. & Qua, K. Validity and reliability of Brier scoring for assessment of probabilistic diagnostic reasoning. Diagnosis (Berlin) 12(1), 53–60 (2025).
Acknowledgements
DIA mass spectrometry was performed by the Biological Mass Spectrometry Facility at Rutgers Robert Wood Johnson Medical School. We wish to thank David Sleat for his contribution in establishing and executing the DIA MS pipeline. This work was funded by NIH grants R61HD105619, R33HD105619, HD105593-03S2, R01AI158911, HD105613, and NCATS UM1TR004789, and by Rutgers ROI–HealthAdvance HA2022-0039.
Author information
Contributions
JGR, DFM, MLG: Conceptualization, Data interpretation and visualization, Writing—original draft, Writing—review and editing. HZ: Data acquisition, Writing— review and editing. LCK: Data interpretation, Writing—original draft, Writing—review and editing. BC, CS, SG, ER, UNH, WC, ARS, HB, DCK, YK, PWB, DW, ER, SAL, KWH, AHH, JCS, SMF, JR, DBH: Data interpretation, Writing—review and editing. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Study approval
All study activities were approved by the Rutgers Institutional Review Board (Pro2020002961) and all methods were performed in accordance with the relevant guidelines. All participants, including parents and/or legal guardians, provided informed consent prior to engaging in study activities.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Guzmán Rivera, J., Zheng, H., Richlin, B. et al. Mass spectrometry combined with machine learning identifies novel protein signatures as demonstrated with multisystem inflammatory syndrome in children. Sci Rep 15, 36843 (2025). https://doi.org/10.1038/s41598-025-20684-5