Mass spectrometry combined with machine learning identifies novel protein signatures as demonstrated with multisystem inflammatory syndrome in children

Guzmán Rivera, Jeisac; Zheng, Haiyan; Richlin, Benjamin; Suarez, Christian; Gaur, Sunanda; Ricciardi, Elizabeth; Hasan, Uzma N.; Cuddy, William; Singh, Aalok R.; Bukulmez, Hulya; Kaelber, David C.; Kimura, Yukiko; Brady, Patrick W.; Wahezi, Dawn; Rothschild, Evin; Lakhani, Saquib A.; Herbst, Katherine W.; Hogan, Alexander H.; Salazar, Juan C.; Moroso-Fela, Sandra; Roy, Jason; Kleinman, Lawrence C.; Horton, Daniel B.; Moore, Dirk F.; Gennaro, Maria Laura

doi:10.1038/s41598-025-20684-5


Article
Open access
Published: 22 October 2025


Jeisac Guzmán Rivera¹,
Haiyan Zheng²,
Benjamin Richlin³,
Christian Suarez³,
Sunanda Gaur^3,4,
Elizabeth Ricciardi⁵,
Uzma N. Hasan⁵,
William Cuddy⁶,
Aalok R. Singh^6,7,
Hulya Bukulmez⁸,
David C. Kaelber⁹,
Yukiko Kimura¹⁰,
Patrick W. Brady¹¹,
Dawn Wahezi¹²,
Evin Rothschild¹²,
Saquib A. Lakhani^13,14,
Katherine W. Herbst¹⁵,
Alexander H. Hogan^16,17,
Juan C. Salazar¹⁸,
Sandra Moroso-Fela¹⁹,
Jason Roy²⁰,
Lawrence C. Kleinman^19,21,22,
Daniel B. Horton^19,20,23,24,
Dirk F. Moore²⁰ &
Maria Laura Gennaro^1,25

Scientific Reports volume 15, Article number: 36843 (2025)



Abstract

Rapid and accurate diagnosis of emerging inflammatory illnesses is challenging because their clinical features overlap with those of existing conditions. We demonstrate an approach that integrates proteomic analysis with machine learning to identify diagnostic protein signatures, using the example of SARS-CoV-2-induced multisystem inflammatory syndrome in children (MIS-C). We compared plasma samples from subjects diagnosed with MIS-C first with controls with asymptomatic/mild SARS-CoV-2 infection and then with controls with pneumonia or Kawasaki disease. We used mass spectrometry to identify proteins and classification schemes based on the support vector machine (SVM) algorithm to identify protein signatures. Diagnostic accuracy was assessed by calculating sensitivity, specificity, and area under the ROC curve (AUC), and corrected for overfitting by cross-validation. Proteomic analysis of a training dataset containing MIS-C (N = 17) and asymptomatic/mild SARS-CoV-2-infected control samples (N = 20) yielded 643 proteins after filtering for missing values, of which 101 were differentially expressed. Plasma proteins associated with inflammation increased, and those associated with metabolism and coagulation decreased, in MIS-C relative to controls. The SVM algorithm identified a three-protein model (ORM1, AZGP1, SERPINA3) that distinguished MIS-C from controls in the training set with 90.0% specificity, 88.2% sensitivity, and 93.5% AUC. Performance was retained in a validation dataset of MIS-C (N = 19) and asymptomatic/mild SARS-CoV-2-infected control samples (N = 10) (90.0% specificity, 84.2% sensitivity, 87.4% AUC). We next applied the same approach to compare MIS-C with similarly presenting syndromes, pneumonia (N = 17) and Kawasaki disease (N = 13), and found a distinct three-protein signature (VWF, FCGBP, and SERPINA3) that accurately distinguished MIS-C from these conditions (97.5% specificity, 89.5% sensitivity, 95.6% AUC).
We also developed a software tool that can be used to evaluate other protein signatures against our data. These results indicate that using mass spectrometry to identify candidate plasma proteins, followed by machine learning (specifically SVM), is an efficient strategy for identifying and evaluating biomarker signatures for disease classification.

Introduction

Timely diagnosis enables the prompt delivery of effective treatment. When a new condition such as multisystem inflammatory syndrome in children (MIS-C) emerges, researchers and clinicians seek reliable paths to rapid diagnosis and effective treatment. MIS-C, first identified early in the COVID-19 pandemic, is a rare, severe, and at times fatal condition characterized by fever, systemic hyperinflammation, and multi-organ dysfunction that can develop 2–6 weeks after SARS-CoV-2 infection^1,2. As with many emerging conditions, MIS-C is a diagnosis of exclusion: a multi-tiered search for an alternative explanation is required before the diagnosis is established³, a process that has proven costly in both resources and time to diagnosis. Despite extensive efforts by scientific teams to elucidate the underlying pathophysiology, establish paths to timely diagnosis, and develop effective treatments, an accurate means of diagnosing MIS-C and differentiating it from other hyperinflammatory syndromes, such as Kawasaki disease (KD), is still lacking.

Several studies have investigated the landscape of plasma proteins in MIS-C to gain insight into its pathogenesis and to identify biomarkers distinctive for MIS-C^4,5,6. Most were designed to discover candidate proteins rather than to assess their ability to discriminate MIS-C from comparator diseases. In the present study, we integrated data-independent acquisition mass spectrometry (DIA-MS)⁷ and artificial intelligence to develop an analytical framework for biomarker selection and validation. We used mass spectrometry to identify proteins; support vector machine (SVM)⁸, a machine learning approach, to identify proteins that classify subjects as having MIS-C or an identified alternative disease; and receiver operating characteristic (ROC) curves to assess the resulting models' discrimination accuracy. Our work has also produced an open-access SVM-based analytical tool and a proteomic dataset that enable the validation of protein biomarker signatures for MIS-C.

Materials and methods

Study participants

We enrolled participants ≤ 21 years old (Table 1) and collected blood samples at nine sites in four states (CT, NJ, NY, OH). Children and youth with MIS-C were classified according to the 2020 U.S. Centers for Disease Control and Prevention (CDC) criteria, which include a recent history of SARS-CoV-2 infection, signs of inflammation, involvement of at least two organ systems, and no alternative plausible diagnosis³. Pneumonia was defined as an infiltrative process in the lung parenchyma on chest radiography secondary to infection (viral or bacterial), without evidence of concurrent SARS-CoV-2 infection. Kawasaki disease (KD) was diagnosed according to established criteria⁹. All pneumonia and KD participants tested negative for SARS-CoV-2 at enrollment. For all disease conditions, blood for proteomic analysis was collected during hospitalization. Controls were subjects with a history of mild or asymptomatic SARS-CoV-2 infection, defined as having a positive SARS-CoV-2 test and presenting in the outpatient setting with either no symptoms or symptoms not requiring inpatient care before sample collection.

Table 1 Study demographics of the combined training and validation data sets.


Sample preparation for mass spectrometry

Plasma (10 µg per sample) was diluted in 50 mM HEPES, 50 mM EDTA, and 2% SDS, reduced with 5 mM DTT for 30 min at 60 °C, and alkylated with 20 mM iodoacetamide for 1 h at room temperature in the dark. Samples were then digested on SP3 beads with trypsin (sequencing grade, Thermo Fisher Scientific) in 100 mM ammonium bicarbonate and 2 mM CaCl₂ at 37 °C overnight, as described¹⁰. Peptides were acidified with formic acid, and 10.0% of each sample was analyzed by liquid chromatography–tandem mass spectrometry (LC–MS/MS).

Liquid chromatography–tandem mass spectrometry

Samples were analyzed by data-independent acquisition mass spectrometry (DIA-MS)⁷ using a Dionex UltiMate 3000 RSLCnano System (Thermo Fisher Scientific) interfaced with an Orbitrap Eclipse Tribrid mass spectrometer (Thermo Fisher Scientific). Briefly, samples were loaded onto a fused silica trap column (Acclaim PepMap 100, 75 µm × 2 cm; Thermo Fisher Scientific). After washing with 0.1% trifluoroacetic acid, the trap column was brought in-line with an analytical column (nanoEase MZ Peptide BEH C18, 130 Å, 1.7 µm, 75 µm × 250 mm; Waters) for LC–MS/MS. Peptides were fractionated at 300 nL/min using a segmented linear gradient of solvent A (0.16% formic acid, 80% acetonitrile) in solvent B (0.2% formic acid): 4–15% A in 30 min, 15–25% A in 40 min, 25–50% A in 44 min, and 50–90% A in 11 min. Solvent A was then returned to 4% for 5 min before the subsequent run. Samples from different groups were loaded in alternating order. Pooled samples were digested alongside the individual samples, injected three times at the start to condition the column, and then run after every 10 samples. The MS scan range was set to m/z 400–1200 at a resolution of 12,000, with an AGC target of 3 × 10⁶ and the ion injection time set to auto. Ions were sequentially isolated in 8 m/z windows and fragmented in the C-trap at a relative collision energy of 30. MS/MS spectra were recorded at a resolution of 30,000.
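As a quick check on the acquisition scheme above, covering the precursor range of m/z 400–1200 with 8 m/z isolation windows implies (1200 − 400)/8 = 100 windows per DIA cycle. The sketch below enumerates such windows under the assumption that they are contiguous and non-overlapping with no edge margins; the text does not say whether overlapping or staggered windows were used, so the helper `dia_windows` is purely illustrative.

```python
def dia_windows(mz_min: float, mz_max: float, width: float) -> list[tuple[float, float]]:
    """Enumerate contiguous, non-overlapping DIA isolation windows.

    Assumption (not stated in the Methods): windows tile [mz_min, mz_max]
    edge to edge, with no overlap and no margins.
    """
    if width <= 0 or mz_max <= mz_min:
        raise ValueError("require width > 0 and mz_max > mz_min")
    windows = []
    lower = mz_min
    while lower < mz_max:
        upper = min(lower + width, mz_max)  # clip the last window to the range
        windows.append((lower, upper))
        lower = upper
    return windows


windows = dia_windows(400.0, 1200.0, 8.0)
print(len(windows))             # 100
print(windows[0], windows[-1])  # (400.0, 408.0) (1192.0, 1200.0)
```

Narrower windows reduce co-isolation of precursors but increase the number of MS/MS scans per cycle, which is the usual trade-off when choosing a window width.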

Raw data were searched in library-free mode with DIA-NN 1.8.1¹¹, using an in silico predicted peptide library generated from the UniProt human reference proteome. Results were filtered at a precursor-identification posterior error probability of < 1% and a protein-group q value of < 1%. Protein abundance was quantified using the MaxLFQ method¹² and used for downstream analysis.

Mass spectrometry data analysis

MaxLFQ values were log2-transformed before statistical analysis. For each protein, we fitted a linear model comparing disease and control groups. In the initial protein screen, we included only proteins with missing values in < 50% of samples. For each protein, the fitted model yielded an estimate of the log2 abundance ratio and its standard error, from which we calculated a p value. To adjust for multiple comparisons, we converted the raw p values to Holm-adjusted p values¹³, which control the family-wise error rate. To control the false discovery rate, we calculated q values from the raw p values using the Benjamini–Hochberg method. All calculations were performed in R (https://www.R-project.org/) with associated packages.
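The two multiple-testing adjustments described above can be sketched in a few lines. This is an illustrative re-implementation of the standard Holm step-down and Benjamini–Hochberg step-up procedures (the same adjustments that R's `p.adjust` offers with `method = "holm"` and `method = "BH"`), not the authors' code; the p values in the example are hypothetical.

```python
def holm_adjust(pvals: list[float]) -> list[float]:
    """Holm step-down adjusted p values (control the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 gets multiplier m
        value = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg q values (control the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        value = min(1.0, pvals[idx] * m / rank)
        running_min = min(running_min, value)  # enforce monotonicity
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-protein p values
print([round(p, 4) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06, 0.02]
print([round(q, 4) for q in bh_adjust(raw)])    # [0.02, 0.04, 0.04, 0.02]
```

Holm-adjusted values are never smaller than the corresponding BH q values, which is consistent with the Results: fewer proteins pass the Holm threshold (34) than the BH threshold (101).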

Classification model building

The study included (1) a training set comprising participants with MIS-C (n = 17) and control participants with a history of mild or asymptomatic SARS-CoV-2 infection (n = 20), and (2) a pre-defined independent validation set comprising participants with MIS-C (n = 19), mild/asymptomatic infection (n = 10), pneumonia (n = 17), and KD (n = 13). Samples in the independent validation set were collected at different sites and/or at a different stage of the study. We used the SVM classifier (function “svm” in the R package “e1071”) to develop models based on the DIA-MS protein data that distinguish MIS-C from other conditions. Data for the SVM procedure were scaled to zero mean and unit variance to ensure consistency across datasets. We tuned the SVM model by testing linear and radial kernels over a range of gamma and cost hyperparameters and settled on a linear kernel with cost = 1. To assess the accuracy of the SVM models, we calculated sensitivity, specificity, and area under the ROC curve (AUC)¹⁴. We then built a classifier model on the training set and applied it to the independent validation set to obtain a validated AUC. Five random repetitions of five-fold cross-validation (R package “crossval”), with feature selection applied within each fold, were used to correct for overfitting and to calculate 95% confidence intervals for sensitivity, specificity, and AUC. ROC curves and bootstrap-generated 95% confidence intervals were plotted using the pROC R package. A summary of the analytical workflow is presented in Fig. 1. We also developed an R package, “miscClassify”, that allows researchers to input candidate protein signatures and determine their performance on our validation data set. The supplemental material describes how to install and use the package, which is available on GitHub (https://github.com/mooredf22/miscClassify/).
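The scaling step and accuracy metrics named above can be made concrete with a small standard-library sketch. Sensitivity is the fraction of MIS-C samples classified as MIS-C and specificity the fraction of comparator samples classified as non-MIS-C; for instance, the training-set sensitivity of 88.2% reported in the Results corresponds to 15 of 17 MIS-C samples, and the 90.0% specificity to 18 of 20 controls. The AUC is computed here through its Mann–Whitney interpretation: the probability that a randomly chosen case scores higher than a randomly chosen comparator, with ties counted as one half. Two choices are assumptions for illustration only: scores are treated as SVM decision values thresholded at 0, and a second dataset is scaled with the training set's mean and sample standard deviation. The function names are hypothetical and do not correspond to e1071, crossval, or pROC.

```python
from statistics import mean, stdev


def standardize(train: list[float], other: list[float]) -> tuple[list[float], list[float]]:
    """Z-score one feature using the training set's mean and sample SD (assumed)."""
    mu, sd = mean(train), stdev(train)
    return [(x - mu) / sd for x in train], [(x - mu) / sd for x in other]


def sensitivity_specificity(labels: list[int], scores: list[float],
                            threshold: float = 0.0) -> tuple[float, float]:
    """labels: 1 = MIS-C, 0 = comparator; a score above threshold is called MIS-C."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: list[int], scores: list[float]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]                 # hypothetical classes
scores = [2.1, 0.4, -0.3, 0.2, -1.0, -1.5]  # hypothetical decision values
se, sp = sensitivity_specificity(labels, scores)
print(round(se, 3), round(sp, 3), round(auc(labels, scores), 3))  # 0.667 0.667 0.889
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why both kinds of measure are reported.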

Term enrichment analysis

Pathway analysis was conducted on the list of differentially abundant proteins (Benjamini–Hochberg adjusted q value < 0.05) using the R package enrichR (version 3.2), an interface to the Enrichr gene set enrichment analysis web server¹⁵. Enrichment analysis was performed using the Reactome and Gene Ontology (GO) Biological Processes databases. Enrichment of each protein set was assessed by Fisher’s exact test, and results were filtered to a false discovery rate (FDR) < 0.05.
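The Fisher's exact test mentioned above, in its one-sided over-representation form, is the upper tail of the hypergeometric distribution: given a background of N proteins of which K are annotated to a term, how likely is it that k or more of the n proteins in the differential list carry that annotation by chance? The sketch below computes this with exact integer arithmetic. It illustrates the test itself, not the enrichR implementation; in particular, Enrichr defines its background through its own gene-set libraries rather than the proteins quantified here, and the example counts are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p value.

    k: proteins in the list that are annotated to the term
    n: size of the protein list (e.g. differentially abundant proteins)
    K: background proteins annotated to the term
    N: background size
    """
    total = comb(N, n)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 12 of 41 increased proteins fall in a 60-member term,
# against a background of 643 quantified proteins.
print(enrichment_p(12, 41, 60, 643))
```

In practice each term's p value is then adjusted across all terms tested, and only terms passing the FDR cutoff are reported.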

Results

The MIS-C plasma proteome

Proteomic analysis was conducted on a training set comprising 17 MIS-C and 20 mild/asymptomatic SARS-CoV-2 infection control samples. A total of 1675 proteins were identified by DIA-MS. After removing proteins with > 50% missing values, 643 proteins were retained. Of these, 101 were differentially abundant between the MIS-C and control groups based on the Benjamini–Hochberg adjusted q value (q < 0.05), corresponding to an FDR of < 5%. Of the 101 differentially abundant proteins, 41 were more abundant and 60 were less abundant in MIS-C than in control samples (Fig. 2). Gene ontology and pathway enrichment analysis showed that the top 20 enriched terms for the increased proteins included terms related to immune function (Fig. 3A,B), whereas the top 20 enriched terms for the decreased proteins included lipid metabolism, coagulation, and protein metabolism (Fig. 3C,D). These findings point to the involvement of immune dysregulation, lipid metabolism, and coagulation pathways in MIS-C pathophysiology.
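The missing-value filter that reduced the 1675 identified proteins to 643 can be sketched as follows. The data layout (a dictionary mapping protein names to per-sample MaxLFQ intensities, with `None` for missing values) and the function name are hypothetical. The strict "< 50%" cutoff follows the wording of the Methods; how proteins with exactly 50% missing values were handled is an assumption. Retained values are log2-transformed, as described in the Methods.

```python
import math
from typing import Optional


def filter_and_log2(abundances: dict[str, list[Optional[float]]],
                    max_missing: float = 0.5) -> dict[str, list[Optional[float]]]:
    """Keep proteins whose missing fraction is below max_missing; log2-transform them."""
    kept = {}
    for protein, values in abundances.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction < max_missing:  # strict cutoff (assumed from the Methods wording)
            kept[protein] = [math.log2(v) if v is not None else None for v in values]
    return kept


demo = {  # hypothetical MaxLFQ intensities for four samples
    "protein_A": [1024.0, 2048.0, None, 4096.0],  # 25% missing -> kept
    "protein_B": [None, None, 512.0, None],       # 75% missing -> dropped
}
print(filter_and_log2(demo))  # {'protein_A': [10.0, 11.0, None, 12.0]}
```

Missing values that remain after filtering still have to be handled by the downstream model, for example by fitting each protein's linear model only on the samples where it was observed.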

Development of a support vector machine (SVM) model

To develop a plasma protein signature, we first applied the Holm correction¹³, which yielded 34 proteins with a corrected p value ≤ 0.05. To evaluate the ability of these proteins to distinguish MIS-C from mild/asymptomatic cases, we employed an SVM machine learning algorithm. We selected proteins using three criteria: (1) Holm-corrected p value; (2) intercept, a model coefficient that accounts for protein abundance levels; and (3) increased abundance in MIS-C relative to controls. The last criterion was applied because an increase in biomarker levels may be suitable for the downstream development of immunodiagnostic assays for clinical use. We ranked the proteins by intercept and used the top three (ORM1, SERPINA3, AZGP1) (Table 2) to build an SVM classifier model. This model showed high specificity (90.0%) and sensitivity (88.2%), with an area under the ROC curve (AUC) of 93.5% (CI 84.8–100%). A two-protein model yielded a lower AUC (90.4%; CI 86.8–94.0%), while adding more proteins did not improve model performance. We next validated the model on an independent set of 19 MIS-C and 10 mild/asymptomatic infection control samples, matching the groups used in the training set. The validated model had a specificity of 90.0%, a sensitivity of 84.2%, and an AUC of 87.4% (CI 74.1–100%) (Fig. 4 and Table 3), indicating that the three-protein SVM model distinguishes MIS-C from mild/asymptomatic infection with high accuracy.
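The three selection criteria above amount to a filter followed by a ranking. In the sketch below, the record type `ProteinFit`, its field names, and all example values are hypothetical (they are not the values in Table 2). The comment that the intercept equals the mean log2 abundance of the control group is an assumption that holds under standard treatment coding of the disease indicator, which the text does not state explicitly; under that assumption, ranking by intercept favors proteins with higher baseline abundance.

```python
from dataclasses import dataclass


@dataclass
class ProteinFit:
    name: str
    holm_p: float      # Holm-corrected p value
    log2_ratio: float  # MIS-C vs control; > 0 means higher in MIS-C
    intercept: float   # assumed: mean log2 abundance in the control group


def select_signature(fits: list[ProteinFit], n: int = 3, alpha: float = 0.05) -> list[str]:
    """Criteria (1) and (3) filter the candidates; criterion (2) ranks the survivors."""
    eligible = [f for f in fits if f.holm_p <= alpha and f.log2_ratio > 0]
    eligible.sort(key=lambda f: f.intercept, reverse=True)  # most abundant first
    return [f.name for f in eligible[:n]]


fits = [  # hypothetical values
    ProteinFit("A", 0.001, 1.2, 20.5),
    ProteinFit("B", 0.020, 0.8, 22.1),
    ProteinFit("C", 0.030, -1.0, 25.0),  # decreased in MIS-C -> excluded
    ProteinFit("D", 0.200, 2.0, 24.0),   # not significant -> excluded
    ProteinFit("E", 0.040, 0.5, 18.0),
    ProteinFit("F", 0.010, 0.9, 19.0),
]
print(select_signature(fits))  # ['B', 'A', 'F']
```

Changing `n` to 2 or 4 reproduces the kind of smaller and larger models the text compares against the three-protein model.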

Table 2 Protein candidates used to build the predictive model.


Table 3 Support vector machine model evaluation.


Identification and validation of a multi-protein signature of MIS-C

In the clinical setting, MIS-C must be distinguished from other conditions that present with similar signs and symptoms. We therefore used samples from pneumonia and KD patients to identify protein biomarkers that accurately differentiate these conditions from MIS-C. We first performed pairwise comparisons between MIS-C and each comparator condition to identify distinct protein expression patterns (Fig. 5A–C). Von Willebrand factor (VWF) was significantly increased in MIS-C in all comparisons (Fig. 5D) and was the only protein that reached statistical significance in the comparison between MIS-C and KD (Fig. 5A,D). The proteins FCGBP, VWF, F11, BCHE, KLKB1, ATRN, SERPINA3, A2M, and PGLYRP2 were shared between the comparisons of MIS-C with pneumonia and with mild/asymptomatic SARS-CoV-2 infection (Fig. 5B–D). In a multi-disease comparison of MIS-C against all other groups, VWF, FCGBP, and SERPINA3 were the top three upregulated proteins in MIS-C among the 33 proteins with a Holm p value ≤ 0.05 (Fig. 6A and Table 4). An SVM model using these three proteins showed a sensitivity of 89.5%, a specificity of 97.5%, and an AUC of 95.6% (CI 89.6–100%) (Fig. 6B). To correct the three-protein model for overfitting, we carried out five-fold cross-validation using the “crossval” R package and obtained a sensitivity of 75.3%, a specificity of 92.0%, and an AUC of 93.4% (CI 90.0–100%). As expected, these cross-validated estimates are lower than the uncorrected ones, but they remain high, suggesting that the model is robust. Slight modifications of the model did not significantly improve its accuracy: a two-protein model gave a slightly lower AUC, and adding a fourth protein, VCAM1, slightly increased accuracy, but VCAM1 showed only a 1.6-fold change and was therefore not included in the protein signature (Supplemental Fig. 1).
To examine the potential effects of demographic variables (age, sex at birth, and race/ethnicity) on classification accuracy, we added these variables to the three-protein signature as features in the SVM model. The resulting AUC of 96.9% (CI 92.8–100%) shows that including these variables did not substantially change model accuracy. Taken together, these results indicate that, despite our relatively small sample size, a plasma protein set comprising VWF, FCGBP, and SERPINA3 has strong predictive capability for distinguishing MIS-C from pneumonia, KD, and mild/asymptomatic SARS-CoV-2 infection in children.
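The key safeguard in the cross-validation used here (five random repetitions of five-fold cross-validation, per the Methods) is that feature selection is redone inside every training fold, so the held-out samples never influence which proteins are chosen. The sketch below shows repeated, class-stratified k-fold cross-validation with that structure. Because the Python standard library has no SVM, it swaps in a deliberately simple stand-in classifier (features weighted by their class-mean difference); it is not the e1071/crossval pipeline, does not compute confidence intervals, and uses hypothetical synthetic data, so its numbers are unrelated to those reported above.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Mann-Whitney AUC; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds, spreading each class evenly."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_mean_difference(X, y, n_features=3):
    """Stand-in for the SVM: keep the n features with the largest absolute
    class-mean difference in THIS training fold, weighted by that difference."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    diffs = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(len(X[0]))]
    chosen = sorted(range(len(diffs)), key=lambda j: abs(diffs[j]), reverse=True)[:n_features]
    return {j: diffs[j] for j in chosen}


def score(model, x):
    """Linear score over the features selected in the training fold."""
    return sum(w * x[j] for j, w in model.items())


def repeated_cv_auc(X, y, k=5, repeats=5, seed=1):
    """Out-of-fold AUC for every fold of every repetition."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit_mean_difference([X[i] for i in train_idx], [y[i] for i in train_idx])
            aucs.append(auc([y[i] for i in test_idx], [score(model, X[i]) for i in test_idx]))
    return aucs


# Hypothetical data: 20 cases and 20 comparators, 10 standardized features;
# only the first two features carry signal.
gen = random.Random(0)
y = [1] * 20 + [0] * 20
X = [[gen.gauss(0.0, 1.0) + (2.0 if lab == 1 and j < 2 else 0.0) for j in range(10)]
     for lab in y]
aucs = repeated_cv_auc(X, y)
print(len(aucs), round(mean(aucs), 3))  # 25 fold-level AUCs and their mean
```

Selecting features once on the full dataset and only then cross-validating the classifier would let the held-out samples leak into the selection step and inflate the estimates; keeping selection inside `fit_mean_difference` avoids that.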

Table 4 Biomarker candidates from multi-disease comparison.


Model validation using external protein markers

Various groups have studied the plasma proteome of MIS-C patients for biomarker discovery, but very few have applied classification models to the differentially expressed proteins they identified. To address this gap, we used the analytical tool described in this work to independently evaluate the performance of two multi-protein signatures for which an AUC was reported in recent publications (Table 5). Nygaard et al.¹⁶ proposed a signature comprising FCGR3A, LCP1, SERPINA3, and BCHE that distinguished MIS-C from KD and from bacterial and viral infections, with a reported AUC of 95.0% in internal validation and 87.0% in external validation¹⁶. When we assessed their biomarker panel using our dataset and SVM model, we obtained comparable AUC values of 95.8% (uncorrected) and 89.7% (cross-validation-corrected). Similarly, Yeoh et al.¹⁷ described a signature based on CD163, PCSK9, and CXCL9 that also distinguished MIS-C from KD and from bacterial and viral infections, with an originally reported uncorrected AUC of 85.7%. Because chemokines such as CXCL9 are not detected by our DIA-MS method, we built an SVM model with the other two proteins (CD163 and PCSK9) and obtained an AUC of 94.1% (uncorrected) and 82.4% (cross-validation-corrected) (Table 5). These results show that the analysis pipeline and proteomic dataset generated in the present study can be used to evaluate the classification performance of differentially expressed proteins identified in independent studies of MIS-C.

Table 5 Comparison of model performance.


Discussion

We describe an empirical approach to identifying biomarkers for the classification of inflammatory illnesses, using the example of MIS-C, a syndrome for which specific biomarkers that could accelerate diagnosis are still lacking. By integrating clinical epidemiology, mass spectrometry, and SVM machine learning, we identified a small set of proteins for MIS-C classification. We also developed an SVM-based classification tool for refining and validating previously identified MIS-C biomarkers that had not been externally validated.

The three-protein diagnostic biosignature we identified is consistent with current understanding of MIS-C pathogenesis. SERPINA3, a member of the serine protease inhibitor superfamily, is one of the major positive acute-phase proteins secreted by the liver¹⁸. Its increase in plasma during acute inflammation serves to limit tissue damage caused by neutrophils and other leukocytes, including phagocytic cells¹⁹. SERPINA3 is also elevated during viral infections, including human rhinovirus infection²⁰ and SARS-CoV-2 infection in vivo²¹ and in vitro²², and in some cancers, where elevated levels are associated with poor outcomes¹⁹. It has also been shown to increase during LPS-induced acute lung injury and COVID-19-induced acute respiratory distress syndrome²³. Elevated plasma SERPINA3 levels are thus associated with inflammation and multiorgan damage, a hallmark of MIS-C. Increased plasma levels of IgG Fc-binding protein (FCGBP), a mucin-like protein that mediates transport of serum IgG to mucosal surfaces and contributes to mucosal immunity²⁴, may be explained by the gut mucosal barrier damage observed in MIS-C^25,26. It has been suggested that SARS-CoV-2 antigen persistence in the gut can trigger heightened immune responses by disrupting intestinal homeostasis²⁶. Furthermore, it has been proposed that dysregulation of FCGBP may further increase antigen exposure, because immune complexes are no longer entrapped by this protein and are instead phagocytosed, which can cause cytokine overproduction²⁴. Elevated levels of von Willebrand factor (VWF), which promotes hemostasis and platelet adhesion, indicate endothelial activation and damage, which can lead to the formation of microvascular thrombi and disseminated intravascular coagulation^27,28, consistent with the hypercoagulable state observed in MIS-C²⁹. Together with the observed disruption of coagulation pathways, our proteomic data highlight the close interconnection between coagulation and immune activation in MIS-C.
Our analysis also identified altered lipid metabolism, including downregulation of cholesterol transport, HDL remodeling, and plasma lipoprotein assembly. These alterations mirror findings in severe COVID-19, where persistent hypolipidemia and reduced HDL-C and ApoA-I levels correlate with poorer clinical outcomes^30,31. Such metabolic disruptions suggest that lipid-related biomarkers warrant deeper exploration for the diagnosis and management of MIS-C. Together, our results connect biomarker discovery with unraveling MIS-C pathogenesis.

Our study has limitations. Because plasma protein levels may vary across demographic groups and geographic regions, the biosignature needs further validation in diverse populations. A definitive catalog of MIS-C biomarkers would also benefit from expanding the comparator conditions to other hyperinflammatory syndromes (e.g., systemic juvenile idiopathic arthritis and hemophagocytic lymphohistiocytosis) and to conditions affecting intestinal permeability, such as inflammatory bowel disease. Additionally, our samples were collected during disease management, and we were unable to account for patient treatment, which might affect plasma protein levels. Finally, our cross-sectional design did not include longitudinal sampling, precluding temporal analysis of biomarker dynamics.

In conclusion, our approach, which integrates clinical epidemiology, mass spectrometry, and artificial intelligence, shifts the focus of MIS-C biomarker research from discovery to differential diagnosis. Further work will extend the evaluation of our MIS-C markers to more complex clinical settings and apply our modeling tools to finding classification biomarkers for other challenging hyperinflammatory syndromes, including KD, macrophage activation syndrome, and hemophagocytic lymphohistiocytosis. Additional measures of predictive accuracy, such as the Brier score³², alternative classification approaches, and other machine learning methods may become applicable as our protein signature is further tested in clinical settings. We also expect our work to contribute to preparedness for a potential resurgence of MIS-C and the emergence of new syndromes.

Data availability

Data are available on the GitHub page (https://github.com/mooredf22/miscClassify).

References

1. Riphagen, S., Gomez, X., Gonzalez-Martinez, C., Wilkinson, N. & Theocharis, P. Hyperinflammatory shock in children during COVID-19 pandemic. Lancet 395(10237), 1607–1608 (2020).
2. Whittaker, E. et al. Clinical characteristics of 58 children with a pediatric inflammatory multisystem syndrome temporally associated with SARS-CoV-2. JAMA 324(3), 259–269 (2020).
3. The Children’s Hospital of Philadelphia. Multisystem inflammatory syndrome (MIS-C) clinical pathway. chop.edu (2021). Available from: https://pathways.chop.edu/clinical-pathway/multisystem-inflammatory-syndrome-mis-c-clinical-pathway.
4. Porritt, R. A. et al. The autoimmune signature of hyperinflammatory multisystem inflammatory syndrome in children. J. Clin. Investig. 131(20), e151520 (2021).
5. Reiter, A. et al. Proteomic mapping identifies serum marker signatures associated with MIS-C specific hyperinflammation and cardiovascular manifestation. Clin. Immunol. 264, 110237 (2024).
6. Sacco, K. et al. Immunopathological signatures in multisystem inflammatory syndrome in children and pediatric COVID-19. Nat. Med. 28(5), 1050–1062 (2022).
7. Doerr, A. DIA mass spectrometry. Nat. Methods 12(1), 35 (2015).
8. Statnikov, A., Aliferis, C. F., Hardin, D. P. & Guyon, I. A Gentle Introduction to Support Vector Machines in Biomedicine (World Scientific Publishing Co, 2011).
9. McCrindle, B. W. et al. Diagnosis, treatment, and long-term management of Kawasaki disease: A scientific statement for health professionals from the American Heart Association. Circulation 135(17), e927–e999 (2017).
10. Hughes, C. S. et al. Single-pot, solid-phase-enhanced sample preparation for proteomics experiments. Nat. Protoc. 14(1), 68–85 (2019).
11. Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: Neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17(1), 41–44 (2020).
12. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteom. 13(9), 2513–2526 (2014).
13. Menyhart, O., Weltz, B. & Győrffy, B. MultipleTesting.com: A tool for life science researchers for multiple hypothesis testing correction. PLoS ONE 16(6), e0245824 (2021).
14. Pepe, M. S., Cai, T. & Longton, G. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics 62(1), 221–229 (2006).
15. Kuleshov, M. V. et al. Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44(W1), W90–W97 (2016).
16. Nygaard, U. et al. Proteomic profiling reveals diagnostic signatures and pathogenic insights in multisystem inflammatory syndrome in children. Commun. Biol. 7(1), 688 (2024).
17. Yeoh, S. et al. Plasma protein biomarkers distinguish multisystem inflammatory syndrome in children from other pediatric infectious and inflammatory diseases. Pediatr. Infect. Dis. J. 43(5), 444–453 (2024).
18. Soman, A. & Asha, N. S. Unfolding the cascade of SERPINA3: Inflammation to cancer. Biochim. Biophys. Acta Rev. Cancer 1877(5), 188760 (2022).
19. de Mezer, M. et al. SERPINA3: Stimulator or inhibitor of pathological changes. Biomedicines 11(1), 156 (2023).
20. Abbasi, S. et al. Impact of human rhinoviruses on gene expression in pediatric patients with severe acute respiratory infection. Virus Res. 300, 198408 (2021).
21. Suvarna, K. et al. Proteomics and machine learning approaches reveal a set of prognostic markers for COVID-19 severity with drug repurposing potential. Front. Physiol. 12, 652799 (2021).
22. Ferrarini, M. G. et al. Genome-wide bioinformatic analyses predict key host and viral factors in SARS-CoV-2 pathogenesis. Commun. Biol. 4(1), 590 (2021).
23. Gong, R. et al. Integrative proteomic profiling of lung tissues and blood in acute respiratory distress syndrome. Front. Immunol. 14, 1158951 (2023).
24. Kobayashi, K., Tachibana, M. & Tsutsumi, Y. Neglected roles of IgG Fc-binding protein secreted from airway mucin-producing cells in protecting against SARS-CoV-2 infection. Innate Immun. 27(6), 423–436 (2021).
25. Khan, R. et al. A genetically modulated Toll-like receptor-tolerant phenotype in peripheral blood cells of children with multisystem inflammatory syndrome. J. Immunol. 214(3), 373–383 (2025).
26. Yonker, L. M. et al. Multisystem inflammatory syndrome in children is driven by zonulin-dependent loss of gut mucosal barrier. J. Clin. Investig. 131(14), e149633 (2021).
27. Diorio, C. et al. Evidence of thrombotic microangiopathy in children with SARS-CoV-2 across the spectrum of clinical presentations. Blood Adv. 4(23), 6051–6063 (2020).
28. Diorio, C. et al. Proteomic profiling of MIS-C patients indicates heterogeneity relating to interferon gamma dysregulation and vascular endothelial dysfunction. Nat. Commun. 12(1), 7222 (2021).
29. Boucher, A. A. et al. Prolonged elevations of factor VIII and von Willebrand factor antigen after multisystem inflammatory syndrome in children. J. Pediatr. Hematol. Oncol. 45(4), e427–e432 (2023).
30. Li, Y. et al. Lipid metabolism changes in patients with severe COVID-19. Clin. Chim. Acta 517, 66–73 (2021).
31. Mietus-Snyder, M. et al. Changes in HDL cholesterol, particles, and function associate with pediatric COVID-19 severity. Front. Cardiovasc. Med. 9, 1033660 (2022).
32. Stehouwer, N., Rowland-Seymour, A., Gruppen, L., Albert, J. M. & Qua, K. Validity and reliability of Brier scoring for assessment of probabilistic diagnostic reasoning. Diagnosis (Berlin) 12(1), 53–60 (2025).


Acknowledgements

DIA mass spectrometry was performed by the Biological Mass Spectrometry Facility at Rutgers Robert Wood Johnson Medical School. We wish to thank David Sleat for his contribution to establishing and executing the DIA-MS pipeline. This work was funded by NIH grants R61HD105619, R33HD105619, HD105593-03S2, R01AI158911, HD105613, and NCATS UM1TR004789, and by Rutgers ROI–HealthAdvance HA2022-0039.

Author information

Authors and Affiliations

Public Health Research Institute, Rutgers New Jersey Medical School, Newark, NJ, USA
Jeisac Guzmán Rivera & Maria Laura Gennaro
Center for Advanced Biotechnology and Medicine, Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
Haiyan Zheng
Pediatric Clinical Research Center, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
Benjamin Richlin, Christian Suarez & Sunanda Gaur
Division of Infectious Disease and Immunology, Department of Pediatrics, Robert Wood Johnson Medical School, New Brunswick, NJ, USA
Sunanda Gaur
Department of Pediatrics, Cooperman Barnabas Medical Center, Livingston, NJ, USA
Elizabeth Ricciardi & Uzma N. Hasan
Maria Fareri Children’s Hospital, Valhalla, NY, USA
William Cuddy & Aalok R. Singh
New York Medical College, Valhalla, NY, USA
Aalok R. Singh
Division of Rheumatology, MetroHealth System, Department of Pediatrics, Case Western Reserve University, Cleveland, OH, USA
Hulya Bukulmez
Departments of Internal Medicine, Pediatrics, and Population and Quantitative Health Sciences, Center for Clinical Informatics Research and Education, MetroHealth System, Case Western Reserve University, Cleveland, OH, USA
David C. Kaelber
Hackensack University Medical Center, Hackensack Meridian School of Medicine, Nutley, NJ, USA
Yukiko Kimura
Department of Pediatrics, Cincinnati Children’s Hospital, University of Cincinnati College of Medicine, Cincinnati, OH, USA
Patrick W. Brady
Children’s Hospital at Montefiore, Bronx, NY, USA
Dawn Wahezi & Evin Rothschild
Pediatric Genomics Discovery Program, Department of Pediatrics, Yale University School of Medicine, New Haven, CT, USA
Saquib A. Lakhani
Department of Pediatrics, Cedars Sinai Guerin Children’s, Los Angeles, CA, USA
Saquib A. Lakhani
Connecticut Children’s Medical Center, Connecticut Children’s Research Institute, Hartford, CT, USA
Katherine W. Herbst
Division of Hospital Medicine, Connecticut Children’s Medical Center, Hartford, CT, USA
Alexander H. Hogan
Department of Pediatrics, University of Connecticut Health Center, Farmington, CT, USA
Alexander H. Hogan
Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
Juan C. Salazar
Division of Population Health, Quality, and Implementation Science, Department of Pediatrics, Robert Wood Johnson Medical School, New Brunswick, NJ, USA
Sandra Moroso-Fela, Lawrence C. Kleinman & Daniel B. Horton
Department of Epidemiology and Biostatistics, Rutgers School of Public Health, Piscataway, NJ, USA
Jason Roy, Daniel B. Horton & Dirk F. Moore
Department of Global Urban Health, Rutgers School of Public Health, Piscataway, NJ, USA
Lawrence C. Kleinman
Child Health Institute of New Jersey, New Brunswick, NJ, USA
Lawrence C. Kleinman
Rutgers Center for Pharmacoepidemiology and Treatment Science, Institute for Health, Health Care Policy and Aging Research, New Brunswick, NJ, USA
Daniel B. Horton
Division of Rheumatology, Department of Pediatrics, Robert Wood Johnson Medical School, New Brunswick, NJ, USA
Daniel B. Horton
Department of Medicine, Rutgers New Jersey Medical School, Newark, NJ, USA
Maria Laura Gennaro


Contributions

JGR, DFM, MLG: Conceptualization, Data interpretation and visualization, Writing – original draft, Writing – review and editing. HZ: Data acquisition, Writing – review and editing. LCK: Data interpretation, Writing – original draft, Writing – review and editing. BR, CS, SG, ER, UNH, WC, ARS, HB, DCK, YK, PWB, DW, ER, SAL, KWH, AHH, JCS, SMF, JR, DBH: Data interpretation, Writing – review and editing. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Dirk F. Moore or Maria Laura Gennaro.

Ethics declarations

Competing interests

The authors declare no competing interests.

Study approval

All study activities were approved by the Rutgers Institutional Review Board (Pro2020002961) and all methods were performed in accordance with the relevant guidelines. All participants, including parents and/or legal guardians, provided informed consent prior to engaging in study activities.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Guzmán Rivera, J., Zheng, H., Richlin, B. et al. Mass spectrometry combined with machine learning identifies novel protein signatures as demonstrated with multisystem inflammatory syndrome in children. Sci Rep 15, 36843 (2025). https://doi.org/10.1038/s41598-025-20684-5

Received: 20 June 2025
Accepted: 16 September 2025
Published: 22 October 2025
Version of record: 22 October 2025
DOI: https://doi.org/10.1038/s41598-025-20684-5