Plasma proteomics for biomarker discovery in childhood tuberculosis

Fossati, Andrea; Wambi, Peter; Jaganath, Devan; Calderon, Roger; Castro, Robert; Mohapatra, Alexander; McKetney, Justin; Luiz, Juaneta; Nerurkar, Rutuja; Nkereuwem, Esin; Franke, Molly F.; Mousavian, Zaynab; Collins, Jeffrey M.; Sigal, George B.; Segal, Mark R.; Kampman, Beate; Wobudeya, Eric; Cattamanchi, Adithya; Ernst, Joel D.; Zar, Heather J.; Swaney, Danielle L.

doi:10.1038/s41467-025-61515-5

Article
Open access
Published: 19 July 2025

Nature Communications volume 16, Article number: 6657 (2025)

Abstract

Failure to rapidly diagnose tuberculosis disease (TB) and initiate treatment is a driving factor of TB as a leading cause of death in children. Current TB diagnostic assays have poor performance in children, thus a global priority is the identification of novel non-sputum-based TB biomarkers. Here we use high-throughput proteomics to measure the plasma proteome for 511 children, with and without HIV, and across 4 countries, to distinguish TB status using standardized definitions. By employing a machine learning approach, we derive four parsimonious biosignatures encompassing 3 to 6 proteins that achieve AUCs of 0.87–0.88 and which all reach the minimum WHO target product profile accuracy thresholds for a TB screening test. This work provides insights into the unique host response in pediatric TB disease, as well as a non-sputum biosignature that could reduce delays in TB diagnosis and improve the detection and management of TB in children worldwide.

Introduction

Tuberculosis (TB) is the leading cause of death from an infectious disease worldwide, with 10.8 million cases and 1.3 million deaths each year¹. Children bear a disproportionate burden: 12% of TB disease occurs in children, but children account for 15% of TB deaths¹. This disparity is largely due to delays in diagnosis and treatment initiation, as 96% of deaths occur in children who had not started treatment². While sputum-based diagnostic testing is routinely performed in adults, children cannot reliably expectorate sputum, and sputum induction is typically required. Moreover, microbiological testing has sub-optimal sensitivity in children because their disease is typically paucibacillary, with a low bacterial burden³. As a consequence, there is a large case detection gap: an estimated half of children with TB disease, and two-thirds of those younger than 5 years, are not reported to public health programs¹. The development of non-sputum biomarker-based TB tests is therefore a global priority for improving TB diagnosis in children.

Most TB biomarker discovery studies have been done in adults⁴. In particular, host plasma protein biosignatures have shown promise for TB screening in adults and could potentially be translated into a simple point-of-care test^5,6,7,8. Unfortunately, these adult biomarkers have not been validated in children⁶, and they translate poorly to pediatric TB disease because of differences in immune responses and disease manifestations in children⁹. A systematic review found that, although some pediatric-specific blood-based host markers could meet the WHO target product profile for a TB screening test (≥ 90% sensitivity and ≥ 70% specificity)¹⁰, the evidence was highly heterogeneous, came mostly from lower-quality case-control studies with unclear reference standards¹¹, and required further validation. Thus, few biomarker candidates for childhood TB currently exist, and the development of robust pediatric-specific host biosignatures is a global priority for the early detection of pediatric TB cases¹².

While mass spectrometry (MS)-based proteomic analysis enables a broad, untargeted approach to biomarker discovery, previous efforts to identify plasma biosignatures of TB disease have been limited by small sample sizes, variable reference standards, and the exclusive use of healthy controls, which overestimates performance by favoring general inflammation markers over TB-specific markers^7,13. Past studies also frequently used samples from a single region, yielding candidate biomarkers that may reflect the co-morbidities and environment of that setting and that fail to validate elsewhere. In this work, we use high-throughput plasma proteomics and well-characterized pediatric TB cohorts across four countries to derive a host-based biomarker signature that differentiates childhood TB disease from other causes of respiratory disease.

Results

Clinical characteristics of the cohort

We included plasma samples from 511 children from The Gambia (n = 120), Peru (n = 100), South Africa (n = 111), and Uganda (n = 180). Of the 484 children with presumptive pulmonary TB, 133 (26%) had microbiologically Confirmed TB, 120 (23.4%) had Unconfirmed TB (clinically diagnosed), and 231 (45.2%) were Unlikely TB cases (non-TB lower respiratory tract infection, LRTI), based on NIH consensus definitions. To prioritize biomarkers that distinguish TB disease from other non-TB respiratory diseases, rather than non-specific inflammatory markers, our primary focus was the comparison of Confirmed vs. Unlikely TB. We further assessed specificity for TB disease using a small group (n = 27) of asymptomatic healthy children from Uganda, 8 (30%) of whom had evidence of latent TB infection based on a positive QuantiFERON-Gold test. Demographic and clinical characteristics are summarized in Table 1 and provided for each patient in Supplementary Data 1; the median age was 4 years (IQR 2–7), 46.4% were female, 11.2% were living with HIV, and 52.6% were underweight. Children with Confirmed TB were significantly more likely to be living with HIV or to be underweight than children with Unconfirmed or Unlikely TB.

Table 1 Cohort demographic and clinical characteristics (N = 511)

DIA-PASEF enabled high-throughput plasma proteomics

For all children, we started from 1 μL of undepleted plasma and performed high-throughput proteomics sample preparation¹⁴ followed by data-independent acquisition (DIA-PASEF) mass spectrometry analysis (Fig. 1a)¹⁵. In total, we quantified 7102 peptides and 859 proteins using a high-throughput (~35 min sample-to-sample) DIA-PASEF acquisition (Fig. 1b), with an average of 2628 peptides and 498 proteins detected per sample (Fig. 1c, d and Supplementary Data 2). We removed 7 outlier samples with low numbers of detected peptides and proteins, leaving 504 samples in total. We achieved an average data completeness of 60.4%, with 241 proteins detected in all 504 samples and 411 detected in more than 75% of the samples (Fig. 1e). Plasma protein concentrations span a dynamic range exceeding 10 orders of magnitude, and a subset of very high-concentration proteins (e.g., albumin) can preclude the detection of lower-abundance proteins. Because we did not use immunodepletion to remove high-concentration proteins¹⁶, we evaluated the dynamic range of the proteins detected in our experiments using concentration values from antibody- and MS-based assays reported in the Human Protein Atlas¹⁷. The quantified proteins spanned more than 4 orders of magnitude. Although detection was biased towards higher-concentration proteins, we reproducibly detected proteins down to a concentration of 12.1 ng/L (SERPINF2), with a median concentration of 40 ng/L (Fig. 1f).
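The completeness and detection-frequency figures above can be derived from a proteins × samples intensity matrix. The following is a minimal sketch, assuming non-detected values are stored as NaN and that reference plasma concentrations (for example, from the Human Protein Atlas) have already been matched to the detected proteins in a single unit; the function names are hypothetical, and this is not the authors' pipeline.

```python
import numpy as np


def completeness_summary(intensities, min_fraction=0.75):
    """Summarize missingness in a proteins x samples matrix (NaN = not detected)."""
    detected = ~np.isnan(intensities)
    per_protein = detected.mean(axis=1)  # fraction of samples in which each protein was detected
    return {
        "completeness": float(detected.mean()),  # fraction of all matrix cells that are non-missing
        "in_all_samples": int((per_protein == 1.0).sum()),
        "in_over_threshold": int((per_protein > min_fraction).sum()),
        "keep_mask": per_protein > min_fraction,  # proteins passing the detection-frequency filter
    }


def orders_of_magnitude(concentrations):
    """Span of reference concentrations (same unit for all) on a log10 scale."""
    c = np.asarray(concentrations, dtype=float)
    return float(np.log10(c.max() / c.min()))


# Toy example: 3 proteins x 4 samples
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, np.nan, 3.0, 4.0],
              [np.nan, np.nan, np.nan, 1.0]])
summary = completeness_summary(x)
```

Note that the threshold is strict (`>`), so a protein seen in exactly 75% of samples does not pass, matching the "more than 75%" wording.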

**Fig. 1: A high-throughput workflow for plasma proteomics.**

We next evaluated our data across samples from the four clinical sites (Fig. 2a) and observed a consistent distribution of MS protein abundances, without upper-end skewing, across 5 orders of magnitude (Fig. 2b). Protein detection was highly consistent across countries: 88.7% of all proteins were detected at all sites, and fewer than 1% displayed country-specific identification patterns (Fig. 2c). To correct for variation between clinical sites, sample preparation batches, and MS acquisition batches, we used ComBat¹⁸, a parametric empirical Bayes approach commonly used in proteomics to mitigate batch effects¹⁹. Clinical site was specified as the batch variable, with MS acquisition and sample preparation batches added as covariates. After normalization and ComBat correction, we used principal component analysis (PCA), computed by singular value decomposition, to project the samples into two dimensions; most samples did not separate along the first or second component, indicating that batch effects were effectively reduced (Fig. 2d). Lastly, from a quantitative standpoint, we achieved a low coefficient of variation (CV) both within each country (average = 7.9%) and across all countries (~8%) (Fig. 2e). Importantly, this analysis used only proteins identified in more than 75% of the samples (n = 411) to avoid artificially lowering the CV through imputation (see Methods). Overall, these results suggest the absence of substantial country-specific differences in protein abundance and support combining data from all clinical sites to analyze TB-specific differences.
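ComBat estimates additive and multiplicative batch effects per feature and shrinks them toward a common prior with empirical Bayes. The sketch below illustrates only the underlying location/scale idea on a complete samples × proteins log2 matrix, without empirical Bayes shrinkage or covariates, followed by a PCA projection computed via singular value decomposition. The function names are hypothetical; for real analyses an established implementation, such as `ComBat` in the R package sva, should be preferred.

```python
import numpy as np


def location_scale_adjust(data, batches):
    """Simplified batch correction for a complete samples x proteins log2 matrix.

    Each protein is standardized within each batch, then rescaled to the pooled
    within-batch SD and the overall mean. Unlike ComBat, batch parameters are not
    shrunk with empirical Bayes and no covariates are modeled. Every batch needs
    at least two samples.
    """
    data = np.asarray(data, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    grand_mean = data.mean(axis=0)

    # Residuals after removing each batch's mean give the pooled within-batch SD.
    resid = data.copy()
    for b in labels:
        idx = batches == b
        resid[idx] -= data[idx].mean(axis=0)
    pooled_sd = np.sqrt((resid ** 2).sum(axis=0) / (len(data) - len(labels)))

    adjusted = np.empty_like(data)
    for b in labels:
        idx = batches == b
        batch_sd = data[idx].std(axis=0, ddof=1)
        batch_sd[batch_sd == 0] = 1.0  # avoid division by zero for constant features
        adjusted[idx] = resid[idx] / batch_sd * pooled_sd + grand_mean
    return adjusted


def pca_2d(data):
    """Project samples onto the first two principal components using SVD."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * s[:2]            # sample coordinates (PC1, PC2)
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:2], vt[:2]  # rows of vt are the protein loadings


rng = np.random.default_rng(0)
batches = np.repeat(["site_A", "site_B"], 10)
raw = rng.normal(size=(20, 5))
raw[batches == "site_B"] += 3.0  # simulated site offset
corrected = location_scale_adjust(raw, batches)
scores, explained, loadings = pca_2d(corrected)
```

After adjustment every batch shares the same per-protein mean and SD, so a site offset like the simulated one no longer dominates the first principal component.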

**Fig. 2: Quality control and reproducibility of plasma proteomics across multiple clinical sites.**

Identification of TB disease candidate biomarkers

We first evaluated the ability of plasma proteomics to separate healthy children from symptomatic children undergoing evaluation for pulmonary TB by comparing the levels of known inflammatory markers. As expected, serum amyloid A1, A2, and A4 (SAA1, SAA2, SAA4) and C-reactive protein (CRP) were all significantly upregulated in symptomatic children, with SAA2 showing the greatest difference among these acute phase proteins (Fig. 3a). However, these inflammatory markers did not significantly distinguish between the groups of symptomatic children (i.e., Confirmed, Unconfirmed, or Unlikely TB) (Fig. 3a).

**Fig. 3: Abundance proteomics analysis of pediatric TB cohorts.**

We next compared plasma protein levels in children with Confirmed TB (n = 112) and Unlikely TB (n = 235) to identify biomarkers that could distinguish TB disease from other non-TB respiratory diseases. This comparison identified 47 proteins with significantly different abundances, of which 30 were downregulated and 17 upregulated (Fig. 3b and Supplementary Data 3). Interestingly, one of the most significantly regulated proteins was the tryptophanyl-tRNA synthetase WARS1, which was increased in children with Confirmed TB vs. Unlikely TB (log2FC = 0.39, BH-adjusted p = 3.3 × 10⁻⁵) (Fig. 3b) and is linked to TB infection through multiple mechanisms^20,21,22. Overall, most of these are known plasma proteins previously classified as secreted or extracellular (38/48), minimizing the possibility that random variation in tissue leakage drives the distinction between groups. For the remaining 10 (WARS1, DBH, TUBA1A, ICAM1, GSN, LTA4H, SDC1, CSF1R, THBS4, CDH13), a literature review of their localization showed that most are potentially secreted (9/10), with only one (TUBA1A) lacking reported extracellular localization.
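Differential abundance results of this kind are typically reported as a log2 fold change between group means plus a multiplicity-adjusted p-value. A minimal sketch, assuming log2-transformed proteins × samples matrices and per-protein p-values from a statistical test that is not specified in this section; it implements the standard Benjamini-Hochberg step-up adjustment, and the function names and toy values are hypothetical rather than the authors' code.

```python
import numpy as np


def log2_fold_change(log2_a, log2_b):
    """Group-mean difference of log2 abundances (proteins x samples), ignoring NaN."""
    return np.nanmean(log2_a, axis=1) - np.nanmean(log2_b, axis=1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Step-up: each adjusted value is the minimum over itself and all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Hypothetical example: two proteins, two "Confirmed" and two "Unlikely" samples each
confirmed = np.array([[10.0, 10.4], [8.0, 8.2]])
unlikely = np.array([[9.6, 10.0], [8.6, 8.8]])
fc = log2_fold_change(confirmed, unlikely)  # positive = higher in Confirmed TB
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```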

We further identified upregulation of multiple specific immunoglobulin heavy chain (IGHV1-18, IGHV1-3, IGHV2-26, IGHV3-23, IGHV3-30) and light chain (IGKV1-16, IGKV1D-33, IGKV3-20) variable domains across several countries (Fig. 3c and Supplementary Fig. 1), potentially suggesting an oligoclonal humoral response to TB disease. Additionally, we observed significantly different levels of several proteins (APOM, PON1, CPB2) (Fig. 3b) that have previously been identified in plasma proteomic studies of severe vs. non-severe COVID-19²³ and in an adult TB study⁷, potentially pointing to these proteins as general markers of lung inflammation rather than specific markers of pediatric TB disease.

Lastly, to identify pathways with dysregulated patterns between Confirmed TB and Unlikely TB more broadly, we performed a pathway enrichment analysis on each pathway in the KEGG and Reactome databases, restricted to gene sets with more than 50% overlap with our plasma proteomic dataset. In total, 14 pathways showed significantly different means (Benjamini–Hochberg adjusted p < 0.05) (Fig. 3d). Several of the significantly regulated pathways were related to complement activation, which has also been identified in whole blood transcriptomic studies of TB^24,25. Complement upregulation in TB may reflect activation of the classical pathway by antigen-antibody complexes, activation of the alternative or mannose-binding lectin pathway by components of the M. tuberculosis cell envelope or cell wall, and/or increased synthesis of complement proteins as acute phase proteins.
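The gene-set coverage filter described above, together with one possible per-sample pathway score, can be sketched as follows. Both the scoring rule (mean of z-scored member proteins) and the function names are assumptions for illustration; the text does not specify how pathway means were computed or which test compared them.

```python
import numpy as np


def covered_gene_sets(gene_sets, measured, min_overlap=0.5):
    """Keep gene sets in which more than `min_overlap` of members were quantified."""
    measured = set(measured)
    kept = {}
    for name, genes in gene_sets.items():
        genes = set(genes)
        overlap = genes & measured
        if genes and len(overlap) / len(genes) > min_overlap:
            kept[name] = sorted(overlap)
    return kept


def pathway_scores(zscores, protein_index, members):
    """Per-sample mean of z-scored abundances for a pathway's measured members."""
    rows = [protein_index[g] for g in members]
    return np.nanmean(zscores[rows], axis=0)


# Hypothetical gene sets and data (proteins x samples, already z-scored)
gene_sets = {"complement_toy": ["C3", "C4A", "C5", "CFB"], "other_toy": ["X1", "X2", "C3"]}
measured = ["C3", "C4A", "C5", "APOM"]
kept = covered_gene_sets(gene_sets, measured)
z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]])
index = {"C3": 0, "C4A": 1, "C5": 2, "APOM": 3}
scores = pathway_scores(z, index, kept["complement_toy"])
```

Per-sample scores for each retained pathway could then be compared between Confirmed and Unlikely TB with a two-group test, and the resulting p-values BH-adjusted.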

Machine learning based identification of a TB biosignature

To identify the smallest subset of features achieving the WHO target product profile (TPP) for a screening test (70% specificity at 90% sensitivity), we first used LASSO to reduce the number of features to a subset small enough for an exhaustive brute-force search. Prior to this analysis, proteins with more than 50% missing values across the Confirmed and Unlikely TB samples were removed to limit the impact of data imputation on the final biosignature. We chose LASSO over other feature selection approaches, such as tree-based methods or recursive feature elimination (forward or reverse), because of the inherent sparsity of its solutions and its computational performance. LASSO with 20-fold cross-validation removed a large proportion of features, leaving 50 with non-zero coefficients (Fig. 4a and Source Data). Notably, simply selecting the top N proteins by LASSO feature importance did not reach the WHO TPP for any N tested (Supplementary Fig. 2), which supports an exhaustive combinatorial evaluation of small feature subsets.
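The two preprocessing steps in this paragraph, the missingness filter and LASSO selection, can be illustrated with a short sketch. It uses plain cyclic coordinate descent for a linear LASSO on standardized features with a fixed penalty `alpha`; the text does not state the LASSO variant (linear or logistic) or the penalty grid, and the 20-fold cross-validation used to choose the penalty is omitted. Function names are hypothetical, and any imputation of remaining missing values is assumed to happen between the two steps.

```python
import numpy as np


def drop_sparse_features(X, max_missing=0.5):
    """Drop columns (proteins) with more than `max_missing` fraction of NaN values."""
    keep = np.isnan(X).mean(axis=0) <= max_missing
    return X[:, keep], keep


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimize (1/2n)||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    Features are standardized (mean 0, variance 1) and y is centered, so no
    intercept is fitted and each coordinate update is a single soft-threshold.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    n, p = X.shape
    b = np.zeros(p)
    residual = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * b[j]         # remove feature j's current contribution
            rho = X[:, j] @ residual / n       # correlation of feature j with the partial residual
            b[j] = soft_threshold(rho, alpha)  # unit-variance columns need no rescaling
            residual -= X[:, j] * b[j]
    return b


# Simulated data: only the first two of ten features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)
coef = lasso_coordinate_descent(X, y, alpha=0.1)
selected = np.flatnonzero(coef)  # indices of features with non-zero coefficients
```

The L1 penalty sets uninformative coefficients exactly to zero, which is the sparsity property the text gives as the reason for choosing LASSO.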

**Fig. 4: Machine learning to develop a parsimonious biosignature for pediatric TB disease.**

We therefore searched for the best combination of a small number of features, using the WHO TPP as the objective function. Specifically, we evaluated all possible combinations of N features (N = 1 to 6) and selected the combinations maximizing sensitivity at 70% specificity; among combinations with equal sensitivity, we selected the one with the greater AUC. We derived six logistic regression models (trained on a balanced 75% subset of the data and tested on the remaining 25%), four of which met or exceeded the WHO TPP for a screening test (Fig. 4b). The 5-protein model achieved 93% sensitivity at 70% specificity (95% CI 0.73–0.99), and the 6-protein model achieved 96.7% sensitivity at 70% specificity (95% CI 0.83–0.99) on our test data (Fig. 4c; n = 83, 30 positive, 53 negative). The features selected for the 4- to 6-protein models largely overlapped, with APOM, TNC, and CD44 shared across all three (Fig. 4d). The selected proteins for most models showed small variance and significantly different means across all TB classes (Fig. 4e), potentially suggesting their relevance in TB disease. Two of these proteins were also differentially abundant between Confirmed TB and Unlikely TB: WARS1 (log2FC 0.38, q = 10⁻⁵) and APOM (log2FC −0.45, q = 10⁻⁵) (Figs. 3b and 4e). Individual proteins showed low AUCs, ranging from 0.577 (HEG1) to 0.745 (APOM), suggesting that no single feature drives performance and that at least 3 proteins are needed to achieve the WHO TPP (Supplementary Fig. 3).
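The exhaustive search can be sketched as follows: fit a logistic regression for every k-protein subset, compute sensitivity at a cut-off that holds specificity at 70%, and break ties by AUC. This is a minimal illustration under stated assumptions: the Newton-Raphson fitter with a small ridge term, the cut-off rule, and all function names are hypothetical, and for simplicity subsets are scored on the data they were fitted on, whereas the study reports performance on a held-out 25% test split.

```python
import itertools

import numpy as np


def fit_logistic(X, y, n_iter=50, ridge=1e-3):
    """Logistic regression fitted by Newton-Raphson with a small L2 term for stability."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        grad = Xb.T @ (y - p) - ridge * w
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None]) + ridge * np.eye(Xb.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-np.clip(w[0] + X @ w[1:], -30, 30)))


def auc(scores, labels):
    """Mann-Whitney AUC: probability a positive outscores a negative (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def sensitivity_at_specificity(scores, labels, specificity=0.70):
    """Sensitivity when the cut-off keeps at least `specificity` of negatives at or below it."""
    neg = np.sort(scores[labels == 0])
    threshold = neg[int(np.ceil(specificity * len(neg))) - 1]
    return float(np.mean(scores[labels == 1] > threshold)), threshold


def best_combination(X, y, names, k, specificity=0.70):
    """Score every k-feature model; rank by sensitivity at fixed specificity, then AUC."""
    best_key, best_names = None, None
    for combo in itertools.combinations(range(X.shape[1]), k):
        cols = list(combo)
        scores = predict_proba(fit_logistic(X[:, cols], y), X[:, cols])
        sens, _ = sensitivity_at_specificity(scores, y, specificity)
        key = (sens, auc(scores, y))  # tuple comparison: sensitivity first, AUC breaks ties
        if best_key is None or key > best_key:
            best_key, best_names = key, [names[c] for c in cols]
    return best_key, best_names


# Simulated example: proteins P0 and P2 are informative, the rest are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, [0, 2]] += 1.5 * y[:, None]
(best_sens, best_auc), chosen = best_combination(X, y, [f"P{i}" for i in range(5)], k=2)
```

Exhaustive search grows combinatorially: with 50 LASSO-selected proteins, k = 6 already implies C(50, 6) = 15,890,700 candidate models, which is why reducing the feature set first matters.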

Detection of unconfirmed TB

We tested the derived biosignatures on the 115 Unconfirmed TB cases that passed proteomics quality control filtering, to assess whether they could identify TB in symptomatic children with culture-negative disease. Although the lack of microbiological confirmation raises the possibility that these cases did not represent true TB disease, all children in the Unconfirmed TB group had clinical signs and symptoms of TB and improved on anti-TB treatment. For this comparison, we used only the biosignatures meeting or exceeding the WHO TPP for a screening test (the 3-, 4-, 5-, and 6-protein models), with each model's classification threshold set at the point on its ROC curve that achieved the WHO TPP. The models supported a TB diagnosis in ~79% of Unconfirmed TB cases (negative by sputum-based testing), with individual models predicting between 85 and 98 positive cases among the 115 children (Fig. 5a). Agreement between models was good, with 73/115 samples (63%) predicted positive by all models (Fig. 5b). Importantly, none of these four biosignatures separated healthy children from children with latent TB infection, suggesting that they are specific for active TB disease (Supplementary Fig. 4). When we compared the Unconfirmed TB samples with the Confirmed TB group using all identified proteins, samples predicted positive by all four models (n = 70) tended to cluster closer to the Confirmed TB group in the PCA-derived latent space and separated on the first component from the Unconfirmed samples predicted negative (n = 11) (Fig. 5c). Because the 8 signature proteins make only a small individual contribution to this proteome-wide comparison (850 proteins), with only TNC ranking among the top 20% of features driving separation on the first component (Supplementary Fig. 5), this clustering provides largely independent support for the biosignature predictions.
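Applying several fixed-threshold models to a new group and tallying their agreement can be sketched in a few lines. A minimal sketch, assuming each model outputs a probability per sample and has a model-specific cut-off taken from its ROC curve; the `>=` convention, the toy probabilities, and the function names are hypothetical.

```python
import numpy as np


def classify(prob_by_model, thresholds):
    """Binary calls (models x samples); a sample is positive at or above the model's cut-off."""
    probs = np.asarray(prob_by_model, dtype=float)
    return probs >= np.asarray(thresholds, dtype=float)[:, None]


def agreement_summary(calls):
    """Number of positives per model and number of samples called positive by every model."""
    return calls.sum(axis=1), int(calls.all(axis=0).sum())


# Hypothetical probabilities for 2 models x 3 Unconfirmed TB samples
calls = classify([[0.9, 0.2, 0.6], [0.8, 0.5, 0.4]], thresholds=[0.5, 0.45])
per_model, all_models = agreement_summary(calls)
```

With the study's numbers, 73 of 115 samples called positive by all models corresponds to 73/115 ≈ 63%.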

Discussion

This study represents the largest TB plasma proteomics study in children to date, encompassing a diverse pediatric cohort of >500 samples from clinical sites in four low- and middle-income countries (LMICs) on two continents. The scale of this analysis was made possible by data-independent acquisition, which provides high-throughput, accurate, and precise quantification of hundreds of proteins in only ~30 min of MS acquisition. In contrast, previous plasma proteomics work to develop host-based TB biomarkers relied on multiplexed quantification (e.g., iTRAQ) and long acquisition times, both of which hinder the analysis of large clinical cohorts^13,26,27. Furthermore, the power of this study is strengthened by our cohort design, which includes both healthy controls and >200 controls with non-TB respiratory diseases. The inclusion of a non-TB respiratory disease control group addresses a key clinical diagnostic challenge: distinguishing children with pulmonary TB disease from those with symptoms due to other causes. This control group also avoids the selection of candidate TB biomarkers that are non-specific inflammatory markers unable to differentiate among symptomatic states, as observed with CRP, SAA1, SAA2, and SAA4, which were included in previous plasma proteomic biosignatures^7,13.

An important milestone of this work is the application of machine learning to develop minimal host-based biosignatures of 3–6 proteins that separate children with Confirmed TB from those with Unlikely TB with a specificity and sensitivity that meet or exceed the WHO criteria for a TB screening test¹⁰. WARS1, which was part of the 4-protein biosignature, has been identified in adult proteomic studies as a promising TB biomarker^6,28. WARS1 (also known as TrpRS or SYWC) has previously been linked to TB infection through multiple mechanisms. First, upon Mycobacterium tuberculosis infection, many lymphocyte populations, including CD4 and CD8 T cells, noncanonical T cells, natural killer cells, and type 1 innate lymphoid cells, upregulate¹⁰ interferon gamma (IFNγ) as part of the host immune response, which in turn induces WARS1 expression²⁹. Second, WARS1 is induced by tryptophan depletion³⁰. Tryptophan depletion via the kynurenine pathway has been detected in multiple metabolomic studies of active TB disease^31,32,33; hence, our data further support previous reports on the importance of tryptophan metabolism in active TB disease versus other respiratory illnesses.

Several of the other proteins have either not been associated with TB or have only been described in adult biosignatures, which further highlights the need for pediatric-specific analyses. For example, TNC (tenascin-C) is associated with lung disease but not specifically with TB³⁴. In mice, CD44 has been shown to be a macrophage binding site for M. tuberculosis that can provide protective immunity³⁵, and studies in adult TB patients have identified CD44 as a serum biomarker for multidrug-resistant TB³⁶. MMP-2 has been studied in adult TB, where it was elevated in respiratory specimens compared with healthy controls^37,38 and correlated with markers of disease severity, such as cavitation. However, less is known about its role in biofluids such as plasma, or in pediatric patients.

APOM was also found across the signatures and was significantly downregulated in Confirmed versus Unlikely TB. M. tuberculosis infection alters lipid metabolism^7,39,40, and a variety of apolipoproteins have been identified as downregulated candidate biomarkers in adult proteomic studies. Although APOM has not previously been reported in this context, it is associated with HDL, which has been found to be lower in individuals with TB and to correlate with the radiologic extent of disease⁴¹. At the same time, comorbidities such as malnutrition and HIV infection increase the risk of TB, can also alter metabolism, and may have contributed to these differential markers⁴². The proportions of children living with HIV and with malnutrition were higher in the Confirmed TB group, but the sample size of our test set was too small for further subgroup analyses. Prospective validation of these markers is therefore needed, both overall and by setting, and among key risk groups including infants, children with HIV, and children with malnutrition.

Importantly, our protein biosignatures did not separate healthy children from children with latent TB infection for any of the tested models, suggesting that these biosignatures are specific for active TB disease. Moreover, applying these host biosignatures to children with Unconfirmed TB further supported a potential TB diagnosis in ~63% of cases that were negative by sputum-based testing. Although the lack of microbiological confirmation raises the possibility that these cases did not represent true TB disease, all children had clinical signs and symptoms of TB and improved with anti-TB treatment. However, we cannot know with certainty whether our biosignatures classified these Unconfirmed TB cases correctly. Future clinical trials in which anti-TB treatment is provided based on biosignature results would be required to fully address this question.

While the biosignatures derived for childhood TB in this study result from a large-scale, untargeted discovery proteomics approach, several targeted cytokine-based signatures have also been identified for TB in children^11,43,44; cytokines are often below the limit of detection of mass spectrometry⁴⁵. Furthermore, several of these targeted analyses were performed at a single center with a small sample size. For example, prior work identified a 3-cytokine signature that distinguished children with TB disease from those with other respiratory diseases in The Gambia, but it achieved a lower AUC of 0.74 and a sensitivity of 72.2%⁴⁴. Our study benefited from a large sample size and representation from four countries, with high proportions of children who were under five years old, living with HIV, or undernourished. While further prospective validation and subgroup analyses are needed to evaluate robustness and reproducibility, our findings suggest that a simple host-based proteomic signature could be a valuable non-sputum TB screening test for children. To further enable translation, greater development of technologies supporting multiplex testing at the point of care is also needed⁴⁶.

Our study has limitations. As noted above, the performance of our derived biosignatures requires further validation, for example through prospective clinical trials in which anti-TB treatment is based on biosignature classifications. Our biosignatures also include immunoglobulin G proteins; although we observed highly consistent detection of these proteins across our cohort, their high degree of polymorphism across the human population may limit their broad utility in a biosignature. Additionally, sample sizes limited our assessment of biosignature accuracy in subgroups of our cohorts. From a technical perspective, plasma sampling, sample preparation, and data collection may each have introduced bias into our results. This includes our prioritization of throughput and reproducibility by not using protein depletion strategies, such as antibody-based depletion or protein coronas. We attempted to mitigate these biases by randomization across the workflow, including specimen collection and data acquisition, and by post-analysis computational batch correction. Finally, our biosignatures were evaluated on the test set rather than on a fully independent hold-out set; hence, the reported performance may be optimistic due to multiple testing and should be interpreted as exploratory rather than confirmatory.

In conclusion, untargeted proteomics broadly profiled the plasma of children across four countries and identified candidate host protein biomarkers that could distinguish pediatric TB disease from other respiratory diseases. From these candidate markers, we identified plasma protein biosignatures of only 3–6 proteins for childhood TB disease that achieved the minimum accuracy for a TB screening tool. These efforts provide greater characterization of the unique immune response in pediatric TB disease, along with a non-sputum biosignature that could reduce delays in TB diagnosis and improve the detection and management of TB in children worldwide.

Methods

Ethical considerations

This study complies with all relevant ethical regulations. All caregivers provided written informed consent, including for storage of samples for future studies, and children provided assent as applicable. The studies were approved by the Mulago Hospital Ethics Research Committee, The Gambia Government/MRC Joint Ethics Committee, the London School of Hygiene and Tropical Medicine, the Institutional Ethics Committee for Research of the National Institute of Health (Peru), the University of Cape Town, and the University of California, San Francisco (UCSF) IRB.

Pediatric TB cohort

We analyzed plasma samples collected from children younger than 15 years who were evaluated for pulmonary TB and had previously been enrolled in prospective diagnostic cohort studies in The Gambia, Peru, South Africa, and Uganda. Children were included if they had signs and symptoms of pulmonary TB and excluded if they had already been taking treatment for TB infection or disease for more than 72 h. All children completed a standard TB evaluation, including clinical examination, chest X-ray, and respiratory sample collection for Xpert MTB/RIF molecular testing and mycobacterial culture. All children were followed up after 2–3 months and assessed for clinical response to any treatment. They were classified according to NIH consensus definitions as Confirmed, Unconfirmed, or Unlikely TB. Confirmed TB was defined as microbiological evidence of TB disease, i.e., a positive Xpert MTB/RIF Ultra result or a mycobacterial culture positive for M. tuberculosis. Unconfirmed TB cases lacked microbiological evidence of TB but had signs and symptoms of TB disease together with other clinical signs or risk factors suggestive of TB, including an abnormal chest X-ray and/or known TB contact; they were started on anti-TB treatment and had improved at the follow-up visit. Unlikely TB cases were symptomatic but had neither microbiological evidence of TB disease nor other signs or risk factors. In addition, asymptomatic healthy children from Uganda were enrolled and tested for TB infection with an interferon-gamma release assay (IGRA), QuantiFERON-Gold (Qiagen, Hilden, Germany). Healthy controls were defined as asymptomatic and IGRA negative, while latent TB infection cases were defined as asymptomatic with a positive IGRA result. The gender of participants was self-reported in the baseline questionnaire and was not considered in the study design.

Sample collection and selection

Trained staff performed venipuncture and collected blood samples from all children at baseline, within 72 h of starting any TB treatment. Blood samples were centrifuged, and plasma was aliquoted and stored at −80 °C. For this analysis, each study site randomly selected plasma samples from Confirmed, Unconfirmed, and Unlikely TB cases in a 1:1:2 ratio, respectively. In addition, a convenience sample of plasma specimens from asymptomatic children from Uganda was selected.
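A site-level 1:1:2 random draw could be sketched as below. The group labels follow the NIH categories in the text, while the function name, the `n_per_unit` parameter, and the use of NumPy's random generator are hypothetical; the text does not describe how randomization was implemented.

```python
import numpy as np

RATIO = {"Confirmed": 1, "Unconfirmed": 1, "Unlikely": 2}


def select_by_ratio(ids_by_group, n_per_unit, seed=0):
    """Draw samples without replacement so group sizes follow RATIO (1:1:2)."""
    rng = np.random.default_rng(seed)
    return {
        group: sorted(rng.choice(ids_by_group[group], size=units * n_per_unit, replace=False).tolist())
        for group, units in RATIO.items()
    }


# Hypothetical sample IDs available at one site
available = {
    "Confirmed": ["c1", "c2", "c3"],
    "Unconfirmed": ["u1", "u2", "u3"],
    "Unlikely": [f"n{i}" for i in range(10)],
}
picked = select_by_ratio(available, n_per_unit=2)
```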

Sample preparation for plasma proteomics

We analyzed a total of 511 plasma samples, with each sample representing an individual patient (n = 1). From each sample, 1 μL of undepleted plasma was transferred into a 96-well plate containing 200 μL of inactivation buffer (8 M urea, 100 mM ammonium bicarbonate, 150 mM NaCl), and RNase (NEB) was added at 0.75 μL/mL. The proteins were transferred to a 96-well filter plate and processed as previously described¹⁴. Briefly, the plates were dried by centrifugation (1800 × g, 30 min, 25 °C), and 50 μL of TUA buffer (8 M urea, 20 mM ammonium bicarbonate, 5 mM TCEP) was added. Following incubation at RT on a shaker (500 rpm, 25 °C), chloroacetamide (CAA) was added to a final concentration of 10 mM, and the plates were incubated in the dark for 1 h at room temperature. TCEP and CAA were removed by centrifugation (2000 × g, 30 min, RT), and the plates were washed three times with 200 μL of ddH2O. Trypsin was added at a 1:50 ratio, and the samples were digested overnight at 37 °C on a shaker (800 rpm). Peptides were collected by centrifugation (2000 × g, 30 min, RT), and the plate was washed once with 100 μL of ddH2O. The resulting peptides were dried under vacuum and resuspended at approximately 200 ng/μL prior to MS injection and DIA-PASEF analysis.

In addition, a representative pool of HIV-positive and TB-positive cases from these samples was high-pH fractionated on C18 tips and measured by DDA-PASEF to generate a spectral library⁴⁷. Briefly, high-pH fractionation was performed using C18 spin columns. The columns were first activated with one column volume of acetonitrile and then equilibrated with two column volumes of 0.1% TFA. Peptides were loaded onto the columns and washed twice with 0.1% TFA. Bound peptides were eluted stepwise with increasing concentrations of acetonitrile (5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, and 50%) in 0.1% triethylamine (pH 10), followed by two washes with 50% acetonitrile. The resulting fractions were dried by vacuum centrifugation and resuspended in 0.1% formic acid prior to MS analysis by DDA-PASEF.
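The reagent additions in this protocol reduce to simple dilution arithmetic. The sketch below is illustrative only: it assumes a hypothetical 500 mM CAA stock, reads the 1:50 trypsin ratio as enzyme:protein (w/w), and assumes roughly 70 μg of protein per microliter of undepleted plasma. None of these three values is stated in the protocol, and the function names are hypothetical.

```python
def spike_volume_ul(final_mm: float, start_volume_ul: float, stock_mm: float) -> float:
    """Volume of stock to add so that the mixture reaches final_mm.

    Solves stock_mm * v = final_mm * (start_volume_ul + v) for v, so the
    volume increase caused by the addition itself is accounted for.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * start_volume_ul / (stock_mm - final_mm)


def trypsin_ug(protein_ug: float, ratio: float = 50.0) -> float:
    """Trypsin mass for an enzyme:protein ratio of 1:ratio (w/w is an assumption)."""
    return protein_ug / ratio


# Hypothetical inputs: 500 mM CAA stock added to 50 uL TUA buffer; ~70 ug protein per well.
caa_ul = spike_volume_ul(final_mm=10.0, start_volume_ul=50.0, stock_mm=500.0)
enzyme_ug = trypsin_ug(protein_ug=70.0)
print(f"CAA stock to add: {caa_ul:.2f} uL; trypsin: {enzyme_ug:.2f} ug")
# prints "CAA stock to add: 1.02 uL; trypsin: 1.40 ug"
```

Replace the assumed stock concentration and protein estimate with measured values before using numbers like these at the bench.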

DIA-PASEF data acquisition for abundance proteomics

Approximately 200 ng per sample was analyzed on a Bruker timsTOF Pro interfaced with an UltiMate 3000 UHPLC. Peptides were separated on a PepSep column (Bruker; 15 cm length, 1.7 μm ReproSil Saphir C18 beads) and sprayed into the CaptiveSpray source held at 1700 V and 200 °C. Peptides were eluted with a gradient of 2 to 33% buffer B (0.1% formic acid in acetonitrile) over 26 min; buffer B was then increased to 90% for 5 min, and the column was re-equilibrated at 5% buffer B for 2 min, for a total gradient time of 33 min. Buffer A was 0.1% formic acid. Samples were acquired in DIA-PASEF mode using nine 32 m/z DIA-PASEF windows (500–966 m/z) and an ion mobility range of 0.85–1.3 Vs/cm². Data for selected samples were re-acquired when significant mass shifts were observed or when consecutive injections showed reduced signal.

DDA-PASEF and DIA-PASEF data analysis

To generate a spectral library for the analysis of DIA-PASEF data files, DDA-PASEF files were searched using MSFragger⁴⁸ within the FragPipe toolkit (v1.8) with the library generation workflow ("DIA-Speclib-quant") against a human FASTA database downloaded in January 2022 (20,408 entries). The search used tryptic cleavage specificity with up to 2 missed cleavages, carbamidomethylation of cysteine as a fixed modification, methionine oxidation and protein N-terminal acetylation as variable modifications, a precursor mass tolerance optimized per sample within −20 to +20 ppm (the FragPipe default), a product ion mass tolerance of 20 ppm, and a minimum peptide length of 8. Peptide identifications were filtered to a 1% FDR at the peptide and protein levels. The generated library and our previously reported plasma library⁴⁷ were merged using easypqp (https://github.com/grosenberger/easypqp). All DIA-PASEF samples were searched with DIA-NN (v1.8)⁴⁹ using a library-based strategy, with MS1 and MS2 tolerances set to 10 ppm. Protein grouping was based on the library IDs, and cross-run normalization was disabled. After the search, the global report file was filtered to protein group Q-values ≤ 1% ('Lib.PG.Q.Value'). Samples were excluded if their number of peptides fell more than 3 standard deviations below the median number of peptides (2591), which removed samples with fewer than 1700 peptides. Peptide-level data were normalized by median-centering, using the peptides identified in all samples.
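The sample-level QC filter and median-centering described above can be sketched in pandas. This is a minimal illustration rather than the authors' code: it assumes a peptides × samples table of log2 intensities with NaN marking peptides that were not identified, uses the sample standard deviation of the per-sample peptide counts, and shifts every sample to the mean of the per-sample medians (centering to zero would serve equally well). Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def qc_filter_samples(log_int: pd.DataFrame, n_sd: float = 3.0) -> pd.DataFrame:
    """Drop samples whose peptide count is more than n_sd SDs below the median count.

    log_int: peptides (rows) x samples (columns), log2 intensities, NaN = not identified.
    """
    counts = log_int.notna().sum(axis=0)  # peptides identified per sample
    cutoff = counts.median() - n_sd * counts.std()
    return log_int.loc[:, counts >= cutoff]


def median_center(log_int: pd.DataFrame) -> pd.DataFrame:
    """Align sample medians computed only over peptides quantified in every sample."""
    complete = log_int.dropna(axis=0, how="any")  # peptides with no missing values
    medians = complete.median(axis=0)
    # Subtract each sample's median, then add back the mean median to keep the intensity scale.
    return log_int - medians + medians.mean()
```

As a consistency check on the reported values, a median of 2591 peptides and a cutoff of about 1700 imply a between-sample SD of roughly (2591 − 1700)/3 ≈ 297 peptides.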

Following normalization, missing values were imputed using a heuristic strategy based on each peptide's identification frequency, which leverages the large number of samples analyzed in this study.

The following rules were applied:

Peptides identified in >50% of the samples (at least 250 independent identifications) were imputed with the mean of their identified values;
Peptides identified in <50% but >10% of the samples were imputed with a random value drawn from a Gaussian distribution with the mean (μ) and standard deviation (σ) of the data, with μ downshifted by 1.8σ;
Peptides identified in <10% of the samples were removed.
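The three imputation rules above can be sketched as follows, under stated assumptions: the input is a peptides × samples table of normalized log2 intensities with NaN for missing values; the "mean identification value" is read as the mean of the peptide's observed values; the Gaussian for the middle tier is estimated per peptide, with its mean shifted down by 1.8σ and its width kept at σ; and peptides identified in exactly 50% or exactly 10% of samples are assigned to the Gaussian tier and dropped, respectively. The function name is hypothetical, and the authors' own implementation lives in the COMBO_code repository.

```python
import numpy as np
import pandas as pd


def impute_by_frequency(log_int: pd.DataFrame, high: float = 0.5, low: float = 0.1,
                        downshift: float = 1.8, seed: int = 0) -> pd.DataFrame:
    """Frequency-tiered imputation of a peptides x samples table (NaN = missing).

    Assumed reading of the rules:
      * identified in > high of samples      -> fill with the peptide's observed mean
      * identified in (low, high] of samples -> draw from N(mu - downshift*sigma, sigma)
      * identified in <= low of samples      -> drop the peptide
    mu and sigma are estimated per peptide from its observed values (an assumption).
    """
    rng = np.random.default_rng(seed)
    freq = log_int.notna().mean(axis=1)   # identification frequency per peptide
    out = log_int.loc[freq > low].copy()  # third rule: drop rarely identified peptides
    for pep in out.index:
        observed = out.loc[pep]
        missing = observed.isna()
        if not missing.any():
            continue
        mu, sigma = observed.mean(), observed.std()  # pandas skips NaN here
        if freq[pep] > high:
            out.loc[pep, missing] = mu  # first rule: mean imputation
        else:                           # second rule: down-shifted Gaussian draws
            out.loc[pep, missing] = rng.normal(mu - downshift * sigma, sigma, missing.sum())
    return out
```

Fixing the random seed makes the Gaussian-tier draws reproducible between runs, which matters when the imputed matrix feeds later batch correction and model training.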

Following imputation, the peptide-level data were batch corrected using ComBat¹⁸ to remove variation between clinical sites, sample preparation batches, and MS acquisition batches. The clinical sites were used as batches, and the MS acquisition and sample preparation batches (i.e., the different plates) were included as additional covariates. Peptides were rolled up into proteins using only proteotypic peptides and a topN strategy (at most 3 proteotypic peptides per protein), with the mean intensity of these peptides representing the protein intensity. For gene set enrichment analysis, we used the MDtest function (nperm = 1000) from the GSAR R package, with the protein intensity values from Confirmed and Unlikely TB samples as input⁵⁰. Protein sets corresponding to known biological pathways were used as the input gene sets. For each pathway, this function performed a two-sided mean difference test of the null hypothesis that the mean of a set of features (i.e., proteins) does not differ between the two conditions (Confirmed TB vs. Unlikely TB). The resulting p-values were adjusted using the Benjamini–Hochberg (BH) procedure.
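The topN protein roll-up and the BH correction can be sketched as follows. Assumptions: the peptide table has hypothetical `protein` and `proteotypic` columns plus one intensity column per sample; the "top" peptides are taken to be the three with the highest mean intensity across samples; and the BH adjustment is written out in NumPy (statsmodels' `multipletests(pvals, method="fdr_bh")` computes the same adjusted values). The GSAR `MDtest` itself is an R function and is not reproduced here.

```python
import numpy as np
import pandas as pd


def topn_rollup(pep: pd.DataFrame, sample_cols: list, n: int = 3) -> pd.DataFrame:
    """Protein intensity = mean of its top-n proteotypic peptides (proteins x samples).

    'Top' is assumed to mean highest mean intensity across samples.
    """
    pt = pep.loc[pep["proteotypic"]].copy()
    pt["_mean"] = pt[sample_cols].mean(axis=1)
    top = pt.sort_values("_mean", ascending=False).groupby("protein").head(n)
    return top.groupby("protein")[sample_cols].mean()


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

Restricting the roll-up to proteotypic peptides keeps shared peptides from leaking intensity between proteins, and capping at three peptides limits the influence of low-abundance, noisier peptides.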

Machine learning based identification of a TB biosignature

Protein-level intensities (normalized across all clinical sites and HIV status) for Confirmed TB (n = 120) and Unlikely TB (n = 211) samples were selected and z-scored. For increased stringency in biosignature development, we restricted the analysis to proteins with 50% or fewer missing values across the combined Confirmed and Unlikely TB samples. From the remaining proteins, we then selected combinations exceeding the WHO target product profile (TPP) requirements for a diagnostic test. Only Confirmed TB and Unlikely TB cases were included, as these groups have clear reference standards for TB and not TB. First, a random 75% of the data was used to train a LASSO model with the scikit-learn LassoCV function (20 folds stratified by TB class, max_iter = 10000, tol = 0.0001). Feature importance was calculated, and the proteins with non-zero coefficients (n = 50) were carried forward to a combinatorial analysis. In this analysis, we generated all possible combinations of 1 (50 combinations) to 6 (15,890,700 combinations) features and trained a logistic regression model on the z-scored abundances for each combination. The remaining 25% of the data served as the test set for evaluating all models and was not used for training at any step of this initial analysis. For each combination size N, models were ranked by the sensitivity achieved at 90% specificity on the 25% test split, and the top-scoring models for each N were retained for subsequent analysis. Confidence intervals were calculated using the Clopper–Pearson (exact binomial) method. Finally, we applied the models meeting the WHO TPP (the 3-, 4-, 5-, and 6-protein models) to the Unconfirmed TB cases to determine what proportion could be diagnosed with these models.
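The two-stage search (LASSO pre-selection, then an exhaustive logistic-regression search over protein combinations ranked by sensitivity at 90% specificity on the held-out split) can be sketched with scikit-learn. This is a simplified illustration with assumed details: `X` already holds z-scored abundances, the 75/25 split is stratified, the 20 CV folds are passed to `LassoCV` as precomputed `StratifiedKFold` splits, the LASSO is fit as a linear regression on the 0/1 TB label, `LogisticRegression` uses default regularization, and `max_size` is kept small because the number of models grows quickly (6 of 50 proteins already gives 15,890,700 combinations). The Clopper–Pearson interval is computed from beta-distribution quantiles. Function names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import beta
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split


def sensitivity_at_specificity(y_true, scores, min_spec=0.90):
    """Best sensitivity among ROC operating points with specificity >= min_spec."""
    fpr, tpr, _ = roc_curve(y_true, scores, drop_intermediate=False)
    return float(tpr[(1.0 - fpr) >= min_spec].max())


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) binomial confidence interval for k successes in n trials."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return float(lower), float(upper)


def search_signatures(X, y, names, max_size=3, seed=0):
    """LASSO pre-selection, then exhaustive search; returns {size: (sensitivity, names)}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Stage 1: LASSO with 20 stratified folds; keep proteins with non-zero coefficients.
    folds = list(StratifiedKFold(n_splits=20, shuffle=True, random_state=seed).split(X_tr, y_tr))
    lasso = LassoCV(cv=folds, max_iter=10000, tol=1e-4).fit(X_tr, y_tr)
    selected = np.flatnonzero(lasso.coef_)

    # Stage 2: one logistic regression per combination of 1..max_size selected proteins,
    # each scored on the held-out 25% split.
    best = {}
    for size in range(1, max_size + 1):
        for combo in combinations(selected, size):
            cols = list(combo)
            model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
            sens = sensitivity_at_specificity(y_te, model.decision_function(X_te[:, cols]))
            if size not in best or sens > best[size][0]:
                best[size] = (sens, [names[i] for i in cols])
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 100)
    X = rng.normal(size=(200, 8))
    X[:, 0] += 1.5 * y  # two synthetic "informative" proteins
    X[:, 1] -= 1.0 * y
    results = search_signatures(X, y, [f"P{i}" for i in range(8)])
    for size, (sens, prots) in sorted(results.items()):
        # 25 positives end up in the stratified 25% test split of this toy dataset.
        print(size, prots, round(sens, 2), clopper_pearson(round(sens * 25), 25))
```

Because every candidate model is scored on the same held-out split, the exact Clopper–Pearson interval on the reported sensitivity is a natural companion to the point estimate, especially at the small test-set sizes involved here.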

Computational packages utilized

Raw proteomics data were analyzed with either MSFragger⁴⁸ (DDA data) or DIA-NN⁴⁹ (DIA data), and the generated DDA library and our previously reported plasma library⁴⁷ were merged using easypqp (https://github.com/grosenberger/easypqp). For data processing, model training, and figure generation, we used the following Python (v3.8.2) packages: scikit-learn (v1.5.1), pandas (v2.2.2), numpy (v1.26.4), pyComBat (v, https://github.com/epigenelabs/pyComBat), joblib (v1.4.2), seaborn (v0.13.2), matplotlib (v3.9.2), matplotlib-base (v3.9.2), scipy (v1.13.1), and statsmodels (v0.14.2). The following R (v4.3.1, release 'Beagle Scouts') packages were used for figure generation: ggplot2 (v3.5.1), RColorBrewer (v1.1.3), viridis (v0.6.5), ggpubr (v0.6.0), and ggsci (v3.2.0). Additionally, the GSAR R package (v1.40.0) was used to analyze the log2 fold changes (log2FC) between Confirmed and Unlikely TB. All code for data analysis, imputation, and figure plots is available at https://github.com/anfoss/COMBO_code.git.

Statistics and reproducibility

We randomly selected plasma samples in a 1:1:2 ratio of Confirmed:Unconfirmed:Unlikely TB; sample size was determined by specimen availability and by the need for adequate precision in the test set. With a sample size of 500 and 25% held out for the test set, we expected to estimate a sensitivity of 90% ± 12% and a specificity of 70% ± 10% when comparing Confirmed to Unlikely TB. Samples were batched by country and randomized within each sample preparation plate and data acquisition batch for each country, and staff were blinded to TB status during data acquisition. All samples were analyzed once, except for selected samples with evidence of instrument performance deviation, such as significant mass shifts or reduced signal in consecutive injections. Data for these samples were re-collected, and the re-collected data are presented in this study. Samples that did not pass the QC criteria defined in the section "DIA-PASEF enabled high-throughput plasma proteomics" were removed (n = 7). In the machine learning analysis, proteins with more than 50% missing values were excluded.
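The precision statement above rests on a short calculation: the expected number of Confirmed and Unlikely TB cases in the 25% test split, and the half-width of a 95% confidence interval for the target sensitivity and specificity. The sketch below assumes a normal-approximation (Wald) interval and exact 1:1:2 allocation of 500 samples. The authors do not state their interval method or the exact test-set composition, so this rough version need not reproduce the reported ±12% and ±10%. Function names are hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist


def wald_half_width(p: float, n: int, level: float = 0.95) -> float:
    """Half-width of a normal-approximation (Wald) CI for a proportion p estimated from n cases."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sqrt(p * (1 - p) / n)


def expected_test_counts(total=500, ratio=(1, 1, 2), test_frac=0.25):
    """Expected held-out Confirmed and Unlikely TB counts for a Confirmed:Unconfirmed:Unlikely design."""
    shares = [r / sum(ratio) for r in ratio]
    confirmed, _unconfirmed, unlikely = (floor(total * s * test_frac) for s in shares)
    return confirmed, unlikely


n_pos, n_neg = expected_test_counts()
print(f"test set: {n_pos} Confirmed, {n_neg} Unlikely")
print(f"sensitivity 90% +/- {wald_half_width(0.90, n_pos):.1%}")
print(f"specificity 70% +/- {wald_half_width(0.70, n_neg):.1%}")
```

Under these assumptions the test split holds about 31 Confirmed and 62 Unlikely TB cases, which is why the sensitivity interval is wide despite the higher point estimate.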

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw and processed MS data generated in this study have been deposited in the MassIVE repository under dataset identifier MSV000096394 and in ProteomeXchange under dataset identifier PXD057814 (https://doi.org/10.25345/C5F18SS6N). Source data are provided with this paper.

Code availability

All code for data analysis, imputation, and figure plots is available here: https://github.com/anfoss/COMBO_code.git and at https://doi.org/10.5281/zenodo.15591003.

References

World Health Organization. Global Tuberculosis Report 2024 (World Health Organization, Genève, Switzerland, 2024).
World Health Organization & Viney, K. Roadmap Towards Ending TB in Children and Adolescents (World Health Organization, Genève, Switzerland, 2023).
Jaganath, D., Beaudry, J. & Salazar-Austin, N. Tuberculosis in children. Infect. Dis. Clin. North Am. 36, 49–71 (2022).
MacLean, E. et al. A systematic review of biomarkers to detect active tuberculosis. Nat. Microbiol. 4, 748–758 (2019).
Yao, F. et al. Plasma immune profiling combined with machine learning contributes to diagnosis and prognosis of active pulmonary tuberculosis. Emerg. Microbes Infect. 13, 2370399 (2024).
Koeppel, L. et al. Diagnostic performance of host protein signatures as a triage test for active pulmonary TB. J. Clin. Microbiol. 61, e0026423 (2023).
Garay-Baquero, D. J. et al. Comprehensive plasma proteomic profiling reveals biomarkers for active tuberculosis. JCI Insight 5, e137427 (2020).
Mousavian, Z., Källenius, G. & Sundling, C. From simple to complex: Protein-based biomarker discovery in tuberculosis. Eur. J. Immunol. 53, e2350485 (2023).
Gaeddert, M. et al. Host blood protein biomarkers to screen for tuberculosis disease: a systematic review and meta-analysis. J. Clin. Microbiol. 62, e0078624 (2024).
High priority target product profiles for new tuberculosis diagnostics: report of a consensus meeting. https://www.who.int/publications/i/item/WHO-HTM-TB-2014.18 (2017).
Togun, T. O., MacLean, E., Kampmann, B. & Pai, M. Biomarkers for diagnosis of childhood tuberculosis: a systematic review. PLoS ONE 13, e0204029 (2018).
Suliman, S., Jaganath, D. & DiNardo, A. Predicting pediatric tuberculosis: The need for age-specific host biosignatures. Clin. Infect. Dis. 77, 450–452 (2023).
Schiff, H. F. et al. Integrated plasma proteomics identifies tuberculosis-specific diagnostic biomarkers. JCI Insight 9, e173273 (2024).
Fossati, A. et al. System-wide profiling of protein complexes via size exclusion chromatography-mass spectrometry (SEC-MS). Methods Mol. Biol. 2259, 269–294 (2021).
Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteom. 11, O111.016717 (2012).
Ignjatovic, V. et al. Mass spectrometry-based plasma proteomics: considerations from sample collection to achieving translational data. J. Proteome Res. 18, 4085–4097 (2019).
Thul, P. J. & Lindskog, C. The human protein atlas: a spatial map of the human proteome. Protein Sci. 27, 233–244 (2018).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Čuklina, J. et al. Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Mol. Syst. Biol. 17, e10240 (2021).
Scriba, T. J. et al. Sequential inflammatory processes define human progression from M. tuberculosis infection to tuberculosis disease. PLoS Pathog. 13, e1006687 (2017).
Shi, W. et al. Plasma indoleamine 2,3-dioxygenase activity as a potential biomarker for early diagnosis of multidrug-resistant tuberculosis in tuberculosis patients. Infect. Drug Resist. 12, 1265–1276 (2019).
Olsson, O. et al. Kynurenine/tryptophan ratio for detection of active tuberculosis in adults with HIV prior to antiretroviral therapy. AIDS 36, 1245–1253 (2022).
Ciccosanti, F. et al. Proteomic analysis identifies a signature of disease severity in the plasma of COVID-19 pneumonia patients associated to neutrophil, platelet and complement activation. Clin. Proteom. 19, 38 (2022).
Suliman, S. et al. Four-gene pan-African blood signature predicts progression to tuberculosis. Am. J. Respir. Crit. Care Med. 197, 1198–1208 (2018).
Esmail, H. et al. Complement pathway gene activation and rising circulating immune complexes characterize early disease in HIV-associated tuberculosis. Proc. Natl. Acad. Sci. USA 115, E964–E973 (2018).
Chen, C., Yan, T., Liu, L., Wang, J. & Jin, Q. Identification of a novel serum biomarker for tuberculosis infection in Chinese HIV patients by iTRAQ-based quantitative proteomics. Front. Microbiol. 9, 330 (2018).
Xu, D. et al. Serum protein S100A9, SOD3, and MMP9 as new diagnostic biomarkers for pulmonary tuberculosis by iTRAQ-coupled two-dimensional LC-MS/MS. Proteomics 15, 58–67 (2015).
De Groote, M. A. et al. Discovery and validation of a six-marker serum protein signature for the diagnosis of active pulmonary tuberculosis. J. Clin. Microbiol. 55, 3057–3071 (2017).
Yeung, M. L. et al. Human tryptophanyl-tRNA synthetase is an IFN-γ-inducible entry factor for Enterovirus. J. Clin. Invest. 128, 5163–5177 (2018).
Nguyen, T. T. T. et al. Tryptophan-dependent and -independent secretions of tryptophanyl-tRNA synthetase mediate innate inflammatory responses. Cell Rep. 42, 111905 (2023).
Adu-Gyamfi, C. G. et al. Diagnostic accuracy of plasma kynurenine/tryptophan ratio, measured by enzyme-linked immunosorbent assay, for pulmonary tuberculosis. Int. J. Infect. Dis. 99, 441–448 (2020).
Ardiansyah, E. et al. Tryptophan metabolism determines outcome in tuberculous meningitis: a targeted metabolomic analysis. Elife 12, e85307 (2023).
Collins, J. M. et al. Tryptophan catabolism reflects disease activity in human tuberculosis. JCI Insight 5, e137131 (2020).
Donovan, C. et al. Tenascin C in lung diseases. Biol. (Basel) 12, 199 (2023).
Leemans, J. C. et al. CD44 is a macrophage binding site for Mycobacterium tuberculosis that mediates macrophage recruitment and protective immunity against tuberculosis. J. Clin. Invest. 111, 681–689 (2003).
Wang, C. et al. A group of novel serum diagnostic biomarkers for multidrug-resistant tuberculosis by iTRAQ-2D LC-MS/MS and Solexa sequencing. Int. J. Biol. Sci. 12, 246–256 (2016).
Walker, N. F. et al. Doxycycline and HIV infection suppress tuberculosis-induced matrix metalloproteinases. Am. J. Respir. Crit. Care Med. 185, 989–997 (2012).
Rohlwink, U. K. et al. Matrix metalloproteinases in pulmonary and central nervous system tuberculosis-A review. Int. J. Mol. Sci. 20, 1350 (2019).
van der Klugt, T., van den Biggelaar, R. H. G. A. & Saris, A. Host and bacterial lipid metabolism during tuberculosis infections: possibilities to synergise host- and bacteria-directed therapies. Crit. Rev. Microbiol. 1, 21 (2024).
Achkar, J. M. et al. Host protein biomarkers identify active tuberculosis in HIV uninfected and co-infected individuals. EBioMedicine 2, 1160–1168 (2015).
Deniz, O. et al. Serum total cholesterol, HDL-C and LDL-C concentrations significantly correlate with the radiological extent of disease and the degree of smear positivity in patients with pulmonary tuberculosis. Clin. Biochem. 40, 162–166 (2007).
Feingold, K. R. et al. Infection and inflammation decrease apolipoprotein M expression. Atherosclerosis 199, 19–26 (2008).
Kumar, N. P. et al. Discovery and validation of a three-cytokine plasma signature as a biomarker for diagnosis of pediatric tuberculosis. Front. Immunol. 12, 653898 (2021).
Togun, T. et al. A three-marker protein biosignature distinguishes tuberculosis from other respiratory diseases in Gambian children. EBioMedicine 58, 102909 (2020).
Geyer, P. E., Holdt, L. M., Teupser, D. & Mann, M. Revisiting biomarker discovery by plasma proteomics. Mol. Syst. Biol. 13, 942 (2017).
Dincer, C., Bruch, R., Kling, A., Dittrich, P. S. & Urban, G. A. Multiplexed point-of-care testing - xPOCT. Trends Biotechnol. 35, 728–742 (2017).
Fossati, A. et al. Toward comprehensive plasma proteomics by orthogonal protease digestion. J. Proteome Res. 20, 4031–4040 (2021).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020).
Rahmatallah, Y., Zybailov, B., Emmert-Streib, F. & Glazko, G. GSAR: bioconductor package for gene set analysis in R. BMC Bioinforma. 18, 61 (2017).

Acknowledgements

The authors are grateful to the children and parents for their participation, and to the clinicians and staff at our multiple clinical sites who provided care for the children in this study. NIH funding R01AI152161 and R01AI175312 to A.C., J.D.E., and D.L.S., NIH K23HL153581 to D.J., NIH K23AI144040 to J.M.C., and NIH U19AI109755 to M.F.F. and R.Calderon. UKRI MR/P024270/1 and MR/K011944/1 funding to B.K. South African Medical Research Council (SA-MRC) funding to H.Z. We also extend our gratitude to the Paul Farmer African Initiative for Research (PFAIR) for enabling a collaborative partnership between researchers from institutions in Africa and North America to exchange diverse perspectives that enrich research and drive discovery. Figure 1A was created in BioRender by A.F. (2025) https://BioRender.com/o99n413.

Author information

Zaynab Mousavian
Present address: Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA
These authors contributed equally: Andrea Fossati, Peter Wambi, Devan Jaganath.
A full list of members and their affiliations appears in the Supplementary Information.

Authors and Affiliations

J. David Gladstone Institutes, San Francisco, CA, USA
Andrea Fossati, Justin McKetney & Danielle L. Swaney
Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, CA, USA
Andrea Fossati, Justin McKetney & Danielle L. Swaney
Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
Andrea Fossati, Justin McKetney & Danielle L. Swaney
Uganda Tuberculosis Implementation Research Consortium, Walimu, Kololo, Kampala, Uganda
Peter Wambi & Eric Wobudeya
Institute for Global Health Sciences, Center for Tuberculosis, University of California San Francisco, San Francisco, CA, USA
Devan Jaganath, Robert Castro, Alexander Mohapatra, Rutuja Nerurkar, Adithya Cattamanchi & Joel D. Ernst
Department of Pediatrics, Division of Pediatric Infectious Diseases, University of California San Francisco, San Francisco, CA, USA
Devan Jaganath & Rutuja Nerurkar
Advanced Research and Health, Lima, Peru
Roger Calderon
Department of Medicine, Division of Pulmonary and Critical Care Medicine, University of California San Francisco, San Francisco, CA, USA
Robert Castro
Department of Medicine, Division of Experimental Medicine, University of California San Francisco, San Francisco, CA, USA
Alexander Mohapatra & Joel D. Ernst
Department of Pediatrics and Child Health, South African Medical Research Council Unit on Child and Adolescent Health, University of Cape Town, Cape Town, South Africa
Juaneta Luiz & Heather J. Zar
Department of Pediatrics, Dora Nginza Hospital, Gqeberha, South Africa
Juaneta Luiz
Vaccines and Immunity Theme, MRC Unit The Gambia at the London School of Hygiene and Tropical Medicine, Fajara, The Gambia
Esin Nkereuwem & Beate Kampman
Department of Global Health and Social Medicine, Harvard Medical School, Boston, MA, USA
Molly F. Franke
Division of Infectious Diseases, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
Zaynab Mousavian
Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA
Jeffrey M. Collins
Meso Scale Diagnostics, LLC., Rockville, MD, USA
George B. Sigal
Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
Mark R. Segal
Charité Center for Global Health, Charité Universitätsmedizin Berlin, Berlin, Germany
Beate Kampman
Division of Pulmonary Diseases and Critical Care Medicine, Department of Medicine, University of California Irvine, Irvine, CA, USA
Adithya Cattamanchi


Consortia

On behalf of the COMBO Study Consortium

Andrea Fossati, Peter Wambi, Devan Jaganath, Roger Calderon, Robert Castro, Alexander Mohapatra, Justin McKetney, Juaneta Luiz, Rutuja Nerurkar, Esin Nkereuwem, Molly F. Franke, Zaynab Mousavian, Jeffrey M. Collins, George B. Sigal, Mark R. Segal, Beate Kampman, Eric Wobudeya, Adithya Cattamanchi, Joel D. Ernst, Heather J. Zar & Danielle L. Swaney

Contributions

Conceptualization: D.L.S., D.J., A.C., J.D.E. Data curation: A.F., J.M., P.W., R.Castro, J.L., E.N., M.F.F., B.K., E.W., A.C., D.J., H.Z. Formal analysis: A.F., J.M., R.Castro, R.N., Z.M. Funding acquisition: D.L.S., J.D.E., A.C., D.J., J.M.C. Investigation: A.F., P.W., R.Calderon, J.L., E.N., M.F.F., B.K., E.W., A.C., J.D.E., D.J., H.Z., A.M. Methodology: A.F., D.J., D.L.S., J.D.E., A.C., J.M.C., G.B.S., M.R.S. Project administration: D.L.S., A.C., J.D.E., B.K., E.W., H.Z. Software: A.F. Resources: J.D.E., A.C., D.J., H.Z., M.F.F., R.Calderon, B.K., E.W. Supervision: D.L.S., J.D.E., D.J., A.C., M.R.S., B.K., E.W., H.Z. Validation: A.F., J.M. Visualization: A.F., J.M., D.L.S., R.N., Z.M. Writing–original draft: A.F., D.L.S., J.D.E., A.C., D.J., R.N., Z.M. Writing–review & editing: All authors.

Corresponding authors

Correspondence to Heather J. Zar or Danielle L. Swaney.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Paul Elkington, Deborah A. Lewinsohn, who co-reviewed with Dylan Kain, and the other, anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Fossati, A., Wambi, P., Jaganath, D. et al. Plasma proteomics for biomarker discovery in childhood tuberculosis. Nat Commun 16, 6657 (2025). https://doi.org/10.1038/s41467-025-61515-5


Received: 20 November 2024
Accepted: 24 June 2025
Published: 19 July 2025
Version of record: 19 July 2025
DOI: https://doi.org/10.1038/s41467-025-61515-5
