Abstract
Confirmatory diagnosis of childhood tuberculosis (TB) remains a challenge, mainly due to its dependence on sputum samples and the paucibacillary nature of the disease. Thus, only ~30% of suspected cases in children are diagnosed, and the need for minimally invasive, non-sputum-based biomarkers remains unmet. Understanding host molecular changes by measuring blood-based transcriptomic markers has shown promise as a diagnostic tool for TB. However, how sex contributes to disease heterogeneity, and therefore to diagnosis, remains to be understood. Using publicly available gene expression data (GSE39939, GSE39940; n = 370), we report sex-specific RNA biomarker signatures that could improve the diagnosis of TB disease in children. We found four-gene biomarker signatures for male (SLAMF8, GBP2, WARS, and FCGR1C) and female (GBP6, CELSR3, ALDH1A1, and GBP4) pediatric patients from Kenya, South Africa, and Malawi. Both signatures achieved a sensitivity of 85% and a specificity of 70%, which approaches the WHO-recommended target product profile for a triage test. Our gene signatures outperform most other gene signatures reported previously for childhood TB diagnosis.
Introduction
In 2022, an estimated 10.6 million people were diagnosed with tuberculosis (TB) worldwide, and 1.6 million died of TB, making it the second leading cause of death by a single infectious agent (Mycobacterium tuberculosis) after the recent COVID-19 (ref. 1). The incidence of TB in children (age: 0–14 years) is estimated to be roughly 12% (1.3 million in 2022) of the total TB burden. Among adults, TB was diagnosed more frequently in males (55%, 5.8 million in 2022) than in females (33%, 3.5 million in 2022)1. Further, sex-specific incidence rates in children vary according to age. For female children, the incidence rate is higher between 10 and 14 years, whereas it is higher at a considerably younger age (less than 12 months) for male children. Sexual dimorphism in TB has also been reported to vary across different countries2, but the reasons for this dimorphic pattern remain unknown.
According to the World Health Organization's estimate, only 30% of childhood TB cases are diagnosed3. The high burden and poor outcomes of childhood TB are partially attributed to the challenges involved in obtaining a confirmatory diagnosis. Sputum smear microscopy or sputum culture is the gold standard for the diagnosis of pulmonary TB. However, diagnoses made through sputum smear microscopy are frequently negative due to the paucibacillary nature of childhood TB4. Moreover, clinical overlap of childhood TB with other common childhood diseases, such as pneumonia and other lower respiratory infections, may result in false negative diagnoses of TB. These diagnostic challenges are amplified when there is an active synergistic interaction of the TB bacterium (Mycobacterium tuberculosis) with other disease-causing infectious agents, such as in the case of co-infection with the human immunodeficiency virus (HIV). Overlapping clinical manifestations, such as cough, fever, weight loss, and lymphadenopathy, often lead to missed or late diagnosis or even misdiagnosis of either TB or HIV5. In this regard, the Xpert MTB/RIF assay has been endorsed as the initial diagnostic test for children suspected of having drug-resistant TB. However, it still depends on sputum samples. These challenges underscore the urgent need for developing rapid, accurate, and non-sputum-based triage and confirmatory tests for TB that are minimally invasive. In recognition of this, the World Health Organization (WHO) created guidelines for developing non-sputum-based TB tests for children, according to which a non-sputum-based triage test should have at least 90% sensitivity and 70% specificity, while a diagnostic test should have at least 66% sensitivity and 98% specificity in children with culture-positive TB6.
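The target product profile thresholds quoted above can be expressed as a simple check. The following is a minimal sketch in Python, not part of the original analysis; the class and function names are hypothetical, and the sensitivity and specificity values passed in the usage lines are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetProfile:
    """Minimum performance required by a target product profile (hypothetical helper)."""
    name: str
    min_sensitivity: float
    min_specificity: float


# Thresholds as quoted in the text for non-sputum-based tests in children.
TRIAGE = TargetProfile("triage", min_sensitivity=0.90, min_specificity=0.70)
DIAGNOSTIC = TargetProfile("diagnostic", min_sensitivity=0.66, min_specificity=0.98)


def meets_profile(sensitivity: float, specificity: float, profile: TargetProfile) -> bool:
    """True only if both minimum thresholds are reached."""
    return (sensitivity >= profile.min_sensitivity
            and specificity >= profile.min_specificity)


def sensitivity_gap(sensitivity: float, profile: TargetProfile) -> float:
    """How far the sensitivity falls short of the profile (0.0 if it is met)."""
    return max(0.0, round(profile.min_sensitivity - sensitivity, 4))


# Illustrative values only: a test with 0.85 sensitivity and 0.73 specificity
# approaches, but does not meet, the triage profile.
print(meets_profile(0.85, 0.73, TRIAGE))
print(sensitivity_gap(0.85, TRIAGE))
```

Framing the profile as two independent minimums makes explicit that a signature can satisfy the specificity requirement while still falling short on sensitivity.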
One minimally invasive approach to diagnose childhood TB is to use blood-derived gene expression signatures as potential diagnostic markers for tuberculosis7,8,9,10,11,12,13,14,15,16,17,18,19,20. This approach is beneficial for diagnosing childhood TB, as it does not depend on sputum samples to detect Mycobacterium tuberculosis. Previously, Anderson et al. identified a 51-transcript signature that distinguished children with active, culture-confirmed TB from those with other similarly presenting diseases, with a sensitivity of 82.9% and specificity of 83.6%10. Tornheim et al. identified a 71-gene signature from an Indian population for childhood TB diagnosis. However, the 71 genes identified from the childhood population showed less than 50% overlap with genes identified from adult TB datasets, suggesting that a separate biomarker panel may be necessary for childhood TB diagnosis9. In addition, Gjoen et al. identified two sets of transcripts (one with seven transcripts and another with 10 transcripts) as potential diagnostic biomarkers for childhood TB13.
These study results are promising. However, implementing a diagnostic gene signature with several tens of genes is not practically feasible using current technologies. For example, the most widely used diagnostic platform for TB, the PCR-based Cepheid GeneXpert system, is limited in terms of the number of genes (~10 genes) that can be detected21. Therefore, an ideal gene signature should be composed of ≤ 10 genes. In this regard, the three-gene signature (GBP5, DUSP3, and KLF2) has shown promise as a non-sputum-based biomarker for both adults and children7,22,23. Using a specific cut-off for children, the three-gene signature showed a sensitivity of 82.1% and specificity of 76.4%, which approaches the WHO-recommended target product profile for a non-sputum-based triage test. However, we do not yet know if the same gene signature can be applied to both male and female children or if a sex-specific biomarker panel is more effective.
In adults, sex bias is observed in the incidence rate of TB24, bacterial load25, inflammatory response, mortality26, and response to treatment27. Studies have also highlighted a sex-specific difference in the gene expression pattern of infected individuals28,29. Although the exact reason for this dimorphism is unclear, Krug et al.29 demonstrated using mouse models that PARP1 contributes to sexually divergent TB immune responses and disease susceptibility partly due to the differences in the immune response. This study shows that basic molecular mechanisms could be different between sexes. Whether this sex-specific difference is also reflected in the diagnosis remains unanswered. A better understanding of the impact of sex-specific differences on disease diagnosis is important for establishing universal gene signatures for childhood TB diagnosis.
Using publicly available childhood TB datasets from Kenya, South Africa, and Malawi, we aimed to perform a sex-stratified analysis to identify transcripts that differentiate TB disease from other similarly presenting conditions and understand if sex-specific biomarkers are necessary for diagnosis.
Results
Overview of pediatric datasets selected for initial analysis
The overall study design is illustrated in Fig. 1. Based on our inclusion criteria (see Methods Section for details), we identified GSE39939 and GSE39940 (ref. 10) from the National Institutes of Health's (NIH) Gene Expression Omnibus (GEO). These datasets included samples from Kenya, South Africa, and Malawi. Table S1 summarizes the clinical data of the study subjects aged ≤ 15 years. Briefly, the total number of culture-positive samples was 146 and included 51 subjects with HIV coinfection. Further, 44 samples from Kenya that were culture-negative were also included in the study. Patients presenting with various other diseases (OD) but not TB were used as the control group. In the control group, 224 patients presented symptoms similar to those of TB, and this group included 64 samples that tested negative in the interferon-gamma release assay (IGRA), a blood test used for TB diagnosis, mainly for TB infection. Because our study objective was to identify markers that distinguish active TB from other diseases, we did not include samples that tested positive for IGRA. The control group (i.e., OD group) also included 92 subjects with HIV coinfection. The OD group was further subdivided into lower respiratory tract infection, malnutrition, malaria, pneumonia, and lymphadenitis.
Blood-derived transcriptome profiles from children with TB disease exhibit a sexually dimorphic pattern
The overall objective of the study was to catalog the gene expression differences between sexes and understand the impact of sexually dimorphic patterns on identifying diagnostic biomarkers. To test if there are differences in the transcriptome profiles in each country30, we first compared the TB group with the OD group and identified differentially expressed genes from each country. Next, for each country, we performed a sex-stratified analysis, where we split the samples based on sex and compared the TB group with the OD group to identify differentially expressed genes for each sex. For each sex, we compared the differentially expressed transcripts between the three countries. Finally, we identified the number of common and unique differentially expressed genes among male and female children across all three countries.
The transcriptomic analysis of the 34,694 unique transcripts identified 1312, 1263, and 226 differentially expressed protein-coding genes in samples collected from children in Kenya, South Africa, and Malawi, respectively (Fig. 2A, Supplementary Tables S2a–S2c). Fewer than 10% (89 genes) of the differentially expressed genes were common to all countries (Fig. 2B). Among these, 79 upregulated and 6 downregulated genes were expressed in the same direction in pediatric patients from all three countries.
Evidence of sexual dimorphism in blood-derived transcriptome profiles of children diagnosed with TB. (A) Number of differentially expressed genes identified in pediatric patients from Kenya, South Africa, and Malawi in response to tuberculosis vs. other diseases. The number of up- and down-regulated genes are indicated above each bar (cyan: upregulation; purple: downregulation) for each country. (B) Number of common and unique differentially expressed genes identified in subjects from each country. The number at the center of the Venn diagram (89) indicates the total number of differentially expressed genes that are common to all three countries. (C) Number of differentially expressed genes (TB vs. other diseases): genes that are common (yellow) and unique to male (purple) and female (grey) subjects in each country. The number of differentially expressed genes for each group are specified within the graphs. (D) Number of common and unique differentially expressed genes identified in male blood samples from each country. The number at the center of the Venn diagram (84) indicates the common differentially expressed genes in male samples across the three countries. (E) Number of common and unique differentially expressed genes identified from female samples in each country. The number at the center of the Venn diagram (24) indicates the common differentially expressed genes in females across the three countries. (F) Comparison of differentially expressed genes that are common to male and female samples. The number at the center of the Venn diagram indicates the number of differentially expressed genes common to male and female samples across all three countries. (G) Top five pathways identified from the ten common genes between male and female samples. (H) Top 20 pathways identified from the 74 common male genes. (I) Top five pathways identified from the 14 common female genes.
We then stratified our analyses based on the sex of the child, obtaining 58 (19 TB; 39 OD), 92 (46 TB; 46 OD), and 70 (20 TB; 50 OD) male samples from Kenya, South Africa, and Malawi, respectively. For females, we had 32 (16 TB; 16 OD), 61 (27 TB; 34 OD), and 57 (18 TB; 39 OD) samples from Kenya, South Africa, and Malawi, respectively. Differential expression analysis with respect to TB and other similarly presenting diseases revealed clear differences in the gene expression profiles of male and female pediatric patients from all three countries (Fig. 2C). Notably, 84 transcripts were common among the male pediatric patients from all three countries (Fig. 2D, Supplementary Tables S3a–S3c). On the other hand, 24 transcripts were common to the female patients in all three countries (Fig. 2E, Supplementary Tables S3d–S3f). Among these transcripts, ten were common to both sexes in all three countries, whereas the remaining transcripts in each sex group (74 transcripts for males and 14 for females) were unique to male and female patients, respectively (Fig. 2F). To understand the biological relevance of these unique (sex-specific) and common transcripts, we performed a functional enrichment analysis and identified the significant pathways associated with transcripts in common, as well as male- and female-specific transcripts. Inflammatory pathways were enriched for the 10 common transcripts (Fig. 2G). Immune-related pathways as well as stress pathways, including interferon signaling, were also among the most enriched pathways in male patients (Fig. 2H). The retinoic acid biosynthesis pathway was the most enriched in female patients (Fig. 2I).
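The overlap logic described above (genes common to all three countries within each sex, then split into shared and sex-specific sets, so that 84 − 10 = 74 and 24 − 10 = 14) can be sketched with plain set operations. This is an illustrative Python sketch, not the original analysis code; the function name is hypothetical and the gene labels G1 to G9 are placeholders rather than genes from the study.

```python
def common_and_unique(male_by_country, female_by_country):
    """Split per-country gene sets into shared, male-only and female-only sets.

    Each argument maps a country name to the set of differentially
    expressed genes found for that sex in that country.
    """
    # Genes differentially expressed in every country, per sex.
    male_common = set.intersection(*male_by_country.values())
    female_common = set.intersection(*female_by_country.values())
    # Genes common to both sexes across all countries.
    shared = male_common & female_common
    return {
        "shared": shared,
        "male_only": male_common - shared,
        "female_only": female_common - shared,
    }


# Placeholder gene labels, for illustration only.
male = {
    "Kenya": {"G1", "G2", "G3", "G4"},
    "South Africa": {"G1", "G2", "G3", "G5"},
    "Malawi": {"G1", "G2", "G3", "G6"},
}
female = {
    "Kenya": {"G1", "G7", "G8"},
    "South Africa": {"G1", "G7", "G9"},
    "Malawi": {"G1", "G7", "G2"},
}

result = common_and_unique(male, female)
print({name: sorted(genes) for name, genes in result.items()})
```

Note that a gene seen in females from only one country (G2 in this toy input) still counts as male-specific, because the comparison is made between the per-sex sets that are common to all three countries.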
Sex-specific gene signature adequately satisfies the WHO target product profile
For all analyses in this section, we considered male and female pediatric samples separately. Differential expression analysis yielded both common genes and genes that were unique to males and females. Therefore, to identify gene signatures that can potentially serve as biomarkers, we tested common genes, sex-specific genes, and a combination of common and sex-specific genes (henceforth referred to as a combined gene list).
For samples derived from males, following feature selection using samples from all three countries (Tables S4a–S4c), ranking (Tables S5a–S5c), and evaluation of different combinations of genes (Figures S1a–S1c), we selected the gene set with the top four gene candidates (SLAMF8, GBP2, WARS, and FCGR1C) from the list of male-specific genes, which yielded an area under the receiver operating characteristic curve (AUROC) of 0.86. At an optimal cut-off, this gene signature yielded a sensitivity of 0.85 and specificity of 0.73 (Fig. 3A). When the specificity was fixed at the WHO-recommended value of 0.7 for a triage test, this gene signature yielded a sensitivity of 0.86, which is close to the WHO-recommended sensitivity of 0.9 for a triage test. Further, the gene signature clearly stratified the active TB group from the OD group (Fig. 3B). All four genes were upregulated in active TB samples (Fig. 3C). The optimal cut-off estimated for the male risk score separated the active TB cases from each of the different categories of similarly presenting diseases, such as pneumonia, malnutrition, lymphadenitis, and lower and upper respiratory tract infection, in each country (Figures S2a–S2c). The associated AUROC values ranged from 0.7 to 0.97 for all three countries (Figures S2d–S2f). To test if the gene signature can also be applied to patients with HIV, we split the male samples based on HIV status and evaluated the performance of the gene signature based on AUROC, sensitivity, and specificity. The sensitivity (0.82) and specificity (0.8) values in HIV-negative samples approached the threshold for a non-sputum-based triage test (Figure S3a), and we observed a significant separation between OD and active TB groups (Figure S3b). However, while HIV-positive male samples showed a specificity of 0.9, they had a sensitivity of ~0.7 and did not satisfy the sensitivity threshold for a triage test (Figure S3c), although we again observed a significant separation between OD and active TB groups (Figure S3d). Nevertheless, the AUROC was > 0.85 in both HIV-positive and HIV-negative samples.
ROC and stratification graphs for male-specific genes in childhood tuberculosis. (A) AUROC curve for the risk score using the four male-specific genes. (B) Box plot showing the disease risk score cut-off between active TB and OD groups. The dotted line indicates the risk score cut-off estimated using the Youden index. SN = sensitivity, SP = specificity, AT = active tuberculosis, OD = other disease. (C) Box plots of the four male-specific genes selected for risk score construction. Log normalized values obtained from microarray are represented as expression values for each gene. p-values are indicated above the graphs (Unpaired t-test).
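To make the AUROC and the Youden-index cut-off concrete, the following is a minimal Python sketch under stated assumptions: each sample already has a numeric risk score (how the study constructs that score is not shown in this section), higher scores indicate TB, and labels are 1 for active TB and 0 for OD. The function names and the toy scores are hypothetical and do not reproduce the study's data.

```python
def split_by_label(scores, labels):
    """Separate scores into TB (label 1) and OD (label 0) groups."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auroc(scores, labels):
    """AUROC as the probability that a random TB sample scores higher
    than a random OD sample (ties count as one half)."""
    pos, neg = split_by_label(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cut-off, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1; a sample is called TB
    when its score is >= the cut-off."""
    pos, neg = split_by_label(scores, labels)
    best = None
    for t in sorted(set(scores)):
        sn = sum(s >= t for s in pos) / len(pos)
        sp = sum(s < t for s in neg) / len(neg)
        if best is None or sn + sp - 1 > best[1] + best[2] - 1:
            best = (t, sn, sp)
    return best


# Toy risk scores, for illustration only.
scores = [0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 0.9, 1.1, 1.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

print(auroc(scores, labels))
print(youden_cutoff(scores, labels))
```

The Youden index weights sensitivity and specificity equally, which is why the resulting cut-off can differ from the one obtained when specificity is pinned to a required value.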
For the samples from females, following feature selection using samples from all three countries (Tables S4d–S4f), ranking (Tables S5d–S5f), and evaluation of different combinations of genes (Figures S1d–S1f), the gene set with the top-ranked gene (GBP6) from the common and combined gene list identified active TB cases with a sensitivity of 0.87 and specificity of 0.74. Likewise, the gene set with the top four ranked genes (GBP6, CELSR3, ALDH1A1, and GBP4) from the combined gene list yielded a sensitivity of 0.85 and specificity of 0.69 (Fig. 4A) at an optimal cut-off value, approaching the WHO-recommended target product profile for a non-sputum-based triage test. When the specificity was fixed at the WHO-recommended value of 0.7, the female gene signature presented a sensitivity of 0.84, which is close to the WHO-recommended value of 0.9. Further, the signature clearly stratified active TB cases from the OD group (Fig. 4B). For females, all genes except CELSR3 were upregulated in active TB samples (Fig. 4C). The optimal cut-off estimated for the female risk score separated active TB cases from each of the categories of similarly presenting diseases, such as pneumonia, malnutrition, and lower and upper respiratory tract infection, in each country (Figures S4a–S4c). The associated AUROC values ranged from 0.7 to 0.9 for all three countries (Figures S4d–S4f). As with the male samples, we also split the female samples based on HIV status. The sensitivity (0.89) and specificity (0.72) values in HIV-negative samples approached the threshold for a non-sputum-based triage test (Figure S5a), and we observed a significant separation between OD and active TB groups (Figure S5b). However, while HIV-positive female samples showed a specificity of 0.71, they had a sensitivity of ~0.7 and did not satisfy the sensitivity threshold for a triage test (Figure S5c), although we again observed a significant separation between OD and active TB groups (Figure S5d). Nevertheless, the AUROC was ≥ 0.8 in both HIV-positive and HIV-negative samples.
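Reporting sensitivity at a fixed specificity, as done above for the WHO-recommended value of 0.7, amounts to choosing the lowest cut-off whose specificity reaches the target. The sketch below is one possible Python formulation under the same assumptions as before (precomputed risk scores, higher score indicates TB, labels 1 for TB and 0 for OD); the function name and toy data are hypothetical.

```python
def sensitivity_at_specificity(scores, labels, min_specificity=0.7):
    """Return (cut-off, sensitivity, specificity) for the lowest cut-off
    whose specificity is at least min_specificity.

    Specificity never decreases and sensitivity never increases as the
    cut-off rises, so the first cut-off that qualifies also gives the
    highest achievable sensitivity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # An infinite cut-off (nothing called TB) guarantees a result.
    candidates = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        sp = sum(s < t for s in neg) / len(neg)
        if sp >= min_specificity:
            sn = sum(s >= t for s in pos) / len(pos)
            return t, sn, sp


# Toy risk scores, for illustration only.
scores = [0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 0.9, 1.1, 1.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

print(sensitivity_at_specificity(scores, labels, min_specificity=0.7))
```

With a small OD group the achievable specificities are coarse (here steps of 0.25), so the realized specificity can sit somewhat above the requested minimum.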
ROC and stratification graphs for combined (common + specific) genes in females. (A) AUROC curve for the risk score using the four combined female genes. (B) Box plot showing the disease risk score cut-off between active TB and OD groups. The dotted line indicates the risk score cut-off estimated using the Youden index. SN = sensitivity, SP = specificity, AT = active tuberculosis, OD = other disease. (C) Box plots of the four combined female genes selected for risk score construction. Log normalized values obtained from microarray are represented as expression values for each gene. p-values are indicated above the graphs (Unpaired t-test).
When we evaluated the performance of the female risk score cut-off in male samples, although the AUROC was 0.83 (Figure S6a), the sensitivity (0.8) did not meet the WHO-recommended threshold for a triage test. However, we observed a significant separation between the OD and active TB groups (Figure S6b). Similarly, we also tested the performance of male risk score cut-off in female samples. In this case, the AUROC was 0.77 (Figure S6c) and the sensitivity (0.75) failed to meet the WHO threshold for a triage test. Nevertheless, we observed a significant separation between the OD and active TB groups (Figure S6d).
We also noted that the male and female risk scores identified from culture-positive samples did not perform well in culture-negative samples in either the male (Figures S7a–S7b) or the female (Figures S7c–S7d) group.
Finally, we tested the identified signatures in adult populations using datasets deposited in GEO. Because few of the available datasets include information on sex, we could only test these genes in four datasets (GSE28623, GSE73408, GSE83456, GSE144127). From each of these datasets, we extracted the four genes for males and four genes for females, constructed risk scores, and calculated the AUROC, sensitivity, and specificity values. In males, all four datasets satisfied the specificity value (0.7) recommended by WHO for a non-sputum-based triage test. However, three of the four datasets did not reach the WHO-recommended sensitivity value of 0.9. In females, two datasets fulfilled the sensitivity value, and two datasets satisfied the specificity value recommended by WHO (Table S7).
Gene signature identified by sex-stratification analyses performs better than most of the previously reported transcriptomic biomarkers for childhood TB
We compared our four-gene male (sensitivity 0.85, specificity 0.73) and female (sensitivity 0.85, specificity 0.69) signatures with seven gene signatures previously reported for pediatric populations as well as one meta-analysis study, which included the relevant age groups (Table S6). Because the performance of the reported gene signatures was not evaluated in male and female samples separately in the previous studies, we took the gene signatures reported in each published study and evaluated their performance in male and female samples obtained from GSE39939 and GSE39940, the datasets used for our analysis. Each signature generated a specificity of ≥ 0.7 in male and female children, but, with one exception, none surpassed or approached the sensitivity threshold of 0.9 recommended by WHO for a non-sputum-based triage test. The sensitivity values ranged from 0.01 to 0.73 in males and 0.18 to 0.67 in females. The exception was the 49-transcript signature from Anderson et al.10, on whose datasets our analysis was based, which had a sensitivity of 0.89 in males. However, in females, its sensitivity was 0.77. In contrast, the findings from our study using the same datasets present a higher sensitivity in females based on as few as four genes.
Discussion
This study was conducted to understand the effect of sex-specific transcriptomic differences on childhood TB diagnosis across South Africa, Malawi and Kenya. Sex-based differences in the incidence rate of TB have been reported to be age-related. Whereas a higher incidence rate has been reported for boys less than 1 year and boys/men older than 15 years, the reported incidence rate is higher for females aged 10–14 years2. Whether these differences impact the molecular profiles of males and females, and in turn the diagnosis of TB disease, remains less understood. We observed that sex-specific gene expression differences might impact the classification power of transcriptomic markers. For example, when we compared the transcriptomic signatures from sex-specific, common, and combined gene sets, male-specific genes (SLAMF8, GBP2, WARS, and FCGR1C) yielded higher sensitivity (0.85) and specificity (0.73) values, closer to the target product profile of the triage test (0.9 sensitivity and 0.7 specificity) for childhood TB disease. Similarly, in female children, the genes from the combined gene set (GBP6, CELSR3, ALDH1A1, and GBP4) yielded the best sensitivity (0.85) and specificity (0.69) values, closer to the target product profile.
Sex-specific differences have been widely acknowledged to impact disease incidence rates, mortality, treatment responses, and overall disease manifestation in subjects with different acute and chronic conditions31,32,33,34,35,36,37,38, including TB, for which a higher incidence rate has been reported in boys/males aged less than 1 and more than 15 years2,24. The observed bias has been attributed to various factors, including behavioral and occupational causes and access to health care39. Experiments conducted on model organisms suggested that innate physiological differences may influence the susceptibility of an individual to TB39. Additionally, while sex hormones play a major role in imparting sex-specific molecular differences, it is also possible that inherent biological sex, including the differences in male and female chromosomes, influences gene expression, and these differences cannot be discounted. However, despite these physiological differences between males and females, no conclusive data indicate whether a separate gene signature is required for males and females for improved diagnosis. To address this, we first performed a sex-stratified analysis to identify differentially expressed genes between subjects afflicted with TB and other similarly presenting diseases. We observed that only approximately 30% of differentially expressed genes are common between males and females across the three countries included in this study. We further observed that only 10 genes were common to males and females across all three countries; 74 genes formed a unique gene set for males, while 14 genes formed a unique gene set for females across all three countries. Using these gene sets, we determined that the best sensitivity and specificity values can be obtained when a separate gene signature is used for diagnosing TB in male and female children.
Our next question was whether these male and female gene signatures can be used interchangeably between sexes or whether each performs better for the respective sex. For verification, we tested the male gene signature in the female population and vice versa. The overall performance of the risk scores was moderate in both sexes and did not approach the WHO-recommended target product profile, suggesting that the potential diagnostic signature might be sex-specific. Additionally, when we tested the identified signatures in adult datasets, the results were inconsistent in both males and females, suggesting that the signatures may be more effective for childhood TB diagnosis.
To date, a few transcriptomic signatures have been proposed for childhood TB9,10,13,40,41,42, including the three-gene signature developed by Sweeney et al.7. However, none of the earlier studies evaluated whether the same gene signature could act as an effective biomarker for both sexes. To verify this, we obtained gene signatures from each study and conducted separate analyses for males and females using the datasets included in our study. While all the gene signatures yielded a specificity of at least 0.7, the sensitivity was at most ~0.7 for all but one. The exception, the 49-gene signature proposed by Anderson et al., showed a sensitivity of 0.89 in males. However, its sensitivity decreased to 0.77 in females. In comparison, the separate male and female gene signatures proposed in this study generated a sensitivity of 0.85 in both sexes. These observations suggest that the same gene signature may not yield the best results for males and females, and a separate sex-specific diagnostic gene signature might improve diagnostic outcomes.
Of the four genes identified for males (SLAMF8, GBP2, WARS, and FCGR1C), three were previously reported as differentially expressed genes associated with TB disease in pediatric samples obtained from the Indian population9. On the other hand, all four genes identified for females (GBP6, CELSR3, ALDH1A1, and GBP4) were expressed in the same direction in the Indian population, and GBP6 was included as a part of their diagnostic gene set, which included 71 genes. Similarly, Anderson et al. reported ALDH1A1 and GBP6 as parts of the 51-transcript signature identified for distinguishing active TB from other similarly presenting diseases10. However, because a sex-stratified analysis was not conducted in these studies, we could not perform a direct sex-specific comparison and validate our findings in an external population. Nevertheless, we checked whether these genes have been reported as potential biomarkers in other published datasets (including adult populations). We found studies that have reported a few of the identified genes, but the samples were not stratified based on sex. For example, GBP2 was found to have significant potential in the treatment monitoring of TB43,44. It was also found to be a part of the biomarker panel proposed for triage and confirmatory tests for controls vs. active TB cases in the adult population45. FCGR1C was proposed as part of a three-signature panel with a sensitivity of 0.75 and specificity of 0.81 to distinguish active TB from the OD group8.
Among the genes in the signature proposed for females, GBP6 was included as a part of a biomarker panel that can stratify TB from other diseases in the adult population17,46. Similarly, GBP4 was included as a part of a biomarker panel for HIV+/TB47. It was also reported as part of the 16-gene panel that predicted TB progression in South African adolescents aged between 12 and 18 years16. Finally, ALDH1A1 was shown to distinguish TB from other diseases11. However, the biomarker potential of SLAMF8, WARS, and CELSR3 has not been reported for TB. Altogether, the biomarker potential of ~50% of our list of identified genes is supported by the above-mentioned studies. In addition, we are adding a new set of genes that require further validation in a sex-specific model.
We envision these sex-specific signatures following a diagnostic implementation pathway that mirrors that of the three-gene signature. The three-gene signature was first identified from a meta-analysis and was then validated by multiple studies. The validation was done using the PCR-based Cepheid GeneXpert system, which is a commonly used platform for infectious disease diagnosis, including tuberculosis. Since this system exists in many TB-affected areas, such as India and South Africa, we believe findings from this study can be translated to resource-challenged settings. However, further evaluation and validation of the sex-specific genes in clinical samples are warranted; the lack of such validation is a key limitation of this study. The current risk score cut-off applies only to microarray platforms. Therefore, a different cut-off score specific to the platform will be generated for validation. This is expected, given the exploratory nature of the current study, until we translate the findings to a commercial platform. Moreover, the male- and female-specific gene signatures identified in our study are applicable to three African countries, viz., Kenya, South Africa, and Malawi. It is possible, if not likely, that not all genes will be differentially expressed and show diagnostic promise in other populations. Therefore, subsequent validation studies will include samples collected from other geographical locations, in addition to those from the African continent. Similarly, we see HIV as a confounder that could impact the results of these tests; therefore, further validation will also include samples that are HIV positive.
In conclusion, we have demonstrated the importance and impact of including sex as a variable for identifying diagnostic biomarkers for childhood TB. Importantly, we identified a minimal set of genes (four genes for each sex) that approaches the target product profile recommended by WHO for a non-sputum-based triage test, and the sex-specific markers were not effective when interchanged between males and females. The inclusion of clinical factors or other existing molecular tests, such as the Xpert Ultra assay, along with these molecular biomarkers may further increase the sensitivity and specificity of these signatures. While blood-based assays are comparatively less invasive than induced sputum or lavage samples, moving towards finger-prick samples, which are even less invasive, will be more translational and beneficial to communities where childhood TB is prevalent.
Methods
Datasets for the study
We searched publicly available transcriptomic datasets that included blood samples from pediatric patients with culture-confirmed active TB and other similarly presenting diseases. Datasets were included if they comprised children aged ≤ 15 years (i) whose blood samples were collected using PAXgene RNA tubes and (ii) for whom treatment had not been initiated at the time of sample collection, and (iii) if they contained both HIV+ and HIV− samples. Based on these criteria, we identified GSE39939 and GSE39940 (Ref. 10) from the NIH Gene Expression Omnibus (GEO) as suitable datasets for this study. No patient-identifiable data were present in the datasets downloaded for this study.
Microarray data analysis and identification of differentially expressed genes
The overall workflow of the study is depicted in Fig. 1. To verify whether geographical differences also affect childhood TB diagnosis, GSE39939 and GSE39940 samples were first split by geographical location. The datasets, collected as part of a single study (Ref. 10), contained samples from Kenya, South Africa, and Malawi. For the analysis, the quantile-normalized file was downloaded as deposited, the values were scaled, and a log2 transformation was performed. Both GSE39939 and GSE39940 were generated using the Illumina HumanHT-12 V4.0 expression BeadChip; therefore, the datasets shared probe IDs. Probe IDs were annotated to gene symbols, and the median expression value was calculated for genes that had multiple probe IDs (Ref. 48). For each country, cases diagnosed with other (non-TB) diseases were assigned as the control/reference group, and their gene expression values were compared with those of the active TB group. All analyses were performed in R v4.2.1 using RStudio (2022.7.1.554). Differential expression analysis was performed using the limma package in R (Ref. 49). Transcripts that showed at least a 1.2-fold change (non-log space) with p < 0.05 were classified as differentially expressed genes. Non-coding genes (any transcript with a RefSeq accession prefix of XM or XR) were filtered out, and only differentially expressed protein-coding genes were retained for subsequent analysis.
Pathway analysis of the differentially expressed genes was performed using the Metascape tool (Ref. 50). The Reactome, KEGG, WikiPathways, PANTHER, and Hallmark databases were considered for pathway enrichment, with the following criteria: minimum enrichment of 1.3, p-value cut-off of 0.05, and minimum overlap of three genes in each cluster.
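The probe-collapsing and thresholding steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' R/limma code: the function names and the toy inputs are hypothetical, and the p-value is assumed to be supplied by a separate test (limma in the original study). The sketch mainly shows that a 1.2-fold change in non-log space corresponds to an absolute log2 fold change of at least log2(1.2), roughly 0.263.

```python
import math
from collections import defaultdict
from statistics import mean, median

# A 1.2-fold change in linear space, expressed on the log2 scale.
LOG2_FC_CUTOFF = math.log2(1.2)
P_CUTOFF = 0.05


def collapse_probes(probe_values, probe_to_gene):
    """Collapse probe-level log2 values to gene level.

    probe_values maps probe ID -> list of per-sample values.
    For genes with several probes, the per-sample median is taken.
    Probes without a gene annotation are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    return {
        gene: [median(sample) for sample in zip(*rows)]
        for gene, rows in grouped.items()
    }


def log2_fold_change(tb_values, control_values):
    """Difference of group means on the log2 scale (TB minus non-TB)."""
    return mean(tb_values) - mean(control_values)


def is_differentially_expressed(log2_fc, p_value):
    """Apply the two thresholds stated in the text.

    The p-value is an input here; this sketch does not reproduce
    limma's moderated statistics.
    """
    return abs(log2_fc) >= LOG2_FC_CUTOFF and p_value < P_CUTOFF
```

Keeping the fold-change threshold on the log2 scale avoids back-transforming the expression values before filtering.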
Identification of sex-specific transcriptomic signature score
The differentially expressed genes were classified into (i) a common gene signature: genes that were common between sexes in all three countries; (ii) sex-specific gene signatures: genes resulting from sex-stratified analyses (male/female-specific signature) across all countries; and (iii) combined gene signatures: the combination of the common and sex-specific gene signatures for male and female children (male/female signature), regardless of geography.
Feature selection was performed using the Boruta algorithm (Boruta package; Ref. 51): the algorithm was run 100 times, and features classified as "confirmed" in all 100 runs (Figure S8) were selected. The Boruta-selected features were then ranked by the Gini score obtained from the Random Forest algorithm (caret package; Ref. 52). This ranking split the genes into several gene sets, identified from "elbows" or "kinks" in the importance plot. For each gene set, the risk score was calculated using the normalized, log2-transformed values of the selected genes. The average expression of the downregulated genes was subtracted from the average expression of the upregulated genes, as reported in Ref. 7, and the resulting value was defined as the risk score. The diagnostic performance of each gene set was assessed based on the receiver operating characteristic (ROC) curve, sensitivity, and specificity calculated using easyROC, a web tool for ROC curve analysis (Ref. 53).
The Youden index (Ref. 54) was used to calculate the optimal cut-off for each risk score. The risk score was evaluated using the sensitivity and specificity values obtained at the cut-off that maximized the Youden index, and these values were compared with the WHO-recommended target product profile for a non-sputum-based triage test (90% sensitivity and 70% specificity). Risk scores with maximum sensitivity and specificity, and those achieving values closest to the target product profile thresholds for a triage test, were considered final. The optimal cut-offs for each gene set were calculated using samples from all three countries. As the reference group included several health conditions (Table S1), such as pneumonia, lower respiratory tract infection, malnutrition, and lymphadenitis, the Youden-index-estimated cut-offs were tested to see whether they separated TB from each of these conditions. A specific health condition was considered only if at least five samples were available. However, AUROC curves for TB vs. each non-TB condition were constructed, wherever possible, only for health conditions with at least 10 samples. Furthermore, the risk scores identified for male and female pediatric samples were tested in culture-negative samples.
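One way to read this three-way classification is as a series of set operations. The Python sketch below is only an interpretation, not the authors' procedure: it assumes that a gene counts for a sex only if it is differentially expressed in every country for that sex, and that the sex-specific set excludes the genes shared by both sexes. The function names, gene labels, and toy data are hypothetical.

```python
from functools import reduce


def shared_across(gene_sets):
    """Return the genes present in every set of the collection."""
    return reduce(set.intersection, (set(s) for s in gene_sets))


def classify_signatures(deg):
    """Classify genes into common, sex-specific, and combined sets.

    deg maps sex -> country -> set of differentially expressed genes.

    Assumed interpretation (not confirmed by the source):
      common       = genes shared by both sexes in all countries
      sex-specific = genes shared across countries within one sex,
                     excluding the common genes
      combined     = common plus sex-specific genes for that sex
    """
    per_sex = {
        sex: shared_across(by_country.values())
        for sex, by_country in deg.items()
    }
    common = shared_across(per_sex.values())
    result = {"common": common}
    for sex, genes in per_sex.items():
        specific = genes - common
        result[sex] = {"specific": specific, "combined": common | specific}
    return result
```

With toy input in which gene `A` is differentially expressed in all six sex-by-country groups, `B` only in males, and `E` only in females, the function returns `A` as common, `B` as male-specific, and `E` as female-specific.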
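The risk score described above reduces to a difference of two means. The following Python sketch is illustrative only (the authors worked in R). The function name, the placeholder gene labels `UP1`, `UP2`, and `DOWN1`, the toy values, and the choice to let an empty gene group contribute zero are assumptions, not details from the study.

```python
from statistics import mean


def risk_score(expression, up_genes, down_genes):
    """Compute the risk score for one sample.

    expression maps gene symbol -> normalized, log2-transformed value.
    The score is the mean expression of the upregulated genes minus the
    mean expression of the downregulated genes. An empty group is
    treated as contributing 0.0 (an assumption of this sketch).
    """
    up = mean(expression[g] for g in up_genes) if up_genes else 0.0
    down = mean(expression[g] for g in down_genes) if down_genes else 0.0
    return up - down


# Toy sample with placeholder gene labels and made-up values.
sample = {"UP1": 8.0, "UP2": 10.0, "DOWN1": 7.0}
score = risk_score(sample, ["UP1", "UP2"], ["DOWN1"])  # (8 + 10) / 2 - 7 = 2.0
```

Because the inputs are on the log2 scale, the score behaves like a log ratio of the two gene groups, so a higher score indicates relatively stronger expression of the upregulated genes.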
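The cut-off selection can be sketched in a few lines. This Python example is a hedged illustration rather than the easyROC implementation used in the study: it assumes that higher risk scores indicate TB, scans every observed score as a candidate cut-off, and requires at least one TB and one non-TB sample. All function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as TB; labels are True for TB, False otherwise."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    Ties are resolved in favor of the lowest cut-off.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


def meets_triage_tpp(sens, spec, min_sens=0.90, min_spec=0.70):
    """Check the triage-test targets quoted in the text (90% / 70%)."""
    return sens >= min_sens and spec >= min_spec
```

For example, `meets_triage_tpp(0.85, 0.70)` returns `False`, because 85% sensitivity falls short of the 90% target even though the specificity criterion is met.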
All figures were prepared using GraphPad Prism version 9 for Windows, GraphPad Software, San Diego, California, USA (www.graphpad.com).
Evaluation of previously published pediatric transcriptomic signatures
Eight gene signatures published in seven reports (Refs. 7, 9, 10, 13, 40, 41, 42) were compared with our gene signatures. For this, the list of genes reported as biomarkers in each study was downloaded and a risk score was generated using the formula provided in Ref. 7. The sensitivity and specificity of each gene list were then calculated using the male and female samples included in the pediatric datasets GSE39939 and GSE39940 (the datasets used in the current study). Risk scores for all published gene signatures were calculated separately for male and female pediatric samples, as explained above (subtracting the average expression values of downregulated genes from the average expression values of upregulated ones). To allow a like-for-like comparison, we divided the genes into upregulated and downregulated groups as done in the parent studies, irrespective of the direction in which they were expressed in the datasets used for our analysis. AUROC, sensitivity, and specificity values were calculated and compared with the target product profile and with the values obtained from our analysis of sex-specific gene signatures.
Data availability
The datasets analysed during the current study are available in the GEO repository under the accession numbers GSE39939 and GSE39940.
References
1. World Health Organization. Global Tuberculosis Report 2022 (World Health Organization, Geneva, 2022).
2. Peer, V., Schwartz, N. & Green, M. S. Gender differences in tuberculosis incidence rates: A pooled analysis of data from seven high-income countries by age group and time period. Front. Public Health 10, 997025 (2023).
3. Dodd, P. J., Gardiner, E., Coghlan, R. & Seddon, J. A. Burden of childhood tuberculosis in 22 high-burden countries: A mathematical modelling study. Lancet Glob. Health 2, e453–e459 (2014).
4. Zar, H. J. et al. Rapid molecular diagnosis of pulmonary tuberculosis in children using nasopharyngeal specimens. Clin. Infect. Dis. 55, 1088–1095 (2012).
5. Venturini, E. et al. Tuberculosis and HIV co-infection in children. BMC Infect. Dis. 14, S5 (2014).
6. World Health Organization. High priority target product profiles for new tuberculosis diagnostics: Report of a consensus meeting (2014).
7. Sweeney, T. E., Braviak, L., Tato, C. M. & Khatri, P. Genome-wide expression for diagnosis of pulmonary tuberculosis: A multicohort analysis. Lancet Respir. Med. 4, 213–224 (2016).
8. Hoang, L. T. et al. Transcriptomic signatures for diagnosing tuberculosis in clinical practice: A prospective, multicentre cohort study. Lancet Infect. Dis. 21, 366–375 (2021).
9. Tornheim, J. A. et al. Transcriptomic profiles of confirmed pediatric tuberculosis patients and household contacts identifies active tuberculosis, infection, and treatment response among Indian children. J. Infect. Dis. 221, 1647–1658 (2020).
10. Anderson, S. T. et al. Diagnosis of childhood tuberculosis and host RNA expression in Africa. N. Engl. J. Med. 370, 1712–1723 (2014).
11. Kaforou, M. et al. Detection of tuberculosis in HIV-infected and -uninfected African adults using whole blood RNA expression signatures: A case-control study. PLOS Med. 10, e1001538 (2013).
12. Sivakumaran, D. et al. Host blood RNA transcript and protein signatures for sputum-independent diagnostics of tuberculosis in adults. Front. Immunol. 11, 626049 (2020).
13. Gjøen, J. E. et al. Novel transcriptional signatures for sputum-independent diagnostics of tuberculosis in children. Sci. Rep. 7, 5839 (2017).
14. Sambarey, A. et al. Unbiased identification of blood-based biomarkers for pulmonary tuberculosis by modeling and mining molecular interaction networks. EBioMedicine 15, 112–126 (2017).
15. Satproedprai, N. et al. Diagnostic value of blood gene expression signatures in active tuberculosis in Thais: A pilot study. Genes Immun. 16, 253–260 (2015).
16. Zak, D. E. et al. A blood RNA signature for tuberculosis disease risk: A prospective cohort study. Lancet 387, 2312–2322 (2016).
17. Gliddon, H. D. et al. Identification of reduced host transcriptomic signatures for tuberculosis disease and digital PCR-based validation and quantification. Front. Immunol. 12, 637164 (2021).
18. Maertzdorf, J. et al. Concise gene signature for point-of-care classification of tuberculosis. EMBO Mol. Med. 8, 86–95 (2016).
19. da Costa, L. L. et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis 95, 421–425 (2015).
20. de Araujo, L. S. et al. Transcriptomic biomarkers for tuberculosis: Evaluation of DOCK9, EPHA4, and NPC2 mRNA expression in peripheral blood. Front. Microbiol. 7, 1586 (2016).
21. Warsinske, H., Vashisht, R. & Khatri, P. Host-response-based gene signatures for tuberculosis diagnosis: A systematic comparison of 16 signatures. PLOS Med. 16, e1002786 (2019).
22. Gupta-Wright, A. et al. Evaluation of the Xpert MTB host response assay for the triage of patients with presumed pulmonary tuberculosis: A multi-site prospective diagnostic accuracy study. https://doi.org/10.2139/ssrn.4512925 (2023).
23. Olbrich, L. et al. Diagnostic accuracy of a three-gene Mycobacterium tuberculosis host response cartridge using fingerstick blood for childhood tuberculosis: A multicentre prospective study in low-income and middle-income countries. Lancet Infect. Dis. 24, 140–149 (2023).
24. Global Tuberculosis Report 2021. https://www.who.int/publications-detail-redirect/9789240037021.
25. Bini, E. I. et al. The influence of sex steroid hormones in the immunopathology of experimental pulmonary tuberculosis. PLoS One 9, e93831 (2014).
26. Dibbern, J., Eggers, L. & Schneider, B. E. Sex differences in the C57BL/6 model of Mycobacterium tuberculosis infection. Sci. Rep. 7, 10957 (2017).
27. Dutta, N. K. & Schneider, B. E. Are there sex-specific differences in response to adjunctive host-directed therapies for tuberculosis? Front. Immunol. 11, 1465 (2020).
28. Neyrolles, O. & Quintana-Murci, L. Sexual inequality in tuberculosis. PLOS Med. 6, e1000199 (2009).
29. Krug, S. et al. Host regulator PARP1 contributes to sex differences and immune responses in a mouse model of tuberculosis. https://doi.org/10.1101/2021.04.21.440820 (2021).
30. Kulkarni, V. et al. A two-gene signature for tuberculosis diagnosis in persons with advanced HIV. Front. Immunol. 12, 631165 (2021).
31. Gay, L. et al. Sexual dimorphism and gender in infectious diseases. Front. Immunol. 12, 698121 (2021).
32. Fisher, D. W., Bennett, D. A. & Dong, H. Sexual dimorphism in predisposition to Alzheimer's disease. Neurobiol. Aging 70, 308–324 (2018).
33. Stival, A. et al. Sexual dimorphism in tuberculosis incidence: Children cases compared to adult cases in Tuscany from 1997 to 2011. PLOS One 9, e105277 (2014).
34. Clocchiatti, A., Cora, E., Zhang, Y. & Dotto, G. P. Sexual dimorphism in cancer. Nat. Rev. Cancer 16, 330–339 (2016).
35. Laskar, R. S. et al. Sexual dimorphism in cancer: Insights from transcriptional signatures in kidney tissue and renal cell carcinoma. Hum. Mol. Genet. 30, 343–355 (2021).
36. Rubin, J. B. et al. Sex differences in cancer mechanisms. Biol. Sex. Diff. 11, 17 (2020).
37. Stockstill, K. et al. Sexually dimorphic therapeutic response in bortezomib-induced neuropathic pain reveals altered pain physiology in female rodents. Pain 161, 177–184 (2020).
38. Capone, I., Marchetti, P., Ascierto, P. A., Malorni, W. & Gabriele, L. Sexual dimorphism of immune responses: A new perspective in cancer immunotherapy. Front. Immunol. 9, 552 (2018).
39. Nhamoyebonde, S. & Leslie, A. Biological differences between the sexes and susceptibility to tuberculosis. J. Infect. Dis. 209, S100–S106 (2014).
40. Verhagen, L. M. et al. A predictive signature gene set for discriminating active from latent tuberculosis in Warao Amerindian children. BMC Genom. 14, 74 (2013).
41. Dhanasekaran, S. et al. Identification of biomarkers for Mycobacterium tuberculosis infection and disease in BCG-vaccinated young children in Southern India. Genes Immun. 14, 356–364 (2013).
42. Li, Q. et al. Increased IL-9 mRNA expression as a biomarker to diagnose childhood tuberculosis in a high burden settings. J. Infect. 71, 273–276 (2015).
43. Long, N. P. et al. A 10-gene biosignature of tuberculosis treatment monitoring and treatment outcome prediction. Tuberculosis 131, 102138 (2021).
44. van Doorn, C. L. R. et al. Transcriptional profiles predict treatment outcome in patients with tuberculosis and diabetes at diagnosis and at two weeks after initiation of anti-tuberculosis treatment. EBioMedicine 82, 104173 (2022).
45. Perumal, P. et al. Validation of differentially expressed immune biomarkers in latent and active tuberculosis by real-time PCR. Front. Immunol. 11, 612564 (2021).
46. Bobak, C. A., Titus, A. J. & Hill, J. E. Investigating random forest classification on publicly available tuberculosis data to uncover robust transcriptional biomarkers. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies 695–701 (SCITEPRESS - Science and Technology Publications, 2018). https://doi.org/10.5220/0006752406950701.
47. Chen, Y. et al. Meta-analysis of peripheral blood transcriptome datasets reveals a biomarker panel for tuberculosis in patients infected with HIV. Front. Cell. Infect. Microbiol. 11, 585919 (2021).
48. Bobak, C. A., Titus, A. J. & Hill, J. E. Comparison of common machine learning models for classification of tuberculosis using transcriptional biomarkers from integrated datasets. Appl. Soft. Comput. 74, 264–273 (2019).
49. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
50. Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).
51. Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Soft. 36, 1–13 (2010).
52. Ho, T. K. Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) 278 (IEEE Computer Society, 1995).
53. Goksuluk, D., Korkmaz, S., Zararsiz, G. & Karaagaoglu, A. E. easyROC: An interactive web-tool for ROC curve analysis using R language environment. R J. 8, 21 (2016).
54. Ruopp, M. D., Perkins, N. J., Whitcomb, B. W. & Schisterman, E. F. Youden index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom. J. 50, 419–430 (2008).
Acknowledgements
The authors thank the Burroughs Wellcome Fund for the institutional program grant awarded to C.A.B. (Grant#1014106) and the University of British Columbia for the start-up funds provided to J.E.H., both of which supported this work. The authors thank Dr. Ahmad Mani-Varnosfaderani, Dr. Rebecca Davidson, and Dr. Catherine Stein for participating in discussions and providing critical input. The authors also thank Dr. Ahmad Mani-Varnosfaderani, Dr. Shekooh Behroozian, and Ms. Ning Sun for critical reading of the manuscript. We thank Dr. Sangeetha Neralagatta for editing the text of this manuscript.
Author information
Contributions
The study was conceptualized by P.K., C.A.B., and J.E.H. Formal analysis was conducted by P.K., and the data were interpreted by P.K., C.A.B., and J.E.H. The original manuscript was drafted by P.K. and J.E.H. All authors reviewed and edited the manuscript. The research fund was acquired by J.E.H., and the study was supervised by J.E.H.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Krishnan, P., Bobak, C.A. & Hill, J.E. Sex-specific blood-derived RNA biomarkers for childhood tuberculosis. Sci Rep 14, 16859 (2024). https://doi.org/10.1038/s41598-024-66946-6
DOI: https://doi.org/10.1038/s41598-024-66946-6






