Abstract
The study aims to accurately identify differentially expressed genes (DEGs) and biological pathways in mycobacterial infections through bioinformatics, to gain a deeper understanding of the disease. DEGs were explored by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and unique DEGs were subjected to least absolute shrinkage and selection operator (LASSO) regression analysis. A total of 1,057 DEGs closely connected with non-tuberculous mycobacteria (NTM) infection, latent TB infection (LTBI) and active TB disease (ATB) were identified from two GSE datasets. These DEGs were mainly associated with detoxification processes and with viral and bacterial infections. The METTL7B gene was the most informative marker for distinguishing LTBI from ATB, with an area under the curve (AUC) of 0.983 (95% CI: 0.964 to 1). The significantly upregulated HBA1/2 genes were the most informative markers for distinguishing interferon gamma release assay-negative healthy controls (IGRA-HC)/NTM from LTBI (P < 0.001), and the upregulated HBD gene also differed between IGRA-HC/NTM and ATB (P < 0.001). We identified gene signatures associated with Mycobacterium infection in whole blood, which could be significant for understanding the molecular mechanisms and improving the diagnosis of NTM, LTBI, or ATB.
Introduction
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is a major contributor to mortality worldwide1. Mycobacterium avium complex (MAC) is among the most frequently isolated non-tuberculous mycobacteria (NTM), and MAC pulmonary disease (MAC-PD) has a heterogeneous clinical course2. In high TB-burden countries, TB incidence and NTM disease are inversely related3. However, the clinical manifestations of NTM disease resemble those of TB and may pose a diagnostic challenge even to an experienced clinician4.
The tuberculin skin test (TST) and interferon gamma release assays (IGRAs) are the main diagnostic tools for latent TB infection (LTBI) or when active TB disease (ATB) is suspected; however, both have strengths and limitations5. A new NTM-IGRA could improve the clinical management of NTM-infected patients and help avoid misdiagnosis and unnecessary treatment6. However, the clinical presentation of patients varies with the timing of Mycobacterium infection and the degree of immunodeficiency7.
Microscopy using auramine-phenol staining should be carried out routinely, as it allows direct evaluation of Mycobacterium and provides a useful adjunct to culture for mycobacterial diseases8. However, isolation of NTM alone does not establish the diagnosis; the diagnosis of NTM pulmonary disease rests on microbiological, clinical, and radiological findings. Moreover, in one study, 51% of pulmonary TB cases were smear negative9.
Next-generation sequencing (NGS) has been used for the comprehensive detection of Mtb and NTM; however, NGS performed directly on diagnostic specimens is not yet standardized10. Whole-genome sequencing (WGS) may serve as a high-resolution assay for diagnosing different subtypes of Mycobacterium infection11. Nevertheless, sequencing requires not only expensive equipment and consumables but also professional sequencing and data analysis teams. The cost of detection is ultimately passed on to the patient, which makes this approach unfeasible in lower-level hospitals.
Therefore, there is an urgent need for a rapid and simple method that does not rely on immune tests, sputum specimens, or sequencing, particularly in lower-level health care settings, where the most readily available specimen is the patient's whole blood. Unfortunately, no specific biomarker currently distinguishes accurately between active tuberculosis, latent infection, and NTM infection in whole blood specimens. RNA sequencing (RNA-seq) has become a valuable tool for understanding biological processes in infectious diseases12, and it facilitates the detection of differentially expressed genes (DEGs), which are potential biomarker candidates13.
In this study, we obtained four RNA-seq datasets related to mycobacterial infection (GSE205161, GSE107991, GSE199403 and GSE152218) from the GEO database and classified the raw data with R packages. We then applied enrichment analysis and identified key DEGs using machine learning techniques. Protein–protein interaction (PPI) analysis, correlation analysis, Gene Ontology (GO) and gene set enrichment analysis (GSEA), and Kyoto Encyclopedia of Genes and Genomes (KEGG)14,15,16 pathway enrichment analysis were performed to explore the biological processes underlying Mycobacterium infection. To identify key biomarkers, we used LASSO logistic regression with tenfold cross-validation. Finally, external validation datasets were used to further filter and confirm NTM/LTBI/ATB-related molecular biomarkers.
Material and methods
Data sources and processing
Gene expression profiles related to mycobacterial infection were retrieved from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI). Four high-throughput sequencing expression profiling datasets (GSE205161, GSE107991, GSE199403 and GSE152218) were selected and downloaded for analysis.
GSE205161 is a high-throughput sequencing expression profiling dataset based on the GPL24676 platform (Illumina NovaSeq 6000, Homo sapiens) and contains samples of non-tuberculous mycobacteria (NTM) pulmonary disease17. GSE107991 is based on the GPL20301 platform (Illumina HiSeq 4000, Homo sapiens) and contains samples of active tuberculosis (ATB) and latent tuberculosis infection (LTBI)18. GSE199403 is based on the GPL24676 platform (Illumina NovaSeq 6000, Homo sapiens) and contains samples of NTM pulmonary disease and healthy controls (IGRA+ and IGRA-)19. GSE152218 is based on the GPL16791 platform (Illumina HiSeq 2500, Homo sapiens) and contains samples of LTBI and ATB20.
Analysis of expression profiling datasets
The samples in these datasets fell into four groups: control (CON), NTM, LTBI and ATB. The GEO2R tool was used to analyze the three expression datasets21,22. Before DEG filtering, each dataset was normalized and log2-transformed. The filtering thresholds were |log2 fold change| > 1 and FDR < 0.05. The resulting DEG lists were compared in a Venn diagram generated with an R package, and the final DEGs were defined by the intersection of the datasets (Fig. 1).
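The DEG filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GEO2R pipeline used in the study: it assumes per-gene log2 fold changes and raw p-values have already been computed, and it applies the Benjamini-Hochberg procedure to obtain FDR values. The function names (benjamini_hochberg, filter_degs, intersect_deg_sets) and the input layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted
    # values stay monotone (cumulative minimum of p * m / rank, capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_degs(results: Dict[str, Tuple[float, float]],
                lfc_cutoff: float = 1.0,
                fdr_cutoff: float = 0.05) -> Set[str]:
    """Keep genes with |log2FC| > lfc_cutoff and FDR < fdr_cutoff.

    `results` maps gene -> (log2 fold change, raw p-value); this layout
    is a hypothetical choice made for illustration.
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    return {g for g, q in zip(genes, fdr)
            if abs(results[g][0]) > lfc_cutoff and q < fdr_cutoff}


def intersect_deg_sets(deg_sets: List[Set[str]]) -> Set[str]:
    """Genes shared by every DEG list (the Venn-diagram intersection)."""
    return set.intersection(*deg_sets) if deg_sets else set()
```

Replacing set.intersection with set.union would instead give the pooled list of unique DEGs across comparisons.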
Functional enrichment analysis
The clusterProfiler, org.Hs.eg.db and GOplot packages of R were used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of the DEGs, with P < 0.05 considered significant. Gene set enrichment analysis (GSEA) software (version 4.1.0) was used to compare the biological processes that differed significantly between the paired groups (NTM vs LTBI, NTM vs ATB and LTBI vs ATB).
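GO and KEGG over-representation analysis of this kind is commonly based on a one-sided hypergeometric test; to our understanding, this is what clusterProfiler's enrichGO and enrichKEGG compute for each term before multiple-testing adjustment. The sketch below is an illustration, not the package's code: it computes the upper-tail probability for a single term, and the function and argument names are our own.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the GO term or KEGG pathway
    n: DEGs submitted (drawn from the universe)
    k: DEGs annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, 3 of 10 DEGs falling in a 20-gene pathway from a 1,000-gene universe (expected overlap 0.2) yields a small p-value, which would then be adjusted across all tested terms.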
Protein–protein interaction network analysis
We submitted the NTM-, LTBI-, and ATB-related DEGs to the STRING database (http://www.string-db.org/) for PPI network prediction (interaction score > 0.7). The STRING results were imported into Cytoscape v.12.0, and the Molecular Complex Detection (MCODE) plugin was used to identify gene clusters among the NTM-, LTBI-, and ATB-related DEGs (Degree Cutoff = 2, Node Score Cutoff = 0.2, K-Core = 2, and Max. Depth = 100).
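The score filter and the node-degree ranking used to highlight hub genes can be illustrated with a small sketch. It assumes combined interaction scores on a 0 to 1 scale and uses hypothetical gene names (G1 to G5). The k_core function is a plain k-core decomposition, which is the graph notion behind MCODE's K-Core setting; it is not the full MCODE clustering algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]  # (protein_a, protein_b, combined_score)


def build_graph(edges: Iterable[Edge], min_score: float = 0.7) -> Dict[str, Set[str]]:
    """Undirected adjacency keeping only interactions with score > min_score."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def top_hubs(adj: Dict[str, Set[str]], n: int) -> List[str]:
    """Genes with the most neighbors (ties broken alphabetically)."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:n]


def k_core(adj: Dict[str, Set[str]], k: int) -> Set[str]:
    """Nodes of the k-core: repeatedly delete nodes with degree < k."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if degree[node] < k:
                alive.remove(node)
                changed = True
                # Removing a node lowers the degree of its surviving neighbors.
                for nbr in adj[node]:
                    if nbr in alive:
                        degree[nbr] -= 1
    return alive
```

Ranking by degree mirrors the "number of adjacent nodes" used in the Results, while the k-core step shows why weakly connected genes drop out of clusters.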
Identification of NTM, LTBI and ATB related genes using LASSO regression
The R package "glmnet" was used to perform LASSO logistic regression on the 43 identified unique genes; LASSO was chosen because the number of covariates is large relative to the limited sample size. The genes at the intersection of those selected by LASSO were used as candidate markers for diagnosing NTM, LTBI and ATB.
Verification using external validation datasets
To validate the mRNA expression levels of the key crosstalk genes, we used independent external datasets (GSE199403 and GSE152218). GSE199403 included 73 MAC-PD samples, 50 IGRA+ healthy control samples and 75 IGRA- healthy control samples, and GSE152218 included 16 ATB and 32 LTBI samples. The genes selected by LASSO logistic regression were designated as key genes. We then compared their mRNA expression levels between the case groups and the IGRA- healthy controls using the Mann–Whitney U test, with p < 0.05 considered statistically significant, and visualized the results with the “ggplot2” package. Finally, we evaluated the diagnostic efficacy of the key genes in the test dataset by constructing receiver operating characteristic (ROC) curves with the “pROC” package.
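The Mann–Whitney U statistic and the ROC AUC are closely linked: the empirical AUC equals U divided by the number of case-control pairs, with ties counted as one half. The sketch below illustrates that relationship and adds a two-sided p-value from the normal approximation without tie or continuity correction. R's wilcox.test may instead use an exact test or a continuity-corrected approximation, so p-values can differ slightly. Function names are hypothetical, and the AUC assumes higher values indicate cases; for a downregulated marker such as METTL7B, the groups would be swapped.

```python
import math
from typing import Sequence


def mann_whitney_u(cases: Sequence[float], controls: Sequence[float]) -> float:
    """U for `cases`: pairs with case > control count 1, ties count 0.5."""
    u = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def auc_from_u(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC, assuming higher values indicate cases."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))


def mann_whitney_p_normal(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Two-sided p-value from the large-sample normal approximation
    (no tie correction, no continuity correction)."""
    n1, n2 = len(cases), len(controls)
    u = mann_whitney_u(cases, controls)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))
```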
Statistical analysis
Statistical analyses were performed using R software (version 4.3.2, http://www.R-project.org). The Wilcoxon rank-sum test was used to assess the significance of differential gene expression in the GEO datasets, and Student's t-tests were used to assess differences between distinct groups. The glmnet R package was used for LASSO regression analysis. A p-value of less than 0.05 was considered statistically significant.
Results
Identification of differential expression genes associated with NTM, LTBI and ATB
We first identified the differentially expressed genes (DEGs) in two GEO datasets (GSE205161 and GSE107991). A total of 20,488 genes were detected in the NTM vs LTBI comparison, 20,272 in the NTM vs ATB comparison and 20,299 in the LTBI vs ATB comparison, with 20,736 unique genes across all comparisons. DEGs were defined by |log2FC| > 1 (a fold change of more than 2 in either direction) and p-value < 0.05. In total, 229 DEGs met these thresholds in the NTM vs LTBI comparison, 256 in the NTM vs ATB comparison and 572 in the LTBI vs ATB comparison (Fig. 1). The ten most significantly up- and downregulated genes in each comparison were listed and selected for further analysis (Fig. 2).
Transcriptional changes in patients with NTM, LTBI and ATB. (A) NTM vs LTBI, volcano plot of all genes; significantly (FDR < 0.05) differentially expressed genes (|log2 fold change| > 1) are indicated in red (upregulated) or green (downregulated); (B) NTM vs ATB; (C) LTBI vs ATB; (D) NTM vs LTBI, the top 10 differentially expressed genes are indicated in blue (upregulated) or yellow (downregulated); (E) NTM vs ATB; (F) LTBI vs ATB.
Heatmaps of differential expression genes (DEGs) of NTM, LTBI and ATB
The patients were subdivided into NTM, LTBI and ATB groups according to the GEO cohorts (GSE205161 and GSE107991). The expression profiles of the DEGs between the paired groups are shown in the heatmaps in Fig. 3A, B and C, with higher expression depicted in red, lower expression in blue and equal expression in white. These results indicate that patients with NTM, LTBI and ATB could be classified based on genes associated with these Mycobacterium infections. We then found 1,127 DEGs in the GEO cohorts (GSE205161 and GSE107991) and finally screened 598 unique DEGs for further analysis (Fig. 3D).
Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of DEGs
To investigate the role of the DEGs in the NTM, LTBI and ATB groups, we performed GO and KEGG pathway enrichment analyses. Among GO Biological Process (BP) terms, the DEGs were strongly enriched in detoxification, response to virus and defense response to bacterium. Among GO Cellular Component (CC) terms, they were highly enriched in the haptoglobin-hemoglobin complex, blood microparticle and collagen-containing extracellular matrix, and among GO Molecular Function (MF) terms, they were enriched in antioxidant activity, serine-type endopeptidase activity and endopeptidase activity (Fig. 4A, B and C). KEGG analysis showed that these genes were enriched in ribosome biogenesis in eukaryotes, coronavirus disease (COVID-19) and Staphylococcus aureus infection (Fig. 4D, E and F). Overall, the KEGG results were consistent with the GO enrichment results.
Enrichment analysis of DEGs among NTM, LTBI and ATB. (A) Bubble plot of the biological process (BP), cellular component (CC) and molecular function (MF) terms enriched among DEGs between NTM and LTBI; (B) bubble plot of the terms enriched among DEGs between NTM and ATB; (C) bubble plot of the terms enriched among DEGs between LTBI and ATB; circle plots of KEGG pathway enrichment analysis in (D) NTM vs LTBI, (E) NTM vs ATB and (F) LTBI vs ATB groups. The bubble plots show the top 10 BP, CC and MF terms ranked by p-value. The circle plots show the top 5 KEGG pathways ranked by p-value, with each pathway shown in a different color.
Exploration of signaling pathways
Gene set enrichment analysis (GSEA) is well suited to exploring the signaling pathways involved from an overall perspective. GSEA revealed that the DEGs in the NTM vs LTBI comparison of the GEO cohorts were significantly enriched in erythrocytes take up carbon dioxide and release oxygen, mRNA splicing, O2/CO2 exchange in erythrocytes and scavenging of heme from plasma (Fig. 5A). The DEGs in the NTM vs ATB comparison were significantly enriched in cohesin loading onto chromatin, DDX58/IFIH1-mediated induction of interferon-alpha/beta, mitotic telophase/cytokinesis, RNA polymerase I transcription termination and scavenging of heme from plasma (Fig. 5B). The DEGs in the LTBI vs ATB comparison were significantly enriched in amyloid fiber formation, complement cascade, innate immune system, interferon gamma signaling and neutrophil degranulation (Fig. 5C).
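The enrichment score at the core of GSEA can be sketched as a weighted running sum over a ranked gene list: it steps up at genes in the set (weighted by the absolute ranking metric raised to a power, with 1 as the usual default) and down at genes outside it, and the enrichment score (ES) is the maximum deviation from zero. This minimal sketch omits the permutation test and normalization that the GSEA software applies, and the names are hypothetical.

```python
from typing import List, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Running-sum enrichment score.

    ranked: (gene, ranking metric) pairs sorted from most up- to most
    downregulated; gene_set: the genes of one pathway.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit, n_total = sum(hits), len(ranked)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must be a proper, non-empty subset")
    # Hits are weighted by |metric|^weight; misses share a uniform penalty.
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    miss_step = 1.0 / (n_total - n_hit)
    running, es = 0.0, 0.0
    for (_, s), h in zip(ranked, hits):
        running += abs(s) ** weight / hit_norm if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es
```

A positive ES means the pathway's genes cluster at the top of the ranking (upregulated in the first group of a comparison), and a negative ES means they cluster at the bottom.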
Protein–protein interactions network and correlation analyses of DEGs
To investigate the interactions among DEGs further, we constructed a PPI network using the STRING database (Fig. 6). Figure 6A, B and C show the number of neighboring nodes of the DEGs for NTM vs LTBI, NTM vs ATB and LTBI vs ATB. The DEGs with the top three numbers of adjacent nodes (HP, HBD, HBB, RSAD2, ISG15, IFI44L, GBP1, IFIT3 and STAT1) are highlighted in bold (Fig. 6D, E and F). Overall, the LTBI vs ATB group had the most extensive PPI network, with a maximum of 56 adjacent nodes, and the NTM vs LTBI group had the smallest PPI network and the fewest nodes, while the NTM vs ATB group was intermediate, with a maximum of 24 adjacent nodes.
Construction of diagnosis gene signature in GEO training cohorts
Based on the top 10 upregulated and downregulated DEGs in the three comparisons, 43 unique DEGs were selected and visualized in a Venn diagram (Fig. 7A). These genes were subjected to least absolute shrinkage and selection operator (LASSO) regression analysis to construct a diagnostic model for NTM, LTBI and ATB. The regression coefficients are shown in Fig. 7B, C and D as coefficient profiles plotted against the log(lambda) sequence of the LASSO model. The optimal lambda was chosen at the first black dotted line (Fig. 7E to G). At the minimum lambda values of 0.0002, 0.0009 and 0.001 for the NTM vs LTBI, NTM vs ATB and LTBI vs ATB groups, respectively, four significant genes were identified in each group (Fig. 7H).
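The choice of lambda from tenfold cross-validation can be sketched as follows. As we understand cv.glmnet, lambda.min is the value with the lowest mean cross-validated error, and lambda.1se is the largest lambda whose mean error lies within one standard error of that minimum; the sketch reproduces those two rules from a table of per-fold errors and includes a simple balanced random fold assignment. The per-fold errors would come from refitting the LASSO model on each training split, which is not shown, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def assign_folds(n_samples: int, n_folds: int = 10, seed: int = 0) -> List[int]:
    """Balanced random fold labels 1..n_folds (sizes differ by at most 1)."""
    ids = [i % n_folds + 1 for i in range(n_samples)]
    random.Random(seed).shuffle(ids)
    return ids


def select_lambda(lambdas: Sequence[float],
                  fold_errors: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se).

    fold_errors[i][k] is the validation error (e.g. binomial deviance)
    of the model fitted with lambdas[i] on held-out fold k.
    """
    means, ses = [], []
    for errs in fold_errors:
        k = len(errs)
        m = sum(errs) / k
        var = sum((e - m) ** 2 for e in errs) / (k - 1)
        means.append(m)
        ses.append(math.sqrt(var / k))  # standard error of the mean error
    best = min(means)
    # Among ties at the minimum, prefer the largest (most penalized) lambda.
    lambda_min = max(lam for lam, m in zip(lambdas, means) if m <= best)
    i_min = list(lambdas).index(lambda_min)
    threshold = means[i_min] + ses[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, means) if m <= threshold)
    return lambda_min, lambda_1se
```

lambda.1se trades a small increase in cross-validated error for a sparser model, whereas the first dotted line in a cv.glmnet plot marks lambda.min.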
Identification of the highly correlated gene expression signatures in the NTM, LTBI and ATB groups. (A) Venn diagram of the 43 unique DEGs of the three groups; coefficient profiles of the 43 unique DEGs in the NTM vs LTBI (B), NTM vs ATB (C) and LTBI vs ATB (D) groups; selection of the optimal lambda in the LASSO regression model by cross-validation in the NTM vs LTBI (E), NTM vs ATB (F) and LTBI vs ATB (G) groups; (H) potential gene signatures for distinguishing between NTM and LTBI, NTM and ATB, and LTBI and ATB, respectively.
Validation of the diagnosis gene signature in other GEO datasets
Two further GEO datasets (GSE199403 and GSE152218) were used to validate the performance of the gene-based diagnostic model. Figure 8 shows the expression of the four hub genes selected by LASSO regression (HBA1, HBA2, HBD and METTL7B) in all groups (interferon gamma release assay-negative healthy controls, “IGRA-HC”, NTM, LTBI and ATB). The expression levels of HBA1/2 in patients with NTM were intermediate between those in the IGRA-HC and LTBI groups (Fig. 8A, B). HBD was significantly upregulated in the ATB group compared with the IGRA-HC and NTM groups (all P < 0.001; Fig. 8C). METTL7B was significantly downregulated in the LTBI group (P < 0.001), and differences in its expression could efficiently distinguish LTBI from ATB (Fig. 8D).
Further validation of the diagnosis gene signature of each group
For HBA1, the area under the curve (AUC) of the classifier was 0.803 (95% CI, 0.709–0.898) in the test datasets, and HBA1 expression efficiently distinguished LTBI from the control group (IGRA-HC and NTM) (Fig. 9A). For HBA2, the AUC was 0.798 (95% CI, 0.702–0.894) in the test datasets, and HBA2 expression also efficiently distinguished LTBI from the control group (IGRA-HC and NTM) (Fig. 9B). HBD expression identified ATB among the control group (IGRA-HC and NTM), with an AUC of 0.772 (95% CI, 0.661–0.883) (Fig. 9C). Moreover, METTL7B expression identified patients with LTBI among the IGRA-HC and ATB groups, with an AUC of 0.983 (95% CI, 0.964–1) (Fig. 9D).
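Confidence intervals for a single AUC, such as those reported above, are commonly obtained with DeLong's method, which to our understanding is the default of pROC's ci.auc for a full ROC curve. The sketch below is an illustration, not the study's code: it computes the DeLong variance from placement values and a normal-approximation interval clipped to [0, 1], and it assumes higher values indicate cases. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if case > control, 0.5 for a tie, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_auc_ci(cases: Sequence[float], controls: Sequence[float],
                  level: float = 0.95) -> Tuple[float, float, float]:
    """Return (auc, lower, upper) using the DeLong variance estimate."""
    m, n = len(cases), len(controls)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases and two controls")
    # Placement values: each case scored against all controls and vice versa.
    v10 = [sum(_psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / m for y in controls]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)
```

When cases and controls are perfectly separated, both variance components are zero and the interval collapses to the point estimate.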
Discussion
Most genetic markers associated with Mycobacterium infection have been identified by comparing patients with healthy individuals; however, exploring markers that differ between subtypes (e.g. Mtb and NTM) is of great clinical value for diagnosis and differential diagnosis, especially as such markers may also differ in healthy populations. In this study, we investigated the role of Mycobacterium infection-related genes in NTM, LTBI and ATB. The GEO samples comprised four subgroups: IGRA-HC, NTM, LTBI and ATB. Using LASSO and ROC analyses of the unique DEGs of these subgroups, we found that upregulated HBA1/2 could distinguish LTBI in the NTM vs LTBI comparison and that upregulated HBD could identify ATB in the NTM vs ATB comparison. Moreover, downregulated METTL7B was associated with LTBI and may distinguish LTBI from ATB. Importantly, the expression levels of these genes also differed significantly between IGRA-HC and these diseases.
Both ATB and NTM pose severe health risks worldwide. A simple non-sequencing technique is required for rapid diagnosis, particularly in less experienced hospitals, but no single biomarker is widely used for all Mycobacterium species23. A previous study concluded that the serum level of coronin-1A might serve as a novel biomarker for alternative laboratory examination to distinguish patients with tuberculosis from those with other related diseases and from healthy individuals24. Wu et al. showed that IRF1 was significantly upregulated in patients with TB compared with healthy controls and could serve as a new biomarker for the diagnosis of pulmonary tuberculosis25. However, these studies focused on DEGs between healthy people and patients. Comparative analysis of blood samples from patients with these conditions in an endemic setting may yield new differentiating biomarkers and insights into the inflammatory and immunological responses to ATB and NTM. Plasma proteomes have been shown to differ among LTBI, ATB, NTM and HC cohorts, and these differences were augmented by prior stimulation of blood immune cells with the ESAT-6/CFP-10 antigen cocktail26. Whole-genome sequencing (WGS) may also serve as a high-resolution assay for diagnosing different subtypes of Mycobacterium infection11. Moreover, combining sequencing technology with traditional diagnostic methods can identify pathogens at early stages of disease27, and nucleic acid amplification testing combined with acid-fast staining of lung biopsy tissue can lead to timely and accurate diagnosis in patients with smear-negative pulmonary tuberculosis28.
Acid-fast bacilli (AFB) sputum smear microscopy and Mtb culture are commonly used to diagnose ATB29. However, AFB smear microscopy has low sensitivity30 and is not suitable for monitoring treatment response, as it cannot distinguish between living and dead bacilli. Nanopore sequencing of clinical samples showed the best diagnostic accuracy for sputum-scarce ATB31; nevertheless, for patients without positive smear or culture results, the time to diagnosis has been significantly reduced but diagnostic efficacy has declined32. In this context, transcriptome RNA sequencing provides new information on the host's protective immune responses to Mtb33.
In this study, we used transcriptome RNA-seq datasets to analyze representative DEGs that differed among the HC, NTM, LTBI and ATB groups. Although HBA1/2 levels were upregulated in all disease groups, their expression did not differ significantly between LTBI and ATB. Moreover, HBA1/2 expression in the NTM group lay between that in the HC and LTBI/ATB groups, so appropriate cutoffs could efficiently distinguish HC from NTM patients. As hemoglobin subunits, HBA1/2 were upregulated in the NTM group. Indeed, the top NTM-linked pathway revealed by our GSEA was erythrocytes take up carbon dioxide and release oxygen. Enhanced oxidative stress has been shown to be closely related to methemoglobin generation in human erythrocytes34.
Bacterial coinfections were observed in 19–66% of patients with MAC-PD over the course of the disease, and patients with positive bacterial cultures at diagnosis were more likely to develop bronchiectasis exacerbations35. Patients with MAC-PD therefore need to initiate treatment, which first requires an accurate diagnosis. HBA1/2 could distinguish NTM from LTBI with AUCs of 0.803 and 0.798, respectively, and HBD could distinguish NTM from ATB with an AUC of 0.772.
These genes may therefore improve the efficiency of NTM identification.
METTL7B (methyltransferase-like 7B) is a member of the methyltransferase-like family36. Little is known about its exact role in LTBI and ATB. Our results showed that downregulated METTL7B not only distinguished LTBI from HC with an area under the curve (AUC) of 0.87 (95% CI: 0.81 to 0.94) (P < 0.05), but also differed significantly in expression between LTBI and ATB (P < 0.001). Recent studies reported that METTL7B may be a novel tumor immune microenvironment-associated biomarker correlated with the prognosis of patients with prostate adenocarcinoma37. Li et al. found that interfering with METTL7B expression significantly inhibited lung adenocarcinoma by modulating the ability of cells to proliferate and migrate38. Other studies provide evidence that the PD-1/PD-L1 signaling pathway, which is also regulated by METTL7B39, may be associated with ATB or other chronic infections and even with the development of cancer in these patients40.
Conclusions
In summary, this study used bioinformatics and machine learning techniques to identify four potential Mycobacterium-related genes (HBA1, HBA2, HBD, and METTL7B), whose diagnostic significance was validated in external validation cohorts from the GEO database. These findings suggest that these genes could improve the diagnosis of patients with NTM, LTBI, or ATB, particularly when culture or other etiological methods are unsuccessful. Furthermore, our study provides insights into the signaling pathways enriched in mycobacterial infections, which may contribute to the design of novel therapeutic strategies for these diseases.
Data availability
The datasets (GSE205161, GSE107991, GSE199403 and GSE152218) for this study can be found in the GEO datasets (https://www.ncbi.nlm.nih.gov/geo/).
References
Orgeur, M., Sous, C., Madacki, J. & Brosch, R. Evolution and emergence of Mycobacterium tuberculosis. FEMS Microbiol. Rev. 48(2), 53 (2024).
Kim, S. Y., Zo, S., Kim, D. H., Shin, S. J. & Jhun, B. W. Single-cell transcriptomics by clinical course of Mycobacterium avium complex pulmonary disease. Sci. Rep. 14(1), 15663 (2024).
Deveci, H. S., Kule, M., Kule, Z. A. & Habesoglu, T. E. Diagnostic challenges in cervical tuberculous lymphadenitis: A review. North Clin. Istanb. 3(2), 150–155 (2016).
Sharma, S. K. & Upadhyay, V. Epidemiology, diagnosis & treatment of non-tuberculous mycobacterial diseases. Indian J. Med. Res. 152(3), 185–226 (2020).
Zellweger, J. P., Sotgiu, G., Corradi, M. & Durando, P. The diagnosis of latent tuberculosis infection (LTBI): currently available tests, future developments, and perspectives to eliminate tuberculosis (TB). Med. Lav. 111(3), 170–183 (2020).
Villar-Hernandez, R. et al. Development and evaluation of an NTM-IGRA to guide pediatric lymphadenitis diagnosis. Pediatr. Infect. Dis. J. 43(3), 278–285 (2024).
Takhar, R. P., Mirdha, K., Purohit, G., Maan, L. & Bainara, M. K. Impact of HIV co-infection on clinical presentation in patients with TB and correlation of the findings with level of immune suppression. Tanaffos. 17(3), 188–197 (2018).
de Vries, G. et al. Diagnosis, treatment and transmission of rifampicin-resistant TB in the Netherlands, 2010–2019. Int. J. Tuberc Lung Dis. 27(6), 471–477 (2023).
Getahun, M. et al. Conventional and molecular methods for the diagnosis of smear negative pulmonary TB and concordance of empirical TB diagnosis with culture assay. Ethiop. J. Public Health Nutr. 1(2), 99–104 (2017).
He, Y., Gong, Z., Zhao, X., Zhang, D. & Zhang, Z. Comprehensive determination of mycobacterium tuberculosis and nontuberculous mycobacteria from targeted capture sequencing. Front. Cell Infect. Microbiol. 10, 449 (2020).
Ge, H. et al. Case report: Intraabdominal infection of Mycobacterium syngnathidarum in an immunocompetent patient confirmed by whole-genome sequencing. Front. Med. (Lausanne). 10, 1265594 (2023).
Choi, Y., Nam, M. W., Lee, H. K. & Choi, K. C. Use of cutting-edge RNA-sequencing technology to identify biomarkers and potential therapeutic targets in canine and feline cancers and other diseases. J. Vet. Sci. 24(5), e71 (2023).
Rezk, M. et al. Biomarker screening in preeclampsia: An RNA-sequencing approach based on data from multiple studies. J. Hypertens. 40(10), 2022–2036 (2022).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000).
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51(D1), D587–D592 (2023).
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28(11), 1947–1951 (2019).
Prieto, M. D. et al. Whole blood RNA-seq demonstrates an increased host immune response in individuals with cystic fibrosis who develop nontuberculous mycobacterial pulmonary disease. PLoS One. 17(12), e0278296 (2022).
Singhania, A. et al. A modular transcriptional signature identifies phenotypic heterogeneity of human tuberculosis infection. Nat. Commun. 9(1), 2308 (2018).
Lindestam Arlehamn, C. S. et al. T-cell deficiency and hyperinflammatory monocyte responses associate with Mycobacterium avium complex lung disease. Front. Immunol. 13, 1016038 (2022).
VanValkenburg, A. et al. Malnutrition leads to increased inflammation and expression of tuberculosis risk signatures in recently exposed household contacts of pulmonary tuberculosis. Front. Immunol. 13, 1011166 (2022).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 41(Database issue), D991–D995 (2013).
Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30(1), 207–210 (2002).
Zhou, L. et al. A new single gene differential biomarker for Mycobacterium tuberculosis complex and non-tuberculosis mycobacteria. Front. Microbiol. 10, 1887 (2019).
Khamchun, S. & Pongtussanahem, O. Coronin-1A serves as a serum biomarker for supportive diagnosis of Mycobacterium tuberculosis infection. Germs. 13(1), 20–31 (2023).
Wu, L. et al. IRF1 as a potential biomarker in Mycobacterium tuberculosis infection. J. Cell Mol. Med. 25(15), 7270–7279 (2021).
Teklu, T. et al. Differences in plasma proteomes for active tuberculosis, latent tuberculosis and non-tuberculosis mycobacterial lung disease patients with and without ESAT-6/CFP10 stimulation. Proteome Sci. 18(1), 10 (2020).
Liu, Y., Ma, X., Chen, J., Wang, H. & Yu, Z. Nontuberculous mycobacteria by metagenomic next-generation sequencing: Three cases reports and literature review. Front. Public Health. 10, 972280 (2022).
Jiang, F. et al. Nucleic acid amplification testing and sequencing combined with acid-fast staining in needle biopsy lung tissues for the diagnosis of smear-negative pulmonary tuberculosis. PLoS One. 11(12), e0167342 (2016).
Nargan, K. et al. Detection of Mycobacterium tuberculosis in human tissue via RNA in situ hybridization. bioRxiv. 2, 85 (2023).
Paradkar, M. S. et al. Early microbiologic markers of pulmonary tuberculosis treatment outcomes. Ann. Am. Thorac. Soc. 20(12), 1760–1768 (2023).
Zhou, L., Zou, X., Hu, Q., Hua, H. & Qi, Q. Determination of the diagnostic accuracy of nanopore sequencing using bronchoalveolar lavage fluid samples from patients with sputum-scarce pulmonary tuberculosis. J. Infect. Chemother. 30(2), 98–103 (2024).
Yang, S., Guo, J. Q., Li, T. X. & Tang, S. J. Annual progress of interventional diagnosis and treatment of tuberculosis in 2023. Zhonghua Jie He He Hu Xi Za Zhi. 47(4), 363–370 (2024).
Shekarkar, A. M. et al. Integrated analysis of single-cell and bulk RNA sequencing data reveals memory-like NK cell subset associated with mycobacterium tuberculosis latency. Cells. 13(4), 536 (2024).
Chemlal, H. et al. Close relationship between high HbA(1c) levels and methemoglobin generation in human erythrocytes: The enhancement of oxidative stress in the process. J. Diabetes Complications. 37(8), 108543 (2023).
Ito, M. et al. Multiple bacterial culture positivity reflects the severity and prognosis as bronchiectasis in Mycobacterium avium complex pulmonary disease. Respir. Med. 219, 107417 (2023).
Fu, R., Luo, X., Ding, Y. & Guo, S. Prognostic potential of METTL7B in Glioma. Neuroimmunomodulation. 29(3), 186–201 (2022).
Zong, S. & Gao, J. Identifying the tumor immune microenvironment-associated prognostic genes for prostate cancer. Discov. Oncol. 15(1), 42 (2024).
Li, R., Mu, C., Cao, Y. & Fan, Y. METTL7B serves as a prognostic biomarker and promotes metastasis of lung adenocarcinoma cells. Ann. Transl. Med. 10(16), 895 (2022).
Luo, C. et al. A whole exon screening-based score model predicts prognosis and immune checkpoint inhibitor therapy effects in low-grade glioma. Front. Immunol. 13, 909189 (2022).
Cao, S., Li, J., Lu, J., Zhong, R. & Zhong, H. Mycobacterium tuberculosis antigens repress Th1 immune response suppression and promotes lung cancer metastasis through PD-1/PDl-1 signaling pathway. Cell Death Dis. 10(2), 44 (2019).
Funding
This work was sponsored by Hangzhou Medical Health Science and Technology Project [No. A20220558]; Zhejiang General Research Project on Medical Health and Science Technology Plan [No.2020KY238]; The Public Welfare Technology Research Program in Zhejiang Province [No. LGF21H190002].
Author information
Contributions
All authors contributed to the conceptualization, writing of the original draft, as well as the review and editing of the final version. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jia, Q., Wu, Y., Huang, Y. et al. New genetic biomarkers from transcriptome RNA-sequencing for Mycobacterium tuberculosis complex and Mycobacterium avium complex infections by bioinformatics analysis. Sci Rep 14, 17385 (2024). https://doi.org/10.1038/s41598-024-68242-9
DOI: https://doi.org/10.1038/s41598-024-68242-9