Abstract
Antimicrobial combination therapy is widely used to combat Mycobacterium tuberculosis (Mtb), yet resistance rates continue to rise. Mutator strains, with defects in DNA repair genes, drive resistance in other bacterial infections, but their role in Mtb remains unclear. Here, we study the contribution of single nucleotide polymorphisms (SNPs) in DNA Repair, Replication, and Recombination (3 R) genes to Mtb resistance. Through large-scale bioinformatics analysis of 53,589 whole genomes, we identified 18 novel SNPs in 3 R genes, found in lineages 2 and 4 and linked to genotypic drug resistance, covering 12.5% of clinical isolates with available genome sequences. Notably, a number of the detected SNPs were positively selected during Mtb evolution. Experimental tests showed that mutM, fpg2, xthA, and nucS mutants had increased mutation frequencies compared with the wild type. Our findings highlight the role of 3 R gene mutations in resistance, emphasizing the need for surveillance to improve early detection and control strategies.
Introduction
The social and economic burden of tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is worsened by the growing issue of antibiotic resistance. Unlike many other bacteria, Mtb acquires drug resistance exclusively through spontaneous mutations such as errors in replication fidelity or nucleotide modifications within chromosomal genes encoding drug targets or drug-activating enzymes, rather than through horizontal gene transfer1,2,3. In addition to replication errors, Mtb’s genome stability is further challenged by the harsh conditions within host macrophages, where oxidative stress and antibiotic pressure increase mutation rates4,5. This, in turn, promotes mutations, including in genes encoding drug targets, ultimately driving antimicrobial resistance (AMR)6,7.
To counteract DNA damage from replication errors, oxidative stress, and antibiotic pressure, mycobacteria have evolved multiple DNA repair mechanisms that help maintain genome stability5. However, mutations in these repair system genes can lead to the emergence of mutator genotypes with elevated mutation rates8. Mutator genotypes are common in bacterial infections and play a key role in antibiotic resistance. In Pseudomonas aeruginosa, Staphylococcus aureus, and Haemophilus influenzae infections in cystic fibrosis patients, mutators arise due to DNA repair deficiencies, leading to positive selection9,10,11. Similarly, Escherichia coli mutator strains frequently exhibit mismatch repair (MMR) deficiencies12. In Mycobacterium leprae, resistance mutations in hyper-mutator strains are linked to loss-of-function variants of the endonuclease Nth, a key enzyme in Base Excision Repair (BER)13.
In Mtb, several studies have investigated mutations in DNA repair genes for their role in mutator phenotypes14,15,16, but only two have explored their potential links to drug resistance17,18. One missense SNP in the proofreading (PHP) domain of DNA polymerase (dnaE1) was identified as a mutator variant, present in 3% of 1700 tested clinical isolates15. Similarly, a study of 1600 Mtb samples uncovered five independent potential mutators, including mutations in the MMR component NucS, which conferred a mutator phenotype in M. smegmatis16,18. More recently, analysis of 2237 Mtb isolates identified four mutations in DNA repair genes (uvrA and uvrB of the Nucleotide Excision Repair (NER) pathway, recF of Homologous Recombination (HR), and mutY of BER) that were specifically associated with drug resistance17. Additionally, a study of 67 isolates identified 14 fixed SNPs in 14 DNA repair genes within lineage 4 that were exclusively present in drug-resistant genomes19. While these studies have demonstrated that Mtb strains with altered DNA repair mechanisms may have fitness advantages, their contribution to drug resistance remains an emerging area of research. These studies are limited by small datasets (1600–2300 isolates) and geographic constraints, as they were conducted in only a few countries and therefore fail to capture the full global diversity of Mtb. As a result, significant knowledge gaps persist. Our study aims to address these limitations by analyzing a more extensive and diverse collection of Mtb isolates, providing deeper insights into the potential role of mutator strains in drug resistance.
In this study, we provide a comprehensive analysis of the implications of 3 R gene variants (Replication, Recombination, and Repair) in the evolution of molecular resistance mechanisms in Mtb. We selected a large panel of 55 genes associated with these mechanisms and explored their diversity using an extensive dataset of clinical and animal isolates from GenBank (54,619 Sequence Read Archive (SRA) entries). Through a genome-wide association study (GWAS), we identified 18 novel variants in different gene systems, all showing relatively high frequencies and an association with drug resistance. Additionally, we identified rare variants that emerged independently in various Mtb lineages. We predicted the effects of these polymorphisms on protein activity using computational methods and experimentally validated four mutator candidates by assessing their impact on mutation frequency in Mycobacterium smegmatis (Msm), a model for Mtb. This study enhances our understanding of the role of 3 R genes in drug resistance emergence and may contribute to developing accurate diagnostic methods based on whole-genome sequencing (WGS) data for detecting bacteria at risk of developing resistance in clinical settings.
Results
Phenotypic drug susceptibility and genotypic profiles of Mtb Strains
We analyzed a comprehensive set of 54,619 whole genome sequences, comprising 51,095 clinical Mtb strains and 3524 animal strains sourced from GenBank. Following quality filtering (see Methods), the dataset included 53,589 genomes representing Mtb strains from various regions worldwide. We developed a high-throughput pipeline to collect SNPs only within a specified list of genes: 55 genes related to the 3 R genes, and genes used for strain classification and resistance prediction. All known global Mtb lineages infecting humans were represented, including lineage 1 (6.75%), lineage 2 (29.78%), lineage 3 (7.99%), lineage 4 (48.26%), lineage 5 (M. africanum, also known as West Africa 1, 0.29%), lineage 6 (M. africanum, also known as West Africa 2, 0.30%), lineage 7 (0.05%), lineage 8 (0.004%), and lineage 9 (0.002%). However, we did not identify any isolates of lineage 10. Additionally, animal lineages were present, including M. bovis (6.20%), M. caprae (0.22%), and M. orygis (0.15%).
To assess the distribution of drug resistance in this dataset, we computationally predicted resistance for each isolate genome across twenty drugs using the consensus list from TB-Profiler20, a curated compilation of resistance-associated polymorphisms (see Methods). The genotypic analysis of susceptibility to anti-tuberculosis drugs, using the old terminology, revealed that 39,999 strains (74.7%) were classified as sensitive, 8,689 strains (16.2%) as MDR (carrying mutations conferring resistance to both rifampicin and isoniazid), 4,787 strains (8.9%) as Pre-XDR (MDR genotype with additional resistance to either ofloxacin or kanamycin), and 114 strains (0.2%) as XDR (with mutations for resistance to rifampicin, isoniazid, ofloxacin, and kanamycin) (Fig. 1a). Compared with global resistance rates, resistance was highest, as expected, in lineage 2 and, to a lesser extent, in lineage 4 (Fig. 1b), consistent with findings from other studies21,22. Lineage 8 and lineage 9 were not included in Fig. 1b due to their very low frequencies.
a The pie chart shows the percentage of Mtb isolates in each drug-resistance category (n = 53,589). b The bar graph shows the proportion of sensitive and drug-resistant strains in each lineage: sensitive, multidrug-resistant (MDR), pre-extensively drug-resistant (PRE-XDR), and extensively drug-resistant (XDR).
GWAS reveals mutator SNPs exhibiting association with drug resistance phenotype in Mtb
To identify SNPs contributing to the development of antibiotic resistance in Mtb, we conducted an initial screening using allele counting to examine the association between gene adaptive diversity and mutations in 3 R genes23. We hypothesized that mutator strains, if they exist, provide a favorable genetic background for the evolution of drug resistance, and are more likely to carry resistance mutations linked to higher mutation rates in MDR, Pre-XDR, and XDR strains. From our SNP-calling pipeline, we generated a comprehensive table (covering 53,589 isolates) that detailed 3 R gene mutations, resistance profiles, and classifications for each isolate (see Methods). This dataset was used to statistically assess the relationship between 3 R gene SNPs and drug resistance through chi-square tests and Fisher’s exact tests for smaller SNP samples. This approach aimed for high sensitivity, acknowledging the typical trade-off with specificity found in allele-counting methods23. These tests, without accounting for population structure and recombination, identified 156 missense SNPs across 43 out of 55 3 R genes, potentially linked to antibiotic resistance (p-value < 0.05) (Supplementary data S1).
We then performed a second screening using TreeWAS24 to improve specificity by filtering out false positives from the initial analysis. TreeWAS is a tool for Genome-Wide Association Studies that considers population structure and recombination, making it particularly useful for clonal organisms like Mtb. Incorporating evolutionary relationships improves the accuracy of detecting true genetic associations with phenotypes, such as drug resistance. TreeWAS calculates three scores (terminal, simultaneous, and subsequent) from the phenotype and genotype states along the tree. The terminal score measures the association between phenotypes (resistant vs. sensitive) and genotypes at the leaves, the simultaneous score tracks parallel phenotype and genotype changes on the same branches, and the subsequent score assesses the proportion of the tree in which the phenotype and genotype co-occur. We applied TreeWAS to each 3 R gene individually to identify associations with drug resistance profiles.
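To make the terminal score more concrete, the following minimal Python sketch computes it for one binary SNP and one binary phenotype. It follows our reading of the published TreeWAS definition (the absolute mean of a +1/−1 agreement term over the tree tips) and is not the TreeWAS implementation itself; the function name and the toy vectors are hypothetical, and the simultaneous and subsequent scores, which need ancestral state reconstruction, are not shown.

```python
def terminal_score(phenotype, genotype):
    """Absolute mean agreement between binary phenotype and genotype at the tips."""
    if len(phenotype) != len(genotype) or not phenotype:
        raise ValueError("need two equal-length, non-empty vectors")
    # (2p - 1) * (2g - 1) is +1 when a tip's two states match and -1 otherwise
    total = sum((2 * p - 1) * (2 * g - 1) for p, g in zip(phenotype, genotype))
    return abs(total) / len(phenotype)


# Hypothetical toy data: 1 = resistant / SNP present, 0 = sensitive / SNP absent
resistant = [1, 1, 1, 0, 0, 0, 0, 1]
snp = [1, 1, 0, 0, 0, 0, 0, 1]
score = terminal_score(resistant, snp)
print(score)
```

A score close to 1 means the SNP and the phenotype agree (or disagree) at almost every tip; significance is then judged against a null distribution of simulated loci, as in Fig. 2A.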
Overall, TreeWAS identified 18 non-synonymous SNPs in 3 R genes significantly associated with resistance (using the terminal and subsequent scores, p < 0.000001). All were part of the 156 SNPs detected using the allele counting method (Table 1). Among these, five SNPs were supported by all three scores and affected xthA (Fig. 2A, B), mutT3, recGwed, recN, and uvrD2 (Table 1). These 18 SNPs were found in lineage 2 and/or lineage 4, with frequencies ranging from 0.05% (Trp196Arg in MutT3) to 1.99% (Glu331Gly in UvrD2), together accounting for 6650 clinical samples (Table 1). They are considered candidate mutators in the subsequent sections.
A Null Distributions of TreeWAS Scores: This panel displays the null distributions of simulated SNP associations for the three TreeWAS scores: terminal, simultaneous, and subsequent. The Trp85Arg SNP in xthA (nucleotide positions 252 and 253) surpasses the significance threshold in all three distributions, strongly indicating its association with antibiotic resistance. B Manhattan Plots of SNP Associations: These plots visualize the distribution of SNP associations across amino acid positions in genes, with the red line representing the significance threshold. SNPs exceeding this threshold, including the Trp85Arg variant, are highlighted to emphasize their significance. The colors of the data points were randomly assigned to differentiate SNPs across the three TreeWAS scores.
To further investigate the distribution of these 18 SNPs across the different 3 R genes, we conducted a principal component analysis (PCA) to determine if significance clustered according to specific functions among the 3 R genes. This analysis did not reveal a distinct system-specific cluster (Supplementary Fig. S1), suggesting that all DNA repair and replication systems may contribute to the emergence of mutator genotypes associated with increased resistance.
Positive selection and damaging effects of SNPs associated with drug resistance
To investigate how selection has influenced the evolution of 3 R genes, we conducted branch-site tests using BUSTED25. This type of analysis detects episodic diversifying selection and assesses individual codons likely under positive selection within each gene. Among the 18 genes analyzed, 11 exhibited significant signatures of positive selection (FDR-adjusted p-values < 0.05), including dnaE1, mutM, xthA, nei1, mutT3, uvrA, uvrB, recA, ssbB, and recB (Supplementary Table S1). Notably, 8 of the 18 SNPs identified by TreeWAS as being associated with drug resistance were under diversifying positive selection (Table 1, ER).
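The FDR adjustment mentioned above can be illustrated with a short Python sketch. The text does not state which FDR procedure was used, so the Benjamini-Hochberg step-up method here is an assumption, and the gene labels and p-values are invented for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a selection test (not the paper's data)
raw = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [gene for gene, q in adj.items() if q < 0.05]
print(adj, significant)
```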
To evaluate the potential phenotypic damage caused by these SNPs, we analyzed their ability to disrupt protein function, structure, stability, or expression in vivo using the EVmutation tool26. This tool models epistasis by accounting for interactions between protein residues and predicts the deleterious effects of mutations. We selected EVmutation because it provides precomputed predictions for all possible amino acid substitutions in most mycobacterial DNA repair proteins, classifying mutations as beneficial, neutral, or damaging. Our analysis revealed that all 18 SNPs associated with drug resistance were classified as damaging, with effect scores ranging from -2.49 to -10.16, indicating highly damaging effects (Table 1).
Phylogenetic features of candidate mutator SNP isolates
To investigate the evolutionary dynamics of Mtb strains carrying candidate mutator SNPs, we retrieved whole-genome SNPs distinguishing two sets of Mtb isolates within the same sub-lineage: those carrying the target mutator candidate and those without it. SNPs in repetitive regions were excluded from the analysis. We constructed maximum likelihood (ML) phylogenies using the GTR+Γ+I substitution model. The general time-reversible (GTR) model estimates substitution rates from the molecular diversity present in the samples and is widely used in Mtb evolutionary studies. To further refine the model, gamma-distributed rate variation (+Γ) was incorporated to capture evolutionary rate heterogeneity, while invariant sites (+I) accounted for conserved regions.
Specifically, we analyzed 84 whole genomes, including 39 isolates with the Trp85Arg SNP in XthA and 45 control isolates from the same sub-lineage (L2.2.1), which lacked this SNP or any other resistance-associated mutations in the 3 R genes. The majority of isolates carrying the Trp85Arg SNP exhibited multidrug-resistant (MDR) or pre-extensively drug-resistant (PRE-XDR) phenotypes (Fig. 3a). Phylogenetic analysis revealed that isolates with this SNP clustered in a distinct branch with strong bootstrap support (100%). This cluster is characterized by an unexpectedly long branch compared to other groups of isolates (Fig. 3b). Similar analyses were conducted for the other candidate SNPs when control group strains of the same sub-lineage were available, including those in dnaE1, polD2, mutM, uvrA, mutY, nei1, uvrD2, recA, recC, and recN. In these reconstructions, long branches were consistently observed, with a few terminal branches also being relatively long (Supplementary Figs. S2).
a ML tree of 85 Mtb genomes, illustrating the presence or absence of the XthA SNP (Trp85Arg) and the associated drug-resistant profiles of the isolates. b An unrooted form of the ML tree illustrating the presence or absence of the XthA SNP (Trp85Arg) and showing the evolutionary relationships among these strains. The tree is drawn to scale, with branch lengths representing the number of substitutions. The isolate SRR10828835 (lineage 8) was used as an outgroup for all analyses. Bootstrap support values of 100% (based on 1000 replicates) are marked by yellow dots on the internal branches. The branch and label colours represent the presence of the SNP (red) and the absence of the SNP (blue). The colour strip indicates the drug-resistant profiles (Sensitive, MDR, PRE-XDR, and XDR) of the isolates.
To verify the accuracy of these long branches, we performed pairwise distance tests, comparing genetic distances (quantified by sequence divergence) to tree-based distances. These tests confirmed that the branch lengths reflect true evolutionary distances rather than being artefacts of the reconstructions. The results of the Mantel tests further supported this conclusion, with a Mantel statistic of 0.99 and a p-value of 0.001, indicating a highly significant correlation between the genetic distances and tree-based distances. Pearson and Spearman correlation coefficients were both 0.99, demonstrating a very strong linear and monotonic relationship between these two distance measures.
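As an illustration of the test described above, here is a minimal permutation-based Mantel test in plain Python. It is a sketch under stated assumptions rather than the analysis code used in this study: the Pearson statistic, the 999 permutations, the fixed seed, and the toy distance matrices are all our own choices. Note that with 999 permutations the smallest attainable p-value is 1/1000.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Correlate two distance matrices; permute the labels of d2 for the p-value."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    labels = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        # Rows and columns are permuted together to keep the matrix symmetric
        permuted = [[d2[labels[i]][labels[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy example: tree distances exactly twice the genetic distances
positions = [0, 1, 3, 6, 10]
genetic = [[abs(a - b) for b in positions] for a in positions]
tree = [[2 * value for value in row] for row in genetic]
r, p = mantel(genetic, tree)
print(r, p)
```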
Mutator effect for the selected SNPs in the non-pathogenic Msm model
Our next objective was to assess the impact of consistent candidate mutators on mutation frequency in vitro. In addition to the analyses performed earlier (BUSTED and EVmutation), we estimated the evolutionary conservation of each amino acid position to predict the mutational impacts on the structure and function of the corresponding protein using the ConSurf tool27 (Table 1). We focused on highly conserved SNP positions that demonstrated evidence of positive selective pressure at the codon site, with damaging mutation effect strengths of ≤ -5. Additionally, the selected SNP positions needed to be conserved in Msm, as this organism was utilized as a model species for Mtb, consistent with previous mutagenesis studies. Out of the 18 SNPs identified as associated with drug resistance using TreeWAS, three candidates met all the above criteria: MutM (Glu256Asp), Fpg2 (Tyr50His), and XthA (Trp85Arg) (marked in bold in Table 1). The frequencies of these three alleles in the studied population ranged from 0.55% for Glu256Asp to 1.46% for Tyr50His (Table 1).
In addition to the three selected SNPs, we aimed to investigate a mutation in nucS, a novel gene involved in the mismatch repair (MMR) system in Mtb. NucS corrects mismatched bases that occur during DNA replication, thereby preventing point mutations28,29. We focused on the Tyr132Ser mutation in NucS, which was not detected by TreeWAS due to its low frequency in the studied population (0.01%). However, this mutation was identified in two independent lineages (L2 and L3), indicating a rare instance of convergence (Supplementary data S1). Furthermore, the codon position for this mutation exhibited strong selection (40.73) during evolution, as determined by the BUSTED analysis (Supplementary data S1), highlighting a functional nuclease domain in nucS with a damaging effect score of −5.161.
To test the effects of the four selected allelic polymorphisms on gene activity, we constructed three mutant alleles by introducing point mutations at the appropriate positions in the chromosome of Msm through allelic exchange. For the NucS (Tyr132Ser) allele, we employed the CRISPR-Cas12a-assisted recombineering technique (see Methods). Our analysis did not reveal any significant differences in growth between the WT strain and the four mutants, nor did we observe any noticeable variations in colony size or morphology on the plates (Fig. 4a), as all strains exhibited similar growth patterns. We assessed the phenotypes of the constructed strains by measuring spontaneous resistance to rifampicin and isoniazid. Hyper-mutability was defined as an at least 10-fold increase in mutation frequency compared to the wild-type allele. As anticipated, the four alleles (MutM (Glu256Asp), Fpg2 (Tyr50His), XthA (Trp85Arg), and NucS (Tyr132Ser)) resulted in increases in mutation frequency of 10-, 14-, 14-, and 18-fold, respectively, for rifampicin, and 20-, 18-, 19-, and 24-fold, respectively, for isoniazid (Fig. 4b). These experimental results suggest that these four SNPs cause DNA repair defects in mycobacteria.
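The fold-change comparison above can be sketched as follows. This assumes the usual definition of mutation frequency (resistant colonies divided by total viable cells plated); the function names and colony counts are hypothetical and do not reproduce the measurements in Fig. 4b.

```python
HYPERMUTATOR_FOLD = 10  # threshold stated in the text


def mutation_frequency(resistant_cfu, viable_cfu):
    """Resistant colonies per viable cell plated (assumed definition)."""
    if viable_cfu <= 0:
        raise ValueError("viable_cfu must be positive")
    return resistant_cfu / viable_cfu


def fold_change(mutant_frequency, wild_type_frequency):
    return mutant_frequency / wild_type_frequency


# Hypothetical plate counts, for illustration only
wt_freq = mutation_frequency(resistant_cfu=4, viable_cfu=1e9)
mutant_freq = mutation_frequency(resistant_cfu=60, viable_cfu=1e9)
fold = fold_change(mutant_freq, wt_freq)
is_hypermutator = fold >= HYPERMUTATOR_FOLD
print(fold, is_hypermutator)
```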
a Qualitative Assessment of Hypermutability: This panel illustrates the number of rifampicin-resistant colonies of the mutant NucS (Tyr132Ser) compared to the wild-type (M. smegmatis mc²155, WT), providing qualitative insight into the hypermutable phenotype. b Quantitative Comparison of Mutation Frequencies: The box plot presents the frequency of spontaneous mutations conferring resistance to rifampicin and isoniazid across M. smegmatis WT and various mutant strains. The fold-change in mutation frequencies relative to WT is also displayed. Data represent three biologically independent experiments, each conducted in triplicate. Statistical significance was assessed using one-way ANOVA (R base package), with ***p < 0.0001 indicating highly significant differences.
Discussion
The global rise in Mtb antimicrobial resistance demands innovative and measurable strategies to stop its spread. Effective approaches, such as combination antibiotic therapy, are urgently needed. This strategy uses multiple drugs simultaneously, making resistance rare, as it requires several mutations to occur at once in the same genetic background30. Under these conditions, wild-type bacteria generally cannot develop resistance. However, mutator strains with defects in the 3 R genes can more rapidly evolve multi-drug resistance31. Thus, early identification of these mutator strains is crucial for enhancing the effectiveness of combination therapies and improving treatment outcomes. In this study, we first applied a bioinformatics strategy using whole-genome sequencing data to extract mutations in 3 R and antibiotic-resistance genes. Next, we conducted a GWAS to identify potential mutator candidates associated with genotypic drug resistance, revealing further evidence of their mutator nature. Computational tools were then used to prioritize promising candidates, which were experimentally validated by introducing point mutations into the Msm chromosome, followed by mutation frequency tests.
Our dataset, comprising 53,589 SRAs, represents the global diversity of clinical Mtb isolates. This extensive data enabled us to identify 18 SNPs associated with drug resistance across 18 genes involved in various DNA repair pathways (Table 1). Previous studies showed that mutators in clinical populations mainly arise from defects in the MMR pathway, leading to significant mutations that enhance bacterial survival32,33,34. These studies typically highlight mutations in key MMR genes such as mutS, mutL, and mutH as the primary drivers of mutator phenotypes, with mutS mutations being the most common35. However, our findings suggest that in clinical Mtb isolates, mutators do not arise from defects in the MMR pathway. Instead, they are linked to mutations in other DNA repair genes, suggesting that the mutator phenotype in Mtb involves multiple DNA repair pathways. On the other hand, although we detected numerous SNPs in error-prone repair systems (dnaE2, ku, and ligD; Supplementary data S1), our GWAS did not associate these mutations with drug resistance phenotypes. This suggests that these pathways do not play a major role in generating resistance-conferring mutations in Mtb, possibly due to its slow replication rate36, which may limit reliance on translesion synthesis (TLS) for stress survival. Instead, resistance mutations may arise more frequently from replication errors introduced by high-fidelity polymerases or through oxidative stress-induced mutagenesis rather than from specialized error-prone repair systems. These findings underscore the complex interplay of DNA repair systems in Mtb infections.
Previous studies have identified a missense SNP in the proofreading domain (PHP) of the DNA polymerase dnaE115 and five SNPs in the MMR component nucS16 which confer a slightly higher mutation rate when the mutated gene copies are transferred to Msm. However, these SNPs have not been linked to drug resistance profiles in clinical Mtb isolates, likely due to their rare occurrence in the population and the small sample sizes used. In addition, our analysis identified two SNPs (MutY_Arg262Gln and UvrA_Gln135Lys) (Supplementary data S1), previously reported as associated with drug resistance17, but these two SNPs did not show significant association with genotypic resistance in our study. Furthermore, a recent study analyzing 67 isolates identified 14 fixed SNPs in 14 DNA repair genes within lineage 4, exclusively present in drug-resistant genomes19. However, none of these SNPs were associated with resistance in our dataset. This discrepancy may be due to differences in sample size, as our study analyzed a significantly larger dataset, potentially diluting the impact of these mutations. Alternatively, the lack of association could be attributed to limited opportunities for strains carrying these mutations to acquire resistance, for instance due to stricter surveillance or reduced antibiotic use. Moreover, we focused on genotypic resistance and not phenotypic resistance. We further validated the hyper-mutability of four SNPs, demonstrating increases in mutation frequency of tenfold or more in our experimental work (Fig. 4b). Hyper-mutators in nature often exhibit 10- to 100-fold increased mutation rates9,37. Additionally, under natural infection conditions, stress levels increase, likely leading to a higher mutation frequency. We speculate that mutation frequency tests would show elevated rates in such conditions due to increased DNA damage and the upregulation of DNA repair pathways.
These elevated mutation rates can facilitate the selection of hyper-mutators, promoting the progression of chronic diseases and driving the evolution of resistance to therapeutic agents37. Our findings also indicate that these SNPs have undergone positive selection during the evolution of Mtb, which strengthens our conclusions and points to a link between hyper-mutability and antibiotic resistance in TB patients.
All 18 SNPs associated with resistance identified through GWAS were located in lineage 2 (9 SNPs) and lineage 4 (9 SNPs) (Table 1). Our study, along with previous research, has shown that Mtb strains from lineage 2 and lineage 4 are more likely to develop resistance (Fig. 1b) and maintain fitness in the presence of resistance-conferring mutations compared to other lineages38,39,40. Here, we hypothesize that one possible explanation for the emergence of resistance in these two lineages could be linked to the presence of mutator SNPs, which increase mutation rates. These mutator SNPs may create a favorable environment for adaptive mutations to occur more frequently, accelerating the evolution of resistance while preserving the overall fitness of the bacterial population.
Our analysis showed a significant prevalence of candidate mutators in Mtb, with 12.5% of the isolates carrying a consistent candidate mutator SNP (Table 1). This suggests that mutator candidates are surprisingly common within the Mtb population. Interestingly, not all candidate mutator strains exhibited genotypic resistance. A possible explanation for this is that SNPs in DNA repair pathway genes may promote the accumulation of other beneficial mutations in bacteria under stress, beyond those directly related to drug resistance. A subset of these mutations may indeed facilitate the organism’s adaptation to various selection pressures, such as the host immune response, while also enabling the bacteria to acquire resistance to multiple antibiotics sequentially or simultaneously. Of note, the increased genetic mutation rates may enhance the potential for developing resistance to second-line drugs even during a period with first-line treatment31.
The phylogenetic analysis revealed that, for each candidate mutator SNP, strains harboring the SNP clustered together in a single branch node, which was highly associated with drug resistance profiles (Fig. 3a) and characterized by a long branch compared to other groups of isolates (Fig. 3b). This pattern indicates a greater genetic divergence for these clades, suggesting that the corresponding strains have accumulated more mutations over a certain period. However, the terminal branches were mostly short. We propose that the introduction of these SNPs in the DNA repair genes may have initially contributed to an increased accumulation of mutations. The persistence of these mutations was then influenced by specific selective pressures, either related to treatment, host defense, or both. In a subsequent phase, the mutation rate may have dropped due to compensatory mutations. Hypermutable phenotypes can indeed confer advantages, such as increased adaptability, but they also come with the cost of accumulating deleterious changes. Once a mutator strain acquires an adaptive trait, such as drug resistance, it is expected to evolve back towards a lower mutation rate41,42. We speculate that the candidate mutator strains detected may currently exhibit lower mutation rates than they did in their recent history due to compensatory mutations elsewhere in the genome. Further experimental investigation of clinical strains or Mtb-engineered strains carrying these SNPs is needed to better understand 3 R gene interactions. Additionally, the variations in these strains, which likely provide advantageous traits, may offer insights into Mtb pathogenesis, including virulence, transmissibility, and persistence in TB patients.
While our large, globally representative dataset captures broad Mtb diversity, it may obscure regional patterns. For example, region-specific selective pressures, healthcare infrastructure, and transmission dynamics could shape Mtb diversity in ways that a global aggregation may overlook. Future studies focusing on geographically stratified analyses could provide deeper insights into these localized effects.
In conclusion, our study presents a comprehensive genome-wide catalog of SNPs in 3 R genes associated with drug resistance in clinical Mtb isolates. These findings have the potential to contribute to drug resistance screening, aiding in the development of more informed treatment strategies. By deepening our understanding of the genetic mechanisms underlying drug resistance, this research underscores the value of monitoring these SNPs in clinical settings to enhance resistance management and improve patient outcomes.
Methods
Characterization of Mtb diversity in DNA repair genes
A pipeline was developed and optimized to download sequence reads and identify SNPs and small insertions and deletions (indels) in regions corresponding to an ancestral version of Mtb based on the H37Rv reference genome43, with parameters favoring sensitivity. Briefly, sequence reads were retrieved from the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/) and the NCBI database (https://www.ncbi.nlm.nih.gov/) using the SRA toolkit. For each sample, Fastp software44 was utilized for adapter trimming and read filtering. Nucleotide positions in the reads with a quality score below Q25 were discarded. Quality profiling was checked with FastQC45. High-quality reads were then aligned with BWA-MEM2 software46 to the ancestral reference genome, which is based on the H37Rv sequence and shares identical genome annotations43. Single nucleotide polymorphisms (SNPs) and small indels were called using the Genome Analysis Toolkit (GATK) HaplotypeCaller v447. Sample genotypes were determined by selecting the majority allele according to default parameters, with a minimum quality threshold of 20. Positions with insufficient support were classified as missing data. This approach prioritizes sensitivity at the expense of specificity; specificity is instead supported by the independent identification of SNPs across multiple samples. To optimize data quality and storage, SNPs were then extracted from a predefined list of genes of interest, which included 55 genes from various 3 R pathways, as well as genes important for classifying the isolate and predicting its resistance profile (see below).
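The genotype-assignment and gene-filtering steps described above can be sketched in Python as follows. This is an illustrative simplification, not the pipeline itself: the minimum quality of 20 comes from the text, whereas the minimum read depth, the function names, and the gene coordinates are hypothetical placeholders.

```python
def call_genotype(allele_depths, qual, min_qual=20, min_depth=5):
    """Pick the majority allele, or None (missing) when support is insufficient.

    allele_depths maps each allele to its number of supporting reads.
    min_depth is an assumed cut-off; the text only specifies min_qual = 20.
    """
    depth = sum(allele_depths.values())
    if qual < min_qual or depth < min_depth:
        return None
    # On ties, max() keeps the first allele encountered
    return max(allele_depths, key=allele_depths.get)


def gene_at(position, gene_intervals):
    """Return the gene of interest containing a position, or None."""
    for gene, (start, end) in gene_intervals.items():
        if start <= position <= end:
            return gene
    return None


# Hypothetical coordinates, not real H37Rv annotations
GENES_OF_INTEREST = {"geneX": (100, 400), "geneY": (900, 1500)}

calls = [
    (250, call_genotype({"A": 3, "G": 27}, qual=50)),   # kept, majority G
    (950, call_genotype({"C": 1, "T": 2}, qual=50)),    # too few reads
    (2000, call_genotype({"C": 0, "T": 40}, qual=60)),  # outside gene list
]
kept = [(pos, gt) for pos, gt in calls
        if gt is not None and gene_at(pos, GENES_OF_INTEREST)]
print(kept)
```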
In silico identification of lineage and drug resistance profile
The pipeline was also used for two main objectives: (1) to identify all mutations associated with antimicrobial resistance (AMR) according to the catalogue of mutations in Mtb published by the WHO in 2022, i.e., to obtain genotypic resistance, and (2) to identify classification SNPs as outlined in previous studies48,49,50,51. Specifically, each sample was characterized as resistant or susceptible to common drug types and categorized using the old terminology as follows: fully sensitive (S), multidrug-resistant (MDR), pre-extensively drug-resistant (Pre-XDR), and extensively drug-resistant (XDR). Additionally, samples were assigned to known tuberculosis lineages or sub-lineages. Twenty-one drugs were considered in the genome-wide analysis, including isoniazid (INH), rifampicin (RIF), ethionamide (ETH), pyrazinamide (PZA), ethambutol (EMB), streptomycin (STM), amikacin (AMK), capreomycin (CAP), kanamycin (KAN), ciprofloxacin (CIP), ofloxacin (OFL), moxifloxacin (MOX), cycloserine (CYS), and para-aminosalicylic acid (PAS). Drug family groups, including second-line injectable drugs (SLIDs: AMK, KAN, and CAP) and fluoroquinolones (FLQs: CIP, OFL, and MOX), were also analyzed. Samples with ambiguous resistance or classification outputs were excluded from further analysis.
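As an illustration of the categorization step, the sketch below assigns a sample to S, MDR, Pre-XDR, or XDR from the set of drugs to which it is genotypically resistant. It assumes the older (pre-2021) WHO definitions: MDR is resistance to both INH and RIF, Pre-XDR is MDR plus resistance to a fluoroquinolone or a second-line injectable, and XDR is MDR plus both. The exact rules used by the authors are not stated in the text, and the function name and the extra "other" category for resistant but non-MDR samples are assumptions.

```python
from typing import Set

FLQ = {"CIP", "OFL", "MOX"}    # fluoroquinolones, as grouped in the text
SLID = {"AMK", "KAN", "CAP"}   # second-line injectable drugs, as grouped in the text


def classify_profile(resistant_to: Set[str]) -> str:
    """Map a set of drug abbreviations to a resistance category (old terminology)."""
    if not resistant_to:
        return "S"
    if not {"INH", "RIF"} <= resistant_to:
        return "other"  # assumption: resistant, but not to both INH and RIF
    has_flq = bool(FLQ & resistant_to)
    has_slid = bool(SLID & resistant_to)
    if has_flq and has_slid:
        return "XDR"
    if has_flq or has_slid:
        return "Pre-XDR"
    return "MDR"


if __name__ == "__main__":
    for drugs in (set(), {"INH", "RIF"}, {"INH", "RIF", "MOX"},
                  {"INH", "RIF", "MOX", "AMK"}):
        print(sorted(drugs), "->", classify_profile(drugs))
```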
Allele counting association analysis
A summary table of mutations and isolates was created to conduct chi-square or Fisher’s exact tests, depending on the sample sizes, to identify significant associations between the presence of mutations in 3 R genes and drug resistance. All analyses were performed using the R base package52.
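The choice between the two tests can be sketched for a single 2×2 table (mutation present/absent against resistant/susceptible). The analyses in this study were run in R; the Python version below is only an illustration. The switch to Fisher's exact test when any expected cell count is below 5 is a common rule of thumb assumed here, not a threshold stated in the text, and the chi-square statistic is computed without continuity correction.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def chi_square_p(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square test (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(chi2 / 2))  # chi-square survival function for 1 df


def association_p(a: int, b: int, c: int, d: int) -> float:
    """Use Fisher's exact test for sparse tables, chi-square otherwise."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty contingency table")
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    if min(expected) < 5:  # assumed rule of thumb
        return fisher_exact_two_sided(a, b, c, d)
    return chi_square_p(a, b, c, d)


if __name__ == "__main__":
    # Hypothetical counts: rows = mutation present/absent, columns = resistant/susceptible.
    print(association_p(3, 1, 1, 3))
    print(association_p(10, 20, 20, 10))
```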
GWAS using a phylogenetic tree-based approach
A phylogenetic method for conducting genome-wide association studies (GWAS) was implemented using the treeWAS package24 in R. treeWAS assesses the statistical association between the drug-resistance profiles of isolates and their genotypes at all loci, identifying significant associations while accounting for the confounding effects of clonal population structure and homologous recombination. This method also computes the homoplasy distribution, which includes site-specific substitution counts derived from the empirical dataset using the Fitch parsimony algorithm. Association testing between each genetic locus and the phenotype is performed through three independent tests for each locus24. We conducted treeWAS analyses on each of the 43 genes individually, looking for associations between variants within the 3 R genes and genotypic drug resistance profiles. Concatenated SNP alignments over the whole genome were generated using TB-annotator53. This platform allows easy selection of samples, implements the GATK variant caller, and ensures stringent selection of SNPs in non-repetitive regions. The SNP alignments were used to reconstruct a maximum-likelihood phylogeny with FastTree v2.1.854. Subsequently, this phylogeny, along with the resistance profile matrix (sensitive/resistant) and the multiple sequence alignment for each of the 3 R genes, served as inputs for treeWAS v1.024.
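The site-specific substitution counts mentioned above rely on Fitch parsimony. The sketch below shows the bottom-up pass of that algorithm for one site on a rooted binary tree; it illustrates the general technique and is not treeWAS code. The nested-tuple tree encoding, the function name, and the toy tip states are assumptions.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes needed to explain one site on the tree."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            # Children agree on at least one state: no extra change is needed.
            return shared, left_cost + right_cost
        # Children disagree: take the union and count one substitution.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    # Allele shared by sister tips: a single change explains the pattern.
    print(fitch_changes(tree, {"A": "1", "B": "1", "C": "0", "D": "0"}))
    # Same allele in unrelated tips: two changes, i.e. a homoplasic site.
    print(fitch_changes(tree, {"A": "1", "B": "0", "C": "1", "D": "0"}))
```

A site that needs more than one change on the tree has arisen independently in separate clades, which is the signal that phylogeny-aware association tests exploit.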
Selection Analysis
Positive selection analysis was conducted using the Branch-Site Unrestricted Statistical Test for Episodic Diversification (BUSTED)25. This analysis aimed to (1) assess each of the 3 R genes for evidence of episodic selection using a pre-calculated phylogenetic tree, and (2) compare the rates of synonymous (silent) and non-synonymous (amino acid-changing) substitutions at each codon site.
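The distinction between synonymous and non-synonymous substitutions that underlies this comparison can be made concrete with a short sketch. BUSTED itself fits codon substitution models by maximum likelihood, which is not reproduced here; the code only classifies a single codon change using the standard codon-to-amino-acid assignments. The function name and the example codons are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if both codons encode the same amino acid (a silent change)."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]


if __name__ == "__main__":
    # Hypothetical codon changes, chosen only to show the two outcomes.
    print(is_synonymous("GAG", "GAA"))  # Glu -> Glu
    print(is_synonymous("TAC", "TCC"))  # Tyr -> Ser
```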
Prediction of mutation effects from sequence co-variation
The EVmutation server26, which employs evolutionary couplings from sequence covariation, was utilized with default parameters to calculate the quantitative effects of each mutator candidate on the stability and function of the corresponding protein.
Estimation of the evolutionary conservation of amino acid positions
The ConSurf web server27 was employed with default parameters (E-value cutoff < 0.0001) to assess the evolutionary conservation of amino acid positions in the 3 R proteins of interest. This analysis utilized a probabilistic framework grounded in the phylogenetic relationships among homologous sequences, following a Bayesian method.
Principal component analysis
Principal Component Analysis (PCA) was conducted using the stats package in R52. The analysis was performed with data centering but without scaling. Subsequently, the PCA results were visualized using the ggplot2 package55.
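Centering without scaling means that the components are derived from the covariance matrix of the variables rather than from their correlation matrix, so variables with larger variance weigh more. In R this would typically be something like `prcomp(x, center = TRUE, scale. = FALSE)`, although the text does not name the function used. The Python sketch below illustrates the idea for the first component only, using power iteration; all function names and the toy data are assumptions.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def center_columns(rows: Matrix) -> Matrix:
    """Subtract the column mean from every value (no scaling to unit variance)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov: Matrix, iterations: int = 200) -> List[float]:
    """Leading eigenvector of the covariance matrix by power iteration.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]  # toy data lying on one line
    print(first_component(covariance(center_columns(data))))
```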
Phylogenetic trees construction and pairwise-distance analyses
To reconstruct the phylogenetic trees, a multiple sequence alignment of concatenated SNPs from the whole Mtb genome was obtained using TB-annotator53. We inferred maximum likelihood (ML) phylogenies for two sets of Mtb isolates of the same sub-lineage, those carrying and those lacking the target mutator candidate, using RAxML-HPC56 with the GTR + Γ + I substitution model. The general time-reversible (GTR) model estimates substitution rates from the molecular diversity present in the samples and is widely used in Mtb evolutionary studies. The gamma-distributed rate variation (+Γ) captures evolutionary rate heterogeneity among sites, while the invariant-sites term (+I) accounts for conserved positions. Mutations in repetitive areas were excluded from the analysis. The isolate SRR10828835 (lineage 8) served as an outgroup for all analyses. A total of 1000 bootstrapped datasets were used to estimate the statistical confidence of the nodes, and ascertainment bias associated with using only polymorphic sites was corrected using Felsenstein’s correction in RAxML. The resulting phylogenetic trees were visualized using iTOL57.
The pairwise distances between the aligned FASTA sequences were calculated using the ape package58 in RStudio, with the Jukes-Cantor model used to assess genetic divergence. A phylogenetic tree was constructed using RAxML as described above and then imported into RStudio for further analysis. We performed correlation tests between the tree distances and pairwise genetic distances, calculating both Pearson and Spearman correlation coefficients for robustness. Additionally, a Mantel test was conducted using the vegan package59 in RStudio to assess the correlation between these distance matrices, yielding significant results. Finally, relationships were visualized using ggplot255, with scatter plots and linear regression lines constructed to illustrate the correlations.
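For reference, the Jukes-Cantor distance between two aligned sequences is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of sites that differ. The sketch below computes this distance and a Pearson correlation in plain Python. It illustrates the calculations only and does not reproduce the ape or vegan functions used in the study; the function names, the handling of gaps and ambiguous bases, and the toy inputs are assumptions.

```python
from math import log, sqrt
from typing import Sequence


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences."""
    # Assumption: sites with gaps or ambiguous bases are skipped pairwise.
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * log(1 - 4 * p / 3)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(jukes_cantor("ACGT", "ACGA"))           # one difference in four sites
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```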
Strains, media, and growth conditions
Escherichia coli (E. coli) strains DH5α and XL10, along with Mycobacterium smegmatis (Msm) mc²155 (GenBank: CP000480), were utilized in this study. E. coli strains were cultured at 37 °C in LB medium supplemented with appropriate antibiotics: 20 µg/ml kanamycin (Km) or 100 µg/ml ampicillin (Amp). The Msm wild-type strain mc²155 and its mutant derivatives were grown at 37 °C in either (1) Middlebrook 7H9 broth (Difco), supplemented with 0.05% Tween 80, 0.4% glycerol, and 5% albumin-dextrose-catalase (ADC, Sigma-Aldrich) on a shaking platform, or (2) on Middlebrook 7H10 agar, supplemented with 0.4% glycerol and 5% oleic-albumin-dextrose-catalase (OADC, Sigma-Aldrich). When necessary, antibiotics were added at the following concentrations: 20 µg/ml kanamycin, 25 µg/ml hygromycin (Hyg), or 50 µg/ml Zeocin (Zeo). Bacterial growth was monitored by measuring the OD600 at various time points.
Plasmid and oligonucleotides
All plasmids used in this study are listed in Supplementary Table S2. The plasmid pJV53 is an Escherichia-Mycobacteria shuttle plasmid expressing the gp60-61 recombinase under TetO-regulated promoters (Addgene #26904)60. The pJV53-Cas12a plasmid expresses (1) cas12a under an acetamide-regulated promoter and (2) the gp60-61 recombinase under TetO-regulated promoters (Addgene #158706)61. The pCR-Zeo plasmid was used for cloning and constitutive expression of crRNAs in Mycobacterium (Addgene #158709)61. All oligonucleotides were ordered from Eurogentec, Belgium, and are listed in Supplementary Table S3.
General protocol of DNA manipulation
Plasmid purification was conducted using the GeneJet Plasmid Midiprep or Miniprep Kit (Thermo Fisher Scientific). DNA digestion was carried out with FastDigest restriction enzymes (Thermo Fisher Scientific) unless stated otherwise. PCR reactions were performed using OneTaq® DNA polymerase (New England Biolabs) or DreamTaq DNA polymerase (Thermo Fisher Scientific). PCR product purification was done using the GeneJet PCR Purification Kit (Thermo Fisher Scientific). Sequencing was carried out by Eurofins Genomics, Ebersberg Germany, and DNA synthesis by Twist Bioscience.
Construction of crRNA expression plasmids
The crRNAs targeting specific regions for mutating a particular gene in M. smegmatis were designed using CHOPCHOP62. Custom PAM (Protospacer Adjacent Motif) sequences “NGG” were utilized to identify target sites. Two complementary oligonucleotides containing the target sequence were synthesized and annealed to create a protospacer cassette with BpmI and HindIII overhangs at the 5’ and 3’ ends, respectively. This cassette was then cloned into the pCR-Zeo plasmid for the constitutive expression of crRNAs61.
Preparation of M. smegmatis electro-competent cells
An overnight starter culture (10 ml) of Msm wild-type strain mc²155 was diluted into 100 ml of Middlebrook 7H9 medium and grown at 37 °C with agitation (150 rpm) to an OD600 of 0.8–1.0. The culture was chilled on ice for 1 h, and the bacteria were then harvested by centrifugation (4000 × g, 15 min, 4 °C), washed three times with cold 10% glycerol, and finally resuspended in 10% glycerol at 1/500th of the original volume63. The plasmids pJV53 and pJV53-Cas12a were electroporated into Msm strain mc²155, and the cells were plated on a selective medium containing kanamycin (20 µg/ml). The resulting strains Msm (pJV53) and Msm (pJV53-Cas12a) were cultured at 37 °C to an OD600 of 0.45–0.5, then acetamide (0.2%, w/v) was added to induce the Che9c 60–61 recombinases. After 3 h of induction, electro-competent cells were prepared as described above for the wild-type strain. All electroporations were carried out at 2.5 kV, 25 μF in 2 mm cuvettes using an Electroporator 2510 from Eppendorf.
Construction of NucSTyr132Ser, MutMGlu256Asp, Fpg2Tyr50His, XthATrp85Arg mutants in Msm
The NucSTyr132Ser allele was constructed using the CRISPR-Cas12a-assisted recombineering system61. First, two complementary oligonucleotides targeting nucS, CrNucS(132)_Fs and CrNucS(132)_Rs (Supplementary Table S3), were annealed and cloned into the pCR-Zeo plasmid between the BpmI and HindIII restriction sites. The resulting pCR-Zeo-nucS-Tyr132Ser plasmid (100 ng) was then electroporated into Msm (pJV53-Cas12a) competent cells together with the ssDNA oligonucleotides carrying the point mutation (500 ng). The pCR-Zeo plasmid contains the temperature-sensitive replication origin of pAL500064. The electroporated cells (100 µl) were then added to 1 ml of 7H9 broth containing 10 ng/ml anhydrotetracycline (ATc), incubated for 4 h at 30 °C at 200 rpm, and finally plated on selective 7H10 agar medium containing kanamycin (20 µg/ml) and zeocin (50 µg/ml). After 6 days of growth at 30 °C, several colonies were picked for PCR and sequencing analysis to confirm the desired recombinants. The MutMGlu256Asp, Fpg2Tyr50His, and XthATrp85Arg mutants were obtained by allelic exchange60. To perform and select the allelic replacement, linear DNA substrates were commercially synthesized (Twist Bioscience) as follows: a codon-optimized promoterless zeocin resistance cassette was flanked (i) upstream by 300–500 bp of the target sequence containing the mutation to be introduced into the chromosome and (ii) downstream by 300–500 bp of the region following this sequence, to allow homologous recombination (fragment sequences are presented in Supplementary Table S4). Msm (pJV53) competent cells, prepared as described above, were electroporated with 100 ng of DNA substrate and incubated in 1 ml of 7H9 broth with 0.2% acetamide for 4 h at 37 °C at 150 rpm. Recombinant clones were selected after 5 days of growth at 37 °C on plates containing 50 μg/ml zeocin.
PCR and sequencing were then used to verify the correct allelic exchange and the presence of the appropriate mutation on the chromosome (Supplementary Table S5).
Analyses of mutation frequencies
Msm cultures (50 mL) were grown in 7H10 to an OD600 of 0.7–1.0, and 100 µl of each culture was spread onto 7H10 containing rifampicin (100 µg/ml) or isoniazid (100 µg/ml) to determine the number of spontaneous rifampicin- or isoniazid-resistant mutants. Each culture was also serially diluted, and 100 µl of the appropriate dilution was plated in triplicate on 7H10 medium to determine the total number of cells (CFU/mL = (number of colonies × dilution factor) / volume of culture plated). The mutation frequency was calculated as the ratio of rifampicin- or isoniazid-resistant mutants to the total number of cells. Statistical analysis (one-way ANOVA) was performed in RStudio52 using n = 6 for each biological experiment.
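The frequency calculation can be written out as a short worked example. The sketch below assumes that the dilution factor is the reciprocal of the dilution plated (for instance 10^6 for a 10^-6 dilution) and that resistant mutants are counted from undiluted culture; the function names and the colony counts are hypothetical.

```python
def cfu_per_ml(colonies: float, dilution_factor: float,
               volume_plated_ml: float) -> float:
    """CFU/mL = (number of colonies x dilution factor) / volume plated (mL)."""
    return colonies * dilution_factor / volume_plated_ml


def mutation_frequency(resistant_colonies: float, total_colonies: float,
                       total_dilution_factor: float,
                       volume_plated_ml: float = 0.1,
                       resistant_dilution_factor: float = 1.0) -> float:
    """Ratio of resistant CFU/mL to total viable CFU/mL."""
    resistant = cfu_per_ml(resistant_colonies, resistant_dilution_factor,
                           volume_plated_ml)
    total = cfu_per_ml(total_colonies, total_dilution_factor, volume_plated_ml)
    return resistant / total


if __name__ == "__main__":
    # Hypothetical counts: 12 resistant colonies from 100 µl of undiluted culture,
    # 150 colonies from 100 µl of a 10^-6 dilution on non-selective medium.
    print(mutation_frequency(12, 150, 1e6))
```

With these illustrative counts the resistant titre is about 1.2 × 10² CFU/mL and the total titre about 1.5 × 10⁹ CFU/mL, giving a frequency of roughly 8 × 10⁻⁸.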
Data availability
All Mycobacterium tuberculosis SRA sequences analyzed in this study, along with detailed information on lineage classification, drug resistance profiles, and identified mutations in 55 genes, have been made publicly available as a dataset on the Zenodo platform: https://doi.org/10.5281/zenodo.15076826.
References
Wirth, T. et al. Origin, spread and demography of the Mycobacterium tuberculosis complex. PLoS Pathog 4, e1000160 (2008).
Gagneux, S. Ecology and evolution of Mycobacterium tuberculosis. Nat. Rev. Microbiol. 16, 202–213 (2018).
Stritt, C. & Gagneux, S. How do monomorphic bacteria evolve? The Mycobacterium tuberculosis complex and the awkward population genetics of extreme clonality. Peer Community J 3, e92 (2023).
Namouchi, A. et al. The Mycobacterium tuberculosis transcriptional landscape under genotoxic stress. BMC Genomics 17, 791 (2016).
Singh, A. et al. Mechanistic Principles Behind Molecular Mechanism of Rifampicin Resistance in Mutant RNA Polymerase Beta Subunit of Mycobacterium tuberculosis. J. Cell. Biochem. 118, 4594–4606 (2017).
Belenky, P. et al. Bactericidal Antibiotics Induce Toxic Metabolic Perturbations that Lead to Cellular Damage. Cell Rep 13, 968–980 (2015).
Dupuy, P. et al. Distinctive roles of translesion polymerases DinB1 and DnaE2 in diversification of the mycobacterial genome through substitution and frameshift mutagenesis. Nat. Commun. 13, 4493 (2022).
Naz, S. et al. Compromised base excision repair pathway in Mycobacterium tuberculosis imparts superior adaptability in the host. PLoS Pathog 17, e1009452 (2021).
Oliver, A., Cantón, R., Campo, P., Baquero, F. & Blázquez, J. High frequency of hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science 288, 1251–1254 (2000).
Prunier, A. et al. High Rate of Macrolide Resistance in Staphylococcus aureus Strains from Patients with Cystic Fibrosis Reveals High Proportions of Hypermutable Strains. J. Infect. Dis. 187, 1709–1716 (2003).
Román, F., Cantón, R., Pérez-Vázquez, M., Baquero, F. & Campos, J. Dynamics of Long-Term Colonization of Respiratory Tract by Haemophilus influenzae in Cystic Fibrosis Patients Shows a Marked Increase in Hypermutable Strains. J. Clin. Microbiol. 42, 1450–1459 (2004).
Tenaillon, O. et al. Tempo and mode of genome evolution in a 50,000-generation experiment. Nature 536, 165–170 (2016).
Benjak, A. et al. Phylogenomics and antimicrobial resistance of the leprosy bacillus Mycobacterium leprae. Nat. Commun. 9, 352 (2018).
Dos Vultos, T. et al. Evolution and Diversity of Clonal Bacteria: The Paradigm of Mycobacterium tuberculosis. PLoS ONE 3, e1538 (2008).
Rock, J. M. et al. DNA replication fidelity in Mycobacterium tuberculosis is mediated by an ancestral prokaryotic proofreader. Nat. Genet. 47, 677–681 (2015).
Castañeda-García, A. et al. A non-canonical mismatch repair pathway in prokaryotes. Nat. Commun. 8, 14246 (2017).
Naz, S. et al. GWAS and functional studies suggest a role for altered DNA repair in the evolution of drug resistance in Mycobacterium tuberculosis. eLife 12, e75860 (2023).
Cebrián-Sastre, E., Martín-Blecua, I., Gullón, S., Blázquez, J. & Castañeda-García, A. Control of Genome Stability by EndoMS/NucS-Mediated Non-Canonical Mismatch Repair. Cells 10, 1314 (2021).
Pérez-Martínez, D. E. & Zenteno-Cuevas, R. SNPs in genes related to the repair of damage to DNA in clinical isolates of M. tuberculosis: A transversal and longitudinal approach. PLOS ONE 19, e0295464 (2024).
Verboven, L., Phelan, J., Heupink, T. H. & Van Rie, A. TBProfiler for automated calling of the association with drug resistance of variants in Mycobacterium tuberculosis. PLOS ONE 17, e0279644 (2022).
Dalla Costa, E. R. et al. Mycobacterium tuberculosis of the RD Rio Genotype Is the Predominant Cause of Tuberculosis and Associated with Multidrug Resistance in Porto Alegre City, South Brazil. J. Clin. Microbiol. 51, 1071–1077 (2013).
Shanmugam, S. K. et al. Mycobacterium tuberculosis Lineages Associated with Mutations and Drug Resistance in Isolates from India. Microbiol. Spectr. 10, e01594–21 (2022).
Chen, P. E. & Shapiro, B. J. The advent of genome-wide association studies for bacteria. Curr. Opin. Microbiol. 25, 17–24 (2015).
Collins, C. & Didelot, X. A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination. PLOS Comput. Biol. 14, e1005958 (2018).
Murrell, B. et al. Gene-wide identification of episodic selection. Mol. Biol. Evol 32, 1365–1371 (2015).
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
Ashkenazy, H. et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44, W344–W350 (2016).
Ishino, S. et al. Activation of the mismatch-specific endonuclease EndoMS/NucS by the replication clamp is required for high fidelity DNA replication. Nucleic Acids Res 46, 6206–6217 (2018).
Rivera-Flores, I. V., Wang, E. X. & Murphy, K. C. Mycobacterium smegmatis NucS-promoted DNA mismatch repair involves limited resection by a 5'-3' exonuclease and is independent of homologous recombination and NHEJ. Nucleic Acids Res 52, 12308–12323 (2024).
Pletz, M. W., Hagel, S. & Forstner, C. Who benefits from antimicrobial combination therapy?. Lancet Infect. Dis. 17, 677–678 (2017).
Gifford, D. R. et al. Mutators can drive the evolution of multi-resistance to antibiotics. PLoS Genet 19, e1010791 (2023).
Oliver, A., Baquero, F. & Blázquez, J. The mismatch repair system (mutS, mutL and uvrD genes) in Pseudomonas aeruginosa: molecular characterization of naturally occurring mutants. Mol. Microbiol. 43, 1641–1650 (2002).
Schaaff, F., Reipert, A. & Bierbaum, G. An Elevated Mutation Frequency Favors Development of Vancomycin Resistance in Staphylococcus aureus. Antimicrob. Agents Chemother. 46, 3540–3548 (2002).
Oliver, A. Mutators in cystic fibrosis chronic lung infection: Prevalence, mechanisms, and consequences for antimicrobial therapy. Int. J. Med. Microbiol. 300, 563–572 (2010).
Chopra, I., O’Neill, A. J. & Miller, K. The role of mutators in the emergence of antibiotic-resistant bacteria. Drug Resist. Updat. 6, 137–145 (2003).
Gill, W. P. et al. A replication clock for Mycobacterium tuberculosis. Nat. Med. 15, 211–214 (2009).
Wielgoss, S. et al. Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc. Natl. Acad. Sci. 110, 222–227 (2013).
Chitwood, M. H. et al. The recent rapid expansion of multidrug resistant Ural lineage Mycobacterium tuberculosis in Moldova. Nat. Commun. 15, 2962 (2024).
Hakamata, M. et al. Higher genome mutation rates of Beijing lineage of Mycobacterium tuberculosis during human infection. Sci. Rep. 10, 17997 (2020).
European Concerted Action on New Generation Genetic Markers and Techniques for the Epidemiology and Control of Tuberculosis. Beijing/W Genotype Mycobacterium tuberculosis and Drug Resistance. Emerg. Infect. Dis. 12, 736–743 (2006).
Raynes, Y. & Sniegowski, P. D. Experimental evolution and the dynamics of genomic mutation rate modifiers. Heredity 113, 375–380 (2014).
Sprouffske, K., Aguilar-Rodríguez, J., Sniegowski, P. & Wagner, A. High mutation rates limit evolutionary adaptation in Escherichia coli. PLoS Genet 14, e1007324 (2018).
Comas, I. et al. Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat. Genet. 42, 498–503 (2010).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinforma. Oxf. Engl 34, i884–i890 (2018).
Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. (2010).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl 26, 589–595 (2010).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297–1303 (2010).
Coll, F. et al. PolyTB: A genomic variation map for Mycobacterium tuberculosis. Tuberculosis 94, 346–354 (2014).
Stucki, D. et al. Mycobacterium tuberculosis lineage 4 comprises globally distributed and geographically restricted sublineages. Nat. Genet. 48, 1535–1543 (2016).
Shitikov, E. et al. Evolutionary pathway analysis and unified classification of East Asian lineage of Mycobacterium tuberculosis. Sci. Rep. 7, 9227 (2017).
Napier, G. et al. Robust barcoding and identification of Mycobacterium tuberculosis lineages for epidemiological and clinical studies. Genome Med 12, 114 (2020).
R core team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. (2024).
Senelle, G., Guyeux, C., Refrégier, G. & Sola, C. TB-annotator: a scalable web application that allows in-depth analysis of very large sets of publicly available Mycobacterium tuberculosis complex genomes. 2023.06.12.526393 Preprint at https://doi.org/10.1101/2023.06.12.526393 (2023).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5, e9490 (2010).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. (2016).
Kozlov, A. M., Darriba, D., Flouri, T., Morel, B. & Stamatakis, A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 35, 4453–4455 (2019).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49, W293–W296 (2021).
Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20, 289–290 (2004).
Oksanen, J. Vegan community ecology package version 2.6-2. (2022).
van Kessel, J. C. & Hatfull, G. F. Recombineering in Mycobacterium tuberculosis. Nat. Methods 4, 147–152 (2007).
Yan, M.-Y. et al. CRISPR-Cas12a-Assisted Recombineering in Bacteria. Appl. Environ. Microbiol. 83, e00947–17 (2017).
Montague, T. G., Cruz, J. M., Gagnon, J. A., Church, G. M. & Valen, E. CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res. 42, W401–W407 (2014).
Cirillo, J. Efficient Electro-transformation of Mycobacterium smegmatis. (2000).
Guilhot, C., Gicquel, B. & Martín, C. Temperature-sensitive mutants of the mycobacterium plasmid pAL5000. FEMS Microbiol. Lett 98, 181–186 (1992).
Acknowledgements
We thank Aizhamal Aitzhanova for her contributions to the initial GWAS analyses. We also thank Prof. Christophe Sola for his valuable discussions and feedback at the project's outset. R.Z. first received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 899987, and then from ANRS-MIE 2023-2, grant agreement No. 23534. A.L.M. benefits from a PhD fellowship from ANRS-MIE 2022-2, grant agreement No. 22476.
Contributions
R.Z. and G.R. conceptualized the study design. A.L.M. developed the bioinformatics pipeline, while R.Z. conducted the bioinformatics analyses, computational work, statistical analysis, and laboratory experiments. S.S. supervised the laboratory experiments. G.R., H.M., and L.J. supervised the research process. R.Z. wrote the initial manuscript, performed the final editing, and created the figures. G.R., H.M., S.S., and L.J. contributed to the manuscript's review and editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zein-Eddine, R., Le Meur, A., Skouloubris, S. et al. Genome wide analyses reveal the role of mutator phenotypes in Mycobacterium tuberculosis drug resistance emergence. npj Antimicrob Resist 3, 35 (2025). https://doi.org/10.1038/s44259-025-00107-1
DOI: https://doi.org/10.1038/s44259-025-00107-1