Abstract
The DNA replication machinery is an important target for antibiotic development in increasingly drug-resistant bacteria, including Mycobacterium tuberculosis1. Although blocking DNA replication leads to cell death, disrupting the processes used to ensure replication fidelity can accelerate mutation and the evolution of drug resistance. In Escherichia coli, the proofreading subunit of the replisome, the ɛ exonuclease, is essential for high-fidelity DNA replication2; however, we find that the corresponding subunit is completely dispensable in M. tuberculosis. Rather, the mycobacterial replicative polymerase DnaE1 itself encodes an editing function that proofreads DNA replication, mediated by an intrinsic 3′–5′ exonuclease activity within its PHP domain. Inactivation of the DnaE1 PHP domain increases the mutation rate by more than 3,000-fold. Moreover, phylogenetic analysis of DNA replication proofreading in the bacterial kingdom suggests that E. coli is a phylogenetic outlier and that PHP domain–mediated proofreading is widely conserved and indeed may be the ancestral prokaryotic proofreader.
Main
In the model organism E. coli, DNA replication fidelity is determined by three main processes: nucleotide insertion fidelity by the DNA polymerase, removal of misincorporated nucleotides by the associated 3′–5′ proofreading exonuclease and post-replicative mismatch repair (MMR). Together, these yield a basal mutation rate of ∼10⁻¹⁰ mutations per base pair per generation2. Surprisingly, mycobacteria and all other actinomycetes lack the genes encoding the MMR system3,4. Whereas E. coli MMR mutants show ∼100- to 1,000-fold increased mutation rates, the basal mutation rate of mycobacteria remains roughly equivalent to that of wild-type E. coli2,5.
In E. coli, DNA replication proofreading is performed by the 3′–5′ ɛ exonuclease encoded by the dnaQ gene2. The ɛ exonuclease subunit associates with the DNA polymerase PolIIIα in trans and excises misincorporated bases during DNA replication. Like E. coli, M. tuberculosis encodes an annotated dnaQ homolog (Rv3711c; Supplementary Fig. 1) that has been assumed to have an important role in replication fidelity6,7. We hypothesized that dnaQ might have a dominant role in replication fidelity in mycobacteria. Surprisingly, however, deletion of dnaQ did not result in an increase in the mutation rate in mycobacteria, as measured by fluctuation analysis (Fig. 1a,b). Mycobacteria encode a second potential dnaQ homolog (Ms4259; Supplementary Fig. 1). However, deletion of this second gene, either individually or in combination with the annotated dnaQ gene, did not increase the mutation rate (Fig. 1b). In addition, although purified M. tuberculosis DnaQ had 3′–5′ DNA exonuclease activity (Supplementary Fig. 2a,b), it did not stably associate with M. tuberculosis DnaE1 (DnaE1MTB) (Supplementary Fig. 2c), whereas E. coli DnaQ (ɛEC) formed a tight complex with E. coli PolIIIα (PolIIIαEC) (Supplementary Fig. 2d).
Figure 1.
(a) The rates at which the indicated M. tuberculosis strains acquired resistance to rifampicin were measured by fluctuation analysis. Rv3711c is the annotated dnaQ gene. Circles represent mutant frequency (number of rifampicin-resistant mutants per cell plated in a single culture). Red squares represent the estimated mutation rates (mutations conferring rifampicin resistance per generation), with error bars representing the 95% confidence intervals. (b) Fluctuation analysis was performed with the indicated M. smegmatis strains as in a. Ms6275 is the annotated dnaQ gene, and Ms4259 is the next closest dnaQ homolog. (c) Alignment of the DNA polymerase PHP domains from the indicated species. The metal ion–coordinating residues are highlighted. (d) Real time primer extension activity of purified polymerases. Primer extension results in quenching of template fluorophore (star). WT, wild type; RFU, relative fluorescence units. (e) Vmax and Km measurements derived from three primer extension assays. DnaE1MTB incorporates nucleotides faster than PolIIIαEC. Data points indicate the mean; error bars, s.d. (f) Time course of 3′–5′ exonuclease activity on ssDNA. Wild-type DnaE1MTB shows robust exonuclease activity, whereas the PHP mutants Asp23Asn and Asp226Asn do not. Note the distinct digestion patterns of DnaE1MTB and ɛEC. (g) Primer extension assay as in d with a mismatched DNA substrate. Exonuclease-deficient polymerases cannot extend from mismatched DNA, whereas wild-type DnaE1MTB and PolIIIαEC + ɛEC activities are unaffected. (h) Gel analysis of primer extension reactions shows that extension from mismatched primers requires exonuclease activity, whereas extension activity on matched substrates (NA) is unaffected.
These data suggest that mycobacteria use an alternative exonuclease to ensure replicative fidelity. Whereas DnaE-type polymerases, including PolIIIαEC and DnaE1MTB, are thought to rely on proofreading provided in trans by an ɛ exonuclease subunit2,8, it has been demonstrated in vitro that the DnaE polymerases from two thermophiles harbor intrinsic 3′–5′ exonuclease activity in the polymerase and histidinol phosphatase (PHP) domain9,10. The function of PHP-domain exonuclease activity has remained unclear because Thermus species also contain an annotated dnaQ homolog8,9. Given our finding that dnaQ does not substantially contribute to replication fidelity in mycobacteria (Fig. 1a,b), we hypothesized that the PHP domain of DnaE1MTB encodes an intrinsic exonuclease activity that is the primary source of proofreading in this pathogen.
In the two thermophiles, PHP-domain exonuclease activity depends on metal ion coordination by nine conserved amino acids within the PHP domain9,10,11. These amino acids are conserved in DnaE1MTB (Fig. 1c and Supplementary Fig. 3) but are mutated in PolIIIαEC, where exonuclease activity is lost11. To determine whether DnaE1MTB has exonuclease activity, we purified recombinant wild-type DnaE1MTB and two mutants in which metal-coordinating residues were mutated (Asp23Asn and Asp226Asn) (Supplementary Fig. 4a). Purified wild-type and mutant DnaE1MTB showed similar gel filtration profiles (Supplementary Fig. 4b) and similar folding and thermal stability, as measured by circular dichroism (Supplementary Fig. 4c,d). Using a real-time primer extension assay11, we found that wild-type and PHP-mutant DnaE1MTB proteins also showed robust DNA polymerase activity (Fig. 1d). Indeed, under saturating nucleotide concentrations, the Vmax (maximal initial velocity) for DnaE1MTB in vitro was higher than that of PolIIIαEC (Fig. 1e).
To determine whether the PHP domain of DnaE1MTB has exonuclease activity, we monitored the cleavage of fluorescently labeled single-stranded DNA (ssDNA) oligonucleotides. Wild-type DnaE1MTB showed clear 3′–5′ exonuclease activity (Fig. 1f) but no 5′–3′ exonuclease activity (Supplementary Fig. 4e). In contrast, PHP-mutant DnaE1MTB proteins lacked exonuclease activity (Fig. 1f). In this assay, PHP-mediated exonuclease activity was distinct from that of ɛEC and appeared to pause at sites of predicted secondary structure in the ssDNA (Fig. 1f and data not shown).
To test the ability of DnaE1MTB to excise mismatches during DNA synthesis in vitro, we performed primer extension assays using double-stranded DNA (dsDNA) substrates containing either matched or mismatched 3′ primer termini. Wild-type DnaE1MTB extended from all substrates with similar efficiency (Fig. 1g,h). This activity did not appear to be mismatch extension, because a mismatched primer refractory to exonuclease activity could not be extended by DnaE1MTB (Supplementary Fig. 5a). In contrast to wild-type DnaE1MTB, the PHP mutants were unable to extend mismatched substrates (Fig. 1g,h). The behavior of the PHP-mutant DnaE1MTB proteins was very similar to that of PolIIIαEC in the absence of ɛEC, for which we observed almost no primer extension (Fig. 1g,h). Extension of a mismatched substrate could be rescued in the PHP mutants by the addition of exogenous ɛEC, suggesting that the defect in the ability of the PHP-mutant DnaE1MTB proteins to extend mismatched substrates is specific to their loss of exonuclease activity (Supplementary Fig. 5b). These data demonstrate that DnaE1MTB encodes an intrinsic 3′–5′ exonuclease activity that, at least in vitro, is capable of correcting mismatches.
We then assessed the importance of PHP-mediated exonuclease activity for DNA replication proofreading in vivo. Because dnaE1 is an essential gene12, we first determined the consequences of inducible overexpression of wild-type dnaE1 or alleles encoding the PHP mutants. Overexpression of wild-type dnaE1 did not increase the mutation rate (Fig. 2a,b). In contrast, overexpression of either M. tuberculosis or Mycobacterium smegmatis dnaE1 alleles encoding PHP mutants led to a dose-dependent increase in the mutation rate (Fig. 2a,b).
Figure 2.
(a,b) Fluctuation analysis in M. smegmatis was performed as in Figure 1a for M. smegmatis (a) and M. tuberculosis (b) dnaE1 alleles. The indicated strains have both the wild-type endogenous dnaE1 allele and an anhydrotetracycline (ATc)-regulated (PTet) dnaE1 allele integrated at the L5 attB site. To enable comparison of protein levels, a dnaE1 allele encoding Myc-tagged protein under the control of the endogenous dnaE1 promoter was loaded under the “WT” lane. For the sake of simplicity, DnaE1MTB numbering has been used throughout. (c) Allele-exchange experiment in a ΔdnaE1 dnaE1::attB(L5) M. smegmatis strain. Plasmids carrying the indicated Myc-tagged dnaE1 alleles were tested for the ability to exchange for the resident attB-integrated plasmid in the parent strain. Error bars, s.d. from three experiments. (d) Growth of the indicated M. smegmatis strains. Scale bars, 4 mm.
We then used an allele-swapping system to replace the endogenous dnaE1 allele with either the wild-type dnaE1 allele or alleles encoding the PHP mutants. Both wild-type and PHP-mutant dnaE1 alleles could substitute for wild-type dnaE1, indicating that both were sufficient for viability (Fig. 2c). However, whereas wild type–complemented strains grew normally, strains complemented with alleles for PHP mutants were severely attenuated for growth (Fig. 2d). Because this growth defect precluded the use of fluctuation analysis to measure mutation rate, we instead performed mutant accumulation assays and enumerated the accumulation of mutations using whole-genome sequencing. In a mutant accumulation assay, we found that the basal mutation rate for M. smegmatis complemented with wild-type dnaE1 was ∼4.5 × 10⁻¹⁰ mutations per base pair per generation, consistent with data from fluctuation analysis and previously published estimates (Table 1 and Supplementary Fig. 6)13. In contrast, the mutation rate in the absence of PHP exonuclease activity was ∼1.0–1.7 × 10⁻⁶ mutations per base pair per generation, or ∼7–11 mutations per genome per generation, a ∼2,300- to 3,700-fold increase over the wild-type rate (Table 1). The mutational spectra in both the wild-type and PHP-mutant strains were notable for the relatively high frequency of insertion and deletion events14, which is consistent with lack of a functional MMR system in mycobacteria4.
We hypothesize that the growth defect in the PHP-mutant strains reflects a defect in DNA polymerase function and/or a large increase in the mutation rate that decreases strain fitness. Our data suggest that a defect in PHP-mutant DNA polymerase function is unlikely to be an artifact of protein folding or stability issues (Figs. 1d and 2a–c, and Supplementary Fig. 4). Rather, the growth defect may be due to DNA replication stalling as a result of the relative inability of DnaE1 to extend from misincorporated nucleotides (Fig. 1h). Alternatively, the growth defect could result from the increased mutation rate. In E. coli, the site-specific disruption of DnaQ (ɛEC) proofreading is lethal, but this lethality can be suppressed by overexpression of MutL or a PolIIIαEC antimutator allele, suggesting that increased mutation rate by itself can result in a severe fitness cost15.
Thus, we find that the PHP domain of DnaE1, not DnaQ, is the major replicative exonuclease and a critical determinant of DNA replication fidelity in mycobacteria. Bacterial pathogens under rapidly changing selective pressures often inactivate MMR and acquire a selective advantage by becoming hypermutable16. Because M. tuberculosis does not encode homologs of the MMR system3,4, we asked whether clinical M. tuberculosis strains increase their mutability through loss of PHP domain function or whether PHP domain–mediated proofreading serves a more essential function. Analysis of the dnaE1 sequences from ∼1,700 clinical M. tuberculosis isolates identified 3 missense SNPs found in ∼3% of all isolates (Supplementary Fig. 7a and Supplementary Table 1). By fluctuation analysis, we found one SNP (encoding DnaE1MTB Lys95Asn) in a single clinical M. tuberculosis isolate that caused a small (threefold) increase in the mutation rate; in no cases did mutations affecting the PHP domain fully abrogate PHP domain function (Supplementary Fig. 7b). Thus, PHP domain–mediated proofreading may be essential for M. tuberculosis pathogenesis.
We next sought to determine the distribution of these two contrasting mechanisms of DNA replication fidelity—the PHP domain and the E. coli–like ɛ exonuclease—in the bacterial kingdom (see Supplementary Table 2 for the ∼2,000 bacterial species analyzed). All bacterial replicative DNA polymerases have a PHP domain8. Using the conservation of all nine metal ion–coordinating residues as a proxy for an active PHP-domain exonuclease, we categorized replicative DNA polymerases as having either an 'active' or 'inactive' PHP domain. In addition, we also queried each bacterial species for the presence of an E. coli–like ɛ-exonuclease homolog. On the basis of sequence alignments of the ɛ-exonuclease homologs from gammaproteobacteria, an E. coli–like ɛ-exonuclease homolog was defined as a single-domain protein containing a DEDDh-family exonuclease domain followed immediately by a clamp-binding motif17. In a complementary analysis, which produced concordant results, we defined E. coli–like ɛ-exonuclease homologs more broadly by homology to a hidden Markov model (HMM) built from manually curated alignments of proteobacterial ɛ exonucleases18.
In agreement with a recent phylogenetic study of DNA PolIIIα homologs19, we found that the majority of replicative bacterial polymerases contain an active PHP exonuclease (Fig. 3a,b). However, the majority of bacterial classes did not have an E. coli–like ɛ exonuclease (Fig. 3a,b). Putative ɛ-exonuclease homologs were distributed between two well-defined groups (Fig. 3a), with only the higher-scoring group showing all the characteristics of an E. coli–like ɛ-exonuclease homolog. The lower-scoring group, which included the annotated dnaQ homologs in mycobacteria, appeared to encode 3′–5′ exonucleases but lacked the distinguishing characteristics of ɛEC (Supplementary Table 3 and data not shown). E. coli–like ɛ exonucleases appear to exist uniquely within the alpha-, beta- and gammaproteobacteria (Fig. 3b and Supplementary Table 3)11. In a subset of bacteria, an ɛ exonuclease–like domain has been either inserted into the PHP domain of the replicative DNA polymerase (constituting the PolC family of polymerases) or fused to the N terminus of a DnaE1-type DNA polymerase (Fig. 3b)19. In contrast to previous observations that suggest that DNA PolIIIα homologs may coordinately use both PHP-domain and ɛ exonuclease–domain activity19, we found that, although there were rare exceptions, broadly, the presence of an active PHP domain appeared to be mutually exclusive with the presence of an E. coli–like ɛ exonuclease or an ɛ exonuclease–like domain encoded within the polymerase (for example, PolC). Although these data do not preclude another role for the annotated ɛ-exonuclease homologs in DNA replication in species containing an active PHP domain, they suggest that the PHP domain is the most common replicative exonuclease in the bacterial kingdom and may be the ancestral prokaryotic proofreader.
Figure 3.
(a) ɛ-exonuclease homologs identified by BLAST were compared to the E. coli–like ɛ-exonuclease dnaQ_proteo (TIGR01406) HMM model. The distribution of scores is shown. (b) Bacterial phylogenetic tree inferred from an alignment of 16S rRNA genes using RAxML. Subsets of eight species from each bacterial class (labeled on the outer ring of the tree) were chosen to represent the total organismal diversity within each class. Species are colored along the outer ring according to the legend. A subset of PolC-containing bacteria (purple) have PolC polymerases that have conserved all nine PHP-domain metal ion–binding residues in addition to having an ɛ-exonuclease domain inserted into the PHP domain. For this reason, the PolC-containing bacteria have been labeled “inactive/active?”.
Finally, bacterial DNA polymerases are active drug targets1 but have not been successfully targeted by nucleoside analogs, which are commonly used to treat cancer and viral infections20. Nucleoside analogs mimic their physiological counterparts and are incorporated into DNA and/or RNA to inhibit cellular division and viral replication20. There have been efforts to use adenosine analogs to treat M. tuberculosis, but these have had minimal success21. We reasoned that, in addition to imposing a severe fitness cost on its own (Fig. 2d), inhibition of PHP domain–mediated proofreading might sensitize mycobacteria to nucleoside analogs. We first identified nucleoside analogs that specifically disrupted the DNA synthesis mediated by PHP-mutant DnaE1 in vitro (Fig. 4a and data not shown), subsequently focusing on ara-A, a chain-terminating adenosine analog that can be phosphorylated to its active form in M. tuberculosis by adenosine kinase21. Both wild-type and PHP-mutant DnaE1 appeared to incorporate ara-A at similar rates in vitro (Fig. 4a). However, whereas wild-type DnaE1 efficiently removed incorporated ara-A, the PHP mutants remained blocked at sites of ara-A incorporation (Fig. 4a). Consistent with these in vitro results, ara-A had no activity against mycobacteria containing wild-type dnaE1 but was toxic to mycobacteria in which PHP activity was inhibited (Fig. 4b), presumably as a result of inhibition of DNA replication. Selective inhibition of PHP mutants by ara-A was specific, as the minimum inhibitory concentrations (MICs) of a panel of anti-mycobacterial agents showed no difference between the wild-type strain and the PHP mutants (Supplementary Table 4). Disruption of DNA synthesis by PHP-domain inhibition coupled with nucleoside analog treatment would represent a new mechanism of action for an antibiotic and a novel therapeutic option for drug-resistant M. tuberculosis.
Figure 4.
(a) Primer extension analysis performed as in Figure 1h in the presence of 200 μM of the adenosine analog ara-A. Incorporation of ara-A impedes primer extension. Whereas wild-type DnaE1MTB can excise ara-A and resume DNA synthesis, the PHP mutant cannot. (b) Determination of the minimum inhibitory concentration (MIC) of ara-A for the indicated M. smegmatis strains. Pink indicates cellular respiration; blue indicates lack of respiration.
Methods
Media.
M. tuberculosis (H37Rv) or M. smegmatis (mc2155) were grown at 37 °C in Middlebrook 7H9 broth or on 7H10 plates supplemented with the appropriate antibiotics.
Bacterial strains and plasmids.
All M. tuberculosis strains are derivatives of H37Rv; all M. smegmatis strains are derivatives of mc2155 with the exception of the protein production strain, which is a derivative of mc24517. The bacterial strains are listed in Supplementary Table 5. All plasmids generated in this study are listed in Supplementary Table 6.
Allele-swapping experiments.
dnaE1 plasmids were transformed into strain JR19 (ΔdnaE1::hyg dnaE1::L5(zeo) [Ms3178]), plated on 7H10 plates supplemented with kanamycin and incubated at 37 °C for 4–5 d. Individual colonies were then patched on 7H10 plates containing either kanamycin or zeocin. Kanamycin-resistant, zeocin-sensitive colonies were scored as 'allele-swap' strains (strains in which the transformed dnaE1 plasmid encoding kanamycin resistance replaced the dnaE1 plasmid encoding zeocin resistance at attB). Kanamycin-resistant, zeocin-resistant colonies were scored as 'co-integrant' strains (strains in which the transformed dnaE1 plasmid encoding kanamycin resistance integrated adjacent to the dnaE1 plasmid encoding zeocin resistance at attB).
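The scoring rule above can be written down as a small decision function. This is only an illustrative sketch: the function names and the "unscored" label for kanamycin-sensitive patches are assumptions introduced here, not part of the published protocol.

```python
def score_colony(kan_resistant: bool, zeo_resistant: bool) -> str:
    """Classify one patched colony from its growth on kanamycin and zeocin."""
    if kan_resistant and not zeo_resistant:
        # Incoming (kan) plasmid replaced the resident (zeo) plasmid at attB.
        return "allele-swap"
    if kan_resistant and zeo_resistant:
        # Incoming plasmid integrated next to the resident plasmid.
        return "co-integrant"
    # Hypothetical catch-all: the text does not define this case.
    return "unscored"


def allele_swap_fraction(colonies) -> float:
    """Fraction of scored colonies (swap + co-integrant) that are allele swaps."""
    labels = [score_colony(kan, zeo) for kan, zeo in colonies]
    scored = [label for label in labels if label != "unscored"]
    if not scored:
        raise ValueError("no scorable colonies")
    return scored.count("allele-swap") / len(scored)


# Hypothetical patch results as (kanamycin-resistant, zeocin-resistant) pairs.
example = [(True, False), (True, False), (True, True), (True, False)]
print(allele_swap_fraction(example))
```

An allele that cannot support viability would be expected to yield co-integrants but few or no allele swaps in this scheme.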
Mutant accumulation assay.
Mutant accumulation assay strains were generated by transforming the allele-swap strain JR19 (ΔdnaE1::hyg dnaE1::L5(zeo) [Ms3178]) with plasmid pJR23 (dnaE1::L5(kan) [Ms3178]), plasmid pJR161 (dnaE1-D25N + silent::L5(kan) [Ms3178]) or plasmid pJR87 (dnaE1-D228N + silent::L5(kan) [Ms3178]) and plating on 7H10 plates containing kanamycin. Plasmids pJR87 and pJR161 incorporated silent mutations flanking the PHP domain–targeting mutation to allow for the unambiguous scoring of any reversion events. Resulting colonies were then restreaked to single colonies on 7H10 plates containing kanamycin. Individual colonies were again streaked to single colonies on a 7H10 plate and confirmed for allele swapping on the basis of resistance to kanamycin and sensitivity to zeocin. The resulting patch from the 7H10 plate was used as the 'time 0' strain isolate for the mutant accumulation assay. For each genotype, 12 mutant accumulation assay lines originated from single ∼0.5-mm colonies isolated on 7H10 plates. Each line was streaked for single colonies on a 7H10 plate and incubated for 3–4 d (wild type complemented) or 8 d (PHP mutant complemented). This procedure was then followed repeatedly for the desired number of passages.
Estimation of generations in a colony.
The number of cells in a colony was determined by excising mc2155 colonies of an average diameter of ∼0.5 mm from agar plates, resuspending them in PBS with 0.05% Tween-80, generating a single-cell suspension by sonication and plating dilutions on 7H10 plates. The average number of cells in a ∼0.5-mm colony was 2.37 × 10⁷ cells, with an s.d. of 6.7 × 10⁶ cells, which corresponds to 24.5 generations. These estimates were confirmed by direct counting of the number of cells in a colony in a Petroff-Hausser counting chamber (VWR, 15170-048). Because of clumping of the PHP mutant cells, a similar estimate could not be generated for these strains.
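The conversion from cell count to generations can be checked with a short calculation. The sketch below assumes that each colony is founded by a single cell and grows by binary fission without cell death, so that the number of generations is the base-2 logarithm of the final cell count; the function name is introduced here for illustration.

```python
import math


def generations_from_cells(n_cells: float) -> float:
    """Generations needed to reach n_cells from one founder by binary fission."""
    if n_cells < 1:
        raise ValueError("a colony contains at least one cell")
    return math.log2(n_cells)


mean_cells = 2.37e7  # mean cells per ~0.5-mm colony, from the text
print(round(generations_from_cells(mean_cells), 1))  # 24.5
```

Under these assumptions the reported mean colony size reproduces the stated 24.5 generations per passage.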
Immunoblots.
DnaE1-Myc was detected using an antibody to Myc (71D10, Cell Signaling Technology) at a 1:1,000 dilution; HSP65 was detected using an antibody to HSP65 (BDI578, Abcam) at a 1:1,000 dilution. IRDye-800 anti-mouse and IRDye-680 anti-rabbit antibodies were used at 1:15,000 dilutions. Immunoblots were imaged on a Li-Cor Odyssey scanner.
Sequencing.
Genomic DNA was isolated from 10-ml M. smegmatis cultures using standard phenol-chloroform extraction techniques. Genomic DNA was quantified using a Qubit Fluorometer (Life Technologies), and libraries were prepared with the Illumina Nextera XT kit. Sequencing was performed using an Illumina MiSeq Desktop Sequencer with MiSeq Reagent Kit v2. Paired-end read sequencing was performed with read lengths of 101 bases. Mutant accumulation lines were covered to an average depth of 74× (range of 35–135×) and 96.3% genome coverage at a depth greater than 10× and mapping quality greater than or equal to 60.
SNP and indel calling.
The reference genome was mc2155 (NCBI, NC_008596). Sequencing reads were aligned to the reference genome using the BWA-MEM algorithm22. We then applied Genome Analysis Toolkit (GATK)23 base quality score recalibration, indel realignment and duplicate removal, and we performed SNP and indel discovery and genotyping simultaneously according to GATK Best-Practices recommendations21,24. An initial round of SNP calling on the original, non-recalibrated data was used to generate a set of 'known' SNPs for use in GATK tools that required previous SNP information. After GATK analysis, SNPs were filtered out if they met any of the following hard-filter criteria: QD < 2.0, FS > 60.0, MQ < 50.0, MappingQualityRankSum < −12.5 or ReadPosRankSum < −8.0.
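The hard-filter thresholds listed above can be expressed as a simple predicate over a variant's annotations. This is a minimal sketch rather than the pipeline actually used: it assumes the annotations have already been parsed into a dictionary keyed by the GATK INFO names (with MQRankSum standing for MappingQualityRankSum), and it assumes that a missing annotation does not trigger a filter.

```python
# Each entry: annotation key -> test that returns True when the SNP FAILS.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 50.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(annotations: dict) -> list:
    """Return the names of all hard filters that this SNP fails."""
    failed = []
    for key, fails in HARD_FILTERS.items():
        value = annotations.get(key)
        # Assumption: absent annotations are skipped, not treated as failures.
        if value is not None and fails(value):
            failed.append(key)
    return failed


def passes_hard_filters(annotations: dict) -> bool:
    """True if the SNP fails none of the hard filters."""
    return not failed_filters(annotations)


# Hypothetical SNP records for illustration only.
good = {"QD": 25.1, "FS": 1.2, "MQ": 60.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.4}
bad = {"QD": 1.5, "FS": 75.0, "MQ": 60.0}
print(passes_hard_filters(good), failed_filters(bad))
```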
Mutation rate estimated from mutant accumulation assays.
The mutation rate was estimated from the number of SNPs and small indels (≤10 bp) observed across mutant accumulation lines. For the sake of simplicity, we assumed that the number of mutations (m) was an accurate assessment of the mutation rate of the strain during the course of the experiment. It is possible that the low fitness of the PHP-mutant strains may disallow mildly deleterious mutations. Moreover, it is possible that the PHP-mutant alleles retain a small amount of residual exonuclease activity. For these reasons, the estimate for m in the PHP-mutant strains may be an underestimate. The mutation rate for a mutant accumulation strain was estimated with the equation μ = m/(N × g): the per-base mutation rate (μ) is the number of variants (m) divided by the product of the covered genome size (N) and the number of generations (g). m is the number of variants (SNPs and indels) observed, N is determined on the basis of 96.3% coverage of a 6,988,209-bp mc2155 genome and g is an estimate of the number of generations that occurred during passaging. Estimates of 95% CIs for the mutation rate were determined using the poissfit function in Matlab (MathWorks).
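The estimator μ = m/(N × g) and a Poisson confidence interval can be sketched as follows. This is not the Matlab code used in the study: it computes an exact (Garwood-type) Poisson interval by bisection using only the Python standard library, which may differ slightly from what poissfit returns. The values of m and g in the usage example are hypothetical placeholders; only the genome size and coverage come from the text.

```python
import math


def mutation_rate(m: int, covered_bp: float, generations: float) -> float:
    """Per-base, per-generation mutation rate: mu = m / (N * g)."""
    return m / (covered_bp * generations)


def _poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)


def _solve(k: int, target: float, hi: float) -> float:
    """Find lam with P(X <= k | lam) == target; the CDF decreases as lam grows."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if _poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def poisson_count_ci(m: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for the mean of a Poisson count m."""
    hi = m + 10.0 * math.sqrt(m + 1.0) + 10.0  # generous upper search bound
    lower = 0.0 if m == 0 else _solve(m - 1, 1.0 - alpha / 2.0, hi)
    upper = _solve(m, alpha / 2.0, hi)
    return lower, upper


covered_bp = 0.963 * 6_988_209  # N: 96.3% of the mc2155 genome (from the text)
m_example = 40                  # hypothetical variant count
g_example = 5000.0              # hypothetical total generations across lines
scale = covered_bp * g_example
low, high = poisson_count_ci(m_example)
print(mutation_rate(m_example, covered_bp, g_example), low / scale, high / scale)
```

Dividing the count interval by N × g converts it into an interval for the rate, because N and g are treated as fixed constants in this estimator.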
Estimation of mutation rates by fluctuation analysis.
Fluctuation analysis, rpoB target size determination and statistical comparisons of fluctuation analysis data were performed as previously described25.
Bacterial genomes and sequence data.
Bacterial genome sequences (RefSeq and Draft) were downloaded from NCBI (http://ftp.ncbi.nlm.nih.gov/genomes/). C-family DNA polymerases and ɛ exonuclease homologs were identified by performing protein sequence searches with BLAST26 against the protein database derived from the collected bacterial genomes. BLAST searches were run until convergence (E value = 1 × 10⁻⁵ inclusion threshold) using the following representatives as search probes: the E. coli DNA polymerase III α subunit (NP_414726.1); Bacillus subtilis DNA polymerase III PolC type (NP_389540); and E. coli DNA polymerase III ɛ subunit (NP_414751). If a DNA polymerase sequence contained an intein, it was excised before further analysis19. A small number of sequences were found to be fragmented, owing either to frameshifts presumably resulting from sequencing or assembly errors or to misannotation of translational start sites; such sequences were removed from further analysis. Annotated 16S rRNA sequences were extracted from the RefSeq and Draft bacterial genomes with custom Perl scripts. For poorly represented bacterial classes, additional sequences were identified in the nr NCBI database with BLAST. See Supplementary Table 2 for the organisms and sequences used in this study.
Multiple-sequence alignments and analysis of sequence features.
Sequences were parsed according to taxonomical class. Protein sequence alignments were constructed with MAFFT27. DNA polymerases were classified as PolC, DnaE1 (the major DNA replicative polymerase in bacteria that do not use PolC) or DnaEn (which includes both the DnaE2 and DnaE3 classes) on the basis of score cutoffs from custom DNA polymerase HMMs or from previous classifications19. For DNA polymerases, the presence of an 'exonuclease-active' PHP domain was queried on the basis of the conservation of all nine essential metal ion–binding residues (His14, His16, Asp23, His48, Glu73, His107, Cys158, Asp226, His228; DnaE1MTB numbering) within the PHP domain using a regular expression. DNA polymerase sequences that encoded alterations within this motif were classified as having an 'exonuclease-inactive' PHP domain. Classification of a subset of PolC DNA polymerases was ambiguous: these sequences conserved all nine metal ion–binding residues but also had an ɛ exonuclease inserted into the PHP domain (characteristic of PolC-type DNA polymerases; see below). For this reason, such PolC sequences were classified as PHP domain 'exonuclease inactive/active?'. Sequence alignments were also used to classify ɛ homologs. An ɛ homolog was not identifiable (E value < 1 × 10⁻⁵) for some bacterial species. ɛ homologs were assigned to one of three categories: (i) an E. coli–like ɛ, which we defined by the presence of a single-domain protein containing a DEDDh-family exonuclease followed immediately by a clamp-binding motif (QXX[L/F/M/I]X); (ii) an ɛ homolog inserted into the PHP domain of a PolC-type DNA polymerase; and (iii) an ɛ homolog fused to the N terminus of a DnaE-type DNA polymerase. E. coli–like ɛ homologs were further confirmed on the basis of scoring (score cutoff > 210) as hits against the dnaQ_proteo (TIGR01406) model in the NCBI Conserved Domain Database (Supplementary Table 3)18. The only bacterial classes that contained E. coli–like ɛ exonucleases in our sequence collection (score > 210 when compared to TIGR01406) were the α, β and γ proteobacteria (Supplementary Table 3).
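The two sequence tests described above can be illustrated with a short sketch. It is not the authors' script: the PHP check here takes a lookup table from DnaE1MTB position to the aligned residue in the query (assumed to have been derived from a multiple-sequence alignment), and the clamp-motif check simply searches a supplied region for the pattern QXX[L/F/M/I]X without verifying that it immediately follows the exonuclease domain. All function names and toy inputs are hypothetical.

```python
import re

# Nine metal ion-binding residues, DnaE1MTB numbering (from the text).
PHP_METAL_RESIDUES = {
    14: "H", 16: "H", 23: "D", 48: "H", 73: "E",
    107: "H", 158: "C", 226: "D", 228: "H",
}

# Clamp-binding motif QXX[L/F/M/I]X written as a regular expression.
CLAMP_MOTIF = re.compile(r"Q..[LFMI].")


def php_status(residue_at: dict) -> str:
    """Classify a PHP domain from the residues aligned to the nine positions."""
    conserved = all(residue_at.get(pos) == aa for pos, aa in PHP_METAL_RESIDUES.items())
    return "exonuclease-active" if conserved else "exonuclease-inactive"


def has_clamp_motif(region: str) -> bool:
    """True if the given protein region contains a QXX[L/F/M/I]X match."""
    return CLAMP_MOTIF.search(region.upper()) is not None


# Toy inputs: a fully conserved PHP domain and an Asp23Asn substitution.
wild_type = dict(PHP_METAL_RESIDUES)
asp23asn = {**PHP_METAL_RESIDUES, 23: "N"}
print(php_status(wild_type), php_status(asp23asn))
print(has_clamp_motif("AAQTSMAF"), has_clamp_motif("AAQTSGAF"))
```

A full classifier would also need the HMM score against TIGR01406 and domain boundaries, neither of which is modeled here.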
Phylogenetic analysis.
To facilitate phylogenetic comparison, a subset of species (eight each) from each bacterial class was chosen to represent the total organismal diversity within that class. 16S rRNA sequences were aligned with Infernal using a covarion model built from a high-quality reference alignment28. Initial phylogenetic relationships were constructed with FastTree with the default settings29. These trees were then used to guide the organism selection process on the basis of the phylogenetic diversity (branch length) that each organism contributed to the tree. The phylogenetic trees for publication were generated with RAxML with the −f a option and the GTRGAMMA model of nucleotide substitution to generate 1,000 bootstrap replicates followed by a search for the best-scoring maximum-likelihood tree30. Convergence was assessed using the −I autoMRE option in RAxML. Tree analysis and visualization were carried out with iTOL31.
M. tuberculosis clinical strain dnaE1 SNP analysis.
M. tuberculosis genome sequences32,33,34 were downloaded from repositories and converted to FASTQ files. Reads were aligned to the M. tuberculosis reference genome (NC_018143) with BWA-MEM22 with the default parameters. Data were further processed using Pilon (Broad Institute) with the “–tracks” setting. SNPs within the PHP domain of DnaE1 (Rv1547; amino acids 1–300; genome coordinates 1,747,700–1,748,599) that passed the Pilon filtering criterion were then extracted from VCF files. A small number of samples from the Casali et al.32 data set were found to be non–M. tuberculosis species and were excluded from the analysis.
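Selecting variants that fall in the PHP-domain window amounts to a coordinate check. The sketch below uses the genome coordinates quoted above; the codon mapping additionally assumes that dnaE1 lies on the forward strand and that the first coordinate is the first base of codon 1, neither of which is stated in the text. Function names are introduced here for illustration.

```python
PHP_START = 1_747_700  # first base of the PHP-domain window (from the text)
PHP_END = 1_748_599    # last base of the window, inclusive (from the text)


def in_php_window(pos: int) -> bool:
    """True if a 1-based genome position lies inside the PHP-domain window."""
    return PHP_START <= pos <= PHP_END


def codon_number(pos: int) -> int:
    """1-based codon index for a position in the window.

    ASSUMPTION: forward-strand gene with PHP_START as base 1 of codon 1.
    """
    if not in_php_window(pos):
        raise ValueError("position is outside the PHP-domain window")
    return (pos - PHP_START) // 3 + 1


# The window spans 900 bp, matching the 300 amino acids given in the text.
print(PHP_END - PHP_START + 1, codon_number(PHP_START), codon_number(PHP_END))
```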
Minimum inhibitory concentration determination.
MIC determination was performed as previously described35.
Protein expression and purification.
M. tuberculosis dnaE1 (Rv1547) was cloned into the pYUB28b vector (kindly provided by E. Baker, University of Auckland) and transformed into the electrocompetent M. smegmatis mc24517 strain (kindly provided by W. Jacobs, Albert Einstein College of Medicine). Cells were grown in ZYP-5052 culture medium36 at 37 °C and 200 rpm, collected at OD600 = 6.0 and stored at −80 °C. Wild-type and mutant DnaE1MTB proteins were purified using nickel-affinity, anion-exchange and gel filtration columns, and His tags were cleaved with HRV 3C protease. All purification steps were carried out in 50 mM HEPES, pH 7.5, 0.1–1.0 M NaCl and 2 mM DTT. Proteins were stored at −80 °C in 50 mM HEPES, pH 7.5, 150 mM NaCl and 2 mM DTT. E. coli PolIIIα and ɛ subunits were purified as described previously from BL21 DE3 E. coli37. To remove trace amounts of copurifying endogenous exonuclease, PolIIIα was incubated with an excess of ɛ peptide (residues 209–243) and purified using gel filtration.
Polymerase and exonuclease assays.
All assays were performed in 50 mM HEPES, pH 7.5, 50 mM potassium glutamate, 6 mg/ml BSA, 2 mM DTT and 2 mM magnesium acetate. Real-time primer extension assays were performed as described previously37, using 23 nM protein, 216 nM unlabeled DNA, 21.6 nM labeled DNA and 4 μM dGTP. To determine Vmax and Km values from DnaE1MTB and PolIIIαEC, incorporation rates were measured at 0–100 μM dGTP, and data were fitted in GraphPad Prism (Version 6 for Mac OSX) using the results from three independent experiments. For gel analysis, reactions were performed at 22 °C with 6 nM purified protein and 100 nM DNA substrate. Primer extensions were carried out for 5 min in the presence of 100 μM dNTP unless stated otherwise. For inhibition assays, 200 μM adenine-arabinofuranoside-5′-triphosphate (ara-A; Jena Bioscience) was added to the reaction. For exonuclease assays, samples were taken at different time points. Reactions were stopped in 50 mM EDTA, pH 7.4, separated on a denaturing 20% acrylamide gel and imaged with a Typhoon Imager (GE Healthcare).
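The determination of Vmax and Km from incorporation rates can be illustrated as follows. The fits in this study were done in GraphPad Prism; the sketch below instead uses a Hanes-Woolf linearization ([S]/v regressed on [S]) so that it needs no external libraries. The function name fit_michaelis_menten and the synthetic values are assumptions for illustration, not data or code from the study.

```python
def fit_michaelis_menten(substrate, rate):
    """Estimate (Vmax, Km) by Hanes-Woolf regression.

    Michaelis-Menten: v = Vmax*[S] / (Km + [S])
    Rearranged:       [S]/v = [S]/Vmax + Km/Vmax   (a straight line in [S])
    """
    # Points with zero substrate or zero rate cannot be transformed and are skipped.
    pts = [(s, s / v) for s, v in zip(substrate, rate) if s > 0 and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two usable data points")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free rates over 0-100 uM dGTP (hypothetical Vmax = 2.0, Km = 10.0).
dgtp = [0, 1, 5, 10, 25, 50, 100]
rates = [2.0 * s / (10.0 + s) for s in dgtp]
vmax_est, km_est = fit_michaelis_menten(dgtp, rates)
print(round(vmax_est, 3), round(km_est, 3))
```

With real, noisy measurements a nonlinear least-squares fit of the untransformed equation is generally preferable, because linearization distorts the error structure.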
Size exclusion chromatography.
Samples (50 μl) of the purified proteins were prepared at a 10 μM concentration and injected onto a Superdex 200 Increase 3.2/300 gel filtration column (GE Healthcare) pre-equilibrated in 50 mM HEPES, pH 7.5, 150 mM NaCl and 2 mM DTT.
DNA substrates.
DNA oligonucleotides were purchased from Integrated DNA Technologies (sequences are shown in Supplementary Table 7). 6-FAM–labeled oligonucleotides were purified using a denaturing 20% acrylamide gel. Substrates were annealed in 3× excess of unlabeled oligonucleotide and stored in 10 mM Tris-HCl, pH 8.0, and 1 mM EDTA at −20 °C.
Circular dichroism (CD) and thermal melt.
CD scans (185–260 nm) were performed in 10 mM KH2PO4, pH 7.0, 100 mM potassium fluoride and 1 mM DTT with a Jasco J-815 instrument using 200 μl of 1 μM dialyzed protein at 20 °C. After this, the protein samples were subjected to a thermal melt by increasing the temperature to 90 °C with steps of 1 °C/min. Structural changes were monitored at a wavelength of 222 nm.
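A melting temperature can be read from such a thermal melt as the midpoint of the unfolding transition. The text does not state how melting temperatures were derived, so the sketch below is only one simple possibility: it assumes a two-state transition, takes the first and last readings as the folded and unfolded baselines, and interpolates linearly to the half-way point. The function name melting_temperature and the example values are hypothetical.

```python
def melting_temperature(temps, signal):
    """Estimate Tm as the temperature at which half of the signal change has occurred.

    Assumes a single two-state transition with flat baselines, so that the
    first and last readings represent the folded and unfolded states.
    """
    folded, unfolded = signal[0], signal[-1]
    span = unfolded - folded
    if span == 0:
        raise ValueError("no signal change between first and last reading")
    # Fraction unfolded at each temperature, from 0 (folded) to 1 (unfolded).
    frac = [(s - folded) / span for s in signal]
    for i in range(1, len(frac)):
        lo, hi = frac[i - 1], frac[i]
        if lo < 0.5 <= hi:
            # Linear interpolation between the two readings bracketing 0.5.
            step = temps[i] - temps[i - 1]
            return temps[i - 1] + (0.5 - lo) / (hi - lo) * step
    raise ValueError("signal never crosses the transition midpoint")


# Hypothetical ellipticity readings at 222 nm (arbitrary units).
temps_c = [20, 30, 40, 50, 60]
cd_222 = [-10.0, -10.0, -8.0, -4.0, -4.0]
print(melting_temperature(temps_c, cd_222))
```

Sloping baselines or multi-step unfolding would require fitting a full thermodynamic model rather than this midpoint estimate.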
Modeling of the DnaE1MTB PHP domain.
The model of the DnaE1MTB PHP domain was created using the program Modeller38 with the crystal structure of Taq PolIIIα (ref. 39) as a template. For this, a sequence alignment of DnaE1MTB and Thermus aquaticus PolIIIα was calculated with Clustal40 and ESPript41 using multiple DnaE1 and PolIIIα sequences from different bacterial species. Figures were prepared with PyMOL (PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger).
References
Robinson, A., Causer, R.J. & Dixon, N.E. Architecture and conservation of the bacterial DNA replication machinery, an underexploited drug target. Curr. Drug Targets 13, 352–372 (2012).
Kunkel, T.A. & Bebenek, K. DNA replication fidelity. Annu. Rev. Biochem. 69, 497–529 (2000).
Mizrahi, V. & Andersen, S.J. DNA repair in Mycobacterium tuberculosis. What have we learnt from the genome sequence? Mol. Microbiol. 29, 1331–1339 (1998).
Springer, B. et al. Lack of mismatch correction facilitates genome evolution in mycobacteria. Mol. Microbiol. 53, 1601–1609 (2004).
Ford, C.B. et al. Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat. Genet. 43, 482–486 (2011).
Farhat, M.R. et al. Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nat. Genet. 45, 1183–1189 (2013).
Cole, S.T. et al. Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011 (2001).
McHenry, C.S. DNA replicases from a bacterial perspective. Annu. Rev. Biochem. 80, 403–436 (2011).
Stano, N.M., Chen, J. & McHenry, C.S. A coproofreading Zn2+-dependent exonuclease within a bacterial replicase. Nat. Struct. Mol. Biol. 13, 458–459 (2006).
Wing, R.A., Bailey, S. & Steitz, T.A. Insights into the replisome from the structure of a ternary complex of the DNA polymerase III α-subunit. J. Mol. Biol. 382, 859–869 (2008).
Barros, T. et al. A structural role for the PHP domain in E. coli DNA polymerase III. BMC Struct. Biol. 13, 8 (2013).
Sassetti, C.M., Boyd, D.H. & Rubin, E.J. Comprehensive identification of conditionally essential genes in mycobacteria. Proc. Natl. Acad. Sci. USA 98, 12712–12717 (2001).
Malshetty, V.S., Jain, R., Srinath, T., Kurthkoti, K. & Varshney, U. Synergistic effects of UdgB and Ung in mutation prevention and protection against commonly encountered DNA damaging agents in Mycobacterium smegmatis. Microbiology 156, 940–949 (2010).
Lee, H., Popodi, E., Tang, H. & Foster, P.L. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109, E2774–E2783 (2012).
Fijalkowska, I.J. & Schaaper, R.M. Mutants in the Exo I motif of Escherichia coli dnaQ: defective proofreading and inviability due to error catastrophe. Proc. Natl. Acad. Sci. USA 93, 2856–2861 (1996).
Denamur, E. & Matic, I. Evolution of mutation rates in bacteria. Mol. Microbiol. 60, 820–827 (2006).
Dalrymple, B.P., Kongsuwan, K., Wijffels, G., Dixon, N.E. & Jennings, P.A. A universal protein-protein interaction motif in the eubacterial DNA replication and repair systems. Proc. Natl. Acad. Sci. USA 98, 11627–11632 (2001).
Haft, D.H. et al. TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res. 29, 41–43 (2001).
Timinskas, K., Balvočiūtė, M., Timinskas, A. & Venclovas, Č. Comprehensive analysis of DNA polymerase III α subunits and their homologs in bacterial genomes. Nucleic Acids Res. 42, 1393–1413 (2014).
Jordheim, L.P., Durantel, D., Zoulim, F. & Dumontet, C. Advances in the development of nucleoside and nucleotide analogues for cancer and viral diseases. Nat. Rev. Drug Discov. 12, 447–464 (2013).
Long, M.C. et al. Structure-activity relationship for adenosine kinase from Mycobacterium tuberculosis. Biochem. Pharmacol. 75, 1588–1600 (2008).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Ford, C.B. et al. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat. Genet. 45, 784–790 (2013).
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Katoh, K., Misawa, K., Kuma, K.-I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Nawrocki, E.P. & Eddy, S.R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Price, M.N., Dehal, P.S. & Arkin, A.P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–W478 (2011).
Casali, N. et al. Evolution and transmission of drug-resistant tuberculosis in a Russian population. Nat. Genet. 46, 279–286 (2014).
Zhang, H. et al. Genome sequencing of 161 Mycobacterium tuberculosis isolates from China identifies genes and intergenic regions associated with drug resistance. Nat. Genet. 45, 1255–1260 (2013).
Comas, I. et al. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat. Genet. 45, 1176–1182 (2013).
Franzblau, S.G. et al. Rapid, low-technology MIC determination with clinical Mycobacterium tuberculosis isolates by using the microplate Alamar Blue assay. J. Clin. Microbiol. 36, 362–366 (1998).
Studier, F.W. Protein production by auto-induction in high-density shaking cultures. Protein Expr. Purif. 41, 207–234 (2005).
Toste Rêgo, A., Holding, A.N., Kent, H. & Lamers, M.H. Architecture of the Pol III–clamp-exonuclease complex reveals key roles of the exonuclease subunit in processive DNA synthesis and repair. EMBO J. 32, 1334–1343 (2013).
Eswar, N. et al. Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci. Chapter 2, Unit 2.9 (2007).
Bailey, S., Wing, R.A. & Steitz, T.A. The structure of T. aquaticus DNA polymerase III is distinct from eukaryotic replicative DNA polymerases. Cell 126, 893–904 (2006).
Larkin, M.A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
Gouet, P. ESPript/ENDscript: extracting and rendering sequence and 3D information from atomic structures of proteins. Nucleic Acids Res. 31, 3320–3323 (2003).
Acknowledgements
We thank E. Rubin, B. Bloom, D. Boyd, J. McKenzie, D. Warner and B. Javid for comments, B. Jacobs (Albert Einstein College of Medicine) and M. Wilmans (European Molecular Biology Laboratory) for bacterial strains, and T. Baker (University of Auckland) for plasmids. This work was supported by a Helen Hay Whitney fellowship to J.M.R., US National Institutes of Health Director's New Innovator Award 1DP20D001378, subcontracts from National Institute of Allergy and Infectious Diseases (NIAID) U19AI076217 and AI109755-01, the Doris Duke Charitable Foundation under grant 2010054 to S.M.F. and a UK Medical Research Council grant to M.H.L. (MC_U105197143).
Author information
Contributions
J.M.R., U.F.L., M.H.L. and S.M.F. designed the project and wrote the manuscript. M.R.C. performed phylogenetic analyses. C.B.F. and E.R.G. made strains and measured mutation rates. R.G., M.C. and S.G. contributed sequencing data.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Mycobacterium tuberculosis (Mtb) and Mycobacterium smegmatis (Msmeg) contain two ɛ (dnaQ) exonuclease homologs.
(a) Sequence alignment of the ɛ-exonuclease homologs from four different species. Conserved catalytic residues of E. coli ɛ are indicated by blue triangles below the sequences. The clamp-binding motif of E. coli ɛ is boxed in green. (b) Active site of the E. coli ɛ exonuclease. (c) Computational model of the active site of Mtb Rv3711c. (d) Computational model of the active site of Mtb Rv2191.
Supplementary Figure 2 Mycobacterium tuberculosis Rv3711c (Rv3711cMTB) is a 3′–5′ DNA exonuclease but does not form a stable complex with DnaE1MTB.
(a) Coomassie-stained gel showing purified E. coli ɛ (ɛEC), DnaE1MTB and Rv3711cMTB. (b) Gel showing a 3′–5′ exonuclease activity assay with ɛEC, DnaE1MTB and Rv3711cMTB. (c) Analytical size exclusion chromatography shows that E. coli PolIIIα (PolIIIαEC) and ɛEC form a stable complex at concentrations as low as 1.5 μM. (d) In contrast, DnaE1MTB and Rv3711cMTB do not show any interaction, even at 10 μM protein concentration (all equimolar amounts).
Supplementary Figure 3 PHP active sites in bacterial replicative DNA polymerases.
(a) Alignment of the PHP domain sequences from replicative DNA polymerases. Conserved metal-binding residues of the PHP domain are indicated by blue triangles below the sequences. Cyan squares indicate residues in E. coli that deviate from the consensus metal-binding motif. (b) Computational model of the Mtb DnaE1 PHP domain based on the crystal structure of T. aquaticus PolIIIα (shown in c). Black circles indicate residues mutated for the experiments performed in this study. (c) The PHP domain active site of T. aquaticus PolIIIα. (d) The PHP domain active site of E. coli PolIIIα. Underlines indicate residues in E. coli that deviate from the consensus metal-binding motif.
Supplementary Figure 4 Mycobacterium tuberculosis DnaE1 wild-type (DnaE1MTB WT) and PHP mutants are properly folded.
(a) SDS-PAGE analysis of purified proteins. Each lane contains 3.5 pmol (~0.5 μg) protein. The gel was stained with Coomassie Brilliant Blue. (b) Purified proteins do not show any aggregation, as judged by size exclusion chromatography. The arrow indicates the void volume (at 0.8 ml). For clarity, graphs are shifted vertically by 150 mAU. (c) Circular dichroism spectra show that DnaE1MTB and PHP mutants are properly folded. (d) Thermal denaturation curves show that WT and mutant DnaE1 have similar melting temperatures of 45–50 °C. (e) Time course of exonuclease activity on single-stranded DNA. DnaE1MTB WT shows robust 3′–5′ exonuclease activity but not 5′–3′ exonuclease activity.
Supplementary Figure 5 Primer extension from mismatched substrates requires exonuclease activity.
(a) Primer extension from mismatched substrates by Mycobacterium tuberculosis DnaE1 wild-type (DnaE1MTB WT) and E. coli PolIIIα (PolIIIαEC) + ɛEC is blocked by a phosphorothioate linkage (denoted by -S-) that is resistant to exonuclease activity. In contrast, a matched primer with a terminal phosphorothioate linkage can be extended normally. (b) Addition of ɛEC exonuclease in trans allows DnaE1MTB PHP mutants to extend from mismatched DNA substrates.
Supplementary Figure 6 The per–base pair mutation rate of Mycobacterium smegmatis estimated from fluctuation analysis.
(a) Fluctuation analysis was used to determine the rate at which wild-type M. smegmatis acquired resistance to rifampicin. Circles represent the mutant frequency (number of rifampicin-resistant mutants per cell plated in a single culture). The red bar represents the estimated mutation rate (mutations conferring rifampicin resistance per generation), with error bars representing the 95% confidence interval (CI). (b) The number of mutations in rpoB (Ms1367) that confer rifampicin resistance in our fluctuation analysis was determined by sequencing 150 independent rifampicin-resistant isolates. This analysis identified ten unique mutations. The per–base pair mutation rate, μin vitro, was determined by dividing μrifampicin by the target size.
Supplementary Figure 7 Loss-of-function mutations in the dnaE1 PHP domain are rarely found in clinical Mycobacterium tuberculosis isolates.
(a) dnaE1 (Rv1547) PHP domain SNPs observed in clinical Mtb isolates. SNP prevalence refers to the number of clinical strains containing the indicated SNP as compared to the total number of clinical strains analyzed. See Supplementary Table 1 for additional information. (b) Fluctuation analysis was used to determine the rates at which the indicated M. smegmatis strains acquired resistance to rifampicin. With the exception of wild-type M. smegmatis, these strains harbor a deletion of the endogenous dnaE1 (Ms3178) gene and have been complemented with the indicated M. tuberculosis dnaE1 (Rv1547) gene. Circles represent the mutant frequency (number of rifampicin-resistant mutants per cell plated in a single culture). The red bar represents the estimated mutation rate (mutations conferring rifampicin resistance per generation), with error bars representing the 95% confidence interval (CI). *P < 0.05 in comparison of mutant frequencies by Wilcoxon rank-sum test.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 and Supplementary Tables 5–7. (PDF 1068 kb)
Supplementary Table 1
dnaE1 (Rv1547) PHP domain SNPs in clinical M. tuberculosis isolates. (XLSX 13 kb)
Supplementary Table 2
Protein sequences used in phylogenetic analysis. (XLS 4806 kb)
Supplementary Table 3
HMMER comparison of ε homologs to TIGR01406 (dnaQ_proteo). (XLSX 1091 kb)
Supplementary Table 4
Drug minimum inhibitory concentrations (μg/ml). (XLSX 8 kb)
About this article
Cite this article
Rock, J., Lang, U., Chase, M. et al. DNA replication fidelity in Mycobacterium tuberculosis is mediated by an ancestral prokaryotic proofreader. Nat Genet 47, 677–681 (2015). https://doi.org/10.1038/ng.3269