Escherichia coli phylogeny drives co-amoxiclav resistance through variable expression of TEM-1 beta-lactamase

Matlock, William; Rodger, Gillian; Pritchard, Emma; Colpus, Matthew; Kapel, Natalia; Barrett, Lucinda; Morgan, Marcus; Oakley, Sarah; Hopkins, Katie L.; Roohi, Aysha; Karageorgopoulos, Drosos; Avison, Matthew B.; Walker, A. Sarah; Lipworth, Samuel; Stoesser, Nicole

doi:10.1038/s41467-025-63714-6

Article
Open access
Published: 30 September 2025

Nature Communications volume 16, Article number: 8669 (2025)

Abstract

Co-amoxiclav (amoxicillin and clavulanate) is a commonly used combination antibiotic, with resistance in Escherichia coli associated with increased mortality. The class A beta-lactamase bla_TEM-1 is often carried by resistant E. coli but exhibits high phenotypic heterogeneity, complicating genotype-phenotype predictions. We curated a dataset of n = 377 diverse E. coli isolates where the only acquired beta-lactamase was bla_TEM-1. We generated hybrid assemblies and co-amoxiclav minimum inhibitory concentrations (MICs), and bla_TEM-1 qPCR expression data for a subset (n = 67/377). We first tested whether intrinsic expression of bla_TEM-1 varied between E. coli lineages, for example, from regulatory system differences, which are challenging to genomically quantify. Using genotypic features, we built a hierarchical Bayesian model for bla_TEM-1 expression, controlling for phylogeny. Expression varied across the phylogeny, with some lineages (phylogroups B1 and C, ST12) expressing bla_TEM-1 more than others (phylogroups E and F, ST372). Next, we built a second model to predict isolate MIC from genotypic features, again controlling for phylogeny. Phylogeny alone shifted MIC past the clinical breakpoint in 19% (55/292) of isolates with greater-than-chance probability, mostly representing ST12, ST69 and ST127. A third causal model confirmed that phylogenetic influence on bla_TEM-1 expression drove variation in MIC. We speculate that intergenic variation underlies this effect.

Introduction

The class A beta-lactamase bla_TEM-1 was first identified in 1965 in a clinical Escherichia coli isolate¹. Originally, it was mobilised by two of the earliest named transposons, Tn2 and Tn3, located on plasmids². In the decades since, the genetic context of bla_TEM-1 has evolved³, and other mobile genetic elements such as IS26⁴ and a diverse array of plasmids⁵ contribute to its dissemination. At the time of writing, NCBI contains over 220,000 unique isolates carrying bla_TEM-1 distributed across 26 genera, including the common clinical pathogens Escherichia coli, Klebsiella pneumoniae and Acinetobacter baumannii⁶. The emergence and dissemination of beta-lactam resistance has been a major healthcare challenge⁷, and bla_TEM-1 represents a key example.

In the UK, beta-lactam and beta-lactamase inhibitor combinations such as co-amoxiclav (amoxicillin and clavulanate) are commonly used as a first-line treatment for severe infections⁸. For Enterobacterales, the current EUCAST co-amoxiclav minimum inhibitory concentration (MIC) clinical breakpoint for resistance is 8/2 μg/mL⁹ across all indications, with a recent study concluding that empiric co-amoxiclav treatment of E. coli bacteraemia with MICs > 32/2 μg/mL was associated with significantly higher mortality¹⁰. However, the carriage of bla_TEM-1 is associated with high phenotypic heterogeneity, making genotype-phenotype predictions challenging.

Small-scale, experimental bla_TEM-1 systems have demonstrated that the interplay of location (plasmid or chromosome) and copies in the genome, through varying dosage, contributes to variable resistance^11,12. In addition, other determinants such as mutations in the promoter of bla_TEM-1¹³ and the chromosomally intrinsic ampC gene¹⁴, and efflux pumps¹⁵, are associated with E. coli beta-lactam resistance. Moreover, different regulatory systems^16,17, epistasis (interaction between genes)^18,19 and epigenetics (heritable phenotypic changes without alterations to the underlying DNA sequence)²⁰ might also influence co-amoxiclav resistance. For example, five different E. coli strains carrying the same pLL35 plasmid (which carries bla_CTX-M-15 and bla_TEM-112) varied in cefotaxime resistance²¹. Likewise, the introduction of pOXA-48, a common conjugative plasmid in carbapenem-resistant clinical Enterobacterales, to six different E. coli strains resulted in variable co-amoxiclav resistance²². This indicates that strain background plays a role in resistance.

Successful genotype-to-phenotype prediction requires a comprehensive understanding of not only individual resistance determinants but also their combined effects. Moreover, this understanding must be translated to clinically relevant pathogens. Yet, to accurately model resistance in these systems, a large sample of linked genomic and phenotypic data is required, which until recently has been limited by sequencing technology and costs.

In this study, we curated and completely reconstructed the genomes of 377 clinical E. coli bacteraemia isolates to reflect a real-world but relatively simple genetic scenario where the only acquired beta-lactamase gene identified was bla_TEM-1, all identical at the amino acid level. We quantified the co-amoxiclav MICs for these isolates and generated bla_TEM-1 qPCR expression data for a subset. We then modelled bla_TEM-1 expression and co-amoxiclav MIC whilst controlling for confounding genetic mechanisms and chromosomal phylogeny, and characterised differences in intergenic content between lineages that may be contributing to differential phenotypic effects.

Results

A curated dataset of E. coli isolates with hybrid assemblies and co-amoxiclav MICs

We began with n = 548 candidate E. coli isolates, which, following hybrid assembly, were curated into a final dataset of n = 377/548 (see “Methods” and Supplementary Information). In total, 77% (291/377) of assemblies were complete (all contigs were circularised), with the remaining 23% (86/377) having at least a circularised chromosome to confidently distinguish between chromosomal and plasmid-associated bla_TEM-1. Assemblies contained median = 3 (IQR = 2–5) plasmid contigs.

We identified n = 451 bla_TEM-1 genes on 431 contigs (13% [58/431] chromosomal versus 87% [373/431] plasmid). Isolates carried a median = 1 copy of bla_TEM-1 (range = 1–6). Carrying more than one copy of bla_TEM-1 on a single contig was rare: of all bla_TEM-1-positive contigs, 97% (400/412) versus 3% (12/412) had no duplications versus at least one. The bla_TEM-1 genes had synonymous single-nucleotide polymorphisms (SNPs) in positions 18, 138, 228, 396, 474, 705 and 717, totalling n = 7 single-nucleotide variant (SNV) profiles across the replicons, yet diversity was dominated by bla_TEM-1b at 73% (329/451; SNV profile TATTTCG; see “Methods”)²³. Where bla_TEM-1-positive contigs had at least two copies of bla_TEM-1, they were almost always the same SNV duplicated (11/12).

By examining the genomic arrangement of bla_TEM-1 (namely the replicons it was found on as well as any copies), we found most isolates carried a single non-chromosomal copy (73% [277/377]; Fig. 1a, b). More generally, whilst the plasmid contigs totalled only 3.6% of the total sequence length (bp) across the assemblies (71,126,646 bp/1,969,804,202 bp), they carried 85.8% of the bla_TEM-1 genes (387/451). Such bla_TEM-1-carrying plasmids were represented across the E. coli phylogeny (Fig. 1c). Overall, the dataset comprised 5.6% (21/377) phylogroup A, 8.0% (30/377) B1, 48.5% (183/377) B2, 4.8% (18/377) C, 27.3% (103/377) D, 0.5% (2/377) E, 4.2% (16/377) F and 1.1% (4/377) G. In total, we manually corrected n = 5 EzClermont phylogroup classifications using a chromosomal core gene phylogeny (see “Methods”): OXEC-108 (G to D), OXEC-317 (B2 to D), OXEC-333 (U to B1), OXEC-344 (U to B1) and OXEC-406 (U to B1). The EzClermont publication presented a 98.4% (123/125) true-positive rate on their validation set, which is in line with our 98.7% (372/377)²⁴.

Fig. 1: A genotypically and phenotypically heterogenous population of *bla*_TEM-1-carrying *E. coli.*

Five known upstream promoters modulate the expression of bla_TEM-1: P3, Pa/Pb, P4, P5 and Pc/Pd^13,25. We linked 91% (409/451) of bla_TEM-1 genes to a promoter immediately upstream, of which a majority, 64% (262/409), were identical to the P3 reference. More generally, excluding n = 2 different Pc/Pd-like promoters which have large deletions, we identified SNPs in positions 32, 43, 65, 141, 162 and 175, totalling n = 8 SNV profiles (by Sutcliffe numbering²⁶). Notably, 15% (39/262) of promoters had the Pa/Pb-associated C32T mutation, which produces two overlapping promoter sequences. Figure S1 visualises the joint distribution of isolate phylogroup and linked bla_TEM-1 promoter SNV.

Isolates were associated with a diverse range of co-amoxiclav MICs (μg/mL; ≤2/2 [4 (1.1%)], 4/2 [24 (6.4%)], 8/2 [144 (38.2%)], 16/2 [86 (22.8%)], 32/2 [44 (11.7%)] and >32/2 [75 (19.9%)]; Fig. 1d; see “Methods”). Figure S2 visualises the joint distribution of isolate phylogroup and MIC. Figure S3 visualises the distribution of bla_TEM-1 genome and cell copy number, bla_TEM-1 expression, and co-amoxiclav MIC against the chromosomal core gene phylogeny.

bla_TEM-1 associated with conjugative plasmids

Whilst chromosomal copies of bla_TEM-1 can remain with a lineage over time, plasmidic copies might come and go. This could give the host cell access to a transient boost in resistance without impeding long-term fitness.

Confining the analysis to circularised plasmids (1036/1512), in silico replicon typing revealed the most common plasmid families to be ColRNAI-like, Col156-like, B/O/K/Z-like and Col(MG828)-like at 11% (117/1036), 7.4% (77/1036), 6.8% (70/1036) and 5.2% (54/1036), respectively. All other plasmid families had fewer than 50 representatives. Figure S4 visualises the relationship between strain background (sequence type and phylogroup) and replicon types (PlasmidFinder output; see “Methods”) for the circularised plasmids. Using a 2-sample test for equality of proportions with continuity correction, the bla_TEM-1-positive plasmids (333/1036) were significantly more likely to be putatively conjugative (85.9% [286/333]) compared to the bla_TEM-1-negative plasmids (28.4% [200/703]; prop = 200/703; χ² = 297.02, df = 1, p-value < 2.2e-16). Moreover, for the most common genomic arrangement of bla_TEM-1 (i.e., a single plasmid [277/377]), amongst circularised plasmids (252/277), 90% were putatively conjugative (227/252).
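
As a check on the comparison above, the test statistic can be recomputed from the reported counts alone. The following is a minimal sketch, assuming the test is the standard Pearson chi-squared test on the 2×2 table with Yates continuity correction; the function name `two_proportion_chi2` is illustrative and is not taken from the study's code.

```python
from math import erfc, sqrt


def two_proportion_chi2(x1, n1, x2, n2):
    """Chi-squared test (1 df) for equality of two proportions,
    with Yates continuity correction on the 2x2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, total - (x1 + x2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            # Yates correction: shrink |O - E| by 0.5, never below zero
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the text: conjugative plasmids among bla_TEM-1-positive
# (286/333) and bla_TEM-1-negative (200/703) circularised plasmids
chi2, p_value = two_proportion_chi2(286, 333, 200, 703)
print(f"chi2 = {chi2:.2f}, df = 1, p = {p_value:.3g}")
```

With the counts reported above, the statistic comes out at approximately 297.02 on one degree of freedom, in agreement with the value given in the text.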

For a genome carrying bla_TEM-1 on the chromosome, the number of bla_TEM-1 copies in the assembled genome and the number of copies in the cell are equivalent. For a genome carrying bla_TEM-1 on a plasmid, this might not be the case, because plasmids can exist as multiple copies per cell. The calculated copy number of all plasmidic contigs (n = 1512) was median = 3.13 (range = 0.04–57.00). Of these, n = 19/1512 contigs (with 6/19 circularised) had calculated copy numbers less than one (see “Methods”). This was potentially due to uneven short-read coverage. However, none carried bla_TEM-1, so they were not used in the later modelling. Taking the circularised plasmids with copy number at least one (1030/1512), longer plasmids (> 10 kbp; 668/1030) were generally low copy number (median = 2.36), whilst shorter plasmids (≤ 10 kbp; 362/1030) were generally high copy number (median = 11.01; see Fig. S5), consistent with previous studies²⁷.

E. coli phylogeny shapes bla_TEM-1 expression

Within different E. coli lineages, bla_TEM-1 and its promoter are potentially subject to different regulatory systems and epigenetic interactions, which may in turn affect bla_TEM-1 expression. To test this, we first selected a random subsample of n = 67/377 isolates with a single copy of bla_TEM-1 in the genome, either on a chromosome (15/67) or a plasmid (52/67). Moreover, we only selected isolates with zero, one, or two mutations in the bla_TEM-1 promoter sequence: C32T, a well-studied but rare mutation which produces two overlapping promoters and is known to increase expression¹³, and G175A, a less studied but common mutation in our dataset (according to Sutcliffe numbering based on the pBR322 plasmid²⁶; see Table 1). Focussing on only two mutations enabled us to statistically explore the effect of their interaction. Isolates were distributed across the entire E. coli phylogeny (phylogroup A [6/67], B1 [4/67], B2 [40/67], C [4/67], D [8/67], E [1/67] and F [4/67]). We then performed qPCR to evaluate bla_TEM-1 expression (see “Methods”). Every isolate had at least two replicates (2 [48/67], 4 [1/67], or 9 [18/67]), giving a total of n = 262 bla_TEM-1 ΔCt observations (TEM-1 Ct – 16S Ct; see “Methods”) for modelling.
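
To make the ΔCt definition and the observation count concrete, the following is a minimal sketch. The Ct values and isolate names are hypothetical and purely illustrative; only the formula (TEM-1 Ct minus 16S Ct) and the replicate counts are taken from the text.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """Delta Ct = target Ct minus reference Ct.
    A lower value means higher expression relative to the reference gene."""
    return target_ct - reference_ct


# Hypothetical (TEM-1 Ct, 16S Ct) pairs for two made-up isolates
replicates = {
    "isolate_A": [(22.0, 12.0), (22.5, 12.25)],
    "isolate_B": [(19.0, 12.0), (19.5, 12.0)],
}

delta_cts = {
    isolate: [delta_ct(tem, ref) for tem, ref in pairs]
    for isolate, pairs in replicates.items()
}
mean_delta_ct = {isolate: mean(values) for isolate, values in delta_cts.items()}

# Replicate structure reported in the text: replicates per isolate -> isolates
replicate_counts = {2: 48, 4: 1, 9: 18}
n_isolates = sum(replicate_counts.values())
n_observations = sum(reps * n for reps, n in replicate_counts.items())
print(n_isolates, n_observations)
```

In this toy example, isolate_B has the lower mean ΔCt and would therefore be read as expressing bla_TEM-1 more strongly. The replicate counts sum to the 67 isolates and 262 observations stated above.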

Table 1 Replicon distribution of n = 67 bla_TEM-1 promoter variants

To test for the effects of E. coli lineage, we built a maximum likelihood core gene phylogeny for all n = 377 chromosomes (see “Methods”). In total, we identified 17,836 gene clusters, of which 18.7% (3342/17,836) were core genes (those found in ≥98% of chromosomes). The phylogeny (midpoint-rooted and restricted to the n = 67/377 isolates in the expression analysis) is given in Fig. 2a. Using b = 1000 ultrafast bootstraps, all phylogroup node supports were 100%, and more generally, 76.7% (287/374) of internal node supports were 100%, and 87.4% (327/374) were at least 95% (see “Methods”). Moreover, the Robinson-Foulds distance between the ML tree and consensus tree was 4, indicating nearly identical topology.

**Fig. 2: Intrinsic expression of *bla*_TEM-1 shapes co-amoxiclav MIC across the *E. coli* phylogeny.**

Briefly, the expression linear mixed model employed Markov Chain Monte Carlo (MCMC) to estimate parameters. The response variable bla_TEM-1 ΔCt (normalised and 95th percentile truncated) was related to the fixed effects (i) bla_TEM-1 cell copy number (normalised), (ii) presence of the C32T promoter mutation, (iii) presence of the G175A mutation, and (iv) their interaction. Random effects were incorporated to account for qPCR replicates and phylogenetic relationships between isolates. See Supplementary Information for model specification, outputs and diagnostics.

In decreasing order of effect size, C32T, G175A, and a one-unit increase in contig copy number all increased expression (decreased ΔCt; Table 2). There was no additional effect of G175A if C32T was also present (|−1.69| < |−1.71|). After accounting for these covariates, we still identified a contribution from isolate phylogeny. The posterior for contribution of variance from phylogeny demonstrated a long right tail (mean = 0.07; 95% highest posterior density, HPD = [0.00, 0.21]; see “Methods”), suggestive of heterogeneity in phylogenetic signal, where deeper splits between major lineages may explain disproportionately large differences in expression. For qPCR replicates, the contribution of variance exhibited minimal skew (mean = 0.15; 95% HPD = [0.08, 0.23]). To investigate this further, we computed the posterior mean and 95% HPD credible interval for each tip in the phylogeny (Fig. 2c). Compared to the average across the E. coli phylogeny, some phylogroups (B1, C) and STs (12) were associated with increased bla_TEM-1 expression, whilst some phylogroups (E, F) and STs (372) were associated with decreased bla_TEM-1 expression.

Table 2 Parameter estimates for bla_TEM-1 delta cycle threshold (ΔCt) genotype-phenotype model

ampC gene variation is highly concordant with E. coli lineage

In many Gram-negative species, chromosomal ampC is regulated by the transcriptional activator AmpR and is inducible in the presence of beta-lactams. However, E. coli lacks ampR and therefore expresses ampC constitutively; overproduction depends on mutations in the promoter or attenuator regions. At the time of writing, the beta-lactamase database (BLDB) contains n = 4915 non-synonymous variants of the gene²⁸.

To quantify how well ampC variants agree with phylogroup and ST, we calculated the homogeneity (h) and completeness (c; both range from 0 to 1; see “Methods”). Briefly, h = 1 means that a phylogroup or ST contains a single ampC variant. Conversely, c = 1 means that all instances of an ampC variant fall within the same phylogroup or ST. For phylogroups, we found h = 0.489 and c = 0.964, and for STs (excluding 38/377, which were unassigned), h = 0.938 and c = 0.877. This suggests that phylogroups tend to contain distinct ampC variants, which are generally ST-specific, and that E. coli phylogeny is therefore a suitable proxy for ampC variation.
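
The two scores can be illustrated with a short sketch. It assumes the common entropy-based definitions (h = 1 − H(variant | group)/H(variant) and c = 1 − H(group | variant)/H(group)), which match the verbal description above but may differ in detail from the implementation used in the study. The labels in the toy example are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given) for two equal-length label lists."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness(groups, variants):
    """groups: phylogroup or ST per isolate; variants: ampC variant per isolate."""
    h_variants = entropy(variants)
    h_groups = entropy(groups)
    h = 1.0 if h_variants == 0 else 1.0 - conditional_entropy(variants, groups) / h_variants
    c = 1.0 if h_groups == 0 else 1.0 - conditional_entropy(groups, variants) / h_groups
    return h, c


# Toy example: group "A" holds two variants, yet every variant
# is confined to a single group
groups = ["A", "A", "B", "B"]
variants = ["v1", "v2", "v3", "v3"]
h, c = homogeneity_completeness(groups, variants)
print(f"h = {h:.3f}, c = {c:.3f}")
```

The toy data mirror the phylogroup-level pattern reported above: completeness is perfect because no variant spans two groups, whereas homogeneity is reduced because one group contains more than one variant.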

Whilst many E. coli ampC variants present a narrow spectrum of hydrolytic activity, some can potentially hydrolyse third-generation cephalosporins following mutations in the promoter sequence. To explore promoter variation, we aligned all n = 377 ampC promoter sequences. Mutations outside positions −42 to +37 (according to Jaurin numbering¹⁴) were disregarded based on existing characterisations^29,30. In total, n = 12 ampC promoter SNVs were identified, with variation dominated by the E. coli K12 wildtype at 47% (177/377). Table 3 documents all n = 11 mutations identified. A given ampC variant associated almost uniquely with an ampC promoter variant, yet ampC promoter variants were associated with multiple ampC variants (h = 0.483 and c = 0.941).

Table 3 Variation in n = 377 ampC promoters

E. coli phylogeny drives co-amoxiclav resistance through expression

We next investigated whether the E. coli lineages with intrinsically higher bla_TEM-1 expression also had intrinsically higher co-amoxiclav MICs. This would be consistent with lineage differences in regulatory regions and epigenetic interactions driving increased resistance.

We employed MCMC to estimate parameters in an ordinal mixed model. The response variable isolate co-amoxiclav MIC (μg/mL; levels ≤2/2, 4/2, 8/2, 16/2, 32/2, >32/2) was predicted by the fixed effects (i) bla_TEM-1 cell copy number (normalised and 95th percentile truncated), (ii) bla_TEM-1 genome copy number (> 1 vs. 1), (iii) non-wildtype bla_TEM-1 promoter SNVs, and (iv) non-wildtype ampC promoter SNVs. For the model, we only used isolates for which every bla_TEM-1 gene was linked to a promoter, and all the promoters were the same variant. We then filtered out isolates with bla_TEM-1 promoter and ampC promoter variants that appeared fewer than 10 times. This left n = 292/377 isolates. Full model specification, convergence diagnostics, and outputs are given in Supplementary Information.

In decreasing order of effect size, the presence of C32T and G175A in the bla_TEM-1 promoter, the presence of just C32T, bla_TEM-1 cell copy number and bla_TEM-1 genome copy number all increased co-amoxiclav MIC (Table 4); the remaining effects were compatible with chance. As with the expression model, the posterior distribution for contribution of variance from phylogeny demonstrated a long right tail (mean = 2.80; 95% HPD = [0.72, 5.16]). The phylogeny for the n = 292 isolates with co-amoxiclav MIC tip effects is given in Fig. S6.

Table 4 Parameter estimates for co-amoxiclav minimum inhibitory concentration (MIC) genotype-phenotype model

We found that phylogenetic tip effects alone were sufficient to shift category membership across the EUCAST resistance breakpoint (> 8/2 μg/mL). Fixing all other covariates at the midpoint of their latent categories, 19% (55/292) of isolates had a greater than 50% posterior probability of being shifted from the susceptible to the resistant category due to their phylogenetic effect alone. Of these, most were represented by phylogroup B2 and D at 64% (35/55) and 20% (11/55), respectively. The most represented sequence types were ST12, ST127 and ST69 at 27% (15/55), 20% (11/55) and 18% (10/55), respectively. The isolates in these sequence types also used in the expression model are highlighted in Fig. 2d.

Lastly, we developed a combined model to test whether the phylogenetic influence on bla_TEM-1 expression causally drives variation in co-amoxiclav MIC (see Supplementary Information). Here, we only used the predictors identified as significant from previous expression and MIC models (bla_TEM-1 cell and genome copy numbers, and bla_TEM-1 promoter SNV). Briefly, the model estimates a parameter that scales the phylogenetic and non-phylogenetic random effects from expression to MIC. Under causality, the scaling parameter should be constant across all random effect terms. We found the scaling parameter had a posterior mean = −1.13 (95% HPD = [−1.49, −0.72]; p_MCMC = 0.002; values taken from chain 1). This supports a direct and substantial influence of expression on MIC, mediated by phylogenetic relationships.

Intergenic architecture and dynamics vary between E. coli phylogroups

The results from our modelling suggest that lineage-specific regulatory systems play a role in the expression of bla_TEM-1, which in turn modulates resistance. The intergenic regions (IGRs) in bacterial chromosomes contain their regulatory systems, with one study finding that 86% of IGRs in a strain of E. coli were transcriptionally active³¹. Like genes, IGRs are subject to gene flow within bacterial populations. Yet, how closely these dynamics mirror one another remains unclear. Understanding whether IGRs evolve more rapidly than coding sequences can shed light on how bacterial populations adapt and diversify not just at the level of the genes they carry, but in how they control the expression of those genes, ultimately affecting their phenotypic traits.

We began with a traditional pangenome analysis for all n = 377 chromosomes (see “Methods”). Here, we identified n = 15,781 gene clusters, with 5.0% (782/15,781) present in all isolates, and singletons (one member) and doubletons (two members) comprising 22.7% (3576/15,781) and 9.0% (1413/15,781), respectively. We then performed a similar analysis, but for IGR clusters (see “Methods”). In total, we identified n = 33,345 IGR clusters, of which 1.0% (322/33,345) were present in all isolates, consistent with a small core IGR system. Most IGR clusters were singletons at 44.7% (14,903/33,345) or doubletons at 11.9% (3977/33,345). We then visualised the top 10,000 IGR clusters against our E. coli phylogeny in Fig. 3a. This indicated phylogroup- and sequence type-specific patterns in intergenic sequence.

**Fig. 3: Intergenic region content and dynamics vary across the *E. coli* phylogeny.**

We next wanted to understand whether IGR diversity outpaced coding-sequence diversity. To do this, we built cluster accumulation curves, randomising genome-addition order 100 times per phylogroup, then plotting the cumulative number of clusters discovered versus the number of genomes sampled. We performed this for all phylogroups with at least 20 members (A, B1, B2 and D) in two separate runs: for all cluster sizes, and clusters with at least two members to account for possible assembly and bioinformatic errors. The averaged curves are presented in Fig. 3b. When considering clusters of all sizes, the IGR curves began below the gene curves, reflecting rapid discovery of moderately common accessory genes, but all overtook them after a period of sampling. This crossover marks the sampling depth at which IGR novelty becomes the dominant source of new information.
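
The accumulation procedure described above can be sketched as follows. This is an illustrative reconstruction rather than the study's pipeline: each genome is represented as a set of cluster identifiers, the function names and toy data are hypothetical, and "clusters with at least two members" is approximated here as clusters present in at least two genomes.

```python
import random
from collections import Counter


def mean_accumulation_curve(genomes, n_permutations=100, seed=0):
    """genomes: list of sets of cluster IDs.
    Returns the mean cumulative number of distinct clusters at each
    sampling depth, averaged over random genome-addition orders."""
    rng = random.Random(seed)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for depth, clusters in enumerate(order):
            seen |= clusters
            totals[depth] += len(seen)
    return [t / n_permutations for t in totals]


def drop_rare_clusters(genomes, min_genomes=2):
    """Keep only clusters found in at least `min_genomes` genomes."""
    counts = Counter(cluster for g in genomes for cluster in g)
    return [{c for c in g if counts[c] >= min_genomes} for g in genomes]


# Toy input: four genomes described by hypothetical cluster IDs
toy_genomes = [
    {"c1", "c2", "c3"},
    {"c1", "c2", "c4"},
    {"c1", "c5"},
    {"c1", "c2", "c3", "c6"},
]
curve_all = mean_accumulation_curve(toy_genomes)
curve_shared = mean_accumulation_curve(drop_rare_clusters(toy_genomes))
print(curve_all, curve_shared)
```

Running the same function on gene clusters and on IGR clusters for one phylogroup gives the pair of curves whose crossover is discussed above. Fixing the seed keeps the averaged curves reproducible.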

Fitting Heaps’ law to the mean curves provided a concise measure of openness (see Fig. 3b and Table 5). Briefly, α describes the rate at which diversity accumulates as more samples are taken, where a higher α indicates faster accumulation. We found that α was consistently higher for IGR curves than for gene curves, and phylogroup A exhibited the highest α, indicative of the most open IGR system. To ensure these patterns were not driven by unequal sampling, we regressed α against both phylogroup sample size and cluster type (IGR vs gene). In this model (adjusted R² = 0.73), sample size had no significant effect on α (β = −1.2e-04, p-value = 0.48), whereas IGRs showed a highly significant positive shift (β = 0.137, p-value = 2.2e-05). Thus, the faster accumulation of IGR diversity truly reflects biological differences in chromosomal IGR dynamics, not sampling artifacts.
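
A Heaps' law fit of this kind can be sketched as below. The sketch assumes the cumulative form G(n) = K·n^α, which is consistent with the statement that a higher α indicates faster accumulation, and estimates α by ordinary least squares on the log-log scale. The exact parameterisation and fitting method used in the study are not given in this section, so both are assumptions, and the synthetic curve is purely illustrative.

```python
from math import exp, log


def fit_heaps(curve):
    """Fit log G(n) = log K + alpha * log n by least squares.
    curve[i] is the mean cumulative cluster count after i + 1 genomes."""
    xs = [log(n) for n in range(1, len(curve) + 1)]
    ys = [log(g) for g in curve]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    k = exp(y_bar - alpha * x_bar)
    return k, alpha


# Synthetic curve generated from known parameters (K = 4000, alpha = 0.3)
synthetic_curve = [4000 * n ** 0.3 for n in range(1, 51)]
k, alpha = fit_heaps(synthetic_curve)
print(f"K = {k:.1f}, alpha = {alpha:.3f}")
```

On real accumulation curves the points are not independent, so the fitted α is best read as a descriptive summary of openness rather than as a parameter with a simple standard error.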

Table 5 Heaps’ law model alpha (α) estimates for gene and intergenic region (IGR) cluster analysis

Discussion

In our dataset of clinical E. coli, bla_TEM-1 was overwhelmingly carried by conjugative plasmids. This means it can spread between bacterial hosts and different genetic backgrounds. We demonstrated that different bacterial hosts intrinsically vary in their ability to express bla_TEM-1 when accounting for variation in promoters (C32T and G175A mutations) and contig copy number. Moreover, our findings suggest that some clinically successful lineages (e.g., ST12) are better at expressing bla_TEM-1 than less clinically successful lineages (e.g., phylogroup F). With a second model, we also found that different E. coli lineages vary intrinsically in co-amoxiclav MIC (accounting for bla_TEM-1 genome and cell copies, and bla_TEM-1 and ampC promoter variants). Again, we observed that some clinically successful lineages (e.g., ST12) had higher resistance than less clinically successful lineages (e.g., phylogroup F). We also quantified that the clinically successful sequence types ST12, ST69 and ST127 had the highest probability of flipping an isolate from the sensitive to the resistant category. A third model demonstrated that these two traits were causally linked: E. coli phylogeny drives co-amoxiclav resistance through variable expression of bla_TEM-1. This underscores the necessity of fully resolving bacterial genomes to incorporate accurate genetic, genomic, and phylogenetic information in resistance prediction models. Future work could include evaluations of single amino acid substitutions in TEM-1 (that hydrolyse third-generation cephalosporins and carbapenems²⁸), which are typically carried in more complex genetic backgrounds.

This study has limitations. Firstly, it is possible that, due to fragmented plasmid assemblies, some isolates identified as having multiple copies of bla_TEM-1 on multiple plasmids instead had multiple copies on the same plasmid. Nonetheless, our expression analysis only considered isolates with a single copy of bla_TEM-1 in the genome, mitigating this concern. Secondly, we only examined expression for a subsample of our isolates due to resource limitations. Thirdly, whilst there is not an agreed upon standard reference gene for quantifying beta-lactamase expression^32,33,34, previous work has shown 16S to be stable³³. Crucially, our delta Ct values were consistent within isolates. Fourthly, we only observed two potentially relevant porin mutations (a premature stop codon in OmpC on OXEC-40’s chromosome and in OmpF on OXEC-423’s chromosome), limiting our ability to investigate their effects on phenotype. We also identified no relevant efflux pumps beyond AcrEF-TolC (found in every isolate), and it was out of the scope of the study to explore their functionality. Fifthly, we found that tip effects did not strictly align with phylogenetic structure. We used a single fixed phylogeny and a single global phylogenetic-variance parameter, which does not capture uncertainty in tree topology or clade-specific evolutionary rates. Consequently, tips within densely sampled clades can be over-shrunk toward their mean, while tips in sparse clades can be under-shrunk. Future work should incorporate phylogenetic uncertainty (e.g., sampling from a tree posterior) and explore multi-variance or partitioned phylogenetic models to assess the impact of tree balance on tip-effect estimation. Sixthly, modelling phylogenetic effect terms for all the plasmid replicons in the dataset would be computationally prohibitive and statistically unstable. 
Whilst future studies could focus on curating datasets with less plasmid diversity, this would not be reflective of true clinical bacterial populations. Penultimately, automated susceptibility testing methods, like the BD Phoenix™ used here, may not agree completely with reference methods, yet previous work has shown strong agreement with the EUCAST agar dilution method³⁵. Lastly, plasmid copy number is not static. Moreover, in the presence of antibiotics, it has been demonstrated that resistance gene-carrying plasmids can increase their copy number to increase the chance of survival³⁶. Our point estimates of plasmid copy number were derived from genome assemblies sequenced in the absence of antibiotics, which likely represent a lower bound. Nonetheless, we found a strong signal for the importance of plasmid copy number in resistance, even if plasmid copy number potentially increased within isolates during susceptibility testing.

We posit that lineage-specific regulatory systems in E. coli, shaped by horizontal gene flow, may house the key modulators of bla_TEM-1 expression. Although comprehensive dissection of these elements (for example, through ChIP-seq) extends beyond the scope of this study, our findings suggest that future efforts directed at mapping trans-acting factors, small RNAs, differential DNA methylation and nucleoid-associated protein binding across phylogroups will be essential. Additionally, examining other resistance genes and their expression patterns in a similar phylogenetic framework could provide a broader understanding and prediction of resistance mechanisms across different bacterial species and antibiotics. Our study demonstrates that some clinically successful lineages are better at expressing bla_TEM-1, and have a higher probability of flipping from sensitive to resistant. We speculate that, since bla_TEM-1 is both widespread in clinical bacterial populations and its spread is plasmid mediated, being able to better control the expression of bla_TEM-1 when acquired is a selective advantage for clinical isolates.

Methods

Ethics oversight

The use of genotypic and phenotypic data from these isolates is covered by ethical permissions (London—Queen Square Research Ethics Committee, REC ref. 17/LO/1420). Isolates were a subset of those evaluated in a previous study³⁷, for which data linkage with patient data/antibiotic susceptibility test data was enabled through the Infections in Oxfordshire Research Database (IORD). This database has generic Research Ethics Committee, Health Research Authority and Confidentiality Advisory Group approvals (19/SC/0403, 19/CAG/0144) which facilitate the pseudo-anonymised linkage of routinely collected NHS electronic healthcare record data from the Oxford University Hospitals NHS Foundation Trust Clinical Systems Data Warehouse and research data (e.g., sequencing data) from the Modernising Microbiology and Big Infection Diagnostics Theme of the Oxford NIHR Biomedical Research Centre, Oxford. IORD links records by a specific, random, number ensuring that no patient-identifiable information is shared with researchers using this resource.

Isolate selection

We considered n = 548 candidate E. coli bacteraemia isolates cultured from patients presenting to Oxford University Hospitals NHS Foundation Trust between 2013 and 2018, and selected from a larger study of systematically sequenced isolates based on screening their short-read only assemblies with NCBIAMRFinder (v. 3.11.2) for bla_TEM-1, and the absence of other beta-lactamases³⁸.

DNA extraction and sequencing

Sub-cultures of isolate stocks, stored at −80 °C in 10% glycerol nutrient broth, were grown on Columbia blood agar (CBA) overnight at 37 °C. DNA was extracted using the EasyMag system (bioMérieux) and quantified using the Broad Range DNA Qubit kit (Thermo Fisher Scientific, UK). DNA extracts were multiplexed as 24 samples per sequencing run using the Oxford Nanopore Technologies (ONT) Rapid Barcoding kit (SQK-RBK110.96) according to the manufacturer’s protocol. Sequencing was performed on a GridION using FLO-MIN106 (R9.4.1) flow cells with MinKNOW software (v. 21.11.7) and basecalled using Guppy (v. 3.84). Short-read sequencing was performed on the Illumina HiSeq 4000, pooling 192 isolates per lane and generating 150 bp paired-end reads³⁷.

Dataset curation and genome assembly

Full details are given in Supplementary Information. Briefly, short- and long-read quality control used fastp (v. 0.23.4) and filtlong (v. 0.2.1), respectively³⁹,⁴⁰. We also used Rasusa (v. 0.7.1) on n = 3/548 long-read sets due to memory constraints⁴¹. Genome assembly used Flye (v. 2.9.2-b1786) with bwa (v. 0.7.17-r1188) and Polypolish (v. 0.5.0), and Unicycler (v. 0.5.0), which used SPAdes (v. 3.15.5), miniasm (v. 0.3-r179) and Racon (v. 1.5.0)⁴²,⁴³,⁴⁴,⁴⁵,⁴⁶,⁴⁷,⁴⁸. Plasmid contig validation used Mash screen (v. 2.3) with PLSDB (v. 2023_06_23_v2)⁴⁹,⁵⁰. All assemblies were annotated with NCBIAMRFinder (v. 3.11.26 and database v. 2023-11-15.1)³⁸. In parallel, we validated the presence of bla_TEM-1 using tblastn (v. 2.15.0+) with the NCBI Reference Gene Catalog TEM-1 RefSeq protein WP_000027057.1, requiring 100% amino acid identity⁵¹. Following genome assembly, we removed n = 171/548 isolates, either because (i) the chromosome did not circularise (116/171), (ii) the isolate carried a non-bla_TEM-1 bla_TEM variant and/or an additional acquired beta-lactamase (54/171), or (iii) the chromosome was too short, consistent with misassembly (~3.5 Mbp; 1/171). This left a final dataset of n = 377 isolates.

Antibiotic susceptibility testing

Antibiotic susceptibility testing was performed using the BD Phoenix™ system in accordance with the manufacturer’s instructions, generating MICs for co-amoxiclav.

Generation of cDNA template

RNA extraction and DNase treatment were performed on replicates of each isolate (n = 3 biological/n = 3 technical) as described previously⁵². RNA was quantified post DNase treatment using the Broad Range RNA Qubit kit (Thermo Fisher Scientific, UK), normalised to 1 μg and reverse transcribed to cDNA using SuperScript IV VILO (Thermo Fisher Scientific, UK) under the following conditions: 25 °C for 10 min, 42 °C for 60 min and 85 °C for 5 min.

qPCR quantification of bla_TEM-1 expression

bla_TEM-1 expression was quantified in a selection of isolates: initially n = 35 isolates in triplicate, referred to as batch 1; then a further n = 48 isolates in duplicate, referred to as batch 2. Batch 1 isolates were randomly selected by MIC (n = 2 MIC ≤2/2, n = 5 MIC 4/2, n = 9 MIC 8/2, n = 10 MIC 16/2, n = 4 MIC 32/2, n = 5 MIC > 32/2). Batch 2 was enriched for specific bla_TEM-1 promoter mutations by selecting all isolates with a single bla_TEM-1 gene with C32T (with or without a G146A mutation) that had not already been tested, and then randomly selecting from other wildtype and G146A, single bla_TEM-1 gene isolates. For all qPCR reactions, E. coli cDNA was normalised to 1 ng and amplified in a duplex qPCR reaction targeting bla_TEM-1 and 16S. qPCR standard curves were prepared for both bla_TEM-1 (GenBank accession: DQ221255.1) and 16S (GenBank accession: LC747145.1) sequences cloned into pMX vectors (Thermo Fisher Scientific, UK). Tenfold dilutions of linearised plasmids (1–1 × 10⁷ copies/reaction) were used as a standard curve for each experiment. Both curves were linear in the range tested (16S: R² > 0.991; TEM-1: R² > 0.91). The slopes of the standard curves for 16S and bla_TEM-1 were −3.607 and −3.522, respectively. qPCR was performed using a custom 20 μl TaqMan gene expression assay consisting of TaqMan™ Multiplex Master Mix, TaqMan unlabelled primers and a TaqMan probe with dye label (FAM for TEM-1 and VIC for 16S), carried out on the QuantStudio™ 5 real-time PCR system (Thermo Fisher Scientific, UK). Cycling conditions were 95 °C for 20 s, followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s, with Mustang Purple as the passive reference. For batch 1, triplicate samples were analysed and standardised against 16S rRNA gene expression. Triplicate reactions for each isolate demonstrated good reproducibility for batch 1 (Fig. S7). Of note, for isolate OXEC-75, TEM-1 expression was very low, and 5 reactions (1 technical replicate for biological replicate 1, 1 technical replicate for biological replicate 2, and all 3 technical replicates for biological replicate 3) failed to amplify any product. Due to resource constraints, we reduced replicates for batch 2 (n = 1 biological/n = 2 technical; Fig. S8). To reduce model complexity, we omitted the batch 1 isolates (n = 16/35) that carried more than one copy of bla_TEM-1 in the genome, leaving a total of n = 67 isolates. ΔCt values were calculated by subtracting mean 16S Ct from mean TEM-1 Ct.
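The ΔCt calculation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function names and the example Ct values are hypothetical. The efficiency helper uses the standard relationship E = 10^(−1/slope) − 1 between a standard-curve slope and amplification efficiency, which the Methods do not report and which is shown here only to put the stated slopes in context.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """Mean target Ct minus mean reference Ct.

    Here the target would be bla_TEM-1 and the reference 16S.
    A lower value indicates higher expression relative to the reference.
    """
    if not target_ct or not reference_ct:
        raise ValueError("need at least one Ct value per gene")
    return mean(target_ct) - mean(reference_ct)


def amplification_efficiency(slope):
    """Standard qPCR efficiency from a standard-curve slope (Ct vs log10 copies).

    A slope of about -3.32 corresponds to perfect doubling (efficiency 1.0).
    This calculation is an addition for illustration, not a reported result.
    """
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical Ct values for one isolate (not taken from the study data).
    tem1 = [24.1, 24.3, 24.0]
    s16 = [12.2, 12.1, 12.3]
    print(f"delta Ct = {delta_ct(tem1, s16):.2f}")

    # Slopes as stated in the Methods.
    for gene, slope in (("16S", -3.607), ("bla_TEM-1", -3.522)):
        print(f"{gene}: efficiency ~ {amplification_efficiency(slope):.3f}")
```

Under this formula, the reported slopes would correspond to efficiencies of roughly 0.89 (16S) and 0.92 (bla_TEM-1).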

Assembly annotations

We annotated the chromosomes using Prokka (v. 1.14.6) with default parameters except --centre X --compliant (see annotate.sh)⁵³. Abricate (v. 1.0.1) was used with default parameters and the PlasmidFinder database (v. 2023-Nov-4) to annotate for plasmid replicons⁵⁴,⁵⁵. Plasmid mobilities were predicted using MOB-suite’s MOB-typer (v. 3.1.4) with default parameters⁵⁵. Briefly, a plasmid was labelled as putatively conjugative if it had both a relaxase and a mating pair formation (MPF) complex, mobilisable if it had either a relaxase or an origin of transfer (oriT) but no MPF, and non-mobilisable if it had neither a relaxase nor an oriT. Lastly, we assigned sequence types (STs) and phylogroups to our E. coli chromosomes using mlst (v. 2.23.0) and EzClermont (v. 0.7.0), respectively, both with default parameters²⁴,⁵⁶. We identified bla_TEM-1 promoters using blastn (v. 2.15.0+) with a custom database of known bla_TEM-1 promoters¹³,²⁵,⁵⁷. Due to the high similarity between the P3, Pa/Pb, P4 and P5 reference sequences, we chose the top hit in each position.
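The mobility labelling rule summarised above can be written as a small decision function. The sketch below is a hypothetical Python restatement of the rule as worded in the text, not MOB-typer's actual code; the function name and boolean arguments are assumptions for illustration.

```python
def classify_mobility(has_relaxase: bool, has_mpf: bool, has_orit: bool) -> str:
    """Label a plasmid following the rule described in the Methods.

    - conjugative: relaxase AND mating pair formation (MPF) complex
    - mobilisable: relaxase OR oriT, without the full conjugative set
    - non-mobilisable: neither relaxase nor oriT
    """
    if has_relaxase and has_mpf:
        return "conjugative"
    if has_relaxase or has_orit:
        # Assumption: an oriT plus MPF but no relaxase is treated as
        # mobilisable here; the text does not spell out this edge case.
        return "mobilisable"
    return "non-mobilisable"


if __name__ == "__main__":
    # Hypothetical feature calls for three plasmids.
    examples = {
        "plasmid_A": (True, True, True),
        "plasmid_B": (False, False, True),
        "plasmid_C": (False, False, False),
    }
    for name, features in examples.items():
        print(name, classify_mobility(*features))
```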

SNV analysis

We first determined the sets of sequences we wanted to align: (i) bla_TEM-1 (n = 451; some genomes carried multiple copies), (ii) bla_TEM-1 promoters (n = 409), (iii) ampC (n = 377) and (iv) ampC promoters (n = 377). For bla_TEM-1 and ampC, we extracted the relevant sequences using the coordinate and strand information from the NCBIAMRFinder output (see extractGene.py). For the bla_TEM-1 promoters, we used coordinate and strand information from the earlier blastn results. For the ampC promoters, we took the sequence 200 bp upstream of the ampC gene then manually excised the −42 to +37 region in AliView⁵⁸. Sets of sequences were aligned using MAFFT (v. 7.520) with default parameters except --auto⁵⁹. Variable sites were examined using snp-sites (v. 2.5.1) with default parameters and in -v mode⁶⁰.
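Extracting a gene or its upstream region from coordinate and strand information can be sketched as follows. This is an illustrative Python example rather than the repository's extractGene.py: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the contig is treated as linear (no wrap-around for circular replicons).

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_feature(contig: str, start: int, stop: int, strand: str) -> str:
    """Extract a feature given 1-based inclusive coordinates and a strand."""
    if not 1 <= start <= stop <= len(contig):
        raise ValueError("coordinates fall outside the contig")
    sub = contig[start - 1:stop]
    return reverse_complement(sub) if strand == "-" else sub


def upstream_region(contig: str, start: int, stop: int, strand: str,
                    length: int = 200) -> str:
    """Return up to `length` bp upstream of a feature, in the feature's
    own 5'->3' orientation. Truncated at contig ends (linear assumption)."""
    if strand == "+":
        lo = max(0, start - 1 - length)
        return contig[lo:start - 1]
    hi = min(len(contig), stop + length)
    return reverse_complement(contig[stop:hi])


if __name__ == "__main__":
    toy_contig = "ACGTTGCAAT"  # toy sequence, not real data
    print(extract_feature(toy_contig, 2, 5, "+"))
    print(extract_feature(toy_contig, 2, 5, "-"))
    print(upstream_region(toy_contig, 2, 5, "-", length=3))
```

For a minus-strand gene, the upstream region lies at higher coordinates on the contig, which is why the helper reads past `stop` and then reverse-complements.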

Contig copy number

We used BWA (v. 0.7.17-r1188) to map the quality-controlled short-reads to each contig, then SAMtools (v. 1.18) for subsequent processing (see copyNumber.sh)⁴³,⁶¹. For each contig, we calculated the mean depth over its length, then within each assembly, normalised by the mean depth of the chromosome.
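The normalisation step can be illustrated with a short Python sketch. It assumes per-base depths have already been obtained for each contig (for example from a depth table produced after mapping); the function names, input structure and toy values are hypothetical and do not reproduce copyNumber.sh.

```python
def mean_depth(depths):
    """Mean per-base read depth over a contig."""
    if not depths:
        raise ValueError("contig has no depth values")
    return sum(depths) / len(depths)


def contig_copy_numbers(depth_by_contig, chromosome):
    """Normalise each contig's mean depth by the chromosome's mean depth.

    depth_by_contig: dict mapping contig name -> list of per-base depths
    chromosome: name of the chromosomal contig within the same assembly
    """
    chrom_depth = mean_depth(depth_by_contig[chromosome])
    if chrom_depth == 0:
        raise ValueError("chromosome has zero mean depth")
    return {
        name: mean_depth(depths) / chrom_depth
        for name, depths in depth_by_contig.items()
    }


if __name__ == "__main__":
    # Toy depths for one assembly (not study data).
    assembly = {
        "chromosome": [50, 50, 50, 50],
        "plasmid_1": [100, 150, 200],
    }
    print(contig_copy_numbers(assembly, "chromosome"))
```

In this toy example the plasmid has a mean depth of 150 against a chromosomal mean of 50, giving an estimated copy number of 3 per chromosome.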

Chromosomal core gene phylogeny

Building the chromosomal phylogeny involved four main steps: annotating the chromosomes, identifying the core genes, aligning them and building a phylogeny. Initially, all the chromosomes carried a copy of ampC, meaning it was a core gene and would be included in the phylogeny. Since we wanted to manually verify EzClermont phylogroup classifications with the phylogeny and then compare phylogroups to the distribution of ampC gene variants, we excised the ampC sequence from all the chromosomes beforehand to avoid confounding our analysis (see removeGene.py). To identify the core genes (those with ≥98% frequency in the sample), we used Panaroo (v. 1.4.2) with default parameters except --clean-mode sensitive --aligner mafft -a core --core_threshold 0.98⁶². Panaroo also aligned our core genes using MAFFT (v. 7.520; see runPanaroo.sh)⁵⁹. Lastly, we built the core gene maximum-likelihood phylogeny using IQ-Tree (v. 2.3.0) with default parameters except -m GTR+F+I+R4 -keep-ident -B 1000 -mem 10G using -s core_gene_alignment_filtered.aln from Panaroo (see runIQTREE.sh)⁶³. The substitution model used was general time reversible (GTR) using empirical base frequencies from the alignment (F), allowing for invariant sites (I) and variable rates of substitution (R4).
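The core-gene definition used above (presence in at least 98% of isolates) can be illustrated with a minimal Python sketch. It is a simplified stand-in for what a pangenome tool does internally, not Panaroo's implementation; the function name, input structure and toy data are hypothetical.

```python
def core_genes(presence, n_isolates, threshold=0.98):
    """Return gene clusters present in at least `threshold` of isolates.

    presence: dict mapping gene cluster name -> set of isolate identifiers
    n_isolates: total number of isolates in the sample
    """
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    return sorted(
        gene
        for gene, isolates in presence.items()
        if len(isolates) / n_isolates >= threshold
    )


if __name__ == "__main__":
    # Toy presence/absence data for 100 hypothetical isolates.
    toy = {
        "gene_a": set(range(100)),  # present in all isolates
        "gene_b": set(range(99)),   # present in 99%
        "gene_c": set(range(97)),   # present in 97%, so not core
    }
    print(core_genes(toy, n_isolates=100))
```

Read this way, a 0.98 threshold over 377 isolates would require a gene to be present in at least 370 of them, although the exact counting inside Panaroo may differ.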

Intergenic region analysis

For the gene cluster analysis, we used Panaroo (v. 1.5.1) with default parameters except --clean-mode sensitive --merge_paralogs⁶². Running Panaroo with paralog merging is necessary for Piggy, and reduced the overall pangenome size from the previous run. For the intergenic region cluster analysis, we used Piggy (v. 1.5) with default parameters⁶⁴.

Statistical analysis and visualisation

All statistical analysis was performed in R (v. 4.4.0) using RStudio (v. 2024.04.2+764)⁶⁵,⁶⁶. We implemented MCMC generalised linear mixed models using the MCMCglmm library in R⁶⁷. Model specifications, convergence diagnostics, parameter estimations and outputs are reported in Supplementary Information; see modelExpression.R, modelMIC.R and modelCombined.R to reproduce the bla_TEM-1 expression, co-amoxiclav MIC and causal models, respectively. Homogeneity and completeness are defined in Rosenberg and Hirschberg (2007) and were also implemented in R⁶⁸. A 95% highest posterior density (HPD) credible interval is the narrowest interval \([a, b]\) for which \(F(b)-F(a)=0.95\), where \(F\) is the empirical cumulative distribution function of the posterior. Figures were plotted with the ggplot2 library⁶⁹. See buildResults.R and igr_analysis.R to reproduce all statistics and figures in the manuscript.
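To make the HPD definition concrete, the following Python sketch finds the narrowest interval containing a given fraction of posterior samples by sliding a window over the sorted draws. It is a generic illustration under that assumption, not the routine used by the R packages in this study, and the function name and toy samples are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples.

    Sorts the draws, then compares every window of k consecutive draws
    (k = ceil(mass * n)) and keeps the one with the smallest width.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples supplied")
    # Small tolerance guards against floating-point error in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


if __name__ == "__main__":
    # Toy right-skewed "posterior" draws (not study output).
    draws = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(hpd_interval(draws, mass=0.9))
```

For a skewed posterior, this interval can differ noticeably from an equal-tailed interval, because it excludes the long tail rather than trimming both ends equally.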

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Metadata for all n = 377 genomes included in the final analysis is given in Supplementary Data File 1. Metadata for all n = 451 bla_TEM-1 annotations identified in these genomes is given in Supplementary Data File 2. qPCR expression data for all replicates is given in Supplementary Data File 3. NCBI accessions for short- and long-read sets and assemblies are given in Supplementary Data File 4. All Supplementary Data Files are also stably archived at https://doi.org/10.5281/zenodo.16731807.

Code availability

All scripts referenced in the Methods can be found in the GitHub repository https://github.com/wtmatlock/tem, which is stably archived at https://doi.org/10.5281/zenodo.16731807.

References

1. Datta, N. & Kontomichalou, P. Penicillinase synthesis controlled by infectious R factors in Enterobacteriaceae. Nature 208, 239–241 (1965).
2. Partridge, S. R. & Hall, R. M. Evolution of transposons containing bla_TEM genes. Antimicrob. Agents Chemother. 49, 1267–1268 (2005).
3. Shaw, L. P. & Neher, R. A. Visualizing and quantifying structural diversity around mobile resistance genes. Microb. Genom. 9, 001168 (2023).
4. Bailey, J. K., Pinyon, J. L., Anantham, S. & Hall, R. M. Distribution of the bla_TEM gene and bla_TEM-containing transposons in commensal Escherichia coli. J. Antimicrob. Chemother. 66, 745–751 (2011).
5. Alcock, B. P. et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the comprehensive antibiotic resistance database. Nucleic Acids Res. 51, D690–D699 (2023).
6. Sayers, E. W. et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 52, D33 (2023).
7. Murray, C. J. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655 (2022).
8. Sanderson, T. A web tool for exploring the usage of medicines in hospitals in England. Wellcome Open Res. 9, 147 (2024).
9. EUCAST: clinical breakpoints and dosing of antibiotics. https://www.eucast.org/clinical_breakpoints.
10. Yoon, C. H. et al. Mortality risks associated with empirical antibiotic activity in Escherichia coli bacteraemia: an analysis of electronic health records. J. Antimicrob. Chemother. 77, 2536–2545 (2022).
11. Martinez, J. L. et al. Resistance to beta-lactam/clavulanate. Lancet 330, 1473 (1987).
12. Reguera, J. A., Baquero, F., Perez-Diaz, J. C. & Martinez, J. L. Synergistic effect of dosage and bacterial inoculum in TEM-1 mediated antibiotic resistance. Eur. J. Clin. Microbiol. Infect. Dis. 7, 778–779 (1988).
13. Lartigue, M. F., Leflon-Guibout, V., Poirel, L., Nordmann, P. & Nicolas-Chanoine, M. H. Promoters P3, Pa/Pb, P4, and P5 upstream from bla_TEM genes and their relationship to β-lactam resistance. Antimicrob. Agents Chemother. 46, 4035–4037 (2002).
14. Jaurin, B., Grundström, T. & Normark, S. Sequence elements determining ampC promoter strength in E. coli. EMBO J. 1, 875–881 (1982).
15. Siasat, P. A. & Blair, J. M. A. Microbial primer: multidrug efflux pumps. Microbiology 169, 001370 (2023).
16. Vital, M. et al. Gene expression analysis of E. coli strains provides insights into the role of gene regulation in diversification. ISME J. 9, 1130–1140 (2015).
17. McNally, A. et al. Combined analysis of variation in core, accessory and regulatory genome regions provides a super-resolution view into the evolution of bacterial populations. PLoS Genet. 12, e1006280 (2016).
18. Card, K. J., Thomas, M. D., Graves, J. L., Barrick, J. E. & Lenski, R. E. Genomic evolution of antibiotic resistance is contingent on genetic background following a long-term experiment with Escherichia coli. Proc. Natl. Acad. Sci. USA 118, e2016886118 (2021).
19. Wong, A. Epistasis and the evolution of antimicrobial resistance. Front. Microbiol. 8, 246 (2017).
20. Wang, X., Yu, D. & Chen, L. Antimicrobial resistance and mechanisms of epigenetic regulation. Front. Cell. Infect. Microbiol. 13, 1199646 (2023).
21. Dunn, S., Carrilero, L., Brockhurst, M. & McNally, A. Limited and strain-specific transcriptional and growth responses to acquisition of a multidrug resistance plasmid in genetically diverse Escherichia coli lineages. mSystems 6, 10–1128 (2021).
22. Alonso-del Valle, A. et al. Antimicrobial resistance level and conjugation permissiveness shape plasmid distribution in clinical enterobacteria. Proc. Natl. Acad. Sci. USA 120, e2314135120 (2023).
23. Goussard, S. & Courvalin, P. Updated sequence information for TEM β-lactamase genes. Antimicrob. Agents Chemother. 43, 367–370 (1999).
24. Waters, N. R., Abram, F., Brennan, F., Holmes, A. & Pritchard, L. Easy phylotyping of Escherichia coli via the EzClermont web app and command-line tool. Access Microbiol. 2, e000143 (2020).
25. Robin, F. et al. Evolution of TEM-type enzymes: biochemical and genetic characterization of two new complex mutant TEM enzymes, TEM-151 and TEM-152, from a single patient. Antimicrob. Agents Chemother. 51, 1304–1309 (2007).
26. Sutcliffe, J. G. Nucleotide sequence of the ampicillin resistance gene of Escherichia coli plasmid pBR322. Proc. Natl. Acad. Sci. 75, 3737–3741 (1978).
27. Shaw, L. P. et al. Niche and local geography shape the pangenome of wastewater-and livestock-associated Enterobacteriaceae. Sci. Adv. 7, eabe3868 (2021).
28. Naas, T. et al. Beta-lactamase database (BLDB)–structure and function. J. Enzym. Inhib. Med. Chem. 32, 917–919 (2017).
29. Tracz, D. M. et al. ampC gene expression in promoter mutants of cefoxitin-resistant Escherichia coli clinical isolates. FEMS Microbiol. Lett. 270, 265–271 (2007).
30. Caroff, N., Espaze, E., Bérard, I., Richet, H. & Reynaud, A. Mutations in the ampC promoter of Escherichia coli isolates resistant to oxyiminocephalosporins without extended spectrum β-lactamase production. FEMS Microbiol. Lett. 173, 459–465 (1999).
31. Raghavan, R., Groisman, E. A. & Ochman, H. Genome-wide detection of novel regulatory RNAs in E. coli. Genome Res. 21, 1487–1497 (2011).
32. Rodríguez-Villodres, Á et al. Survival of infection with TEM β-lactamase-producing Escherichia coli with Pan-β-lactam resistance. J. Infect. 89, 106268 (2024).
33. Singh, T. et al. Transcriptome analysis of beta-lactamase genes in diarrheagenic Escherichia coli. Sci. Rep. 9, 3626 (2019).
34. Kjeldsen, T. S. B. et al. CTX-M-1 β-lactamase expression in Escherichia coli is dependent on cefotaxime concentration, growth phase and gene location. J. Antimicrob. Chemother. 70, 62–70 (2015).
35. Davies, T. J. et al. Reconciling the potentially irreconcilable? Genotypic and phenotypic amoxicillin-clavulanate resistance in Escherichia coli. Antimicrob. Agents Chemother. 64, 10–1128 (2020).
36. Hernandez-Beltra, J. C. R. et al. Plasmid-mediated phenotypic noise leads to transient antibiotic resistance in bacteria. Nat. Commun. 15, 2610 (2024).
37. Lipworth, S. et al. Ten-year longitudinal molecular epidemiology study of Escherichia coli and Klebsiella species bloodstream infections in Oxfordshire, UK. Genome Med. 13, 144 (2021).
38. Feldgarden, M. et al. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci. Rep. 11, 12728 (2021).
39. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
40. Wick, R. R. Filtlong. https://github.com/rrwick/Filtlong (2019).
41. Hall, M. Rasusa: Randomly subsample sequencing reads to a specified coverage. J. Open Source Softw. 7, 3941 (2022).
42. Oxford Nanopore Technologies. medaka. https://github.com/nanoporetech/medaka (2018).
43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
44. Wick, R. R. & Holt, K. E. Polypolish: short-read polishing of long-read bacterial genome assemblies. PLoS Comput. Biol. 18, e1009802 (2022).
45. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, e1005595 (2017).
46. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
47. Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).
48. Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
49. Ondov, B. D. et al. Mash screen: high-throughput sequence containment estimation for genome discovery. Genome Biol. 20, 232 (2019).
50. Galata, V., Fehlmann, T., Backes, C. & Keller, A. PLSDB: a resource of complete bacterial plasmids. Nucleic Acids Res. 47, D195–D202 (2019).
51. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
52. Rodger, G. et al. Comparison of direct cDNA and PCR-cDNA Nanopore sequencing of Escherichia coli isolates. Microb. Genom. 10, 001296 (2024).
53. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
54. Carattoli, A. et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob. Agents Chemother. 58, 3895–3903 (2014).
55. Robertson, J. & Nash, J. H. E. MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microb. Genom. 4, e000206 (2018).
56. Seemann, T. mlst. https://github.com/tseemann/mlst (2017).
57. Stoesser, N. et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data. J. Antimicrob. Chemother. 68, 2234–2244 (2013).
58. Larsson, A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30, 3276–3278 (2014).
59. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
60. Page, A. J. et al. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb. Genom. 2, e000056 (2016).
61. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
62. Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 21, 180 (2020).
63. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
64. Thorpe, H. A., Bayliss, S. C., Sheppard, S. K. & Feil, E. J. Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria. Gigascience 7, giy015 (2018).
65. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/ (2025).
66. Posit Team. RStudio: Integrated Development Environment for R. Posit Software, PBC. http://www.posit.co/ (2024).
67. Hadfield, J. D. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J. Stat. Softw. 33, 1–22 (2010).
68. Rosenberg, A. & Hirschberg, J. V-measure: A conditional entropy-based external cluster evaluation measure. In Proc. of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL). (2007).
69. Gómez-Rubio, V. ggplot2-elegant graphics for data analysis. J. Stat. Softw. 77, 1–3 (2017).

Acknowledgements

This work was funded by the National Institute for Health Research (NIHR) Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance (NIHR200915), a partnership between the UK Health Security Agency (UKHSA) and the University of Oxford. It was also supported by the NIHR Oxford Biomedical Research Centre (BRC). The computational aspects of this research were funded from the NIHR Oxford BRC with additional support from the Wellcome Trust Core Award Grant Number 203141/Z/16/Z. The views expressed are those of the author(s) and not necessarily those of the NIHR, UKHSA or the Department of Health and Social Care. The authors thank Jarrod Hadfield for providing guidance on implementing the MCMCglmm library.

Author information

These authors contributed equally: William Matlock, Gillian Rodger.
These authors jointly supervised this work: Samuel Lipworth, Nicole Stoesser.

Authors and Affiliations

Department of Biology, University of Oxford, Oxford, UK
William Matlock
Nuffield Department of Medicine, University of Oxford, Oxford, UK
William Matlock, Gillian Rodger, Emma Pritchard, Aysha Roohi, A. Sarah Walker, Samuel Lipworth & Nicole Stoesser
The National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at the University of Oxford, Oxford, UK
William Matlock, Gillian Rodger, Emma Pritchard, Matthew Colpus, Natalia Kapel, Aysha Roohi, A. Sarah Walker, Samuel Lipworth & Nicole Stoesser
Oxford University Hospitals NHS Foundation Trust, Oxford, UK
Lucinda Barrett, Marcus Morgan, Sarah Oakley, Drosos Karageorgopoulos, Samuel Lipworth & Nicole Stoesser
UK Health Security Agency, Colindale, UK
Katie L. Hopkins
School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK
Matthew B. Avison

Contributions

W.M.: conceptualisation, methodology, software, validation, formal analysis, data curation, writing–original draft, writing–review & editing, visualisation. G.R.: conceptualisation, methodology, validation, investigation, data curation, writing–review & editing. E.P.: methodology, writing–review & editing. M.C.: investigation, writing–review & editing. N.K.: investigation, writing–review & editing. L.B.: resources, writing–review & editing. M.M.: resources, writing–review & editing. S.O.: resources, writing–review & editing. K.L.H.: investigation, writing–review & editing. A.R.: writing–review & editing, project administration. D.K.: writing–review & editing. M.B.A.: writing–review & editing, supervision. A.S.W.: writing–review & editing, supervision, funding acquisition. S.L.: conceptualisation, writing–review & editing, supervision. N.S.: conceptualisation, writing–review & editing, supervision, project administration, funding acquisition.

Corresponding authors

Correspondence to William Matlock or Nicole Stoesser.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Ruichao Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review file

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Matlock, W., Rodger, G., Pritchard, E. et al. Escherichia coli phylogeny drives co-amoxiclav resistance through variable expression of TEM-1 beta-lactamase. Nat Commun 16, 8669 (2025). https://doi.org/10.1038/s41467-025-63714-6

Received: 06 November 2024
Accepted: 27 August 2025
Published: 30 September 2025
DOI: https://doi.org/10.1038/s41467-025-63714-6
