Introduction

The Gram-negative commensal bacterium Klebsiella pneumoniae typically inhabits human mucosal surfaces, including the gastrointestinal tract and upper respiratory tract1. It is also an opportunistic pathogen that is responsible for a wide range of nosocomial and community-acquired infections, which often lead to adverse clinical and health-economic outcomes1,2. The disease presentations of K. pneumoniae include pneumonia, bloodstream infection, meningitis, urinary tract infections, wound or soft tissue infections, and liver abscess1. K. pneumoniae can invade normally sterile sites of the body, such as the blood, through localized infectious diseases, surgical procedures, and the presence of catheters and other intravascular medical devices3. The entry and proliferation of bacteria in the bloodstream can trigger a strong immune response accompanied by organ dysfunction and failure, which can be fatal if not treated promptly and appropriately4,5. Worldwide, K. pneumoniae is the third leading cause of bloodstream infections6,7.

The burden of K. pneumoniae infections in hospitals is compounded by antimicrobial resistance (AMR), which greatly limits available treatment options and increases mortality rates, especially for patients infected with invasive strains8,9. Some K. pneumoniae clones have become increasingly resistant to multiple antimicrobial agents and are designated as multidrug or extensively drug resistant10,11. Bacteria that harbor carbapenemases and extended spectrum beta-lactamases (ESBLs) are particularly notable because they render many beta-lactams ineffective12. The World Health Organization recognizes third-generation cephalosporin-resistant and carbapenem-resistant Enterobacteriaceae, including K. pneumoniae, as critical threats for which new antibiotics are urgently needed13. Mortality rates of patients with bloodstream infections due to resistant K. pneumoniae are distressingly high, ranging from 17 to 34% for patients with ESBL-producing K. pneumoniae14,15,16 and up to 42% for patients with carbapenem-resistant K. pneumoniae17,18. ESBL-producing K. pneumoniae have been reported in several hospital outbreaks19,20,21 and their increasing prevalence may be linked to the overuse of expanded spectrum cephalosporins in healthcare settings22. A systematic analysis of combined literature reviews, hospital systems, laboratory data, and surveillance systems spanning 204 countries and territories reported over 50,000 deaths in 2019 alone that were attributed to third-generation cephalosporin resistance in K. pneumoniae23. Forecasting models suggest that if current trends in antimicrobial use continue, over half of invasive K. pneumoniae will become resistant to third-generation cephalosporins and carbapenems by 203024. Despite the increasing threat of resistant K. pneumoniae and the constantly growing suite of high-risk clones, our understanding of the underlying bacterial and genetic features that promote AMR in bloodstream infections and subsequent spread within the hospital setting remains incomplete25,26.

K. pneumoniae comprises a phenotypically and genetically diverse assemblage of clones10. It is broadly categorized into two pathotypes that exhibit distinct disease profiles. The so-called “classical” K. pneumoniae is a common cause of nosocomial infections and often harbors mobile plasmids encoding AMR genes10,27. Hypervirulent K. pneumoniae is a major cause of community-acquired infections in healthy individuals, including liver abscess, pneumonia, meningitis, endophthalmitis, and infections of the central nervous system28. Hypervirulent clones are often susceptible to most antimicrobial agents, but clones exhibiting both hypervirulence and multidrug resistance (referred to as convergent clones) have been described29.

Here, we aim to elucidate the population structure of K. pneumoniae in bloodstream infections from a single medical center and the drivers that facilitate the dissemination of AMR. We analyzed a combination of 136 short-read genome sequences complemented with 12 long-read sequences of K. pneumoniae derived from surveillance of bloodstream infections at the Dartmouth-Hitchcock Medical Center (DHMC), New Hampshire, USA. Altogether, our results highlight the importance of considering both the genetic background of host strains and the routes of plasmid transmission in understanding the spread of AMR in bloodstream infections. Our findings highlight the need for improved infection prevention and control to reduce risk of healthcare-associated pathogen transmission, including the importance of genomic data surveillance to detect onward pathogen and plasmid transmission and persistence.

Results

The bloodstream K. pneumoniae population consists of diverse lineages

A total of 136 K. pneumoniae isolates from bloodstream infections in unique pediatric and adult patients at the DHMC were collected between January 2017 and January 2022 (Supplementary Data 1). The maximum likelihood phylogeny built from 188,166 single nucleotide polymorphisms (SNPs) extracted from a 3.378 Mbp sequence alignment of 3,511 core genes revealed a genetically diverse population. We identified 94 known sequence types (ST; Supplementary Data 2). The ST diversity is high (Simpson’s diversity index = 0.983) and no single ST dominates the population (Fig. 1A, B). Only ten STs contained at least three genomes (ST20, six genomes; ST36, ST45 and ST253, five genomes each; ST1380 and ST307, four genomes each; ST111, ST225, ST23 and ST941, three genomes each). A total of 11 and 73 STs were represented by two and one genome(s), respectively. Thirteen genomes carried novel combinations of the 7-gene multi-locus sequence typing (MLST) loci and were assigned novel ST designations by the Klebsiella-specific database in BIGSdb30: ST6357 (isolate KPB115), ST6358 (KPB126), ST6359 (KPB139), ST6360 (KPB179), ST6361 (KPB28), ST6362 (KPB29), ST6363 (KPB34), ST6364 (KPB41), ST6365 (KPB56), ST6366 (KPB79), ST6367 (KPB89), ST6368 (KPB90), and ST6369 (KPB93).
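As an illustration of how a Simpson’s diversity index can be computed from sequence type assignments, the following minimal Python sketch uses the Gini-Simpson form (one minus the sum of squared proportions). The text does not state which variant of the index was used, so the choice of formula is an assumption, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def simpson_diversity(labels):
    """Gini-Simpson index: 1 - sum(p_i^2) over category proportions p_i.

    Assumption: this is one common form of Simpson's index; the study
    may have used a different (e.g. finite-sample) variant.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical toy input: one ST label per genome (not the study data).
toy_sts = ["ST20", "ST20", "ST36", "ST45", "ST307", "ST23"]
toy_index = simpson_diversity(toy_sts)
print(f"Gini-Simpson index for the toy labels: {toy_index:.3f}")
```

Values close to 1 indicate that two genomes drawn at random are very likely to belong to different types, which is the sense in which the ST diversity here is described as high.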

Fig. 1: Genomic features of the 136 K. pneumoniae isolates from bloodstream infection.

A A maximum likelihood phylogeny showing the year of isolation and the most frequent sequence types (ST), clonal groups (CG; represented by three or more genomes), and sublineages (SL; represented by four or more genomes). The midpoint-rooted tree is calculated using single nucleotide polymorphic sites from the sequence alignment of 3,511 concatenated core genes. Tree scale represents the number of substitutions per site. The barplots show the proportion of (B) ST (C) CG and (D) SL by sampling year. The numbers above the bars in panel B indicate the number of isolates per year. For visual clarity, only the most frequent ST, CG and SL are shown in colored blocks, while less frequent ones are grouped together as “others” in gray blocks.

The 136 genomes can be further classified into 99 known clonal groups (CG; Simpson’s diversity index = 0.986) and 76 known sublineages (SL; Simpson’s diversity index = 0.971) (Fig. 1C,D and Supplementary Data 2). As defined previously31, K. pneumoniae genomes belonging to the same clonal group differ by no more than 43 allelic mismatches, while genomes belonging to the same sublineage differ by no more than 190 allelic mismatches. The most frequently detected clonal groups contained five genomes (CG45), four genomes (CG1380, CG20, CG307, CG36), and three genomes (CG10047, CG111, CG23, CG941). The most frequently detected sublineages were SL17 (consisting of 14 genomes from ST17, ST20, ST422, ST3640, ST5122, ST6364, ST6365), SL3010 (six genomes from ST1, ST5, ST6, ST8, ST9, ST10), SL45 (seven genomes from ST45 and ST987), and SL268 (six genomes from ST36 and ST268). From 2017 to 2021, less common STs, CGs and SLs made up a large share of the annual population. Overall, these results show that diverse genotypes may similarly cause bloodstream infections, thus emphasizing the opportunistic nature of K. pneumoniae in invasive infections.

Antimicrobial resistance is widespread in bloodstream K. pneumoniae

We carried out antimicrobial susceptibility testing of K. pneumoniae isolates against 20 antimicrobial agents from seven classes of antimicrobial drugs (Fig. 2A, Supplementary Data 1). All isolates exhibited resistance to ampicillin, which was not surprising because ampicillin resistance is known to be intrinsic in K. pneumoniae32. Resistance was observed against ampicillin-sulbactam (n = 16 isolates, representing 11.76% of the population), cefuroxime (n = 14, 10.29%), sulphamethoxazole (n = 14, 10.29%), and cefazolin (n = 12, 8.82%). A total of 17 (12.5%), 16 (11.76%), seven (5.14%), and two (1.47%) isolates were resistant to at least one antimicrobial agent belonging to the cephalosporin, beta-lactam combination agent, quinolone, and aminoglycoside sub/classes, respectively. However, all or nearly all isolates were susceptible to amikacin and meropenem (n = 136, 100%), ertapenem (n = 135, 99.26%), gentamicin (n = 134, 98.52%), cefoxitin (n = 133, 97.79%), amoxicillin-clavulanic acid and piperacillin/tazobactam (n = 132, 97.05%), and levofloxacin (n = 131, 96.32%). Altogether, we detected 18 unique AMR profiles, each with different combinations of resistance phenotypes (Fig. 2A). A total of 109 isolates were phenotypically resistant to ampicillin only and not to other antimicrobial agents. Isolates KPB140 and KPB97 exhibited resistance to 14 and 13 antimicrobial agents, respectively, while isolates KPB176 and KPB77 were resistant to 12 antimicrobial agents (Supplementary Data 1).

Fig. 2: Antimicrobial susceptibility phenotypes and genotypes of the 136 K. pneumoniae isolates from bloodstream infection.

A An UpSet plot showing the total number of isolates resistant to antimicrobials tested (left bar plot), the total number of isolates exhibiting a particular antibiogram (top bar plot) and filled dots representing the presence of an antimicrobial resistance (AMR) phenotype. Acronyms: Pip.tazobactam Piperacillin-Tazobactam, SXT Sulphamethoxazole/Trimethoprim, Amp. Sulbactam Ampicillin Sulbactam. B Number of genomes carrying individual AMR genes and gene combinations per antimicrobial class. The AMR genes in the color legend are grouped according to antimicrobial class. C Concordance analysis of resistance phenotypes and predicted genotypes.

Using in silico analysis of the genome sequences, we identified the presence of genes that encode AMR determinants (Fig. 2B and Supplementary Fig. S1, Data 3). Across the entire dataset, we detected a total of 64 unique genes encoding resistance to ten antimicrobial drug classes. All 136 genomes harbored at least one AMR gene that confers resistance to beta-lactams. A total of 134 (98.52%) and 123 (90.44%) genomes harbored intrinsic and chromosomally mediated genes encoding resistance to fosfomycin and phenicol/quinolone, respectively. We observed ten combinations of the phenicol/quinolone multi-drug efflux pump alleles occurring in the genomes (Supplementary Data 3). The most frequently detected combinations were oqxA/B (n = 42, 30.88%), oqxA/B19 (n = 27, 19.85%) and oqxA/B25 (n = 15, 11.02%). Other AMR determinants were also detected but at low frequencies (present in ≤ 21 genomes), including genes that are associated with resistance to aminoglycoside, macrolide, phenicol, quinolone, sulfonamide, tetracycline, and trimethoprim.

Among the beta-lactamases, we identified a total of 24 chromosomally encoded blaSHV gene variants (Fig. 2B and Supplementary Data 3). The most frequently occurring variants were blaSHV-11 (n = 37 genomes, representing 27.2% of the population), blaSHV-1 (n = 28 genomes, 20.58%), and blaSHV-27 (n = 11 genomes, 8.08%). The gene blaSHV has been reported to have undergone robust allelic diversification in clinical K. pneumoniae and other Enterobacteriaceae, and our results were consistent with this diversity33. While the presence of the wild type blaSHV is responsible for resistance to ampicillin, amino acid substitutions from allelic diversification may cause them to have expanded functionality such as ESBL activity and/or beta-lactamase inhibitor resistance activity33. The variants blaSHV-38, blaSHV-164, blaSHV-187 that we detected in our dataset are known to be associated with resistance to cephalosporins34. However, we did not observe resistance to cephalosporins when tested in vitro in isolates harboring these genes. Furthermore, isolate KPB57 which carried the gene blaSHV-41 exhibited resistance to first (cefazolin) and second (cefoxitin) generation cephalosporins but not to third generation cephalosporins. Detection of this bla variant has been associated with conflicting ESBL phenotypes35,36.

High concordance of AMR phenotype and genotype

We sought to investigate the level of concordance between the results of the in vitro antimicrobial susceptibility testing and the in silico screening of genetic elements conferring resistance to different antimicrobial agents. We defined the following terms: (1) True positives were isolates with resistant phenotypes that harbor a corresponding resistance genetic determinant; (2) True negatives were isolates with susceptible phenotypes that do not carry the corresponding AMR determinant in their genome; (3) False negatives were isolates exhibiting a resistant phenotype but with no corresponding AMR genetic element detected in the genome; and (4) False positives were isolates that exhibit a susceptible phenotype but carry the AMR genetic element in their genome. Overall, we found high concordance between the resistant phenotypes and the presence of the corresponding AMR genes. For the six antimicrobial classes, concordance values range from 90.44% (aminoglycosides) to 100% (carbapenems) (Fig. 2C and Supplementary Data 4).
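The four categories defined above map directly onto a confusion matrix. The following minimal Python sketch shows one way to assign each isolate to a category and to derive concordance, sensitivity, and specificity from the resulting counts. The function names are hypothetical. The example counts use the quinolone figures reported in this section (four true positives and 129 true negatives among 136 isolates); the false negative and false positive counts (three and zero) are not stated explicitly and are inferred here from the reported sensitivity of 57.1% and specificity of 100%.

```python
def classify(resistant_phenotype, has_determinant):
    """Assign one isolate to TP, TN, FN, or FP per the definitions above."""
    if resistant_phenotype and has_determinant:
        return "TP"
    if not resistant_phenotype and not has_determinant:
        return "TN"
    if resistant_phenotype and not has_determinant:
        return "FN"
    return "FP"


def concordance_metrics(tp, tn, fp, fn):
    """Return concordance, sensitivity, and specificity as fractions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no isolates to evaluate")
    return {
        "concordance": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Quinolone example: TP and TN as reported; FN and FP are inferred (assumption).
quinolone = concordance_metrics(tp=4, tn=129, fp=0, fn=3)
for name, value in quinolone.items():
    print(f"{name}: {100 * value:.2f}%")
```

Sensitivity depends only on the phenotypically resistant isolates, so when few isolates are resistant, as for most agents here, a single unexplained resistant isolate shifts it substantially.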

In the phenotype-genotype concordance analysis of carbapenem resistance, the single carbapenem resistant isolate KPB97 (ESBL-producing) lacked a detectable carbapenemase gene. Examination of the gene encoding the outer membrane porin revealed a truncation in the ompK36 gene. We observed this ompK36 truncation in the ESBL-producer KPB97, whereas other ompK35 and ompK36 truncations were detected in genomes not containing an ESBL gene (Supplementary Data 3). Previous reports have shown that truncation of ompK36 in the presence of an ESBL is sufficient for non-susceptibility to ertapenem but not to imipenem and meropenem37,38. Hence, we considered this isolate a true positive. We therefore assigned perfect concordance for carbapenem resistance, with specificity of 100% (range: 97.3–100%) and sensitivity of 100% (range: 2.50–100%).

Concordance between quinolone resistance phenotype and genotype showed 97% agreement, with specificity of 100% (range: 97.18–100%) and sensitivity of 57.1% (range: 18.4–90.1%). In our study, we associated non-susceptibility to ciprofloxacin and levofloxacin with the presence of either plasmid mediated qnrB1 or mutations in the quinolone resistance determining regions (gyrA_S83I + parC_S80I)39. A total of four and 129 genomes were true positives and true negatives, respectively. The genomes phenotypically resistant to ciprofloxacin (KPB44 and KPB68) had no detectable mutations in the quinolone resistance determining regions but harbored a qnrB1 gene. We observed no detectable mechanism of resistance in the levofloxacin resistant isolate (KPB87).

Phenotype-genotype concordance of sulfonamide resistance was 97.79%, with specificity of 98.36% (range: 94.2–99.8%) and sensitivity of 92.85% (range: 66.1–99.8%). Cephalosporin phenotype-genotype agreement was 92.64%, with specificity of 99.16% (range: 95.44–99.97%) and sensitivity of 43.75% (range: 19.75–70.12%). Notably, all ceftriaxone resistant isolates except isolate KPB70 (resistant to all cephalosporins) harbored the ESBL gene blaCTX-M-15. For aminoglycosides, concordance was 90.44%, with specificity of 90.29% (range: 83.98–94.73%) and sensitivity of 100% (range: 15.81–100%).
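The ranges quoted alongside the sensitivity and specificity values behave like exact binomial (Clopper-Pearson) 95% confidence intervals; for example, a sensitivity of four out of seven gives roughly 18.4–90.1% under that method. The interval method is not named in this section, so treating the ranges as Clopper-Pearson intervals is an assumption. The sketch below computes such intervals using only the Python standard library, solving for the bounds by bisection on the binomial distribution; the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target, increasing, iterations=100):
    """Find p in [0, 1] with func(p) = target for a monotone func."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(successes, trials, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials and trials > 0")
    half = alpha / 2.0
    if successes == 0:
        lower = 0.0
    else:
        # P(X >= successes | p) increases with p; solve for alpha / 2.
        lower = _bisect(
            lambda p: 1.0 - binom_cdf(successes - 1, trials, p), half, True
        )
    if successes == trials:
        upper = 1.0
    else:
        # P(X <= successes | p) decreases with p; solve for alpha / 2.
        upper = _bisect(lambda p: binom_cdf(successes, trials, p), half, False)
    return lower, upper


low, high = clopper_pearson(4, 7)
print(f"4/7: {100 * low:.1f}% to {100 * high:.1f}%")
```

The very wide intervals for sensitivity (for example, 2.50–100% when only one isolate is resistant) reflect the small number of resistant isolates rather than poor agreement.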

Multidrug resistant and hypervirulent clones are present in bloodstream infections

Multidrug resistant clones are defined as those encoding acquired resistance determinants to at least three antimicrobial drug classes at high frequencies (≥56%), whereas hypervirulent clones harbored the plasmid-associated virulence genes iuc, iro and/or rmpA/rmpA2 at frequencies between 31–100%40. We used these in silico definitions to determine whether any of our isolates belong to known multidrug resistant or hypervirulent clones. A total of 17 genomes (12.5%) in our dataset matched known multidrug resistant clones (Fig. 3). The multidrug resistant clones included CG14, CG20, CG147, CG711, CG1123, CG10094, CG10156, CG10253, CG10476 (each with one genome), CG10124 and CG10529 with two genomes each, and CG307 with four genomes. When we mapped the multidrug resistant clonal groups against the AMR gene presence and absence results determined using Kleborate41, we found that only members of CG307 harbored resistance genes to multiple antimicrobial classes, including the ESBL gene blaCTX-M-15 (Supplementary Data 4). However, clones that were not designated as multidrug resistant (CG10524, CG429, CG2623, CG540, CG10151, CG12251, CG45, ST6366, and ST3293) as previously reported40 also harbored resistance genes to multiple drug classes in our study (Fig. 3, Supplementary Data 2 and Data 4). Among the unassigned clones, we detected the presence of blaCTX-M-15 in CG10524 (ST1564), CG429 (ST429), and ST3293.

Fig. 3: Presence of multidrug resistant and hypervirulent clones.

The maximum likelihood phylogeny was built from the single nucleotide polymorphic sites of 3,511 concatenated core genes. The tree is identical to that in Fig. 1A. Only the major CG and ST are shown for visual clarity. The Group category refers to multidrug resistant (MDR; red blocks) and hypervirulent (Hv; blue blocks) K. pneumoniae clones defined in references40,47 and implemented in Kleborate41. The matrix shows the presence (purple blocks) or absence (white) of at least one gene conferring resistance to each antimicrobial class. Acronyms: AGLY aminoglycoside, FLQ fluoroquinolones, PHE phenicol, SUL sulfonamides, TET tetracycline, TMT trimethoprim, BLA beta-lactamase. The presence of chromosomally encoded blaSHV is indicated by the light blue blocks.

Experimental evidence from previous studies has identified key markers for hypervirulence that include the siderophores yersiniabactin, aerobactin and salmochelin as well as hypermucoidy via capsule overproduction28,40. In our study, we identified only one known hypervirulent clone based on the previous classification40, and this clone included three genomes belonging to CG23/ST23. We did not find convergent clones in our dataset, i.e., those that are both hypervirulent and multi-drug resistant42.

Diversity and transmission of plasmid-encoded ESBL gene blaCTX-M-15

Plasmids encoding the ESBL gene blaCTX-M-15 often harbor resistance determinants to additional drug classes26. We sought to understand the epidemiology of plasmids carrying blaCTX-M-15 by sequencing plasmid genomes. We identified blaCTX-M-15 in seven (5.14%) bloodstream isolates. Because multidrug resistant isolates from non-blood samples (e.g., urine) are also routinely archived by DHMC as part of patient care, we also identified another isolate (KPB102) from a urine sample that carried blaCTX-M-15. From the long-read sequencing data of the eight isolates, we obtained 12 circular plasmid genomes (Fig. 4A and Supplementary Data 5). Four isolates carried two plasmids each (KPB68, KPB77, KPB97, KPB176), while the other four isolates carried only one plasmid each (KPB44, KPB140, KPB166, KPB102). The eight isolates came from four STs representing four CGs, of which five isolates are members of ST307 (CG307). The gene blaCTX-M-15 was present in either of the two replicon types IncFIB(K)/IncFII(K) (n = 6 plasmid genomes) or IncFIB(K) (n = 2 plasmid genomes). The sizes of these plasmids ranged from 131,110 bp (pKPB68_2) to 246,740 bp (pKPB176_2). Four plasmid genomes with an unknown replicon type were also detected and classified as untyped. The sizes of the untyped plasmids ranged from 3,559 bp (pKPB77) to 367,083 bp (pKPB97).

Fig. 4: Plasmid types and transmission of plasmid carrying blaCTX-M-15.

A Bar plot showing the number of plasmid genomes and replicon types detected in the eight isolates harboring the gene blaCTX-M-15. B Timeline of transmission of plasmids encoding blaCTX-M-15 between isolates. Horizontal lines connecting circles show the mash distances between plasmid genomes in each linked group. In group 1, both isolates are from ST307/CG307. In group 2, the black asterisk (*) on pKPB97_3 indicates that the plasmid came from an ST429/CG429 isolate, while the other two isolates are members of ST307/CG307. For both panels A and B, circles are colored by clinical source (red for blood, yellow for urine). C Structural comparison of plasmid genomes encoding blaCTX-M-15 and belonging to group 1 and group 2 in panel B. Gray areas between plasmids represent regions with 80–100% sequence identity. Genes are represented by arrows and are color-coded according to general function: magenta, antimicrobial resistance; brown, heavy metal resistance; green, conjugation transfer; yellow, mobile elements such as insertion sequences, transposons and integrases.

Two groups of plasmid sequences sampled from different patients exhibited high sequence identity (Fig. 4B and Supplementary Fig. S2A, B). The first group consisted of plasmids pKPB102_2 and pKPB166_2, which are both of the IncFIB(K)/IncFII(K) replicon type. The two plasmid sequences are of the same size (231,545 bp) and have 100% sequence identity and sequence coverage, as well as a low pairwise mash distance of 7.14e-6 (Fig. 4B). The plasmids were retrieved from two different isolates (KPB102 and KPB166, respectively) that both belonged to ST307 and that differed by 13 SNPs in their core genome sequence alignment. One isolate was collected from a urine sample in October 2019 and the other from a blood sample in September 2021; the two isolates came from different patients. The high identity between the two plasmid sequences and their common strain background exemplifies a transmission pattern within the same clonal lineage (ST307).
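For readers unfamiliar with the mash distance quoted above, the following minimal Python sketch shows the standard Mash formula that converts a Jaccard index estimate j between two k-mer sketches into a distance, D = -(1/k) ln(2j / (1 + j)). The k-mer size and sketch size used in this study are not given in this section; the value k = 21 (the Mash default) and the example Jaccard estimate of 9997 shared hashes out of 10,000 are assumptions chosen only because they give a distance of the same magnitude as the one reported.

```python
import math


def mash_distance(jaccard, k=21):
    """Mash distance from a Jaccard estimate j: -(1/k) * ln(2j / (1 + j)).

    Assumption: k = 21 is the Mash default, not a parameter confirmed
    for this study.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("Jaccard index must lie in [0, 1]")
    if jaccard == 0.0:
        return 1.0  # no shared hashes: Mash reports the maximum distance
    return max(0.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)


# Hypothetical sketch comparison: 9997 of 10000 hashes shared.
example = mash_distance(9997 / 10000)
print(f"Mash distance: {example:.3g}")
```

Distances this close to zero indicate that the two plasmid sequences share nearly all of their sampled k-mers, in line with the 100% identity and coverage reported for group 1.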

The second group of highly similar plasmid sequences consisted of three IncFIB(K)/IncFII(K) plasmids (pKPB77_2, pKPB97_3, pKPB176_2) (Fig. 4B). The three isolates that carried these plasmids were sampled from three different patients. Plasmids pKPB77_2 (245,434 bp) and pKPB97_3 (243,759 bp) shared 100% sequence identity, 99% sequence coverage, and a mash distance of 3.1e-5. These two plasmids were retrieved from two distinct lineages: isolate KPB77 is a member of ST307/CG307 and KPB97 is a member of ST429/CG429. The isolates were from blood samples collected in March 2019 and August 2019, respectively. The two isolates differed by 19,080 SNPs in their core genome alignment. ST307 and ST429 are 4-locus variants in the MLST scheme (gapA, pgi, phoE, tonB). A third isolate in this group is KPB176 (pKPB176_2), which was collected from a blood sample in November 2021. This plasmid (pKPB176_2) exhibited high identity with pKPB77_2 (99% sequence identity, 99.99% sequence coverage, mash distance of 3.1e-4). The isolates from which plasmids pKPB77_2 and pKPB176_2 were derived both belonged to ST307 and differed by 63 SNPs in their core genome sequence alignment. Such variation in their core SNPs appears to have accumulated over 2 years between March 2019 and November 2021. These results show that this plasmid type was mobilized through both horizontal transmission between phylogenetically distinct lineages (in the case of pKPB77_2 and pKPB97_3) as well as transmission within the same clonal lineage (in the case of pKPB77_2 and pKPB176_2 in ST307). The notable difference between the two groups of plasmids (i.e., group 1 and group 2 in Fig. 4B) is that group 2 contains a ~13 kb cluster of genes that includes IS6 family transposases, Tn5403 family transposase, and AMR genes catB3, blaOXA-1, aac(6’)-Ib-cr5, and tetA (Fig. 4C).
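The statement that two sequence types are 4-locus variants means that their allelic profiles differ at four of the seven MLST loci. The minimal Python sketch below shows how such a comparison can be made. The seven locus names are those of the K. pneumoniae MLST scheme; the allele numbers in the two example profiles are hypothetical placeholders, not the actual profiles of any ST in this study.

```python
# The seven loci of the K. pneumoniae MLST scheme.
MLST_LOCI = ("gapA", "infB", "mdh", "pgi", "phoE", "rpoB", "tonB")


def differing_loci(profile_a, profile_b):
    """Return the MLST loci at which two allelic profiles differ."""
    return [
        locus for locus in MLST_LOCI if profile_a[locus] != profile_b[locus]
    ]


# Hypothetical allele numbers, for illustration only.
profile_x = {"gapA": 1, "infB": 1, "mdh": 1, "pgi": 1,
             "phoE": 1, "rpoB": 1, "tonB": 1}
profile_y = {"gapA": 2, "infB": 1, "mdh": 1, "pgi": 5,
             "phoE": 9, "rpoB": 1, "tonB": 4}

mismatches = differing_loci(profile_x, profile_y)
print(f"{len(mismatches)}-locus variants; differing loci: {mismatches}")
```

Two isolates that differ at four of seven housekeeping loci are distantly related at the core genome level, which is why near-identical plasmids in such isolates point to horizontal transfer rather than clonal spread.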

Lastly, we observed the presence of multiple resistance genes carried on different plasmids within a single genome. Multidrug resistance in isolate KPB68 (ST3293/CG12467 sampled in 2018) was mediated by resistance determinants present in two plasmids. The plasmid pKPB68_2 has a genome size of 131,110 bp, replicon type IncFIB(K), and carries blaCTX-M-15, blaTEM-1, aph(3”)-Ib, aph(6)-Id, and sul2. The plasmid pKPB68_3 has a genome size of 32,881 bp with an untyped replicon and carries aac(6’)-Ib-cr, qnrB1, catB3, blaOXA-1, tetA, and dfrA14 (Supplementary Fig. S2A, B).

Antigenic diversity and hypervirulence markers

Using in silico screening of the genomes, we inspected our isolates for the presence and diversity of the surface polysaccharide capsule (KL) and lipopolysaccharide (O) that determine the antigenic serotypes. These two structures activate the human immune system during infection and are also widely used in strain typing43,44. We identified 53 known capsular KL locus types and four genomes with unknown KL types (Simpson’s diversity index = 0.971) (Fig. 5A and Supplementary Fig. S3A, Supplementary Data 3). The most frequent KL types in our dataset were KL28 (present in nine genomes), KL24 (eight genomes), and KL102 (seven genomes). The KL1 type which is a genetically homogenous capsular type commonly associated with hypervirulent K. pneumoniae clones45 was detected in three genomes belonging to ST23. We also identified 11 different O types in our dataset (Simpson’s diversity index = 0.731). Three O types (O1/O2v2 = 54 genomes, O1/O2v1 = 37 genomes, O3b = 25 genomes) accounted for >85% of our dataset (Supplementary Fig. S3B). Two unknown O types were detected in ST2004 (KPB152) and ST6366 (KPB79) genomes.

Fig. 5: Diversity and distribution of surface polysaccharides and hypervirulence markers.

A Different combinations of the surface polysaccharides capsule (KL) and lipopolysaccharide (O) loci are shown. The colors and sizes of the bubbles correspond to the number of genomes that carry unique KL and O combinations. B The maximum likelihood core genome phylogeny showing the phylogenetic distribution of genes encoding yersiniabactin (ybt), colibactin (clb2), aerobactin (iuc1), salmochelin (iro1), and rmpA/2. The tree is identical to that in Fig. 1A. Only the major CGs and STs are shown for visual clarity.

We searched our short-read sequence dataset for the presence of Klebsiella-specific virulence factors. First, we screened the genomes for yersiniabactin, a siderophore system for sequestering iron that enhances bacterial survival and replication within the host46. This genetic marker is encoded by the ybt gene and is transferred by mobile genetic elements, in particular the integrative conjugative element ICEkp46. In our study, we detected five known ybt variants (ybt 1, 4, 9, 10 and 14) in 24 genomes representing a total of 17.64% of the population (Fig. 5B). The five ybt variants were widely distributed across the core genome phylogeny. In decreasing order of frequency, the variants were ybt 10, ybt 1, ybt 4, ybt 14, and ybt 9. Variant ybt 10 was present in five genomes from ST45/CG45 and one genome each belonging to ST36/CG36, ST14/CG10156 and ST1564/CG10524. ybt 1 was present in three genomes from ST23/CG23 and one genome from ST260/CG10671. ybt 4 was present in one genome each from ST37/CG10094, ST2217/CG2217, ST2.248/CG10668 and ST6360/CG12462. ybt 14 was present in one genome each from ST20/CG20, ST111/CG111, ST942/CG942. ybt 9 was present in one genome each from ST35/CG35, ST307/CG307 and ST393/CG12458. Two genomes possessed unknown ybt variants (KPB59 from ST36 and KPB141 from ST1554).

Using the short- and long-read hybrid assemblies, we screened for the presence of other key virulence determinants such as clb2 (colibactin), iuc1 (aerobactin), iro1 (salmochelin), and rmpA/A2 (activator for capsule biosynthesis) in plasmid genomes. These genes have been previously identified as genetic markers of hypervirulence in K. pneumoniae47. Yersiniabactin, aerobactin, and salmochelin are siderophores that promote bacterial survival in nutrient-poor environments by chelating iron48 and are therefore particularly important in the growth, replication, and metabolism of bacteria in the blood. The RmpA/2 activator of capsule biosynthesis leads to hypermucoidy49. We detected the presence of these four virulence markers in only four genomes (Fig. 5B and Supplementary Data 3). Three of these four genomes are known hypervirulent clones (ST23/CG23), whereas the fourth genome belongs to clonal group ST260/CG10671, which is a double-locus variant of ST23 (in genes pgi and phoE). These four genomes clustered together in the core genome phylogeny and they all possessed the ybt 1 variant, KL1 capsular type, and two O types (O1/O2v1 in KPB51 and O1/O2v2 in the other three).

Analysis of long read sequencing data on these four isolates showed that the hypervirulence genes are borne on plasmids ranging in size from 178,418 bp (pKPB165_2 in isolate KPB165) to 228,574 bp (pKPB147_2 in isolate KPB147) (Supplementary Data 6). The sequences of the hypervirulence-carrying plasmids were characterized by rearrangement, loss, and/or truncation of genomic regions (Supplementary Fig. S4). For instance, a ~50 kb region present in all other hypervirulence plasmids was missing in pKPB165. This region consists of several genes including those encoding tellurium resistance (terABCDEXZ) and the transposase Tn3. Although the hypervirulence plasmids were frequently flanked by transposases and insertion sequences, they lack a conjugative apparatus, except for the presence of the gene encoding the TraI protein that functions in site and strand nicking50,51. We did not identify an origin of transfer (oriT) site in any of the four hypervirulence plasmids, suggesting that they are not themselves able to initiate their transfer via conjugation.

Discussion

The increasing burden caused by AMR in bloodstream infections has life-threatening consequences because it can considerably increase the rates of treatment failure and death8,9. Hence, understanding the mechanisms of resistance dissemination remains a key strategy for effective management and control of AMR in invasive K. pneumoniae. Here, we analyzed 136 short-read genome sequences of K. pneumoniae complemented with 12 long-read sequences to understand the drivers that facilitate the dissemination of resistant K. pneumoniae strains in bloodstream infections.

Our study highlights the remarkable clonal and genomic diversity of K. pneumoniae in bloodstream infections. Our results are consistent with those reported in other hospitals in the United States11,52,53, Asia42, and Europe54. Our findings greatly expand the number of bloodstream K. pneumoniae clones, indicating that the capacity to cause bloodstream infections is not restricted to one or few genetic lineages. Such diversity means that the underlying causes of resistance, other clinically relevant traits, and disease outcomes may vary considerably between K. pneumoniae from different patients and between settings. Among the clones detected in our study, we report the presence of known multidrug resistant high-risk pandemic clones such as ST14, ST17, ST45, ST147 and ST30710,55,56. Infections caused by ESBL-producing K. pneumoniae present a high burden associated with increased mortality, length of hospital stay, and medical costs57,58. This ESBL subset of K. pneumoniae continuously threatens the efficacy of beta-lactam antimicrobials, which are the preferred treatment options for bloodstream infections due to their broad activity spectrum and selective toxicity59.

Lineages ST258 and ST307 are reported to be endemic in the United States and are associated with resistance to third-generation cephalosporins and carbapenems11,52,53. Interestingly, we did not detect ST258 in our dataset, whereas ST307 was present only at a low frequency (2.94%). All ST307 isolates in our study were multidrug resistant, as well as resistant to third-generation cephalosporins, and were involved in plasmid transmission of the ESBL gene blaCTX-M-15. ST307 likely emerged in Europe around 1994 and rapidly spread across six continents60, causing many outbreaks in nosocomial settings and long-term care centers55. The first published report of ST307 in the United States was in Texas in 201361. However, a decline in the recovery of these two clones has been recorded in some parts of the United States62,63. A search of the K. pneumoniae-dedicated databases in BIGSdb64 and Pathogenwatch65 (as of May 2024) revealed a decline in the frequency of ST258 and ST307 in the United States after 2017, with no record of ST258 since 2022. The underlying causes for this decline are unclear. Different K. pneumoniae clones are known to experience repeated waves of clonal expansion and replacement in other parts of the world66,67,68, and this may partly explain our findings at DHMC and the apparent decline in the frequency of STs 258 and 307 in the country. Hence, continued surveillance of K. pneumoniae, including antimicrobial susceptible and non-ESBL lineages that may become recipients of AMR-carrying plasmids, is critical to uncover signals of emerging STs that have the potential to replace currently known high-risk clones. Early detection of new threats, combined with infection control efforts, will help prevent their further spread.

Concordance analysis of AMR phenotype-genotype is important for evaluating the accuracy of antimicrobial susceptibility tests and correlating phenotypes to predicted genotypes. Such analyses are critical for validating phenotypic tests, uncovering novel resistance determinants, and enhancing epidemiological investigations69,70. While concordance for tested antimicrobials was >90%, discrepancies stemming from the inability to match resistant and susceptible phenotypes with genotypes decreased sensitivity and specificity, respectively. For instance, the discrepancies we observed in cephalosporin phenotype-genotype concordance were mainly associated with the inability to sort out blaSHV phenotypes. Such difficulty stems from the highly diversified SHV gene36,71, with 182 blaSHV alleles in the NCBI Reference Gene Catalog (as of December 2023). Assignments of SHV allele functionality have relied mainly on varying supporting evidence, such as the assignment of ESBL phenotype without eliminating the presence of other ESBL enzymes or defining specific roles of ESBL activity in the SHV allele33. Future work should aim to precisely elucidate the range of SHV variation, enzyme activity and functions in genetically and ecologically diverse K. pneumoniae.

Mobile genetic elements such as plasmids are important vehicles for the spread of AMR and have been implicated in disease outbreaks and transmission72,73. Plasmids harboring ESBLs are significant contributors to the success of international and epidemic K. pneumoniae clones such as ST307 and ST25810. Our results show that plasmids belonging to the IncF family mediate the clonal (within the same lineage) and horizontal (between lineages) transmission of the ESBL gene blaCTX-M-15. Plasmid transfer between clones and distantly related genomes has been previously reported. For example, the clonal transmission of an IncHI1B/IncFIB plasmid between two K. pneumoniae ST11 isolates, and its horizontal transmission to Escherichia coli strains (ST10 and ST58), have been documented74. In our study, we also show two notable plasmid acquisitions that can further facilitate the stable maintenance of ESBLs in nosocomial settings. Plasmid transfer can overcome ecological barriers (e.g., urine to blood); hence, future surveillance efforts will benefit from the inclusion of non-blood isolates to improve current knowledge about the ecological range of ESBL-carrying plasmids. Acquisition of more than one plasmid within a single genome can rapidly lead to the emergence of multidrug resistance and convergent pathotypes29. Continuous surveillance of IncF plasmids and the topology of plasmid sharing networks across phylogenetic and spatiotemporal landscapes is therefore critical.

Hypervirulent K. pneumoniae are a typical cause of community-acquired infections, including liver abscesses and bacteremia75,76. In our study, we detected only four genomes harboring the hypervirulence genetic determinants. Three of these genomes belong to the well-established hypervirulent K. pneumoniae clone ST23 (CG23) commonly associated with the KL1 capsule serotype40,76. Intriguingly, members of this ST have evolved from drug susceptible hypervirulent phenotypes that acquired multidrug resistance plasmids77 and carbapenemases78. This troubling convergence of multidrug resistance and hypervirulence creates “superbugs” and has dire consequences in controlling infections caused by this clone. The fourth hypervirulent clone in our study was an ST260 (strain KPB147 with KL1 capsule), a double locus variant of ST23. This clone may have arisen from recombination between an ST23 and another lineage of K. pneumoniae76. Our results show that pKPB147 was unlike the ST23 virulence plasmids which appear to have experienced significant genomic truncation and gene segment rearrangement. The unique characteristics of multidrug resistant and hypervirulent pathotypes should be an important consideration in the development of clinical interventions and treatment in bloodstream infections.

Although we have performed comprehensive profiling of our bloodstream isolates, we acknowledge important limitations in our study. First, our definition of transmission groups in this study is associated with little data on epidemiological linkages other than the source and time of isolation of K. pneumoniae isolates. For example, we did not have information about whether patients were part of the same household, social groups or hospital wards, all of which may facilitate transmission. Second, we carried out long-read sequencing only on 11 out of the 136 blood isolates and one urine isolate. Hence, other plasmid types, plasmids not associated with ESBL carriage, and plasmids carrying other virulence determinants were left unexplored. We acknowledge that it would be ideal to sequence more plasmids, e.g., from isolates sampled in other human body sites or the hospital environment, to gain a wider picture of transmission routes within the medical center. However, resource limitations precluded us from carrying this out. Future investigations on the ecological niches and distribution of plasmids are therefore critical in understanding their roles in invasive K. pneumoniae infections. Third, our relatively small isolate collection represents a short time frame and may not effectively capture long-term clonal persistence and transmission that may have occurred within the medical center. Lastly, we did not investigate other members of Enterobacteriaceae, which are known to play a role in interspecies transmission, horizontal gene transfer, and environmental persistence79. Nonetheless, our findings provide an important baseline census of the standing lineage and AMR diversity for continued surveillance of bloodstream infections. This work also presents a strong impetus to consider plasmid sharing in epidemiological surveillance within the medical center.
The limitations presented here will stimulate future explorations that will define the basis for the adaptive success of K. pneumoniae occupying the bloodstream niche.

Our study shows that K. pneumoniae in bloodstream infections can vary substantially in terms of clonal lineages, phenotypic resistance patterns, and carriage of genes that mediate multidrug resistance and hypervirulence. The temporal persistence of resistant strains and local dissemination of ESBL genes rest in part on the rapid transmission of the IncF family of plasmids within the same bacterial lineage and between different lineages. This work will be useful in current efforts to design effective strategies and interventions to control the spread of high-risk bacterial clones and to reduce the opportunity for pathogen persistence and onward transmission. Continued surveillance and further genomic epidemiological studies in healthcare settings are critical to determine the consequent risk to other vulnerable patients and to the wider community.

Methods

Bacterial isolation, identification, and antimicrobial susceptibility testing

Bacterial isolates from bloodstream infections in unique pediatric and adult patients were grown from clinical blood cultures submitted to the Department of Pathology and Laboratory Medicine at DHMC, New Hampshire, USA from January 2017 to January 2022. Multidrug resistant isolates from non-blood samples (e.g., urine) are also routinely archived as part of patient care. Initial species identification was carried out using either the FilmArray Blood Culture Identification (BCID) panel for isolates sampled from January 2017 to May 2021 or the FilmArray Blood Culture Identification 2 (BCID2) panel for isolates sampled from June 2021 onwards. The BCID/BCID2 panel (BioMerieux; Durham, North Carolina, USA) is a multiplexed PCR assay for rapid identification of causative pathogens from positive blood cultures80. Using this assay, 136 isolates were identified in the clinical laboratory as K. pneumoniae.

Ethical approval was granted by the Committee for the Protection of Human Subjects of DHMC and Dartmouth College. The study protocol was deemed not to be human subjects research. Samples used in the study were subcultured bacterial isolates that had been archived in the routine course of clinical laboratory operations. No patient specimens were used, and patient protected health information was not collected.

Antimicrobial susceptibility testing (AST) was performed on the MicroScan Walkaway 96 Plus automated instrument (Beckman Coulter; La Brea, California, USA) using two FDA (Food and Drug Administration)-cleared AST panels: the NUC62 Panel from January 2017 to May 2019, then switching to the Neg MIC Panel 46 from June 2019 onwards. A verification study of each panel had been performed prior to use for testing of patient isolates. The breakpoints applied on the NUC63 Panel were those of the manufacturer at the time of FDA clearance. When transitioning to the Neg MIC Panel 46, off-label breakpoints for the carbapenems and select cephalosporins had been validated to align with the Clinical and Laboratory Standards Institute M100 S28 guidelines81. The 20 antimicrobial agents tested represented six antimicrobial classes and sub-classes: aminoglycosides (amikacin, gentamicin); antifolate (sulfamethoxazole/trimethoprim); carbapenems (ertapenem, meropenem); cephalosporins (cefoxitin, cefazolin, cefepime, cefotaxime, cefotetan, ceftazidime, ceftriaxone, cefuroxime); fluoroquinolones (ciprofloxacin, levofloxacin); monobactam (aztreonam); penicillins (ampicillin, ampicillin-sulbactam, amoxicillin-clavulanic acid, piperacillin/tazobactam). Results of the antimicrobial susceptibility testing are presented in Supplementary Data 1. All isolates were stored in DMSO solution at −80 °C.

DNA extraction, library preparation, and whole genome sequencing

Isolates were subcultured from DMSO stocks onto commercially prepared tryptic soy agar with 10% sheep red blood cells (Remel; Lenexa, Kansas, USA) and in brain heart infusion broth (BD Difco; Franklin Lakes, New Jersey, USA) at 37 °C for 24 h. For short-read sequencing, DNA was extracted and purified from liquid cultures using the QuickDNA Fungal/Bacterial Miniprep Kit (Zymo Research; Irvine, California, USA) following the manufacturer’s protocol. DNA libraries of each sample were prepared using the Illumina DNA Prep Kit and IDT 10 bp UDI indices in accordance with the manufacturer’s instructions. DNA samples were sequenced as multiplexed libraries on the Illumina NextSeq 2000 platform operated per the manufacturer’s instructions. Illumina bcl-convert (v.3.9.3) was used for demultiplexing, quality control, and adapter trimming of reads. Sequencing resulted in 151 nt long paired end reads. For long-read sequencing, high molecular weight DNA was extracted using the Quick-DNA HMW MagBead Kit (Zymo Research; Irvine, California, USA) following the manufacturer’s instructions. DNA samples were prepared using the Oxford Nanopore Technologies (ONT) SQK-LSK114 native barcoding kit. Sequencing was performed on the GridION platform using a FLOW-MIN114 Spot-ON Flow Cell, R10 version with a translocation speed of 400 bps. Base calling was performed on the GridION using the super-accurate base-calling model, Guppy v7.0.9. For both short- and long-read sequencing, we used a Qubit fluorometer (Invitrogen; Grand Island, New York, USA) to measure DNA concentration. Illumina sequencing was carried out at SeqCenter (Pittsburgh, Pennsylvania, USA), while ONT sequencing was carried out at SeqCoast Genomics (Portsmouth, New Hampshire, USA).

Genome assembly, quality check and annotation

De novo assembly of Illumina short reads into contiguous sequences was done using shovill v1.1.0 (https://github.com/tseemann/shovill). Trimming of adapter sequences was enabled using the --trim option. Shovill includes methods for subsampling read depth down to 150X, trimming adapters, correcting sequencing errors, and assembling using SPAdes v.3.14.182. For long reads obtained using ONT, adapters were removed using porechop v0.2.4 (https://github.com/rrwick/Porechop). High quality reads were filtered using filtlong v0.2.1 (https://github.com/rrwick/Filtlong). Reads that were shorter than 1 kb were excluded. Ten percent of the worst reads were also discarded. Hybrid assembly of Illumina and ONT reads was done using unicycler v0.5.083.
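
The two long-read filtering rules described above (drop reads shorter than 1 kb, then discard the worst 10%) can be illustrated with a minimal Python sketch. This is not filtlong's implementation: filtlong scores reads with its own length- and quality-based metric, whereas this sketch assumes that "worst" simply means the highest mean per-base error probability. The function names, the Phred+33 offset, and the toy reads are all hypothetical.

```python
def mean_error_prob(quality_string, offset=33):
    """Mean per-base error probability from a Phred-encoded quality string.

    Assumes Phred+33 encoding (an assumption of this sketch).
    """
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    return sum(probs) / len(probs)


def filter_reads(reads, min_length=1000, discard_percent=10):
    """Keep reads >= min_length, then drop the worst discard_percent of them.

    reads: list of (name, sequence, quality_string) tuples.
    "Worst" is ranked by mean error probability, a simplification
    that stands in for filtlong's own scoring.
    """
    long_enough = [r for r in reads if len(r[1]) >= min_length]
    # Best reads (lowest mean error probability) come first.
    ranked = sorted(long_enough, key=lambda r: mean_error_prob(r[2]))
    n_discard = len(ranked) * discard_percent // 100
    return ranked[: len(ranked) - n_discard]


# Hypothetical toy data: one short read and ten 1 kb reads of rising quality.
toy_reads = [("short", "A" * 500, "I" * 500)] + [
    (f"r{i}", "A" * 1000, chr(33 + 10 + i) * 1000) for i in range(10)
]
kept_names = [name for name, _, _ in filter_reads(toy_reads)]
print(kept_names)
```

In this toy example the short read is removed by the length rule, and the lowest-quality 1 kb read (r0) is removed by the 10% rule.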

The sequence quality of assembled K. pneumoniae genomes was determined using CheckM v.1.1.384 and QUAST v.5.0.285. Using CheckM, we calculated genome completeness ranging between 98.48 and 100% (mean = 99.87%) and contamination ranging between 0.31 and 1.82% (mean = 0.43%) for Illumina assemblies (Supplementary Data 2 and Fig. S5). For the hybrid assemblies, genome completeness ranged between 98.75 and 100% (mean = 99.85%) and contamination ranged between 0.33 and 0.85% (mean = 0.45%). The above metrics were all within the genome quality standards recommended by CheckM (Supplementary Data 2). All 136 genome sequences were of high quality, consisting of <200 contigs and N50 > 40,000 bp (Supplementary Data 2). The number of contigs in the hybrid assemblies was either two or three, except for KPB68 which had five contigs (two of which were observed not to be of plasmid origin after manual inspection and were excluded in downstream analysis). All plasmid genomes had circular topology. The draft assemblies were annotated using Prokka v.1.14.686. Altogether, we used 136 short-read genome and 12 long-read genome sequences in all downstream analyses. Associated metadata and genome quality features of all isolates are shown in Supplementary Data 2, Data 6, Fig. S6, Fig. S7.
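
For readers unfamiliar with the N50 statistic used in the contig thresholds above, the following Python sketch shows how it is conventionally computed from a list of contig lengths and how the two stated cut-offs (<200 contigs, N50 > 40,000 bp) could be checked. In the study these values were reported by QUAST; the function names and the example contig lengths here are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def passes_thresholds(contig_lengths, max_contigs=200, min_n50=40_000):
    """Check the two assembly cut-offs stated in the text."""
    return len(contig_lengths) < max_contigs and n50(contig_lengths) > min_n50


# Hypothetical contig lengths (bp) for one draft assembly.
example = [2_500_000, 1_800_000, 900_000, 150_000, 60_000]
print(n50(example), passes_thresholds(example))
```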

The species identity of all assembled genomes was confirmed using Kleborate v.2.2.041. Kleborate is a Klebsiella-dedicated species assignment and genotyping pipeline that uses genome assemblies as input and compares them to a taxonomically curated genome dataset41. Kleborate confirmed all 136 isolates to be true K. pneumoniae.

In silico sequence typing and identification of AMR and virulence genes

The K. pneumoniae genome assemblies in this study were submitted to the K. pneumoniae database on BIGSdb (https://bigsdb.pasteur.fr/klebsiella/) to determine their 7-gene multi-locus sequence types (MLST)87, ribosomal MLST88, core genome (cg) MLST based on 629 previously curated core genes89, sublineages, and clonal groups31. One isolate, KPB111, was missing the infB locus and could not be placed into any known ST designation (provisionally called ST133-1LV). Genome assemblies were screened for the presence of AMR determinants using Kleborate v.2.2.041 and AMRFinderPlus v.3.10.2334. The presence of K. pneumoniae-specific hypervirulence factors was identified using Kleborate v.2.2.041.

Pan-genome estimation and phylogenetic tree reconstruction

The annotated genomes were used as input to characterize the pan-genome90, i.e., the totality of genes of all strains in our dataset, using Roary v.3.13.091. Nucleotide sequences were aligned using MAFFT v.7.47192. Sequence alignments of the 3,511 core genes (i.e., gene families present in 99% of genomes, or 134–136 genomes) were concatenated to generate the core genome alignment. SNPs were extracted from the core genome alignment using SNP-sites v.2.5.193. The core SNP alignment was used as input for building a maximum likelihood phylogenetic tree using RAxML v.8.2.1294 with a generalized time reversible (GTR)95 model of nucleotide substitution and gamma distribution of rate heterogeneity. The phylogenetic trees were visualized and annotated using FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) and Interactive Tree of Life (iTOL) v.6.8.196.
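
The SNP extraction step can be pictured as keeping only the alignment columns that vary between genomes. The Python sketch below is a simplified illustration, not a reimplementation of SNP-sites: it assumes a column counts as a SNP when at least two different unambiguous bases (A, C, G, T) occur in it, and it leaves gaps and other symbols in retained columns untouched. The function name and the toy alignment are hypothetical.

```python
def extract_snp_columns(alignment):
    """Return (positions, snp_alignment) for the variable columns.

    alignment: dict mapping isolate name -> aligned sequence (equal lengths).
    A column is kept when it contains at least two distinct bases among
    A, C, G, T; gaps and ambiguity codes alone do not make a column variable
    (a simplifying assumption of this sketch).
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    positions = []
    for i in range(length):
        bases = {seq[i] for seq in seqs} & set("ACGT")
        if len(bases) >= 2:
            positions.append(i)

    snp_alignment = {
        name: "".join(seq[i] for i in positions)
        for name, seq in zip(names, seqs)
    }
    return positions, snp_alignment


# Hypothetical four-genome core alignment.
toy_alignment = {
    "iso1": "ATGCCGTA",
    "iso2": "ATGTCGTA",
    "iso3": "ATGCCG-A",
    "iso4": "ATGCCGTT",
}
positions, snps = extract_snp_columns(toy_alignment)
print(positions, snps)
```

Here the column containing only T and a gap is dropped, while the two columns with genuine base differences (0-based positions 3 and 7) are retained.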

In silico plasmid analysis

Using the ONT sequences, complete and circular plasmid sequences were clustered using Mash v.2.397. Accuracy of the distance estimation was improved by implementing a minimum abundance finder --mindepth 0 to ignore likely read errors from unique kmers. A Mash distance threshold of 0.001 was used to define highly similar plasmid sequences74. Plasmid incompatibility types were determined using the PlasmidFinder database accessed on December 6, 202398 implemented in ABRicate v.1.0.1 (https://github.com/tseemann/abricate). To identify instances of genome rearrangement and gene loss, plasmid sequences were aligned, visualized and annotated using progressive MAUVE99. Prediction of the origin of transfer (oriT) site and relaxases was done using oriTfinder100. Plasmid assemblies were annotated using Bakta v.1.4.0101 and Artemis102. Plasmid sequences were visualized and annotated using Easyfig (https://mjsull.github.io/Easyfig/) and Geneious Prime v. 2022.2.1 (http://www.geneious.com).
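
To make the 0.001 threshold concrete, the Python sketch below groups plasmids whose pairwise Mash distance falls at or below the cut-off. Two assumptions are made that the text does not specify: the threshold is treated as inclusive, and groups are formed by single linkage (a plasmid joins a group if it is within the threshold of any member). The helper that converts a Jaccard estimate to a Mash distance uses the published Mash formula with Mash's default k-mer size of 21; the plasmid names and distances are hypothetical.

```python
import math


def mash_distance(jaccard, k=21):
    """Mash distance from a Jaccard estimate: D = -(1/k) * ln(2j / (1 + j))."""
    if jaccard <= 0:
        return 1.0
    return max(0.0, min(1.0, -math.log(2 * jaccard / (1 + jaccard)) / k))


def cluster_by_threshold(names, distances, threshold=0.001):
    """Single-linkage grouping of items with pairwise distance <= threshold.

    distances: dict mapping (name_a, name_b) -> distance.
    Returns a sorted list of sorted clusters.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical plasmids and pairwise Mash distances.
plasmids = ["pA", "pB", "pC", "pD"]
toy_distances = {
    ("pA", "pB"): 0.0004,
    ("pB", "pC"): 0.0009,
    ("pA", "pC"): 0.0015,
    ("pA", "pD"): 0.02,
    ("pB", "pD"): 0.03,
    ("pC", "pD"): 0.025,
}
print(cluster_by_threshold(plasmids, toy_distances))
```

Note that single linkage places pA and pC in the same group through pB even though their direct distance exceeds the threshold; a stricter complete-linkage rule would separate them.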

Statistical analysis

We determined the concordance between the results of the in vitro testing of antimicrobial susceptibility and the presence of corresponding genetic elements conferring resistance against specific antimicrobial classes by calculating the number of true positives, true negatives, false positives, and false negatives. To calculate sensitivity and specificity of a diagnostic test for AMR, we used the epi.tests function of the epiR package v.2.0.57 implemented in RStudio v.2023.12.0 + 369103, following the protocol in ghruR (https://gitlab.com/cgps/ghru/ghrur/). AMR sensitivity and specificity were determined with 95% confidence bounds. AMR sensitivity (true positive rate) is the proportion of phenotypically resistant isolates in which a corresponding resistance determinant was detected, whereas AMR specificity (true negative rate) is the proportion of phenotypically susceptible isolates in which no such determinant was detected. Simpson’s diversity index104 was calculated using the R package vegan105. All plots (bar plots, bubble plots, UpSet plots, presence/absence matrices) were generated using RStudio v.2023.12.0 + 369103.
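
The concordance calculation can be sketched in Python as follows, treating the phenotypic AST result as the reference and the genotype as the test. The study used epi.tests in the R package epiR; this sketch is only an illustration and substitutes a Wilson score interval for the 95% bounds, which may differ from the interval method epiR applies. All function names and the toy data are hypothetical.

```python
import math


def confusion_counts(phenotype_resistant, genotype_resistant):
    """Count TP, FP, TN, FN with phenotype as reference and genotype as test."""
    tp = fp = tn = fn = 0
    for pheno, geno in zip(phenotype_resistant, genotype_resistant):
        if pheno and geno:
            tp += 1
        elif pheno and not geno:
            fn += 1
        elif not pheno and geno:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score interval for a proportion (95% by default)."""
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (centre - half, centre + half)


def concordance(phenotype_resistant, genotype_resistant):
    """Sensitivity, specificity, and overall concordance with 95% intervals."""
    tp, fp, tn, fn = confusion_counts(phenotype_resistant, genotype_resistant)
    n = tp + fp + tn + fn
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "specificity_ci": wilson_interval(tn, tn + fp),
        "concordance": (tp + tn) / n if n else nan,
    }


# Hypothetical data: 10 resistant and 10 susceptible isolates by phenotype.
pheno = [True] * 10 + [False] * 10
geno = [True] * 9 + [False] + [False] * 8 + [True] * 2
result = concordance(pheno, geno)
print(result)
```

In this toy example one resistant isolate lacks a detected determinant (a false negative, lowering sensitivity) and two susceptible isolates carry one (false positives, lowering specificity).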

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.