Introduction

A coordinated public health strategy is needed to slow the global spread of antimicrobial resistance. Among drug-resistant organisms, the Centers for Disease Control and Prevention (CDC) identified carbapenem-resistant Enterobacterales (CRE) as a prominent, “urgent threat” to patients in healthcare facilities1. CRE are resistant to nearly all antibiotics and are estimated to cause about 13,000 infections in hospitalized patients and $130 million in healthcare costs annually. Several new agents (e.g., ceftazidime-avibactam, meropenem-vaborbactam, imipenem-cilastatin-relebactam, and plazomicin) have demonstrated success in treating CRE infections2,3. However, emerging resistance to these antibiotics is of clinical concern4,5. Active surveillance and interventions (e.g., patient isolation, a ward-dedicated staff, and environmental cleaning) help reduce CRE spread in healthcare facilities6,7. However, a challenge remains in identifying the most effective control strategy across a healthcare network.

Several CRE outbreaks have been documented within hospitals and other care facilities8,9,10,11,12,13. Additionally, CRE have been isolated from individuals with no prior exposure to a healthcare facility, and community-onset CRE contribute to the introduction of CRE into hospitals14,15,16. Efforts to trace bacterial transmission include epidemiological investigations in combination with genetic strain typing (e.g., pulsed-field gel electrophoresis [PFGE], multilocus sequence typing [MLST], whole genome sequencing [WGS]) to estimate the likelihood of transmission events12,17,18,19. Comparison of core single nucleotide polymorphisms (SNPs) derived from genomic sequences has also been increasingly applied to characterize strain relatedness20,21,22. However, detailed epidemiological investigations are often limited to suspected outbreak events at a single facility, which restricts the availability of data for tracking bacterial spread on a larger inter-facility scale. Network analysis has emerged as a tool for characterizing the relationships and group structures between individuals within a large, diverse community23,24,25. Combining a network analysis with core SNP data may offer improved mapping of CRE transmission across a complex healthcare network.

The Consortium on Resistance Against Carbapenems in Klebsiella and other Enterobacteriaceae (CRACKLE-1) study was a multi-site, prospective, observational study of patients with CRE in the US and offered an opportunity to describe CRE spread and diversity across several healthcare systems26. We applied a network analysis combined with bacterial genome data to characterize potential spread of carbapenem-resistant Klebsiella pneumoniae (CRKp) among patients within and between healthcare facilities.

Results

Patient characteristics

Overall, 526 patients were admitted 630 times to one of 16 ACHs with a positive CRKp culture during the study period (Table 1; Supplemental Digital Content 5). Of these patients, 66 (13%) were admitted multiple times to an ACH (median number of hospitalizations: 2, range: 2–7). The median age at index admission was 66 years with about half of patients being female (54%). Most patients were Caucasian (55%) and 82% identified as neither Hispanic nor Latino. Common comorbidities included mechanical ventilation (47%), diabetes mellitus (44%), and renal failure (36%). There were 347 patients (66%) who had a putative transmission link to at least one other patient and were part of the analysis network; the remaining 179 “out-of-network” patients had no potential linkages based on our SNP and time definitions and were excluded from the network analysis. In-network patients were more likely to have longer ICU stays, be on mechanical ventilation, and have a higher white blood cell count. In contrast, diabetes mellitus was more common in out-of-network patients who did not share a potential CRKp transmission link to another patient.

Table 1 Baseline clinical characteristics for each patient at index hospitalization.

Characteristics of CRKp isolated from patients

At index hospitalization, urine was the most common source for CRKp isolation (54%), followed by the bloodstream (16%), and the respiratory tract (15%) (Table 2). Overall, 47 sequence types (ST) were identified among the total 630 CRKp isolates. ST258 (79%) and wzi154 (51%) were the predominant ST and wzi type, respectively (Fig. 1, Table 2). Most strains (99%) carried at least one carbapenemase gene with blaKPC-3 (53%) and blaKPC-2 (45%) being the most common.

Table 2 Baseline bacterial characteristics of CRKp isolated from each patient at unique admissions.

Fig. 1 Circle shows the phylogenetic population structure of CRKp isolates using pairwise core single nucleotide polymorphisms for all 630 CRKp isolates. Multilocus sequence type (MLST) and presence of carbapenemases and extended-spectrum beta-lactamases are identified as color-coded bars for each CRKp isolate.

Overall, the median pairwise core SNP difference among all 630 CRKp isolates was 288 (Range: 0–162,955, IQR: 98–19,601) (Supplemental Digital Content 6a). Isolates could be further grouped into two broad categories of sharing 0–200 or 200–400 core SNP differences, which reflects the range of core SNP differences primarily observed among ST258 isolates (Supplemental Digital Content 6b and 6c). Median pairwise SNP differences between STs are also provided (Supplemental Digital Content 6d). Lastly, for patients with multiple ACH admissions, the median pairwise core SNP difference between CRKp isolates of the same ST collected from the same patient across multiple admissions was 10 (Range: 0–862, IQR: 3–23).

Healthcare facilities characteristics

Overall, there were 217 healthcare facilities included as part of the network (ACH: 55, SNF: 137, LTACH: 20, Hospice: 5). Most patients were admitted from either their home (39%) or a SNF (34%) (Supplemental Digital Content 7a and 7b). Within an ACH, most patients were located within the emergency department (21%), an intensive care unit (34%), or a medical ward (35%) at the time of first positive CRKp culture (Supplemental Digital Content 7b). Overall, in-hospital mortality was 17% and was not significantly different between patients within or excluded from the network. Patients (N = 630 admissions) were primarily discharged to either a SNF (45%) or home (30%) with 41% of patients discharged back to the same pre-admission facility. Of patients who were initially admitted from home, 51% were instead discharged to a healthcare facility. Few patients (2.9%) were transferred either to or from an out-of-state facility.

Patients were hospitalized for a median length of 13 days (IQR: 7, 26); however, both short- and long-term hospitalizations were observed (Range: 1–373 days). Detection of a CRKp most often (55%) occurred within 48 h of hospitalization with a median of 1 day (IQR: 0, 9, Range: 0–134). However, patients transferred from another ACH were more likely to have a later positive CRKp culture date (Median: 6 days) compared to those patients coming from an LTACH (Median: 2), SNF (Median: 1) or home (Median: 2) (p < 0.001).

Characteristics of putative CRKp transmission network linking patients across healthcare facilities

Overall, putative CRKp transmission was identified in 66% (347/526) of patients (i.e., patient shared an edge with at least one other patient) for a total of 1,069 putative transmission links (Fig. 2). In total, about a third of putative transmission links were between patients infected with genetically similar CRKp who also had an overlapping stay at the same healthcare facility. The remaining putative transmission linkages (740/1069, 69%) were based solely on patients being infected with genetically similar isolates. Additionally, about half of these putative transmission linkages (48%; 352/740; representing 33% [352/1069] total linkages) were between patients that were never known to be at the same facility (including both pre- and post-ACH facilities) regardless of timing at any point during the 4.5 years of study.
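The nested proportions in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the number of links with an overlapping stay (329) is not stated in the text and is derived here by subtraction, so it should be read as an inference rather than a reported figure.

```python
# Counts reported in the text.
total_links = 1069
genetic_only = 740          # links based solely on genetic similarity
never_same_facility = 352   # genetic-only links between patients never at the same facility

# Derived by subtraction (not reported directly in the text).
overlap_links = total_links - genetic_only


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


summary = {
    "overlap_of_total": pct(overlap_links, total_links),
    "genetic_only_of_total": pct(genetic_only, total_links),
    "never_same_of_genetic_only": pct(never_same_facility, genetic_only),
    "never_same_of_total": pct(never_same_facility, total_links),
}
print(summary)
```

The derived share of links with an overlapping stay (about 31%) is consistent with the "about a third" stated above.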

Fig. 2 Putative CRKp transmission linking patients across healthcare facilities. Each node (circle) represents one patient (n = 347), colored by the acute care hospital (ACH) where the patient was diagnosed with CRKp. Patients are grouped by state based on the ACH location; shaded backgrounds indicate that the component (linked group of patients) comprises multiple states. Putative transmissions of CRKp are shown as lines connecting nodes (i.e., edges) (N = 1,069 edges), with the edge color indicating the estimated transmission likelihood (1–100%) based on meeting the threshold for the length of time between admissions at the same facility (pre-admission, ACH, or post-discharge) and the CRKp genetic distance between isolates. Edge direction (arrow) goes from the patient with the earlier positive CRKp culture date to the patient with the later CRKp isolation date. Patients who were admitted to the same facility during the 4.5-year study period have solid edges between them, while patients who never occupied the same facility but have genetically similar CRKp (i.e., ≤ 10 SNPs) are linked by dotted lines. A change in node color between adjacent nodes with a solid line therefore indicates that facility overlap was not at the ACH of diagnosis. Singletons (cohort members with no estimated potential transmission linkages) are not shown.

The largest transmission chain encompassed 172 patients diagnosed over 1575 days (Fig. 2; Table 3). This chain comprised 700 putative transmissions (edges), of which 100 were between isolates differing by ≤ 10 core SNPs; the remaining majority were defined as putative transmissions based on a combination of timing and 11–25 core SNP differences. Core SNP differences between the earliest and latest first positive culture in each cluster of linked patients (i.e., network component) ranged from an average of 5.7 core SNPs across patient pairs (i.e., dyads, component size = 2) to 101 SNPs in the largest component (n = 172) (Table 3). However, most transmission chains were short and limited to two patients.

Table 3 Descriptive information about the components in the putative transmission network. The components are composed of individual patients who had a stored CRKp sample for whole genome sequencing and (a) whose CRKp sequences differed by ≤ 10 SNPs or (b) whose CRKp sequences differed by 11–25 SNPs and whose admission and discharge dates were no more than 14 days apart at a common pre-admission, admission, or post-discharge location. The table aggregates information for the smaller components (those including 2–9 unique patients) and presents information individually for the larger components (those including 20, 32, or 172 unique patients).

Still, a single source or direction of spread was often not evident, as patients frequently shared putative links with multiple patients (Median: 5; IQR: 2, 8; Range: 1–36) (Fig. 2). For patients housed within the same healthcare facility at any point during the study, the number of times a patient had a linkage to a patient with a later admission (network out-degree) ranged from 0 to 15 (Median: 0; IQR: 0, 2), while the number of times a patient had a linkage to another patient with an earlier date of admission (network in-degree) ranged from 0 to 12 (Median: 1; IQR: 0, 2).

ACHs were the most common facility type (319/1069, 30%) where putative transmission occurred. Only three transmission clusters included patients who were diagnosed at ACHs across two different states (Ohio and Michigan). Yet, the patients from different states were never housed in the same facility at any point, despite having CRE isolates with ≤ 10 SNP differences (Fig. 2; dotted lines).

Discussion

Network analyses can help identify potential transmission pathways by characterizing the temporal and spatial relationships between infected individuals23,24,25. Here, we report a network analysis combining both genomic and epidemiologic data to characterize CRKp dissemination across multiple healthcare facilities. We characterized the genetic features of circulating CRKp isolates in the US, identified a network of both intra- and inter-facility dissemination of CRKp, and compared the contribution of genetic and patient data in defining potential transmission pathways. Our analysis strengthens support for the role of long-term care facilities in CRKp dissemination.

Overall, 66% of patients were putatively involved in either receiving CRKp from, or transmitting CRKp to, another patient at the same healthcare facility. Most identified transmission pathways between patients were small (i.e., between only two patients), which gave the overall network a fragmented appearance. Additionally, most linkages (69%) between patients were defined exclusively by CRKp genetic distance (≤ 10 SNPs) and not supported by patient co-presence at the same healthcare facility. Likewise, there were often multiple potential transmission pathways linking patients using our applied network definitions, highlighting the complexity of tracking CRE dissemination.

Selection of core SNP thresholds and clinical transmission cutoffs can have a significant effect when refining putative transmissions20,27. We also observed this in the sensitivity analyses, where these choices affected the overall level of in-facility transmission observed and the proportion of CRKp attributed to hospital acquisition. The majority of patients with CRKp who were admitted to an ACH also stayed at other healthcare facilities either before or after hospitalization, providing opportunities for CRKp transfer by either direct or indirect routes.

Our study has some limitations. Patient history was relative to admissions to ACHs monitored within the study period; therefore, some overlapping stays in healthcare facilities between patients (including at both pre- and post-ACH sites) may have been missed. Also, as data were only collected from symptomatic patients admitted to an ACH, sub-clinically infected patients or colonized individuals at the same facilities are not included within the network, which may contribute to under-sampling of CRKp transmission and reduced network connectivity between patients. We assessed linkages based on the core genome without including mobile elements, which are an important mechanism for microbial evolution and may provide more information for identifying putative transmission linkages or assessing whether the appearance of linkages is in fact due to a widespread regional clone. Additionally, surveillance of environmental contamination was not performed. The hospital environment (e.g., sinks) can be a reservoir for nosocomial outbreaks28,29. Long-term asymptomatic colonization (i.e., months to years) within the gastrointestinal tract is also common in patients with CRKp, even after receiving antibiotics for the symptomatic infection30,31,32,33. We selected a time interval of ≤ 14 days between hospital stays to account for environmental contamination34,35,36 among isolate pairs with 11–25 core SNP differences, though these reservoirs can be persistent and thus a 14-day time window can lead to underestimating linked transmissions. Relaxing the time assumption for transmission may allow detection of more direct and indirect transmission pathways between patients. Lastly, allowing for network linkages based on genetic distance, time, and space for isolates with 11–25 SNPs but only the genetic distance for closely related (≤ 10 core SNPs) isolates may have combined two different mechanistic processes.

Yet despite these limitations and the assumptions made about facility lengths of stay (see sensitivity analyses in the Supplemental Content), one third of the cohort had a genetically similar infection to another cohort member without ever having been observed in the same facility, even when timing was ignored. Cerqueira et al. also observed limited transmission chains for CRE isolates among patients sampled from multiple U.S. hospitals, as well as a diversity of resistance mechanisms within these isolates62. These findings underscore the need for enhanced surveillance efforts for CRKp to trace spread between and within the community and healthcare facilities. Incorporation of plasmid epidemiology into transmission maps may improve our understanding of dissemination of antibiotic resistance in hospitals. The combination of a fragmented putative transmission network and genetically similar CRKp isolated from patients with non-overlapping hospital stays indicates that unobserved transmission is occurring, possibly from subclinically infected patients. Combining these findings with the high degree of patient sharing between healthcare facilities supports the idea that regional or multi-site interventions are needed to reduce CRKp incidence.

Methods

Patient population and clinical definitions

CRACKLE-1 was a multi-site, prospective, observational study of patients admitted with a CRKp infection between 2011 and 2016 to acute care hospitals (ACH) in the US26. In this study, a sub-cohort of patients was selected from CRACKLE-1 that met the following criteria: (1) were admitted to an ACH in Ohio, Michigan, or Pennsylvania, and (2) had a CRKp positive culture isolated from any anatomical site. For patients with multiple CRKp cultures collected during the same admission, the index CRKp was used for genomic analysis. Multiple admissions per patient could be included provided that each admission met the inclusion criteria.

For each admission, data were collected on patient demographics, comorbidities, ACH admission and discharge dates, patient location prior to ACH admission and following discharge (i.e., home, skilled nursing facility [SNF], long-term acute care hospital [LTACH], outside hospital [OSH], or hospice), culture source (i.e., blood, urine, respiratory, wound, or other), dates of CRKp positive and negative cultures, and infection severity.

All research was performed in accordance with relevant guidelines and regulations. We used a waiver-of-informed-consent approach to obtain these data, which was approved by all participating institutional review boards (IRB): University of North Carolina at Chapel Hill IRB, Case Western Reserve University IRB, Cleveland Clinic IRB, Duke University Health System IRB, MetroHealth IRB, University of Pittsburgh IRB, and Wayne State University IRB. The research was minimal risk, and no direct patient contact was involved. Data were obtained as previously described from the electronic health record and included demographic information, presence of relevant comorbidities, lab results, culture episode details, and admission, discharge, and transfer information26,37,38,39. Only bacterial isolates and no human samples were obtained from the participating clinical microbiology laboratories. All data were de-identified prior to this analysis, which was approved by the University of North Carolina Biomedical IRB.

A community-onset infection, compared to a healthcare-onset infection, was defined as a positive CRKp culture within 2 days of admission to an ACH, regardless of pre-admission location. Patients not meeting the requirements of an infection as previously defined (i.e., blood-stream infection, pneumonia, wound, intraabdominal, or other) were classified as being colonized26. Fecal surveillance cultures and the length of stay outside of an ACH were not collected. Thus, for modeling purposes, we assumed a conservative length of stay at each facility based on the average for each site type: LTACH (30 days, to account for benefit requirements ≥ 25 days)40, hospice (78 days)41, SNF (90 days, under Medicare’s 100-day benefit cap)42, and home (100 days). The arrival date at a pre-admission facility was assigned by subtracting the assumed length of stay at the facility from the ACH admission date, with the discharge date corresponding to the date of ACH admission. For a post-discharge facility, the arrival date was the date of discharge from an ACH, and the discharge date was the arrival date plus the assumed length of stay at the facility.
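The date assignment described above can be illustrated with a minimal sketch. The assumed lengths of stay are those stated in the text; the function names, the dictionary, and the example dates are hypothetical and are not taken from the study code. No assumed length of stay is given in the text for an outside hospital, so that site type is left out.

```python
from datetime import date, timedelta

# Assumed lengths of stay (days) by site type, as stated in the text.
ASSUMED_STAY_DAYS = {"LTACH": 30, "hospice": 78, "SNF": 90, "home": 100}


def pre_admission_window(site_type, ach_admission):
    """Return (arrival, discharge) for the site a patient came from.

    Arrival is back-calculated from the ACH admission date; discharge
    is the ACH admission date itself.
    """
    stay = timedelta(days=ASSUMED_STAY_DAYS[site_type])
    return ach_admission - stay, ach_admission


def post_discharge_window(site_type, ach_discharge):
    """Return (arrival, discharge) for the site a patient went to.

    Arrival is the ACH discharge date; discharge is arrival plus the
    assumed length of stay.
    """
    stay = timedelta(days=ASSUMED_STAY_DAYS[site_type])
    return ach_discharge, ach_discharge + stay


# Hypothetical admission: from a SNF on 1 June 2014, discharged to an LTACH on 14 June 2014.
pre = pre_admission_window("SNF", date(2014, 6, 1))
post = post_discharge_window("LTACH", date(2014, 6, 14))
print(pre, post)
```

Under these assumptions every admission yields up to three facility intervals (pre-admission, ACH, post-discharge) that can then be compared between patients.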

Whole genome sequencing (WGS) and susceptibility testing

WGS was performed on CRKp collected from unique admissions to determine the genetic relatedness between isolates [26] (NCBI Bioprojects: PRJNA433394, PRJNA339843, PRJNA1100400). DNA sequence quality was verified by FastQC, and adapters and low-quality sequence were trimmed using Trimmomatic v0.36 [43]. Sequences were assembled via SPAdes v3.11.1 [44], and the quality was evaluated using QUAST v4.6.2 [45]. Contigs < 500 bp were excluded from downstream assembly. Species and multilocus sequence typing (MLST) were confirmed by StrainSeeker v1.5 [46] and MLST (https://github.com/tseemann/mlst), respectively. Capsular polysaccharide gene cluster and wzi allele typing were performed using Kleborate v0.1.0 [47]. ABRicate (https://github.com/tseemann/abricate) and ResFinder [48] were both used to identify resistance genes, and inconsistencies were further verified using AMRFinder [49]. Susceptibility testing was performed as previously described [26].

Bacterial genomic and phylogenetic analyses

Protein annotations were assigned to each genome assembly by Prokka [50], and Roary [51] was used to quantify differences in the core SNPs between strains. Pairwise SNPs were incorporated into the network model as a continuous measure of relative strain relatedness. The phylogenetic relationship between strains was characterized using a general time-reversible model with four discrete rates of nucleotide substitution (GTRGAMMA option) in RAxML v8.2.4 [52]. The R packages ggtree [53] and pheatmap [54] were used to visualize the tree and heatmap. We compared pairwise core SNP differences to pairwise average nucleotide identity (ANI) differences as a measure of validity using FastANI [55] (Supplemental Digital Content 1).
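As an illustration of what a pairwise core SNP difference measures, the sketch below counts differing positions between sequences that are already aligned to the same length. It is not the authors' pipeline: the function names and the toy alignment are hypothetical, and skipping gaps and ambiguous bases is an assumption made here for simplicity.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have an unambiguous
    base and the two bases differ. Gaps and ambiguous bases are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_matrix(alignment):
    """Map each unordered pair of isolate IDs to its SNP distance."""
    return {
        (id_a, id_b): snp_distance(alignment[id_a], alignment[id_b])
        for id_a, id_b in combinations(sorted(alignment), 2)
    }


# Toy core alignment (hypothetical isolates, 10 positions).
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACCTAC-TAT",
}
distances = pairwise_snp_matrix(toy_alignment)
print(distances)
```

The resulting pair-to-distance mapping is the kind of continuous relatedness measure that a network model can take as input.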

Network analysis of putative transmission events

We defined putative transmission events between patients in the cohort based on a combination of pairwise genetic distance and facility overlap, for the purpose of creating a network of potential transmission events. Patient-pairs with isolates having ≤ 10 pairwise core SNP differences were assigned an event probability of 1, while patient-pairs with > 25 pairwise core SNP differences were assigned an event probability of 0 and were thus excluded from the network as transmission events. Core SNP thresholds of 10 [56,57] and 25 [58,59,60] were based on thresholds applied in previous studies to identify closely related isolates of K. pneumoniae.

For patient-pairs with between 11 and 25 pairwise SNP differences, facility co-presence was defined as ≤ 14 days between two patients having stayed at the same facility. SNP differences and the time between same-facility stays (the interval from one patient’s discharge to the subsequent patient’s admission) were combined into a weighted event probability that decreased linearly as the SNP distance increased from 11 up to 25 SNPs and as the time between facility stays increased up to 14 days (Supplemental Digital Content 2a and 2b). A 14-day window was selected to account for indirect transmission events arising from (a) environmental contamination13,61 and (b) patients not included in the study (e.g., asymptomatic or unsampled individuals).
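The rules above can be sketched as a small function. The fixed probabilities (1 for ≤ 10 SNPs, 0 for > 25 SNPs) and the 14-day limit come from the text. The exact linear weighting used for 11–25 SNPs is given only in the supplemental content and is not reproduced here, so the product of two linear weights below is an assumption chosen purely for illustration, as are the function and parameter names.

```python
def event_probability(snp_diff, gap_days=None):
    """Illustrative putative-transmission probability for one patient pair.

    snp_diff: pairwise core SNP difference between the two isolates.
    gap_days: days between the two patients' stays at a shared facility
              (0 for overlapping stays), or None if no facility was shared.
    """
    if snp_diff <= 10:
        return 1.0  # genetic distance alone defines the link
    if snp_diff > 25 or gap_days is None or gap_days > 14:
        return 0.0  # too distant, or no co-presence within 14 days
    gap_days = max(gap_days, 0)
    # ASSUMPTION: each weight falls linearly and stays above zero at the
    # boundary (25 SNPs, 14 days); the study's actual scaling may differ.
    snp_weight = (26 - snp_diff) / 16
    time_weight = (15 - gap_days) / 15
    return snp_weight * time_weight


for snps, gap in [(5, None), (11, 0), (18, 7), (25, 14), (30, 0)]:
    print(snps, gap, round(event_probability(snps, gap), 3))
```

Any monotonically decreasing linear scheme would fit the description; the point of the sketch is only that links in the 11–25 SNP band require facility co-presence and weaken with both genetic and temporal distance.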

We then created a network of the putative transmissions in which each patient was represented as a single node in the network, regardless of the number of unique admissions. For patients sharing the same facility across multiple admissions such that the patient-by-patient combination was not unique, the admission combination with the highest weighted event probability was selected for the network analysis so that every pair of patients had only one edge (putative linkage) in the network and other potential linkages were considered redundant for the analysis. Patients in a patient-pair were temporally ordered as either first or second based on having an earlier or later CRKp culture date, respectively, solely for the purpose of network construction but not to make inference about directionality of transmission. Several patient-level variables were calculated from the network, including total degrees (number of transmission events for each patient), out-degrees (number of putative transmissions to other patients), and in-degrees (number of putative transmissions received from another patient). As (1) we could not epidemiologically confirm direct patient-to-patient transmission, and (2) there was the potential for subclinical infections and long-term asymptomatic carriage among patients as well as other individuals at the facilities that were not surveilled in this study, the presence of an edge in the network is more analogous to patients being in a putative transmission cluster than a confirmed direct transmission event.
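The network construction described above can be sketched with the standard library alone. The input format (one tuple per admission combination, with the culture date expressed as a day number), the function names, and the toy data are hypothetical; the sketch keeps the highest-probability link per patient pair, orders each pair by culture date, and derives out-degree, in-degree, and connected components.

```python
from collections import defaultdict


def build_network(candidate_links):
    """Keep one edge per patient pair: the candidate with the highest probability.

    candidate_links: iterable of
        (patient_a, culture_day_a, patient_b, culture_day_b, probability)
    Returns a list of (earlier_patient, later_patient, probability).
    """
    best = {}
    for pat_a, day_a, pat_b, day_b, prob in candidate_links:
        if pat_a == pat_b or prob <= 0:
            continue
        # Order by culture date for construction only, not to infer direction.
        (_, source), (_, target) = sorted([(day_a, pat_a), (day_b, pat_b)])
        pair = frozenset((pat_a, pat_b))
        if pair not in best or prob > best[pair][2]:
            best[pair] = (source, target, prob)
    return list(best.values())


def degrees(edges):
    """Return (out_degree, in_degree) dictionaries keyed by patient."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for source, target, _ in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return dict(out_deg), dict(in_deg)


def components(edges):
    """Return connected components (sets of patients), ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, target, _ in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, result = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(neighbours[node] - comp)
        seen |= comp
        result.append(comp)
    return result


# Toy candidate links (hypothetical patients and probabilities).
links = [
    ("P1", 10, "P2", 40, 1.0),
    ("P1", 10, "P2", 90, 0.4),   # second admission combination, redundant
    ("P3", 55, "P2", 40, 0.5),
    ("P4", 5, "P5", 7, 1.0),
    ("P1", 10, "P6", 300, 0.0),  # probability 0, excluded from the network
]
edges = build_network(links)
out_deg, in_deg = degrees(edges)
comps = components(edges)
print(edges, out_deg, in_deg, comps)
```

In this toy example P6 has no retained link and would be an out-of-network singleton, while P1, P2, and P3 form one component and P4 and P5 form a dyad.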

Results are reported per patient (patients with multiple admissions represented once in the analyses), per admission (patients with > 1 admission where a key characteristic may have changed), or per dyad (pairwise across admissions between two different people).

Sensitivity analysis and the effect of cut-offs on network connectivity

We performed a sensitivity analysis comparing the impact of adjusting core SNP thresholds and facility length of stay on the connectivity between patients within the network (Methods, Results, and Discussion in Supplemental Digital Content 3). Analysis was limited to ST258 isolates to test the impact of methodological choices using a robust sample size compared to the other sequence types.

Statistical tests

Fisher’s exact and Mann–Whitney tests were used to assess the relationships between categorical and continuous variables, respectively. A p value of < 0.05 was considered statistically significant.

Multivariable models

Three multivariable logistic regression models were applied to the data to identify patient and network factors associated with putative transmission events (Methods, Results, and Discussion in Supplemental Digital Content 4).