Introduction

Correct empirical antibiotic therapy for Gram-negative bloodstream infection (GN-BSI) is crucial because of its high impact on morbidity and mortality1,2. Patients’ underlying conditions and clinical severity at presentation, together with local epidemiology and rates of antibiotic resistance, are key elements in choosing an adequate empirical therapy3,4. While a broader antibiotic spectrum offers greater assurance of antibiotic coverage, the narrowest effective spectrum is more likely to reduce selective pressure and the emergence of resistance and/or adverse events associated with antibiotic misuse5.

Several studies have aimed to predict multidrug resistance in patients with GN-BSI in order to guide clinicians in the choice of empirical therapy, although they usually addressed a single resistance type or mechanism6,7,8. In recent studies, growing attention has been paid to finding the best time window for models that predict drug resistance9,10. Depending on the local laboratory workflow, predictive models could be used at different time points, such as the Gram staining stage or early species identification, with or without rapid detection of genotypic mechanisms11. Currently, MALDI-TOF is one of the most common methods used for rapid identification of pathogens from positive blood cultures12.

Our study aims to develop an artificial intelligence (AI) model able to predict a microorganism’s susceptibility to different antibiotic classes (i.e., fluoroquinolones, 3rd generation cephalosporins, beta-lactam/beta-lactamase inhibitors, and carbapenems), in patients with a Gram-negative bloodstream infection identified at species level using the MALDI-TOF species identification system. The MALDI-based system was chosen according to the microbiological BSI workflow process of our laboratory and in order to increase the specificity of our predictive model.

Results

Characteristics of population

During the study period, 4497 GN-BSI occurred in our cohort. After excluding 1945 episodes meeting the exclusion criteria, 2552 patients were included for analysis (Supplementary Fig. 1). Baseline characteristics of the population and of the microorganisms are shown in Tables 1 and 2, respectively.

Table 1 Characteristics of the study population
Table 2 Characteristics of pathogens

Coefficients of the multivariate logistic regression

The coefficients of the variables used in the multivariable logistic regression were computed and are reported in Fig. 1 (for the whole dataset) and Supplementary Fig. 2 (restricted to Enterobacterales). Rectal swab positivity emerged as a strong predictor of antibiotic resistance, in particular of carbapenem resistance (C-R), fluoroquinolone resistance (FQ-R), and beta-lactam/beta-lactamase inhibitor resistance (BL/BLI-R). Among Enterobacterales, Klebsiella pneumoniae was associated with all types of antibiotic resistance. Conversely, Escherichia coli and Proteus spp. strongly predicted carbapenem susceptibility.

Fig. 1: Mean values of the most relevant coefficients of the logistic regressions, over the 10 iterations of the outer cross-validation, for the four antibiotic resistances.

Each panel shows the mean values of the 10 largest coefficients by absolute value of the logistic regressions over the 10 iterations of the outer cross-validation (positive coefficients, associated with resistance, are shown in red; negative ones in green). There is one panel for each of the four antibiotic resistances. The blue error bar of each coefficient represents its standard deviation over the 10 cross-validation iterations.

Results of model validation are shown in Fig. 2. The model performed best in predicting carbapenem resistance, with an AUROC of 0.921 ± 0.013. BL/BLI, 3rd generation cephalosporin (3GC) and FQ resistance were predicted with AUROCs of 0.786 ± 0.033, 0.737 ± 0.022 and 0.732 ± 0.029, respectively. The same analysis was also performed on the Enterobacterales subset; results are reported in Supplementary Fig. 3.

Fig. 2: Mean confusion matrices and performance metrics over the outer cross-validation iterations for different types of antibiotic resistance.

Each panel shows the mean row-normalized confusion matrices over the 10 iterations of the outer cross-validation. The mean value of the weighted F1-score, Matthews Correlation coefficient and Area Under Receiver Operating Characteristic Curve (AUROC) are reported for each antibiotic resistance; the associated uncertainty is the standard deviation of the metrics over the above mentioned 10 iterations.

The detailed performances for the extreme gradient boosting classifier and the multi-layer perceptron are reported in Supplementary Figs. 4 and 5.

The mean F1-scores over the iterations for the resistant class were 0.626, 0.639, 0.560, and 0.606, while those for the susceptible class were 0.889, 0.690, 0.797, and 0.727, for carbapenem, fluoroquinolone, BL/BLI and 3GC resistance, respectively. The false omission rates were 0.035, 0.341, 0.156, and 0.262 for carbapenem, fluoroquinolone, BL/BLI and 3GC resistance, respectively.

The developed pipeline has been made available (https://github.com/EttoreRocchi/ResPredAI), along with documentation for running the same workflow on a different dataset, to account for local epidemiology and clinical features.

Discussion

Using a large database of adult hospitalized patients with GN-BSI, an AI model was derived and trained to predict resistance or susceptibility to each of the four most commonly used antibiotic classes (i.e., fluoroquinolones, 3rd generation cephalosporins, BL/BLI and carbapenems) starting from pathogen identification. Because it uses a penalized approach, our model is suited to being trained on many variables while reducing overfitting and decreasing the effect of feature collinearity. It was validated over 10 iterations (from the 10-fold outer cross-validation) and is therefore robust with respect to how the data are split between training and testing. The model was trained by balancing the weight of each outcome class (resistance and susceptibility) according to class frequency, so that it is also more effective in predicting the underrepresented class.
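To make the class-balancing step concrete, the following minimal sketch shows one common frequency-based weighting, the "balanced" heuristic available in scikit-learn, where each class receives the weight n_samples / (n_classes × class count). Whether the published pipeline uses exactly this heuristic is an assumption here, and the toy labels are hypothetical.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical toy labels: 8 susceptible (0) and 2 resistant (1) isolates.
y = np.array([0] * 8 + [1] * 2)
classes = np.array([0, 1])

# "balanced" weights are inversely proportional to class frequency.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same quantity written out explicitly: n_samples / (n_classes * count).
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.tolist())))
```

With these weights, an error on a resistant isolate costs four times as much during training as an error on a susceptible one, which offsets the 8:2 imbalance of the toy labels.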

We believe our model could be useful in clinical practice to improve the therapeutic management and outcome of patients with GN-BSI. Indeed, initiation of appropriate antibiotic therapy >12 h after drawing blood cultures (BCs) has been associated with a significantly increased risk of mortality in a large study of almost 10,000 patients with BSI13. Generally, when MALDI-TOF is used on positive BCs, pathogen identification is available within 12 h of BC collection in a significant number of patients. Thus, our model could be useful in hospitals where MALDI-TOF is available, to predict the resistance pattern from species identification and to start or modify empirical treatment. It could also complement rapid diagnostic assays that provide data on genotypic resistance in patients with GN-BSI14. Indeed, the genotypic antibiogram for GN bacteria has several limitations: (i) it does not account for differential phenotypic expression or penetrance of a given resistance gene, which could lead to overestimates of resistance; (ii) potential discrepancies between genotype and phenotype due to a broad spectrum of resistance determinants, mainly in non-fermenting strains, mostly Pseudomonas aeruginosa; (iii) potential off-target mechanisms; and (iv) costs15. Thus, application of an AI model in settings where drug resistance is endemic could be useful for either diagnostic or antibiotic stewardship purposes. In addition, the application of an AI model may be feasible even in centers where clinical microbiologists, infectious disease consultants and/or antimicrobial stewardship staff are not available 24/716.

In our model, higher sensitivity rather than specificity in predicting resistance to each antibiotic class was accepted. This means we obtained a high negative predictive value and consequently very low false omission rates, particularly for carbapenem resistance. This choice minimizes the probability of inappropriate antibiotic therapy in the very early phase. On the other hand, from a stewardship perspective, it could lead to unnecessary use of broader-spectrum antibiotics, at least in the empirical phase. To avoid antibiotic misuse, prompt de-escalation, where applicable, is strongly recommended as soon as the phenotypic antibiogram is available17.

Our study has some limitations. The single-center design may limit the generalizability of our results. The model trained on the described cohort could have been influenced by the local epidemiology and the patient case mix during the study period. To address this limitation, we have made the developed pipeline available so that it can be trained and tested in potentially any center, and can thus properly capture the geographical, epidemiological and clinical differences we expect to find among hospitals in different regions.

To conclude, our AI model is a promising tool able to support clinicians in the very early clinical decision-making process, integrating Gram-negative MALDI-TOF species identification with very few significant demographic, clinical and microbiological variables, to return rapid information on potential resistance to the main antibiotic classes in patients with GN-BSI. Prospective multicenter studies are needed in order to further improve its performance in different settings and validate its clinical usefulness.

Methods

This was an observational cohort study of all consecutive adult patients hospitalized at our center and diagnosed with GN-BSI from January 1st 2013 to December 31st 2019. Patients were excluded if they were on palliative care, if death occurred within 48 h of the index BSI, or if clinical data were incomplete or unavailable.

The study was conducted according to the declaration of Helsinki and Good Clinical Practice guidelines and approved by the local Ethics Committee (no. 894/2021/Oss/AOUBo). Research ethics board approval was obtained in agreement with Comitato Etico Area Vasta Emilia Centro della Regione Emilia-Romagna (CE-AVEC). Informed consent was obtained before enrollment.

Data sources and predictor variables

Patients were screened for enrolment using local microbiology registries. Clinical charts and hospital electronic records were data sources. Data were gathered using a dedicated REDCap electronic case report form (eCRF) hosted by Alma Mater Studiorum - University of Bologna18.

The primary endpoint was antibiotic resistance to four different antibiotic classes including FQ-R, 3GC-R, BL/BLI-R and C-R. Beta-lactam/beta-lactamase inhibitors included amoxicillin/clavulanate and piperacillin/tazobactam for Enterobacterales, only piperacillin/tazobactam for Pseudomonas spp.

Exposure variables included demographics (i.e., age, gender), diabetes (uncomplicated disease or end-organ disease), congestive heart failure, dementia, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), liver disease, solid organ tumor (localized or metastatic), comorbidities according to the Charlson comorbidity index19, presence of immunosuppressive conditions (hematopoietic cell transplantation, neutropenia, solid organ transplantation, HIV, corticosteroid therapy), length of hospital stay (LOS) from hospital admission to index BSI, BSI acquisition source (hospital or community acquired) and inpatient ward (i.e., internal medicine, intensive care unit (ICU), surgery, emergency department). BSI sources, defined according to US Centers for Disease Control and Prevention criteria20, were also recorded. BSI was defined as “primary” when the source of infection was unidentified. Data on microbiological strains were grouped into Enterobacterales (Klebsiella spp., Escherichia coli, Enterobacter spp.) and non-fermentative Gram-negative (NF-GN) bacteria (Pseudomonas spp., Acinetobacter spp.). We also recorded rectal swab colonization at BSI onset. Correlation heatmaps among variables are shown in Supplementary Fig. 6a, b.

Data analysis

The analyses were carried out within a machine learning framework developed using the scikit-learn Python package. The problem fell into the category of classification tasks, since the aim was to predict the resistance or susceptibility of a given pathogen to four antibiotic families from clinical and demographic features. A multivariable logistic classifier was used for this purpose since it represents a well-calibrated model for binary classification. A comparative analysis was carried out to identify the most predictive model among an extreme gradient boosting classifier, a multi-layer perceptron and a logistic regression. Although each model produced robust and consistent performances, the logistic regression model was chosen in this study not only because of its interpretability (especially when compared with a black-box model such as the multi-layer perceptron), but in particular because of its higher accuracy in predicting the resistant class for the four antibiotic classes.

The machine learning workflow consists of a One Versus Rest (OVR) framework in which the multivariable logistic classifier is trained to classify each pathogen as resistant or susceptible to each of the four antibiotic classes. The model was trained within a nested cross-validation (CV) to avoid overfitting and ensure more robust results. The purpose of the 5-fold inner CV was to fine-tune the hyperparameters of the logistic regression, i.e., the type of penalization (among no penalization, lasso, ridge and elastic-net) and the penalization factor (see Supplementary Table 1). The 10-fold outer CV evaluated the robustness of the model to the training and test split, since the validation metrics were computed on the test set (corresponding to 10% of the dataset) for each split. A sketch of the nested CV workflow is presented in Supplementary Fig. 7.
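As an illustration only, the nested CV described above could be sketched with scikit-learn as follows. The synthetic data, the hyperparameter grid, the stratified shuffled folds, the `saga` solver, the balanced class weights and the use of AUROC for inner model selection are all assumptions made for this example rather than details taken from the study; the unpenalized option is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for one resistance label (hypothetical data).
X, y = make_classification(
    n_samples=400, n_features=12, weights=[0.8, 0.2], random_state=0
)

# Scaling sits inside the pipeline so it is re-fitted on each training fold.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="elasticnet", solver="saga",
        class_weight="balanced", max_iter=5000,
    ),
)

# Assumed grid: l1_ratio 0 = ridge, 0.5 = elastic-net, 1 = lasso;
# C is the inverse of the penalization strength.
param_grid = {
    "logisticregression__l1_ratio": [0.0, 0.5, 1.0],
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Inner loop: hyperparameter tuning. Outer loop: performance on held-out folds.
search = GridSearchCV(model, param_grid, cv=inner_cv, scoring="roc_auc")
scores = cross_validate(search, X, y, cv=outer_cv, scoring="roc_auc")

auroc = scores["test_score"]
print(f"AUROC over outer folds: {auroc.mean():.3f} ± {auroc.std():.3f}")
```

Because the grid search is itself the estimator passed to the outer loop, hyperparameters are selected using only the training portion of each outer split, so the outer test fold never influences tuning.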

Before training the model, a pre-processing step was required; in particular, after a one-hot encoding to obtain dummy variables for all the categorical variables present in the dataset, the two continuous variables, i.e., the age and the length of hospital stay of the patient, underwent the procedure of feature scaling through standardization.

As already stated above, in addition to a non-penalized model, different regularization techniques were also considered: the L1 penalization (lasso), the L2 penalization (ridge), and a balanced mix between the two (elastic-net).

After training, the models were validated using three metrics: the area under the receiver operating characteristic curve (AUROC), the weighted F1-score and the Matthews correlation coefficient. These are among the most common choices when validating a binary classifier (especially if the dataset contains unbalanced classes), since each provides a different insight into model performance.
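The three metrics can be computed from predicted probabilities as in the following sketch. The helper name `validation_metrics`, the 0.5 decision threshold and the toy values are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef, roc_auc_score

def validation_metrics(y_true, y_prob, threshold=0.5):
    """Return AUROC, weighted F1-score and MCC for a binary classifier.

    AUROC uses the predicted probabilities; the other two metrics use
    class labels obtained by thresholding them (threshold is an assumption).
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "auroc": roc_auc_score(y_true, y_prob),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }

# Hypothetical toy example: 1 = resistant, 0 = susceptible.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.6, 0.3, 0.8, 0.4]
metrics = validation_metrics(y_true, y_prob)
print(metrics)
```

AUROC is threshold-independent, whereas the weighted F1-score and the Matthews correlation coefficient depend on the chosen cut-off, which is one reason the three metrics give complementary views of performance.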

As already described above, this work considers four antibiotic classes. The first step of the machine learning framework was to train a logistic regression for each of these classes independently within a nested cross-validation framework. More specifically, for each antibiotic class, data were split into the 10 folds of the outer CV, whereas the inner CV used 5 folds.

Thus, for each antibiotic class, 10 values of each of the three metrics considered (one for each iteration of the outer CV) were obtained, allowing a measure of variability (standard deviation) of the metrics over the 10 iterations to be determined.

Once the model was trained, the coefficients of each feature were extracted to assess the impact of each variable on the outcome of the model. Since each logistic regression is trained 10 times (once per fold of the 10-fold CV), each feature is associated with 10 coefficients, which were summarized using their mean and standard deviation.
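A minimal sketch of this summary step is given below, assuming synthetic data, generic feature names and a default (ridge-penalized) logistic regression in place of the tuned model. It collects one coefficient vector per outer fold and ranks features by the absolute value of the mean coefficient.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One coefficient vector per outer-fold training set.
coefs = []
for train_idx, _ in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    coefs.append(clf.coef_.ravel())
coefs = np.vstack(coefs)  # shape: (n_folds, n_features)

coef_mean = coefs.mean(axis=0)
coef_std = coefs.std(axis=0)

# Rank features by the absolute value of the mean coefficient.
order = np.argsort(-np.abs(coef_mean))
for i in order:
    print(f"{feature_names[i]}: {coef_mean[i]:+.3f} ± {coef_std[i]:.3f}")
```

A small standard deviation relative to the mean suggests that a coefficient is stable across training splits, which is what the error bars in a plot of this kind convey.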

Finally, we evaluated the positive predictive value (PPV) and the negative predictive value (NPV) of our model, and specifically the false omission rate (FOR) for each antibiotic class, defined as FOR = 1 - NPV, which more accurately represents the risk of wrongly classifying a pathogen as susceptible when it is actually resistant21.
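For concreteness, PPV, NPV and FOR can be derived from a binary confusion matrix as in the sketch below. The helper name `predictive_values`, the label coding (1 = resistant, 0 = susceptible) and the toy labels are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def predictive_values(y_true, y_pred):
    """Return PPV, NPV and FOR, with 1 = resistant and 0 = susceptible."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    # Guard against empty predicted classes to avoid division by zero.
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return {
        "ppv": float(ppv),
        "npv": float(npv),
        # Share of isolates predicted susceptible that are actually resistant.
        "false_omission_rate": float(1 - npv),
    }

# Hypothetical toy labels.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]
values = predictive_values(y_true, y_pred)
print(values)
```

In this toy example, one of the four isolates predicted as susceptible is actually resistant, so the FOR is 1/4.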