Fig. 1: PhageHostLearn overview and validation procedures. | Nature Communications

From: Prediction of Klebsiella phage-host specificity at the strain level

a. Overview of the PhageHostLearn machine learning system. PhageHostLearn processes phage genomes into phage receptor-binding proteins (RBPs) and bacterial genomes into bacterial K-locus proteins. RBPs belonging to the same phage, and K-locus proteins belonging to the same bacterium, are each combined into a multi-instance representation using ESM-2. The phage and bacterial representations are then concatenated into a combined representation of each phage-host pair. Finally, these combined representations are passed to an XGBoost model, which makes predictions and outputs a ranking of the top candidate phages to test against a given bacterium.

b. In silico validation of the PhageHostLearn system using a leave-one-group-out cross-validation (LOGOCV) scheme, with ROC AUC and mean hit ratio @ k as evaluation metrics.

c. In vitro validation of the PhageHostLearn system using 28 high-risk K. pneumoniae clinical isolates from Spain. The system predicts a top-five phage ranking for each clinical isolate, and the five phage candidates in each ranking are validated in the laboratory using phage spot tests.
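The ranking and validation steps described in panels a and b can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores, and the group labels are all hypothetical, and the caption does not define hit ratio @ k, so the sketch assumes a bacterium counts as a hit when at least one of its confirmed phages appears among its top k ranked candidates. The scores stand in for the output of a trained model such as XGBoost.

```python
from collections import defaultdict


def rank_phages(scores):
    """Return phage ids ordered from highest to lowest predicted score."""
    return [p for p, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]


def mean_hit_ratio_at_k(scores_by_host, positives_by_host, k):
    """Fraction of hosts with at least one confirmed phage in their top k.

    Assumed definition (not given in the caption). Hosts without any
    confirmed positive phage are skipped.
    """
    hits = []
    for host, scores in scores_by_host.items():
        positives = positives_by_host.get(host)
        if not positives:
            continue
        top_k = rank_phages(scores)[:k]
        hits.append(1.0 if any(p in positives for p in top_k) else 0.0)
    return sum(hits) / len(hits) if hits else float("nan")


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


# Hypothetical toy data: predicted interaction scores per bacterium.
toy_scores = {
    "hostA": {"phi1": 0.9, "phi2": 0.2, "phi3": 0.6},
    "hostB": {"phi1": 0.1, "phi2": 0.3, "phi3": 0.8},
}
# Hypothetical confirmed interactions (e.g. from spot tests).
toy_positives = {"hostA": {"phi3"}, "hostB": {"phi1"}}

# Hypothetical group label for each of four samples.
toy_groups = ["g1", "g1", "g2", "g3"]
toy_splits = list(leave_one_group_out(toy_groups))
```

With this toy data, `mean_hit_ratio_at_k(toy_scores, toy_positives, 2)` evaluates to 0.5, because only hostA has a confirmed phage in its top two. In a LOGOCV run, a model would be refit on each `train_indices` set and scored on the held-out group, so that no group contributes to both training and evaluation in the same fold.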
