Table 1 Overview of the modeling approaches used for predicting phage KL type tropism

From: Unlocking data in Klebsiella lysogens to predict capsular type-specificity of phage depolymerases

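| Modeling approach | Positive instances | Negative instances | Input data shape | Aggregator (method) / Classifier | Hyperparameter optimization |
| --- | --- | --- | --- | --- | --- |
| DAG | A single prophage instance for each infectious event, OR a single prophage instance with a unique set of depolymerases | Randomly selected from prophages whose depolymerase sets do not overlap with positive instances | Set of embedding representations, each of 1280 dimensions | Attention-based (GATv2) | learning rate, weight decay, dropout, attention heads |
| DAG | A single prophage instance for each infectious event, OR a single prophage instance with a unique set of depolymerases | Randomly selected from prophages whose depolymerase sets do not overlap with positive instances | Set of embedding representations, each of 1280 dimensions | Average (SAGE) | learning rate, weight decay, dropout |
| Sequence clustering | A single prophage instance for each infectious event, OR a single prophage instance with a unique set of depolymerases | Randomly selected from prophages whose depolymerase sets do not overlap with positive instances | Binary vector of 989 dimensions | Random Forest | bootstrap, max depth, max features, min samples leaf, min samples split, n estimators |
| Sequence clustering | A single prophage instance for each infectious event, OR a single prophage instance with a unique set of depolymerases | Randomly selected from prophages whose depolymerase sets do not overlap with positive instances | Binary vector of 989 dimensions | Logistic regression | penalty, C (regularization strength), max iterations, L1 ratio |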
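
The negative-instance rule and the binary input vector described in the table can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the prophage and depolymerase identifiers, and the four-cluster index are hypothetical, and "do not overlap" is assumed here to mean that a candidate shares no depolymerase with any positive instance. In the table the binary vector has 989 dimensions; the toy index below has 4.

```python
import random


def sample_negatives(positives, candidates, n, seed=0):
    """Randomly pick up to n candidate prophages as negative instances.

    positives / candidates map a prophage name to its set of depolymerase IDs.
    Assumption: a candidate is eligible only if it shares no depolymerase
    with any positive instance.
    """
    positive_depos = set().union(*positives.values())
    eligible = sorted(
        name
        for name, depos in candidates.items()
        if name not in positives and positive_depos.isdisjoint(depos)
    )
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(eligible, min(n, len(eligible)))


def to_binary_vector(depos, cluster_index):
    """Encode a depolymerase set as a presence/absence vector over clusters."""
    vector = [0] * len(cluster_index)
    for depo in depos:
        if depo in cluster_index:
            vector[cluster_index[depo]] = 1
    return vector


# Hypothetical toy data.
positives = {"prophage_A": {"dpo1", "dpo2"}, "prophage_B": {"dpo3"}}
candidates = {
    "prophage_C": {"dpo2", "dpo7"},   # shares dpo2 with a positive: excluded
    "prophage_D": {"dpo8"},           # eligible
    "prophage_E": {"dpo9", "dpo10"},  # eligible
    "prophage_F": {"dpo3"},           # shares dpo3 with a positive: excluded
}
cluster_index = {"dpo1": 0, "dpo2": 1, "dpo3": 2, "dpo8": 3}

negatives = sample_negatives(positives, candidates, n=2)
positive_vector = to_binary_vector(positives["prophage_A"], cluster_index)
negative_vector = to_binary_vector(candidates["prophage_D"], cluster_index)
```

Vectors built this way could then be passed to a classifier such as a random forest or logistic regression, with the hyperparameters listed in the table tuned separately.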