Table 1 Overview of the modeling approaches used for predicting phage KL type tropism
From: Unlocking data in Klebsiella lysogens to predict capsular type-specificity of phage depolymerases
| Modeling approach | Positive instances | Negative instances | Input Data Shape | Aggregator (method) / Classifier | Hyperparameter Optimization |
|---|---|---|---|---|---|
| DAG | A single prophage instance for each infectious event OR A single prophage instance with a unique set of depolymerases | Randomly selected from prophages whose depolymerase sets do not overlap with positive instances | Set of embedding representation of 1280 dimensions | Attention-based (GATv2) | learning weight, weight decay, dropout, attention heads |
| DAG | Same as the first row | Same as the first row | Set of embedding representation of 1280 dimensions | Average (SAGE) | learning weight, weight decay, dropout |
| Sequence clustering | Same as the first row | Same as the first row | Binary vector of 989 dimensions | Random Forest | bootstrap, max depth, max feature, min samples leaf, min samples splits, n estimators |
| Sequence clustering | Same as the first row | Same as the first row | Binary vector of 989 dimensions | Logistic regression | penalty, C (regularization strength), max iterations, L1 ratio |
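
The two input shapes listed in Table 1 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each depolymerase is represented by one 1280-dimensional embedding and that each of the 989 positions in the binary vector marks the presence or absence of one sequence cluster. The function names, the toy values, and the cluster indices are hypothetical. The plain mean shown here only approximates the "Average (SAGE)" aggregator, which in a trained model also applies learned weights.

```python
from typing import Iterable, List, Sequence

EMBEDDING_DIM = 1280  # per-depolymerase embedding size, taken from Table 1
N_CLUSTERS = 989      # length of the binary vector, taken from Table 1


def mean_aggregate(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average a variable-size set of embeddings into one fixed-size vector.

    Hypothetical helper: a plain mean, without the learned weights that a
    real SAGE layer would apply.
    """
    if not embeddings:
        raise ValueError("a prophage needs at least one depolymerase embedding")
    if any(len(e) != EMBEDDING_DIM for e in embeddings):
        raise ValueError(f"every embedding must have {EMBEDDING_DIM} dimensions")
    n = len(embeddings)
    # zip(*embeddings) walks the set dimension by dimension
    return [sum(column) / n for column in zip(*embeddings)]


def cluster_presence_vector(cluster_ids: Iterable[int]) -> List[int]:
    """Encode the clusters found in one prophage as a 0/1 vector.

    Assumption: one position per sequence cluster, indexed from 0.
    """
    vector = [0] * N_CLUSTERS
    for cid in cluster_ids:
        if not 0 <= cid < N_CLUSTERS:
            raise ValueError(f"cluster index {cid} is outside 0..{N_CLUSTERS - 1}")
        vector[cid] = 1
    return vector


# Toy inputs (made up for illustration): a prophage with two depolymerases
toy_embeddings = [[0.0] * EMBEDDING_DIM, [1.0] * EMBEDDING_DIM]
pooled = mean_aggregate(toy_embeddings)

# Toy inputs (made up for illustration): a prophage hitting three clusters
presence = cluster_presence_vector({3, 41, 988})
```

In both cases a prophage with any number of depolymerases is reduced to a fixed-length input, which is what lets the downstream classifiers in the table operate on prophages of different sizes.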