Table 5 Comparison of ensemble strategies using MAE-based models: performance of snapshot ensemble (within a single training run) and deep ensemble (across multiple pretrained models).
| Ensemble | Strategy | AUC | AUPRC | Sensitivity | Precision | F1-score |
|---|---|---|---|---|---|---|
| Snapshot ensemble | Simple averaging | 0.761 (0.003) | 0.623 (0.005) | 0.742 (0.008) | 0.558 (0.013) | 0.615 (0.005) |
| Snapshot ensemble | Weighted averaging | 0.756 (0.006) | 0.618 (0.005) | 0.733 (0.022) | 0.551 (0.015) | 0.606 (0.011) |
| Snapshot ensemble | Greedy ensemble (2 sets) | 0.761 (0.002) | 0.622 (0.006) | 0.737 (0.017) | 0.562 (0.009) | 0.612 (0.009) |
| Snapshot ensemble | Greedy ensemble (3 sets) | 0.762 (0.003) | 0.622 (0.004) | 0.743 (0.010) | 0.559 (0.010) | 0.615 (0.005) |
| Deep ensemble | Simple averaging | 0.752 (0.004) | 0.525 (0.005) | 0.751 (0.007) | 0.551 (0.006) | 0.616 (0.005) |
| Deep ensemble | Weighted averaging | 0.752 (0.004) | 0.525 (0.005) | 0.751 (0.008) | 0.551 (0.006) | 0.616 (0.005) |
| Deep ensemble | Greedy ensemble (2 sets) | 0.755 (0.003) | 0.525 (0.003) | 0.753 (0.016) | 0.552 (0.005) | 0.617 (0.009) |
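
To make the three combination strategies named in Table 5 concrete, the following is a minimal sketch, not the procedure used to produce the reported numbers. Several points are assumptions because the table does not specify them: the weights for weighted averaging are supplied by the caller (for example, each model's validation AUC), the greedy ensemble is implemented as forward selection with replacement that maximizes validation AUC, and `max_members` is a hypothetical cap that does not necessarily correspond to the "2 sets" and "3 sets" settings. All function names are illustrative.

```python
from typing import Callable, List, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in sample count, fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simple_average(probs: List[List[float]]) -> List[float]:
    """Unweighted mean of per-model probabilities (one inner list per model)."""
    return [sum(col) / len(col) for col in zip(*probs)]


def weighted_average(probs: List[List[float]],
                     weights: Sequence[float]) -> List[float]:
    """Weighted mean; the weights are normalized to sum to 1.

    Assumption: weights come from the caller, e.g. validation scores.
    """
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total
            for col in zip(*probs)]


def greedy_ensemble(probs: List[List[float]],
                    y_val: Sequence[int],
                    max_members: int,
                    metric: Callable[[Sequence[int], Sequence[float]], float] = auc,
                    ) -> List[int]:
    """Forward selection with replacement (assumed variant).

    Each round adds the model whose inclusion gives the best metric for the
    averaged prediction; stops early when no candidate improves the score.
    Returns the indices of the selected models (repeats are allowed).
    """
    chosen: List[int] = []
    best = float("-inf")
    for _ in range(max_members):
        score, j = max(
            (metric(y_val, simple_average([probs[i] for i in chosen + [j]])), j)
            for j in range(len(probs))
        )
        if score <= best:
            break
        chosen.append(j)
        best = score
    return chosen


# Toy validation data (made up for illustration, unrelated to Table 5).
y_val = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.4, 0.6]
model_b = [0.6, 0.5, 0.7, 0.1]
members = greedy_ensemble([model_a, model_b], y_val, max_members=3)
ensemble_probs = simple_average([[model_a, model_b][i] for i in members])
```

In this sketch the final greedy prediction is the simple average over the selected members, so a model selected twice effectively receives double weight. Selection should be done on held-out validation data, since choosing members on the test set would bias the reported metrics.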