Table 5 Comparison of ensemble strategies using MAE-based models: performance of snapshot ensemble (within a single training run) and deep ensemble (across multiple pretrained models).

From: Integrating snapshot ensemble learning into masked autoencoders for efficient self-supervised pretraining in medical imaging

  

| Ensemble | Strategy | AUC | AUPRC | Sensitivity | Precision | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| Snapshot ensemble | Simple averaging | 0.761 (0.003) | 0.623 (0.005) | 0.742 (0.008) | 0.558 (0.013) | 0.615 (0.005) |
| Snapshot ensemble | Weighted averaging | 0.756 (0.006) | 0.618 (0.005) | 0.733 (0.022) | 0.551 (0.015) | 0.606 (0.011) |
| Snapshot ensemble | Greedy ensemble (2 sets) | 0.761 (0.002) | 0.622 (0.006) | 0.737 (0.017) | 0.562 (0.009) | 0.612 (0.009) |
| Snapshot ensemble | Greedy ensemble (3 sets) | 0.762 (0.003) | 0.622 (0.004) | 0.743 (0.010) | 0.559 (0.010) | 0.615 (0.005) |
| Deep ensemble | Simple averaging | 0.752 (0.004) | 0.525 (0.005) | 0.751 (0.007) | 0.551 (0.006) | 0.616 (0.005) |
| Deep ensemble | Weighted averaging | 0.752 (0.004) | 0.525 (0.005) | 0.751 (0.008) | 0.551 (0.006) | 0.616 (0.005) |
| Deep ensemble | Greedy ensemble (2 sets) | 0.755 (0.003) | 0.525 (0.003) | 0.753 (0.016) | 0.552 (0.005) | 0.617 (0.009) |
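
The three combination strategies named in the table (simple averaging, weighted averaging, greedy ensemble) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic data, the use of validation AUC as the member weight, and the forward-selection rule are generic choices, not the procedure used in the paper, and the function names (`simple_average`, `weighted_average`, `greedy_select`) are hypothetical. The sketch does not model what "2 sets" or "3 sets" means in the table.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simple_average(preds):
    """Unweighted per-sample mean over ensemble members."""
    return [sum(col) / len(col) for col in zip(*preds)]


def weighted_average(preds, weights):
    """Per-sample weighted mean; weights are normalised by their sum."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total for col in zip(*preds)]


def greedy_select(preds, labels, max_members):
    """Forward selection without replacement (an assumed, generic rule):
    add the member that most improves the AUC of the running simple average,
    and stop when no remaining member improves it."""
    chosen, best_score = [], float("-inf")
    for _ in range(max_members):
        candidates = [
            (auc(labels, simple_average([preds[i] for i in chosen + [j]])), j)
            for j in range(len(preds))
            if j not in chosen
        ]
        if not candidates:
            break
        score, j = max(candidates)
        if score <= best_score:
            break
        chosen.append(j)
        best_score = score
    return chosen, best_score


# Synthetic validation set: binary labels and four noisy member predictions.
rng = random.Random(0)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
members = [[0.6 * y + rng.gauss(0.0, 0.5) for y in labels] for _ in range(4)]

member_aucs = [auc(labels, m) for m in members]
auc_simple = auc(labels, simple_average(members))
# Assumption: each member is weighted by its own validation AUC.
auc_weighted = auc(labels, weighted_average(members, member_aucs))
selected, auc_greedy = greedy_select(members, labels, max_members=3)

print("member AUCs:", [round(a, 3) for a in member_aucs])
print("simple:", round(auc_simple, 3), "weighted:", round(auc_weighted, 3))
print("greedy members:", selected, "AUC:", round(auc_greedy, 3))
```

In this sketch the greedy selection is scored on the same data it selects on, so its AUC is optimistic; a held-out split would be needed for a fair comparison of the three strategies.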