Table 9 Comparison against baselines.

From: RABEM: risk-adaptive Bayesian ensemble model for fraud detection

| Fusion method | Accuracy | F1 score | Precision | Recall | Calibration (ECE) | Notes |
|---|---|---|---|---|---|---|
| Bayesian reliability fusion | 99.38% | 0.92 | 89.66% | 99.41% | 0.024 | Strong calibration, best recall |
| Soft voting (mean probabilities) | 98.91% | 0.88 | 87.12% | 95.10% | 0.072 | Good baseline, less calibrated |
| Bagging (random forest) | 98.24% | 0.81 | 85.11% | 90.45% | 0.090 | Lacks probabilistic calibration |
| Stacking (logistic meta-learner) | 98.67% | 0.86 | 87.54% | 92.44% | 0.063 | Better than bagging, still lower recall |
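The fusion strategies compared above can be sketched in code. The paper's exact Bayesian reliability fusion is not specified in this excerpt, so `reliability_fusion` below is an illustrative assumption: a weighted average where each model's vote is scaled by a normalized reliability score (e.g. validation accuracy), standing in for the full Bayesian treatment. Soft voting and the expected calibration error (ECE) follow their standard definitions (mean of probabilities; equal-width confidence bins).

```python
import numpy as np

def soft_vote(probs):
    """Soft voting: unweighted mean of per-model fraud probabilities.

    probs: array of shape (n_models, n_samples).
    """
    return probs.mean(axis=0)

def reliability_fusion(probs, reliabilities):
    """Hypothetical proxy for Bayesian reliability fusion: weight each
    model's probability by its normalized reliability score.

    reliabilities: one nonnegative score per model, e.g. validation
    accuracy (an assumption, not the paper's exact scheme).
    """
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()
    return np.average(probs, axis=0, weights=w)

def expected_calibration_error(probs, labels, n_bins=10):
    """Standard ECE with equal-width bins: weighted mean gap between
    predicted probability and empirical positive rate per bin."""
    # Map each probability to a bin index 0..n_bins-1 (1.0 goes to the last bin).
    idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

# Toy usage: two models scoring two transactions.
model_probs = np.array([[0.2, 0.8],
                        [0.4, 0.6]])
fused_soft = soft_vote(model_probs)
fused_rel = reliability_fusion(model_probs, reliabilities=[1.0, 3.0])
```

A well-calibrated but individually weaker model receives less weight under reliability fusion than under soft voting, which is one plausible reason the reliability-weighted ensemble in Table 9 attains both higher recall and lower ECE than the unweighted baseline.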