Table 3 Performance metrics of the best tabular model, the best image model, and the ensemble models using the majority voting and averaging strategies. The results are the means and standard deviations over threefold cross-validation.

From: Ensemble learning for fetal ultrasound and maternal–fetal data to predict mode of delivery after labor induction

| Model | AUROC | F1-score | PPV | NPV | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Tabular data model (AdaBoost) | 0.738 ± 0.051 | 0.736 ± 0.024 | 0.734 ± 0.024 | 0.790 ± 0.019 | 0.746 ± 0.022 | 0.867 ± 0.019 |
| Image model (Inception femur) | 0.485 ± 0.019 | 0.605 ± 0.015 | 0.596 ± 0.017 | 0.703 ± 0.006 | 0.668 ± 0.003 | 0.908 ± 0.022 |
| Image-based model (Inception femur, abdomen and head): Max voting | 0.470 ± 0.016 | 0.576 ± 0.016 | 0.572 ± 0.010 | 0.693 ± 0.007 | 0.582 ± 0.023 | 0.718 ± 0.040 |
| Image-based model (Inception femur, abdomen and head): Mean voting | 0.477 ± 0.001 | 0.578 ± 0.006 | 0.587 ± 0.087 | 0.698 ± 0.001 | 0.694 ± 0.003 | 0.991 ± 0.003 |
| AdaBoost + Inception femur, abdomen and head: Max voting | 0.621 ± 0.053 | 0.638 ± 0.043 | 0.679 ± 0.036 | 0.792 ± 0.033 | 0.624 ± 0.045 | 0.623 ± 0.047 |
| AdaBoost + Inception femur, abdomen and head: Mean voting | 0.500 ± 0.001 | 0.578 ± 0.006 | 0.587 ± 0.087 | 0.698 ± 0.001 | 0.694 ± 0.003 | 0.991 ± 0.003 |
| Final classification model (AdaBoost + Inception femur): Max voting | 0.705 ± 0.049 | 0.716 ± 0.028 | 0.719 ± 0.027 | 0.802 ± 0.018 | 0.714 ± 0.029 | 0.783 ± 0.024 |
| Final classification model (AdaBoost + Inception femur): Mean voting | 0.522 ± 0.016 | 0.608 ± 0.012 | 0.606 ± 0.011 | 0.705 ± 0.005 | 0.677 ± 0.004 | 0.924 ± 0.022 |

Results are presented as mean ± standard deviation, obtained by threefold cross-validation. AUROC: area under the receiver operating characteristic curve; PPV: positive predictive value; NPV: negative predictive value.
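As an illustration of how mean ± standard deviation entries of this kind can be derived, the following minimal sketch computes PPV, NPV, sensitivity and specificity from per-fold confusion-matrix counts and then aggregates them over three folds. The function names `binary_metrics` and `summarize` and the fold counts are hypothetical and are not taken from the study. The sketch also assumes the sample standard deviation, which the table does not specify.

```python
from statistics import mean, stdev


def binary_metrics(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity and specificity from confusion-matrix counts."""
    return {
        "PPV": tp / (tp + fp),          # positive predictive value
        "NPV": tn / (tn + fn),          # negative predictive value
        "Sensitivity": tp / (tp + fn),  # true positive rate
        "Specificity": tn / (tn + fp),  # true negative rate
    }


def summarize(folds):
    """Aggregate per-fold metrics into (mean, sample standard deviation) pairs."""
    per_fold = [binary_metrics(*counts) for counts in folds]
    summary = {}
    for name in per_fold[0]:
        values = [metrics[name] for metrics in per_fold]
        # Assumption: sample standard deviation (n - 1 in the denominator).
        summary[name] = (mean(values), stdev(values))
    return summary


# Hypothetical (tp, fp, tn, fn) counts for three validation folds;
# these are illustrative only and do not come from the paper.
folds = [(40, 10, 45, 5), (38, 12, 44, 6), (42, 8, 46, 4)]

summary = summarize(folds)
for name, (mu, sd) in summary.items():
    print(f"{name}: {mu:.3f} ± {sd:.3f}")
```

With only three folds, the standard deviation is a rough indicator of fold-to-fold variability, so small differences between rows should be read with caution.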