Table 2 Performance of the demodexnet models.

From: Artificial intelligence-assisted prediction of Demodex mite density in facial erythema

Classification performance is reported as value (95% CI)^a.

| Model (class) | Sensitivity | Specificity | F1-Score | ROC-AUC | Accuracy | P-value^b |
|---|---|---|---|---|---|---|
| SE model | | | | | | |
| Internal testing set | | | | | | |
| Image-based model | 0.260 (0.143–0.380) | 0.980 (0.937–1.000) | 0.406 (0.255–0.543) | 0.825 (0.734–0.903) | 0.620 (0.520–0.720) | 0.97 |
| Clinical-data-based model | 0.840 (0.731–0.935) | 0.780 (0.660–0.894) | 0.816 (0.725–0.889) | 0.842 (0.751–0.915) | 0.810 (0.720–0.880) | 0.74 |
| Combined model | 0.300 (0.226–0.371) | 0.960 (0.927–0.987) | 0.448 (0.362–0.529) | 0.823 (0.728–0.896) | 0.630 (0.573–0.683) | Ref |
| External testing set | | | | | | |
| Image-based model | 0.378 (0.243–0.531) | 0.800 (0.696–0.896) | 0.466 (0.324–0.600) | 0.657 (0.550–0.754) | 0.610 (0.520–0.710) | 0.59 |
| Clinical-data-based model | 0.733 (0.585–0.860) | 0.618 (0.491–0.741) | 0.667 (0.543–0.769) | 0.707 (0.609–0.806) | 0.670 (0.580–0.760) | 0.89 |
| Combined model | 0.356 (0.227–0.500) | 0.782 (0.667–0.879) | 0.438 (0.281–0.572) | 0.697 (0.589–0.790) | 0.590 (0.490–0.690) | Ref |
| GMIC model | | | | | | |
| Internal testing set | | | | | | |
| Image-based model | 0.640 (0.500–0.767) | 0.760 (0.633–0.872) | 0.681 (0.571–0.784) | 0.833 (0.753–0.908) | 0.700 (0.610–0.790) | 0.57 |
| Clinical-data-based model | 0.860 (0.759–0.952) | 0.660 (0.522–0.791) | 0.782 (0.690–0.862) | 0.790 (0.680–0.873) | 0.760 (0.670–0.840) | 0.22 |
| Combined model | 0.776 (0.681–0.860) | 0.800 (0.692–0.907) | 0.776 (0.681–0.860) | 0.865 (0.785–0.934) | 0.780 (0.700–0.850) | Ref |
| External testing set | | | | | | |
| Image-based model | 0.644 (0.512–0.780) | 0.600 (0.468–0.736) | 0.604 (0.483–0.720) | 0.674 (0.568–0.777) | 0.620 (0.530–0.710) | 0.32 |
| Clinical-data-based model | 0.800 (0.685–0.913) | 0.564 (0.439–0.696) | 0.686 (0.571–0.777) | 0.705 (0.604–0.804) | 0.670 (0.570–0.770) | 0.58 |
| Combined model | 0.756 (0.625–0.878) | 0.618 (0.480–0.746) | 0.680 (0.562–0.781) | 0.746 (0.650–0.840) | 0.680 (0.590–0.770) | Ref |

CI, confidence interval; GMIC, Globally-aware Multiple Instance Classifier; Ref, reference model; ROC-AUC, area under the receiver operating characteristic curve; SE, stacking ensemble.

^a Calculated using the micro-averaged value of each severity class for the given model, using bootstrap resampling (N = 1000) of the test dataset.

^b The P-value from the binomial test measures the difference in performance between the combined model and the image- or clinical data-based model in terms of ROC-AUC.
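
The bootstrap procedure named in the footnote can be illustrated with a minimal sketch. This is not the authors' code: it assumes a percentile interval (the table does not state which interval method was used), works on binary labels for a single metric, and does not reproduce the micro-averaging across severity classes. The function names `sensitivity` and `bootstrap_ci` and the toy labels are hypothetical and chosen only for illustration.

```python
import random


def sensitivity(y_true, y_pred):
    """True-positive rate for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test cases with replacement.

    Assumption: percentile method; the source only says that bootstrap
    resampling with N = 1000 was applied to the test dataset.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:  # drop NaN from resamples with no positives
            stats.append(value)
    stats.sort()
    lower = stats[round((alpha / 2) * (len(stats) - 1))]
    upper = stats[round((1 - alpha / 2) * (len(stats) - 1))]
    return metric(y_true, y_pred), lower, upper


# Synthetic toy data (not from the study): 20 positives, 20 negatives.
y_true = [1] * 20 + [0] * 20
y_pred = [1] * 15 + [0] * 5 + [0] * 18 + [1] * 2

point, lower, upper = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"sensitivity = {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Passing a different metric function (for example specificity or accuracy) to `bootstrap_ci` would give the corresponding interval under the same assumptions.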