Table 2 Performance of the XL model on 3 datasets

From: Scaling convolutional neural networks achieves expert level seizure detection in neonatal EEG

 

| Metric | Test set: Cork (n = 41), per-channel | Validation set: Cork (n = 51), global channel | Validation set: Helsinki (n = 79), global channel |
| --- | --- | --- | --- |
| AUC | 0.978 | 0.996 | 0.982 |
| AP / AP50 | 0.694 / 0.533 | 0.833 / 0.701 | 0.891 / 0.794 |
| Pearson’s r | 0.723 | 0.766 | 0.824 |
| MCC | 0.648 | 0.739 | 0.764 |
| Cohen’s κ | 0.630 | 0.726 | 0.761 |
| Sensitivity / Specificity (%) | 51.5 / 99.9 | 88.9 / 99.2 | 72.9 / 98.5 |
| PPV / NPV (%) | 82.0 / 99.6 | 62.0 / 99.8 | 84.9 / 96.8 |
| FD/h | 0.053 | 0.363 | 0.459 |
| Seizure Burden, r | 0.902 | 0.667 | 0.739 |

  1. Testing results are from 20% of the development dataset. Validation results are from held-out datasets from Cork and Helsinki (described in Table 6). Performance is assessed per-channel on the test dataset and globally (across all channels) on the validation datasets. All metrics are calculated by concatenating all EEG recordings. Metrics for the held-out (validation) multi-annotator sets are based on unanimous consensus annotations.
  2. Key: AUC, area under the receiver operating characteristic curve; AP, average precision; AP50, average precision with recall > 50%; MCC, Matthews correlation coefficient; PPV, positive predictive value; NPV, negative predictive value; FD/h, false detections per hour; r represents correlation; κ represents kappa.
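
For readers who want to relate the thresholded metrics in the key to one another, the following is a minimal sketch of how sensitivity, specificity, PPV, NPV, MCC and Cohen's κ follow from a binary confusion matrix using their standard textbook definitions. It is not the authors' code. The function names `pool_counts` and `binary_metrics` and the counts used are hypothetical, and summing per-recording counts before computing metrics is only an assumed reading of "concatenating all EEG recordings".

```python
import math


def pool_counts(per_recording):
    """Sum (tp, fp, tn, fn) tuples over recordings.

    Assumption: pooling counts like this stands in for scoring one
    concatenated recording; the source does not give the exact procedure.
    """
    tp = sum(c[0] for c in per_recording)
    fp = sum(c[1] for c in per_recording)
    tn = sum(c[2] for c in per_recording)
    fn = sum(c[3] for c in per_recording)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary detector."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value

    # Matthews correlation coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den

    # Cohen's kappa: observed agreement corrected for chance agreement
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical per-recording counts (tp, fp, tn, fn), for illustration only
example_counts = [(50, 5, 400, 10), (30, 15, 480, 10)]
metrics = binary_metrics(*pool_counts(example_counts))
```

The sketch leaves out AUC, AP and FD/h: they depend on the continuous model output and on an event-level definition of a false detection, neither of which is specified in this table.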