Table 2 Test set performance of SSPUL (GBE) and baseline models

From: Fair positive unlabeled learning for predicting undiagnosed Alzheimer’s disease in diverse electronic health records

| Race/ethnicity | Model | Sensitivity | Precision | Specificity | B. Accuracy | AUC | AUCPR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NH-white | Supervised (risk factors/MCC) | 0.45 (0.27, 0.57) | 0.3 (0.25, 0.37) | 0.8 (0.71, 0.92) | 0.63 (0.59, 0.65) | 0.69 (0.68, 0.71) | 0.3 (0.28, 0.33) |
| NH-white | Supervised (full/MCC) | 0.45 (0.39, 0.51) | 0.86 (0.79, 0.92) | 0.99 (0.98, 0.99) | 0.72 (0.69, 0.75) | 0.82 (0.80, 0.84) | 0.68 (0.65, 0.71) |
| NH-white | SSPUL (GBE) | 0.8 (0.71, 0.87) | 0.8 (0.71, 0.87) | 0.96 (0.95, 0.98) | 0.88 (0.83, 0.92) | 0.95 (0.94, 0.96) | 0.87 (0.80, 0.91) |
| NH-AfAm | Supervised (risk factors/MCC) | 0.53 (0.33, 0.69) | 0.32 (0.24, 0.41) | 0.76 (0.65, 0.89) | 0.65 (0.59, 0.71) | 0.71 (0.65, 0.76) | 0.34 (0.25, 0.43) |
| NH-AfAm | Supervised (full/MCC) | 0.51 (0.41, 0.63) | 0.83 (0.70, 0.94) | 0.98 (0.95, 0.99) | 0.75 (0.70, 0.80) | 0.82 (0.76, 0.87) | 0.7 (0.62, 0.78) |
| NH-AfAm | SSPUL (GBE) | 0.81 (0.64, 0.92) | 0.79 (0.63, 0.91) | 0.96 (0.91, 0.98) | 0.88 (0.79, 0.94) | 0.95 (0.92, 0.98) | 0.88 (0.79, 0.94) |
| HL | Supervised (risk factors/MCC) | 0.44 (0.26, 0.59) | 0.29 (0.23, 0.38) | 0.81 (0.72, 0.92) | 0.63 (0.58, 0.68) | 0.69 (0.65, 0.74) | 0.3 (0.23, 0.36) |
| HL | Supervised (full/MCC) | 0.52 (0.42, 0.62) | 0.77 (0.65, 0.87) | 0.97 (0.95, 0.99) | 0.74 (0.70, 0.79) | 0.82 (0.77, 0.86) | 0.68 (0.61, 0.74) |
| HL | SSPUL (GBE) | 0.77 (0.60, 0.89) | 0.77 (0.61, 0.90) | 0.96 (0.92, 0.98) | 0.87 (0.77, 0.93) | 0.95 (0.92, 0.97) | 0.84 (0.74, 0.92) |
| EA | Supervised (risk factors/MCC) | 0.42 (0.21, 0.57) | 0.32 (0.25, 0.43) | 0.82 (0.73, 0.94) | 0.62 (0.57, 0.67) | 0.69 (0.64, 0.73) | 0.33 (0.26, 0.40) |
| EA | Supervised (full/MCC) | 0.39 (0.29, 0.49) | 0.88 (0.78, 0.97) | 0.99 (0.98, 1) | 0.69 (0.65, 0.74) | 0.79 (0.74, 0.84) | 0.64 (0.57, 0.71) |
| EA | SSPUL (GBE) | 0.77 (0.67, 0.84) | 0.77 (0.69, 0.86) | 0.96 (0.94, 0.98) | 0.86 (0.81, 0.90) | 0.91 (0.87, 0.94) | 0.81 (0.73, 0.87) |

  1. Reported metrics are means over 1000 random test sets, with 95% CIs in parentheses. Cutoffs for the baseline supervised models were selected by maximizing the MCC on unlabeled data in the validation set using proxy labels. Cutoffs for SSPUL were selected by optimizing the GBE for each race/ethnicity in the validation set using positive and proxy labels.
  2. AUC area under the curve, AUCPR area under the precision recall curve, B. Accuracy balanced accuracy, EA East Asian, GBE group benefit equality, HL Hispanic Latino, MCC Matthews correlation coefficient, NH-AfAm non-Hispanic African American, NH-white non-Hispanic white, SSPUL semi-supervised positive unlabeled learning.
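
The procedure in footnote 1 for the baseline models (choose a cutoff by maximizing the MCC on validation data, then report a metric as a mean over resampled test sets with a 95% interval) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, and resampling with replacement with a percentile interval is an assumption, since the footnote does not say how the 1000 random test sets or the CIs were constructed. The GBE criterion used for SSPUL is not sketched because its definition is not given here. Balanced accuracy is the mean of sensitivity and specificity.

```python
import math
import random


def confusion(y_true, y_score, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = s >= cutoff
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def select_cutoff_by_mcc(y_true, y_score):
    """Return the observed score that maximizes MCC when used as the cutoff."""
    candidates = sorted(set(y_score))
    return max(candidates, key=lambda c: mcc(*confusion(y_true, y_score, c)))


def balanced_accuracy(y_true, y_score, cutoff):
    """Mean of sensitivity and specificity at the given cutoff."""
    tp, fp, tn, fn = confusion(y_true, y_score, cutoff)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (sens + spec) / 2


def resampled_mean_ci(y_true, y_score, cutoff, n_sets=1000, seed=0):
    """Mean and 95% percentile interval of balanced accuracy.

    ASSUMPTION: test sets are drawn with replacement (bootstrap) and the
    interval is the 2.5th to 97.5th percentile; the source does not specify.
    """
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_sets):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(
            balanced_accuracy(
                [y_true[i] for i in idx], [y_score[i] for i in idx], cutoff
            )
        )
    values.sort()
    mean = sum(values) / n_sets
    lo = values[int(0.025 * n_sets)]
    hi = values[int(0.975 * n_sets) - 1]
    return mean, (lo, hi)


# Toy validation data (made up for illustration only).
val_labels = [0, 0, 0, 1, 1, 1]
val_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
cutoff = select_cutoff_by_mcc(val_labels, val_scores)
mean_bacc, (ci_lo, ci_hi) = resampled_mean_ci(val_labels, val_scores, cutoff)
```

As a consistency check on the table itself, the NH-white "Supervised (full/MCC)" row has sensitivity 0.45 and specificity 0.99, whose mean is 0.72, matching the reported balanced accuracy.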