Table 1 Phenotype classification performance for pulmonary hypertension

From: A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies

| Metric | Count | KOMAP | XGBoost | Transformer (silver = gold) | Transformer (gold only) | WEST (w/o neg) | WEST (w/ neg) |
|---|---|---|---|---|---|---|---|
| AUC | 0.85 (0.77–0.91) | 0.86 (0.79–0.92) | 0.82 (0.72–0.91) | 0.84 (0.71–0.89) | 0.88 (0.79–0.94) | 0.91 (0.85–0.95) | **0.93 (0.87–0.97)** |
| F1 score | 0.79 (0.72–0.85) | 0.84 (0.77–0.90) | 0.85 (0.79–0.91) | 0.82 (0.72–0.87) | 0.85 (0.79–0.92) | 0.86 (0.77–0.90) | **0.88 (0.80–0.92)** |
| PPV | 0.65 (0.57–0.74) | 0.88 (0.79–0.95) | 0.87 (0.78–0.94) | 0.84 (0.71–0.89) | 0.89 (0.81–0.96) | 0.91 (0.75–0.93) | **0.95 (0.83–0.97)** |
| Specificity | 0 (0–0) | 0.78 (0.63–0.90) | 0.76 (0.59–0.89) | 0.70 (0.49–0.79) | 0.81 (0.68–0.93) | 0.84 (0.58–0.86) | **0.92 (0.71–0.95)** |

  1. WEST trained with both positive and negative gold-standard labels, denoted WEST (w/ neg), achieved the highest AUC, F1 score, PPV, and specificity of all methods. The Transformer (silver = gold) baseline was trained by treating all silver-standard labels as gold-standard (i.e., without iterative label updates or augmentation), while Transformer (gold only) used only expert-validated labels. Transformer metrics were averaged across two cross-validation folds, and all metrics are reported with 95% confidence intervals estimated by bootstrap resampling of patient-level predictions. Bold values denote the best performance per metric.
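The metrics in the table (AUC, F1 score, PPV, specificity) and the bootstrapped 95% intervals over patient-level predictions can be illustrated with the minimal sketch below. It is not the authors' evaluation code. The 0.5 decision threshold, the percentile bootstrap with 1,000 resamples and a fixed seed, and the helper names (`auc`, `confusion`, `ppv`, `specificity`, `f1`, `bootstrap_ci`) are all illustrative assumptions, because the table does not state how scores were thresholded or which bootstrap variant was used. The toy labels in the usage example are invented and are not study data.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A metric maps (labels, scores) to a single number.
Metric = Callable[[Sequence[int], Sequence[float]], float]


def _ratio(num: float, den: float) -> float:
    # Undefined metrics (e.g. PPV with no predicted positives) become NaN.
    return num / den if den else float("nan")


def auc(y: Sequence[int], s: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def confusion(y, s, threshold=0.5) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), calling a patient positive when score >= threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for yi, si in zip(y, s):
        predicted_positive = si >= threshold
        if predicted_positive and yi == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif yi == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ppv(y, s, threshold=0.5):
    tp, fp, _, _ = confusion(y, s, threshold)
    return _ratio(tp, tp + fp)


def specificity(y, s, threshold=0.5):
    _, fp, tn, _ = confusion(y, s, threshold)
    return _ratio(tn, tn + fp)


def f1(y, s, threshold=0.5):
    tp, fp, _, fn = confusion(y, s, threshold)
    return _ratio(2 * tp, 2 * tp + fp + fn)


def bootstrap_ci(y, s, metric: Metric, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap over patients."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    for _ in range(n_boot):
        # Resample whole patients with replacement, keeping label and score paired.
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y[i] for i in idx], [s[i] for i in idx])
        if not math.isnan(value):  # skip resamples where the metric is undefined
            stats.append(value)
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return metric(y, s), lo, hi


if __name__ == "__main__":
    # Toy labels and scores, purely illustrative (not data from the study).
    y = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
    s = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95, 0.1, 0.55]
    for name, fn in [("AUC", auc), ("F1", f1), ("PPV", ppv), ("Specificity", specificity)]:
        est, lo, hi = bootstrap_ci(y, s, fn)
        print(f"{name}: {est:.2f} ({lo:.2f}-{hi:.2f})")
```

Resampling patients rather than individual records keeps each patient's label and score together, which matches the footnote's description of bootstrapping on patient-level predictions. A classifier that labels every patient positive gets a specificity of 0 in every resample, so its interval collapses to (0–0).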