Table 1 Performance of hyperSMURF with different selection strategies for negatives.

From: Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants

| Neg. selection | imb.ratio | n.folds | AUPRC | AUROC |
|---|---|---|---|---|
| Mendelian data | | | | |
| ±100 Kb | 1:302 | 116 | 0.7071 | 0.9805 |
| ±500 Kb | 1:1432 | 116 | 0.6279 | 0.9802 |
| ±1000 Kb | 1:2765 | 111 | 0.6161 | 0.9786 |
| TAD | 1:1406 | 125 | 0.6123 | 0.9803 |
| GWAS data | | | | |
| ±100 Kb | 1:80 | 1402 | 0.6488 | 0.9840 |
| ±500 Kb | 1:277 | 723 | 0.4796 | 0.9841 |
| ±1000 Kb | 1:409 | 413 | 0.4213 | 0.9851 |
| TAD | 1:269 | 1196 | 0.4792 | 0.9838 |

  1. The first column represents the size of the “genomic window” used to select negatives around each positive or the “TAD-based” negative selection strategy; the second column reports the imbalance between positive and negative examples; the third the number of folds of the “topologically-aware” cross-validation, while the last two columns show the estimated AUPRC and AUROC.
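
As an illustration of how the quantities in the last four columns can be computed, the following is a minimal Python sketch. It is not the authors' evaluation code: the function names (`imbalance_ratio`, `auroc`, `average_precision`) and the toy labels and scores are hypothetical, and average precision is used here as one common estimator of the AUPRC, which may differ from the estimator used to produce the table.

```python
from typing import Sequence


def imbalance_ratio(labels: Sequence[int]) -> str:
    """Format the positive:negative ratio as '1:k', with k rounded to an integer."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("need at least one positive example")
    return f"1:{round(neg / pos)}"


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision at the rank of each positive.

    Tied scores are not treated specially; they keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical toy data: 2 positives and 6 negatives (not taken from the paper).
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

print(imbalance_ratio(toy_labels))
print(auroc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

On this toy input the ratio is 1:3, the AUROC is 11/12 and the average precision is 5/6. A single negative ranked above a positive costs the AUROC one pair out of twelve, but lowers the precision at that positive's rank from 1 to 2/3. With many more negatives per positive, this effect is one plausible reason the AUPRC column varies across rows while the AUROC column stays nearly constant.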