Fig. 1
From: Ranking of non-coding pathogenic variants and putative essential regions of the human genome

Ensemble learning for the prediction of deleterious variants in the non-coding genome. Performance, measured as ROC-AUC (a) and PR-AUC (b), on the test set (N = 136 non-coding pathogenic and N = 2017 control variants) of a model trained only with published deleteriousness metrics (blue), only with new features, namely essentiality, 3D genome organization and gene expression features (orange), and with both new features and published metrics (green, ncER). The importance of each input feature in the ncER model is shown in c. Blue, published scores; green, new essentiality features; red, new 3D chromatin structure features; orange, new regulatory/functional screen features. Panel d shows the distribution of ncER percentiles for an independent set of 137 curated non-coding pathogenic Mendelian variants compared with a set of singletons from gnomAD matched by genomic element and distance to splice sites. Both dominant (N = 85) and recessive (N = 52) non-coding pathogenic variants are significantly enriched in high ncER percentiles. P values were computed with Fisher's exact test. ROC receiver operating characteristic, PR precision-recall, AUC area under the curve.
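The two metrics in panels a and b can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `roc_auc` and `average_precision` and the toy labels and scores are hypothetical, and average precision is used here as one common step-wise estimate of PR-AUC. Only the class counts (136 pathogenic, 2017 control) come from the caption. They give the PR-AUC expected from a random classifier, which is roughly the fraction of positives and which explains why PR-AUC is informative for such an imbalanced test set.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive.

    Assumes no tied scores; ties would need an explicit tie-handling rule.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Synthetic toy data (NOT from the paper): 1 = pathogenic, 0 = control.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

toy_roc_auc = roc_auc(toy_labels, toy_scores)
toy_pr_auc = average_precision(toy_labels, toy_scores)

# Class counts taken from the caption; the ratio is the PR-AUC baseline
# expected from a classifier that ranks variants at random.
n_pathogenic, n_control = 136, 2017
random_pr_baseline = n_pathogenic / (n_pathogenic + n_control)

print(f"toy ROC-AUC: {toy_roc_auc:.3f}")
print(f"toy PR-AUC (average precision): {toy_pr_auc:.3f}")
print(f"random PR-AUC baseline for the test set: {random_pr_baseline:.3f}")
```

A random ranking gives a ROC-AUC near 0.5 whatever the class balance, whereas its PR-AUC falls to about the positive fraction (about 0.063 here), so the two panels answer different questions about the same models.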