Extended Data Fig. 1: Performance of pgBoost vs. a logistic regression model. | Nature Genetics

From: Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance

Average enrichment across recall values of links predicted by pgBoost vs. a logistic regression model with identical features for (a) 4,434 fine-mapped eSNP-eGene pairs attaining maximum PIP > 0.5 across GTEx tissues, (b) 53,701 SNP-gene pairs attaining maximum ABC score > 0.2 across 344 biosamples, (c) 892 links validated by CRISPR, and (d) 155 non-coding SNP-gene pairs derived from fine-mapped GWAS variants with a unique fine-mapped coding variant within a 2 Mb window, at various distance thresholds. The number of positive evaluation links at each distance threshold is specified in parentheses. Confidence intervals denote standard errors. Stars denote 2-sided bootstrap p-values for difference (*: p < 0.05, **: p < 0.01, ***: p < 0.001) of top method vs. each other method (Methods). Gradient boosting significantly outperformed logistic regression on eQTL evaluation data and CRISPR evaluation data >50 kb. Gradient boosting did not significantly outperform logistic regression on GWAS evaluation data. While gradient boosting significantly outperformed logistic regression on all ABC evaluation data (>1 kb and >5 kb), gradient boosting significantly underperformed logistic regression on longer-range ABC evaluation data (>50 kb and >100 kb); this could be due to the gradient boosting model assigning higher weight to distance-based features than the logistic regression model (however, we do not observe this behavior in other evaluation sets).
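
The caption defers the exact definitions of enrichment and of the bootstrap test to Methods, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that enrichment at a given recall is the precision among the top-ranked candidate links divided by the overall fraction of positive links, that the average is taken over a small grid of recall values, and that the bootstrap resamples candidate links with replacement. The function names, the recall grid, and the resampling unit are all hypothetical.

```python
import numpy as np


def average_enrichment(scores, labels, recalls=(0.25, 0.5, 0.75)):
    """Mean over recall values of (precision at that recall) / (base positive rate).

    Assumed definition for illustration; the recall grid is hypothetical.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    if n_pos == 0:
        return float("nan")  # enrichment is undefined without positives
    order = np.argsort(-scores, kind="stable")  # rank links by descending score
    hits = np.cumsum(labels[order])             # positives recovered in the top k
    base_rate = n_pos / labels.size
    values = []
    for r in recalls:
        needed = int(np.ceil(r * n_pos))             # positives needed for recall r
        k = int(np.searchsorted(hits, needed)) + 1   # smallest k reaching that recall
        values.append((hits[k - 1] / k) / base_rate)
    return float(np.mean(values))


def bootstrap_difference(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """Observed difference in average enrichment (A minus B), its bootstrap
    standard error, and a two-sided bootstrap p-value.

    Assumption: candidate links are resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    observed = average_enrichment(scores_a, labels) - average_enrichment(scores_b, labels)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, labels.size, labels.size)
        diffs[b] = (average_enrichment(scores_a[idx], labels[idx])
                    - average_enrichment(scores_b[idx], labels[idx]))
    diffs = diffs[~np.isnan(diffs)]  # drop resamples that contain no positives
    se = float(diffs.std(ddof=1))
    # Two-sided p-value: twice the smaller fraction of resampled differences
    # falling on either side of zero, capped at 1.
    p = min(1.0, 2 * min((diffs <= 0).mean(), (diffs >= 0).mean()))
    return observed, se, p
```

Here `scores_a` and `scores_b` would hold the two methods' scores for the same candidate SNP-gene links and `labels` would mark the positive evaluation links. Restricting all three arrays to links beyond a given distance before calling `bootstrap_difference` would mimic the per-threshold comparison shown in each panel.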
