Fig. 4 | Nature Communications

Fig. 4

From: Genome-wide prediction of DNase I hypersensitivity using gene expression

Predicting transcription factor binding sites. a Sensitivity–FDR curve for predicting ELF1 binding sites in GM12878 using four different methods: true DNase-seq data (“True”), BIRD, mean DH profile of training cell types (“Mean”), and motif mapping score (“Motif”). For BIRD, “BIRD(UW)”, “BIRD(Duke)”, and “BIRD(Chicago)” denote predictions made using exon arrays generated by three different labs. For each method, the sensitivity–FDR curve shows the percentage of gold standard TFBSs that were discovered by the predicted binding sites at different FDR levels. The total number of gold standard TFBSs is shown in brackets on the y axis. b ROC curve for predicting ELF1 binding sites in GM12878 using different methods. c The number of DHSs predicted to be ELF1 binding sites in GM12878 by different methods at different FDR levels. d Area under the sensitivity–FDR curve for predicting TFBSs of nine TFs in GM12878 using “True”, “BIRD” (based on BIRD(UW)), “Mean”, and “Motif”. e Area under the ROC curve for predicting TFBSs of nine TFs in GM12878 using different methods. f Number of predicted TFBSs at 50% FDR for predicting TFBSs of nine TFs in GM12878 using different methods. g, h Two examples showing the true ELF1 ChIP-seq signal (read count, black) in GM12878, the true DNase-seq signal (gray), the BIRD-predicted DH signal (blue), and the “Mean” DH signal in training cell types (red). Locations of ELF1 motif sites are shown at the bottom. BIRD captures the true signal more accurately than “Mean” (highlighted with red boxes). It also predicts TFBSs better than the motif-only approach, as many motif sites are not bound and have no DH signal at all.
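As a rough illustration of how a sensitivity–FDR curve like the one in panel a could be computed, the sketch below ranks candidate sites by a prediction score and reports, at each cutoff, the empirical FDR and the fraction of gold standard sites recovered. The function names, the toy scores, and the use of an empirical FDR (false calls divided by total calls, against known labels) are assumptions made for illustration; the caption does not state how the authors estimated FDR, and tied scores are not handled specially here.

```python
import numpy as np

def sensitivity_fdr_curve(scores, is_gold_standard):
    """Return (fdr, sensitivity) arrays, one entry per score cutoff.

    Hypothetical helper: sites are ranked by descending score, and the
    top k sites are treated as the predicted binding sites for k = 1..n.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_gold_standard, dtype=bool)
    if labels.sum() == 0:
        raise ValueError("need at least one gold standard site")

    order = np.argsort(-scores, kind="stable")   # highest score first
    ranked = labels[order]

    true_pos = np.cumsum(ranked)                 # gold standard sites among top k
    n_called = np.arange(1, len(ranked) + 1)     # k

    fdr = (n_called - true_pos) / n_called       # empirical FDR at each cutoff
    sensitivity = true_pos / labels.sum()        # fraction of gold standard found
    return fdr, sensitivity

def sensitivity_at_fdr(fdr, sensitivity, max_fdr):
    """Best sensitivity among cutoffs whose empirical FDR is <= max_fdr."""
    ok = fdr <= max_fdr
    return float(sensitivity[ok].max()) if ok.any() else 0.0

# Toy data (made up): five candidate sites, three of them in the gold standard.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 1, 0, 1, 0]
toy_fdr, toy_sens = sensitivity_fdr_curve(toy_scores, toy_labels)
print(sensitivity_at_fdr(toy_fdr, toy_sens, max_fdr=0.25))
```

Plotting `toy_sens` against `toy_fdr` gives the curve; summaries such as the area under it (panel d) or the number of calls at a fixed FDR (panels c and f) can be read off the same arrays.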
