Fig. 1
From: Genome-wide prediction of DNase I hypersensitivity using gene expression

Concepts of BIRD. a Outline of the study. ENCODE DNase-seq and exon array data are used to train BIRD. Users can apply BIRD to new or existing gene expression samples to predict DNase I hypersensitivity (DH). The predicted DH can be used to predict TFBSs and differential DHSs, convert expression samples in GEO into a regulome database (PDDB), and improve DNase-seq and ChIP-seq data analyses. b Overview of BIRD. Instead of using individual genes as predictors, BIRD groups co-expressed genes into clusters (i.e., gene clusters) and uses the clusters’ mean expression levels as predictors. BIRD aggregates two types of models. The locus-level model \(\mathrm{BIR}(\bar{\mathrm{X}}, \mathrm{Y})\) predicts the DH level at each genomic locus. The pathway-level model \(\mathrm{BIR}(\bar{\mathrm{X}}, \bar{\mathrm{Y}})\) further groups correlated loci (i.e., loci with co-varying DH) into different levels of clusters (i.e., DHS pathways) and predicts the DH level for each pathway. Finally, BIRD predicts DH at each locus by combining the locus-level and pathway-level predictions via model averaging.
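
The two-model structure described in panel b can be illustrated with a minimal sketch. This is not the BIRD implementation: it assumes plain k-means for both the gene clusters and the DHS pathways, ordinary least squares for both regressions, a single level of pathway clustering, and equal-weight averaging. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def kmeans(rows, k, n_iter=20, seed=0):
    """Plain k-means on the rows of `rows`; returns one integer label per row."""
    rows = np.asarray(rows, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rows[rng.choice(len(rows), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = rows[labels == j].mean(axis=0)
    return labels

def cluster_means(data, labels):
    """Average rows of `data` within each non-empty cluster (clusters x samples)."""
    return np.vstack([data[labels == j].mean(axis=0) for j in np.unique(labels)])

def fit_linear(X, Y):
    """Least-squares coefficients (with intercept) for Y ~ X; rows are samples."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def bird_like_predict(expr_train, dh_train, expr_new,
                      n_gene_clusters=3, n_pathways=2):
    """expr_*: genes x samples; dh_train: loci x samples. Returns loci x new samples."""
    # Predictors: mean expression of each co-expressed gene cluster (X-bar).
    gene_labels = kmeans(expr_train, n_gene_clusters)
    xbar_train = cluster_means(expr_train, gene_labels).T
    xbar_new = cluster_means(expr_new, gene_labels).T

    # Locus-level model: regress each locus's DH on the cluster means.
    locus_pred = predict_linear(fit_linear(xbar_train, dh_train.T), xbar_new).T

    # Pathway-level model: cluster co-varying loci, regress the cluster mean DH.
    locus_labels = kmeans(dh_train, n_pathways)
    ybar_train = cluster_means(dh_train, locus_labels).T
    path_pred = predict_linear(fit_linear(xbar_train, ybar_train), xbar_new).T

    # Give every locus its pathway's prediction, then average the two models
    # (equal weights are an assumption of this sketch).
    _, pathway_index = np.unique(locus_labels, return_inverse=True)
    return 0.5 * (locus_pred + path_pred[pathway_index])

# Synthetic example: 30 genes, 8 loci, 12 training samples, 4 new samples.
rng = np.random.default_rng(1)
expr_train = rng.normal(size=(30, 12))
dh_train = rng.normal(size=(8, 12))
expr_new = rng.normal(size=(30, 4))
pred = bird_like_predict(expr_train, dh_train, expr_new)
```

Here `pred` has one row per locus and one column per new sample. Averaging the noisier but locus-specific prediction with the smoother pathway-level prediction is the design idea the caption describes; how the weights and the cluster levels are chosen in practice is not specified in this caption.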