Fig. 4
From: In silico prediction of high-resolution Hi-C interaction matrices

Analysis of features important for predicting Hi-C contact counts. a Top 20 MULTI-CELL features ranked by Out of Bag (OOB) feature importance on chromosome 17 in all five cell lines. Each horizontal bar corresponds to one feature. The feature name includes the name of the histone mark, DNase I or TF, whether it is on one of the interacting regions (R1, R2) or in the intervening window (W), and the cell line from which the feature was extracted. b Top 20 features ranked by the number of times a feature is used for test set predictions. Feature rankings are for chromosome 17 in all five cell lines. c Non-negative matrix factorization (NMF) of the region-pair by feature-pair matrix for Gm12878 chromosome 17. The \({\bf{U}}\) and \({\bf{V}}\) factors are the NMF factors that provide the membership of region pairs or feature pairs in a cluster (white lines demarcate the region-pair and feature-pair clusters). The factorized feature count matrix is shown below the \({\bf{V}}\) factors and to the right of the \({\bf{U}}\) factors. The heatmap on the right shows the features associated with each of the pairs, with rows corresponding to a pair of regions and columns corresponding to the feature values, grouped by the cell line from which they were obtained. At the bottom are Cytoscape network representations of important pairs of features. The node size is proportional to the number of times the specific feature co-occurs on a path in the regression tree. The thickness of each line is proportional to the number of times the pair of features is used on the path from the root to the leaf for a test example pair. The font size of each node label is proportional to the node size.
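The path-based counts behind panel b and the network in panel c can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, its thresholds, the feature names, and the helpers `features_on_path` and `count_usage` are all hypothetical, and it assumes each feature is counted at most once per root-to-leaf path.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy regression tree (not from the paper). Internal nodes are
# (feature_name, threshold, left_subtree, right_subtree); leaves are floats.
TOY_TREE = (
    "CTCF_R1", 0.5,
    ("H3K27ac_W", 1.0, 0.2, 0.9),
    ("DNase_R2", 0.3, 1.5, ("CTCF_R2", 0.7, 2.0, 3.1)),
)


def features_on_path(tree, example):
    """Return the features tested on the root-to-leaf path for one example."""
    used = []
    node = tree
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        used.append(feature)
        node = left if example[feature] <= threshold else right
    return used


def count_usage(trees, examples):
    """Count single-feature and feature-pair usage over all test paths."""
    single = Counter()
    pairs = Counter()
    for example in examples:
        for tree in trees:
            # Assumption: a feature counts once per path, even if reused.
            path = sorted(set(features_on_path(tree, example)))
            single.update(path)
            pairs.update(combinations(path, 2))
    return single, pairs


# Two made-up test examples (region pairs) with made-up feature values.
examples = [
    {"CTCF_R1": 0.2, "H3K27ac_W": 2.0, "DNase_R2": 0.0, "CTCF_R2": 0.0},
    {"CTCF_R1": 0.9, "H3K27ac_W": 0.0, "DNase_R2": 0.8, "CTCF_R2": 0.9},
]
single_counts, pair_counts = count_usage([TOY_TREE], examples)
print(single_counts.most_common())
print(pair_counts.most_common())
```

In this sketch, `single_counts` corresponds to the kind of ranking shown in panel b, and `pair_counts` corresponds to the edge weights of a feature-pair network such as the one at the bottom of panel c.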