Fig. 1: Central concepts behind ChromatinHD-pred and ChromatinHD-diff. | Nature Communications


From: ChromatinHD connects single-cell DNA accessibility and conformation to gene expression through scale-adaptive machine learning


a ChromatinHD-pred feeds raw fragments into a neural network architecture that (1) transforms the positions of each fragment close to a TSS (e.g. -10 kb or -100 kb) into a positional encoding, (2) transforms this positional encoding into a fragment embedding, typically with a smaller number of features, using one or more non-linear neural network layers, and (3) pools the fragment information for each cell and gene. b ChromatinHD-diff uses cell type/state annotations derived from, for example, single-cell RNA-seq to construct a complex multi-resolution cell type/state-specific probability distribution. To do this, we apply several bijective transforms to the cumulative distribution function (CDF), so that the likelihood of observing a particular cut site can ultimately be estimated from the probability density function (PDF). c Three nested regions exemplifying how ChromatinHD models capture predictive and differential accessibility at different scales. Raw data for the same regions are presented in Supplementary Fig. 1. Red and blue Δcor represent regions that are positively and negatively associated with gene expression, respectively. d Summarized relative performance for various tasks: accuracy of prediction (pred.), correlation between predictivity and CRISPRi sensitivity (CRISPRi), enrichment for transcription factor binding sites (TFBSs), enrichment for eQTLs (eQTL), enrichment for genome-wide association study variants (GWAS), and an average of the relative performance across tasks (all). Only methods that were second-best performing for any of the tasks are shown. Full details for each task are shown in Figs. 2 and 3. e The average of the relative performance against the top-performing method across all tasks (from d) for individual datasets. Source data are provided as a Source Data file.
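
The three steps in panel a can be illustrated with a minimal NumPy sketch. This is not the ChromatinHD implementation: the sine/cosine encoding, the ReLU layer, the sum pooling, the layer sizes, the random weights, and all function names below are assumptions chosen only to make the data flow concrete.

```python
import numpy as np

rng = np.random.default_rng(0)


def positional_encoding(positions, n_frequencies=4, window=100_000):
    """Step 1: encode the two cut-site positions of each fragment (bp relative to the TSS).

    Assumption: a sine/cosine encoding at a few frequencies; the caption does not
    specify which encoding is used.
    """
    scaled = positions[..., None] / window              # (n_fragments, 2, 1)
    freqs = np.pi * 2.0 ** np.arange(n_frequencies)     # (n_frequencies,)
    angles = scaled * freqs                             # (n_fragments, 2, n_frequencies)
    encoded = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return encoded.reshape(len(positions), -1)          # (n_fragments, 4 * n_frequencies)


def embed_fragments(encoding, w1, b1, w2, b2):
    """Step 2: map the encoding to a smaller fragment embedding with one non-linear layer."""
    hidden = np.maximum(encoding @ w1 + b1, 0.0)        # ReLU (assumed non-linearity)
    return hidden @ w2 + b2


def pool_per_cell_gene(embedding, cell_gene_ix, n_cell_gene):
    """Step 3: sum the fragment embeddings that belong to the same (cell, gene) pair."""
    pooled = np.zeros((n_cell_gene, embedding.shape[1]))
    np.add.at(pooled, cell_gene_ix, embedding)          # unbuffered scatter-add
    return pooled


# Toy input: three fragments given as (left cut site, right cut site) relative to a TSS.
positions = np.array(
    [[-10_000.0, -9_850.0], [-100_000.0, -99_700.0], [500.0, 620.0]]
)
cell_gene_ix = np.array([0, 0, 1])  # hypothetical fragment -> (cell, gene) index
n_cell_gene = 2

encoding = positional_encoding(positions)               # shape (3, 16)
w1, b1 = rng.normal(size=(16, 8)), np.zeros(8)          # random, untrained weights
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
embedding = embed_fragments(encoding, w1, b1, w2, b2)   # shape (3, 3)
pooled = pool_per_cell_gene(embedding, cell_gene_ix, n_cell_gene)  # shape (2, 3)
```

In this sketch the embedding (3 features) is smaller than the encoding (16 features), mirroring the caption's remark that the fragment embedding typically has fewer features. A trained model would learn the weights and pass the pooled vector on to a predictor of gene expression.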
