Extended Data Fig. 3: Extended analysis of deep neural net models. | Nature Genetics

From: The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation

(a) Schematic describing transfer learning. From left to right: first, models are trained on a large compendium of DNase-seq datasets from ENCODE and Roadmap; these weights are used to initialize training of a keratinocyte-specific classification model; finally, the classification model's weights are used to initialize training of a regression model. (b) Model performance metrics. Left: area under the precision-recall curve (AUPRC) for the ENCODE/Roadmap pre-training classification tasks across 10 folds. Right: AUPRC for accessibility in keratinocyte timepoints across 10 folds, comparing transfer learning with fresh initialization (randomly initialized weights). Box-and-whisker plots show all points; whiskers span minimum to maximum and boxes span the interquartile range (25th to 75th percentiles). (c) Precision-recall curves for the classification stage. Top: precision-recall for prediction of accessible peaks. Bottom: precision-recall for prediction of strong enhancer state (presence of ATAC-seq, H3K27ac ChIP–seq and H3K4me1 ChIP–seq signal). (d) Heatmaps of observed ATAC signal versus neural-net-predicted ATAC signal across dynamically accessible regions. (e) Validation of contribution scores by comparison to SNPs exhibiting significant allelic imbalance of ATAC-seq signal. Top: comparison of effect sizes of allelic imbalance of ATAC-seq signal between SNPs overlapping nonsignificant contribution scores and those overlapping significant contribution scores. Bottom: comparison of model-derived allelic effect predictions (reference allele minus alternate allele) for SNPs overlapping significant contribution scores, separated by whether the SNP was considered allele-sensitive (FDR < 0.10) or not allele-sensitive. Box-and-whisker plots show all points; whiskers span minimum to maximum and boxes span the interquartile range (25th to 75th percentiles). (f) Comparison of neural-network-derived predictive motifs versus enriched motifs derived by HOMER motif discovery. (g) Predictive, active motif instances of KLF4 show higher ChIP–seq signal relative to inactive motif instances in CREs. (h) Evaluation of motif instances identified by sequence-only position weight matrix motif match scores against those identified by contribution-weighted sequence motif match scores. (i) Predictive motifs show dynamic footprinting. The DLX3 motif is shown. (j) Heatmap showing predictive motifs enriched in CREs corresponding to ATAC-seq trajectories. (k) Heatmap showing TFs whose expression was correlated (r > 0.8) with the activity of their matched predictive motifs in CREs corresponding to ATAC-seq trajectories.
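
For readers unfamiliar with the AUPRC metric reported in panels (b) and (c), the following is a minimal sketch of one common estimator, average precision. The legend does not state how AUPRC was computed for the figure, so the function name `average_precision`, the toy labels and the toy scores are illustrative assumptions, not the authors' implementation.

```python
def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of AUPRC.

    labels: 0/1 ground truth (e.g. 1 = region is an accessible peak).
    scores: predicted probabilities or scores, higher = more confident.
    Assumption: scores contain no ties; tied scores would need grouping.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1 / n_pos.
            ap += (true_pos / rank) / n_pos
    return ap


# Hypothetical toy data: four regions, two of which are truly accessible.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

Unlike the area under the ROC curve, this metric ignores true negatives, which makes it informative when accessible regions are a small fraction of the genome. A random classifier scores close to the fraction of positive examples rather than 0.5.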
