Fig. 2: Overview of model design. | Nature Communications

From: Predicting proximal tubule failed repair drivers through regularized regression analysis of single cell multiomic sequencing

The clustered multiome dataset contains chromatin accessibility and gene expression profiles for each nucleus. In the first step, the model learns to predict a target gene's expression from the accessibility of peaks within 500 kbp of the gene's transcription start site (TSS). This step identifies cis-regulatory elements (CREs) as peaks whose accessibility changes correlate with target gene expression. The second step annotates peaks with potentially binding transcription factors (TFs) by scanning for TF motifs. TFs with predicted motifs in predicted CREs are aggregated as putative regulatory TFs for a target gene. The third step is a repeated training step in which the model learns to predict gene expression from the expression of the TFs selected in the second step. This step identifies putative regulatory TFs based on the correlation between target gene and TF expression in the multiome dataset. Both learning steps use an adaptive elastic-net regression model.
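The three steps can be sketched for a single target gene as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the penalty settings (`lam`, `alpha`, `gamma`), the use of a ridge fit to derive the adaptive weights, and the simulated toy data are all assumptions made for the example. Motif scanning is represented by a precomputed boolean peak-by-TF matrix rather than by an actual scan.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def adaptive_elastic_net(X, y, lam=0.05, alpha=0.5, gamma=1.0, n_iter=200):
    """Coordinate descent for an elastic net with per-feature L1 weights.

    Assumption: the adaptive weights come from an initial ridge fit,
    w_j = 1 / |beta_ridge_j|**gamma. Features are standardized internally.
    """
    n, p = X.shape
    if p == 0:
        return np.zeros(0)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    y = y - y.mean()
    beta_init = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
    w = 1.0 / (np.abs(beta_init) + 1e-6) ** gamma
    col_sq = (X ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha * w[j]) / (
                col_sq[j] + lam * (1.0 - alpha))
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def three_step_sketch(peak_acc, peak_pos, tss, gene_expr, tf_expr, motif_hits,
                      window=500_000):
    # Step 1: regress gene expression on peaks within `window` bp of the TSS;
    # peaks with nonzero coefficients are treated as predicted CREs.
    window_idx = np.flatnonzero(np.abs(peak_pos - tss) <= window)
    cre_coef = adaptive_elastic_net(peak_acc[:, window_idx], gene_expr)
    cre_idx = window_idx[cre_coef != 0]
    # Step 2: collect TFs with a motif hit in at least one predicted CRE.
    candidate_tfs = np.flatnonzero(motif_hits[cre_idx].any(axis=0))
    # Step 3: regress gene expression on the expression of the candidate TFs.
    tf_coef = adaptive_elastic_net(tf_expr[:, candidate_tfs], gene_expr)
    regulators = candidate_tfs[tf_coef != 0]
    return cre_idx, candidate_tfs, regulators

# Simulated toy data (hypothetical): 300 nuclei, 6 peaks, 4 TFs.
rng = np.random.default_rng(0)
n = 300
tf_expr = rng.normal(size=(n, 4))
peak_acc = rng.normal(size=(n, 6))
peak_acc[:, 0] += 2.0 * tf_expr[:, 0]           # TF 0 drives accessibility of peak 0
gene_expr = 2.0 * peak_acc[:, 0] + 0.1 * rng.normal(size=n)
tss = 1_000_000
peak_pos = np.array([1_050_000, 900_000, 1_200_000, 700_000, 1_400_000, 3_000_000])
motif_hits = np.zeros((6, 4), dtype=bool)       # rows: peaks, columns: TFs
motif_hits[0, [0, 1]] = True
motif_hits[3, 2] = True

cre_idx, candidate_tfs, regulators = three_step_sketch(
    peak_acc, peak_pos, tss, gene_expr, tf_expr, motif_hits)
print("predicted CREs:", cre_idx)
print("candidate TFs:", candidate_tfs)
print("putative regulators:", regulators)
```

In this toy setup, peak 5 lies 2 Mbp from the TSS and is therefore never considered in step 1, and only TFs with a motif in a selected peak reach the regression in step 3. The repeated training mentioned in the caption is not reproduced here; the sketch runs each regression once.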
