Fig. 1: Overview of the EPInformer framework for gene expression prediction by integrating multimodal promoter–enhancer data.

a EPInformer is trained on multimodal epigenomic data and promoter–enhancer sequences to predict CAGE or RNA-seq expression in specific cell types. The sequence encoder, EPInformer-seq, first derives promoter and candidate enhancer embeddings using residual and dilated convolutions; it can be pre-trained on cell-type-specific enhancer signals to initialize the convolutional filters. The fusion layer optionally merges the sequence embeddings with distance, chromatin contacts, or epigenomic signals (e.g., H3K27ac and DNase). The interaction encoder applies a series of transformer encoders with multi-head attention to capture promoter–enhancer interactions. Finally, the prediction module integrates the resulting embeddings with mRNA half-life features and the promoter signal through fully connected layers to predict gene expression. EPInformer supports multiple tasks: b predicting gene expression from promoter and enhancer sequences with multimodal epigenomic signals; c prioritizing enhancers that may drive expression using the attention module of the interaction encoder, with scores computed by averaging attention weights across attention heads and layers; and d identifying regulatory sequence features and transcription factor binding motifs at enhancers pinpointed by attention score for the target gene, using the sequence encoder with downstream interpretation tools (e.g., TF-MoDISco-lite [36] and Tangermeme [37]).
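The enhancer scoring described for panel c can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `enhancer_attention_scores`, the nested-list layout `[layer][head][query][key]`, and the convention that the promoter is token 0 and acts as the query are all assumptions made for illustration.

```python
def enhancer_attention_scores(attn, promoter_index=0):
    """Average promoter-to-enhancer attention over all layers and heads.

    attn is a nested list indexed as [layer][head][query][key]
    (an assumed layout). Returns one score per enhancer token,
    in token order, with the promoter's own entry left out.
    """
    n_layers = len(attn)
    n_heads = len(attn[0])
    n_tokens = len(attn[0][0])

    totals = [0.0] * n_tokens
    for layer in attn:
        for head in layer:
            # Row of weights the promoter (as query) assigns to each token.
            promoter_row = head[promoter_index]
            for key_index, weight in enumerate(promoter_row):
                totals[key_index] += weight

    count = n_layers * n_heads
    return [
        total / count
        for key_index, total in enumerate(totals)
        if key_index != promoter_index
    ]


# Toy input: 2 layers, 2 heads, 3 tokens (promoter plus two enhancers).
row_a = [[0.2, 0.5, 0.3], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
row_b = [[0.4, 0.1, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
toy_attn = [[row_a, row_a], [row_b, row_b]]
scores = enhancer_attention_scores(toy_attn)
```

With this toy input, the second enhancer receives the higher averaged score, so it would be ranked first. The weights are made up and carry no biological meaning.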