Fig. 1: The schematic overview of the pretraining framework.
From: OmniReg-GPT: a high-efficiency foundation model for comprehensive genomic sequence understanding

A OmniReg-GPT Workflow: The model undergoes generative pretraining on sequences randomly extracted from the human genome. The core of OmniReg-GPT is a hybrid attention module that combines local and global attention mechanisms to reduce resource consumption during training. Local attention divides the queries into fixed-size windows, concatenates each window with its preceding window to form the key-value context, and then merges the window dimension into the batch dimension so that attention over all windows is computed in one efficient batched call. OmniReg-GPT is then applied to a range of predictive tasks through simple transfer learning and is also used to generate functional regulatory elements. B The pretrained embeddings can be leveraged to predict multi-scale gene regulation profiles, including chromatin feature profiling, local genomic regulation rules, and chromatin topology and interactions. By incorporating regulatory knowledge into its parameters, OmniReg-GPT can use its reasoning abilities to generate cell type-specific, high-activity enhancers in silico. Schematics in A and B were created using BioRender (Wang, A.; https://BioRender.com/gnepe2b and https://BioRender.com/q9xdysb).
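
The windowed local attention described in panel A can be illustrated compactly. The following is a minimal PyTorch sketch, not the authors' implementation: the single-head layout, the window size w, the zero padding before the first window, and the function name are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def local_window_attention(q, k, v, w):
    """q, k, v: (batch, seq_len, dim), with seq_len divisible by w."""
    b, n, d = q.shape
    nw = n // w  # number of windows

    # 1) Divide queries (and keys/values) into non-overlapping windows.
    qw = q.view(b, nw, w, d)
    kw = k.view(b, nw, w, d)
    vw = v.view(b, nw, w, d)

    # 2) Concatenate each window with its preceding window to form the
    #    key-value context: (b, nw, 2*w, d). Window 0 gets zero padding.
    prev_k = F.pad(kw, (0, 0, 0, 0, 1, 0))[:, :-1]
    prev_v = F.pad(vw, (0, 0, 0, 0, 1, 0))[:, :-1]
    kc = torch.cat([prev_k, kw], dim=2)
    vc = torch.cat([prev_v, vw], dim=2)

    # Causal mask for generative pretraining: query i sits at offset i in
    # its window and concatenated key j at offset j - w, so attention is
    # allowed when j <= i + w. Window 0 additionally masks its zero padding.
    i = torch.arange(w, device=q.device).view(w, 1)
    j = torch.arange(2 * w, device=q.device).view(1, 2 * w)
    mask = (j <= i + w).expand(nw, w, 2 * w).clone()
    mask[0, :, :w] = False
    mask = mask.unsqueeze(0).expand(b, nw, w, 2 * w)

    # 3) Merge the window dimension into the batch dimension and run one
    #    batched attention call over all b*nw windows at once.
    out = F.scaled_dot_product_attention(
        qw.reshape(b * nw, w, d),
        kc.reshape(b * nw, 2 * w, d),
        vc.reshape(b * nw, 2 * w, d),
        attn_mask=mask.reshape(b * nw, w, 2 * w),
    )
    return out.reshape(b, n, d)

# Toy usage: one sequence of length 16, window size 4 -> 4 local windows.
x = torch.randn(1, 16, 8)
print(local_window_attention(x, x, x, w=4).shape)  # torch.Size([1, 16, 8])
```

Because each query window attends to only 2*w keys, the cost of local attention grows linearly with sequence length rather than quadratically, which is what makes the hybrid module resource-efficient on long genomic sequences.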