Fig. 6: Enhancer generation using OmniReg-GPT’s zero-shot capability.
From: OmniReg-GPT: a high-efficiency foundation model for comprehensive genomic sequence understanding

A Schematic illustration of using OmniReg-GPT’s likelihood as a metric for STARR-seq activity and of measuring the activities of enhancer and promoter sequences (sequences 264 bp in length were selected and cloned in all pairwise combinations into the promoter and enhancer positions of a plasmid vector). B Correlation of the activity oracle from experiment, the OmniReg-GPT prediction and the Enformer prediction with intrinsic promoter strength and with the combination of promoter and enhancer strength. C OmniReg-GPT designs cell-type-specific, high-activity enhancers with the help of the score model and a progressive prompt setting. D OmniReg-GPT designs synthetic sequences that achieve higher cell-type-specific enhancer activity than natural sequences in the K562 cell line (n = 2000 sequences for each source in CODA, n = 4000 for each source in Round0, n = 6000 for each source in Round1). Data are shown as violin plots (kernel density estimation) with centered box plots displaying the median, quartiles, and 1.5× IQR whiskers. ‘Native’ refers to sequences derived from DHS-natural sequences through two rounds of enhancer generation. E Heatmaps of alignment scores comparing the sequences generated in the first round (left) and the second round (right) with the initial CODA sequences from four different sources. F Distribution of MinGap scores between generated sequences and DHS-natural sequences (n = 4000 sequences for each source). Data are presented as half-violin plots (right side) showing kernel density estimation, overlaid with box plots. Box plots display the median as the center line, the upper and lower quartiles as box limits, and 1.5× interquartile range as whiskers. Violin plot width represents data density at each AUROC value. Schematic in A was created using BioRender (Wang, A., https://BioRender.com/e49fdgh). Source data are provided as a Source Data file.
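
The box-plot convention named for panels D and F (median, quartiles, 1.5× IQR whiskers) can be sketched as follows. This is a minimal illustration and not the authors' plotting code: the function name `box_plot_stats`, the quartile interpolation method, and the toy values are assumptions, and the whiskers follow the common Tukey convention of ending at the most extreme data point that lies within 1.5× IQR of the box.

```python
import statistics


def box_plot_stats(values):
    """Summarise values the way a 1.5x IQR box plot would (hypothetical helper)."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed data (an assumption;
    # plotting libraries differ slightly in how they interpolate quartiles).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    whisker_low = min(x for x in data if x >= lower_fence)
    whisker_high = max(x for x in data if x <= upper_fence)
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whiskers": (whisker_low, whisker_high),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Toy scores, not data from the figure.
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Under this convention, points beyond the whiskers are treated as outliers rather than stretching the whiskers, which is why the whisker ends can sit well inside the full range drawn by the violin.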