Fig. 6: Generative chromatin accessibility via inference-time guidance.
From: Genome modelling and design across all domains of life with Evo 2

a, Multi-kilobase sequences were designed to control the locations and lengths of chromatin-accessible regions, which are visualized as peaks indicating the degree of accessibility along a one-dimensional genomic sequence. b, 128-bp DNA chunks from a prompt were autoregressively generated with Evo 2. A beam search algorithm then selected the optimal chunks by scoring how well their Enformer- and Borzoi-predicted chromatin accessibility profiles matched a target pattern. The best chunks were appended to the prompt to guide subsequent generation. c, Design runs are plotted by how successfully they matched the target pattern versus the compute used. AUROC quantifies how well predicted accessibility profiles can distinguish our desired open- versus closed-chromatin positions. The horizontal axis plots the number of tokens sampled per base pair in the design (standard autoregressive decoding is 1 token per bp). Individual design runs are plotted as grey dots and the averages across design runs for each beam search width are plotted as crosses. d, Two different peak patterns were designed with varying total compute budgets, with more compute leading to clearer designed peaks. e, Designs were experimentally tested by synthesizing and assembling the DNA, performing site-specific integration into mouse or human cells, and measuring chromatin accessibility with ATAC-seq. f–h, Control over the position and width of chromatin accessibility peaks enables Morse code messages (‘EVO2’ (f), ‘LO’ (g) and ‘ARC’ (h)) in the epigenome. Generated DNA sequences replace the native sequence at chrX: 52,051,929–52,123,468 in the mouse genome. Enformer and Borzoi predictions are based on the DNase hypersensitivity tracks in 129 ES-E14 cells. Designs were sampled using 30–84 tokens per base pair (Methods). Designs were experimentally validated in Bl6xcast mESCs.
i,j, Integrating the same generated sequence into both HEK293T and K562 cells enables the design of identical patterns across both cell types (i) or of cell-type-specific accessibility profiles (j). k, AUROC quantifies how well experimental accessibility profiles can distinguish our desired open- versus closed-chromatin positions. Five designs were tested in HEK293T and 31 designs in K562 for which we varied the chromatin accessibility along the sequence. Dots indicate individual designs. l, The paradigm of using an accurate scoring function to guide a capable generative model extends beyond chromatin accessibility design, enabling many complex biological design applications.
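
The guidance loop in panel b and the two quantities plotted in panel c (AUROC and tokens sampled per base pair) can be illustrated with a toy sketch. This is not the authors' implementation: `sample_chunk` is a hypothetical stand-in for Evo 2 sampling (it draws random nucleotides), `predict_accessibility` is a hypothetical stand-in for Enformer and Borzoi (it returns local GC fraction, which has no biological meaning here), and `match_score` (negative mean absolute error) is an assumed scoring rule, since the caption does not state the one actually used. Chunk length, beam width and samples per beam are illustrative values only.

```python
import random


def auroc(scores, labels):
    """Probability that a random open position outscores a random closed one.

    Ties count as 0.5. Labels are truthy for open, falsy for closed.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs both open and closed positions")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sample_chunk(rng, length):
    # ASSUMPTION: stand-in for autoregressive sampling from a language model.
    return "".join(rng.choice("ACGT") for _ in range(length))


def predict_accessibility(seq, window=8):
    # ASSUMPTION: toy stand-in for a sequence-to-function predictor.
    # Returns the GC fraction in a window around each position.
    half = window // 2
    profile = []
    for i in range(len(seq)):
        w = seq[max(0, i - half): i + half + 1]
        profile.append(sum(c in "GC" for c in w) / len(w))
    return profile


def match_score(seq, target):
    # ASSUMPTION: higher is better; negative mean absolute error between the
    # predicted profile and the part of the target covered so far.
    pred = predict_accessibility(seq)
    return -sum(abs(p - t) for p, t in zip(pred, target)) / len(seq)


def guided_design(target, chunk_len=16, beam_width=2, samples_per_beam=4, seed=0):
    """Chunk-wise beam search: sample chunks, score them, keep the best prefixes."""
    rng = random.Random(seed)
    beams = [""]          # start from an empty prompt
    tokens_sampled = 0
    while len(beams[0]) < len(target):
        n = min(chunk_len, len(target) - len(beams[0]))
        candidates = []
        for prefix in beams:
            for _ in range(samples_per_beam):
                candidates.append(prefix + sample_chunk(rng, n))
                tokens_sampled += n
        candidates.sort(key=lambda s: match_score(s, target), reverse=True)
        beams = candidates[:beam_width]
    best = beams[0]
    # Compute axis of panel c: tokens sampled per base pair of final design.
    return best, tokens_sampled / len(best)


# Target pattern: closed, open, closed (1 = desired open position).
target = [0] * 32 + [1] * 32 + [0] * 32
design, tokens_per_bp = guided_design(target, beam_width=2, samples_per_beam=8)
score = auroc(predict_accessibility(design), target)
print(f"tokens per bp: {tokens_per_bp:.2f}, AUROC: {score:.3f}")
```

With `beam_width=1` and `samples_per_beam=1` the loop reduces to plain autoregressive decoding at exactly 1 token per bp, which is the baseline named in panel c. Raising either parameter spends more sampled tokens per designed base pair.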