Extended Data Fig. 11: Additional data for generative chromatin accessibility and cell-type specificity.
From: Genome modelling and design across all domains of life with Evo 2

(a) Genomic tracks visualizing the sequence statistics plotted in Extended Data Fig. 10f for the “ARC” and “EVO2” designs. AUROC indicates how predictive the statistic is for the binary peak region labels. Spearman r indicates the correlation between the statistic and the experimentally determined ATAC-seq coverage value. (b) Histogram of the 24 designed sequences [see panels (f) and (g)] for which high chromatin accessibility in K562 and low accessibility in HEK293T was specified, binned by the K562/HEK293T fold change in mean coverage across all positions in the peak. A fold change of 1 indicates the same mean coverage in both cell types. Four sequences showed a >2-fold change and one sequence a >3-fold change in mean coverage (with K562 having the higher mean coverage), representing a success rate of 4–17% (1 of 24 to 4 of 24, depending on the fold-change threshold). (c) Summary results of designs in human cells that attempt to either maximize accessibility (“K562 on” and “HEK293T on”) or minimize accessibility (“K562 off” and “HEK293T off”) across the full designed sequence [see panels (d) and (e), respectively]. “Mean coverage” indicates the mean of coverage values across all 1-kb sequence positions. All designs that maximize accessibility have mean coverage values > 2.7 and all designs that minimize accessibility have mean coverage values < 1.3. (d–h) Plots showing ATAC-seq coverage of 1–4 kb designs in which the same sequence was integrated into both HEK293T and K562. The design patterns were grouped into five main categories. (d) The first consists of designs that aim to maximize accessibility in both cell lines across an entire 1-kb region. (e) The second consists of designs that aim to minimize accessibility in both cell lines across an entire 1-kb region. (f) The third consists of designs that aim to place a single peak in the first half of the K562 region while minimizing accessibility across the whole region in HEK293T. 
(g) The fourth is similar to the third, except that the single peak is placed in the second half of the K562 region. (h) The fifth consists of miscellaneous 4-kb designs that specify either two or four peaks in either K562 alone or in both cell lines. For all human cell line experiments, ATAC-seq coverage values are the average across two transfection/nucleofection replicate populations of cells. (i,j) Distribution of genes by log(1 + TPM) expression value (x-axis), used to determine a gene expression cutoff for K562 (log(1 + TPM) > 1.5) (i) and for HEK293T (log(1 + TPM) > 2.15) (j). These expression cutoffs were used to determine whether TF motifs found in the B7, B10, B11, and B12 designs were significantly enriched for TFs expressed in K562 (i) or in HEK293T cells (j).
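
The fold-change statistic and success-rate range described in panel (b) can be illustrated with a minimal sketch. It assumes that the fold change is simply the ratio of mean K562 coverage to mean HEK293T coverage over the peak positions, as the caption states; the function names and the fold-change values below are hypothetical placeholders chosen only to reproduce the reported counts (four of 24 sequences above 2-fold, one of them above 3-fold), not measured data.

```python
from typing import Sequence


def mean_coverage_fold_change(k562: Sequence[float], hek293t: Sequence[float]) -> float:
    """Ratio of mean K562 coverage to mean HEK293T coverage across peak positions."""
    mean_k562 = sum(k562) / len(k562)
    mean_hek293t = sum(hek293t) / len(hek293t)
    return mean_k562 / mean_hek293t


def fraction_above(fold_changes: Sequence[float], threshold: float) -> float:
    """Fraction of designs whose fold change exceeds the given threshold."""
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


# Placeholder fold changes (NOT the paper's data): 24 designs, four above
# 2-fold, of which one is above 3-fold, matching the counts in panel (b).
placeholder_fold_changes = [3.2, 2.5, 2.3, 2.1] + [1.0] * 20

rate_2x = fraction_above(placeholder_fold_changes, 2.0)
rate_3x = fraction_above(placeholder_fold_changes, 3.0)
print(f"{rate_3x:.1%} to {rate_2x:.1%}")  # 4.2% to 16.7%
```

Under these assumptions, the two ends of the reported 4–17% range correspond to the stricter (>3-fold) and more lenient (>2-fold) thresholds.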