Extended Data Fig. 1: The convolutional sequence-to-expression model generalizes reliably and characterizes sequence trajectories under different evolutionary regimes. | Nature

From: The evolution, evolvability and engineering of gene regulatory DNA

a–d, Prediction of expression from sequence in complex (YPD) (a, b) and defined (SD-Uracil) (c, d) medium. Predicted (x axis) and experimentally measured (y axis) expression for (a, c) random test sequences (sampled separately from and not overlapping with the training data) and (b, d) native yeast promoter sequences containing random single base mutations. Top left: Pearson’s r and associated two-tailed P value. Compression of predictions in the lower left results from binning differences during cell sorting in different experiments (Supplementary Notes). e, Experimental validation of trajectories from simulations of random genetic drift. Distribution of measured (light grey) and predicted (dark grey) changes in expression in the defined medium (SD-Uracil) (y axis) for the synthesized randomly designed sequences (n = 2,986) at each mutational step (x axis). Midline: median; boxes: interquartile range; whiskers: 5th and 95th percentile range. f, g, Simulation and validation of expression trajectories under the strong-selection weak-mutation (SSWM) regime in defined medium (SD-Uracil). f, Distribution of predicted expression levels (y axis) in defined medium at each evolutionary time step (x axis) for sequences under SSWM favouring high (red) or low (blue) expression, starting with native promoter sequences (n = 5,720). Midline: median; boxes: interquartile range; whiskers: 5th and 95th percentile range. g, Experimentally measured expression distribution in defined medium (y axis) for the synthesized sequences (n = 6,304 sequences; 637 trajectories) at each mutational step (x axis) from predicted mutational trajectories under SSWM, favouring high (red) or low (blue) expression. Midline: median; boxes: interquartile range; whiskers: 5th and 95th percentile range. h–o, Experimental validation of predicted expression for sequences from the random genetic drift and SSWM simulations.
Experimentally measured (y axis) and predicted (x axis) expression level (l–o) or expression change from the starting sequence (h–k) in complex (h, j, l, n) or defined (i, k, m, o) medium using sequences from the random genetic drift (Fig. 2e, Extended Data Fig. 1e; h, i, l, m here) and SSWM (Fig. 2g, Extended Data Fig. 1g; j, k, n, o here) validation experiments. Top left: Pearson’s r and associated two-tailed P values.
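
The summary statistics named in this legend (median midline, interquartile-range box, whiskers at the 5th and 95th percentiles, and Pearson's r between predicted and measured expression) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names `percentile`, `box_summary` and `pearson_r` are hypothetical, the percentile uses linear interpolation as an assumed convention, and the two-tailed P value is left out because it needs a t-distribution (a statistics library such as SciPy's `scipy.stats.pearsonr` reports it alongside r).

```python
import math


def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, by linear interpolation.

    The interpolation rule is an assumption; the legend does not state one.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def box_summary(values):
    """Box-plot statistics as described in the legend (hypothetical helper)."""
    return {
        "whisker_low": percentile(values, 5),    # 5th percentile
        "q1": percentile(values, 25),            # lower edge of the box
        "median": percentile(values, 50),        # midline
        "q3": percentile(values, 75),            # upper edge of the box
        "whisker_high": percentile(values, 95),  # 95th percentile
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the study.
    predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
    measured = [1.2, 1.9, 3.4, 3.8, 5.1]
    print(box_summary(measured))
    print(pearson_r(predicted, measured))
```

Note that whiskers at the 5th and 95th percentiles differ from the common 1.5 × IQR convention, so default box-plot settings in most plotting libraries would not reproduce these panels without adjustment.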
