Extended Data Fig. 9: Additional results for unconstrained generation at genome and chromosome scale.
From: Genome modelling and design across all domains of life with Evo 2

(a) Amino acid sequence recovery for different genes across Evo 2 models when prompted with the genomic context of the respective gene. Evo 2 models were tested across multiple stages of context extension, including the base model pretrained at 8k token context, intermediate models at 262k and 524k token context, and the final model with context extension to 1M token context. (b) Structural alignment of AlphaFold 3-predicted complexes of native human mitochondrial proteins with AlphaFold 3-predicted complexes of Evo 2-generated human mitochondrial proteins. (c) Predicted aligned error (PAE) of the Evo 2-generated mitochondrial complexes from AlphaFold 3. (d) Distribution of mitochondrial codon frequencies in Evo 2-generated mitochondrial sequences and H. sapiens mitochondria. (e) Frequency of each tRNA anticodon across Evo 2-generated mitochondria as annotated by MitoZ. (f) ESMFold predicted local distance difference test (pLDDT) distributions of natural and Evo 2-generated M. genitalium genes called by Prodigal. (g) Distribution of TM scores from Evo 2-generated M. genitalium genes against UniRef50 AlphaFold DB. (h) Predicted AlphaFold 3 structures, TM scores, and sequence identity comparing genes from Evo 2-generated M. genitalium with natural proteins. (i) Distribution of genes, introns, tRNAs, and promoters on an Evo 2-generated S. cerevisiae sequence compared with the natural S. cerevisiae chromosome III (gray line). (j) Distribution of gene lengths and pLDDTs of Evo 2-generated and natural S. cerevisiae genes, annotated by GeneMark-ES. (k) Distribution of TM scores of genes from Evo 2-generated S. cerevisiae sequence against UniRef50. (l) Distribution of different secondary structures in Evo 2-generated genes compared with wild-type S. cerevisiae chromosome III genes. (m) Protein structure of genes from Evo 2-generated S. cerevisiae sequence compared with the natural structure and sequence. (n) Tetranucleotide usage deviation (TUD) comparison between natural S. cerevisiae, S. pombe, and Evo 2-generated sequences for whole sequence, CDS, and promoters (upstream of gene start).
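
The caption does not define how TUD in panel (n) was computed. As a rough illustration only, the sketch below assumes one common formulation: the observed count of each overlapping tetranucleotide divided by the count expected from mononucleotide frequencies alone. The function name `tetranucleotide_usage_deviation` and the handling of ambiguous bases are hypothetical choices made for this example, not the authors' pipeline.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "ACGT"

def tetranucleotide_usage_deviation(seq):
    """Observed/expected ratio for all 256 tetranucleotides.

    Assumption: the expectation comes from a zero-order model
    (product of single-base frequencies); other definitions exist.
    """
    seq = seq.upper()
    # Single-base frequencies, ignoring ambiguous symbols such as N.
    base_counts = Counter(b for b in seq if b in BASES)
    total = sum(base_counts.values())
    if total < 4:
        raise ValueError("sequence needs at least four unambiguous bases")
    base_freq = {b: base_counts[b] / total for b in BASES}

    # Overlapping 4-mers made only of unambiguous bases.
    kmers = [seq[i:i + 4] for i in range(len(seq) - 3)]
    kmers = [k for k in kmers if all(c in BASES for c in k)]
    observed = Counter(kmers)
    n = len(kmers)

    tud = {}
    for letters in product(BASES, repeat=4):
        k = "".join(letters)
        expected = n * prod(base_freq[c] for c in k)
        # A base absent from the sequence gives expected == 0.
        tud[k] = observed[k] / expected if expected > 0 else float("nan")
    return tud

if __name__ == "__main__":
    profile = tetranucleotide_usage_deviation("ACGT" * 50)
    print(round(profile["ACGT"], 2), profile["AAAA"])
```

Under this assumed definition, values above 1 mark over-represented tetranucleotides and values below 1 mark under-represented ones. Profiles from two sequences (for example a natural and a generated one) could then be compared by correlating their 256 ratios.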