Fig. 5: Genome-scale generation across the domains of life.
From: Genome modelling and design across all domains of life with Evo 2

a, Evo 2 can generate chromosome- and genome-scale DNA sequences using unconstrained autoregressive generation. The model was prompted with portions of the H. sapiens mitochondrial genome, the M. genitalium genome and S. cerevisiae chromosome III to generate DNA sequences of similar length to the native sequences. b, Evo 2 was prompted with both the genomic context and a portion of a highly conserved protein, and the sequence recovery of the Evo 2-generated gene completion was then measured against the natural gene. c, Predicted rRNA, CDS and tRNA counts in Evo 2-generated mitochondrial sequences, annotated using MitoZ, compared with the values for the natural H. sapiens mitochondrial genome. d, Query cover versus sequence identity of generated mitochondrial sequences against nucleotide BLAST hits in the core_nt database with an expect threshold of 0.05, coloured by E-value. e, Visualizations of Evo 2-generated sequences when prompted with a 3-kb sequence from the H. sapiens mitochondrial genome, demonstrating variation that still retains the natural synteny patterns of coding sequences. f, AlphaFold 3-predicted structures of multimeric complexes from an Evo 2-generated sequence resembling human mitochondrial DNA. Sequence identity (seq. ID) compares Evo 2-generated proteins with natural proteins found via a BLASTp query. g, Example Evo 2-generated DNA sequence of approximately 600 kb. Evo 2 was prompted with the beginning of the M. genitalium genome. Genes are annotated with Prodigal and coloured on the basis of statistically significant sequence similarity to natural proteins (hmmscan E-value < 0.001). h, The fraction of Prodigal-annotated genes with hmmscan hits in M. genitalium-like sequences generated by Evo 2 40B compared with those generated by Evo 1. i, Distribution of Prodigal-annotated genes from Evo 2-generated M. genitalium sequences compared with the natural genome. j, Distribution of secondary structure in Evo 2-generated proteins compared with natural M. genitalium proteins.
k, AlphaFold 3 structure predictions of example proteins found on Evo 2-generated prokaryotic genomic sequences, which show high structural similarity to natural proteins while diverging in sequence composition. l, The native sequence of S. cerevisiae chromosome III and an Evo 2-generated DNA sequence of similar length, generated by prompting the model with a 10-kb sequence from S. cerevisiae chromosome III, are visualized alongside predicted homologous yeast gene, exon, promoter and tRNA annotations.
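
The fraction reported in panel h can be illustrated with a minimal sketch, assuming each Prodigal-annotated gene has already been reduced to its best hmmscan E-value. The function name, the gene identifiers and the E-values below are hypothetical and are not taken from the paper; only the 0.001 cutoff comes from the caption, and the authors' actual pipeline may differ.

```python
from typing import Dict, Iterable

# Cutoff stated in the caption for panel g (hmmscan E-value < 0.001).
E_VALUE_CUTOFF = 1e-3


def fraction_with_hits(gene_ids: Iterable[str], best_evalue: Dict[str, float]) -> float:
    """Return the fraction of genes whose best hmmscan E-value is below the cutoff.

    Genes absent from `best_evalue` are treated as having no hit at all.
    """
    genes = list(gene_ids)
    if not genes:
        return 0.0
    hits = sum(
        1 for g in genes if best_evalue.get(g, float("inf")) < E_VALUE_CUTOFF
    )
    return hits / len(genes)


# Hypothetical toy input: four predicted genes, three of which had any hmmscan match.
genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
evalues = {"gene_1": 1e-30, "gene_2": 5e-4, "gene_3": 0.2}

print(fraction_with_hits(genes, evalues))  # 0.5
```

In this toy input, two of the four genes fall below the cutoff, so the reported fraction is 0.5. Treating genes with no reported match as misses keeps the denominator equal to the total number of predicted genes.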