Extended Data Fig. 2: Biosecurity and ethics evaluations for model generation and scoring. | Nature

From: Genome modelling and design across all domains of life with Evo 2

To reduce dual-use risk, the training data were filtered to remove sequences from viruses that can infect eukaryotic hosts. As intended, Evo 2 performs worse on eukaryotic viruses. (a) Viral sequences from the USDA Select Agents and Toxins List show consistently higher perplexity than non-pathogenic viruses and prokaryotic viruses. Blue violin plots show the distribution of perplexity scores; overlaid points are individual 512-bp chunks sampled uniformly at random across viral genomes. (b) Correlation of language model likelihood with experimental deep mutational scanning (DMS) fitness measurements for human viral proteins. Gray bars show mean correlation coefficients, and individual points correspond to DMS datasets from ProteinGym. Both Evo 2 and Evo 1 models predict the effects of mutations in viral proteins poorly. (c) Protein sequence generation success rates across model conditions. Bar heights show the percentage amino-acid sequence recovery in the generated sequences when a model is prompted with a portion of a viral protein; error bars show the standard deviation across multiple generations from the same prompt. Different Evo 2 models (indicated by color) were tested with various prompt proteins (shown on the horizontal axis). Random sequence generations are included as a control. (d) Ancestry bias of Evo 2 as a variant effect predictor compared with baselines, with protein mutations converted to DNA codons. Baseline performance data are taken from Pathak et al. Most variant effect predictors show ancestry bias, scoring variants of non-European ancestry as more pathogenic. Evo 2 shows ancestry bias similar to that of other population-free methods, as assessed by both the ratio (heatmap) and the mean difference (bar plot) of min-max scaled scores between each population subgroup and the European subgroup.
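The quantities named in panels (a) and (d) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the `"EUR"` reference label, the joint min-max scaling across subgroups, and the use of subgroup means for the ratio and difference are all assumptions made for illustration. Perplexity is taken in its standard form, the exponential of the negative mean per-token log-likelihood, and the per-token log-likelihoods are assumed to come from a model call that is not shown.

```python
import math
import random
from statistics import mean


def sample_chunks(genome, chunk_len=512, n_chunks=10, seed=0):
    """Draw fixed-length chunks whose start positions are uniform at random."""
    if len(genome) < chunk_len:
        raise ValueError("genome is shorter than the chunk length")
    rng = random.Random(seed)
    # randint is inclusive on both ends, so every valid start is equally likely.
    starts = [rng.randint(0, len(genome) - chunk_len) for _ in range(n_chunks)]
    return [genome[s:s + chunk_len] for s in starts]


def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log likelihood per token)."""
    return math.exp(-mean(token_log_probs))


def ancestry_bias(scores_by_group, reference="EUR"):
    """Compare min-max scaled scores of each subgroup to a reference subgroup.

    Assumption: scores are scaled jointly across all subgroups, and the
    comparison uses the ratio and difference of subgroup means.
    """
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    lo, hi = min(all_scores), max(all_scores)
    if hi == lo:
        raise ValueError("cannot min-max scale constant scores")
    scaled_means = {
        group: mean((s - lo) / (hi - lo) for s in scores)
        for group, scores in scores_by_group.items()
    }
    ref = scaled_means[reference]
    return {
        group: {"ratio": m / ref, "mean_diff": m - ref}
        for group, m in scaled_means.items()
        if group != reference
    }
```

Under these assumptions, a model that assigns probability 0.25 to every nucleotide gives a perplexity of 4, the value expected for uniform guessing over four bases. A ratio above 1 or a positive mean difference indicates that a subgroup is scored higher than the reference subgroup on the scaled scale.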
