Extended Data Fig. 9: GeoFitness values of AmeR mutations.
From: EvoAI enables extreme compression and reconstruction of the protein sequence space

(a) GeoFitness values predicted by the pre-trained model for all single-site mutations. The first 28 N-terminal sites were excluded from prediction because of low confidence. A larger score indicates a higher likelihood that the mutation will improve protein function. (b) Ranking by GeoFitness value of the mutations from all anchors. The selected top 11 sites (13 mutations in total) are colored red. (c) Predicted scores of the top 10 EvoAI-designed variants, each combining 6 mutations. (d) Spearman correlation between the predicted and the experimental fold-repression ranks of the top 10 variants. The shaded area around the fitted line represents the 95% confidence interval. (e) The 15 single mutations with the highest predicted GeoFitness values. (f) Experimental fold repression of variants designed by models trained on EvoScan anchors (Top, Middle, Bottom) or on deep mutational scanning (DMS) data. Data points are means of three biological replicates. In the box plots, the centre line marks the median, the box spans the first to third quartiles, and the whiskers extend to the minimum and maximum values. (g) AUPRC plot for EvoAI-generated variants. (h) AUPRC plot for the test set during EvoAI training.
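Panels (d), (g) and (h) rely on two standard metrics. The sketch below shows one way to compute them in plain Python. It is illustrative only. The caption does not say how the ranks or the AUPRC were computed, so this sketch makes two assumptions. First, Spearman's ρ is taken as the Pearson correlation of ranks, with tied values given their average rank. Second, AUPRC is taken as step-wise average precision over binary labels. For example, a variant could be labeled 1 if it improves fold repression and 0 otherwise, but that labeling is hypothetical. The function names and toy inputs are also hypothetical and are not data from the figure.

```python
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(predicted, measured):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(rank_with_ties(predicted), rank_with_ties(measured))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumption (hypothetical): labels are 1 for a positive (e.g. an improved
    variant) or 0, and a higher score means the model ranks the item as more
    likely to be positive.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every item tied at this threshold before adding a PR point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision  # rectangle under the step
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the figure.
    print(spearman([3.1, 2.4, 2.0, 1.2], [8.0, 9.5, 4.1, 2.2]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Averaging tied ranks follows the usual definition of Spearman's ρ. Where SciPy or scikit-learn is available, `scipy.stats.spearmanr` and `sklearn.metrics.average_precision_score` compute the same quantities. A trapezoidal AUPRC would give slightly different values from the step-wise version shown here.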