Fig. 4: Navigating the protein fitness landscape in the VAE latent space. | Nature Communications

From: Deciphering protein evolution and fitness landscapes with latent space models

a A two-dimensional latent space representation of sequences from the cytochrome P450 family (PF00067). b The two-dimensional latent space representation of 6561 chimeric cytochrome P450 sequences made by recombining three cytochrome P450s (CYP102A1, CYP102A2, CYP102A3) at seven crossover locations, which gives eight sequence blocks and \({3}^{8}=6561\) possible chimeras. c The two-dimensional latent space representation of 278 chimeric cytochrome P450 sequences whose \({T}_{50}\) values were measured experimentally by the Arnold group (refs. 55, 56, 57). Each point represents a chimeric cytochrome P450 sequence. Points are colored by their experimental \({T}_{50}\) values. d Performance of a Gaussian process model at predicting \({T}_{50}\) on the training set of 222 chimeric cytochrome P450 sequences, using the two-dimensional latent space representation (\({Z}_{1}\), \({Z}_{2}\)) as features and a radial basis function kernel with Euclidean distance in the latent space \(Z\). e Performance of the Gaussian process model from d at predicting \({T}_{50}\) on the test set of 56 chimeric cytochrome P450 sequences. f Performance of a Gaussian process model at predicting \({T}_{50}\) on the same test set of 56 chimeric cytochrome P450 sequences, using the 20-dimensional latent space representation (\({Z}_{1}\), ..., \({Z}_{20}\)) as features.
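
The regression setup described in panels d and e can be sketched roughly as follows. This is a minimal illustration in NumPy, not the authors' implementation: the latent coordinates and \({T}_{50}\) values are synthetic stand-ins, the function names `rbf_kernel` and `gp_predict` are hypothetical, and the kernel hyperparameters are fixed by hand here, whereas in practice they would normally be fitted to the training data. Only the 222/56 split, the two-dimensional features, and the radial basis function kernel on Euclidean distance come from the caption.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """RBF kernel on Euclidean distance between rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(z_train, y_train, z_test, length_scale=1.0, variance=1.0, noise=0.1):
    """Posterior mean and variance of a GP with a constant (training-mean) prior mean."""
    y_mean = y_train.mean()
    k = rbf_kernel(z_train, z_train, length_scale, variance)
    k += noise**2 * np.eye(len(z_train))  # observation noise on the diagonal
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    k_star = rbf_kernel(z_test, z_train, length_scale, variance)
    mean = k_star @ alpha + y_mean
    v = np.linalg.solve(chol, k_star.T)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


# Synthetic stand-in data (assumption): 278 points in a 2-D latent space
# with a smooth, noisy target playing the role of T50.
rng = np.random.default_rng(0)
z = rng.normal(size=(278, 2))
t50 = 50.0 + 5.0 * np.sin(z[:, 0]) + 3.0 * z[:, 1] + rng.normal(scale=0.5, size=278)

# Same split sizes as in the caption: 222 training and 56 test sequences.
z_train, z_test = z[:222], z[222:]
y_train, y_test = t50[:222], t50[222:]

pred_mean, pred_var = gp_predict(z_train, y_train, z_test, noise=0.5)
pearson_r = np.corrcoef(pred_mean, y_test)[0, 1]
print(f"Pearson r on the held-out set: {pearson_r:.2f}")
```

Passing 20-column arrays for `z_train` and `z_test` gives the analogue of panel f, since the kernel depends only on Euclidean distances between rows.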
