Fig. 4: Cross-protein learning improves the prediction of optimal artificial environments for membrane-protein synthesis.
From: Designer artificial environments for membrane protein synthesis

A A t-SNE embedding capturing the proteins’ differential response to reaction conditions reveals no clear clustering of proteins. Each point represents the centroid of all individual models’ predictions for how a protein will respond to all assessed reaction conditions (Method 18). The size of each point represents the standard deviation across the models’ predictions for that protein. Top panel: colored by organism of origin. Bottom panel: colored by protein length. B Including a protein’s location in the reaction-condition embedding space improves prediction accuracy. Individual ensembles are trained on the specified input data, with each protein in turn held out of the dataset and then predicted afterward. Proteins with no successful synthesis conditions were excluded. n = 16 proteins. Each data point represents the R² obtained by a given model on a particular protein. Statistical significance was evaluated using repeated‐measures ANOVA (with ‘Protein’ as the subject factor and ‘Model’ as the within‐subject factor). Where overall significance was detected, pairwise comparisons between models were conducted using two‐sided paired t‐tests with a Bonferroni correction. Error bars represent mean ± SEM. Source data are provided as a Source Data file.
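
The quantities named in panel B (per-protein R², mean ± SEM, paired comparisons with a Bonferroni correction) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis code: the function names and the example numbers are hypothetical, the repeated-measures ANOVA step is omitted, and the paired t statistic would still need to be converted to a two-sided p-value with a t distribution on n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    """Coefficient of determination for one held-out protein: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_t_statistic(scores_a, scores_b):
    """t statistic for matched per-protein scores from two models (df = n - 1)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-protein R² values for two models (made-up numbers, not source data).
model_a = [0.42, 0.55, 0.31, 0.60]
model_b = [0.35, 0.50, 0.30, 0.48]

avg, sem = mean_sem(model_a)
t_stat = paired_t_statistic(model_a, model_b)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

Pairing the scores by protein mirrors the caption's design, in which every model is evaluated on the same held-out proteins, so differences are computed within each protein rather than between independent groups.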