Fig. 5: Structural features predict successful membrane protein synthesis in artificial environments.
From: Designer artificial environments for membrane protein synthesis

A Protein structures are decomposed into bins based on amino acid location relative to a simulated membrane (Method 20). Each amino acid is classified by its vertical position and by whether it contacts lipids or is buried within the protein. Left panel: All possible layers are shown, with the shaded region indicating the membrane interior. Right panel: The protein is rotated 90° to display only amino acids within membrane layers; those in polar or water layers are labeled as part of the external shell. B Classifiers trained on t-SNE embeddings of membrane-contacting amino acids achieved the highest accuracy (Method 21). Embeddings were generated using various feature combinations, and the maximum accuracy of each Ensemble Classifier is shown. Although all combinations were evaluated, only pairs of features are displayed to highlight the specific contributions of each feature. C The tested proteins are well dispersed among the other proteins in the dataset when plotted in one of the embedded spaces used in the top classifier. Red points indicate tested proteins that were not successfully made; blue points indicate tested proteins that were. The hue of the remaining points indicates the predicted probability of synthesis success using the top classifier. n = 4,612 membrane proteins. D All 10 paired classifiers with an accuracy >83% show strong prediction agreement for proteins that are predicted to have a high likelihood of successful synthesis. For each protein, the mean of the predictions across classifiers is plotted versus the standard deviation of those predictions. Hue indicates the average predicted label for each protein. Red points indicate tested proteins that were not successfully made; blue points indicate tested proteins that were. Purple points indicate 3 additional proteins that were selected based on the predicted outcome; all 3 passed the threshold to be considered successfully produced. The full predicted set shown in this plot includes 4,612 membrane proteins. Source data are provided as a Source Data file.
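
The agreement measure in panel D can be illustrated with a minimal sketch, assuming each classifier returns a probability of synthesis success for each protein. The function name `summarize_agreement`, the protein identifiers, and the probability values are hypothetical and are not taken from the paper. The caption does not say whether a population or sample standard deviation was used; the population form is assumed here.

```python
from statistics import mean, pstdev


def summarize_agreement(predictions):
    """Return {protein_id: (mean, std)} across per-classifier probabilities.

    `predictions` maps a protein identifier to a list with one predicted
    probability per classifier. The population standard deviation is an
    assumption; the caption does not specify which form was used.
    """
    summary = {}
    for protein_id, probs in predictions.items():
        if not probs:
            raise ValueError(f"no predictions for {protein_id}")
        summary[protein_id] = (mean(probs), pstdev(probs))
    return summary


# Hypothetical probabilities from three classifiers (illustrative values only).
example = {
    "protein_a": [0.9, 0.9, 0.9],  # classifiers agree: std is zero
    "protein_b": [0.2, 0.8, 0.5],  # classifiers disagree: larger std
}
agreement = summarize_agreement(example)
```

In a plot of this kind, a protein with a high mean and a low standard deviation corresponds to the region where the classifiers consistently predict successful synthesis.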