Extended Data Fig. 2: Analysis of μProtein. | Nature Machine Intelligence

From: Accelerating protein engineering with fitness landscape modelling and reinforcement learning

a) Performance of μFormer and Ridge on GB1 double mutants with varying training data size. Here, μFormer denotes a μFormer variant with a smaller supervised scorer module (μFormer-SS). The training data ratio is the number of residues used for training divided by the total number of amino acids in GB1. The training data size is 209, 418, 627, 836, and 1045 for ratios of 20%, 40%, 60%, 80%, and 100%, respectively. All scores were evaluated on GB1 saturated double mutants (n = 535,917). Center: mean. Error bands: standard deviation. Five experiments were performed for each setting, each with a random selection of training data. b) Illustration of the test data split, using a protein of 10 residues and the 40% setting as an example. 2/2 unseen: neither of the two mutated residues in a double mutant is seen by the model. 1/2 unseen: exactly one of the two mutated residues in a double mutant is seen by the model. c) Performance of μFormer and Ridge on different splits of GB1 double mutants. Training data split criteria are the same as in a). Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. Five experiments were performed for each setting, each with a random selection of training data.
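
The split described in panel b) can be illustrated with a minimal Python sketch. It assumes that a residue counts as "seen" when its position is among those randomly selected for training, and it counts position pairs rather than individual double mutants (each pair corresponds to many amino acid combinations). The helper names and the "0/2 unseen" label for pairs with both positions seen are hypothetical and do not come from the caption or the paper.

```python
import random
from itertools import combinations


def sample_seen_positions(n_residues, ratio, seed=0):
    """Randomly choose the residue positions covered by training data.

    Hypothetical helper: the selection procedure used in the paper is assumed.
    """
    rng = random.Random(seed)
    n_seen = round(n_residues * ratio)
    return set(rng.sample(range(n_residues), n_seen))


def split_label(pos_a, pos_b, seen):
    """Label a double mutant by how many of its two mutated positions are unseen."""
    n_unseen = (pos_a not in seen) + (pos_b not in seen)
    return f"{n_unseen}/2 unseen"


def count_splits(n_residues, seen):
    """Count position pairs per split over all pairs of distinct positions."""
    counts = {"0/2 unseen": 0, "1/2 unseen": 0, "2/2 unseen": 0}
    for pos_a, pos_b in combinations(range(n_residues), 2):
        counts[split_label(pos_a, pos_b, seen)] += 1
    return counts


# Toy setting from panel b): a protein of 10 residues at the 40% ratio.
seen = sample_seen_positions(10, 0.4)
print(count_splits(10, seen))
```

With 4 of 10 positions seen, the 45 position pairs divide into 6 with both positions seen, 24 with exactly one seen, and 15 with neither seen, whichever positions are drawn.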
