Fig. 4: Performance of simulated ALDE campaigns on two combinatorially complete protein datasets, GB1 and TrpB. | Nature Communications

From: Active learning-assisted directed evolution

A Each DE simulation is a greedy single-step walk across four residues: at each step, one residue is fixed to its optimal mutation, until all four residues have been visited. DE simulations start from every variant that has some measurable function, and all 24 possible orderings of the four residues are simulated. B Each ALDE simulation starts from a random sample of 96 variants on the 4-site landscape, followed by four rounds of learning and proposing new sequences to test, with 96 protein variants per round. C Hypothetical visualization of the three acquisition functions explored in this work: greedy, upper confidence bound (UCB), and Thompson sampling (TS). D ALDE with four encodings, four models, and three acquisition functions generally outperforms the average DE simulation and random sampling on the GB1 and TrpB datasets. Performance is quantified as the normalized maximum fitness achieved by the end of the ALDE campaign. Error bars indicate the standard deviation across 70 random initializations. Source data are provided as a Source Data file.
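
As a rough illustration of panels A and C, the following minimal Python sketch shows one way a greedy single-step DE walk and the three acquisition functions could be written. It is not the authors' implementation. The function names (`de_greedy_walk`, `mean_de_fitness`, `select_batch`), the `beta` parameter, the dictionary-based landscape, and the treatment of missing variants as zero fitness are all assumptions made for this example. Thompson sampling is approximated here by independent Gaussian draws from each variant's predicted mean and standard deviation, which is a simplification of sampling from a full model posterior.

```python
import itertools
import random


def de_greedy_walk(landscape, start, order, alphabet):
    """Greedy single-step walk: visit sites in `order`, fixing each to its best residue."""
    current = start
    for site in order:
        candidates = [current[:site] + aa + current[site + 1:] for aa in alphabet]
        # Assumption: variants absent from the landscape count as zero fitness.
        current = max(candidates, key=lambda v: landscape.get(v, 0.0))
    return landscape.get(current, 0.0)


def mean_de_fitness(landscape, start, alphabet):
    """Average final fitness over every site ordering (24 orderings for four sites)."""
    orders = list(itertools.permutations(range(len(start))))
    total = sum(de_greedy_walk(landscape, start, o, alphabet) for o in orders)
    return total / len(orders)


def select_batch(means, stds, batch_size, method="greedy", beta=2.0, rng=None):
    """Return indices of the variants to test next, given predicted means and stds."""
    rng = rng or random.Random(0)
    n = len(means)
    batch_size = min(batch_size, n)
    if method == "ts":
        # Approximate Thompson sampling: one independent Gaussian draw per batch slot.
        chosen = []
        while len(chosen) < batch_size:
            draw = [rng.gauss(m, s) for m, s in zip(means, stds)]
            remaining = (i for i in range(n) if i not in chosen)
            chosen.append(max(remaining, key=lambda i: draw[i]))
        return chosen
    if method == "greedy":
        scores = list(means)  # exploit: rank by predicted mean only
    elif method == "ucb":
        scores = [m + beta * s for m, s in zip(means, stds)]  # mean plus uncertainty bonus
    else:
        raise ValueError(f"unknown acquisition method: {method}")
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:batch_size]


if __name__ == "__main__":
    # Toy additive 4-site landscape over a two-letter alphabet (illustrative only).
    toy = {"".join(v): float(v.count("C")) for v in itertools.product("AC", repeat=4)}
    print(mean_de_fitness(toy, "AAAA", "AC"))
    print(select_batch([0.1, 0.9, 0.5], [0.0, 0.0, 1.0], 1, method="ucb"))
```

On a real combinatorially complete dataset, `landscape` would map each four-residue variant to its measured fitness and `alphabet` would hold the 20 amino acids; in an ALDE round, `batch_size` would be 96 and the means and standard deviations would come from the trained model.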
