Extended Data Fig. 7: Extrapolation performance from single and double mutants to higher-order mutants.
From: Learning protein fitness models from evolutionary and assay-labeled data

Analogous to Fig. 5, but training on a random sample drawn from both single and double mutants. Each column shows the performance when training on randomly sampled single and double mutants and then separately testing on single, double, or triple mutants, none of which were in the training data. The total size (TS) indicates the total number of mutants of a particular order in all of the data. For example, ‘TS=613’ for single mutants means that the data set we sampled from contained 613 single mutants in total. Error bars are centered at the mean and indicate bootstrapped 95% confidence intervals from 20 random data splits. See Supplementary Fig. 6 for an analogous plot using NDCG.
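
As an illustration of how error bars of this kind can be computed, the following is a minimal sketch that takes one score per random data split and returns the mean with a bootstrapped 95% confidence interval. It assumes a percentile bootstrap over the per-split scores; the caption does not state which bootstrap variant was used. The function name `bootstrap_mean_ci`, its parameters, and the synthetic scores are hypothetical and are not taken from the paper.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=2000, level=0.95, seed=0):
    """Return (mean, lower, upper) from a percentile bootstrap of the mean.

    Assumption: percentile bootstrap; the caption does not specify the variant.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample the per-split scores with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    # Read the interval bounds off the sorted bootstrap means.
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[int((1.0 - alpha) * n_boot) - 1]
    return statistics.fmean(scores), lower, upper


# Synthetic stand-in for 20 per-split scores (not data from the paper).
score_rng = random.Random(1)
split_scores = [score_rng.gauss(0.5, 0.05) for _ in range(20)]

mean, lower, upper = bootstrap_mean_ci(split_scores)
print(f"mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With only 20 splits the interval is itself a rough estimate, so a larger number of bootstrap replicates mainly smooths the bounds and does not add information.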