Fig. 5: Detailed analysis of SPIRED-Fitness.
From: An end-to-end framework for the prediction of protein structure and fitness from single sequence

a Comparison of ECNet, GeoFitness v2 and SPIRED-Fitness when trained with various proportions of the single-mutation data for 485 proteins. The full training set contains 70% of the single-mutation data in the whole dataset, which corresponds to the maximum value on the horizontal axis of the bar chart. Each bar represents the Spearman correlation coefficient averaged over the 485 proteins. b 10-fold cross validation of GeoFitness v2 and SPIRED-Fitness with protein-specific data splitting. The boxplot is constructed from the results of 10 independent experiments, in each of which 80% of the proteins are chosen for training/validation and the remaining 20%, unseen during training, are used for testing. Each red dot represents the result of one individual experiment. The center line of each boxplot shows the median of the validation results, with the value marked beside it. The box limits correspond to the upper and lower quartiles, whereas the whiskers extend to 1.5 times the interquartile range. Source data are provided as a Source Data file.
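
The two evaluation ideas in this caption, a Spearman coefficient averaged over proteins (panel a) and a split made at the protein level so that test proteins are never seen in training (panel b), can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the dictionary layout (protein id mapped to a list of scores), the random seed, and the use of independent random shuffles for the 10 experiments are all assumptions made for illustration.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient, computed as the Pearson correlation of ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mean_per_protein_spearman(predicted, measured):
    """Average the per-protein Spearman coefficients (hypothetical layout:
    both arguments map a protein id to a list of scores in the same order)."""
    return mean(spearman(predicted[p], measured[p]) for p in measured)


def protein_level_splits(protein_ids, n_repeats=10, test_fraction=0.2, seed=0):
    """Yield (train, test) lists of protein ids for repeated 80/20 splits.

    Assumption: each experiment is an independent random shuffle; the
    caption does not say how the 10 partitions were drawn.
    """
    rng = random.Random(seed)
    n_test = round(len(protein_ids) * test_fraction)
    for _ in range(n_repeats):
        shuffled = list(protein_ids)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]
```

Because the split is made over protein ids rather than over individual mutations, every mutation of a test protein is held out together, which is what makes the test proteins unseen.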