Fig. 4 | Scientific Reports

Fig. 4

From: Medium-sized protein language models perform well at transfer learning on realistic datasets

Effect of sample size on transfer learning via feature extraction. Results of LassoCV regression on three deep mutational scanning (DMS) datasets, using downsampled subsets ranging from 100 samples to the full size of each dataset. The y-axis shows the mean 5-fold cross-validation \(R^2\) score, and error bars represent the standard deviation. The x-axis shows the sample sizes tested. The colored lines represent different models and model sizes: ESM-2 8M, 35M, 150M, 650M, 3B, and 15B (blues); ESM C 300M and 600M (greens); and AMPLIFY 120M and 350M (oranges).
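The evaluation described in the caption can be sketched roughly as follows, using scikit-learn. This is a minimal illustration rather than the authors' pipeline: the helper name `r2_by_sample_size`, the synthetic feature matrix standing in for protein language model embeddings, the inner 3-fold search for the Lasso penalty, the random seed, and the choice to compute the standard deviation across the five folds are all assumptions not stated in the caption.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score


def r2_by_sample_size(X, y, sample_sizes, seed=0):
    """Return {n: (mean R^2, std R^2)} from 5-fold CV on a random subset of size n."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sample_sizes:
        n = min(n, len(y))  # cap at the full dataset size
        idx = rng.choice(len(y), size=n, replace=False)  # downsample without replacement
        outer = KFold(n_splits=5, shuffle=True, random_state=seed)
        # Assumption: an inner 3-fold CV selects the L1 penalty within each training fold.
        model = LassoCV(cv=3, random_state=seed)
        scores = cross_val_score(model, X[idx], y[idx], cv=outer, scoring="r2")
        results[n] = (scores.mean(), scores.std())
    return results


# Synthetic stand-in for embeddings and DMS scores (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
w = np.zeros(20)
w[:3] = [2.0, -1.0, 0.5]  # only three features carry signal
y = X @ w + 0.1 * rng.normal(size=400)

results = r2_by_sample_size(X, y, [100, 200, 400])
for n, (mean_r2, std_r2) in results.items():
    print(f"n={n}: R^2 = {mean_r2:.3f} +/- {std_r2:.3f}")
```

In an actual run, `X` would hold one fixed-length embedding per protein variant from a given model, and the loop would be repeated per model and dataset to produce one line per model in the plot.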
