Supplementary Figure 6: Evaluation of model overfitting on internal and external datasets.
From: Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning

(a) Comparison of precursor charge distributions for peptides from the Bekker-Jensen tryptic dataset. Peptides that were also part of the ProteomeTools Holdout dataset exhibit a different precursor charge distribution than those that were not. (b) Spectral angle distributions by precursor charge for peptides in the Bekker-Jensen tryptic dataset, split by whether they were also part of the ProteomeTools Holdout dataset. (c) Benchmark of Prosit’s (green) and MS2PIP’s (orange) fragment ion intensity prediction on tryptic peptides from the Bekker-Jensen dataset. The top histogram shows spectral angles for peptides that were also synthesized in the ProteomeTools project but were not used for training Prosit. The bottom histogram shows spectral angles for peptides that are not part of ProteomeTools.
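
The caption does not define the spectral angle used in panels (b) and (c). As a minimal sketch, assuming the commonly used normalized spectral angle, SA = 1 − 2·arccos(cos θ)/π, where cos θ is the cosine similarity between the predicted and observed fragment intensity vectors, it could be computed as follows. The function name `spectral_angle` and the handling of all-zero vectors are illustrative assumptions, not taken from the Prosit code.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral angle between two fragment intensity vectors.

    Assumes both vectors list intensities for the same fragment ions in the
    same order. For non-negative intensities the result lies in [0, 1],
    where 1 means the intensity patterns are identical up to scaling.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")

    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))

    # Assumption: a spectrum without any signal is scored as 0.
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0

    # Clamp to guard against floating-point drift outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

Under this definition, distributions such as those in panels (b) and (c) would be obtained by computing one value per peptide spectrum and grouping the values by precursor charge or by ProteomeTools Holdout membership.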