Fig. 2: A detailed comparison between PrimeNovo and previous deep learning-based approaches on the nine-species benchmark dataset.

a Recall-coverage curves for PrimeNovo and other de novo algorithms on the nine-species benchmark dataset. The curves show the relationship between recall (the averaged peptide recall) and coverage (the proportion of predicted spectra among all annotated spectra, ranked by the model’s confidence) across all confidence levels for each test species. PrimeNovo CV denotes our model trained on the nine-species benchmark dataset using a cross-validation strategy; PrimeNovo denotes our model trained on the MassIVE-KB dataset. b The average prediction performance on each individual species for PrimeNovo and the comparison models. PrimeNovo w/o PMC denotes results obtained using CTC beam search decoding without PMC. c Comparison of amino acid-level prediction recall across the nine species between Casanovo V2 and PrimeNovo. d Inference speed comparison: inference speeds of PrimeNovo and Casanovo V2, measured in the number of spectra decoded per second. The speed tests were conducted on the same computational hardware (a single NVIDIA A100 GPU) and averaged over data from all test species. e, f Influence of missing peaks and peptide length: these plots show how the number of missing peaks (at most 8) and the length of the true label (ranging from 7 to 27) affect the predictions of PrimeNovo and Casanovo V2. The central curve connects the mean values of the data points (n = 9714), and the light background represents the s.d. (scale factor = 0.2). g Performance on amino acids with similar masses: a comparison of Casanovo V2 and PrimeNovo in predicting amino acids with very similar molecular masses, such as K (128.094963) versus Q (128.058578) and F (147.068414) versus oxidized M (147.035400). h Ablation study: the impact of adding each module of our approach on overall performance on the nine-species benchmark dataset (n = 9; data are presented as mean values ± s.d.). Source data are provided as a Source Data file.
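
As a rough illustration of how a recall-coverage curve like the one in panel a could be traced, the sketch below ranks predictions by confidence and, at each cutoff, reports the fraction of spectra retained (coverage) and the fraction of those retained predictions that match the annotated peptide. The caption does not spell out the exact recall computation, so this definition is an assumption, and the function name `recall_coverage_curve` and its inputs are hypothetical rather than part of the PrimeNovo code.

```python
from typing import List, Sequence, Tuple


def recall_coverage_curve(
    confidences: Sequence[float],
    correct: Sequence[bool],
) -> List[Tuple[float, float]]:
    """Return (coverage, recall) points, one per confidence cutoff.

    Assumption (not taken from the paper): recall at a cutoff is the share
    of correct peptide predictions among the top-ranked spectra retained.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # Rank spectra from most to least confident prediction.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    total = len(order)

    points: List[Tuple[float, float]] = []
    hits = 0
    for rank, idx in enumerate(order, start=1):
        hits += int(correct[idx])
        coverage = rank / total   # fraction of annotated spectra retained
        recall = hits / rank      # fraction of retained predictions correct
        points.append((coverage, recall))
    return points


if __name__ == "__main__":
    # Toy data: four spectra with model confidences and correctness flags.
    curve = recall_coverage_curve([0.9, 0.2, 0.7, 0.5],
                                  [True, False, False, True])
    for coverage, recall in curve:
        print(f"coverage={coverage:.2f} recall={recall:.2f}")
```

Plotting recall against coverage for each species would then give one curve per test species; a model whose confidence is well calibrated keeps recall high at low coverage.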