Fig. 3: PrimeNovo’s exceptional performance extends to unseen spectra from various biological sample sources.

a Average peptide recall of PrimeNovo compared with baseline models across four distinct large-scale MS/MS datasets. b Enzyme-specific performance: breakdown across six different proteolytic enzymes in the IgG1-Human-HC dataset. c Amino acid-level precision of PrimeNovo and Casanovo V2 on the IgG1-Human-HC (9719 tested spectrum samples) and HCC (56,000 tested spectrum samples) datasets. The x-axis shows the coverage rate of predicted peptides, ranked by each model’s confidence score. For instance, 20%-40% indicates the 20%-40% least confident predictions. AA precision is then calculated within each coverage range. Data are presented as the median value at each confidence level with the interquartile range (the middle 50% of values). d Venn diagram of the number of overlapping peptides among three de novo sequencing models and a traditional database search algorithm (MaxQuant). Each count represents identical peptides identified by both MaxQuant and the respective model for the same spectrum. e Model fine-tuning results: performance on the HCC test dataset as more HCC training data are added during fine-tuning. The left panel shows fine-tuning with only the HCC dataset, which leads to catastrophic forgetting of the original data distribution (nine-species benchmark dataset). The right panel shows fine-tuning with a mix of HCC and MassIVE-KB training data. The data points in the right panel show the performance at three different data ratios during the fine-tuning stage. The central curve connects the mean values of the data points, and the light background represents the s.d. f Comparison of performance between PrimeNovo and five other de novo models on a three-species test dataset.
g The model’s generalization capability when trained exclusively on each training dataset. The left-hand side lists the four training datasets on which PrimeNovo was trained. The thickness of each line indicates the performance on each of the four testing sets on the right-hand side, with a thicker line indicating better performance. The numbers on the stem indicate the peptide recall averaged over all four testing sets, highlighting the distributional transferability of each training dataset. The model trained on MassIVE-KB exhibited the highest average peptide recall, 65% (bolded). Source data are provided as a Source Data file.
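
The coverage-binned precision described for panel c can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `aa_precision` and `precision_by_coverage`, the record layout, and the five equal-width bins are assumptions made for illustration. Residues are compared position by position here, which is a simplification; evaluation scripts in this field typically match residues by mass within a tolerance.

```python
from statistics import median, quantiles


def aa_precision(predicted, reference):
    """Fraction of predicted residues that equal the reference residue
    at the same position (simplified, hypothetical matching rule)."""
    if not predicted:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def precision_by_coverage(records, n_bins=5):
    """Summarise AA precision per coverage range.

    records: list of (confidence, predicted_peptide, reference_peptide).
    Predictions are ranked from least to most confident, so the first
    bin (0%-20% for n_bins=5) holds the least confident predictions.
    Returns a list of (low_pct, high_pct, median, q1, q3) tuples.
    """
    ranked = sorted(records, key=lambda rec: rec[0])
    n = len(ranked)
    summary = []
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        scores = [aa_precision(pred, ref) for _, pred, ref in ranked[start:stop]]
        if not scores:
            continue  # skip empty bins when there are fewer records than bins
        if len(scores) >= 2:
            q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        else:
            q1 = q3 = scores[0]  # a single value has no spread
        low_pct, high_pct = 100 * b // n_bins, 100 * (b + 1) // n_bins
        summary.append((low_pct, high_pct, median(scores), q1, q3))
    return summary


if __name__ == "__main__":
    # Toy records, invented purely to exercise the functions.
    toy = [
        (0.1, "AAAA", "AAGG"),
        (0.2, "AAAA", "AAAG"),
        (0.8, "AAAA", "AAAA"),
        (0.9, "AAAA", "AAAA"),
    ]
    for row in precision_by_coverage(toy, n_bins=2):
        print(row)
```

Each returned tuple corresponds to one x-axis range in a plot of this kind: the median gives the central value and `q1` and `q3` bound the interquartile range.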