Fig. 4: Error analysis and model explainability offer valuable insights into the performance of PrimeNovo. | Nature Communications


From: π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing


a Attention map and feature vector similarity: Visualization of the attention maps from the Transformer encoders of Casanovo V2 and PrimeNovo, together with a similarity analysis of each column in the feature vector from the value matrix projection. The boxplot displays the minimum, maximum, median, and quartiles of the similarity scores (n = 421,232, outliers omitted). b Layerwise prediction refinement: A case study demonstrates how PrimeNovo’s non-autoregressive model progressively refines its predictions layer by layer, highlighting the model’s capacity to self-correct its predictions as a whole. Note that * represents the glutamine deamidation modification on amino acid Q. c The points display the average amino-acid-level prediction accuracy at each layer of PrimeNovo, with the boxplot showing the minimum, maximum, median, and quartiles of the prediction accuracy (n = 88,236). d Proportion of peaks corresponding to b-y ions, as determined from predictions, among all peaks in the PT test set ranked within the top 10 by their contribution scores. e Alignment between the model’s contribution scores and the theoretical b-y ion peaks derived from predictions. The diagram’s lower half shows the magnitude of all contribution scores, emphasizing those matching the b-y ions. The upper half provides a comparison with the original spectrum. f A case study on how the theoretical ions, calculated from the predicted peptide, align with the input spectrum. The matched theoretical b-y ions are marked in red for predictions made by PrimeNovo and in blue for those made by Casanovo. This comparison seeks to identify potential sources of error in incorrect predictions. The diagram’s bottom left section highlights a high contribution score assigned to an incorrect peak, corresponding to a b-ion peak linked to an erroneous amino acid prediction in PrimeNovo’s final layer.
Source data are provided as a Source Data file.
