Abstract
Peptide-spectrum match (PSM) rescoring is critical for accurate peptide identification in data-dependent acquisition (DDA)-based proteomics. Existing rescoring frameworks typically combine search-engine scores with heuristic or learned auxiliary features to refine PSM ranking and confidence estimation. Although recent approaches incorporate deep learning-derived representations of spectra, retention time, or ion mobility, the final decision stage still commonly relies on separately trained shallow classifiers, constraining the expressive capacity of the overall scoring framework. Here, we introduce DDA-BERT, a transformer-based end-to-end deep learning model trained with ~271 million PSMs from 11 species. DDA-BERT consistently outperforms existing tools across species-specific benchmarks, achieving 2.24%–269.35%, 3.73%–141.46%, 5.53%–45.64%, and 3.68%–62.77% increases in peptide identifications on human, yeast, Drosophila, and Arabidopsis datasets, respectively. The model retains high sensitivity in trace-level proteomics samples. On HLA immunopeptidomics data, DDA-BERT further increases peptide identifications by 4.14%–87.47%. The main limitations of DDA-BERT include the requirement for GPU-based computing and the need for substantial, diverse training datasets to achieve optimal model performance. This study introduces an alternative DDA rescoring approach and establishes a methodological foundation for scalable, AI-driven peptide identification in DDA proteomics.
Similar content being viewed by others
Acknowledgements
This work is supported by grants from National Natural Science Foundation of China (Key Joint Research Program, grant no. U24A20476) (T.G.), Noncommunicable Chronic Diseases-National Science and Technology Major Project (grant no. 2024ZD0533300) (T.G.), “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2024SSYS0035) (T.G.), and State Key Laboratory of Medical Proteomics (SKLP-K202406) (T.G.). We gratefully acknowledge the Westlake University Supercomputer Center for assistance in data analysis and storage. During the preparation of this work, the authors used ChatGPT to improve language and readability. The authors reviewed and edited the output as needed and take full responsibility for the content of the publication.
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Competing interests
T.G. is the shareholder of Westlake Omics Inc. P.L. is the employee of Westlake Omics Inc. The remaining authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
A, J., Liu, P., Sun, Y. et al. DDA-BERT: end-to-end training for data-dependent acquisition mass spectrometry-based proteomics. Nat Commun (2026). https://doi.org/10.1038/s41467-026-72246-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-026-72246-6


