Nature Communications
DDA-BERT: end-to-end training for data-dependent acquisition mass spectrometry-based proteomics
  • Article
  • Open access
  • Published: 27 April 2026

  • Jun A (1,2,3; equal contribution),
  • Pu Liu (4; equal contribution),
  • Yingying Sun (1,2,3; equal contribution),
  • Jiaying Lin (1,2,3),
  • Xiaofan Zhang (1,2,3),
  • Zongxiang Nie (2),
  • Jingnan Liu (1,2,3,5),
  • Zhiguo Yu (1,2,3,6),
  • Yuqi Zhang (1,2,3),
  • Ziyuan Xing (1,2,3),
  • Yi Chen (2) &
  • Tiannan Guo (1,2,3; ORCID: orcid.org/0000-0003-3869-7651)

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Machine learning
  • Proteome informatics
  • Proteomics

Abstract

Peptide-spectrum match (PSM) rescoring is critical for accurate peptide identification in data-dependent acquisition (DDA)-based proteomics. Existing rescoring frameworks typically combine search-engine scores with heuristic or learned auxiliary features to refine PSM ranking and confidence estimation. Although recent approaches incorporate deep learning-derived representations of spectra, retention time, or ion mobility, the final decision stage still commonly relies on separately trained shallow classifiers, constraining the expressive capacity of the overall scoring framework. Here, we introduce DDA-BERT, a transformer-based end-to-end deep learning model trained with ~271 million PSMs from 11 species. DDA-BERT consistently outperforms existing tools across species-specific benchmarks, achieving 2.24%–269.35%, 3.73%–141.46%, 5.53%–45.64%, and 3.68%–62.77% increases in peptide identifications on human, yeast, Drosophila, and Arabidopsis datasets, respectively. The model retains high sensitivity in trace-level proteomics samples. On HLA immunopeptidomics data, DDA-BERT further increases peptide identifications by 4.14%–87.47%. The main limitations of DDA-BERT include the requirement for GPU-based computing and the need for substantial, diverse training datasets to achieve optimal model performance. This study introduces an alternative DDA rescoring approach and establishes a methodological foundation for scalable, AI-driven peptide identification in DDA proteomics.
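
The gains quoted in the abstract read as relative increases in peptide identifications over a baseline tool, so a value above 100% means the count more than doubled. The following is a minimal sketch of that arithmetic, assuming the standard definition of relative increase. The function names `relative_increase_pct` and `fold_change` and the example counts are hypothetical and do not come from the paper.

```python
def relative_increase_pct(baseline: int, improved: int) -> float:
    """Percentage increase of `improved` over `baseline` identifications."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


def fold_change(increase_pct: float) -> float:
    """Convert a percentage increase into a multiplicative fold change."""
    return 1.0 + increase_pct / 100.0


# Hypothetical counts for illustration only; these are not values from the paper.
baseline_ids = 10_000
rescored_ids = 12_500
print(f"{relative_increase_pct(baseline_ids, rescored_ids):.2f}% increase")

# The largest human-dataset gain quoted in the abstract, as a fold change.
print(f"{fold_change(269.35):.4f}x the baseline count")
```

Under this reading, the 269.35% upper bound reported for human datasets corresponds to roughly 3.7 times as many identified peptides as the weakest comparison tool, while the 2.24% lower bound is a marginal gain over the strongest one.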

Acknowledgements

This work is supported by grants from the National Natural Science Foundation of China (Key Joint Research Program, grant no. U24A20476) (T.G.), the Noncommunicable Chronic Diseases-National Science and Technology Major Project (grant no. 2024ZD0533300) (T.G.), the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2024SSYS0035) (T.G.), and the State Key Laboratory of Medical Proteomics (SKLP-K202406) (T.G.). We gratefully acknowledge the Westlake University Supercomputer Center for assistance in data analysis and storage. During the preparation of this work, the authors used ChatGPT to improve language and readability. The authors reviewed and edited the output as needed and take full responsibility for the content of the publication.

Author information

Author notes
  1. These authors contributed equally: Jun A, Pu Liu, Yingying Sun.

Authors and Affiliations

  1. Affiliated Hangzhou First People’s Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, School of Future Biomedicine, Westlake University, Hangzhou, China

    Jun A, Yingying Sun, Jiaying Lin, Xiaofan Zhang, Jingnan Liu, Zhiguo Yu, Yuqi Zhang, Ziyuan Xing & Tiannan Guo

  2. Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China

    Jun A, Yingying Sun, Jiaying Lin, Xiaofan Zhang, Zongxiang Nie, Jingnan Liu, Zhiguo Yu, Yuqi Zhang, Ziyuan Xing, Yi Chen & Tiannan Guo

  3. Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, China

    Jun A, Yingying Sun, Jiaying Lin, Xiaofan Zhang, Jingnan Liu, Zhiguo Yu, Yuqi Zhang, Ziyuan Xing & Tiannan Guo

  4. Westlake Omics (Hangzhou) Biotechnology Co., Ltd., Hangzhou, China

    Pu Liu

  5. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

    Jingnan Liu

  6. School of Informatics, Hunan University of Chinese Medicine, Changsha, China

    Zhiguo Yu

Corresponding authors

Correspondence to Yi Chen or Tiannan Guo.

Ethics declarations

Competing interests

T.G. is a shareholder of Westlake Omics Inc. P.L. is an employee of Westlake Omics Inc. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (download PDF)

Description of Additional Supplementary Files (download PDF)

Supplementary Data 1 (download CSV)

Reporting Summary (download PDF)

Transparent Peer Review File (download PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

A, J., Liu, P., Sun, Y. et al. DDA-BERT: end-to-end training for data-dependent acquisition mass spectrometry-based proteomics. Nat Commun (2026). https://doi.org/10.1038/s41467-026-72246-6

  • Received: 21 November 2024

  • Accepted: 09 April 2026

  • Published: 27 April 2026

  • DOI: https://doi.org/10.1038/s41467-026-72246-6

Nature Communications (Nat Commun)

ISSN 2041-1723 (online)

© 2026 Springer Nature Limited
