Nature Communications
Large-scale data-driven pre-trained DNA models enhance performance across diverse genomics tasks
  • Article
  • Open access
  • Published: 14 May 2026

  • Canzhuang Sun (1, equal contribution)
  • Zhijie He (1, equal contribution)
  • Shifei Zhang (1, equal contribution; ORCID: orcid.org/0009-0004-5475-0875)
  • Kang Xu (2)
  • Yu Sun (2)
  • Yuyang Wang (2)
  • Pengzhen Hu (2)
  • Xiaochen Bo (2; ORCID: orcid.org/0000-0003-1911-7922)
  • Mingzhi Liao (1; ORCID: orcid.org/0000-0002-5216-5742)
  • Hao Li (2; ORCID: orcid.org/0000-0002-9464-1372)
  • Hebing Chen (2; ORCID: orcid.org/0000-0003-4102-356X)

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational models
  • Gene regulation
  • Genome informatics

Abstract

Sequence-based deep learning has advanced genome interpretation, yet most models remain task-specific and rely on retraining, limiting scalability across biological contexts. Here we present SUCCEED, a supervised multi-task DNA foundation model pretrained on 6,389 ENCODE functional genomics tracks to learn transferable regulatory representations. By integrating convolutional layers with a Transformer architecture, SUCCEED captures both local sequence motifs and long-range regulatory dependencies, achieving performance comparable to or exceeding Enformer across benchmark tasks. Through transfer learning, it predicts cell-type-specific epigenomic profiles, denoises sparse chromatin accessibility signals, and predicts three-dimensional chromatin contacts without CTCF input across data scales and cell types. Across diverse genomics tasks, SUCCEED performs comparably to supervised foundation models such as Sei and outperforms self-supervised models trained solely on DNA sequence. Overall, SUCCEED is a transferable and scalable foundation model that provides a unified framework for genome-scale regulatory modeling in complex biological contexts.
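The abstract describes a convolutional front end for local sequence motifs combined with a Transformer for long-range dependencies, trained jointly on thousands of functional genomics tracks. The following is a minimal NumPy sketch of that general pattern only; it is not SUCCEED's actual architecture. The layer sizes, the single attention head, the pooling width, the random weights and the names `forward` and `init_params` are all hypothetical, and the sketch omits positional encodings, normalization and training. It shows how one shared trunk can emit a prediction for every (sequence bin, track) pair.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (length, 4) one-hot matrix; N or other symbols become all-zero rows."""
    idx = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in idx:
            x[pos, idx[base]] = 1.0
    return x


def conv1d(x, kernels, bias):
    """Valid 1D convolution with ReLU: x (L, C_in), kernels (K, C_in, C_out) -> (L - K + 1, C_out)."""
    k = kernels.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (L-K+1, K, C_in)
    out = np.einsum("lkc,kco->lo", windows, kernels) + bias
    return np.maximum(out, 0.0)


def max_pool(x, width):
    """Non-overlapping max pooling along the sequence axis; a ragged tail is dropped."""
    n = x.shape[0] // width
    return x[: n * width].reshape(n, width, x.shape[1]).max(axis=1)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over bins, plus a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row is a distribution over bins
    return x + weights @ v


def forward(seq, params):
    """Hypothetical conv + attention trunk feeding a shared multi-task linear head."""
    h = conv1d(one_hot(seq), params["conv_w"], params["conv_b"])  # local motif detectors
    h = max_pool(h, params["pool"])  # downsample positions into bins
    h = self_attention(h, params["wq"], params["wk"], params["wv"])  # mixing between distant bins
    return h @ params["head_w"] + params["head_b"]  # shape (n_bins, n_tracks)


def init_params(n_tracks, channels=16, kernel=8, pool=4, seed=0):
    """Random, untrained weights; every size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "conv_w": rng.normal(0, s, (kernel, 4, channels)),
        "conv_b": np.zeros(channels),
        "pool": pool,
        "wq": rng.normal(0, s, (channels, channels)),
        "wk": rng.normal(0, s, (channels, channels)),
        "wv": rng.normal(0, s, (channels, channels)),
        "head_w": rng.normal(0, s, (channels, n_tracks)),
        "head_b": np.zeros(n_tracks),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    seq = "".join(rng.choice(list(BASES), size=200))
    preds = forward(seq, init_params(n_tracks=6389))  # one column per track
    print(preds.shape)  # (48, 6389): 200 bp -> 193 conv positions -> 48 bins of width 4
```

Sharing one trunk across all tracks is what makes the multi-task setup cheap to extend: in a transfer-learning setting of the kind the abstract mentions, one would typically keep the trunk and fit a new `head_w` for the new cell type or assay.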

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 62422318 to H.C. and 62472360 to M.L.), the Beijing Nova Program of Science and Technology (20250484974 to H.C.), the State Key Laboratory of Medical Proteomics (SKLP-K202407 to H.C.), and the National Key Research and Development Program of China (2023YFF0725500 to H.C. and 2024YFA1307700 to X.B.). Additional support was provided by the Science Fund for Distinguished Young Scholars of Shaanxi Province (grant no. 2024JC-JCQN-29 to M.L.).

Author information

Author notes
  1. These authors contributed equally: Canzhuang Sun, Zhijie He, Shifei Zhang.

Authors and Affiliations

  1. College of Life Sciences, Center of Bioinformatics, Northwest A&F University, Yangling, China

    Canzhuang Sun, Zhijie He, Shifei Zhang & Mingzhi Liao

  2. Academy of Military Medical Sciences, Beijing, China

    Kang Xu, Yu Sun, Yuyang Wang, Pengzhen Hu, Xiaochen Bo, Hao Li & Hebing Chen

Corresponding authors

Correspondence to Xiaochen Bo, Mingzhi Liao, Hao Li or Hebing Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Transparent Peer Review file (PDF)

Reporting Summary (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1-5 (XLSX)

Source data

Source Data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sun, C., He, Z., Zhang, S. et al. Large-scale data-driven pre-trained DNA models enhance performance across diverse genomics tasks. Nat Commun (2026). https://doi.org/10.1038/s41467-026-73129-6

  • Received: 28 September 2025

  • Accepted: 01 May 2026

  • Published: 14 May 2026

  • DOI: https://doi.org/10.1038/s41467-026-73129-6

Nature Communications (Nat Commun)

ISSN 2041-1723 (online)
