Abstract
Sequence-based deep learning has advanced genome interpretation, yet most models remain task-specific and rely on retraining, limiting scalability across biological contexts. Here we present SUCCEED, a supervised multi-task DNA foundation model pretrained on 6,389 ENCODE functional genomics tracks to learn transferable regulatory representations. By integrating convolutional layers with a Transformer architecture, SUCCEED captures both local sequence motifs and long-range regulatory dependencies, achieving performance comparable to or exceeding Enformer across benchmark tasks. Through transfer learning, it predicts cell-type-specific epigenomic profiles, denoises sparse chromatin accessibility signals, and predicts three-dimensional chromatin contacts without CTCF input across data scales and cell types. Across diverse genomics tasks, SUCCEED performs comparably to supervised foundation models such as Sei and outperforms self-supervised models trained solely on DNA sequence. Overall, SUCCEED is a transferable and scalable foundation model that provides a unified framework for genome-scale regulatory modeling in complex biological contexts.
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 62422318 to H.C. and 62472360 to M.L.), the Beijing Nova Program of Science and Technology (20250484974 to H.C.), the State Key Laboratory of Medical Proteomics (SKLP-K202407 to H.C.), and the National Key Research and Development Program of China (2023YFF0725500 to H.C. and 2024YFA1307700 to X.B.). Additional support was provided by the Science Fund for Distinguished Young Scholars of Shaanxi Province (grant no. 2024JC-JCQN-29 to M.L.).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sun, C., He, Z., Zhang, S. et al. Large-scale data-driven pre-trained DNA models enhance performance across diverse genomics tasks. Nat Commun (2026). https://doi.org/10.1038/s41467-026-73129-6
DOI: https://doi.org/10.1038/s41467-026-73129-6


