Scientific Data
Telomere-to-telomere gapless genome assembly of Siniperca scherzeri
  • Data Descriptor
  • Open access
  • Published: 02 April 2026

  • Yannian Wu 1,
  • Zhiqiang Cheng 1,
  • Yang Li 1,
  • Hao Xu 1,
  • Maoyuan Wang 2,
  • Xiaojun Ye 2,
  • Mingyong Lai 2,
  • Zhiyong Wang 1 &
  • Dongling Zhang 1

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Genome
  • Genome assembly algorithms

Abstract

The mandarin fish Siniperca scherzeri, known as the “freshwater grouper”, is a commercially important aquaculture species in China, valued for its superior flesh quality, disease resistance, and adaptability to domestication. As sequencing and bioinformatic methods have advanced, reference genomes are expected to meet higher standards than earlier assemblies. In this study, we integrated PacBio HiFi long-read sequencing, Oxford Nanopore Technologies (ONT) ultralong-read sequencing, and Hi-C chromatin conformation capture to assemble a near-complete telomere-to-telomere genome. The gapless assembly spans 24 chromosomes, with telomeric repeats detected at both ends of 20 chromosomes and at only one end of the remaining four. BUSCO evaluation against the Actinopterygii database (actinopterygii_odb10) indicated 98.7% genome completeness. Alignment with minimap2 gave mapping rates above 97% for the ONT ultralong reads, PacBio HiFi reads, and Hi-C data against the assembled genome. We annotated 23,296 protein-coding genes, providing a genomic resource for studying the species’ evolutionary biology and advancing molecular breeding.
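
To make the telomere tally in the abstract concrete, the following is a minimal sketch of how telomeric repeats at chromosome ends could be counted. It is not the authors' pipeline: the motif (the canonical vertebrate repeat TTAGGG and its reverse complement), the window size, the repeat threshold, and the function names `telomere_ends` and `summarize` are all assumptions made for illustration.

```python
# Illustrative sketch only: the motif, window size, and threshold below are
# assumptions, not parameters reported by the authors.
MOTIF = "TTAGGG"      # canonical vertebrate telomeric repeat (assumed here)
MOTIF_RC = "CCCTAA"   # its reverse complement


def count_motif(seq: str, motif: str) -> int:
    """Count non-overlapping occurrences of motif in seq (case-insensitive)."""
    return seq.upper().count(motif)


def telomere_ends(seq: str, window: int = 10_000, min_repeats: int = 50) -> tuple[bool, bool]:
    """Return (start_has_telomere, end_has_telomere) for one chromosome."""
    head, tail = seq[:window], seq[-window:]
    start_hit = max(count_motif(head, MOTIF), count_motif(head, MOTIF_RC)) >= min_repeats
    end_hit = max(count_motif(tail, MOTIF), count_motif(tail, MOTIF_RC)) >= min_repeats
    return start_hit, end_hit


def summarize(chromosomes: dict[str, str], **kwargs) -> dict[str, int]:
    """Tally chromosomes with telomeres at both ends, one end, or neither."""
    tally = {"both": 0, "one": 0, "none": 0}
    for seq in chromosomes.values():
        hits = sum(telomere_ends(seq, **kwargs))  # True counts as 1
        tally["both" if hits == 2 else "one" if hits == 1 else "none"] += 1
    return tally


# Toy sequences (hypothetical, not real data): chrA has repeats at both ends,
# chrB only at its start.
toy = {
    "chrA": "CCCTAA" * 60 + "ACGT" * 500 + "TTAGGG" * 60,
    "chrB": "CCCTAA" * 60 + "ACGT" * 500,
}
print(summarize(toy, window=400, min_repeats=50))
# → {'both': 1, 'one': 1, 'none': 0}
```

For the 24 chromosomes reported here, a tally consistent with the abstract would be 20 "both", 4 "one", and 0 "none", that is, telomeric repeats at 44 of the 48 chromosome ends.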

Data availability

The genome sequencing data generated in this study, together with the genome assembly and annotation, have been deposited in the China National GeneBank Sequence Archive (CNSA) under project accession CNP0007951 (ref. 42). The sequencing data have also been archived in the NCBI SRA under accession SRP652857 (ref. 43), and the assembled genome and annotation in NCBI GenBank under accession JBVQOQ000000000 (ref. 44). The genome assembly and annotations have also been deposited at Figshare (ref. 45).

Code availability

No specific code was used in this study. The data analysis used standard bioinformatic tools specified in the methods.

References

  1. Zhou, C., Yang, Q. & Cai, D. On the Classification and Distribution of the Sinipercinae Fishes (Family Serranidae). Zoological Research 9, 113–125 (1988).

  2. Li, Y. et al. Identification of the Sex-Linked Region of Siniperca scherzeri and Development of Sex-Specific Markers. Aquaculture 600, 742231 (2025).

  3. Sun, C. et al. Construction of a High-Density Linkage Map and Mapping of Sex Determination and Growth-Related Loci in the Mandarin Fish (Siniperca chuatsi). BMC Genomics 18, 446 (2017).

  4. Wang, M. et al. Comparison of Growth Performance and Muscle Nutrition Levels of Juvenile Siniperca scherzeri Fed on an Iced Trash Fish Diet and a Formulated Diet. Fishes 8, 393 (2023).

  5. He, S. et al. Mandarin Fish (Sinipercidae) Genomes Provide Insights Into Innate Predatory Feeding. Commun. Biol. 3, 361 (2020).

  6. Tu, G. et al. Long-Read Genome Assemblies Reveal a Cis-Regulatory Landscape Associated with Phenotypic Divergence in Two Sister Siniperca Fish Species. Zoological Research 44, 287–302 (2023).

  7. Xue, L. et al. Telomere-to-Telomere Assembly of a Fish Y Chromosome Reveals the Origin of a Young Sex Chromosome Pair. Genome Biol. 22, 203 (2021).

  8. Zhou, Q. et al. Telomere-to-Telomere Gapless Genome Assembly of the Giant Grouper (Epinephelus lanceolatus). Sci. Data 11, 1342 (2024).

  9. Sherathiya, V. N., Schaid, M. D., Seiler, J. L., Lopez, G. C. & Lerner, T. N. GuPPy, a Python Toolbox for the Analysis of Fiber Photometry Data. Sci. Rep. 11, 24212 (2021).

  10. Liu, Y., Schroder, J. & Schmidt, B. Musket: A Multistage K-Mer Spectrum-Based Error Corrector for Illumina Sequence Data. Bioinformatics 29, 308–315 (2013).

  11. Marcais, G. & Kingsford, C. A Fast, Lock-Free Approach for Efficient Parallel Counting of Occurrences of K-Mers. Bioinformatics 27, 764–770 (2011).

  12. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for Reference-Free Profiling of Polyploid Genomes. Nat. Commun. 11, 1432 (2020).

  13. Rautiainen, M. et al. Telomere-to-Telomere Assembly of Diploid Chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).

  14. Wingett, S. et al. HiCUP: Pipeline for Mapping and Processing Hi-C Data. F1000Research 4, 1310 (2015).

  15. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98 (2016).

  16. Dudchenko, O. et al. De Novo Assembly of the Aedes aegypti Genome Using Hi-C Yields Chromosome-Length Scaffolds. Science 356, 92–95 (2017).

  17. Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Systems 3, 99–101 (2016).

  18. Xu, M. et al. TGS-GapCloser: A Fast and Accurate Gap Closer for Large Genomes with Low Coverage of Error-Prone Long Reads. GigaScience 9, giaa094 (2020).

  19. Lin, Y. et al. quarTeT: A Telomere-to-Telomere Toolkit for Gap-Free Genome Assembly and Centromeric Repeat Identification. Hortic. Res. 10, uhad127 (2023).

  20. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10 (2009).

  21. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a Database of Repetitive Elements in Eukaryotic Genomes. Mob. DNA 6, 11 (2015).

  22. Flynn, J. M. et al. RepeatModeler2 for Automated Genomic Discovery of Transposable Element Families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).

  23. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using Native and Syntenically Mapped cDNA Alignments to Improve De Novo Gene Finding. Bioinformatics 24, 637–644 (2008).

  24. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: Discovering Splice Junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  25. Gremme, G., Brendel, V., Sparks, M. E. & Kurtz, S. Engineering a Software Tool for Gene Structure Prediction in Higher Organisms. Inf. Softw. Technol. 47, 965–978 (2005).

  26. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-Based Genome Alignment and Genotyping with HISAT2 and HISAT-Genotype. Nat. Biotechnol. 37, 907–915 (2019).

  27. Pertea, M. et al. StringTie Enables Improved Reconstruction of a Transcriptome From RNA-Seq Reads. Nat. Biotechnol. 33, 290–295 (2015).

  28. Grabherr, M. G. et al. Full-Length Transcriptome Assembly From RNA-Seq Data without a Reference Genome. Nat. Biotechnol. 29, 644–652 (2011).

  29. Haas, B. J. et al. Improving the Arabidopsis Genome Annotation Using Maximal Transcript Alignment Assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).

  30. Haas, B. J. et al. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).

  31. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410 (1990).

  32. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequences (RefSeq): A Curated Non-Redundant Sequence Database of Genomes, Transcripts and Proteins. Nucleic Acids Res. 35, D61–D65 (2007).

  33. Bairoch, A. & Apweiler, R. The Swiss-Prot Protein Sequence Database and its Supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).

  34. Ashburner, M. et al. Gene Ontology: Tool for the Unification of Biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  35. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

  36. Bairoch, A. & Apweiler, R. The Swiss-Prot Protein Sequence Data Bank and its Supplement TrEMBL in 1999. Nucleic Acids Res. 27, 49–54 (1999).

  37. Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: Improved Detection and Functional Classification of Transfer RNA Genes. Nucleic Acids Res. 49, 9077–9096 (2021).

  38. Di Tommaso, P. et al. Nextflow Enables Reproducible Computational Workflows. Nat. Biotechnol. 35, 316–319 (2017).

  39. Li, H. & Durbin, R. Fast and Accurate Long-Read Alignment with Burrows-Wheeler Transform. Bioinformatics 26, 589–595 (2010).

  40. Li, H. Minimap2: Pairwise Alignment for Nucleotide Sequences. Bioinformatics 34, 3094–3100 (2018).

  41. He, W. et al. NGenomeSyn: An Easy-to-Use and Flexible Tool for Publication-Ready Visualization of Syntenic Relationships Across Multiple Genomes. Bioinformatics 39, btad121 (2023).

  42. China National GeneBank Sequence Archive (CNSA). https://db.cngb.org/data_resources/project/CNP0007951 (2025).

  43. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP652857 (2025).

  44. NCBI GenBank https://www.ncbi.nlm.nih.gov/nuccore/JBVQOQ000000000 (2026).

  45. Li, Y. Siniperca scherzeri genome. figshare https://doi.org/10.6084/m9.figshare.30084370.v1 (2025).

Acknowledgements

This work was financially supported by the Fujian Province Seed Industry Innovation and Industrialization Engineering Fishery Project (2021MNZ05) and the Agriculture Research System of China (CARS-46).

Author information

Author notes
  1. These authors contributed equally: Yannian Wu, Zhiqiang Cheng, Yang Li.

Authors and Affiliations

  1. Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen, China

    Yannian Wu, Zhiqiang Cheng, Yang Li, Hao Xu, Zhiyong Wang & Dongling Zhang

  2. Freshwater Fisheries Research Institute of Fujian Province, Fuzhou, China

    Maoyuan Wang, Xiaojun Ye & Mingyong Lai

Contributions

D.Z. and Z.W. designed the study. Y.L., H.X. and Z.C. collected the samples. Y.L., Z.C. and Y.W. performed the experiments and analyzed the data. Y.L. and Y.W. wrote the paper. M.W., X.Y. and M.L. revised the paper.

Corresponding author

Correspondence to Dongling Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wu, Y., Cheng, Z., Li, Y. et al. Telomere-to-telomere gapless genome assembly of Siniperca scherzeri. Sci Data (2026). https://doi.org/10.1038/s41597-026-07113-6

  • Received: 15 September 2025

  • Accepted: 19 March 2026

  • Published: 02 April 2026

  • DOI: https://doi.org/10.1038/s41597-026-07113-6

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
