Abstract
The mandarin fish (Siniperca scherzeri), known as the “freshwater grouper”, has become a commercially important aquaculture species in China owing to its superior flesh quality, disease resistance and adaptability to domestication. With the rapid development of bioinformatics, genome analyses now demand assemblies of higher quality than the previously available reference genomes. In this study, we integrated PacBio HiFi long-read sequencing, Oxford Nanopore Technologies (ONT) ultralong-read sequencing and Hi-C chromatin conformation capture to assemble a near-complete telomere-to-telomere genome. The gapless assembly comprises 24 chromosomes; telomeric repeats were detected at both ends of 20 chromosomes and at one end of each of the remaining four. BUSCO evaluation against the Actinopterygii database (actinopterygii_odb10) indicated 98.7% genome completeness. Alignment with minimap2 yielded mapping rates above 97% for ONT ultralong reads, PacBio HiFi reads and Hi-C data. We annotated 23,296 protein-coding genes. This assembly provides a valuable genomic resource for elucidating the species’ evolutionary biology and for advancing molecular breeding strategies.
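The abstract reports telomeric repeats at both ends of 20 chromosomes and at one end of the other four. The study's exact procedure is described in its Methods; the Python sketch below only illustrates the general idea under stated assumptions. It scans a fixed-size window at each end of a chromosome for the canonical vertebrate telomeric motif TTAGGG (or its reverse complement CCCTAA) and calls an end telomere-capped when the copy count reaches a threshold. The function names, the 10 kb window and the 50-copy threshold are hypothetical choices for illustration, not the authors' parameters.

```python
from typing import Iterator, List, Optional, Tuple

# Canonical vertebrate telomeric repeat. ASSUMPTION: the study's exact motif,
# window size and thresholds are given in its Methods, not reproduced here.
TELOMERE_FWD = "TTAGGG"
TELOMERE_REV = "CCCTAA"  # reverse complement, seen at the 5' end of a forward-strand sequence


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_name, sequence) pairs from a FASTA file."""
    name: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def telomere_copies(window: str) -> int:
    """Count telomeric repeat copies on either strand within a sequence window."""
    w = window.upper()
    # TTAGGG cannot overlap itself, so str.count gives the tandem copy number.
    return w.count(TELOMERE_FWD) + w.count(TELOMERE_REV)


def telomere_ends(seq: str, window: int = 10_000, min_copies: int = 50) -> Tuple[bool, bool]:
    """Report whether the left and right ends each carry a telomeric repeat array.

    `window` and `min_copies` are hypothetical defaults for illustration.
    """
    left = telomere_copies(seq[:window]) >= min_copies
    right = telomere_copies(seq[-window:]) >= min_copies
    return left, right


def classify(seq: str, window: int = 10_000, min_copies: int = 50) -> str:
    """Label a chromosome by how many of its ends are telomere-capped."""
    n_capped = sum(telomere_ends(seq, window, min_copies))
    return {2: "both", 1: "one", 0: "none"}[n_capped]
```

As a usage example, `for name, seq in read_fasta("assembly.fa"): print(name, classify(seq))` would print one label per chromosome, where `assembly.fa` is a hypothetical file name. Counting both motifs in both windows keeps the check independent of chromosome orientation; a full analysis would more likely rely on a dedicated tool and a motif-density criterion.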
Data availability
The genome sequencing data generated in this study, together with the genome assembly and annotation data, have been deposited in the China National GeneBank Sequence Archive (CNSA) under project accession CNP0007951 [42]. The sequencing data have also been archived in the NCBI Sequence Read Archive (SRA) under accession SRP652857 [43], and the assembled genome and annotation have been archived in NCBI GenBank under accession JBVQOQ000000000 [44]. The genome assembly and annotation data have also been deposited in Figshare [45].
Code availability
No custom code was used in this study. Data analyses were performed with standard bioinformatic tools, as specified in the Methods.
References
1. Zhou, C., Yang, Q. & Cai, D. On the Classification and Distribution of the Sinipercinae Fishes (Family Serranidae). Zool. Res. 9, 113–125 (1988).
2. Li, Y. et al. Identification of the Sex-Linked Region of Siniperca scherzeri and Development of Sex-Specific Markers. Aquaculture 600, 742231 (2025).
3. Sun, C. et al. Construction of a High-Density Linkage Map and Mapping of Sex Determination and Growth-Related Loci in the Mandarin Fish (Siniperca chuatsi). BMC Genomics 18, 446 (2017).
4. Wang, M. et al. Comparison of Growth Performance and Muscle Nutrition Levels of Juvenile Siniperca scherzeri Fed on an Iced Trash Fish Diet and a Formulated Diet. Fishes 8, 393 (2023).
5. He, S. et al. Mandarin Fish (Sinipercidae) Genomes Provide Insights into Innate Predatory Feeding. Commun. Biol. 3, 361 (2020).
6. Tu, G. et al. Long-Read Genome Assemblies Reveal a Cis-Regulatory Landscape Associated with Phenotypic Divergence in Two Sister Siniperca Fish Species. Zool. Res. 44, 287–302 (2023).
7. Xue, L. et al. Telomere-to-Telomere Assembly of a Fish Y Chromosome Reveals the Origin of a Young Sex Chromosome Pair. Genome Biol. 22, 203 (2021).
8. Zhou, Q. et al. Telomere-to-Telomere Gapless Genome Assembly of the Giant Grouper (Epinephelus lanceolatus). Sci. Data 11, 1342 (2024).
9. Sherathiya, V. N., Schaid, M. D., Seiler, J. L., Lopez, G. C. & Lerner, T. N. GuPPy, a Python Toolbox for the Analysis of Fiber Photometry Data. Sci. Rep. 11, 24212 (2021).
10. Liu, Y., Schröder, J. & Schmidt, B. Musket: A Multistage k-mer Spectrum-Based Error Corrector for Illumina Sequence Data. Bioinformatics 29, 308–315 (2013).
11. Marçais, G. & Kingsford, C. A Fast, Lock-Free Approach for Efficient Parallel Counting of Occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
12. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for Reference-Free Profiling of Polyploid Genomes. Nat. Commun. 11, 1432 (2020).
13. Rautiainen, M. et al. Telomere-to-Telomere Assembly of Diploid Chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
14. Wingett, S. et al. HiCUP: Pipeline for Mapping and Processing Hi-C Data. F1000Research 4, 1310 (2015).
15. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).
16. Dudchenko, O. et al. De Novo Assembly of the Aedes aegypti Genome Using Hi-C Yields Chromosome-Length Scaffolds. Science 356, 92–95 (2017).
17. Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 3, 99–101 (2016).
18. Xu, M. et al. TGS-GapCloser: A Fast and Accurate Gap Closer for Large Genomes with Low Coverage of Error-Prone Long Reads. GigaScience 9, giaa094 (2020).
19. Lin, Y. et al. quarTeT: A Telomere-to-Telomere Toolkit for Gap-Free Genome Assembly and Centromeric Repeat Identification. Hortic. Res. 10, uhad127 (2023).
20. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10 (2009).
21. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a Database of Repetitive Elements in Eukaryotic Genomes. Mob. DNA 6, 11 (2015).
22. Flynn, J. M. et al. RepeatModeler2 for Automated Genomic Discovery of Transposable Element Families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
23. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using Native and Syntenically Mapped cDNA Alignments to Improve De Novo Gene Finding. Bioinformatics 24, 637–644 (2008).
24. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: Discovering Splice Junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
25. Gremme, G., Brendel, V., Sparks, M. E. & Kurtz, S. Engineering a Software Tool for Gene Structure Prediction in Higher Organisms. Inf. Softw. Technol. 47, 965–978 (2005).
26. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-Based Genome Alignment and Genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
27. Pertea, M. et al. StringTie Enables Improved Reconstruction of a Transcriptome from RNA-Seq Reads. Nat. Biotechnol. 33, 290–295 (2015).
28. Grabherr, M. G. et al. Full-Length Transcriptome Assembly from RNA-Seq Data without a Reference Genome. Nat. Biotechnol. 29, 644–652 (2011).
29. Haas, B. J. et al. Improving the Arabidopsis Genome Annotation Using Maximal Transcript Alignment Assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
30. Haas, B. J. et al. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
31. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410 (1990).
32. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequences (RefSeq): A Curated Non-Redundant Sequence Database of Genomes, Transcripts and Proteins. Nucleic Acids Res. 35, D61–D65 (2007).
33. Bairoch, A. & Apweiler, R. The Swiss-Prot Protein Sequence Database and its Supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
34. Ashburner, M. et al. Gene Ontology: Tool for the Unification of Biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
35. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
36. Bairoch, A. & Apweiler, R. The Swiss-Prot Protein Sequence Data Bank and its Supplement TrEMBL in 1999. Nucleic Acids Res. 27, 49–54 (1999).
37. Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: Improved Detection and Functional Classification of Transfer RNA Genes. Nucleic Acids Res. 49, 9077–9096 (2021).
38. Di Tommaso, P. et al. Nextflow Enables Reproducible Computational Workflows. Nat. Biotechnol. 35, 316–319 (2017).
39. Li, H. & Durbin, R. Fast and Accurate Long-Read Alignment with Burrows-Wheeler Transform. Bioinformatics 26, 589–595 (2010).
40. Li, H. Minimap2: Pairwise Alignment for Nucleotide Sequences. Bioinformatics 34, 3094–3100 (2018).
41. He, W. et al. NGenomeSyn: An Easy-to-Use and Flexible Tool for Publication-Ready Visualization of Syntenic Relationships Across Multiple Genomes. Bioinformatics 39, btad121 (2023).
42. China National GeneBank Sequence Archive (CNSA). https://db.cngb.org/data_resources/project/CNP0007951 (2025).
43. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP652857 (2025).
44. NCBI GenBank. https://www.ncbi.nlm.nih.gov/nuccore/JBVQOQ000000000 (2026).
45. Li, Y. Siniperca scherzeri genome. figshare https://doi.org/10.6084/m9.figshare.30084370.v1 (2025).
Acknowledgements
This work was financially supported by the Fujian Province Seed Industry Innovation and Industrialization Engineering Fishery Project (2021MNZ05) and the Agriculture Research System of China (CARS-46).
Author information
Contributions
D.Z. and Z.W. designed the study. Y.L., H.X. and Z.C. were involved in sample collection. Y.L., Z.C. and Y.W. performed the experiments and analyzed the data. Y.L. and Y.W. wrote the paper. M.W., X.Y. and M.L. revised the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wu, Y., Cheng, Z., Li, Y. et al. Telomere-to-telomere gapless genome assembly of Siniperca scherzeri. Sci Data (2026). https://doi.org/10.1038/s41597-026-07113-6
DOI: https://doi.org/10.1038/s41597-026-07113-6