Abstract
Siniperca obscura is an economically valuable and ecologically significant fish species in China, yet the lack of comprehensive genomic resources has hindered genetic studies and breeding efforts. Here, we present a high-quality, chromosome-level genome assembly of S. obscura, generated by integrating PacBio HiFi long-read sequencing with Hi-C scaffolding. The final assembly spans 734.12 Mb, with 99.85% of the assembled bases anchored and oriented onto 24 chromosomes. It has a contig N50 of 24.59 Mb and a scaffold N50 of 30.62 Mb, and its high completeness is supported by a BUSCO score of 99.37%. We predicted 23,225 protein-coding genes (BUSCO completeness 99.12%), of which 98.50% were functionally annotated. Repeat elements account for 30.54% of the genome. This reference genome provides a valuable resource for molecular breeding, comparative genomics and evolutionary studies of S. obscura and closely related species.
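The contig and scaffold N50 values and the anchoring rate quoted in the abstract are standard assembly metrics. The Python sketch below shows how they are typically computed. It is not the pipeline used in this study: the helper names `nx` and `anchoring_rate` are hypothetical, and the example lengths are toy values rather than S. obscura data.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50.0) -> int:
    """Return the Nx of a set of contig or scaffold lengths (N50 by default).

    Nx is the largest length L such that sequences of length >= L
    together contain at least x% of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence downwards, accumulating length until
    # the running total first reaches x% of the assembly size.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered * 100 >= x * total:
            return length
    raise AssertionError("unreachable: covered always reaches total")


def anchoring_rate(anchored_bp: int, total_bp: int) -> float:
    """Percentage of assembled bases placed onto chromosome-scale scaffolds."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 100.0 * anchored_bp / total_bp


if __name__ == "__main__":
    # Hypothetical toy contig lengths in Mb (not S. obscura data).
    contigs = [10, 8, 6, 4, 2]
    print(nx(contigs))                # -> 8  (N50: 10 + 8 = 18 >= 15)
    print(nx(contigs, 90))            # -> 4  (N90)
    print(anchoring_rate(998, 1000))  # -> 99.8
```

Contig N50 is computed over contig lengths and scaffold N50 over scaffold lengths. Because Hi-C scaffolding joins contigs into longer chromosome-scale sequences, the scaffold value (30.62 Mb) exceeds the contig value (24.59 Mb).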
Data availability
All data generated in this study have been deposited in public repositories and are freely available. Raw sequencing data, including PacBio HiFi, Hi-C, MGI short-read and transcriptome datasets, are available in the NCBI Sequence Read Archive under BioProject accession PRJNA1245649 [46]. The chromosome-level genome assembly of Siniperca obscura has been deposited in NCBI GenBank under accession GCA_049996615.1 [47]. The corresponding genome annotation files have been deposited in Figshare [48].
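References 46 and 47 resolve the sequencing data and the assembly through identifiers.org URLs of the form `https://identifiers.org/ncbi/insdc.sra:<accession>` and `https://identifiers.org/ncbi/insdc.gca:<accession>`. The Python sketch below builds such links from an accession string. The function name `resolver_url` and its regular expressions are illustrative assumptions; only the two URL templates are taken from the reference list, and other accession types (for example, BioProject accessions) are rejected rather than guessed.

```python
import re

# Hypothetical helper: the two URL templates are copied from the data
# citations in this article (refs 46 and 47). No other accession types
# are assumed to be supported.
_RESOLVERS = (
    (re.compile(r"SRP\d+"), "https://identifiers.org/ncbi/insdc.sra:{}"),
    (re.compile(r"GCA_\d{9}\.\d+"), "https://identifiers.org/ncbi/insdc.gca:{}"),
)


def resolver_url(accession: str) -> str:
    """Build an identifiers.org link for an SRA study or GenBank assembly accession."""
    accession = accession.strip()
    for pattern, template in _RESOLVERS:
        # fullmatch requires the whole accession string to match the pattern.
        if pattern.fullmatch(accession):
            return template.format(accession)
    raise ValueError(f"unsupported accession: {accession!r}")


if __name__ == "__main__":
    print(resolver_url("SRP598015"))
    # -> https://identifiers.org/ncbi/insdc.sra:SRP598015
    print(resolver_url("GCA_049996615.1"))
    # -> https://identifiers.org/ncbi/insdc.gca:GCA_049996615.1
```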
Code availability
No custom code or scripts were used in this work; all data processing followed the protocols and manuals of the corresponding bioinformatics software.
References
Froese, R. & Pauly, D. FishBase (Fisheries Centre, University of British Columbia, Vancouver, BC, 2010).
Lu, L., Jiang, J., Zhao, J. & Li, C. Comparative genomics revealed drastic gene difference in two small Chinese perches, Siniperca undulata and S. obscura. G3 (Bethesda) 13, jkad101 (2023).
Song, S., Zhao, J. & Li, C. Species delimitation and phylogenetic reconstruction of the sinipercids (Perciformes: Sinipercidae) based on target enrichment of thousands of nuclear coding sequences. Mol Phylogenet Evol 111, 44–55 (2017).
Song, Y. et al. Effects of the Three Gorges Dam on the mandarin fish larvae (Siniperca chuatsi) in the middle reach of the Yangtze River: Spatial gradients in abundance, feeding, growth, and survival. Ecol Freshw Fish 33, e12795 (2024).
Chen, D.-X. et al. The phylogenetic placement of Siniperca obscura base on complete mitochondrial DNA sequence. Mitochondrial DNA 25, 218–219 (2014).
Huang, W., Liang, X.-F., Qu, C.-M., Zhao, C. & Cao, L. Isolation and characterization of 31 polymorphic microsatellite markers in Siniperca obscura Nichols. Conserv Genet Resour 5, 153–156 (2013).
Qu, C., Liang, X., Huang, W. & Cao, L. Isolation and characterization of 46 novel polymorphic EST-simple sequence repeats (SSR) markers in two Sinipercine fishes (Siniperca) and cross-species amplification. Int J Mol Sci 13, 9534–9544 (2012).
Chen, D., Guo, X. & Nie, P. Phylogenetic studies of sinipercid fish (Perciformes: Sinipercidae) based on multiple genes, with first application of an immune-related gene, the virus-induced protein (viperin) gene. Mol Phylogenet Evol 55, 1167–1176 (2010).
Li, C., Ortí, G. & Zhao, J. The phylogenetic placement of sinipercid fishes (“Perciformes”) revealed by 11 nuclear loci. Mol Phylogenet Evol 56, 1096–1104 (2010).
Ding, W. et al. A chromosome-level genome assembly of the mandarin fish (Siniperca chuatsi). Front Genet 12, 671650 (2021).
He, S. et al. Mandarin fish (Sinipercidae) genomes provide insights into innate predatory feeding. Commun Biol 3, 361 (2020).
Yang, C. et al. Screening of genes related to sex determination and differentiation in mandarin fish (Siniperca chuatsi). Int J Mol Sci 23, 7692 (2022).
Tu, G.-X. et al. Long-read genome assemblies reveal a cis-regulatory landscape associated with phenotypic divergence in two sister Siniperca fish species. Zool Res 44, 287 (2023).
Lu, L., Zhao, J. & Li, C. High-quality genome assembly and annotation of the big-eye mandarin fish (Siniperca knerii). G3 (Bethesda) 10, 877–880 (2020).
Jiang, M. et al. The telomere-to-telomere gap-free reference genome and taxonomic reassessment of Siniperca roulei. GigaScience 14, giaf068 (2025).
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst 3, 99–101 (2016).
Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front Plant Sci 7, 215951 (2016).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38, e199 (2010).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457 (2020).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110, 462–467 (2005).
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res 44, D81–D89 (2016).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295 (2015).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31, 5654–5666 (2003).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8, 1494–1512 (2013).
Li, H. Protein-to-genome alignment with miniprot. Bioinformatics 39, btad014 (2023).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3, lqaa108 (2021).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439 (2006).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. bioRxiv (2023).
Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 51, D445–D451 (2023).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9, R7 (2008).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18, 366–368 (2021).
Bairoch, A. et al. The universal protein resource (UniProt). Nucleic Acids Res 33, D154–D159 (2005).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).
Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat Genet 25, 25–29 (2000).
Kanz, C. et al. The EMBL nucleotide sequence database. Nucleic Acids Res 33, D29–D33 (2005).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP598015 (2025).
NCBI GenBank. https://identifiers.org/ncbi/insdc.gca:GCA_049996615.1 (2025).
Liu, H. Chromosome-level genome assembly of the Siniperca obscura. Figshare. https://doi.org/10.6084/m9.figshare.29641379 (2025).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome biology 21, 245 (2020).
Li, K., Xu, P., Wang, J., Yi, X. & Jiao, Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat Commun 14, 6556 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Okonechnikov, K., Conesa, A. & Garcia-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024).
Acknowledgements
We acknowledge financial support from the National Modern Agriculture Industry Technology System Special Project (CARS-46), Monitoring of Aquatic Resources in Key Waters of Anhui Province (2024BFAFZ02936), Special Fund for Anhui Agriculture Research System (2021-711), Central Public-interest Scientific Institution Basal Research Fund, CAFS (2025XK01, 2025SJHX1, 2023TD37), China-ASEAN Maritime Cooperation Fund (CAMC-2018F) and Guangdong Province Rural Revitalization Strategy Special Fund (2023-SJS-00-001).
Author information
Contributions
All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, H., Liu, H., Cui, K. et al. Chromosome-level genome assembly of the Siniperca obscura. Sci Data (2026). https://doi.org/10.1038/s41597-026-06678-6
DOI: https://doi.org/10.1038/s41597-026-06678-6