Scientific Data
Chromosome-level genome assembly of the Siniperca obscura
  • Data Descriptor
  • Open access
  • Published: 02 February 2026

  • Haiyang Liu (ORCID: orcid.org/0000-0001-8301-4595)1,
  • Huijuan Liu1,2,
  • Kai Cui3,
  • Wenxuan Lu3,
  • Jing Li3,
  • Ting Fang3,
  • Na Gao3,
  • Cheng Chen3,
  • Xiuxia Zhao3,
  • Kun Yang3,
  • Yanfeng Lin4 &
  • Yangyang Liang3 

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Conservation genomics
  • Structural variation

Abstract

Siniperca obscura is an economically valuable and ecologically significant species in China, yet the lack of comprehensive genomic resources has hindered genetic studies and breeding efforts. In this study, we present a high-quality, chromosome-level genome assembly for S. obscura, generated by integrating PacBio HiFi long-read sequencing with Hi-C scaffolding. The final assembly spans 734.12 Mb, with 99.85% of the assembled bases anchored and oriented onto 24 chromosomes. It has a contig N50 of 24.59 Mb and a scaffold N50 of 30.62 Mb, and a BUSCO score of 99.37% indicates high genome completeness. We predicted 23,225 protein-coding genes with a BUSCO completeness of 99.12%, of which 98.50% were functionally annotated. Repeat elements account for 30.54% of the genome sequence. This high-quality reference genome provides a valuable resource for advancing molecular breeding, comparative genomics, and evolutionary studies of S. obscura and closely related species.
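For readers unfamiliar with the contiguity statistics quoted in the abstract, the following is a minimal Python sketch of how an N50 value is conventionally computed, plus the simple arithmetic that converts the reported percentages into megabases. The function names `n50` and `share_in_mb` and the toy contig lengths are illustrative assumptions; they are not taken from the authors' pipeline, and the derived megabase values are approximations based on the rounded figures in the abstract.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of all bases.
    """
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


def share_in_mb(total_mb: float, percent: float) -> float:
    """Convert a percentage of the assembly into megabases."""
    return total_mb * percent / 100.0


# Hypothetical toy contig lengths (bp); NOT data from the S. obscura assembly.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 40

# The inputs below are the rounded values quoted in the abstract.
print(round(share_in_mb(734.12, 99.85), 2))  # anchored bases, in Mb
print(round(share_in_mb(734.12, 30.54), 2))  # repeat elements, in Mb
```

In the toy case the total is 150 bp, so the running sum first reaches the 75 bp halfway point at the 40 bp contig. Applied to the abstract's figures, the same arithmetic suggests that roughly 733 Mb of the 734.12 Mb assembly is anchored to chromosomes.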

Data availability

All data generated in this study have been deposited in public repositories and are freely available. Raw sequencing data, including PacBio HiFi, Hi-C, MGI short-read, and transcriptome datasets, are available in the NCBI Sequence Read Archive under accession number PRJNA1245649 (ref. 46). The chromosome-level genome assembly of Siniperca obscura has been deposited in NCBI GenBank under accession GCA_049996615.1 (ref. 47). The corresponding genome annotation files have been deposited in the Figshare repository (ref. 48).
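As a convenience for locating the deposited assembly, the sketch below turns a GenBank assembly accession into two URLs. The identifiers.org form follows the pattern used in the reference list; the FTP form assumes NCBI's usual genomes directory layout (accession digits split into three-digit folders). The helper names are hypothetical, and because the assembly-name suffix of the final directory is not given in the article, the function stops at the parent directory.

```python
import re

# GenBank (GCA) or RefSeq (GCF) assembly accession: prefix, 9 digits, version.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def identifiers_org_url(accession: str) -> str:
    """Build a resolver URL in the form used in the reference list."""
    if ACCESSION_RE.match(accession) is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    return f"https://identifiers.org/ncbi/insdc.gca:{accession}"


def assembly_ftp_parent(accession: str) -> str:
    """Return the parent directory on the NCBI genomes FTP site.

    Assumption: the site groups assemblies by prefix and then by the
    accession digits in blocks of three.
    """
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, digits, _version = match.groups()
    blocks = "/".join(digits[i:i + 3] for i in range(0, 9, 3))
    return f"https://ftp.ncbi.nlm.nih.gov/genomes/all/{prefix}/{blocks}/"


accession = "GCA_049996615.1"  # accession reported in this article
print(identifiers_org_url(accession))
print(assembly_ftp_parent(accession))
```

Validating the accession before building a URL catches extraction errors such as a citation number fused onto the end of the identifier.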

Code availability

No custom code or scripts were used in this work; data processing followed the protocols and manuals of the corresponding bioinformatics software.

References

  1. Froese, R. & Pauly, D. (Fisheries Centre, University of British Columbia Vancouver, BC, 2010).

  2. Lu, L., Jiang, J., Zhao, J. & Li, C. Comparative genomics revealed drastic gene difference in two small Chinese perches, Siniperca undulata and S. obscura. G3: Genes, Genomes, Genetics 13, jkad101 (2023).

  3. Song, S., Zhao, J. & Li, C. Species delimitation and phylogenetic reconstruction of the sinipercids (Perciformes: Sinipercidae) based on target enrichment of thousands of nuclear coding sequences. Mol Phylogenet Evol 111, 44–55 (2017).

  4. Song, Y. et al. Effects of the Three Gorges Dam on the mandarin fish larvae (Siniperca chuatsi) in the middle reach of the Yangtze River: Spatial gradients in abundance, feeding, growth, and survival. Ecology of Freshwater Fish 33, e12795 (2024).

  5. Chen, D.-X. et al. The phylogenetic placement of Siniperca obscura base on complete mitochondrial DNA sequence. Mitochondrial DNA 25, 218–219 (2014).

  6. Huang, W., Liang, X.-F., Qu, C.-M., Zhao, C. & Cao, L. Isolation and characterization of 31 polymorphic microsatellite markers in Siniperca obscura Nichols. Conserv Genet Resour 5, 153–156 (2013).

  7. Qu, C., Liang, X., Huang, W. & Cao, L. Isolation and characterization of 46 novel polymorphic EST-simple sequence repeats (SSR) markers in two Sinipercine fishes (Siniperca) and cross-species amplification. Int J Mol Sci 13, 9534–9544 (2012).

  8. Chen, D., Guo, X. & Nie, P. Phylogenetic studies of sinipercid fish (Perciformes: Sinipercidae) based on multiple genes, with first application of an immune-related gene, the virus-induced protein (viperin) gene. Mol Phylogenet Evol 55, 1167–1176 (2010).

  9. Li, C., Ortí, G. & Zhao, J. The phylogenetic placement of sinipercid fishes (“Perciformes”) revealed by 11 nuclear loci. Mol Phylogenet Evol 56, 1096–1104 (2010).

  10. Ding, W. et al. A chromosome-level genome assembly of the mandarin fish (Siniperca chuatsi). Frontiers in Genetics 12, 671650 (2021).

  11. He, S. et al. Mandarin fish (Sinipercidae) genomes provide insights into innate predatory feeding. Communications Biology 3, 361 (2020).

  12. Yang, C. et al. Screening of genes related to sex determination and differentiation in mandarin fish (Siniperca chuatsi). Int J Mol Sci 23, 7692 (2022).

  13. Tu, G.-X. et al. Long-read genome assemblies reveal a cis-regulatory landscape associated with phenotypic divergence in two sister Siniperca fish species. Zoological Research 44, 287 (2023).

  14. Lu, L., Zhao, J. & Li, C. High-quality genome assembly and annotation of the big-eye mandarin fish (Siniperca knerii). G3: Genes, Genomes, Genetics 10, 877–880 (2020).

  15. Jiang, M. et al. The telomere-to-telomere gap-free reference genome and taxonomic reassessment of Siniperca roulei. GigaScience 14, giaf068 (2025).

  16. Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).

  17. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

  18. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432 (2020).

  19. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).

  20. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98 (2016).

  21. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).

  22. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  23. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3, 99–101 (2016).

  24. Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Frontiers in Plant Science 7, 215951 (2016).

  25. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).

  26. Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38, e199–e199 (2010).

  27. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).

  28. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110, 462–467 (2005).

  29. Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res 44, D81–D89 (2016).

  30. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915 (2019).

  31. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295 (2015).

  32. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31, 5654–5666 (2003).

  33. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8, 1494–1512 (2013).

  34. Li, H. Protein-to-genome alignment with miniprot. Bioinformatics 39, btad014 (2023).

  35. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics 3, lqaa108 (2021).

  36. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439 (2006).

  37. Bruna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. BioRxiv, 2023–2001 (2023).

  38. Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 51, D445–D451 (2023).

  39. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, 1–22 (2008).

  40. Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18, 366–368 (2021).

  41. Bairoch, A. et al. The universal protein resource (UniProt). Nucleic Acids Res 33, D154–D159 (2005).

  42. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).

  43. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 1–14 (2003).

  44. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat Genet 25, 25–29 (2000).

  45. Kanz, C. et al. The EMBL nucleotide sequence database. Nucleic Acids Res 33, D29–D33 (2005).

  46. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP598015 (2025).

  47. NCBI GenBank. https://identifiers.org/ncbi/insdc.gca:GCA_049996615.1 (2025).

  48. Liu, H. Chromosome-level genome assembly of the Siniperca obscura. Figshare. https://doi.org/10.6084/m9.figshare.29641379 (2025).

  49. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  50. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245 (2020).

  51. Li, K., Xu, P., Wang, J., Yi, X. & Jiao, Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat Commun 14, 6556 (2023).

  52. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  53. Okonechnikov, K., Conesa, A. & Garcia-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).

  54. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).

  55. Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta, e211 (2024).

Acknowledgements

We acknowledge financial support from the National Modern Agriculture Industry Technology System Special Project (CARS-46), Monitoring of Aquatic Resources in Key Waters of Anhui Province (2024BFAFZ02936), the Special Fund for Anhui Agriculture Research System (2021-711), the Central Public-interest Scientific Institution Basal Research Fund, CAFS (2025XK01, 2025SJHX1, 2023TD37), the China-ASEAN Maritime Cooperation Fund (CAMC-2018F), and the Guangdong Province Rural Revitalization Strategy Special Fund (2023-SJS-00-001).

Author information

Authors and Affiliations

  1. Key Laboratory of Tropical and Subtropical Fishery Resources Application and Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, 510380, China

    Haiyang Liu & Huijuan Liu

  2. School of Marine Sciences, Ningbo University, Ningbo, 315211, China

    Huijuan Liu

  3. Key Laboratory of Freshwater Aquaculture and Enhancement of Anhui Province, Fisheries Research Institute, Anhui Academy of Agricultural Sciences, Hefei, 230001, China

    Kai Cui, Wenxuan Lu, Jing Li, Ting Fang, Na Gao, Cheng Chen, Xiuxia Zhao, Kun Yang & Yangyang Liang

  4. Fishery Bureau of Xiuning County, Huangshan, 245400, China

    Yanfeng Lin

Contributions

All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Haiyang Liu, Yanfeng Lin or Yangyang Liang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table S1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, H., Liu, H., Cui, K. et al. Chromosome-level genome assembly of the Siniperca obscura. Sci Data (2026). https://doi.org/10.1038/s41597-026-06678-6

  • Received: 29 July 2025

  • Accepted: 22 January 2026

  • Published: 02 February 2026

  • DOI: https://doi.org/10.1038/s41597-026-06678-6

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
