Scientific Data
A chromosome-level reference genome assembly of the giant pangasius (Pangasius sanitwongsei)
  • Data Descriptor
  • Open access
  • Published: 16 December 2025

  • Baojiang Gan1,
  • Lingjing Wei1,
  • Yuanxiong Ma1,
  • Feilong Mo1,
  • Shan Xiao1,
  • Yudian Lu1,
  • Kang Liu1,
  • Binlan Yang1,
  • Sheng Zhang1 &
  • …
  • Haiyang Liu2 (ORCID: orcid.org/0000-0001-8301-4595)

Scientific Data (2025)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Comparative genomics
  • Conservation genomics

Abstract

The giant pangasius (Pangasius sanitwongsei) is a critically endangered freshwater fish of considerable ecological and economic significance in Southeast Asia and southern China. However, the lack of a high-quality reference genome has limited studies of its genetic adaptation and the development of conservation strategies. Here, we present a chromosome-scale genome assembly of P. sanitwongsei, generated using PacBio HiFi long-read sequencing and Hi-C chromatin conformation capture. The final assembly spans 808.85 Mb, with a contig N50 of 18.70 Mb and a scaffold N50 of 28.10 Mb, and achieves a BUSCO completeness of 99.15%. A total of 23,469 protein-coding genes were annotated, of which 96.97% were functionally annotated. This high-quality genomic resource supports studies of adaptive evolution in Pangasiidae catfishes and provides a foundation for conservation genomics, population genetics, and sustainable aquaculture development.
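
For readers less familiar with the contiguity metrics quoted above, the following is a minimal sketch of how an N50 value is conventionally computed from a list of contig or scaffold lengths. It is illustrative only: the function name `n50` and the toy lengths are assumptions made for this example and are not taken from the authors' pipeline or from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, not values from the P. sanitwongsei assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths after Hi-C anchoring it gives the scaffold N50, which is why the two values reported in the abstract differ.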

Data availability

The raw sequencing datasets have been deposited in the NCBI Sequence Read Archive under accession SRP579478 (ref. 51). The assembled P. sanitwongsei genome is available in GenBank under accession GCA_051225755.1 (ref. 52), and the corresponding genome annotation files have been archived on Figshare (https://doi.org/10.6084/m9.figshare.29715215) (ref. 53).

Code availability

No custom code or scripts were used in this work. Data processing followed the protocols and manuals of the corresponding bioinformatics software.

References

  1. Froese, R. & Pauly, D. (Fisheries Centre, University of British Columbia Los Baños, Philippines, 2010).

  2. Kang, B. & Huang, X. Mekong fishes: Biogeography, migration, resources, threats, and conservation. Rev Fish Sci Aquac 30, 170–194 (2022).

  3. Roberts, T. R. & Vidthayanon, C. Systematic revision of the Asian catfish family Pangasiidae, with biological observations and descriptions of three new species. Proceedings of the Academy of Natural Sciences of Philadelphia, 97-143 (1991).

  4. Roberts, T. R. & Baird, I. G. Traditional fisheries and fish ecology on the Mekong River at Khone Waterfalls in southern Laos. Natural History Bulletin of the Siam Society 43, 219–262 (1995).

  5. Chanthasoo, M., Wiwatcharakoset, S. & Lisanga, S. Breeding of Chao Phaya Giant catfish (Pangasius sanitwongsei) (1990).

  6. Sutthi, N., Panase, A., Phinrub, W., Srisuttha, P. & Panase, P. Cold shock and its effect on biochemical indices, cortisol and electrolyte changes in Chao Phraya catfish, Pangasius sanitwongsei Smith, 1931. Comparative Clinical Pathology 31, 757–764 (2022).

  7. Na-Nakorn, U. et al. Genetic Diversity of the Vulnerable Pangasius sanitwongsei using Microsatellite DNA and 16S rRNA. Journal of Fisheries and Environment 33, 24–40 (2009).

  8. Makinen, T., Weyl, O. L. F., Van der Walt, K.-A. & Swartz, E. R. First record of an introduction of the giant pangasius, Pangasius sanitwongsei Smith 1931, into an African river. African Zoology 48, 388–391 (2013).

  9. Hogan, Z., Na-Nakorn, U. & Kong, H. Threatened fishes of the world: Pangasius sanitwongsei Smith 1931 (Siluriformes: Pangasiidae). Environmental Biology of Fishes 84, 305–306 (2009).

  10. Campbell, T., Pin, K., Ngor, P. B. & Hogan, Z. Conserving Mekong megafishes: Current status and critical threats in Cambodia. Water 12, 1820 (2020).

  11. Baird, I. G. & Hogan, Z. S. Hydropower Dam development and fish biodiversity in the Mekong River Basin: A review. Water 15, 1352 (2023).

  12. Jutagate, T. & Rattanachai, A. Inland fishery resource enhancement and conservation in Thailand. Inland fisheries resource enhancement and conservation in Asia 133 (2010).

  13. Kitcharoen, N., Nakkham, K. & Mengumphan, K. A study on growth performance of interspecific crosses-hybrid cat fish spices: Buk Siam hybrid catfish (male Pangasianodon gigas x female P. hypophthalmus) Pangosius larnaudii and Pangasius sanitwongsei (2022).

  14. Karinthanyakit, W. & Jondeung, A. Molecular phylogenetic relationships of pangasiid and schilbid catfishes in Thailand. J Fish Biol 80, 2549–2570 (2012).

  15. Duong, T. Y. et al. Mitophylogeny of Pangasiid catfishes and its taxonomic implications for Pangasiidae and the suborder Siluroidei. Zoological studies 62, e48 (2023).

  16. Na‐Nakorn, U., Sriphairoj, K., Sukmanomon, S., Poompuang, S. & Kamonrat, W. Polymorphic microsatellite primers developed from DNA of the endangered Mekong giant catfish, Pangasianodon gigas (Chevey) and cross‐species amplification in three species of Pangasius. Molecular Ecology Notes 6, 1174–1176 (2006).

  17. Wei, L. et al. Complete mitochondrial genome and phylogenetic position of Pangasius sanitwongsei (Siluriformes: Pangasiidae). Mitochondrial DNA Part B 5, 945–946 (2020).

  18. Sriphairoj, K., Na-Nakorn, U. & Klinbunga, S. Species identification of non-hybrid and hybrid Pangasiid catfish using polymerase chain reaction-restriction fragment length polymorphism. Agriculture and Natural Resources 52, 99–105 (2018).

  19. Gao, Z. et al. A chromosome-level genome assembly of the striped catfish (Pangasianodon hypophthalmus). Genomics 113, 3349–3356 (2021).

  20. Hai, D. M. et al. A high-quality genome assembly of striped catfish (Pangasianodon hypophthalmus) based on highly accurate long-read HiFi sequencing data. Genes 13, 923 (2022).

  21. Kim, O. T. P. et al. A draft genome of the striped catfish, Pangasianodon hypophthalmus, for comparative analysis of genes relevant to development and a resource for aquaculture improvement. BMC Genomics 19, 733 (2018).

  22. Wen, M. et al. An ancient truncated duplication of the anti‐Müllerian hormone receptor type 2 gene is a potential conserved master sex determinant in the Pangasiidae catfish family. Mol Ecol Resour 22, 2411–2428 (2022).

  23. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

  24. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432 (2020).

  25. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).

  26. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell systems 3, 95–98 (2016).

  27. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).

  28. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  29. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell systems 3, 99–101 (2016).

  30. Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Frontiers in plant science 7, 215951 (2016).

  31. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).

  32. Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38, e199–e199 (2010).

  33. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).

  34. Abrusán, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass—a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).

  35. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and genome research 110, 462–467 (2005).

  36. Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res 44, D81–D89 (2016).

  37. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature biotechnology 37, 907–915 (2019).

  38. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology 33, 290–295 (2015).

  39. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols 8, 1494–1512 (2013).

  40. Li, H. Protein-to-genome alignment with miniprot. Bioinformatics 39, btad014 (2023).

  41. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR genomics and bioinformatics 3, lqaa108 (2021).

  42. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439 (2006).

  43. Bruna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. BioRxiv, 2023-2001 (2023).

  44. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology 9, 1–22 (2008).

  45. Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18, 366–368 (2021).

  46. Bairoch, A. et al. The universal protein resource (UniProt). Nucleic Acids Res 33, D154–D159 (2005).

  47. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).

  48. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC bioinformatics 4, 1–14 (2003).

  49. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat Genet 25, 25–29 (2000).

  50. Kanz, C. et al. The EMBL nucleotide sequence database. Nucleic Acids Res 33, D29–D33 (2005).

  51. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP579478 (2025).

  52. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_051225755.1 (2025).

  53. Liu, H. Chromosome-level genome assembly of the Siniperca obscura. Figshare https://doi.org/10.6084/m9.figshare.29715215 (2025).

  54. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  55. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  56. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).

  57. Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta, e211 (2024).

  58. Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS computational biology 14, e1005944 (2018).

  59. Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome biology 20, 277 (2019).

Acknowledgements

We acknowledge financial support from the Guangxi Key R&D Program Agriculture and Rural Areas (AB2506910048), the National Modern Agriculture Industry Technology System Special Project (CARS-46), the Central Public-interest Scientific Institution Basal Research Fund, CAFS (2025XK01, 2025SJHX1, 2023TD37), the China-ASEAN Maritime Cooperation Fund (CAMC-2018F), and the Guangdong Province Rural Revitalization Strategy Special Fund (2023-SJS-00-001).

Author information

Authors and Affiliations

  1. Aquatic Species Introduction and Breeding Center of Guangxi, Nanning, 530031, China

    Baojiang Gan, Lingjing Wei, Yuanxiong Ma, Feilong Mo, Shan Xiao, Yudian Lu, Kang Liu, Binlan Yang & Sheng Zhang

  2. Key Laboratory of Tropical and Subtropical Fishery Resources Application and Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, 510380, China

    Haiyang Liu

Contributions

All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Sheng Zhang or Haiyang Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gan, B., Wei, L., Ma, Y. et al. A chromosome-level reference genome assembly of the giant pangasius (Pangasius sanitwongsei). Sci Data (2025). https://doi.org/10.1038/s41597-025-06445-z

  • Received: 01 August 2025

  • Accepted: 11 December 2025

  • Published: 16 December 2025

  • DOI: https://doi.org/10.1038/s41597-025-06445-z

Associated content

Collection

Genomes of endangered species

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
