Scientific Data
A chromosome-level genome assembly of the panda loach (Yaoshania pachychilus)
  • Data Descriptor
  • Open access
  • Published: 10 April 2026

  • Feng Lin1,2,3 na1,
  • Liang-Kun Wang1 na1,
  • Zhi-Yong Yuan1,
  • Jun Shi2,3,
  • Cai-Xin Liu4,
  • Dong-Yi Wu1,
  • Yao-Quan Han2,3,
  • Yu-Sen Li2,3,
  • Kun Qin5,
  • Yu-Yin Huang2,3,
  • Da-Peng Wang2,3 &
  • Feng Shao1 

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Yaoshania pachychilus (Gastromyzontidae, Cypriniformes) is a benthic species inhabiting torrential mountain streams, where it attaches to rocks using adhesive pelvic and pectoral fins. Its juvenile black-white coloration has made it popular in the ornamental fish trade, yet wild populations face increasing threats. Here, we present the first chromosome-level genome assembly for this species, generated using PacBio HiFi and Hi-C technologies. The final assembly spans 451.24 Mb, with a contig N50 of 9.87 Mb, and 99.2% of the sequences are anchored onto 25 pseudo-chromosomes. We annotated repetitive elements accounting for 27.8% of the genome and predicted 21,816 protein-coding genes. BUSCO completeness exceeded 97% for both the genome assembly and gene annotation. This reference genome provides a valuable resource for investigating torrent adaptation and pigmentation in Y. pachychilus, and also supports broader phylogenomic and adaptive evolution studies across Gastromyzontidae.
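The contig N50 quoted above is a length-weighted median: the contig length at which contigs of that size or longer together cover at least half of the assembly. The following minimal Python sketch illustrates the definition only; the function name `n50` and the toy contig lengths are illustrative assumptions, not the authors' code or data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths (hypothetical; NOT taken from the panda loach assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total = 300; 80 + 70 = 150 reaches half, so N50 = 70
```

Applied to the real contig lengths (for example, read from a FASTA index), the same calculation should yield a figure comparable to the reported 9.87 Mb, although assembly-statistics tools may differ in how they treat gaps and very short sequences.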

Data availability

All raw sequencing reads from all libraries have been deposited in the NCBI Sequence Read Archive under accession number SRP636306. The genome assembly has been deposited in NCBI GenBank under accession number JBTNTC000000000.1 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_054948905.1). Annotation files are available via Figshare (https://doi.org/10.6084/m9.figshare.30403003.v4).

Code availability

All analyses were conducted with publicly available bioinformatics tools according to official guidelines, using default parameters unless noted otherwise. The core code is also available at https://github.com/wangliangkun501/Yaoshania_pachychilus/.


Acknowledgements

We extend our sincere gratitude to the staff of the Management Center of the Dayao Mountain National Nature Reserve for their assistance. This study was funded by the Guangxi Natural Science Foundation under Grant No. 2026GXNSFBA00640092, the Project of Financial Funds of the Ministry of Agriculture and Rural Affairs: Investigation of Fishery Resources and Habitat in the Pearl River Basin (ZJZX-04), and the Special survey on comprehensive scientific investigation of Guangxi Dayao Mountain National Nature Reserve (LKWT-2025-057).

Author information

Author notes
  1. These authors contributed equally: Feng Lin, Liang-Kun Wang.

Authors and Affiliations

  1. Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Southwest University, Chongqing, 400715, China

    Feng Lin, Liang-Kun Wang, Zhi-Yong Yuan, Dong-Yi Wu & Feng Shao

  2. Guangxi Academy of Fishery Sciences, Nanning, Guangxi, 530021, China

    Feng Lin, Jun Shi, Yao-Quan Han, Yu-Sen Li, Yu-Yin Huang & Da-Peng Wang

  3. Engineering Research Center of Hongshui River Rare Fish Conservation, Guangxi Zhuang Autonomous Region, Nanning, Guangxi, 530021, China

    Feng Lin, Jun Shi, Yao-Quan Han, Yu-Sen Li, Yu-Yin Huang & Da-Peng Wang

  4. State Key Laboratory of Genetic Resources and Evolution & Yunnan Key Laboratory of Biodiversity and Ecological Conservation of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China

    Cai-Xin Liu

  5. Management Center of the Dayao Mountain National Nature Reserve, Laibin, Guangxi, 546100, China

    Kun Qin

Contributions

F.L., F.S. and D.P.W. conceived this project; F.L., J.S., C.X.L. and K.Q. collected samples; D.Y.W., Y.Q.H. and Y.S.L. prepared the sequencing samples; L.K.W. and F.S. analyzed the data; L.K.W. and F.L. wrote the manuscript; F.S., Z.Y.Y. and F.L. revised the manuscript. All authors have read and approved the final manuscript for publication.

Corresponding authors

Correspondence to Da-Peng Wang or Feng Shao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lin, F., Wang, LK., Yuan, ZY. et al. A chromosome-level genome assembly of the panda loach (Yaoshania pachychilus). Sci Data (2026). https://doi.org/10.1038/s41597-026-07141-2

  • Received: 13 November 2025

  • Accepted: 26 March 2026

  • Published: 10 April 2026

  • DOI: https://doi.org/10.1038/s41597-026-07141-2
