Abstract
Yaoshania pachychilus (Gastromyzontidae, Cypriniformes) is a benthic species inhabiting torrential mountain streams, where it attaches to rocks using adhesive pelvic and pectoral fins. The black-and-white coloration of its juveniles has made it popular in the ornamental fish trade, yet wild populations face increasing threats. Here, we present the first chromosome-level genome assembly for this species, generated using PacBio HiFi and Hi-C technologies. The final assembly spans 451.24 Mb, with a contig N50 of 9.87 Mb, and 99.2% of the sequences are anchored onto 25 pseudo-chromosomes. We annotated repetitive elements accounting for 27.8% of the genome and predicted 21,816 protein-coding genes. BUSCO completeness exceeded 97% for both the genome assembly and the gene annotation. This reference genome provides a valuable resource for investigating torrent adaptation and pigmentation in Y. pachychilus, and supports broader phylogenomic and adaptive evolution studies across Gastromyzontidae.
Data availability
Raw sequencing reads from all libraries have been deposited in the NCBI Sequence Read Archive under accession number SRP636306. The genome assembly has been deposited in NCBI GenBank under accession number JBTNTC000000000.1 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_054948905.1). Annotation files are available via Figshare (https://doi.org/10.6084/m9.figshare.30403003.v4).
Code availability
All analyses were conducted with publicly available bioinformatics tools according to official guidelines, using default parameters unless noted otherwise. The core code is also available at https://github.com/wangliangkun501/Yaoshania_pachychilus/.
Acknowledgements
We extend our sincere gratitude to the staff of the Management Center of the Dayao Mountain National Nature Reserve for their assistance. This study was funded by the Guangxi Natural Science Foundation under Grant No. 2026GXNSFBA00640092, the Project of Financial Funds of the Ministry of Agriculture and Rural Affairs: Investigation of Fishery Resources and Habitat in the Pearl River Basin (ZJZX-04), and the Special survey on comprehensive scientific investigation of Guangxi Dayao Mountain National Nature Reserve (LKWT-2025-057).
Author information
Contributions
F.L., F.S. and D.P.W. conceived this project; F.L., J.S., C.X.L. and K.Q. collected samples; D.Y.W., Y.Q.H. and Y.S.L. prepared the sequencing samples; L.K.W. and F.S. analyzed the data; L.K.W. and F.L. wrote the manuscript; F.S., Z.Y.Y. and F.L. revised the manuscript. All authors have read and approved the final manuscript for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lin, F., Wang, LK., Yuan, ZY. et al. A chromosome-level genome assembly of the panda loach (Yaoshania pachychilus). Sci Data (2026). https://doi.org/10.1038/s41597-026-07141-2