Scientific Data
Chromosome-level genome assembly of the longfin barb (Acrossocheilus longipinnis)
  • Data Descriptor
  • Open access
  • Published: 30 January 2026

  • Zechen E (ORCID: orcid.org/0009-0001-6857-4794) 1,2,3,
  • Fangyuan Xiong 1,2,3,
  • Yuansheng Zhu 1,2,3,
  • Li Wang 1,2,3,
  • Jiajun Zhang 1,2,3,
  • Shenghui Dong 1,2,3 &
  • …
  • Mingxiang Lu 1,2,3

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Genome assembly algorithms
  • Ichthyology

Abstract

The longfin barb (Acrossocheilus longipinnis), a vulnerable cyprinid fish endemic to China’s Pearl River basin, is of significant conservation concern and also popular in the ornamental fish trade. To facilitate genetic research and molecular breeding for this species, we generated a high-quality genome by integrating PacBio HiFi long reads and Hi-C sequencing data. The final assembly spans approximately 936.04 Mb, achieving high continuity with a contig N50 of 36.09 Mb. Assessment of genome quality revealed excellent completeness (98.76% BUSCO score) and accuracy (QV = 54.46; GCI = 29.76; CRAQ = 96.40). The vast majority of the sequence (927.20 Mb, 99.06%) was successfully anchored to 25 chromosomes. Annotation predicted 24,718 protein-coding genes and identified approximately 553.06 Mb (59.09%) of repetitive elements. This high-quality chromosome-scale reference genome provides a crucial foundation for investigating the genomic underpinnings of A. longipinnis evolution and will significantly advance molecular breeding programs aimed at its conservation and sustainable utilization.
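
The continuity and anchoring figures quoted in the abstract follow from simple definitions. As a minimal sketch (not the authors' pipeline, which is not reproduced here), the Python below computes N50 on a toy list of contig lengths, rechecks the anchored and repeat percentages from the reported megabase totals, and converts the reported QV to a per-base error rate. The conversion assumes the QV is Phred-scaled, which is the usual convention for k-mer-based consensus quality; the function names and toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def percent(part_mb, whole_mb):
    """Share of the assembly in percent, rounded to two decimals."""
    return round(100.0 * part_mb / whole_mb, 2)


def phred_to_error_rate(qv):
    """Per-base error probability for a Phred-scaled quality value (assumed scale)."""
    return 10 ** (-qv / 10.0)


# Toy contig lengths (hypothetical, not taken from the assembly).
toy_n50 = n50([40, 30, 20, 10])

# Figures reported in the abstract.
ASSEMBLY_MB = 936.04
anchored_pct = percent(927.20, ASSEMBLY_MB)
repeat_pct = percent(553.06, ASSEMBLY_MB)
error_rate = phred_to_error_rate(54.46)

print(toy_n50, anchored_pct, repeat_pct, f"{error_rate:.2e}")
```

Under the Phred assumption, a QV of 54.46 corresponds to roughly three to four expected errors per million bases, and both percentages recomputed from the megabase totals agree with the values stated in the abstract.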

Data availability

Raw sequencing data have been deposited in the NCBI SRA database under BioProject accession number PRJNA1297891, with the following run accessions: PacBio HiFi, SRR34770991 (ref. 49); Hi-C, SRR34770992 (ref. 50); RNA sequencing, SRR34770990 (ref. 51); DNA short-read sequencing, SRR34770993 (ref. 52). The genome assembly has been deposited in GenBank under accession GCA_054083375.1 (ref. 53). The genome assembly, annotation files (GFF3, FASTA), and gene functional annotation datasets are also available via Figshare (ref. 41). All datasets are publicly accessible without restrictions.
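
For convenience, the deposited accessions can be turned into resolvable links. The sketch below reuses the identifiers.org URL pattern that appears in this article's reference list (refs. 49–53). The dictionary layout and the helper name are illustrative assumptions, and no network request is made.

```python
# Accessions as given in the reference list (refs. 49-53).
SRA_RUNS = {
    "PacBio HiFi": "SRR34770991",
    "Hi-C": "SRR34770992",
    "RNA sequencing": "SRR34770990",
    "DNA short-read sequencing": "SRR34770993",
}
GENBANK_ASSEMBLY = "GCA_054083375.1"

# URL prefixes copied from the reference list entries.
SRA_PREFIX = "https://identifiers.org/ncbi/insdc.sra:"
GCA_PREFIX = "https://identifiers.org/ncbi/insdc.gca:"


def accession_links():
    """Map each dataset label to its identifiers.org URL (helper name is hypothetical)."""
    links = {label: SRA_PREFIX + run for label, run in SRA_RUNS.items()}
    links["Genome assembly"] = GCA_PREFIX + GENBANK_ASSEMBLY
    return links


for label, url in accession_links().items():
    print(f"{label}: {url}")
```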

Code availability

No specific code or script was used in this work. Commands used for data processing were all executed according to the manuals and protocols of the corresponding software.

References

  1. Yuan, L. Y., Liu, X. X. & Zhang, E. Mitochondrial phylogeny of Chinese barred species of the cyprinid genus Acrossocheilus Oshima, 1919 (Teleostei: Cypriniformes) and its taxonomic implications. Zootaxa 4059, 151–168 (2015).

  2. Chen, T. E. et al. A New Species of the Genus Acrossocheilus Oshima, 1919 (Cypriniformes: Cyprinidae) from the Dabie Mountains. Animals 15, 734 (2025).

  3. Hou, X.-J. et al. Complete mitochondrial genome of the freshwater fish Acrossocheilus longipinnis (Teleostei: Cyprinidae): genome characterization and phylogenetic analysis. Biologia 75, 1871–1880 (2020).

  4. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology 37, 1155–1162 (2019).

  5. Lovell, J. T. et al. Four chromosome scale genomes and a pan-genome annotation to accelerate pecan tree breeding. Nature Communications 12, 4125 (2021).

  6. Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nature Reviews Genetics 25, 658–670 (2024).

  7. Wang, B. et al. Long and Accurate: How HiFi Sequencing is Transforming Genomics. Genomics Proteomics Bioinformatics 23 (2025).

  8. Zheng, J. et al. Chromosome-level genome assembly of Acrossocheilus fasciatus using PacBio sequencing and Hi-C technology. Scientific Data 11, 166 (2024).

  9. Chin, C. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods 10, 563–569 (2013).

  10. Chen, S., Zhou, Y., Chen, Y. & Jia, G. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  11. Sun, H., Ding, J., Piednoël, M. & Schneeberger, K. findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics 34, 550–557 (2018).

  12. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).

  13. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).

  14. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  15. Servant, N. et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology 16 (2015).

  16. Durand, N. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98 (2016).

  17. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, eaal3327 (2017).

  18. Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Systems 3, 99–101 (2016).

  19. Xu, G.-C. et al. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. GigaScience 8 (2018).

  20. Hu, J. et al. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. bioRxiv (2023).

  21. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110, 462–467 (2005).

  22. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 (Suppl. 1), i351–i358 (2005).

  23. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research 35, W265–W268 (2007).

  24. Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Current Protocols in Bioinformatics 5 (2004).

  25. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).

  26. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360 (2015).

  27. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278 (2019).

  28. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29, 644–652 (2011).

  29. Haas, B. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666 (2003).

  30. Liu, F. et al. The telomere-to-telomere gapless genome of grass carp provides insights for genetic improvement. GigaScience 14 (2025).

  31. Yuan, J. et al. A telomere-to-telomere genome assembly of koi carp (Cyprinus carpio) using long reads and Hi-C technology. GigaScience 14 (2025).

  32. Chen, L. et al. Chromosome-level genome of Poropuntius huangchuchieni provides a diploid progenitor-like reference genome for the allotetraploid Cyprinus carpio. Molecular Ecology Resources 21, 1658–1669 (2021).

  33. Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research 33, W465–W467 (2005).

  34. Solovyev, V., Kosarev, P., Seledsov, I. & Vorobyev, D. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biology 7 (Suppl. 1), S10.1–12 (2006).

  35. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7 (2008).

  36. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Research 27, 49–54 (1999).

  37. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28, 27–30 (2000).

  38. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60 (2015).

  39. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).

  40. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP604471 (2025).

  41. Li, J. Chromosome-level genome assembly of Acrossocheilus longipinnis using PacBio sequencing and Hi-C technology. Figshare. Dataset. https://doi.org/10.6084/m9.figshare.29665907.v1 (2025).

  42. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21 (2020).

  43. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  44. Yin, D. et al. Telomere-to-telomere gap-free genome assembly of the endangered Yangtze finless porpoise and East Asian finless porpoise. GigaScience 13 (2024).

  45. Li, K., Xu, P., Wang, J., Yi, X. & Jiao, Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nature Communications 14, 6556 (2023).

  46. Chen, Q., Yang, C., Zhang, G. & Wu, D. GCI: a continuity inspector for complete genome assembly. Bioinformatics 40 (2024).

  47. Huang, Z. et al. Evolutionary analysis of a complete chicken genome. Proceedings of the National Academy of Sciences USA 120, e2216641120 (2023).

  48. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  49. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770991 (2025).

  50. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770992 (2025).

  51. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770990 (2025).

  52. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770993 (2025).

  53. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_054083375.1 (2025).

Acknowledgements

This work was supported by the operating funds of the Hongshui River Rare Fish Conservation Center.

Author information

Authors and Affiliations

  1. Scientific Institute of Pearl River Water Resources Protection, Guangzhou, 510611, China

    Zechen E, Fangyuan Xiong, Yuansheng Zhu, Li Wang, Jiajun Zhang, Shenghui Dong & Mingxiang Lu

  2. Hongshui River Rare Fish Conservation Center, Guigang, 537200, China

    Zechen E, Fangyuan Xiong, Yuansheng Zhu, Li Wang, Jiajun Zhang, Shenghui Dong & Mingxiang Lu

  3. Engineering Research Center of Hongshui River Rare Fish Conservation, Guangxi Zhuang Autonomous Region, Guigang, 537200, China

    Zechen E, Fangyuan Xiong, Yuansheng Zhu, Li Wang, Jiajun Zhang, Shenghui Dong & Mingxiang Lu

Contributions

Zechen E conceived this study, designed the experiment, and performed data analysis. Fangyuan Xiong contributed to the experimental design, collected samples, and performed data analysis. Yuansheng Zhu and Li Wang provided funding and contributed to conceptualization. Jiajun Zhang and Shenghui Dong assisted in methodology and data curation. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Fangyuan Xiong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

E, Z., Xiong, F., Zhu, Y. et al. Chromosome-level genome assembly of the longfin barb (Acrossocheilus longipinnis). Sci Data (2026). https://doi.org/10.1038/s41597-026-06656-y

  • Received: 17 September 2025

  • Accepted: 19 January 2026

  • Published: 30 January 2026

  • DOI: https://doi.org/10.1038/s41597-026-06656-y
