Scientific Data
A telomere-to-telomere reference genome assembly of the Hypomesus nipponensis
  • Data Descriptor
  • Open access
  • Published: 27 March 2026

  • Yanfeng Zhou1,
  • Di’an Fang1 (ORCID: orcid.org/0000-0003-2151-1865),
  • Yang You1,
  • Fujiang Tang2,
  • Yulin Bai1,
  • Minying Zhang1,
  • Xuemei Li3,
  • Guoping Deng4 &
  • Dongpo Xu1 

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Comparative genomics
  • Genetic variation

Abstract

Hypomesus nipponensis is a small cold-water teleost endemic to Northeast Asia with a short life cycle, high fecundity, and rapid population growth, and it has been widely introduced across East Asia for aquaculture. We generated a gap-free, telomere-to-telomere (T2T) genome assembly of H. nipponensis by combining MGI short reads, PacBio High-Fidelity (HiFi) reads, Oxford Nanopore Technologies (ONT) ultra-long reads, and Hi-C data. The final assembly spans 526.31 Mb with a contig N50 of 20.23 Mb, and all genomic sequences were anchored to 28 pseudochromosomes. BUSCO assessment (Actinopterygii_odb10) indicates 98.19% completeness: 3,548 single-copy and 26 duplicated orthologs out of 3,640 conserved genes. Repetitive elements account for 39.17% (206.18 Mb) of the genome, and 31,310 protein-coding genes were annotated. This gap-free T2T assembly resolves previously uncharacterized genomic regions and provides a high-quality reference for molecular breeding, evolutionary analyses of the genus Hypomesus, and functional studies of adaptive traits in cold-water fishes.
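
The percentages quoted in the abstract can be re-derived from the counts and sizes it reports. The following is a minimal sketch of that arithmetic, not part of the authors' pipeline; the constant and function names are our own, and only the numbers come from the abstract.

```python
# Figures quoted in the abstract; the names are illustrative, not the authors'.
ASSEMBLY_MB = 526.31      # total assembly size
REPEAT_MB = 206.18        # sequence annotated as repeats
BUSCO_TOTAL = 3640        # conserved genes in Actinopterygii_odb10
BUSCO_SINGLE = 3548       # complete, single-copy
BUSCO_DUPLICATED = 26     # complete, duplicated


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Complete BUSCOs are the sum of single-copy and duplicated hits.
busco_complete = BUSCO_SINGLE + BUSCO_DUPLICATED
busco_pct = percent(busco_complete, BUSCO_TOTAL)

# Everything else is fragmented or missing; the abstract does not split these.
busco_not_complete = BUSCO_TOTAL - busco_complete

repeat_pct = percent(REPEAT_MB, ASSEMBLY_MB)

print(f"Complete BUSCOs: {busco_complete}/{BUSCO_TOTAL} ({busco_pct}%)")
print(f"Fragmented or missing BUSCOs: {busco_not_complete}")
print(f"Repeat fraction: {repeat_pct}%")
```

Both derived values agree with the reported 98.19% and 39.17% to two decimal places.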

Data availability

All data supporting this study are publicly available. Raw sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1282796 (ref. 49), including RNA-seq data (SRR34259912 to SRR34259916), MGI genome survey data (SRR34259908), Hi-C reads (SRR34259911), Nanopore long-read data (SRR34259910) and PacBio long-read data (SRR34259909). The genome assembly has been deposited in NCBI GenBank under accession number GCA_054491055.1 (ref. 50). The genome assembly and gene structure annotation are also available on Figshare (https://doi.org/10.6084/m9.figshare.29672606.v1) (ref. 51).
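
For readers who want to script against these deposits, the run accessions listed above can be collected into a small manifest. This is a sketch under our own assumptions: the helper `expand_srr_range` and the `RUNS` dictionary are hypothetical names, the RNA-seq range is assumed to be inclusive and contiguous as the text implies, and no download is performed. It requires Python 3.9 or later for the built-in generic type hints.

```python
import re


def expand_srr_range(first: str, last: str) -> list[str]:
    """Expand an inclusive range of SRA run accessions (hypothetical helper)."""
    pattern = re.compile(r"SRR(\d+)")
    m_first, m_last = pattern.fullmatch(first), pattern.fullmatch(last)
    if m_first is None or m_last is None:
        raise ValueError("expected accessions of the form SRR<digits>")
    start, end = int(m_first.group(1)), int(m_last.group(1))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"SRR{n}" for n in range(start, end + 1)]


# Run accessions as listed in the Data availability statement.
RUNS = {
    "RNA-seq": expand_srr_range("SRR34259912", "SRR34259916"),
    "MGI genome survey": ["SRR34259908"],
    "PacBio long reads": ["SRR34259909"],
    "Nanopore long reads": ["SRR34259910"],
    "Hi-C": ["SRR34259911"],
}

# Flatten to a sorted list, e.g. to feed a download tool of your choice.
all_runs = sorted(acc for runs in RUNS.values() for acc in runs)

for data_type, runs in RUNS.items():
    print(f"{data_type}: {', '.join(runs)}")
print(f"Total runs: {len(all_runs)}")
```

Keeping the data type as the dictionary key makes it easy to route each run to the appropriate step (for example, Hi-C reads to scaffolding and RNA-seq reads to annotation) without hard-coding accessions in several places.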

Code availability

All scripts and pipelines used for the genome assembly and gene annotation followed the standard manuals and protocols of the applied bioinformatics software. No specific code was developed for this study.

References

  1. Sakamoto, D. et al. Population size estimation of the pond smelt Hypomesus nipponensis in Lake Kasumigaura and Lake Kitaura, Japan. Fisheries Science 80, 907–914, https://doi.org/10.1007/s12562-014-0791-1 (2014).

  2. Xie, Y. et al. The fishes of genus Hypomesus and utilization of its resource (in Chinese) (Liaoning Science and Technology Press, 1992).

  3. Yin, C., Chen, Y., Guo, L. & Ni, L. Fish Assemblage Shift after Japanese Smelt (Hypomesus nipponensis McAllister, 1963) Invasion in Lake Erhai, a Subtropical Plateau Lake in China. Water 13, 1800, https://doi.org/10.3390/w13131800 (2021).

  4. Choi, S. & Kim, E. B. Complete mitochondrial genome sequence and SNPs of the Korean smelt Hypomesus nipponensis (Osmeriformes, Osmeridae). Mitochondrial DNA Part B 4, 1844–1845, https://doi.org/10.1080/23802359.2019.1613178 (2019).

  5. Xuan, B. et al. Draft genome of the Korean smelt Hypomesus nipponensis and its transcriptomic responses to heat stress in the liver and muscle. G3 (Bethesda) 11, https://doi.org/10.1093/g3journal/jkab147 (2021).

  6. Zhu, C., Kuang, Y., Li, Z. & Tang, F. Chromosome-level draft genome assembly of Hypomesus nipponensis reveals transposable element expansion reshaping the genome structure. Front Genet 16, 1502681, https://doi.org/10.3389/fgene.2025.1502681 (2025).

  7. Shay, J. W. & Wright, W. E. Telomeres and telomerase: three decades of progress. Nat Rev Genet 20, 299–309, https://doi.org/10.1038/s41576-019-0099-1 (2019).

  8. Wu, M. et al. Segrosome assembly at the pliable parH centromere. Nucleic Acids Res 39, 5082–5097, https://doi.org/10.1093/nar/gkr115 (2011).

  9. Jain, M. et al. Linear assembly of a human centromere on the Y chromosome. Nature Biotechnology 36, 321–323, https://doi.org/10.1038/nbt.4109 (2018).

  10. Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965, https://doi.org/10.1126/science.abj6965 (2022).

  11. Yin, D. et al. Telomere-to-telomere gap-free genome assembly of the endangered Yangtze finless porpoise and East Asian finless porpoise. GigaScience 13, https://doi.org/10.1093/gigascience/giae067 (2024).

  12. Zhou, Y. et al. Gap-free genome assembly of Salangid icefish Neosalanx taihuensis. Scientific Data 10, 768, https://doi.org/10.1038/s41597-023-02677-z (2023).

  13. Zhou, Y. et al. Telomere-to-telomere genome and resequencing of 231 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis. GigaScience 14, https://doi.org/10.1093/gigascience/giaf067 (2025).

  14. Jiang, M. et al. The telomere-to-telomere gap-free reference genome and taxonomic reassessment of Siniperca roulei. GigaScience 14, https://doi.org/10.1093/gigascience/giaf068 (2025).

  15. Cheng, H. et al. Efficient near telomere-to-telomere assembly of Nanopore Simplex reads. bioRxiv, https://doi.org/10.1101/2025.04.14.648685 (2025).

  16. Healey, A., Furtado, A., Cooper, T. & Henry, R. J. Protocol: a simple method for extracting next-generation sequencing quality genomic DNA from recalcitrant plant species. Plant Methods 10, 21, https://doi.org/10.1186/1746-4811-10-21 (2014).

  17. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).

  18. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics 13, 278–289, https://doi.org/10.1016/j.gpb.2015.08.002 (2015).

  19. Zhu, W. et al. Altered chromatin compaction and histone methylation drive non-additive gene expression in an interspecific Arabidopsis hybrid. Genome Biology 18, 157, https://doi.org/10.1186/s13059-017-1281-4 (2017).

  20. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).

  21. Sun, H., Ding, J., Piednoël, M. & Schneeberger, K. findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics (Oxford, England) 34, 550–557, https://doi.org/10.1093/bioinformatics/btx637 (2018).

  22. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).

  23. Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biology 25, 107, https://doi.org/10.1186/s13059-024-03252-4 (2024).

  24. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, https://doi.org/10.1371/journal.pone.0112963 (2014).

  25. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460, https://doi.org/10.1186/s12859-018-2485-7 (2018).

  26. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923 (2012).

  27. Servant, N. et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology 16, https://doi.org/10.1186/s13059-015-0831-x (2015).

  28. Durand, N. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).

  29. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, eaal3327, https://doi.org/10.1126/science.aal3327 (2017).

  30. Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell systems 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).

  31. Wang, G. & Yu, W. J. A preliminary study on the karyotype of Hypomesus olidus. Salmon Fishery 2(1), n.p. (in Chinese) (1989).

  32. Xu, G. C. et al. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 8, https://doi.org/10.1093/gigascience/giy157 (2019).

  33. Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res, https://doi.org/10.1093/hr/uhad127 (2023).

  34. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110, 462–467, https://doi.org/10.1159/000084979 (2005).

  35. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics (Oxford, England) 21(Suppl 1), i351–358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).

  36. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research 35, W265–268, https://doi.org/10.1093/nar/gkm286 (2007).

  37. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).

  38. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).

  39. Liu, L. et al. Multiomics analysis reveals signatures of selection and loci associated with complex traits in pigs. Imeta 3, e250, https://doi.org/10.1002/imt2.250 (2024).

  40. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360, https://doi.org/10.1038/nmeth.3317 (2015).

  41. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278, https://doi.org/10.1186/s13059-019-1910-1 (2019).

  42. Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol Biol 1962, 161–177, https://doi.org/10.1007/978-1-4939-9173-0_9 (2019).

  43. Haas, B. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).

  44. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Research 27, 49–54, https://doi.org/10.1093/nar/27.1.49 (1999).

  45. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28, 27–30, https://doi.org/10.1093/nar/28.1.27 (2000).

  46. Tatusov, R., Galperin, M., Natale, D. & Koonin, E. The COG Database: A Tool for Genome-Scale Analysis of Protein Functions and Evolution. Nucleic Acids Research 28, https://doi.org/10.1093/nar/28.1.33 (2000).

  47. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176 (2015).

  48. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).

  49. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP595455 (2025).

  50. NCBI GenBank. https://identifiers.org/ncbi/insdc.gca:GCA_054491055.1 (2026).

  51. Zhou, Y. Telomere-to-telomere genome assembly of Hypomesus nipponensis. figshare. Dataset. https://doi.org/10.6084/m9.figshare.29672606.v1 (2025).

  52. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).

  53. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).

  54. Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574, https://doi.org/10.1093/bioinformatics/btab705 (2021).

  55. Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol 14, e1005944, https://doi.org/10.1371/journal.pcbi.1005944 (2018).

  56. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: Genomics https://doi.org/10.48550/arXiv.1303.3997 (2013).

Acknowledgements

This work was financially supported by the Earmarked Fund for the National Key R&D Program of China (Grant No. 2023YFD2400900) and the Modern Agricultural Technology System Grant (CARS-46).

Author information

Authors and Affiliations

  1. Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi, 214081, China

    Yanfeng Zhou, Di’an Fang, Yang You, Yulin Bai, Minying Zhang & Dongpo Xu

  2. Heilongjiang River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Harbin, 150070, China

    Fujiang Tang

  3. Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Wuhan, China

    Xuemei Li

  4. Dalian Ocean University, Dalian, 116023, China

    Guoping Deng

Contributions

D. Xu conceived and designed the study. Y. Zhou, D. Fang, Y. You and X. Li collected the samples and conducted the experiments. F. Tang, Y. Bai and M. Zhang performed the bioinformatics analysis. Y. Zhou, G. Deng and D. Xu wrote and revised the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Yanfeng Zhou or Dongpo Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhou, Y., Fang, D., You, Y. et al. A telomere-to-telomere reference genome assembly of the Hypomesus nipponensis. Sci Data (2026). https://doi.org/10.1038/s41597-026-07078-6

  • Received: 15 September 2025

  • Accepted: 12 March 2026

  • Published: 27 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07078-6

Scientific Data (Sci Data)

ISSN 2052-4463 (online)

© 2026 Springer Nature Limited
