Abstract
Hypomesus nipponensis is a small cold-water teleost endemic to Northeast Asia with a short life cycle, high fecundity, and rapid population growth, and it has been widely introduced across East Asia for aquaculture. In this study, we generated a gap-free, telomere-to-telomere (T2T) genome assembly of H. nipponensis using a combined sequencing strategy that incorporated MGI short reads, PacBio High-Fidelity (HiFi) reads, Oxford Nanopore Technologies (ONT) ultra-long reads, and Hi-C data. The final assembly spans 526.31 Mb with a contig N50 of 20.23 Mb, and all genomic sequences were anchored to 28 pseudochromosomes. BUSCO assessment (Actinopterygii_odb10) indicates 98.19% completeness, comprising 3,548 single-copy and 26 duplicated orthologs out of 3,640 conserved genes. Repeat elements account for 39.17% (206.18 Mb) of the genome, and 31,310 protein-coding genes were annotated. This gap-free T2T assembly resolves previously uncharacterized genomic regions and provides a high-quality reference for molecular breeding, evolutionary analyses of the genus Hypomesus, and functional investigations into the adaptive traits of cold-water fishes.
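The summary percentages quoted in the abstract can be checked against the counts given alongside them. The following is a minimal illustrative sketch, not part of the authors' pipeline: the variable and function names are assumptions, and the contig lengths passed to `n50` are made-up toy values used only to show how an N50 is usually defined.

```python
# Illustrative only: recompute the abstract's summary percentages from its counts.
# All names here are hypothetical; the numbers are those quoted in the abstract.

def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    half_total = sum(lengths) / 2.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")

# BUSCO: complete = single-copy + duplicated, out of all conserved genes searched.
busco_completeness = percent(3548 + 26, 3640)

# Repeat content: repeat span (Mb) over total assembly span (Mb).
repeat_fraction = percent(206.18, 526.31)

# Toy contig lengths (NOT from the paper) to show the N50 definition.
toy_n50 = n50([10, 8, 6, 4, 2])

print(f"BUSCO completeness: {busco_completeness:.2f}%")
print(f"Repeat fraction:    {repeat_fraction:.2f}%")
print(f"Toy N50:            {toy_n50}")
```

Rounded to two decimals, both percentages agree with the values reported in the abstract.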
Data availability
All data supporting this study are publicly available. Raw sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA1282796 (ref. 49), including RNA-seq data (SRR34259912 to SRR34259916), MGI genome survey data (SRR34259908), Hi-C reads (SRR34259911), Nanopore long-read data (SRR34259910) and PacBio long-read data (SRR34259909). The genome assembly has been deposited in NCBI GenBank under accession number GCA_054491055.1 (ref. 50). The genome assembly and gene structure annotation are also available on Figshare (https://doi.org/10.6084/m9.figshare.29672606.v1; ref. 51).
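For readers who want to script against these records, the accession range given for the RNA-seq runs can be expanded into individual run identifiers. This is a minimal sketch under stated assumptions: the helper `expand_accession_range` and the dictionary keys are hypothetical names introduced here, and the code only manipulates strings; it does not query or download anything from NCBI.

```python
# Illustrative only: expand an inclusive SRA run range such as
# "SRR34259912 to SRR34259916" into individual accessions.
# The helper name and dictionary layout are assumptions, not part of the study.

def expand_accession_range(first, last):
    """Return every accession from first to last, inclusive."""
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix) or not last[len(prefix):].isdigit():
        raise ValueError("accessions must share the same alphabetic prefix")
    width = len(first) - len(prefix)          # preserve any zero padding
    start = int(first[len(prefix):])
    stop = int(last[len(prefix):])
    if stop < start:
        raise ValueError("last accession precedes first")
    return [f"{prefix}{number:0{width}d}" for number in range(start, stop + 1)]

# Run accessions as listed in the Data availability statement.
runs = {
    "rna_seq": expand_accession_range("SRR34259912", "SRR34259916"),
    "mgi_survey": ["SRR34259908"],
    "pacbio": ["SRR34259909"],
    "nanopore": ["SRR34259910"],
    "hi_c": ["SRR34259911"],
}

for dataset, accessions in runs.items():
    print(dataset, ", ".join(accessions))
```

The resulting list could then be handed to whichever download tool a reader prefers; that step is deliberately left out here.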
Code availability
All scripts and pipelines used for the genome assembly and gene annotation followed the standard manuals and protocols of the applied bioinformatics software. No specific code was developed for this study.
References
Sakamoto, D. et al. Population size estimation of the pond smelt Hypomesus nipponensis in Lake Kasumigaura and Lake Kitaura, Japan. Fisheries Science 80, 907–914, https://doi.org/10.1007/s12562-014-0791-1 (2014).
Xie, Y. et al. The fishes of genus Hypomesus and utilization of its resource (in Chinese) (Liaoning Science and Technology Press, 1992).
Yin, C., Chen, Y., Guo, L. & Ni, L. Fish Assemblage Shift after Japanese Smelt (Hypomesus nipponensis McAllister, 1963) Invasion in Lake Erhai, a Subtropical Plateau Lake in China. Water 13, 1800, https://doi.org/10.3390/w13131800 (2021).
Choi, S. & Kim, E. B. Complete mitochondrial genome sequence and SNPs of the Korean smelt Hypomesus nipponensis (Osmeriformes, Osmeridae). Mitochondrial DNA Part B 4, 1844–1845, https://doi.org/10.1080/23802359.2019.1613178 (2019).
Xuan, B. et al. Draft genome of the Korean smelt Hypomesus nipponensis and its transcriptomic responses to heat stress in the liver and muscle. G3 (Bethesda) 11, https://doi.org/10.1093/g3journal/jkab147 (2021).
Zhu, C., Kuang, Y., Li, Z. & Tang, F. Chromosome-level draft genome assembly of Hypomesus nipponensis reveals transposable element expansion reshaping the genome structure. Front Genet 16, 1502681, https://doi.org/10.3389/fgene.2025.1502681 (2025).
Shay, J. W. & Wright, W. E. Telomeres and telomerase: three decades of progress. Nat Rev Genet 20, 299–309, https://doi.org/10.1038/s41576-019-0099-1 (2019).
Wu, M. et al. Segrosome assembly at the pliable parH centromere. Nucleic Acids Res 39, 5082–5097, https://doi.org/10.1093/nar/gkr115 (2011).
Jain, M. et al. Linear assembly of a human centromere on the Y chromosome. Nature Biotechnology 36, 321–323, https://doi.org/10.1038/nbt.4109 (2018).
Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965, https://doi.org/10.1126/science.abj6965 (2022).
Yin, D. et al. Telomere-to-telomere gap-free genome assembly of the endangered Yangtze finless porpoise and East Asian finless porpoise. GigaScience 13, https://doi.org/10.1093/gigascience/giae067 (2024).
Zhou, Y. et al. Gap-free genome assembly of Salangid icefish Neosalanx taihuensis. Scientific Data 10, 768, https://doi.org/10.1038/s41597-023-02677-z (2023).
Zhou, Y. et al. Telomere-to-telomere genome and resequencing of 231 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis. GigaScience 14, https://doi.org/10.1093/gigascience/giaf067 (2025).
Jiang, M. et al. The telomere-to-telomere gap-free reference genome and taxonomic reassessment of Siniperca roulei. GigaScience 14, https://doi.org/10.1093/gigascience/giaf068 (2025).
Cheng, H. et al. Efficient near telomere-to-telomere assembly of Nanopore Simplex reads. bioRxiv, https://doi.org/10.1101/2025.04.14.648685 (2025).
Healey, A., Furtado, A., Cooper, T. & Henry, R. J. Protocol: a simple method for extracting next-generation sequencing quality genomic DNA from recalcitrant plant species. Plant Methods 10, 21, https://doi.org/10.1186/1746-4811-10-21 (2014).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics 13, 278–289, https://doi.org/10.1016/j.gpb.2015.08.002 (2015).
Zhu, W. et al. Altered chromatin compaction and histone methylation drive non-additive gene expression in an interspecific Arabidopsis hybrid. Genome Biology 18, 157, https://doi.org/10.1186/s13059-017-1281-4 (2017).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Sun, H., Ding, J., Piednoël, M. & Schneeberger, K. findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics 34, 550–557, https://doi.org/10.1093/bioinformatics/btx637 (2018).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biology 25, 107, https://doi.org/10.1186/s13059-024-03252-4 (2024).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, https://doi.org/10.1371/journal.pone.0112963 (2014).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460, https://doi.org/10.1186/s12859-018-2485-7 (2018).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923 (2012).
Servant, N. et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology 16, https://doi.org/10.1186/s13059-015-0831-x (2015).
Durand, N. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, eaal3327, https://doi.org/10.1126/science.aal3327 (2017).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Systems 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Wang, G. & Yu, W. J. A preliminary study on the karyotype of Hypomesus olidus. Salmon Fishery 2(1), n.p. (in Chinese) (1989).
Xu, G. C. et al. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. GigaScience 8, https://doi.org/10.1093/gigascience/giy157 (2019).
Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res, https://doi.org/10.1093/hr/uhad127 (2023).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110, 462–467, https://doi.org/10.1159/000084979 (2005).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–i358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research 35, W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).
Liu, L. et al. Multiomics analysis reveals signatures of selection and loci associated with complex traits in pigs. iMeta 3, e250, https://doi.org/10.1002/imt2.250 (2024).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360, https://doi.org/10.1038/nmeth.3317 (2015).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278, https://doi.org/10.1186/s13059-019-1910-1 (2019).
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol Biol 1962, 161–177, https://doi.org/10.1007/978-1-4939-9173-0_9 (2019).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Research 27, 49–54, https://doi.org/10.1093/nar/27.1.49 (1999).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28, 27–30, https://doi.org/10.1093/nar/28.1.27 (2000).
Tatusov, R., Galperin, M., Natale, D. & Koonin, E. The COG Database: A Tool for Genome-Scale Analysis of Protein Functions and Evolution. Nucleic Acids Research 28, https://doi.org/10.1093/nar/28.1.33 (2000).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176 (2015).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP595455 (2025).
NCBI GenBank. https://identifiers.org/ncbi/insdc.gca:GCA_054491055.1 (2026).
Zhou, Y. Telomere-to-telomere genome assembly of Hypomesus nipponensis. figshare. Dataset. https://doi.org/10.6084/m9.figshare.29672606.v1 (2025).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).
Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574, https://doi.org/10.1093/bioinformatics/btab705 (2021).
Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol 14, e1005944, https://doi.org/10.1371/journal.pcbi.1005944 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: Genomics https://doi.org/10.48550/arXiv.1303.3997 (2013).
Acknowledgements
This work was financially supported by the Earmarked Fund for the National Key R&D Program of China (Grant No. 2023YFD2400900) and the Modern Agricultural Technology System Grant (CARS-46).
Author information
Contributions
D. Xu conceived and designed the study. Y. Zhou, D. Fang, Y. You and X. Li collected the samples and conducted the experiments. F. Tang, Y. Bai and M. Zhang performed the bioinformatic analyses. Y. Zhou, G. Deng and D. Xu wrote and revised the manuscript. All authors have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhou, Y., Fang, D., You, Y. et al. A telomere-to-telomere reference genome assembly of the Hypomesus nipponensis. Sci Data (2026). https://doi.org/10.1038/s41597-026-07078-6
DOI: https://doi.org/10.1038/s41597-026-07078-6


