Abstract
Zacco platypus is a freshwater minnow widely distributed across East Asia, noted for its high environmental adaptability, strong reproductive capacity, and ability to hybridize across genera. The type specimen was described from Japan, yet no representative genome from the Japanese lineage has been reported to date. Introduced to Taiwan in the 1980s, Z. platypus rapidly established a stable population. In this study, we assembled a chromosome-level genome from a Taiwanese population and resequenced a native Japanese individual, providing genomic evidence that the Taiwanese lineage originated from Japan. The assembly and gene annotation achieved BUSCO completeness scores of 98.9% and 96.6%, respectively, representing the highest quality reported among published Z. platypus genomes. Furthermore, we identified chromosomal structural variation among populations, and both PCA and genetic distance analyses revealed that the Japanese lineage is distinct from continental populations, indicating the importance of representative genomes across geographic lineages. This high-quality genome provides a valuable resource for future research in comparative genomics, population genetics, hybridization, speciation, and evolutionary biology.
Data availability
The chromosome-level genome assembly is available at GenBank under the accession number JBHGZP000000000 (ref. 36). The genome annotation file has been deposited in the Figshare database (ref. 37). For Taiwanese Z. platypus, the Nanopore long reads are available under accession SRR30669803 (ref. 38), the Illumina short reads under SRR30669440 (ref. 39), the Hi-C reads under SRR30669902 (ref. 40), and the RNA-seq reads under SRR30669437–SRR30669439 (refs. 41–43). The Illumina short reads of Japanese Z. platypus are available under accession SRR35182058 (ref. 44).
Code availability
No specific scripts were developed for this project. All data processing and bioinformatics analyses were conducted using publicly available software, following protocols and manuals provided by each respective tool.
References
Ma, G. C., Watanabe, K., Tsao, H. S. & Yu, H. T. Mitochondrial phylogeny reveals the artificial introduction of the pale chub (Cyprinidae) in Taiwan. Ichthyol Res 53, 323–329, https://doi.org/10.1007/s10228-006-0353-3 (2006).
Fu, S.-J., Cao, Z.-D., Yan, G.-J., Fu, C. & Pang, X. Integrating environmental variation, predation pressure, phenotypic plasticity and locomotor performance. Oecologia 173, 343–354, https://doi.org/10.1007/s00442-013-2626-7 (2013).
Arao, K. & Shimoyama, J. Hybrids between Zacco platypus and Z. temminckii from Aichi prefecture, Japan. Sci Rep Toyohashi Mus Nat Hist 16, 53–54 (2006).
Liao, N. L., Huang, S. P. & Wang, T. Y. Interspecific mating behavior between introduced Zacco platypus and native Opsariichthys evolans in Taiwan. Zool Stud 59, e6, https://doi.org/10.6620/zs.2020.59-6 (2020).
Wang, C.-F., Chang, G.-C., Wang, Y.-Q., Chen, Q.-Y. & Lin, G.-Y. Do introduced Opsariichthys and native Opsariichthys interbreed naturally? (New Taipei Municipal Ming Der High School, New Taipei City, Taiwan, 2011).
Siebold, P. F. V., Haan, W. D., Schlegel, H. & Temminck, C. J. Fauna japonica, sive, Descriptio animalium, quae in itinere per Japoniam, jussu et auspiciis, superiorum, qui summum in India Batava imperium tenent, suscepto, annis 1823-1830. Vol. v.[2] Pisces (Apud Auctorem, 1835).
Nam, S.-E. & Rhee, J.-S. Chromosomal-level genome assembly data from the pale chub, Zacco platypus (Jordan & Evermann, 1902). Data in Brief 55, 110596, https://doi.org/10.1016/j.dib.2024.110596 (2024).
Xu, X. et al. A chromosome-level genome assembly of East Asia endemic minnow Zacco platypus. Scientific Data 11, 317, https://doi.org/10.1038/s41597-024-03163-w (2024).
Perdices, A. & Coelho, M. M. Comparative phylogeography of Zacco platypus and Opsariichthys bidens (Teleostei, Cyprinidae) in China based on cytochrome b sequences. Journal of Zoological Systematics and Evolutionary Research 44, 330–338, https://doi.org/10.1111/j.1439-0469.2006.00368.x (2006).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial Genomics 3, https://doi.org/10.1099/mgen.0.000132 (2017).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. (2010).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37, 540–546, https://doi.org/10.1038/s41587-019-0072-8 (2019).
Zimin, A. V. & Salzberg, S. L. The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. PLOS Computational Biology 16, e1007981, https://doi.org/10.1371/journal.pcbi.1007981 (2020).
Laetsch, D. & Blaxter, M. BlobTools: Interrogation of genome assemblies [version 1; peer review: 2 approved with reservations]. F1000Research 6, https://doi.org/10.12688/f1000research.12232.1 (2017).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular Biology and Evolution 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research 35, W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
Ou, S. & Jiang, N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA 10, 48, https://doi.org/10.1186/s13100-019-0193-0 (2019).
Ou, S. & Jiang, N. LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176, 1410–1422, https://doi.org/10.1104/pp.17.01310 (2018).
Shao, F., Wang, J., Xu, H. & Peng, Z. FishTEDB: a collective database of transposable elements identified in the complete genomes of fish. Database 2018, bax106, https://doi.org/10.1093/database/bax106 (2018).
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Research 44, D81–D89, https://doi.org/10.1093/nar/gkv1272 (2016).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11, https://doi.org/10.1186/s13100-015-0041-9 (2015).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics 25, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics 3, lqaa108, https://doi.org/10.1093/nargab/lqaa108 (2021).
Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M. & Stanke, M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 22, 566, https://doi.org/10.1186/s12859-021-04482-0 (2021).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25–29, https://doi.org/10.1038/75556 (2000).
Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: biological systems database as a model of the real world. Nucleic Acids Research 53, D672–D677, https://doi.org/10.1093/nar/gkae909 (2025).
Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research 45, e18–e18, https://doi.org/10.1093/nar/gkw955 (2017).
Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLOS Computational Biology 14, e1005944, https://doi.org/10.1371/journal.pcbi.1005944 (2018).
Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81, 559–575, https://doi.org/10.1086/519795 (2007).
Tai, J.-H. et al. The VCF file of four Zacco platypus lineages. figshare. https://doi.org/10.6084/m9.figshare.30600653 (2025).
He, W. et al. VCF2PCACluster: a simple, fast and memory-efficient tool for principal component analysis of tens of millions of SNPs. BMC Bioinformatics 25, 173, https://doi.org/10.1186/s12859-024-05770-1 (2024).
Wang, T.-Y. et al. Zacco platypus isolate ZpHy926, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBHGZP000000000 (2025).
Tai, J.-H. et al. Annotation files of Japan lineage Zacco platypus from Taiwan. figshare. https://doi.org/10.6084/m9.figshare.30011686.v1 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30669803 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30669440 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30669902 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30669437 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30669438 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30669439 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35182058 (2025).
Acknowledgements
This study was supported by grants from the National Science and Technology Council (NSTC), Taiwan (113-2327-B-002-003-, MOST 109-2311-B-002-023-MY3, MOST 105-2311-B-001-064, MOST 106-2311-B-001-022, and MOST 107-2311-B-001-007), and National Taiwan University (113L7223). It was also partially supported by the Taiwan BioGenome Project, funded by Academia Sinica, Taiwan (AS-Grant 23-23). Additional support was provided through the National Key Area International Cooperation Alliance: University Academic Alliance in Taiwan (UAAT) - Kyushu-Okinawa Open University (KOOU) - Medicine and Life Sciences Integrative Program, funded by the Ministry of Education, Taiwan, to promote international collaboration in cutting-edge research.
Author information
Authors and Affiliations
Contributions
H.Y.W. and T.Y.W. conceived and designed the study. T.Y.W., S.P.H., F.Y.W., Y.T., R.T., J.H.T. and T.H.Y. collected samples. J.H.T. and T.H.Y. performed the data analysis. T.Y.W. and Y.T. conducted experiments. J.H.T. wrote the manuscript. H.Y.W. and T.Y.W. revised the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tai, JH., Yu, TH., Wang, FY. et al. Chromosome-Level Genome Assembly of the Japanese Zacco platypus for Comparative Genomics. Sci Data (2025). https://doi.org/10.1038/s41597-025-06467-7
DOI: https://doi.org/10.1038/s41597-025-06467-7


