A telomere-to-telomere gap-free genome assembly of the endangered humphead wrasse (Cheilinus undulatus)

Zhang, Kai; Chen, Jianchao; Duan, Binwei; You, Canbei; Zhou, Wenchuan; Zhao, Yanfei; Yang, Shaosen; Wu, Jinhui; Shi, Qiong

doi:10.1038/s41597-025-05475-x

Data Descriptor
Open access
Published: 11 July 2025

Kai Zhang ORCID: orcid.org/0000-0002-3225-5994¹,
Jianchao Chen¹,
Binwei Duan¹,
Canbei You¹,
Wenchuan Zhou²,
Yanfei Zhao³,
Shaosen Yang³,
Jinhui Wu³ &
…
Qiong Shi ORCID: orcid.org/0000-0002-6358-976X¹,⁴

Scientific Data volume 12, Article number: 1194 (2025)

Abstract

Humphead wrasse, Cheilinus undulatus, is an endangered fish species with high economic and ecological value. It undergoes natural sex change from female to male, and sexual selection occurs in its breeding aggregations. In the present study, we constructed the first gap-free telomere-to-telomere (T2T) genome assembly for humphead wrasse by integrating PacBio HiFi, ONT ultra-long and Hi-C sequencing techniques. With 99% of the sequences anchored to 24 chromosomes, this haplotypic genome assembly spans approximately 1.25 Gb and presents a complete set of 48 telomeres and 24 centromeres. In terms of correctness (quality value QV: 53.447) and completeness (BUSCO score: 99.3%), this chromosome-scale assembly is of high quality. We predicted 658.03 Mb of repetitive sequences and annotated 25,064 protein-coding genes in the assembled genome. This high-quality T2T genome assembly not only facilitates the genetic conservation of humphead wrasse, but also offers fundamental genomic data for in-depth investigations of functional genomics, genetic diversity, and selective breeding in this economically important teleost.

Background & Summary

Humphead wrasse (Cheilinus undulatus), also commonly known as Maori or Napoleon wrasse, is an endangered fish species with significant ecological importance for coral reef ecosystems. As a member of the family Labridae within the order Perciformes, it is characterized by large size and striking appearance¹,². It has a sparse population, slow growth, a long lifespan (over 30 years), and intricate reproductive behaviors, yet it contributes significantly to bioerosion and sand production³. Previous investigations indicate that its populations are declining alarmingly because of ongoing overharvesting, habitat degradation, and climate change impacts⁴,⁵. Humphead wrasse has been classified as ‘Endangered’ on the IUCN (International Union for Conservation of Nature) Red List and is included in CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora) Appendix II⁶.

Like many other environmentally sensitive sex-changing species, humphead wrasse exhibits protogynous hermaphroditism, i.e., it transitions from female to male at approximately 8–9 years old, after attaining female sexual maturity at 5–7 years⁷,⁸. Males arise through two distinct developmental pathways (diandry), either developing directly from juveniles into small males (smaller than the smallest mature female) or transitioning from adult females through sex change to become large males (exceeding female size)⁴. However, the detailed molecular mechanisms of its sex change remain largely unknown. Additionally, owing to its unique visual system and fused pharyngeal bones, it serves as an excellent model organism for studying opsin evolution in coral reef fishes, and comparative studies with other fish genomes have demonstrated specific opsin gene expansions in humphead wrasse⁹.

In a previous genome study of humphead wrasse, a draft chromosome-level genome assembly was reported⁹. Nevertheless, that assembly contains numerous gaps and has low BUSCO values, leading to significant fragment loss that impairs both genome completeness and annotation accuracy. Here, utilizing high-throughput sequencing platforms including PacBio HiFi and Oxford Nanopore Technologies (ONT) ultra-long sequencing, we produced a refined telomere-to-telomere (T2T) chromosome-level genome assembly for humphead wrasse. This improved assembly has superior scaffold N50 and BUSCO scores, as well as a gap-free genome sequence in which telomeres and centromeres are resolved. This new genome assembly not only provides a valuable genetic resource for in-depth investigations of the population genetics and conservation biology of humphead wrasse, but also supports comparative and molecular studies on the regulation of natural sex change and opsin evolution in various vertebrates.

Methods

Sample collection

We obtained an adult humphead wrasse (Fig. 1a) from Guangdong Marine Fisheries Experimental Centre, an offsite facility of the Agro-Tech Extension Center of Guangdong Province, which is situated in Huizhou city, Guangdong province, China. Muscle tissue was collected for whole-genome sequencing, and ten tissues (including intestine, spleen, lung, heart, liver, muscle, gill, eye, skin, and gonad) were sampled for transcriptome sequencing. The sampling procedure and experimental workflow were performed in accordance with the guidelines and approval from the Animal Ethics Committee of Shenzhen University (Shenzhen, China).

DNA extraction and genome sequencing

Genomic DNA (gDNA) was extracted from the muscle tissue using a modified CTAB method¹⁰. The extracted gDNA was used to construct a BGISeq PCR-free DNA library, which was then sequenced on a BGI T7 platform (MGI, Shenzhen, China). A total of 57.98 Gb of raw reads (150 bp in length) were generated, from which low-quality reads and adaptor sequences were removed using Trimmomatic (v0.40)¹¹ with default settings. We thereby obtained 56.64 Gb of clean reads for estimating genome size and assembling sequences.

Moreover, we prepared long-read libraries with the SMRTbell Express Template Prep Kit 3.0 (Pacific Biosciences, Menlo Park, CA, USA) for HiFi sequencing on the PacBio Sequel II System. The CCS software (SMRT Link v9.0)¹² was then applied to generate consensus sequences. We obtained approximately 113.26 Gb of consensus reads, with an average length of 19.07 kb.

For ONT sequencing, an ultra-long library was constructed and sequenced on one flow cell of a PromethION platform (Oxford Nanopore Technologies Co., UK). Raw reads with a quality value (QV) below 7 were first removed. Subsequently, Porechop (https://github.com/rrwick/Porechop) was applied to trim adaptors, and Filtlong (https://github.com/rrwick/Filtlong) was employed to remove reads shorter than 30 kb or with a mean read quality score below 90%. Finally, we obtained a total of 27.88 Gb of clean reads, with an average read length of 96.15 kb and an N50 length of 100 kb.

DNA libraries for Hi-C sequencing were constructed with a GrandOmics Hi-C kit (GrandOmics, China), employing DpnII as the restriction enzyme, in accordance with the manufacturer’s instructions. Using the Illumina NovaSeq system (Illumina Inc., San Diego, CA, USA), we produced 74.82 Gb of raw reads from the Hi-C libraries. We then employed Trimmomatic (v0.40)¹¹ to remove low-quality reads (quality scores <20), adapter sequences, and reads shorter than 36 bp. After filtering, 71.53 Gb of clean data were available for subsequent chromosome scaffolding.
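As a rough cross-check on the data volumes reported in this section, nominal sequencing depth can be approximated as clean bases divided by genome length. The sketch below is illustrative only: it assumes the 1.25 Gb final assembly length as the denominator, the function name `depth` is hypothetical, and the resulting depths are derived estimates rather than values reported by the authors.

```python
# Clean data volumes (Gb) as reported in this section.
CLEAN_GB = {
    "BGI short reads": 56.64,
    "PacBio HiFi": 113.26,
    "ONT ultra-long": 27.88,
    "Hi-C": 71.53,
}

# Assumption: the final assembly length stands in for the true genome size.
ASSEMBLY_GB = 1.25


def depth(clean_gb: float, genome_gb: float = ASSEMBLY_GB) -> float:
    """Nominal depth = sequenced bases / genome length (both in Gb)."""
    if genome_gb <= 0:
        raise ValueError("genome size must be positive")
    return clean_gb / genome_gb


if __name__ == "__main__":
    for name, gb in CLEAN_GB.items():
        print(f"{name}: ~{depth(gb):.0f}x")
```

Using the k-mer-based genome size estimate instead of the assembly length would raise each figure slightly; either choice only gives an order-of-magnitude view of coverage.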

RNA extraction and transcriptome sequencing

We used poly‐T oligo‐attached magnetic beads to purify mRNAs from the ten sampled tissues (intestine, spleen, lung, heart, liver, muscle, gill, eye, skin, and gonad). Sequencing libraries were generated from the purified mRNAs using the VAHTS Universal V6 RNA-seq Library Kit for MGI (Vazyme, Nanjing, China) following the manufacturer’s recommendations with unique index codes. Library concentration and fragment size were assessed using a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and a Bioanalyzer 2100 system (Agilent Technologies, CA, USA). Subsequently, sequencing was performed on an MGISEQ-2000 platform by Frasergen Bioinformatics Co. Ltd. (Wuhan, China).

To obtain clean reads, adaptor sequences and low-quality raw reads were filtered via SOAPfilter (v2.2)¹³ with default parameters. In the end, the clean reads of the ten tissues (namely the intestine, spleen, lung, heart, liver, muscle, gill, eye, skin, and gonad) were 5.77, 5.58, 6.29, 8.56, 5.64, 8.38, 6.10, 6.14, 5.96 and 13.58 Gb, respectively. These retained data were collected for annotation of gene structures.

Genome assembly

Genome-size estimation

We employed Jellyfish (v2.2.6)¹⁴ and GenomeScope (v2.0)¹⁵ to analyze the K-mer frequency distribution of the BGI clean reads. Our results showed that the humphead wrasse genome was estimated to be 1.17 Gb in length, with a genomic heterozygosity rate of 0.27% (Fig. 1b).
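The idea behind k-mer-based size estimation can be illustrated with a simple peak-based calculation: genome size is roughly the total number of k-mers divided by the depth of the main peak in the k-mer frequency histogram. The sketch below is not GenomeScope's method (GenomeScope fits a mixture model that also yields heterozygosity); the function names and the `min_depth` error cutoff are assumptions made for illustration, and the input is assumed to be two-column "depth count" lines such as those written by `jellyfish histo`.

```python
from typing import Dict, Iterable


def parse_histo(lines: Iterable[str]) -> Dict[int, int]:
    """Parse two-column 'depth count' lines into {depth: count}."""
    histo: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        histo[int(parts[0])] = int(parts[1])
    return histo


def estimate_genome_size(histo: Dict[int, int], min_depth: int = 5) -> float:
    """Peak-based estimate: total k-mers / depth of the main peak.

    Depths below min_depth are treated as sequencing errors and ignored
    (an illustrative cutoff, not a value taken from the paper).
    """
    solid = {d: c for d, c in histo.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)          # most frequent depth
    total_kmers = sum(d * c for d, c in solid.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = parse_histo(["1 1000", "2 200", "9 10", "10 80", "11 10"])
    print(estimate_genome_size(toy))
```

In a heterozygous genome the tallest peak may be the heterozygous one at half the homozygous depth, which is one reason model-based tools are preferred for real data.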

De novo genome assembly

We applied HiFiasm (v0.19.5)¹⁶ to assemble HiFi and ONT long reads into contigs, which were then polished with the BGI short reads using T2T-polish¹⁷ with the parameter task = best. The initial genome assembly had a total length of 1.253 Gb and a contig N50 of 54.5 Mb, and comprised 53 contigs.

Construction of chromosomes and gap filling

Hi-C reads were aligned to the primary genome assembly using Bowtie2 (v2.3.2)¹⁸, followed by identification of valid contact paired reads through the HiC-Pro (v2.8.1) pipeline¹⁹. The assembled contigs were anchored to chromosomes using these Hi-C valid reads through the 3D-DNA pipeline²⁰ with the parameter -r 0, followed by manual refinement of the chromosome-level scaffolds in JuiceBox²¹. To close nucleotide gaps in the chromosome-level genome assembly, we utilized TGS-GapCloser (v1.1.1)²² with default parameters, leveraging both HiFi and ONT long reads. The final genome assembly spans 1.25 Gb, with 99% of the primary sequences anchored to 24 chromosomes, achieving a contig N50 of 54.51 Mb (Fig. 1c,d and Table 1).
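For readers unfamiliar with the contig N50 statistic quoted above, it is the length L such that contigs of length L or longer together cover at least half of the total assembly length. A minimal sketch follows; the function name `n50` and the toy lengths are illustrative and unrelated to the actual contig lengths of this assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

In a gap-free chromosome-level assembly each chromosome is a single contig, so contig N50 and scaffold N50 coincide.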

Table 1 Positions of telomeres and centromeres across all chromosomes in the assembled genome of the humphead wrasse.

Identification of centromere and telomere sequences

Telomere sequences were identified by detecting (TTAGGG/CCCTAA) repeats in telomeric regions, while centromeres were localized using the Centromics program (https://github.com/ShuaiNIEgithub/Centromics) to analyze HiFi sequencing data, Hi-C data, and the final genome assembly. Finally, we revealed that the humphead wrasse chromosomes possessed 48 telomeres and 24 centromeres (see more details in Fig. 2 and Table 1).
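The telomere search described above can be sketched as counting canonical vertebrate repeat units (TTAGGG, or CCCTAA on the opposite strand) within a window at each chromosome end. The paper does not state which tool or thresholds were used for this step, so the window size, the minimum unit count, and the function names below are assumptions for illustration only.

```python
from typing import Tuple

FORWARD_UNIT = "TTAGGG"
REVERSE_UNIT = "CCCTAA"


def count_units(segment: str) -> int:
    """Count non-overlapping telomeric units on whichever strand dominates."""
    s = segment.upper()
    return max(s.count(FORWARD_UNIT), s.count(REVERSE_UNIT))


def telomere_ends(seq: str, window: int = 10_000,
                  min_units: int = 20) -> Tuple[bool, bool]:
    """Report whether the left and right ends look telomeric.

    window and min_units are illustrative thresholds, not the authors'.
    """
    left = count_units(seq[:window])
    right = count_units(seq[-window:])
    return left >= min_units, right >= min_units


if __name__ == "__main__":
    toy = "CCCTAA" * 30 + "ACGT" * 1000 + "TTAGGG" * 30
    print(telomere_ends(toy, window=500))
```

A chromosome counts as telomere-to-telomere when both ends pass; applied to 24 chromosomes, this corresponds to the 48 telomeres reported.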

Genome annotation

Repeat annotation. Tandem repeats were identified using Tandem Repeats Finder (TRF, v4.09.1)²³ with the following parameters: 2 7 7 80 10 50 2000. Moreover, transposable elements (TEs) were detected through an integration of de novo prediction and homology searches at both DNA and protein levels. LTR retrotransposons were initially identified using LTR_FINDER (v1.0.7)²⁴ at the DNA level, while RepeatModeler (v2.0.1)²⁵ generated a classified de novo repeat library. Subsequently, RepeatMasker (v4.1.2)²⁶ performed comparative analyses against both the Repbase TE database²⁷ and the newly constructed repeat library. Protein-level TE annotation was performed using RepeatProteinMask²⁶ against the transposable element protein database. A total of 658.03 Mb repetitive sequences were detected in the humphead wrasse genome assembly (Table 2).

Table 2 The proportion of repetitive sequences identified in the humphead wrasse genome assembly.

Gene annotation

Protein-coding genes were predicted using a combination of homology-based, ab initio and transcriptome-assisted annotation approaches. The homology-based annotation was initiated by performing Tblastn (v2.11.0+)²⁸ searches against our assembly using protein sequences from four representative species, including yellowfin seabream (Acanthopagrus latus), sharksucker (Echeneis naucrates), zebrafish (Danio rerio), and medaka (Oryzias latipes). The high-quality alignments were subsequently refined using Exonerate (v2.4.0)²⁹ for precise gene model prediction. The de novo annotation was performed using Augustus (v3.4.0)³⁰ and GlimmerHMM (v3.0.4)³¹. For the transcriptome-assisted annotation, RNA-seq reads were first aligned to the reference genome using HiSat2 (v2.2.1)³², followed by transcript assembly through a genome-guided approach implemented in StringTie (v2.1.7)³³. Predicted gene models were integrated and refined using MAKER (v3.01.03)³⁴ to generate a non-redundant gene set. Final annotation improvements, including UTR annotation and alternative splicing variant prediction, were accomplished through the PASA pipeline (v2.4.1)³⁵. Ultimately, we annotated a total of 25,064 protein-coding genes, with an average gene length of 27.84 kb and a mean coding sequence (CDS) size of 1,745.64 bp (see Table 3).

Table 3 Gene structures and functional annotations.

Functional annotations

Functional annotation was performed by aligning protein sequences against multiple databases (NCBI NR, KEGG³⁶, GO³⁷, TrEMBL and Swiss-Prot³⁸) using DIAMOND BLASTP (v2.0.7)³⁹, with assignments based on best matches. Functional annotations were assigned to 24,789 genes (98.90%) with supportive evidence from at least one database (see more details in Table 3).

Annotation of non-coding RNA genes

tRNAscan-SE (v2.0.9)⁴⁰ with default settings was utilized to detect tRNA genes. Moreover, we applied RNAmmer (v1.2)⁴¹ to identify rRNA sequences. Annotation of miRNA and snRNA genes was performed using Infernal (v1.1.2)⁴² through homology searches against the Rfam database (v14.6)⁴³. Finally, a total of 2,221 rRNAs, 3,020 tRNAs, 781 miRNAs, and 646 snRNAs were predicted (see Table 4 for more details).

Table 4 Statistics of the non-coding RNA annotations.

Data Records

All genomic data are publicly available from the Genome Sequence Archive (GSA) of the National Genomics Data Center (NGDC) under accession no. CRA023609⁴⁴. The genome assembly has been submitted to the GenBank database with the accession number JBMUSF010000000⁴⁵. In addition, comprehensive documentation regarding the genome assembly, gene structures, functional annotations, and repeat elements of humphead wrasse has been deposited on Figshare⁴⁶.

Technical Validation

Evaluation of the genome assembly

Genome completeness was assessed using BUSCO (v5.2.2)⁴⁷ against the actinopterygii_odb10 database (3,640 single-copy orthologs). Our results demonstrated 98.9% complete gene coverage (including 98.4% single-copy and 0.5% duplicated genes), with only 0.4% fragmented sequences (see Table 5). Moreover, the Merqury (v1.3)⁴⁸ analysis estimated a genome assembly quality value of 53.45. Genome assembly accuracy was evaluated by aligning the sequencing datasets to the assembly, revealing mapping rates of 96.58% (for RNA-Seq reads), 99.64% (for BGI reads), 99.93% (for PacBio reads), and 100% (for ONT reads). These analyses collectively validate the high quality of this humphead wrasse genome assembly.
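To put the quality value in perspective, a Phred-scaled QV relates to the estimated per-base error probability E by QV = -10 log10(E). The short sketch below converts between the two; the function names are illustrative, and the conversion is the standard Phred definition rather than anything specific to this study.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled QV implied by a per-base error probability."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    e = qv_to_error_rate(53.45)
    print(f"QV 53.45 -> about {e * 1e6:.1f} errors per million bases")
```

Under this definition, a QV of 53.45 corresponds to roughly 4.5 errors per million bases, or about one error every 220 kb.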

Table 5 Statistics of BUSCO results of the T2T genome assembly.

Collinearity analysis

GenomeSyn (v1.2.7)⁴⁹ was employed for whole-genome synteny comparison between the newly assembled genome and the previously published version (GCF_018320785.1)⁹. Our findings revealed good one-to-one chromosomal synteny between both assemblies (Fig. 3), which further validates that our present assembly of the humphead wrasse genome is indeed of high quality.

Code availability

This study did not employ any custom code. In cases where specific parameters were unavailable for any software type, the default settings recommended by the developers were applied.

References

Oktaviani, D. et al. Initiating Napoleon wrasse (Cheilinus undulatus Ruppell, 1835) as watching species object in Banda Islands marine ecotourism. IOP Conference Series: Earth and Environmental Science 800, 012053 (2021).
Salvador, M. L. et al. Intact shallow and mesophotic assemblages of large carnivorous reef fishes underscore the importance of large and remote protected areas in the Coral Triangle. Aquatic Conserv: Mar Freshw Ecosyst 34, e4108 (2024).
Friedlander, A. M. et al. Assessing and managing charismatic marine megafauna in Palau: Bumphead parrotfish (Bolbometopon muricatum) and Napoleon wrasse (Cheilinus undulatus). Aquatic Conserv: Mar Freshw Ecosyst 33, 349–365 (2023).
Sadovy, Y. et al. The humphead wrasse, Cheilinus undulatus: Synopsis of a threatened and poorly known giant coral reef fish. Rev Fish Biol Fisher 13, 327–364 (2003).
Russell, B. Cheilinus undulatus. The IUCN Red List of Threatened Species 2004, e.T4592A11023949 (2004).
Donaldson, T. J. & Sadovy, Y. Threatened fishes of the world: Cheilinus undulatus Rüppell, 1835 (Labridae). Environ Biol Fish 62, 428 (2001).
Sadovy de Mitcheson, Y., Liu, M. & Suharti, S. Gonadal development in a giant threatened reef fish, the humphead wrasse Cheilinus undulatus, and its relationship to international trade. J Fish Biol. 77, 706–718 (2010).
Ji, X. et al. Identification of SF-1 and FOXL2 and their effect on activating P450 aromatase transcription via specific binding to the promoter motifs in sex reversing Cheilinus undulatus. Front Endocrinol 13, 863360 (2022).
Liu, D. et al. Chromosome‐level genome assembly of the endangered humphead wrasse Cheilinus undulatus: Insight into the expansion of opsin genes in fishes. Mol Ecol Resour 21, 2388–2406 (2021).
Gelvin, S. B., Schilperoort R. A. Plant molecular biology manual. Springer (2012).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Rhoads, A. & Au, K. F. PacBio sequencing and its applications. Genom. Proteom. Bioinfor. 13, 278–289 (2015).
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with HiFiasm. Nat. Methods 18, 170–175 (2021).
Mc Cartney, A. M. et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods 19, 687–695 (2022).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Xu, M. et al. TGS-GapCloser: fast and accurately passing through the Bermuda in large genome using error-prone third-generation long reads. BioRxiv 831248 (2019).
Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Xu, Z. & Wang, H. LTR-FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. PNAS 117, 9451–9457 (2020).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. Chapter 4, 4.10.1–4.10.14 (2009).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. Dna 6, 1–6 (2015).
Gertz, E. M., Yu, Y. K., Agarwala, R., Schäffer, A. A. & Altschul, S. F. Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST. BMC Biol. 4, 41 (2006).
Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user defined constraints. Nucleic Acids Res. 33, W465–W467 (2005).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Kim, D. et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37, 907–915 (2019).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Cantarel, B. L. et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, 1–22 (2008).
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
NGDC GSA https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA036754 (2025).
Zhang, K. et al. Cheilinus undulatus isolate JC-2025a, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBMUSF010000000 (2025).
Zhang, K. et al. The genome annotation files of Cheilinus undulatus. figshare https://doi.org/10.6084/m9.figshare.28887965 (2025).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Zhou, Z. et al. GenomeSyn: a bioinformatics tool for visualizing genome synteny and structural variations. Journal of genetics and genomics 49, 1174–1176 (2022).

Acknowledgements

This project was supported by Research on breeding technology of candidate species for Guangdong modern marine ranching (No. 2024-MRB-00-001), Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515110554), Shenzhen Science and Technology Program (No. 827-0001055), and Research Initiation Fund for Young Faculty Members at Shenzhen University (No. 000001032214).

Author information

Authors and Affiliations

Laboratory of Aquatic Genomics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, 518057, China
Kai Zhang, Jianchao Chen, Binwei Duan, Canbei You & Qiong Shi
Shenzhen Ocean Development Promotion Center, Shenzhen, 518067, China
Wenchuan Zhou
Agro-Tech Extension Center of Guangdong Province, Guangzhou, 510225, China
Yanfei Zhao, Shaosen Yang & Jinhui Wu
Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, Shenzhen, 518081, China
Qiong Shi

Contributions

Q.S. and J.W. conceived this study. K.Z. performed data analysis; J.C., S.Y. and Y.Z. participated in the collection of samples; J.C., B.D., C.Y. and W.Z. provided research advice; K.Z. wrote the draft manuscript. Q.S. and J.W. revised the manuscript. All authors have read and approved the final manuscript for publication.

Corresponding authors

Correspondence to Jinhui Wu or Qiong Shi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, K., Chen, J., Duan, B. et al. A telomere-to-telomere gap-free genome assembly of the endangered humphead wrasse (Cheilinus undulatus). Sci Data 12, 1194 (2025). https://doi.org/10.1038/s41597-025-05475-x

Received: 04 May 2025
Accepted: 26 June 2025
Published: 11 July 2025
Version of record: 11 July 2025
DOI: https://doi.org/10.1038/s41597-025-05475-x