Chromosome-level genome assembly and annotation of the tropical sea cucumber Bohadschia ocellata

Huang, Qianying; Wang, Xuan; Qin, Zhou; Fan, Dingding; Ge, Hua; Lin, Yingxin; Wang, Junyan; Yang, Yun; Xie, Zhenyu; Huo, Da; Chen, Chang; Qin, Haipeng; Zhang, Xiaoli; Zhu, Xiangxing; Tang, Dongsheng; Ren, Chunhua; Hu, Chaoqun; Chen, Ting; Yan, Aifen

doi:10.1038/s41597-025-06453-z

Data Descriptor
Open access
Published: 17 December 2025

Qianying Huang¹^na1,
Xuan Wang²^na1,
Zhou Qin²,
Dingding Fan ORCID: orcid.org/0000-0002-8201-3846²,
Hua Ge¹,
Yingxin Lin¹,
Junyan Wang¹,
Yun Yang³,
Zhenyu Xie³,
Da Huo⁴,
Chang Chen⁴,
Haipeng Qin⁵,
Xiaoli Zhang¹,
Xiangxing Zhu¹,
Dongsheng Tang¹,
Chunhua Ren²,
Chaoqun Hu²,
Ting Chen ORCID: orcid.org/0000-0002-5777-909X² &
Aifen Yan¹

Scientific Data volume 13, Article number: 137 (2026)

Abstract

A chromosome-level genome assembly of Bohadschia ocellata, a member of the family Holothuriidae, was constructed by integrating MGI DNBSEQ-T7 short-read sequencing, PacBio HiFi long-read sequencing and Hi-C scaffolding. After redundant sequences were removed, the assembly was anchored to 23 chromosomes, with a total size of 909.18 Mb. The contig and scaffold N50 values were 12.00 Mb and 38.97 Mb, respectively, indicating that the assembly is highly contiguous. According to Merqury and BUSCO evaluations, the genome assembly reached a QV of 64.44 and a completeness of 94.40%. From this assembly, 31,277 protein-coding genes were identified, which were 98.10% complete based on BUSCO assessment of the predicted proteome. Functional annotations were obtained from at least one database for more than 99% of these genes. This high-quality B. ocellata genome assembly offers valuable information for further genetic and evolutionary studies of this sea cucumber species.

Background & Summary

Sea cucumbers, belonging to the class Holothuroidea within the phylum Echinodermata¹, comprise more than 1,800 species² and possess the second highest diversity among echinoderms³. They typically inhabit coral reefs, rocky substrates or deep-sea bottoms⁴. Most sea cucumbers are distributed in the Indo-Pacific, where they are considered highly valuable and up to 80 species are commercially harvested⁵. This scale of exploitation is largely due to their price as a luxury commodity and their high nutritional value, which have made them particularly popular in Asian cuisines⁶. It is therefore unsurprising that most commercial sea cucumbers are tropical members of the families Holothuriidae and Stichopodidae, with only some temperate species from the family Cucumariidae⁷. Ecologically, sea cucumbers also play at least four important roles: 1) recycling nutrients, 2) facilitating bioturbation and deposit-feeding, 3) disturbing sediments to promote mineralization and 4) influencing seawater chemistry⁸,⁹. Currently, however, seven species of sea cucumbers are recognized as “Endangered” and nine as “Vulnerable” on the IUCN Red List¹⁰. This is caused by persistent overfishing, which has drastically depleted many high-value sea cucumber populations, such as those of Holothuria scabra¹¹. Thus, artificial breeding and stock enhancement have been increasingly applied as critical means of managing sea cucumber fisheries and aquaculture¹²,¹³,¹⁴.

Bohadschia ocellata is a sea cucumber species of the family Holothuriidae native to a wide range across the tropical Indo-Pacific, including the Timor Sea, South China Sea, Philippine Sea and the Great Barrier Reef⁵ (Fig. 1a). In the Persian Gulf, where it occurs at high density, it has been noted for its high reproduction and growth rates¹⁵. The large, blotchy spots on the dorsum serve as a diagnostic trait for identifying the species¹⁶. Behaviourally, B. ocellata is often observed exposed during the day, although it is capable of burrowing and can frequently be found buried in the sand¹⁶. In Asian regions, especially in China, the body wall of B. ocellata is commonly consumed¹⁰. Furthermore, the fucosylated chondroitin sulfate extracted from B. ocellata contains key monosaccharide components and demonstrates significant biological activities¹⁷. Overfishing and habitat destruction have consequently led to a sharp decline in B. ocellata populations over the past decade.

Samyn and Vandenspiegel examined the holotype of B. ocellata and concluded that it is a valid species of the genus Bohadschia¹⁸. The earlier placement of B. ocellata in the genus Holothuria was therefore considered unjustified¹⁹. With continuous progress in sea cucumber taxonomy, both Holothuria and Bohadschia have now been confirmed as independent and valid genera. Meanwhile, based on morphological characteristics and mitochondrial genome sequencing, B. ocellata has been reclassified into the genus Bohadschia¹⁶,²⁰, forming a species distinct from the established Holothuria ocellata²¹. Although the mitochondrial cytochrome c oxidase I (COI) gene of B. ocellata has been successfully sequenced²², the lack of a whole-genome assembly has so far hindered more comprehensive study of its phylogenetic affinity, taxonomy and evolutionary processes.

To date, chromosome-level genomes have been reported for a few holothurian species, such as Apostichopus japonicus²³, Holothuria leucospilota²⁴, H. scabra²⁵, Stichopus monotuberculatus²⁶ and Chiridota heheva²⁷. However, whole-genome data for sea cucumbers within the genus Bohadschia remain scarce²⁸. To fill this gap, PacBio Revio long-read sequencing, MGI DNBSEQ-T7 short-read sequencing and Hi-C mapping were employed in this study to assemble a high-quality chromosome-level genome of B. ocellata. A final genome assembly of 909.18 Mb was obtained and mapped onto 23 pseudochromosomes, with contig and scaffold N50 values of 12.00 Mb and 38.97 Mb, respectively. A total of 31,277 protein-coding genes were identified in the B. ocellata genome, displaying a completeness of 98.10% according to BUSCO assessment of the predicted proteome. Over 99% (30,752) of the genes received functional annotations from at least one database. This chromosome-level genome assembly of B. ocellata not only serves as a valuable genomic resource for further study of classification and radiation within Bohadschia, but also provides a foundation for future investigation of its evolutionary adaptations, gene functions and conservation strategies.

Methods

Sample collection and nucleic acid extraction

All sea cucumber samples used for sequencing in this study were harvested from Tanmen Port (19.33° N, 110.49° E), Qionghai City, Hainan Province, and one individual was randomly selected at the site for dissection of its muscle. The excised muscle tissue was rinsed in a dish containing phosphate-buffered saline (PBS) and immediately transferred to liquid nitrogen. The QIAamp DNA Mini Kit was used to extract high-quality DNA from muscle for whole-genome sequencing using both the long-read and short-read approaches.

In the same manner as above, scissors and forceps were used to carefully separate six tissues: intestine, muscle, oral tentacles, respiratory tree, testis and rete mirabile. Total RNA was extracted from these six tissues for transcriptomic sequencing using the RNAprep Pure Plant Plus Kit (Tiangen Biotechnology Co., Ltd., Beijing, China).

Library preparation and sequencing

Short-read sequencing was performed on the MGI DNBSEQ-T7 platform with a read length configuration of PE150. High-quality DNA that had passed the purity, concentration and integrity tests was sheared using fragmentase. The fragmented DNA was then end-repaired, dA-tailed and ligated to sequencing adapters. A paired-end library with a 350 bp insert was constructed using the MGIEasy FS DNA Library Preparation Kit (MGI, Shenzhen, China) for fragmentation and sequencing adapter ligation. The combined libraries were used to prepare DNA nanoballs (DNBs) by rolling circle replication, which were subsequently sequenced. The raw data were evaluated and filtered using FastQC (v0.12.1) and SOAPnuke (v2.1.4)²⁹. In total, 73.74 Gb of short-read data were obtained, corresponding to 81.10× coverage (Table 1).

Table 1 The clean data and their characteristics obtained from each sequencing.

Long-read sequencing was carried out on the PacBio Revio platform (Pacific Biosciences). High-quality DNA was isolated using the QIAamp DNA Mini Kit. The fragmented DNA was then end-repaired and A-tailed using the SMRTbell Prep Kit 3.0 (PacBio, Menlo Park, CA, USA). Finally, adapters were ligated to both ends of the fragments. After BluePippin size selection, the Sequel II Binding Kit 3.2 (PacBio, Menlo Park, CA, USA) was used to bind the sequencing polymerase to the constructed library, which was then loaded onto the SMRT Cell 8M for sequencing. SMRT Link v13.1 (Pacific Biosciences) was used to convert each polymerase read with a complete adapter into a circular consensus sequence (CCS) read, which was saved in read.bam and filtered according to the QV ≥ 20³⁰ standard to obtain the final valid data. In total, 53.85 Gb of long-read data were obtained, corresponding to 59.23× coverage (Table 1).

High-throughput Hi-C sequencing was also performed on the MGI DNBSEQ-T7 platform with a sequencing read length of PE150. Muscle samples were fixed with formaldehyde and subsequently digested with the DpnII enzyme (NEB, Ipswich, MA, USA). After end repair, biotin labeling and fragment ligation, the extracted DNA was de-crosslinked and sheared into fragments of approximately 350 bp. Interacting DNA fragments were captured with streptavidin magnetic beads for library construction. The concentration and insert size of the library were assessed using Qubit 3.0 and Agilent 2100, followed by sequencing of the qualified samples. The raw data were evaluated using FastQC (v0.12.1), resulting in 142.18 Gb (156.39× coverage) of clean data (Table 1).

Transcriptome sequencing was performed on the Illumina high-throughput sequencing platform. mRNA was enriched from total RNA, purified using AMPure XP beads and reverse transcribed to cDNA, which then underwent end repair, A-tailing and adapter ligation. PCR amplification was performed on the selected fragments to construct the library, and its quality and effective concentration were assessed using AATI and qPCR. Sample-specific barcode primers were used to label the constructed libraries, which were subsequently pooled, followed by paired-end sequencing to generate raw data from the original six tissue samples. The clean data finally generated amounted to 42.30 Gb (Table 1).
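The coverage values reported for the three DNA datasets can be related to the data volumes with a simple division. The sketch below assumes that coverage was computed against the final 909.18 Mb assembly size and that 1 Gb equals 1,000 Mb; the text does not state which genome size was used as the denominator, so this is an illustration rather than the authors' procedure. The function and variable names are hypothetical.

```python
def coverage(bases_gb: float, genome_mb: float) -> float:
    """Return sequencing depth: total sequenced bases divided by genome size."""
    return (bases_gb * 1_000.0) / genome_mb


# Assumption: the denominator is the final assembly size given in the text.
ASSEMBLY_MB = 909.18

# Clean data volumes (Gb) as reported in the text.
datasets_gb = {
    "MGI short reads": 73.74,
    "PacBio HiFi": 53.85,
    "Hi-C": 142.18,
}

for name, gigabases in datasets_gb.items():
    print(f"{name}: {coverage(gigabases, ASSEMBLY_MB):.2f}x")
```

Under these assumptions the computed depths agree with the reported 81.10×, 59.23× and 156.39× to within about 0.01×, with the small differences presumably due to rounding of the inputs.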

Genome assessment and assembly

Genome size, heterozygosity and repetitive sequence content were assessed through k-mer analysis. 21-mers were counted from raw sequencing reads using Jellyfish (version 2.3.0)³¹, run with 10 threads, counting both strands and with the hash size set to 100 MB (Fig. 1b; Table 2). GenomeScope (version 2.0)³² was employed to analyze the k-mer histogram generated by Jellyfish, using a k-mer size of 21 and assuming a ploidy level of 2. Following the elimination of k-mers with erroneous frequencies, 37,990,663,645 k-mers showing a main peak at depth 65.40 were retained (Table 2). In k-mer analysis, the first, taller peak represents homozygous regions, while the second, smaller peak corresponds to heterozygous regions. GenomeScope estimated a genome size of ~802.34 Mb for B. ocellata, with 27.60% repetitive content. The analysis indicated that the genome of B. ocellata has a high heterozygosity rate, estimated at approximately 1.78%, a feature also observed in other sea cucumber species such as H. leucospilota²⁴ and S. monotuberculatus²⁶.
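To make "counting both strands" concrete, the following minimal sketch counts canonical k-mers, where a k-mer and its reverse complement are treated as the same key, and then builds the multiplicity histogram that a tool such as GenomeScope takes as input. It is a toy pure-Python illustration of the idea, not a reimplementation of Jellyfish; the function names are hypothetical and the example reads are made up.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(reads, k=21):
    """Count k-mers, merging each k-mer with its reverse complement."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _VALID:
                continue  # skip k-mers containing N or other ambiguity codes
            # The canonical form is the lexicographically smaller orientation.
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers having it."""
    return Counter(counts.values())


# Toy example with k=5: the second read is the reverse complement of the first,
# so both contribute to the same canonical k-mer.
toy_counts = count_canonical_kmers(["ACGTA", "TACGT"], k=5)
print(dict(toy_counts))
print(dict(kmer_histogram(toy_counts)))
```

On real data the histogram has one entry per observed depth, and the positions of its peaks are what the genome size and heterozygosity estimates are fitted to.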

Table 2 B. ocellata genomic statistics obtained from K-mer and GenomeScope analyses.

PacBio HiFi long reads were assembled using hifiasm (v0.19.8-r603)³³. Preliminary contigs generated through all-versus-all alignment of overlapping reads were subjected to three rounds of iterative self-correction, thereby optimizing sequence accuracy and producing high-confidence contigs. Purge_dups (v1.2.5)³⁴ was used to remove redundant sequences from the preliminary assembly, completing the draft genome. MGI short reads were aligned to the deduplicated genome using BWA (v0.7.17-r1188)³⁵, and base correction and gap filling were performed by Pilon (v1.23)³⁶ based on this alignment. Based on the interactions in the Hi-C matrix, scaffolds were clustered and assigned to chromosomal positions. Subsequent ordering and orientation were performed using ALLHiC (v1.1)³⁴, culminating in a chromosome-scale assembly comprising 23 pseudochromosomes with 97.27% of scaffolds anchored (Fig. 1c). To interpret the various genomic features and their distribution along chromosomes (Fig. 2; Table 3), a plot depicting the genome structure was constructed using Circos (v0.69.8)³⁷. The visualization included the following tracks from inner to outer circles: chromosomes, gene regions, repeat sequences, SNP percentage and NGS sequencing depth. The final genome assembly comprised 73 scaffolds spanning 909.18 Mb, with a scaffold N50 of 38.97 Mb and a contig N50 of 12.00 Mb (Table 4).
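For readers less familiar with the contiguity metric quoted above, N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The sketch below shows the standard calculation on made-up lengths; the function name is hypothetical, and the example values are not taken from the B. ocellata assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty set of sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length


# Illustrative lengths only (arbitrary units).
example_lengths = [8, 4, 3, 2, 2, 1]
print(n50(example_lengths))
```

Applying the same function separately to contig lengths and to scaffold lengths yields the two N50 values reported for an assembly.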

Table 3 Details of the 23 Hi-C sequencing based chromosomal-level assemblies.

Table 4 Summary of the assembled B. ocellata genome after the removal of redundant sequences.

Repetitive element annotation

RepeatMasker (version 4.09 with RepBase 20181026)³⁸ and EDTA³⁹, a whole-genome de novo transposable element (TE) annotation pipeline, were employed to generate high-quality TE annotations. The EDTA pipeline, comprising LTRharvest, the parallel version of LTR_FINDER, LTR_retriever, GRF, TIR-Learner, HelitronScanner, RepeatModeler and customized filtering scripts, is designed to construct a candidate repeat sequence library. Nucleotide coding sequences (CDS) were then used to filter gene-like sequences out of the TE library, and redundancy was eliminated by consolidating results from the three methods mentioned above. A total of 34.31% of the assembled genome was identified as repetitive, among which DNA elements accounted for the highest proportion (13.57%), followed by LINEs (4.33%), LTRs (0.91%), SINEs (0.02%) and simple repeats (0.01%) (Fig. 3a; Table 5).

Table 5 List of repetitive elements obtained through annotation.

Noncoding RNA annotation

Ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) were predicted using default parameters by Barrnap (v0.9)⁴⁰ and tRNAscan-SE (v2.0.11)⁴¹, respectively. The identification of other non-coding RNAs, such as small nuclear RNAs (snRNAs) and microRNAs (miRNAs), was achieved through alignment with the Rfam database (v14.8)⁴², followed by annotation using Infernal (v1.1.4)⁴³. A total of 20 miRNAs, 9,079 tRNAs, 533 rRNAs and 1,785 snRNAs were identified in the genome of B. ocellata (Fig. 3b; Table 6).

Table 6 List of non-coding RNAs obtained through annotation.

Gene prediction and functional annotation

To enhance exon–intron structure recognition and optimize gene prediction models, RNA-seq reads were incorporated into the annotation process, and evidence from multiple sources was then integrated using MAKER3 (v3.01.03)⁴⁴. A total of 42.30 Gb of RNA-seq data were obtained by sequencing the transcriptomes of six B. ocellata tissues for use in gene annotation (Table 7). HISAT2 (v2.2.1)⁴⁵ was used to align the cleaned RNA-seq reads to the assembled genome, after which StringTie (v2.1.7)⁴⁶ was used to reconstruct transcript structures and quantify expression levels. Nine high-quality protein datasets were incorporated for homology-based annotation: those from Lytechinus pictus (GCF_015342785.2), Patiria miniata (GCF_015706575.1), Anneissia japonica (GCF_011630105.1), Asterias rubens (GCF_902459465.1), Acanthaster planci (GCF_001949145.1), Lytechinus variegatus (GCF_018143015.1), Strongylocentrotus purpuratus (GCF_000002235.5), H. leucospilota (GCA_029531755.1) and A. japonicus (GCA_002754855.1). Using BRAKER3 (v3.0.8), which combines GeneMark-ETP and AUGUSTUS⁴⁷, gene models were predicted and annotated in an unsupervised and evidence-guided manner. By simultaneously leveraging RNA-seq expression profiles and protein orthology, this strategy enables the comprehensive annotation of both well-supported coding regions and novel gene models. The predicted genes of B. ocellata were compared with the genomes of three other Holothuroidea species (H. leucospilota²⁴, A. japonicus²³ and S. monotuberculatus²⁶) in terms of gene count and genomic features, with a total of 31,277 genes identified in B. ocellata (Table 8). BUSCO analysis using 954 single-copy orthologs from 65 genomes showed that 98.10% of BUSCOs were complete and only 0.90% were fragmented, supporting the completeness of the annotated gene set (Table 9).

Table 7 Data generated from the transcriptomic sequencing of B. ocellata tissue samples.

Table 8 The B. ocellata assembly in comparison with three other available Holothuroidea genomes in terms of genes and gene elements.

Table 9 Detailed results obtained from the BUSCO analysis of the annotated genes.

Functional annotation and amino acid sequence analysis were conducted using InterProScan (v5.56)⁴⁸ to predict protein families, domains and functional sites. Additionally, the amino acid sequences of the genes were compared with the UniProt⁴⁹, UniProtKB/SwissProt⁵⁰ and KEGG⁵¹ databases using BLAST (v2.12.0+, e-value 1e⁻⁵)⁵² to confirm their functional information and the biological pathways involved. Gene Ontology (GO)⁵³ annotations were obtained from the InterProScan output, detailing the molecular functions, cellular components and biological processes associated with each gene. The results showed that a total of 30,752 protein sequences (99%) were annotated in at least one public database (Fig. 4; Table 10).
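As an illustration of how an e-value cutoff of 1e⁻⁵ can be applied to BLAST results, the sketch below keeps the highest-scoring hit per query from tabular output. It assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`) and a "less than or equal to" comparison; the paper does not state the output format, the comparison or the hit-selection rule, so these are assumptions. The function name and the sequence identifiers in the example are hypothetical.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, bitscore, evalue)} for hits passing the cutoff."""
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # assumption: hits must satisfy evalue <= cutoff
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (hit["sseqid"], bitscore, evalue)
    return best


# Made-up records: geneA has two passing hits, geneB has none.
sample = (
    "geneA\tsubj1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "geneA\tsubj2\t60.0\t180\t72\t2\t5\t184\t3\t182\t1e-20\t150\n"
    "geneB\tsubj3\t30.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t25\n"
)
print(best_hits(io.StringIO(sample)))
```

Counting the queries that retain at least one hit in any database is one way to arrive at a "genes annotated in at least one database" figure.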

Table 10 Number of coding genes identified by each method of functional annotation.

Data Records

All sequence reads and the chromosome-level genome assembly of B. ocellata associated with this project are available under SRP586938⁵⁴ at NCBI. The whole genome shotgun project has been deposited in DDBJ/ENA/GenBank under the accession number JBOCEH000000000⁵⁵, which corresponds to the version described in this paper. The genome sequencing datasets, including PacBio HiFi (SRR33662152)⁵⁶, BGI short-reads (SRR33662151)⁵⁷ and Hi-C reads (SRR35940911)⁵⁸, are publicly accessible via the SRA. Additionally, RNA-seq data are available under SRA numbers SRR33662153–SRR33662158⁵⁹,⁶⁰,⁶¹,⁶²,⁶³,⁶⁴. Additional related datasets, including the genome assembly, gene annotation and functional annotation, are available in the Figshare repository⁶⁵ or Baiduyun⁶⁶.

Technical Validation

Nucleic acid quality

The quality and concentration of DNA were assessed using 0.70% agarose gel electrophoresis, a NanoDrop One spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and a Qubit 3.0 fluorometer (Life Technologies, Carlsbad, CA, USA), and DNA samples showing only slight degradation were considered suitable for sequencing library preparation. The concentration, integrity and purity of RNA were assessed using a NanoDrop One spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), a Bioanalyzer 5400 system (Agilent Technologies, Santa Clara, California, USA) and agarose gel electrophoresis (0.70%), respectively. RNA samples exhibiting a RIN above 9.50 were considered suitable for downstream library preparation.

Genome assembly and annotation quality

The QV pipeline of Merqury⁶⁷ was used to estimate the assembly QV based on k-mer analysis. Using the “best_k.sh” script from Merqury, the optimal k-mer length was calculated as 19. The number of k-mers in the short-read sequencing data was calculated using Meryl with default settings and the output was then used alongside the assembly in Merqury to perform the QV evaluation. A k-mer completeness of 78.25% and a k-mer-based QV of 64.44 were obtained from the analysis.
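The k-mer-based QV is a Phred-scaled error rate. As a minimal sketch, the code below follows the formula described in the Merqury publication as we understand it: the probability that a base is correct is taken as the k-th root of the fraction of assembly k-mers that are also found in the reads, and QV = −10·log₁₀(error). The k-mer counts for this assembly are not given in the text, so the example only converts the reported QV back to an error rate; the function names are hypothetical.

```python
import math


def merqury_style_qv(asm_only_kmers: float, asm_total_kmers: float, k: int) -> float:
    """Phred-scaled consensus quality from assembly-only k-mer counts."""
    if asm_only_kmers <= 0:
        return math.inf  # no unsupported k-mers: error rate estimated as zero
    if asm_only_kmers >= asm_total_kmers:
        return 0.0  # every k-mer unsupported: error rate estimated as one
    # error = 1 - (1 - asm_only / asm_total) ** (1 / k), computed stably
    error = -math.expm1(math.log1p(-asm_only_kmers / asm_total_kmers) / k)
    return -10.0 * math.log10(error)


def phred_to_error(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10.0 ** (-qv / 10.0)


# Per-base error rate implied by the reported assembly QV of 64.44.
print(f"{phred_to_error(64.44):.2e}")
# Per-base error rate implied by the QV >= 20 threshold used for HiFi reads.
print(phred_to_error(20))
```

A QV of 64.44 therefore corresponds to fewer than one expected error per million bases, while the QV ≥ 20 read filter corresponds to at most one error per hundred bases.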

BUSCO analysis was performed to further evaluate genome completeness, utilizing the metazoa_odb10 dataset, which includes 954 conserved genes from 65 metazoan genomes. BUSCO evaluation indicated an overall completeness of 98.20% (94.40% complete and 3.80% fragmented BUSCOs), with 1.80% missing (Table 11). The BUSCO assessment indicated that the majority of core, essential genes were captured in the assembly and annotation, supporting the high quality, completeness and accuracy of the B. ocellata genome, which provides a valuable genomic foundation for resource conservation, selective breeding and aquaculture development.
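The relationship between the BUSCO percentages quoted above can be checked with a few lines of arithmetic. The sketch assumes, as the reported figures suggest, that the "overall" value is the sum of the complete and fragmented categories and that the three categories together account for all 954 orthologs; the function name and tolerance are hypothetical.

```python
def busco_overall(complete_pct: float, fragmented_pct: float,
                  missing_pct: float, tol: float = 0.05) -> float:
    """Return complete + fragmented after checking the categories sum to 100%."""
    total = complete_pct + fragmented_pct + missing_pct
    if abs(total - 100.0) > tol:
        raise ValueError(f"BUSCO categories sum to {total:.2f}%, expected 100%")
    return round(complete_pct + fragmented_pct, 2)


# Percentages reported in the text for the genome assembly.
print(busco_overall(94.40, 3.80, 1.80))
```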

Table 11 Detailed results obtained from the BUSCO analysis of the assembled genome.

Data availability

All sequencing and assembly data generated in this study have been deposited in public repositories. Raw sequencing data including BGI short-reads, PacBio HiFi long-reads, Hi-C reads and RNA-seq data are available at NCBI Sequence Read Archive database under the number SRP586938⁵⁴ (https://identifiers.org/ncbi/insdc.sra:SRP586938). The whole genome shotgun project has been deposited in the DDBJ/ENA/GenBank under the accession number JBOCEH000000000⁵⁵ (https://identifiers.org/ncbi/insdc:JBOCEH000000000). Additionally, the genome assembly, gene annotation and functional annotation are available in the Figshare repository⁶⁵ (https://doi.org/10.6084/m9.figshare.29124434.v1) or Baiduyun⁶⁶ (https://pan.baidu.com/s/10DBoB_GQQhThYBiloGInsA?pwd=wmhs).

Code availability

All commands and workflows used for data processing were executed in accordance with the respective software manuals and protocols, with the relevant settings and parameters detailed below:

SOAPnuke (v2.1.4): Employed to filter out low-quality reads from MGI raw sequencing data using the software’s default configurations.

SMRT Link (v13.1): Employed to process and filter PacBio raw sequencing data using default configurations.

Jellyfish (v2.3.0): Utilized to count 21-mers for estimating genome size and heterozygosity.

GenomeScope (v2.0.0): Utilized to process the K-mer frequency histogram for estimating genome size, heterozygosity, and repeat content using default configurations.

Hifiasm (v0.19.8-r603): Utilized to assemble the PacBio HiFi data after read comparison and self-correction using built-in configurations.

Bwa (v0.7.17-r1188): Utilized to map the MGI short read data onto the draft assembly using built-in configurations.

Pilon (v1.23): Utilized to correct residual errors with Bwa alignment result using built-in configurations.

Purge_dups (v1.2.5): Utilized to reduce redundant haplotigs and determine heterozygosity for the draft genome under a configuration of -j 80 -s 80.

ALLHiC (v1.1): Utilized to assign and orient scaffolds using Hi-C reads into chromosome-level assemblies.

Merqury (v1.3): Utilized to assess k-mer coverage and QV value for the qualification of the assembled genome using best-fit K-mer = 19.

BUSCO (v5.7.1): Utilized to estimate genomic coverage using the metazoa_odb10 data collection.

Circos (v0.69): Utilized to display chromosomal structure and visualize the distribution of gene regions, repeat sequences, SNP percentage and NGS sequencing depth.

RepeatMasker (v4.09): Utilized to annotate transposable elements using built-in configurations.

EDTA: Utilized to annotate de-novo transposable elements using built-in configurations.

Barrnap (v0.9): Utilized to identify ribosomal RNAs (rRNAs) using built-in configurations.

tRNAscan-SE (v2.0.11): Utilized to search for transfer RNAs (tRNAs) using built-in configurations.

Infernal (v1.1.4): Utilized to identify microRNAs (miRNAs) and small nuclear RNAs (snRNAs) using built-in configurations.

Braker (v3.0.8): Utilized to integrate gene prediction results with 9 selected proteomes and RNAseq reads from tissues with parameters set to gff3, threads 48, prot_seq = pep.fasta, bam = bams and UTR = on.

HISAT2 (v2.2.1): Utilized to map transcriptomic data for genome annotation using built-in configurations.

StringTie (v2.1.7): Utilized to assemble the transcripts for the prediction of gene structures using built-in configurations.

MAKER3 (v3.01.03): Utilized to combine outputs from various prediction modes into the final gene collection using built-in configurations.

BLAST (v2.11.0+): Employed for synteny analysis and functional annotation of predicted genes using the BLASTP module with an E-value threshold of 1e⁻⁵.

References

Mercier, A., Gebruk, A., Kremenetskaia, A. & Hamel, J-F. in The World of Sea Cucumbers (ed. Mercier, A., Hamel, J-F., Suhrbier, A. D. & Pearce, C. M.) Ch. 1, https://doi.org/10.1016/B978-0-323-95377-1.00001-1 (London Academic Press, 2023).
Mercier, A. et al. Revered and Reviled: The Plight of the Vanishing Sea Cucumbers. Annu. Rev. Mar. Sci. 17, 115–142, https://doi.org/10.1146/annurev-marine-032123-025441 (2025).
Article ADS Google Scholar
Miller, A. K. et al. Molecular phylogeny of extant Holothuroidea (Echinodermata). Mol. Phylogenet Evol. 111, 110–131, https://doi.org/10.1016/j.ympev.2017.02.014 (2017).
Article PubMed Google Scholar
Pearce, C. M., William Gartrell, J., King, X. K. & Zaklan Duff, S. D. in The World of Sea Cucumbers (ed. Mercier, A., Hamel, J-F., Suhrbier, A. D. & Pearce, C. M.) Ch. 2, https://doi.org/10.1016/B978-0-323-95377-1.00014-X (London Academic Press, 2023).
Purcell, S. W. et al. Commercially important sea cucumbers of the world 2nd edn, https://doi.org/10.4060/cc5230en (FAO, 2023).
Gamboa, R. U., Halun, S. Z. B. & Vularika, A. S. in The World of Sea Cucumbers (ed. Mercier, A., Hamel, J-F., Suhrbier, A. D. & Pearce, C. M.) Ch. 9, https://doi.org/10.1016/B978-0-323-95377-1.00021-7 (London Academic Press, 2023).
Conand, C. Tropical sea cucumber fisheries: Changes during the last decade. Mar. Pollut. Bull. 133, 590–594, https://doi.org/10.1016/j.marpolbul.2018.05.014 (2018).
Article CAS PubMed Google Scholar
Slater, M. in The World of Sea Cucumbers (ed. Mercier, A., Hamel, J-F., Suhrbier, A. D. & Pearce, C. M.) Ch. 41, https://doi.org/10.1016/B978-0-323-95377-1.00022-9 (London Academic Press, 2023).
Wolfe, K. in The World of Sea Cucumbers (ed. Mercier, A., Hamel, J-F., Suhrbier, A. D. & Pearce, C. M.) Ch. 28, https://doi.org/10.1016/B978-0-323-95377-1.00028-X (London Academic Press, 2023).
Phelps Bondaroff, T. N. & Morrow, F. in The World of Sea Cucumbers (ed. Mercier, A., Hamel, J-F., Suhrbier, A. D. & Pearce, C. M.) Ch. 13, https://doi.org/10.1016/B978-0-323-95377-1.00009-6 (London Academic Press, 2023).
Hamel, J. F. et al. Global knowledge on the commercial sea cucumber Holothuria scabra. Adv. Mar. Bio. 91, 1–286, https://doi.org/10.1016/bs.amb.2022.04.001 (2022).
Article Google Scholar
Yang, Y. et al. Pipeline for identification of genome-wide microsatellite markers and its application in assessing the genetic diversity and structure of the tropical sea cucumber Holothuria leucospilota. Aquaculture Reports. 37, 102207, https://doi.org/10.1016/j.aqrep.2024.102207 (2024).
Article Google Scholar
Nocillado, J. et al. Spawning induction of the high-value white teatfish sea cucumber, Holothuria fuscogilva, using recombinant relaxin-like gonad stimulating peptide (RGP). Aquaculture. 547, 737422, https://doi.org/10.1016/j.aquaculture.2021.737422 (2022).
Article CAS Google Scholar
Osathanunkul, M. & Suwannapoom, C. Sustainable fisheries management through reliable restocking and stock enhancement evaluation with environmental DNA. Sci. Rep. 13, 11297, https://doi.org/10.1038/s41598-023-38218-2 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Javanmardi, S., Rezaei Tavabe, K., Moradi, S. & Abed-Elmdoust, A. The effects of dietary levels of the sea cucumber (Bohadschia ocellata Jaeger, 1833) meal on growth performance, blood biochemical parameters, digestive enzymes activity and body composition of Pacific white shrimp (Penaeus vannamei Boone, 1931) juveniles. Iranian Journal of Fisheries Sciences. 19, 2366–2383, https://doi.org/10.22092/ijfs.2020.122330 (2020).
Article Google Scholar
Kim, S. W., Kerr, A. M. & Paulay, G. Colour, confusion, and crossing: resolution of species problems in Bohadschia (Echinodermata: Holothuroidea). Zoological Journal of the Linnean Society. 168, 81–97, https://doi.org/10.1111/zoj.12026 (2013).
Article Google Scholar
Thinh, P. D. et al. Fucosylated Chondroitin Sulfate from Bohadschia ocellata: Structure Analysis and Bioactivities. Processes. 12, 2108, https://doi.org/10.3390/pr12102108 (2024).
Article CAS Google Scholar
Samyn, Y. & Vandenspiegel, D. Sublittoral and bathyal sea cucumbers (Echinodermata: Holothuroidea) from the Northern Mozambique Channel with description of six new species. Zootaxa. 4196, 451–497, https://doi.org/10.11646/zootaxa.4196.4.1 (2016).
Article Google Scholar
Liao, Y. & Clark, A. M. The Echinoderms of Southern China (Science Press, Beijing & New York, 1995).
Amin, A. & Thalib, B. Marine of dentistry: pemanfaatan stichopus hermanii dalam bidang kedokteran gigi (Nas Media Pustaka Press, Indonesia, 2024).
Cheng, H. et al. Taxonomic status and phylogenetic analyses based on complete mitochondrial genome and microscopic ossicles: Redescription of a controversial tropical sea cucumber species (Holothuroidea, Holothuria Linnaeus, 1767). Zoosyst. Evol. 101, 791–804, https://doi.org/10.3897/zse.101.137781 (2025).
Article Google Scholar
Patantis, G., Dewi, A. S., Fawzya, Y. N. & Nursid, M. Identification of Beche-de-mers from Indonesia by molecular approach. Biodiversitas. 20, 537–543, https://doi.org/10.13057/BIODIV/D200233 (2019).
Article Google Scholar
Sun, L., Jiang, C., Su, F., Cui, W. & Yang, H. Chromosome-level genome assembly of the sea cucumber Apostichopus japonicus. Sci. Data. 10, 454, https://doi.org/10.1038/s41597-023-02368-9 (2023).
Article CAS PubMed PubMed Central Google Scholar
Chen, T. et al. The Holothuria leucospilota genome elucidates sacrificial organ expulsion and bioadhesive trap enriched with amyloid-patterned proteins. Pnas. 120, e2213512120, https://doi.org/10.1073/pnas.2213512120 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhong, S. et al. Chromosomal-level genome assembly and annotation of the tropical sea cucumber Holothuria scabra. Sci. Data. 11, 474, https://doi.org/10.1038/s41597-024-03340-x (2024).
Article CAS PubMed PubMed Central Google Scholar
Chen, T. et al. Chromosome-level genome assembly and annotation of the tropical sea cucumber Stichopus monotuberculatus. Sci. Data. 11, 1245, https://doi.org/10.1038/s41597-024-03985-8 (2024).
Article CAS PubMed PubMed Central Google Scholar
Zhang, L. et al. The genome of an apodid holothuroid (Chiridota heheva) provides insights into its adaptation to a deep-sea reducing environment. Commun. Biol. 5, 224, https://doi.org/10.1038/s42003-022-03176-4 (2022).
Ma, B. et al. Analysis of Complete Mitochondrial Genome of Bohadschia argus (Jaeger, 1833) (Aspidochirotida, Holothuriidae). Animals. 12, 1437, https://doi.org/10.3390/ani12111437 (2022).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience. 7, 1–6, https://doi.org/10.1093/gigascience/gix120 (2017).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162, https://doi.org/10.1038/s41587-019-0217-9 (2019).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 36, 2896–2898, https://doi.org/10.1093/bioinformatics/btaa025 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE. 9, e112963, https://doi.org/10.1371/journal.pone.0112963 (2014).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645, https://doi.org/10.1101/gr.092759.109 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275, https://doi.org/10.1186/s13059-019-1905-y (2019).
Aylward, F. O. Introduction to Prokaryotic gene prediction (CDS and rRNA) V. 2. BMC Bioinformatics. 11, 1, https://doi.org/10.17504/protocols.io.pjrdkm6 (2010).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Research. 25, 955–964, https://doi.org/10.1093/nar/25.5.955 (1997).
Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Research. 46, D335–D342, https://doi.org/10.1093/nar/gkx1038 (2018).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Cantarel, B. L. et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196, https://doi.org/10.1101/gr.6743907 (2008).
Kim, D. et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics. 3, lqaa108, https://doi.org/10.1093/nargab/lqaa108 (2021).
Mitchell, A. L. et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Research. 47, D351–D360, https://doi.org/10.1093/nar/gky1100 (2019).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research. 45, D158–D169, https://doi.org/10.1093/nar/gkw1099 (2017).
Boutet, E. et al. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods in Molecular Biology. 1374, 23–54, https://doi.org/10.1007/978-1-4939-3167-5_2 (2016).
Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. Journal of molecular biology. 428, 726–731, https://doi.org/10.1016/j.jmb.2015.11.006 (2016).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics. 10, 421, https://doi.org/10.1186/1471-2105-10-421 (2009).
The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 47, D330–D338, https://doi.org/10.1093/nar/gky1055 (2019).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP586938 (2025).
Chen, T. Holothuria ocellata isolate DDF-2025, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBOCEH000000000 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662152 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662151 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35940911 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662153 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662154 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662155 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662156 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662157 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR33662158 (2025).
Fan, D. Genome sequence of sea cucumber Bohadschia ocellata (Holothuria ocellata). Figshare https://doi.org/10.6084/m9.figshare.29124434.v1 (2025).
Fan, D. Genome sequence of sea cucumber Bohadschia ocellata (Holothuria ocellata). Baiduyun https://pan.baidu.com/s/10DBoB_GQQhThYBiloGInsA?pwd=wmhs (2025).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).

Acknowledgements

This study was supported by grants from the National Natural Science Foundation of China (W2512089 to A.Y., and 42176132 and 32573487 to T.C.), the Research on breeding technology of candidate species for Guangdong modern marine ranching (2024-MRB-00-001 to T.C.), and the Guangdong Province Project (2024A1515011418 to T.C.).

Author information

These authors contributed equally: Qianying Huang, Xuan Wang.

Authors and Affiliations

School of Medicine, Foshan University, Foshan, 528225, China
Qianying Huang, Hua Ge, Yingxin Lin, Junyan Wang, Xiaoli Zhang, Xiangxing Zhu, Dongsheng Tang & Aifen Yan
State Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture, Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
Xuan Wang, Zhou Qin, Dingding Fan, Chunhua Ren, Chaoqun Hu & Ting Chen
Laboratory of Marine Resource Utilization in the South China Sea, Hainan University, Haikou, 570228, China
Yun Yang & Zhenyu Xie
Xisha Marine Environment National Observation and Research Station, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Sansha, 573199, China
Da Huo & Chang Chen
Agro-Tech Extension Center of Guangdong Province, Huizhou, 516081, China
Haipeng Qin

Contributions

Chunhua Ren, Chaoqun Hu, Ting Chen, and Aifen Yan planned and conceptualized the research. Qianying Huang, Xuan Wang, Zhou Qin, Hua Ge, Yingxin Lin, Junyan Wang, Yun Yang, Da Huo, Xiaoli Zhang, and Xiangxing Zhu acquired and processed the samples. Zhou Qin and Dingding Fan constructed the genome and performed annotations. Qianying Huang, Xuan Wang, Zhou Qin, and Dingding Fan analysed gene functions. Qianying Huang, Xuan Wang, Zhou Qin, Dingding Fan, and Ting Chen conducted bioinformatic analyses. Zhenyu Xie, Chang Chen, Haipeng Qin, Dongsheng Tang, Chunhua Ren, Chaoqun Hu, Aifen Yan, and Ting Chen provided experimental materials and computational resources. Qianying Huang, Xuan Wang, Dingding Fan, Aifen Yan, and Ting Chen wrote the manuscript. Ting Chen and Aifen Yan carried out revisions. All authors have reviewed and consented to the final version of the manuscript.

Corresponding author

Correspondence to Aifen Yan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Huang, Q., Wang, X., Qin, Z. et al. Chromosome-level genome assembly and annotation of the tropical sea cucumber Bohadschia ocellate. Sci Data 13, 137 (2026). https://doi.org/10.1038/s41597-025-06453-z

Received: 08 August 2025
Accepted: 10 December 2025
Published: 17 December 2025
Version of record: 30 January 2026
DOI: https://doi.org/10.1038/s41597-025-06453-z