Background & Summary

Sea cucumbers, which belong to the class Holothuroidea within the phylum Echinodermata1, comprise more than 1,800 species2 and represent the second most diverse group of echinoderms3. They typically inhabit coral reefs, rocky substrates or deep-sea bottoms4. Most sea cucumbers are distributed in the Indo-Pacific, where they are considered highly valuable, with up to 80 species being commercially harvested5. This scale of exploitation is largely due to their price as a luxury commodity and their high nutritional value, which have made them particularly popular in Asian cuisines6. It is therefore unsurprising that most commercial sea cucumbers are tropical members of the families Holothuriidae and Stichopodidae, with only some temperate species from the family Cucumariidae7. Ecologically, sea cucumbers play at least four important roles: 1) recycling nutrients, 2) facilitating bioturbation and deposit feeding, 3) disturbing sediments to promote mineralization and 4) influencing seawater chemistry8,9. Currently, however, seven species of sea cucumbers are listed as “Endangered” and nine as “Vulnerable” on the IUCN Red List10. This is a consequence of persistent overfishing, which has drastically depleted many high-value sea cucumber populations, such as those of Holothuria scabra11. Consequently, artificial breeding and stock enhancement are increasingly applied as critical means of managing sea cucumber fisheries and aquaculture12,13,14.

Bohadschia ocellata is a sea cucumber species of the family Holothuriidae that is native to a wide range across the tropical Indo-Pacific, including the Timor Sea, South China Sea, Philippine Sea and the Great Barrier Reef5 (Fig. 1a). In the Persian Gulf, where it occurs at high density, it has been recorded as having high reproduction and growth rates15. The large, blotchy spots on the dorsum serve as a diagnostic trait for identifying the species16. Behaviourally, B. ocellata is often observed exposed during the day, although it is capable of burrowing and can frequently be found buried in the sand16. In Asian regions, especially in China, the body wall of B. ocellata is commonly consumed10. Furthermore, the fucosylated chondroitin sulfate extracted from B. ocellata contains key monosaccharide components and demonstrates significant biological activities17. Owing to this demand, overfishing, together with habitat destruction, has led to a sharp decline in B. ocellata populations over the past decade.

Fig. 1

General genomic attributes of B. ocellata based on assembly results. (a) Morphological illustration of B. ocellata and map of its natural habitat range; (b) Histogram of the 21-mer count distribution of the B. ocellata genome; (c) Hi-C interaction map of the B. ocellata genome.

Samyn and Vandenspiegel examined the holotype of B. ocellata and concluded that it is a valid species of the genus Bohadschia18. The earlier classification of B. ocellata into the genus Holothuria was therefore considered unjustified19. With continued progress in sea cucumber taxonomy, both Holothuria and Bohadschia have now been confirmed as independent and valid genera. Meanwhile, based on morphological characteristics and mitochondrial genome sequencing, B. ocellata has been reclassified into the genus Bohadschia16,20, as a species distinct from the established Holothuria ocellata21. Although the mitochondrial cytochrome c oxidase I (COI) gene of B. ocellata has been successfully sequenced22, the lack of a whole-genome assembly still hinders more comprehensive study of its phylogenetic affinity, taxonomy and evolutionary processes.

To date, chromosome-level genomes have been reported for a few holothurian species, such as Apostichopus japonicus23, Holothuria leucospilota24, H. scabra25, Stichopus monotuberculatus26 and Chiridota heheva27. However, whole-genome data for sea cucumbers within the genus Bohadschia remain scarce28. To fill this gap, this study employed PacBio Revio long-read sequencing, MGI DNBSEQ-T7 short-read sequencing and Hi-C mapping to assemble a high-quality chromosome-level genome of B. ocellata. A final genome assembly of 909.18 Mb was obtained and anchored onto 23 pseudochromosomes, with contig and scaffold N50 values of 12.00 Mb and 38.97 Mb, respectively. A total of 31,277 protein-coding genes were identified in the B. ocellata genome, with a completeness of 98.10% according to BUSCO assessment of the predicted proteome. Approximately 98.3% (30,752 of 31,277) of the genes received functional annotations from at least one database. This chromosome-level genome assembly of B. ocellata not only serves as a valuable genomic resource for further study of classification and radiation within Bohadschia, but also provides a foundation for future investigation of its evolutionary adaptations, gene functions and conservation strategies.

Methods

Sample collection and nucleic acid extraction

All sea cucumber samples used for sequencing in this study were collected from Tanmen Port (19.33° N, 110.49° E), Qionghai City, Hainan Province, and one randomly selected individual was dissected to obtain muscle tissue. The excised muscle tissue was rinsed in a dish containing phosphate-buffered saline (PBS) and immediately transferred to liquid nitrogen. High-quality DNA was extracted from the muscle using the QIAamp DNA Mini Kit for whole-genome sequencing with both the long-read and short-read approaches.

In the same manner, scissors and forceps were used to carefully separate six tissues: intestine, muscle, oral tentacles, respiratory tree, testis and rete mirabile. Total RNA was extracted from these six tissues for transcriptome sequencing using the RNAprep Pure Plant Plus Kit (Tiangen Biotechnology Co., Ltd., Beijing, China).

Library preparation and sequencing

Short-read sequencing was performed on the MGI DNBSEQ-T7 platform with a PE150 read length configuration. High-quality DNA that had passed purity, concentration and integrity tests was sheared using fragmentase. The fragmented DNA was then end-repaired, dA-tailed and ligated to sequencing adapters. To construct a paired-end library with a 350 bp insert, the MGIEasy FS DNA Library Preparation Kit (MGI, Shenzhen, China) was used for fragmentation and sequencing adapter ligation. The pooled libraries were used to prepare DNA nanoballs (DNBs) by rolling circle replication, and the DNBs were subsequently sequenced. The raw data were evaluated and filtered using FastQC (v0.12.1) and SOAPnuke (v2.1.4)29. In total, 73.74 Gb of short-read data were obtained, corresponding to 81.10× coverage (Table 1).

Table 1 The clean data and their characteristics obtained from each sequencing.

Long-read sequencing was carried out on the PacBio Revio platform (Pacific Biosciences). High-quality DNA was isolated using the QIAamp DNA Mini Kit. The fragmented DNA was then end-repaired and A-tailed using the SMRTbell Prep Kit 3.0 (PacBio, Menlo Park, CA, USA), and adapters were ligated to both ends of the fragments. After BluePippin size selection, the Sequel II Binding Kit 3.2 (PacBio, Menlo Park, CA, USA) was used to bind the sequencing polymerase to the constructed library, which was then loaded onto the SMRT Cell 8M sequencing reagent plate for sequencing. SMRT Link v13.1 (Pacific Biosciences) was used to convert each polymerase read with complete adapters into a circular consensus sequence (CCS) read, which was saved in read.bam and filtered according to the QV ≥ 20 standard30 to obtain the final valid data. In total, 53.85 Gb of long-read data were obtained, corresponding to 59.23× coverage (Table 1).

High-throughput Hi-C sequencing was also performed on the MGI DNBSEQ-T7 platform with a PE150 read length. Muscle samples were fixed with formaldehyde and subsequently digested with the DpnII enzyme (NEB, Ipswich, MA, USA). After end repair, biotin labeling and fragment ligation, the extracted DNA was de-crosslinked and sheared into fragments of approximately 350 bp. Interacting DNA fragments were captured with streptavidin magnetic beads for library construction. The concentration and insert size of the library were assessed using Qubit 3.0 and Agilent 2100, and qualified libraries were sequenced. The raw data were evaluated using FastQC (v0.12.1), resulting in 142.18 Gb (156.39× coverage) of clean data (Table 1).
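
The coverage values reported for the short-read, long-read and Hi-C datasets can be checked with a short calculation. The sketch below assumes that coverage was computed as data volume divided by the final assembly size of 909.18 Mb; the text does not state the denominator, so this is an assumption, and the function and variable names are illustrative only.

```python
# Sketch: fold coverage = total sequenced bases / genome size.
# Assumption: the denominator is the final assembly size (909.18 Mb).
ASSEMBLY_SIZE_MB = 909.18

# Clean data volumes (Gb) as reported in the text.
CLEAN_DATA_GB = {
    "MGI short reads": 73.74,
    "PacBio HiFi": 53.85,
    "Hi-C": 142.18,
}


def coverage(data_gb: float, genome_mb: float = ASSEMBLY_SIZE_MB) -> float:
    """Return fold coverage for a data volume in Gb and a genome size in Mb."""
    return data_gb * 1000.0 / genome_mb


for name, gb in CLEAN_DATA_GB.items():
    print(f"{name}: {coverage(gb):.2f}x")
```

Under this assumption, the computed values agree with the reported 81.10×, 59.23× and 156.39× to within about 0.01×, which is consistent with rounding of the input figures.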

Transcriptome sequencing was performed on the Illumina high-throughput sequencing platform. mRNA was enriched from total RNA and purified using AMPure XP beads, then reverse transcribed to cDNA, which subsequently underwent end repair, A-tailing and adapter ligation. PCR amplification was performed on the selected fragments to construct the library, and its quality and effective concentration were assessed using AATI and qPCR. The constructed libraries were labeled with sample-specific barcode primers and pooled, followed by paired-end sequencing to generate raw data from the six tissue samples. In total, 42.30 Gb of clean data were generated (Table 1).

Genome assessment and assembly

Genome size, heterozygosity and repetitive sequence content were assessed through k-mer analysis. Counts of 21-mers were obtained from raw sequencing reads using Jellyfish (version 2.3.0)31 with 10 threads, counting both strands, with the k-mer size set to 21 and the hash size set to 100 MB (Fig. 1b; Table 2). GenomeScope (version 2.0)32 was employed to analyze the k-mer histogram generated by Jellyfish, using a k-mer size of 21 and assuming a ploidy level of 2. After removal of k-mers with erroneous frequencies, 37,990,663,645 k-mers showing a main peak at depth 65.40 were retained (Table 2). In the k-mer spectrum of a diploid genome, the peak at the lower depth corresponds to heterozygous regions, while the peak at roughly twice that depth corresponds to homozygous regions. GenomeScope estimated a genome size of ~802.34 Mb for B. ocellata, with 27.60% repetitive content and a high heterozygosity rate of approximately 1.78%. High heterozygosity has also been reported in other sea cucumber species, such as H. leucospilota24 and S. monotuberculatus26.
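
To illustrate what counting k-mers on both strands produces, the following toy sketch counts canonical k-mers and builds the abundance histogram of the kind that GenomeScope takes as input. It is a conceptual illustration in plain Python, not the Jellyfish implementation; the example reads and helper names are hypothetical, and a small k is used so the output stays readable.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k=21):
    """Count canonical k-mers, so both strands contribute to the same key."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def histogram(counts):
    """Map depth -> number of distinct k-mers observed at that depth."""
    return Counter(counts.values())


# Hypothetical toy reads; real input would be the short-read FASTQ data.
reads = ["ACGTACGTAC", "GTACGTACGT"]
hist = histogram(count_kmers(reads, k=5))
for depth in sorted(hist):
    print(depth, hist[depth])
```

In a real diploid dataset, plotting this histogram would show the heterozygous and homozygous peaks from which genome size and heterozygosity are modeled.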

Table 2 B. ocellata genomic statistics obtained from k-mer and GenomeScope analyses.

PacBio HiFi long reads were assembled using hifiasm (v0.19.8-r603)33. Preliminary contigs generated through all-versus-all alignment of overlapping reads were subjected to three rounds of iterative self-correction, thereby improving sequence accuracy and producing high-confidence contigs. Purge_dups (v1.2.5)34 was used to remove redundant sequences from the preliminary assembly, yielding the draft genome. MGI short reads were aligned to the de-redundant genome using BWA (v0.7.17-r1188)35, and base correction and gap filling were performed with Pilon (v1.23)36 based on this alignment. Based on the interactions in the Hi-C matrix, scaffolds were clustered and assigned to chromosomal positions. Subsequent ordering and orientation were performed using ALLHiC (v1.1)34, culminating in a chromosome-scale assembly comprising 23 pseudochromosomes with 97.27% of scaffolds anchored (Fig. 1c). To visualize the various genomic features and their distribution along chromosomes (Fig. 2; Table 3), a plot depicting the genome structure was constructed using Circos (v0.69.8)37. The visualization included the following tracks from inner to outer circles: chromosomes, gene regions, repeat sequences, SNP percentage and NGS sequencing depth. The final genome assembly comprised 73 scaffolds spanning 909.18 Mb, with a scaffold N50 of 38.97 Mb and contig N50 of 12.00 Mb (Table 4).
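
For readers unfamiliar with the N50 statistic quoted for contigs and scaffolds, a minimal sketch of its calculation is given here. The sequence lengths are hypothetical toy values, not the lengths of the B. ocellata assembly, and the function name is illustrative.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Applied to the contig and scaffold lengths of an assembly FASTA, the same function would yield the contig N50 and scaffold N50, respectively.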

Fig. 2

Circos plot depicting the genomic features of B. ocellata. The visualization included the following tracks from inner to outer circles: chromosomes, gene regions, repeat sequences, SNP percentage, and NGS sequencing depth.

Table 3 Details of the 23 pseudochromosomes in the Hi-C-based chromosome-level assembly.
Table 4 Summary of the assembled B. ocellata genome after the removal of redundant sequences.

Repetitive element annotation

RepeatMasker (version 4.0.9 with RepBase 20181026)38 and EDTA39, a whole-genome de novo transposable element (TE) annotation pipeline, were employed to generate high-quality TE annotations. The EDTA pipeline, which comprises LTRharvest, the parallel version of LTR_FINDER, LTR_retriever, GRF, TIR-Learner, HelitronScanner, RepeatModeler and customized filtering scripts, was used to construct a candidate repeat sequence library. Nucleotide coding sequences (CDS) were then used to filter gene-like sequences out of the TE library, and redundancy was eliminated by consolidating the results of the three methods mentioned above. A total of 34.31% of the assembled genome was identified as repetitive, among which DNA elements accounted for the highest proportion (13.57%), followed by LINEs (4.33%), LTRs (0.91%), SINEs (0.02%) and simple repeats (0.01%) (Fig. 3a; Table 5).

Fig. 3

Results of non-coding elements and RNA prediction. (a) Classification of annotated repetitive elements along with their respective Kimura substitution level distributions; (b) Categorization of annotated noncoding RNAs.

Table 5 List of repetitive elements obtained through annotation.

Noncoding RNA annotation

Ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) were predicted using default parameters by Barrnap (v0.9)40 and tRNAscan-SE (v2.0.11)41, respectively. The identification of other non-coding RNAs, such as small nuclear RNAs (snRNAs) and microRNAs (miRNAs), was achieved through alignment with the Rfam database (v14.8)42, followed by annotation using Infernal (v1.1.4)43. A total of 20 miRNAs, 9,079 tRNAs, 533 rRNAs and 1,785 snRNAs were identified in the genome of B. ocellata (Fig. 3b; Table 6).

Table 6 List of non-coding RNAs obtained through annotation.

Gene prediction and functional annotation

To improve exon–intron structure recognition and optimize gene prediction models, RNA-seq reads were incorporated into the annotation process, followed by integration of evidence from multiple sources using MAKER3 (v3.01.03)44 for annotation. A total of 42.30 Gb of RNA-seq data, obtained by sequencing the transcriptomes of six B. ocellata tissues, were used for gene annotation (Table 7). HISAT2 (v2.2.1)45 was used to align the cleaned RNA-seq reads to the assembled genome, after which StringTie (v2.1.7)46 was used to reconstruct transcript structures and quantify expression levels. Nine high-quality protein datasets were incorporated for homology-based annotation: Lytechinus pictus (GCF_015342785.2), Patiria miniata (GCF_015706575.1), Anneissia japonica (GCF_011630105.1), Asterias rubens (GCF_902459465.1), Acanthaster planci (GCF_001949145.1), Lytechinus variegatus (GCF_018143015.1), Strongylocentrotus purpuratus (GCF_000002235.5), H. leucospilota (GCA_029531755.1) and A. japonicus (GCA_002754855.1). Gene models were predicted and annotated in an unsupervised, evidence-guided manner using BRAKER3 (v3.0.8), which combines GeneMark-ETP and AUGUSTUS47. By simultaneously leveraging RNA-seq expression profiles and protein orthology, this strategy enables comprehensive annotation of both well-supported coding regions and novel gene models. A total of 31,277 genes were identified in B. ocellata, and the predicted genes were compared with those of three other Holothuroidea species (H. leucospilota24, A. japonicus23 and S. monotuberculatus26) in terms of gene count and genomic features (Table 8). BUSCO analysis using 954 single-copy orthologs from 65 genomes indicated that 98.10% of BUSCO genes were complete and only 0.90% were fragmented, supporting the completeness of the annotated gene set (Table 9).
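
The order of the evidence-generation steps described above (read alignment, transcript reconstruction, then evidence-guided gene prediction) can be sketched as a dry run that only prints command lines. The exact parameters used in this study are not reported, so every file name, the thread count and the use of samtools for sorting are assumptions made for illustration; only the tool names come from the text.

```python
import shlex


def annotation_commands(genome, reads_1, reads_2, proteins, threads=8):
    """Build (but do not run) an illustrative HISAT2 -> StringTie -> BRAKER3 command chain.

    All file names and the thread count are hypothetical placeholders.
    """
    index = "genome_index"
    sam = "rnaseq.sam"
    bam = "rnaseq.sorted.bam"
    gtf = "transcripts.gtf"
    return [
        # Index the assembly, then align RNA-seq reads (--dta suits transcript assemblers).
        ["hisat2-build", genome, index],
        ["hisat2", "-p", str(threads), "--dta", "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # Coordinate-sort the alignments (samtools is an assumption; it is not named in the text).
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Reconstruct transcript structures from the sorted alignments.
        ["stringtie", bam, "-p", str(threads), "-o", gtf],
        # Evidence-guided gene prediction with RNA-seq alignments and protein homology.
        ["braker.pl", f"--genome={genome}", f"--bam={bam}",
         f"--prot_seq={proteins}", f"--threads={threads}"],
    ]


for command in annotation_commands("genome.fa", "tissue_R1.fq.gz",
                                   "tissue_R2.fq.gz", "proteins.fa"):
    print(shlex.join(command))
```

Printing rather than executing keeps the sketch independent of any local installation; in practice each of the six tissue libraries would be aligned separately before the evidence is combined.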

Table 7 Data generated from the transcriptomic sequencing of B. ocellata tissue samples.
Table 8 The B. ocellata assembly in comparison with three other available Holothuroidea genomes in terms of genes and gene elements.
Table 9 Detailed results obtained from the BUSCO analysis of the annotated genes.

Functional annotation and amino acid sequence analysis were conducted using InterProScan (v5.56)48 to predict protein families, domains and functional sites. Additionally, the amino acid sequences of the genes were compared with the UniProt49, UniProtKB/SwissProt50 and KEGG51 databases using BLAST (v2.12.0+, e-value 1e−5)52 to confirm their functional information and the biological pathways involved. Gene Ontology (GO)53 annotations were obtained from the InterProScan output, detailing the molecular functions, cellular components and biological processes associated with each gene. In total, 30,752 protein sequences (approximately 98.3% of the 31,277 predicted genes) were annotated in at least one public database (Fig. 4; Table 10).
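
The notion of a gene being annotated in at least one database, and the exclusive overlaps that an UpSet plot displays, can be expressed with simple set operations. The sketch below uses hypothetical gene identifiers and memberships purely for illustration; it does not reproduce the actual annotation results.

```python
# Hypothetical annotation results: database -> set of annotated gene IDs.
annotations = {
    "UniProt": {"g1", "g2", "g3", "g4"},
    "Swiss-Prot": {"g1", "g2"},
    "KEGG": {"g2", "g3"},
    "InterPro": {"g1", "g2", "g3", "g5"},
    "GO": {"g1", "g5"},
}
all_genes = {f"g{i}" for i in range(1, 7)}  # g6 has no annotation


def annotated_any(annotations):
    """Genes annotated in at least one database (union of all sets)."""
    return set().union(*annotations.values())


def exclusive_intersections(annotations):
    """Group genes by the exact combination of databases that annotate them,
    which is the quantity shown by the bars of an UpSet plot."""
    groups = {}
    for gene in annotated_any(annotations):
        key = tuple(db for db in annotations if gene in annotations[db])
        groups.setdefault(key, set()).add(gene)
    return groups


covered = annotated_any(annotations)
print(f"{len(covered)} of {len(all_genes)} genes annotated "
      f"({100 * len(covered) / len(all_genes):.1f}%)")
for combo, genes in sorted(exclusive_intersections(annotations).items()):
    print(combo, sorted(genes))
```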

Fig. 4

UpSet and Venn plots depicting the overlap of protein-coding gene annotations across the UniProt, Swiss-Prot, KEGG, InterPro and GO databases.

Table 10 Number of coding genes identified by each method of functional annotation.

Data Records

All sequence reads and the chromosome-level genome assembly of B. ocellata associated with this project are available under SRP58693854 at NCBI. The whole genome shotgun project has been deposited in DDBJ/ENA/GenBank under the accession number JBOCEH00000000055, which corresponds to the version described in this paper. The genome sequencing datasets, including PacBio HiFi (SRR33662152)56, BGI short-reads (SRR33662151)57 and Hi-C reads (SRR35940911)58, are publicly accessible via the SRA. Additionally, RNA-seq data are available under SRA numbers SRR33662153–SRR3366215859,60,61,62,63,64. Additional related datasets, including the genome assembly, gene annotation and functional annotation, are available in the Figshare repository65 or Baiduyun66.

Technical Validation

Nucleic acid quality

DNA quality and concentration were assessed using 0.70% agarose gel electrophoresis, a NanoDrop One spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and a Qubit 3.0 fluorometer (Life Technologies, Carlsbad, CA, USA); DNA samples showing no more than slight degradation were considered suitable for sequencing library preparation. The concentration, integrity and purity of RNA were assessed using a NanoDrop One spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), a Bioanalyzer 5400 system (Agilent Technologies, Santa Clara, California, USA) and agarose gel electrophoresis (0.70%), respectively. RNA samples exhibiting a RIN above 9.50 were considered suitable for downstream library preparation.

Genome assembly and annotation quality

The QV pipeline of Merqury67 was used to estimate the assembly QV based on k-mer analysis. Using the “best_k.sh” script from Merqury, the optimal k-mer length was calculated as 19. The number of k-mers in the short-read sequencing data was calculated using Meryl with default settings and the output was then used alongside the assembly in Merqury to perform the QV evaluation. A k-mer completeness of 78.25% and a k-mer-based QV of 64.44 were obtained from the analysis.
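
To put the reported QV in more familiar terms, the sketch below converts Phred-scaled quality values into per-base error rates and shows how a k-mer-based QV can be derived from k-mer counts. The second function follows the k-mer survival model commonly described for Merqury; treating it as the exact computation performed here is an assumption, and the k-mer counts passed to it are hypothetical.

```python
import math


def phred_to_error(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_to_phred(error: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error)


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Estimate QV from k-mers found only in the assembly (absent from reads).

    Model: the probability that a base is correct is
    (1 - asm_only / total) ** (1 / k); the error rate is one minus that.
    """
    p_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    return error_to_phred(1 - p_correct)


# Reported assembly QV of 64.44 expressed as an error rate per base.
print(f"QV 64.44 -> {phred_to_error(64.44):.2e} errors per base")
# The HiFi read filter of QV >= 20 corresponds to at most 1% error.
print(f"QV 20 -> {phred_to_error(20):.2%} error")
# Hypothetical counts, for illustration only.
print(f"example k-mer QV: {kmer_qv(1_000, 900_000_000, 19):.2f}")
```

On this scale, a QV of 64.44 corresponds to fewer than one expected error per million bases.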

BUSCO analysis was performed to further evaluate genome completeness, utilizing the metazoa_odb10 dataset, which includes 954 conserved genes from 65 metazoan genomes. The evaluation indicated that 98.20% of BUSCOs were detected in the assembly, comprising 94.40% complete and 3.80% fragmented BUSCOs, with 1.80% missing (Table 11). The BUSCO assessment indicated that the majority of core, essential genes were captured in the assembly or annotation, supporting the high quality, completeness and accuracy of the B. ocellata genome, which provides a valuable genomic foundation for resource conservation, selective breeding and aquaculture development.

Table 11 Detailed results obtained from the BUSCO analysis of the assembled genome.