Chromosome-level genome assembly of the sea cucumber, Colochirus anceps

Jiang, Chunxi; Wu, Qianwen; Su, Fang; Cui, Wei; Chen, Ting; Sun, Lina

doi:10.1038/s41597-025-05931-8

Data Descriptor
Open access
Published: 14 October 2025

Chunxi Jiang^1,2,3,
Qianwen Wu ORCID: orcid.org/0000-0001-9997-6262^1,2,3,
Fang Su^1,2,3,
Wei Cui^1,2,3,
Ting Chen ORCID: orcid.org/0000-0002-5777-909X⁴ &
…
Lina Sun^1,2,3

Scientific Data volume 12, Article number: 1643 (2025)

Abstract

Colochirus anceps is a benthic sea cucumber recognized for its striking aposematic coloration, ecological role in nutrient recycling, and biomedical potential. In this study, we utilized Illumina short-read, PacBio HiFi long-read, and Hi-C sequencing technologies to assemble the chromosome-level genome of C. anceps. This represents the first genome assembly for the order Dendrochirotida within the class Holothuroidea. The assembled genome has a total size of 2,238.33 Mb, with a contig N50 of 15.09 Mb, and 95.09% of the assembly sequences have been anchored to 23 chromosomes. Annotation identified 24,102 protein-coding genes and revealed a high repeat content of 70.95%. The genome achieved a completeness score of 94.3%, as evaluated through BUSCO analysis. This genome assembly provides valuable insights into the genomic architecture and evolutionary dynamics of sea cucumbers, laying a foundation for exploring their unique biological traits, adaptability, and diversity.

Background & Summary

Colochirus anceps (synonym: Cercodemas anceps) is a benthic sea cucumber within the order Dendrochirotida and the family Cucumariidae¹. This species is characterized by its elongated body and leathery skin, features typical of the class Holothuroidea². The striking coloration of C. anceps serves as an aposematic signal to deter predators. Chromatic vision, which detects hue and chroma, enables predators to perceive these warning colors, while achromatic vision processes luminance for contrast detection. Studies demonstrate that both chromatic and achromatic visual cues in C. anceps contribute significantly to predator avoidance³. Widely distributed in tropical regions, particularly in the waters of Vietnam and Malaysia¹, it plays an essential ecological role in intertidal and seagrass ecosystems⁴. As a deposit feeder, C. anceps contributes to nutrient recycling and sediment oxygenation, which, in turn, enhances infaunal biodiversity and primary productivity^4,5. Its adaptability to various substrates, including sandy and muddy bottoms, further underscores its ecological versatility. C. anceps has also garnered attention for its biomedical potential. It is a rich source of holostane-type triterpene saponins, bioactive compounds with potent cytotoxic properties¹. Specifically, cercodemasoide A, a saponin derived from this species, has demonstrated remarkable anticancer activity against various cancer cell lines, including hepatoma, melanoma, and breast cancer⁶. These findings highlight the therapeutic potential of C. anceps in cancer research and drug development.

In recent years, the growing number of assembled and annotated sea cucumber genomes has provided a critical foundation for investigating their evolutionary history, behavioral traits, and physiological adaptations, while significantly enriching the echinoderm genomic database. According to the classification of sea cucumbers proposed by Miller et al. in 2017, the class Holothuroidea is divided into seven orders⁷. Among these, the orders Synallactida and Holothuriida have been the most extensively studied and have the highest number of genome assemblies^{8,9,10,11,12,13,14,15} (Table 1). Genomic research on the order Apodida has also progressed: the genome of its representative species, Chiridota heheva, has been sequenced and analyzed^16,17 (Table 1). In contrast, the order Dendrochirotida remains relatively underexplored in genomic terms. Preliminary genome survey analyses have revealed that species within this order have unusually large genomes, ranging from 2238 to 3754 Mb, with a high proportion of repeat sequences¹⁸. These genomic features suggest that Dendrochirotida species may have evolved distinct adaptive mechanisms.

Table 1 Chromosome-level reference genomes assembled in Holothuroidea.

In this study, we selected Colochirus anceps as a representative species of the order Dendrochirotida and utilized PacBio long-read sequencing combined with Hi-C technology to assemble its chromosome-level reference genome. The assembly results revealed that the genome size of C. anceps is 2,238.33 Mb, comprising a total of 1,433 contigs with a contig N50 of 15.09 Mb. Using Hi-C technology, these contigs were further clustered and ordered into 23 chromosomes (2n = 46), consistent with the chromosome number reported for other published sea cucumber genomes. A total of 24,102 protein-coding genes were predicted, of which 96.6% (23,288 genes) were successfully annotated with functional information. In summary, the C. anceps genome represents the first chromosome-level reference genome assembled for a sea cucumber species in the order Dendrochirotida. This genome serves as a significant addition to the sea cucumber genomic database, offering an essential resource for studying genomic diversity and evolutionary relationships among different sea cucumber orders. Moreover, it provides a foundation for investigating the genetic mechanisms underlying the unique physiological traits and ecological adaptations of C. anceps.

Methods

Sample collection, library construction and sequencing

A healthy adult individual of the sea cucumber Colochirus anceps was collected from Xiamen, Fujian Province, China. Muscle, body wall, intestine, respiratory tree, nerve ring, tentacle, and gonad tissues were sampled, immediately frozen in liquid nitrogen, and stored at −80 °C. Muscle samples were used for next-generation sequencing (NGS), PacBio sequencing, and Hi-C sequencing, while muscle and the other tissue samples were used for transcriptome sequencing to assist genome annotation. DNA extraction, library construction, and sequencing were all carried out by Novogene (Beijing, China) in strict accordance with the manufacturer’s protocols.

Genomic DNA was extracted from muscle tissue to construct short-insert libraries (350 bp), which were sequenced on the Illumina platform, generating 374.0 Gb (160.28×) of 150 bp paired-end reads (Table 2).

For high-fidelity (HiFi) reads, high-molecular-weight (HMW) DNA was extracted from muscle tissue using Novogene’s SDS method¹⁹. High-quality DNA samples (main band > 30 kb) were fragmented into 15–18 kb pieces using a Covaris ultrasonicator²⁰. Large DNA fragments were enriched and purified using magnetic beads, followed by damage repair and end repair. Sequencing adapters were ligated to the DNA fragments, forming circular templates, and unligated fragments were removed with exonuclease²¹. The prepared libraries were sequenced on the PacBio Sequel II platform, generating 91.4 Gb of PacBio CCS reads with a coverage depth of 39.17× (Table 2).

For Hi-C sequencing, library preparation involved digestion of crosslinked DNA, biotin labeling, proximity ligation, and DNA purification. The Hi-C libraries were sequenced on the Illumina NovaSeq platform, producing 249.9 Gb of Hi-C reads with a sequencing depth of 107.25× (Table 2).

For transcriptome sequencing, mRNA was enriched from total RNA using oligo(dT) magnetic beads. After fragmentation, the first strand of cDNA was synthesized with random hexamer primers, followed by synthesis of the second strand. Library preparation involved end repair, A-tailing, adapter ligation, fragment selection, amplification, and purification²². The prepared libraries were sequenced on the Illumina platform, generating a total of 51.39 Gb of raw data, including 6.25 Gb from muscle tissue, 6.99 Gb from body wall tissue, 7.90 Gb from intestinal tissue, 9.42 Gb from respiratory tree tissue, 6.03 Gb from gonad tissue, 6.40 Gb from tentacle tissue, and 8.39 Gb from nerve ring tissue²³ (Table 2).
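The depth figures quoted above follow from dividing sequencing yield by genome size. The text does not state which genome size was used for that division, so the sketch below is only an illustration, not the authors' procedure. It back-calculates the size implied by each reported yield and depth pair and sums the per-tissue RNA-seq yields. The function and variable names are hypothetical.

```python
# Reported yield (Gb) and depth (x) for each genomic library, taken from the text.
libraries = {
    "Illumina": (374.0, 160.28),
    "PacBio HiFi": (91.4, 39.17),
    "Hi-C": (249.9, 107.25),
}


def implied_genome_size_gb(yield_gb, depth):
    """Genome size (Gb) implied by the relation depth = yield / genome size."""
    return yield_gb / depth


implied = {name: implied_genome_size_gb(y, d) for name, (y, d) in libraries.items()}
for name, size in implied.items():
    print(f"{name}: implied genome size {size:.3f} Gb")

# Per-tissue RNA-seq yields (Gb) in the order listed in the text.
rna_seq_gb = [6.25, 6.99, 7.90, 9.42, 6.03, 6.40, 8.39]
rna_total = sum(rna_seq_gb)
print(f"RNA-seq total: {rna_total:.2f} Gb")
```

All three libraries imply roughly the same genome size (about 2.33 Gb), and the per-tissue yields sum to within rounding of the reported 51.39 Gb.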

Table 2 Sequencing data used for the assembly and annotation of Colochirus anceps genome.

Genome survey, assembly and quality assessment

Before genome assembly, a genome survey was conducted using the short-read sequencing data. K-mer analysis was performed using Jellyfish (v2.2.7) with a k-mer size of 17. The analysis yielded a total of 68,054,868,485 k-mers, with a main peak at a depth of approximately 30²⁴. Based on the formula genome size = k-mer number/peak depth, the genome size was estimated to be approximately 2,268.5 Mb, with a corrected size of 2,238.33 Mb²⁵. Notably, this genome is approximately 2–3 times larger than those reported for species in the order Synallactida. The substantial size difference may reflect distinct evolutionary trajectories in genome architecture, potentially involving differential transposable element activity. The genome heterozygosity rate was 1.06%, and the proportion of repetitive sequences was 69.39% (Table 3).

The raw PacBio sequencing data were quality-controlled using CCS (v5.0.0) with the parameter min-rq = 0.99, generating HiFi reads²⁶. The HiFi reads were then assembled into contigs with Hifiasm (v0.19.5)²⁷. The assembly produced a total of 1,433 contigs, with a genome size of 2,407,851,961 bp and a contig N50 length of 15.09 Mb (Table 3). The contig-level genome was then clustered, oriented, and ordered to a near-chromosomal level by integrating Hi-C sequencing data and processing it with AllHiC (v0.9.8) (parameters: enz = DpnII, CLUSTER = n)²⁸. Finally, manual refinement based on chromosomal interaction intensity was carried out using Juicebox (v1.11.08), resulting in a chromosome-level genome assembly²⁹. After Hi-C-assisted assembly, a total of 23 sequences were assembled to the chromosome level (Table 4), matching the karyotypic characteristics observed in species from both the Synallactida and Holothuriida orders. Although 1,191 sequences remained unassembled at this level, the anchored sequences (2,289,615,330 bp) represent 95.09% of the total genome length (2,407,881,661 bp) (Table 3 and Fig. 1).

The assembled genome was evaluated using BUSCO (Benchmarking Universal Single-Copy Orthologs)³⁰. The results showed that 94.3% of the BUSCO genes were complete, indicating a high level of assembly completeness (Table 3 and Fig. 2a).
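The survey and scaffolding statistics above reduce to a few small calculations. The following minimal sketch reproduces the uncorrected k-mer estimate and the anchoring rate from the figures given in the text, and illustrates how a contig N50 is defined. The correction step that yields 2,238.33 Mb is not described in the text and is not attempted here. Function names and the toy contig lengths are hypothetical.

```python
def kmer_genome_size(total_kmers, peak_depth):
    """Estimate genome size (bp) as total k-mer count / main peak depth."""
    return total_kmers / peak_depth


def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("empty length list")


# Values reported in the text.
survey_size_mb = kmer_genome_size(68_054_868_485, 30) / 1e6
anchored_pct = 2_289_615_330 / 2_407_881_661 * 100

# Hypothetical toy contig lengths, used only to show the N50 definition.
toy_n50 = n50([10, 8, 5, 3, 2])

print(f"k-mer estimate: {survey_size_mb:.1f} Mb")
print(f"anchored to chromosomes: {anchored_pct:.2f}%")
print(f"toy N50: {toy_n50}")
```

For the toy lengths, the total is 28, and the two longest contigs (10 and 8) are the first to reach half of it, so the N50 is 8.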

Table 3 Statistics of the genome survey, assembly, Hi-C scaffolding and quality assessments.

Table 4 Statistics of chromosome clustering counts and lengths.

Repeat and ncRNA annotation

Repetitive sequences in the C. anceps genome were predicted using both homology-based and de novo methods. For the homology-based prediction, RepeatMasker (v4.1.2) was used to identify sequences in the C. anceps genome that shared similarity with those in the RepBase database (parameters: -nolow -no_is -norna -pa 30)³¹. For the de novo prediction, a repeat library for C. anceps was constructed using RepeatModeler (v2.0.3) (parameters: -engine ncbi -pa 30 -LTRStruct)³², and repetitive sequences were then identified by running RepeatMasker (v4.1.2) with this library. The de novo repeat library was subsequently merged with the RepBase database, and RepeatMasker (v4.1.2) was run again to annotate repetitive sequences in the C. anceps genome. The results showed that 70.95% of the C. anceps genome consisted of repetitive sequences (Table 5). Among them, 56,123 SINEs (short interspersed nuclear elements) accounted for 0.52%, 824,798 LINEs (long interspersed nuclear elements) accounted for 12.18%, and 1,057,255 LTRs (long terminal repeats) accounted for 11.88% (Table 5).

Based on the structural characteristics of tRNA, tRNAs in the C. anceps genome were identified using tRNAscan-SE (v1.4)³³. For rRNA identification, the highly conserved rRNA sequences of a closely related species, Apostichopus japonicus, were used as reference sequences, and blast (v2.2.26) was used to search for rRNAs in the C. anceps genome (parameters: -e 1e-10, -v 10000, -b 10000). To predict miRNAs and snRNAs, covariance models from the Rfam database were applied using Infernal (v1.1.5)³⁴. The analysis identified 1,458 miRNAs (0.007%), 38,549 tRNAs (0.120%), 7,136 rRNAs (0.057%), and 2,300 snRNAs (0.015%) in the C. anceps genome (Table 6).
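As a quick consistency check on the repeat figures, the sketch below sums the three retrotransposon classes named in the text and derives the share left for all other repeat categories. It assumes the three percentages are shares of the genome, as the 70.95% total is. The categories that make up the remainder are listed in Table 5 and are not named here. Variable names are hypothetical.

```python
# Percentages reported in the text (assumed to be shares of the genome).
total_repeat_pct = 70.95
class_pct = {"SINE": 0.52, "LINE": 12.18, "LTR": 11.88}

# Share covered by the three named classes, and what remains for other repeats.
named_pct = sum(class_pct.values())
other_pct = total_repeat_pct - named_pct

print(f"SINE + LINE + LTR: {named_pct:.2f}%")
print(f"other repeat categories: {other_pct:.2f}%")
```

Under that assumption, the three named classes cover about a third of the total repeat content, so most repeats fall into the remaining categories.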

Table 5 Statistics of repetitive sequences annotation.

Table 6 Statistics of non-coding RNA annotation.

Protein-coding gene prediction, functional annotation, and genome structure visualization

Gene structure prediction was performed using three methods: de novo prediction, homology-based prediction, and transcript-based prediction. De novo prediction was conducted using Augustus (v3.5) and SNAP (v2013.11.29), which predict gene structures based on the statistical features of the genome, such as codon frequency and exon/intron distribution³⁵. Homology-based prediction used alignment tools, including blast (v2.2.26) and GeneWise (v2.4.1), to identify and predict gene structures by aligning known protein-coding sequences from homologous species to the C. anceps genome³⁶. The reference species used for homology alignment were Apostichopus japonicus (Ajap)¹², Holothuria leucospilota (Hleu)⁸, and Synapta maculata (Smac). Transcript-based prediction relied on RNA-seq data from various tissues of C. anceps. ORF prediction and protein alignment were carried out using Hisat2 (v2.2.1), StringTie (v2.2.1), and TransDecoder (v5.7.1)^37,38. Finally, the gene sets predicted by the above methods were integrated into a non-redundant gene set using EVidenceModeler (EVM)³⁹. The annotation results from EVM were further refined using PASA (v2.4.1), which added information such as UTRs and alternative splicing, resulting in the final gene set⁴⁰. A total of 24,102 protein-coding genes were predicted in the C. anceps genome. The average transcript length, average CDS length, average exon length, and average intron length were 35,790.07 bp, 1,401.02 bp, 205.39 bp, and 5,907.41 bp, respectively (Table 7; Fig. 3; Fig. 4a).

Using blastp (v2.2.26) and diamond (v0.8.22)⁴¹, the gene set was aligned against commonly used protein databases, including SwissProt, NR, Pfam, KEGG, and InterPro. Functional annotation of protein-coding genes was performed with InterProScan (v5.59-91.0)⁴². The results showed that 22,211 genes were aligned to the NR database, 16,181 genes to the SwissProt database, 16,864 genes to the KEGG database, and 21,944 genes to the InterPro database. After integration, 96.6% of the genes in the gene set (23,288 genes in total) were successfully assigned functional annotations (Fig. 4b).

The chromosome sizes were calculated using the sequence processing tools seqtk and pyfaidx. Bedtools was used to create a BED file with 100 kb intervals (parameter: makewindows -g -w 100000) and to calculate the number of genes and the GC content within each interval⁴³. The number and location of repetitive sequences were analyzed using RepeatMasker, while LTR_finder and LTR_retriever (v2.9.0) were used to identify and integrate LTR information (parameters: -D 15000 -d 1000 -L 700 -l 100 -p 20 -C -M 0.9)⁴⁴. Finally, these data were integrated and visualized as a circular plot (Fig. 5) using Circos (v0.69-9)⁴⁵.
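The windowing step used for the circular plot can be illustrated in a few lines. This is a pure-Python sketch of the idea behind fixed-width genome windows and per-window gene counts, not the authors' Bedtools pipeline. As a simplification, each gene is assigned to the window containing its start coordinate, whereas an overlap-based count would also credit windows a gene extends into. The chromosome name, size, and gene positions are hypothetical toy data.

```python
from collections import Counter


def make_windows(chrom_sizes, width=100_000):
    """Return half-open (chrom, start, end) windows of fixed width,
    with a shorter final window at the end of each chromosome."""
    windows = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, width):
            windows.append((chrom, start, min(start + width, size)))
    return windows


def count_genes_per_window(gene_starts, width=100_000):
    """Count genes per window, keyed by (chrom, window_start).
    Simplification: a gene is assigned by its start coordinate only."""
    counts = Counter()
    for chrom, pos in gene_starts:
        counts[(chrom, pos // width * width)] += 1
    return counts


# Hypothetical toy input: one 250 kb chromosome and four gene start positions.
chrom_sizes = {"chr1": 250_000}
gene_starts = [("chr1", 5_000), ("chr1", 99_999), ("chr1", 100_000), ("chr1", 240_000)]

windows = make_windows(chrom_sizes)
counts = count_genes_per_window(gene_starts)
for chrom, start, end in windows:
    print(chrom, start, end, counts[(chrom, start)])
```

The same window table can carry other per-interval tracks, such as GC content or repeat counts, which is how the separate data layers line up in a circular plot.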

Table 7 Statistics of gene structure prediction.

Data Records

The genomic Illumina sequencing data were deposited in the NCBI Sequence Read Archive (SRA) under accession number SRR31917503⁴⁶.

The genomic PacBio sequencing data were deposited in the NCBI SRA under accession number SRR31917502⁴⁷.

The transcriptomic sequencing data were deposited in the NCBI SRA under accession numbers SRR31917493–SRR31917499^{48,49,50,51,52,53,54}.

The Hi-C sequencing data were deposited in the NCBI SRA under accession numbers SRR31917492⁵⁵, SRR31917500⁵⁶, and SRR31917501⁵⁷.

The final chromosome assembly and genome annotation files are available in GenBank⁵⁸ and Figshare⁵⁹.

Technical Validation

BUSCO assessment of genome completeness showed that 94.3% of the BUSCO genes were complete, comprising 89.8% complete single-copy BUSCOs and 4.5% complete duplicated BUSCOs, while 3.2% were fragmented and 2.5% were missing, indicating relatively high completeness of the assembly (Fig. 2a). Short-read libraries were aligned to the assembled genome using BWA to evaluate the alignment rate, genome coverage, and depth distribution, and thereby assess completeness and sequencing uniformity. The results showed a read alignment rate of approximately 98.37% and genome coverage of about 99.36%, indicating strong consistency between the reads and the assembled genome (Fig. 2b and Table 8). To further analyze potential GC bias and contamination, the GC content and average depth of the assembled genome were calculated using 10 kb windows. The GC content was concentrated around 41.65%, with no obvious separation in the scatterplot, suggesting no significant GC bias or external contamination in the genome (Fig. 2c). Genome quality was assessed using Merqury based on k-mer analysis, yielding a QV (quality value) of 32.7327, indicating an accuracy greater than 99.9% (Table 8). In summary, multiple evaluation methods confirmed that the assembled genome exhibits high consistency, completeness, and accuracy (Fig. 5).
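The link between the reported QV and the stated accuracy can be made explicit. A consensus quality value is Phred-scaled, so a QV corresponds to a per-base error rate of 10^(−QV/10). The minimal sketch below applies that conversion to the reported value and checks that the BUSCO categories add up. Function and variable names are hypothetical.

```python
def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy."""
    error_rate = 10 ** (-qv / 10)
    return 1 - error_rate


# QV reported in the text.
accuracy = qv_to_accuracy(32.7327)
print(f"per-base accuracy: {accuracy * 100:.3f}%")

# BUSCO categories reported in the text (percent).
single_copy, duplicated, fragmented, missing = 89.8, 4.5, 3.2, 2.5
complete = single_copy + duplicated
total = complete + fragmented + missing
print(f"complete BUSCOs: {complete:.1f}%  (all categories sum to {total:.1f}%)")
```

A QV of 30 corresponds to exactly 99.9% accuracy, so any value above 30, including the one reported here, exceeds that threshold.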

Table 8 Statistics of genome reads coverage.

Data availability

All raw sequencing data can be obtained from NCBI^{46,47,48,49,50,51,52,53,54,55,56,57}. The genome assembly⁵⁸ and annotation files⁵⁹ can be accessed from the GenBank and Figshare databases.

Code availability

All software and pipelines used in this study were executed according to the manual and protocols of the published bioinformatic tools. The versions of the software have been given in the Methods.

References

Cuong, N. X. et al. Cytotoxic triterpene saponins from Cercodemas anceps. Bioorg Med Chem Lett 25, 3151–3156 (2015).
Aminin, D. L. et al. Anticancer activity of sea cucumber triterpene glycosides. Mar Drugs 13, 1202–1223 (2015).
Lim, A. Y. H., Chan, I. Z. W., Carrasco, L. R. & Todd, P. A. Aposematism in pink warty sea cucumbers: independent effects of chromatic and achromatic cues. Marine Ecology Progress Series 631, 157–164 (2019).
Woo, S. P. et al. Sea cucumber species of the Merambong Shoal with notes on the distribution and habitat of the dominant species. Malayan Nature Journal 66, 9 (2015).
Zhou, J. J. & Goh, B. P. Distribution and abundance of echinoderm communities in the intertidal shores of Singapore. Bulletin of Marine Science (2024).
Nga, N. T. P. et al. Nanoliposomal Cercodemasoide A and Its Improved Activities Against NTERA-2 Cancer Stem Cells. Natural Product Communications 15 (2020).
Miller, A. K. et al. Molecular phylogeny of extant Holothuroidea (Echinodermata). Mol Phylogenet Evol 111, 110–131 (2017).
Chen, T. et al. The Holothuria leucospilota genome elucidates sacrificial organ expulsion and bioadhesive trap enriched with amyloid-patterned proteins. Proc Natl Acad Sci USA 120, e2213512120 (2023).
Zhong, S. et al. The draft genome of the tropical sea cucumber Stichopus monotuberculatus (Echinodermata, Stichopodidae) reveals critical genes in fucosylated chondroitin sulfates biosynthetic pathway. Front Genet 14, 1182002 (2023).
Zhong, S. et al. Chromosomal-level genome assembly and annotation of the tropical sea cucumber Holothuria scabra. Sci Data 11, 474 (2024).
Luo, H. et al. De novo genome assembly and annotation of Holothuria scabra (Jaeger, 1833) from nanopore sequencing reads. Genes Genomics 44, 1487–1498 (2022).
Sun, L., Jiang, C., Su, F., Cui, W. & Yang, H. Chromosome-level genome assembly of the sea cucumber Apostichopus japonicus. Sci Data 10, 454 (2023).
Shao, G. et al. The genome of a hadal sea cucumber reveals novel adaptive strategies to deep-sea environments. iScience 25, 105545 (2022).
Chen, T. et al. Chromosome-level genome assembly and annotation of the tropical sea cucumber Stichopus monotuberculatus. Scientific Data 11, 1245 (2024).
Lau, N.-S. et al. A chromosomal-level genome assembly of the tropical sea cucumber Stichopus fusiformiossa. Scientific Data 12, 973 (2025).
Zhang, L. et al. The genome of an apodid holothuroid (Chiridota heheva) provides insights into its adaptation to a deep-sea reducing environment. Commun Biol 5, 224 (2022).
Pu, Y., Zhou, Y., Liu, J. & Zhang, H. A high-quality chromosomal genome assembly of the sea cucumber Chiridota heheva and its hydrothermal adaptation. Gigascience 13 (2024).
Jiang, C., Yang, H., Liu, B. & Sun, L. Genome variations in sea cucumbers: Insights from genome survey sequencing and comparative analysis of mitochondrial genomes. Comp Biochem Physiol Part D Genomics Proteomics 52, 101328 (2024).
Natarajan, V. P., Zhang, X., Morono, Y., Inagaki, F. & Wang, F. A Modified SDS-Based DNA Extraction Method for High Quality Environmental DNA from Seafloor Environments. Front Microbiol 7, 986 (2016).
Jansson, L. et al. Assessment of DNA quality for whole genome library preparation. Analytical Biochemistry 695, 115636 (2024).
Ruhela, A., Skouridou, V. & Masip, L. Capture, detection and purification of dsDNA amplicons using a DNA binding protein on magnetic beads. Analytical Biochemistry 658, 114923 (2022).
Kashima, M., Deguchi, A., Tezuka, A. & Nagano, A. J. Low-cost and Multiplexable Whole mRNA-Seq Library Preparation Method with Oligo-dT Magnetic Beads for Illumina Sequencing Platforms. Bio Protoc 10, e3496 (2020).
Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L. & Rice, P. M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 38, 1767–1771 (2010).
Hesse, U. K-Mer-Based Genome Size Estimation in Theory and Practice. Methods Mol Biol 2672, 79–113 (2023).
Mgwatyu, Y., Stander, A. A., Ferreira, S., Williams, W. & Hesse, U. Rooibos (Aspalathus linearis) Genome Size Estimation Using Flow Cytometry and K-Mer Analyses. Plants (Basel) 9 (2020).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162 (2019).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).
Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat Plants 5, 833–845 (2019).
Robinson, J. T. et al. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst 6, 256–258.e251 (2018).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol Biol 1962, 227–245 (2019).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457 (2020).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods Mol Biol 1962, 1–14 (2019).
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res 31, 439–441 (2003).
Chan, K. L. et al. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 18, 1426 (2017).
Birney, E. & Durbin, R. Using GeneWise in the Drosophila annotation experiment. Genome Res 10, 547–548 (2000).
Kim, H. S. et al. De novo assembly and annotation of the marine mysid (Neomysis awatschensis) transcriptome. Mar Genomics 28, 41–43 (2016).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915 (2019).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9, R7 (2008).
Jia, H. et al. PASA: Identifying More Credible Structural Variants of Hedou12. IEEE/ACM Trans Comput Biol Bioinform 17, 1493–1503 (2020).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015).
Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res 33, W116–120 (2005).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol 176, 1410–1422 (2018).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res 19, 1639–1645 (2009).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917503 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917502 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917493 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917494 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917495 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917496 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917497 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917498 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917499 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917492 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917500 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31917501 (2025).
Sun, L. GenBank https://identifiers.org/insdc.gca:GCA_051549935.1 (2025).
Sun, L. & Jiang, C. Genome assembly and annotation files of the sea cucumber (Colochirus anceps). figshare https://doi.org/10.6084/m9.figshare.28122221 (2024).

Acknowledgements

This work was supported by the National Natural Science Foundation of China [grant number 42276143], the Shandong Provincial Natural Science Foundation [grant number ZR2024YQ050], and the Taishan Scholar Foundation of Shandong Province [grant number 202306279].

Author information

Authors and Affiliations

Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
Chunxi Jiang, Qianwen Wu, Fang Su, Wei Cui & Lina Sun
Laboratory for Marine Ecology and Environmental Science, Qingdao Marine Science and Technology Center, Qingdao, 266237, China
Chunxi Jiang, Qianwen Wu, Fang Su, Wei Cui & Lina Sun
University of Chinese Academy of Sciences, Beijing, 100049, China
Chunxi Jiang, Qianwen Wu, Fang Su, Wei Cui & Lina Sun
South Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
Ting Chen

Authors

Chunxi Jiang
Qianwen Wu
Fang Su
Wei Cui
Ting Chen
Lina Sun

Contributions

L.S. contributed to the research design. C.J., Q.W. and F.S. collected the samples. W.C., T.C. and C.J. analyzed the data. C.J. and L.S. wrote the draft manuscript and revised it. All co-authors contributed to this manuscript and approved it.

Corresponding author

Correspondence to Lina Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Jiang, C., Wu, Q., Su, F. et al. Chromosome-level genome assembly of the sea cucumber, Colochirus anceps. Sci Data 12, 1643 (2025). https://doi.org/10.1038/s41597-025-05931-8

Received: 28 May 2025
Accepted: 02 September 2025
Published: 14 October 2025
DOI: https://doi.org/10.1038/s41597-025-05931-8
