Chromosome-level genome assembly and annotation of a sea toad (Chaunax sp.)

Zhan, Zifeng; Liang, Yanshuo; Wei, Jiehong; Liu, Jing; Xu, Kuidong

doi:10.1038/s41597-024-04245-5

Download PDF

Data Descriptor
Open access
Published: 19 December 2024

Chromosome-level genome assembly and annotation of a sea toad (Chaunax sp.)

Zifeng Zhan^1,2^na1,
Yanshuo Liang ORCID: orcid.org/0000-0001-5179-5956^1,2,3^na1,
Jiehong Wei ORCID: orcid.org/0000-0001-6401-6747^1,2,3,
Jing Liu^1,2 &
…
Kuidong Xu^1,2,3

Scientific Data volume 11, Article number: 1397 (2024) Cite this article

2229 Accesses
1 Citations
1 Altmetric
Metrics details

Subjects

Abstract

The sea toad genus Chaunax is a group of small benthic fishes that predominantly inhabiting the deep seas of the Atlantic, Indian, and Pacific Oceans. Although they have the potential to make excellent systems for studies of evolutionary adaptation to deep-sea environments, genomic research on Chaunax has been hindered by a scarcity of high-quality genomic resources. We present a chromosome-scale genome assembly of a Chaunax specimen generated using PacBio long-read sequencing and high-throughput chromosome conformation capture technology. The size of the assembled genome was 706.94 Mb, with a contig N50 of 15.24 Mb and scaffold N50 of 29.42 Mb. Approximately 96.11% of assembled sequences were anchored and oriented onto 24 pseudo-chromosomes. The genome contained 213.47 Mb repetitive sequences, 25,280 protein-coding genes, and 5,090 non-coding RNAs. The high ratio of complete BUSCO genes (97.20%) indicates high quality of genome assembly. The chromosomal-level reference genome of Chaunax sp. provides a preliminary molecular basis for understanding deep-sea adaptation and phenotypic evolution as well as an important reference for whole-genome sequencing of related species.

Chromosome-level genome assembly of Fistularia commersonii (Syngnathiformes, Fistulariidae)

Article Open access 20 January 2025

A chromosome-level reference genome assembly of the Small snakehead (Channa asiatica)

Article Open access 08 July 2025

A chromosome-level genome assembly of a deep-sea starfish (Zoroaster cf. ophiactis)

Article Open access 01 August 2023

Background &Summary

Members of the genus Chaunax (Lophiiformes: Chaunacidae) are commonly referred to as sea toads, and they are extensively distributed across deep-seas areas at mid- to low latitudes^1,2,3. These sea toads are particularly prevalent on the outer continental shelf and the upper continental slope, inhabiting depths ranging from 200 to 2,500 m^3,4. Their morphologically distinctive traits include a head that is rounded to slightly cuboidal; a skin texture interspersed with minute, spine-like scales; and the presence of lateral-line neuromasts on both the head and body³. These features are thought to enhance their sensory capabilities, which are crucial for their survival in challenging deep-sea habitats^5,6.

Similar to other members of the order Lophiiformes, sea toads are benthic and use their pectoral and pelvic fins for support and locomotion across the seafloor, a movement often likened to ‘walking’⁷. They use a sit-and-wait strategy to capture their prey and can employ their esca (a specialized lure at the end of the first dorsal fin spine) to entice prey within striking distance^8,9. These distinctive morphological and behavioral traits make Chaunax species excellent models for studies of evolutionary adaptations to deep-sea environments.

Genetic and evolutionary insights into Chaunax have been limited by a lack of deep-sea sampling and thus genomic data for Chaunacidae. Here, we collected a sea toad specimen in the genus Chaunax from the Zhenbei seamount at a depth of 555.3 m using a submersible vehicle (Faxian) (Fig. 1). The Chaunax specimen had a pinkish-red body with bright-white patches on the dorsal surface, and its skin was covered with a mix of numerous bifurcated and simple spinules. Specialized pectoral fins resembling ‘little feet’ support the body and move quickly across the sea floor. We employed PacBio long-read sequencing, Illumina sequencing, and high-throughput chromosome conformation capture (Hi-C) technology to generate an annotated chromosome-level genome assembly of this Chaunax specimen. This high-quality genome assembly will aid future studies of the deep-sea adaptation and phenotypic evolution of this species, as well as genomic analyses of related species. Our assembly will also advance our understanding of the genomic evolution and phylogenetic relationships within the order Lophiiformes.

Methods

Sample collection and DNA extraction

The Chaunax specimen was collected using a submersible vehicle (Faxian) from the Zhenbei seamount (15°04′45.888″N, 116°33′55.210″E, 555.3 m deep) in the South China Sea (Fig. 1). The sea toad sample was kept in a closed sample chamber placed inside the sample basket of the submersible. The sample was then immediately stored in liquid nitrogen. All experimental protocols were approved by relevant guidelines and regulations established by the Institutional Animal Care and Use Committee of the Institute of Oceanology, Chinese Academy of Science. Sea toad muscle was used to extract genomic DNA using the sodium dodecyl sulfate method¹⁰. The quantity and quality of DNA were determined using agarose gel electrophoresis and a Qubit Fluorometer, respectively.

Illumina sequencing and genome size estimation

Illumina paired-end libraries with insert sizes of 350 bp were constructed and sequenced on an Illumina NovaSeq6000 platform (Illumina, CA, USA). Low-quality reads and sequencing adaptor-contaminated reads were trimmed using fastp¹¹ with the following parameters: -q 10 -u 50 -y -g -Y 10 -e 20 -l 100 -b 150 -B 150. A total of 32.41 Gb Illumina short reads (clean data) were generated and retained for the genome survey (Table 1). The genome survey of Chaunax sp. was performed using the K-mer method. K-mer analysis was conducted using jellyfish v2.2.7¹² with an optimal K-value of 17 (Table 2). The K-mer frequency distribution map was generated to estimate the genome size, heterozygosity, and proportion of repetitive sequences using GenomeScope software¹³. A total of 24,428,214,893 K-mers were obtained using a K-mers peak at a depth of 35 (Table 2). The genome size estimated using the formula K_num/K_depth was approximately 683.46 Mb. The heterozygosity rate and proportion of repetitive sequences were 0.38% and 36.16%, respectively (Table 2).

Table 1 Sequencing data used for the assembly and annotation of Chaunax sp. genome.

Full size table

Table 2 Statistics of the genome survey, genome assembly and quality assessments.

Full size table

PacBio long-read sequencing and Hi-C sequencing

The genomic DNA was fragmented to 20 kb and sequenced using the PacBio Sequel II platform (Pacific Biosciences, USA) following the manufacturer’s protocols. Briefly, the gDNA was sheared using the g-TUBE device (Covaris) to the target fragment size for the construction of 20-kb libraries. Damage repair and end repair were performed using the SMRTbell Damage Repair Kit on the interrupted DNA fragments. After attaching dumbbell adapters, the fragments were digested by exonuclease. BluePippin electrophoresis (Sage Science, MA, USA) was used for size selection of the sequencing library, and the cutoff threshold size was set to 20 kb. A total of 23.67 Gb high-quality HiFi reads were produced using the circular consensus sequencing mode on the PacBio Sequel II platform (Table 1).

A Hi-C library was constructed by chromatin cross-linking, restriction endonuclease (DpnII) cleavage, end repairing and biotin labeling, ligation, and DNA purification and shearing; interacting DNA fragments were captured by streptavidin magnetic beads for library construction. A Qubit 2.0 Fluorometer and Agilent 2100 Bioanalyzer were used to detect the concentration and insert size of the library, and Q-PCR was used to accurately quantify the effective concentration of the library to ensure its quality. The Hi-C sequencing library was sequenced on an Illumina NovaSeq (PE150) platform. A total of 70.63 Gb Hi-C data were generated after quality control (Table 1).

Chromosome-level genome assembly

The high-accuracy PacBio HiFi reads were assembled into the initial set of contigs using Hifiasm v0.19¹⁴ with the following parameters: -l 2 -n 4. Sequence contamination in the genome assembly and mitochondrial sequences were removed via comparison of the genome assembly with the nucleotide sequence database (nt) and mitochondrial database (https://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/) from the National Center for Biotechnology Information (NCBI). The initial genome assembly length was 709.63 Mb with a contig number of 354 and N50 of 15.24 Mb (Table 2). The size of the assembled genome nearly matched that of the results of the genome survey, which reflected the high accuracy and integrity of our assembled genomes.

To generate the chromosome-level assembly, the unmapped pair-end reads, singleton reads, and other invalid reads were filtered using HiC-Pro v2.10.0¹⁵. Only uniquely valid interaction pairs (79.12%) were retained for further assembly. A total of 70.63 Gb clean reads pairs were obtained from the Hi-C library and mapped to the assembled genome using BWA v0.7.17¹⁶. The uniquely mapped data were retained, and the clustered contigs were sorted and oriented using LACHESIS software¹⁷. After Hi-C assembly and manual heat map adjustment, a total of 679.46 Mb of genome sequences were anchored and oriented onto 24 pseudo-chromosomes (Fig. 2a and Table 3). These pseudo-chromosomes, ranging in size from 13.00 to 35.46 Mb (Fig. 2b and Table 3), comprised approximately 96.11% of the total genome. Finally, the chromosome-level genome assembly with a total length of 706.94 Mb was obtained; this assembly showed a high level of continuity, with a scaffold N50 of 29.42 Mb (Table 2).

Table 3 Summary of assembled 24 chromosomes of Chaunax sp.

Full size table

Transcriptome sequencing and assembly

RNA was extracted from multiple tissue samples, including muscle, kidney, liver, gonad, and cholecyst, using the TRIzol reagent (Thermo Fisher Scientific). RNA integrity and quality were assessed using agarose gel electrophoresis, and the quantity of RNA was determined using a Qubit Fluorometer. Sequencing libraries were constructed using the NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) per the manufacturer’s instructions. Illumina RNA-seq libraries were prepared and sequenced on an Illumina NovaSeq6000 platform, and 150 bp paired-end reads were generated. After trimming based on the quality scores using Trimmomatic-0.39¹⁷, a total of 6.77 Gb of clean transcriptome data were obtained (Table 1). The clean reads were aligned to the sea toad genomes using HISAT2 v2.2.1¹⁸.

Annotation of repetitive sequences

Transposable elements (TEs) and tandem repeats were annotated via the following workflows. TEs were identified via a combination of homology-based searches and de novo prediction. A de novo repeat library was generated using RepeatModeler2 v2.0.1¹⁹, along with the two embedded programs RECON v1.0.8²⁰ and RepeatScout v1.0.6²¹, all of which were operated using default parameters. Long terminal repeats (LTRs) were identified using LTR_retriever v2.9.0²², and they were mainly identified using the predicted results of LTRharvest v1.5.10²³ and LTR_FINDER v1.07²⁴. A non-redundant species-specific TE library was constructed by combining the de novo TE sequence library above with the known Dfam v3.5 database, and RepeatClassifier¹⁹ was used to classify the prediction results. Finally, RepeatMasker v4.1.2²⁵ was used to predict the TEs of the genome based on the constructed repeat sequence database. Tandem repeats were annotated by TRF v4.09²⁶ and MISA v2.1²⁷. The results showed that 30.20% of repetitive sequences were annotated in the Chaunax sp. genome, and the total length of the TEs was 143.93 Mb, which comprised 20.36% of the genome; the total length of tandem repeat sequences was 69.54 Mb, which comprised 9.84% of the genome (Table 4).

Table 4 Statistics of the annotated Chaunax sp. repeat sequences.

Full size table

Gene prediction and functional annotation

Protein-coding genes were annotated using a combination of ab initio prediction, homology-based searches, and RNA sequencing (RNA-seq) (Table 5). Augustus v.3.1.0²⁸ and SNAP v2006-07-28²⁹ were used for ab initio gene prediction with default parameters. GeMoMa v.1.7³⁰ software was performed for homology-based prediction. The protein sequences of Lophius litulon³¹, Lophiodes sp.³¹, Solocisquama erythrina³¹, Takifugu rubripes³², and Thamnaconus septentrionalis³³ were downloaded from the NCBI and figshare database. RNA-seq-based gene prediction was performed by mapping clean RNA-seq reads to the reference genome using HISAT2 v2.2.1¹⁸, and the transcripts were assembled by StringTie v.1.2.3³⁴. GeneMarkS-T v5.1³⁵ was used to predict genes based on the assembled transcripts. PASA v2.4.1³⁶ software was used to predict genes based on the unigenes (and full-length transcripts from the PacBio HiFi sequencing) assembled by Trinity v.2.11³⁷. Gene models from these different approaches were combined using EVM v1.1.1³⁸ and updated by PASA. The final gene models were annotated by blasting sequences against the GenBank NR, eggNOG, GO, KEGG, TrEMBL, KOG, Pfam, and SwissProt databases using an E-value cut-off of 1 × 10⁻⁵. We predicted a total of 25,280 protein-coding genes in the Chaunax sp. genome via multiple methods (Table 5), with a gene length of 398,546,511 bp, exon length of 55,947,487 bp, coding sequence length of 42,599,371 bp, and intron length of 342,599,024 bp (Table 6). A total of 23,457 genes (92.79% of the total) were functionally annotated using public databases (Table 7).

Table 5 Gene prediction through integrating multiple methods.

Full size table

Table 6 The comparison of gene models annotated from the Chaunax sp. genome with orther closely related species.

Full size table

Table 7 Functional annotation of Chaunax sp. predicted gene.

Full size table

Non-coding RNA annotation and pseudogene prediction

For non-coding RNA annotation, tRNAs and rRNAs were predicted using tRNAScan-SE v1.3.1³⁹ and Barrnap v0.9 (the parameters deployed in barrnap–kingdom euk)⁴⁰, respectively; miRNAs, snRNAs, and snoRNAs were identified using Infernal v1.1⁴¹ software against the Rfam v14.5⁴² database. A total of 2,462 tRNAs, 1,811 rRNAs, 202 miRNAs, 440 snRNAs, and 175 snoRNAs were predicted (Table 8).

Table 8 Non-coding RNAs and pseudogenes statistics of Chaunax sp. genome.

Full size table

GenBlastA v1.0.4⁴³ software was used to scan the whole genomes after masking predicted functional genes. Putative candidates were then analyzed by searching for premature stop codons and frame-shift mutations using GeneWise v2.4.1⁴⁴. Finally, a total of 67 pseudogenes were identified, encompassing a combined length of 496,033 bp, with an average length of 7,403 bp (Table 8).

Data Records

The Chaunax sp. genome has been deposited in GenBank under the accession number JBAGJB000000000⁴⁵ and the BioProject number PRJNA1068823. The Illumina, PacBio HiFi, Hi-C, and RNAseq data have been deposited in the NCBI Sequence Read Archive under the accession number SRR27768100-SRR27768103⁴⁶. The genome-related annotation files can be accessed through Figshare at https://doi.org/10.6084/m9.figshare.25100474⁴⁷. The specific accessions are provided in the Material and Methods sections describing the data and analyses.

Technical Validation

Genome assembly and annotation completeness evaluation

The completeness of the final genome assembly was evaluated using the Benchmarking Universal Single-Copy Orthologs (BUSCO v5.2.1)⁴⁸ program, along with the actinopterygii_odb10 database and the Core Eukaryotic Genes Mapping Approach (CEGMA v2.5)⁴⁹. This yielded 97.06% of the complete BUSCO genes and 99.56% of the core eukaryotic genes (Table 2). The trimmed Illumina short reads and PacBio long reads were mapped against the assembled genome using BWA to evaluate the accuracy of the assembly, and the mapping rates were 99.63% and 99.75%, respectively (Table 3). The BUSCO completeness of the predicted gene models was determined against the actinopterygii_odb10 database under the protein mode, and the orthologous genes of Chaunax sp. contributed 97.20% of the complete genes, indicating the high completeness of the gene annotation (Table 2).

The interaction strength among chromosomes was evaluated and a Hi-C interaction heat map was constructed using HiCPlotter software⁵⁰. The assembled sequences were anchored and oriented onto 24 pseudo-chromosomes (Fig. 2); within each pseudo-chromosome group, the interaction strength was higher at diagonal positions than at non-diagonal positions, suggesting that the quality of our genome assembly was high.

Code availability

Genome assembly:

(1) Hifiasm: parameter: -l 2 -n 4.

(2) HiC-Pro: all parameters were set as default.

(3) Bwa: parameter: bwa aln && bwa sampe.

(4) LACHESIS: parameter: CLUSTER_MIN_RE_SITES = 328; CLUSTER_MAX_LINK_DENSITY = 2; ORDER_MIN_N_RES_IN_TRUNK = 15; ORDER_MIN_N_RES_IN_SHREDS = 15.

Genome annotation:

(1) RepeatModeler: parameter: BuildDatabase -name && RepeatModeler -pa 12.

(2) LTR_retriever: all parameters were set as default.

(3) LTR_FINDER: parameter: ltr_finder -w 2 -C -D.

(4) RepeatMasker: parameter: repeatmasker -nolow -no_is -norna -engine wublast -parallel 8 -qq.

(5) TRF: parameter: trf 2 7 7 80 10 50 500 -d -h.

(6) MISA: all parameters were set as default.

(7) Augustus: all parameters were set as default.

(8) SNAP: all parameters were set as default.

(9) GeMoMa: parameter: run.sh mmseqs.

(10) Hisat: parameter: hisat2 -- dta -p 10.

(11) Stringtie: parameter: stringtie -p 2.

(12) GeneMarkS-T: all parameters were set as default.

(13) PASA: all parameters were set as default.

(14) Trinity: parameter: Trinity -- genome_guided_bam.

(15) EVidenceModeler: all parameters were set as default.

References

Caruso, J. H. Systematics and distribution of the atlantic chaunacid anglerfishes (Pisces: Lophiiformes). Copeia 1, 153–165, https://doi.org/10.2307/1445616 (1989).
Article Google Scholar
Ho, H. & Shao, K. A new species of Chaunax (Lophiiformes: Chaunacidae) from the western South Pacific, with comment on C. latipunctatus. Zootaxa 2445, 53–61, https://doi.org/10.11646/zootaxa.2445.1.3 (2010).
Article Google Scholar
Ho, H. C., Roberts, C. D. & Stewart, A. L. A review of the anglerfish genus Chaunax (Lophiiformes: Chaunacidae) from New Zealand and adjacent waters, with descriptions of four new species. Zootaxa 3620, 89–111, https://doi.org/10.11646/zootaxa.3620.1.4 (2013).
Article PubMed Google Scholar
Caruso, J. H., Ross, S. W., Sulak, K. J. & Sedberry, G. R. Deep-water chaunacid and lophiid anglerfishes (Pisces: Lophiiformes) off the South-Eastern United States. J Fish Biol. 70, 1015–1026, https://doi.org/10.1111/j.1095-8649.2007.01360.x (2007).
Article Google Scholar
Ho, H. C., Kawai, T. & Satria, F. Species of the anglerfish genus Chaunax from Indonesia, with descriptions of two new species (Lophiiformes: Chaunacidae). Raffles B Zool. 63, 301–308, http://zoobank.org/urn:lsid:zoobank.org:pub:8DBC268B-F75B-4EDF-95A5-2441AAD61600 (2015).
Google Scholar
Ho, H. C. & Ma, W. C. Four new species of the frogmouth genus Chaunax (Lophiiformes: Chaunacidae) from Taiwan and the Philippines. Zootaxa 5189, 146–179, https://doi.org/10.11646/zootaxa.5189.1.17 (2022).
Article PubMed Google Scholar
Lundsten, L., Johnson, S. B., Cailliet, G. M., Devogelaere, A. P. & Clague, D. A. Morphological, molecular, and in situ behavioral observations of the rare deep-sea anglerfish Chaunacops coloratus (garman, 1899), order Lophiiformes, in the eastern North Pacific. Deep Sea Res. I. 68, 46–53, https://doi.org/10.1016/j.dsr.2012.05.012 (2012).
Article CAS Google Scholar
Miya, M. et al. Evolutionary history of anglerfishes (Teleostei: Lophiiformes): a mitogenomic perspective. BMC Evol. Biol. 10, 58, https://doi.org/10.1186/1471-2148-10-58 (2010).
Article CAS PubMed PubMed Central Google Scholar
Moore, T. Y. & Biewener, A. A. Outrun or outmanoeuvre: predator-prey interactions as a model system for integrating biomechanical studies in a broader ecological and evolutionary context. Integr. Comp. Biol. 55, 1188–1197, https://doi.org/10.1093/icb/icv074 (2015).
Article PubMed Google Scholar
Natarajan, V. P., Zhang, X. X., Morono, Y., Inagaki, F. & Wang, F. P. A modified SDS-based DNA extraction method for high quality environmental DNA from seafloor environments. Front. Microbiol. 7, 986, https://doi.org/10.3389/fmicb.2016.00986 (2016).
Article PubMed PubMed Central Google Scholar
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
Article CAS Google Scholar
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Article CAS PubMed PubMed Central Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Article CAS PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Article CAS PubMed PubMed Central Google Scholar
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259, https://doi.org/10.1186/s13059-015-0831-x (2015).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Article CAS PubMed PubMed Central Google Scholar
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125, https://doi.org/10.1038/nbt.2727 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
Article CAS PubMed PubMed Central Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. PNAS 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Bao, Z. R. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276, http://www.genome.org/cgi/doi/10.1101/gr.88502 (2002).
Article CAS PubMed PubMed Central Google Scholar
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, I351–I358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
Article CAS PubMed Google Scholar
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal-repeat retrotransposons. Plant Physiol. 176, 1410–1422, https://doi.org/10.1104/pp.17.01310 (2017).
Article CAS PubMed PubMed Central Google Scholar
Ellinghaus, D., Kurtz, S. & Willhoef, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18, https://doi.org/10.1186/1471-2105-9-18 (2008).
Article CAS PubMed PubMed Central Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
Article PubMed PubMed Central Google Scholar
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 25, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Article Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).
Article CAS PubMed PubMed Central Google Scholar
Beier, S., Tiel, T., Munch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585, https://doi.org/10.1093/bioinformatics/btx198 (2017).
Article CAS PubMed PubMed Central Google Scholar
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644, https://doi.org/10.1093/bioinformatics/btn013 (2008).
Article CAS PubMed Google Scholar
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59, https://doi.org/10.1186/1471-2105-5-59 (2004).
Article PubMed PubMed Central Google Scholar
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89, https://doi.org/10.1093/nar/gkw092 (2016).
Article CAS PubMed PubMed Central Google Scholar
Liang, Y. S. Protein coding sequences of three Lophiiformes species. figshare. Dataset. https://doi.org/10.6084/m9.figshare.27158106.v1 (2024).
NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_901000725.2/ (2019).
Bian, L. Thamnaconus septentrionalis genome supporting data. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11874825.v1 (2020).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Article CAS PubMed PubMed Central Google Scholar
Tang, S. Y. Y., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 43, e78, https://doi.org/10.1093/nar/gkv227 (2015).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).
Article CAS PubMed PubMed Central Google Scholar
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–U130, https://doi.org/10.1038/nbt.1883 (2011).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
Article CAS PubMed PubMed Central Google Scholar
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964, https://doi.org/10.1093/nar/25.5.955 (1997).
Article CAS PubMed PubMed Central Google Scholar
Seemann, T. Barrnap 0.9: Rapid ribosomal RNA prediction. Available at: https://github.com/tseemann/barrnap.
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Article CAS PubMed PubMed Central Google Scholar
Grifths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124, https://doi.org/10.1093/nar/gki081 (2005).
Article Google Scholar
She, R., Chu, J. S. C., Wang, K., Pei, J. & Chen, N. S. GenBlastA: Enabling BLAST to identify homologous gene sequences. Genome Res. 19, 143–149, https://doi.org/10.1101/gr.082081.108 (2009).
Article CAS PubMed PubMed Central Google Scholar
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995, https://doi.org/10.1101/gr.1865504 (2004).
Article CAS PubMed PubMed Central Google Scholar
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_037577475.1 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP486588 (2024).
Liang, Y. S. Chromosome-level genome assembly and annotation of the sea toad Chaunax sp, figshare. Dataset., https://doi.org/10.6084/m9.figshare.25100474.v2 (2024).
Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Article CAS PubMed Google Scholar
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067, https://doi.org/10.1093/bioinformatics/btm071 (2007).
Article CAS PubMed Google Scholar
Akdemir, K. C. & Chin, L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol. 16, 198, https://doi.org/10.1186/s13059-015-0767-1 (2015).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This research was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB42000000), the National Natural Science Foundation of China (41930533), the National Key R&D Program of China (2022YFC2803800), the Senior User Project of RV KEXUE (KEXUE2020GZ02), and the Science and Technology Program of Nanji Islands National Marine Nature Reserve Administration (JJZBPYCG-2021112901).

Author information

These authors contributed equally: Zifeng Zhan, Yanshuo Liang.

Authors and Affiliations

Laboratory of Marine Organism Taxonomy and Phylogeny, Qingdao Key Laboratory of Marine Biodiversity and Conservation, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
Zifeng Zhan, Yanshuo Liang, Jiehong Wei, Jing Liu & Kuidong Xu
Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266237, China
Zifeng Zhan, Yanshuo Liang, Jiehong Wei, Jing Liu & Kuidong Xu
University of Chinese Academy of Sciences, Beijing, 100049, China
Yanshuo Liang, Jiehong Wei & Kuidong Xu

Authors

Zifeng Zhan
View author publications
Search author on:PubMed Google Scholar
Yanshuo Liang
View author publications
Search author on:PubMed Google Scholar
Jiehong Wei
View author publications
Search author on:PubMed Google Scholar
Jing Liu
View author publications
Search author on:PubMed Google Scholar
Kuidong Xu
View author publications
Search author on:PubMed Google Scholar

Contributions

Z.Z. and K.X. conceived the research project. Y.L., Z.Z. and K.X. collected the samples. Y.L., Z.Z. and W.J. performed the bioinformatics analyses. Y.L. and Z.Z. wrote the manuscript. Z.Z., L.J. and K.X. revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kuidong Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Zhan, Z., Liang, Y., Wei, J. et al. Chromosome-level genome assembly and annotation of a sea toad (Chaunax sp.). Sci Data 11, 1397 (2024). https://doi.org/10.1038/s41597-024-04245-5

Download citation

Received: 10 May 2024
Accepted: 05 December 2024
Published: 19 December 2024
Version of record: 19 December 2024
DOI: https://doi.org/10.1038/s41597-024-04245-5