Background & Summary

Coral reefs are widely regarded as one of the most biologically diverse and ecologically fragile ecosystems on Earth1. These vital habitats support at least 25% of the world’s marine species, despite covering less than 0.2% of the ocean floor2,3. In addition to their immense ecological importance, coral reefs sustain the livelihoods of millions of people through industries such as fishing and tourism4. However, the inherent fragility of coral reefs makes them particularly susceptible to a range of anthropogenic and environmental stressors. Factors such as rising sea temperatures, ocean acidification, pollution, and destructive fishing practices pose significant threats to their survival5,6,7,8. Notably, the frequency of coral bleaching events, a major indicator of reef health, has increased globally9,10 and is expected to intensify in the coming decades11,12.

Meanwhile, the globalization of trade, tourism, and economies has facilitated the spread of invasive species, which threaten biodiversity by disrupting ecosystem functions and altering community composition. Such invasions can also have severe economic consequences13,14. One particularly concerning invasive species is Tubastraea coccinea (T. coccinea), an azooxanthellate coral (Fig. 1) that has a widespread, low-latitude distribution across multiple ocean basins owing to its tolerance of conditions that cause bleaching and mortality in zooxanthellate corals15. Native to the Indo-Pacific region, T. coccinea has invaded parts of the eastern Pacific as well as the western and eastern Atlantic, extending as far as southern Brazil, with considerable environmental, economic, and social impacts16. Its invasiveness is facilitated by a suite of biological traits, including rapid growth, early reproductive maturity, multiple reproductive strategies, and the absence of natural predators. As a result, T. coccinea has been reported to cover over 95% of available surfaces at some invaded sites in the Atlantic Ocean16,17,18. Without effective control measures, its spread is likely to continue unabated.

Fig. 1

An image of the T. coccinea sample utilized for genome sequencing.

The phylogenetics of scleractinian corals remains complex and poorly resolved. Despite the application of classical morphological classification and molecular phylogenetic techniques, many aspects of coral evolution remain uncertain19,20,21. Whether the ancestral scleractinian was photosymbiotic or azooxanthellate is still debated22,23,24,25. This is partly because previous studies have focused mainly on shallow-water, photosymbiotic species; as a result, the biological diversity and ecological significance of azooxanthellate corals, which comprise approximately half (>700) of all scleractinian species, remain underexplored. These corals are broadly distributed and biologically diverse26,27, highlighting the need for more genetic data on this poorly understood group.

Genomic approaches have emerged as powerful tools for advancing our understanding of coral phylogenetics and for informing conservation strategies for non-model organisms28,29,30. To better understand the genetic basis of environmental adaptation and the extreme invasiveness of this coral genus, we present a draft genome assembly of T. coccinea generated using long-read PacBio HiFi sequencing. The assembly spans 875.9 Mb and consists of 2,573 scaffolds with an N50 length of 694.3 kb. Repetitive sequences constitute 26.01% of the assembly, including unclassified repeats (8.75%), DNA elements (7.11%), and long interspersed nuclear elements (3.83%) (Table 1). We identified 37,307 protein-coding genes, of which 35,221 (95.2%) were functionally annotated using five databases (SwissProt, KEGG, NR, GO, and Pfam). BUSCO assessment indicated a genome completeness of 97.4% (complete + fragmented), with 94.7% of the genes complete, 2.7% fragmented, and 2.6% missing. Additionally, we predicted 1,963 non-coding RNAs (58 miRNAs, 14,111 tRNAs, 923 rRNAs, and 224 snRNAs) in the T. coccinea genome assembly. These genomic resources will provide a foundation for future research on the genetic mechanisms underlying the adaptability of T. coccinea to varying environmental conditions, as well as its invasive behavior and ecological impacts.

Table 1 Statistics of repeat elements in the genome of Tubastraea coccinea.

Methods

Sample collection and DNA extraction

Tubastraea coccinea (Fig. 1) specimens were purchased from commercial suppliers in Qingdao, China (originally sourced from Vietnam), and cultured in an aquarium with circulating seawater. The corals were acclimatized under laboratory conditions for 5 days prior to DNA extraction. A live specimen was cut into 1 mm pieces, which were washed three times with calcium- and magnesium-free PBS (wash buffer) adjusted to an osmolarity of 1,100 mOsmol. The pieces were treated with collagenase (type II, 2 mg/ml) for 30 min at room temperature to prepare a cell suspension, which was concentrated by centrifugation (500 × g for 5 min at 4 °C). The cell pellet was resuspended and washed three times in wash buffer, and the final pellet was immediately snap-frozen in liquid nitrogen for DNA extraction. Total DNA was extracted using the standard phenol/chloroform method31. DNA concentration and purity were measured using a NanoDrop 2000 spectrophotometer, with acceptance criteria of OD260/280 between 1.8 and 2.0 and OD260/230 between 2.0 and 2.2. The purity and integrity of the DNA were further assessed by 1% agarose gel electrophoresis.
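The spectrophotometric acceptance criteria above amount to two range checks that must both pass. The following is a minimal sketch; the function name is hypothetical, and treating the range boundaries as inclusive is an assumption rather than something stated in the protocol.

```python
from typing import Tuple


def passes_dna_qc(
    od260_280: float,
    od260_230: float,
    range_280: Tuple[float, float] = (1.8, 2.0),
    range_230: Tuple[float, float] = (2.0, 2.2),
) -> bool:
    """Return True if both absorbance ratios fall within their (assumed inclusive) ranges."""
    low_280, high_280 = range_280
    low_230, high_230 = range_230
    return low_280 <= od260_280 <= high_280 and low_230 <= od260_230 <= high_230


# Hypothetical NanoDrop readings (OD260/280, OD260/230) for three DNA samples
for ratio_280, ratio_230 in [(1.85, 2.10), (1.70, 2.10), (1.90, 1.50)]:
    print(ratio_280, ratio_230, passes_dna_qc(ratio_280, ratio_230))
```

Only the first hypothetical sample passes; the second fails the OD260/280 criterion and the third fails the OD260/230 criterion. Gel-based integrity assessment is not captured by this check.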

Library preparation and sequencing

Qualified DNA samples were sent to Novogene (Beijing, China) for library preparation and whole-genome sequencing. Following standard PacBio protocols, a HiFi sequencing library was prepared with the SMRTbell™ Express Template Prep Kit 2.0 (Pacific Biosciences, California, USA) and sequenced on a PacBio Sequel II system (Pacific Biosciences, California, USA). The raw base-called data were transferred from the sequencer to SMRT Link v13.1 (https://www.pacb.com/support/software-downloads/), where HiFi reads were generated using the circular consensus sequencing (CCS) algorithm. In total, 16.1 Gb of high-quality PacBio HiFi reads were obtained.
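As a rough guide to what 16.1 Gb of HiFi data means for assembly, the mean sequencing depth is the total number of sequenced bases divided by the genome size. The sketch below uses the 875.9 Mb assembly length as a stand-in for the genome size; that substitution is an assumption, and the resulting depth is derived here rather than reported by the sequencing provider.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Approximate mean sequencing depth as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# 16.1 Gb of HiFi reads over the 875.9 Mb assembly (used as a proxy for genome size)
depth = mean_depth(16.1e9, 875.9e6)
print(f"{depth:.1f}x")  # 18.4x
```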

Genome assembly

The PacBio HiFi reads were assembled into primary contigs using Hifiasm v0.16.1-r375 (ref. 32) with default parameters. Under these settings, Hifiasm performs three rounds of haplotype-aware error correction on the reads before building the assembly graph, and purges haplotypic duplications when producing the primary contigs. To screen for non-target DNA, contigs were searched against the NCBI nr database using DIAMOND v2.1.8 (ref. 34) with an e-value cutoff of 1e-5, and the top hits were analyzed with BlobTools v1.1.1 (ref. 33). In total, 69.83% of contigs had best hits to Cnidaria, 13.92% remained unassigned, and 12.90% matched other phyla; the unassigned and non-cnidarian contigs likely reflect the incompleteness of available coral genome databases (Fig. 2). Genome assembly statistics were computed with QUAST v5.2.0 (ref. 35), and completeness was evaluated with BUSCO v5.2.2 (ref. 36) using the conserved metazoan gene set “metazoa_odb10”. The T. coccinea assembly totaled 875.9 Mb across 2,573 scaffolds, with an N50 of approximately 694.3 kb and a BUSCO completeness of 97.4% (complete + fragmented) (Fig. 3).
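Reducing the DIAMOND search output to one best hit per contig under the 1e-5 e-value cutoff can be sketched as follows. This is an illustrative filter only: it assumes DIAMOND's default 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the function name is hypothetical, and BlobTools applies its own taxonomy rules and requires subject taxonomy IDs, which this sketch does not handle.

```python
from typing import Dict, Iterable, Tuple

# query id -> (subject id, e-value, bitscore)
BestHits = Dict[str, Tuple[str, float, float]]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> BestHits:
    """Keep the highest-bitscore hit per query among hits with e-value <= max_evalue."""
    best: BestHits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the e-value cutoff
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical DIAMOND rows for two contigs
rows = [
    "ctg1\tsubjA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t150.0",
    "ctg1\tsubjB\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-20\t120.0",
    "ctg2\tsubjC\t50.0\t50\t25\t0\t1\t150\t1\t50\t0.01\t40.0",
]
print(best_hits(rows))
```

Here ctg1 keeps its higher-scoring hit, and ctg2 has no hit passing the cutoff, so it would remain unassigned.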

Fig. 2

BlobPlot of the purged T. coccinea genome assembly. Blue dots represent contigs with best hits to Cnidaria; other colors indicate other taxonomic assignments, as shown in the legend. Histograms above and to the right of the scatter plot show the distributions of GC content and coverage, respectively.

Fig. 3

Snail plot summarizing the T. coccinea genome assembly statistics, generated with the ‘assembly-stats’ software (https://github.com/hanwnetao/snailplot-assembly-stats).
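The N50 summarized in the snail plot is the length L such that sequences of length at least L together contain at least half of the assembled bases. A minimal sketch of the calculation is shown below; the contig lengths in the example are hypothetical, and tools such as QUAST report this metric directly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:  # first length at which half the bases are reached
            return length
    return sorted_lengths[-1]  # not reached for non-negative lengths


# Hypothetical contig lengths (kb): total 54, half 27, reached at 10 + 9 + 8
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```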

Repeat annotation

Transposable elements (TEs) and other repeat sequences were annotated in two steps. First, de novo repeat libraries were built from the T. coccinea genome assembly with three tools, RepeatModeler v2 (ref. 37), LTR_retriever v2.5 (ref. 38), and RepeatScout v1.0.5 (ref. 39), and combined with the Repbase database40. Second, RepeatMasker v4.0.9 (ref. 41) was used to identify and annotate TEs and repeat sequences in the assembly based on these libraries and the database. In addition, LTR_Finder v1.2 (ref. 42) was used to predict long terminal repeat (LTR) sequences with the parameters ‘-D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9’, after which LTR_retriever v2.5 (ref. 38) was run with default parameters to remove redundancy and produce a non-redundant set of LTR sequences.
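Because hits from different repeat libraries can overlap, a repeat percentage such as the 26.01% reported above must count each masked base only once. RepeatMasker reports masked percentages itself; the sketch below merely illustrates the interval-merging idea for a single sequence, using hypothetical half-open [start, end) coordinates and a hypothetical function name. For a multi-scaffold assembly, the covered bases would be computed per scaffold and summed.

```python
from typing import List, Optional, Tuple


def masked_fraction(intervals: List[Tuple[int, int]], sequence_length: int) -> float:
    """Fraction of bases covered by at least one repeat interval on one sequence.

    Overlapping intervals are merged so that no base is counted twice.
    """
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_start is not None and cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_start is not None and cur_end is not None:
        covered += cur_end - cur_start
    return covered / sequence_length


# Hypothetical repeat hits: [0,100) and [50,150) overlap, [200,250) is separate
print(masked_fraction([(0, 100), (50, 150), (200, 250)], 1000))  # 0.2
```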

Gene prediction and functional annotation

To achieve comprehensive gene annotation, protein-coding genes were predicted with three complementary strategies integrating different sources of evidence: ab initio prediction, homology-based prediction, and transcriptome-based prediction. For ab initio prediction, an RNA-seq dataset (SRR8386108) was aligned to the T. coccinea draft genome using STAR v2.7.1 (ref. 43) with default settings. The alignments were then used to generate transcript models with BRAKER v2.1.5 (ref. 44), the Semi-HMM-based Nucleic Acid Parser (SNAP, v2013.11.29)45, and StringTie v2.1.6 (ref. 46), the latter run with the parameters ‘-m 200 -a 10 --conservative -g 50 -u’. For homology-based prediction, metazoan protein sequences from the OrthoDB database and protein-coding sequences of several corals from the NCBI Reference Sequence Database (RefSeq), including Acropora muricata (RefSeq accession: GCF_036669905.1), Montipora foliosa (GCF_036669935.1), Pocillopora verrucosa (GCF_003704095.1), and Stylophora pistillata (GCF_002571385.2), were aligned to the genome assembly using TBLASTN v2.12.0 (ref. 47) and GeneWise v2.2.0 (ref. 48). For transcriptome-based prediction, the RNA-seq reads were assembled both de novo and in genome-guided mode using Trinity v2.5.1 (ref. 49) with default parameters. The resulting transcripts were further assembled with the Program to Assemble Spliced Alignments (PASA) v2.5.2 (ref. 50), using BLAT v35 (ref. 51) and GMAP v2023-12-01 (ref. 52) as aligners. Finally, the predictions from the three strategies were integrated into a unified gene set using EVidenceModeler v1.1.1 (ref. 53). In total, 37,307 protein-coding genes were identified in the T. coccinea genome.
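EVidenceModeler combines predictions from different sources by assigning each source a weight, so that stronger evidence (for example, transcript alignments) can outweigh a single ab initio call. The toy sketch below illustrates only that weighting idea by summing the weights of sources that predict an identical exon; it is not EVidenceModeler's algorithm, which searches over complete gene structures using the weighted evidence. The source names, weights, and threshold are hypothetical, since the weights used for this assembly are not reported.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int, str]  # (scaffold, start, end, strand)


def score_exons(evidence: Dict[str, List[Exon]], weights: Dict[str, float]) -> Dict[Exon, float]:
    """Sum the weights of all evidence sources that predict exactly the same exon."""
    scores: Dict[Exon, float] = defaultdict(float)
    for source, exons in evidence.items():
        for exon in set(exons):  # each source contributes at most once per exon
            scores[exon] += weights.get(source, 0.0)
    return dict(scores)


def supported_exons(scores: Dict[Exon, float], min_score: float) -> List[Exon]:
    """Return exons whose combined evidence weight reaches the threshold."""
    return sorted(exon for exon, score in scores.items() if score >= min_score)


# Hypothetical evidence and weights
evidence = {
    "ab_initio": [("s1", 100, 200, "+"), ("s1", 300, 400, "+")],
    "protein": [("s1", 100, 200, "+")],
    "transcript": [("s1", 100, 200, "+")],
}
weights = {"ab_initio": 1.0, "protein": 5.0, "transcript": 10.0}
scores = score_exons(evidence, weights)
print(supported_exons(scores, min_score=5.0))
```

In this example, only the exon supported by all three sources reaches the threshold; the exon predicted by the ab initio source alone does not.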

Exploiting the structural features of tRNAs, we predicted tRNAs de novo using tRNAscan-SE v2.0 (ref. 54). rRNAs, snRNAs, and miRNAs were predicted with Infernal v1.0 (ref. 55). This analysis identified four types of non-coding RNAs: 14,111 tRNAs, 923 rRNAs, 224 snRNAs, and 58 miRNAs (Table 2).

Table 2 Statistics of ncRNA annotation in the coral Tubastraea coccinea.

Protein functions were predicted by identifying conserved protein domains with InterProScan v5.36 (ref. 62), using databases including CDD56, PANTHER57, Superfamily58, Gene3D59, SMART60, and ProSiteProfiles61. In addition, eggNOG-mapper v2 (ref. 63) was used to search for homologous genes in the eggNOG database, providing KEGG64 and GO65 annotations. The predicted protein-coding genes were also annotated using blastp v2.2.26 against the SwissProt database, DIAMOND v2.1.8 against the NR database, and hmmscan v3.3.2 against the Pfam database, each with an e-value threshold of 1e-5. In total, 35,221 (95.2%) genes were functionally annotated (Table 3).
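A gene is counted as functionally annotated if it has a hit in at least one database, so the overall annotated total is the size of the union of the per-database hit sets rather than their sum. The sketch below illustrates this with hypothetical gene identifiers; the function name is also hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def annotation_summary(all_genes: Iterable[str], hits_by_db: Dict[str, Set[str]]) -> Tuple[int, float]:
    """Count genes with a hit in any database and express them as a percentage of all genes."""
    genes = set(all_genes)
    annotated: Set[str] = set()
    for db_hits in hits_by_db.values():
        annotated |= db_hits & genes  # ignore identifiers absent from the gene set
    n_annotated = len(annotated)
    return n_annotated, 100.0 * n_annotated / len(genes)


# Hypothetical gene set and per-database hits
genes = ["g1", "g2", "g3", "g4", "g5"]
hits = {"SwissProt": {"g1", "g2"}, "NR": {"g2", "g3"}, "Pfam": {"g3", "g9"}}
print(annotation_summary(genes, hits))  # (3, 60.0)
```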

Table 3 Statistics of functional annotation in the coral Tubastraea coccinea.

Data Records

The raw sequencing data and genome assembly of Tubastraea coccinea have been deposited in the National Center for Biotechnology Information (NCBI) under accession numbers SRR31645377 (PacBio data; ref. 66) and JBJUWB000000000 (genome assembly; ref. 67). Additionally, the genome annotation files (GFF and GTF), predicted protein and CDS sequences, and the gene model annotation file are available in the figshare database68.

Technical Validation

After completing the genome assembly, we evaluated its quality in several respects. (i) The assembled genome is 875.9 Mb in length, consistent with the previously published version, suggesting that the assembly is relatively complete. (ii) Coverage analysis using SAMtools v1.14 showed that the PacBio HiFi reads covered 100% of the assembly, with a mapping rate of 99.67%. (iii) The contig N50 reached 694.3 kb, ten times that of the previous version and substantially higher than the N50 values of closely related species (T. tagusensis and Tubastraea sp.), which range from 82.7 kb to 227.0 kb based on long-read sequencing69. (iv) The genome assembly completeness reached 97.4%, substantially exceeding that of the previous version and of other Tubastraea species (T. tagusensis and Tubastraea sp.), whose completeness ranges from 88.1% to 91.6%69. (v) A BUSCO evaluation based on the metazoa_odb10 dataset, which contains 954 conserved genes, showed a gene model completeness of 97.4%, with 94.7% of genes complete, 2.7% fragmented, and 2.6% missing. Together, these results indicate that the T. coccinea genome assembly is of high quality.
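The BUSCO completeness figure of 97.4% is the sum of the complete (94.7%) and fragmented (2.7%) categories, and together with the missing category (2.6%) these account for all 954 genes. The sketch below parses a BUSCO one-line summary and derives that combined value; the regular expression and function name are assumptions, and the example string omits the single-copy/duplicated split ([S:…,D:…]) that real BUSCO summaries include, which the pattern tolerates but does not capture.

```python
import re
from typing import Dict

# Matches e.g. "C:94.7%[S:...%,D:...%],F:2.7%,M:2.6%,n:954"; the [S:..,D:..] part is optional
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%(?:\[S:[\d.]+%,D:[\d.]+%\])?,F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> Dict[str, float]:
    """Extract BUSCO percentages and derive complete + fragmented completeness."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError("not a BUSCO one-line summary")
    complete, fragmented, missing = float(match["C"]), float(match["F"]), float(match["M"])
    return {
        "complete": complete,
        "fragmented": fragmented,
        "missing": missing,
        "complete_plus_fragmented": round(complete + fragmented, 1),
        "n": int(match["n"]),
    }


result = parse_busco("C:94.7%,F:2.7%,M:2.6%,n:954")
print(result["complete_plus_fragmented"])  # 97.4
```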