Background & Summary

Corals, among the oldest inhabitants on Earth, play a foundational role in the reef ecosystems they constitute1. Coral reef ecosystems are of immense ecological and economic importance, contributing to coastal protection, marine water purification, and the regulation of the carbon cycle. However, these ecosystems are exceptionally sensitive and vulnerable, responding acutely to environmental changes2,3,4,5,6. To date, nearly 50% of coral reefs worldwide have been degraded, with 14% of that degradation occurring within the past decade4. Reef-building corals are ecologically significant due to their fundamental contributions to reef formation, ecosystem regulation, and biodiversity support7,8. Acropora, a genus of reef-building corals, is widely distributed across the globe, spanning vast oceanic regions from the Red Sea to the Indo-Pacific and even the Caribbean9,10. This genus is known for its great diversity and abundance, serving as the dominant genus of reef-building corals in the Indo-Pacific region11,12. Unfortunately, Acropora species have suffered from severe bleaching events13,14,15, leading to the loss of large areas of coral reefs. Given these challenges, it is important to understand how Acropora responds to environmental change. Omics-based approaches can help uncover adaptive mechanisms that could inform conservation strategies.

Acropora digitifera was the first coral species to have its genome sequenced16, followed by the release of additional Acropora genomes17,18,19,20. To date, 26 Acropora genomes are available on NCBI. These resources enable comparative analyses, functional gene studies, and adaptive trait investigations, offering molecular insights into stress resistance and diversification. Omics analyses have identified genes related to symbiosis that may play pivotal roles in enabling Acropora corals to withstand environmental stressors19. Analysis of three hybrid corals (Acropora cf. gemmifera, Acropora cf. humilis, and Acropora cf. monticulosa) within the A. cf. humilis complex, integrating morphological and genomic data, revealed anomalous genomic regions under selection21. These regions include coding sequences potentially involved in thermotolerance or stress response. Consistent with this, analysis of genomic and mitochondrial data suggests that hybridization likely facilitated the adaptive radiation of Acropora corals22. In addition, lineage-specific gene family expansions that may confer a survival advantage were identified in two high-quality, contiguous Acropora genomes23. Together, these advances highlight the significance of genomic data in the study of Acropora corals.

A. intermedia is particularly widespread and demonstrated marked heat resistance during the extensive coral bleaching event that occurred in China’s Greater Bay Area in 202024,25. This species is therefore an excellent model for studying coral bleaching and hybridization with congeneric species26. Investigating hybridization events among Acropora corals can provide valuable insights into reticulate evolution. The first A. intermedia genome was assembled using short-read methodologies, and its overall quality and annotation require improvement19. A genome assembled from Next-Generation Sequencing (Illumina HiSeq 2500) reads may resolve complex repetitive regions poorly, which limits the accurate detection of structural variations, synteny analyses, and gene family studies. For A. intermedia in particular, research on thermal tolerance mechanisms and hybridization dynamics depends critically on high-quality genomic resources. To address these limitations and expand the genomic resources available for corals in the South China Sea, particularly for Acropora, we reassembled and re-annotated the A. intermedia genome based on long-read PacBio HiFi sequencing. The new assembly spans 496.8 Mb with a Contig N50 of 2.9 Mb. BUSCO analysis indicated a completeness of 92.6% (C: 92.6%, S: 91.2%, D: 1.4%, F: 3.4%, M: 4.0%, n: 954). Our assembly exhibits a significant improvement in quality compared to the previous version (Contig N50: 40.3 Kb, BUSCO completeness: 90.6%). The completeness of gene models, assessed via BUSCO, has reached 95.7% (C: 95.7%, S: 93.7%, D: 2.0%, F: 1.9%, M: 2.4%, n: 954), reflecting a 2.7% increase. This enhanced A. intermedia genome offers greater contiguity and completeness, improving data quality and enabling analyses of structural variations, evolutionary dynamics, and horizontal gene transfer.

Methods

Sample collection and DNA extraction

Samples of A. intermedia were collected by scuba diving at Fenghuang Island (Sanya, Hainan Province, China; 18°14′24.70″ N, 109°29′38.31″ E) in June 2024 (Fig. 1). A fresh fragment of an adult A. intermedia colony was cut into small pieces and washed with 3× PBS. The coral tissue was digested with Type II collagenase (2 mg/ml), and the mixed symbiotic algae and coral polyp cells were separated by multiple rounds of brief centrifugation. During this process, most of the algae settled at the bottom of the centrifuge tube, while the supernatant retained the polyp cells. The supernatant was carefully aspirated and centrifuged, and the resulting pellet was collected as the material for subsequent DNA extraction. DNA was extracted using the phenol-chloroform method27. A Nanodrop 2000 spectrophotometer was used to quantify the genomic DNA and to confirm that the OD260/280 ratio was between 1.8 and 2.0 and the OD260/230 ratio was between 2.0 and 2.2. Additionally, the purity and structural integrity of the DNA were evaluated by 1% agarose gel electrophoresis.

Fig. 1

Images of live Acropora intermedia samples and their skeletal structures. (a) Live samples underwater. (b) Skeletal structures.

Library preparation and sequencing

A total of 30 µg of DNA was used to construct a Circular Consensus Sequencing (CCS) library for PacBio sequencing at Novogene (Beijing, China). The HiFi sequencing library was prepared according to standard PacBio protocols using the SMRTbell™ Express Template Prep Kit 2.0 (Pacific Biosciences, California, USA). Sequencing was performed on the PacBio Sequel II system (Pacific Biosciences, California, USA). The raw data were transferred from the sequencer to SMRTLink v13.1 (https://www.pacb.com/support/software-downloads/), where the CCS algorithm was used to generate HiFi reads.

Genome size estimation

Prior to assembly, the genome size of A. intermedia was estimated using k-mer analysis of raw PacBio HiFi reads. The raw reads were quality-trimmed using fastplong v.0.3828 (-m 20 -l 100 -w 12) and fragmented into 150 bp pieces using a simple custom Python script. The fragmented reads were then used to generate a 21-mer histogram with Jellyfish v2.329 (-C -m 21 -h 1,000,000). The resulting histogram was visualized using the online server GenomeScope v.1 (http://genomescope.org)30 with the parameters k = 21, ploidy = 2, read length = 150, and maximum k-mer coverage = 1,000,000 (Fig. 2).
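The custom fragmentation script itself is not reproduced in this descriptor. The following is only a minimal sketch of how long reads could be cut into fixed 150 bp pieces. The function name, the fragment naming scheme, and the choice to discard a trailing piece shorter than 150 bp are assumptions made for illustration, not details of the authors' script. File reading and writing are omitted.

```python
from typing import Iterator, Tuple


def fragment_read(name: str, seq: str, size: int = 150,
                  keep_tail: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (fragment_name, fragment_sequence) pieces of a fixed length.

    Assumption: a trailing piece shorter than `size` is dropped unless
    keep_tail is True. The original script may behave differently.
    """
    for index, start in enumerate(range(0, len(seq), size)):
        piece = seq[start:start + size]
        if len(piece) < size and not keep_tail:
            break  # only the last piece can be short, so stop here
        yield f"{name}_frag{index}", piece


if __name__ == "__main__":
    # A hypothetical 310 bp read gives two full 150 bp fragments.
    for frag_name, frag_seq in fragment_read("read1", "ACGTA" * 62):
        print(frag_name, len(frag_seq))
```

Cutting reads to a uniform length makes the read-length parameter supplied to GenomeScope (150) match the input data.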

Fig. 2

A. intermedia genome assembly size estimation using GenomeScope. GenomeScope k-mer (21) distribution from the adapter-trimmed PacBio HiFi reads.
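To illustrate the idea behind k-mer-based genome size estimation, the sketch below computes a naive estimate from a two-column k-mer histogram (depth, number of distinct k-mers), the layout written by the Jellyfish histo command. It is not the GenomeScope model, which fits a mixture of distributions and accounts for heterozygosity and repeats. The low-depth cutoff (min_depth) used to exclude likely error k-mers is an assumed value. In a highly heterozygous genome the tallest peak may be the heterozygous peak, in which case this simple estimate would be inflated.

```python
from typing import Iterable, Iterator, Tuple


def parse_histogram(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Parse 'depth count' lines into (depth, count) integer pairs."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            yield int(parts[0]), int(parts[1])


def estimate_genome_size(histogram: Iterable[Tuple[int, int]],
                         min_depth: int = 5) -> float:
    """Naive estimate: total k-mer occurrences divided by the peak depth.

    min_depth is an assumed cutoff that removes low-depth (error) k-mers.
    """
    usable = [(depth, count) for depth, count in histogram if depth >= min_depth]
    if not usable:
        raise ValueError("no histogram rows at or above min_depth")
    # Depth of the tallest peak, taken as the per-genome k-mer coverage.
    peak_depth = max(usable, key=lambda pair: pair[1])[0]
    total_kmers = sum(depth * count for depth, count in usable)
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = ["1 1000", "2 500", "9 10", "10 100", "11 10"]
    print(estimate_genome_size(parse_histogram(toy)))
```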

Genome assembly and genome quality assessment

Based on previous studies demonstrating superior performance in terms of Contig N50 and BUSCO completeness scores for similar datasets, Hifiasm was chosen for the initial assembly with default parameters31,32,33. Haploid duplications were removed using Hifiasm’s integrated functionality. To enhance consensus sequence accuracy, three rounds of polishing were performed using Racon v1.5.034. Genome completeness was assessed with BUSCO v5.2.235, employing the conserved metazoan gene set “metazoa_odb10”. Additionally, we used snailplot-assembly-stats (https://github.com/hanwnetao/snailplot-assembly-stats) to create a SnailPlot visualizing the assembly statistics (Fig. 3).

Fig. 3

Genome assembly overview of A. intermedia. The contiguity and completeness of the A. intermedia genome assembly, post-contamination screening, are represented by a circle plot reflecting the full assembly length (~496.8 Mb), distributed across 633 contigs. The Contig N50 (2.9 Mb) is marked in dark orange, and the N90 (536.7 Kb) in light orange. The longest contig was 11.8 Mb (highlighted in red). The BUSCO scores are shown in the top right corner in green.

To assess potential contamination, the assembly was screened for non-target DNA using Blobtools v1.1.136. Each contig was assigned a taxonomic classification based on its top hit from a BLAST v2.9.037 search against the NCBI nt database, with an e-value threshold of 1e-5 (Fig. 4).

Fig. 4

Quality assessment and contamination detection of the A. intermedia genome assembly. (a) Histograms above depict the distribution of coverage. BlobPlot showing taxonomic affiliation at the phylum rank level for A. intermedia. (b) The average GC content. Blue dots show contigs with best BLAST hits to Cnidaria.

The newly generated genome assembly statistics were compared with the previous genome version (GCA_014634585.1) using QUAST v5.2.038. A comparison of the two versions of the genome information is summarized in Table 1.

Table 1 Comparison of Acropora intermedia genomic quality between past versions and this study.
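For readers unfamiliar with the contiguity metrics compared in Table 1, the sketch below shows how N50 and L50 (and, with a different fraction, N90) follow from a list of contig lengths. This is the standard definition, not QUAST's implementation. The function name and the example lengths are illustrative only.

```python
from typing import Iterable, Tuple


def nx_stats(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given fraction of the total assembly length.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least `fraction` of the assembly.
    Lx is the number of contigs in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    return ordered[-1], len(ordered)  # reached only if fraction > 1


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6]          # hypothetical contig lengths
    print(nx_stats(contigs))           # N50 and L50
    print(nx_stats([50, 30, 15, 5], 0.9))  # N90 and L90
```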

Repeat annotation

Prior to masking repetitive elements, a de novo repeat library was constructed for the final A. intermedia genome assembly using RepeatModeler v.2.0.139. RepeatMasker v4.1.740 was then used to predict repeat sequences, with the de novo library generated by RepeatModeler and the RepBase-20181026 database serving as input. RepeatMasker masked 47.91% of the genome, with most of the repeats being unclassified (20.37%). Transposable elements (TEs) were predicted de novo using EDTA v2.2.241, and the EDTA TE library was obtained after filtering. The results of RepeatMasker and EDTA were combined to compile the final genome repeat annotation (Table 2).

Table 2 Statistics of repeat elements in the genome of A. intermedia.

Gene prediction and annotation

We employed multiple methods for genome structure annotation. First, we used Augustus v3.342 for de novo gene prediction. Genomes of all species in the genus Acropora were downloaded from NCBI to create the training and test sets for the model. After training, the model was used to predict genes in the A. intermedia genome. Additionally, transcriptome data of A. intermedia (SRP226139) were obtained from NCBI, and transcripts were assembled using Trinity v2.15.143. The assembled transcripts were then aligned to the genome using the Program to Assemble Spliced Alignments (PASA) v2.5.244. Finally, the results from these strategies were integrated into a unified gene annotation using EVidenceModeler v1.1.145.

For non-coding RNA annotation, we downloaded the Rfam database46,47,48 and annotated ncRNAs, including snRNAs and microRNAs, using Infernal v1.1.449. The rRNA annotation was performed using RNAMMER v1.250. In addition, 12,556 high-confidence transfer RNAs were predicted using tRNAscan-SE v2.0.1251 by filtering the initial set of 15,871 putative tRNAs with EukHighConfidenceFilter. The final statistics are summarized in Table 3.

Table 3 The statistics of ncRNA annotation in the coral A. intermedia.

Functional annotation of the predicted protein-coding genes was performed by aligning protein sequences against the eggNOG 5.052 database using DIAMOND v2.1.852, followed by hmmsearch (HMMER v3.3.2)53,54 validation against Pfam HMMs for conserved domain identification. In both the DIAMOND alignment against eggNOG and the hmmsearch against Pfam HMMs, results with E-values greater than 1e-5 were removed to ensure the accuracy of the annotations. Additionally, protein functions were predicted using further databases, including Kyoto Encyclopedia of Genes and Genomes (KEGG)55 and Gene Ontology (GO)56, to revalidate the accuracy and enhance the completeness of the annotations. These databases were used in conjunction with InterProScan v5.3657 to predict protein functions by analyzing conserved protein domains. In addition, BLASTP v2.2.2 was used against the SwissProt database 2025_0158, and DIAMOND v2.1.8 was applied against the NR database. Finally, all functional annotation files were merged using a Python script. In summary, 26,611 (99.1%) genes were successfully annotated (Table 4).

Table 4 The statistics of functional annotation in the coral A. intermedia.
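The E-value filtering applied to the alignment results can be sketched as follows. The example assumes hits are stored in the default 12-column BLAST-style tabular format, where the E-value is the eleventh column. The function name and the sample rows are hypothetical, and this is not the authors' merging script.

```python
from typing import Iterable, Iterator, List


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5,
                evalue_column: int = 10) -> Iterator[List[str]]:
    """Yield tab-split hit rows whose E-value does not exceed max_evalue.

    Assumes default tabular output (E-value in the 11th column, index 10).
    Blank lines and comment lines starting with '#' are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[evalue_column]) <= max_evalue:
            yield fields


if __name__ == "__main__":
    # Hypothetical rows: query, subject, identity, ..., E-value, bit score.
    rows = [
        "gene1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
        "gene2\tprotB\t35.0\t60\t39\t0\t1\t60\t5\t64\t0.01\t30",
    ]
    for hit in filter_hits(rows):
        print(hit[0], hit[1], hit[10])
```

Hits with an E-value exactly equal to the threshold are kept here, which matches the statement that only results with E-values greater than 1e-5 were removed.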

Data Records

The raw sequencing data and genome assembly of Acropora intermedia are available at https://identifiers.org/ncbi/insdc.sra:SRP566323 (PacBio HiFi data)59 and https://identifiers.org/ncbi/insdc.gca:GCA_048544155.1 (genome assembly)60. Additionally, the predicted protein (.gff) and CDS files are available in the figshare database61.

Technical Validation

Through third-generation PacBio sequencing, 14.2 Gb of raw reads were generated. Before genome assembly, k-mer analysis predicted the genome size to be 433.11 Mb with a heterozygosity rate of 1.95% (Fig. 2). The final assembly consists of 633 contigs, totaling 496.8 Mb, with a Contig N50 of 2.9 Mb, L50 of 46, and a maximum contig length of 11.8 Mb (Fig. 3). Upon completion of the genome assembly, we assessed its contiguity, accuracy, and completeness using multiple approaches. The BUSCO completeness of the genome was 92.6% (C: 92.6%, S: 91.2%, D: 1.4%, F: 3.4%, M: 4.0%, n: 954), comprising both single-copy and duplicated complete genes (Fig. 3). Contamination screening of the A. intermedia assembly showed that 99.76% of contigs aligned to the NCBI nt database and were assigned to the phylum Cnidaria, and 95.85% of PacBio HiFi reads mapped to the genome (Fig. 4). The remaining 4.15% of PacBio HiFi reads did not align to the genome. These reads may have been discarded as contamination during assembly or may derive from organellar genomes (e.g., mitochondria). A comparison with the previous genome version, based on statistical metrics (Table 1), demonstrated substantial improvements: the Contig N50 increased from 40.3 Kb to 2.9 Mb, the total number of contigs fell from 20,998 to 633, and the number of N’s dropped from 5,276.11 per 100 Kbp to 0. These findings indicate a marked improvement in the contiguity of the genome assembly, with the Contig N50 more than 70 times larger than that of the previous version. BUSCO analysis of the 633 contigs revealed that 92.6% of the 954 metazoan single-copy orthologs were complete, with 91.2% present as single copies and 1.4% duplicated. A small fraction of orthologs were fragmented (3.4%) or missing (4.0%), together accounting for 7.4%. Through structural annotation, 792,517 repeat elements were identified genome-wide. Among the classified repeats, long terminal repeat (LTR) retrotransposons represent the most abundant category (Table 2). Additionally, 17,829 RNA-associated elements and 26,852 protein-coding genes were annotated in the A. intermedia genome (Tables 3 and 4). Furthermore, evaluation of the gene models through BUSCO indicated a gene model completeness of 95.7%, with only 4.3% of BUSCO genes fragmented (1.9%) or missing (2.4%) in the final annotated genome (Table 1). These results collectively highlight the high quality, contiguity, and completeness of the genome assembly. The updated genome of A. intermedia provides an essential resource for deciphering its adaptive thermotolerance and hybrid compatibility, with implications for reef restoration strategies and reconstruction of scleractinian evolutionary history.
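The BUSCO summaries reported here can be checked for internal consistency: complete genes (C) should equal single-copy plus duplicated genes (S + D), and C, F, and M should sum to 100%. The sketch below parses a summary string of the form used in this descriptor and applies both checks. The function names and the rounding tolerance of 0.15 percentage points are assumptions for illustration.

```python
import re
from typing import Dict


def parse_busco(summary: str) -> Dict[str, float]:
    """Extract the C, S, D, F, M percentages and n from a summary string."""
    pairs = re.findall(r"\b([CSDFMn]):\s*(\d+(?:\.\d+)?)", summary)
    return {key: float(value) for key, value in pairs}


def is_consistent(values: Dict[str, float], tolerance: float = 0.15) -> bool:
    """Check C = S + D and C + F + M = 100, allowing for rounding."""
    complete_ok = abs(values["C"] - (values["S"] + values["D"])) <= tolerance
    total_ok = abs(values["C"] + values["F"] + values["M"] - 100.0) <= tolerance
    return complete_ok and total_ok


if __name__ == "__main__":
    genome = "C: 92.6%, S: 91.2%, D: 1.4%, F: 3.4%, M: 4.0%, n: 954"
    genes = "C: 95.7%, S: 93.7%, D: 2.0%, F: 1.9%, M: 2.4%, n: 954"
    for label, text in (("genome", genome), ("gene models", genes)):
        print(label, is_consistent(parse_busco(text)))
```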