Introduction

Despite their biological and economic importance, tropical coral reefs are in decline because of local and global anthropogenic influences1,2,3. At local scales, factors such as pollution and increased nutrient run-off are decreasing coral fitness, making corals more susceptible to disease and bleaching events4,5,6,7. At a global scale, anthropogenic climate change causes further damage through ocean warming3,8,9 and ocean acidification10,11,12. Global and local stressors do not act alone: their synergistic effects2,10,13,14 further reduce coral host fitness and cause phase shifts from coral to algal reef systems15,16,17.

One approach to stemming the decline of tropical coral reefs is coral restoration, which restores stony corals to the reef ecosystem18,19,20. Coral restoration has grown substantially since the early 2000s, with large biomasses of corals outplanted to reef systems18,21. Historically, asexual propagation has been the primary focus of restoration activities; it involves fragmenting larger colonies into smaller pieces, which show increased growth rates compared to larger colonies22,23. These fragments are then replanted on the reefs in clusters, allowing fusion and resulting in large coral colonies being formed over a markedly shorter time frame22,24. Recently, this method has shown success, with fused outplanted corals subsequently spawning25,26. The increase in coral biomass for restoration activities has led to the need to research optimal outplant sites for restoration success. Outplant sites can vary in their morphology (e.g., a flat reef structure versus a spur and groove system) and in the abiotic conditions they exhibit27,28,29. With different coral species exhibiting habitat preferences30,31,32, outplant site is a key consideration for restoration. Additionally, within-species variability needs to be considered. For example, different genets of a coral species can vary in bleaching resistance33,34, disease resistance35,36, and growth capabilities37,38,39. The exact mechanisms driving differences between genet responses within a species are multifaceted and may involve the coral host genetics and its microbiome (prokaryotes, viruses, Symbiodiniaceae, protists, and fungi). For the coral host, factors influencing gene expression, such as epigenetic modification40,41, gene frontloading42,43, and genetic mutations44,45, could be important in shaping differences between genets, leading to varying levels of baseline gene expression.
We hypothesize that this may lead to specific processes, such as immune responses, being naturally higher in some genets providing increased resistance to factors such as pathogens. Therefore, baseline gene expression may play a key role in predicting resistance and susceptibility, as well as performance for outplanted corals at different reef habitats. By understanding the underlying stressors that coral may encounter at an outplant site, preferential genets can be selected to maximize survivability and growth in restoration activities.

In the Caribbean, one species heavily focused on for restoration activities is the critically endangered branching coral Acropora palmata46,47,48. In the Florida Keys, this species has seen drastic declines since the 1980s due to disease outbreaks49,50, bleaching events51,52, and human influences53. Because A. palmata is fast-growing, it is especially conducive to asexual fragmentation54, allowing large biomass to be accrued and outplanted onto the reefs. Although the Coral Restoration Foundation (CRF) prioritizes A. palmata as a key coral species for outplanting, there has so far been no research on how different reef sites influence the transcriptomic profiles of different A. palmata genets. Understanding these transcriptomic profiles will allow the identification of potentially resistant and susceptible genets and underline the importance of fully characterizing reef site conditions to improve survivability of outplanted corals.

In this study, we characterized the gene expression profiles over a year of multiple fragments of four genets of A. palmata at three reef sites in the Florida Keys with ongoing active restoration: Carysfort Reef, Pickles Reef, and North Dry Rocks (Fig. 1A). We found that genet identity was the largest driver of gene expression, swamping any signal of reef site or sampling time (i.e., time of year). We also found differences between the genets in baseline expression of genes linked to immune activity and metabolic activity. After accounting for genet identity, we identified co-expression modules that were significantly correlated with the different sampling times, with these modules linked to the cooler winter months and the hotter summer months.

Fig. 1

Reef location, outplant sampling summaries, and sea surface temperatures (SST) for the three reef sites. (A) Locations of the three reef sites in the Florida Keys denoted by red dots. Inset map bottom right identifies the location with a square box of the main map image. (B) Field sampling summaries and samples selected for RNA-seq analysis. The table shows each sampling time point (ST; column 1), reef site (column 2), date of sampling (column 3), the total number of days to sample all reef sites for each sampling time point (column 4), and which subset of sampling time points were selected for RNA-seq analysis. (C) Plot of OISST at the three reef sites studied (colored lines). Seasons are indicated on the figure by blue fill (wet season) and red (dry season). Sampling time point ranges are indicated by black vertical bars on the plot.

Materials and methods

Field site selection

Three active restoration field sites in the Northern Florida Keys, managed by the Coral Restoration Foundation (CRF), were chosen for field sampling: Carysfort Reef (25.2209, −80.2102), Pickles Reef (24.9845, −80.4164), and North Dry Rocks (25.1304, −80.2940) (Fig. 1A). Carysfort Reef was the northernmost sampling site (Fig. 1A), a shallow reef system with low rugosity. Sampled A. palmata outplants were located between 17 and 22 ft. North Dry Rocks was the middle sampling site (Fig. 1A), a spur and groove reef system. All sampled A. palmata were located on one spur and ranged from 10 to 15 ft in depth. Pickles Reef was the most southern sampling site (Fig. 1A) with low rugosity like Carysfort Reef, with sampled corals between 14 and 19 ft. These three reef sites provided different latitudinal locations, and different outplant depths of A. palmata for correlative analysis.

Coral sampling

Five sampling time points were conducted at each of the three reef sites: sampling time point one (ST1) during November 2018, sampling time point two (ST2) during February/March 2019, sampling time point three (ST3) during June/July 2019, sampling time point four (ST4) during September 2019, and sampling time point five (ST5) during November 2019 (Fig. 1B). These covered a full year and the summer and winter months. For each sampling time point, all reefs were sampled in a 7-day period (Fig. 1B) with SCUBA used to access outplanted A. palmata fragments within clusters. Each cluster contained six to seven fragments of the same genet and clusters were identified for each sampling time point using cow tags placed at initial outplanting (Supplementary Fig. S1). For each sampling time point, clusters underwent health surveys and tissue sampling using a hammer and chisel. In each cluster, three of the six to seven fragments were sampled at each sampling time point, with the same fragments sampled between sampling timepoints using unique numeric identifiers and previously taken outplant cluster photographs. A tissue sample was then collected in pre-labeled zip bags with SCUBA, with tools wiped between samples. On the boat, tissue samples were transferred to 2 ml cryovial tubes filled with 1.5 ml of RNAlater and placed on ice. All tissue samples were processed and placed on ice within 35 min of sampling. All tools used in transferring tissue samples were bleached and wiped down with ethanol between each sample. At the end of each day, samples were placed at −80 °C.

Sea surface temperature for the reef sites

Sea surface temperature (SST) was obtained from the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation Sea Surface Temperature (OISST) data set55,56,57. To cover all sampling timepoints, data were subset to a temporal range from October 2018 to January 2020 and obtained for each reef site. OISST time course plotting was undertaken in R (v4.0.3) and RStudio (v1.4.1106) using tidyverse58 and ggplot259. Average temperatures were calculated across all reef sites and for each reef site at each sampling time point. The average temperature was also calculated for each reef site over the time span it took to sample all reefs at each sampling time point.
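
The averaging described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the record layout, the function name `window_mean_sst`, and all temperature values are hypothetical and chosen only to show how a per-site mean over a sampling window differs from a mean across all sites.

```python
from datetime import date
from statistics import mean

def window_mean_sst(records, site, start, end):
    """Mean of daily SST values for one site between start and end (inclusive)."""
    values = [sst for (s, d, sst) in records if s == site and start <= d <= end]
    if not values:
        raise ValueError(f"no SST records for {site} in {start}..{end}")
    return mean(values)

# Toy daily records (site, date, SST in deg C); the temperatures are invented.
records = [
    ("Carysfort", date(2019, 9, 25), 29.8),
    ("Carysfort", date(2019, 9, 26), 30.0),
    ("Pickles", date(2019, 9, 25), 29.6),
    ("Pickles", date(2019, 9, 26), 29.8),
    ("Carysfort", date(2019, 11, 18), 26.1),
]

# ST4 sampling window (25th-26th September 2019), as reported in the text.
st4_start, st4_end = date(2019, 9, 25), date(2019, 9, 26)

# Mean for a single reef site over the window.
carysfort_st4 = window_mean_sst(records, "Carysfort", st4_start, st4_end)

# Mean across all reef sites over the same window.
all_sites_st4 = mean(sst for (_, d, sst) in records if st4_start <= d <= st4_end)
```

With real OISST data, the same two calls would be repeated for each reef site and each sampling window listed in Fig. 1B.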

Sample choice for sequencing

A total of 1227 samples of 11 genets of A. palmata were collected from the three reef sites over the five sampling timepoints. Of the 11 genets sampled, four were selected for further processing. These genets were selected because they were used in an ex-situ disease experiment60 and are referred to using CRF designations: CN2, CN4, ML2, and HS1. As high replication of genets within each reef was desired, only four of the five sampling timepoints were chosen for sequencing: ST2 (February/March 2019), ST3 (June/July 2019), ST4 (September 2019), and ST5 (November 2019). This resulted in 380 samples of A. palmata for 3′ RNA-sequencing, with the full reef, genet, and timepoint breakdown available in Supplemental File S1.

RNA extractions, complementary DNA synthesis, and sequencing

The 380 samples chosen for RNA-seq library prep were randomized to minimize bias due to lab work and variables of interest (e.g., sampling time point, reef location). DNA and RNA were extracted from the same piece of coral tissue using the Zymo MagBead DNA/RNA kit on the KingFisher Flex. For an in-depth protocol, please see ref. 61. On completion, eluted DNA was placed at −80 °C for amplicon sequencing. For eluted RNA, 8 µl aliquots were used for quality control (NanoDrop and Qubit), with the remainder sealed and placed at −80 °C for downstream processing.

Total RNA was converted to complementary DNA (cDNA) using the QuantSeq FWD kit (Lexogen) following the manufacturer's high yield/quality protocol. The qPCR add-on kit (Lexogen) was used to ascertain the RNA concentrations needed to reduce under- and over-cycling during PCR. Samples were dual indexed and sent to the Hussman Institute for Human Genomics (Miller School of Medicine, University of Miami) for single-end (SE) 100 bp sequencing on a NovaSeq utilizing two lanes. Sample read depth was requested at 4–5 million reads.

Transcriptomic bioinformatics

Pre-processing of the 3′ RNA-seq libraries followed the same pipeline as ref. 60. Demultiplexing was done by the sequencing facility, and raw reads had adapters and low-quality reads trimmed using BBDuk.sh62 with recommended parameters from Lexogen (https://www.lexogen.com/quantseq-data-analysis). FastQC63 was then used to confirm successful removal of low-quality reads and adapters for all samples. Alignment and quantification utilized Salmon64 and the annotated A. palmata transcriptome65,66. Quantified samples were read into R (v4.0.3) and RStudio (v1.4.1106) using tximport67 and summarized to the gene level. Samples with fewer than 1,000,000 total read counts, and genes with fewer than four counts in more than 15 samples, were removed. These filtering criteria were used because they retained ~ 16,000 genes for downstream analyses while remaining conservative, removing only extremely lowly expressed genes and allowing any differences due to rare genes within treatments to be maintained.
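
The low-count filter can be sketched in Python as follows. The gene-level rule as worded admits more than one reading; this sketch assumes the common one, in which a gene is kept only if it has at least four counts in more than 15 of the retained samples, and it assumes samples are filtered before genes. The function name, data layout, and toy thresholds are hypothetical and are not taken from the authors' R code.

```python
def filter_counts(counts, min_sample_total=1_000_000, min_count=4, min_samples=15):
    """Filter a gene-by-sample count table.

    counts: dict mapping gene -> dict mapping sample -> integer count.
    Returns (kept_samples, filtered_counts).
    """
    samples = sorted({s for row in counts.values() for s in row})
    # Step 1: drop samples whose total count across all genes is too low.
    totals = {s: sum(row.get(s, 0) for row in counts.values()) for s in samples}
    kept_samples = [s for s in samples if totals[s] >= min_sample_total]
    # Step 2 (assumed reading): keep genes with >= min_count reads
    # in more than min_samples of the retained samples.
    filtered = {}
    for gene, row in counts.items():
        n_expressing = sum(1 for s in kept_samples if row.get(s, 0) >= min_count)
        if n_expressing > min_samples:
            filtered[gene] = {s: row.get(s, 0) for s in kept_samples}
    return kept_samples, filtered

# Toy table with scaled-down thresholds so the behaviour is easy to follow.
toy = {
    "geneA": {"s1": 90, "s2": 80, "s3": 5},
    "geneB": {"s1": 20, "s2": 3, "s3": 1},
    "geneC": {"s1": 2, "s2": 30, "s3": 0},
}
kept_samples, kept_genes = filter_counts(
    toy, min_sample_total=100, min_count=4, min_samples=1
)
```

In the toy table, sample s3 is dropped for low total counts, and only geneA passes the gene-level rule in both remaining samples.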

Prior to analysis, hierarchical clustering using hclust68 was employed to identify any sample groupings that were not explained by available metadata and thus deemed as surrogate variables. Downstream analysis either removed the surrogate variable or incorporated it in statistical models.

Principal component analysis (PCA) was run on a variance stabilizing transformation (VST) of the raw counts for all coral samples, using a modified plotPCA function from DESeq2 (v1.40.2)69. Correlations of principal components (PCs) to metadata variables were computed using PCAtools::eigencorplot70, with significance assessed using a Pearson correlation. Visualization of PCs of interest was done using ggplot259. To identify finer-scale patterns, surrogate/batch variables were removed using limma::removeBatchEffect71 on the VST of raw counts. The batch-corrected VST data were then used as input for additional PC analysis.
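
As an illustration of what batch removal does to a single gene, the Python sketch below subtracts each batch's offset from the mean of the batch means. This is a simplified analogue for one batch factor with no other covariates; it is not limma's implementation and does not reproduce any design matrix the authors may have supplied. The function name and values are hypothetical.

```python
from statistics import mean

def center_batches(values, batches):
    """Remove per-batch offsets from one gene's transformed expression values.

    Each value is shifted by the difference between its batch mean and the
    (unweighted) mean of all batch means, so batches end up sharing a centre.
    """
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    batch_means = {b: mean(vs) for b, vs in groups.items()}
    grand = mean(list(batch_means.values()))
    return [v - (batch_means[b] - grand) for v, b in zip(values, batches)]

# Toy VST-like values for one gene across four samples in two batches.
values = [10.0, 12.0, 20.0, 22.0]
batches = ["a", "a", "b", "b"]
corrected = center_batches(values, batches)
```

After correction, the within-batch differences (2 units here) are preserved while the 10-unit shift between batches is gone, which is why finer-scale structure becomes visible in a subsequent PCA.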

Differential gene expression (DGE) analysis was run using DESeq2 (v1.40.2)69. To identify differences in baseline gene expression between the four genets (ML2, CN2, CN4, and HS1), the likelihood ratio test (LRT) methodology was used. The full model (~ Surrogate Variable + Sampling Timepoint + Genet) and the reduced model (~ Surrogate Variable + Sampling Timepoint) were compared, with alpha set to 0.001. All significant genes were used in gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. To identify finer-scale expression profiles between the genets, the VST counts of significant genes from the LRT were used in DEGreport::degPatterns72. Visualization of identified gene clusters was done in ggplot259, and GO and KEGG enrichment analysis was undertaken for all identified clusters. Due to high enrichment of neurodegenerative disease terms, Venn diagram analysis was undertaken to identify the level of overlap of the genes present within these terms for relevant clusters.
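
The logic of the likelihood ratio test can be sketched in Python. Dropping a four-level genet factor removes three coefficients, so the test statistic is compared against a chi-square distribution with three degrees of freedom. The sketch only illustrates that comparison: it does not fit the negative binomial models that DESeq2 fits, and the log-likelihood values are invented.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def lrt_pvalue(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p-value for a 3-parameter difference."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_df3(stat)

# Hypothetical log-likelihoods for one gene under the full and reduced models.
stat, p = lrt_pvalue(loglik_full=-1200.0, loglik_reduced=-1210.0)
significant = p < 0.001  # alpha used in the text, before any multiple-testing adjustment
```

In practice the per-gene p-values would also be adjusted for multiple testing before the alpha threshold is applied.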

Weighted gene co-expression network analysis (WGCNA) was run using the R package WGCNA73. To identify co-expression due to sampling time point and reef, the input for WGCNA was the VST counts with genet and surrogate variable variance removed using limma::removeBatchEffect. Outlier samples were identified using the Ward.D2 methodology, and a single signed network was constructed manually with the following key parameters: softPower = 8, minModuleSize = 30, deepSplit = 2, mergedCutHeight = 0.25, minimumVerbose = 3, and cutHeight = 0.997 (Supplementary Fig. S2). The eigengene values of each module were correlated to sampling timepoints (ST2, ST3, ST4, and ST5), OISST SST, reef sites (CF, NDR, and PI), and outplant depth. For the OISST SST, we used the respective daily average temperature value for each sampling time point at each reef site, with these data available in Supplementary File S2. The most highly connected gene within each module was also identified (WGCNA::chooseTopHubInEachModule). GO and KEGG enrichment analysis was done for each module to ascertain putative biological functions and roles. Modules of interest were then subset from the main WGCNA heatmap and manually arranged for visualization using ComplexHeatmap74.
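
Two ingredients of this analysis can be sketched in Python: the signed-network adjacency obtained by raising a rescaled correlation to the soft power, and the Pearson correlation between a module eigengene and a trait. Encoding a sampling time point as a 0/1 indicator is an assumption made for illustration, and the eigengene values are invented; the sketch does not reproduce the WGCNA package's module detection.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(cor, soft_power=8):
    """Signed-network adjacency: ((1 + cor) / 2) ** soft_power."""
    return ((1.0 + cor) / 2.0) ** soft_power

# Adjacency for perfectly correlated, uncorrelated, and anti-correlated gene pairs.
adj_pos = signed_adjacency(1.0)
adj_zero = signed_adjacency(0.0)
adj_neg = signed_adjacency(-1.0)

# Hypothetical module eigengene across five samples and a 0/1 indicator
# marking which samples were collected at ST3 (assumed encoding).
eigengene = [0.30, 0.10, -0.20, -0.25, 0.05]
is_st3 = [0, 0, 1, 1, 0]
module_trait_r = pearson(eigengene, is_st3)
```

In a signed network, negatively correlated gene pairs receive adjacency near zero, so modules group genes that rise and fall together rather than genes that are merely strongly related in either direction.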

Significant gene lists (from DESeq2), clusters (from degPatterns), and modules (from WGCNA) were used as input for GO enrichment analysis in Cytoscape75 (v3.8.2) and the application BiNGO76. Enrichment analysis was run using the hypergeometric test, with p-value correction using a Benjamini & Hochberg false discovery rate (FDR). Alpha was set at 0.01 for all GO enrichment analyses. The background universe of genes used to test enrichment was the set of genes remaining after initial low-count filtering. Visualization of GO enrichment was done in Cytoscape75 (v3.8.2), and further visualization of gene expression linked to GO terms was run in the R package ComplexHeatmap74 using the VST counts with genet and batch variable removed.
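
The enrichment test and correction named above can be sketched in Python using only the standard library. This is a generic hypergeometric upper-tail test with a Benjamini-Hochberg adjustment, not BiNGO's code; the function names and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_sf(k, universe, annotated, list_size):
    """P(overlap >= k) when drawing list_size genes from a universe in which
    `annotated` genes carry the term of interest."""
    upper = min(annotated, list_size)
    hits = sum(
        comb(annotated, i) * comb(universe - annotated, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe, list_size)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example: universe of 10 genes, 4 annotated to a term,
# gene list of 3 of which 2 are annotated.
p_term = hypergeom_sf(2, universe=10, annotated=4, list_size=3)

# Adjust a small set of invented raw p-values and apply alpha = 0.01.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
enriched = [q < 0.01 for q in adjusted]
```

The universe argument corresponds to the background set of genes retained after low-count filtering, which is why the choice of filter affects every enrichment result.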

Significant gene lists (from DESeq2), clusters (from degPatterns), and modules (from WGCNA) were also annotated with their KEGG gene identifiers present in the A. palmata annotation file65,66. KEGG pathway enrichment was identified using clusterProfiler77 (v3.18.1), with enrichment tested against the KEGG Orthology database (organism = “ko”) and alpha set at 0.01. Visualization of enriched pathways was done with clusterProfiler77, enrichplot78 (v1.10.2), ggplot259, and ComplexHeatmap74.

For clusters (identified from DEGpatterns), and modules (identified from WGCNA), significant GO and KEGG enrichment results were inspected and used to assign putative overarching functions. We present full significant KEGG and GO results used to assign these putative functions in supplementary files.

Results

Sea surface temperature between the three reef sites

There were minimal differences in temperature profiles among the three reef sites at each sampling time point over the one-year period (Fig. 1C). Seasonally, November 2018 to May 2019 showed average temperatures of ~ 26 °C. From May 2019 onwards temperature increased, with average values around 30 °C and a maximum of 32 °C in mid-August 2019. From September 2019 onwards temperature decreased back to ~ 26 °C. Due to weather conditions, the reefs were sampled as close together in time as was feasible at each time point. For ST1 this covered 15 days (14th–26th November 2018), ST2 five days (28th February–4th March 2019), ST3 seven days (25th June–2nd July 2019), ST4 two days (25th–26th September 2019), and ST5 two days (18th–19th November 2019) (Fig. 1B). For each reef site and sampling time point, the average OISST SST is reported in Table 1.

Table 1 Average Optimum Interpolation Sea Surface Temperature (OISST) and number of samples used in transcriptomic analysis (after removal of low count samples) for each sampling time point at each reef site.

Sequencing results

A total of 374 of the 380 samples were successfully sequenced (Supplemental File S1). Sample read depth ranged from 567,208 to 16,494,255 reads, with an average of 1,843,913 (standard deviation (SD) = 1,281,721) and a median of 1,550,645. The average alignment to the annotated A. palmata transcriptome was 32.95%, which was expected and similar to previous 3′ RNA-seq alignment rates in the species60. A total of 12 samples with < 1 million counts across all genes were removed. This left 362 samples for downstream analysis, with average and median read depths of 1,877,383 (SD = 1,288,203) and 1,563,692 respectively (Supplemental File S1). When grouped by genet, the sample sizes were: CN2 = 108, CN4 = 77, HS1 = 102, and ML2 = 75. For a breakdown of sampling time point and reef site sample sizes, please see Table 1. Initial hierarchical clustering of the samples identified a strong surrogate variable that was not explained by any of the metadata variables recorded (Supplementary Fig. S3). As such, all downstream differential expression and co-expression analyses included this as a surrogate effect. Filtering of low count genes resulted in 15,767 genes being retained for downstream analysis. These 15,767 genes also constituted the ‘gene universe’ (i.e., the background set of genes used to test enrichment for gene lists of interest) used for GO enrichment analysis. While 3′ RNA-seq requires lower read depth for sample quantification79, we note that our read depth is on the lower end and may affect the inference of gene processes from analysis. Despite low read depth, testing with more stringent filtering thresholds (five counts/gene in all samples, and 20 counts per gene in 80% of samples) identified the same overarching patterns and relationships in our data, with these results available in the GitHub repository.

Genet identity was the largest driver of gene expression variance

After accounting for the surrogate variable, genet identity was the largest driver of gene expression, with groupings of genet present from PC1 to PC4 (Fig. 2A–C) and no clear groupings of sampling time point (Supplementary Fig. S4) or reef site (Supplementary Fig. S5) present. Differential expression analysis, using the likelihood ratio test (LRT) method in DESeq2, identified 7,096 significant genes between the four genets (Supplementary File S3). The degreport::degpatterns72 analysis, utilizing the 7,096 significant genes between the genets from the LRT, identified six different expression clusters (Supplementary File S3) with differing gene sets and expression profiles (Fig. 2D). Cluster 1 (2847 genes) showed significant GO and KEGG pathway enrichment of terms related to cell growth and protein production, with KEGG enrichment identifying multiple terms important in immune system signaling pathways (such as RIG-I-like (KO:04622), C-type lectin (KO:04625), and NOD-like receptor (KO:04621) signaling pathways), and pathogenic infection (such as Shigellosis (KO:05131), Pathogenic Escherichia coli infection (KO:05130), and Yersinia infection (KO:05135)) (Fig. 2E). Genet HS1 showed a strong positive association, and genet CN4 showed a strong negative association with Cluster 1 (Fig. 2D). Cluster 2 (751 genes) identified enrichment of one GO term, Fibroblast growth factor receptor signaling (GO:0008543), and one KEGG term, Lysosome (KO:04142). Genet CN4 again showed a strong negative association, and ML2 showed a strong positive association with Cluster 2 (Fig. 2D). Cluster 3 (565 genes) showed GO enrichment of ribosome and translational processes. Enrichment map analysis of the KEGG enrichment results identified one functional group: terms linked to neurodegenerative diseases (Fig. 2E and Supplementary Fig. S6A) with genet CN2 showing a strong positive association and HS1 showing a strong negative association (Fig. 2D). 
Cluster 4 (1421 genes) again showed GO enrichment of terms linked to ribosomes and translational processes. Enrichment map analysis of significant KEGG pathways identified two functional groups; terms linked to pathogenic infection, and terms linked to neurodegenerative diseases (Fig. 2E and Supplementary Fig. S6B). CN2 and HS1 showed negative associations with Cluster 4, while CN4 and ML2 showed positive associations (Fig. 2D). Cluster 5 (640 genes) showed no significant GO or KEGG enrichment, with genet CN2 showing a strong negative association and genet HS1 showing a strong positive association (Fig. 2D). Cluster 6 (551 genes) showed no GO enrichment, but KEGG enrichment and enrichment map analysis again identified the majority of terms to be present in a functional group linked to neurodegenerative diseases (Fig. 2E and Supplementary Fig. S6C). Genet ML2 showed a negative association and CN2 showed a positive association with Cluster 6 (Fig. 2D). Full GO enrichment and KEGG pathway enrichment results are available in Supplementary File S4 and Supplementary File S5 respectively. Venn diagram analysis identified that despite there being significant enrichment of similar neurodegenerative disease terms in Clusters 1, 3, 4, and 6, there was minimal overlap of the genes causing enrichment of these terms between the clusters, with no genes shared between all four of the clusters (Supplementary Fig. S6D).

Fig. 2

Genet identity was the largest driver of gene expression variance, with expression profile clusters identifying different baseline expressions among genets. (A) Principal component (PC) 1 (16% variance) and PC2 (14% variance) of the four genets with 95% confidence intervals. (B) PC 2 (14% variance) and PC3 (12% variance) of the four genets with 95% confidence intervals. (C) PC 3 (12% variance) and PC4 (10% variance) of the four genets with 95% confidence intervals. (D) The six identified clusters from degreport::degpattern of the significant LRT genes identified between the four genets. Each box identifies a cluster, with the x-axis showing the genet, and the y-axis the computed expression z-score for each genet. A value closer to 1 indicates higher expression, a value closer to 0 indicates neutral expression, and a value closer to − 1 indicates lower expression. (E) Inferred function identified from GO and KEGG enrichment analysis for each cluster. Cells with “—” indicate no inferred function due to low or no significantly enriched GO or KEGG terms. For (A)–(D); tan = genet CN2, blue = genet CN4, green = genet HS1, and gray = genet ML2.

Co-expression analysis identified modules that significantly correlated to the winter (ST2 and ST5) and summer (ST3 and ST4) seasons once accounting for genet variance

On removal of the identified surrogate variable and genet identity, co-expression analysis identified 15 co-expression modules (Supplementary Fig. S7) ranging from 50 to 7514 genes (Supplementary File S6). Three modules showed significant correlations with all four sampling timepoints (Fig. 3A). The Purple module (278 genes, hub gene = 60S Ribosomal) showed negative correlations with ST2 and ST3 (−0.14 and −0.14 respectively) and positive correlations with ST4 and ST5 (0.16 and 0.12 respectively) (Fig. 3A). GO and KEGG enrichment analyses identified terms to be linked to ribosomal and metabolic processes (Supplementary Files S7 and S8 respectively). From significant GO and KEGG enrichment results, the putative function of the purple module was deemed to be “protein synthesis and homeostatic processes”. The Black module (737 genes, hub gene = HIG1 domain family member 1C) showed negative correlations with ST2 and ST5 (−0.27 and −0.41 respectively), and positive correlations to ST3, ST4 (0.24 and 0.4 respectively), and SST (0.5) (Fig. 3A). GO enrichment identified only one significantly enriched term, Endoplasmic reticulum and chaperone complex (GO:0034663) (Supplementary File S7). KEGG enrichment identified several signaling pathways important in general cell proliferation, management, and cell growth. Specifically, this included signaling pathways (MAPK signaling pathway (KO:04010), PI3K-Akt signaling pathway (KO:04151), GnRH signaling pathway (KO:04912), Ras signaling pathway (KO:04014), and cAMP signaling pathway (KO:04024)) and other terms (Focal adhesion (KO:04510), Proteoglycans in cancer (KO:05205), and Protein processing in endoplasmic reticulum (KO:04141)) (Supplementary File S8). From significant GO and KEGG enrichment results, the putative function of the Black module was assigned as “Organismal cellular maintenance and growth”. 
The Light Cyan module (59 genes, hub gene = Cell Surface hyaluronidase) showed positive correlations with ST2 and ST5 (0.27 and 0.21 respectively) and negative correlations with ST3, ST4 (−0.44 and −0.16 respectively), and SST (−0.41) (Fig. 3A). There was only significant KEGG enrichment for one term: Rheumatoid arthritis (KO:05323) (Supplementary File S8). Due to low GO and KEGG enrichment, no putative function was assigned to the Light Cyan module.

Fig. 3

There were significantly correlated co-expression modules to sampling time point and reef location when accounting for genet variance. (A) Heatmap of co-expression modules and their respective correlation and significance to metadata variables of interest. Columns are split into a) sampling time point 2–5 and average OISST, b) reef locations, and c) cluster outplant depth. Heatmap rows are split into sets of co-expression modules. The heatmap fill; red = positive correlation, blue = negative correlation. Text in each cell identifies the Pearson correlation (upper value) and p-value (lower value). White cells denote a significance of >0.05 and were removed. Bar chart to the right of the heatmap indicates the number of genes in each respective co-expression module. (B) Identified hub gene and putative function of each module as ascertained from significant GO and KEGG enrichment pathway analyses. Full GO and KEGG enrichment results used to infer function are available in Supplementary Files S7 and S8 respectively.

Four modules showed significant correlations to three of the four sampling timepoints: Red, Light Yellow, Midnight Blue, and Turquoise (Fig. 3A). The Red module (1944 genes, hub gene = Protein CREG1) showed negative correlations with ST2 and ST3 (−0.13 and −0.1 respectively) and a positive correlation with ST4 (0.14) (Fig. 3A). GO enrichment identified three terms all linked to the mitochondrion (Supplementary File S7), while KEGG enrichment identified two main sets of processes: fatty acid metabolic processes and viral associated infections and responses (Supplementary File S8). From significant GO and KEGG enrichment results, the putative function of the Red module was assigned as “Mitochondrial Maintenance and Breakdown”. The Light Yellow module (38 genes, hub gene = Cryptochrome-1) showed positive correlations with ST2 and ST3 (0.32 and 0.13 respectively) and a negative correlation with ST5 (−0.35) (Fig. 3A). There was no KEGG enrichment, but GO enrichment identified terms linked to multiple parts of the mitochondrion, as well as processes important in fatty acid metabolism, vitamin metabolism, and respiration (Supplementary File S7). From significant GO enrichment results, the putative function of the Light Yellow module was assigned as “Metabolic Processes”. The Midnight Blue module (62 genes, hub gene = Bromodomain containing protein 2) showed positive correlations with ST3, ST4, and SST (0.17, 0.18, and 0.3 respectively) and a negative correlation with ST5 (−0.26) (Fig. 3A). There was no significant GO or KEGG enrichment for this module, thus no putative function was assigned. The Turquoise module (7,415 genes, hub gene = Myosin-10) showed positive correlations with ST2 and ST5 (0.14 and 0.22 respectively) and negative correlations with ST4 and SST (−0.289 and −0.22 respectively) (Fig. 3A). At alpha 0.01, there were 204 significant GO terms (Supplementary File S7) and 192 significant KEGG terms (Supplementary File S8).
Both GO and KEGG pathway showed enrichment of terms linked to metabolic processes, immune signaling pathways, and protein synthesis. Due to high significant enrichment of GO and KEGG terms, the turquoise module was designated as “general organism homeostasis”.

There were no co-expression modules that were significantly correlated to all three reef sites

Co-expression analysis identified two modules that showed correlation to at least two of the three reef sites: the Light Yellow module (38 genes, hub gene = Cryptochrome-1), which showed a negative correlation to North Dry Rocks (−0.45) and a positive correlation to Pickles Reef (0.45), and the Cyan module (784 genes, hub gene = uncharacterized protein LOC111326987), which showed a negative correlation to Carysfort (−0.11) and a positive correlation to North Dry Rocks (0.15) (Fig. 3A). The Light Yellow module was assigned the putative function of “Metabolic Processes” from GO enrichment analysis. For the Cyan module, GO enrichment identified terms linked to the extracellular space as well as terms important in general organism growth and nervous system cell development (Supplementary File S7). KEGG enrichment only identified two terms: Maturity onset diabetes of the young (KO:04950) and Notch signaling pathway (KO:04330) (Supplementary File S8). The putative function of the Cyan module was assigned as “growth processes”.

Discussion

Baseline immune gene expression and putative protein production drove differences in gene expression between the four genets

In this study, we identified that genet identity is the largest driver of gene expression in outplanted A. palmata (Fig. 2A–C), and that this masks the signals of sampling time point and reef location. From our GO and KEGG enrichment analysis, we identified that baseline immune gene expression seems to be an important difference between genets of A. palmata. Baseline gene expression can be described as the normal expression levels of an organism not exposed to stimuli. Within a species, different levels of baseline expression can therefore exist for different processes. For example, differing levels of immunity between genets could drive factors such as disease susceptibility and resistance. For corals, it is well documented that there are differences in the production of immune-related molecules80,81 as well as differing levels of disease susceptibility82,83,84. This also holds true within coral species, with different genets showing differing levels of susceptibility to disease35,36 and heat stress33,34. At present, the exact molecular mechanism(s) underlying differences in baseline immune activity within a coral species have not been fully elucidated. Epigenetic modification could play a role, with histone modification85 and DNA methylation86 shown to influence immune gene expression in other organisms. In corals, epigenetic modifications can occur due to environmental perturbations87 as well as specific stressors such as nutrients88. Similarly, previous work has shown that corals can front-load genes, allowing them to resist stressful conditions42,89. We therefore hypothesize that different genets of A. palmata could be using these processes to cause differences in baseline immune gene expression. We previously showed this same difference in immune-related gene expression among A. palmata genets in an ex-situ disease experiment60. In this study, using the same four genets of A. palmata, we identified the same signal for baseline immune gene expression.
This indicates that this signal is consistent over time and space, and is therefore important in patterns of resistance and susceptibility in A. palmata genets.

The clusters identified from the genet LRT analysis also showed consistent enrichment of GO terms linked to ribosome and translational processes (clusters 3 and 4). These results could indicate that different baseline rates of protein synthesis exist among A. palmata genets, which could affect physiological variables such as growth rates, as well as responses to biotic and abiotic perturbations. High baseline expression of ribosome and translational processes could be linked to faster-growing genets. Specifically, we think genet CN2 may have faster growth rates on reefs, as it showed higher abundances of transcripts for genes involved in ribosome, translational, and growth processes (cluster 1), but additional work needs to be conducted to provide evidence for this hypothesis.

Clusters 4 and 6 both had designations for neurodegenerative diseases, with enrichment map analysis showing high connectivity for these terms (Supplementary Fig. 6A-C). Interestingly, the genes within the highly connected neurodegenerative terms for each cluster had minimal overlap among genets (Supplementary Fig. 6D), indicating that different cellular processes are being expressed among the genets. Neurodegenerative diseases are characterized by processes such as misfolded proteins90,91, oxidative stress92,93, disruption of calcium homeostasis94,95, and mitochondrial dysfunction96,97. While these terms are specifically linked to human neurodegenerative diseases, the core processes and pathways that are involved in these diseases are present throughout the metazoan tree of life. For example, oxidative phosphorylation is a core organismal process that generates adenosine triphosphate, the main source of energy at a cellular level. Stony coral tissue loss disease research has also identified enrichment of neurodegenerative disease terms98, which suggests that the core processes present in well-studied human diseases may show different baseline expression in different coral genets, and thus provide benefits and tradeoffs. Future work should look to fully characterize the pathways and processes encompassed by these human neural disease terms in corals, identify how they affect coral physiology, and determine whether they confer any benefits or tradeoffs that can be incorporated into coral restoration projects.
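The overlap comparison described above (Supplementary Fig. 6D) can be illustrated with a pairwise Jaccard index between gene sets. The following is a minimal Python sketch under stated assumptions, not the procedure used in this study: the set labels and gene identifiers are hypothetical placeholders, and the Jaccard index is only one of several reasonable overlap measures.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: genes shared by both sets divided by genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets: dict) -> dict:
    """Return the Jaccard index for every pair of named gene sets."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Hypothetical gene identifiers, for illustration only.
example_sets = {
    "set_1": {"gene_a", "gene_b", "gene_c", "gene_d"},
    "set_2": {"gene_d", "gene_e", "gene_f"},
}

overlap = pairwise_overlap(example_sets)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

A value near 0 corresponds to the minimal overlap described above, whereas a value near 1 would indicate that the same genes underlie the enriched terms in both sets.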

The results from this study, paired with past research, underline the importance of not only characterizing the immune repertoire of coral species but also identifying how differences in baseline gene expression among coral genets may influence susceptibility and resistance to biotic and abiotic stressors. From a coral restoration perspective, characterizing baseline immune performance could guide the outplanting of more genets that can mount a higher immune response, which may improve survivorship and the restoration of reefscapes.

Co-expression analysis identifies higher potential growth genes in A. palmata in the cooler winter months and shifts to survival and defensive processes in the summer months

After accounting for genet identity, WGCNA73 identified three co-expression modules (Purple, Black, and Light Cyan) that were significantly correlated with the four sampling time points (Fig. 3A). The Black and Light Cyan module correlations mirrored the summer versus winter months and were also correlated with sea surface temperature (Fig. 3A). The Light Cyan module, which was negatively correlated with temperature, may encompass important genes involved in growth processes in A. palmata. For instance, the Light Cyan module included genes involved in structural integrity of the extracellular space (collagen genes) and tissue rigidity (tricholycin and reticulobulin genes). The hub gene of this module (cell surface hyaluronidase) has also been shown to be important in the regulation of cell adhesion, as well as in the degradation of hyaluronan, which regulates development and structural integrity in the extracellular matrix99,100. A hyaluronan-like substance has been identified previously in the coral species Mycetophyllia reesi, with a putative function in tissue and skeletal matrix structure101. This could indicate that during the cooler winter months, growth processes are favored by outplanted A. palmata on the reef system, which aligns with previous findings that A. palmata grows quicker at cooler temperatures102.

The Black module showed the inverse pattern to the Light Cyan module, with a strong positive correlation (0.5) with sea surface temperature and sampling time points 3 and 4 (i.e., summer months) (Fig. 3A) and an inferred function of immune, survival, and metabolic processes (Fig. 3B). The survival and metabolic function was inferred from the significant enrichment of the MAPK, PI3K/AKT, and RAS signaling pathways. MAPK signaling is a highly conserved signaling pathway103 with importance in gene expression, mitosis, metabolism, immune responses, and survivability104. The PI3K/AKT signaling pathway is similarly highly conserved105 and plays key roles in the immune system, growth, and survival106. The RAS signaling pathway plays key roles in the regulation and proper function of both the MAPK and PI3K/AKT signaling pathways107. The enrichment of these pathways during the warmer summer months suggests that outplanted A. palmata may shift toward a survival and maintenance mode, in contrast to the growth processes that occur in the winter. This is plausible and indicates that although A. palmata were not exhibiting bleaching and looked relatively healthy, even moderate increases in temperature may induce stress108,109. Increased water temperatures can increase pathogen abundance and virulence110,111,112, which may explain the increased expression of immune genes targeted at preventing infection. While we observed differences in baseline immune gene expression between the genets, this does not prevent any genet from initiating immune responses to factors such as pathogens in the environment. Rather, it may influence variables such as survivability and disease resistance for specific genets, with higher baseline expression of these immune pathways conferring higher resistance to all, or specific, diseases. The Black module also contained immune signaling pathways and processes, adding support to this observation. 
TNF is an important immune cytokine whose signaling pathway can activate additional innate immune responses113, is important in cell proliferation and cell death114, and consistently shows increased expression in coral disease98,115,116 and heat117,118 transcriptomic studies.
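The module-temperature relationships discussed in this section can be illustrated with a simplified calculation. The sketch below is not the WGCNA implementation used in this study: as a stand-in for the module eigengene (the first principal component of module expression), it summarizes each module as the mean of z-scored gene expression per sample and then computes a Pearson correlation with temperature. All gene names, expression values, and temperatures are hypothetical.

```python
from math import sqrt


def zscore(values):
    """Center and scale one gene's expression across samples (sample SD)."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]  # assumes the gene is not constant


def module_profile(expression):
    """Mean z-scored expression per sample across all genes in a module.

    Simplified stand-in for a module eigengene, not the first principal component.
    """
    scaled = [zscore(gene) for gene in expression.values()]
    n_samples = len(scaled[0])
    return [sum(g[i] for g in scaled) / len(scaled) for i in range(n_samples)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: four sampling time points, two genes per module.
temperature = [24.0, 26.0, 29.0, 30.0]
warm_module = {"gene_1": [5.0, 6.0, 8.0, 9.0], "gene_2": [2.0, 2.5, 4.0, 4.5]}
cool_module = {"gene_3": [9.0, 8.0, 5.0, 4.0], "gene_4": [7.0, 6.5, 5.0, 4.0]}

r_warm = pearson(module_profile(warm_module), temperature)
r_cool = pearson(module_profile(cool_module), temperature)
print("warm module r:", round(r_warm, 2))
print("cool module r:", round(r_cool, 2))
```

In this toy example the warm module is positively correlated with temperature and the cool module negatively, mirroring the opposing patterns of the Black and Light Cyan modules.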

Reef location did not have a strong effect on gene expression for outplanted genets of Acropora palmata

Principal component analysis and co-expression analysis did not identify a strong effect of the outplant reef site on gene expression between the four genets of A. palmata. However, two co-expression modules did show significant correlations with at least two of the three reef sites as well as with cluster outplant depth (Fig. 3A). Of these, only the Light Yellow module showed significant GO enrichment, with terms involved in mitochondrial processes and fatty acid metabolism (Supplementary File 7). Coral species are mixotrophic, utilizing both autotrophic processes, where energy is derived from photosynthates produced by the obligate symbiont Symbiodiniaceae, and heterotrophic processes, involving passive and active means of acquiring energy. Light availability is the key variable that dictates the balance between these two modes. Light availability, however, can be affected not only by depth but also by the turbidity of the reef site, with shallower reef sites that exhibit higher levels of turbidity having reduced light availability for autotrophic energy acquisition. Despite this, coral species at deeper depths (i.e., species that inhabit shallow to upper mesophotic reef depths of ~60 m) switch to a more heterotrophic mode of energy acquisition because lower light levels decrease the output from the photosynthetic Symbiodiniaceae119. With the significant correlation to depth, it is plausible that outplanted A. palmata at Pickles Reef could rely more heavily on heterotrophic energy acquisition processes. However, this hypothesis requires substantial additional testing. As we only measured outplant depth, we cannot disentangle whether this is truly due to outplant depth and light availability, or whether other variables such as turbidity (which can influence light availability) or water flow regime could be driving this result. Enrichment of fatty acid metabolic processes in the Light Yellow module does, however, provide additional support for our hypothesis. 
Corals living at deeper depths with a more heterotrophic lifestyle have been shown to have higher energy reserves than corals at shallow depths with a more autotrophic lifestyle120,121. The outplanted A. palmata at Pickles may therefore be utilizing fatty acid stores that they have acquired through higher rates of heterotrophic feeding. While we pose this hypothesis, we acknowledge that it is not possible to directly compare mesophotic and shallow coral reefs because of the wide range of differences in the abiotic and biotic variables present. Specifically, abiotic variables (such as light availability and temperature122) and biotic variables (such as symbiont density123 and growth rates124,125) vary greatly between these reef habitats. To fully test this hypothesis, reef sites spanning the full range between shallow and mesophotic reefs would need to be incorporated, with a full suite of abiotic (temperature, outplant depth, turbidity, water flow regime) and biotic (symbiont density, growth rates, microbiome composition) variables measured to disentangle whether light availability is causing shifts in energy acquisition. Additionally, methodologies such as whole-tissue stable isotope analysis of carbon and nitrogen, as well as tissue lipid content analysis, in conjunction with measured abiotic and biotic variables, would allow a more definitive conclusion on how levels of heterotrophic and autotrophic energy acquisition may shift in outplanted A. palmata.

Conclusions and future directions for genomic analysis of outplanted corals

We identified that genet identity is an important factor in outplanted A. palmata, and a leading driver of the differences seen between the four genets is baseline immune gene expression. Since we identified intra-population variation in phenotypically healthy corals, including more A. palmata genets would help identify whether this pattern holds at a population level. While it was only possible to include four genets in the current analysis, incorporating samples from the additional seven genets taken during the fieldwork portion of this project would be a good starting point. A larger genet sample size would allow us to identify how baseline profiles influence outplant performance. Additional ex-situ experiments exposing multiple genets of A. palmata to different biotic and abiotic stressors would also help identify the “winner” and “loser” genets. These results could be incorporated into restoration practices, potentially increasing outplant survivability by matching fitter genets to preferred reef systems.

Future work looking at the effects of outplant sites on coral gene expression should include a wider range of reef habitats. Although the reef habitats used in this study exhibit different topologies and rugosity, they are all offshore reef sites within close proximity of one another. Inshore and offshore reef systems have been shown to have different abiotic conditions that influence the performance of resident coral populations. Monitoring outplanted corals at reef systems with different abiotic conditions would allow characterization of the gene pathways that are affected by these environmental conditions.