Introduction

The juçara palm (Euterpe edulis Mart.) is endemic to the Atlantic Forest, with a natural range that spans the entire biome and extends into some regions of the Cerrado1,2. This species plays a vital role in supporting these ecosystems by providing fruits and seeds as a food source for various animals, particularly during periods of scarcity3,4. Along with its ability to adapt to diverse environments, E. edulis also displays significant phenotypic and genetic variation4,5,6. In recent years, juçara cultivation for fruit production has gained economic significance, driven by the growing demand for açaí (Euterpe oleracea), a processed pulp made from the fruits of a congeneric species7. However, effective strategies for establishing and managing this species in the field still need to be developed, due to the fragility of the species in its juvenile phase.

The early development of E. edulis poses one of the main challenges for cultivating and managing juçara in the field, due to the high mortality rate of seedlings, which require shading and have low tolerance to water deficits8,9. Additionally, the species has a long developmental period, taking approximately six years from planting to flowering and fruiting10. Naturally, E. edulis thrives in the shaded understory of Atlantic Forests, where water availability is typically high11. Although it is shade-tolerant, studies have shown that a moderate increase in light availability, such as in areas with less dense canopy, can promote better seedling growth12. Another critical factor is the species’ recalcitrant seeds, which limit storage and hinder seedling production13,14. These factors underscore the urgency of developing genomic resources to support more efficient strategies for seedling production and field cultivation. Therefore, our study can aid in identifying candidate genes that may be used to develop more effective strategies for establishing and managing the species in the field15,16. To achieve this, generating genomic knowledge for the species is essential.

Euterpe edulis is a non-model organism without a reference genome, making transcriptome analysis an essential strategy for annotating genes and identifying gene expression patterns under different conditions17,18,19. In transcriptome analysis, leveraging genomic data from available Arecaceae species is particularly valuable for identifying conserved genes across species20,21,22,23,24. In this context, the genome of Elaeis guineensis Jacq. (oil palm) stands out as a high-quality reference, as it is chromosome-based and offers high coverage21, representing a promising alternative for reference mapping in the transcriptome analysis of E. edulis. Moreover, the phylogenetic proximity between these species25,26 provides a robust framework for identifying conserved genes. Using a well-established reference genome in transcriptome analyses also enhances the precision and reliability of gene annotation27,28.

Orthologous genes, which are evolutionarily conserved across different species and maintain their biological function26,29, play fundamental roles and serve as indicators of essential biological processes, particularly when highly expressed. Identifying these genes is crucial, as they reveal conserved mechanisms and transcripts key to critical biological processes specific to a botanical family26,30,31.

In various studies with different species, conserved genes involved in plant development and stress response have been identified through reference mapping, along with differentially expressed genes (DEGs) indicating specific adaptations to environmental conditions32,33,34,35,36,37.

Given the significance and limited availability of genomic data for juçara, particularly regarding the molecular mechanisms involved in its early development, the present study aims to investigate whether two phenotypically divergent E. edulis matrices, UFES_250 and Santa Marta (SM), exhibit distinct gene expression profiles that may reflect phenotypes of interest for production of the species. Additionally, the study seeks to provide conserved genes related to its early development. The research explores the relationships between divergent phenotypic traits and the differential expression of leaf and root transcriptomes to identify conserved genes associated with development and environmental adaptation. By maximizing the detection of genes and transcripts, we aim to contribute to the knowledge of conservation strategies, management, and genetic improvement, ensuring a sustainable future for E. edulis and related species.

Materials and methods

Plant material and morphological characterization

Seeds from two morphologically divergent matrices, UFES_250 (Fig. 1A, B, C) and SM (Fig. 1D, E, F), were collected. Plants of these matrices are located in a private planting area for juçara fruit production in the municipality of Rio Novo do Sul, in the state of Espírito Santo (Brazil) (latitude: −20.807598, longitude: −40.934519). The region has a tropical climate with a dry season, an average annual temperature of around 22 °C, and an altitude of 470 m.

Fig. 1

Euterpe edulis matrices UFES_250 (Figures A, B, and C) and Santa Marta (SM) (Figures D, E, and F). Roots and leaves from both matrices were collected after germination for RNA extraction. (A) Euterpe edulis UFES_250 during germination (three months); (B) Euterpe edulis UFES_250 after six months of germination; (C) Root (mean 13 cm) and aerial part (mean 14 cm) of E. edulis UFES_250. (D) Euterpe edulis SM during germination (three months), (E) Euterpe edulis SM after six months of germination, (F) root (mean 17 cm) and aerial part (mean 16 cm) of E. edulis SM.

The UFES_250 matrix was selected for its high pulp yield (31.08%)38 and for displaying typical species characteristics, including a greenish-yellow leaf sheath, cream-colored inflorescence, green immature fruits, and short bracts39. The SM matrix exhibits distinct morphological traits compared to the typical E. edulis plants found in Espírito Santo, thriving better in cooler environments40. According to local growers, SM plants are distinguished by their thicker stems, larger fruits (though with lower pulp yield), bigger fruit clusters, wider spacing between leaf scars, and longer heart of palm. In the field, the UFES_250 genotype exhibited notable differences in fruit and seed characteristics compared to the SM genotype. UFES_250 produced an average of 4 fruit bunches, while SM had a slightly higher average of 4.20 bunches. However, UFES_250 had a significantly higher pulp yield (31.08%) than SM (16.42%). The fruits of UFES_250 were also larger, with an equatorial diameter of 14.50 mm and a longitudinal diameter of 15.21 mm, compared to 13.17 mm and 13.01 mm, respectively, in SM. Similarly, seed dimensions were greater in UFES_250, with an equatorial diameter of 13.02 mm and a longitudinal diameter of 13.60 mm, whereas SM seeds measured 12.59 mm and 11.94 mm, respectively.

The fruits were collected and de-pulped, and 120 seeds from each matrix were obtained in May 2021. The seeds were immediately immersed in warm water (32 °C) for 40 min before being placed in tubes with substrate and maintained in a greenhouse for germination and early development, without temperature control (Fig. 1). After six months, samples of leaves and roots were collected from 50 seedlings of each matrix for phenotypic measurements and transcriptome analysis.

The lengths of the leaflets, stem, aerial part, and root were recorded in centimeters to be used as phenotypic measurements. Fresh masses of the aerial part and root were measured in grams using a precision balance. To obtain dry masses, samples were dried in a forced-air circulation oven at 65 °C until reaching a constant weight, after which they were weighed using a precision balance. Descriptive statistical analyses of these traits were performed in R v.4.2.1 (in RStudio) with the “ggplot2” package41. Additionally, an analysis of covariance (ANCOVA) was conducted (Supplementary File 1, Table 3) using the “stats” package42.

RNA extraction and sequencing

Leaf and root samples were collected from the 50 seedlings grown in the greenhouse for six months for RNA extraction (Supplementary File 1, Fig. 1). Five tissue pools were created for each sample, with each pool consisting of tissue from 10 different plants, collected and mixed in equal proportions. These pooled samples, totaling 100 mg of fresh tissue, were placed in microtubes and immediately frozen in liquid nitrogen. The RNA was then extracted from the pooled samples.

The frozen plant material was ground in a mortar using liquid nitrogen. RNA extraction was performed using the cetyltrimethylammonium bromide (CTAB) method, adapted43 with modifications44. Initially, 900 µL of pre-warmed (65 °C) extraction buffer was added to the tissue powder, and the mixture was agitated until homogeneous. The mixture was then transferred to a 2 mL microtube and incubated at 65 °C for 10 min. Next, an equal volume of chloroform/isoamyl alcohol (24:1, v/v) was added, and the tube was vigorously shaken. The microtube was centrifuged at 7,000 × g for 20 min at 4 °C. The supernatant was collected, transferred to a new 1.5 mL microtube, and re-extracted with an equal volume of chloroform/isoamyl alcohol (~ 650 µL).

Next, 0.5 volume of 5 M lithium chloride (LiCl) was added to the supernatant, followed by incubation at −20 °C for four hours. The RNA was selectively pelleted by centrifugation at 16,000 × g for 30 min at 4 °C. The pellet was washed with 75% (v/v) ethanol and air-dried. The RNA was then solubilized in 30 µL of RNase-free ultrapure water and stored in an ultra-low temperature freezer at −80 °C for subsequent analyses. The extracted RNA was treated with DNase (DNase Treatment of RNA Samples Prior to RT-PCR – Promega) following the manufacturer’s protocol.

After RNA isolation, quantification was performed using a NanoDrop ND-1000 spectrophotometer, and the integrity of the samples was assessed by electrophoresis on a 1% agarose gel, using GelRed (Biotium) as the intercalating stain for documentation. The quality and quantity of the total RNA were assessed using the TapeStation System (Agilent) and Qubit (Thermo Fisher Scientific), respectively.

The library was prepared using the Illumina TruSeq Stranded Total RNA Library Prep Plant Kit, following the TruSeq Stranded Total RNA Reference Guide (1000000040499 v00). This kit utilizes RiboZero beads to deplete ribosomal RNA from cytoplasmic, mitochondrial, and chloroplast sources in plant samples and a PCR master mix to transcribe RNA into cDNA. After the cDNA was obtained, a normalization step was performed following a published protocol45, using the Trimmer and Trimmer-Direct kit (Evrogen), to increase gene discovery. The libraries were quantified using Qubit (Thermo Fisher Scientific), and fragment sizes were estimated using the TapeStation System (Agilent). Eight libraries (two for each tissue type and matrix) were subjected to total RNA-seq utilizing the Illumina NovaSeq 6000 platform in transcriptome analysis mode, generating paired-end reads of 200–400 bp.

Mapping of the reads to the reference genome

Regarding the raw Illumina data in the FASTQ file format, data visualization was performed using FastQC v. 0.11.84446, followed by the removal of adapters and low-quality sequences using Trimmomatic v. 0.3847 with the following settings: Phred 30, Leading 3, Trailing 3, Slidingwindow 4:15, and Minlen 36. The cleaned sequence files were re-assessed using FastQC to ensure the effectiveness of the quality control process.
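The trimming settings listed above can be illustrated with a minimal Python sketch. It is a simplified illustration of the leading, trailing, sliding-window, and minimum-length steps applied to the Phred scores of a single read, not a reimplementation of Trimmomatic; the function name `trim_read` and its parameter names are hypothetical, and the order of the steps is an assumption.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, minlen=36):
    """Return (start, end) of the kept slice of one read, or None if dropped.

    quals: per-base Phred quality scores (integers) for a single read.
    Simplified sketch; parameter names are hypothetical.
    """
    start, end = 0, len(quals)
    # Leading: drop bases from the 5' end while quality is below the threshold.
    while start < end and quals[start] < leading:
        start += 1
    # Trailing: drop bases from the 3' end while quality is below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # Sliding window: scan 5'->3' and cut at the first window whose mean
    # quality falls below min_avg.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # Minimum length: discard reads that are too short after trimming.
    if end - start < minlen:
        return None
    return start, end
```

For example, a read with two very low-quality leading bases, 40 high-quality bases, and a low-quality tail keeps only its high-quality core, while a read shorter than 36 bases is discarded.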

The filtered sequences were mapped to the reference genome of Elaeis guineensis (GCF_000442705.1)48. The statistical analysis was performed using HISAT2 version 2.1.049, based on the read mapping conducted with Bowtie2 version 2.3.4.150. Contiguous transcript sequences were assembled using StringTie version 2.1.3b, with the reference genome as the basis51.

Differential gene expression analysis

After assembly, gene/transcript abundance was calculated based on read counts, with values normalized using the counts-per-million (CPM) metric. Genes with very low counts across all libraries were filtered out to avoid negatively impacting statistical analyses, as such genes might be underrepresented in the samples. This filtering was performed using the filterByExpr command from the “edgeR”52 package, with a minimum count (min.count) of 100. Samples were grouped in a 2D space using principal component analysis (PCA) plots generated with the plotMDS function from the “limma-voom”53 package. These plots were instrumental in providing information about the variability of biological replicates for each evaluated tissue. The analysis was conducted to find low variability within samples from the same group compared to the variability observed between different groups.
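As a rough illustration of CPM normalization and low-count filtering, the following Python sketch converts raw counts to CPM and keeps genes that reach a CPM cutoff in a minimum number of libraries. It is only loosely modeled on the idea behind filterByExpr (a minimum count expressed as CPM at the median library size) and omits its other checks; the function names and the `min_samples` parameter are assumptions made for this example.

```python
from statistics import median


def cpm(counts, lib_sizes):
    """Convert raw counts to counts per million.

    counts: {gene: [count per library]}; lib_sizes: total reads per library.
    """
    return {
        gene: [c / n * 1e6 for c, n in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, lib_sizes, min_count=100, min_samples=2):
    """Keep genes whose CPM reaches the cutoff in at least min_samples libraries.

    The cutoff is min_count expressed as CPM at the median library size.
    min_samples is a hypothetical stand-in for the smallest group size.
    """
    cutoff = min_count / median(lib_sizes) * 1e6
    values = cpm(counts, lib_sizes)
    return [
        gene for gene, row in values.items()
        if sum(v >= cutoff for v in row) >= min_samples
    ]
```

With library sizes of 1, 2, 1, and 2 million reads, a minimum count of 100 corresponds to a cutoff of roughly 66.7 CPM, so a gene at 200 CPM in two libraries is retained while a gene at 10 CPM in every library is removed.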

An initial analysis was performed using all eight libraries, followed by separate analyses of leaf and root tissue libraries, to assess differential expression between matrices. Contrasts for the comparisons were created using the makeContrasts function from the “limma-voom” package. CPM values were employed to calculate fold changes as the ratio between treatment and control on a logarithmic scale, e.g., log2(CPM_SM/CPM_UFES_250). Fold changes greater than zero were considered up-regulated, while those less than zero were classified as down-regulated relative to the control sample.
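The fold-change convention described above can be sketched in a few lines of Python. The function names are hypothetical, and the small `prior` added to both CPM values (to avoid division by zero) is an assumption of this example rather than a setting reported in the study.

```python
import math


def log2_fold_change(cpm_treatment, cpm_control, prior=0.5):
    """log2 ratio of treatment to control CPM.

    prior is a hypothetical pseudo-count that keeps the ratio defined
    when a CPM value is zero.
    """
    return math.log2((cpm_treatment + prior) / (cpm_control + prior))


def direction(log_fc):
    """Classify a gene relative to the control sample."""
    if log_fc > 0:
        return "up-regulated"
    if log_fc < 0:
        return "down-regulated"
    return "unchanged"
```

A gene with four times the CPM of the control has a log2 fold change of 2 and is labeled up-regulated; the reciprocal ratio gives −2 and is labeled down-regulated.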

A statistical threshold of P ≤ 0.05 was used to identify significant results. The “limma-voom” package was also applied for differential gene expression analysis. P-values less than 0.05 were adjusted using the Benjamini-Hochberg method to control the false discovery rate (FDR)54.
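For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal Python sketch follows. It applies the standard step-up procedure to a whole vector of p-values and is an illustration only, not the code used in the study; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and taking a
    # running minimum so the adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called significant when its adjusted value is at or below the chosen FDR threshold, for example 0.05.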

DEG hierarchical clustering was conducted using the coolmap function from the “limma-voom” package, applying the average linkage clustering method. Heatmaps were used to display the gene abundance levels. These heatmaps were constructed using log-transformed and normalized values of genes, based on uncentered Pearson distances and the unweighted pair group method with arithmetic mean (UPGMA). The color scheme represented the logarithmic intensity of gene expression linked to z-score measurements. A quantitative palette ranging from blue to red was applied, where relatively higher gene expression levels were shown in red, and lower expression levels in blue.
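Two quantities mentioned above, the per-gene z-score used for heatmap coloring and the uncentered Pearson distance used for clustering, can be sketched in plain Python. This is an illustration under stated assumptions, not the internals of coolmap: the function names are hypothetical, the z-score uses the population standard deviation, and the distance is taken as one minus the uncentered correlation.

```python
import math
from statistics import mean, pstdev


def row_zscores(values):
    """Scale one gene's expression values to mean 0 and unit variance.

    Assumes the population standard deviation; a constant row maps to zeros.
    """
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def uncentered_pearson_distance(x, y):
    """One minus the uncentered Pearson correlation of two profiles.

    The correlation is computed without subtracting the means, so it equals
    the cosine of the angle between the two vectors. Both vectors must be
    non-zero.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm
```

Two profiles that differ only by a positive scale factor have a distance of zero, and orthogonal profiles have a distance of one; average-linkage (UPGMA) clustering would then merge the closest profiles first.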

DEG volcano plots were generated using the EnhancedVolcano package55, enabling the analysis of tissue-specific genes.

Functional annotation of DEGs, as well as non-differentially expressed and highly expressed genes, was performed using the Gene Ontology (GO) database56 through the online tools DAVID57 and GOSlimViewer from the AgBase Database58. A Venn diagram of DEGs was created to display tissue-specific and tissue-independent genes across all three analyses.

The methodology, spanning from the quality control of sequencing reads to the functional annotation analysis conducted in this study, is summarized in the flowchart below (Fig. 2).

Fig. 2

Flowchart of the applied methodology. Reads obtained from the FASTQ file, resulting from Illumina sequencing, were analyzed for quality and mapped to reference, resulting in 32,092 genes. After filtering, 7,140 expressed genes were obtained, which were normalized. Differential gene expression analysis was then performed, highlighting 678 differentially expressed genes (DEGs) in leaves and 444 in roots. When both tissues were analyzed, 11 DEGs were identified. All expressed genes were functionally annotated.

Results

Morphological characterization

The seedlings from the SM matrix emerged earlier (Fig. 1D) than those from the UFES_250 matrix (Fig. 1A). Among the morphological characteristics evaluated, seedlings from the SM matrix showed greater shoot length (SM: 16.43 cm; UFES_250: 14.33 cm; p-value = 0.00229**), stem length (SM: 7.802 cm; UFES_250: 6.824 cm; p-value = 0.0442*), and root length (SM: 17.5 cm; UFES_250: 13.15 cm; p-value = 7.72e−09***) (Fig. 3; Supplementary File 5) after six months. Overall, the average values for all measured variables were higher in the SM seedlings compared to the UFES_250 seedlings.

Fig. 3

Comparative boxplot of early development data for traits such as lengths of aerial part, leaflet, stem, and root, as well as fresh and dry weight of aerial part and root from 50 seedlings of each UFES_250 and SM matrix, after six months in a greenhouse. Significant differences are indicated by asterisks (‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05).

Trimming of reads and sequencing data

The Illumina sequencing generated an average of 44,171,557 reads, totaling 6.3 Gbp across the eight libraries (Supplementary File 1, Table 1). After quality analysis, the libraries displayed an average GC content of 44.97% and a Q30 (percentage of bases with a Phred score above 30) of 94.92% (Supplementary File 1, Table 4). The average mapping rate of the reads to the Elaeis guineensis reference genome was 54% for leaf tissue and 12.45% for root tissue, with similar results observed across libraries from different matrices (Supplementary File 1, Table 5; Fig. 2). A total of 32,092 genes and transcripts were identified in the mapping process (Supplementary File 1, Table 2).

Differences between libraries and normalization

Biological replicates clustered both by tissue type and by matrix, indicating that the experimental conditions were well controlled and that matrix and tissue had distinct effects (Fig. 4 and Supplementary File 1, Fig. 3). Differential expression was observed between tissues, with greater variability in gene expression noted in leaf tissue compared to root tissue.

Fig. 4

(A) Principal component analysis (PCA) showing the general differences between expression profiles of the libraries from the different matrices UFES_250 (purple ellipse) and Santa Marta (SM) (green ellipse). (B) The relative abundance of the 7,000 normalized genes in leaves and roots of the UFES_250 and SM matrices in individual and hierarchical clustering, describing the clustering between expression profiles in the analyzed tissues. Legend: RP1- Root pool 1; RP2- Root pool 2; LP1- Leaf pool 1; LP3- Leaf pool 3.

The low normalization factors of the eight libraries for leaf and root samples suggest the presence of a large number of highly expressed genes. A normalization factor below one indicates that a small subset of highly abundant genes dominates the sequencing output, leading to reduced counts for other genes compared to what would be expected given the library size. As a result, the effective library size for these samples was reduced. After normalization analysis across all eight libraries, 7,140 genes were identified (Supplementary File 2, Table 1). Of these, 670 were exclusively found in root samples (Supplementary File 2, Table 3), including 130 exclusive to the UFES_250 matrix and 69 to the SM matrix (Supplementary File 2, Table 2). In leaf tissue, 92 exclusive genes were identified (Supplementary File 2, Table 3), with 44 exclusive to UFES_250 and 15 to SM (Fig. 5A and Supplementary File 1, Table 7).
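The effect of a normalization factor on the effective library size can be shown with a short Python sketch. It assumes the common convention that the effective library size is the raw library size multiplied by the normalization factor; the function names and the numbers in the usage note are hypothetical and are not values from this study.

```python
def effective_library_sizes(lib_sizes, norm_factors):
    """Scale each raw library size by its normalization factor."""
    return [n * f for n, f in zip(lib_sizes, norm_factors)]


def normalized_cpm(count, lib_size, norm_factor):
    """Counts per million computed against the effective library size."""
    return count / (lib_size * norm_factor) * 1e6
```

For a hypothetical library of one million reads with a factor of 0.5, the effective size is 500,000 reads, so a gene with 50 reads is reported at about 100 CPM rather than 50 CPM.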

Fig. 5

Venn diagram representing exclusive and shared genes by matrix and tissue. (A) Result of the 7,140 genes identified in the analysis of all eight libraries combined, showing the number of genes exclusive to leaf tissue (92) and root tissue (670) and by matrix (UFES_250 and Santa Marta - SM). (B) Result of the 4,239 genes identified in the analysis of only the leaf libraries, showing the number of exclusive genes in the UFES_250 (368) and SM (174) matrices. (C) Result of the 6,052 genes identified in the analysis of only the root libraries, showing the number of exclusive genes in the UFES_250 (274) and SM (130) matrices.

In the evaluation of leaf libraries for both matrices (n = 4), the process revealed a low normalization factor for the SM leaves. From this analysis, 4,239 genes were obtained (Supplementary File 3, Table 1), of which 368 were exclusive to UFES_250 and 174 to SM (Fig. 5B and Supplementary File 3, Table 2). Regarding the root tissue libraries (n = 4), the SM samples also exhibited a low normalization factor, revealing 6,052 genes (Supplementary File 4, Table 1), with 274 genes exclusive to UFES_250 and 130 to SM (Fig. 5C and Supplementary File 4, Table 2).

Gene differential expression analysis

The heatmaps and volcano plots of the DEGs are shown in Fig. 6A (all tissues), Fig. 6B (leaf), and Fig. 6C (root). The red and blue colors represent DEGs that are up-regulated and down-regulated in UFES_250 compared to SM, respectively. In the analysis between the matrices UFES_250 and SM, considering all tissues together, 11 DEGs were identified (Supplementary File 2, Table 4), of which five were down-regulated and six were up-regulated in UFES_250 (Supplementary File 1, Table 6). In the analysis using only the leaf libraries, 678 DEGs were identified, with 285 up-regulated and 393 down-regulated in UFES_250 (Supplementary File 3, Table 3). For the roots, 444 DEGs were identified (Supplementary File 4, Table 3), with 119 up-regulated and 325 down-regulated in UFES_250. The lists of all DEGs and the 50 most highly expressed genes for leaf and root tissues can be found in Table 5 of Supplementary File 3 and Table 5 of Supplementary File 4, respectively.

Fig. 6

Differentially expressed genes (DEGs) between UFES_250 and Santa Marta (SM) matrices. On the left, volcano plots of the DEGs. The y-axis represents the statistical significance of the difference in gene expression between samples, measured by the p-value, while the x-axis represents the DEGs’ fold change (logFC). Blue and red points indicate down-regulated and up-regulated genes, respectively. Gray points represent genes with no significant differential expression. On the right, heatmaps of down-regulated and up-regulated genes with higher log2 FC values. The color indicates the expression level of DEGs on a log2 scale. (A) Result of the 11 DEGs obtained from the analysis of all tissues together, six up-regulated and five down-regulated in UFES_250. (B) Result of the 678 DEGs obtained from the analysis of only leaf tissue, 285 up-regulated and 393 down-regulated in UFES_250. (C) Result of the 444 DEGs obtained from the analysis of only root tissue, 119 up-regulated and 325 down-regulated in UFES_250. Legend: RP1- Root pool 1; RP2- Root pool 2; LP1- Leaf pool 1; LP3- Leaf pool 3.

Functional annotation of differentially expressed genes

All DEGs were functionally assigned to GO terms within the primary ontologies: ‘biological process,’ ‘cellular component,’ and ‘molecular function.’ Among the 11 DEGs identified in the analysis of all tissues (Fig. 7 and Supplementary File 2, Table 5), three GOs for ‘biological process’ functions and five for ‘molecular function’ stood out (Fig. 7A). Based on the GO results, the genes Hsp70 and LOC105056959 are related to responses to chemical stimuli and stress, regulated negatively and positively, respectively, in UFES_250.

Fig. 7

Analysis of the main Gene Ontology (GO) of differentially expressed genes (DEGs) from the UFES_250 and Santa Marta (SM) matrices of Euterpe edulis, categorized into ‘biological process’ (green), ‘cellular component’ (blue), and ‘molecular function’ (orange). A. All tissues combined. B. Leaf tissue. C. Root tissue. D. Venn diagram of DEGs showing the distribution of tissue-specific and tissue-independent GOs in the two tissue sets (leaf and root).

The functional annotation of the 678 DEGs in the leaf tissue revealed 25 GOs related to ‘biological processes,’ 12 to ‘cellular components,’ and 15 to ‘molecular functions’ (Supplementary File 3, Table 4). Genes involved in responses to stress, chemical stimuli, light stimuli, and anatomical structural development were annotated, with some positively regulated in UFES_250 (e.g., LOC105042857, LOC105040294, LOC105049137, ClpB1, Hsp70, and aquaporin PIP2) and others negatively regulated (e.g., rpl2, rpl22, LOC105039602, LOC105039245, LOC105039517, and WRKY24) (Fig. 7B).

For the 444 DEGs found in the root tissue analysis, 16 GOs were related to ‘biological processes,’ seven to ‘cellular components,’ and 13 to ‘molecular functions’ (Supplementary File 4, Table 4). Genes involved in responses to stress, chemical stimuli, and abiotic stimuli were identified, with some positively regulated in UFES_250 (e.g., CLB3, TPS, aquaporin PIP2, LOC105042909, and LOC105041389) and others negatively regulated (e.g., LOC105039584 and WRKY24) (Fig. 7C).

Overall, the genes were similarly categorized by GO across both tissues. Of the 31 GOs shared between the two tissues (Fig. 7D), in addition to those involved in biological, metabolic, catabolic processes, and DNA binding activities commonly found in plants, GOs related to responses to stress, chemical stimuli, and abiotic stimuli were annotated for both tissues (Fig. 7A). Most of the 21 tissue-specific GOs for leaf are involved in ‘biological processes,’ such as reproduction, response to light stimulus, post-embryonic development, photosynthesis, and anatomical structure development. The five root-specific GOs are related to cellular protein modification processes, cell communication, vacuole, plasma membrane, and lipid binding.

Functional annotation of genes with no differential expression and highly expressed genes

In the GO analysis of genes with no differential expression (Supplementary File 2, Table 6; Supplementary File 3, Table 6; Supplementary File 4, Table 6), the main ‘biological processes’ identified were cellular process, metabolic process, biosynthetic process, metabolic process of nitrogenous base-containing compounds, transport, cellular protein modification process, organization of cellular components, protein metabolic process, catabolic process, and stress response. The principal ‘cellular components’ included membrane, nucleus, cytoplasm, intracellular anatomical structure, cytosol, and chloroplast. For the ‘molecular function’ component, functions identified included binding, nucleotide binding, catalytic activity, protein binding, molecular function, hydrolase activity, and transferase activity.

Furthermore, genes with functions of interest to the present study were detected, including those related to abiotic stimulus-response, external stimulus-response, cellular homeostasis, photosynthesis, reproduction, light stimulus-response, biotic stimulus-response, cellular differentiation, post-embryonic development, growth, circadian rhythm, cell growth, gene expression regulation, epigenetics, cell-to-cell signaling, pollination, tropism, floral development, and embryonic development (Fig. 8).

Fig. 8

Gene Ontology (GO) analysis of genes that are not differentially expressed from the UFES_250 and Santa Marta (SM) matrices of Euterpe edulis, categorized into ‘biological process’ (green), ‘cellular component’ (blue), and ‘molecular function’ (orange), considering A. both tissues; B. leaf tissue; C. root tissue.

When analyzing the highly expressed genes in the two tissues together (Supplementary File 2, Table 5), we found that many of the genes are related to metabolic processes and stress response mechanisms in the UFES_250 matrix. For example, aquaporin PIP2-4 is involved in water transport, the large subunit of ribulose-1,5-bisphosphate carboxylase-oxygenase (RuBisCO) binding protein plays a role in photosynthesis, and heat shock proteins (HSPs) like Hsp 70 kDa protein 14 help protect against thermal stress. Genes related to the regulation of energy metabolism and antioxidant protection, such as glutathione peroxidase and 6-phosphofructokinase, are also highly expressed in this matrix.

In the SM matrix, the expressed genes show a greater predominance of transcription factors, like bHLH148 and MYB78, as well as genes related to hormonal signaling and defense processes, including cytochrome P450 and chitinase 10. The SM matrix also includes genes involved in carbohydrate degradation and synthesis, such as xyloglucan endotransglucosylase/hydrolase. Therefore, while UFES_250 exhibits greater modulation of genes related to cellular metabolism and stress responses, SM stands out for gene regulation and defense mechanisms.

The highly expressed genes in the leaf tissue of UFES_250 show a predominance of functions related to thermal stress, such as HSPs, which play crucial roles in the response to environmental stress and protein folding. Additionally, genes involved in flavonoid biosynthesis, like chalcone synthase and naringenin-dioxygenase, suggest a potential role in defense against pathogens and growth regulation. Genes related to energy metabolism and protein transport regulation, such as exosome complexes and ribosomes, are also present. In SM, the expressed genes display diverse functions, many related to cellular growth regulation and response to environmental signals. Serine/threonine kinases and cell wall-related proteins, like xyloglucan endotransglucosylase/hydrolase, indicate an important role in cell wall remodeling and development.

Genes such as ABC transporters and oxidases indicate involvement in metabolite transport and oxidative stress response. Comparing the gene modulation between the matrices, UFES_250 stands out for its genes related to thermal stress response and cellular protection, while SM is associated with signaling processes and structural development. Both matrices share genes that regulate growth and respond to environmental stresses.

The highly expressed genes in the root tissues of the UFES_250 matrix are associated with structural and cellular maintenance functions, such as tubulins and cytoskeleton-associated proteins. Genes involved in carbohydrate metabolism, including beta-glucosidase and phosphoglucomutase, were also identified. The presence of genes related to autophagy and transport proteins indicates a role in cellular recycling processes and nutrient transport.

On the other hand, in the SM matrix, there is a prevalence of genes associated with hormonal and environmental responses, including bHLH transcription factors and proteins responsive to ethylene and auxin, suggesting greater involvement in growth regulation and stress response. Additionally, genes related to the synthesis of oxidative enzymes (e.g., aldo-keto reductase) indicate detoxification activities and regulation of secondary metabolism. Therefore, while the UFES_250 root shows greater modulation for structural maintenance and transport genes, the SM root exhibits greater modulation for hormonal regulation and adaptive environmental responses.

Discussion

This study presents the first genomic insights for E. edulis, with the annotation of 1,133 DEGs during the early development of seedlings from two divergent matrices (UFES_250 and SM). The findings highlight: (1) differences in gene expression modulation between seedlings from different matrices in both leaf and root tissues; (2) greater conservation of genes in the root and differential modulation of leaf-specific genes; (3) DEGs associated with stress, early development, environmental stimuli, photosynthetic efficiency, and cellular integrity; (4) the predominance of biological and molecular processes specific to leaves and roots, with distinct GO terms for each tissue; (5) DEGs involved in flowering and responses to biotic and abiotic stresses with differential expression between matrices; and (6) key DEGs such as TPS, Hsp70, aquaporin PIP2, ClpB, SERK, and WRKY, which are responsive to stress and regulate development.

The seedlings exhibited phenotypic differences at six months of age, with those from the SM matrix showing higher average values. These outcomes indicate that, in addition to the genetic differentiation of SM individuals40, there is also distinct modulation of gene expression during the early development of seedlings from different matrices.

Although more than 50% of the reads from the foliar transcriptome were mapped to the Elaeis guineensis reference genome, more genes were detected in roots (6,052) than in leaves (4,239). The root genes also varied less in expression. However, only 12% of the root reads mapped to the reference, which suggests that a large part of the genes and of the expression variation in this tissue is species-specific.

The identification of root-specific genes in the species and the modulation of their expression may help explain its occurrence in the Atlantic Forest, with its wide distribution and variation in phytophysiognomy4. In addition, searching for conserved DEGs in plant transcriptomes is a strategy for identifying candidate genes as promising molecular markers linked to agronomic traits of interest59. The present study corroborates and expands information from studies on Arecaceae species regarding the conservation of genes related to early development60,61,62,63 and responses to abiotic stresses24,64,65,66.

The UFES_250 matrix accounted for about two-thirds of the matrix-exclusive genes in both leaf and root tissues (368 in leaf and 274 in root), whereas the SM matrix had fewer exclusive genes (174 in leaf and 130 in root). However, the SM matrix showed greater initial development.
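The "about two-thirds" proportion follows directly from the exclusive-gene counts reported above. A minimal Python sketch of that arithmetic is given here; the variable and function names are illustrative and are not part of the study's analysis pipeline.

```python
# Exclusive-gene counts as reported in the text, per tissue and matrix.
exclusive = {
    "leaf": {"UFES_250": 368, "SM": 174},
    "root": {"UFES_250": 274, "SM": 130},
}


def ufes_share(counts):
    """Fraction of matrix-exclusive genes that belong to UFES_250."""
    total = counts["UFES_250"] + counts["SM"]
    return counts["UFES_250"] / total


# Share of exclusive genes contributed by UFES_250 in each tissue.
shares = {tissue: ufes_share(counts) for tissue, counts in exclusive.items()}

for tissue, share in shares.items():
    print(f"{tissue}: {share:.1%} of exclusive genes come from UFES_250")
```

In both tissues the share is close to 68%, which is consistent with the two-thirds statement.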

Genes related to responses to different stresses, seedling viability, and somatic embryogenesis were detected in the leaf and root tissues of the seedlings, including trehalose-6-phosphate synthase (TPS), 70-kDa heat shock protein (Hsp70), aquaporin PIP2, casein lytic proteinase B (ClpB), somatic embryogenesis receptor kinase (SERK), and WRKY transcription factor (WRKY).

Considering all libraries, we detected 11 DEGs in both matrices in the analyses of individual tissues. Among these genes, those down-regulated in UFES_250 and up-regulated in SM were involved in transcription regulation (LOC105056959: probable mediator of RNA polymerase II transcription subunit 37c; LOC105044900: NAC domain-containing protein 68; and LOC105052255: WRKY transcription factor WRKY24-like)67,68,69; biosynthesis of sphingolipids, which aid in membrane formation and cell signaling (LOC105051063: serine C-palmitoyltransferase)70; negative regulation of gibberellin (GA) and abscisic acid (ABA) signaling in aleurone cells (LOC105052255: WRKY transcription factor WRKY24-like)69,71,72; and protection of plant cells against oxidative stress, maintenance of the redox balance in the mitochondrial electron transport chain to facilitate photosynthetic metabolism, and regulation of photorespiration (LOC105058492: mitochondrial uncoupling protein 5)73,74,75.

We also detected genes involved in chloroplast development and seedling viability, which were up-regulated in UFES_250 and down-regulated in SM. These include genes involved in mediating thylakoid membrane formation and chloroplast thermotolerance during heat stress (LOC105049549: chaperone protein ClpB3)76, catalyzing the interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate in the glycolytic and gluconeogenic pathways (LOC105056934: triosephosphate isomerase)77, and protein degradation during the cell cycle (U-box domain-containing protein 52, transcript variant X4)78.

These differences in gene regulation between seedlings from different matrices suggest that UFES_250 may be more efficient in photosynthesis, chloroplast development, and thermotolerance76. In contrast, SM may have an advantage in gene transcription67,68,69, membrane integrity, and hormonal signaling70,79,80. These variations indicate that UFES_250 could be better adapted to environments with thermal stress and conditions that require high metabolic efficiency76, whereas SM may have an advantage in environments where transcription regulation and cellular integrity are more critical69. Such differences can significantly influence the initial development of E. edulis plants, affecting aspects such as growth, environmental adaptation, and responses to abiotic stress. Notably, SM typically occurs naturally at higher altitudes.

Although more genes were detected in roots than in leaves, the number of DEGs was greater in leaves (678) than in roots (444), indicating a higher modulation of expression in the conserved leaf genes. However, studies with other species indicate that the number and type of DEGs can vary between these organs, particularly in response to different environmental stresses. For example, there was a predominance of differential gene expression in roots in Brassica campestris under cold stress81, Medicago sativa under saline stress82, Baphicacanthus cusia treated with methyl jasmonate83, Lolium and Festuca spp. under water stress84, and Prunus persica under different soil conditions85. In contrast, Linum usitatissimum under water stress showed a predominance of differential gene expression in the leaves, related to lignin and proline biosynthesis86, and there was also a predominance of differential gene expression in the leaves of Phoenix dactylifera (Arecaceae) under saline stress64.
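The contrast between tissues is easier to see when the DEG counts are expressed relative to the number of genes detected in each tissue. The short Python sketch below does this using only the counts reported in this section (6,052 root genes, 4,239 leaf genes, 444 root DEGs, 678 leaf DEGs); the names are illustrative, and the ratio is a simple descriptive summary rather than a statistic used in the study.

```python
# Counts reported in the text: genes detected and DEGs per tissue.
detected_genes = {"leaf": 4239, "root": 6052}
degs = {"leaf": 678, "root": 444}

# Proportion of detected genes that were differentially expressed.
deg_fraction = {tissue: degs[tissue] / detected_genes[tissue] for tissue in degs}

for tissue, fraction in deg_fraction.items():
    print(f"{tissue}: {fraction:.1%} of detected genes are DEGs")

# How many times higher the leaf proportion is than the root proportion.
leaf_to_root = deg_fraction["leaf"] / deg_fraction["root"]
print(f"leaf/root ratio: {leaf_to_root:.2f}")
```

Under these counts, roughly 16% of the genes detected in leaves were differentially expressed, against about 7% in roots, which is in line with the higher modulation of expression described for leaf genes.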

GO characterization showed a predominance of ‘biological processes,’ followed by ‘molecular functions’ in the tissues. The Venn diagram analysis for the GOs found in the leaf and root tissues revealed both common and tissue-specific genes. Highly enriched GO terms across all tissues included cell, cell wall, membrane, membrane part, catalytic activity, binding, metabolic, and cellular processes.

Exclusive GO terms in leaves were related to functions such as reproduction, response to light stimuli, post-embryonic development, photosynthesis, and anatomical structure development. The genes LOC105040294, which expresses the GIGANTEA protein (up-regulated in UFES_250), and the genes LOC105039602 and LOC105039245, which express zinc finger domain proteins (down-regulated in UFES_250), are involved in the functions of reproduction, response to light stimuli, post-embryonic development, and anatomical structure. The GIGANTEA protein is important in regulating the flowering time of plants. Photoperiod-controlled flowering is a vital developmental process directly related to the plant’s reproductive success87. Mutations in the GIGANTEA gene delay flowering on long days, but the effects are minimal on short days88. This suggests that GIGANTEA plays a crucial role in regulating the expression of flowering time genes, promoting photoperiod-induced flowering, and participating in the plant’s circadian feedback cycle88,89.

Zinc finger domain proteins also play important roles in plant development and reproduction. These proteins participate in responses to light stimuli and in post-embryonic development, influencing the plant’s anatomical structure. CONSTANS, a zinc finger protein, is a transcription factor that acts in the long-day flowering pathway and may mediate the interaction between the circadian clock and flowering control90. Shortening the time to flowering is important for the species studied, and GIGANTEA plays a critical role in this adjustment. Because the plant requires shade for initial development, an appropriate light response, in which zinc finger proteins take part, is crucial for healthy growth. Both protein types are involved in post-embryonic development and anatomical structure formation, which are essential for plant viability and environmental adaptation. Therefore, these genes merit further study because of their roles in fundamental biological processes that support plant development and reproductive success.

Similar genes were also identified as critical for regulating flowering and anatomical development in the oil palm Elaeis guineensis91. Specifically, the up-regulation of the GIGANTEA gene in UFES_250 may indicate a photoperiod-specific adaptive response, promoting flowering under certain light conditions92,93,94,95. Meanwhile, the down-regulation of zinc finger protein genes could be associated with mechanisms adjusting the plant’s circadian cycle96. Growth conditions specific to UFES_250, such as light intensity and duration, as well as soil and climatic factors, may influence this differential regulation compared to SM plants91. Therefore, the regulation of these genes in UFES_250 may represent an adaptive response that provides reproductive or developmental advantages suited to the local environment.

The exclusive GOs identified in roots are related to cellular protein modulation, cell communication, vacuole function, plasma membrane, and lipid binding processes. GOs involved in stress response, abiotic stimuli, and chemical stimuli were the most prominent, appearing frequently across the analyses and present in leaf and root tissues. The most significant DEGs involved in these processes included TPS, Hsp70, aquaporin PIP2, ClpB, SERK, and WRKY.
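The partition of GO terms into shared and tissue-exclusive groups, as in a Venn diagram, amounts to simple set operations. The Python sketch below illustrates this with toy sets built only from terms named in this section; the term labels are paraphrased from the text, the sets are far from the full annotation, and the code is an illustration rather than the procedure used in the study.

```python
# Toy GO-term sets assembled from terms mentioned in the text (assumption:
# labels are paraphrased and incomplete; this is not the study's annotation).
shared_in_text = {
    "response to stress",
    "response to abiotic stimulus",
    "response to chemical stimulus",
}
leaf_terms = shared_in_text | {
    "reproduction",
    "response to light stimulus",
    "post-embryonic development",
    "photosynthesis",
    "anatomical structure development",
}
root_terms = shared_in_text | {
    "cell communication",
    "vacuole",
    "plasma membrane",
    "lipid binding",
}

# The three regions of a two-set Venn diagram.
shared = leaf_terms & root_terms      # terms present in both tissues
leaf_only = leaf_terms - root_terms   # terms exclusive to leaves
root_only = root_terms - leaf_terms   # terms exclusive to roots

print("shared:", sorted(shared))
print("leaf only:", sorted(leaf_only))
print("root only:", sorted(root_only))
```

The same operations apply unchanged to full lists of GO identifiers exported from an annotation tool.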

TPS (up-regulated in UFES_250) is responsible for the biosynthesis of trehalose, a sugar critical in protecting cells against abiotic stresses such as dehydration and heat97. Hsp70 (up-regulated in UFES_250) belongs to a family of heat shock proteins (HSPs); some members are expressed only under stress conditions (strictly inducible), while others are present in cells under normal growth conditions and are not heat-inducible (constitutive or cognate)98. Aquaporin PIP2 (up-regulated in UFES_250) facilitates water transport across the cell membrane, playing a vital role in the response to water stress99. ClpB (up-regulated in UFES_250) is a member of the molecular chaperone family essential for chloroplast development and seedling viability, mediating the formation of internal thylakoid membranes and providing thermotolerance to chloroplasts during thermal stress76. SERK (up-regulated in UFES_250) is involved in stress response signaling and somatic embryo development100. WRKY (down-regulated in UFES_250) is a family of transcription factors regulating responses to biotic and abiotic stresses80. Studies with other palm species have shown similar genes playing critical roles in regulating stress responses and plant development101,102,103.

The increased expression of TPS, Hsp70, aquaporin PIP2, ClpB, and SERK in UFES_250 may indicate an adaptation to more stressful environments, promoting survival under such conditions. Conversely, the down-regulation of WRKY might suggest a specific adaptive response to different types of environmental stress. These traits could reflect unique adaptations in the SM or UFES_250 matrices, leading to distinct gene expression patterns and responses to environmental stresses.

The gene expression profile of leaves and roots presented in this study helps in understanding the interaction of E. edulis with its environment during the early developmental phase in the field. The findings are novel and provide valuable information about conserved, differentially expressed, and highly expressed genes involved in early development, many of which are stress-response genes. Data normalization emphasized the high expression of essential genes during this critical developmental phase, offering a solid foundation for future strategies to manage the species under various environmental conditions. The conserved genes identified in this study may also be applicable to other Arecaceae species. The reference-based analysis against the Elaeis guineensis genome was informative, revealing variations among contrasting E. edulis matrices, even within conserved genes. These results support the hypothesis that conserved molecular mechanisms related to environmental responses are activated during the early developmental stage of E. edulis seedlings, consistent with the species’ adaptation to variable environmental conditions.

Future studies, such as de novo assembly, will enable the identification of species-specific genes, expanding our understanding of the E. edulis transcriptome and potentially uncovering new mechanisms related to development and adaptation. Ultimately, the data obtained corroborate existing literature and pave the way for practical applications in sustainable management and genetic improvement of the species.

Conclusion

This study identified agronomically relevant genes and DEGs between morphologically divergent matrices, and proposed candidates for molecular markers in E. edulis. The transcriptome analysis of leaves and roots from genetically distinct backgrounds (UFES_250 and SM) identified 32,000 genes, annotating 1,133 DEGs and other expressed genes, creating an important genomic database for the species. The results revealed gene expression profiles across different tissues and matrices, with greater expression divergence in the leaves. A higher number of exclusive genes was identified in the UFES_250 genetic background, although overall expression was more pronounced in SM. DEGs such as GIGANTEA, zinc finger protein, CONSTANS, TPS, Hsp70, aquaporin PIP2, ClpB, SERK, and WRKY were identified, which may influence the development of the matrices and the morphological differences between them. The study also highlighted DEGs as potential molecular markers with applications in genetic improvement, while identifying genes with relevant molecular functions for further research. These findings contribute to understanding E. edulis adaptations and support strategies for improving the species’ productivity and resistance to adverse environmental conditions.