Abstract
The mechanisms of species formation and diversification within the Orchidaceae have long attracted research interest, yet molecular explanations for these phenomena are lacking. Here, we present a comparative genomic analysis comprising 24 chromosome-level genome assemblies and genome skimming data for 204 species from Dendrobium. We find that genetic conservation in Dendrobium is associated with its environmental adaptability, and that the formation of species diversity in the genus was driven by the compression of the Indian plate and by changes in sea level and precipitation. CYP734A50 might play an important role in self-incompatibility (SI) in Dendrobium species, and the type I SI system was lost through deletion of the entire single polymorphic locus (S-locus). Our results reveal the genetic background of genomic variation, biosynthesis, and adaptive evolution in Dendrobium, providing fundamental insights into the relationships between the genome and trait evolution of orchids and into angiosperm diversification.
Introduction
Species diversity is extremely important, and adaptive evolution to the environment often plays a decisive role in the emergence of new traits. For angiosperms, how species adapt to changing environments and even begin to occupy different ecological niches is a key focus of research on the mechanisms of species origin and diversification1. Orchidaceae account for 10% of the species diversity of angiosperms, and Dendrobium is one of the largest genera in Orchidaceae, comprising approximately 1500 species distributed in ranges from temperate to tropical regions2,3. Dendrobium species are rich in polysaccharides and alkaloids and are renowned for their dual roles in traditional culture as both a consumable food and a valuable medicinal resource4. Through phylogenetic and comparative genomic studies of Dendrobium, we can uncover the mechanisms underlying speciation and adaptation in this genus, and also gain deeper insights into the evolution of orchids.
In our work, we provide whole-genome data at the chromosome level for 24 representative species across 14 sections in Dendrobium. Eight of these species were sequenced de novo in this study, while 16 are from publications5,6,7,8,9,10. To further verify the mechanisms of species differentiation and diversity in the genus Dendrobium, we select 204 representative species for whole-genome sequencing using genome skimming. We analyze these genomes and reveal the origin, diversity, and adaptive evolution of the genus Dendrobium, including the emergence of key traits. These comparative genomic analyses deepen our understanding of the origin and diversification of species within the genus and across Orchidaceae.
Results
De novo genome assembly and annotation of Dendrobium genomes
The eight genomes were sequenced individually using single-molecule real-time (SMRT) sequencing with an average coverage depth of 141.2, chromosome conformation capture (Hi-C) sequencing with an average coverage depth of 103.2, Illumina sequencing (HiSeq) with an average coverage depth of 102.1, and Oxford Nanopore Technologies (ONT) ultra-long reads with an average coverage depth of 14.9 (Supplementary Data 1). We used k-mer analysis to estimate the genome size (Supplementary Fig. 1) and subsequently performed de novo genome assembly for each species. The contig N50 sizes of the eight whole-genome assemblies ranged from 1.01 Mb to 8.92 Mb, with a mean of 3.83 Mb, and the scaffold N50 sizes ranged from 48.74 Mb to 97.5 Mb, with a mean of 64.8 Mb. The final assembled genome sizes ranged from 943.8 Mb to 1509.4 Mb, with a mean of 1177.1 Mb (Supplementary Fig. 2). For each species, contigs were anchored to chromosomes (Supplementary Fig. 1 and Supplementary Data 1 and 2).
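For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed from a list of contig or scaffold lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the study or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in Mb (illustrative values, not study data).
toy_contigs = [8, 6, 4, 3, 2, 1]
print("toy contig N50 (Mb):", n50(toy_contigs))
```

The same function applies to scaffold N50 when scaffold lengths are supplied instead of contig lengths.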
Self-incompatibility (SI) is maintained by balancing selection, and SI-related genes are located mainly at high-heterozygosity sites. Heterozygosity ranged from 0.32% to 2.28% (Supplementary Data 1)5,6,7,9. Self-compatible (SC) species exhibited low heterozygosity; for example, the heterozygosity of the SC species D. exile and D. cariniferum was 0.32% and 0.885%, respectively. In contrast, SI species, such as D. thyrsiflorum and D. spatella, had relatively high heterozygosity, with values of 2.14% and 2.18%, respectively. At intermediate heterozygosity (1–2%), SC, SI and incompletely SI species all occur, such as the SC species D. nobile (1.35% heterozygosity) and the SI species D. porphyrochilum (1.34% heterozygosity).
On average, repetitive DNA comprised 52.2% of each genome, ranging from 55.95% to 68.29% (see Supplementary Data 2). Among the repetitive sequences, long terminal repeat (LTR) retrotransposons were the most abundant (Supplementary Data 3). To annotate the protein-coding and small RNA genes for each of the eight species, we collected tissue samples and performed RNA sequencing (RNA-seq) (a mean of 6 Gb per sample). Using these data, we predicted an average of 29,400 protein-coding genes per genome (see Supplementary Data 2). Benchmarking universal single-copy ortholog (BUSCO) evaluation revealed that 95% of the 1440 single-copy embryophyte genes were completely assembled in these genomes (Supplementary Data 4), indicating a high completeness of the gene annotation. We conducted functional annotations for each Dendrobium species on the basis of multiple public databases, and more than 90% of genes received functional annotations (Supplementary Data 5). On average, the total annotated length of miRNAs was approximately 10,529 bp per species, accounting for an average of 0.000905% of the genome.
Gene patterns of the Dendrobium genomes
We performed comparative genome analyses of the 24 chromosome-level assemblies (Fig. 1a). Ortholog investigations classified all genes from the 24 Dendrobium genomes into 32,440 families. The total number of gene sets increased as additional genomes were added and approached a plateau, indicating the representativeness of the 24 Dendrobium species. Among the total gene sets, 5209 families were identified in all 24 species and were defined as core genes; 10,121 families present in 19–23 species ( > 80% of the collection) were defined as soft core genes; 12,265 families present in 2–18 species were defined as dispensable genes; and 4844 families present in only one species were defined as private genes (Fig. 1b–d).
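The four pan-gene categories defined above depend only on the number of genomes in which a gene family occurs. A minimal sketch of that classification rule follows; the thresholds come from the text, whereas the function name and the toy family table are hypothetical and do not reflect the authors' actual pipeline.

```python
from collections import Counter

N_SPECIES = 24      # genomes in the comparison (from the text)
SOFT_CORE_MIN = 19  # soft core: present in 19-23 species (from the text)


def classify_family(n_present, n_species=N_SPECIES, soft_min=SOFT_CORE_MIN):
    """Assign a gene family to a pan-gene category by its species count."""
    if not 1 <= n_present <= n_species:
        raise ValueError("n_present must be between 1 and n_species")
    if n_present == n_species:
        return "core"
    if n_present >= soft_min:
        return "soft core"
    if n_present >= 2:
        return "dispensable"
    return "private"


# Hypothetical table: family ID -> number of species carrying the family.
toy_families = {"fam1": 24, "fam2": 21, "fam3": 7, "fam4": 1}
tally = Counter(classify_family(n) for n in toy_families.values())
print(dict(tally))
```

In practice the species count per family would be derived from an orthogroup presence/absence matrix; only the thresholding step is shown here.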
a Overview of the 24 Dendrobium species utilized in the comparative genome analysis, accompanied by the syntenic analysis among different Dendrobium genomes, organized according to the phylogenetic relationships. b Variations in the number of gene clusters in the genomes. c Proportion of pangenes in the core, softcore, dispensable and private categories. d Flower plot showing the number of core gene clusters. e–g The KEGG pathway enrichment results for core genes, dispensable genes and private genes. The significance was estimated via P-value and corrected by FDR, with significant pathways selected at a threshold of corrected-p-value ≤ 0.05. h A violin plot showing the distribution of Ka/Ks ratios for coding genes across different chromosomes in the representative species D. thyrsiflorum, which has 20 chromosomes. The box spans the interquartile range from the 25th to 75th percentile, with the central line representing the median. i The proportion of different types of structural variations in each Dendrobium genome. Abbreviation information can be found in Supplementary Data 6. j The proportion of deletions, insertions, and SNPs in the Dendrobium genomes. k KEGG functional enrichment of genes affected by structural variations in the Dendrobium genus genome. l Comparison of expression levels among different gene types in eight Dendrobium species. The box spans the interquartile range from the 25th to 75th percentile, with the central line representing the median. Source data are provided as a Source Data file.
Gene Ontology (GO) analyses revealed that the core genes were enriched in biological processes related to signal transduction via protein phosphorylation, chromosome segregation, mitochondrial genome maintenance, and response to acid chemistry (Supplementary Fig. 3). In contrast, dispensable genes were enriched in the respiratory electron transport chain, cellular respiration (Supplementary Fig. 4), oxidative phosphorylation and purine-containing metabolic processes. The GO terms enriched among the private genes included DNA metabolism and repair, response to stress and DNA damage, and metabolism of nitrogenous compounds and nucleic acids (Supplementary Fig. 5).
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses revealed that the core genes were enriched in pathways related to fundamental cellular functions, including energy metabolism through the citric acid (TCA) cycle, carbon metabolism, nucleotide synthesis via purine metabolism, and the maintenance of genetic stability through homologous recombination (Fig. 1e–g and Supplementary Fig. 3). Dispensable genes were enriched in pathways related to fundamental metabolic processes and biosynthetic pathways, including DNA replication, secondary metabolite synthesis, energy metabolism, plant‒pathogen interactions, and the metabolism of specific compounds, including the biosynthesis of certain antibiotics and alkaloids (Fig. 1e–g and Supplementary Fig. 4). The KEGG terms enriched among the private genes were related to amino acid metabolism, secondary metabolism, signal transduction, stress response and repair, photosynthesis, and energy metabolism (Fig. 1e–g and Supplementary Fig. 5). We identified beta-glucosidase among the private genes of D. exile. The D. porphyrochilum private genes include a tyrosine aminotransferase, and those of D. cariniferum are enriched with terpene-related entries. Homologous recombination was also enriched among the D. porphyrochilum private genes, which include the DNA repair protein RAD51-like. In D. lindleyi, private genes were significantly enriched in the MAPK signaling pathway; among the D. lindleyi genes assigned to this pathway, we identified 71 encoding potassium channel KAT3-like proteins. D. exile is also a crassulacean acid metabolism (CAM) plant, but its private genes were not enriched with terms related to the MAPK signaling pathway; rather, they are enriched in photosynthesis genes, including two psaJ genes associated with photosystem I and three atpA genes.
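The enrichment results above rest on a per-pathway P-value followed by FDR correction at a corrected threshold of 0.05. The sketch below illustrates one common way to do this: a one-sided hypergeometric test followed by the Benjamini-Hochberg procedure. Both choices are assumptions made here for illustration, since the text does not name the specific test or correction method, and all function names and numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are in the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical pathways: (genes in pathway K, overlap with the gene set k).
background, set_size = 2000, 100  # illustrative sizes only
pathways = {"pathway_A": (40, 10), "pathway_B": (60, 4), "pathway_C": (25, 1)}
raw = [hypergeom_sf(k, background, K, set_size) for K, k in pathways.values()]
for name, q in zip(pathways, benjamini_hochberg(raw)):
    print(name, "significant" if q <= 0.05 else "not significant")
```

The 0.05 cutoff mirrors the corrected P-value threshold stated in the figure legend; everything else is a stand-in for whichever enrichment tool was actually used.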
Phylogenomic and chromosomal collinearity
We constructed a phylogenetic tree utilizing 523 single-copy orthologous genes from the reference genomes of 24 species. The results provided strong support for four major clades: clade A, which contained sect. Crumenate, Aporum, Calyptrochilus, Pedilonum, Densiflora, Formosae and Distichophyllae; clade B, which contained sect. Dendrobium and Densiflora (D. chrysotoxum); clade C, which contained sect. Stachyobium and Parcum (Fig. 1a).
Dendrobium typically has a chromosome count of 2n = 2x = 38; however, three species, D. thyrsiflorum, D. secundum, and D. porphyrochilum, presented a chromosome count of 2n = 2x = 40. The collinearity results revealed that Chr. 20 in D. thyrsiflorum and D. secundum originated from the breakage of another chromosome (Fig. 1a). Clades A and B share a common ancestor, and it can be deduced that Chr. 20 is derived from a break that occurred near the centromere of Chr. 16. Comparative collinearity analysis at the genus level revealed that segments homologous to Chr. 20 were located on different chromosomes across various Dendrobium genomes, likely because of multiple rearrangements occurring over evolutionary time, reflecting their distant phylogenetic relationships. Supporting this, the three members of sect. Dendrobium in clade B, along with D. chrysotoxum, presented corresponding chromosome collinearity, and the homologous segments of Chr. 20 were all located on Chr. 9. Therefore, a chromosome count of 2n = 38 should represent the ancestral state of the Dendrobium genus, and the n = 20 condition evolved independently at least twice. Moreover, the Chr. 20 segment is a collinear region in all 24 Dendrobium genomes analyzed, which excludes the possibility of errors due to Hi-C scaffolding. Interestingly, although D. porphyrochilum is distantly related to D. thyrsiflorum and D. secundum, its Chr. 20 exhibited strong collinearity signals. Upon re-examination of the overall genome collinearity, we observed that this segment (Chr. 20) has a high translocation rate across different genomes, suggesting a potentially unique function. We conducted functional enrichment analysis of the Chr. 20 genes and identified genes associated with carotenoids and anthocyanins. These genes included zeaxanthin cleavage dioxygenase, secoisolariciresinol dehydrogenase, carotene epsilon-monooxygenase, and carotenoid cleavage dioxygenase.
Therefore, we assessed the selective pressure on genes across all chromosomes. The results revealed that Chr. 20 presented a significantly low Ka/Ks ratio, only slightly greater than that of Chr. 2 of D. thyrsiflorum (Fig. 1h). Future studies on comparative genomics and chromosome evolution may help determine the origins and evolutionary history of this segment.
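As an illustration of the per-chromosome comparison described above, the following sketch summarizes Ka/Ks ratios by chromosome using the median, the statistic drawn as the central line in the violin plot of Fig. 1h. The record layout, function name and numerical values are hypothetical; the study's own Ka and Ks estimates would come from a dedicated substitution-rate tool that is not shown here.

```python
from collections import defaultdict
from statistics import median


def median_ka_ks_by_chromosome(records):
    """Summarize Ka/Ks per chromosome.

    records: iterable of (chromosome, ka, ks) tuples, one per coding gene.
    Genes with ks <= 0 are skipped because the ratio is undefined.
    """
    ratios = defaultdict(list)
    for chrom, ka, ks in records:
        if ks <= 0:
            continue
        ratios[chrom].append(ka / ks)
    return {chrom: median(values) for chrom, values in ratios.items()}


# Hypothetical per-gene estimates (illustrative values, not study data).
toy_records = [
    ("Chr02", 0.02, 0.10), ("Chr02", 0.03, 0.10), ("Chr02", 0.01, 0.10),
    ("Chr20", 0.02, 0.08), ("Chr20", 0.05, 0.10), ("Chr20", 0.01, 0.0),
]
for chrom, value in sorted(median_ka_ks_by_chromosome(toy_records).items()):
    print(chrom, round(value, 3))
```

A Ka/Ks ratio below 1 is conventionally read as evidence of purifying selection, which is why a low chromosome-wide median is of interest for the Chr. 20 segment.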
Previous analyses of other published Dendrobium genomes revealed a recent whole-genome duplication (WGD) in Orchidaceae, occurring approximately 75 million years ago, resulting in the duplication of approximately 50% of the genes11. Our Ks distribution for Dendrobium indicated that there were no unique WGD events within this genus (Supplementary Fig. 6). The presence of shared peaks in the Ks distribution plot suggests that the observed duplications are not exclusive to Dendrobium but are shared with other related plant lineages, such as Orchidaceae and monocotyledonous plants.
Sequence variation and its contribution to environmental adaptation in Dendrobium genomes
To determine sequence variations across species, the 24 genomes were compared. We anchored the 23 other genome sequences to the D. thyrsiflorum genome because of its large size and high quality (see Supplementary Data 2). In total, 13,986 syntenic blocks, 3117 inversions, 3524 translocations, 29,538,127 SNPs and 13,978,634 small insertions and deletions were identified (Fig. 1i, j). The large number of structural variants (SVs) from dozens of independently de novo-assembled genomes enabled us to clarify evolutionary processes that could not be detected from one or a few genomes (Supplementary Data 6). Within SV hotspot regions, we observed significant overlap between SVs and functional genes across species (Fig. 1k). Gene enrichment analysis within SV regions revealed that these genes lie in areas of the genome that tend to accumulate structural diversity, suggesting that these regions play pivotal roles in the adaptation of Dendrobium species to various environmental stressors. Specifically, the enrichment of genes involved in phenylpropanoid biosynthesis, plant–pathogen interactions, and stilbenoid, diarylheptanoid, and gingerol biosynthesis indicates their importance in the response of Dendrobium species to environmental challenges. The secondary metabolites synthesized by these genes are crucial for both plant defense and plant–environment interactions. We also examined the expression levels of different types of genes in these genomes and found that core genes presented relatively high average expression levels, whereas dispensable and private genes presented relatively low expression levels (Fig. 1l).
FAR1/FRS genes and epiphytism
We utilized the domain information of the FAR1 gene family to identify 1896 members belonging to the FAR1/FRS gene family from Arabidopsis thaliana (TAIR10), Apostasia shenzhenica11, Phalaenopsis equestris11, and 24 Dendrobium species, with Dendrobium accounting for 1793 of these genes (Supplementary Fig. 7). The FAR1/FRS gene family is divided into seven major clades, two of which (Orchid-clade-1 and 2) are specific to orchids (Fig. 2a). Within Orchid-clade-1, in contrast to A. shenzhenica, which has 2 members, we observed a noticeable expansion in the epiphytic P. equestris and Dendrobium species. Specifically, P. equestris has 40 homologous genes, whereas Dendrobium species have an average of 51 (Supplementary Fig. 7 and Supplementary Data 7). On further examination of a subclade unique to the Epidendroideae within the Dendrobium genus, the species with the most members was D. catenatum, with 155 copies (Supplementary Data 7). Interestingly, D. huoshanense (2), D. chrysotoxum (3), D. tetragonum (2) and D. discolor (2) have far fewer members.
a The phylogenetic tree topology of FAR1/FRS genes in the Dendrobium genus. The base graphic of the tree was adapted from Pixabay.com. b The phylogenetic tree topology of MADS-box genes in the Dendrobium genus, with a detailed representation of the ANR1, SVP, B-AP3, and OsMADS32 branches. c KEGG functional enrichment of genes significantly differentially expressed in the lip and sepal during the peak blooming period. d KEGG functional enrichment of genes significantly differentially expressed in the lip and sepal during the flower senescence period. e Fatty acid synthesis pathway, with red indicating genes differentially expressed during the peak blooming period, blue for genes differentially expressed during the flower senescence period, and purple for genes differentially expressed in both the peak blooming and flower senescence periods. f Type-II FAS pathway, with red indicating genes differentially expressed during the peak blooming period, blue for genes differentially expressed during the flower senescence period, and purple for genes differentially expressed in both periods. g Fatty acid degradation pathway, with blue indicating genes differentially expressed during the flower senescence period. h Expression levels of genes in the fatty acid synthesis, fatty acid metabolism, and fatty acid degradation pathways in different parts of the flower. Source data are provided as a Source Data file.
Epiphytic orchids grow on shaded tree trunks in nature. The canopy’s leaves absorb blue/red light, altering the red to far-red light ratio, which affects orchids. Phytochrome A (PHYA) is key for plant growth in this FR light-dominated environment, coordinating the FR high irradiance response essential for thriving in shade12. The regulatory network extends further with FAR1 and FHY3 (far-red elongated hypocotyl 3), transcription factors derived from transposases, which modulate PHYA-mediated FR light perception13. In A. thaliana, FRS genes are recognized players in a light signaling pathway that governs flowering time and overall plant development14. Orchidaceae species have a distinct FRS gene branching pattern; terrestrial orchids possess only two copies, while members of the Dendrobium and Phalaenopsis genera exhibit significant expansion. We propose that this expansion is associated with their epiphytic nature.
Gene families related to flower evolution and the biosynthesis pathways of fatty acids (FAs)
To verify whether the previously reported11 relationship between orchid floral morphology and the MADS-box gene family is conserved in Dendrobium species, and to explore the potential molecular mechanisms underlying epiphytism and spur formation in Dendrobium, we analyzed the MADS-box gene family in members of the genus Dendrobium (Fig. 2b and Supplementary Fig. 8). Dendrobium possesses an OsMADS32 branch (Fig. 2b). B-AP3 has expanded, as in other members of Orchidaceae (Fig. 2b). According to the generated topological structure, one branch arose during the divergence of Orchidaceae ancestors and lacked representatives from A. shenzhenica. We found that Dendrobium species in Asian lineages have lost the AGL12 gene (Fig. 2b) and that this trait is conserved. A previous study indicated that D. discolor and D. tetragonum possess an AGL12-like gene10. We constructed a phylogenetic tree based on Dendrobium species as well as Oryza sativa, Arabidopsis thaliana and Apostasia shenzhenica (Supplementary Fig. 8), identifying one homologous gene in each that clusters with AGL12 in D. discolor (Ddi_Chr06.g07651) and D. tetragonum (Dte_Chr13.g20050). The loss of the AGL12 gene is correlated with the shared epiphytic trait of orchids11,15. On the basis of our phylogenetic relationships among branches of the Dendrobium genus (Fig. 5d), we suggest that the ancestor of the Asian and Australian lineages had lost the AGL12-like genes and that D. discolor and D. tetragonum in the Australian lineage regained an AGL12-like gene with neofunctionalization10. In contrast to P. equestris and A. shenzhenica, the Dendrobium genus has a unique SVP. Interestingly, the ANR1 branch was divided into two branches in Dendrobium (Fig. 2b), each of which was lost in different Dendrobium species.
Flower morphology is also closely related to pollination. The MADS-box gene family is crucial for floral development and specific traits in orchids, including epiphytism and pollinia evolution11,16,17. Dendrobium, an epiphytic orchid genus, is distinguished by its labellum spur. Our analysis of the MADS-box gene family in Dendrobium aimed to explore the conservation of the orchid floral morphology relationship with the MADS-box gene family and the molecular mechanisms underlying Dendrobium epiphytism and spur formation. Dendrobium has an OsMADS32 branch, which is linked to rice floral development, suggesting its role in the formation of the Dendrobium labellum. The expansion of the B-AP3 gene family in Dendrobium aligns with that in other Orchidaceae members, supporting the hypothesis that pseudosymmetry traits emerged due to gene loss after whole-genome duplication. The loss of AGL12 in Dendrobium, as observed in other epiphytic orchids, is conserved and may be important for their adaptation. Dendrobium is also unique in possessing an SVP gene, which has been highlighted in studies on the initiation of flowering, indicating its potential importance in the horticultural use of Dendrobium. Furthermore, the ANR1 branches in Dendrobium are diverse, with different species exhibiting gene loss, warranting further research into the relationship of this gene family with species adaptability.
To gain insight into the pathways responsible for fatty acid (FA) biosynthesis in Dendrobium, we conducted a comparative analysis using data from the Acyl Lipids database (https://www.arabidopsis.org/browse/gene_family/acyl_lipid) to verify the contribution of fatty acids to the pollination of orchid plants. In addition, we conducted differential expression analysis between the labellum (lip) and the remaining floral parts on the basis of transcriptome data from different flowering stages (Fig. 2c, d). At the peak of flowering (the 4th day after flowering), genes related to fatty acid synthesis and metabolism were highly expressed in the labellum. After the peak of flowering (on the 5th day after flowering), as the labellum begins to age, genes related to both FA synthesis and metabolism, as well as genes involved in FA degradation, are significantly upregulated (Fig. 2e–h). At the peak of flowering, the highly expressed genes within the fatty acid synthesis pathway were FabD, FabF, FabZ, and those of the Type II FAS pathway (Fig. 2e, h and Supplementary Fig. 9). The highly expressed fatty acid metabolic pathway genes were involved in metabolism downstream of acetyl-CoA and are expressed primarily in the cytoplasm, plastids, and mitochondria (Supplementary Fig. 10). FabD, FabF and FabZ are highly expressed during anthesis and are essential for fatty-acid-mediated fragrance production that attracts pollinators. We therefore identified their orthologues across Dendrobium and reconstructed the phylogenetic trees (Supplementary Fig. 11), and found that the copy number and phylogenetic placement of these three genes are largely conserved throughout the genus; future work on pollinator attraction in orchids can focus on these loci. After the peak of flowering, the genes in the FA synthesis pathway with high expression were FabG and FabI, as well as those in the Type II FAS pathway (Fig. 2e, f).
In the FA metabolic pathway, the highly expressed genes were associated with metabolism downstream of acetyl-CoA; however, in comparison with the peak flowering period, metabolism was terminated at the palmitic acid stage (Fig. 2h). In addition, a significant number of genes related to FA degradation were highly expressed in the labellum after peak flowering (Fig. 2g and Supplementary Fig. 12). These genes are associated with the degradation process leading to the formation of hexadecanoyl-CoA, 1-alcohol, and omega-hydroxy fatty acids. We examined the status of genes related to FA metabolism among different members of the genus Dendrobium. The number of genes in the fatty acid metabolism pathway ranged from 1054 to 1265 across Dendrobium species, with an average of 1147, which was 1.46-fold greater than that in A. thaliana. Among these genes, 657 were core genes, and nine were private genes, indicating their significant role in normal survival and development in the Dendrobium genus.
Evolution of SI in Dendrobium
SI is widely adopted by hermaphrodite plants to promote outcrossing. It exhibits highly dynamic origination, loss and regain18,19 in angiosperms, and the origin and evolution of SI may be an important driving force for species differentiation and population expansion18,19. Dendrobium is composed of up to 72% SI species20, and its SI phenotypes are diverse20,21. The interspecific cross compatibility rate is also high20, suggesting that the molecular mechanisms of SI in this genus may be complex and diverse.
In this study, we investigated six known molecular mechanisms of SI (Fig. 3a). These include type I SI, controlled by the style S factor S-RNase and the pollen S factor SLF, which is commonly found in Plantaginaceae, Solanaceae, Rosaceae and Rutaceae; type II SI, controlled by SRK and SCR, in Brassicaceae; type III SI, controlled by PrsS and PrpS, in Papaveraceae; type IV SI, controlled by CYP-GLO2-KFB-CCM-PUM, in Primulaceae; type V SI, controlled by TsSPH1-TsYUC6-TsBAHD, in Turneraceae; and type VI SI, controlled by HPS10-S and DUF247I-S, in Poaceae19.
a Origin and evolution of angiosperm SI. The panel summarizes and visually represents the key findings from Zhao et al.19. The differently colored circles and lines represent six known types of SI, with the type present in Orchidaceae remaining unknown. b Mechanisms of heteromorphic SI in Primula. The panel summarizes and visually represents the key findings from Zhang et al.109. SI occurs in short (S) style morphs with stigmas positioned above the anther and long (L) style morphs with stigmas situated below the anther. In S morphs, the CYP734A50 gene inhibits cell elongation by repressing brassinosteroid. c The mechanisms of SI in Dendrobium remain unknown, but the identification of homologous genes or gene families has revealed the presence of genes related to type IV SI, specifically GLO2 and CYP734A50. d Genes homologous to CYP734A50 in Dendrobium are clustered with those in Primulaceae, forming four distinct branches within Dendrobium, indicating gene expansion. e Gene expression patterns of GLO2 and CYP734A50 homologs in various tissues and styles after 72 h of self- and cross-pollination in Dendrobium chrysanthum, with three replicates each. One-way analysis of variance (ANOVA) was performed to assess the statistical significance of expression differences among tissues. Post hoc pairwise comparisons were conducted using Tukey’s multiple comparisons test. All plots were generated using GraphPad Prism 8 for Mac. Results are presented as mean ± standard error (SE), with error bars representing the SE of the mean, and statistical significance was defined as p < 0.05. Asterisks are shown only on the plots for “style”, “CP72h” and “SP72h” to highlight selected significant comparisons: ***p < 0.001, ****p < 0.0001. Different colors represent tissues and styles post-pollination. CP72h: style after 72 h of cross-pollination; SP72h: style after 72 h of self-pollination.
f Maximum likelihood (ML) phylogenetic tree of T2-RNase genes in Orchidaceae and other angiosperms. Blue branches represent II T2-RNase, green branches represent I T2-RNase, and the purple branch represents S-RNase. g ML phylogenetic tree of SLF genes in Orchidaceae and other angiosperms. The yellow branches represent other SLF genes, while the purple branches indicate S-SLF. Source data are provided as a Source Data file.
Identification of homologous genes or gene families in Dendrobium, combined with phylogenetic analysis, revealed genes homologous to the type IV SI genes GLO2 and CYP734A50 (Fig. 3b–d, Supplementary Table 1 and Supplementary Fig. 13a). In Primula species, GLO2 shows anther-specific expression and promotes anther elevation, whereas CYP734A50 shows short-style-specific expression and determines the female incompatibility type by inactivating brassinosteroids (BR)22,23. Those studies found that BR is abundant in long-styled flowers but almost undetectable in short-styled flowers, suggesting that the abundant BR in long-styled flowers promotes fertilization by pollen from short-styled flowers and inhibits fertilization by pollen from long-styled flowers. In contrast, because CYP734A50 promotes BR degradation in short-styled flowers, short-styled and long-styled flowers are incompatible and compatible, respectively, when pollinated by short-styled flowers22. Phylogenetic analysis revealed that the genes homologous to CYP734A50 in Dendrobium, P. equestris, and A. shenzhenica clustered together with those in Primulaceae and formed four branches within Dendrobium, indicating the expansion of this gene in Dendrobium. We further examined gene expression patterns after self-pollination and cross-pollination in the SI species D. thyrsiflorum and D. chrysanthum24 and found clear differences. For example, in D. chrysanthum, DchrysaCYP734A50-3 was expressed specifically in the style and at significantly higher levels 72 h after self-pollination than 72 h after cross-pollination (Fig. 3e and Supplementary Table 2). In D. thyrsiflorum, DthyrsiCYP734A50-3 was also specifically expressed in the pollinated style and was consistently more highly expressed after self-pollination than after cross-pollination (Supplementary Table 3). In contrast, in the SC species D. nobile, expression did not differ obviously among time points after self- and cross-pollination (Supplementary Table 4).
Together, these results suggest that the CYP734A50-3 gene may play an important role in self-incompatibility in Dendrobium, and that different gene expression patterns after self- and cross-pollination may lead to the diversity of self-incompatibility phenotypes among Dendrobium species.
We also searched Dendrobium for homologs of the genes or gene families underlying the other five SI types. For type I SI, neither the SLF nor the RNase genes of Orchidaceae clustered with the known SI genes, suggesting that type I SI has been lost in orchids through deletion of the entire single polymorphic locus (S-locus) (Fig. 3f, g). The searches for type II and type III SI revealed no homologous genes in Dendrobium. For type V SI, the YUC6 genes of Dendrobium formed a branch with TsYUC6, with two subbranches within Dendrobium (Supplementary Fig. 13b), indicating the expansion of this gene in Dendrobium, but no genes homologous to SPH1 or BAHD were found. For type VI SI, the DUF247 genes in Dendrobium did not cluster in the same branch as the DUF247 gene in Poaceae (Supplementary Fig. 13c) but rather clustered with the dicotyledonous lineage or formed a separate branch, and no homolog of the HPS10 gene was found.
Together, these analyses suggest that the entire S-locus of type I SI has been lost in orchids and that the presence of homologs of the type IV SI gene CYP734A50 contributes to SI in Dendrobium species. Furthermore, differences in the expression pattern of the CYP734A50 homologs may determine the SI type of individual Dendrobium species, much as the gene acts differently in long- and short-styled flowers of Primula species. These results suggest that SI in Dendrobium and Primulaceae may have evolved convergently, potentially contributing to species diversification.
Some branches of Dendrobium initially had SI, which avoided inbreeding depression and helped ensure the survival of the species21. The analyses presented later in this study indicate that reticulate evolutionary events in Dendrobium occurred mainly at the base of the branches, suggesting that some SI systems may have existed at the early stages of each branch, allowing Dendrobium species to survive in the harsh environment of the late Cenozoic ice ages. Then, after each branch expanded into more species, SI originated multiple times independently under suitable conditions, promoting the diversification of the Dendrobium genus.
Terpene synthase genes shape the dendrobine biosynthesis pathway
The biosynthesis of dendrobine, a prominent sesquiterpenoid alkaloid in Dendrobium species, relies heavily on the terpene synthase (TPS) gene family25. In this study, we identified 829 TPS genes in 24 Dendrobium orchids and found that the size of the TPS gene family varied significantly among Dendrobium species, ranging from 14 in D. porphyrochilum to 49 in D. nobile, with an average of 34 TPS genes (Fig. 4a, b, Supplementary Fig. 14 and Supplementary Data 8). Section Dendrobium has a relatively high number of TPS members, which may be correlated with its greater medicinal reputation (Supplementary Data 8). In D. exile, for example, we found 27 TPS genes, of which 12 were core genes and 15 were dispensable genes (Fig. 4c); none were private genes. These findings suggest that the 12 core members of the TPS gene family in Dendrobium are important for basic terpene biosynthesis and stress tolerance, whereas the other 15 members are related to the diversity of terpenes in the genus. Most core TPS genes belonged to the TPS-b subfamily, whereas none of the core genes were from the TPS-a subfamily, suggesting that TPS-a proteins are not absolutely required for survival in members of Dendrobium (Fig. 4b). However, the high diversity observed in the TPS-a subfamily is associated with secondary metabolite synthesis in Dendrobium. In the TPS analysis, the TPS-a clade played a role in gene family diversification, indicating that these genes are likely the main factors responsible for generating various types of terpenes or terpene derivatives, such as dendrobines. We found that the species in the Dendrobium clade had a significantly expanded set of TPS-a genes. To explore the link between TPS-a subfamily expansion and environmental adaptation in Dendrobium, we examined TPS-a copy number across the genus (Supplementary Data 8). Members of sect. Dendrobium have the highest gene copy counts, such as D. hercoglossum (25), D. aphyllum (21) and D. nobile (20). This section is also the most species-rich in the genus and occupies a wide range of ecological niches, suggesting that the amplified TPS-a repertoire may be related to adaptive radiation. The increases are attributable to tandem or adjacent duplications, exemplified by the gene clusters d15_chr02.g02639, d15_chr02.g02645, d15_chr02.g02648 and d15_chr02.g02649 on D. hercoglossum Chr02, and d18_Chr02.g02667, d18_Chr02.g02669, d18_Chr02.g02670 and d18_Chr02.g02671 in D. aphyllum. In contrast, D. huoshanense possesses markedly fewer TPS-a copies, a deficit that may partly explain its narrow distribution (in south-east China).
a The phylogenetic tree topology of TPS in the Dendrobium genus. b The distribution of the number of TPS branches among different members of the Dendrobium genus. c A comparison of core genes in the genomes with the average number of genes in each Dendrobium species for different TPS branches. d The phylogenetic tree of the proliferating cell nuclear antigen (PCNA) gene family. e The phylogenetic tree of the POLD1 gene family, emphasizing Dendrobium-specific expansions. f The phylogenetic tree of the replication protein A (RPA) gene family, highlighting the Dendrobium-specific branch and its expansion relative to other Orchidaceae species. Source data are provided as a Source Data file.
The analysis of NBS resistance genes revealed that Dendrobium indeed has significantly more resistance genes than the other two orchids do; there are only 15 resistance genes in Apostasia and 60 in Phalaenopsis, but 52–175 in Dendrobium (Supplementary Table 5). Furthermore, the results of the phylogenetic analysis revealed an Orchidaceae-specific branch (Supplementary Fig. 15). The presence of numerous and specific resistance genes in the Dendrobium genome provides both a basis and protection for its adaptation to new environments, exploration of new habitats, and resistance to pathogenic microorganisms. Earlier studies have proposed that the NBS gene family is undergoing a genus-wide contraction in Dendrobium, and that this erosion is the principal driver of its observed diversification26. Our genus-scale survey supports this view: copy-number variation among Dendrobium NBS genes is generated largely by tandem duplications and carries no discernible phylogenetic signal. For example, D. huoshanense (52 copies), D. exile (52 copies) and D. parcum (68 copies) are placed on separate major clades. The absence of a shared pattern implies that contraction and selection are imposed primarily by environmental pressures, with recurrent convergence. Thus, our data corroborate the earlier conclusion that NBS gene decline is shaping Dendrobium immunity diversification.
We investigated the mismatch repair (MMR) gene families across the Dendrobium genomes using P. equestris and A. shenzhenica as references and found that several MMR-related families were specifically expanded in members of the Dendrobium genus (Fig. 4d). For example, we identified two specific expansions of the proliferating cell nuclear antigen (PCNA) gene family, one unique to the Epidendroideae subfamily and another unique to Dendrobium (Fig. 4d). A Dendrobium-specific branch of POLD1 was not found in other Orchidaceae species or A. thaliana (Fig. 4e). For the replication protein A (RPA) family, there is a Dendrobium-specific branch related to A. thaliana nucleic acid-binding OB-fold-like protein (AT1G52950.1), with no members detected in either A. shenzhenica or P. equestris. This branch was expanded significantly among Dendrobium species, with D. spatella having up to 12 copies (Fig. 4f). Among these, one branch is unique to the subfamily of orchids. D. porphyrochilum also exhibited expansion compared to other Dendrobium members. The remaining genes, including MSH2, MLH1, PMS1, PFC, and LIG, were relatively conserved, with no significant unique differentiation, duplication, or loss phenomena.
Dendrobium phylogeny and biogeography
To study the degree of species diversity in Dendrobium, we selected 204 species from different branches of Dendrobium, covering 33 out of the 40 sections27. To minimize interference from multi-copy genes and paralogs, we first identified 3253 orchid-specific single-copy genes and retained only SNPs located within these loci, which resulted in 333,563 nuclear SNPs for each species. To integrate both nuclear and plastid signals, we additionally extracted SNPs from the chloroplast genomes and obtained 10,639 plastomic SNPs for each species. On the basis of the 70,217,208 SNPs in total, we constructed a clear and highly supported phylogenetic tree of the Dendrobium in a broad sense and then conducted an evolutionary rate analysis of the whole genus (Fig. 5a–c). This analysis revealed three major branches in Dendrobium in a broad sense, including the basally located Epigeneium branch and the subsequent branching of the major Australian and Asian groups. This fundamental topology is consistent with previous findings28,29,30. Within the Australian clade, our phylogeny resolves nine strongly supported subclades (Fig. 5b), an improvement over previous studies where relationships among sections were largely unresolved29,30,31. Our results show that sect. Rhizobium forms the earliest-diverging lineage, followed by sect. Dendrocoryne. Subsequently, a clade comprising sect. Grastidium and sect. Biloba (Aus-c) and another clade corresponding to the former genus Flickingeria (Aus-d) diverge. The remaining sections, including Fugacia (Aus-e), Latouria (Aus-f), Brevisaccata (Aus-g), Phalaenanthe (Aus-h), and Spatulata (Aus-i), form a well-supported monophyletic group. This topology clarifies that sect. Latouria is not the first divergent clade within the Australian radiation, as suggested by Burke et al.31, but rather is part of a more recent diversification.
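The total SNP count quoted above follows from the per-species counts and the number of sampled species. A quick arithmetic check is sketched below; the variable names are illustrative and are not taken from the analysis pipeline.

```python
# Per-species SNP counts and species number as stated in the text.
nuclear_snps_per_species = 333_563
plastid_snps_per_species = 10_639
n_species = 204

# Nuclear plus plastid SNPs for one species, then summed over all species.
snps_per_species = nuclear_snps_per_species + plastid_snps_per_species
total_snps = snps_per_species * n_species

print(snps_per_species)  # 344202
print(total_snps)        # 70217208
```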
a Geographical distribution areas of the Dendrobium genus, illustrating the inferred migration routes of ancient Dendrobium. Area A represents mainland Asia, B represents Malaysia, Sumatra and Borneo, C represents the Philippines, D represents Sulawesi Island, and E represents New Guinea and Australian regions. Map data was sourced from the Database of Global Administrative Areas (gadm.org) and illustrated by QGIS (3.36.1). b Divergence time estimation and biogeographical analysis of the Dendrobium genus, highlighting key divergence events and their corresponding timelines. Species marked with a red star are Dendrobium species with genomes assembled in this study. c Analysis of evolutionary rates across the Dendrobium genus, presenting variations in diversification rates among different lineages. d Reticulate network analysis depicting the evolutionary relationships among branches of the Dendrobium genus, illustrating both direct and internal reticulate connections.
In previous analyses, the Asian clade appeared as a large polytomy30. In our study, the Asian clade was resolved into 11 well-supported subclades (Fig. 5b). The earliest-diverging lineages are composed of sect. Herbaceae (As-a), sect. Distichophyllae (As-b), and sect. Stachyobium (As-c). The remaining species form two major sister groups. The first (clade As-k) comprises a large assemblage of species primarily from sect. Dendrobium, Holochrysa, and Breviflora. The second is a large, complex clade that includes the remaining sections, such as Formosa (As-d), Densiflora (As-e, As-f), and the highly diverse Pedilonum complex (As-g, As-j). This resolution provides a backbone for future taxonomic and evolutionary studies in this complex group.
Using SNP data from single-copy genes, we constructed a reticulate evolutionary tree for the Dendrobium genus to reveal both the direct and internal reticulate evolutionary relationships among its branches (Fig. 5d). The results indicated that the reticulation in Dendrobium is concentrated primarily at the base of the major branches, such as the divergent events between the Asian and Australian clades, suggesting ancient hybridization or introgression during the early phase of their divergence. Further reticulation events were detected within the Asian clade, for example, between As-f and its sister branch and between As-e and its sister branch. Reticulation was also occasionally observed within branches, such as within As-k. These ancient reticulation events, occurring between major clades rather than among terminal species as seen in genera like Paphiopedilum32, suggest that early hybridization played a critical role in shaping the foundational lineages of Dendrobium before their extensive radiation.
According to our divergence time estimation and ancestral distribution reconstruction, Dendrobium diverged from Epigeneium at 39.83 million years ago (Ma) (95% highest posterior density (HPD): 51.18–30.72 Ma) in southeastern Asia and then diverged into the main branches of Asia and Australia at 33.52 Ma (95% HPD: 42.38–26.68 Ma) (Fig. 5b and Supplementary Data 9), which is close to the median age of terrestrial vertebrate families in the Indo-Australian archipelago (32 Ma)33. This estimate is slightly earlier than the divergence time (28.1 Ma) proposed by Li et al.10, a discrepancy that likely results from differences in sampling.
The Australian main branch arose through dispersal from Asia in the Oligocene (between 33.52 and 23.49 Ma), which coincides with a peak of colonization events across Wallace’s Line from Sunda to Wallacea33. Under the warming climate of the late Oligocene, a period marked by the expansion of tropical rainforests34,35, the humid climate zone currently found in Wallacea was already present in Sunda, where it has existed for at least 20–30 million years33. We also observed two additional deep colonization events across Wallace’s Line in the middle Miocene, involving the As-j branch (17.24–11.94 Ma) and the As-g branch (18.73–12.38 Ma) of the Asian main branch (Fig. 5b). These events coincide with the favorable climate of the middle Miocene (14 Ma) and with recorded peaks in colonization events for vertebrate species (13 Ma)33 and Vitaceae plants (~14 Ma)36.
Both the Asian (23.93 Ma) and Australian (23.49 Ma) major branches started to diversify near the Oligocene–Miocene boundary (23.03 Ma), coinciding with the Oligocene–Miocene transition (24.4–21.5 Ma)37 and the Mi-1 glaciation (23.1 Ma)38. This period of environmental cooling likely induced population isolation, contributing to divergences within both major branches. The diversification rates estimated from the dated phylogeny varied considerably among Dendrobium lineages, and a distinct shift in the diversification rate was detected beginning at 13.3 Ma for the As-k branch (Fig. 5b) in Asia. This increase aligns with the middle Miocene climate transition (15–13 Ma)38,39, which was marked by increased precipitation and sea level fluctuations (Fig. 5c). This complex geological history may have driven reticulate evolution. Accordingly, we note that several species (such as D. flexicaule, D. scoriarum, D. catenatum, D. crepidatum, and D. ochraceum), mostly members of sect. Dendrobium, occupy phylogenetic positions that differ from those reported in the previous study26. This section harbors exceptional species richness and occupies topographically complex regions, so different molecular markers or SNP sets may capture conflicting signals.
Discussion
This comparative genomics study of the Dendrobium provides the opportunity to explore the molecular mechanisms underlying the characteristic differences among its members. The core genes, dispensable genes, and private genes highlight the commonalities and species-specific traits of the Dendrobium species. Our findings indicate that the core genes shared by all Dendrobium species are largely involved in routine energy metabolism and signal transduction. In contrast, dispensable or private genes are closely related to adaptations to different environments, including responses to light and stress, interactions with pathogens, and secondary metabolism (terpene synthesis). Different private genes help different species of Dendrobium succeed in their respective environments. Thus, the complexities of geological history and diverse environments have played a significant role in shaping the genetic patterns of different Dendrobium species. Many Dendrobium species exhibit unique fragrances or adaptations. D. exile flowers are notably fragrant, with the presence of beta-glucosidase in the genome potentially playing a role in the synthesis of its scent compounds. D. porphyrochilum and D. cariniferum also have distinctive scents, with the former possessing a gene encoding tyrosine aminotransferase and the latter showing a prevalence of terpene-related genes, both of which are implicated in fragrance production. The distribution of D. porphyrochilum at relatively high altitudes is facilitated by its private genes, including the RAD51-like DNA repair protein, which is involved in the response to ultraviolet radiation and is crucial for its high-altitude adaptability40. The private genes of D. lindleyi, a CAM plant that thrives in dry, sunny environments, are enriched in the MAPK signaling pathway, which is crucial for responses to stresses such as drought and temperature fluctuations41. 
The presence of 71 potassium channel KAT3-like genes in D. lindleyi suggests the importance of osmotic adjustment and water balance under stress, potentially affecting water use efficiency by regulating stomatal function and ion balance. While D. exile also uses CAM photosynthesis, its private genes are associated with photosynthetic processes rather than stress response pathways. It features genes such as psaJ and atpA, which are key to ATP production in both oxidative phosphorylation and photophosphorylation42. Photosystem I, associated with psaJ, adapts to low-light conditions typical of D. exile’s understory habitat.
The diversity of genome structures, especially those marked by SV regions, provides a toolbox of genetic variation allowing Dendrobium species to adapt to various environmental conditions. These structural variations may include gene duplications, deletions, insertions, and rearrangements, which have the potential to alter gene expression patterns, thereby affecting the physiology and developmental processes of Dendrobium. For example, variations in genes related to isoflavonoid and flavonoid biosynthesis may enhance protection against ultraviolet radiation or improve pathogen resistance. In addition, changes in genes associated with tryptophan metabolism and monoterpenoid biosynthesis may be related to plant hormone synthesis and signaling, influencing plant perception of and adaptation to environmental changes. Therefore, the differentiation and adaptive evolution of Dendrobium species may be closely linked to variations within dynamic genomic regions. Structural variations not only provide a genetic basis for the adaptation of plants to diverse environments but also constitute an important source of diversity and complexity within the genus. Further investigation of the genes within these SV regions and their functional roles can lead to a deeper understanding of the molecular mechanisms underlying the adaptive evolution of Dendrobium species and how these plants harness these genetic variations to optimize their performance in various ecological niches.
Dendrobium is recognized for its key chemical components, including alkaloids, aromatic compounds, sesquiterpenoids, and polysaccharides, which are associated with health benefits such as immune modulation, neuroprotection, and antitumor effects43. However, the reliance on wild sources for medicinal purposes has resulted in unregulated harvesting and overexploitation, putting the species at risk and affecting their natural habitats. The biosynthesis of dendrobine, a significant sesquiterpenoid alkaloid in Dendrobium orchids, is highly dependent on the TPS gene family. Terpene synthases (TPSs) are enzymes that catalyze the conversion of farnesyl diphosphate (FPP) into sesquiterpenes (C15) and are believed to have evolved from isoprenyl diphosphate synthase genes, which are essential for dendrobine production. Homologous genes such as CrGES from Catharanthus roseus, which share similarities with TPS02 and TPS21, are implicated in this biosynthetic pathway. The TPS-a subfamily, which exclusively encodes sesqui-TPSs44, may be a critical factor in the greater production of dendrobine in D. nobile than in other Dendrobium species, as it possesses a greater number of TPS-a genes6. Our analyses suggest that different branches of the TPS gene family likely played distinct roles in the evolution of Dendrobium. Specifically, the TPS-a subfamily may contribute to the diversification of the genus, whereas the TPS-b subfamily could be responsible for maintaining the fundamental functions of TPSs in Dendrobium species.
Orchidaceae possess a unique set of FAR1/FRS genes, among which Dendrobium shows a notable expansion. Unlike terrestrial orchid genera such as Cymbidium and Paphiopedilum, each of which has fewer than 100 species, most Dendrobium species are epiphytic. This life strategy not only avoids competition with terrestrial plants but also provides ecological niche isolation, promoting species diversification and differentiation. In contrast, terrestrial genera such as Cymbidium and Paphiopedilum face slower species differentiation rates because of competition with other terrestrial plants, resulting in fewer species. Epiphytic orchids often grow beneath dense forest canopies, where little light remains after passing through the canopy, making far-red light a crucial source of illumination. The FAR1/FRS gene family is linked to the response to far-red light. Under shaded conditions, epiphytic orchids face selective pressure from low light levels, leading to the retention and expansion of genes associated with the far-red light response and increasing the adaptability of epiphytic orchids to the far-red light in their habitat.
Phylogenetic and biogeographic analyses of 204 Dendrobium species reveal extensive reticulate evolution across the genus, likely linked to the complex geological history of its distribution range. On the basis of our results regarding the origin of the broad-sense Dendrobium genus (including the former Epigeneium genus and the former Flickingeria genus), we suggest that the former Epigeneium originated from the uplift of the Himalayas and then migrated from the mountains to the lowlands and tropical regions along the Indo-China Peninsula, giving rise to Dendrobium sensu stricto. After crossing Wallace’s Line and reaching Australia, the species in each location evolved independently, resulting in two major branches: the Australian and the Asian lineages. These branches continued to evolve rapidly due to the uplift of the Indian plate and its southeastward compression against the Indo-China Peninsula, as well as influences from sea-level changes, monsoon effects, hybridization, and geographic and reproductive isolation, leading to a burst of species diversification.
Methods
Genome sequencing and assembly
Samples of Dendrobium were collected from the wild. DNA was extracted from young leaves using cetyltrimethylammonium bromide (CTAB) reagent for short-read and PacBio sequencing. The leaf buds were fixed with 1% formaldehyde in MS buffer for 30 minutes under a vacuum45. The Hi-C library was constructed by digesting the cross-linked chromatin with the DpnII restriction enzyme, followed by biotin labeling of the DNA ends, proximity ligation, and purification of the resulting chimeric junctions46. An RNA Plant Plus Kit (Tiangen, DP473) was used to extract RNA from the roots and leaves of each species. The RNA was de novo sequenced on an Illumina HiSeq 2500 platform. A 20 kb single-molecule real-time (SMRT) DNA library was constructed and sequenced using the PacBio Sequel platform. The SMRTbell template was prepared using AMPure PB Magnetic Beads (Pacific Biosciences). The obtained PCR products were subsequently purified (AMPure XP system), and the corresponding library quality was assessed using the Agilent Bioanalyzer 2100 system. Library preparations were sequenced using a BGISEQ DNBSEQ-T7 sequencer, and 150 bp paired-end reads were generated.
Each genome was assembled using PacBio reads. First, the PacBio raw reads were corrected using Falcon47, and the corrected reads were assembled using Smartdenovo v1.048. Owing to the high error rate of PacBio reads, indel and SNP errors remained in the assembly results. Illumina reads were used to correct the assembly results using pilon v1.2249. Genome size and heterozygosity were measured using jellyfish v2.1.450 and genomeScope51, which are based on 17-mer distribution. The total length of the assembly was greater than the genome size estimated by k-mer analysis, and trimDup was used to reduce redundancy in the assembly results. SOAPnuke v2.1.052 was used to filter the Hi-C raw reads (parameters: filter -n 0.02 -l 20 -q 0.4 -G 2 -i -Q 2 --seqType 0) and obtain clean reads.
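The k-mer-based estimate mentioned above rests on a simple relation: genome size is roughly the total number of k-mers divided by the depth of the main peak of the k-mer histogram. The sketch below illustrates that relation on a toy histogram. It is a deliberate simplification and not the mixture model fitted by GenomeScope; the function name, the `min_depth` cutoff used to discard likely error k-mers, and the toy numbers are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Peak-based genome size estimate from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that
               depth (the two columns of a `jellyfish histo` output file).
    min_depth: depths below this are treated as sequencing errors
               (an assumed cutoff, not a value from the text).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth with the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences, excluding the low-depth error tail.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram (made-up numbers): error tail at depth 1-2, main peak at 30.
toy = {1: 500_000, 2: 100_000, 28: 200, 29: 400, 30: 800, 31: 400, 32: 200}
size = estimate_genome_size(toy)
print(size)  # 2000.0
```

A peak-based estimate of this kind ignores heterozygous and repeat peaks, which is why model-based tools are preferred for real data.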
We used the Canu software to correct errors in the PacBio CLR data and performed an initial assembly of the corrected data to obtain contig-level assembly results. Then, we employed ONT ultra-long reads to close gaps in the Dendrobium assemblies. First, ultra-long reads were aligned to the contigs with minimap2 v2.2853. We retained reads that spanned an assembly gap, requiring that ≥ 80 % of each read length be aligned to the two flanking contigs and that the distance between either aligned end and the corresponding contig end did not exceed 5 kb. A gap was considered fillable only if it was crossed by at least four such reads. The gap-spanning portions of these reads were extracted, and a consensus sequence was generated with the wtcns module of Smartdenovo v1.0; finally, the consensus was used to replace the original gap sequence. Subsequently, these gap-filled contigs were ordered and oriented into a chromosome-scale assembly using the Juicer pipeline to align Hi-C reads and analyze the resulting chromatin interaction frequencies54. A Hi-C library was constructed from 0.2 g of leaf tissue collected from newly emerged seedlings via the Proximo™ Hi-C Plant Kit (Phase Genomics) following the manufacturer’s protocol. After verifying the sample quality, the Hi-C library was size-selected for 300–600 bp fragments and sequenced on DNBSEQ-T7 in paired-end 150 bp mode. The resulting reads were used to avoid erroneous alignment of highly similar homologous regions. The read pairs were used as inputs for Juicer54 and 3d-DNA55 Hi-C analyses and scaffolding pipelines. Short reads were trimmed using Trimmomatic (0.39)56 and aligned to contigs using BWA (v0.7.16) with strict parameters (-n = 0) to prevent mismatches and nonspecific alignments in repetitive and homologous regions. The contigs were ordered and oriented, and assembly errors were identified using a 3D-DNA pipeline with default parameters. 
The resulting Hi-C contact matrix was visualized using Juicebox, and misassemblies and misjoins were manually corrected on the basis of neighboring interactions. We then used PBjelly57 to fill the gaps in the Hi-C assembly by detecting the reads that joined the adjacent reads. After gap closing, the reads were again BLAST searched against the genome, and pilon was used for polishing, resulting in the final assembly. The completeness and quality of each genome assembly were assessed using BUSCO v.358 with the embryophyta_odb10 database.
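The criteria used above to decide whether an ultra-long read spans a gap, and whether a gap is fillable, can be written as a small filter. The sketch below assumes that each candidate read has already been summarized from its minimap2 alignments into a few numbers; the `SpanningRead` record and its field names are hypothetical and only encode the thresholds stated in the text (at least 80% of the read aligned to the two flanking contigs, alignment ends within 5 kb of the contig ends, and at least four supporting reads).

```python
from dataclasses import dataclass


@dataclass
class SpanningRead:
    read_length: int     # total read length (bp)
    aligned_left: int    # read bases aligned to the left flanking contig
    aligned_right: int   # read bases aligned to the right flanking contig
    dist_left_end: int   # distance from the alignment end to the left contig end (bp)
    dist_right_end: int  # distance from the alignment end to the right contig end (bp)


def read_supports_gap(r, min_aligned_frac=0.80, max_end_dist=5_000):
    """True if the read meets the alignment-fraction and end-distance criteria."""
    aligned_frac = (r.aligned_left + r.aligned_right) / r.read_length
    return (aligned_frac >= min_aligned_frac
            and r.dist_left_end <= max_end_dist
            and r.dist_right_end <= max_end_dist)


def gap_is_fillable(reads, min_reads=4):
    """True if at least `min_reads` reads support the gap."""
    return sum(read_supports_gap(r) for r in reads) >= min_reads


good = SpanningRead(100_000, 45_000, 40_000, 1_000, 2_000)   # 85% aligned, near ends
short = SpanningRead(100_000, 30_000, 30_000, 1_000, 2_000)  # only 60% aligned
far = SpanningRead(100_000, 45_000, 45_000, 6_000, 100)      # ends > 5 kb from contig end
```

With these toy records, four copies of `good` make a gap fillable, whereas three copies plus `short` and `far` do not.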
Three strategies were combined to predict the genes in each Dendrobium genome: homology-based, de novo, and RNA-seq data alignment. For homology-based annotation, we aligned the homologs from representative species to the genome using BLASTX59 with an E-value threshold of “-e 1e-6”; the protein sequences of A. thaliana (The Arabidopsis Genome Initiative, Analysis of the genome sequence of the flowering plants A. thaliana 2000), Oryza sativa60, Sorghum bicolor61, Cocos nucifera62, E. guineensis63, and P. dactylifera (DPV01)64 were downloaded from public repositories. We then used Exonerate v2.2.0 in the candidate gene regions to map and search for the most accurate splice alignments and to define the gene models with the following parameter: --model protein2genome. For de novo prediction, we randomly selected 2000 full-length intact genes (intact structure: start codon, stop codon, perfect intron–exon boundary) from the gene models predicted using homology-based methods to train the model parameters for AUGUSTUS v3.2.165 and SNAP v2006-07-2866. The trained programs were used for de novo gene prediction from the repeat-masked genome. For transcriptome-based prediction, RNA-seq data were mapped onto the genome to identify splice junctions using HISAT2 v2.1.0 software67. The aligned reads were assembled into transcripts using StringTie v1.2.268. PASA_lite (https://github.com/PASApipeline/PASA_Lite) was used to assemble the transcripts into gene structures to obtain perfect alignments. Finally, the integrated consensus gene models were derived using MAKER v3.31.869 from the AUGUSTUS and SNAP de novo predictions described above, the predicted gene structures derived from the six protein-based homologs, and the RNA-seq-based transcript structures. To evaluate the quality of the gene prediction, annotation edit distance scores were generated for each predicted gene using the MAKER pipeline. 
Finally, genes meeting the following conditions were removed: 1) genes with premature stop codons, 2) coding regions (CDSs) with ambiguous bases, and 3) genes with coding sequence lengths < 90 bp.
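The three removal conditions listed above can be expressed as a single predicate on a coding sequence. The sketch below is a minimal illustration, assuming an in-frame CDS string and the standard genetic code; the function name `keep_gene` is hypothetical and is not part of MAKER or any other tool named in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def keep_gene(cds, min_length=90):
    """Return True if a CDS passes the three filters described in the text."""
    cds = cds.upper()
    if len(cds) < min_length:
        return False  # condition 3: coding sequence shorter than 90 bp
    if set(cds) - set("ACGT"):
        return False  # condition 2: ambiguous bases such as N
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # condition 1: a stop codon anywhere before the final codon is premature
    return not any(c in STOP_CODONS for c in codons[:-1])


intact = "ATG" + "GCT" * 30 + "TAA"             # 96 bp, single terminal stop
premature = "ATG" + "TAA" + "GCT" * 30 + "TAA"  # internal stop codon
ambiguous = "ATG" + "GCN" * 30 + "TAA"          # contains N
too_short = "ATGGCTTAA"                         # 9 bp
```

Of these four toy sequences, only `intact` is retained.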
Gene functional annotations were based on sequence similarity and domain conservation. First, the protein-coding genes were aligned against the KEGG (release 84)70, SwissProt (release-2017_09)71, TrEMBL (release-2017_09)71, NR (release 20170924), and KOG72 databases using BLASTP, with an E-value cutoff of 1e-5. The best match from the alignment was subsequently used to represent gene function. Second, InterProScan software v5.16-55.073 was used to identify motifs and domains, with reference to the public databases Pfam74, PRINTS75, ProDom76, SMART77, PANTHER78, TIGRFAM79, SUPERFAMILY80, Hamap81, PIRSF82, Gene3D83, and PROSITE84. GO functional information was retrieved from InterProScan by searching for GO terms with the parameter “-goterms”.
Core and dispensable gene family clustering
Core and dispensable gene sets were defined on the basis of gene family clustering using OrthoMCL85 v2.0.9. For the 24 Dendrobium species, genes whose CDS showed 100% similarity to that of another gene were removed using cd-hit-est from the CDHIT86 v4.6 toolkit with the parameters -c 1 -aS 1. The protein sequences of the remaining genes were subjected to homology searching using BLASTp87 with the parameters -evalue 1e-10 -max_target_seqs 116. OrthoMCL (version 2.0.9) was used to analyze the BLAST results with the parameters percentMatchCutoff = 50 and -I = 1.5 to perform gene family clustering. Gene families that were shared among all accessions were defined as core gene families, those that were missing from one or two accessions were defined as soft core gene families, those that were missing from more than two accessions were defined as dispensable gene families, and those that existed in only one accession were defined as private gene families. For phylogenetic analysis of each gene family, MUSCLE88 v3.8.31 was used for sequence alignment, and MEGA689 was used to build the phylogenetic trees.
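The four categories defined above depend only on how many of the 24 accessions contain a given gene family. A minimal sketch of that rule follows. Because a family present in a single accession is also missing from more than two, the sketch checks the private case first; that ordering is our assumption, as is the function name.

```python
def classify_family(n_present, n_total=24):
    """Classify a gene family by the number of accessions that contain it."""
    if not 1 <= n_present <= n_total:
        raise ValueError("n_present must be between 1 and n_total")
    if n_present == 1:
        return "private"      # found in only one accession (checked first)
    n_missing = n_total - n_present
    if n_missing == 0:
        return "core"         # shared among all accessions
    if n_missing <= 2:
        return "soft core"    # missing from one or two accessions
    return "dispensable"      # missing from more than two accessions


labels = [classify_family(n) for n in (24, 23, 22, 21, 2, 1)]
print(labels)
# ['core', 'soft core', 'soft core', 'dispensable', 'dispensable', 'private']
```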
For gene function annotation, KEGG pathway analysis was performed using KOBAS 3.090, protein domains were annotated using InterProScan 573, and GO terms were annotated using PANNZER291. The enrichment test was performed online (https://www.omicshare.com/tools/).
Phylogenetic analysis and synteny analysis
After aligning the single-copy genes using MAFFT92, the alignment results were concatenated and used to construct the phylogenetic tree with PhyML93. Synteny analysis of the 24 genomes was performed via whole-genome alignment using MUMmer494. Genome alignment was performed using NUCmer (-c 1000), and the alignment blocks were then filtered using delta-filter in one-to-one alignment mode. Blocks longer than 1000 bp were used for further structural variation detection.
Structural variation identification
SNPs and indels were identified using show-snps (-ClrT) in the MUMmer4 toolkit. We used the structural variants from MUMmer (SVMU)95 pipeline to automate presence and absence variation (PAV) discovery by parsing the results of NUCmer. On the basis of the SVMU results, an insertion/deletion (with the INS/DEL tag) was treated as a PAV. Genome regions detected neither as a synteny block by NUCmer nor as an insertion/deletion by SVMU were treated as PAV regions.
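The last rule above treats any region that is covered neither by a synteny block nor by an insertion/deletion call as a PAV region, which amounts to taking the complement of a set of intervals along each chromosome. The sketch below shows one way to compute that complement. The function name and the half-open coordinate convention are assumptions for illustration and are not part of MUMmer or SVMU.

```python
def uncovered_regions(chrom_length, covered):
    """Return intervals of [0, chrom_length) not covered by any input interval.

    covered: iterable of (start, end) half-open intervals, for example synteny
             blocks and INS/DEL calls pooled together; overlaps are allowed.
    """
    gaps = []
    pos = 0  # right edge of the region covered so far
    for start, end in sorted(covered):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_length:
        gaps.append((pos, chrom_length))
    return gaps


pav = uncovered_regions(100, [(10, 30), (20, 50), (70, 80)])
print(pav)  # [(0, 10), (50, 70), (80, 100)]
```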
Selection estimation
To estimate the Ka/Ks ratio for gene sequences, we employed the CODEML program of the PAML software package. Protein sequences were aligned using MUSCLE, and the resulting alignments were converted into codon alignments using the corresponding nucleotide sequences. CODEML was applied to calculate the nonsynonymous (Ka) and synonymous (Ks) substitution rates, and the Ka/Ks ratio was derived to assess selective pressure. Sites or branches with Ka/Ks > 1 were considered to be under positive selection, while those with Ka/Ks < 1 were considered to be under purifying selection.
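Two steps of this procedure lend themselves to a short illustration: threading the codons of a coding sequence onto its gapped protein alignment, and interpreting a Ka/Ks value. The sketch below is not the pipeline used in the study; the function names `thread_codons` and `classify_selection` are hypothetical, and labelling Ka/Ks = 1 as neutral is the conventional reading rather than something stated in the text.

```python
from typing import Optional


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Build a codon alignment row from a gapped protein row and its CDS.

    Each residue consumes one codon; each gap '-' becomes '---'.
    A trailing stop codon in the CDS is simply left unused.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("CDS is too short for the protein sequence")
    out = []
    k = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


def classify_selection(ka: float, ks: float) -> Optional[str]:
    """Interpret Ka/Ks; returns None when Ks is zero and the ratio is undefined."""
    if ks <= 0:
        return None
    omega = ka / ks
    if omega > 1:
        return "positive"
    if omega < 1:
        return "purifying"
    return "neutral"  # assumption: conventional label for Ka/Ks = 1


if __name__ == "__main__":
    print(thread_codons("M-K", "ATGAAATAA"))
    print(classify_selection(0.02, 0.10))
```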
Gene family identification for TPS genes and pathways associated with FA
The HMM profiles for PF01397 (Terpene_synth) and PF03936 (Terpene_synth_C) were downloaded from Pfam (http://pfam.xfam.org/), and the two profiles were used to perform HMM searches against the database of proteins from the 24 Dendrobium species. The resulting sequences were then manually checked, and those with at least one of these domains were retained. The retained amino acid sequences were aligned using MAFFT92, and the alignment was used for phylogenetic tree construction using PhyML93. The tree was generated using the maximum likelihood method based on the Jones–Taylor–Thornton (JTT) matrix-based model96, with 1,000 bootstrap replicates. Genes in the FA biosynthesis pathway of the A. thaliana genome were retrieved from the Acyl Lipids database (http://aralip.plantbiology.msu.edu/pathways/pathways). Gene sequences in the target genome were used as queries to search all the A. thaliana genes. Two criteria were used to define FA-related genes in the Dendrobium genome: (i) the top A. thaliana hit of the target gene is an FA-related gene; (ii) the BLAST E value is less than 10e−10.
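The two criteria for FA-related genes can be expressed as a filter over BLAST hits. The sketch below assumes the hits are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`) and that the top hit is the one with the highest bit score; both points, along with the function name `fa_related_genes`, are assumptions made for illustration. The E-value cutoff is passed in explicitly rather than hard-coded.

```python
from typing import Dict, Iterable, Set, Tuple


def fa_related_genes(
    blast_rows: Iterable[str],
    fa_ids: Set[str],
    max_evalue: float,
) -> Dict[str, str]:
    """Map each query gene to its A. thaliana top hit if both criteria hold.

    blast_rows: tab-separated lines (qseqid, sseqid, ..., evalue, bitscore).
    fa_ids: identifiers of A. thaliana FA-pathway genes.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_rows:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Keep only the best-scoring hit per query (assumed meaning of "top hit").
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return {
        query: subject
        for query, (subject, evalue, _) in best.items()
        if subject in fa_ids and evalue < max_evalue
    }


if __name__ == "__main__":
    rows = [
        "Dc1\tAT1G01\t80\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200",
        "Dc1\tAT9G99\t60\t100\t0\t0\t1\t100\t1\t100\t1e-20\t90",
        "Dc2\tAT9G99\t85\t100\t0\t0\t1\t100\t1\t100\t1e-60\t250",
    ]
    print(fa_related_genes(rows, {"AT1G01"}, max_evalue=1e-10))
```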
Gene family identification for self-incompatibility and NBS genes
To identify homologs of type IV and type V, the amino acid sequences of CYP734A50 (GenBank ID: AOA32956.1, AOA32958.1 and AQT19637.1) were used as queries to search the database of proteins from the 24 Dendrobium species from this study and Li et al.10, as well as Apostasia and Phalaenopsis species, using BLASTP with default parameters. The amino acid sequences of GLO2 (GenBank ID: AQT19638.1), BAHD (GenBank ID: UBU50656.1), SPH1 (GenBank ID: AND01232.1) and YUC6 (GenBank ID: QDP16901.1) were likewise used as queries to search the database of proteins from the 24 Dendrobium species using BLASTP with default parameters. For type VI genes, the HMM profile for PF03140.20 was downloaded from Pfam (pfam.xfam.org/) and used to perform HMM searches against the database of proteins from the 24 Dendrobium species. The amino acid sequences of HPS10 were downloaded online97, and BLASTP with default parameters was used to identify HPS10 homologs.
Homologs of type I, II, and III proteins were identified by performing BLASTP searches against a custom protein database containing sequences from the 24 Dendrobium species as well as Apostasia and Phalaenopsis species, with known protein sequences used as queries98. The details are as follows. The amino acid sequences of T2 RNase and SLF genes were obtained, and BLASTP was used to identify these genes with default settings and an expected threshold cut-off of 10^−6 for each98. The amino acid sequences of SRK and SCR genes from Supplementary Fig. 9a and 9b were obtained, and BLASTP was used to identify these genes with default settings and expected threshold cut-offs of 0 and 0.05, respectively98. The amino acid sequences of PrsS and PrpS genes from Supplementary Fig. 10a and 10b were obtained, and BLASTP was used to identify these genes with default settings and expected threshold cut-offs of 10^−6 and 0.05, respectively98.
The HMM profile for PF00931.27 (NB-ARC) was downloaded from Pfam (pfam.xfam.org/) and used to perform HMM searches against the database of proteins from Dendrobium species from this study and Li et al.10 as well as Apostasia and Phalaenopsis species.
The retained amino acid sequences were aligned using MAFFT92. The aligned amino acids were used for phylogenetic tree construction using PhyML93. The tree was generated using the maximum likelihood method based on the default parameter96 and the bootstrap method for phylogenetic tests with 1000 replicates.
Transcriptome analysis
Leaf, root, stem, and flower tissues, as well as styles after self- and cross-pollination, were sampled from each Dendrobium species in three replicates. The transcriptomic data for different flowering stages were derived from our previous study5, which sampled petal, sepal, lip, and column across five developmental stages from bud to fully open flower, with three biological replicates per stage. To study the functions and expression patterns of different genes, reads from the different tissues were aligned to the reference gene set of the corresponding species using HISAT, and RPKM values were calculated using RSEM V. 1.3.399 to determine the expression levels in each sample. DESeq was used for differential expression analysis between sample groups to obtain the DEG sets between two biological samples. Hierarchical clustering analysis was carried out for the screened DEGs to find groups of genes with the same or similar expression patterns. For the detection of DEGs, fold change > 2 and false discovery rate (FDR) < 0.01 were used as cutoff values, where fold change is the ratio of expression levels between two samples and the FDR is the p-value for significant differences after correction. GO/KEGG enrichment analyses were conducted using the OmicShare cloud platform (https://www.omicshare.com/quote.php).
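For orientation, the expression measure and the DEG cutoffs described above can be sketched as follows. This is not the RSEM or DESeq implementation: `rpkm` uses the standard textbook definition (reads per kilobase of transcript per million mapped reads), and `is_deg` assumes that the fold-change cutoff is applied symmetrically to up- and down-regulated genes, which the text does not state. Both function names are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_deg(
    fold_change: float,
    fdr: float,
    min_fold_change: float = 2.0,
    max_fdr: float = 0.01,
) -> bool:
    """Apply the cutoffs fold change > 2 and FDR < 0.01.

    Assumption: a gene also passes when it is down-regulated by the same
    factor, i.e. fold_change < 1 / min_fold_change.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    changed = fold_change > min_fold_change or fold_change < 1.0 / min_fold_change
    return changed and fdr < max_fdr


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a library of 10 million mapped reads
    print(rpkm(500, 2000, 10_000_000))
    print(is_deg(4.0, 0.001), is_deg(1.5, 0.001), is_deg(4.0, 0.05))
```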
Biogeographic analysis of Dendrobium species
A total of 204 species representing Dendrobium sensu lato (including Dendrobium, Epigeneium, and Flickingeria) were sampled, covering the entire geographic distribution of the genus, including key centers of diversity such as mainland Asia, the Malay Archipelago, Sumatra, Borneo, the Philippines, Sulawesi, New Guinea (27 species), and Australia (9 species). Seven species from related genera (Coelogyne viscosa, C. leucantha, Phalaenopsis micholitzii, P. pallens, Rhynchostylis retusa, Pelatantheria rivesii, and Oxystophyllum changjiangense) were selected as outgroups for phylogenetic reconstruction. Detailed information on material sources is provided in Supplementary Data 9. Fresh leaf material was collected for genomic DNA extraction, and libraries were sequenced on a HiSeq 2500 (Illumina, San Diego, CA, USA), aiming for 15 Gb of raw data per sample. k-mer analysis (k = 17) was performed using kmerfreq v5.0100 on the clean reads to estimate the genome size of each sample, ensuring a sequencing depth of at least 5 × for all species.
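The relationship between k-mer counts, genome size, and sequencing depth used in this kind of check can be sketched with the common approximation that genome size equals the total number of k-mers divided by the depth of the main k-mer peak. The sketch below illustrates that general approximation only and does not reproduce the internal procedure of kmerfreq; the function names and the example numbers are hypothetical.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Approximate genome size (bp) as total k-mer count / main peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth


def sequencing_depth(total_bases: int, genome_size: float) -> float:
    """Average sequencing depth given the sequenced bases and genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


if __name__ == "__main__":
    # Hypothetical sample: 30 billion 17-mers with the main peak at depth 20
    size = estimate_genome_size(30_000_000_000, 20)
    depth = sequencing_depth(15_000_000_000, size)  # 15 Gb of data
    print(size, depth, depth >= 5)
```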
Single-copy nuclear genes were identified using OrthoFinder v1.0.7101 and OrthoMCL85 on the predicted protein-coding gene sets from six representative Orchidaceae species with available genome data: Apostasia shenzhenica, Vanilla shenzhenica, Dendrobium catenatum, Phalaenopsis equestris, Ludisia discolor, and Epipogium roseum. Single-copy genes from D. catenatum identified in this analysis were subsequently used as the reference set. To obtain single-copy nuclear gene SNP data, clean reads were mapped to this reference gene set using BWA-MEM. SAM files were processed with Picard tools for reordering (ReorderSam), sorting (SortSam), and marking PCR duplicates (MarkDuplicates). The Genome Analysis Toolkit (GATK) v3.8-0102 was then used for local realignment around indels (RealignerTargetCreator and IndelRealigner) to minimize alignment errors. SNPs and InDels were called using GATK’s UnifiedGenotyper in BOTH mode, which calls SNPs and InDels together. The resulting variant call set was further filtered based on parameters such as FS (FisherStrand), QD (QualByDepth), and missing data rate to produce a high-quality SNP/InDel dataset. Chloroplast genome sequences for Dendrobium species were compiled from published data (Supplementary Data 10) and used as a reference dataset. SNP calling from the newly sequenced genomic data against this chloroplast reference followed the same pipeline described above for single-copy nuclear genes.
Potential reticulate evolutionary events within Dendrobium were investigated by constructing a phylogenetic network using SPLITSTREE4103. Divergence times were estimated using a Bayesian phylogenetic approach implemented in BEAST v1.8.4104. Two calibration points were applied. First, the most recent common ancestor (MRCA) of the Australian Dendrobium clade was constrained using the fossil Dendrobium winikaphyllum105, with a lognormal prior (mean = 22.0 Mya, SD = 1.0 Mya). Second, the crown age of Dendrobium was constrained based on the divergence time estimate from Givnish et al.106, also with a lognormal prior (mean = 29.0 Mya, SD = 1.0 Mya). The analysis employed the GTR + I + G substitution model, an uncorrelated lognormal relaxed molecular clock, and a Birth-Death Process speciation tree prior. Markov chain Monte Carlo (MCMC) analyses were run for 10,000,000 generations, with trees sampled every 5000 generations. The initial 10% of trees (1000 trees) were discarded as burn-in. A maximum clade credibility tree was summarized using TreeAnnotator v1.8.0, and node ages were visualized in FigTree v1.4.0.
Ancestral geographic ranges were reconstructed using the Statistical Dispersal-Vicariance Analysis (S-DIVA) method implemented in RASP107. The 9000 post-burn-in trees generated from the BEAST analysis were used as input for the S-DIVA analysis. Speciation rates across the Dendrobium phylogeny were estimated using the Bayesian Analysis of Macroevolutionary Mixtures (BAMM) software108. The analysis was performed on the 9000 post-burn-in trees obtained from the BEAST divergence time estimation.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Raw sequencing reads, Hi-C data, and assembled genomes for the eight de novo sequenced Dendrobium species, along with the genome skimming data for 204 Dendrobium species, have been deposited in the China National GeneBank (CNGB) Nucleotide Sequence Archive (CNSA) under the following accessions: CNP0005985 (D. spatella), CNP0005980 (D. exile), CNP0006326 (D. bullenianum), CNP0006327 (D. cariniferum), CNP0006328 (D. densiflorum), CNP0006329 (D. lindleyi), CNP0006330 (D. nobile), CNP0006331 (D. parcum), CNP0006332 (D. porphyrochilum), CNP0006333 (D. secundum), and CNP0006334 (D. thyrsiflorum). The same sequencing data and assemblies have also been deposited in the National Genomics Data Center (NGDC) under BioProject accession PRJCA049029. The genome annotation, phylogenetic trees and the related analysis data supporting the findings of this study can be found in Figshare [https://doi.org/10.6084/m9.figshare.30484487.v2]. Source data are provided with this paper.
References
Soltis, P. S., Folk, R. A. & Soltis, D. E. Darwin review: angiosperm phylogeny and evolutionary radiations. Proc. R. Soc. Lond. B Biol. Sci. 286, 20190099 (2019).
Cribb, P. J. & Govaerts, R. Just how many orchids are there? in Naturalia Publications (2005).
Pridgeon, A. M., Cribb, P. J., Chase, M. W. & Rasmussen, F. N. Genera Orchidacearum Volume 6: Epidendroideae (Part 3), (OUP Oxford, 2014).
Li, P., Li, L. & Wang, Y. Traditional uses, chemical compositions and pharmacological activities of Dendrobium: A review. J. Ethnopharmacol. 310, 116382 (2023).
Chen, B. et al. Dendrobium thyrsiflorum genome and its molecular insights into genes involved in important horticultural traits. Plant Biotechnol. J. 22, 2887–2889 (2024).
Xu, Q. et al. Chromosome-scale assembly of the Dendrobium nobile genome provides insights into the molecular mechanism of the biosynthesis of the medicinal active ingredient of Dendrobium. Front. Genet. 13, 844622 (2022).
Zhang, G. et al. The Dendrobium catenatum Lindl. genome sequence provides insights into polysaccharide synthase, floral development and adaptive evolution. Sci. Rep. 6, 19029 (2016).
Han, B. et al. A chromosome-level genome assembly of Dendrobium huoshanense using long reads and Hi-C data. Genome Biol. Evol. 12, 2486–2490 (2020).
Zhang, Y. et al. Chromosome-scale assembly of the Dendrobium chrysotoxum genome enhances the understanding of orchid evolution. Hortic. Res. 8, 183 (2021).
Li, Y. et al. Pangeneric genome analyses reveal the evolution and diversity of the orchid genus Dendrobium. Nat. Plants 11, 421–437 (2025).
Zhang, G. et al. The Apostasia genome and the evolution of orchids. Nature 549, 379–383 (2017).
Casal, J. J., Candia, A. N. & Sellaro, R. Light perception and signalling by phytochrome A. J. Exp. Bot. 65, 2835–2845 (2014).
Lin, R. & Wang, H. Arabidopsis FHY3/FAR1 gene family and distinct roles of its members in light control of Arabidopsis development. Plant Physiol. 136, 4010–4022 (2004).
Lin, R. et al. Transposase-derived transcription factors regulate light signaling in Arabidopsis. Science 318, 1302–1305 (2007).
Ai, Y. et al. The Cymbidium genome reveals the evolution of unique morphological traits. Hortic. Res. 8, 255 (2021).
Becker, A. & Theissen, G. The major clades of MADS-box genes and their role in the development and evolution of flowering plants. Mol. Phylogenet. Evol. 29, 464–489 (2003).
Callens, C., Tucker, M. R., Zhang, D. & Wilson, Z. A. Dissecting the role of MADS-box genes in monocot floral development and diversity. J. Exp. Bot. 69, 2435–2459 (2018).
Zhang, D. et al. Molecular insights into self-incompatibility systems: From evolution to breeding. Plant Commun. 5, 100719 (2024).
Zhao, H. & Xue, Y. Molecular and evolutionary mechanisms of self-incompatibility in angiosperms. Yi Chuan 46, 3–17 (2024).
Zhang, X. et al. Challenges and perspectives in the study of self-incompatibility in Orchids. Int. J. Mol. Sci. 22, 12901 (2021).
Niu, S. et al. Morphological type identification of self-incompatibility in Dendrobium and its phylogenetic evolution pattern. Int. J. Mol. Sci. 19, 2595 (2018).
Huu, C. N. et al. Female self-incompatibility type in heterostylous Primula is determined by the brassinosteroid-inactivating cytochrome P450 CYP734A50. Curr. Biol. 32, 671–676 (2022).
Huu, C. N., Keller, B., Conti, E., Kappel, C. & Lenhard, M. Supergene evolution via stepwise duplications and neofunctionalization of a floral-organ identity gene. Proc. Natl. Acad. Sci. USA 117, 23148–23157 (2020).
Niu, S. et al. Lack of S-RNase-based gametophytic self-incompatibility in orchids suggests that this system evolved after the Monocot-Eudicot split. Front. Plant Sci. 8, 1106 (2017).
Chen, F. et al. The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J. 66, 212–229 (2011).
Yang, J. et al. Evolution patterns of NBS genes in the genus Dendrobium and NBS-LRR gene expression in D. officinale by salicylic acid treatment. BMC Plant Biol. 22, 529 (2022).
Wood, H. P. The Dendrobiums. (A.R.G. Gantner Verlag K. G., Ruggell, Liechtenstein, 2006).
Yukawa, T. et al. Phylogenetic implications of chloroplast DNA restriction site variation in the subtribe Dendrobiinae (Orchidaceae). Lindleyana 8, 211–221 (1993).
Clements, M. A. Molecular phylogenetic systematics in the Dendrobiinae (Orchidaceae), with emphasis on Dendrobium section Pedilonum. Telopea 10, 247–298 (2003).
Xiang, X. G. et al. Molecular systematics of Dendrobium (Orchidaceae, Dendrobieae) from mainland Asia based on plastid and nuclear sequences. Mol. Phylogenet. Evol. 69, 950–960 (2013).
Burke, J. M. et al. Molecular phylogenetic analysis of Dendrobium (Orchidaceae), with emphasis on the Australian section Dendrocoryne, and implications for generic classification. Aust. Syst. Bot. 21, 1–14 (2008).
Guo, Y. Y. et al. Reticulate evolution and sea-level fluctuations together drove species diversification of slipper orchids (Paphiopedilum) in South East Asia. Mol. Ecol. 24, 2838–2855 (2015).
Skeels, A. et al. Paleoenvironments shaped the exchange of terrestrial vertebrates across Wallace’s Line. Science 381, 86–92 (2023).
Morley, R. J. et al. High-resolution Palaeogene sequence stratigraphic framework for the Cuu Long Basin, offshore Vietnam, driven by climate change and tectonics, established from sequence biostratigraphy. Palaeogeogr. Palaeoclimatol. Palaeoecol. 530, 113–135 (2019).
Morley, R. J. et al. Sequence biostratigraphic framework for the Oligocene to Pliocene of Malaysia: High-frequency depositional cycles driven by polar glaciation. Palaeogeogr. Palaeoclimatol. Palaeoecol. 561, 110058 (2021).
You, Y. et al. Transition of survival strategies under global climate shifts in the grape family. Nat. Plants 10, 1100–1111 (2024).
Westerhold, T. et al. An astronomically dated record of Earth’s climate and its predictability over the last 66 million years. Science 369, 1383–1387 (2020).
Kim, B. & Zhang, Y. G. Methane hydrate dissociation across the Oligocene–Miocene boundary. Nat. Geosci. 15, 203–209 (2022).
Frigola, A., Prange, M. & Schulz, M. Boundary conditions for the Middle Miocene Climate Transition (MMCT v1.0). Geosci. Model Dev. 11, 1607–1626 (2018).
Davies, A. A. et al. Role of BRCA2 in control of the RAD51 recombination and DNA repair protein. Mol. Cell 7, 273–282 (2001).
Zhang, S. & Klessig, D. F. MAPK cascades in plant defense signaling. Trends Plant Sci. 6, 520–527 (2001).
Schöttler, M. A., Flügel, C., Thiele, W., Stegemann, S. & Bock, R. The plastome-encoded PsaJ subunit is required for efficient Photosystem I excitation, but not for plastocyanin oxidation in tobacco. Biochem. J. 403, 251–260 (2007).
Xu, J. et al. Chemistry, bioactivity and quality control of Dendrobium, a commonly used tonic herb in traditional Chinese medicine. Phytochem. Rev. 12, 341–367 (2013).
Jiang, S., Jin, J., Sarojam, R. & Ramachandran, S. A comprehensive survey on the terpene synthase gene family provides new insight into its evolutionary patterns. Genome Biol. Evol. 11, 2078–2098 (2019).
Wang, C. et al. Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res. 25, 246–256 (2015).
Chen, M. et al. De novo genome assembly and Hi-C analysis reveal an association between chromatin architecture alterations and sex differentiation in the woody plant Jatropha curcas. GigaScience 9, giaa009 (2020).
Chin, C. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
Liu, H., Wu, S., Li, A. & Ruan, J. SMARTdenovo: a de novo assembler using long noisy reads. GigaByte 2021, 15 (2021).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7, gix120 (2018).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258 (2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
English, A. C. et al. Mind the Gap: Upgrading genomes with pacific biosciences RS long-read sequencing technology. PLoS ONE 7, e47768 (2012).
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Goff, S. A. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92–100 (2002).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
Lantican, D. V. et al. De novo genome sequence assembly of dwarf coconut (Cocos nucifera l. ‘Catigan Green Dwarf’) provides insights into genomic variation between coconut types and related palm species. G3 9, 2377–2393 (2019).
Singh, R. et al. Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature 500, 335–339 (2013).
Al-Mssallem, I. S. et al. Genome sequence of the date palm Phoenix dactylifera L. Nat. Commun. 4, 2274 (2013).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Cantarel, B. L. et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Ogata, H. et al. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 27, 29–34 (1999).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Koonin, E. V. et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 5, R7 (2004).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 32, D138–D141 (2004).
Attwood, T. K. PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res. 28, 225–227 (2000).
Corpet, F., Gouzy, J. & Kahn, D. Recent improvements of the ProDom database of protein domain families. Nucleic Acids Res. 27, 263–267 (1999).
Schultz, J. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234 (2000).
Mi, H., Poudel, S., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 44, D336–D342 (2016).
Selengut, J. D. et al. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 35, D260–D264 (2007).
Wilson, D. et al. SUPERFAMILY-sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386 (2009).
Pedruzzi, I. et al. HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res. 43, D1064–D1070 (2015).
Wu, C. H. et al. PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res. 32, D112–D114 (2004).
Lam, S. D. et al. Gene3D: expanding the utility of domain assignments. Nucleic Acids Res. 44, D404–D409 (2016).
Hulo, N. et al. The PROSITE database. Nucleic Acids Res. 34, D227–D230 (2006).
Li, L., Stoeckert, C. J. J. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
Bu, D. et al. KOBAS-i: Intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res. 49, W317–W325 (2021).
Toronen, P., Medlar, A. & Holm, L. PANNZER2: a rapid functional annotation web server. Nucleic Acids Res. 46, W84–W88 (2018).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Marcais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Chakraborty, M., Emerson, J. J., Macdonald, S. J. & Long, A. D. Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat. Commun. 10, 4872 (2019).
Jones, D. T., Taylor, W. R. & Thornton, J. M. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282 (1992).
Wang, Y. et al. Control of gametophytic self-incompatibility in the African wild rice. Preprint at https://doi.org/10.21203/rs.3.rs-2121145/v1 (2022).
Zhao, H. et al. Origin, loss, and regain of self-incompatibility in angiosperms. Plant Cell 34, 579–596 (2022).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 12, 323 (2011).
Fu, L. et al. findGSEP: estimating genome size of polyploid species using k-mer frequencies. Bioinformatics 40, btae647 (2024).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Huson, D. H. & Bryant, D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254–267 (2006).
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Conran, J. G., Bannister, J. M. & Lee, D. E. Earliest orchid macrofossils: Early Miocene Dendrobium and Earina (Orchidaceae: Epidendroideae) from New Zealand. Am. J. Bot. 96, 466–474 (2009).
Givnish, T. J. et al. Orchid phylogenomics and multiple drivers of their extraordinary diversification. Proc. R. Soc. B 282, 20151553 (2015).
Yu, Y., Harris, A. J. & He, X. S-DIVA (Statistical Dispersal-Vicariance Analysis): A tool for inferring biogeographic histories. Mol. Phylogenet. Evol. 56, 848–850 (2010).
Rabosky, D. L. et al. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nat. Commun. 4, 1958 (2013).
Zhang, L., Li, P., Zhang, X. & Li, J. Two floral forms in the same species—distyly. Planta 258, 72 (2023).
Acknowledgements
This project was supported by the National Natural Science Foundation of China (32300189, 32300492, 31800343), the Talent Project of Guangzhou Medical University High-level University Construction (06-410-2106132), the Forestry Peak Discipline Construction Project of Fujian Agriculture and Forestry University (72202200205), the National Key Research and Development Program of China (No. 2203YFD1600504) and the Science Research Project of Hebei Education Department (QN2024240).
Author information
Contributions
Conceptualization: Z.J.L., G.Q.Z., S.R.L., and Q.X. Methodology: B.J.C., J.Y.W., S.C.N., G.Q.Z., and S.R.L. Software and Formal Analysis: B.J.C., J.Y.W., S.C.N., Q.H.C., and Z.H.H. Data Curation and Visualization: B.J.C., X.F.C., and P.J.Z. Investigation and Resource Provision: Y.M.L., K.L.L., Z.Z.Z., Y.Y.L., C.L.Z., X.W.Z., J.L., X.H., L.Y.W., Y.H., Q.Y.Z., M.M.Z., D.Q.C., C.Y.Z., M.Y.Z., Z. Zhou, M.Z.H., Z. Zhao, D.H.P., L.H.Z., and B.C.H. Writing – Original Draft: B.J.C., J.Y.W., S.C.N., and Q.H.C. Writing – Review & Editing: W.C.T., S.R.L., G.Q.Z., Z.J.L., and Q.X. Supervision, Project Administration, and Funding Acquisition: Z.J.L., G.Q.Z., and S.R.L.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Peer review
Peer review information
Nature Communications thanks Ki-Tae Kim and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, BJ., Wang, JY., Niu, SC. et al. Comparative genomics analyses reveal genomic variation and evolutionary adaptation in Dendrobium orchids. Nat Commun 17, 63 (2026). https://doi.org/10.1038/s41467-025-66688-7
DOI: https://doi.org/10.1038/s41467-025-66688-7