Abstract
Pollinator-driven evolution of floral traits is thought to be a major driver of angiosperm speciation and diversification. Ophrys orchids mimic female insects to lure male pollinators into pseudocopulation. This strategy, called sexual deception, is species-specific, thereby providing strong premating reproductive isolation. Identifying the genomic architecture underlying pollinator adaptation and speciation may shed light on the mechanisms of angiosperm diversification. Here, we report the 5.2 Gb chromosome-scale genome sequence of Ophrys sphegodes. We find evidence for transposable element expansion that preceded the radiation of the O. sphegodes group, and for gene duplication having contributed to the evolution of chemical mimicry. We report a highly differentiated genomic candidate region for pollinator-mediated evolution on chromosome 2. The Ophrys genome will prove useful for investigations into the repeated evolution of sexual deception, pollinator adaptation and the genomic architectures that facilitate evolutionary radiations.
Similar content being viewed by others
Introduction
Understanding the genetic mechanisms of adaptation to pollinators is a central question in plant evolutionary biology. The prominent role of pollinators in flower evolution is related to their dual function: enabling sexual reproduction and imposing selection on floral traits1,2. This is particularly evident in plant species with a specialised pollination mechanism associated with floral traits that evolved to attract one or a few functionally alike pollinator species3,4. Therefore, pollinator-mediated evolution of floral traits is considered a major force driving angiosperm diversity by contributing to their radiation5,6. It has been hypothesised that radiations are facilitated by entering a new ecological niche with little or no competition from similar species (ecological opportunity), and the genetic potential allowing the necessary adaptations to evolve (genetic variation), ultimately leading to rapid bursts of speciation7,8,9.
Orchids of the Euro-Mediterranean genus Ophrys ensure reproduction through sexual deception. Specifically, they mimic the olfactory, visual and tactile signals of females of their pollinating insects to entice conspecific males to pseudocopulate with the flower, leading to pollination (Fig. 1a, b). Among the flower traits adapted to pollinators, olfactory signals are pivotal to specific pollinator attraction10,11, with selection by pollinators12 leading to strong odour differentiation among closely related species13. Additional adaptations to pollinators likely involve flower labellum colour, which matches pollinator body coloration, and floral morphology that optimises pollen transfer13. At the same time, conspicuous UV-reflective patterns and odour compounds not primarily required for sexual attraction are highly variable between plants and likely aid male pollinators in memorising and avoiding plants, thereby increasing outcrossing rates13,14,15. This extreme specialisation of Ophrys floral traits makes the plant-insect interaction highly species-specific, and the lack of shared pollinators between species leads to strong premating reproductive isolation12,13. Notwithstanding some uncertainty about the number of species and the extent of pollinator sharing16,17, at the local population level extreme pollinator specialisation is evident in Ophrys species12,13,18. The genus Ophrys is of relatively recent origin (c. 4.9 Ma crown age), with the earliest-diverged Ophrys lineage likely having been wasp-pollinated19,20. Two independent Ophrys lineages later experienced a pollinator shift to Andrena bee pollinators, preceding a burst of speciation in the O. sphegodes group within the last million years19. This resulted in a species radiation with one of the highest reported diversification rates among angiosperms19. While several facets of Ophrys speciation are relatively well understood12,13,19, the genomic features allowing its adaptation to diverse pollinators are largely unknown. Our ability to elucidate how genome architecture has contributed to Ophrys diversification and adaptation to pollinators has hitherto been hampered by the lack of a reference genome.
a Pseudocopulation of an Andrena nigroaenea male with an Ophrys sphegodes flower. Photo courtesy of Noa Schwabe. b Schematic representation of sexual deception. An Ophrys orchid mimics the pollinator’s female insect’s hairs, wings and pheromones to lure the male into pseudocopulation. Photos courtesy of Noa Schwabe. c The 18 assembled pseudochromosomes of O. sphegodes are labelled from 1 to 18. From outside to inside: chromosomes; gene density in light green; Gypsy retroelements in turquoise; Copia retroelements in blue; cytosine methylome in light red; between-species differentiation (global FST) along chromosomes in grey. d Recent LTR insertions vs. their age, showing that LTR expansion reached its maximum at around 1.3 to 0.8 Ma ago.
Here, we present the chromosome-level genome assembly of Ophrys sphegodes, a key representative of the genus and the adaptive radiation, to address questions on the genetic mechanisms underlying such rapid pollinator-driven speciation.
Results and discussion
Chromosome-level genome assembly
The orchid Ophrys sphegodes Mill. has a karyotype of 2n = 2x = 36 12,21,22. The haploid genome size was estimated to be 4.83 Gb by flow cytometry, and heterozygosity was estimated at 1.28% via k-mer analysis (Supplementary Figs. 1 and 2, Supplementary Table 1). To assemble the genome, we generated a total of 409 Gb data on the Nanopore PromethION platform (Supplementary Table 2, Supplementary Fig. 3). Additionally, whole-genome Illumina sequencing data (WGS; 268 Gb, Supplementary Table 3) and Hi-C chromatin conformation capture libraries (Illumina, Supplementary Table 3) were produced to perform polishing and anchoring of scaffolds, respectively. We used Miniasm assembly23 to generate a total of 11,148 contigs in 6.4 Gb, with an N50 value of 754 kb (Supplementary Table 4, Supplementary Fig. 4). We removed under-collapsed heterozygous contigs (1.2 Gb) and used Hi-C data to anchor the scaffolds into 18 pseudomolecules corresponding to the 18 chromosomes expected for the haploid genome (Fig. 1c, Supplementary Fig. 5a; Supplementary Table 5). The final assembled genome size was 5.2 Gb, in line with the estimated genome size, with a scaffold N50 of 218 Mb, L50 of 10 and L75 of 17 (Supplementary Table 4). Overall, 97.8% of raw Illumina WGS reads and 98.1% of an independent Pacific Biosciences (PacBio) dataset (not used for assembly; Supplementary Table 6) could be mapped to the assembly, suggesting that our assembly contains the complete genetic information. Gene region completeness was assessed using BUSCO (Benchmarking Universal Single-Copy Orthologs)24: 1453 out of 1612 (91.1%) conserved core land plant genes were found in our assembly, of which 1361 (84.9%) were complete (Supplementary Fig. 6a). We confidently annotated a total of 42,549 protein-coding genes, of which 90.0% had RNA-seq support or functional annotation (see Supplementary Tables 7, 8, Supplementary Note 1-2 and Supplementary Data 1–4 for genes of interest), which included information on Gene Ontology (GO) terms, protein domain information, putative pathways and enzyme function. Cytosine methylation was inferred from Nanopore sequencing data (Fig. 1c, Supplementary Figs. 7, 8).
A burst in transposon activity preceded adaptive radiation
We manually characterised transposable elements (TEs) in the genome following a protocol25 for TE classification that characterises TEs through a combination of homology-based prediction and manual inspection to find structural motifs and define sequence boundaries. To this end, we created a species-specific database of TEs for this orchid genome, containing a total of 436 sequences specific to O. sphegodes. Using this database, we identified a total of 4.05 Gb repetitive elements, occupying 78% of the O. sphegodes genome (Fig. 1c, Supplementary Table 9). Overall, O. sphegodes represents the largest orchid genome assembled to date and exhibits the highest abundance of long terminal repeat (LTR) elements (75% of the genome), more than Vanilla planifolia (10%)26, Apostasia shenzenica (17%)27, Dendrobium catenatum (40%)27, Phalaenopsis equestris (44%)27, Gastrodia elata (55%)28, Cymbidium sinense (55%)29, and similar to Platanthera guangdongensis and P. zijinensis (73% and 72%30; Supplementary Table 10). TE activity is known to influence genome size variation in eukaryotes31, and LTR/Gypsy and LTR/Copia in particular have previously been shown to correlate with genome size expansion in orchids32,33. We conducted an analysis of LTR insertion age based on the idea that both LTR sequences of a TE are identical at the time of insertion, but will diverge over time as mutations accumulate. In O. sphegodes, analysis of recent LTR insertions showed that LTR activity had an initial increase at around 3 Ma ago, to reach its maximum at around 1.3 to 0.8 Ma (Fig. 1d). During this period, the Mediterranean Basin experienced climatic oscillations with glacial/interglacial periods34,35. It is conceivable that such environmental disturbances36,37,38 might have led to bursts of TE proliferation in O. sphegodes, thus inflating its genome size. Interestingly, the peak of LTR element insertions precedes or overlaps with the radiation of the most species-rich Ophrys lineage, including the O. sphegodes species group, less than 1.0 Ma ago19. The ability to cope with genome size changes has allowed angiosperms to successfully diversify39 and TEs have played an important role in enhancing angiosperm evolution40 through their effects upon gene expression37, as well as gene duplications and genomic rearrangements41,42. Since TEs often carry transcription factor binding sites, TE expansion can rewire existing metabolic networks and facilitate the evolution of new compounds (or mixtures), as was shown for the evolution of nicotine biosynthesis in tobacco43. Thus, TE bursts can contribute to the generation of intraspecific genetic and metabolic diversity44. Since changes in pollinator-attractive hydrocarbon compounds are suspected to be involved early in speciation in the O. sphegodes lineage13, it is tempting to speculate that TE bursts may have provided this lineage with the genetic capacity to adapt to a new pollinator niche (i.e., Andrena bees), thereby facilitating the adaptive radiation of the O. sphegodes group.
Genome evolution through chromosome fusions in the Ophrys lineage
To understand the evolutionary history of the Ophrys lineage, we constructed a phylogenomic tree and estimated divergence times across O. sphegodes and 20 other plant species with fully sequenced genomes, based on single-copy orthologues. Ophrys diverged from Platanthera (both from Orchidoideae subtribe Orchidinae), the most closely related orchid with a fully sequenced genome30,45,46, approximately 22.42 Ma ago with a 95% confidence interval (CI) of 20.77–23.95 Ma, and Orchidoideae separated from other orchids around 54.49 Ma (CI 53.00–56.15 Ma; Fig. 2a, Supplementary Fig. 9). Our analysis further suggests that Orchidaceae separated from the common ancestor of Asparagales approximately 99.96 Ma ago (CI 98.66–105.66 Ma) and places monocot/eudicot divergence around 147.51 Ma (CI 139.87–154.45 Ma), in line with previous studies47,48. Our age estimates for orchids, while in line with results by Kim et al.49, are younger when compared with previous orchid genome studies27,30.
a Phylogenomic tree showing estimated divergence time and evolution of gene families for 21 plant species, colour indicating different plant groups (see inset). Divergence times are indicated by light blue bars at the internodes; their ranges indicate the 95% confidence intervals (CI) of the divergence time. Numbers at the branches indicate the number of expanded (blue) and contracted (red) gene families in the lineages. Bubbles at the tips indicate expanded (blue) and contracted (red) gene families per species (see Supplementary Table 11 and Supplementary Fig. 9 for CI intervals and number of expanded/contracted gene families). b Genome comparison between O. sphegodes and P. zijinensis. Chromosome comparison shows a high degree of collinearity and some chromosome rearrangements (see also Supplementary Fig. 10).
To track chromosome evolution of O. sphegodes, we compared it with the most closely related sequenced orchid genomes30, focusing on the comparison with Platanthera zijinensis (Fig. 2b). The two genera differ in their chromosome numbers, with karyotype organisation of Platanthera (n = 21)50 reflecting the ancestral and Ophrys (n = 18) the derived state45,51. Overall, most chromosomes maintained their structure between Platanthera and Ophrys, but some major rearrangements are apparent, particularly with regard to chromosome fusions. Chromosome (chr) 4 in O. sphegodes appears to be the product of a fusion between chr 7 and part of chr 4 in Platanthera, and Ophrys chr 10 derives from a fusion between Platanthera chr 15 and part of chr 14. Moreover, Platanthera chr 8 has no homologous chromosome in Ophrys, and significant parts of Platanthera chr 3, 4, 11, 14, 20 and 21 lack syntenic regions in O. sphegodes (Fig. 2b and Supplementary Fig. 10). Taken together, these findings are consistent with a reduction in chromosome number via fusions in the Ophrys lineage.
Expanded gene families include genes involved in flower development
Sexual deception is not restricted to the Euro-Mediterranean genus Ophrys, but is a worldwide phenomenon. It has originated several times independently and there are many examples among Australian orchids, whereas only a few non-orchid cases, such as the South African daisy Gorteria diffusa52, are known. Although specific pollinator interactions mediated by floral chemistry are a common theme in sexual deception53,54, it remains unknown why this pollination strategy occurs predominantly among orchids and what allowed its repeated evolution in this family. To gain insights into the genomic basis of sexual deception in Ophrys, we first identified orthologous gene families using OrthoFinder55. We identified a total of 495,819 genes in 26,709 orthogroups among the 21 species, of which 3054 are shared among all species. A total of 1351 families containing 4537 genes were unique to O. sphegodes. We then identified expanded and contracted gene families using CAFE56 (Fig. 2a). In O. sphegodes, 3712 gene families underwent an expansion, whereas 756 underwent a contraction. This is the highest level of gene family expansion reported among orchids to date, followed by V. planifolia (+3248) and D. catenatum (+1817). Of those gene families, 291 and 59 exhibited significant (p-value ≤ 0.01) expansions and contractions, respectively. Among the significantly expanded gene families were genes encoding transcription factors (TFs) involved in plant reproduction and flower development (but also other processes, e.g., stress responses), such as MADS-domain TFs (55 genes encoding type I and 6 genes of type II MADS; Supplementary Figs. 11, 12), MYB TFs (73 genes, of which 32 MYB-P; Supplementary Fig. 13), LATERAL ORGAN BOUNDARIES (LOB) domain TFs (10 genes), C2C2-GATA TFs (12 genes), WRKY TFs (11 genes) involved for instance in trichome development57, SERINE CARBOXYPEPTIDASE-LIKE-1 (SCPL-1) proteins (15 genes) controlling anthocyanin acylation58, TCP TFs (14 genes) regulating flavonoid biosynthesis and floral symmetry59,60, and YABBY TFs (6 genes) involved in establishing adaxial-abaxial polarity61. The configuration of MADS-box genes putatively involved in perianth specification in Ophrys appears similar to other orchids (Supplementary Fig. 12), including the related Orchis italica62, suggesting that the stark difference in their flowers likely results from the action of downstream genes. Secondly, disease resistance-related and stress response genes were found, such as NAC TFs (30 genes), glutathione S-transferases (28 genes) and plant PLEIOTROPIC DRUG RESISTANCE (PDR) proteins (10 genes). Finally, two previously identified putative candidate gene families for floral odour production and anthocyanin biosynthesis63 showed significant expansion too, namely fatty acyl-CoA reductases (FARs; Fig. 3) and chalcone synthases (CHSs; Supplementary Fig. 14), also involved in defence response (29 and 14 genes, respectively).
a Schematic representation of the core pathway showing the biosynthesis of a (Z)−9-alkene from fatty acyl precursors, depicting core proteins in blue boxes. b Gene copy numbers in the Ophrys (Osph, blue boxes) and other orchid genomes (green boxes), abbreviated by first letter of the genus and the first two letters of the species (as in Supplementary Table 11). Gene copy numbers were estimated by tallying the orchid members of orthogroups containing functionally annotated hydrocarbon biosynthetic genes. Gene families expanded in Ophrys are shown with a red outline. c Heatmaps of RNA-seq gene expression (green to blue; grey, not expressed) for O. sphegodes gene copies, showing expression for unpollinated mature flower labella of O. sphegodes (SPH) and O. exaltata (EXA). Numbers in red indicate significantly differentially expressed genes (FDR < 0.05).
Plant adaptation to pollinators in Ophrys may involve the local duplication of candidate genes
The key component for pollinator attraction in Ophrys species is the chemical mimicry of the pollinator female’s sex pheromone, the composition of which has previously been characterised for O. sphegodes s.l.10,64. Alkene hydrocarbons are especially important, as different proportions of (Z)−12-alkenes, (Z)−9-alkenes and (Z)−7-alkenes are the major odour differences between O. sphegodes and the closely related O. exaltata, responsible for attracting different pollinators65,66. Thus, genes involved in hydrocarbon biosynthesis are likely important for pollinator attraction63. We annotated previously identified candidate genes63 in the genome of O. sphegodes (Fig. 3, Supplementary Figs. 14 and 15, Supplementary Table 7, Supplementary Note 1 and Supplementary Data 1). Among these, structural annotation of stearoyl-ACP desaturases (SADs) showed that the key genes, SAD1 and SAD2 (SAD2-type), are duplicated in tandem in a single cluster on chromosome 4 (283.17–283.30 Mb; containing four copies; Fig. 4d), whereas the house-keeping desaturase SAD3 67 resides in one copy in chromosome 5 (280.45–280.47 Mb) and SAD4 (SAD5-type) is also present as a single copy (scaffold 75: 1.04–1.11 Mb; Supplementary Table 7 and Supplementary Data 1). For other O. exaltata SAD5-type genes, we identified four full-length copies in the O. sphegodes genome (at least two of them on the same scaffold 210), although none of these appeared to be a functional copy (Supplementary Table 7 and Supplementary Data 1). This is in line with previous findings that O. sphegodes only expresses functional SAD2-type alleles, while functional SAD5-type alleles are expressed in O. exaltata66,67. It is also consistent with gene expression patterns revealed by RNA-seq data (Supplementary Fig. 16). Phylogenetic analysis confirms the presence of separate SAD gene lineages corresponding to SAD3, SAD2-type (SAD1/2/7/8), and SAD5-type (SAD4/5/6/9/10, an incomplete SAD11) genes (Fig. 4a). Both SAD2- and SAD5-type lineages appear as single copies in Platanthera, suggesting they underwent recent gene duplication events. Of note, SAD2 homologues were only present in orchid genomes from the subfamily Orchidoideae.
a SAD gene family tree. b FAD gene family tree. c FAR gene family tree. Colour shading for taxa as in Fig. 2a. O. sphegodes genes are shown in bold, branch thickness indicating bootstrap support. The first three or four letters of each gene sequence indicate species (as in Supplementary Table 11), where Osph indicates O. sphegodes; Pgu and Pzi indicate Platanthera guangdongensis and P. zijinensis, respectively. Black bars highlight clustered genes shown in subsequent Figure panels. d SAD2 gene cluster on chr 4, containing SAD1/2/8/7Ψ (gene details in Supplementary Table 7 and Supplementary Data 1). e Details of a cluster on chr 1 containing 3 FAD homologues. f Details of scaffold 578 with 6 FAR gene homologues. Gene models of interest are drawn in red.
Like SADs, membrane-bound fatty acid desaturases (FADs)68, which contribute to fatty acid desaturation and, potentially, alkene production69, were also found duplicated in the O. sphegodes genome (Fig. 3). We found a phylogenetic lineage containing four FAD gene copies, of which at least three reside on chromosome 1 (368.58–368.63 Mb; Fig. 4e), a second lineage of two copies clustered on scaffold 33, while one gene, FAD2, was not duplicated. Both lineages showing gene duplications are present as single copies in the Platanthera genome (Fig. 4b).
Fatty acyl-CoA reductase (FAR) homologues show gene duplications in Ophrys. FARs likely catalyse the conversion of fatty acyl-CoA to primary alcohols and different FARs produce fatty alcohols with different acyl chain lengths70. Fifteen out of the 16 FAR homologues found in the O. sphegodes genome form a phylogenetic Orchidoideae gene clade together with a single Platanthera sequence. Of these, 14 genes are distributed over only three scaffolds (Fig. 4c; chr 5: 1; scaffold 279: 4; scaffold 133: 4; scaffold 578: 6). Six of these clustered FAR genes reside in one region of 109 kb on scaffold 578 (Fig. 4f). The close vicinity of these copies in a unique clade shows a recent duplication event in the O. sphegodes lineage (Fig. 4c). Gene duplication plays a crucial role in shaping the evolutionary landscape of genomes, as they provide the main raw material for the evolution of new genes71. Single or tandem gene duplications are also involved in the origin of many plant genes72. Often, retention of duplicate genes occurs non-randomly, as changes in the concentration of gene products can have a selective advantage for the organism73. These genes may provide Ophrys with an opportunity to respond to selection by pollinators, e.g., through positive dosage effects or neofunctionalisation and pseudogenisation of the less effective variants74.
Population genomic analyses reveal putative barrier loci under pollinator-driven selection
Closely related Ophrys species provide plausible examples of pollinator driven-speciation13,18. We therefore investigated the genetic differentiation between O. sphegodes and three other closely related, sympatric co-flowering species, O. exaltata, O. garganica and O. incubacea in Gargano, Southern Italy12,13. These four species are pollinated by sexual deception of different solitary bee males: Andrena nigroaenea, Colletes cunicularius, A. pilipes and A. morio, respectively75. These Ophrys species show variation in floral traits, ranging from labellum coloration (markedly blacker in O. garganica and O. incubacea13), to different floral odours mimicking their pollinators’ sex pheromones, which make a major contribution to reproductive isolation12,13,66,76. We used Genotyping-By-Sequencing (GBS) data13 to investigate the genetic differentiation between species and scanned the genome for signatures of selection. Chord between-population genetic distance77 was calculated per 1 Mb window to identify the most similar/dissimilar species at a given chromosomal region in the O. sphegodes genome. This analysis revealed that segregating polymorphisms between species are distributed across the genome, and that overall genetic (dis)similarity between O. sphegodes and the three other species is roughly equal (Fig. 5a, Supplementary Fig. 17), as may be expected in a species radiation. Yet, cumulatively, the four species were clearly separable in a principal coordinate analysis (PCoA; Fig. 5b; Supplementary Fig. 18 for individual chromosomes). These findings are in line with previous population analyses, suggesting that many polymorphisms in the genome are shared among all species13, whilst few barrier loci may separate them. Furthermore, global FST outlier analysis (FDR < 0.01) of a GBS dataset from 126 individuals13 revealed a highly differentiated region in chr 2 (333–352 Mb; Figs. 1c and 4c; Supplementary Fig. 19). Interestingly, in contrast to the genome-wide pattern, especially O. sphegodes and O. exaltata are clearly separated in the highly differentiated ~20 Mb region on chr 2, as seen by PCoA of this region (Fig. 5d). In an independent FST analysis based on RNA-seq data, this genomic region was confirmed as being differentiated between these species, although this dataset offers denser sampling of the genome and yielded a somewhat larger interval (327 – 358 Mb, 31 Mb in length; 0.6% of the genome; Supplementary Fig. 19 and Supplementary Data 5). Whether this pattern of differentiation is due to divergent selection and suppression of effective recombination via, e.g., divergence hitchhiking or an inversion, is currently unknown. Since the differentiated region of chr 2 did not contain any a priori candidate genes for hydrocarbon or pigment biosynthesis (Supplementary Table 7 and Supplementary Data 1, 2), we performed a GO enrichment analysis (Supplementary Data 6). This revealed term GO:1903415 (‘flavonoid transport from endoplasmic reticulum to plant-type vacuole’) as the most significantly enriched ‘biological process’ category. Three genes (Osph2G68830, Osph2G68850 and Osph2G68960) with this term are homologues of Arabidopsis TT9 (AT3G28430), a peripheral membrane protein necessary for vacuole development and the accumulation of anthocyanins in the vacuole78. Additionally, we screened genes in this region for (i) elevated FST between O. exaltata and O. sphegodes, (ii) annotated transcription factors, (iii) genes with excess amino acid change (i.e., excess non-synonymous nucleotide diversity), and (iv) differential floral labellum gene expression between the two species (Supplementary Fig. 19). Only 7 genes were found in more than one screen (Supplementary Fig. 19d, Supplementary Data 7), three of which had useful annotation information that revealed one AP2 and one B3-ARF TF (Osph2G66470 and Osph2G63210, respectively) as well as one ubiquitin conjugating enzyme (Osph2G63100). While it remains to be established whether any of these genes are directly linked to differences between the two Ophrys species, currently it seems more likely that regulatory rather than biosynthetic genes contribute to this genomic region of differentiation. Overall, using the metaphor of genomic islands of speciation79, these results suggest that adaptation involving individual loci (or small genomic segments) rather than entire genomes, characterises these reproductively isolated sympatric Ophrys species. The overall high level of allele sharing and the genomic mosaic pattern of species relationships (without long contiguous stretches of similar relationships) cannot easily be explained by gene flow after a secondary contact of species that are separated by strong floral isolation12. An alternative explanation could be that the high level of segregating polymorphisms is due to shared ancestry and large effective population sizes, suggesting that the species are in an early stage of genomic divergence11,12,65,79. Our population genomic analyses identified candidate regions potentially under pollinator-driven selection, thus calling for future research into the roles and functions of the genes in these regions.
a Genetic distance among the different species of the O. sphegodes s.l. group plotted along the genome, colours showing the most genetically dissimilar species for a given 1 Mb window (see Supplementary Fig. 17 for genetic similarity). b Genome-wide PCoA plot of pairwise distances between plant individuals revealed genetic separation between the four species. c Global FST among the 4 species was elevated in one region on chr 2 (327–346 Mb, shown as black bar). d PCoA of this region of chr 2 shows increased separation between O. sphegodes and O. exaltata. Colours show the most dissimilar species in panels a and c (grey: no data), and species identity in panels b and d, where O. sphegodes is shown in blue; O. exaltata, O. garganica and O. incubacea are shown in red, green and yellow, respectively.
The chromosome-level genome sequence of O. sphegodes, a plant with high pollinator specificity, provides important insights into the evolution of plant adaptation to pollinators and its role in species diversification. It seems that, ecologically, the recent adaptive radiation of this group was fuelled by the availability of pollinator niches and, genetically, a burst in TE activity and rampant gene duplication provided the genetic raw material for pollinator-mediated selection to act upon. These mechanisms may also provide a blueprint for other angiosperm radiations, as floral trait-mediated reproductive isolation may often be underlain by a simple genetic architecture2. More broadly, this case study supports the idea that the generation of genomic diversity, as genomic potential in the form of genome duplication, hybridisation or TE activity, often precedes adaptive radiation.
Methods
Sample preparation
The single O. sphegodes individual for the genome assembly (accession SPH_8) was selected among several samples previously collected in Capoiale, Gargano area, Southern Italy12, and grown in a pot at the Dept. of Biology, University of Naples Federico II, under natural light at ambient temperature. Fresh young leaves were harvested, snap-frozen, and high-molecular-weight genomic DNA was isolated with an SDS lysis buffer supplemented with β-mercaptoethanol, followed by purification via phenol/chloroform extraction and carboxylated magnetic beads80.
Genome size estimation
Nuclear DNA content and thus genome size was estimated via flow cytometry. Since genome size estimation via flow cytometry from orchid leaves is challenging due to DNA endoreduplication81, we used pollinia (carrying haploid pollen) for the analysis. We followed a previously published protocol82 with slight modifications. Briefly, a pair of pollinia was crushed in a 1.5 ml tube with Otto I buffer using a clean pestle, transferred to a Petri dish, and co-chopped with 2 × 2 cm tissue of a reference leaf (Solanum lycopersicum cultivar ‘Stupicke polni tyckove rane’; tomato 1C = 0.98 pg, as measured in ref. 83). The suspension was filtered, mixed with Otto II buffer, and stained with propidium iodide in the dark at 4 °C for 1 hour. At least 10,000 nuclei were analysed on a Cytoflex S (Beckman Coulter) flow cytometer. The average fluorescence value (Mean PerCP-A, see Supplementary Fig. 1) of reference 2C nuclei was 4,207,840; the average fluorescence for O. sphegodes 1C nuclei was 10,506,736 (Supplementary Table 1). Thus, following a published formula84, and converting picograms of DNA to number of nucleotide pairs85, we estimated a haploid genome size of 4.83 Gb (average of three measurements, see Supplementary Table 1).
Nanopore library construction, sequencing, and genome assembly
DNA isolated from accession SPH_8 was used to prepare two Illumina and eight ONT libraries (Supplementary Table 2), following the general guidelines provided by Oxford Nanopore Technologies® for the 1D Genomic DNA by Ligation (SQK-LSQ109) protocol, with modifications proposed by New England Biolabs® (NEB)80. Six ONT libraries were sequenced on a PromethION PTC0031 platform (Oxford Nanopore Technologies), and two on MinION Mk1B. All sequencing was performed at Agroscope Wädenswil, Switzerland. We used Miniasm v0.3 and Minimap2 v2.1723 to assemble the initial contigs. Pilon v1.2386 and Racon v1.4.387 were used to correct indels and mis-assemblies with Illumina reads (Supplementary Table 3 and Supplementary Method 1). To deal with the heterozygosity of the O. sphegodes genome, we used Redundans v0.1188 to remove under-collapsed contigs and obtain a final assembled genome size of 5.2 Gb (see Supplementary Method 1). Raw Nanopore sequencing data were subsequently re-basecalled with Guppy v5.0.11, and Nanopolish v0.13.3 was used to detect 5-methylcytosine (see Supplementary Method 2).
Hi-C library preparation and assembly scaffolding
A total of 0.5 g leaf tissue were fixed in nuclei isolation buffer and 36% formaldehyde, followed by cell lysis, chromatin digestion with the enzyme HindIII-HF (New England Biolabs, Ipswich, MA, USA, R3104L), re-ligation, DNA re-extraction and library preparation89,90 as detailed in Supplementary Method 1. Four Hi-C libraries were sequenced on an Illumina NovaSeq 6000 SP FlowCell. To assemble the chromosomes, raw Hi-C data was mapped against the genome assembly using the ArimaGenomics mapping script (https://github.com/ArimaGenomics/mapping_pipeline/blob/master/01_mapping_arima.sh). Subsequently, Salsa v2.391,92 was used to build scaffolds (total scaffolds 2511, N50 = 4.8 Mb); Juicebox Assembly Tools (JBAT)93 and the 3D-DNA pipeline v-18011494 were used to order scaffolds to chromosomes.
Gene and non-coding RNA annotation
Gene annotation was carried out via an integrative approach that included de novo gene prediction, homology-based prediction and transcriptome-based prediction. AUGUSTUS v3.4.0 was used as part of the BRAKER2 v2.1.6 pipeline95 and trained with RNA-seq data20 from floral tissues to predict coding regions in the repeat-masked genome. ProtHint v2.6.0 (https://github.com/gatech-genemark/ProtHint) was used to generate protein hints using the protein database liliopsida_odb10-v.2020-09-10 from OrthoDB10, to score intron intervals, start and stop codons from ultra-conserved proteins of the monocot lineage. This extrinsic evidence was given to GeneMark-EX v4.6496 to self-train, and improve prediction accuracy of AUGUSTUS. Proteomes from P. equestris27, C. goeringii97, O. sativa ssp. japonica cultivar Nipponbare IRGSP-1.098, and A. officinalis Aspof.V199 were downloaded and used as input for GeMoMa v1.8100 for homology-based prediction. The transcriptome of O. sphegodes20 was mapped against the genome using PASA v2.5.1 (Program to Assemble Spliced Alignment; https://github.com/PASApipeline/PASApipeline/blob/master/docs/index.asciidoc). Valid transcript alignments were clustered based on genome mapping location and assembled into gene structures. PASA assemblies were incorporated into gene predictions to correct exon boundaries, add UTRs, and update gene structures. AHRD v3.3.3 (https://github.com/groupschoof/AHRD) was used to filter genes based on SwissProt, trEMBL and TAIR10 databases, and candidate genes for pollinator attraction from previous studies were manually included. Functional annotation was carried out using TRAPID v2.0101 for Gene Onthology (GO) terms, InterPro-Scan, KEGG, and iTAK Classifier102 for transcription factors and protein kinases. Genes for tRNA and rRNA were identified using tRNAscan-SE v2.0.9103 and Barrmap v0.9 (https://github.com/tseemann/barrnap; see Supplementary Method 3 for details).
Repetitive element annotation
We used a published approach25 to classify and manually annotate repetitive elements in the genome. Putative TEs identified with RepeatModeler v2.0.1 were blasted against the TRansposable Elements Platform (TREP) database (v.2019, https://trep-db.uzh.ch/). Sequences with strong hits to a known family were selected (plus their flanking regions) and blasted against the O. sphegodes genome to identify other copies belonging to the same TE family. Multiple sequence alignments between those clusters of sequences were performed to create consensus TE sequences, and to search for structural motifs that could help classify them. Sequences with no strong hits to a known family were blasted against PTREP (Protein TRansposable Elements Platform) or Dfam (https://dfam.org) or examined for the presence of structural motifs to identify TEs in the genome. All TE sequences were classified according to a three-letter code25.
Analysis of LTR divergence and insertion time
LTR_harvest v2.0.0104 was used to identify LTRs with complete structure (-xdrop 37 -motif tgca -motifmis 1 -minlenltr 100 -maxlenltr 3000 -mintsd 2), expected to be detectable for young LTR copies. The insertion time was estimated based on the sequence divergence between two LTRs of the same type. 5’ and 3’ LTR of each full-length paralogue were aligned using Mafft105 and divergence between them was estimated using the Kimura two-parameter model using the distmat program implemented in the EMBOSS package v6.6.0 (https://emboss.sourceforge.net/). LTRs insertion age was estimated using a substitution rate of 1.3 × 10−8 substitutions per site per year106.
Phylogenomic analysis and estimation of divergence time
The protein-coding regions of 21 plant species with a sequenced genome (Supplementary Tables 11 and 12) were used for phylogenomic analysis. Orthologue groups were identified using OrthoFinder v2.5.455, using Mafft for multisequence alignment and fastTree107 for building the trees. Among 526,955 genes from the 21 species, 495,819 (92.1%) were assigned into 26,709 orthologue groups. The concatenated supermatrix of 34 single-copy nuclear genes was used to infer phylogenetic relationships using the GTR-GAMMA model with 1000 bootstrap replicates in RAxML v8.2.11108. The same supermatrix and corresponding concatenated ML tree were used for dating analysis. We estimated divergence times using the penalised likelihood method implemented in TreePL v1.0109. We considered 83.78 Ma and 114.92 Ma as the minimum- and maximum-age calibrations of the stem age of orchids49. We used an optimal smoothing parameter determined by the “random subsample and replicate” cross-validation method to accommodate rate heterogeneity. A total of 1000 bootstrap replicates with the topology fixed to the concatenated ML tree and branch lengths allowed to vary were also generated using RAxML for calculating the confidence intervals of age estimates. Results from the dating of the bootstrapped trees were then summarised and visualised on the concatenated ML tree using TreeAnnotator (part of the BEAST2 v2.7.0 package110).
We used CAFE v5.0.056 to identify gene family contractions and expansions, using the reconstructed species tree. Only gene families were kept in which among-species difference in size was less than 100. First, nucleotide sequences of the previously identified single-copy orthologues were translated into amino acids with TranslatorX v1.1 within the MitoPhAST-master v3.0 package111, and then given as input to MUSCLE v5.1.0112 for multiple sequence alignment (MSA). The resulting MSAs were trimmed with trimAL v1.2113 (parameter -automated1 to select optimal cut-off based on alignment similarity) to remove poorly aligned regions or spurious sequences, and the best-fit model for each tree was computed with ModelTest-NG v0.1.7114 (parameters -f ef -h f -s 3 -tr phyml). The final trees were built with phyml v3.3.20220408 (parameters -d nt -c 4 -m AIC -f modelfreq -o modelparam) (https://github.com/stephaneguindon/phyml) using the parameters from ModelTest-NG and 1001 bootstrap replicates.
Comparative genomic analyses
Synteny analyses between O. sphegodes and P. guangdongensis and P. zijinensis were performed using the MCScan115 functions of the JCVI utility libraries v1.2.9 (https://github.com/tanghaibao/jcvi). First, jcvi.compara.catalog ortholog (--min_size 5 -dist 35) used LAST v2.34116 to identify syntenic blocks between the O. sphegodes genome and another genome (see also Supplementary Table 13). Second, jcvi.compara.synteny screen (--minspan 30 -simple) was used to create simplified versions of the anchor files containing the syntenic blocks. Finally, linear synteny plots were made with jcvi.graphics.karyotype and dot-plots with jcvi.graphics.dotplot.
Population genetic analyses
Previously published GBS data from 126 individuals13 and RNA-seq data from floral tissues of 32 accessions20 were used to analyse the genetic variation between Ophrys species. Both datasets were mapped against the genome using BWA-MEM2 v2.2.1117 with default parameters, and variants (SNPs and small InDels; Supplementary Tables 14 and 15) were identified using freebayes v1.0.2 (https://github.com/freebayes/freebayes) with minimum depth 2 (-C 2). Samtools v1.10118 was used to remove duplicated reads prior to variant calling. Raw variants (1,680,249, of which 1,621,433 biallelic GBS markers) were filtered for depth and frequency, and pairwise genetic distances (comparison matrix in Supplementary Table 16) calculated between individuals using SPA v0.1 (https://peb.uni-hohenheim.de/spa) as detailed in Supplementary Method 4. Biallelic SNPs were used with BayeScan v2.1119 to identify candidate loci under selection in a “global” manner, i.e., treating each species as a subpopulation, and in a “pairwise” fashion, comparing two species at a time. Chord distance77,120 was calculated and PCoA was carried out genome-wide and per chromosome. A detailed description of population genetic methods can be found in Supplementary Method 4 and, for analyses of chromosome 2, in Supplementary Method 5.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The genome assembly and raw sequencing data generated for this study, including ONT data, PacBio data, Illumina WGS and HiC data, were submitted to NCBI under BioProject number PRJNA994461. The Whole Genome Shotgun project was deposited at DDBJ/ENA/GenBank under the accession JBANGT000000000. The version described in this paper is version JBANGT010000000. The RNA-seq data used for genome annotation and expression analysis can be found on NCBI under accession PRJNA57427920. The GBS data used for population genetic analysis can be found in the NCBI accession PRJNA25733113. Annotation data, including protein-coding gene annotation, transposable element database and annotation, non-coding RNA, as well as the alternative haplotig fasta file can be found on figshare [https://doi.org/10.6084/m9.figshare.25398166]. Transposable element sequences were also deposited in the TREP database [https://trep-db.uzh.ch/]. Source data are provided with this paper.
References
Waser, N. M. Pollination, angiosperm speciation, and the nature of species boundaries. Oikos 82, 198–201 (1998).
Schlüter, P. M. The magic of flowers or: speciation genes and where to find them. Am. J. Bot. 105, 1957–1961 (2018).
Armbruster, W. S. Floral specialization and angiosperm diversity: phenotypic divergence, fitness trade-offs and realized pollination accuracy. Ann. Bot. Plants 6, plu003 (2014).
Gómez, J. M. & Zamora, R. Generalization vs. specialization in the pollination system of Hormathophylla spinosa (Cruciferae). Ecology 80, 796–805 (1999).
Van der Niet, T. & Johnson, S. D. Phylogenetic evidence for pollinator-driven diversification of angiosperms. Trends Ecol. Evol. 27, 353–361 (2012).
Grant, V. Pollination systems as isolating mechanisms in angiosperms. Evolution 3, 82–97 (1949).
Bouchenak-Khelladi, Y. et al. On the complexity of triggering evolutionary radiations. N. Phytol. 207, 313–326 (2015).
Seehausen, O. Process and pattern in cichlid radiations - inferences for understanding unusually high rates of evolutionary diversification. N. Phytol. 207, 304–312 (2015).
Nürk, N. M. et al. Diversification in evolutionary arenas—Assessment and synthesis. Ecol. Evol. 10, 6163–6182 (2020).
Schiestl, F. P. et al. Sex pheromone mimicry in the early spider orchid (Ophrys sphegodes): patterns of hydrocarbons as the key mechanism for pollination by sexual deception. J. Comp. Physiol A 186, 567–574 (2000).
Mant, J. et al. Cuticular hydrocarbons as sex pheromone of the bee Colletes cunicularis and the key to its mimicry by the sexually deceptive orchid, Ophrys. exaltata. J. Chem. Ecol. 31, 1765–1787 (2005).
Xu, S. et al. Floral isolation is the main reproductive barrier among closely related sexually deceptive orchids. Evolution 65, 2606–2620 (2011).
Sedeek, K. E. M. et al. Genic rather than genome-wide differences between sexually deceptive Ophrys orchids with different pollinators. Mol. Ecol. 23, 6192–6205 (2014).
Stejskal, K. et al. Functional significance of labellum pattern variation in a sexually deceptive orchid (Ophrys heldreichii): evidence of individual signature learning effects. PLoS One 10, e0142971 (2015).
Ayasse, M. et al. Evolution of reproductive strategies in the sexually deceptive orchid Ophrys sphegodes: how does flower‐specific variation of odor signals influence reproductive success? Evolution 54, 1995–2006 (2000).
Vereecken, N. J. et al. Integrating past and present studies on Ophrys pollination-a comment on Bradshaw et al. Bot. J. Linn. Soc. 165, 329–335 (2011).
Bateman, R. M. et al. Species arguments: clarifying competing concepts of species delimitation in the pseudo-copulatory orchid genus Ophrys. Bot. J. Linn. Soc. 165, 336–347 (2011).
Xu, S., Schlüter, P. M. & Schiestl, F. P. Pollinator-driven speciation in sexually deceptive orchids. Int. J. Ecol. 2012, 285081 (2012).
Breitkopf, H. et al. Multiple shifts to different pollinators fuelled rapid diversification in sexually deceptive Ophrys orchids. N. Phytol. 207, 377–389 (2015).
Piñeiro Fernández, L. et al. A phylogenomic analysis of the floral transcriptomes of sexually deceptive and rewarding European orchids, Ophrys and Gymnadenia. Front. Plant Sci. 10, 1553 (2019).
Greilhuber, J. & Ehrendorfer, F. Chromosome numbers and evolution in Ophrys (Orchidaceae). Plant Syst. Evol. 124, 125–138 (1975).
Bianco, P. et al. Karyological studies of some taxa of the genus Ophrys (Orchidaceae) from Apulia (Italy). Caryologia 42, 57–63 (1989).
Li, H. Minimap and miniasm: Fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).
Simão, F. A. et al. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).
Hasing, T. et al. A phased Vanilla planifolia genome enables genetic improvement of flavour and production. Nat. Food 1, 811–819 (2020).
Zhang, G. Q. et al. The Apostasia genome and the evolution of orchids. Nature 549, 379–383 (2017).
Yuan, Y. et al. The Gastrodia elata genome provides insights into plant adaptation to heterotrophy. Nat. Commun. 9, 1625 (2018).
Yang, F. X. et al. The genome of Cymbidium sinense revealed the evolution of orchid traits. Plant Biotechnol. J. 19, 2501–2516 (2021).
Li, M. H. et al. Genomes of leafy and leafless Platanthera orchids illuminate the evolution of mycoheterotrophy. Nat. Plants 8, 373–388 (2022).
Kidwell, M. G. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115, 49–63 (2002).
Chumová, Z. et al. Repeat proliferation and partial endoreplication jointly shape the patterns of genome size evolution in orchids. Plant J. 107, 511–524 (2021).
Eriksson, M. C. et al. Repeat dynamics across timescales: a perspective from sibling allotetraploid marsh orchids (Dactylorhiza majalis s.l.). Molec. Biol. Evol. 39, msac167 (2022).
Quézel, P. La flore du bassin méditerranéen: origine, mise en place, endémisme. Ecol. Mediterr. 21, 19–39 (1995).
Thompson, J. D. Plant Evolution in the Mediterranean. Insights for conservation. 2nd ed. Ch. 1 (Oxford University Press, 2020).
Chuong, E. B. et al. Regulatory activities of transposable elements: From conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
Dubin, M. J. et al. Transposons: a blessing curse. Curr. Opin. Plant Biol. 42, 23–29 (2018).
Meier, J. I. et al. Cycles of fusion and fission enabled rapid parallel adaptive radiations in African cichlids. Science 381, eade2833 (2023).
Puttick, M. N. et al. Size is not everything: rates of genome size evolution, not C-value, correlate with speciation in angiosperms. Proc. R. Soc. B Biol. Sci. 282, 20152289 (2015).
Oliver, K. R. et al. Transposable elements: powerful contributors to angiosperm evolution and diversity. Genome Biol. Evol. 5, 1886–1901 (2013).
Zhao, D. et al. What makes up plant genomes: The vanishing line between transposable elements and genes. Biochim. Biophys. Acta Gene Regul. Mech. 1859, 366–380 (2016).
Balachandran, P. et al. Transposable element-mediated rearrangements are prevalent in human genomes. Nat. Commun. 13, 7115 (2022).
Xu, S. et al. Wild tobacco genomes reveal the evolution of nicotine biosynthesis. Proc. Natl Acad. Sci. USA 114, 6133–6138 (2017).
Lisch, D. How important are transposons for plant evolution? Nat. Rev. Genet. 14, 49–61 (2013).
Bateman, R. M. et al. Molecular phylogenetics and evolution of Orchidinae and selected Habenariinae (Orchidaceae). Bot. J. Linn. Soc. 142, 1–40 (2003).
Wong, D. C. J. & Peakall, R. Orchid phylotranscriptomics: the prospects of repurposing multi-tissue transcriptomes for phylogenetic analysis and beyond. Front. Plant Sci. 13, 910362 (2022).
Bell, C. D. et al. The age and diversification of the angiosperms re-revisited. Am. J. Bot. 97, 1296–1303 (2010).
Hertweck, K. L. et al. Phylogenetics, divergence times and diversification from three genomic partitions in monocots. Bot. J. Linn. Soc. 178, 375–393 (2015).
Kim, Y.-K. et al. Plastome evolution and phylogeny of Orchidaceae, with 24 new sequences. Front. Plant Sci. 11, 22 (2020).
Turco, A. et al. Contribution to the study of wild Orchidaceae, genus Platanthera L.C.M. Richard. Karyotype and C-banding analysis of two species from Italy. Caryologia 4, 11–16 (2021).
Inda, L. A. et al. Phylogenetics of tribe Orchideae (Orchidaceae: Orchidoideae) based on combined DNA matrices: inferences regarding timing of diversification and evolution of pollination syndromes. Ann. Bot. 110, 71–90 (2012).
Kellenberger, R. T. et al. Multiple gene co-options underlie the rapid evolution of sexually deceptive flowers in Gorteria diffusa. Curr. Biol. 33, 1502–1512 (2023).
Peakall, R. Pollination by sexual deception. Curr. Biol. 33, R489–R496 (2023).
Bohman, B. et al. Pollination by sexual deception - it takes chemistry to work. Curr. Opin. Plant Biol. 32, 37–46 (2016).
Emms, D. M. & Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Bio, F. et al. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36, 5516–5518 (2020).
Johnson, C. S. et al. TRANSPARENT TESTA GLABRA2, a trichome and seed coat development gene of Arabidopsis, encodes a WRKY transcription factor. Plant Cell 14, 1359–1375 (2002).
Curaba, J. et al. Identification of an SCPL gene controlling anthocyanin acylation in carrot (Daucus carota L.) root. Front. Plant Sci. 10, 1770 (2020).
Corley, S. B. et al. Floral asymmetry involves an interplay between TCP and MYB transcription factors in Antirrhinum. Proc. Natl Acad. Sci. USA 102, 5068–5073 (2005).
Li, S. & Zachgo, S. TCP3 interacts with R2R3-MYB proteins, promotes flavonoid biosynthesis and negatively regulates the auxin response in Arabidopsis thaliana. Plant J. 76, 901–913 (2013).
Chen, Y. Y. et al. Genome-wide identification of YABBY genes in Orchidaceae and their expression patterns in Phalaenopsis orchid. Genes 11, 955 (2020).
Valoroso, M. C. et al. The MADS-box genes expressed in the inflorescence of Orchis italica (Orchidaceae). PLoS One 14, e0213185 (2019).
Sedeek, K. E. M. et al. Transcriptome and proteome data reveal candidate genes for pollinator attraction in sexually deceptive orchids. PLoS One 8, e64621 (2013).
Bohman, B. et al. Identification of (Z)−8-heptadecene and n-pentadecane as electrophysiologically active compounds in Ophrys insectifera and its Argogorytes pollinator. Int. J. Mol. Sci. 21, 620 (2020).
Schlüter, P. M. et al. Stearoyl-acyl carrier protein desaturases are associated with floral isolation in sexually deceptive orchids. Proc. Natl Acad. Sci. USA 108, 5696–5701 (2011).
Xu, S. et al. The genetic basis of pollinator adaptation in a sexually deceptive orchid. PLoS Genet 8, e1002889 (2012).
Sedeek, K. E. M. et al. Amino acid change in an orchid desaturase enables mimicry of the pollinator’s sex pheromone. Curr. Biol. 26, 1505–1511 (2016).
Shanklin, J. & Cahoon, E. B. Desaturation and related modifications of fatty acids. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49, 611–641 (1998).
Perera, M. A. D. N. et al. Biological origins of normal-chain hydrocarbons: a pathway model based on cuticular wax analyses of maize silks. Plant J. 64, 618–632 (2010).
Wang, M. et al. Five fatty acyl-coenzyme A reductases are involved in the biosynthesis of primary alcohols in Aegilops tauschii leaves. Front. Plant Sci. 8, 1012 (2017).
Ohno, S. Evolution by gene duplication (Springer, 1970).
Magadum, S. et al. Gene duplication as a major force in evolution. J. Genet. 92, 155–161 (2013).
Innan, H. & Kondrashov, F. The evolution of gene duplications: classifying and distinguishing between models. Nat. Rev. Genet. 11, 97–108 (2010).
Francino, M. P. An adaptive radiation model for the origin of new gene functions. Nat. Genet. 37, 573–578 (2005).
Paulus, H. & Gack, C. Pollinators as prepollinating isolation factors: evolution and speciation in Ophrys (Orchidaceae). Isr. J. Bot. 39, 43–79 (1990).
Schiestl, F. P. & Schlüter, P. M. Floral isolation, specialized pollination, and pollinator behavior in orchids. Annu. Rev. Entomol. 54, 425–446 (2009).
Cavalli-Sforza, L. L. & Edwards, A. W. F. Phylogenetic analysis models and estimation procedures. Evolution 21, 550–570 (1967).
Ichino, T. et al. GFS9/TT9 contributes to intracellular membrane trafficking and flavonoid accumulation in Arabidopsis thaliana. Plant J. 80, 410–423 (2014).
Nosil, P. & Feder, J. L. Genomic divergence during speciation: causes and consequences. Philos. Trans. R. Soc. B Biol. Sci. 367, 332–342 (2012).
Russo, A. et al. Low-input high-molecular-weight DNA extraction for long-read sequencing from plants of diverse families. Front. Plant Sci. 13, 883897 (2022).
Trávníček, P. et al. Challenges of flow-cytometric estimation of nuclear genome size in orchids, a plant group with both whole-genome and progressively partial endoreplication. Cytom. Part A 87, 958–966 (2015).
Temsch, E. M. et al. Heavy metal pollution, selection, and genome size: the species of the Žerjav study revisited with flow cytometry. J. Bot. 2010, 596542 (2010).
Doležel, J. et al. Comparison of three DNA fluorochromes for flow cytometric estimation of nuclear DNA content in plants. Physiol. Plant. 85, 625–631 (1992).
Doležel, J. et al. Estimation of nuclear DNA content in plants using flow cytometry. Nat. Protoc. 2, 2233–2244 (2007).
Doležel, J. et al. Nuclear DNA content and genome size of trout and human. Cytometry 51A, 127–128 (2003).
Walker, B. J. et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Nagarajan, N. et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Pryszcz, L. P. & Gabaldón, T. Redundans: An assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44, e113 (2016).
Grob, S. et al. Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol. Cell 55, 678–693 (2014).
Liu, C. In situ Hi-C library preparation for plants to study their three-dimensional chromatin interactions on a genome-wide scale. Methods Mol. Biol. 1629, 155–166 (2017).
Ghurye, J. et al. Scaffolding of long read assemblies using long range contact information. BMC Genomics 18, 527 (2017).
Ghurye, J. et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLOS Comput. Biol. 15, e1007273 (2019).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Hoff, K. J. et al. Whole-genome annotation with BRAKER. Methods Mol. Biol. 1962, 65–95 (2019).
Brůna, T. et al. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genomics Bioinforma. 2, lqaa026 (2020).
Chung, O. et al. A chromosome-scale genome assembly and annotation of the spring orchid (Cymbidium goeringii). Mol. Ecol. Resour. 22, 1168–1177 (2021).
Kawahara, Y. et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 6, 3–10 (2013).
Harkess, A. et al. The asparagus genome sheds light on the origin and evolution of a young Y chromosome. Nat. Commun. 8, 1279 (2017).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89–e89 (2016).
Bucchini, F. et al. TRAPID 2.0: a web application for taxonomic and functional analysis of de novo transcriptomes. Nucleic Acids Res. 49, e101–e101 (2021).
Zheng, Y. et al. iTAK: A program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant 9, 1667–1670 (2016).
Chan, P. P. et al. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Ellinghaus, D. et al. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinforma. 9, 18 (2008).
Katoh, K. et al. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Ma, J. & Bennetzen, J. L. Rapid recent growth and divergence of rice nuclear genomes. Proc. Natl Acad. Sci. USA 101, 12404–12410 (2004).
Price, M. N. et al. FastTree 2-Approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Smith, S. A. & O’Meara, B. C. treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 28, 2689–2690 (2012).
Bouckaert, R. et al. BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 10, e1003537 (2014).
Tan, M. H. et al. MitoPhAST, a new automated mitogenomic phylogeny tool in the post-genomic era with a case study of 89 decapod mitogenomes including eight new freshwater crayfish mitogenomes. Mol. Phylogenet. Evol. 85, 180–188 (2015).
Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 5, 113 (2004).
Capella-Gutiérrez, S. et al. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Darriba, D. et al. ModelTest-NG: A new and scalable tool for the selection of DNA and protein evolutionary models. Mol. Biol. Evol. 37, 291–294 (2020).
Tang, H. et al. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 18, 1944–1954 (2008).
Kiełbasa, S. M. et al. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487 (2011).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, gia008 (2021).
Fischer, M. C. et al. Enhanced AFLP genome scans detect local adaptation in high-altitude populations of a small rodent (Microtus arvalis). Mol. Ecol. 20, 1450–1462 (2011).
Takezaki, N. & Nei, M. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144, 389–399 (1996).
Acknowledgements
We are grateful to Martin Mascher and Chang Liu for fruitful discussions about the project, to Yangzi Wang and Jhih-Sheng Liu for assistance with analyses, and to Noa Schwabe for kindly providing pictures of Ophrys and Andrena. This work was supported by the University of Zurich, a Ph.D. project funded by the University Research Priority Program “Evolution in Action: from Genomes to Ecosystems” (to U.G. and P.M.S.), the Swiss National Science Foundation (SNSF; grant 31003A_155943 to P.M.S. and grant IZLRZ3_163885 to U.G.), the German Research Foundation (DFG; project 446145319 to P.M.S.) and the 2022 PRIN Program (to S.C.). This work is set within the framework of the “Laboratoires d’Excellences (LABEX)” TULIP (ANR-10-LABX-41) and of the “École Universitaire de Recherche (EUR)” TULIP-GS (ANR-18-EURE-0019). This work made use of infrastructure services provided by the Science IT team of the University of Zurich (www.s3it.uzh.ch) and was supported via bwHPC by the High Performance and Cloud Computing Group at the Zentrum für Datenverarbeitung of the University of Tübingen, state of Baden-Württemberg, and the DFG (grant INST 37/935-1 FUGG).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
A.R., P.M.S. and U.G. conceived and designed this study. A.R., P.M.S. and S.C. provided Ophrys samples. A.R. performed genome size estimation, gDNA extraction and the Hi-C experiment. A.R. and D.S. conducted the initial genome assembly. A.R. improved the assembly and performed annotation analysis. D.F. performed Nanopore library preparation and sequencing. A.R. and M.A. performed the JBAT visualisation to final chromosome scaffold. A.R., T.R.G., L.G. and T.W. conducted the transposable element annotation. M.E.B. conducted the LTR divergence time analysis. H.F.O. performed the methylation analysis. S.X., Q.T., R.E.O. and S.E.G.M. carried out the phylogenetic and gene family analyses. G.P. conducted the synteny analysis. A.R. and P.M.S. performed the population genomic analyses. M.A. performed differential expression and GO analysis. A.R. and M.A. performed figure curation. A.R. wrote the original draft of the manuscript with inputs from P.M.S., S.C., S.X. and U.G. All authors discussed and approved the final manuscript. J.E.F., J.B., S.C., O.P. and U.G. provided critical resources for this study. P.M.S. and U.G. supervised the project. P.M.S., U.G., S.C. and J.E.F. acquired funding.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Rod Peakall and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Russo, A., Alessandrini, M., El Baidouri, M. et al. Genome of the early spider-orchid Ophrys sphegodes provides insights into sexual deception and pollinator adaptation. Nat Commun 15, 6308 (2024). https://doi.org/10.1038/s41467-024-50622-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-024-50622-4
This article is cited by
-
Telomere-to-telomere gap-free genome assembly of Euchiloglanis kishinouyei
Scientific Data (2025)