Introduction

The genus Cypripedium, a subdivision within the Orchidaceae family, is renowned for its species that feature a labellum evolved into a slipper-like structure, commonly known as slipper orchids or lady’s slippers1,2. Cypripedium macranthos Swartz (1800) (Fig. 1a) and C. × ventricosum Swartz (1800) (Fig. 1b) are two such species that thrive in cooler climates and are not tolerant of high temperatures. They are among the few orchids that inhabit temperate, high-latitude regions and have a global distribution limited to Russia, China, Japan, and the Korean Peninsula2,3,4,5. Their distinctive petal shapes and vivid bloom colors endow C. macranthos and C. × ventricosum with significant ornamental value, ranking them among the most favored orchids4. C. shanxiense S. C. Chen (1983) (Fig. 1c) is mainly found in northern and western China, northern Japan, southeastern Russia, and across Mongolia5,6. Distinct from most Cypripedium species, C. shanxiense boasts small, delicate flowers with unique colors that are neither purely greenish-yellow nor yellow, nor bright crimson or purplish-red, yet it retains a high ornamental value and shows promise for garden landscape applications.

Successful hybridization between species, and even between genera, is common in Orchidaceae6. C. × ventricosum often grows in association with C. calceolus and C. macranthos, and it is therefore considered to be a natural hybrid of these two species5,6,7. However, there is still insufficient molecular evidence to confirm this relationship. In the past decade, habitat loss, biological factors (such as regeneration challenges), and over-collection of these wildflowers for horticultural and medicinal purposes have brought these wild species dangerously close to extinction3,4,8. To date, few studies have been conducted on these endangered orchids.

The chloroplast genome is a valuable source of information for studying plant phylogeny and evolution, owing to its predominantly maternal inheritance9,10,11,12. With the advancement of sequencing technologies, an increasing number of plant chloroplast genomes are being sequenced, which contributes to our understanding of plant phylogeny13,14. Although Luo et al. have published the chloroplast genome of C. macranthos, their sample was collected in Yunnan, China, a region with a tropical or subtropical monsoon climate that does not match the cool-temperate habitat preference of C. macranthos15. We therefore sampled C. macranthos, C. × ventricosum, and C. shanxiense within their native distributions, and in this paper we report the latest chloroplast genomes for these species together with phylogenetic analyses. Our findings should provide a useful basis for future research, development, and application related to these three species.

Fig. 1

Reference images of flowering plants of C. macranthos, C. × ventricosum, and C. shanxiense. (a) C. macranthos; (b) C. × ventricosum; (c) C. shanxiense. All images were taken by Mr. Baoqiang Zheng. Images a and b were photographed in Sandaowan Town, Yanji County, Jilin Province, China (43.2°N, 129.1°E), and image c was taken in the Beijing Songshan Nature Reserve (40.3°N, 115.4°E). All images are unpublished (used with permission).

Results

Assembly and annotation of the chloroplast genome

The complete chloroplast genome of C. macranthos was a circular molecule of 181,030 bp comprising four contiguous segments: a large single-copy region (LSC) of 103,242 bp, a small single-copy region (SSC) of 22,268 bp, and two inverted repeat regions (IRA and IRB) of 27,760 bp each, with a total GC content of 34.56% (Fig. 2). The average sequencing depth was 543× (Figure S7). The genome contained 131 genes, including 85 protein-coding genes, 38 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes. Among these, 8 protein-coding genes (atpF, rpoC1, petB, petD, rpl16, rpl2, ndhB, ndhA) and 6 tRNA genes (trnK-UUU, trnG-UCC, trnL-UAA, trnV-UAC, trnI-GAU, trnA-UGC) contained one intron, 2 protein-coding genes (ycf3, clpP) contained two introns (Figure S1), and 1 gene (rps12) was trans-spliced (Figure S2).

Fig. 2

Chloroplast genome map of C. macranthos. Genes belonging to different functional groups are shown in different colors. Genes within the circle undergo clockwise transcription, while those outside the circle undergo counterclockwise transcription. The functional classification of the genes is shown in the bottom left corner. The light green inner circle indicates the GC content of the chloroplast genome.

The chloroplast genome of C. × ventricosum was a circular molecule of 175,385 bp, with a total GC content of 34.48%. It consisted of four contiguous segments: a large single-copy region (LSC) of 97,520 bp, a small single-copy region (SSC) of 22,345 bp, and two inverted repeat regions (IRA and IRB) of 27,760 bp each (Fig. 3). The average sequencing depth was 282× (Figure S8). The genome contained 131 genes, including 85 protein-coding genes, 38 tRNA genes, and 8 rRNA genes. Among these, 8 protein-coding genes (atpF, rpoC1, petB, petD, rpl16, rpl2, ndhB, ndhA) and 6 tRNA genes (trnK-UUU, trnG-UCC, trnL-UAA, trnV-UAC, trnI-GAU, trnA-UGC) contained one intron, 2 protein-coding genes (ycf3, clpP) contained two introns (Figure S3), and 1 gene (rps12) was trans-spliced (Figure S4).

Fig. 3

Chloroplast genome map of C. × ventricosum. Genes belonging to different functional groups are shown in different colors. Genes within the circle undergo clockwise transcription, while those outside the circle undergo counterclockwise transcription. The functional classification of the genes is shown in the bottom left corner. The light green inner circle indicates the GC content of the chloroplast genome.

The chloroplast genome of C. shanxiense was a circular molecule of 177,627 bp composed of four contiguous segments: a large single-copy region (LSC) of 99,779 bp, a small single-copy region (SSC) of 22,466 bp, and two inverted repeat regions (IRA and IRB) of 27,691 bp each, with a total GC content of 34.42% (Fig. 4). The average sequencing depth was 420× (Figure S9). A total of 133 genes were identified, including 87 protein-coding genes, 38 tRNA genes, and 8 rRNA genes. Among these, 9 protein-coding genes (rps16, atpF, rpoC1, petB, petD, rpl16, rpl2, ndhB, ndhA) and 6 tRNA genes (trnK-UUU, trnG-UCC, trnL-UAA, trnV-UAC, trnI-GAU, trnA-UGC) contained one intron, 2 protein-coding genes (ycf3, clpP) contained two introns (Figure S5), and 1 gene (rps12) was trans-spliced (Figure S6).

Fig. 4

Chloroplast genome map of C. shanxiense. Genes belonging to different functional groups are shown in different colors. Genes within the circle undergo clockwise transcription, while those outside the circle undergo counterclockwise transcription. The functional classification of the genes is shown in the bottom left corner. The light green inner circle indicates the GC content of the chloroplast genome.
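
The region lengths reported for the three genomes can be cross-checked with simple arithmetic: in a quadripartite chloroplast genome, the total length should equal LSC + SSC + 2 × IR. The following minimal Python sketch performs that check using only the lengths given in the text. The dictionary layout and the function names are illustrative assumptions and are not part of the authors' pipeline.

```python
# Region lengths (bp) copied from the text; the data layout is illustrative only.
GENOMES = {
    "C. macranthos": {"LSC": 103_242, "SSC": 22_268, "IR": 27_760, "total": 181_030},
    "C. × ventricosum": {"LSC": 97_520, "SSC": 22_345, "IR": 27_760, "total": 175_385},
    "C. shanxiense": {"LSC": 99_779, "SSC": 22_466, "IR": 27_691, "total": 177_627},
}


def quadripartite_length(lsc, ssc, ir):
    """Length of a circular plastome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def check_all(genomes):
    """Return {species: True/False}, True when the regions sum to the reported total."""
    return {
        name: quadripartite_length(g["LSC"], g["SSC"], g["IR"]) == g["total"]
        for name, g in genomes.items()
    }


if __name__ == "__main__":
    for name, ok in check_all(GENOMES).items():
        print(name, "consistent" if ok else "inconsistent")
```

The same check can be applied to any newly assembled plastome before submission, since a mismatch usually points to a mis-placed region boundary.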

Phylogenetic analysis

A phylogenetic tree can be represented by a branching diagram that illustrates the relationships between similar organisms in a tree-like structure16. The chloroplast genome sequences of 34 Orchidaceae species and 3 Amaryllidaceae species were selected from NCBI to investigate the phylogeny of C. macranthos, C. × ventricosum, and C. shanxiense (Fig. 5a). According to the phylogenetic tree, species from the same genus were grouped together in one branch, while species from different genera diverged. Each node had excellent support (BS ≥ 95.6). However, the former C. macranthos (submitted by Luo et al.15) was not placed in the same branch as the other Cypripedium species. The latest chloroplast genome of C. macranthos (submitted by us) was most closely related to C. calceolus and C. × ventricosum, and C. × ventricosum was positioned between the branches of C. calceolus and C. macranthos.

To further assess the relationship of C. × ventricosum to C. macranthos and C. calceolus, a phylogenetic analysis was conducted on the internal transcribed spacer (ITS) sequences of 31 Cypripedium species and 2 Paphiopedilum species. Based on the branching pattern of the tree, C. × ventricosum was most closely related to C. calceolus and was positioned between the branches of C. calceolus and C. macranthos, which also suggests that it might be a hybrid descendant of these two species (Fig. 5b).

Fig. 5

Chloroplast genome and ITS phylogenetic trees. ML tree based on the whole chloroplast genome of C. macranthos, C. × ventricosum, C. shanxiense, and 34 Orchidaceae species, with 3 Amaryllidaceae species as outgroups (a). ML tree based on ITS sequences from 31 Cypripedium species, with 2 Paphiopedilum species as outgroups (b). The numbers at each node represent the bootstrap values from 1000 replicates. “*” indicates that the bootstrap value is below 70. On the left side of each tree, an ML tree that preserves genetic distances is displayed, with branch colors corresponding to the genus-specific background colors indicated in the figure on the right.
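
Node support of the kind described in this caption is commonly stored as a numeric label after each closing parenthesis in a Newick tree file. The following minimal Python sketch extracts those labels and lists the ones below the threshold of 70 used for the “*” marker. The toy tree, the function names, and the assumption that support values are written as internal-node labels are illustrative; the tree is not taken from this study.

```python
import re

# Matches a numeric label directly after ")" such as ")98" or ")65.5".
# Assumes taxon names contain no parentheses and support is an internal-node label.
SUPPORT_PATTERN = re.compile(r"\)(\d+(?:\.\d+)?)")


def node_supports(newick):
    """Return all internal-node support values found in a Newick string."""
    return [float(value) for value in SUPPORT_PATTERN.findall(newick)]


def weak_nodes(newick, threshold=70.0):
    """Return the support values that fall below the threshold."""
    return [s for s in node_supports(newick) if s < threshold]


# Hypothetical toy tree with two supported internal nodes.
toy_tree = "((A:0.1,B:0.2)98:0.05,(C:0.1,D:0.1)65.5:0.02,E:0.3);"

if __name__ == "__main__":
    print(node_supports(toy_tree))
    print(weak_nodes(toy_tree))
```

A regular expression is enough for a quick summary of this kind; for trees with quoted names or comments, a dedicated Newick parser would be the safer choice.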

Genome structure analysis

The chloroplast genome sequence alignment shows that the latest C. macranthos, C. × ventricosum, and C. shanxiense are highly conserved, with more variation in the non-coding regions than in the coding regions. Most of the highly variable areas are also located within the conserved non-coding sequences (CNS). In contrast, the former C. macranthos also exhibits significant variation in the exon regions (Fig. 6a). Comparison of the IR boundaries among C. calceolus, the former C. macranthos, the latest C. macranthos, C. × ventricosum, and C. shanxiense reveals that C. calceolus, the latest C. macranthos, C. × ventricosum, and C. shanxiense are relatively conserved at the IR boundaries. The main difference between the former C. macranthos and the other four is the absence of ycf1 at the IRb-SSC boundary, and ndhF does not span the SSC-IRb boundary (Fig. 6b). Collinearity analysis of the chloroplast genomes showed that C. calceolus, the former C. macranthos, the latest C. macranthos, C. × ventricosum, and C. shanxiense did not undergo rearrangements or inversions (Fig. 6c).

Fig. 6

Comparison of chloroplast genome structures in five Cypripedium species. Visualization of chloroplast genome sequence alignment for four Cypripedium species using C. calceolus as a reference (a). The vertical scale shows the percent of identity, ranging from 50 to 100%. The horizontal axis shows the coordinates within the chloroplast genome. Comparison of the LSC, SSC, and IR boundaries among five Cypripedium chloroplast genomes (b). The LSC, SSC, and IR regions are shown in different colors. JLB, JSB, JSA, and JLA represent the junction sites between the corresponding regions of the genome, respectively. Genes are depicted as boxes. Collinearity analysis of the chloroplast genomes from five Cypripedium species (c).

Discussion

Chloroplast genomics is a research hotspot in the study of plant evolutionary relationships because the plant chloroplast genome has a comparatively stable and conserved structure and carries extensive genetic information17,18. In this study, the latest complete chloroplast genomes of C. macranthos, C. × ventricosum, and C. shanxiense were assembled and phylogenetically analyzed together with the published chloroplast genomes of 34 species of Orchidaceae and 3 of Amaryllidaceae. The phylogenetic analysis revealed that distinct genera diverged from one another and that species belonging to the same genus clustered into a single branch, indicating distinct boundaries between genera. Interestingly, species of Cypripedium can be clearly distinguished into two groups within a single branch, suggesting a clear differentiation event in Cypripedium.

Previously, Li et al. cloned six genes and conducted a phylogenetic analysis of the genus Cypripedium that did not include C. × ventricosum; they found that C. calceolus is most closely related to C. shanxiense rather than to C. macranthos19. In contrast, the whole chloroplast genome phylogeny in this study indicates a close relationship between C. × ventricosum and both C. calceolus and C. macranthos, with C. × ventricosum positioned between the branches of C. calceolus and C. macranthos. Meanwhile, C. shanxiense is most closely related to C. henryi. In addition, the phylogenetic analysis based on the known ITS sequences of 31 Cypripedium species indicates that C. × ventricosum is most closely related to C. calceolus and is likewise positioned between the branches of C. calceolus and C. macranthos. These results support the view proposed by earlier scholars that C. × ventricosum is an interspecific hybrid between C. calceolus and C. macranthos5,6,7. Additionally, the former C. macranthos did not cluster with the other Cypripedium species in the same branch. Structural comparison showed that the former C. macranthos chloroplast genome had not undergone rearrangements or inversions. This may suggest that the material from Luo et al. was not pure C. macranthos15. Given the natural hybridization of Orchidaceae species and the sampling location of Luo et al., we speculate that it is likely a hybrid that was mistakenly identified as C. macranthos because of its striking similarity to that species6,15.

In conclusion, this study supports, from a molecular biology perspective, the idea that C. × ventricosum is an interspecific hybrid between C. calceolus and C. macranthos. The reliable complete chloroplast genomes of C. macranthos, C. × ventricosum, and C. shanxiense reported here will provide important theoretical support for the accurate and rapid identification of these species, the scientific delineation of their relatives, and the study of their genetic evolution. They will also be of great significance for the cultivation of high-quality germplasm resources and the conservation of wild populations of these three species.

Materials and methods

Plant materials

Fresh leaves of C. macranthos and C. × ventricosum were collected from Sandaowan Town, Yanji County, Jilin Province, China (43.2°N, 129.1°E), and fresh leaves of C. shanxiense were collected from Songshan Nature Reserve, Beijing, China (40.3°N, 115.4°E). The specimens of C. macranthos, C. × ventricosum, and C. shanxiense are stored in the flower herbarium of the Research Institute of Forestry, Chinese Academy of Forestry, and can be accessed through the contact person Baoqiang Zheng (zhengbaoqiang@aliyun.com) under the registration numbers CM0008, CV0004, and CS0014, respectively.

Genome sequencing, assembly, and annotation

Total DNA was extracted using the Fast Plant Genomic DNA Isolation Kit (Sangon Biotech Co., Ltd., Shanghai, China). The Hieff NGS®MaxUp II DNA Library Prep Kit for Illumina® (YEASEN, Shanghai, China) was used for sequencing library construction. The Illumina NovaSeq 6000 sequencing platform (Illumina, San Diego, USA) was used to generate paired-end raw reads.

Fastp v0.36 was used for quality control of the sequencing data20. Sequencing depth and coverage were calculated using BEDTools v2.31.1, and the results were plotted using R-ggplot221. Bowtie2 v2.1.0 was used to align the sequencing reads22. The chloroplast genome was assembled using GetOrganelle v1.7.5.3 with C. calceolus (NC045400) as a reference23,24.
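
The average sequencing depth and coverage mentioned above can be summarised from a per-base depth table. The following minimal Python sketch assumes a three-column, tab-separated layout (contig, 1-based position, depth), which is the layout written by `bedtools genomecov -d`; the exact BEDTools subcommand used in this study is not stated, so both the input format and the function name are assumptions. The toy input is invented for illustration.

```python
import io


def depth_summary(lines):
    """Return (mean depth, fraction of positions with depth > 0).

    `lines` is any iterable of 'contig<TAB>position<TAB>depth' rows
    (assumed layout, not confirmed by the source).
    """
    total_depth = 0
    covered = 0
    positions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _position, depth = line.split("\t")
        depth = int(depth)
        total_depth += depth
        covered += 1 if depth > 0 else 0
        positions += 1
    if positions == 0:
        raise ValueError("no positions found in depth table")
    return total_depth / positions, covered / positions


# Hypothetical four-position example: depths 10, 0, 20, 30.
toy_table = io.StringIO("cp\t1\t10\ncp\t2\t0\ncp\t3\t20\ncp\t4\t30\n")
mean_depth, breadth = depth_summary(toy_table)

if __name__ == "__main__":
    print(f"mean depth = {mean_depth:.1f}x, covered fraction = {breadth:.2f}")
```

Reading the table line by line keeps memory use constant, which matters little for a plastome of roughly 180 kb but lets the same function handle much larger references.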

CPGAVAS2 (http://47.96.249.172:16019/analyzer/home) was used to annotate the chloroplast genome25. tRNAs were identified using tRNAscan-SE v2.026. BLAST and DOGMA (http://dogma.ccbb.utexas.edu/) were used to verify the annotations27,28. The annotated chloroplast genomes of C. macranthos, C. × ventricosum, and C. shanxiense were deposited in GenBank (accession numbers: PP356909, PP448181, and PP503063, respectively).

Circos (http://circos.ca/) was used to draw the circular genome maps29. CPGview (http://www.1kmpg.cn/cpgview/) was used to visualize cis- and trans-spliced genes30.

Phylogenetic analysis

Phylogenetic trees were constructed separately from chloroplast genome sequences and ITS sequences. Sequence alignment was performed using MAFFT v7 (https://mafft.cbrc.jp/)31. The Gblocks module in PhyloSuite v1.2.2 was used for trimming, and the ModelFinder module was used to determine the best-fit models for the chloroplast genome tree and the ITS tree, which were GTR + F + R4 and SYM + G4, respectively. The IQ-TREE module was used to perform maximum likelihood (ML) analysis32. The phylogenetic trees were visualized using iTOL v6 (https://itol.embl.de/)33. The chloroplast genome sequences and ITS sequences used for phylogenetic analysis are detailed in Tables 1 and 2, respectively.

Table 1 Chloroplast genome sequences for phylogenetic analysis.
Table 2 ITS sequences for phylogenetic analysis.
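
The workflow above was run through PhyloSuite modules. As a rough command-line analogue, the following Python sketch builds the corresponding MAFFT and IQ-TREE 2 invocations. Several things here are assumptions and are not taken from the study: the file names, the use of the `mafft` and `iqtree2` executables, the omission of the Gblocks trimming step, and the use of ultrafast bootstrap (`-B`), since the text specifies only 1000 bootstrap replicates and not the bootstrap type.

```python
import subprocess


def build_ml_commands(fasta, prefix, model, bootstrap=1000):
    """Build (alignment path, MAFFT command, IQ-TREE 2 command) without running them."""
    aligned = f"{prefix}.aln.fasta"
    # MAFFT writes the alignment to standard output.
    mafft_cmd = ["mafft", "--auto", fasta]
    # -s alignment, -m substitution model, -B ultrafast bootstrap replicates (assumed type).
    iqtree_cmd = [
        "iqtree2", "-s", aligned, "-m", model,
        "-B", str(bootstrap), "--prefix", prefix,
    ]
    return aligned, mafft_cmd, iqtree_cmd


def run_ml_pipeline(fasta, prefix, model, bootstrap=1000):
    """Run alignment then tree inference; requires mafft and iqtree2 on PATH."""
    aligned, mafft_cmd, iqtree_cmd = build_ml_commands(fasta, prefix, model, bootstrap)
    with open(aligned, "w") as handle:
        subprocess.run(mafft_cmd, stdout=handle, check=True)
    subprocess.run(iqtree_cmd, check=True)


# Models reported in the text, written in IQ-TREE notation; input names are hypothetical.
cp_plan = build_ml_commands("cp_genomes.fasta", "cp_tree", "GTR+F+R4")
its_plan = build_ml_commands("its.fasta", "its_tree", "SYM+G4")
```

Separating command construction from execution makes it possible to inspect or log the exact commands before any external program is started.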

Comparative analysis of genome structure

Using C. calceolus (NC045400) as the reference genome, a visual comparison of the chloroplast genomes of the former C. macranthos (KF925434), the latest C. macranthos (PP356909), C. × ventricosum (PP448181), and C. shanxiense (PP503063) was conducted with mVISTA (https://genome.lbl.gov/vista/index.shtml), selecting the Shuffle-LAGAN mode55.

CPJSdraw v1.0.0 was used to generate a visual map of the IR boundaries for the chloroplast genomes of C. calceolus (NC045400), the former C. macranthos (KF925434), the latest C. macranthos (PP356909), C. × ventricosum (PP448181), and C. shanxiense (PP503063)56.

A collinearity analysis of the chloroplast genomes of C. calceolus (NC045400), the former C. macranthos (KF925434), the latest C. macranthos (PP356909), C. × ventricosum (PP448181), and C. shanxiense (PP503063) was conducted using Mauve (https://darlinglab.org/mauve/mauve.html).