Introduction

The genus Bletilla, classified within the subfamily Epidendroideae Kostel., tribe Arethuseae Lindl., and subtribe Coelogyninae, is a taxon of significant medicinal and economic value in the family Orchidaceae. This genus includes six recognized species, four of which are native to China: B. striata (Thunb.) H. G. Reichenbach, B. ochracea Schltr., B. formosana (Hayata) Schltr., and B. sinensis (Rolfe) Schltr. These species are distributed across multiple provinces in the southwestern, central, and southern regions of China, particularly in Guizhou, Sichuan, Yunnan, and Tibet1. B. striata is a medicinal herb officially listed in the Chinese Pharmacopoeia and is rich in bioactive compounds, including dihydrophenanthrene derivatives, polysaccharides, and 2-isobutyl malic acid2,3. It shows antibacterial, anti-inflammatory, and tumor-inhibitory activities4,5,6,7. However, over-excavation of wild B. striata has resulted in its categorization as a critically endangered species. Therefore, resolving the phylogenetic relationships within Bletilla is crucial for its effective conservation, taxonomic clarity, and sustainable utilization.

The taxonomic classification and phylogenetic relationships of Bletilla have long been debated, primarily due to morphological similarities with Arethusa L. and Bletia Ruiz & Pav.8,9,10,11,12,13,14. This has resulted in inconsistent placements across different taxonomic systems. Recently, van den Berg et al.15 and subsequent studies16,17 assigned Bletilla to subtribe Coelogyninae (tribe Arethuseae), which is currently the most widely accepted classification. In addition to these taxonomic difficulties, several studies have found it challenging to determine the monophyly of Bletilla and the systematic position of B. sinensis. Some studies, using DNA barcode fragments and chloroplast genome (CPG) sequences18,19, support the inclusion of B. sinensis within Bletilla and confirm the monophyly of Bletilla within Arethuseae. Others propose segregating B. sinensis from Bletilla into a new monotypic genus, Mengzia S. C. Chen & J. W. Zhai, based on CPG markers and morphological evidence20. Further, Han et al.21 found that B. sinensis clustered separately at the base of the phylogenetic tree, distinct from other Bletilla species. These unresolved conflicts, attributable to limited taxon sampling and insufficient integration of nuclear genes, impede coherent taxonomic treatment and conservation of this endangered medicinal complex. Clarifying the phylogenetic and evolutionary relationships within Bletilla therefore requires broader sampling and more comprehensive analyses of genetic markers. To this end, this study aimed to: (1) sequence, assemble, and annotate the CPGs of three Bletilla species; (2) systematically investigate the phylogenetic relationships within Bletilla by integrating CPG and nrITS sequences; and (3) evaluate the efficiency of CPG ‘super barcodes’ for species identification in Bletilla, providing a reliable basis for taxonomic research and conservation of the genus.

Due to extensive morphological variation in Bletilla, key diagnostic traits1,22,23 (including petal coloration, labellum pattern, and leaf morphology) overlap across species, resulting in ambiguous delimitation and making species identification difficult. Wu et al.24 accurately identified four Bletilla species using the nuclear ribosomal internal transcribed spacer (nrITS) and ycf1 sequences. However, as sampling expanded, Huang et al.20 found that short standard DNA barcodes were inadequate for effective discrimination. Higher-resolution methods are therefore needed to resolve interspecific phylogenetic relationships and to identify Bletilla species.

In recent years, rising market demand for B. striata has driven year-on-year price increases. Consequently, various adulterants of B. striata, which are difficult for ordinary consumers to recognize due to their similar appearance and traits, have appeared in the Chinese herbal medicine market25. Cremastra appendiculata, Pleione bulbocodioides, and Pleione formosana are commonly misused as substitutes26,27. The use of adulterated products has compromised the safety and efficacy of B. striata medication. CPG sequences, also known as ‘super barcodes’, are characterized by structural simplicity, maternal inheritance, and hypervariable loci, and they overcome the limitations of single-gene barcodes28,29. CPG data provide sufficient variation with higher identification accuracy30,31 and have thus been successfully applied to phylogenetics and species discrimination in complex genera such as Aconitum L.32, Seriphidium (Besser) Fourr.33, and Zingiber Mill.34. This motivates our use of CPG barcoding to resolve taxonomic conflicts in Bletilla.

Results

CPG sequencing, assembly, and gene types

The CPGs of B. striata, B. ochracea, and B. formosana exhibited typical circular structures, ranging in size from 159,418 to 160,052 bp (Table 1). Each CPG had a conserved quadripartite structure, containing a pair of inverted repeats, IRa and IRb (25,984–26,780 bp), a large single-copy (LSC) region (87,058–87,620 bp), and a small single-copy (SSC) region (18,768–20,464 bp). The overall GC content of the CPGs ranged from 37.16% to 37.23%.

Table 1 Summary of the chloroplast genome features of B. striata, B. ochracea, and B. formosana.

Gene annotation of the assembled CPGs revealed a total of 127 genes, including 83 protein-coding genes, 36 tRNA genes, and 8 rRNA genes (Fig. 1 and Supplementary Table S1). Similar to other angiosperms, such as Phalaenopsis stobartiana35, Polystachya concreta36, and Physalis angulata37, the Bletilla CPGs contained 17 intron-containing genes. These included five tRNA genes (trnV-UAC, trnK-UUU, trnL-UAA, trnE-UUC, trnA-UGC) and 12 protein-coding genes (rps16, atpF, rpoC1, petB, petD, rpl16, rps12, rpl2, ndhB, ndhA, ycf3, clpP). Among these, rps12, ycf3, and clpP each contained two introns. The rps12 gene exhibited a trans-splicing structure, with its 5′ and 3′ ends located in the LSC and IR regions, respectively, forming two independent transcription units. The CPGs of B. striata, B. ochracea, and B. formosana were structurally similar.

Fig. 1

The chloroplast genome map of Bletilla striata, B. ochracea, and B. formosana. Genes are shown in the outermost circle and colored according to functional group. Genes outside the circle are transcribed counterclockwise, and genes inside the circle are transcribed clockwise.

Sequence repeats and codon preference analysis

Simple sequence repeats (SSRs) were found in different regions of the CPGs of the five samples. In total, 60, 56, and 65 SSRs were identified in the B. ochracea, B. formosana, and B. striata CPGs, respectively. The most abundant repeat type was mononucleotide A/T, which accounted for approximately 66.07% to 70.77% of the total. The SSRs (T)13, (T)16, (ATAG)4, (AT)9, and (AGTATA)3 were unique to B. ochracea, whereas (AT)7, (AAGA)3, and (CTATAT)3 were found only in B. striata. (A)13 was absent from B. formosana (Fig. 2a). These species-specific SSRs may provide useful information for Bletilla taxonomy.

Fig. 2

The number of simple sequence repeats (SSRs) and relative synonymous codon usage (RSCU) of amino acids in B. striata, B. ochracea, and B. formosana. (a) Classified SSR repeat units, (b) Amino acid usage frequency calculated by RSCU.

Relative synonymous codon usage (RSCU) was used to evaluate codon usage frequency. The RSCU value of each codon was nearly identical across CPGs, suggesting that the codon preferences of the five samples were similar. Met and Trp, each encoded by a single codon, were the only two amino acids exhibiting no bias (RSCU = 1.00) in the five CPGs. Arg, Leu, and Ser were the most frequent amino acids (Fig. 2b). There were 30 codons (including one stop codon) with an RSCU > 1.00, indicating that these codons were used more frequently than their synonymous alternatives. Of these 30 codons, 29 ended with A or U, suggesting that high-frequency codons tend to end in A/U.
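As a rough illustration of how an RSCU value is obtained, the sketch below counts codons in a set of coding sequences and divides each codon's count by the mean count of its synonymous family. It is a minimal reimplementation for explanation only, not the CodonW procedure used in this study; the function name `rscu` and the toy sequence are assumptions, and the standard codon assignments are assumed to apply.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments (NCBI order T, C, A, G); "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid never observed; RSCU undefined
        for c in codons:
            # observed count divided by the expected count under equal usage
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    toy = ["ATGGCTGCTGCAGCCTGGTAA"]  # hypothetical toy CDS, not real data
    for codon, value in sorted(rscu(toy).items()):
        print(codon, round(value, 2))
```

Because Met and Trp each have a single codon, their RSCU is 1.00 whenever they occur, which is why these two amino acids show no bias.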

CPG sequence divergence

To assess sequence divergence within the genus Bletilla, we performed a global comparison of the 13 Bletilla CPGs using mVISTA, with B. striata (accession NC_028422) as the reference. The alignment revealed over 25 regions with less than 50% sequence identity, indicating substantial interspecific variation and genetic differentiation, particularly in non-coding regions (Fig. 3). Furthermore, substantial intraspecific variation was observed among the CPGs of B. striata, B. ochracea, and B. formosana. The mVISTA plot revealed regions with mutations or deletions in B. striata, such as trnS-trnG, psbM-trnD, trnL-ndhJ, accD-psaI, and psbE-petG. In B. ochracea, the regions rps16-trnK, psbI-trnG, psbM-trnD, psbB-psbT, and ndhF-rpl32 also showed low similarity.

Fig. 3

Global comparison of the Bletilla chloroplast genomes. Gray arrows, blue and red areas above the alignment indicate genes with their orientation, exon and noncoding region, respectively. The x-axis stands for sequence and the y-axis represents the percent identity ranging from 50 to 100%.

We used sliding window analysis in DnaSP to identify eight highly variable regions in the CPGs, namely rps16-trnQ, psbI-trnG, psbM-trnD, trnE-trnT, accD-psaI, rps3-rpl16, ndhF-rpl32, and psaC-ndhE (Fig. 4). The average nucleotide diversity (π) among the Bletilla CPGs was 0.01193, with the psbM-trnD (π = 0.2450) and ndhF-rpl32 (π = 0.0899) regions exhibiting markedly higher π values than other regions (Supplementary Table S2). These highly variable regions are potential molecular markers for the identification of Bletilla species.

Fig. 4

Sliding window analysis of the Bletilla chloroplast genomes. x-axis: position of the midpoint of a window; y-axis: nucleotide diversity of each window; window length: 600 bp; step size: 200 bp.
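The sliding window calculation can be sketched in a few lines. The example below is a simplified illustration, not the DnaSP implementation: it assumes pre-aligned sequences of equal length, drops any alignment column containing a gap or ambiguous base, and reports π as the mean number of pairwise differences per retained site. The function names are hypothetical; the default window and step match the 600 bp and 200 bp settings reported here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site over columns with only A/C/G/T."""
    seqs = [s.upper() for s in seqs]
    cols = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    if not cols or n_pairs == 0:
        return 0.0
    diffs = sum(a != b for col in cols for a, b in combinations(col, 2))
    return diffs / (n_pairs * len(cols))


def sliding_pi(seqs, window=600, step=200):
    """Return [(window midpoint, pi)] along an alignment (0-based positions)."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be the aligned CPGs.
    toy = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(toy))
    print(sliding_pi(toy, window=2, step=1))
```

Windows whose π stands well above the genome-wide average are the candidates for hypervariable marker regions.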

Intraspecific and interspecific genetic distances overlap

Genetic distance (GD) analysis revealed substantial challenges for delineating species boundaries in Bletilla. While B. sinensis was distantly related to its congeners (GDs > 0.022), we observed considerable overlap between intraspecific and interspecific distances among B. striata, B. ochracea, and B. formosana (Table 2 and Supplementary Table S3). Critically, the maximum intraspecific GD for each of these three species exceeded the minimum interspecific GD. This was most pronounced in B. ochracea, where the maximum intraspecific GD (0.058476) far surpassed the minimum interspecific GD (0.000097). Such overlap indicates that genetic distance thresholds alone cannot reliably differentiate these taxa.
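The overlap criterion described above amounts to checking for a "barcoding gap": a distance threshold can separate species only if the smallest interspecific distance is larger than the largest intraspecific distance. The following minimal sketch makes that check explicit. The function name, the input layout, and the toy distances are all assumptions for illustration; the values are not taken from Table 2.

```python
def barcoding_gap(distances, species_of):
    """Compare the largest within-species and smallest between-species distance.

    distances:  {(sample_a, sample_b): genetic distance}
    species_of: {sample: species label}
    """
    intra, inter = [], []
    for (a, b), d in distances.items():
        if species_of[a] == species_of[b]:
            intra.append(d)
        else:
            inter.append(d)
    if not intra or not inter:
        raise ValueError("need at least one intra- and one interspecific pair")
    max_intra, min_inter = max(intra), min(inter)
    # A gap exists only when every between-species distance exceeds
    # every within-species distance.
    return {"max_intra": max_intra, "min_inter": min_inter,
            "gap": min_inter > max_intra}


if __name__ == "__main__":
    # Toy values for illustration only.
    species = {"a1": "A", "a2": "A", "b1": "B"}
    toy = {("a1", "a2"): 0.05, ("a1", "b1"): 0.001, ("a2", "b1"): 0.06}
    print(barcoding_gap(toy, species))
```

When `gap` is false, as in the toy case, no single threshold can assign every sample to the correct species.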

Table 2 The intraspecific and interspecific genetic distances (GDs) of Bletilla.

Cluster analysis and identification of Bletilla species based on CPG

The nrITS maximum likelihood (ML) phylogeny revealed non-monophyly of B. striata, B. ochracea, and B. formosana, indicating that nrITS data have limited discriminatory power for Bletilla species identification (Supplementary Fig. S1). To assess the impact of expanded taxon sampling, we reconstructed a CPG-ML phylogeny using all 26 available Bletilla CPGs from GenBank alongside common adulterants (Fig. 5). All Bletilla samples (excluding B. sinensis) formed a strongly supported monophyletic clade (ML bootstrap support [MLBS] = 100); however, none of the species was monophyletic. Notably, the nine B. striata samples were distributed across three distinct clades (MLBS = 100). A similar polyphyletic pattern was observed for both B. ochracea and B. formosana. The adulterant species, including P. chlorantha, C. appendiculata, P. bulbocodioides, P. formosana, and T. alba, were distinctly separated from the Bletilla clade (MLBS = 100), each forming an independent branch on the phylogenetic tree.

Fig. 5

Phylogenetic tree of the genus Bletilla and its adulterants based on chloroplast genomes using Maximum likelihood (ML) method. Numbers are ML bootstrap (MLBS) support.

Phylogenetic analyses of Bletilla based on CPG and nrITS

To gain a more comprehensive understanding of the evolutionary relationships within Bletilla, we used CPG sequences and the corresponding nrITS sequences from Orchidaceae for phylogenetic analyses. Our analysis incorporated the newly sequenced Bletilla CPGs and 57 representative species from 11 subtribes of Orchidaceae, with the CPG analysis based on 58 shared protein-coding genes. For each of these two sequence types, the phylogenetic trees (the CPG-tree and the nrITS-tree) constructed using the ML and BI methods were largely congruent and exhibited similar topologies.

At the subfamily level, the clustering pattern of the five subfamilies of Orchidaceae was highly consistent. However, at the genus level, the two trees disagreed on the taxonomic position of Bletilla and B. sinensis. In the CPG-tree (Fig. 6a), Bletilla (excluding B. sinensis) was most closely related to Pleione, Thunia, and Pholidota (MLBS = 100, Bayesian posterior probability [PP] = 1.00), which grouped together within the subtribe Coelogyninae and the tribe Arethuseae. B. sinensis formed a clade with Calanthe triplicata and Tainia dunnii (MLBS = 100, PP = 1.00). This clade was positioned outside the tribe Arethuseae and close to the tribe Collabieae. In contrast, in the nrITS-tree (Fig. 6b), Bletilla was monophyletic, with B. striata, B. ochracea, B. formosana, and B. sinensis clustered into one group that was sister to the clade of Thunia, Pleione, and Pholidota (MLBS = 100, PP = 0.99).

Fig. 6

Phylogenetic trees of the 57 Orchidaceae species based on (a) 58 shared plastid protein-coding genes and (b) nuclear ribosomal internal transcribed spacer sequences. Numbers to the left of the slash are maximum likelihood bootstrap (MLBS) support values, and those to the right are Bayesian posterior probabilities (PP).

Discussion

Genetic variation in plants, fundamental for adaptation and evolution, provides critical value for germplasm conservation and variety identification38,39. A substantial quantity of genetic variation exists within Bletilla CPGs. This phenomenon may arise from combined adaptive and neutral forces: the complexity of China’s ecological conditions and geographical patterns driving divergence, alongside high mutation rates, genetic drift, and hybridization. As a widely distributed wild plant, Bletilla species show distinct biogeographic patterns: B. striata, B. ochracea, and B. formosana inhabit diverse ecoregions across China (e.g., southwestern mountains, southeastern coasts, northwestern/southeastern plains), whereas B. sinensis shows a restricted endemic distribution40,41. Temperature gradients (ΔT = 14–26.2 °C), precipitation regimes (467–1,687 mm/yr), and soil types (e.g., Yellow vs. Cinnamon soil) have been identified as major drivers of B. striata ecotype differentiation42. These factors collectively could contribute to the formation of distinct ecotypes in Bletilla, which often harbor unique sequence information. For example, B. striata from Anhui and Zhejiang populations exhibit 3–5 nrITS single nucleotide polymorphisms (SNPs), while samples from Hunan and Jiangxi show unique mutation hotspots43. Future analyses of selection pressures are needed to evaluate the relative roles of adaptive versus neutral processes in shaping this diversity44,45.

The exceptionally high nucleotide diversity of Bletilla CPGs (π = 0.01193) exceeds that of other medicinal genera by 3- to 10-fold (Epimedium L.: 0.00121, ref. 46; Lycium L.: 0.00147, ref. 47; Schisandra Michx.: 0.00364, ref. 48), further indicating high genetic diversity within this genus. Chromosomal analyses are consistent with this result. Chromosome numbers of Bletilla differ among provenances: B. striata 2n = 32, the B. striata (safflower) mutant 2n = 34, and B. ochracea/B. formosana 2n = 34–36 (ref. 49). These chromosomal characteristics correlate with labellum pigmentation and floral architecture, as exemplified by distinct traits in the BJ1 (Yunnan) and BJ2 (Chongqing) accessions (Fig. 7). Crucially, in the B. striata species complex, the B. striata (safflower) mutant may have originated from natural hybridization between B. ochracea and B. formosana. This mutant exhibits an intermediate chromosome number (2n = 34) and labellum patterning more similar to that of B. ochracea49. This hybrid genotype illustrates how hybridization can drive rapid trait innovation through gene exchange, expanding intraspecific variation beyond ecological adaptation alone.

Fig. 7

Images of Bletilla striata, B. ochracea, and B. formosana. (a, d, g, j, m) The overall morphology of the plant. (b, e, h, k, n) The morphology of the flower. (c, f, i, l, o) The morphology of the fresh tuber.

Orchidaceae, the most species-rich family of angiosperms, comprises five subfamilies and over 28,000 species50. Extensive species diversity and complex morphological variation have resulted in ambiguous generic and specific boundaries, with persistent taxonomic uncertainties51,52. Although recent studies support separating B. sinensis from Bletilla20,21, its phylogenetic affinities remain unresolved. In our study, both the CPG and nrITS phylogenetic trees classified Orchidaceae into five subfamilies, consistent with previous reports53,54. However, the two trees differed in the monophyletic status of Bletilla and the taxonomic position of B. sinensis, indicating a significant nucleocytoplasmic conflict. Specifically, in the nuclear gene (nrITS) phylogenies, B. sinensis clustered into a single branch with B. striata, B. ochracea, and B. formosana with high support, forming a monophyletic Bletilla group within the subtribe Coelogyninae. In contrast, the CPG phylogenies showed B. sinensis to be distantly related to Bletilla, instead forming a sister clade to the tribe Collabieae.

Despite differing phylogenetic placements of B. sinensis between our study and prior works20,21, both consistently support its separation from the genus Bletilla based specifically on CPG evidence. Methodologically, while previous analyses focused on the tribe Arethuseae20,21, our expanded sampling across 11 subtribes of Orchidaceae (Fig. 6a) reveals novel affinities between B. sinensis and the tribe Collabieae. Thus, we speculate that, compared with sampling restricted to the tribe Arethuseae, a broader taxonomic sampling range changes the phylogenetic topology and significantly shifts the position of B. sinensis. These findings collectively highlight the necessity for a taxonomic re-evaluation of B. sinensis. Based on these observations, we propose two non-exclusive hypotheses: B. sinensis either represents an independent lineage warranting taxonomic segregation, or it should be retained in Bletilla but carries a hybrid genomic mosaic potentially derived from historical introgression between ancestors of Bletilla and of the tribe Collabieae. This speculation is consistent with the common occurrence of hybridization-driven nucleocytoplasmic conflicts in Orchidaceae. To test these hypotheses, future studies should compare floral ontogeny, conduct D-statistic analyses of population admixture55, and perform reproductive isolation assays to clarify its taxonomic status. Moreover, irrespective of its final taxonomic classification, the endemic vulnerability and evolutionary distinctiveness of B. sinensis warrant its designation as a high conservation priority56. This approach aligns with conservation strategies emphasizing evolutionary distinctiveness.

Nucleocytoplasmic conflicts can arise through multiple mechanisms, with introgressive hybridization57 and chloroplast capture58,59 being common triggering factors. Such conflicts often manifest as gene tree discordance, particularly between maternally inherited CPG genes and biparentally inherited nuclear genes60,61. In angiosperms, limited interspecific nuclear gene exchange stands in sharp contrast to the maternal inheritance of CPG. This contrast not only facilitates chloroplast capture during hybridization but also generates hybrid genotypes. Notably, hybridization and gene introgression are pivotal for the formation of new genotypes. These mechanistic foundations lead us to hypothesize that hybridization and introgression may be the causes of the phylogenetic conflicts and difficulties in Bletilla species identification. First, hybridization-generated recombinant genotypes disrupt species monophyly, as evidenced by the non-monophyly of B. striata, B. ochracea, and B. formosana in phylogenetic trees. Second, chloroplast capture via introgression causes the discordant placement of B. sinensis. The CPG of B. sinensis was grouped with C. triplicata and T. dunnii (Fig. 6a), while the nrITS supported its phylogenetic affinity with Bletilla (Fig. 6b). Third, introgression widens intraspecific genetic variation, invalidating the ‘distance threshold’ method (Table 2). These processes collectively undermine conventional molecular delineation and prompt us to propose an integrative framework.

Furthermore, these pronounced genetic variations profoundly threaten species conservation and medicinal authentication for B. striata. CPG ‘super barcodes’ demonstrate efficacy in some taxa62,63, yet fail critically in Bletilla (Fig. 5, Supplementary Fig. S1). These failures mirror limitations in other taxa: CPG barcoding resolved only 60.00% of Lauraceae Juss. species64 and 27.27% of Schima Reinw. ex Blume species65. This limitation, compounded by overlapping intra- and interspecific genetic distances and non-monophyletic relationships, prevents reliable species delineation. It also increases the risk of adulteration by substitutes such as C. appendiculata, thereby threatening medicinal safety and regional quality standards. To address this, we propose developing ecotype-specific SNP markers for geographic traceability66 and combining plastid hypervariable regions (e.g., psbM-trnD) with nrITS to overcome single-barcode limitations.

In conclusion, to achieve accurate phylogenetic inferences and species identification in Bletilla, it is essential to integrate multiple lines of evidence for a comprehensive re-evaluation. Consequently, an integrative framework is indispensable to reconcile discordant data types. It synergistically incorporates nuclear loci to resolve species boundaries obscured by chloroplast capture67, plastid hypervariable regions to detect hybridization signatures68, ecological niche data to identify hybrid zones and validate ecotype divergence69, and genome-wide markers to resolve recombinant genotypes while enabling the development of DNA signature and tissue-specific expression markers70,71. This comprehensive approach promises to enhance taxonomic resolution in Bletilla.

Conclusion

In this study, we assembled and annotated five novel CPGs of Bletilla (Orchidaceae), revealing conserved structural features alongside significant intraspecific variations. We identified eight hypervariable regions as potential molecular markers. The observed phylogenetic conflicts in the placement of Bletilla and B. sinensis, likely arising from introgressive hybridization, help explain the taxonomic controversies. The CPG ‘super barcodes’ show limited discriminatory power due to overlapping intraspecific distances and high genetic diversity, severely hindering reliable species identification. For practical applications, the evolutionary distinctiveness and unresolved taxonomy of B. sinensis necessitate its urgent designation as a high-priority conservation unit, while the failure of conventional barcodes severely threatens B. striata medicinal authentication, increasing adulteration risks. To address these, we propose an integrative framework combining nuclear loci, plastid hypervariable regions, ecological niche modeling, and whole-genome sequencing for developing ecotype-specific SNP markers for reliable herbal traceability and adulteration prevention. This framework enables the simultaneous protection of unique genetic diversity and safeguarding of medicinal product safety.

Materials and methods

Sample material and DNA extraction

We sequenced and analyzed five new CPGs from Bletilla species, comprising two B. striata, two B. ochracea, and one B. formosana. The plant materials were collected from natural populations in Chongqing, Yunnan, Zhejiang, Shanxi, and Anhui provinces, China (Fig. 7), during the field investigation for ‘The Fourth National Survey of Chinese Materia Medica Resources’ (Grant No. 201207002). All field collections were conducted under the supervision of and with prior permission from the local forestry departments, in strict compliance with relevant Chinese laws and ethical guidelines. The formal identification of the plant material was undertaken by Researcher Zhengyu Liu from the Chongqing Institute of Medicinal Plant Cultivation, based on examination of key morphological characteristics. Voucher specimens have been deposited in the Herbarium of the Chongqing Institute of Medicinal Plant Cultivation with the accession numbers CIMPC-RFM-20220401 to CIMPC-RFM-20220405 (see also Supplementary Table S4).

Before DNA extraction, healthy young leaves were surface-sterilized by wiping with 70% ethanol, followed by two to three rinses with sterile distilled water. Total genomic DNA was extracted using the plant genomic DNA kit (Tiangen Biotech, Beijing, China). DNA quality and concentration were evaluated using a NanoPhotometer® N60, with samples exhibiting an OD260/OD280 ratio between 1.8 and 2.0 considered high-quality. At least 0.5–1 µg of total DNA was required for each sample. High-quality DNA samples were subsequently sent to Lingen Biotechnology Co., Ltd. (Shanghai, China) for library preparation and sequencing.

DNA sequencing and assembly annotation

The DNA library was constructed using the NEBNext® Ultra™ II DNA Library Prep Kit72, with an insert size of 350 bp. Sequencing was performed on the Illumina HiSeq X Ten platform (PE150) at Novogene (Beijing, China). Raw reads were processed using Trimmomatic73 to remove low-quality sequences and adapter contamination. The filtered reads (in FASTQ format) were assembled into CPG sequences with GetOrganelle74 (script: get_organelle_from_reads.py -1 R1.fq.gz -2 R2.fq.gz -o output -R 50 -k 21,45,65,85,105 -P 1000000 -F embplant_pt -t 50). CPGAVAS2 (ref. 75) was used for annotation, and a circular genome map was generated with OGDRAW76. MISA77 was used to predict simple sequence repeats (SSRs), with minimum repeat numbers of 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively. Relative synonymous codon usage (RSCU) was analyzed with CodonW 1.4.2 (http://codonw.sourceforge.net/). The final annotated sequences were manually curated using Sequin software and deposited in the GenBank database (https://www.ncbi.nlm.nih.gov/).

Comparative genome analysis

The CPG of B. striata (accession NC_028422) was used as a reference for comparative analysis. Full-length CPG sequences were compared using mVISTA78 with default parameters to identify structural variations and conserved regions. To assess nucleotide diversity and identify highly variable regions, the CPG sequences were first aligned using MAFFT v7.313. DNA polymorphism analysis was then performed with DnaSP v5.1 (ref. 79), employing a sliding window approach (200 bp step size, 600 bp window length) based on the multiple sequence alignment (FASTA format). Genetic distances were calculated using the Kimura 2-parameter model in MEGA v11.0.
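For reference, the Kimura 2-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below implements this formula directly for a single pair. It is an illustration rather than the MEGA implementation, and it assumes that sites with gaps or ambiguous bases are simply skipped for that pair; the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy pair with one transition (A<->G) in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The corrected distance is slightly larger than the raw proportion of differing sites because the model accounts for multiple substitutions at the same site.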

Phylogenetic analysis of Bletilla and B. sinensis

To resolve the phylogenetic controversies surrounding Bletilla, we retrieved the CPG sequences and corresponding nrITS sequences of 57 Orchidaceae species from the GenBank database, with Marchantia polymorpha selected as the outgroup. A total of 58 shared protein-coding genes were extracted from these CPGs. These homologous regions were aligned using MAFFT v7.313 (ref. 80) (script: mafft --auto ingroup_outgroup.fas > ingroup_outgroup_mafft.fasta). Two phylogenetic methods were employed: maximum likelihood (ML) and Bayesian inference (BI). The ML analysis was performed in IQTree v1.6.6 (ref. 81) with 1000 ultrafast bootstrap replicates (script: iqtree -s ingroup_outgroup_mafft.fasta -m MFP -bb 1000 -bnni) under the GTR + F + I + R4 (CPG) and TNe + I + G4 (nrITS) models selected by ModelFinder82. The BI analysis was performed using MrBayes v3.2.6 (ref. 83) under the GTR + G + I model selected by MrModeltest 2.3 (ref. 84), with four Markov chains and two parallel runs. Each run consisted of 3,000,000 generations, sampling every 1000 generations. Convergence was assessed to ensure reliable results. The phylogenetic tree was visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). All sequence data used in this phylogenetic analysis were sourced from the NCBI database and publicly available literature, with detailed information provided in Supplementary Table S5.
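The step of restricting the analysis to shared protein-coding genes can be sketched as follows. This is a hypothetical illustration, not the authors' script: it assumes each gene has already been aligned on its own, keeps only genes present in every taxon, and concatenates them into a supermatrix while recording partition coordinates. The function name, the input layout, and the toy gene data are assumptions.

```python
def concatenate_shared_genes(gene_alignments):
    """Build a supermatrix from genes present in every taxon.

    gene_alignments: {gene: {taxon: aligned sequence}}
    Returns ({taxon: concatenated sequence}, [(gene, start, end)]) with
    1-based inclusive partition coordinates.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in gene_alignments.values())))
    # A gene is "shared" only if every taxon has a sequence for it.
    shared = sorted(g for g, aln in gene_alignments.items()
                    if set(aln) == set(taxa))

    supermatrix = {t: "".join(gene_alignments[g][t] for g in shared)
                   for t in taxa}

    partitions, start = [], 1
    for g in shared:
        length = len(next(iter(gene_alignments[g].values())))
        partitions.append((g, start, start + length - 1))
        start += length
    return supermatrix, partitions


if __name__ == "__main__":
    # Toy data: ndhF is missing from t3, so it is dropped.
    toy = {
        "rbcL": {"t1": "ATG", "t2": "ATG", "t3": "ATA"},
        "matK": {"t1": "CC--", "t2": "CCGG", "t3": "CCGA"},
        "ndhF": {"t1": "AAA", "t2": "AAA"},
    }
    matrix, parts = concatenate_shared_genes(toy)
    print(parts)
    for taxon, seq in matrix.items():
        print(">" + taxon)
        print(seq)
```

The partition list can be useful when a separate substitution model is to be fitted to each gene.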

Identification of Bletilla and its adulterants

To evaluate the identification efficiency of CPGs for the genus Bletilla and its common adulterants, we conducted a cluster analysis on six adulterant species (Platanthera chlorantha, Cremastra appendiculata, Pleione bulbocodioides, Pleione formosana, Pleione yunnanensis, Thunia alba) and the genus Bletilla, with Nymphaea nouchali and Barclaya longifolia85 selected as the outgroups. CPG sequences were aligned using MAFFT v7.313 (ref. 80), and a phylogenetic tree was constructed using IQTree v1.6.6 (ref. 81) with the same parameters as described in the “Phylogenetic analysis of Bletilla and B. sinensis” section, under the TVM + F + R3 model selected by ModelFinder82. Successful species identification was defined as the clustering of all samples from the same species into a monophyletic group in the ML tree. All sequence data used in this identification analysis were sourced from the NCBI database and publicly available literature, with detailed information provided in Supplementary Table S6.
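The monophyly criterion for successful identification can be checked programmatically. The sketch below is a minimal illustration under stated assumptions: the rooted tree is given as nested tuples with sample names as leaves (rather than parsed from a Newick file), and the species label is taken from the part of each sample name before the underscore. These conventions, the function names, and the toy tree are all hypothetical.

```python
from collections import defaultdict


def leaves(node):
    """Return the set of leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def is_monophyletic(tree, members):
    """True if some clade of the rooted tree contains exactly `members`."""
    members = set(members)
    if leaves(tree) == members:
        return True
    if isinstance(tree, str):
        return False
    # Only descend into children that still contain every member.
    return any(is_monophyletic(child, members)
               for child in tree if members <= leaves(child))


def identification_success(tree, species_of):
    """Map each species to whether its samples form a monophyletic group."""
    groups = defaultdict(set)
    for leaf in leaves(tree):
        groups[species_of(leaf)].add(leaf)
    return {sp: is_monophyletic(tree, samples) for sp, samples in groups.items()}


if __name__ == "__main__":
    # Toy rooted tree: the two "str" samples are split by an "och" sample.
    toy_tree = ((("str_1", "och_1"), "str_2"), ("for_1", "for_2"), "out_1")
    print(identification_success(toy_tree, lambda name: name.split("_")[0]))
```

Note that a species represented by a single sample is trivially monophyletic, so the criterion is informative only for species with two or more samples.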