Abstract
This study aimed to determine the role of internal transcribed spacer 2 (ITS2) in the identification of Spatholobus suberectus and to explore the genetic diversity of S. suberectus. A total of 292 ITS2 sequences from S. suberectus and 17 other plant species were analysed. S. suberectus clustered separately in the phylogenetic tree. The genetic distance between species was greater than that within S. suberectus. Synonymous substitution rate (Ks) analysis revealed that ITS2 diverged the most recently within S. suberectus (Ks = 0.0022). These findings suggested that ITS2 is suitable for the identification of S. suberectus. The ITS2 sequences were divided into 8 haplotypes and 4 evolutionary branches on the basis of secondary structure, indicating that there was variation within S. suberectus. Evolutionary analysis revealed that the GC content of paired regions (pGC) was greater than that of unpaired regions (upGC), and the pGC showed a decreasing trend, whereas the upGC remained unchanged. Single-base mutation was the main cause of base pair substitution. In both the initial state and the equilibrium state, the rate of substitutions producing GC was higher than that of substitutions producing AU. The increase in the GC content was partly attributed to GC-biased gene conversion (gBGC). High GC content reflected the high recombination and mutation rates of ITS2, which are the basis for species identification and genetic diversity. We characterized the sequence and structural characteristics of S. suberectus ITS2 in detail, providing a reference and basis for the identification of S. suberectus and its products, as well as the protection and utilization of wild resources.
Introduction
Spatholobus suberectus is distributed in Fujian Province and the Guangxi Zhuang Autonomous Region of China1. It is a leguminous plant used in traditional Chinese medicine. Dried S. suberectus stem is used as a medicinal component. Due to the red juice that exudes during harvesting, it is also known as “Ji Xue Teng” in China. Modern pharmacological and clinical research has shown that S. suberectus has anti-inflammatory1, antioxidant2, antiphotoaging3, antidiabetic4, and anticancer5 properties. In addition, S. suberectus has long been used as a nourishing food additive (wine, soup and tea) in China2. Owing to its importance in food and medicine, the market demand for S. suberectus is high, but the scarcity of wild resources and long growth period (more than 7 years) before it can be used as a medicine limit its supply6. Unscrupulous businessmen, driven by profits, mix vine plants with S. suberectus, which greatly affects the effectiveness and safety of its use in clinical medicine. The keys to solving this problem will be the development of methods for the identification of S. suberectus and its products and transformation of the supply model from wild resources to artificial cultivation.
Methods such as source, character and microscopic identification and chemical composition analysis are often used to identify medicinal plants or processed products7,8,9. DNA barcoding is a molecular diagnostic technology that uses standard, sufficiently variable DNA fragments for species identification and delimitation10. DNA barcode fragments are short and easy to amplify, so even when samples are not fresh (e.g., samples from herbaria or prepared products), partially degraded DNA can still be identified11,12. The approach relies on constructing barcode libraries from known taxa and then assigning query barcode sequences to the library, for example through phylogenetic analysis, to distinguish species13. Internationally recognized candidate sequences for plant DNA barcodes include plastid regions (matK, rbcL, ycf, psbA-trnH, etc.) and the nuclear internal transcribed spacer (ITS) region14. The Consortium for the Barcode of Life (CBOL) Plant Group proposed the combination of plastid and nuclear ITS regions as an effective barcoding tool for distinguishing plant species15. The China CBOL Plant Group has incorporated ITS (or ITS2) into the core barcode for seed plant identification, and psbA-trnH is recommended as an auxiliary barcode16. Chen et al.17 conducted a comparative analysis of the amplification success rate, intraspecies and interspecies variation, and barcoding gap of multiple candidate sequences and reported that ITS2 performed the best. They also used 6,600 samples of 4,800 plant species to evaluate the ability to use ITS2 for identification. The results revealed that the species identification success rate was 92.7%. Therefore, the use of the ITS2 sequence as a universal DNA barcode sequence for medicinal plants was proposed. 
The “Pharmacopoeia of the People’s Republic of China” (2015 edition) includes the guiding principles for DNA barcoding technology and establishes a Chinese herbal medicine identification system based on ITS2 (ITS) supplemented with psbA-trnH18,19.
Several DNA barcodes have been found to be useful for identifying S. suberectus. An et al. used 26S rDNA to distinguish S. suberectus, Callerya dielsiana, Derris taiwaniana, Mucuna sempervirens and Derris trifoliata using seven samples20. Huang et al. used matK to distinguish S. suberectus, D. trifoliata, Entada phaseoloides, Callerya cinerea and Sargentodoxa cuneata using 8 samples21. Zhou et al. used psbA-trnH to distinguish S. suberectus, S. cuneata, Kadsura interior, Kadsura heteroclita, M. sempervirens, Mucuna birdwoodiana, C. dielsiana and Callerya tsui using 79 samples22. ITS2 is located between the 5.8S and 26S eukaryotic ribosomal RNA genes and does not encode proteins23. Since the ITS region is not incorporated into the ribosome, it is subject to less natural selection pressure during evolution, thus tolerating more variation and showing extremely extensive sequence polymorphism in most eukaryotic organisms24,25. The ITS2 region has been used as a phylogenetic marker to identify many medicinal plants, closely related plants and a wide range of species10. Bupleurum L. (Apiaceae)26, Uncaria27, Aristolochia28, Eryngium29, Gnaphalium affine30, Rheum officinale31 and other medicinal plants can all be identified to a certain extent via ITS2 barcodes. We used ITS2 to distinguish S. suberectus from the source species of almost all easily confused products (17 species). This study addresses the lack of a universally applicable barcode for the identification of S. suberectus, thereby supporting the safe use of S. suberectus as a medicine.
The highly variable ITS2 is not only used in species identification but also contributes to genetic diversity analysis of species and varieties. Khazal et al. analysed the genetic diversity of Leishmania major through phylogenetic inference based on ITS232. Delva et al. analysed the genetic diversity of Amylomyces rouxii through phylogenetic analysis, genetic distance, genetic variation and haplotype network construction on the basis of ITS1/ITS2 and D1/D233. Lin et al. used ITS2 and the mitochondrial cytochrome c oxidase subunit 1 gene (cox1) as genetic markers to conduct genotyping analysis, identified 17 different ITS2 haplotypes and determined the population genetic structure of Sargassum plagiophyllum C. Agardh34. The mature secondary structure of the catalytic ribosomal RNA (a central loop connected to a four-finger structure) is highly conserved35,36. The prediction of secondary structure can not only serve to supplement and verify phylogeny at the sequence level but also assist in the discovery of genotypic variations in the population29. Umdale et al. evaluated the species and genetic diversity of Asian Vigna through haplotype and secondary structure analysis based on ITS237. Therefore, we analysed the genetic diversity of wild S. suberectus via ITS2. Genotype mining will provide information and labels for screening excellent varieties in the future and lay the foundation for artificial cultivation.
The guanine and cytosine (GC) content provides the material basis for species diversity and genetic diversity. GC base pairs also guarantee the structural stability of double-stranded DNA and RNA38,39. The GC content and distribution may be constrained and driven by structure, thermodynamic stability, and other factors40,41. The GC content and distribution are also reflective of sequence selection and structural evolution42. A recent study revealed that the paired region of angiosperm ITS2 contains a relatively high GC content and that GC-biased gene conversion (gBGC) is one of the main reasons for the high GC content43. We explored the evolution of S. suberectus ITS2 in relation to structure-related GC substitution trends and mechanisms and the relationships between GC content and species differentiation and genetic diversity.
Materials and methods
Sample collection and specimen identification
Field sampling of S. suberectus and source plants of easily confused products was conducted from May to June 2023. Fresh leaves from a total of 56 samples, including S. suberectus (39), M. sempervirens (6), and Craspedolobium unijugum (11), were collected in this study. The leaves were immediately placed into a sealed plastic bag containing enough silica gel to avoid DNA degradation. All the plants were identified by Prof. Yunfeng Huang and Prof. Kejian Yan of the Guangxi Institute of Chinese Medicine & Pharmaceutical Science. S. suberectus (Herbarium: 00327831), M. sempervirens (Herbarium: 02014496), and C. unijugum (Herbarium: 02028691) can be identified in the Chinese Virtual Herbarium (https://www.cvh.ac.cn/index.php). The samples were obtained mainly from the Guangxi Zhuang Autonomous Region and Yunnan Province, China. The distance between all the sampled individuals of the same population was greater than 50 m. Sample information is shown in Table S1.
DNA extraction, amplification, and sequencing
The genomic DNA of all the samples was extracted from approximately 15 mg of silica gel-dried leaves via the Fast Pure Plant DNA Isolation Mini Kit (Vazyme, Nanjing, China). The quality and concentration of the genomic DNA were determined via a NanoDrop 1000 spectrophotometer. Each DNA solution was diluted or concentrated to approximately 50 ng/µL for PCR amplification. PCRs were performed in a volume of 25 µL, which consisted of 50 ng of template DNA (1 µL), 12.5 µL of 2× Taq PCR master mix (Vazyme, Nanjing, China), 2 µL of 10 µmol/L forward and reverse primers, and 9.5 µL of ddH2O. The primers used were as follows: ITS2-2F “ATGCGATACTTGGTGTGAAT” and ITS2-3R “GACGCTTCTCCAGACTACAAT”10. The reaction program was as follows: 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 58 °C for 45 s, and 72 °C for 45 s; and 72 °C for 10 min. All the PCR products were detected via agarose gel electrophoresis, and the gel was photographed via a UV transilluminator. The product was purified via a Fast Pure Gel DNA Extraction Mini Kit (Vazyme, Nanjing, China), and the purified product was sequenced on an ABI 3130xl automatic sequencer (Applied Biosystems, Foster City, California, USA).
Sequence assembly, feature comparison and genetic analysis
The sequencing chromatograms were assembled and proofread via CodonCode Aligner 8.0.2 software. The primers and low-quality regions of the ITS2 reads were trimmed according to the annotation file to obtain the complete sequences. The HMMer annotation method, which is based on the hidden Markov model, was used to remove the 5.8S and 28S sequences to obtain accurate ITS2 spacer sequences44. We used BLAST to search for homologous genes for all obtained sequences against the National Center for Biotechnology Information (NCBI) database (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome, accessed on 28 September 2023), and the source species were determined on the basis of the homologous genes with the highest similarity score and the lowest E value45. The ITS2 sequences of S. suberectus, M. sempervirens and C. unijugum were submitted to the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/, accession numbers PP465924–PP465979, accessed on 12 March 2024). Two hundred and thirty-six (236) ITS2 sequences from 15 adulteration-prone species were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/genbank/, accessed on 28 September 2023) and analysed together. Searches for C. dielsiana (Herbarium: 02015435), S. cuneata (Herbarium: 02108794), M. birdwoodiana (Herbarium: 02098429), C. cinerea (Herbarium: 01924183), K. interior (Herbarium: 02015552), Kadsura heteroclita (Herbarium: 02231169), D. trifoliata (Herbarium: 01924020), E. phaseoloides (Herbarium: 02074730), Mucuna macrocarpa (Herbarium: 01781725), Padbruggea filipes (Herbarium: 1289072), Bauhinia championii (Herbarium: 01965870), Callerya nitida (Herbarium: 02036623), Schisandra propinqua (Herbarium: 02231152), and Schisandra henryi (Herbarium: 02231143) were performed against the Chinese Virtual Herbarium database (https://www.cvh.ac.cn/index.php). 
Searches for Wisteriopsis reticulata (ID: 88791) were performed against the Chinese Field Herbarium database (https://www.cfh.ac.cn/album/ShowSpAlbum.aspx?spid=88791). MAFFT was used to perform sequence alignment46. The tool trimAl was used to trim aligned sequences47. ModelTest-NG was used to select the optimal evolutionary model for ITS2 sequences48. In accordance with the corrected Akaike information criterion (AICc), the JC model was selected to construct a phylogenetic tree via RAxML-NG software49. The bootstrap method (1000 replicates) was used to assess the support for each branch50. The R package ggtree was used for visualization of evolutionary trees51.
The R package DECIPHER was used to calculate intraspecific and interspecific genetic distances52. The R package ggplot2 was used to visualize the results in the form of box plots53. DnaSP v6.0 was used to analyse the Ks value between the ITS2 sequences of S. suberectus and other plants, with the sequences treated as noncoding54. The results are presented as density plots via the R package ggplot253. The R package pegas was used for statistical analysis of the haplotypes, and the results were visualized via the basic plotting functions of R software55. RNAfold software was used to obtain the secondary structure of ITS2 from S. suberectus56. LocARNA software was used to obtain consensus secondary structures and secondary structure-based phylogenetic trees57.
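The split into intraspecific and interspecific distances described above can be illustrated with a minimal Python sketch. It assumes a simple uncorrected p-distance over aligned sequences, which is not necessarily the distance that DECIPHER computed in this study; the function names and the toy sample IDs and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap or an ambiguous base are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def split_distances(seqs: dict, focal: str):
    """seqs maps sample ID -> (species, aligned sequence).

    Returns (intra, inter): distances within the focal species, and distances
    between the focal species and every other species.
    """
    intra, inter = [], []
    for (_, (sp1, s1)), (_, (sp2, s2)) in combinations(seqs.items(), 2):
        if sp1 == focal and sp2 == focal:
            intra.append(p_distance(s1, s2))
        elif focal in (sp1, sp2):
            inter.append(p_distance(s1, s2))
    return intra, inter


# Hypothetical toy alignment, not data from this study.
toy = {
    "SS_a": ("S. suberectus", "ACGTACGTAC"),
    "SS_b": ("S. suberectus", "ACGTACGTAT"),
    "MS_a": ("M. sempervirens", "ACTTAC-TGC"),
}
intra, inter = split_distances(toy, "S. suberectus")
print(intra, inter)
```

The two lists correspond to the two groups of values that are compared in the box plots.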
In the studies of Xian and Liu et al., the DNA/RNA hybrid substitution model was used to explain the substitution patterns of the ITS2 paired and unpaired regions43,58. We used the substitution model selection script (model_selection.pl) in PHASE 3.0 to select the best substitution model on the basis of the AICc value59. The phylogeny of ITS2 was inferred on the basis of sequence alignment files, consensus secondary structure files and NJ trees. MCMC analysis was performed for 10,000,000 generations to reach convergence, with sampling every 100 generations, and the first 30,000 sampled trees (30%) were discarded as burn-in. The remaining trees were used to infer substitution rates at initial and equilibrium states via the mcmcsummarize program of the PHASE package.
The equilibrium GC content (GC*) was calculated according to the method of Xian and Liu et al.43,58. In the convergence state, the GC content of the sequence in the equilibrium substitution mode can be calculated as the percentage of the AT→GC substitution rate in the sum of the AT→GC and GC→AT substitution rates60.
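The definition above amounts to GC* = r(AT→GC) / (r(AT→GC) + r(GC→AT)), expressed as a percentage. A minimal Python sketch of that calculation follows; the function name and the example rates are hypothetical and are not values estimated in this study.

```python
def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """Equilibrium GC content (GC*, in percent).

    GC* is the share of the AT->GC substitution rate in the sum of the
    AT->GC and GC->AT substitution rates.
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("substitution rates must sum to a positive value")
    return 100.0 * rate_at_to_gc / total


# Hypothetical rates: AT->GC three times as fast as GC->AT.
print(equilibrium_gc(3.0, 1.0))  # 75.0
```

When the two rates are equal, GC* is 50%; a GC* below the current GC content implies that the GC content is expected to decline under the inferred substitution pattern.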
Results
Species identification, sequence characterization and phylogenetic inference
We performed BLAST alignment of the ITS2 sequences of 56 samples. Consistent with the morphological identification, S. suberectus (39), M. sempervirens (6) and C. unijugum (11) were identified. Their percent identity was above 96%. The average percentage identity of S. suberectus ITS2 was 99.83%. A total of 236 sequences from adulteration-prone species in GenBank were analysed together. S. suberectus had the shortest sequence (201/202 bp) and the highest GC content (69.31–71.29%). S. cuneata had the longest sequence (228–257 bp). B. championii had the lowest GC content (52.75–54.13%) (Table S1).
At the sequence level, phylogenetic analysis was performed according to the ML method to better distinguish species. There were obvious topological differences between S. suberectus and 17 easily confused species, including M. sempervirens and C. unijugum, indicating the usefulness of ITS2 in identifying S. suberectus (Fig. 1).
Genetic differentiation of S. suberectus and the source plants of adulterated products
Genetic distance models are used to measure the extent of genetic differences between species. The intraspecific genetic distance of S. suberectus was distributed between 0 and 0.244, with an average value of 0.149 (Fig. 2). The average interspecific genetic distance between S. suberectus and S. propinqua was the smallest (0.648), ranging from 0.609 to 0.741. The average interspecific genetic distance between S. suberectus and P. filipes was the greatest (0.746), ranging from 0.721 to 0.756. The intraspecific genetic distances of S. suberectus were all smaller than the interspecific genetic distances of S. suberectus and other species, indicating clear genetic differences.
Ks can be used to compare gene duplication events and evolutionary rates within and between species. The Ks value within S. suberectus was the smallest (0.002), while the Ks value between S. suberectus and K. heteroclita was the highest (0.352) (Fig. 3). These findings indicated that ITS2 of S. suberectus and K. heteroclita diverged at an early stage. The Ks distribution of S. suberectus ITS2 had two peaks with a large interval between them, indicating that ITS2 had undergone at least two large-scale duplications in S. suberectus and that the differentiation rate was slow.
Intraspecific variation in S. suberectus ITS2 sequences
We used a haplotype network to analyse the genetic variation in the ITS2 sequence of S. suberectus. The 39 ITS2 sequences were divided into 8 haplotypes (H). The main haplotypes were H2 and H6, which contained 11 and 20 sequences, respectively (Fig. 4). There were fewer mutation sites between H2 and H1/3/4/5 (5, 5, 4 and 1, respectively) and between H8 and H6/7 (1 and 2, respectively). There were 21 mutations between H2 and H7, indicating that ITS2 evolved in two different directions, toward H2 and toward H6. In addition, the only member of H7 was SS030, whose base at position 183 could not be resolved (Y). H7 and H6 had no other differences except at this position. This result indicated that the base at position 183 of SS030 may be T, in which case H6 and H7 can be classified into the same haplotype.
S. suberectus ITS2 haplotype network. Different haplotypes are represented by circles of different colours, and the size and number of sectors they are divided into represent the number of sequence entries that make up the haplotype. The length of the lines between haplotypes represents the number of mutation sites. Variant positions and changed bases between haplotypes are marked and connected via dashes.
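The haplotype assignment, and the question of whether an unresolved base such as Y keeps two haplotypes apart, can be sketched in Python as follows. This is an illustration under stated assumptions, not the pegas procedure used in this study: haplotypes are taken to be groups of identical aligned sequences, and the function names and toy sequences are hypothetical.

```python
from collections import defaultdict

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def collapse_haplotypes(seqs: dict) -> dict:
    """Group sample IDs by identical aligned sequence (one group = one haplotype)."""
    groups = defaultdict(list)
    for sample, seq in seqs.items():
        groups[seq.upper()].append(sample)
    return dict(groups)


def compatible(a: str, b: str) -> bool:
    """True if two aligned sequences differ only where ambiguity codes overlap."""
    if len(a) != len(b):
        return False
    return all(
        set(IUPAC.get(x, x)) & set(IUPAC.get(y, y))
        for x, y in zip(a.upper(), b.upper())
    )


# Hypothetical toy sequences, not data from this study.
toy = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGY", "s4": "ACGA"}
haplotypes = collapse_haplotypes(toy)
print(len(haplotypes))             # 3
print(compatible("ACGT", "ACGY"))  # True: Y can stand for T
print(compatible("ACGA", "ACGY"))  # False: Y cannot stand for A
```

Under this reading, two haplotypes that differ only at a Y/T site, as reported for H6 and H7, are compatible and could be merged once the ambiguous base is resolved.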
Prediction and phylogenetic inference of the secondary structure of S. suberectus ITS2
A phylogenetic tree was constructed on the basis of the ITS2 sequence and secondary structure of S. suberectus. The 39 S. suberectus ITS2 sequences were divided into 4 branches (Fig. 5). The members within each branch contained more similar sequences and secondary structures (Figures S1 and S2). ITS2 of branches I, II, and III all contained a classic four-arm structure with one ring, whereas the four arms of ITS2 of branch IV were distributed on a free single strand. In all branches, structure IV had the most rings. Clade I contained 3 bulges, 3 internal loops and a hairpin loop, whereas Clade II contained 2 bulges, 4 internal loops and a hairpin loop. This was also the most important structural difference between Clades I and II. Structure IV of Clade III contained 3 bulges, 2 internal loops and a hairpin loop, whereas structure IV of Clade IV contained 4 bulges, 2 internal loops and a hairpin loop. In addition to the central loop, Clade III had another multiple loop in structure I, which was quite different from the results for the other clades.
Structure-based GC heterogeneity and mutation direction of S. suberectus ITS2
Liu et al. defined the equilibrium GC content (GC*) as the GC content when the substitution pattern of the sequence remains unchanged over time (convergent evolution) in the future equilibrium state43. The GC* provides clues for inferring the evolutionary trend of the GC content. We performed statistical analysis of the GC content (pGC and upGC) of the paired and unpaired regions as well as the equilibrium GC (pGC* and upGC*) (Fig. 6). The pGC (75.85 ± 0.49) was significantly greater than the upGC (58.12 ± 0.87). The pGC* (70.4) was lower than the current pGC, indicating a downwards trend in the GC content of paired regions. In contrast, the upGC* (58.28) was similar to the current upGC, indicating that the GC content of unpaired regions is close to equilibrium and that paired and unpaired regions follow different evolutionary trends.
Comparison of the GC and equilibrium GC (GC*) contents of paired and unpaired regions of the ITS2 secondary structure. Boxplots with data points in different colours represent the GC content of paired and unpaired regions (pGC and upGC), respectively. GC* values in different regions are marked with red solid lines. The red lines for the paired and unpaired regions are marked on the right with “pGC*” and “upGC*”, respectively.
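As an illustration of how pGC and upGC can be obtained from a sequence and its predicted structure, the following Python sketch counts G and C separately at paired and unpaired positions of a dot-bracket string (the notation that RNAfold outputs). Deriving the values this way is an assumption about the workflow, and the function name and toy hairpin are hypothetical.

```python
def gc_by_pairing(seq: str, structure: str):
    """Return (pGC, upGC) in percent.

    seq is the RNA/DNA sequence; structure is its dot-bracket string, where
    '(' and ')' mark paired positions and '.' marks unpaired positions.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    paired = [b for b, s in zip(seq.upper(), structure) if s in "()"]
    unpaired = [b for b, s in zip(seq.upper(), structure) if s == "."]

    def gc_percent(bases):
        if not bases:
            return float("nan")
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    return gc_percent(paired), gc_percent(unpaired)


# Hypothetical 12-nt hairpin: a 4-bp stem closing a 4-nt loop.
pgc, upgc = gc_by_pairing("GGACGUUAGUCC", "((((....))))")
print(pgc, upgc)  # 75.0 25.0
```

Applied to each sequence in turn, this yields the per-sequence values summarized by the boxplots.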
We used the best substitution model, HKY85 + G_RNA16A, selected on the basis of the lowest AICc value, to infer the base pair substitution process. We found a total of 8 double-base substitutions, including correctly paired substitutions (such as AU→GC) and hybrid mismatched substitutions (such as GU→GC) (Fig. 7A and B). We also identified 12 possible single-base substitution events. They included 8 heterozygous mismatches (such as GU→GC) and 4 homozygous mismatched substitutions (such as GG→GC) (Fig. 7C and D). In both the initial and the convergent state, the rate of substitutions producing GC was always higher than that of substitutions producing AU (Fig. 7). Base pair substitutions generated correct pairs (AU and GC) primarily through single-base substitutions. The substitution rate in the convergence state was higher than the initial substitution rate (Fig. 7C and D).
Base substitution rates for generating AU and GC in the initial state (I) and equilibrium state (E). Both nucleotides in the base pair were substituted to produce AU and GC. Before the substitution, they exhibited correct pairing (A) and heterozygous pairing (B), respectively. Only one nucleotide in the base pair was substituted to produce AU and GC. Before substitution, the pairs exhibited heterozygous pairing (C) and homozygous pairing (D). Different substitution processes are marked with different colours in the legend.
Discussion
ITS2 is a DNA barcode that can be used to effectively identify S. suberectus
The ITS region is one of the most widely used DNA barcodes. The noncoding internal transcribed spacers (ITS1 and ITS2) of ribosomal DNA have a higher evolutionary rate than the coding regions do, show a high degree of differentiation at the species level, and can be used to identify closely related species61. Their discriminatory power exceeds that of the plastid regions15,62,63,64. Amplification and sequencing success rates are the basis for barcoding applications65. Kress et al. proposed that short DNA sequences are easier and more economical to extract and sequence66. Cahyaningsih et al. reported that the GC content is positively correlated with sequencing accuracy67. The ITS2 sequences used in this study met these conditions (~220 bp, ~61.74% GC) (Table S1).
Meier et al. used the condition that the minimum interspecific genetic distance was greater than the maximum intraspecific genetic distance as the criterion for effectively distinguishing species68. The Ks value is positively correlated with the degree of differentiation69. These criteria, combined with our results (Figs. 2 and 3), suggest that ITS2 is suitable for the identification of S. suberectus. In addition, compared with the studies on the identification of S. suberectus using 26S rDNA (7 samples, 4 species)20, matK (8 samples, 5 species)21 and psbA-trnH (79 samples, 8 species)22, our study included the source species of almost all easily confused products (292 sequences, 17 species), providing more comprehensive and valuable results for practical applications.
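The criterion attributed to Meier et al. can be written as a one-line check. The Python sketch below is illustrative only; the function names are hypothetical, and the example uses just the range limits quoted in the Results (intraspecific 0 to 0.244; interspecific limits of 0.609 and 0.756) rather than the full set of pairwise distances.

```python
def has_barcoding_gap(intra, inter) -> bool:
    """True if the smallest interspecific distance exceeds the largest
    intraspecific distance."""
    if not intra or not inter:
        raise ValueError("both distance lists must be non-empty")
    return min(inter) > max(intra)


def gap_width(intra, inter) -> float:
    """Minimum interspecific minus maximum intraspecific distance."""
    return min(inter) - max(intra)


# Range limits quoted in the Results; the complete distance lists are not reproduced here.
intra_range = [0.0, 0.244]
inter_range = [0.609, 0.756]
print(has_barcoding_gap(intra_range, inter_range))  # True
```

A positive gap width means the two distributions do not overlap, which is the condition the criterion requires.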
Genetic variation of S. suberectus ITS2
ITS2 has sufficient variation to be an essential marker for classification and genetic diversity analysis of animals, plants and microorganisms37,70. Ding et al. used ITS2 sequences to evaluate the genetic differences of Artemisia annua71. Lin et al. reported that S. plagiophyllum on the west coast of Thailand contained a total of 17 different ITS2 haplotypes34. Our study revealed that S. suberectus contained 8 ITS2 haplotypes and that there were two main haplotypes (H2 and H6) (Fig. 4). We speculate that fewer types of variation result from homogenization caused by natural selection. Preliminary analysis of ITS2 haplotypes is the basis for distinguishing the molecular characteristics of members within species72. In the future, joint analysis of medicinal ingredient content, haplotypes and copy numbers will promote the development of screening methods for high-quality S. suberectus.
ITS2 secondary structure is highly relevant to species taxonomy73. It is difficult to use ITS2 sequences to identify changes at the species level, but comparisons of secondary structures make up for this shortcoming29. ITS2 secondary structures often differ among genotypes. The secondary structure of ITS2 can be used as a marker for the genotypes of Eryngium foetidum29 and Colocasia esculenta74. We predicted the secondary structure of ITS2 from S. suberectus (Figures S1 and S2). On the basis of these findings, we constructed a phylogenetic tree and drew a consensus secondary structure map (Fig. 5). These results can be used to develop variety markers for the cultivation and selection of wild resources and promote the protection and utilization of wild resources in the future.
A high GC content is the basis for the successful identification and analysis of the genetic diversity of S. suberectus via ITS2
During meiosis, chromosomal recombination results in base mismatches75. The gBGC hypothesis suggests that DNA repair mismatches are preferentially converted to GCs rather than ATs76. ITS2 is a region of ribosomal DNA (nrDNA) with a high local recombination rate. ITS2 evolved due to chromosomal recombination in a wide range of organisms61. Rapidly reorganized regions containing higher GC contents are thought to be characteristic of the gBGC model77. gBGC is considered one of the reasons for the increased GC content in the ITS2 of angiosperms, including those of the genus Corydalis43,58. The conversion of base pairs in the pairing region of ITS2 of S. suberectus to GC is consistent with the above characteristics (Fig. 7). In addition, since the GC content in the current study was higher than the equilibrium GC content, we speculate that the driving force for maintaining the high GC content of S. suberectus ITS2 is not only gBGC (Fig. 6). The synthesis of GC requires more biochemical resources than the synthesis of AT78. The current high GC content in the paired region may be driven by structural selection, ensuring the thermodynamic stability of ITS279.
High GC content is an intuitive reflection of high recombination and mutation rates caused by high levels of meiosis80,81. gBGC can maintain mutations within a certain range and produce more homologous genes82,83. We speculate that S. suberectus may have a relatively high level of meiosis, resulting in relatively high recombination in ITS2, which makes S. suberectus easy to distinguish from other species in terms of ITS2, and there are many types within S. suberectus.
Conclusion
In this study, phylogenetic trees were constructed, and genetic distances and Ks values were calculated via ITS2 of S. suberectus and 17 other species. ITS2 of S. suberectus was assigned to a separate branch in the phylogenetic tree. The genetic distance and Ks value of ITS2 within S. suberectus were smaller than those between S. suberectus and other species. These results support the potential of using ITS2 for the identification of S. suberectus.
The genetic diversity of S. suberectus based on ITS2 was analysed. S. suberectus ITS2 had 8 haplotypes, and the most important haplotypes were H2 and H6. The phylogenetic tree based on secondary structure revealed 4 branches. These results provide information for the division of S. suberectus diversity.
One of the reasons for the high GC content in S. suberectus ITS2 is gBGC. The high degree of recombination and mutation of ITS2 caused by the high degree of meiosis is the basis for distinguishing S. suberectus from other species, as is the high degree of polymorphism within S. suberectus.
Data availability
All data generated or analysed in this study are included in this published article and its Supplementary Material. The ITS2 sequences of S. suberectus, M. sempervirens and C. unijugum were submitted to the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/, accession numbers PP465924–PP465979).
References
Liu, X. Y. et al. Anti-inflammatory activity of some characteristic constituents from the vine stems of Spatholobus suberectus. Molecules. 24 (20), 3750 (2019).
Li, W. et al. Chemical characterization of procyanidins from Spatholobus suberectus and their antioxidative and anticancer activities. J. Funct. Foods. 12, 468–477 (2015).
Kwon, K. R. et al. Attenuation of UVB-induced photo-aging by polyphenolic-rich Spatholobus suberectus stem extract via modulation of MAPK/AP-1/MMPs signaling in human keratinocytes. Nutrients. 11 (6), 1341 (2019).
Zhao, P. et al. Spatholobus suberectus exhibits antidiabetic activity in vitro and in vivo through activation of AKT-AMPK pathway. Evid. Based Complement. Alternat Med. 18, 6091923 (2017).
Zhang, F. et al. A review of the pharmacological potential of Spatholobus suberectus Dunn on cancer. Cells. 11 (18), 2885 (2022).
Qin, S. S. et al. Comparative genomics of Spatholobus suberectus and insight into flavonoid biosynthesis. Front. Plant. Sci. 4 (11), 528108 (2020).
Cheng, Y. Y. et al. Analysis of Sheng-Mai-San, a ginseng-containing multiple components traditional Chinese herbal medicine using liquid chromatography tandem mass spectrometry and physical examination by electron and light microscopies. Molecules. 21 (9), 1159 (2016).
Sun, J. X. et al. Precise identification of Celosia argentea seed and its five adulterants by multiple morphological and chemical means. J. Pharm. Biomed. Anal. 216 (15), 114802 (2022).
Li, X. X. et al. Comprehensive identification of Vitex trifolia fruit and its five adulterants by comparison of micromorphological, microscopic characteristics, and chemical profiles. Microsc Res. Tech. 83 (12), 1530–1543 (2022).
Chen, S. L. et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One 5(1), e8613 (2010).
Sokołowska, J. et al. Assessment of ITS2 region relevance for taxa discrimination and phylogenetic inference among Pinaceae. Plants. 11 (8), 1078 (2022).
Gao, Z. T. et al. DNA mini-barcoding: a derived barcoding method for herbal molecular identification. Front. Plant. Sci. 10, 987 (2019).
Coissac, E. et al. From barcodes to genomes: extending the concept of DNA barcoding. Mol. Ecol. 25 (7), 1423–1428 (2016).
Hollingsworth, P. M. et al. Telling plant species apart with DNA: from barcodes to genomes. Philos. Trans. R Soc. Lond. B Biol. Sci. 371 (1702), 20150338 (2016).
China Plant BOL Group et al. Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. PNAS. 108 (49), 19641–19646 (2011).
CBOL Plant Working Group. A DNA barcode for land plants. PNAS. 106 (31), 12794–12797 (2009).
Chen, S. L. et al. A renaissance in herbal medicine identification: from morphology to DNA. Biotechnol. Adv. 32 (7), 1237–1244 (2014).
Zhang, Z. X. et al. Morphological and physiological responses of Spatholobus suberectus Dunn to nitrogen and water availability. Photosynthetica. 57 (4), 1130–1141 (2019).
Xiao, J. P. et al. Pharmacodynamic material basis and potential mechanism study of Spatholobi Caulis in reversing osteoporosis. Evid-Based Compl Alt. 14, 3071147 (2023).
An, R. et al. Molecular identification of Spatholobus suberectus and its adulterants based on 26S rDNA D1-D3 region sequence analysis. J. Guangzhou Univ. Chin. Med. 27 (04), 403–406 (2010).
Huang, Q. L. et al. Analysis and molecular identification of matK gene in Spatholobus suberectus and its adulterated products. North. Hortic. 17, 94–98 (2015).
Zhou, H. et al. psbA-trnH barcode molecular identification of Spatholobi Caulis, Kadsurae Caulis, Sargentodoxa cuneata and other Spatholobi medicinal materials. Modernization Traditional Chin. Med. Materia Medica-World Sci. Technol. 18 (01), 40–45 (2016).
Nafisi, H. et al. Characterizing nrDNA ITS1, 5.8S and ITS2 secondary structures and their phylogenetic utility in the legume tribe Hedysareae with special reference to Hedysarum. PLoS One 18(04), e0283847 (2023).
Keller, A. et al. 5.8S-28S rRNA interaction and HMM-based ITS2 annotation. Gene. 430 (1–2), 50–57 (2009).
Giudicelli, G. C. et al. Secondary structure of nrDNA Internal Transcribed spacers as a useful tool to align highly divergent species in phylogenetic studies. Genet. Mol. Biol. 40 (1 Suppl 1), 191–199 (2017).
Chao, Z. et al. DNA barcoding Chinese medicinal Bupleurum. Phytomedicine. 21 (13), 1767–1773 (2014).
Wei, S. et al. Molecular identification and targeted quantitative analysis of medicinal materials from Uncaria species by DNA barcoding and LC-MS/MS. Molecules. 24 (01), 175 (2019).
Dechbumroong, P. et al. DNA barcoding of Aristolochia plants and development of species-specific multiplex PCR to aid HPTLC in ascertainment of Aristolochia herbal materials. PLoS One 13(8), e0202625 (2018).
Acharya, G. C. et al. Molecular phylogeny, DNA barcoding, and ITS2 secondary structure predictions in the medicinally important Eryngium genotypes of east coast region of India. Genes (Basel). 13 (9), 1678 (2022).
Zheng, M. et al. Molecular authentication of medicinal and edible plant Gnaphalium affine (cudweed herb, Shu-qu-cao) based on DNA barcode marker ITS2. Acta Physiol. Plant. 43 (8), 119 (2021).
Zhou, Y. et al. ITS2 barcode for identifying the officinal rhubarb source plants from its adulterants. Biochem. Syst. Ecol. 70, 177–185 (2017).
Khazal, R. M. et al. Genetic diversity of Leishmania major isolated from different dermal lesions using ITS2 region. Acta Parasitol. 69, 831–838 (2024).
Delva, E. et al. Genetic diversity of Amylomyces rouxii from Ragi Tapai in Java island based on ribosomal regions ITS1/ITS2 and D1/D2. Mycobiology. 50 (2), 132–141 (2022).
Lin, Y. et al. Marine conditions in Andaman Sea shape the unique genetic structure of Sargassum plagiophyllum C. Agardh. J. Appl. Phycol. 36 (1), 501–511 (2024).
Mai, J. C. et al. The internal transcribed spacer 2 exhibits a common secondary structure in green algae and flowering plants. J. Mol. Evol. 44 (3), 258–271 (1997).
Coleman, A. W. ITS2 is a double-edged tool for eukaryote evolutionary comparisons. Trends Genet. 19 (7), 370–375 (2003).
Umdale, S. D. et al. Genetic diversity of Asian Vigna species (Subgenus Ceratotropis; Genus Vigna) in India based on ITS2 sequences data. Plant. Mol. Biol. Rep. 41 (3), 454–469 (2023).
Li, X. Q. et al. Variation, evolution, and correlation analysis of C + G content and genome or chromosome size in different kingdoms and phyla. PLoS One 9(2), e88339 (2014).
Kakimoto, Y. et al. MicroRNA stability in FFPE tissue samples: dependence on GC content. PLoS One 11(9), e0163125 (2016).
Chen, H. et al. Analysis of DNA interactions and GC content with energy decomposition in large-scale quantum mechanical calculations. Phys. Chem. Chem. Phys. 23 (14), 8891–8899 (2021).
Zhang, J. et al. GC content around splice sites affects splicing through pre-mRNA secondary structures. BMC Genom. 12 (1), 90 (2011).
Karro, J. E. et al. Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Mol. Biol. Evol. 25 (2), 362–374 (2007).
Liu, Y. et al. GC heterogeneity reveals sequence-structures evolution of angiosperm ITS2. BMC Plant. Biol. 23 (1), 608 (2023).
Bengtsson-Palme, J. et al. Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods Ecol. Evol. 4, 914–919 (2013).
González-Pech, R. A. et al. Commonly misunderstood parameters of NCBI BLAST and important considerations for users. Bioinformatics. 35 (15), 2697–2698 (2018).
Nakamura, T. et al. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics. 34 (14), 2490–2492 (2018).
Capella-Gutiérrez, S. et al. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25 (15), 1972–1973 (2009).
Darriba, D. et al. ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol. Biol. Evol. 37 (1), 291–294 (2019).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30 (9), 1312–1313 (2014).
Tamura, K. et al. Prospects for inferring very large phylogenies by using the neighbor-joining method. PNAS. 101 (30), 11030–11035 (2004).
Yu, G. et al. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Mol. Biol. Evol. 35 (12), 3041–3043 (2018).
Wright, E. Using DECIPHER v2.0 to analyze big biological sequence data in R. R J. 8 (1), 352–359 (2016).
Valero-Mora, P. M. et al. ggplot2: elegant graphics for data analysis. Meas-Interdiscip Res. 17 (3), 160–167 (2019). 2nd ed.
Rozas, J. et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34 (12), 3299–3302 (2017).
Paradis, E. Pegas: an R package for population genetics with an integrated–modular approach. Bioinformatics. 26 (3), 419–420 (2010).
Denman, R. B. Using RNAFOLD to predict the activity of small catalytic RNAs. Biotechniques. 15 (6), 1090–1095 (1993).
Wright, P. R. et al. CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains. Nucleic Acids Res. 42, W119–123 (2014).
Xian, Q. et al. Structure-based GC investigation sheds new light on ITS2 evolution in Corydalis species. Int. J. Mol. Sci. 24 (9), 7716 (2023).
Allen, J. E. et al. Assessing the state of substitution models describing noncoding RNA evolution. Genome Biol. Evol. 6 (1), 65–75 (2014).
Sueoka, N. On the genetic basis of variation and heterogeneity of DNA base composition. PNAS. 48 (4), 582–592 (1962).
Álvarez, I. et al. Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet Evol. 29 (3), 417–434 (2003).
Lv, Y. N. et al. Identification of medicinal plants within the Apocynaceae family using ITS2 and psbA-trnH barcodes. Chin. J. Nat. Med. 18 (8), 594–605 (2020).
Gao, T. et al. Identification of medicinal plants in the family Fabaceae using a potential DNA barcode ITS2. J. Ethnopharmacol. 130 (1), 116–121 (2010).
Feng, S. G. et al. Application of the ribosomal DNA ITS2 region of Physalis (Solanaceae): DNA barcoding and phylogenetic study. Front. Plant. Sci. 7, 1047 (2016).
Yu, J. et al. Progress in the use of DNA barcodes in the identification and classification of medicinal plants. Ecotox Environ. Safe. 208, 111691 (2021).
Kress, W. J. et al. Use of DNA barcodes to identify flowering plants. PNAS. 102 (23), 8369–8374 (2005).
Cahyaningsih, R. et al. DNA barcoding medicinal plant species from Indonesia. Plants. 11 (10), 1375 (2022).
Meier, R. et al. The use of mean instead of smallest interspecific distances exaggerates the size of the barcoding gap and leads to misidentification. Syst. Biol. 57 (5), 809–813 (2008).
Lynch, M. et al. The evolutionary fate and consequences of duplicate genes. Science. 290 (5494), 1151–1155 (2000).
Smith, E. G. et al. Host specificity of Symbiodinium variants revealed by an ITS2 metahaplotype approach. ISME J. 11 (6), 1500–1503 (2017).
Ding, X. X. et al. Developing population identification tool based on polymorphism of rDNA for traditional Chinese medicine: Artemisia annua L. Phytomedicine. 116, 154882 (2023).
Obert, T. et al. Delimitation of five astome ciliate species isolated from the digestive tube of three ecologically different groups of lumbricid earthworms, using the internal transcribed spacer region and the hypervariable D1/D2 region of the 28S rRNA gene. BMC Evol. Biol. 20 (1), 37 (2020).
Liu, Z. W. et al. Molecular authentication of the medicinal species of Ligusticum (Ligustici Rhizoma et Radix, Gao-ben) by integrating non-coding internal transcribed spacer 2 (ITS2) and its secondary structure. Front. Plant. Sci. 9 (10), 429 (2019).
Devi, M. P. et al. DNA barcoding and ITS2 secondary structure predictions in Taro (Colocasia esculenta L. Schott) from the north eastern hill region of India. Genes (Basel). 13 (12), 2294 (2022).
Johzuka-Hisatomi, Y. et al. Efficient transfer of base changes from a vector to the rice genome by homologous recombination: involvement of heteroduplex formation and mismatch correction. Nucleic Acids Res. 36 (14), 4727–4735 (2008).
Lesecque, Y. et al. GC-biased gene conversion in yeast is specifically associated with crossovers: molecular mechanisms and evolutionary significance. Mol. Biol. Evol. 30 (6), 1409–1419 (2013).
Rousselle, M. et al. Influence of recombination and GC-biased gene conversion on the adaptive and nonadaptive substitution rate in mammals versus birds. Mol. Biol. Evol. 36 (3), 458–471 (2018).
Rocha, E. P. et al. Base composition bias might result from competition for metabolic resources. Trends Genet. 18 (6), 291–294 (2002).
Higgs, P. G. RNA secondary structure: physical and computational aspects. Q. Rev. Biophys. 33 (3), 199–253 (2000).
Kiktev, D. A. et al. GC content elevates mutation and recombination rates in the yeast Saccharomyces cerevisiae. PNAS. 115 (30), E7109–7118 (2018).
Long, X. et al. Independent evolution of sex chromosomes and male pregnancy-related genes in two seahorse species. Mol. Biol. Evol. 40 (1), 279 (2023).
Liu, A. et al. GC-biased gene conversion drives accelerated evolution of ultraconserved elements in mammalian and avian genomes. Genome Res. 33 (10), 1673–1689 (2023).
Boman, J. et al. The effects of GC-biased gene conversion on patterns of genetic diversity among and across butterfly genomes. Genome Biol. Evol. 13 (5), 064 (2021).
Acknowledgements
Thanks to all co-authors for their dedication to this article. The authors thank the editors and reviewers for their work in promoting the manuscript.
Funding
This study was funded by the Survey and Collection of Germplasm Resources of Woody & Herbaceous Plants in Guangxi, China (GXFS-2021-34); the Guangxi High-Level Key Disciplines Construction Pilot Project in Traditional Chinese Medicine—Authentication of Chinese Medicinal Materials (No. 27); the Self-funded scientific research project of Guangxi Zhuang Autonomous Region Administration of Traditional Chinese Medicine “Investigation and quality evaluation of germplasm resources of Spatholobus suberectus in Guangxi production areas (GXZYA20220013)”; the Guangxi Traditional Chinese Medicine Appropriate Technology Development and Promotion Project “Optimal solutions for suitable niches for six key protected wild medicinal plants in Guangxi based on photosynthetic characteristics (GZSY22-06)”.
Author information
Contributions
Author Contributions: Conceptualization, Z.Z., J.W., and Y.H.; Data curation, C.X.; Formal analysis, Y.N.; Funding acquisition, Y.H.; Investigation, K.L.; Methodology, Z.Z., J.W., and K.L.; Resources, C.X. and Y.N.; Software, J.W.; Supervision, K.L. and Y.H.; Validation, J.W.; Visualization, J.W., Y.N., and K.L.; Writing–original draft, Z.Z.; Writing–review & editing, Z.Z., J.W., and Y.H. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
The plant materials involved in this research were used for scientific research only, and their use in this study was permitted and free of charge. This article did not contain any studies with human participants or animals and did not involve any endangered or protected species.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, Zy., Wu, Jw., Xu, Cg. et al. Molecular identification and studies on genetic diversity and structure-related GC heterogeneity of Spatholobus Suberectus based on ITS2. Sci Rep 14, 23523 (2024). https://doi.org/10.1038/s41598-024-75763-w
DOI: https://doi.org/10.1038/s41598-024-75763-w