Molecular identification and studies on genetic diversity and structure-related GC heterogeneity of Spatholobus suberectus based on ITS2

Zhao, Zi-yi; Wu, Jia-wen; Xu, Chuan-gui; Nong, You; Huang, Yun-feng; Lai, Ke-dao

doi:10.1038/s41598-024-75763-w

Article
Open access
Published: 09 October 2024

Scientific Reports volume 14, Article number: 23523 (2024)

Abstract

This study aimed to determine the value of internal transcribed spacer 2 (ITS2) for identifying Spatholobus suberectus and to explore the genetic diversity of S. suberectus. A total of 292 ITS2 sequences from S. suberectus and 17 other plant species were analysed. S. suberectus formed a separate cluster in the phylogenetic tree, and genetic distances between species were greater than those within S. suberectus. Synonymous substitution rate (Ks) analysis revealed that ITS2 diverged most recently within S. suberectus (Ks = 0.0022). These findings suggest that ITS2 is suitable for identifying S. suberectus. The S. suberectus ITS2 sequences were divided into 8 haplotypes and, on the basis of secondary structure, into 4 evolutionary branches, indicating intraspecific variation. Evolutionary analysis revealed that the GC content of paired regions (pGC) was greater than that of unpaired regions (upGC); pGC showed a decreasing trend, whereas upGC remained unchanged. Single-base mutation was the main route of base-pair substitution. In both the initial and the equilibrium states, substitution rates toward GC pairs were higher than those toward AU pairs. The increase in GC content was partly attributed to GC-biased gene conversion (gBGC). The high GC content reflects the high recombination and mutation rates of ITS2, which underlie both species identification and genetic diversity. We characterized the sequence and structural features of S. suberectus ITS2 in detail, providing a reference for identifying S. suberectus and its products and for protecting and utilizing its wild resources.

Introduction

Spatholobus suberectus is distributed in Fujian Province and the Guangxi Zhuang Autonomous Region of China¹. It is a leguminous plant used in traditional Chinese medicine, and its dried stem is the medicinal part. Because a red juice exudes from the stem during harvesting, it is also known as “Ji Xue Teng” in China. Modern pharmacological and clinical research has shown that S. suberectus has anti-inflammatory¹, antioxidant², antiphotoaging³, antidiabetic⁴, and anticancer⁵ properties. In addition, S. suberectus has long been used in China as a nourishing food additive (in wine, soup and tea)². Owing to its importance in food and medicine, market demand for S. suberectus is high, but its supply is limited by the scarcity of wild resources and the long growth period (more than 7 years) required before it can be used medicinally⁶. Driven by profit, some traders adulterate S. suberectus with other vine plants, which seriously compromises its efficacy and safety in clinical use. The keys to solving this problem are the development of methods for identifying S. suberectus and its products and a shift in supply from wild resources to artificial cultivation.

Methods such as source (botanical origin), macroscopic (character) and microscopic identification and chemical composition analysis are often used to identify medicinal plants or processed products⁷,⁸,⁹. DNA barcoding is a molecular diagnostic technology that uses standard, sufficiently variable DNA fragments for species identification and delimitation¹⁰. DNA barcode fragments are short and easy to amplify, so even partially degraded DNA from samples that are not fresh (e.g., herbarium specimens or processed products) can be identified¹¹,¹². The approach rests on reference barcode libraries built from known taxa: query sequences are assigned to species by comparison with the library and by phylogenetic analysis¹³. Internationally recognized candidate plant DNA barcodes include plastid regions (matK, rbcL, ycf, psbA-trnH, etc.) and the nuclear internal transcribed spacer (ITS) region¹⁴. The Consortium for the Barcode of Life (CBOL) Plant Group proposed the combination of plastid and nuclear ITS regions as an effective barcoding tool for distinguishing plant species¹⁵. The China Plant BOL Group incorporated ITS (or ITS2) into the core barcode for seed plant identification and recommended psbA-trnH as an auxiliary barcode¹⁶. Chen et al.¹⁷ compared the amplification success rate, intra- and interspecific variation, and barcoding gap of multiple candidate sequences and reported that ITS2 performed best. They also evaluated ITS2 on 6,600 samples from 4,800 plant species and obtained a species identification success rate of 92.7%. On this basis, they proposed ITS2 as a universal DNA barcode for medicinal plants.
The “Pharmacopoeia of the People’s Republic of China” (2015 edition) includes guiding principles for DNA barcoding and establishes a Chinese herbal medicine identification system based on ITS2 (ITS) supplemented with psbA-trnH¹⁸,¹⁹.

Several DNA barcodes have been found to be useful for identifying S. suberectus. An et al. used 26S rDNA to distinguish S. suberectus, Callerya dielsiana, Derris taiwaniana, Mucuna sempervirens and Derris trifoliata using seven samples²⁰. Huang et al. used matK to distinguish S. suberectus, D. trifoliata, Entada phaseoloides, Callerya cinerea and Sargentodoxa cuneata using 8 samples²¹. Zhou et al. used psbA-trnH to distinguish S. suberectus, S. cuneata, Kadsura interior, Kadsura heteroclita, M. sempervirens, Mucuna birdwoodiana, C. dielsiana and Callerya tsui using 79 samples²². ITS2 is located between the 5.8S and 26S ribosomal RNA genes of eukaryotes and does not encode proteins²³. Because the ITS region is removed during rRNA maturation and is not incorporated into the ribosome, it is subject to weaker natural selection during evolution; it therefore tolerates more variation and shows extensive sequence polymorphism in most eukaryotes²⁴,²⁵. The ITS2 region has been used as a phylogenetic marker to identify many medicinal plants, closely related plants and a wide range of species¹⁰. Bupleurum L. (Apiaceae)²⁶, Uncaria²⁷, Aristolochia²⁸, Eryngium²⁹, Gnaphalium affine³⁰, Rheum officinale³¹ and other medicinal plants can all be identified to some extent with ITS2 barcodes. Here, we used ITS2 to distinguish S. suberectus from the source species of nearly all commonly confused products (17 species). This study addresses the lack of a broadly applicable barcode for identifying S. suberectus and thereby supports its safe medicinal use.

Highly variable ITS2 is used not only in species identification but also in genetic diversity analysis of species and varieties. Khazal et al. analysed the genetic diversity of Leishmania major through ITS2-based phylogenetic inference³². Delva et al. analysed the genetic diversity of Amylomyces rouxii through phylogenetic analysis, genetic distance, genetic variation and haplotype network construction on the basis of ITS1/ITS2 and D1/D2³³. Lin et al. used ITS2 and the mitochondrial cytochrome c oxidase subunit 1 gene (cox1) as genetic markers for genotyping, identified 17 different ITS2 haplotypes and determined the population genetic structure of Sargassum plagiophyllum C. Agardh³⁴. The secondary structure of the ITS2 transcript (a central loop connected to four helices, or “fingers”) is highly conserved³⁵,³⁶. Secondary structure prediction can not only supplement and verify sequence-level phylogenies but also help reveal genotypic variation within populations²⁹. Umdale et al. evaluated the species and genetic diversity of Asian Vigna through ITS2-based haplotype and secondary structure analysis³⁷. We therefore analysed the genetic diversity of wild S. suberectus using ITS2. Identifying genotypes will provide information and markers for screening superior varieties in the future and lay the foundation for artificial cultivation.

Guanine and cytosine (GC) content provides a material basis for species and genetic diversity. GC base pairs, which form three hydrogen bonds rather than the two of AT/AU pairs, also contribute to the structural stability of double-stranded DNA and RNA³⁸,³⁹. GC content and distribution may be constrained and driven by structure, thermodynamic stability and other factors⁴⁰,⁴¹, and they also reflect sequence selection and structural evolution⁴². A recent study revealed that the paired regions of angiosperm ITS2 have relatively high GC content and that GC-biased gene conversion (gBGC) is one of the main reasons for this⁴³. Here, we explored structure-related GC substitution trends and mechanisms in the evolution of S. suberectus ITS2, as well as the relationships between GC content, species differentiation and genetic diversity.

Materials and methods

Sample collection and specimen identification

Field sampling of S. suberectus and the source plants of easily confused products was conducted from May to June 2023. Fresh leaves were collected from a total of 56 samples: S. suberectus (39), M. sempervirens (6) and Craspedolobium unijugum (11). The leaves were immediately placed in sealed plastic bags containing enough silica gel to dry them and prevent DNA degradation. All plants were identified by Prof. Yunfeng Huang and Prof. Kejian Yan of the Guangxi Institute of Chinese Medicine & Pharmaceutical Science. Reference specimens of S. suberectus (Herbarium: 00327831), M. sempervirens (Herbarium: 02014496) and C. unijugum (Herbarium: 02028691) can be viewed in the Chinese Virtual Herbarium (https://www.cvh.ac.cn/index.php). The samples were obtained mainly from the Guangxi Zhuang Autonomous Region and Yunnan Province, China. Individuals sampled from the same population were more than 50 m apart. Sample information is shown in Table S1.

DNA extraction, amplification, and sequencing

Genomic DNA was extracted from approximately 15 mg of silica gel-dried leaves of each sample with the Fast Pure Plant DNA Isolation Mini Kit (Vazyme, Nanjing, China). DNA quality and concentration were determined with a NanoDrop 1000 spectrophotometer, and each DNA solution was diluted or concentrated to approximately 50 ng/µL for PCR amplification. PCRs were performed in a volume of 25 µL containing 50 ng of template DNA (1 µL), 12.5 µL of 2× Taq PCR master mix (Vazyme, Nanjing, China), 2 µL in total of the 10 µmol/L forward and reverse primers, and 9.5 µL of ddH₂O. The primers were ITS2-2F “ATGCGATACTTGGTGTGAAT” and ITS2-3R “GACGCTTCTCCAGACTACAAT”¹⁰. The thermal cycling program was 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 58 °C for 45 s and 72 °C for 45 s; and a final extension at 72 °C for 10 min. All PCR products were checked by agarose gel electrophoresis, and the gels were photographed on a UV transilluminator. The products were purified with a Fast Pure Gel DNA Extraction Mini Kit (Vazyme, Nanjing, China) and sequenced on an ABI 3130xl automatic sequencer (Applied Biosystems, Foster City, California, USA).
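The reaction recipe and cycling program above can be sanity-checked with a little arithmetic. The sketch below is illustrative and not part of the authors' protocol: it assumes the 2 µL of primers is split as 1 µL of each 10 µmol/L primer (giving 0.4 µmol/L of each in the final 25 µL), and the 10% pipetting overage and all function names are hypothetical.

```python
"""Sketch: check and scale the 25 µL ITS2 PCR recipe described above."""

# Per-reaction volumes in µL. ASSUMPTION: "2 µL of forward and reverse primers"
# is read as 1 µL of each 10 µmol/L primer.
PER_REACTION_UL = {
    "template DNA (~50 ng/µL)": 1.0,
    "2x Taq PCR master mix": 12.5,
    "ITS2-2F (10 µmol/L)": 1.0,
    "ITS2-3R (10 µmol/L)": 1.0,
    "ddH2O": 9.5,
}
REACTION_VOLUME_UL = 25.0


def final_conc(stock: float, added_ul: float, total_ul: float = REACTION_VOLUME_UL) -> float:
    """Concentration after dilution into the reaction (C1*V1 = C2*V2), in stock units."""
    return stock * added_ul / total_ul


def scale_mix(n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Shared-mix volumes for n reactions plus a hypothetical pipetting overage.

    Template DNA is left out because it is pipetted into each tube separately.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if not name.startswith("template")
    }


def program_minutes(cycles: int = 30) -> float:
    """Total hold time of the cycling program in minutes (ramp times ignored)."""
    initial_s = 5 * 60                # 94 °C, 5 min
    per_cycle_s = 30 + 45 + 45        # 94 °C 30 s, 58 °C 45 s, 72 °C 45 s
    final_s = 10 * 60                 # 72 °C, 10 min
    return (initial_s + cycles * per_cycle_s + final_s) / 60


if __name__ == "__main__":
    total = sum(PER_REACTION_UL.values())
    print(f"per-reaction total: {total} µL")                      # 25.0 µL
    print(f"final primer conc: {final_conc(10, 1)} µmol/L each")  # 0.4
    print(f"hold time: {program_minutes()} min")                  # 75.0
    for name, volume in scale_mix(56).items():                   # 56 samples
        print(f"{name}: {volume} µL")
```

Template DNA is excluded from the shared mix on the assumption that each sample's DNA is added to its own tube; the cycling time ignores ramping between temperatures, so real run times will be somewhat longer.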

Sequence assembly, feature comparison and genetic analysis

Sequencing chromatograms were assembled and proofread in CodonCode Aligner 8.0.2. Primer sequences and low-quality regions were trimmed according to the annotation file to obtain complete sequences. The HMMer annotation method, which is based on hidden Markov models, was used to remove the flanking 5.8S and 28S sequences and obtain accurate ITS2 spacer sequences⁴⁴. All sequences were searched with BLAST against the National Center for Biotechnology Information (NCBI) database (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome, accessed on 28 September 2023), and the source species of each sequence was determined from the hit with the highest similarity score and the lowest E value⁴⁵. The ITS2 sequences of S. suberectus, M. sempervirens and C. unijugum were submitted to GenBank (https://www.ncbi.nlm.nih.gov/genbank/, accession numbers PP465924–PP465979, accessed on 12 March 2024). Two hundred and thirty-six (236) ITS2 sequences from 15 adulteration-prone species were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/genbank/, accessed on 28 September 2023) and analysed together. Reference specimens of C. dielsiana (Herbarium: 02015435), S. cuneata (Herbarium: 02108794), M. birdwoodiana (Herbarium: 02098429), C. cinerea (Herbarium: 01924183), K. interior (Herbarium: 02015552), Kadsura heteroclita (Herbarium: 02231169), D. trifoliata (Herbarium: 01924020), E. phaseoloides (Herbarium: 02074730), Mucuna macrocarpa (Herbarium: 01781725), Padbruggea filipes (Herbarium: 1289072), Bauhinia championii (Herbarium: 01965870), Callerya nitida (Herbarium: 02036623), Schisandra propinqua (Herbarium: 02231152) and Schisandra henryi (Herbarium: 02231143) were retrieved from the Chinese Virtual Herbarium database (https://www.cvh.ac.cn/index.php).
The reference specimen of Wisteriopsis reticulata (ID: 88791) was retrieved from the Chinese Field Herbarium database (https://www.cfh.ac.cn/album/ShowSpAlbum.aspx?spid=88791). Sequences were aligned with MAFFT⁴⁶, and the alignment was trimmed with trimAl⁴⁷. ModelTest-NG was used to select the optimal evolutionary model for the ITS2 sequences⁴⁸. On the basis of the corrected Akaike information criterion (AICc), the JC model was selected, and a phylogenetic tree was constructed with RAxML-NG⁴⁹. Branch support was assessed with 1000 bootstrap replicates⁵⁰. The R package ggtree was used to visualize the trees⁵¹.
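To make the model-selection criterion concrete, the sketch below computes AICc = AIC + 2k(k + 1)/(n − k − 1), with AIC = 2k − 2 ln L, and ranks candidate models by it. It is not ModelTest-NG's implementation: how k is counted (for example, whether branch lengths are included) and which sample size n is used depend on the tool's settings, and the log-likelihoods in the usage example are made-up numbers, not values from this study.

```python
"""Sketch: rank substitution models by the corrected Akaike information criterion."""
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str
    log_likelihood: float  # maximized ln L of the model on the alignment
    k: int                 # free parameters, counted as the selection tool does


def aicc(fit: ModelFit, n: int) -> float:
    """AICc = AIC + 2k(k+1)/(n-k-1), where AIC = 2k - 2 ln L and n is the sample size."""
    if n - fit.k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * fit.k - 2 * fit.log_likelihood
    return aic + 2 * fit.k * (fit.k + 1) / (n - fit.k - 1)


def rank_models(fits: list[ModelFit], n: int) -> list[tuple[str, float, float]]:
    """Return (name, AICc, delta AICc) tuples sorted from best (lowest AICc) to worst."""
    scored = sorted((aicc(fit, n), fit.name) for fit in fits)
    best = scored[0][0]
    return [(name, score, score - best) for score, name in scored]


if __name__ == "__main__":
    # Hypothetical fits on a hypothetical 250-site alignment.
    candidates = [
        ModelFit("JC", -1000.0, 1),
        ModelFit("HKY", -998.5, 5),
        ModelFit("GTR", -997.0, 9),
    ]
    for name, score, delta in rank_models(candidates, n=250):
        print(f"{name:4s} AICc={score:.2f} dAICc={delta:.2f}")
```

With these invented numbers the simplest model wins, because the small likelihood gains of the richer models do not offset their parameter penalties; this is the kind of trade-off under which a simple model such as JC can be preferred.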

The R package DECIPHER was used to calculate intraspecific and interspecific genetic distances⁵², and the results were visualized as box plots with the R package ggplot2⁵³. DnaSP v6.0 was used to estimate Ks values between the ITS2 sequences of S. suberectus and those of other plants, treating ITS2 as a noncoding sequence⁵⁴; the results are presented as density plots made with ggplot2⁵³. The R package pegas was used for haplotype analysis, and the results were visualized with the base plotting functions of R⁵⁵. RNAfold was used to predict the secondary structure of S. suberectus ITS2⁵⁶, and LocARNA was used to obtain consensus secondary structures and secondary structure-based phylogenetic trees⁵⁷.
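A minimal sketch of the distance comparison follows: uncorrected p-distances between aligned sequences (optionally with the Jukes–Cantor correction, matching the JC model chosen above), split into intraspecific pairs of a focal species and interspecific pairs involving it. This is not DECIPHER's code; in particular, the gap handling here (columns with a gap in either sequence are skipped, other characters are compared literally) is a simplifying assumption, and the function names are hypothetical.

```python
"""Sketch: pairwise genetic distances split into intra- and interspecific sets."""
import math
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jc69(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 ln(1 - 4p/3); infinite when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def split_distances(records: dict[str, tuple[str, str]], focal: str, corrected: bool = False):
    """Return (intra, inter) distance lists for the focal species.

    records maps a sequence ID to (species name, aligned sequence).
    intra: pairs where both sequences belong to `focal`;
    inter: pairs where exactly one sequence belongs to `focal`.
    """
    def dist(a: str, b: str) -> float:
        p = p_distance(a, b)
        return jc69(p) if corrected else p

    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records.values(), 2):
        if sp1 == focal and sp2 == focal:
            intra.append(dist(s1, s2))
        elif (sp1 == focal) != (sp2 == focal):
            inter.append(dist(s1, s2))
    return intra, inter


def summarize(values: list[float]) -> tuple[float, float, float]:
    """(minimum, mean, maximum) of a list of distances."""
    return min(values), mean(values), max(values)


if __name__ == "__main__":
    toy = {  # hypothetical aligned fragments
        "ss1": ("S. suberectus", "ACGTACGTAC"),
        "ss2": ("S. suberectus", "ACGTACGTAT"),
        "ms1": ("M. sempervirens", "ACGAACCTGT"),
    }
    intra, inter = split_distances(toy, "S. suberectus")
    print(summarize(intra), summarize(inter))
```

The (min, mean, max) summaries correspond to the ranges and averages reported in the Results; the same two lists are what a box plot of intra- versus interspecific distances would be drawn from.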

In the studies of Xian et al. and Liu et al., a mixed DNA/RNA substitution model (a DNA model for unpaired sites and an RNA base-pair model for paired sites) was used to describe the substitution patterns of the unpaired and paired regions of ITS2⁴³,⁵⁸. We used the model selection script (model_selection.pl) in PHASE 3.0 to select the best substitution model on the basis of AICc⁵⁹. The ITS2 phylogeny was inferred from the sequence alignment, the consensus secondary structure and NJ trees. MCMC analysis was run for 10,000,000 generations to reach convergence, with sampling every 100 generations; the first 30,000 sampled trees (30%) were discarded as burn-in. The remaining trees were used to infer substitution rates in the initial and equilibrium states with the mcmcsummarize program of the PHASE package.
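The sampling arithmetic above can be checked directly: 10,000,000 generations sampled every 100 generations gives 100,000 samples, and discarding the first 30% leaves 70,000 for summarizing rates. The sketch assumes one sample per interval with no extra sample at generation 0 (samplers differ on this) and uses hypothetical helpers; it does not read PHASE output files.

```python
"""Sketch: MCMC sample counts and burn-in removal."""
from statistics import mean


def sample_counts(generations: int, sample_every: int, burnin_fraction: float) -> tuple[int, int, int]:
    """Return (total samples, burn-in samples, retained samples)."""
    total = generations // sample_every
    burnin = round(total * burnin_fraction)
    return total, burnin, total - burnin


def discard_burnin(trace: list[float], burnin_fraction: float) -> list[float]:
    """Drop the first `burnin_fraction` of a parameter trace, keeping the rest."""
    return trace[round(len(trace) * burnin_fraction):]


if __name__ == "__main__":
    print(sample_counts(10_000_000, 100, 0.30))  # (100000, 30000, 70000)
    # Hypothetical trace of one substitution-rate parameter.
    trace = [5.0, 3.0, 2.0, 1.1, 0.9, 1.0, 1.0, 0.95, 1.05, 1.0]
    print(mean(discard_burnin(trace, 0.30)))
```

Rounding rather than truncating the burn-in count avoids off-by-one errors from floating-point products such as 100,000 × 0.3.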

The equilibrium GC content (GC*) was calculated following the method of Xian et al. and Liu et al.⁴³,⁵⁸. At convergence, the GC content of a sequence evolving under an unchanging (equilibrium) substitution pattern equals the AT→GC substitution rate as a percentage of the sum of the AT→GC and GC→AT substitution rates⁶⁰, i.e., GC* = r(AT→GC) / [r(AT→GC) + r(GC→AT)] × 100%.
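The GC* formula is the stationary point of a simple two-state model in which A/T sites become G/C at rate u (AT→GC) and G/C sites become A/T at rate v (GC→AT): the GC fraction changes as d(GC)/dt = u(1 − GC) − v·GC, which is zero when GC = u/(u + v). The sketch below computes GC* and iterates that equation to show a sequence relaxing toward it; the rates are hypothetical, and this is a didactic model, not the PHASE estimation procedure.

```python
"""Sketch: equilibrium GC content (GC*) under a two-state substitution model."""


def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """GC* (%) = r(AT->GC) / [r(AT->GC) + r(GC->AT)] * 100."""
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("at least one substitution rate must be positive")
    return 100.0 * rate_at_to_gc / total


def project_gc(gc_percent: float, rate_at_to_gc: float, rate_gc_to_at: float,
               steps: int, dt: float = 0.1) -> list[float]:
    """Euler-integrate d(GC)/dt = u(1 - GC) - v*GC from a starting GC percentage.

    Returns the GC percentage after each step (starting value included).
    """
    gc = gc_percent / 100.0
    trajectory = [gc_percent]
    for _ in range(steps):
        gc += dt * (rate_at_to_gc * (1.0 - gc) - rate_gc_to_at * gc)
        trajectory.append(100.0 * gc)
    return trajectory


if __name__ == "__main__":
    u, v = 0.3, 0.1                     # hypothetical AT->GC and GC->AT rates
    target = equilibrium_gc(u, v)       # 75% under these made-up rates
    path = project_gc(80.0, u, v, steps=200, dt=0.5)
    print(f"GC* = {target:.2f}%, start = {path[0]:.2f}%, after 200 steps = {path[-1]:.2f}%")
```

When the current GC content lies above GC*, as reported for the paired regions, the model predicts a decline toward GC*; when the two are close, as for the unpaired regions, little change is expected.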

Results

Species identification, sequence characterization and phylogenetic inference

BLAST searches of the ITS2 sequences from the 56 samples confirmed the morphological identification of S. suberectus (39), M. sempervirens (6) and C. unijugum (11), with percent identity above 96% in all cases. The average percent identity of S. suberectus ITS2 was 99.83%. A total of 236 sequences from adulteration-prone species in GenBank were analysed together with these sequences. S. suberectus had the shortest sequences (201–202 bp) and the highest GC content (69.31–71.29%). S. cuneata had the longest sequences (228–257 bp), and B. championii had the lowest GC content (52.75–54.13%) (Table S1).
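The species-assignment rule described in the Methods (keep the hit with the highest score and lowest E value) and the identity check reported here can be expressed compactly. The sketch below assumes BLAST+ tabular output produced with -outfmt 6 and its default twelve columns; the 96% identity threshold is taken from the text, while the function names and example rows are hypothetical.

```python
"""Sketch: best BLAST hit per query from tabular (-outfmt 6) output."""
import csv
import io
from dataclasses import dataclass

# Default column order of BLAST+ -outfmt 6.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(handle) -> list[Hit]:
    """Read tab-separated BLAST rows, skipping blank and comment lines."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def best_hits(hits: list[Hit]) -> dict[str, Hit]:
    """Highest bit score per query; ties are broken by the lower E-value."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or (hit.bitscore, -hit.evalue) > (current.bitscore, -current.evalue):
            best[hit.query] = hit
    return best


def below_identity(best: dict[str, Hit], min_pident: float = 96.0) -> list[str]:
    """Queries whose best hit falls below the identity threshold (96% in this study)."""
    return sorted(q for q, h in best.items() if h.pident < min_pident)


if __name__ == "__main__":
    # Hypothetical rows; real input would come from `blastn ... -outfmt 6`.
    rows = io.StringIO(
        "SS001\tsubject_A\t99.50\t202\t1\t0\t1\t202\t1\t202\t1e-100\t370\n"
        "SS001\tsubject_B\t92.10\t200\t16\t0\t1\t200\t5\t204\t3e-80\t300\n"
    )
    top = best_hits(parse_outfmt6(rows))
    print(top["SS001"].subject, below_identity(top))  # subject_A []
```

Ranking by bit score first and E-value second mirrors the "highest score, lowest E value" rule; for hits against the same database these two measures usually agree.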

At the sequence level, a maximum likelihood (ML) phylogenetic tree was constructed to distinguish the species. S. suberectus formed a clade clearly separated from the 17 easily confused species, including M. sempervirens and C. unijugum, indicating that ITS2 is useful for identifying S. suberectus (Fig. 1).

Genetic differentiation of S. suberectus and the source plants of adulterated products

Genetic distances measure the extent of genetic difference between sequences or species. The intraspecific genetic distance of S. suberectus ranged from 0 to 0.244, with a mean of 0.149 (Fig. 2). The mean interspecific distance was smallest between S. suberectus and S. propinqua (0.648; range 0.609–0.741) and greatest between S. suberectus and P. filipes (0.746; range 0.721–0.756). All intraspecific distances within S. suberectus were smaller than the interspecific distances between S. suberectus and other species, indicating clear genetic differentiation.

Ks can be used to compare duplication events and evolutionary rates within and between species. Ks was smallest within S. suberectus (0.002) and largest between S. suberectus and K. heteroclita (0.352) (Fig. 3), indicating that the ITS2 sequences of S. suberectus and K. heteroclita diverged earliest. The Ks distribution of S. suberectus ITS2 showed two widely separated peaks, suggesting that ITS2 underwent at least two large-scale duplications in S. suberectus and that its differentiation rate was slow.

Intraspecific variation in S. suberectus ITS2 sequences

We used a haplotype network to analyse genetic variation in the ITS2 sequences of S. suberectus. The 39 ITS2 sequences were divided into 8 haplotypes (H1–H8). The main haplotypes were H2 and H6, which contained 11 and 20 sequences, respectively (Fig. 4). Few mutations separated H2 from H1, H3, H4 and H5 (5, 5, 4 and 1, respectively) or H8 from H6 and H7 (1 and 2, respectively), whereas 21 mutations separated H2 from H7, indicating that ITS2 evolved in two different directions, towards H2 and towards H6. In addition, the only member of H7 was SS030, whose base at position 183 was ambiguous (IUPAC code Y, i.e., C or T). H7 differed from H6 only at this position, suggesting that the base at position 183 of SS030 may be T and that H6 and H7 could be classified as the same haplotype.
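The haplotype grouping and the ambiguity reasoning for SS030 can be illustrated with a small sketch: identical aligned sequences are collapsed into haplotypes, differences are counted only where IUPAC codes cannot overlap, and haplotype pairs that differ solely at ambiguous sites are reported as compatible. This is not the pegas implementation; labels are assigned in order of first appearance, and the toy sequences are hypothetical. An ambiguous base such as Y (C or T) can be compatible with more than one haplotype, so merging still requires a judgment like the one made above.

```python
"""Sketch: collapse aligned sequences into haplotypes, treating IUPAC ambiguity codes."""
from itertools import combinations

# IUPAC nucleotide codes -> possible bases ("-" is a gap).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "-": "-",
}


def n_differences(a: str, b: str) -> int:
    """Sites where the two codes share no possible base (ambiguity-aware mismatches)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(not set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a.upper(), b.upper()))


def haplotypes(seqs: dict[str, str]) -> list[tuple[str, str, list[str]]]:
    """Group identical sequences; return (label, sequence, member IDs) in first-seen order."""
    groups: dict[str, list[str]] = {}
    for seq_id, seq in seqs.items():
        groups.setdefault(seq.upper(), []).append(seq_id)
    return [(f"H{i}", seq, ids) for i, (seq, ids) in enumerate(groups.items(), start=1)]


def compatible_pairs(haps: list[tuple[str, str, list[str]]]) -> list[tuple[str, str]]:
    """Haplotype pairs that differ only at sites explained by ambiguity codes."""
    return [(h1[0], h2[0]) for h1, h2 in combinations(haps, 2)
            if n_differences(h1[1], h2[1]) == 0]


if __name__ == "__main__":
    toy = {"s1": "ACGTA", "s2": "ACGTA", "s3": "ACGCA", "s4": "ACGYA"}  # hypothetical
    haps = haplotypes(toy)
    for label, seq, members in haps:
        print(label, seq, members)
    print(compatible_pairs(haps))
```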

Prediction and phylogenetic inference of the secondary structure of S. suberectus ITS2

A phylogenetic tree was constructed on the basis of the ITS2 sequences and secondary structures of S. suberectus, dividing the 39 sequences into 4 clades (Fig. 5). Members of the same clade had more similar sequences and secondary structures (Figures S1 and S2). The ITS2 structures of clades I, II and III all had the classic four-helix architecture radiating from a single central loop, whereas in clade IV the four helices branched from a free single strand. In all clades, helix IV contained the most loops. Helix IV of clade I contained 3 bulges, 3 internal loops and a hairpin loop, whereas that of clade II contained 2 bulges, 4 internal loops and a hairpin loop; this was the most important structural difference between clades I and II. Helix IV of clade III contained 3 bulges, 2 internal loops and a hairpin loop, whereas that of clade IV contained 4 bulges, 2 internal loops and a hairpin loop. In addition to the central loop, clade III had another multibranch loop in helix I, which distinguished it from the other clades.
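The loop counts above follow the standard decomposition of a secondary structure into hairpins, bulges, internal loops and multibranch loops. A minimal sketch for structures in dot-bracket notation (the format RNAfold prints) is shown below; it is a hypothetical helper, not part of RNAfold or LocARNA, and it does not count the exterior loop. Under this decomposition a closed central loop counts as one multibranch loop, whereas helices that branch directly from the exterior loop (as in the open arrangement described for clade IV, on our reading) add none.

```python
"""Sketch: count hairpins, bulges, internal loops and multiloops in a dot-bracket structure."""


def pair_table(structure: str) -> list[int]:
    """Return the partner index of each position, or -1 if unpaired."""
    partner = [-1] * len(structure)
    stack = []
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError("unmatched '('")
    return partner


def classify_loops(structure: str) -> dict[str, int]:
    """Classify the loop closed by each base pair (the exterior loop is not counted)."""
    partner = pair_table(structure)
    counts = {"hairpin": 0, "bulge": 0, "internal": 0, "multiloop": 0}
    for i, j in enumerate(partner):
        if j <= i:                      # unpaired, or already seen from its 5' side
            continue
        inner = []                      # pairs directly enclosed by (i, j)
        k = i + 1
        while k < j:
            if partner[k] > k:
                inner.append((k, partner[k]))
                k = partner[k] + 1      # jump over the enclosed helix
            else:
                k += 1
        if not inner:
            counts["hairpin"] += 1
        elif len(inner) == 1:
            inner_i, inner_j = inner[0]
            left, right = inner_i - i - 1, j - inner_j - 1
            if left == 0 and right == 0:
                continue                # stacked pair: no loop
            counts["bulge" if left == 0 or right == 0 else "internal"] += 1
        else:
            counts["multiloop"] += 1
    return counts


if __name__ == "__main__":
    print(classify_loops("((..((...))))"))       # one bulge, one hairpin
    print(classify_loops("(((...)).((...)))"))   # closed branch point: one multiloop
    print(classify_loops("((...))((...))"))      # helices off the exterior loop
```

To count loops within a single helix (for example, helix IV), pass the substring spanning that helix; a complete helix is itself a balanced dot-bracket string.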

Structure-based GC heterogeneity and mutation direction of S. suberectus ITS2

Liu et al. defined the equilibrium GC content (GC*) as the GC content that a sequence would reach at a future equilibrium if its current substitution pattern remained unchanged over time⁴³. GC* therefore provides clues about the direction in which GC content is evolving. We compared the GC content of paired and unpaired regions (pGC and upGC) with the corresponding equilibrium values (pGC* and upGC*) (Fig. 6). pGC (75.85 ± 0.49%) was significantly greater than upGC (58.12 ± 0.87%). pGC* (70.4%) was lower than the current pGC, indicating a downwards trend in the GC content of paired regions. In contrast, upGC* (58.28%) was similar to the current upGC, indicating that paired and unpaired regions are following different evolutionary trends.
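The paired/unpaired split behind pGC and upGC can be computed directly from each sequence and its predicted dot-bracket structure. The sketch below is a hypothetical helper: it treats every bracketed position as paired and every dot as unpaired, reports the mean and sample standard deviation across sequences (the dispersion measure behind the values above is not stated, so this is an assumption), and uses toy inputs.

```python
"""Sketch: GC content of paired (pGC) and unpaired (upGC) positions."""
from statistics import mean, stdev


def gc_percent(bases: list[str]) -> float:
    """Percentage of G or C among the given bases."""
    if not bases:
        raise ValueError("no bases to count")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def paired_unpaired_gc(seq: str, structure: str) -> tuple[float, float]:
    """Return (pGC, upGC) for one sequence and its dot-bracket structure."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    paired = [b for b, s in zip(seq.upper(), structure) if s in "()"]
    unpaired = [b for b, s in zip(seq.upper(), structure) if s == "."]
    return gc_percent(paired), gc_percent(unpaired)


def summarize(pairs: list[tuple[str, str]]) -> dict[str, tuple[float, float]]:
    """Mean and sample SD of pGC and upGC over two or more (sequence, structure) pairs."""
    values = [paired_unpaired_gc(seq, st) for seq, st in pairs]
    p_vals = [p for p, _ in values]
    u_vals = [u for _, u in values]
    return {"pGC": (mean(p_vals), stdev(p_vals)), "upGC": (mean(u_vals), stdev(u_vals))}


if __name__ == "__main__":
    toy = [("GGGAAACCC", "(((...)))"), ("GCGUAAGCGC", "((((..))))")]  # hypothetical
    print(summarize(toy))
```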

We used the best substitution model, HKY85 + G_RNA16A, selected by the lowest AICc value, to infer the base-pair substitution process. We found a total of 8 double-base substitutions, including substitutions between canonical pairs (such as AU→GC) and substitutions involving non-canonical pairs (Fig. 7A and B). We also identified 12 possible single-base substitution events: 8 from heteronucleotide non-canonical pairs (such as GU→GC) and 4 from homonucleotide mismatches (such as GG→GC) (Fig. 7C and D). In both the initial and the equilibrium states, substitution rates toward GC pairs were always higher than those toward AU pairs (Fig. 7). Base-pair substitutions mainly generated canonical pairs (AU and GC) through single-base changes. Substitution rates in the equilibrium state were higher than those in the initial state (Fig. 7C and D).
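The distinction between single- and double-base substitutions in a 16-state base-pair model can be made explicit by enumerating the states. The sketch below is a hypothetical helper that only classifies substitutions (it estimates no rates and is not PHASE code): with 16 ordered pair states there are 240 possible substitutions, 96 changing one base and 144 changing both. It also shows that AU→GC requires two changes, whereas GU→GC and GG→GC each require one.

```python
"""Sketch: classify substitutions between the 16 ordered base-pair states (5' base, 3' base)."""
from itertools import product

BASES = "ACGU"
CANONICAL = {"AU", "UA", "GC", "CG"}   # Watson-Crick pairs
WOBBLE = {"GU", "UG"}


def pair_class(pair: str) -> str:
    """Return 'canonical', 'wobble' or 'mismatch'."""
    if pair in CANONICAL:
        return "canonical"
    return "wobble" if pair in WOBBLE else "mismatch"


def substitution_kind(src: str, dst: str) -> str:
    """'single' or 'double' depending on how many of the two bases change."""
    changed = sum(a != b for a, b in zip(src, dst))
    if changed == 0:
        raise ValueError("source and destination states are identical")
    return "single" if changed == 1 else "double"


def all_states() -> list[str]:
    """The 16 ordered base-pair states."""
    return ["".join(p) for p in product(BASES, repeat=2)]


def single_step_sources(target: str) -> list[str]:
    """States that reach `target` by changing exactly one base."""
    return sorted(s for s in all_states()
                  if s != target and substitution_kind(s, target) == "single")


if __name__ == "__main__":
    states = all_states()
    kinds = [substitution_kind(s, d) for s in states for d in states if s != d]
    print(len(kinds), kinds.count("single"), kinds.count("double"))   # 240 96 144
    print(substitution_kind("AU", "GC"), substitution_kind("GU", "GC"))  # double single
    print(single_step_sources("GC"))
```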

Discussion

ITS2 is a DNA barcode that can be used to effectively identify S. suberectus

The ITS region is one of the most widely used DNA barcodes. Its noncoding spacers (ITS1 and ITS2) evolve faster than the flanking rRNA coding regions, are highly differentiated at the species level and can be used to identify closely related species⁶¹. The discriminatory power of ITS exceeds that of plastid regions¹⁵,⁶²,⁶³,⁶⁴. Amplification and sequencing success rates are the basis for barcoding applications⁶⁵. Kress et al. proposed that short DNA sequences are easier and more economical to amplify and sequence⁶⁶, and Cahyaningsih et al. reported that GC content is positively correlated with sequencing accuracy⁶⁷. The ITS2 sequences used in this study meet these conditions (approximately 220 bp long, with an average GC content of approximately 61.74%) (Table S1).

Meier et al. proposed that species can be effectively distinguished when the minimum interspecific genetic distance is greater than the maximum intraspecific genetic distance⁶⁸. Ks values are positively correlated with the degree of differentiation⁶⁹. Together with our results (Figs. 2 and 3), these criteria suggest that ITS2 is suitable for identifying S. suberectus. In addition, compared with previous studies identifying S. suberectus with 26S rDNA (7 samples, 4 species)²⁰, matK (8 samples, 5 species)²¹ and psbA-trnH (79 samples, 8 species)²², our study covered the source species of nearly all commonly confused products (292 sequences, 17 species), providing more comprehensive and practically valuable results.
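Meier et al.'s criterion reduces to comparing two extremes. The sketch below checks it from lists of distances; in the usage line we plug in the extreme values reported in the Results (maximum intraspecific distance 0.244, minimum interspecific distance 0.609, maximum interspecific distance 0.756) rather than the underlying distance matrix, which is not reproduced here. The function name is hypothetical.

```python
"""Sketch: barcoding-gap check (minimum interspecific > maximum intraspecific distance)."""


def barcode_gap(intra: list[float], inter: list[float]) -> tuple[float, bool]:
    """Return (gap, separated) where gap = min(inter) - max(intra)."""
    if not intra or not inter:
        raise ValueError("need at least one intra- and one interspecific distance")
    gap = min(inter) - max(intra)
    return gap, gap > 0


if __name__ == "__main__":
    # Extremes reported for S. suberectus in the Results (Fig. 2).
    gap, separated = barcode_gap([0.0, 0.244], [0.609, 0.756])
    print(f"gap = {gap:.3f}, separated = {separated}")  # gap = 0.365, separated = True
```

Because only the extremes matter, the check is sensitive to single outlying sequences; this is why Meier et al. argued against using mean distances, which can make the gap look larger than it is.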

Genetic variation of S. suberectus ITS2

ITS2 is variable enough to serve as an essential marker for the classification and genetic diversity analysis of animals, plants and microorganisms³⁷,⁷⁰. Ding et al. used ITS2 sequences to evaluate genetic differences in Artemisia annua⁷¹. Lin et al. reported a total of 17 different ITS2 haplotypes in S. plagiophyllum from the west coast of Thailand³⁴. Our study revealed 8 ITS2 haplotypes in S. suberectus, with two main haplotypes (H2 and H6) (Fig. 4). We speculate that this limited variation results from homogenization caused by natural selection. Preliminary analysis of ITS2 haplotypes provides a basis for distinguishing the molecular characteristics of individuals within a species⁷². In the future, joint analysis of medicinal constituent content, haplotypes and copy numbers could support the development of screening methods for high-quality S. suberectus.

ITS2 secondary structure is highly relevant to species taxonomy⁷³. Changes below the species level are difficult to detect from ITS2 sequences alone, but comparisons of secondary structures can compensate for this shortcoming²⁹. ITS2 secondary structures often differ among genotypes, and they have been used as genotype markers for Eryngium foetidum²⁹ and Colocasia esculenta⁷⁴. We predicted the secondary structure of S. suberectus ITS2 (Figures S1 and S2) and, on this basis, constructed a phylogenetic tree and drew a consensus secondary structure (Fig. 5). These results could be used to develop variety markers for cultivation and for selection from wild resources, promoting the protection and utilization of wild resources in the future.

A high GC content is the basis for the successful identification and analysis of the genetic diversity of S. suberectus via ITS2

During meiosis, recombination produces heteroduplex DNA that can contain base mismatches⁷⁵. The gBGC hypothesis holds that such mismatches are preferentially repaired to G/C rather than A/T⁷⁶. ITS2 belongs to the nuclear ribosomal DNA (nrDNA), a region with a high local recombination rate, and in a wide range of organisms its evolution is shaped by recombination⁶¹. Elevated GC content in rapidly recombining regions is considered characteristic of gBGC⁷⁷, and gBGC is considered one of the reasons for the increased GC content of ITS2 in angiosperms, including the genus Corydalis⁴³,⁵⁸. The bias of base-pair substitutions toward GC in the paired regions of S. suberectus ITS2 is consistent with these characteristics (Fig. 7). However, because the current GC content is higher than the equilibrium GC content (Fig. 6), we speculate that gBGC is not the only force maintaining the high GC content of S. suberectus ITS2. The synthesis of GC requires more biochemical resources than the synthesis of AT⁷⁸; that such a costly composition is maintained suggests that the current high GC content of the paired regions may be driven by structural selection for the thermodynamic stability of ITS2⁷⁹.

High GC content is an intuitive reflection of high recombination and mutation rates resulting from frequent meiosis⁸⁰,⁸¹. gBGC can keep mutations within a certain range and produce more homologous genes⁸²,⁸³. We speculate that S. suberectus may have a relatively high level of meiosis, resulting in relatively high recombination in ITS2; this would make S. suberectus easy to distinguish from other species by ITS2 while also generating many ITS2 types within S. suberectus.

Conclusion

In this study, phylogenetic trees were constructed and genetic distances and Ks values were calculated from the ITS2 sequences of S. suberectus and 17 other species. S. suberectus ITS2 formed a separate branch in the phylogenetic tree, and the genetic distances and Ks values within S. suberectus were smaller than those between S. suberectus and other species. These results support the potential of using ITS2 to identify S. suberectus.

We analysed the genetic diversity of S. suberectus on the basis of ITS2. S. suberectus ITS2 had 8 haplotypes, of which H2 and H6 were predominant, and the secondary structure-based phylogenetic tree revealed 4 branches. These results provide information for classifying the intraspecific diversity of S. suberectus.

gBGC is one of the reasons for the high GC content of S. suberectus ITS2. The high rates of recombination and mutation in ITS2, resulting from frequent meiosis, underlie both the distinction between S. suberectus and other species and the high polymorphism within S. suberectus.

Data availability

Availability of data and materials: All data generated or analysed in this study are included in this published article and its Supplementary Material. The ITS2 sequences of S. suberectus, M. sempervirens and C. unijugum were submitted to the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/, accession numbers PP465924–PP465979).

References

Liu, X. Y. et al. Anti-inflammatory activity of some characteristic constituents from the vine stems of Spatholobus suberectus. Molecules. 24 (20), 3750 (2019).
Li, W. et al. Chemical characterization of procyanidins from Spatholobus suberectus and their antioxidative and anticancer activities. J. Funct. Foods. 12, 468–477 (2015).
Kwon, K. R. et al. Attenuation of UVB-induced photo-aging by polyphenolic-rich Spatholobus suberectus stem extract via modulation of MAPK/AP-1/MMPs signaling in human keratinocytes. Nutrients. 11 (6), 1341 (2019).
Zhao, P. et al. Spatholobus suberectus exhibits antidiabetic activity in vitro and in vivo through activation of AKT-AMPK pathway. Evid. Based Complement. Alternat Med. 18, 6091923 (2017).
Zhang, F. et al. A review of the pharmacological potential of Spatholobus suberectus Dunn on cancer. Cells. 11 (18), 2885 (2022).
Qin, S. S. et al. Comparative genomics of Spatholobus suberectus and insight into flavonoid biosynthesis. Front. Plant. Sci. 4 (11), 528108 (2020).
Cheng, Y. Y. et al. Analysis of Sheng-Mai-San, a ginseng-containing multiple components traditional Chinese herbal medicine using liquid chromatography tandem mass spectrometry and physical examination by electron and light microscopies. Molecules. 21 (9), 1159 (2016).
Sun, J. X. et al. Precise identification of Celosia argentea seed and its five adulterants by multiple morphological and chemical means. J. Pharm. Biomed. Anal. 216 (15), 114802 (2022).
Li, X. X. et al. Comprehensive identification of Vitex trifolia fruit and its five adulterants by comparison of micromorphological, microscopic characteristics, and chemical profiles. Microsc Res. Tech. 83 (12), 1530–1543 (2022).
Chen, S. L. et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One. 5 (1), e8613 (2010).
Sokołowska, J. et al. Assessment of ITS2 region relevance for taxa discrimination and phylogenetic inference among Pinaceae. Plants. 11 (8), 1078 (2022).
Gao, Z. T. et al. DNA mini-barcoding: a derived barcoding method for herbal molecular identification. Front. Plant. Sci. 10, 987 (2019).
Coissac, E. et al. From barcodes to genomes: extending the concept of DNA barcoding. Mol. Ecol. 25 (7), 1423–1428 (2016).
Hollingsworth, P. M. et al. Telling plant species apart with DNA: from barcodes to genomes. Philos. Trans. R Soc. Lond. B Biol. Sci. 371 (1702), 20150338 (2016).
China Plant BOL Group et al. Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. PNAS. 108 (49), 19641–19646 (2011).
CBOL Plant Working Group. A DNA barcode for land plants. PNAS. 106 (31), 12794–12797 (2009).
Chen, S. L. et al. A renaissance in herbal medicine identification: from morphology to DNA. Biotechnol. Adv. 32 (7), 1237–1244 (2014).
Zhang, Z. X. et al. Morphological and physiological responses of Spatholobus suberectus Dunn to nitrogen and water availability. Photosynthetica. 57 (4), 1130–1141 (2019).
Xiao, J. P. et al. Pharmacodynamic material basis and potential mechanism study of Spatholobi Caulis in reversing osteoporosis. Evid-Based Compl Alt. 14, 3071147 (2023).
An, R. et al. Molecular identification of Spatholobus suberectus and its adulterants based on 26S rDNA D1-D3 region sequence analysis. J. Guangzhou Univ. Chin. Med. 27 (04), 403–406 (2010).
Huang, Q. L. et al. Analysis and molecular identification of matK gene in Spatholobus suberectus and its adulterated products. North. Hortic. 17, 94–98 (2015).
Zhou, H. et al. psbA-trnH barcode molecular identification of Spatholobi Caulis, Kadsurae Caulis, Sargentodoxa cuneata and other Spatholobi medicinal materials. Modernization Traditional Chin. Med. Materia Medica-World Sci. Technol. 18 (01), 40–45 (2016).
Nafisi, H. et al. Characterizing nrDNA ITS1, 5.8S and ITS2 secondary structures and their phylogenetic utility in the legume tribe Hedysareae with special reference to Hedysarum. PLoS One 18(04), e0283847 (2023).
Keller, A. et al. 5.8S-28S rRNA interaction and HMM-based ITS2 annotation. Gene. 430 (1–2), 50–57 (2009).
Giudicelli, G. C. et al. Secondary structure of nrDNA Internal Transcribed spacers as a useful tool to align highly divergent species in phylogenetic studies. Genet. Mol. Biol. 40 (1 Suppl 1), 191–199 (2017).
Chao, Z. et al. DNA barcoding Chinese medicinal Bupleurum. Phytomedicine. 21 (13), 1767–1773 (2014).
Wei, S. et al. Molecular identification and targeted quantitative analysis of medicinal materials from Uncaria species by DNA barcoding and LC-MS/MS. Molecules. 24 (01), 175 (2019).
Dechbumroong, P. et al. DNA barcoding of Aristolochia plants and development of species-specific multiplex PCR to aid HPTLC in ascertainment of Aristolochia herbal materials. PLoS One 13(8), e0202625 (2018).
Acharya, G. C. et al. Molecular phylogeny, DNA barcoding, and ITS2 secondary structure predictions in the medicinally important Eryngium genotypes of east coast region of India. Genes (Basel). 13 (9), 1678 (2022).
Zheng, M. et al. Molecular authentication of medicinal and edible plant Gnaphalium affine (cudweed herb, Shu-qu-cao) based on DNA barcode marker ITS2. Acta Physiol. Plant. 43 (8), 119 (2021).
Zhou, Y. et al. ITS2 barcode for identifying the officinal rhubarb source plants from its adulterants. Biochem. Syst. Ecol. 70, 177–185 (2017).
Khazal, R. M. et al. Genetic diversity of Leishmania major isolated from different dermal lesions using ITS2 region. Acta Parasitol. 69, 831–838 (2024).
Delva, E. et al. Genetic diversity of Amylomyces rouxii from Ragi Tapai in Java island based on ribosomal regions ITS1/ITS2 and D1/D2. Mycobiology. 50 (2), 132–141 (2022).
Lin, Y. et al. Marine conditions in Andaman Sea shape the unique genetic structure of Sargassum Plagiophyllum C. Agardh. J. Appl. Phycol. 36 (1), 501–511 (2024).
Mai, J. C. & Coleman, A. W. The internal transcribed spacer 2 exhibits a common secondary structure in green algae and flowering plants. J. Mol. Evol. 44 (3), 258–271 (1997).
Coleman, A. W. ITS2 is a double-edged tool for eukaryote evolutionary comparisons. Trends Genet. 19 (7), 370–375 (2003).
Umdale, S. D. et al. Genetic diversity of Asian Vigna species (Subgenus Ceratotropis; Genus Vigna) in India based on ITS2 sequences data. Plant. Mol. Biol. Rep. 41 (3), 454–469 (2023).
Li, X. Q. et al. Variation, evolution, and correlation analysis of C + G content and genome or chromosome size in different kingdoms and phyla. PLoS One 9(2), e88339 (2014).
Kakimoto, Y. et al. MicroRNA stability in FFPE tissue samples: dependence on GC content. PLoS One 11(9), e0163125 (2016).
Chen, H. et al. Analysis of DNA interactions and GC content with energy decomposition in large-scale quantum mechanical calculations. Phys. Chem. Chem. Phys. 23 (14), 8891–8899 (2021).
Zhang, J. et al. GC content around splice sites affects splicing through pre-mRNA secondary structures. BMC Genom. 12 (1), 90 (2011).
Karro, J. E. et al. Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Mol. Biol. Evol. 25 (2), 362–374 (2007).
Liu, Y. et al. GC heterogeneity reveals sequence-structures evolution of angiosperm ITS2. BMC Plant. Biol. 23 (1), 608 (2023).
Bengtsson-Palme, J. et al. Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods Ecol. Evol. 4, 914–919 (2013).
González-Pech, R. A. et al. Commonly misunderstood parameters of NCBI BLAST and important considerations for users. Bioinformatics. 35 (15), 2697–2698 (2018).
Nakamura, T. et al. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics. 34 (14), 2490–2492 (2018).
Capella-Gutiérrez, S. et al. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25 (15), 1972–1973 (2009).
Darriba, D. et al. ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol. Biol. Evol. 37 (1), 291–294 (2019).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30 (9), 1312–1313 (2014).
Tamura, K. et al. Prospects for inferring very large phylogenies by using the neighbor-joining method. PNAS. 101 (30), 11030–11035 (2004).
Yu, G. et al. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Mol. Biol. Evol. 35 (12), 3041–3043 (2018).
Wright, E. S. Using DECIPHER v2.0 to analyze big biological sequence data in R. R J. 8 (1), 352–359 (2016).
Valero-Mora, P. M. et al. ggplot2: elegant graphics for data analysis. Meas-Interdiscip Res. 17 (3), 160–167 (2019). 2nd ed.
Rozas, J. et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34 (12), 3299–3302 (2017).
Paradis, E. pegas: an R package for population genetics with an integrated–modular approach. Bioinformatics. 26 (3), 419–420 (2010).
Denman, R. B. Using RNAFOLD to predict the activity of small catalytic RNAs. Biotechniques. 15 (6), 1090–1095 (1993).
Wright, P. R. et al. CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains. Nucleic Acids Res. 42, W119–123 (2014).
Xian, Q. et al. Structure-based GC investigation sheds new light on ITS2 evolution in Corydalis species. Int. J. Mol. Sci. 24 (9), 7716 (2023).
Allen, J. E. et al. Assessing the state of substitution models describing noncoding RNA evolution. Genome Biol. Evol. 6 (1), 65–75 (2014).
Sueoka, N. On the genetic basis of variation and heterogeneity of DNA base composition. PNAS. 48 (4), 582–592 (1962).
Álvarez, I. & Wendel, J. F. Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet Evol. 29 (3), 417–434 (2003).
Lv, Y. N. et al. Identification of medicinal plants within the Apocynaceae family using ITS2 and psbA-trnH barcodes. Chin. J. Nat. Med. 18 (8), 594–605 (2020).
Gao, T. et al. Identification of medicinal plants in the family Fabaceae using a potential DNA barcode ITS2. J. Ethnopharmacol. 130 (1), 116–121 (2010).
Feng, S. G. et al. Application of the ribosomal DNA ITS2 region of Physalis (Solanaceae): DNA barcoding and phylogenetic study. Front. Plant. Sci. 7, 1047 (2016).
Yu, J. et al. Progress in the use of DNA barcodes in the identification and classification of medicinal plants. Ecotox Environ. Safe. 208, 111691 (2021).
Kress, W. J. et al. Use of DNA barcodes to identify flowering plants. PNAS. 102 (23), 8369–8374 (2005).
Cahyaningsih, R. et al. DNA barcoding medicinal plant species from Indonesia. Plants. 11 (10), 1375 (2022).
Meier, R. et al. The use of mean instead of smallest interspecific distances exaggerates the size of the barcoding gap and leads to misidentification. Syst. Biol. 57 (5), 809–813 (2008).
Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science. 290 (5494), 1151–1155 (2000).
Smith, E. G. et al. Host specificity of Symbiodinium variants revealed by an ITS2 metahaplotype approach. ISME J. 11 (6), 1500–1503 (2017).
Ding, X. X. et al. Developing population identification tool based on polymorphism of rDNA for traditional Chinese medicine: Artemisia annua L. Phytomedicine. 116, 154882 (2023).
Obert, T. et al. Delimitation of five astome ciliate species isolated from the digestive tube of three ecologically different groups of lumbricid earthworms, using the internal transcribed spacer region and the hypervariable D1/D2 region of the 28S rRNA gene. BMC Evol. Biol. 20 (1), 37 (2020).
Liu, Z. W. et al. Molecular authentication of the medicinal species of Ligusticum (Ligustici Rhizoma et Radix, Gao-ben) by integrating non-coding internal transcribed spacer 2 (ITS2) and its secondary structure. Front. Plant. Sci. 10, 429 (2019).
Devi, M. P. et al. DNA barcoding and ITS2 secondary structure predictions in Taro (Colocasia esculenta L. Schott) from the north eastern hill region of India. Genes (Basel). 13 (12), 2294 (2022).
Johzuka-Hisatomi, Y. et al. Efficient transfer of base changes from a vector to the rice genome by homologous recombination: involvement of heteroduplex formation and mismatch correction. Nucleic Acids Res. 36 (14), 4727–4735 (2008).
Lesecque, Y. et al. GC-biased gene conversion in yeast is specifically associated with crossovers: molecular mechanisms and evolutionary significance. Mol. Biol. Evol. 30 (6), 1409–1419 (2013).
Rousselle, M. et al. Influence of recombination and GC-biased gene conversion on the adaptive and nonadaptive substitution rate in mammals versus birds. Mol. Biol. Evol. 36 (3), 458–471 (2018).
Rocha, E. P. et al. Base composition bias might result from competition for metabolic resources. Trends Genet. 18 (6), 291–294 (2002).
Higgs, P. G. RNA secondary structure: physical and computational aspects. Q. Rev. Biophys. 33 (3), 199–253 (2000).
Kiktev, D. A. et al. GC content elevates mutation and recombination rates in the yeast Saccharomyces cerevisiae. PNAS. 115 (30), E7109–7118 (2018).
Long, X. et al. Independent evolution of sex chromosomes and male pregnancy-related genes in two seahorse species. Mol. Biol. Evol. 40 (1), 279 (2023).
Liu, A. et al. GC-biased gene conversion drives accelerated evolution of ultraconserved elements in mammalian and avian genomes. Genome Res. 33 (10), 1673–1689 (2023).
Boman, J. et al. The effects of GC-biased gene conversion on patterns of genetic diversity among and across butterfly genomes. Genome Biol. Evol. 13 (5), evab064 (2021).

Acknowledgements

Thanks to all co-authors for their dedication to this article. The authors thank the editors and reviewers for their work in promoting the manuscript.

Funding

This study was funded by the Survey and Collection of Germplasm Resources of Woody & Herbaceous Plants in Guangxi, China (GXFS-2021-34); the Guangxi High-Level Key Disciplines Construction Pilot Project in Traditional Chinese Medicine—Authentication of Chinese Medicinal Materials (No. 27); the Self-funded scientific research project of Guangxi Zhuang Autonomous Region Administration of Traditional Chinese Medicine “Investigation and quality evaluation of germplasm resources of Spatholobus suberectus in Guangxi production areas (GXZYA20220013)”; the Guangxi Traditional Chinese Medicine Appropriate Technology Development and Promotion Project “Optimal solutions for suitable niches for six key protected wild medicinal plants in Guangxi based on photosynthetic characteristics (GZSY22-06)”.

Author information

Jia-wen Wu and Zi-yi Zhao contributed equally.

Authors and Affiliations

Guangxi Key Laboratory of Traditional Chinese Medicine Quality Standards, Guangxi Institute of Chinese Medicine & Pharmaceutical Science, Nanning, 530022, China
Zi-yi Zhao, Chuan-gui Xu, You Nong, Yun-feng Huang & Ke-dao Lai
College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin, 150000, China
Jia-wen Wu

Contributions

Author Contributions: Conceptualization, Z.Z., J.W., and Y.H.; Data curation, C.X.; Formal analysis, Y.N.; Funding acquisition, Y.H.; Investigation, K.L.; Methodology, Z.Z., J.W. and K.L.; Resources, C.X. and Y.N.; Software, J.W.; Supervision, K.L. and Y.H.; Validation, J.W.; Visualization, J.W., Y.N. and K.L.; Writing–original draft, Z.Z.; Writing–review & editing, Z.Z., J.W., and Y.H. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Yun-feng Huang or Ke-dao Lai.

Ethics declarations

Ethics approval and consent to participate

The plant materials involved in this research were used only for scientific research, were permitted for use in this study, and were provided free of charge. This article does not contain any studies with human participants or animals, and the study did not involve any endangered or protected species.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhao, Zy., Wu, Jw., Xu, Cg. et al. Molecular identification and studies on genetic diversity and structure-related GC heterogeneity of Spatholobus Suberectus based on ITS2. Sci Rep 14, 23523 (2024). https://doi.org/10.1038/s41598-024-75763-w

Received: 07 July 2024
Accepted: 08 October 2024
Published: 09 October 2024
Version of record: 09 October 2024
DOI: https://doi.org/10.1038/s41598-024-75763-w