Introduction

The Malvaceae family, known for its economic and medicinal importance, consists of nine subfamilies, 243 genera, and approximately 5461 species1,2. Considering their medicinal and therapeutic activities, several Malvaceae taxa are traditionally classified as part of the ‘Bala group’ by the Ayurvedic Pharmacopoeia of India3. This group comprises medicinal herbs known for their nourishing, strengthening, and rejuvenating properties. They promote vitality, vigour, and overall well-being when used in various Ayurvedic formulations4,5. The term Bala translates to ‘strength’ or ‘vigour’ in Sanskrit, highlighting the therapeutic nature of these plants in Ayurvedic medicine. The ‘Bala group’ refers to a combination of several plant species, each known by different names, such as Atibala (Abutilon indicum (L.) Sweet), Bala (Sida cordifolia L.), Mahabala (Sida rhombifolia L.), Nagabala (Sida cordata (Burm.f.) Borss.Waalk.), or Raktabala (Sida acuta Burm.f. or Sida spinosa L.). However, it should be noted that although the name Sugandhabala contains the word “Bala,” it actually refers to Valeriana jatamansi Jones (Caprifoliaceae), which has distinct properties and is not part of the true Bala group. It is sometimes mistaken for Pavonia zeylanica (L.) Cav. (Malvaceae). Within the Bala group, the genera Sida L. and Abutilon Mill., belonging to the tribe Malveae under the subfamily Malvoideae, are widely utilized as natural health products for promoting well-being and vitality5,6.

The genus Sida encompasses numerous economically and medically significant plants, and their pharmacological potential has been widely investigated. Several Sida species have been employed in traditional medicine in Africa, America, China, and India to improve general health and treat neurological disorders5,7,8. Ved and Goraya9 listed Sida rhombifolia as a highly traded medicinal plant in India. Popularly known as Mahabala, it is notably used to treat diabetes, gout, hypertension, uterine malfunction, tuberculosis, and heart diseases in different parts of America, China, India, and Indonesia7,10,11,12. However, because demand exceeds supply, herbal drugs derived from Sida are frequently adulterated or substituted with species of closely related genera, such as Malvastrum A. Gray, Malva Tourn. ex L., Abutilon Mill., and Urena Dill. ex L.5,13,14,15. Additionally, previous investigations have revealed that morphological similarity, shared vernacular names, and overlapping distributions of most Sida species have potentially led to misidentification5.

A close ally of the genus Sida, Abutilon, is also known for its medicinal properties16,17. Additionally, it holds substantial economic importance, notably through Abutilon theophrasti Medik., a vital fibre source extensively utilized in the jute and paper industries. Renowned in China and India for its medicinal efficacy in treating various acute ailments18, Abutilon has nonetheless been subject to adulteration or substitution19. Confusion between Abutilon theophrasti and Malva verticillata L. has also been documented, with both referenced in a shared herbal remedy20. Although Malva and Malvastrum are used as herbal medicines for treating different diseases and disorders, they possess distinct chemical constituents with different medicinal properties compared to Sida and Abutilon21. Therefore, to maintain the efficacy of these herbal drugs, it is essential to develop a more accurate and effective method for identifying these medicinally important plants.

In recent years, efforts have been made to resolve the species complexes and combat adulteration using morphological and chemical characterization, HPLC fingerprinting, and molecular techniques based on site-specific markers5,16,22. Kumar et al.5 employed nuclear (ITS2) and plastid markers (psbA-trnH and matK) to assess the market samples of the genus Sida. However, the PCR success rate in these samples was low. Although these attempts have provided valuable baseline data, additional support is required to resolve challenges related to species identification and address the issue of adulteration.

Plastomes have emerged as powerful tools for resolving species complexes and are renowned for harbouring extensive evolutionary information about organisms23. Moreover, they contain highly variable and conserved regions, making them a promising DNA super barcode for land plants24,25. Typically, the plastome of land plants exhibits a quadripartite structure comprising two inverted repeats (IRs) and one copy each of the large single copy (LSC) and small single copy (SSC) regions26,27,28. Recent phylogenomic studies by Wang et al.29 have explored the intra-familial relationships within Malvaceae, encompassing representative samples from all nine subfamilies; however, many economically important genera and/or species, including Sida, were overlooked. Guo et al.30 reported the plastome sequence and structure of S. szechuensis Matsuda, the only Sida plastome reported to date. Despite the high economic and medicinal importance of Sida, no research has been carried out to investigate the evolution of its plastome and its phylogenomic position, particularly in comparison with its herbal substitutes. Furthermore, there is a pressing need for studies focusing on DNA super-barcoding to address the challenges of adulteration and identification of Sida and Abutilon herbal drugs.

Considering these challenges, the present study aims to examine the evolutionary patterns of plastomes, assess the structural variations, and elucidate the phylogenetic position of the economically and medicinally important species belonging to the Sida and Abutilon genera, along with their respective adulterants or substitutes. Moreover, the study also attempts to generate DNA super barcodes as a reliable method for accurate identification and to combat adulteration issues associated with these crucial medicinal plant species.

Results and discussion

Plastome features: assembly and comparison

The assembled plastomes of Sida, Abutilon, and Malvastrum exhibit the characteristic quadripartite structure consisting of the LSC, SSC, and two IR regions (Fig. 1). The plastome sequences were submitted to GenBank (NCBI) and accession numbers were obtained (Table 1). The comparative study revealed that the size of the plastomes ranged from 158,162 base pairs (bp) (Malva wigandii (Alef.) M.F.Ray) to 160,332 bp (Abutilon theophrasti) (Table 1). Notably, the overall plastome size, as well as the sizes of the LSC, SSC, and IR regions, remained relatively consistent across all species included in this study. The number of genes and the percentage of GC content were also found to be stable among the taxa examined (Table 1). In line with previous studies on Malvaceae17,29,30,31,32, our study corroborates findings related to plastome structure, gene composition, and GC content. In Sida, the only previously reported plastome, that of S. szechuensis by Guo et al.30, has a size of 159,878 bp, which aligns closely with the plastome sizes of the Sida species newly sequenced in this study (Table 1). However, we observed that the plastome of Sida cuspidata was slightly larger than those of other Sida species, primarily due to the expanded lengths of the LSC and SSC regions. Despite this variation in size, gene content, number of duplicated genes, and overall GC content remained consistent across all examined Sida plastomes (Table 1).

Fig. 1

Circular plastome maps of Sida cuspidata, S. rhombifolia, Abutilon theophrasti, and Malvastrum coromandelianum. The innermost circle represents the quadripartite structure, with two inverted repeat regions (IRA and IRB), LSC, and SSC regions. Genes present outside the circle are transcribed counterclockwise, whereas those inside are transcribed clockwise. Genes belonging to different functional groups are shown in different colours. The gene content and organization are similar for the two Sida species; therefore, one figure was drawn as representative of both species.

Table 1 Summary of basic genomic features of the 11 plastomes of the tribe Malveae used in this study.

Furthermore, a comparative analysis was conducted between the plastomes of Sida, Abutilon, and the commonly encountered adulterants, namely Malvastrum and Malva. The comparative analysis using mVISTA revealed intriguing deletions in both coding and non-coding regions of the examined plastomes (Fig. 2). Within Malvastrum and Malva taxa, distinctive as well as shared deletions were observed in non-coding regions. The shared deletions spanning approximately 250 to 300 bp were observed between the atpF-atpH and psbE-petL genes (Fig. 2). Significantly, Malvastrum exhibited notable deletions between the petN-trnD GUC and trnN GUU-ndhF genes, while Abutilon displayed a deletion between the trnT GGU-psbD genes. Further, our study also depicted a unique deletion of approximately 500 bp in the non-coding region between the trnT UGU-trnL UAA genes that can be used as a DNA signature to differentiate Abutilon and Callianthe. Additionally, the plastomes of Malva exhibited unique deletions between ndhC-trnV UAC, trnV GAC-rps12, and within the trnA UGC gene, ranging from 200 to 400 bp (Fig. 2). This indicates the variations and specific deletion patterns, highlighting the genomic diversity within Malvaceae. Further investigations of these deletions can provide better insights into the evolutionary relationships and can elucidate the potential implications of these regions as a DNA barcode to resolve the adulterations in Sida and Abutilon-related herbals.

Fig. 2

Sequence identity plot comparing the complete plastomes of 11 taxa from tribe Malveae with Sida rhombifolia as a reference using mVISTA. The X-axis indicates the genes with their orientation in the chloroplast genome. The Y-axis indicates the percent identity, and ranges from 50 to 100%, and a cutoff of 70% identity was used for the plots. Genome regions are colour coded, and deletions are indicated in white.

Compression and expansion of the IRs-SSC region

The variability in plastome size is mainly attributed to the expansion and contraction of inverted repeats. Such variation leads to the gain or loss of certain genes, and the boundaries of IRs-SSC, in turn, exhibit unpredictable dynamics across various plant species33,34,35. The current study revealed the constant size of LSC, SSC, and IRs in the studied taxa of Sida and their allied groups. Although the length of the SSC-IR regions remained constant, only one complete functional copy of the ycf1 gene was present at the SSC-IR junction (Fig. 3). Our findings corroborate the reports of Wang et al.29 in Malvaceae, wherein most of the allied taxa showed the presence of a single copy of the ycf1 gene in the SSC region. In contrast, Hibiscus, Urena, and Firmiana showed the presence of a partially duplicated ycf1 gene in the IR region, with only one functional and one truncated non-functional copy (Fig. 3). Most angiosperms are characterized by the presence of the ndhF gene at the IR-SSC border, which is known to provide stability to the IR/SSC borders36. Corroborating the previous studies on Malvaceae and other flowering plants29,36, the current study also identified ndhF as a border gene, but its position is unstable, varying from about 50 to 1,000 bp away from the IR-SSC junction. In Sida and Abutilon species, the shift of the ndhF gene from the border was relatively minimal, ranging from 56 to 112 bp. On the other hand, the maximum shift was observed in Malvastrum and Malva species, where the ndhF gene moved 580 to 911 bp away from the IR-SSC border. The species of Malvastrum and Malva were characterized by the presence of the trnN gene about 310–311 bp away from the IR-SSC border. Furthermore, the IR-LSC and LSC-IR borders were marked by the presence of the trnH and rps19 genes, respectively, in all the studied taxa (Fig. 3). These findings shed light on chloroplast genome structural variations and gene organization in these taxa. Further research is needed to understand their functional and evolutionary implications.

Fig. 3

Comparison of the borders of LSC, SSC, and IR regions of plastomes in 11 taxa from the tribe Malveae. Genes are denoted by coloured boxes. Genes shown below are transcribed in reverse, and those shown above the lines are transcribed forward.

Simple sequence repeats, tandem repeats, and potential DNA barcodes

The plastomes of 11 taxa, including Sida, Abutilon, and their adulterants, were analysed for the presence of SSRs, tandem repeats, and nucleotide diversity to identify the most promising DNA barcodes for these plants. During the SSR analysis, we observed that all species had the highest abundance of mononucleotide repeats and the lowest abundance of hexanucleotide repeats (Fig. 4a). This pattern suggests that mononucleotide repeats are more prevalent and conserved across these taxa. Regarding the size of SSRs, the majority were 10 bp in length, and they were scattered randomly across the plastome (Fig. 4b). While examining the distribution of SSRs across the plastomes, we found that the highest number of SSRs, especially the mononucleotide repeats, were located in the LSC, followed by the SSC and IRs (Fig. 4c, Supplementary Table S1). This further indicates that the LSC region is more prone to the occurrence of SSRs than other regions of the plastome. The maximum diversity of SSRs was found in Abutilon theophrasti and Malva parviflora. The size, type, and position of SSRs vary across the studied taxa and could serve as DNA markers to identify the species. Moreover, our results are in accordance with previous findings17,29,31. Our analysis of oligonucleotide repeats revealed variation across the different species, wherein the forward repeats were the most abundant, while the complement repeats were the least observed throughout the taxa (Fig. 5a). Additionally, the number of repeats in each size class differed between taxa, with repeats longer than 35 bp being the most common and those of 32 to 34 bp being the least frequent (Fig. 5b, Supplementary Table S2). The abundance and distribution patterns of forward and complementary repeats can serve as informative markers for distinguishing different species31. Additionally, SSR markers are among the most informative and versatile DNA-based markers used in plant research37,38, and are particularly effective for assessing genetic diversity within closely related taxa39. These findings offer valuable insights for developing efficient DNA barcodes to support species identification and conservation strategies.

Fig. 4

Comparisons of the simple sequence repeats (SSR) among the 11 plastomes of tribe Malveae: (a) Type of SSRs detected in each plastome, with the X-axis representing SSR types (compound, di, hexa, mono, penta, tetra, and tri) and the Y-axis showing their respective counts; (b) Size of SSRs detected in each plastome, where circle size and colour intensity reflect the frequency of each size class; (c) Frequencies of identified SSRs in LSC, SSC, and IR regions highlighted with different colours.

Fig. 5

Comparisons of the tandem repeats among the 11 plastomes of tribe Malveae: (a) Type of tandem repeats detected in each plastome (complement, forward, palindromic, reverse), with the X-axis indicating taxa names and the Y-axis representing the number of each type of repeat; (b) Size of tandem repeats in each plastome, where circle size and colour intensity reflect the frequency of each size class.

Recent studies on the plastomes of the Malvaceae family have provided valuable insights into potential DNA barcodes. Various research groups, including Abdullah et al.31, Alzahrani17, and Wang et al.29, have identified several intergenic spacers and genes with promising barcode potential. However, these studies have largely addressed the family at a broad level, without explicitly targeting economically significant taxa. In plant identification, genus- or species-specific DNA barcodes are often recommended to effectively detect adulteration or misidentification40,41. For commercially significant species within Malvaceae, however, suitable barcodes remain underexplored, complicating efforts to resolve adulteration issues. Some previous studies have highlighted the use of universal DNA markers such as ITS, matK, psbA-trnH, and rbcL in the Sida species, including Mahabala and other members of the Bala group. Among these, psbA-trnH has been highlighted as a potential candidate5,17,22, which is consistent with our plastome-based analysis that also identified psbA-trnH as a potential barcode for species within the Bala group.

Our analyses of Sida plastomes revealed six intergenic regions (trnH GUG-psbA, trnN GUU-ndhF, psbZ-trnG GCC, trnT GGU-psbD, petD-rpoA, and ccsA-ndhD) and two genes (petL and ycf1) exhibiting high nucleotide diversity (Fig. 6a). Notably, among these, only ycf1 was previously reported in Malvaceae17,29. We propose the use of psbA-trnH as a potential DNA barcode, along with the other markers identified in our study (trnN-GUU–ndhF, psbZ–trnG-GCC, trnT-GGU–psbD, petD–rpoA, ccsA–ndhD, petL, and ycf1) to aid in identifying adulterants in Sida and related taxa. However, it is important to acknowledge that psbA-trnH is known for its high variability and frequent inversions, characteristics that may lead to ambiguous phylogenetic resolution among closely related taxa42. Therefore, all these markers need to be validated thoroughly before they are employed for phylogenetic inference within the group.

Fig. 6

Sliding window analysis of the 11 complete plastomes within tribe Malveae using a window length of 600 bp and a step size of 200 bp. The X-axis represents the nucleotide position, while the Y-axis indicates the nucleotide diversity (π). (a) Highlights the eight most variable regions (cut-off: 0.02) within the assembled Sida plastomes; (b) shows the seven most variable regions (cut-off: 0.01) within the assembled Malvastrum plastomes; (c) displays the five most variable regions (cut-off: 0.01) within the assembled Malva plastomes.

To further strengthen the use of DNA barcodes and combat adulterations in Sida and Abutilon herbal drugs, we also studied the plastomes of Malvastrum and Malva, their frequent adulterants. In Malvastrum, we identified six intergenic regions or spacers (trnT UCU-trnL UAA, petA-psbJ, trnS GCU-trnR UCU, rps2-rpoC2, trnN GUU-ndhF, and psaC-ndhG) and one gene (ycf1) as the most promising DNA barcodes for the genus (Fig. 6b). On the other hand, Malva exhibited five intergenic regions/spacers (petA-psbJ, ndhF-rpl32, trnN GUU-ndhF, accD-psaI, and psbZ-trnG GCC) as the most promising DNA barcodes for the genus (Fig. 6c). Our current findings are partially consistent with those of Alzahrani17 and Wang et al.29, sharing two intergenic spacers (petA-psbJ and accD-psaI) and one gene (ycf1) as potential DNA barcodes for identifying taxa belonging to Malvaceae. Therefore, the proposed genus-specific DNA barcodes are promising tools for combating adulteration and misidentification of the herbal drugs belonging to Sida and its allied genera.

Due to the limited plastome data available for Abutilon, we were unable to perform interspecific nucleotide diversity analysis to identify the potential barcodes. Alzahrani17 reported 10 intergenic spacers and five genes as hotspot regions in plastomes. However, the plastome of Abutilon fruticosum (MT772391) used in that study is unverified by GenBank, and our attempt at re-annotation revealed inconsistencies in the boundaries of the SSC and IR regions. Consequently, we excluded the Abutilon fruticosum plastome from our analysis to avoid potentially misleading outcomes. We therefore conclude that the barcodes proposed by Alzahrani17 need to be validated, and additional plastome data should be included to generate precise DNA barcodes for Abutilon.

The phylogenetic position of ‘Mahabala’ along with its allied taxa and adulterants within the tribe Malveae

In the past decade, plastome data have brought about significant advancements in overcoming adulteration issues in crude drugs through phylogenomic and DNA barcoding approaches. These approaches have also facilitated a better understanding of the phylogenetic positions and accurate placement of various medicinally important taxa across the plant kingdom43,44,45. However, despite being a medically important group, the genomic approach within the tribe Malveae has been largely overlooked by researchers. Due to overlapping morphological characteristics and the availability of powdered or dried forms in the market, the herbal drugs of Sida and its related genera, including Abutilon, Malvastrum, Malva, and Callianthe, are frequently misidentified based only on vegetative traits5,16.

In this study, we made the first-ever attempt to elucidate the phylogenomic position of Mahabala along with its allied genera and adulterants within the tribe Malveae using plastome data. Phylogenetically, Sida was found to be sister to Abutilon and Callianthe with strong bootstrap support (BS-100), forming a distinct sister clade with Malvastrum, Malva, Alcea, and Althaea with moderate support (BS-78) (Fig. 7). Our results are consistent with the findings of Guo et al.30, who investigated the phylogenomic position of Sida within the family Malvaceae using plastome data of Sida szechuensis. Their study revealed a close relationship between S. szechuensis and the genera Malva and Malvastrum. However, their study had limited samples from the tribe Malveae, and the data from allied genera were overlooked in their phylogenetic analyses. Additionally, Abdullah et al.31, Cvetkovic et al.46, and Wang et al.29 attempted to understand the phylogenetic relationships within Malvaceae. However, none included Sida or Abutilon samples to understand the phylogenetic position of these taxa within the family. The expanded sampling within tribe Malveae in the current study revealed greater phylogenetic separation between Sida, Malvastrum, and Malva, and provided a clearer picture of their phylogenetic positions, which is in accordance with the findings of Guo et al.30

Fig. 7

Circular phylogenomic tree illustrating the phylogenetic relationships within the tribe Malveae. The tree was constructed using the Maximum Likelihood (ML) algorithm in IQTree and visualized with iTOL. The numbers at each node represent the bootstrap values, while the red colour indicates the outgroup. The tree revealed the placement of Sida acuta (NC064374) within the Malvastrum clade.

Another study by Alzahrani17 aimed to understand plastome evolution and the phylogenomic position of Abutilon, suggesting that Abutilon is sister to Althaea, forming a sister clade to the tribe Malveae. However, the study included only limited samples from Malveae and overlooked other allied genera of Abutilon in its phylogenomic analysis. In contrast, our study included the closely allied taxa and increased sampling from the tribe Malveae, strongly supporting the placement of the genera Abutilon and Althaea within the tribe Malveae. Notably, Abutilon is placed as a sister to Callianthe and Sida with strong support (BS-100), and appeared far from Althaea and Malva (Fig. 7), in agreement with earlier taxonomic classifications47,48. Alzahrani17, however, proposed that Abutilon is closely related to Althaea and suggested including Althaea within subtribe Abutileae, thereby contradicting traditional classifications. Based on our findings, we reject Alzahrani’s proposed reclassification and emphasize the need for further comprehensive plastome studies across the tribe Malveae.

Misidentification issues are not confined to market samples or crude drugs alone; they also extend to publicly available sequence data in GenBank. Specifically, the plastome sequence NC064374 was initially submitted to NCBI as Sida acuta. However, comparative plastome analysis revealed that it belongs to the closely related genus Malvastrum, which is frequently substituted for Sida. Our phylogenomic tree places Sida acuta (NC064374) with the newly generated plastome of Malvastrum coromandelianum with strong bootstrap support (BS-100) (Fig. 7). Malvastrum and Sida are often misidentified due to their morphological similarities, and the only distinguishing character is the presence of an epicalyx in the genus Malvastrum. Therefore, our study provides insights into the correct phylogenetic placement of medicinally and economically important taxa belonging to Sida and Abutilon and their frequent adulterants (Malva and Malvastrum). However, we believe including more plastome data from these genera would provide a better understanding of their phylogenetic relationships.

Overall, our study provides new insights into the evolutionary history of the tribe Malveae. While the total plastome size remains largely consistent across genera, distinct molecular differences, particularly in non-coding regions, clearly differentiate them. Each genus, including Sida, Malva, Malvastrum, Abutilon, and Callianthe, exhibited unique patterns of insertions and deletions. For instance, Abutilon and Callianthe showed specific deletions between the trnT-GGU–psbD and trnT-UGU–trnL-UAA regions, respectively. In contrast, structural variations in Malvastrum and Malva were observed between atpF–atpH and psbE–petL. Sida, however, displayed a relatively conserved plastome structure, suggesting a more stable evolutionary trajectory.

These molecular distinctions, along with genus-specific SNPs, SSRs, and tandem repeats, not only highlight the evolutionary divergence among these groups but also enhance species-level identification. In taxa such as those within Malveae, where morphological traits are often ambiguous, these genomic features provide reliable markers for distinguishing medicinal plants and preventing adulteration in herbal products. Our findings demonstrate that integrating plastome variation with phylogenetic analysis offers a powerful approach for understanding plant evolution and improving species identification in both research and applied contexts.

Conclusion

This study compared the plastomes of taxa belonging to the ‘Bala group’ and its adulterants, which are economically important taxa within the Malvaceae family. The comparative analysis revealed intriguing deletions in both coding and non-coding regions, providing valuable insights into the genomic diversity within the family. Additionally, the investigation of the contraction and expansion of IR-SSC regions highlighted the variability and unstable nature of certain genes at the junction. The analysis of SSRs and tandem repeats identified potential DNA markers for species identification within Sida, Abutilon, Malvastrum, and Malva. Based on the nucleotide diversity analysis, specific DNA barcodes were proposed for Sida (trnH GUG-psbA, trnN GUU-ndhF, psbZ-trnG GCC, trnT GGU-psbD, petD-rpoA, and ccsA-ndhD), Malvastrum (trnT UCU-trnL UAA, petA-psbJ, trnS GCU-trnR UCU, rps2-rpoC2, trnN GUU-ndhF, and psaC-ndhG), and Malva (petA-psbJ, ndhF-rpl32, trnN GUU-ndhF, accD-psaI, and psbZ-trnG GCC), addressing adulteration issues in the Bala group. However, further research is needed to validate and expand these findings. Furthermore, the phylogenomic analysis elucidated the phylogenetic positions of economically important species and their adulterants within the tribe Malveae. Overall, this study contributes to the development of DNA barcodes to ensure the efficacy of herbal drugs prepared using Mahabala and related groups while also enhancing our understanding of plastome evolution.

Materials and methods

Sampling, DNA extraction, and sequencing

Fresh leaf samples of Sida cuspidata from Costa Rica, S. rhombifolia and Abutilon theophrasti from South Korea, and Malvastrum coromandelianum (L.) Garcke from China were collected. The voucher specimens of the above taxa (KRIB0079328, KRIB0085770, KRIB0080474, and KRIB0093292) were deposited at the herbarium of the Korea Research Institute of Bioscience and Biotechnology (KRIB). Total DNA was extracted from the silica-dried leaves using a Qiagen DNA extraction Kit (Cat. No. 69104) according to the manufacturer’s protocol. For each taxon, the best-quality DNA from one biological replicate was selected based on DNA quality control using a Qubit fluorometer. The high molecular weight genomic DNA was sheared, and short-insert (550 bp) paired-end libraries were prepared using the TruSeq Nano DNA Library Prep Kit (Illumina). After the library preparation, the samples, each carrying unique index primers, were pooled and run in a single lane of an Illumina HiSeq X Ten System (Illumina, San Diego, CA, USA) with a read length of 151 bp. To obtain the maximum coverage, more than 40 million reads for each taxon were generated using the whole genome sequencing approach, generating a total of more than 16 GB of high-quality (Q > 30) raw data for each sequenced taxon (Supplementary Table S3).

Plastome assembly and annotation

The high-quality reads were assembled de novo into a plastome contig using a single accession of each taxon. The quality of the raw reads generated after sequencing was first analyzed using FastQC49 (v0.11.7), and Trimmomatic50 (v0.39) was used to filter the raw reads and remove adaptor contamination. The high-quality reads were assembled by importing the forward and reverse reads with a read length of 151 bp and an insert size of 300 bp in NOVOPlasty51 (v4.3.1). The rbcL gene of all four taxa was used as a seed input sequence for the de-novo assembly of the plastome. To confirm the accuracy of the de-novo assembled plastome, a reference-based assembly was also performed, wherein the closest allied reference, Sida szechuensis Matsuda (NC064374), was selected. The orientation and order of IRs, LSC, and SSC regions were further confirmed by NCBI BLAST and graphic view using Geneious Prime 2022.2.1 (https://www.geneious.com). The de-novo and reference-based assemblies yielded the same sequences; hence, the de-novo assembled plastomes were used for further analysis. The genome annotation of the assembled plastomes was performed using GeSeq52, an organellar genome annotation tool available online through CHLOROBOX (https://chlorobox.mpimp-golm.mpg.de/geseq.html). The transfer RNAs were identified using the additional tools ARAGORN53 (v1.2.38) and tRNAscan-SE54 (v2.07). The annotations were confirmed and validated using Geneious Prime 2022.2.1 (https://www.geneious.com) and the NCBI BLAST tool. The circular plastome maps were drawn using Chloroplot55, an online program for the versatile plotting of organelle genomes (https://irscope.shinyapps.io/chloroplot/). Later, the assembled and annotated plastomes were deposited in the NCBI GenBank, and the accession numbers were obtained (Table 1).
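The check on the order and orientation of the IR, LSC, and SSC regions can be illustrated with a minimal Python sketch. This is not the procedure used in the study (which relied on NCBI BLAST and Geneious Prime); it only assumes that the sequence starts at the first base of the LSC and follows the order LSC, IRb, SSC, IRa, and that the region lengths are already known. The function names are hypothetical.

```python
# Illustrative sketch only: assumes the order LSC - IRb - SSC - IRa,
# starting at position 0 of the assembled plastome sequence.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_quadripartite(genome, lsc_len, ir_len, ssc_len):
    """Return True if the region lengths add up to the genome length and
    IRa is the exact reverse complement of IRb."""
    genome = genome.upper()
    if lsc_len + 2 * ir_len + ssc_len != len(genome):
        return False
    irb_start = lsc_len
    ira_start = lsc_len + ir_len + ssc_len
    irb = genome[irb_start:irb_start + ir_len]
    ira = genome[ira_start:ira_start + ir_len]
    return ira == reverse_complement(irb)
```

An exact match is a strict criterion; a real assembly check would normally tolerate a few mismatches between the two IR copies.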

Plastome structure and comparison

Comparative analyses among selected Sida and Abutilon plastomes, along with their adulterants, were performed. In addition to the newly sequenced plastomes, other medicinally important plastomes belonging to different genera, Abutilon, Malva, Malvastrum, and Callianthe, available on NCBI, were also retrieved and included in the analysis. To examine the overall differences among compared plastomes, the junctions (LSC/IRb; IRb/SSC, SSC/IRa; IRa/LSC) of the species were mapped using IRscope (https://irscope.shinyapps.io/irapp/)56. Subsequently, plastome sequences were compared for their gene order, gene content, and similarity using the mVISTA57 with the annotation of S. rhombifolia as a reference.

Identification of repetitive sequence

The microsatellite markers across the analyzed plastomes were screened and identified using MISA58, an online server (http://webblast.ipk-gatersleben.de/misa/index.php?action=1) with minimum iterations of ten, five, four, three, three and three repeat units for mononucleotide (p1), dinucleotide (p2), trinucleotide (p3), tetranucleotide (p4), pentanucleotide (p5), hexanucleotide (p6), respectively. Additionally, REPuter59, an online server (https://bibiserv2.cebitec.uni-bielefeld.de/reputer), was used to identify and locate the repeated sequences, including forward, reverse, complement, and palindromic repeats, based on the parameters: i. 30 bp minimum repeat length; ii. 90% or greater sequence identity (Hamming distance = 3).
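The SSR thresholds listed above can be made concrete with a short Python sketch. It is a simplified regular-expression search, not MISA itself: it reports only perfect repeats, does not detect compound SSRs, and the function names are hypothetical. The minimum repeat counts are the ones stated in the text (10, 5, 4, 3, 3, and 3 for mono- to hexanucleotide motifs).

```python
import re

# Minimum number of repeat units per motif length, as given in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not itself built from a shorter repeat
    (for example, 'AA' is a repeat of 'A' and is therefore not primitive)."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect
    SSRs, using 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported under a shorter motif length
            count = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, count))
    return sorted(hits)
```

Assigning each hit to the LSC, SSC, or IR region then only requires comparing its start coordinate with the region boundaries of the plastome in question.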

Nucleotide diversity

The plastomes of three Sida, two Malvastrum, and four Malva species were aligned separately using the MAFFT60 plugin (v7.402) in Geneious Prime 2022.2.1. To investigate the patterns of variability across the aligned plastomes, we applied a sliding window approach. Nucleotide diversity (Pi) analysis was performed by importing the aligned plastome sequences in DnaSP v6.12.01 software61 with a step size of 200 bp and a window length of 600 bp. The nucleotide diversity for each group (Sida, Malvastrum, and Malva) was tested individually to trace the most accurate DNA barcodes for the respective groups. The graph was plotted using the nucleotide diversity found in LSC, SSC, and IR regions.
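The sliding window calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not a reimplementation of DnaSP: nucleotide diversity is taken as the mean number of pairwise differences per site, alignment columns containing a gap or an ambiguous base are dropped, and the function names are hypothetical. The default window length and step size match those given in the text.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for aligned sequences.
    Columns with a gap or ambiguous base in any sequence are skipped
    (an assumption of this sketch)."""
    seqs = [s.upper() for s in seqs]
    valid = set("ACGT")
    columns = [col for col in zip(*seqs) if all(base in valid for base in col)]
    if len(seqs) < 2 or not columns:
        return 0.0
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = 0
    for col in columns:
        diffs += sum(1 for a, b in combinations(col, 2) if a != b)
    return diffs / (n_pairs * len(columns))


def sliding_pi(seqs, window=600, step=200):
    """Return (start, end, pi) for each window along the alignment.
    Positions after the last full window are not covered."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, length)
        chunk = [s[start:end] for s in seqs]
        results.append((start, end, nucleotide_diversity(chunk)))
    return results
```

Windows whose value exceeds a chosen cut-off (0.02 for Sida and 0.01 for Malvastrum and Malva in Fig. 6) can then be mapped back to the annotated genes or spacers they overlap.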

Phylogenomic analysis

The phylogenomic analysis was performed using the protein-coding genes (CDS) extracted from the plastomes of the family Malvaceae to confirm the phylogenetic position and relationships among the selected medicinally important species belonging to the tribe Malveae and their adulterants or substitutes. A total of 83 CDS of all taxa were extracted using Geneious Prime 2022.2.1 and were aligned using the MAFFT v7.450 plugin. The aligned sequences were further trimmed from both ends, and the gaps were curated manually in Geneious Prime 2022.2.1. The aligned matrix of 95,847 bp was used to generate the phylogenetic tree using the Maximum Likelihood (ML) algorithm. The best-fit nucleotide substitution model was selected using ModelFinder, the model-selection tool included in IQTree v 2.6.1 software62. The final ML tree was generated using the IQTree v2.6.1 software62 with 100,000 iterations of ultrafast bootstraps. The obtained tree was visualized in iTOL63 (v6).
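Assuming the aligned matrix was built by concatenating the individual CDS alignments, that step can be sketched in Python as below. The study performed this work in Geneious Prime; the function name, the input layout (a dictionary of per-gene alignments), and the gap-filling of taxa missing a gene are all assumptions made for illustration.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Taxa missing from a gene are padded with gaps (an assumption of
    this sketch). Returns (matrix, partitions), where partitions holds
    1-based inclusive (gene, start, end) coordinates.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    offset = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("unequal sequence lengths in alignment: " + gene)
        n = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * n))
        partitions.append((gene, offset + 1, offset + n))
        offset += n
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The partition coordinates are kept so that the supermatrix can later be analysed gene by gene if a partitioned model is preferred over a single model for the whole matrix.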