Introduction

Withania coagulans (Stocks) Dunal, also known as Indian Rennet or Paneer Booti, is a medicinally important plant belonging to the Solanaceae family. Native to arid and semi-arid regions of countries such as Iran, India, Pakistan, and Afghanistan, this species has long been used in traditional medicine for its therapeutic properties1. In the ethnobotanical context, W. coagulans holds significant cultural value and has been used for centuries by various communities for multiple health-related purposes. Ethnomedicinally, different parts of the plant, particularly its fruits and roots, are used in the treatment of diabetes, liver disorders, inflammation, and gastrointestinal ailments2,3,4. In ethno-veterinary medicine, decoctions and powders derived from W. coagulans have traditionally been administered to livestock to improve digestion and manage reproductive or respiratory issues, particularly in arid and resource-limited rural settings. These multifaceted applications highlight the species’ role in indigenous healthcare systems and reflect its long-standing medicinal reputation among local populations.

Despite its medicinal relevance, W. coagulans remains underutilized, with relatively few studies focusing on its genetic diversity, morphological features, or fruit characteristics. Evaluating variation in these traits is essential for conservation, domestication, and breeding efforts aimed at enhancing its medicinal and agronomic potential. Accurate trait determination and genetic characterization are foundational steps in modern breeding programs, as they enable the identification of intra-species diversity and the selection of superior genotypes5,6. Morphological and pomological characterization is particularly important in germplasm evaluation, where it provides a baseline for genetic resource management and cultivar development. In W. coagulans, traits such as plant height, leaf size, fruit morphology, and seed characteristics are valuable indicators of adaptive potential, genetic differentiation, and phenotypic plasticity7,8. When coupled with multivariate statistical techniques such as principal component analysis (PCA), correlation matrix analysis (CMA), and hierarchical clustering, trait data can be evaluated comprehensively to reveal patterns of variation and trait interrelationships. These methods help group accessions into phenotypically distinct clusters, thereby facilitating more efficient and targeted selection in breeding programs.

Although interest in medicinal plants is steadily increasing, comprehensive studies addressing the genetic and phenotypic diversity of W. coagulans are still limited. In contrast, related species such as Withania somnifera (Ashwagandha) have been extensively studied and shown to possess substantial variability in morphological and biochemical characteristics9,10, suggesting that W. coagulans may exhibit similar diversity. Nevertheless, a systematic pomological and morphological assessment of W. coagulans accessions has not yet been thoroughly conducted. This knowledge gap hinders the species’ full utilization in both medicinal and horticultural applications.

To address this gap, the present study undertakes a comprehensive multivariate evaluation of W. coagulans accessions collected from five regions of Sistan-va-Baluchestan province in Iran. The main objectives are: (i) to assess the extent of pomological and morphological diversity; (ii) to identify the most discriminative traits contributing to phenotypic variation; and (iii) to classify the accessions based on trait similarity using clustering and heat map visualization. Understanding the species’ adaptation to harsh environments also provides insight into its morphological differentiation and survival strategies in response to ecological stress. By establishing a detailed characterization framework, this study aims to support future efforts in genetic improvement, conservation, and sustainable utilization of this valuable but under-researched medicinal species.

Materials and methods

Plant material

Pomological and morphological variations of 55 natural accessions of W. coagulans were evaluated in five areas of Sistan-va-Baluchestan province, Iran. The geographical coordinates and altitude of each surveyed area are shown in Table 1, and the geographic locations of the collection areas are shown in Fig. 1. A minimum distance of 200 m was maintained between accessions within each area to avoid collecting clonal samples. The specimens were formally identified by Prof. Dr. Ali Khadivi. A herbarium voucher specimen (voucher number WC-4432) has been deposited in the publicly available herbarium of the Faculty of Agriculture and Natural Resources, Arak University, Iran. Permission to collect and study Withania coagulans was obtained from the Agricultural and Natural Resources Ministry of Iran.

Table 1 Geographic locations of collection areas of the studied Withania coagulans accessions in Sistan-va-Baluchestan province, Iran.
Fig. 1 Geographic locations of collection areas of the studied Withania coagulans accessions.

Pomological and morphological evaluations

A total of forty-three pomological and morphological traits were analyzed to assess phenotypic variation among W. coagulans accessions (Table 2). For each accession, 20 fully developed leaves and 20 physiologically mature fruits were randomly collected from different parts of the plant canopy to ensure representative sampling. Quantitative traits related to leaf, fruit, and seed dimensions were measured with a digital caliper (Loyka B5110-150-IP54 protected, China). Fruit weight and seed weight were measured with a digital scale with a readability of 0.01 g (JNB5002, JOANLAB, Zhejiang, China). Qualitative morphological traits such as shape, color, and growth form were scored by the same research team under standardized lighting conditions to reduce observer bias. They were then coded according to predefined descriptive criteria to minimize subjectivity (Table 3).

Table 2 Descriptive statistics for the pomological and morphological traits utilized in the studied Withania coagulans accessions.
Table 3 Frequency distribution for the measured qualitative morphological characters in the studied Withania coagulans accessions.

Statistical analysis

Analysis of variance (one-way ANOVA, p < 0.05) was performed in JMP® Pro 17 software11 to assess variation among accessions for the measured traits. Relationships among the recorded traits were determined using Pearson correlation coefficients (r) in Origin Pro® 2025 software12. Principal component analysis (PCA) was conducted in Origin Pro® 2025 to identify the key traits driving the grouping of accessions. Varimax rotation with Kaiser normalization was applied to make the component structure easier to interpret. A heat map with hierarchical clustering based on Ward’s method and Euclidean distances was also produced in Origin Pro® 2025 to classify accessions and variables. The first and second principal components (PC1 and PC2) were used to construct a two-dimensional biplot showing the distribution of accessions and quantitative traits. Finally, fruit- and seed-related traits were treated as dependent variables, and multiple regression analysis (MRA) was used to identify the traits associated with them. MRA was performed with the “stepwise” method within the “linear regression analysis” option of SPSS® software (SPSS Inc., Chicago, IL, USA)13,14.
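The heat-map clustering relies on Ward’s agglomerative method with Euclidean distances. The sketch below is a minimal pure-Python illustration, not the Origin Pro® implementation. At each step it merges the pair of clusters with the smallest Ward distance and updates the remaining distances with the Lance-Williams formula for Ward’s criterion on squared Euclidean distances. It assumes the traits are first standardized to z-scores, which the source does not state. The function names and test data are hypothetical.

```python
import math
from itertools import combinations


def zscore_columns(rows):
    """Standardize each trait (column) to mean 0 and sample SD 1 (assumed preprocessing)."""
    scaled = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(r) for r in zip(*scaled)]


def ward_linkage(rows):
    """Agglomerative Ward clustering; returns (sorted members, merge height) per merge."""
    clusters = {i: [i] for i in range(len(rows))}
    # Squared Euclidean distances between all pairs of singleton clusters, keyed (low, high).
    d2 = {(i, j): sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
          for i, j in combinations(range(len(rows)), 2)}
    next_id = len(rows)
    merges = []
    while len(clusters) > 1:
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])  # closest pair of clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        members = clusters.pop(i) + clusters.pop(j)
        del d2[(i, j)]
        for k, kmembers in clusters.items():
            nk = len(kmembers)
            dki = d2.pop((min(k, i), max(k, i)))
            dkj = d2.pop((min(k, j), max(k, j)))
            # Lance-Williams update for Ward's criterion (squared distances).
            d2[(k, next_id)] = ((ni + nk) * dki + (nj + nk) * dkj - nk * dij) / (ni + nj + nk)
        clusters[next_id] = members
        merges.append((sorted(members), math.sqrt(dij)))
        next_id += 1
    return merges


# Hypothetical example: four accessions described by two traits.
accessions = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
for members, height in ward_linkage(zscore_columns(accessions)):
    print(members, round(height, 3))
```

The merge list is the information a dendrogram or heat-map ordering is drawn from. Reporting heights as the square root of the Ward distance is one common convention; software packages differ in this respect.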

Results and discussion

Descriptive statistics among accessions

The pomological and morphological traits of W. coagulans accessions were analyzed, and descriptive statistics for each trait are presented in Table 2. Previous studies by Goodarzi et al.15, Mostafa et al.16, Elwakil et al.17, and Khadivi et al.18 found that certain variables showed no variation among accessions (CV = 0.00%) and excluded them from further analysis. In line with these studies, two variables (basal leaf color and upper leaf color) showed no variation among the accessions in this study (CV = 0.00%). The evaluations were therefore based on the remaining 41 variables. One-way ANOVA (p < 0.05) revealed significant differences among the evaluated accessions. The highest CVs were observed for basal leaf apex shape (60.27%), main stem color (58.37%), leaf density (57.59%), fruit ripening date (57.59%), and plant growth habit (52.86%). The lowest CVs were recorded for fruit pedicel diameter (14.40%), fruit pedicel length (12.01%), fruit length (11.29%), fruit width (9.27%), and seed length (8.94%). Notably, 33 of the 41 variables (80.49%) had CVs greater than 20.00%, indicating a substantial level of variability among the studied accessions19. Traits with a CV exceeding 20.00% are more readily distinguishable among specimens and can serve as reliable markers for differentiating accessions, genotypes, or cultivars20. Similarly, traits with a wider quantitative range tend to exhibit higher CVs, indicating greater potential for selection and improvement21. In contrast, traits with lower CVs are more consistent and can be considered stable features across accessions20. Comparably high variability has been reported for pomegranate in Iran (CV of 84.50%)22 and for oak (Quercus brantii Lindl.; CV of 77.36%)23, further highlighting considerable variability among the analyzed accessions.
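The CV screening described above can be sketched in a few lines. The sketch makes three assumptions that the source does not confirm:

- CV is computed as the sample standard deviation divided by the mean (× 100).
- Qualitative traits enter the calculation as their numeric codes.
- Traits with CV = 0 are dropped before the variables with CV > 20% are counted.

The trait names and values in the example are hypothetical. The last line reproduces the share reported in the text, 33/41 = 80.49%.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%): sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


def summarize_cvs(traits, threshold=20.0):
    """Drop invariant traits (CV = 0), then count the retained traits above the threshold."""
    cvs = {name: cv_percent(vals) for name, vals in traits.items()}
    kept = {name: cv for name, cv in cvs.items() if cv > 0}
    above = sum(1 for cv in kept.values() if cv > threshold)
    share = round(above / len(kept) * 100, 2)
    return kept, above, share


# Hypothetical values for three traits measured on five accessions.
traits = {
    "basal_leaf_color": [1, 1, 1, 1, 1],      # invariant coded trait, excluded (CV = 0)
    "plant_height_cm": [60, 120, 200, 90, 150],
    "seed_length_mm": [2.9, 3.1, 3.0, 3.2, 2.8],
}
kept, above, share = summarize_cvs(traits)
print({name: round(cv, 2) for name, cv in kept.items()}, above, share)
print(round(33 / 41 * 100, 2))  # share of the 41 retained traits with CV > 20% in this study
```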

The plant height ranged from 60 (‘Rachedr-10’) to 200 (‘Kahiri-11’) cm, indicating substantial variation that may result from genetic diversity and environmental influences. The basal leaves exhibited a length between 32.60 (‘Rachedr-9’) and 71.41 (‘Hit-2’) mm, with a width ranging from 8.91 (‘Kahiri-6’) to 25.79 (‘Hit-3’) mm, suggesting notable morphological plasticity, which could be an adaptive advantage under varying ecological conditions. The petiole length of the basal leaves varied from 5.56 (‘Rachedr-4’) to 18.15 (‘Hit-2’) mm, highlighting differences in leaf attachment and potential effects on leaf orientation and photosynthetic efficiency. For the upper leaves, the length was observed to be between 27.19 (‘Malekabad-2’) and 59.47 (‘Malekabad-10’) mm, while the width ranged from 8.30 (‘Rachedr-4’) to 21.54 (‘Hit-3’) mm, which implies a gradual reduction in leaf size towards the apex, a common trait associated with optimizing light capture and reducing water loss. The petiole length of the upper leaves ranged from 3.16 (‘Rachedr-6’) to 14.10 (‘Veshder-6’) mm, with a petiole width of 1.02 (‘Malekabad-2’) to 2.40 (‘Hit-4’) mm, suggesting variability in mechanical support and potential differences in leaf positioning for efficient gas exchange. The fruit length varied from 7.54 (‘Kahiri-11’) to 12.34 (‘Kahiri-8’) mm, and the fruit width ranged from 8.28 (‘Kahiri-5’) to 12.48 (‘Kahiri-8’) mm, indicating moderate variation that could influence dispersal mechanisms and reproductive success. The fruit pedicel length ranged from 3.99 (‘Malekabad-4’) to 6.81 (‘Rachedr-7’) mm, with a pedicel diameter between 0.80 (‘Rachedr-4’) and 1.58 (‘Rachedr-7’) mm, which may affect fruit attachment strength and susceptibility to environmental stresses such as wind or predation. 
The fruit weight varied from 0.12 (‘Kahiri-5’ and ‘Kahiri-11’) to 0.32 (‘Kahiri-8’) g, while the fruit skin thickness ranged from 0.16 (‘Rachedr-11’) to 0.76 (‘Kahiri-8’) mm, suggesting potential differences in desiccation resistance and post-harvest longevity. Each fruit contained between 18 (‘Kahiri-5’ and ‘Kahiri-11’) and 38 (‘Rachedr-11’) seeds, indicating considerable variability in reproductive output, which may have implications for seed dispersal and establishment success. The seed weight per fruit ranged from 0.05 (‘Kahiri-5’) to 0.22 (‘Rachedr-6’ and ‘Rachedr-7’) g, highlighting differences in resource allocation strategies among accessions. Seed length ranged from 2.52 (‘Malekabad-1’) to 3.72 (‘Malekabad-6’) mm, seed width from 1.35 (‘Malekabad-3’) to 2.76 (‘Veshder-1’) mm, and seed thickness from 0.68 (‘Malekabad-13’) to 1.88 (‘Malekabad-6’) mm. This variation may influence germination dynamics and seedling vigor.

Because few studies have addressed this topic, our findings are compared here with previous research on the Solanaceae. A similar study conducted in India on 14 accessions of Withania somnifera (L.) Dunal reported the following ranges: plant height 37.78 to 135.82 cm, leaf length 6.5 to 9.6 cm, leaf width 3.7 to 5.8 cm, and 30.1 to 43.77 seeds per fruit. Kumar et al.24 measured seed size in W. coagulans and reported seed lengths of 2.2 to 3.7 mm, widths of 1.8 to 2.7 mm, and thicknesses of 0.9 to 1.3 mm. Gunn25 classified Solanaceae seeds, based on external observation, into two categories: medium-sized and tiny. Seeds larger than 1.5 mm were classified as medium-sized and smaller ones as tiny26. Mahdy et al.27 reported seed sizes for various Solanum species:

- Solanum lycopersicum (tomato): 2.14 to 3.96 mm long and 2.65 to 3.91 mm wide.
- Solanum diphyllum: 2.42 to 3.46 mm long and 2.65 to 3.91 mm wide.
- Solanum abutiloides: 1.03 to 1.41 mm long and 0.93 to 1.35 mm wide.
- Solanum macrocarpon: 2.5 to 3.0 mm in both length and width.

With lengths of 2.52 to 3.72 mm in the present study, W. coagulans seeds fall into the medium-sized category within the Solanaceae. Their dimensions are close to those of S. lycopersicum and S. diphyllum, whereas seeds of S. abutiloides are considerably smaller. These comparisons indicate that seed size in W. coagulans lies within the range typical of medium-seeded Solanaceae and may serve as a useful criterion in the classification of the species. Moreover, the variation in seed size among accessions could provide insights into the species’ adaptability and dispersal strategies.

The observed variability in traits such as plant height, leaf size, and fruit weight may be influenced not only by genetic differences but also by microclimatic variation, soil composition, and elevation differences across the five collection regions. These environmental factors may shape adaptive pomological and morphological responses in W. coagulans and thus contribute to the phenotypic plasticity observed in this study. Iqbal et al.28 reported similar findings: W. coagulans populations from ecologically diverse sites differed significantly in anatomical traits such as root cortical thickness, stem vascular bundle area, and leaf stomatal density, and these differences were strongly associated with local environmental conditions.

Specifically, microclimatic conditions—such as localized temperature fluctuations, relative humidity, and light intensity—can significantly impact vegetative growth and reproductive development29,30. Soil characteristics, including texture, pH, organic matter content, and nutrient availability, may further modulate root development, leaf expansion, and fruit setting patterns31,32. Additionally, elevation gradients are known to impose distinct ecological pressures that influence plant physiology, often resulting in altitudinal clines in trait expression33.

In the context of W. coagulans, which naturally occurs in arid and semi-arid zones, the interaction between environmental heterogeneity and genetic background may drive differential allocation of resources towards structural or reproductive organs. Such plastic responses are ecologically advantageous, allowing accessions to optimize survival and reproductive success under varying stress conditions34.

Consequently, the combined effect of environmental selection and intrinsic genetic variability likely underpins the broad range of phenotypic traits documented in this study, reflecting the adaptive responses of W. coagulans to diverse ecological conditions. This extensive morphological diversity among accessions underscores the species’ potential for breeding, conservation, and targeted improvement programs in this underutilized medicinal species.

Frequency distribution analysis is a statistical method that groups observations into classes and counts the number of observations in each class. Classes are categories for qualitative traits and intervals for quantitative ones. The method is used to describe the structure of a dataset, including its central tendency (for qualitative traits, the modal class) and its distribution characteristics. The analysis begins with defining classes appropriate to the type and distribution of the data; for quantitative data, intervals may have equal or unequal widths. The number of observations in each class is then counted, and a frequency table is constructed35. Frequencies can be expressed in absolute, relative, cumulative, or percentage terms. In this study, absolute frequencies were used (Table 3). The absolute frequency of a class is the number of observations (here, accessions) that fall into it36.

The qualitative morphological characteristics of W. coagulans accessions varied considerably across traits. For plant growth habit, the open form was the most frequent (22 accessions), followed by semi-erect (16), prostrate (14), and erect (3). The predominance of open and semi-erect growth forms may be advantageous for light interception and air circulation.
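The frequency measures named in the previous paragraph can all be derived from raw category labels. The sketch below rebuilds the plant growth habit data from the counts reported above (open 22, semi-erect 16, prostrate 14, erect 3; n = 55). It tabulates absolute, relative, cumulative, and percentage frequencies. The function name and the category order are illustrative choices, not part of the original analysis. For a nominal trait, the cumulative column depends entirely on the chosen order.

```python
from collections import Counter


def frequency_table(observations, order):
    """Absolute, relative, cumulative and percentage frequencies per class."""
    counts = Counter(observations)
    n = len(observations)
    rows, cumulative = [], 0
    for cls in order:
        absolute = counts.get(cls, 0)
        cumulative += absolute
        rows.append((cls, absolute, absolute / n, cumulative, 100 * absolute / n))
    return rows


# Plant growth habit of the 55 accessions, rebuilt from the counts reported in the text.
habit = ["open"] * 22 + ["semi-erect"] * 16 + ["prostrate"] * 14 + ["erect"] * 3
table = frequency_table(habit, ["open", "semi-erect", "prostrate", "erect"])
for cls, absolute, relative, cumulative, percent in table:
    print(f"{cls:<11}{absolute:>4}{relative:>8.3f}{cumulative:>5}{percent:>8.2f}%")
```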

Plant growth vigor was categorized as low (25) and moderate (30), with no accessions classified as high. This indicates a general tendency towards moderate to low vigor, which may impact biomass production and adaptation potential. Similarly, plant breadth, canopy density, branch density, and branch flexibility followed a comparable pattern, with moderate values being slightly more frequent than low ones, highlighting a moderate architectural development among the studied accessions.

The most frequent branching type was multi-stemmed with intermediate branching (27 accessions), whereas multi-stemmed plants with low branching were less common (7). The remaining 21 accessions showed high branching intensity. This distribution may reflect a genetic predisposition towards intermediate to high branching, which might contribute to structural stability and reproductive efficiency.

Stem diameter was mostly moderate (35), with a smaller proportion classified as low (20), suggesting that a majority of the accessions had a well-developed structural framework. The predominant stem color was gray-whitish (42), followed by gray (13), indicating a consistent phenotypic trait that may be associated with drought resistance or other adaptive functions.

Leaf characteristics also varied: leaf density was predominantly low (38 accessions) and moderate in the remainder (17). Basal leaf shape was equally distributed between elliptic (22) and broad elliptic (22), while a smaller fraction showed a narrow elliptic form (11). Basal leaf apex shape followed a similar pattern, with acute (23) and intermediate (19) apices more common than obtuse ones (13). This suggests that leaf architecture varies within a limited set of recurring forms among these accessions.

For upper leaves, elliptic shape was the most common (31), followed by narrow elliptic (15) and broad elliptic (9). The apex shape of the upper leaves was mainly intermediate (32), while acute (12) and obtuse (11) forms were less frequent. This variation in leaf morphology might influence photosynthetic efficiency and water use.

Fruit ripening predominantly occurred in mid-June (38), with fewer accessions ripening in late June (17). This suggests that the majority of accessions have a relatively early fruiting period, which could be beneficial for agricultural planning and harvest management.

Fruit density was distributed among low (15), moderate (22), and high (18) categories, indicating variability in reproductive capacity among accessions. Regarding fruit shape, the most common forms were oblate (24) and spheroid (23), with fewer accessions producing ovate fruits (8). The predominant fruit skin color was cream (34), followed by light cream (11) and light brown (10), suggesting a relatively uniform pigmentation.

Fruit flesh color showed greater diversity, with accessions displaying cream (10), dark cream (6), light brown (16), brown (8), and dark brown (15) hues. This variation may influence consumer preference and processing quality.

Seed morphology was largely homogeneous: most accessions had ear-shaped seeds (49), and only a few had kidney-shaped seeds (6). Seed color was mostly brown (43), with a smaller proportion light brown (11) and only one accession with cream-colored seeds. This suggests that seed traits are relatively stable, possibly owing to genetic constraints. Our findings differ from those reported by Khanna et al.7, who conducted a similar study on W. somnifera (L.) Dunal using 14 accessions. These differences are likely attributable to differences between the two species, even though both belong to the same genus, as well as to differences in the accessions studied.

Overall, the observed morphological diversity among W. coagulans accessions highlights the species’ potential for breeding and conservation efforts. While traits such as leaf shape and seed morphology appear relatively stable, characteristics related to plant architecture, fruiting period, and fruit properties exhibit notable variability. This diversity may offer opportunities for selecting superior genotypes with desirable agronomic and medicinal traits.

The shrub, leaf, fruit, and seed of the studied W. coagulans accessions are shown in Fig. 2.

Fig. 2 The shrub, leaf, fruit, and seed of the studied Withania coagulans accessions.

Correlation matrix analysis (CMA)

Correlation matrix analysis (CMA) is a statistical method used to assess the strength and direction of relationships among multiple quantitative variables37. Pearson’s correlation coefficient is commonly used to quantify the linear association between each pair of variables38. The Pearson correlation coefficient (r) ranges from −1 to +1: +1 indicates a perfect positive linear relationship, −1 a perfect negative linear relationship, and 0 no linear relationship39. The method is best suited to continuous data with approximately linear relationships. The usual significance test for r additionally assumes that the variables are approximately normally distributed40. By calculating all pairwise correlations, researchers can identify patterns of association within the dataset. For this reason, Pearson’s correlation is a standard first step in multivariate statistical analysis41.
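A minimal sketch of what the correlation matrix contains follows. It computes Pearson’s r for every pair of traits and the t statistic commonly used to test r against zero: t = r·√(n − 2)/√(1 − r²), on n − 2 degrees of freedom. The function names are hypothetical. Origin Pro® may use a different but equivalent formulation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, compared with Student's t on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping trait name -> values (one value per accession)."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Assumed n = 55 (one mean per accession); r values taken from the seed-trait correlations.
for r in (0.34, -0.38):
    print(r, round(t_for_r(r, 55), 2))
```

Assume the correlations were computed on n = 55 accession means (the source does not say so explicitly). Under that assumption, r = 0.34 gives |t| ≈ 2.63 and r = −0.38 gives |t| ≈ 2.99 on 53 degrees of freedom. The two-tailed critical values there are roughly 2.01 (p = 0.05) and 2.67 (p = 0.01). These figures are consistent with the single and double asterisks reported for these coefficients.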

Figure 3 presents the simple correlations among quantitative pomological and morphological traits in W. coagulans accessions. The results reveal significant and consistent relationships among leaf, fruit, and seed traits, reflecting coordinated growth and structural integration within the species.

Fig. 3 Simple correlations among the quantitative pomological and morphological variables in the studied Withania coagulans accessions. For abbreviations, please see Table 2.

Upper leaf length and width were positively correlated (r = 0.58**), indicating proportional expansion in leaf dimensions. Likewise, fruit length showed strong positive correlations with both fruit width (r = 0.74**) and fruit weight (r = 0.64**), suggesting that larger fruits tend to be both wider and heavier. Fruit width and weight also exhibited a significant relationship (r = 0.63**), highlighting their interdependence in determining overall fruit size.

Fruit weight was positively associated with the number of seeds per fruit (r = 0.55**) and seed weight per fruit (r = 0.83**), implying that heavier fruits tend to contain more seeds and a greater total seed mass. In addition, fruit skin thickness and seed thickness were significantly correlated (r = 0.64**), which may point to a protective or physiological adaptation.

Interestingly, seed length was positively correlated with both seed width (r = 0.34*) and seed thickness (r = 0.48**), whereas seed width and seed thickness were negatively correlated (r = −0.38**). This inverse relationship suggests a possible morphological trade-off, which may arise from developmental constraints or internal resource allocation.

Overall, the correlations demonstrate complex yet structured interactions among morphological traits, with strong positive associations dominating. These interrelationships reflect developmental coordination and adaptive strategies, underscoring the importance of a multivariate approach in analyzing plant phenotypes and understanding ecological adaptations in W. coagulans.

No correlation matrix analysis involving these variables has been reported for this species or genus. The correlations were therefore interpreted on their own, without comparison with earlier results. This gap underlines the originality of the study and its potential to provide a foundation for future research on W. coagulans.

Multiple regression analysis (MRA)

Multiple regression analysis (MRA) is a statistical method used to examine the relationship between a dependent variable and several independent variables42. In this study, MRA was conducted using the stepwise method within the linear regression framework. This method enters the most significant predictors one at a time and removes those that no longer contribute, thereby improving the efficiency and interpretability of the model43. For each model, the multiple correlation coefficient (R) was calculated to measure the strength of the relationship between the dependent variable and the set of predictors. The coefficient of determination (R²) was computed to assess the explanatory power of the model, that is, the proportion of variance in the dependent variable accounted for by the independent variables44. Standardized regression coefficients (β) were used to compare the relative effects of different predictors. Because they are expressed in standard-deviation units, they allow meaningful comparisons across variables measured on different scales42. Before the analysis, the assumptions of normality and absence of multicollinearity were examined. Variance inflation factors (VIF) were computed to ensure that the predictors did not show excessive intercorrelation44. The significance of the regression coefficients was assessed using t-tests, and overall model fit was evaluated using R² and adjusted R² values. Residual diagnostics were also performed to verify the assumptions of linear regression, including normality of residuals and homoscedasticity43. A significance level of α = 0.05 was used for all statistical tests45.
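The quantities described above have closed forms in the simplest case of two predictors. With standardized variables:

- β1 = (r_y1 − r_y2·r_12)/(1 − r_12²)
- β2 = (r_y2 − r_y1·r_12)/(1 − r_12²)
- R² = β1·r_y1 + β2·r_y2
- VIF = 1/(1 − r_12²) for both predictors

The sketch below computes them from raw data. It is an illustration only: it does not reproduce SPSS’s stepwise entry and removal. The variables x1, x2, and y are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_predictor_summary(y, x1, x2):
    """Standardized betas, R^2 and VIF for an OLS regression of y on two predictors."""
    ry1, ry2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    tolerance = 1 - r12 ** 2          # zero would mean perfect collinearity
    beta1 = (ry1 - ry2 * r12) / tolerance
    beta2 = (ry2 - ry1 * r12) / tolerance
    r_squared = beta1 * ry1 + beta2 * ry2
    vif = 1 / tolerance               # the same for both predictors when there are only two
    return beta1, beta2, r_squared, vif


# Hypothetical data: y is an exact linear function of x1 and x2, so R^2 should be 1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]
b1, b2, r2, vif = two_predictor_summary(y, x1, x2)
print(f"beta1={b1:.3f} beta2={b2:.3f} R2={r2:.3f} VIF={vif:.2f}")
```

A β coefficient describes a predictor’s association with the response while the other predictors in the model are held constant. It can therefore differ in size, and even in sign, from the simple correlation r between the same two traits.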

The multiple regression analysis showed that fruit weight is closely linked to key reproductive traits such as seed number and seed weight in W. coagulans (Table 4). Fruit weight was a strong predictor of both the number of seeds per fruit and seed weight per fruit, indicating that larger fruits tend to contain more seeds and a greater seed mass. Conversely, in the model for fruit weight, seed weight per fruit had the largest standardized coefficient (β = 0.66, p < 0.01). This supports the notion that fruit size and mass are closely tied to seed development in W. coagulans. It also highlights the importance of fruit size as a determinant of reproductive success in this species.

Table 4 The characteristics associated with fruit- and seed-related traits in studied Withania coagulans accessions as revealed using MRA and coefficients.

In addition to seed weight per fruit, upper leaf length (β = 0.21, p < 0.01) and fruit length (β = 0.21, p = 0.01) also had positive coefficients in the fruit-weight model. This suggests that larger leaves and longer fruits may contribute to overall plant productivity and fruit mass. Seed thickness (β = 0.25, p < 0.01) likewise had a positive effect, although weaker than that of seed weight per fruit, further emphasizing the role of seed development in fruit weight.

For the number of seeds per fruit, the strongest positive predictor was fruit weight (β = 0.46, p < 0.01), followed by fruit length (β = 0.31, p = 0.02). Interestingly, fruit skin thickness (β = −0.34, p < 0.01) and upper leaf petiole length (β = −0.25, p = 0.02) had negative coefficients, suggesting that these traits may limit seed production. These negative relationships could point to a resource allocation effect, in which thicker fruit skins and longer petioles divert resources away from seed production.

For seed weight per fruit, the most influential predictor was fruit weight (β = 0.79, p < 0.01), indicating a very strong positive relationship. Larger fruits tend to contain a greater seed mass, a critical factor in the reproductive success of W. coagulans. Fruit pedicel length (β = 0.22, p < 0.01) also had a positive effect, suggesting that longer pedicels may facilitate nutrient transport to the seeds. In contrast, seed thickness (β = −0.18, p = 0.01) and upper leaf petiole length (β = −0.15, p = 0.03) had small negative coefficients, indicating a minor inhibitory effect on seed weight per fruit. The values shown in bold in Table 4 are also supported by the correlation matrix analysis.

Overall, the results underscore the central role of fruit weight in shaping the reproductive traits of W. coagulans. Fruit size, together with seed development traits, appears critical in determining seed production. The negative coefficients for fruit skin thickness and upper leaf petiole length point to a potential role of resource allocation. These results provide insights into the reproductive strategies of W. coagulans and highlight the significance of fruit size and morphology for reproductive success.

Because no previous study has applied MRA to similar variables in this species or genus, these findings have been interpreted independently and discussed in relation to one another within the scope of this study.
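The standardized coefficients (β) reported above express the change in the response, in standard deviations, per one-standard-deviation change in a predictor while the other predictors in the model are held constant. The minimal sketch below illustrates this for the simplest case of two predictors, using the closed-form solution of the normal equations on the correlation scale. The function names and toy data are hypothetical; the MRA in this study involved more predictors and was presumably fitted in statistical software. The example also shows why a predictor can correlate positively with the response yet receive a negative β once a correlated predictor is included.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def standardized_betas(x1, x2, y):
    """Standardized regression coefficients of y on two predictors.

    On the correlation scale the normal equations are R_xx @ beta = r_xy;
    for two predictors this 2x2 system has the closed-form solution below.
    """
    r12 = pearson_r(x1, x2)
    r1y = pearson_r(x1, y)
    r2y = pearson_r(x2, y)
    denom = 1.0 - r12 ** 2  # zero only if x1 and x2 are perfectly collinear
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    return beta1, beta2


# Hypothetical toy data: y = 2*x1 - x2 exactly, with x1 and x2 correlated (r = 0.8)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 * a - b for a, b in zip(x1, x2)]

b1, b2 = standardized_betas(x1, x2, y)
print(f"r(x2, y) = {pearson_r(x2, y):.3f}, beta2 = {b2:.3f}")
```

In this toy case x2 is positively correlated with y (r ≈ 0.45) but has a negative β (≈ −0.75), because its positive association with y is largely carried by the correlated predictor x1. A negative β is therefore a partial effect and need not imply a negative simple correlation.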

Principal component analysis (PCA)

Principal component analysis (PCA) is a technique aimed at uncovering the underlying structure of multivariate data. This method allows for the reduction of high-dimensional datasets to lower dimensions while preserving the most meaningful information, thus facilitating the visualization and interpretation of the data. PCA is particularly useful when dealing with datasets that include both quantitative and qualitative variables, as it helps identify relationships between variables and reduces the complexity of the data46.

In the analysis, the variables were standardized so that differences in measurement units would not bias the comparison, and Varimax rotation with Kaiser normalization was applied47. Kaiser normalization divides each variable's row of loadings by the square root of its communality before rotation and rescales it afterwards, so that all variables contribute equally to the rotation. Varimax rotation enhances the interpretability of the principal components by maximizing the variance of the squared loadings of each factor across variables48. This clarifies the factor structure, so that each factor tends to represent a distinct underlying dimension, which is crucial for meaningful interpretation49, and allows clearer differentiation between components50. Only principal components with an eigenvalue exceeding 1.00 were retained: because each standardized variable has a variance of 1, such a component explains more variance than a single standardized variable and thus contributes meaningfully to the analysis51. The rotation converged after 12 iterations, indicating that a stable rotated solution was obtained (Table 5).

Table 5 Eigenvalues of the principal component axes from the PCA of pomological and morphological characters in the studied Withania coagulans accessions.

The first 12 principal components account for 80.23% of the total variation, indicating that together they capture most of the variability in the dataset. Significance testing showed that PC1, PC2, and PC3 were statistically significant (p < 0.01), highlighting their importance in explaining the structure of the data.

PC1 represents 11.44% of the total variation and is predominantly supported by variables related to fruit characteristics, such as seed weight per fruit (0.86), fruit weight (0.85), fruit pedicel diameter (0.77), fruit width (0.69), fruit length (0.67), number of seeds per fruit (0.67), and fruit pedicel length (0.55). These results suggest that the primary factors influencing the variability in fruit-related traits are captured in this component. The high factor loadings for these variables indicate a strong relationship between them and PC1, emphasizing their collective contribution to fruit morphology and weight.

PC2 accounts for 10.64% of the total variation, with a notable contribution from variables related to leaf morphology, including basal leaf length (0.86), upper leaf petiole width (0.81), upper leaf length (0.78), basal leaf width (0.75), upper leaf width (0.75), and upper leaf petiole width (0.66). This component reflects the variability in leaf structure, suggesting that these leaf-related traits play a significant role in the overall variation within the dataset. The positive loadings for these variables indicate that they tend to vary in the same direction, contributing collectively to the variance captured by PC2.

PC3 explains 9.55% of the total variation and is largely driven by variables related to leaf apex and shape, including upper leaf apex shape (0.90), basal leaf apex shape (0.75), basal leaf shape (− 0.81), and upper leaf shape (− 0.86). The negative loadings for basal leaf shape and upper leaf shape suggest an inverse relationship between these variables and the others in PC3. This component captures variation in the shape and apex characteristics of the leaves, providing insights into how these specific morphological traits contribute to the overall variation in the dataset.

Together, the first three principal components explain 31.63% of the total variation, highlighting the combined influence of fruit and leaf morphological traits on the overall variation within the dataset. These findings emphasize the importance of considering both fruit and leaf characteristics when assessing the genetic and phenotypic diversity of the study material. The results also suggest that fruit-related traits, particularly those concerning weight and size, are strongly interrelated, whereas leaf-related traits make a separate but significant contribution to the overall variation. This analysis provides insight into the key factors underlying variability in the examined accessions and can inform future breeding or cultivation approaches aimed at improving these traits.
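Assuming the PCA was run on the correlation matrix of p standardized traits, the percentages above follow directly from the loadings. With ℓ_ij denoting the loading of trait i on component j (before rotation, the sum of squared loadings of a component equals its eigenvalue λ_j), the share of a component and the cumulative share of the first k components are:

$$
\%\,\mathrm{Var}(\mathrm{PC}_j) = \frac{100}{p}\sum_{i=1}^{p} \ell_{ij}^{2},
\qquad
\mathrm{Cum}_k = \sum_{j=1}^{k} \%\,\mathrm{Var}(\mathrm{PC}_j)
$$

For the first three rotated components reported here:

$$
\mathrm{Cum}_3 = 11.44 + 10.64 + 9.55 = 31.63\%
$$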

Biplot analysis is a technique commonly used to visualize multidimensional datasets. Derived from PCA, it displays the first two principal components (typically PC1 and PC2) on the same plot, providing an overview of the data structure52. The biplot represents both observations (e.g., accessions, genotypes) and variables (e.g., pomological and morphological traits) in a two-dimensional plane. This analysis is particularly useful for quantitative data, as it facilitates the visualization of relationships and patterns among variables53,54. It also enables a quick assessment of similarities and differences within the dataset55.

Fig. 4 shows the biplot of the studied W. coagulans accessions and quantitative variables based on PC1/PC2 of pomological and morphological traits. In this biplot, the first two components (PC1 = 29.88% and PC2 = 15.18%) together account for 45.06% of the total variation.
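As a rough illustration of how the points and arrows of a PC1/PC2 biplot can be obtained, the sketch below computes them from a singular value decomposition of standardized data. It assumes a correlation-scaled biplot (accession scores as points, trait-component correlations as arrows); the software and scaling actually used for Fig. 4 are not specified here, and `biplot_coordinates` and the simulated data are hypothetical.

```python
import numpy as np


def biplot_coordinates(X):
    """Return PC1/PC2 scores, trait vectors, and % variance for a biplot.

    X: (n_accessions, n_traits) numeric array of quantitative traits.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize traits
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    pct = 100 * s ** 2 / np.sum(s ** 2)                 # % variance per PC
    scores = U[:, :2] * s[:2]                           # accession points
    # Correlation of each trait with PC1 and PC2 (arrow coordinates)
    vectors = Vt[:2].T * s[:2] / np.sqrt(n - 1)
    return scores, vectors, pct[:2]


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))     # hypothetical: 50 accessions x 5 traits
X[:, 1] += X[:, 0]               # make two traits correlated
scores, vectors, pct2 = biplot_coordinates(X)
print(np.round(pct2, 2), round(float(pct2.sum()), 2))
```

Traits whose arrows point in similar directions are positively correlated, and accessions lying in the direction of an arrow tend to have high values for that trait. The sum of the two percentages is the share of variation displayed in the plot, analogous to the 45.06% reported for Fig. 4.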

Fig. 4

Biplot for the studied Withania coagulans accessions and quantitative variables based on PC1/PC2 of pomological and morphological traits. For abbreviations, please see Table 2.

The accessions were distributed across four distinct regions of the biplot, while the quantitative variables fell into three groups. Cluster 1 included accessions such as ‘Hit-7’, ‘Kahiri-5’, ‘Kahiri-6’, ‘Kahiri-7’, ‘Kahiri-11’, ‘Malekabad-1’, ‘Malekabad-4’, ‘Malekabad-5’, ‘Veshder-1’, ‘Veshder-2’, ‘Veshder-4’, ‘Veshder-6’, ‘Veshder-7’, and ‘Veshder-9’, together with the variable plant height. This cluster shows a clear association between these accessions and plant height, suggesting that they exhibit similar growth patterns. Their proximity in the biplot indicates that they share certain phenotypic traits, possibly related to height, that distinguish them from accessions in other clusters.

Cluster 2 was characterized by accessions such as ‘Hit-1’, ‘Hit-2’, ‘Hit-3’, ‘Hit-4’, ‘Hit-5’, ‘Hit-6’, ‘Hit-8’, ‘Kahiri-10’, ‘Malekabad-8’, ‘Malekabad-9’, ‘Malekabad-10’, ‘Malekabad-11’, ‘Malekabad-13’, ‘Veshder-8’, and ‘Veshder-10’, associated with variables like basal leaf length, basal leaf width, basal leaf petiole length, upper leaf petiole width, upper leaf length, upper leaf width, upper leaf petiole length, upper leaf petiole width, and seed width. The grouping of these accessions indicates a shared set of leaf and seed traits, suggesting that these traits may be a defining factor for these accessions. The clustering of multiple leaf-related variables further emphasizes the importance of leaf morphology in characterizing this group.

Cluster 3 consisted of accessions such as ‘Kahiri-1’, ‘Kahiri-4’, ‘Kahiri-9’, ‘Malekabad-2’, ‘Malekabad-3’, ‘Malekabad-6’, ‘Malekabad-7’, ‘Malekabad-12’, ‘Rachedr-4’, ‘Rachedr-5’, ‘Rachedr-9’, ‘Rachedr-10’, ‘Veshder-3’, and ‘Veshder-5’. These accessions do not appear to be characterized by the leaf-, seed-, and fruit-related variables that define clusters 2 and 4. The lack of a dominant variable associated with them suggests a more heterogeneous set of traits, and further investigation into their specific characteristics may reveal the factors underlying their grouping.

Finally, cluster 4 included accessions such as ‘Kahiri-2’, ‘Kahiri-3’, ‘Kahiri-8’, ‘Malekabad-14’, ‘Malekabad-15’, ‘Rachedr-1’, ‘Rachedr-2’, ‘Rachedr-3’, ‘Rachedr-6’, ‘Rachedr-7’, ‘Rachedr-8’, and ‘Rachedr-11’, and was associated with fruit-related variables, including fruit length, fruit width, fruit pedicel length, fruit pedicel diameter, fruit weight, fruit skin thickness, number of seeds per fruit, seed weight per fruit, seed length, and seed thickness. The grouping of these accessions suggests that fruit-related traits, such as size, weight, and seed characteristics, are particularly important for distinguishing these accessions. These traits may represent significant phenotypic differences that are not as prominent in the other clusters.

Additionally, it is noteworthy that the accessions ‘Kahiri-5’, ‘Kahiri-7’, ‘Kahiri-8’, ‘Kahiri-11’, ‘Malekabad-1’, ‘Malekabad-2’, ‘Malekabad-5’, ‘Rachedr-4’, ‘Rachedr-6’, and ‘Rachedr-7’ fell outside the 95% confidence ellipse, indicating that these accessions exhibit variability that is not fully captured by the common groupings of the biplot. This suggests that these accessions may represent unique genetic backgrounds or phenotypic variations that are distinct from the majority of the studied accessions56. This divergence warrants further investigation to understand the underlying factors contributing to their exceptional positioning in the biplot, which could be useful for future breeding programs or trait optimization efforts57,58.
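One common way to decide whether an accession lies outside a 95% ellipse is to compare its squared Mahalanobis distance in the PC1/PC2 plane with the chi-square quantile for two degrees of freedom. The sketch below assumes this normal-theory ellipse; plotting software may instead draw ellipses based on Hotelling's T² or other scalings, and which one was used for Fig. 4 is not stated. The function name and the toy scores are hypothetical.

```python
import math


def outside_ellipse(scores, level=0.95):
    """Flag 2-D points lying outside a normal-theory confidence ellipse.

    scores: list of (pc1, pc2) tuples.
    The chi-square quantile with 2 degrees of freedom has the closed form
    -2 * ln(1 - level), about 5.991 for level = 0.95.
    """
    n = len(scores)
    m1 = sum(s[0] for s in scores) / n
    m2 = sum(s[1] for s in scores) / n
    # Sample covariance matrix [[s11, s12], [s12, s22]]
    s11 = sum((s[0] - m1) ** 2 for s in scores) / (n - 1)
    s22 = sum((s[1] - m2) ** 2 for s in scores) / (n - 1)
    s12 = sum((s[0] - m1) * (s[1] - m2) for s in scores) / (n - 1)
    det = s11 * s22 - s12 ** 2
    crit = -2.0 * math.log(1.0 - level)
    flags = []
    for x, y in scores:
        dx, dy = x - m1, y - m2
        # Squared Mahalanobis distance using the explicit 2x2 inverse
        d2 = (s22 * dx * dx - 2 * s12 * dx * dy + s11 * dy * dy) / det
        flags.append(d2 > crit)
    return flags


# Hypothetical PC1/PC2 scores: a 3x3 grid of points plus one distant point
pts = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)] + [(10, 10)]
flags = outside_ellipse(pts)
print([i for i, f in enumerate(flags) if f])  # [9]
```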

In summary, the four clusters provide significant insights into the variation among accessions, emphasizing the importance of plant height, leaf characteristics, and fruit-related traits in shaping the diversity of the studied accessions. The clear separation of clusters based on these key traits suggests that these morphological characteristics can serve as valuable indicators for future breeding programs aimed at optimizing specific traits. Cluster 1 and cluster 4 stand out due to their strong associations with plant height and fruit traits, respectively, while cluster 2’s linkage to leaf and seed traits underscores the role of these factors in genetic differentiation. Cluster 3 represents a more diverse group of accessions, warranting further investigation to identify the specific traits contributing to this heterogeneity.

These findings highlight the strong association between specific accessions and certain phenotypic traits, revealing patterns that can guide future research on genetic diversity and the selection of desirable traits in breeding programs. Moreover, the deviation of certain accessions from the 95% confidence ellipse underscores the need for a more in-depth analysis of their unique genetic and phenotypic characteristics.

Heat map analysis (HMA)

In this study, heat map analysis (HMA) was performed on the quantitative data using Ward’s method and Euclidean distance. Ward’s method aims to create more balanced and compact groups by minimizing within-cluster variance59: at each step, the two clusters whose merger produces the smallest increase in the total within-cluster sum of squares are merged. Euclidean distance, the straight-line distance between two data points, was chosen as the distance metric60. Because Euclidean distance is suitable only for quantitative variables and is sensitive to differences in scale, all data were standardized before analysis to prevent variable scales from affecting the clustering results61. The dendrogram obtained from the heat map analysis was used to determine the optimal number of clusters and to visualize the relationships among them. To enhance interpretability, an appropriate color scale was selected, and the distribution of the data was examined in detail.
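The clustering procedure described above (standardization, Euclidean distance, Ward's criterion) can be sketched in plain Python. This is a naive illustration for small datasets that stops at a chosen number of clusters rather than building the full dendrogram; the function names and toy data are hypothetical, and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='ward'` would normally be used.

```python
import math


def zscore_columns(rows):
    """Standardize each column (trait) to mean 0 and sample SD 1."""
    n, p = len(rows), len(rows[0])
    out = [list(r) for r in rows]
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            out[i][j] = (rows[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def ward_cluster(points, n_clusters):
    """Agglomerative clustering with Ward's criterion on Euclidean distance.

    Each step merges the pair of clusters whose union gives the smallest
    increase in total within-cluster sum of squares:
        delta = n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                delta = na * nb / (na + nb) * d2
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # Size-weighted centroid of the merged cluster
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return sorted(clusters)


# Hypothetical data: 6 accessions x 3 traits forming two obvious groups
rows = [
    [1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [0.9, 1.9, 0.6],
    [5.0, 6.0, 3.0], [5.2, 6.1, 3.1], [4.9, 5.8, 2.9],
]
z = zscore_columns(rows)
print(ward_cluster(z, 2))  # [[0, 1, 2], [3, 4, 5]]
```

Clustering the traits instead of the accessions, as in the variable dendrogram of a heat map, would amount to applying the same function to the transposed standardized matrix.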

The hierarchical clustering analysis of W. coagulans accessions and variables, based on pomological and morphological characterizations, demonstrated a structured differentiation pattern, as visualized in Fig. 5. The clustering process initially divided the variables into two major groups, A and B, which were further split into subgroups A1, A2, B1, and B2.

Fig. 5

Visualization of clustering patterns of Withania coagulans accessions and variables based on pomological and morphological characterizations using a heat map. For abbreviations, please see Table 2.

In subgroup A1, the variables plant height, basal leaf length, upper leaf length, and number of seeds per fruit were clustered, indicating a strong association with vegetative growth and reproductive potential. Subgroup A2, consisting of basal leaf width, upper leaf width, basal leaf petiole length, fruit length, upper leaf petiole length, and fruit pedicel length, suggests a grouping of traits related to leaf and fruit dimensions. Conversely, subgroup B1 contained upper leaf petiole width, seed length, seed width, fruit pedicel diameter, and seed thickness, suggesting a focus on seed morphology and associated reproductive traits. Finally, subgroup B2, which included fruit weight, seed weight per fruit, and fruit skin thickness, highlights characteristics that primarily impact fruit quality and yield.

Similarly, accessions were first classified into two primary groups, C and D, and then further into subgroups C1, C2, D1, and D2. Subgroup C1 contained accessions ‘Hit-8’, ‘Hit-6’, ‘Veshder-8’, ‘Hit-4’, ‘Veshder-5’, ‘Veshder-10’, ‘Hit-5’, ‘Hit-7’, ‘Malekabad-13’, ‘Malekabad-11’, ‘Hit-3’, ‘Hit-2’, ‘Malekabad-10’, ‘Malekabad-9’, and ‘Malekabad-8’, suggesting that these accessions share similar morphological characteristics, potentially due to genetic proximity or environmental adaptation. Subgroup C2 comprised accessions ‘Kahiri-8’, ‘Kahiri-2’, ‘Malekabad-15’, ‘Malekabad-7’, ‘Malekabad-6’, ‘Rachedr-6’, ‘Rachedr-1’, ‘Rachedr-8’, ‘Rachedr-3’, ‘Rachedr-7’, ‘Rachedr-2’, ‘Malekabad-14’, ‘Malekabad-12’, and ‘Malekabad-3’, representing another phenotypically distinct cluster with shared pomological features.

Subgroup D1 included accessions ‘Veshder-9’, ‘Veshder-2’, ‘Kahiri-6’, ‘Veshder-7’, ‘Veshder-1’, ‘Kahiri-9’, ‘Veshder-4’, ‘Veshder-3’, ‘Kahiri-4’, ‘Veshder-6’, ‘Hit-1’, ‘Kahiri-10’, and ‘Kahiri-3’, while subgroup D2 contained ‘Rachedr-11’, ‘Rachedr-10’, ‘Rachedr-5’, ‘Rachedr-4’, ‘Kahiri-7’, ‘Kahiri-1’, ‘Rachedr-9’, ‘Malekabad-2’, ‘Kahiri-11’, ‘Kahiri-5’, ‘Malekabad-4’, ‘Malekabad-5’, and ‘Malekabad-1’. These subgroups illustrate additional differentiation among accessions, likely driven by genetic variation or environmental influences.

The observed clustering patterns provide valuable insights into the phenotypic diversity of W. coagulans accessions. The distinct grouping of accessions suggests the presence of underlying genetic divergence or differential adaptation mechanisms. The strong association of specific pomological and morphological traits within each cluster presents a valuable framework for selection criteria in breeding programs aimed at enhancing fruit quality, yield, or adaptability.

Although the clustering was primarily based on morphological similarity, a partial geographic consistency was observed. In particular, several accessions from the Malekabad, Kahiri, and Rachedr regions were grouped within the same clusters (especially C2 and D2), suggesting that regional environmental conditions or shared genetic backgrounds may influence morphological expression. However, not all clusters showed a complete match with collection locations, indicating that phenotypic structure is shaped by both genetic diversity and environmental heterogeneity.

This pattern suggests the presence of genotype × environment interactions, where phenotypic traits are differentially expressed in response to local agroecological conditions. The convergence of accessions from similar regions within distinct clusters may reflect site-specific selective pressures such as altitude, soil texture, or moisture availability. Moreover, the partial mismatch between geography and clustering outcome also implies a possible gene flow between populations or historical seed exchange across regions. Such complexities reinforce the need for integrating molecular markers in future studies to disentangle the relative contributions of genetic and environmental components to observed phenotypic variation.

Moreover, the clear separation of fruit-related traits (B2 subgroup) from vegetative traits (A1 and A2 subgroups) implies potentially distinct genetic controls governing plant architecture and fruiting characteristics. This differentiation underscores the necessity of further genomic and physiological investigations to unravel the regulatory mechanisms behind these trait clusters.

Overall, the findings emphasize the effectiveness of clustering analysis in distinguishing accessions based on their morphological and pomological attributes. These results reinforce the importance of multivariate approaches in germplasm characterization and selection strategies for W. coagulans improvement, providing a robust foundation for future breeding and conservation efforts.

Because no previous study has applied HMA to similar variables in this species or genus, the findings of this study have been interpreted independently and discussed in relation to one another.

Conclusions

This study presents a comprehensive evaluation of the pomological and morphological diversity among W. coagulans accessions using multivariate statistical approaches. Significant variation in plant traits was observed, reflecting genetic diversity and adaptive potential. The integration of HMA, PCA, and CMA revealed clear trait associations and accession groupings, highlighting phenotypic differentiation within the germplasm.

HMA successfully classified accessions based on leaf, fruit, and seed traits, while PCA emphasized the contribution of fruit and seed parameters to overall phenotypic variation. Strong correlations among fruit weight, seed weight, and reproductive traits suggest their importance in selection strategies. Notably, traits such as pedicel length, leaf size, and plant architecture emerged as promising markers for breeding and conservation.

Several accessions exhibited superior characteristics. ‘Kahiri-8’ and ‘Malekabad-15’ showed enhanced fruit size and weight; ‘Hit-8’ and ‘Veshder-8’ exhibited high vegetative growth; while ‘Rachedr-11’ and ‘Kahiri-7’ had high seed output. ‘Malekabad-6’ and ‘Rachedr-6’ combined favorable fruit and seed traits, indicating potential for multi-purpose use.

Given the scarcity of prior studies on W. coagulans, our findings fill a crucial gap and establish a foundation for future molecular and agronomic research. The results underline the value of multivariate tools for germplasm characterization and support the potential of this neglected medicinal plant in breeding and conservation programs.