Introduction

Barberry, particularly the seedless cultivar ‘Zereshk Bidaneh’, is a distinctive fruit crop cultivated across arid and semi-arid regions, notably in South Khorasan, Iran, where cultivation spans over 20,000 hectares. Its ability to thrive in low-resource, saline soils is a significant advantage3. Iran hosts five wild barberry species: Berberis vulgaris, B. orthobotrys, B. khorasanica, B. crataegina, and B. integerrima, each valued for its fruit qualities and commonly used in local juices and as medicinal herbs3,11.

Research conducted by Rezaei and Balandary37 highlighted notable genetic diversity within 42 Iranian wild barberry genotypes in a germplasm collection. This diversity pertains to various quantitative and qualitative fruit traits, including color, size, shape, taste, seed count per berry, and cluster size. Among these, superior genotypes coded R2N1, R4N1 (from B. integerrima), and R5N1 (from B. crataegina) were recognized for their large and heavy fruits, as well as high vitamin C and anthocyanin content. A breeding project that involved reciprocal crosses among these genotypes and the cultivated seedless cultivar ‘Zereshk Bidaneh’ has produced 90 promising hybrids, which warrant further investigation37,41,42. While previous studies have concentrated on morphological diversity among hybrids, the phenological and biochemical characteristics essential for breeding initiatives have not been thoroughly explored42. The current research aims to assess the phenological and biochemical diversity of these F1 hybrid progeny.

Barberry berries are rich in bioactive compounds, including phenolics, anthocyanins, flavonoids, carotenoids, alkaloids, and vitamin C (VC)14. Their total antioxidant capacity has been found to exceed that of common fruits like apples, oranges, and blueberries39, highlighting barberry’s potential as a natural source for industrial applications. Barberry berries exhibit a color spectrum ranging from red to dark blue and blackish-purple, which influences their potential use as natural colorants38. Dark-colored berries are particularly noted for their greater color stability32. The red and black hues of barberry berries are utilized as natural food colorants, particularly in rice dishes3. However, there is a lack of published information regarding the phenological characteristics of barberries, including flowering time, duration of flowering, and fruit ripening period. The present study is the first to examine these characteristics in the hybrid genotypes.

In plant breeding, genetic diversity and selection are crucial, particularly for barberry species, given their extensive diversity and applications. Numerous studies have explored morphological diversity across different regions, including investigations on B. asiatica, B. lycium, and B. aristata in West Asia, India, and Pakistan2,7,49. Within Iran, research has focused on B. integerrima and B. crataegina18,38,47, while studies in Türkiye and Kyrgyzstan have included B. vulgaris, B. integerrima, and B. crataegina34,55. Extensive evaluations have been conducted on antioxidant compounds and alkaloid levels across various barberry species, revealing significant variations in phenolic and anthocyanin content4,16,25,36,45. Barberry genotypes from various locations exhibit differences in fruit quality and quantity, likely due to the interplay of growing conditions, genetic backgrounds, climates, geography, cultural practices, and maturation stages7. Consequently, evaluating germplasm under uniform environmental conditions is vital for accurate assessments.

Therefore, the present study was conducted to comprehensively evaluate the phenological and biochemical diversity of 90 F1 hybrid barberry progenies derived from crosses between ‘Zereshk Bidaneh’ and selected wild genotypes. By examining traits such as flowering time, fruit maturation, total anthocyanin content, phenolics, vitamin C, antioxidant activity, and flavor indices, the study aims to identify superior genotypes with potential for breeding programs and industrial applications.

Materials and methods

Plant materials

A total of 90 F1 barberry genotypes (detailed in Table 1), together with the seedless cultivar ‘Zereshk Bidaneh’, were selected for this study in 2023 from the collection housed at the Faculty of Agriculture, Shahrood University of Technology, Shahrood, Iran. Shahrood lies at a latitude of 36°25’ N and a longitude of 54°58’ E, at an elevation of 1380 m above sea level, and has a cold and arid climate. The selected genotypes were the outcome of controlled reciprocal crosses between the seedless cultivar ‘Zereshk Bidaneh’ and three wild genotypes: R2N1 and R4N1 from B. integerrima, and R5N1 from B. crataegina.

The formal identification of the specimens was performed by Dr. Mehdi Rezaei. A herbarium voucher specimen (deposit number BB-2313) has been deposited in the publicly available herbarium of the Faculty of Agriculture, Shahrood University of Technology, Iran.

The seedlings were cultivated with a spacing of 1.5 × 2 m, and they underwent standardized horticultural practices, including nutritional and irrigation protocols. The seedlings entered their initial fruiting phase in the year 2018. Each genotype was assigned a unique code reflecting the crossbreeding of the respective parental genotypes, detailed in Table 1.

Table 1 Progeny codes and parental crosses in the germplasm collection of barberry (Berberis spp.) at the Faculty of Agriculture, Shahrood University of Technology, Iran.

Phenological measurements

The phenological traits of the barberry genotypes were evaluated through daily visual observations recorded on three different branches of each genotype. Four stages were recorded: “Start Flowering”, the date on which one-quarter of the flowers had opened; “Full Bloom”, the date on which three-quarters of the flowers had fully opened; “Petal Fall”, the date on which the petals had been shed and the young fruits became visible; and “Ripen & Harvest”, the date of fruit maturity, which often coincides with cooler autumn temperatures and with the sweetness preferred at harvest.

Biochemical measurements

Sensory properties

Total soluble solids (TSS) in fruit juice were measured at room temperature using a handheld refractometer (Atago, Japan) and expressed in °Brix. The pH of the undiluted fruit juice was measured using a digital pH meter (Mettler Toledo, Switzerland)40. Titratable acidity (TA) was determined by titrating a diluted sample of fruit juice with 0.1 N sodium hydroxide (NaOH) solution, using phenolphthalein as an indicator. The titration continued until the solution reached a pH of 8.1, at which point the volume of NaOH consumed was recorded. The acid content was then expressed as a percentage of citric acid in the fruit juice sample40. The fruit flavor index (FI) was calculated as the ratio of TSS to TA.
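The two calculations described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' worksheet: the function names and the example titration volumes are hypothetical, and the factor 0.064 g per milliequivalent is the usual value for anhydrous citric acid, which the text does not state explicitly.

```python
# Illustrative sketch: names and example values are hypothetical.

# Assumption: acidity is expressed as anhydrous citric acid
# (192.12 g/mol, three acidic protons -> about 0.064 g per milliequivalent).
CITRIC_ACID_MEQ = 0.064


def titratable_acidity_percent(naoh_ml, naoh_normality, sample_ml,
                               meq_weight=CITRIC_ACID_MEQ):
    """Titratable acidity as % (g of acid per 100 mL of juice)."""
    if sample_ml <= 0:
        raise ValueError("sample_ml must be positive")
    return naoh_ml * naoh_normality * meq_weight * 100.0 / sample_ml


def flavor_index(tss_brix, ta_percent):
    """Flavor index as the ratio of TSS to TA."""
    if ta_percent <= 0:
        raise ValueError("ta_percent must be positive")
    return tss_brix / ta_percent


# Hypothetical titration: 1 mL of juice (volume before dilution)
# consumed 6.4 mL of 0.1 N NaOH.
ta = titratable_acidity_percent(naoh_ml=6.4, naoh_normality=0.1, sample_ml=1.0)

# Hypothetical refractometer reading of 31.5 degrees Brix.
fi = flavor_index(tss_brix=31.5, ta_percent=ta)

print(f"TA = {ta:.2f} %, FI = {fi:.2f}")
```

Because the index is a ratio, a low-acid genotype can reach a high FI even at a moderate TSS, which is why TA and FI are best read together.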

Vitamin C (VC)

To determine the vitamin C content, 1 mL of fruit juice was accurately homogenized with 2 mL of a solution containing 16% metaphosphoric acid and 0.18% EDTA. The mixture was then centrifuged at 10,000 rpm for 5 min using a tabletop centrifuge (Elektromag M4800, İstanbul, Türkiye) to obtain the clear supernatant. Subsequently, 400 µL of the supernatant was combined with 200 µL of a 3% metaphosphoric acid solution and 200 µL of the Folin–Ciocalteu reagent. To reach a final reaction volume of 2 mL, 1,200 µL of distilled water was added. The mixture was then allowed to stand for 10 min at room temperature to ensure full reaction development. The absorbance of the resulting solution was measured at 760 nm using a UV-visible spectrophotometer (UNICO, UV/Vis 2100). The vitamin C concentration in the sample was calculated using the following formula.

$$\mathrm{VC\ (mg/100\ mL)} = \frac{\mathrm{Abs_{sample}}}{\mathrm{Abs_{standard}}} \times \mathrm{Standard\ concentration} \times \mathrm{Dilution\ factor}$$

Total anthocyanin content (TAC)

The total anthocyanin content (TAC) of barberry juice samples was determined according to the pH differential method described by Lako et al.27. This technique is based on spectrophotometric measurements using two buffer systems: a 0.25 M potassium chloride (KCl) buffer at pH 1.0 and a 0.4 M sodium acetate (C₂H₃NaO₂) buffer at pH 4.5. In this procedure, 400 µL of fruit juice was thoroughly mixed with 3.6 mL of each buffer solution. The absorbance of the resulting mixtures was recorded at 510 nm and 700 nm using a spectrophotometer. The total anthocyanin content was then calculated as milligrams of cyanidin 3-glucoside equivalent per liter (mg/L) using the following equation56:

$$\mathrm{Total\ anthocyanin\ content\ (mg/L)} = \frac{A \times MW \times DF \times 10^{4}}{\varepsilon \times L}$$

Where: A = (Abs510 − Abs700) at pH 1.0 − (Abs510 − Abs700) at pH 4.5; MW = molecular weight of cyanidin 3-glucoside (484.84 g/mol); DF = dilution factor (final volume/initial volume); ε = molar absorptivity of cyanidin 3-glucoside (29,600 L/(mol·cm)); L = path length of the cuvette (1 cm).
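As a worked illustration of the pH differential calculation, the sketch below applies the equation above to one set of absorbance readings. The readings are invented for the example and the function name is hypothetical. The scale factor is read from the printed equation as 10^4, which is an assumption about the typeset exponent; it is exposed as a parameter so that another factor (for instance the 10^3 used in the widely cited form of the method) can be substituted.

```python
# Illustrative sketch: absorbance readings and names are hypothetical.

MW_C3G = 484.84        # g/mol, molecular weight given in the text
EPSILON_C3G = 29600.0  # L/(mol*cm), molar absorptivity given in the text
SCALE = 1e4            # assumption: the printed factor is read as 10^4


def anthocyanin_mg_per_l(a510_ph1, a700_ph1, a510_ph45, a700_ph45,
                         dilution_factor, path_cm=1.0, scale=SCALE):
    """Total anthocyanin content as cyanidin 3-glucoside equivalents."""
    # Haze-corrected absorbance difference between the two buffers.
    a = (a510_ph1 - a700_ph1) - (a510_ph45 - a700_ph45)
    return a * MW_C3G * dilution_factor * scale / (EPSILON_C3G * path_cm)


# 400 uL of juice mixed with 3.6 mL of buffer, as described in the text.
dilution_factor = (0.4 + 3.6) / 0.4

# Hypothetical readings for one juice sample.
tac = anthocyanin_mg_per_l(a510_ph1=0.520, a700_ph1=0.020,
                           a510_ph45=0.140, a700_ph45=0.015,
                           dilution_factor=dilution_factor)

print(f"DF = {dilution_factor:.1f}, TAC = {tac:.1f} mg/L")
```

Subtracting the 700 nm reading at each pH corrects for turbidity, so only the pH-dependent color change of the anthocyanins contributes to A.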

Total phenolic content (TPC)

The total phenolic content (TPC) of the fruit juice samples was determined following the method described by Lamuela-Raventós28. Specifically, 1 mL of fruit juice was reacted with the Folin–Ciocalteu reagent, followed by the addition of sodium carbonate to make the reaction mixture alkaline. After the designated incubation period, the absorbance of the resulting blue-colored complex was measured at 760 nm using a UV-visible spectrophotometer. The absorbance values were then compared against a gallic acid calibration curve, and the results were expressed as milligrams of gallic acid equivalents per 100 g of fresh weight (mg GAE/100 g FW).

Total antioxidant activity (TAA)

The total antioxidant activity (TAA) was assessed using the DPPH (2,2-diphenyl-1-picrylhydrazyl) radical scavenging method. In this assay, 40 µL of fruit extract (obtained from the TPC measurement step) was mixed with 960 µL of a 4 mM DPPH methanol solution29. The mixture was homogenized using a vortex mixer (Labtorn LVOM-A40, 200–3000 rpm, Surrey, United Kingdom) for 3 min at 300 rpm and incubated at room temperature for 15 min (Elektromag M 3025 BP, İstanbul, Türkiye). Following incubation, the absorbance was measured at 517 nm using a spectrophotometer. A DPPH solution without fruit extract served as the blank, and its absorbance was also recorded. The percentage of antioxidant activity was calculated using the following equation48.

$$\%\ \text{inhibition of DPPH} = \left[\frac{\mathrm{Abs_{blank}} - \mathrm{Abs_{sample}}}{\mathrm{Abs_{blank}}}\right] \times 100$$
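The inhibition formula can be checked with a minimal sketch. It reads the equation in its usual form, with the difference between blank and sample absorbance divided by the blank absorbance; the function name and the absorbance values are hypothetical.

```python
# Illustrative sketch: name and absorbance values are hypothetical.

def dpph_inhibition_percent(abs_blank, abs_sample):
    """Percentage of DPPH radical scavenged by the sample."""
    if abs_blank <= 0:
        raise ValueError("abs_blank must be positive")
    return (abs_blank - abs_sample) / abs_blank * 100.0


# Hypothetical readings at 517 nm: blank 0.800, sample 0.200.
inhibition = dpph_inhibition_percent(abs_blank=0.800, abs_sample=0.200)

print(f"DPPH inhibition = {inhibition:.1f} %")
```

A sample that does not scavenge the radical leaves the absorbance at the blank level and gives 0%, while complete bleaching of the DPPH solution gives 100%.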

Statistical analysis

To evaluate the biochemical traits related to the fruit, samples were collected randomly during the ripening period in October, November, and December of 2023. For each genotype, three replications were used, and each replication consisted of 20 fruits. The sampling followed a Completely Randomized Design (CRD) to ensure unbiased analysis. The average values obtained from the replications were used for the biochemical assessments to more accurately characterize each genotype. This approach minimized the influence of external factors and ensured statistically reliable results, allowing for an accurate evaluation of the biochemical variation among genotypes.

For the analysis of phenological traits, observations were conducted during the spring and autumn seasons of 2023. These evaluations were carried out under uniform environmental conditions to ensure data consistency, reliability, and reproducibility.

To determine the variation among genotypes based on the measured traits, a one-way analysis of variance (ANOVA, p < 0.05) was performed using JMP® Pro 17 software19. Relationships between the recorded traits were examined using Pearson correlation coefficients (r) and visualized through analyses conducted with Origin Pro® 2024b software31. To identify the main traits responsible for genotype differentiation, Principal Component Analysis (PCA) was employed. The interpretability of the components was enhanced by applying Varimax rotation with Kaiser Normalization. In addition, heat map analysis was performed using Ward’s method and Euclidean distance coefficients to classify both genotypes and variables. A two-dimensional biplot was generated based on the first and second principal components (PC1 and PC2) to illustrate the distribution patterns of genotypes and associated traits.
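To make the first step concrete, the sketch below computes the F statistic of a one-way ANOVA for three genotypes with three replicates each. It is a plain-Python illustration under stated assumptions, not the JMP procedure used in the study: the function name and the TSS values are hypothetical, and the p-value (which requires the F distribution) is left to a statistics package.

```python
# Illustrative sketch: name and replicate values are hypothetical.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replication")

    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Variation of the group mean around the grand mean.
        ss_between += len(g) * (mean - grand_mean) ** 2
        # Variation of replicates around their own group mean.
        ss_within += sum((x - mean) ** 2 for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical TSS readings (degrees Brix), three replicates per genotype.
tss = [[28.0, 29.0, 30.0], [29.0, 30.0, 31.0], [32.0, 33.0, 34.0]]
f_value, df_b, df_w = one_way_anova_f(tss)

print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

A large F means that differences between genotype means are large relative to the scatter among replicates of the same genotype, which is the basis for declaring genotypes significantly different.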

Results and discussion

Phenological characters

This study evaluated phenological traits, including the onset of flowering, full bloom, petal abscission, and fruit harvesting time, across a collection of 91 barberry genotypes (Fig. 1; Table 2). Significant differences were observed among these genotypes. Genotype ‘0506’ exhibited an early onset of flowering on April 15 and reached full bloom by April 20, with early petal shedding occurring on April 25. In contrast, genotype ‘0202’ flowered late, with flowering beginning on May 5 and concluding with petal abscission on May 13. This corresponds to a difference of approximately 20 days in flowering onset between the earliest and latest blooming genotypes. For comparison, the flowering onset dates for apple and pear varieties in this area, Bastam, Shahrood, Iran, during the same year ranged from approximately April 20 to April 29, while stone fruits such as apricots and almonds flowered between March 20 and March 30 (personal observation). It is noteworthy that barberry flowering largely coincides with or follows that of apple and pear and occurs well after that of stone fruits, thereby reducing the risk of spring frost damage to the flowers. Ali-Tavakoli-Kaghaz et al.5 examined the phenological stages of the species B. integerrima in various regions of Kerman Province, Iran, reporting a flowering duration ranging from 30.5 to 38.5 days, which aligns with the results of this study.

The duration of flowering varied significantly among the genotypes, lasting from seven days for genotypes ‘0104’, ‘0105’, ‘0203’, ‘0204’, ‘0205’, and ‘0605’, to an extended 20 days for genotype ‘0538’. Furthermore, genotypes ‘0202’ and ‘0405’ exhibited the earliest fruit ripening, reaching maturity on October 26, whereas the latest were genotype ‘0803’ (November 28) and genotype ‘0101’ (December 4). The difference in fruit maturation times between the earliest and latest ripening genotypes was 39 days. The documented variations in flowering and fruiting timelines among diverse barberry genotypes hold intrinsic value for prospective breeding programs15. In wild barberry in Kerman Province, Iran, the number of days until harvest maturity has been reported to range between 164 and 218 days. A notable strength of our study is the presence of various genotypes within a single collection, all under uniform climatic conditions and management practices.
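The intervals implied by the calendar dates quoted above can be checked with a short date calculation. The sketch uses the 2023 observation year stated in the methods; the helper name is hypothetical.

```python
from datetime import date


def days_between(start, end):
    """Number of days from start to end (end not counted as an extra day)."""
    return (end - start).days


# Flowering onset: earliest genotype '0506' (April 15), latest '0202' (May 5).
flowering_gap = days_between(date(2023, 4, 15), date(2023, 5, 5))

# Ripening: earliest genotypes (October 26), latest genotype '0101' (December 4).
ripening_gap = days_between(date(2023, 10, 26), date(2023, 12, 4))

print(flowering_gap, ripening_gap)  # prints: 20 39
```

Working from dates rather than day counts avoids month-length slips when comparing stages that span April to May or October to December.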

Aligning cultivation practices with identified phenological stages offers a pathway to optimize resource utilization and improve overall crop management efficiency51. Moreover, these insights contribute to a broader understanding of the ecological adaptability of barberry genotypes, illuminating their responses to environmental cues. By capitalizing on the unique flowering timelines and frost resistance of barberry, researchers and practitioners can further enhance cultivation strategies, ensuring the development of resilient and high-yielding cultivars that are well-adapted to changing climatic conditions.

Fig. 1

Phenological stages of flowering and fruit ripening in ‘Zereshk Bidaneh’ (Seedless) and 90 F1 progenies from interspecies hybrids of barberry. More details regarding genotype codes are in Table 1.

Biochemical characters

The hybridization of different barberry species has resulted in a significant degree of morphological variation (p < 0.05). This variation is particularly evident in the diverse fruit colors, shapes, and sizes exhibited by the hybrids (Fig. 2).

Fig. 2

The variation in berry color, shape, and size in some hybrids resulting from interspecific hybridization of barberry.

The evaluation of biochemical traits among the investigated barberry genotypes revealed substantial and informative variability, particularly concerning total anthocyanin content (TAC), a key compound contributing both to fruit pigmentation and potential health benefits due to its antioxidant properties. TAC levels varied significantly among genotypes, ranging from 133.36 mg/L in genotype ‘0203’ to 2063.55 mg/L in ‘0601’, with a population mean of 614.44 mg/L cyanidin 3-glucoside (Table 2). The coefficient of variation (CV) for TAC reached 67%, indicating a high degree of genetic diversity for this trait within the studied population.

As depicted in Fig. 3, the majority of genotypes exhibited moderate TAC levels (250–500 mg/L); however, four genotypes (‘0601’, ‘0606’, ‘0607’, and ‘0617’) showed exceptionally high TAC values exceeding 1250 mg/L. These high-performing genotypes originated from a common parental background (coded 06), resulting from the cross between R5N1♀ × BD♂. The superior anthocyanin accumulation observed in R5N1 had previously been documented by Rezaei et al.38, who noted its dark blue pigmentation and high phenolic concentration, thus supporting the hypothesis of strong heritability for anthocyanin biosynthesis in barberry.

Comparative analysis with prior studies further reinforces the significance of the present findings. Özgen et al.32 reported TAC values ranging from 506 to 803 mg/L cyanidin 3-glucoside in fruit juice among six B. vulgaris genotypes from Central Türkiye, substantially lower than the maximum values recorded in the current F1 population. Ardestani et al.6 observed TAC levels of 22.96–65.28 mg/100 g DW in B. vulgaris and 39.15–417.07 mg/100 g DW in B. integerrima from Iran. Similarly, Nilda et al.30 reported a TAC range of 360–874 mg/L in B. vulgaris fruit juice, while Şensu et al.43 noted concentrations of 455.91 mg/100 g DW and 366.52 mg/100 g DW in B. integerrima and B. vulgaris, respectively. Yildiz et al.54 documented lower TAC values (35.1–100.4 mg/100 g DW) in B. vulgaris genotypes from various Turkish provinces.

The wide spectrum of TAC observed in this study not only confirms the existence of considerable genotypic variation but also highlights the potential for selecting elite genotypes with enhanced anthocyanin profiles. These findings offer valuable insights for breeding programs aiming to improve both the nutritional value and visual appeal of barberry cultivars through targeted selection strategies.

VC, a vital antioxidant compound, exhibited considerable variation among the evaluated genotypes, with values ranging from 81.74 mg/100 mL in genotype ‘0505’ to 136.56 mg/100 mL in ‘0203’, and a population mean of 108.50 mg/100 mL (Table 2). As illustrated in Fig. 3, the majority of genotypes fell within the 100–120 mg/100 mL range, indicating a relatively high VC content across the F1 population. Notably, genotypes ‘0202’, ‘0305’, ‘0516’, and ‘0804’ exceeded 130 mg/100 mL, demonstrating exceptional ascorbic acid accumulation. These levels surpass those commonly reported for commercially important fruits such as oranges and strawberries35, thereby highlighting the potential nutritional value of these barberry genotypes.

TAA, reflecting the overall presence of health-promoting antioxidant compounds, ranged from 7.89% (‘0607’) to 90% (‘0203’) (Table 2). Most genotypes clustered within the 70–80% range, indicating consistently high antioxidant potential (Fig. 3). Previous studies have noted that certain Berberis species exhibit antioxidant capacities comparable to, or even exceeding, those of well-known antioxidant-rich fruits like blueberries39. This strong antioxidant activity reinforces the medicinal and functional value of barberry fruits17.

TPC, a critical parameter linked to antioxidant defense and metabolic health, displayed a wide range among genotypes, from 1224 mg GAE/100 mL (‘0308’) to 2082 mg GAE/100 mL (‘0603’), with a mean of 1770 mg GAE/100 mL (Table 2). Several genotypes, including ‘0202’, ‘0204’, ‘0404’, ‘0405’, ‘0407’, and ‘0504’, showed notably high TPC levels exceeding 1600 mg GAE/100 mL, suggesting strong phenolic accumulation and bioactive potential.

These findings are consistent with earlier chemical profiling studies. Gholizadeh-Moghadam et al.12 identified gallic acid and p-coumaric acid as dominant phenolic constituents in Berberis extracts. Özgen et al.32 reported exceptionally high TPC values (2565–3629 mg GAE/g) in B. vulgaris juice samples from Central Türkiye. In Iranian accessions, Ardestani et al.6 recorded a TPC of 47.80 g/100 g in B. integerrima, significantly greater than 27.99 g/100 g in B. vulgaris. Contrastingly, Hassanpour and Alizadeh16 found higher phenolic content in B. vulgaris (10.46–18.43 mg GAE/g DW) compared to B. integerrima (5.98–14.16 mg GAE/g DW). Similarly, Nilda et al.30 documented a wide TPC range (2281–3462 mg GAE/g) in B. vulgaris juice, while Şensu et al.43 reported that purple-fruited B. integerrima had higher phenolic concentrations (81.42 mg GAE/g DW) than red-fruited B. vulgaris (74.74 mg GAE/g DW). In Turkish populations, Gıdık13 noted TPC values ranging from 10.84 to 28.92 mg GAE/g, with B. crataegina showing the lowest levels and wild B. vulgaris the highest.

The mean TSS, an important indicator of fruit sweetness, was 29.27 °Brix, with values ranging from 14 °Brix (‘0801’) to 38 °Brix (‘0403’) (Table 2). This sugar content is high compared with other fruits, as most genotypes exhibited TSS levels exceeding 25 °Brix (Fig. 3). Notably, the genotypes ‘0103’, ‘0403’, ‘0406’, ‘0508’, ‘0512’, ‘0520’, ‘0523’, and ‘0535’ showed TSS values above 35 °Brix. In Iran, Rezaei et al.38 reported TSS values ranging from 9.5 to 26%, with a mean of 19.73% across 42 Berberis genotypes. Özgen et al.32 documented TSS levels between 6.95 and 20.85% for various B. vulgaris genotypes. In addition, Ardestani et al.6 found an average TSS of 17.33% among different Berberis genotypes, while Ahmed et al.2 reported TSS values ranging from 23.60 to 27.63% in their study on Berberis genotypes.

The pH levels ranged from 2.66 (‘0530’) to 3.76 (‘0605’), with a mean of 3.10 (Table 2). Barberry fruits are typically very sour, and high acidity is a significant limitation, although some consumers enjoy this sour taste. In most wild species, the acid content is so high that the fruit is not edible. Rezaei et al.38 reported fruit pH values between 2.44 and 3.25 in B. vulgaris and B. crataegina.

Barberry fruits are characterized by their notable acidity, which contributes to their distinctive tart flavor. The mean TA was 13.46%, with a range between 4.10% (‘0515’) and 32.65% (‘0538’) (Table 2). This broad variability is reflected in the high coefficient of variation of 41.46%, indicating significant potential for selecting diverse sourness levels within the population (Table 2). Most genotypes possess acidity levels ranging from 10 to 15%. Notably, genotypes ‘0538’, ‘0529’, and ‘0536’, derived from the parental crossing R4N1♀ × BD♂, demonstrate high TA exceeding 23%. In contrast, certain genotypes, such as ‘0407’, ‘0515’, ‘0526’, ‘0609’, ‘0618’, and ‘0803’, exhibit TA below 5%. It is important to note that acidity is influenced by factors such as harvest timing and environmental conditions51. In comparison, a typical sour lemon may have a TA of around 5.8%, with a range of 5.2–6.3%24, highlighting the higher acidity of barberry fruits. Barberries are recognized for their substantial organic acid content, predominantly malic acid, followed by citric and tartaric acids30. In wild-type barberry from Iran, total acidity has been reported to range from 2.68 to 8.44%, with an average of 5.19%38. Research on B. vulgaris genotypes from Antalya, Türkiye, indicates that malic acid content varies between 3.41 and 8.81 g/L, while citric acid content ranges from 1.12 to 1.91 g/L30. This high organic acid concentration is integral to the taste and flavor profile of barberry fruits.

The FI, which integrates the balance between sweetness and acidity to reflect overall taste perception, exhibited a broad range among the evaluated barberry genotypes, with values extending from 0.98 in genotype ‘0538’ to 7.68 in ‘0515’. The population mean was moderate, and the high coefficient of variation (53.26%) suggests substantial genotypic variability and strong selection potential for flavor-related improvement (Table 2). Notably, genotypes ‘0515’ and ‘0609’ recorded FI values exceeding 7, indicating their superior taste profiles with favorable sweetness-to-acidity ratios. When contextualized with other commonly consumed sour fruits, the observed FI values underscore the relative palatability of certain barberry genotypes. For instance, reported FI values are typically 1.2–2.0 for lemon, 4.17–7.95 for sour cherry, and 2.3–7.0 for pomegranate cultivars24,44,53. It is important to acknowledge that these ranges may vary depending on cultivar characteristics and environmental conditions influencing fruit metabolism and maturity51. Overall, the wide spectrum of FI values among the F1 progenies reflects the biochemical diversity of the population and emphasizes the opportunity to identify genotypes with enhanced flavor attributes. These findings hold significance not only for breeding programs targeting consumer-preferred traits but also for the development of barberry-based products with optimized sensory appeal for culinary and functional food applications.

Table 2 Descriptive analysis of phenological and biochemical characteristics in ‘Zereshk Bidaneh’ and 90 F1 progenies from interspecies hybrids of barberry.
Fig. 3

The distribution and frequency of ‘Zereshk Bidaneh’ and 90 F1 progenies from interspecies hybrids of barberry in biochemical traits. The box plot presents the maximum, first quartile, median, third quartile, minimum, mean, and standard deviation, along with the upper and lower 95% confidence intervals of the mean.

Correlation matrix analysis (CMA)

In the correlation matrix analysis (CMA), Pearson correlation coefficients (r) were used to measure the strength and direction of the linear relationships between variables. The coefficients range from −1 to +1, where values closer to +1 indicate a strong positive correlation, values closer to −1 indicate a strong negative correlation, and values near 0 suggest no significant linear relationship33. This method is widely used in quantitative research to identify and interpret the interdependence between traits9. Pearson correlation analysis provides valuable insights into how different traits influence one another, facilitating a deeper understanding of the studied system46.
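For readers who want to see the coefficient computed, the following plain-Python sketch implements the standard Pearson formula (the covariance divided by the product of the standard deviations). The function name and the paired TA and FI values are hypothetical and are not taken from Table 2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired values for four genotypes.
ta_values = [4.0, 8.0, 12.0, 16.0]   # titratable acidity, %
fi_values = [7.5, 3.8, 2.4, 1.9]     # flavor index

r = pearson_r(ta_values, fi_values)
print(f"r = {r:.2f}")
```

Because r captures only linear association, a curved relationship such as that between an index and its own denominator can give a coefficient noticeably weaker than −1 even when the dependence is exact.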

The correlation matrix provides a detailed analysis of the relationships among the measured traits, highlighting significant correlations (Fig. 4). One of the most notable findings is the strong negative correlation between TAC and TAA (r = −0.81, p < 0.01). Although anthocyanins are widely recognized for their antioxidant properties, this inverse relationship suggests that other bioactive compounds may play a more prominent role in determining the overall antioxidant capacity of the fruit. This unexpected result emphasizes the need for further investigation into the complex biochemical interactions within the fruit matrix.

VC shows several significant, though weak, correlations with other traits. It is positively correlated with TPC (r = 0.23, p < 0.05), TSS (r = 0.26, p < 0.05), pH (r = 0.26, p < 0.05), and the FI (r = 0.22, p < 0.05). These correlations suggest that VC content tends to rise together with phenolic content, sweetness, pH, and flavor index, although the modest coefficients indicate that VC accounts for only a small share of the variation in these traits. The positive correlation between VC and the FI suggests that genotypes rich in VC also tend to have a more favorable sensory profile.

The FI itself is related to several traits. It is positively but weakly correlated with TSS (r = 0.21, p < 0.05) and VC, and strongly negatively correlated with TA (r = −0.82, p < 0.01), as expected for an index calculated as the ratio of TSS to TA. Higher acidity therefore lowers the flavor index. These findings highlight the critical balance between sweetness and acidity in optimizing fruit quality.

In conclusion, the significant correlations observed in the matrix provide valuable insights into the complex interplay of biochemical and sensory traits in the fruit. The positive associations of VC with TPC, pH, TSS, and the FI highlight its multifaceted role in enhancing both nutritional and sensory quality. Meanwhile, the negative correlation between TAC and TAA, as well as the impact of TA on flavor perception, underscores the need for careful management of these traits in breeding programs and postharvest practices to optimize fruit quality.

Fig. 4

Simple correlations among the variables measured in the studied ‘Zereshk Bidaneh’ barberry progenies. For abbreviations, please see Table 2.

Principal component analysis (PCA)

Principal component analysis (PCA) was conducted to identify the key traits influencing the grouping of genotypes. This multivariate statistical technique reduces the dimensionality of data by transforming the original variables into a set of new, uncorrelated variables called principal components. The analysis was performed using the criterion of eigenvalues greater than 1.00 to select the components that explain a significant proportion of the variance in the dataset. This approach ensures that only meaningful components contributing substantially to the data’s variability are retained for further interpretation1.
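The retention rule described above can be sketched as follows. The eigenvalues are invented for illustration and are not those of Table 3; they are chosen to sum to 8 on the assumption that the PCA is run on the correlation matrix of eight standardized traits, in which case the eigenvalues sum to the number of traits. The function name is hypothetical.

```python
# Illustrative sketch: eigenvalues are hypothetical, not taken from Table 3.

def kaiser_retained(eigenvalues, threshold=1.0):
    """Return (rank, eigenvalue, % variance, cumulative %) for retained PCs."""
    total = sum(eigenvalues)
    retained = []
    cumulative = 0.0
    for rank, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        if ev <= threshold:
            break  # components at or below the threshold are dropped
        share = ev / total * 100.0
        cumulative += share
        retained.append((rank, ev, share, cumulative))
    return retained


# Hypothetical eigenvalues for eight standardized traits (sum = 8).
eigenvalues = [2.4, 1.6, 1.3, 0.9, 0.7, 0.5, 0.4, 0.2]

components = kaiser_retained(eigenvalues)
for rank, ev, share, cumulative in components:
    print(f"PC{rank}: eigenvalue {ev:.2f}, {share:.2f} % (cumulative {cumulative:.2f} %)")
```

An eigenvalue of 1 corresponds to the variance of a single standardized trait, so the criterion keeps only components that summarize more information than any one original variable.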

To enhance the interpretability of the components, Varimax rotation with Kaiser Normalization was applied22. The Varimax rotation method maximizes the variance of squared loadings of a factor across variables, resulting in a clearer and more distinct separation of variables under each component. This makes it easier to identify the primary traits associated with each principal component. Kaiser Normalization standardizes the variance of the loadings before rotation, ensuring that the rotated components remain statistically reliable and meaningful23. Together, these methods provide a more robust understanding of the relationships between variables and their influence on genotype differentiation21.

As a result of principal component analysis (PCA), three main components (PC1, PC2, and PC3) were identified as significant in explaining the biochemical traits in the dataset (Table 3). These three components account for 65.99% of the total variance. Among them, PC1 and PC2, which together explain 50.04% of the variance, are the most influential in representing the core structure of the dataset. In contrast, PC3, with a more limited explanatory power (15.95%), still contributes valuable insights into the biochemical differences among the genotypes.

PC1 is primarily characterized by strong loadings for FI (0.56), TAA (0.38), TAC (−0.41), and TA (−0.53). These results highlight a negative relationship between FI and TA, indicating that genotypes with higher FI values generally exhibit lower acidity levels. Similarly, the negative correlation between TAC and FI suggests that higher anthocyanin levels are associated with reduced FI scores. These findings emphasize the interplay between organoleptic traits, such as flavor, and functional properties, such as antioxidant activity. The high variance explained by PC1 (p < 0.01) underscores its pivotal role in distinguishing the biochemical differences among genotypes.

PC2 is strongly associated with TAC (0.52), VC (0.40), pH (0.38), and TAA (−0.50). The positive loadings for TAC, VC, and pH indicate a direct correlation among these traits, suggesting that an increase in pH levels is accompanied by higher concentrations of anthocyanins and VC. Conversely, the negative loading for TAA implies that genotypes with higher antioxidant levels tend to exhibit lower pH and VC values. This highlights the complex and potentially inverse relationships of antioxidant activity with other biochemical traits. The substantial explanatory power of PC2 (p < 0.01) further supports its significance in interpreting the biochemical diversity of the genotypes.

PC3 is defined by strong contributions from TSS (0.53), TPC (0.46), and TA (0.44). This component predominantly focuses on quality-related parameters, with positive loadings for TSS and TPC suggesting that these traits are key factors distinguishing genotypes based on flavor and functional composition. The inclusion of TA also indicates its association with TSS and TPC. While PC3 explains a smaller portion of the variance (15.95%) and is statistically less significant (p < 0.05), it plays a complementary role in providing insights into the biochemical profiles of genotypes.

In summary, the PCA results reveal complex relationships among biochemical traits and demonstrate that genotypes can be characterized by distinctive contributions across the principal components. Traits such as flavor index, acidity, antioxidant activity, and phenolic content emerge as critical parameters for understanding the biochemical variation. These findings emphasize the importance of evaluating biochemical traits comprehensively in both breeding programs and phenotypic assessments.

Table 3 Principal component analysis results for biochemical traits in 90 F1 progenies from interspecies hybrids of barberry.

A PCA biplot displays genotypes as points and traits as vectors in a single graph. The direction and length of each vector show the contribution of that trait to the principal components, while the angle between two vectors reflects the correlation between the corresponding traits8,26.

The 95% confidence ellipse summarizes the variation within the groups of genotypes and variables: under the assumption of normally distributed scores, about 95% of observations are expected to fall inside it. Genotypes and variables that fall outside the ellipse therefore represent unique or outlying values and merit closer investigation20. In Cluster 1, genotypes ‘0601’, ‘0606’, ‘0607’, and ‘0617’ lie outside the 95% confidence ellipse and are characterized by high TAC, suggesting they may possess unique qualities. Cluster 2 includes genotypes ‘0804’, ‘0203’, ‘0806’, and ‘0608’, associated with high levels of VC, TPC, TSS, and pH. Additionally, genotypes ‘0515’, ‘0203’, ‘0609’, ‘0618’, ‘0407’, and ‘0512’, which exhibit high FI and are located within the ellipse, may highlight desirable taste profiles (Fig. 5). In Cluster 3, TA is more significant than other traits for genotypes ‘0538’, ‘0537’, ‘0529’, ‘0501’, and ‘0502’, which are characterized by high TA. Cluster 4 is marked by TAA, with most genotypes in this cluster positioned within the central part of the ellipse (Fig. 5).
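One common way to decide whether a point lies outside such an ellipse is sketched below. It assumes a bivariate normal model for the PC1/PC2 scores, in which case the squared Mahalanobis distance is compared with the chi-square quantile for two degrees of freedom. The exact construction used for Fig. 5 is not stated, so this is an assumption, and the function name and the synthetic scores are hypothetical.

```python
import numpy as np

def outside_confidence_ellipse(scores, level=0.95):
    """Flag 2-D points lying outside the normal-theory confidence ellipse."""
    scores = np.asarray(scores, dtype=float)
    center = scores.mean(axis=0)
    cov = np.cov(scores, rowvar=False)
    diff = scores - center
    # Squared Mahalanobis distance of every point from the centroid.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Chi-square quantile with 2 degrees of freedom has the closed form -2*ln(1 - level).
    threshold = -2.0 * np.log(1.0 - level)
    return d2 > threshold, d2, threshold

# Synthetic PC1/PC2 scores for 90 points plus one deliberately extreme point.
rng = np.random.default_rng(3)
pc_scores = np.vstack([rng.normal(size=(90, 2)), [[8.0, 8.0]]])
flags, d2, threshold = outside_confidence_ellipse(pc_scores)
print("threshold:", round(threshold, 3))
print("points outside the ellipse:", np.flatnonzero(flags))
```

Points flagged in this way are candidates for closer inspection rather than automatic outliers, since roughly 5% of observations are expected to fall outside the ellipse even when the normal model holds.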

Fig. 5

Biplot of the first two principal components (PC1 and PC2) for ‘Zereshk Bidaneh’ and 90 F1 progenies derived from interspecies hybrids of barberry. For abbreviations, please see Table 2.

Heat map analysis (HMA)

Heat map analysis (HMA) is a method that visually reveals the similarities and differences within data. In this analysis, similarities between genotypes or variables are calculated using Ward’s method and Euclidean distance. Ward’s method builds clusters by merging, at each step, the pair of clusters whose union gives the smallest increase in the total within-cluster sum of squares50, while Euclidean distance measures the dissimilarity between pairs of observations10. The two are combined in an agglomerative procedure to cluster the data and reveal the relationships between observations.
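The two-way clustering behind such a heat map can be sketched with SciPy’s hierarchical clustering routines. The example below uses synthetic, standardized data with two artificial groups in place of the measured traits, and cutting each dendrogram into two groups is an illustrative choice; variable names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in: 91 genotypes x 8 traits with two well-separated groups.
rng = np.random.default_rng(1)
traits = rng.normal(size=(91, 8))
traits[:45] += 6.0
Z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)

# Agglomerative clustering with Ward's method on Euclidean distances,
# applied to rows (genotypes) and to columns (traits).
row_link = linkage(Z, method="ward", metric="euclidean")
col_link = linkage(Z.T, method="ward", metric="euclidean")

# Cut each dendrogram into two main groups.
row_groups = fcluster(row_link, t=2, criterion="maxclust")
col_groups = fcluster(col_link, t=2, criterion="maxclust")
print("genotype group sizes:", np.bincount(row_groups)[1:])
print("trait group labels:", col_groups)
```

In a heat map, the row and column orderings implied by the two linkage matrices are used to rearrange the data matrix so that similar genotypes and similar traits sit next to each other.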

The heat map visually displays the relationships between each observation and variable using colors, where higher values are represented by warm colors and lower values by cool colors. This analysis helps in identifying correlations between variables and detecting outliers. Heat map analysis using Ward’s method and Euclidean distance is a powerful tool for understanding the relationships between genotypes and traits by clearly revealing similarities and differences in large datasets52. Visualization of clustering patterns of ‘Zereshk Bidaneh’ (seedless) and 90 F1 progenies and variables based on biochemical characterizations using a heat map is presented in detail in Fig. 6.

Initially, the variables were systematically categorized into two primary groups, A and B, to facilitate a structured evaluation of their biochemical and sensory properties. Each group was subsequently divided into two subgroups based on specific characteristics. Subgroup A1 comprised total anthocyanin content, reflecting its pivotal role in pigmentation and antioxidant properties, which are critical for assessing functional and visual attributes. Subgroup A2 comprised total phenolic content, a key indicator of phenolic compound diversity that is often associated with health-promoting activities and oxidative stability.

Group B focused on sensory and nutritional quality parameters. Subgroup B1 encompassed vitamin C, total antioxidant activity, total soluble solids, and titratable acidity, collectively representing essential traits for nutritional profiling and overall fruit quality. Subgroup B2 included pH and flavor index, parameters directly linked to sensory appeal, consumer preference, and marketability. This hierarchical classification enabled a streamlined approach to analyzing complex datasets, ensuring clear differentiation between functional and sensory attributes.

Similarly, genotypes were initially divided into two broad groups, C and D, and further stratified into subgroups C1, C2, D1, and D2, highlighting distinct patterns in their biochemical and phenotypic profiles. The genotypes within subgroup C1 included ‘0607’, ‘0601’, ‘0801’, ‘0617’, ‘0606’, ‘0703’, and ‘0402’. These genotypes potentially exhibit similarities in traits related to Group A variables, suggesting a shared emphasis on anthocyanin and phenolic compound accumulation, which may contribute to high antioxidant potential and rich pigmentation.

In subgroup C2, genotypes such as ‘0806’, ‘0804’, ‘0608’, ‘0602’, ‘0405’, ‘0404’, ‘0615’, ‘0524’, ‘0508’, ‘0522’, ‘0517’, ‘0514’, ‘0620’, ‘0535’, ‘0403’, ‘0610’, ‘0533’, and ‘0105’ were clustered. The diversity within this subgroup could indicate variations in phenolic profiles or the presence of unique metabolic pathways contributing to subtle differences in antioxidant activities or phenolic content.

Group D, which primarily aligns with variables from Group B, also exhibited a clear hierarchical structure. Subgroup D1 included genotypes ‘0528’, ‘0501’, ‘0507’, ‘0401’, ‘0616’, ‘0539’, ‘0406’, ‘0304’, ‘0538’, ‘0536’, ‘0529’, ‘0701’, ‘0537’, ‘0619’, ‘0611’, ‘0511’, ‘Seedless’, ‘0534’, ‘0513’, ‘0603’, ‘0204’, and ‘0102’. These genotypes may be characterized by balanced acidity, high vitamin C content, and favorable antioxidant activity, making them suitable for applications where both nutritional and sensory qualities are prioritized.

Subgroup D2, containing genotypes ‘0526’, ‘0515’, ‘0609’, ‘0407’, ‘0512’, ‘0618’, ‘0301’, ‘0803’, ‘0201’, ‘0532’, ‘0530’, ‘0203’, ‘0103’, ‘0805’, ‘0807’, ‘0516’, ‘0509’, ‘0506’, ‘0702’, ‘0614’, ‘0521’, ‘0519’, ‘0505’, ‘0802’, ‘0531’, ‘0523’, ‘0613’, ‘0525’, ‘0520’, ‘0605’, ‘0504’, ‘0408’, ‘0205’, ‘0527’, ‘0510’, ‘0518’, ‘0502’, ‘0612’, ‘0104’, ‘0303’, ‘0503’, ‘0302’, ‘0202’, and ‘0101’, likely represents a more diverse group with substantial variability in sensory parameters such as flavor index and pH. These genotypes may exhibit unique sensory profiles, catering to specific consumer preferences or niche markets.

The structured grouping of both variables and genotypes underscores the robustness of this analytical approach. Group A variables, particularly anthocyanin and phenolic content, are critical for evaluating antioxidant capacity and functional properties, which are essential for breeding programs focused on health-oriented cultivars. Meanwhile, Group B variables emphasize the importance of sensory attributes and nutritional quality, highlighting their relevance for consumer satisfaction and market demands.

From a genotypic perspective, the clustering patterns reveal potential synergies among genotypes within the same subgroups, as well as significant diversity across subgroups. This information is invaluable for breeders aiming to select genotypes that optimize both functional and sensory traits. For instance, genotypes in subgroup C1 could be prioritized for antioxidant-rich products, while those in subgroup D2 may be preferred for fresh consumption due to their superior sensory profiles.

In conclusion, the hierarchical grouping of variables and genotypes provides a comprehensive framework for understanding the intricate relationships between biochemical traits and genotypic diversity. This approach not only facilitates targeted selection in breeding programs but also enhances the potential for developing cultivars with optimized functional, nutritional, and sensory qualities tailored to diverse market and consumer demands.

Fig. 6

Visualization of clustering patterns of ‘Zereshk Bidaneh’ (seedless) and 90 F1 progenies and variables based on biochemical characterizations using a heat map. For abbreviations, please see Table 2.

Comparison of multivariate classification methods

In the present study, both PCA and agglomerative HMA were applied to classify barberry progenies in the F1 population based on their biochemical traits and antioxidant activity. Although these multivariate techniques rely on different mathematical principles, their outputs demonstrated a notable degree of consistency. Specifically, genotypes such as ‘0406’, ‘0504’, ‘0505’, ‘0507’, ‘0513’, ‘0518’, ‘0521’, ‘0613’, ‘0614’, ‘0702’, and ‘0802’ were grouped closely together in the PCA biplot and were also assigned to the same or adjacent clusters in the HMA dendrogram. This overlap reinforces the reliability of the phenotypic structure observed and supports the validity of the classification obtained through independent yet complementary multivariate approaches.

Nonetheless, some minor discrepancies were noted, which likely reflect the differential sensitivities of PCA and HMA to data distribution and variable structure. For future studies, employing quantitative concordance measures, such as the cophenetic correlation coefficient or Procrustes analysis, may provide additional insights into the agreement between clustering methods and help validate classification robustness.
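As a pointer for such follow-up work, the cophenetic correlation coefficient can be computed with SciPy as sketched below. The data are synthetic and the variable names hypothetical; the sketch only shows the calculation and does not report any result for the barberry progenies.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in: 91 genotypes x 8 standardized traits with two artificial groups.
rng = np.random.default_rng(2)
traits = rng.normal(size=(91, 8))
traits[:40] += 4.0
Z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)

# Pairwise Euclidean distances between genotypes (condensed form).
distances = pdist(Z, metric="euclidean")

# Ward dendrogram and the distances it implies between every pair of genotypes.
tree = linkage(Z, method="ward")
ccc, coph_distances = cophenet(tree, distances)
print("cophenetic correlation coefficient:", round(float(ccc), 3))
```

A coefficient close to 1 indicates that the dendrogram preserves the original pairwise distances well. The same correlation could be computed against distances between genotypes in the PC1/PC2 plane to quantify how closely the clustering agrees with the biplot.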

Conclusions

The investigation of phenological and biochemical traits across 91 barberry genotypes revealed substantial genetic variability, providing a valuable foundation for targeted breeding and cultivation strategies. Significant differences in flowering and fruiting times—ranging from early to late genotypes—underscore the opportunity to select cultivars with optimal phenological timing suited to specific climatic and agronomic conditions. The analysis of biochemical properties highlighted considerable variation in key traits such as TAC, VC, TAA, and TPC. A large proportion of genotypes showed high levels of TAC and VC, in some cases exceeding values reported for commonly consumed fruits such as oranges and strawberries. Additionally, the observed variability in TSS, TA, and FI points to opportunities for selecting high-quality genotypes with superior taste and nutritional profiles. Among the evaluated progenies, genotypes ‘0601’, ‘0606’, ‘0607’, and ‘0617’ consistently exhibited superior biochemical traits, including exceptionally high TAC (> 1250 mg/L) and elevated TAA. Notably, genotype ‘0601’ recorded the highest TAC value (2063.55 mg/L), while ‘0203’ demonstrated the highest TAA (90%) and TPC levels. Genotype ‘0515’ was outstanding in terms of FI (7.68), suggesting strong potential for consumer preference and sensory appeal. These high-performing genotypes highlight the dual value of nutritional enhancement and sensory appeal in breeding priorities.

Traits such as TAC and FI are highly valuable in selection and breeding programs. High TAC content indicates enhanced antioxidant capacity, while FI reflects flavor quality, both crucial for marketable cultivar development. Principal component analysis (PCA) and heat map clustering confirmed the relevance of TAC, TAA, and FI as major discriminating traits, further supporting the selection of elite genotypes. These top-performing individuals represent promising candidates for future barberry breeding programs aimed at enhancing bioactive compound content, flavor quality, and market value.

This study, conducted in a single season and location, serves as a baseline for future multi-year and multi-location evaluations to validate trait stability.

Overall, the study highlights the extensive phenological and biochemical diversity present in F1 hybrid barberry populations. By harnessing this variability, breeders and researchers can accelerate the development of improved barberry cultivars that meet both health-related and sensory expectations, thereby contributing to progress in functional horticulture and nutritional innovation.