Introduction

Syzygium cumini Skeels, commonly known as jamun, Indian blackberry, black plum, Jambul, or Java plum, is a tropical fruit tree of the family Myrtaceae native to the Indian subcontinent. The species carries profound cultural and historical importance, with ancient texts referring to India as “Jambudweep,” signifying its abundance and reverence in traditional society1. Today, jamun is recognized as a promising “fruit of the future” owing to its nutritional richness, medicinal attributes, and industrial potential2,3,4,5. India is the leading global producer, contributing 15.4% of world output (13.5 million tonnes), with Maharashtra, Uttar Pradesh, Tamil Nadu, Gujarat, and Assam as major producing states6.

Nutritionally, jamun fruit is a valuable source of vitamin C, folic acid, iron, calcium, potassium, magnesium, and dietary fiber, while also being low in calories and glycemic index. Its rich phytochemical profile includes anthocyanins, flavonoids, tannins, ellagitannins, and alkaloids such as jamboline, which collectively impart potent antioxidant, antidiabetic, hepatoprotective, cardioprotective, and chemopreventive properties7,8,9,10,11. Seeds are particularly valued for their hypoglycemic effect in type-2 diabetes management12, while fruits provide anthocyanin-derived pigmentation, essential minerals, and dietary fiber. Beyond these, jamun exhibits anti-inflammatory, antimicrobial, radioprotective, and neuroprotective activities, broadening its relevance in functional food and pharmacological applications2,5.

The fruit also has significant industrial and processing potential. Fresh jamun is highly perishable with a short harvest window, but value-added products such as juices, squashes, jams, jellies, syrups, candies, and dehydrated powders extend its usability5,13. Fermented beverages such as jamun wine, vinegar, and probiotic drinks have demonstrated good consumer acceptance14,15. Seed powder and standardized extracts are marketed as nutraceutical supplements for glycemic control, obesity prevention, and antioxidant support10,11,12. Cosmetic and natural product applications utilize anthocyanins and polyphenolic extracts from the pulp as natural colorants and anti-aging agents16. Even by-products such as seed oil and pomace have found industrial use, with pomace being incorporated as a functional ingredient in value-added products like ice cream, thereby contributing to sustainable utilization17.

Despite this wide-ranging potential, jamun remains underutilized. Cultivation is often restricted to seedling-origin trees on roadsides and forest margins, leading to high variability in fruit size, texture, and quality, inconsistent yields, and absence of standardized orchard practices18. Commercial expansion is constrained by short shelf life and poor postharvest handling19. To address these challenges, breeding efforts must prioritize traits such as sequential maturity, firmer texture, and improved storability20. Systematic characterization of germplasm is a prerequisite for genetic improvement and conservation. Morphological descriptors, though widely used, are insufficient in capturing hidden variability influenced by the environment21. Molecular markers provide greater precision in assessing genetic diversity, population structure, and relationships among genotypes. Among these, microsatellite or simple sequence repeat (SSR) markers are particularly valuable due to their codominant inheritance, high reproducibility, and genome-wide abundance, enabling fine-scale resolution of allelic diversity, assessment of genetic relationships, and detection of population structure with greater accuracy than dominant markers22,23. Coupled with multivariate analyses such as correlation, CA biplot, and clustering, these approaches enable the identification of superior genotypes and the development of core collections for breeding and conservation24.

In the present study, SSR markers were employed alongside detailed morphological and biochemical characterization to unravel genetic diversity among selected jamun genotypes. While earlier molecular studies in jamun have primarily relied on dominant markers such as RAPD25,26,27 and ISSR28, this work represents one of the first comprehensive efforts integrating extensive SSR marker analysis with morphological and biochemical characterization in jamun. By combining phenotypic, nutraceutical, and molecular datasets, the study aims to identify superior genotypes for targeted breeding, conservation, and industrial utilization, thereby strengthening the foundation for systematic improvement of this underutilized yet highly valuable fruit crop.

Materials and methods

Plant material

The study was conducted on 23 selected genotypes of jamun (S. cumini) during 2023–2025 (Table S1). These genotypes were chosen from a germplasm block of 108 accessions. The selection criteria were based on consistent bearing, fruit size, pulp percentage, sensory appeal, and preliminary biochemical performance, ensuring representation of morphological diversity and industrial relevance. Of the 23 genotypes, 13 were maintained at the jamun germplasm block of CHES, Hirehalli, while the remaining 10 were obtained from the germplasm block of ICAR–Indian Institute of Horticultural Research (IIHR), Bengaluru.

The experimental site at ICAR–IIHR, Hessaraghatta, Bengaluru (13°7′ N, 77°29′ E; 890 m amsl), is characterized by a subtropical humid climate with moderate temperatures (19–32 °C), lateritic red soils, and an average annual rainfall of about 1083 mm, mostly during July–September. In contrast, CHES, Hirehalli (13.34° N, 77.1° E; 845 m amsl) experiences a hot and relatively dry climate, with higher maximum temperatures (up to 35.4 °C), lower annual rainfall (630 mm), and average relative humidity around 61.5%.

Phenotyping

Morphological characters

Thirty-four morphological traits were examined, including 20 quantitative and 14 qualitative traits. From each genotype, 10 mature fruits were randomly harvested to record data. Fruit length, fruit width, seed length, and seed width were measured using digital Vernier calipers and expressed in cm. Fruit weight, seed weight, and pulp weight were measured using an electronic balance with 0.01 g precision. Color attributes were assessed using standard color charts. Data on tree growth (height, canopy area, and girth), leaf traits (size, ratios, and petiole length), and fruit and seed characteristics (dimensions, weights, pulp percentage, pulp-to-seed ratio, fruit size, and seed size) were collected as per crop descriptors. Qualitative traits, including tree habit, foliage type, leaf and fruit morphology, and color attributes of fruit and pulp, were also recorded using standard jamun descriptors.
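The derived fruit traits mentioned above (pulp percentage and pulp-to-seed ratio) can be illustrated with a minimal sketch. It assumes that pulp weight equals fruit weight minus seed weight and that percentages are expressed on a whole-fruit basis. The text does not state these formulas, so they are assumptions, and the function name and example weights are hypothetical.

```python
def derived_fruit_traits(fruit_weight_g, seed_weight_g):
    """Return pulp weight, pulp %, seed % and pulp-to-seed ratio.

    Assumption (not stated in the text): pulp weight = fruit weight - seed weight.
    """
    if fruit_weight_g <= 0 or not 0 < seed_weight_g < fruit_weight_g:
        raise ValueError("weights must satisfy 0 < seed weight < fruit weight")
    pulp_weight_g = fruit_weight_g - seed_weight_g
    return {
        "pulp_weight_g": pulp_weight_g,
        "pulp_pct": 100.0 * pulp_weight_g / fruit_weight_g,
        "seed_pct": 100.0 * seed_weight_g / fruit_weight_g,
        "pulp_to_seed": pulp_weight_g / seed_weight_g,
    }


# Hypothetical fruit: 10.0 g whole fruit containing a 2.0 g seed.
traits = derived_fruit_traits(10.0, 2.0)
```

Under these assumptions pulp and seed percentages always sum to 100, so either one fully determines the other.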

Biochemical characters

Total soluble solids (TSS) were measured using a portable hand refractometer at 20 °C and expressed in °Brix29. Titratable acidity was estimated by titration with 0.1 N NaOH using phenolphthalein indicator and expressed as % citric acid equivalent30. Ascorbic acid was quantified by the 2,6-dichlorophenol indophenol (DCPIP) titration method and expressed as mg/100 g fresh weight30. Total phenolic content was estimated using the Folin–Ciocalteu method and expressed as mg gallic acid equivalent (GAE) per 100 g fresh weight31. Antioxidant capacity was assessed using the ferric reducing antioxidant power (FRAP) assay32 and the DPPH radical scavenging assay33, with results expressed as ascorbic acid equivalent antioxidant capacity (AEAC). Total and reducing sugars were determined by the Lane and Eynon method and expressed as g/100 g fresh weight, while non-reducing sugars were calculated by difference29. Anthocyanins were estimated using the pH differential spectrophotometric method, with cyanidin-3-glucoside as the reference standard, and expressed as mg C3GE/100 g fresh weight34.
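As an illustration of the pH differential calculation, the sketch below uses the commonly cited form of the method: absorbance read at 520 nm and 700 nm in pH 1.0 and pH 4.5 buffers, with a molar mass of 449.2 g/mol and a molar absorptivity of 26,900 L mol⁻¹ cm⁻¹ for cyanidin-3-glucoside. The wavelengths, constants, extraction volume, and sample mass are assumptions, since the text does not give them, and all absorbance readings are hypothetical.

```python
# Widely used reference constants for cyanidin-3-glucoside (assumed here).
MW_C3G = 449.2        # molar mass, g/mol
EPSILON_C3G = 26900   # molar absorptivity, L mol^-1 cm^-1


def anthocyanin_mg_per_l(a520_ph1, a700_ph1, a520_ph45, a700_ph45,
                         dilution_factor, path_cm=1.0):
    """Monomeric anthocyanin (mg C3GE per litre of extract)."""
    # Haze-corrected absorbance difference between the two buffers.
    absorbance = (a520_ph1 - a700_ph1) - (a520_ph45 - a700_ph45)
    return absorbance * MW_C3G * dilution_factor * 1000.0 / (EPSILON_C3G * path_cm)


def to_mg_per_100g(mg_per_l, extract_volume_ml, sample_mass_g):
    """Convert extract concentration to mg per 100 g fresh weight."""
    return mg_per_l * (extract_volume_ml / 1000.0) / sample_mass_g * 100.0


# Hypothetical readings: 5 g pulp extracted into 50 ml, then diluted 10-fold.
mg_l = anthocyanin_mg_per_l(0.80, 0.02, 0.20, 0.01, dilution_factor=10)
mg_100g = to_mg_per_100g(mg_l, extract_volume_ml=50.0, sample_mass_g=5.0)
```

Subtracting the 700 nm reading corrects for haze, and the difference between the two pH values isolates the pH-sensitive monomeric pigments.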

Genotyping

In silico mining of SSRs

Simple sequence repeats (SSRs) were identified through computational analysis. Raw sequence reads of S. cumini were retrieved from the NCBI Sequence Read Archive (SRA) and assembled into contigs following Phred-based quality filtering. Microsatellites were mined from the assembled sequences using Krait software version 2.0.6 (http://krait.biosv.com/en/latest/), targeting di- and tri-nucleotide motifs with ≥ 6 repeat units. Primers flanking polymorphic loci were designed using Primer3, with parameters set to 18–22 bp primer length, melting temperature of 55–60 °C, and expected product size of 100–300 bp (Table S2).
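The search criterion described above (di- and tri-nucleotide motifs with at least six repeat units) can be sketched with a regular expression. This is only an illustration of the criterion, not a description of how Krait works internally. The function name and the example sequence are hypothetical.

```python
import re


def find_ssrs(sequence, min_repeats=6):
    """Return (start, motif, repeat_count) for perfect di-/tri-nucleotide SSRs."""
    # A 2-3 bp motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAA, which are not di/tri SSRs.
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Hypothetical contig fragment containing (AT)7 and (GAA)6.
contig = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 6 + "TT"
ssrs = find_ssrs(contig)
```

The reported start positions are 0-based. Flanking sequence around each hit would then be passed to a primer design tool.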

DNA extraction, quality assessment, and quantification

Genomic DNA was extracted from young leaves of the 23 jamun genotypes using the CTAB method of Doyle and Doyle35 with minor modifications. Approximately 100 mg of leaf tissue was ground in liquid nitrogen, mixed with pre-warmed CTAB buffer (65 °C) containing freshly added β-mercaptoethanol, and incubated at 65 °C for 1 h with intermittent mixing. After chloroform:isoamyl alcohol (24:1) extraction, the aqueous phase was recovered and re-extracted, and DNA was precipitated with absolute ethanol at −20 °C overnight. Pellets were washed with 70% ethanol, air-dried, and dissolved in TE buffer. RNA contamination was removed by RNase A treatment, and DNA stocks were stored at −20 °C.

DNA quality was checked by electrophoresis on 0.8% agarose gels in 1× TAE buffer stained with ethidium bromide. High-molecular-weight intact bands indicated good quality, whereas smearing indicated degradation. DNA purity and concentration were determined using a Nanodrop spectrophotometer at A260/A280, with ratios of ~ 1.8 considered pure. Concentrations were calculated assuming 1 OD260 = 50 µg/ml of double-stranded DNA, and working stocks were standardized to 30 ng/µl with TE buffer for downstream applications.
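The quantification and standardization step can be sketched as follows, using the stated conversion (1 OD260 = 50 µg/ml, that is, 50 ng/µl of double-stranded DNA) and the usual C1V1 = C2V2 dilution relation. The function names and the example absorbance are hypothetical.

```python
def dsdna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Concentration of double-stranded DNA, assuming 1 OD260 = 50 ng/ul."""
    return a260 * 50.0 * dilution_factor


def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, buffer volume) in ul using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume_ul, final_volume_ul - stock_volume_ul


# Hypothetical sample reading A260 = 3.0, diluted to a 30 ng/ul working stock.
conc = dsdna_conc_ng_per_ul(3.0)
stock_ul, buffer_ul = dilution_volumes(conc, 30.0, 100.0)
```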

PCR amplification and optimization

PCR was carried out to amplify SSR loci from the genomic DNA of the jamun genotypes. Each 25 µl reaction contained ~50 ng template DNA, 1× assay buffer (100 mM Tris-HCl, 500 mM KCl, 1% Triton X-100, 16 mM MgCl₂), 2.5 mM of each dNTP, 10 pM each forward and reverse primer (Bioserve, India), 1.25 U Taq DNA polymerase (3B BlackBio Biotech, India), and molecular-grade water. A master mix was prepared to minimize pipetting errors, and aliquots were distributed into PCR tubes. Primers were received in lyophilized form, reconstituted under sterile conditions, and stored at −20 °C until use.

Amplification was performed in a thermal cycler using a touchdown protocol to increase specificity. The program consisted of an initial denaturation at 94 °C for 4 min, followed by 10 cycles of denaturation (94 °C, 45 s), annealing (60 °C, 1 min), and extension (72 °C, 1 min). This was followed by 25 cycles with annealing at 55 °C for 30 s, and extension at 72 °C for 1 min. A final extension was carried out at 72 °C for 6 min, and products were held at 4 °C.

Gel electrophoresis of PCR products

The PCR amplicons were resolved on 4% agarose gel prepared in 1× TBE buffer and electrophoresed at 75 V until the tracking dye reached the gel front. Gels were visualized under a UV transilluminator, and images were captured using a gel documentation system. A 1 kb DNA ladder was used as a molecular size standard.

Data scoring and analysis

SSR amplicons were scored visually across all genotypes, considering only clear, reproducible, and distinct bands. Molecular data were analysed using DARwin software version 6.0.021 (https://mybiosoftware.com/darwin-diversity-phylogenetic-analysis.html) to generate the UPGMA dendrogram. Polymorphic information content (PIC), expected heterozygosity (He), and observed heterozygosity (Ho) were calculated using Cervus software version 3.0.7 (https://mybiosoftware.com/cervus-3-0-3-parentage-analysis.html).
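For readers unfamiliar with these locus statistics, the sketch below computes them from diploid genotype calls using the textbook definitions: He = 1 − Σp², PIC = 1 − Σp² − ΣΣ 2p_i²p_j² (i < j), and Ho as the fraction of heterozygous individuals. It is not a reproduction of Cervus, which may apply a small-sample correction to He. The allele sizes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_statistics(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one SSR locus."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]

    he = 1.0 - sum(p * p for p in freqs)                     # expected heterozygosity
    pic = he - sum(2.0 * p * p * q * q
                   for p, q in combinations(freqs, 2))       # Botstein-type PIC
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {"n_alleles": len(counts), "He": he, "Ho": ho, "PIC": pic}


# Hypothetical locus scored in four genotypes (allele sizes in bp).
stats = locus_statistics([(150, 150), (150, 154), (154, 154), (150, 154)])
```

With two equally frequent alleles, He is 0.5 and PIC is 0.375, which shows why PIC is always somewhat lower than He for the same locus.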

Statistical analysis

Data were analyzed using SAS software version 9.0 (http://support.sas.com/documentation/whatsnew/91x/index.htm). Descriptive statistics (mean, range, SD, and CV%) were computed for all traits, while frequency and percentage distribution were used for qualitative attributes. Mean separation was performed with Tukey’s test. Pearson’s correlation coefficients (r) were calculated in SPSS software version 20.0 (https://www.ibm.com/support/pages/spss-statistics-20-available-download) to assess trait associations. Key patterns were visualized through correspondence analysis (CA) biplots, and clustering relationships were illustrated using a Ward’s method heat map.
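The descriptive statistics listed above can be reproduced with a few lines of code. The study used SAS, so the sketch below is only an equivalent illustration in Python. It assumes the sample standard deviation (n − 1 denominator), and the trait values are hypothetical.

```python
import statistics


def describe(values):
    """Mean, range, sample SD and coefficient of variation (%) for one trait."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "sd": sd,
        "cv_pct": 100.0 * sd / mean,
    }


# Hypothetical fruit weights (g) for three genotypes.
summary = describe([10.0, 12.0, 14.0])
```

Because the CV is scaled by the mean, it allows traits measured in different units (cm, g, %) to be compared for relative variability.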

Results

Qualitative morphological variation among genotypes

The qualitative morphological characterization of the 23 S. cumini genotypes revealed distinct variation across tree, leaf, flowering, and fruit traits (Table 1). Variability in the different organs of the jamun genotypes studied is illustrated in the supplementary materials, including tree (Fig. S1), leaf shape (Fig. S2), new flush color (Fig. S3), leaf tip and base (Fig. S4), inflorescence (Fig. S5), fruit shape and color (Fig. S6), pulp color (Fig. S7), and seeds (Fig. S8). Tree growth habit was reasonably balanced across spreading, semi-spreading, and upright forms, reflecting structural diversity in the germplasm. Foliage density was predominantly dense, recorded in about two-thirds of the genotypes, while the remainder displayed a sparse canopy. Most genotypes exhibited green mature leaves, with less than one-fifth showing a yellow-green shade, whereas leaf shapes were primarily broadly ovate, complemented by lanceolate and elliptic-oblong forms. Nearly 60% of the genotypes possessed acuminate apices, while acute apices were also frequent, and leaf bases were overwhelmingly acute, with only a small proportion rounded. New flush color was highly variable, though yellow-green dominated in more than half the genotypes, followed by greyed orange and greyed brown. The lamina surface was smooth in nearly three-fourths of the accessions, with the remainder showing wavy surfaces. Bloom initiation occurred mainly from March to April, with genotypes distributed across different fortnightly periods, and only one exhibited off-season flowering in October. Fruit traits displayed striking diversity, with oblong shape being the most common, followed by elliptic, round, and ovoid forms. The stalk end was almost universally flat, with depressed ends being rare. Apex shape varied, though depressed and flat types predominated, while round apices were least represented. In terms of fruit color, purple was most frequent, complemented by black and greyed-purple, while only a single genotype exhibited a unique white fruit. Pulp color was uniform across most genotypes, with over 95% showing red-purple pulp, highlighting strong stability in this trait.

Table 1 Frequency distribution for the measured qualitative morphological characteristics in Syzygium cumini genotypes.

Quantitative morphological variation among genotypes

The descriptive statistics for quantitative morphological traits of the 23 jamun genotypes revealed substantial variability, as reflected in the coefficients of variation (CV) (Table 2). Among growth parameters, plant height varied widely, ranging from 411 cm in CHESHJ-XIII/4 to 1737 cm in CHESHJ-Wd-1, with a mean of 636.83 cm (CV 40.64%). Trunk girth also showed high variability (52–222 cm; CV 35.20%), while canopy spread exhibited moderate variability (26–28%), suggesting stable architectural traits across genotypes. Leaf dimensions were comparatively less variable, with length ranging between 11.24 cm (Konkan Bahadoli) and 16.50 cm (CHESHJ-XI/3), and width between 4.40 cm (CHESHJ-Wd-1) and 7.55 cm (CHESHJ-Wt-1), reflecting moderate genetic stability in foliar traits. Reproductive traits displayed marked diversity, particularly in inflorescence size, which varied almost twofold across genotypes (CV 19–24%). Inflorescence length ranged from 6.50 cm in Savadatti to 14.10 cm in CHESHJ-V/1, while width varied between 4.50 cm (Selection-58) and 11.50 cm (Selection-45). Fruit characteristics exhibited striking variability, with weight spanning from 1.97 g (CHESHJ-Wd-1) to 18.32 g (Kaithnal), and fruit size from 1.65 cm² (CHESHJ-Wd-1) to 10.03 cm² (Kaithnal). Pulp percentage was highest in IC-715 (92.26%) and lowest in CHESHJ-Wd-1 (66.50%), while the pulp-to-seed ratio ranged from 1.98 (CHESHJ-Wd-1) to 11.91 (IC-715), representing the most variable trait (CV 42.81%). Seed traits further highlighted genotypic contrasts; seed weight varied from 0.65 g (CHESHJ-Wt-1) to 3.89 g (CHESHJ-VI/2), with seed percentage ranging from 7.74% in IC-715 to 33.50% in CHESHJ-Wd-1.

Table 2 Descriptive statistics for the quantitative morphological and biochemical characters in the studied genotypes of Syzygium cumini.

Biochemical variation among genotypes

Considerable variation was noted among the genotypes for biochemical traits (Fig. 1; Table 2; Table S3). Total soluble solids (TSS) ranged from 10.00°B in CHESHJ-IV/3 to 18.20°B in CHESHJ-V/1, with a mean of 12.63°B. Titratable acidity ranged from 0.43% (CHESHJ-V/1) to 2.67% (CHESHJ-XI/3), with an overall mean of 1.50 ± 0.12%. Genotypic differences were statistically significant (p < 0.05, Tukey’s test), with CHESHJ-XI/3 and AGJ-85 forming a distinct group characterized by high acidity. Ascorbic acid content exhibited a broad range (23.40–65.07 mg/100 g), with Konkan Bahadoli and Kaveripattanam-4 recording the highest values, while Kaithnal was significantly lower. Total phenolic content showed pronounced variability (120.78–936.26 mg GAE/100 g; mean 452.70 ± 46.12 mg GAE/100 g), with CHESHJ-XI/3 being significantly superior. Antioxidant capacity, as measured by FRAP, was also highly variable (CV = 53%), extending from 307.78 mg AEAC/100 g in Dhoopdal to 2493.91 mg AEAC/100 g in CHESHJ-XI/3, the latter statistically at par with CHESHJ-Wd-1 and CHESHJ-V/1. In contrast, DPPH activity exhibited a relatively lower dispersion (mean 1,239.66 ± 89.09 mg AEAC/100 g), with CHESHJ-VI/2 significantly outperforming Kaithnal. Sugars displayed moderate variation; total sugars averaged 10.49%, with CHESHJ-Wd-1 (14.84%) significantly higher than Kaveripattanam-4 and Collection-7 (8.19%). Reducing sugars followed a similar trend (5.35–11.03%), whereas non-reducing sugars showed wider dispersion (CV = 69%), ranging from 0.18% (AGJ-85) to 5.74% (CHESHJ-I/1). Anthocyanin content exhibited the most striking differences, with CHESHJ-Wd-1 (268.50 mg/100 g) more than six-fold higher than CHESHJ-Wt-1 (42.59 mg/100 g), reflecting highly significant genotypic variation.

Fig. 1

Box plots showing distribution of (A) acidity, (B) vitamin C, (C) total phenols, (D) total sugars, (E) reducing sugar, (F) non-reducing sugar, (G) DPPH activity, (H) FRAP activity, and (I) anthocyanin in the studied genotypes of Syzygium cumini.

Multivariate analysis

Pearson correlation analysis

Pearson’s correlation (Fig. 2) revealed strong positive associations among fruit weight, fruit length, fruit size, and fruit breadth (r = 0.7–0.9). Seed weight correlated perfectly with seed size (r = 1.0), while pulp-to-seed ratio was strongly associated with pulp percentage (r = 0.9). Vitamin C showed moderate positive correlations with total phenols (r = 0.6) and FRAP (r = 0.5). Antioxidant traits were interlinked, with DPPH showing strong associations with total phenols (r = 0.8) and FRAP (r = 0.7). Seed percentage and pulp percentage were perfectly negatively correlated (r = −1.0), as expected for two complementary fractions of fruit weight that sum to 100% (e.g., 92.26% pulp and 7.74% seed in IC-715).
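A minimal sketch of Pearson’s r is given below. It also shows why two percentages that sum to 100 necessarily give r = −1. The study used SPSS, so this is only an illustration. The two extreme pulp percentages (92.26 and 66.50) are the values reported for IC-715 and CHESHJ-Wd-1, the intermediate values are hypothetical, and computing seed percentage as 100 minus pulp percentage is an assumption consistent with those reported extremes.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Pulp percentages: two reported extremes plus two hypothetical values.
pulp_pct = [92.26, 80.00, 75.50, 66.50]
seed_pct = [100.0 - p for p in pulp_pct]  # complementary fraction (assumption)
r_pulp_seed = pearson_r(pulp_pct, seed_pct)
```

Because one variable is an exact linear function of the other, such a correlation carries no independent biological information and either trait can be dropped from multivariate analyses.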

Fig. 2

Pearson correlation analysis for morphological and biochemical traits of Syzygium cumini.

Heatmap analysis

Heatmap clustering (Fig. 3) segregated genotypes into three groups, with Cluster I (CHESHJ-Wd-1, CHESHJ-XI/3, CHESHJ-V/1, CHESHJ-II/1) showing relatively higher contributions of antioxidant-related nutraceutical traits such as FRAP, total phenols, and vitamin C. Cluster II (e.g., CHESHJ-III/3, AGJ-85, Kaveripattanam) showed robust vegetative growth and fruit size, while Cluster III (Selection-58, Kaithnal, Collection-7) was enriched in sugar-related traits. Antioxidant traits (FRAP, total phenols, vitamin C) were the most divergent across clusters.

Fig. 3

Heat map analysis showing hierarchical clustering of 23 Syzygium cumini genotypes based on 17 morphological and biochemical traits. Trait values were standardized before clustering. The heat map and dendrogram were generated using R software version 4.2.2 (https://www.r-project.org) with hierarchical clustering based on Euclidean distance and the complete linkage method. Visualization was created using the pheatmap package.
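The clustering steps named in the caption (trait standardization, Euclidean distance, complete linkage) can be sketched without external libraries. The figure itself was produced with the pheatmap package in R, so the code below is only a simplified illustration of the same steps. The function names and the four-genotype, two-trait data set are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Z-score each trait (column) so that all traits share a common scale."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def complete_linkage(rows, n_clusters):
    """Agglomerative clustering; cluster distance = farthest pair of members."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        return max(math.dist(rows[a], rows[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical data: rows = genotypes, columns = (fruit weight g, FRAP mg/100 g).
data = [[2.0, 900.0], [2.5, 950.0], [18.0, 300.0], [17.0, 350.0]]
groups = complete_linkage(standardize(data), n_clusters=2)
```

Standardizing first prevents traits with large numeric ranges, such as FRAP, from dominating the Euclidean distances.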

Hierarchical cluster analysis

Hierarchical clustering (Fig. 4) grouped the 23 genotypes into five clusters. Cluster V (CHESHJ-V/1 and CHESHJ-XI/3) was distinguished by high vitamin C and pulp-to-seed ratio, whereas cluster I (CHESHJ-Wd-1) was characterized by relatively higher values for antioxidant-related parameters. Cluster III (e.g., Kaithnal and AGJ-85) had larger fruits with higher pulp content, suggesting suitability for fresh consumption. Cluster IV (CHESHJ-Wt-1) combined high TSS with dwarf stature, indicating potential for high-density planting.

Fig. 4

Dendrogram representing cluster-wise grouping of Syzygium cumini genotypes based on morphological and biochemical traits.

Biplot analysis

The biplot (Fig. 5) explained 70.9% of total variation (Dim1: 45.8%, Dim2: 25.1%). Fruit size traits (weight, length, breadth, pulp content) clustered along Dim1, while antioxidant traits (FRAP, phenols, DPPH) grouped in a separate quadrant. Vitamin C and total sugar showed intermediate positions, bridging biochemical and morphological characteristics. Genotypes such as CHESHJ-V/1 and CHESHJ-XI/3 were aligned with antioxidant traits, whereas Selection-58 and Kaithnal aligned with fruit size. Peripheral positioning of CHESHJ-Wd-1 highlighted its unique antioxidant profile.

Fig. 5

Correspondence analysis (CA) biplot representing variability among Syzygium cumini genotypes based on agro-morphological and biochemical traits.

Genetic diversity among jamun genotypes based on microsatellite markers

Genotyping of 23 jamun accessions with 50 SSR markers revealed considerable genetic variability. Across all loci, a total of 329 alleles were detected, with the number of alleles per locus ranging from 3 (cSSR174) to 11 (cSSR30), and an overall mean of 6.58 alleles per locus (Table 3). The mean expected heterozygosity (He) was 0.740, reflecting a high level of allelic diversity, while the mean observed heterozygosity (Ho) was comparatively lower, indicating some degree of fixation within accessions. The polymorphic information content (PIC) values ranged from 0.316 (least informative) to 0.824 (most informative at cSSR239), with a mean of 0.687, suggesting that the majority of markers were highly informative for diversity analysis.

Table 3 Level of polymorphism revealed by 50 SSR primers in 23 genotypes of Syzygium cumini. The parameters include the number of alleles, observed heterozygosity, expected heterozygosity, polymorphic information content (PIC), and minor allele frequency (MAF).

Jaccard’s similarity coefficient matrix revealed a wide range of genetic dissimilarities among the genotypes (Table S2). The highest pairwise dissimilarity (0.88) was observed between CHESHJ-Wt-1 and CHESHJ-Wd-1, while the lowest dissimilarity values were recorded among genotypes belonging to the same morphological groups. UPGMA clustering based on SSR data (Fig. 6) grouped the genotypes into three major clusters, with a few genotypes forming distinct branches. Notably, CHESHJ-Wd-1 and CHESHJ-Wt-1 consistently appeared as outliers in both dendrograms and multivariate analyses.
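For illustration, Jaccard dissimilarity between two genotypes can be computed from allele presence/absence profiles as one minus the ratio of shared alleles to alleles present in either genotype. The sketch below assumes that SSR alleles were coded as binary presence/absence. This coding is an assumption rather than a description of the DARwin workflow, and the profiles shown are hypothetical.

```python
def jaccard_dissimilarity(profile_a, profile_b):
    """1 - (alleles shared) / (alleles present in at least one genotype)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of alleles")
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    if either == 0:
        raise ValueError("no alleles present in either profile")
    return 1.0 - shared / either


# Hypothetical presence (1) / absence (0) scores for six alleles.
genotype_1 = [1, 1, 0, 0, 1, 0]
genotype_2 = [1, 0, 1, 0, 1, 0]
d = jaccard_dissimilarity(genotype_1, genotype_2)
```

Alleles absent from both genotypes are ignored, so joint absences do not inflate similarity. A value of 0.88, as reported for the most divergent pair, would mean that only 12% of the alleles present in either genotype are shared.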

Fig. 6

Dendrogram showing the genetic relationships between 23 Syzygium cumini genotypes using UPGMA cluster analysis.

Principal coordinate analysis (PCoA) further supported the clustering pattern (Fig. 7), with the first two axes explaining a substantial proportion of the molecular variance. Genotypes were dispersed across all quadrants, confirming the broad genetic base of the germplasm. The distinct positioning of CHESHJ-Wd-1 and CHESHJ-Wt-1 in the PCoA plot highlighted their unique allelic composition compared to the rest of the collection.

Fig. 7

Factorial analysis of 23 Syzygium cumini genotypes based on SSR markers.

Discussion

Morphological and fruit quality variability relevant to utilization

The extensive morphological diversity observed among jamun genotypes in this study underscores the species’ broad genetic base and adaptability, consistent with earlier findings36,37,38. Growth parameters such as tree height, trunk girth, and canopy spread revealed significant variability, with vigorous types like CHESHJ-Wd-1 exhibiting maximum values, while compact forms such as CHESHJ-XIII/4 and CHESHJ-XI/3 showed reduced stature and canopy size. This divergence has direct implications for orchard design and industrial applications. Dwarf and semi-spreading forms are particularly desirable for high-density planting and mechanized harvesting18,39, while vigorous spreading types can serve well in agroforestry and timber–fruit dual-purpose systems38.

Leaf traits, often overlooked in fruit crop evaluation, emerged as equally important in jamun due to their documented medicinal and nutraceutical roles. In the present study, leaf length and width exhibited 32–42% variation, aligning with earlier reports10,40. Larger leaves, as observed in CHESHJ-XI/3, support enhanced photosynthetic potential, while narrower leaves such as those of CHESHJ-Wd-1 are adaptive to water-limited and heat-prone environments. Beyond physiological roles, jamun leaves are pharmacologically rich, containing bioactive compounds such as ellagic acid, quercetin, and gallic acid, which exhibit antidiabetic, antimicrobial, and anti-inflammatory properties27,41. Thus, morphological variability in leaf architecture not only reflects adaptive potential but also indicates opportunities for targeted exploitation of foliage as a raw material for herbal and pharmaceutical industries.

Fruit morphological attributes showed striking diversity across genotypes, particularly in fruit size, weight, pulp-to-seed ratio, and pigmentation. The large-fruited genotype Kaithnal (fruit size > 10 cm², weight 18.3 g) contrasted sharply with small-fruited types such as CHESHJ-Wd-1, highlighting genetic resources suitable for fresh consumption and processing, respectively. These results are in agreement with earlier reports42,43 documenting similar inter-genotypic variation in fruit traits. Pulp-to-seed ratio, ranging from 1.98 in CHESHJ-Wd-1 to 11.91 in IC-715, emerged as one of the most variable traits (CV 42.8%). High-pulp genotypes such as IC-715 and Konkan Bahadoli are particularly important for juice and pulp-based industries, while small-pulp types may be reserved for seed-based phytochemical extraction.

Seed traits, though traditionally considered secondary to fruit traits, are of major industrial and medicinal importance in jamun. In the present study, seed weight varied from 0.65 g (CHESHJ-Wt-1) to 3.89 g (CHESHJ-VI/2), while seed percentage ranged from 7.74% (IC-715) to 33.50% (CHESHJ-Wd-1). Such variability has been consistently reported earlier44,45,46. Jamun seeds are rich in jamboline, jambosine, and polyphenols with well-documented antidiabetic and antioxidant activities10,46. Therefore, genotypes with higher seed yield and larger seed size, though less favorable for pulp recovery, may be prioritized for pharmaceutical industries focusing on seed extracts, whereas low-seed-percentage types such as IC-715 are better suited for food-processing industries.

Fruit coloration, ranging from purple to black and extending to rare white types such as CHESHJ-Wt-1, further reflects the rich morphological diversity and biochemical potential of jamun. Darker fruits are generally associated with higher anthocyanin content and antioxidant activity, enhancing both nutraceutical value and consumer preference47. The white-fruited CHESHJ-Wt-1 represents a unique genetic resource for understanding pigment biosynthesis and may find specialized applications in functional foods and niche markets.

Overall, the observed morphological variability across growth, leaf, fruit, and seed traits provides a strong foundation for dual-purpose utilization of jamun germplasm. While high-pulp and large-fruited types are directly suited for industrial food processing, leaf- and seed-rich accessions serve as valuable reservoirs for medicinal and nutraceutical industries. This dual utility highlights the importance of conserving morphological diversity, as both edible and non-edible plant parts contribute significantly to the overall economic and industrial value of jamun.

Biochemical diversity and implications for nutraceutical potential

The present study revealed marked variability in the biochemical composition of jamun fruits across genotypes, underscoring the species’ value as a nutritionally rich and industrially relevant fruit crop. Total soluble solids (TSS) ranged from 10.00°B to 18.20°B, reflecting significant genotypic differences in sugar accumulation, which influence fruit palatability and consumer acceptance. Similar variation has been reported in jamun38,45,48. Genotypes such as CHESHJ-V/1, with higher TSS, are attractive for fresh market consumption and beverage industries, while moderate-TSS types may be more suitable for processing, where balance with acidity is desirable.

Titratable acidity displayed a wide range (0.43–2.67%), influencing taste balance and shelf stability. Acidic genotypes such as CHESHJ-XI/3 and AGJ-85 align with earlier reports44,49, which emphasized the importance of acidity in enhancing processing attributes such as juice and jam quality. Pulp acidity also contributes indirectly to bioactive composition by affecting anthocyanin stability, a factor critical for nutraceutical applications.

Ascorbic acid (vitamin C), a key antioxidant, ranged from 23.40 to 65.07 mg/100 g, with Konkan Bahadoli and Kaveripattanam-4 being the richest sources. This aligns with previous findings43,47,50, which documented high vitamin C levels in jamun relative to other tropical fruits. Given its role in combating oxidative stress, boosting immunity, and enhancing iron absorption, high-vitamin-C genotypes hold strong nutraceutical potential, making them candidates for functional food and dietary supplement industries.

Total phenolic content exhibited striking variability (120.78–936.26 mg GAE/100 g), with CHESHJ-XI/3 emerging as a superior source. Phenolic compounds are strongly associated with antioxidant, antimicrobial, and anti-inflammatory properties10,41. The broad range recorded here exceeds earlier reports in jamun38,46,51, placing specific genotypes at par with recognized superfruits such as blueberry and pomegranate. This highlights jamun’s promise as a functional ingredient in nutraceutical and pharmaceutical sectors.

Antioxidant activity, assessed through FRAP and DPPH assays, further reinforced the nutraceutical potential of specific genotypes. FRAP ranged from 307.78 to 2,493.91 mg AEAC/100 g, while DPPH activity varied between 597.41 and 1,840.80 mg AEAC/100 g, with genotypes such as CHESHJ-Wd-1 and CHESHJ-VI/2 showing particularly high values. These results corroborate earlier findings45,49,52, emphasizing that jamun fruits are rich reservoirs of free-radical scavengers useful in preventing chronic diseases such as diabetes, cardiovascular disorders, and certain cancers.

Sugar composition also varied considerably: total sugars ranged from 8.19% to 14.84%, reducing sugars from 5.35% to 11.03%, and non-reducing sugars from 0.18% to 5.74%. These results are consistent with earlier reports38,43. Sugar profiles are crucial in determining consumer preference, processing suitability, and fermentation potential for wine and vinegar industries. High-sugar genotypes (e.g., CHESHJ-Wd-1) may be suitable for confectionery and syrup industries, whereas moderate-sugar types align better with low-calorie health-oriented products.

Anthocyanin content, ranging from 42.59 to 268.50 mg/100 g, displayed the widest diversity among biochemical traits. CHESHJ-Wd-1 recorded the highest concentration, while CHESHJ-Wt-1 was at the lower extreme. Such variability has been documented earlier41,53, highlighting the close link between anthocyanin content, fruit pigmentation, and antioxidant properties. Darker fruits, typically richer in anthocyanins, are not only visually appealing but also serve as natural colorants and functional ingredients for nutraceutical formulations.

Overall, the biochemical diversity observed in this study emphasizes the dual importance of jamun as both a nutrient-rich fruit and a reservoir of bioactive compounds. Genotypes rich in vitamin C, total phenols, and anthocyanins represent promising candidates for nutraceutical industries, while those with a favorable sugar–acid balance are better suited for processing and fresh consumption. Integrating biochemical parameters with morphological and molecular diversity offers a comprehensive framework for identifying superior genotypes for industrial utilization and targeted breeding.

Trait associations and multivariate insights for selection

Multivariate analysis in the present study provided critical insights into the complex interrelationships among morphological, biochemical, and yield traits of jamun. Principal component analysis (PCA) extracted two principal components explaining 70.9% of the variation, highlighting the dominance of fruit size and seed-related traits along PC1, and antioxidant and phenolic traits along PC2. Such separation of yield-related and nutraceutical traits onto distinct axes suggests that these features may be independently manipulated in breeding programs, offering scope for developing ideotypes that combine large fruit size with enhanced antioxidant potential. Similar partitioning of morphological and biochemical traits into orthogonal axes has been reported in other perennial fruit crops, including guava54 and bael25, reinforcing the robustness of this approach in tropical tree fruits. Correlation analysis further revealed strong positive associations between fruit weight, length, breadth, and pulp percentage (r = 0.7–0.9), supporting earlier findings in jamun38. Seed-related traits (seed length, width, and weight) clustered tightly together, while pulp percentage and pulp:seed ratio showed strong inverse correlations with seed percentage, confirming their utility as indirect selection markers for high-pulp types.
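The trait correlations described above can be illustrated with a minimal sketch of the Pearson coefficient. The trait values below are made up for five hypothetical genotypes and are not study data, and the sketch assumes that seed percentage is recorded as the complement of pulp percentage. The study's own recording protocol may differ.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values for five hypothetical genotypes (not study data).
fruit_weight_g = [6.2, 8.9, 11.4, 13.0, 15.7]
pulp_percent = [68.0, 74.5, 79.0, 80.5, 86.0]
# Assumption: seed % is the complement of pulp %.
seed_percent = [100.0 - p for p in pulp_percent]

r_weight_pulp = pearson_r(fruit_weight_g, pulp_percent)
r_pulp_seed = pearson_r(pulp_percent, seed_percent)
print(round(r_weight_pulp, 3), round(r_pulp_seed, 3))
```

Under that assumption the pulp–seed correlation is exactly -1 by construction, so in such a dataset the inverse association reflects how the two traits are defined rather than an independent biological signal.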

Biochemical traits formed a coherent and biologically meaningful group, with vitamin C showing moderate but positive correlations with total phenols and FRAP, while total phenols exhibited strong associations with both DPPH and FRAP, in agreement with earlier reports11,55. These relationships reflect the shared role of phenolic compounds and ascorbic acid in free-radical scavenging and redox buffering, which collectively contribute to antioxidant capacity in jamun fruits. From a practical perspective, the strong inter-linkage among antioxidant traits suggests that indirect selection based on easily measurable parameters such as total phenols or FRAP could effectively enhance overall antioxidant potential. This simplifies selection strategies in breeding and germplasm screening programs by reducing the need for multiple assays while facilitating the identification of nutraceutically superior genotypes.
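One rough way to gauge whether a single assay could stand in for another during screening is to compare the sets of top-ranked genotypes each assay would select. The sketch below uses hypothetical genotype labels and made-up values, not study data, and the choice of k is arbitrary.

```python
def top_k(values_by_genotype, k):
    """Return the set of k genotype labels with the highest values."""
    ranked = sorted(values_by_genotype, key=values_by_genotype.get, reverse=True)
    return set(ranked[:k])

# Hypothetical labels and made-up values (not study data).
total_phenols = {"G1": 910, "G2": 640, "G3": 305, "G4": 150, "G5": 720, "G6": 410}
frap = {"G1": 2400, "G2": 1500, "G3": 900, "G4": 350, "G5": 2100, "G6": 1650}

k = 3  # arbitrary selection intensity for illustration
by_phenols = top_k(total_phenols, k)
by_frap = top_k(frap, k)
# Share of the FRAP-based selection that the phenol-based selection recovers.
overlap = len(by_phenols & by_frap) / k
print(sorted(by_phenols), sorted(by_frap), overlap)
```

A high overlap at the intended selection intensity would support screening on the cheaper assay alone, while a low overlap would argue for retaining both.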

Cluster I comprised genotypes enriched in antioxidant-related nutraceutical traits, including total phenols, FRAP, and vitamin C, reflecting relative trait dominance rather than a distinct biological separation of nutraceutical components. Cluster II (CHESHJ-III/3, CHESHJ-III/2, Kaveripattanam, AJG-85) combined moderate-to-high plant vigor with larger fruit size, traits desirable for orchard productivity. Cluster III (Selection-58, Kaithnal, Collection-7) was distinguished by high plant height, fruit size, and total sugars, suggesting suitability for yield-focused and processing-oriented improvement programs. This clear separation of antioxidant-rich versus yield-oriented types has also been observed in cluster analyses of jamun43,48, indicating that multivariate tools can reliably identify trait-based ideotypes.

Notably, the broad dispersion of genotypes in the biplot revealed substantial diversity, with distantly placed genotypes such as CHESHJ-V/1 and Selection-58 representing potential parental combinations for hybridization to exploit heterosis. This aligns with earlier work50,56, which emphasized the value of crossing divergent clusters to combine desirable traits. The independent clustering of morphological and biochemical attributes also suggests opportunities for pyramiding, where antioxidant-rich small-fruited types can be crossed with large-fruited, high-pulp genotypes to generate dual-purpose cultivars with enhanced consumer and industrial appeal. Together, these multivariate findings underscore the utility of combining dimensionality reduction, correlation, and clustering tools to reveal meaningful trait patterns in jamun germplasm. By integrating yield- and nutraceutical-associated clusters, breeders and industry stakeholders can design targeted improvement programs to develop jamun cultivars that not only meet market demand for high fruit quality and pulp recovery but also tap into the growing nutraceutical and functional food markets.

Microsatellite markers reveal molecular diversity and genetic structure of Jamun germplasm

The SSR-based molecular analysis of jamun genotypes revealed a high level of genetic variation, underscoring the suitability of microsatellites for diversity assessment in this underutilized fruit crop. An average of 6.58 alleles per locus was recorded, ranging from three alleles at cSSR220 and cSSR224 to 11 alleles at cSSR30, indicating considerable allelic richness across loci. Comparable allele numbers have been reported in related Myrtaceae members such as wax apple57 and guava56, suggesting that jamun possesses a similarly broad genetic base. The mean major allele frequency (0.441) indicated that, on average, no single allele dominated a locus, confirming the presence of diverse allelic forms within the collection.
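The per-locus statistics reported above, allele number and major allele frequency, can be derived from genotype calls as in the following minimal sketch. It assumes diploid, codominant scoring with two allele calls per genotype. The fragment sizes are made up for a hypothetical locus and are not study data.

```python
from collections import Counter

def allele_summary(genotype_calls):
    """Count alleles and find the major allele frequency at one locus.

    genotype_calls: list of (allele_a, allele_b) tuples, one per genotype
    (assumes diploid, codominant scoring).
    """
    counts = Counter(allele for call in genotype_calls for allele in call)
    total = sum(counts.values())
    freqs = {allele: n / total for allele, n in counts.items()}
    return len(counts), max(freqs.values()), freqs

# Made-up fragment sizes (bp) at a hypothetical locus, not study data.
calls = [(150, 154), (150, 150), (154, 158), (150, 162), (158, 158)]
n_alleles, major_freq, freqs = allele_summary(calls)
print(n_alleles, major_freq)
```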

Expected heterozygosity (He) was high (mean 0.7403), reflecting substantial genetic diversity within the germplasm. Several loci showed He values exceeding 0.8, confirming that allelic diversity translates into higher heterozygosity at the population level. High heterozygosity has also been reported in previous studies on jamun23,25,58, further supporting the notion that cross-pollination and wide geographical distribution contribute to the broad genetic base of this species.
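For reference, expected heterozygosity at a locus with k alleles at frequencies p_i is conventionally computed as shown below. This is the standard definition, and the study's software may apply a small-sample correction that is not shown here.

```latex
\[
H_e = 1 - \sum_{i=1}^{k} p_i^{2},
\qquad
H_e \le 1 - \frac{1}{k}
\quad \text{(with equality when } p_i = 1/k \text{ for all } i\text{)}
\]
```

Under this definition, a locus with five alleles cannot exceed He = 0.8, so loci with He above 0.8 must carry at least six alleles at fairly even frequencies.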

The polymorphic information content (PIC) values ranged from 0.316 to 0.824, with cSSR239 being the most informative marker. The mean PIC of 0.687 places these SSRs in the highly informative category, confirming their effectiveness for genotyping, genetic relationship studies, and future marker–trait association work.
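A minimal sketch of how PIC relates to allele frequencies is given below, assuming the widely used Botstein-type formula, under which values above 0.5 are conventionally regarded as highly informative. The allele frequencies are made up for illustration and are not study data.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross

# Made-up allele frequencies at a hypothetical four-allele locus.
freqs = [0.4, 0.3, 0.2, 0.1]
he = expected_heterozygosity(freqs)
pic_value = pic(freqs)
print(round(he, 4), round(pic_value, 4))
```

Because the cross term is never negative, PIC computed this way never exceeds He for the same locus, which is consistent with the mean PIC (0.687) falling below the mean He (0.7403) reported here.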

Genetic dissimilarity analysis revealed wide divergence among genotypes, with a maximum dissimilarity of 0.88 observed between CHESHJ-Wt-1 and multiple accessions (e.g., AJG-85, Konkan Bahadoli, Dhoopdal, Selection-45, Selection-58, Savadatti, IC-715, CHESHJ-V/1, CHESHJ-XIII/4). Similarly, CHESHJ-Wd-1 displayed high divergence from a comparable set of genotypes, highlighting these two accessions as genetically distinct outliers. In contrast, minimum dissimilarity (0.36) between Selection-45 and Selection-58 and between CHESHJ-I/3 and CHESHJ-III/3 indicated localized uniformity, probably due to shared ancestry or collection from proximate regions. Such patterns of divergence and relatedness are consistent with earlier reports22,26, which also documented geographically linked clustering in jamun collections.
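Pairwise dissimilarity values of the kind reported above can be obtained from allele presence/absence profiles. The sketch below uses the Jaccard coefficient purely as an assumption, since the coefficient applied in the study is not stated in this section. The accession names and 0/1 scores are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """Return 1 - shared/either for two 0/1 allele presence profiles."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # no alleles scored in either profile
    return 1.0 - shared / either

# Hypothetical accessions with made-up allele scores (not study data).
profiles = {
    "Acc-A": [1, 0, 1, 1, 0, 0, 1, 0],
    "Acc-B": [1, 0, 1, 0, 0, 1, 1, 0],
    "Acc-C": [0, 1, 0, 0, 1, 1, 0, 1],
}
names = sorted(profiles)
dist = {
    (p, q): jaccard_dissimilarity(profiles[p], profiles[q])
    for i, p in enumerate(names)
    for q in names[i + 1:]
}
most_divergent = max(dist, key=dist.get)
print(most_divergent, round(dist[most_divergent], 2))
```

The most divergent pair identified this way corresponds to the kind of candidate parental combination discussed in the text, while the least divergent pair flags possible redundancy.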

The Neighbor-Joining phylogenetic tree grouped the accessions into three major clusters, reflecting the underlying genetic structure of the germplasm. Clusters I and II contained the majority of accessions, often aligning with their geographical origin, whereas a few outlier genotypes were placed in Cluster III. The clustering pattern supports earlier findings in jamun28,59,60, where regional affinity influenced grouping but outliers carried distinct allelic combinations. The cross-pollinated nature of jamun likely contributes to this broad diversity through high rates of gene flow, while geographic separation promotes differentiation among populations.
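To show how a Neighbor-Joining tree is built from a dissimilarity matrix, the sketch below performs only the first joining step: it computes the Q criterion for every pair, joins the pair with the smallest Q, and derives the two branch lengths to the new node. The five-taxon matrix is a made-up textbook-style example, not study data, and a full implementation would repeat this step on the reduced matrix.

```python
def nj_first_join(names, d):
    """Pick the first pair joined by neighbor joining and its branch lengths.

    names: list of taxon labels; d: symmetric distance matrix (list of lists).
    Requires at least three taxa.
    """
    n = len(names)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small values mark pairs that are close to each
            # other but far from the remaining taxa.
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    len_i = 0.5 * d[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return (names[i], names[j]), q, (len_i, len_j)

# Made-up distances among five hypothetical taxa.
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value, branch_lengths = nj_first_join(names, d)
print(pair, q_value, branch_lengths)
```

In this example the pair with the smallest raw distance (d and e) is not the first to be joined, because the Q criterion also accounts for each taxon's distance to all others. This is why outlier accessions can be placed apart from geographically related groups.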

Collectively, the SSR-based analysis confirms that jamun harbors extensive molecular diversity, with both highly divergent and closely related accessions coexisting within the germplasm. The identification of genetically distinct genotypes such as CHESHJ-Wt-1 and CHESHJ-Wd-1 provides valuable resources for broadening the genetic base in breeding programs, while closely related accessions can be used for stabilizing desirable traits. The robust marker informativeness of the SSR set employed in this study establishes a strong foundation for future applications in linkage mapping, germplasm management, and marker-assisted breeding in jamun.

Breeding, conservation, and industrial applications of diverse genotypes

The combined phenotypic and molecular evidence underscores jamun’s considerable industrial potential, with distinct genotypes suited for functional foods, fresh fruit markets, and high-density orchards. Breeding strategies can capitalize on complementary parental combinations, such as crossing antioxidant-rich types with high-pulp accessions, to generate hybrids that combine nutritional value with consumer-preferred fruit qualities. At the same time, safeguarding diverse and unique accessions through systematic conservation remains critical to sustain jamun-based industrial and nutraceutical product development. Specific genotypes illustrate these opportunities: IC-715 (high pulp, low seed) and CHESHJ-V/1 (high TSS) are valuable for fresh consumption and processing, while CHESHJ-Wd-1 and CHESHJ-XI/3, with exceptionally high phenolic content and antioxidant capacity, are promising candidates for nutraceutical applications. In contrast, seed-rich accessions such as CHESHJ-VI/2 provide raw material for pharmaceutical industries utilizing seed extracts, known for their antidiabetic and antioxidant properties. Multivariate analyses revealed apparent trait-based clustering, separating antioxidant-rich genotypes from high-yielding types, offering scope for crossing programs between divergent groups—for example, combining CHESHJ-XI/3 (high total phenols and antioxidants) with Selection-58 or Kaithnal (large fruit size and yield) to develop dual-purpose cultivars. Molecular characterization further identified genetically distinct outliers such as CHESHJ-Wt-1 and CHESHJ-Wd-1, which should be prioritized in pre-breeding programs for broadening the genetic base and introducing novel alleles. From a conservation perspective, the coexistence of highly divergent and closely related accessions highlights the need for core collections that maximize diversity while minimizing redundancy. 
Unique morphotypes, such as the rare white-fruited CHESHJ-Wt-1, warrant both in situ and ex situ conservation to preserve rare alleles. On the industrial front, the dual value of jamun is reinforced by its diversity in pulp quality, sugars, acids, total phenols, anthocyanins, and seed traits. High-pulp, sweet genotypes are well suited for beverages, jams, and confectionery, while antioxidant-rich types support functional food and pharmaceutical development. Leaf- and seed-rich accessions extend their role to herbal formulations and medicinal applications, positioning jamun as a truly multipurpose industrial crop.

Conclusions

In this study, 23 jamun genotypes were evaluated using DUS descriptors, morphological and biochemical traits, and SSR markers. The SSR panel revealed high polymorphism (mean PIC = 0.687; maximum PIC = 0.824 for cSSR239), confirming its suitability for diversity analysis. Morphological and biochemical characterization grouped the genotypes into five clusters, broadly reflecting their geographical origin. Several superior accessions were identified: Kaithnal and AJG-85 for fresh fruit quality, CHESHJ-V/1 and CHESHJ-XI/3 for biochemical richness, CHESHJ-Wd-1 for antioxidant potential, and CHESHJ-Wt-1 for high TSS and dwarf stature, making it suitable for high-density planting. The close genetic similarity between Selection-45 and Selection-58, and between CHESHJ-I/3 and CHESHJ-III/3, indicates that crosses within these pairs would offer limited breeding value, while the pronounced divergence of CHESHJ-Wd-1 makes it a valuable parental resource. The overlap between morphological and molecular clustering patterns further suggests possible marker–trait associations that warrant validation through association mapping. These findings provide a strong foundation for future genetic improvement, conservation, and industrial exploitation of jamun. Promising directions include hybridization of genetically diverse parents (e.g., CHESHJ-Wd-1 × CHESHJ-XI/3) to combine large fruit size with high antioxidant content; postharvest and packaging interventions to improve shelf life and transportability; and comprehensive nutraceutical profiling of superior genotypes to strengthen jamun’s role in functional foods and pharmaceutical product development.