Introduction

Wheat, a primary cereal crop cultivated globally across diverse agro-climatic conditions, provides about 20% of human protein and calorie needs. The foremost challenges to the world’s food security stem from the changing climate and the rapidly growing human population. To meet the projected demand of 140 million tonnes of wheat by 2050, breeders must develop highly productive wheat varieties resilient to both abiotic and biotic stresses. Pre-harvest sprouting (PHS) is one of the major abiotic stresses that adversely affects the yield and quality of wheat grains, thus lowering farmers’ income. The estimated yearly losses worldwide as a result of PHS are US$ one billion1. In China alone, over 2.5 million tons of wheat are affected by PHS each year, causing a significant reduction in grain quality and market value; in India, certain regions experience sudden, unpredictable conditions, i.e., high humidity and rainfall, during the harvest period2. PHS in wheat is characterized by germination of seeds on the spike of the mother plant before harvest owing to prolonged rainfall and high humidity3,4,5; it is a significant abiotic constraint that impedes wheat productivity at physiological maturity6,7. The reduction in yield of germinated seeds is attributed to the hydrolysis of materials stored inside the endosperm, such as starch granules and protein. This leads to a decrease in both the thousand-grain weight and bulk weight. Additionally, the increase in α-amylase activity in germinated seeds negatively impacts the quality of wheat grains by reducing starch and protein content. This deterioration in quality not only affects the overall yield but also hampers seedling quality by diminishing seed vigour8,9,10,11. Beyond quality losses, yield losses from PHS can reach 20–40% in susceptible varieties under humid and wet pre-harvest conditions12. This is due to both grain shattering and embryo degradation, which reduce the physiological viability of seeds13,14. Consequently, PHS is increasingly recognized as a severe climate-induced threat that compromises the stability and safety of global wheat production. Major wheat-producing countries have undertaken extensive research to address this issue15,16.

PHS is a complex trait influenced by several key factors, including environmental conditions, seed dormancy, seed colour, α-amylase activity, seed coat permeability, levels of endogenous hormones, functional proteins, genes, and quantitative trait loci (QTL)17. PHS has been observed in a variety of cereal crops, such as maize, wheat, rice, barley, and sorghum, across numerous regions, including Japan, India, China, the United States, Australia, Canada, North Africa, and various parts of Europe14,18. It is recognized as a widespread concern, occurring approximately once every 10 years in major wheat-producing areas worldwide19. Seed dormancy is the primary genetic factor determining the level of PHS resistance in wheat. Therefore, investigations into the mechanisms of PHS resistance often focus on the genetic regulation of seed dormancy. Biparental genetic linkage analyses have reported QTL for PHS resistance on all 21 chromosomes of wheat5,20,21,22. However, the regions consistently identified are predominantly situated on chromosome 3A23,24,25,26,27 and chromosome 4A5,28,29,30. The predominant strategy for mitigating the risk of PHS is the development and use of PHS-resistant wheat varieties through selective breeding. Hence, a main goal of molecular breeding research aimed at improving pre-harvest sprouting tolerance (PHST) is the exploration and identification of key genes and loci.

Species related to wheat are considered potential reservoirs of untapped grain yield and quality traits31. Thus, there is an urgent need to characterize other wheat species, such as T. sphaerococcum (AABBDD, 2n = 6x = 42), an ancient wheat of Indian origin, to identify promising lines that exhibit improved quality traits. This wheat possesses several important characteristics, including short and robust culms, hemispherical grains, higher protein content than bread wheat, and resilience to biotic and abiotic stresses31. Despite these merits, sphaerococcum wheat has been inadequately studied32,33. After 1960, the introduction of high-yielding wheat varieties through the Green Revolution, together with the susceptibility of sphaerococcum wheat to rust, drastically reduced its distribution and cultivation34. Despite its fading cultivation, the genetic diversity of T. sphaerococcum remains important because it can be used to improve the nutritional properties and resilience of wheat crops35. Being a hexaploid species, it holds significant potential for the improvement of bread wheat, in line with the food and nutritional security goals outlined in the United Nations sustainable development agenda36.

Genome-wide association studies (GWAS) have emerged as powerful tools for dissecting the genetic architecture of complex traits like PHST in diverse wheat germplasm, including sphaerococcum wheat of Indian origin. The wheat cultivars AUS1408 and CN1905537; Lok138; SPR819839 were used as donors for the introgression of the tolerance trait in the Indian breeding program. Recent GWAS have identified several quantitative trait loci (QTLs) and candidate genes associated with PHST in T. aestivum, such as TaMKK3-A on chromosome 4A and TaVp1 on group 3 chromosomes4,40. Over the last few decades, researchers have extensively investigated the genetic basis of PHST in wheat using bi-parental mapping and association mapping. These comprehensive studies have revealed that PHST is a complex trait influenced by numerous QTLs and genes spread across all 21 wheat chromosomes41,42,43,44,45,46,47,48,49,50. A recent review of interval mapping and association mapping for PHST identified 575 known QTLs and marker-trait associations (MTAs) in wheat12. However, sphaerococcum wheat has not been used extensively for the identification of QTLs/genes associated with PHST. Furthermore, the limited genetic diversity within elite aestivum germplasm poses a bottleneck for breeding durable PHS tolerance. Ancient wheat species like sphaerococcum offer a reservoir of untapped alleles, which could be introgressed to broaden the genetic base and improve resilience under variable climatic conditions31,34. Given that QTLs identified in T. aestivum often show variable expression across environments, the discovery of robust and environment-stable QTLs in sphaerococcum wheat could complement existing breeding strategies5,20.

Haplotype analysis is a powerful method for refining MTAs and identifying superior allelic combinations in wheat. In contrast to single-SNP analysis, haplotypes take into account the combined effects of tightly linked variants within linkage disequilibrium (LD) blocks, thereby offering superior resolution in mapping studies. This approach is particularly effective in crops like wheat, which possess a large and complex genome with extensive LD and polyploidy51. Haplotype-based approaches have been widely used in wheat to analyse quantitative traits such as PHS52, grain yield53, disease resistance54, glume colour55, nitrogen-use efficiency56, and glume pubescence57.

In view of the above, the current study was designed to identify potential markers/genes associated with PHST in Indian dwarf wheat (Triticum sphaerococcum) using GWAS. A panel of 116 T. sphaerococcum accessions, a diverse collection procured from three different gene banks across the world, was genotyped using the Affymetrix 35K Axiom Wheat Breeders’ Array. The findings of the current study will improve our understanding and offer valuable gene resources for improving the PHST of Indian dwarf wheat.

Materials and methods

Plant material and data recording

A global collection of 116 accessions of T. sphaerococcum was evaluated for PHST in three environments: ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi (E1); ICAR-Indian Agricultural Research Institute (IARI), Wellington (E2); and Mahatma Phule Krishi Vidyapeeth (MPKV), Rahuri, Maharashtra (E3), with three biological replicates of each accession in 2023-24. The passport data of the accessions are given in the supplementary table (Table S1). Five spikes from each accession were harvested at physiological maturity, indicated by the loss of green colour from the spike58. A scale of 1 to 9 was used to score PHS; genotypes with no visible sprouting were given a score of 1, while genotypes with full sprouting were given a score of 9. This scoring system was adapted from59. Within one hour of harvest, spikes were soaked in water for 4–6 h. Subsequently, the spikes were incubated in a closed chamber in the laboratory at approximately 20–25 ℃ and near-saturated (90–100%) relative humidity on a 7.5 cm thick layer of moist sand and covered with two layers of wet jute bags, following the moist-chamber laboratory assay described by Baier (1987)60. To prevent drying, the spikes were watered every 3–4 h. Observations on sprouting were made ten days after the spikes were harvested and first submerged in water.

Statistical analysis

SAS v9.3 software was used to conduct one-way analysis of variance (ANOVA) to evaluate the variance components associated with PHST, such as genotype, environment, and their interaction. Best Linear Unbiased Predictions (BLUPs) were estimated for the combined phenotypic data of the three environments (CE) using a linear mixed model in which genotypes were treated as random effects and environments as fixed effects, fitted with the lme4 package in R software61. This approach accounts for environmental variation and provides unbiased predictions of genotypic performance across environments, following the methodology described by62. Descriptive statistics for the PHST trait were calculated for the four environments (E1, E2, E3, and CE) (Table S2).
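The shrinkage idea behind BLUPs can be illustrated with a minimal Python sketch. This is not the lme4 analysis used in the study: it assumes a balanced layout with one mean score per genotype per environment, treats environments as fixed and genotypes as random, and estimates the variance components by the ANOVA (method-of-moments) approach rather than REML. The function name `blup_balanced` and the accession labels are hypothetical.

```python
from statistics import mean

def blup_balanced(scores):
    """Genotype BLUPs for a balanced genotype x environment table.

    scores: dict mapping genotype -> list of scores, one per environment,
            with environments in the same order for every genotype.
    Returns (dict of BLUPs as deviations from the grand mean, shrinkage factor).
    """
    genos = list(scores)
    n_g = len(genos)
    n_e = len(scores[genos[0]])
    grand = mean(v for row in scores.values() for v in row)
    g_mean = {g: mean(scores[g]) for g in genos}
    e_mean = [mean(scores[g][j] for g in genos) for j in range(n_e)]

    # Sums of squares for genotype and for the residual (after removing
    # the additive genotype and environment effects).
    ss_g = n_e * sum((g_mean[g] - grand) ** 2 for g in genos)
    ss_res = sum(
        (scores[g][j] - g_mean[g] - e_mean[j] + grand) ** 2
        for g in genos
        for j in range(n_e)
    )
    ms_g = ss_g / (n_g - 1)
    ms_res = ss_res / ((n_g - 1) * (n_e - 1))

    # Moment estimate of the genotypic variance, truncated at zero.
    var_g = max((ms_g - ms_res) / n_e, 0.0)
    denom = var_g + ms_res / n_e
    shrink = var_g / denom if denom > 0 else 0.0

    # Each genotype mean is pulled toward the grand mean by the shrinkage factor.
    blups = {g: shrink * (g_mean[g] - grand) for g in genos}
    return blups, shrink

# Toy PHS scores (1-9 scale) for three hypothetical accessions in E1, E2, E3.
toy = {"acc_a": [1, 2, 1], "acc_b": [5, 6, 5], "acc_c": [9, 8, 9]}
toy_blups, toy_shrink = blup_balanced(toy)
```

With the 1–9 scale used here, a lower BLUP corresponds to less sprouting. For unbalanced data or replicated plots, a mixed-model package would be needed instead of this closed-form shortcut.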

SNP genotyping of wheat accessions

Genomic DNA was extracted separately from 15-day-old seedlings of each of the 116 lines following the CTAB procedure63. The association panel of 116 wheat accessions was genotyped using the 35K Axiom Wheat Breeders’ Array, following Affymetrix’s protocol (Axiom 2.0 Assay for 384 samples P/N 703154 Rev. 2) for wheat. This process resulted in the identification of 35,143 SNPs. To refine the dataset for downstream analysis, SNPs with a minor allele frequency (MAF) below 0.05 were excluded. Ultimately, 15,308 polymorphic SNPs were retained for the subsequent GWAS analysis.
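The MAF filter can be sketched in a few lines of Python. This is an illustration only, under the assumption that genotype calls are coded as allele dosages (0, 1, or 2 copies of one allele, with `None` for a missing call); the function names and SNP labels are hypothetical and do not reflect the software actually used for filtering.

```python
def minor_allele_frequency(calls):
    """MAF of one SNP from dosage-coded calls (0/1/2, None = missing)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    # Frequency of the counted allele among all observed allele copies.
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)

def filter_by_maf(genotypes, threshold=0.05):
    """Keep SNPs whose MAF is at or above the threshold."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if minor_allele_frequency(calls) >= threshold
    }

# Toy data: one common SNP, one rare SNP, one monomorphic SNP.
toy_genotypes = {
    "snp_common": [0, 2, 2, 0, 1, None],
    "snp_rare": [0] * 39 + [1],
    "snp_mono": [2, 2, 2],
}
kept = filter_by_maf(toy_genotypes)
```

In this toy set only `snp_common` passes: the rare SNP has a MAF of 1/80 and the monomorphic SNP has a MAF of zero.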

Population structure and linkage disequilibrium (LD)

Population structure and PCA were analysed in our previous study using STRUCTURE v 2.3.464. Intra-chromosomal LD between all possible pairs of SNPs was computed in TASSEL v5.0 as the squared allele frequency correlation (r²)65. The background LD was measured to determine a significant distance for LD decay. The average pattern of genome-wide LD decay across physical distance was evaluated using a scatter plot of r² values against the corresponding physical distance between markers. The degree of LD decay was evaluated using the LOESS (Locally Weighted Scatter-plot Smoother) model66. The 95th percentile of the square-root-transformed r² values of unlinked markers was used as the critical r² value67.
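As a rough illustration of the LD statistic, r² between two SNPs can be computed as the squared Pearson correlation of their allele dosages. The sketch below is an assumption-laden simplification, not the TASSEL implementation: it expects dosage-coded calls (0/1/2, `None` for missing), uses only accessions observed at both SNPs, and the function name `ld_r2` is hypothetical.

```python
from statistics import mean

def ld_r2(x, y):
    """Squared correlation between two dosage-coded SNPs (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return float("nan")
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no LD information.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)

# Two SNPs in complete LD, and two that are independent in this toy sample.
r2_linked = ld_r2([0, 0, 2, 2], [2, 2, 0, 0])
r2_unlinked = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Plotting such pairwise values against the physical distance between markers, and smoothing the scatter, gives the decay curve described above.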

GWAS and pyramiding effect of desirable alleles

MTAs for the PHST trait were identified using each of the following models: (i) Compressed Mixed Linear Model (CMLM), (ii) Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK), (iii) Fixed and random model Circulating Probability Unification (FarmCPU), (iv) Multiple Loci Mixed Model (MLMM), (v) Mixed Linear Model (MLM), and (vi) General Linear Model (GLM). All these models were implemented in R using the GAPIT software package68. CMLM, MLM, and GLM are single-locus models (SL-GWAS), whereas FarmCPU, BLINK, and MLMM are multi-locus models (ML-GWAS). GAPIT was also used to compute a marker-based kinship matrix (K). When fitting the GWAS models, the kinship matrix (K) and population structure (Q) were included as covariates. The P value was used as the criterion for identifying significant marker-trait associations, while the coefficient of determination (R²) was used to assess the magnitude of the marker effects. The false discovery rate (FDR) was used to correct for multiple hypothesis testing. A threshold of p < 0.001 was used to declare significant QTLs in the current study. MTAs detected by at least two models or in at least two environments were designated as consistent QTLs. The pyramiding effect of SNPs associated with PHST was evaluated using linear regression analysis. The number of desirable SNP alleles was used as the independent variable, while the corresponding trait values of genotypes carrying varying numbers of these alleles served as the dependent variable69.
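The pyramiding regression can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors’ script: each genotype is represented as a mapping from SNP name to its allele, the set of desirable alleles is given, and an ordinary least-squares line is fitted with the allele count as the independent variable and the PHS score as the dependent variable. All names and values are hypothetical.

```python
def count_favourable(genotype, favourable):
    """Number of SNPs at which the genotype carries the desirable allele."""
    return sum(
        1 for snp, allele in favourable.items() if genotype.get(snp) == allele
    )

def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical desirable alleles and one hypothetical genotype.
favourable = {"snp1": "A", "snp2": "C", "snp3": "T"}
n_fav = count_favourable({"snp1": "A", "snp2": "G", "snp3": "T"}, favourable)

# Toy allele counts and PHS scores (lower score = less sprouting).
slope, intercept, r_squared = simple_linear_regression([0, 1, 2, 3], [9, 7, 5, 3])
```

A negative slope in such a fit means that the PHS score falls as desirable alleles accumulate, and R² gives the share of phenotypic variance explained by the allele count.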

Mining candidate regions for key genes and haplotype blocks

The physical position of each SNP was used as input in the EnsemblPlants database (http://www.ensembl.org/info/docs/tools/vep/index.html). For each SNP, the corresponding chromosomal region was extended by 1 Mb upstream and downstream, generating a 2 Mb interval for mining potential candidate genes (CGs) associated with seed germination and dormancy. The Biomart tool, available at the EnsemblPlants database, was used to extract information on proteins encoded by the genes. To determine the potential involvement of the identified CGs in regulating the PHS trait, their annotations were confirmed through published papers. The GO annotations (including molecular function and biological process) for each CG were extracted from the IWGSC website (http://www.wheatgenome.org/).
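The interval logic can be sketched in Python as below. It is an illustration only: it assumes 1-based physical coordinates in base pairs and a simple list of gene records, whereas the study queried EnsemblPlants. The function names, gene identifiers, and coordinates are hypothetical.

```python
def candidate_window(position_bp, flank_bp=1_000_000, chrom_length_bp=None):
    """Interval of +/- flank_bp around a SNP, clipped to the chromosome."""
    start = max(1, position_bp - flank_bp)
    end = position_bp + flank_bp
    if chrom_length_bp is not None:
        end = min(end, chrom_length_bp)
    return start, end

def genes_in_window(genes, chrom, window):
    """IDs of genes on the given chromosome that overlap the window."""
    start, end = window
    return [
        g["id"]
        for g in genes
        if g["chrom"] == chrom and g["start"] <= end and g["end"] >= start
    ]

# Hypothetical gene records; real identifiers would come from the annotation.
toy_genes = [
    {"id": "gene_1", "chrom": "1B", "start": 4_200_000, "end": 4_205_000},
    {"id": "gene_2", "chrom": "1B", "start": 7_500_000, "end": 7_510_000},
    {"id": "gene_3", "chrom": "2A", "start": 4_900_000, "end": 4_910_000},
]
window = candidate_window(5_000_000)
hits = genes_in_window(toy_genes, "1B", window)
```

Here the SNP at 5 Mb on chromosome 1B yields the interval 4 to 6 Mb, and only `gene_1` falls inside it on that chromosome.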

To investigate haplotype variation within these key regions, haplotype analysis was performed for MTAs with detectable haplotype blocks in their surrounding LD region using the geneHapR package in R70,71. Two MTAs (AX-94415302, AX-95097524) did not show any haplotype blocks in their surrounding LD regions and were therefore excluded from this analysis.

Identification of superior haplotypes by Haplo-Pheno analysis

To identify superior haplotypes associated with the PHS trait, haplo-pheno analysis was performed using the geneHapR package in R70,71. This analysis evaluated the phenotypic effects of all 12 significant MTAs from the GWAS, including the two excluded from haplotype analysis, and grouped genotypes by haplotype for trait evaluation. Extreme phenotypic classes for PHST (highly resistant and highly susceptible accessions) were selected to ensure clear differentiation. Haplo-pheno analysis allowed comparison of PHST values across haplotype groups, thereby distinguishing favourable allelic combinations. Genotypes carrying haplotypes with significantly higher PHST scores were considered superior, representing promising candidates for breeding.

Results

Statistical analyses

ANOVA was performed to assess the effects of genotype and environment on the observed trait. Genotypic differences were highly significant (F = 15.97, p < 2e–16), indicating substantial variation among genotypes for the trait under study. In contrast, the effect of the environments (E1, E2 and E3) was statistically non-significant (p = 0.534), suggesting that variation attributable to environment was minimal (Table S3). The residual variance accounted for the remaining unexplained variability. Overall, these results highlight the strong genetic influence on the trait, confirming the potential for selection and genetic improvement. Notched box plots were used to visualize the distribution of PHS values across the three environments (Fig. 1a). The PHS trait showed consistent median values and distributions across E1, E2, and E3, suggesting good repeatability and minimal environmental variability. Phenotypic data from the three environments were combined using BLUPs to account for genotype-by-environment interactions and improve the reliability of trait estimates. These BLUP values were then used for GWAS to identify markers consistently associated with the PHST trait across varying environmental conditions37,72,73. Hierarchical clustering divided the accessions into four distinct phenotypic groups. Lower BLUP values indicate higher tolerance to PHS. Genotypes in Cluster 1 had the lowest values, suggesting they are highly tolerant. Cluster 3 genotypes showed slightly higher values, indicating moderate tolerance. In contrast, Cluster 4 genotypes were moderately susceptible, and the highest values were observed in Cluster 2, which suggests that these genotypes are the least tolerant to PHS (Fig. 1b). The visual scoring method used for PHS on a scale of 1–9 is shown in Fig. 1c.

Fig. 1

(a) Notched box plot showing the distribution of PHS score in different environments, (b) genotypic variation in PHST trait value across the clusters, (c) scale used for measuring PHS trait.

Population structure, marker coverage and LD analysis

Population structure analysis identified four sub-populations among the 116 T. sphaerococcum accessions, as reported in our previous paper64. After filtering, 15,308 polymorphic SNPs out of 35,144 SNPs were used for association mapping. Of the 15,308 filtered SNPs, 4,802 were mapped on the A sub-genome, 5,925 on the B sub-genome, and 4,581 on the D sub-genome (Fig. 2a). The number of SNPs mapped on individual chromosomes ranged from 255 (Chr4D) to 1,054 (Chr2B). Within the A sub-genome, the maximum number of SNPs was on Chr7A (823), followed by Chr2A and Chr5A (787); within the B sub-genome, the maximum was on Chr2B (1,054), followed by Chr1B (992); and within the D sub-genome, the maximum was on Chr2D (1,033), followed by Chr1D (742) (Fig. 2b).

Fig. 2

(a) Distribution of filtered SNPs across chromosomes and sub-genomes used for the GWAS analysis, (b) marker density plot showing the distribution of SNPs across the 21 wheat chromosomes.

Chromosome-wise LD plots were also drawn for the 15,308 SNP markers to investigate pair-wise linkage among markers. The average genome-wide r² was 0.197 for the A sub-genome, 0.172 for the B sub-genome, and 0.177 for the D sub-genome. SNP markers with assigned physical positions were further used to estimate intra-chromosomal LD. Across the 21 wheat chromosomes, the average r² was lowest for chromosome 4D (0.175) and highest for chromosome 2D (0.315). The fastest LD decay was observed for the D sub-genome, followed by the B and A sub-genomes (Fig. 3). In the D sub-genome, LD decayed within 6.81 Mb, compared with 11.24 Mb in the B sub-genome and 12.06 Mb in the A sub-genome. A detailed summary of markers, including chromosome distribution, average LD score, and other associated statistics, is presented in Table 1.

Fig. 3

Estimation of Linkage Disequilibrium (LD) decay rate for the (a) A sub-genome, (b) B sub-genome, (c) D sub-genome, and (d) whole genome.

Table 1 SNPs distribution across 21 chromosomes, their size, LD values and other associated details.

Genome-wide association analysis

A total of twelve MTAs for PHST were identified using single-locus and multi-locus models (p < 0.001) (Table 2). The genome-wide significant p-value threshold was adjusted based on Bonferroni correction. The MTAs were distributed on eight chromosomes: 1A, 1B (4), 1D, 2A (2), 3A, 3D, 5A, and 6B. The A and B genomes each harboured five MTAs, followed by the D genome with two MTAs. Five stable MTAs, AX-94415302, AX-94919611, and AX-94403953 on chromosome 1B, AX-95220897 on chromosome 6B, and AX-94756068 on chromosome 2A, were detected in all three environments and CE, and were common to three models (FarmCPU, BLINK and GLM). AX-94414200 on chromosome 1B was detected in two environments (E1 and E2) and CE. AX-94823205 (Chr1A), AX-94939596 (Chr1D), AX-94523390 (Chr2A), AX-95003297 (Chr3A), AX-94580041 (Chr3D), and AX-95097524 (Chr5A) were detected in only two environments (E1 and E3). Notably, the marker AX-94919611 (Chr1B) was detected consistently by all six models across all three locations and was therefore considered a stable, major locus. Manhattan plots and QQ plots of the GWAS results for CMLM, FarmCPU, BLINK, and MLMM are shown in Fig. 4.

Fig. 4

Circular Manhattan plots obtained from (a) BLINK, (b) FarmCPU, (c) CMLM, (d) MLMM. In each circular plot, the inner, middle and outer tracks represent the E1, E2, and E3 environments, respectively. The threshold value, −log10(p) ≥ 3, is indicated by a red dotted circle. In each case, multi-track Q-Q plots are shown alongside the circular Manhattan plots.

Table 2 List of significant markers associated with PHST (CE, E1, E2, E3).

Pyramiding effect of desirable alleles

The pyramiding effect of desirable alleles from multiple associated SNPs was evaluated using linear regression analysis. For PHST, twelve MTAs exhibited significant associations across various environments. A progressive accumulation of up to eight favourable alleles corresponded with a marked reduction in PHS levels, as illustrated in Fig. 5. The coefficients of determination (R²) for these regressions ranged from 0.173 to 0.201.

Fig. 5

Linear regression analysis depicting the relationship between the number of desirable SNP alleles (independent variable) and PHS score (dependent variable). R² = coefficient of determination; ** indicates significance at the 0.0001 level.

Identification of candidate genes and haplotype diversity

A 1 Mb region flanking either side of each significant MTA linked to the PHST trait was used to identify CGs in the annotated wheat reference sequence (Chinese Spring IWGSC RefSeq v2.1 genome assembly, 2021). A total of 176 PHS-related CGs were identified, of which 47 genes, associated with 9 markers, were directly involved in regulating seed germination and dormancy. The remaining 129 genes are likely to influence PHS indirectly through pathways related to stress response, hormone signalling, and transcriptional regulation. The 47 potential CGs encoded proteins containing 15 different types of domains. Some of the important domains include the following: (i) leucine-rich-repeat (LRR) superfamily, (ii) NAC domain superfamily, (iii) serine/threonine protein kinase, (iv) F-box domain, (v) WRKY domain, (vi) SANT/Myb domain, (vii) cytochrome P450, (viii) homeobox-like domain, (ix) WD40 repeat. A set of 14 CGs underlying 4 MTAs located on 4 different wheat chromosomes encoded proteins that contained an F-box domain (Table 3). Similarly, 10 CGs associated with 2 MTAs on nine different chromosomes encoded proteins that contained a SANT/Myb domain. Detailed information on the 176 CGs and their functional annotations is presented in Table S4.

Haplotype analysis across the LD block regions of 10 significant MTAs revealed variable numbers of haplotype blocks, reflecting genetic diversity around these candidate regions. The highest diversity was observed in the candidate region of MTA AX-94523390, with 32 haplotype blocks, followed by 27 haplotype blocks in the LD regions of AX-94414200 and AX-94409353. In contrast, the LD region of MTA AX-94823205 exhibited the lowest haplotype diversity, with only seven haplotype blocks (Fig. 6).

Fig. 6

Bar graph representing the number of haplotypes surrounding each MTA.

Table 3 A summary of MTAs and associated CGs that contain major domains controlling PHST.

Superior haplotypes regulating PHST

Haplo-pheno analysis of the significant MTAs revealed eight distinct haplotypes and allele patterns underlying variation in pre-harvest sprouting tolerance (PHST) (Fig. 7). Among these, five haplotypes (H001–H004 and H008) were predominantly associated with PHS-susceptible accessions, including TS67, TS3, TS26, TS10, and TS14. In contrast, three haplotypes, H005 (TS28), H006 (TS64), and H007 (TS81), were consistently present in PHS-tolerant accessions, representing favourable allelic combinations for PHS tolerance. These superior haplotypes exhibited significantly higher PHST values than the susceptible groups, clearly distinguishing tolerant from susceptible genotypes. The tolerant haplotypes identified here provide valuable targets for marker-assisted selection and can serve as novel genetic resources for introgression of PHS tolerance into elite wheat breeding lines (Fig. 8).

Fig. 7

Haplo-pheno analysis identified eight haplotypes, of which five (H001, H002, H003, H004, H008) were associated with susceptibility to PHS, while the remaining three (H005, H006, H007) were associated with tolerance to PHS.

Fig. 8

Developing PHS tolerant T. sphaerococcum varieties with superior haplotypes.

Discussion

Triticum sphaerococcum, often known as Indian dwarf wheat, had been cultivated for thousands of years prior to the Green Revolution. This wheat, which is native to India and Pakistan, was preferred by local farmers due to its unique characteristics, which include round grains, erect leaves, sturdy short stems, and a high tolerance to abiotic stressors. These characteristics have significant potential for modern wheat improvement74. However, modern wheat cultivars have replaced Indian dwarf wheat nationwide due to the advent of domestication and the Green Revolution. Despite this decline, Indian dwarf wheat still harbors numerous unexplored genetic variations31,34,74 that would help in overcoming the limitations of bread wheat for numerous traits. Among these, PHS represents a major challenge, as it reduces grain quality and causes substantial economic losses, particularly in regions with humid harvest conditions75. Interestingly, sphaerococcum wheat, characterized by its distinctive round grains, has recently gained attention for its potential tolerance to PHS76. Therefore, it would be wise to investigate the possibility of introgressing PHS tolerance alleles from T. sphaerococcum to bread wheat as a promising strategy for wheat improvement. In the current study, T. sphaerococcum accessions were genotyped using the 35 K Axiom Wheat Breeders’ Array, chosen for its high density and broad genome coverage. This genotyping not only provides valuable insights into the genetic makeup of the species but also offers wheat improvement programs access to diverse novel alleles that can be harnessed for enhancing PHS tolerance. To our knowledge, this study provides the first insights into PHS tolerance in T. sphaerococcum, as no previous reports are available for this species. Therefore, the findings are interpreted in comparison with those reported previously for T. aestivum.

The conventional test that simulates field conditions was used to assess PHS in this study, as in several previous investigations14,25,77,78. The test involved immersing the spikes in water and keeping them moist long enough for sprouting to occur in susceptible accessions. Other parameters, such as falling number (FN) and α-amylase activity, can also be used to assess susceptibility to PHS, but each has its limitations79. These parameters have been shown to correlate strongly with PHS. However, although they are associated with PHS, they may not be its cause, because FN and α-amylase activity measure the quality of the endosperm after sprouting rather than PHS itself25. In our experiment, the germination test with intact spikes was applied to assess how well the seeds resist sprouting, rather than to examine the condition of their endosperm. The results of the present study suggest that a germination test is suitable for determining whether accessions are likely to sprout in the field under unpredictable weather conditions at maturity.

The overall genetic richness and diversity of the wheat genome can be assessed from the distribution and density of polymorphic markers. Of the 15,308 SNPs used in our analysis, the fewest were located in the D sub-genome and the most in the B sub-genome, which is consistent with previous studies80. The B and A sub-genomes, which are considered to be older, contain a higher number of SNPs, probably because their earlier domestication events, together with gene flow and gene duplications, have allowed more mutations to accumulate over time81,82.

Since phenotypes are affected by both genes and the environment, grouping accessions using SNPs is more reliable than using phenotypes alone. Population structure analysis improves the understanding of genetic diversity, and including population structure information in the model lowers the chance of false associations and increases the accuracy of association mapping83,84. We found four subpopulations in our analysis that varied in their allele frequencies; these could be linked to genetic bottlenecks, recombination events over time, and selection pressures (artificial and natural)85,86.

The precision of association mapping depends on the selection of an appropriate number of markers, the extent of LD, and its decay rate in the mapping panel used87. Thus, it is important to determine the range of LD in the species under study. In our study, we examined the range of LD in each of the three sub-genomes separately and across the whole genome to gain a deeper understanding of the genetic architecture. According to some earlier reports, LD decay occurs faster in sub-genome B than in sub-genomes A and D88,89. However, in our study, the LD decay rate was fastest in sub-genome D, followed by sub-genomes B and A, which aligns with results from other previous studies90,91. The D sub-genome usually exhibits faster LD decay because of its higher and more consistent recombination distribution along the chromosomes92. The most likely cause of the differences in LD decay patterns among the sub-genomes across studies is the use of different study materials with distinct population stratification, selection pressures, and levels of gene flow34.

Twelve MTAs for PHS in wheat were found on chromosomes 1A, 1B, 1D, 2A, 3A, 3D, 5A, and 6B. This information sheds light on the genetic makeup of this intricate trait. Interestingly, most of these markers were found in the A and B genomes, which is consistent with earlier research that found important PHS-related QTLs on these genomes. For example, Chao et al.93 highlighted the important role of chromosomes 2B, 3A, and 4A in PHS resistance by identifying key PHS-related QTLs on these chromosomes. Munkvold et al.94 also identified QTLs on chromosome 1B, highlighting the B genome’s significance in PHST. Furthermore, a significant QTL for PHS tolerance identified on chromosome 3A by Kulwal et al.95 explained up to 78.03% of the phenotypic variance. Together, these investigations highlight the critical roles that the A and B genomes play in wheat’s tolerance to PHS. Similarly, the markers on chromosomes 1A, 1D, 2A, 3A, 3D, and 5A, which were consistently detected in two environments, and AX-94414200 on chromosome 1B, which was detected in two environments and CE, are important for PHST breeding. The consistent detection of markers AX-94415302, AX-94919611, and AX-94403953 on chromosome 1B, AX-95220897 on chromosome 6B, and AX-94756068 on chromosome 2A across all three environments, CE, and three models indicates that they are stable and reliable indicators of PHST in wheat.

QTLs for PHST/dormancy were found on up to 20 distinct chromosomes in previous wheat studies, with chromosome 1D being the only exception23,76,96,97,98,99,100,101,102,103. Interestingly, no significant associations were detected on chromosome 4A in our study, despite it being widely recognized as harbouring a major locus for PHS tolerance97,104,105,106,107,108,109,110,111,112,113,114. Although T. sphaerococcum and T. aestivum are both hexaploid wheats, the lack of a signal at the well-known 4A locus for PHS resistance in bread wheat may be due to species-specific differences in genomic architecture and allelic composition. Because our work focused on T. sphaerococcum rather than bread wheat (T. aestivum), it is plausible that the allelic variation on chromosome 4A that confers PHST in T. aestivum is either absent or fixed in T. sphaerococcum, or that its role is taken over by other genomic regions. Such species-specific divergence suggests that PHST in T. sphaerococcum may be governed by novel loci not previously reported in bread wheat, thereby underscoring the unique contribution of this species to broadening the genetic base for PHST.

The pyramiding effect of multiple associated SNPs was found to be significant, and the genotypes carrying a greater number of favourable alleles consistently exhibited superior phenotypic performance when compared to those with fewer favourable alleles (Fig. 5). Although the coefficients of determination (R² values) for the pyramiding effect were statistically significant, they were relatively low (0.173–0.201), indicating that the pyramided alleles accounted for only about 17–20% of the total phenotypic variance. This highlights the complex and polygenic nature of PHS tolerance, suggesting that additional genetic factors and interactions beyond the pyramided alleles contribute substantially to the trait.
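The relationship between the number of favourable alleles and the phenotype can be illustrated with a simple linear regression, where R² is the share of phenotypic variance explained by the allele count. The following is a minimal sketch in Python; the function name `simple_linear_regression`, the variable names, and the toy values (a hypothetical sprouting percentage, where lower means more tolerant) are assumptions for illustration and are not data from this study.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is the
    proportion of variance in y explained by x.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical toy data: favourable allele count per genotype and a
# hypothetical sprouting percentage (lower = more tolerant).
favourable_alleles = [0, 1, 2, 3, 4, 5]
sprouting_pct = [80.0, 70.0, 75.0, 55.0, 60.0, 40.0]

slope, intercept, r2 = simple_linear_regression(favourable_alleles, sprouting_pct)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```

Under this reading, an R² of 0.173–0.201 means that the allele count alone explains roughly 17–20% of the variation among genotypes, with the remainder attributable to other loci, interactions, and environment.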

The 47 potential CGs identified in this study are potentially important for controlling several molecular processes linked to seed dormancy and PHS tolerance in wheat. Gene ontology analysis revealed that these CGs are involved in several key functions such as abscisic acid (ABA) signalling, gibberellin (GA) biosynthesis, and seed dormancy and germination. A significant portion of the CGs identified in this study are involved in the ABA biosynthesis and signalling network, which makes this network central to understanding the molecular mechanisms driving PHST in wheat (Table 3). These ABA-related genes are involved in glucose signalling, metabolism, root growth, defective embryo formation, catalytic activity, and the breakdown or deactivation of ABA signalling96,97,98,99,100,115,116,117,118,119. Leucine-rich repeat (LRR) genes are well known for playing crucial roles in plant development, immunity, and stress responses because of their involvement in signal transduction pathways. Although there is currently little direct functional evidence, LRR-containing proteins in wheat are increasingly being linked to seed dormancy and PHST. The wheat genome contains several QTLs linked to PHST, particularly on chromosomes 3A and 4A, which are regions frequently enriched with genes encoding LRR proteins95,120. For example, the QPhs.ccsu-3A.1 QTL was introgressed into the PHS-susceptible cultivar HD2329, and the region contains CGs, some with LRR domains, that may influence hormonal pathways such as ABA signalling, which is critical for maintaining seed dormancy102,121. In many plant species, two main endogenous hormones, ABA and GA, are generally believed to regulate seed dormancy and germination (i.e., PHST) antagonistically103,122. While GA promotes germination, ABA promotes dormancy123,124.
The exact regulatory mechanisms controlling the balance between ABA and GA in seed dormancy and germination are still not fully understood, despite years of intensive research on these two hormones. A majority of the CGs found in our analysis contain the F-box domain (Table 3). In wheat, the F-box protein gene TaFBA1 plays an important role in abiotic stress tolerance. Given the key role of ABA in maintaining seed dormancy and preventing premature germination, F-box proteins like TaFBA1 might influence PHS tolerance by modulating ABA signalling pathways. Moreover, the interaction of F-box proteins with key components of the ABA signalling cascade, such as RCAR1 and ABI5, emphasizes their potential regulatory role in seed dormancy mechanisms106,125. Thus, F-box domain-containing genes in wheat are promising candidates for future research aimed at improving PHST through genetic and biotechnological approaches. These findings imply that the genetic regulation of PHS and seed dormancy is extremely complex, and a more thorough investigation is required to fully understand how these CGs contribute to PHS tolerance. Additionally, Myb10-D proteins are known to confer PHST by enhancing ABA biosynthesis, thereby delaying germination in wheat11.

Additionally, our haplotype analysis revealed a contrasting distribution of alleles between PHS-tolerant and susceptible genotypes, which underscores the potential utility of H005, H006, and H007 as reliable markers for MAS and haplotype-based breeding for PHST in wheat. Furthermore, the SNPs underlying the haplotypes identified in our study are located in genomic regions harbouring genes annotated for seed dormancy regulation, hormone signalling pathways such as ABA and GA, and cell wall metabolism, all of which have been previously implicated in PHS resistance105,108. Superior haplotypes (H005, H006, and H007) may therefore represent allelic variants that enhance dormancy or strengthen protective barriers, conferring greater tolerance to PHS under humid conditions. In contrast, the inferior haplotypes (H001, H002, H003, H004, and H008) may either lack these favourable alleles or carry alternate variants that reduce dormancy and increase susceptibility to PHS. These results imply that the superior haplotypes may represent favourable allele combinations rather than just statistical relationships, though functional validation will be necessary. Such insights strengthen their value as targets for MAS and for introgression into elite wheat lines to broaden the genetic base of PHST.

Conclusion

This study provides the first evidence of novel genetic variation for PHST in Indian dwarf wheat, establishing a foundation for future breeding strategies in wheat improvement. Genetic diversity analysis of T. sphaerococcum using the 35K Axiom array revealed significant genetic variability across 116 accessions. By evaluating these accessions across three environments, we identified twelve significant MTAs, several of which were consistent across environments and models and located predominantly in the A and B sub-genomes, which further supports the potential role of these genomic regions in regulating PHST. Candidate gene analysis revealed key functional categories, including ABA/GA signalling components, F-box proteins, LRR domain-containing genes, and Myb10 transcription factors, all of which are implicated in the regulation of seed dormancy and sprouting responses. The abundance of genes associated with ABA and GA signalling, particularly those with LRR and F-box domains, points to the complex hormonal interplay that regulates dormancy and sprouting responses. Additionally, the presence of favourable haplotypes (H005, H006, and H007), together with the pyramiding effect of favourable alleles, indicates the potential of haplotype-assisted selection to enhance PHST in wheat. Our study highlights the value of this neglected wheat germplasm for identifying new alleles and genetic regulatory mechanisms, offering significant potential for developing wheat cultivars that are more tolerant to PHS in the face of climate change. Future functional validation and genomic studies are essential to further unravel the complexity of PHST and effectively translate these discoveries into applied breeding strategies.