Introduction

Safflower (Carthamus tinctorius L.) is a significant oilseed crop valued for its edible oil, its natural dye, and its use as a high-quality component of poultry feed1,2. Iran is among the largest importers of oilseeds, which are used for cooking oil production and as components of animal feed. In recent years, increasing demand for oilseed crops has intensified interest in the cultivation and breeding of adaptable oilseed crops such as safflower3,4. Renowned for its resilience to drought and salinity, safflower is believed to have been domesticated over 4000 years ago in the Fertile Crescent (encompassing present-day Israel, Syria, Iran, Iraq, and Turkey) and the Indian subcontinent5,6,7. Water-deficit stress is one of the most significant abiotic stresses because of its unpredictable timing, severity, and duration. Many safflower-producing countries, including India, Pakistan, Turkey, and Iran, are located in semi-arid regions characterized by winter-dominant rainfall. Consequently, safflower crops in these areas often suffer yield losses caused by terminal drought stress8,9. Although safflower exhibits notable tolerance to water stress, its seed yield, like that of other crops, is reduced by terminal drought during the flowering and seed-filling stages10,11.

Because safflower can tolerate drought stress with only a slight reduction in seed yield, it is considered a climate-resilient crop. However, research on the mechanisms and genetics of drought tolerance in safflower remains limited compared with other oilseed crops, likely because of its restricted cultivation area and its status as a minor crop worldwide12,13. Improving seed yield under stress conditions is more challenging than in favorable environments, because seed yield is a complex trait governed by multiple genes and strongly influenced by environmental factors14. Understanding the genetic regulation of seed yield, oil content, and morphological traits in safflower is therefore crucial: this knowledge can help breeders design effective strategies for developing high-yielding, high-oil, drought-tolerant safflower genotypes for arid regions11.

Drought stress tolerance in plants is a highly complex trait regulated by numerous genes, making it challenging to directly select drought-tolerant genotypes based on a single trait. The drought-tolerance coefficient (DC) and stress-tolerance index (STI) quantify trait loss under drought stress compared to non-stress conditions. These indices are widely used across various crop species as reliable measures for the indirect selection of drought-tolerant genotypes15,16.

An essential breeding strategy for developing improved genotypes is to identify genes or quantitative trait loci (QTL) associated with seed yield and key agronomic traits by combining multi-environment phenotyping with high-throughput molecular analyses15. Various molecular markers, such as SSR, AFLP, RAPD, and SNP markers, have been used to assess genome diversity in safflower17,18. However, many of these studies were limited by low genome coverage, low polymorphism, high cost, and time-consuming procedures. Advances in next-generation sequencing (NGS) now allow researchers to genotype thousands of loci with extensive genome coverage. These advances have substantially reduced the cost and improved the efficiency of whole-genome sequencing (WGS) across crop species, including safflower18,19,20.

In this context, to reduce the cost and time required to generate markers with wide genome coverage, Diversity Arrays Technology (DArT), originally a microarray-based genotyping technology, was adapted into a genotyping-by-sequencing (GBS) platform known as DArTseq for various plant species21,22,23, including safflower7,24. A further advantage of DArTseq is its superior genome coverage compared with other GBS techniques, owing to its higher sequencing depth and strict marker-filtering criteria25.

Characterizing genes or QTLs associated with desirable traits using bi-parental mapping populations has several limitations. These include the limited number of alleles contributed by the two parents and the few recombination events captured in small mapping populations, both of which lead to low mapping resolution26. Compared with traditional bi-parental QTL mapping, genome-wide association studies (GWAS) offer higher mapping resolution by analyzing a large number of loci in a diverse set of germplasm accessions. This approach captures broader genetic variation and allows many alleles to be evaluated simultaneously, making it faster and more cost-effective, especially for complex quantitative traits27,28. To our knowledge, only a few GWAS have been conducted in safflower for the genetic dissection of agronomic traits, and these have often been limited by a small number of markers or have focused exclusively on traits evaluated under rainfed conditions24,29.

Building on this background, this study evaluated a panel of geographically diverse safflower genotypes in the field under two water regimes (irrigated and rainfed) over three years (2016–2018). In addition, a genome-wide association analysis was performed using high-density DArTseq markers positioned on the chromosome-level safflower reference genome to identify QTLs associated with seed yield, seed oil content, and agronomic traits in both environments. Notably, this is the first report of chromosome-level positioning of DArTseq markers on the safflower reference genome.

Methods

Plant materials and field assessment

A panel of 90 safflower genotypes, preserved at the Iranian Seed and Plant Improvement Institute (SPII), Iran, was used in this study. These genotypes were selected on the basis of a preliminary molecular phylogenetic analysis and the agronomic performance of 135 genotypes4. The panel represents germplasm from 12 countries, including the centers of origin and domestication of safflower6,7. Detailed information on genotype code, name, and origin is provided in Supplementary Table S1. The association panel was grown under both non-stressed (irrigated) and water-deficit (rainfed) conditions at the Dryland Agricultural Research Institute (DARI), Sararood, Kermanshah, Iran (34° 20ʹ N latitude, 47° 20ʹ E longitude, 1351.6 m altitude), over three consecutive seasons (2016, 2017, and 2018). Kermanshah has a hot-summer Mediterranean (dry-summer subtropical) climate (Köppen-Geiger classification: Csa) with mild temperatures and moderate seasonal variation. Summers are typically hot and dry owing to subtropical high-pressure systems, whereas winters have moderate temperatures and variable, rainy conditions influenced by the polar front.

Detailed information on soil physicochemical properties, as well as monthly precipitation and temperature data (2016–2018), is provided in Supplementary Table S2. The field experiment followed a 10 × 9 lattice design with two replications per irrigation regime (rainfed and irrigated) in each year. Seeds were sown in plots of three 3-m rows, with 50 cm between rows and 20 cm between plants within rows. Sowing took place on March 1, and harvesting was completed by September 10 in all experiments and environments. Water-deficit plots were kept under rainfed conditions (no irrigation), whereas non-stress plots received three irrigations at 10-day intervals from flowering to seed filling. Favorable rainfall and temperatures in March and April supported germination and plant establishment. However, very low rainfall and high temperatures from May to September (flowering to maturity) created a "drought stress environment" for the rainfed treatments in this experiment.

Morphological data collection and statistical analysis

Phenotypic traits for each genotype were assessed on ten plants from the middle row of each plot in each replication in both environments. At maturity, data were recorded for seed yield (SY), 1000-seed weight (TSW), number of heads per plant (NHP), plant height (PH), head diameter (HD), and number of seeds per head (NSH). Days to flowering (DTF) was recorded when 50% of the plants in each plot had flowered, and days to maturity (DTM) when 50% of the plants had reached maturity. Seed oil content (OIL) was measured on 10 g of seed from each genotype in both environments using the Soxhlet method together with NIRS equations calibrated for safflower oil content12,30. Relative water content (RWC) for each genotype in both environments was calculated using the following equation31:

$${\text{RWC}} = \, \left( {{\text{LFW}} - {\text{LDW}}} \right)/\left( {{\text{LTW}} - {\text{LDW}}} \right)$$

Here, LFW represents the leaf fresh weight, LDW is the leaf dry weight after drying at 72 °C for 5 days, and LTW is the turgid weight of leaves after soaking in water for 6 h.
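The RWC equation can be illustrated with a minimal Python sketch. The function name and the leaf weights below are hypothetical placeholders rather than measurements from this study; the function returns a fraction, which is commonly multiplied by 100 to express RWC as a percentage.

```python
def relative_water_content(fresh_g: float, dry_g: float, turgid_g: float) -> float:
    """RWC = (LFW - LDW) / (LTW - LDW), returned as a fraction (x100 for percent)."""
    if turgid_g <= dry_g:
        raise ValueError("turgid weight must be greater than dry weight")
    return (fresh_g - dry_g) / (turgid_g - dry_g)


# Hypothetical leaf weights in grams for a single sample (not study data).
rwc = relative_water_content(fresh_g=0.85, dry_g=0.20, turgid_g=1.00)
print(f"RWC = {rwc:.3f} ({rwc * 100:.1f}%)")
```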

The three-year data sets from both environments were first analyzed separately for each trait using a linear mixed model (LMM) that accounted for spatial (row and column) effects. Analysis of variance (ANOVA) was performed in SPSS (IBM Inc., Armonk, NY, United States) to evaluate the effects of genotype, year, irrigation regime, and their interactions. The resulting best linear unbiased estimator (BLUE) values for each genotype were then used to fit a multi-environment (years) LMM in which years were treated as random effects32. The BLUE values for each quantitative trait were subsequently used to compute descriptive statistics and Pearson's correlation coefficients between the measured traits with the "cor" function in R. In addition, the "FactoMineR" package was used to conduct principal component analysis (PCA) on the correlation matrix to examine the distribution of the quantitative traits.
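As a rough illustration of the descriptive statistics and Pearson correlations described above, the sketch below reimplements them in plain Python rather than R. It assumes each trait is a list of per-genotype BLUEs in the same genotype order; the function names and input values are hypothetical. In R, `cor(x, y)` uses Pearson's method by default.

```python
import math
import statistics


def describe(values):
    """Mean, minimum, maximum and coefficient of variation (CV, %) of one trait."""
    m = statistics.mean(values)
    return {
        "mean": m,
        "min": min(values),
        "max": max(values),
        "cv_pct": statistics.stdev(values) / m * 100,  # sample SD relative to the mean
    }


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length trait vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two vectors of equal length >= 3")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic with n - 2 df for testing H0: rho = 0 (undefined when |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical BLUEs for five genotypes (not data from this study).
seed_yield = [950.0, 1120.0, 870.0, 1300.0, 1010.0]
seeds_per_head = [22.0, 27.0, 20.0, 31.0, 24.0]
r = pearson_r(seed_yield, seeds_per_head)
print(describe(seed_yield), round(r, 3), round(t_for_r(r, len(seed_yield)), 2))
```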

To assess the adaptation of safflower genotypes to drought stress, the BLUE values for each trait under drought stress (DS) and non-stress (NS) conditions were used to calculate the drought-tolerance coefficient (DC) using the following equation15:

$$DC=\frac{{Y}_{DS}}{{Y}_{NS}}$$

The stress-tolerance index (STI) was then calculated using the following formula33:

$$STI=\frac{Y_{NS}\times Y_{DS}}{\left(\bar{Y}_{NS}\right)^{2}}$$

where \(Y_{DS}\) is the performance of a genotype in the DS environment, \(Y_{NS}\) is the performance of the same genotype in the NS environment, and \({{\bar{Y}}}_{NS}\) is the mean \(Y_{NS}\) of all genotypes. The DC and STI values are summarized in Supplementary Table S3. To evaluate their level of drought tolerance, the safflower genotypes were grouped on the basis of the measured morphological traits and their STI and DC values, using Euclidean distances and the weighted pair group method with arithmetic mean (WPGMA).
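The two indices can be computed from per-genotype BLUEs as in the following sketch, which simply transcribes the DC and STI formulas above. The dictionary layout, genotype labels, and yield values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def drought_indices(y_ns, y_ds):
    """Return {genotype: (DC, STI)} from per-genotype BLUEs under NS and DS conditions.

    DC  = Y_DS / Y_NS
    STI = (Y_NS * Y_DS) / (mean of Y_NS over all genotypes) ** 2
    """
    mean_ns = mean(y_ns.values())
    indices = {}
    for genotype, ns in y_ns.items():
        ds = y_ds[genotype]
        indices[genotype] = (ds / ns, ns * ds / mean_ns ** 2)
    return indices


# Hypothetical seed-yield BLUEs (kg/ha); genotype labels are placeholders.
y_ns = {"G1": 1200.0, "G2": 1000.0, "G3": 800.0}
y_ds = {"G1": 900.0, "G2": 500.0, "G3": 560.0}
for g, (dc, sti) in drought_indices(y_ns, y_ds).items():
    print(f"{g}: DC = {dc:.2f}, STI = {sti:.3f}")
```

With these made-up numbers, G1 has DC = 0.75 and STI = 1.080, G2 has 0.50 and 0.500, and G3 has 0.70 and 0.448. G3 loses relatively little yield under drought (high DC) but still has a low STI because its yield potential under non-stress conditions is low, which illustrates why the two indices are considered together.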

DArTseq markers, filtering and genome assembly

Ten seeds of each safflower genotype were sown in pots, and after 20 days, pooled fresh leaves from each pot were used for DNA extraction with the CTAB protocol34, with minor modifications35. DNA quality and quantity were assessed by 0.8% agarose gel electrophoresis, and the final concentration was adjusted to 100 ng/µl. DNA samples from the 90 safflower genotypes were sent to Diversity Arrays Technology Pty Ltd, Canberra, Australia (https://www.diversityarrays.com/), for genotyping by sequencing (GBS) with high-density SilicoDArT and SNP markers, as previously described for safflower7 and other crop species36,37,38. Quality parameters, including call rate, polymorphic information content (PIC), and marker reproducibility, were computed using DArTsoft v.7.4.7 (provided by DArT Pty Ltd). Markers were filtered to retain those with a minor allele frequency (MAF) of at least 5% and at most 10% missing data24,39. This study reports DArTseq markers for these safflower genotypes for the first time. Because their chromosomal locations were previously unknown, the physical positions of the filtered markers were determined by mapping their flanking sequences to the recently reported draft safflower genome assembly (http://safflower.scuec.edu.cn), which comprises 12 pseudochromosomes20.
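A minimal sketch of the MAF and missing-data filter follows. It assumes SNP calls coded as allele dosages (0, 1, 2) with `None` for missing calls, and it applies the missing-data limit per marker; the coding, marker names, and data are assumptions for illustration. SilicoDArT presence/absence markers would need a different MAF calculation, and this is not the DArTsoft implementation.

```python
from typing import Dict, List, Optional

Calls = List[Optional[int]]  # 0/1/2 = copies of one allele; None = missing call


def minor_allele_frequency(calls: Calls) -> float:
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    return min(p, 1 - p)


def missing_rate(calls: Calls) -> float:
    return sum(c is None for c in calls) / len(calls)


def filter_markers(markers: Dict[str, Calls], min_maf: float = 0.05,
                   max_missing: float = 0.10) -> Dict[str, Calls]:
    """Keep markers with MAF >= min_maf and a missing-call rate <= max_missing."""
    return {
        name: calls
        for name, calls in markers.items()
        if minor_allele_frequency(calls) >= min_maf and missing_rate(calls) <= max_missing
    }


# Hypothetical dosage calls for 10 genotypes at four markers.
markers = {
    "m1": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],       # MAF 0.45 -> kept
    "m2": [0] * 10,                              # monomorphic -> dropped
    "m3": [0, 1, None, None, 2, 0, 1, 2, 0, 1],  # 20% missing -> dropped
    "m4": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],       # MAF 0.05 -> kept (boundary)
}
kept = filter_markers(markers)
print(sorted(kept))
```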

Genetic diversity, population structure and linkage disequilibrium (LD)

To evaluate phylogenetic diversity among the safflower genotypes, an identity-by-state (IBS) distance matrix was calculated in TASSEL v.5.2.3740. Cluster analysis based on this genetic distance matrix was performed using the unweighted neighbor-joining (UNJ) algorithm, and principal coordinate analysis (PCoA) was conducted in DARwin ver. 5.0 software41. Population structure was analyzed with the Bayesian clustering approach implemented in Structure 2.142, using five independent runs. Each run consisted of 50,000 burn-in iterations followed by 50,000 Markov chain Monte Carlo (MCMC) iterations for each value of K (number of populations) from 2 to 10. The optimal number of populations was determined using Structure Harvester (web version 0.6.94)43. Analysis of molecular variance (AMOVA) was conducted to partition genetic variation within and among the inferred subpopulations (K) using the GenAlEx Excel add-in v.6.544.
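The IBS distance between two genotypes can be sketched as one minus the average proportion of alleles shared across loci called in both. The sketch assumes dosage coding (0, 1, 2; `None` = missing) and hypothetical genotype profiles; it illustrates the idea and is not TASSEL's code.

```python
from typing import List, Optional


def ibs_distance(a: List[Optional[int]], b: List[Optional[int]]) -> float:
    """1 - mean proportion of alleles shared identical-by-state over jointly called loci."""
    shared, n = 0.0, 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip loci missing in either genotype
        shared += 1 - abs(x - y) / 2  # |x - y| of 0, 1, 2 -> 1, 0.5, 0 of alleles shared
        n += 1
    return float("nan") if n == 0 else 1 - shared / n


def ibs_matrix(genotypes):
    names = list(genotypes)
    return names, [[ibs_distance(genotypes[i], genotypes[j]) for j in names] for i in names]


# Hypothetical dosage profiles for three genotypes.
names, dist = ibs_matrix({"G1": [0, 1, 2, 0], "G2": [0, 1, 2, 2], "G3": [2, 1, 0, None]})
for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```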

Linkage disequilibrium analysis was performed with the GAPIT package in R. Squared allele-frequency correlations (r2) with p-values < 0.001 were used to identify significant LD between locus pairs. The LD pattern was visualized as a heatmap generated with the LDHeatmap package in R45. A DArTseq marker density plot was also created with the R package "CMplot".
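The r2 statistic for a pair of markers can be sketched as the squared Pearson correlation of genotype dosages across individuals. For the significance test, the sketch assumes the common approximation that n·r2 follows a chi-square distribution with 1 degree of freedom when the loci are in linkage equilibrium, which gives a P < 0.001 cut-off of about 10.83 (for n = 90, r2 > 0.12). GAPIT's own procedure may differ, and the data are hypothetical.

```python
from typing import List, Optional, Tuple

CHI2_1DF_P001 = 10.828  # upper 0.1% point of the chi-square distribution, 1 df


def genotypic_r2(a: List[Optional[int]], b: List[Optional[int]]) -> Tuple[float, int]:
    """Squared Pearson correlation of dosages at two loci, over individuals called at both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0, n  # a monomorphic locus carries no LD information
    return sxy * sxy / (sxx * syy), n


def ld_significant(r2: float, n: int) -> bool:
    """Approximate test: n * r2 ~ chi-square(1) when the loci are in linkage equilibrium."""
    return n * r2 > CHI2_1DF_P001


r2, n = genotypic_r2([0, 1, 2, 0, 1, 2, None], [0, 1, 2, 0, 2, 2, 1])
print(round(r2, 3), n, ld_significant(r2, n))
```

In this toy example r2 is high, but with only six jointly called individuals n·r2 stays well below the cut-off, so the pair would not be declared significant; the same r2 in a panel of 90 would be.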

Genome-wide association study (GWAS)

Genome-wide association analyses were conducted using the Genomic Association and Prediction Integrated Tool (GAPIT) in R46. Marker–trait association (MTA) analysis was performed using two single-locus methods, the general linear model (GLM) and the mixed linear model (MLM), and three multi-locus GWAS models: FarmCPU47, BLINK48, and MLMM49. Manhattan and QQ plots were generated with the R package qqman version 0.1.431. The BLUE values for each trait in both environments, along with the stress-tolerance index (STI) of each trait, were used as phenotypic data for GWAS. Population structure and the kinship matrix were included in the analysis as required by each model. Markers were declared significant MTAs if they passed the false discovery rate threshold (FDR = 0.05) and reached a significance level of P ≤ 0.0001 (−log10 P ≥ 4). Sequences flanking significant MTAs (10 kb upstream and downstream) were extracted from the Carthamus tinctorius reference genome20 and analyzed with BLAST against the NCBI database to annotate potential candidate genes associated with the traits of interest.
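The significance rule above can be read as requiring both a Benjamini–Hochberg adjusted P-value (q-value) of at most 0.05 and −log10 P ≥ 4. The sketch below implements that reading; whether GAPIT combined the two criteria in exactly this way is an assumption here, and the marker names and P-values are hypothetical. In R, `p.adjust(p, method = "BH")` returns the same adjusted values as `benjamini_hochberg`.

```python
import math
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted P-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_mtas(pvalues: Dict[str, float], fdr: float = 0.05,
                     min_neg_log10_p: float = 4.0) -> List[str]:
    """Markers passing both the BH FDR cut-off and the -log10(P) threshold."""
    names = list(pvalues)
    q = benjamini_hochberg([pvalues[name] for name in names])
    return [name for name, qi in zip(names, q)
            if qi <= fdr and -math.log10(pvalues[name]) >= min_neg_log10_p]


# Hypothetical GWAS P-values for four markers (not results from this study).
pvals = {"snpA": 1e-6, "snpB": 5e-5, "snpC": 3e-3, "snpD": 0.5}
print(significant_mtas(pvals))
```

Here snpC passes the FDR cut-off (q = 0.004) but not −log10 P ≥ 4, so only snpA and snpB are retained.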

Results

Phenotypic variation of safflower genotypes

In this study, diverse safflower genotypes were evaluated for seed yield, oil content, and morphological traits across multi-year field trials (2016–2018) conducted under drought-stressed and non-stressed conditions. These trials aimed to investigate genotype responses to water-limited and irrigated environments, providing critical insights for breeding drought-tolerant safflower varieties. Analysis of variance (ANOVA) showed a significant influence of the growing year on all traits (Table 1), underscoring the impact of environmental factors such as rainfall patterns, temperature fluctuations, and soil moisture levels specific to each growing season. Furthermore, highly significant (p < 0.01) variation in seed yield and all other traits was observed among genotypes under both drought-stressed and non-stressed conditions, indicating that genetic differences play a key role in determining trait expression across water regimes.

Table 1 Combined analysis of variance with mean squares of 10 morphological traits of 90 safflower genotypes evaluated in three consecutive years under two water regimes.

The significant three-way interaction (p < 0.01) among genotype, water regime, and year highlights the complexity of genotype-by-environment (G × E) interactions: genotype performance was influenced not only by water availability but also by year-specific environmental conditions. Descriptive statistics based on the best linear unbiased estimator (BLUE) values for seed yield, oil content, and other yield-related traits are presented in Table 2. Under irrigated conditions, seed yield ranged from 590.84 to 2043.18 kg/ha, with a mean of 1115.02 kg/ha, and showed relatively low heritability (0.14). In the rainfed (drought-stressed) environment, seed yield was significantly reduced, ranging from 356.02 to 1165.33 kg/ha with a mean of 673.26 kg/ha, and heritability was slightly lower (0.12). This reduction underscores the sensitivity of safflower seed yield to water limitation, consistent with other crops in which drought-induced yield losses are attributed to insufficient water during critical growth stages. Significant differences among genotypes were also observed for yield-related traits in both environments, with consistently lower values under rainfed conditions.
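As a rough check on the size of the drought effect, the ratio of the two environment means quoted above can be worked out directly, where \(\bar{Y}_{DS}\) and \(\bar{Y}_{NS}\) denote the panel means of seed-yield BLUEs under rainfed and irrigated conditions. This ratio of means is not the same quantity as the mean of the per-genotype DC values in Table 2, so the two need not coincide:

$$\frac{\bar{Y}_{DS}}{\bar{Y}_{NS}}=\frac{673.26}{1115.02}\approx 0.604,\qquad 1-\frac{673.26}{1115.02}\approx 0.396$$

that is, mean seed yield under rainfed conditions was about 39.6% lower than under irrigation.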

Table 2 BLUE values (mean, minimum and maximum), coefficient of variation (CV), heritability (h2) and drought-tolerance coefficient (DC) for measured agronomic traits in 90 safflower genotypes under rainfed and irrigated environments.

Interestingly, seed oil content showed the opposite trend to seed yield, with a higher mean under rainfed conditions. Oil content ranged from 19.11 to 39.95% (mean 29.49%) under drought stress, compared with 21.61 to 36.69% (mean 27.60%) under irrigation (Table 2). This increase in oil content under water stress suggests that certain genotypes may enhance oil accumulation as a stress response, a trait of potential value for breeding drought-resilient safflower varieties. Overall, the measured traits showed higher CVs in the irrigated environment than in the rainfed environment, and heritability estimates were also greater under irrigation (Table 2). The higher heritability indicates that a larger share of the phenotypic variation in seed yield and related traits was genetic under favorable conditions, which may allow more predictable selection for yield-related traits in well-watered environments. Conversely, the lower heritability under drought stress highlights the need to incorporate stress-specific traits or indices, such as the stress-tolerance index (STI), to identify genotypes that combine high yield potential with resilience to water limitation. Drought-tolerance coefficient (DC) values showed distinct distributions among genotypes (Supplementary Table S3), and the mean DC values for the measured traits are presented in Table 2. Traits with DC values below 0.50, such as number of heads per plant (NHP), relative water content (RWC), head diameter (HD), number of seeds per head (NSH), and 1000-seed weight (TSW), were particularly sensitive to water stress.

Under irrigated conditions, seed yield (Yp) was positively correlated with the number of heads per plant (NHP), number of seeds per head (NSH), 1000-seed weight (TSW), and oil content (OIL). OIL was also significantly and positively correlated with both head diameter (HD) and NSH. TSW was positively correlated with NHP but negatively correlated with NSH (Fig. 1). In the rainfed environment, seed yield (Ys) was positively correlated with TSW and NSH, whereas OIL showed a significant positive correlation with HD and a significant negative correlation with plant height (PH) (Fig. 1). Principal component biplot analysis further illustrated the relationships among seed yield, oil content, and agro-morphological traits (Fig. 2). Under supplementary irrigation, the first principal component (PC1) accounted for 36.7% of the total variation, with NSH, PH, and HD as key contributors, while the second component (PC2) explained 21.2% and was mainly driven by seed yield (Yp), NHP, and TSW (Fig. 2). In the rainfed environment, PC1 and PC2 explained 29.4% and 16.2% of the variation, respectively, with seed yield (Ys), NHP, HD, days to maturity (DTM), and NSH as key contributors to PC1 (Fig. 2).

Fig. 1

Correlation coefficients between seed yield and agro-morphological traits under irrigated (a) and rainfed (b) conditions, based on BLUE values of the measured traits over 3 years (2016–2018). *Significant at the 0.05 level; **significant at the 0.01 level; ***significant at the 0.001 level.

Fig. 2

Principal component bi-plot analysis of 90 safflower genotypes grown during three seasons (2016–2018) under the irrigated and rainfed environments.

Cluster analysis based on the BLUE values of the measured traits grouped the 90 safflower genotypes into four clusters under irrigated conditions and three clusters under rainfed conditions (Fig. 3). Under rainfed conditions (Fig. 3a), clusters I and II comprised 32 and 21 genotypes, respectively, which had higher seed yields than the 37 genotypes in cluster III (Table 3). Under irrigated conditions (Fig. 3b), clusters I and III contained 33 and 30 genotypes, respectively, characterized by few seeds per head and low 1000-seed weight, resulting in lower seed yields; however, these genotypes had higher seed oil content. In contrast, clusters II and IV, comprising 24 and 3 genotypes, respectively, showed high seed yield and related traits but relatively lower oil content (Table 3). Notably, the three genotypes in cluster IV were identified as superior for seed yield under irrigated conditions. In neither environment was there a clear relationship between trait-based clustering and the geographic origin of the genotypes.

Fig. 3

Cluster analysis of 90 safflower genotypes under rainfed (a) and irrigated (b) environments based on BLUE values of seed yield and related traits.

Table 3 Mean of clusters for measured agronomic traits in 90 safflower genotypes under rainfed and irrigated environments.

To further classify the genotypes into drought-susceptible and drought-tolerant categories, cluster analysis was conducted using drought-tolerance coefficient (DC) and stress-tolerance index (STI) values. This analysis grouped the 90 safflower genotypes into four distinct clusters, each representing different levels of drought tolerance (Fig. 4).
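The WPGMA grouping used here can be sketched as agglomerative clustering on Euclidean distances in which the distance from a newly merged cluster to any other cluster is the simple average of the two merged clusters' distances. The (DC, STI) profiles below are hypothetical, and in practice traits on different scales are often standardized first; SciPy's `scipy.cluster.hierarchy.linkage` with `method="weighted"` implements the same linkage rule.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wpgma(profiles: Dict[str, List[float]], n_clusters: int) -> List[FrozenSet[str]]:
    """Agglomerative WPGMA: the merged cluster's distance to k is (d(i, k) + d(j, k)) / 2."""
    clusters = {idx: frozenset([name]) for idx, name in enumerate(profiles)}
    vectors = list(profiles.values())
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}
    next_id = len(vectors)
    while len(clusters) > n_clusters:
        (i, j), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair of clusters
        merged = clusters.pop(i) | clusters.pop(j)
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Each merged subcluster gets equal weight, regardless of its size.
            dist[(k, next_id)] = (d_ik + d_jk) / 2
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        clusters[next_id] = merged
        next_id += 1
    return sorted(clusters.values(), key=sorted)


# Hypothetical (DC, STI) profiles for five genotypes.
profiles = {
    "A": [0.90, 1.20], "B": [0.88, 1.15], "C": [0.40, 0.30],
    "D": [0.41, 0.33], "E": [0.65, 0.70],
}
for cluster in wpgma(profiles, n_clusters=3):
    print(sorted(cluster))
```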

Fig. 4

Cluster analysis of 90 safflower genotypes based on drought-tolerance coefficient (DC) and stress-tolerance index (STI) values. Cluster I, Cluster II, Cluster III and Cluster IV represent different levels of drought tolerance.

Group I consists mainly of drought-susceptible genotypes with low drought-tolerance coefficient (DC) and stress-tolerance index (STI) values. These genotypes experience substantial yield reductions under drought stress, indicating high sensitivity to water deficit. Group II includes moderately susceptible genotypes: they perform reasonably well under optimal conditions, but their yield declines under drought stress, although less severely than in Group I.

Group III represents moderately drought-tolerant genotypes that maintain relatively stable performance under both drought-stressed and irrigated conditions. Their DC values lie near the tolerance threshold, reflecting intermediate resilience. Group IV comprises highly drought-tolerant genotypes with minimal yield loss under drought. These genotypes have high DC and STI values, indicating strong adaptability to water-limited environments. In Group IV, key yield-related traits such as seed yield, 1000-seed weight, and number of seeds per head show less variation between environments, highlighting the ability of these genotypes to maintain yield stability under water stress. Conversely, the drought-susceptible genotypes in Groups I and II show substantial yield reductions, lower DC values, and lower STI values, underscoring their limited capacity to adapt to drought. Traits such as head diameter and relative water content (RWC) also decline more sharply in these groups, further reflecting their vulnerability to water stress.

The high STI and DC values observed in Group IV suggest that drought tolerance in these genotypes may be linked to specific physiological adaptations, such as the ability to maintain relative water content (RWC), a key trait associated with drought resilience. Notably, genotypes G33, G52, G51, G13, and G18 stand out for their superior STI and DC values, indicating exceptional drought tolerance. These genotypes may possess unique genetic traits, such as efficient water use or mechanisms to reduce transpiration under stress, making them valuable candidates for drought-resilient breeding programs.

The absence of strong correlation between geographic diversity and phenotypic clustering based on DC and STI suggests that drought tolerance traits are not strictly tied to geographic origin. This finding is advantageous for breeding programs, as it allows the exploration of genetic diversity across regions without constraints imposed by origin-based clustering.

DArTseq marker distribution

DArTseq genotyping was used to generate SNP and SilicoDArT markers for the panel of 90 safflower genotypes. This produced a total of 19,639 markers, comprising 10,130 SilicoDArTs and 9,509 SNPs. After applying quality-control filters (i.e., MAF > 0.1 and missing data per genotype < 10%) and aligning the markers to the 12 pseudochromosomes of the draft safflower genome20 (Carthamus tinctorius reference genome assembly GCA_018320725.1), a total of 7029 markers (2193 SNPs and 4836 SilicoDArTs) were retained for further analysis (Supplementary Table S4). The number of DArTseq markers per pseudochromosome ranged from 447 on Chr1 to 739 on Chr5, with an average of 585.7 markers per pseudochromosome (Table 4). The chromosome length covered by DArTseq markers varied from 66.76 Mbp on Chr1 to 106.47 Mbp on Chr3, corresponding to an average of 65.59 markers per Mbp across the entire genome (Table 4).
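The marker bookkeeping in this paragraph can be cross-checked with a few lines of Python. The snippet only re-adds the counts quoted above and introduces no new data; the variable names are illustrative.

```python
# Marker counts quoted in the text above.
raw_silicodart, raw_snp = 10_130, 9_509
kept_snp, kept_silicodart = 2_193, 4_836
n_pseudochromosomes = 12

raw_total = raw_silicodart + raw_snp
kept_total = kept_snp + kept_silicodart
print(raw_total, kept_total)                      # 19639 7029
print(f"retained: {kept_total / raw_total:.1%}")  # share of raw markers kept
print(f"mean per pseudochromosome: {kept_total / n_pseudochromosomes:.2f}")
```

The retained set is therefore roughly 36% of the raw markers, and the mean of 585.75 markers per pseudochromosome matches the 585.7 reported in the text.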

Table 4 Polymorphism information content (PIC), number and distribution of DArTseq markers on 12 pseudochromosomes of safflower.

Population structure, genetic diversity and LD

Kinship coefficients, calculated from the average dissimilarity between genotypes, ranged from 0.18 to 0.63, with a mean of 0.48. A heatmap of the relationships among safflower genotypes revealed three distinct groups (Fig. 5a). Population structure inferred with the Bayesian model is shown for K = 3 (Fig. 5b,c). The three subpopulations contained 21.2% (POP1), 23.4% (POP2), and 55.5% (POP3) of the safflower genotypes, with expected heterozygosity (genetic divergence) values of 0.05, 0.31, and 0.21, respectively. Cluster analysis with the unweighted neighbor-joining (UNJ) algorithm also divided the panel into three subpopulations, consistent with the population structure analysis: cluster I contained 21 genotypes from various countries, and clusters II and III contained 29 and 40 genotypes, respectively (Fig. 5d). Overall, clustering did not correspond to geographic origin, although within clusters II and III genotypes from Iran and Turkey tended to group together, and genotypes from India clustered closely. Analysis of molecular variance (AMOVA) showed that most of the genetic variation (85.5%) occurred among genotypes within populations, with a smaller proportion (14.5%) attributable to differences among groups (Table 5).
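Expected heterozygosity summarizes allelic diversity within a group. The sketch below assumes biallelic dosage data and averages 2p(1 − p) over loci for one hypothetical subpopulation. STRUCTURE reports its own values from the allele frequencies it infers for each cluster, so this illustrates the quantity rather than reproducing the reported figures.

```python
from typing import List, Optional


def expected_heterozygosity(loci: List[List[Optional[int]]]) -> float:
    """Mean over biallelic loci of He = 1 - (p^2 + q^2) = 2pq, from dosage calls."""
    values = []
    for calls in loci:
        observed = [c for c in calls if c is not None]
        if not observed:
            continue  # locus not called in this group
        p = sum(observed) / (2 * len(observed))
        values.append(2 * p * (1 - p))
    return sum(values) / len(values)


# Hypothetical dosage calls for one subpopulation: rows are loci, columns genotypes.
subpop = [[0, 2, 1, 1], [0, 0, 0, 0], [2, 2, 1, None]]
print(round(expected_heterozygosity(subpop), 3))
```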

Fig. 5

Heatmap of genomic relationships based on genetic distances (a), Delta K (b), Bayesian model-based genetic structure analysis (c) and neighbor-joining cluster analysis (d) show the three subpopulations in the main population of 90 safflower genotypes based on 7029 DArTseq markers.

Table 5 Analysis of molecular variance (AMOVA) within and between populations (K = 3) of safflower genotypes based on 7029 DArTseq markers.

Linkage disequilibrium (LD) among DArTseq markers with a minor allele frequency (MAF) > 10% was measured using pairwise r2 across the genome. LD was extensive: of the 350,176 marker pairs examined, 36,752 (about 10%) intra-chromosomal pairs showed significant LD (P < 0.001). The mean r2 and the critical r2 were 0.11 and 0.14, respectively (Supplementary Figure S1). LD decayed over a distance of 1.86 Mb, indicating that markers within this range tend to be inherited together as a block.
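The text does not state how the critical r2 or the decay distance was derived. One common convention takes the 95th percentile of square-root-transformed r2 among unlinked (for example, inter-chromosomal) marker pairs as the critical value; the sketch below uses that convention and then estimates decay crudely as the first distance bin whose mean r2 falls below it. Both the convention and the binning (instead of a fitted LOESS or Hill–Weir curve) are assumptions, and all values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Optional, Tuple


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 1]."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def critical_r2(unlinked_r2: List[float], q: float = 0.95) -> float:
    """95th percentile of sqrt(r2) among unlinked pairs, squared back to the r2 scale."""
    return percentile([math.sqrt(r) for r in unlinked_r2], q) ** 2


def decay_distance(pairs: Iterable[Tuple[float, float]], threshold: float,
                   bin_mb: float = 0.5) -> Optional[float]:
    """Start (Mb) of the first distance bin whose mean r2 drops below the threshold."""
    bins = defaultdict(list)
    for distance_mb, r2 in pairs:
        bins[int(distance_mb // bin_mb)].append(r2)
    for b in sorted(bins):
        if mean(bins[b]) < threshold:
            return b * bin_mb
    return None


# Hypothetical r2 values, not data from this study.
crit = critical_r2([0.01] * 19 + [0.25])
pairs = [(0.1, 0.5), (0.3, 0.4), (0.7, 0.2), (1.2, 0.1), (1.4, 0.05)]
print(round(crit, 4), decay_distance(pairs, threshold=0.14))
```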

Marker–trait associations (MTAs)

A multi-model GWAS incorporating both single-locus (GLM and MLM) and multi-locus (MLMM, FarmCPU, and BLINK) models identified significant marker–trait associations (MTAs) for the morphological traits and for the stress-tolerance index (STI) of each trait. Among these models, FarmCPU offered high computational efficiency and effective control of false positives. Consequently, FarmCPU results based on the BLUE values of each trait were used to identify significant MTAs. The significant MTAs for morphological traits and STI values are summarized in Tables 6 and 7, respectively. All genomic positions of markers and genes referenced in this manuscript are based on the Carthamus tinctorius reference genome assembly GCA_018320725.1, previously reported by Wu et al.23.

Table 6 List of the significant markers associated with seed yield and its component traits from genome-wide association analysis of 90 safflower genotypes under rainfed and supplemented irrigation conditions.
Table 7 List of the significant markers associated with stress-tolerance index (STI) value of measured traits from genome-wide association analysis of 90 safflower genotypes.

Under supplemented irrigation, 31 significant MTAs (P ≤ 0.0001) were detected on all chromosomes except chromosome 3 (Table 6). These significant MTAs are illustrated using Manhattan and QQ plots (Fig. 6; Supplementary Figure S2). Chromosome 4 carried the most significant MTAs (7), followed by chromosomes 1 and 10 (4 each), chromosomes 5, 6, and 8 (3 each), chromosomes 2 and 9 (2 each), and chromosomes 7, 11, and 12 (1 each). Plant height (PH) had the most significant MTAs, six in total, distributed across four chromosomes (4, 5, 6, and 10). In addition, four MTAs (Cart-SNP212, Cart-SNP356, Cart-SNP3879, and Cart-SNP6353) were significantly associated with seed yield (SY) under irrigated conditions.

Fig. 6 

Circular Manhattan plot of marker–trait associations for morphological traits in 90 safflower genotypes under the supplementary irrigation environment. The traits shown from the outermost to the innermost ring are TSW, Yp, RWC, PH, OIL, NSH, NHP, HD, DTM and DTF.

The significant MTAs and associated parameters for the measured traits under rainfed conditions are summarized in Table 6. Under drought stress (rainfed conditions), 35 significant MTAs (P ≤ 0.0001) were identified for 10 traits, as illustrated by Manhattan and QQ plots (Fig. 7; Supplementary Figure S3). Significant MTAs were found on all chromosomes except chromosomes 7 and 12. Chromosomes 4, 6, and 10 carried the most associations (six significant MTAs each), followed by chromosome 5 with five MTAs (Table 6). Four MTAs (Cart-SNP212, Cart-SNP665, Cart-SNP1893, and Cart-SNP3025) were significantly associated with seed yield (SY) under rainfed conditions. Only a few MTAs were shared between irrigated and rainfed conditions, including those for DTM (Cart-SNP2139), NHP (Cart-SNP2113 and Cart-SNP3276), TSW (Cart-SNP4990), and SY (Cart-SNP212).

Fig. 7 

Circular Manhattan plot of marker-trait associations (MTAs) for morphological traits in 90 safflower genotypes under the rainfed environment. From the outermost to the innermost ring, the traits shown are DTF, DTM, HD, NHP, NSH, OIL, PH, RWC, Ys and TSW.

Among the identified MTAs, one pleiotropic MTA (Cart-SNP2549), located on chromosome 5, was detected in both environments: it was associated with relative water content (RWC) under supplementary irrigation and with plant height (PH) under rainfed conditions (Table 6).

A total of 45 MTAs were identified across the safflower genome for the STI values of the analyzed traits at a significance threshold of −log10(P) ≥ 4.0 (Table 7; Fig. 8), and significant MTAs were detected on all chromosomes. STI-SY and STI-HD had the most MTAs (6 each), followed by STI-TSW, STI-NHP and STI-DTM (5 each), STI-RWC, STI-PH and STI-DTF (4 each), and STI-OIL and STI-NSH (3 each) (Table 7). Of these MTAs, 12 overlapped with MTAs identified for the corresponding traits under drought-stress or non-stress conditions (Table 7), suggesting that they are stable. These stable markers were subsequently used to identify candidate genes.

To identify candidate genes associated with STI values, a 10-kb window on each side of each significant MTA was examined in the safflower reference genome (Carthamus tinctorius reference genome assembly GCA_018320725.1). Gene ontology analysis for STI-DTF, STI-DTM, STI-HD, STI-NHP, STI-PH, STI-TSW and STI-SY identified 12 markers with overlapping genes (Table 8). These genes are linked to several biological processes and molecular functions and include genes encoding biotin carboxylase, a dehydrogenase, chlorophyll a-b binding protein, serine/threonine-protein kinase, a zinc finger domain-containing protein, diacylglycerol acyltransferase and carboxylesterase.
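The 10-kb window search can be sketched as a simple interval-overlap query against a gene annotation. This is an illustration only: the study located genes using the reference assembly and BLAST, whereas this sketch assumes gene coordinates are already available as (chromosome, start, end) records, and the gene IDs and coordinates below are hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 10_000  # 10 kb on each side of a marker, as described in the text


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def genes_near_marker(chrom, pos, genes, window=WINDOW_BP):
    """Return genes on `chrom` whose span overlaps [pos - window, pos + window]."""
    lo, hi = max(1, pos - window), pos + window
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records; IDs and coordinates are made up for illustration.
annotation = [
    Gene("geneA", "chr1", 1_000, 5_000),
    Gene("geneB", "chr1", 48_000, 52_000),
    Gene("geneC", "chr1", 61_000, 65_000),
    Gene("geneD", "chr2", 50_000, 51_000),
]

hits = genes_near_marker("chr1", 55_000, annotation)
print([g.gene_id for g in hits])  # ['geneB', 'geneC']
```

A gene counts as a candidate if any part of it overlaps the window, so a gene that starts outside the window but extends into it (such as geneC here) is still reported.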

Fig. 8 

Circular Manhattan plot of marker-trait associations for the stress tolerance index (STI) values of the traits. From the outermost to the innermost ring, the traits shown are STI-SY, STI-TSW, STI-RWC, STI-PH, STI-OIL, STI-NSH, STI-NHP, STI-HD, STI-DTM and STI-DTF.

Table 8 Significant MTAs and candidate genes associated with the stress tolerance index (STI) of morphological traits, identified by GWAS of 90 safflower genotypes.

Discussion

In Iran, safflower is widely cultivated in arid and semi-arid regions, where it is typically sown in winter or early spring to make the best use of available water for germination and vegetative growth. This study assessed drought tolerance in a diverse global collection of safflower genotypes under rainfed and supplementary irrigation conditions. Average monthly rainfall and temperature data recorded during the safflower growing seasons from 2016 to 2018 indicated wet conditions in early spring, followed by a dry period from late spring to early summer. As a result, the experimental site experienced drought stress in May, coinciding with the flowering stage. Significant genotype-by-environment (G × E) interactions were found for all traits analyzed. Such high G × E interactions for seed yield and morphological traits across environments are consistent with previous studies on safflower19,29.

Substantial differences in the minimum and maximum values of seed yield and its component traits were observed between rainfed and supplementary irrigation conditions, emphasizing the detrimental effects of water stress on all measured traits. These findings align with previous studies on safflower subjected to water stress across various environments10,11,50,51. Understanding the heritability of traits and whether they are governed by additive or non-additive gene action provides valuable insight into how successfully traits are transmitted across generations, which is critical for designing effective breeding programs11,52.

Across both environments, traits such as days to flowering (DTF), days to maturity (DTM) and thousand-seed weight (TSW) showed high heritability (Table 2), indicating strong genetic control of these traits. Previous field evaluations of safflower in various environments have suggested that such highly heritable traits predominantly exhibit additive gene action, and that additive effects likely play a more significant role than non-additive effects in the genetic control of seed yield and its components, including days to maturity, flowering time and plant height, under drought stress11,53. Consequently, selection strategies focused on these traits can be effective for developing high-yielding, drought-tolerant safflower genotypes.

Correlation coefficients between seed yield and its component traits were assessed in both environments. Seed yield under stress and non-stress conditions showed a positive, though moderate, correlation (r² = 0.37). This finding suggests that selecting solely for high yield under optimal conditions may inadvertently favor genotypes that are susceptible and low-yielding under drought stress. Previous studies in various crops have similarly shown that yield performance in optimal environments does not necessarily correlate with seed yield under drought stress54,55.
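A Pearson correlation between per-genotype yields in the two environments can be computed as below. This is a minimal sketch with hypothetical yields; the exact procedure used in the study (for example, whether genotype means or adjusted means were correlated) is not described in this section. The toy data show how a moderately strong correlation still allows genotypes to swap ranks between environments.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-genotype seed yields (same genotype order in both lists).
yp = [1.0, 2.0, 3.0, 4.0, 5.0]  # non-stress (supplementary irrigation)
ys = [2.0, 1.0, 4.0, 3.0, 5.0]  # stress (rainfed)

r = pearson_r(yp, ys)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")  # r = 0.80, r^2 = 0.64
```

Even with r = 0.80 here, genotypes 1 and 2 (and genotypes 3 and 4) reverse their order between environments, so a selection made only on non-stress yield would not pick the same genotypes as one made on stress yield.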

Under non-stress conditions, seed yield was positively correlated with the number of heads per plant (NHP), number of seeds per head (NSH) and 1000-seed weight (TSW) (Fig. 1). Similarly, under water stress, seed yield was positively correlated with TSW and NSH, suggesting that selecting safflower genotypes based on these traits could enhance yield in both water-stress and non-stress environments. Additionally, PCA-biplot analysis in both environments identified NHP, NSH, TSW and seed yield (Yp and Ys) as the most effective traits for assessing diversity and characterizing the safflower genotypes in this study. These findings are consistent with previous studies in safflower11,56.

Drought tolerance is a highly complex trait governed by multiple genes that affect seed yield and related characteristics. Consequently, yield and morphological traits alone may not suffice to identify drought-tolerant genotypes. Drought tolerance indices, such as the Stress Tolerance Index (STI) and Drought Coefficient (DC), are widely used to evaluate crops under abiotic stress conditions15,33. Successful breeding programs depend on selecting appropriate genotypes based on both their trait performance and their genetic potential. In this study, genotypes G33, G52, G51, G13 and G18, which had high values for both STI and DC, were identified as superior for seed yield and for positively associated traits such as TSW and NSH under both experimental conditions. These genotypes are considered stable and may possess useful genetic attributes, such as efficient water use or mechanisms for reducing transpiration under stress, making them valuable candidates for breeding programs. The observed G × E interactions indicated that genotypes in Group IV (based on DC and STI values) likely harbor genetic mechanisms that support enhanced phenotypic plasticity under varying environmental stresses. The classification of genotypes by drought tolerance (Fig. 4), together with the detailed trait data in Supplementary Table S3, provides a valuable resource for agronomists and breeders. Drought-tolerant genotypes that perform well under both stress and non-stress conditions hold significant promise for sustainable production in water-scarce environments. This classification also offers insights into regional adaptability and can inform resource allocation decisions in future breeding programs.
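Both indices are computed per genotype from its values under non-stress (Yp) and stress (Ys) conditions. This sketch assumes the commonly used definitions STI = (Yp × Ys) / (mean Yp)² and DC = Ys / Yp; the formulas applied in this study are defined outside this section and may differ in detail, and the genotype labels and yields below are hypothetical. The same functions apply to any trait (for example TSW or NSH) by passing that trait's values instead of yield.

```python
def stress_tolerance_index(yp, ys):
    """STI_i = (Yp_i * Ys_i) / (mean of Yp)^2 for each genotype i."""
    mean_yp = sum(yp) / len(yp)
    return [p * s / mean_yp ** 2 for p, s in zip(yp, ys)]


def drought_coefficient(yp, ys):
    """DC_i = Ys_i / Yp_i: stress performance relative to non-stress performance."""
    return [s / p for p, s in zip(yp, ys)]


# Hypothetical seed yields for three genotypes (labels are illustrative only).
genotypes = ["Ga", "Gb", "Gc"]
yp = [2.0, 4.0, 3.0]  # non-stress yield
ys = [1.0, 3.0, 3.0]  # stress yield

sti = stress_tolerance_index(yp, ys)
dc = drought_coefficient(yp, ys)
# Rank genotypes by STI, highest first, and show DC alongside.
for g, s, d in sorted(zip(genotypes, sti, dc), key=lambda t: t[1], reverse=True):
    print(f"{g}: STI = {s:.2f}, DC = {d:.2f}")
```

In this toy data Gb has the highest STI (high yield in both environments) while Gc has the highest DC (no yield loss under stress), which illustrates why genotypes scoring high on both indices are the most attractive candidates.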

No significant correlation was observed between the clustering of genotypes based on drought tolerance index values and the genomic relationships derived from DArTseq data. However, genotypes within specific clusters (e.g., those grouped as drought-tolerant or drought-susceptible) may show patterns of genetic similarity within the broader population structure or within the subgroups identified by the Bayesian clustering model (K = 3).

Genome-wide association studies (GWAS) are powerful and efficient tools for dissecting the genetic architecture of complex traits, such as seed yield and agronomic characteristics, under diverse environmental conditions in safflower19,29 and other crops. In this study, GWAS were conducted to identify both environmentally stable and environment-specific MTAs, providing valuable insights into the genetic basis of important traits and their G × E interactions29,57. We employed two single-locus models (GLM and MLM) and three multi-locus models (MMLM, FarmCPU and BLINK), all of which are recognized for their reliability in GWAS. Additionally, DArTseq markers, known for their high genome coverage and resolution, have been used effectively to analyze complex traits in various plant species36,38.

The mean call rate (92%) and reproducibility (99%) of the DArTseq markers used in this study are consistent with previous reports, indicating the high quality and reliability of this marker set for association mapping. GWAS was conducted on 90 globally diverse safflower genotypes using 7029 DArTseq markers, which were mapped to 12 pseudochromosomes by alignment with the safflower reference genome20.

At a significance level of P ≤ 0.0001, a total of 31 and 35 MTAs were associated with traits under irrigated and rainfed conditions, respectively (Table 6). In line with many studies on various plant species, more MTAs were detected in the drought-stressed environment than in the irrigated environment, likely due to the stronger influence of environmental factors on genotype expression under stress. For all traits, most MTAs were specific to a single environment, suggesting that these environment-specific MTAs reflect strong G × E interactions. This information can help breeders understand key trait interactions and guide targeted breeding for specific environments29. These findings are consistent with previous reports on diverse safflower collections evaluated across different environments29,56.

Association analysis for seed yield identified distinct marker-trait associations (MTAs) in both environments, located on chromosomes 1, 2, 4, 6, 7 and 11. Among these, one MTA on chromosome 1, Cart-SNP212, was shared between both environments (Table 5). Previous studies have reported several markers associated with seed yield in safflower under drought conditions on chromosomes 6, 9 and 12 29,58.

In addition, the MTAs shared across the rainfed and supplementary irrigation environments included one MTA for 1000-seed weight (Cart-SNP4990) on chromosome 9, two MTAs for the number of heads per plant (Cart-SNP2113 and Cart-SNP3276) on chromosomes 4 and 6, respectively, and one MTA for days to maturity (Cart-SNP2139) on chromosome 4. These findings provide valuable genetic resources for breeders aiming to improve desirable traits in safflower under diverse environmental conditions29. In our study, the locus Cart-SNP2549 was associated with relative water content (RWC) in the irrigated environment and with plant height (PH) in the rainfed environment, indicating a pleiotropic effect or, alternatively, linkage to distinct QTLs influencing each trait. Shared major genes or QTLs for different traits, such as days to flowering (DTF) and plant height (PH), have also been reported previously in safflower29.

We observed environment-specific MTAs for all measured traits, likely due to significant genotype-by-environment (G × E) interactions and differences in gene expression across environments29,32. Although the QTLs identified for most traits in this study were located on chromosomes previously reported in the literature, we were unable to confirm whether the exact positions of the MTAs matched those reported in earlier studies.

This discrepancy may stem from differences in the safflower germplasm, the type of molecular markers used, and the climatic conditions of the field evaluation sites. Given the complex nature of drought tolerance in plants, relying solely on yield and morphological traits may not be sufficient to effectively identify QTLs associated with drought tolerance.

In this study, GWAS using STI values identified 45 significant MTAs, 12 of which overlapped with significant MTAs for morphological traits in either environment. To further investigate the potential gene functions of these stable MTAs, we performed BLAST searches within a 10-kb flanking region around each marker. Most of the significant SNP markers identified in this study aligned with candidate genes implicated in abiotic stress responses in species closely related to Carthamus tinctorius. For instance, CCCH zinc-finger proteins are known to play crucial roles in plant development and in responses to abiotic stresses such as salt, drought, flooding, low temperature and oxidative stress. Similarly, candidate genes such as serine/threonine-protein kinase and dehydrogenase genes have been associated with drought tolerance in plants59. Furthermore, diacylglycerol acyltransferase is recognized for its role in wax ester biosynthesis and osmotic stress responses60.

Conclusion

The genetic dissection of morphological traits, their genotype-by-environment (G × E) interactions, and the identification of QTLs associated with drought tolerance are essential for developing climate-resilient safflower varieties. In this study, 90 globally diverse safflower genotypes were evaluated for seed yield, seed oil content, and morphological traits under two water regimes. Trait performance differed significantly between the rainfed and supplementary irrigation environments. Correlation and PCA-biplot analyses highlighted the number of heads per plant (NHP), number of seeds per head (NSH), 1000-seed weight (TSW), and seed yield (Yp and Ys) as effective indicators for assessing diversity and characterizing drought-tolerant safflower genotypes. GWAS based on morphological data from both environments and on STI values identified 66 and 45 significant marker-trait associations (MTAs), respectively. The environment-specific MTAs underscored the substantial impact of G × E interactions on all measured traits. Notably, MTAs that were stable across environments or associated with STI values revealed several candidate genes with potential roles in abiotic stress response, providing a solid foundation for molecular breeding efforts. The newly identified MTAs and their chromosomal locations enhance our understanding of the genetic architecture underlying drought tolerance in safflower, paving the way for marker-assisted breeding of varieties with improved drought resilience.