Introduction

Carrots are one of the most popular vegetables in the world, and their roots are rich in nutrients1. Bolting is a natural phenomenon in the process of flower bud differentiation and flowering in vegetables2. As a biennial plant, carrot mainly undergoes vegetative growth in the first year. After a period of vernalization at low temperature, carrot begins to bolt under suitable environmental conditions. Bolting and flowering represent pivotal stages in the seed production and breeding of carrot3. Environmental factors, such as low temperature and long days, can induce flower bud differentiation and promote bolting and flowering4,5. However, during cultivation, early bolting causes lignification of the fleshy root, resulting in reduced yield and quality and a loss of commercial value6. Therefore, the systematic identification and screening of core germplasm resources for bolting tolerance are critical steps in carrot breeding.

Bolting is one of the most important agronomic traits affecting the quality and yield of crops, and it determines their growth season and cultivation area7. The xylem of carrot lignifies immediately after vernalization, a transformation that significantly impairs palatability and reduces commercial value8,9. Bolting is regulated by multiple pathways as well as by environmental conditions4. These include the photoperiod, vernalization, gibberellic acid, autonomous, age, and temperature pathways10. Flowering time is influenced jointly by endogenous and exogenous signals11. In summary, carrot bolting is a quantitative genetic trait influenced by intricate environmental conditions and internal signaling pathways. Consequently, the identification of critical genes and molecular markers underlying this phenomenon is an essential prerequisite for the development and preservation of high-quality, commercially viable anti-bolting cultivars.

Current research on the molecular mechanisms of bolting has primarily focused on model plants and crops. Most of the genes identified in different species act in the main flowering pathways, including flowering-time genes (FTGs)12 and the flowering integration genes LFY (Leafy)11,13, TFL1 (Terminal Flower 1)14, SOC1 (Suppressor of Overexpression of Constans 1), and FT (Flowering Locus T)15. Studies on carrot have revealed the effects of low-temperature vernalization and the photoperiod pathway in regulating bolting traits16, as well as the expression patterns and underlying mechanisms of FLC (Flowering Locus C)17, CO (Constans)18, GI (Gigantea), and VIN3 (Vernalization Insensitive 3)19. RNA-seq analysis of flowering pathways in eastern and western carrot varieties showed that differentially expressed genes in the photoperiod and vernalization pathways were upregulated in western varieties, while autonomous pathway genes (Flowering Locus D (FLD), Luminidependens (LD), Flowering Locus K (FLK), and phosphatidylethanolamine-binding protein (PEBP)) were significantly enriched in eastern varieties20. Nevertheless, the molecular mechanisms underlying bolting in carrots remain largely unexplored. Therefore, exploring the molecular mechanisms and inheritance patterns associated with carrot bolting will provide valuable information for future molecular breeding, helping to select appropriate genes for bolting resistance and to improve bolting resistance without affecting yield and quality21.

Population genomics integrates the research fields of population genetics and genomics. By analyzing the genomic data of large numbers of individuals, it provides a more thorough and systematic understanding of variation patterns, evolutionary mechanisms, and the relationships between genes and phenotypes22. Genome-wide association studies (GWAS) involve sequencing natural populations and represent a systems biology method used to detect valuable single-nucleotide polymorphism (SNP) variations23. With the rapid development of molecular marker and sequencing technologies, GWAS has been widely applied to analyze the genetic mechanisms underlying variation in plant phenotypic traits, such as grain color and tannin content in Chinese sorghum24. Further, GWAS has been widely applied to rice, corn, wheat, soybean, cotton, and other crops to study their agronomic traits25. GWAS can uncover the genetic basis of phenotypic differences by identifying SNP loci and candidate genes that contribute to traits, which can improve breeding efficiency.

In this study, a total of 240 diverse carrots were used to systematically assess phenotypic variation and estimate genetic parameters for traits including bolting time, bolting rate, and bolting height. GWAS was conducted using whole-genome resequencing data to identify SNPs significantly associated with bolting-related traits and to identify candidate genes. Additionally, public transcriptomic datasets and quantitative real-time PCR (qPCR) analyses were used to validate the expression profiles of key genes implicated in bolting. The findings aim to support research on the molecular mechanisms of carrot bolting and molecular marker-assisted breeding of bolting-resistant carrots.

Materials and methods

Plant materials and phenotyping

In total, 240 carrots from the Center of the Vegetable Technical Service (Xining, China) were used in this study (Supplementary Table S1). The taproots of these carrots exhibited a variety of root tip shapes (pointed, blunt pointed, and circular) and fleshy root colors (white, yellow, orange, orange red, red, magenta, and purple).

From 2021 to 2022, 240 carrots were grown in four different locations in Qinghai province, as follows: April 2021, Tianjiazhai Town, Huangzhong District [N36°27′40″, E101°47′46″]; June 2021, Biopark District [N36°41′34″, E101°45′19″]; April 2022, Lanlongkou Town, Huangzhong District [N36°46′24″, E101°29′34″]; June 2022, Xingquan Town, Huangyuan County [N36°43′34″, E101°12′58″]. All experiments were conducted using a randomized block design. Strip planting was adopted as the sowing method, and the land preparation method was ridge covering with plastic film. The ridge height was 20 cm, the ridge width was 80 cm, and the plant spacing was 5 cm; each ridge was planted in four 1.5 m-long rows. The seedlings were covered with non-woven fabric for moisture preservation during the emergence period, and the non-woven fabric was removed after the seedlings had completely formed. The fields were irrigated and fertilized according to standard local cultivation practices.

In each of the four experimental areas, five lines of each variety were randomly selected for investigation. According to the description specification and data standard of carrot germplasm resources26, seven agronomic traits were measured: root tip phenotype (RTP), central column phenotype (CCP), epidermal traits (ET), fleshy root color (FRC), root thickness (RT), central column thickness (CCT), and single root weight (SRW). SRW was measured using an electronic balance. Bolting time (BT) refers to the number of days from the date of planting to the start of bolting. Bolting rate (BR) is the ratio of the number of bolting plants to the total number of plants. Bolt height (BH) refers to the distance from the base of the bolt to the top of the flower stem. Bolting speed (BS) is the ratio of bolt height to bolting time.
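
The trait definitions above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names, and the choice to assign a bolting speed of 0 to accessions that never bolted are assumptions, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class BoltingRecord:
    """One accession in one environment (hypothetical record layout)."""
    days_to_bolting: float  # BT: days from planting to the start of bolting
    bolted_plants: int      # number of plants that bolted
    total_plants: int       # number of plants scored
    bolt_height_cm: float   # BH: base of the bolt to the top of the flower stem


def bolting_rate(rec: BoltingRecord) -> float:
    """BR = bolted plants / total plants."""
    return rec.bolted_plants / rec.total_plants


def bolting_speed(rec: BoltingRecord) -> float:
    """BS = BH / BT, in cm per day.

    Assumption: an accession that never bolted is given BS = 0.
    """
    if rec.bolted_plants == 0:
        return 0.0
    return rec.bolt_height_cm / rec.days_to_bolting


# Made-up example values, for illustration only.
example = BoltingRecord(days_to_bolting=60, bolted_plants=5,
                        total_plants=20, bolt_height_cm=90.0)
print(bolting_rate(example), bolting_speed(example))
```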

Genome resequencing

Genomic DNA was extracted from young leaves of the 240 carrots using a new plant genomic DNA kit, following the manufacturer’s instructions. Genome sequencing was performed by Wuhan Maitewei Biotechnology Co. Ltd. DNA libraries were sequenced on the Illumina HiSeq PE150 platform. The experiment generated a total of 1.11 Terabytes (Tb) of raw data. The raw sequencing data were strictly filtered as follows: (1) read pairs containing adapter sequences were removed; (2) read pairs were removed if the N content of a single-end read exceeded 10% of its length; (3) read pairs were removed if low-quality bases (quality value Q ≤ 5) exceeded 50% of the length of a single-end read. After filtering, we obtained 1.10 Tb of high-quality clean data.
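
The three filtering rules can be expressed as a small predicate. The sketch below is not the pipeline actually used; the function names, the in-memory read representation, and the decision to discard a pair when either mate fails a rule are assumptions made for illustration.

```python
def n_fraction(seq: str) -> float:
    """Fraction of bases in a read that are N."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def low_quality_fraction(quals: list[int], cutoff: int = 5) -> float:
    """Fraction of bases with Phred quality <= cutoff."""
    return sum(q <= cutoff for q in quals) / len(quals) if quals else 0.0


def keep_pair(read1, read2, has_adapter: bool = False) -> bool:
    """Return True if a read pair survives the three filters.

    Each read is a (sequence, qualities) tuple. Adapter detection is
    assumed to be done elsewhere and passed in as a flag.
    """
    if has_adapter:                            # rule 1: adapter contamination
        return False
    for seq, quals in (read1, read2):
        if n_fraction(seq) > 0.10:             # rule 2: more than 10% N
            return False
        if low_quality_fraction(quals) > 0.50: # rule 3: more than 50% Q <= 5
            return False
    return True
```

For example, a 20-base read with three N bases (15% N) causes its pair to be discarded, whereas a read in which exactly half of the bases have Q ≤ 5 is kept, because the rule uses a strict "exceeds" comparison.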

SNP calling and population genetic analysis

We aligned the clean reads to the carrot reference genome (Daucus carota v2.0) using BWA (version 0.6, parameters: mem -t 4 -k 32 -M)27. Duplicates were removed from the alignments with SAMTOOLS28 (version 1.17, parameter: rmdup). The size of the genome excluding gap regions was 386,795,034 bp. The average alignment rate across samples was 94.69%, and the average sequencing depth of the genome (excluding gap regions) was 9.89× (considering only reads with alignment quality greater than 0). The average coverage was 87.74% (positions covered at least once) (Supplementary Table S4). Population SNPs were detected using SAMTOOLS (version 1.17)28. To minimize false positives, SNPs were filtered using a per-sample sequencing depth > 4×, a minor allele frequency (MAF) ≥ 0.05, and a missing rate < 0.2 to obtain high-quality SNPs. The SNPs were annotated using ANNOVAR (version 2013-06-21)29.
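
The SNP filtering criteria can be illustrated as follows. The input layout is hypothetical, and the way the depth filter interacts with the missing rate (a low-depth call is treated as missing) is an assumption; the text does not specify the order in which the filters were applied.

```python
def passes_site_filter(calls, min_depth=4, min_maf=0.05, max_missing=0.2):
    """Decide whether one biallelic SNP site is kept.

    calls: one (alt_allele_count, depth) tuple per sample, where
    alt_allele_count is 0, 1, or 2, or None for an uncalled genotype.
    Assumption: calls with depth <= min_depth are counted as missing.
    """
    if not calls:
        return False
    kept = [alt for alt, depth in calls if alt is not None and depth > min_depth]
    missing = len(calls) - len(kept)
    if missing / len(calls) >= max_missing:   # require missing rate < 0.2
        return False
    alt_freq = sum(kept) / (2 * len(kept))    # diploid allele frequency
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf                     # require MAF >= 0.05


# Ten samples, two heterozygotes, all at depth 10: MAF = 2 / 20 = 0.10.
site = [(0, 10)] * 8 + [(1, 10)] * 2
print(passes_site_filter(site))
```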

TreeBeST software (version 1.9.2, https://treesoft.sourceforge.net/treebest.shtml) was used to generate a neighbor-joining phylogenetic tree with the Kimura 2-parameter model and 1000 bootstrap replicates. MEGA (version 6.0, http://www.megasoftware.net/) was used to visualize the phylogenetic trees30. GCTA software (version 1.24.2, http://cnsgenomics.com/software/gcta/) was used to perform principal component analysis (PCA) to evaluate genetic structure31. Population genetic structure was examined using the expectation-maximization algorithm implemented in ADMIXTURE (version 1.23)32. PopLDdecay software (version 3.40) was used to assess the linkage disequilibrium (LD) coefficient (r²) between pairwise SNPs across the carrot genome33.

GWAS analysis and candidate gene identification

In GWAS, individual kinship and population stratification are the primary sources of false-positive associations. The mixed linear model (MLM) controls false positives by simultaneously accounting for population structure and individual kinship, thereby reducing computational burden while maintaining analytical accuracy. Using SNP data derived from resequencing, we performed trait association analyses under the MLM with the genome-wide efficient mixed model association program (GEMMA, version 0.98.1, parameters: -bfile -p -c -k -lmm 1 -miss $miss -maf $maf)34. In this model, population genetic structure was incorporated as a fixed effect, whereas individual kinship was modeled as a random effect, to correct for their respective influences. Candidate SNPs were identified based on a significance threshold (P-value < 0.05/N). The significance threshold was estimated as −log10(P) = 5.0. The genomic regions within the LD block of the peak SNPs were selected to identify candidate genes. The association results were visualized as Manhattan and quantile-quantile (Q-Q) plots generated with the qqman package in R35. The best linear unbiased estimator (BLUE) values of the carrot bolting traits were calculated with a mixed linear model using the lme4 package36.
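
The relationship between a Bonferroni-style cutoff (P < 0.05/N) and its −log10(P) value can be worked through with a short calculation. The text does not state how N was defined (for example, all SNPs or an effective number of independent tests), so the function names and the values of N used below are purely illustrative.

```python
import math


def bonferroni_neglog10(alpha: float, n_tests: float) -> float:
    """-log10 of the Bonferroni-corrected cutoff alpha / n_tests."""
    return -math.log10(alpha / n_tests)


def implied_n_tests(neglog10_threshold: float, alpha: float = 0.05) -> float:
    """Number of tests N for which alpha / N equals 10**(-threshold)."""
    return alpha * 10 ** neglog10_threshold


# Illustrative values of N only; not taken from the study design.
for n in (5_000, 1_000_000):
    print(n, round(bonferroni_neglog10(0.05, n), 2))

# N for which alpha = 0.05 corresponds to a -log10(P) cutoff of 5.0.
print(round(implied_n_tests(5.0)))
```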

Transcriptome and quantitative real-time PCR analysis

To investigate the expression patterns of LOC108205243 in different carrot tissues, the RNA-seq data from a previous study were downloaded from the SRA database of NCBI (Accession: SRP062159)37. RNA-seq reads of all samples were mapped to the carrot genome by the Hisat2 tool of the TBtools software38. Then, the sequencing data was converted into gene expression matrices using the TBtools software. Subsequently, the TPM expression matrix was normalized using the formula log2 (TPM + 1) in R software. The expression matrix was visualized using Chiplot (https://www.chiplot.online/).
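
The normalization step is a simple element-wise transform. Although it was performed in R, the sketch below shows the same log2(TPM + 1) calculation in Python; the gene-to-values dictionary layout and the example numbers are assumptions for illustration.

```python
import math


def log2_tpm_plus_one(tpm_matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Apply log2(TPM + 1) to every value of a gene -> TPM-list mapping.

    The +1 pseudocount keeps genes with TPM = 0 at a finite value (0.0).
    """
    return {gene: [math.log2(value + 1) for value in values]
            for gene, values in tpm_matrix.items()}


# Made-up TPM values for three tissues.
normalized = log2_tpm_plus_one({"LOC108205243": [0.0, 1.0, 3.0]})
print(normalized)
```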

To investigate the role of candidate genes in the carrot bolting process, we selected three carrots (B228, B227, and 21DL-10) that exhibited high, medium, and low bolting speeds, respectively, for subsequent experiments. Leaf RNA extraction and cDNA synthesis were performed using the OminiPlant RNA Kit (DNase I) from Cwbio (code: CW2598S) and the MonScript™ RTIII All-in-One Mix with dsDNase kit from Monad (code: MR05101), following the manufacturers’ instructions. Real-time PCR was performed on the ABI7500 real-time PCR system. QIAGEN 2x QuantiNova SYBR Green PCR Master Mix (code: 208054) was used with gene-specific primers (Supplementary Table S5). The qPCR cycling conditions were as follows: 95 ℃ for 30 s for pre-incubation, followed by 40 cycles of 95 ℃ for 10 s, 60 ℃ for 30 s, and 72 ℃ for 30 s. Tubulin was used as an internal control for normalization. The transcription levels of LOC108205243 relative to tubulin were calculated using the 2^(−ΔΔCt) method, with three replicates per sample39.
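
The 2^(−ΔΔCt) calculation can be sketched as below. The Ct values are made up, and the choice of calibrator sample is an assumption, since the text does not state which genotype served as the reference.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method.

    ct_target / ct_reference: replicate Ct values of the target gene and
    the internal control (tubulin here) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same for the calibrator sample.
    """
    delta_sample = mean(ct_target) - mean(ct_reference)
    delta_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values, three technical replicates each.
fold = relative_expression([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                           [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
print(round(fold, 2))
```

A ΔΔCt of −2 (the target crosses threshold two cycles earlier than in the calibrator, after normalization) corresponds to a roughly fourfold higher relative expression.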

Sequence analysis and prediction of interacting proteins

The sequences of the candidate genes were downloaded from the NCBI database (Daucus carota v2.0), and ClustalX (version 2.1)40 and MEGA (version 7.0)41 were used for multiple sequence alignment and phylogenetic analysis. Related sequences from Arabidopsis were obtained from the TAIR database (https://www.arabidopsis.org/). The proteins interacting with LOC108205243 were predicted using the STRING database (https://string-db.org/cgi/input.pl)42.

Statistical analysis

SPSS software (version 17.0) was used to calculate the mean (x̄), maximum (max), minimum (min), and standard deviation (σ) of each trait43. The coefficient of variation (CV = (σ / x̄) × 100%) was computed in Excel. Based on the mean and standard deviation of the carrot germplasm data, the bolting-related traits were divided into 10 classes: class 1, Xi < (x̄ – 2σ); class 10, Xi ≥ (x̄ + 2σ); each intermediate class interval was 0.5σ. The diversity index (H') was then calculated as H' = –Σ pi ln pi, where pi is the proportion of accessions falling within class i of a given trait. Correlation analysis was conducted using SPSS software44. Specifically, the Spearman correlation coefficient was calculated, and a two-tailed significance test was conducted. The data were also subjected to the least significant difference (LSD) test using SPSS software. The means and standard errors were calculated, and p < 0.05 was considered statistically significant. The frequency distribution graphs of the phenotype values and the correlation heatmap were plotted using Origin software.
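
The CV, the 10-class binning, and the diversity index H' can be reproduced with the short sketch below. The function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the trait values in the example are invented.

```python
import math
from collections import Counter
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (sd / mean) * 100, using the sample standard deviation."""
    return stdev(values) / mean(values) * 100


def trait_class(x, avg, sd):
    """Class 1: x < avg - 2sd; class 10: x >= avg + 2sd; classes 2-9 are 0.5 sd wide."""
    if x < avg - 2 * sd:
        return 1
    if x >= avg + 2 * sd:
        return 10
    return min(9, 2 + int((x - (avg - 2 * sd)) // (0.5 * sd)))


def shannon_index(class_counts):
    """H' = -sum(p_i * ln p_i) over the classes that are actually occupied."""
    total = sum(class_counts)
    return -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)


def diversity_index(values):
    """Bin the values into the 10 classes and return H'."""
    avg, sd = mean(values), stdev(values)
    counts = Counter(trait_class(v, avg, sd) for v in values)
    return shannon_index(list(counts.values()))


heights = [0.0, 12.5, 30.0, 45.0, 60.5, 80.0, 95.0, 148.5]  # invented bolt heights (cm)
print(round(coefficient_of_variation(heights), 2), round(diversity_index(heights), 2))
```

Because H' depends only on how evenly accessions are spread across the classes, its maximum with 10 classes is ln 10 ≈ 2.30, which gives a scale for interpreting diversity index values.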

Results

Genetic variation

In total, 240 carrots were selected for whole-genome resequencing analysis, and premature bolting during carrot cultivation was investigated. Our analyses covered three major quantitative root traits and four bolting-related traits. We analyzed the frequency distribution of these seven agronomic traits across the 240 carrots (Fig. 1). The traits displayed a unimodal continuous distribution, with traits other than bolting rate approaching a normal distribution. These results align with the genetic properties of quantitative traits governed by polygenic inheritance. The coefficients of variation of the seven traits among the carrots ranged from 22.78% to 137.86%. Single root weight ranged from 52.60 to 533.83 g, with a coefficient of variation of 49.94%. Bolt height ranged from 0 to 148.50 cm, with a coefficient of variation of 71.59%. The bolting rate had the highest coefficient of variation, 137.86%. This indicates that single root weight and bolt height differed substantially among the carrots. Bolting tolerance also differed substantially among the carrots, indicating differences in the quality of the fleshy roots. The diversity indices of the seven traits ranged from 1.38 to 2.09, indicating that the genetic diversity of the seven traits in the 240 carrots was relatively rich. The order of the diversity indices of the four bolting traits was as follows: bolting speed (1.75) > bolting height (1.59) > bolting rate (1.54) > bolting time (1.38). The diversity indices of the three fleshy root traits were ranked as follows: central column diameter (2.09) > root diameter (1.98) > single root weight (1.71). Thus, bolting speed had the highest diversity index among the bolting traits, and central column diameter had the highest diversity index among the fleshy root traits.

The bolting status of the different varieties in the field was diverse (Fig. 2A). Bolting caused the carrots to lose their commercial value (Fig. 2B). A correlation analysis was conducted on the four bolting traits and seven agronomic traits of the carrots, as shown in Fig. 2C. The results showed highly significant positive correlations between bolting time (BT) and bolting rate (BR), bolt height (BH), and bolting speed (BS), with correlation coefficients of 0.39, 0.40, and 0.33, respectively. The correlation coefficients between bolting rate (BR) and bolt height (BH) or bolting speed (BS) were 0.79 and 0.85, respectively. Among the bolting traits, the correlation coefficient between bolt height (BH) and bolting speed (BS) was the highest, at 0.98. Furthermore, the results revealed a strong positive association between bolting rate (BR) and both the central column phenotype (CCP) and the epidermal traits (ET) of fleshy roots. The fleshy root color (FRC) exhibited a significant negative correlation with root thickness (RT) and central column thickness (CCT). Additionally, a marked positive correlation was observed between the central column phenotype (CCP) and both bolting speed (BS) and bolt height (BH). A clear correlation was also identified between central column phenotype (CCP) and single root weight (SRW). Agronomic traits that correlate significantly with bolting characteristics may be used for the comprehensive selection of bolting-tolerant carrot germplasm.

Fig. 1

Frequency distribution map of agronomic traits in 240 carrots.

Fig. 2

Correlation analysis of bolting-related traits of carrot. (A) Bolting phenotype of carrot in the field. (B) Taproot phenotype of bolting and non-bolting plants. (C) Correlation analysis of bolting-related traits of carrot. * p ≤ 0.05; ** p ≤ 0.01.

Cluster analysis according to bolting-related traits

Cluster analysis of the four bolting-related traits among the 240 carrots was conducted using the Ward method and divided the carrots into four major groups, as shown in Fig. 3A. Based on the clustering results, a statistical analysis of the bolting-related traits among the four groups was conducted (Supplementary Table S2). The first category included 71 carrots with a 2-year bolting rate of 0, representing genotypes that are extremely tolerant to bolting, and their calculated Best Linear Unbiased Estimator (BLUE) values were all 77.27. The second category included 122 carrots; their average bolting rate, bolting height, and bolting speed, as well as their BLUE value (105.09), were higher than those of the other three categories. This indicates that this type of material has a short bolting time, high bolting rate, and fast bolting speed and belongs to the category of easy-bolting carrots. There were 30 carrots in the third category, with a bolting rate lower than the average for all materials; these are bolting-tolerant materials. There were 17 carrots in the fourth category, which had the longest bolting time but a lower bolting rate; these are carrots with medium bolting tolerance.

Fig. 3

Population structure and linkage disequilibrium (LD) of 240 carrots. (A) Clustering heatmap of bolting time, bolting rate, bolt height, and bolting speed from 240 carrots. (B) SNP detection and annotation statistics in 240 carrots by resequencing. (C) PCA of all carrot genotypes. The principal components (PC1, PC2 and PC3) were used to visualize the relationships among individuals and groups. Each point represents an independent carrot genotype. (D) The LD decay plot of 240 carrots by r².

Population structure and linkage disequilibrium (LD) analyses

After sequencing the 240 carrots, we identified 50,596,401 SNPs, and after filtering, 19,056,107 high-quality SNPs remained. The filtered SNP sites were annotated using ANNOVAR software, and a total of 1,222,620 SNPs were annotated (Fig. 3B). The SNPs were unevenly distributed across the nine chromosomes. ADMIXTURE was used to analyze the population structure of this natural population. The cross-validation (CV) error reached its minimum at K = 4, suggesting that the optimal number of subgroups is four (Supplementary Fig. S1). PCA based on the SNPs of the 240 carrots revealed that varieties from diverse countries and regions were intermingled without distinct geographical clustering (Fig. 3C). Additionally, a distance matrix was computed using TreeBeST, which classified the 240 carrot accessions into four primary groups comprising 102, 55, 49, and 34 varieties, respectively (Supplementary Fig. S2). These findings collectively corroborate the reliability of the inferred population structure. We then estimated the LD of the 240 carrots using the 19,056,107 high-quality SNPs. The r² values were calculated by PopLDdecay, and LD decay plots were drawn. At an r² threshold of 0.1, the physical distance at which LD decayed was 35 kb (Fig. 3D). The smooth LD decay curve indicates that the SNPs are evenly distributed and sufficiently dense across the genome. This result is also consistent with the slow LD decay expected from the large number of inbred lines (185/240, Supplementary Table S1) in the material.

Association mapping and screening of SNPs

In the marker-trait association analysis of the 240 carrots, nine significantly associated SNPs were detected at the applied significance threshold (−log10 P > 5) for bolting speed in 2022 (Table 1). This implies that these loci are closely related to bolting speed. Notably, one leading SNP (33,254,067) was observed on chromosome (Chr) 1, which was associated with both bolting speed and bolt height (Fig. 4A, B). The BLUE value was also associated with this significant site in the GWAS (Fig. 4C). The SNP at genome coordinate 33,254,067 was therefore considered an interesting candidate SNP for bolting speed. Additionally, 36 and 47 significantly associated SNPs were detected for bolting time (BT) and bolting rate (BR), respectively, in 2022, distributed across all nine chromosomes (Supplementary Fig. S3A, B).

Based on the LD level, a total of 93 candidate genes were identified within the ± 100 kb regions on either side of the nine peak SNPs (Supplementary Table S3). The selected SNP with a peak position of 33,254,067 on chromosome 1 was associated with 12 candidate genes, nine of which had annotations. These included LOC108193703, encoding an annotated mitochondrial zinc maintenance protein, and LOC108195189, encoding a pectinesterase inhibitor. Notably, the functional annotation of LOC108205243 indicated that it encodes an E3 ubiquitin-protein ligase, suggesting that it might be a candidate gene responsible for carrot bolting.
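
The candidate-gene search described above amounts to an interval-overlap query around each peak SNP. A minimal sketch follows; the gene names and coordinates in the example are hypothetical placeholders, not entries from the carrot annotation.

```python
def genes_near_snp(chrom, snp_pos, genes, flank=100_000):
    """Return names of genes overlapping [snp_pos - flank, snp_pos + flank].

    genes: iterable of (name, chromosome, start, end) tuples with
    1-based inclusive coordinates (an assumed layout).
    """
    lo, hi = snp_pos - flank, snp_pos + flank
    return [name for name, g_chrom, start, end in genes
            if g_chrom == chrom and start <= hi and end >= lo]


# Hypothetical annotation records, for illustration only.
annotation = [
    ("geneA", "Chr1", 33_150_000, 33_155_000),  # overlaps the left edge
    ("geneB", "Chr1", 33_353_000, 33_360_000),  # overlaps the right edge
    ("geneC", "Chr1", 33_400_000, 33_410_000),  # outside the window
    ("geneD", "Chr2", 33_250_000, 33_255_000),  # wrong chromosome
]
print(genes_near_snp("Chr1", 33_254_067, annotation))
```

Genes that only partly overlap the window are retained here; whether partial overlaps were counted in the study is not stated, so this is a design choice of the sketch.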

Table 1 Genome-wide SNPs around significant peaks associated with bolting speed of Daucus carota in 2022.

Fig. 4

GWAS analysis for bolting speed in carrot. (A) Manhattan plots (left) and quantile-quantile plots (right) depicting results of the GWAS for bolting speed in 2022. (B) Manhattan plots (left) and quantile-quantile plots (right) depicting results of the GWAS for bolting height in 2022. (C) Manhattan plots (left) and quantile-quantile plots (right) depicting results of the GWAS for BLUE. The x-axis depicts the physical location of SNPs across the 9 chromosomes of carrot, and the y-axis depicts the −log10(P-value).

Identification of the candidate gene

The LD around LOC108205243 is shown in Fig. 5A. LOC108205243 was located 1360 bp upstream of the peak SNP (33304067). To further investigate the role of the candidate gene LOC108205243 in the regulation of bolting traits, protein structure analysis was performed (Supplementary Fig. S4), which showed a typical RING domain structure. Protein interactions were predicted to explore the function of the LOC108205243 protein in the carrot protein network (Supplementary Fig. S5). The results indicate that LOC108205243 may be co-expressed with the transcription factor LONESOME HIGHWAY (LHW; DCAR_012241, DCAR_018203), CYCLIN-DEPENDENT KINASE INHIBITOR 7 (CDKi7; DCAR_022035, DCAR_025727), and other proteins. We further analyzed the relationship between LOC108205243 and its orthologs (Fig. 5B). The amino acid sequence of LOC108205243 in carrot shares high homology with the E3 ubiquitin-protein ligase RFI2 (RED AND FAR-RED INSENSITIVE 2) in different species. AtRFI2 negatively regulates CO and FT to participate in the regulation of flowering in Arabidopsis45. We thus speculate that LOC108205243 is related to the bolting and flowering of carrots.

Furthermore, we selected three carrot varieties, B228, B227, and 21DL-10, which exhibit high, medium, and low bolting speeds, respectively. Subsequently, we measured their bolting heights (Fig. 5C). The transcript abundances of LOC108205243 in the three carrot genotypes were measured using quantitative real-time PCR (qPCR) analysis (Fig. 5D). The gene LOC108205243 exhibits its highest expression level in the 21DL-10 genotype, whereas it shows the lowest expression level in the B228 genotype (Fig. 5D). This observation shows a discordant relationship between gene expression and phenotypic traits. Such a pattern implies that the regulation of bolting characteristics by LOC108205243 is unlikely to follow a straightforward dose-dependent mechanism. Instead, it may involve complex regulatory processes including post-transcriptional inhibition, protein modifications, upstream negative feedback loops, and the buffering effects arising from interactions with other genes. The exact mode of action of the LOC108205243 gene still needs to be further validated through gene editing and other methods.

Fig. 5

Characteristic analysis of the LOC108205243 gene. (A) Regional Manhattan plots and linkage disequilibrium (LD) heatmap of LOC108205243. (B) Multiple alignment of LOC108205243 (XP_0172308) and RFI2 family proteins identified in different species. (C) Bolting speed and bolting height in different genotypes of carrot. (D) The relative expression level of the LOC108205243 gene in three genotypes of carrot.

Expression patterns of the candidate gene

In order to investigate the role of LOC108205243 in regulating carrot bolting-related traits, expression patterns of the LOC108205243 gene in different tissue parts and developmental stages of carrots were analyzed using public RNA-seq data from the NCBI SRA database (Fig. 6). LOC108205243 was highly expressed in leaves stage 2 and leaves stage 3.

Fig. 6

Expression of the LOC108205243 gene in different carrot tissues. The color bar in the heatmap indicates the log-transformed TPM values of LOC108205243. Sd1, Bud and Cal represent germinating seeds, 0.5 mm buds and callus, respectively. Br1 and Br2 represent the bracts of unopened and opened flowers. Fl1 and Fl2 represent unopened and opened whole flowers. Lf1, Lf2, Lf3 and Lfe represent leaves stage 1 (0.5–1 cm bud), leaves stage 2 (2–2.5 cm, unexpanded), leaves stage 3 (7–8 cm, unexpanded) and etiolated leaves, respectively. Pet, Hyp, Phl, Xyl, Rft, Pts, and Rts represent 10 cm petioles (from leaves stage 3), hypocotyls, phloem, xylem, fibrous roots, stressed leaves (at reversible wilting point) and stressed roots, respectively. Ls1 and Ls2 represent stressed leaves of 2–2.5 cm and 7–8 cm at reversible wilting point.

Discussion

It is necessary to consider the biodiversity of carrots as a genetic source of useful and indispensable components for genetic diversity assessments of carrot breeding accessions46. In this study, a total of 240 carrots were collected, and the phenotypic analysis revealed an approximately normal distribution of their trait frequencies (Fig. 1). Furthermore, the analysis identified significant differences in genetic diversity and evolutionary directions among the samples. Strong associations of root length, root diameter, and root weight with marketable yield have been suggested as selection criteria for higher yields47. Bolting and flowering are the most important stages of the life cycle of a plant. Bolting significantly affects the quality and marketability of carrot roots. Likewise, the bolting speed, height, rate, and time all exhibited the highest heritability and moderate genetic advances, and these were suggested to be the selection criteria for bolting (Fig. 2C). Bolting is closely related to plant reproduction, but the molecular mechanisms underlying plant flowering are complex48,49. Core collections have been studied and developed for tea (Camellia sinensis)50, cowpeas51, alpine plums52, and walnuts53. Further, a core collection captures high genetic diversity in a relatively small population54. Based on a correlation analysis of different bolting-related traits, we constructed a core collection. We identified 71 accessions of markedly bolting-resistant carrots, 30 accessions of resistant carrots, 122 accessions of easy-bolting carrots, and 17 accessions of relatively easy-bolting carrots (Fig. 3A). Core germplasm resources are invaluable fundamental resources for improving varieties55. The genetic diversity, population structure, and core collection elucidated based on bolting in this study will facilitate further genetic studies, germplasm protection, and carrot breeding.

Compared with traditional research based on classical QTL mapping, GWAS possesses the capability to identify genetic variations throughout the entire genome. It can reveal the genetic structure of quantitative traits more comprehensively. A population with abundant phenotypic and genetic variations was previously subjected to GWAS analysis56. Moreover, a GWAS and selective sweep research have revealed the origin of high-carotenoid orange carrots57. In previous studies, the root transcriptomes of widely differing cultivated and wild carrots were sequenced and introduced to optimize the identification of 11,369 SNPs58. This study used the GWAS method to conduct a correlation analysis of bolting and identified 1,222,620 total SNPs that had been annotated. Using the r² threshold of 0.1 and 0.2, we estimated that the physical distance corresponding to linkage disequilibrium (LD) decay is approximately 35 kb and 4.63 kb, respectively (Fig. 3D). A review of existing literature on LD in carrots revealed a limited number of studies addressing whole-genome LD decay. Notably, Clotault et al. (2010)59 and Soufflet-Freslon et al. (2013)60 reported no detectable LD decay within candidate gene fragments ranging from 0.7 to 1.0 kb and within a 4,234 bp sequence of the CRTISO gene, respectively. Wild carrots demonstrated a very rapid LD decay within the genome range of approximately 100 bp (r2 = 0.2) and 1 kb (r2 = 0.1), while domesticated carrots demonstrated a rapid LD decay between approximately 400 bp (r2 = 0.2) and 13 kb (r2 = 0.1)6. A comparable short-range LD decay estimate is provided by Brainard et al. (2022)61, who analyzed 146,816 SNPs and observed an LD decay distance of approximately 19.7 kb at r² = 0.1. This value is roughly 40% shorter than the 35 kb distance identified in the present study. 
The discrepancy is primarily attributed to the composition of the sample panel, in which 77% of the lines (185/240) are inbred; this elevates background LD and consequently extends the observed LD decay distance. Nine SNPs (Table 1) and 93 candidate genes (Supplementary Table S3) were significantly associated with bolting speed in the 2022 data. SNP markers have been used in the genetic mapping of traits of interest in crop species and in molecular marker-based breeding62. SNP genotyping has also been successfully applied to analyze the genetic diversity of carrots for the development of genomic resources63.
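The LD decay comparison above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that r² is approximated as the squared Pearson correlation between genotype dosages (0/1/2), and that the decay distance is read off a distance-binned mean r² curve by linear interpolation. The function names and the toy curve are hypothetical; only the 35 kb and 19.7 kb values are taken from the text.

```python
from statistics import mean


def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Assumption: a genotype-based approximation of LD, not haplotype-based r^2.
    """
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var1 * var2)


def decay_distance(binned_r2, threshold):
    """Distance (kb) at which the mean r^2 curve first falls to `threshold`.

    `binned_r2` is a list of (distance_kb, mean_r2) pairs sorted by distance.
    Returns None if the curve never reaches the threshold.
    """
    for (d0, r0), (d1, r1) in zip(binned_r2, binned_r2[1:]):
        if r0 > threshold >= r1:
            # Linear interpolation between the two neighbouring bins.
            return d0 + (r0 - threshold) * (d1 - d0) / (r0 - r1)
    return None


# Hypothetical binned curve, for illustration only (not data from this study).
toy_curve = [(1, 0.5), (5, 0.3), (10, 0.2), (50, 0.1)]
d_at_02 = decay_distance(toy_curve, 0.2)
d_at_01 = decay_distance(toy_curve, 0.1)

# Relative difference between the two reported decay distances at r^2 = 0.1.
relative_gap = (35.0 - 19.7) / 35.0
```

With the reported values, `relative_gap` is about 0.44, meaning the 19.7 kb estimate is a little over two-fifths shorter than the 35 kb estimate.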

Previous studies indicate that plant bolting and flowering are regulated by several endogenous signals64. Histone H4 prevents drought-induced bolting in Chinese cabbage65. A large number of genes that regulate bolting and flowering have been identified in the model plant Arabidopsis66. Notably, FLOWERING LOCUS T (FT), SUPPRESSOR OF OVEREXPRESSION OF CO 1 (SOC1), and LEAFY (LFY) are recognized as key contributors to this process67,68. FT acts downstream of CO, the key gene of the photoperiod pathway69. CO activates FT expression in the leaf; FT is then transported through the phloem to the shoot apical meristem, where it interacts with its partner FD to activate floral identity genes such as LFY70,71. Our study identified the carrot candidate gene LOC108205243, which is associated with bolting speed, as a homolog of the RFI2 family (Fig. 5A). Research on RFI2 in plants currently remains limited. RFI2 plays a crucial role in modulating the photoperiodic flowering response in Arabidopsis72 and can influence the expression of CO and FT45. Investigations in Arabidopsis and Fragaria vesca have demonstrated that CO and FT are integral to the regulation of plant bolting and flowering73,74. In rice, miR528 has been shown to promote heading under long-day conditions by suppressing OsRFI275. Furthermore, investigations in wheat have identified an interaction between the potential salt-tolerant early-maturing target gene TtPLA1-1 and TtRFI2L, the latter of which is notably enriched in gene ontology (GO) categories related to photoperiod regulation and flowering76. Similarly, the expression levels of LOC108205243 were found to correlate with the bolting characteristics of three distinct carrot varieties (Fig. 5B, C, D).
In current genetic research on the vernalization requirements of carrots, the primary determinants of flowering time in this species have been genetically localized to chromosome 2 (refs. 4, 77, 78). In our study, LOC108205243 was located on chromosome 1, whereas other candidate genes were situated on various chromosomes (Fig. 4). These genes may belong to homologous families or may participate in different regulatory pathways, which warrants further examination. In addition, the protein interaction analysis indicates that the LOC108205243 protein, along with LHW, CDKi7, and other proteins, constitutes a protein interaction network that collectively affects the bolting trait in carrots. This finding aligns with the results observed in the investigation of stem lodging resistance in ZS11-type Brassica napus79. We propose that LOC108205243 is a prime candidate for genetic engineering and for molecular breeding of bolting-resistant carrots. Drawing on the significant SNPs identified in this investigation, a variety of high-throughput molecular markers, including Kompetitive Allele Specific PCR (KASP) and Cleaved Amplified Polymorphic Sequence (CAPS) markers, can be developed. Integrating molecular marker-assisted selection with conventional breeding has the potential to substantially accelerate the breeding cycle and enhance the efficiency of germplasm improvement. Moreover, with the increasing adoption of gene editing and related biotechnologies, these SNP loci are also anticipated to serve as precise targets for genome editing, thereby advancing the precision breeding of carrots.
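As an illustration of how a significant SNP might be screened for conversion into a CAPS marker, the following minimal Python sketch checks whether the two alleles differ in the number of restriction enzyme recognition sites. It is an assumption-laden example, not a method used in this study. The function names and the flanking sequence are hypothetical, and only three well-known enzymes with palindromic recognition sites are included.

```python
# Recognition sites of three common restriction enzymes. All three sites are
# palindromic, so scanning one strand is sufficient.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def count_sites(seq, site):
    """Count occurrences of `site` in `seq`, allowing overlapping matches."""
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i))


def caps_candidates(left_flank, ref_allele, alt_allele, right_flank, enzymes=ENZYMES):
    """Return enzymes whose site count differs between the two SNP alleles.

    The result maps enzyme name -> (sites in ref sequence, sites in alt sequence).
    An enzyme listed here would cut the two alleles differently, which is the
    basic requirement for a CAPS marker.
    """
    ref_seq = (left_flank + ref_allele + right_flank).upper()
    alt_seq = (left_flank + alt_allele + right_flank).upper()
    hits = {}
    for name, site in enzymes.items():
        n_ref, n_alt = count_sites(ref_seq, site), count_sites(alt_seq, site)
        if n_ref != n_alt:
            hits[name] = (n_ref, n_alt)
    return hits


# Hypothetical A/G SNP with made-up flanking sequence (not from the carrot genome).
example = caps_candidates("ACGTGA", "A", "G", "TTCGGA")
```

In this toy case the reference allele completes an EcoRI site (GAATTC) that the alternative allele abolishes, so `example` contains only EcoRI. A real screen would use the actual flanking sequence of each significant SNP and a fuller enzyme list.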

Conclusions

In summary, genetic variation provides insights into the correlations among bolting traits. Cluster analysis identified 71 markedly bolting-resistant germplasm resources. Additionally, through a genome-wide association study (GWAS) of 240 carrots, 1,222,620 annotated SNPs were obtained for the association analysis. Nine of these SNPs were significantly associated with bolting speed and related traits under multiple environments, and 93 potential candidate genes were mined. The RING-domain zinc-finger protein LOC108205243 was determined to be primarily involved in the regulation of bolting traits. In conclusion, the SNPs and population genetic resources developed in this study are valuable for research on the molecular mechanisms regulating bolting and for breeding bolting-tolerant carrots.