Introduction

The subgenus Cerasus, a major lineage within Prunus sensu lato, comprises ~70 species of ornamental, edible, and medicinal value. These species are distributed across temperate regions of Eurasia and North America, with centers of diversity in East Asia, including China, Japan, and the Korean Peninsula1,2,3. Several Cerasus species, including sweet cherry (P. avium), sour cherry (P. cerasus), and Chinese cherry (P. pseudocerasus), are economically important and widely cultivated4. In China, the cultivation of Chinese cherry dates back to the Zhou Dynasty, when the fruits were regarded as precious offerings in imperial sacrificial ceremonies5.

Despite their long cultivation history and economic importance, the taxonomic classification and phylogenetic relationships within the subgenus Cerasus remain contentious3,6. Several species traditionally referred to as dwarf cherries (Microcerasus), including P. tianshanica, P. humilis, P. tomentosa, P. japonica, P. glandulosa, and P. pumila, have historically been treated as part of Cerasus4,7. However, phylogenetic studies suggest that these species are more closely allied with sections Amygdalus, Armeniaca, or Prunus2,8,9 and do not form a cohesive lineage within Cerasus. Although genome assemblies for several Cerasus species have become available, a comprehensive phylogenetic framework for the group is still lacking, leaving the evolutionary relationships among wild, domesticated, and ornamental species poorly understood.

Phenotypic variation in crop species arises from natural genetic variation and from selection for traits favored during domestication10,11. Although short-read sequencing enables large-scale detection of single-nucleotide polymorphisms (SNPs) and small insertions/deletions (Indels), it has limited power to detect structural variations (SVs), including large insertions, deletions, inversions, and translocations. SVs often exert large effects on agronomic traits such as flowering time, fruit size, stress tolerance, and plant architecture12,13,14. Recent advances in long-read sequencing and graph-based pangenomes have enabled more accurate detection and comprehensive characterization of SVs, revealing forms of genetic variation that were previously inaccessible15,16,17,18,19. These developments are particularly valuable for dissecting the genomic architecture of perennial woody plants such as cherries, whose highly heterozygous genomes have historically constrained genomic analyses.

De novo genome assemblies are available for several edible and ornamental cherry species20,21; however, wild and geographically diverse Cerasus species remain underexplored. Pangenomes have advanced research in numerous crops, including soybean (Glycine max)22, wheat (Triticum spp.)23, barley (Hordeum vulgare)24, tomato25, strawberry26, watermelon27, apple28, citrus29, grapevine30, and Solanum31. These efforts have demonstrated the value of capturing gene presence–absence variation (PAV), SVs, and population-scale diversity for elucidating trait evolution and supporting genomics-assisted breeding3,26,29,30. Together, these advances underscore the need for a comprehensive Cerasus pangenome that captures both cultivated and wild diversity to facilitate comparative genomics and trait dissection.

Here, we assemble eight genetically diverse Cerasus genomes and integrate them with 13 published genomes to construct a pangenome, CERASUSpan v1.0, representing the global Cerasus diversity sampled to date. Using this resource, we generate a variation map of SNPs, Indels, and SVs across 219 cherry accessions and combine a genome-wide association study (GWAS) with candidate gene analyses to investigate the genetic basis of flowering time. Collectively, this pangenome resource establishes a comprehensive genomic framework for Cerasus and enables us to dissect the genetic regulation of flowering phase progression in cherries.

Results

The pangenome reference for Cerasus (CERASUSpan v1.0)

Cerasus, comprising ~70 species, is one of the most diverse, ecologically varied, and economically significant groups in Prunus4. To characterize its genomic landscape, we de novo assembled the genomes of eight species: the edible cherry P. cerasus, the ornamental P. incisa “Oshidori,” and six additional species (P. campanulata, P. subhirtella, P. cerasoides, P. serrula, P. tianshanica, and P. mahaleb) (Supplementary Data 1). Of these, P. tianshanica and P. cerasus are tetraploid32. Contig N50 values ranged from 2.37 to 31.13 Mb, and haploid assembly sizes (234.39–353.73 Mb) were consistent with flow cytometry estimates (Supplementary Fig. 1 and Supplementary Data 2 and 3). Assembly quality was further evaluated using the LTR Assembly Index (LAI), which ranged from 7.17 to 16.44, and the number of annotated gene models ranged from 22,864 to 32,588 (Supplementary Data 2). Using Hi-C data, contigs were anchored onto eight pseudochromosomes for all species (Supplementary Fig. 2), consistent with previous assemblies6,20,21,33,34,35,36,37,38,39. Centromeric regions were also identified, with lengths varying across chromosomes and species (Supplementary Data 4).
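Contig N50, used above to summarize assembly contiguity, is the length L such that contigs of length ≥ L together cover at least half of the total assembled sequence. The Python sketch below computes it from a list of contig lengths; the lengths are invented for illustration and are not taken from Supplementary Data 2.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of the
    longest contigs whose summed length reaches at least 50% of the total.
    """
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Hypothetical contig lengths in Mb (not data from this study).
example = [31.1, 20.0, 12.5, 8.0, 3.2, 1.1]
print(n50(example))  # prints 20.0
```

Here the two longest contigs (31.1 + 20.0 = 51.1 Mb) already exceed half of the 75.9-Mb total, so N50 is the length of the second contig.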

For pangenome construction, we integrated these eight de novo assemblies with 13 published genomes (P. pseudocerasus, P. pusilliflora, P. conradinae, P. serrulata, P. fruticosa, P. humilis, P. campanulata, P. speciosa, P. spachiana, and four sweet cherry cultivars “Burlat,” “Tieton,” “Regina,” and “13-33”) (Supplementary Data 1). In total, the 21 accessions represent 17 species and capture broad phylogenetic and phenotypic diversity across both cultivated and wild taxa (Fig. 1a, b and Supplementary Fig. 3). Based on use and domestication status, accessions were classified as wild (W), consumed (C), ornamental (O), or domesticated (D) (Fig. 1a, b). Among edible species, a phylogenetic tree constructed from 310 single-copy orthologs showed that P. avium is more closely related to P. cerasus than to P. pseudocerasus (Fig. 1c).

Fig. 1: The Cerasus pangenome captures the phenotypic, ecological, agricultural, and genomic diversity across the genus.

a Approximate geographical centroid of the origin for the 21 selected Cerasus accessions, categorized by agricultural use: wild (W), consumed (C), ornamental (O), and domesticated crop (D). For some modern cultivated accessions, the indicated locations represent the breeding origin of the cultivar, rather than the native range of the corresponding species. b Representative phenotypic diversity of branches, buds, flowers, and fruits from a subset of Cerasus species included in the pangenome. Scale bars: 5 cm (shoots) and 2 mm (fruits, flower buds, and flowers). c Phylogenetic tree of the Cerasus pangenome illustrating genetic relationships within the genus. Branch lengths represent coalescent units. d Proportions of different transposable element (TE) classes in each accession of the Cerasus pangenome. e Pearson correlation between genome size and total TE length (R² = 0.77). The gray band indicates the 95% confidence interval.

We annotated transposable elements (TEs) in the 21 genomes; total TE content ranged from 63.93 to 200.76 Mb per genome (Supplementary Data 5). Seven major TE categories were identified (LINE, LTR, SINE, TIR, non-LTR, non-TIR, and rDNA), with LTR elements being the dominant component, accounting for 13.96–41.86% of total TEs (Fig. 1d and Supplementary Data 5). Consistent with other plant pangenomes31,40,41, species-specific expansions of retrotransposons appear to drive differences in repetitive content, and total TE length correlated strongly with genome size (Fig. 1e).

The simulated pangenome size stabilized when more than 17 genomes were included, whereas the number of core gene families stabilized only beyond 19 genomes (Fig. 2a). Gene families were classified as core (7126; 12.53%), softcore (7350; 12.93%), dispensable (16,680; 29.33%), and private (25,709; 45.21%) (Supplementary Data 6). Core gene families were enriched in essential functions, including lipid oxidation, exocytosis, photosynthesis, and structural molecule activity, whereas dispensable gene families were mainly enriched in shoot development, protein phosphatase inhibitor activity, and signaling receptor activity (Supplementary Fig. 4). The corresponding gene counts were 219,617 (core), 209,672 (softcore), 225,232 (dispensable), and 28,641 (private) (Fig. 2b and Supplementary Data 7). Thus, although dispensable and private families made up 74.54% of gene families, 62.84% of all genes belonged to the core and softcore categories, which together accounted for only 25.46% of families, highlighting a conserved genomic backbone within the Cerasus pangenome (Fig. 2c and Supplementary Data 6 and 7).
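The family- and gene-level percentages above follow directly from the reported counts. The short Python check below recomputes them using only the numbers stated in this paragraph; no new data are introduced.

```python
# Gene-family and gene counts per pangenome category, as reported in the text.
families = {"core": 7126, "softcore": 7350, "dispensable": 16680, "private": 25709}
genes = {"core": 219617, "softcore": 209672, "dispensable": 225232, "private": 28641}

def percent(counts, keys):
    """Percentage of the grand total contributed by the given categories."""
    return 100 * sum(counts[k] for k in keys) / sum(counts.values())

conserved = ("core", "softcore")
variable = ("dispensable", "private")

print(f"families: conserved {percent(families, conserved):.2f}%, "
      f"variable {percent(families, variable):.2f}%")
print(f"genes in core + softcore: {percent(genes, conserved):.2f}%")
```

This prints 25.46% and 74.54% for the family categories and 62.84% for genes, showing how a minority of (large, conserved) families holds the majority of genes.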

Fig. 2: Pangenome characteristics of the cherry.

a Number of pan and core gene families across different combinations of Cerasus genomes. The numbers of pan and core gene families were calculated independently 18 times for single genomes and 21 times for combinations of more than one genome. Each box represents the median and interquartile range (IQR), with whiskers extending to the maximum and minimum values. b Composition of the pangenome and individual genomes. The histogram shows the number of gene families shared by different numbers of genomes, and the pie chart shows the proportion of gene families in each category. Gene families are classified as core (present in all genomes), softcore (over 90% of all genomes), private (over 90% of two genomes), and dispensable (the remaining) gene families. Orphan genes (that is, species-specific gene families with only one gene) were excluded. c Proportion of core, softcore, dispensable, and private genes within the Cerasus pangenome. d Structural variation (SV) density across each chromosome. e Proportions of SV types in the Cerasus pangenome, including deletion (DEL), duplication (DUP), insertion (INS), inversion (INV), and translocation (TRA). f Distribution of genomic characteristic elements across each chromosome. The circle layers from outermost to innermost represent: chromosome length, deletion density, insertion density, transposable elements, LTR density, GC content, and gene density. Species abbreviations: P. tianshanica (Ptia), P. subhirtella (Psub), P. speciosa (Pspe), P. spachiana “Komatsu-otome” (Pspa), P. serrulata (Psert), P. serrula (Psera), P. campanulata-F (Pcam-F), P. avium “Regina” (Pavi-R), P. pusilliflora (Ppus), P. pseudocerasus (Ppse), P. mahaleb (Pmah), P. incisa “Oshidori” (Pinc), P. humilis (Phum), P. fruticosa (Pfru), P. conradinae (Pcon), P. cerasus (Pceru), P. cerasoides (Pcero), P. campanulata (Pcam), P. avium “Burlat” (Pavi-B), P. avium “13-33” (Pavi-13), and P. avium “Tieton” (Pavi-T).

To identify SVs, the other 20 genomes were aligned to the P. avium “Tieton” (Pavi-T) reference genome. SVs were unevenly distributed along chromosomes (Fig. 2d). In total, we detected 493,836 SVs across the 20 genomes, with counts ranging from 6488 in P. tianshanica to 32,510 in P. fruticosa (Fig. 2e and Supplementary Data 8). These included 234,733 deletions and 240,638 insertions, whereas duplications (7947), inversions (774), and translocations (9744) were relatively rare (Fig. 2e, Supplementary Fig. 5a and Supplementary Data 8). Length distributions differed among SV types (Supplementary Fig. 5b–f) and correlated with variation in chromosome length (Supplementary Fig. 6a and Supplementary Data 9). To capture structural variation and sequence diversity across Cerasus genomes, we constructed a graph-based pangenome using the VG toolkit, which represents multiple haplotypes rather than a single linear reference (see Methods). Genes overlapping SVs or located within ±5 kb of them showed category-specific functional enrichment: softcore genes were enriched in catalytic activity and transcription regulation, whereas private genes were enriched in heme binding, RNA modification, and defense responses (Supplementary Fig. 6b–d). Most deletions occurred along chromosome arms, whereas insertions clustered near or within centromeres, coinciding with high LTR density (Fig. 2f).
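The "within or ±5 kb" rule for assigning SVs to genes amounts to an interval-overlap test after padding each gene by 5 kb on both sides. A minimal Python sketch, assuming 1-based inclusive coordinates and plain tuples for records; the gene and SV records are hypothetical, and the study's actual tooling is described in Methods.

```python
FLANK = 5_000  # bp added to each side of a gene

def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end

def genes_affected(genes, svs, flank=FLANK):
    """Return IDs of genes whose flank-padded interval overlaps any SV.

    genes: iterable of (gene_id, chrom, start, end)
    svs:   iterable of (chrom, start, end)
    """
    by_chrom = {}
    for chrom, start, end in svs:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = set()
    for gene_id, chrom, start, end in genes:
        lo, hi = max(1, start - flank), end + flank
        if any(overlaps(lo, hi, s, e) for s, e in by_chrom.get(chrom, [])):
            hits.add(gene_id)
    return hits

# Hypothetical records for illustration only.
genes = [("geneA", "chr1", 10_000, 12_000), ("geneB", "chr1", 50_000, 52_000),
         ("geneC", "chr2", 10_000, 12_000)]
svs = [("chr1", 16_500, 16_800),   # 4.5 kb downstream of geneA: counted
       ("chr1", 60_000, 60_300)]   # 8 kb downstream of geneB: not counted
print(sorted(genes_affected(genes, svs)))  # prints ['geneA']
```

This linear scan is fine for illustration; at genome scale, sorted intervals or an interval tree would avoid comparing every gene with every SV on a chromosome.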

Variations in the blooming phenophase of the Cerasus species

Dormancy, budding, and flowering are critical developmental processes regulated by temperature, photoperiod, and hormonal cues. Plants adapt to climatic changes by evolving distinct germination and flowering periods. To evaluate flowering phenology, 21 Cerasus accessions were grown under uniform conditions in Shanghai, China. Substantial phenotypic diversity in blooming time was observed among these accessions (Fig. 3a and Supplementary Fig. 7), with P. cerasoides, P. campanulata, and P. pseudocerasus flowering earlier than other species.

Fig. 3: Analysis of flowering regulation of the genus Prunus.

a Record of flowering times across different Cerasus plants. The top image shows the onset of bud swelling, the middle image the green tip stage, and the bottom image the full bloom stage. b Phylogenetic relationships among cherry species. Bootstrap support (BS) values are indicated by colored dots on major nodes: red (BS > 90) and green (50 < BS ≤ 90). Prunus cerasus is an allotetraploid species derived from natural hybridization between P. avium and P. fruticosa. The haplotype-resolved assemblies P. cerasus (hapF) and P. cerasus (hapA) correspond to the subgenomes originating from P. fruticosa and P. avium, respectively. c Presence and absence of genes associated with flower development across cherry genomes. Flower buds or flowers of P. pseudocerasus, P. mahaleb, and P. avium were collected at three time points during the 2024 growing season: February 20 (T1), March 10 (T2), and March 25 (T3). d Modules highly correlated (correlation ≥ 0.9) with flower development in the three species. The gray square contains the number of genes in the module (purple) and the subset annotated as floral development–related genes (blue). e Venn diagram showing the overlap of A. thaliana homologs within the correlated modules, together with the number of candidate target genes in Pavi-T, Ppse, and Pmah (bar chart). f Expression trends of the representative target gene across the three sampling time points (T1, T2, and T3).

To clarify phylogenetic relationships within subg. Cerasus and between Cerasus and the closely related subgenera Prunophora and Amygdalus, we constructed a phylogenetic tree from single-copy orthologous genes (Supplementary Data 10). P. humilis, P. tianshanica, and P. tomentosa (Microcerasus) were more closely related to Prunophora than to subg. Cerasus. Microcerasus and Prunophora diverged approximately 23.8 million years ago (Mya; 15.1–33.4 Mya), near the Oligocene–Miocene boundary (Fig. 3b). Subg. Cerasus was divided into five clades: Clade I (P. mahaleb) formed the basal lineage; Clade II included P. avium; Clade III contained P. fruticosa; Clade IV comprised P. serrula; and Clade V encompassed P. incisa and P. speciosa (which were relatively distant from P. serrulata, P. yedoensis, P. subhirtella, and P. spachiana), together with P. campanulata, P. conradinae (Pcon), P. pseudocerasus (Ppse), P. cerasoides, and P. pusilliflora (Ppus). The high-altitude species P. cerasoides and P. pusilliflora were sister taxa, diverging approximately 2.3 Mya (0.5–4.8 Mya).

We identified 411 flowering-related orthologous gene families across Cerasus genomes using PlantCFG42. PAV analysis showed that some species, such as P. pseudocerasus and P. subhirtella, lacked numerous flowering-related genes. However, Pearson correlation analysis revealed no significant association between the PAV of these genes and flowering time, suggesting that gene presence–absence may not be the primary determinant of flowering-time variation (Fig. 3c and Supplementary Fig. 8).
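The test described above correlates, across accessions, the number of flowering-related gene families present with flowering date. A minimal sketch using SciPy's `pearsonr`; the per-accession values are invented placeholders, not data from Fig. 3c or Supplementary Fig. 8, and using family counts as the PAV summary is an assumption.

```python
from scipy.stats import pearsonr

def pav_flowering_correlation(present_counts, flowering_days):
    """Pearson r and two-sided P between gene-family presence and flowering date."""
    r, p = pearsonr(present_counts, flowering_days)
    return float(r), float(p)

# Placeholder values for eight hypothetical accessions (not study data):
# number of the 411 flowering-related families present, and flowering day of year.
present_families = [398, 402, 385, 410, 377, 395, 404, 389]
flowering_doy = [78, 84, 70, 90, 81, 75, 88, 73]

r, p = pav_flowering_correlation(present_families, flowering_doy)
print(f"Pearson r = {r:.2f}, P = {p:.3f}")
```

With only about 21 accessions, such a test has limited power to detect weak associations, so a non-significant result argues against a strong PAV effect rather than ruling out any effect.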

Among 97 flowering genes characterized in Arabidopsis thaliana (Supplementary Data 11), 86, 74, and 103 homologs were identified in Pavi-T, P. mahaleb (Pmah), and Ppse, respectively (Supplementary Data 12). These genes were widely distributed across chromosomes (Supplementary Data 12). We sequenced 27 RNA-Seq datasets (Supplementary Data 13) and performed Weighted Gene Co-expression Network Analysis (WGCNA), identifying three modules in Pavi-T, two in Pmah, and two in Ppse with correlation coefficients >0.9 (Fig. 3d and Supplementary Figs. 9 and 10). In Pavi-T, the pink, turquoise, and yellow modules contained 4783, 7296, and 2552 genes, including 18, 15, and 11 flowering-related genes, respectively (Fig. 3d). In Pmah, the purple and green modules comprised 5607 and 1135 genes, harboring 16 and 4 target genes. In Ppse, the brown and blue modules included 2387 and 3943 genes with 11 and 15 target genes (Fig. 3d).
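In WGCNA, each module is summarized by its eigengene (the first principal component of the module's standardized expression matrix), and module–trait relationships are correlations between eigengenes and a trait. The NumPy sketch below illustrates those two steps on simulated placeholder data; it omits network construction (soft-thresholded adjacency, topological overlap, and tree cutting), which the WGCNA R package performs, and treating sampling stage as the trait is an assumption for illustration.

```python
import numpy as np

def module_eigengene(expr):
    """First principal component of a (samples x genes) expression matrix.

    Genes are z-scored first; the sign is flipped so that the eigengene
    correlates positively with the mean standardized module expression.
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

rng = np.random.default_rng(0)
# Simulated design: 9 samples (3 stages x 3 replicates), 50 co-expressed genes.
stage = np.repeat([1.0, 2.0, 3.0], 3)            # trait: sampling stage T1..T3
signal = stage[:, None] * rng.uniform(0.5, 1.5, size=50)
expr = signal + rng.normal(scale=0.3, size=(9, 50))

me = module_eigengene(expr)
r = np.corrcoef(me, stage)[0, 1]
print(f"module-trait correlation r = {r:.2f}")
```

Because all simulated genes rise with stage, the eigengene tracks stage closely and the module–trait correlation is high, mirroring how modules with r > 0.9 were selected above.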

Because the species differ in flowering phenotype (Pavi-T and Pmah bloom late, whereas Ppse blooms early), the target genes of Pavi-T and Pmah aligned with comparable sampling stages, whereas those of Ppse did not. These targets were converted to their A. thaliana homologs for cross-species comparison (Fig. 3e), yielding five genes from Pavi-T, four from Pmah, and seven from Ppse. Expression analysis based on FPKM values showed that AGAMOUS-LIKE 9 (AGL9) exhibited the strongest differential expression across the three species (Fig. 3f).
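FPKM normalizes a gene's fragment count by gene length (per kilobase) and by sequencing depth (per million mapped fragments): FPKM = fragments × 10^9 / (gene length in bp × total mapped fragments). A minimal Python sketch with made-up counts:

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

# Hypothetical example: 500 fragments on a 2,000-bp gene in a library of
# 25 million mapped fragments.
print(fpkm(500, 2_000, 25_000_000))  # prints 10.0
```

The same gene with twice the fragments in a library twice as deep would give the same FPKM, which is what makes values comparable across samples sequenced to different depths.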

Genetic diversity and population structure analysis of Cerasus

We compiled a comprehensive genomic variation dataset for 219 cherry accessions using previously published sequencing data6. After accession selection (Supplementary Data 14) and filtering to retain SNPs with a missing rate <0.1 and a minor allele frequency (MAF) >0.05, we identified 853,236 SNPs and 217,080 Indels (Supplementary Data 15). SVs were genotyped using the pangenome, yielding 4283 SVs (missing rate ≤0.3 and MAF ≥0.01) and thus a high-resolution variation map for Cerasus (Supplementary Data 15 and Supplementary Figs. 11 and 12). To validate the SV calls, 15 randomly selected SVs were tested by PCR, and all showed genotype-consistent amplification patterns (Supplementary Fig. 13), supporting the reliability of the SV dataset for downstream analyses.
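The missing-rate and MAF filters above can be written per variant as follows. This is a sketch assuming biallelic sites, diploid genotypes coded as alternate-allele dosages (0, 1, 2) with None for missing, and a reading of the thresholds as retention criteria (SNPs kept if missing rate <0.1 and MAF >0.05; SVs kept if missing rate ≤0.3 and MAF ≥0.01). The filtering software actually used is described in Methods.

```python
def missing_rate(genotypes):
    """Fraction of samples with a missing genotype (None)."""
    return sum(g is None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """MAF from diploid dosages (0/1/2 alt copies), ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def passes_snp_filter(genotypes):
    # SNPs: keep if missing rate < 0.1 and MAF > 0.05
    return missing_rate(genotypes) < 0.1 and minor_allele_frequency(genotypes) > 0.05

def passes_sv_filter(genotypes):
    # SVs: keep if missing rate <= 0.3 and MAF >= 0.01
    return missing_rate(genotypes) <= 0.3 and minor_allele_frequency(genotypes) >= 0.01

# Hypothetical site genotyped in 10 accessions (one missing call).
site = [0, 1, 0, 2, 0, 0, 1, None, 0, 0]
print(missing_rate(site), round(minor_allele_frequency(site), 3))  # prints 0.1 0.222
print(passes_snp_filter(site), passes_sv_filter(site))             # prints False True
```

The example sits exactly on the SNP missing-rate boundary, so it fails the stricter SNP filter but passes the more permissive SV filter.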

Phylogenetic reconstruction based on SVs delineated five major clusters (Clusters 1–5) (Fig. 4a). However, flowering phenology did not correspond closely to these clusters, and SNP- and Indel-based phylogenies showed a similar lack of correspondence between flowering phenotypes and genetic relationships (Supplementary Fig. 14a, b). This probably reflects the relatively narrow genetic background of cultivated sweet cherry, compounded by limited genetic differentiation between populations.

Fig. 4: Population structure of cherry accessions and selection signatures for blooming phenophase.

The neighbor-joining (NJ) phylogenetic tree (a), principal component analysis (PCA) (b), and admixture analysis (c) of 219 cherry accessions based on structural variants (SVs). d Manhattan plots showing genomic regions under selection between early- and late-flowering groups based on the fixation index (FST) and nucleotide diversity (π) of SNPs. Dashed lines indicate the significance thresholds for π and FST. e Local Manhattan plot (top) of the flowering-time association signals identified by genome-wide association study (GWAS) and the corresponding linkage disequilibrium (LD) heatmap (bottom) for the 44.71-kb interval on chromosome 7 (chr7: 28.10–28.15 Mb). Each dot represents a SNP, and the y-axis indicates association significance as −log10 P. Association P values were obtained using the general linear model (GLM) implemented in TASSEL. The horizontal red dashed line indicates the significance threshold (−log10 P = 6). Pairwise LD across this interval, measured as D′, is shown in the lower panel. Red dots indicate non-synonymous SNPs within chr7: 28.10–28.15 Mb. f Box plots of flowering time by genotype at the SNP upstream of the candidate genes PAV07G053290 (+) and PAV07G053280 (−). Each point represents a biologically independent sample. Sample sizes were AA, n = 5; AT, n = 88; and TT, n = 126. Phenotypic values were obtained from single measurements per sample. Box plots show the median (center line), upper and lower quartiles (box limits), and whiskers extending to 1.5 × the interquartile range. Overall differences among genotype groups were assessed using a Kruskal–Wallis test (*P < 0.05, **P < 0.01, ns, P > 0.05), and pairwise comparisons were performed using two-sided Wilcoxon rank-sum tests. No conventional control group was included. g Zoomed-in view of the selected region around the AGL9 gene.

To further investigate population structure, we performed principal component analysis (PCA; Fig. 4b) and ADMIXTURE analysis (Fig. 4c). PCA revealed weak differentiation among clusters, consistent with the limited genetic diversity of the sampled sweet cherry accessions. ADMIXTURE analysis showed that most accessions carried mixed ancestry components rather than forming discrete subpopulations (Supplementary Fig. 14c–f), consistent with the effects of long-term clonal propagation. This shallow neutral genetic structure limits confounding by population stratification, making the accession panel well suited to dissecting variation in flowering phenology in subsequent analyses.

Selection signature analysis of the blooming phenophase

Understanding the genetic basis of the blooming phenophase is essential for elucidating flowering regulation and its evolution, and for applying this knowledge in breeding. To investigate genomic variation underlying flowering time in cherries, we collected flowering-time phenotypes for the 219 accessions and initially classified them into five categories. For selection signature analyses, only three groups, early, mid, and late flowering (20, 70, and 21 accessions, respectively), were retained, and the two intermediate groups were excluded to sharpen phenotypic contrast and increase statistical power (Supplementary Data 14).

FST analysis revealed varying degrees of genetic differentiation among groups, with the strongest divergence in the early–late comparison (top 1% thresholds: SVs = 0.1921; SNPs = 0.1710; Indels = 0.1445), followed by the mid–late (SVs = 0.1399; SNPs = 0.1173; Indels = 0.0971) and early–mid (SVs = 0.1354; SNPs = 0.0983; Indels = 0.0845) comparisons (Fig. 4d, Supplementary Figs. 15–17 and Supplementary Data 16–18). GO enrichment analysis linked candidate genes in high-FST regions to DNA helicase activity, vacuolar membrane, glutathione transferase activity, and ubiquitin ligase complex (Supplementary Fig. 18). These functional categories participate in chromatin regulation, redox homeostasis, protein turnover, and stress-responsive signaling, processes that have been implicated in the regulation of flowering time and floral transition in multiple plant species43,44,45.
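To make the FST comparison concrete, the sketch below computes a ratio-of-averages Hudson FST from per-site allele frequencies in two groups and derives an empirical top-1% cutoff like those quoted above. Hudson's estimator is swapped in for illustration; the estimator, window settings, and software used in the study are given in Methods. Frequencies are simulated, and treating accessions as diploid (so 20 and 21 accessions give 40 and 42 alleles) is an assumption.

```python
import random

def hudson_fst(p1, p2, n1, n2):
    """Ratio-of-averages Hudson FST over a set of biallelic sites.

    p1, p2: alternate-allele frequencies per site in groups 1 and 2
    n1, n2: number of sampled alleles per group
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else float("nan")

def top_fraction_cutoff(values, fraction=0.01):
    """Smallest value among the top `fraction` of values (empirical cutoff)."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]

# Simulated windows of 20 sites; group sizes mirror the 20 early- and
# 21 late-flowering accessions (assumed diploid: 40 and 42 alleles).
random.seed(1)
window_fst = []
for _ in range(1000):
    p_early = [random.random() for _ in range(20)]
    p_late = [min(1.0, max(0.0, p + random.gauss(0, 0.1))) for p in p_early]
    window_fst.append(hudson_fst(p_early, p_late, 40, 42))

cutoff = top_fraction_cutoff(window_fst)
outliers = [i for i, f in enumerate(window_fst) if f >= cutoff]
print(f"top-1% FST cutoff = {cutoff:.4f}; {len(outliers)} outlier windows")
```

Hudson's numerator subtracts within-group sampling variance, so windows with no real differentiation can give slightly negative values; only the upper tail matters for the outlier cutoff.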

To further identify genomic regions under selection associated with flowering-time variation, we examined regions ranking in the top 1% for both the π ratio and FST in each of the three comparisons. This revealed 36 SNP-, 101 Indel-, and seven SV-based candidate regions in the early–late comparison; 87 SNP-, 68 Indel-, and eight SV-based regions in the early–mid comparison; and 17 Indel- and four SV-based regions in the mid–late comparison. Correspondingly, the candidate genes comprised 125 SNP-, 302 Indel-, and 55 SV-associated genes (early–late comparison), 219 SNP-, 196 Indel-, and 35 SV-associated genes (early–mid comparison), and 50 Indel- and 47 SV-associated genes (mid–late comparison) (Supplementary Data 16–18). The larger numbers of candidate genes in comparisons involving the early-flowering group may reflect stronger selection at this phenotypic extreme, contributing to divergence in flowering regulation.
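Candidate regions here are those in the top 1% of both the π ratio and FST distributions. The sketch below shows that intersection step on hypothetical windows. Computing the ratio as π(group 1)/π(group 2) is an assumption for illustration, since the direction and window definitions used in the study are given in Methods.

```python
def top_cutoff(values, fraction=0.01):
    """Value at the boundary of the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    return ranked[max(1, int(len(ranked) * fraction)) - 1]

def joint_outliers(windows, fraction=0.01):
    """Windows in the top `fraction` for both the pi ratio and FST.

    windows: list of dicts with keys 'chrom', 'start', 'end', 'pi1', 'pi2', 'fst'.
    The pi ratio is taken as pi1 / pi2 (direction assumed for illustration).
    """
    usable = [w for w in windows if w["pi2"] > 0]
    ratios = [w["pi1"] / w["pi2"] for w in usable]
    r_cut = top_cutoff(ratios, fraction)
    f_cut = top_cutoff([w["fst"] for w in usable], fraction)
    return [w for w, r in zip(usable, ratios) if r >= r_cut and w["fst"] >= f_cut]

# 200 hypothetical 10-kb windows with uniform background values.
windows = [{"chrom": "chr7", "start": i * 10_000, "end": (i + 1) * 10_000,
            "pi1": 0.001, "pi2": 0.001, "fst": 0.01} for i in range(200)]
windows[50].update(pi1=0.004, fst=0.30)   # high in both statistics
windows[120].update(pi1=0.003)            # high pi ratio only
windows[7].update(fst=0.25)               # high FST only

for w in joint_outliers(windows):
    print(w["chrom"], w["start"], w["end"])  # prints chr7 500000 510000
```

Requiring both statistics to be extreme trades sensitivity for specificity: windows 120 and 7 each pass one cutoff but are not reported.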

Among the SNP-based candidate regions distinguishing early- and late-flowering groups, the region with the highest π ratio (chr7:28.10–28.15 Mb) contained a SNP (chr7:28,123,036; A/T) located upstream of two adjacent genes, PAV07G053280 (−) and PAV07G053290 (+), both of which were associated with flowering time in GWAS (Fig. 4d–f). PAV07G053290 encodes a homolog of NUP205, a nuclear pore complex protein previously shown to influence flowering through circadian clock regulation in Arabidopsis46,47. In addition, PAV01G05220 (PavAGL9) showed strong FST-based selection signals in all SNP-based comparisons (Fig. 4g). Homology-based analyses further identified four flowering-related genes in the early–mid comparisons of SNPs and Indels: PAV01G067490 (PavGA2ox8), PAV01G068630 (PavCSTF77), PAV01G074490 (PavDCL3), and PAV07G053770 (PavARP6) (Supplementary Figs. 15 and 17 and Supplementary Data 19). Two non-synonymous SNPs were detected in PavCSTF77, and PavDCL3 harbored six non-synonymous variants that could affect protein function (Supplementary Fig. 19). In A. thaliana, cstf77 mutants exhibit delayed flowering, and dcl1 mutants display reduced microRNA expression48,49. Haplotype analysis of PAV01G074490 (PavDCL3) identified a flowering-time-associated SNP (chr1:39,103,856; A/C) that causes a non-synonymous substitution, suggesting a potential role in flowering (Supplementary Fig. 19). Meanwhile, Indels in the upstream regions of PavARP6 and PavDCL3 suggest possible cis-regulatory effects (Supplementary Fig. 20).
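Genotype–phenotype comparisons of this kind (Fig. 4f) use an overall Kruskal–Wallis test followed by pairwise two-sided Wilcoxon rank-sum tests. A sketch of that workflow with SciPy's `kruskal` and `mannwhitneyu` (the latter implements the Wilcoxon rank-sum / Mann–Whitney U test). The flowering dates are placeholders, group sizes only mimic the unequal genotype classes, and the absence of multiple-testing correction is a simplification.

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

def genotype_tests(groups):
    """Overall Kruskal-Wallis test plus pairwise two-sided rank-sum tests.

    groups: dict mapping genotype label -> list of phenotype values.
    Returns (overall_p, {(g1, g2): p}); no multiple-testing correction.
    """
    overall = kruskal(*groups.values())
    pairwise = {}
    for a, b in combinations(groups, 2):
        pairwise[(a, b)] = mannwhitneyu(groups[a], groups[b],
                                        alternative="two-sided").pvalue
    return overall.pvalue, pairwise

# Hypothetical flowering dates (day of year) per genotype class.
groups = {
    "AA": [70, 72, 69, 74, 71],
    "AT": [76, 78, 75, 80, 77, 79, 74, 81],
    "TT": [82, 85, 83, 86, 84, 88, 81, 87, 85],
}
overall_p, pairwise_p = genotype_tests(groups)
print(f"Kruskal-Wallis P = {overall_p:.2e}")
for pair, p in pairwise_p.items():
    print(pair, f"P = {p:.2e}")
```

Rank-based tests suit this setting because flowering dates need not be normally distributed and one genotype class (AA) is very small.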

In the mid–late comparison, SV analysis further identified PAV01G065550 (PavGAI/RGA), which encodes a DELLA protein involved in gibberellin signaling50, within a candidate SV region (Supplementary Fig. 16). Similarly, in the early–late comparison, Indel variation highlighted PAV04G001540 (PavCOP1), a gene previously implicated in photoperiodic flowering regulation51 (Supplementary Fig. 17). These results point to a multilayered genetic architecture of flowering time, with selection acting on both coding and regulatory regions. To further dissect flowering-time variation, we performed GWAS using the phenotypic data (Supplementary Figs. 21–23 and Supplementary Data 20). Using the linear model in GEMMA, we identified two SNPs, 15 SVs, and two Indels associated with flowering time, which may have functional roles in flowering phenotypes. The general linear model (GLM) in TASSEL detected significant associations for each variant type, further supporting contributions of both sequence and structural variation to flowering-time regulation in sweet cherry (Supplementary Data 20).
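At its core, a single-marker general linear model regresses the phenotype on genotype dosage at one marker, optionally with covariates such as principal components for population structure, and tests the marker coefficient. The NumPy/SciPy sketch below implements that ordinary-least-squares test on simulated data. It is not TASSEL's or GEMMA's implementation, and the covariates and effect sizes are assumptions; the −log10 P = 6 threshold matches Fig. 4e.

```python
import numpy as np
from scipy import stats

def marker_test(y, dosage, covariates=None):
    """OLS test of one marker: y ~ intercept + covariates + dosage.

    Returns (effect, -log10 P) for the dosage coefficient (two-sided t-test).
    """
    n = len(y)
    cols = [np.ones(n)]
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).T)
    cols.append(np.asarray(dosage, dtype=float))
    X = np.column_stack(cols)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - rank
    sigma2 = resid @ resid / df
    se = np.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[-1, -1])
    t = beta[-1] / se
    p = 2 * stats.t.sf(abs(t), df)
    return beta[-1], -np.log10(p)

rng = np.random.default_rng(42)
n = 219                                   # panel size used in the study
pcs = rng.normal(size=(n, 3))             # stand-in structure covariates
causal = rng.integers(0, 3, size=n)       # simulated causal marker (0/1/2)
null = rng.integers(0, 3, size=n)         # simulated unlinked marker
y = 80 + 3.0 * causal + pcs @ np.array([1.0, 0.5, 0.2]) + rng.normal(scale=2.0, size=n)

for name, g in [("causal", causal), ("null", null)]:
    effect, mlogp = marker_test(y, g, pcs)
    flag = "significant" if mlogp > 6 else "n.s."
    print(f"{name}: effect = {effect:.2f}, -log10 P = {mlogp:.1f} ({flag})")
```

A linear mixed model, as typically run in GEMMA, would add a kinship-based random effect to this fixed-effect framework to absorb polygenic relatedness.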

Functional analysis of PavAGL9 associated with flowering progression

By integrating selection signature analyses (Fig. 4g) with RNA-seq co-expression analysis (Fig. 3e, f), we identified PavAGL9 as a candidate gene associated with flowering phenology. To further characterize its expression pattern, we examined PavAGL9 transcript abundance across developmental stages. Both semi-quantitative and quantitative PCR showed that PavAGL9 expression increased markedly as buds progressed toward flowering, consistent with a role in this process (Fig. 5a–c). AGL9 orthologs showed similar expression trends across multiple Cerasus species, suggesting conserved transcriptional dynamics (Supplementary Fig. 24). Subcellular localization assays showed that PavAGL9 localizes to the nucleus (Fig. 5d), consistent with its predicted role as a transcription factor. To explore its biological role, we generated transgenic A. thaliana lines overexpressing PavAGL9. These lines flowered earlier than wild-type Col-0 plants (Fig. 5e) and showed increased AtFT transcript levels (Fig. 5f, g), consistent with known flowering-time pathways in Arabidopsis52. To evaluate whether PavAGL9 has a comparable effect in sweet cherry, we transiently overexpressed PavAGL9 in sweet cherry flower buds that had fulfilled their chilling requirement. PavAGL9-overexpressing buds flowered earlier than empty-vector controls and showed markedly increased PavAGL9 transcript abundance (Fig. 5h, i). Expression of the downstream flowering regulators PavFT and PavAP1 was also significantly upregulated (Fig. 5j, k). Taken together, these results indicate that PavAGL9 acts as a transcription factor that promotes flowering progression in Cerasus.
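Relative expression from qPCR is commonly computed with the 2^−ΔΔCt method: the target's Ct is normalized to a reference gene (Actin is the internal control named in the Fig. 5 legend) and then to a calibrator sample. Whether this exact formula was used here is an assumption (see Methods), and the Ct values below are invented.

```python
from statistics import mean

def relative_expression(ct_target, ct_ref, calib_target, calib_ref):
    """2^-ddCt fold change of a sample relative to a calibrator sample.

    Each argument is a list of replicate Ct values, averaged first.
    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_calib = mean(calib_target) - mean(calib_ref)
    return 2 ** -(d_ct_sample - d_ct_calib)

# Hypothetical Ct values: a late-stage bud vs an early-stage calibrator.
fold = relative_expression(ct_target=[24.0, 24.2, 23.8], ct_ref=[18.0, 18.1, 17.9],
                           calib_target=[27.0, 27.1, 26.9], calib_ref=[18.0, 18.0, 18.0])
print(round(fold, 2))  # prints 8.0
```

A target Ct that drops by three cycles relative to the reference corresponds to roughly 2^3 = 8-fold higher expression, which is why modest Ct shifts can represent large expression changes.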

Fig. 5: Overexpression of PavAGL9 accelerates the blooming phenophase in cherries.

a Phenotypic changes observed during cherry flower bud formation and dormancy release. Scale bar = 500 μm. Both semi-quantitative (b) and quantitative PCR (c) analyses showed substantial upregulation of PavAGL9 expression during flower bud formation and dormancy release. Data are presented as mean ± SD (n = 3 biological replicates). Different lowercase letters indicate significant differences at P < 0.05 (one-way ANOVA followed by Tukey’s multiple comparison test). S1–S11 correspond to the 11 sampling time points from 24 June 2024 to 6 April 2025. d Subcellular localization of the PavAGL9 protein in tobacco leaf epidermal cells, with 35S:GFP as the positive control. Green fluorescence represents the GFP fusion protein signal, and red fluorescence from p2300-mCherry marks the nucleus. Two independent experiments with similar results were performed. Scale bar = 50 μm. e Phenotypes of Col-0, PavAGL9-OE, agl9, and PavAGL9/agl9 plants. Scale bar = 2 cm. Expression profiles of PavAGL9 (f) and AtFT (g) in Col-0 and transgenic lines. Data are presented as mean ± SD (n = 3 biological replicates). Statistical significance between groups was assessed using a two-sided Student’s t-test. h Flowering phenotypes following transient overexpression of PavAGL9 in sweet cherry flower buds. Scale bar = 1 cm. Expression profiles of PavAGL9 (i), PavFT (j), and PavAP1 (k) in empty-vector control and transiently transformed buds. Data are presented as mean ± SD (n = 3 biological replicates). Statistical significance between groups was assessed using a two-sided Student’s t-test. The Actin gene was used as the internal control. Source data are provided as a Source data file.

PavBPC6 binds the PavAGL9 promoter and represses its transcription to modulate flowering progression

To identify upstream regulators of PavAGL9, we first predicted 74 candidate transcription factors using PlantRegMap (https://plantregmap.gao-lab.org/regulation_prediction.php) (Supplementary Data 21). We then prioritized these candidates using co-expression network analysis, which showed that PavBPC6 had the strongest expression correlation with PavAGL9 (Supplementary Data 22–27). On the basis of this combined bioinformatic evidence, we selected the putative PavBPC6–PavAGL9 regulatory module for experimental validation. In yeast one-hybrid (Y1H) assays, cells co-transformed with pB42AD-PavBPC6 and pLacZi-proPavAGL9 activated reporter gene expression, indicating that PavBPC6 binds directly to the PavAGL9 promoter (Fig. 6a). Motif analysis using PlantRegMap and FIMO (MEME Suite) identified putative PavBPC6-binding sites within the PavAGL9 promoter that correspond to an atypical GAGA motif. Electrophoretic mobility shift assays (EMSAs) further showed that recombinant PavBPC6-GST protein bound biotin-labeled probes containing this motif in vitro, whereas unlabeled competitor probes or probes carrying a mutated motif abolished the shifted band (Fig. 6b). These results indicate that PavBPC6 specifically binds this motif within the PavAGL9 promoter. To determine the transcriptional effect of this binding, we performed a dual-luciferase (Dual-LUC) assay in Nicotiana benthamiana. Co-expression of 35S:PavBPC6 (effector) with proPavAGL9:LUC (reporter) significantly reduced the LUC/REN ratio compared with the reporter alone (Fig. 6c–e), indicating that PavBPC6 acts as a transcriptional repressor of PavAGL9. To validate this regulatory relationship in cherry, we transiently overexpressed PavBPC6 in sweet cherry flower buds that had fulfilled their chilling requirements. Strong GFP fluorescence confirmed successful transformation, and PavBPC6-overexpressing buds flowered later than empty-vector controls (Fig. 6f, g). Consistently, PavAGL9 transcript levels were significantly reduced in PavBPC6-overexpressing buds (Fig. 6h). Collectively, these results demonstrate that PavBPC6 directly binds the PavAGL9 promoter and represses its transcription, thereby contributing to the regulation of flowering progression in sweet cherry.
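The following sketch illustrates the LUC/REN normalization and the two-sided Student's t-test used for the Dual-LUC comparison. It is not the authors' analysis script. The readings in the usage example are hypothetical, and the function names `luc_ren_ratios` and `student_t` are illustrative. The sketch returns the pooled-variance t statistic and its degrees of freedom. A p-value would then be read from the t distribution, for example with `scipy.stats.ttest_ind`, which assumes equal variances by default.

```python
import math
from statistics import mean, stdev


def luc_ren_ratios(luc, ren):
    """Normalize firefly luciferase (LUC) readings to Renilla (REN), replicate by replicate."""
    if len(luc) != len(ren):
        raise ValueError("LUC and REN readings must be paired")
    return [f / r for f, r in zip(luc, ren)]


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


if __name__ == "__main__":
    # Hypothetical readings: reporter alone vs. reporter plus the 35S:PavBPC6 effector.
    control = luc_ren_ratios([1200, 1100, 1300], [400, 380, 420])
    effector = luc_ren_ratios([500, 450, 520], [410, 390, 400])
    t, df = student_t(effector, control)
    print(f"t = {t:.2f}, df = {df}")  # negative t: lower LUC/REN with the effector
```

Normalizing each LUC reading to the REN reading from the same sample corrects for differences in infiltration and transformation efficiency between leaf spots before the groups are compared.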

Fig. 6: PavBPC6 binds to the promoter of PavAGL9, inhibiting its transcription and participating in the regulation of the blooming phenophase.

a Yeast one-hybrid (Y1H) analysis showing binding of PavBPC6 to the PavAGL9 promoter on selective medium lacking Ura and Trp, supplemented with 100 mg/mL X-Gal. b Electrophoretic mobility shift assay (EMSA) showing the interaction of PavBPC6 with PavAGL9 promoter fragments containing the motif. Unlabeled probes were used for the competition assay; “−” and “+” denote the absence and presence of the corresponding component, respectively. c Schematic of the LUC reporter vector containing the PavAGL9 promoter and the effector vector containing PavBPC6. d Transient LUC imaging assays showing that PavBPC6 represses transcription of the proPavAGL9:LUC reporter. Representative images show LUC activity in N. benthamiana leaves 48 h post-infiltration. Three independent experiments with similar results were performed. e Dual-luciferase (Dual-LUC) assay showing promoter activity expressed as the ratio of LUC to 35S:Renilla (REN). Data are presented as means ± SD of fifteen independent measurements (n = 15 biological replicates), with significant differences evaluated using a two-sided Student’s t-test (centerline, median; box limits, first and third quartiles; whiskers, minimum and maximum values). f Phenotypes resulting from transient overexpression of PavBPC6 in sweet cherry flower buds. Scale bar = 1 cm. g GFP signal in the transiently transformed flower buds, detected using a fluorescence microscope. Two independent experiments with similar results were performed. h Expression of PavAGL9 in transiently transformed sweet cherry buds. Data are presented as mean ± SD (n = 3 biological replicates). Statistical significance between groups was assessed using a two-sided Student’s t-test. i Yeast two-hybrid (Y2H) analysis showing the interactions of PavAGL9 with PavSEP1 and PavPMADS2. Protein interactions were screened on selective medium lacking Trp, His, Leu, and Ade, supplemented with 20 mM X-α-Gal. AD-T and BD-lam were co-transformed as negative controls, whereas AD-T and BD-53 served as positive controls. j Bimolecular fluorescence complementation (BiFC) assay showing the interactions of PavAGL9 with PavSEP1 and PavPMADS2 in tobacco cells. The MYC-pXY104 and 106J2-pXY106 pair was used as a positive control. Yellow fluorescence represents the YFP fusion protein signal, and red fluorescence marks the nucleus (p2300-mCherry). Images show overlays of fluorescence and bright-field views. Two independent experiments with similar results were performed. k Co-immunoprecipitation (Co-IP) assay showing the interaction of PavAGL9 with PavSEP1. l Co-IP assay showing the interaction of PavAGL9 with PavPMADS2. PavSEP1-Flag and PavPMADS2-Flag were each co-expressed with PavAGL9-YFP in tobacco leaves. Source data are provided as a Source data file.

To explore the evolutionary conservation of this regulatory relationship, we compared upstream promoter sequences of AGL9 orthologs across 17 Cerasus species represented in the pangenome dataset. A conserved motif corresponding to the PavBPC6-binding site was identified in 15 species (19 accessions), indicating strong conservation within subg. Cerasus (Supplementary Fig. 25). In contrast, the motif was absent in P. tianshanica and P. humilis, both belonging to Microcerasus, suggesting lineage-specific divergence. We further retrieved BPC6 ortholog coding sequences from all 21 accessions and conducted protein–DNA binding affinity predictions using motif-scanning approaches (FIMO; https://meme-suite.org/meme/tools/fimo). In silico analyses indicated that BPC6 orthologs from 15 species (19 accessions) retain predicted binding capacity to the conserved motif in their respective AGL9 promoters, suggesting that the BPC6-AGL9 regulatory module is evolutionarily conserved within subg. Cerasus (Supplementary Data 28).
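Motif scanning of the kind performed with FIMO can be illustrated with a log-odds position weight matrix (PWM). The sketch below is a simplification, not FIMO itself, and rests on several assumptions:

- The PWM is built from hypothetical aligned GAGA-type sites, not the actual PavBPC6 motif.
- The background is uniform.
- Promoter sequences are uppercase.
- Windows are reported when their raw score passes a threshold on either strand. FIMO instead converts scores to p-values, with a default threshold of 1e-4.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.25, background=0.25):
    """Log2-odds PWM from equal-length aligned binding sites against a uniform background."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same width")
    pwm = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(promoter, pwm, threshold):
    """Return (start, strand, window, score) for windows scoring >= threshold on either strand."""
    width = len(pwm)
    hits = []
    for start in range(len(promoter) - width + 1):
        window = promoter[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, seq in (("+", window), ("-", reverse_complement(window))):
            score = sum(col[b] for col, b in zip(pwm, seq))
            if score >= threshold:
                hits.append((start, strand, window, round(score, 2)))
    return hits


# Hypothetical GAGA-type sites and promoter fragment, for illustration only.
pwm = build_pwm(["GAGAGA", "GAGAGA", "GAGAGG"])
print(scan("TTTTGAGAGATTTT", pwm, threshold=8.0))
```

Scanning both strands matters here because a GAGA-type site can appear as a CT-rich stretch on the strand that is given.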

PavAGL9 physically interacts with PavSEP1 and PavPMADS2

To investigate whether PavAGL9 participates in MADS-box protein complexes, we analyzed protein–protein interactions among MADS-box proteins using yeast two-hybrid (Y2H) assays. PavAGL9 was fused to pGADT7 (AD), whereas PavSEP1 and PavPMADS2 were fused to pGBKT7 (BD). Yeast cells co-expressing BD-PavPMADS2 or BD-PavSEP1 with AD-PavAGL9 formed strongly blue colonies, indicating direct interactions between PavAGL9 and both PavSEP1 and PavPMADS2 (Fig. 6i). These interactions were further validated in planta using bimolecular fluorescence complementation (BiFC) assays in N. benthamiana leaves. Strong nuclear YFP fluorescence was observed when PavAGL9-cYFP was co-expressed with nYFP-PavSEP1 or nYFP-PavPMADS2, whereas no fluorescence was observed in negative controls lacking one interaction partner (Fig. 6j). The MYC-cYFP and nYFP-106J2 pair was used as a positive control. To further confirm these interactions, we performed co-immunoprecipitation (co-IP) assays in tobacco leaves co-expressing PavAGL9-YFP with PavSEP1-Flag or PavPMADS2-Flag. PavAGL9-YFP co-precipitated with both PavSEP1-Flag and PavPMADS2-Flag, confirming their physical association in planta (Fig. 6k, l). Taken together, these results suggest that PavAGL9 interacts with PavSEP1 and PavPMADS2, supporting its involvement in MADS-box protein complexes potentially associated with floral organ development.

Discussion

Plant pangenomes are rapidly emerging as foundational resources for guiding genome-editing strategies and improving our understanding of genotype–phenotype relationships. Integrating forward genetics with large-scale sequencing of high-quality plant reference genomes, including complete telomere-to-telomere assemblies, enables the discovery of SVs, SNPs, Indels, and copy-number variations (CNVs). However, tracing the evolutionary trajectories of these variants remains challenging because angiosperms have a complex evolutionary history marked by lineage-specific fragmentation and recent duplications. Compared with species-level pangenomes, genus-level pangenomes capture broader sequence variation and paralog diversification, providing a genomic framework for dissecting traits and resolving evolutionary relationships within Cerasus.

Within Prunus sensu lato, the phylogenetic relationships of Cerasus and its closely related lineages have long been debated, largely because earlier studies relied on morphological traits or a limited number of molecular markers. By leveraging a pangenome-scale comparative framework across 21 genomes, this study resolved several stable clades within Cerasus that are broadly consistent with recent chloroplast-based phylogenies9,53 but discordant with traditional morphology-based classifications4. In particular, several species historically assigned to Microcerasus (P. tianshanica, P. humilis, and P. tomentosa) did not form a monophyletic lineage within Cerasus but instead clustered more closely with other Prunus sections, corroborating earlier molecular evidence. Although taxonomic revision was beyond the scope of this study, these findings demonstrate how genus-level pangenomes can bridge traditional classifications and genome-scale evolutionary evidence, and they provide a foundation for downstream analyses.

By integrating 13 published genomes20,35,39 with eight newly assembled accessions, we constructed a 21-accession Cerasus pangenome. This dataset does not capture the full sequence diversity of Cerasus, particularly among wild species in China, and it includes fewer accessions than some other plant pangenomes28,31. Nevertheless, its broad taxonomic representation offers key insights into genomic diversity within the genus. We identified numerous SVs, including variants associated with fruit traits and, to a lesser extent, with flowering time. These resources provide a valuable foundation for molecular breeding and genome editing.

Although flowering phenotypes were recorded under long-term uniform conditions to minimize environmental effects, residual phenotypic plasticity may persist in accessions originating from highly distinct native habitats. In the absence of explicit quantification of phenotypic plasticity or incorporation of environmental covariates, the GWAS and selection signals may partially reflect genotype-by-environment interactions and should therefore be interpreted with caution. Future multi-location trials and multi-year phenological monitoring will help disentangle the relative contributions of genetic and environmental factors to flowering-time variation.

Through combined selective sweep and transcriptomic analyses, PavAGL9 was identified as a key gene associated with flowering progression. Expression of AGL9 orthologs increased during post-dormancy stages and peaked at full bloom (Supplementary Fig. 24). Functional assays in Arabidopsis and in sweet cherry buds that had fulfilled their chilling requirements demonstrated that elevated PavAGL9 expression accelerates floral opening. In contrast, ectopic expression in buds that had not fulfilled their chilling requirements failed to promote bud break (Supplementary Fig. 26). This indicates that PavAGL9 acts downstream of dormancy release and promotes post-dormancy flowering progression, rather than acting as a primary regulator of dormancy release.

Definitive functional validation of PavAGL9 in sweet cherry will require stable genetic transformation, which remains technically challenging in Cerasus species because of long juvenile phases, self-incompatibility, low fruit set, and inefficient transformation systems33,34,54. PavBPC6 was validated as a transcriptional repressor of PavAGL9 by Y1H, EMSA, and Dual-LUC assays and by transient overexpression in sweet cherry buds. However, evidence that this regulatory relationship extends across Cerasus species currently rests mainly on sequence conservation and computational predictions. Accordingly, the inference of a conserved regulatory module remains provisional and requires cross-species experimental validation. The substantial interspecific variation in flowering phenology suggests that additional regulatory layers, including variation in upstream regulators, downstream targets, or epigenetic modulation, may also contribute to species-specific flowering patterns. Elucidating these mechanisms is an important direction for future studies.

Our selective sweep and GWAS analyses identified a set of flowering-related genes within the selected regions. Several of these genes were functionally verified, but future studies should refine candidate identification using more robust strategies. Considerable progress has been made in linking cherry horticultural traits to candidate genes55,56,57,58, yet functional genomics in cherry still lags behind that in other perennial fruit crops such as apple, citrus, and grapevine. In addition, although this study integrates population genomics, how SVs regulate gene expression remains incompletely understood. In particular, establishing causal links between SVs in cis-regulatory regions and flowering phenotypes will ultimately require cross-species expression profiling and targeted functional perturbations, both of which remain challenging in cherry. Overall, this work expands genomic resources for Cerasus by providing a pangenome dataset that deepens understanding of flowering biology and supports future molecular breeding efforts.

Methods

Sample collection and sequencing

The Cerasus samples used for DNA and RNA sequencing in this study were collected from six locations across China:

- P. cerasoides (Pcero) from Yunnan Agricultural University, Kunming (102°45′36″ E, 25°8′24″ N).
- P. cerasus (Pceru) from the Cherry Experimental Station of Northwest A&F University, Tongchuan (108°51′0″ E, 34°52′12″ N).
- P. incisa “Oshidori” (Pinc), P. mahaleb (Pmah), and P. subhirtella (Psub) from the Shanghai Botanical Garden, Shanghai (121°27′4″ E, 31°9′14″ N).
- P. campanulata (Pcam) from the Botanical Garden of Shanghai Jiao Tong University, Shanghai (121°26′24″ E, 31°1′48″ N).
- P. serrula (Psera) from Shangri-la, Yunnan (99°49′12″ E, 27°49′12″ N).
- P. tianshanica (Ptia) from the Ili Kazakh Autonomous Prefecture, Xinjiang (81°57′0″ E, 43°7′12″ N).

Flower buds of sweet cherry “Tieton” were collected for anatomical and expression analyses at 11 time points: 24 June, 14 July, 2 August, 22 August, 10 September, 21 October, and 23 November 2024, and 8 January, 23 February, 25 March, and 6 April 2025. In addition, branches bearing flower buds that had fulfilled their chilling requirements were collected from sweet cherry trees for transient transformation assays.

High-purity genomic DNA was extracted using a DNeasy Plant Kit (Tiangen Biotech Co., Ltd., Beijing, China). After assessing DNA purity, concentration, and integrity, both long- and short-read sequencing libraries were constructed. Short-read libraries were sequenced on an Illumina HiSeq X Ten platform (Illumina Inc., San Diego, CA, USA). Hi-C libraries were constructed through chromatin extraction, digestion, ligation, purification, and fragmentation59, followed by sequencing on the Illumina HiSeq X Ten platform. ONT libraries were sequenced on the Oxford Nanopore PromethION 48 platform. For PacBio HiFi sequencing of Ptia, Psera, and Pmah, a standard SMRTbell library was prepared from 50 μg of DNA using the SMRTbell Express Template Prep Kit 2.0 and sequenced on a PacBio Sequel II (Pacific Biosciences, Menlo Park, CA, USA).

Total RNA for both RNA-seq and qRT-PCR was extracted using the RNAprep Pure Plant Kit (TIANGEN Biotech Co., Ltd., Beijing, China). Reverse transcription was performed using the PrimeScript RT Master Mix Kit (Takara, Dalian), and cDNA was synthesized from 1 μg of total RNA for subsequent experiments. The cDNA libraries were sequenced on an Illumina NovaSeq platform (Bluescape Hebei Biotech Co., Ltd., China). RNA-seq data were generated from floral bud samples of Pavi-T, Pmah, and Ppse at three time points (T1–T3) during the 2024 growing season, corresponding to 20 February, 10 March, and 25 March, respectively.

Phylogenetic analysis of chloroplast genomes

Chloroplast genomes of P. mahaleb (NCBI accession number: PX991928), P. spachiana (PX991929), and P. tianshanica (PX991930) were assembled using GetOrganelle v1.7.7.1⁶⁰. A total of 66 chloroplast genomes of Prunus species were retrieved from NCBI (Supplementary Data 29). Sequences were aligned using MAFFT v7.525⁶¹, and a maximum-likelihood (ML) tree was constructed using IQ-TREE v1.6.12⁶² with 1000 bootstrap replicates. P. padus was used as the outgroup.

Evaluation of flowering phenophase

We introduced several wild species from Yunnan and Xinjiang to the Cherry Experimental Farm at Shanghai Jiao Tong University by transplanting whole plants, collecting seedlings from root sprouts, or grafting and cutting, thereby minimizing phenological differences caused by regional origin. Flowering phenophases for all species were recorded under uniform conditions at botanical gardens and experimental farms in Shanghai63. Flowering stages, including first swelling, green tip, and full bloom (defined as 50% flower opening), were recorded for 21 Cerasus accessions. P. cerasoides, an extremely early-flowering species63, served as the control. Species blooming within 30 days after P. cerasoides were classified as “early,” those flowering 30–41 days later as “middle,” and those blooming more than 41 days later as “late.” Specifically, the flowering phenophase was evaluated for P. spachiana “Komatsu-otome” (Pspa), P. serrulata (Psert), and P. speciosa (Pspe) at the Shanghai Chenshan Botanical Garden (121°11′24″ E, 31°4′48″ N); for P. incisa “Oshidori”, P. subhirtella, P. conradinae, P. cerasoides, and P. mahaleb at the Shanghai Botanical Garden; and for P. pusilliflora, P. serrula, P. tianshanica, P. avium [“Burlat,” “Tieton” (Pavi-T), “13-33” (Pavi-13), and “Regina” (Pavi-R)], P. pseudocerasus, P. humilis (Phum), and P. cerasus at the Cherry Experimental Farm at Shanghai Jiao Tong University. In addition, we assessed early-flowering (P. campanulata, Pcam) and late-flowering (P. campanulata-F, Pcam-F) types at the Botanical Garden of Shanghai Jiao Tong University. Flowering phenophase data for 219 cherry accessions were evaluated based on the date of full bloom (defined as 50% flower opening). The well-known sweet cherry cultivar “Van,” which has a mid-season flowering time64, was used as a reference.
Accessions blooming within ±3 days of Van’s full bloom were classified as “middle flowering.” Those blooming 4–10 days earlier and 11–17 days earlier than “Van” were categorized as “early-middle” and “early,” respectively. Similarly, accessions blooming 4–10 days and 11–17 days later than “Van” were classified as “middle-late” and “late.”
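The two flowering-time classification schemes above can be expressed as simple rules. This sketch makes two assumptions. First, the text assigns a 30-day delay to both “early” and “middle,” so a delay of exactly 30 days is treated as “early.” Second, offsets beyond ±17 days from “Van” are not defined in the text, so they receive a hypothetical “unclassified” label. The dates in the usage example are hypothetical.

```python
from datetime import date


def days_between(later, earlier):
    """Whole days from one full-bloom date to another (negative if `later` is earlier)."""
    return (later - earlier).days


def class_vs_cerasoides(days_after):
    """Class by days after P. cerasoides full bloom (a 30-day delay is assumed to be 'early')."""
    if days_after <= 30:
        return "early"
    if days_after <= 41:
        return "middle"
    return "late"


def class_vs_van(offset):
    """Class by full-bloom offset from 'Van' in days (negative = earlier than 'Van')."""
    if -3 <= offset <= 3:
        return "middle"
    if -10 <= offset <= -4:
        return "early-middle"
    if -17 <= offset <= -11:
        return "early"
    if 4 <= offset <= 10:
        return "middle-late"
    if 11 <= offset <= 17:
        return "late"
    return "unclassified"  # hypothetical label: offsets beyond ±17 d are not defined in the text


# Hypothetical full-bloom dates, for illustration only.
van_bloom = date(2024, 3, 28)
print(class_vs_van(days_between(date(2024, 3, 20), van_bloom)))  # 8 days earlier than 'Van'
```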

Genome assembly and assessment

Before assembly, the genome sizes of the eight Cerasus accessions were estimated by flow cytometry. Cell suspensions were prepared by chopping fresh leaf tissue in nuclei isolation buffer, filtering it through a nylon mesh, and staining it with propidium iodide⁶⁵,⁶⁶. Stained nuclear suspensions were analyzed on a BD FACSCalibur flow cytometer, with propidium iodide fluorescence excited at 488 nm. Genome sizes were independently estimated using Solanum lycopersicum or Ricinus communis as internal reference standards. For quality control, NGS data (Hi-C, RNA-seq, and whole-genome paired-end reads) were processed with Fastp v0.20.2⁶⁷ using default parameters. Nanopore reads were assembled de novo using NECAT v0.0.1⁶⁸. Redundant sequences were removed using purge_dups v1.2.5⁶⁹, and the contigs were polished with three iterative rounds of Racon using NGS data⁷⁰.
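Flow-cytometric genome size estimation rests on a simple proportion. The sample's 2C DNA content equals the standard's 2C content scaled by the ratio of their G0/G1 fluorescence peak positions. The result is then converted to base pairs with the widely used factor 1 pg ≈ 978 Mbp. In this minimal sketch, the peak positions and the standard's DNA content in the example are placeholders, not published values for S. lycopersicum or R. communis.

```python
MBP_PER_PG = 978  # widely used conversion factor: 1 pg of DNA ≈ 978 Mbp


def genome_size_mbp(sample_peak, standard_peak, standard_2c_pg):
    """Estimate the 1C genome size (Mbp) from G0/G1 peak positions relative to an internal standard."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    sample_2c_pg = standard_2c_pg * sample_peak / standard_peak
    return sample_2c_pg / 2 * MBP_PER_PG  # 1C = half of the 2C content


# Placeholder peak positions and standard DNA content, for illustration only.
print(round(genome_size_mbp(sample_peak=95.0, standard_peak=150.0, standard_2c_pg=2.0), 1))
```

Because the sample and the standard are measured in the same run, instrument drift largely cancels out in the peak ratio. This is the reason for using internal rather than external standards.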

Chromosome anchoring was performed using Hi-C data. Clean Hi-C reads were mapped to the draft assembly using Juicer v1.7.6⁷¹, and chromosome-scale scaffolds were generated with the 3D-DNA pipeline v180922⁷². Hi-C heatmaps were manually inspected and corrected with Juicebox Assembly v1.9.1⁷³, and the corrected assemblies were re-scaffolded using 3D-DNA. NGS reads were aligned to the assembly with Burrows-Wheeler Aligner v0.7.12⁷⁴. Assemblies from PacBio Sequel II HiFi reads were generated using hifiasm v0.19.9-r616⁷⁵ with default parameters and then scaffolded and corrected using the same procedures. Genome completeness was evaluated using BUSCO v5.3.1⁷⁶. Centromeres were predicted with quarTeT v1.2.5⁷⁷. To evaluate assembly quality, the LTR Assembly Index (LAI) was calculated using LTR_retriever v3.0.4 with default parameters (https://github.com/oushujun/LTR_retriever)⁷⁸.
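At its core, the LTR Assembly Index is the percentage of LTR-retrotransposon sequence that is found in intact elements. This minimal sketch computes only this raw LAI from two hypothetical length totals. LTR_retriever also reports an LAI corrected for genome-wide LTR identity, and that correction is not reproduced here. The quality categories follow those proposed with the original LAI description (draft < 10, reference 10–20, gold ≥ 20). They are normally applied to the value that LTR_retriever reports.

```python
def raw_lai(intact_ltr_bp, total_ltr_bp):
    """Raw LAI: percentage of LTR-retrotransposon sequence contained in intact LTR-RTs."""
    if total_ltr_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    return 100.0 * intact_ltr_bp / total_ltr_bp


def lai_category(lai):
    """Quality categories proposed with LAI: draft (<10), reference (10 to <20), gold (>=20)."""
    if lai < 10:
        return "draft"
    if lai < 20:
        return "reference"
    return "gold"


# Hypothetical totals (bp) of intact and all LTR-RT sequence, for illustration only.
value = raw_lai(intact_ltr_bp=4_200_000, total_ltr_bp=24_000_000)
print(round(value, 2), lai_category(value))
```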

Genome annotation, GO, and KEGG enrichment analysis

Transposable elements (TEs) were annotated using EDTA v2.2.2⁷⁹ with the parameters --sensitive 1 --anno 1 --evaluate 1 --force 1. Protein-coding genes were predicted using a combined strategy integrating ab initio, homology-based, and transcriptome-based approaches. Ab initio predictions were performed with Augustus v3.0.3⁸⁰, SNAP v2006-07-28⁸¹, and GlimmerHMM v3.0.1⁸². Homology-based predictions were conducted using Exonerate v2.2.0⁸³ with sequences from P. cerasus (PRJNA295439 and PRJNA327561), P. avium (PRJNA73727, PRJNA419491, PRJNA595502, and PRJNA550274), and P. subhirtella (PRJNA596558). Transcriptome-based gene models were generated from Illumina sequencing data (PRJNA260424 and PRJNA1041553) using StringTie v1.3.4⁸⁴ and PASA⁸⁵. All gene models were integrated using EVidenceModeler v1.1.1⁸⁶.

Functional annotation was performed on the basis of sequence similarity and domain conservation. BLAST searches were conducted against the NR, KEGG, and Swiss-Prot databases. HMMER v3.0⁸⁷ was used for Pfam domain identification, and InterProScan v5.53-87.0⁸⁸ was used to assign GO terms. Non-coding RNAs were predicted with tRNAscan-SE v1.3.1 (tRNA)⁸⁹, RNAmmer v1.2 (rRNA)⁹⁰, and INFERNAL v1.1.2 (miRNA and snRNA)⁹¹. Additional ncRNAs were annotated with Rfam v1.0.4⁹². GO and KEGG enrichment analyses were performed using clusterProfiler⁹³.
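Over-representation analysis of the kind clusterProfiler performs is based on a one-sided hypergeometric test, followed by multiple-testing correction. clusterProfiler uses Benjamini–Hochberg by default. The sketch below reimplements both steps in pure Python for illustration. It does not call clusterProfiler, and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for term over-representation.

    k: genes in the query list annotated to the term
    n: size of the query list
    K: genes in the background annotated to the term
    N: size of the annotated background
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bh_adjust(pvalues):
    """Benjamini–Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 12 of 200 candidate genes vs. 150 of 30,000 background genes.
p = enrichment_p(k=12, n=200, K=150, N=30_000)
print(p, bh_adjust([p, 0.2, 0.03]))
```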

Pangenome construction

Pavi-T was selected as the reference genome, and the remaining 20 genomes were aligned against it using minimap2 v2.1 (-x asm5)⁹⁴. SVs were then identified using SyRi v1.6.3 (--nosnp)⁹⁵ and Assemblytics v1.2.1⁹⁶. The SV calls were merged using SURVIVOR v1.0.7 (merge 200,1,1,1,0,50)⁹⁷. These settings specify a maximum breakpoint distance of 200 bp, support from at least one caller, matching SV type and strand, no size-based distance estimation, and a minimum SV size of 50 bp. A graph-based pangenome was constructed from the merged SV calls using VG v1.53.0⁹⁸. VG integrates structural variants into a variation graph, so reads can be mapped and genotyped across diverse genomes without the constraints of a single linear reference. For gene-level analyses, GffRead v0.12.7⁹⁹ was used to extract the longest CDS and protein sequences from the 21 accessions. Gene families were identified using OrthoFinder v3.0.1b1¹⁰⁰ and classified as core, softcore, dispensable, or private according to the number of accessions represented in each orthogroup.
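The text does not state the cut-offs used for the four pangenome categories. This sketch therefore uses commonly adopted but assumed thresholds:

- core: present in all 21 accessions
- softcore: present in at least 90% of accessions
- private: present in a single accession
- dispensable: everything in between

The orthogroup IDs and accession sets in the example are hypothetical.

```python
from collections import Counter


def classify_family(n_present, n_total=21, softcore_fraction=0.9):
    """Assign a pangenome category from the number of accessions carrying an orthogroup."""
    if not 1 <= n_present <= n_total:
        raise ValueError("n_present must be between 1 and n_total")
    if n_present == n_total:
        return "core"
    if n_present >= softcore_fraction * n_total:  # assumed softcore cut-off
        return "softcore"
    if n_present == 1:
        return "private"
    return "dispensable"


def summarize(orthogroups, n_total=21):
    """Count categories for a mapping: orthogroup ID -> set of accessions with >=1 member gene."""
    return Counter(classify_family(len(accessions), n_total) for accessions in orthogroups.values())


# Hypothetical orthogroups over 21 accessions (acc1..acc21), for illustration only.
all_acc = {f"acc{i}" for i in range(1, 22)}
example = {"OG1": all_acc, "OG2": all_acc - {"acc3"}, "OG3": {"acc1", "acc2"}, "OG4": {"acc5"}}
print(summarize(example))
```

With 21 accessions, the assumed 90% cut-off makes orthogroups found in 19 or 20 accessions softcore.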

Phylogenetic analysis of cherry genomes

To investigate phylogenetic relationships within Cerasus, we analyzed 21 accessions, comprising 13 publicly available genomes and eight genomes assembled in this study (Supplementary Data 1). Seven additional Prunus genomes were included: P. cerasus (haplotypes A and F), P. dulcis, P. mume, P. persica, P. tomentosa, and P. yedoensis. Rosa chinensis, Vitis vinifera, and A. thaliana served as outgroups. Gene families across all genomes were identified using OrthoFinder v3.0.1b1¹⁰⁰. Single-copy orthologs were aligned using MUSCLE v3.8.31¹⁰¹, and well-aligned regions were retained using Gblocks v0.91b¹⁰². The best-fit substitution model was selected with ModelTest-NG v0.1.7¹⁰³, and a maximum-likelihood (ML) phylogeny was inferred using RAxML-NG v1.2.2¹⁰⁴ with 1000 bootstrap replicates. Divergence times were estimated with the MCMCTree program in the PAML package v4.9j¹⁰⁵. To ground divergence-time estimation in reliable paleobotanical evidence, we constrained the Prunus stem node using the early Eocene P. wutuensis fossil (~55 Mya)¹⁰⁶, implemented as a soft calibration bound in line with established Rosaceae dating practice. We further incorporated two widely supported secondary calibration intervals (Arabidopsis–Vitis: 110–124 Mya; Vitis–Rosa: 102–113 Mya) to stabilize deeper nodes. An initial substitution-rate estimate obtained with baseml was used to parameterize the rate prior in MCMCTree. Gene family expansion and contraction across the phylogeny were assessed with CAFE v5¹⁰⁷.
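Selecting single-copy orthologs amounts to keeping the orthogroups that contain exactly one gene in every genome. OrthoFinder reports such orthogroups directly. This sketch only illustrates the criterion on a hypothetical in-memory table of per-species gene counts and does not parse OrthoFinder's output files.

```python
def single_copy_orthogroups(gene_counts, species):
    """IDs of orthogroups with exactly one gene in every listed species.

    gene_counts: mapping orthogroup ID -> {species name: number of member genes}
    """
    return sorted(
        og for og, per_species in gene_counts.items()
        if all(per_species.get(sp, 0) == 1 for sp in species)  # absent species count as 0
    )


# Hypothetical counts for three genomes, for illustration only.
counts = {
    "OG0001": {"Pavi-T": 1, "Pmah": 1, "Rchi": 1},
    "OG0002": {"Pavi-T": 2, "Pmah": 1, "Rchi": 1},  # duplicated in Pavi-T
    "OG0003": {"Pavi-T": 1, "Pmah": 1},             # missing from Rchi
}
print(single_copy_orthogroups(counts, ["Pavi-T", "Pmah", "Rchi"]))
```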

Analysis of flowering genes

Flowering genes were retrieved from the A. thaliana database (TAIR; https://www.arabidopsis.org/) and used as queries to scan the Pavi-T, Pmah, and Ppse genomes. These genes were aligned against the protein sequences of the three genomes using BLASTP v2.16.0 (E-value cutoff 1e-5). Genes whose hits had sequence identity >70% were classified as homologs, whereas genes with lower identity or no hits were treated as candidates for re-annotation. Newly predicted genes were added only if they did not overlap previously annotated genes, and the annotations were updated accordingly.
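The homolog/re-annotation split can be sketched as a filter over BLASTP tabular output. This sketch makes three assumptions:

- The output uses the default `-outfmt 6` columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).
- Each query is judged by its best hit, ranked by bit score. The text does not state how multiple hits were handled.
- The function name and the gene and protein IDs in the example are hypothetical.

The identity (>70%) and E-value (1e-5) thresholds come from the text.

```python
def classify_flowering_genes(blast_tab_lines, queries, min_identity=70.0, max_evalue=1e-5):
    """Split query genes into homologs (best-hit identity > min_identity) and re-annotation candidates.

    blast_tab_lines: lines of BLASTP tabular output in the default -outfmt 6 column order.
    """
    best = {}  # query -> (subject, percent identity, bit score) of its top-scoring hit
    for line in blast_tab_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, evalue, bitscore = float(fields[2]), float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    homologs, candidates = {}, []
    for query in queries:
        if query in best and best[query][1] > min_identity:
            homologs[query] = best[query][0]
        else:
            candidates.append(query)  # low identity or no significant hit: re-annotation candidate
    return homologs, candidates


# Hypothetical BLASTP rows (query, subject, identity, ..., E-value, bit score).
rows = [
    "FT\tPav01\t85.0\t170\t10\t0\t1\t170\t1\t170\t1e-80\t300",
    "SOC1\tPav02\t60.0\t200\t50\t2\t1\t200\t1\t200\t1e-30\t150",
    "AP1\tPav03\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-3\t40",
]
print(classify_flowering_genes(rows, ["FT", "SOC1", "AP1", "LFY"]))
```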

Clean RNA-seq reads were aligned to the genome using HISAT2 v2.2.1¹⁰⁸. Expression levels were quantified as fragments per kilobase of transcript per million mapped fragments (FPKM) using StringTie2 v3.0.0 with default parameters. Co-expression networks were constructed using WGCNA¹⁰⁹ in R v4.2.3 with the parameters minModuleSize = 30, mergeCutHeight = 0.25, and power = 16. Modules with a correlation coefficient >0.9 at any developmental stage were considered target modules. Genes in the target modules were compared across Pavi-T, Pmah, and Ppse to identify those with consistent expression patterns in Pavi-T and Pmah but divergent patterns in Ppse.
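The FPKM definition normalizes fragment counts for both transcript length and sequencing depth:

FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments)

The sketch below applies this definition to hypothetical counts. StringTie estimates FPKM from its own coverage model over assembled transcripts, so this calculation illustrates the unit rather than reproducing StringTie's values.

```python
def fpkm(fragment_counts, lengths_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_mapped_fragments)
        for gene, count in fragment_counts.items()
    }


# Hypothetical counts and lengths, for illustration only.
print(fpkm({"geneA": 1000, "geneB": 250}, {"geneA": 2000, "geneB": 500}, 20_000_000))
```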

SNPs, Indels, and SV calling of sweet cherry accessions

The SNPs and Indels of 219 cherry accessions were obtained from a previous study⁶. Short reads from these accessions were used to genotype variants in the pangenome with VG v1.53.0⁹⁸. Only SVs ≥50 bp were retained for downstream analyses. To assess the reliability of SV detection, a random subset of SVs was validated by PCR (Supplementary Data 30).

Population structure analysis

To assess how SVs affect population genetic inferences, we compared them with SNP and Indel data from 219 cherry accessions collected across diverse regions. To retain representative variation and reduce computational load, SNPs, Indels, and SVs were filtered by minor allele frequency and missing rate. Variants were annotated using ANNOVAR v2015-12-14¹¹⁰. Shared-allele distance matrices were calculated in PLINK v1.90b7¹¹¹, and PCA was conducted using the PLINK PCA module. To infer phylogenetic relationships from genome-wide SVs, filtered biallelic SV genotypes were converted into a binary presence/absence matrix and formatted for PHYLIP v3.698¹¹². A maximum-likelihood phylogeny was reconstructed using IQ-TREE v1.6.12⁶² under the MK + ASC model with 1000 ultrafast bootstrap replicates. Phylogenetic trees were visualized with the iTOL web server (https://itol.embl.de). For population structure analysis, SNPs in linkage disequilibrium (LD) were pruned in PLINK v1.90b7 using a 50-SNP sliding window (step size = 10 SNPs) with an r² threshold of 0.2. ADMIXTURE v1.3¹¹³ was then used to infer population structure for K = 2–10, and cross-validation error was used to select the best-fitting K.
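The minor-allele-frequency and missing-rate filters, followed by binary presence/absence coding of SVs, can be sketched as below. Genotypes are coded as alternative-allele dosages (0/1/2), with `None` for missing calls. The filter thresholds are placeholders, because the text does not state the values used. Treating any SV-carrying genotype (heterozygous or homozygous) as “present” is also an assumption.

```python
def maf(genotypes):
    """Minor allele frequency from alternative-allele dosages (0/1/2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def missing_rate(genotypes):
    """Fraction of samples without a genotype call."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_variants(matrix, min_maf=0.05, max_missing=0.2):
    """Keep variants passing both filters (placeholder thresholds, not the study's values)."""
    return {
        vid: genos for vid, genos in matrix.items()
        if missing_rate(genos) <= max_missing and maf(genos) >= min_maf
    }


def presence_absence(genotypes):
    """Binary SV coding for a PHYLIP-style matrix: 1 = SV carried, 0 = absent, ? = missing."""
    return "".join("?" if g is None else ("1" if g > 0 else "0") for g in genotypes)


# Hypothetical SV genotypes for four accessions, for illustration only.
svs = {"sv1": [0, 1, 2, None], "sv2": [0, 0, 1, 1], "sv3": [0, 0, 0, 0]}
for sv_id, genos in filter_variants(svs).items():
    print(sv_id, presence_absence(genos))
```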

Genomic selection signature identification

To detect selective sweeps, we performed a genome-wide scan for selection signals by combining FST and nucleotide diversity (π). FST and π were calculated with VCFtools v0.1.16¹¹⁴ in 100-kb sliding windows with a 10-kb step. Genes located within the top 1% of windows with the strongest selection signals were identified as candidate loci. Haplotype blocks were visualized using LDBlockShow v1.4¹¹⁵.
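The windowing and top-1% selection can be sketched as follows. The text does not spell out how FST and π were combined. This sketch assumes that a window must fall in the top 1% for both FST and a π ratio, where the π ratio is π of the reference group divided by π of the putatively selected group, so that high values indicate reduced diversity under selection. The windows here are 0-based and half-open, which differs from VCFtools' 1-based coordinates.

```python
import math


def sliding_windows(chrom_len, size=100_000, step=10_000):
    """(start, end) windows of `size` bp advancing by `step` bp (0-based, half-open)."""
    last_start = max(chrom_len - size, 0)
    return [(s, min(s + size, chrom_len)) for s in range(0, last_start + 1, step)]


def top_fraction_cutoff(values, fraction=0.01):
    """Smallest value still inside the top `fraction` of `values` (at least one value kept)."""
    ranked = sorted(values, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return ranked[k - 1]


def candidate_windows(fst, pi_ratio, fraction=0.01):
    """Windows in the top `fraction` for both FST and the π ratio (assumed combination rule)."""
    fst_cut = top_fraction_cutoff(list(fst.values()), fraction)
    pi_cut = top_fraction_cutoff(list(pi_ratio.values()), fraction)
    return sorted(w for w in fst if fst[w] >= fst_cut and w in pi_ratio and pi_ratio[w] >= pi_cut)


# Hypothetical chromosome length, for illustration only.
print(len(sliding_windows(1_000_000)))
```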

GWAS analysis

To identify genetic loci associated with flowering time in cherry, GWAS was conducted using SNP, Indel, and SV genotypes from the 219 accessions. Analyses were performed with both the linear mixed model (LMM) implemented in GEMMA v0.98.5¹¹⁶ and the general linear model (GLM) implemented in TASSEL v5.2.89¹¹⁷. GWAS results, including Manhattan and Q–Q plots, were visualized with the qqman package in R v4.2.3.
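At the level of a single marker, an association test regresses the phenotype (for example, days to full bloom) on genotype dosage. The sketch below shows only this ordinary least-squares core and returns the per-allele effect, R², and the t statistic with n − 2 degrees of freedom. It deliberately omits the population-structure covariates that a TASSEL GLM would typically include and the kinship random effect of GEMMA's LMM, so it is not equivalent to either tool. The data are hypothetical.

```python
import math


def single_marker_test(dosages, phenotypes):
    """OLS fit of phenotype on genotype dosage (0/1/2) for one marker.

    Returns (effect per allele, R^2, t statistic with n - 2 degrees of freedom).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype is monomorphic")
    beta = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    if r2 >= 1.0:
        return beta, r2, math.copysign(math.inf, beta)  # perfect fit: t is unbounded
    t = math.copysign(math.sqrt(r2 * (n - 2) / (1 - r2)), beta)
    return beta, r2, t


# Hypothetical dosages and days to full bloom, for illustration only.
print(single_marker_test([0, 0, 1, 1, 2, 2], [10, 12, 15, 17, 20, 22]))
```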

Quantitative real-time reverse transcription PCR (qRT-PCR)

Real-time PCR was performed on a BIOER LineGene 9600 Plus Fluorescent Quantitative Detection System (FQD-96A; BIOER, Hangzhou, China) using the TB Green™ Premix Ex Taq™ II kit (TaKaRa, Tokyo, Japan). Cycling conditions were an initial denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s, and a final denaturation step at 95 °C for 10 s¹¹⁸. Relative expression of the target genes was calculated using the 2^−ΔΔCT method¹¹⁹, with PavActin as the reference gene for normalization. Primer sequences used for qRT-PCR are listed in Supplementary Data 31.
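The 2^−ΔΔCT calculation works in two steps:

1. Normalize the target gene's CT to the reference gene (PavActin) within each sample (ΔCT).
2. Compare that ΔCT with the ΔCT of a calibrator sample, such as empty-vector buds (ΔΔCT).

The method assumes close to 100% amplification efficiency for both genes. In this sketch, the CT values are hypothetical technical replicates.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calibrator_target_ct, calibrator_reference_ct):
    """Fold change of a sample vs. a calibrator by the 2^-ΔΔCT method (~100% efficiency assumed).

    Each argument is a list of technical-replicate CT values.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)                     # ΔCT of the sample
    delta_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)  # ΔCT of the calibrator
    return 2 ** -(delta_sample - delta_calibrator)                          # 2^-ΔΔCT


# Hypothetical CTs: PavAGL9 and PavActin in overexpression buds vs. empty-vector buds.
print(relative_expression([23.1, 23.3, 23.2], [20.0, 20.1, 19.9],
                          [24.9, 25.1, 25.0], [20.1, 20.0, 19.9]))
```

Because each ΔCT is computed within a sample, differences in input cDNA between samples cancel out before the fold change is taken.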

Microstructure assessment of cherry flower buds

The phenotypes of cross sections of sweet cherry flower buds at ten developmental stages were examined using a stereomicroscope (SZ810, Cnoptec). Paraffin sections were prepared by fixing samples in FAA solution, followed by dehydration through an ethanol series, paraffin embedding, and sectioning into 8–10 μm slices using a microtome120. Slides were analyzed and imaged using a BX61 microscope (Olympus, Tokyo, Japan). GFP signals in transiently transformed flower buds were detected with a Leica fluorescence microscope equipped with a Leica DFC450 C camera (Leica Microsystems Inc.).

Subcellular localization

The CDS of PavAGL9 without the stop codon was amplified using the primers listed in Supplementary Data 31 and cloned into the binary vector pHB. This vector carries two cauliflower mosaic virus 35S promoters, a translation enhancer, and a GFP tag. The resulting fusion construct was designated p35S-PavAGL9-GFP. The fusion construct and the empty control vector (pHB) were introduced into Agrobacterium tumefaciens strain GV3101 and then used to infiltrate the leaves of 3–5-week-old tobacco plants. Three days after infiltration, when GFP fluorescence peaked, the localization of the fluorescent proteins was observed using a confocal laser scanning microscope (Leica TCS SP8 STED 3X; Wetzlar, Germany).

Stable plant transformation

The full-length CDS of PavAGL9 was cloned into the binary pHB-GFP vector to generate an overexpression construct (PavAGL9-OE), which was then introduced into A. tumefaciens strain GV3101. Wild-type Col-0 and agl9 mutant A. thaliana plants were transformed by immersing flowering plants in A. tumefaciens GV3101 suspensions carrying the construct, followed by seed collection and selection of transformants121. Plants were grown in a controlled growth chamber at 23 °C under a 16-h light/8-h dark photoperiod. Transformants were screened for herbicide resistance, and transgene integration was confirmed by genomic DNA detection. Transgenic lines with high PavAGL9 expression were then selected for phenotypic analyses.

Transient transformation

Transient overexpression was performed using an Agrobacterium-mediated transformation approach with minor modifications122. Two recombinant plasmids (PavAGL9-OE and PavBPC6-OE) for overexpression were introduced into the A. tumefaciens strain GV3101. An empty pHB-GFP vector served as a negative control. Agrobacterium cultures were grown to an optical density (OD600) of approximately 0.6–0.8, collected by centrifugation, and resuspended in infiltration buffer. Subsequently, whole sweet cherry flower buds were immersed in the bacterial suspension and subjected to vacuum infiltration in a vacuum chamber connected to a vacuum pump. The infiltrated flower buds were then cultured under controlled conditions for subsequent analyses.

Selection of candidate transcription factors binding the PavAGL9 promoter

Candidate transcription factors were predicted using PlantRegMap (https://plantregmap.gao-lab.org/regulation_prediction.php). Co-expression modules linking PavAGL9 with other genes were constructed using the WGCNA shiny plugin in TBtools v2.056 software123. Network analysis was performed using default parameters, except that the soft-thresholding power was set to 30 and the minimum module size to 30 genes. Clusters were merged based on module eigengenes, and the eigengene values of each module were calculated to examine associations with PavAGL9. Candidate transcription factors were prioritized by integrating promoter-binding predictions with the co-expression network analysis.

Yeast one-hybrid assay

The full-length CDS of PavBPC6 was cloned into the pB42AD vector at the EcoRI site. A 1900-bp promoter fragment of PavAGL9 was amplified and inserted into the pLacZi vector at the EcoRI site. The recombinant pLacZi and pB42AD vectors were co-transformed into yeast strain EGY48. Protein–DNA interactions were assessed on selective medium lacking Ura and Trp and supplemented with 100 mg/mL X-Gal. The primers used to build the Y1H constructs are listed in Supplementary Data 31.

Dual-LUC assay

The PavAGL9 promoter fragment was cloned into the pGreenII 0800-LUC vector using the BamHI and PstI sites to generate the reporter construct proPavAGL9:LUC. The full-length CDS of PavBPC6 was inserted into the XbaI site of the pPZP6K90 vector to generate the effector construct 35S:PavBPC6-pPZP6K90. Both constructs were introduced into A. tumefaciens strain GV3101. Agrobacterium cultures carrying the effector and reporter constructs were co-infiltrated into N. benthamiana leaves; co-infiltration of the empty pPZP6K90 vector with proPavAGL9:LUC served as the control. After 2–3 days, luciferase reporter assays were performed using commercial reagents (YEASEN, Shanghai, China). Luciferase signals were visualized and quantified using a Tanon-5200 imaging system (Tanon, Shanghai, China). Primer sequences used for this assay are listed in Supplementary Data 31.
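
The pGreenII 0800-LUC vector also carries a 35S-driven Renilla luciferase (REN) gene, so dual-luciferase readouts are commonly expressed as LUC/REN ratios normalized to the empty-effector control. The sketch below shows that normalization, assuming paired LUC and REN readings are available for each replicate. The Methods report signal quantification by imaging, so whether this exact normalization was applied is not stated. `relative_luc_ren` and all readings in the example are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_luc_ren(effector: Sequence[Tuple[float, float]],
                     control: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean and SD of effector LUC/REN ratios, each divided by the mean control ratio.

    Each tuple is one (LUC, REN) reading from a biological replicate; at least two
    effector replicates are needed for the SD.
    """
    control_mean = mean(luc / ren for luc, ren in control)  # empty pPZP6K90 + reporter
    relative = [(luc / ren) / control_mean for luc, ren in effector]
    return mean(relative), stdev(relative)


# Hypothetical readings, not data from this study.
control_readings = [(1000.0, 500.0), (1200.0, 600.0), (900.0, 450.0)]
effector_readings = [(3000.0, 500.0), (3300.0, 600.0), (2700.0, 450.0)]
rel_mean, rel_sd = relative_luc_ren(effector_readings, control_readings)
print(f"Relative LUC/REN: {rel_mean:.2f} ± {rel_sd:.2f}")
```

Dividing by REN corrects for differences in infiltration efficiency between leaves, which is the main source of variation in transient assays.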

Electrophoretic mobility shift assay (EMSA)

Potential PavBPC6-binding sites within the PavAGL9 promoter were predicted using PlantRegMap and FIMO (MEME Suite, https://meme-suite.org/meme/tools/fimo), and biotin-labeled double-stranded DNA probes containing the predicted motifs were synthesized for EMSA. For recombinant protein expression, the full-length CDS of PavBPC6 was cloned into the pGEX4T-2 vector in-frame with the GST tag using the BamHI and EcoRI sites, and the construct was transformed into Escherichia coli Rosetta (DE3) competent cells. The recombinant protein was purified using glutathione (GSH) beads, and its purity was verified by SDS–PAGE. EMSA was performed with the LightShift Chemiluminescent EMSA Kit (Thermo Fisher Scientific, Waltham, MA, USA). Gels were imaged using a Tanon-5200 Chemiluminescent Imaging System and scanned using a ChemiDoc XRS+ system (Bio-Rad, Hercules, CA, USA). All probes were synthesized by Sangon Biotech Co., Ltd. (Shanghai, China), and primer sequences used in these assays are provided in Supplementary Data 31.
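
FIMO scores every window of a sequence against a position weight matrix (PWM) and reports matches that pass a p-value threshold. The simplified scan below illustrates the underlying log-odds scoring on both strands but, unlike FIMO, applies a raw score cutoff instead of computing p-values. The GAGA-repeat count matrix is hypothetical (BPC proteins are generally reported to bind GA-rich elements), and the toy sequence is not the PavAGL9 promoter.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
                    for b in BASES})
    return pwm


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq: str, pwm, threshold: float):
    """Yield (start, strand, site, score) for windows scoring >= threshold on either strand.

    `start` is the 0-based position of the window on the forward strand.
    """
    width = len(pwm)
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(pwm[j][b] for j, b in enumerate(site))
            if score >= threshold:
                yield i, strand, site, score


# Hypothetical GAGAGA-like count matrix and toy promoter sequence.
g_col = {"A": 1, "C": 0, "G": 8, "T": 1}
a_col = {"A": 8, "C": 0, "G": 1, "T": 1}
pwm = log_odds_pwm([g_col, a_col, g_col, a_col, g_col, a_col])
promoter = "TTTCGAGAGATTTTCTCTCTCAAA"
for hit in scan(promoter, pwm, threshold=8.0):
    print(hit)
```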

Yeast two-hybrid assay

The full-length CDSs of PavPMADS2 and PavSEP1 were cloned into the pGBKT7 vector using the NdeI and EcoRI sites, whereas PavAGL9 was cloned into the pGADT7 vector using the BamHI site. All recombinant vectors were confirmed by Sanger sequencing. Recombinant and empty vectors were then co-transformed into yeast strain Y2HGold. Protein interactions were screened on selective medium lacking Trp, His, Leu, and Ade and supplemented with 20 mM X-α-Gal. Co-transformation of AD-T with BD-Lam served as the negative control, and co-transformation of AD-T with BD-53 as the positive control. Primers were synthesized by Sangon Biotech Co., Ltd. (Shanghai, China), and their sequences are listed in Supplementary Data 31.

Bimolecular fluorescence complementation (BiFC)

The full-length CDSs of PavPMADS2 and PavSEP1 were inserted into the pXY106 vector (using the BamHI and SalI sites, respectively), whereas PavAGL9 was cloned into the pXY104 vector using the same restriction sites. The resulting constructs were introduced into A. tumefaciens strain GV3101. Agrobacterium cultures were grown to an OD600 of 0.6–0.8, collected by centrifugation, and resuspended in infiltration buffer. The suspensions were co-infiltrated into N. benthamiana leaves using a needleless syringe. Fluorescence signals were observed 2–3 days after infiltration using a confocal laser scanning microscope (Leica TCS SP8 STED 3X; Wetzlar, Germany). Primer sequences used in this assay are listed in Supplementary Data 31.

Co-immunoprecipitation (Co-IP)

The PavAGL9-YFP, PavPMADS2-Flag, and PavSEP1-Flag constructs were generated by inserting the full-length CDSs of PavAGL9, PavPMADS2, and PavSEP1 into the pHB-X-YFP (PavAGL9) or pHB-X-Flag (PavPMADS2 and PavSEP1) vectors using the BamHI and SpeI sites. The recombinant plasmids were transformed into A. tumefaciens strain GV3101. For transient expression, N. benthamiana leaves were infiltrated with Agrobacterium suspensions harboring the indicated constructs and incubated for 2–3 days. Total proteins were extracted from infiltrated leaves using protein extraction buffer containing protease inhibitors, and the supernatant was collected after centrifugation. For immunoprecipitation, protein extracts were incubated with anti-GFP protein G magnetic beads (Thermo Fisher Scientific, Waltham, MA, USA) at 4 °C for 2–4 h with gentle rotation. After extensive washing, bound proteins were eluted and analyzed by immunoblotting with anti-GFP and anti-Flag antibodies to detect the YFP- and Flag-tagged proteins, respectively. All antibodies used in this study were purchased from Beijing TransGen Biotech Co., Ltd. (Beijing, China). Primer sequences used for plasmid construction are listed in Supplementary Data 31.

Statistical analysis

Comparisons among multiple groups were performed using one-way analysis of variance (ANOVA) followed by two-sided Tukey's post hoc test. Comparisons between two groups were performed using a two-sided Student's t-test. All data are presented as mean ± standard deviation (SD), and statistical significance was set at P < 0.05. Bar charts and box plots were generated using GraphPad Prism v10.5.0.
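
For readers who do not use GraphPad Prism, the same tests can be run in Python with SciPy. This is an equivalent sketch, not the authors' workflow; it assumes SciPy 1.8 or later (required for `scipy.stats.tukey_hsd`) and uses hypothetical values for three groups.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups; not data from this study.
group_a = np.array([1.00, 1.10, 0.95])
group_b = np.array([2.10, 2.35, 1.90])
group_c = np.array([0.55, 0.60, 0.48])

# Multiple groups: one-way ANOVA followed by Tukey's HSD (two-sided pairwise comparisons).
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
tukey = stats.tukey_hsd(group_a, group_b, group_c)
print(f"ANOVA: F = {f_stat:.2f}, P = {p_anova:.2e}")
print(tukey)  # pairwise mean differences, P values, and confidence intervals

# Two groups: two-sided Student's t-test (equal variances assumed, as in the classic test).
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=True, alternative="two-sided")
print(f"t-test: t = {t_stat:.2f}, P = {p_t:.2e}")

# Summary statistics as mean ± SD (sample SD, ddof=1).
for name, x in [("A", group_a), ("B", group_b), ("C", group_c)]:
    print(f"group {name}: {x.mean():.2f} ± {x.std(ddof=1):.2f}")
```

Tukey's HSD already controls the family-wise error rate across all pairwise comparisons, so its P values should not be corrected again.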

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.