Main

Many vital crops display self-incompatibility (SI); examples include potato1, cabbage2 and pear3. SI prevents inbreeding depression and maintains the genetic diversity of species, which can lead to high levels of genomic heterozygosity. Hybridization between different parents can produce offspring with superior performance for several traits4. However, high heterozygosity complicates genome assembly, and most assembled genomes are consensus sequences that mix the two haplotypes, which restricts genome-wide investigation of the allelic variation underlying important traits. In recent years, haplotype-resolved genomes have been assembled for more plant species, such as apple5, tea6, lychee7, potato1, cassava8 and sugarcane9. These assemblies have facilitated the identification of structural variations (SVs) and allele-specific expression patterns between haplotypes and have contributed to understanding the biological characteristics of highly heterozygous species.

Pear is an SI diploid species and one of the most economically important fruit species worldwide. Pears can generally be divided into two main groups, Asian and European pears, which show distinct phenotypic differences in fruit shape, sugar, acid, flavor and softening10. The first Asian pear reference genome, from the Pyrus bretschneideri cultivar ‘Dangshansuli’, was released in 2013 (ref. 11), followed by genomes of the European pear Pyrus communis ‘Bartlett’ in 2014 (ref. 12) and 2019 (ref. 13). In 2019, a phased diploid Asian pear genome was assembled using single-pollen cell sequencing technology14. Owing to limitations in sequencing and assembly technology, these earlier pear reference genomes were incomplete and lacked a large amount of genomic information. Recently, nearly complete telomere-to-telomere (T2T) genomes of ‘Dangshansuli’ and ‘Conference’ (P. communis) were released using a mosaic assembly strategy15. However, because a mosaic assembly merges the two haplotypes into a single sequence, it precludes analyses of variation between them. Population structure and domestication in pear have also been discussed extensively in recent studies10,16. Nevertheless, the complex evolutionary history and uncertain phylogeny of pear, especially the pattern of reticulate evolution with close wild relatives, remain to be examined. Finally, high-quality T2T genomes provide the opportunity to construct a graph-based genome, which is a powerful tool for SV genotyping using next-generation sequencing (NGS) data17.

In this study, we constructed haplotype-resolved T2T genome assemblies for the cultivars ‘Dangshansuli’ (DS) and ‘Max Red Bartlett’ (MRB; a red bud mutation of ‘Bartlett’), which are representative of Asian and European pears, respectively. The assemblies improve genome completeness and accuracy and make it possible to analyze allelic and interspecific variation. Resequencing data for 362 pear accessions were collected to investigate the evolutionary and demographic history of these pear groups. We constructed a graph-based genome from the two high-quality genomes and used it, together with the resequencing data, to identify SVs whose frequencies differ significantly between Asian and European pears. We further explored the associations of SVs with fruit traits and with gene expression. A 286-bp insertion in the promoter region of PyACS1 may have contributed to changes in fruit maturity and softening in European pear. These results advance our understanding of heterosis, evolutionary history and the genetic variation underlying key pear phenotypes and provide valuable resources for pear breeding.

Results

Haplotype-resolved T2T assemblies of pears

Given the heterozygosity of P. bretschneideri DS (1.23%) and P. communis MRB (1.33%), the two haplotypes of each preliminary diploid genome were assembled with hifiasm18 using PacBio HiFi, ONT (Oxford Nanopore Technologies) ultra-long and Hi-C data. Assemblies from Verkko19 and HiCanu20 were then used to fill gaps and telomeric regions (Supplementary Table 1 and Methods). All gap-closure regions were supported by uniform coverage of ONT and HiFi reads (Supplementary Figs. 1, 2 and 3). Finally, each haplotype was assembled into 17 contigs corresponding to the 17 pear chromosomes, with genome sizes ranging from 497.50 Mb to 505.55 Mb and contig N50 sizes ≥28.96 Mb (Table 1). Between 18.89 Mb and 24.13 Mb of the assembled sequence was absent from the previously released P. bretschneideri and P. communis genomes (Supplementary Fig. 4a). The contig N50 size, mapping rate, Benchmarking Universal Single-copy Orthologs (BUSCO) and Long Terminal Repeat (LTR) Assembly Index supported the high quality of the assemblies (Supplementary Note 1). Using the telomere repeat sequence (TTTAGGG) as the query, we identified 34 telomeres at the terminal ends of the 17 chromosomes of the haplotypes in DS and 33 in MRB (Fig. 1a,b). In summary, we assembled T2T haplotype-resolved genomes for DS and MRB.
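The telomere search described above can be sketched as a simple repeat count at both ends of each chromosome-level contig. This is an illustrative sketch only: the window size and minimum copy number are hypothetical thresholds, not the settings used in this study, and both repeat orientations are counted at each end so that contigs assembled in either orientation are handled.

```python
"""Minimal sketch: detect telomeric repeat arrays at contig ends (hypothetical thresholds)."""

from typing import Dict, Tuple

TELO_FWD = "TTTAGGG"  # telomere repeat used as the query in the text
TELO_REV = "CCCTAAA"  # its reverse complement


def end_has_telomere(end_seq: str, min_copies: int) -> bool:
    """True if one contig end carries at least `min_copies` repeat units in either orientation."""
    end_seq = end_seq.upper()
    # str.count counts non-overlapping occurrences, i.e. whole repeat units.
    return end_seq.count(TELO_FWD) + end_seq.count(TELO_REV) >= min_copies


def telomere_ends(seq: str, window: int = 10_000, min_copies: int = 20) -> Tuple[bool, bool]:
    """Return (left_end, right_end) telomere calls for a single contig."""
    return (
        end_has_telomere(seq[:window], min_copies),
        end_has_telomere(seq[-window:], min_copies),
    )


def count_telomeres(contigs: Dict[str, str], **kwargs) -> int:
    """Total number of telomeres across all contigs of one haplotype."""
    return sum(sum(telomere_ends(s, **kwargs)) for s in contigs.values())


if __name__ == "__main__":
    # Toy contigs: one with telomeres at both ends, one without.
    toy = {
        "chr_toy1": TELO_REV * 30 + "ACGT" * 5000 + TELO_FWD * 30,
        "chr_toy2": "ACGT" * 5000,
    }
    print(count_telomeres(toy))  # 2
```

Summing the per-end calls over the 17 chromosome contigs of a haplotype gives a count comparable in kind to the telomere numbers reported for DS and MRB.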

Table 1 Summary of genome assembly and annotation of DS and MRB
Fig. 1: DS and MRB genome assemblies.

a,b, Genomic features across haplotypes A and B of DS (a) and MRB (b): (I) telomeres and centromeres; (II) TE density (number of TEs per 200 kb); (III) gene density (number of genes per 200 kb); (IV) SNP density (number of SNPs per 200 kb); and (V) GC content per 200 kb. ‘A’ denotes HapA and ‘B’ HapB. c, Distribution of S-RNase and SFB (S-locus-related F-box gene) in HapA and HapB of DS and MRB. The top graphs show gene number and TE percentage (50-kb windows with a 20-kb step) and the bottom graphs show the distribution of S-RNase, SFB and other genes, with S-RNase genotypes marked. d, Orthologous gene pairs at the S-locus region across the four haplotypes. Red lines depict orthologous F-box gene pairs, the yellow line the S-RNase gene pair and green lines other gene pairs.

The DS and MRB haplotypes contained 262.86–272.23 Mb of transposable elements (TEs) (Table 1), with LTRs being the most abundant category (Supplementary Table 5). Between 42,675 and 43,602 protein-coding genes were predicted per haplotype of DS and MRB (Table 1), with BUSCO completeness ranging from 97.90% to 98.60% (Supplementary Fig. 7). The high-quality assemblies and annotations made it possible to investigate complex regions, including centromeres and S-locus alleles (Fig. 1c,d and Supplementary Note 2). In addition, 577 and 563 genes were identified in newly assembled regions of haplotype A (HapA) and haplotype B (HapB) of DS, respectively, and 697 and 804 genes in HapA and HapB of MRB. Most of these genes (77.33–84.06%) were located in regions with high TE content and high divergence between the two haplotypes (Supplementary Fig. 4b). Within the newly assembled regions, we identified cell division cycle protein 48, a key regulator of plant development and stress response21, and UDP-glucose flavonoid 3-O-glucosyltransferase 7, which is critical for sugar decomposition in strawberries22.

Identifying haplotype-specific and ASE genes

A long history of hybridization has resulted in extensive genetic variation (Supplementary Note 3) and allelic divergence between the two haplotypes of each pear genome. Based on the presence of orthologous genes in the corresponding haplotype, 33,710–34,036 allelic genes whose coding sequences (CDSs) differ were defined as biallelic genes (Fig. 2a,b). A further 6,691–8,087 genes had partners with identical CDSs in the corresponding haplotype (alleles with the same CDS) (Supplementary Table 11), and 1,703–1,994 genes per haplotype had no allelic partner (haplotype-specific genes). Compared with biallelic genes, haplotype-specific genes had higher TE abundance in the gene body and in the 5-kb upstream and downstream regions (Fig. 2c). The mean expression levels of haplotype-specific genes were lower than those of biallelic genes and alleles with the same CDS in different fruit samples of DS and MRB (Supplementary Fig. 14). Haplotype-specific genes also showed higher DNA methylation levels in both the gene body and the ±5-kb flanking regions in leaf and mature fruit tissues of DS and MRB (Supplementary Fig. 15), suggesting stronger methylation-mediated repression in these regions. These findings are in line with previous research, which generally regards DNA methylation as a regulator of gene expression23,24. The haplotype-specific genes were enriched in the ‘Plant–pathogen interaction’ and ‘Environmental adaptation’ Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Supplementary Fig. 16) and in the ‘Response to biotic stimulus’ gene ontology (GO) term (Supplementary Fig. 17).
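The three-way allele classification above can be expressed as a small function. This sketch assumes that HapA–HapB allele pairs have already been inferred (for example from orthology and synteny) and compares CDSs as exact strings; the gene IDs and sequences are hypothetical.

```python
"""Sketch: classify HapA genes as biallelic, same-CDS or haplotype-specific."""

from typing import Dict, List, Tuple


def classify_alleles(
    cds_a: Dict[str, str],
    cds_b: Dict[str, str],
    pairs: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Label every HapA gene using precomputed (HapA gene, HapB gene) allele pairs."""
    partner = dict(pairs)  # HapA gene -> HapB allele
    labels = {}
    for gene, cds in cds_a.items():
        other = partner.get(gene)
        if other is None or other not in cds_b:
            labels[gene] = "haplotype-specific"  # no allele in the other haplotype
        elif cds.upper() == cds_b[other].upper():
            labels[gene] = "same CDS"  # identical coding sequence
        else:
            labels[gene] = "biallelic"  # allele present but CDS differs
    return labels


if __name__ == "__main__":
    hap_a = {"geneA1": "ATGAAATGA", "geneA2": "ATGCCCTGA", "geneA3": "ATGTTTTGA"}
    hap_b = {"geneB1": "ATGAAATGA", "geneB2": "ATGCCGTGA"}
    pairs = [("geneA1", "geneB1"), ("geneA2", "geneB2")]
    print(classify_alleles(hap_a, hap_b, pairs))
    # {'geneA1': 'same CDS', 'geneA2': 'biallelic', 'geneA3': 'haplotype-specific'}
```

Running the same function with the haplotypes swapped labels the HapB genes.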

Fig. 2: Allelic gene variation and expression pattern in DS and MRB.

a, Layout of the allelic gene classification. Alleles were divided into haplotype-specific genes (no paired gene in the corresponding haplotype), alleles with the same CDS and biallelic genes. The star with the dotted line represents an absent gene. b, Proportion of biallelic, same-CDS and haplotype-specific genes in each haplotype. c, TE distribution across the gene body and the 5-kb upstream and 5-kb downstream regions of biallelic, same-CDS and haplotype-specific genes in HapA and HapB of DS and MRB. d, Number of ASE genes in eight fruit stages (five developmental and three postharvest stages) of DS and MRB. e, Alteration of ASE gene types between DS and MRB. HapDom denotes a haplotype dominance gene pair, Sub a sub- or neofunctionalized pair and Nodiff other ASE genes. NoASE denotes a gene pair without ASE. f, Density of Ks and Ka/Ks values of ASE genes. g, SNP density in gene features of HapA in DS (nHapDom = 4,498, nSub = 219, nNodiff = 3,794), where nHapDom, nSub and nNodiff are the counts of each ASE gene type. The y axis represents the number of SNPs per 1,000 bp. The center line is the median, the box limits are the first and third quartiles and the whiskers extend to 1.5× the interquartile range (IQR). Significance was determined using a two-sided Student’s t-test. h, Expression patterns of the two alleles of SWEET2 and C3H in fruit samples of DS and MRB. TES, transcription end site; TSS, transcription start site.

We further investigated allele-specific expression (ASE) during the eight fruit stages of DS and MRB: 19.64% (8,511 of 43,330) of genes in DS and 26.21% (11,213 of 42,784) in MRB showed ASE in at least one sample (Fig. 2d). These ASE genes were further classified into three categories (Methods): 4,498 (4,953) haplotype dominance genes (HapDom, expression biased toward one allele), 219 (527) Sub genes (sub- or neofunctionalized pairs, in which the favored allele switches between samples) and 3,794 (5,733) other ASE genes (Nodiff) in DS (MRB). Only 44.52% of ASE genes in DS also showed ASE in MRB (Fig. 2e): 22.12% of DS ASE genes fell into the same category in MRB and 22.40% into a different category. These inconsistent ASE genes were enriched in biosynthesis and metabolism pathways, whereas consistent ASE genes were enriched in GO terms related to stress and stimulus responses (Supplementary Fig. 18).
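One common way to call ASE from allele-resolved read counts is an exact binomial test against an equal (0.5) allele ratio in each sample; category rules can then mirror the HapDom/Sub/Nodiff definitions above. Everything in this sketch is an assumption rather than the study's Methods: the significance level, the two-fold ratio cut-off, the rule that HapDom requires the same allele to be favored in every sample, and the absence of multiple-testing correction.

```python
"""Sketch: per-sample ASE test and HapDom/Sub/Nodiff labeling (hypothetical rules)."""

from math import comb
from typing import List, Tuple


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial P value for k of n reads from allele A under p = 0.5."""
    if n == 0:
        return 1.0
    m = min(k, n - k)
    # The null distribution is symmetric, so the two-sided P value is twice one tail.
    tail = sum(comb(n, i) for i in range(m + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify_ase(counts: List[Tuple[int, int]], alpha: float = 0.05, min_ratio: float = 2.0) -> str:
    """counts holds (HapA reads, HapB reads) for one allele pair in each sample."""
    biased = []
    for a, b in counts:
        n = a + b
        if n == 0:
            continue
        strong = max(a, b) >= min_ratio * max(min(a, b), 1)
        if binom_two_sided_half(a, n) < alpha and strong:
            biased.append("A" if a > b else "B")
    if not biased:
        return "NoASE"
    if len(set(biased)) > 1:
        return "Sub"  # favored allele switches between samples
    if len(biased) == len(counts):
        return "HapDom"  # same allele favored in every sample
    return "Nodiff"  # ASE in some samples only, without a switch


if __name__ == "__main__":
    print(classify_ase([(90, 10), (80, 20), (95, 5)]))  # HapDom
    print(classify_ase([(90, 10), (10, 90)]))  # Sub
    print(classify_ase([(50, 50), (90, 10)]))  # Nodiff
```

With eight fruit stages per cultivar, each gene pair would contribute eight (HapA, HapB) count tuples.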

We annotated 380 (DS) and 450 (MRB) ASE genes that might contribute to important fruit agronomic traits at the allele level, including sugar-related fructokinases, α-glucosidases and sugar transporters; acid-related malate dehydrogenases; flavonoid-related flavonol synthases; and lignin-related p-coumarate 3-hydroxylase (C3H), 4-coumarate:coenzyme A ligase (4CL) and cinnamoyl-CoA reductase (Supplementary Fig. 20a). Among these fruit trait-related ASE genes, 95.26% (362 of 380) in DS and 88.89% (400 of 450) in MRB showed a haplotype dominance expression pattern, indicating that one allele may have a greater influence on fruit quality and development than its counterpart. For example, SWEET2 (sugars will eventually be exported transporters 2) and CPK28 (calcium-dependent protein kinase 28), which have been reported to function in sugar transport and accumulation25,26, showed dominant expression patterns in DS and MRB (Fig. 2h and Supplementary Fig. 20b). Pbre1_15G769200 and Pbre1_13G264800, which encode C3H and 4CL, respectively, are major structural genes in lignin biosynthesis27,28. In DS, these genes were more highly expressed during the early fruit developmental stages (S1–S2) than their allelic counterparts in HapB (Fig. 2h and Supplementary Fig. 20b).

Genomic evolution between Asian and European pears

To explore the evolutionary history of Asian and European pears, the protein-coding genes of DS, MRB and ten other species were used in comparative analyses (Fig. 3a). Phylogenetic analysis indicated that DS and MRB diverged approximately 3.60 million years ago (Ma). Compared with their common ancestor, 767 and 740 gene families were expanded in DS and MRB, respectively. The expanded gene families in DS were enriched in the ‘Biosynthesis of plant secondary metabolites’ and ‘Plant–pathogen interaction’ pathways and the ‘Response to stress’ GO term (Supplementary Fig. 21a), which are important for plant development and adaptation29,30. The expanded gene families in MRB were enriched in the ‘Fructose and mannose metabolism’ and ‘Carbohydrate metabolism’ pathways and the ‘Secondary metabolic process’ GO term.

Fig. 3: Divergence and SVs between DS and MRB.

a, Phylogeny, molecular clock dating and gene family analysis of the DS and MRB genomes with ten other species. The red oval indicates the fossil calibration point. Gains and losses of gene families in DS, MRB and the other ten species are highlighted in green and red, respectively. b, Ideogram showing synteny regions, SVs and inversions in 200-kb bins, plotted beside each chromosome; centromeric regions are shown on the chromosome representations. c, Proportion of SVs overlapping TEs or different genomic regions. The gene body and the 2-kb upstream and 2-kb downstream regions are considered genic regions; all other regions are considered intergenic. d, Large inversions between the two haplotypes of DS or MRB. Yellow regions on the collinear blocks represent inversions. Hi-C heatmaps support the inversions: dashed boxes mark the inversions and arrows indicate the high interaction signal on the anti-diagonal. e, Pan-SV and core-SV counts for Asian and European pear accessions. For each number of accessions, 1,000 random combinations of accessions were sampled. f, Distribution and frequency density of SV genotypes in Asian and European pears. The circos plot shows the distribution of frequency variance between Asian and European pears: (I) chromosomes of HapA of DS; and (II–V) frequency variance of SVs overlapping the 2-kb upstream (II), 2-kb downstream (III), exon (IV) and intron (V) regions. The density graph shows the distribution of SV frequency for the reference genotype (HapA of DS as reference).

The long divergence between Asian and European pears is reflected in marked genomic variation between them (Fig. 3b). In this study, the four haplotypes were used to construct a graph-based genome for SV identification, and 103,069 SVs were identified and genotyped. Of these SVs, 57.42% were located in gene bodies and gene-regulatory regions (±2 kb) and 61.27% overlapped with TEs (Fig. 3c). In addition, 12 large inversions (>100 kb) spanning 5.79 Mb were identified, all of which were supported by ONT reads or Hi-C maps (Fig. 3d and Supplementary Fig. 23). Notably, a 0.58-Mb inversion on chromosome 17 (chr. 17) was present between DS HapA and HapB at the S-locus region (Fig. 3d). This inversion may suppress recombination and maintain the linkage of the SFB and S-RNase genes required for SI31,32.
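The genic versus intergenic split used for these SV proportions (gene body plus 2 kb on each side) amounts to an interval-overlap test. The sketch below uses half-open coordinates on a single chromosome and a linear scan for clarity; the coordinates are toy values, and a real analysis over about 100,000 SVs would use an interval tree or a sorted sweep instead.

```python
"""Sketch: classify SVs as genic (gene body ± 2 kb) or intergenic and report the genic fraction."""

from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def classify_sv(sv: Interval, genes: List[Interval], flank: int = 2000) -> str:
    for start, end in genes:
        # Extend the gene body by `flank` bp on both sides (strand-agnostic).
        if overlaps(sv, (max(0, start - flank), end + flank)):
            return "genic"
    return "intergenic"


def genic_fraction(svs: List[Interval], genes: List[Interval], flank: int = 2000) -> float:
    labels = [classify_sv(sv, genes, flank) for sv in svs]
    return labels.count("genic") / len(labels)


if __name__ == "__main__":
    genes = [(10_000, 15_000), (50_000, 52_000)]
    svs = [(9_000, 9_100), (16_500, 16_600), (30_000, 31_000), (52_500, 53_000)]
    print([classify_sv(sv, genes) for sv in svs])  # ['genic', 'genic', 'intergenic', 'genic']
    print(genic_fraction(svs, genes))  # 0.75
```

The same overlap test, applied to TE annotations instead of gene intervals, gives the TE-overlap proportion.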

To identify SVs that differ between Asian and European pears, NGS data from 362 pear accessions were collected (Supplementary Table 15) and mapped to the graph-based genome for SV genotyping. The number of genotyped SVs reached a plateau as more Asian and European samples were added (Fig. 3e). A total of 83,369 SVs were retained after filtering for minor allele frequency >0.05 and missing rate <0.2. With DS HapA as the reference, the reference genotype frequency was >0.5 for 76.82% of SVs in the Asian pear group (Fig. 3f), compared with 42.73% of SVs in the European pear group. In total, 35,481 SVs showed significantly different presence frequencies (false discovery rate (FDR) < 0.001 and fold-change >2) between the Asian and European pear groups.
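The two filtering steps in this paragraph, quality filtering on minor allele frequency and missing rate followed by a test for frequency differences between groups, can be sketched as follows. The thresholds (MAF >0.05, missing rate <0.2, FDR <0.001, fold-change >2) come from the text; the choice of Fisher's exact test on allele counts, the Benjamini–Hochberg correction, the pseudocount and the 0/1/2 genotype coding are assumptions made for illustration.

```python
"""Sketch: SV quality filtering and group-wise frequency differences (assumed tests)."""

from math import comb
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # copies of the non-reference SV allele (0, 1, 2); None = missing


def allele_counts(genos: Sequence[Genotype]) -> Tuple[int, int]:
    """Return (non-reference, reference) allele counts over called diploid genotypes."""
    called = [g for g in genos if g is not None]
    alt = sum(called)
    return alt, 2 * len(called) - alt


def passes_filters(genos: Sequence[Genotype], maf_min: float = 0.05, miss_max: float = 0.2) -> bool:
    missing = sum(g is None for g in genos) / len(genos)
    alt, ref = allele_counts(genos)
    if alt + ref == 0:
        return False
    p = alt / (alt + ref)
    return min(p, 1 - p) > maf_min and missing < miss_max


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x: int) -> float:
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (q values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [1.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def differential_svs(group1, group2, fdr=1e-3, min_fold=2.0, pseudo=1e-3) -> List[int]:
    """Indices of SVs whose allele frequency differs between two groups.

    group1[i] and group2[i] are the genotype lists of SV i in each group.
    """
    pvals, folds = [], []
    for g1, g2 in zip(group1, group2):
        a1, r1 = allele_counts(g1)
        a2, r2 = allele_counts(g2)
        pvals.append(fisher_two_sided(a1, r1, a2, r2))
        f1 = a1 / max(a1 + r1, 1) + pseudo
        f2 = a2 / max(a2 + r2, 1) + pseudo
        folds.append(max(f1, f2) / min(f1, f2))
    q = bh_adjust(pvals)
    return [i for i, (qi, fc) in enumerate(zip(q, folds)) if qi < fdr and fc > min_fold]
```

For example, an SV fixed for the insertion in one toy group and absent from the other passes both cut-offs, whereas an SV at equal frequency in both groups does not.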

Evolutionary history and genetic introgression

We further explored introgression events, demographic history and the domestication process. SNP-based and SV-based phylogenetic trees divided the 362 pear accessions into two distinct clades, with clade I comprising most Asian pears and clade II consisting of European pears (Fig. 4a and Supplementary Fig. 27a). Clade I was further subdivided into four subclades: (1) P. bretschneideri and Pyrus pyrifolia; (2) Asian wild pears; (3) wild and cultivated Pyrus ussuriensis; and (4) Pyrus sinkiangensis and interspecific hybrids. Clade II was subdivided into two subclades: (5) P. communis and (6) European wild pears. High fixation index (FST) values and principal component analysis supported a long history of divergence between Asian and European pears (Fig. 4b and Supplementary Fig. 27b,c). Cultivated P. ussuriensis had higher nucleotide diversity (4.80 × 10⁻³) than wild P. ussuriensis (3.26 × 10⁻³). Structure analysis revealed historical hybridization in cultivated pears (Fig. 4a and Supplementary Fig. 27d), and cultivated P. ussuriensis showed two genetic backgrounds, derived from wild P. ussuriensis and from cultivated P. pyrifolia (or P. bretschneideri). TreeMix analysis detected gene flow from P. bretschneideri to P. ussuriensis (Supplementary Fig. 28), suggesting that an introgression event may have played a role in P. ussuriensis domestication.
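The nucleotide diversity (π) and FST values reported above are standard population-genetic summaries. A minimal sketch, assuming per-site allele counts or frequencies are available for each group: π is the mean number of pairwise differences per site, and FST is computed with Hudson's estimator as a ratio of averages across sites. The study's exact estimators and window settings are not stated in this section, so treat these as illustrative choices.

```python
"""Sketch: per-site nucleotide diversity and Hudson's FST from allele counts."""

from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int], n_chrom: int, n_sites: int) -> float:
    """Mean pairwise differences per site; n_sites includes invariant sites."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    # Unbiased heterozygosity at a site with k alternative alleles among n chromosomes.
    pairwise = sum(2 * k * (n_chrom - k) / (n_chrom * (n_chrom - 1)) for k in alt_counts)
    return pairwise / n_sites


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST (ratio of averages) from allele frequencies at shared variant sites.

    n1 and n2 are the numbers of sampled chromosomes in each group.
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Toy window of 10 sites, 4 sampled chromosomes, one variable site.
    print(nucleotide_diversity([2, 0], n_chrom=4, n_sites=10))
    print(hudson_fst([1.0, 0.5], [0.0, 0.5], n1=20, n2=20))
```

Summing the numerator and denominator over sites before dividing, rather than averaging per-site ratios, keeps low-diversity sites from dominating the estimate.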

Fig. 4: Phylogenetic relationship, demographic history and selective sweeps of different pear groups.

a, Phylogenetic tree and population structure of 362 pear accessions, inferred using SNPs at fourfold degenerate sites. K is the number of assumed ancestral populations. b, Mean nucleotide diversity in each group and population differentiation (FST) between groups. Edge width and color represent FST values between two groups. Circle size and value represent the nucleotide diversity of each pear group; circle colors follow those of the phylogenetic tree in a. c, Historical effective population size (Ne) of Asian wild accessions (except wild P. ussuriensis), assuming a generation time of 7.5 years. Orange, blue and yellow shading marks the three bottlenecks. d,e, Genome-wide screens for selective sweeps in AW versus AC pears (including P. pyrifolia and P. bretschneideri) (d) and EW versus EC pears (P. communis) (e). Red dotted lines mark the top 5% thresholds of FST and of the πwild/πcultivar ratio, respectively.

Inference of demographic history revealed three declines in effective population size (Ne) in Asian wild (AW) pears (except wild P. ussuriensis) and European wild (EW) pears (Fig. 4c and Supplementary Fig. 29a). The first decline occurred 2.5–1.7 Ma, during the Gelasian (2.59–1.81 Ma)33, a period characterized by a dramatic temperature decrease. The second decline occurred 1.0–0.6 Ma, during another period of temperature decrease34, and the last decline occurred from 250 thousand years ago (ka) to 160 ka. It was followed by rapid demographic expansion from 150 ka to 10 ka, consistent with increasing temperatures35. Only two Ne declines were observed in wild P. ussuriensis (Supplementary Fig. 29b), the second occurring from 380 ka to 180 ka. Only a slight rebound followed, resulting in a lower Ne in wild P. ussuriensis than in AW and EW pears.
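Effective population sizes and times in demographic plots are rescaled from coalescent units using a mutation rate and the generation time (7.5 years here). The text does not name the inference tool, so the sketch below assumes PSMC-style output, where θ0 is estimated per 100-bp bin, interval times t_k are in units of 2N0 generations and λ_k is the population size relative to N0; the mutation rate and the toy interval values are hypothetical.

```python
"""Sketch: convert PSMC-style scaled output to years and Ne (assumed mutation rate)."""

from typing import List, Tuple


def rescale_psmc(
    theta0: float,
    intervals: List[Tuple[float, float]],
    mu: float,
    generation_time: float = 7.5,
    bin_size: int = 100,
) -> List[Tuple[float, float]]:
    """Return (years before present, Ne) for each (t_k, lambda_k) interval."""
    n0 = theta0 / (4 * mu * bin_size)  # reference effective population size
    return [(2 * n0 * t * generation_time, n0 * lam) for t, lam in intervals]


if __name__ == "__main__":
    # Hypothetical values: theta0 per 100-bp bin and a per-generation mutation rate.
    for years, ne in rescale_psmc(0.4, [(0.1, 2.0), (1.0, 0.5)], mu=1e-8):
        print(f"{years / 1e3:.0f} ka  Ne = {ne:,.0f}")
```

Because both axes scale with 1/μ, the dates of the inferred declines shift proportionally if a different mutation rate is assumed.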

Selection signatures for fruit quality of pear

Cultivated pears, domesticated from wild progenitors, show various phenotypic changes, including in sugar and acid content, stone cell content and fruit size. We identified 21.31 Mb of selective sweep regions for AW versus Asian cultivated (AC; P. pyrifolia and P. bretschneideri) pears and 14.59 Mb for EW versus European cultivated (EC; P. communis) pears (Fig. 4d,e). Within these selective sweeps, 123 selected genes related to fruit size, plant development, stone cell content, acids and sugars were identified (Supplementary Table 18). Notably, a strong selective sweep region on chr. 16 was identified for AW versus AC pears; it contains genes related to fruit acid, size and sugar, including Ma1 (Pbre1_16G081600), fw2.2 (Pbre1_16G040000 and Pbre1_16G040200) and SWEET15 (Pbre1_16G220200) (Fig. 4d). Orthologs of Ma1 and fw2.2 also lie in selective sweeps in cultivated apples36, suggesting that apple and pear may have undergone convergent selection for fruit acidity and size.
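The sweep scans in Fig. 4d,e flag windows that fall in the top 5% of FST and of the πwild/πcultivar ratio. A minimal sketch, assuming window-level statistics are precomputed and that a sweep window must pass both cut-offs; merging adjacent candidate windows into regions gives totals comparable in kind to the 21.31-Mb and 14.59-Mb figures. The window layout and toy values are hypothetical.

```python
"""Sketch: call selective-sweep regions from the top tails of FST and the pi ratio."""

import math
from typing import List, Tuple

# (chrom, start, end, fst, pi_wild, pi_cultivated) for each window
Window = Tuple[str, int, int, float, float, float]
Region = Tuple[str, int, int]


def top_threshold(values: List[float], frac: float) -> float:
    """Smallest value still inside the top `frac` of `values`."""
    k = max(1, math.ceil(len(values) * frac))
    return sorted(values, reverse=True)[k - 1]


def sweep_regions(windows: List[Window], frac: float = 0.05, eps: float = 1e-12) -> List[Region]:
    ratios = [w[4] / max(w[5], eps) for w in windows]  # pi_wild / pi_cultivated
    fst_cut = top_threshold([w[3] for w in windows], frac)
    ratio_cut = top_threshold(ratios, frac)
    # Keep windows in the top tail of BOTH statistics.
    hits = sorted(
        (w[0], w[1], w[2]) for w, r in zip(windows, ratios) if w[3] >= fst_cut and r >= ratio_cut
    )
    # Merge overlapping or adjacent windows on the same chromosome.
    regions: List[Region] = []
    for chrom, start, end in hits:
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            regions[-1] = (chrom, regions[-1][1], max(end, regions[-1][2]))
        else:
            regions.append((chrom, start, end))
    return regions


def total_length(regions: List[Region]) -> int:
    return sum(end - start for _, start, end in regions)


if __name__ == "__main__":
    windows = [
        ("chr16", i * 20_000, (i + 1) * 20_000,
         0.8 if i in (5, 6) else 0.1, 0.004, 0.001 if i in (5, 6) else 0.004)
        for i in range(20)
    ]
    regions = sweep_regions(windows, frac=0.1)
    print(regions, total_length(regions))
```

Requiring both statistics to be extreme is a conservative choice; taking the union of the two tails would flag more, and noisier, windows.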

Stone cell content (SCC) is a crucial trait in pear domestication. Stone cells form during the early stages of fruit development, and their formation is associated with lignin and cellulose formation37. In Asian pear domestication, 11 structural genes related to lignin and cellulose formation were identified in the selective sweeps of AW versus AC pears (Supplementary Table 18). PbrbZIP48 (Pbre1_16G094200), which regulates the expression of stone cell-related genes38, was identified in the selective sweeps of AC pears (Fig. 4d). PbrmiR397a, also identified in a selective sweep, can silence the expression of LAC genes, thereby reducing lignin and stone cell contents in pear flesh39. These results suggest that structural genes, transcription factors and microRNAs (miRNAs) related to stone cell formation may have been selected in parallel during Asian pear domestication to reduce SCC in fruit.

Fruit size is widely considered a domestication trait in crops40,41 and is regulated by two key processes: cell division and cell expansion41. Genes involved in cell division and cell expansion were identified in the selective sweeps and may affect fruit size (Supplementary Table 18). Pbre1_13G194600 encodes CDKG2 (cyclin-dependent kinase G2), a protein that positively regulates cell growth in Arabidopsis42. Pbre1_08G029000 encodes ARF5 (auxin response factor 5); an ARF5 homolog has been shown to increase fruit size in tomato43, suggesting that ARF5 may also influence fruit size in pear. In addition, five missense SNPs with different genotype frequencies between EW and EC pears were identified (Supplementary Fig. 30a).

SVs associated with agronomic traits

Previous studies have suggested that SVs can be causative variants for agronomic traits29. The high-quality set of 83,369 SVs provided an opportunity to identify trait-associated SVs in pear. We performed an SV-based genome-wide association study (GWAS) for fruit transverse diameter (FTD), soluble solid content (SSC), SCC and single fruit weight (SFW) (Supplementary Note 11), which are important traits for pear fruit quality and production. We identified a 66-bp deletion on chr. 2 that was significantly associated with FTD (Fig. 5a). This deletion overlapped with the selective sweep of AC pears (Fig. 5b), and its frequency in P. pyrifolia (Ppy) and P. bretschneideri (Pbr) was lower than in the AW group (Fig. 5c). In addition, pear accessions without the deletion had significantly higher FTD than those with it (Fig. 5d). A candidate gene, PyTPR, encoding a tetratricopeptide repeat-like superfamily protein, was identified downstream of this deletion (Fig. 5e). Genes of this family are involved in plant development, including the elongation of cotton fibers44 and Arabidopsis roots45. Accessions without the deletion had significantly higher PyTPR expression than accessions with it (mean fragments per kilobase of exon per million mapped fragments (FPKM): 497.33 versus 81.85) (Fig. 5f). These results suggest that PyTPR was under selection during Asian pear domestication and may play a role in pear FTD and fruit development.
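The SV-GWAS in Fig. 5a,g reports P values from a two-sided F test against a threshold of 1/n, where n is the number of SVs. The sketch below shows such a test for a single SV under simplifying assumptions: genotypes are coded 0/1/2 and the model is a plain regression of phenotype on genotype, with no population-structure or kinship covariates (which a real pear GWAS would need). The F-distribution tail is computed from the regularized incomplete beta function so that only the standard library is required; the data are toy values.

```python
"""Sketch: single-SV F test and the 1/n significance threshold (no covariates)."""

import math
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd continued-fraction coefficients for step m.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f: float, d1: float, d2: float) -> float:
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def sv_f_test(genotypes: Sequence[float], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Regress phenotype on SV genotype (0/1/2); return (F, P) for the genotype term."""
    n = len(genotypes)
    mg = sum(genotypes) / n
    my = sum(phenotypes) / n
    sxx = sum((g - mg) ** 2 for g in genotypes)
    sxy = sum((g - mg) * (y - my) for g, y in zip(genotypes, phenotypes))
    syy = sum((y - my) ** 2 for y in phenotypes)
    if sxx == 0.0:
        return 0.0, 1.0  # monomorphic SV: nothing to test
    ss_model = sxy ** 2 / sxx
    ss_resid = max(syy - ss_model, 0.0)
    df_resid = n - 2
    if ss_resid == 0.0:
        return math.inf, 0.0  # perfect fit
    f = ss_model / (ss_resid / df_resid)
    return f, f_sf(f, 1, df_resid)


if __name__ == "__main__":
    threshold = 1 / 83_369  # 1/n with the 83,369 filtered SVs
    g = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
    y = [1.0, 1.1, 3.0, 2.9, 5.1, 4.9, 0.9, 3.1, 5.0, 1.0]
    f, p = sv_f_test(g, y)
    print(f"F = {f:.1f}, P = {p:.2e}, significant: {p < threshold}")
```

With one genotype term, this F test is equivalent to a two-sided t test on the regression slope; adding covariates would change the residual degrees of freedom but not the 1/n threshold.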

Fig. 5: SV GWAS of FTD and SSC.

a, Manhattan plot of the SV GWAS results for FTD. P values were computed from a two-sided F test. Red dotted lines indicate the significance threshold (1/n, where n is the number of SVs) for SV-based GWAS. b, Distribution of nucleotide diversity (π) of AW and AC pears (including P. pyrifolia and P. bretschneideri) and FST between AW and AC pears in 20-kb windows. The shaded region indicates the selective sweep of AW versus AC pears. c, Frequency of the deletion in the AW, Ppy and Pbr groups. d, Comparison of FTD between samples with (n = 32) and without (n = 150) the deletion. e, The significantly associated SV downstream of PyTPR. f, PyTPR expression in samples with (n = 6) and without (n = 22) the deletion. g, Manhattan plot of the SV GWAS results for SSC. h, Comparison of SSC between samples with (n = 73) and without (n = 199) the deletion. i, The significantly associated SV downstream of PybZIP6. j, PybZIP6 expression in DS and MRB at five fruit developmental (S1–S5) and three postharvest (S6–S8) stages. k, Frequency of the deletion in the AW, Ppy, Pbr, EW and Pco groups. In d, f and h, P values were computed from two-sided Student’s t-tests, and boxes show the median (center line), first and third quartiles (box limits) and 1.5× the IQR (whiskers).

Similarly, we identified a 1,016-bp deletion associated with SSC (Fig. 5g); accessions with this deletion had significantly higher SSC than those without it (Fig. 5h). Pbre1_15G533000, which encodes a bZIP transcription factor (PybZIP6), was identified as the candidate gene adjacent to this deletion (Fig. 5i). bZIP transcription factors have been reported to be involved in SSC accumulation in pears46,47,48. PybZIP6 was highly expressed during the late fruit developmental stages (S3–S5) in DS, MRB and four other representative pear accessions, suggesting a potential role in SSC accumulation at these stages (Fig. 5j and Supplementary Fig. 32). The deletion was more frequent in the P. communis (Pco) group than in the EW group (Fig. 5k). Overall, our SV-based GWAS provides new markers that should help improve the fruit quality of pear cultivars.

The role of PyACS1 in fruit maturity and softening

SVs can alter gene expression by disrupting nearby regulatory or genic regions49,50. To identify the effects of SVs on gene expression, RNA-seq data were generated from 110 samples (62 Asian pears, 38 European pears and 10 interspecific hybrids) (Supplementary Table 19). In total, 4,040 SVs in the 2-kb upstream region and 568 SVs in exons were significantly correlated with gene expression (Fig. 6a). Of these, 70.79% (2,860 of 4,040) and 65.49% (372 of 568) showed divergent frequencies (FDR < 0.001 and fold-change >2) between the Asian and European groups, suggesting that SVs may contribute to differences in gene expression between the two groups. The associated genes were enriched in metabolic and biosynthetic processes, including ‘fructose and mannose metabolism’ and ‘citrate cycle’ (Supplementary Figs. 34 and 35). Forty SVs were significantly correlated with the expression of 36 sugar-related genes and 12 SVs with that of 10 acid-related genes, including previously reported fruit sugar- and acid-related genes such as PpySTOP1, PpySPS8 and PbVHA-A25,48 (Supplementary Table 20).

Fig. 6: SVs associated with gene expression and functional impact on fruit ripening and softening of European pears.

a, Manhattan plot of SVs in exon or promoter regions significantly associated with gene expression. Sugar- and acid-related genes are marked in red and ethylene biosynthesis genes in purple. b, PyACS1 expression in Asian (n = 62), European (n = 38) and interspecific hybrid (n = 10) pears, where n is the number of accessions (center line, median; box limits, first and third quartiles; whiskers, 1.5× the IQR). c, PyACS1 expression in DS and MRB at eight fruit stages. d, Phenotype of WT tomato fruit and tomato transformants overexpressing PyACS1 (OE-PyACS1). e–g, Breaker stage (nWT = 10, nOE-PyACS1 = 10), ethylene production (nWT = 8, nOE-PyACS1 = 12) and firmness (nWT = 12, nOE-PyACS1 = 12) of WT and OE-PyACS1 tomato fruit (center line, median; box limits, first and third quartiles; whiskers, 1.5× the IQR), where nWT and nOE-PyACS1 are the numbers of fruit used for WT and OE-PyACS1. h, A 286-bp insertion identified in the promoter region of PyACS1 in MRB. i, PCR-based screen showing the presence (left) or absence (right) of the insertion upstream of PyACS1 in European (I–V: ‘Max Red Bartlett’, ‘Red comice’, ‘Spalding’, ‘Charneus’ and ‘Anjou’) and Asian (VI–X: ‘Dangshansuli’, ‘Cuiguan’, ‘Cuiyu’, ‘Yunhong No. 1’ and ‘Yali’) pears. j, Frequency of the insertion in Asian, European and interspecific hybrid pears. k, Dual-luciferase assays showing significant differences in promoter activity between PyACS1proMRB and PyACS1proDS. Data are presented as mean ± s.e.m. l, Relative luciferase activity of three truncated promoters (830 bp, 1,221 bp and 2,000 bp) of PyACS1proMRB. Data are presented as mean ± s.e.m. m, EMSA showing that PybHLH94 binds directly to the G-box and E-box within the insertion in the PyACS1 promoter. Hot-P and Mut-P represent biotin-labeled WT and mutant probes, respectively; competitor-P represents the unlabeled WT sequence. In b, e–g, k and l, P values were computed from two-sided Student’s t-tests. Three biologically independent samples were used in k and l. Full scans of all gels are included in Source data.

Source data

Unlike most Asian pears, European pears are usually classified as climacteric fruit, in which ethylene production drives the initiation and progression of ripening and softening51,52. Five ethylene biosynthesis-related genes with SVs were identified (Supplementary Table 20); one of them, Pbre1_15G535800, encodes 1-aminocyclopropane-1-carboxylic acid synthase (PyACS1), the rate-limiting enzyme in ethylene synthesis53. Transcriptome analysis showed that PyACS1 was significantly more highly expressed in European pears than in Asian pears (Fig. 6b). Interspecific hybrids had higher expression levels than both Asian and European pears, suggesting an overdominant expression pattern in these hybrid accessions54. In MRB, PyACS1 was highly expressed only during the late fruit developmental stages (S4 and S5) and the three postharvest stages (S6–S8), whereas its FPKM value was <0.1 at all eight stages (S1–S8) in DS (Fig. 6c). We further explored the function of PyACS1 during fruit maturation and postharvest stages (S5–S8). Tomato lines overexpressing PyACS1 (OE-PyACS1) showed accelerated ripening (Fig. 6d,e), higher ethylene production (Fig. 6f) and lower firmness (Fig. 6g) than wild-type (WT) fruit, suggesting that PyACS1 promotes fruit ripening and softening.

Furthermore, a 286-bp insertion was identified in the promoter region of PyACS1; it was present in all European pears and interspecific hybrid accessions but absent from Asian pears (Fig. 6h–j). A PCR assay further confirmed that the insertion was present in European pears and absent in Asian pears (Fig. 6i and Supplementary Fig. 36). Sequence analysis showed that the insertion contains one E-box (CATTTG) and one G-box (CACGTT) cis-element (Supplementary Fig. 37a), which are potential binding sites for basic helix–loop–helix (bHLH) transcription factors55,56. We therefore performed weighted gene co-expression network analysis and identified one bHLH transcription factor (PybHLH94) in the PyACS1 co-expression module (Supplementary Fig. 37b). Dual-luciferase (LUC) assays in tobacco protoplasts revealed that PybHLH94 significantly activated PyACS1proMRB (with the 286-bp insertion) but not PyACS1proDS (without the insertion), indicating insertion-dependent transcriptional regulation (Fig. 6k). Promoter truncation showed that activation depended on the −1,221-bp to −830-bp region, which contains the 286-bp insertion (−1,141 bp to −855 bp) (Fig. 6l and Supplementary Fig. 37c). This suggests that PybHLH94 specifically activates the PyACS1 promoter by binding to the 286-bp insertion. An electrophoretic mobility shift assay (EMSA) showed that His-tagged PybHLH94 binds directly to the G-box and E-box in the 286-bp insertion (Fig. 6m).
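The cis-element search on the 286-bp insertion amounts to scanning both strands for the reported motif sequences. A minimal sketch using the E-box (CATTTG) and G-box (CACGTT) sequences given in the text; the promoter fragment is a made-up toy sequence, not the PyACS1 insertion, and a real scan would usually use IUPAC consensus patterns or position weight matrices rather than exact strings.

```python
"""Sketch: exact-match scan for cis-element sequences on both DNA strands."""

from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """0-based start positions (on the + strand) of `motif` on either strand.

    A palindromic motif is reported once per strand.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        pos = seq.find(pattern)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(pattern, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGCATTTGAATTAACGTGCC"  # hypothetical sequence, not the real insertion
    for name, motif in (("E-box", "CATTTG"), ("G-box", "CACGTT")):
        print(name, find_motif(toy, motif))
```

On the toy sequence, the E-box is found at position 2 on the + strand and the G-box at position 12 on the − strand (as its reverse complement, AACGTG).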

Discussion

In this study, we generated T2T haplotype-resolved genome assemblies for DS and MRB, allowing us to investigate all genomic regions and their allelic variations. Based on these haplotype assemblies, we identified haplotype-specific genes and biallelic genes. TEs and DNA methylation have been shown to regulate gene expression57,58, and differences in methylation and TEs between haplotype-specific and biallelic genes may be related to altered gene expression in DS and MRB. In HapDom ASE genes, one allele shows biased expression across the fruit developmental stages of pear, suggesting a dominance effect on heterosis. By contrast, Sub ASE genes show expression biased toward one allele in some samples and toward the other allele in other samples, suggesting an overdominance effect59. As in apple and kiwifruit5,60, HapDom genes outnumbered Sub genes, suggesting that the dominance effect plays a major role during fruit development. Fewer than half of the ASE genes were shared between DS and MRB, and 22.12% of ASE genes retained the same category, suggesting that ASE patterns are not conserved between accessions. Previous studies have linked ASE to genetic and epigenetic variation8,60. Given the long divergence history of DS and MRB, the variants accumulated between them may alter ASE and thereby affect adaptive capacity and plant development59,61.

Previous studies showed that cultivated P. ussuriensis was derived directly from wild P. ussuriensis10. However, we found that cultivated P. bretschneideri (and P. pyrifolia) played an important role in P. ussuriensis domestication. Cultivated P. ussuriensis showed higher nucleotide diversity than wild P. ussuriensis, probably as a result of interspecific introgression6. Wild P. ussuriensis is highly cold tolerant and is mainly distributed in the cold regions of China10. Introgression from cultivated P. bretschneideri may have allowed cultivated P. ussuriensis to gain improved fruit quality while retaining the cold adaptability of wild P. ussuriensis. These findings provide a new perspective on the domestication history of pears and suggest a new strategy for breeding cold-resistant pear varieties.

Fruit sweetness, acidity and SCC are well-known domestication traits and remain the main targets of pear-breeding programs. Using the high-quality T2T reference genome, we found many previously unreported trait-associated genes in selective sweeps. PbrbZIP48 and PbrmiR397a were also identified in the selective sweeps of Asian pears; their products (a transcription factor and a miRNA, respectively) regulate lignin biosynthesis-related genes and affect the SCC of fruit flesh39. These findings suggest that transcription factors, miRNAs and protein-coding genes may all contribute to domestication-associated changes in pear. MiR397 was also selected during indica rice domestication62, suggesting that miR397 was widely selected during plant domestication owing to its conserved function in crop improvement. In addition, we found that Ma1 and fw2.2 were selected during the domestication of both apples and pears. Ma1 affects fruit acidity in cultivated apples, and a G-to-A mutation that truncates the Ma1 protein is responsible for low fruit acidity in apple5. However, this mutation is absent in pears, implying that other mutations may have been selected during pear domestication. These selected or co-selected genes provide useful genetic resources for understanding the domestication process and facilitating pear breeding.

Based on the two high-quality genome assemblies for DS and MRB, we constructed a graph-based genome, used it for SV genotyping with NGS data and identified additional signals with an SV-GWAS approach, including several potential candidate genes for agronomically important traits. Ethylene production is responsible for the initiation and progression of fruit ripening and softening in European pears51,52,63. In this study, we found that PyACS1 is highly expressed in European pears and provided evidence of its role in fruit ripening and softening. We identified a 286-bp insertion in the PyACS1 promoter of European pears that contains two additional cis-elements, which enhance transactivation by PybHLH94. Asian and European pears have a long divergence history and underwent independent domestication events10, and their hybridization may give rise to new germplasm. Overdominant gene expression, in which hybrids show higher expression than both parental lines, has been documented in hybrids of potato64, rice65,66,67 and Arabidopsis spp.68. Similarly, PyACS1 shows higher expression in interspecific hybrids than in Asian and European pears, suggesting that, in addition to the 286-bp insertion, multiple factors in these hybrids may contribute to this overdominant expression pattern. Furthermore, accession-specific mutations may also influence gene expression by altering cis-elements such as ABRE, STRE and the G-box (Supplementary Fig. 38)69,70,71. However, direct evidence for a regulatory effect of the 286-bp insertion in PyACS1 awaits future research, once current technical limitations, such as the lack of a stable transgenic or genome-editing system in pears, have been overcome.

Overall, our results show that a graph-based genome may be a powerful tool for accurate population-level SV genotyping and genomic studies, as well as for applying genes and SVs in genome-guided pear breeding.

In conclusion, we generated two T2T haplotype-resolved pear genomes in this study and provided insights into genome evolution, haplotype imbalance and interspecies variation in Pyrus spp. Combining the high-quality genomic and population resequencing data allowed us to reveal extensive genetic variation between Asian and European pears and to identify trait-associated SVs. These genomic resources will provide a basis for further molecular biology research and accelerate breeding in commercial pears.

Methods

Samples and genome sequencing

The pear cultivars ‘Dangshansuli’ (DS) and ‘Max Red Bartlett’ (MRB) are important commercial cultivars grown in China and Europe, respectively, and were therefore chosen for genome sequencing. We generated PacBio HiFi, Hi-C and ONT ultra-long (100-kb) reads for DS and MRB. To assist gene prediction, RNA-seq data were generated from young stems, mature leaves, young leaves, seeds, pollen, radicles and fruits at the following stages: S1, 35 days after full bloom (d.a.f.b.); S2, 63 d.a.f.b.; S3, 91 d.a.f.b.; S4, 120 d.a.f.b.; S5, 150 d.a.f.b. (135 d.a.f.b. for MRB); and S6–S8, 2, 4 and 8 d after harvest. For population analysis, 362 pear accessions were used, including 137 newly collected accessions and 225 previously released accessions10,72. Fruit pericarp was collected from the 137 newly collected accessions and NGS data were generated. A detailed description of the pear samples used for genome assembly and short-read sequencing is provided in Supplementary Method 1.

T2T genome assembly

Haplotype-resolved assemblies of DS and MRB were generated using hifiasm (v.0.19.0)18 and Verkko (v.1.4.1)19 with PacBio HiFi, ONT ultra-long and Hi-C data, and using HiCanu20 with PacBio HiFi data. Verkko assembly sequences were aligned to the hifiasm assembly for gap filling: where a Verkko sequence bridged a gap in the hifiasm assembly, it was merged into the genome sequence. HiCanu assembly sequences were used to fill telomeric regions. ALLHIC (v.0.9.8) was used to anchor the contigs onto chromosomes with Hi-C data73 and the resulting assembly was manually corrected. To check the quality of gap filling, PacBio HiFi and ONT reads were mapped to each haplotype using Minimap2 (v.2.23)74. To evaluate assembly completeness and quality, NGS reads were mapped to each haplotype using the Burrows–Wheeler Aligner (BWA, v.0.7.17)75, and SAMtools (v.1.14)76 was used to calculate genome coverage and mapping rate. The LTR Assembly Index was calculated using LTR_retriever77. Genome completeness was evaluated with BUSCO (v.5.2.2) using the embryophyta_odb10 database78. Base-level accuracy was evaluated by calculating the consensus quality value (QV) using Merqury (v.1.3)79. Haplotype switch error rates were estimated with the calc_switchErr pipeline (v.1.0)6 (Supplementary Fig. 39 and Supplementary Method 2). Hi-C data were mapped to each haplotype of DS and MRB using HiC-Pro (v.2.7.0)80, and the output matrix files were visualized using HiCPlotter (v.0.8.1)81.
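Merqury derives QV from k-mers that occur in the assembly but not in the read set. The sketch below follows the published Merqury formulation, E = 1 − (1 − K_asm/K_total)^(1/k) and QV = −10 log10(E); the function name is hypothetical, k = 21 is an assumed default (Merqury users choose k per genome size) and the counts are invented.

```python
import math


def merqury_qv(asm_only_kmers: int, total_asm_kmers: int, k: int = 21) -> float:
    """Consensus QV from k-mer counts (Merqury-style formulation).

    asm_only_kmers: k-mers in the assembly that are absent from the reads
    total_asm_kmers: all k-mers in the assembly
    k: k-mer size (21 is an assumption for this sketch)
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so the error estimate is zero
    frac = asm_only_kmers / total_asm_kmers
    # per-base error E = 1 - (1 - frac)^(1/k), computed stably with log1p/expm1
    error_rate = -math.expm1(math.log1p(-frac) / k)
    return -10 * math.log10(error_rate)


# Hypothetical counts: 21 unsupported k-mers among 1e9 assembly k-mers.
print(round(merqury_qv(21, 10**9), 2))  # 90.0
```

Using `log1p`/`expm1` avoids the catastrophic cancellation that `1 - (1 - frac) ** (1 / k)` suffers when `frac` is tiny, which is the usual case for high-accuracy assemblies.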

Repeated sequences and gene annotation

TEs in the two haplotypes of DS and MRB were annotated by combining de novo prediction and homology searches. First, RepeatScout (v.1.0.6)82, Piler (v.1.0)83, LTR_FINDER (v.1.07)84 and RepeatModeler (v.2.0.4)85 were used to build a de novo repeat library, and RepeatMasker (v.4.1.2)86 was used to annotate repeat regions based on this library. Tandem Repeats Finder (v.4.09)87 was used to identify tandem repeats. Finally, RepeatMasker was used to retrieve the divergent TEs.
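Because repeat annotations pooled from several tools overlap, repeat content is usually summarized only after merging overlapping intervals so that no base is counted twice. The following is a minimal sketch of that merging step under the assumption of half-open (start, end) coordinates; the function names and records are hypothetical and do not reproduce the pipeline's actual output format.

```python
from collections import defaultdict


def merged_length(intervals):
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # gap before this interval: close the current run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlapping or abutting interval: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_bases(annotations):
    """Total repeat-annotated bases from (chrom, start, end) records pooled across tools."""
    by_chrom = defaultdict(list)
    for chrom, start, end in annotations:
        by_chrom[chrom].append((start, end))
    return sum(merged_length(iv) for iv in by_chrom.values())


# Hypothetical records from two tools that partly overlap on chr1.
records = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 0, 10)]
print(repeat_bases(records))  # 210
```

Dividing the merged total by the assembly length gives the repeat fraction of a haplotype.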

Protein-coding gene structures were predicted by combining ab initio prediction, RNA-seq-based prediction and homology-based prediction. Protein sequences from five species (Arabidopsis thaliana, P. communis, Malus domestica, Prunus persica and Pyrus betulifolia) were downloaded, and TBLASTN (v.2.2.26)88 searches followed by GeneWise (v.2.4.1)89 were used to predict gene structures within the genomic region of each BLAST hit. Augustus (v.2.5.5)90, GlimmerHMM (v.3.0.1)91 and Genscan (v.1.0)92 were used for ab initio gene prediction. RNA-seq data were aligned to the haplotype sequences using HISAT2 (v.2.0.4)93 and the alignments were used for genome-guided transcript assembly with Cufflinks (v.2.1.1)94. De novo transcriptome assemblies were generated using Trinity (v.2.1.1)95. EVidenceModeler (v.2.0.0) was used to integrate these predictions into consensus gene models, and PASA (v.2.5.3) was used to add UTR annotations96. Protein-coding genes were functionally annotated with eggNOG-mapper (v.2.1.10)97.

Collinearity analysis, Ka and Ks calculations

Paralogous and orthologous gene pairs in syntenic blocks were identified using JCVI (v.1.4.18)98 with the parameter ‘--cscore=0.99’. The protein sequences of each gene pair were aligned using MAFFT (v.7.49)99 and converted into codon-based CDS alignments using ParaAT (v.2.0)100. The number of nonsynonymous substitutions per nonsynonymous site (Ka), the number of synonymous substitutions per synonymous site (Ks) and the Ka/Ks ratio were calculated for each pair using the Nei–Gojobori (NG) method implemented in PAML (v.4.9b)101.
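The final step of a Nei–Gojobori-style calculation converts the observed proportions of nonsynonymous and synonymous differences into distances with the Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3). The sketch below shows only that step and assumes the site and difference counts have already been obtained (the codon-by-codon counting that PAML performs is omitted); the function names and counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion of differences p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturated)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ka, Ks and Ka/Ks from precomputed difference and site counts (assumed inputs)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    ratio = ka / ks if ks > 0 else math.inf  # Ka/Ks is undefined when Ks is zero
    return ka, ks, ratio


# Hypothetical pair: 5 differences over 300 nonsynonymous sites, 20 over 100 synonymous sites.
ka, ks, omega = ka_ks(5, 300, 20, 100)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={omega:.3f}")
```

A Ka/Ks ratio well below 1, as in this toy pair, is the usual signature of purifying selection on a gene pair.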

Gene family analysis and phylogenetic reconstruction

The protein sequences from 12 species, including P. bretschneideri (DS), P. communis (MRB), Oryza sativa, A. thaliana, Solanum lycopersicum, Prunus avium, P. persica, Prunus mume, M. domestica, Fragaria vesca, Rosa chinensis and Citrus sinensis were collected (http://www.phytozome.net). OrthoMCL (v.6.18)102 was used to identify the gene families and MUSCLE (v.5.2)103 to perform multiple sequence alignments. A maximum-likelihood tree was constructed using RAxML (v.1.1.0)104 with the PROTGAMMAAUTO model. The MCMCTree program in PAML101 was applied to estimate divergence times with five calibration values chosen from the Time Tree website (http://www.timetree.org). Expansions and contractions of orthologous gene families in the 12 species were determined using CAFE (v.2.2)105.

SNP and indel calling and SV analysis

MUMmer4 (v.4.0.0)106 was used to align the haplotypes, and delta-filter was used to filter the raw alignments with the parameters ‘-m -i 90 -l 100’. The filtered alignments were used to detect SNPs and small insertions and deletions (indels; ≤50 bp) with the SyRI (v.1.6.3) pipeline using default parameters107. Minigraph (v.0.20-r559)108 was used to integrate the haplotypes (HapA and HapB of DS or MRB, and HapA of DS and MRB) into graphs with the option ‘-xggs’. gfatools (v.0.4-r214)108 was used to derive SVs from each graph with the bubble-popping algorithm, and the position, reference and nonreference paths and path lengths of each bubble were extracted. The SVs were further classified as insertions or deletions based on the lengths of the reference and nonreference paths: a variant was considered a deletion if the reference path was longer than the nonreference path and an insertion if it was shorter. Each haplotype sequence was mapped to the graph-based genome for SV genotyping in DS and MRB using Minigraph (v.0.20-r559)108. Inversions were identified using the SyRI pipeline with default parameters107. For SV validation, 100 SVs ranging from 51 bp to 11.24 kb and 20 SVs ranging from 22.40 kb to 181.19 kb were randomly selected. ONT reads were aligned to DS HapA using Minimap2 (v.2.23)74, and the alignments at the selected SVs were manually inspected using the Integrative Genomics Viewer (v.2.15.2)109.
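The insertion/deletion rule above compares the two path lengths of each bubble. The sketch below assumes the path lengths have already been parsed from the bubble output; the function name is hypothetical, and using the 50-bp indel limit on the length difference to separate small indels from SVs is an assumption of this example.

```python
def classify_bubble(ref_len: int, alt_len: int, max_small: int = 50):
    """Classify a two-path bubble relative to the reference path.

    Returns (label, size). Size is the path-length difference; differences of at most
    max_small bp are labeled 'small' (indel range), longer ones 'SV' (assumed cut-off).
    """
    diff = ref_len - alt_len
    if diff == 0:
        return "equal-length", 0
    kind = "deletion" if diff > 0 else "insertion"  # longer reference path => deletion
    size = abs(diff)
    return ("small " if size <= max_small else "SV ") + kind, size


print(classify_bubble(300, 100))  # ('SV deletion', 200)
print(classify_bubble(100, 120))  # ('small insertion', 20)
```

Equal-length bubbles (for example, substitutions or inversions) are not indels, which is consistent with inversions being called separately with SyRI.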

Analysis of allele-specific gene expression

Allelic gene pairs between haplotypes were identified following methods previously used for sugarcane110 and tea6. Alleles were identified using two strategies: (1) a synteny-based approach and (2) a coordinate-based approach. First, JCVI was used to identify syntenic regions with the parameter ‘--cscore=0.99’, and all paired genes were considered alleles A and B (biallelic genes). Then, gene models not present in syntenic blocks were mapped to genes in the corresponding haplotype using BLAST (v.2.2.26)88 and GMAP (v.2021-08-25)111. Two genes were considered potential alleles if they had >50% overlap in coordinates, >80% sequence identity and an alignment length >80% of the gene sequence. Gene models with identical CDSs were considered a single allele. Alleles involving tandem and proximal duplicated genes were identified with the Tandem module in JCVI using ‘--tandem_Nmax 10 --percent_overlap 80’. The remaining unpaired genes were considered haplotype-specific genes and were mapped to the corresponding haplotype using GMAP (v.2021-08-25)111. Based on the mapping results, haplotype-specific genes were further classified into three categories: genes with start codon loss or stop codon gain; genes truly lost; and genes with high divergence (alignment length <80% of gene length or identity <80%). Biallelic genes in one-to-one syntenic blocks were used for the subsequent allele-specific expression analysis.
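The coordinate-based thresholds can be written as simple predicates. In this sketch the function names are hypothetical, identity is expressed as a fraction between 0 and 1, coordinates are 1-based and inclusive, and measuring the >50% overlap relative to the shorter gene is an assumption, since the text does not say which gene the overlap is normalized to.

```python
def coordinate_overlap(a_start, a_end, b_start, b_end):
    """Overlap as a fraction of the shorter gene (assumed normalization)."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    shorter = min(a_end - a_start + 1, b_end - b_start + 1)
    return overlap / shorter


def is_potential_allele(overlap_frac, identity, aln_len, gene_len):
    """Thresholds from the text: >50% overlap, >80% identity, alignment >80% of gene length."""
    return overlap_frac > 0.5 and identity > 0.8 and aln_len / gene_len > 0.8


def is_high_divergence(identity, aln_len, gene_len):
    """'High divergence' class: alignment <80% of gene length or identity <80%."""
    return aln_len / gene_len < 0.8 or identity < 0.8


frac = coordinate_overlap(1, 100, 41, 200)    # 60 shared bases / 100-bp shorter gene
print(frac, is_potential_allele(frac, 0.95, 900, 1000))  # 0.6 True
```

Note that the thresholds are strict inequalities, so a gene pair with exactly 50% overlap does not qualify.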

RNA-seq data of DS and MRB fruit at eight stages (S1–S8) were used for allele-specific expression (ASE) analysis. RNA-seq reads were trimmed using Trimmomatic (v.0.39)112 and mapped against the allele-aware annotated gene models using STAR (v.2.7.9)113 with the parameters ‘--alignIntronMax 20000 --alignMatesGapMax 25000 --outFilterMultimapNmax 1’. StringTie (v.2.1.7)114 was used to estimate FPKM values from uniquely mapped reads. ASE was determined using DESeq2 (ref. 115) with a fold change >2 and adjusted P < 0.05. Following a previous study116, ASE genes were further divided into three categories: (1) haplotype dominance pairs (Hap), in which one allele was more highly expressed than its partner in at least one-third of the tissues and not significantly different in the other tissues; (2) sub- or neofunctionalized pairs (Sub), in which each of the two alleles showed significantly higher expression in at least one tissue; and (3) the remaining pairs, classified as no-difference pairs (NoDiff). Methylation analysis followed a previous study117: Bismark (v.2.2.5)118 was used to analyze whole-genome bisulfite sequencing data from leaves and fruit of DS and MRB, and deepTools (v.3.5.1)119 was used to evaluate methylation levels across the 5-kb upstream region, the gene body and the 5-kb downstream region.
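The three-way categorization depends only on which allele, if any, is significantly higher in each tissue or stage. The sketch below encodes the rules stated above; the function names are hypothetical, log2 fold change is assumed to be log2(allele A / allele B) from a DESeq2-style comparison, and each tissue's call is treated as independent.

```python
def allele_call(log2_fc: float, padj: float):
    """Per-tissue call: 'A', 'B' or None, using >2-fold (|log2FC| > 1) and padj < 0.05."""
    if padj < 0.05 and log2_fc > 1:
        return "A"
    if padj < 0.05 and log2_fc < -1:
        return "B"
    return None


def classify_pair(calls):
    """Assign Hap, Sub or NoDiff from per-tissue calls ('A', 'B' or None)."""
    n = len(calls)
    a = sum(c == "A" for c in calls)
    b = sum(c == "B" for c in calls)
    if a and b:
        return "Sub"  # each allele significantly higher in at least one tissue
    if max(a, b) > 0 and 3 * max(a, b) >= n:
        return "Hap"  # one allele higher in >= 1/3 of tissues, never the reverse
    return "NoDiff"


# Hypothetical eight-stage profile: allele A biased at three stages, no difference elsewhere.
stages = [allele_call(2.0, 0.01)] * 3 + [None] * 5
print(classify_pair(stages))  # Hap
```

With eight stages, one-third corresponds to at least three stages, so a pair biased at only two stages falls into NoDiff under these rules.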

Graph-based genome construction and SV genotyping

To construct the graph-based genome, the two haplotypes of DS and MRB were used for graph generation with Minigraph (v.0.20-r559)108, with DS HapA as the base linear genome (Supplementary Fig. 40). The resulting Graphical Fragment Assembly file was used to construct a graph-based genome with the vg toolkit (v.1.46.0)120. We further evaluated the performance of SV genotyping from NGS data with the graph-based genome as reference (Supplementary Method 4). To compare SV genotype frequencies between Asian and European pears, short reads from the 362 pear accessions were mapped to the graph-based genome using the same graph-based SV genotyping pipeline, and SVs with a missing data rate <0.2 and minor allele frequency >0.05 were retained. SV genotype frequencies of the Asian and European pear populations were calculated using VCFtools (v.0.1.16)121, with homozygous SV alleles counted twice and heterozygous alleles counted once.
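The counting rule and filters above can be expressed directly on per-sample genotypes. This is a minimal sketch, not the VCFtools implementation: the function names are hypothetical and genotypes are assumed to be encoded as alternative-allele counts (0, 1 or 2) with `None` for missing calls.

```python
def sv_allele_frequency(genotypes):
    """Alt-allele frequency and missing rate; genotypes are 0/1/2 alt-allele counts or None."""
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    # homozygous alt (2) contributes two alleles, heterozygous (1) contributes one
    freq = sum(called) / (2 * len(called)) if called else float("nan")
    return freq, missing_rate


def passes_filters(genotypes, max_missing=0.2, min_maf=0.05):
    """Keep SVs with missing rate < 0.2 and minor allele frequency > 0.05."""
    freq, missing = sv_allele_frequency(genotypes)
    return missing < max_missing and min(freq, 1 - freq) > min_maf


print(sv_allele_frequency([0, 1, 2, 2, None]))  # (0.625, 0.2)
print(passes_filters([0, 1, 2, 2, 0]))          # True
```

Computing the missing rate as a count ratio, rather than as one minus the called fraction, avoids floating-point artifacts at the 0.2 boundary.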

SNP calling

To capture more SNPs, we constructed a linear pangenome from the four haplotype sequences using the psvcp (v.1.01) pipeline122, adding 47.82 Mb of nonreference sequence to the reference genome (DS HapA). Raw sequencing reads were trimmed using Trimmomatic (v.0.39)112 and clean reads were aligned to the reference genome sequence using BWA (v.0.7.17)75. Alignments were sorted and duplicate reads were marked using Picard (https://broadinstitute.github.io/picard). SNPs were called with the HaplotypeCaller method of the Genome Analysis Toolkit (v.4.2.3)123 and filtered using VCFtools (v.0.1.16)121 with the parameters ‘--maf 0.05 --max-missing 0.9’, so that SNPs with minor allele frequency >0.05 and missing data rate <0.1 were retained for further analysis.

Phylogenetic analysis and population structure study

A total of 273,985 SNPs at fourfold degenerate sites (4D SNPs) were selected for phylogenetic and population structure analyses. A maximum-likelihood tree was constructed using IQ-TREE (v.2.2.0)124 with 1,000 bootstrap replicates, and the best substitution model, ‘PMB + F + R10’, was selected with the ModelFinder function. Admixture (v.1.3.0)125 was used for population structure analysis and principal component analysis was performed with PLINK (v.1.90)126. Genome-wide selective sweeps were identified between AW and AC pears (including P. pyrifolia and P. bretschneideri) and between EW and EC pears (P. communis). Using VCFtools (v.0.1.16)121, we calculated pairwise FST (wild versus cultivated populations) and nucleotide diversity (π) in 20-kb sliding windows with a 10-kb step. Windows ranking in the top 5% for both FST and π were classified as selective sweeps.
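The window layout and the top-5% intersection can be sketched as follows. The function names are hypothetical; the text does not say whether the π statistic is raw π or a wild/cultivated π ratio, so the sketch simply ranks whatever per-window value is supplied, and it assumes that larger values are more extreme for both statistics. Ties at the cutoff are all retained.

```python
def windows(chrom_len, size=20_000, step=10_000):
    """Half-open sliding windows (start, end) covering a chromosome."""
    start = 0
    while start < chrom_len:
        end = min(start + size, chrom_len)
        yield start, end
        if end == chrom_len:
            break
        start += step


def top_indices(values, frac=0.05):
    """Indices of windows at or above the value ranked round(n * frac) from the top."""
    k = max(1, round(len(values) * frac))
    cutoff = sorted(values, reverse=True)[k - 1]
    return {i for i, v in enumerate(values) if v >= cutoff}


def sweep_windows(fst, pi_stat, frac=0.05):
    """Windows in the top fraction for both statistics (intersection of the two sets)."""
    return sorted(top_indices(fst, frac) & top_indices(pi_stat, frac))


print(list(windows(45_000)))  # four overlapping 20-kb windows, last one truncated
fst = [float(i) for i in range(100)]   # toy per-window FST
pi_stat = list(fst)
pi_stat[97] = 0.0                      # window 97 fails the π criterion
print(sweep_windows(fst, pi_stat))     # [95, 96, 98, 99]
```

Requiring both statistics to be extreme reduces false positives compared with using either one alone.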

Ancestral state estimation and inference of demographic history

To obtain ancestral states, genomic resequencing data of the outgroup species hawthorn, apple and loquat were aligned against the HapA assembly of DS using BWA (v.0.7.17)75, and ancestral alleles were called using ANGSD (v.0.940)127 with the parameters ‘-doFasta 2 -doCounts 1’. Demographic history was inferred using the stairway plot method128,129: BAM files for each pear accession were filtered and then used to calculate the site-frequency spectrum with ANGSD127 using the options ‘-doSaf 1 -minMapQ 1 -minQ 20’. The stairway plot analysis was run with 200 bootstraps, a mutation rate of 3.9 × 10⁻⁹ substitutions per site per year and a generation time of 7.5 years.
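The per-year mutation rate and the generation time combine into a per-generation rate, which is the form in which such rates are typically supplied to demographic inference (an assumption about the run configuration, which the text does not show). The helper names below are hypothetical; the two constants are the values given above, and θ = 4Neμ is the standard relation used to scale θ to an effective population size.

```python
MU_PER_YEAR = 3.9e-9      # substitutions per site per year (value given in the text)
GENERATION_YEARS = 7.5    # years per generation (value given in the text)

# per-generation rate: 3.9e-9 * 7.5, about 2.925e-8 per site per generation
MU_PER_GENERATION = MU_PER_YEAR * GENERATION_YEARS


def generations_to_years(generations: float) -> float:
    """Convert a time in generations to years."""
    return generations * GENERATION_YEARS


def theta_to_ne(theta_per_site: float) -> float:
    """Effective population size from theta = 4 * Ne * mu (mu per site per generation)."""
    return theta_per_site / (4 * MU_PER_GENERATION)


print(f"{MU_PER_GENERATION:.4e}")  # 2.9250e-08
print(generations_to_years(1000))  # 7500.0
```

With these values, each generation step on a stairway plot's time axis corresponds to 7.5 years.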

SV GWAS

We performed GWASs for SSC, FTD, SCC and SFW in 362 accessions using 83,369 high-quality SV markers with GAPIT (v.3.4)130, including the first three principal components and the kinship matrix as covariates. The genome-wide significance threshold (1.20 × 10⁻⁵) was determined using a uniform threshold of 1/n, where n is the number of SV markers tested. RNA-seq data from four pear accessions at five fruit developmental stages (S1–S5) were downloaded to evaluate PybZIP6 expression131, and RNA-seq data of samples with SCC were downloaded to evaluate PyC3H18 expression132. STAR (v.2.7.9)113 and StringTie (v.2.1.7)114 were used to align the reads and quantify gene expression with the same parameters as above (DS HapA as reference).
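As a quick arithmetic check, taking n as the 83,369 SV markers used in the GWAS reproduces the stated threshold. This is a sketch under that assumption; the constant and function names are hypothetical.

```python
import math

N_SV_MARKERS = 83_369  # assumption: n taken as the number of SV markers in the GWAS


def uniform_threshold(n: int) -> float:
    """Genome-wide significance threshold 1/n."""
    return 1.0 / n


threshold = uniform_threshold(N_SV_MARKERS)
print(f"{threshold:.2e}")                # 1.20e-05
print(round(-math.log10(threshold), 2))  # 4.92
```

The −log10 form (about 4.92) is the value a Manhattan plot would show as the significance line.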

SV association with gene expression

A total of 110 samples, including 62 Asian pears and 38 European pears (Supplementary Table 19), were collected and RNA-seq data were generated to assess the effect of SVs on gene expression. Reads were aligned with STAR (v.2.7.9)113 and FPKM values were calculated with StringTie (v.2.1.7)114 using the same parameters as above (DS HapA as reference). Then, 21,581 genes with a mean FPKM > 1 were retained for further analysis. The PanMarker program50 was used to identify SVs significantly associated with gene expression. SVs that overlapped with CDSs or the 2-kb upstream region were selected. All samples were classified into three groups, with groups 0, 1 and 2 representing the reference (Ref), heterozygous (Het) and alternative (Alt) genotypes, respectively. Grubbs’ test was used to remove outliers (P < 0.05), and Levene’s test was used to assess the equality of variances between two groups (P < 0.05). The two-sided Student’s t-test was then used to determine whether the means of groups 0 and 2 differed significantly (P < 0.05), and a locus was retained if the non-outlier values of one group were strictly larger than those of the other. We also compared group 1 with group 0 (or group 2) to evaluate the effect of heterozygous SVs on gene expression.
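The grouping and the "strictly larger" retention rule can be sketched in a few lines. This is not the PanMarker implementation: the function names are hypothetical, genotypes are assumed to be biallelic VCF-style GT strings, and Grubbs’ outlier removal, Levene’s test and p-values are left to a statistics library (for example, `scipy.stats.levene` and `scipy.stats.ttest_ind`); only the pooled-variance Student’s t statistic is computed here.

```python
import math
from statistics import mean, variance


def genotype_group(gt: str):
    """Map a biallelic GT string to group 0 (Ref), 1 (Het) or 2 (Alt); None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(int(a) != 0 for a in alleles)


def strictly_separated(x, y):
    """True if every value in one group exceeds every value in the other."""
    return min(x) > max(y) or min(y) > max(x)


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


# Hypothetical FPKM values for group 0 (Ref) and group 2 (Alt) after outlier removal.
ref, alt = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(strictly_separated(ref, alt), round(student_t(ref, alt), 3))  # True -3.674
```

The separation check is stricter than a significant t-test alone: it rejects loci where the group means differ but individual values overlap.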

RNA extraction, gene cloning and RT–qPCR assays

Total RNA was isolated using the Plant Total RNA Isolation Kit Plus (Foregene) following the manufacturer’s instructions. First-strand complementary DNA was synthesized from 2 μg of total RNA using the TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix (TransGen), according to the manufacturer’s protocol. All oligonucleotide primers used are listed in Supplementary Table 21. Amplification for gene cloning was performed as follows: an initial denaturation step at 95 °C for 3 min, followed by 35 cycles of 95 °C for 15 s, 58 °C for 15 s and 72 °C for 90 s, with a final elongation step at 72 °C for 5 min.

Dual-luciferase reporter assay

The 2-kb promoter regions (5ʹ upstream of the ATG start codon) of PyACS1 from DS and MRB were cloned into the pGreen II 0800-LUC binary vector as reporters. The PybHLH94 CDS was amplified by PCR using gene-specific primers and inserted into the pCAMBIA1300-35S-EGFP vector by homologous recombination. The recombinant pCAMBIA1300-35S-EGFP and pGreen II 0800-LUC plasmids were transformed into A. tumefaciens strain GV3101. GV3101 cells carrying these vectors or the control vector were resuspended in infiltration solution containing 10 mmol of 2-(N-morpholino)ethanesulfonic acid hydrate, 10 mmol of MgCl2 and 0.2 μmol of acetosyringone to an optical density at 600 nm of 1.0, and then incubated at 25 °C with shaking at 70 rpm for 4 h before infiltration. Agrobacterium cultures carrying the pGreen II 0800-LUC and pCAMBIA1300-35S-EGFP recombinant plasmids were mixed at a 1:9 ratio and infiltrated into the leaves of 4-week-old Nicotiana benthamiana plants for transient expression. The infiltrated plants were kept in the dark for 12 h and then grown under normal conditions for 48 h, after which firefly luciferase (LUC) and Renilla luciferase (REN) activities were assayed using dual-luciferase assay reagents (Promega). Luminescence was measured using a microplate reader (SpectraMax Paradigm).
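Dual-luciferase readouts are commonly summarized as the LUC/REN ratio, with REN serving as an internal control for transformation efficiency, and then expressed relative to the empty-effector control. The normalization scheme, the function names and the readings below are assumptions for illustration; the text does not state how the ratios were normalized.

```python
from statistics import mean


def luc_ren_ratios(luc, ren):
    """Per-replicate firefly/Renilla luminescence ratios."""
    return [l / r for l, r in zip(luc, ren)]


def relative_activity(test_luc, test_ren, ctrl_luc, ctrl_ren):
    """Mean LUC/REN with the effector divided by mean LUC/REN with the control vector."""
    return mean(luc_ren_ratios(test_luc, test_ren)) / mean(luc_ren_ratios(ctrl_luc, ctrl_ren))


# Hypothetical readings: PybHLH94 + promoter reporter versus empty vector + same reporter.
print(relative_activity([200, 220], [100, 110], [50, 55], [100, 110]))  # 4.0
```

A relative activity near 1 would indicate no transactivation, which is the pattern reported above for the promoter lacking the insertion.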

Electromobility shift assay analysis

The PybHLH94 CDS was cloned into the pET-32a(+) vector to generate a His fusion protein construct (PybHLH94–His), and the fusion protein was purified using HIS Sefinose resin (Sangon Biotech). Promoter sequences containing the putative cis-element-binding sites were labeled at the 5ʹ end using a biotin DNA-labeling kit (Beyotime Biotechnology) according to the manufacturer’s instructions. Mutated probes and unlabeled probes were used for competition. EMSA was performed using a chemiluminescent EMSA kit (Beyotime Biotechnology).

Construction of transgenic tomato lines

For stable transformation, the full-length CDS of PyACS1 from MRB was inserted into the pBI101 vector by homologous recombination, and the recombinant plasmid was transformed into A. tumefaciens strain GV3101 using the freeze–thaw method. Transgenic tomato (‘Micro-Tom’) plants were obtained using a modified version of the method described in ref. 133 (Supplementary Method 8). Fruit firmness was measured using a Brookfield CT3 Texture Analyzer following standardized protocols. Ethylene production was measured using a closed-chamber method (Supplementary Method 9).

Statistical analysis

The statistical tests used are described throughout the article and in the figure legends. The χ2 test was performed using the chisq.test function in R, and the two-sided Student’s t-test was also performed in R. χ2 tests with false discovery rate (FDR) correction were used to compare SV occurrence frequencies between Asian and European pears.
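For a single SV, the comparison reduces to a 2 × 2 table (population by SV presence), and the per-SV P values are then FDR-adjusted. The sketch below assumes Benjamini–Hochberg as the FDR method (equivalent to R’s `p.adjust(method = "BH")`), which the text does not specify. It also omits Yates’ continuity correction, which R’s `chisq.test` applies to 2 × 2 tables by default, so its P values are slightly smaller than R’s defaults. The function names and counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no Yates correction) and df = 1 P value for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square with 1 df
    return stat, p


def benjamini_hochberg(pvalues):
    """BH-adjusted P values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: rows = Asian/European, columns = SV present/absent.
stat, p = chi2_2x2(10, 20, 30, 40)
print(round(stat, 4), round(p, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The running minimum enforces monotonicity, so an SV with a smaller raw P value never receives a larger adjusted value than one with a larger raw P value.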

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.