Main

Plant diseases increasingly challenge modern agriculture and food security. Long-term selective breeding reduces the genetic diversity of crops, thus decreasing their ability to adapt to changes in the environment and to establish effective disease resistance1,2,3. De novo domestication of wild relatives has great potential for resolving this problem4 and has recently been implemented in several annual crops, such as rice (Oryza sativa)5, potato (Solanum tuberosum)6 and tomato (Solanum lycopersicum)7. Similar strategies have not yet been deployed in woody perennial crops due to the long juvenile period of woody plants and the unknown genetic background of their wild relatives. Citrus is one of the most important fruit crops worldwide8. However, the citrus industry has neglected the disease-resistance traits available from wild citrus because fruit flavor has been the pre-eminent breeding goal during the past decades.

Sweet orange (Citrus sinensis) is one of the most important fruit crops worldwide9. Most sweet orange cultivars are derived from somatic mutants and thus their background is monogenetic. Consequently, sweet orange is sensitive to disease threats. For example, all sweet oranges are sensitive to citrus canker, which is one of the most devastating diseases in the citrus industry, caused by the bacterium Xanthomonas citri subsp. citri (Xcc)10,11. Large amounts of bactericide and copper-related chemicals are used annually to control this disease, but these solutions increase costs, become less efficacious over time and lead to ecological problems. We therefore have a critical need for the de novo domestication of canker-resistant cultivars to counteract these problems.

Plants produce a large number of specialized metabolites to defend themselves against pathogens12. Specialized metabolites associated with defense play a variety of physiological and ecological roles13. Many specialized metabolites have been reported to have antimicrobial activity and are associated with plant disease resistance14,15,16,17,18. Citrus produce a large variety of specialized metabolites such as flavonoids and coumarins19,20,21 that contribute to defense responses14,22. Our previous work indicated that both decreases in disease resistance and specialized metabolite content occurred during the domestication of citrus23. A fast and promising de novo domestication strategy for breeding new disease-resistant cultivars of citrus involves focusing on specialized metabolites from wild citrus that contribute to disease resistance.

Elucidating the origin of sweet orange is the first step toward the de novo domestication of disease-resistant sweet orange. However, the origin of sweet orange is highly debated. The first genome analysis of sweet orange proposed that it originated from two rounds of interspecific hybridizations between mandarin and pummelo24. Another study that utilized more Citrus species indicated that the above model cannot explain some genomic regions and proposed a more complex model with several rounds of hybridization25. Importantly, both models are limited because they rely on only one haplotype genome to infer evolutionary history, although sweet orange is highly heterozygous.

Results

Diversity of sour orange in southern China

The region in southern China containing the Nanling Mountains, Yuan River and Wuling Mountains (NYW) is rich in wild and cultivated citrus resources, including sweet orange (Citrus sinensis), sour orange (Citrus aurantium), mandarin (Citrus reticulata) and pummelo (Citrus maxima) (Fig. 1a,b and Extended Data Fig. 1). Diverse sweet oranges, including some varieties with extremely high acidity, are found in the NYW region26. Sour oranges were even more diverse relative to other citrus species, including variations in fruit acidity, seed numbers, fruit size and color (Fig. 1b and Extended Data Figs. 1 and 2).

Fig. 1: Sympatric region and genetic diversity of Citrus species from southern China.

a, The geographical distribution of sweet oranges, sour oranges, mandarins and pummelos. The geographical map was generated manually using the freeform shape function of Microsoft PowerPoint software. b, Phenotypic variation among different sweet oranges, sour oranges, mandarins and pummelos in the sympatric region. c, PCA of 35 sweet oranges, 99 sour oranges, 11 mandarins and 16 pummelos. The red five-pointed star represents sweet oranges, orange circles represent sour oranges, red circles represent mandarins and yellowish green circles represent pummelos. d, The π values were calculated with 1 Mb sliding windows with 500 kb steps for sour oranges, mandarins and pummelos. The number of genomic statistical windows is 545. Each box represents the median and interquartile range. The error bars represent s.e.m. e, Coefficients of metabolite variation for sour oranges, mandarins and pummelos.

We collected and sequenced 118 accessions of sour oranges and landrace citrus resources and another 76 cultivated citrus varieties using an Illumina platform (Supplementary Table 1), and also included 16 accessions of pummelo, 11 accessions of mandarin and 35 accessions of sweet oranges published previously (Supplementary Table 2). We performed a principal component analysis (PCA) using the single-nucleotide polymorphism (SNP) data. This analysis revealed that the largest variation exists between pummelo and mandarin, which are two founder Citrus species, as indicated by their opposing positions in the PCA plot. Both the PCA and phylogenetic analysis divided sour oranges into three groups that are largely consistent with their geographical distribution (Fig. 1c and Supplementary Fig. 1). We observed the highest level of genetic polymorphisms in sour oranges collected from the NYW region, which formed a separate group. Nucleotide diversity (π) analysis yielded a π value for sour oranges (~0.005) that indicated a higher genetic diversity relative to mandarins (~0.003) and pummelos (~0.002) (Fig. 1d and Supplementary Fig. 2). Consistent with these data, although all sweet oranges were susceptible to Xcc, we observed varying degrees of susceptibility to Xcc among the different accessions of sour orange (Supplementary Fig. 3 and Supplementary Table 3). Moreover, the coefficient of variation for metabolites in sour oranges is higher than those in mandarins and pummelos (Fig. 1e and Supplementary Table 4).
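The windowed π calculation referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes biallelic SNPs, treats every base in a window as callable, and uses hypothetical names (`site_pi`, `windowed_pi`) and a hypothetical input format of `(position, alt_allele_count, n_chromosomes)` tuples. Only the 1 Mb window size and 500 kb step come from the text.

```python
from typing import List, Tuple

Site = Tuple[int, int, int]  # (1-based position, alt allele count, sampled chromosomes)


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Expected number of pairwise differences at one biallelic site."""
    if n_chrom < 2:
        return 0.0
    ref_count = n_chrom - alt_count
    return 2.0 * alt_count * ref_count / (n_chrom * (n_chrom - 1))


def windowed_pi(
    sites: List[Site],
    chrom_length: int,
    window: int = 1_000_000,  # 1 Mb windows, as in the text
    step: int = 500_000,      # 500 kb steps, as in the text
) -> List[Tuple[int, int, float]]:
    """Return (start, end, pi) per window; pi is per base pair.

    Assumption: all bases in the window are callable, so the
    denominator is simply the window length.
    """
    results = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        total = sum(site_pi(k, n) for pos, k, n in sites if start < pos <= end)
        results.append((start, end, total / (end - start)))
        start += step
    return results
```

With real data, uncallable positions would normally be excluded from the denominator, so this sketch would tend to underestimate π where coverage is incomplete.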

Notably, all sweet orange accessions coalesced into a single point on the PCA plot, closer to the sour orange group collected from the NYW region than to the other groups (Fig. 1c). These data provide evidence that sour oranges and sweet oranges are closely related. To explore the genetic constituents of sour oranges and sweet oranges, we analyzed the whole genome with species-specific SNPs27 to evaluate the genomic patterns of sweet oranges, sour oranges, mandarins and pummelos. We confirmed that sweet orange accessions shared a nearly identical genetic background26 (Supplementary Fig. 4), which is consistent with most sweet orange accessions arising from somatic mutations. We found that the genome of sour oranges is almost exclusively heterozygous with one allele from mandarin and the other allele from pummelo, which is consistent with the idea that sour oranges are F1 hybrids derived from a cross between mandarin and pummelo (Fig. 2a and Supplementary Fig. 5). Notably, most of the genomes from sour oranges and sweet oranges shared the same large heterozygous segments derived from mandarin and pummelo. However, sweet oranges differed from sour oranges in that they contain more homozygous segments from mandarin (Fig. 2a and Supplementary Figs. 4 and 5). We have identified one unknown orange accession from Hunan Province (‘TJH2’) that produces fruit that is similar to the fruit produced by sweet oranges in terms of fruit size, color and qualities (Extended Data Fig. 3). The genomic landscape of ‘TJH2’ was similar to that of sweet oranges, although not identical (Extended Data Fig. 3). These results indicate that sweet oranges are abundant and that similar accessions of sweet orange are present in the NYW region.
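One way to turn species-specific SNP genotypes into the segment-level genomic patterns described above is a per-window majority vote. The sketch below is illustrative only and is not the authors' method: the function names, the `min_snps` and `min_fraction` thresholds and the input format are hypothetical, and each diagnostic SNP is assumed to be pre-coded as `mm` (homozygous mandarin allele), `mp` (heterozygous) or `pp` (homozygous pummelo allele). The 250 kb window size follows the Fig. 2a legend.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

WINDOW = 250_000  # window size stated in the Fig. 2a legend


def classify_windows(
    calls: Iterable[Tuple[int, str]],
    chrom_length: int,
    min_snps: int = 5,          # hypothetical: fewer SNPs -> unknown
    min_fraction: float = 0.7,  # hypothetical: required majority share
) -> List[str]:
    """Label each window 'mm', 'mp', 'pp' or 'UN' (unknown).

    calls: (1-based position, genotype class) per diagnostic SNP,
    with positions assumed to lie within chrom_length.
    """
    n_windows = (chrom_length + WINDOW - 1) // WINDOW
    tallies = [Counter() for _ in range(n_windows)]
    for pos, genotype in calls:
        tallies[(pos - 1) // WINDOW][genotype] += 1

    labels = []
    for tally in tallies:
        total = sum(tally.values())
        if total < min_snps:
            labels.append("UN")
            continue
        label, count = tally.most_common(1)[0]
        labels.append(label if count / total >= min_fraction else "UN")
    return labels


def pattern_percentages(labels: List[str]) -> Dict[str, float]:
    """Percentage of windows assigned to each pattern."""
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}
```

Under this scheme an F1 hybrid between mandarin and pummelo would be expected to show mostly `mp` windows, whereas a backcross to mandarin would add `mm` windows.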

Fig. 2: A haplotype-resolved telomere-to-telomere genome assembly of sweet orange.

a, Genomic patterns of sweet orange (‘Valencia’ sweet orange), sour orange (‘Anjiang’ sour orange), wild mandarin (‘Mangshan’ mandarin) and pummelo (‘Wanbai’ pummelo). These segments were calculated with 250 kb windows. Red, homozygous segment with a mandarin origin; yellowish green, homozygous segment with a pummelo origin; orange, heterozygous segment with a mandarin and pummelo origin; gray, unknown region. Chr, chromosome. b, Dual-color FISH of centromeres (pink, using CL34contig88 as centromeric probe34) labeled with biotin-16-dUTP (Roche) and telomeres (green, using synthesized telomeric probes (5′-TTTAGGG-3′)7 attached to FAM). c, A diagram illustrating the contributions of the mandarin (M) and pummelo (P) derived genome segments to the two haplotypes of sweet orange. Red, mandarin origin; yellowish green, pummelo origin; blue circles, telomeres; magenta ellipses, centromeres.

Telomere-to-telomere genomes of sweet orange and sour orange

To elucidate the genetic origin of sweet orange, we independently obtained haplotype-resolved telomere-to-telomere genomes of sweet orange (Citrus sinensis) and sour orange (Citrus aurantium) and assembled four chromosome-level genomes of representative Citrus accessions, including C. reticulata ‘Mangshan’, C. maxima ‘Zipi’, C. reticulata ‘Ponkan’ and the above-mentioned natural hybrid sweet orange ‘TJH2’ found in southern China. For the sweet orange genome, we used a combination of PacBio high-fidelity (HiFi) long reads (N50 of read lengths ~17 kb, 189× coverage) and Nanopore ultralong reads (N50 of read lengths ~55 kb, 82× coverage) to de novo assemble the genome of ‘Valencia’ sweet orange (Supplementary Table 5). We used 97× Hi-C data to scaffold the assembled contigs to pseudochromosomes (Supplementary Table 6). Our assembly filled all 27 gaps in the previous genome assembly and generated a phased telomere-to-telomere genome with high heterozygosity (~2%; Fig. 2c and Supplementary Fig. 6). We divided the two sets of homologous chromosomes from sweet orange into the two haplotypes ‘SWO-M’ and ‘SWO-P’ based on their percent sequence identity to the mandarin and pummelo genomes, respectively (Supplementary Fig. 7 and Supplementary Table 7). We reconstructed a maximum likelihood tree using single-copy genes, which showed that the two haplotypes are clustered together with the mandarin and pummelo genomes, respectively, which is consistent with each haplotype being almost entirely derived from either the mandarin or pummelo genome (Supplementary Fig. 8). The sequences of the two haplotypes, ‘SWO-M’ and ‘SWO-P’, and the previously published haploid sweet orange genome sequence were highly colinear26. These data indicate that the two haplotype sequences are complete (Supplementary Fig. 9).
We estimated the completeness of the new genome assembly by calculating its Benchmarking Universal Single-Copy Orthologs (BUSCO) score, which reached 98% for complete and single-copy BUSCOs. Long-range sequencing combined with HiFi reads helped resolve all 18 centromeres and achieved the complete assembly of most chromosomal telomeres except for chromosome 8 from the pummelo haplotype and chromosomes 8 and 9 from the mandarin haplotype. Importantly, the new genome assembly had no gaps along all chromosomes (Fig. 2c). Fluorescence in situ hybridization (FISH) experiments supported the integrity of the assembly for telomeres and centromeres (Fig. 2b and Supplementary Fig. 10).

The level of presence and absence variation for each chromosome between ‘SWO-M’ and ‘SWO-P’ was between 2.09% and 6.77% (Supplementary Table 8). We also identified large inversions and translocations between the ‘SWO-M’ and ‘SWO-P’ haplotypes, such as a translocation between chromosomes 4 and 9. We found a ~10 Mb region without paired gene collinearity in the ‘SWO-M’ and ‘SWO-P’ haplotypes on chromosomes 4 and 9, which provides evidence for a large structural variation in this region (Supplementary Fig. 11). A comparative genomic analysis using the assembled Hi-C heat map and the Nanopore data confirmed the translocated regions (Supplementary Fig. 12). To test whether these structural variants influence the relative expression of the two haplotypes, we identified 17,850 allelic polymorphic genes in the sweet orange heterozygous genome (Table 1 and Supplementary Table 9) and, using RNA-sequencing data, detected greater expression differences between alleles in fruit flesh than in peels and leaves (Supplementary Table 10).

Table 1 Statistics for the haplotype-resolved telomere-to-telomere genome of sweet orange and sour orange

For the haplotype-resolved telomere-to-telomere genome of sour orange, we assembled the genome of ‘ZGSC’ sour orange (a canker-resistant genotype) using a combination of PacBio HiFi long reads (N50 of read lengths ~20 kb, 102× coverage) and Nanopore ultralong reads (N50 of read lengths ~92 kb, 157× coverage) and phased the two haplotypes (Supplementary Fig. 13). We used 115.07× coverage Hi-C data for chromosome extension (Supplementary Table 6). The two sets of homologous chromosomes from sour orange were categorized as ‘SSO-M’ chromosomes (that is, chromosomes sharing high sequence identity with the mandarin genome) and ‘SSO-P’ chromosomes (that is, chromosomes sharing high sequence identity with the pummelo genome). These data provide more support for the idea that sour orange was derived from a cross between mandarin and pummelo. In addition, the genome assembly of sour orange resolved all 18 centromeres and achieved the assembly of most telomeres (Extended Data Fig. 4). Importantly, the new genome assembly had no gaps along all chromosomes.

For the genome assembly of the four chromosome-level genomes of the representative citrus accessions, including C. reticulata ‘Mangshan’, C. maxima ‘Zipi’, C. reticulata ‘Ponkan’ and the natural hybrid sweet orange ‘TJH2’, we generated 70.17–98.05× coverage PacBio or Nanopore long reads for the four accessions (Supplementary Table 5). We then used 68–105× coverage Hi-C data to extend the chromosomes (Supplementary Table 6). These data produced chromosome-level assemblies covering 333–365.5 Mb per genome, with contig N50 values of up to 20.7 Mb (Supplementary Table 11).

Elucidating the origin of sweet orange

We used the haplotype variations identified from the whole-genome resequencing data for 305 citrus accessions to assess the extent of their relatedness to sweet orange. We determined that sour oranges are divided into three groups (‘SSO1’, ‘SSO2’ and ‘SSO3’) based on clustering and kinship (Supplementary Table 12). The kinship matrix indicates that Ponkan/Jiankan mandarins are the most closely associated with sweet oranges with the exception of sweet orange-derived cultivars, as indicated by the number of genomic fragments shared between these accessions (Fig. 3a and Supplementary Fig. 14). In addition, a group of sour orange accessions belonging to the ‘SSO2’ cluster was closely related to sweet orange (Fig. 3a and Supplementary Tables 12 and 13). Jiankan mandarin is a landrace that is currently rarely cultivated and is genetically similar to the widely cultivated Ponkan mandarin (Supplementary Figs. 15 and 16). Both Ponkan and Jiankan mandarins are distributed in the NYW region. Identity by state (IBS) and identity by descent (IBD) analyses based on whole-genome SNP data indicated that sour oranges, Ponkan mandarin and cultivars derived from sweet orange were more closely related to sweet orange than pummelos and other mandarins. Indeed, the IBS values among sweet oranges were the highest (>0.9). The IBS values of sour oranges, Ponkan mandarin and sweet orange-derived cultivars (>0.8) were greater than those of other mandarins and pummelos (<0.8; Fig. 3b). IBD analysis estimates the likelihood that two alleles from two different species were inherited from the same parent. The probability of several sour oranges originating from the NYW region sharing a set of chromosomes with sweet oranges is close to 100%. These data are consistent with a parental relationship between sour oranges and sweet oranges (Fig. 3c). We also calculated allele sharing among accessions, using proportions from SNP data. 
A pair of accessions that shares 0, 1 or 2 alleles based on an IBS analysis were classified as P0, P1 and P2, respectively. The P0 value of sweet orange and ‘TJH1’ (a sour orange from the NYW region) was only 0.02%; the P0 value of sweet orange and Ponkan mandarins was 0.18% (Supplementary Fig. 17). These data indicate that sour oranges, Ponkan mandarin, and sweet oranges are highly related. For comparison, we used clementine mandarin, a hybrid between sweet orange and willow-leaf mandarin, with a P0 value of 0.04% as a reference for the parentage analysis. On the basis of these data, we hypothesize that sweet orange originated from hybridization between an ancient sour orange accession and a Ponkan-like mandarin in the NYW region. An analysis of the chloroplast genomes of the Citrus species revealed that sweet orange is more closely related to sour oranges and provides evidence that sour oranges are probably the maternal parent of sweet orange (Supplementary Fig. 18).
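The P0, P1 and P2 proportions described above can be computed directly from genotype vectors. The following is a minimal sketch, assuming biallelic SNPs coded as alternate-allele counts (0, 1 or 2, with `None` for missing calls); the function name `ibs_sharing` and the input format are hypothetical and do not reflect the software used in the study. A true parent and offspring must share at least one allele at every locus, so P0 for such a pair should be close to zero apart from genotyping errors.

```python
from typing import Dict, Optional, Sequence


def ibs_sharing(
    g1: Sequence[Optional[int]],
    g2: Sequence[Optional[int]],
) -> Dict[str, float]:
    """Proportion of SNPs at which two diploid accessions share 0, 1 or 2 alleles.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    The number of alleles shared identical by state at a SNP is 2 - |g1 - g2|.
    """
    counts = [0, 0, 0]  # index = number of shared alleles
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either accession
        counts[2 - abs(a - b)] += 1

    n_sites = sum(counts)
    if n_sites == 0:
        raise ValueError("no sites with calls in both accessions")

    p0, p1, p2 = (c / n_sites for c in counts)
    # Overall IBS similarity: mean fraction of alleles shared per site.
    return {"P0": p0, "P1": p1, "P2": p2, "IBS": (p1 + 2 * p2) / 2}
```

For example, `ibs_sharing(sweet_orange, candidate_parent)["P0"]` (both vectors hypothetical) would give the quantity compared against the clementine reference value in the text.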

Fig. 3: Origin of sweet orange.

a, fineSTRUCTURE analysis using copying vectors generated from chromosome painting. Each row of the heat map is a recipient copying vector showing the number of fragments shared between the recipient and each individual as a donor (columns), such that block-like patterns are observed on the heat map. Blue represents less haplotype sharing and red indicates more haplotype sharing. SWO, sweet orange; red ellipses, cultivars derived from sweet orange; SSO1, sour orange group 1; SSO2, sour orange group 2; SSO3, sour orange group 3; PG, Ponkan mandarin; blue ellipses, Jiankan mandarin; Md, other mandarins; Pm, pummelo. JY and JYH, citron and the hybrids of citron, such as lemon. b, IBS between sweet oranges (n = 14), cultivars derived from sweet orange (n = 5), sour oranges (n = 86), Ponkan mandarin (n = 27), other mandarins (n = 35) and pummelos (n = 169). Each box represents the median and interquartile range. The error bars represent s.e.m. c, IBD analysis among sweet orange, sour orange and hybrids of sour orange. Hybrids of sour orange refer to the hybrids obtained from a cross between sour orange and Ponkan mandarin. Red represents a close relationship. d, Model for the origin of sweet orange. The coincidence probability refers to the probability of a genome landscape that matches the genetic relationships between the parents and hybrids. SSO, sour orange. Scale bars, 1 cm.

On the basis of the above results, we propose that sweet orange originated from a cross between sour orange (maternal parent) and a Ponkan-like mandarin (paternal parent) and that the former was derived from a more ancient cross between wild mandarin and pummelo (Fig. 2a). The coincidence probability, which refers to the probability that the genome landscape calculated from 1,014,134 species-specific SNPs27 matches the genetic relationship between parents and hybrids, also supports the above inference. The probability that sweet orange is derived from sour orange and Ponkan mandarin is 82.97 ± 2.77% (n = 40; Fig. 3d and Supplementary Table 14) and the probability that sour orange is the hybrid of a cross between pummelo and mandarin is 84.55 ± 3.01% (n = 75; Fig. 3d and Supplementary Table 15).

Introduction of diversity to sweet orange

To test the origin model and introduce diversity into sweet orange, we designed the maternal and paternal parents for a cross (Fig. 4a and Supplementary Fig. 19). Xcc inoculation experiments revealed that different sour oranges varied quantitatively in their resistance to citrus canker. One sour orange accession, ‘ZGSC’, was confirmed to be resistant to citrus canker based on the evaluation of pathogen titer, H2O2 production and defense gene expression (Extended Data Fig. 5 and Supplementary Fig. 20). We therefore performed hybridization experiments using the resistant genotype ‘ZGSC’ as the maternal parent and Ponkan mandarin as the paternal parent. We used embryo rescue to obtain 892 putative hybrids, and genotype validation using whole-genome sequencing data and a liquid gene chip indicated that 215 of these rescued embryos were indeed hybrids. Hybridization of sour orange and Ponkan mandarin generated vigorous hybrids (Fig. 4a). We conducted a genetic background analysis of parents and hybrids using the 20K liquid gene chip we developed and found a high proportion of heterozygous sites in most of these hybrids (Supplementary Figs. 21–23). We observed considerable trait variations in both fruit quality and disease resistance in the hybrid population. The total soluble solid (TSS) to titratable acidity (TA) ratio varied from 2.96 to 16.98 among hybrids, which makes a compelling case for widespread variation in fruit flavor (Supplementary Fig. 24 and Supplementary Table 16). The disease lesion areas produced by our Xcc inoculation assays varied widely among the hybrids (Supplementary Fig. 25). A PCA provided more evidence that we obtained a highly diverse group of hybrids (Fig. 4b).

Fig. 4: De novo domestication of sweet orange.

a, The process of de novo domestication of sweet orange. b, PCA of the hybrids derived from a cross between sour orange and Ponkan mandarin. Orange circles with black borders represent artificial sweet orange: hybrid 228, hybrid 24 and hybrid 55. c,d, Genomic pattern of artificial sweet orange (c) and cultivated sweet orange (d). mm, homozygous segment with a mandarin origin only; mp, heterozygous segment with a mandarin/pummelo origin; pp, homozygous segment with a pummelo origin only; UN, unknown region. The numbers represent the percentage of each genomic region in the genome. Genomic patterns were evaluated by 1,014,134 species-specific SNPs27. e, Genomic variations for chromosome 2 in SWO, SSO, PG, artificial SWO and artificial mandarin (Md). The number of accessions for each variety is indicated.

Among these hybrids, three nearly overlapped with the commercial sweet oranges in a PCA (Fig. 4b). The oval shape of the fruits produced by these three hybrids was very similar to that of sweet orange. Moreover, the orange–yellow to orange–red pericarp, tight-skinned fruit and the reduced numbers of seeds of these three hybrids were also similar to those of commercial sweet oranges. The fruit flavor, as indicated by the ratio of TSS to TA, was also similar to that of sweet orange (Supplementary Table 16). Moreover, we sequenced the genomes of these three hybrids and used 1,014,134 species-specific SNPs to evaluate the genetic similarity, and found that more than 96% of their genomic regions are similar to those of sweet orange (Fig. 4c,d and Supplementary Table 17). In addition, we observed that hybrids with a specific region of homozygous pummelo haplotypes over the 1.8–5.8 Mb region on chromosome 2 showed sweet orange-like phenotypes (Fig. 4e and Supplementary Fig. 21). RNA-sequencing analysis indicated that 36 genes were specifically expressed in sweet orange fruit peels and 27 genes were specifically expressed in sweet orange fruit flesh (Supplementary Table 18). Therefore, our hybridization experiments support the model that sweet orange was derived from a sour orange × mandarin cross.

Metabolites selection for canker-resistant sweet orange

To explore the molecular basis of Xcc resistance in sour orange, we examined the expression and sequence variation of the Xcc susceptibility gene LATERAL ORGAN BOUNDARIES 1 (CsLOB1)28. There was no sequence polymorphism or obvious difference in gene expression levels between resistant and susceptible genotypes (Supplementary Fig. 26). On the basis of these data, we suggest that resistance to Xcc in these hybrids was not due to a CsLOB1-mediated mechanism. Sour orange is widely used as a type of Chinese folk medicine and accumulates high levels of metabolites with antibacterial activity. Therefore, we hypothesized that the Xcc resistance of sour orange may be attributed to the accumulation of antibacterial metabolites. To test this idea, we performed a comparative metabolomic analysis using leaves and fruits from the Xcc-resistant and Xcc-susceptible hybrids. We identified and characterized 2,027 metabolites that included 690 flavonoids, 244 phenolic acids, 198 alkaloids, 189 amino acids and their derivatives, 154 lignans and coumarins, 122 lipids, 84 terpenoids, 83 organic acids, 68 nucleotides and their derivatives, and 195 other compounds (Supplementary Tables 19 and 20). Among these metabolites, we detected 215 metabolites with significantly higher levels in the leaves of all three Xcc-resistant accessions relative to the three Xcc-susceptible accessions (P value < 0.05; Fig. 5a and Supplementary Table 21). Additionally, most of these 215 metabolites accumulated at low levels in the fruits of all tested samples (Fig. 5a and Supplementary Table 21).

Fig. 5: Identification of defense-related metabolites and selection of canker-resistant sweet orange.

a, Heat map for the levels of 215 metabolites that accumulate to higher levels in the leaves of Xcc-resistant accessions compared with Xcc-susceptible accessions (P < 0.05): 1, sour orange; 2, hybrid 239; 3, hybrid 331; 4, Ponkan mandarin; 5, hybrid 374; 6, hybrid 115. Blue represents low levels and red represents high levels. Metabolite contents were log2-transformed and the normalized output of each row ranges from 0 to 1. b, Inhibitory effect of disease resistance-related metabolites (100 μg ml−1) on the growth of Xcc. Data are presented as mean values ± s.d.; n = 3 independent replicates. c, Disk diffusion assay showing the antibacterial activity of daphnetin and esculetin (50 mg ml−1 dissolved in DMSO) on the growth of 6 species of pathogenic bacteria. Kanamycin (25 mg ml−1 dissolved in pure water) and the solvent DMSO were used as positive and negative controls, respectively. Larger diameters for the inhibition zones around the paper disks indicate more antibacterial activity. The inhibition zones are indicated with dashed circles. The diameter of the round white filter paper was 6 mm. d, The appearance of fruits from sweet orange and artificial sweet orange. Scale bar, 2 cm. e, Leaves from sweet orange and artificial sweet orange were inoculated with Xcc (10^8 CFU ml−1). The photographs were taken 12 days after inoculation. Scale bars, 2 mm.

Most of the 215 metabolites that accumulated to high levels only in Xcc-resistant accessions are derivatives of different types of metabolites and some of these metabolites are not commercially available. To identify defense-related metabolites with application potential, we chose 16 typical compounds that are commercially available and evaluated their antibacterial activity on Xcc (Fig. 5b). We observed that 13 of the 16 metabolites had bacteriostatic effects on Xcc. Among these metabolites, we observed a 10% bacteriostasis rate for four flavonoid compounds: naringenin, isorhamnetin, genistein and 7-methoxyflavone. In particular, two coumarin metabolites, daphnetin and esculetin, which have been reported to have antibacterial activity toward both Ralstonia solanacearum29 and Ralstonia pseudosolanacearum30, strongly inhibited the growth of Xcc, with a bacteriostasis rate of more than 60% (Fig. 5b). To further explore the broad-spectrum antibacterial effect of daphnetin and esculetin, we used a disk diffusion method to examine the bacteriostasis of the two metabolites on Xcc and five other pathogenic bacteria, including Xanthomonas campestris pv. campestris (Xca, a causal agent of the black rot diseases of crucifers), Acidovorax avenae subsp. citrulli (Aac, a causal agent of fruit blotch disease in watermelon), Pseudomonas syringae pv. actinidiae (Psa, a causal agent of bacterial canker in kiwifruit), Pseudomonas cannabina pv. alisalensis (Pc, a causal agent of bacterial blight in crucifers) and Xanthomonas oryzae pv. oryzae (Xoo, a causal agent of bacterial leaf streak disease in rice). We found that both daphnetin and esculetin can inhibit the growth of all six kinds of pathogenic bacteria to different degrees, as demonstrated by the diameters of inhibition zones that ranged from 10.9 to 42.0 mm (Fig. 5c and Supplementary Fig. 27).
Notably, the inhibitory effect of daphnetin on the growth of Xcc and Psa, and esculetin on the growth of Aac was similar to or even larger than the inhibitory effects of kanamycin (Fig. 5c).

To select canker-resistant hybrids, we used metabolic assays to evaluate 29 hybrids. The results indicated that five hybrids accumulated higher levels of daphnetin and O-feruloyl 2-hydroxylcoumarin (a derivative of esculetin) than ‘ZGSC’ sour orange (Supplementary Table 22). Xcc inoculation experiments revealed that three of the hybrids were resistant to citrus canker and the other two were nearly resistant to citrus canker (Supplementary Figs. 25 and 28). Notably, among the three artificial sweet orange hybrids mentioned above (that is, hybrid 228, hybrid 24 and hybrid 55), hybrid 228 accumulated higher levels of daphnetin and O-feruloyl 2-hydroxylcoumarin than ‘ZGSC’ (Supplementary Table 22) and was resistant to citrus canker (Fig. 5d,e).

To explore genes related to disease resistance, we generated 1.68 Tb of RNA-sequencing data from the same samples used for the metabolome analysis described above (Supplementary Table 23). From a differential gene expression analysis, we identified 248 significantly upregulated genes in the leaves of the Xcc-resistant accessions relative to Xcc-susceptible accessions (Supplementary Table 24). We noticed that some genes encoding enzymes involved in coumarin and flavonoid biosynthesis were among the upregulated genes (Supplementary Figs. 29 and 30 and Supplementary Table 25), which is consistent with the high-level accumulation of defense-related metabolites in the resistant accessions (Fig. 5b).

Discussion

Our genomic data, together with evidence from extant native populations and ancient vernaculars, support the conclusion that sweet orange was domesticated in southern China. Prototypic forms with high fruit acidity and regionally distinct sweet oranges were discovered in this region13. Citrus has been cultivated for at least 4,000 years in this region, which has highly diverse citrus resources including abundant wild forms, semi-domesticated and cultivated sour oranges, mandarins and pummelos. Records indicate that citrus were initially commercially produced in this region, called the Chu state or the Jingzhou region, which corresponds approximately to the regions encompassed by present-day Hunan and Hubei provinces31. More archeological evidence comes from herbarium specimens of citrus seeds from the ‘Ma Wang Dui tomb of the Han dynasty’ (first century bc) in this region31,32.

Compared with previous studies on the origin of sweet orange, our study inferred a reasonable model with the extant germplasm of both parents of sweet orange in the NYW region of southern China. A previous study proposed that sweet orange originated from (pummelo × mandarin) × mandarin24. Another previous study proposed that sweet orange had undergone multiple pummelo introgression events33 and proposed that sour orange was not related to sweet orange25. Our study inferred a simple model for the origin of sweet orange. We propose that sweet orange is a hybrid between sour orange and a Ponkan-like mandarin based on a high-resolution haplotype-resolved genome analysis that we confirmed with artificial hybridization experiments. As sweet orange and sour orange are both interspecies hybrids with high levels of heterozygosity, a previous genomic analysis that utilized only one haploid genome may lack sufficient allele information. In this study, we clarified the genetic composition of phased haplotypes for both sweet orange and sour orange, allowing for a thorough consideration of the genetic characteristics of sweet oranges. The debate on the different models for the origin of sweet orange has focused on the 1.8–5.8 Mb homozygous pummelo/pummelo (P/P) segment on chromosome 2 of sweet orange. We defined the distribution of the P/P segment in all accessions and cross-referenced them with the phenotypes of the hybrids of sour orange and Ponkan mandarin. Our data revealed that both the sour orange and Ponkan mandarin possess one allele of the pummelo segment and thus provide a reasonable explanation for the debated P/P segment of sweet oranges (Fig. 4e and Supplementary Fig. 22). We also observed sweet orange-like phenotypes among the hybrids with homozygous pummelo haplotypes in this region (Fig. 4e and Supplementary Fig. 23).
We further designed a hybridization experiment using sour orange and Ponkan mandarin as parents and created artificial sweet oranges that are very close to commercial sweet oranges, which provides strong evidence for our origin model of sweet orange. The sympatric occurrence of all parental species in the NYW region of southern China provides further support for our model for the origin of sweet orange.

Defense-related metabolites are promising candidates for environmentally friendly bactericides. Using resistant accessions of sour orange, we identified 13 citrus metabolites that can inhibit the growth of Xcc with different bacteriostasis rates. We also found broad-spectrum antibacterial activities for daphnetin and esculetin against six kinds of industrially important pathogenic bacteria in in vitro experiments, and the inhibitory effects of three combinations were even larger than that of kanamycin. A previous study reported that daphnetin and esculetin have antibacterial activity toward both R. solanacearum29 and R. pseudosolanacearum30. These metabolites are promising candidates and should be tested in field experiments that aim to identify green pesticides with antibacterial activities. In the future, breeding programs that aim to develop disease-resistant hybrids can use bactericidal metabolites as effective selection markers.

This study developed a strategy that uses sour orange (a kind of wild citrus) for de novo domestication of sweet oranges. By choosing highly heterozygous parents and designing a liquid gene chip with high selection efficiency, we obtained diverse sweet oranges with broad variations in their genome sequences, metabolite levels and disease resistance. This study also provides a proof-of-concept for metabolite selection in perennials.

Methods

Plant materials and whole-genome sequencing

A total of 226 citrus cultivars and landraces (99 sour oranges, 13 sweet oranges, 15 mandarins, 43 pummelos, 19 citrus hybrids, 32 hybrids of sour orange and 5 other citrus accessions) were sequenced in this study (Supplementary Table 1). At least 10 µg of genomic DNA from each accession was used to construct a sequencing library. Paired-end sequencing libraries with an insert size of approximately 200–500 bp were constructed and sequenced on the Illumina HiSeq 2500 platform.

Genome assembly

We assembled six genomes: Citrus sinensis ‘Valencia’, Citrus aurantium ‘ZGSC’, Citrus reticulata ‘Mangshan’, Citrus maxima ‘Zipi’, Citrus reticulata ‘Ponkan’ and the natural hybrid sweet orange ‘TJH2’ found in southern China. PacBio HiFi reads and Nanopore ultralong reads were generated for the haplotype-resolved telomere-to-telomere genome assemblies of Citrus sinensis ‘Valencia’ and Citrus aurantium ‘ZGSC’. Nanopore ultralong reads were generated for the chromosome-level assemblies of Citrus reticulata ‘Mangshan’, Citrus maxima ‘Zipi’, Citrus reticulata ‘Ponkan’ and the natural hybrid sweet orange ‘TJH2’.

HiFi reads were assembled using Hifiasm (v0.16.1)35 and canu (v2.2)36 with default parameters. Nanopore ultralong reads were assembled using Nextdenovo (v2.5.2)37, Necat (v.1.0)38 and SMARTdenovo (v.1.0)39. Draft contigs generated using Hifiasm and Nextdenovo were chosen for the following analysis if they passed the quality assessment. Contigs were then polished using Nextpolish (v.1.4.0)40 with Illumina reads. The polished contigs were then clustered, ordered and oriented into 18 chromosomes, comprising two haplotype sets, using the 3d-dna pipeline (v180922)41. The genomes assembled from Nanopore reads were used to fill gaps in the genomes assembled using Hifiasm, with TGS-Gapcloser (v1.1.1)42. Another round of gap filling was performed using corrected Nanopore reads, generating a sweet orange assembly without gaps. The plant telomeric repeat sequence (CCCTAAA) was used to identify telomeres in the assembled genome.
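The telomere search described above can be illustrated with a minimal Python sketch. It assumes that a chromosome end counts as telomeric when the CCCTAAA unit (or its reverse complement, TTTAGGG) occurs many times within a terminal window. The function names, the window size and the repeat threshold are hypothetical choices made for illustration, not parameters reported in this study.

```python
import re

# The text names CCCTAAA as the plant telomeric repeat; TTTAGGG is its
# reverse complement, seen when the opposite strand is read.
TELOMERE_UNITS = ("CCCTAAA", "TTTAGGG")


def count_repeats(region):
    """Return the larger non-overlapping count of either repeat orientation."""
    region = region.upper()
    return max(len(re.findall(unit, region)) for unit in TELOMERE_UNITS)


def telomere_ends(seq, window=10_000, min_repeats=50):
    """Report whether each end of a chromosome sequence looks telomeric.

    `window` and `min_repeats` are hypothetical thresholds (assumptions).
    Returns a (start_has_telomere, end_has_telomere) tuple.
    """
    start_hits = count_repeats(seq[:window])
    end_hits = count_repeats(seq[-window:])
    return start_hits >= min_repeats, end_hits >= min_repeats


# Toy chromosome carrying a telomeric array at its start only.
toy = "CCCTAAA" * 60 + "ACGT" * 5000
print(telomere_ends(toy, window=1_000))
```

A chromosome for which both values are true would be a candidate telomere-to-telomere assembly under these assumed thresholds.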

Nanopore reads (~10 kb) were generated for three accessions: Citrus reticulata ‘Ponkan’, Citrus maxima ‘ZPY’ and a natural hybrid sweet orange named ‘TJH2’. In addition, we generated Nanopore ultralong reads (~20 kb) for Citrus reticulata ‘Mangshan’. These Nanopore reads were assembled using wtdbg (v.2.3)43, Nextdenovo (v.2.5.2), Necat (v.1.0), SMARTdenovo (v.1.0) and Flye (v.2.8.3)44. Then, three rounds of genome polishing were performed to correct assembly errors. The first round of polishing was performed using racon (v.1.5.0)45 with Nanopore raw reads. The next two rounds of polishing were performed using Nextpolish (v.1.4.0) with Illumina reads. For Citrus reticulata ‘Mangshan’, Citrus reticulata ‘Ponkan’ and the natural hybrid sweet orange ‘TJH2’ found in southern China, we generated Hi-C reads to anchor contigs using the 3d-dna pipeline (v180922). For Citrus maxima ‘ZPY’, the contigs were anchored based on collinearity with the chromosome-level genome of Citrus maxima ‘Wanbai’.

Genome annotation

De novo transposable element libraries were predicted using RepeatModeler2 (v1.0.11)46. The transposable element library generated above was used to identify repetitive sequences in particular genomes using RepeatMasker (v4.0.9)47. Gene models were annotated by integrating evidence from ab initio gene predictions, homology searches and RNA-sequencing. For ab initio gene predictions, AUGUSTUS (v.3.3)48, SNAP49 and GlimmerHMM (v3.0.4)50 were used. Homology searching was then conducted using the GenomeThreader software51. For RNA-sequencing reads, Hisat2 (v2.2.1)52 was used for sequence alignments and Cufflinks (v2.2.1)53 was used to construct transcripts. Transcripts were assembled both de novo and genome guided using Trinity (v.2.4.0)54. The PASA (v2.4.1)55 software was used to build a comprehensive transcriptome database for de novo transcriptome assemblies. Correcting exon boundaries, adding untranslated regions and modeling alternative splicing were based on the PASA alignment assemblies. All the above gene prediction results were integrated using EVM (v1.1.1)56 to generate a final structural annotation. Functions of the predicted genes were assigned using eggNOG-mapper57.

Fluorescence in situ hybridization (FISH)

FISH was performed as previously described by Xia et al.34 with minor modifications. Root tips were harvested from germinated C. sinensis ‘Valencia’. After pretreatment with 0.075 M KCl in saturated p-dichlorobenzene, the root tips were fixed in fresh Carnoy’s fixative (ethanol:glacial acetic acid, 3:1, v:v) at room temperature for 24 h; the fixative was then replaced with 70% ethanol and the root tips were stored at −20 °C. CL34contig88, which served as the centromeric probe34, was labeled with biotin-16-dUTP (Roche) and subsequently detected with Cy3-conjugated streptavidin (Sigma). Telomeric probes were synthesized (Sango Biotech) and labeled with carboxyfluorescein (FAM). Dual-color FISH was conducted with these two probes. The slides were counterstained with 4′,6-diamidino-2-phenylindole (DAPI, Invitrogen) and observed with an Axio Imager M2 fluorescence microscope (Zeiss). The images were processed and adjusted using ZEN 2 (blue edition) and the Adobe Photoshop 2023 software.

Comparative genomic analyses

Seven representative citrus species were selected for comparative genomic analyses, including C. sinensis, C. reticulata ‘Mangshan’, C. maxima ‘ZPY’, C. reticulata ‘Ponkan’, C. aurantium ‘ZGSC’, C. medica and Atalantia buxifolia. Pairwise comparisons (‘SSO-M’ versus C. reticulata ‘Mangshan’, ‘SSO-P’ versus C. maxima ‘ZPY’, ‘SWO-M’ versus C. reticulata ‘Mangshan’, ‘SWO-P’ versus C. maxima ‘ZPY’) of genomic sequences were made using LASTZ (v1.02.00)58 and the nucmer program from the MUMmer (v4.0.0)59 software package using ‘--maxgap=500 --mincluster=100 -q -r’. The alignment results were further filtered using the delta-filter program in the MUMmer package. SNPs and structural variants were obtained using the show-diff and show-snps programs in the MUMmer package. Genes were clustered using OrthoFinder2 (v2.2.7)60 with default parameters. A gene family was considered species-specific if it was present in only one species. The Gene Ontology enrichment analyses were performed using the agriGO (v2.0)61 program.
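The definition of a species-specific gene family given above can be sketched in Python as follows. The sketch assumes that the clustering result has already been reduced to a table of gene counts per family and species. The dictionary layout, the family identifiers and the function name are hypothetical and do not reproduce any OrthoFinder output format.

```python
def species_specific_families(counts):
    """Group gene families that have members in exactly one species.

    `counts` maps a family identifier to a {species: gene_count} dictionary
    (a hypothetical in-memory layout, assumed for this example).
    Returns {species: [family identifiers found only in that species]}.
    """
    specific = {}
    for family, per_species in counts.items():
        present = [sp for sp, n in per_species.items() if n > 0]
        if len(present) == 1:
            specific.setdefault(present[0], []).append(family)
    return specific


# Invented counts for three families in two species.
example = {
    "OG1": {"C. sinensis": 2, "C. medica": 0},
    "OG2": {"C. sinensis": 1, "C. medica": 3},
    "OG3": {"C. sinensis": 0, "C. medica": 1},
}
print(species_specific_families(example))
```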

Genetic diversity and nucleotide diversity (π) analysis

Genomic data from seven citrus species were collected and single-copy genes were identified using OrthoFinder2 (v2.2.7). For each of these genes, we used the MUSCLE (v.3.8.31)62 software for sequence alignment. Aligned amino acid sequences were converted into the corresponding CDS sequences and concatenated. A maximum likelihood tree was produced using the RAxML (v7.7.8) software63 with the GTRGAMMA substitution model, A. buxifolia as the outgroup and 100 rapid bootstrap replicates. High-quality paired-end reads from each accession were mapped to the sweet orange genome using BWA (v0.7.17)64 and SAMtools (v1.9)65. PCR duplicates were marked using the MarkDuplicates command from GATK (v4.1.1)66. SNPs were called using the HaplotypeCaller command from GATK (v4.1.1). Raw VCF files were filtered using Vcftools (v0.1.13)67. Then, we extracted all 4dTV sites in the sweet orange genome to construct a maximum likelihood phylogenetic tree using RAxML (v7.7.8) with 200 rapid bootstrap replicates. PCA based on all the SNPs was conducted with default settings using GCTA (v1.26.0)68. We used PLINK (v1.90)69 for format conversion and used the R function prcomp to calculate PC1 and PC2. For nucleotide diversity (π) analysis, SNPs from the whole genome were used, and a sliding-window approach (1 Mb windows sliding in 500 kb steps) was employed to quantify nucleotide diversity (π) for each pair of citrus populations using the VCFtools software.
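The sliding-window scheme (1 Mb windows, 500 kb steps) can be illustrated with a short Python sketch. It assumes biallelic SNPs summarized as alternate-allele counts and uses the standard per-site estimator π = 2k(n − k)/(n(n − 1)) for k alternate alleles among n sampled chromosomes, averaged over the window length. This is a simplified illustration under those assumptions and not a reimplementation of VCFtools; the function names and input layout are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic SNP."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites, chrom_len, window=1_000_000, step=500_000):
    """Average pi per base in overlapping windows along one chromosome.

    `sites` is a list of (position, alt_allele_count, n_chromosomes) tuples
    with 1-based positions (an assumed input layout).
    Returns a list of (start, end, pi) tuples; the last windows are
    truncated at the chromosome end.
    """
    results = []
    start = 1
    while start <= chrom_len:
        end = min(start + window - 1, chrom_len)
        total = sum(site_pi(k, n) for pos, k, n in sites if start <= pos <= end)
        results.append((start, end, total / (end - start + 1)))
        start += step
    return results


# Invented SNPs on a 2 Mb chromosome, four sampled chromosomes each.
snps = [(100, 1, 4), (600_000, 2, 4), (1_600_000, 2, 4)]
for start, end, pi in windowed_pi(snps, chrom_len=2_000_000):
    print(start, end, pi)
```

Because the step is half the window size, each SNP away from the chromosome start contributes to two consecutive windows, which smooths the diversity profile along the chromosome.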

Evaluation of genetic relationships

We inferred genetic relationships based on the haplotype polymorphisms at the population level70,71. Before kinship calculations, we first pruned the SNPs called in the previous sections (r² > 0.9) using PLINK (v1.90) and phased the filtered SNPs using the Beagle 4.0 software (vr1399)72. Then, we used ChromoPainter and the FineSTRUCTURE pipeline71 to analyze 305 citrus resources. We constructed the genetic relationship at the haplotype level, and then calculated the genetic relationship matrix to investigate what proportion of the haplotype each individual contributed to the remaining individuals. In addition, we calculated IBD and IBS to assess the extent of relatedness of different citrus accessions to sweet orange using PLINK (v1.90). To evaluate the genetic similarity between hybrids and sweet oranges, genomic patterns were assessed using 1,014,134 species-specific SNPs in 100 kb windows, following the method of Wang et al.27.

Assay of pathogenicity

Fully expanded leaves were inoculated with Xcc (10⁸ colony-forming units (CFU) per milliliter) with an inoculation needle (0.5 mm in diameter). Each inoculation spot consisted of seven pricks with the inoculation needle as recommended by previous reports with minor modifications. A bacterial suspension was dropped into each inoculation spot. The disease lesion area was measured (36 punctures on average) with ImageJ 2.0.

Hybrid fruit quality determinations

Total soluble solids (TSS) and titratable acid (TA) were determined using a PAL-BX/ACID integrated sugar and acid meter. Each measurement was repeated three times. Single-fruit weight was measured using an electronic balance with a sensitivity of 0.01. The masses of three fruits were quantified for each genotype. Mean values are reported. The fruit shape index is the ratio of the longitudinal diameter to the transverse diameter. A vernier caliper was used to measure the distance from the top of the fruit to the pedicel as the longitudinal diameter (mm), and the cross-sectional diameter of the fruit as the transverse diameter (mm). Three fruits were measured from each fruit tree. Mean values are reported.
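As a small worked illustration of the fruit measurements, the Python sketch below computes the fruit shape index (longitudinal diameter divided by transverse diameter) and averages weight and shape index over the fruits of one genotype. The record layout, the function names and the numbers in the example are invented for illustration and are not data from this study.

```python
from statistics import mean


def fruit_shape_index(longitudinal_mm, transverse_mm):
    """Longitudinal diameter divided by transverse diameter."""
    if transverse_mm <= 0:
        raise ValueError("transverse diameter must be positive")
    return longitudinal_mm / transverse_mm


def summarize_genotype(fruits):
    """Mean weight and mean shape index over the fruits of one genotype.

    `fruits` is a list of (weight, longitudinal_mm, transverse_mm) tuples;
    this layout is an assumption made for the example.
    """
    weights = [w for w, _, _ in fruits]
    indices = [fruit_shape_index(lon, tra) for _, lon, tra in fruits]
    return {"mean_weight": mean(weights), "mean_shape_index": mean(indices)}


# Three invented fruits from one hypothetical genotype.
print(summarize_genotype([(150.0, 60.0, 75.0), (160.0, 64.0, 80.0), (170.0, 72.0, 90.0)]))
```

A shape index below 1 indicates a fruit that is wider than it is tall, and a value above 1 indicates an elongated fruit.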

Metabolomics profiling and analyses

The freeze-dried leaf and fruit samples were powdered using a mixer mill (MM 400, Retsch) with a zirconia bead for 1.5 min at 30 Hz. First, 100 mg powder was weighed and extracted overnight at 4 °C with 1.0 ml 70% aqueous methanol. Following centrifugation at 10,000 g for 10 min, the extracts were absorbed (CNWBOND Carbon-GCB SPE Cartridge, 250 mg, 3 ml; ANPEL) and filtered (SCAA-104, 0.22 μm pore size; ANPEL) before liquid chromatography–mass spectrometry (LC–MS) analysis.

The sample extracts were analyzed using an LC–ESI–MS/MS system (HPLC, Shim-pack UFLC SHIMADZU CBM30A system; MS, Applied Biosystems 4500 Q TRAP). The effluent was alternatively connected to an ESI-triple quadrupole-linear ion trap (QTRAP)–MS. LIT and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometer (Q TRAP), API 4500 Q TRAP LC–MS/MS System, equipped with an ESI Turbo Ion-Spray interface, operating in a positive ion mode and controlled by Analyst 1.6 software (AB Sciex). Orthogonal projections to latent structures–discriminant analysis was applied to identify altered metabolites.

The coefficient of variation was calculated for each metabolite on the basis of the following formula: δ/μ, where δ and μ are the standard deviation and mean of each metabolite in the population, respectively.
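The formula δ/μ can be written as a short Python function. The text does not state whether the sample or the population standard deviation was used, so the sketch exposes that choice as a parameter; the function name and the example values are hypothetical.

```python
from statistics import mean, pstdev, stdev


def coefficient_of_variation(values, sample=True):
    """Standard deviation divided by the mean of one metabolite's levels.

    `sample=True` uses the sample standard deviation (n - 1 denominator);
    `sample=False` uses the population form. Which one the study used is
    not stated, so this switch is an assumption.
    """
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    sd = stdev(values) if sample else pstdev(values)
    return sd / mu


# Invented metabolite levels across eight accessions.
levels = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(levels, sample=False))
```

Because the coefficient of variation is unitless, it allows the variability of metabolites measured on very different intensity scales to be compared directly.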

Measurement of the antimicrobial activity of metabolites

Six kinds of pathogenic bacteria (Xcc, Xca, Aac, Psa, Pc and Xoo) were plated on Luria-Bertani (LB) solid medium separately and incubated at 28 °C for 14 h. Individual colonies were cultured separately in LB liquid medium at 28 °C on an orbital shaker at 200 rpm for 18 h.

To test the 16 metabolites for antibacterial activity, the Xcc bacterial cells were resuspended in sterile water at a concentration of 10⁶ CFU ml−1 and 0.1 ml of the suspension was added to 5 ml of LB liquid medium containing different metabolites at a concentration of 100 μg ml−1. Dimethylsulfoxide (DMSO) without metabolites (0 mg ml−1) served as a control. All cultures were shaken (200 rpm) at 28 °C in the dark, and the OD600 was measured every 6 h until the bacterial culture reached the stationary phase. The OD600 values of all the cultures at 36 h were used to calculate the bacteriostasis rate, which is expressed as (OD600 of DMSO culture − OD600 of metabolite culture)/OD600 of DMSO culture × 100%.
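The bacteriostasis rate defined above translates directly into a small Python function. The function name and the OD600 readings in the example are hypothetical and serve only to show the arithmetic.

```python
def bacteriostasis_rate(od_dmso, od_metabolite):
    """Percentage growth inhibition relative to the DMSO control.

    Implements (OD600 of DMSO culture - OD600 of metabolite culture)
    / OD600 of DMSO culture x 100, using the 36 h readings.
    """
    if od_dmso <= 0:
        raise ValueError("control OD600 must be positive")
    return (od_dmso - od_metabolite) / od_dmso * 100.0


# Invented readings: control at 1.2, treated culture at 0.3.
print(bacteriostasis_rate(1.2, 0.3))
```

A rate of 0% means the metabolite culture grew as well as the control, and a negative value would indicate growth above the control level.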

To explore the antibacterial effect of daphnetin and esculetin on the six species of pathogenic bacteria, the disk diffusion method was used73. In brief, melted LB solid medium cooled to about 45 °C was mixed with each bacterial suspension and spread evenly on Petri dishes under sterile conditions. The bacterial concentration in the LB solid medium was about 10⁶ CFU ml−1. Filter paper disks (6 mm diameter) were soaked with different metabolites at a concentration of 50 mg ml−1. The impregnated disks were symmetrically placed onto the medium using sterile tweezers. Filter paper disks separately soaked with DMSO and 25 mg ml−1 kanamycin were used as negative and positive controls, respectively. The Petri dishes were incubated at 28 °C for 36 h. Antibacterial activity was evaluated by measuring the diameter of the inhibition zone, the area around the disks with no bacterial growth. Each experiment was performed three times, with three replicates per experiment.

Geographical map generation

The geographical map was generated manually using the freeform shape function of Microsoft PowerPoint software.

Statistics and reproducibility

A two-tailed Student’s t-test was used to compare the relative differences in gene expression, metabolite content, the ratio of TSS to TA and the antimicrobial activity of metabolites between two groups of samples. A two-tailed adjusted Fisher’s exact test was used to conduct the Gene Ontology enrichment analysis of the target genes relative to the background. All experiments were replicated with at least three biological replicates.
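For readers unfamiliar with the enrichment test, the Python sketch below computes a two-tailed Fisher's exact test P value for a single 2 × 2 table, for example target genes with and without a Gene Ontology term against background genes with and without that term. It sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-tailed test. The multiple-testing adjustment mentioned above is not shown, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Invented counts: 8 of 10 target genes carry the term versus 1 of 6 background genes.
print(fisher_exact_two_tailed(8, 2, 1, 5))
```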

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.