Introduction

It has long been recognized that interspecific hybridization and introgression are common phenomena in many plant groups, and that they need to be studied in both theoretical and applied plant research, e.g. to clarify evolutionary trends (Anderson, 1949) and to determine sources of valuable diversity (Allard, 1960). However, thorough molecular documentation is available only for a restricted number of genera, reviewed for animals by Avise (1994) and for plants by Rieseberg et al. (1996b). A detailed study of speciation via introgression indicated mechanisms such as selection for coadapted gene complexes and selection against chromosomal rearrangements (Rieseberg et al., 1995).

The subfamily Maloideae (Rosaceae), comprising approximately 940 species in 28 genera (Robertson et al., 1991) and a further 33 species representing 13 hybrid genera (Phipps et al., 1990), is notorious for its variability in reproductive and breeding systems. Extensive hybridization at generic and specific levels, as well as polyploidy, is remarkably frequent (Campbell et al., 1991). Hybrid complexes in Maloideae, including artificially produced hybrids, have been studied mainly by traditional morphological and cytological methods (Dickinson & Campbell, 1991; Phipps et al., 1991; Robertson et al., 1991). Molecular evidence of natural hybridization has been reported for Malus (Dickson et al., 1991) and for Amelanchier (Campbell et al., 1997). In an allozyme-based study of interspecific hybridization within groups of several species of Malus, native to North America, and the domesticated apple (M. × domestica), Dickson et al. (1991) observed little gene flow from the domesticated apple into native Malus. However, in support of earlier conclusions based on morphology, two species, M. platycarpa and M. dawsoniana, appeared to be the result of interspecific hybridization involving the introduced M. × domestica and some native species. In Amelanchier, phylogenetic analysis of nuclear ribosomal DNA sequence polymorphisms demonstrated that five eastern North American taxa in a group of hybrid microspecies differed at many nucleotide sites (Campbell et al., 1997). Amelanchier humilis or A. sanguinea were proposed to represent one of the parents of this group, and various members from a sister phylogenetic clade to represent the other parent. Polyploidy and/or agamospermy, which are prevalent in the genus, were hypothesized to be responsible for stabilization of the detected sequence polymorphisms.

Although intergeneric hybrids are of common occurrence in the Maloideae, none is described with Chaenomeles as one of the parents, in spite of the shared chromosome number (2n=34) and similar chromosome morphology within the subfamily (Weber, 1964). Moreover, spontaneous intrageneric hybridization has also not been reported in this genus, which is native to China (C. cathayensis, C. speciosa and C. thibetica) and Japan (C. japonica). By contrast, interspecific hybrids have been produced artificially in all possible directions between three of the species (C. cathayensis, C. japonica, and C. speciosa) which have been introduced to Europe and North America, with the main objective to develop new ornamental cultivars (Weber, 1963). Meiosis is regular in the species as well as in the hybrids (Weber, 1964).

In nature, only C. cathayensis and C. speciosa are known to grow sympatrically but detailed information regarding the distribution and ecology is still lacking. The species grow on open hillsides, rocky slopes and ravines in thickets and at the edge of forests. Outcrossing has been suggested for both species, although without documented proof (Weber, 1964). Honeybees and bumblebees pollinate the flowers. Seed dispersal is supposed to take place through birds and rodents, which ignore the very acid fruit flesh and instead eat the numerous seeds. As a consequence of the medicinal use of dried fruits of C. speciosa and the use of C. cathayensis for making vinegar, man is also a potential vector for seed dispersal, collecting fruits in the wild and cultivating small-scale orchards. Obviously there is a contradiction between the virtual absence of reproductive isolation barriers in artificial crossings between C. cathayensis and C. speciosa and the absence of described natural interspecific hybrids. Several explanations for this may be offered: (i) insufficient sampling in putative hybrid zones; (ii) strong selection against F1 hybrids, which eliminates individuals with intermediate phenotypes; and (iii) inability of morphological characters to reveal later-generation hybrids (Rieseberg & Ellstrand, 1993).

In a recent taxonomic study of the genus Chaenomeles (Bartish et al., 2000), both isozyme and RAPD data suggested that two of the analysed populations had been derived through interspecific hybridization between C. cathayensis and C. speciosa. In another study, preliminary analysis of chloroplast DNA from four species of Chaenomeles, with three specific primer pairs (trnLc – trnFf, Taberlet et al., 1991; trnH – trnK and trnS – trnfM, Demesure et al., 1995) revealed a mutation which could discriminate between C. japonica and C. speciosa, and between C. cathayensis and C. thibetica (Bartish, unpubl. results). The mutation appears to be a deletion of an approximately 100 bp fragment from an approximately 1600 bp amplification product, directed by primers trnH and trnK.

In the present investigation, we re-analysed the previously obtained RAPD data set to study within-family marker frequencies in five populations, including the parental species as well as the putative hybrids, to further corroborate our hypothesis. In addition, we analysed chloroplast DNA polymorphism as well as numerous morphological characters in the same five populations.

Materials and methods

Seeds labelled as either C. cathayensis or C. speciosa were obtained from four different areas of Yunnan province, China, with the co-operation of Chinese botanists. These accessions were treated as five different populations (one area with samples of both species), P1–P5 (Table 1). Populations P4 and P5 were collected in the wild (one fruit per sampled genotype), whereas populations P1, P2 and P3 were obtained from fruits purchased at the local markets but collected in nearby mountains. The seeds from one fruit were presumed to constitute a half-sib family. Seeds were stratified (+4°C) for two months and germinated in a greenhouse in February 1998. In some populations, a high number of albino seedlings was obtained (up to 44% in one family), all of which died within a few weeks. The numbers of albino and normal seedlings were recorded separately. Three families per population, with seven normal seedlings in each family, were then sampled for molecular and morphological analyses (in population P1 two plants died after transplanting, resulting in a total of 103 genotypes). Our intention was to take these families randomly among the seedlings available (Table 1). However, putative interspecific hybrids with heterosis-induced vigour may inadvertently have been preferred, as we chose families with healthy, vigorously growing seedlings that could be used for DNA isolation as soon as possible.

Table 1 Number (n) of available and collected families, identity of analysed families, number of analysed individuals, species determination and locality of the five Chaenomeles populations investigated

Analysis of morphology

Plants were kept in a greenhouse during the first growing season. Stem characters (Table 2) were measured in the autumn when plants had entered dormancy. Plants were kept in a frost-free greenhouse during winter. At the end of April, leaves were sampled from long shoots of the annual growth, preferably from the main stem or else from nonshaded erect shoots. Three fully expanded leaves were sampled from nodes 7 to 9, the first node counted 1 cm below the actively growing shoot apex. The leaves were pressed, digitized by scanning (HP ScanJet 6100C), and characterized (Table 2) by use of the MacFOLIA™ Pro software (Regent Instruments Inc., Canada). For each character, an average was calculated for the three leaves from each genotype. Finally, several combined characters were calculated (Table 2), including SIFF, which is a commonly used shape parameter measuring the deviation from a circle, and SIAL, which measures the deviation from a rectangle of the same width and length as the leaf (West & Noble, 1984).
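As an illustration of the two combined shape characters, the sketch below assumes commonly used definitions: a form factor 4πA/P² for SIFF (equal to 1 for a circle) and the ratio of leaf area to the area of the length × width rectangle for SIAL. These formulas are assumptions, since the definitions actually applied are those of Table 2; the function names and the leaf measurements are hypothetical, and the combined characters are computed here from the per-genotype averages.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def form_factor(area, perimeter):
    """Assumed SIFF: 4*pi*A / P**2, which equals 1.0 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2

def rectangularity(area, length, width):
    """Assumed SIAL: leaf area divided by the area of the length x width rectangle."""
    return area / (length * width)

# Hypothetical measurements (cm and cm^2) for the three leaves of one genotype.
leaves = [
    {"area": 11.8, "perimeter": 15.9, "length": 6.1, "width": 2.9},
    {"area": 12.4, "perimeter": 16.3, "length": 6.3, "width": 3.0},
    {"area": 12.1, "perimeter": 16.1, "length": 6.2, "width": 3.1},
]

# Average each primary character over the three leaves, then combine.
avg = {key: mean([leaf[key] for leaf in leaves]) for key in leaves[0]}
siff = form_factor(avg["area"], avg["perimeter"])
sial = rectangularity(avg["area"], avg["length"], avg["width"])
print(f"SIFF = {siff:.3f}, SIAL = {sial:.3f}")
```

Both indices are dimensionless, so they describe leaf shape independently of leaf size.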

Table 2 Definition of characters used to estimate morphology in Chaenomeles

DNA extraction and RAPD analysis

DNA extraction and PCR amplification followed Bartish et al. (1999a). Based on reproducibility and polymorphism, 10 primers were selected out of one hundred (‘Operon Technologies’ sets A, B, D, E, F), which had been screened in a previous study on genetic diversity in Chaenomeles (Bartish et al., 1999b).

Chloroplast DNA RFLP analysis

Chloroplast DNA primers trnH and trnK (Demesure et al., 1995) were used to amplify DNA samples, extracted as above, from one seedling per family, i.e. 15 seedlings in total. The reaction mixtures were prepared as in our RAPD analysis, except that the primer concentration was 0.2 μM and the reaction volume was 15 μL. Amplification was carried out at 94°C for 3 min, followed by 35 cycles of 94°C for 45 s, 58°C for 1 min and 72°C for 1 min 30 s, and a final 8-min extension at 72°C. The amplified fragments were digested with HindIII endonuclease (1 U μL⁻¹) and separated by electrophoresis (90 min) on a 1.8% agarose gel containing ethidium bromide, using TPE buffer. The gel was photographed under UV light.

Data analysis

Morphological characters were analysed with the MANOVA and DISCRIMINANT procedures of the SPSS v.4.0 software package (SPSS Inc., Chicago, Illinois, USA) to investigate the differentiation among and within the studied populations and families.

Binary data matrices were produced from the RAPD marker data for individual plants. Within-family variation was calculated as the unbiased expected heterozygosity (Hj) according to Lynch & Milligan (1994), as in Bartish et al. (2000), and as Shannon’s Index (H′j) according to Bussell (1999), but with family as group variable. Variation among families was calculated as G′-statistics (Bussell, 1999) and as Φ-statistics using AMOVA (Huff et al., 1993).
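The two within-family diversity measures can be sketched as follows for a binary band matrix (rows are plants, columns are markers). This is a simplified illustration, not the exact procedure used here: the null-allele frequency is taken as the plain square root of the band-absence frequency under Hardy-Weinberg equilibrium, without the small-sample corrections of Lynch & Milligan (1994), and Shannon's Index is computed from band-presence frequencies with log base 2, which is an assumption about the Bussell (1999) convention. The function names and the example family are hypothetical.

```python
import math

def band_frequencies(matrix):
    """Frequency of band presence per marker (rows = plants, columns = markers)."""
    n_plants = len(matrix)
    n_markers = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_plants for j in range(n_markers)]

def null_allele_frequency(band_freq):
    """Simplified estimate: q = sqrt(frequency of band absence).
    Assumption: Hardy-Weinberg equilibrium, no small-sample bias correction."""
    return math.sqrt(1.0 - band_freq)

def expected_heterozygosity(matrix):
    """Mean over markers of 2q(1 - q); the unbiasing variance term is omitted."""
    qs = [null_allele_frequency(f) for f in band_frequencies(matrix)]
    return sum(2.0 * q * (1.0 - q) for q in qs) / len(qs)

def shannon_index(matrix):
    """H' = -sum(p * log2(p)) over markers, p = band-presence frequency."""
    return -sum(p * math.log2(p) for p in band_frequencies(matrix) if p > 0.0)

# Hypothetical family: four plants scored for three markers.
family = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]
print(round(expected_heterozygosity(family), 4))
print(round(shannon_index(family), 4))
```

A marker that is present in every plant of a family contributes nothing to either measure, so families fixed for most bands receive low diversity values.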

Null allele frequencies were calculated on the RAPD data set for each family separately in accordance with the formulae of Lynch & Milligan (1994). The resulting data set was used to compute Nei’s genetic distances (Nei, 1972) between families. Standardized mean per family estimates of morphological characters were used to calculate Euclidean distances between families. The SAHN program from NTSYS-pc v.1.8 software (Rohlf, 1997) was then used to produce UPGMA dendrograms from each of the distance matrices, i.e. (i) Nei’s genetic distances based on RAPD, and (ii) Euclidean distances based on morphology. A principal co-ordinate analysis (using DCENTER and EIGENVECTOR programs) was run for each of the distance matrices, and the results were viewed in 3D plots.
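A minimal sketch of the two distance measures is given below. It assumes that each RAPD marker is treated as a biallelic locus described by its null-allele frequency q (so the band allele has frequency 1 − q), applies Nei's (1972) standard distance D = −ln(J_XY / √(J_X J_Y)), and standardizes morphological family means as z-scores before taking Euclidean distances. The function names are hypothetical, and the clustering step itself (UPGMA in NTSYS-pc) is not reproduced.

```python
import math

def nei_distance(q_x, q_y):
    """Nei's (1972) standard genetic distance between two families.
    q_x, q_y: null-allele frequencies per marker, loci assumed biallelic."""
    j_x = j_y = j_xy = 0.0
    for qx, qy in zip(q_x, q_y):
        px, py = 1.0 - qx, 1.0 - qy
        j_x += px * px + qx * qx      # homozygosity in family X
        j_y += py * py + qy * qy      # homozygosity in family Y
        j_xy += px * py + qx * qy     # cross-family identity
    return -math.log(j_xy / math.sqrt(j_x * j_y))

def standardize(family_means):
    """Z-score each character (column) across families (rows)."""
    n = len(family_means)
    columns = list(zip(*family_means))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s for v, m, s in zip(row, means, sds)]
        for row in family_means
    ]

def euclidean(a, b):
    """Euclidean distance between two standardized character vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(items, dist):
    """Symmetric matrix of pairwise distances between all items."""
    return [[dist(a, b) for b in items] for a in items]
```

Each of the two resulting matrices has one row and one column per family and can be passed to any average-linkage (UPGMA) clustering routine.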

Pearson product–moment correlation analyses were performed between estimates of gene diversity obtained through different methods of data analysis (Hj and H′j), and between gene diversity and percentage of germination within families.
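The correlation coefficient used in these comparisons can be written out directly. The sketch below computes Pearson's r for two equal-length series; the function name and the per-family values are hypothetical and serve only to show the calculation, and no significance test is included.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical per-family values: gene diversity and percentage germination.
diversity = [0.10, 0.15, 0.20, 0.25]
germination = [90.0, 80.0, 60.0, 40.0]
print(round(pearson_r(diversity, germination), 3))
```

A value close to −1 for such data would mean that the more diverse families germinate less well.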

Results

Discriminant analysis based on morphology

A multivariate analysis of variance (MANOVA) of the morphological characters resulted in highly significant values, both at the population (Wilks’ lambda = 0.02218, P < 0.0001) and at the family (Wilks’ lambda = 0.00021, P < 0.0001) level. The efficiency of each character in separating the group means for populations and families, respectively, was estimated by a univariate F-test based on Wilks’ lambda (Table 3). Leaf serration (NOFT) and plant height (PHGT) were highly efficient both at population and family levels. Spininess (SPI) and shape of leaf apex (APEX) were the least efficient characters for separating populations, as were APEX and average branch length (BLEN) for separating families.

Table 3 Test of univariate equality of group means to estimate the efficiency of different characters in separating the groups. Low values for Wilks’ lambda indicate that the within-group variability is small compared to the total variability, thus most of the total variability is attributable to differences between the means of the populations and families, respectively. All values are highly significant (P < 0.0001). Values of Wilks’ lambda corresponding to the most efficient characters are printed in bold

A discriminant analysis based on morphological characters, with population as group variable, clearly separated the C. cathayensis populations P3 and P4 from one another and from a combined cluster of P1, P2 and P5 (Fig. 1). Two extracted functions together explained 92.0% of the variation, whereas three functions explained 97.1%. A reclassification test, comparing the actual population membership with the predicted population membership based on the discriminant analysis, resulted in overall 82.5% correctly classified cases. From this analysis, the two putative hybrid populations (P2 and P5) were the least well-defined, with only 76.2 and 57.1% correctly classified cases. The corresponding values for populations P1, P3 and P4 were 84.2, 100.0 and 95.2%.
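The overall reclassification rate can be checked against the per-population values with a short calculation. The sample sizes follow from the sampling design (three families of seven seedlings per population, minus the two plants lost in P1); the counts of correctly classified genotypes are not reported in the text and are inferred here by rounding, which is an assumption.

```python
# Genotypes per population: P1 lost two plants (19); the others have 3 x 7 = 21.
n_genotypes = {"P1": 19, "P2": 21, "P3": 21, "P4": 21, "P5": 21}

# Reported percentages of correctly classified cases per population.
pct_correct = {"P1": 84.2, "P2": 76.2, "P3": 100.0, "P4": 95.2, "P5": 57.1}

# Inferred counts (assumption: nearest whole number of genotypes).
correct = {
    pop: round(n_genotypes[pop] * pct_correct[pop] / 100.0)
    for pop in n_genotypes
}

overall = 100.0 * sum(correct.values()) / sum(n_genotypes.values())
print(correct)
print(round(overall, 1))  # 82.5
```

The inferred counts sum to 85 of 103 genotypes, which reproduces the reported overall rate of 82.5%.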

Fig. 1

Two-dimensional plot of the result of a discriminant analysis based on the morphological data set with populations as groups and genotypes as cases. The different populations are represented by triangles (C. speciosa), circles (C. cathayensis) and squares (hybrids). Percentage explained variation for the extracted functions: FCN1 70.8% and FCN2 21.2%.

A discriminant analysis based on morphological characters, with family as group variable, resulted in a separation of the families in the putative hybrid populations P2 and P5 into three different groups, whereas the families within the C. cathayensis populations P3 and P4 grouped together (Fig. 2). Family P5-14 was placed rather far away from the two major groups of C. speciosa (P1) and C. cathayensis (P3 and P4), whereas families P5-9, P2-2 and P2-11 fell into a separate group closer to the C. cathayensis cluster. Two extracted functions together explained 77.9% of the variation, whereas three functions explained 86.2%. A reclassification test of the families resulted in 83.5% correctly classified cases. Misclassification for populations P3 and P4 (C. cathayensis) occurred only among families within the correct population. For population P1 (C. speciosa), only one genotype was misclassified to another population, whereas three genotypes were misclassified to another population for each of the populations P2 and P5. Discriminant analysis with family as group variable thus yielded a resolution at least as good as, or even better than, that obtained with population as group variable.

Fig. 2

Two-dimensional centroid plot of the result of a discriminant analysis based on the morphological data set with families as groups. The different populations are represented by triangles (C. speciosa), circles (C. cathayensis) and squares (hybrids). Percentage explained variation for the extracted functions: FCN1 65.9% and FCN2 12.0%.

Diagnostic RAPDs

One marker (F04.1250) occurred only in population P3 (C. cathayensis) and was therefore diagnostic for this population. Seventeen markers out of the total set of 222 were diagnostic between population P1 (C. speciosa), and populations P3 and P4 (C. cathayensis) (Table 4). Seven of these markers were present in all three families of population P1 (19 individual genotypes) and absent in all six families of populations P3 and P4 (42 individual genotypes). The reverse distribution pattern was observed for 10 other markers.
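The criterion applied here, a band present in every genotype of one group and absent from every genotype of the other, can be sketched as a simple scan over the binary marker matrix. The function name and the toy data are hypothetical; real input would be the 19 genotypes of P1 and the 42 genotypes of P3 and P4 scored for all 222 markers.

```python
def diagnostic_markers(group_a, group_b):
    """Find markers fixed in one group and absent from the other.
    Each group is a list of plants; each plant is a list of 0/1 band scores.
    Returns marker indices (0-based) for each direction."""
    n_markers = len(group_a[0])
    result = {"a_only": [], "b_only": []}
    for j in range(n_markers):
        in_a = [plant[j] for plant in group_a]
        in_b = [plant[j] for plant in group_b]
        if all(in_a) and not any(in_b):
            result["a_only"].append(j)
        elif all(in_b) and not any(in_a):
            result["b_only"].append(j)
    return result

# Toy example: two plants per group scored for four markers.
speciosa = [[1, 0, 1, 1], [1, 0, 0, 1]]
cathayensis = [[0, 1, 1, 1], [0, 1, 1, 0]]
print(diagnostic_markers(speciosa, cathayensis))
```

In the toy data only the first two markers qualify, one in each direction; the other two are polymorphic within a group or shared between groups.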

Table 4 Frequencies of diagnostic RAPD markers within families. Families are ordered from predominantly C. speciosa types to predominantly C. cathayensis types (with direction of hybridization hypothesized from cpDNA data). Apparently introgressed markers are printed in bold

Families in populations P2 and P5 (putative hybrid populations originally labelled as C. speciosa, Table 1) showed rather complicated patterns for the above-mentioned diagnostic markers (Table 4). Deviation by one or a few markers from the otherwise dominating type (either C. cathayensis or C. speciosa, as hypothesized from cpDNA data, Table 5) can be considered as an introgression of alien loci into the genetic background of one of the species. We observed several such cases: in family P2-1 (C. speciosa type but with five introgressed markers of C. cathayensis type, of which two were fixed), in family P5-8 (C. speciosa type but with seven introgressed markers of C. cathayensis type, of which two were fixed), and in families P2-2 (C. cathayensis type but with five introgressed markers of C. speciosa type, of which four were fixed) and P5-9 (C. cathayensis type but with five introgressed markers of C. speciosa type, of which four were fixed). Even more numerous cases of putative marker introgression were noticed for the two remaining families, which may, in fact, represent F1 hybridizations: family P2-11 (C. cathayensis type but with nine introgressed markers of C. speciosa type, of which six were fixed) and family P5-14 (originally considered as a C. speciosa type based on morphology and cpDNA, with 13 introgressed markers of C. cathayensis type, of which six were fixed). Coherent introgression of the same diagnostic markers was evident, even among families sampled from different geographical areas. It is remarkable that two random decamer primers (E07 and D08), out of the total set of eight primers used in the present study, were responsible for 82% of the diagnostic markers (Table 4).

Table 5 Gene diversity estimated as expected heterozygosity (Hj), and Shannon’s Index of diversity (H′j) within families in rank order as well as percentage germination and percentage of albino seedlings (SE = standard error)

Genetic relatedness among families

Whereas the described diagnostic RAPD markers are useful for differentiating accessions of C. cathayensis and C. speciosa, genetic relatedness between families in the entire set of studied populations (P1–P5) cannot be inferred from these markers alone. Therefore, the complete data set of 222 RAPD markers was used for this analysis, as well as the morphological characters. Relatedness between all 15 families from the five populations was represented in UPGMA dendrograms (Figs. 3 and 4). These dendrograms are consistent in: (i) clustering of all families from population P1 (C. speciosa) and families P2-1 and P5-8 in one group with family P5-14 more loosely associated, and (ii) clustering of families from populations P3, P4 (both C. cathayensis) and families P2-2, P2-11 and P5-9, respectively, into three different but closely positioned groups. The RAPD-based dendrogram is, however, more efficient in: (i) separation of families from population P1 and families P2-1 and P5-8 into different groups, and (ii) revealing a close relationship between populations P3 and P4.

Fig. 3

UPGMA dendrogram of a RAPD-based data set (Nei’s genetic distances) for families of Chaenomeles. The different populations are represented by triangles (C. speciosa), circles (C. cathayensis) and squares (hybrids).

Fig. 4

UPGMA dendrogram of a morphological data set (Euclidean distances) for families of Chaenomeles. The different populations are represented by triangles (C. speciosa), circles (C. cathayensis) and squares (hybrids).

The principal coordinate analyses yielded results rather similar to the UPGMA dendrograms (results not shown). There were two main groups: population P1 together with families P5-8 and P2-1, and populations P3 and P4 together with families P2-2, P2-11 and P5-9. Family P5-14 again occupied an intermediate position.

The correlation between the matrices of pairwise distances based on RAPD data (Nei’s genetic distance) and on morphological data (Euclidean distance) was r = 0.79 (P < 0.0001).

Analysis of cpDNA polymorphism

Analysis of chloroplast DNA resulted in two haplotypes, I and II, which revealed a pattern consistent with the genetic relatedness analyses (Table 5). Families P5-8, P5-14 and P2-1 had the same haplotype (II) as families from population P1 (C. speciosa) whereas families P2-2, P2-11 and P5-9 had the same haplotype (I) as families from populations P3 and P4 (C. cathayensis).

Among- and within-family variability

Although G′-statistics and Φ-statistics (based on Shannon’s Index and on AMOVA, respectively) differed considerably in absolute values, they nevertheless revealed a relatively consistent pattern (Table 6). Thus both statistical analyses found the among-families component of variance to be highest in the two putative hybrid populations P2 and P5, whereas it was lowest in population P3 (C. cathayensis). However, the ranking of populations P1 and P4 differed between the two methods.

Table 6 Partitioning of genetic variability between families within populations, estimated for the RAPD data set as G′-statistics (derived from Shannon’s Index) with standard error (SE) and Φ-statistics (AMOVA) with P-value. Values in bold refer to the largest estimates of variability

The Lynch and Milligan expected heterozygosity (Hj) and Shannon’s Index (H′j) values for within-family gene diversity were closely associated (Pearson coefficient of correlation: r=0.99, P < 0.0001). Ranking of families by their relative diversity showed that families from population P1 (C. speciosa) were the most diverse, and families from population P3 (C. cathayensis) the least diverse (Table 5). Families with chloroplast DNA haplotype II were consistently more diverse than families with haplotype I. Families P2-11 and P4-11 had haplotype I, but their diversity values were still considerably higher compared to families from population P3 (Table 5).

Rate of germination and albino seedlings

Within-family rates of seed germination (except family P5-9, for which data are missing) varied from 19 to 97% (Table 5). These rates were negatively correlated with the two estimates of within-family gene diversity (r = −0.86, P < 0.0001 for both Hj and H′j), suggesting that more homogeneous families have higher germination rates. It should also be noted that all but one of the lower germination rates were found in families with chloroplast DNA haplotype II. Furthermore, albino seedlings were very common (10–44%) in families with this haplotype, whereas they were completely absent in families with chloroplast DNA haplotype I.

Discussion

The high discriminatory power of leaf morphology characters, revealed in our study, had previously been noted in Chaenomeles by Weber (1964), who also recognized their potential in determining species hybrids. However, had we relied only on morphological characters at the population level, we would not have been able to find clear evidence for hybridization between C. cathayensis and C. speciosa. Instead, a combination of morphological studies of offspring families and molecular data has shown that two of our populations are, in all likelihood, derived from interspecific hybridization. One reason why natural hybrids have not previously been reported in Chaenomeles may be the well-known mechanism of selection against first-generation hybrids (Rieseberg et al., 1995), together with the difficulty of detecting the subtle signs of later-generation introgression using morphological characters alone.

Putatively neutral RAPDs and morphological characters, which may be subject to natural selection, revealed surprisingly consistent patterns of relatedness between families. However, it is premature to infer the possible influence of natural selection on the process of hybridization between C. cathayensis and C. speciosa because the plant sampling was not designed specifically for this purpose.

Some introgressed diagnostic markers revealed a highly correlated pattern of fixation and polymorphism, irrespective of the putative direction of gene flow or the geographical origin of the analysed hybrid families. This may indicate either tight genetic linkage between different sets of the diagnostic RAPDs, or selection for coadapted gene complexes in genetic backgrounds of both species. The latter mechanism has been suggested by Rieseberg et al. (1996a) in their study of interspecific hybrids in Helianthus. The breeding system can also be an important factor in determining whether markers remain correlated in hybrid zones (Rieseberg & Ellstrand, 1993). However, at present we do not have any experimental data with which to determine the breeding system of our plant material.

Unilateral (directional) introgression of both nuclear and chloroplast DNA is a quite frequent and well-known phenomenon (reviewed by Rieseberg et al., 1996b). This kind of introgression does not, however, seem to prevail in the present study. Both chloroplast DNA haplotypes, otherwise diagnostic for the two studied species, were found within the presumed hybrid populations P2 and P5. We also found that the RAPD markers were associated with the corresponding chloroplast DNA haplotype, so that there were no cases where the chloroplast genome of one species was combined with nuclear DNA markers of mainly the other species. Only two cases (P5-14 and P2-11) represented intermediate nuclear genotypes.

Our results show a strong negative correlation between percentage of seed germination and gene diversities within families: the more diverse the family, the lower the percentage seed germination. It is remarkable that family P2-11, which is the most heterogeneous of the families with a C. cathayensis cpDNA haplotype, at the same time had the lowest seed germination among all families studied. Moreover, we found an absolute concordance between the presence of (inviable) albinos and the cpDNA haplotype. Albino plants were thus always present in families with the C. speciosa haplotype and never present in families with the C. cathayensis haplotype. The chloroplast genome in Chaenomeles is probably uniparental in origin and maternally inherited, as in most other angiosperms. Consequently, an active selection/linkage mechanism of postzygotic isolation between C. speciosa and C. cathayensis is indicated.

The distribution of molecular markers and the patterns of genetic relatedness between families indicate that at least four of the six hybrid families included in our study represent later-generation backcrosses. Similarly, Rieseberg & Ellstrand (1993) emphasize in their review on interspecific hybridization in plants, that hybrid zones are seldom limited solely to F1s. Furthermore, substantial evidence that chromosomal and genic sterility barriers are responsible for strong selection against F1 hybrids has been presented (reviewed by Rieseberg & Carney, 1998).

At this stage we cannot distinguish between the genetic and ecological factors that may have created the extant pattern of introgression in the sampled populations. Because C. speciosa is a traditional medicinal plant, and both species are cultivated to a minor extent, there is a possibility that man has influenced the natural pattern of gene flow. However, the selection/linkage hypothesis (Rieseberg et al., 1996b) seems to be the most plausible explanation for our findings. Experimental crosses using plants from the families P2-11 and P5-14, which appear to be the most intermediate between C. cathayensis and C. speciosa, as one parent, and plants from the other families as the other parent, could provide estimates of genetic linkages between scored RAPD markers. Comparison of linkage maps of different species and interspecific hybrids could be useful in studying genetic mechanisms involved in the putative selection against interspecific hybrids (Rieseberg et al., 1995). Such analyses could also provide additional information about mechanisms of speciation via interspecific hybridization and introgression.