Introduction

Recent advances in molecular genetic techniques have provided novel experimental tools to address long-standing problems in evolutionary biology. Insights into the genetic basis of phenotypic variation, adaptation and reproductive isolation have been generated by the mapping of quantitative trait loci (Bernacchi and Tanksley, 1997; Bradshaw Jr et al, 1998; Hawthorne and Via, 2001). These data in turn have prompted a series of theoretical investigations into the distribution of allelic effects at genes that are responsible for adaptation and speciation (Orr, 1995, 1998; Turelli and Orr, 2000). As the list of species for which QTL data are available continues to be extended beyond the classical subjects of genetic research, these theoretical models can be put to increasingly thorough tests.

Hybrid zones are ideally suited for the genetic dissection of phenotypic traits. They involve closely related taxa that have diverged by natural selection, and so can inform us directly about the role of genes of major effect in species divergence. Moreover, the expression, the relevance to fitness and the genetic basis of a given phenotypic trait can be jointly analysed in natural hybrid populations. Controlled laboratory crosses can thus be complemented by QTL mapping in the field. Finally, hybrid zones often harbour a wide spectrum of recombinants. Highly recombined genomes increase the resolution of QTL mapping, and studies of differential marker introgression benefit from the cumulative effect of many generations of selection (Rieseberg et al, 2000). For example, repeatable patterns of suppressed introgression imply selection against particular chromosomal segments in Helianthus hybrids (Rieseberg et al, 1999). Similarly, markers on sex chromosomes showed reduced rates of introgression compared to those on autosomes in the Danish mouse hybrid zone (Dod et al, 1993). Excepting these two cases, the lack of genetic linkage maps for hybridising taxa has so far precluded the full use of hybrid zones for the genetic analysis of adaptation and divergence (Rieseberg and Buerkle, 2002).

The fire-bellied toads Bombina bombina and B. variegata represent a pair of highly diverged species that still produce abundant fertile hybrids wherever their distribution ranges adjoin. The maintenance of species differences across narrow clines (Szymura and Barton, 1986, 1991; MacCallum et al, 1998) shows that natural selection is keeping these two taxa from merging into one. Divergence over the course of about four million years (Szymura, 1993) has resulted in genetic incompatibilities, which reduce the fitness of hybrids (Kruuk et al, 1999). The two taxa are found in different kinds of breeding habitat: B. bombina reproduces in semipermanent ponds, whereas B. variegata is a typical puddle breeder. Differences in mating system, egg size, a whole suite of tadpole traits and body proportions in adults are presumably adaptations to these habitats (taxon differences summarised by Szymura, 1993; see also Kruuk and Gilchrist, 1997; Vorndran et al, 2002). Moreover, in places where ponds and puddles occur in a small-scale mosaic, even hybrid adults show a preference for one or the other habitat (MacCallum et al, 1998). The abundance of divergent phenotypic traits makes this species pair a particularly good candidate for the development of a genetic linkage map as a first step towards a QTL analysis of adaptation and incipient reproductive isolation.

The nuclear genome of the genus Bombina (sensu stricto) consists of 12 chromosome pairs, all of which are metacentric or submetacentric (Morescalchi, 1965). In metaphase preparations, no gross differences in chromosomal structure are apparent between B. bombina and B. variegata, although the B. variegata genome is about 12% larger than that of B. bombina (21.1 and 18.8 pg, respectively, Olmo et al, 1982). With an average of about 10¹⁰ bp, Bombina genome size is among the largest of 228 species of Anura and greatly exceeds those estimated for other members of the family Discoglossidae (see Duellman and Trueb, 1994). Only two other frogs (Hyla versicolor and Eleutherodactylus binotatus) are known to have genomes of comparable size (Olmo et al, 1982); they are a known and a presumed polyploid, respectively. Finally, observations on chiasma frequencies in B. variegata suggest a total map length of 22.56 Morgans in males and 30 Morgans in females (Morescalchi, 1965).

In our development of a Bombina linkage map, we have so far concentrated on codominant markers, mainly microsatellites, SSCPs and allozymes. We present here data on the first 38 mapped markers, 28 of which are codominant. Moreover, among the 40 originally isolated CA microsatellites, we noted an unusual repeat motif structure. Observations of this kind provide clues about the mutational processes that underlie the origin and expansion of microsatellites. We therefore include a brief summary of these findings below.

Materials and methods

Molecular markers

DNA extraction

For the construction of a genomic library, high molecular weight DNA was extracted from muscle tissue of adults following the protocol in Sambrook et al (1989). Routine DNA extraction of ethanol-preserved tissue for genotyping was carried out either from single toes of adults or from tadpole tissue. In these cases, tissue samples were digested overnight with Proteinase K (final concentration: 100 μg/ml) at 55°C in 0.25 ml TNES buffer (0.05 M Tris, 0.4 M NaCl, 0.1 M EDTA, 0.5% SDS). Following the addition of 0.25 ml 2.6 M NaCl, the samples were shaken vigorously for 15 s and then centrifuged for 10 min. The supernatant was extracted once with chloroform. The DNA was then ethanol precipitated, washed once in 70% ethanol, dried and resuspended in 50–100 μl of ultrapure water (Merck). Stock solutions were stored at −20°C.

Library construction

Genomic DNA of single individuals was digested with Sau 3A and size-fractionated on low melting point agarose gels. Fragments between 250 and 500 bp and, following the acquisition of an automatic sequencer, between 700 and 1200 bp, were used. The excised, purified digestion products were ligated into the Bam HI site of dephosphorylated plasmid vectors (either pUC18 or pGEM, Promega), and the ligation products were used to transform competent Escherichia coli (strain JM109) by electroporation. Blue-white screening indicated ligation efficiencies around 80%.

Microsatellite screen

Aliquots of the libraries were plated onto 35 large (22 × 22 cm) selective agar plates, yielding a screen of 2.1 × 10⁵ recombinant colonies. Colonies were lifted onto Hybond N (Amersham Biosciences) nylon membranes, and these were probed with a ³²P-end-labelled (CA)15 oligo. Three of the 35 plates were also probed with a mixture of four trinucleotide repeats (GTC, CGA, TCC and CCA). All clones that gave a signal on this first screen were streaked out in replicate onto fresh agar plates and rescreened as before. Based on this secondary screen, 49 putative microsatellites were sequenced in both directions. Primers were designed with PRIMER v0.5 (Lincoln et al, 1991).

SSCP primers

Random clones over 400 bp in length were sequenced, and primers were designed to give PCR fragments between 200 and 340 bp (two exceptions: 168 and 157 bp).

PCR optimisation

PCR reactions were set up in a total volume of 30 μl with 50–100 ng template DNA, 50 mM KCl, 10 mM Tris (pH 9.0 at RT), dNTPs (0.2 mM per nucleotide), 5–10 pmol of each primer and 0.5 units Taq polymerase (rTaq, Amersham Biosciences). Amplification was carried out with oil overlay on a Hybaid Touchdown thermocycler. After initial denaturation for 3 min at 94°C, the basic cycling profile was as follows: 15 s at 94°C, 30 s at 55°C and 30 s at 72°C for 32–35 cycles. For long products (800–1000 bp), the extension time was 1 min. In order to obtain reliable amplification of unique products in both taxa, we varied as required the annealing temperature (52°–66°C), the MgCl2 concentration (1.0–3.0 mM) and the primer amount (5–10 pmol per primer). Additives such as tetramethylammonium chloride, glycerol and formamide were also experimented with for loci that did not amplify well. Amplification in heterozygotes was tested by using 1:1 mixtures of B. bombina and B. variegata DNA as template. Up to three different primer pairs (including all of their permutations) were tried out per locus.

Electrophoresis of microsatellites

Microsatellites were separated after amplification with Cy5 fluorescently labelled primers on an AlfExpress automatic sequencer (Amersham Biosciences) using 0.5 mm thick denaturing gels (LongRanger™ gel solution, BMA).

SSCP development

The search for SSCPs was carried out using native 0.5 mm thick horizontal polyacrylamide gels (acryl:bis=37.5:1, BioRad) in coolable MultiPhor (Amersham Biosciences) electrophoresis chambers. Buffer strips (SSCP version, ETC GmbH, Kirchentellinsfurt, Germany) were soaked in 2 × TBE running buffer and placed on either end of the gel. As gel buffer, we used 0.1 M Tris adjusted to the desired pH with acetic acid. The following parameters were varied in order to optimise the resolution of allelic differences: gel buffer pH (6.8–7.5), running temperature (2–18°C), and gel strength (8–13% acrylamide). Bands were visualised by silver staining.

Allozymes

Homogenates of tadpole tails were prepared by grinding the tissue in distilled water. Clear extracts were subjected to starch electrophoresis (Szymura, 1995). The allozymes encoded by the following loci were resolved in Tris-citrate buffer, pH=6.0: lactate dehydrogenase (Ldh-1), malate dehydrogenase (Mdh-1), isocitrate dehydrogenase (Idh-1), creatine kinase (Ck), adenylate kinase (Ak), glucose-6-phosphate isomerase (Gpi) and nucleoside phosphorylase (Np). Two further allozymes, aspartate aminotransferase (Aat-1) and glucose-6-phosphate dehydrogenase (G6pd), were studied in lithium buffer, pH=7.2. Staining methods were from Shaw and Prasad (1970). One linkage group (Ak and G6pd) among these loci has already been reported (Hofman and Szymura, 2000).

F2 cross

F1 hybrids were obtained by mating a female B. bombina from Kielce (Poland) to a male B. variegata from Zürich (Switzerland) in 1993. The hybrids reached sexual maturity after 2 years. We obtained F2 progeny from one F1 pair. Tadpoles were fed a pulp of boiled nettle and dandelion leaves. After a month, they were killed with an overdose of aminobenzoic acid ethyl ester (MS 222). The tail was frozen for use in the allozyme studies. After removal of the gut, the body was preserved for DNA extractions in 70% ethanol, which was changed the next day. In all, 96 offspring of this F2 cross were used to establish the linkage map.

Mapping

Mapping was carried out with Mapmaker/Exp v3.0b (Lincoln et al, 1993) using the ‘error detection’ option. Suspicious scores that had been identified by the ‘genotype’ command were reanalysed and corrected as required. Linkages with a LOD score of 3.0 or greater were considered significant.

Results

Marker development

Among the 49 putative microsatellite clones, there were 41 (=84%) true positives (repeat number ≥3) that contained either a CA (40 cases) or a CGA (1 case) repeat pattern. Primers were designed for the 23 clones with sufficient and suitable flanking sequences. Six microsatellites gave reliable amplification in both taxa and yielded codominant markers. Another two microsatellites that amplified only in the cloning taxon were mapped as dominant markers. This low yield was due in part to mildly repetitive flanking regions that caused problems with primer design and, presumably, cross-species amplification (see Discussion).

Primers for SSCPs were designed from 22 clones. Four of these primer pairs gave single clean PCR products for which allelic differences could be resolved. Most other primer pairs produced multiple PCR products either of very similar or of widely different sizes. In order to identify alleles of the same locus in complex SSCP banding patterns, one of the following approaches was used. A distinct pair of SSCP bands with very similar electrophoretic mobility that produced a Mendelian ratio of segregants in the F2 cross was treated as a single locus. Alternatively, bands of the same mobility in both taxa on denaturing gels were excised, reamplified and then separated on SSCP gels. Given diagnostic differences and normal segregation, the fragments were treated as allelic to each other. Finally, for a number of primer pairs, individual PCR products were cloned, sequenced and subjected to phylogenetic analysis in order to identify orthologous sequence pairs in B. bombina and B. variegata for which new, more specific primers could then be designed. In total, we developed 11 codominant and eight dominant SSCP markers.

Segregation ratios

In the case of codominant markers, we tested whether the frequencies of B. variegata alleles and of heterozygotes agreed with Mendelian expectations. For the dominant markers, the frequency of bands was considered. For each variable, we compared the observed variance across loci with that obtained from simulated Mendelian segregations of unlinked loci using the actual sample sizes per locus. Note that this is a conservative test, as linkage will increase the observed variance. Highly significant deviations for the codominant loci were entirely due to Mdh-1. Once it was removed from the data set, the variances were in line with Mendelian expectations (frequency of B. variegata alleles: P=0.106; frequency of heterozygotes: P=0.816). The same was true for the frequency of bands for the dominant loci (P=0.638). Finally, there were no deviations in any of these measures from the expected means.
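The variance comparison described above can be illustrated with a minimal Monte Carlo sketch. It is not the authors' program: the function names, the number of simulations and the one-sided definition of the P-value (the fraction of simulated data sets whose variance across loci is at least as large as the observed one) are assumptions made for illustration, and only codominant loci with 1:2:1 segregation are covered.

```python
import random
from statistics import variance


def simulate_locus(n_offspring, rng):
    """Simulate one unlinked codominant locus in an F2 family.

    Each F1 parent transmits the B. variegata allele with probability 0.5.
    Returns (frequency of B. variegata alleles, frequency of heterozygotes).
    """
    v_alleles = 0
    heterozygotes = 0
    for _ in range(n_offspring):
        from_mother = rng.random() < 0.5  # True = variegata allele transmitted
        from_father = rng.random() < 0.5
        v_alleles += from_mother + from_father
        heterozygotes += (from_mother != from_father)
    return v_alleles / (2 * n_offspring), heterozygotes / n_offspring


def variance_test(observed, sample_sizes, statistic, n_sim=2000, seed=1):
    """Monte Carlo P-value for the variance of a statistic across loci.

    observed     -- observed per-locus values (at least two loci)
    sample_sizes -- number of scored offspring per locus
    statistic    -- "allele" (variegata allele frequency) or "het"
    """
    index = {"allele": 0, "het": 1}[statistic]
    rng = random.Random(seed)
    observed_var = variance(observed)
    as_extreme = 0
    for _ in range(n_sim):
        simulated = [simulate_locus(n, rng)[index] for n in sample_sizes]
        if variance(simulated) >= observed_var:
            as_extreme += 1
    return as_extreme / n_sim
```

A call such as `variance_test(allele_freqs, sizes, "allele")`, with `allele_freqs` and `sizes` holding hypothetical per-locus values, returns the estimated P-value. A dominant-marker version would simulate band presence with probability 3/4 instead.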

Mapping

The Bombina linkage map currently includes 28 codominant markers (six microsatellites, 11 SSCPs, two simple length polymorphisms and nine allozymes) and 10 dominant markers (microsatellites and SSCPs, Figure 1). These fall into 20 linkage groups comprising one (11 cases), two (3), three (3), four (1) or five (1) markers. LOD scores greater than 4.0 supported the addition of any one marker to a linkage group (one exception: Bb3.34, LOD score=3.51). The original cloned sequences (including the primer positions) of all mapped molecular markers have been deposited in GenBank (Accession numbers AF472421-AF472442).

Figure 1

The Bombina linkage map. The loci in the column on the left are unlinked. Codominant markers are indicated by an asterisk. Distances were computed with the Kosambi mapping function and are listed in cM to the left of each linkage group. The order of loci within a linkage group represents the maximum likelihood estimate. Dotted vertical lines indicate locus groups whose permutations give orders within 2 logL units of the maximum. In a separate study (Szymura, 1995), one additional locus, liver esterase β (Est-β), was found to be within 1 cM of Gpi.
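For readers unfamiliar with the Kosambi mapping function mentioned in the caption, the standard conversion between a recombination fraction r and a map distance is sketched here. The function names are illustrative; in the study itself the distances were computed by Mapmaker.

```python
import math


def kosambi_cM(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_recombination(d_cM):
    """Inverse function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cM / 100.0)
```

For small r the distance in cM is close to 100r, whereas it grows without bound as r approaches 0.5.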

Sequence divergence among multiple PCR products

For eight primer pairs, PCR products were cloned and sequenced from the grandparents of the F2 cross and from several individuals of pure populations near the Pešćenica transect in Croatia (MacCallum et al, 1998). In all cases, multiple products of a given primer pair showed clear signs of homology (Table 1), and most of them could be aligned over their entire length. Exceptions were the microsatellite locus Bv41.11, at which an allele with a 556 bp nonrepetitive insert segregates in B. variegata, and primer pair Bb42.22, which yields B. bombina and B. variegata amplification products that align only for a 186 bp stretch (78 and 32% of their total length, respectively) and are not allelic to each other. In homologous stretches, the minimum divergence per locus between any B. bombina and B. variegata fragment pair ranged from 0 to 8%. SSCP polymorphisms differed by as few as two substitutions from each other.

Table 1 Sequence divergence among multiple PCR products of a given primer pair based on the substitution model of Hasegawa et al (1985)

Microsatellite repeat motifs

Of the 40 isolated CA microsatellites, 19 were compound and typically had a large total number of repeats (Figure 2). Among the compound loci, 17 featured combinations of CA and TA repeats only and showed perfect dinucleotide periodicity (eg (AT)n(AC)m rather than (AT)n(CA)m, Bull et al, 1999). Variant interspersed dinucleotides (eg a CT embedded in a run of (CA)n) that are indicative of point mutations were frequently observed, but single base pair indels were seen only twice. The TA and CA repeats typically formed interleaved patterns. Even pure CA microsatellites showed an association with TA dinucleotides that tended to precede them immediately at the 5′ end. Overall, TA motifs occurred at the 5′ end of microsatellites in 21 cases. Given the mean frequency of adenine and thymine residues across all Bombina microsatellite flanking regions, this pattern is highly unlikely under a Poisson model of dinucleotide occurrence (P=0.001).

Figure 2

The distribution of the total repeat number for Bombina microsatellites. Black bars represent perfect or imperfect motifs, whereas white bars indicate compound motifs.

Discussion

With this study, we have laid the foundation for a linkage map that is intended to become a tool for the genetic analysis of ecological adaptations and reproductive isolation in the Bombina hybrid zone. In order to get an approximate estimate of the expected coverage of the genome that this map provides, we carried out simulations in which 38 markers were randomly placed onto the 12 Bombina chromosomes. Averaged over males and females, the mean total map length is 26.3 Morgans (Morescalchi, 1965), which we distributed over the chromosomes based on their relative lengths in metaphase preparations of B. variegata (Figure 4d of Schmid et al, 1987). The simulations indicate that on average 35.1% of the genome should be within 20 cM and 45.5% within 30 cM of a marker. We note that the observed number of linkage groups with 1, 2,…, 5 markers, respectively, agrees very well with the simulated expectation of random placement.
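A coverage simulation of this general kind might be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the relative chromosome lengths from Schmid et al (1987) are not reproduced here, so the example call uses 12 chromosomes of equal length (a placeholder), and coverage is defined as the fraction of the map lying within a given distance of at least one marker on the same chromosome. It should therefore not be expected to reproduce the percentages reported above.

```python
import random


def coverage(chrom_lengths, n_markers, window, rng):
    """Fraction of the map within `window` cM of a randomly placed marker."""
    positions = [[] for _ in chrom_lengths]
    for _ in range(n_markers):
        # choose a chromosome in proportion to its length, then a position on it
        chrom = rng.choices(range(len(chrom_lengths)), weights=chrom_lengths)[0]
        positions[chrom].append(rng.uniform(0.0, chrom_lengths[chrom]))
    covered = 0.0
    for length, pos in zip(chrom_lengths, positions):
        right_edge = 0.0  # right end of the region already counted
        for p in sorted(pos):
            lo = max(p - window, right_edge)
            hi = min(p + window, length)
            if hi > lo:
                covered += hi - lo
                right_edge = hi
    return covered / sum(chrom_lengths)


def mean_coverage(chrom_lengths, n_markers, window, n_sim=1000, seed=1):
    """Average coverage over n_sim random placements."""
    rng = random.Random(seed)
    total = sum(coverage(chrom_lengths, n_markers, window, rng)
                for _ in range(n_sim))
    return total / n_sim


# Placeholder: 2630 cM split equally over 12 chromosomes (an assumption).
equal_chromosomes = [2630.0 / 12] * 12
```

Calling `mean_coverage(equal_chromosomes, 38, 20.0)` gives the expected coverage at 20 cM under these placeholder lengths. Substituting measured relative lengths only requires changing the list.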

Microsatellites appear to be rare in the Bombina genome. We screened approximately 2.1 × 10⁵ recombinant plasmids with an average insert size of 300 bp. Our 40 microsatellites thus correspond to a density of approximately one CA microsatellite every 1500 kb. This yield is unlikely to be a consequence of our screening protocol. By the same method and with a fraction of the effort, microsatellites were efficiently isolated in meerkats (Griffin et al, 2001) and corals (Maier et al, 2001). Moreover, the Bombina screen produced a large number of small CA repeats, and positive controls were reliably detected. It appears therefore that the density of CA microsatellites in Bombina is an order of magnitude lower than in birds (one per 136 kb, Primmer et al, 1997), which in turn have a much lower density than, eg humans (1 per 30 kb, Beckmann and Weber, 1992). Our observation is in line with a recent survey of (CA)n repeats in vertebrates (Neff and Gross, 2001), which demonstrated a decline in microsatellite density with increasing genome size.
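The density estimate is a simple back-of-the-envelope calculation, sketched here with the figures given in the text. The function name is illustrative, and the 300 bp mean insert size is itself an approximation, so the result should be read as an order-of-magnitude figure only.

```python
def kb_per_microsatellite(n_colonies, mean_insert_bp, n_loci):
    """Kilobases of screened DNA per isolated microsatellite locus."""
    screened_kb = n_colonies * mean_insert_bp / 1000.0
    return screened_kb / n_loci


# Figures from the text: 2.1 x 10^5 colonies, 300 bp inserts, 40 CA loci.
bombina_kb = kb_per_microsatellite(2.1e5, 300, 40)

# Ratio to the bird estimate of one locus per 136 kb quoted in the text.
ratio_to_birds = bombina_kb / 136.0
```

With these inputs about 63 000 kb of DNA were screened, which is of the order of 1.5 Mb per locus and roughly ten times the interval reported for birds.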

The flanking regions of a number of isolated microsatellites in Bombina appeared to be mildly repetitive, which precluded the design of primers. In other cases, the locus could only be amplified in the cloning taxon, which could likewise result from rapid divergence in repetitive flanking sequence. We therefore quantified the amount of so-called cryptic simplicity (Tautz et al, 1986) with the Simple34 algorithm (Alba et al, 2002). A score that represents a weighted sum of occurrences of tri- and tetranucleotide motifs was computed, and its random distribution was simulated based on the observed dinucleotide frequencies in the sequence. Relative simplicity, that is the ratio of the observed score over the simulated mean, was significantly greater than one in flanking regions within 50, 100 and 150 bp around Bombina microsatellites (Table 2). A comparison between those Bombina microsatellites that amplify at least in the source taxon (‘usable’ in Table 2) and those that have been published for other anurans suggests that Bombina microsatellites are embedded in particularly extensive tracts of cryptic simplicity.

Table 2 Relative cryptic simplicity around microsatellites in four anuran taxa (see text for explanation)

Compound microsatellites have been found in many taxa (Weber, 1990; Estoup et al, 1995), but rarely with such a high frequency as in Bombina. Their origin is usually attributed to point mutations that disrupt previously perfect repeats and then expand into repeated arrays themselves (Levinson and Gutman, 1987). Indeed, several phylogenetic analyses of microsatellite evolution suggest that some duplication mechanism must be operating even when the repeat number is as low as one or two (Primmer and Ellegren, 1998; Zhu et al, 2000; see also Rose and Falush, 1998). The observed interleaved patterns can arise when a mutant motif becomes part of a large loop in the elongating strand. DNA synthesis may then traverse this motif in the template strand for a second time and thus insert another copy in the newly synthesised molecule (Petes et al, 1997). Relative efficiencies of the enzymes involved (eg those that specialise in excising large loops) may be responsible for the pattern seen in Bombina.

Finally, the predominance of TA/CA compound microsatellites in Bombina is striking. Studies in other taxa have similarly shown that the representation of motifs in compound repeats deviates strongly from random expectation (Bull et al, 1999; Kruglyak et al, 2000). These patterns point to as yet unknown mutational processes acting on presumably neutral dinucleotide repeats. The same is true for the peculiar and taxon-specific polarity. In Bombina, most TA–CA microsatellites have a TA motif at the 5′ end, whereas the opposite is true for yeast (Kruglyak et al, 2000) and no trend is seen in humans (Bull et al, 1999).

SSCPs from randomly cloned DNA fragments should in principle provide a limitless supply of diagnostic single-nucleotide polymorphisms, as is seen in our comparison of homologous sequences from the two taxa. All but one of these loci also amplified equally well in both taxa. However, most of these sequences exist in multiple copies that are sufficiently recent that a pair of specific primers (20–25 bp in length) amplifies several to many of them, and allele identification becomes impossible. Possibly, these duplications are part of the process that caused the unusually large genome size in Bombina. DNA–DNA hybridisation analyses across three genera of Discoglossidae showed that the large genome size in Bombina is mostly due to a large amount of middle-repetitive DNA, whereas highly repetitive DNA was equally abundant in all four genomes studied (Olmo et al, 1982). These changes in the relative proportions of the different DNA fractions and the absence of chains of bivalents in meiotic metaphase preparations argue against polyploidy in Bombina (Olmo et al, 1982).

Not unexpectedly (Lynch, 2002), duplications have often gone hand in hand with relocations. With the exception of one triple (Figure 1: Bb1.20/1,3,5), mapped DNA fragments that amplify with the same primer pair are not linked to each other (Figure 1: Bb31.2a-c, Bb1.1a-b, Bb1.20/2,4). In the case of coding DNA, the combination of duplication and either subfunction divergence or alternate gene silencing might be responsible for some degree of genetic incompatibility in the hybrids (Lynch and Force, 2000).

Clearly, randomly cloned and presumably noncoding DNA is not an efficient source of codominant markers in Bombina. AFLPs could provide dominant markers as long as the fraction of those diagnostic bands that stem from multiple recent and dispersed copies of a given fragment in the source taxon is low. Another promising alternative should be the identification of intron polymorphisms through ‘exon-primed intron-crossing (EPIC)’ PCR. Conserved exon sequences for primers could be identified from Xenopus/Homo alignments of sequenced genes. In this way, we intend to develop the linkage map into an increasingly useful tool to analyse the genetic basis of speciation and adaptation in the natural laboratory of the Bombina hybrid zone.