Introduction

Amplified fragment length polymorphic (AFLP®) markers have been used to construct phylogenetic trees of cattle (Ajmone-Marsan et al, 1997; Buntjer et al, 2002), pig (SanCristobal et al, 2002) and goat (Ajmone-Marsan et al, 2001) breeds. Rather than being limited to between-breed studies, AFLP markers can provide information at the within-breed level. Óvilo et al (2000) used AFLP markers to provide information on the genetic relationships between individuals in a highly inbred strain of pigs. However, there have been no studies on the consequences of artificial selection, as opposed to natural selection or genetic drift, on AFLP marker frequencies or on the efficiency of AFLP markers to discriminate between genotypes or selection lines derived from one base population.

The Edinburgh lean growth selection experiment was established to determine the consequences of different selection strategies on the efficiency of lean meat production in pigs (Cameron, 1994). Divergent selection over several generations has resulted in correlated responses in growth, carcass and reproduction traits (eg Cameron et al, 1999, 2002). Given the assumption that trait variation is due to the action of a finite number of genes with variable effect, then responses to selection will be due to changes in allele frequencies of genes or more broadly quantitative trait loci (QTL) controlling traits in the selection criteria and/or the selection objectives. Allele frequencies of AFLP markers positioned within or flanking genes will also be expected to change within each selection line, if the AFLP markers are contained in a window of linkage disequilibrium with selected genes over the period of selection.

The objective of the current study was to determine if AFLP markers could discriminate between divergently selected lines of pigs, with the selection lines derived from one base population.

Materials and methods

Animals

Details on the establishment of the Large White pig population and divergent selection for daily food intake (DFI), lean food conversion (LFC) and lean growth rate on ad libitum (LGA) or restricted (LGS) feeding regimes were given by Cameron (1994). With selection on LFC, LGA and LGS, the selection objectives were to achieve equal correlated responses, measured in phenotypic SD (σ), in carcass lean content and food conversion ratio (LFC) or growth rate (LGA and LGS). For example, the selection criterion for LGA was designed to achieve responses in carcass lean content and growth rate of ασLEAN and ασGROWTH, respectively, for a constant α. Each selection line consisted of progeny from 10 sires with two dams mated to each sire and the selection lines were derived from the one base population. A control line provided genetic links between the four pairs of high and low selection lines. Each year, animals from each pair of high and low DFI, LFC, LGA and LGS selection lines were born during a 6-week period, 3 months apart due to the farm-management system. The generation interval was equal to 12 months as females were culled after weaning their first litter and males were culled after their first mating. The control line consisted of four subsidiary control lines that were genetically linked by mating young males from one control line to first-parity females in the following control line (eg LGA control line males mated to LGS control line first-parity females). The subsidiary control lines were the same size as the selection lines.

Animals from generation 12 of the selection experiment were included in the study. Divergent selection was practiced for seven generations, but subsequently, during a series of genotype with nutrition interaction studies, animals were selected randomly within-litter for mating to maintain the selection lines. The experimental design required three or four full-sibs per sire in each selection line for AFLP marker genotyping. The combination of a short time-period for blood sampling and the fact that animals from the DFI, LFC, LGA and LGS selection lines were born at different times of the year prevented all animals from being blood sampled at the same age. The numbers of sires and dams in each selection line are given in Table 1. There were fewer litters in the high LFC selection line than planned, due to the poor reproductive performance of the line. In contrast, only postmated animals were available in the LGA selection lines, which reduced the number of full-sibs per litter.

Table 1 Distribution of sires, dams and animals by selection line

AFLP markers

AFLP marker genotypes were obtained using 10 primer combinations of EcoRI and TaqI restriction enzymes. Of the 1102 observed bands, of length 50–550 bp, 239 were polymorphic AFLP markers (Table 2). Procedures for DNA extraction, AFLP marker analysis, detection and scoring were also described by Ajmone-Marsan et al (1997). The proprietary AFLP marker analysis software (Keygene NV) allowed alignment of multiple gels based on the use of two lanes, containing a size ladder and a standard sample, that were included on every gel-half. Gels were aligned using the two lanes to enable measurement of the band intensities of identical AFLP markers on different gels. AFLP markers were considered to be dominant and were scored on the basis of presence or absence of a band, with an AFLP marker score indicating the proportion of animals with the band absent.

Table 2 Distribution of the 239 polymorphic AFLP markers by primer combination

Statistical analysis

Within-selection line diversity

For each AFLP marker, the frequency of the ‘absence’ allele (qi, i=1, …, 239) was calculated as the square root of the proportion of animals with the AFLP marker scored as being absent. The frequency of the ‘presence’ allele, pi, was equated to 1−qi. For line j (j=1, …, 9), expected heterozygosity, Hetj, was calculated, assuming Hardy–Weinberg equilibrium, as

$$\mathrm{Het}_j = \frac{1}{N}\sum_{i=1}^{N} 2\,p_{ij}\,q_{ij}$$

where pij and qij were the frequencies of the ‘presence’ and ‘absence’ alleles of AFLP marker i in line j for N (=239) AFLP markers.
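The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that expected heterozygosity is taken as the average of 2pq over the markers; the function names and the example counts are hypothetical and are not taken from the study.

```python
import math

def absence_allele_freq(n_absent, n_animals):
    """Frequency q of the 'absence' allele for a dominant marker.

    Under Hardy-Weinberg equilibrium only band-absent animals are
    homozygous for the 'absence' allele, so q = sqrt(proportion absent).
    """
    return math.sqrt(n_absent / n_animals)

def expected_heterozygosity(absent_counts, n_animals):
    """Average of 2*p*q over all markers scored in one line (assumed form)."""
    total = 0.0
    for n_absent in absent_counts:
        q = absence_allele_freq(n_absent, n_animals)
        p = 1.0 - q
        total += 2.0 * p * q
    return total / len(absent_counts)

# Hypothetical numbers of band-absent animals for four markers
# scored on 30 animals of one line.
example_counts = [0, 3, 12, 30]
example_het = expected_heterozygosity(example_counts, 30)
```

Markers for which the band is absent in every animal, or present in every animal, contribute zero to the average, so a line with many fixed markers has low expected heterozygosity.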

Between-selection line diversity

Genetic distances between selection lines j and k (j, k=1, …, 9) were calculated using the method of Reynolds et al (1983) equal to −log(1−θ), with θ equal to the coancestry coefficient, where

The above formula includes the correction of Weir (1996) to the Reynolds et al (1983) formula. A graphical representation of the genetic distance matrix was determined using the multidimensional scaling (MDS) algorithm of Genstat Committee (1997).
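As a rough illustration only, the sketch below computes a Reynolds-type distance between two lines from ‘presence’ allele frequencies. It assumes the widely quoted large-sample form of the coancestry estimator for two-allele loci, without the Weir (1996) correction, and it assumes natural logarithms, so it is not the exact estimator used in the study. The function name and the example frequencies are hypothetical.

```python
import math

def reynolds_distance(p_j, p_k):
    """Reynolds-type distance -log(1 - theta) between lines j and k.

    p_j, p_k: 'presence' allele frequencies per marker in each line.
    theta is estimated (assumed, uncorrected form) as
        sum_i (p_ij - p_ik)^2 / sum_i (1 - p_ij*p_ik - q_ij*q_ik).
    """
    numerator = 0.0
    denominator = 0.0
    for pj, pk in zip(p_j, p_k):
        qj, qk = 1.0 - pj, 1.0 - pk
        numerator += (pj - pk) ** 2
        denominator += 1.0 - pj * pk - qj * qk
    if denominator == 0.0:
        # Every marker is fixed for the same allele in both lines.
        raise ValueError("no informative markers")
    theta = numerator / denominator
    if theta >= 1.0:
        # Lines fixed for opposite alleles at every marker.
        return math.inf
    return -math.log(1.0 - theta)

# Hypothetical allele frequencies for three markers in two lines.
example_distance = reynolds_distance([0.8, 0.5, 0.1], [0.2, 0.5, 0.4])
```

The distance is zero for lines with identical allele frequencies and grows as frequencies diverge, which is the property that the multidimensional scaling step relies on.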

Discrimination between-selection lines using AFLP marker scores

Linear combinations of AFLP marker scores that maximized the ratio of between-line variation to within-line variation corresponded to eigenvectors from a canonical variate analysis, using the CVA algorithm of Genstat Committee (1997). For each animal, two linear functions of AFLP marker scores, S1 and S2, were obtained from the product of AFLP marker scores and eigenvectors associated with the two largest eigenvalues. A plot of S2 on S1 was examined to determine if the linear functions of AFLP marker scores enabled between-animal discrimination on the basis of selection line.
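The idea behind the canonical variate analysis can be illustrated with a deliberately small sketch. It assumes only two markers, so that the eigenproblem for W⁻¹B (within-line and between-line sums of squares and products) can be solved in closed form. It does not reproduce the Genstat CVA algorithm, and the function name, the 0/1 scores and the line labels are hypothetical.

```python
import math

def largest_canonical_variate(groups):
    """Largest eigenvalue and eigenvector of W^-1 B for two marker scores.

    groups: dict mapping a line label to a list of (score1, score2) pairs.
    W and B are the within-line and between-line SSP matrices (2 x 2).
    """
    rows_all = [row for rows in groups.values() for row in rows]
    n = len(rows_all)
    grand = [sum(r[c] for r in rows_all) / n for c in (0, 1)]
    W = [[0.0, 0.0], [0.0, 0.0]]
    B = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups.values():
        mean = [sum(r[c] for r in rows) / len(rows) for c in (0, 1)]
        for a in (0, 1):
            for b in (0, 1):
                B[a][b] += len(rows) * (mean[a] - grand[a]) * (mean[b] - grand[b])
                W[a][b] += sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows)
    det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]  # must be non-zero
    W_inv = [[W[1][1] / det_w, -W[0][1] / det_w],
             [-W[1][0] / det_w, W[0][0] / det_w]]
    M = [[sum(W_inv[a][k] * B[k][b] for k in (0, 1)) for b in (0, 1)]
         for a in (0, 1)]
    trace = M[0][0] + M[1][1]
    det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # Larger root of the 2 x 2 characteristic polynomial.
    lam = (trace + math.sqrt(max(trace * trace - 4.0 * det_m, 0.0))) / 2.0
    # Any non-zero solution of (M - lam*I) v = 0 is a valid eigenvector.
    if abs(M[0][1]) > 1e-12:
        vec = (M[0][1], lam - M[0][0])
    elif abs(M[1][0]) > 1e-12:
        vec = (lam - M[1][1], M[1][0])
    else:
        vec = (1.0, 0.0) if M[0][0] >= M[1][1] else (0.0, 1.0)
    return lam, vec

# Hypothetical 0/1 scores for two markers in two lines of four animals.
example_lines = {
    "high": [(1, 0), (1, 1), (1, 0), (0, 1)],
    "low": [(0, 0), (0, 1), (0, 1), (1, 0)],
}
root, weights = largest_canonical_variate(example_lines)
# An animal's discriminating score is the weighted sum of its marker scores.
example_scores = {label: [weights[0] * x1 + weights[1] * x2 for x1, x2 in rows]
                  for label, rows in example_lines.items()}
```

With 239 markers the same eigenproblem is solved numerically rather than in closed form, and the two eigenvectors with the largest eigenvalues provide the weights for S1 and S2.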

Cluster analysis, using the CLUSTER algorithm of Genstat Committee (1997), allocated animals to cluster classes, based on the AFLP marker scores, to maximize the between-cluster class sum of squares. While canonical variate analysis determined linear combinations of AFLP marker scores to discriminate between animals of different lines, cluster analysis grouped animals into classes based on the similarity of AFLP marker scores. The similarity matrix was obtained with the FSIMILARITY algorithm of Genstat Committee (1997). With ‘Simple matching’, similarity was 1 if xi=xj and 0 if xi≠xj, where xi and xj refer to AFLP marker scores of animals i and j. With the Jaccard method, similarity was 1 if xi=xj=1 and 0 if xi≠xj, both with weight unity, but if xi=xj=0 the similarity was 0 with zero weight. Cluster analysis assumed no structure to the classes.
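The two similarity measures can be sketched as follows. This is a minimal illustration, assuming that scores are coded 1 for band present and 0 for band absent, and using the standard definition of the Jaccard coefficient, in which markers absent in both animals are ignored. It does not reproduce the FSIMILARITY implementation, and the function names and example scores are hypothetical.

```python
def simple_matching(x, y):
    """Proportion of markers at which two animals have the same score."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

def jaccard(x, y):
    """Shared presences divided by markers present in at least one animal.

    Markers absent in both animals carry zero weight. Returns None when
    no marker is present in either animal (similarity undefined).
    """
    both_present = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    informative = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if informative == 0:
        return None
    return both_present / informative

# Hypothetical scores for two animals at five markers (1 = band present).
animal_a = [1, 1, 0, 0, 1]
animal_b = [1, 0, 0, 0, 1]
sm = simple_matching(animal_a, animal_b)  # 4 of 5 markers agree
jc = jaccard(animal_a, animal_b)          # 2 shared presences of 3 informative markers
```

Because joint absences count as agreement under ‘Simple matching’ but are dropped by the Jaccard method, the former cannot be lower than the latter for the same pair of animals.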

Hierarchical cluster analysis determined a structure for the cluster classes based on the similarity both between-cluster classes and within-cluster classes. Similarity matrices based on ‘Simple matching’ and the Jaccard method were analysed using the HCLUSTER and REDUCE algorithms of Genstat Committee (1997). Hierarchical cluster analysis was also performed on the matrix of genetic relationships of animals in the study, determined from the full pedigree back to the base population, which consisted of 2816 animals. Dendrograms representing the reduced similarity matrices were obtained with the DDEND algorithm of Genstat Committee (1997). A ‘similarity’ matrix based on the Reynolds genetic distance matrix, R, was estimated by 1−R.

The ‘Simple matching’ score was also calculated for each pair-wise comparison of animals using the program Phyltools (Buntjer, 1997) with a subsequent neighbor-joining tree determined by the NEIGHBOR algorithm of the Phylip package (Felsenstein, 1995).

Fingerprints of selection

Identification of AFLP markers located near genes controlling traits in the selection criteria and/or selection objectives may be useful for identifying QTL (Plastow et al, 1998), particularly if selection resulted in ‘presence’ and ‘absence’ of an AFLP marker in a pair of divergent selection lines. Given 239 AFLP markers, four pairs of selection lines and a significance level of 0.05, then 48 statistically significant high–low line comparisons would be expected under the null hypothesis that there were no between-selection line differences in AFLP marker frequencies. For example, there were significant between-selection line differences in the proportion of animals with an absent band for 203 polymorphic AFLP markers, using a χ2 test at the 0.05 significance level. Methods to test the above null hypothesis using information on all AFLP markers have been proposed (eg Raymond and Rousset, 1995), but the aim here was to identify AFLP markers whose presence or absence differed between-selection lines. To reduce Type 1 errors, (a) the ratio of between-selection line to within-selection line variance in AFLP marker scores was estimated for each AFLP marker to identify those with a ratio greater than unity and (b) selection line effects were estimated. Both analyses used the residual maximum likelihood (REML) algorithm of Genstat Committee (1997).
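The expected number of chance findings quoted above follows from simple arithmetic, sketched here with hypothetical variable names. The calculation assumes that each high–low comparison is an independent test at the stated significance level.

```python
n_markers = 239       # polymorphic AFLP markers
n_line_pairs = 4      # DFI, LFC, LGA and LGS high-low pairs
alpha = 0.05          # significance level of each test

# Number of marker-by-pair comparisons and the number expected to be
# declared significant by chance alone under the null hypothesis.
n_comparisons = n_markers * n_line_pairs
expected_false_positives = n_comparisons * alpha
```

The product is 47.8, which rounds to the 48 comparisons stated in the text, and this is the reason a per-marker significance test alone was not used to identify markers associated with selection.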

Results

Within-selection line diversity

Expected heterozygosity was significantly (P<0.05) higher in the Control line than in the selection lines (Table 3). Selection for low DFI or high LGS was associated with significantly (P<0.05) higher expected heterozygosity than the complementary selection lines and the two LGA lines had the lowest expected heterozygosity.

Table 3 Expected heterozygosity (Het) of the selection lines and Reynolds genetic distances between-selection lines

Between-selection line diversity

A three-dimensional representation of the Reynolds genetic distance matrix (Table 3) is presented in Figure 1. Within high and low selection line pairs, the distance between the LGA lines was almost double that of the DFI lines. The high LGA line was most distinct from other selection lines. The shortest distance was between the Control line and the low DFI line.

Figure 1

Three-dimensional representation of genetic distances between the high (white circles) and low (gray circles) selection lines and the Control line (black circle) estimated using the Reynolds method (circle volumes are consistent with the third eigenvector coefficients).

The stress parameters of MDS analyses of the similarity matrix declined with the dimensionality of the reduced matrix (0.144, 0.087, 0.026, 0.009, 0.006, 0.006, 0.006 for analyses with one to seven dimensions, respectively). The first dimension distinguished between the high LGA and the high DFI and LFC lines, with the second dimension contrasting the high LFC with the high DFI and LGS lines, while the third axis differentiated the low LFC and LGA lines from the other lines.

Discrimination between-lines using AFLP marker scores

Eigenvectors with the three largest eigenvalues from the canonical variate analysis of AFLP marker scores cumulatively accounted for 28, 47 and 64% of the total variation in AFLP marker scores. A plot of the two linear functions of AFLP marker scores, S1 and S2, is given in Figure 2a. There was no overlap in the grouping of animals from the selection lines. The clear discrimination of the high LGA line from the other selection lines was consistent with the Reynolds genetic distance matrix. The relative positioning of the selection lines with Reynolds genetic distance (Figure 1) differed from that with canonical variate analysis (Figure 2a) due to constant weighting of AFLP marker scores in the former but weighting in the latter equal to eigenvector coefficients.

Figure 2

The discriminating functions of AFLP marker scores, equal to the sum of AFLP marker scores multiplied by the coefficients of the first (X-axis) and second (Y-axis) eigenvectors from the canonical variate analysis of: (a) all AFLP marker scores and (b) AFLP marker scores from primer combinations involving only E33 and E35.

Allocation of animals to 10 discrete cluster classes by cluster analysis, in which no prior information on each animal's selection line was included in the analysis, provided a correct grouping of 226 of the 270 animals to their selection lines (Table 4). The almost perfect grouping of animals from the LGA lines, the high LFC and low LGS lines was consistent with these lines having the largest Reynolds genetic distances (Table 3). The tenth or ‘Mixed class’ consisted of several animals from the low DFI, low LFC, high LGS and Control lines. The contributions of each cluster class, excluding the ‘Mixed’ class, to the cluster analysis criterion were broadly similar (Table 4).

Table 4 Allocation of animals to cluster class and mixed class based on cluster analysis of similarity matrix for AFLP marker scores and the contribution to the criterion value of each cluster class

Within-selection line similarities, based on ‘Simple matching’, were higher for the LGA lines and low LGS line than for selection lines in which animals were allocated to the ‘Mixed’ class by cluster analysis (Table 5). ‘Simple matching’ resulted in higher within-selection line similarities than the Jaccard method, since both ‘presence’ and ‘absence’ of AFLP marker bands contributed to the former. Within-selection line similarities calculated using ‘Simple matching’ and the Jaccard method were positively correlated (r=0.98, SE 0.35) and both were negatively correlated with within-selection line heterozygosity estimates (Tables 3 and 5) (r=−0.69 and −0.74). Variation between mean within-cluster class similarities was half that of the corresponding within-selection line similarities for both ‘Simple matching’ (7.0 vs 16.0) and the Jaccard method (16.3 vs 38.1) as the ‘Mixed’ class absorbed animals not allocated to their selection line by the cluster analysis (Table 5).

Table 5 Within-selection line and within-cluster class similarities of AFLP marker scores based on ‘Simple-matching’ and the Jaccard method

Dendrograms from hierarchical cluster analysis of the Reynolds genetic distance matrix, the genetic relationships based on the full pedigree and similarity matrices for AFLP marker scores are presented in Figure 3. The dendrogram for the Reynolds genetic distance matrix essentially ranked the lines according to their distance from the Control line. The dendrogram from the genetic relationship matrix formed two groups: (1) the LGA and LGS lines and (2) the DFI and LFC lines. There was closer grouping of the DFI lines than the LFC lines and likewise with the LGS and LGA lines. In contrast, dendrograms from the AFLP marker scores using ‘Simple matching’ and the Jaccard method grouped both DFI lines with the high LGS and low LGA lines, with the high LGA and low LGS and LFC lines forming a second group.

Figure 3

Dendrograms of Reynolds genetic distances, genetic relationships from pedigree, and similarity matrices using ‘Simple matching’ and Jaccard method between high (H) and low (L) DFI, LFC, LGA and LGS selection lines and the Control line (Con).

The star topology, a single hub with the selection lines radiating from the centre, of the neighbor-joining tree in Figure 4 was consistent with that expected from a selection experiment in which all lines were derived from the one base population. The longer distance between the central hub and the branching of the high LGA line animals was consistent with the high within-line similarity, the low heterozygosity and the clear differentiation of the high LGA line from the other selection lines, based on the Reynolds genetic distance and the canonical variate analysis. The low DFI line and the Control line were positioned together as indicated by the Reynolds genetic distance.

Figure 4

Unrooted neighbor-joining tree of all animals with selection lines indicated.

Fingerprints of selection

Selection line effects for AFLP marker scores for which the between-selection line to within-selection line variance ratio was greater than unity are presented in Table 6. High and low selection for LGA or LGS resulted in the high and low presence of AFLP marker E38/T32/e with the Control line intermediate. Conversely, AFLP marker E35/T32/b was absent in the low LGA and high LGS lines but not in the complementary lines. Selection for high DFI or LFC was associated with the presence of AFLP marker E35/T35/c, unlike other selection lines and the Control line. Two AFLP marker alleles were associated with two lines: AFLP marker E32/T32/a was effectively absent in the high LFC line but present in all other lines while, conversely, AFLP marker E35/T35/d was absent in all lines except the low LGS line with the high LGA line intermediate.

Table 6 Selection line effects on AFLP marker scores for which the between-selection line to within-selection line variance ratio was greater than unity

Discussion

The study demonstrated that amplified fragment length polymorphic (AFLP) marker genotyping, using the EcoRI and TaqI restriction enzymes, was effective in discriminating between animals of different selection lines originating from one base population. In only seven generations of divergent selection for components of efficient lean growth rate, changes in quantitative trait loci (QTL) frequencies, resulting in the direct and correlated responses to selection, generated between-selection line differences in AFLP marker scores. Canonical variate analysis identified linear combinations of the AFLP marker scores that grouped animals by selection line with no overlap between selection lines. Cluster analysis of AFLP marker scores correctly allocated 226 of the 270 animals into nine classes, each consisting of animals from only one selection line. The two analyses differed in that selection line information was included in the canonical variate analysis but not in the cluster analysis. Therefore, without the knowledge of an animal's selection line, AFLP marker genotyping enabled allocation to the correct selection line with a high degree of accuracy. Optimization of primer combinations for each population would increase the accuracy of assigning animals to their genotype.

Canonical variate analysis of the 239 polymorphic AFLP markers provided a weighting of each AFLP marker score to totally discriminate between animals from the selection lines. Examination of the AFLP marker score weights, or eigenvector coefficients, would identify AFLP markers that were most informative for between-selection line differentiation. However, such an approach is only of academic relevance. A more useful strategy would identify a subset of primer combinations that generated the AFLP markers which discriminated between animals of different selection lines. For example, including only the AFLP markers generated by primer combinations involving E33 and E35 in the canonical variate analysis enabled differentiation between animals of the LFC lines, the high LGA and the low LGS lines (Figure 2b). Naturally, the discrimination achieved with a subset of 151 AFLP markers was not as ‘tight’ as that obtained with all 239 AFLP markers. However, canonical variate analysis can be used to quantify the ability of specific primer combinations to generate AFLP markers that discriminate between animals of particular genotypes. The choice of primer combinations will depend on the population of genotypes and the subset of genotypes that need to be differentiated.

The choice of primer combinations used in the current study was based on prior knowledge of primer combinations that had relatively high resolution and a large number of polymorphic bands in pig projects at Keygene NV. The number of fragments per primer combination was higher in the current study, 110, than in the Alves et al (2002) study, 45. One contributing factor to the difference was the greater range of fragment lengths in the current study (50–550 vs 35–400 bp) as primer combinations common to both studies generated 303 fragments less than 400 bp and 118 fragments greater than 400 bp in the current study. Differences in gel systems and software may also have contributed to the different number of fragments per primer combination of the two studies. However, the proportion of polymorphic AFLP markers per fragment was similar in both studies, 0.22 and 0.24, respectively.

Studies examining between-breed variation have used genetic distances to determine dendrograms and neighbor-joining trees connecting breeds. Commonly used distances are those of Cavalli-Sforza and Edwards (1967), Rogers (1972) and Nei (1978). Reynolds et al (1983) stated that the Nei (1978) distance assumed a drift/mutation model, while the Rogers (1972) distance was dependent on allele frequencies. The Cavalli-Sforza and Edwards (1967) and Reynolds et al (1983) distances assume no mutation and that changes in allele frequency were entirely due to genetic drift. SanCristobal et al (in preparation) assume that differences between European pig breeds were mainly due to genetic drift, such that the Reynolds distance may be an appropriate measure of genetic distance in that context. However, the between-selection line differences of the current study have arisen through artificial selection and nonrandom mating, such that genetic drift will only account for a proportion of the correlated responses to selection (Hill, 1972). Therefore, in the current study, none of the above measures of genetic distance were really appropriate. Provided that the estimated genetic distances are interpreted in a holistic manner and not reparameterized into traits such as divergence time (Kantanen et al, 2000), the genetic differences between the selection lines can still be quantified, although estimated breeding values for production or carcass traits may provide a more interpretable measure of genetic difference.

Detection and genetic mapping of QTL associated with a polygenic quantitative trait in experimental crosses between widely differing genotypes has been an effective approach to identify genomic regions containing genes affecting traits of interest (Andersson et al, 1994). However, QTL identified in such experimental populations may not be the same QTL as those segregating in commercial populations. Plastow et al (1998) proposed that a combination of AFLP markers and selective genotyping could be used to identify QTL in commercial populations. AFLP markers have been mapped near to QTL for traits associated with lean growth rate (Siggens et al, 1999) in a commercial sire line of pigs. Wimmers et al (2002) identified a carcass trait QTL using a similar procedure to Siggens et al (1999) but in an F2 experimental cross. Divergently selected lines, derived from one base population, present an alternative experimental resource to F2 populations for detecting QTL as indirectly demonstrated by the current study. AFLP markers identified as being associated with QTL would gain credibility if the AFLP marker scores for the suite of selection lines were consistent with the selection objectives. For example, the difference in AFLP marker score between the high and low LFC, LGA and LGS lines of AFLP markers E38/T32/e and E35/T32/b suggests that AFLP marker E38/T32/e may be associated with the rate of lean growth, while AFLP marker E35/T32/b may be associated with the efficiency of lean growth. The objective of the current study was to determine if AFLP markers could differentiate between animals of the different selection lines. A subsequent study is required to study associations between AFLP markers and phenotypic traits and to provide mapping information as in the Siggens et al (1999) and Plastow et al (1998) studies.

The current study demonstrated that AFLP markers could differentiate between lines of pigs after only seven generations of divergent selection. Alves et al (2002) used AFLP markers to discriminate between genotypes of pigs from the perspective of protecting a brand name. AFLP marker scores could also be used to generate a similarity matrix for pedigree verification or for establishing genetic relationships when such information is not available. For example, Visscher et al (2002) estimated a pedigree error rate of 10% in the UK dairy population using microsatellite markers with a predicted reduction in the rate of genetic improvement of 2–3%. In populations of animals in which little or no pedigree information was available, genetic relationships could be estimated using AFLP marker information for the purpose of reducing between-animal variation in inbreeding and maintaining genetic variation in a population, as in a conservation programme.