Introduction

Population genetic structure and variation are determined by the interaction of mutation, genetic drift, gene flow, and selection (Wright, 1931). The relative contribution of these processes to genetic structure within and among populations is a fundamental question in evolutionary biology. Most population genetic studies have used putatively neutral genetic markers to describe relationships between populations. However, the population genetic structure of functionally important genetic regions may differ from that of neutral genetic regions because of the influence of differential selection across the landscape (van Tienderen et al., 2002; Sommer, 2005). In non-model organisms, it has been difficult to investigate adaptive variation and divergence across populations because relatively few functional genes have been identified (Vasemagi and Primmer, 2005). An exception lies in the genes of the major histocompatibility complex (MHC). The MHC represents a functionally important region of the vertebrate genome that has been well-characterized across many taxa (Klein et al., 1997; Sommer, 2005; Piertney and Oliver, 2006). Several recent studies have shown that the MHC provides an ideal system in which to examine the importance of selection, relative to other evolutionary processes, in population genetic structure (for example, Landry and Bernatchez, 2001; Heath et al., 2006; Ekblom et al., 2007).

The MHC encodes for proteins that bind to pathogen-derived antigens (Matsumura et al., 1992). An immune response is activated when foreign-derived peptides (antigens) are presented to T cells by MHC molecules (Klein, 1986). MHC class I proteins generally function in the recognition of intracellular pathogens, whereas MHC class II proteins function in the recognition of extracellular pathogens (Klein, 1986; Nei and Hughes, 1991). The peptide-binding region (PBR) of the MHC is involved in pathogen recognition, and tends to be the region of highest genetic variability (Hughes and Hughes, 1995; Hughes and Yeager, 1998). Changes to amino acids of the PBR may result in altered binding capability and subsequent pathogen recognition abilities of MHC molecules (Brown et al., 1993). For example, studies of the MHC in humans, birds, and fishes have shown that individual alleles are associated with disease resistance (for example, Hill et al., 1991, 1994; Kean et al., 1994; Langefors et al., 2001; Grimholt et al., 2003; Miller et al., 2004; Johnson et al., 2008). Pathogen-mediated selection can be assessed by comparing the ratio of nonsynonymous (Dn) to synonymous (Ds) nucleotide polymorphisms in the PBR versus the non-PBR of MHC molecules, and, indeed, most studies of nucleotide substitution rates within the MHC PBR have indicated that this region is the subject of positive selection (Hill and Hastie, 1987; Hughes and Nei, 1988; reviewed by Bernatchez and Landry, 2003; Sommer, 2005; Piertney and Oliver, 2006).

The MHC encompasses one of the most variable genetic regions ever reported (Bernatchez and Landry, 2003; Sommer, 2005; Piertney and Oliver, 2006). For example, various loci of the human MHC (human leukocyte antigen or HLA) exhibit several hundred alleles (Solberg et al., 2008). It has been suggested that the extensive genetic variation at the MHC is maintained through balancing selection, mediated through overdominance effects (for example, Langefors et al., 1998; Arkush et al., 2002; Wegner et al., 2003) or negative frequency-dependent selection (Hedrick, 1999; Bernatchez and Landry, 2003; Sommer, 2005). If overdominance is maintaining genetic variation, then heterozygous individuals should be capable of mounting an immune response against a wider range of pathogens than homozygous individuals (Doherty and Zinkernagel, 1975). In the case of negative frequency-dependent selection, rare alleles will be associated with relatively high fitness if pathogens are able to exploit the most common host MHC genotype (Hughes and Nei, 1988; Paterson et al., 1998). Temporal or spatial fluctuations in pathogen communities may also play important roles in maintaining population genetic variation (Hedrick, 2002; Summers et al., 2003).

The MHC often displays considerable differentiation across geographical landscapes as has been found in populations of mammals, birds, and fishes (for example, Miller and Withler, 1997; Landry and Bernatchez, 2001; Miller et al., 2001; Wegner et al., 2003; Bryja et al., 2007; Dionne et al., 2007; Ekblom et al., 2007). Population subdivision at the MHC may arise because of mutation, gene flow, genetic drift, or local adaptation (Schierup et al., 2000). The role of selection in the evolution of MHC allele differences across populations can be assessed by comparing patterns of genetic variation at the MHC with that found in putatively neutral genetic markers such as microsatellites (Schierup et al., 2000; Bernatchez and Landry, 2003). Population structure at microsatellite loci provides a baseline estimate of population subdivision under neutral evolutionary processes. Under balancing selection, it is expected that MHC genetic divergence across populations will be lower than at microsatellites because MHC alleles are less likely to be lost due to random processes (Bernatchez and Landry, 2003). Conversely, if spatially variable selection is driving populations to become uniquely adapted to the environment, populations will exhibit higher levels of divergence at the MHC than at microsatellite markers (for example, Landry and Bernatchez, 2001).

In this study, we examine genetic variation at the MHC in five populations of Chinook salmon (Oncorhynchus tshawytscha) occurring in British Columbia, Canada. Salmonid populations have faced significant declines and even extirpation over the past 50 years (Nehlsen et al., 1991; Levin and Schiewe, 2001), and the degree of local adaptation exhibited by functional loci, such as the MHC, may be important when population conservation and plans for re-introduction are considered (Meyers and Bull, 2002; van Tienderen et al., 2002; Sommer, 2005). Chinook salmon are anadromous, semelparous breeders, in which individuals typically return to their natal freshwater stream to breed after a period of 2–6 years in salt water. Natal philopatry limits gene flow among populations, and thus, Chinook salmon populations are expected to show high levels of local adaptation (Taylor, 1991; Waples et al., 2004). Chinook salmon occur in all major river systems in the north Pacific Rim and the genetic relationships between these populations have been documented using neutral genetic markers (Beacham et al., 2006; Heath et al., 2006). However, studies of population structure and divergence at the MHC in Chinook salmon are limited. Miller et al. (1997) reported extensive genetic variation at the MHC class I and II of Chinook salmon; however, the study documented variation in only two populations. In a more extensive study, Miller and Withler (1997) examined variation at the MHC class I across 15 populations, but used a denaturing gradient gel electrophoresis approach that was able to distinguish only about half of the alleles, and thereby significantly underestimated genetic variation within populations and the extent of population genetic structure. Similarly, Heath et al. (2006) examined population structure of the MHC class I and II across 10 populations of Chinook salmon; however, the authors used a coarse PCR-RFLP approach capable of detecting variation in only a few common alleles. Given the potential importance of negative frequency-dependent selection in the maintenance of genetic variation at the MHC, an examination of all alleles is clearly important. In this study, we expand on these earlier studies by examining genetic variation across populations of Chinook salmon at the MHC class I-A1 (exon two) and class II-B1 (exon two) using a sequencing and single-strand conformation polymorphism approach capable of detailing all alleles. We also investigate selection at the sequence level by examining patterns of sequence substitutions. Finally, we assess the potential for local adaptation at MHC loci by examining patterns of MHC genetic variation relative to microsatellite genetic variation across populations. Our data add significantly to the understanding of evolution at both class I and II loci of the MHC, and provide important data on five threatened Chinook salmon populations.

Methods

Fish sampling

In the spring of 2006 and 2007, we collected tissue samples from Chinook salmon fry in three rivers located on the east coast of Vancouver Island, British Columbia. These rivers included the Quinsam, Puntledge, and Big Qualicum rivers (Figure 1). In the Puntledge River, there are two distinct Chinook salmon runs (fall and summer). We collected tissue samples from fall-run fry, as the summer run is an endangered population numbering only 400 spawning adults and is maintained primarily through hatchery enhancement (Guimond and Withler, 2006). The Puntledge River fall run was also considerably depleted in the mid-1980s but was subsequently re-established using individuals from the Quinsam and Big Qualicum rivers (Heath et al., 2006). We also collected samples in the fall of 2006 and 2007 from the Kitsumkalum River (a tributary of the lower Skeena River) in northwestern BC, and in 2007, from the main stem of the lower Skeena River, approximately 40 km west of the confluence of the Skeena and Kitsumkalum rivers (Figure 1). The east coast of Vancouver Island and the Skeena represent two distinct regions, and were expected to exhibit significant differentiation (see Beacham et al., 2006). All fry were captured using a combination of baited gee traps, live traps, or beach seining. Fry were sampled over a 1-week period in May (east coast Vancouver Island) or September (northwestern BC) of each year and were anaesthetized in MS-222 (Tricaine Methanesulfonate; Argent labs, Redmond, WA, USA) so that a tissue sample could be obtained. The tissue samples were stored in 95% ethanol for later genetic analyses.

Figure 1
figure 1

Map of British Columbia indicating the locations of the two northwestern British Columbia and three east coast Vancouver Island populations of Chinook salmon (Oncorhynchus tshawytscha). The northwestern British Columbia populations are KR, Kitsumkalum River; and LS, lower Skeena River. The Vancouver Island populations are QR, Quinsam River; PR, Puntledge River; and BQ, Big Qualicum River. Figure modified from Heath et al. (2006).

MHC genotyping

DNA was extracted from tissue using a DNA Wizard Extraction kit (Promega Corp., Madison, WI, USA). We examined genetic variation within a portion of the MHC class I-A1, and MHC class II-B1 using primer sets described in Grimholt et al. (1993), Hordvik et al. (1993), and Miller et al. (1997). Primers were designed to amplify the exons encoding the PBR located on the α1 chain of the MHC class I and the β1 chain of the MHC class II. Both the MHC class I and II studied are likely functional because they are transcribed in closely related species including the Atlantic salmon (Grimholt et al., 1993; Hordvik et al., 1993). We used polymerase chain reaction (PCR) to amplify each locus following the protocols outlined in Miller et al. (1997) and Docker and Heath (2002). The MHC class I-A1 primers amplify either 222 or 228 bp, whereas the MHC class II-B1 primers amplify 213 bp. For both loci, PCR products were visualized using SSCP (single-strand conformation polymorphism). Amplicons were electrophoresed through GeneGels on the Amersham-Biosciences SSCP system using buffer system ‘C’ at 12 °C following the manufacturer's protocols and gels were fixed and stained using a silver stain (GE Healthcare Bio-Sciences Corp., Piscataway, NJ, USA). Samples that exhibited unique conformations were cloned using the Promega pGEM T-easy vector kit following manufacturer's instructions (Promega Corp.). Between four and eight clones containing inserts were sequenced each time a conformation was sequenced (see Table 1 for breakdown of sequencing sample sizes). Unique single-strand conformations and alleles that were observed only once were verified by retyping the individuals containing the allele. Amplification products were sequenced by Genome Quebec (McGill University, Montreal, Canada).

Table 1 Sample sizes of five populations of Chinook salmon (Oncorhynchus tshawytscha)

Sequences were aligned using molecular evolutionary genetics analysis (MEGA) software 4 (http://www.megasoftware.net/; Tamura et al., 2007) and chromatograms were read in FinchTV (Geospiza Inc., Seattle, WA, USA, http://www.geospiza.com). MHC class I sequences were aligned to the human HLA-A1 (Bjorkman et al., 1987) and MHC class II sequences were aligned to the human HLA DRB1 (Brown et al., 1993) to identify the nucleotides that encode the putative PBR for each locus. Sequences were also aligned to previously reported Chinook salmon MHC class I-A1 or II-B1 alleles (Miller et al., 1997).

Sequence-level analyses

For both the MHC class I and II, we calculated rates of nonsynonymous (Dn) and synonymous (Ds) substitutions in MEGA 4 (Tamura et al., 2007). Nonsynonymous substitutions involve a change in amino acid sequence and are more likely to undergo selection than are synonymous substitutions. Purifying selection results in Dn/Ds ratios of <1, whereas Dn/Ds ratios>1 indicate that nonsynonymous substitutions are being maintained through processes such as balancing selection (Nei and Hughes, 1991; Ohta, 1991). Standard errors for the rate estimates were based on 1000 bootstrap replications. Z-tests for selection (using the Nei–Gojorbori method applying the Jukes–Cantor correction) were also conducted in MEGA 4. To investigate selection across codons of the MHC loci, we used omegaMap (Wilson and McVean, 2006). OmegaMap uses a Bayesian approach combined with a population genetics approximation of the coalescent with recombination to examine positive selection across DNA sequences (Wilson and McVean, 2006). The model was run twice on population allele frequencies at each MHC locus. In each simulation we ran 500 000 Markov-chain Monte Carlo iterations and the results were thinned every 1000 iterations to obtain the posterior distribution (see Miyake et al., 2009). We discarded the first 10 000 iterations and 20 000 iterations for the MHC class II and I, respectively, as ‘burn-in’. We did not have prior knowledge of codon frequencies across the Chinook salmon genome, so it was assumed that all codons had equal equilibrium frequencies in the model. The parameters estimated by omegaMap include the Dn/Ds ratio (ω), recombination rate (ρ), transition transversion rate ratio (κ), the rate of sysnonymous transversion (μ), and the rate of insertion/deletion (ϕ). We allowed ω and ρ to vary across codons and we used the inverse prior for these variables; average block size at which ω and ρ were estimated was set at 10 and 30, respectively. The remaining variable priors were set to improper_inverse. All priors were set as recommended by Wilson and McVean (2006) for circumstances in which knowledge of parameter distributions is unknown for the study organism.

Population analyses

Estimates of allele and genotype frequencies, observed and expected heterozygosities, FST, and tests of Hardy–Weinberg equilibrium were calculated using Genepop 3.4 (Raymond and Rousset, 1995). Allele richness, based on a sample size of 20, was calculated in FSTAT 2.9.3 (Goudet, 2002). We tested for allele frequency differences between years using a generalized linear model (GLM) in SPSS 17. We specified the GLM to use a normal distribution, a nominal response variable, and year and population as factors in the model. Mantel tests were performed in Genepop to examine isolation-by-distance relationships between estimates of MHC FST/(1−FST) and the natural logarithm of the geographic distance along the river and coast (in km) between the collection locations in each population. The significance of this relationship was determined using 1000 permutations in the subroutine Isolde. Analysis of molecular variance across geographic regions, across populations, and within populations was conducted in Arlequin 3.1, using the Jukes–Cantor correction. The statistical significance of the observed variance was determined over 1023 haplotype permutations (Excoffier et al., 2006). Calculation of nucleotide diversity was also conducted in Arlequin 3.1.

We used Micro-Checker (van Oosterhout et al., 2004) to test for the presence of null alleles at the MHC class I and II. MHC class I allele frequencies indicated the likely presence of a null allele in the Vancouver Island and lower Skeena populations, so we used Genepop 3.4 to generate maximum likelihood corrected allele frequencies across these populations (Dempster et al., 1977). Corrected allele frequencies were subsequently used in estimates of population genetic differentiation following Chapius and Estoup's (2007) ENA method (Supplementary information Table S1).

When comparing genetic divergence at microsatellites and MHC loci, we converted our estimates of FST to G′ST because the magnitude of FST is dependent on the heterozygosity of the locus examined; G′ST scales estimates of FST to the maximum amount of genetic variation each locus exhibits and thus accounts for differences in variation among genetic markers (Hedrick, 2005). Thus, we calculated G′ST as=FST/FST(max). FST(max) was defined as 1–Hs (see equation (3) in Hedrick, 2005), where Hs is the mean heterozygosity across all subpopulations examined (Hedrick, 2005). Relationships between MHC and 13 microsatellite markers were initially examined by performing a Mantel test that compares estimates of pairwise population divergence for each type of marker. Microsatellite allele frequencies were obtained for individuals sampled over multiple years by the Molecular Genetics Lab of Fisheries and Oceans, Pacific Region (see Beacham et al., 2006; http://www.pac.dfo-mpo.gc.ca/sci/mgl/data_e.htm). A summary of the microsatellite dataset used is provided in Supplementary information Table S2. Correlation coefficients were calculated in SPSS version 16 (SPPS Inc., Chicago, IL, USA), and Mantel tests were conducted in Genepop 3.4, using 1000 permutations to determine statistical significance. We also generated 95% confidence intervals (CIs) for pairwise microsatellite G′ST estimates by bootstrapping over loci 1000 times in FSTAT 2.9.3 (Goudet, 2002). We then compared G′ST estimates for the MHC loci to the 95% CIs, and identified outliers as populations undergoing selection at the MHC (Weir, 1996).

Results

MHC diversity

For both the MHC class I and II, typically 1–2 alleles were identified per individual, suggesting that neither locus is duplicated. For the MHC class I, 49 of the 228 (21%) nucleotide sites were variable, and we identified 37 distinct alleles (Supplementary information Figure S1a). Ten of the 37 (27%) alleles were found in only one population and 5 (14%) of the alleles were found in all populations (Supplementary information Table S1). Seventeen (46%) of the MHC class I alleles were identical to alleles published in Genbank. For the MHC class II, 38 of 213 (18%) nucleotide sites were variable and we found 17 distinct alleles (Supplementary information Figure S1b). Eight (47%) of these alleles were found in only one population, whereas five (29%) alleles were found in all populations (Supplementary information Table S1). Ten (59%) of the MHC class II alleles were identical to alleles published in Genbank. Examining the amino acid sequences, for MHC class I, 28 of 76 (37%) codons were variable. All 37 alleles coded for unique amino acid sequences. For MHC class II, 25 of 72 (35%) codons were variable and all 17 alleles coded for unique amino acid sequences. Neither locus showed signs of frameshift mutations that would cause the alleles to become non-functional.

Historical selection

For the putative PBR of the MHC class I, the rate of nonsynonymous substitutions was almost twice as high as that of synonymous substitutions; however, the Z-test for selection indicated that this difference was not statistically significant (Table 2). Codons outside of the putative PBR also showed higher rates of nonsynonymous substitutions than synonymous substitutions but again this difference was not statistically significant (Table 2). In contrast, the MHC class II showed a significantly higher rate of nonsynonymous substitutions than synonymous substitutions in the putative PBR (Table 2). However, a statistically higher ratio of nonsynonymous substitutions to synonymous substitutions was not observed for codons outside of the putative PBR for the MHC class II (Table 2).

Table 2 Summary of nucleotide substitution rates in the MHC of Chinook salmon (Oncorhynchus tshawytscha)

The omegaMap analysis showed evidence of positive selection (that is Dn/Ds>1) across several MHC classes I and II codons. Within the MHC class I (Figure 2a), codons 54–64 and 69–73 exhibited a posterior probability of positive selection>95%. All of these codons are associated with the putative PBR of the MHC class I molecule. Codons 58–71 also showed evidence of recombination (lower 95% CI ρ>1). The omegaMap analysis also identified several regions of the MHC class II molecule exhibiting significant evidence of positive selection (Figure 2b). Specifically, codons 10–15, 30–32, 40–42, 50–51, and 54–65 exhibited significant posterior probabilities of positive selection. Codons 10, 15, 31, 56–60, and 63–64 are part of the putative PBR of the MHC class II. Some of the codons exhibiting evidence of positive selection were not identified as part of the putative PBR; however, these codons were located immediately adjacent to the putative PBR (that is codons 11–14, 30, 32, 40–42, 50–51, 54–55, 65). No codons within the MHC class II exhibited evidence of recombination.

Figure 2
figure 2

Codon-level analysis of positive selection across MHC sequences of Chinook salmon (Oncorhynchus tshawytscha). Analyses comprise the MHC class I-A1 exon (a) and MHC class II-B1 exon (b) with upper plots within each figure showing estimates of positive selection and lower plots showing the posterior probability of positive selection. Mean omegaMap estimates of positive selection (ω) are indicated by the solid black line. The dotted line in the upper plots corresponds to neutral estimates of nonsynonymous/synonymous substitutions (that is ω=1) and in the lower plots correspond to the 95% posterior probability. The grey shaded area shows the 95% CI surrounding the estimates of positive selection. Stars indicate the locations of the putative peptide-binding regions (PBR) for each molecule. Putative PBRs were identified by aligning the MHC sequences in MEGA with human HLA molecules.

Population variation

Allele frequencies did not differ significantly between years for the MHC class I (GLM: χ21=0.295, P=0.587, n=668) or the MHC class II (GLM: χ21=3.183, P=0.074, n=709). Within populations, the number of alleles at the MHC class I ranged from 15 to 27 (Table 3). Most populations, except for the Kitsumkalum River, exhibited a significant heterozygote deficit (Table 3). In contrast, the MHC class II exhibited a significant excess of heterozygotes in three populations (Table 3). The MHC class II also exhibited a lower allelic richness than did the MHC class I between 8 and 10 alleles found within populations (Table 3). Mean nucleotide diversity across populations was 0.041±0.011 for the MHC class I and 0.104±0.046 for the MHC class II (Table 3). No single MHC class I allele was most common in all populations. Allele 12 was most common in the Big Qualicum and Puntledge rivers, at 15% and 14% of alleles, respectively, allele 3 was most common in the Quinsam River at 28% of alleles, allele 32 was most common in the Kitsumkalum River at 40% of alleles, and allele 32, at 14% of alleles, was most common in the lower Skeena River (Supplementary information Table S1). However, the MHC class II allele 2 was the most common allele found in all five of our study populations, though the frequency of the allele varied between 32% and 73% of alleles among populations (Supplementary information Table S1).

Table 3 Summary of genetic variation across five populations of Chinook salmon (Oncorhynchus tshawytscha)

For the MHC class I, the analysis of molecular variance indicated that the majority of the variance occurred within populations but significant differentiation occurred among populations within each region (Table 4). Near-significant geographic differentiation also occurred between regions for the MHC class I. A similar analysis of molecular variance result was shown for the MHC class II, in which the majority of the variance occurred within populations (Table 4). However, near-significant geographic differentiation also occurred among regions and among populations within regions.

Table 4 Summary of analysis of molecular variance (AMOVA) results based on five populations of Chinook salmon (Oncorhynchus tshawytscha)

Among population divergence

Pairwise population genetic differentiation at the MHC class I, as estimated by FST, ranged from –0.002 to 0.155 (Table 5). Estimates of FST for the MHC class II ranged from −0.002 to 0.167 (Table 5). We did not detect a significant isolation-by-distance relationship for either MHC class I (Mantel test: r=0.38, P=0.388, Figure 3) or the MHC class II (Mantel test: r=0.58, P=0.109, Figure 3).

Table 5 Summary of genetic differentiation occurring among five populations of Chinook salmon (Oncorhynchus tshawytscha)
Figure 3
figure 3

Isolation-by-distance for pairwise comparisons of five populations of Chinook salmon (Oncorhynchus tshawytscha). Each point represents a pairwise comparison (FST) between the MHC class I (black circles), MHC class II (open circles), and microsatellites (grey circles) and natural log of the geographic distance.

Pairwise estimates of MHC class I G′ST and the G′ST of microsatellites were only nearly significantly correlated (Mantel test: r=0.46, P=0.065) whereas those for the MHC class II and microsatellites were significantly correlated (Mantel test: r=0.80, P =0.03). However, we found that the overall MHC class I G′ST (0.424) was significantly higher than the G′ST for microsatellites (microsatellite G′ST=0.183, 95% CI=0.125–0.258). In contrast, the overall MHC class II G′ST (0.144) fell within the lower 95% CI for the microsatellite G′ST. Moreover, the majority (60%) of the pairwise G′ST estimates for the MHC class I fell above the 95% CIs estimated from microsatellites (Figure 4a) showing that some populations of Chinook salmon are more divergent at the MHC class I than expected based on neutral estimations. The Puntledge-Big Qualicum MHC class I G′ST was slightly lower than the G′ST of microsatellites, whereas only the Big Qualicum-lower Skeena, and Quinsam-lower Skeena, and Puntledge-lower Skeena G′ST estimates were not significantly different than the G′ST of microsatellites (Figure 4a). At the MHC class II, 50% of the pairwise G′ST estimates were significantly lower than at the microsatellites, and the majority of these values occurred in pairwise comparisons within geographic regions (Figure 4b).

Figure 4
figure 4

Relationship between genetic divergence (G′ST) at MHC and microsatellite loci in five populations of Chinook salmon (Oncorhynchus tshawytscha). Closed circles indicate pairwise population divergence estimates at MHC class I (a) and class II (b). Population genetic divergence at microsatellite loci is indicated by open circles and error bars indicate 95% CIs. Populations are KR, Kitsumkalum River; LS, lower Skeena River; QR, Quinsam River; PR, Puntledge River; and BQ, Big Qualicum River. Note that the scales differ on the y-axes.

Discussion

The MHC represents one of the most variable genetic regions in vertebrate genomes. In this study, we detected high allelic diversity at both the MHC class I and II loci across five populations of Chinook salmon. We found 37 alleles at the MHC class I and 17 alleles at the MHC class II in 342 and 365 individuals, respectively. These levels of allelic variation exceed the levels of variation that have been reported earlier for Chinook salmon (Miller et al., 1997; Miller and Withler, 1997; Kim et al., 1999; Garrigan and Hedrick, 2001); although this difference could reflect our relatively large sample size or increased allele detection sensitivity obtained through an SSCP and sequencing approach. In a study of approximately 40 individuals from two populations, Miller et al. (1997) described 22 alleles at the Chinook salmon MHC class I-A1 locus and 3 alleles at the MHC class II-B1 locus, whereas Garrigan and Hedrick (2001) detected 12 MHC class I-A1 alleles in 44 individuals. Studies of other fishes have found comparatively high numbers of alleles at the MHC class II as were identified in this study. Landry and Bernatchez (2001) reported 18 alleles across 14 populations of Atlantic salmon (Salmo salar) and Miller et al. (2001) reported 11 alleles in 31 populations of Sockeye salmon (O. keta). The allelic diversity we observed in Chinook salmon provides additional evidence that the MHC is one of the most polymorphic regions of vertebrate genomes.

Positive selection is frequently invoked to explain the high levels of polymorphism observed at the genes of the MHC (Hughes and Nei, 1988). Evidence for positive selection can be assessed through the comparison of synonymous (Ds) and nonsynonymous (Dn) substitution rates within the PBR and also by comparing substitution rates within and outside of the PBR (Hill and Hastie, 1987; Hughes and Nei, 1988). We found a Dn/Ds>1 in the putative PBR of both MHC loci, which indicates that nonsynonymous substitutions are being maintained at a rate that is higher than expected under neutral evolution. Moreover, the Dn/Ds ratio in the putative PBR was higher than the Dn/Ds ratio observed in the non-PBR for both loci, albeit it was only significantly higher for the MHC class II. We also tested for positive selection across regions of the MHC molecules using a codon model that incorporates recombination in omegaMap. For both the MHC class I and II we found several regions of the molecules exhibiting evidence of positive selection. For the MHC class I, all of the codons exhibiting evidence of positive selection were associated with the putative PBR, which provides strong evidence that amino acids associated with this region are undergoing selection for diversity. We also found evidence that the putative PBR is undergoing recombination, suggesting that recombination may also be an important mechanism generating genetic diversity within the MHC. For the MHC class II, we found that codons located within or immediately adjacent to the putative PBR appear to be undergoing selection for amino acid diversity. One of those codons is not part of the HLA PBR (codon 31 here and codon 53 in the HLA) but has previously been shown to exhibit evidence of positive selection, leading to the suggestion that the codon may actually be part of the salmon MHC PBR (Aguilar and Garza, 2007). However, a structural model of the Chinook salmon MHC class II molecule will be necessary to confirm this suggestion. Together our analyses indicate that positive selection is stronger in the PBR of both the MHC class I and II in Chinook salmon, as has been shown in other studies (reviewed in Bernatchez and Landry, 2003; van Oosterhout et al., 2006; Dionne et al., 2007), and suggest that historical positive selection is maintaining amino acid diversity within the MHC PBR.

Under balancing selection, heterozygous individuals are selectively favoured either through increased ability to mount an immune response against pathogens (Doherty and Zinkernagel, 1975). For the MHC class II, we found that three populations, the Big Qualicum, Puntledge, and Kitsumkalum, exhibited heterozygosity that significantly exceeded Hardy–Weinberg expectations. Other studies of the MHC have also found higher levels of observed heterozygosity than expected, suggesting that selection for heterozygotes is occurring in many species (for example, Seddon and Baverstock, 1999; van Haeringen et al., 1999; Peters and Turner, 2008; Oliver et al., 2009; see Garrigan and Hedrick, 2003 for review). In a recent study, we have shown that MHC class II heterozygous Chinook salmon fry exhibit fewer bacterial infections that do homozygotes (Evans and Neff, 2009). Moreover, a study by Arkush et al. (2002) on Chinook salmon showed that heterozygotes at the MHC class II exhibited higher survival than homozygotes when challenged with infectious haematopoietic necrosis virus (but see Langefors et al., 2001). Given the potential prevalence of infectious haematopoietic necrosis virus in aquatic systems (Meyers, 1998), it is a possible source of selection pressure on heterozygotes in wild Chinook salmon populations. Overall, our population genetic data, in combination with studies that have examined infections among MHC class II heterozygous and homozygous Chinook salmon, suggest that balancing selection is operating at this locus.

In contrast to the heterozygote advantage hypothesis, we detected a heterozygote deficit for the MHC class I in all of our study populations except for the Kitsumkalum. This deficit could be mediated by underdominance genetic effects, in which heterozygous genotypes exhibit lower survival than do homozygotes. Such effects have been shown in one study of Chinook salmon, albeit at the MHC class II locus (Pitcher and Neff, 2006). In Atlantic salmon it has been recently shown that fertilization success is higher between MHC class I similar mates (Yeates et al., 2009), which may reflect underdominance. Alternatively, it is possible that the observed heterozygote deficit reflects genetic drift occurring within populations or the presence of a null allele (Manwell and Baker, 1970). In the Malagasy giant jumping rat (Hypogeomys antimena), genetic drift was implicated in heterozygote deficits observed at the MHC (Sommer, 2003). Miller and Withler (1997) detected a heterozygote deficit at the MHC class I in the Big Qualicum Chinook salmon population, as well as in a population in the Skeena River and the authors speculated that a null allele was the cause. A study of MHC class I inheritance will be necessary to determine whether a null allele is indeed the cause of the heterozygote deficit.

Allele frequencies within populations reflect the outcome of selection or neutral processes such as gene flow, and drift. Estimates of genetic divergence at microsatellites and the MHC class II were significantly correlated, and near significantly correlated with the MHC class I, which suggests that population genetic divergence at the MHC may, in part, be driven by neutral evolutionary forces. In contrast, there was no apparent isolation-by-distance at the MHC, which is unexpected if neutral processes are important. Furthermore, several population pairwise comparisons of divergence exhibited higher differentiation at the MHC class I and lower differentiation at the class II relative to microsatellite loci. These data suggest that the class I region is subject to differential selection pressures across populations whereas the class II region is subject to homogenizing or balancing selection across populations. Differential selection at MHC classes I and II loci is highly plausible given that the genetic regions encoding for these loci are unlinked in fishes (Klein et al., 1997). Interestingly, Heath et al. (2006) instead reported higher genetic differentiation at the MHC class II than at microsatellite loci in the same species as we studied. However, as mentioned previously, their study used a PCR-RFLP approach to examine population genetic variation and thus, is not directly comparable to our results. Similar to our findings, a study of the guppy (Poecilia reticulata) found lower population differentiation at the MHC class II than at microsatellites (Fraser et al., 2009). The authors of that study suggest that the genetic similarity across populations is due to homogenizing directional selection driven by a common parasite across populations (Fraser and Neff, 2009). Yet studies on other fish species (for example, O. keta, Miller et al., 2001; S. salar, Landry and Bernatchez, 2001; O. mykiss, Aguilar and Garza, 2006), birds (for example, Gallinago media, Ekblom et al., 2007), and mammals (for example, Arvicola terrestris, Bryja et al., 2007) have shown higher levels of divergence at the MHC when compared with microsatellites, or in a few cases similar levels of genetic divergence at MHC markers and at microsatellite markers (for example, Boyce et al., 1997; Peters and Turner, 2008). Overall divergence trends at the MHC class I, in combination with the maintenance of high levels of genetic variability, suggest that spatially variable balancing selection is an important factor driving population differences in allele frequencies. At the MHC class II, selection appears to also play a role, but is driving some populations to exhibit similar population genetic structure.

For species in decline, such as Chinook salmon, it is clear that we must consider adaptive divergence at functionally important genes in recovery efforts. Our results indicate that selection is driving population variation at the Chinook salmon MHC. This variation may relate to differences in pathogen communities across regions, but further studies are required to address the importance of specific pathogens in the evolution of population differences.