Introduction

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system with well-documented genetic contributions to its pathogenesis [1]. Genome-wide association studies have implicated >100 loci in MS risk [1]. The strongest genetic associations with MS are with specific alleles at the HLA loci, in the major histocompatibility complex (MHC) on chromosome 6p21. In particular, HLA-DRB1*15:01 is the strongest genetic determinant of MS; this association has been very well established in a variety of studies and populations [2,3,4]. HLA-DRB1 allelic heterogeneity in MS risk has been described [5,6,7,8,9,10,11,12,13,14,15,16], but the role of genetic variation at the other HLA loci has been less clearly defined, due, in part, to the extensive linkage disequilibrium (LD) among the alleles at these loci. The MHC includes ~165 closely linked genes, roughly half of which have immune-related functions [17], and large-scale SNP screening of the MHC has identified at least one non-HLA MS association in the so-called class III region [14]. Using recently developed next-generation sequencing (NGS) assays, we investigated the association of HLA class I and class II alleles with MS. NGS also facilitates the association analysis of the DRB3, DRB4 and DRB5 loci (DRB3/4/5). These loci display strong LD with specific DRB1 allele families [18], and may modulate autoimmune disease associations attributed to the DRB1 locus [19] and display DRB1-independent associations [20, 21].

HLA disease associations are typically interpreted in terms of peptide binding and presentation driving specific adaptive immune responses, but class I epitopes serve as ligands for the killer immunoglobulin-like receptors (KIR) on natural killer (NK) cells, key elements in innate immunity [22, 23] and possible contributors to MS pathogenesis. While the precise role of innate immunity in MS pathogenesis is unclear, NK cells may contribute to MS indirectly via immunoregulatory activity, or directly through cytotoxicity of self-tissues [24,25,26,27].

KIR epitope ligands are encoded by class I amino acid positions 77 and 80; variants at these positions define the HLA-C C1 and C2 ligands [28, 29], the HLA-A A3/A11 ligand, and the Bw4 ligand of HLA-B and some HLA-A molecules [30, 31]. Encoded by genes on 19q13.4, inhibitory and stimulatory KIRs regulate the cytolytic killing and cytokine secretion of NK cells. The KIR gene complex is characterized by extensive gene content variation and allelic diversity; KIR haplotypes have been classified into two broad categories: KIR A (9 genes with primarily inhibitory functions) and KIR B (14 genes with inhibitory and stimulatory functions). The C1 ligand is recognized by the inhibitory KIR2DL2 and KIR2DL3 receptors, C2 by KIR2DL1 [29], Bw4 by KIR3DL1 [32], and A3/11 by KIR3DL2 [33]. The stimulatory KIR2DS1 [34, 35] and KIR2DS2 receptors are thought to bind to C2 and C1, respectively [36]; KIR2DS4 receptors bind strongly to A11 and weakly to C1 and C2 [37].

KIR polymorphism has also been implicated in predisposition to many diseases, including MS [38,39,40,41,42,43]. The presence of Bw4, the ligand for KIR3DL1, was protective for MS in a Norwegian cohort [38] and, more recently, the combination of KIR3DL1 and Bw4 was protective in a study of African-American patients and controls [44]. Disease association analyses of KIR variation in the context of the HLA ligand require adjustment for LD between the HLA ligands and specific disease-associated HLA alleles. Using a MALDI-TOF mass spectrometry assay for KIR locus presence/absence and an NGS assay for HLA class I and class II alleles, we explored the association of specific KIR/HLA ligand combinations in a group of 412 patients of non-Hispanic European ancestry and 419 ethnically matched controls. We address the confounding issue of LD in these association analyses using the strategy of stratification, analyzing those strata of the data in which an associated allele is present separately from those in which it is absent.

Results

We initially examined the association of alleles at individual HLA loci. Due to the very high LD between the DRB1 and DQB1 loci, and the HLA-C and -B loci, each locus pair (DRB1~DQB1 and C~B) haplotype was analyzed as a “super-locus” (Tables 1, 2, 3 and Supplementary Table S1). With the exception of DPB1, all loci and super-loci displayed significant locus level heterogeneity between MS patients and controls (Table 1).

Table 1 Locus-level heterogeneity between multiple sclerosis patients and controls
Table 2 Significant association of DRB1~DQB1 haplotypes and DPB1 alleles with multiple sclerosis
Table 3 Significant association of HLA-A alleles and B~C haplotypes with multiple sclerosis

HLA class II associations

Table 2 shows the association of DRB1~DQB1 haplotypes and of DPB1 alleles. As extensively documented in previous studies [4, 45, 46], DRB1*15:01~DQB1*06:02 confers very high disease risk in this population (OR = 3.98; p < 2.22E−16). We note that the other relatively common DR2 (including DR15 and DR16 alleles) haplotype in this population, DRB1*16:01~DQB1*05:02, does not confer MS risk (OR = 1.0; p = 0.95) in this data set. Association studies of African-American populations, in which the LD patterns differ and the DQB1*06:02 allele is often found on non-DRB1*15 haplotypes, indicate that it is DRB1*15:01 and not DQB1*06:02 that confers MS risk [44, 47]. Given the strength of the DRB1*15:01 association with MS, all observed associations (class I alleles or HLA ligands) should be examined in light of potential LD with DRB1*15:01.

The other significantly associated susceptible DRB1~DQB1 haplotype in this data set is DRB1*03:01~DQB1*02:01 (OR = 1.63; p = 1.41E−03), as previously reported [5, 6]. The DRB1*04:05, *08:01, and *13:03 alleles, previously reported to be associated with MS [7,8,9,10,11,12,13,14,15,16], were not associated in this data set. DRB1*04:05 and *08:01 are found on haplotypes with different DQB1 alleles in European and East Asian populations. The low frequency of the DRB1*13:03~DQB1*03:01 haplotype in this data set (f = 0.014 in controls and 0.02 in cases) may explain the lack of statistical significance for this association (OR = 1.37; CI = 0.6–3.19; p = 0.412). The frequency of DRB1*04:05 haplotypes was very low, and these haplotypes were “binned” (Supplementary Table S1). Counts, frequencies, and summary statistics for all detected alleles and haplotypes are included in Supplementary Table S2.

DRB1*01:01~DQB1*05:01 (OR = 0.41; p = 9.57E−06), DRB1*04:01~DQB1*03:01 (OR = 0.4; p = 1.24E−03), DRB1*14:01~DQB1*05:03 (OR = 0.42; p = 0.038) and DRB1*07:01~DQB1*02:02 (OR = 0.55; p = 0.0014) were significantly protective for MS. The DRB1*01:01~DQB1*05:01 haplotype is known to include the DQA1*01:01 allele [48,49,50,51,52], which, along with DRB1*01:01, was recently shown to be protective for MS in the presence of DRB1*15:01 [53]. While no DRB1~DQB1 haplotypes in our study displayed MS associations in the DRB1*15:01-positive stratum (Supplementary Table S3), DRB1*01:01~DQB1*05:01 remained protective in the DRB1*15:01-negative stratum (OR = 0.57; p = 1.9E−02). The DRB1*14:01 protective effect has been previously reported [54,55,56]. No individual DPB1 alleles were associated with MS in this data set.

The clonal nature of NGS allows the analysis of the secondary DRB loci (DRB3/4/5). Because all DRB1*15:01~DQB1*06:02 haplotypes carried the DRB5*01:01 allele, and all DRB1*16:01~DQB1*05:02 haplotypes carried the DRB5*02:02 allele, the role of allelic variation at DRB5 could not be assessed in this data set. However, the predisposing DRB1*03:01~DQB1*02:01 haplotype carries either DRB3*01:01 or *02:02. A recent study of type 1 diabetes [19] showed that the DRB1*03:01 haplotypes carrying DRB3*02:02 conferred greater risk than did those carrying DRB3*01:01. For MS, the allelic variation in DRB3 appeared to affect the risk conferred by DRB1*03:01 haplotypes based on this modest sample set (15 MS patients, three controls), but this effect was not significant. The OR for DRB1*03:01 homozygotes homozygous for DRB3*01:01 was 3.64 (CI = 0.69–36.1), whereas the OR for DRB1*03:01 homozygotes that carried DRB3*02:02 was 8.36 (CI = 1.1–371.2). Testing whether the point estimates for these ORs are significantly different will require a larger sample set. We note that A*30:02 and B*18:01, alleles in strong LD with DRB1*03:01~DRB3*02:02 haplotypes [19], are associated with MS (Table 3).

Protective association of A*02:01

In the association analyses of the class I loci (Table 3), HLA-A*02:01 appears protective (OR = 0.69; p = 1.46E−03), as previously reported [57,58,59,60,61,62]. After stratifying the data to account for negative LD with DRB1*15:01 (Table 4), A*02:01 on haplotypes lacking DRB1*15:01 remains protective (OR = 0.48; p = 1.1E−08).

Table 4 Association of HLA-A*02:01 and C*03:04~B*40:01 with multiple sclerosis in the presence and absence of DRB1*15:01

Further, the ORs of three common extended A~C~B~DRB1~DQB1~DPB1 haplotypes, all bearing DRB1*15:01 and differing only in the HLA-A allele, indicate that the presence of A*02:01 can reduce the risk conferred by DRB1*15:01 (Table 5). The OR conferred by the extended C*07:02~B*07:02~DRB1*15:01~DQB1*06:02~DPB1*04:01 haplotype bearing A*02:01 is lower (OR = 1.65) than the OR for the same haplotype bearing A*03:01 (OR = 2.83) or A*24:02 (OR = 4.48). This protective effect of A*02:01 is not simply a haplotype effect. The modification of DRB1-mediated risk by A*02:01 can also be assessed by stratifying the data based on the presence of A*02:01 (Table 6); these observations suggest that A*02:01 in cis or in trans can decrease the OR of other DRB1~DQB1 haplotypes.

Table 5 Association of specific A~C~B~DRB1*15:01~DQB1~DPB1 haplotypes with multiple sclerosis
Table 6 Association of DRB1~DQB1 genotypes in the presence and absence of HLA-A*02:01 and C~B genotypes in the presence and absence of DRB1*15:01 with multiple sclerosis

Associated C~B haplotypes

For C~B haplotypes (Table 3), C*07:02~B*07:02 is associated strongly with MS (OR = 1.99; p = 8.8E−07); however, this association reflects the strong LD between this haplotype and the predisposing DRB1*15:01 allele (dij = 0.71 in MS patients, and 0.52 in controls). Two different C~B haplotypes display a protective association in this data set. As previously reported [60, 63, 64], C*05:01~B*44:02 is modestly protective (OR = 0.65; p = 0.043). B*44:02 is rarely found with any other HLA-C allele, while the C*05:01~B*18:01 haplotype is clearly not protective (OR = 2.07; CI = 0.88–5.25; p = 0.71), suggesting that B*44:02 may be responsible for the observed modest association for this haplotype.

In addition, the C*03:04~B*40:01 haplotype (OR = 0.27; p = 6.76E−06) shows a strong protective association. This protective C~B haplotype is in LD with the protective A*02:01 allele, and this three-locus haplotype (Table 4) is even more strongly protective (OR = 0.15; CI = 0.04–0.45; p = 6.5E−05).

The conditional asymmetric LD (cALD) measures W_{HLA-A/HLA-C~HLA-B} and W_{HLA-C~HLA-B/HLA-A} are 0.6 and 0.42 in cases, and 0.6 and 0.39 in controls, respectively, indicating more variation of C~B haplotypes relative to HLA-A alleles than of HLA-A alleles relative to C~B haplotypes. The intermediate level of LD between A*02:01 and the C*03:04~B*40:01 haplotype (dij = 0.11 in MS patients and 0.34 in controls) suggests that the strong protective association for the A*02:01~C*03:04~B*40:01 haplotype results from the combination of these three alleles, and not from LD with a single protective locus. C*03:04~B*40:01 remains protective in the absence of A*02:01 (OR = 0.42; p = 0.018), and A*02:01 is modestly protective in the absence of C*03:04~B*40:01 (OR = 0.79; p = 0.048), suggesting that the observed protective association for the A*02:01 allele is not due entirely to LD with C*03:04~B*40:01.
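To make the asymmetry of the cALD measure concrete, the following minimal sketch computes W for locus A conditioned on locus B, and the reverse, from a table of two-locus haplotype frequencies. It follows our reading of the published definition, in which the squared measure is the conditional homozygosity minus the unconditional homozygosity, divided by one minus the unconditional homozygosity. The function name, the input layout, and the toy frequencies are illustrative assumptions; this is not the asymLD package code and the frequencies are not study data.

```python
import math

def conditional_ald(hap_freqs):
    """Return (W for A given B, W for B given A).

    hap_freqs maps (allele_at_A, allele_at_B) -> haplotype frequency;
    frequencies are assumed to sum to 1 and both loci to be polymorphic.
    """
    p_a, p_b = {}, {}
    for (a, b), f in hap_freqs.items():
        p_a[a] = p_a.get(a, 0.0) + f
        p_b[b] = p_b.get(b, 0.0) + f

    # Unconditional homozygosity at each locus.
    hom_a = sum(p * p for p in p_a.values())
    hom_b = sum(p * p for p in p_b.values())

    # Homozygosity at one locus conditioned on the allele at the other.
    hom_a_given_b = sum(f * f / p_b[b] for (a, b), f in hap_freqs.items() if f > 0)
    hom_b_given_a = sum(f * f / p_a[a] for (a, b), f in hap_freqs.items() if f > 0)

    w_a_given_b = math.sqrt(max(0.0, (hom_a_given_b - hom_a) / (1 - hom_a)))
    w_b_given_a = math.sqrt(max(0.0, (hom_b_given_a - hom_b) / (1 - hom_b)))
    return w_a_given_b, w_b_given_a

# Toy example (hypothetical): allele A2 is split across B2 and B3.
toy = {("A1", "B1"): 0.5, ("A2", "B2"): 0.25, ("A2", "B3"): 0.25}
w_ab, w_ba = conditional_ald(toy)
```

In the toy table the allele at locus B fully determines the allele at locus A, so the first value is 1, whereas locus A leaves residual variation at locus B and the second value is lower.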

Impact of DRB1*15:01 predisposition on C*03:04~B*40:01 association

The highly protective C*03:04~B*40:01 haplotype is in negative LD with the highly predisposing DRB1*15:01 allele (dij = −1); no C*03:04~B*40:01-bearing haplotypes carry DRB1*15:01. In principle, this negative LD with DRB1*15:01 might account for the protective associations observed for A*02:01 and C*03:04~B*40:01. We applied stratification analyses (Table 4) to determine if this LD pattern could account for the observed protective association of this C~B haplotype. In the stratum lacking DRB1*15:01, the protective association of C*03:04~B*40:01 is even stronger (OR = 0.29; CI = 0.15–0.55; p = 2.45E−05), so the protective association cannot be attributed simply to negative LD with the highly predisposing DRB1*15:01. In individuals carrying DRB1*15:01, the presence of the C*03:04~B*40:01 haplotype on the other chromosome reduces MS risk (OR = 1.37; p = 0.57) compared to all other C~B haplotypes (OR = 5.06; p = 3.21E−13) (Table 6). The only other significant associations in this DRB1*15:01-negative stratum are C*05:01~B*18:01 (OR = 2.87; p = 0.01) and C*07:01~B*08:01 (OR = 1.98; p = 0.0004), but these are both due to LD with the predisposing DRB1*03:01 (dij = 0.87 and 0.72 in cases, and 0.51 and 0.69 in controls, respectively). C*03:04~B*40:01 remained protective in the DRB1*15:01-negative, DRB1*03:01-negative stratum (OR = 0.32) (data not shown).
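The stratification logic used here can be sketched as follows: subjects are split by carrier status for the stratifying allele, and a separate 2×2 odds ratio is computed for the test variant within each stratum. The record layout, key names, and function name are hypothetical choices made for illustration, and the sketch omits confidence intervals and p-values.

```python
def stratified_odds_ratios(subjects, test_key, stratum_key):
    """Odds ratio for a test variant within each stratum of a second variant.

    Each subject is a dict with a boolean "case" flag plus boolean carrier
    flags stored under test_key and stratum_key (hypothetical layout).
    Returns {True: OR among stratum carriers, False: OR among non-carriers}.
    """
    # Cell order: [case carrier, case non-carrier, control carrier, control non-carrier]
    tables = {True: [0, 0, 0, 0], False: [0, 0, 0, 0]}
    for s in subjects:
        cell = (0 if s["case"] else 2) + (0 if s[test_key] else 1)
        tables[bool(s[stratum_key])][cell] += 1

    result = {}
    for stratum, (a, b, c, d) in tables.items():
        # Undefined when a denominator cell is empty.
        result[stratum] = (a * d) / (b * c) if b * c > 0 else float("nan")
    return result
```

A variant whose odds ratio stays below 1 in the stratum lacking the predisposing allele cannot owe its apparent protection solely to negative LD with that allele, which is the reasoning applied to C*03:04~B*40:01 above.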

Hardy–Weinberg equilibrium analyses

The analysis of Hardy–Weinberg equilibrium (HWE) among controls can serve as a test of genotyping and sampling validity, while deviations from HWE among cases can, potentially, reveal patterns of disease association. Adherence to HWE expectations is a requirement for control groups in case–control studies. Among those loci that showed a significant MS association (HLA-A, -B, -C, DRB1, DQB1), no deviation from HWE was observed among controls (data not shown), including HWE analysis for DRB1~DQB1 haplotypes. While studies of HLA diversity in the US population have identified varying degrees of population stratification among non-Hispanic European Americans [65, 66], these Hardy–Weinberg analyses reveal no significant population stratification in this cohort.

Among MS patients, highly significant deviations from HWE were seen for genotypes of DR~DQ haplotypes (p = 0.0027). The two most common genotypes of DRB1~DQB1 haplotypes that contributed to this deviation were DRB1*03:01~DQB1*02:01 homozygotes (15 observed, 8.6 expected; p = 0.016) and DRB1*07:01~DQB1*03:03+DRB1*15:01~DQB1*06:02 heterozygotes (10 observed, 5.3 expected; p = 0.0078), both observed more often than expected among cases. The excess of DRB1*03:01 homozygotes among cases suggests a recessive model for MS risk. Consistent with this interpretation of the HWE deviation, the OR for the homozygous DRB1*03:01~DQB1*02:01 genotype is 5.27 (p = 0.0037) compared to DRB1*03:01~DQB1*02:01+DR~DQ*X (OR = 0.74; p = 0.13), where DR~DQ*X is any haplotype that does not include DRB1*15:01 or DRB1*03:01. The OR for this DRB1*03:01~DQB1*02:01 homozygote is close to that for DRB1*03:01~DQB1*02:01+DRB1*15:01~DQB1*06:02 (OR = 5.55; p = 1.32E−06) and the DRB1*15:01~DQB1*06:02+DRB1*15:01~DQB1*06:02 homozygote (OR = 7.6; p = 1.13E−05).

The excess of observed DRB1*07:01~DQB1*03:03+DRB1*15:01~DQB1*06:02 genotypes among cases suggests that the susceptibility conferred by the DRB1*15:01 haplotype may be “dominant” over the protection conferred by the DRB1*07:01 haplotype. The expected number of cases in the HWE analysis is based on the protective effect of the DRB1*07:01~DQB1*03:03 haplotype over all genotype combinations.

Association analysis of KIR and HLA ligands

HLA ligands

Association analyses for the presence/absence of the KIR loci and their HLA ligands are shown in Table 7 and Supplementary Table S4. As previously reported [38] the HLA ligand Bw4 (Thr or Ile at HLA-B amino-acid position 80) is negatively associated with MS (Table 7A; OR = 0.62; p = 5.95E−04). The OR for Bw4/Bw4 is 0.63 and for Bw6/Bw6 is 1.61. The observed protective effect of Bw4+ alone, however, may be attributed, in part, to negative LD with the highly predisposing DRB1*15:01; when the data are stratified on the presence of DRB1*15:01 (Table 7B), the statistical significance of the Bw4+ effect is diminished in the stratum missing DRB1*15:01 (OR = 0.72; p = 0.08). This interpretation suggests that the observed Bw4 protective association with MS is not necessarily due to the Bw4 signaling via its inhibitory receptor KIR3DL1, but may simply reflect LD patterns between HLA-B and DRB1.

Table 7 Association of KIR loci and HLA ligands with multiple sclerosis

In the association analysis of individual amino acid residues (see below, Table 8C), the Bw4 epitope with Thr at position 80 (Bw4T) shows a protective association (OR = 0.64; p = 0.0003) but the stronger-binding Bw4 epitope with Ile (Bw4I) does not (OR = 0.92; p = 0.56), consistent with the Bw4 association reflecting LD and not ligand-mediated KIR signaling. Association analyses of the Bw4 epitope on some HLA-A molecules (ABw4) reveal no protective effect (data not shown). The frequency of DRB1*15:01 in Bw4+ individuals is 48% in MS patients and 16% in controls, while it is 51% in Bw4- patients and 26% in Bw4- controls, suggesting that the disease risk associated with DRB1*15:01 is not reduced in the Bw4-positive stratum. However, subdividing Bw4 does reveal a difference in the association pattern, and this difference cannot be attributed simply to LD. Both Bw4T and Bw4I are in negative LD with DRB1*15:01 (dij = −0.45 and −0.64 in cases, and −0.79 and −0.92 in controls, respectively) but the negative association of Bw4T with MS remains nominally significant (OR = 0.071; p = 0.032) even in the DRB1*15:01-negative stratum (Table 7C).
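The subdivision of Bw4 described above amounts to a simple classification on the residue at HLA-B position 80. The sketch below encodes only the definition given in this report (Thr gives Bw4T, Ile gives Bw4I); treating every other residue as lacking Bw4 is a simplifying assumption, and the function names and one-letter residue input are illustrative.

```python
def bw4_subtype(residue_80):
    """Classify one HLA-B allele by its position-80 residue (one-letter code)."""
    if residue_80 == "T":
        return "Bw4T"
    if residue_80 == "I":
        return "Bw4I"
    # Assumption: any other residue is treated as not carrying Bw4.
    return "non-Bw4"

def bw4_status(residue_pair):
    """Carrier flags for one individual, given both HLA-B position-80 residues."""
    subtypes = {bw4_subtype(r) for r in residue_pair}
    return {
        "Bw4+": bool(subtypes & {"Bw4T", "Bw4I"}),
        "Bw4T+": "Bw4T" in subtypes,
        "Bw4I+": "Bw4I" in subtypes,
    }
```

Flags of this kind can then serve as the test variant in the unstratified and DRB1*15:01-stratified comparisons.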

Table 8 Association of HLA amino acid positions with multiple sclerosis

HLA ligand with KIR

Since the interaction of specific receptors and their HLA ligands is functional, we analyzed specific combinations of HLA ligands and KIR genotypes (Supplementary Table S4A). To address the issue of LD with DRB1*15:01, we also examined these combinations in the stratum lacking DRB1*15:01 (Supplementary Table S4B). The combination of Bw4 and KIR3DL1 has been reported to be protective in the recent study of MS in African Americans [44], including in the DRB1*15:01-lacking stratum. In our data set, Bw4 is protective in the presence of KIR3DL1, a gene present on virtually all KIR haplotypes (OR = 0.62; p = 6.12E−04), but also protective in the presence of KIR2DL3 (OR = 0.58; p = 9.12E−05). However, following stratification on DRB1*15:01, Bw4 and KIR3DL1 are no longer significantly protective (OR = 0.75; p = 0.11) in the DRB1*15:01-negative stratum. The protective association with Bw4 and KIR2DL3, however, is still nominally significant (OR = 0.62; p = 0.010). At the KIR genotype level, one specific combination in this DRB1*15:01-negative stratum (Bw4+ and KIR2DL2/KIR2DL3) shows a nominally significant protective association (OR = 0.59; p = 0.017), but Bw4+ with KIR2DL2/KIR2DL2 (OR = 2.17; p = 0.051), or with KIR2DL3/KIR2DL3 (OR = 0.91; p = 0.63) does not. Given the multiple comparisons in this association analysis, replication in another cohort will be critical in validating this observation.

Association analyses of individual amino acids

The association analyses of individual amino acids in the HLA class I and class II genes can potentially reveal functionally important aspects of disease associations. Several statistically significant associations are shown in Table 8A and dissected in Tables 8B and 8C.

Table 8B shows the individual DRB1 exon 2-encoded amino acid residues associated with MS. Pro at DRβ position 11 and Arg at position 13 are significantly associated with MS (OR = 3.23; p = 2.22E−16) but these specific residues are unique to DRB1*15 and *16 alleles and reflect the association of DRB1*15:01. The less common DRB1*15:02 and DRB1*16:01 alleles found in this population share this amino acid motif but do not confer risk to MS. Position 86 Val is also associated with MS (OR = 2.15; p = 1.56E−14). Many DRB1 alleles that are not associated with MS also encode Val-86 but the Val-Gly dimorphism at position 86 is the only difference between highly susceptible DRB1*15:01 and neutral DRB1*15:02. Position 86 contributes to peptide binding pocket 1, underscoring the role of position 86 dimorphism in determining peptide specificity.

Association analyses of individual HLA class I-encoded amino acid residues that constitute the KIR ligand epitopes are shown in Table 8C. As noted above, the HLA-B position 80 Bw4T subtype is protective while Bw4I, thought to be a stronger-binding ligand of KIR3DL1, is not. The modest protective association of Bw4T is not due to negative LD with DRB1*15:01, as it remains nominally significant even in the DRB1*15:01-negative stratum (Table 7C).

The HLA-C positions 77 and 80, which encode the C1 and C2 KIR ligands, are not associated with MS but, interestingly, amino acid positions 73–90, which influence the strength of KIR ligand binding [67], are significantly associated. The OR for the 73–77–80–99 motif (A~S~N~D) for the C2 epitope is 1.63 (p = 2.3E−06). This motif, however, is in LD with HLA-C*07:02 (OR = 1.9; p = 2.11E−06), the HLA-C allele in LD with DRB1*15:01. Thus, the observed association of the A~S~N~D C2 motif probably reflects LD rather than KIR signaling. The same motif is present in C*07:01 and *07:04, alleles not associated with MS, consistent with this interpretation.

HLA-A*02:01 and Bw4

The strong protective associations of C*03:04~B*40:01 and A*02:01 do not appear to reflect LD with DRB1*15:01 or the Bw4 ligand group. The A*02:01 protective association with MS has been previously reported in various populations [14, 57,58,59,60,61]. In a recent study of African-American MS, the combined presence of KIR3DL1 and Bw4, its ligand, was protective, and the protective association for A*02 was attributed to LD with Bw4 [44]. This interpretation suggests that innate immunity and NK cell function, regulated by the Bw4 ligand, account for the observed negative association with A*02:01.

Our data suggest that A*02:01 is associated with protection from MS in European Americans, and that the protection conferred by A*02:01 in combination with C*03:04~B*40:01 (OR = 0.15; p = 6.51E−05) is stronger than the observed negative association with Bw4 presence (OR = 0.62; p = 5.95E−04). In our study, LD between A*02:01 and the Bw4 epitope is modest (dij = 0.17 in MS patients and 0.18 in controls), and much lower than the LD of C*07:02~B*07:02 with DRB1*15:01 in MS patients (0.71) or of A*02:01 with C*03:04~B*40:01 in controls (0.34). The A*02:01~Bw4 haplotype is as protective as Bw4 presence (OR = 0.62; p = 1.69E−03) (Table 9), but A*02:01~Bw4T haplotypes are more protective (OR = 0.53; p = 7.55E−04), while A*02:01~Bw4I haplotypes are not, consistent with Table 8C.

Table 9 Association of HLA-A*02:01 and Bw4 with multiple sclerosis

LD between A*02:01 and Bw4T is comparable to that between A*02:01 and Bw4 (dij = 0.16 in MS patients and controls), whereas LD is much stronger between A*02:01 and C*05:01~B*44:02 (0.59 in MS patients and 0.62 in controls). Of the HLA-B alleles in protective C~B haplotypes, B*40:01 encodes Bw6, while B*44:02 encodes Bw4T; the protection associated with Bw4T may reflect, in part, the protective C*05:01~B*44:02 haplotype (and perhaps other Bw4T-encoding HLA-B alleles).

Discussion

We have identified multiple HLA class I and class II alleles and haplotypes associated with MS. Strong LD is a characteristic of the HLA region, and we investigated allele-pair LD and conditional asymmetric LD, and applied stratification analysis to adjust for LD in order to dissect and interpret these associations. In addition to standard case–control association analyses, we applied Hardy–Weinberg equilibrium analyses to cases and controls to validate our association findings. Many immune-related genes in the MHC were not analyzed in this study; given the LD known for the MHC, our analyses do not exclude these genes as potentially playing roles in MS susceptibility. However, association analysis, following stratification, proved effective at identifying the independent effects of specific HLA alleles and haplotypes. As reported in many previous studies, the DRB1*15:01~DQB1*06:02 haplotype is most strongly associated with MS risk; DRB1*03:01~DQB1*02:01 is also significantly associated with a recessive effect on MS risk and, as expected, is very strongly associated with MS in the DRB1*15:01-negative stratum. NGS HLA typing allowed the analysis of the DRB3 allelic diversity on DRB1*03:01 haplotypes, and our analyses suggest that DRB3*02:02 may confer higher risk than DRB3*01:01, but this observation must be tested in a larger study.

A*02:01, C*03:04~B*40:01 and the haplotype carrying all three alleles show very strong protective associations (OR = 0.15 for the three-locus haplotype) with MS, independent of LD with DRB1*15:01. The protective association of the A*02:01~C*03:04~B*40:01 haplotype displays the strongest effect size of the observed HLA associations in this study.

For the HLA ligands of the KIR, the presence of Bw4 was negatively associated with MS in the unstratified data set, as noted in previous reports, but was no longer significant in the stratum lacking DRB1*15:01. While this observed association may simply reflect negative LD between Bw4 and DRB1*15:01 in this population, the two Bw4 subtypes, Bw4T and Bw4I, showed different association patterns. The protective association of Bw4T remained nominally significant even in the DRB1*15:01-negative stratum, while Bw4I was not associated in either stratum. The Bw4 motif on HLA-A molecules (all of which are Bw4I) was also not significantly protective. From the available data, we cannot distinguish between a potential effect on peptide binding mediated by this Thr/Ile polymorphism in HLA-B pocket F, differential signaling via the KIR3DL1 receptor, or a combination of the two. A recent study of HIV infection indicates that the binding of a specific HIV peptide can influence the interaction of the Bw4 epitope with the KIR3DL1 receptor [68]. The difference between an uncharged, polar side chain (Thr) and an aliphatic side chain (Ile) may influence peptide binding, and through differential peptide binding, KIR3DL1 signaling.

In investigating different HLA ligand/KIR genotype combinations in the DRB1*15:01-negative stratum, the strongest protective Bw4 association we observed was in combination with KIR2DL2/KIR2DL3, which is stronger than that of Bw4 in combination with KIR3DL1. This protective association was nominally significant but, given the number of comparisons, validation of this observation requires testing in another large cohort. The immunological mechanism underlying the Bw4T protective association remains unclear.

Many other amino acid positions were implicated in our analyses, but, as in all HLA-related association studies, they must be considered in the context of LD. Some disease associated amino acid residues simply “tag” an allele, recapitulating an already well-established allele association. These associations, the report of Raychaudhuri and colleagues notwithstanding [69], do not increase our functional understanding of HLA-related disease association. However, other individual amino acid associations that do not correspond uniquely to specific alleles may provide some functional insights, although the peptide binding properties of HLA molecules are obviously determined by multiple amino acid residues. In general, the potential role of individual amino acids in disease associations can be best evaluated by comparing alleles that differ in disease risk, and differ in only one amino acid position.

For example, DRB1 alleles encode either Gly or Val at DRβ position 86; this position contributes to peptide binding pocket 1 [70], which anchors the N-terminal end of the bound peptide [71]. Positions 82 and 89 also contribute to pocket 1, but are invariant in this data set and in most HLA alleles. Neither 86G nor 86V tag a specific allele, but the predisposing DRB1*15:01 allele (86V) and the neutral DRB1*15:02 allele (86G) differ only at encoded position 86. In Table 8B, position 86V (OR = 2.15; p = 1.56E−14) was implicated as potentially being functionally related to the observed association of DRB1 with MS.

Finally, the non-Hispanic European American cohort in this study represents a “pan-European” population, and as such may be subject to population stratification. However, our Hardy–Weinberg analyses revealed no significant population stratification in this cohort. In addition, the frequencies of key alleles and haplotypes (e.g., HLA-B*18:01 and the A1~B8~DR3 haplotype) in our cohort are consistent with those observed across Europe [72], as opposed to the very high frequencies observed for these variants in specific European populations, again suggesting that stratification in this cohort is minimal.

Conclusions

Some associations of specific HLA alleles, e.g., the strong protective effect of the C*03:04~B*40:01 haplotype, remain highly significant following stratification on DRB1*15:01. In general, the results of these analyses indicate that a careful consideration of LD patterns among HLA alleles is essential in the interpretation of MS association data. Overall, we conclude that specific HLA class I polymorphisms are protective for MS, independent of the strong MS predisposition conferred by the DRB1*15:01 allele.

Materials and methods

Samples

Blood samples were collected for 412 MS patients of self-identified non-Hispanic European ancestry, and 419 healthy, ethnically matched controls. MS patients were diagnosed by neurologists specialized in demyelinating diseases in accordance with well-established diagnostic and study inclusion criteria [73]. Controls were of self-identified non-Hispanic European ancestry and reported no history of chronic diseases for themselves or their nuclear family. De-identified genomic DNA was extracted using a standard desalting method and quantitated in duplicate using the PicoGreen dsDNA quantitation reagent. Coded DNA aliquots are stored at −80 °C. Study protocols were approved by the UCSF Committee on Human Research and informed consent was obtained from all participants.

Genotyping

Locus-specific genotyping for the 14 KIR loci was performed as previously described [74, 75]. Next-generation sequencing of HLA class I exons 2, 3, and 4, HLA class II exon 2, and HLA class II exon 3 (for the DQB1 locus) on the Roche (Pleasanton, CA, USA) 454 GS FLX instrument was used to genotype HLA-A, -C, -B, DRB1, DRB3/4/5, DQA1, DQB1, and DPB1 alleles [76,77,78,79]. NGS HLA sequences were assigned to HLA alleles on the basis of reference sequences in IMGT/HLA Database version 3.1.0 (17 July 2010) using Conexio (Fremantle, Australia) Assign ATF version 1.1.0.35.

HLA genotyping was blinded with respect to MS patients and controls, with 15% of specimens retyped for quality assurance purposes; our NGS HLA genotypes were verified independently via HistoGenetics (Ossining, NY, USA) [80] with >98% concordance. Discordant typings were reviewed and retyped, and the final data set was 100% concordant between the two NGS methods. Subject disease status (case/control) was released for analysis only after the genotyping was completed.

Data analysis

Tests of association

We applied locus-level tests of heterogeneity and variant-level χ² tests of association at the genotype, haplotype, locus and individual amino acid levels using BIGDAWG (v1.8.1) [81]. In these tests, each multi-gene group (e.g., HLA-C~HLA-B), individual gene (e.g., HLA-DRB1) and inferred polymorphic amino acid position (e.g., DRβ position 86) was treated as a locus, and individual haplotypes (e.g., HLA-C*07:02~HLA-B*07:02), alleles (e.g., DRB1*03:01) and amino-acid residues (e.g., DRβ position 86 V) were treated as variants. For each comparison, variants with expected counts <5 in cases or controls were combined into a common “binned” category for analyses [82].
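The binning rule can be illustrated with a short sketch. Expected counts are taken from the usual contingency-table formula (row total times column total, divided by the grand total), and any variant whose expected count falls below the threshold in either group is pooled. This is a single-pass illustration under those assumptions, with hypothetical function and variable names; it is not the BIGDAWG code.

```python
def bin_rare_variants(case_counts, control_counts, min_expected=5.0):
    """Pool variants with low expected counts into a single 'binned' category.

    case_counts and control_counts map variant name -> observed count.
    Returns two new dicts (cases, controls) with rare variants pooled.
    """
    variants = sorted(set(case_counts) | set(control_counts))
    n_cases = sum(case_counts.values())
    n_controls = sum(control_counts.values())
    total = n_cases + n_controls

    binned_cases, binned_controls = {}, {}
    for v in variants:
        a = case_counts.get(v, 0)
        c = control_counts.get(v, 0)
        # Expected count = row total * column total / grand total.
        exp_cases = n_cases * (a + c) / total
        exp_controls = n_controls * (a + c) / total
        key = v if min(exp_cases, exp_controls) >= min_expected else "binned"
        binned_cases[key] = binned_cases.get(key, 0) + a
        binned_controls[key] = binned_controls.get(key, 0) + c
    return binned_cases, binned_controls
```

Pooling keeps the χ² approximation reasonable by avoiding sparse cells, at the cost of losing resolution for rare alleles and haplotypes.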

We measured interaction between KIR and HLA loci by applying a χ2 test to contingency tables that crossed disease phenotype with genotype, where genotype was defined as a given KIR-HLA combination. Specifically, we tested dominant and additive effects of KIR genes and their ligands at all biallelic loci in the overall cohort in addition to sub-cohorts defined by presence of DRB1*15:01. From these contingency tables, we calculated odds ratio with 95% confidence intervals, and p-values.
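As an illustration of the contingency-table calculations described above, the following Python sketch computes an odds ratio, a Woolf (log-based) 95% confidence interval and a χ2 p-value for a single 2 × 2 table. The counts are hypothetical, and the sketch assumes an uncorrected Pearson χ2 test with one degree of freedom; the section does not state whether a continuity correction was applied.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Odds ratio, Woolf 95% CI, chi-square statistic and p-value for a 2 x 2 table.

    a, b: cases with / without the KIR-HLA genotype
    c, d: controls with / without the KIR-HLA genotype
    """
    if min(a, b, c, d) == 0:
        raise ValueError("All four cells must be non-zero for this sketch.")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    n = a + b + c + d
    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (lower, upper), chi2, p_value


# Hypothetical counts, for illustration only (not study data).
or_, ci, chi2, p = odds_ratio_2x2(60, 340, 90, 310)
print(round(or_, 3), tuple(round(x, 3) for x in ci), round(chi2, 3), p)
```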

Tests of Hardy–Weinberg equilibrium

We performed tests for deviations from Hardy–Weinberg equilibrium (HWE) proportions using BIGDAWG and PyPop (v0.7.0) [83], assessing genotype proportions for both individual loci and specific haplotypes (using haplotypes assigned to individuals in BIGDAWG on the basis of posterior probabilities). We identified significant locus-level HWE deviations using Guo and Thompson’s exact method [84], and identified individual genotypes deviating significantly from HWE expectations using Chen’s method [85, 86], with a significance threshold of 0.05.
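The HWE expectations against which observed genotypes are compared can be sketched as follows. This Python example only derives expected genotype counts from allele frequencies; it does not implement the exact test or the individual-genotype test cited above, and the function name and genotype data are hypothetical.

```python
from collections import Counter


def hwe_expected_counts(genotypes):
    """Observed and HWE-expected genotype counts for a list of (allele, allele) pairs."""
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: count / (2 * n) for allele, count in allele_counts.items()}
    observed = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    alleles = sorted(freqs)
    expected = {}
    for i, first in enumerate(alleles):
        for second in alleles[i:]:
            product = freqs[first] * freqs[second]
            # Homozygotes: n * p^2; heterozygotes: n * 2pq.
            expected[(first, second)] = n * (product if first == second else 2 * product)
    return observed, expected


# Hypothetical genotypes, for illustration only (not study data).
toy_genotypes = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
observed, expected = hwe_expected_counts(toy_genotypes)
print(dict(observed), expected)
```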

Evaluation of linkage disequilibrium

We calculated normalized LD values (dij) [87] for individual haplotypes with PyPop, and calculated cALD values, evaluating LD between sets of loci, using the “asymLD” R package (v0.1) [88]. Values of dij range from −1, when the haplotype is never observed, to 1, describing the maximum possible LD based on the frequencies of the constituent alleles. The cALD measure WA/B is the correlation coefficient for alleles at locus A conditioned on the alleles at locus B, and describes the overall variation of alleles at locus A, given specific alleles at locus B. WB/A is the correlation coefficient for alleles at locus B, conditioned on the alleles at locus A, and describes the overall variation of alleles at locus B, given specific alleles at locus A [89]. When there are equal numbers of alleles and complete allele correlation between both loci, the value of WA/B and WB/A is 1, indicating no variation of alleles between loci.
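For illustration, a normalized haplotype-level LD value of this kind can be sketched in Python. The sketch assumes that dij follows the usual normalization of the disequilibrium coefficient by its maximum attainable magnitude given the allele frequencies; it is not the PyPop code, and the frequencies shown are hypothetical.

```python
def normalized_ld(hap_freq, p_i, q_j):
    """Normalized LD for one haplotype.

    hap_freq: frequency of the haplotype carrying allele i and allele j
    p_i, q_j: frequencies of allele i (locus A) and allele j (locus B)
    """
    d = hap_freq - p_i * q_j
    if d >= 0:
        d_max = min(p_i * (1 - q_j), (1 - p_i) * q_j)
    else:
        d_max = min(p_i * q_j, (1 - p_i) * (1 - q_j))
    # Undefined when an allele is fixed or absent; return 0.0 in this sketch.
    return d / d_max if d_max > 0 else 0.0


# Hypothetical frequencies, for illustration only.
never_observed = normalized_ld(0.0, 0.1, 0.2)   # haplotype absent
always_together = normalized_ld(0.1, 0.1, 0.2)  # allele i found only with allele j
print(never_observed, always_together)
```

The two toy calls correspond to the extremes described above: a haplotype that is never observed, and a haplotype at the highest frequency permitted by its constituent alleles.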

Corrections for multiple comparisons

For locus-level χ2 tests of heterogeneity involving individual loci (e.g., HLA-A and DPB1) and haplotypes of loci (e.g., C~B and DRB1~DQB1), the threshold of significance was calculated as 0.05/n, where n is the number of comparisons. We note that these comparisons are not necessarily independent (e.g., the HLA-A locus is included in four comparisons), so these thresholds can be considered overly conservative.

Our tests of the specific hypothesis that the protective effect of HLA-A*02:01 is due to LD with the Bw4 motif [44] are addressed separately from other locus-level and haplotype-level comparisons. These tests pertained to the A~Bw4/Bw6 and A~HLA-B Position 80 amino-acid variant haplotypes. Similarly, our tests of Bw4 Thr and Ile subtypes in DRB1*15:01-positive and DRB1*15:01-negative strata address our observation that HLA-B position 80T is associated with MS, whereas position 80I is not, and our tests of DRB1 alleles in DRB1*15:01-positive and DRB1*15:01-negative strata address the observation that DQA1*01:01 (found on the DRB1*01:01~DQB1*05:01 haplotype) is protective only in the presence of DRB1*15:01 [53]. The threshold of significance for both of these pairs of locus-level χ2 tests of heterogeneity was calculated as 0.05/2 (2.5E−2).

For χ2 tests of heterogeneity of amino-acid positions, the threshold of significance was calculated for each individual locus as 0.05/n, where n is the number of variant amino-acid positions at that locus. Results are not presented for positions that did not display significant position-level heterogeneity.

In cases where locus-level tests of heterogeneity were not significant (p > the threshold of significance), the threshold of significance for the χ2 tests of association was calculated as 0.05/n, where n equals the number of variants at that locus.
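The 0.05/n thresholds used throughout this subsection amount to a Bonferroni-style correction, which can be written as a short Python sketch. The function name is hypothetical, and the values of n shown are examples rather than the numbers of comparisons made in the study, apart from the pair of tests (n = 2) mentioned above.

```python
def corrected_threshold(n_comparisons, alpha=0.05):
    """Significance threshold alpha / n for n comparisons."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# A pair of locus-level tests, as for the stratified comparisons described above.
pair_threshold = corrected_threshold(2)

# Hypothetical locus with 20 variant amino-acid positions (illustrative value only).
position_threshold = corrected_threshold(20)

print(pair_threshold, position_threshold)
```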

Statistical power analysis

We used the pwr.chisq.test function in the R “pwr” package (version 1.2–0) to evaluate the size of an effect detectable in our data set with the recommended statistical power (1-β) of 0.8 with an α of 0.05 [90]. For association tests of alleles and haplotypes, with 31 allele categories, we expect to detect small effect sizes (0.121). For tests of locus presence, motifs and amino acid positions, with 2–5 categories, we expect to detect very small effect sizes (0.068–0.085).
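For reference, the effect size reported by pwr.chisq.test is Cohen's w. The definition below is the standard one and is given here only as a reading aid; the symbols are introduced for illustration, and the total sample size N and degrees of freedom supplied to the function are not stated in this section.

```latex
w = \sqrt{\sum_{i=1}^{k} \frac{\left(p_{1i} - p_{0i}\right)^{2}}{p_{0i}}},
\qquad
\lambda = N\,w^{2}
```

Here p_{0i} and p_{1i} are the proportions in cell i under the null and alternative hypotheses, k is the number of cells, and λ is the noncentrality parameter of the χ2 distribution under the alternative. Given N, the degrees of freedom, α and the target power, the function solves for the smallest w that can be detected.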

Data access

The HLA and KIR genotype data used for the analyses described here have been deposited into ImmPort (http://www.immport.org), the public data-sharing resource of the National Institute of Allergy and Infectious Diseases (NIAID) Division of Allergy, Immunology, and Transplantation (DAIT) and Division of Microbiology and Infectious Diseases (DMID), and can be accessed under the ImmPort Study Accession Number SDY1045 (https://doi.org/10.21430/M3QW34U2SG).

Code availability

The source code for BIGDAWG is available online at https://cran.r-project.org/web/packages/BIGDAWG/index.html and https://github.com/IgDAWG/BIGDAWG, with version 1.8.1 code at https://github.com/IgDAWG/BIGDAWG/tree/eb0b4140ec3fb85b1a4fba5826ffc9f9e3239d10.

The source code for asymLD v0.1 is available online at https://cran.r-project.org/web/packages/asymLD/index.html.

The source code for PyPop is available online at https://github.com/alexlancaster/pypop, with version 0.7.0 code at https://github.com/alexlancaster/pypop/tree/3f29d4b53548ce4deb60a5960368627999396653.