Abstract
Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of the class I HLA-A, -B and -C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these 'hotspot' sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
Change history
01 October 2015
In the version of this article initially published online, there were errors in three equations in the first page of Online Methods, in the section “Allele inference”: “= ei/3 otherwise” should have been on a separate line; the equation beginning with “P (D = dk)” was missing an equal sign immediately after this expression; and in the equation starting with Lm, the fifth summation sign was missing “k = 1”. On p. 3, under the first subheading on the right-hand side, “ovarian cancer (n = 432)” should have read “thyroid cancer (n = 486).” In addition, the citation for Supplementary Software was missing. The errors and omission have been corrected for the print, PDF and HTML versions of this article.
Acknowledgements
C.J.W. is a Scholar of the Leukemia and Lymphoma Society and acknowledges support from the Blavatnik Family Foundation, American Association for Cancer Research (AACR) (SU2C Innovative Research Grant), National Heart, Lung, and Blood Institute (NHLBI) (1RO1HL103532-01) and National Cancer Institute (NCI) (1R01CA155010-01A1). This work has made extensive use of data generated by TCGA, a project of the National Cancer Institute and National Human Genome Research Institute. We thank E. Hodis for providing access to the melanoma data. We would also like to thank C. McCowan (Broad Technology Labs), T. Shea (Broad Technology Labs), S. Young (Broad Technology Labs) and M. Weiand (Pacific Biosciences) for their help in setting up, performing and analyzing data using Pacific Biosciences RSII instruments. We are grateful to E. Fritsch for critical reading of the manuscript and providing valuable feedback.
Author information
Contributions
C.J.W. proposed the initial idea of using exome data for HLA typing. S.A.S., G.G., C.J.W., P.M.D. and K.C. conceived and designed Polysolver and the mutation detection pipeline. G.T. and A.K. developed the ethnicity inference module. M.S.L. and S.A.S. performed the mutation significance analysis. V.B., C.J.W. and S.A.S. mapped the contact residue mutations. M.S.R. and N.H. performed the gene expression analysis. J.S., W.J.L., S.S. and J.L.D. performed the experimental validation. C.S. helped with data access and management. C.J.W., S.A.S., G.G. and M.R. wrote the manuscript. C.J.W. and G.G. led the project.
Ethics declarations
Competing interests
A patent application has been filed on this work by the Broad Institute with S.A.S., C.J.W. and G.G. as authors. G.G. is an inventor of Mutect and MutSig, which were used in this work. C.J.W. and N.H. are founders of Neon Therapeutics.
Integrated supplementary information
Supplementary Figure 1 GC%, coverage and informative sites in HLA genes in 8 CLL samples.
(a) A significant negative correlation was observed between GC content and exome coverage (1-way ANOVA, P = 1.6 × 10⁻⁷). Mapping was carried out using BWA with the following parameters: aln task, -q 5 -l 32 -k 2 -o 1; sampe task, -a 300. (b) GC-rich regions of HLA genes have a relative over-abundance of informative (variant) sites (1-way ANOVA, P = 0.0197). (c) Detailed view of GC%, coverage and informative site density in each HLA gene from 1 representative CLL sample. Top row: the x-axis represents the chr6 location. The mid-panel dashed black segments represent exons. GC% (green) decreases in the 5′→3′ direction (HLA-B and HLA-C are located on the negative strand). Coverage (blue) has an opposite trend and increases in the 5′→3′ direction. The informative site density (red) was evaluated as the number of variant sites located in a 50 bp window, and tracked with GC%. Bottom row: the coverage distribution at the variant positions in each of HLA-A, -B and -C.
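As a rough illustration of the quantities plotted in panel (c), the sketch below computes GC% and the number of variant sites per 50 bp window. The function names, the 0-based coordinates and the choice of consecutive non-overlapping windows are assumptions made for this example; the caption does not state whether the original analysis used sliding or fixed windows.

```python
def gc_percent(seq):
    """Percentage of G/C bases in seq (case-insensitive); 0.0 for empty input."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def window_stats(seq, variant_positions, window=50):
    """Return (start, gc_percent, n_variant_sites) for each window.

    Assumption: windows are consecutive and non-overlapping, and
    variant_positions are 0-based offsets into seq.
    """
    variants = sorted(variant_positions)
    stats = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        n_var = sum(1 for p in variants if start <= p < end)
        stats.append((start, gc_percent(seq[start:end]), n_var))
    return stats


# Toy 100 bp sequence: a GC-only half followed by an AT-only half.
toy_seq = "GC" * 25 + "AT" * 25
toy_stats = window_stats(toy_seq, [3, 10, 60])
```

On real data the per-window GC% and variant counts could then be compared across windows, which is the relationship the caption describes as the informative site density tracking GC%.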
Supplementary Figure 2 Specificity of different tag length libraries for retrieval of HLA reads.
Tag libraries spanning a broad range of tag lengths were evaluated for their specificity for the HLA-A, -B and -C genes. Because the reads were 76-mer paired-end reads, we selected a 38-mer tag library, which ensured 100% sensitivity in the context of downstream processing, with 23.3% specificity for class I HLA genes.
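The idea of a tag library can be sketched as a k-mer lookup: collect every k-mer from the known allele sequences, then keep any read that shares at least one k-mer with that set. The code below is only a minimal sketch under that assumption; the function names are hypothetical, exact matching on either strand is assumed, and it is not the retrieval code distributed with the Supplementary Software.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_tag_library(allele_seqs, k=38):
    """Collect every k-mer ("tag") found in the given allele sequences."""
    tags = set()
    for seq in allele_seqs:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            tags.add(s[i:i + k])
    return tags


def read_matches(read, tags, k=38):
    """True if the read, on either strand, contains at least one tag."""
    for candidate in (read.upper(), reverse_complement(read)):
        for i in range(len(candidate) - k + 1):
            if candidate[i:i + k] in tags:
                return True
    return False


# Toy example with k=5 so the sequences stay readable.
toy_tags = build_tag_library(["ACGTACGTTAGC"], k=5)
```

A shorter tag retrieves more reads, including reads from paralogous genes, while a longer tag risks missing reads that carry a variant near the middle; with 76-mer reads, a 38-mer tag is half the read length.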
Supplementary Figure 3 Ethnicity inference using PCA (HapMap samples).
Ethnicities of 132 of 133 HapMap samples were inferred correctly based on their projection in the 2-dimensional space defined by the first two principal components; the remaining sample, NA12878, was removed from the PCA step as an outlier. The colored icons show the clustering of the 1,398 training samples belonging to four different ethnic groups. The black icons depict the projection of the 132 HapMap samples in this space. The success rate for attributing the correct ethnicity to each projected sample was 100%.
Supplementary Figure 4 Characteristics of HLA mutations detected by POLYSOLVER across 7,930 samples.
(a) Allelic fractions of all 298 detected HLA somatic changes. The median allele fraction across somatic changes was 33% (interquartile range: 16–58%). Most of these mutations are likely heterozygous. (b) Frequency of HLA mutations in samples. 240 of 266 (90.2%) samples with HLA mutations had only a single somatic event, 20 had two and 6 samples (4 colon, 1 stomach and 1 uterine) had 3 distinct HLA mutations. (c) Frequency of cases per recurrently mutated site. 57 of 64 recurrently mutated sites were mutated in only 2 to 4 specimens. Residues 25, 299, 7 and 209 were found to be highly recurrent, with 7, 9, 11 and 24 distinct individuals, respectively, harboring mutations at these four positions. (d) Length-normalized distribution of HLA mutations across functional domains. A strong preference of potentially loss-of-function events (nonsense, frameshift indels, splice site mutations) for exon 1 is observed.
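The tally in panel (c) amounts to counting, for each site, how many distinct individuals carry a mutation there. A minimal sketch follows, assuming mutation calls are available as (patient, residue) pairs; the input layout, the function name and the threshold of two individuals are assumptions for illustration rather than details taken from the paper.

```python
from collections import defaultdict


def recurrent_sites(mutations, min_patients=2):
    """Map each residue position to its number of distinct mutated patients.

    mutations: iterable of (patient_id, residue_position) pairs
               (hypothetical input layout).
    Only sites mutated in at least min_patients distinct patients are kept,
    so repeated calls from one patient do not inflate the count.
    """
    patients_by_site = defaultdict(set)
    for patient_id, residue in mutations:
        patients_by_site[residue].add(patient_id)
    return {
        site: len(patients)
        for site, patients in patients_by_site.items()
        if len(patients) >= min_patients
    }


# Toy calls: P2 is listed twice at residue 209 but is counted once.
toy_calls = [("P1", 209), ("P2", 209), ("P2", 209),
             ("P3", 7), ("P4", 25), ("P5", 25)]
toy_recurrent = recurrent_sites(toy_calls)
```

Counting distinct individuals rather than raw calls matters here because a single tumor can carry more than one HLA mutation, as panel (b) shows.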
Supplementary Figure 5 Genes with significantly reduced expression in HLA mutant samples across tumor types.
More than 80 genes were identified pan-cancer (P < 10⁻¹⁰); however, a coherent theme was not evident among them.
Supplementary information
Supplementary Figures and Notes
Supplementary Figures 1–5 and Supplementary Notes 1–5 (PDF 3631 kb)
Supplementary Tables
Supplementary Tables 1–16 (XLSX 7128 kb)
Supplementary Software
Supplementary Software (ZIP 80387 kb)
About this article
Cite this article
Shukla, S., Rooney, M., Rajasagi, M. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol 33, 1152–1158 (2015). https://doi.org/10.1038/nbt.3344
DOI: https://doi.org/10.1038/nbt.3344