Introduction

ZFP57 is a Krüppel-associated box (KRAB)-containing zinc-finger protein that is preferentially expressed early in development.1 In the mouse, ZFP57 acts as a transcriptional regulator and is important for the maintenance of imprinting,2, 3 regulating chromatin modifications and DNA methylation at imprinted loci in ES cells together with the co-factor KRAB-associated protein 1 (KAP1/TRIM28).4 ZFP57 loss-of-function mutations are known to cause a hypomethylation disorder presenting as transient neonatal diabetes (TND), associated with a distinctive epigenetic profile at the TND differentially methylated region and at other imprinted loci such as GRB10 and PEG3,5 but the biology of ZFP57 in humans is otherwise not well characterized. We recently showed that ZFP57 expression depends on underlying genetic variation.6 Given the location of ZFP57 in the MHC class I region, we sought to fine-map this association and investigate its relationship with disease.

Materials and methods

Volunteer recruitment, cell purification, cell culture, RACE, genotyping, imputation, eQTL mapping and comparison with reported GWAS associations were performed as detailed in the Supplementary Information.

Results

We aimed to define the genetic modulators of ZFP57 transcription by expression quantitative trait locus (eQTL) mapping. Alternatively spliced isoforms are well characterized for murine Zfp57,1 and a number of isoforms are annotated in humans (Supplementary Figure S1). To account for this when quantifying ZFP57 expression, we first characterized its transcription in lymphoblastoid cell lines (LCLs) and peripheral blood mononuclear cells (PBMCs) from volunteers identified as expressing ZFP57. Rapid amplification of cDNA ends (RACE) using 3′- and 5′-adapted cDNA from the COX LCL, known to be a high expresser of ZFP57,6 together with PCR using exon-spanning primers, revealed a previously unrecognized isoform in which exon 2 is skipped; this isoform is predicted to encode a significantly truncated KRAB domain (Supplementary Figure S1). Quantification of ZFP57 using isoform-specific primers, or primers spanning exons 3/4 to capture both isoforms, revealed low but detectable expression in PBMCs, ES cells and several adult tissues, notably the thymus and kidney (Supplementary Figure S1). The relative abundance of the different isoforms was consistent across tissues and individuals (Supplementary Figure S1).

We then performed eQTL mapping in a cohort of 288 healthy volunteers,7 using primers spanning exons 3/4 to quantify transcript abundance in PBMCs. After processing and quality-control filtering, we analysed 651 210 SNP markers in 283 individuals. This revealed a major eQTL for ZFP57, with the most significantly associated SNP, rs375984 (P=9.3 × 10^−50), located in the second intron of ZFP57 (Figure 1a and b). Analysis of purified monocytes from the same volunteers confirmed a strong eQTL, again with the most significant association at rs375984 (P=3.2 × 10^−11; Supplementary Figure S2). To refine the signal, we imputed 19 129 additional SNPs within 250 kb, which revealed three variants in the first intron of ZFP57 that were more strongly associated than rs375984 and in perfect LD with one another (rs416568, rs365052 and rs2747431, P=4.6 × 10^−52; Figure 1c and d). We examined the functional genomic landscape of these expression-associated SNPs (eSNPs) using data from the ENCODE project.8 Analysis of DNase-seq and ChIP-seq data sets identified rs365052 as a candidate regulatory variant located in a DNase I hypersensitive site with evidence of CTCF binding (Figure 1d).
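The single-marker scan behind an eQTL analysis like this can be illustrated with a minimal sketch, assuming an additive model in which expression is regressed on allele dosage (0, 1 or 2 copies of the coded allele) separately for each SNP. The software, covariates and P-value method actually used are described in the Supplementary Information and may differ; the function names, SNP identifiers and toy values below are hypothetical, and P-values use a normal approximation to the t distribution, which is reasonable for cohorts of a few hundred but not exact.

```python
import math
from typing import Dict, List, Sequence, Tuple


def additive_eqtl(dosages: Sequence[float], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Regress expression on allele dosage (0, 1 or 2 copies); return (beta, t, approx_p)."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype/expression vectors with at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: genotype has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    syy = sum((y - mean_y) ** 2 for y in expression)
    beta = sxy / sxx                      # least-squares slope per extra allele copy
    rss = max(syy - beta * sxy, 0.0)      # residual sum of squares, clamped against rounding
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    t = beta / se if se > 0 else math.inf
    # Two-sided P from a normal approximation to the t distribution (assumption).
    p = math.erfc(abs(t) / math.sqrt(2))
    return beta, t, p


def scan_snps(genotypes: Dict[str, Sequence[float]],
              expression: Sequence[float]) -> List[Tuple[str, float, float]]:
    """Test every SNP independently and return (snp_id, beta, p) sorted by P-value."""
    results = []
    for snp_id, dosages in genotypes.items():
        beta, _t, p = additive_eqtl(dosages, expression)
        results.append((snp_id, beta, p))
    return sorted(results, key=lambda r: r[2])


# Toy, made-up data: two hypothetical SNPs typed in six individuals.
toy_expression = [1.0, 1.2, 3.1, 2.9, 5.0, 5.1]
toy_genotypes = {"snpA": [0, 0, 1, 1, 2, 2], "snpB": [0, 1, 2, 0, 1, 2]}
for snp_id, beta, p in scan_snps(toy_genotypes, toy_expression):
    print(f"{snp_id}\tbeta={beta:.3f}\tP={p:.2e}")
```

Sorting by P-value mirrors how the lead SNP is reported; in a real scan the same test would be applied to every typed or imputed marker, with imputed dosages allowed to take non-integer values.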

Figure 1

Genetic modulators of ZFP57 expression. (a) Manhattan plot showing strength of association for ZFP57 expression, plotted as –log10(P) values by chromosome. (b) Scatter/box-and-whisker plot of ZFP57 expression by rs375984 genotype, showing significant differences between genotype groups (P<0.0001, Kruskal–Wallis test). (c) Local association and recombination plot. Single-marker allelic association results for a 215 kb region spanning ZFP57, plotted as –log10(P) values (left y-axis) by genomic coordinate (x-axis). Typed SNPs are coloured by LD with rs2747431 (which is in complete LD with rs416568 and rs365052): red (r²>0.8), orange (0.5–0.8), yellow (0.2–0.5) and white (<0.2). Imputed SNPs are shown in grey. Recombination rate is also plotted (right y-axis). (d) Functional genomic landscape for ZFP57 (chr6:29640242-29650866), providing context for the observed eSNPs rs375984, rs416568, rs365052 and rs2747431. Data from the ENCODE project, accessed through the UCSC Genome Browser (http://genome.ucsc.edu/), show a DNase I hypersensitive site and evidence of CTCF binding in the region of rs365052, based on profiling of ES cells, LCLs (GM12878, GM12891) and CD20+ B cells. A linkage disequilibrium plot (r²) for the locus, including 115 SNPs (1000 Genomes CEU phase 1), is shown below.
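The colour coding in panel (c) depends on pairwise LD with the index SNP. As a rough sketch, and assuming unphased genotype data, LD can be approximated by the squared Pearson correlation of allele dosages; this is a swapped-in approximation, not necessarily how the published plot or the 1000 Genomes LD panel was computed (haplotype-based r² from phased data is the usual reference). The function names and dosages are hypothetical, and how values falling exactly on 0.8, 0.5 or 0.2 are binned is an assumption, since the caption does not specify it.

```python
from typing import Sequence


def genotype_r2(snp_a: Sequence[float], snp_b: Sequence[float]) -> float:
    """Squared Pearson correlation of allele dosages: a genotype-based proxy for LD r²."""
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need matched dosage vectors with at least 2 samples")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r² is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


def ld_colour(r2: float) -> str:
    """Map r² to the panel (c) colour bins; exact boundary handling is an assumption."""
    if r2 > 0.8:
        return "red"
    if r2 >= 0.5:
        return "orange"
    if r2 >= 0.2:
        return "yellow"
    return "white"


# Hypothetical dosages for an index SNP and one neighbouring typed SNP.
index_snp = [0, 0, 1, 1, 2, 2]
neighbour = [0, 1, 2, 0, 1, 2]
r2 = genotype_r2(index_snp, neighbour)
print(f"r2={r2:.2f} colour={ld_colour(r2)}")
```

Variants in perfect LD, such as the three imputed first-intron eSNPs, would give r² of 1 with each other and so fall in the red bin.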

Analysis of the eQTL by HLA type showed association with HLA-A*01 and HLA-A*23, but HLA type was no more informative than the SNP markers (Figure 2). For the two most common ancestral haplotypes among Europeans, volunteers carrying a copy of HLA-A1-B8-DR3 (n=19) had higher ZFP57 expression than those carrying HLA-A3-B7-DR15 (n=12; Mann–Whitney test, P<0.0001).
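The haplotype comparison above uses a Mann–Whitney test, which compares two groups by ranks rather than by raw expression values and so does not assume normally distributed expression. A minimal sketch of the two-sided test follows, using the large-sample normal approximation with a correction for tied values; exact P-values or a continuity correction, as offered by statistical packages, would give slightly different results. The function name and the toy values are hypothetical and are not the study data.

```python
import math
from typing import Dict, Sequence, Tuple


def mann_whitney_u(group1: Sequence[float], group2: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test: normal approximation with tie correction."""
    n1, n2 = len(group1), len(group2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted(list(group1) + list(group2))
    n = n1 + n2
    avg_rank: Dict[float, float] = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1                                # extend the run of tied values
        avg_rank[pooled[i]] = (i + j) / 2 + 1     # average 1-based rank for the tied run
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(avg_rank[v] for v in group1)
    u1 = r1 - n1 * (n1 + 1) / 2                   # U statistic for group1
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                                # every value tied: no evidence of a shift
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided P, no continuity correction
    return u1, p


# Toy, made-up relative expression values for carriers of two hypothetical haplotypes.
carriers_hap1 = [5.2, 6.1, 7.4, 8.0]
carriers_hap2 = [1.1, 2.3, 2.9, 4.0]
print(mann_whitney_u(carriers_hap1, carriers_hap2))
```

A U value equal to n1 × n2 means every value in the first group exceeds every value in the second, the most extreme separation possible for those group sizes.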

Figure 2

Association between ZFP57 expression, eSNPs and classical HLA types. HLA-A, HLA-B, HLA-C, HLA-DRB, HLA-DQA and HLA-DQB types are shown at two-digit resolution, with ZFP57 expression in PBMCs quantified by qPCR. Each individual's ZFP57 expression value is plotted twice per locus, once against each of their two HLA alleles, and points are coloured by rs2747431 genotype (CC in red, CT in green and TT in blue). There was evidence of association for the HLA-A*01 and HLA-A*23 alleles (P<0.0001, Mann–Whitney test).

Given the many disease associations reported for the MHC class I region, we investigated whether the variants identified here as associated with ZFP57 expression might be relevant to common disease. We interrogated GWAS data sets and found that ZFP57 eSNPs overlapped with reported disease associations involving malignancy, HIV/AIDS and autoimmunity (Table 1). These included nasopharyngeal carcinoma and prostate cancer, the latter involving disease risk arising from a gene–gene interaction with the tumour suppressor gene NKX3-1. Associations were also noted with HIV-1 viral set point and with disease progression to AIDS.
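The overlap reported in Table 1 amounts to checking which reported GWAS SNPs are also eSNPs. The sketch below assumes that an eSNP is any SNP whose eQTL P-value falls below a cut-off (the default of 5e-8 is a hypothetical choice, not the threshold used in the study) and, optionally, lets a GWAS SNP match through an LD proxy, which can help when genotyping platforms differ in SNP coverage. All function names, SNP identifiers and traits are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def intersect_gwas(
    esnp_pvalues: Dict[str, float],
    gwas_hits: Iterable[Tuple[str, str]],            # (gwas_snp_id, trait) pairs
    p_threshold: float = 5e-8,                       # hypothetical eSNP cut-off
    ld_proxies: Optional[Dict[str, Set[str]]] = None,
) -> List[Tuple[str, str, str]]:
    """Return (trait, gwas_snp, matching_esnp) for GWAS SNPs that are, or tag, an eSNP."""
    esnps = {snp for snp, p in esnp_pvalues.items() if p < p_threshold}
    proxies = ld_proxies or {}
    overlaps = []
    for gwas_snp, trait in gwas_hits:
        if gwas_snp in esnps:
            overlaps.append((trait, gwas_snp, gwas_snp))   # direct overlap
            continue
        for proxy in sorted(proxies.get(gwas_snp, set())):
            if proxy in esnps:
                overlaps.append((trait, gwas_snp, proxy))  # overlap via an LD proxy
                break
    return overlaps


# Hypothetical inputs: eQTL P-values, catalogue hits and an LD proxy table.
eqtl_p = {"snp1": 1e-20, "snp2": 0.3}
catalogue = [("snp1", "trait A"), ("snp3", "trait B"), ("snp4", "trait C")]
print(intersect_gwas(eqtl_p, catalogue, p_threshold=1e-5, ld_proxies={"snp3": {"snp1"}}))
```

Keeping the proxy lookup optional makes the default behaviour match a strict "same SNP" overlap, while still allowing a looser match when the two studies typed different markers.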

Table 1 Diseases and traits from the Catalogue of Published Genome-Wide Association Studies (www.genome.gov/gwastudies, Accessed May 2012) in which reported GWAS SNPs are also eSNPs for ZFP57 expression

Discussion

We found that a significant minority of people show low-level transcription of ZFP57 in adult cells and tissues, where it may modulate epigenetic processes, and that this depends on a strong local eQTL for ZFP57. Further work is required to resolve the functional basis for this, but one potential mechanism is modulation of a putative regulatory element with CTCF binding in the first intron of ZFP57. Our data also highlight a potential role for ZFP57 eSNPs in traits including cancer and HIV/AIDS. KRAB-ZNF genes play a role in epigenetic processes critical to cancer, including silencing of tumour suppressor genes, and the co-factor KAP1 is involved in oncogenesis.9 DNA methylation contributes to the establishment of retroviral latency, and hypermethylation of the viral 5′ long terminal repeat is characteristic of aviraemic HIV-1-infected patients.10 To date in humans, ZFP57 has been associated only with the maintenance, not the establishment, of DNA methylation,5 although a role in de novo methylation has been reported in mice.2 The novel shorter isoform of ZFP57 identified here may be functionally significant, as its severely truncated KRAB domain is likely to limit interaction with KAP1.

The complex LD structure of the MHC, together with differences in SNP coverage between genotyping platforms, means that further work is needed to establish whether the most significant ZFP57 eSNPs are also the most informative for disease association. Further work is also needed to explore the biological significance and relative importance of this observation in the context of extensive haplotype-specific expression of other MHC genes,6 with multiple cis- and trans-eQTL identified for this genomic region.7

We have presented evidence that ZFP57 is expressed more widely than previously appreciated, including beyond development, and that this expression depends on underlying genetic variation. Further work is needed to investigate the role of ZFP57 in epigenetic regulation, particularly in cancer and HIV infection, where expression-associated SNPs may play a role and epigenetic mechanisms are known to be important.