Main

CD4+ T cells have an essential role in immunity and autoimmune disease1,2,3,4, but how common genetic variants affect human CD4+ T cell function and disease pathology remains mostly unknown5,6. Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with autoimmune diseases, but the vast majority are in non-coding regions and in tight linkage disequilibrium with many other variants5,7,8. Hence, >99% of causal variants that drive complex traits such as autoimmune diseases have yet to be determined9. Identifying causal variants and the cells in which they operate would aid in defining disease-relevant target genes, pathways and cell types, and inform the development of more effective treatments10.

Complex trait-associated variants are enriched within chromatin that is accessible and/or actively modulating gene expression (that is, within regions of H3K27ac-deposited chromatin), which we define here as putative cis-regulatory elements (CREs). CREs differ between cell types11, and autoimmune GWAS variants enrich most highly within the CREs of CD4+ T cells5,12,13,14. Recent studies have shown that massively parallel reporter assays (MPRAs) can identify variants that alter cis-regulatory activity by changes in reporter gene expression15,16,17. Combining MPRAs with readouts of T cell-accessible chromatin enriched for variants with high statistical fine-mapping posterior inclusion probabilities (PIP > 0.5) up to 58-fold, suggesting that many emVars are putatively disease-causal variants18. Once putative causal variants are identified, a further challenge is to determine how they affect gene expression networks and cellular function. Online databases of functional genomic data, such as Open Targets Genetics, have been helpful for prioritizing genes that are probable targets of variants, but these databases lack perturbational data that directly link variants and CREs to gene expression19. To address this problem, recent studies have used bulk and single-cell CRISPR-interference (CRISPRi) screens to link variant CREs to the genes that they regulate20,21,22.

Although the above experiments support the probable cis-regulatory role of GWAS variants in T cells and other blood cells, both MPRA and CRISPRi screens have rarely been applied in primary cells. Instead, they tend to be performed in immortalized cell lines such as those derived from T cell leukemia (Jurkat) or erythroleukemia (K562). These cell lines may not accurately recapitulate the transcriptional regulation of primary cells or their associated phenotypes23. Testing these assays in primary cells containing transcriptional signatures and phenotypes that more closely reflect cells that have a role in disease may aid in highlighting the most relevant functional consequences of risk variants.

Here, we tested >18,000 variants for their effects on CRE activity with MPRAs in primary human T cells. We identified primary T cell expression-modulating variants (or emVars) that are largely distinct from those found in Jurkat cells. Primary T cell emVars tend to alter the binding of inflammatory transcription factors (TFs), enrich highly for fine-mapped variants with PIP > 0.8 and are often within loci containing genes that control T cell activation, among other pathways related to transcription and translation. Using single-cell and proliferation-based CRISPRi screens in primary T cells, we found that emVar CREs modulate genes within the T cell networks that control lymphocyte activation and mRNA processing, thus linking risk variants to T cell expression and function.

Results

Primary human CD4+ T cell emVars enrich for causal variants

We assessed the regulatory effects of genetic variants associated with multiple sclerosis, type 1 diabetes, psoriasis, rheumatoid arthritis and inflammatory bowel disease (IBD)24,25,26,27,28 in activated primary human CD4+ T cells using MPRA (Fig. 1a). We used a library previously described for an experiment performed on Jurkat T cells totaling 578 indexed single nucleotide polymorphisms (SNPs) and 18,312 total variants in tight linkage disequilibrium (R2 > 0.8) in the European subset of the 1000 Genomes Project cohort18.

Fig. 1: Autoimmune-associated emVars enrich for autoimmune disease-causal variants.

a, Primary T cell MPRA workflow. b, Volcano plot. The log2 expression of the highest-expressing allele is on the x axis, and the log2 of the activity of allele1/allele2 is on the y axis. emVars are labeled red, pCRE elements are dark gray and elements with no activity are light gray. c, Enrichment of pCRE and emVar elements for the accessible chromatin profiles of all ENCODE cell types. The −log10P from a two-sided Fisher’s exact test for the enrichment of pCRE and emVar elements is on the y axis, and the rank according to this P value is on the x axis. Cell lineages are depicted according to the colors in the legend. d, Bar plot showing the enrichment of DHS elements, emVars identified in primary T cells and emVars in DHS elements for PICS statistically fine-mapped variants using probability thresholds indicated on the x axis. Bar plot shade indicates −log10P enrichment. Numbers below bars indicate the number of emVars that are statistically fine-mapped at a given PICS probability. Enrichment was calculated as a risk ratio, with P values determined through a two-sided Fisher’s exact test. e, Scatterplot comparing element allelic skew between Jurkat and primary T cell MPRA libraries. Color indicates emVar positivity in the primary T cell MPRA, Jurkat MPRA, both or neither assay. The log2 allelic bias levels of MPRA elements tested in primary T cells are plotted on the x axis and in Jurkat on the y axis. f, Venn diagram depicting called emVars in only primary T cell MPRAs (red), Jurkat MPRAs (blue) or both (purple).

We observed high reproducibility of the primary T cell MPRA across independent donors (Pearson correlation > 0.93; Extended Data Fig. 1a–c), identifying 1,125 elements with putative CRE activity greater than baseline (Extended Data Figs. 1d and 2), 545 of which were emVars (Fig. 1b and Supplementary Table 1). We found primary T cell emVars in 39.2% of tested loci (Extended Data Fig. 3a), with only one emVar on the haplotype in 53.8% of tested loci (Extended Data Fig. 3b). Primary T cell emVars and putative CREs were both enriched within active chromatin marks and other readouts of cis-regulatory activity and were found preferentially in the accessible chromatin of primary T cells compared to other cell types (Fig. 1c, Extended Data Fig. 4, Supplementary Note and Supplementary Tables 2–4)18,29.

We next assessed emVar enrichment for both statistically fine-mapped variants using probabilistic identification of causal SNPs (PICS) PIPs for each trait locus5,18 and SuSiE fine-mapping data for UK Biobank traits (Fig. 1d, Extended Data Fig. 5 and Supplementary Tables 5 and 6)30. In all loci, emVars enriched twofold to fourfold for high posterior probability variants compared to all other tested variants (Extended Data Fig. 5a,b and Supplementary Note). Focusing on loci in which at least one emVar is observed, emVars were enriched for variants with PIP > 0.8 upwards of 71-fold (PICS) and 50-fold (SuSiE) (Fig. 1d, center, Extended Data Fig. 5c, center and Supplementary Tables 7 and 8). When considering only variants in T cell DNase hypersensitivity (DHS) sites in these loci, emVars enrich 122-fold (PICS) to 200-fold (SuSiE) for high posterior probability variants (Fig. 1d, right, Extended Data Fig. 5c, right and Supplementary Tables 7 and 8). Overall, we found that our primary T cell MPRAs had a sensitivity of 27% and a specificity of 88% for identifying variants in fine-mapped 95% credible sets in risk loci. Thus, primary T cell emVars in T cell-accessible chromatin enrich highly for fine-mapped variants, indicating that they enrich for causal variants.
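The enrichment statistic described above (a risk ratio with a two-sided Fisher's exact test) can be sketched on a single 2×2 table as follows. This is a minimal illustration and not the authors' analysis code: the function names and the example counts are hypothetical, and the table layout (emVar versus non-emVar in rows, fine-mapped versus not fine-mapped in columns) is an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = emVar / non-emVar,
    columns = fine-mapped above threshold / not fine-mapped.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def risk_ratio(a, b, c, d):
    """Fraction of fine-mapped variants among emVars over that among non-emVars."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts, for illustration only (not taken from the study).
example = (8, 2, 1, 5)
rr = risk_ratio(*example)
p = fisher_exact_two_sided(*example)
```

The two-sided P value here sums all tables at least as extreme in probability as the observed table, which is one common convention; other definitions of "two-sided" exist and can give slightly different values.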

TF usage varies by cell type and disease

Although both primary T cell and Jurkat emVars enrich highly for causal variants (Fig. 1d and Extended Data Fig. 5)18, only 45 emVars overlapped between datasets; however, this was significantly more than expected by chance (hypergeometric P = 5.4 × 10−29). Primary T cell and Jurkat MPRAs differed largely in activity and allelic bias in reporter expression (Fig. 1e,f and Supplementary Tables 1 and 9); therefore, we reasoned that differences in signaling and activation between cell types may alter the TF programs operating in each cell type23,31. To assess how each variant is predicted to change TF binding, we used known catalogs of TF motifs to assess variant-mediated TF motif disruption and compared this to the allelic effect on reporter expression within MPRA (Supplementary Note). We identified TF motifs that, when disrupted by variants, reduce expression in the MPRA, suggesting that these are transcriptional activators, such as ATF1 (Fig. 2a). Conversely, we also identified variant-disrupted motifs with a corresponding increase in expression, suggesting that these are transcriptional repressors, such as GFI1B, consistent with GFI1B’s known repressor role (Fig. 2a)32. Although many TF effects were shared between both T cells and Jurkat cells, transcription conferred by TFs associated with inflammation was more likely to be disrupted by variants in primary T cell MPRAs, including NFKB1 (n = 67 variants), STAT3 (n = 71 variants) and JUN and FOSB (n = 72 and 61 variants, respectively) (Fig. 2a, Extended Data Fig. 6a,b and Supplementary Tables 10–13). TF motifs whose perturbation by variants had concordant effects on MPRA expression in both cell types include ATF1, ETS factors ELK1 and ELK4 and ELF1 and ELF2, and GFI1B (Fig. 2a, Extended Data Fig. 6a,b and Supplementary Tables 10–13).
To validate whether differences in TF programs within primary T cells versus Jurkat cells drove this difference, we performed SCENIC on single-cell RNA sequencing (scRNA-seq) data from each cell type to relate gene expression programs to TF activity33. In agreement with our TF motif disruption analysis, we found that NF-κB, JUNB, RUNX1, MYB and ZEB1 expression programs were highly operant in primary T cells but not within Jurkat cells, whereas Jurkat cell programs were driven more by TFs such as SOX4, ATF4 and ELF2 (Fig. 2b). Therefore, although both Jurkat and primary T cell MPRAs enriched for causal variants, differences in emVar identification in each setting are probably driven by alternative cellular programs and orchestrated by TFs.
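The hypergeometric test used above to ask whether the emVar overlap between the two MPRAs exceeds chance can be sketched as an upper-tail probability. This is an illustrative sketch only: the function name is hypothetical, and the text does not state the exact universe or the Jurkat emVar count used, so the values in the usage example are placeholders.

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b items are drawn without replacement
    from a universe containing set_a 'successes'."""
    total = comb(universe, set_b)
    hits = sum(
        comb(set_a, x) * comb(universe - set_a, set_b - x)
        for x in range(overlap, min(set_a, set_b) + 1)
    )
    # Exact rational arithmetic avoids underflow for very small P values.
    return float(Fraction(hits, total))

# Placeholder inputs: 18,312 tested variants and 545 primary T cell emVars
# come from the text; the second set size (300) is hypothetical.
p_overlap = hypergeom_upper_tail(universe=18312, set_a=545, set_b=300, overlap=45)
```

Exact integer arithmetic is used because probabilities of this magnitude are awkward to accumulate in floating point; a statistics library would normally handle this with log-space computations.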

Fig. 2: Differences in predicted TF motif disruption between Jurkat and primary T cell MPRA experiments.

a, Scatterplot comparing the effect of variants that are predicted to disrupt a TF motif and subsequent cumulative effect on MPRA expression between both primary T cell (red outline) and Jurkat (blue fill) experiments. The effect size is calculated using Cohen’s d for variant alleles predicted to disrupt a given TF motif, and P values are calculated using a two-sided t-test comparing the effect on expression of variants that disrupt a given motif versus all other variants. b, Scatterplot comparing cumulative effect of disruption of a given factor on MPRA expression (as in a) for Jurkat (blue) and primary T cell (red) results (x axis) and the AUCell activity score indicating TF regulon activity within a given cellular population based on single-cell RNA-seq data from Jurkat and primary T cells. The shade of each dot is the −log10P from a, calculated using a two-sided t-test comparing the effect on expression of variants that disrupt a given motif versus all other variants.

To determine whether we can identify TFs that drive risk in a disease-specific manner, we grouped variants by disease and repeated the analysis. We observed several TFs with disease-specific enrichment, including the ZNF563 motif, whose disruption is highly activating at IBD (Cohen’s d = 0.35, P = 0.0006) and rheumatoid arthritis loci (Cohen’s d = 0.45, P = 0.13), but repressive at psoriasis loci (Cohen’s d = −0.84, P = 0.005), and disruption of the GATA3 motif to be highly activating at multiple sclerosis loci (Cohen’s d = 0.30, P = 0.002) but with an average of no effect in loci associated with other diseases (Extended Data Fig. 6c). Therefore, we find MPRAs to be sensitive to TF usage in different cellular contexts and we define TFs that may be more important at specific disease loci.
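The effect sizes quoted above are Cohen's d values comparing the allelic effect of variants that disrupt a given motif against all other variants. A minimal sketch of that calculation is shown below. The pooled-standard-deviation form of Cohen's d is an assumption (the text does not specify the variant of the statistic), and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(disrupting, others):
    """Cohen's d with a pooled standard deviation (assumed form).

    disrupting: allelic log2 skews of variants predicted to disrupt a motif
    others:     allelic log2 skews of all remaining variants
    """
    n1, n2 = len(disrupting), len(others)
    pooled_var = (
        (n1 - 1) * variance(disrupting) + (n2 - 1) * variance(others)
    ) / (n1 + n2 - 2)
    return (mean(disrupting) - mean(others)) / sqrt(pooled_var)

# Hypothetical skew values, for illustration only.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Under this sign convention, a positive d means that disrupting the motif raises reporter expression relative to other variants (consistent with a repressor), and a negative d means it lowers expression (consistent with an activator).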

emVars connect to T cell networks through multiple pathways

Motivated by the observation that the critical transcriptional regulators of T cell responses appear to mediate some primary T cell emVars, we sought to understand the pathways that emVars modulate to increase disease risk. To this end, we compared putative target genes of emVars in primary T cell DHS sites that were identified in primary T cell versus Jurkat MPRAs using the Open Targets Variant to Gene (V2G) dataset (Supplementary Tables 14 and 15 and Supplementary Note)19. We input these genes into STRING34 to define a primary T cell network of genes according to gene interaction experiments, co-expression and text mining. The resulting primary T cell network was more highly connected than expected when compared to a background of all 3,100 V2G genes linked to all MPRA-tested variants in T cell DHS sites (STRING protein–protein interaction enrichment, P < 1 × 10−16; Fig. 3a and Supplementary Data 1). Overall, the primary T cell network was enriched for T cell activation according to EnrichR35 even when compared to the refined background of the 3,100 V2G genes (false discovery rate, 0.026; Panther module; Supplementary Table 16). We then defined clusters within the network and observed that the largest clusters were involved in lymphocyte activation, translation, transcriptional regulation, antigen processing, mRNA processing and mRNA splicing (Fig. 3a,b and Supplementary Table 17).

Fig. 3: Network analysis of predicted target genes of emVars identifies a lymphocyte activation cluster.

a, STRING network showing V2G genes linked to 79 emVars in T cell DHS sites (nodes) and edges representing the strength of gene–gene interactions. Colors represent different network subclusters. P value is calculated using a two-sided hypergeometric test. b, The subclusters with the most genes from the larger network in a with gene nodes labeled. c–e, The lymphocyte activation (c), translation (d) and transcriptional regulation (e) clusters with each emVar on the x axis and target gene on the y axis. Fill color indicates that the gene is a V2G gene of the indicated emVar.

Within the lymphocyte activation cluster were known costimulatory genes expressed in T cells that encode proteins that regulate T cell activation, including CD28, CTLA4, ICOS, GITR, OX40 and SLAM family members (Fig. 3b, pink cluster). We also found that the transcriptional regulation cluster contained several genes encoding members of the NF-κB signaling family NFKB1, NFKB2, NFKBIA, TNFAIP3 and TNIP1 (Fig. 3b, yellow-green cluster). Both costimulatory genes and NF-κB signaling family members were absent from comparable clusters within a network built on V2G genes linked to Jurkat emVars in T cell DHS sites (Extended Data Fig. 7a,b, pink and yellow-green clusters and Supplementary Table 18). To connect genes within these networks to potential therapeutic targets, we used the Connectivity Map, finding that the top three primary T cell clusters were more significantly associated with NF-κB-driven gene programs compared to the Jurkat emVar clusters (Fig. 3b, Extended Data Fig. 7b,c and Supplementary Note). We created an emVar-by-gene matrix for each cluster for both primary T cell and Jurkat networks, which showed largely distinct gene targets in primary T cell emVar versus Jurkat emVar clusters (Fig. 3c–e, Extended Data Fig. 7d–g and Supplementary Note). Thus, the putative target genes of emVars found in primary T cell MPRAs are more involved in T cell costimulation and NF-κB signaling than those of Jurkat emVars.

Single-cell CRISPRi screens connect emVars to target genes

Although V2G data provide putative gene targets of variants, we sought to connect variants directly to the genes they regulate using a single-cell CRISPRi (scCRISPRi) approach in primary T cells. We used guide RNAs (gRNAs) and catalytically dead Cas9 (dCas9) tethered to a chromatin-repressing ZIM3–KRAB domain to target variant CREs and assessed local effects on gene expression (within 1 Mb) with scRNA-seq (Fig. 4a)36. We created two gRNA libraries to test 56 total emVar CREs. The first library targeted 20 emVars and three non-emVars in T cell-accessible chromatin, prioritizing variants more likely to be causal variants by PIP. The second library targeted 49 T cell emVars in T cell-accessible chromatin (emVar CREs) > 3,500 bp from transcription start sites (TSSs), to avoid silencing of promoter regions. A total of 13 emVar CREs overlapped between both libraries (Supplementary Tables 19 and 20). We used SCEPTRE37 to connect CREs to local genes (<1 Mb) based on differential gene expression in cells containing gRNAs targeting the CRE versus those containing non-target gRNAs.
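The first step of the cis analysis described above, restricting each targeted CRE to genes within 1 Mb, can be sketched as a simple coordinate filter. This only illustrates how the candidate CRE:gene pairs might be enumerated before differential expression testing; it does not reproduce SCEPTRE, and the function name, gene labels and coordinates are hypothetical.

```python
WINDOW_BP = 1_000_000  # "within 1 Mb", as stated in the text

def genes_in_cis(cre, tss_by_gene, window_bp=WINDOW_BP):
    """Return (gene, distance) pairs for genes whose TSS is on the CRE's
    chromosome and closer than window_bp, sorted by distance."""
    chrom, pos = cre
    hits = []
    for gene, (gene_chrom, tss) in tss_by_gene.items():
        distance = abs(tss - pos)
        if gene_chrom == chrom and distance < window_bp:
            hits.append((gene, distance))
    return sorted(hits, key=lambda item: item[1])

# Hypothetical coordinates, for illustration only.
cre = ("chr1", 5_000_000)
tss_by_gene = {
    "GENE_A": ("chr1", 5_009_000),
    "GENE_B": ("chr1", 5_900_000),
    "GENE_C": ("chr1", 6_200_000),  # beyond the 1 Mb window
    "GENE_D": ("chr2", 5_000_500),  # different chromosome
}
candidates = genes_in_cis(cre, tss_by_gene)
```

Each surviving pair would then be tested for differential expression between cells carrying CRE-targeting gRNAs and cells carrying non-target gRNAs.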

Fig. 4: scCRISPRi screens connect emVars to target genes.

a, Workflow for scCRISPRi screens. MOI, multiplicity of infection. b, Volcano plots depicting significantly differentially expressed genes when targeting a given emVar CRE, with the distance of emVar CRE to gene indicated by dot color; log2(fold change) is on the x axis and −log10P for differential gene expression is on the y axis. The dotted line indicates the empirical significance cutoff determined by SCEPTRE based on calibration with the non-target control gRNAs. c–e, Locus plots of the IL2RA (c), SESN3 (d) and PLEC loci (e). In d, inset scale for genome tracks is 0–3. pcHiC loops from primary human T cells are depicted below genes in the locus plot. Disease-associated variants (dots) are red if they are emVars in T cell DHS sites, blue if they are emVars not within DHS sites and gray if they are non-emVars. Accessible chromatin data from T cells are depicted as read pileups (peaks) on the locus track from various T cell types. The pink lines represent the location of emVars in DHS. Violin plots depict genes that are differentially expressed when targeting CRISPRi to the emVar using gRNAs compared to cells containing non-target gRNAs. f, Network of genes identified using scCRISPRi screens compared to all tested V2G genes for 56 emVar CREs. The red subcluster indicates the lymphocyte activation network. In b–e, two-sided SCEPTRE P values are false discovery rate-corrected using the Benjamini–Hochberg method. NT, non-target gRNA; FC, fold change.

We found 13 of the 56 tested emVar CREs to impact at least one gene in cis, with a total of 18 significant emVar CRE:gene interactions (Fig. 4b, Extended Data Fig. 8a,b and Supplementary Tables 21 and 22). Among them, we found that rs61839660 (type 1 diabetes and IBD PICS = 0.98), an IL2RA intronic variant (9 kb from TSS) previously associated with the timing of IL-2RA protein expression in murine T cells38, was associated with a downregulation of IL2RA but also an upregulation of several nearby genes, including IL15RA, a gene involved in homeostatic proliferation of memory T cells, and RBM17, which encodes a protein involved in nonsense-mediated decay (Fig. 4b,c)39. We also identified rs887314 (psoriasis PICS = 0.13) within the promoter of BAD, which we found to not only regulate BAD, which encodes a protein involved in T cell development and apoptosis40, but also GPR137 and OTUB1 expression (Fig. 4b and Extended Data Fig. 8c). Other notable hits include rs56095240 (multiple sclerosis PICS = 0.18) in an intergenic region 456 kb from the SESN3 TSS, which regulates SESN3 expression (Fig. 4b,d), encoding a protein involved in negative regulation of reactive oxygen species signaling41 and T cell MAPK signaling42, and rs60600003 (multiple sclerosis PICS = 0.48) in intron 1 of ELMO1, 106 kb from the ELMO1 TSS, which had a substantial effect on the expression of ELMO1, which encodes a protein involved in lymphocyte motility (Fig. 4b, left and Extended Data Fig. 8d)43. In addition, two other emVars, rs1250567 (multiple sclerosis PICS = 0.03) and rs7441808 (rheumatoid arthritis PICS = 0.007 and eosinophil counts UK Biobank PIP = 0.18 (ref. 30)) were significantly associated with RBPJ and ZMIZ1, respectively, both encoding proteins involved in WNT signaling in T cells (Fig. 4b, right)44,45, and rs61907765 (psoriasis PICS = 0.43) was associated with ETS1 expression, which encodes a TF involved in survival and activation of T cells and the development of natural regulatory T cells (Fig. 4b, right)46. We also identified genes that could have a role in disease biology, but with limited or no previous evidence of contributing to T cell biology. For example, targeting three emVars in separate CREs in a large haplotype within PLEC with CRISPRi leads to an upregulation in GRINA expression, which encodes a glutamate receptor (Fig. 4e); neither PLEC nor GRINA has an established role in T cells. Furthermore, between both screens, we identified two emVars that regulate PPP5C expression, which encodes a phosphatase that acts on ERK signaling47 but, to our knowledge, has no established role in T cell biology (Fig. 4b).

Finally, to assess whether genes identified by our scCRISPRi screens were enriched in T cell-related networks, we relaxed our calling threshold to include variant CRE:gene interactions of marginal significance (P < 0.05). We were able to connect 37 of the 56 tested variants (66%) to 61 genes, of which 49 overlapped with V2G genes (Supplementary Tables 21 and 22). Interestingly, through creating a STRING network based on the hits versus all local genes that were tested in the single-cell screen, we again found that the T cell activation cluster was the most predominant cluster in the network (Fig. 4f, Supplementary Table 23 and Supplementary Note), further supporting the importance of T cell activation programs in genetic risk for autoimmunity.
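The difference between the false discovery rate-corrected calls and the relaxed nominal threshold (P < 0.05) used above can be illustrated with a short Benjamini–Hochberg sketch. The function name and the P values are hypothetical, and this is a generic implementation of the procedure rather than the code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical nominal P values for four CRE:gene pairs.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)

nominal_hits = [p < 0.05 for p in nominal]   # relaxed threshold
fdr_hits = [q < 0.03 for q in adjusted]      # illustrative FDR cutoff
```

In this toy example all four pairs pass the nominal threshold, but only two pass the illustrative adjusted cutoff, which shows why relaxing to nominal significance connects more variants to genes at the cost of more false positives.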

Linking variant CREs to T cell proliferation

Although many primary T cell emVar target genes were found within or connected to T cell activation networks, whether emVar CREs actually impact T cell proliferation remains unknown. To systematically link variant CREs with T cell activation and proliferation, we used bulk CRISPRi screens in primary human T cells (Fig. 5a). As these data could be broadly useful for classifying variant function, we also assessed ~1,000 additional autoimmune variant CREs (Extended Data Fig. 9a,b). We created a gRNA library targeting each variant CRE, along with positive controls known to affect T cell proliferation and non-targeting gRNAs (Fig. 5a and Supplementary Table 24), and performed a screen to assess how dCas9–ZIM3-mediated silencing of variant CREs affected T cell proliferation (Supplementary Note and Methods).

Fig. 5: Proliferation screens identify emVars that modulate T cell proliferation.

a, Proliferation screen experimental workflow. b, Volcano plot of significant positive control genes and variant CREs (blue and red) and non-significant targets (gray), with the log2(fold change) on the x axis and the −log10(FDR) on the y axis. c, STRING network based on 13 emVar CREs that are CRISPRi proliferation hits. The lymphocyte activation and mRNA processing clusters and the PPP5C gene are highlighted in color. d, Locus plot depicting the PPP5C locus. Disease-associated variants (dots) are depicted according to MPRA allelic skew: rs4802307 (red), an emVar in a T cell DHS site, rs62136101 (blue), a probable emVar in a T cell DHS site and non-emVars (gray). Accessible chromatin data from T cells are depicted as read pileups (peaks) on the locus track from various T cell types. The pink lines represent the location of emVars in DHS sites. e, Heatmap of differentially expressed genes when targeting CRISPRi to the PPP5C TSS or to the rs62136101 CRE using gRNAs compared to cells containing a non-target gRNA. f, Scatterplot depicting the correlation between differentially expressed genes when targeting rs62136101 versus NT (y axis) and the PPP5C TSS versus NT (x axis). P value in c is determined through a two-sided hypergeometric test, and those in f are determined using a Wald test with DESeq2. Error bars in f represent the standard error 95% confidence interval.

Through analyzing the effect of targeting all ~1,000 autoimmune GWAS variants in T cell DHS sites, we identified known positive controls, including VAV1 and IL-2RB as positive regulators of T cell proliferation, and CBLB, a known negative regulator (Extended Data Fig. 9c and Supplementary Table 25)48,49. We identified 21 additional variant CREs that were significantly associated with T cell proliferation (Padj < 0.1; Extended Data Fig. 9c and Supplementary Table 25). Among the hits with the strongest effect were variant CREs that contact the MYC promoter according to promoter capture HiC in T cells, such as rs10098999 (rheumatoid arthritis PICS = 0.0014) and rs10113762 (rheumatoid arthritis PICS = 0.0145), both of which are 800 kb downstream of MYC (Extended Data Fig. 9d)50. However, we found that variants that were hits within this screen were not enriched for statistically fine-mapped variants even at moderate posterior probabilities (Supplementary Note).

Focusing our analysis on the 56 emVar CREs that we analyzed in our scCRISPRi screens, we analyzed gRNAs that were enriched or depleted at day 21 compared to day 2 (Fig. 5b and Supplementary Table 26), identifying 12 emVar CREs that reduce proliferation when targeted with CRISPRi and one emVar CRE that promotes proliferation. Interestingly, the most significant emVar CRE hit that reduced T cell proliferation when targeted, rs10932019 (rheumatoid arthritis PICS = 0.007), is 50 kb downstream from CD28, which is required for T cell activation. Other emVar CREs of note that reduced T cell proliferation when targeted were rs307370 (IBD PICS = 0.018) in the TNFRSF4 (encoding OX40) locus, rs61839660 (type 1 diabetes PICS = 0.98) in the IL2RA locus and rs9610375 (multiple sclerosis PICS = 0.0028) in the MAPK1 locus; each of the proteins encoded by these genes has previously been shown to be a positive regulator of T cell proliferation51,52,53. Conversely, the emVar CRE that increased T cell proliferation when targeted, rs11757155 (IBD PICS = 0.017), is in an intron of BACH2, encoding a TF involved in suppressing T cell activation and effector T cell differentiation54. Surprisingly, we found a correspondence between emVar CRE effects on proliferation and the effects of targeting their putative target genes within a genome-wide Cas9 proliferation screen in primary T cells (Extended Data Fig. 9e,f), suggesting that many of the emVars could be in enhancers for these genes.
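The day 21 versus day 2 comparison described above reduces, at its simplest, to a per-gRNA log2 fold change of depth-normalized counts. The sketch below shows that calculation under stated assumptions: counts-per-million normalization and a pseudocount are choices made here for illustration, the screen's actual statistical pipeline is not specified in this section, and the function name, gRNA labels and counts are hypothetical.

```python
from math import log2

def grna_log2_fold_change(day2_counts, day21_counts, pseudocount=1.0):
    """Per-gRNA log2(day 21 / day 2) after counts-per-million normalization.

    Negative values indicate gRNAs depleted over time (targeting reduces
    proliferation); positive values indicate enrichment.
    """
    total_day2 = sum(day2_counts.values())
    total_day21 = sum(day21_counts.values())
    changes = {}
    for grna, count in day2_counts.items():
        cpm_day2 = count / total_day2 * 1e6
        cpm_day21 = day21_counts.get(grna, 0) / total_day21 * 1e6
        changes[grna] = log2((cpm_day21 + pseudocount) / (cpm_day2 + pseudocount))
    return changes

# Hypothetical read counts, for illustration only.
day2 = {"gRNA_depleted": 100, "gRNA_enriched": 100}
day21 = {"gRNA_depleted": 50, "gRNA_enriched": 150}
lfc = grna_log2_fold_change(day2, day21, pseudocount=0.0)
```

A real analysis would aggregate several gRNAs per CRE across donors and assess significance against the non-targeting gRNA distribution; this sketch covers only the effect-size step.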

Given that many of our emVar CRE hits appeared in the T cell signaling and proliferation networks highlighted in the MPRA STRING network (Fig. 3), we next sought to determine whether the V2G genes of emVar CRE proliferation hits were enriched for these network clusters compared to a background of V2G genes linked to all 56 emVar CREs tested in the proliferation screen (Supplementary Table 27). We again found the top represented clusters were those pertaining to lymphocyte activation and signaling along with mRNA processing (Fig. 5c, Extended Data Fig. 9g,h and Supplementary Table 28). Our scCRISPRi screens found that five of the 13 proliferation hits also had a significant effect on the expression of local genes, including rs61839660 with IL2RA, IL15RA and RBM17, rs7823393 with GRINA, rs56095240 with SESN3 and both rs4802307 and rs62136101 with PPP5C (Fig. 4b). Therefore, we found that 13 of the 56 tested emVar CREs (23%) had a significant effect on T cell proliferation, linking these putatively causal variants to T cell function and highlighting putative target genes that enrich for T cell activation pathways.

PPP5C is a regulator of T cell proliferation

Two emVar CREs that reduced T cell proliferation when targeted, rs4802307 (emVar) and rs62136101 (an emVar with allelic skew in six out of seven donors; Extended Data Fig. 9i), are in the PPP5C locus on the same haplotype. The rs4802307-A (IBD lead variant) and rs62136101-T (R2 = 0.96 to rs4802307 in Europeans) alleles are associated with reduced PPP5C expression in T cell expression quantitative trait locus data and are protective from IBD (IBD PICS = 0.1 and 0.06, respectively) (Fig. 5d)28. From our single-cell screens, we found that PPP5C is the only significantly differentially expressed gene locally, suggesting that this is the target gene. PPP5C promotes RAF–MEK–ERK signaling and cancer cell proliferation47,55, but its function in T cells has not been assessed. Given that we find targeting both rs4802307 and rs62136101 to affect T cell proliferation but that PPP5C is not directly within the main T cell activation network, we reasoned that PPP5C could be a distal regulatory node to the T cell activation cluster. To assess whether silencing of PPP5C affects transcriptional programs in T cells, we targeted CRISPRi to both the rs62136101 CRE and the PPP5C TSS in T cells and compared the effects on global transcription to cells containing a non-targeting gRNA using RNA-seq. We found an upregulation of genes associated with T cell metabolism and function (Fig. 5e, Supplementary Tables 29 and 30 and Supplementary Note)56,57,58,59,60,61. Although targeting CRISPRi to rs62136101 had a more subtle effect than the TSS, both led to consistent differentially expressed genes involved in T cell biology (Fig. 5f). As PPP5C has been shown to act on MAPK signaling in cancer cell lines47,55,62, we sought to define how PPP5C shapes MAPK signaling in CD4 T cells. 
We performed protein blotting of phosphoproteins in the ERK–MAPK pathway in unstimulated and stimulated CD4 T cells with Cas9-mediated PPP5C ablation versus those containing non-target control sgRNAs, as well as on CD4 T cells transfected with a PPP5C-overexpression vector versus vector-only transfected CD4 T cells from two human donors (Extended Data Fig. 10a–c). We found that PPP5C knockout increased the phosphorylation of multiple components of the MAPK signaling pathway, including AKT, CREB, ERK1/2, GSKA3 and JNK, particularly in the unstimulated condition, while PPP5C overexpression reduced phosphorylation of these same signaling components in unstimulated conditions (Extended Data Fig. 10b–e). Therefore, PPP5C is a previously unappreciated component of T cell signaling that controls the baseline phosphorylation of several key members of the MAPK signaling pathway. IBD-protective alleles downregulate PPP5C, which tunes T cell metabolic and effector programs.

Discussion

Identifying variants that underlie complex traits and defining their effects on disease-relevant cell types continues to be a longstanding challenge. Resources such as Open Targets Genetics and other online databases have been essential for providing observational and correlative data that aid in prioritizing variants as likely causal on a haplotype, defining the tissue in which the variant may promote the effect and the target gene. However, although these databases further refine variants that are likely causal, there often remain many potentially causal variants per haplotype with many putative target genes. To better understand variants that act within disease-relevant conditions, it would be ideal for online databases to include high-throughput perturbations of variants in many cell types, which could help identify variants that have effects on cis-regulatory regions and connect cis-regulatory regions to target genes across both disease-relevant and irrelevant contexts to better contextualize variant and haplotype activity. MPRA has proven to be a powerful approach that can enrich for causal variants and define the cell type and state in which the variant functions, suggesting its utility to define contexts for variant effects and to map likely causal variants genome-wide.

Identification of causal variants requires testing their effects in relevant cell types and assessing the broad biological functions of variants. T cells underlie the pathogenesis of many autoimmune diseases63,64,65. However, primary T cells have been notoriously difficult to genetically engineer and perturb until recent advances49,66. The Jurkat cell line has served as a tractable model for primary T cells for decades, although this cell line contains thousands of mutations, including mutations in key tumor suppressors such as TP53 and in the PI3K pathway, as well as large structural variants23,67. Through testing the same MPRA library in both primary T cells and Jurkat cells, we find that both conditions identify likely causal variants, but the identified emVars largely differ, probably because of differences in TF usage within each setting. Given these data, we believe more variants that affect CRE activity would be discovered by assaying the same MPRA library in other primary T cell populations and states and other disease-relevant cell types, as well as assaying other potential functions of variants. MPRAs have been conducted in the context of other primary cells such as neuronal progenitors68, differentiated glutamatergic neurons69, brain organoids and human cortical tissue70, but only small MPRA libraries have been implemented in primary immune cells, such as one in primary human monocytes that strongly implicated ETS2 regulation in IBD pathogenesis71. Further technological development is required to implement MPRAs across other relevant cell types. Additionally, causal variants could function through alternative splicing or by modulating transcript stability. MPRAs have been developed to test these functions72,73, and in future studies should be implemented within disease-relevant primary human cells.

Once putatively causal variants are identified, there remains a key challenge in defining their target genes and pathways. To link variants to genes, scCRISPRi screens have been performed, particularly within the K562 cell line20,22. Our scCRISPRi screens in primary T cells identify emVar target genes, many of which are relevant to T lymphocyte activation. However, we were surprised to find that our CRISPRi screens often did not agree with the Open Targets Genetics data with regard to the most likely target gene or the number of genes affected by a given variant. For example, SESN3 is not predicted to be a top target for rs56095240, but we found it to be the only differentially expressed gene in the 1 Mb region in our scCRISPRi screen when targeting this variant CRE. This raises the question of whether variants or their haplotypes can have pleiotropic effects depending on the cell type and state in which they are tested, an idea supported by our MPRA in two cellular settings. However, additional experiments, including genomic editing of these loci in each setting, will be needed to determine whether this is indeed the case. Single-cell screens still lack sensitivity for smaller effects on expression; therefore, we probably missed disease-relevant genes. Additional target genes might be identified by increasing the number and efficacy of gRNAs and the number of cells assayed. However, the effect size of a given enhancer on a gene and the specific context in which it functions will also affect whether a target gene is identified in these screens74. For some loci, we do find emVar CREs that regulate more than one gene. Thus, disentangling causal variant effects for these loci will be more complex than focusing on a single target gene within each locus, which has been the focus of genetic knockout studies for many years.

Although we expected causal variants to have larger effects on T cell function, the hits from our genome-wide CRISPRi screen targeting ~1,000 variant CREs to determine their effects on T cell proliferation did not enrich for causal variants. We suspect that variants that impose large effects on regulatory regions important for directly impacting disease processes are more likely to be rare or to be eliminated from populations through purifying selection. Conversely, common disease variants may be more likely to reside in low-impact enhancers of key disease genes or higher-impact enhancers of non-critical genes with modest effects on cellular function, although more evidence will be needed to support this theory. Our data are in line with recent observations showing systematic differences between expression quantitative trait loci and GWAS loci, whereby disease variants that have a high impact on a trait tend to have lower effects on local gene expression75. Through integrating our MPRA, scCRISPRi and proliferation-based CRISPRi screens, we identified a number of known and previously unappreciated target genes that control T cell activation, including PPP5C, a protein phosphatase that regulates ERK signaling in cancer cells47 but was previously unknown to affect T cells. Through testing variant CRE effects across other cellular functions, we can begin to better understand how common variants across cellular networks affect gene expression and function, and how variants may work together to lead to disease. Base editing, a method that can more precisely assess variant effects on biological functions, will still be required before definitively concluding variant mechanisms.

Our genomic screens in primary human T cells connect likely causal variants to their putative effects on T cell expression networks and function. These data can be used to propose mechanisms of risk and protection from autoimmune diseases mediated by primary T cells and begin to determine convergent properties of variants, which could be useful for stratifying polygenic risk scores and modifying treatments for individuals to target specific pathways.

Methods

Ethical regulations

This research complies with study protocols approved by the Benaroya Research Institute Institutional Review Board under protocol number IRB07109-633. The protocol was conducted according to the principles expressed in the Declaration of Helsinki. All cohorts provided informed, written consent.

Human subjects

For MPRA and bulk CRISPRi experiments, peripheral blood mononuclear cells from 11 fully deidentified donors (seven for MPRA experiments, four for bulk CRISPRi experiments) were isolated from fresh apheresis leukoreduction packs (BloodWorks) using Ficoll-Paque plus (GE, 17-1440-03); no identifiable information was sequenced. For scCRISPRi screens and bulk RNA-seq, we used frozen peripheral blood mononuclear cells from five healthy donors within the BRI Biorepository: three for scCRISPRi screens (a 57-year-old Asian male, a 30-year-old Caucasian female and a 35-year-old Caucasian female) and two for bulk RNA-seq (a 56-year-old Asian male and a 32-year-old Caucasian male).

Cell culture

CD4 T cells were magnetically isolated (BioLegend, 480130) and activated in T cell media (TCM), consisting of either X-VIVO 15 (Lonza, 04-418Q) supplemented with 25 mM HEPES, 1 mM sodium pyruvate, 0.5% non-essential amino acids, 1% penicillin–streptomycin, 0.5% L-glutamine, 5% FBS and 55 mM 2-mercaptoethanol; or CTS OpTmizer T cell Expansion Medium (ThermoFisher, A1048501), supplemented with 5% FBS, 1% glutamine, 100 U ml−1 penicillin–streptomycin and 55 mM 2-mercaptoethanol. Cells were activated with human T cell activation beads (Miltenyi, 130-091-441) and recombinant human IL-2 (final concentration, 100 U ml−1; NCI Biological Resources Branch). Lenti-X 293T cells were maintained in DMEM supplemented with 10% FBS, 1% glutamine, 100 U ml−1 penicillin–streptomycin, 1 mM sodium pyruvate, 1× MEM non-essential amino acids and 10 mM HEPES. Cells were passaged every 2–3 days using trypsin-EDTA for dissociation and kept at a confluency of less than 60%.

MPRA library transfection and sequencing

Please see the Supplementary Note.

Lentivirus production

The lentiviral production protocol was modified from a previous publication49. For making lentivirus, 293T cells (ATCC CRL-3216) were seeded in Opti-MEM I Reduced Serum Medium, GlutaMAX Supplement (ThermoFisher, 51985034) supplemented with 5% FBS, 1 mM sodium pyruvate and 1× MEM non-essential amino acids (referred to as cOPTI-MEM) at 4 × 10⁶ cells per 10 cm petri dish 1 day before the transfection. Cells were transfected at 80% confluency using 41.4 μl of Lipofectamine 3000 transfection reagent (ThermoFisher, L3000015) in 1,250 μl of plain OPTI-MEM (ThermoFisher, 31985070) at 21–25 °C. Next, 11 μg of transfer plasmid (dCas9-ZIM3-mCherry, Addgene, 154473; sgRNA libraries cloned into CROP-seq-opti, Addgene, 106280), 7.5 μg of psPAX2 (Addgene, 12260), 3.3 μg of pCMV-VSVG (Addgene, 8454) and 36.5 μl of P3000 reagent were added to 1,250 μl of room temperature plain OPTI-MEM in a separate tube and mixed by gentle pipetting. The plasmid and Lipofectamine 3000 mixes were combined, mixed by gentle pipetting to a 2.5 ml volume of transfection mixture and incubated for 15 min at room temperature. Following incubation, 5 ml of medium was removed from the 10 cm dish and 2.5 ml of the transfection mixture was added. After 6 h, the transfection medium was replaced with 15 ml of cOPTI-MEM containing 1× ViralBoost (Alstem Bio, VB100). Lentivirus supernatant was collected 24 h after transfection (first collection), kept at 4 °C and replaced with 15 ml fresh cOPTI-MEM. The second collection was done 48 h after transfection. The two collections were pooled and spun down at 500g for 5 min at 4 °C to clear cell debris. Lenti-X concentrator (Takara Bio, 631232) was used to concentrate the virus, following the manufacturer’s instructions, and the virus was resuspended in plain OPTI-MEM at one-hundredth of the original volume. Concentrated virus was subsequently aliquoted and frozen at −80 °C.

Generation of CRISPRi libraries

Please see the Supplementary Note.

scCRISPRi screens

Five million CD4+ T cells were thawed and cultured in 5 ml of TCM supplemented with human T cell activation beads for 48 h. T cells were then infected with a 2–4% (v/v) solution of 100× concentrated dCas9-ZIM3-mCherry lentivirus to introduce the CRISPRi machinery. Then, 24 h later, the cells were washed twice with PBS and subsequently infected with an 8% (v/v) solution of 100× concentrated CRISPR-QTL virus, which targets specific variants of interest. A subset (5%) of gRNAs contained within the library targeted CD45 as a positive control to evaluate gene suppression efficacy. On day 2 post-library infection, puromycin was added to the culture at a final concentration of 2.0 µg ml−1 in fresh TCM to select for library-transduced cells, and the cell density was adjusted to 0.5 million cells per ml. At 4 days post infection (4 dpi), the cells were resuspended in 20 ml of TCM containing 1.0 µg ml−1 of puromycin. At 10–12 dpi, the cells were collected, divided into six groups and stained with anti-CD3, anti-CD4 and anti-CD45 antibodies, as well as a live/dead dye (see Supplementary Table 31 for antibody dilutions). Six hashtag antibodies were also used to separate donors and to enable superloading of the 10× controller (Supplementary Table 31). The six groups of cells were pooled before sorting. A total of 150,000 (v1, one donor) and 330,000 (v2, two donors) mCherryʰⁱ/GFPʰⁱ cells (CD3+/CD4+/live/CD45+ pre-gated) were sorted using a FACS cell sorter (Supplementary Fig. 1). The sorted cells were promptly loaded onto two (v1) or six (v2) channels on the 10× Chromium X controller (10× Genomics) according to the manufacturer’s protocol, with a target capture of 20,000 (library 1) and 12,500 (library 2) cells per channel. Sequencing libraries were generated using the Chromium Next GEM Single Cell 5′ Kit v2 (10× Genomics, 1000265). Gene expression, CRISPR and feature barcoding libraries were pooled at a 4:1:1 ratio and treated with Illumina Free Adapter Blocking Reagent (Illumina, 20024144). Pooled libraries were sequenced on a NextSeq 2000 sequencer (Illumina) using a NextSeq P3 flow cell (Illumina) for v1, or on a NovaSeq X Plus (Illumina) using a 25B flow cell for v2. Basecalls were processed to FASTQs on BaseSpace (Illumina).

Bulk CRISPRi screen for T cell proliferation

A total of 6 × 10⁷ CD4+ T cells were activated in 30 ml of TCM for 24 h. T cells were then infected with 2% v/v 100× concentrated dCas9-ZIM3-mCherry lentivirus. Then, 24 h later, the cells were infected again with 0.25% v/v 100× concentrated 1,000 variant library lentivirus (multiplicity of infection, 0.5–1). At 1 dpi, cells were counted to ensure that there were at least 30 × 10⁶ live cells. A total of 15 × 10⁶ cells were collected as the time zero control (day 2) of the proliferation screens. For the remaining cells, fresh Th0 media (TCM with recombinant human IL-2 (final concentration, 500 U ml−1)) and puromycin (final concentration, 2.5 μg ml−1) were added to bring cells to 1 × 10⁶ cells per ml. At 2 dpi, cells were collected, spun down and resuspended at 0.5 × 10⁶ cells per ml in Th0 media. Cells were maintained between 0.5 × 10⁶ and 1 × 10⁶ cells per ml until 10 dpi, when cells were collected and live ZIM3–mCherry+ cells were FACS-sorted (typically ~10–30% of total cells; Supplementary Fig. 1 and see Supplementary Table 31 for antibody dilutions). Sorted ZIM3–mCherry+ cells were maintained at 0.5 × 10⁶–1 × 10⁶ cells per ml in Th0 media until day 21. At least 15 million cells were then collected per donor and stored at −80 °C until genomic DNA (gDNA) extraction.
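The volume and density arithmetic behind these steps can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that % v/v is taken relative to the culture volume (it could also be read as relative to the final volume), and the function names and the example cell count are hypothetical.

```python
def virus_volume_ml(culture_volume_ml: float, percent_vv: float) -> float:
    """Volume of concentrated virus to add for a given % v/v.

    Assumption: the percentage is relative to the culture volume,
    not to the final volume after the virus has been added.
    """
    return culture_volume_ml * percent_vv / 100.0


def volume_for_density_ml(total_cells: float, target_cells_per_ml: float) -> float:
    """Final culture volume needed to bring a cell count to a target density."""
    if target_cells_per_ml <= 0:
        raise ValueError("target density must be positive")
    return total_cells / target_cells_per_ml


# Values taken from the text: a 30 ml culture, 2% and 0.25% v/v infections.
dcas9_virus_ml = virus_volume_ml(30.0, 2.0)
library_virus_ml = virus_volume_ml(30.0, 0.25)

# Hypothetical count of 15 million cells resuspended at 0.5 million cells per ml.
resuspension_ml = volume_for_density_ml(15e6, 0.5e6)

print(f"dCas9-ZIM3 virus: {dcas9_virus_ml:.3f} ml")
print(f"library virus:    {library_virus_ml:.3f} ml")
print(f"resuspend in:     {resuspension_ml:.1f} ml")
```

Under this reading, the 2% infection of a 30 ml culture corresponds to 0.6 ml of concentrated virus and the 0.25% infection to 0.075 ml.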

Bulk CRISPRi screen sequencing library preparation

For gDNA extraction, cells (in ten-million-cell increments) were resuspended in 50 μl ChIP lysis buffer (1% SDS, 10 mM EDTA, in 50 mM Tris-HCl pH 8.1) and pipetted up and down. Lysed cells were transferred to a 96-well plate or eight-well strip, then incubated at 65 °C for 10 min. The sample was cooled to 37 °C, and 1 μl RNase cocktail (Ambion, AM2286) was added, mixed by pipetting and spun down, followed by incubation at 37 °C for 30 min. Next, 5 μl proteinase K (NEB, P8107) was added, and the sample was mixed by pipetting. The sample was then incubated at 37 °C for 2 h, then at 95 °C for 20 min to inactivate the proteinase K. To isolate gDNA, we added 36.4 μl Ampure XP to the sample, mixed thoroughly by pipetting, incubated for 5 min, and used a magnet to isolate magnetic beads and gDNA from the lysed sample. We pipetted off the supernatant and washed the sample three times with 80% ethanol while on the magnet. After drying the pellet for 5 min, gDNA was then eluted in 45 μl double-distilled H2O, yielding on average 3–6 pg per cell.
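As a quick sanity check on the reported yield, the expected gDNA mass per extraction can be computed from the per-cell figure. This is a minimal sketch; the function name is hypothetical, and the only inputs taken from the text are the ten-million-cell increment and the 3–6 pg per cell range.

```python
PG_PER_UG = 1e6  # picograms per microgram


def expected_gdna_ug(n_cells: int, pg_per_cell: float) -> float:
    """Expected gDNA mass in micrograms for a given cell number and per-cell yield."""
    return n_cells * pg_per_cell / PG_PER_UG


cells_per_extraction = 10_000_000  # one ten-million-cell increment
low_yield_ug = expected_gdna_ug(cells_per_extraction, 3.0)
high_yield_ug = expected_gdna_ug(cells_per_extraction, 6.0)

print(f"expected yield: {low_yield_ug:.0f}-{high_yield_ug:.0f} ug per extraction")
```

At 3–6 pg per cell, one ten-million-cell extraction would therefore be expected to yield roughly 30–60 μg of gDNA.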

We then used 0.6 μg of gDNA from each sample for qPCR to determine the optimal PCR cycle number for library preparation. Each 10 μl qPCR reaction contained 5 μl of NEBNext Q5 Hotstart HiFi PCR master mix (NEB, M0543L), FwdInnerSeq and RevInnerSeq at 500 nM each (final concentration; Supplementary Table 31), 0.6 μg of gDNA, 1.7 μl of SYBR (diluted 1:10,000) and water to a final volume of 10 μl. Once the optimal PCR cycle number was determined, 50 μl PCR reactions were used to amplify the amplicon, and the number of reactions was scaled to the total available gDNA. The PCR reactions from each sample were pooled, and 50 μl was taken for amplicon purification. Amplicons were purified using a two-step Ampure XP method. A 0.65× volume of Ampure XP was added to the sample, followed by mixing via pipette and incubating for 5 min. The sample was applied to the magnet, and the supernatant was isolated for further purification. An additional 1.0× volume of Ampure XP was added to the supernatant to capture the PCR amplicon. The sample was incubated for 5 min, and then the magnetic beads were captured by a magnet. The supernatant was removed and discarded, and the captured beads were washed three times with 80% ethanol. The sample was dried for 5 min and eluted with 25 μl of H2O. Samples were analyzed on a TapeStation to assess amplicon purity and size, and the concentration of amplicon DNA was measured using Qubit. Samples were pooled according to the concentration and percentage of the specific target amplicon and sequenced on a NextSeq 2000 with custom read1 primer hU6_R1 and custom index primer sgPuro_I (Supplementary Table 31).

Bulk RNA-seq

A total of 1 × 10⁷ frozen CD4+ T cells were thawed and activated using human T cell activation beads in 10 ml of TCM. Then, 24 h later, the T cells were infected with a 10–15% v/v solution of 100× concentrated dCas9-ZIM3-mCherry lentivirus to introduce the CRISPRi machinery. At 24 h after CRISPRi infection, the cells were further infected with a 10–15% v/v solution of 100× concentrated gRNA virus designed to target specific variants of interest (see Supplementary Table 31 for gRNA sequences). As a positive control, we used a CD45 sgRNA to assess the efficacy of gene suppression. On day 2 post guide infection, puromycin was added to the culture at a final concentration of 2.0 µg ml−1 in fresh TCM to select for transduced cells, and the cell density was adjusted to 0.5 million cells per ml. On day 4, the puromycin concentration was reduced to 1.0 µg ml−1 to maintain selection. Cells were expanded for an additional 3–4 days, and 1 day before cell sorting, flow cytometry analysis was performed to confirm the efficiency of CD45 knockdown (>85%) in mCherryʰⁱ/GFPʰⁱ cell populations (Supplementary Fig. 1 and see Supplementary Table 31 for antibody dilutions). On day 7 or 8, FACS was used to isolate at least 300,000 mCherryʰⁱ/GFPʰⁱ cells. After sorting, cells were centrifuged to remove supernatant and immediately lysed using Trizol reagent. Cell lysates were subsequently frozen at −80 °C before RNA extraction. RNA extraction was carried out using Direct-zol RNA Microprep (ZYMO, R2063), following the manufacturer’s instructions, to obtain RNA for bulk RNA-seq. Total RNA was added to the reaction buffer from the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing (Takara, 634891), and reverse transcription was performed, followed by PCR amplification to generate full-length amplified cDNA. Sequencing libraries were constructed using the NexteraXT DNA library preparation kit with unique dual indexes (Illumina, FC-131-1096) to generate Illumina-compatible barcoded libraries. Libraries were pooled and quantified using a Qubit. Sequencing of pooled libraries was carried out on a NextSeq 2000 sequencer (Illumina) with paired-end 59-base reads, using a NextSeq P2 sequencing kit with a target depth of five million reads per sample.

Phospho-antibody arrays

A total of 10 × 10⁶ activated CD4 T cells from two human donors were split and transduced with lentivirus containing PPP5C-targeting guides or non-targeting guides and Cas9. Then, 48 h post-transduction, cells were selected with puromycin (2 µg ml−1) and hygromycin (200 µg ml−1). After 14 days, cells were either left unstimulated or stimulated for 30 min with PMA (50 ng ml−1) and ionomycin (1 µg ml−1), and cell lysates were collected from 10 × 10⁶ cells per condition. Separately, 100 million activated CD4 T cells from two human donors were split and nucleofected using a Neon Transfection System (1,600 V, 10 ms, three pulses) with a PPP5C-overexpression vector (VectorBuilder, VB900177-7632ksm) or a control vector (VectorBuilder, VB010000-9857ehb), rested for 48 h in TCM without IL-2, sorted based on BFP+ indicating successful transfection of the vector (Supplementary Fig. 2) and left unstimulated or stimulated for 30 min with PMA and ionomycin. The cell lysates were then collected. Lysates were mixed with protease and phosphatase inhibitors and frozen at −80 °C. Then, 20 µg of lysate was incubated on pre-blocked MAPK phospho arrays (RayBiotech, AAH-MAPK-1) overnight at 4 °C. Samples were washed with diluted wash buffer I and II for a total of five washes. Next, 1 ml of prepared detection antibody was added to each blot for 1.5–2 h at room temperature. Membranes were washed again as above with wash buffers I and II. Then, 2 ml of 1× horseradish peroxidase-anti-rabbit IgG was added to each blot and incubated for 2 h at room temperature. Samples were washed for a third time. Blots were exposed to detection buffer C + D and immediately imaged on a chemiluminescence system. We used ImageJ to obtain the density of each dot on the array and normalized each dot to the positive controls on the blot. We then compared the density of each dot from the PPP5C knockout and overexpression blots for each donor and stimulation condition by dividing the density value of the knockout or overexpression condition by that of the non-target or vector-only controls.
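The normalization and ratio steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: it assumes that each dot is divided by the mean density of the positive-control dots on the same blot, and the function names, dot labels and density values are all hypothetical.

```python
from statistics import mean


def normalize_to_positive_controls(densities, positive_controls):
    """Divide each target dot by the mean positive-control density on the same blot.

    Assumption: the mean of the positive-control dots is used as the
    normalizer; the text only states that dots were normalized to them.
    """
    control_mean = mean(densities[name] for name in positive_controls)
    if control_mean <= 0:
        raise ValueError("positive-control density must be positive")
    return {
        name: value / control_mean
        for name, value in densities.items()
        if name not in positive_controls
    }


def relative_phosphorylation(perturbed, control):
    """Ratio of perturbed (knockout or overexpression) to control density per target."""
    return {
        target: perturbed[target] / control[target]
        for target in perturbed
        if target in control and control[target] > 0
    }


# Hypothetical ImageJ densities for one donor and one stimulation condition.
positive_controls = ("POS1", "POS2")
knockout_raw = {"POS1": 1000.0, "POS2": 1000.0, "ERK1/2": 500.0, "JNK": 250.0}
control_raw = {"POS1": 2000.0, "POS2": 2000.0, "ERK1/2": 500.0, "JNK": 250.0}

knockout_norm = normalize_to_positive_controls(knockout_raw, positive_controls)
control_norm = normalize_to_positive_controls(control_raw, positive_controls)
ratios = relative_phosphorylation(knockout_norm, control_norm)

for target, ratio in ratios.items():
    print(f"{target}: {ratio:.2f}-fold relative to control")
```

In this toy input the raw target densities are identical on both blots, but the control blot is twice as bright overall, so normalization changes the result: a ratio above 1 indicates more phosphorylation in the perturbed condition.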

Analysis methods

Please see the Supplementary Note.

Statistics and reproducibility

We chose to perform our primary T cell MPRAs in seven human donors because we have found that this number is sufficient for identifying 10% effect sizes with 90% power assuming an activity standard deviation of 1.1, a Bonferroni-corrected alpha of 0.05 and 1,000 barcodes per SNP (https://andrewghazi.shinyapps.io/designmpra). We chose to perform our bulk CRISPRi screens with four donors because our simulation analyses found that we could successfully identify 10% effects with 80% power (https://zenodo.org/records/14847208). We chose to perform scCRISPRi screens with 150 cells per gRNA, as this was previously found to provide reasonable power to detect differences in gene expression in cis (within 1 Mb)76. Two replicate samples were used for single gRNA CRISPRi RNA-seq experiments, which allowed us to detect 20% differences with 80% power assuming an alpha of 0.05. No data were excluded from the analyses. Given the unbiased nature of the experiments, we did not require the samples within experiments to be randomized or the investigators to be blinded to sample allocation during experiments and outcome assessment. All data met the assumptions of the statistical tests used.
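For readers unfamiliar with how effect size, variability, sample size and alpha combine into power, the relationship can be sketched with a generic two-sample normal approximation. This is not the designmpra tool or the simulation code cited above; the function names and all example numbers are hypothetical, and the calculation assumes equal group sizes, a known common standard deviation and a two-sided test.

```python
from math import sqrt
from statistics import NormalDist


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def two_sample_power(effect: float, sd: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    Assumptions: equal group sizes, a known common standard deviation
    and normally distributed measurements.
    """
    std_normal = NormalDist()
    standard_error = sd * sqrt(2.0 / n_per_group)
    shift = abs(effect) / standard_error
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the critical value in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: 20% difference, 10% standard deviation, 4 samples per group.
example_power = two_sample_power(effect=0.2, sd=0.1, n_per_group=4, alpha=0.05)

# Hypothetical multiple-testing threshold for 1,000 tests at a family-wise alpha of 0.05.
per_test_alpha = bonferroni_alpha(0.05, 1000)

print(f"power: {example_power:.2f}, per-test alpha: {per_test_alpha:.1e}")
```

The sketch makes the trade-offs explicit: power rises with larger effects or more replicates, and falls as the standard deviation grows or as the alpha threshold is tightened by multiple-testing correction.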

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.