Identifying genetic and cellular connections and distinctions among 15 autoimmune diseases using an in-silico approach

Dang, Xiao; Wang, Frank Qingyun; Zhang, Caicai; Lei, Yao; Su, Huidong; Yang, Cinderella Xinxin; Feng, Hong; She, Chun Hing; Chen, Xinxin; Yang, Xing Tian; Yang, Jing; Lau, Yu Lung; Wang, Yong-Fei; Yang, Wanling

doi:10.1038/s43856-026-01487-9

Article
Open access
Published: 07 March 2026

Communications Medicine volume 6, Article number: 235 (2026)

Abstract

Background

Despite the identification of numerous genetic loci associated with autoimmune diseases (ADs) through genome-wide association studies (GWAS), elucidating the mechanisms underlying these associations remains challenging.

Methods

We integrated GWAS results with multi-omics data across diverse immune cell types to investigate both the shared and disease-specific association signals across 15 common ADs.

Results

Our analyses reveal a high prevalence of locus-sharing (50.8%) across these diseases when defined by physical proximity, but a substantially lower proportion of shared association signals (14.7%) when defined by linkage disequilibrium. This suggests that loci shared across diseases often harbor distinct association signals and mechanisms. We demonstrate that within individual loci, association signals frequently exhibit regulatory activity in different cell types and, less commonly, target different genes. Notably, for several loci, disease-specific associations appear to be mediated through regulatory activity in distinct cell types. Overall, we identify 1,554 genes associated with ADs. Further pathway enrichment and protein-protein interaction network analyses unveil both shared functions and disease-specific pathways among these genes.

Conclusions

By integrating GWAS and multi-omics data, our study delineates the genetic and regulatory architecture underlying autoimmunity, suggesting potential therapeutic targets and opportunities for drug repurposing.

Plain Language Summary

Autoimmune diseases occur when the immune system mistakenly attacks the body’s own tissues. Although many autoimmune diseases share genetic risk regions, the biological mechanisms driving these risks are not fully understood. This study examined fifteen common autoimmune diseases and integrated results from large genetic studies with detailed data on immune cell types to better understand their genetic similarities and differences. We demonstrated that while many diseases are linked to the same broad genetic regions, the specific genetic signals within those regions often differ and act in different immune cells. Our results highlight autoimmune disease-associated genes and identify shared and disease-specific biological pathways that may inform the development of new treatments and the repurposing of existing drugs.

Introduction

Autoimmune diseases (ADs) affect approximately one in ten individuals worldwide, posing significant health risks as the immune system mistakenly attacks healthy tissues¹. These conditions have a strong genetic component. Genome-wide association studies (GWAS) have identified thousands of genetic loci linked to disease susceptibility, significantly advancing our understanding of the underlying mechanisms². However, most loci reside in non-coding genomic regions or even “gene deserts”, complicating the identification of functional regulatory variants and target genes. Additionally, high linkage disequilibrium (LD) among common genetic variants further obscures the identification of risk‑influencing variants, leaving the functional mechanisms underlying most associated loci largely uncharacterized³.

Genetic sharing among ADs is common, with nearly half of the AD-associated loci linked to multiple traits^4,5. This sharing is often mistaken for common association mechanisms. Understanding the details of these shared genetic loci and their functional regulations can uncover both common and specific immunopathogenic mechanisms and pathways, aiding in identifying potential targets for drug repurposing or developing novel therapies. While several studies have explored shared genetic functions across ADs^4,6,7, a detailed and systematic analysis is crucial to fully understand disease association mechanisms, including both shared and disease-specific functions or pathways.

In this study, we systematically analyzed the association signals of the 15 most common ADs. We integrated GWAS with comprehensive functional genomics data, including expression quantitative trait loci (eQTLs), chromatin accessibility and enhancer-gene promoter connections, to identify relevant cell types and target genes at AD-associated signals. Utilizing a protein-protein interaction (PPI) network, we explored common and specific functional modules across these diseases. Our results highlight the importance of identifying target genes, functional cell types, and regulatory mechanisms at the signal level for various ADs. These analyses provide insights into precision treatment and drug repurposing opportunities. Using the IL23R/IL12RB2 locus as an example, we demonstrate that genetic studies can accurately predict the efficacy and specificity of drugs targeting molecules in relevant pathways for the treatment of psoriasis and inflammatory bowel disease (IBD).

Methods

Defining loci and signals: data collection for 15 autoimmune diseases

Fifteen autoimmune diseases were selected for further analysis based on their prevalence, clinical significance, and the availability of genomic data in the GWAS Catalog (https://www.ebi.ac.uk/gwas/). For each disease, GWAS studies were prioritized according to cohort size when summary statistics were available, or by the number of reported associated variants when summary statistics were unavailable in the GWAS Catalog. To ensure data quality and reduce redundancy, GWAS studies with overlapping samples were assessed, and only the study with the largest cohort size or the highest number of reported significant variants was retained. The selected GWAS studies for these autoimmune diseases were subsequently utilized for downstream analyses (Table 1, Supplementary Table 1).

Table 1 The shared and disease-specific association loci, signals and target genes among the 15 autoimmune diseases

To define independent loci and signals across these ADs, we first identified all reported variants with a significance threshold of P < 10⁻⁵ for each study from the GWAS Catalog. We then delineated loci by extending a ±250 kb region around each reported variant. Overlapping loci were merged into a single locus. We excluded loci that met the following criteria: they contained only one reported variant, were reported by a single study, and had no variant reaching genome-wide significance (P < 5 × 10⁻⁸). For each reported variant, we identified all tag SNPs in high LD with it (r² ≥ 0.8) within a ±500 kb window, based on 1000 Genomes Project Phase 3 data from either European or East Asian populations, using the LDlinkR package⁸. Variants in high LD (r² ≥ 0.8) within the same locus were grouped into a single association signal; accordingly, a locus may harbor multiple signals. Within each signal, the reported variant with the smallest p-value for a disease was designated the lead variant.
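The locus definition above (a ±250 kb window around each reported variant, with overlapping windows merged) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `merge_loci` and the example positions are hypothetical, and the exclusion criteria and LD-based signal grouping are not reproduced here.

```python
from collections import defaultdict

FLANK_BP = 250_000  # +/- 250 kb around each reported variant, as stated in the text


def merge_loci(variants, flank=FLANK_BP):
    """Group (chrom, pos) variants into merged loci.

    Returns a list of (chrom, start, end, n_variants) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    loci = []
    for chrom in sorted(by_chrom):
        current = None  # [start, end, n_variants] of the locus being built
        for pos in sorted(by_chrom[chrom]):
            start, end = max(0, pos - flank), pos + flank
            if current is not None and start <= current[1]:
                # The new window overlaps the open locus: extend it.
                current[1] = max(current[1], end)
                current[2] += 1
            else:
                if current is not None:
                    loci.append((chrom, *current))
                current = [start, end, 1]
        if current is not None:
            loci.append((chrom, *current))
    return loci


# Hypothetical positions for illustration only (not taken from the study).
example = [("chr1", 1_000_000), ("chr1", 1_400_000),
           ("chr1", 3_000_000), ("chr2", 500_000)]
loci = merge_loci(example)
```

In this toy input the first two chr1 variants fall into one merged locus because their windows overlap, while the third chr1 variant and the chr2 variant each form their own locus.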

The Jaccard score, J = |A ∩ B| / |A ∪ B|, which measures the fraction of overlapping elements between the sets A and B of two diseases, was calculated for each pair of diseases to quantify the fraction of overlapping loci and signals, respectively.
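As a small worked illustration of the Jaccard score, the sketch below computes pairwise overlaps between sets of signal identifiers. The disease labels and signal IDs are placeholders, not data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard score |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder signal sets for three hypothetical diseases.
disease_signals = {
    "D1": {"s1", "s2", "s3"},
    "D2": {"s2", "s3", "s4"},
    "D3": {"s5"},
}

pairwise = {
    (x, y): jaccard(disease_signals[x], disease_signals[y])
    for x, y in combinations(sorted(disease_signals), 2)
}
```

Here D1 and D2 share two of four distinct signals, so their score is 0.5, whereas D3 shares nothing with either and scores 0. The same function applies unchanged to sets of loci.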

Ranking association signals in each disease

To better understand the genes associated with each AD and their functional implications, we adopted a ranking scheme for the associated signals. For each GWAS study (Table 1), we ranked the signals based on the original association p-values of the independent lead variants. For diseases with multiple studies, we used a robust rank aggregation method⁹ to combine the signal lists from each study and determine the final ranking of signals. This method aggregates multiple ranked lists into a single statistical consensus ranking and is used for meta-analysis to combine ranked lists from different datasets. Ranking the signals could facilitate identifying and understanding the important candidate genes associated with each disease in the subsequent steps.

Identifying the target gene(s) for each signal

We employed five main SNP-to-gene linking strategies across 13 databases to identify cis gene targets for the tag variants in various immune cell types, aiming to determine the target gene(s) for the associated variants. These strategies included:

1) Functional consequence analysis for the variants using the annotation tool Variant Effect Predictor (VEP)¹⁰;
2) Identification of significant cis-eQTLs (FDR < 0.05) from six studies on immune cells, namely DICE¹¹, ImmuNexUT¹², Genotype-Tissue Expression (GTEx)¹³, BLUEPRINT¹⁴, eQTLGen¹⁵, and scRNA-seq of ADs¹⁶;
3) Enhancer-gene linking from two resources: EpiMap¹⁷ and the Activity-By-Contact (ABC) model^18,19. The ABC model predicts enhancer-gene connections in each cell type based on measurements of enhancer activity and 3D contact frequencies (Hi-C)¹⁹, and EpiMap predicts enhancer-gene links using the Pearson correlation between gene expression and enhancer activity¹⁷;
4) Information on promoter-interacting regions from Promoter-capture Hi-C (PCHiC)²⁰;
5) Integrated SNP-to-gene linking strategies from three tools: cS2G²¹, Open Targets V2G²², and L2G²³.

By integrating these approaches, we developed a new method to identify target gene(s) based on a gene score for the association variants (Table 2). This process took the significant raw SNP-to-gene linking values (p-values or scores) from each database and transformed them into scores ranging from 0 to 1 using quantile transformation, implemented with the scikit-learn library in Python. This transformation harmonized the scores across the different sources. Additionally, when multiple cell types were involved in SNP-to-gene evaluation within a database, only the top-scored SNP-gene pair was considered. Next, we integrated the scores of each variant-gene pair obtained from different sources into a unified framework.
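To illustrate what a quantile transformation to the 0 to 1 range does, the sketch below maps raw values to their scaled ranks using only the Python standard library. It is a simplified stand-in and not the scikit-learn implementation used in the study. The function name `quantile_scores` is hypothetical, and the `higher_is_better` flag (used to reverse p-values so that smaller p-values receive higher scores) is an assumption, since the text does not state how p-values were oriented.

```python
def quantile_scores(values, higher_is_better=True):
    """Map raw values to [0, 1] by scaled rank; tied values share their mean rank."""
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [1.0]
    # Flip the sign when small raw values (e.g. p-values) indicate stronger evidence.
    keyed = list(values) if higher_is_better else [-v for v in values]
    order = sorted(range(n), key=lambda i: keyed[i])
    scores = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < n and keyed[order[j + 1]] == keyed[order[i]]:
            j += 1
        mean_rank = (i + j) / 2  # 0-based mean rank of the tie group
        for k in range(i, j + 1):
            scores[order[k]] = mean_rank / (n - 1)
        i = j + 1
    return scores


raw_scores = [0.2, 0.9, 0.5]        # hypothetical linking scores
raw_pvalues = [1e-8, 0.01, 1e-4]    # hypothetical eQTL p-values
score_q = quantile_scores(raw_scores)
pvalue_q = quantile_scores(raw_pvalues, higher_is_better=False)
```

Because only ranks are used, values from sources with very different raw scales end up on a common 0 to 1 scale, which is the purpose of the harmonization step described above.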

Table 2 List of strategies for SNP-to-gene linking approaches and identification of the relevant cell types involved in signals

For each variant-gene pair $g$, we calculated the weighted total score (${S}_{g}$) as follows:

$${S}_{g}=\sum\limits_{i=1}^{8}{W}_{i}{S}_{i}$$

where $i$ indexes the eight evidence categories (the 13 SNP-to-gene linking databases, with the six significant cis-eQTL databases serving as a single piece of evidence), ${W}_{i}$ is the weight assigned to category $i$, and ${S}_{i}$ is the transformed variant-gene score derived from category $i$. The weight assigned to each category reflects the level of confidence in its evidence, as informed by prior knowledge and previous studies^2,22.
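The weighted sum can be sketched in a few lines. The eight category names below are our reading of the strategies listed above (VEP, pooled cis-eQTL, EpiMap, ABC, PCHiC, cS2G, V2G, L2G), and the equal weights are placeholders: the actual values of ${W}_{i}$ are not given in this section.

```python
import math

# Assumed category labels; the six cis-eQTL resources are pooled into one entry.
CATEGORIES = ("VEP", "cis_eQTL", "EpiMap", "ABC", "PCHiC", "cS2G", "V2G", "L2G")

# Placeholder weights (all equal); the study's actual weights are not listed here.
WEIGHTS = {name: 1.0 for name in CATEGORIES}


def weighted_score(scores, weights=WEIGHTS):
    """Compute S_g = sum_i W_i * S_i; categories without evidence contribute 0."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


# Hypothetical transformed scores for one variant-gene pair.
pair_scores = {"cis_eQTL": 0.8, "ABC": 0.5}
s_g = weighted_score(pair_scores)
```

With equal placeholder weights the score reduces to the sum of the available evidence; unequal weights would let higher-confidence categories contribute more to the final ranking.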

We computed a weighted score for each variant-gene pair to identify the target genes for each signal. For signals with multiple variants, the gene with the highest score was selected. This process resulted in a ranked list of genes for each signal. To pinpoint the most relevant genes for each signal, we applied an ‘elbow point’ cutoff, determined as the inflection point in the gene score curve. This cutoff served as a threshold for the gene list, excluding genes beyond it from being considered primary targets for downstream analysis. Ultimately, the top-ranked gene(s) for each signal were identified and referred to as the top-tier genes.
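The exact procedure used to locate the elbow point is not specified here. As one simple stand-in, the sketch below cuts a ranked gene list at its largest drop between consecutive scores and keeps the genes above that drop. The function name `top_tier` and the gene labels are hypothetical, and other elbow definitions (for example, curvature-based ones) could select a different cutoff.

```python
def top_tier(gene_scores):
    """Return genes ranked above the largest drop in score (largest-gap heuristic)."""
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return [gene for gene, _ in ranked]
    # Drop in score between each gene and the next one in the ranking.
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    largest = max(drops)
    if largest == 0:
        # All scores are equal: no cutoff can be placed, so keep every gene.
        return [gene for gene, _ in ranked]
    cut = drops.index(largest)
    return [gene for gene, _ in ranked[: cut + 1]]


# Hypothetical gene scores for one signal.
signal_scores = {"GENE_A": 0.95, "GENE_B": 0.90, "GENE_C": 0.30, "GENE_D": 0.25}
selected = top_tier(signal_scores)
```

In this toy example the largest drop lies between GENE_B and GENE_C, so two genes are retained, mirroring the situation in which a signal is narrowed down to two candidate targets.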

Identifying the relevant cell types for each signal

We leveraged various databases to pinpoint the specific immune cell types in which each association signal might be active. These databases vary in cell type resolution, ranging from a few major cell types to several dozen immune cell types, which complicates the integration of information from different studies. To address this issue, we consolidated the cell type information into six primary immune cell types: monocytes, B cells, dendritic cells, CD4⁺ T cells, CD8⁺ T cells, and NK cells. This consolidation was achieved using data from six main databases: ENCODE snATAC-seq and DNase-seq data^24,25,26, EpiMap chromHMM¹⁷, ABC model¹⁹, FANTOM5 CAGE-seq^27,28, and significant cis-eQTLs (Table 2). ENCODE snATAC-seq and DNase-seq are used to identify open chromatin regions across various cell types, while EpiMap chromHMM is employed to annotate functional genomic regions. FANTOM5 utilizes CAGE-seq to map transcription start sites and enhancers across cell types.

We implemented an evidence-based approach to determine the cell types relevant to a given signal. This involved calculating the number of databases that corroborated each of the six identified cell types for each variant within the signal. Additionally, we computed the cumulative number of databases supporting each cell type involved in the signal. By setting a cutoff at the ‘elbow point’ in the cumulative number of supporting databases for each cell type, we could determine the cell types that are potentially relevant to the association signal.
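The counting step can be sketched as follows: for every variant in a signal, each database that supports a cell type contributes one vote, and the votes are summed across variants. The data structure, function name and example evidence are hypothetical, and the elbow-point cutoff itself is not reproduced in this sketch.

```python
CELL_TYPES = ("monocytes", "B cells", "dendritic cells",
              "CD4+ T cells", "CD8+ T cells", "NK cells")


def cumulative_support(variant_evidence):
    """Sum, over all variants in a signal, the number of databases supporting each cell type.

    variant_evidence maps variant -> {database name -> collection of cell types}.
    Returns (cell type, count) pairs sorted by decreasing count.
    """
    totals = {ct: 0 for ct in CELL_TYPES}
    for databases in variant_evidence.values():
        for cell_types in databases.values():
            for ct in set(cell_types):
                if ct in totals:
                    totals[ct] += 1
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical evidence for a two-variant signal.
signal = {
    "variant_1": {"ENCODE": {"monocytes", "dendritic cells"}, "cis_eQTL": {"monocytes"}},
    "variant_2": {"ENCODE": {"monocytes"}, "ABC": {"B cells"}},
}
ranked_cell_types = cumulative_support(signal)
```

A cutoff applied to this ranked list would then separate the well-supported cell types (monocytes in this toy case) from those with little or no supporting evidence.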

Pathway enrichment analysis for target genes

We performed Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway enrichment analyses on the target genes identified for each AD using the R package clusterProfiler^29,30 with default parameters. Additionally, because we ranked both the signals within each AD and the target genes, we conducted Gene Set Enrichment Analysis (GSEA)³¹ using clusterProfiler. GSEA is a rank-based method used to determine whether a predefined gene set is significantly enriched at either the top or bottom of a ranked list. Figures were plotted with the R package ggplot2³².

Protein-protein interaction (PPI) network analysis for functional connections and differences across ADs

Distinguishing functional variations within large sets of immune-related genes can pose a significant challenge. This is primarily because the dominant immune functions tend to overshadow less prevalent ones, complicating the detection of subtle differences or unique functionalities between two gene sets. To circumvent this issue, we constructed a PPI network by importing all top-tier target genes associated with ADs into the STRING database (version 12.0, https://string-db.org/). This process utilized protein-protein interaction evidence gathered from high-throughput experiments, curated databases, and co-expression sources. The minimum confidence score for interactions was set to 0.4 (medium confidence). The resulting network was visualized using Cytoscape³³ (version 3.10.1). Modules were determined using the default parameters of the MCODE plugin, run through clusterMaker2³⁴ (version 2.3.4) within Cytoscape. We assessed the networks by the number of genes within each module, along with the node degree and closeness centrality measures, which indicate the importance of individual genes within the network.
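For readers unfamiliar with the two node measures, the sketch below filters scored edges at the 0.4 confidence threshold and computes degree and closeness centrality on a toy network. The node labels and edge scores are placeholders, and closeness is taken here as (number of reachable nodes) / (sum of shortest-path distances), a common definition that may differ in detail from the Cytoscape implementation.

```python
from collections import defaultdict, deque

MIN_CONFIDENCE = 0.4  # medium-confidence threshold stated in the text


def build_graph(scored_edges, min_conf=MIN_CONFIDENCE):
    """Keep edges whose score meets the threshold; return an undirected adjacency dict."""
    graph = defaultdict(set)
    for u, v, score in scored_edges:
        if score >= min_conf:
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


def degree(graph, node):
    """Number of direct interaction partners of a node."""
    return len(graph.get(node, ()))


def closeness(graph, node):
    """Reachable nodes divided by the sum of their shortest-path distances (BFS)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for neighbor in graph.get(cur, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[cur] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Placeholder edges (node, node, confidence); not real STRING interactions.
edges = [("A", "B", 0.9), ("A", "C", 0.7), ("A", "D", 0.45), ("B", "C", 0.2)]
ppi = build_graph(edges)
```

In this star-shaped toy network the hub A has degree 3 and closeness 1.0, while each peripheral node has degree 1 and closeness 0.6; the B-C edge is dropped because its score falls below the threshold.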

We performed pathway enrichment analyses for each module using KEGG, Reactome, WikiPathways, and Gene Ontology (GO) biological process annotations. These analyses were run with g:Profiler through the EnrichmentTable³⁵ plugin (version 2.0.5) within Cytoscape to identify significantly enriched pathways and biological process terms. By using modules as functional units for pathway analysis, we were able to perform a detailed functional analysis of target genes across various ADs. This approach also facilitated the comparison of these target genes, enabling us to identify functional differences between the ADs.

Comparison of modules across ADs utilizing GSEA

Using the modules identified from the PPI network, along with the genes in these modules, we constructed custom background gene sets, with each module constituting a gene set. These modules and genes were formatted to meet the requirements of the standard GMT format. We then performed Gene Set Enrichment Analysis (GSEA) using the ranked top-tier target genes for each disease with the clusterProfiler package. A positive normalized enrichment score (NES) from this analysis reflects the tendency of members of a gene set (module) to cluster at the top of the ranked list of associated genes for a given AD. These positive NES values were then used to compare the relative importance of the modules for each AD.
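In the GMT format, each line holds one gene set: the set name, a description field, and then the member genes, all separated by tabs. The sketch below converts a module-to-genes mapping into GMT text. The module names, gene labels and the description string are placeholders.

```python
def modules_to_gmt(modules, description="PPI_module"):
    """Render {module name: genes} as GMT text (name, description, genes; tab-separated)."""
    lines = []
    for name in sorted(modules):
        genes = sorted(set(modules[name]))  # de-duplicate and order genes for stable output
        lines.append("\t".join([name, description, *genes]))
    return "\n".join(lines) + "\n"


# Placeholder modules; real module membership comes from the MCODE clustering.
example_modules = {"M2": ["G3"], "M1": ["G2", "G1"]}
gmt_text = modules_to_gmt(example_modules)
```

The resulting text can be written to a `.gmt` file and supplied as the custom gene-set collection for a GSEA run against each disease's ranked gene list.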

Identifying drug candidates and targets based on the functional modules

We utilized the ChEMBL (https://www.ebi.ac.uk/chembl/, Release 30) and DrugBank (https://go.drugbank.com/) databases for a target-drug search using the top-tier target genes. These databases are comprehensive and manually curated repositories that provide detailed information on drugs, their targets, and clinical trials for specific disease indications. To confidently identify potential drug targets relevant to the disease, our analysis specifically focused on the targets of drugs that are currently in phase III or IV clinical trials.

Inclusion and ethics

As the included studies were approved by their respective independent review boards, no additional ethical approval was required for our study, which is based on summary statistics data.

Results

Diverse association signals even when the locus is shared

After reviewing GWAS studies for each autoimmune disease from the GWAS Catalog, we selected 28 distinct studies that covered the 15 most prevalent ADs. These were chosen based on either the largest cohort sizes or the highest numbers of associated variants that surpassed genome-wide significance (Table 1). Detailed information for each GWAS source is provided in Supplementary Table 1. These studies yielded a total of 2129 lead variants associated with ADs. To capture additional related variants, we expanded the lead variants to include those in high linkage disequilibrium (LD, r² ≥ 0.8) with them, based on the 1000 Genomes Project for European or East Asian populations. This expansion resulted in 38,402 tag SNPs, providing a more comprehensive set of genetic variants for further analysis (Supplementary Table 2).

We grouped these variants into 502 loci based on their genomic locations. Of these, 255 loci were associated with at least two diseases, which we refer to as pleiotropic loci. Within each locus, we defined association signals as LD clusters of variants with r² ≥ 0.8 and treated distinct clusters within the locus as independent if they were not in high LD with one another, yielding 1800 potentially independent signals. These included 265 pleiotropic signals defined as associated with multiple diseases and 1535 disease-specific signals (Table 1, Supplementary Table 2). Locus sizes varied widely, with a median of 638.4 kb (ranging from 500 kb to 4.14 Mb). The HLA region was a notable outlier, spanning 8.22 Mb. Multiple independent signals per locus were quite common, ranging from 1 to 22 (median 2), whereas the HLA locus presented a remarkable 230 signals (Supplementary Table 2).

Locus-sharing (255/502, 50.8%) across ADs is much more common than signal-sharing across the diseases (265/1800, 14.7%). This pattern is further demonstrated by the clustering of diseases based on association signals rather than loci, as indicated by the Jaccard similarity scores (Fig. 1A–B; Supplementary Fig. 1A–B). Three distinct groups of diseases were identified using complete-linkage hierarchical clustering on Jaccard overlaps derived from the signal matrix: Group 1 (SJS, SS, SLE, RA, PBC), Group 2 (IBD, AS, PV, MS, CD, BD), and Group 3 (T1D, ATD, VIT, AA) (Fig. 1B). We also ranked the association signals within each disease (see “Ranking association signals in each disease”), and examined the pleiotropic signals and their corresponding ranked association values. This analysis revealed similar clustering patterns (Fig. 1C).

**Fig. 1: Genetic locus-sharing and signal-sharing across autoimmune diseases.**

This suggests that comparing diseases at the signal level may offer a more effective approach for identifying both shared and specific association mechanisms. The presence of independent association signals within a shared locus suggests mechanistic differences across these ADs, potentially involving different target genes, different relevant cell types or regulatory mechanisms. Comparing the similarities and differences in the association architecture of these diseases may provide a unique perspective for a deeper understanding of autoimmunity.

Cell type specificity of the association signals

Differences in functionality across cell types might explain why diseases exhibit different association signals when they share the same locus. Understanding the cellular context of various ADs could also lead to more precise treatments. To determine the relevant cell types for each association signal, we used an evidence-based approach that integrates multiple data types. This includes significant cis-eQTL data, histone modification marks, chromatin accessibility information, and other genomic features (see “Identifying the relevant cell types for each signal”).

To account for differences in cell type resolution across datasets and to improve statistical power, our analysis focused on six major immune cell types: CD4⁺ T cells, CD8⁺ T cells, B cells, NK cells, monocytes, and dendritic cells. We standardized cell classifications by aggregating higher-resolution cell subsets from the original datasets into these broader categories. However, for specific loci of particular interest, we referred back to the original studies with more detailed cell type annotations to facilitate a more nuanced interpretation.

Of the 1800 autoimmune association signals analyzed, relevant cell types were identified for 1693 signals (Supplementary Table 2). Among these, 856 (or 50.6%) were predicted to be functional in three or fewer cell types. Notably, 349 signals (or 20.6%) appeared to be specific to one of the six immune cell types, suggesting potential cell type specificity (Supplementary Fig. 2A). We examined the distribution of cell types for each disease based on the 856 signals ascribed to 1–3 cell types, aiming to assess potential cell type specificity in these associations. Our results show that CD4⁺ T and CD8⁺ T cells were the most frequently involved cell types among signals exhibiting variation across cell types. In contrast, NK and dendritic cells were less commonly implicated across the six cell types.

Broadly, these findings suggest that lymphoid cells are more likely to be implicated in ADs compared to myeloid cells (Fig. 2A). Additionally, when comparing cell type associations for each AD against the overall distribution across all ADs, we observed a significantly higher involvement of B cells in SLE and PBC relative to other diseases. Detailed results of these comparisons are provided in Supplementary Fig. 2B. We also attempted to analyze the distribution of more detailed sub-cell types using specific databases; however, limited statistical power prevented the detection of significant differences at the sub-cell type level across diseases. More genomic data with higher cell type resolution are needed to fully understand the relevant cell types for these diseases.

**Fig. 2: Distribution of immune cell types within each AD and specific immune cell types involved in signals.**

Cell type specificity determines the associations of IL10 to different diseases

The role of cell type specificity in various ADs is emphasized by the associations observed at the IL10 locus. There are two independent signals around the IL10 locus linked to different ADs. Signal 1 (G1, depicted in Fig. 2B), located upstream of IL10, includes three reported variants (rs1518111, rs1800871, rs3024490) in the GWAS Catalog. They are in near-complete LD with one another and are associated with Behcet’s Disease (BD). Conversely, Signal 2 (G2, Fig. 2B) is situated downstream of G1 and contains three reported variants (rs3024493, rs3024505, rs3122605) associated with SLE, IBD, and T1D in the GWAS Catalog (Supplementary Table 3). Data from both ENCODE DNase-seq and snATAC-seq (Fig. 2C) indicate that signal G1 is situated in an open chromatin region specific to monocytes/dendritic cells. This is further corroborated by significant cis-eQTL data from DICE and ImmuNexUT, which suggest that G1 variants are monocyte-specific eQTLs for IL10 expression (Fig. 2D–E), with the risk allele associated with reduced IL10 expression (Supplementary Fig. 3A). Moreover, based on HOCOMOCO human transcription factor-binding models³⁶, the risk allele rs1518111-T potentially modifies the binding affinity of IRF4/8, providing a probable mechanism for the association (Supplementary Fig. 3B).

In contrast, signal G2 in the IL10 locus is associated with SLE, T1D and IBD^37,38, and the region containing G2 appears to be accessible ubiquitously in various immune cells according to ENCODE DNase-seq and snATAC-seq data (Fig. 2C). We conducted colocalization analysis for this region using HyPrColoc³⁹, which supported shared genetic etiology for SLE, T1D, and IBD (Supplementary Fig. 3C–E). Different from the G1 signal with strong evidence of cell type-specific eQTLs, no significant eQTL data for the G2 variants were detected in either the DICE or ImmuNexUT databases (Fig. 2D–E). For this locus, it seems that both different cellular contexts and regulatory mechanisms contribute to the different disease associations, with IL10 the most likely target gene in both cases.

Our evidence-based approach to identify relevant cell types for different signals at the IL10 locus agrees with prior studies: the BD-risk allele of signal G1 (rs1518111-T) reduces IL10 expression in purified monocytes^40,41, whereas the SLE-risk allele of signal G2 is associated with higher IL10 at both mRNA and protein levels, and increased proportions of IL-10⁺p-ELK-1⁺ cells in B cells, T cells, and monocytes in SLE patients⁴². To further corroborate cell type specificity using public resources, we analyzed bulk RNA-seq from isolated immune subsets obtained from the GEO to compare case–control IL10 expression patterns by disease and cell type. For BD, we used GSE61399⁴³, which includes CD4⁺ T cells (BD n = 9; healthy controls n = 3) and CD14⁺ monocytes (BD n = 8; controls n = 9); isolated B cells were not available. For SLE, we used GSE148601⁴⁴, which profiles T cells (SLE n = 21; controls n = 14), B cells (SLE n = 9; controls n = 10), and monocytes (SLE n = 15; controls n = 14). After standard normalization and per–cell type comparisons using two-sided Wilcoxon rank-sum tests, IL10 expression in BD was significantly decreased in monocytes (P = 0.015) but not in T cells (P = 0.282) (Supplementary Fig. 4A). In SLE, IL10 expression was significantly increased in monocytes (P = 0.007), T cells (P = 0.006), and B cells (P = 0.017) (Supplementary Fig. 4B). All these lines of evidence support distinct, disease- and cell type–specific effects at the IL10 locus. The G1 signal is specifically associated with BD and links the BD-risk allele to reduced IL10 expression in monocytes, consistent with monocyte-specific regulatory activity. By contrast, the G2 signal is associated with IBD, T1D, and SLE; its SLE-risk allele corresponds to increased IL10 expression across multiple immune cell types, indicating a broader, multi–cell type regulatory mechanism (Fig. 2F).

Genetic evidence from the IL23R/IL12RB2 locus supports targeted treatment

For the locus around IL23R/IL12RB2, we identified nine signals associated with relevant cell types and linked to multiple ADs. They include signals mostly specific to CD4⁺ T cells or ubiquitous for all six cell types (Supplementary Table 3). Upon detailed examination, Signal 4 (G4, left), with the reported variant rs2295359, intronic to IL23R, is associated only with psoriasis (PV). In contrast, Signal 2 (G2, right) includes seven variants associated with various diseases and likely targets IL12RB2 (Supplementary Table 3, Fig. 3A). G2 appears to be specific to CD4⁺ T cells (Th1) and NK cells (Fig. 3B). On the other hand, G4 seems to be specific to CD4⁺ T cells, including follicular helper T cells, Th17, and memory regulatory T cells, based on cis-eQTL data from DICE (Fig. 3C). These findings are consistent with the expression patterns of the IL23R or IL12RB2 genes in immune cells, as documented on the Protein Atlas (https://www.proteinatlas.org/).

**Fig. 3: Genetic signals in the IL23R/IL12RB2 locus.**

IL-12 and IL-23 signaling have been confirmed to drive aberrant Th1 and Th17 immune responses, respectively, contributing to ADs⁴⁵. IL-23 comprises the p19 subunit (encoded by IL23A) and the p40 subunit (encoded by IL12B), with its receptor consisting of IL23R and IL12RB1. IL-12 is composed of the p35 subunit (encoded by IL12A) and the p40 subunit (encoded by IL12B), with its receptor consisting of IL12RB1 and IL12RB2 (Fig. 3D). IL23R is associated with PV, IBD, BD, and AS, while IL12RB2 is associated with multiple ADs except for PV (Supplementary Fig. 5). This indicates that IL-23 signaling, rather than the IL-12 pathway, is likely pivotal to the pathogenesis of PV and IBD. The p19 subunit is unique to IL-23, while the p40 subunit is shared by IL-12 and IL-23. Risankizumab, Tildrakizumab, and Guselkumab specifically target p19 in the IL-23 pathway, whereas Ustekinumab targets p40, thus potentially inhibiting both the IL-23 and IL-12 pathways⁴⁶.

Based on the genetic findings, focusing on p19 may represent an effective and more specific strategy than targeting p40 for treating PV and IBD. In recent years, therapies targeting p19 have been used more frequently than those targeting p40⁴⁷. Recent experimental evidence supports this hypothesis, showing that anti-p19 antibodies are safe and do not increase the risk of adverse events when treating patients with moderate-to-severe PV compared to Ustekinumab⁴⁸. A comprehensive analysis of association signals in the loci of IL23R/IL12RB2, IL12RB1, IL23A, IL12A and IL12B also suggests differential involvement of the IL23 and IL12 pathways in ADs (Supplementary Fig. 5). Psoriasis and IBD appear to involve the IL23 pathway, likely showing regulatory activity in Th17 cells, while SLE, PBC, and MS are more likely to involve the IL12 pathway and activation in Th1 cells. Genetic analyses nominate IL12A (IL12p35) as a potential therapeutic target for SLE, MS and PBC, and targeting IL23A (IL23p19) might be a more specific and safer approach compared to targeting IL12p40.

Other examples include IL12A, encoding p35 of IL-12, which has one signal associated with celiac disease (CD) and BD that appears to be specific to monocytes/dendritic cells. In contrast, another signal in the same locus is linked to PBC and SLE and appears to be B cell-specific (Supplementary Table 3, Supplementary Fig. 6A–E). In the WDFY4 locus, there are two signals, both associated with SLE but differing in their cellular specificity: one signal appears to be specific to naïve regulatory T cells, while the other seems to be functional exclusively in monocytes and neutrophils (Supplementary Table 3, Supplementary Fig. 6F–H). Investigating the underlying mechanisms of these specific associations could lead to a deeper understanding of disease pathogenesis and promote the development of precision treatments for ADs.

Functional comparison of pleiotropic and disease-specific signals

We functionally annotated these 1800 signals associated with various ADs, comprising 265 pleiotropic and 1535 disease-specific signals, using genomic data including cis-eQTLs, enhancer-gene links, and promoter-capture Hi-C (PCHiC) (see “Identifying the target gene(s) for each signal”). Pleiotropic signals were significantly more likely than disease-specific signals to carry functional annotations such as cis-eQTLs (97% vs 86%, p = 2.298e-08, one-sided Fisher’s exact test). This indicates that pleiotropic signals have a stronger association with gene expression regulation, potentially because their effects are more widespread across cell types and therefore more robustly detected. Similarly, pleiotropic signals were more often located in active enhancers identified using EpiMap data than disease-specific signals (71% vs 48%, p = 2.726e-12). The pleiotropic signals were also more likely to be supported by the ABC model (75% vs 50%, p = 7.398e-14) and by promoter-interaction data from PCHiC (89% vs 80%, p = 0.0003191) (Supplementary Fig. 7). These findings highlight the crucial roles of the pleiotropic signals in mediating autoimmunity across multiple diseases.
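
A comparison of this kind can be outlined as a one-sided Fisher's exact test on a 2×2 table. The sketch below is illustrative only: the counts are back-calculated from the rounded percentages (97% of 265 and 86% of 1535) and are therefore assumptions rather than the exact counts used in the study, so the resulting p-value is not expected to match the reported one. The helper names `log_comb` and `fisher_exact_greater` are hypothetical, and only the Python standard library is used.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` annotated signals in the first row.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denom = log_comb(row1 + row2, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # x annotated signals in row 1, the remaining col1 - x in row 2
        p += exp(log_comb(row1, x) + log_comb(row2, col1 - x) - log_denom)
    return min(p, 1.0)


# Approximate counts reconstructed from rounded percentages (assumption).
pleiotropic_with_eqtl = round(0.97 * 265)   # 257 of 265
specific_with_eqtl = round(0.86 * 1535)     # 1320 of 1535
p_value = fisher_exact_greater(
    pleiotropic_with_eqtl, 265 - pleiotropic_with_eqtl,
    specific_with_eqtl, 1535 - specific_with_eqtl,
)
print(f"one-sided p (approximate counts): {p_value:.3g}")
```

Working in log space with `lgamma` avoids overflow for tables of this size; a library routine such as SciPy's `fisher_exact` with `alternative="greater"` would be the usual choice in practice.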

Identifying target genes for the association signals

We developed a scoring system to identify the target genes for each association signal, utilizing a robust approach with a combination of five SNP-to-gene linking approaches (“Identifying the target gene(s) for each signal”, Table 2). Out of these 1800 signals, we identified 1554 target genes from 1740 signals using this in-house scoring system. Notably, for most of the signals (68.3% or 1189 signals), a single target gene can be identified using this approach. Additionally, we can narrow down the target genes to two for 17% of the signals (295 signals). Three or more target genes are detected for 14.7% (256 signals) of the signals (Supplementary Table 2).
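
The following is a minimal sketch of how evidence from several SNP-to-gene linking approaches could be combined into a per-signal score. It is not the in-house scoring system itself: the source labels, the equal weights, the keep-all-ties rule, and the example evidence are assumptions chosen for illustration, and the actual criteria are those given in Table 2.

```python
from collections import defaultdict

# Hypothetical evidence sources with equal weights (assumed, not from Table 2).
WEIGHTS = {"eqtl": 1.0, "epimap": 1.0, "abc": 1.0, "pchic": 1.0, "integrated": 1.0}


def score_target_genes(evidence):
    """Score candidate genes for one association signal.

    evidence: iterable of (gene, source) pairs. Each source counts at most
    once per gene. Returns (top_genes, scores); ties are all kept.
    """
    scores = defaultdict(float)
    for gene, source in set(evidence):  # drop repeated (gene, source) pairs
        scores[gene] += WEIGHTS[source]
    if not scores:
        return [], {}
    best = max(scores.values())
    top = sorted(g for g, s in scores.items() if s == best)
    return top, dict(scores)


# Made-up evidence for a single signal, for illustration only.
signal_evidence = [
    ("IL23R", "eqtl"),
    ("IL23R", "abc"),
    ("IL23R", "pchic"),
    ("IL12RB2", "pchic"),
    ("IL23R", "eqtl"),  # duplicate, ignored
]
targets, scores = score_target_genes(signal_evidence)
print(targets, scores)
```

Under this toy rule a signal yields a single target gene when one candidate has the highest score and several target genes when candidates tie.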

Furthermore, we ranked the target genes for each disease based on the number of supporting independent association signals and their significance from GWAS studies (Supplementary Table 4). This ranking provides insight into the relative significance of each target gene for a given disease, providing a valuable resource for understanding the functional implications of genetic associations. From the 1554 target genes, we identified 503 genes (32.4% of the total) associated with at least two ADs. Among these shared target genes, 90 were associated with five or more ADs (Fig. 4A), suggesting their crucial roles in autoimmunity.

**Fig. 4: Target genes across ADs and functional enrichment.**

Excluding MHC Class II genes, the top three pleiotropic genes, SH2B3, STAT4, and BACH2, are shared by 11, 10, and 9 diseases, respectively (Supplementary Table 4). The top 10 target genes for each AD are also presented in Fig. 4B. For instance, in IBD, genes such as IL23R, NOD2, and TNFSF15 were identified as crucial association genes. As discussed above, the IL23R gene is involved in the IL23/Th17 signaling pathway, essential for maintaining intestinal immune homeostasis. NOD2 is specific to IBD and plays a key role in innate immunity, particularly in microbial recognition and autophagy⁴⁹, while TNFSF15 promotes antimicrobial pathways and is currently being explored as a potential therapeutic target for IBD treatment⁵⁰ (Fig. 4B).

Among the top ten target genes for each of the 15 diseases, we identified 111 unique genes. Remarkably, 80.2% (89 out of 111) of these unique genes were also associated with at least one other AD, although not all of them made the top ten list for each disease (Supplementary Fig. 8A). This proportion is notably greater than the proportion of all disease-shared genes (503/1554, 32.4%), suggesting that the highly ranked genes are more likely to play central functional roles common to multiple diseases. However, it is possible that detection bias may have influenced these results, as top-ranked genes tend to exhibit stronger associations, resulting in increased detection power.

Most of the target genes were specific to a single disease (Fig. 4C, Supplementary Table 4). Overall, sharing of target genes across different ADs is more prevalent than sharing of the underlying signals, with 32.4% of target genes being shared compared to 14.7% of the signals. This indicates that variations in cellular contexts or regulatory mechanisms may further differentiate disease associations.

Pathway characterization of the target genes

KEGG and Reactome pathway enrichment analyses were conducted on the target genes for each disease. The pathways significantly enriched for each disease are shown in Supplementary Table 5. The most significantly enriched pathways, ranked by adjusted p-values, were commonly shared across multiple ADs. Notably, these include T-cell differentiation and response to viral infection according to KEGG (Fig. 4D, Supplementary Fig. 8B, Supplementary Table 5). Similarly, according to the Reactome database, enrichment in TCR signaling, Interferon responses, and Interleukin signaling was observed (Fig. 4E, Supplementary Fig. 8C, Supplementary Table 5). When pathway enrichment was performed exclusively on genes shared by various ADs (Supplementary Table 6), we observed similar pathway enrichment patterns. In contrast, disease-specific genes exhibited limited pathway enrichment, indicating a limited knowledge of their specific functions in autoimmunity (Supplementary Table 7).
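
Over-representation analysis of this kind is commonly based on a hypergeometric test followed by multiple-testing correction. The sketch below is a generic, standard-library illustration of that idea and is not the clusterProfiler implementation; the function names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n are in the pathway."""
    total = comb(M, N)
    tail = sum(comb(n, x) * comb(M - n, N - x) for x in range(k, min(n, N) + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers (assumed): 1000 background genes, a 50-gene pathway,
# 40 target genes for one disease, 8 of which fall in the pathway.
p_pathway = hypergeom_sf(8, 1000, 50, 40)
adjusted = benjamini_hochberg([p_pathway, 0.2, 0.5])
print(p_pathway, adjusted)
```

The choice of background gene set strongly affects such tests, so results from a sketch like this are only comparable when the same universe is used for every disease.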

Despite these commonalities, certain pathways were significantly enriched only in specific diseases. For instance, based on the Reactome enrichment, the initial triggering of complement was exclusively enriched in SLE, the VEGFA-VEGFR2 pathway was unique to ATD, and the formation of the cornified envelope/keratinization was specific to PV. Signaling by CSF3 (G-CSF) and FCERI-mediated MAPK activation were unique to MS. Additionally, TRAF6-mediated IRF7 activation was enriched in both SLE and ATD, whereas Interleukin-1 family signaling was solely enriched in IBD and CD (Fig. 4F, Supplementary Table 5).

Analysis of protein interaction network reveals common and specific functional modules

To further characterize the shared and specific functions across the ADs, we constructed a PPI network using the 1554 target genes as input to the STRING database. The network incorporated interaction evidence from experimental data, curated databases, and co-expression in STRING. Protein interaction pairs were included if they met a medium-confidence score threshold of 0.4. About 65.4% of the target genes (1016 out of 1554), with a total of 8978 interactions, were incorporated into the network (Supplementary Fig. 9A). Notably, 80% of the interacting genes (813 out of 1016) converged into 32 functional clusters (modules) identified by MCODE with default parameters via the clusterMaker2 plugin in Cytoscape (Fig. 5A, Supplementary Table 8).
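
The network-construction step can be illustrated with a small sketch that keeps interaction pairs at or above a confidence threshold and reports how many input genes end up in the network. The edge list, gene labels, and function name are hypothetical, and scores are assumed to be on a 0 to 1 scale.

```python
def build_network(edges, genes, min_score=0.4):
    """Keep undirected edges among `genes` whose score is >= min_score.

    edges: iterable of (gene_a, gene_b, score) with score in [0, 1].
    Returns (kept_edges, connected_genes).
    """
    genes = set(genes)
    kept = {
        frozenset((a, b))
        for a, b, score in edges
        if score >= min_score and a in genes and b in genes and a != b
    }
    connected = set().union(*kept)  # genes with at least one retained edge
    return kept, connected


# Toy input (made up): five genes and four scored pairs.
toy_genes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B", 0.9), ("B", "C", 0.41), ("C", "D", 0.39), ("B", "A", 0.7)]
kept, connected = build_network(toy_edges, toy_genes)
coverage = len(connected) / len(toy_genes)
print(len(kept), sorted(connected), coverage)

# The same ratio for the reported counts: 1016 of 1554 target genes.
reported_coverage = round(100 * 1016 / 1554, 1)
```

Storing each pair as a `frozenset` makes the edge set undirected, so a pair reported in both orientations is counted once.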

**Fig. 5: PPI network and module identification.**

Among the 813 genes identified across these 32 modules, roughly 35% were pleiotropic. The median proportion of pleiotropic genes per module was 34%, with substantial variation among modules (Fig. 5B). As expected, the products of the human leukocyte antigen (HLA) genes exhibited strong interactions and clustered together in Module C1. By analyzing the pleiotropic genes in each module across different diseases, we identified five modules (C1, C3, C7, C12, and C14) that were predominantly shared across various ADs (Fig. 5C). Similar patterns were observed when examining the proportion of both pleiotropic and disease-specific genes (Supplementary Fig. 9B).

Assessing the role and relative significance of individual modules in each disease

We treated the genes within each module as gene sets and performed gene set enrichment analysis (GSEA) to evaluate their enrichment in each disease. This analysis utilized the genes associated with each disease, incorporating their respective ranking values within each disease context (“Comparison of modules across ADs utilizing GSEA”). This GSEA-based method allows us to evaluate the relative significance of each module for a specific disease, as reflected by the Normalized Enrichment Score (NES). A positive NES signifies the extent to which a module is overrepresented among the top-ranked genes for each disease (Fig. 6A). Additionally, we annotated the functions of these modules using databases such as KEGG, Reactome, Wikipathways, and GO biological processes. From these annotations, we selected the top five most enriched pathways for each module (Supplementary Table 9). A summary of the modules and their annotated major functions is presented in Fig. 6B.
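
The enrichment statistic behind this kind of analysis can be sketched as a weighted running sum over a ranked gene list, normalized against random gene sets of the same size. The code below is a simplified illustration of the classic GSEA statistic under stated assumptions (gene-set permutation, exponent 1, fixed random seed); it is not the pipeline used in the study, and the function names and toy ranking are hypothetical.

```python
import random


def enrichment_score(ranked_genes, weights, gene_set, p=1.0):
    """Running-sum enrichment score for a list ranked from most to least important."""
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    hit_total = sum(abs(w) ** p for w, h in zip(weights, in_set) if h)
    if n_hit == 0 or n_miss == 0 or hit_total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, h in zip(weights, in_set):
        # step up at module genes (weighted), step down at all other genes
        running += (abs(w) ** p) / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def normalized_es(ranked_genes, weights, gene_set, n_perm=1000, seed=0):
    """ES divided by the mean |ES| of same-sign scores from random gene sets."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, weights, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    null = [
        enrichment_score(ranked_genes, weights, set(rng.sample(ranked_genes, size)))
        for _ in range(n_perm)
    ]
    same_sign = [abs(x) for x in null if (x >= 0) == (observed >= 0)]
    mean = sum(same_sign) / len(same_sign) if same_sign else 0.0
    return observed / mean if mean > 0 else float("nan")


# Toy ranking (made up): six genes with decreasing ranking values.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
es_top = enrichment_score(ranked, metric, {"g1", "g2"})
es_bottom = enrichment_score(ranked, metric, {"g5", "g6"})
nes_top = normalized_es(ranked, metric, {"g1", "g2"})
print(es_top, es_bottom, nes_top)
```

A module whose genes cluster at the top of a disease's ranked list gives a positive score, and one whose genes cluster at the bottom gives a negative score.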

**Fig. 6: Assessing the role of modules in ADs and functional analysis for each module.**

Modules shared across multiple diseases may play critical roles in the underlying mechanisms of autoimmunity (Fig. 6A). For instance, Module C1 is primarily enriched in HLA-mediated antigen processing and presentation. Module C3 is linked to cytokine-cytokine receptor interaction, chemokine signaling pathway, and TNFR2 non-canonical NF-κB signaling. Module C5 shows enrichment in regulatory circuits of STAT3 signaling, natural killer cell-mediated cytotoxicity, and interleukin-2 family signaling. Module C7 is linked to differentiation pathways of Th1, Th2, and Th17 cells, as well as interferon signaling pathways. Module C12 is enriched for nuclear receptor signaling and TNFR1-induced proapoptotic signaling. Lastly, Module C14 is associated with the production of reactive oxygen species (ROS) in phagocytes and the MAPK signaling pathway.

Potential drug repurposing based on shared modules across the diseases

Module C1 comprises 32 genes from the major histocompatibility complex (MHC) region and contains associated genes for all 15 ADs. Among these, 24 genes exhibit varying levels of pleiotropy (Fig. 5B). Our analysis revealed enrichment of HLA class I genes for AS and PV, and HLA class II genes for the other 13 ADs. Additionally, we observed that the top 10 proteins—those with the highest degree and closeness centrality within this module—are frequently shared across the ADs (Fig. 7A).
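
Degree and closeness centrality, the two measures used to rank proteins within a module, can be computed on an unweighted interaction graph with a breadth-first search. The sketch below uses a made-up four-node graph and a hypothetical function name; a graph library would normally be used for networks of realistic size.

```python
from collections import deque


def degree_and_closeness(edges):
    """Return {node: (degree, closeness)} for an undirected, unweighted graph.

    Closeness is (number of reachable nodes) / (sum of shortest-path
    distances to them), computed within the node's connected component.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        closeness = (len(dist) - 1) / total if total else 0.0
        result[source] = (len(adj[source]), closeness)
    return result


# Toy star-shaped module (made up): P2 interacts with P1, P3, and P4.
toy_module = [("P1", "P2"), ("P2", "P3"), ("P2", "P4")]
stats = degree_and_closeness(toy_module)
hub = max(stats, key=lambda node: stats[node])
print(hub, stats)
```

In this toy graph the central protein has both the highest degree and the highest closeness, which is the pattern used above to pick the top proteins of a module.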

**Fig. 7: Shared and specific modules of ADs.**

Module C7 involves CD4+ T helper cell differentiation and consists of 180 proteins encoded by genes associated with all 15 ADs (Fig. 7B). Our analysis identified 20 drug targets in this module, targeted by 28 drugs in phase III or phase IV clinical trials according to the ChEMBL and DrugBank databases (Supplementary Table 10). These drugs hold potential for repurposing. For example, NATALIZUMAB, a monoclonal antibody that targets ITGA4, is approved by the FDA for the treatment of multiple sclerosis and Crohn’s disease. Since ITGA4 is associated with MS, IBD, Crohn’s disease, autoimmune thyroid disease, and ankylosing spondylitis, natalizumab may potentially be effective for these conditions. However, several caveats and considerations remain before this approach can be widely adopted. Additionally, FOSTAMATINIB, the first FDA-approved spleen tyrosine kinase (SYK) inhibitor for the treatment of chronic immune thrombocytopenia⁵¹, targets genes associated with PV, MS, IBD, and SLE, indicating potential opportunities for repurposing (Fig. 7B).

Module C3, consisting of 63 proteins encoded by genes associated with the 15 ADs, is primarily enriched in cytokine-cytokine receptor interactions and chemokine signaling pathways (Supplementary Fig. 9C). Within this module, we identified 15 targets of 40 drugs that are currently in phase III or phase IV clinical trials (Supplementary Table 10). For instance, IL-6 inhibitors such as SIRUKUMAB, OLOKIZUMAB, and SILTUXIMAB may represent some of the most promising candidates for repurposing in ADs. IL-6 is a key molecule in Module C3 and a cytokine with a central role in the pathogenesis of numerous autoimmune conditions. Notably, OLOKIZUMAB has been used for RA treatment⁵². Additionally, CYCLOSPORINE targets PPIA/PPIF, another important molecule in this module, for the treatment of RA and psoriasis, as listed in DrugBank. Our analysis further indicates that PPIF is associated with multiple diseases, including MS, IBD, CD, and AS.

To systematically evaluate drug repurposing opportunities across all modules, we classified drug-disease pairings into three categories: drugs already approved for multiple ADs (27 drugs), drugs actively tested for new autoimmune indications (16 drugs), and novel repurposing candidates not yet explored in ADs (7 drugs) (Supplementary Table 11). By prioritizing drugs within these categories, our framework facilitates the identification of the most promising candidates for future research and clinical trials, ultimately supporting more efficient and targeted therapeutic development for autoimmune conditions.

Specific modules for ADs

Intriguingly, we noted specific modules exhibiting unique functions in certain diseases. These modules may provide information for targeted treatment of autoimmune conditions. For example, Module C20, enriched for mature B cell differentiation, emerges as one of the top functional modules in SLE, MS, IBD and RA (Fig. 6A). On the other hand, Module C16, which involves complement and coagulation cascades and B cell-mediated immunity, is highly significant only in SLE (Fig. 7C). The complement component gene C3 (not to be confused with Module C3) is a member of Module C16, and its product is targeted by PEGCETACOPLAN. This drug may mitigate complement-mediated kidney damage in glomerular diseases where complement plays a pathogenic role⁵³.

Notably, module C8, which shows high specificity in psoriasis, is significantly enriched for processes related to keratinization and epithelial cell differentiation (Fig. 7D). Psoriasis, a chronic inflammatory skin disease, is characterized by acanthosis, abnormal keratinization, and inflammatory cell infiltrates⁵⁴. It involves an atypical keratinization process, and the crucial role of keratinocytes in triggering and perpetuating the inflammatory state emphasizes the importance of targeting these cells for effective treatment⁵⁵.

Discussion

ADs are widely reported to share genetic effects and immunopathology. However, without thorough examination, locus sharing is often interpreted as a demonstration of shared association mechanisms. Our study demonstrates that locus-sharing is indeed prevalent; however, signal-sharing is far less common, even when a locus is shared by multiple diseases. Thus, identifying target genes, active cell types, and regulatory mechanisms at the signal level is crucial for comparing the differences and similarities across ADs. This approach enhances our understanding of the associations and pathogenic mechanisms of these diseases.

We expanded each lead variant to include all variants in strong LD (r² >= 0.8), and used this threshold to define signals. This is consistent with established GWAS default practice (e.g., PLINK, HaploReg, LDlink) and earlier studies^56,57. This threshold balances sensitivity and specificity, capturing the true regulatory variants while minimizing noise from variants less likely to be involved in association causality. Although fine-mapping and colocalization can provide more precise delineation of association signals, these approaches require detailed summary statistics and greater statistical power, which may not always be available for many autoimmune diseases. Thus, LD expansion at r² >= 0.8 represents a widely accepted, evidence-based approach for defining candidate signals. Importantly, for signals defined by strong LD, we performed colocalization analyses for a number of selected pleiotropic loci and found that these signals are indeed associated with multiple traits (Supplementary Fig. 3C–E). This result further supports the validity of our approach and demonstrates that the LD-based signal definition effectively captures biologically meaningful, shared genetic architecture. Looking ahead, Bayesian fine-mapping frameworks can further narrow causal variants within each signal (e.g., credible sets with posterior inclusion probabilities, multi-ancestry models). In parallel, Mendelian randomization—particularly two-sample MR using cis-eQTL or cis-pQTL instruments with sensitivity analyses—can strengthen causal inference from target genes to disease risk and help distinguish mediation from horizontal pleiotropy. As larger, harmonized summary statistics across all studied ADs become available, including diverse ancestries, these approaches will refine causal attribution, improve target prioritization, and clarify population-shared versus population-specific effects.
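
The LD-based signal definition can be illustrated with a small sketch that merges lead variants from different diseases into one signal when their pairwise r² reaches the threshold, and then labels signals seen in two or more diseases as pleiotropic. The transitive (single-linkage) merging rule, the variant identifiers, and the r² values below are assumptions for illustration; the exact grouping procedure is described in the Methods and is not reproduced here.

```python
def cluster_signals(lead_variants, r2, threshold=0.8):
    """Group lead variants into signals by pairwise LD.

    lead_variants: dict mapping variant id -> set of associated diseases.
    r2: dict mapping (variant_a, variant_b) -> r² between the two variants.
    Variants are merged transitively when r² >= threshold (assumed rule).
    """
    parent = {v: v for v in lead_variants}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (a, b), value in r2.items():
        if value >= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    signals = {}
    for variant, diseases in lead_variants.items():
        entry = signals.setdefault(find(variant), {"variants": set(), "diseases": set()})
        entry["variants"].add(variant)
        entry["diseases"] |= set(diseases)
    return list(signals.values())


# Hypothetical lead variants and LD values (not real data).
leads = {"rsA": {"IBD"}, "rsB": {"PV"}, "rsC": {"SLE"}}
ld = {("rsA", "rsB"): 0.95, ("rsB", "rsC"): 0.30}
signals = cluster_signals(leads, ld)
pleiotropic = [s for s in signals if len(s["diseases"]) >= 2]
print(len(signals), len(pleiotropic))

# Share of pleiotropic signals reported in the study: 265 of 1800.
reported_share = round(100 * 265 / 1800, 1)
```

Because merging is transitive, a chain of variants in which each adjacent pair is in strong LD ends up in one signal even if the two ends are only weakly correlated, which is one reason intermediate LD complicates signal definition.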

We found that CD4+ T cells were the most prominent cells involved in the associations of ADs. Multiple studies have consistently confirmed the crucial role of CD4+ T cells in the pathogenesis of autoimmune diseases^4,12. Additionally, we discovered that B-cell signals are significantly enriched in SLE, consistent with studies showing that SLE variants are particularly enriched in B cells^58,59. Association signals can be strongly cell-type or disease-specific. We explored the conditions under which disease-associated signals exert their functions and identified several signals that, despite sharing the target gene(s), function under different cellular contexts across various diseases, including signals around IL10, IL23R/IL12RB2, IL12A and WDFY4.

GWAS have successfully identified numerous variants associated with complex traits, with over 90% located in non-coding regions of the genome⁶⁰. Identifying the target genes of these non-coding variants remains a significant challenge. Each GWAS locus typically comprises multiple genes, and non-coding variants may not necessarily regulate the nearest gene²³. Large-scale gene expression quantitative trait loci (eQTL) datasets from various immune cells have demonstrated their value in linking disease variants to their target genes^11,12. Various strategies also have been developed to link regulatory SNPs to their target genes in various cell types. These include the Activity-by-contact (ABC) model and EpiMap to predict enhancer-gene connections in each cell type^17,19. Recently, integrated SNP-to-gene linking tools such as V2G, L2G, and cS2G have been developed to enhance the identification of target genes of genetic variants^21,23. However, gene regulation can be cell-type or context-specific, which restricts the power to detect target genes, especially for immune-related diseases for which the general tools do not have adequate resolution and adequate resource data. In this study, we developed a scoring scheme that combines these strategies, particularly making use of rich multi-omics data generated from various immune cell types, to identify the target genes for each signal. This approach enhances our ability to identify genes associated with ADs.

If a GWAS is sufficiently powered, it is reasonable to prioritize disease genes based on GWAS p-values, which reflect both effect size and population allele frequency. We prioritized the top-tier genes for each AD and found that 32.4% of genes were shared by at least two diseases. Despite the limited overlap in associated genes across these diseases, pathway enrichment analysis showed enrichment of the same top pathways, such as T-cell differentiation, Interferon signaling, and Interleukin signaling. This observation aligns with studies demonstrating that different AD groups have unique genetic association patterns but impact largely the same primary pathways^6,7. On the other hand, it also reflects the limitations of pathway analysis, which is restricted by current knowledge and overshadowed by the most prominent pathways.

Interestingly, besides sharing similar top pathways, we identified several pathways significantly enriched in specific ADs. For instance, the initial triggering of complement was only enriched in SLE, characterized by the production of autoantibodies against nuclear and cytoplasmic antigens, leading to immune-mediated tissue injury⁶¹. The VEGFA-VEGFR2 pathway was only enriched in ATD, with VEGFA being a critical factor in these diseases⁶². The Formation of the cornified envelope/Keratinization pathway was only enriched in PV, where premature keratinocyte differentiation disrupts the cornified envelope formation in this disease⁶³. TRAF6-mediated IRF7 activation was only enriched in SLE and ATD, with increased IRF7 expression promoting inflammation and autoantibody production⁶⁴. Lastly, IL-1 family signaling was enriched only in IBD and CD, highlighting the therapeutic potential of targeting IL-1 family cytokines, such as Canakinumab (anti-IL-1β) for IBD⁶⁵ and anti-IL-15 antibodies under investigation for CD⁶⁶.

It has been shown that when a protein is involved in a molecular process, its direct interactors often participate in the same process and function^67,68. Therefore, we utilized PPI-based clustering to investigate the similarities and differences across these diseases, which is hypothesized to provide better resolution in functional analysis when we could partition the target genes into PPI clusters. We constructed a PPI network using all top-tier target genes for the 15 ADs and this allowed us to group these genes into 32 clusters. This approach enabled us to better understand the functional similarities and differences across these diseases compared to enrichment analysis that uses all the associated genes collectively. From this analysis, we identified five significant common modules (C1, C3, C7, C12, and C14) and several specific modules (e.g., C20, C16, C8) by assessing the proportion of pleiotropic genes in each module or using GSEA on the ranked gene list of each AD. This implies that certain common modules may contribute to shared disease characteristics of ADs, while specific modules may have unique roles in individual diseases.

The process of drug discovery and development is fraught with risks and high costs, resulting in a relatively low success rate in translating discoveries into clinical applications. Research has indicated that GWAS can assist in identifying compounds suitable for drug repurposing. It has been shown that when a drug’s target is supported by underlying GWAS evidence, it can enhance the chances of its approval for clinical use^69,70. Our study prioritized disease-associated genes and identified both common and specific gene modules. The common modules could present opportunities for drug repurposing, while disease-specific modules or pathways could suggest potential new therapeutic targets for drug development and precision treatment.

This study has several limitations. First, the use of LD to differentiate signals associated with various ADs can be restrictive, particularly when there is intermediate LD among the signals. Second, identifying cell types responsive to association signals and the target genes is constrained by the availability and resolution of genomic and epigenomic data. This limitation is especially pronounced when association signals are active only under specific conditions, such as infection or pathological states. Third, although we used immune cell-derived data to provide a consistent framework across all 15 ADs, critical non-immune cell types, such as intestinal epithelial cells (IBD), pancreatic beta cells (T1D), synovial fibroblasts (RA), and keratinocytes (psoriasis), are not represented in our analysis. Fourth, although colocalization is a powerful approach for determining whether two traits share the same regulatory variants or for identifying target genes by linking GWAS signals to eQTLs, the limited availability of summary statistics for some ADs restricted our ability to perform comprehensive colocalization analyses. Fifth, variations in sample size across study cohorts may affect statistical power, with fewer genetic signals identified for diseases studied in smaller cohorts than for diseases with much larger sample sizes. This disparity may limit the comparability of genetic signals across diseases. Sixth, most of the major GWAS we analyzed were conducted in European and East Asian cohorts. This ancestry predominance may limit the generalizability of our findings to underrepresented populations, where allele frequencies, LD structure, effect sizes, and environmental interactions can differ. As a result, some signals may be missed, and target gene, pathway, or cell type prioritizations may not fully translate across ancestries. Future work should incorporate larger, diverse-ancestry cohorts and population-specific functional genomics to improve transferability and ensure equitable relevance of the results. Finally, our findings from genetic associations should be considered as starting points that require further functional validation.

In conclusion, this study leveraged public datasets and integrated GWAS with comprehensive functional genomics data to identify relevant cell types and target genes at associated signals for 15 ADs. We explored both common and disease-specific functional modules by analyzing the PPI network, addressing the inherent complexity of these diseases. While we made significant efforts to identify shared and unique functional characteristics across various ADs, some challenges remain. Identifying shared major immune functional dysregulation is relatively straightforward, akin to recognizing the proverbial ‘elephant in the room’. However, detecting subtle differences that may be crucial to the specificities of different autoimmune disorders is considerably more challenging. Future research, encompassing genetic findings and genomic functional characterizations, may contribute to advances in this area. Despite these limitations, our study serves as a critical initial step toward understanding the insights offered by genetic findings through comprehensive genomic data analysis, particularly in the realm of autoimmune diseases.

Data availability

The summary statistics or the reported lead variants used in this study are available through the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/) using the accession numbers listed in Table 1. Significant cis-eQTL data on immune cells used in this study are available from DICE, ImmuNexUT (https://humandbs.dbcls.jp/en/hum0214-v9, E-GEAD-398), GTEx (https://gtexportal.org/home/downloads/adult-gtex/qtl, V8), BLUEPRINT (https://ega-archive.org/datasets/EGAD00001005200), eQTLGen (https://www.eqtlgen.org/cis-eqtls.html), and scRNA-seq of ADs (https://onek1k.org/). Enhancer-gene linking data on immune cells are available from EpiMap (https://personal.broadinstitute.org/cboix/epimap/links/links_corr_only/) and Activity-By-Contact (https://www.engreitzlab.org/resources/). Promoter-capture Hi-C data are available at https://www.sciencedirect.com/science/article/pii/S0092867416313228#mmc4. Open Targets V2G and L2G are available at https://platform.opentargets.org/downloads, and cS2G is available at https://zenodo.org/records/6354007. ENCODE snATAC-seq and DNase-seq data are available at https://www.encodeproject.org/matrix/?type=Experiment&control_type!=*&status=released&perturbed=false. EpiMap chromHMM annotations are available at https://compbio.mit.edu/epimap/. FANTOM5 CAGE-seq data are available at https://fantom.gsc.riken.jp/5/data/. Data on drugs and targets are available from ChEMBL (https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_30/) and DrugBank (https://go.drugbank.com/releases/latest). Source data for the figures are provided in the Supplementary Data: Fig. 1a–b (Supplementary Data S2); Figs. 2–3 (Supplementary Data S3); Fig. 4a–c (Supplementary Data S4); Fig. 4d–f (Supplementary Data S5); and Fig. 5 (Supplementary Data S8). All additional data supporting the results are included in the Supplementary Data or are available from the corresponding authors upon reasonable request.

Code availability

All analyses were conducted using open-source packages in Python 3.11.3 (scikit-learn 1.2.2, NumPy 1.24.3, and pandas 1.4.4) and R 4.3.2 (ggplot2 4.0.0, LDlinkR 1.4.0, corrplot 0.95, pheatmap 1.0.13, and clusterProfiler 4.10.0). The analysis code is publicly available at https://github.com/dangxiao21/15_ADs.

References

Conrad, N. et al. Incidence, prevalence, and co-occurrence of autoimmune disorders over time and by age, sex, and socioeconomic status: a population-based cohort study of 22 million individuals in the UK. Lancet 401, 1878–1890 (2023).
Caliskan, M., Brown, C. D. & Maranville, J. C. A catalog of GWAS fine-mapping efforts in autoimmune disease. Am. J. Hum. Genet. 108, 549–563 (2021).
Gerussi, A., Soskic, B., Asselta, R., Invernizzi, P. & Gershwin, M. E. GWAS and autoimmunity: what have we learned and what next. J. Autoimmun. 133, 102922 (2022).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Harroud, A. & Hafler, D. A. Common genetic factors among autoimmune diseases. Science 380, 485–490 (2023).
Gokuladhas, S., Schierding, W., Golovina, E., Fadason, T. & O’Sullivan, J. Unravelling the shared genetic mechanisms underlying 18 autoimmune diseases using a systems approach. Front. Immunol. 12, 693142 (2021).
Demela, P., Pirastu, N. & Soskic, B. Cross-disorder genetic analysis of immune diseases reveals distinct gene associations that converge on common pathways. Nat. Commun. 14, 2743 (2023).
Myers, T. A., Chanock, S. J. & Machiela, M. J. LDlinkR: an R package for rapidly calculating linkage disequilibrium statistics in diverse populations. Front. Genet. 11, 157 (2020).
Kolde, R., Laur, S., Adler, P. & Vilo, J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics 28, 573–580 (2012).
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 1–14 (2016).
Schmiedel, B. J. et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell 175, 1701–1715. e1716 (2018).
Ota, M. et al. Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases. Cell 184, 3006–3021. e3017 (2021).
Consortium, G. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Kundu, K. et al. Genetic associations at regulatory phenotypes improve fine-mapping of causal variants for 12 immune-mediated diseases. Nat. Genet. 54, 251–262 (2022).
Võsa, U. et al. Large-scale cis-and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Yazar, S. et al. Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
Gazal, S. et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat. Genet. 54, 827–836 (2022).
Ghoussaini, M. et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res. 49, D1311–D1320 (2021).
Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57 (2012).
Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).
Hitz, B. C. et al. The ENCODE uniform analysis pipelines. bioRxiv 2023.04.04.535623 (2023).
Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 1–14 (2015).
Abugessaisa, I. et al. FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs. Nucleic Acids Res. 49, D892–D898 (2021).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics J. Integr. Biol. 16, 284–287 (2012).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation. 2, 3 (2021).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Villanueva, R. A. M., & Chen, Z. J. ggplot2: Elegant Graphics for Data Analysis (Taylor & Francis, 2019).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Utriainen, M. & Morris, J. H. clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape. BMC Bioinforma. 24, 134 (2023).
Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
Coetzee, S. G., Coetzee, G. A. & Hazelett, D. J. motifbreakR: an R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics 31, 3847–3849 (2015).
Eldjarn, G. H. et al. Large-scale plasma proteomics comparisons through genetics and disease associations. Nature 622, 348–358 (2023).
Jog, N. R. et al. Association of Epstein-Barr virus serological reactivation with transitioning to systemic lupus erythematosus in at-risk individuals. Ann. Rheum. Dis. 78, 1235–1241 (2019).
Foley, C. N. et al. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat. Commun. 12, 764 (2021).
Remmers, E. F. et al. Genome-wide association study identifies variants in the MHC class I, IL10, and IL23R-IL12RB2 regions associated with Behcet’s disease. Nat. Genet. 42, 698–702 (2010).
Nakano, H. et al. GWAS-identified CCR1 and IL10 loci contribute to M1 macrophage-predominant inflammation in Behçet’s disease. Arthritis Res. Ther. 20, 124 (2018).
Sakurai, D. et al. Preferential binding to Elk-1 by SLE-associated IL10 risk allele upregulates IL10 expression. PLoS Genet. 9, e1003870 (2013).
Tulunay, A. et al. Activation of the JAK/STAT pathway in Behcet’s disease. Genes Immun. 16, 170–175 (2015).
Kitagori, K. et al. Expression of S100A8 protein on B cells is associated with disease activity in patients with systemic lupus erythematosus. Arthritis Res. Ther. 25, 76 (2023).
Chyuan, I.-T. & Lai, J.-H. New insights into the IL-12 and IL-23: from a molecular basis to clinical application in immune-mediated inflammation and cancers. Biochem. Pharmacol. 175, 113928 (2020).
Daniele, S. G., Eldirany, S. A., Damiani, G., Ho, M. & Bunick, C. G. Structural basis for differential p19 targeting by anti-IL-23 biologics: correlations with short- and long-term clinical efficacy in psoriasis. JID Innovat. 4, 100261 (2024).
Xu, S. et al. Treatment of plaque psoriasis with IL-23p19 blockers: a systematic review and meta-analysis. Int. Immunopharmacol. 75, 105841 (2019).
Blauvelt, A., Chiricozzi, A., Ehst, B. D. & Lebwohl, M. G. Safety of IL-23 p19 inhibitors for the treatment of patients with moderate-to-severe plaque psoriasis: a narrative review. Adv. Ther. 40, 3410–3433 (2023).
Lees, C., Barrett, J., Parkes, M. & Satsangi, J. New IBD genetics: common pathways with other diseases. Gut 60, 1739–1753 (2011).
Kadiyska, T., Tourtourikov, I., Popmihaylova, A.-M., Kadian, H. & Chavoushian, A. Role of TNFSF15 in the intestinal inflammatory response. World J. Gastrointest. Pathophysiol. 9, 73 (2018).
Mullard, A. FDA approves first-in-class SYK inhibitor. Nat. Rev. Drug Discov. 17, 385–386 (2018).
Abuelazm, M. et al. The efficacy and safety of olokizumab for rheumatoid arthritis: a systematic review, pairwise, and network meta-analysis. Clin. Rheumatol. 42, 1503–1520 (2023).
Dixon, B. P. et al. Clinical safety and efficacy of pegcetacoplan in a phase 2 study of patients with C3 glomerulopathy and other complement-mediated glomerular diseases. Kidney Int. Rep. 8, 2284–2293 (2023).
Nestle, F. O., Kaplan, D. H. & Barker, J. Mechanisms of disease. Psoriasis. N. Engl. J. Med. 361, 496 (2009).
Zhou, X., Chen, Y., Cui, L., Shi, Y. & Guo, C. Advances in the pathogenesis of psoriasis: From keratinocyte perspective. Cell Death Dis. 13, 81 (2022).
Marigorta, U. M., Rodríguez, J. A., Gibson, G. & Navarro, A. Replicability and prediction: lessons and challenges from GWAS. Trends Genet. 34, 504–517 (2018).
Abell, N. S. et al. Multiple causal variants underlie genetic associations in humans. Science 375, 1247–1254 (2022).
Harley, J. B. et al. Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity. Nat. Genet. 50, 699–707 (2018).
Cho, J. H. & Feldman, M. Heterogeneity of autoimmune diseases: pathophysiologic insights from genetics and implications for new therapies. Nat. Med. 21, 730–738 (2015).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
Tsokos, G. C. Systemic lupus erythematosus. N. Engl. J. Med. 365, 2110–2121 (2011).
Li, Z. et al. Single-cell RNA sequencing depicts the local cell landscape in thyroid-associated ophthalmopathy. Cell Rep. Med. 3, 8 (2022).
Arul, S., Dayalan, H., Jegadeesan, M. & Damodharan, P. Induction of differentiation in psoriatic keratinocytes by propylthiouracil and fructose. BBA Clin. 6, 82–86 (2016).
Ma, W., Huang, G., Wang, Z., Wang, L. & Gao, Q. IRF7: role and regulation in immunity and autoimmunity. Front. Immunol. 14, 1236923 (2023).
Aggeletopoulou, I., Kalafateli, M., Tsounis, E. P. & Triantos, C. Exploring the role of IL-1β in inflammatory bowel disease pathogenesis. Front. Med. 11, 1307394 (2024).
Lähdeaho, M.-L. et al. Safety and efficacy of AMG 714 in adults with coeliac disease exposed to gluten challenge: a phase 2a, randomised, double-blind, placebo-controlled study. Lancet Gastroenterol. Hepatol. 4, 948–959 (2019).
Oti, M., Snel, B., Huynen, M. A. & Brunner, H. G. Predicting disease genes using protein–protein interactions. J. Med. Genet. 43, 691–698 (2006).
Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
Reay, W. R. & Cairns, M. J. Advancing the use of genome-wide association studies for drug repurposing. Nat. Rev. Genet. 22, 658–671 (2021).
King, E. A., Davis, J. W. & Degner, J. F. Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genet. 15, e1008489 (2019).
Wei, T. et al. Package ‘corrplot’. Statistician 56, e24 (2017).
Nassar, L. R. et al. The UCSC genome browser database: 2023 update. Nucleic Acids Res. 51, D1188–D1195 (2023).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R. & Pfister, H. UpSet: visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 20, 1983–1992 (2014).

Acknowledgements

We acknowledge funding support from the Health and Medical Research Fund (HMRF) [07182946, 10212696, PR-HKU-7] of Hong Kong SAR, China. We also acknowledge funding support from the Shenzhen-Hong Kong Jointly Funded Project (Category A; SGDX20230116093201002), the Stability Support for Higher Education from the Shenzhen Science and Technology Program, and the 1 + 1 + 1 CUHK-CUHK(SZ)-GDSTC Joint Collaboration Fund (2025A0505000056). We are grateful for the coordination of the University of Hong Kong Shenzhen Institute of Research and Innovation and for grant support from the National Key Research and Development Program of China [2021YFC2702005].

Author information

Authors and Affiliations

Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong, China
Xiao Dang, Frank Qingyun Wang, Caicai Zhang, Yao Lei, Huidong Su, Hong Feng, Chun Hing She, Xinxin Chen, Xing Tian Yang, Jing Yang, Yu Lung Lau & Wanling Yang
School of Medicine, Warshel Institute for Computational Biology, The Chinese University of Hong Kong—Shenzhen, Shenzhen, Guangdong, China
Cinderella Xinxin Yang & Yong-Fei Wang

Contributions

Conceptualization: W.Y., X.D.; Methodology: X.D., F.Q.W., C.Z.; Data Curation: X.D., F.Q.W., Y.L., H.S., C.H.S., X.C., X.T.Y.; Investigation: C.X.Y., H.F., Y.F.W.; Writing first draft: X.D.; Writing review and editing: X.D., J.Y., W.Y.; Supervision: Y.L.L., W.Y. All authors have reviewed, edited, and approved the manuscript.

Corresponding author

Correspondence to Wanling Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file (PDF)

Supplemental Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1–11 (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Dang, X., Wang, F.Q., Zhang, C. et al. Identifying genetic and cellular connections and distinctions among 15 autoimmune diseases using an in-silico approach. Commun Med 6, 235 (2026). https://doi.org/10.1038/s43856-026-01487-9

Received: 05 August 2025
Accepted: 18 February 2026
Published: 07 March 2026
Version of record: 21 April 2026
DOI: https://doi.org/10.1038/s43856-026-01487-9
