Introduction

The prevalence of atopic diseases (ADs), including atopic dermatitis (or eczema), hay fever (or allergic rhinitis), and asthma, is increasing globally, affecting approximately 20% of the population worldwide1,2. Compared with the general population, patients with ADs have an increased risk of immune diseases, metabolic syndrome and mental disorders3. These comorbidities associated with ADs can lead to poor health status, a high economic burden and reduced quality of life4,5.

One of the main comorbidities of ADs pertains to gastrointestinal disorders (GIDs) observed from epidemiological studies6,7, including inflammatory bowel disease (IBD)8,9, celiac disease10, gastroesophageal reflux disease (GERD)11, irritable bowel syndrome (IBS)12, and gastrointestinal cancers13,14. For example, the risks of incident Crohn’s disease (CD) in patients with atopic dermatitis, allergic rhinitis, or asthma were reported to be higher than in individuals without those ADs, with hazard ratios (HRs) of 2.02, 1.33, and 1.60, respectively9. Similarly, the HRs of incident ulcerative colitis (UC) in patients with the above ADs were 1.51, 1.32, and 1.29, respectively9. Furthermore, the prevalence of GERD in asthmatic patients ranged from 32% to 80%11,15, and GERD was also associated with an increased risk of eczema (OR 1.21, 95% CI, 1.07–1.37)16. Likewise, IBS has been associated with an increased risk of allergic disease. The risk of allergic diseases in patients with IBS was reported to be 15–99%, higher than in healthy individuals12.

Multiple scenarios may explain the co-occurrence of ADs and GIDs: (1) one disease causes the other, (2) the treatment of the first disease influences the risk of the other, or (3) the diseases share an etiology, including genetic basis, environmental exposures, treatment history, epithelial barrier dysfunction and microbiota alterations10,17. Recent large-scale genome-wide association studies (GWAS) of ADs and GIDs have identified a considerable number of shared disease-associated genomic loci. For example, variants within the genes SMAD3, GSDMB, ORMDL3, and IL1R118,19,20 conferred increased susceptibility to both asthma and IBD. Significant genetic correlations have also been reported for IBS-asthma and GERD-asthma21,22. These studies suggest that an individual’s genetics acts as a shared intrinsic factor in AD-GID comorbidities, providing an important view of the common etiology of two phenotypically different traits.

Despite this progress, a deeper investigation of the genetic relationships between ADs and GIDs is still lacking. More specifically, the genomic regions that are genetically correlated, the genes that are likely to be causal and the relevant cell types are largely unknown. Understanding the shared genetic architecture between ADs and GIDs has important clinical implications: (1) to determine whether one type of disease may represent a risk factor for another23, (2) to unveil shared molecular insights into comorbidities24, and (3) to re-evaluate current drugs for repurposing or to design new drugs based on pleiotropic genes25.

In this study, we used up-to-date GWAS summary statistics for seven AD traits (eczema, allergic rhinitis, asthma, asthma subtypes), three lung function measurements related to asthma, and 11 GIDs selected on the basis of reported comorbidities. We first assessed their genetic correlations at both genome-wide and regional levels. We then hypothesized that (1) shared genetic factors imply a common molecular basis and (2) causal relationships between ADs and GIDs partially explain the comorbidity. Therefore, we integrated large-scale GWAS, sc-RNAseq, and bulk- and sc-eQTL data to prioritize comorbidity-associated cell types and genes. Multiple MR approaches were used to assess the causality between AD-GID pairs. With these approaches, we aimed to comprehensively explore the shared genetic architecture between ADs and GIDs in an attempt to improve the understanding of the molecular basis of these comorbidities.

Results

GERD/IBS and ADs were genetically correlated at genome-wide level

GID is one of the main comorbidities of ADs. We first summarized previous studies that reported evidence of AD-GID comorbidity, which is provided in the Supplementary Materials. In total, seven ADs and disease subtypes, three lung function measurements related to asthma, and 11 GI diseases were selected, and all samples of the selected studies were from European populations (Fig. 1, Table 1, Supplementary Data 1). We subsequently explored the genetic correlations among these 10 AD-related traits and 11 GIDs by linkage disequilibrium score regression (LDSC). In total, we identified 14 genetically correlated AD-GID pairs (FDR < 0.05, Fig. 2A, Supplementary Data 2). Gastroesophageal reflux disease (GERD) was correlated with eight AD-related traits: six correlations were positive, namely with asthma, child-onset asthma (COA), adult-onset asthma (AOA), moderate-to-severe asthma (MtoS asthma), hay fever and allergy (rg from 0.07 to 0.33), and two were negative, namely with FEV1 and FVC (rg = −0.082 and −0.081). Irritable bowel syndrome (IBS) showed correlation patterns with ADs similar to those of GERD, including asthma, AOA, allergy (rg from 0.26 to 0.29) and FVC (rg = −0.067). We also observed correlations of peptic ulcer disease with MtoS asthma (rg = 0.21) and of celiac disease with allergy (rg = 0.24). These findings suggest that IBS and GERD might possess a similar genetic background related to comorbidity with ADs, while correlations across other AD-GID pairs were limited at the genome-wide level.

Fig. 1: Schematic visualization of the study workflow.

The genetic correlations between 10 traits of ADs and 11 GIDs were first determined at both the genome-wide level and the genomic regional level by LDSC and LAVA, respectively. The subsequent analysis was conducted based on two hypotheses. (1) To detect whether the shared genetic architecture between ADs and GIDs converged on common cellular pathways, comorbidity-associated genes were prioritized using SMR in bulk-eQTL data of nine tissues and sc-eQTL data from PBMC. Comorbidity-associated cell types were projected to single-cell RNAseq data from blood, gut, lung and airway tissues. Then the druggability of the prioritized genes was assessed in public drug databases. (2) To investigate the directional causality between AD-GID pairs, which might partially explain the comorbidities, multiple bi-directional MR approaches were incorporated, followed by a variety of sensitivity analyses. AD, atopic disease. GID, gastrointestinal diseases. LDSC, linkage disequilibrium score regression. LAVA, local analysis of [co]variant association. sc-eQTL, single cell expression quantitative trait loci. SMR, summary-data-based Mendelian Randomization. PBMC, peripheral blood mononuclear cells. COA, child-onset asthma. AOA, adult-onset asthma. MtoS asthma, Moderate-to-severe asthma. CD, Crohn’s disease. CRC, colorectal carcinoma. GERD, gastroesophageal reflux disease. UC, ulcerative colitis. IBS, irritable bowel syndrome. Created in BioRender. Hu, S. (2024) https://BioRender.com/q95x753.

Table 1 Overview of summary statistics used in this study
Fig. 2: Global and local genetic correlations between AD-GID.

A Heatmap of 14 pairs of genome-wide significant correlations. # indicates significant signals with FDR < 0.05. The color represents the directionality of the correlation coefficients. B Heatmap of 23 unique genomic regions which were correlated with at least one pair of AD-GID with FDR < 0.05. The numbers indicate the number of genetically correlated loci. AOA adult-onset asthma, CD Crohn’s disease, COA childhood-onset asthma, CRC colorectal carcinoma, IBS irritable bowel syndrome, UC ulcerative colitis.

Correlated genomic regions were associated with previously unrelated diseases

Global genetic correlations capture the average association across the genome. Complementary to the global correlation analysis, we applied Local Analysis of [co]Variant Association (LAVA) to identify genetically correlated loci. In total, 23 unique genomic regions with significant local genetic correlations (rl) were identified to be shared between 28 pairs of AD-GID (FDR < 0.05, Fig. 2B, Supplementary Data 3). Notably, the majority of locally correlated loci (18 out of 23) showed no significant rg from LDSC analysis. For example, seven GIDs (diverticular disease, chronic gastritis, celiac disease, gastric cancer, IBS, colorectal carcinoma (CRC) and Crohn’s disease (CD)) were correlated with measures of lung function (FVC, FEV1 and FEV1/FVC) at ten unique loci. Moreover, CD, ulcerative colitis (UC), celiac disease, diverticular disease and GERD were also associated with asthma or its subtypes at six specific regions, suggesting that widespread genetic correlations between ADs and GIDs were driven by local effects that were not captured by LDSC analysis.

An LD block (18.65–20.06 Mb on chromosome 2) contains the gene NT5C1B, which has been reported in GWAS of diverticular disease, FEV1 and FVC. We observed strong correlations among these trait pairs (diverticular disease vs. FEV1 rl = 0.74 and diverticular disease vs. FVC rl = 0.73). This locus also revealed positive correlations of CRC with FEV1 and FVC (rl = 0.88 and 0.87, respectively), but did not include genome-wide significant SNPs in the CRC GWAS (Supplementary Data 4). Another region (130.57–132.55 Mb) on chromosome 5 was positively correlated between asthma and GERD (rl = 0.61). This region contained genome-wide significant SNPs from the asthma GWAS but not from the GERD GWAS. Pathway enrichment analysis showed that the genes IL3, IL4, IL5, and IL13 within this region were involved in inflammatory responses classically associated with a Th2-driven immune response in asthma (adjusted P = 1.32 × 10−7, Supplementary Data 4, Supplementary Fig. 1). A genomic region (0.76–1.5 Mb) on chromosome 19 presented a significant correlation between CD and COA (rl = −0.65) and included the genes WDR18, GRIN3B, GPX4, TMEM259 and CNN2, which were enriched in functions related to secretory granule lumen (adjusted P = 3.69 × 10−3), cytoplasmic vesicle lumen (adjusted P = 3.89 × 10−3), and serine-type endopeptidase activity (adjusted P = 7.90 × 10−3, Supplementary Data 4, Supplementary Fig. 1). Collectively, the pleiotropic effects of these regions suggest evolutionarily conserved properties, and the regions might therefore contain core genes involved in comorbidity26.

SMR revealed 14 pleiotropic genes on AD-GID pairs

We subsequently aimed to prioritize key pleiotropic genes in the correlated genomic blocks identified by LAVA through integration with eQTL data from multiple tissues using Summary-data-based Mendelian Randomization (SMR) analysis. After multiple testing correction, 70 genes showed putative causality for diseases, among which 14 genes were associated with at least one AD-GID pair (FDRSMR < 0.05 and PHEIDI > 0.01, Fig. 3A, Supplementary Data 5).

Fig. 3: Candidate genes involved in AD-GID pairs predicted from bulk eQTLs demonstrate pleiotropic effects.

A Dot plot showing the significant causal genes identified by SMR, using bulk eQTL analysis from different tissues (FDRSMR < 0.05). Blue diamond-shaped dots represent negative effect sizes estimated from SMR while red dots represent positive relationships. B, C Representative examples of locus zoom plots. The X-axis indicates the genomic positions and the Y-axis indicates the −log10 P-values representing the significance of the GWAS of diseases and eQTLs. SC sigmoid colon, TC transverse colon, EGJ gastro-esophageal junction, EMa esophageal mucosa, Ems esophagus muscularis.

One prominent example concerns PABPC4. Here, higher expression of PABPC4 was associated with a decreased risk of IBS and increased lung function as measured by the FEV1/FVC ratio (Fig. 3B), and this pleiotropic effect was identified in five tissues, including blood, sigmoid colon, transverse colon, gastro-esophageal junction and esophageal mucosa. Another example is SLC22A5, an important transporter in active cellular uptake of carnitine associated with spirometry indices27, which was correlated with an increased risk of both asthma and GERD in esophageal muscularis (Fig. 3C). WDR18 and HMHA1 were causally associated with both CD and COA in blood. Interestingly, other genes in the same LD block as WDR18 (TMEM259, CNN2, ABCA7, POLR2E and MIDN) were specific to COA risk but not to CD, whereas STK11 was specific to CD, suggesting that altered expression of multiple genes in this genomic region might contribute to COA and CD onset (Supplementary Data 5).

Disease-associated genetic factors converged on common cell types across AD-GID

To determine whether ADs and GIDs converge on a shared molecular basis, we performed association analysis between the GWAS signals of ADs and GIDs and the gene expression of a variety of cell types from four relevant tissues. In PBMCs, we observed a significant enrichment of loci associated with ADs (asthma, AOA, COA, allergy) and GIDs (CD, UC) in T cells, including γδ T cells, cytotoxic CD4+ T cells and central memory CD4+ T cells (FDR < 0.05, Fig. 4A). Considering that the disease-relevant tissues differ between ADs and GIDs, we extended the analysis to scRNAseq data from lung, airway, colonic and ileal tissues. Strikingly, asthma, AOA, COA, allergy, hay fever, CD, UC and celiac disease showed consistently stronger enrichment in T cells across airway, colonic and ileal tissues (FDR < 0.05, Fig. 4B–D). For example, these disease-associated genetic factors were enriched in regulatory T cells, NK cells and dendritic cells in lung. In colon and ileum, CD4+ T cells, CD8+ T cells/NK-like cells and T reg cells were predominant. Moreover, B cells, plasma cells and monocytes were identified to be associated with these diseases in airway tissue (FDR < 0.05, Fig. 4E). These findings indicate that the disease-associated genetics of ADs and GIDs captured common immune signatures across distinct tissues, pinpointing the potentially central role of CD4+ T cells in a shared molecular background.

Fig. 4: Disease-associated genetic loci converge on specific cell types across different tissues.

The CELLECT method was used to associate SNPs derived from GWAS summaries with 178 different cell types from four tissues (lung, airway, colon and terminal ileum). Panels A–E demonstrate representative examples of shared cell type (T cells, B cells, NK cells, monocytes) enrichments across asthma, AOA, COA, hay fever, allergy, CD, UC and celiac disease in PBMCs and lung, airway and gut tissues (colon and terminal ileum). Red indicates significance at the FDR < 0.05 level while yellow indicates a more lenient threshold with nominal P < 0.05. Full summaries are presented in Supplementary Figs. 2–8 and Supplementary Data 6.

Disease-specific cell-type enrichment was also observed. For instance, CRC-associated loci converged more prominently on NK cells and memory B cells in blood. Lung function-related traits, including FEV1, FVC and the FEV1/FVC ratio, were enriched in fibroblasts across lung, colon and ileum, suggesting disease- and tissue-specific characteristics. Full summaries are provided in Supplementary Figs. 2–8 and Supplementary Data 6.

Activation of WDR18 and GPX4 in CD4+ T cells predicted an increased risk of CD and asthma

Accumulating evidence indicates that eQTLs can exert different effects across cell types28,29, which might confound bulk-eQTLs. Therefore, we further integrated sc-eQTLs to identify comorbidity-associated genes at the single-cell level. Five genes were identified with evidence for disease causality at single-cell resolution (FDRSMR < 0.05) (Fig. 5A, Supplementary Data 7). Strikingly, WDR18 and GPX4 expressed in blood CD4+ T cells were associated with FEV1, FVC or MtoS asthma. When we adopted a more lenient threshold of PSMR < 0.05, both genes also showed positive associations with the risk of CD, asthma or MtoS asthma. Colocalization analysis further confirmed that the expression of WDR18 potentially shared the same causal variant with CD and asthma in naïve central memory CD4+ T cells (PPH4 > 0.8, Supplementary Data 8), which complemented the discoveries from SMR and scRNAseq data and presented additional evidence for the key role of CD4+ T cells in mediating CD and asthma onset. However, the causality of GPX4 was disease- and cell-type-specific, indicating different roles of GPX4 across CD4+ T cell subtypes.

Fig. 5: Candidate genes involved in both ADs and GIDs predicted from SMR-sc-eQTLs and druggability investigation.

A Illustrative plot of the five SMR-identified candidate genes presenting associations with ADs or GIDs or both, expressed in CD4+ T cells, CD8+ T cells, NK cells or monocytes. Solid lines indicate significance with FDR < 0.05 while dashed lines indicate a more lenient significance with nominal P < 0.05. B Schematic visualization of the overlaps between predicted candidate genes from bulk- and sc-eQTLs and three public drug target databases, including DGIdb, DrugBank and OpenTargets. Druggable candidates were defined as genes that were well-established drug targets.

Druggability of candidate causal genes

An interrogation of three drug databases for the 14 SMR-identified pleiotropic genes revealed that three genes were well-established drug targets, including SLC22A5 (levocarnitine, treatment of carnitine deficiency), GM2A (choline alfoscerate, treatment of neurodegenerative and vascular diseases) and ARHGEF28 (methylphenidate, treatment of attention deficit hyperactivity disorder), indicating potential candidates for drug repurposing (Fig. 5B, Supplementary Data 9). In addition, seven SMR-identified genes that were related to only one type of disease have also been reported as drug targets (Supplementary Data 9). Moreover, the pleiotropic effects of WDR18 and GPX4 suggest that suppressing their expression could be a new therapeutic strategy for AD-GID comorbidities.

Causality between ADs and GIDs

Shared genetic architecture could increase the risk of one phenotype, the onset of which causes another30. This phenomenon can be examined by causal inference. To test this hypothesis, we further applied primary and complementary Mendelian Randomization (MR) approaches to each pair of AD and GID (Fig. 6A). In total, five pairs of AD-GID showed significant unidirectional causality and one pair showed bi-directional causality with FDRIVW < 0.05 (Fig. 6B).

Fig. 6: Causal inference of AD-GID associations using bi-directional MR analysis.

A Schematic workflow of MR analysis (Methods), including IV selection and quality control, primary and complementary MR approaches, and sensitivity analyses. B A total of five unidirectional causal relationships with FDRIVW < 0.05 were identified, along with one significant bi-directional causal relationship between GERD and asthma (FDRIVW < 0.05 in both directions). All results shown passed quality control and sensitivity analyses. NS, non-significant.

When GIDs were treated as exposures and ADs as outcomes, we observed that GERD increased the risk of AOA (OR = 1.18) and was associated with reduced FVC (OR = 0.92). CD decreased the risk of allergy (OR = 0.97). Conversely, when ADs were treated as exposures and GIDs as outcomes, asthma slightly reduced the risk of CD (OR = 0.86). Moreover, allergy was a potential causal factor for IBS (OR = 1.07).

Interestingly, asthma showed a significant causal effect on the risk of GERD (OR = 1.04, 95% CI, 1.02–1.06), while GERD also increased the risk of asthma (OR = 1.18, 95% CI, 1.08–1.29). These findings indicate causal relationships between ADs and GIDs, which might partially explain the comorbidities. Full summaries of IV quality controls, results and sensitivity analyses are provided in Supplementary Data 10, 11.

Discussion

In this study, we identified significant genome-wide correlations of GERD and IBS with ADs. In contrast, the associations of CD, UC, celiac and diverticular diseases with ADs were locally driven by distinct genomic regions. Following integration of GWAS summaries with bulk-eQTL analysis of nine distinct tissues, 14 key genes were prioritized as being involved in comorbidity. Despite the varied genetic correlation patterns, cell type enrichment analysis using scRNA-seq data identified CD4+ T cells as shared etiologic cells between AD and GID across four relevant tissues. We further projected the candidate key genes to the single-cell level and found causal evidence for five genes (CDC42SE2, WDR18, GPX4, GM2A, and AFF4) expressed in CD4+ T cells, CD8+ T cells, NK cells and monocytes. Moreover, three genes were established druggable targets, emphasizing their potential therapeutic amenability. Finally, nine pairs of ADs-GIDs were suggested to present causal relations with one significant bi-directional relationship found between GERD and asthma.

Until now, comorbidity between ADs and GIDs has largely been investigated through epidemiological work. Most of the genome-wide correlations in our study were consistent with epidemiological evidence, e.g., both GERD and IBS were correlated with multiple asthma-related traits11,31. However, we did not identify significant global genetic correlations between IBD and asthma, which frequently co-occur according to epidemiological data8,9. Considering that global genetic correlations only capture the average of the associations across the genome, local genetic correlations may still exist in the absence of global relations32. Indeed, the majority of locally correlated loci were not detected in the global correlation analysis (LDSC). This was exemplified by a locus that was genetically correlated between CD and COA: the genes from this locus were enriched in pathways relating to secretory functions and leukocyte activation, which may form the common genetic basis of CD and COA within this specific region.

Pleiotropic genomic regions might imply conserved genetic properties during evolution and thus contain important genes involved in cross-disease causality26. We identified 14 genes that showed potential causality on both ADs and GIDs on bulk tissue level and further projected these at single-cell level with five genes. WDR18 was causally associated with both CD and COA in CD4+ T cells and CD8+ T cells of PBMCs, and encodes a member of the WD40-repeat protein family, being related to DNA damage checkpoint signaling33. Elevated expression of GPX4 mediated higher risk of asthma, chronic gastritis and CD, and suggested the dysregulation of CD4+ and CD8+ T cells. GPX4 has been shown to restrict cytokine responses of small intestinal epithelial cells (IECs), and mice lacking one allele of Gpx4 in IECs can develop mucosal inflammation, resembling CD34. However, the role of GPX4 in asthma onset needs further investigations. Nevertheless, highly consistent patterns of enriched immune cells, especially for CD4+ T cells, were found across asthma, AOA, COA, allergy, CD, UC and celiac disease. This consistency did not largely differ across tissues, indicating the systematic nature of immune perturbations in these diseases. Taken together, these findings revealed that the genetics of some ADs and GIDs converge on shared cell types, especially for those immune-related diseases, implying common immune responses underlying these diseases. A more comprehensive discussion on candidate genes is provided in the Supplementary discussion.

MR analysis was used to infer whether altered gene expression implies causality for diseases and may thus accelerate the development or repurposing of drugs35. Three genes identified from SMR, including GM2A, SLC22A5 and ARHGEF28, were well-established drug targets for neurodegenerative or metabolic diseases. GM2A is a target of choline alfoscerate, which is used in the treatment of neurodegenerative and vascular diseases. Levocarnitine, targeting SLC22A5, was developed for carnitine deficiency. Methylphenidate, targeting ARHGEF28, is FDA-approved for treating attention deficit hyperactivity disorder (ADHD). The pleiotropic effects of these genes further suggest the feasibility of drug repurposing for asthma, impaired lung function, celiac disease, CD and diverticular disease. Moreover, higher expression of WDR18 and GPX4 increased the risk of CD, COA, asthma or chronic gastritis. WDR18 has been described as putatively involved in the differentiation and self-renewal of intestinal stem cells36. Recently, GPX4 was identified as a central hub gene among autophagy- and ferroptosis-related genes that were associated with postoperative recurrence in patients with CD who underwent ileocecal resection37. Colocalization analysis further confirmed that these genes shared causal variants with AD-GID pairs or individual diseases, indicating that inhibiting these genes might be a novel therapeutic strategy for CD and asthma. However, further studies investigating the roles of these genes across different cell types, followed by cell-specific drug targeting approaches, are warranted.

Shared genetic factors have been shown to be involved in cross-disease causality, which is related to comorbidities30,38. A bi-directional causal effect between GERD and asthma was observed in our analysis, which was consistent with a study by Ahn et al.16, indicating that the comorbidity of the two diseases was partially due to mutual causality. In addition, we identified an inverse association between the risk of asthma and CD, which was also reported by Freuer et al.39. The agreement with these previously reported findings supports the robustness of our study. By incorporating a more comprehensive set of phenotypes for respiratory traits, we also uncovered novel leads of causality between ADs and GIDs. For example, GERD could be an important predictor of reduced spirometry indices and a high risk of AOA, and allergy showed a positive causal effect on IBS, which calls for healthcare providers to monitor whether AD develops in patients who have manifested GIDs or vice versa.

Other underlying mechanisms may also explain the link between ADs and GIDs in addition to shared genetics. Both the airway and GI tracts arise from the foregut40,41, are located in close proximity, and share similar physiological structures of epithelial tissues. Moreover, it has been reported that early life environmental exposures are associated with a predisposition towards atopic diseases and with changes in the intestinal microbiota42,43, and studies have also suggested a cross-talk between the intestinal and airway microbiota compartments44,45, which may explain the mechanism of the airway-gut axis.

The main strengths of this study include (1) novel genetic evidence that may explain comorbidities of ADs and GIDs, with distinct genetic correlation patterns identified between AD-GID pairs, (2) the identification of potentially shared etiologic cell types and genes by integration of a large number of bulk- and sc-RNAseq datasets for drug repurposing, and (3) the observation that disease-to-disease causality might participate in comorbidities. On the other hand, some limitations warrant recognition. (1) This study is restricted to European populations, limiting the generalizability of our results to other ethnicities. (2) Although we integrated the largest publicly available eQTL datasets, more layers of omics data such as proteins and metabolites could have deepened the understanding of the functional consequences of the observed key genes. (3) The genetic correlations and MR-based causalities reported in this study depend on the sample sizes of the original datasets, which might affect our observations. Therefore, follow-up studies in larger datasets with greater statistical power are required to validate the robustness of these findings.

In conclusion, in this study we constructed a comprehensive atlas of the shared genetic architecture between ADs and GIDs, showing that the correlated genomic loci converged on dysregulation of CD4+ T cells. More importantly, we uncovered underlying pleiotropic genes which are relevant to current drug re-evaluation efforts and the development of novel therapeutic targets. The potential causalities between a group of AD-GID pairs may inform clinical prevention strategies to reduce the incidence of these comorbidities.

Methods

GWAS datasets

The criteria for disease dataset selection included (1) the presence of reported comorbidities of AD-GID14,46,47,48,49,50 (Supplementary Materials) and (2) public data availability. In total, seven atopic diseases (general asthma, hay fever, eczema, allergy) and disease subtypes (childhood-onset asthma (COA), adult-onset asthma (AOA), moderate to severe asthma (MtoS asthma)), three lung function measurements (forced expiratory volume in one second (FEV1), forced vital capacity (FVC) and FEV1/FVC) related to asthma, and 11 GI diseases (Crohn’s disease (CD), ulcerative colitis (UC), colorectal cancer (CRC), celiac disease, chronic gastritis, diverticular disease, esophageal cancer, gastric cancer, gastroesophageal reflux disease (GERD), irritable bowel syndrome (IBS) and peptic ulcer) were selected, and all samples of the selected studies were from European populations as previously described in the datasets (Table 1). The full GWAS summaries were obtained from the MRC IEU OpenGWAS database (https://gwas.mrcieu.ac.uk/) and the GWAS Catalog (https://www.ebi.ac.uk/). The data were quality controlled by (1) removal of SNPs with MAF < 1% and (2) allele harmonization using the R package SNPlocs.Hsapiens.dbSNP144.GRCh37 (v. 0.99.20).
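The MAF filter in step (1) can be illustrated with a minimal sketch. It assumes that each summary-statistics record carries an effect allele frequency; the record layout, the SNP identifiers and the function name `passes_maf_filter` are hypothetical and are not taken from the study's pipeline.

```python
def passes_maf_filter(effect_allele_freq, threshold=0.01):
    """Keep a SNP when its minor allele frequency is at least `threshold`.

    The minor allele frequency is the smaller of the effect allele
    frequency and its complement.
    """
    maf = min(effect_allele_freq, 1.0 - effect_allele_freq)
    return maf >= threshold


# Hypothetical records: (SNP identifier, effect allele frequency).
records = [("rs_a", 0.005), ("rs_b", 0.30), ("rs_c", 0.995), ("rs_d", 0.01)]

# SNPs with MAF < 1% are removed; MAF of exactly 1% is retained.
kept = [rsid for rsid, eaf in records if passes_maf_filter(eaf)]
```

Taking the minimum of the frequency and its complement means the filter does not depend on which allele was coded as the effect allele.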

Global and local genetic correlations

We used linkage disequilibrium score regression (LDSC) (https://github.com/bulik/ldsc) to estimate the SNP-based heritability (h2) of each of the 21 traits. Known LD scores from the 1000 Genomes Project European (1000 G) data were used to estimate the global genetic correlation (rg) between each AD-GID pair, which represents shared genetic factors that are not influenced by environmental confounders. In addition, the intercepts and standard errors were obtained from LDSC as an estimate of sample overlap. Significance was determined by the Benjamini-Hochberg (BH) procedure, considering the total number of tested disease pairs.
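For readers unfamiliar with the BH procedure, the following is a minimal sketch of how BH-adjusted P values can be computed across a set of tested pairs. The input P values are invented for illustration and the function name is hypothetical; the study itself does not specify which implementation was used.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four trait pairs (illustrative only).
example_p = [0.001, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)

# A pair is called significant when its adjusted value is below 0.05.
significant = [q < 0.05 for q in example_q]
```

Note that the adjustment depends on the total number of tests, which is why the number of tested pairs is stated alongside each FDR threshold.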

Local Analysis of [co]Variant Association (LAVA) (https://github.com/josefin-werme/LAVA) was used to test local genetic correlations of 2495 independent genomic loci as defined by Werme et al.32. For each pair of traits, we selected only the loci significantly associated with both traits in univariate analysis. The final significance was determined by FDR < 0.05 (Benjamini-Hochberg procedure, considering the total number of trait-trait pair-wise tests).

Summary-data-based Mendelian randomization analysis with bulk- and sc-eQTLs

To prioritize pleiotropic genes within genetically correlated genomic regions, we obtained expression quantitative trait loci (eQTL) data, which assess genetic effects on the transcriptome, including bulk eQTLs of blood from the eQTLGen datasets (n = 31,864) and bulk eQTLs of lung, esophagus-gastroesophageal junction, esophagus-mucosa, esophageal muscularis, stomach, ileum, transverse colon and sigmoid colon from the Genotype-Tissue Expression (GTEx) project (n = 860). In addition, sc-eQTLs (peripheral blood mononuclear cells, PBMCs) from the Onek1k project51 (n = 982, cell count = 1,267,758) were applied to project the candidate genes to the single-cell level.

Summary-based Mendelian randomization (SMR) multi-tools52 was used to detect whether the effects of SNPs on the phenotype were mediated by gene expression. SNPs were treated as instruments, with gene expression as exposures and diseases as outcomes. The 1000 G reference was used to calculate LD. A two-step SMR was conducted: in step 1, an SMR test was performed on the eQTL and one trait of an AD-GID pair; in step 2, an SMR test was performed on the other trait53. The comorbidity-associated candidates were defined as genes that (1) were suggestively genome-wide significant (P < 1 × 10−5) in both eQTL and GWAS results54,55, (2) had FDR < 0.05 (BH method, considering the total number of trait-gene pairs) in at least one pair of AD-GID, and (3) showed no heterogeneity in the heterogeneity in dependent instruments (HEIDI) test at P > 0.05.
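As an illustration of the logic behind a single SMR test, the sketch below computes the commonly described approximate SMR statistic for one gene from the summary statistics of its top cis-eQTL: the expression-to-trait effect is the ratio of the GWAS and eQTL effects, and the test statistic combines the two z-scores and is compared with a chi-squared distribution with one degree of freedom. This is a simplified sketch, not the SMR software itself; it omits the HEIDI test, and the function name and input values are hypothetical.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Approximate SMR test for one gene using its top cis-eQTL.

    b_zx, se_zx: SNP effect on gene expression (eQTL) and standard error.
    b_zy, se_zy: SNP effect on the trait (GWAS) and standard error.
    Returns the expression-to-trait effect, the test statistic and P value.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # estimated effect of expression on the trait
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_smr


# Hypothetical summary statistics (illustrative only).
b_xy, t_smr, p_smr = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
```

In a two-step design such as the one described above, a test of this kind would be run once per trait of an AD-GID pair, and the resulting P values would then be corrected for multiple testing.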

As a complementary analysis to SMR, we used colocalization analysis (coloc R package with default parameters)56 to test whether gene expression and disease shared the same causal variants. This analysis further helped to (1) assess the validity of the instrumental variable assumption and (2) prioritize the most likely therapeutic targets within the same locus causally associated with disease57.

Cell type enrichment

To explore the etiologic cell types underlying the genetic factors, we used CELL-type Expression-specific integration for Complex Traits (CELLECT) analysis to associate the GWAS signals with sc-RNAseq data from four tissues, including PBMC (Onek1k project)51, lung tissue58, airway tissue59, and gut60 (terminal ileum and colon). The MHC region was excluded and only healthy individuals were kept in this analysis. In total, 178 cell types from four tissues were included in this analysis, and the significance was determined with BH-approach considering the total number of the trait-cell type pairs at FDR < 0.05.

Pathway enrichment

We used g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) to perform functional enrichment analyses (WikiPathways, GO and KEGG) and reported the significant pathways (BH-adjusted P < 0.05).

Druggability of candidate causal genes

To assess whether the SMR-selected bulk and sc-eQTL genes were potential drug targets, we overlapped the candidate genes with DGIdb, DrugBank and OpenTargets (https://www.dgidb.org/, https://www.drugbank.ca/, www.opentargets.org).

Trait pair-wise bi-directional MR analysis

Shared genetic architecture could increase the risk of one phenotype, the onset of which causes another30. This possibility can be examined by causal inference. To test this hypothesis, we performed MR analysis for each AD-GID pair. The instrumental variables (IVs) were selected and quality controlled according to stringent criteria61, as follows: (1) independent SNPs were selected from the GWAS summary statistics of the exposure using the clump function in PLINK 2.0 (https://www.cog-genomics.org/plink/2.0/), with the European population of the 1000 G dataset as the LD reference. The parameters of LD pruning were an r2 threshold of 0.001, a window size of 1 Mb, minor allele frequency (MAF) > 0.05 and a significance threshold of P < 5e−08 in each GWAS; (2) a heterogeneity test was performed to detect IV outliers using the ivw_radial (alpha = 0.05, weights = 1, tol = 0.0001) and egger_radial (alpha = 0.05, weights = 1) functions in the RadialMR R package (https://github.com/WSpiller/RadialMR). IVs with a nominally significant P value < 0.05 in either of the two approaches were defined as outliers and filtered out; (3) IVs that explained more variance in the outcome than in the exposure were also excluded (nominal P < 0.05 with Steiger’s test); (4) finally, we calculated F-statistics and included only IVs with an F-statistic > 10 in the analysis. All IVs were harmonized with the SNPs from the outcome trait (Fig. 6A).
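The threshold-based parts of this IV selection can be sketched in Python as follows. The sketch assumes a common per-SNP approximation, F = (beta / se)^2, because the exact F-statistic formula is not stated here; the function name select_instruments, the dictionary keys and the SNP records are all hypothetical. LD clumping, RadialMR outlier removal and Steiger filtering are not reproduced.

```python
def select_instruments(snps, p_threshold=5e-8, maf_threshold=0.05, f_threshold=10.0):
    """Keep SNPs passing the P value, MAF and F-statistic thresholds."""
    kept = []
    for snp in snps:
        # Assumed approximation of the per-SNP F-statistic.
        f_stat = (snp["beta"] / snp["se"]) ** 2
        if (snp["p"] < p_threshold
                and snp["maf"] > maf_threshold
                and f_stat > f_threshold):
            kept.append({**snp, "f_stat": f_stat})
    return kept


# Hypothetical exposure summary statistics.
candidate_snps = [
    {"id": "rs_a", "beta": 0.05, "se": 0.005, "p": 1e-20, "maf": 0.30},
    {"id": "rs_b", "beta": 0.02, "se": 0.010, "p": 0.045, "maf": 0.20},  # fails P
    {"id": "rs_c", "beta": 0.08, "se": 0.010, "p": 1e-15, "maf": 0.01},  # fails MAF
]
instruments = select_instruments(candidate_snps)
```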

Three two-sample MR approaches were used to explore the potential causal relationships across AD-GID pairs: the primary inverse variance weighting (IVW) analysis and two complementary approaches, MR-Egger and weighted median, which relax certain assumptions of MR. IVW treated each valid SNP as independent, estimated a Wald ratio for each SNP and then meta-analyzed the ratios under a fixed-effects model. The weighted median method used the weighted median rather than the weighted mean of the SNP ratios, and can identify true causality if ≤50% of the weights come from invalid SNPs. MR-Egger further allowed for the presence of directional (i.e. non-zero mean) uncorrelated pleiotropy and added an intercept to the IVW regression to exclude confounding from such pleiotropy. All these methods were implemented with the mr_ivw, mr_weighted_median and mr_egger_regression functions in the TwoSampleMR R package (v0.4.26).
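The fixed-effects IVW estimate described above can be illustrated with a short Python sketch. It is not the TwoSampleMR implementation; it assumes first-order standard errors for the Wald ratios (se_y / |b_x|), and the function name ivw_fixed_effects and the effect sizes are hypothetical.

```python
import math


def ivw_fixed_effects(b_exposure, b_outcome, se_outcome):
    """Fixed-effects inverse variance weighted meta-analysis of Wald ratios."""
    ratios, weights = [], []
    for bx, by, se_y in zip(b_exposure, b_outcome, se_outcome):
        ratio = by / bx              # Wald ratio for this SNP
        se_ratio = se_y / abs(bx)    # first-order standard error (assumption)
        ratios.append(ratio)
        weights.append(1.0 / se_ratio ** 2)
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


# Hypothetical SNP effects on the exposure and outcome.
beta_ivw, se_ivw = ivw_fixed_effects(
    b_exposure=[0.1, 0.2, 0.4],
    b_outcome=[0.05, 0.1, 0.2],
    se_outcome=[0.01, 0.01, 0.01],
)
```

SNPs with stronger exposure effects receive larger weights because their Wald ratios are estimated more precisely.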

The MR results were further verified by sensitivity analyses (Fig. 6A). First, leave-one-out analysis was used to examine whether the causal association was driven by a single SNP. Second, MR-Egger regression was conducted to test for potential bias from directional pleiotropy, as represented by the intercept. Third, the MR-PRESSO approach was used to test for horizontal pleiotropy62. Fourth, we performed Cochran's Q test for heterogeneity.
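As an illustration of the heterogeneity check, Cochran's Q can be computed from the per-SNP Wald ratios and the IVW estimate. The Python sketch below makes the same assumptions as a simple fixed-effects IVW (first-order ratio standard errors); the function name cochran_q and the inputs are hypothetical, and the P value would be obtained by comparing Q with a chi-square distribution on the returned degrees of freedom.

```python
def cochran_q(b_exposure, b_outcome, se_outcome):
    """Cochran's Q statistic around the fixed-effects IVW estimate."""
    ratios = [by / bx for bx, by in zip(b_exposure, b_outcome)]
    # Inverse variance weights from first-order ratio standard errors (assumption).
    weights = [(bx / se_y) ** 2 for bx, se_y in zip(b_exposure, se_outcome)]
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Weighted squared deviations of each ratio from the pooled estimate.
    q_stat = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    degrees_of_freedom = len(ratios) - 1
    return q_stat, degrees_of_freedom


# Two hypothetical SNPs with different Wald ratios (0.5 and 0.7).
q_stat, q_df = cochran_q(
    b_exposure=[0.1, 0.1],
    b_outcome=[0.05, 0.07],
    se_outcome=[0.01, 0.01],
)
```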

We applied the MR analysis bi-directionally, from ADs to GIDs and from GIDs to ADs. Significance was defined as (1) FDR (BH approach) < 0.05 for the primary IVW method and nominal P < 0.05 for the MR-Egger or weighted median method, and (2) passing all sensitivity checks with nominal P > 0.05.

To control for potential confounders associated with both GIDs and ADs, we further adopted a multivariable MR (MVMR) approach, adjusting for genetically determined BMI and smoking63. The GWAS summary statistics of BMI and smoking were obtained from the studies of Pulit et al.64 and Karlsson Linnér et al.65. Causal relationships with MVMR P value < 0.05 were finally reported.

Statistics and reproducibility

Statistical significance was determined by FDR correction for multiple testing (FDR < 0.05). All data used in this study are publicly available, and the analysis code has been provided to ensure reproducibility.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.