Abstract
Many disease-associated variants are thought to be regulatory but are not present in existing catalogues of expression quantitative trait loci (eQTL). We hypothesise that these variants may regulate expression in specific biological contexts, such as stimulated immune cells. Here, we used human iPSC-derived macrophages to map eQTLs across 24 cellular conditions. We found that 76% of eQTLs detected in at least one stimulated condition were also found in naive cells. The percentage of response eQTLs (reQTLs) varied widely across conditions (3.7% − 28.4%), with reQTLs specific to a single condition being rare (1.11%). Despite their relative rarity, reQTLs were overrepresented among disease-colocalizing eQTLs. We nominated an additional 21.7% of disease effector genes at GWAS loci via colocalization of reQTLs, with 38.6% of these not found in the Genotype–Tissue Expression (GTEx) catalogue. Our study highlights the diversity of genetic effects on expression and demonstrates how condition-specific regulatory variation can enhance our understanding of common disease risk alleles.
Introduction
Disease-associated variants are often located in noncoding regions of the genome1,2, making biological interpretation of their function challenging. If these variants alter gene expression, this should, in principle, be detectable by mapping expression quantitative trait loci (eQTL). eQTLs have now been discovered across a broad range of human tissues and cell types3,4, with at least one eQTL known for every human gene. Despite this, a surprisingly large fraction of disease associations do not share a causal variant with a known eQTL5,6,7,8. For example, the most recent publication by the Genotype-Tissue Expression (GTEx) consortium, which mapped eQTLs across 49 tissues ascertained from 838 individuals, reported that approximately 43% of disease associations colocalize with a detectable eQTL3.
One potential explanation for missing disease-associated eQTLs is that many regulatory variants may function in highly specific cellular contexts, such as activated immune cells. To address this, multiple studies have now been performed using bulk RNA-sequencing of activated cell conditions9,10,11,12,13,14,15,16,17 and more recently using single cell approaches18,19. However, because primary cell material is frequently limited, eQTL mapping has typically been restricted to a relatively small set of environmental contexts following perturbation with a limited set of classical stimuli.
Here, we differentiated induced pluripotent stem cells (iPSCs) from 209 individuals to macrophages and mapped eQTLs across twelve different cellular conditions at two timepoints post stimulation. We used this dataset (MacroMap, https://www.macromapqtl.org.uk/) to explore the properties of response eQTLs (reQTLs) and their relevance for disease.
Results
Expression profiling of macrophage innate immune responses
We selected 217 iPSC lines derived from unrelated healthy donors by the HipSci consortium20, and differentiated them to macrophages using a previously described protocol15 with minor modifications (see Methods and Supplementary note).
During the differentiation process, we collected macrophage precursor cells at day 0 (labelled “Prec_D0”) and day 2 (“Prec_D2”). We used a low-input RNA-seq protocol to profile the transcriptomes of these cells, as well as those of unstimulated iPSC-derived macrophages after 6 and 24 h. We next perturbed the cells with a panel of ten different stimuli and measured gene expression 6 and 24 h after stimulation using the same protocol (Fig. 1a). Our stimulation panel (Supplementary Data 1) included stimuli that trigger pro- and anti-inflammatory pathways in macrophages (IFNβ, IFNγ, interleukin-4/IL4) and those that induce a response to viral infection (Resiquimod/R848, Poly I:C/PIC). We also included stimuli and combinations of stimuli that mimic the response to bacterial infection (smooth lipopolysaccharide/sLPS, Pam3CSK4/P3C, CD40 ligand + IFNγ + sLPS/CIL, interleukin-10 + sLPS/LIL10). Furthermore, we stimulated macrophages with myelin basic protein (MBP) to mimic the innate immune response of microglia (brain-resident macrophages) to stimuli. For simplicity throughout, we include day 0 and day 2 precursor cells when we refer to “stimulated conditions”, except where otherwise stated. We profiled the transcriptome of 5208 samples and after quality control (“Methods”) retained 4698 unique RNA-seq libraries from 209 unique iPSC lines (Supplementary Fig. 1).
a Overview of the experimental workflow. Isolation and stimulation of naive macrophages and measurement of gene expression via RNA-seq at two post-stimulation time points (6 h and 24 h) for multiple stimuli. UMAP representation of 4698 RNA-seq libraries after quality control (“Methods”) coloured by different stimulation conditions (b) and time point (c).
We first explored which technical and biological confounders affected gene expression using variance component analysis (Supplementary Fig. 1d) and identified the combined factors of stimulation type and differentiation time as the second most important driver of the total expression variation (29.3%), after the library preparation method (39.9%). Notably, the variation attributed to the library preparation method is primarily due to the choice of medium used during myeloid precursor formation (Supplementary Fig. 1e).
A UMAP projection of the gene expression data showed that samples clustered by stimulation and time point (Fig. 1b, c). To better quantify the structure present in the gene expression data we estimated the pairwise Pearson’s correlation (r) between conditions (mean TPM values per gene across all individuals for a given condition) (Supplementary Fig. 2a). For example, the gene expression profile of precursor cells at day 2 clustered closely with samples from naive cells at the 24 h time point (Pearson r = 0.98). Likewise, gene expression in macrophages stimulated with PIC was highly correlated with gene expression in IFNβ at both time points, highlighting an expected overlap in signalling because PIC increases IFNβ expression21.
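The between-condition comparison described above can be illustrated with a minimal sketch. It assumes that each condition is summarised as a vector of per-gene mean TPM values and that conditions are then compared with Pearson's r. The function names and the toy TPM values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def condition_means(tpm_by_sample):
    """Mean TPM per gene across samples (one list of gene TPMs per sample)."""
    n = len(tpm_by_sample)
    return [sum(column) / n for column in zip(*tpm_by_sample)]


# Hypothetical toy data: two samples per condition, four genes each.
prec_d2 = [[10.0, 0.5, 3.0, 8.0], [12.0, 0.7, 2.0, 9.0]]
ctrl_24 = [[11.0, 0.4, 2.5, 8.5], [13.0, 0.6, 3.5, 9.5]]

r = pearson_r(condition_means(prec_d2), condition_means(ctrl_24))
print(f"Pearson r between conditions: {r:.3f}")
```

Averaging within a condition before correlating removes donor-level variation, so r reflects differences between conditions rather than between individuals.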
We next identified differentially expressed genes (DEGs) between naive and stimulated conditions using DESeq222 (“Methods”). We found a median of 2306 DEGs (false discovery rate (FDR) = 5%, fold change ≥2) between naive and stimulated cells across all conditions (Supplementary Data 2–4), although the number varied widely between conditions (488–5427 DEGs) (Supplementary Fig. 2b). These DEGs were enriched in gene ontology (GO) terms (Supplementary Data 5 and 6) and Reactome pathways23 (Supplementary Data 7 and 8) corresponding to relevant immunological pathways. For example, “response to bacterium” (GO:0009617) was the most enriched GO term in DEGs 6 h after sLPS stimulation, with TNF upregulated 13.75-fold24. Likewise, “response to virus” (GO:0009615) was the GO term most enriched with genes differentially expressed in cells 6 h after stimulation with IFNβ, with GBP1 (Guanylate Binding Protein 1) upregulated 256-fold25,26,27. This demonstrates that iPSC-derived macrophages can faithfully recapitulate the known biological pathways activated by different stimuli.
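As an illustration of the DEG thresholds quoted above, the following sketch filters a table of per-gene results by adjusted p-value and fold change. It assumes that "fold change ≥2" applies in both directions (|log2 fold change| ≥ 1) and that the 5% FDR cut-off is inclusive; the source specifies neither detail. Apart from the 13.75-fold TNF induction reported in the text, all gene names and values are hypothetical.

```python
from math import log2


def select_degs(results, fdr=0.05, min_fold_change=2.0):
    """Return (gene, direction) pairs passing the FDR and fold-change cut-offs.

    results: iterable of (gene, log2_fold_change, adjusted_p) tuples.
    adjusted_p may be None for genes without an adjusted p-value.
    """
    threshold = log2(min_fold_change)
    degs = []
    for gene, log2_fc, padj in results:
        if padj is None:
            continue  # no adjusted p-value available, so the gene cannot be called
        if padj <= fdr and abs(log2_fc) >= threshold:
            degs.append((gene, "up" if log2_fc > 0 else "down"))
    return degs


# Toy table: only the TNF fold change comes from the text; the rest is made up.
toy_results = [
    ("TNF", log2(13.75), 1e-30),
    ("GENE_A", 0.4, 1e-5),    # significant but below the fold-change cut-off
    ("GENE_B", -1.5, 0.01),   # down-regulated DEG
    ("GENE_C", 2.0, 0.2),     # large change but not significant
    ("GENE_D", 3.0, None),    # no adjusted p-value
]
degs = select_degs(toy_results)
print(degs)
```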
Genetic regulatory effects across different stimulation conditions
To investigate genetic regulation of gene expression, we mapped expression quantitative trait loci (eQTLs) across all 24 conditions (“Methods”). We mapped cis-eQTLs within ±1 Mbp of the transcription start site (TSS) of each gene at a false discovery rate (FDR) of 5%. We found between 1781 and 3735 eGenes (genes with at least one cis-SNP significantly associated with expression) per condition (Supplementary Fig. 3a), with an enrichment of eQTL lead SNPs (eSNPs) in close proximity to promoters (p = 1.2 × 10−124, mean log odds ratio = 2.29) (Supplementary Fig. 3b). We performed conditional eQTL mapping to identify additional cis-eQTLs with independent effects on expression (“Methods”) and found that 3.3%–8% of eGenes in each of our 24 cell conditions, and 18% of all eGenes (1817 eGenes), had more than one eSNP (Supplementary Fig. 4a). Secondary and tertiary eSNPs (conditionally independent eQTLs) were further away from the TSS (mean distance 206 and 233 kb, respectively) than the primary signals (mean distance 94 kb; Wilcoxon p = 1.1 × 10−215 and p = 5.7 × 10−16, respectively) (Supplementary Fig. 4b).
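The cis-window definition can be sketched as a simple distance filter. This is an illustrative sketch only: it assumes an inclusive ±1 Mbp window measured from a single TSS coordinate, and the variant identifiers and positions are hypothetical.

```python
CIS_WINDOW_BP = 1_000_000  # ±1 Mbp around the TSS, as stated in the text


def cis_variants(tss, variants, window=CIS_WINDOW_BP):
    """Return IDs of variants on the same chromosome within `window` bp of the TSS.

    tss: (chromosome, position); variants: iterable of (variant_id, chromosome, position).
    """
    chrom, tss_pos = tss
    return [
        variant_id
        for variant_id, variant_chrom, pos in variants
        if variant_chrom == chrom and abs(pos - tss_pos) <= window
    ]


# Hypothetical gene TSS and variants.
gene_tss = ("chr20", 45_890_000)
toy_variants = [
    ("var_a", "chr20", 45_957_240),  # about 67 kb away: tested
    ("var_b", "chr20", 47_000_000),  # more than 1 Mbp away: skipped
    ("var_c", "chr1", 45_890_100),   # different chromosome: skipped
    ("var_d", "chr20", 44_890_000),  # exactly 1 Mbp away: tested (inclusive window)
]
tested = cis_variants(gene_tss, toy_variants)
print(tested)
```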
Per condition, our eQTL detection rate (22.4% of tested genes, mean over all conditions) was lower than that of GTEx tissues of similar sample size (35% in GTEx tissues of between 180 and 210 individuals). One likely explanation for our lower detection rate is that we studied a single cell type, while most GTEx eQTL studies are derived from complex tissues. Consistent with this, GTEx eQTLs in lymphoblastoid cell lines (LCLs) had an equivalent detection rate to our study (~22%). Summing over all cell conditions we identified 10,170 unique eGenes (72.4% of expressed genes) (Fig. 2a). This is higher than expected given our study sample size (mean across all conditions n = 202), even compared to studies of complex tissues. For example, the numbers of eGenes discovered in GTEx liver (n = 208) and brain cortex (n = 205) were 4415 and 7108, equivalent to 25.6% and 37.5% of expressed genes, respectively. Our overall eQTL detection rate also exceeds that observed for individual cell types studied in GTEx. For example, we detected a similar proportion of eQTLs to that found in GTEx cultured fibroblasts (12,280 eGenes, 73.2% of expressed genes), a study with over double our sample size (n = 483). Although studying many conditions likely revealed additional eQTLs, the large number of eQTLs we detected is likely also driven by the very high degree of biological replication in our study. We note that even when we compared our control conditions at the 6 and 24 h time points, we detected an additional 774 eGenes (Supplementary Fig. 4c). Thus, generating additional gene expression data from the same individuals increased our power to detect eQTLs relative to other studies of similar sample size.
a Fraction of expressed genes (protein coding and lincRNAs) that are eGenes (genes with at least one eQTL) in our study (red diamond) and in different GTEx tissues (circles) as a function of the sample size (mean sample size across all conditions in our study). GTEx studies of single cell types (cultured fibroblast and lymphocytes) and tissues with similar sample size to our study (Brain-cortex and Liver) are highlighted. b Number and fraction of reGenes (pink) and the total number of eGenes (green) per stimulation condition (left hand panel) and number and fraction of reGenes that were differentially expressed (dark purple, up-regulated, dark pink, down-regulated) and not differentially expressed (pink) (right hand panel). Examples of genes where reQTL was detected following stimulation and genes were significantly up- (c), downregulated (d) or where there was no significant change in expression (e) in the stimulated condition (sLPS at 6 h, n = 191 vs. Ctrl at 6 h, n = 179). Data are presented as individual points, jittered for visibility. Box plots represent the median (centre line), the first and third quartiles (bounds of the box), and the whiskers, which extend to the smallest and largest values within 1.5x the interquartile range from the bounds of the box. Data points outside this range are plotted as outliers. Shaded areas represent linear regression lines fitted to the data, with 95% confidence intervals.
Stimulation specific genetic regulation of gene expression in immune response
To better establish which cis-eQTLs were truly restricted to stimulated cells, we used mashr28, which compares eQTL effect size estimates between conditions. Using the “common baseline” mode (“Methods”), mashr estimates the extent to which the eQTL effect size in each stimulated condition deviates from a baseline condition, here defined as the Ctrl_24 condition. Mashr produces a local false sign rate (lfsr) that measures the confidence in the direction of each genetic effect compared to the baseline effect29.
We defined a response eQTL (reQTL) as a significant difference (lfsr < 0.05) in genetic effect between the baseline condition and at least one stimulated condition. The number of reQTLs we detected varied widely between conditions, from 3.7% of all eQTLs in precursor cells at day 2 (Prec_D2) to 28.4% of all eQTLs in cells stimulated with CIL at the 6 h time point (Fig. 2b). Across all conditions, 23.4% (2378) of eGenes had a reQTL in at least one condition, with the majority of these (21.9%, 2228) having a larger absolute effect following stimulation than in the naive condition. Approximately 9% (159) of conditionally independent eQTLs were classified as reQTLs. Of the genes with a conditionally independent reQTL, 89% (142) also had a primary reQTL, an almost 4-fold enrichment (Fisher’s exact test p = 3 × 10−36, OR = 12.93, 95% CI 7.7–23.05) (Supplementary Fig. 5b).
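The reQTL definition above reduces to a threshold on per-condition lfsr values. The sketch below assumes that the lfsr values for the deviation from the baseline condition have already been computed (for example by mashr) and exported as a nested mapping. The data layout, the eQTL identifiers, the condition labels and the numbers are all hypothetical.

```python
LFSR_THRESHOLD = 0.05  # significance cut-off stated in the text


def call_reqtls(lfsr_by_eqtl, threshold=LFSR_THRESHOLD):
    """Map each reQTL to the stimulated conditions where it differs from baseline.

    lfsr_by_eqtl: {eqtl_id: {condition: lfsr of the deviation from baseline}}.
    eQTLs with no condition below the threshold are omitted.
    """
    calls = {}
    for eqtl_id, lfsr_by_condition in lfsr_by_eqtl.items():
        hits = sorted(
            condition
            for condition, lfsr in lfsr_by_condition.items()
            if lfsr < threshold
        )
        if hits:
            calls[eqtl_id] = hits
    return calls


# Hypothetical lfsr values for three eQTLs in three stimulated conditions.
toy_lfsr = {
    "GENE_A:var_1": {"sLPS_6": 0.01, "CIL_6": 0.002, "IL4_24": 0.40},
    "GENE_B:var_2": {"sLPS_6": 0.30, "CIL_6": 0.20, "IL4_24": 0.60},
    "GENE_C:var_3": {"sLPS_6": 0.70, "CIL_6": 0.04, "IL4_24": 0.05},
}
reqtls = call_reqtls(toy_lfsr)
print(reqtls)
```

In this toy example GENE_B:var_2 is an eQTL without a response component, and the lfsr of exactly 0.05 for GENE_C:var_3 in IL4_24 does not pass the strict cut-off.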
Our ability to detect eQTLs for lowly expressed genes is limited. Many of our reQTLs may therefore have similar effect sizes in both the naive and stimulated conditions, but these effects can only be detected as different once gene expression increases in response to stimulation. We found that between 5.8% and 48.4% of reQTL genes (31.2% on average across conditions) were differentially expressed (FDR 5%, fold change ≥2; example signals in Fig. 2c, d), the majority of which (mean across conditions 75.8%) were up-regulated (Fig. 2c) compared to the naive condition. Nonetheless, the majority of reQTLs we identified (51.6%–94.2%) were in genes where we were unable to detect a substantial change in expression between the naive and stimulated conditions based on our chosen criteria (Fig. 2e).
Next, we used mashr to ask how widely reQTLs were shared between stimulated conditions. Very few reQTLs (1.1%) were specific to a single condition (condition-specific reQTLs), whereas 90.5% were shared across five or more conditions and 49% were detected in at least half of the conditions (Fig. 3a, b). This suggests that the majority of the reQTLs we identified could have been detected with a smaller set of stimulated states.
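The sharing statistics quoted above can be computed from a list of the conditions in which each reQTL was called. The sketch below is a minimal illustration that assumes 22 stimulated conditions (the number given in the Fig. 3 legend). The input layout and the toy calls are hypothetical.

```python
def sharing_summary(conditions_per_reqtl, n_conditions=22):
    """Summarise how widely reQTLs are shared across stimulated conditions.

    conditions_per_reqtl: {reqtl_id: iterable of conditions where it was called}.
    Returns the fraction of reQTLs in each sharing category.
    """
    counts = [len(set(conditions)) for conditions in conditions_per_reqtl.values()]
    total = len(counts)
    if total == 0:
        raise ValueError("no reQTLs supplied")
    return {
        "single_condition": sum(k == 1 for k in counts) / total,
        "five_or_more": sum(k >= 5 for k in counts) / total,
        "at_least_half": sum(k >= n_conditions / 2 for k in counts) / total,
    }


# Hypothetical calls: condition labels are placeholders c0..c21.
all_conditions = [f"c{i}" for i in range(22)]
toy_calls = {
    "reqtl_1": all_conditions[:1],   # condition-specific
    "reqtl_2": all_conditions[:5],   # shared in five conditions
    "reqtl_3": all_conditions[:11],  # shared in half of the conditions
    "reqtl_4": all_conditions,       # shared in every condition
}
summary = sharing_summary(toy_calls)
print(summary)
```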
a Proportions of reQTLs for every stimulated condition that were found in a single condition only (column 1) or detected in 2 or more stimulated conditions (columns 2–22). Colour intensity reflects the percentage of reQTLs in a given condition that were also detected in at least one other stimulated condition. Columns represent the number of conditions in which an reQTL was detected. b Cumulative percentage of detected reQTLs (red line and points, left hand y axis) and reQTL detection rate (blue boxplots, right hand y axis) versus the number of conditions in which an reQTL was detected. Box plots represent the median (centre line), the first and third quartiles (bounds of the box), and the whiskers, which extend to the smallest and largest values within 1.5x the interquartile range from the bounds of the box. Data points outside this range are plotted as outliers. The cumulative percentage of detected reQTLs (red line, left-hand y-axis) was calculated as the running sum of the mean detection rate across conditions. The x-axis indicates the number of stimulated conditions in which a reQTL was detected, and the right-hand y-axis shows the detection rate. All box plots are based on n = 22 stimulated conditions.
To investigate the stability of reQTLs over time, we conducted pairwise comparisons between the 6-h and 24-h time points within each cell condition, as well as between day 0 and day 2 for precursor cells. To define time point specificity, we examined whether the reQTLs identified in the initial condition (6 h, lfsr ≤0.05) were absent in the second condition (24 h, lfsr >0.05) and vice versa. We observed substantial variation in time-specificity across conditions. For example, cells stimulated with PIC or IFNβ had relatively few time point-specific reQTLs (18–23.4%), perhaps reflecting permanent changes in chromatin state produced by IFNβ stimulation30. In contrast, 85% of reQTLs detected in the precursor cells were specific to a single day, likely reflecting the very substantial changes in transcriptional regulation that occur during cell differentiation (Supplementary Fig. 5c).
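The time point comparison can be sketched as follows. The lfsr cut-offs follow the text; the choice of denominator (all reQTLs significant at either time point) is an assumption, because the source does not state how the percentages were normalised. Identifiers and values are hypothetical.

```python
def timepoint_specific_fraction(lfsr_6h, lfsr_24h, threshold=0.05):
    """Fraction of reQTLs that are significant at only one of two time points.

    lfsr_6h, lfsr_24h: {eqtl_id: lfsr} for the same stimulus at 6 h and 24 h.
    Only eQTLs present in both mappings are compared. The denominator is the
    number of eQTLs significant at either time point (an assumption).
    """
    shared = specific = 0
    for eqtl_id in lfsr_6h.keys() & lfsr_24h.keys():
        sig_6h = lfsr_6h[eqtl_id] <= threshold
        sig_24h = lfsr_24h[eqtl_id] <= threshold
        if sig_6h and sig_24h:
            shared += 1
        elif sig_6h or sig_24h:
            specific += 1
    total = shared + specific
    return specific / total if total else float("nan")


# Hypothetical lfsr values for one stimulus.
toy_6h = {"e1": 0.01, "e2": 0.01, "e3": 0.50, "e4": 0.90}
toy_24h = {"e1": 0.02, "e2": 0.40, "e3": 0.03, "e4": 0.80}
fraction = timepoint_specific_fraction(toy_6h, toy_24h)
print(f"time point-specific fraction: {fraction:.2f}")
```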
The GTEx consortium has completed the most extensive and widely used study of eQTLs to date3. The vast majority of GTEx samples were post-mortem tissues and it is unclear how suitable these are for detecting condition-specific genetic effects in immune response. To investigate this, we tested whether the reQTLs detected in our study were also found in GTEx, estimating the overlap using the π1 statistic31 (Supplementary Fig. 6). On average, half (54%) of the reQTLs detected in a given condition are replicated in a GTEx tissue, significantly lower than the replication rate of non-response eQTLs (63%) (Supplementary Fig. 6c, d) (Wilcoxon p = 4.9 × 10−47), with the highest replication rates observed for GTEx whole blood (Supplementary Fig. 6b). This suggests that many of the immune pathways we have activated in our study are, at some level, also active in post-mortem tissue.
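For readers unfamiliar with the π1 statistic, the sketch below shows one simple way to estimate it. It takes the replication-cohort p-values of the eQTLs being tested, estimates the proportion of true nulls π0 from the density of p-values above a fixed cut-off λ, and reports π1 = 1 − π0. The fixed λ = 0.5 is an assumption made for brevity; the source does not state how π0 was estimated, and implementations such as the qvalue R package typically smooth over a range of λ values. The p-values are hypothetical.

```python
def pi1_estimate(p_values, lam=0.5):
    """Estimate pi1 = 1 - pi0 with a fixed-lambda estimator.

    pi0 is estimated as #{p > lam} / (m * (1 - lam)) and capped at 1, using the
    fact that null p-values are uniform and so dominate the region above lam.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    pi0 = sum(p > lam for p in p_values) / (m * (1 - lam))
    return 1 - min(pi0, 1.0)


# Hypothetical replication p-values for eight reQTLs looked up in another study.
toy_p_values = [0.001, 0.002, 0.01, 0.03, 0.2, 0.6, 0.7, 0.9]
pi1 = pi1_estimate(toy_p_values)
print(f"pi1 = {pi1:.2f}")
```

Unlike counting eQTLs that pass a significance threshold in the replication study, π1 does not depend on a threshold in that study, which makes it less sensitive to differences in power between datasets.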
Stimulation specific genetic regulation in disease
We next investigated the relevance of reQTLs in disease. We collated a set of 83 well-powered GWAS (ten or more genome-wide significant loci, p ≤ 5.0 × 10−8), including 22 immune-mediated, 13 blood-related, 3 cancer, 11 cardiovascular, 15 neurological and 19 other traits or diseases (Supplementary Data 9). We found evidence of colocalization32 between a disease association signal and an eQTL for 1955 unique eGenes across all traits and conditions (posterior probability of sharing a single causal variant PP4 > 0.75; Supplementary Data 10). Relative to the naive conditions, stimulation often increased the strength of evidence of colocalization with disease. We found that including stimulated states substantially boosted our discovery of disease-eQTL colocalizations, with only 32% of all colocalizations (631 eGenes) detected in naive cells (Fig. 4a).
a Cumulative number of colocalized eGenes seeded in naïve conditions (Ctrl_24, Ctrl_6), with stimulated conditions shown in increasing order. The red line represents the cumulative count of colocalized eGenes exclusively in naive conditions. b Boxplots of percentages of GWAS loci colocalized with response eQTLs and non-response eQTLs across 6 main GWAS categories. Data are presented as box plots for each category, faceted by eQTL type (response eQTLs and non-response eQTLs). Box plots represent the median (centre line), the first and third quartiles (bounds of the box), and the whiskers, which extend to the smallest and largest values within 1.5x the interquartile range from the bounds of the box. Data points outside this range are plotted as outliers. The colocalization percentages were calculated as the proportion of colocalized significant regions among all identified significant regions per GWAS study. Sample sizes (n) for each GWAS category are as follows: Autoimmune or Inflammatory disease (n = 22), Blood related diseases and traits (n = 13), Cancer (n = 3), Heart related diseases and traits (n = 11), Neuro related diseases (n = 15), and Other (n = 19). c Percentages of GWAS loci colocalized per trait and category. Grey lines indicate the mean percentage of colocalization with response eQTLs across all traits for a specific GWAS category while the light grey line indicates the mean percentage of colocalization with non-response eQTLs.
The rise in the number of eGenes with colocalization evidence can be attributed to both the inclusion of more conditions relevant to disease and the increased power due to the substantial level of biological replication in our study. To investigate this further, we examined the frequency of disease-colocalized eQTLs that had significantly different effect sizes in naive and stimulated conditions using mashr. We found that 21.7% (424/1955; Fig. 4b, c) of eGenes with colocalization evidence had at least one reQTL, and among these, 89.4% (379) had a larger effect size in the stimulated condition than in the corresponding naive condition. Disease-colocalizing eGenes showed a significant overrepresentation of reQTL genes (p = 0.05, Fisher’s exact test). This suggests that a substantial number of disease-related eQTLs will only be detected in stimulated conditions, because their effect sizes in naive cells are too small.
Our analysis revealed specific traits for which stimulated macrophages appeared to be more relevant than naive conditions (Fig. 4c). No individual trait showed a substantial increase in the number of eGenes with evidence of colocalization due to reQTLs. Nevertheless, reQTLs help to explain disease loci that cannot be explained by naive eQTLs, even though their overall impact on our understanding of these traits remains limited.
Next, to determine the effectiveness of MacroMap in identifying effector genes within GWAS loci, we compared our results to those obtained using GTEx data. We asked how many of the 1955 disease colocalizations we found in our study could not have been identified using GTEx eQTLs. We found that 988 eGenes (51%, 988/1955) were colocalized with higher confidence in our dataset (PP4 > 0.75 in MacroMap versus PP4 < 0.5 across all GTEx tissues) (Supplementary Fig. 7), 164 (16.6%, 164/988) of which were defined by mashr as reQTLs. All 164 genes were expressed (minimum TPM = 0.8, mean TPM across all tissues = 165) in at least one GTEx tissue, and had a detectable eQTL (qvalue ≤0.05) in one or more GTEx tissues.
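The comparison with GTEx described above amounts to two thresholds on the colocalization posterior probability PP4. The following sketch assumes that PP4 values have already been computed and summarised per gene. It ignores the trait dimension for brevity, and it treats genes with no GTEx colocalization test as not colocalized in GTEx, which is an assumption. Gene names and values are hypothetical.

```python
def macromap_only_colocalizations(macromap_pp4, gtex_pp4, high=0.75, low=0.5):
    """Genes colocalized in MacroMap (PP4 > high) but in no GTEx tissue (PP4 < low).

    macromap_pp4: {gene: best PP4 across MacroMap conditions}.
    gtex_pp4: {gene: list of PP4 values, one per GTEx tissue tested}.
    Genes absent from gtex_pp4 count as not colocalized in GTEx (an assumption).
    """
    selected = []
    for gene, pp4 in macromap_pp4.items():
        if pp4 <= high:
            continue
        if all(tissue_pp4 < low for tissue_pp4 in gtex_pp4.get(gene, [])):
            selected.append(gene)
    return sorted(selected)


# Hypothetical PP4 values.
toy_macromap = {"GENE_A": 0.98, "GENE_B": 0.90, "GENE_C": 0.60, "GENE_D": 0.80}
toy_gtex = {
    "GENE_A": [0.10, 0.30, 0.05],  # weak evidence in every tissue
    "GENE_B": [0.20, 0.85],        # colocalizes in one GTEx tissue
    "GENE_C": [0.10],              # fails the MacroMap threshold
    # GENE_D was not tested in GTEx
}
macromap_only = macromap_only_colocalizations(toy_macromap, toy_gtex)
print(macromap_only)
```

Genes with a GTEx PP4 between 0.5 and 0.75 fall into neither group under these thresholds, which keeps the two sets well separated.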
A response eQTL implicates increased expression of CTSA with increased risk of coronary artery disease
One such colocalization event was identified between the CAD-associated risk locus at 20q13.1233,34 and a reQTL observed 24 h after stimulation with either sLPS or sLPS + IL10 (LIL10) (PP4 = 0.98 and PP4 = 0.99 respectively, Supplementary Fig. 8a, d) and not with Ctrl conditions (Supplementary Fig. 8b). The risk-increasing allele (rs3827066, C > T) was associated with increased expression of CTSA without the gene being differentially expressed after stimulation with either sLPS or LIL10 (Supplementary Fig. 8c). Moreover, this risk allele has also been associated with increased risk of Abdominal Aortic Aneurysm (AAA)35, a common complication of vessel wall impairment caused by various predisposing factors including atherosclerosis. Despite these disease associations being known for several years, the disease effector gene in the region had remained elusive. For example, the Open Targets Genetics Portal (https://genetics.opentargets.org/) showed that there are 34 genes in the locus, none of which had a variant-to-gene score >0.3, with CTSA ranked as the seventh most likely effector gene in the region. The TSS of CTSA is located 67.24 kb upstream of rs3827066, with six other genes in closer proximity to the SNP. Data from a promoter capture Hi-C study of circulating immune cells demonstrated that rs3827066 lies in a distal regulatory element of CTSA or NEURL2 in many immune cell types, including monocytes and macrophages36. Our reQTL and colocalization results suggest the disease associated variant acts via dysregulation of CTSA expression.
CTSA is a lysosomal protective protein with both intracellular and extracellular functions37. In the lysosome, it is essential for both the activity and protection of beta-galactosidase and neuraminidase (NEU1) (complex with CTSA), an enzyme which cleaves terminal sialic acid residues from substrates such as glycoproteins and glycolipids38. The removal of sialic acid from TLR4, a 2,3 sialylated pattern recognition receptor, with the action of NEU139,40, facilitates LPS recognition and the subsequent downstream activation of NF-κB signalling and inflammatory cytokine production. Outside the cell, CTSA has been suggested to play a role in extracellular matrix (ECM) remodelling41,42,43,44. ECM remodelling is an integral part of several chronic inflammatory processes including atheromatous plaque formation within vessel walls, and subsequent CAD45. The cardiac expression of CTSA is upregulated in multiple animal models of myocardial infarction, Type 2 Diabetes and angiotensin II-stimulated hypertrophy46,47,48,49. Increased expression of CTSA has been shown to trigger proteolysis of the extracellular antioxidant enzyme EC-SOD, resulting in higher levels of oxidative stress, myocyte hypertrophy, ECM remodelling, and inflammation44,50. We thus hypothesise that rs3827066 increases risk of CAD and AAA by disrupting a CTSA regulatory element, leading to increased expression of CTSA and elevated oxidative stress during the ECM remodelling stage of atheromatous plaque formation. Detecting this CTSA eQTL only upon stimulation (i.e. reQTL) is consistent with ECM remodelling being induced following a prolonged period of vessel wall inflammation in CAD. Overall, our findings suggest that this CAD-associated risk locus exacerbates oxidative stress in the ECM by abnormally upregulating CTSA following an inflammatory response.
Discussion
In this study, we used a high-throughput cellular system of human iPSC-derived immune cells to survey inter-individual variation in gene expression across a range of stimulated conditions. We identified 10,170 eGenes, a yield that results from the combination of higher depth expression profiling and enhanced detection of both condition-specific gene expression and the condition-specific effects of genetic variants. Using a statistical method that accounts for low power, we detected a significant change in genetic effect size between naive and stimulated cells in 23% of all eGenes. For this set of response eQTLs we found that the majority were shared widely across stimulated conditions, with effects found in a single condition constituting a small fraction (1%). We discovered 1955 disease-eQTL colocalizations, of which 51% were not detectable in any GTEx tissue, suggesting many disease-associated variants may function in a condition-specific manner.
Perhaps surprisingly, our results suggest that a high level of biological replication was at least as important as condition-specific genetic effects in boosting our discovery rate. More generally, it is unclear how many eQTLs could have been detected by a better powered study of naive cells rather than profiling a large number of stimulated conditions. Nonetheless, profiling stimulated cells proved valuable for interpreting disease loci. We found that 21.7% (617) of all disease colocalizations had a different effect size, usually greater, in a stimulated condition compared to naive cells. However, we believe our estimate for the true number of response QTLs is likely to be a lower bound, mainly limited by statistical power. In some cases, the statistical method we used performed aggressive shrinkage of eQTL effect sizes towards zero. For example, we observed colocalization of a CD80 eQTL after stimulation with sLPS at 6 h that was not confidently colocalized in naive cells. Despite this discrepancy in colocalization confidence between naive and stimulated macrophages, this eQTL was not considered a reQTL by mashr (Supplementary Figs. 9 and 10). We expect that future studies with larger sample sizes will increase the number of eQTL effects that can be confidently considered response eQTLs.
Several recent studies suggest that a smaller than expected fraction of GWAS loci colocalize with an eQTL3,5, possibly because genetic effects are often restricted to cell types and conditions that are not causal for the trait6. MacroMap represents the most comprehensive characterisation of macrophage eQTLs in different environmental contexts, generated to better understand how genetic variants impact human traits. Our findings highlight the value of reQTLs for expanding our understanding of complex diseases. Although their overall impact on that understanding remains limited, reQTLs offer insights into genetic regulation that naive eQTLs cannot provide. They help to explain previously unexplained disease loci and bring us closer to a comprehensive understanding of the genetic mechanisms underlying disease.
Despite MacroMap being able to uncover a significant proportion of “missing” disease-relevant eQTLs, we acknowledge several key limitations of our study for post-GWAS prioritisation. eQTL data alone lacks the complete picture of how genetic associations relate to regulatory function. This highlights the need for broader, integrated methods to fully understand the regulatory roles of disease-associated variants51. Furthermore, eQTLs exhibit high context-dependence, varying significantly across different tissues and environmental conditions, thereby limiting the generalisability of our findings. Moreover, the complex nature of epigenetic and transcriptional variation adds layers of intricacy to identifying disease-relevant regulatory elements, something that we partially consider here. Lastly, as with many eQTL studies we assume a single causal variant per locus for colocalization. These assumptions do not accommodate the multifactorial nature of gene regulation, where multiple variants can influence multiple genes and vice versa. This complicates the establishment of true causal relationships between genetic variants and gene expression.
Upcoming large-scale single-cell QTL and response QTL studies of primary tissues and appropriate cell models will further advance our ability to detect disease-relevant genetic effects. Well-powered cell type and context specific QTL studies of molecular traits with different genomic properties (for example, splicing, chromatin accessibility and chromatin interactions) will likely further improve our ability to understand human disease biology.
Methods
iPSC lines
No statistical methods were used to predetermine sample size. Human iPSC lines from healthy donors of European descent were selected from the HipSci project20 (http://www.hipsci.org) for differentiation to macrophages (see Supplementary Note). All HipSci samples were collected from consenting research volunteers recruited from the NIHR Cambridge BioResource (https://bioresource.nihr.ac.uk/studies/cbr62/; NIHR BioResource Study Code: CBR62; ethical approvals REC 09/H0304/77, V3 15/03/2013 and REC 09/H0304/77, V2 04/01/2013). None of the cell lines were reported to be commonly misidentified according to ICLAC v13. Briefly, 315 lines were initially selected and 227 of them (71.6%) were successfully differentiated. RNA-seq libraries were produced for 217 lines and, based on quality control, 209 unique lines (Supplementary Data 11) (4698 unique RNA-seq libraries across all conditions) were included in the final dataset (Supplementary Fig. 1A, B).
RNA-seq and quality control
To process the large number of libraries more efficiently, two RNA-seq library construction protocols were utilised, including a modified Smart-seq2 protocol and the NEBnext Ultra II Directional RNA Library kit (further details provided in the supplementary note). However, this resulted in a batch effect due to the different library preparation methods. This effect was included as a covariate in all downstream analyses.
RNA-seq reads (75 bp paired-end) were aligned to the GRCh38 reference human genome and GENCODE v27 transcript annotation using STAR_2.5.3a52. To quantify gene expression we used featureCounts v1.5.353. We kept protein-coding and lincRNA genes in all analyses with mean expression ≥0.5 transcripts per million (TPM) in at least half of the conditions (≥12), resulting in a total of 14,060 genes. To ensure the quality of the samples, we employed several QC metrics. Principal Components Analysis (PCA) was performed per 96-library pool (4 iPSC lines per pool, 24 conditions per iPSC line) to detect sequencing outliers. Non-stimulated or mislabelled stimulated samples were identified and discarded based on pairwise PCA comparisons of each condition with the rest of the conditions, per 96-library pool. Sex incompatibility checks were also performed using the methods described in ref. 54 and 3 iPSC-lines (72 samples) were discarded due to discordant sex annotations. Subsequently, we performed UMAP analysis55 to cluster the different conditions, and wrongly labelled samples that had passed PCA filtering were discarded. Finally, we utilised the Match BAM to VCF (MBV) method56,57 to detect sample swaps and cross contamination between RNA-seq samples. We discarded 3 iPSC-lines (72 samples) and 63 additional samples due to cross contamination, and corrected the labels for 23 iPSC-lines identified as swaps. We did not observe concordance of genotype-RNA-seq data for 4 lines, which we kept in the final dataset for differential expression analysis but discarded from eQTL mapping. Among the 23 swaps, 2 lines were identical with lines already present in the data and were subsequently removed from the dataset.
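The gene expression filter described above (mean expression of at least 0.5 TPM in at least 12 of the 24 conditions) can be sketched as follows. The function name and the toy values are hypothetical, and the per-condition means are assumed to have been computed beforehand.

```python
def passes_expression_filter(mean_tpm_per_condition, min_tpm=0.5, min_conditions=12):
    """True if the gene's mean TPM reaches min_tpm in at least min_conditions conditions."""
    n_expressed = sum(tpm >= min_tpm for tpm in mean_tpm_per_condition)
    return n_expressed >= min_conditions


# Hypothetical per-condition mean TPM values for two genes across 24 conditions.
gene_kept = [0.5] * 12 + [0.1] * 12     # expressed in exactly half of the conditions
gene_dropped = [40.0] * 11 + [0.0] * 13  # highly expressed, but in only 11 conditions

print(passes_expression_filter(gene_kept), passes_expression_filter(gene_dropped))
```

A filter of this form retains genes that are consistently expressed at modest levels while dropping genes that are expressed in only a minority of conditions, however strongly.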
In total, we discarded 8 iPSC-lines (~3.7% of the successfully differentiated lines) and 510 RNA-seq samples (~9.8%) based on our QC metrics (318 samples based on all QC metrics, 192 from the discarded 8 iPSC-lines, Supplementary Fig. 1C) resulting in a total of 4698 unique RNA-seq libraries across all conditions.
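The gene-level expression filter described above can be illustrated with a minimal sketch. It assumes one reading of the criterion, namely that a gene is kept when its per-condition mean TPM is at least 0.5 in at least 12 of the 24 conditions; the function name and the toy condition labels are hypothetical and are not taken from the study code.

```python
def keep_gene(mean_tpm_by_condition, min_tpm=0.5, min_conditions=12):
    """Return True if the gene passes the expression filter.

    mean_tpm_by_condition maps a condition label to the mean TPM of the
    gene across samples of that condition (assumed input layout).
    """
    n_pass = sum(1 for tpm in mean_tpm_by_condition.values() if tpm >= min_tpm)
    return n_pass >= min_conditions


# Toy data with 24 hypothetical conditions.
expressed = {f"cond_{i}": (1.0 if i < 12 else 0.1) for i in range(24)}
sparse = {f"cond_{i}": (1.0 if i < 11 else 0.1) for i in range(24)}

print(keep_gene(expressed))  # expressed in 12 conditions
print(keep_gene(sparse))     # expressed in 11 conditions
```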
Variance component analysis
We adopted the same approach that was implemented in ref. 58 to quantify transcriptional variation. In brief, we used a linear mixed model that employed log10(TPM + 0.1) values for the 14,060 genes, with 15 technical confounders (including RunID, Donor, Stimulus Hours, Sex, Library preparation method, Date thawed, Passage number at thawing, Passage EB formation, IPSCs culture time, Total Harvests, Differentiation time No of Days, Purity results, Estimated cell diameter, SD cell diameter, and Differentiation media) fitted as random effects with independent variance parameters \(\varphi_k^2\). We measured the variance explained by factor \(k\) using the intraclass correlation \(\varphi_k^2/(1+\varphi_k^2)\), while the remaining 14 factors were held constant. The standard error of the intraclass correlation was computed using the delta method, with the standard error of the variance parameter estimator.
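As an illustration of the delta-method step, the sketch below propagates the standard error of a variance parameter to the intraclass correlation. It assumes the form given above, in which the residual variance is scaled to 1, so that the derivative of \(\varphi_k^2/(1+\varphi_k^2)\) with respect to \(\varphi_k^2\) is \(1/(1+\varphi_k^2)^2\). The function name is hypothetical.

```python
def icc_with_se(var_k, se_var_k):
    """Intraclass correlation and its first-order (delta method) standard error.

    var_k    : estimated variance parameter for factor k (phi_k^2)
    se_var_k : standard error of that estimate
    Assumes the residual variance is fixed at 1, as in phi^2 / (1 + phi^2).
    """
    icc = var_k / (1.0 + var_k)
    # d/dx [x / (1 + x)] = 1 / (1 + x)^2
    gradient = 1.0 / (1.0 + var_k) ** 2
    return icc, gradient * se_var_k


icc, se = icc_with_se(1.0, 0.2)
print(icc, se)
```

The approximation is first-order only, so it is most reliable when the standard error of the variance parameter is small relative to \(1+\varphi_k^2\).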
Genotypes
GRCh37 imputed genotypes were obtained from the HipSci project20. We utilised CrossMap59 to lift over the variant coordinates from GRCh37 to GRCh38. We then used bcftools to filter the resulting VCF file, retaining only variants with INFO score > 0.4 and minor allele frequency (MAF) > 0.05. To address population stratification, we used EIGENSTRAT60 to calculate genotype principal components (PCs) from the retained variants.
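The variant filter can be expressed as a simple predicate. The sketch below is not the bcftools invocation used in the study; it only illustrates the two thresholds, assuming that MAF is obtained by folding the alternate allele frequency and that both comparisons are strict. The function name is hypothetical.

```python
def passes_variant_filter(info_score, alt_allele_freq, min_info=0.4, min_maf=0.05):
    """Return True if a variant has INFO > min_info and MAF > min_maf."""
    # The minor allele is whichever allele is rarer.
    maf = min(alt_allele_freq, 1.0 - alt_allele_freq)
    return info_score > min_info and maf > min_maf


print(passes_variant_filter(0.9, 0.80))  # common, well imputed
print(passes_variant_filter(0.9, 0.97))  # minor allele frequency about 0.03
print(passes_variant_filter(0.4, 0.30))  # INFO not strictly above 0.4
```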
UMAP clustering and visualisation
To visualise the transcriptional variation across conditions, we applied UMAP analysis to the gene expression data. Prior to UMAP, we performed several preprocessing steps on the log-transformed transcripts per million (TPMs). First, we quantile-normalised the log-TPMs to remove technical differences between samples. Next, we applied a rank-based inverse normal transformation to ensure that the gene expression values were normally distributed. Finally, we regressed out (linear regression) the effects of several covariates including runID, donor, library preparation method, sex, purity results, differentiation media, estimated cell diameter, and Differentiation time No Days (Time in days from EB plating until the day of successful harvest) to account for technical variation and batch effects. The resulting UMAP plot provided a low-dimensional visualisation of the transcriptional differences among the different conditions.
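The first two preprocessing steps can be sketched as follows. This is an illustrative implementation only: it assumes quantile normalisation is applied across samples and the rank-based inverse normal transformation per gene across samples, breaks ties by order of appearance, and uses the Blom offset (0.375), which is a common convention but is not stated in the text. Function names are hypothetical.

```python
from statistics import NormalDist, mean


def quantile_normalise(columns):
    """Quantile-normalise equal-length sample columns to a shared distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the i-th smallest value across samples.
    rank_means = [mean(sc[i] for sc in sorted_cols) for i in range(n)]
    normalised = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        new_col = [0.0] * n
        for rank, idx in enumerate(order):
            new_col[idx] = rank_means[rank]
        normalised.append(new_col)
    return normalised


def inverse_normal_transform(values, offset=0.375):
    """Map values to standard normal quantiles according to their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        transformed[idx] = std_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
    return transformed


samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # two samples, three genes
print(quantile_normalise(samples))
print(inverse_normal_transform([10.0, 30.0, 20.0]))
```

After quantile normalisation every sample shares the same set of values, and after the rank-based transformation each gene has approximately standard normal values regardless of its original scale.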
Differential gene expression analysis
DESeq222 was used to identify differentially expressed genes between the naive and stimulated conditions, and SVA (surrogate variable analysis)61 was employed to detect hidden technical variation that could not be captured by our technical covariates. Specifically, we fitted the samples from both the 6 and 24 h time points of the stimulated and naive conditions together and included 10 SVA factors that were determined from the overall sample composition. An interaction term was also included in the model as shown below:
DESeqDataSet(group1_vs_group2,design = ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + Stimulus + Hours + Stimulus:Hours).
Following this, the same model was fitted without the interaction term and a likelihood ratio test (test = “LRT”) was performed to compare the full model (including all SVA factors and the interaction term) to the reduced model (including all SVA factors but not the interaction term).
To identify differentially expressed genes at specific time points (either 6 or 24 h) or genes showing different differential expression patterns between time points (interaction term, time point effects), we used the Wald test (test = “Wald”, alpha = 0.05) in DESeq2. Finally, we assessed significance at a 5% false discovery rate (FDR) using the Benjamini-Hochberg procedure and kept genes with abs(log2FoldChange) ≥ 1.
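The final selection step can be illustrated with a short sketch of the Benjamini-Hochberg adjustment followed by the fold-change filter. It operates on plain p values and log2 fold changes and does not reproduce DESeq2 itself, which also applies independent filtering before adjustment; the function names and the toy gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, fdr=0.05, min_abs_lfc=1.0):
    """results: list of (gene, p value, log2 fold change) tuples."""
    padj = benjamini_hochberg([p for _, p, _ in results])
    return [gene for (gene, _, lfc), q in zip(results, padj)
            if q < fdr and abs(lfc) >= min_abs_lfc]


toy = [("A", 0.01, 2.0), ("B", 0.04, 0.5), ("C", 0.03, -1.5), ("D", 0.005, -0.2)]
print(select_degs(toy))
```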
Gene Ontology (GO) and Reactome pathway enrichment analyses were conducted using the clusterProfiler and ReactomePA R packages. Over-representation of biological processes and pathways among differentially expressed genes (DEGs) was determined using a hypergeometric test. p values were adjusted for multiple testing using the Benjamini-Hochberg method, with terms or pathways considered significant at an adjusted p value (FDR) < 0.05.
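For reference, the hypergeometric over-representation test computes the probability of observing at least the given overlap between a DEG list and a term when genes are drawn without replacement from the background. The sketch below is a generic implementation of that upper-tail probability, not the clusterProfiler or ReactomePA code; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn from n_universe genes,
    of which n_in_term are annotated to the term."""
    upper = min(n_in_term, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the term, 3 DEGs, 2 of them in the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```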
eQTL mapping
We mapped cis-eQTLs within ±1 Mb of the transcription start site (TSS) of each gene using QTLtools v.1.156. Briefly, QTLtools permutes the expression data for each gene and records the best p value for any SNP in the cis window in each permutation. Under the null hypothesis, the best p values follow a beta distribution whose parameters depend on the LD structure of the cis region; QTLtools estimates these parameters for each gene by maximum likelihood. An adjusted gene-level p value is then computed from the fitted beta distribution for each gene. To correct for multiple testing across all genes, we applied the qvalue R package to the adjusted gene-level p values obtained from 1000 permutations, and significance was assessed at 5% FDR (q value < 0.05) to identify genes with at least one significant cis-eQTL (“eGenes”). We included expression PCs (35–50, depending on the condition) and 3 genotype PCs as covariates to correct for technical variation and capture population stratification. To determine the number of expression PCs per condition that maximised the number of discoveries (eGenes), we repeated the entire analysis multiple times using different numbers of expression PCs. Multiple independent signals (5% FDR) for a given eGene were identified by forward stepwise regression followed by a backwards selection step implemented in QTLtools (conditional pass).
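The logic of the gene-level permutation correction can be illustrated with its simplest form, the direct empirical p value. This sketch is not the beta approximation that QTLtools fits; it only shows what that approximation is designed to estimate with fewer permutations. The function name and the toy values are hypothetical.

```python
def empirical_gene_pvalue(observed_best_p, permuted_best_ps):
    """Empirical gene-level p value from permutation minima.

    observed_best_p  : smallest nominal p value in the cis window (real data)
    permuted_best_ps : smallest nominal p value from each permutation
    The +1 terms count the observed data as one of the permutations, so the
    estimate is never exactly zero.
    """
    n_as_extreme = sum(1 for p in permuted_best_ps if p <= observed_best_p)
    return (n_as_extreme + 1) / (len(permuted_best_ps) + 1)


print(empirical_gene_pvalue(0.001, [0.0005, 0.01, 0.2, 0.05]))
```

With 1000 permutations the smallest attainable empirical p value is 1/1001, which is one reason a parametric fit to the permutation minima is useful for strongly associated genes.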
Functional enrichment analysis
We performed an enrichment analysis of genomic annotations to investigate the functional implications of our identified eQTLs. Firstly, we utilised Ensembl’s Variant Effect Predictor (VEP) and the Ensembl Regulatory Build to annotate the eQTLs. To identify specific genomic annotations enriched among our eQTLs, we used the first stage hierarchical model implemented in PHM62 (https://github.com/natsuhiko/PHM).
Response eQTLs using Multivariate Adaptive Shrinkage (mash)
To determine which eQTLs in our dataset were truly restricted to stimulation (reQTLs), we used mashr28, following the workflow provided by the authors of mashr (https://stephenslab.github.io/mashr/articles/eQTL_outline.html). Initially, we calculated the standard errors of QTL effect sizes (betas) from the QTLtools nominal output, which were combined with the effect sizes as input data for mash. Our analysis used two subsets of tests: a random subset of 200,000 tests comprising both null and non-null tests, and a more focused “strong” subset comprising the lead SNP (lowest p value) per gene across all conditions, emphasising the most impactful associations. The strong subset was used to learn data-driven covariance matrices, whereas the random subset was used to estimate mixture weights and scaling coefficients and to learn the correlation structure among null tests. Additionally, we employed the mashr mode “mashr with common baseline” (https://stephenslab.github.io/mashr/articles/intro_mashcommonbaseline.html), setting Ctrl_24 as our baseline condition and excluding Ctrl_6 from the mash analysis. In the common baseline mode, mashr estimates the deviation of eQTL effect sizes in each alternative condition from that of the baseline condition, taking into account the correlation that arises when comparing all conditions to a common baseline. To execute the analysis, we first fitted the mashr model to the random tests to determine the mixture weights. We then used the model fit to compute the posterior mean effect sizes (mash effect sizes) for the best associated SNP per gene in every stimulation condition. We considered gene-SNP pairs with a local false sign rate (lfsr) below 0.05 to be significant response eQTLs (reQTLs).
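One common way to recover a standard error from an effect size and its nominal p value is sketched below. It assumes a two-sided test and approximates the regression t statistic by a standard normal z score, which is reasonable at the sample sizes typical of eQTL studies; the exact procedure used in the study is not specified in the text, and the function name is hypothetical.

```python
from statistics import NormalDist


def se_from_beta_and_p(beta, pvalue):
    """Approximate the standard error of beta from a two-sided p value.

    Uses |z| = -Phi^{-1}(p / 2) and se = |beta| / |z| (normal approximation).
    Requires 0 < pvalue <= 1.
    """
    z = -NormalDist().inv_cdf(pvalue / 2.0)
    if z == 0.0:
        # p = 1 carries no information about the precision of beta.
        return float("inf")
    return abs(beta) / z


print(se_from_beta_and_p(0.5, 0.05))
```

Working with the lower tail (p / 2) rather than 1 − p / 2 avoids loss of floating-point precision for very small p values.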
The lfsr is a measure that is stricter than the false discovery rate (FDR) since it not only requires significant discoveries to have a nonzero value, but also to have a consistent sign29. To determine the reQTLs that have consistent effects across multiple conditions (shared reQTLs) or function in only a single condition (condition-specific reQTLs), we needed to investigate whether the gene-SNP pair for a particular condition had lfsr <0.05 in the other conditions. This allowed us to quantify the level of sharing of response effects among the different conditions. For instance, if a particular gene-SNP pair was significant in three out of four stimulation conditions, we considered it a shared reQTL across those three conditions. Conversely, if a gene-SNP pair was significant only in one condition, we classified it as a condition-specific reQTL for that particular condition.
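The classification rule described above can be written as a short function. The sketch takes the lfsr of one gene-SNP pair in each stimulated condition and labels the pair as shared or condition-specific; the condition labels and the function name are hypothetical and do not correspond to the study's condition identifiers.

```python
def classify_reqtl(lfsr_by_condition, threshold=0.05):
    """Label a gene-SNP pair from its per-condition lfsr values.

    Returns (label, significant_conditions), where label is one of
    "not_reQTL", "condition_specific" or "shared".
    """
    significant = sorted(c for c, lfsr in lfsr_by_condition.items() if lfsr < threshold)
    if not significant:
        label = "not_reQTL"
    elif len(significant) == 1:
        label = "condition_specific"
    else:
        label = "shared"
    return label, significant


pair_a = {"stimA_6": 0.01, "stimB_6": 0.02, "stimC_6": 0.04, "stimD_6": 0.60}
pair_b = {"stimA_6": 0.30, "stimB_6": 0.001, "stimC_6": 0.20, "stimD_6": 0.90}
print(classify_reqtl(pair_a))
print(classify_reqtl(pair_b))
```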
Colocalization
Colocalization analysis was performed with coloc v3.2-132 between our eQTL summary statistics and 83 publicly available GWAS summary statistics (from either the GWAS Catalog or UK Biobank GWAS63). These GWAS represented 22 immune-mediated, 13 blood-related, 3 cancer, 11 cardiovascular, 15 neurological, and 19 other traits or diseases (Supplementary Data 9). To ensure that we only included datasets with sufficient statistical power, we only considered GWAS datasets that had ten or more genome-wide significant regions (\(p \le 5.0 \times 10^{-8}\)). Specifically, to identify significant regions, we created BED files of all SNPs with \(p \le 5.0 \times 10^{-8}\) for each GWAS. We then used the ‘bedtools merge -d 500000’ command to combine variants within 500 kb of each other into a single region that spanned all the combined variants. The regions were expanded by 500 kb on either side, and any overlapping regions were merged again. Next, we ran coloc on a 2 Mb window centred on each lead eQTL, using default priors, and set the colocalization threshold to PP4 > 0.75. The colocalization proportions were calculated as the proportion of colocalized significant regions among all identified significant regions per GWAS.
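The region definition and the colocalization proportion can be illustrated for a single chromosome. This sketch mimics the described steps in plain Python rather than calling bedtools: it treats variants as single positions, merges those separated by at most 500 kb, expands by 500 kb and merges overlaps. Exact boundary behaviour may differ slightly from bedtools because of BED coordinate conventions, and all function names and toy values are hypothetical.

```python
def merge_positions(positions, max_gap):
    """Group sorted positions into [start, end] regions when gaps are <= max_gap."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos
        else:
            regions.append([pos, pos])
    return regions


def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def gwas_regions(sig_positions, merge_dist=500_000, flank=500_000):
    """Significant regions for one chromosome: merge, expand, merge again."""
    first_pass = merge_positions(sig_positions, merge_dist)
    expanded = [[max(0, s - flank), e + flank] for s, e in first_pass]
    return merge_intervals(expanded)


def coloc_proportion(best_pp4_per_region, threshold=0.75):
    """Fraction of regions whose best PP4 exceeds the threshold."""
    hits = sum(1 for pp4 in best_pp4_per_region if pp4 > threshold)
    return hits / len(best_pp4_per_region)


hits = [1_000_000, 1_200_000, 3_000_000, 3_900_000, 10_000_000]
regions = gwas_regions(hits)
print(regions)
print(coloc_proportion([0.9, 0.3, 0.8]))
```

In the toy example, the variants at 3.0 Mb and 3.9 Mb are too far apart to merge in the first pass, but their expanded regions overlap and are joined in the second pass.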
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Imputed genotype data for the HipSci lines are available from the European Variation Archive (EVA) (https://www.ebi.ac.uk/eva/?eva-study=PRJEB11749) and European Genome–phenome Archive (EGA) (EGAD00010000773). Unprocessed RNA-seq data are available from EGA under study ID EGAS00001002268 (dataset ID EGAD00001015380). Full summary eQTL statistics, raw and processed counts and full colocalization results are available from Zenodo (https://doi.org/10.5281/zenodo.7967759). Details of available results are listed in Supplementary Data 12.
Code availability
The code used is available at https://github.com/andersonlab/macromap_eqtl.
References
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Aguet, F. et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414.e24 (2016).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. https://doi.org/10.1016/j.tig.2020.08.009 (2020).
Connally, N. et al. The missing link between genetic association and regulatory function. bioRxiv https://doi.org/10.1101/2021.06.08.21258515 (2021).
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).
Barreiro, L. B. et al. Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection. Proc. Natl. Acad. Sci. USA 109, 1204–1209 (2012).
Lee, M. N. et al. Common genetic variants modulate pathogen-sensing responses in human dendritic cells. Science 343, 1246980 (2014).
Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).
Nédélec, Y. et al. Genetic ancestry and natural selection drive population differences in immune responses to pathogens. Cell 167, 657–669.e21 (2016).
Quach, H. et al. Genetic adaptation and neandertal admixture shaped the immune system of human populations. Cell 167, 643–656.e17 (2016).
Kim-Hellmuth, S. et al. Genetic regulatory effects modified by immune activation contribute to autoimmune disease associations. Nat. Commun. 8, 266 (2017).
Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431 (2018).
Huang, Q. Q. et al. Neonatal genetics of gene expression reveal potential origins of autoimmune and allergic disease risk. Nat. Commun. 11, 3761 (2020).
Lea, A. J., Peng, J. & Ayroles, J. F. Diverse environmental perturbations reveal the evolution and context-dependency of genetic effects on gene expression levels. Genome Res. 32, 1826–1839 (2022).
Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53, 304–312 (2021).
Schmiedel, B. J. et al. Single-cell eQTL analysis of activated T cell subsets reveals activation and cell type-dependent effects of disease-risk variants. Sci. Immunol. 7, eabm2508 (2022).
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Kumar, A., Zhang, J. & Yu, F.-S. X. Toll-like receptor 3 agonist poly(I:C)-induced antiviral response in human corneal epithelial cells. Immunology 117, 11–21 (2006).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Fabregat, A. et al. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinforma. 18, 142 (2017).
Xue, J. et al. Transcriptome-based network analysis reveals a spectrum model of human macrophage activation. Immunity 40, 274–288 (2014).
Cheng, Y. S., Colonno, R. J. & Yin, F. H. Interferon induction of fibroblast proteins with guanylate binding activity. J. Biol. Chem. 258, 7746–7750 (1983).
Kim, B.-H. et al. A family of IFN-γ–inducible 65-kD GTPases protects against bacterial infection. Science 332, 717–721 (2011).
Kim, B.-H. et al. Interferon-induced guanylate-binding proteins in inflammasome activation and host defense. Nat. Immunol. 17, 481–489 (2016).
Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).
Stephens, M. False discovery rates: a new deal. Biostatistics 18, 275–294 (2017).
Qiao, Y. et al. Synergistic activation of inflammatory cytokine genes by interferon-γ-induced chromatin remodeling and toll-like receptor signaling. Immunity 39, 454–469 (2013).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
LeBlanc, M. et al. Identifying novel gene variants in coronary artery disease and shared genes with several cardiovascular risk factors. Circ. Res. 118, 83–94 (2016).
van der Harst, P. & Verweij, N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).
Jones, G. T. et al. Meta-analysis of genome-wide association studies for abdominal aortic aneurysm identifies four new disease-specific risk loci. Circ. Res. 120, 341–353 (2017).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
Galjart, N. J. et al. Human lysosomal protective protein has cathepsin A-like activity distinct from its protective function. J. Biol. Chem. 266, 14754–14762 (1991).
Bonten, E., van der Spoel, A., Fornerod, M., Grosveld, G. & d’Azzo, A. Characterization of human lysosomal neuraminidase defines the molecular basis of the metabolic storage disorder sialidosis. Genes Dev. 10, 3156–3169 (1996).
Feng, C. et al. Sialyl residues modulate LPS-mediated signaling through the Toll-like receptor 4 complex. PLoS ONE 7, e32359 (2012).
Karmakar, J., Roy, S. & Mandal, C. Modulation of TLR4 sialylation mediated by a sialidase Neu1 and impairment of its signaling in Leishmania donovani infected macrophages. Front. Immunol. 10, 2360 (2019).
Jackman, H. L. et al. Angiotensin 1-9 and 1-7 release in human heart. Hypertension 39, 976–981 (2002).
Seyrantepe, V. et al. Enzymatic activity of lysosomal carboxypeptidase (cathepsin) A is required for proper elastic fiber formation and inactivation of endothelin-1. Circulation 117, 1973–1981 (2008).
Timur, Z. K., Akyildiz Demir, S. & Seyrantepe, V. Lysosomal cathepsin A plays a significant role in the processing of endogenous bioactive peptides. Front. Mol. Biosci. 3, 68 (2016).
Hohl, M. et al. Cathepsin A contributes to left ventricular remodeling by degrading extracellular superoxide dismutase in mice. J. Biol. Chem. 295, 12605–12617 (2020).
Lu, P., Takai, K., Weaver, V. M. & Werb, Z. Extracellular matrix degradation and remodeling in development and disease. Cold Spring Harb. Perspect. Biol. 3, a005058 (2011).
Ruf, S. et al. Novel β-amino acid derivatives as inhibitors of cathepsin A. J. Med. Chem. 55, 7636–7649 (2012).
Linz, D. et al. Cathepsin A mediates susceptibility to atrial tachyarrhythmia and impairment of atrial emptying function in Zucker diabetic fatty rats. Cardiovasc. Res. 110, 371–380 (2016).
Petrera, A. et al. Cathepsin A inhibition attenuates myocardial infarction-induced heart failure on the functional and proteomic levels. J. Transl. Med. 14, 153 (2016).
Hohl, M. et al. Cathepsin A Mediates Ventricular Remote Remodeling and Atrial Cardiomyopathy in Rats With Ventricular Ischemia/Reperfusion. JACC Basic Transl. Sci. 4, 332–344 (2019).
Kurianiuk, A., Socha, K., Gacko, M., Błachnio-Zabielska, A. & Karwowska, A. The relationship between the concentration of cathepsin A, D, and E and the concentration of copper and zinc, and the size of the aneurysmal enlargement in the wall of the abdominal aortic aneurysm. Ann. Vasc. Surg. 55, 182–188 (2019).
El Garwany, O. et al. Splicing QTL mapping in stimulated macrophages associates low-usage splice junctions with immune-mediated disease risk. Nat. Commun. https://doi.org/10.1038/s41467-025-61669-2 (2025).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
't Hoen, P. A. C. et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv https://doi.org/10.48550/arXiv.1802.03426 (2018).
Delaneau, O. et al. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 8, 15452 (2017).
Fort, A. et al. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics 33, 1895–1897 (2017).
Young, A. M. H. et al. A map of transcriptional heterogeneity and regulatory variation in human microglia. Nat. Genet. 53, 861–868 (2021).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
Kumasaka, N., Knights, A. J. & Gaffney, D. J. High-resolution genetic mapping of putative causal interactions between regions of open chromatin. Nat. Genet. 51, 128–137 (2019).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Acknowledgements
We would like to thank the Sanger Institute Scientific Operations teams and Human Genetics Informatics team for providing sample handling, data generation and computational support to enable the analyses described in this manuscript. This work was supported by Wellcome Sanger Institute Core funding from the Wellcome Trust (206194, 220540/Z/20/A). The iPSC lines were generated at the Wellcome Sanger Institute, under the Human Induced Pluripotent Stem Cell Initiative funded by a strategic award (WT098503) from the Wellcome Trust and Medical Research Council. N.I.P. was supported by the Early Postdoc Mobility fellowship from the Swiss National Science Foundation (grant number 178005). M.I. was supported by core funding from the British Heart Foundation (RG/18/13/33946), NIHR Cambridge Biomedical Research Centre (IS-BRC-1215-20014), BHF Chair Award (CH/12/2/29428) and Cambridge BHF Centre of Research Excellence (RE/18/1/34212).
Author information
Contributions
N.I.P. performed the analyses and, along with O.E.G., C.A.A. and D.J.G., drafted the manuscript with contributions from all authors. O.E.G. assisted with multiple analyses, independently validated eQTL results and provided feedback to improve the manuscript. J.R. performed the conditional eQTL analysis. N.K. provided statistical feedback on several occasions, assisted with multiple analyses, and conducted analysis on the pilot data. M.I., L.B.V., A.Ts. and C.G. applied the differentiation protocol, performed QC metrics on the differentiated macrophages and carried out the stimulations under the supervision of C.G. A.K. optimised the low-input bulk RNA-seq protocol and prepared the RNA-seq libraries along with M.I. and A.B. A.To. created the web portal to enhance online access to our resources. D.J.G. conceptualised the study, while C.A.A. and D.J.G. supervised this work.
Ethics declarations
Competing interests
C.A.A. has received consultancy or lectureship fees from Genomics plc, BridgeBio and GlaxoSmithKline. The rest of the authors declare no competing financial or non-financial interests. D.J.G. was an employee of BioMarin and N.I.P. was an employee of GSK at the time the manuscript was submitted.
Peer review
Peer review information
Nature Communications thanks Hanrui Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Panousis, N.I., El Garwany, O., Knights, A. et al. Gene expression QTL mapping in stimulated iPSC-derived macrophages provides insights into common complex diseases. Nat Commun 16, 7204 (2025). https://doi.org/10.1038/s41467-025-61670-9
DOI: https://doi.org/10.1038/s41467-025-61670-9