iPSCs and iPSC-derived cells as a model of human genetic and epigenetic variation

Quaid, Kara; Xing, Xiaoyun; Chen, Yi-Hsien; Miao, Yong; Neilson, Amber; Selvamani, Vijayalingam; Tran, Aaron; Cui, Xiaoxia; Hu, Ming; Wang, Ting

doi:10.1038/s41467-025-56569-4

Article
Open access
Published: 18 February 2025

Nature Communications volume 16, Article number: 1750 (2025)

Abstract

Understanding the interaction between genetic and epigenetic variation remains a challenge due to confounding environmental factors. We propose that human induced Pluripotent Stem Cells (iPSCs) are an excellent model to study the relationship between genetic and epigenetic variation while controlling for environmental factors. In this study, we have created a comprehensive resource of high-quality genomic, epigenomic, and transcriptomic data from iPSC lines and three iPSC-derived cell types (neural stem cell (NSC), motor neuron, monocyte) from three healthy donors. We find that epigenetic variation is most strongly associated with genetic variation at the iPSC stage, and that relationship weakens as epigenetic variation increases in differentiated cells. Additionally, cell type is a stronger source of epigenetic variation than genetic variation. Further, we elucidate a utility of studying epigenetic variation in iPSCs and their derivatives for identifying important loci for GWAS studies and the cell types in which they may be acting.

Introduction

While each of our cells has the same genetic template, our tissues are composed of hundreds of cell types with different gene expression profiles, morphologies, and functions¹. This is because epigenetic features, such as histone modification, DNA methylation, and chromatin accessibility, regulate which genes are expressed in a cell, informing cellular function and regenerative potential^2,3,4,5. Interactions between genotype and epigenetic mechanisms impact gene activities critical for cell type-specific functions. For example, demethylation of an enhancer linked to the SHH gene causes polydactyly^2,6. Similarly, a mutation in the DNMT3A gene results in reduced de novo methylation, worsening cancer progression⁷. While cross-talk between the genome and the epigenome has become widely appreciated in studies of gene regulation and disease etiology⁸, standardized models for genome-wide benchmarks of such interactions are still lacking. Until we comprehensively characterize the interaction between genetic variation and epigenetic variation, our understanding of the molecular mechanism of gene regulation will remain obscure.

Environmental influences like sunlight, diet, and various other exposures can play a confounding role when we study the interaction between epigenetic variation, genetic variation, and gene expression. This is because environmental factors can affect transcription as well as epigenetic features like DNA methylation^9,10,11. In order to study epigenetic impacts on transcription at the population scale, we would need to account for every exposure an individual has had from before birth, or even from before conception. For example, classic epigenetic studies have shown that starvation events from generations back could have an impact on someone’s DNA methylation profile^12,13. All of these confounding factors from the environment make it challenging to study the interaction between genetic variation and epigenetic variation at the population level.

One appealing way to untangle epigenetic variation and environmental variation is to remove environmental variation entirely, and study the epigenome in vitro using human induced Pluripotent Stem Cells (iPSCs). Specifically, iPSCs are generated from a somatic cell type, in this case, renal epithelial cells, and reprogrammed into a stem-like state, with differentiation potential similar to embryonic stem cells¹⁴. As they were reprogrammed, the iPSCs underwent changes in transcription factor binding, chromatin state, and DNA methylation across the genome^14,15,16,17. Previous studies have shown that iPSCs have donor-specific DNA methylation and gene expression even after reprogramming^18,19,20, supporting our rationale that iPSCs maintain epigenetic variation that is functionally related to genetic variation while removing environmental variation. In this study, we sought to evaluate whether donor-specific chromatin accessibility and DNA methylation are maintained as cells were differentiated to different germ layers, represented by neural stem cells (NSCs), motor neurons, and monocytes.

Regenerative medicine and disease modeling are important applications of iPSCs and iPSC-derived cell types, such as neural stem cells (NSCs), motor neurons, and monocytes^21,22,23. In this work, we investigate the degree to which iPSC-derived cells maintain donor-specific epigenetic patterns and offer insight into how well suited iPSC models are for studying epigenetic mechanisms that may be involved in phenotypes like Alzheimer’s or Parkinson’s disease²⁴ in neuronal cells, which are more difficult to study without stem cell technology.

We have leveraged deep sequencing data to evaluate the relationship between genetic variation, chromatin accessibility, and DNA methylation genome-wide, and how such a relationship may impact gene expression in iPSCs. Further, we have interrogated the translationally important question of how such a relationship is maintained as iPSCs are differentiated into monocytes, NSCs, and motor neurons. We found that overall, iPSCs have donor-specific epigenetic patterns, and the direct relationship between genetic variation and chromatin accessibility is stronger in iPSCs than in differentiated cells. In contrast, variation in DNA methylation was generally less associated directly with genetic variation, and patterns of association with DNA methylation were consistent across cell types. We found that epigenetic variation that overlies genetic variation is not necessarily more likely to be carried throughout differentiation. Notably, genes associated with epigenetic variation tend to have larger expression differences between samples. Finally, we present epigenetic catalogs of iPSCs and their derivative cells as a valuable resource for interrogating disease mechanisms in different germ layers by identifying relevant cell types and points of accessibility for genome-wide association study (GWAS)-identified SNPs.

Results

Epigenetic variation increases as cells are differentiated, while the direct relationship with genetic variation becomes weaker

Epithelial cells were collected from the urine of three donors: a father-daughter pair (M1 and F1), and an unrelated female donor (F2). The cells were reprogrammed into iPSCs, and assayed for gene expression, chromatin accessibility, and DNA methylation (Fig. 1A). Three iPSC lines (F1_1, F1_2, and F1_3) were derived from F1, two lines (M1_1, M1_2) from M1, and one line from F2 (F2_1) (Supplementary Data 1). Cell lines were characterized by marker gene expression (Supplementary Fig. 1 and Supplementary Data 2). Each line has two technical replicates that were grown simultaneously. We hypothesized that if there exists a base relationship between genetic variation and epigenetic variation in iPSCs after reprogramming, iPSCs from the same donor would be more similar to each other epigenetically than iPSCs from another donor. Similarly, iPSCs from related donors would be epigenetically more similar than iPSCs from unrelated donors. To mitigate the potential impact of reference bias, we employed a personalized genome mapping strategy, using vcf2diploid²⁵ to edit hg38 with individual variants.

**Fig. 1: Epigenetic variation increases as cells are differentiated, direct relationship to genetic variation becomes weaker.**

As expected, chromatin accessibility, DNA methylation, and gene expression are more dependent on the cell type than the donor of origin (Fig. 1B–D and Supplementary Figs. 2–5). In principal component analyses and bootstrapped hierarchical clustering, data from iPSCs, NSCs, motor neurons, and monocytes differentiated from respective iPSC lines of all donors clustered by cell type (Fig. 1B–D and Supplementary Figs. 2–5). We found that CpG methylation in iPSCs was donor-dependent, with lines from the same donor clustering together in PCA (Supplementary Fig. 6A). We calculated the number of differentially methylated regions (DMRs) between each pair of lines, and considered the level of genetic variation between samples as the relationship between donors: same donor (F1_1, F1_2, and F1_3, as well as M1_1 and M1_2), related donors (any sample from F1 compared to any sample from M1), and unrelated donors (any sample from F1 or M1 compared to the line from F2). Relatively few DMRs exist between lines from the same donor (between 10 and 46), compared to lines from related donors (between 1451 and 1585) and lines from unrelated donors (between 2667 and 2961) (Fig. 1E). Taken together, our results demonstrate that iPSCs maintain donor-specific epigenomes that may be linked to genetic variation.
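The three comparison categories described above can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not the authors' pipeline; the helper names `donor_of` and `relationship` are hypothetical, and the donor is assumed to be the part of the line name before the underscore.

```python
from collections import Counter
from itertools import combinations

# Line names as given in the text; the donor is assumed to be the prefix
# before the underscore (e.g. "F1_2" -> donor "F1").
LINES = ["F1_1", "F1_2", "F1_3", "M1_1", "M1_2", "F2_1"]

# F1 and M1 are the father-daughter pair; F2 is unrelated to both.
RELATED_DONORS = {frozenset({"F1", "M1"})}


def donor_of(line: str) -> str:
    """Return the donor label for a line name (hypothetical helper)."""
    return line.split("_")[0]


def relationship(line_a: str, line_b: str) -> str:
    """Classify a pair of lines by the relationship between their donors."""
    a, b = donor_of(line_a), donor_of(line_b)
    if a == b:
        return "same donor"
    if frozenset({a, b}) in RELATED_DONORS:
        return "related donors"
    return "unrelated donors"


# Count how many pairwise comparisons fall into each category.
pair_counts = Counter(relationship(a, b) for a, b in combinations(LINES, 2))
print(pair_counts)
```

With six lines there are 15 pairwise comparisons, which this grouping splits into same-donor, related-donor, and unrelated-donor sets.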

In the PCA plot of ATAC-seq data and gene expression data, iPSC and NSC samples cluster the most tightly together (iPSC and NSC ATAC intra-cluster distances = 893, 877; RNA intra-cluster distances = 17.6, 20.7), while motor neuron and monocyte samples are more dispersed (Motor Neuron and Monocyte ATAC intra-cluster distances = 1451, 1261; RNA intra-cluster distances = 43.9, 34.8), in agreement with the change in the extent of pluripotency of the different cell types (Fig. 1C, D and Supplementary Fig. 7). ATAC-seq intra-cluster Euclidean distances were calculated using reads under called peaks. RNA-seq distances were calculated using exonic read counts for each line sample. Notably, the two M1 iPSC lines, one clonal and one pool, do not cluster closely together in the accessible peaks or gene expression PCA, indicating that most of the variation is not driven by M1’s genetic background (Supplementary Fig. 6B, C). Differentially expressed genes between these two lines were not enriched for any relevant biological process (Supplementary Table 3), and differentiation capacity did not suggest any functional difference in these lines. This result is consistent with previous findings that iPSCs from the same donor may have variable epigenomes²⁶, highlighting the uncertainties of iPSC reprogramming and differentiation. Based on the differences between lines from the same donor, comparative analyses were performed between lines, rather than between individuals, as lines from the same donor might be biologically different.
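As a rough illustration of an intra-cluster Euclidean distance, the sketch below averages the distance over all pairs of samples in a cluster. The text does not state whether the reported values are mean pairwise distances or distances to a centroid, so the averaging scheme is an assumption, and the count vectors are made-up toy values.

```python
import math
from itertools import combinations


def mean_pairwise_distance(samples):
    """Mean Euclidean distance over all pairs of sample vectors.

    Assumption: the intra-cluster distance is summarized as the mean of
    all pairwise distances; the source does not specify the summary used.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy count vectors (hypothetical values, three features per sample).
tight_cluster = [(10, 20, 30), (11, 21, 29), (9, 19, 31)]
loose_cluster = [(10, 20, 30), (25, 5, 40), (0, 35, 15)]

tight = mean_pairwise_distance(tight_cluster)
loose = mean_pairwise_distance(loose_cluster)
print(f"tight cluster: {tight:.2f}, loose cluster: {loose:.2f}")
```

A smaller value means the samples of that cell type sit closer together, which is how the iPSC and NSC clusters compare with the motor neuron and monocyte clusters in the text.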

Compared to the iPSCs, the differentiated cells generally have more differentially expressed (adjusted p < 0.05, abs(log2FC) > 1) genes, differentially accessible (adjusted p < 0.05, abs(log2FC) > 1) peaks, and differentially methylated (p < 0.001) regions between pairs of lines (Fig. 1E–G). However, in the differentiated cells, the number of differentially accessible peaks or differentially expressed genes is not consistently higher between lines that are from unrelated donors than between those of genetically related (i.e., the father-daughter pair) donors or in some cases, even between lines from the same donor. In motor neurons, we do see consistently higher numbers of DMRs between pairs of lines from unrelated donors, but this is not true of the other differentiated cell types (Fig. 1E). For example, F1_1 and F1_2 (same donor) have 1057 differentially expressed genes, while F1_1 and F2_1 (unrelated donors) have 300 differentially expressed genes. In a PCA of CpG methylation in iPSCs, each line is clustered by the donor (Supplementary Fig. 6A). In NSCs, motor neurons, and monocytes, the samples from different donors overlap, indicating that variation in DNA methylation of differentiated cells is less dependent on genetic background. Taken together, our data demonstrate that epigenetic variation greatly increases in differentiated cell types and that the donor of origin, i.e., genetic variation, is not the main contributor to such increase.
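The thresholds quoted above for differential expression and differential accessibility (adjusted p < 0.05 and abs(log2FC) > 1) amount to a simple filter. The sketch below assumes a hypothetical `Result` record with made-up feature names and values; it is not tied to any particular differential analysis tool.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One row of a differential analysis table (hypothetical layout)."""
    feature: str
    log2fc: float
    padj: float


def is_differential(r: Result, padj_cutoff: float = 0.05,
                    lfc_cutoff: float = 1.0) -> bool:
    """Apply the cutoffs from the text: adjusted p < 0.05 and |log2FC| > 1."""
    return r.padj < padj_cutoff and abs(r.log2fc) > lfc_cutoff


# Made-up example rows for illustration only.
results = [
    Result("feature_a", 2.3, 0.001),   # passes both cutoffs
    Result("feature_b", -1.8, 0.020),  # passes (negative fold change)
    Result("feature_c", 0.4, 0.001),   # fails the fold-change cutoff
    Result("feature_d", 3.0, 0.200),   # fails the adjusted p cutoff
]

differential = [r.feature for r in results if is_differential(r)]
print(differential)
```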

Epigenetic variation is more strongly associated with genetic variation in iPSCs than differentiated cells

We identified genetic variants in each of our donors with respect to hg38 and then identified the genetic variants that differed between donors. In all following analyses, genetic variation refers to variation between individuals. As expected, genetic variation between the father and daughter pair of donors (M1 and F1) is less than that between unrelated donors, approximately two-thirds the amount (Fig. 2A). Structural variants in the form of duplications (DUP) and inversions (INV) are far fewer, due to the short-read nature of our data, but the number of differences followed the same pattern (Fig. 2A). Structural variants between donors ranged in size up to 15,673 bp, but the majority were between 1 bp and 8 bp (Fig. 2B).

**Fig. 2: Epigenetic variation is more strongly related to genetic variation in iPSCs than differentiated cells.**

Next, we examined how often genetic variation directly overlaps epigenetic variation, and in what genomic contexts the relationship might be stronger or weaker than expected by random chance. We hypothesized that the relationship between genetic and epigenetic variation would be different in different genomic contexts (e.g., promoter regions, quiescent regions) and that larger INDELs might have a stronger relationship with epigenetic variation than SNPs. Odds ratios were calculated as odds(Epigenetic Variation | Genetic Variation)/odds(Epigenetic Variation | No Genetic Variation) between samples, across genomic contexts (genome-wide, promoter, ChromHMM quiescent²⁷) and types of genetic variation. For example, in the comparison between lines F1_1 and F2_1, 1770 differentially accessible peaks overlap a genetic variation, and 64,950 non-differentially accessible peaks overlap a genetic variation genome-wide (Supplementary Fig. 8, inset). The two lines also have 1263 differentially accessible peaks and 87,276 non-differentially accessible peaks that do not overlap a genetic variation. Therefore the odds ratio is (1770/64,950)/(1263/87,276) = 1.88. An odds ratio greater than 1 indicates that a peak is more likely to be differentially accessible if it overlaps a genetic variation than if it does not. The direct relationship between genetic variation and chromatin accessibility varied by sample comparison, but several interesting patterns in the iPSCs emerged. Genome-wide, a peak was slightly more likely to be differentially accessible if it overlapped any kind of genetic variation than if it did not (aggregate OR = 1.87, 95% CI: (1.83, 1.91)), but these odds were significantly decreased in the context of promoters and increased in ChromHMM quiescent regions (aggregated OR = 1.26 and 2.02; p < 2.2e-16 and p = 2.5e-9, respectively).
Overall, INDELs were similarly associated with differential ATAC peaks as SNPs (aggregated OR = 1.72, 1.73, p = 0.49). However, medium (20 bp to 50 bp) and large (> 50 bp) INDELs had the largest odds of differentially accessible peaks genome-wide (aggregated OR = 8.30, 18.01, p-values < 2.2e-16 compared to the OR for any genetic variation genome-wide) (Supplementary Fig. 9).
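The worked F1_1 versus F2_1 example can be reproduced with a short Python sketch using the four peak counts quoted in the text. The Wald confidence interval is added only for illustration and is an assumption: the text does not say how the intervals for the aggregate odds ratios were derived, and the function names here are hypothetical.

```python
import math


def odds_ratio(da_var, non_da_var, da_novar, non_da_novar):
    """odds(DA | variant) / odds(DA | no variant) from a 2x2 table of peak counts."""
    return (da_var / non_da_var) / (da_novar / non_da_novar)


def wald_ci(da_var, non_da_var, da_novar, non_da_novar, z=1.96):
    """Approximate 95% CI on the log-odds scale (illustrative assumption,
    not necessarily the interval method used in the study)."""
    log_or = math.log(odds_ratio(da_var, non_da_var, da_novar, non_da_novar))
    se = math.sqrt(1 / da_var + 1 / non_da_var + 1 / da_novar + 1 / non_da_novar)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Peak counts from the F1_1 vs F2_1 comparison quoted in the text.
counts = (1770, 64950, 1263, 87276)

or_f1_f2 = odds_ratio(*counts)
ci_low, ci_high = wald_ci(*counts)
print(round(or_f1_f2, 2))  # 1.88, matching the value in the text
print(f"approximate 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 would indicate that differential accessibility and overlap with genetic variation are associated in that comparison.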

We further endeavored to discover whether the relationship we saw in iPSCs changed as the cells were differentiated. Despite the overall increased number of differentially accessible peaks in the differentiated cells, the overall relationship between genetic variation and differential accessibility is much weaker than in the iPSCs. Overall, differentially accessible peaks were less likely to be associated with genetic variation in the differentiated cells. Compared to iPSCs, INDELs had a much weaker relationship with differentially accessible peaks genome-wide in the differentiated cell types (iPSC OR = 2.73, NSC OR = 1.29, Motor Neuron OR = 1.24, Monocyte OR = 0.88, p < 2.2e-16 all tests) and that pattern extended across INDEL sizes and genomic contexts (Fig. 2C and Supplementary Fig. 9).

Compared to chromatin accessibility, the odds of differential methylation were less associated with the presence of genetic variation genome-wide (aggregated OR = 0.52, 95% CI: (0.51, 0.53)) (Fig. 2D). DMRs were more likely to be present in regions with INDELs than in regions with SNPs (aggregate OR = 2.05, 0.43, p < 2.2e-16). Interestingly, in contrast to chromatin accessibility, differential methylation is similarly associated with genetic variation in the differentiated cells as it is in the iPSCs, although a similar pattern of decreasing ORs is seen in the motor neurons and monocytes (Fig. 2D and Supplementary Fig. 10). For example, DMRs were similarly likely to occur in regions with INDELs genome-wide across cell types (iPSC OR = 2.05, NSC OR = 2.46, Motor Neuron OR = 1.47, Monocyte OR = 1.94).

To test whether insertions or deletions were consistently resulting in an increase or decrease, respectively, in normalized read counts, we compared the odds of increased ATAC or RNA reads given that the reads overlapped an insertion with the odds given that the reads overlapped a deletion. There was no significant increase in the odds of increased ATAC or RNA reads in iPSC comparisons given the presence of an insertion compared to a deletion, indicating that there is not an overall pattern of directional INDEL impact upon chromatin accessibility or gene expression (Supplementary Fig. 11). Taken together, our results indicate that genetic variation has a stronger relationship with chromatin accessibility than DNA methylation in iPSCs, and that this relationship with chromatin accessibility is not as strong in differentiated cells.

Genetic variation and epigenetic variation interactions may be required for changes in gene expression

We next asked whether the observed differences in chromatin accessibility and DNA methylation are associated with differences in gene expression. We expected that in some cases, epigenetic variation within a gene may be associated with differential gene expression levels, as chromatin accessibility and DNA methylation have documented roles in gene regulation²⁸. In iPSCs, we found that between 0.8% and 14% of differentially expressed genes directly overlapped differentially accessible and/or differentially methylated regions, depending on which iPSC lines were being compared (Supplementary Table 4 and Fig. 3A). Genes that overlapped an epigenetic variation had a slightly higher difference in expression on average (abs(log2FC) = 1.54) if there was a differentially accessible peak (with or without DMR) compared to genes that did not overlap epigenetic variation (abs(log2FC) = 1.07). In Wilcoxon rank-sum tests for each sample comparison, p-values ranged between 4.6e-5 and 0.89 due to low numbers of overlaps with differentially accessible peaks (Supplementary Table 5). To interrogate whether DNA methylation or chromatin accessibility could impact gene expression from distal locations, we used expression quantitative trait loci (eQTLs) from the Human induced Pluripotent Stem Cell Initiative (HipSci) study, which identified eQTLs using 711 iPSC lines from 301 donors²⁹. Genes associated with HipSci eQTLs that overlapped differentially accessible peaks between lines tended to have a larger difference in gene expression (mean abs(log2FC) = 0.91) compared to genes whose associated eQTL was not differentially accessible (mean abs(log2FC) = 0.80) (Fig. 3B). Wilcoxon rank-sum tests for these comparisons had a wide range of power to detect these differences, based on the varying numbers of HipSci eQTLs for each sample comparison, with p-values ranging between 0.04 and 0.98 (Supplementary Table 5).

**Fig. 3: Genetic variation and epigenetic variation interactions result in changes in gene expression.**

One example of genetic variation that is associated with chromatin accessibility and DNA methylation differences, as well as a gene expression difference, is the gene ZSCAN1. ZSCAN1 is expressed in iPSCs from F1 and M1, with much weaker expression in iPSCs from F2. F2’s copy of ZSCAN1 is methylated and inaccessible at the promoter, which may lead to this difference in expression (Fig. 3C). These differences occur in conjunction with a genetic difference at SNP rs495839, in the promoter of ZSCAN1 on chromosome 19, which was identified by HipSci as an eQTL (-log10 p-value = 4.1) in iPSCs^29,30 (Supplementary Fig. 12). The downstream gene, ZNF135, is also identified as associated with the SNP rs495839. In F2 samples, its expression is lower than in samples from other donors, though not as dramatically as for ZSCAN1. The promoter of ZNF135 is heterozygously methylated in F2 (fully unmethylated in iPSCs from other donors), and the chromatin accessibility is approximately half the level of other samples, indicating that only one copy of the gene may be expressed. NR1D2 has the strongest TF binding motif alignment to the sequence surrounding rs495839 (CATGACCGAGTGGCCCAGGCCC) when the SNP is not present (E = 0.21), and the sequence match decreases when the SNP (T > C) is present (E = 2.06) (Supplementary Fig. 13). Similarly, the match to MYCN’s binding motif is weaker when the SNP is present at that location (E = 2.44, 21.2, respectively) (Supplementary Fig. 14). Both NR1D2 and MYCN are expressed in iPSCs. Overall, we found a trend of higher differences in expression when genes directly overlap differentially accessible peaks, or when their associated eQTLs do, although this difference was not statistically significant. This difference may be attributed to the disruption or enhancement of TF binding sites in many cases.

Epigenetic variation that is directly associated with genetic variation is not more likely to be maintained in multiple cell types

We hypothesized that epigenetic variation that is directly associated with genetic variation is more likely to be ubiquitous across cell types. In other words, a peak that is differentially accessible in one cell type and overlaps a genetic variation may be more likely to be differentially accessible in other cell types if the genetic variation is contributing to the difference in accessibility. We reasoned that if an epigenetic variation was directly caused by a genetic variation, it might be more likely that that epigenetic variation is present in multiple cell types. In WGBS, we observed the opposite result. The proportions of cell type-specific DMRs varied depending on sample comparison (Fig. 4A). On average, 7.1% of DMRs (p < 0.0001) that overlapped genetic variation were differentially methylated in more than one cell type, while 14% of DMRs that did not overlap genetic variation were differentially methylated in more than one cell type (two-proportion test, p < 2.2e-16). Among differentially accessible ATAC peaks, only 4.7% of peaks with genetic variation and 5.4% of peaks without genetic variation were not cell type-specific (Fig. 4B). This difference was statistically significant in a two-proportion test (p < 2.2e-16), but the difference is minuscule. In a two-proportion test, differentially expressed genes that overlap genetic variation were significantly more likely to be shared between cell types than those that do not, by a small margin (23% in genes with genetic variation, 20% in genes without genetic variation, p = 8.178e-5) (Fig. 4C). Overall, DMRs, differentially accessible peaks, and differentially expressed genes are much more likely to be cell type-specific than consistent across cell types, regardless of overlap with genetic variation (Fig. 4).
DMRs and differentially accessible peaks that overlapped genetic variation were not more likely to be shared in more than one cell type than DMRs and differentially accessible peaks that did not overlap genetic variation. This indicates that genetic variation does not directly impact methylation or accessibility in a non-cell-type-specific manner in the cells observed.
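A two-proportion comparison of the kind reported above can be sketched as a pooled z-test. The text gives only percentages, so the counts below are hypothetical placeholders chosen to match 7.1% and 14%, and the pooled z-test itself is an assumption; the study may have used a different variant (for example, a chi-squared form with continuity correction).

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    x1/n1 and x2/n2 are the shared counts and totals for the two groups.
    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 71 of 1000 regions with genetic variation are shared
# across cell types (7.1%), versus 140 of 1000 regions without (14%).
z_stat, p_val = two_proportion_z_test(71, 1000, 140, 1000)
print(f"z = {z_stat:.2f}, p = {p_val:.2e}")
```

With much larger totals, as in a genome-wide set of regions, even the small 4.7% versus 5.4% gap for ATAC peaks can reach a very small p-value, which is why the text stresses the size of the difference rather than its significance.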

**Fig. 4: Epigenetic variation that is directly associated with genetic variation is not more likely to be shared across cell types.**

Cell-type-specific enrichment of GWAS traits varies depending on presence or absence of underlying genetic variation

In order to investigate the utility of iPSC systems for studying disease mechanisms, we explored whether we could use our data to identify the relevant cell type of GWAS trait-associated SNPs based on chromatin accessibility. Furthermore, we investigated whether GWAS trait-associated loci are more enriched in ATAC peaks with or without genetic variation. We focused on neurological traits with the hypothesis that there would be enrichment for neurological trait-associated loci in the accessible regions of motor neurons or NSCs compared to iPSCs or monocytes. Linkage Disequilibrium Score Regression (LDSC) analysis was conducted to assess cell-type-specific enrichment for GWAS-identified loci in ATAC peaks compared to a randomized set of similarly sized genomic regions, with enrichment indicated by a positive regression coefficient. As our cells come from donors of different ancestries, they were analyzed separately with GWAS studies relevant to their backgrounds.

In cells from donor F2 (European Ancestry), for some traits (ADHD, autism, insomnia, major depressive disorder, schizophrenia), ATAC peaks in motor neurons were more enriched for trait-associated loci compared to other cell types, as expected (Fig. 5A). ATAC peaks were enriched for GWAS loci in several cell types, indicating that several cell types may play a role in these complex traits (ADHD, autism, bipolar disorder, schizophrenia). Notably, there was enrichment for schizophrenia-associated loci in ATAC peaks of iPSCs and NSCs, and ADHD-associated loci in ATAC peaks of iPSCs, which may indicate some developmental mechanism for these traits. In F1 and M1 (Eastern Asian Ancestry), ATAC peaks in motor neurons were strongly enriched for schizophrenia-associated loci, specifically in a schizophrenia study from the Psychiatric Genomics Consortium (PGC) (Supplementary Fig. 15A). It should be noted that most of the GWAS studies included for the donors with Eastern Asian Ancestry were from BioBank Japan³¹, which had small case numbers for each of the traits, limiting the power of those studies to identify associated SNPs (Supplementary Table 6). The exceptions were a schizophrenia study by Lam et al. and a depression study by Giannakopoulou et al.^32,33. This may have resulted in the lower regression coefficients and non-significant p-values for GWAS traits from these studies. In a similar analysis, we hypothesized that ATAC peaks in monocytes may be enriched for immune-related traits (Supplementary Fig. 16). Indeed, we found that monocyte count-related SNPs were enriched in ATAC peaks in monocytes. We found a similar enrichment of monocyte count-associated SNPs in the samples from donors of Eastern Asian ancestry (Supplementary Fig. 17).

**Fig. 5: Cell-type-specific enrichment of GWAS traits varies depending on presence or absence of underlying genetic variation.**

We further sought to determine whether ATAC peaks that were differentially accessible between lines might be enriched for GWAS loci compared to ATAC peaks that were conserved across lines. In this LDSC analysis, a positive regression coefficient would indicate enrichment of trait-associated loci in variable ATAC regions compared to conserved ATAC regions. In cells from donor F2, we found that there was no trait tested with significant enrichment of associated loci in conserved ATAC regions compared to variable ATAC regions (Fig. 5B). Two traits had motor neuron-specific enrichment for associated loci in differentially accessible peaks (autism, major depressive disorder). Notably, Alzheimer’s disease had a monocyte-specific enrichment for loci in differentially accessible peaks. In cells from donors F1 and M1, using GWAS studies with Eastern Asian ancestry, we found that schizophrenia-associated loci were enriched in differentially accessible peaks compared to conserved peaks (Supplementary Fig. 15B). Monocyte count-associated SNPs were enriched in differentially accessible peaks in monocytes in both European and Eastern Asian background samples (Supplementary Figs. 16B, 17B).

In order to interrogate whether genetically conserved regions within ATAC peaks were more likely to harbor trait-associated loci, we used LDSC to test whether ATAC peaks that overlapped genetic variation and ATAC peaks without genetic variation in our donors had different enrichments of GWAS trait-associated SNPs. In this analysis, a positive coefficient indicates enrichment in regions that are genetically conserved between donors. ATAC peaks were collapsed into one list of ATAC peaks for each donor. We calculated regression coefficients for each cell type for several neurological traits and found that in motor neurons, ATAC peaks without genetic variation were significantly more likely to overlap loci associated with several neurological traits, including ADHD, autism, insomnia, major depressive disorder, and schizophrenia (Fig. 5C). Results for F1 and M1 did not recapitulate these findings, with small regression coefficients and non-significant p-values (Supplementary Fig. 15C). Monocyte count-associated SNPs were enriched in genetically conserved ATAC peaks in monocytes in both European and Eastern Asian background samples (Supplementary Figs. 16C, 17C).

As an illustrative example, the SNP rs4453556 is associated with several neurological phenotypes, which include ADHD, autism, insomnia, bipolar disorder, Alzheimer’s disease, Parkinson’s disease, obsessive-compulsive disorder, post-traumatic stress disorder, and schizophrenia with varying directional effects, but its highest estimated effect size among the studies included is in schizophrenia (Z = 3.945). This SNP is located in a region that is accessible in motor neurons in each of the samples and is upstream of the gene NPTX1, a neuronal gene, and RPTOR, a gene involved in cell growth (Fig. 5D). F2 is homozygous for the alternative allele of rs4453556, whereas M1 and F1 are homozygous for the reference allele. Motor neurons derived from F2 have very low expression of NPTX1 compared to motor neurons derived from the other two donors, but there is not an expression difference in RPTOR. rs4453556 has been identified in some eQTL studies as a non-significant association with NPTX1 in varying directions in different cell types, but there is no documented association in the brain in FIVEx³⁰ (Supplementary Fig. 18). There is a strong eQTL observation of rs4453556 in blood, but we do not see any expression differences in our monocyte samples (Supplementary Fig. 19). To further interrogate the potential mechanism of this SNP’s impact, we looked at how it may affect TF binding. ZEB1, which is expressed in motor neurons, has the strongest TF binding motif alignment to the sequence surrounding rs4453556 (CACATCACCTGTCGTGTGTCA) when the SNP is not present (E = 0.61), and the sequence match decreases when the SNP (T > A) is present (E = 10.0) (Supplementary Fig. 20).

These results suggest that epigenomic data from iPSCs and iPSC-derived cell types may bear valuable information for interrogating the mechanisms of various diseases, including those that arise during development.

Discussion

We have provided a comprehensive catalog detailing the direct interaction between genetic variation and epigenetic variation using human iPSCs and iPSC-derived cell types. We generated high-quality WGS, WGBS, ATAC-seq, and RNA-seq data in replicates for six iPSC lines from three donors (Supplementary Data 1). Further, these iPSC lines were differentiated into NSCs, motor neurons, and monocytes, for which we also present high-quality WGBS, ATAC-seq, and RNA-seq data. We have identified 817-39,313 (depending on sample comparison) cell-type-specific DMRs that overlapped with genetic variation, 7181-67,837 cell-type-specific differentially accessible peaks that overlapped with genetic variation, and 164-556 cell-type-specific differentially expressed genes that overlapped with genetic variation. At the line level, we have identified 1288-40,105 (depending on sample comparison) DMRs overlapping genetic variation between lines across cell types, 7689-71,953 differentially accessible peaks overlapping genetic variation between lines, and 254-868 differentially expressed genes overlapping genetic variation between lines. All of this provides an important resource for researchers to investigate cell-type-specific epigenomes in the context of epigenetic variation.

Previous studies have shown that iPSCs maintain donor-specific DNA methylation and gene expression patterns even after reprogramming, providing a tractable system that is not overwhelmed by random environmental variations^18,19. We found that iPSC lines were generally more similar to lines from the same donor (Fig. 1E–G), supporting the hypothesis that donor-specific epigenetic marks are maintained after reprogramming and that iPSCs are a useful model for studying the relationship between genetic variation and epigenetic variation. We observed variation among iPSCs from the same donor (Supplementary Fig. 1), which is consistent with previous findings^26,34. The variability amongst iPSCs from the same donor indicates that some variation in DNA methylation, chromatin accessibility, and gene expression is unexplained by genetic variation or environmental factors, likely due to stochasticity within the cells and the reprogramming process of iPSCs²⁶. Notably, we did not observe any significant enrichment for relevant gene ontology terms in differentially expressed genes between lines from the same donor (Supplementary Table 3).

Epigenetic buffering involves epigenetic conservation in functionally important regions, where the epigenetic marks may be more important for function than the genetic sequence itself. This is important because it contradicts the idea that the interaction between epigenetic and genetic variation is in one direction. In a study of the relationship between epigenetics and structural variation in the human and chimpanzee genomes, Zhuo et al. found a 2-fold enrichment of medium-sized INDELs (20 bp to 50 bp) in CpG island (CGI)-containing promoters when compared to a random distribution of INDELs³⁵. This was surprising, considering the functional impact that promoters have on gene expression. In these regions, DNA methylation and chromatin accessibility profiles were generally conserved across regions, supporting a hypothesis of epigenetic buffering of the impact of genetic variation³⁶. Our finding that iPSC ATAC peaks that overlap medium-sized INDELs, specifically in promoters, were less likely to be differentially accessible (Fig. 2C) did not reach statistical significance. This trend does align with the previous finding of epigenetic buffering in this context, making it an interesting concept to follow up on in future studies. Previous studies have demonstrated that genetic variation impacts the epigenome through several mechanisms, such as disrupting transcription factor binding sites or well-characterized mutations in genes encoding epigenetic factors^37,38. However, how the epigenome may impact genetic variation is largely unstudied. One mechanism of epigenetic impact on genetic variation is the phenomenon of epigenetic buffering, which involves maintained epigenetic marks in genomic regions, resulting in a dampening of negative selection on genetic mutations, increasing the potential for genetic variation^36,39,40. Comparative epigenomics across species has revealed instances of epigenetic buffering in the context of evolution³⁶. It would be interesting to extend these studies to test the hypothesis that epigenetic buffering of genetic variation exists within human individuals and further quantify these instances genome-wide and across four different cell types.

As the cells were differentiated, variation in gene expression and chromatin accessibility increased (Fig. 1C, D, F, G). The variation in differentiated cells was more robust to genetic variation than the epigenetic landscapes of iPSCs (Fig. 2F). Banovich et al. showed that GTEx eQTLs for heart cells were also present in iPSC-derived cardiomyocytes, which indicates that the epigenetic architecture for cell-type-specific gene regulation is maintained in iPSC-derived cells¹⁸. We found that DNA methylation, chromatin accessibility, and gene expression variations were likely to be cell-type-specific (Fig. 5), which is reflective of cell-type-specific epigenetic structure in iPSC-derived cells. Epigenetic variation depends more on cell type than on genetic variation.

The relationship between the epigenome and gene expression is complex. In a simple model, genes are regulated by open chromatin and low methylation at promoters^28,41. In reality, gene expression can also be regulated by transcription factor binding, distal regulatory elements, post-translational modifications, and more²⁸. Limited by sample size, we had the power to analyze only patterns in the direct relationship between a gene’s expression difference and the chromatin accessibility and DNA methylation differences of that gene, and found a very limited relationship (Fig. 4A). Extending this analysis to known iPSC eQTLs, we found a trend of increased difference in gene expression if an associated eQTL had differential accessibility; however, this trend was not significant (Fig. 4B). These trends highlight the need for future study with a larger sample size in order to elucidate the relationship between genetic variation and epigenetic variation on distal regulatory elements. We identified genes associated with known eQTLs (e.g., ZSCAN1, ZNF135) that were apparently regulated by chromatin accessibility and methylation (Fig. 4C). This is one example of direct interaction between genetic variation and epigenetic variation resulting in gene expression differences between cell types.

Genome-wide association studies (GWAS) have identified SNPs associated with various disease phenotypes. In order to better understand which of these associated SNPs are functionally important and in what cell type context, we can overlap GWAS variants with functional genomic data, including chromatin accessibility⁴². Previous studies have utilized iPSC transcriptomics with a similar goal of understanding GWAS SNP mechanism^43,44, but the inclusion of chromatin accessibility data sheds deeper insight into which loci are potentially active, and in which cell types. We found that accessible chromatin regions without genetic variation in our motor neuron samples were enriched for loci associated with several neurological phenotypes (Fig. 5C). The regions without genetic variation might be more enriched for these traits because it is expected that causative SNPs are less common in the population⁴⁵. These findings emphasize the utility of iPSCs and iPSC-derived cells for taking advantage of the knowledge GWAS has given us and interrogating disease mechanisms.

One limitation of our study is that our data was sequenced using short reads, which limits our ability to detect structural variants (SVs)⁴⁶. In addition, direct alignment of short reads to hg38 results in reference bias, where reads with variants that do not appear in hg38 are less likely to be mapped. This would cause a potential bias in our data wherein reads containing variants may have a lower mapping frequency, which would decrease the number of reads we would see. To avoid such reference bias, we employed a personalized genome mapping strategy, using vcf2diploid²⁵ to edit each individual’s variants into hg38 and thereby create personalized reference genomes to which we aligned our data (see “Methods”). The epigenetic data was then lifted back over to hg38 to compare between samples.

In order to gain insights for disease modeling, data from the epithelial cells that the iPSCs were derived from would be critical. Our study is limited in this aspect, and conclusions about the efficacy of iPSC-derived disease models cannot be confidently made without epigenetic information from the cells prior to reprogramming. Future research that includes epigenetic data from the donor cells before iPSC reprogramming would be a valuable resource to understand how patient epigenetic marks are maintained through reprogramming and differentiation.

In conclusion, iPSCs maintain some donor-specific epigenetic signatures that may be linked to genetic variation. Those signatures are generally not maintained as the iPSCs are differentiated. In the differentiated cells, epigenetic variation and gene expression variation increase, but the direct relationship to genetic variation is weaker, genome-wide. The interaction between genetic variation and epigenetic variation can impact gene expression. However, variation in gene expression is much more complicated than direct overlaps of differential methylation or chromatin accessibility, highlighting the need for a further study with a large sample size and other genomics data modalities. Finally, we have elucidated the utility of studying epigenetic variation in iPSCs and their derivatives for identifying important loci for GWAS studies and the cell types in which they may be acting.

Methods

iPSCs were reprogrammed from epithelial cells in donated urine samples. The authors have complied with all relevant ethical regulations, as approved by the Washington University IRB, with IRB study ID 201911152. Informed consent was obtained from all study participants, including for the release of demographic information including age, sex, and ethnicity.

iPSC Reprogramming

Renal epithelial cells were collected from fresh urine samples from three donors: a related daughter and father, and one unrelated individual (see Supplementary Table 1). iPSCs were reprogrammed using the CytoTune™ iPS 2.0 Sendai Reprogramming Kit following the manufacturer’s instructions (ThermoFisher). Several lines, clonal and pooled, were derived from each donor (Supplementary Table 2). The iPSCs were reprogrammed as pools and clones as a separate endeavor from the Genome Engineering & Stem Cell Center (GESC@MGI) to find out whether pools would be epigenetically different than clonally derived lines. We did not see any indication that lines from pools (e.g., F1_1) were more variable than lines from clones (e.g., F1_3). In the iPSCs, F1_1 (pool), F1_2 (pool), and F1_3 (clone) were all very similar with very few differentially expressed genes, indicating that variation was not inherently caused by the clone/pool distinction. Each of the six lines was grown in duplicate, for 12 total cultures in two 6-well plates. All iPSC lines were harvested simultaneously.

NSC Differentiation

NSCs were differentiated according to Du, 2015⁴⁷. In a 12-well plate, iPSCs were kept in NEP induction medium (50% DMEM, 50% Neural basal Medium, 0.5X B27, 0.5X N2, 0.1 mM Ascorbic acid, 1X GlutaMAX, 1X P/S, 2 µM DMH1, 2 µM SB-431542, 3 µM CHIR99021) for 6 days and sorted for the CD184+/CD271− population (Supplementary Fig. 21). All NSC lines were harvested simultaneously.

Motor neuron differentiation

Motor Neurons were further differentiated from NSCs⁴⁷. The cells were kept in Motor Neuron Progenitor (MNP) induction medium (50% DMEM, 50% Neural basal medium (Gibco), 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 1X P/S, 2 µM DMH1, 2 µM SB-431542, 1 µM CHIR99021, 0.5 µM Pur, 0.1 µM Retinoic Acid) for 6 days. MNPs were expanded in MNP expansion medium (50% DMEM, 50% Neural basal medium, 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 1X P/S, 2 µM DMH1, 2 µM SB-431542, 3 µM CHIR99021, 0.5 µM Pur, 0.5 mM VPA, 0.1 mM Retinoic Acid). The cells then spent 6 days in Motor Neuron induction medium (50% DMEM, 50% Neural basal medium, 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 1X P/S, 0.1 µM Pur, 0.5 µM Retinoic Acid). Finally, the cells were matured into CHAT-positive motor neurons in PDL/Laminin coated wells (50% DMEM, 50% Neural basal medium, 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 0.1 µM Pur, 0.5 µM Retinoic Acid, 0.1 µM Compound E). Motor Neuron morphology before harvesting is documented in Supplementary Fig. 22.

Monocyte differentiation

Monocytes were differentiated using the StemDiff Monocyte kit. iPSCs were treated with ROCKi before dissociation. Cells were counted at a density of 2.5-3 million per well of 6-well plates in an HSC differentiation medium (1X P/S, 20 ng/ml hSCF, 40 ng/ml hBMP4, 40 ng/ml hVEGF). Embryoid bodies (EBs) were harvested on day 1 using a reversible cell strainer (Stem Cell Technologies). The medium was changed every three days. On day 12, cells were transferred to monocyte differentiation medium (1X P/S, 10% FBS, 1X GlutaMAX, 25 ng/ml hIL3, 100 ng/ml hM-CSF). Cells were kept for 14 days before sorting for CD14+/CD11b+ and then harvested (Supplementary Fig. 23). Due to insufficient material, the differentiation of samples M1_1 and F2_1 was repeated for both technical replicates at a later time.

WGS Library preparation

Genomic DNA was extracted from each of the 6 lines and processed separately. Genomic DNA samples were quantified using the Qubit Fluorometer 3.0, and approximately 25 ng of genomic DNA was used for sample assessment. Genomic DNA (~600 ng) was fragmented on the Covaris LE220 instrument targeting ~375 bp inserts. A QC check was done post-fragmentation to assess DNA quality using the Agilent Bioanalyzer. Automated dual-indexed libraries were constructed with the KAPA Hyper PCR-free library prep kit (ROCHE DIAGNOSTICS SYSTEMS, Cat # 7962371001) on the SciCloneG3 NGS instrument platform (Perkin Elmer) in a 96-well format. The concentration of each library was accurately determined through qPCR utilizing the KAPA library Quantification Kit according to the manufacturer’s protocol (KAPA Biosystems/Roche) to produce cluster counts appropriate for the Illumina NovaSeq6000 instrument. Normalized libraries were sequenced on a NovaSeq6000 S4 Flow Cell using the XP workflow and a 151 × 10 × 10 × 151 sequencing recipe according to the manufacturer’s protocol. Target sequencing depth was determined prior to pooling, and samples were pooled in ratios based on the targeted depth.

ATAC Library preparation

ATAC-seq was performed as described in Dong et al. 2020. 100,000-200,000 cells from each of the iPSC lines and differentiated lines were collected upon reaching confluence. Cells were resuspended and centrifuged at 500 × g for 10 min at 4 °C. The supernatant was aspirated. The cells were resuspended in 300 μl DNaseI solution (20 mM Tris pH 7.4, 150 mM NaCl, 1 × reaction buffer with MgCl2, 0.1 U/μl DNaseI) on ice for 5 min. Subsequently, 1 ml of PBS (0.04% BSA) was added, and the mixture was centrifuged at 500 × g for 10 min at 4 °C. The cells were washed twice in PBS + BSA and resuspended in 100 μl ATAC-seq RSB (10 mM Tris pH 7.4, 10 mM NaCl, 3 mM MgCl2 in water) with 0.1% NP40, 0.1% Tween-20, and 0.01% Digitonin and incubated on ice for 3 min to lyse. After lysis, 1 mL of ATAC-seq RSB with 0.1% Tween-20 was added and mixed by inversion. Next, the nuclei were collected by centrifugation at 1000 × g for 10 min at 4 °C. The nuclei were resuspended in 20 µL 2 × TD buffer (20 mM Tris pH 7.6, 10 mM MgCl2, 20% Dimethyl Formamide), and 50,000 nuclei were transferred to a tube with 2 × TD buffer filled up to 25 µL. 25 µL of transposition mix [2.5 µL Transposase (100 nM final) (Illumina, 20034197), 16.5 µL PBS, 0.5 µL 1% digitonin, 0.5 µL 10% Tween-20, and 5 µL water] was added. The transposition reactions were mixed and incubated for 30 min at 37 °C with gentle tapping every 10 min. Reactions were purified with the Zymo DNA Clean and Concentrator-5 kit (Zymo Research, D4014). The ATAC-seq library was amplified for nine cycles with Nextera primers in a PCR machine. The PCR reaction was purified with Sera-Mag Select beads (Cytiva, 29343057) using double size selection with 27.5 µL beads (0.55 × sample volume) and 50 µL beads (1.55 × sample volume). The ATAC-seq libraries were quantitated by Qubit assays and sequenced on an Illumina NextSeq 500 platform.

RNA-seq Library preparation

200–500 k cells from each of the iPSC lines and differentiated lines were treated with Trizol for lysing. RNA was extracted using a Zymo Direct-zol RNA Microprep kit (Zymo Research, R2060). 200 ng of RNA from each sample was used to generate RNA-seq libraries using the NuGEN Universal Plus mRNA-Seq + UDI kit (Tecan, 0520). The libraries underwent 2 × 75 paired-end sequencing by Illumina NextSeq 500.

WGBS Library preparation

Whole Genome Bisulfite Sequencing data was prepared from 300–600 k cells using the xGen Methyl-Seq DNA Library Prep Kit (IDT, 10009860). DNA was extracted using the Zymo Quick-DNA Microprep Kit (Zymo Research, R3020). DNA was sheared on BioRuptor PICO, 9 cycles with 30 s on, 30 s off to achieve fragments distributed around 300 bp. Sheared DNA was purified using the Zymo DNA Clean and Concentrator-5 kit (Zymo Research, D4014). 50–100 ng of DNA fragments mixed with 0.5% unmethylated lambda DNA were bisulfite converted using EZ DNA Methylation-Gold Kit (Zymo, D5005). WGBS libraries were sequenced on the Illumina NovaSeq 6000 platform.

WGS Alignment and variant analysis

WGS data was aligned to hg38 using BWA⁴⁸, and deduplicated using Picard v2.22.0. The aligned sequences were joint-genotyped using GATK HaplotypeCaller⁴⁹. It was verified that there were no genetic variants among lines from the same donor, so reads from lines of the same donor were combined, and alignment and variant calling were repeated. Structural variants were called using Parliament2⁵⁰ in conjunction with MANTA⁵¹, DELLY⁵², LUMPY⁵³, CNVNator⁵⁴, and dysgu⁵⁵, and then combined using SURVIVOR⁵⁶. Genetic variation between donors was determined by comparing the hg38 called variants between donors using VCFtools⁵⁷ and BEDTools⁵⁸. Variants were required to have a 75 bp mappability of greater than 0.5 to be included in analyses.

Personalized reference fasta

Personalized reference fastas were created by editing each individual’s variants into hg38 using vcf2diploid²⁵. As most of the variants were un-phased, variants were sorted at random into “maternal” and “paternal” haplotype reference files, to be combined after alignment.
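
The random assignment of un-phased variants described above can be sketched as follows. This is a minimal illustration and not the vcf2diploid implementation: the function name `assign_haplotypes`, the tuple layout, the example variants, and the fixed seed are all hypothetical. It also assumes that homozygous alternative variants are written to both haplotypes and that each heterozygous variant is placed on one haplotype chosen at random, which the text does not state explicitly.

```python
import random


def assign_haplotypes(variants, seed=0):
    """Split un-phased variants between two haplotype variant lists.

    variants: list of (chrom, pos, ref, alt, genotype) tuples, where genotype
    is "0/1" (heterozygous) or "1/1" (homozygous alternative).
    Assumption: homozygous variants go to both haplotypes; each heterozygous
    variant goes to one haplotype chosen uniformly at random.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    maternal, paternal = [], []
    for chrom, pos, ref, alt, genotype in variants:
        site = (chrom, pos, ref, alt)
        if genotype == "1/1":
            maternal.append(site)
            paternal.append(site)
        elif genotype == "0/1":
            target = maternal if rng.random() < 0.5 else paternal
            target.append(site)
        else:
            raise ValueError(f"unsupported genotype: {genotype}")
    return maternal, paternal


# Hypothetical variants, for illustration only.
example = [
    ("chr1", 1000, "A", "G", "0/1"),
    ("chr1", 2000, "C", "T", "1/1"),
    ("chr2", 500, "G", "GA", "0/1"),
]
maternal, paternal = assign_haplotypes(example)
```

Because the labels are arbitrary for un-phased data, "maternal" and "paternal" here only name the two reference files; results from the two alignments are combined afterwards, as described in the analysis sections.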

ATAC Analysis

ATAC-seq reads were aligned to personalized references using AIAP⁵⁹. Bam alignments were lifted over to hg38 using CrossMap⁶⁰, and peak calls were lifted using liftOver⁶¹. Calls for each of the haplotypes were combined to create alignment and peakCall files for each of the samples. QC and analysis on ATAC-seq libraries were performed using AIAP⁵⁹. Peaks in the ATAC data were called using MACS2⁶² and visualized on the WashU Epigenome Browser⁶³. Batch correction using ComBat-seq⁶⁴ was applied to the Monocyte samples to remove batch effects of asynchronous library preparation (see Methods: Monocyte Differentiation). Data from sex chromosomes was excluded from all analyses. Differentially accessible peaks were calculated using DiffBind⁶⁵. Intra-cluster distances in PCAs were calculated using the clv package in R. Intra-cluster distances are defined as the Euclidean distances between the furthest two points in the cluster, across all principal components. In this case, clusters were defined as samples from the same cell type. Observations in the principal components analyses were normalized read counts under peaks for each sample. Odds ratios were calculated as the odds that a called peak is differentially accessible between lines given that the peak overlaps a genetic variation, divided by the odds of a peak being differentially accessible given that it does not overlap a genetic variation. Odds ratios were calculated in genome-wide context (all peaks), as well as specifically for peaks that overlap ChromHMM Quiescent/Heterochromatin regions²⁷, and peaks that overlap promoters. Odds ratios were also calculated for different types of genetic variation in those genomic contexts, including SNPs, INDELs, DUPs, and INVs. P-values were calculated in odds ratio tests for whether the contextual odds ratios are significantly different from the odds ratio observed in a genome-wide context. All overlaps were computed using BEDTools⁵⁸. 
Odds ratio (OR) statistical differences were calculated as follows.

$${{\rm{n}}}_{1}={\rm{N}}\; {\rm{diffEpi}}\;\&\; {\rm{diffGen}}$$

$${{\rm{n}}}_{2}={\rm{N}}\; {\rm{sameEpi}}\;\&\; {\rm{diffGen}}$$

$${{\rm{n}}}_{3}={\rm{N}}\; {\rm{diffEpi}}\;\&\; {\rm{sameGen}}$$

$${{\rm{n}}}_{4}={\rm{N}}\; {\rm{sameEpi}}\;\&\; {\rm{sameGen}}$$

$${\rm{OR}}=\frac{{\rm{Odds}}({\rm{diffEpi|diffGen}})}{{\rm{Odds}}({\rm{diffEpi|sameGen}})}=\frac{({\rm{N}}\; {\rm{diffEpi}}\;\&\; {\rm{diffGen}})/\,({\rm{N}}\; {\rm{sameEpi}}\;\&\; {\rm{diffGen}})}{({\rm{N}}\; {\rm{diffEpi}}\;\&\; {\rm{sameGen}})/\,({\rm{N}}\; {\rm{sameEpi}}\;\&\; {\rm{sameGen}})}$$

$${\rm{SE}}(\log {\rm{OR}})=\sqrt{\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}+\frac{1}{{n}_{3}}+\frac{1}{{n}_{4}}}$$

$$\delta=\left|\log {{\rm{OR}}}_{1}-\log {{\rm{OR}}}_{2}\right|$$

$${\rm{SE}}({\rm{\delta }})=\sqrt{{{{SE}}_{1}}^{2}+{{{SE}}_{2}}^{2}}$$

$$z=\delta /{SE}(\delta )$$

$$({\rm{In\; R}})\quad P\text{-value}=2\times (1-{\rm{pnorm}}(z))$$
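
The comparison of a contextual odds ratio against the genome-wide odds ratio can be sketched in Python as below. This is an illustration rather than the authors' code. It assumes the standard Woolf standard error of the log odds ratio (the square root of the summed reciprocal cell counts) and takes δ as the absolute difference between the two log odds ratios. The function names and all counts are hypothetical, and `math.erfc` stands in for R's `2 * (1 - pnorm(z))`.

```python
import math


def log_odds_ratio(n1, n2, n3, n4):
    """Return (log OR, SE of log OR) for one 2x2 table.

    n1: diffEpi & diffGen    n2: sameEpi & diffGen
    n3: diffEpi & sameGen    n4: sameEpi & sameGen
    """
    log_or = math.log((n1 / n2) / (n3 / n4))
    se = math.sqrt(1 / n1 + 1 / n2 + 1 / n3 + 1 / n4)  # Woolf standard error
    return log_or, se


def compare_odds_ratios(table_a, table_b):
    """Two-sided z-test for a difference between two log odds ratios."""
    log_a, se_a = log_odds_ratio(*table_a)
    log_b, se_b = log_odds_ratio(*table_b)
    delta = abs(log_a - log_b)
    se_delta = math.sqrt(se_a ** 2 + se_b ** 2)
    z = delta / se_delta
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    p_value = math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
genome_wide = (200, 800, 100, 900)
promoters = (30, 170, 25, 175)
z, p_value = compare_odds_ratios(promoters, genome_wide)
```

Note that the two tables are treated as independent in this sketch; when the contextual peaks are a subset of the genome-wide peaks, that is an approximation.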

RNA-seq Analysis

RNA-seq data was aligned to personalized references using STAR⁶⁶. The personalized alignments to each haplotype were lifted to hg38 using liftOver⁶¹ and added together to form each sample’s hg38 alignment. Data from sex chromosomes was excluded from all analyses. Gene count matrices were generated using featureCounts from the Rsubread package⁶⁷. Batch correction using ComBat-seq was applied to the Monocyte samples to remove batch effects of asynchronous library preparation (see Methods: Monocyte Differentiation). Differentially expressed genes and normalized gene expression differences were analyzed using DESeq2⁶⁸. Promoter regions were defined as ±2 kb from the transcription start site (TSS) of a gene. Gene ontology analysis was performed using ShinyGo⁶⁹, with a background gene set including any gene that had >1 RPKM expression in any iPSC line. Intra-cluster distances in PCAs were calculated using the clv package in R. Intra-cluster distances are defined as the Euclidean distances between the furthest two points in the cluster, across all principal components. In this case, clusters were defined as samples from the same cell type. Observations in the principal components analyses were per-gene normalized read counts for each sample. RNA-seq reads were matched with HipSci target genes²⁹ joining by gene name. The eQTL loci were then directly overlapped with chromatin accessibility and methylation statistics using BEDTools⁵⁸. Motif analysis was conducted by providing hg38 sequence 10 bp upstream and downstream of rs495839 to TOMTOM⁷⁰, with reference sequence and alternative sequence in the rs495839 position, filtering by E < 20 to catch large differences in E-values between the sequences.
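
The intra-cluster distance defined above (the Euclidean distance between the two furthest points of a cluster, taken across all principal components) can be illustrated with a short Python sketch. It is a stand-in for the definition only and does not reproduce the clv package interface. The function name and the principal-component scores are hypothetical.

```python
import math
from itertools import combinations


def intra_cluster_distance(points):
    """Largest Euclidean distance between any two points of one cluster."""
    if len(points) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(points, 2))


# Hypothetical PC scores: one tuple per sample, one entry per component.
pc_scores = {
    "iPSC": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "motor_neuron": [(5.0, 5.0, 0.0), (8.0, 9.0, 0.0), (5.0, 6.0, 0.0)],
}
diameters = {
    cell_type: intra_cluster_distance(points)
    for cell_type, points in pc_scores.items()
}
```

In this toy input the motor neuron cluster has the larger diameter, which is the kind of comparison used to describe variation increasing in differentiated cells.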

WGBS Analysis

WGBS was aligned to personalized references. CpG methylation counts were lifted to hg38 using liftOver⁶¹ for each of the haplotypes and added together to form each sample’s hg38 alignment. WGBS alignment and methylation calling was performed as in https://github.com/hyungjoo-lee/wgbs. Reads were trimmed (first 10 bp) using TrimGalore (Version 0.4.5, https://github.com/FelixKrueger/TrimGalore). Trimmed reads were processed using Bismark⁷¹. Data from sex chromosomes was excluded from all analyses. Differential methylation was called using DSS⁷² using smoothing and a p-value cut-off of 0.001. Odds ratios were calculated as the odds of a differentially methylated region (DMR), as opposed to a non-differentially methylated CpG island, between lines given that the region overlaps a genetic variation, divided by the odds of a DMR given that it does not overlap a genetic variation. Odds ratios were calculated in genome-wide context (all DMRs and non-DMR CpG islands), as well as specifically for regions that overlap ChromHMM Quiescent/Heterochromatin regions²⁷, and regions that overlap promoters. Odds ratios were also calculated for different types of genetic variation in those genomic contexts, including SNPs and INDELs. P-values were calculated using odds ratio tests for whether the contextual odds ratios are significantly different from the genome-wide odds ratio. All overlaps were computed using BEDTools⁵⁸.
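
The step in which per-haplotype CpG counts are added together after liftover can be sketched as follows. This is an illustration under stated assumptions, not the referenced pipeline: it assumes each haplotype yields a table of methylated and unmethylated read counts keyed by hg38 CpG position, and every name and count below is hypothetical.

```python
from collections import defaultdict


def combine_haplotype_counts(*haplotype_tables):
    """Sum (methylated, unmethylated) read counts per CpG across haplotypes.

    Each table maps an hg38 CpG site, written as (chrom, position), to a
    (methylated, unmethylated) pair of read counts.
    """
    totals = defaultdict(lambda: [0, 0])
    for table in haplotype_tables:
        for site, (meth, unmeth) in table.items():
            totals[site][0] += meth
            totals[site][1] += unmeth
    return {site: (meth, unmeth) for site, (meth, unmeth) in totals.items()}


def methylation_level(meth, unmeth):
    """Fraction of methylated reads, or None when a site has no coverage."""
    total = meth + unmeth
    return meth / total if total else None


# Hypothetical lifted-over counts for the two haplotype alignments.
hap_a = {("chr1", 10468): (8, 2), ("chr1", 10470): (5, 5)}
hap_b = {("chr1", 10468): (6, 4), ("chr1", 10483): (0, 3)}
combined = combine_haplotype_counts(hap_a, hap_b)
levels = {site: methylation_level(*counts) for site, counts in combined.items()}
```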

Bootstrapped hierarchical clustering

Bootstrapped hierarchical clustering was performed using pvclust⁷³ in R (Supplementary Figs. 3–5). Input for the gene expression clustering was DESeq2-normalized gene expression values with variance >0.3 across samples. Input for chromatin accessibility clustering was DiffBind-normalized ATAC-seq reads under peaks with variance >0.3 across samples. Input for methylation clustering was CpG methylation levels of CpGs with variance >0.05 across samples. For each clustering, 10,000 bootstraps were performed. The correlation distance metric was employed, with complete linkage.
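
The input preparation for this clustering (a variance filter across samples followed by a correlation distance between samples) can be sketched in Python as below. The sketch does not reproduce pvclust or its bootstrap resampling. It assumes the variance is the sample variance and that correlation distance means one minus the Pearson correlation; the function names and the matrix are hypothetical.

```python
import math
import statistics


def filter_by_variance(matrix, threshold):
    """Keep feature rows whose sample variance across samples exceeds threshold."""
    return [row for row in matrix if statistics.variance(row) > threshold]


def correlation_distance(x, y):
    """Return 1 minus the Pearson correlation of two equal-length vectors."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 1.0 - cov / norm


# Hypothetical normalized values: rows are features, columns are four samples.
matrix = [
    [1.0, 1.1, 5.0, 5.2],
    [2.0, 2.0, 2.1, 2.0],  # low variance, removed by the filter
    [0.5, 0.4, 3.0, 3.1],
    [4.0, 4.2, 1.0, 0.9],
]
filtered = filter_by_variance(matrix, threshold=0.3)
samples = list(zip(*filtered))  # one profile per sample
distances = {
    (i, j): correlation_distance(samples[i], samples[j])
    for i in range(len(samples))
    for j in range(i + 1, len(samples))
}
```

A hierarchical clustering with complete linkage would then merge samples using these pairwise distances, and bootstrapping would repeat the procedure on resampled features to score each branch.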

LDSC Analysis

ATAC peaks were concatenated for each donor, for each cell type, using BEDOPS -u and BEDOPS -merge⁷⁴. Merged ATAC peaks were then sorted into peaks with and without variation amongst donors (any variation compared to any donor) using BEDTools intersect⁵⁸. Cell-type-specific LDSC was conducted according to https://github.com/bulik/ldsc (v1.0.0). LDSC was conducted using 1000 Genomes⁷⁵ and HapMap3⁷⁶ reference files from https://zenodo.org/records/7768714. F2 was regressed using EUR reference files, and M1 and F1 were regressed using EAS reference files. The cell-type-specific regression was done comparing peaks with no genetic variation to peaks with genetic variation for the enrichment of trait-associated loci (summary statistics from: EUR Parkinson’s⁷⁷, EUR PTSD⁷⁵, EUR Schizophrenia⁷⁸, EUR Major Depressive Disorder⁷⁹, EUR Obsessive Compulsive Disorder⁸⁰, EUR Alzheimer’s Disease⁸¹, EUR Bipolar Disorder⁸², EUR Attention Deficit Hyperactivity Disorder⁸³, EUR Autism Spectrum Disorder⁸⁴, EUR Monocyte Count⁸⁵, EUR Lupus⁸⁶, EUR Type 1 Diabetes⁸⁷, EUR Ulcerative Colitis⁸⁸, EUR Multiple Sclerosis⁸⁹, EUR Autoimmune Vitiligo⁹⁰, EUR Rheumatoid Arthritis⁹¹, EUR Osteoarthritis⁹², EUR Gout⁹³, EUR Insomnia⁹⁴; from Biobank Japan³¹: EAS Depression, Epilepsy, Insomnia, Parkinson’s, Schizophrenia, Substance Dependence, Food Allergy, Lupus, Monocyte Count, Psoriasis, Rheumatoid Arthritis, and Type 1 Diabetes). Additional GWAS studies with East Asian participants for Schizophrenia³² and Major Depressive Disorder³³ were also included.
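
The sorting of merged peaks into those with and without genetic variation can be sketched in Python as below. This is a simple quadratic-time stand-in for an interval intersection and is not the BEDTools implementation. It assumes BED-style half-open coordinates, and the function names, peaks, and variants are hypothetical.

```python
def overlaps_any(peak, variants):
    """True if a (chrom, start, end) peak overlaps any variant interval.

    Intervals are half-open, as in BED files: start is included, end is not.
    """
    chrom, start, end = peak
    return any(
        v_chrom == chrom and v_start < end and start < v_end
        for v_chrom, v_start, v_end in variants
    )


def split_peaks(peaks, variants):
    """Partition peaks into (with_variation, without_variation) lists."""
    with_variation, without_variation = [], []
    for peak in peaks:
        if overlaps_any(peak, variants):
            with_variation.append(peak)
        else:
            without_variation.append(peak)
    return with_variation, without_variation


# Hypothetical merged peaks and donor variants, for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
variants = [("chr1", 150, 151), ("chr1", 400, 401)]
with_variation, without_variation = split_peaks(peaks, variants)
```

The variant at position 400 does not count as overlapping the peak ending at 400 because the end coordinate is exclusive. The two resulting peak sets are what the regression then compares for enrichment of trait-associated loci.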

Statistics & reproducibility

No statistical method was used to predetermine sample size, and no data was excluded from the analysis. Reproducibility of clustering results was verified through bootstrapping. Strict Bonferroni significance cut-offs were employed to determine statistical significance in Wilcoxon rank-sum tests.
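
As a brief illustration of a Bonferroni cut-off, the sketch below divides a family-wise significance level by the number of tests. The level of 0.05, the function name, and the p-values are assumptions made for the example and are not taken from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test threshold and a significance flag for each p-value.

    alpha is the family-wise significance level (0.05 is an assumed default).
    """
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values from five Wilcoxon rank-sum tests.
p_values = [0.001, 0.02, 0.04, 0.0004, 0.3]
threshold, flags = bonferroni_significant(p_values)
```

With five tests the per-test threshold is 0.01, so only the first and fourth hypothetical p-values pass.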

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

To protect the privacy of donors, donors consented to individual-level sequencing data to be released only to authorized investigators. Individual level sequencing data is available to authorized investigators through dbGaP controlled access under release phs003649.v1.p1 here: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003649/phs003649.v1.p1. Authorized investigators include senior investigators (tenure track professor, senior scientist, or equivalent) and NIH Investigators by logging into eRA commons and downloading through the dbGaP Authorized Access System upon approval of the Data Access Request (DAR). The average wait time for a DAR to be processed is two weeks, and approved investigators will maintain access for one year with the ability to renew. A summary of the data release can be found here: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003649/phs003649.v1.p1/release_notes/Release_Notes.phs003649.iPSCs_HealthyVolunteers.v1.p1.MULTI.pdf. HIPSCI data analyzed in this work can be accessed through https://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/QTS000016/QTD000361/. 
Included GWAS studies can be accessed at the following links (also included in Supplementary Table 6): EUR ADHD: https://doi.org/10.1038/s41588-018-0269-7, EUR Alzheimer’s: https://doi.org/10.1038/s41588-018-0311-9, EUR Autism: https://doi.org/10.1038/s41588-019-0344-8, EUR Bipolar: https://doi.org/10.1093/hmg/ddw181, EUR Insomnia: https://doi.org/10.1038/s41588-018-0333-3, EUR Major Depressive Disorder: https://doi.org/10.1038/s41593-018-0326-7, EUR OCD: https://doi.org/10.1038/mp.2017.154, EUR Parkinson’s: https://doi.org/10.1016%2FS1474-4422(19)30320-5, EUR Monocyte Count: https://doi.org/10.1016/j.cell.2016.10.042, EUR Systemic Lupus Erythematosus: https://doi.org/10.1038/ng.3434, EUR Type I Diabetes: https://doi.org/10.1038/s41586-021-03552-w, EUR Ulcerative Colitis: https://doi.org/10.1038/ng.3760, EUR Multiple Sclerosis: https://doi.org/10.1038/nature10251, EUR Autoimmune Vitiligo: https://doi.org/10.1038/ng.3680, EUR Rheumatoid Arthritis: https://doi.org/10.1038/nature12873, EUR Osteoarthritis: https://doi.org/10.1038/s41588-018-0327-1, EUR Gout: https://doi.org/10.1038/s41588-019-0504-x, EUR PTSD: https://doi.org/10.1038/s41467-019-12576-w, EUR Schizophrenia: https://doi.org/10.1038/nature13595, EAS Depression, Epilepsy, Insomnia, Parkinson’s, Schizophrenia, Substance Dependence, Food Allergy, Lupus, Monocyte Count, Psoriasis Vulgaris, Rheumatoid Arthritis and Type I Diabetes: https://doi.org/10.1038/s41588-021-00931-x, EAS Depression: https://doi.org/10.1001/jamapsychiatry.2021.2099, and EAS Schizophrenia: https://doi.org/10.1038/s41588-019-0512-x. Source data are provided with this paper.

References

Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Zhang, Q. et al. Genome-wide open chromatin regions and their effects on the regulation of silk protein genes in Bombyx mori. Sci. Rep. 7, 12919 (2017).
Gibney, E. R. & Nolan, C. M. Epigenetics and gene expression. Heredity 105, 4–13 (2010).
Natarajan, A., Yardimci, G. G., Sheffield, N. C., Crawford, G. E. & Ohler, U. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res. 22, 1711–1722 (2012).
Wagner, J. R. et al. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol. 15, R37 (2014).
Matsubara, Y. et al. Inactivation of sonic hedgehog signaling and polydactyly in limbs of hereditary multiple malformation, a novel type of talpid mutant. Front. Cell Dev. Biol. 4, 149 (2016).
Tatton-Brown, K. et al. Mutations in the DNA methyltransferase gene DNMT3A cause an overgrowth syndrome with intellectual disability. Nat. Genet. 46, 385–388 (2014).
Do, C. et al. Genetic–epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol. 18, 120 (2017).
Feil, R. & Fraga, M. F. Epigenetics and the environment: emerging patterns and implications. Nat. Rev. Genet. 13, 97–109 (2012).
Wong, C. C. Y. et al. A longitudinal study of epigenetic variation in twins. Epigenetics 5, 516–526 (2010).
Zhu, X. et al. Effects of ultraviolet B exposure on DNA methylation in patients with systemic lupus erythematosus. Exp. Ther. Med. 5, 1219–1225 (2013).
Stenz, L., Schechter, D. S., Serpa, S. R. & Paoloni-Giacobino, A. Intergenerational transmission of DNA methylation signatures associated with early life stress. Curr. Genomics 19, 665–675 (2018).
Ekamper, P., van Poppel, F., Stein, A. D. & Lumey, L. H. Independent and additive association of prenatal famine exposure and intermediary life conditions with adult mortality between age 18–63 years. Soc. Sci. Med. 119, 232–239 (2014).
Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872 (2007).
Berdasco, M. & Esteller, M. DNA methylation in stem cell renewal and multipotency. Stem Cell Res. Ther. 2, 42 (2011).
Godini, R., Lafta, H. Y. & Fallahi, H. Epigenetic modifications in the embryonic and induced pluripotent stem cells. Gene Expr. Patterns 29, 1–9 (2018).
Maherali, N. et al. Directly reprogrammed fibroblasts show global epigenetic remodeling and widespread tissue contribution. Cell Stem Cell 1, 55–70 (2007).
Banovich, N. E. et al. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131 (2018).
Kyttälä, A. et al. Genetic variability overrides the impact of parental cell type and determines iPSC differentiation potential. Stem Cell Rep. 6, 200–212 (2016).
Burrows, C. K. et al. Genetic variation, not cell type of origin, underlies the majority of identifiable regulatory differences in iPSCs. PLOS Genet. 12, e1005793 (2016).
Wu, D.-Y., Bittencourt, D., Stallcup, M. R. & Siegmund, K. D. Identifying differential transcription factor binding in ChIP-seq. Front. Genet. 6, 169 (2015).
Singh, V. K., Kalsan, M., Kumar, N., Saini, A. & Chandra, R. Induced pluripotent stem cells: applications in regenerative medicine, disease modeling, and drug discovery. Front. Cell Dev. Biol. 3, 2 (2015).
Doss, M. X. & Sachinidis, A. Current challenges of iPSC-based disease modeling and therapeutic implications. Cells 8, 403 (2019).
Grezenko, H. et al. Epigenetics in neurological and psychiatric disorders: A comprehensive review of current understanding and future perspectives. Cureus 15, e43960 (2023).
Rozowsky, J. et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Syst. Biol. 7, 522 (2011).
Carcamo-Orive, I. et al. Analysis of transcriptional variability in a large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532 (2017).
van der Velde, A. et al. Annotation of chromatin states in 66 complete mouse epigenomes during development. Commun. Biol. 4, 1–15 (2021).
Goldberg, A. D., Allis, C. D. & Bernstein, E. Epigenetics: A landscape takes shape. Cell 128, 635–638 (2007).
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Kwong, A. et al. FIVEx: an interactive eQTL browser across public datasets. Bioinformatics 38, 559–561 (2021).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Giannakopoulou, O. et al. The genetic architecture of depression in individuals of East Asian Ancestry: A genome-wide association study. JAMA Psychiatry 78, 1258–1269 (2021).
Sardo, V. L. et al. The effect of aging on human induced pluripotent stem cells. Nat. Biotechnol. 35, 69–74 (2017).
Zhuo, X., Du, A. Y., Pehrsson, E. C., Li, D. & Wang, T. Epigenomic differences in the human and chimpanzee genomes are associated with structural variation. Genome Res. https://doi.org/10.1101/gr.263491.120 (2020).
Xiao, S. et al. Comparative epigenomic annotation of regulatory DNA. Cell 149, 1381–1392 (2012).
Jin, B. & Robertson, K. D. DNA methyltransferases, DNA damage repair, and cancer. Adv. Exp. Med. Biol. 754, 3–29 (2013).
Zhou, J. et al. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation. BMC Genomics 18, 724 (2017).
O’Dea, R. E., Noble, D. W. A., Johnson, S. L., Hesselson, D. & Nakagawa, S. The role of non-genetic inheritance in evolutionary rescue: epigenetic buffering, heritable bet hedging and epigenetic traps. Environ. Epigenetics 2, dvv014 (2016).
Li, Z. et al. LHP1-mediated epigenetic buffering of subgenome diversity and defense responses confers genome plasticity and adaptability in allopolyploid wheat. Nat. Commun. 14, 7538 (2023).
Razin, A. & Cedar, H. DNA methylation and gene expression. Microbiol. Rev. 55, 451 (1991).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Bonder, M. J. et al. Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics. Nat. Genet. 53, 313–321 (2021).
Warren, C. R. et al. Induced pluripotent stem cell differentiation enables functional validation of GWAS variants in metabolic disease. Cell Stem Cell 20, 547–557 (2017).
Gorlov, I. P., Gorlova, O. Y., Frazier, M. L., Spitz, M. R. & Amos, C. I. Evolutionary evidence of the effect of rare variants on disease etiology. Clin. Genet. 79, 199–206 (2011).
Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).
Du, Z.-W. et al. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat. Commun. 6, 6626 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at https://doi.org/10.1101/201178 (2018).
Zarate, S. et al. Parliament2: Accurate structural variant calling at scale. GigaScience 9, giaa145 (2020).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Abyzov, A., Urban, A. E., Snyder, M. & Gerstein, M. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
Cleal, K. & Baird, D. M. Dysgu: efficient structural variant calling using short or long reads. Nucleic Acids Res. 50, e53 (2022).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Liu, S. et al. AIAP: A quality control and integrative analysis package to improve ATAC-seq data analysis. Genom. Proteom. Bioinform. 19, 641–651 (2021).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Kuhn, R. M. et al. The UCSC genome browser database: update 2009. Nucleic Acids Res. 37, D755–D761 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Li, D. et al. WashU epigenome Browser update 2022. Nucleic Acids Res. 50, W774–W781 (2022).
Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2, lqaa078 (2020).
Stark, R. & Brown, G. DiffBind: Differential binding analysis of ChIP-Seq peak data.
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 47, e47 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Feng, H. & Wu, H. Differential methylation analysis for bisulfite sequencing using DSS. Quant. Biol. 7, 327–334 (2019).
Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540–1542 (2006).
Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Altshuler, D. M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Howard, D. M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat. Neurosci. 22, 343–352 (2019).
International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS). Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol. Psychiatry 23, 1181–1188 (2018).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Hou, L. et al. Genome-wide association study of 40,000 individuals identifies two novel loci associated with bipolar disorder. Hum. Mol. Genet. 25, 3383–3394 (2016).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
Bentham, J. et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464 (2015).
Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594, 398–402 (2021).
de Lange, K. M. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 49, 256–261 (2017).
International Multiple Sclerosis Genetics Consortium et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219 (2011).
Jin, Y. et al. Genome-wide association analyses identify 13 new susceptibility loci for generalized vitiligo. Nat. Genet. 44, 676–680 (2012).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Tachmazidou, I. et al. Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data. Nat. Genet. 51, 230–236 (2019).
Tin, A. et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat. Genet. 51, 1459–1474 (2019).
Jansen, P. R. et al. Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways. Nat. Genet. 51, 394–403 (2019).

Acknowledgements

We thank members of the Wang Lab for thoughtful discussions of this project; Jessica Hoisington-López and Maria Lynn Jaeger from The Edison Family Center for Genome Sciences and Systems Biology, and the McDonnell Genome Institute for assistance with sequencing; Yang Li and Nancy Lim Saccone for help and insights with linkage disequilibrium score regression. This work was supported by R01HG007175, U41HG010972, U24HG012070 (T.W.), R35HG011922 (M.H.) and T32HG000045-18 (K.Q.).

Author information

Authors and Affiliations

Center for Genome Sciences & Systems Biology, Washington University in St. Louis, St. Louis, MO, USA
Kara Quaid, Xiaoyun Xing, Aaron Tran & Ting Wang
Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
Kara Quaid, Xiaoyun Xing, Aaron Tran & Ting Wang
Genome Engineering & Stem Cell Center (GESC@MGI), Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
Yi-Hsien Chen, Yong Miao, Amber Neilson, Vijayalingam Selvamani & Xiaoxia Cui
Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
Ming Hu

Contributions

K.Q., T.W., and M.H. conceived and designed this study. X.C., Y.-H.C., Y.M., A.N., and V.S. reprogrammed iPSCs and differentiated the cells. X.X. and K.Q. performed library preparations. K.Q. analyzed the data. K.Q. and A.T. reviewed the literature to contextualize the results. K.Q. wrote the manuscript with input from T.W., M.H., and X.X. All authors subsequently edited and approved the final manuscript.

Corresponding authors

Correspondence to Xiaoxia Cui, Ming Hu or Ting Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review file

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Quaid, K., Xing, X., Chen, YH. et al. iPSCs and iPSC-derived cells as a model of human genetic and epigenetic variation. Nat Commun 16, 1750 (2025). https://doi.org/10.1038/s41467-025-56569-4

Received: 25 March 2024
Accepted: 22 January 2025
Published: 18 February 2025
Version of record: 18 February 2025
DOI: https://doi.org/10.1038/s41467-025-56569-4
