Abstract
Understanding the interaction between genetic and epigenetic variation remains a challenge due to confounding environmental factors. We propose that human induced Pluripotent Stem Cells (iPSCs) are an excellent model to study the relationship between genetic and epigenetic variation while controlling for environmental factors. In this study, we have created a comprehensive resource of high-quality genomic, epigenomic, and transcriptomic data from iPSC lines and three iPSC-derived cell types (neural stem cell (NSC), motor neuron, monocyte) from three healthy donors. We find that epigenetic variation is most strongly associated with genetic variation at the iPSC stage, and that relationship weakens as epigenetic variation increases in differentiated cells. Additionally, cell type is a stronger source of epigenetic variation than genetic variation. Further, we elucidate a utility of studying epigenetic variation in iPSCs and their derivatives for identifying important loci for GWAS studies and the cell types in which they may be acting.
Introduction
While each of our cells has the same genetic template, our tissues are comprised of hundreds of cell types with different gene expression profiles, morphologies, and functions1. This is because epigenetic features, such as histone modification, DNA methylation, and chromatin accessibility regulate which genes are expressed in a cell, informing cellular function and regenerative potential2,3,4,5. Interactions between genotype and epigenetic mechanisms impact gene activities critical for cell type-specific functions. For example, demethylation of an enhancer linked to the SHH gene causes polydactyly2,6. Similarly, a mutation in the DNMT3A gene results in reduced de novo methylation, worsening cancer progression7. While cross-talk between the genome and the epigenome has become widely appreciated in studies of gene regulation with disease etiology8, standardized models for genome-wide benchmarks of such interactions are still lacking. Until we comprehensively characterize the interaction between genetic variation and epigenetic variation, our understanding of the molecular mechanism of gene regulation will remain obscure.
Environmental influences like sunlight, diet, and various other exposures can play a confounding role when we study the interaction between epigenetic variation, genetic variation, and gene expression. This is because environmental factors can affect transcription as well as epigenetic features like DNA methylation9,10,11. In order to study epigenetic impacts on transcription at the population scale, we would need to account for every exposure an individual has had from before birth, or even from before conception. For example, classic epigenetic studies have shown that starvation events from generations back could have an impact on someone’s DNA methylation profile12,13. All of these confounding factors from the environment make it challenging to study the interaction between genetic variation and epigenetic variation at the population level.
One appealing way to untangle epigenetic variation and environmental variation is to remove environmental variation entirely, and study the epigenome in vitro using human induced Pluripotent Stem Cells (iPSCs). Specifically, iPSCs are generated from a somatic cell type, in this case, renal epithelial cells, and reprogrammed into a stem-like state, with differentiation potential similar to embryonic stem cells14. As they were reprogrammed, the iPSCs underwent changes in transcription factor binding, chromatin state, and DNA methylation across the genome14,15,16,17. Previous studies have shown that iPSCs have donor-specific DNA methylation and gene expression even after reprogramming18,19,20, supporting our rationale that iPSCs maintain epigenetic variation that is functionally related to genetic variation while removing environmental variation. In this study, we sought to evaluate whether donor-specific chromatin accessibility and DNA methylation are maintained as cells were differentiated to different germ layers, represented by neural stem cells (NSCs), motor neurons, and monocytes.
Regenerative medicine and disease modeling are important applications of iPSCs and iPSC-derived cell types, such as neural stem cells (NSCs), motor neurons, and monocytes21,22,23. In this work, we investigate the degree to which iPSC-derived cells maintain donor-specific epigenetic patterns and provide insight into how well suited iPSC models are for studying epigenetic mechanisms that may be involved in phenotypes such as Alzheimer’s or Parkinson’s disease24 in neuronal cells, which are more difficult to study without stem cell technology.
We have leveraged deep sequencing data to evaluate the relationship between genetic variation, chromatin accessibility, and DNA methylation genome-wide, and how such a relationship may impact gene expression in iPSCs. Further, we have interrogated the translationally important question of how such a relationship is maintained as iPSCs are differentiated into monocytes, NSCs, and motor neurons. We found that overall, iPSCs have donor-specific epigenetic patterns, and the direct relationship between genetic variation and chromatin accessibility is stronger in iPSCs than in differentiated cells. In contrast, variation in DNA methylation was generally less directly associated with genetic variation, and patterns of association with DNA methylation were consistent across cell types. We found that epigenetic variation that overlies genetic variation is not necessarily more likely to be carried throughout differentiation. Notably, genes associated with epigenetic variation tend to have larger expression differences between samples. Finally, we present epigenetic catalogs of iPSCs and their derivative cells as a valuable resource for interrogating disease mechanisms in different germ layers by identifying relevant cell types and points of accessibility for genome-wide association study (GWAS)-identified SNPs.
Results
Epigenetic variation increases as cells are differentiated, while the direct relationship with genetic variation becomes weaker
Epithelial cells were collected from the urine of three donors: a father-daughter pair (M1 and F1), and an unrelated female donor (F2). The cells were reprogrammed into iPSCs, and assayed for gene expression, chromatin accessibility, and DNA methylation (Fig. 1A). Three iPSC lines (F1_1, F1_2, and F1_3) were derived from F1, two lines (M1_1, M1_2) from M1, and one line from F2 (F2_1) (Supplementary Data 1). Cell lines were characterized by marker gene expression (Supplementary Fig. 1 and Supplementary Data 2). Each line has two technical replicates that were grown simultaneously. We hypothesized that if there exists a base relationship between genetic variation and epigenetic variation in iPSCs after reprogramming, iPSCs from the same donor would be more similar to each other epigenetically than iPSCs from another donor. Similarly, iPSCs from related donors would be epigenetically more similar than iPSCs from unrelated donors. To mitigate the potential impact of reference bias, we employed a personalized genome mapping strategy, using vcf2diploid25 to edit hg38 with individual variants.
A Study Design Graphic. Renal epithelial cells collected from the fresh urine samples of two related donors (F1 and M1) and one unrelated donor (F2) were reprogrammed into iPSCs: lines F1_1, F1_2, F1_3, M1_1, M1_2, and F2_1. Each line has a technical replicate that was grown simultaneously. Each iPSC line, as well as NSCs, motor neurons, and monocytes differentiated from respective iPSC lines, were assayed for WGBS, ATAC-seq, and RNA-seq to measure DNA methylation, chromatin accessibility, and gene expression. B PCA plots of CpG methylation. Each point is a line colored by cell type, and each technical replicate has the same shape. Observations are the methylation levels (percentages) of CpGs with variance > 0.05 among samples. C PCA plots of ATAC reads within peaks. Each point is a line colored by cell type, and each technical replicate has the same shape. PCAs of reads within ATAC peaks for each of the differentiated cell lines. Each observation is the number of reads under a peak, normalized by library size in DiffBind. D PCA plots of exonic RNA-seq reads. Each point is a line colored by cell type, and each technical replicate has the same shape. Each observation is the number of exonic reads within a gene, normalized by sequencing depth and RNA composition in DESeq2. E Bar plots of the number of differentially methylated CpGs between each pair of lines in each of the differentiated cell lines. Bars are colored by the relationship between the donors of lines (same donor (orange), related donors (green), unrelated donors (purple)). F Bar plots of the number of differentially expressed genes between each pair of lines in each of the differentiated cell lines. Bars are colored by the relationship between the donors of lines. Color coding is the same as in (E). G Bar plots of the number of differentially accessible ATAC peaks between each pair of lines in each of the differentiated cell lines. Color coding is the same as in (E).
Source data are provided as a Source Data file.
As expected, chromatin accessibility, DNA methylation, and gene expression are more dependent on the cell type than the donor of origin (Fig. 1B–D and Supplementary Figs. 2–5). In principal components analyses and bootstrapped hierarchical clustering, data from iPSCs, NSCs, motor neurons, and monocytes differentiated from respective iPSC lines of all donors clustered by cell type (Fig. 1B–D and Supplementary Figs. 2–5). We found that CpG methylation in iPSCs was donor-dependent, with lines from the same donor clustering together in PCA (Supplementary Fig. 6A). We calculated the number of differentially methylated regions (DMRs) between each pair of lines, and considered the level of genetic variation between samples as the relationship between donors: same donor (F1_1, F1_2, and F1_3, as well as M1_1 and M1_2), related donors (any sample from F1 compared to any sample from M1), and unrelated donors (any sample from F1 or M1 compared to the line from F2). Relatively few DMRs exist between lines from the same donor (between 10 and 46), compared to lines from related donors (between 1451 and 1585) and lines from unrelated donors (between 2667 and 2961) (Fig. 1E). Taken together, our results demonstrate that iPSCs maintain donor-specific epigenomes that may be linked to genetic variation.
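The three comparison classes described above can be illustrated with a short sketch. It assumes only the line names given in the text; the helper names (`donor_of`, `relationship`) and the `RELATED_DONORS` set are hypothetical and are not taken from the study's own code.

```python
from itertools import combinations

# Line names as given in the text: <donor>_<line index>.
LINES = ["F1_1", "F1_2", "F1_3", "M1_1", "M1_2", "F2_1"]

# F1 and M1 are the daughter-father pair; F2 is unrelated to both.
RELATED_DONORS = {frozenset({"F1", "M1"})}


def donor_of(line):
    """Return the donor prefix of a line name, e.g. 'F1_2' -> 'F1'."""
    return line.split("_")[0]


def relationship(line_a, line_b):
    """Classify a pair of lines by the relationship between their donors."""
    a, b = donor_of(line_a), donor_of(line_b)
    if a == b:
        return "same donor"
    if frozenset({a, b}) in RELATED_DONORS:
        return "related donors"
    return "unrelated donors"


# Classify every unordered pair of lines and tally the classes.
pair_classes = {pair: relationship(*pair) for pair in combinations(LINES, 2)}
counts = {}
for label in pair_classes.values():
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

With six lines there are 15 pairwise comparisons, so each cell type contributes at most 15 bars per panel in a layout like Fig. 1E–G.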
In the PCA plot of ATAC-seq data and gene expression data, iPSC and NSC samples cluster the most tightly together (iPSC and NSC ATAC intra-cluster distances = 893, 877; RNA intra-cluster distances = 17.6, 20.7), while motor neuron and monocyte samples are a bit more spread out (Motor Neuron and Monocyte ATAC intra-cluster distances = 1451, 1261; RNA intra-cluster distances = 43.9, 34.8), in agreement with change in extent of pluripotency of the different cell types (Fig. 1C, D and Supplementary Fig. 7). ATAC-seq intra-cluster Euclidean distances were calculated using reads under called peaks. RNA-seq distances were calculated using exonic read counts for each line sample. Notably, the two M1 iPSC lines, one clonal and one pool, do not cluster closely together in the accessible peaks or gene expression PCA, indicating that most of the variation is not driven by M1’s genetic background (Supplementary Fig. 6B, C). Differentially expressed genes between these two lines were not enriched for any relevant biological process (Supplementary Table 3), and differentiation capacity did not suggest any functional difference in these lines. This result is consistent with previous findings that iPSCs from the same donor may have variable epigenomes26, highlighting the uncertainties of iPSC reprogramming and differentiation. Based on the differences between lines from the same donor, comparative analyses were performed between lines, rather than between individuals, as lines from the same donor might be biologically different.
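The text does not state exactly how the intra-cluster Euclidean distance was summarized. One plausible reading, shown here purely as an assumption, is the mean pairwise Euclidean distance between the samples of a cell type. The function name and the toy count vectors below are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(samples):
    """Mean Euclidean distance over all pairs of sample vectors.

    Assumption: 'intra-cluster distance' is read as the mean pairwise
    distance; the source may have used a different summary.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical normalized counts (samples x features), not study data.
tight_cluster = [(10.0, 20.0), (11.0, 20.0), (10.0, 21.0)]
loose_cluster = [(10.0, 20.0), (40.0, 60.0), (70.0, 20.0)]

print(mean_pairwise_distance(tight_cluster))
print(mean_pairwise_distance(loose_cluster))
```

A smaller value corresponds to samples that sit closer together, as reported for iPSCs and NSCs relative to motor neurons and monocytes.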
Compared to the iPSCs, the differentiated cells generally have more differentially expressed (adjusted p < 0.05, abs(log2FC) > 1) genes, differentially accessible (adjusted p < 0.05, abs(log2FC) > 1) peaks, and differentially methylated (p < 0.001) regions between pairs of lines (Fig. 1E–G). However, in the differentiated cells, the number of differentially accessible peaks or differentially expressed genes is not consistently higher between lines that are from unrelated donors than between those of genetically related (i.e., the father-daughter pair) donors or in some cases, even between lines from the same donor. In motor neurons, we do see consistently higher numbers of DMRs between pairs of lines from unrelated donors, but this is not true of the other differentiated cell types (Fig. 1E). For example, F1_1 and F1_2 (same donor) have 1057 differentially expressed genes, while F1_1 and F2_1 (unrelated donors) have 300 differentially expressed genes. In a PCA of CpG methylation in iPSCs, each line is clustered by the donor (Supplementary Fig. 6A). In NSCs, motor neurons, and monocytes, the samples from different donors overlap, indicating that variation in DNA methylation of differentiated cells is less dependent on genetic background. Taken together, our data demonstrate that epigenetic variation greatly increases in differentiated cell types and that the donor of origin, i.e., genetic variation, is not the main contributor to such increase.
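As an illustration of the thresholds quoted above for expression and accessibility (adjusted p < 0.05 and abs(log2FC) > 1), the following sketch filters a small table of results. The record layout and the values are hypothetical; in the study these statistics come from DESeq2 and DiffBind, and DMRs use a separate p < 0.001 cutoff that is not modeled here.

```python
# Thresholds as stated in the text for genes and ATAC peaks.
P_ADJ_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0


def is_differential(adj_p, log2fc):
    """True when a feature passes both the significance and effect-size cutoffs."""
    return adj_p < P_ADJ_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF


# Hypothetical results for one pair of lines (not study data).
results = [
    {"feature": "gene_A", "adj_p": 0.001, "log2fc": 2.3},
    {"feature": "gene_B", "adj_p": 0.200, "log2fc": 3.0},
    {"feature": "gene_C", "adj_p": 0.010, "log2fc": -1.5},
    {"feature": "gene_D", "adj_p": 0.010, "log2fc": 0.4},
]

differential = [
    r["feature"] for r in results if is_differential(r["adj_p"], r["log2fc"])
]
print(differential)
```

Counting the surviving features per pair of lines gives the kind of tally plotted in Fig. 1F and G.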
Epigenetic variation is more strongly associated with genetic variation in iPSCs than differentiated cells
We identified genetic variants in each of our donors with respect to hg38 and then identified the genetic variants that differed between donors. In all following analyses, genetic variation refers to variation between individuals. As expected, genetic variation between the father and daughter pair of donors (M1 and F1) is less than that between unrelated donors, approximately two-thirds the amount (Fig. 2A). Structural variants in the form of duplications (DUP) and inversions (INV) are far fewer, due to the short-read nature of our data, but the number of differences followed the same pattern (Fig. 2A). Structural variants between donors ranged in size up to 15,673 bp, but the majority were between 1 bp and 8 bp (Fig. 2B).
A Barplot of the number of genetic variations, as compared to hg38, that are different between each donor. B Distribution of the size of structural variations as compared to hg38 that are different between donors. Ns for each comparison are listed above the corresponding boxes. Box plots are distributed such that the center line represents the median, the outer edges of the box represent the top and bottom 25%, and the whiskers the minimum and maximum. Outliers are represented as dots outside of the plot. C Aggregated odds ratios of all comparisons from each cell type for odds of differential accessibility given genetic variation. D Aggregated odds ratios of all comparisons from each cell type for odds of differential methylation given genetic variation. Source data are provided as a Source Data file.
Next, we examined how often genetic variation directly overlaps epigenetic variation, and in what genomic contexts the relationship might be stronger or weaker than expected by random chance. We hypothesized that the relationship between genetic and epigenetic variation would differ between genomic contexts (e.g., promoter regions, quiescent regions) and that larger INDELs might have a stronger relationship with epigenetic variation than SNPs. Odds ratios were calculated as odds(Epigenetic Variation | Genetic Variation)/odds(Epigenetic Variation | No Genetic Variation) between samples, across genomic contexts (genome-wide, promoter, ChromHMM quiescent27) and types of genetic variation. For example, in the comparison between lines F1_1 and F2_1, 1770 differentially accessible peaks overlap a genetic variation, and 64,950 non-differentially accessible peaks overlap a genetic variation genome-wide (Supplementary Fig. 8, inset). The two lines also have 1263 differentially accessible peaks and 87,276 non-differentially accessible peaks that do not overlap a genetic variation. Therefore, the odds ratio is (1770/64,950)/(1263/87,276) = 1.88. An odds ratio greater than 1 indicates that a peak is more likely to be differentially accessible if it overlaps a genetic variation than if it does not. The direct relationship between genetic variation and chromatin accessibility varied by sample comparison, but several interesting patterns in the iPSCs emerged. Genome-wide, a peak was slightly more likely to be differentially accessible if it overlapped any kind of genetic variation than if it did not (aggregate OR = 1.87, 95% CI: (1.83, 1.91)), but these odds were significantly decreased in the context of promoters and increased in ChromHMM quiescent regions (aggregated OR = 1.26 and 2.02; p < 2.2e-16 and p = 2.5e-9, respectively).
Overall, INDELs were similarly associated with differential ATAC peaks as SNPs (aggregated OR = 1.72, 1.73, p = 0.49). However, medium (20–50 bp) and large (> 50 bp) INDELs had the largest odds of differentially accessible peaks genome-wide (aggregated OR = 8.30, 18.01, p-values < 2.2e-16 compared to the OR with any genetic variation genome-wide) (Supplementary Fig. 9).
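The worked F1_1 versus F2_1 example can be reproduced with a few lines of code. The sketch below uses the four peak counts quoted in the text. The confidence interval is a standard Wald interval on the log odds ratio, which is an assumption: the text does not state how the intervals or the aggregated odds ratios were computed. The function and argument names are hypothetical.

```python
import math


def odds_ratio(var_diff, var_nondiff, novar_diff, novar_nondiff):
    """odds(differential | variant) / odds(differential | no variant)."""
    odds_with_variant = var_diff / var_nondiff
    odds_without_variant = novar_diff / novar_nondiff
    return odds_with_variant / odds_without_variant


def wald_ci(var_diff, var_nondiff, novar_diff, novar_nondiff, z=1.96):
    """Approximate 95% Wald interval on the log odds ratio (an assumed method)."""
    log_or = math.log(odds_ratio(var_diff, var_nondiff, novar_diff, novar_nondiff))
    se = math.sqrt(
        1 / var_diff + 1 / var_nondiff + 1 / novar_diff + 1 / novar_nondiff
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Peak counts for the F1_1 vs F2_1 comparison, as quoted in the text.
counts = dict(var_diff=1770, var_nondiff=64950, novar_diff=1263, novar_nondiff=87276)

or_value = odds_ratio(**counts)
ci_low, ci_high = wald_ci(**counts)
print(f"OR = {or_value:.2f}")
```

The printed odds ratio rounds to the 1.88 given in the text for this single comparison; the aggregated values reported across comparisons would require a pooling step that is not sketched here.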
We further sought to determine whether the relationship we saw in iPSCs changed as the cells were differentiated. Despite the overall increased number of differentially accessible peaks in the differentiated cells, the relationship between genetic variation and differential accessibility was much weaker than in the iPSCs: differentially accessible peaks were less likely to be associated with genetic variation in the differentiated cells. Compared to iPSCs, INDELs had a much weaker relationship with differentially accessible peaks genome-wide in the differentiated cell types (iPSC OR = 2.73, NSC OR = 1.29, Motor Neuron OR = 1.24, Monocyte OR = 0.88, p < 2.2e-16 all tests), and that pattern extended across INDEL sizes and genomic contexts (Fig. 2C and Supplementary Fig. 9).
Compared to chromatin accessibility, the odds of differential methylation were less associated with the presence of genetic variation genome-wide (aggregated OR = 0.52, 95% CI: (0.51, 0.53)) (Fig. 2D). DMRs were more likely to be present in regions with INDELs than in regions with SNPs (aggregate OR = 2.05 and 0.43, respectively, p < 2.2e-16). Interestingly, in contrast to chromatin accessibility, differential methylation is similarly associated with genetic variation in the differentiated cells as it is in the iPSCs, although a similar pattern of decreasing ORs is seen in the motor neurons and monocytes (Fig. 2D and Supplementary Fig. 10). For example, DMRs were similarly likely to occur in regions with INDELs genome-wide across cell types (iPSC OR = 2.05, NSC OR = 2.46, Motor Neuron OR = 1.47, Monocyte OR = 1.94).
To test whether insertions or deletions consistently resulted in an increase or decrease, respectively, in normalized read counts, we compared the odds of increased ATAC or RNA reads when the reads overlapped an insertion with the odds when the reads overlapped a deletion. There was no significant increase in the odds of increased ATAC or RNA reads in iPSC comparisons given the presence of an insertion compared to a deletion, indicating that there is not an overall pattern of directional INDEL impact upon chromatin accessibility or gene expression (Supplementary Fig. 11). Taken together, our results indicate that genetic variation has a stronger relationship with chromatin accessibility than with DNA methylation in iPSCs, and that this relationship with chromatin accessibility is not as strong in differentiated cells.
Genetic variation and epigenetic variation interactions may be required for changes in gene expression
We next asked whether the observed differences in chromatin accessibility and DNA methylation are associated with differences in gene expression. We expected that in some cases, epigenetic variation within a gene may be associated with differential gene expression levels, as chromatin accessibility and DNA methylation have documented roles in gene regulation28. In iPSCs, we found that between 0.8% and 14% of differentially expressed genes directly overlapped differentially accessible and/or differentially methylated regions, depending on which iPSC lines were being compared (Supplementary Table 4 and Fig. 3A). Genes that overlapped a differentially accessible peak (with or without a DMR) had a slightly higher difference in expression on average (abs(log2FC) = 1.54) than genes that did not overlap epigenetic variation (abs(log2FC) = 1.07). In Wilcoxon rank-sum tests for each sample comparison, p-values ranged between 4.6e-5 and 0.89 due to low numbers of overlaps with differentially accessible peaks (Supplementary Table 5). To interrogate whether DNA methylation or chromatin accessibility could impact gene expression from distal locations, we used expression quantitative trait loci (eQTLs) from the Human induced Pluripotent Stem Cell Initiative (HipSci) study, which identified eQTLs using 711 iPSC lines from 301 donors29. Genes associated with HipSci eQTLs that overlapped differentially accessible peaks between lines tended to have a larger difference in gene expression (mean abs(log2FC) = 0.91) compared to genes whose associated eQTL was not differentially accessible (mean abs(log2FC) = 0.80) (Fig. 3B). Wilcoxon rank-sum tests for these comparisons had a wide range of power to detect these differences, based on the varying numbers of HipSci eQTLs for each sample comparison, with p-values ranging between 0.04 and 0.98 (Supplementary Table 5).
A Box plot of log2 Fold Change gene expression difference between the two samples listed as a comparison, under the conditions that the gene overlapped a differentially accessible peak (DAP), a differentially methylated region (DMR), both, or neither. Ns for each comparison are listed above the corresponding boxes. Each group was compared to "No DAP or DMR" as a control group (See Supplementary Table 5). Box plots are distributed such that the center line represents the median, the outer edges of the box represent the top and bottom 25%, and the whiskers the minimum and maximum. Outliers are represented as dots outside of the plot. B Box plot of log2 Fold Change gene expression difference between the two samples listed as a comparison, under the condition that an associated HipSci eQTL overlapped a differentially accessible peak (DAP) or non-differentially accessible peak (non-DAP). The distribution of change in gene expression for each comparison was tested against its corresponding non-DAP comparison as a control (See Supplementary Table 5). Ns for each comparison are listed above the corresponding boxes. Box plots are distributed such that the center line represents the median, the outer edges of the box represent the top and bottom 25%, and the whiskers the minimum and maximum. Outliers are represented as dots outside of the plot. C WashU Epigenome Browser47 shot of the gene ZSCAN1. Methylation level is described by blue bars in the WGBS section, normalized read counts are described in the frequency plots for ATAC-seq and RNA-seq. Source data are provided as a Source Data file.
One example of genetic variation that is associated with differences in chromatin accessibility, DNA methylation, and gene expression is found at the gene ZSCAN1. ZSCAN1 is expressed in iPSCs from F1 and M1, with much weaker expression in iPSCs from F2. F2’s copy of ZSCAN1 is methylated and inaccessible at the promoter, which may lead to this difference in expression (Fig. 3C). These differences coincide with a genetic difference at SNP rs495839, in the promoter of ZSCAN1 on chromosome 19, which was identified by HipSci as an eQTL (-log10 p-value = 4.1) in iPSCs29,30 (Supplementary Fig. 12). The downstream gene, ZNF135, is also associated with SNP rs495839. In F2 samples, its expression is lower than in samples from other donors, though not as dramatically as for ZSCAN1. The promoter of ZNF135 is heterozygously methylated in F2 (fully unmethylated in iPSCs from other donors), and the chromatin accessibility is approximately half the level of other samples, indicating that only one copy of the gene may be expressed. NR1D2 has the strongest TF binding motif alignment to the sequence surrounding rs495839 (CATGACCGAGTGGCCCAGGCCC) when the SNP is not present (E = 0.21), and the sequence match weakens when the SNP (T > C) is present (E = 2.06) (Supplementary Fig. 13). Similarly, the match to MYCN’s binding motif is weaker when the SNP is present at that location (E = 2.44 and 21.2 without and with the SNP, respectively) (Supplementary Fig. 14). Both NR1D2 and MYCN are expressed in iPSCs. Overall, we found a trend of higher differences in expression when genes, or their associated eQTLs, directly overlap differentially accessible peaks, although this difference was not statistically significant. This difference may be attributed to the disruption or enhancement of TF binding sites in many cases.
Epigenetic variation that is directly associated with genetic variation is not more likely to be maintained in multiple cell types
We hypothesized that epigenetic variation that is directly associated with genetic variation is more likely to be ubiquitous across cell types. In other words, a peak that is differentially accessible in one cell type and overlaps a genetic variation may be more likely to be differentially accessible in other cell types if the genetic variation is contributing to the difference in accessibility. We reasoned that if an epigenetic variation was directly caused by a genetic variation, it might be more likely that that epigenetic variation is present in multiple cell types. In WGBS, we observed the opposite result. The proportions of cell type-specific DMRs varied depending on sample comparison (Fig. 4A). On average, 7.1% of DMRs (p < 0.0001) that overlapped genetic variation were differentially methylated in more than one cell type, while 14% of DMRs that did not overlap genetic variation were differentially methylated in more than one cell type (two-proportion test, p < 2.2e-16). Among differentially accessible ATAC peaks, only 4.7% of peaks with genetic variation and 5.4% of peaks without genetic variation were not cell type-specific (Fig. 4B). This difference was statistically significant in a two-proportion test (p < 2.2e-16), but the difference is minuscule. In a two-proportion test, differentially expressed genes that overlap genetic variation were significantly more likely to be shared between cell types than differentially expressed genes that do not, by a small margin (23% of genes with genetic variation, 20% of genes without genetic variation, p = 8.178e-5) (Fig. 4C). Overall, DMRs, differentially accessible peaks, and differentially expressed genes are much more likely to be cell type-specific than consistent across cell types, regardless of overlap with genetic variation (Fig. 4).
DMRs and differentially accessible peaks that overlapped genetic variation were not more likely to be shared in more than one cell type than DMRs and differentially accessible peaks that did not overlap genetic variation. This indicates that genetic variation does not directly impact methylation or accessibility in a non-cell-type-specific manner in the cells observed.
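For readers unfamiliar with the two-proportion test used in these comparisons, the sketch below implements a pooled two-sided z-test with the Python standard library. The underlying counts are not given in the text, so the counts used here are hypothetical and chosen only to match the reported 7.1% and 14% proportions; the study may also have used a different implementation (for example, one with a continuity correction).

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: shared DMRs out of all DMRs, with and without
# an overlapping genetic variation (not study data).
z, p_value = two_proportion_z_test(x1=71, n1=1000, x2=140, n2=1000)
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

A negative z here means the first group (DMRs overlapping genetic variation) has the smaller shared fraction, which is the direction reported for WGBS.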
A Stacked bar chart of the percent of differentially methylated regions (DMRs) shared across cell types, divided into groups of regions that overlapped genetic variation and groups of regions that did not overlap a genetic variation between samples. The key indicates which cell type overlap each segment of the bar chart corresponds to. For example, all_cell_types indicates the number of DMRs present in all cell types, and iPSC_NSC indicates that the DMRs are present in both iPSCs and NSCs. B Stacked bar chart of percent of differentially expressed genes shared across cell types, divided into groups of genes that overlapped a genetic variation and groups of genes that did not overlap a genetic variation between samples. C Stacked bar chart of percent of differentially accessible peaks shared across cell types, divided into groups of peaks that overlapped genetic variation and groups of peaks that did not overlap a genetic variation between samples. Source data are provided as a Source Data file.
Cell-type-specific enrichment of GWAS traits varies depending on presence or absence of underlying genetic variation
To investigate the utility of iPSC systems for studying disease mechanisms, we explored whether we could use our data to identify the relevant cell type of GWAS trait-associated SNPs based on chromatin accessibility. Furthermore, we investigated whether GWAS trait-associated loci are more enriched in ATAC peaks with or without genetic variation. We focused on neurological traits with the hypothesis that there would be enrichment for neurological trait-associated loci in the accessible regions of motor neurons or NSCs compared to iPSCs or monocytes. Linkage Disequilibrium Score Regression (LDSC) analysis was conducted to assess cell-type-specific enrichment for GWAS-identified loci in ATAC peaks compared to a randomized set of similarly sized genomic regions; enrichment is indicated by a positive regression coefficient. As our cells come from donors of different ancestries, they were analyzed separately, using GWAS studies relevant to each donor's ancestry.
In cells from donor F2 (European Ancestry), for some traits (ADHD, autism, insomnia, major depressive disorder, schizophrenia), ATAC peaks in motor neurons were more enriched for trait-associated loci compared to other cell types, as expected (Fig. 5A). ATAC peaks were enriched for GWAS loci in several cell types, indicating that several cell types may play a role in these complex traits (ADHD, autism, bipolar disorder, schizophrenia). Notably, there was enrichment for schizophrenia-associated loci in ATAC peaks of iPSCs and NSCs, and ADHD-associated loci in ATAC peaks of iPSCs, which may indicate some developmental mechanism for these traits. In F1 and M1 (Eastern Asian Ancestry), ATAC peaks in motor neurons were strongly enriched for schizophrenia-associated loci, specifically in a schizophrenia study from the Psychiatric Genomics Consortium (PGC) (Supplementary Fig. 15A). It should be noted that most of the GWAS studies included for the donors with Eastern Asian Ancestry were from BioBank Japan31, which had small case numbers for each of the traits, limiting the power of those studies to identify associated SNPs (Supplementary Table 6). The exceptions were a schizophrenia study by Lam et al. and a depression study by Giannakopoulou et al.32,33. This may have resulted in the lower regression coefficients and non-significant p-values for GWAS traits from these studies. In a similar analysis, we hypothesized that ATAC peaks in monocytes may be enriched for immune-related traits (Supplementary Fig. 16). Indeed, we found that monocyte count-related SNPs were enriched in ATAC peaks in monocytes. We found a similar enrichment of monocyte count-associated SNPs in the samples from donors of Eastern Asian ancestry (Supplementary Fig. 17).
A (Upper) Bar chart depicting linkage disequilibrium score regression (LDSC) coefficients of ATAC peaks in any F2 sample compared to randomized genomic regions of equivalent size. A positive coefficient indicates an enrichment of GWAS loci in ATAC peaks. (Lower) Bar chart depicting unadjusted −log10 p-values of LDSC regression coefficients for each GWAS trait. P-values are determined for regression coefficients through two-sided t-tests. B (Upper) Bar chart depicting LDSC coefficients for regression comparing ATAC peaks that were differentially accessible between any lines for F2 to ATAC peaks that were not differentially accessible. (Lower) Bar chart depicting unadjusted −log10 p-values of LDSC regression coefficients for each GWAS trait. P-values are determined for regression coefficients through two-sided t-tests. C (Upper) Bar chart depicting LDSC coefficients of ATAC peaks that did not overlap a genetic variation compared to ATAC peaks that did overlap a genetic variation. A positive coefficient indicates an enrichment of GWAS loci in peaks without genetic variation. (Lower) Bar chart depicting unadjusted −log10 p-values of LDSC regression coefficients for each GWAS trait. P-values are determined for regression coefficients through two-sided t-tests. D WashU Epigenome Browser shot47 of SNP rs4453556 in motor neurons. The SNP location is identified with the first red arrow and vertical black line. Matplot wrap is a line plot showing expression (RNA-seq) of each of the samples, color-coded by donor. Read counts from RNA-seq are displayed for each sample, as well as ATAC-seq. Source data are provided as a Source Data file.
We further sought to find whether ATAC peaks that were differentially accessible between lines might be enriched for GWAS loci compared to ATAC peaks that were conserved across lines. In this LDSC analysis, a positive regression coefficient would indicate enrichment of trait-associated loci in variable ATAC regions compared to conserved ATAC regions. In cells from donor F2, we found that there was no trait tested with significant enrichment of associated loci in conserved ATAC regions compared to variable ATAC regions (Fig. 5B). Two traits had motor neuron-specific enrichment for associated loci in differentially accessible peaks (autism, major depressive disorder). Notably, Alzheimer’s disease had a monocyte-specific enrichment for loci in differentially accessible peaks. In cells from donors F1 and M1, using GWAS studies with Eastern Asian ancestry, we found that Schizophrenia-associated loci were enriched in differentially accessible peaks compared to conserved peaks (Supplementary Fig. 15B). Monocyte count-associated SNPs were enriched in differentially accessible peaks in monocytes in both European and Eastern Asian background samples (Supplementary Figs. 16B, 17B).
In order to interrogate whether genetically conserved regions within ATAC peaks were more likely to harbor trait-associated loci, we used LDSC to test whether ATAC peaks that overlapped genetic variation and ATAC peaks without genetic variation in our donors had different enrichments of GWAS trait-associated SNPs. In this analysis, a positive coefficient indicates enrichment in regions that are genetically conserved between donors. ATAC peaks were collapsed into one list of ATAC peaks for each donor. We calculated regression coefficients for each cell type for several neurological traits and found that in motor neurons, ATAC peaks without genetic variation were significantly more likely to overlap loci associated with several neurological traits, including ADHD, autism, insomnia, major depressive disorder, and schizophrenia (Fig. 5C). Results for F1 and M1 did not recapitulate these findings, with small regression coefficients and non-significant p-values (Supplementary Fig. 15C). Monocyte count-associated SNPs were enriched in genetically conserved ATAC peaks in monocytes in both European and Eastern Asian background samples (Supplementary Figs. 16C, 17C).
As an illustrative example, the SNP rs4453556 is associated with several neurological phenotypes, including ADHD, autism, insomnia, bipolar disorder, Alzheimer’s disease, Parkinson’s disease, obsessive-compulsive disorder, post-traumatic stress disorder, and schizophrenia, with varying directional effects, but its highest estimated effect size among the studies included is in schizophrenia (Z = 3.945). This SNP is located in a region that is accessible in motor neurons in each of the samples and is upstream of the gene NPTX1, a neuronal gene, and RPTOR, a gene involved in cell growth (Fig. 5D). F2 is homozygous for the alternative allele of rs4453556, whereas M1 and F1 are homozygous for the reference allele. Motor neurons derived from F2 have very low expression of NPTX1 compared to motor neurons derived from the other two donors, but there is no expression difference in RPTOR. rs4453556 has been identified in some eQTL studies as a non-significant association with NPTX1 in varying directions in different cell types, but there is no documented association in the brain in FIVEx30 (Supplementary Fig. 18). There is a strong eQTL observation of rs4453556 in blood, but we do not see any expression differences in our monocyte samples (Supplementary Fig. 19). To further interrogate the potential mechanism of this SNP’s impact, we looked at how it may affect TF binding. ZEB1, which is expressed in motor neurons, has the strongest TF binding motif alignment to the sequence surrounding rs4453556 (CACATCACCTGTCGTGTGTCA) when the SNP is not present (E = 0.61), and the sequence match decreases when the SNP (T > A) is present (E = 10.0) (Supplementary Fig. 20).
These results suggest that epigenomic data from iPSCs and iPSC-derived cell types may bear valuable information for interrogating the mechanisms of various diseases, including those that arise during development.
Discussion
We have provided a comprehensive catalog detailing the direct interaction between genetic variation and epigenetic variation using human iPSCs and iPSC-derived cell types. We generated high-quality WGS, WGBS, ATAC-seq, and RNA-seq data in replicates for six iPSC lines from three donors (Supplementary Data 1). Further, these iPSC lines were differentiated into NSCs, motor neurons, and monocytes, for which we also present high-quality WGBS, ATAC-seq, and RNA-seq data. We have identified 817-39,313 (depending on sample comparison) cell-type-specific DMRs that overlapped with genetic variation, 7181-67,837 cell-type-specific differentially accessible peaks that overlapped with genetic variation, and 164-556 cell-type-specific differentially expressed genes that overlapped with genetic variation. At the line level, we have identified 1288-40,105 (depending on sample comparison) DMRs overlapping genetic variation between lines across cell types, 7689-71,953 differentially accessible peaks overlapping genetic variation between lines, and 254-868 differentially expressed genes overlapping genetic variation between lines. All of this provides an important resource for researchers to investigate cell-type-specific epigenomes in the context of epigenetic variation.
Previous studies have shown that iPSCs maintain donor-specific DNA methylation and gene expression patterns even after reprogramming, providing a tractable system that is not overwhelmed by random environmental variations18,19. We found that iPSC lines were generally more similar to lines from the same donor (Fig. 1E–G), supporting the hypothesis that donor-specific epigenetic marks are maintained after reprogramming and that iPSCs are a useful model for studying the relationship between genetic variation and epigenetic variation. We observed variation among iPSCs from the same donor (Supplementary Fig. 1), which is consistent with previous findings26,34. The variability among iPSCs from the same donor indicates that some variation in DNA methylation, chromatin accessibility, and gene expression is unexplained by genetic variation or environmental factors, and is likely due to stochasticity within the cells and the reprogramming process of iPSCs26. Notably, we did not observe any significant enrichment for relevant gene ontology terms in differentially expressed genes between lines from the same donor (Supplementary Table 3).
Epigenetic buffering involves epigenetic conservation in functionally important regions, where the epigenetic marks may be more important for function than the genetic sequence itself. This is important because it contradicts the idea that the interaction between epigenetic and genetic variation is in one direction. In a study of the relationship between epigenetics and structural variation in the human and chimpanzee genomes, Zhuo et al. found a 2-fold enrichment of medium-sized indels (20bp to 50bp) in CpG island (CGI)-containing promoters when compared to a random distribution of INDELs35. This was surprising, considering the functional impact that promoters have on gene expression. In these regions, DNA methylation and chromatin accessibility profiles were generally conserved across regions, supporting a hypothesis of epigenetic buffering of the impact of genetic variation36. Our finding that iPSC ATAC peaks that overlap medium-sized INDELs, specifically in promoters, were less likely to be differentially accessible (Fig. 2C) did not reach statistical significance. This trend does align with the previous finding of epigenetic buffering in this context, making it an interesting concept to follow up on in future study. Previous studies have demonstrated that genetic variation impacts the epigenome through several mechanisms, such as disrupting transcription factor binding sites or well-characterized mutations in genes encoding epigenetic factors37,38. However, how the epigenome may impact genetic variation is largely unstudied. One mechanism of epigenetic impact on genetic variation is the phenomenon of epigenetic buffering, which involves maintained epigenetic marks in genomic regions, resulting in a dampening of negative selection on genetic mutations, increasing the potential for genetic variation36,39,40. Comparative epigenomics across species has revealed instances of epigenetic buffering in the context of evolution36. 
It would be interesting to extend these studies to test the hypothesis that epigenetic buffering of genetic variation exists within human individuals and further quantify these instances genome-wide and across four different cell types.
As the cells were differentiated, variation in gene expression and chromatin accessibility increased (Fig. 1C, D, F, G). The variation in differentiated cells was more robust to genetic variation than the epigenetic landscapes of iPSCs (Fig. 2F). Banovich et al. showed that GTEx eQTLs for heart cells were also present in iPSC-derived cardiomyocytes, which indicates that the epigenetic architecture for cell-type-specific gene regulation is maintained in iPSC-derived cells18. We found that DNA methylation, chromatin accessibility, and gene expression variations were likely to be cell-type-specific (Fig. 5), which is reflective of cell-type-specific epigenetic structure in iPSC-derived cells. Variation is more dependent upon cell type than genetic variation.
The relationship between the epigenome and gene expression is complex. In a simple model, genes are regulated by open chromatin and low methylation at promoters28,41. In reality, gene expression can be regulated by transcription factor binding, distal regulatory elements, post-translational modifications, and more28. Limited by sample size, we had the power to analyze patterns in the direct relationship between gene expression, chromatin accessibility, and DNA methylation differences of that gene, and found a very limited relationship (Fig. 4A). Extending this analysis to known iPSC eQTLs, we found a trend of increased difference in gene expression if an associated eQTL had differential accessibility; however, this trend was not significant (Fig. 4B). These trends highlight the need for future study with a larger sample size in order to elucidate the relationship between genetic variation and epigenetic variation on distal regulatory elements. We identified genes associated with known eQTLs (e.g., ZSCAN1, ZNF135) that were apparently regulated by chromatin accessibility and methylation (Fig. 4C). This is one example of direct interaction between genetic variation and epigenetic variation resulting in gene expression differences between cell types.
Genome-wide association studies (GWAS) have identified SNPs associated with various disease phenotypes. In order to better understand which of these associated SNPs are functionally important and in what cell type context, we can overlap GWAS variants with functional genomic data, including chromatin accessibility42. Previous studies have utilized iPSC transcriptomics with a similar goal of understanding GWAS SNP mechanism43,44, but the inclusion of chromatin accessibility data sheds deeper insight into which loci are potentially active, and in which cell types. We found that accessible chromatin regions without genetic variation in our motor neuron samples were enriched for loci associated with several neurological phenotypes (Fig. 5C). The regions without genetic variation might be more enriched for these traits because it is expected that causative SNPs are less common in the population45. These findings emphasize the utility of iPSCs and iPSC-derived cells for taking advantage of the knowledge GWAS has given us and interrogating disease mechanisms.
One limitation of our study is that our data was sequenced using short reads, which limits our ability to detect structural variants (SVs)46. In addition, direct alignment of short reads to hg38 results in reference bias, where reads with variants that do not appear in hg38 are less likely to be mapped. This would cause a potential bias in our data wherein reads containing variants may have a lower mapping frequency, which would decrease the number of reads we would see. To avoid such reference bias, we employed a personalized genome mapping strategy, using vcf2diploid25 to edit each individual’s variants into hg38 and create personalized reference genomes to align our data to (see “Methods”). The epigenetic data was then lifted back over to hg38 to compare between samples.
In order to gain insights for disease modeling, data from the epithelial cells that the iPSCs were derived from would be critical. Our study is limited in this aspect, and conclusions about the efficacy of iPSC-derived disease models cannot be confidently made without epigenetic information from the cells prior to reprogramming. Future research that includes epigenetic data from the donor cells before iPSC reprogramming would be a valuable resource to understand how patient epigenetic marks are maintained through reprogramming and differentiation.
In conclusion, iPSCs maintain some donor-specific epigenetic signatures that may be linked to genetic variation. Those signatures are generally not maintained as the iPSCs are differentiated. In the differentiated cells, epigenetic variation and gene expression variation increase, but the direct relationship to genetic variation is weaker, genome-wide. The interaction between genetic variation and epigenetic variation can impact gene expression. However, variation in gene expression is much more complicated than direct overlaps of differential methylation or chromatin accessibility, highlighting the need for a further study with a large sample size and other genomics data modalities. Finally, we have elucidated the utility of studying epigenetic variation in iPSCs and their derivatives for identifying important loci for GWAS studies and the cell types in which they may be acting.
Methods
iPSCs were reprogrammed from epithelial cells in donated urine samples. Authors have complied with all relevant ethical regulations, as approved by the Washington University IRB, with IRB study ID 201911152. Informed consent was obtained from all study participants, including for the release of demographic information including age, sex, and ethnicity.
iPSC Reprogramming
Renal epithelial cells were collected from fresh urine samples from three donors: a related daughter and father, and one unrelated individual (see Supplementary Table 1). iPSCs were reprogrammed using the CytoTune™ iPS 2.0 Sendai Reprogramming Kit following the manufacturer’s instructions (ThermoFisher). Several lines, clonal and pooled, were derived from each donor (Supplementary Table 2). The iPSCs were reprogrammed as pools and clones as a separate endeavor from the Genome Engineering & Stem Cell Center (GESC@MGI) to find out whether pools would be epigenetically different from clonally derived lines. We did not see any indication that lines from pools (e.g., F1_1) were more variable than lines from clones (e.g., F1_3). In the iPSCs, F1_1 (pool), F1_2 (pool), and F1_3 (clone) were all very similar, with very few differentially expressed genes, indicating that variation was not inherently caused by clone/pool differentiation. Each of the six lines was grown in duplicate, for 12 total cultures in two 6-well plates. All iPSC lines were harvested simultaneously.
NSC Differentiation
NSCs were differentiated according to Du, 201547. In a 12-well plate, iPSCs were kept in NEP induction medium (50% DMEM, 50% Neural basal Medium, 0.5X B27, 0.5X N2, 0.1 mM Ascorbic acid, 1X GlutaMAX, 1X P/S, 2 µM DMH1, 2 µM SB-431542, 3 µM CHIR99021) for 6 days and sorted for the CD184+/CD271− population (Supplementary Fig. 21). All NSC lines were harvested simultaneously.
Motor neuron differentiation
Motor Neurons were further differentiated from NSCs47. The cells were kept in Motor Neuron Progenitor (MNP) induction medium (50% DMEM, 50% Neural basal medium (Gibco), 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 1X P/S, 2 µM DMH1, 2 µM SB-431542, 1 µM CHIR99021, 0.5 µM Pur, 0.1 µM Retinoic Acid) for 6 days. MNPs were expanded in MNP expansion medium (50% DMEM, 50% Neural basal medium, 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 1X P/S, 2 µM DMH1, 2 µM SB-431542, 3 µM CHIR99021, 0.5 µM Pur, 0.5 mM VPA, 0.1 mM Retinoic Acid). The cells then spent 6 days in Motor Neuron induction medium (50% DMEM, 50% Neural basal medium, 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 1X P/S, 0.1 µM Pur, 0.5 µM Retinoic Acid). Finally, the cells were matured into CHAT-positive motor neurons in PDL/Laminin-coated wells (50% DMEM, 50% Neural basal medium, 0.5X N2, 0.5X B27, 0.1 mM Ascorbic Acid, 1X GlutaMAX, 0.1 µM Pur, 0.5 µM Retinoic Acid, 0.1 µM Compound E). Motor Neuron morphology before harvesting is documented in Supplementary Fig. 22.
Monocyte differentiation
Monocytes were differentiated using the StemDiff Monocyte kit. iPSCs were treated with ROCKi before dissociation. Cells were counted at a density of 2.5–3 million per well of 6-well plates in an HSC differentiation medium (1X P/S, 20 ng/ml hSCF, 40 ng/ml hBMP4, 40 ng/ml hVEGF). Embryoid bodies (EBs) were harvested on day 1 using a reversible cell strainer (Stem Cell Technologies). The medium was changed every three days. On day 12, cells were transferred to monocyte differentiation medium (1X P/S, 10% FBS, 1X GlutaMAX, 25 ng/ml hIL3, 100 ng/ml hM-CSF). Cells were kept for 14 days before sorting for CD14+/CD11b+ and then harvested (Supplementary Fig. 23). Due to insufficient material, the differentiation of samples M1_1 and F2_1 was repeated for both technical replicates at a later time.
WGS Library preparation
Genomic DNA was extracted from each of the 6 lines and processed separately. Genomic DNA samples were quantified using the Qubit Fluorometer 3.0, and approximately 25 ng of genomic DNA was used for sample assessment. Genomic DNA (~600 ng) was fragmented on the Covaris LE220 instrument targeting ~375 bp inserts. A QC check was done post-fragmentation to assess DNA quality using the Agilent Bioanalyzer. Automated dual-indexed libraries were constructed with the KAPA Hyper PCR-free library prep kit (ROCHE DIAGNOSTICS SYSTEMS, Cat # 7962371001) on the SciCloneG3 NGS instrument platform (Perkin Elmer) in a 96-well format. The concentration of each library was accurately determined through qPCR utilizing the KAPA library Quantification Kit according to the manufacturer’s protocol (KAPA Biosystems/Roche) to produce cluster counts appropriate for the Illumina NovaSeq6000 instrument. Normalized libraries were sequenced on a NovaSeq6000 S4 Flow Cell using the XP workflow and a 151 × 10 × 10 × 151 sequencing recipe according to the manufacturer’s protocol. Target sequencing depth was determined prior to pooling, and samples were pooled in ratios based on the targeted depth.
ATAC Library preparation
ATAC-seq was performed as described in Dong et al. 2020. 100,000–200,000 cells from each of the iPSC lines and differentiated lines were collected upon reaching confluence. Cells were resuspended and centrifuged at 500 × g for 10 min at 4 °C. The supernatant was aspirated. The cells were resuspended in 300 μl DNaseI solution (20 mM Tris pH 7.4, 150 mM NaCl, 1 × reaction buffer with MgCl2, 0.1 U/μl DNaseI) on ice for 5 min. Subsequently, 1 ml of PBS (0.04% BSA) was added, and the mixture was centrifuged at 500 × g for 10 min at 4 °C. The cells were washed twice in PBS + BSA and resuspended in 100 μl ATAC-seq RSB (10 mM Tris pH 7.4, 10 mM NaCl, 3 mM MgCl2 in water) with 0.1% NP40, 0.1% Tween-20, and 0.01% Digitonin and incubated on ice for 3 min to lyse. After lysis, 1 mL of ATAC-seq RSB with 0.1% Tween-20 was added and mixed by inversion. Next, the nuclei were collected by centrifugation at 1000 × g for 10 min at 4 °C. The nuclei were resuspended in 20 µL 2 × TD buffer (20 mM Tris pH 7.6, 10 mM MgCl2, 20% Dimethyl Formamide), and 50,000 nuclei were transferred to a tube with 2 × TD buffer filled up to 25 µL. 25 µL of transposition mix [2.5 µL Transposase (100 nM final) (Illumina, 20034197), 16.5 µL PBS, 0.5 µL 1% digitonin, 0.5 µL 10% Tween-20, and 5 µL water] was added. The transposition reactions were mixed and incubated for 30 min at 37 °C with gentle tapping every 10 min. Reactions were purified with the Zymo DNA Clean and Concentrator-5 kit (Zymo Research, D4014). The ATAC-seq library was amplified for nine cycles with Nextera primers in a PCR machine. The PCR reaction was purified with Sera-Mag Select beads (Cytiva, 29343057) using double size selection with 27.5 µL beads (0.55 × sample volume) and 50 µL beads (1.55 × sample volume). The ATAC-seq libraries were quantified by Qubit assays and sequenced on an Illumina NextSeq 500 platform.
RNA-seq Library preparation
200–500 k cells from each of the iPSC lines and differentiated lines were treated with Trizol for lysing. RNA was extracted using a Zymo Direct-zol RNA Microprep kit (Zymo Research, R2060). 200 ng of RNA from each sample was used to generate RNA-seq libraries using the NuGEN Universal Plus mRNA-Seq + UDI kit (Tecan, 0520). The libraries underwent 2 × 75 paired-end sequencing by Illumina NextSeq 500.
WGBS Library preparation
Whole Genome Bisulfite Sequencing data was prepared from 300–600 k cells using the xGen Methyl-Seq DNA Library Prep Kit (IDT, 10009860). DNA was extracted using the Zymo Quick-DNA Microprep Kit (Zymo Research, R3020). DNA was sheared on a BioRuptor PICO for 9 cycles of 30 s on, 30 s off to achieve fragments distributed around 300 bp. Sheared DNA was purified using the Zymo DNA Clean and Concentrator-5 kit (Zymo Research, D4014). 50–100 ng of DNA fragments mixed with 0.5% unmethylated lambda DNA were bisulfite converted using the EZ DNA Methylation-Gold Kit (Zymo, D5005). WGBS libraries were sequenced on the Illumina NovaSeq 6000 platform.
WGS Alignment and variant analysis
WGS data was aligned to hg38 using BWA48 and deduplicated using Picard v2.22.0. The aligned sequences were joint-genotyped using GATK HaplotypeCaller49. It was verified that there were no genetic variants among lines from the same donor, so reads from lines of the same donor were combined, and alignment and variant calling were repeated. Structural variants were called using Parliament250 in conjunction with MANTA51, DELLY52, LUMPY53, CNVNator54, and dysgu55, and then combined using SURVIVOR56. Genetic variation between donors was determined by comparing the hg38 called variants between donors using VCFtools57 and BEDTools58. Variants were required to have a 75 bp mappability of greater than 0.5 to be included in analyses.
Personalized reference fasta
Personalized reference fastas were created by editing each individual’s variants into hg38 using vcf2diploid25. As most of the variants were un-phased, variants were sorted at random into “maternal” and “paternal” haplotype reference files, to be combined after alignment.
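The random haplotype assignment described above can be illustrated with a minimal sketch. This is not the vcf2diploid implementation; the function name `split_unphased`, the tuple layout, and the handling of homozygous calls (placed on both haplotypes) are assumptions made for illustration only.

```python
import random

def split_unphased(variants, seed=0):
    """Sort unphased variants into hypothetical 'maternal' and 'paternal' lists.

    variants: list of (chrom, pos, ref, alt, genotype) tuples, where genotype
    is an unphased call such as "0/1" or "1/1" (assumed input layout).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    maternal, paternal = [], []
    for chrom, pos, ref, alt, genotype in variants:
        record = (chrom, pos, ref, alt)
        if genotype == "1/1":
            # Homozygous alternate: both haplotypes carry the variant.
            maternal.append(record)
            paternal.append(record)
        elif genotype in ("0/1", "1/0"):
            # Heterozygous and unphased: choose one haplotype at random.
            (maternal if rng.random() < 0.5 else paternal).append(record)
    return maternal, paternal

# Toy variants, not taken from the study.
variants = [
    ("chr1", 1000, "A", "G", "0/1"),
    ("chr1", 2000, "C", "T", "1/1"),
    ("chr2", 500, "G", "GA", "0/1"),
]
maternal, paternal = split_unphased(variants)
```

Because reads are aligned to both haplotype references and the results are combined afterwards, the arbitrary assignment of a heterozygous variant to one haplotype or the other should not matter for the combined signal under this scheme.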
ATAC Analysis
ATAC-seq reads were aligned to personalized references using AIAP59. Bam alignments were lifted over to hg38 using CrossMap60, and peak calls were lifted using liftOver61. Calls for each of the haplotypes were combined to create alignment and peakCall files for each of the samples. QC and analysis on ATAC-seq libraries were performed using AIAP59. Peaks in the ATAC data were called using MACS262 and visualized on the WashU Epigenome Browser63. Batch correction using ComBat-seq64 was applied to the Monocyte samples to remove batch effects of asynchronous library preparation (see Methods: Monocyte Differentiation). Data from sex chromosomes was excluded from all analyses. Differentially accessible peaks were calculated using DiffBind65. Intra-cluster distances in PCAs were calculated using the clv package in R. Intra-cluster distances are defined as the Euclidean distances between the furthest two points in the cluster, across all principal components. In this case, clusters were defined as samples from the same cell type. Observations in the principal components analyses were normalized read counts under peaks for each sample. Odds ratios were calculated as the odds that a called peak is differentially accessible between lines given that the peak overlaps a genetic variation, divided by the odds of a peak being differentially accessible given that it does not overlap a genetic variation. Odds ratios were calculated in a genome-wide context (all peaks), as well as specifically for peaks that overlap ChromHMM Quiescent/Heterochromatin regions27 and peaks that overlap promoters. Odds ratios were also calculated for different types of genetic variation in those genomic contexts, including SNPs, INDELs, DUPs, and INVs. P-values were calculated in odds ratio tests for whether the contextual odds ratios are significantly different from the odds ratio observed in a genome-wide context. All overlaps were computed using BEDTools58.
Odds ratio (OR) statistical differences were calculated as follows.
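As a hedged illustration only, the sketch below computes an odds ratio from a 2 × 2 table of peak counts and compares two odds ratios with a Wald z-test on the log odds ratios. The choice of a Wald test, the function names, and all counts are assumptions for illustration and are not taken from the study.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of peak counts.

    a: overlaps a variant, differentially accessible
    b: overlaps a variant, not differentially accessible
    c: no variant overlap, differentially accessible
    d: no variant overlap, not differentially accessible
    """
    return (a / b) / (c / d)

def log_or_standard_error(a, b, c, d):
    """Standard error of the log odds ratio (Woolf approximation)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def compare_odds_ratios(table_1, table_2):
    """Two-sided Wald z-test for a difference between two log odds ratios."""
    diff = math.log(odds_ratio(*table_1)) - math.log(odds_ratio(*table_2))
    se = math.sqrt(
        log_or_standard_error(*table_1) ** 2
        + log_or_standard_error(*table_2) ** 2
    )
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# Hypothetical counts: promoter peaks versus all peaks.
promoter_table = (20, 80, 10, 90)
genome_wide_table = (300, 700, 200, 800)
z, p_value = compare_odds_ratios(promoter_table, genome_wide_table)
```

This form of the test treats the two tables as independent. A contextual subset such as promoter peaks is contained in the genome-wide set, so the resulting p-value should be read as approximate.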
RNA-seq Analysis
RNA-seq data was aligned to personalized references using STAR66. The personalized alignments to each haplotype were lifted to hg38 using liftOver61 and added together to form each sample’s hg38 alignment. Data from sex chromosomes was excluded from all analyses. Gene count matrices were generated using featureCounts from the Rsubread package67. Batch correction using ComBat-seq was applied to the Monocyte samples to remove batch effects of asynchronous library preparation (see Methods: Monocyte Differentiation). Differentially expressed genes and normalized gene expression differences were analyzed using DESeq268. Promoter regions were defined as ±2 kb from the transcription start site (TSS) of a gene. Gene ontology analysis was performed using ShinyGo69, with a background gene set including any gene that had >1 RPKM expression in any iPSC line. Intra-cluster distances in PCAs were calculated using the clv package in R. Intra-cluster distances are defined as the Euclidean distances between the furthest two points in the cluster, across all principal components. In this case, clusters were defined as samples from the same cell type. Observations in the principal components analyses were per-gene normalized read counts for each sample. RNA-seq reads were matched with HipSci target genes29, joining by gene name. The eQTL loci were then directly overlapped with chromatin accessibility and methylation statistics using BEDTools58. Motif analysis was conducted by providing the hg38 sequence 10 bp upstream and downstream of rs495839 to TOMTOM70, with the reference sequence and the alternative sequence in the rs495839 position, filtering by E < 20 to catch large differences in E-values between the sequences.
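The intra-cluster distance defined above (the Euclidean distance between the two furthest points of a cluster, across all principal components) can be sketched in a few lines. This is a plain re-implementation of that definition rather than the clv package itself, and the coordinates are made up for illustration.

```python
import math
from itertools import combinations

def intra_cluster_distance(points):
    """Largest pairwise Euclidean distance among the samples of one cluster.

    points: list of equal-length coordinate tuples, one per sample, holding
    that sample's scores on every principal component.
    """
    if len(points) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(points, 2))

# Hypothetical PC scores for three samples of one cell type.
cluster = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
diameter = intra_cluster_distance(cluster)
```

A larger value for a cell type indicates that its samples are more spread out in principal component space.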
WGBS Analysis
WGBS was aligned to personalized references. CpG methylation counts were lifted to hg38 using liftOver61 for each of the haplotypes and added together to form each sample’s hg38 alignment. WGBS alignment and methylation calling was performed as in https://github.com/hyungjoo-lee/wgbs. Reads were trimmed (first 10 bp) using TrimGalore (Version 0.4.5, https://github.com/FelixKrueger/TrimGalore). Trimmed reads were processed using Bismark71. Data from sex chromosomes was excluded from all analyses. Differential methylation was called using DSS72 using smoothing and a p-value cut-off of 0.001. Odds ratios were calculated as the odds of a differentially methylated region (DMR), as opposed to a non-differentially methylated CpG island, between lines given that the region overlaps a genetic variation, divided by the odds of a DMR given that it does not overlap a genetic variation. Odds ratios are calculated in genome-wide context (all DMRs and non-DMR CpG islands), as well as specifically regions that overlap ChromHMM Quiescent/Heterochromatin regions27, and regions that overlap promoters. There are also odds ratios calculated for different types of genetic variation in those genomic contexts, including SNPs, INDELs. P-values were calculated using odds ratio tests for whether the contextual odds ratios are significantly different from the genome-wide odds ratio. All overlaps were computed using BEDTools58.
Bootstrapped hierarchical clustering
Bootstrapped hierarchical clustering was performed using pvclust73 in R (Supplementary Figs. 3–5). Input for the gene expression clustering was DESeq2-normalized gene expression values with variance >0.3 across samples. Input for chromatin accessibility clustering was DiffBind-normalized ATAC-seq reads under peaks with variance >0.3 across samples. Input for methylation clustering was CpG methylation levels of CpGs with variance >0.05 across samples. For each clustering, 10,000 bootstraps were performed. The correlation distance metric was employed, with complete linkage.
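The input preparation for such a clustering can be sketched as follows: features are filtered by variance across samples, and a correlation distance is computed between samples. The sketch assumes sample variance and a distance of 1 minus the Pearson correlation; it does not perform the bootstrapping or the clustering itself, and the gene names and values are invented.

```python
import math
from statistics import mean, variance

def filter_by_variance(matrix, threshold):
    """Keep features whose sample variance across samples exceeds threshold.

    matrix: dict mapping feature name -> list of values, one per sample.
    """
    return {name: vals for name, vals in matrix.items() if variance(vals) > threshold}

def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(matrix):
    """Sample-by-sample distance matrix, assumed here to be 1 - Pearson r."""
    n_samples = len(next(iter(matrix.values())))
    columns = [[vals[i] for vals in matrix.values()] for i in range(n_samples)]
    return [[1 - pearson(ci, cj) for cj in columns] for ci in columns]

# Hypothetical normalized expression for four samples.
expression = {
    "geneA": [1.0, 1.2, 5.0, 5.1],
    "geneB": [2.0, 2.1, 2.0, 2.1],  # low variance, removed by the filter
    "geneC": [0.5, 0.4, 3.0, 3.2],
    "geneD": [4.0, 4.1, 1.0, 0.8],
}
filtered = filter_by_variance(expression, 0.3)
distances = correlation_distance(filtered)
```

In this toy example, the first two samples end up much closer to each other than to the last two, which is the structure a complete-linkage clustering on these distances would then recover.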
LDSC Analysis
ATAC peaks were concatenated for each donor, for each cell type, using BEDOPS -u and BEDOPS -merge74. Merged ATAC peaks were then sorted into peaks with and without variation amongst donors (any variation compared to any donor) using BEDTools intersect58. Cell-type-specific LDSC was conducted according to https://github.com/bulik/ldsc (v1.0.0). LDSC was conducted using 1000 Genomes75 and HapMap376 reference files from https://zenodo.org/records/7768714. F2 was regressed using EUR reference files, and M1 and F1 were regressed using EAS reference files. The cell-type-specific regression was done comparing peaks with no genetic variation to peaks with genetic variation for the enrichment of trait-associated loci (summary statistics from: EUR Parkinson’s77, EUR PTSD75, EUR Schizophrenia78, EUR Major Depressive Disorder79, EUR Obsessive Compulsive Disorder80, EUR Alzheimer’s Disease81, EUR Bipolar Disorder82, EUR Attention Deficit Hyperactivity Disorder83, EUR Autism Spectrum Disorder84, EUR Monocyte Count85, EUR Lupus86, EUR Type 1 Diabetes87, EUR Ulcerative Colitis88, EUR Multiple Sclerosis89, EUR Autoimmune Vitiligo90, EUR Rheumatoid Arthritis91, EUR Osteoarthritis92, EUR Gout93, EUR Insomnia94; from Biobank Japan31: EAS Depression, Epilepsy, Insomnia, Parkinson’s, Schizophrenia, Substance Dependence, Food Allergy, Lupus, Monocyte Count, Psoriasis, Rheumatoid Arthritis, and Type 1 Diabetes). Additional GWAS studies with Eastern Asian participants for Schizophrenia32 and Major Depressive Disorder33 were also included.
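The step that sorts merged peaks into those with and without genetic variation can be illustrated with a small interval-overlap sketch. It stands in for the BEDTools intersect step conceptually and is not a replacement for it; the function name is hypothetical, variants are simplified to single 0-based positions, and peaks are assumed to be half-open BED-style intervals.

```python
from bisect import bisect_left

def split_peaks_by_variation(peaks, variants):
    """Partition peaks by whether any variant position falls inside them.

    peaks: list of (chrom, start, end), half-open intervals as in BED.
    variants: list of (chrom, pos) with 0-based positions (simplification:
    INDELs and structural variants would need interval overlap instead).
    """
    positions_by_chrom = {}
    for chrom, pos in variants:
        positions_by_chrom.setdefault(chrom, []).append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    with_variation, without_variation = [], []
    for chrom, start, end in peaks:
        positions = positions_by_chrom.get(chrom, [])
        # First variant at or after the peak start; it overlaps if before end.
        i = bisect_left(positions, start)
        overlaps = i < len(positions) and positions[i] < end
        (with_variation if overlaps else without_variation).append((chrom, start, end))
    return with_variation, without_variation

# Toy coordinates, not taken from the study.
peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
variants = [("chr1", 150), ("chr1", 400), ("chr2", 10)]
with_variation, without_variation = split_peaks_by_variation(peaks, variants)
```

The two resulting peak sets correspond to the annotations that are compared against each other in the regression.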
Statistics & reproducibility
No statistical method was used to predetermine sample size, and no data was excluded from the analysis. Reproducibility of clustering results was verified through bootstrapping. Strict Bonferroni significance cut-offs were employed to determine statistical significance in Wilcoxon rank-sum tests.
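For readers unfamiliar with the Bonferroni cut-off, a minimal sketch is given here. The family-wise level of 0.05 and the p-values are assumptions for illustration; the text does not state the level that was used.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant under a Bonferroni-corrected cut-off.

    alpha is the family-wise error rate (0.05 is an assumed default);
    each p-value is compared against alpha divided by the number of tests.
    """
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]

# Hypothetical p-values from three rank-sum tests.
flags = bonferroni_significant([0.001, 0.02, 0.04])
```

With three tests the per-test cut-off becomes 0.05 / 3, so only the first of these hypothetical p-values passes.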
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
To protect the privacy of donors, donors consented to individual-level sequencing data to be released only to authorized investigators. Individual level sequencing data is available to authorized investigators through dbGaP controlled access under release phs003649.v1.p1 here: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003649/phs003649.v1.p1. Authorized investigators include senior investigators (tenure track professor, senior scientist, or equivalent) and NIH Investigators by logging into eRA commons and downloading through the dbGaP Authorized Access System upon approval of the Data Access Request (DAR). The average wait time for a DAR to be processed is two weeks, and approved investigators will maintain access for one year with the ability to renew. A summary of the data release can be found here: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003649/phs003649.v1.p1/release_notes/Release_Notes.phs003649.iPSCs_HealthyVolunteers.v1.p1.MULTI.pdf. HIPSCI data analyzed in this work can be accessed through https://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/QTS000016/QTD000361/. 
Included GWAS studies can be accessed at the following links (also included in Supplementary Table 6): EUR ADHD: https://doi.org/10.1038/s41588-018-0269-7; EUR Alzheimer’s: https://doi.org/10.1038/s41588-018-0311-9; EUR Autism: https://doi.org/10.1038/s41588-019-0344-8; EUR Bipolar: https://doi.org/10.1093/hmg/ddw181; EUR Insomnia: https://doi.org/10.1038/s41588-018-0333-3; EUR Major Depressive Disorder: https://doi.org/10.1038/s41593-018-0326-7; EUR OCD: https://doi.org/10.1038/mp.2017.154; EUR Parkinson’s: https://doi.org/10.1016%2FS1474-4422(19)30320-5; EUR Monocyte Count: https://doi.org/10.1016/j.cell.2016.10.042; EUR Systemic Lupus Erythematosus: https://doi.org/10.1038/ng.3434; EUR Type 1 Diabetes: https://doi.org/10.1038/s41586-021-03552-w; EUR Ulcerative Colitis: https://doi.org/10.1038/ng.3760; EUR Multiple Sclerosis: https://doi.org/10.1038/nature10251; EUR Autoimmune Vitiligo: https://doi.org/10.1038/ng.3680; EUR Rheumatoid Arthritis: https://doi.org/10.1038/nature12873; EUR Osteoarthritis: https://doi.org/10.1038/s41588-018-0327-1; EUR Gout: https://doi.org/10.1038/s41588-019-0504-x; EUR PTSD: https://doi.org/10.1038/s41467-019-12576-w; EUR Schizophrenia: https://doi.org/10.1038/nature13595; EAS Depression, Epilepsy, Insomnia, Parkinson’s, Schizophrenia, Substance Dependence, Food Allergy, Lupus, Monocyte Count, Psoriasis Vulgaris, Rheumatoid Arthritis, and Type 1 Diabetes: https://doi.org/10.1038/s41588-021-00931-x; EAS Depression: https://doi.org/10.1001/jamapsychiatry.2021.2099; and EAS Schizophrenia: https://doi.org/10.1038/s41588-019-0512-x. Source data are provided with this paper.
References
Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Zhang, Q. et al. Genome-wide open chromatin regions and their effects on the regulation of silk protein genes in Bombyx mori. Sci. Rep. 7, 12919 (2017).
Gibney, E. R. & Nolan, C. M. Epigenetics and gene expression. Heredity 105, 4–13 (2010).
Natarajan, A., Yardimci, G. G., Sheffield, N. C., Crawford, G. E. & Ohler, U. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res. 22, 1711–1722 (2012).
Wagner, J. R. et al. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol. 15, R37 (2014).
Matsubara, Y. et al. Inactivation of sonic hedgehog signaling and polydactyly in limbs of hereditary multiple malformation, a novel type of talpid mutant. Front. Cell Dev. Biol. 4, 149 (2016).
Tatton-Brown, K. et al. Mutations in the DNA methyltransferase gene DNMT3A cause an overgrowth syndrome with intellectual disability. Nat. Genet. 46, 385–388 (2014).
Do, C. et al. Genetic–epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol. 18, 120 (2017).
Feil, R. & Fraga, M. F. Epigenetics and the environment: emerging patterns and implications. Nat. Rev. Genet. 13, 97–109 (2012).
Wong, C. C. Y. et al. A longitudinal study of epigenetic variation in twins. Epigenetics 5, 516–526 (2010).
Zhu, X. et al. Effects of ultraviolet B exposure on DNA methylation in patients with systemic lupus erythematosus. Exp. Ther. Med. 5, 1219–1225 (2013).
Stenz, L., Schechter, D. S., Serpa, S. R. & Paoloni-Giacobino, A. Intergenerational transmission of DNA methylation signatures associated with early life stress. Curr. Genomics 19, 665–675 (2018).
Ekamper, P., van Poppel, F., Stein, A. D. & Lumey, L. H. Independent and additive association of prenatal famine exposure and intermediary life conditions with adult mortality between age 18–63 years. Soc. Sci. Med. 119, 232–239 (2014).
Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872 (2007).
Berdasco, M. & Esteller, M. DNA methylation in stem cell renewal and multipotency. Stem Cell Res. Ther. 2, 42 (2011).
Godini, R., Lafta, H. Y. & Fallahi, H. Epigenetic modifications in the embryonic and induced pluripotent stem cells. Gene Expr. Patterns 29, 1–9 (2018).
Maherali, N. et al. Directly reprogrammed fibroblasts show global epigenetic remodeling and widespread tissue contribution. Cell Stem Cell 1, 55–70 (2007).
Banovich, N. E. et al. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131 (2018).
Kyttälä, A. et al. Genetic variability overrides the impact of parental cell type and determines iPSC differentiation potential. Stem Cell Rep. 6, 200–212 (2016).
Burrows, C. K. et al. Genetic variation, not cell type of origin, underlies the majority of identifiable regulatory differences in iPSCs. PLOS Genet. 12, e1005793 (2016).
Wu, D.-Y., Bittencourt, D., Stallcup, M. R. & Siegmund, K. D. Identifying differential transcription factor binding in ChIP-seq. Front. Genet. 6, 169 (2015).
Singh, V. K., Kalsan, M., Kumar, N., Saini, A. & Chandra, R. Induced pluripotent stem cells: applications in regenerative medicine, disease modeling, and drug discovery. Front. Cell Dev. Biol. 3, 2 (2015).
Doss, M. X. & Sachinidis, A. Current challenges of iPSC-based disease modeling and therapeutic implications. Cells 8, 403 (2019).
Grezenko, H. et al. Epigenetics in neurological and psychiatric disorders: A comprehensive review of current understanding and future perspectives. Cureus 15, e43960 (2023).
Rozowsky, J. et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Syst. Biol. 7, 522 (2011).
Carcamo-Orive, I. et al. Analysis of transcriptional variability in a large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532 (2017).
van der Velde, A. et al. Annotation of chromatin states in 66 complete mouse epigenomes during development. Commun. Biol. 4, 1–15 (2021).
Goldberg, A. D., Allis, C. D. & Bernstein, E. Epigenetics: A landscape takes shape. Cell 128, 635–638 (2007).
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Kwong, A. et al. FIVEx: an interactive eQTL browser across public datasets. Bioinformatics 38, 559–561 (2021).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Giannakopoulou, O. et al. The genetic architecture of depression in individuals of East Asian Ancestry: A genome-wide association study. JAMA Psychiatry 78, 1258–1269 (2021).
Sardo, V. L. et al. The effect of aging on human induced pluripotent stem cells. Nat. Biotechnol. 35, 69–74 (2017).
Zhuo, X., Du, A. Y., Pehrsson, E. C., Li, D. & Wang, T. Epigenomic differences in the human and chimpanzee genomes are associated with structural variation. Genome Res. https://doi.org/10.1101/gr.263491.120 (2020).
Xiao, S. et al. Comparative epigenomic annotation of regulatory DNA. Cell 149, 1381–1392 (2012).
Jin, B. & Robertson, K. D. DNA methyltransferases, DNA damage repair, and cancer. Adv. Exp. Med. Biol. 754, 3–29 (2013).
Zhou, J. et al. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation. BMC Genomics 18, 724 (2017).
O’Dea, R. E., Noble, D. W. A., Johnson, S. L., Hesselson, D. & Nakagawa, S. The role of non-genetic inheritance in evolutionary rescue: epigenetic buffering, heritable bet hedging and epigenetic traps. Environ. Epigenetics 2, dvv014 (2016).
Li, Z. et al. LHP1-mediated epigenetic buffering of subgenome diversity and defense responses confers genome plasticity and adaptability in allopolyploid wheat. Nat. Commun. 14, 7538 (2023).
Razin, A. & Cedar, H. DNA methylation and gene expression. Microbiol. Rev. 55, 451 (1991).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Bonder, M. J. et al. Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics. Nat. Genet. 53, 313–321 (2021).
Warren, C. R. et al. Induced pluripotent stem cell differentiation enables functional validation of GWAS variants in metabolic disease. Cell Stem Cell 20, 547–557 (2017).
Gorlov, I. P., Gorlova, O. Y., Frazier, M. L., Spitz, M. R. & Amos, C. I. Evolutionary evidence of the effect of rare variants on disease etiology. Clin. Genet. 79, 199–206 (2011).
Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).
Du, Z.-W. et al. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat. Commun. 6, 6626 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at https://doi.org/10.1101/201178 (2018).
Zarate, S. et al. Parliament2: Accurate structural variant calling at scale. GigaScience 9, giaa145 (2020).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Abyzov, A., Urban, A. E., Snyder, M. & Gerstein, M. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
Cleal, K. & Baird, D. M. Dysgu: efficient structural variant calling using short or long reads. Nucleic Acids Res. 50, e53 (2022).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Liu, S. et al. AIAP: A quality control and integrative analysis package to improve ATAC-seq data analysis. Genom. Proteom. Bioinform. 19, 641–651 (2021).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Kuhn, R. M. et al. The UCSC genome browser database: update 2009. Nucleic Acids Res. 37, D755–D761 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Li, D. et al. WashU epigenome Browser update 2022. Nucleic Acids Res. 50, W774–W781 (2022).
Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics Bioinforma. 2, lqaa078 (2020).
Stark, R. & Brown, G. DiffBind: Differential binding analysis of ChIP-Seq peak data.
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 47, e47 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Feng, H. & Wu, H. Differential methylation analysis for bisulfite sequencing using DSS. Quant. Biol. Beijing China 7, 327–334 (2019).
Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540–1542 (2006).
Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).
1000 Genomes Project Consortium. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Altshuler, D. M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Howard, D. M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat. Neurosci. 22, 343–352 (2019).
International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS). Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol. Psychiatry 23, 1181–1188 (2018).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Hou, L. et al. Genome-wide association study of 40,000 individuals identifies two novel loci associated with bipolar disorder. Hum. Mol. Genet. 25, 3383–3394 (2016).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
Bentham, J. et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464 (2015).
Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594, 398–402 (2021).
de Lange, K. M. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 49, 256–261 (2017).
International Multiple Sclerosis Genetics Consortium. et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219 (2011).
Jin, Y. et al. Genome-wide association analyses identify 13 new susceptibility loci for generalized vitiligo. Nat. Genet. 44, 676–680 (2012).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Tachmazidou, I. et al. Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data. Nat. Genet. 51, 230–236 (2019).
Tin, A. et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat. Genet. 51, 1459–1474 (2019).
Jansen, P. R. et al. Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways. Nat. Genet. 51, 394–403 (2019).
Acknowledgements
We thank members of the Wang Lab for thoughtful discussions of this project; Jessica Hoisington-López and Maria Lynn Jaeger from The Edison Family Center for Genome Sciences and Systems Biology, and the McDonnell Genome Institute for assistance with sequencing; Yang Li and Nancy Lim Saccone for help and insights with linkage disequilibrium score regression. This work was supported by R01HG007175, U41HG010972, U24HG012070 (T.W.), R35HG011922 (M.H.) and T32HG000045-18 (K.Q.).
Author information
Contributions
K.Q., T.W., and M.H. conceived and designed this study. X.C., Y.-H.C., Y.M., A.N., and V.S. reprogrammed iPSCs and differentiated the cells. X.X. and K.Q. performed library preparations. K.Q. analyzed the data. K.Q. and A.T. reviewed the literature to contextualize the results. K.Q. wrote the manuscript with input from T.W., M.H., and X.X. All authors subsequently edited and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Leng Han and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Quaid, K., Xing, X., Chen, YH. et al. iPSCs and iPSC-derived cells as a model of human genetic and epigenetic variation. Nat Commun 16, 1750 (2025). https://doi.org/10.1038/s41467-025-56569-4
DOI: https://doi.org/10.1038/s41467-025-56569-4