Abstract
Epigenetic modifications influence gene expression levels, impact organismal traits, and play a role in the development of diseases. Therefore, variants in genes involved in epigenetic processes are likely to be important in disease susceptibility, and the frequency of these variants may vary between populations of African and European ancestry. Here, we analyse an integrated dataset to define the frequencies, associated traits, and functional impact of epigenetic gene variants among individuals of African and European ancestry represented in the UK Biobank. We find that the frequencies of 88.4% of epigenetic gene variants differ significantly between these groups. Furthermore, we find that these variants map to many reported traits and diseases, and we show that allele-frequency differences can alter statistical power and the likelihood of detecting associations across ancestry groups, particularly given the substantial sample-size imbalance between the UK Biobank European-ancestry and African-ancestry subsets. Additionally, we observe that variants associated with traits are significantly enriched for quantitative trait loci that affect DNA methylation, chromatin accessibility, and gene expression. We find that methylation quantitative trait loci account for 71.2% of the variants influencing gene expression. Moreover, variants linked to biomarker traits exhibit high correlation. We therefore conclude that epigenetic gene variants associated with traits tend to differ in their allele frequencies between African and European populations and are enriched for QTLs.
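The dependence of statistical power on allele frequency and sample size can be illustrated with a minimal sketch. It is not the analysis used in this study: it assumes a standardised quantitative trait, a single-variant additive test, Hardy-Weinberg equilibrium, and a normal approximation to the test statistic. The function name `gwas_power`, the effect size, and the two sample sizes are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def gwas_power(maf, n, beta, alpha=5e-8):
    """Approximate power of a two-sided single-variant additive test.

    Assumes a trait with unit variance, Hardy-Weinberg equilibrium and a
    small per-allele effect `beta` (in trait standard deviations).
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Fraction of trait variance explained by the variant: 2p(1-p)beta^2.
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Expected value of the Z statistic (square root of the non-centrality).
    mu = (n * var_explained) ** 0.5
    std = NormalDist()
    # Two-sided critical value at significance level alpha.
    z_crit = -std.inv_cdf(alpha / 2.0)
    # P(Z > z_crit) + P(Z < -z_crit) for Z ~ Normal(mu, 1).
    return std.cdf(mu - z_crit) + std.cdf(-mu - z_crit)


if __name__ == "__main__":
    # Hypothetical cohort sizes illustrating a large imbalance.
    for label, n in (("larger cohort", 400_000), ("smaller cohort", 6_000)):
        for maf in (0.05, 0.30):
            power = gwas_power(maf, n, beta=0.05)
            print(f"{label}: n={n}, MAF={maf:.2f}, power={power:.4f}")
```

Under these assumptions, the same per-allele effect is far easier to detect when the allele is common and the cohort is large, so a variant with different frequencies in two groups of very different size can reach genome-wide significance in one group and not in the other.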
Data availability
The raw datasets that support the findings of this manuscript are available from the sources listed below. The pre-processed datasets are accessible via Zenodo (https://doi.org/10.5281/zenodo.12789774; ref. 90) under the Creative Commons Attribution 4.0 license. List of Resources and Datasets: UK Biobank: Biomarker measurements, allele frequencies, and GWAS summary statistics of various biomarkers. https://www.ukbiobank.ac.uk ; https://pan-ukb-us-east-1.s3.amazonaws.com. dbSNP: SNP data and allele frequencies. https://www.ncbi.nlm.nih.gov/snp/. GWAS Catalog: Associations between SNPs and various traits. https://www.ebi.ac.uk/gwas/. GTEx: eQTL, hQTL, and sQTL data. https://gtexportal.org/. mQTLdb: mQTLs. http://www.mqtldb.org. OncoBase Databases: mQTLs. https://ngdc.cncb.ac.cn/databasecommons/database/id/6069. European Genome-phenome Archive: H3Africa datasets. https://ega-archive.org/about/ega/ (accession EGAD00001008577). Reactome Pathways: Epigenetic gene annotations. https://reactome.org/. Ensembl BioMart: Gene annotations and chromosomal positions. http://mart.ensembl.org/info/data/biomart/index.html. Ensembl Variant Effect Predictor: Functional impact of SNPs. http://mart.ensembl.org/info/docs/tools/vep/index.html.
Code availability
Code to reproduce most of the results and plots is available from the following GitHub repository: https://github.com/smsinks/epigenetic-gene-variant-dynamics-analysis.
References
Toth, R. et al. Genetic variants in epigenetic pathways and risks of multiple cancers in the GAME-ON consortium. Cancer Epidemiol. Biomark. Prev. 26, 816–825 (2017).
Cebrian, A. et al. Genetic variants in epigenetic genes and breast cancer risk. Carcinogenesis 27, 1661–1669 (2006).
Egger, G., Liang, G., Aparicio, A. & Jones, P. A. Epigenetics in human disease and prospects for epigenetic therapy. Nature 429, 457–463 (2004).
Van Loo, K. M. J. et al. Epigenetic genes and epilepsy—emerging mechanisms and clinical applications. Nat. Rev. Neurol. 18, 530–543 (2022).
Ntontsi, P., Photiades, A., Zervas, E., Xanthou, G. & Samitas, K. Genetics and epigenetics in asthma. Int. J. Mol. Sci. 22 (2021).
Li, Q. L. et al. Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers. Nat. Commun. 12, 6407 (2021).
Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).
Barnes, K. C. Genomewide association studies in allergy and the influence of ethnicity. Curr. Opin. Allergy Clin. Immunol. 10, 427–433 (2010).
Chan, S. L., Jin, S., Loh, M. & Brunham, L. R. Progress in understanding the genomic basis for adverse drug reactions: a comprehensive review and focus on the role of ethnicity. Pharmacogenomics 16, 1161–1178 (2015).
Bien, S. A. et al. The future of genomic studies must be globally representative: Perspectives from PAGE. Annu. Rev. Genom. Hum. Genet. 20, 181–200 (2019).
Sinkala, M., Elsheikh, S. S. M., Mbiyavanga, M., Cullinan, J. & Mulder, N. J. A genome-wide association study identifies distinct variants associated with pulmonary function among European and African ancestries from the UK Biobank. Commun. Biol. 6, 49 (2023).
Ueta, M. et al. Genome-wide association study using the ethnicity-specific Japonica array: identification of new susceptibility loci for cold medicine-related Stevens-Johnson syndrome with severe ocular complications. J. Hum. Genet. 62, 485–489 (2017).
Jorgenson, E. et al. Genetic contributors to variation in alcohol consumption vary by race/ethnicity in a large multi-ethnic genome-wide association study. Mol. Psychiatry 22, 1359–1367 (2017).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Ahsan, T., Urmi, N. J. & Sajib, A. A. Heterogeneity in the distribution of 159 drug-response related SNPs in world populations and their genetic relatedness. PLoS One 15, e0228000 (2020).
Wright, M. L., Ware, E. B., Smith, J. A., Kardia, S. L. R. & Taylor, J. Y. Joint influence of SNPs and DNA methylation on lipids in African Americans from hypertensive sibships. Biol. Res. Nurs. 20, 161–167 (2018).
Carja, O. et al. Worldwide patterns of human epigenetic variation. Nat. Ecol. Evol. 1, 1577–1583 (2017).
Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).
Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: Opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).
Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D. & Mountain, J. L. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 96, 37–53 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Mo, X. B. et al. Integrative analysis revealed potential causal genetic and epigenetic factors for multiple sclerosis. J. Neurol. 266, 2699–2709 (2019).
Stephens, K. E. et al. Associations between genetic and epigenetic variations in cytokine genes and mild persistent breast pain in women following breast cancer surgery. Cytokine 99, 203–213 (2017).
Morimoto, Y., Ono, S., Kurotaki, N., Imamura, A. & Ozawa, H. Genetic and epigenetic analyses of panic disorder in the post-GWAS era. J. Neural Transm. (Vienna) 127, 1517–1526 (2020).
Daskalakis, N. P., Rijal, C. M., King, C., Huckins, L. M. & Ressler, K. J. Recent genetics and epigenetics approaches to PTSD. Curr. Psychiatry Rep. 20, 30 (2018).
Freeman, D. M. & Wang, Z. Epigenetic vulnerability of insulator CTCF motifs at Parkinson’s disease-associated genes in response to neurotoxicant rotenone. Front. Genet. 11, 627 (2020).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Gnad, F., Doll, S., Manning, G., Arnott, D. & Zhang, Z. Bioinformatics analysis of thousands of TCGA tumors to determine the involvement of epigenetic regulators in human cancer. BMC Genom. 16 (Suppl 8), S5 (2015).
Fiziev, P. P. et al. Rare penetrant mutations confer severe risk of common diseases. Science 380, eabo1131 (2023).
Truelsen, D., Pereira, V., Phillips, C., Morling, N. & Borsting, C. Evaluation of a custom GeneRead massively parallel sequencing assay with 210 ancestry informative SNPs using the Ion S5 and MiSeq platforms. Forensic Sci. Int. Genet. 50, 102411 (2021).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Nyangiri, O. A. et al. Copy number variation in human genomes from three major ethno-linguistic groups in Africa. BMC Genom. 21, 289 (2020).
H3Africa Consortium et al. Research capacity. Enabling the genomic revolution in Africa. Science 344, 1346–1348 (2014).
Malaria Genomic Epidemiology Network. Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania. Nat. Commun. 10, 5732 (2019).
Pazoki, R. et al. Genetic analysis in European ancestry individuals identifies 517 loci associated with liver enzymes. Nat. Commun. 12, 2579 (2021).
Chen, V. L. et al. Genome-wide association study of serum liver enzymes implicates diverse metabolic and liver pathology. Nat. Commun. 12, 816 (2021).
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
Asif, H. et al. GWAS significance thresholds for deep phenotyping studies can depend upon minor allele frequencies and sample size. Mol. Psychiatry 26, 2048–2055 (2021).
Kim, Y. J. et al. The contribution of common and rare genetic variants to variation in metabolic traits in 288,137 East Asians. Nat. Commun. 13, 6642 (2022).
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).
Zhang, Q. et al. Genotype effects contribute to variation in longitudinal methylome patterns in older people. Genome Med. 10, 75 (2018).
Pulit, S. L. et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum. Mol. Genet. 28, 166–174 (2019).
Vujkovic, M. et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691 (2020).
Emison, E. S. et al. Differential contributions of rare and common, coding and noncoding Ret mutations to multifactorial Hirschsprung disease liability. Am. J. Hum. Genet. 87, 60–74 (2010).
Gate, R. E. et al. Genetic determinants of co-accessible chromatin regions in activated T cells across humans. Nat. Genet. 50, 1140–1150 (2018).
Pierce, B. L. et al. Co-occurring expression and methylation QTLs allow detection of common causal variants and shared biological mechanisms. Nat. Commun. 9, 804 (2018).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
Tewary, S. K., Zheng, Y. G. & Ho, M. C. Protein arginine methyltransferases: insights into the enzyme structure and mechanism at the atomic level. Cell. Mol. Life Sci. 76, 2917–2932 (2019).
Chen, Z. et al. The emerging role of PRMT6 in cancer. Front. Oncol. 12, 841381 (2022).
Gudjonsson, A. et al. A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat. Commun. 13, 480 (2022).
Le Clerc, S. et al. Genomewide association study of a rapid progression cohort identifies new susceptibility alleles for AIDS (ANRS Genomewide Association Study 03). J. Infect. Dis. 200, 1194–1201 (2009).
Sinkala, M., Elsheikh, S., Mbiyavanga, M., Cullinan, J. & Mulder, N. Multitrait genome-wide analysis in the UK biobank reveals novel and distinct variants influencing cardiovascular traits in Africans and Europeans. medRxiv 2022.02.27.22268990 (2022).
Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum, J. D. & Lin, X. Sequence kernel association tests for the combined effect of rare and common variants. Am. J. Hum. Genet. 92, 841–853 (2013).
Morabia, A. et al. Association between lipoprotein lipase (LPL) gene and blood lipids: a common variant for a common trait? Genet. Epidemiol. 24, 309–321 (2003).
Campbell, M. C. & Tishkoff, S. A. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu. Rev. Genom. Hum. Genet. 9, 403–433 (2008).
Benonisdottir, S. et al. Epigenetic and genetic components of height regulation. Nat. Commun. 7, 13490 (2016).
Burgio, E., Lopomo, A. & Migliore, L. Obesity and diabetes: from genetics to epigenetics. Mol. Biol. Rep. 42, 799–818 (2015).
Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
Esoh, K. K. et al. Genome-wide association study identifies novel candidate malaria resistance genes in Cameroon. Hum. Mol. Genet. 32, 1946–1958 (2023).
Huang, Q. Q., Ritchie, S. C., Brozynska, M. & Inouye, M. Power, false discovery rate and Winner’s Curse in eQTL studies. Nucleic Acids Res. 46, e133 (2018).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Zhu, Z. et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J. Allergy Clin. Immunol. 145, 537–549 (2020).
Shi, H., Mancuso, N., Spendlove, S. & Pasaniuc, B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. Am. J. Hum. Genet. 101, 737–751 (2017).
Rafnar, T. et al. Variants associating with uterine leiomyoma highlight genetic background shared by various cancers and hormone-related traits. Nat. Commun. 9, 3636 (2018).
Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).
Zhu, Z. et al. A genome-wide cross-trait analysis from UK Biobank highlights the shared genetic architecture of asthma and allergic diseases. Nat. Genet. 50, 857–864 (2018).
Ball, R. D. Designing a GWAS: power, sample size, and data structure. Methods Mol. Biol. 1019, 37–98 (2013).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018).
Polanski, A. & Kimmel, M. New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth. Genetics 165, 427–436 (2003).
Mulder, N. et al. H3Africa: current perspectives. Pharmgenomics Pers. Med. 11, 59–66 (2018).
Apaga, D. L., Dennis, S. E., Salvador, J. M., Calacal, G. C. & De Ungria, M. C. Comparison of two massively parallel sequencing platforms using 83 single nucleotide polymorphisms for human identification. Sci. Rep. 7, 398 (2017).
Marina, H. et al. Study on the concordance between different SNP-genotyping platforms in sheep. Anim. Genet. 52, 868–880 (2021).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Lin, S. H., Thakur, R. & Machiela, M. J. LDexpress: an online tool for integrating population-specific linkage disequilibrium patterns with tissue-specific expression data. BMC Bioinform. 22, 608 (2021).
Kimes, P. K., Liu, Y., Hayes, D. N. & Marron, J. S. Statistical significance for hierarchical clustering. Biometrics 73, 811–821 (2017).
Belkina, A. C. et al. Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nat. Commun. 10, 5415 (2019).
Buniello, A. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Li, X. et al. OncoBase: a platform for decoding regulatory somatic mutations in human cancers. Nucleic Acids Res. 47, D1044–D1055 (2019).
Gaunt, T. R. et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 17, 61 (2016).
Pan-UKB team. Quality Control (QC) | Pan UKBB. https://pan-dev.ukbb.broadinstitute.org/docs/qc/index.html (2021).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
H, G. Manhattan Plots for visualisation of GWAS results - File Exchange—MATLAB Central n.d. https://www.mathworks.com/matlabcentral/fileexchange/69549-manhattan-plots-for-visualisation-of-gwas-results?s_tid=srchtitle (2021).
Barton, A. R., Sherman, M. A., Mukamel, R. E. & Loh, P. R. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat. Genet. 53, 1260–1269 (2021).
Benjamini, Y. & Yekutieli, D. Quantitative trait Loci analysis using the false discovery rate. Genetics 171, 783–790 (2005).
Sinkala, M. Epigenetic-Gene-Variant-Dynamics-Analyiss (Zenodo, 2024).
Acknowledgements
We thank the following individuals for their valuable support and collaboration in this research: Enock Matovu (matovue04@yahoo.com), Busisiwe Mlotshwa (mlotshwab@ub.ac.bw), Simo Gustave (gsimoca@yahoo.fr), and Martin Simuunza (martin.simuunza@unza.zm).
Funding
This research has been conducted using the UK Biobank Resource under Application Number 53163. The funding for this project was provided by H3ABioNet, supported by the National Institutes of Health Common Fund under grant number U24HG006941. Clement A. Adebamowo and Sally N. Adebamowo were supported by the African Collaborative Center for Microbiome and Genomics Research (ACCME) Grant (1U54HG006947), funds through the Maryland Department of Health’s Cigarette Restitution Fund Program (CH-649-CRF), and the University of Maryland Greenebaum Cancer Center Support Grant (P30CA134274). The content of this publication is solely the authors’ responsibility and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
The study was conceptualised by Musalula Sinkala (M.S.), Gaone Retshabile (G.R.), Phelelani T. Mpangase (P.T.M.), Nicola Mulder (N.M.), Salia Bamba (S.B.), Modibo K. Goita (M.K.G.), Victoria Nembaware (V.N.), Samar S. M. Elsheikh (S.S.M.E.), Jeannine Heckmann (J.H.), Kevin Esoh (K.E.), Mogomotsi Matshaba (M.M.), Guida Landoure (G.L.), Ambroise Wonkam (A.W.), Michele Ramsay (M.R.), Clement A. Adebamowo (C.A.A.), Sally N. Adebamowo (S.N.A.), and Ofon Elvis Amih (O.E.A.). The methodology was designed by M.S., G.R., P.T.M., and N.M. Data collection and provision were carried out by M.S., G.R., P.T.M., S.B., M.K.G., V.N., S.S.M.E., J.H., K.E., O.E.A., G.L., A.W., M.M., M.R., C.A.A., and S.N.A. Formal analysis of the data was performed by M.S., G.R., S.B., and M.K.G. The manuscript was drafted by M.S., G.R., P.T.M., and N.M. Editing and reviewing of the manuscript were carried out by M.S., G.R., P.T.M., S.B., M.K.G., G.L., V.N., S.S.M.E., J.H., K.E., M.M., A.W., M.R., C.A.A., S.N.A., O.E.A., and N.M. M.S. produced data visualisations. N.M. supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
The study protocol was approved by the University of Cape Town Health Sciences Research Ethics Committee (IRB00001938).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sinkala, M., Retshabile, G., Mpangase, P.T. et al. Mapping epigenetic gene variant dynamics: comparative analysis of frequency, functional impact and trait associations in African and European populations. Sci Rep (2026). https://doi.org/10.1038/s41598-026-41871-y
DOI: https://doi.org/10.1038/s41598-026-41871-y


