Mobile element variation contributes to population-specific genome diversification, gene regulation and disease risk

Abstract

Mobile genetic elements (MEs) are heritable mutagens that recursively generate structural variants (SVs). ME variants (MEVs) are difficult to genotype and integrate in statistical genetics, obscuring their impact on genome diversification and traits. We developed a tool that accurately genotypes MEVs using short-read whole-genome sequencing (WGS) and applied it to global human populations. We find unexpected population-specific MEV differences, including an Alu insertion distribution distinguishing Japanese from other populations. Integrating MEVs with expression quantitative trait loci (eQTL) maps shows that MEV classes regulate tissue-specific gene expression by shared mechanisms, including creating or attenuating enhancers and recruiting post-transcriptional regulators, supporting class-wide interpretability. MEVs more often associate with gene expression changes than SNVs, thus plausibly impacting traits. Performing genome-wide association study (GWAS) with MEVs pinpoints potential causes of disease risk, including a LINE-1 insertion associated with keloid and fasciitis. This work implicates MEVs as drivers of human divergence and disease risk.
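
To make the eQTL idea in the abstract concrete, the sketch below regresses one gene's expression on MEV genotype dosage (0, 1 or 2 copies of the insertion allele) by ordinary least squares. It is a minimal illustration under stated assumptions, not the pipeline used in this study: the function name `dosage_effect` and the toy values are hypothetical, and a real analysis would also adjust for covariates and correct for multiple testing.

```python
from statistics import mean


def dosage_effect(dosages, expression):
    """Return (slope, intercept) of expression regressed on allele dosage.

    dosages: insertion-allele counts per individual (0, 1 or 2).
    expression: normalized expression of one gene for the same individuals.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    x_bar = mean(dosages)
    y_bar = mean(expression)
    # Sum of squared genotype deviations; zero means no genotype variation.
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy data (hypothetical): expression drops by 0.5 per insertion allele.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [2.0, 2.0, 1.5, 1.5, 1.0, 1.0]
slope, intercept = dosage_effect(toy_dosages, toy_expression)
```

A negative slope here would correspond to an insertion allele associated with lower expression; the sign and size of the slope are what an eQTL test evaluates for each variant-gene pair.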

Fig. 1: Discovery and accurate genotyping of MEVs in global and Japanese populations.
Fig. 2: Biased distribution of MEVs.
Fig. 3: eQTL analysis with MEVs.
Fig. 4: Alu insertions in regulatory elements.
Fig. 5: Alu insertions in 3’UTRs.
Fig. 6: MEVs associate with disease.

Data availability

MEVs identified from 1000GP and summary statistics of the eQTL analysis were uploaded to Zenodo (https://doi.org/10.5281/zenodo.7703708). The positions and allele frequencies of MEVs identified in Japanese individuals recruited by BBJ are available from the National Bioscience Database Center (https://humandbs.biosciencedbc.jp/en/, accession ID: hum0014.v28) without any access restrictions. Summary statistics of GWAS are publicly available from our website (JENGER, http://jenger.riken.jp/en). The human reference genomes human_g1k_v37, hs37d5 and GRCh38DH are available from the 1000GP repository (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz, and http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa, respectively). The 30× WGS data from 3,202 individuals recruited by 1000GP were downloaded from the 1000GP website (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/). The SV callset generated by the phased assembly variant caller (PAV) for part of 1000GP is available from the 1000GP repository (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/release/v2.0/integrated_callset/variants_freeze4_sv_insdel.tsv.gz). The SV callset generated by PanGenie for part of 1000GP is available from the 1000GP repository (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/release/v1.0/PanGenie_results). The GATK-SV callset for part of 1000GP is available from the 1000GP repository (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20210124.SV_Illumina_Integration/1KGP_3202.Illumina_ensemble_callset.freeze_V1.vcf.gz). The human repeat library is available from RepBase (https://www.girinst.org/repbase).
Gene models, GenCode GTF v26 and v26lift37, are available from GenCode (https://www.gencodegenes.org/human). SNVs identified in 1000GP participants are available from the 1000GP repository (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502). The ENCODE cCRE dataset is available from the ENCODE repository (https://screen.encodeproject.org). Accession numbers of histone ChIP-seq data from ENCODE are summarized in Supplementary Table 7. The other ESC datasets we used are available from the ENCODE repository (https://screen.encodeproject.org) under accession numbers ENCFF601NBW (CpG methylation), ENCFF524BMX (CpG methylation), ENCFF379ZXG (CHG methylation), ENCFF086MMC (CHG methylation), ENCFF417VRB (CHH methylation), ENCFF918PML (CHH methylation), ENCFF000KUF (Repli-Chip), ENCFF000KUG (Repli-Chip), ENCFF000KUK (Repli-Chip), ENCFF905XDS (DNase-seq), ENCFF574LKL (DNase-seq), ENCFF338KTY (DNase-seq), ENCFF821AQO (CTCF ChIP-seq), ENCFF418QVJ (phospho-Pol-II A ChIP-seq), ENCFF422HDN (Pol-II ChIP-seq), and ENCFF834UVX (EP300 ChIP-seq). Annotations of DHS were obtained from data deposited under accession number ENCFF503GCK. Methylation data of iPSCs and ESCs obtained by genome tiling array and deposited under accession number GSE60821 were used. Hi-C data of H1-hESCs deposited under accession numbers GSM5057489 and GSM5057481 were used. Gene expression profiles during spermatogenesis and in early embryos, deposited under accession numbers GSE120508 and GSE36552, were used. Read count tables of RNA sequencing performed by GTEx are available from the GTEx repository (https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_exon_reads.parquet and GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct.gz). Covariates for eQTL analysis (that is, PEER factors, genetic PCs, library preparation methods, sequencing platforms, and sex) and phased SNVs and indels in GTEx are available from NCBI under dbGaP accession number phs000424.
The script used to collapse the GenCode GTF file is available at https://github.com/broadinstitute/gtex-pipeline/blob/master/gene_model/collapse_annotation.py. The script used to apply TMM normalization is available at https://github.com/broadinstitute/gtex-pipeline/blob/master/qtl/src/eqtl_prepare_expression.py. MEVs identified in Cao et al. are available from the following URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02101-4. Gene sets used for gene-set enrichment analysis are available from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb/). Summary statistics of 7,221 GWAS performed by Pan-UKB were downloaded from the URLs listed in the following file: https://pan-ukb-us-east-1.s3.amazonaws.com/sumstats_release/phenotype_manifest.tsv.bgz. The ‘tophit’ variants in BBJ were downloaded from http://jenger.riken.jp:8080/top_hits and https://pheweb.jp.
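
As a minimal sketch of how a downloaded copy of a block-gzipped manifest such as `phenotype_manifest.tsv.bgz` might be inspected, the snippet below reads a gzip-compatible TSV with Python's standard library (block-gzipped files can be read as multi-member gzip). The helper name `read_manifest` and the column names in the toy file are hypothetical; the real header should be checked before relying on any column.

```python
import csv
import gzip
import os
import tempfile


def read_manifest(path):
    """Read a gzip-compatible TSV and return (header, rows as dicts)."""
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return reader.fieldnames, rows


# Build a tiny stand-in file; column names and values are hypothetical.
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy_manifest.tsv.bgz")
with gzip.open(toy_path, mode="wt", newline="") as out:
    out.write("phenotype_id\tsumstats_url\n")
    out.write("trait_a\thttps://example.org/trait_a.tsv.bgz\n")
    out.write("trait_b\thttps://example.org/trait_b.tsv.bgz\n")

header, rows = read_manifest(toy_path)
# Collect the per-trait URLs that a download step would iterate over.
urls = [row["sumstats_url"] for row in rows]
```

Reading the header first and selecting columns by name keeps the download step from depending on column order, which may differ between manifest releases.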

Code availability

MEGAnE is deposited in Zenodo (https://doi.org/10.5281/zenodo.7703696) and is available from GitHub and Docker Hub (https://github.com/shohei-kojima/MEGAnE, https://hub.docker.com/repository/docker/shoheikojima/megane). A complete environment including MEGAnE and other required software is available from Docker Hub.

References

  1. Wells, J. N. & Feschotte, C. A field guide to eukaryotic transposable elements. Annu. Rev. Genet. 54, 539–561 (2020).

  2. Serrato-Capuchina, A. & Matute, D. R. The role of transposable elements in speciation. Genes (Basel) 9, 254 (2018).

  3. Payer, L. M. & Burns, K. H. Transposable elements in human genetic disease. Nat. Rev. Genet. 20, 760–772 (2019).

  4. Kobayashi, K. et al. An ancient retrotransposal insertion causes Fukuyama-type congenital muscular dystrophy. Nature 394, 388–392 (1998).

  5. Hancks, D. C. & Kazazian, H. H. Jr. Roles for retrotransposon insertions in human disease. Mob. DNA 7, 9 (2016).

  6. Kagawa, T. et al. Recessive inheritance of population-specific intronic LINE-1 insertion causes a rotor syndrome phenotype. Hum. Mutat. 36, 327–332 (2015).

  7. Goubert, C., Zevallos, N. A. & Feschotte, C. Contribution of unfixed transposable element insertions to human regulatory variation. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 375, 20190331 (2020).

  8. Chiang, C. et al. The impact of structural variation on human gene expression. Nat. Genet. 49, 692–699 (2017).

  9. Payer, L. M. et al. Alu insertion variants alter gene transcript levels. Genome Res. 31, 2236–2248 (2021).

  10. Cao, X. et al. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol. 21, 185 (2020).

  11. Sekar, A. et al. Schizophrenia risk from complex variation of complement component 4. Nature 530, 177–183 (2016).

  12. Payer, L. M. et al. Structural variants caused by Alu insertions are associated with risks for many human diseases. Proc. Natl Acad. Sci. USA 114, E3984–E3992 (2017).

  13. Jacques, P.-É., Jeyakani, J. & Bourque, G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9, e1003504 (2013).

  14. Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).

  15. Trizzino, M. et al. Transposable elements are the primary source of novelty in primate gene regulation. Genome Res. 27, 1623–1633 (2017).

  16. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).

  17. Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).

  18. Scott, A. J., Chiang, C. & Hall, I. M. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res. 31, 2249–2257 (2021).

  19. Ito, J. et al. A hominoid-specific endogenous retrovirus may have rewired the gene regulatory network shared between primordial germ cells and naïve pluripotent cells. PLoS Genet. 18, e1009846 (2022).

  20. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).

  21. Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).

  22. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).

  23. Abel, H. J. et al. Mapping and characterization of structural variation in 17,795 human genomes. Nature 583, 83–89 (2020).

  24. Jabbari, K. & Bernardi, G. CpG doublets, CpG islands and Alu repeats in long human DNA sequences from different isochore families. Gene 224, 123–128 (1998).

  25. Soumillon, M. et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 3, 2179–2190 (2013).

  26. van Bree, E. J. et al. A hidden layer of structural variation in transposable elements reveals potential genetic modifiers in human disease-risk loci. Genome Res. 32, 656–670 (2022).

  27. Vialle, R. A., de Paiva Lopes, K., Bennett, D. A., Crary, J. F. & Raj, T. Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain. Nat. Neurosci. 25, 504–514 (2022).

  28. Lubelsky, Y. & Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111 (2018).

  29. Fueyo, R., Judd, J., Feschotte, C. & Wysocka, J. Roles of transposable elements in the regulation of mammalian transcription. Nat. Rev. Mol. Cell Biol. 23, 481–497 (2022).

  30. Spracklen, C. N. et al. Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature 582, 240–245 (2020).

  31. Hautanen, A. Synthesis and regulation of sex hormone-binding globulin in obesity. Int. J. Obes. 24, S64–S70 (2000).

  32. Ogawa, R. et al. Associations between keloid severity and single-nucleotide polymorphisms: importance of rs8032158 as a biomarker of keloid severity. J. Invest. Dermatol. 134, 2041–2043 (2014).

  33. Fujita, M. et al. NEDD4 is involved in inflammation development during keloid formation. J. Invest. Dermatol. 139, 333–341 (2019).

  34. Gardner, E. J. et al. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 27, 1916–1929 (2017).

  35. Costantini, M., Auletta, F. & Bernardi, G. The distributions of ‘new’ and ‘old’ Alu sequences in the human genome: the solution of a ‘mystery’. Mol. Biol. Evol. 29, 421–427 (2012).

  36. Baggen, J. et al. Genome-wide CRISPR screening identifies TMEM106B as a proviral host factor for SARS-CoV-2. Nat. Genet. 53, 435–444 (2021).

  37. Van Deerlin, V. M. et al. Common variants at 7p21 are associated with frontotemporal lobar degeneration with TDP-43 inclusions. Nat. Genet. 42, 234–239 (2010).

  38. Marttala, J., Andrews, J. P., Rosenbloom, J. & Uitto, J. Keloids: Animal models and pathologic equivalents to study tissue fibrosis. Matrix Biol. 51, 47–54 (2016).

  39. Linker, S. B., Marchetto, M. C., Narvaiza, I., Denli, A. M. & Gage, F. H. Examining non-LTR retrotransposons in the context of the evolving primate brain. BMC Biol. 15, 68 (2017).

  40. Yampolsky, L. Y. & Stoltzfus, A. Bias in the introduction of variation as an orienting factor in evolution. Evol. Dev. 3, 73–83 (2001).

  41. Hirata, M. et al. Cross-sectional analysis of BioBank Japan clinical data: a large cohort of 200,000 patients with 47 common diseases. J. Epidemiol. 27, S9–S21 (2017).

  42. Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).

  43. Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).

  44. Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).

  45. Ishigaki, K. et al. Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. Nat. Genet. 52, 669–679 (2020).

  46. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).

  47. Lee, C. H., Moioli, E. K. & Mao, J. J. Fibroblastic differentiation of human mesenchymal stem cells using connective tissue growth factor. Conf. Proc. IEEE Eng. Med Biol. Soc. 2006, 775–778 (2006).

Acknowledgements

We are grateful to all of the participants in BBJ, as well as the staff of BBJ for their assistance. BBJ is supported by the Japan Agency for Medical Research and Development (AMED) (grant JP19km0605001). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. The GTEx data used for the analyses described in this study were obtained from dbGaP accession number phs000424.v8.p2. We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We appreciate obtaining access to genetic and pedigree data on SFARI Base. We acknowledge the resources of the 1000 Genomes Project and HGDP-CEPH Human Genome Diversity Cell Line Panel. We are grateful to the UK Biobank participants contributing to the results made public via the Pan-UK Biobank Resource and acknowledge the Pan-UKBB team (https://pan.ukbb.broadinstitute.org/team). Super-computing resources were provided by Human Genome Center, the Institute of Medical Science, the University of Tokyo (SHIROKANE), and the Office for Information Systems and Cybersecurity, RIKEN (HOKUSAI General Use project G20021 and Q21537). We acknowledge H. Yoshida and T. Suzuki of RIKEN Center for Integrative Medical Sciences for providing plasmids, X. Chen and G. Bourque of Kyoto University and N. Sasa and Y. Okada of Osaka University for testing prerelease versions of MEGAnE, K. Sato and J. Ito of University of Tokyo for helpful discussion, and M. Yoshioka for outstanding administrative support. S. Kojima, M.N. and A.J.K. acknowledge funding from the Incentive Research Projects of RIKEN, supported in part by Resona Bank. S. Kojima acknowledges funding from Japan Society for the Promotion of Science (JSPS) KAKENHI Grant-in-Aid for Early-Career Scientists 22K15385. S. Kojima and S.M.H. acknowledge the RIKEN Special Postdoctoral Researcher Program. K. Ito acknowledges funding from JSPS KAKENHI Grant-in-Aid for Scientific Research (B) 21H02919, AMED (JP22ek0210164, JP21tm0724601, JP20km0405209 and JP20ek0109487) and Research Funding for Longevity Sciences from the NCGG. N.F.P. acknowledges funding from JSPS KAKENHI Grants-in-Aid for Scientific Research (S) 20H05682, JSPS KAKENHI Grants-in-Aid for Scientific Research (B) 21H02972, RIKEN-McGill International Collaborative grant, Gout and Uric Acid Foundation of Japan, Cluster for Pioneering Research under the Hakubi fellowship program and from the discretionary budget of the Director of the RIKEN Center for Integrative Medical Sciences, K. Yamamoto.

Author information

Contributions

S. Kojima and N.F.P. designed this study. S. Kojima, A.T., M.H., and N.F.P. developed MEGAnE. S. Kojima, S. Koyama, and N.F.P. analyzed data. N.F.P., K.H., K. Ishigaki, K. Ito, C.T., and Y.K. organized data in BBJ. S. Kojima, Y.S., M.E., M.M., S.T., Y.M., Y. Momozawa, and N.F.P. performed targeted deep sequencing. S.M.H., A.J.K., Y.M., Y. Murakawa, C.T., M.N., and Y.N. managed cell culture. S. Kojima, E.H.P., M.K., A.F.G., and R.K. performed wet experiments. S. Kojima and N.F.P. wrote the manuscript, and all authors checked it.

Corresponding authors

Correspondence to Shohei Kojima or Nicholas F. Parrish.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Clement Goubert and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Note and Supplementary Figures 1–47

Reporting Summary

Peer Review File

Supplementary Table 1

Supplementary Tables 1–18

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kojima, S., Koyama, S., Ka, M. et al. Mobile element variation contributes to population-specific genome diversification, gene regulation and disease risk. Nat Genet 55, 939–951 (2023). https://doi.org/10.1038/s41588-023-01390-2

  • DOI: https://doi.org/10.1038/s41588-023-01390-2
