To overcome the technical difficulties of genotyping STRs (defined as periodic DNA motifs of 2–6 bp spanning a median length of ~25 bp), the team had previously developed the algorithm lobSTR and used it to catalogue STR variation in the whole genomes of individuals in the 1000 Genomes Project. Now, they have linked variation in STR length in 311 of these individuals to the expression levels of nearby genes (obtained by RNA sequencing of the same 311 individuals in the gEUVADIS project), which led to the identification of 2,060 protein-coding genes whose expression was associated with variation in a nearby eSTR. The eSTR association signals were robust and reproducible across populations and expression analysis platforms: 83% of ~800 of the identified eSTRs were confirmed in an independent cohort using Illumina expression profiles from individuals of African, Asian, European and Mexican ancestry.
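The kind of association described above, between STR length and the expression of a nearby gene, can be illustrated with a simple least-squares regression. The sketch below is only an illustration under stated assumptions and is not the study's pipeline: the function name `str_expression_association`, the use of summed allele lengths as an "STR dosage", and the simulated data are all hypothetical.

```python
import random
from math import sqrt


def str_expression_association(str_dosage, expression):
    """Return (slope, pearson_r) for expression regressed on STR dosage.

    str_dosage: one number per individual (assumed here to be the sum of
                the two STR allele lengths, in bp).
    expression: expression level of a nearby gene in the same individuals.
    """
    n = len(str_dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need two equally long lists with at least 3 values")
    mean_x = sum(str_dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in str_dosage)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(str_dosage, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in STR dosage or expression")
    slope = sxy / sxx              # change in expression per bp of STR length
    r = sxy / sqrt(sxx * syy)      # strength and direction of the association
    return slope, r


# Simulated data only: 311 individuals, hypothetical allele lengths and effect.
rng = random.Random(0)
alleles = [20, 22, 24, 26]
dosage = [rng.choice(alleles) + rng.choice(alleles) for _ in range(311)]
expression = [0.2 * d + rng.gauss(0, 0.5) for d in dosage]

slope, r = str_expression_association(dosage, expression)
print(f"slope = {slope:.3f} per bp, Pearson r = {r:.3f}")
```

A real analysis would test each STR against each nearby gene and correct for multiple testing; the single regression here only shows the shape of one such test.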
“STRs make a significant contribution to gene expression”, says Melissa Gymrek (MIT/Harvard), first author of the study. Indeed, linear mixed model analyses confirmed that many of the identified eSTRs are directly linked to gene expression levels and that their associations could not be explained by the eSTRs tagging other variants in the vicinity of the genes. Moreover, these analyses revealed that 10–15% of the heritable variation in gene expression between individuals is determined by eSTRs.