Introduction

Tourette’s Syndrome (TS) is a neuropsychiatric disorder that is characterized by motor and phonic tics1. The onset of the disorder is typically between the age of 5 and 7 years. TS has been shown to have a large genetic component, in which the first-degree relatives of TS patients have a 10–100-fold higher rate of TS compared to the general population2,3. Past genetic studies of TS have identified several implicated genes such as CELSR3, a gene where recurrent do novo variants are found in probands4. Furthermore, a recent genome-wide association study (GWAS) identified a genome-wide significant hit on chromosome 13, rs2504235, which is within the FLT3 (Fms Related Tyrosine Kinase 3) gene5. Although GWAS is a powerful method for identifying associated genetic loci, it is often difficult to interpret the biological effects of significant hits.

Recently, transcriptomic imputation was developed to allow for the integration of genetic and expression data from datasets such as the CommonMind (CMC) and Genotype-Tissue Expression (GTEx) consortia6,7. The derivation of panels involves a machine-learning approach to characterize the relationship between gene expression and genotypes, making tissue-specific predictive models. Transcriptomic imputation can leverage these reference imputation panels from these consortia and identifies the genetic correlation between imputed expression and GWAS data6. Ultimately, transcriptomic imputation allows for better characterization of GWAS data by prioritizing tissue-specific genes associated with disease8. Furthermore, imputation of aberrant splicing can be done through this method. Given that alternative splicing occurs frequently in brain tissue and in early development, identifying potential genes with aberrant splicing could be important for understanding the genetic etiology of brain disorders9,10. This methodology, known as a transcriptome-wide association study (TWAS), has already been used to prioritize genes in a variety of different traits, such as schizophrenia, depression, and ADHD6,11,12,13,14. For instance, for ADHD, a TWAS was done with adult brain tissue and found several genes including ST3GAL315. A subsequent ADHD TWAS using fetal brain tissue also implicated ST3GAL316.

To identify genetically regulated genes associated with TS, we conducted a TWAS of the current largest TS cohort of 4819 cases and 9488 controls5. Brain-specific panels were derived from the CMC and GTEx 53 v7. The TWAS revealed the expression of FLT3 to be increased across many brain tissues in TS, with the largest effect in the dorsolateral prefrontal cortex (DLPFC). Given that FLT3 is expressed in lymphoblasts, we additionally measured the RNA expression of FLT3 in 100 lymphoblastoid cell lines (LCL;50 cases and 50 controls). Brain samples for TS are often difficult to acquire and not readily available in large sample sizes, which make LCL a useful source of information. Previous studies have demonstrated that ~23% of post-mortem brain tissue and lymphoblastoid cell lines have similar expression levels17. There was an increased expression in FLT3 in LCL derived from TS cases, consistent with TWAS results. In conclusion, increased expression of FLT3 was implicated through TWAS across several brain tissues and expression in LCL.

Results

Transcriptome-wide significant hits

To identify genes associated with TS, a TWAS was conducted using FUSION. The strongest significant hit was FLT3, with increased expression (Z = 4.67, P = 2.98E−06) in the DLPFC (Table 1). Interestingly, the gene also had increased expression in the brain cortex, hippocampus, anterior cingulate cortex, frontal cortex, cerebellum, and cerebellar hemispheres suggesting a global dysregulation across brain tissue types. The gene DHRS11 was also implicated (Z = 4.26, P = 2.01E−05), although not genome-wide significant. An omnibus test using the GTEx and CMC brain tissue panels also identified the top two genes: ATP6V0A2 (P = 3.70E−05) and NEB (P = 1.72E−04) (Table 1). Given the lower number of FDR-significant hits, we used a nominally significant threshold (FDR-corrected p value < 0.10) to assess pathways. We found that significant pathways were related to FLT3 signalling, myosin binding, and microfilament activity (Supplementary Data 1).

Table 1 TWAS genes with association to Tourette’s syndrome.

Splicing in Tourette’s syndrome

Given the importance of alternative splicing in neurodevelopment and brain gene expression, a splicing TWAS was done to identify splicing events associated with TS. There were several significant genes after permutation. The top three hits were MPHOSPH9 (Z = −4.32, P = 1.58E−05), FIP1L1 (Z = 4.21, P = 2.55E−05) and CSGALNACT2 (Z = 4.14, P = 3.39E−05) (Table 1). However, we also caution on the interpretability of the effect direction given that alternatively spliced exons are typically negatively correlated18.

Fine-mapping of FLT3 locus

To determine whether FLT3 is the putatively causal gene on the DLPFC, FOCUS was used to assign a probabilistic inclusion probability for genes at the TWAS region19. For the region 13:27284583-13:29257379 (hg19 coordinates), the FLT3 gene had the highest posterior inclusion probability (PIP) of 0.849 and was included in the 90% credible gene set (Fig. 1). The PIP is an inclusion probability (0–1), where a higher PIP may suggest higher chance of being potentially causal.

Fig. 1: Fine mapping of chromosome 13 TWAS signal.
figure 1

PIP is the posterior inclusion probability. TWAS p values derived from FOCUS are on the Y-axis and genes within the locus are on the X-axis. The local correlation structure is shown in the bottom half of the figure. The PIP for FLT3 was the highest.

RT-qPCR of FLT3

Given that brain tissue for TS is difficult to acquire in large sample sizes, LCL were from TS patients and controls were used to assess the expression of FLT3. To test normality of qPCR data, a Shapiro–Wilk test was done. It was found that the ∆CT values (a measure of expression based on the difference in the number of PCR cycles required for the fluorescent signal to exceed background level) were normally distributed (W = 0.99, P = 0.70). Next, an ANOVA was done and determined that the disease status was statistically different in the dataset (F = 7.06, P = 0.0095). A Tukey’s test showed that TS patients had significantly higher expression of FLT3 compared to controls, with a ∆CT difference 0.780 (P = 0.009) (Fig. 2). The corresponding fold change is +1.72 higher in TS than controls. The effect size was determined to be moderate-large (Cohen’s F, 0.30).

Fig. 2: FLT3 RNA Expression differences between Tourette’s Syndrome patients and controls in lymphoblastoid cell lines (LCL).
figure 2

TS LCL had higher expression of FLT3 compared to controls after adjusting for plate, sex, and age. The black dot represents the mean of the data and error bars are ±SE.

Phenome-wide association study of FLT3

To identify phenotypes associated with the top gene, a regional phenome-wide association study spanning the entire gene (pheWAS) was done for FLT3. The pheWAS identified several immunological traits associated with FLT3 such as monocyte count (3.87E−40) and percentage of white blood cells (1.42E−21) (Fig. 3).

Fig. 3: Regional phenome-wide association study (pheWAS) of FLT3 for 2977 unique traits.
figure 3

Each point represents a trait. Only traits with an association (P < 0.05) were included for the plot.

Discussion

While recent GWAS for TS have successfully identified risk loci, the biological relevance of these associations remains unknown. Here, we conduct a TWAS using the summary statistics of over 14,000 individuals from the most recent TS GWAS5. This approach allows for imputation of expression by leveraging genotype-expression reference panels. From this, we identified increased FLT3 expression as the top hit in the DLPFC and additionally found an increase in expression across most brain tissue types, suggesting global dysregulation. The global dysregulation may suggest that commonly TS-implicated brain areas such as the supplementary motor area could potentially have FLT3 dysregulation20. Validation of expression in LCL prepared from TS cases found an increased or an increase in RNA expression compared to LCL from control individuals. The FLT3 gene encodes for a tyrosine-protein kinase and it has been associated with inflammation and immune function21,22. This could point toward immunity in TS as a putative biological mechanism. The FLT3 gene has been shown to be related to the developmental process of mouse brain, and its expression was markedly increased with age. Furthermore, the pheWAS identified that FLT3 is associated with immunological traits such as monocyte count and white blood cell counts. However, we emphasize that the pheWAS pertains only to the top hit FLT3 and not the rest of the genomic loci underlying TS risk. Interestingly, previous studies have demonstrated that TS patients have significantly higher levels of monocytes compared to healthy controls23. This could suggest dysregulation of monocytes partially due to increased expression of FLT3, which may contribute toward pathogenicity of TS. Alternatively, FLT3 could also alter neuroimmune interactions, but further functional data would be needed to investigate this. Furthermore, fine-mapping the TWAS hit demonstrated that FLT3 was in the 90% causal credible-set with a PIP of 0.849 for the DLPFC. This further stipulates that FLT3 is the strongest putative gene at this locus. A previous study investigating transcriptomic differences of the basal ganglia between TS and controls found an enrichment of differentially expressed immune-related genes, reinforcing the potential importance of immune-related genes in TS24.

The splicing TWAS identified several putatively associated genes (MPHOSPH9, FIP1L1, and CSGALNACT2) associated with TS, suggesting that both splicing, and genetically regulated genes are potentially implicated in TS. The MPHOSPH9 gene encodes for a protein that regulates cell cycling25. This gene has been implicated in multiple sclerosis, which is an inflammatory disease of the central nervous system25. In addition, FIP1L1 is associated with pre-mRNA 3′-end formation and has been implicated in immunological function by cooperating with IL-526,27. These findings could potentially support the hypothesis that the pathophysiology of TS may include or involve the immune system. Understanding the role of immunity in TS may elucidate the link between streptococcal infections and tic exacerbations as proposed in the pediatric autoimmune neuropsychiatric disorders associated with streptococcal infections (PANDAS) hypothesis1,28. It is also possible that these dysregulated may point to synaptic pruning and potentially overactive microglia instead of an immune response, which has been implicated in schizophrenia29. A previous study has also found that microglia-mediated neuroinflammation was found higher in the TS group in the bilateral caudate and bilateral lentiform nucleus and bilateral caudate nuclei compared to controls30. The CSGALNACT2 gene encodes for chondroitin sulfate protein, which is involved in the brain extracellular matrix31. A previous meta-analysis of ADHD and TS showed implication of sulfuration of chondroitin, suggesting potential relevance32. These results build upon the GWAS study by suggesting that FLT3 expression may be overexpressed in TS and identifying putative splicing gene targets to further investigate. Given the high degree of splicing in brain tissue, this area may prove fruitful for identifying novel gene targets9.

We conclude this study with some strengths, caveats and potential future directions. Strengths of TWAS include trying to unravel the biological relevance behind GWAS signals and identify gene targets for functional follow-up. Here, we were able to investigate the directional effects of FLT3 and prioritize the gene over others within the same GWAS locus. For limitations, TWAS signals can putatively be confounded due to expression imputation from weighted linear combinations of SNPs. Because of this, some of these SNPs may be associated with non-regulatory mechanisms that inflate the test statistic. A second caveat is that there is currently no available replication cohort, given that the largest GWAS for TS was used for this study. Future work could look at integrating single-cell sequencing data with TS GWAS to determine single-cell cis-eQTL regulated genes. Furthermore, individual TWAS risk could be investigated in independent cohorts. A third caveat is that a given gene may be influenced by genetic regulators independent of cis-eQTLs and sQTLs but still have downstream effect on TS. Finally, the use of GWASAtlas for the pheWAS can present potential selection bias on which traits get included, however, it was used since it can be readily used by any investigators as more phenotypes get added. In conclusion, we identify the FLT3 gene as likely involved in TS with increased expression found by TWAS and in lymphoblastoid cell lines of patients. We further identify several significant genes associated with aberrant splicing and point toward immunity in the pathogenesis of TS.

Methods

Genotyping data

Public summary statistics were obtained from the Psychiatric Genetics Consortium through the OCD & Tourette Syndrome group. Briefly, the summary statistics consists of a case-control GWAS for TS. Population stratification was accounted for through multidimensional scaling, and European individuals were retained. Imputation was done using the 1000 Genomes phase 1 haplotypes. Meta-analysis of different cohorts for the GWAS was done using an inverse variance model. Further details on the participant ascertainment and quality control steps are previously described in the 2019 TS GWAS5. The summary statistics were munged using LDSC and were used as the input for the subsequent transcriptomic imputation33.

Transcriptomic imputation

Imputation was done by using reference panels from FUSION that were derived from consortia datasets of tissue-specific gene expression integrated with genotypic data. The CommonMind Consortium (CMC) and brain tissue panels from GTEx 53 v7 were used for a total of 14 tissue types and panels. To account for multiple hypothesis testing, P values were adjusted for false-discovery rate (FDR). FUSION was used to conduct the transcriptome-wide association testing and features that capture a significant heritability were used. The 1000 Genomes v3 LD panel was used for the TWAS. FUSION utilizes several penalized linear models, such as GBLUP, LASSO, Elastic Net. In addition, a Bayesian sparse linear mixed model is used. FUSION computes an out-sample R2 to determine the best model by performing a fivefold cross-validating of every model. After, a multiple degree-of-freedom omnibus test was done to test for effect in multiple reference panels. The threshold for the omnibus test was P  =  4.64E−06 (0.05/10,323 [number of genes tested]). Next, we sought to assess splicing events associated with TS. Splicing analysis was done using the CMC splicing imputation panel obtained from FUSION, following the same methods as described earlier, and an FDR p value < 0.05 was considered significant.

Fine-mapping of TWAS associations

To address the issue of co-regulation and LD, we used FOCUS (Fine-mapping of causal gene sets) to model predicted expression correlations and to assign a posterior probability for causality in relevant tissue types19. Briefly, FOCUS prioritizes genes for each TWAS hit to be included in a 90%-credible set while accounting for pleiotropic SNP effects. The identical TWAS reference panels for FUSION were used as in the analysis described above.

Phenome-wide association studies

To identify phenotypes associated with FLT3, a phenome-wide association study (pheWAS) was done. PheWAS was done using public data provided by GWASAtlas (https://atlas.ctglab.nl)34. Briefly, the pheWAS seeks to identify traits that are associated with FLT3 by querying many different GWAS traits. This method seeks to prioritize phenotypes that may be associated with FLT3. Only publicly available GWAS summary statistics were included in the GWASAtlas database. Any GWAS based on immune-chip sequencing, whole-exome sequencing or replicable cohorts were excluded. At the time, there were 2977 unique traits and the Bonferroni-corrected threshold was set at 1.68E−5 (0.05/2977).

Lymphoblastoid cell lines

Tourette’s patients were recruited at the TS clinic at the Montreal General Hospital and theAllen Memorial Institute. The family members were interviewed by a group consisting of a psychiatrist, neuropsychologist, and neurologist. Symptoms were evaluated with semi-structured interviews using standardized scales. All Tourette’s patients had definite TS based on the DSM-IV and the TS Classification Study Group. Severity of tics was evaluated using the Yale Global Tic Severity Scale. Patients were excluded if there was evidence of another neurological disorder that could mimic TS, or if they had neuroleptic-induced tardive dyskinesia, or tics secondary to head injury or drug abuse. Lymphoblastoid cell lines were prepared from consenting individuals. The study was approved by the institutional review board of McGill University (IRB00010120) and informed consent was obtained from patients. A total of 100 LCL (50 controls and 50 cases) were randomly selected. Cells derived from TS patients and controls were grown at 37 °C and cells were cultured for ~1 week prior to RNA extraction. Control LCL were selected if there is no clinical reporting of TS or any other psychiatric disorder. Similar ages and sex were chosen to match the TS samples.

RNA extraction

RNA was extracted from the cells using the Qiagen RNAeasy Mini Kit. The RNA was subsequently stored in −80 °C after elution with RNAse-free water. One microgram of each sample of RNA was converted into cDNA using the SuperScript VILO cDNA Synthesis Kit by Thermo Fisher Scientific.

Reverse-transcriptase quantitative qPCR

The cDNA was used to perform a Taqman qPCR using QuantStudio 7. The FLT3 probe (Accession number: Hs00174690_m1) was used, and POLR2A (polymerase [RNA] II [DNA-directed] polypeptide) (Accession number: Hs00172187_m1) was used as the endogenous control. The thermocyler parameters consisted of the following: 2 min at 95 °C, and 40 cycles of 95 °C for 1 s and 60 °C for 20 s. The qPCR was performed in triplicate. The cycling parameters were A Shapiro–Wilk Test was done to determine the normality of the mean CT values data. Mean CT values were derived from averaging the triplicate CT values. An ANOVA was done using the model mean CT values ~ sex + plate + disease status + disease status:sex). Mean CT values were derived from averaging the triplicate CT values. Cohen’s F was used to determine the effect size of the data.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.