Introduction

Schizophrenia (SCZ), which affects approximately 1% of people worldwide [1, 2], is a chronic and severe psychotic disorder thought to have a strong genetic component. Patients typically display auditory hallucinations, delusions, emotional blunting, social withdrawal, and cognitive impairment [3]. Family and twin studies have consistently reported a high heritability of 70–80% for SCZ [4, 5], but only a small fraction of the genetic variance in SCZ has been explained at the molecular level. Several loci, for instance DISC1 [6] and NRG1 [7], strongly influence susceptibility. However, specific loci are unable to account for most of the heritability of SCZ. Although roughly 33–50% of the genetic risk of SCZ can be captured by current genome-wide association studies [8], a substantial part of the estimated heritability is still unexplained. Thus, an approach that accounts for the high heritability of SCZ is important for investigating genetic susceptibility to SCZ.

To that end, whole-exome sequencing (WES) studies have proven successful in disentangling complex phenotypes by identifying causative genetic mutations [9]. In particular, WES studies have identified rare variants and mutations that significantly affect risk genes for autism [10]. Although family-based WES has been successfully used to study risk genes for SCZ, only a few studies on this topic have been reported. For example, using WES, Daniel et al. found that de novo mutations in protein-coding genes explain only a small fraction of SCZ risk [11]. Another study using WES reported disruptive de novo variants screened from 591 exome-sequenced SCZ cases and their parents [12]. Although family-based studies can exclude some confounding factors, such as population structure differences, these studies cannot explain the relationship between multiple factors and SCZ.

It is well established that gene-environment interactions are important for the development of SCZ; however, the mechanism(s) by which environmental factors influence SCZ-related genes remain poorly understood [13]. Importantly, environmental risk factors can act at the individual level or at the population level, and they can increase risk either directly or indirectly. Environmental risk factors for SCZ are also very difficult to measure, which limits their study. For instance, studies have reported that subjective experiences, such as stress and childhood adversities, certain infections, and exposures with variable dose-dependent outcomes, including cannabis use, may have an impact during specific developmental stages of SCZ [14, 15]. However, these studies have focused on patients from the general population using a macro perspective, rarely including a unique population. In particular, most samples in WES studies in China come from the general population, and samples from the Han people have been used to identify novel genetic susceptibility loci of SCZ [16]. Notably, relatively isolated populations tend toward homogeneity in terms of genetic background and environmental exposure. For example, in the isolated population of the Faroe Islands, the mutation for glycogen storage disease III is 4250 times more frequent than in outbred populations [17]. Results from such independent samples can help resolve the complexity introduced by the many interdependent environmental factors in other studies. Therefore, Tibet represents an important location that may provide a unique patient population to study the interaction between genes and the environment.

In this study, we completed screening and diagnosis of severe mental diseases in the Ngari Prefecture, which is located in northwest Tibet and has the highest average altitude in the world and a sparse population. In this survey, we visited the seven counties of the Ngari Prefecture, which has a population of approximately 0.1 million and a total area of approximately 0.3 million km². Our screening suggested that SCZ is the most common severe mental disorder in this area. We therefore performed WES to identify rare risk variants of SCZ in this area by investigating 47 individuals with SCZ and 53 controls from the isolated population. We also attempted to verify these findings in a follow-up Han sample of 279 SCZ patients and 95 healthy controls (HCs).

Materials and methods

Tibet subjects

Patients were included from the seven counties of the Ngari Prefecture. Diagnoses were made by experienced psychiatrists from the Third People’s Hospital of Foshan based on all the available material and records, according to the Diagnostic Criteria for Research (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). HCs from the local community who were assessed as having no psychiatric record were recruited by public advertising and included in this study.

Han subjects

The Han participants included 279 patients with SCZ and 95 HCs, of which 99 patients and 45 HCs were from the Huangshan Second People’s Hospital while the rest were from the Third People’s Hospital of Foshan. All patients with SCZ were diagnosed by experienced psychiatrists according to the ICD-10 and a Structured Clinical Interview using the DSM-IV. The HCs consisted of local volunteers recruited through public advertising.

All the participants or their relatives signed an informed consent form. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. All procedures involving human subjects/patients were approved by the Biological and Medical Ethics Committee, Minzu University of China.

Sequence processing of Tibetan subjects

Library and sequencing

Library preparation was performed according to the manufacturer’s instructions, and the exome was captured using Agilent SureSelect version 3 (Agilent Technologies). The libraries were sequenced on an Illumina HiSeq2500 (Illumina).

Mapping

The sample reads were aligned to the reference genome (hg19) using BWA-MEM, converted to the BAM format, and indexed using SAMtools (version 0.1.18, https://samtools.github.io). The alignments were then realigned, marked for duplicates, and recalibrated using GATK as a pipeline manager.

Sequence processing of Han subjects

Objective gene primer design

The primers were designed by the company (Novogene) using Primer 3 Online (http://frodo.wi.mit.edu/), Oligo software, and the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).

Library preparation

Library preparation was performed in accordance with the manufacturer’s instructions. This mainly included PCR amplification, DNA purification, and library mixing.

High throughput sequencing (HTS)

The libraries were sequenced using an Illumina Hi-SNP (Illumina, Novogene, Beijing, China). The operation was performed in accordance with the standard operating procedure.

Variation detection and annotation

Variant type was determined using GATK VariantAnnotator. Based on this, variant calls were grouped into single-nucleotide variants (SNVs) and insertions-deletions (indels). The values of SIFT, Polyphen2, and Polyphen2-HDIV were used to annotate missense mutations with additional predictions of potentially damaging consequences.

Association analysis

First, the variants were filtered by sequencing depth to select high-quality variants for association analysis. The filtering standard was as follows: at least 95% of the samples had to reach a depth of more than 10× at the variant position. After depth filtering, association analysis was carried out using different methods, chosen according to the minor allele frequency (MAF) range of the variants, to assess the difference between cases and controls. The analysis was divided into single variant association (SVA) and gene-wise association (GWA). For SVA, we used PLINK to perform case/control association analysis, excluding variants that deviated from Hardy-Weinberg equilibrium (P < 1e−5). For GWA, we used EPACTS software to perform gene-level association analysis of variants grouped by gene.
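The filters described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names are hypothetical, the "rare" bin (MAF < 0.01) is an assumption inferred from the MAF ranges given in the Results, and the Hardy-Weinberg check uses a simple 1-df chi-square test as a stand-in for the test implemented in PLINK.

```python
import math

def passes_depth_filter(depths, min_depth=10, min_fraction=0.95):
    """Return True if at least min_fraction of samples exceed min_depth at a site."""
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths) >= min_fraction

def maf_class(maf):
    """Bin a minor allele frequency; the 'rare' label below 0.01 is an assumption."""
    if maf >= 0.05:
        return "common"
    if maf >= 0.01:
        return "low_frequency"
    return "rare"

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))

def keep_for_sva(depths, genotype_counts, hwe_threshold=1e-5):
    """Keep a variant if it passes the depth filter and is not out of HWE."""
    return (passes_depth_filter(depths)
            and hwe_chi2_pvalue(*genotype_counts) >= hwe_threshold)
```

Under these assumptions, a site where 95 of 100 samples exceed 10× is retained, and a site with no heterozygotes despite balanced allele frequencies is excluded by the Hardy-Weinberg threshold.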

Metascape analysis of the variant target genes

Metascape pathway enrichment analysis was performed on the genes carrying the new potentially pathogenic variants and the rare damaging variants. The TargetScan database (version 6.2) was used to predict target gene variants, and the threshold of TargetScan context+ scores was set to −0.20.

Results

Tibet sample sequencing

WES analysis of the 47 SCZ cases and 53 controls was performed at an average depth of 122.98. After aligning the sequencing reads to the human reference genome using BWA-MEM and filtering out low-quality variants using GATK, a total of 213,097 variants were identified. These included 199,521 SNVs and 13,558 indels. Overall, 27,644 of the called variants were novel and not present in the Single Nucleotide Polymorphism Database (dbSNP). The variant types are shown in Fig. 1. All known pathogenic mutations in SCZ-related genes were obtained by searching the Human Gene Mutation Database (HGMD), and the distribution of these mutations in the samples was determined. The results showed that 12:46244485:A/G (ARID2) and 22:18905934:A/G (PRODH) may be pathogenic variants of SCZ.

Fig. 1: The proportion of sequenced variant types.

There were eight types of variants, with missense being the most common and stop loss being the least prevalent.

This study was designed to target rare risk variants that might have increased in frequency in the isolated plateau population. Consequently, we included only variants at frequencies lower than 0.05 in the genomes data. In addition, because of the small sample size, we limited the analysis to variants that were carried by two or more cases but by no controls and that were low-frequency, harmful, and located at conserved sites. Such variants may be pathogenic. This filtering strategy revealed 275 new potentially pathogenic variants (Supplementary Table); the affected genes included MAP2, IL6R, SHANK1 and BAI2.
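The filtering strategy in the preceding paragraph can be sketched as a simple predicate over per-variant annotations. This is an illustrative sketch only: the `Variant` record, its field names, and the example identifiers are hypothetical, and the way "harmful" and "conserved" were scored in the study is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    variant_id: str         # hypothetical identifier
    gene: str
    ref_freq: float         # allele frequency in the reference data
    case_carriers: int      # number of cases carrying the variant
    control_carriers: int   # number of controls carrying the variant
    damaging: bool          # assumed flag: predicted harmful
    conserved: bool         # assumed flag: located at a conserved site

def is_candidate(v, max_freq=0.05, min_case_carriers=2):
    """Low-frequency, harmful, conserved, seen in >= 2 cases and in no controls."""
    return (v.ref_freq < max_freq
            and v.case_carriers >= min_case_carriers
            and v.control_carriers == 0
            and v.damaging
            and v.conserved)

def candidate_variants(variants):
    """Return the variants that pass every filter."""
    return [v for v in variants if is_candidate(v)]

# Made-up records, for illustration only
example = [
    Variant("var1", "GENE_A", 0.001, 2, 0, True, True),   # passes
    Variant("var2", "GENE_B", 0.001, 1, 0, True, True),   # only one case carrier
    Variant("var3", "GENE_C", 0.001, 3, 1, True, True),   # present in a control
    Variant("var4", "GENE_D", 0.200, 4, 0, True, True),   # too common
]
selected_ids = [v.variant_id for v in candidate_variants(example)]
```

Requiring zero control carriers is a strict criterion that suits a small sample, at the cost of discarding variants with incomplete penetrance.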

For variant burden analysis, we counted, for each gene, the number of cases and controls carrying low-frequency and harmful variants (frameshift, stop gain, splicing, or missense variants predicted as harmful by SIFT/PolyPhen2). Genes in which the number of carrier cases was significantly greater than the number of carrier controls may be potential pathogenic genes. In this way, 27 rare, damaging variants were identified (Table 1).
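The text does not name the test used to compare carrier counts. As one possible illustration, the sketch below applies a one-sided Fisher's exact test to the per-gene carrier counts; the choice of test and the function name are assumptions, and only the group sizes (47 cases, 53 controls) come from the study.

```python
from math import comb

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact P value for an excess of carriers among cases.

    Computes P(X >= case_carriers) under the hypergeometric null, where X is
    the number of carriers that fall in the case group.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    tail = sum(comb(n_cases, k) * comb(n_controls, carriers - k)
               for k in range(case_carriers, upper + 1))
    return tail / denom

# Hypothetical gene: 5 of 47 cases and 0 of 53 controls carry a damaging variant
p_example = fisher_one_sided(5, 47, 0, 53)
```

With groups of this size, a gene needs roughly five carrier cases and no carrier controls before the one-sided P value falls below 0.05, which illustrates how limited the power of a burden comparison is in a sample of 100 individuals.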

Table 1 Rare damaging variants from the Tibet samples.

Metascape analysis of new variants

To understand the potential function of these variants in SCZ, Metascape enrichment analysis was performed for the corresponding genes, including those carrying new potentially pathogenic variants and rare damaging variants. The top five Metascape enrichment pathways included the flavone metabolic process, myofibril assembly, calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules, the PID RHOA REG PATHWAY, and regulation of telomere maintenance (Fig. 2A). In addition, the enrichment networks of the top enrichment clusters were used to analyze intracluster and intercluster relatedness (Fig. 2B). The analyses suggested that high intracluster similarities drove the formation of tight local complexes and that a substantial proportion of clusters were bridged through subterms with similarities.

Fig. 2: Bioinformatics analysis of potential risk genes in schizophrenia.

A The top 20 Metascape enrichment clusters of potential risk genes in the isolated population. B Metascape enrichment network analysis depicting the intracluster and intercluster similarities of enriched terms for the potential risk genes.

Association analysis

We first filtered the variants by sequencing depth and selected high-quality variants for the association analysis. After depth filtering, association analysis was carried out using different methods, chosen according to the MAF range of the variants, to determine the difference between cases and controls. For SVA, 86,574 common variants (MAF ≥ 0.05) and 54,503 low-frequency variants (0.01 ≤ MAF < 0.05) were identified after filtering. After excluding variants that deviated from Hardy-Weinberg equilibrium, 85,509 common variants and 54,503 low-frequency variants remained. The analysis of common variants showed a significant association (P < 0.05) of 4495 reference genes in the trend test, 4500 genes under the Allelic Model, 947 genes under the Dominant Model, 443 genes under the Recessive Model, and 685 genes under the Genotypic Model.
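The genetic models named above differ only in how genotype counts are collapsed into a contingency table before testing. The following sketch shows that collapsing for the allelic, dominant, recessive, and genotypic models using Pearson chi-square tests; it is a simplified illustration with hypothetical function names, not a reproduction of PLINK's implementation, and the trend test is omitted.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # an empty margin carries no information
    return n * (a * d - b * c) ** 2 / margins

def p_from_chi2(chi2, df):
    """Upper-tail chi-square probability using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only df = 1 or 2 is supported in this sketch")

def genotypic_chi2(case, control):
    """Pearson chi-square on the full 2x3 genotype table (assumes no empty column)."""
    n = sum(case) + sum(control)
    chi2 = 0.0
    for row in (case, control):
        for j in range(3):
            expected = sum(row) * (case[j] + control[j]) / n
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2

def model_pvalues(case, control):
    """case and control are (major homozygote, heterozygote, minor homozygote) counts."""
    c_maj, c_het, c_min = case
    k_maj, k_het, k_min = control
    return {
        # Allelic: minor vs major allele counts
        "allelic": p_from_chi2(chi2_2x2(2 * c_min + c_het, 2 * c_maj + c_het,
                                        2 * k_min + k_het, 2 * k_maj + k_het), 1),
        # Dominant: carriers of at least one minor allele vs non-carriers
        "dominant": p_from_chi2(chi2_2x2(c_het + c_min, c_maj,
                                         k_het + k_min, k_maj), 1),
        # Recessive: minor homozygotes vs everyone else
        "recessive": p_from_chi2(chi2_2x2(c_min, c_maj + c_het,
                                          k_min, k_maj + k_het), 1),
        # Genotypic: all three genotype classes, 2 df
        "genotypic": p_from_chi2(genotypic_chi2(case, control), 2),
    }
```

Because each model tests a different collapsing of the same counts, a variant can be significant under one model and not under another, which is consistent with the differing gene counts reported per model.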

To perform GWA on the rare variants, we divided variants with predicted significance related to SCZ into different groups for analysis. C5orf42 was the only gene that was significant in both groups (P < 0.05), that is, for both damaging stop-gain/frameshift variants (Fig. 3A) and nonsynonymous variants (Fig. 3B).

Fig. 3: Manhattan plots of the gene-wise association analysis with the rare variants.

A Three genes exhibited a significant difference in damaging stop-gain/frameshift variants. B Seven genes exhibited a significant difference in nonsynonymous variants.

Verification of the variants related to SCZ in Han subjects

To explore whether the general population has the same mutation trend, we tested 47 SCZ-related variants (Table 2), selected from the new potentially pathogenic variants and rare damaging variants, in the Chinese Han population. The average depth reached 2851 in these test samples, which were aligned using BWA software. However, the results showed that only BAI2 variants appeared in the case group, with one in the Han population and two in the Tibetan population. The unique SCZ risk variant signature in Ngari Prefecture may arise because these people live under extreme environmental conditions and form a genetically homogeneous population as a result of geographic isolation.

Table 2 The verification of 47 genes in the Han population.

Discussion

Aiming to identify rare risk variants of SCZ, we took advantage of a relatively isolated population, the Tibetan population of the Ngari Prefecture, because some variants that are very rare in outbred populations have been found to be highly consistent in loci or increased in frequency in isolated populations. In particular, this population lives at the highest average altitude in the world and is therefore exposed to a hypoxic environment, which makes it a distinctive sample. Notably, previous studies have reported that flavonoids can ameliorate injury caused by hypoxia [18, 19], which is consistent with our finding of enriched genes in the flavone metabolic process and with the characteristics of a population living in a hypoxic environment. Flavone compounds have previously been explored as potential antipsychotic agents. For example, one flavone compound was found to have favorable effects in alleviating SCZ-like symptoms because of its high affinity for dopamine D2 and D3 receptors and serotonin 5-HT1A and 5-HT2A receptors [20]. Another flavone compound was found to reduce SCZ symptoms by inhibiting the physiologically crucial enzyme phosphodiesterase 1 [21]. Hypoxia during neurodevelopment is one of several environmental factors associated with an increased risk of SCZ. In fact, previous research has suggested that hypoxia may impair oligodendrocyte function and myelination during neurodevelopment, thus potentiating the emergence of neurological diseases such as SCZ [22, 23]. Studies indicate that DISC, which increases rare nonsynonymous mutations in patients and impairs the differentiation of oligodendrocytes, may play a role in the pathogenesis of SCZ [24, 25]. Furthermore, roughly half of the SCZ candidate genes identified are linked to ischemia-hypoxia [26, 27], supporting the close correlation between the selected population and SCZ. Indeed, ischemia-hypoxia response genes in the brain overlap with a subset of SCZ genes that are related to monogenic disorders of the nervous system and to synaptic function and that were identified in recent SCZ GWAS studies [28]. Our findings support the role of the flavone metabolic pathway in SCZ, providing a potential therapeutic basis for this disease and supporting the importance of hypoxia in the onset and/or development of SCZ. Additionally, this study could offer fresh insights into the mechanisms of SCZ and other psychiatric diseases that share genetic risk factors [29].

Risk alleles identified in isolated populations may be extremely rare in other populations or not observed elsewhere, suggesting that these new rare variants may provide new insights into SCZ. Although the sample size was very limited and thus prone to yielding spurious findings, we identified single variants that have already been reported in the HGMD, suggesting that several candidate genes of SCZ found here are common to multiple populations. Among these single variants, PRODH is best known as a risk gene for SCZ [30, 31]. Previous research has reported that PRODH may mediate functional genetic variations in the neostriatal-frontal circuits, resulting in an increased risk for SCZ [32]. Moreover, PRODH encodes a proline dehydrogenase enzyme that catalyzes the first step of proline catabolism and is most likely involved in neuromediator synthesis in the CNS, especially in the hippocampus, which is known to be one of the brain structures most affected in SCZ [33]. Taken together, our findings provide further support for the role of this gene in susceptibility to SCZ.

Subsequently, filtering revealed 275 new potentially pathogenic variants and 27 rare damaging variants. Among these variants, the MAP2 missense variant is intriguing, although this variant was not detected in the Han population. MAP2 encodes a protein that belongs to the microtubule-associated protein family. Previous research has shown an association between MAP6 and SCZ [34]. Proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. Reduced neurogenesis marker expression is associated with polygenic risk in SCZ [35]. Moreover, decreased adult neurogenesis in the hippocampus of model mice has been found to be associated with the pathology of SCZ [36]. On the other hand, aberrant MAP2 phosphorylation may underlie the profound reductions in MAP2-IR observed postmortem as a “molecular hallmark” of SCZ [37, 38], suggesting that MAP2 could have direct consequences on neuronal structure and function in SCZ. Our findings support the role of this gene in susceptibility to SCZ and provide a good genetic basis for SCZ under hypoxic conditions.

Notably, our association analysis identified C5orf42 in both the damaging stop-gain/frameshift variants and the nonsynonymous variants. C5orf42 is also known as ciliogenesis and planar polarity effector 1 (CPLANE1). The protein encoded by this gene has putative coiled-coil domains and may be a transmembrane protein. In fact, the top-ranked psychosis-associated differentially methylated position (cg23933044), located in the promoter of the C5orf42 gene, was hypomethylated in post-mortem prefrontal cortex brain tissue from SCZ patients compared to unaffected controls [39]. Another genome-wide analysis showed that several hypomethylated genes were significantly enriched in the cerebral cortex and functionally enriched in nervous system development in SCZ [40]. Our findings support a potential role for this gene and link methylation to SCZ, providing a basis for functional studies that may reveal new epigenetic therapies.

Nevertheless, findings from isolated populations may not necessarily generalize to other populations, making replication difficult. Therefore, we selected 47 new variants identified in the isolated population for verification in a general population, and only one risk gene emerged: BAI2. Notably, its family member, BAI3, has already been reported to be correlated with psychiatric disorders [41]. BAI2 is predominantly expressed in the brain, and while its physiological ligands and functions remain unclear, emotional behaviors were found to be modulated by BAI2, which connects with the main mediators of signal transduction, G protein-coupled receptors, in the central nervous system [42]. Interestingly, although missense variants were identified in both the isolated and general populations, they were at different loci, suggesting a novel potential genetic mechanism of SCZ and indicating the importance of the BAI2 gene in SCZ, although its functions and effects on the disorder remain unclear. Overall, our findings revealed novel variants across numerous genes in an isolated population, although replications of these genes in the general population were rare. This might provide opportunities to further investigate the pathogenesis regulated by different genes under extreme conditions. Indeed, investigating mutations in brain cells in SCZ is crucial, as brain damage occurring during the embryonic stage, which is later than the damage leading to neurodevelopmental disorders, could contribute to the development of schizophrenia during maturation and adulthood. Consequently, the analysis of somatic mutations may emerge as a promising approach in future research [43].

In summary, our results support existing findings in the literature on SCZ and suggest new risk genes in the disease etiology. In particular, we identified rare variants that may contribute directly to the underlying biology of SCZ under hypoxic conditions. Importantly, most of the potential new risk variants could not be verified in the Chinese Han population, which suggests that SCZ patients living at high altitudes may have a unique risk gene signature.