Abstract
Many psychiatric disorders share genetic liabilities, but whether these shared liabilities can be utilized to classify and differentiate psychiatric disorders remains unclear. In this study, we use polygenic risk scores (PRSs) of 42 traits comorbid with schizophrenia (SCZ), bipolar disorder (BIP), and major depressive disorder (MDD) to evaluate their utilities. We found that combining target specific PRS with PRSs of comorbid traits can improve the classification of the target disorders. Importantly, without inclusion of PRSs from targeted disorders, we can still classify SCZ (accuracy 0.710 ± 0.008, AUC 0.789 ± 0.011), BIP (accuracy 0.782 ± 0.006, AUC 0.852 ± 0.004), and MDD (accuracy 0.753 ± 0.019, AUC 0.822 ± 0.010). Furthermore, PRSs from comorbid traits alone can effectively differentiate unaffected controls and patients with SCZ, BIP, and MDD (accuracy 0.861 ± 0.003, AUC 0.961 ± 0.041). Our results demonstrate that shared liabilities can be used effectively to improve the classification and differentiation of these disorders. The finding that PRSs from comorbid traits alone can classify and differentiate SCZ, BIP and MDD reasonably well implies that a majority of the risk variants composing target PRSs are shared with comorbid traits. Overall, our results suggest that a data-driven approach may be feasible to classify and differentiate these disorders.
Similar content being viewed by others
Introduction
It is well known in psychiatry that comorbidities and overlapping phenomenology are common between different disorders, some symptoms are observed in multiple disorders, including physical diseases and behavioral traits1,2,3. Over the years of genome wide association studies (GWASs), it has been clear that many of these comorbid disorders and traits share genetic liabilities4,5,6,7. These shared liabilities suggest that not all genetic variants identified by a GWAS are specific to the disease, but it remains unclear to what extent that the liabilities shared with comorbid conditions account for the liability specific to the disease. From a clinical perspective, comorbidity increases the difficulties and challenges in disease diagnosis and treatment. Major psychiatric disorders such as schizophrenia (SCZ), bipolar disorder (BIP), and major depressive disorder (MDD) are often “misdiagnosed,” especially in the early stage of the disorders when typical symptoms have not been fully manifested. Understanding the genetic architecture of comorbid conditions could provide new insights into the underlying mechanisms and open new windows for a data driven, biology-based diagnosis and differentiation of these comorbid conditions.
Conceptually, for an individual, we can consider that his/her genetic liability to a psychiatric disorder consists of a core of liability specific to the disorder and a peripheral shell made of shared liabilities from comorbid disorders and traits. For a given individual, the shared patterns and extents in the peripheral shell can be different from another individual. Therefore, we may be able to take advantage of these differences between people to separate affected individuals from unaffected individuals. Fig. 1A illustrates this concept using SCZ as an example. Since polygenic risk score (PRS) has been established as a reliable approximation of genetic liability8,9, we can consider that the total genetic liability of a disorder for a given individual is the sum of PRS of the targeted disorder and the PRSs of all other comorbid disorders and traits. Similarly, we can extend the concept of overlapping genetic liabilities to several disorders with common clinical symptoms. Due to the differences in the degree of overlaps among these disorders, we can utilize these differences to distinguish major psychiatric disorders that have substantial overlaps in both genetic liabilities and clinical symptoms. Based on these rationales, for a specific disorder, we can construct a classification model that integrates the PRS of targeted disorder with the PRSs of all other comorbid disorders and traits, and this model may have a better performance than the model that uses the PRS of the targeted disorder alone. Similarly, we can also build models with these PRSs to differentiate several different but symptomatically overlapped disorders.
A The rationale. Classification of a target trait, SCZ is used as an example. Conceptually, we can consider that the genetic liability of SCZ consists of a core of genetic factors that are specific to SCZ and a varying number of genetic factors from comorbid disorders and traits. The extent and variation of the sharing of genetic risks between individuals are explored with a deep neural network model to classify SCZ. B A flow chart of procedures used in the study.
We have performed this study as a demonstration of these principles. For convenience, in this article, we refer to all clinical features, diseases and other phenotypes comorbid with a target disorder as comorbid traits collectively. We searched diseases and traits that are comorbid with SCZ, BIP, and MDD from the literature, and matched these traits with those reported in the GWAS catalog10,11 (https://www.ebi.ac.uk/gwas/). We then calculated the PRSs for SCZ, BIP, MDD, and the comorbid traits, and evaluated their genetic correlations. We selected those traits with PRSs statistically associated with the targeted disorders (SCZ, BIP, and MDD), and constructed deep neural networks (DNN) models with these PRSs to evaluate their utilities in the classification and differentiation of the targeted disorders (Fig. 1B). The results obtained from the models could help us understand the genetic architecture of comorbid conditions and provide strategies for biology-based diagnosis and distinction of these targeted disorders.
Results
Selection of comorbid traits
Based on the survey of the literature4,12,13,14,15 and our test of association, we selected a total of 42 diseases/traits in this study, including the targeted disorders (see Supplementary Table S1). From Table S1, it was clear that the associations between the targeted disorders and comorbid PRSs vary substantially. All selected traits were associated with at least one of our targeted disorders, i.e., SCZ, BIP, or MDD.
Comparison between baseline models
Comparison between models with and without comorbid traits
We started our study by building models using the bestPRSs and sex to classify targeted disorders and treated them as the baseline models for comparison with all other models. For each targeted disorder, there were 3 baseline models, B.I, B.II, and B.III, which used targeted bestPRS alone, targeted bestPRS + comorbid bestPRSs, and all bestPRSs but targeted bestPRS (Table 1).
To evaluate whether comorbid traits improve classification of targeted disorders, we constructed models with only PRS of targeted disorders (model B.I), and models with target PRS + PRSs of comorbid traits (model B.II). The results for SCZ were summarized in Table S2. Adding PRSs from comorbid traits did improve model performance (comparing B.I.SCZ vs B.II.SCZ, AUC had about 2.8% improvement) (Fig. 2A, Supplementary Table S2), but the improvement was modest. Similar improvements were also observed in class specific precision, recall, and F1-score (Supplementary Table S2, models B.I.SCZ and B.II.SCZ).
A SCZ; B BIP; and C MDD. The plots show the ROC curves for the baseline models that use only the bestPRSs from the targeted disorders (blue, Model B.I), bestPRSs from all traits (targeted disorders and comorbid traits, gold, Model B.II), and bestPRSs from only the comorbid traits (green, Model B.III). The results indicate that models with the inclusion of PRSs from comorbid traits have better performances than that of using only the PRSs from targeted traits. It is also clear that models using only the PRSs from comorbid traits can also have decent performance. D A comparison between the baseline elastic model (Model B.II.SCZ) and DNN model for SCZ classification (Model II.SCZ).
For BIP and MDD, we observed a similar trend as observed in SCZ, i.e., model B.II performance > model B.I performance (Fig. 2B, C, and Supplementary Tables S3 and S4), with the AUC improvements of 2.8% and 10% respectively. Overall, these results indicated that adding PRSs from comorbid traits to the model with only target specific PRSs improved model performances. Although the improvements for SCZ and BIP were modest, that for MDD was substantial.
Comparison between models with only target PRS and only comorbid PRSs
The performances between models with only target PRS and models with only comorbid PRSs, i.e., model B.I vs B.III, were in line with our expectation that models with target specific PRS performed better. But it was a surprise that model B.III could classify the target disorders reasonably well. For SCZ, i.e., model B.III.SCZ, it achieved an accuracy of 0.734 ± 0.001 and AUC of 0.812 ± 0.001. While its performance was worse (Fig. 2A, Supplementary Table S2) than that of models B.I.SCZ, the results were intriguing given the fact that no bestPRSs of SCZ and directly related traits were included in model B.III.SCZ.
For BIP, we observed similar trend as SCZ. Model B.III.BIP also had comparable performance as B.III.SCZ, and it classified BIP reasonably well (Fig. 2B and Supplementary Table S3). But for MDD, model B.III.MDD had an unexpected and better performance than model B.I.MDD (AUC 0.753 ± 0.002 vs 0.710 ± 0.000, Fig. 2C, Supplementary Table S4). Overall, these results indicated that the PRSs of comorbid traits alone could predict disease status for SCZ, BIP, and MDD.
We evaluated the performances between the elastic net regression models and DNN models using the bestPRS dataset of SCZ (comparison between models B.II.SCZ and II.SCZ), and the two models performed virtually the same (Fig. 2D).
Binary classification of target disorders with DNN models
Classification of SCZ
We constructed 4 DNN models (main models I–IV) to classify target disorders with the PRSs obtained from the selected traits. For all models, sex was included as a predictor and a 5-fold leave one out cross validation (LOOCV) protocol was used to validate the models. The metrices reported were all obtained from the left-out samples.
For model I.SCZ, all 42 traits were used and each trait had 6 PRSs obtained from multiple thresholds, with a total of 237 predictors (some traits did not have PRS from the 5 × 10−8 p value threshold). We obtained accuracy and AUC for the left-out samples of 0.913 ± 0.004 and 0.974 ± 0.002, respectively. The class specific precision, recall, and F1-score for SCZ were 0.915 ± 0.004, 0.912 ± 0.005, and 0.913 ± 0.005, respectively (Table 2, model I.SCZ). Please note that while there were some differences between SCZ and CTRL for class specific precision, recall, and F1-score metrices, the results for the two classes were comparable.
For model II.SCZ, where only the bestPRS of each trait were used, the number of predictors was 42. The DNN model (Fig. S1) had a performance very close to that of model I.SCZ, with the accuracy and AUC of 0.880 ± 0.005 and 0.956 ± 0.003, respectively (Table 2, model II.SCZ). The SCZ specific precision, recall, and F1-score were close to those of model I.SCZ as well. Of note, model II.SCZ (Table 2) and model B.II.SCZ (Supplementary Table S2) used exactly the same predictors, the two models performed virtually the same for accuracy, AUC, and class specific precision, recall, and F1-score.
Model III.SCZ was a model that removed all PRSs of the traits that included SCZ phenotypes directly in their perspective GWASs. The number of predictors was 196. As seen in B.III.SCZ, we were surprised to find that the model performed reasonably well, with a validation accuracy of 0.760 ± 0.007 and a validation AUC of 0.843 ± 0.005 (Table 2, model III.SCZ). The SCZ specific precision, recall, and F1-score were 0.784 ± 0.009, 0.719 ± 0.023, and 0.749 ± 0.012. Although the accuracy and AUC, and the class specific metrices were substantially lower than that of model I.SCZ (Table 2, comparing models I.SCZ and III.SCZ), the results remained significant.
For model IV.SCZ, which used only the bestPRSs of comorbid traits, the accuracy and AUC were significantly lower than that of model II.SCZ (which included SCZ specific bestPRSs), but the results were decent, with validation accuracy of 0.710 ± 0.008 and validation AUC of 0.789 ± 0.011 (Table 2, model IV.SCZ vs. model II.SCZ). Similar trends were also observed for the class specific metrices.
These models showed a performance trend of I.SCZ > II.SCZ > III.SCZ > IV.SCZ. The differences between I.SCZ and II.SCZ and between III.SCZ and IV.SCZ were much less than that between I.SCZ and III.SCZ and between II.SCZ and IV.SCZ, suggesting that while the use of PRSs from multiple p value thresholds could improve model performances, the improvements were limited, in contrast to the significant differences with and without the use of target specific PRSs.
Classification of BIP and MDD
We pursued similar strategies for the classification of BIP and MDD with DNN models. The results for BIP were summarized in Table 3, which closely mirrored those of SCZ reported in Table 2 except that models III.BIP and IV.BIP had virtually the same performance.
For MDD, while there were some differences among the 4 models, the performances were generally on par with one another, including class specific metrices (Table 4). This was different from the trend observed from both SCZ and BIP where models I and II performed significantly better that models III and IV. Additionally, the overall performance of MDD models was worse than that of SCZ and BIP. This could be due to the difference in heritability between these disorders.
Classification of control, SCZ, BIP, and MDD
We built a multiclass model, model V, to classify and differentiate the 4 classes (4C, i.e., CTRL, SCZ, BIP and MDD). The results were summarized in Table 5. Among the models, model V.I.4C had a similar performance as model V.II.4C and model V.III.4C had a similar performance as model V.IV.4C. For example, the average AUC of model V.I.4C (0.986 ± 0.015) was very close to that of model V.II.4C (0.990 ± 0.010), and so forth (Table 5). As expected, models with the inclusion of PRSs from targeted diseases, i.e., models V.I.4C and V.II.4C, had better performances than that of models without PRSs from targeted diseases (models V.III.4C and V.IV.4C). For all models, the class specific measures for the CTRL were the worst (Table 5, Fig. 3A). When we examined the confusion matrices more carefully, it was apparent that most of the misclassifications occurred with the controls (Fig. 3B).
To evaluate how much impact p value threshold selection had on model performance, we picked an arbitrary threshold, 1e-5, calculated PRSs for the 42 traits, constructed the same model as V.II.4C, and compared its performance to V.II.4C. The results were summarized in Supplementary Table S5. Using the PRSs calculated at a single p value threshold of 1e-5, which was different from the best fit p value threshold for all the 42 traits (see Supplementary Table S1), although the performance was slightly worse than that with the bestPRSs of the same 42 traits, the classification results for the CTRL, SCZ, BIP and MDD remained good. The results suggested that even with PRSs obtained from a suboptimal p value threshold, we could still be able to differentiate CTRL, SCZ, BIP and MDD reasonably well.
It was noted that the multiclass models performed better than the binary models. For example, for model V.II.4C, the average accuracy for the model was 0.938 ± 0.003, and average AUC was 0.990 ± 0.010, better than the accuracies and AUCs from models II.SCZ, II.BIP, and II.MDD. The class specific precision, recall and F1 score were also better for the multiclass models than that from the binary models. These results seemed to suggest that in a multiclass model, the inclusion of other targets could improve the classification of a specific target.
Evaluation of contribution of comorbid traits to SCZ, BIP and MDD
Based on the results from the models that did not include the PRSs from the target and directly related traits, PRSs from comorbid traits could substitute for the target PRSs to some extent, but we did not know which comorbid traits made the most contribution. To answer this question, we implemented a permutation procedure to evaluate the importance of the features in the models. Fig. 4 shows the results as measured by the changes of r2 score (delta r2) and AUC (delta AUC) between models II and IV for SCZ, BIP and MDD, and between models V.II.4C and V.IV.4C for all 4 classes. From the first panel, the SCZ panel, for model II.SCZ, 3 predictors/traits, i.e., BIP and SCZ vs CTRL, PGC_SCZ_2021, and SCZ vs BIP, made the most contribution to the model (Fig. 4A). When these traits were removed from the model, i.e., model IV.SCZ, anxiety, BIP vs CTRL, BIP_2021, BIP-1_2021 and panic_2019 became the most important contributors to the model. In other words, these 5 traits could largely account for the effects of SCZ specific traits in model II.SCZ. Similarly, for BIP (Fig. 4B), BIP vs CTRL, BIP_2021, BIP-1_2021, and SCZ vs BIP made most contribution to the II.BIP model. When these traits were removed, CAD and PGC_SCZ_2021 became the most important features for model IV.BIP. These two features could largely substitute BIP specific PRSs. For MDD model II.MDD, anxiety and depression_2019 were the most important features (Fig. 4C). In model IV.MDD, anxiety made the most contribution, and all other features, except BIP-1_2019, were significant contributors.
Permutation based feature importance estimates, as measured by delta r2 scores and delta AUC, were plotted for the two models. The features not included in the models were replaced with NA, and plotted as broken lines. A SCZ, models II.SCZ and IV.SCZ. B BIP, models II.BIP and IV.BIP. C MDD, models II.MDD and IV.MDD. D SCZ, BIP, and MDD, models V.II.4C and V.IV.4C.
For the multiclass classification model V.II.4C, in addition to those traits directly involved in SCZ, BIP and MDD (BIP and SCZ vs CTRL, BIP vs CTRL, CDG_2019, depression_2019, PGC_SCZ_2021, BIP_2021, BIP-1_2021 and SCZ vs BIP), anxiety and OCD_2017 also made significant contributions to the model (Fig. 4D). After removing target specific PRSs, ADHD, anxiety, CAD (coronary artery disease), OCD_2017, anxiety_2019 and panic_2019 became the major players in model V.IV.4C. Intriguingly, a physical disease, CAD, became a prominent contributor in the classification and differentiation of CTRL, SCZ, BIP and MDD. All other traits, while their contributions were not as dramatic as anxiety, CAD, OCD_2017 and panic_2019, their contributions were substantial because the permutation changes, delta AUC, were clearly greater than zero (Fig. 4D).
We also examined the embedding projections of these models to see whether there were significant differences in cluster structures. The results are presented in Fig. 5. For the binary classification models, with or without the use of target specific PRSs, the models had similar embedding structures. For example, for models II.SCZ and IV.SCZ, shown as the top two panels in Fig. 5, while there were significant differences in model performance, i.e., classification accuracy, there was no apparent difference in cluster structure and were no apparent gaps in embedding projections. The difference between models V.II.4C and V.IV.4C, the multiclass classification models, was the disappearance of the CTRL group, and BIP became the group connecting the SCZ and MDD groups.
The embedding layers immediately before the classification layer of the models were extracted and projected into a 2-dimension space using the UMAP method. Panels on the left side were from models with the inclusion of targeted PRSs (II.SCZ, II.BIP, II.MDD and V.II.4C), and panels on the right side were from models without the inclusion of targeted PRSs (IV.SCZ, IV.BIP, IV.MDD and V.IV.4C). For binary classification models, there were no apparent differences in cluster structures. For the multiclass models, model V.IV.4C did not have a CTRL cluster (green), and BIP became the connecting cluster between the SCZ and MDD clusters.
Discussion
In this study, we used PRSs from multiple comorbid traits to classify SCZ, BIP and MDD. Our study shows that PRSs from both target traits and comorbid traits can consistently predict the disease status of the targets. The use of multiple p value thresholds for a single trait does improve model performance, but the gain is limited. These can be seen in model performance between models I and II. The results from models I and II might be inflated because the samples we used were part of the GWASs for SCZ, BIP, and MDD. The original reason we designed models III and IV was to address the sample independence issue. We reasoned that if we excluded all PRSs obtained from the targeted disorder and directly related phenotypes, we could establish the lower bound of the model performance. The results we observed exceeded our expectations and were exciting. Based on our reading of the literature, there is no report that PRSs from comorbid traits alone can classify major psychiatric disorders, such as SCZ, BIP, and MDD. Although we16 and others17,18 have reported that inclusion of comorbid traits could improve the classification of targeted disorders, there are no reports of classification that exclude the PRSs of targeted disorders.
Our study has two major findings. One is that the PRSs from the comorbid traits alone, i.e., without the inclusion of PRSs of targeted diseases, can classify the targeted diseases, and these models perform reasonably well. This observation is true for the 3 psychiatric disorders studied in this article. This finding is consistent with recent reports that major psychiatric disorders share significant genetic liabilities4,5,19, suggesting that many of the risk alleles found in disorder specific GWASs may be the same alleles found in a different GWAS. Our study explicitly shows that a combination of PRSs from comorbid traits can replace the target specific PRSs to predict the disease status of the targets, and we can quantify the effects of target specific risks by comparing the performance between models II and IV. Furthermore, by examining the feature importance of these models, we can find which traits can be used to replace the targets in perspective models. In the case of BIP, PRSs from CAD and SCZ can substitute the BIP specific PRSs (Fig. 4B).
This finding raises an interesting question, that is how many genetic risk variants found in a GWAS are specific to the target disease, say, SCZ? In this study, we used a total of 42 traits, and the 35 traits not directly related to SCZ could predict SCZ status reasonably well. From the feature importance analyses, anxiety, BIP and panic can largely compensate for the removal of the effects of SCZ specific PRSs (Fig. 4A). The differences in model performance between model II.SCZ and IV.SCZ were 0.170 (accuracy) and 0.147 (AUC) (Table 2). The differences between II.BIP and IV.BIP and between II.MDD and IV.MDD were even smaller. Since the comorbid traits used in our models were far from exhaustive, other potential traits could be included. For example, breast cancer20, migraine21 and amyotrophic lateral sclerosis22 have been reported to have genetic correlation with SCZ, they could be good candidates to expand our list of comorbid traits. Should more genetically correlated traits be included, we would reasonably expect that the gap between models II.SCZ and IV.SCZ would decrease, i.e., the number of variants specific to SCZ would decrease, and this observation can be extended to BIP and MDD as well. In other words, the risk variants specific to SCZ, BIP or MDD are limited, most variants found in disease specific GWAS are shared between comorbid traits. The implication is that, if we believe that the main roles of genetic risks on SCZ are disruption of normal biological process/functions and our finding that many, perhaps, a majority, of the genetic risks are not specific to SCZ, it would lead to a conclusion that unless we know that a drug is designed to target variants specific to SCZ, the drug is unlikely to have a specific effect to SCZ, and it is more likely to be interchangeable among multiple disorders. This is consistent with current clinical practice that several drugs are exchangeable in treating SCZ, BIP and MDD because they all target the same or similar mechanisms23.
Another potentially important finding is that with PRSs from multiple comorbid traits, we can effectively differentiate CTRL, SCZ, BIP, and MDD. The results remained significant without the inclusion of the PRSs of targeted diseases (Table 5, models V.III.4C and V.IV.4C). When we examine the results more carefully, we find that the CTRL group has the lowest AUC and significant misclassifications (Fig. 3A, B), leading to lower precision, recall and F1-score for this group. A possible reason could be that the controls used in the datasets were not likely super controls who did not have any symptoms or diagnoses for all comorbid traits included in this study. The PRSs from comorbid traits help to differentiate each patient group (SCZ, BIP, or MDD), but they may blur the line between controls and case groups (SCZ, BIP, or MDD). This is because controls in the SCZ, BIP, and MDD datasets might not have been screened against all the comorbid traits, such as years of school attended24, body mass index25, and memory and neural function measures26, traits for which GWAS based PRSs were used in the models.
Our study has some limitations. One is potential overlap of controls between the targeted GWASs (i.e., SCZ, BIP, and MDD) and the GWASs of the traits included in our study. Some GWASs use consortium data that include control subjects from multiple sources. While these overlapping control subjects may not impact these individual GWASs, it might lead to some dependencies between our targeted disorders and those comorbid traits if their GWASs share some control subjects. Since our study used a substantial number of GWASs, it was difficult for us to know whether and to what extent these GWASs have overlapping controls. Therefore, it would be difficult to estimate their impact on our model’s performance. Independent studies may be needed to validate our findings. Another potential issue is the differentiation of the targeted disorders. The samples for SCZ, BIP, and MDD studies come from different sources, even though we conducted batch effect correction before combining the data in our models, it may not be sufficient to account for the stratification, leading to inflated performance. Further studies with the same genotype platform and samples coming from the same populations are necessary to validate our findings. In this study we used an INFO value of 0.4 to filter our imputed genotypes, a different threshold could be used. This change would change the number of variants included in PRS calculations, therefore, would have potential impacts on model performances.
In summary, we find that without the use of PRSs of targeted disorders, we can predict the status of these disorders. These results suggest that most genetic risk variants found by GWASs may not be specific to the disorders. Furthermore, PRSs from comorbid conditions, with or without the inclusion of PRSs from targeted disorders, can be used to differentiate the targeted disorders, indicating that it may be feasible to obtain a data-driven and biology-based diagnosis for these disorders.
Methods
Datasets and genotype imputation
In this study, we used datasets obtained from multiple sources. For SCZ datasets, we used the Molecular Genetics of Schizophrenia (MGS)27 (accession phs000167.v1.p1, cases = 2681, controls = 2653) and the Swedish Case-Control Study of Schizophrenia (SCCSS)28 (accession phs000473.v2.p2, cases = 2895, controls = 3836) from dbGaP (https://www.ncbi.nlm.nih.gov/gap/), and the Clinical Antipsychotic Trials of Intervention Effectiveness29,30 (CATIE, cases = 741, controls = 751) from NIMH’s genetic repository (https://www.nimhgenetics.org/). These SCZ datasets were combined with PLINK software31,32. For BIP datasets, we used samples from the Psychiatric Genomics Consortium (PGC) (TOP3, cases = 203, controls = 349; DUB, cases = 150, controls = 797; EDI, cases = 282 controls = 275) and the Wellcome Trust Case Control Consortium (https://www.wtccc.org.uk/) (cases = 1998, controls = 3004). The BIP datasets were also combined as a single sample using PLINK. The MDD data was obtained from dbGaP, accession phs000486.v1.p1, with 1704 cases and 3357 controls33,34. All subjects used in this study were Caucasians. For all datasets, we obtained the genotype and phenotype information from the sources, conducted genotype quality assessments, and removed SNPs with minor frequency less than 0.01 and Hardy-Weinberg equilibrium p value less than 0.0001. We then conducted genotype imputations for each dataset separately using the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html#!) using the 1000 Genomes Phase 3 as reference and default parameters. After the imputation, SNPs with INFO values less than 0.4 were removed. For analyses that required combining the datasets (i.e., main models V), we used PLINK to merge the datasets. For phenotypes, we used the same definitions as defined in the original studies.
Selection of comorbid traits and PRS calculation
We reviewed literature on comorbid traits of psychiatric disorders and searched the GWAS catalog10,11 to find whether the traits had GWAS performed. If a GWAS was found, the summary statistics would be downloaded. Of note, a trait here was defined as a phenotype that GWAS was performed for. For a psychiatric disorder, if multiple related phenotypes were used for GWAS, all of these phenotypes were considered individual traits and were included in our study. With the summary statistics of GWASs, we used the default settings of PRSice2 package35,36 to calculate the PRSs for the best-fit p value threshold (bestPRS, hereafter) for the SCZ, BIP, and MDD samples for each candidate trait. For a candidate trait, if the association p value of the bestPRS with any one of our targeted disorders was ≤0.050, we would include it in our study. With this procedure, a total of 42 traits were obtained (Supplementary Table S1). For all included traits, we then calculated PRSs at six p value thresholds (5e-8, 1e-5, 1e-3, 0.1, 1 and the best-fit p value) for all subjects of the datasets used in this study. For all PRSs, we rescaled them to the range between 0 and 1 and stored them as individual by feature metrices for model inputs.
Model definition, training, and optimization
We built 6 main models (B and I-V) in this study to classify SCZ, BIP and MDD (Table 1). For convenience, we used this convention to name our models: main_model.submodel.target_trait. If the main models did not have submodels, then they would be named as main_model.target_trait. We used logistic and elastic net regressions to establish a baseline (main model B) to compare to DNN models (main models I-V). For the elastic net models, we used a grid search to find the optimal alpha value and L1 ratio (alpha of 0.010 and L1 ratio of 0.920) for submodels B.II and B.III. Sex was used as a predictor in all models.
Main models I to IV were binary models designed to classify the target disorders using various PRS combinations. Main model V was a multiclass model that were used to classify the 4 classes of CTRL, SCZ, BIP and MDD (4C as target in model nomenclature) by combining the datasets from the 3 targeted diseases together. To account for batch effects among the 3 datasets, we conducted batch correction with the pyComBat package37,38. Since the number of subjects in each class was substantially different, the combined dataset from the 3 diseases was imbalanced for the 4 classes, therefore, we used oversampling techniques (ADASYN39 and borderline SMOTE40,41) to balance the classes and train the models.
For models that needed to remove targeted disease and related phenotypes, i.e., models B.III, III, IV, V.III, and V.IV (see model definitions in Table 1), PRSs obtained for these target related phenotypes would be removed from the models (see Supplementary Table S1, columns 5, 9, 13, 14). For models that used multiple levels of PRSs for the same traits, i.e., models III and V.III, when a trait was removed from the model, all levels of PRS of that trait would be removed.
We used TensorFlow (version 2.5.0; www.tensorflow.org/)42,43, keras (version 2.9.0; https://keras.io/api/) and DNN architecture to construct the models. An example of the models (model II.SCZ) was shown in Supplementary Fig. S1, and the Python scripts for main models can be found in our Github site (https://github.com/mdsamchen/scz_bip_mdd). For each model, we used the LOOCV procedure to conduct model validation, where 80% of the samples were used for model training, optimization and testing, and 20% of the samples were reserved for validation. By setting the random seed of the split to a specific number, the same samples were reserved for validation, and they were not seen by any model during the model training and optimization processes. All results reported were obtained from the 20% left-out samples that were not seen by any of the models during the training processes. For binary classification (main models B, I–IV), we reported the binary accuracy, precision, recall, F1 score, and AUC as defined in the scikit-learn package (version 0.23.2)44. For multiclass model (i.e., model V), we reported the average of categorical accuracies from all classes, and the AUCs were also averaged across the classes. The precision, recall, and F1 scores were reported for each class.
Evaluation of feature importance
For deep learning models, permutations have been used as a method to evaluate the importance of features45. We used permutations to estimate the importance of the features by the following procedures: (a). define the feature importance as the change of model performance between the models with and without permuting each individual feature. For binary classification models, we used r2 score, as defined in the scikit-learn package, as the measurement of model performance. For multiclass classification models, we used class weighted average of AUC as the measurement. To estimate the feature importance, we permuted each feature 100 times for the trained model and took the average of these 100 permutations as the performance of the permuted feature. The importance of the feature was the difference between the performance of the trained model and the same model with the specified feature permuted:
Where ij is the importance of feature j, s is the performance score (r2 score for binary model and AUC for multiclass model) of the trained model, K is the number of permutations. We used one sample t-test to evaluate whether the change was statistically significant assuming that the performance changes from permutations followed a normal distribution. The changes in the permuted performances were plotted using Seaborn (version 0.12.2)46 and Matplot (version 3.5.2; https://www.mathworks.com/help/stats/index.html) libraries.
UMAP plotting of model embeddings
Model embeddings were extracted from the layer immediately before the classification layer. For direct comparison between the models with and without the inclusion of target specific PRSs, all models had 32 dimensions at this embedding layer. Then the embeddings were projected into a 2-dimensional space by umap-learn (version 0.5.4)47 and plotted by classes with the Matplot library.
Data availability
This study used data from NIH dbGaP (https://www.ncbi.nlm.nih.gov/gap/) and NIMH genetic data repository (https://www.nimhgenetics.org/) via controlled access. Qualified investigators can obtain access to these datasets by applying to these organizations. The python script codes used in the study are available at xiangning chen’s github site (https://github.com/mdsamchen/scz_bip_mdd).
References
Lee, H. et al. Comorbid health outcomes in patients with schizophrenia: an umbrella review of systematic reviews and meta-analyses. Mol. Psychiatry 1–11. https://doi.org/10.1038/s41380-024-02792-2 (2024).
Léda-Rêgo, G. et al. Lifetime prevalence of psychiatric comorbidities in patients with bipolar disorder: a systematic review and meta-analysis. Psychiatry Res. 337, 115953 (2024).
Platona, R. I., Căiţă, G. A., Voiţă-Mekeres, F., Peia, A. O. & Enătescu, R. V. The impact of psychiatric comorbidities associated with depression: a literature review. Med. Pharm. Rep. 97, 143 (2024).
Romero, C. et al. Exploring the genetic overlap between twelve psychiatric disorders. Nat. Genet. 54, 1795–1802 (2022).
Cross-Disorder Group of the Psychiatric Genomics Consortium. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell 179, 1469–1482.e11 (2019).
Brainstorm Consortium. et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
Baselmans, B. M. L., Yengo, L., van Rheenen, W. & Wray, N. R. Risk in relatives, heritability, snp-based heritability, and genetic correlations in psychiatric disorders: a review. Biol. Psychiatry 89, 11–19 (2021).
Lewis, C. M. & Vassos, E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 12, 44 (2020).
Legge, S. E. et al. Associations between schizophrenia polygenic liability, symptom dimensions, and cognitive ability in schizophrenia. JAMA Psychiatry 78, 1143–1151 (2021).
MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Buniello, A. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Cross-Disorder Group of the Psychiatric Genomics Consortium. et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–994 (2013).
Martin, J., Taylor, M. J. & Lichtenstein, P. Assessing the evidence for shared genetic risks across psychiatric disorders and traits. Psychol. Med. 48, 1759–1774 (2018).
Grotzinger, A. D. Shared genetic architecture across psychiatric disorders. Psychol. Med. 51, 2210–2216 (2021).
Cheng, S. et al. An atlas of genetic correlations between psychiatric disorders and human blood plasma proteome. Eur. Psychiatry 63, e17 (2020).
Chen, J., Wu, J.-S., Mize, T., Shui, D. & Chen, X. Prediction of schizophrenia diagnosis by integration of genetically correlated conditions and traits. J. Neuroimmune Pharmacol. 13, 532–540 (2018).
Hu, Y. et al. Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction. PLoS Genet. 13, e1006836 (2017).
Maier, R., Moser, G., Chen, G.-B. & Ripke, S. Cross-Disorder Working Group of the Psychiatric Genomics Consortium Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am. J. Hum. Genet. 96, 283–294 (2015).
Lee, P. H., Feng, Y.-C. A. & Smoller, J. W. Pleiotropy and cross-disorder genetics among psychiatric disorders. Biol. Psychiatry 89, 20–31 (2021).
Tang, M. et al. Epidemiological and genetic analyses of schizophrenia and breast cancer. Schizophr. Bull. sbad106. https://doi.org/10.1093/schbul/sbad106 (2023).
Bahrami, S. et al. Dissecting the shared genetic basis of migraine and mental disorders using novel statistical tools. Brain 145, 142–153 (2021).
McLaughlin, R. L. et al. Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nat. Commun. 8, 14774 (2017).
Paul, S. M. & Potter, W. Z. Finding new and better treatments for psychiatric disorders. Neuropsychopharmacology 49, 3–9 (2024).
Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
Graff, M. et al. Genome-wide physical activity interactions in adiposity—a meta-analysis of 200,452 adults. PLoS Genet. 13, e1006528 (2017).
Davies, G. et al. Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol. Psychiatry 16, 996–1005 (2011).
Shi, J. et al. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature 460, 753–757 (2009).
Bergen, S. E. et al. Genome-wide association study in a Swedish population yields support for greater CNV and MHC involvement in schizophrenia compared with bipolar disorder. Mol. Psychiatry 17, 880–886 (2012).
Stroup, T. S. et al. The National Institute of Mental Health Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) project: schizophrenia trial design and protocol development. Schizophr. Bull. 29, 15–31 (2003).
Sullivan, P. F. et al. Genomewide association for schizophrenia in the CATIE study: results of stage 1. Mol. Psychiatry 13, 570–584 (2008).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Sullivan, P. F. et al. Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo. Mol. Psychiatry 14, 359–375 (2009).
Wright, F. A. et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 46, 430–437 (2014).
Euesden, J., Lewis, C. M. & O’Reilly, P. F. PRSice: polygenic risk score software. Bioinformatics 31, 1466–1468 (2015).
Choi, S. W. & O’Reilly, P. F. PRSice-2: polygenic risk score software for biobank-scale data. GigaScience 8, giz082 (2019).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostat. Oxf. Engl. 8, 118–127 (2007).
Behdenna, A., Haziza, J., Azencott, C.-A. & Nordor, A. pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinform. 24, 459 (2023).
He, H., Bai, Y., Garcia, E. A. & Li, S. ADASYN: adaptive synthetic sampling approach for imbalanced learning. In Proc. 2008 IEEE International Joint Conference on Neural Networks IEEE World Congress on Computational Intelligence https://doi.org/10.1109/IJCNN.2008.4633969 (2008).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
Han, H., Wang, W.-Y. & Mao, B.-H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In Advances in Intelligent Computing (eds Huang, D.-S., Zhang, X.-P. & Huang, G.-B.) 878–887 (Springer, 2005). https://doi.org/10.1007/11538059_91.
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. ArXiv160304467 Cs (2016).
Abadi, M. et al. TensorFlow: a system for large-scale machine learning. ArXiv160508695 Cs (2016).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Mi, X., Zou, B., Zou, F. & Hu, J. Permutation-based identification of important biomarkers for complex diseases via machine learning models. Nat. Commun. 12, 3008 (2021).
Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Acknowledgements
This work was supported in part by NIH grant P20GM121325 and R01LM012806. The genetic and clinical data for the MGS, SCCSS, and CATIE studies were obtained from the Genomics Repository of National Institute of Mental Health (https://www.nimhgenetics.org/). We thank the patients, control subjects, and the investigators involved in these studies. The investigators and co-investigators for the MGS were: ENH/Northwestern University, Evanston, IL, MH059571, Pablo V. Gejman, M.D. (Collaboration Coordinator; PI), Alan R. Sanders, M.D.; Emory University School of Medicine, Atlanta, GA, MH59587, Farooq Amin, M.D. (PI); Louisiana State University Health Sciences Center; New Orleans, LA, MH067257, Nancy Buccola APRN, B.C., M.S.N. (PI); University of California-Irvine, Irvine, CA, MH60870, William Byerley, M.D. (PI); Washington University, St. Louis, MO, U01, MH060879, C. Robert Cloninger, M.D. (PI); University of Iowa, Iowa, IA, MH59566, Raymond Crowe, M.D. (PI), Donald Black, M.D.; University of Colorado, Denver, CO, MH059565, Robert Freedman, M.D. (PI); University of Pennsylvania, Philadelphia, PA, MH061675, Douglas Levinson, M.D. (PI); University of Queensland, QLD, Australia, MH059588, Bryan Mowry, M.D. (PI); Mt. Sinai School of Medicine, New York, NY, MH59586, Jeremy Silverman, Ph.D. (PI). The SCCSS was supported by funding provided by the NIMH (R01 MH077139 to Patrick F. Sullivan and R01 MH095034 to Pamela Sklar), the Stanley Center for Psychiatric Research, the Sylvan Herman Foundation, the Friedman Brain Institute, Icahn School of Medicine at Mount Sinai at the Mount Sinai School of Medicine, the Karolinska Institutet, Karolinska University Hospital, the Swedish Research Council, the Swedish County Council, the Söderström Königska Foundation, and the Netherlands Scientific Organization (NWO 645-000-003). Co-principal investigators involved in this study were Pamela Sklar (Mount Sinai School of Medicine), Christina M. Hultman (Karolinska Institutet, Stockholm, Sweden), and Patrick F. Sullivan (University of North Carolina and Karolinska Institutet, Stockholm, Sweden). We are deeply grateful for the participation of all subjects contributing to this research and to the collection team that worked to recruit them. The principal investigators of the CATIE (Clinical Antipsychotic Trials of Intervention Effectiveness) trial were Jeffrey A. Lieberman, M.D., T. Scott Stroup, M.D., M.P.H., and Joseph P. McEvoy, M.D. The CATIE trial was funded by a grant from the National Institute of Mental Health (N01 MH900001) along with MH074027 (PI PF Sullivan). Genotyping was funded by Eli Lilly and Company.
Author information
Authors and Affiliations
Contributions
X.C. conceived the concept, designed the study, analyzed the data, and wrote the manuscript. Y.L. and J.M.C. processed data and conducted some data analyses. M.H., V.L.N., D.W., S.H., and Z.Z. reviewed and commented on the manuscript and results explanation. JC was involved in study design, conducted some analyses, and edited the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, X., Lu, Y., Cue, J.M. et al. Classification of schizophrenia, bipolar disorder and major depressive disorder with comorbid traits and deep learning algorithms. Schizophr 11, 14 (2025). https://doi.org/10.1038/s41537-025-00564-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41537-025-00564-7