Abstract
Genome-wide association studies (GWAS) identify genetic variants underlying complex traits but are limited by stringent genome-wide significance thresholds. We present GRIN (Gene set Refinement through Interacting Networks), which, when GWAS thresholds are relaxed, increases confidence in the expanded gene set by retaining only genes that are strongly interconnected in biological networks. GRIN was validated on both simulated interrelated gene sets and multiple GWAS traits. From multiple GWAS summary statistics of suicide attempt, a complex phenotype, GRIN identified additional genes that replicated across independent cohorts and retained biologically interrelated genes despite a relaxed significance threshold. We present a conceptual model of how these retained genes interact through neurobiological pathways that may influence suicidal behavior, and identify existing drugs associated with these pathways that would not have been identified under traditional GWAS thresholds. We demonstrate GRIN’s utility in boosting GWAS by increasing the number of true positive genes that can be identified from GWAS results.
Introduction
Genome-wide association studies (GWAS) have become a crucial tool for discovering the genetic basis of complex traits. Complex traits are governed by a set of genes, each of which may be influenced by allelic variation affecting gene expression or the resulting gene product. In a GWAS, one tests the effects of single nucleotide polymorphisms (SNPs) on the trait of interest to isolate genomic regions that can then be linked to putatively relevant genes. Ultimately, the goal of GWAS is to accurately detect the “true positive” set of genes that affect a given trait. Because the true positive gene set is typically unknown, researchers must balance precision (the fraction of genes identified as positive that are true positives) against recall (the fraction of all discoverable true positive genes that the gene set contains). In GWAS, precision often takes precedence over recall because downstream experiments to validate true positive genes are labor intensive and expensive. Thus, for biological discovery, a shorter but more reliable gene set may be more valuable than a longer set containing a higher proportion of false positives.
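As a concrete illustration of the two definitions above, the following minimal Python sketch scores a gene set against a reference set. The gene symbols are hypothetical placeholders; the example only shows how a relaxed, longer list can gain recall while losing precision.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) of a predicted gene set against a reference set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical gene symbols: G* are true positives, N* are false positives.
truth = {"G1", "G2", "G3", "G4"}
stringent = {"G1", "G2"}                        # short, reliable list
relaxed = {"G1", "G2", "G3", "N1", "N2", "N3"}  # longer, noisier list
print(precision_recall(stringent, truth))  # (1.0, 0.5)
print(precision_recall(relaxed, truth))    # (0.5, 0.75)
```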
Precision and recall from GWAS results are governed by many factors, only some of which are under the researcher’s control. The balance between the two is primarily set by multiple-testing correction of p-values, which determines which SNPs are considered genome-wide significant and consequently which genes are considered relevant to the trait. Stringent genome-wide significance thresholds (e.g., p < 5e−8) provide increased confidence in SNPs that pass the threshold, but can severely limit the recall of relevant genes. By using less stringent significance thresholds (e.g., p < 1e−5), one can access more heritability (higher recall) at the expense of introducing a higher proportion of unknown false positives (lower precision).
GWAS can identify the genetic basis of complex phenotypes, but interpreting the results of these studies is challenging. SNP-level results from GWAS often need to be assigned to genes to understand downstream functions, but SNP-to-gene assignment is particularly challenging for intergenic SNPs1,2,3,4, which often reside within large blocks of linkage disequilibrium encompassing many genes. This makes it difficult to determine which specific genes are relevant to the trait, increasing the likelihood of introducing false positives and reducing precision. One simple approach is to map SNPs to nearby genes5, which can be error prone because many non-coding SNPs regulate genes at greater distances. Additional methods such as expression quantitative trait loci (eQTL) mapping6 and H-MAGMA1 are designed to improve SNP-to-gene mapping by leveraging further biological evidence from RNA-seq or Hi-C data, respectively, but context-relevant omics data are often difficult to obtain and false positive genes may still remain. Thus, distinguishing false positives from true positives in GWAS results remains a difficult problem.
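The proximity approach mentioned above can be sketched as a simple window search. This is a hypothetical illustration, not the MAGMA or H-MAGMA procedure: the gene coordinates are toy values and the 10 kb window is an arbitrary assumption. It shows how a single SNP lying near two genes yields two candidates, only one of which may be causal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def genes_near_snp(chrom, pos, genes, window=10_000):
    """Names of genes whose body lies within `window` bp of the SNP position."""
    hits = [
        g.name
        for g in genes
        if g.chrom == chrom and g.start - window <= pos <= g.end + window
    ]
    return sorted(hits)


# Toy coordinates (hypothetical), not real annotations.
genes = [
    Gene("GENE_A", "1", 100_000, 120_000),
    Gene("GENE_B", "1", 125_000, 140_000),
    Gene("GENE_C", "1", 300_000, 310_000),
]
print(genes_near_snp("1", 122_000, genes))  # ['GENE_A', 'GENE_B']
print(genes_near_snp("1", 200_000, genes))  # []
```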
Of critical importance, however, is the fact that polygenic traits are typically governed by genes from multiple pathways working in concert with each other in a non-random manner7. Given enough information from diverse experimental sources, it should be possible to find functional lines of biological evidence connecting any pair of the true positive, causal GWAS genes, as they are very likely to be more functionally connected with each other relative to other random pairs of genes. Conversely, false positives will likely be random and therefore far less functionally connected to other genes in the set. Previous efforts8,9,10 have therefore utilized networks in an attempt to boost GWAS signals, albeit without specifying which genes are likely to be true or false positives.
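The premise that true positive genes are more interconnected than random genes can be tested empirically. The sketch below is a deliberately simple stand-in that counts edges within a gene set and compares that count with random gene sets of the same size; it is not GRIN's method (GRIN uses random walk with restart on a multiplex network), and the toy network, `n_perm`, and function names are hypothetical.

```python
import random
from itertools import combinations


def internal_edges(genes, edges):
    """Count edges whose two endpoints both lie in `genes`."""
    members = set(genes)
    return sum(1 for a, b in edges if a in members and b in members)


def connectivity_pvalue(gene_set, edges, universe, n_perm=1000, seed=0):
    """Empirical p-value that random gene sets of equal size are at least as connected."""
    rng = random.Random(seed)
    observed = internal_edges(gene_set, edges)
    pool = sorted(universe)
    k = len(gene_set)
    as_extreme = sum(
        internal_edges(rng.sample(pool, k), edges) >= observed for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy network (hypothetical): a 5-gene module wired as a clique, plus a sparse chain.
universe = [f"g{i}" for i in range(50)]
module = universe[:5]
edges = list(combinations(module, 2))
edges += [(universe[i], universe[i + 1]) for i in range(5, 49)]

observed, p = connectivity_pvalue(module, edges, universe)
print(observed, p)  # 10 internal edges, small p-value
scattered = universe[10:50:8]  # five genes with no edges among them
print(connectivity_pvalue(scattered, edges, universe))  # (0, 1.0)
```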
To this end, we present GRIN – Gene set Refinement through Interacting Networks – an approach that uses biological network topology to remove false positive genes from a gene set in order to improve its precision. Starting from a network representation of system-wide gene-to-gene interactions and a user-defined gene set (e.g., after SNP-to-gene assignment), GRIN explores network topology to determine how strongly these genes are interconnected, and compares the connectivity among these genes to the connectivity found among random genes to determine which of the user’s genes are likely to be false positives. We tested GRIN’s ability to separate well-characterized, functionally related “gold standard” genes from random genes using a large multiplex network8. We then tested GRIN on several published GWAS of human traits and diseases, showing that GRIN improves the precision of the gene set when measured against a gold standard comprising genes discovered in a much higher-powered GWAS of the same trait. We also compared GRIN to two state-of-the-art, network-based GWAS boosting methods, NAGA9 and GWAB10.
We then applied GRIN to GWAS summary statistics of suicide attempt, a complex psychiatric phenotype. Although heritability estimates for suicide attempt range from 17–55%11,12, few genome-wide significant variants have been identified to date12,13,14,15. We analyzed independent suicide attempt GWAS results from the Million Veteran Program (MVP15) and the International Suicide Genetics Consortium (ISGC13). As few variants reached traditional genome-wide significance, we explored SNPs at a less stringent threshold and used multiple SNP-to-gene assignment methods to elucidate the underlying mechanisms of the heritable components of suicidality. We then reduced the impact of possible false positive genes by applying GRIN, and identified drugs that target the retained suicide attempt genes as potential candidates for future suicide prevention studies.
Results
GRIN workflow
A summary of the GRIN workflow is presented in Fig. 1a. GRIN inputs include a gene set and a previously generated multiplex network. In Stage 1, all experimentally-derived (e.g., GWAS) “seed” genes are ranked based upon biological network connectivity, which is compared to a null rank distribution generated from random gene sets of equivalent size. In Stage 2, a sliding window is used to compare the ordered ranks of experimentally-derived genes to the equivalent ordered ranks within the null distribution using the Mann–Whitney U test, and the p-value of the Mann–Whitney U test is plotted for each window to form a curve. The elbow of this curve indicates the cutoff point at which the rank distributions are equivalent between the seed gene set and the null distribution (indicating low functional interrelatedness), and genes following this cutoff point are filtered out.
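Stage 1 rests on random walk with restart (RWR). The following is a minimal single-layer sketch under stated assumptions: a small weighted, undirected toy network, a restart probability of 0.5 chosen only for illustration, and power iteration until convergence. GRIN itself operates on a multiplex network, so this should not be read as its implementation; all names are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-12, max_iter=10_000):
    """Visiting probabilities of a walker that jumps back to `seeds` with prob. `restart`.

    adj maps each node to a {neighbor: weight} dict (undirected, symmetric).
    Iterates p <- (1 - restart) * W p + restart * p0, where W column-normalizes adj.
    """
    nodes = sorted(adj)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            if strength[n] == 0:
                continue  # isolated node: its mass is not propagated in this sketch
            share = (1 - restart) * p[n] / strength[n]
            for nb, w in adj[n].items():
                nxt[nb] += share * w
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_nodes(scores):
    """Rank 1 = most visited; ties broken alphabetically for reproducibility."""
    ordered = sorted(scores, key=lambda n: (-scores[n], n))
    return {n: r for r, n in enumerate(ordered, start=1)}


# Toy path network S - X - Y - Z (hypothetical), walker restarting at S.
adj = {
    "S": {"X": 1.0},
    "X": {"S": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "Z": 1.0},
    "Z": {"Y": 1.0},
}
scores = random_walk_with_restart(adj, seeds={"S"})
print(rank_nodes(scores))  # {'S': 1, 'X': 2, 'Y': 3, 'Z': 4}
```

Visiting probability decays with network distance from the seeds, which is why genes embedded among other seed genes tend to receive better ranks than isolated ones.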
a In Stage 1, GRIN requires a network composed of gene-gene connections, preferably assembled from multiple experimental data sources. Next, the network is explored starting from all genes in the user’s gene set using the random walk with restart algorithm, resulting in a rank-ordered list of all genes in the network based on the frequency with which they were visited. The ordered ranks of the user’s genes are used in Stage 2, where they are compared to ordered ranks obtained from running Stage 1 on 100 random gene sets of the same size (empirical null distribution). Using the Mann–Whitney U test, a window of the user’s gene set ranks is compared to the empirical null distribution of ranks to determine whether the user’s gene rankings come from the same distribution as random genes’ rankings. After using a sliding window to compare the gene ranks in this manner, GRIN identifies a set of biologically related genes at the point at which the ranks of the user’s gene list begin to approximate the ranks of the null distribution (elbow point of the curve). Genes prior to the sliding window at the elbow point are retained because their ranks deviate from the null distribution, while genes beyond this point are removed. Figure made with BioRender.com. b Example of true and false positive genes from three GWAS loci. Left: Locus 1 and Locus 2 each contain more than one candidate gene within the boundaries of the locus, making determination of the causal gene difficult. In this example, retaining all the genes within the three loci means that 50% of the retained genes are false positives (gray) that are probably functionally random with respect to each other; there is no reason to expect that the false positive genes at locus 1 have any biological relationship to the false positive gene at locus 2 except by chance.
The true positive gene at locus 1 (orange), however, is more likely to have a relationship with the true positive genes at other loci due to their joint role in the genetic architecture of the phenotype. Right: These relationships can be detected using comprehensive gene interaction networks, where true positive genes (orange) are more strongly connected to each other than random false positive genes (gray) are. The strength of this connectivity can be determined by network learning algorithms.
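Stage 2, as described in the caption above, compares position-matched windows of sorted seed ranks with the pooled null ranks using the Mann–Whitney U test and cuts at the elbow of the resulting p-value curve. The sketch below rests on several assumptions: a normal approximation for the U test, simulated uniform null ranks in place of ranks from running Stage 1 on random gene sets, and a generic "farthest point from the chord" elbow rule. GRIN's actual window size and elbow criterion may differ, and all names are hypothetical.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks, tie_term, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def window_pvalues(seed_ranks, null_rank_sets, window):
    """Compare position-matched windows of sorted seed ranks with pooled null ranks."""
    seed = sorted(seed_ranks)
    nulls = [sorted(r) for r in null_rank_sets]
    curve = []
    for start in range(len(seed) - window + 1):
        x = seed[start:start + window]
        y = [r for null in nulls for r in null[start:start + window]]
        curve.append(mann_whitney_p(x, y))
    return curve


def elbow_index(curve):
    """Point farthest from the chord joining the curve's endpoints (axes scaled to [0, 1])."""
    n = len(curve)
    if n < 3:
        return 0
    lo, hi = min(curve), max(curve)
    span = (hi - lo) or 1.0
    xs = [i / (n - 1) for i in range(n)]
    ys = [(c - lo) / span for c in curve]
    dy = ys[-1] - ys[0]
    # Unnormalized distance to the chord; the argmax is unaffected by the scale.
    dists = [abs(dy * x - (y - ys[0])) for x, y in zip(xs, ys)]
    return max(range(n), key=dists.__getitem__)


# Hypothetical demo: 10 strongly connected seeds followed by 10 weakly connected ones.
rng = random.Random(1)
seed_genes = [f"gene{i}" for i in range(20)]  # listed in rank order
seed_ranks = list(range(1, 11)) + sorted(rng.sample(range(100, 5000), 10))
null_sets = [rng.sample(range(1, 5000), 20) for _ in range(100)]
curve = window_pvalues(seed_ranks, null_sets, window=5)
cut = elbow_index(curve)
retained = seed_genes[:cut]  # genes before the elbow window are kept
print(cut, retained)
```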
When conceiving GRIN, we saw its potential for identifying true positive candidate genes from genome-wide significant loci identified by GWAS (Fig. 1b). A common real-world GWAS scenario is one in which one or more genome-wide significant SNPs lie near multiple candidate genes, making the causal gene difficult to identify (Fig. 1b, Loci 1 and 2). Assigning all of these candidate genes to the trait introduces false positives. However, true positive genes that contribute to a phenotype of interest are likely to share unified biological pathways, which can be represented as gene-gene network connections (Fig. 1b). Where a genome-wide significant locus can be linked to a gene with a higher degree of confidence (Fig. 1b, Locus 3), leveraging genes that are highly interconnected within a network may help identify which GWAS-implicated genes are more likely to truly contribute to the trait of interest.
GRIN accurately retains true positive genes from simulated, “noisy” gene sets
We first tested GRIN’s ability to distinguish between true positive and true negative genes using three thousand simulated gene sets for which the truth is known. Each gene set was constructed from one of 30 biologically interrelated, “gold standard” gene sets mixed with an equal number of randomly drawn, “noisy” genes (1:1 ratio), and this was repeated 100 times per gold standard gene set (Fig. 2a, Supplementary Table 1). Stage 1 of GRIN ranked gold standard genes more highly than most random genes in any given test set, achieving median area under the receiver operating characteristic curve (AUROC) and median area under the precision-recall curve (AUPRC) values of 0.950 ± 0.059 and 0.896 ± 0.084, respectively, across the 30 gold standard gene sets (Fig. 2b, c). This indicated highly accurate gene ranking in GRIN Stage 1. GRIN Stage 2 effectively classified true and false positive genes at the cutoff point (median precision 0.810 ± 0.134; recall 0.914 ± 0.167; specificity 0.880 ± 0.197; Fig. 2d, Supplementary Fig. 1). On average, 46.0% of the gene set was discarded as noise (0.460 ± 0.143), and less than 9% of true positive gold genes were discarded (median recall 0.914).
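For readers reproducing the ranking metrics, AUROC can be computed as the probability that a randomly chosen gold gene outscores a randomly chosen random gene, and AUPRC is approximated here by average precision. This is a plain-Python sketch with hypothetical scores, not the evaluation code used for Fig. 2; since GRIN produces ranks (1 = best), negated ranks could serve as scores.

```python
def auroc(scores, labels):
    """AUROC as the chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise AUPRC: mean of the precision values at each positive's position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / k
    return total / sum(1 for y in labels if y)


# Hypothetical Stage 1 scores (higher = more visited); 1 = gold gene, 0 = random gene.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 3))              # 0.889
print(round(average_precision(scores, labels), 3))  # 0.917
```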
a Workflow for assessing the capacity of GRIN to identify biologically related genes from well-defined, “gold standard” gene sets (GO, OMIM, DisGeNET) mixed with random genes. Figure made with BioRender.com. b Receiver-operating characteristic (ROC) curve and precision-recall curves (PRC) for GRIN. The blue line represents the mean of GRIN performance over 100 sets of glutamatergic signaling genes (GO:0035249) intermixed with an equivalent number of random genes from the multiplex network. Gray values indicate maximum and minimum values over 100 sets, and dotted lines indicate classification values based on random chance. c Median area under ROC (AUROC) and PRC (AUPRC) values for 100 sets of random genes intermixed with a gold standard gene set for 30 distinct gold standard gene sets. d Median precision, recall, and specificity values for 100 sets of a gold standard gene set intermixed with an equivalent number of random genes, repeated for 30 distinct gold standard gene sets. e Violin and box plots of precision, recall, and specificity values for 100 sets of dopaminergic synaptic signaling (GO:0001963) gold standard genes intermixed with different ratios of random genes (Noise). Median precision: 0.970 (2:1), 0.767 (1:1), 0.568 (1:2), 0.360 (1:4), 0.175 (1:10). Median recall: 0.735 (2:1), 1.00 (1:1), 1.00 (1:2), 1.00 (1:4), 1.00 (1:10). Median specificity: 0.970 (2:1), 0.696 (1:1), 0.620 (1:2), 0.554 (1:4), 0.528 (1:10). Dotted lines indicate classification values based on random chance. f Precision, recall, and specificity values for 100 sets of four combined gold standard sets (dopaminergic synaptic signaling (GO:0001963), central nervous system myelination (GO:0022010), Alzheimer’s disease (OMIM #: 607822, 104300, 606889, 608907, 602192, 615590), and major depressive disorder (DisGeNET)) mixed with five ratios of gold standard to random genes. Median precision: 0.942 (2:1), 0.823 (1:1), 0.593 (1:2), 0.343 (1:4), 0.158 (1:10). 
Median recall: 0.746 (2:1), 0.881 (1:1), 0.955 (1:2), 0.993 (1:4), 1.00 (1:10). Median specificity: 0.909 (2:1), 0.813 (1:1), 0.672 (1:2), 0.524 (1:4), 0.470 (1:10). Dotted lines indicate classification values based on random chance.
Next, we tested GRIN’s ability to refine gene sets containing varying proportions of random genes in order to simulate real-world scenarios in which the signal-to-noise ratio of a gene set is not known. When gold standard genes outnumbered random genes by 2:1, precision (0.967 ± 0.060), recall (0.735 ± 0.331), and specificity (0.971 ± 0.088) were consistently high compared to expected values from random classification (precision = 0.66, specificity = 0.66). As the ratio of gold genes to random genes decreased (1:1, 1:2, 1:4, 1:10), precision and specificity decreased while recall increased, but precision and specificity remained consistently better than random classification (Fig. 2e, Supplementary Figs. 2–3).
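The random-chance reference values depend on how many genes a random classifier keeps. Assuming it keeps a fixed number of genes uniformly at random (an assumption; the null classifier is not defined here), expected precision equals the fraction of gold genes, while expected recall and specificity depend only on the fraction kept. At a 2:1 ratio the expected precision is 2/3, consistent with the ≈0.66 quoted above. The function names are hypothetical, and a Monte Carlo check is included.

```python
import random


def random_baseline(n_gold, n_noise, n_retained):
    """Expected metrics if `n_retained` genes are kept uniformly at random."""
    n = n_gold + n_noise
    kept_fraction = n_retained / n
    return {
        "precision": n_gold / n,           # E[TP] / n_retained = prevalence of gold genes
        "recall": kept_fraction,           # E[TP] / n_gold
        "specificity": 1 - kept_fraction,  # 1 - E[FP] / n_noise
    }


def simulate(n_gold, n_noise, n_retained, trials=20_000, seed=0):
    """Monte Carlo check of the closed-form expected precision and recall."""
    rng = random.Random(seed)
    genes = [1] * n_gold + [0] * n_noise  # 1 = gold gene, 0 = random gene
    tp_total = sum(sum(rng.sample(genes, n_retained)) for _ in range(trials))
    mean_tp = tp_total / trials
    return {"precision": mean_tp / n_retained, "recall": mean_tp / n_gold}


print(random_baseline(20, 10, 15))  # precision ~0.667, recall 0.5, specificity 0.5
print(simulate(20, 10, 15))
```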
Finally, we tested whether GRIN could retain multiple distinct functional groups when mixed with random genes, as a real-world gene set for a complex trait is likely to contain multiple functional groups. When given gene sets with four distinct biological functions (“Alzheimer’s disease”, “central nervous system myelination”, “dopaminergic synaptic signaling”, and “major depressive disorder”) at 2:1, 1:1, 1:2, 1:4, or 1:10 ratios of gold standard genes to random genes, GRIN achieved much better precision and specificity than random chance while retaining high recall (Fig. 2f).
Evaluating GRIN using real-world GWAS results
We evaluated GRIN’s ability to offset the loss of precision that occurs when significance thresholds are relaxed in real-world GWAS, by relaxing thresholds in a lower-powered GWAS and measuring how well it recovered genes that reach genome-wide significance in a higher-powered GWAS of the same trait. We considered true positive genes to be genes assigned from SNPs with p < 5e−8 in the higher-powered GWAS, and then measured precision and recall before and after GRIN for the corresponding lower-powered GWAS at progressively less stringent significance thresholds.
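The evaluation protocol above amounts to a threshold sweep. The sketch below assumes gene-level p-values are already available for the lower-powered GWAS (for example, the smallest p-value among SNPs assigned to each gene, which is a simplification) and that the gold standard is the set of genes from the higher-powered GWAS; all gene names and values are hypothetical.

```python
THRESHOLDS = [5e-8, 5e-7, 5e-6, 5e-5]


def threshold_sweep(gene_p, gold, thresholds=THRESHOLDS):
    """Precision and recall of genes called at each p-value threshold against a gold set."""
    rows = []
    for t in thresholds:
        called = {g for g, p in gene_p.items() if p < t}
        tp = len(called & gold)
        precision = tp / len(called) if called else float("nan")
        recall = tp / len(gold)
        rows.append((t, len(called), precision, recall))
    return rows


# Hypothetical gene-level p-values from a lower-powered GWAS (G* true, N* false).
gene_p = {"G1": 1e-9, "G2": 3e-8, "G3": 2e-7, "N1": 4e-7,
          "G4": 1e-6, "N2": 3e-6, "N3": 2e-5}
# Hypothetical gold standard from a higher-powered GWAS; G5 is missed entirely.
gold = {"G1", "G2", "G3", "G4", "G5"}

for t, n, prec, rec in threshold_sweep(gene_p, gold):
    print(f"p < {t:.0e}: {n} genes, precision {prec:.2f}, recall {rec:.2f}")
```

On these toy values, precision falls from 1.00 to 0.57 as the threshold is relaxed from 5e−8 to 5e−5, while recall rises from 0.40 to 0.80, the same trade-off that GRIN is designed to mitigate.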
For the vast majority of the combinations of traits and significance thresholds tested, GRIN either improved precision compared to no GWAS boosting (78.47% of all combinations) or matched it when precision was equal to 1.00 (8.33% of all combinations; Fig. 3a, Supplementary Table 2). The improvement in precision after using GRIN was especially evident for traits like smoking initiation (no boosting: 0.30–0.58; GRIN: 0.40–0.70), schizophrenia first release (SCZ1; no boosting: 0.33–0.84; GRIN: 0.38–1.00), and type 2 diabetes first release (T2D1; no boosting: 0.26–0.89; GRIN: 0.28–1.00; Fig. 3a). In contrast, no GWAS boosting had higher precision for only a few combinations of traits and thresholds (13.19% of all combinations; Fig. 3a, Supplementary Table 2). As expected, as the significance threshold in the lower-powered GWAS became less stringent, recall of the higher-powered GWAS results improved without GWAS boosting, at the expense of precision, often drastically (Fig. 3). Although the difference in recall between no GWAS boosting and GRIN became more evident at less stringent significance thresholds, the difference was notably smaller for the second release of height at p < 5e−5 (Fig. 3b, Supplementary Table 3; no GWAS boosting: 0.44 recall, 0.951 precision; GRIN: 0.37 recall, 0.954 precision).
a Using a later, higher-powered GWAS as a gold standard, GRIN generated higher precision values than GWAB and NAGA across all traits tested, except for Type 2 Diabetes (T2D1) at dramatically less stringent statistical significance thresholds. GRIN maintained higher or equivalent precision compared with relaxing the statistical significance threshold alone (No GWAS Boosting). b GRIN produces recall values comparable to those of GWAB and NAGA, but none of the methods recalls all possible signal from the later GWAS. Numbers above each point indicate the total number of genes input to GRIN at each statistical significance threshold. CAD1/CAD2: Coronary artery disease releases 1 and 2 compared to release 3 results; DrinksPerWeek1: release 1 of Drinks Per Week compared to release 2 results; HDL1/HDL2: High Density Lipoprotein cholesterol releases 1 and 2 compared to release 3 results; Height1/2: releases 1 and 2 compared to release 3 results; LDL1/2: Low Density Lipoprotein cholesterol releases 1 and 2 compared to release 3 results; SCZ1/2: Schizophrenia releases 1 and 2 compared to release 3 results; SmokingInitiation1: Smoking Initiation release 1 compared to release 2 results; T2D1: Type 2 Diabetes release 1 compared to release 2 results; TC1/2: Total Cholesterol releases 1 and 2 compared to release 3 results; TG1/TG2: Triglycerides releases 1 and 2 compared to release 3 results.
GRIN boosts precision compared to other GWAS boosting methods
We compared GRIN to two existing network biology-based GWAS boosting methods (NAGA9 and GWAB10), which work by re-ranking the p-values of genes using network topology, similarly to GRIN stage 1 (see Methods). However, unlike GRIN, neither method classifies whether a ranked gene is a true or false positive GWAS gene, so it is up to the user to make that decision based on an arbitrary rank position. For a fair comparison between methods, we set a genome-wide p-value significance threshold and retained the re-ranked genes above that threshold for NAGA and GWAB, while allowing GRIN’s stage 2 to determine its own retained gene set given the same initial threshold.
For each trait and threshold tested, we found that GRIN almost always obtained the best precision of the three methods, which reflects its focus on improving precision by discarding potential false positive genes (Fig. 3a, Supplementary Table 2). For multiple traits and thresholds, NAGA and/or GWAB obtained higher recall than GRIN, which was due to GRIN discarding some true positive genes. However, there were numerous scenarios where GRIN maintained similar or even higher levels of recall than NAGA or GWAB despite discarding genes (see coronary artery disease releases 1 and 2 [CAD1, CAD2], SCZ1, SCZ2; Fig. 3b, Supplementary Tables 2–4).
GRIN retains a majority of genes from the union of multiple suicide attempt GWAS results at less stringent significance thresholds
After benchmarking GRIN on gold standard gene sets and other GWAS results, we applied it to suicide attempt GWAS summary statistics from the Million Veteran Program (MVP15) to understand which genes contribute to this psychiatric pathophysiology. We used both conventional MAGMA4 and H-MAGMA1 solely for gene assignment from SNPs at different genome-wide significance thresholds (Fig. 4a). Only five SNPs reached the traditional threshold of genome-wide significance (p < 5e−8), and these were assigned to three genes (SPATA17, TSHZ2, and ENSG00000227705; Supplementary Tables 5 and 6). To explore additional genes contributing to suicide attempt pathophysiology, we examined SNPs at a threshold of p < 1e−5, which were assigned to 122 genes (Supplementary Tables 5 and 7). We chose this threshold because less stringent thresholds resulted in an unwieldy number of genes to interpret, and because, based on the real-world GWAS results, we anticipated that precision would still remain high. We applied GRIN to this gene set to limit the number of potential false positive genes introduced at this threshold, resulting in 65 retained genes (57 genes removed; Supplementary Fig. 4, Supplementary Table 8).
We then sought to identify which genes retained by GRIN from MVP were replicated in two independent civilian suicide attempt GWAS compiled by the International Suicide Genetics Consortium (ISGC13). This study provided GWAS summary statistics for (1) suicide attempt in European-ancestry civilians (SA-EUR) and (2) the SA-EUR population conditioned on major depressive disorder (MDD) diagnosis status (SA-MDD). At a threshold of p < 5e−8, SNPs were assigned to only seven SA-EUR genes and three SA-MDD genes (Supplementary Tables 5–6). At p < 1e−5, we assigned SNPs to 252 genes from the SA-EUR results and 62 genes from the SA-MDD results (Supplementary Tables 5 and 7), and applied GRIN to remove potential false positives. Prior to GRIN, 25 genes were in both the SA-EUR and SA-MDD sets (Supplementary Fig. 5). Following GRIN, 11 of these genes were retained from both sets of summary statistics, 8 were removed from both sets, and 6 were retained from one set but removed from the other (Supplementary Fig. 4; Supplementary Tables 9–10). Moreover, the genes retained by GRIN from the MVP and ISGC summary statistics were not simply those with the lowest p-values (Supplementary Fig. 6).
In a process similar to a meta-analysis, we applied GRIN to the union of MVP, SA-EUR, and SA-MDD genes to identify genes replicated across cohorts and unified biological mechanisms. Prior to GRIN, at p < 1e−5 one gene (PDE4B) was shared among all results, three genes were shared by only MVP and SA-EUR, and 24 genes were shared by only SA-EUR and SA-MDD (Fig. 4b). After applying GRIN to the union of all genes, 17 of the 28 genes shared by multiple summary statistics (over 50%) were retained, including PDE4B, 2 of the 3 genes shared between MVP and SA-EUR, and 14 of the 24 genes shared between SA-EUR and SA-MDD (Fig. 4b, Supplementary Table 11). Conversely, 10 genes shared between SA-EUR and SA-MDD and 1 gene shared between MVP and SA-EUR were removed at this threshold (Fig. 4b, Supplementary Table 9).
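The overlap bookkeeping behind these Venn diagrams is plain set arithmetic. The sketch below assigns each gene to the exact combination of summary statistics it appears in and counts how many shared genes a filter keeps; the gene names and the retained set are hypothetical placeholders, not the study's results.

```python
def venn_regions(gene_sets):
    """Group genes by the exact combination of named sets that contain them."""
    regions = {}
    for gene in set().union(*gene_sets.values()):
        key = tuple(sorted(name for name, genes in gene_sets.items() if gene in genes))
        regions.setdefault(key, set()).add(gene)
    return regions


def shared_genes(gene_sets):
    """Genes present in at least two of the named sets."""
    return {
        gene
        for key, genes in venn_regions(gene_sets).items()
        if len(key) >= 2
        for gene in genes
    }


# Hypothetical gene sets after SNP-to-gene assignment at one threshold.
gene_sets = {
    "MVP": {"A", "B", "X"},
    "SA_EUR": {"A", "B", "C", "D"},
    "SA_MDD": {"A", "C", "D", "E"},
}
retained = {"A", "C", "X"}  # hypothetical output of GRIN on the union

shared = shared_genes(gene_sets)
for key, genes in sorted(venn_regions(gene_sets).items()):
    print(key, sorted(genes), "retained:", sorted(genes & retained))
print(f"{len(shared & retained)} of {len(shared)} shared genes retained")  # 2 of 4
```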
a Workflow for identifying biologically interrelated genes from genome-wide association study (GWAS) summary statistics of suicide attempt from Million Veteran Program (MVP) and International Suicide Genetics Consortium (ISGC) cohorts. Figure made with BioRender.com. b Venn diagrams of GRIN retained and removed genes from the union of MVP, SA-EUR, and SA-MDD suicide attempt summary statistics from genes assigned from SNPs at p < 1e-5. Most genes common to multiple summary statistics were retained.
GRIN enhances functional gene set enrichment from MVP and ISGC suicide attempt summary statistics
To determine whether GRIN successfully retained biologically interrelated genes, we performed gene set enrichment analysis of suicide attempt GWAS genes before and after GRIN. Using a significance threshold of FDR-corrected p < 0.05, we tested for enrichment of Gene Ontology (GO16) Biological Processes, Cellular Components, and Molecular Functions; Human Phenotypes; biological pathways; transcription factor binding sites; drug targets; and diseases (Methods). When examining only the MVP summary statistics before GRIN, the 122 unfiltered genes were not significantly enriched for any GO terms but were significantly enriched for 203 combined drug and disease terms, including “substance dependence” (DisGeNET C0038580; Supplementary Table 12). After refining the gene set with GRIN, 1449 terms were significantly enriched from just 65 retained genes, including five GO molecular functions (e.g., “adenyl ribonucleotide binding”; Supplementary Table 13). Conversely, the 57 genes removed by GRIN returned zero significantly enriched terms. Thus, using GRIN to filter MVP genes obtained at a far less stringent GWAS significance threshold produced a more functionally enriched gene set by improving the signal-to-noise ratio.
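The enrichment tools themselves are described in Methods; as a generic illustration only, an over-representation analysis commonly applies a one-sided hypergeometric test to each term and then a Benjamini–Hochberg FDR correction. The sketch assumes a fixed gene universe and toy term annotations; the term names and function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes in universe, n in term, N drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(gene_set, term_to_genes, universe, alpha=0.05):
    """Terms over-represented in `gene_set` at FDR < alpha, mapped to adjusted p."""
    universe = set(universe)
    genes = set(gene_set) & universe
    terms = list(term_to_genes)
    raw = []
    for term in terms:
        members = set(term_to_genes[term]) & universe
        overlap = len(genes & members)
        raw.append(hypergeom_sf(overlap, len(universe), len(members), len(genes)))
    adjusted = benjamini_hochberg(raw)
    return {t: q for t, q in zip(terms, adjusted) if q < alpha}


universe = [f"g{i}" for i in range(20)]
terms = {
    "TERM_MODULE": universe[:6],    # hypothetical term overlapping the query
    "TERM_OTHER": universe[10:16],  # hypothetical unrelated term
}
print(enrich(universe[:5], terms, universe))  # only TERM_MODULE is returned
```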
Next, we compared gene set enrichments of the ISGC summary statistics before and after GRIN to investigate whether GRIN functionally refined these gene sets. At p < 1e−5, 243 terms were significantly enriched in the unfiltered SA-EUR gene set, including 33 transcription factor binding site terms and 114 GO terms (Supplementary Table 14). The SA-EUR genes retained by GRIN were highly interrelated, yielding 274 significantly enriched terms, including 132 GO terms and 44 transcription factor binding site terms (Supplementary Table 15), compared with only two significantly enriched terms from the removed SA-EUR genes (Supplementary Table 16). Similarly, 65 enriched terms were identified in SA-MDD before GRIN, whereas 247 enriched terms were identified in the GRIN-retained gene set, including a significant enrichment for “schizophrenia” (DisGeNET C0036341; Supplementary Tables 17–18). Conversely, only 31 enriched terms were identified in the removed gene set (Supplementary Table 19). These results indicate that the GRIN-filtered SA-EUR and SA-MDD gene sets contained highly interrelated genes relevant to neurobiological pathways.
Next, we assessed GRIN’s ability to improve gene set enrichment analysis using genes common to two or more sets of summary statistics. Prior to GRIN, there were 1324 significantly enriched terms from the 28 genes common to multiple data sets (Fig. 5, Supplementary Table 20). Notably, 1443 significantly enriched terms were obtained from the 17 genes commonly retained by GRIN in multiple data sets, including 126 GO terms such as “dopaminergic synapse” (Fig. 5, Supplementary Table 21). Conversely, only 358 terms were significantly enriched from the 11 intersecting genes removed by GRIN, indicating lower interrelatedness among these genes compared to the retained set (Supplementary Table 22). In addition to the 1268 terms that remained significantly enriched, 175 terms were newly enriched in the filtered gene set, indicating that the retained genes constituted a more biologically cohesive set (Fig. 5, Supplementary Tables 20–21). The number of enrichments from GRIN-retained genes far exceeded the median number of significantly enriched terms obtained when retaining random genes of an equivalent gene set size (Supplementary Fig. 7).
Using genes intersecting in two or more GWAS summary statistics (28 genes; shaded blue region of Venn diagram) prior to GRIN resulted in 1324 enrichments. Intersecting genes retained by GRIN (17 genes) resulted in 1443 enrichments, 1268 of which were also enriched prior to GRIN. Green: enrichments using intersecting genes prior to GRIN; yellow: enrichments using intersecting genes from GRIN-retained genes; purple: enrichments common to unfiltered and GRIN-retained genes.
Similarly, when applying GRIN to each set of summary statistics separately, 13 intersecting genes were commonly retained and 9 intersecting genes were commonly removed (Supplementary Fig. 8, Supplementary Table 23). This resulted in more numerous gene set enrichments using intersecting GRIN-retained genes compared to the unfiltered gene set or GRIN-removed genes (Supplementary Fig. 9, Supplementary Tables 24–25).
Retained genes from GRIN identify putative pathophysiological pathways involved in suicide attempt
Using genes retained by GRIN, we identified biological pathways implicated by suicide attempt GWAS (Fig. 6). Only one of the three genes identified from the MVP GWAS at p < 5e−8 (TSHZ2) was in the set of 65 genes retained by GRIN at p < 1e−5. Several genes identified in multiple cohorts were relevant to dopaminergic signaling, including the dopamine D2 receptor (DRD2) and phosphodiesterase 4B (PDE4B), as well as a protein kinase A subunit (PRKAR2A) from MVP, which can subsequently modulate cAMP/CREB-mediated transcription of genes important for synaptic plasticity17. Additionally, SGIP1, which affects presynaptic vesicle release and emotional state18,19, was retained by GRIN in the MVP and SA-EUR summary statistics. Furthermore, genes involved in neurotransmitter release (BRSK1) and glutamatergic synapses (CELSR320) were identified, along with ICE2, which is induced by NMDA receptor activity21,22. GRIN also retained NCAM1, a crucial mediator of synaptic plasticity and memory processes23,24. Multiple genes involved in cytoskeletal reorganization were also identified, including CDC42BPB, MAP4, and MARK3. Moreover, TSHZ2, SMARCC1, and ZNF589 were retained by GRIN and have been implicated in neurodevelopmental processes, while RCOR1 is important for neural progenitor differentiation into neuronal and glial subtypes17,25,26,27,28. Finally, a number of genes involved in global translation processes were identified (DALRD3, DHX30, and EIF5), two of which have been previously implicated in neurodevelopmental disorders arising from missense variants29,30 (Fig. 6).
GRIN retained genes from MVP suicide attempt GWAS (cyan), MVP and ISGC general population GWAS (orange), and MVP and both ISGC suicide attempt summary statistics (red). A number of genes related to dopaminergic signaling (DRD2, PRKAR2A, and PDE4B) were identified, as well as presynaptic vesicle release (BRSK1, SGIP1), glutamatergic synapse formation (CELSR3), and synaptic plasticity (NCAM1). Genes related to cytoskeletal reorganization (CDC42BPB, MAP4, and MARK3) were also implicated in suicide attempt GWAS. Genes related to chromatin reorganization (SMARCC1), cortical development (TSHZ2, ZNF589), cell lineage (RCOR1), and translation processes (DALRD3, DHX30, and EIF5) were also retained by GRIN. Together, GRIN retained genes from multiple coincident biological processes underlying suicide pathophysiology. Figure made with BioRender.com.
Drug candidates identified as suicide attempt GWAS gene targets
Finally, we identified drugs that may modulate suicidal behavior based on GWAS-implicated genes retained by GRIN. Multiple drugs target the dopamine D2 receptor, including the FDA-approved drugs clozapine (used to prevent suicidal behavior in schizoaffective individuals31) and amisulpride (Fig. 7, Supplementary Table 26). Roflumilast and a number of other compounds also directly affect PDE4B. Furthermore, fostamatinib targets 8 genes implicated in both MVP and SA-EUR suicide attempt GWAS. These drug-gene target links warrant future studies to ensure that they do not present increased risk for suicidality as a side effect, and to evaluate candidates for drug repurposing for suicide prevention.
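Identifying candidate drugs from a retained gene set reduces to inverting a drug-to-target table and ranking drugs by how many retained genes they hit. The toy table below contains only drug-gene pairs named in this section and in the Fig. 7 legend; it is not a DrugBank extract, and the function name is hypothetical.

```python
def drugs_for_genes(gene_set, drug_targets):
    """Map each drug to the genes of `gene_set` it targets, most targets first."""
    genes = set(gene_set)
    hits = {
        drug: sorted(targets & genes)
        for drug, targets in drug_targets.items()
        if targets & genes
    }
    return dict(sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0])))


# Toy drug-target table using only pairs named in the text (not a DrugBank extract).
drug_targets = {
    "clozapine": {"DRD2"},
    "amisulpride": {"DRD2"},
    "roflumilast": {"PDE4B"},
    "theophylline": {"NOMO1"},
}
retained = {"DRD2", "PDE4B"}  # NOMO1 was removed by GRIN in the text
print(drugs_for_genes(retained, drug_targets))
# {'amisulpride': ['DRD2'], 'clozapine': ['DRD2'], 'roflumilast': ['PDE4B']}
```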
Network diagram of DrugBank drugs (featured drugs green, additional drugs gray) and drug target genes from genes retained by GRIN applied to MVP and ISGC suicide attempt summary statistics at p < 1e−5 threshold. The dopamine D2 receptor (DRD2), phosphodiesterase 4B (PDE4B), and estrogen receptor 1 (ESR1) are targeted by many drugs, while fostamatinib targets 8 genes implicated by suicide attempt GWAS followed by GRIN. Only a single gene removed by GRIN was a known drug target (NOMO1 removed from ISGC SA-MDD, targeted by theophylline). Magenta: genes implicated in MVP and both SA-EUR and SA-MDD ISGC summary statistics; cyan: genes implicated in MVP only; orange: genes implicated in both SA-EUR and SA-MDD; brown: genes implicated in SA-EUR only; yellow: genes implicated in SA-MDD only.
Discussion
Here, we introduced GRIN, a software tool that uses networks of biological relationships to enable the relaxing of GWAS thresholds while reducing the impact of false positives. GWAS are subject to statistical challenges that have historically made it difficult to identify a large proportion of trait-relevant genes. Genes that may contribute to disease have traditionally been identified from SNP associations at a genome-wide significance threshold of p < 5e−8, but for complex traits controlled by many small-effect loci, GWAS often fail to capture a comprehensive signal, even though much of the SNP-based heritability lies below this threshold. Relaxing the threshold gives access to more SNPs and genes associated with the trait, at the risk of introducing an increasing proportion of false positives that will confound downstream analyses. GRIN therefore operates as a filter, separating likely true positives from likely false positives according to their connectivity in biological networks.
The first stage of GRIN requires a representation of known relationships between all genes in a network format. Here we used a biological multiplex network which captures a wide variety of relationship types across its 10 layers, as GRIN’s capacity to accurately refine gene sets is contingent on the connectivity represented in the network. A network lacking sufficient biological relationships would result in reduced ability to distinguish between functionally related genes and random noise. We therefore included various experimental data sources and generated a frontal cortex-specific predictive gene expression network using an explainable-AI methodology (iRF-LOOP32), which can define relationships not present in the literature. Notably, we have tested GRIN on both a conventional laptop and a high-performance computing cluster and it is computationally tractable for gene sets containing hundreds of genes (Supplementary Table 27).
We demonstrated that GRIN works with simulated noisy gene sets similar to those obtained from a GWAS. GRIN successfully partitioned curated gene sets spiked with random genes into signal and noise subsets, even when given multiple functional groups or a high noise ratio. The results confirm that RWR ranks the functional groups of genes highly, while random genes mostly receive poor rankings. When GRIN is applied to real-world data, its second stage determines the cutoff point that divides functional genes from false positive genes. The strong simulation results provide confidence that, when GRIN is applied to GWAS results, the true positive genes should rise to the top of the rankings as long as they are more functionally related to each other than random genes are, yielding a retained set with a higher signal-to-noise ratio than the unfiltered gene set. However, because precision decreases as the proportion of noise genes increases, it is important to consider that when many genes can be assigned from a single GWAS locus, precision may decrease even when GRIN is applied.
We also validated GRIN’s performance on real-world GWAS results. First, we defined gold standard genes as genes reaching genome-wide significance in a higher-powered GWAS, and then measured precision and recall when applying GRIN to genes identified from a lower-powered GWAS of the same trait at less stringent significance thresholds. We chose this definition despite the challenge of defining a “gold standard” for GWAS, as other groups have previously noted33. GRIN yielded equivalent or higher precision compared to simply lowering the significance threshold. This indicates that GRIN’s use of network topology homes in on biologically relevant genes, improving precision even as more false positives are introduced at less stringent significance thresholds. Moreover, while recall among GRIN-retained genes was lower than without GWAS boosting, this difference was often minimal except at low-stringency thresholds (e.g., p < 1e−3); the notable exception was the second release of human height results (recall of 0.37 with GRIN vs. 0.44 without boosting at p < 5e−5). We conclude that GRIN may be particularly useful for identifying additional trait-relevant genes from lower-powered GWAS.
GRIN was compared to two other GWAS boosting methods to evaluate its performance: GWAB10 and NAGA9. Both GWAB and NAGA are designed to boost the recall of GWAS results (limiting the number of false negative genes) by ranking genes via network topology, whereas GRIN is specifically designed to boost precision while maintaining or increasing recall. Compared to GWAB and NAGA, GRIN exhibited higher precision for nearly every trait at every significance threshold, with the exception of GWAB applied to type 2 diabetes at low-stringency thresholds. Furthermore, GRIN does not share some of the limitations of GWAB and NAGA. GWAB has two particular limitations: (1) users are limited in their choice of networks to apply to GWAS results; and (2) a list of a priori disease-relevant genes is required as input. Consequently, we were not able to generate GWAB results for height due to a lack of such gold-standard genes. While NAGA permits users to provide their own networks, only monoplex networks can be used, which have previously been shown to perform worse than multiplex networks34. NAGA also ranks all genes in the network, irrespective of whether they were implicated by GWAS, and therefore does not provide an optimal statistical threshold for GWAS results. In this study, we attempted a fair comparison between GRIN, GWAB, and NAGA by evaluating the precision and recall of the top-ranked genes from GWAB and NAGA, taking the same number of genes as were input to GRIN at each significance threshold tested. In practice, however, GRIN allows users to boost the precision of GWAS results at less stringent significance thresholds without having to decide what proportion of top-ranked genes are trait-relevant, and without being constrained to a predetermined gene set.
While SNP-to-gene assignment is an important aspect of interpreting GWAS results, assigning genes from SNPs is not a step within the GRIN software. When comparing GRIN to other methods, we chose to use a similar method of SNP-to-gene assignment used by NAGA and GWAB to make a fair comparison across all methods, as SNP-to-gene assignment is part of the workflow when using these two methods. However, compared to these alternate methods, users have the flexibility to choose one or more methods of assigning SNPs to genes when using GRIN. As SNPs may be incorrectly assigned to genes, particularly for intergenic SNPs, GRIN may be used to help remove potential false positive genes based on the underlying premise that genes causal to a trait will be more highly interconnected within networks representing known biological relationships. We, therefore, used multiple SNP-to-gene assignment methods when interpreting suicide attempt summary statistics in order to refine the genes most likely to be associated with this trait based on multiple possible genes assigned from SNPs: conventional MAGMA as a means of assigning SNPs to nearby genes, and H-MAGMA with dorsolateral prefrontal cortex Hi-C data to leverage 3D chromatin architecture from this brain region relevant to suicide attempt. This approach allows the user the flexibility of using their preferred method(s) of SNP-to-gene assignment while leveraging GRIN’s ability to reduce false positives identified by GWAS.
While we demonstrated that GRIN achieves high accuracy using gold standard gene sets, GRIN sometimes discarded true positive genes. This indicates that not all genes removed by GRIN are necessarily irrelevant to the trait or disease, and the removed set should still be considered, but with lower confidence than the retained set. It is also important to consider that false positives and false negatives could be reclassified as new experimental data sources become available. For example, some genes removed by GRIN from the suicide attempt summary statistics currently have few experimental gene-gene network relationships (e.g., the non-coding RNA RP11-839D17.3). However, future experiments may identify their capacity to modulate transcriptional or post-transcriptional processes with pathophysiological implications. Therefore, GRIN output should be considered guidance rather than a definitive determination of what is a true or false positive. To limit bias against genes with little existing evidence, we included a network that we constructed using explainable artificial intelligence from dorsolateral prefrontal cortex RNA-seq data as an additional data-driven source of gene-gene relationships (see Methods).
After applying GRIN to expanded suicide attempt GWAS results at p < 1e−5, we obtained more gene set enrichments in the retained set than in the original unfiltered gene set, which we attribute to the removal of false positives. The scarcity of enrichments in the removed gene set supports this interpretation. By separating a gene list into retained and removed subsets, users can identify additional biologically relevant pathways that enrichment analyses of the whole set may miss because of dilution by noise.
Combined with multiple SNP-to-gene assignment approaches, incorporating SNPs at a less stringent significance threshold and applying GRIN revealed additional suicide-associated genes and pathways. While certain variants have been previously described (e.g., variants in DRD2, PDE4B, and SPATA17)12,13,15, the present study characterizes additional genes (CELSR3, PRKAR2A) contributing to dendritic structure and to multiple key neurotransmitter pathways associated with suicidality, including the previously associated dopaminergic pathway35,36. Among these additional genes, missense variants in CDC42BPB37, DALRD330, DHX3029, SMARCC126, and ZNF58938 are known to impair behavioral and neurodevelopmental processes. While the genetic variants in the present study did not include these missense or loss-of-function variants, it is possible that the variants implicated in the present suicide attempt summary statistics alter the transcriptional regulation of these genes. In addition, multiple genes were implicated in cytoskeletal reorganization. CDC42BPB encodes MRCKβ, a protein kinase that is induced by long-term potentiation in rodent models and mediates dendritic spinogenesis through actin-myosin filament phosphorylation39,40. MAP4 is a microtubule-associated protein (MAP), and MARK3 has been shown to phosphorylate tau (MAPT), another MAP that accumulates in multiple neurodegenerative disorders41,42. Moreover, RCOR1 is a subunit of the REST/CoREST complex and has been shown to affect CELSR3 and SMARCC1 transcription in mouse models20,26, and SMARCC1, a core component of the SWI/SNF complex, has been implicated in autism27,43. These findings suggest that these genes may act pleiotropically across multiple psychiatric disorders.
By lowering the significance threshold and applying GRIN to refine suicide attempt GWAS gene sets, we identified previously characterized drug targets (DRD2, MARK3, and PDE4B) and candidates for drug repurposing or side-effect monitoring. These include genes that were retained by GRIN but would not have been detected without lowering the genome-wide significance threshold. Notably, the DRD2 antagonist clozapine is the only FDA-approved drug with an on-label indication to prevent suicidal behavior31. Amisulpride is also a DRD2 antagonist and has been shown to exhibit antipsychotic and antidepressant activity44. Intriguingly, fostamatinib targets 8 genes implicated by suicide attempt GWAS, including MARK3 and PDE5B, a different phosphodiesterase from the PDE4B gene implicated in suicide attempt GWAS45. It is also important to understand whether drugs can have adverse side effects that modulate suicidal behavior. For example, the PDE4B inhibitor roflumilast has a rare adverse side effect of increased suicidality in some individuals46. Further studies are warranted to understand how pharmacological manipulation of these GWAS-implicated drug targets affects the propensity for suicidal behavior in at-risk individuals.
GRIN is a powerful tool for identifying biologically interrelated genes and for identifying true positive variants and associated genes from GWAS. In effect, GRIN facilitates post-GWAS investigation by synthesizing multiple lines of evidence to determine which genes should be investigated further. By applying this tool to multiple GWAS results, we identify new genes involved in suicide pathophysiology that may lead to important clinical insights.
Methods
Multiplex biological network generation
The function of GRIN is contingent on the user supplying networks that represent known gene-gene biological relationships. In order to capture these relationships from diverse types of biological evidence, a multiplex network was assembled from weighted network connections (edges) from a combination of publicly available and newly generated monoplex (single layer) networks. A multiplex network has an advantage over aggregate multilayer networks in that the unique topology of each layer is maintained, resulting in generally higher functional predictive ability34. Multiple component networks from HumanNet v247 were used (co-functional links by co-citation, co-essentiality48, co-expression, molecular pathway databases, gene neighborhood, phylogenetic profile associations, and orthologous protein-protein interactions transferred from model organisms [CC, CE, CX, DB, GN, PG, IL]), and a protein-protein interaction (PPI) network was generated by merging the following networks into a single monoplex layer: HumanNet v2 component PPI networks (HT, LC), and high-confidence physical protein-protein interactions from STRING version 11.049 (taxa = 9606, protein.actions.v11.0, mode = binding, min score = 700).
As the dorsolateral prefrontal cortex (dlPFC) is a key brain region involved in processes disrupted in individuals with a history of suicide attempt (e.g., deficits in executive function and impulsivity50,51), we included multiple dlPFC-specific networks to gain a tissue-specific perspective on gene-gene relationships in this brain region. This included a dlPFC-specific transcription factor-gene network layer from a previously published transcription factor binding site network52. A newly generated dlPFC (Brodmann area 9) Predictive Expression Network (PEN) was obtained using the Iterative Random Forest - Leave One Out Prediction (iRF-LOOP) method32,53 with individual-level RNA-seq expression data from the Genotype-Tissue Expression (GTEx) project54. The resulting multiplex network was built using RWRtoolkit55 (https://github.com/dkainer/RWRtoolkit), which provides command-line scripts and an R library for generating multiplex networks and running the network exploration algorithm random walk with restart (RWR), building upon the RandomWalkRestartMH R package34. The multiplex network used for all analyses comprises 10 layers, 51,183 unique genes, and 3,419,975 edges and was built with δ = 0.5, where δ sets the probability of the random walker jumping between layers versus moving within its current layer. This network is publicly available at https://github.com/sullivanka/GRIN/tree/main/test/suicide_weighted_Multiplex_0.5Delta.RData.
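To make the role of δ concrete, the following is a minimal sketch of a column-normalized supra-adjacency matrix for a multiplex network. It assumes a RandomWalkRestartMH-style formulation in which intra-layer blocks are scaled by 1 − δ and each gene is linked to its own copies in the other layers with weight δ/(L − 1); the function name and the list-of-lists representation are hypothetical, and RWRtoolkit’s actual construction may differ in details such as how genes absent from a layer are handled.

```python
def supra_adjacency(layers, delta):
    """Column-normalized supra-adjacency matrix for a multiplex (illustrative sketch).

    layers: list of L square adjacency matrices (lists of lists), one per layer,
            all indexed by the same ordered list of N genes.
    delta:  share of each gene's outgoing probability reserved for inter-layer jumps.
    """
    n_layers = len(layers)
    n_genes = len(layers[0])
    size = n_layers * n_genes
    m = [[0.0] * size for _ in range(size)]
    for a in range(n_layers):
        for b in range(n_layers):
            for i in range(n_genes):
                for j in range(n_genes):
                    if a == b:
                        # intra-layer block: edge weights scaled by (1 - delta)
                        scale = (1 - delta) if n_layers > 1 else 1.0
                        w = scale * layers[a][i][j]
                    else:
                        # inter-layer block: each gene is linked only to its own copy
                        w = delta / (n_layers - 1) if i == j else 0.0
                    m[a * n_genes + i][b * n_genes + j] = w
    # normalize columns so each column is a probability distribution over targets
    for col in range(size):
        total = sum(m[row][col] for row in range(size))
        if total > 0:
            for row in range(size):
                m[row][col] /= total
    return m
```

Because normalization happens after the blocks are assembled, a gene with no edges in one layer sends all of its probability to its copies in other layers rather than becoming a dead end.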
GRIN process
GRIN leverages the hypothesis that false positive genes in a user’s gene set (such as genes assigned from GWAS SNPs) are likely to be functionally random with respect to the rest of the gene set, while true positive genes are likely to share function with other members of the gene set. GRIN is a classifier that uses information captured in biological networks from diverse lines of evidence to determine which genes in a gene set are functionally related to each other (and therefore belong together) and which appear to be randomly included and are likely false positives. GRIN achieves this by first scoring every gene in the network (including those in the user’s gene set) according to how topologically accessible it is from each gene in the user’s gene set, as determined by the network propagation algorithm random walk with restart (RWR). GRIN then classifies the genes in the user’s gene set as true or false positives based on their RWR rankings. To learn what the rankings of false positive genes should look like under the assumption that they are random, GRIN runs the same RWR process for 100 random gene sets of the same size as the user’s gene set, building a null ranking distribution. The ordered gene rankings for the user’s gene set are compared to the ordered null rankings to find the position at which the distribution of rankings in the user’s gene set no longer diverges significantly from the null distribution. In this way, GRIN partitions the user’s gene set into a Retained gene set and a Removed gene set in a two-stage process.
In Stage 1, every gene in the network is ranked according to how connected it is to the genes in the user-specified gene set (e.g., GWAS-derived genes). The user-specified genes themselves are ranked using leave-one-out cross-validation (LOOCV). RWR assigns each gene a rank that serves as a proxy for how easily it can be reached from the starting set of GWAS genes. Genes with many paths and interactions to one or more of the GWAS genes rank highly, while genes that are isolated from or distant to the GWAS genes rank poorly. In the current implementation, this ranking is based on network propagation of the probability of visiting each gene in the multiplex network, using a matrix representation of the edge weights between genes (i.e., the supra-adjacency matrix composed of all intra- and inter-layer connections). Starting from the seed genes, the visiting probabilities are propagated iteratively through this matrix until they change by less than a threshold (1e−10) between successive iterations; the converged probabilities correspond to the steady state of an arbitrarily long random walk. The advantage of a propagation algorithm like RWR is that genes that are not direct neighbors of GWAS genes may still rank highly through indirect paths. Additional parameters can be used to tune RWR to favor certain network layers (τ) or to adjust the probability of restarting at the seed genes (r). In all analyses in the present study, we used r = 0.7, equal τ values for all network layers, and a multiplex network with δ = 0.5, based on previous work that achieved good performance with these parameters34.
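The propagation step can be sketched as the standard RWR fixed-point iteration p(t+1) = (1 − r)·M·p(t) + r·p0, where M is a column-normalized adjacency matrix and p0 spreads the restart mass evenly over the seed genes. This is a minimal sketch on a single toy layer rather than the full supra-adjacency matrix; the L1 convergence criterion and the function names are assumptions for illustration, not GRIN’s or RWRtoolkit’s API.

```python
def rwr(m, seeds, r=0.7, tol=1e-10, max_iter=10_000):
    """Random walk with restart on a column-normalized matrix m (sketch).

    seeds: set of node indices; the restart vector spreads mass evenly over them.
    r:     restart probability (0.7 in the analyses described above).
    tol:   convergence threshold on the L1 change between iterations (assumed norm).
    """
    n = len(m)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # walk one step along weighted edges with prob (1 - r), else restart at a seed
        p_next = [(1 - r) * sum(m[i][j] * p[j] for j in range(n)) + r * p0[i]
                  for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


def rank_non_seeds(p, seeds):
    """Rank non-seed nodes by visiting probability (1 = most reachable)."""
    order = sorted((i for i in range(len(p)) if i not in seeds), key=lambda i: -p[i])
    return {node: rank for rank, node in enumerate(order, start=1)}
```

On a path graph 0-1-2-3 seeded at node 0, the ranks follow graph distance (node 1, then 2, then 3), illustrating how indirect neighbors still receive nonzero scores.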
To obtain accurate rankings for each gene in a gene set of size n, we implemented random walk with restart leave-one-out cross-validation (RWR-LOOCV), run n times: in each run, one gene is left out and the other n-1 genes are used as seed genes (starting points) for the random walker in the multiplex network. Each run of RWR-LOOCV generates a ranking of every non-seed gene in the multiplex, including the gene left out of the original seed set, so that each gene in the user’s gene set obtains n-1 rank values after n runs. Stage 1 then orders the genes in the set from best to worst according to their median rank values. GRIN also needs a representation of what Stage 1 results should look like for purely random gene sets of size n. This empirical null distribution is generated by running RWR-LOOCV for 100 gene sets, each containing n genes randomly sampled from the multiplex. The median rank at each position in the order from 1 to n thus represents the empirical null distribution of ranks for this specific multiplex and gene set size.
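The leave-one-out ranking and the empirical null profile can be sketched as follows, assuming a generic `score_fn(seeds)` that returns a score per gene (in GRIN this role is played by an RWR run). This is a simplified sketch: it records, for each gene, its rank in the run where it was held out, and does not reproduce how GRIN aggregates multiple rank values into the median described above. All function names and the pessimistic tie-breaking rule are hypothetical.

```python
import random
from statistics import median


def loocv_ranks(gene_set, all_genes, score_fn):
    """Rank of each held-out gene among all non-seed genes (1 = best).

    score_fn(seeds) -> dict gene -> score stands in for an RWR run seeded with
    `seeds`; ties are broken pessimistically (tied genes count as ranked ahead).
    """
    ranks = {}
    for left_out in gene_set:
        seeds = set(gene_set) - {left_out}
        scores = score_fn(seeds)
        target = scores.get(left_out, 0.0)
        candidates = [g for g in all_genes if g not in seeds]
        ranks[left_out] = sum(1 for g in candidates if scores.get(g, 0.0) >= target)
    return ranks


def ordered_ranks(gene_set, all_genes, score_fn):
    """Stage 1 profile: held-out ranks sorted from best to worst."""
    return sorted(loocv_ranks(gene_set, all_genes, score_fn).values())


def null_profile(all_genes, n, score_fn, n_sets=100, seed=0):
    """Median rank at each position 1..n across random gene sets of size n."""
    rng = random.Random(seed)
    profiles = [ordered_ranks(rng.sample(all_genes, n), all_genes, score_fn)
                for _ in range(n_sets)]
    return [median(profile[k] for profile in profiles) for k in range(n)]
```

With a toy scorer that counts seed neighbors, three mutually connected genes each rank first when held out, while an unconnected gene falls to the bottom of the candidate list.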
In Stage 2, a cutoff C between 1 and n is determined, below which all gene set members are considered equivalent to random and can be discarded. A two-sided Mann–Whitney U test from the R stats base package (“wilcox.test”) is performed over a sliding window of size winsize = 0.15 × n to test whether the RWR-LOOCV ranks of the gene set members come from the same distribution as the null distribution RWR-LOOCV ranks. The expectation is that a gene set window containing functional groups of genes will have a ranking distribution very different from that of the random genes in the equivalent null window, resulting in very small (significant) p-values. Conversely, if the window contains genes with little functional relatedness, the ranking distribution will appear to be drawn from the null distribution and the p-value tends towards 1. The test is repeated as the window slides by one position at a time, producing a p-value vector of length n-winsize. The cutoff C is chosen by finding an elbow in the p-values using the open-source R package “Knee Arrower” with the method = “first” parameter (https://github.com/agentlans/KneeArrower). The output is a Retained gene set and a Removed gene set.
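A minimal sketch of the Stage 2 logic is shown below. It computes the two-sided Mann–Whitney U p-value with the normal approximation (unlike R’s wilcox.test, it applies no tie or continuity correction and never uses exact p-values), evaluates every full-length window, and locates the elbow with a simple chord heuristic that is swapped in for KneeArrower’s method = “first”. All function names are hypothetical.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Uses average ranks for ties but no tie or continuity correction; R's
    wilcox.test may instead compute exact p-values for small samples.
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied block gets its average 1-based rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def window_pvalues(observed, null, frac=0.15):
    """Compare observed vs null ordered ranks in every full sliding window."""
    w = max(2, round(frac * len(observed)))
    return [mann_whitney_p(observed[s:s + w], null[s:s + w])
            for s in range(len(observed) - w + 1)]


def elbow_index(pvals):
    """Window index lying farthest below the chord joining the first and last
    p-values (a chord heuristic, not KneeArrower's method = "first")."""
    m = len(pvals)
    if m < 3:
        return 0
    y0, y1 = pvals[0], pvals[-1]
    gaps = [y0 + (y1 - y0) * i / (m - 1) - v for i, v in enumerate(pvals)]
    return max(range(m), key=gaps.__getitem__)
```

For a gene set whose first half ranks far better than the null and whose second half matches it, the windows are significant at first, rise toward p = 1 at the boundary, and the elbow falls at the last fully significant window.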
Validation of GRIN using well-characterized gene sets
To determine the ability of GRIN to effectively remove noise genes from a gene set, we obtained a variety of well-characterized, biologically interrelated gene sets (“gold” sets) and spiked them with random genes drawn from the full multiplex network. Given our application of this method to suicide attempt GWAS summary statistics, we chose 20 gene sets related to diverse brain functions. We included an additional 10 gene sets related to other organ systems (lung and kidney) to demonstrate that GRIN can be used in other biological contexts. These thirty “gold standard” gene sets of functionally interrelated genes (see Supplementary Table 1), ranging in size from 10 to 225 genes, were derived from the following sources: Gene Ontology (GO)16; Online Mendelian Inheritance in Man (OMIM)56; and DisGeNET57. Random genes were added to the full list of genes in each gold set to create gene sets with a 1:1 signal-to-noise ratio (i.e., Ngold : Nrandom). For each of the 30 gold sets, we generated 100 test gene sets using different samples of random genes. GRIN was then used to filter out random genes from each test gene set, and the effectiveness of the filter was evaluated using receiver operating characteristic (ROC) and precision-recall (PR) curves measured at every possible cutoff point, C, in each rank-ordered gene set.
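The spike-in design can be sketched as below: each test set contains every gold gene plus a fresh random draw of network genes, with the noise-to-gold ratio as a parameter (1.0 for the 1:1 sets; 4 for a 1:4 gold:noise set). This is a minimal sketch under the assumption that noise genes are drawn only from network genes outside the gold set; the function names are hypothetical.

```python
import random


def spike_in(gold, network_genes, noise_ratio, rng):
    """One test set: every gold gene plus round(len(gold) * noise_ratio) random
    network genes, labeled True (gold) or False (noise)."""
    pool = sorted(set(network_genes) - set(gold))  # assumption: noise excludes gold genes
    noise = rng.sample(pool, round(len(gold) * noise_ratio))
    labels = {g: True for g in gold}
    labels.update({g: False for g in noise})
    return labels


def make_test_sets(gold, network_genes, noise_ratio=1.0, n_sets=100, seed=0):
    """n_sets spiked test sets, each with a different random draw of noise genes."""
    rng = random.Random(seed)
    return [spike_in(gold, network_genes, noise_ratio, rng) for _ in range(n_sets)]
```

Keeping the labels alongside the genes makes it straightforward to score any cutoff GRIN later chooses.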
For evaluation purposes, “true positive” genes were defined as genes belonging to a gold gene set that were correctly retained by GRIN; “true negative” genes as randomly added genes that were correctly removed by GRIN; “false positive” genes as randomly added genes that were incorrectly retained by GRIN; and “false negative” genes as gold genes that were incorrectly removed by GRIN. ROC curves (false positive rate vs. true positive rate) and PR curves (precision vs. recall) were generated, and the area under the ROC curve (AUROC) and the area under the PR curve (AUPRC) were calculated for each test gene set. Median AUROC and AUPRC were calculated for each of the 30 gold standard gene sets to indicate whether Stage 1 of GRIN generally ranked gold genes more highly than random genes. After estimating the optimal cutoff C in Stage 2, precision, recall, and specificity (true negatives / (true negatives + false positives)) were calculated for the genes removed and the genes retained by Stage 2. Median precision, recall, and specificity were calculated across the 100 test gene sets for each of the 30 gold standard gene sets. Values are presented as median ± interquartile range (IQR).
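The evaluation metrics defined above can be sketched as follows, with AUROC computed through its rank interpretation (the fraction of gold/noise pairs in which the gold gene is ranked ahead). This is an illustrative sketch with hypothetical function names, not the code used in the study; it assumes a strict best-to-worst ordering with no ties.

```python
def retained_metrics(labels, retained):
    """Precision, recall, and specificity of a retained set.

    labels: dict gene -> True (gold) / False (random noise).
    """
    tp = sum(1 for g, gold in labels.items() if gold and g in retained)
    fp = sum(1 for g, gold in labels.items() if not gold and g in retained)
    fn = sum(1 for g, gold in labels.items() if gold and g not in retained)
    tn = sum(1 for g, gold in labels.items() if not gold and g not in retained)
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def auroc(labels, ordered):
    """AUROC of a best-to-worst ordering: the fraction of (gold, noise) pairs
    in which the gold gene is ranked ahead of the noise gene."""
    gold_seen = 0
    gold_ahead = 0
    for gene in ordered:
        if labels[gene]:
            gold_seen += 1
        else:
            gold_ahead += gold_seen  # every gold gene seen so far outranks this one
    n_gold = sum(labels.values())
    return gold_ahead / (n_gold * (len(labels) - n_gold))
```

For two gold genes (a, b) and two noise genes (c, d) ordered a, c, b, d, three of the four gold/noise pairs are correctly ordered, giving an AUROC of 0.75.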
GRIN was also tested on unequal ratios of gold standard genes to random genes using the dopaminergic synaptic signaling gene set from GO (GO:0001963) and the Acute Kidney Failure gene set from DisGeNET (C0022660), at ratios of 2:1, 1:2, 1:4, and 1:10 gold genes to random genes. For each ratio, 100 test sets were generated. Finally, to test whether GRIN could remove random genes from gene sets containing multiple groups of genes belonging to functionally distinct biological processes, multiple gold gene sets were combined and random noise was added. Dopaminergic synaptic transmission (GO:0001963; 23 genes), central nervous system myelination (GO:0022010; 20 genes), Alzheimer’s disease (OMIM #: 607822, 104300, 606889, 608907, 602192, 615590; 12 genes), and major depressive disorder (DisGeNET gene-disease association score ≥ 0.5; 14 genes) were combined with random genes at five ratios of gold standard genes to noise: 2:1, 1:1, 1:2, 1:4, and 1:10. This process was repeated to generate 100 gene sets of gold standard and random genes for each ratio examined.
Evaluating GRIN with low- and high-powered GWAS results
We evaluated GRIN’s ability to control for the loss of precision when a lower-powered GWAS with relaxed significance thresholds is used to detect genes that were genome-wide significant in a higher-powered GWAS. To do this, we obtained published GWAS summary statistics from multiple studies of the same trait and defined the highest-powered study (the one with the largest sample size) as the gold GWAS for that trait. To fairly compare GRIN to GWAB and NAGA, SNP-to-gene mapping was performed in the same way for each GWAS, using the method used by GWAB and NAGA: SNPs were assigned to protein-coding genes within a +/− 10 kbp window, with the SNP with the lowest p-value assigned to the nearest gene (or to multiple genes in this window if the SNP was intergenic). Genes identified in the gold GWAS at a stringent threshold of p < 5e−8 were labeled as true positives for the trait. We then ran GRIN on the gene sets from the lower-powered studies at progressively more relaxed thresholds, from p < 5e−8 down to p < 1e−3 in steps of half an order of magnitude, and measured the precision and recall of the GRIN-retained set of genes at each threshold. Using the higher-powered GWAS as the gold standard, we also calculated precision and recall for all genes assigned from the lower-powered GWAS (No GWAS Boosting) and compared these to the values for GRIN-retained genes at each significance threshold. We did this for 10 human traits or diseases: coronary artery disease (CAD1-3)58,59,60; number of alcohol-containing drinks consumed per week (DrinksPerWeek1-2)61,62; HDL cholesterol (HDL1-3)63,64,65; height (Height1-3)66,67,68; LDL cholesterol (LDL1-3)63,64,65; schizophrenia (SCZ1-3)69,70,71; smoking initiation (SmokingInitiation1-2)61,62; type 2 diabetes (T2D1-2)72,73; total cholesterol (TC1-3)63,64,65; and total triglycerides (TG1-3)63,64,65.
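The threshold sweep and the No GWAS Boosting baseline can be sketched as follows. This is a simplified sketch, assuming each gene takes the best p-value of any SNP within its body ± 10 kbp (the assignment NAGA is described as using) and that the half-order-of-magnitude steps alternate 5 and 1 mantissas; it does not implement the nearest-gene rule for intergenic SNPs, and the gene coordinates and function names are hypothetical.

```python
WINDOW_BP = 10_000  # +/- 10 kbp flank around each protein-coding gene
# assumed half-order-of-magnitude steps from p < 5e-8 down to p < 1e-3
THRESHOLDS = [5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3]


def gene_pvalues(snps, genes, window=WINDOW_BP):
    """Best SNP p-value within each gene +/- window.

    snps:  list of (chrom, position, p_value)
    genes: dict gene -> (chrom, start, end)
    """
    best = {}
    for gene, (chrom, start, end) in genes.items():
        ps = [p for c, pos, p in snps if c == chrom and start - window <= pos <= end + window]
        if ps:
            best[gene] = min(ps)
    return best


def genes_at(gene_p, threshold):
    """Genes whose best SNP passes the significance threshold."""
    return {g for g, p in gene_p.items() if p < threshold}


def precision_recall(candidates, gold):
    """Precision and recall of a candidate gene set against gold GWAS genes."""
    tp = len(candidates & gold)
    return (tp / len(candidates) if candidates else float("nan"),
            tp / len(gold) if gold else float("nan"))
```

Looping over `THRESHOLDS` and scoring `genes_at(...)` against the gold genes gives the unboosted curve; scoring GRIN’s retained subset at each threshold gives the comparison.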
For CAD58,59, height66,67, SCZ69,70, and the blood lipid traits (HDL, LDL, TC, TG)63,64, two earlier, lower-powered GWAS were each compared to the later, higher-powered GWAS60,65,68,71 (e.g., CAD1 and CAD2 were each evaluated against CAD3). We proceeded to lower thresholds only while fewer than 2000 genes were assigned, with the exception of the second set of height results (Height2): although 2710 genes were already assigned at p < 5e−8, we continued down by half orders of magnitude to p < 1e−5.
Comparing GRIN to other GWAS boosting methods
We sought to compare the performance of GRIN to two other GWAS-boosting methods, NAGA and GWAB, using the same approach of considering higher powered GWAS results as a gold set from which to evaluate precision and recall of lower powered GWAS results.
NAGA works by first assigning to each protein-coding gene the p-value of the best GWAS SNP in close proximity (i.e., +/− 10 kbp), which produces a ranking of all protein-coding genes. We used NAGA’s SNP-to-gene assignment to assign SNPs to protein-coding genes within +/− 10 kbp for the genome assembly that was used to run the GWAS: hg18/GRCh36 (CAD158, HDL163, Height166, LDL163, SCZ169, TC163, TG163, T2D172), hg19/GRCh37 (CAD259, CAD360, DrinksPerWeek161, HDL264, HDL365, Height267, LDL264, LDL365, SCZ270, SmokingInitiation161, TC264, TC365, TG264, TG365, SCZ371, T2D273), or hg38/GRCh38 (DrinksPerWeek262, Height368, SmokingInitiation262). NAGA then propagates the p-values over a gene-to-gene functional network using either random walk with restart or heat diffusion, which is intended to boost the recall of relevant (true positive) genes for the trait by re-ranking the genes and raising their rank positions. Because NAGA’s standard output is a full ranked list of genes, it is up to the user to choose a cutoff in the ranked list for further investigation. To make a fair comparison between NAGA and GRIN, we used the top n ranked genes from NAGA to calculate precision and recall, where n is the total number of genes that were input to GRIN at a given significance threshold. All NAGA rankings were performed in a Jupyter Notebook using random walk with restart network propagation and the NAGA-supplied monoplex network “Original PCNet” (http://www.ndexbio.org/#/network/f93f402c-86d4-11e7-a10d-0ac135e8bacf).
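The matched-size comparison described above can be sketched in a few lines: GRIN’s retained set is scored directly, while a boosting method’s ranked list is truncated to the number of genes that were input to GRIN at the same threshold. The function names are hypothetical, and the ranked list stands in for NAGA or GWAB output.

```python
def _precision_recall(genes, gold):
    tp = len(genes & gold)
    return (tp / len(genes) if genes else float("nan"),
            tp / len(gold) if gold else float("nan"))


def compare_at_threshold(grin_input, grin_retained, boosted_ranking, gold):
    """Precision/recall for GRIN's retained genes and for the top-n genes of a
    boosting method's ranking, with n = number of genes input to GRIN."""
    n = len(grin_input)
    top_n = set(boosted_ranking[:n])
    return {
        "grin": _precision_recall(set(grin_retained), gold),
        "boosted_top_n": _precision_recall(top_n, gold),
    }
```

Matching n to GRIN’s input keeps the comparison at equal list length, so differences reflect which genes each method prioritizes rather than how many it reports.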
We ran GWAB (located at https://www.inetbio.org/gwab/gwab_query.php) using SNP-to-nearest-gene assignment within +/− 10 kbp, similarly to NAGA. GWAB then uses a gene-to-gene functional network (HumanNet v2)47 to calculate a new score for each gene based on both its own p-value and the p-values of its network neighbors, weighted by the edges between the gene and those neighbors. This re-ranks the genes, which are evaluated against a user-provided list of true positive genes (e.g., a literature-curated list of disease-relevant genes). The standard output of GWAB is a list of genes assigned from SNPs at an optimized, less stringent genome-wide significance threshold, chosen based on recall of an a priori list of literature-curated disease-relevant genes. We therefore ran GWAB for the same traits as GRIN and NAGA wherever literature-curated genes were available from the DISEASES74 database, with the exception of Height1, Height2, and SmokingInitiation1. We ran GWAB at the same genome-wide significance thresholds as GRIN and compared the precision and recall of GWAB-boosted genes at each threshold.
Million Veteran Program (MVP) suicide attempt genome wide association study (GWAS) summary statistics
Suicide attempts were identified from United States veterans as described previously15. Suicide attempts were characterized using a combination of Veterans Health Administration (VHA) databases from the VA: the Suicide Prevention Application Network (SPAN) database, electronic health record (EHR) information from the VA Corporate Data Warehouse (CDW), and the CDW Mental Health Domain survey. For the MVP diagnosis, suicide attempt was determined by the presence of one or more of the following International Statistical Classification of Diseases and Related Health Problems (ICD)-9 and ICD-10 diagnostic codes in a subject’s EHR: ICD-9: E950-959; ICD-10: T14.91, X60-62, X64, X66-X83, Y87.0, Z91.5. Controls were veterans enrolled in MVP without a history of suicide attempt or suicidal ideation, as determined by a combination of the SPAN survey, the Mental Health Domain survey, and ICD diagnostic codes in the CDW database (suicidal ideation codes: ICD-9: V62.84; ICD-10: R45.851). A total of 410,464 controls from various ancestries (African, Asian, European, and Hispanic) were included for genome-wide association, along with 14,535 cases of non-fatal suicide attempt and 294 fatal attempts. Genome-wide association analyses were conducted using DNA from whole blood samples from subjects enrolled in MVP, genotyped on a custom Affymetrix Biobank Array. Quality control and imputation were performed as previously described14. All subjects provided informed consent, and the activities used to generate the GWAS summary statistics were approved by the VA Central Institutional Review Board. All ethical regulations relevant to human research participants were followed.
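The ICD-based part of the case and control definition can be sketched as code matching. This is a minimal sketch using only the codes listed above: it assumes dotted code strings (e.g. “X71.0XXA”, “E953.0”), treats “X60-62” as X60 through X62, and ignores the SPAN and survey sources that also contributed to the real definition. The constant and function names are hypothetical.

```python
# Case-defining codes from the text above; ranges are inclusive over the
# code category (the part before the decimal point).
CASE_RANGES = {"ICD9": [("E950", "E959")],
               "ICD10": [("X60", "X62"), ("X64", "X64"), ("X66", "X83")]}
CASE_PREFIXES = {"ICD9": [], "ICD10": ["T14.91", "Y87.0", "Z91.5"]}
IDEATION_PREFIXES = {"ICD9": ["V62.84"], "ICD10": ["R45.851"]}


def matches(version, code, ranges, prefixes):
    """True if a code falls in any category range or starts with any prefix."""
    code = code.strip().upper()
    category = code.split(".")[0]
    in_range = any(len(category) == len(lo) and lo <= category <= hi
                   for lo, hi in ranges.get(version, []))
    return in_range or any(code.startswith(p) for p in prefixes.get(version, []))


def classify(records):
    """records: iterable of (version, code) pairs from one subject's EHR.
    Returns "case", "excluded" (ideation codes only), or "control"."""
    records = list(records)
    if any(matches(v, c, CASE_RANGES, CASE_PREFIXES) for v, c in records):
        return "case"
    if any(matches(v, c, {}, IDEATION_PREFIXES) for v, c in records):
        return "excluded"
    return "control"
```

Note that X63 and X65 fall outside the listed ranges, so a record carrying only X65.0 is not classified as a case by this sketch.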
International Suicide Genetics Consortium (ISGC) suicide attempt GWAS summary statistics
We analyzed two sets of suicide attempt GWAS summary statistics from civilian populations compiled by the ISGC¹³. The first (SA-EUR) compared suicide attempt cases with controls from a general population of European ancestry. The second (SA-MDD) was derived from the same population, conditioned on major depressive disorder (MDD) diagnosis status. Because SA-MDD shares its cases and controls with SA-EUR, the two sets of summary statistics are not independent; both were nonetheless analyzed to determine how their results overlap with those from the MVP cohort. All subjects involved in the ISGC provided informed consent, and the activities used to generate the GWAS summary statistics were approved by their local institutional review boards as previously described¹³. All ethical regulations relevant to human research participants were followed.
SNP to gene assignment for suicide attempt summary statistics
To capture multiple candidate genes per locus as input to GRIN, SNPs from the MVP and ISGC suicide attempt summary statistics were assigned to genes with two separate methods, and the union of the two assignments was used. H-MAGMA¹ was used with publicly available Hi-C data from adult dorsolateral prefrontal cortex (dlPFC)⁷⁵ to improve the assignment of intergenic SNPs to genes based on three-dimensional chromatin structure in this brain region. We chose adult prefrontal cortex Hi-C data because this region is involved in executive function and impulsivity, processes that are disrupted in individuals with a history of suicide attempt⁵⁰,⁵¹. Conventional MAGMA⁴ was used as a second method of SNP-to-gene assignment. Both H-MAGMA and conventional MAGMA were used only to assign SNPs passing a given significance threshold to genes, not as gene-based association tests on the full summary statistics.
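Taking the union of two SNP-to-gene assignments can be sketched as follows. This is not MAGMA or H-MAGMA code: `positional_assign` is a simplified, brute-force stand-in for position-based annotation, and `contacts` is a hypothetical dictionary standing in for Hi-C-derived SNP-gene links. All coordinates and identifiers are invented.

```python
from typing import Dict, Set, Tuple

SnpPositions = Dict[str, Tuple[str, int]]      # snp -> (chrom, bp)
GeneSpans = Dict[str, Tuple[str, int, int]]    # gene -> (chrom, start, end)
Assignment = Dict[str, Set[str]]               # snp -> genes


def positional_assign(snps: SnpPositions, genes: GeneSpans, window: int = 0) -> Assignment:
    """Simplified positional annotation: a SNP maps to every gene whose span,
    extended by `window` bp on each side, contains it."""
    return {
        snp: {
            gene
            for gene, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start - window <= pos <= end + window
        }
        for snp, (chrom, pos) in snps.items()
    }


def union_assign(*assignments: Assignment) -> Assignment:
    """Merge several SNP-to-gene assignments, keeping genes from any method."""
    merged: Assignment = {}
    for assignment in assignments:
        for snp, genes in assignment.items():
            merged.setdefault(snp, set()).update(genes)
    return merged


def input_gene_set(assignment: Assignment) -> Set[str]:
    """Collapse a SNP-to-gene assignment into the gene set passed downstream."""
    return set().union(*assignment.values())


if __name__ == "__main__":
    snps = {"rs1": ("1", 1500), "rs2": ("1", 9000), "rs3": ("2", 100)}
    genes = {"G1": ("1", 1000, 2000), "G2": ("1", 5000, 6000), "G3": ("2", 50, 150)}
    contacts = {"rs1": {"G2"}, "rs2": {"G2"}}  # hypothetical chromatin links
    print(input_gene_set(union_assign(positional_assign(snps, genes), contacts)))
```

Here `rs2` lies outside every gene span, so it reaches a gene only through the contact-based mapping, which is the motivation for combining the two methods.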
SNPs from the MVP, SA-EUR, and SA-MDD summary statistics were assigned to genes at multiple thresholds (p < 5e−8, p < 1e−5, p < 1e−4, p < 1e−3, p < 1e−2, and p < 1e−1; Supplementary Table 2) to determine how many suicide attempt GWAS genes could be input to GRIN at each threshold. For each set of summary statistics, the union of conventional MAGMA- and H-MAGMA-assigned genes (i.e., all genes assigned by either method) at p < 1e−5 was then used as the GRIN input gene set. This threshold was chosen because it yielded gene sets of a size expected to maintain high precision while gaining recall relative to the genes identified at p < 5e−8. GRIN then filtered each p < 1e−5 gene set into retained and removed genes (Supplementary Tables 5, 8, and 9).
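The threshold sweep, and the precision/recall trade-off that motivates relaxing the threshold, can be sketched in Python. The SNP p-values, SNP-to-gene map, and reference gene set are hypothetical; precision and recall can only be computed where a reference set of true positive genes exists, as in simulations or curated disease-gene lists.

```python
from typing import Dict, Iterable, Set, Tuple

THRESHOLDS = (5e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1)


def gene_sets_by_threshold(
    snp_pvalues: Dict[str, float],
    snp_to_genes: Dict[str, Set[str]],
    thresholds: Iterable[float] = THRESHOLDS,
) -> Dict[float, Set[str]]:
    """Collect the genes assigned to SNPs with p below each threshold."""
    sets: Dict[float, Set[str]] = {}
    for t in thresholds:
        genes: Set[str] = set()
        for snp, p in snp_pvalues.items():
            if p < t:
                genes.update(snp_to_genes.get(snp, set()))
        sets[t] = genes
    return sets


def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Precision = TP / |predicted|; recall = TP / |truth| (0.0 if empty)."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    snp_p = {"rs1": 1e-9, "rs2": 3e-6, "rs3": 0.02}
    s2g = {"rs1": {"A"}, "rs2": {"B", "C"}, "rs3": {"D"}}
    truth = {"A", "B", "E"}  # hypothetical reference set
    for t, genes in gene_sets_by_threshold(snp_p, s2g).items():
        print(t, sorted(genes), precision_recall(genes, truth))
```

With this toy input, moving from p < 5e−8 to p < 1e−5 raises recall from 1/3 to 2/3 while precision drops from 1.0 to 2/3.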
Gene set enrichment analysis
Gene sets from the MVP and ISGC summary statistics were tested for enrichment with ToppFun in the online ToppGene Suite⁷⁶, using the following categories: GO: Molecular Function; GO: Biological Process; GO: Cellular Component; Human Phenotype; Pathway (all databases selected); Transcription Factor Binding Site (all databases selected); Drug (all databases selected); and Disease (all databases selected). Enrichments were considered significant at a Benjamini–Hochberg false discovery rate (FDR)-adjusted p-value < 0.05.
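ToppFun reports adjusted p-values itself. For readers unfamiliar with the procedure, a minimal standard-library sketch of the Benjamini–Hochberg step-up adjustment is shown below; it illustrates the method and is not ToppGene's implementation.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini–Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag tests whose BH-adjusted p-value is below the FDR threshold."""
    return [q < fdr for q in benjamini_hochberg(pvalues)]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw), significant(raw))
```

Each raw p-value is scaled by m / rank, and the running minimum taken from the largest p downward keeps the adjusted values in the same order as the raw ones.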
Drug-to-gene target networks for putative drug repurposing and side effect evaluation
Genes implicated in suicide attempt pathophysiology from the MVP and ISGC summary statistics were used to construct drug-to-gene target networks from DrugBank⁷⁷. These networks were visualized in Cytoscape⁷⁸ (version 3.8.2, Cytoscape Consortium) to identify drugs known to target GRIN-retained genes at p < 1e−5 from the MVP and ISGC summary statistics (Supplementary Table 20). ISGC GWAS genes were compared with genes from the MVP cohort using Venn diagrams generated with the open-source R package Vennerable (https://github.com/js229/Vennerable).
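A drug-to-gene target network of this kind can be assembled from a drug-to-targets mapping and written in Cytoscape's simple interaction format (SIF), one "source, interaction type, target" triple per line. In this sketch, `drug_targets` is a hypothetical stand-in for parsed DrugBank records, and the drug and gene names are placeholders rather than real DrugBank entries.

```python
from typing import Dict, Iterable, List, Set, Tuple


def drug_target_edges(
    drug_targets: Dict[str, Iterable[str]], genes_of_interest: Set[str]
) -> List[Tuple[str, str]]:
    """Keep (drug, gene) pairs whose target gene is in the gene set of interest."""
    edges = [
        (drug, gene)
        for drug, targets in drug_targets.items()
        for gene in set(targets) & genes_of_interest
    ]
    return sorted(edges)  # deterministic order for reproducible output


def to_sif(edges: List[Tuple[str, str]], interaction: str = "targets") -> str:
    """Render edges as tab-separated SIF lines: source, interaction, target."""
    return "\n".join(f"{drug}\t{interaction}\t{gene}" for drug, gene in edges)


if __name__ == "__main__":
    # Placeholder data; a real run would parse DrugBank target annotations.
    drug_targets = {
        "drug_1": ["GENE_A", "GENE_X"],
        "drug_2": ["GENE_Y"],
        "drug_3": ["GENE_B", "GENE_A"],
    }
    retained = {"GENE_A", "GENE_B"}  # e.g., hypothetical GRIN-retained genes
    print(to_sif(drug_target_edges(drug_targets, retained)))
```

Drugs with no target among the genes of interest (here `drug_2`) drop out, so the resulting network contains only candidate drugs linked to the retained genes.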
Statistics and reproducibility
Full data points from Fig. 2, Supplementary Figs. 1–3, and Supplementary Figs. 6 and 7 are provided in Supplementary Data 2. All comparisons between the rank distribution of the user’s gene set and the null distribution were performed with a two-sided Mann–Whitney U test. To ensure the reproducibility of results, the null distribution was built from 100 random gene sets of the same size as the user’s gene set.
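The comparison against a random-gene-set null can be sketched with the standard library alone. This is an illustration under assumptions, not GRIN's code: `gene_ranks` is a hypothetical mapping from each gene to its rank, the null pools the ranks of 100 random gene sets drawn with a fixed seed, and the test uses the normal approximation with a tie correction but no continuity correction, so p-values will differ slightly from exact or continuity-corrected implementations.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_two_sided(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) from the tie-corrected normal
    approximation, without continuity correction."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = _average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all values identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def null_ranks(gene_ranks: Dict[str, float], set_size: int,
               n_sets: int = 100, seed: int = 0) -> List[float]:
    """Pool the ranks of n_sets random gene sets of the given size (fixed seed)."""
    rng = random.Random(seed)
    genes = sorted(gene_ranks)
    pooled: List[float] = []
    for _ in range(n_sets):
        pooled.extend(gene_ranks[g] for g in rng.sample(genes, set_size))
    return pooled
```

In use, `x` would hold the ranks of the user's genes and `y` the pooled ranks returned by `null_ranks(gene_ranks, len(user_genes))`; fixing the seed makes the null, and therefore the p-value, identical across runs.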
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
GWAS summary statistics from the Million Veteran Program used in this study will be made available on the NIH database of Genotypes and Phenotypes (dbGaP) under accession ID phs001672.v1.p, and summary statistics from the International Suicide Genetics Consortium are available at https://tinyurl.com/ISGC2021. All other data are available from the corresponding authors upon request.
Code availability
GRIN is available as an open-source, command-line R script for public use. The code, installation instructions, and user manual can be found at https://github.com/sullivanka/GRIN and on Zenodo⁷⁹.
References
Sey, N. Y. A. et al. A computational tool (H-MAGMA) for improved prediction of brain-disorder risk genes by incorporating brain chromatin interaction profiles. Nat. Neurosci. 23, 583–593 (2020).
Hall, M. A. et al. Novel EDGE encoding method enhances ability to identify genetic interactions. PLoS Genet. 17, e1009534 (2021).
Petersen, A., Alvarez, C., DeClaire, S. & Tintle, N. L. Assessing methods for assigning SNPs to genes in gene-based tests of association using common variants. PLoS One 8, e62161 (2013).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Gosak, M. et al. Network science of biological systems at different scales: a review. Phys. Life Rev. 24, 118–135 (2018).
Carlin, D. E. et al. A fast and flexible framework for network-assisted genomic association. iScience 16, 155–161 (2019).
Shim, J. E. et al. GWAB: a web server for the network-based boosting of human genome-wide association data. Nucleic Acids Res. 45, W154–W161 (2017).
Voracek, M. & Loibl, L. M. Genetics of suicide: a systematic review of twin studies. Wien. Klin. Wochenschr. 119, 463–475 (2007).
Erlangsen, A. et al. Genetics of suicide attempts in individuals with and without mental disorders: a population-based genome-wide association study. Mol. Psychiatry 25, 2410–2421 (2020).
Mullins, N. et al. Dissecting the shared genetic architecture of suicide attempt, psychiatric disorders, and known risk factors. Biol. Psychiatry https://doi.org/10.1016/j.biopsych.2021.05.029 (2021).
Kimbrel, N. A. et al. A genome-wide association study of suicide attempts and suicidal ideation in U.S. military veterans. Psychiatry Res. 269, 64–69 (2018).
Kimbrel, N. A. et al. A genome-wide association study of suicide attempts in the million veterans program identifies evidence of pan-ancestry and ancestry-specific risk loci. Mol. Psychiatry https://doi.org/10.1038/s41380-022-01472-3 (2022).
Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).
Wang, H., Xu, J., Lazarovici, P., Quirion, R. & Zheng, W. cAMP Response Element-Binding Protein (CREB): a possible signaling molecule link in the pathophysiology of schizophrenia. Front. Mol. Neurosci. 11, 255 (2018).
Dvorakova, M. et al. SGIP1 is involved in regulation of emotionality, mood, and nociception and modulates in vivo signalling of cannabinoid CB1 receptors. Br. J. Pharmacol. 178, 1588–1604 (2021).
Trevaskis, J. et al. Src homology 3-domain growth factor receptor-bound 2-like (endophilin) interacting protein 1, a novel neuronal protein that regulates energy balance. Endocrinology 146, 3757–3764 (2005).
Thakar, S. et al. Evidence for opposing roles of Celsr3 and Vangl2 in glutamatergic synapse formation. Proc. Natl. Acad. Sci. USA 114, E610–E618 (2017).
Sugiura, N., Patel, R. G. & Corriveau, R. A. N-methyl-D-aspartate receptors regulate a group of transiently expressed genes in the developing brain. J. Biol. Chem. 276, 14257–14263 (2001).
Takahashi, H. et al. MED26 regulates the transcription of snRNA genes through the recruitment of little elongation complex. Nat. Commun. 6, 5941 (2015).
Vukojevic, V. et al. Evolutionary conserved role of neural cell adhesion molecule-1 in memory. Transl. Psychiatry 10, 217 (2020).
Walmod, P. S., Kolkova, K., Berezin, V. & Bock, E. Zippers make signals: NCAM-mediated molecular interactions and signal transduction. Neurochem. Res. 29, 2015–2035 (2004).
Monaghan, C. E. et al. REST corepressors RCOR1 and RCOR2 and the repressor INSM1 regulate the proliferation-differentiation balance in the developing brain. Proc. Natl. Acad. Sci. USA 114, E406–E415 (2017).
Abrajano, J. J. et al. Differential deployment of REST and CoREST promotes glial subtype specification and oligodendrocyte lineage maturation. PLoS One 4, e7665 (2009).
Sokpor, G., Xie, Y., Rosenbusch, J. & Tuoc, T. Chromatin Remodeling BAF (SWI/SNF) complexes in neural development and disorders. Front. Mol. Neurosci. 10, 243 (2017).
Caubit, X., Tiveron, M.-C., Cremer, H. & Fasano, L. Expression patterns of the three Teashirt-related genes define specific boundaries in the developing and postnatal mouse forebrain. J. Comp. Neurol. 486, 76–88 (2005).
Lessel, D. et al. De novo missense mutations in DHX30 impair global translation and cause a neurodevelopmental disorder. Am. J. Hum. Genet. 101, 716–724 (2017).
Lentini, J. M., Alsaif, H. S., Faqeih, E., Alkuraya, F. S. & Fu, D. DALRD3 encodes a protein mutated in epileptic encephalopathy that targets arginine tRNAs for 3-methylcytosine modification. Nat. Commun. 11, 2510 (2020).
Meltzer, H. Y. et al. Clozapine treatment for suicidality in schizophrenia: International Suicide Prevention Trial (InterSePT). Arch. Gen. Psychiatry 60, 82–91 (2003).
Cliff, A. et al. A high-performance computing implementation of iterative random forest for the creation of predictive expression networks. Genes 10, 996 (2019).
Baranger, D. A. A. et al. Multi-omics cannot replace sample size in genome-wide association studies. Genes Brain Behav. 22, e12846 (2023).
Valdeolivas, A. et al. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 35, 497–505 (2019).
Duval, F. et al. Hypothalamic-prolactin axis regulation in major depressed patients with suicidal behavior. Psychoneuroendocrinology 151, 106050 (2023).
Oquendo, M. A. et al. Toward a biosignature for suicide. Am. J. Psychiatry 171, 1259–1277 (2014).
Chilton, I. et al. De novo heterozygous missense and loss-of-function variants in CDC42BPB are associated with a neurodevelopmental phenotype. Am. J. Med. Genet. A 182, 962–973 (2020).
Agha, Z. et al. Exome sequencing identifies three novel candidate genes implicated in intellectual disability. PLoS One 9, e112687 (2014).
Wang, X.-X. et al. MRCKβ links Dasm1 to actin rearrangements to promote dendrite development. J. Biol. Chem. 296, 100730 (2021).
Li, L. et al. Protein kinases paralleling late-phase LTP formation in dorsal hippocampus in the rat. Neurochem. Int. 76, 50–58 (2014).
Lund, H. et al. MARK4 and MARK3 associate with early tau phosphorylation in Alzheimer’s disease granulovacuolar degeneration bodies. Acta Neuropathol. Commun. 2, 22 (2014).
Doki, C. et al. Microtubule elongation along actin filaments induced by microtubule-associated protein 4 contributes to the formation of cellular protrusions. J. Biochem. 168, 295–303 (2020).
Neale, B. M. et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012).
Rehni, A. K., Singh, T. G. & Chand, P. Amisulpride-induced seizurogenic effect: a potential role of opioid receptor-linked transduction systems. Basic Clin. Pharmacol. Toxicol. 108, 310–317 (2011).
Rolf, M. G. et al. In vitro pharmacological profiling of R406 identifies molecular targets underlying the clinical effects of fostamatinib. Pharmacol. Res. Perspect. 3, e00175 (2015).
Pinner, N. A., Hamilton, L. A. & Hughes, A. Roflumilast: a phosphodiesterase-4 inhibitor for the treatment of severe chronic obstructive pulmonary disease. Clin. Ther. 34, 56–66 (2012).
Hwang, S. et al. HumanNet v2: human gene networks for disease research. Nucleic Acids Res. 47, D573–D580 (2019).
Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic ras. Cell 168, 890–903.e15 (2017).
Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2018).
Zhang, H. et al. Aberrant white matter microstructure in depressed patients with suicidality. J. Magn. Reson. Imaging https://doi.org/10.1002/jmri.27927 (2021).
Cao, J. et al. The association between resting state functional connectivity and the trait of impulsivity and suicidal ideation in young depressed patients with suicide attempts. Front. Psychiatry 12, 567976 (2021).
Pearl, J. R. et al. Genome-scale transcriptional regulatory network models of psychiatric and neurodegenerative disorders. Cell Syst. 8, 122–135.e7 (2019).
Basu, S., Kumbier, K., Brown, J. B. & Yu, B. Iterative random forests to discover predictive and stable high-order interactions. Proc. Natl. Acad. Sci. USA 115, 1943–1948 (2018).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Kainer, D., Lane, M., Sullivan, K., Cashman, M. & Miller, J. dkainer/RWRtoolkit (Oak Ridge National Laboratory (ORNL), 2022). https://doi.org/10.11578/DC.20220607.1
Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. & Hamosh, A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2015).
Piñero, J. et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 45, D833–D839 (2017).
Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).
Klarin, D. et al. Genetic analysis in UK Biobank links insulin resistance and transendothelial migration pathways to coronary artery disease. Nat. Genet. 49, 1392–1397 (2017).
Liu, M. et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 51, 237–244 (2019).
Saunders, G. R. B. et al. Genetic diversity fuels gene discovery for tobacco and alcohol use. Nature 612, 720–724 (2022).
Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Ripke, S. et al. Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–978 (2011).
Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572 (2022).
Pletscher-Frankild, S., Pallejà, A., Tsafou, K., Binder, J. X. & Jensen, L. J. DISEASES: text mining and data integration of disease-gene associations. Methods 74, 83–89 (2015).
Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362, eaat8464 (2018).
Chen, J., Bardes, E. E., Aronow, B. J. & Jegga, A. G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 37, W305–W311 (2009).
Wishart, D. S. et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36, D901–D906 (2008).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Sullivan, K. et al. GRIN - Geneset Refinement Using Interacting Networks. (Zenodo). https://doi.org/10.5281/ZENODO.13684721 (2024).
Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was sponsored by MVP CHAMPION (DAJ), NIH grants DA041913 (DAJ), DA051908 (DAJ), MH116269 (DMR), MH121455 (DMR), Brain & Behavior Research Foundation (NARSAD Young Investigator Award No. 29551 [NM]), Department of Veterans Affairs (VA) Clinical Science Research and Development (CSR&D) grants IK6BX003777 (JCB, NAK, DWO), the VA Million Veteran Program (MVP), and the Australian Research Council Center of Excellence for Plant Success in Nature & Agriculture (project number CE200100015) (DK). This publication does not represent the views of the VA or the United States Government. We also thank and acknowledge MVP (Office of Research and Development, Veterans Health Administration), the MVP Suicide Exemplar Workgroup, and the ISGC for their contributions to this manuscript. A complete listing of contributors from the MVP, MVP Suicide Exemplar Workgroup, and ISGC is provided in the Supplemental Information. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Author information
Contributions
Conceptualization and methodology: K.A.S., D.K., and D.A.J.; Software: K.A.S., M.L., M.C., J.I.M., A.C., J.R., and D.K.; Formal analysis: K.A.S. and D.K.; Investigation: K.A.S., D.K., X.Q., N.M., and D.A.J.; Resources: D.R., X.Q., and D.A.J.; Writing - Original Draft: K.A.S. and D.K.; Writing - Review & Editing: K.A.S., M.L., M.P., A.M.W., N.M., A.D., H.C., D.M.R., M.R.G., A.E.A.K., J.C.B., B.M., D.W.O., N.A.K., D.K., and D.A.J.; Visualization: K.A.S. and D.K.; Supervision: N.A.K., D.A.J., and D.K.; Project administration: A.D., H.C., D.M.R., J.P.P., A.E.A.K., J.C.B., B.M., D.W.O., N.A.K., and D.A.J.; Funding acquisition: D.M.R., D.K., and D.A.J.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Niamh Ryan and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editors: Aylin Bircan and Benjamin Bessieres.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sullivan, K.A., Lane, M., Cashman, M. et al. Analyses of GWAS signal using GRIN identify additional genes contributing to suicidal behavior. Commun Biol 7, 1360 (2024). https://doi.org/10.1038/s42003-024-06943-7
DOI: https://doi.org/10.1038/s42003-024-06943-7