Assessment and ascertainment in psychiatric molecular genetics: challenges and opportunities for cross-disorder research

Cai, Na; Verhulst, Brad; Andreassen, Ole A.; Buitelaar, Jan; Edenberg, Howard J.; Hettema, John M.; Gandal, Michael; Grotzinger, Andrew; Jonas, Katherine; Lee, Phil; Mallard, Travis T.; Mattheisen, Manuel; Neale, Michael C.; Nurnberger, John I.; Peyrot, Wouter J.; Tucker-Drob, Elliot M.; Smoller, Jordan W.; Kendler, Kenneth S.

doi:10.1038/s41380-024-02878-x

Expert Review
Open access
Published: 27 December 2024

Molecular Psychiatry volume 30, pages 1627–1638 (2025)

A Correction to this article was published on 28 January 2025

This article has been updated

Abstract

Psychiatric disorders are highly comorbid, heritable, and genetically correlated [1,2,3,4]. The primary objective of cross-disorder psychiatric genetics research is to identify and characterize both the shared genetic factors that contribute to convergent disease etiologies and the unique genetic factors that distinguish between disorders [4, 5]. This information can illuminate the biological mechanisms underlying comorbid presentations of psychopathology, improve nosology and prediction of illness risk and trajectories, and aid the development of more effective and targeted interventions. In this review we discuss how estimates of comorbidity and identification of shared genetic loci between disorders can be influenced by how disorders are measured (phenotypic assessment) and the inclusion or exclusion criteria in individual genetic studies (sample ascertainment). Specifically, the depth of measurement, source of diagnosis, and time frame of disease trajectory have major implications for the clinical validity of the assessed phenotypes. Further, biases introduced in the ascertainment of both cases and controls can inflate or reduce estimates of genetic correlations. The impact of these design choices may have important implications for large meta-analyses of cohorts from diverse populations that use different forms of assessment and inclusion criteria, and subsequent cross-disorder analyses thereof. We review how assessment and ascertainment affect genetic findings in both univariate and multivariate analyses and conclude with recommendations for addressing them in future research.

Introduction

The comorbidity between psychiatric disorders stems, at least in part, from overlapping genetic factors. Understanding the genetic etiology of psychiatric outcomes can illuminate the common biological mechanisms that contribute to comorbid presentations of psychopathology, delineate distinct psychiatric disorders, and aid the development of more effective and targeted interventions. We focus on binary diagnoses of psychiatric disorders to link the implications of our recommendations to clinically validated outcomes and remain consistent with existing psychiatric genetics research. It could be argued that dimensional characterizations of psychopathology have more statistical power and are more capable of dissecting symptom heterogeneity. Nevertheless, we emphasize the clinical validity of categorical diagnoses which have been used extensively in psychiatric genetic analyses to situate our discussion within the context of large genomic initiatives such as the Psychiatric Genomics Consortium (PGC). Our goal in this review is to summarize current research and perspectives on how assessment and ascertainment strategies impact genetic findings for both individual disorders (Table 1) and, in turn, cross-disorder genetic sharing and the genetics of comorbidity and disease trajectories (Table 2). After examining the impact of various common assessment and ascertainment methods, we conclude with recommendations for collecting new genomic data and conducting rigorous genetic analyses in the future.

Table 1 Summary of assessment and ascertainment strategies for individual psychiatric disorder diagnoses in case-control cohorts, volunteer-based biobanks, and electronic health records (EHRs) or registries.

Table 2 Sources of bias in individual-disorder genetics, their potential impact on cross-disorder genetic studies, mitigating factors in analyses, and recommendations for future data collection that take these biases and their effects into account.

Assessment of individual psychiatric disorders

Diagnoses of psychiatric disorders used in genomic studies are obtained from a variety of research designs: structured interviews administered by clinicians or trained research staff, self-administered questionnaires on current and lifetime worst-episode symptoms [6, 7], self-reports of a prior or current diagnosis or treatment [7], and diagnostic codes from electronic health records (EHRs) [8,9,10] or registries [11]. The reliability and validity of psychiatric diagnoses are a function of variation in these assessment strategies within three primary domains.

The first domain is the depth of clinical detail on which a diagnosis is based. Structured interviews epitomize “deep” phenotyping, providing rich information on the clinical characteristics used to assign a diagnosis. Established instruments, such as the Structured Clinical Interview for DSM (SCID) [12] or the Composite International Diagnostic Interview (CIDI) [13], assess all symptoms, functional impairment, and exclusion criteria required for a DSM [14] or ICD [15] diagnosis. Some studies use the Operational Criteria Checklist [16], which leverages multiple operational diagnostic systems to enable consensus best-estimate procedures [17]. Such “deep” phenotyping was widely applied in the initial phases of the Psychiatric Genomics Consortium (PGC) meta-analyses [18,19,20]. This approach results in diagnoses that reflect current clinical standards and enables investigations into clinical heterogeneity. Supplementing “deep” phenotyping with assessments of other relevant psychiatric disorders, personality traits [21], early life factors [22], and stressful life events [23] further enables investigations into psychological and environmental correlates of disorders [24]. Conversely, “shallow” assessments allow us to quickly obtain large, inexpensive samples, accelerating gene-discovery efforts by increasing statistical power. However, shallow assessments, such as very short screening tools (one to four item scales) [25], while correlated with structured interviews, often yield high false positive rates [25], jeopardizing their clinical validity. Between these extremes exists a spectrum of assessment methods that vary in their depth, including self-reported symptom-based questionnaires, self-reported professional diagnoses or treatment, diagnostic codes (ICD9, ICD10), hospital visits, prescription records, and insurance claims based on clinical assessments from EHRs. These assessment techniques vary widely in their reliability and validity. For example, diagnoses based on brief internet surveys may have questionable clinical validity while, for some disorders, online assessment instruments that assess a full set of diagnostic criteria have better psychometric properties [26]. Alternatively, diagnoses derived from prescriptions of restricted drugs, such as clozapine for treatment-resistant schizophrenia (SZ), can be highly valid [18, 27, 28]. Lower levels of reliability and clinical validity of shallow assessments may result in the misclassification of sub-clinical respondents as cases, influencing both genetic associations with the primary diagnosis and subsequent genetic correlations with comorbid conditions [29,30,31,32].

The second domain is the source of the assessment. Assessments of psychiatric disorders may come from clinicians (e.g., psychiatrists, other physicians, psychologists), trained research staff, self-reports, and relative or teacher reports [33,34,35,36,37]. The reliability and clinical validity of the psychiatric assessments vary as a function of the expertise of the interviewer, especially if the training or background of the interviewers enables them to create a sense of safety or rapport that allows the respondent to answer honestly, even on embarrassing topics. Consistency between trained psychiatrists and primary care physicians varies but is often high with repeated examinations [38, 39]. Diagnostic interviews conducted by trained research staff using semi-structured interviews such as the CIDI have been shown to have high validity when compared with structured interviews by clinicians [40]. However, diagnoses based on clinician ratings show significant differences from those relying on self-report [41, 42], with self-reports often indicating greater severity [43, 44]. Furthermore, genetic analyses find that self-reports [45, 46] capture non-specific genetic effects and miss a significant portion of the genetic contributions to clinically defined disorders [45,46,47,48]. It remains unknown whether differences in validity between clinical and self-report diagnoses can be compensated for by repeated assessments [49]. Notably, the validity of self-reports can be influenced by disease-, symptom- and individual-specific factors that depend on a respondent’s comprehension of the questionnaire, motivation, and ability to answer accurately [50]. These self-report biases may be related to personality traits [51] or specific psychiatric symptoms [52] (which may influence disorder vulnerability), potentially impacting the reliability and generalizability of research findings.

The third domain is the time frame of the assessment. Genomic studies have started to explore how genetic variants affect temporal features of psychiatric disorders. Notably, lifetime diagnoses tend to be more heritable than current diagnoses [53, 54]. Genetic analyses demonstrate that self-reported current symptoms assessed by the Patient Health Questionnaire 9 are more reflective of subsyndromal dysphoria that is related to stressful life events and neuroticism, while self-reported worst-episode symptoms assessed through the CIDI Short Form [55] show greater genetic sharing with major depressive disorder (MDD). This suggests that using current symptoms to identify genetic contributions to disorders is likely to yield findings with low specificity, and that current symptoms may be best limited to use in making current diagnoses. Lifetime symptoms and diagnoses, by contrast, may be modestly affected by inaccurate recollections or other features of state-dependent memory [56]. The combination of over- and under-reporting due to selective recall introduces an unpredictable mixture of biases that depends on the lifetime prevalence of subsyndromal symptoms and is confounded with the source of the information (i.e., self-report vs clinician assessment) [57]. Other temporal features also matter: for example, age at onset or recurrence can reflect differences in genetic risk [58,59,60], and the timing of assessment relative to disorder onset can substantially affect genetic findings. More targeted analyses that isolate the effects of different time scale factors are needed.

As effect sizes of associations between individual genetic variants and psychiatric phenotypes are usually small, we need large sample sizes to obtain reproducible results. This means meta-analyzing data spanning all three assessment domains. The justification for integrating potentially heterogeneous phenotypes is usually based on high genetic correlations (rGs) between them. However, there are notable differences in the rGs among assessments of different disorders. The reported rGs between SZ samples collected through different means and populations are high (>0.9) [61, 62] while the rGs between MDD samples are as low as 0.59 [10]. Ignoring this variability may skew our understanding of the genetic architecture of individual disorders, rGs between disorders, and downstream analyses such as tissue-enrichment of the SNP-based heritability (h²_SNP), and prioritization of GWAS findings for fine-mapping and drug-target identification.

How strictly should individual or cross-disorder psychiatric genetics research rely on deep, clinician-assessed diagnoses based on established DSM criteria rather than shallow, self-reported symptoms or EHRs? The DSM is neither perfect nor immutable and is periodically revised based on advances in the understanding of the etiology of the disorders. DSM criteria do not, nor are they designed to, exhaustively capture the diagnostic complexity of any specific disorder [63, 64]. However, DSM-based diagnoses correspond with current best-practice patient care, providing reliable assessments and underscoring their clinical validity for translating research into beneficial patient outcomes. Nevertheless, dichotomizing individuals into cases and controls discards potentially valuable information regarding disease severity, thereby potentially reducing the power to detect genetic associations. Self-reported questionnaires, in contrast, are less expensive to administer, allowing researchers to collect substantially more data and increasing statistical power at the potential cost of clinical reliability and validity. Thus, it is important to consider supplementing data on current diagnostic criteria with additional measures, such as self-reports, to identify additional factors that may play an important role in refining the diagnostic formulations and subtypes of psychiatric disorders. In many ways deep, clinician-assessed diagnoses complement shallow, self-reported measures, and vice versa. The challenge will be to integrate seemingly disparate assessment methods in a way that maximizes the clinical validity of structured interviews and the recruitment potential of self-reported measures. As such, understanding how different assessment procedures affect empirical findings will streamline the integration of genomic evidence into future DSM revisions [65], with the goal of using epistemic iteration to refine diagnostic criteria [66, 67].

Ascertaining cases and controls for individual disorders

Case ascertainment

Strategies for identifying and recruiting individuals who meet diagnostic criteria for a psychiatric disorder can influence genetic associations and their interpretations [68]. Ascertainment for genomic studies primarily occurs in three forms: targeted recruitments of cases with a specific disorder from clinical or research settings, sampling from EHRs, and population-based sampling. While ascertainment strategies are theoretically independent of assessment methods and the prevalence of the target phenotype, practical constraints can confound these design factors.

Early in the psychiatric GWAS era, genomic studies primarily relied on targeted recruitments, requiring the coordination of networks of mental health professionals to screen patients for a target disorder, typically employing deep phenotyping [69,70,71]. This strategy was effective for the initial GWAS of rare disorders, particularly SZ [72] and bipolar disorder (BD) [73]. Importantly, participants recruited from clinical settings frequently exhibit more severe illness than their counterparts in EHR and population-based studies [74,75,76]. Targeted approaches are typically the best way to obtain large numbers of cases of relatively rare disorders [77, 78]. One concern with this approach is whether such samples are representative, or biased toward treatment-seeking, severity, excess comorbidity, and/or treatment non-responsiveness. In addition, the exclusion of cases with other comorbid disorders (common among core PGC cohorts) likely affects each disorder's profile of genetic sharing, depending on the patterns of comorbidity. Nonetheless, these ascertainment techniques, underscored by rigorous assessment methods, contributed to the success of the early PGC GWAS efforts.

National registries [79, 80] and EHRs [81,82,83] record healthcare information for everyone in their catchment, making them effective ascertainment strategies for identifying common and rare disorders. Patient diagnostic codes available through these resources can, in some instances, have high validity. For example, several follow-up clinical studies of cases [84, 85] of SZ [86, 87], BD [31], and obsessive compulsive disorder (OCD) [88] in Swedish and Danish registries and American EHRs have demonstrated strong validation against DSM criteria. Some EHRs have comprehensive doctors’ notes from individual interviews, which, if carefully coded, can augment case-control outcomes for genetic analyses [32, 89, 90]. Diagnostic data from EHRs and registries, however, can be heterogeneous. First, some healthcare systems use billing codes and base insurance claims or reimbursements on diagnostic assignment, while others do not. These incentive structures can create systematic biases in code assignment [91, 92]. Second, diagnoses inferred from administrative sources (e.g., pharmaceutical records) are indirect, adding uncertainty to the “case” phenotype. Third, different diagnostic biases, such as those related to search satisfaction (leading to underdiagnosis of comorbidities) and diagnostic momentum (sticking to a previous or working diagnosis even when it is erroneous), may differentially affect specific psychiatric disorders [44, 93].

EHRs and registries, however, may not be representative, capturing only those who interact with the healthcare system, and may oversample individuals with comorbidities and increased access to healthcare [94, 95]. This results in a disproportionate number of unhealthy individuals in EHRs, depending on the specific psychiatric disorder [96]. Further, EHRs based on insurance records, common in the US, may bias the presence of diagnosis or diagnostic classifications due to variable mobility, socioeconomic status and access to healthcare. This ascertainment problem can lead to biased estimates of polygenic score effect sizes. EHRs also substantially under-represent early-onset disorders such as autism spectrum disorder [97], especially in females, though correlates later in life may be informative [84, 85]. These ascertainment problems affect the representativeness of the samples and can significantly bias cross-disorder genetic results [96]. Finally, registries or EHRs may not contain information that provides a psychosocial context for the patient’s illness. Nonetheless, innovative ways to utilize EHR and registry data have potential for case identification [98,99,100].

Population-based biobanks are a common non-targeted means to collect data on psychiatric disorders [101] which have proven particularly useful for genomic analyses of common psychiatric disorders that are amenable to large-scale data collection using self-administered questionnaires with varying depth and time frames of assessment [45, 55]. However, population-based recruitment is sensitive to healthy volunteer biases. For example, the UK Biobank [102] invited approximately 9 million individuals to participate but only recruited 500,000 respondents (5.5% response rate), who are more likely to be older, female, living in less socioeconomically deprived areas, and reporting fewer physical and mental health conditions than the general population in the UK [101, 103]. Many studies have shown that this “healthy volunteer bias” distorts the associations among phenotypes [104,105,106], and with genetic variants [107] that are associated with self-selection. Notably, several genetic variants that are associated with self-selection are also associated with psychiatric disorders [108,109,110,111]. Unless adequately mitigated through statistical approaches [104, 112,113,114,115] or validated through experimental means [112], genetic findings from volunteer samples may compound biases [104]. Despite these limitations, population-based biobanks have made important contributions to progress in psychiatric genetics.

Control ascertainment

While the recruitment and assessment of cases dominate ascertainment debates, the selection of controls poses underappreciated methodological issues [116,117,118]. In clinical ascertainment, case and control participants are typically recruited independently, so case-control differences may be driven by both disease liability and ascertainment procedures. While the ascertainment biases discussed above regarding the selection of cases apply to the selection of controls in a broad sense, there are several control specific ascertainment factors that deserve attention. Most importantly, to identify meaningful case-control differences, controls should resemble cases in all characteristics except for the absence of the disorder for which cases are selected. Controls selected on this principle are referred to as normal controls.

However, the collection of controls in many genetic studies does not follow this principle, and the strategies used are not always adequately reported [74, 75]. In particular, many psychiatric GWAS use super-normal controls, who are screened both for the disorder being studied and for other psychiatric disorders that are not screened out of cases [119, 120]. Epidemiological studies have shown that the use of super-normal controls not only exaggerates case-control differences but can induce familial/genetic correlations in the absence of any true relationships [120]. In family studies, the use of super-normal controls produces spurious co-aggregation between disorders, with the magnitude of the bias increasing in proportion to the population prevalence of screened-out correlated disorders [121]. Simulation studies demonstrate that the symmetrical use of super-normal controls in GWAS of two disorders inflates rG in proportion to the population prevalence of the two disorders and the simulated magnitude of the association [122]. For example, if parallel GWASs of MDD and substance use disorder (SUD) were conducted that each allowed the other disorder among cases but excluded it from controls, the resulting MDD-SUD rG would be overestimated.
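The mechanism can be illustrated with a toy simulation. The sketch below is not the design used in the cited simulation studies; it assumes a simple liability-threshold model in which two hypothetical disorders, A and B, have completely independent genetic scores, with an assumed prevalence of 15% and an assumed heritability of 0.5 for each. It compares the mean genetic score for B between cases of A and two kinds of controls: normal controls (free of A only) and super-normal controls (free of both A and B).

```python
import random
from statistics import NormalDist, mean


def simulate(n=100_000, prevalence=0.15, h2=0.5, seed=1):
    """Two genetically independent disorders under a liability-threshold model.

    All parameter values are illustrative assumptions, not literature estimates.
    Returns the difference in mean genetic score for disorder B between
    cases of disorder A and each type of control group.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - prevalence)
    g_scale, e_scale = h2 ** 0.5, (1 - h2) ** 0.5

    b_in_a_cases, b_in_normal, b_in_super_normal = [], [], []
    for _ in range(n):
        g_a, g_b = rng.gauss(0, 1), rng.gauss(0, 1)  # independent genetic scores
        has_a = g_scale * g_a + e_scale * rng.gauss(0, 1) > threshold
        has_b = g_scale * g_b + e_scale * rng.gauss(0, 1) > threshold
        if has_a:
            b_in_a_cases.append(g_b)  # cases are not screened for B
        else:
            b_in_normal.append(g_b)  # normal controls: free of A only
            if not has_b:
                b_in_super_normal.append(g_b)  # also screened for B

    case_mean = mean(b_in_a_cases)
    return {
        "normal": case_mean - mean(b_in_normal),
        "super_normal": case_mean - mean(b_in_super_normal),
    }


result = simulate()
print(result)
```

Under these assumptions, the difference against normal controls should be close to zero, whereas the difference against super-normal controls should be clearly positive even though A and B share no genetic factors. Genetic risk for B thus appears associated with A purely because of how controls were screened.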

The problem here, simply put, is that the case-vs-super-normal-control difference reflects not only case-control differences for the target disorder but also differences in any traits or diseases that were asymmetrically screened out of the control group. This will upwardly bias GWAS effect sizes as a function of the prevalence of the diseases that are disproportionately screened out of the controls, compounding biases in analyses that use the summary statistics [122]. To further complicate the situation, some studies screen controls not only on their own phenotype but also on the phenotypes of close relatives [123]. Alternatively, because screening potential controls can be effortful and expensive, unscreened controls have been used in some psychiatric GWAS [124, 125]. In this scenario, the control group may contain cases of the target disorder at approximately the population prevalence. Here, without appropriate correction, genetic associations are downwardly biased, with the magnitude of the bias increasing for more prevalent disorders in the population [126].
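The downward bias from unscreened controls can be made concrete with a short calculation. The sketch below treats unscreened controls as a random population sample, so that a fraction equal to the prevalence are undetected cases. The allele frequencies and prevalences are hypothetical values chosen only for illustration, and the function names are not from any published package.

```python
def odds(p):
    """Convert a frequency into odds."""
    return p / (1 - p)


def observed_odds_ratio(freq_cases, freq_noncases, prevalence):
    """Allelic odds ratio when controls are an unscreened population sample.

    Unscreened controls are a mixture of cases (fraction = prevalence) and
    non-cases, so their allele frequency is pulled toward that of the cases.
    """
    freq_unscreened = prevalence * freq_cases + (1 - prevalence) * freq_noncases
    return odds(freq_cases) / odds(freq_unscreened)


# Hypothetical risk-allele frequencies, for illustration only.
freq_cases, freq_noncases = 0.22, 0.20
true_or = odds(freq_cases) / odds(freq_noncases)
print("screened controls:", round(true_or, 4))

for prevalence in (0.01, 0.15):  # a rare and a common disorder
    attenuated = observed_odds_ratio(freq_cases, freq_noncases, prevalence)
    print("prevalence", prevalence, "->", round(attenuated, 4))
```

With the same underlying allele frequencies, the odds ratio estimated against unscreened controls shrinks toward 1 as prevalence rises, which is why the attenuation matters more for common disorders such as MDD than for rarer ones such as SZ.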

Going forward

In GWAS meta-analyses, most of the samples for common disorders (e.g., MDD) are population-ascertained with shallow phenotyping, whereas those for less common disorders (e.g. SZ, BD) are predominantly clinically ascertained or obtained through EHRs and registries. Thus, biases in GWAS meta-analysis may operate differently across disorders. This complicates cross-disorder analyses, where shared genetic effects across disorders may reflect an unknown mixture of biases due to the different assessment and ascertainment strategies and true etiologic overlap between diagnostic entities. While misdiagnosis influences rGs between genetically related disorders [127], simulation studies suggest that an implausibly high level of misdiagnosis [3] would be required to account for the observed rGs between most pairs of psychiatric disorders in the absence of true genetic overlap. Nevertheless, lower levels of case misclassifications can inflate rG especially when misdiagnosis occurs for both disorders, and the magnitude of inflation depends on the magnitude of the rGs between disorders [45] and their prevalence. Finally, inflation of rGs can result from other sources including cross-trait assortative mating [128]. While some of these biases may cancel each other out, accurately identifying the source of pleiotropy and comorbidity remains essential for illuminating the shared genetic architecture of psychiatric disorders. In this section, we summarize ways to reduce or quantify biases that affect assessment and ascertainment strategies in both individual and cross-disorder genetic findings and give recommendations for future data collection efforts.

Refining phenotypes

Phenotypic quality control, including applying stringent clinical criteria [45], requiring multiple endorsements from different assessment strategies [49, 129], and ensuring consistency of endorsements across time [130], substantially increases the validity of psychiatric diagnoses. For example, correcting for mis-reports in different measures of alcohol use increases the rGs across different assessment strategies from 0.79 to >0.9 [130].

We now have a wide range of tools to quantify and compare the genetic architectures of the same disorder collected through different assessment and ascertainment strategies [131]. At the individual locus level, we can assess the replicability or heterogeneity of effects across assessment strategies [19, 28, 132]. At the genome-wide level, we can assess whether SNP-heritability estimates of the same disorder are similar across different study designs, and whether rGs among them are close to unity [10, 45, 62]. We can further assess whether polygenic risk scores (PRSs) from each assessment or ascertainment strategy robustly associate with scores from the other strategies [8, 62]. A recently derived metric called PRS Pleiotropy takes these approaches further, by assessing how well a PRS predicts the disorder of interest relative to other phenotypes (available in biobanks and EHRs) [133]. With PRS Pleiotropy as a means to assess specificity, we can identify clinically valid shallow phenotypes (e.g. clozapine treatment for SZ [18, 27, 28]) to include in GWAS meta-analyses. While no single test provides unambiguous evidence of bias, consistency across multiple tests provides convergent evidence of stable genetic effects.

We can also utilize statistical methods that combine genetic effects from shallow and deep measures to maximally leverage all data collected for improving GWAS power while maintaining reasonable specificity. These methods include LT-FH [134] (which models family history-based liability to disease), MTAG [135] (a meta-analytical approach leveraging information from collateral GWAS phenotypes with high rG to target GWAS), and Genomic SEM [136] (a framework for modeling genetic covariance structure that can be used to specify common and unique genetic factors underlying a system of GWAS phenotypes and perform GWAS discovery on those factors). In contrast to methods that require carefully choosing input phenotypes, multiple-phenotype imputation presents a relatively agnostic way to boost sample sizes for deep measures of a disorder (usually available in only a subset of individuals in a biobank) [133, 137]. Exploring different imputation approaches, especially non-linear models, can further allow us to utilize more data modalities (multi-omics [138,139,140,141], imaging [142, 143], data from smartphones and wearable devices [144, 145]). Further methodological developments applied to time-censored and longitudinal data in EHRs may help to refine diagnostic accuracy beyond missing value imputation [29, 92].

Accounting for ascertainment biases

As biases are prevalent and unavoidable, developing methods to assess and control for them is critical for obtaining generalizable findings [96]. One way to address a known bias, such as sex-differential participation, is to stratify GWAS and all subsequent analyses by the known factor [114, 146]. However, psychiatric disorders and relevant comorbid traits are unlikely to be biased by a single factor as straightforward as sex-differential participation, and stratification by factors that are also genetically regulated may induce collider biases [107, 113].

Several studies have proposed the use of inverse probability (IP) weightings (up-weighting participants with features identified to be associated with lower participation) [113, 147, 148] to improve representativeness of relationships identified between variables of interest (and interactions between them) in participants of volunteer-based biobanks [96, 104, 146]. This approach has been shown to improve the robustness of GWAS findings, rGs, and results of Mendelian randomization (MR) [115]. Notably, IP weighting relies on training feature selection models using variables affecting participation that are available in both the unrepresentative dataset (e.g., the UK Biobank) and a representative dataset from the same population (e.g. the UK Census microdata [104]). As misspecification of IP weightings may introduce further biases [113], feature selection for IP models will vary across different psychiatric disorders based on disease severity and other known risk factors [115, 128, 130, 146]. Further, under some circumstances IP weighting may reduce power [149]. Despite these limitations, this approach can be applied to correct for participation biases in EHRs and cohort studies. Of note, as we move towards analyzing disease trajectories that involve diagnostic conversions and comorbidities, we need to address a specific form of ascertainment bias: the index event bias [113]. For example, genetic effects identified as associated with late-onset BD (the disease incidence) in MDD cases would be biased by genetic effects associated with MDD diagnosis (the index event) [150, 151]. However, the utility of existing corrections for this bias in investigations into comorbidities among psychiatric disorders is limited, as they assume no correlation or interaction between SNP effects on disease progression and incidence. Methods for identifying, clustering, and correcting for incidence have been developed [152, 153], but like IP weighting methods, they are currently low in power.
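The logic of IP weighting can be sketched in a few lines. The toy simulation below is not the procedure used in the cited studies: it assumes a single hypothetical covariate (standing in for something like area deprivation) that raises disorder risk and lowers participation, and it assumes the participation probability of each person is known exactly. In real applications that probability must be estimated, for example against a representative reference dataset, and misspecifying it can introduce further bias.

```python
import math
import random


def logistic(z):
    return 1 / (1 + math.exp(-z))


def simulate_ipw(n=200_000, seed=7):
    """Compare naive and inverse-probability-weighted prevalence estimates.

    Assumption: participation probabilities are known exactly. All
    coefficients below are arbitrary values chosen for illustration.
    """
    rng = random.Random(seed)
    cases_total = 0
    n_participants = 0
    cases_participants = 0
    weight_sum = 0.0
    weighted_cases = 0.0
    for _ in range(n):
        x = rng.gauss(0, 1)  # hypothetical covariate, e.g. deprivation
        is_case = rng.random() < logistic(-2 + 0.8 * x)  # higher x, higher risk
        p_participate = logistic(-0.5 - 0.7 * x)  # higher x, lower participation
        cases_total += is_case
        if rng.random() < p_participate:
            weight = 1 / p_participate  # up-weight under-represented people
            n_participants += 1
            cases_participants += is_case
            weight_sum += weight
            weighted_cases += weight * is_case
    return {
        "population": cases_total / n,
        "naive": cases_participants / n_participants,
        "ip_weighted": weighted_cases / weight_sum,
    }


estimates = simulate_ipw()
print(estimates)
```

In this setup the naive prevalence among participants should underestimate the population value, mirroring healthy volunteer bias, while the weighted estimate should sit much closer to it. The price is that a few participants carry large weights, which is one way to see why IP weighting can reduce power.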

Quantifying and correcting for ascertainment biases is an active area of research [113]. Nevertheless, novel methods are likely to remain imperfect. As such, sensitivity analyses of genetic associations are recommended to identify the bounds of worst-case biases and the minimal level of bias necessary to account for the genetic findings [154].

Investigating disease trajectories and comorbidities from a genomic perspective

While most psychiatric disorders have clear developmental components, developmental processes are just beginning to be integrated into genomic analyses. Genetic studies of disease trajectories have become more feasible with the increased availability of data from biobanks, EHRs and registries linked with genetic data that may inform the interrelated development of multiple disorders. Self-reports of first diagnosis from the UK Biobank [155], for instance, enable the examination of temporal factors that may affect the comorbidity between symptom criteria for anxiety disorders and MDD [156] as well as their comorbidities with non-psychiatric phenotypes [157, 158]. Alternatively, repeated measurements from EHR or registry records provide the longitudinal elements necessary for prospective genomic studies [159, 160]. Furthermore, there are now large genotyped prospective samples, not relying on retrospective data [161, 162].

When considering the trajectory of disease progression, how patients are sampled also has major implications for genetic analyses and comorbidity. A recent longitudinal Swedish study of cases of MDD, BD, and SZ (using recorded discharges from the Swedish registry) concluded that “Over time clinical diagnosis and genetic risk profiles became increasingly consilient” [58]. These results suggest that genetic correlations between BD and SZ may be higher in cases examined early versus later in their course of illness. What might be termed diagnostic error could in part reflect the clinical development of the disorders over time [59, 60].

Records of clinical diagnoses of psychiatric disorders from millions of individuals in the Swedish and Danish registries have shown high, though variable, rates of comorbidities between different pairs of psychiatric disorders [163,164,165], corroborated by findings from a Columbian EHR study [147]. Studies using polygenic risk scores (PRS) [166, 167] or family genetic risk scores (FGRS) [168,169,170,171] can investigate patterns of shared genetic risk between pairs of disorders or their comorbidities. Many interesting insights confirm previous expectations: FGRS of disorders vary in their ability to predict comorbid disorders, as would be expected from variation in the prevalence of individual disorders and genetic correlations between them [165]; MDD cases with higher FGRS for BD have an elevated rate of conversion to a BD diagnosis (also generally true for other pairs of disorders) [58]; multinomial logistic regression using both PRS and FGRS is able to identify genetic heterogeneities among cases of MDD [170] and ADHD with different comorbid disorders [166]. Some findings, however, defy previous expectations and offer new opportunities for expanding our understanding of psychiatric disorders: other non-affective psychoses are found to have much lower SZ FGRS than expected, calling into question their inclusion in SZ analyses [168]. To date, psychiatric GWAS has not typically stratified analyses by different patterns of comorbidity. Following from the PRS and FGRS genetic heterogeneity results, such stratification represents a promising avenue for future cross-disorder genomic research to evaluate the extent to which different comorbid presentations implicate unique biological pathways.

Most psychiatric genetic studies to date have taken a cross-sectional disease-centric approach, focusing on investigations into genetic contributions to individual disorders while ignoring current comorbidities or subsequent conversions to other disorders. We would hypothesize that phenotypes that share similar trajectories also share genetic (in addition to environmental) precursors. Not all diagnostic switches (defined to be conversions among disorders that are exclusion criteria for each other in the DSM [172]) may pass this validity test, as they are based entirely on DSM-defined exclusion criteria that may be arbitrary. Disease trajectory analyses, therefore, present important opportunities for improving and refining disease nosology and DSM criteria. In fact, taking the trajectory-centric approach may enable us to get traction on potential biases that might otherwise inflate (or deflate) estimates of apparent pleiotropy, such as cross-sectional misclassifications of two diagnoses with frequent transitions [173] (e.g. BD and MDD, psychotic disorders and affective psychoses), and age-related differences in genetic correlations. Accordingly, we need strategies for keeping analyses tractable without losing resolution. This may require identifying biologically interesting questions, defining relevant phenotypes [58], designing useful data formats [164], and developing necessary statistical metrics [174]. Statistical approaches developed for assessing multimorbidity across the entire disease classification tree, currently employed on first-diagnoses or inpatient data in the UK Biobank, may also be customized to accommodate diagnostic criteria specific to psychiatric disorders, or longitudinal trajectory data in EHRs and registries [175,176,177].

Recommendations for future data collections

Integrating data from disparate assessment and ascertainment strategies will continue to pose challenges to psychiatric genetics in the foreseeable future. While little can be done to alter the study design choices of existing data, we hope that in planning future genomic data collection efforts, researchers will consider the implications that assessment and ascertainment techniques have on the validity, severity, comorbidity and genetic sharing across psychiatric disorders.

Diagnostic validity for individual disorders is a necessary but insufficient condition for any phenotyping approach. Cases and controls in new cohorts, especially when collected through different strategies, should demonstrate similar epidemiological relationships with known risk and protective factors in the population they are obtained from. For example, SZ cases should show a range of characteristics including male excess, mean age of onset in early to mid-20s, and present evidence of poor premorbid social or educational functioning and impaired social functioning, in addition to the canonically assessed key symptoms. Further, tests of the specificity of identified genetic risk (see above) are also critical.

Deep phenotyping studies will play a vital role in dissecting and understanding findings from heterogeneous meta-analyses, buttressing the translation of psychiatric molecular genetic results into diagnostic and treatment regimens. This is particularly important for cross-disorder genetic studies, as shallow phenotyping may be less accurate for some disorders than others. For these, we recommend: (i) expanding symptom assessment beyond DSM or ICD criteria to permit the measurement of other relevant clinical and non-clinical dimensions and/or subtypes that may not be captured by standard criteria, (ii) hiring trained mental health interviewers familiar with the relevant symptomatology of the case sample, (iii) establishing rigorous quality control procedures for interviewers such as monitored interview recordings by trained editors, and (iv) where possible, especially for more severe disorders, complementing interviews with reviews of relevant clinical records. For such studies we would also recommend consensus all-sources diagnostic procedures.

Conversely, studies that use non-clinician assessment approaches will continue to play a key role in recruiting large samples that are necessary for genomic analyses. For these studies, we recommend: (i) avoiding single item screens and prior treatment- or diagnosis-based questions (e.g., “Have you ever been diagnosed with …”) in favor of brief self-report versions of full diagnostic criteria, some of which have been validated in genetic designs [178, 179]; (ii) remaining cognizant of the potential for misdiagnosis especially with regard to false positives and negatives for standard screens for psychotic symptoms [180]; (iii) recognizing the impact of the time-frame of assessment, recalling that, overall, lifetime measures are likely to be more genetically informative, and (iv) utilizing modular assessment designs that allow participants to be recontacted to obtain more detailed assessments where necessary or followed-up for longitudinal assessments and trajectory analyses.

Selecting ascertainment strategies for psychiatric genomic investigations will likely be guided by the researcher’s access to data. However, it is important to keep the corresponding ascertainment biases in mind when analyzing genomic data. Furthermore, we recommend (i) using representative (not super-normal) controls, (ii) developing an ascertainment frame for cases that avoids oversampling severe and/or treatment resistant illness unless that is a specific focus of the design, and (iii) when possible, assessing phenotypes through measurement non-invariance techniques.

Finally, we call for greater efforts recruiting cohorts from diverse ancestries and environments. Most genetic studies have been performed on individuals of European descent who have relatively easy access to healthcare. Not only do we need to increase data collection in previously underrepresented communities, we must also pay careful attention to the translation of assessment instruments and, where necessary, design and benchmark new data collection protocols to address language and cultural differences. Further, with the increasing use of electronic health records in genetic research, we would like to urge the greater research community, not just in psychiatric genetics [82, 181,182,183], to investigate the social determinants that bias representation of different communities in these resources [184, 185]. Such biases can skew our understanding of disorder risk and comorbidities, and if uncorrected, result in increasing healthcare disparities [186].

Conclusions

Over the last 15 years, robust genetic associations have been identified for numerous psychiatric disorders, both under the auspices of the PGC and in independent studies. As we move into an era of historically large sample sizes in the genomic sciences, it is essential that we avoid assuming that larger samples will overcome biases and that we remain vigilant to the challenges associated with various measurement and ascertainment approaches in studies contributing to large meta-analyses. The translation of genetic findings into novel diagnostic techniques and treatment regimens for psychiatric disorders is predicated on valid assessment techniques and unbiased ascertainment strategies, as well as statistical methods to analyze genomic data. The aim, which we should always keep in mind, is identifying loci affecting risk for the disorders and disaggregating pleiotropic from disorder-specific variants. This will enable us to understand the biological mechanisms of individual psychiatric disorders and their comorbidity and serve as the foundation for improvements in diagnoses and individualized treatments of patients living with mental illness.

Change history

27 January 2025
The original online version of this article was revised: In this article the author’s name Wouter J. Peyrot was incorrectly written as Wouter Peyrout. The original article has been corrected.
28 January 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41380-025-02914-4

References

Kendler KS, Aggen SH, Knudsen GP, Røysamb E, Neale MC, Reichborn-Kjennerud T. The structure of genetic and environmental risk factors for syndromal and subsyndromal common DSM-IV axis I and all axis II disorders. Am J Psychiatry. 2011;168:29–39.
Pettersson E, Lichtenstein P, Larsson H, Song J, Attention Deficit/Hyperactivity Disorder Working Group of the iPSYCH-Broad-PGC Consortium, Autism Spectrum Disorder Working Group of the iPSYCH-Broad-PGC Consortium, Bipolar Disorder Working Group of the PGC, Eating Disorder Working Group of the PGC, Major Depressive Disorder Working Group of the PGC, Obsessive Compulsive Disorders and Tourette Syndrome Working Group of the PGC, Schizophrenia CLOZUK, Substance Use Disorder Working Group of the PGC, Agrawal A, et al. Genetic influences on eight psychiatric disorders based on family data of 4 408 646 full and half-siblings, and genetic data of 333 748 cases and controls. Psychol Med. 2019;49:1166–73.
Brainstorm Consortium, Anttila V, Bulik-Sullivan B, Finucane HK, Walters RK, Bras J, et al. Analysis of shared heritability in common disorders of the brain. Science. 2018;360:eaap8757.
Grotzinger AD, Mallard TT, Akingbuwa WA, Ip HF, Adams MJ, Lewis CM, et al. Genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic and molecular genetic levels of analysis. Nat Genet. 2022;54:548–59.
Cross-Disorder Group of the Psychiatric Genomics Consortium. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell. 2019;179:1469–82.e11.
Howard DM, Adams MJ, Shirali M, Clarke T-K, Marioni RE, Davies G, et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat Commun. 2018;9:1470.
Hyde CL, Nagle MW, Tian C, Chen X, Paciga SA, Wendland JR, et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat Genet. 2016;48:1031–6.
Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50:668–81.
Howard DM, Adams MJ, Clarke T-K, Hafferty JD, Gibson J, Shirali M, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 2019;22:343–52.
Levey DF, Stein MB, Wendt FR, Pathak GA, Zhou H, Aslan M, et al. Bi-ancestral depression GWAS in the Million Veteran Program and meta-analysis in >1.2 million individuals highlight new therapeutic directions. Nat Neurosci. 2021;24:954–63.
Schork AJ, Won H, Appadurai V, Nudel R, Gandal M, Delaneau O, et al. A genome-wide association study of shared risk across psychiatric disorders implicates gene regulation during fetal neurodevelopment. Nat Neurosci. 2019;22:353–61.
First MB, Williams JBW, Karg RS, Spitzer RL. SCID-5-CV: Structured Clinical Interview for DSM-5 Disorders: Clinician Version. American Psychiatric Pub; (2015).
Wittchen HU. Reliability and validity studies of the WHO-Composite International Diagnostic Interview (CIDI): a critical review. J Psychiatr Res. 1994;28:57–84.
Diagnostic and Statistical Manual of Mental Disorders: DSM-5. Amer Psychiatric Pub Incorporated; (2013).
World Health Organization. The International Statistical Classification of Diseases and Health Related Problems ICD-10: Tenth Revision. Volume 2: Instruction Manual. World Health Organization; 2004.
Azevedo MH, Soares MJ, Coelho I, Dourado A, Valente J, Macedo A, et al. Using consensus OPCRIT diagnoses. An efficient procedure for best-estimate lifetime diagnoses. Br J Psychiatry. 1999;175:154–7.
Leckman JF, Sholomskas D, Thompson WD, Belanger A, Weissman MM. Best estimate of lifetime psychiatric diagnosis: a methodological study. Arch Gen Psychiatry. 1982;39:879–83.
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–7.
Mullins N, Forstner AJ, O’Connell KS, Coombes B, Coleman JRI, Qiao Z, et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat Genet. 2021;53:817–29.
Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium, Ripke S, Wray NR, Lewis CM, Hamilton SP, Weissman MM, et al. A mega-analysis of genome-wide association studies for major depressive disorder. Mol Psychiatry. 2013;18:497–511.
Eysenck HJ, Eysenck SBG. Eysenck personality inventory. PsycTESTS Dataset. (2016).
Parker G, Tupling H, Brown LB. A parental bonding instrument. Br J Med Psychol. 1979;52:1–10.
Goodman LA, Corcoran C, Turner K, Yuan N, Green BL. Stressful life events screening questionnaire. PsycTESTS Dataset. (2011).
CONVERGE consortium. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature. 2015;523:588–91.
Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Pract. 2007;57:144–51.
van Ballegooijen W, Riper H, Cuijpers P, van Oppen P, Smit JH. Validation of online psychometric instruments for common mental health disorders: a systematic review. BMC Psychiatry. 2016;16:45.
Rees E, Walters JTR, Georgieva L, Isles AR, Chambert KD, Richards AL, et al. Analysis of copy number variations at 15 schizophrenia-associated loci. Br J Psychiatry. 2014;204:108–14.
Hamshere ML, Walters JTR, Smith R, Richards AL, Green E, Grozeva D, et al. Genome-wide significant associations in schizophrenia to ITIH3/4, CACNA1C and SDCCAG8, and extensive replication of associations reported by the Schizophrenia PGC. Mol Psychiatry. 2013;18:708–12.
Smoller JW. The use of electronic health records for psychiatric phenotyping and genomics. Am J Med Genet B Neuropsychiatr Genet. 2018;177:601–12.
Madden JM, Lakoma MD, Rusinak D, Lu CY, Soumerai SB. Missing clinical and behavioral health data in a large electronic health record (EHR) system. J Am Med Inform Assoc. 2016;23:1143–9.
Sellgren C, Landén M, Lichtenstein P, Hultman CM, Långström N. Validity of bipolar disorder hospital discharge diagnoses: file review and multiple register linkage in Sweden. Acta Psychiatr Scand. 2011;124:447–53.
Castro VM, Minnier J, Murphy SN, Kohane I, Churchill SE, Gainer V, et al. Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am J Psychiatry. 2015;172:363–72.
Thapar A, Harrington R, Ross K, McGuffin P. Does the definition of ADHD affect heritability? J Am Acad Child Adolesc Psychiatry. 2000;39:1528–36.
Overgaard KR, Oerbeck B, Friis S, Pripp AH, Aase H, Zeiner P. Predictive validity of attention-deficit/hyperactivity disorder from ages 3 to 5 Years. Eur Child Adolesc Psychiatry. 2022;31:1–10.
Merwood A, Greven CU, Price TS, Rijsdijk F, Kuntsi J, McLoughlin G, et al. Different heritabilities but shared etiological influences for parent, teacher and self-ratings of ADHD symptoms: an adolescent twin study. Psychol Med. 2013;43:1973–84.
Ip HF, van der Laan CM, Krapohl EML, Brikell I, Sánchez-Mora C, Nolte IM, et al. Genetic association study of childhood aggression across raters, instruments, and age. Transl Psychiatry. 2021;11:413.
Van der Laan CM, Ip HF, Schipper M, Hottenga J-J, Krapohl EML, Brikell I, et al. Meta-analysis of genome wide association studies on childhood ADHD symptoms and diagnosis reveals 17 novel loci and 22 potential effector genes. bioRxiv. (2024).
Kendler KS, Ohlsson H, Bacanu S, Sundquist J, Sundquist K. Differences in genetic risk score profiles for drug use disorder, major depression, and ADHD as a function of sex, age at onset, recurrence, mode of ascertainment, and treatment. Psychol Med. 2023;53:3448–60.
Mitchell AJ, Vaze A, Rao S. Clinical diagnosis of depression in primary care: a meta-analysis. Lancet. 2009;374:609–19.
Kessler RC, Abelson J, Demler O, Escobar JI, Gibbon M, Guyer ME, et al. Clinical calibration of DSM-IV diagnoses in the World Mental Health (WMH) version of the World Health Organization (WHO) Composite International Diagnostic Interview (WMHCIDI). Int J Methods Psychiatr Res. 2004;13:122–39.
Sayer NA, Sackeim HA, Moeller JR, Prudic J, Devanand DP, Coleman EA, et al. The relations between observer-rating and self-report of depressive symptomatology. Psychol Assess. 1993;5:350–60.
von Glischinski M, von Brachel R, Thiele C, Hirschfeld G. Not sad enough for a depression trial? A systematic review of depression measures and cut points in clinical trial registrations. J Affect Disord. 2021;292:36–44.
Thombs BD, Kwakkenbos L, Levis AW, Benedetti A. Addressing overestimation of the prevalence of depression based on self-report screening questionnaires. CMAJ. 2018;190:E44–E49.
Fried EI, Flake JK, Robinaugh DJ. Revisiting the theoretical and methodological foundations of depression measurement. Nat Rev Psychol. 2022;1:358–68.
Cai N, Revez JA, Adams MJ, Andlauer TFM, Breen G, Byrne EM, et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat Genet. 2020;52:437–47.
Davies MR, Buckman JEJ, Adey BN, Armour C, Bradley JR, Curzons SCB, et al. Comparison of symptom-based versus self-reported diagnostic measures of anxiety and depression disorders in the GLAD and COPING cohorts. J Anxiety Disord. 2022;85:102491.
Kendler KS, Gardner CO, Neale MC, Aggen S, Heath A, Colodro-Conde L, et al. Shared and specific genetic risk factors for lifetime major depression, depressive symptoms and neuroticism in three population-based twin samples. Psychol Med. 2019;49:2745–53.
Dahl A, Thompson M, An U, Krebs M, Appadurai V, Border R, et al. Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder. Nat Genet. 2023;55:2082–93.
Glanville KP, Coleman JRI, Howard DM, Pain O, Hanscombe KB, Jermy B, et al. Multiple measures of depression to enhance validity of major depressive disorder in the UK Biobank. BJPsych Open. 2021;7:e44.
Stone AA, Bachrach CA, Jobe JB, Kurtzman HS, Cain VS. The Science of Self-report: Implications for Research and Practice. Psychology Press; (1999).
Kendler KS, Prescott CA, Jacobson K, Myers J, Neale MC. The joint analysis of personal interview and family history diagnoses: evidence for validity of diagnosis and increased heritability estimates. Psychol Med. 2002;32:829–42.
Heath AC, Neale MC, Kessler RC, Eaves LJ, Kendler KS. Evidence for genetic influences on personality from self-reports and informant ratings. J Pers Soc Psychol. 1992;63:85–96.
Cheesman R, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Purves KL, Pingault J-B, Breen G, Rijsdijk F, et al. Extracting stability increases the SNP heritability of emotional problems in young people. Transl Psychiatry. 2018;8:223.
Zavos HMS, Gregory AM, Eley TC. Longitudinal genetic analysis of anxiety sensitivity. Dev Psychol. 2012;48:204–12.
Huang L, Tang S, Rietkerk J, Appadurai V, Krebs MD, Schork AJ, et al. Polygenic analyses show important differences between MDD symptoms collected using PHQ9 and CIDI-SF. Biol Psychiatry. 2023. https://doi.org/10.1016/j.biopsych.2023.11.021.
Brewin CR, Andrews B, Gotlib IH. Psychopathology and early experience: a reappraisal of retrospective reports. Psychol Bull. 1993;113:82–98.
Levis B, Benedetti A, Ioannidis JPA, Sun Y, Negeri Z, He C, et al. Patient Health Questionnaire-9 scores do not accurately estimate depression prevalence: individual participant data meta-analysis. J Clin Epidemiol. 2020;122:115–28.e1.
Kendler KS, Ohlsson H, Sundquist J, Sundquist K. Relationship of family genetic risk score with diagnostic trajectory in a Swedish national sample of incident cases of major depression, bipolar disorder, other nonaffective psychosis, and schizophrenia. JAMA Psychiatry. 2023;80:241–9.
Feng Y-CA, Ge T, Cordioli M, Ganna A, Smoller JW, Neale BM, et al. Findings and insights from the genetic investigation of age of first reported occurrence for complex disorders in the UK Biobank and FinnGen. bioRxiv. (2020).
Baker E, Leonenko G, Schmidt KM, Hill M, Myers AJ, Shoai M, et al. What does heritability of Alzheimer’s disease represent? PLoS One. 2023;18:e0281440.
Lam M, Chen C-Y, Li Z, Martin AR, Bryois J, Ma X, et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat Genet. 2019;51:1670–8.
Pardiñas AF, Holmans P, Pocklington AJ, Escott-Price V, Ripke S, Carrera N, et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat Genet. 2018;50:381–9.
Kendler KS. DSM disorders and their criteria: how should they inter-relate? Psychol Med. 2017;47:2054–60.
Kendler KS. The Phenomenology of Major Depression and the Representativeness and Nature of DSM Criteria. Am J Psychiatry. 2016;173:771–80.
Kendler KS. A history of the DSM-5 Scientific Review Committee. Psychol Med. 2013;43:1793–1800.
Chang H. Inventing Temperature: Measurement and Scientific Progress. Oxford University Press on Demand; (2004).
Kendler KS, Parnas J. Philosophical Issues in Psychiatry II: Nosology. OUP Oxford; (2012).
Trzaskowski M, Mehta D, Peyrot WJ, Hawkes D, Davies D, Howard DM, et al. Quantifying between-cohort and between-sex genetic heterogeneity in major depressive disorder. Am J Med Genet B Neuropsychiatr Genet. 2019;180:439–47.
Bjornson-Benson WM, Stibolt TB, Manske KA, Zavela KJ, Youtsey DJ, Buist AS. Monitoring recruitment effectiveness and cost in a clinical trial. Control Clin Trials. 1993;14:52S–67S.
Flint J, Chen Y, Shi S, Kendler KS, CONVERGE consortium. Epilogue: Lessons from the CONVERGE study of major depressive disorder in China. J Affect Disord. 2012;140:1–5.
Lovato LC, Hill K, Hertert S, Hunninghake DB, Probstfield JL. Recruitment for controlled clinical trials: literature summary and annotated bibliography. Control Clin Trials. 1997;18:328–52.
Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011;43:969–76.
Psychiatric GWAS Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet. 2011;43:977–83.
Lopez R, Scheutz F, Errboe M, Baelum V. Selection bias in case-control studies on periodontitis: a systematic review. Eur J Oral Sci. 2007;115:339–43.
Malay S, Chung KC. How to use outcomes questionnaires: pearls and pitfalls. Clin Plast Surg. 2013;40:261–9.
Legge SE, Pardiñas AF, Woolway G, Rees E, Cardno AG, Escott-Price V, et al. Genetic and Phenotypic Features of Schizophrenia in the UK Biobank. JAMA Psychiatry. 2024;81:681–90.
Taherdoost H. Sampling methods in research methodology; How to choose a sampling technique for research. SSRN Electron J. 2016. https://doi.org/10.2139/ssrn.3205035.
Cross-Disorder Group of the Psychiatric Genomics Consortium, Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45:984–94.
Schmidt M, Schmidt SAJ, Sandegaard JL, Ehrenstein V, Pedersen L, Sørensen HT. The Danish National Patient Registry: a review of content, data quality, and research potential. Clin Epidemiol. 2015;7:449–90.
Ludvigsson JF, Andersson E, Ekbom A, Feychting M, Kim J-L, Reuterwall C, et al. External review and validation of the Swedish national inpatient register. BMC Public Health. 2011;11:450.
All of Us Research Program Investigators, Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, et al. The ‘All of Us’ Research Program. N Engl J Med. 2019;381:668–76.
Roden DM, Pulley JM, Basford MA, Bernard GR, Clayton EW, Balser JR, et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin Pharmacol Ther. 2008;84:362–9.
Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, et al. The electronic medical records and genomics (eMERGE) network: past, present, and future. Genet Med. 2013;15:761–71.
Engelhard MM, Henao R, Berchuck SI, Chen J, Eichner B, Herkert D, et al. Predictive value of early autism detection models based on electronic health record data collected before age 1 year. JAMA Netw Open. 2023;6:e2254303.
Amit G, Bilu Y, Sudry T, Avgil Tsadok M, Zimmerman DR, Baruch R, et al. Early prediction of autistic spectrum disorder using developmental surveillance data. JAMA Netw Open. 2024;7:e2351052.
Lichtenstein P, Björk C, Hultman CM, Scolnick E, Sklar P, Sullivan PF. Recurrence risks for schizophrenia in a Swedish national cohort. Psychol Med. 2006;36:1417–25.
Ekholm B, Ekholm A, Adolfsson R, Vares M, Osby U, Sedvall GC, et al. Evaluation of diagnostic procedures in Swedish patients with schizophrenia and related psychoses. Nord J Psychiatry. 2005;59:457–64.
Rück C, Larsson KJ, Lind K, Perez-Vigil A, Isomura K, Sariaslan A, et al. Validity and reliability of chronic tic disorder and obsessive-compulsive disorder diagnoses in the Swedish National Patient Register. BMJ Open. 2015;5:e007520.
Beaulieu-Jones BK, Villamar MF, Scordis P, Bartmann AP, Ali W, Wissel BD, et al. Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: a retrospective cohort study. Lancet Digit Health. 2023;5:e882–e894.
Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, et al. A large language model for electronic health records. NPJ Digit Med. 2022;5:194.
Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PRO, Bernstam EV, et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013;51:S30–S37.
Abul-Husn NS, Kenny EE. Personalized medicine and the power of electronic health records. Cell. 2019;177:58–69.
Croskerry P. The importance of cognitive errors in diagnosis and strategies to minimize them. Acad Med. 2003;78:775–80.
Swanson JM. The UK Biobank and selection bias. Lancet. 2012;380:110.
Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biom Bull. 1946;2:47.
Lee YH, Thaweethai T, Sheu Y-H, Feng Y-CA, Karlson EW, Ge T, et al. Impact of selection bias on polygenic risk score estimates in healthcare settings. Psychol Med. 2023;53:7435–45.
Dueñas HR, Seah C, Johnson JS, Huckins LM. Implicit bias of encoded variables: frameworks for addressing structured bias in EHR-GWAS data. Hum Mol Genet. 2020;29:R33–R41.
Goldstein ND. A Researcher’s Guide to Using Electronic Health Records: From Planning to Presentation. CRC Press; (2023).
Beaulieu-Jones BK. Machine Learning Methods to Identify Hidden Phenotypes in the Electronic Health Record. (2017).
Polubriaginof FCG, Vanguri R, Quinnies K, Belbin GM, Yahi A, Salmasian H, et al. Disease heritability inferred from familial relationships reported in medical records. Cell. 2018;173:1692–704.e11.
Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank - development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.
Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779.
Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017;186:1026–34.
van Alten S, Domingue BW, Galama T, Marees AT. Reweighting the UK Biobank to reflect its underlying sampling population substantially reduces pervasive selection bias due to volunteering. bioRxiv. (2022).
Baltes PB, Mayer KU. Die Berliner Altersstudie. Akademie Verlag; 1999.
Batty GD, Gale CR, Kivimäki M, Deary IJ, Bell S. Comparison of risk factor associations in UK Biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis. BMJ. 2020;368:m131.
Munafò MR, Tilling K, Taylor AE, Evans DM, Davey Smith G. Collider scope: when selection bias can substantially influence observed associations. Int J Epidemiol. 2018;47:226–35.
Mignogna G, Carey CE, Wedow R, Baya N, Cordioli M, Pirastu N, et al. Patterns of item nonresponse behavior to survey questionnaires are systematic and have a genetic basis. bioRxiv. (2022).
Tyrrell J, Zheng J, Beaumont R, Hinton K, Richardson TG, Wood AR, et al. Genetic predictors of participation in optional components of UK Biobank. Nat Commun. 2021;12:886.
Martin J, Tilling K, Hubbard L, Stergiakouli E, Thapar A, Davey Smith G, et al. Association of genetic risk for schizophrenia with nonparticipation over time in a population-based cohort study. Am J Epidemiol. 2016;183:1149–58.
Adams MJ, Hill WD, Howard DM, Dashti HS, Davis KAS, Campbell A, et al. Factors associated with sharing e-mail information and mental health survey participation in large population cohorts. Int J Epidemiol. 2020;49:410–21.
Rothman KJ, Gallacher JEJ, Hatch EE. Why representativeness should be avoided. Int J Epidemiol. 2013;42:1012–4.
Mitchell RE, Hartley AE, Walker VM, Gkatzionis A, Yarmolinsky J, Bell JA, et al. Strategies to investigate and mitigate collider bias in genetic and Mendelian randomisation studies of disease progression. PLoS Genet. 2023;19:e1010596.
Lee H, Han B. A theory-based practical solution to correct for sex-differential participation bias. Genome Biol. 2022;23:138.
Schoeler T, Speed D, Porcu E, Pirastu N, Pingault J-B, Kutalik Z. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat Hum Behav. 2023;7:1216–27.
Hodge SE, Subaran RL, Weissman MM, Fyer AJ. Designing case-control studies: decisions about the controls. Am J Psychiatry. 2012;169:785–9.
Lubin JH, Gail MH. Biased selection of controls for case-control analyses of cohort studies. Biometrics. 1984;40:63–75.
Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies. I. Principles. Am J Epidemiol. 1992;135:1019–28.
Chen TJH, Blum K, Mathews D, Fisher L, Schnautz N, Braverman ER, et al. Are dopaminergic genes involved in a predisposition to pathological aggression? Hypothesizing the importance of ‘super normal controls’ in psychiatricgenetic research of complex behavioral disorders. Med Hypotheses. 2005;65:703–7.
Schwartz S, Susser E. The use of well controls: an unhealthy practice in psychiatric research. Psychol Med. 2011;41:1127–31.
Kendler KS. Toward a scientific psychiatric nosology. Strengths and limitations. Arch Gen Psychiatry. 1990;47:969–73.
Kendler KS, Chatzinakos C, Bacanu S-A. The impact on estimations of genetic correlations by the use of super-normal, unscreened, and family-history screened controls in genome wide case-control studies. Genet Epidemiol. 2020;44:283–9.
Wray NR, Pergadia ML, Blackwood DHR, Penninx BWJH, Gordon SD, Nyholt DR, et al. Genome-wide association study of major depressive disorder: new results, meta-analysis, and lessons learned. Mol Psychiatry. 2012;17:36–48.
Kirov G, Zaharieva I, Georgieva L, Moskvina V, Nikolov I, Cichon S, et al. A genome-wide association study in 574 schizophrenia trios using DNA pooling. Mol Psychiatry. 2009;14:796–803.
O’Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V, et al. Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet. 2008;40:1053–5.
Peyrot WJ, Boomsma DI, Penninx BWJH, Wray NR. Disease and polygenic architecture: avoid trio design and appropriately account for unscreened control subjects for common disease. Am J Hum Genet. 2016;98:382–91.
Wray NR, Lee SH, Kendler KS. Impact of diagnostic misclassification on estimation of genetic correlations using genome-wide genotypes. Eur J Hum Genet. 2012;20:668–74.
Border R, Athanasiadis G, Buil A, Schork AJ, Cai N, Young AI, et al. Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science. 2022;378:754–61.
Jermy BS, Glanville KP, Coleman JRI, Lewis CM, Vassos E. Exploring the genetic heterogeneity in major depression across diagnostic criteria. Mol Psychiatry. 2021;26:7337–45.
Xue A, Jiang L, Zhu Z, Wray NR, Visscher PM, Zeng J, et al. Genome-wide analyses of behavioural traits are subject to bias by misreports and longitudinal changes. Nat Commun. 2021;12:20211.
van Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet. 2019;20:567–81.
Han B, Eskin E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet. 2011;88:586–98.
Dahl A, Thompson M, An U, Krebs M, Appadurai V, Border R, et al. Phenotype integration improves power and preserves specificity in biobank-based genetic studies of MDD. bioRxiv. (2022).
Hujoel MLA, Gazal S, Loh P-R, Patterson N, Price AL. Liability threshold modeling of case-control status and family history of disease increases association power. Nat Genet. 2020;52:541–7.
Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet. 2018;50:229–37.
Grotzinger AD, Rhemtulla M, de Vlaming R, Ritchie SJ, Mallard TT, Hill WD, et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav. 2019;3:513–25.
An U, Pazokitoroudi A, Alvarez M, Huang L, Bacanu S, Schork AJ, et al. Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. Nat Genet. 2023;55:2269–76.
PsychENCODE Consortium, Akbarian S, Liu C, Knowles JA, Vaccarino FM, Farnham PJ, et al. The PsychENCODE project. Nat Neurosci. 2015;18:1707–12.
Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, et al. Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018;362:eaat8464.
Gandal MJ, Zhang P, Hadjimichael E, Walker RL, Chen C, Liu S, et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science. 2018;362:eaat8127.
Gandal MJ, Haney JR, Parikshak NN, Leppa V, Ramaswami G, Hartl C, et al. Shared Molecular Neuropathology Across Major Psychiatric Disorders Parallels Polygenic Overlap. Focus. 2019;17:66–72.
Opel N, Goltermann J, Hermesdorf M, Berger K, Baune BT, Dannlowski U. Cross-disorder analysis of brain structural abnormalities in six major psychiatric disorders: a secondary analysis of mega- and meta-analytical findings from the ENIGMA consortium. Biol Psychiatry. 2020;88:678–86.
Hettwer MD, Lariviere S, Park B-Y, van den Heuvel OA, Schmaal L, Andreassen OA, et al. Coordinated cortical thickness alterations across psychiatric conditions: A transdiagnostic ENIGMA study. bioRxiv. (2022).
Balliu B, Douglas C, Shenhav L, Wu Y, Seok D, Chatzopoulou D, et al. Personalized mood prediction from patterns of behavior collected with smartphones. bioRxiv. (2022).
Freimer NB, Mohr DC. Integrating behavioural health tracking in human genetics research. Nat Rev Genet. 2019;20:129–30.
Pirastu N, Cordioli M, Nandakumar P, Mignogna G, Abdellaoui A, Hollis B, et al. Genetic analyses identify widespread sex-differential participation bias. Nat Genet. 2021;53:663–71.
Griffith GJ, Morris TT, Tudball MJ, Herbert A, Mancano G, Pike L, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun. 2020;11:5749.
Gkatzionis A, Burgess S. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? Int J Epidemiol. 2019;48:691–701.
Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168:656–64.
Dudbridge F, Allen RJ, Sheehan NA, Schmidt AF, Lee JC, Jenkins RG, et al. Adjustment for index event bias in genome-wide association studies of subsequent events. Nat Commun. 2019;10:1561.
Cai S, Hartley A, Mahmoud O, Tilling K, Dudbridge F. Adjusting for collider bias in genetic association studies using instrumental variable methods. Genet Epidemiol. 2022;46:303–16.
Mahmoud O, Dudbridge F, Davey Smith G, Munafo M, Tilling K. A robust method for collider bias correction in conditional genome-wide association studies. Nat Commun. 2022;13:619.
Qi G, Chatterjee N. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nat Commun. 2019;10:1941.
Cinelli C, LaPierre N, Hill BL, Sankararaman S, Eskin E. Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy. Nat Commun. 2022;13:1093.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–9.
Thorp JG, Campos AI, Grotzinger AD, Gerring ZF, An J, Ong J-S, et al. Symptom-level modelling unravels the shared genetic architecture of anxiety and depression. Nat Hum Behav. 2021;5:1432–42.
Nakada S, Ho FK, Celis-Morales C, Jackson CA, Pell JP. Individual and joint associations of anxiety disorder and depression with cardiovascular disease: A UK Biobank prospective cohort study. Eur Psychiatry. 2023;66:e54.
Qiao Y, Ding Y, Li G, Lu Y, Li S, Ke C. Role of depression in the development of cardiometabolic multimorbidity: Findings from the UK Biobank study. J Affect Disord. 2022;319:260–6.
Han X, Hou C, Yang H, Chen W, Ying Z, Hu Y, et al. Disease trajectories and mortality among individuals diagnosed with depression: a community-based cohort study in UK Biobank. Mol Psychiatry. 2021;26:6736–46.
Mulugeta A, Zhou A, King C, Hyppönen E. Association between major depressive disorder and multiple disease outcomes: a phenome-wide Mendelian randomisation study in the UK Biobank. Mol Psychiatry. 2020;25:1469–76.
Magnus P, Birke C, Vejrup K, Haugan A, Alsaker E, Daltveit AK, et al. Cohort Profile Update: The Norwegian Mother and Child Cohort Study (MoBa). Int J Epidemiol. 2016;45:382–8.
Havdahl A, Wootton RE, Leppert B, Riglin L, Ask H, Tesli M, et al. Associations between pregnancy-related predisposing factors for offspring neurodevelopmental conditions and parental genetic liability to attention-deficit/hyperactivity disorder, autism, and Schizophrenia: The Norwegian Mother, Father and Child Cohort Study (MoBa). JAMA Psychiatry. 2022;79:799–810.
Plana-Ripoll O, Pedersen CB, Holtz Y, Benros ME, Dalsgaard S, de Jonge P, et al. Exploring comorbidity within mental disorders among a Danish National population. JAMA Psychiatry. 2019;76:259–70.
Krebs MD, Themudo GE, Benros ME, Mors O, Børglum AD, Hougaard D, et al. Associations between patterns in comorbid diagnostic trajectories of individuals with schizophrenia and etiological factors. Nat Commun. 2021;12:6617.
Kendler KS, Ohlsson H, Sundquist J, Sundquist K. Selecting cases of major psychiatric and substance use disorders in Swedish national registries on the basis of clinical features to maximize the strength or specificity of the genetic risk. Mol Psychiatry. 2023. https://doi.org/10.1038/s41380-023-02156-2.
LaBianca S, Brikell I, Helenius D, Loughnan R, Mefford J, Palmer CE, et al. Polygenic profiles define aspects of clinical heterogeneity in attention deficit hyperactivity disorder. Nat Genet. 2023. https://doi.org/10.1038/s41588-023-01593-7.
Musliner KL, Krebs MD, Albiñana C, Vilhjalmsson B, Agerbo E, Zandi PP, et al. Polygenic risk and progression to bipolar or psychotic disorders among individuals diagnosed with unipolar depression in early life. Am J Psychiatry. 2020;177:936–43.
Kendler KS, Ohlsson H, Sundquist J, Sundquist K. Family genetic risk scores and the genetic architecture of major affective and psychotic disorders in a Swedish national sample. JAMA Psychiatry. 2021;78:735–43.
Kendler KS, Ohlsson H, Sundquist J, Sundquist K. The patterns of family genetic risk scores for eleven major psychiatric and substance use disorders in a Swedish national sample. Transl Psychiatry. 2021;11:326.
Dybdahl Krebs M, Georgii Hellberg K-L, Lundberg M, Appadurai V, Ohlsson H, Pedersen EM, et al. PA-FGRS is a novel estimator of pedigree-based genetic liability that complements genotype-based inferences into the genetic architecture of major depressive disorder. bioRxiv. (2023).
Dybdahl Krebs M, Appadurai V, Georgii Hellberg K-L, Ohlsson H, Steinbach J, Pedersen E, et al. The relationship between genotype- and phenotype-based estimates of genetic liability to psychiatric disorders, in practice and in theory. bioRxiv. (2023).
De la Hoz JF, Arias A, Service SK, Castaño M, Diaz-Zuluaga AM, Song J, et al. Electronic health records reveal transdiagnostic clinical features and diverse trajectories of serious mental illness. bioRxiv. (2022).
Bromet E, Andrade LH, Hwang I, Sampson NA, Alonso J, de Girolamo G, et al. Cross-national epidemiology of DSM-IV major depressive episode. BMC Med. 2011;9:90.
Studer M, Ritschard G. What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures. J R Stat Soc Ser A Stat Soc. 2016;179:481–511.
Cortes A, Dendrou CA, Motyer A, Jostins L, Vukcevic D, Dilthey A, et al. Bayesian analysis of genetic association across tree-structured routine healthcare data in the UK Biobank. Nat Genet. 2017;49:1311–8.
Cortes A, Albers PK, Dendrou CA, Fugger L, McVean G. Identifying cross-disease components of genetic risk across hospital data in the UK Biobank. Nat Genet. 2020;52:126–34.
Zhang Y, Jiang X, Mentzer AJ, McVean G, Lunter G. Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank. Cell Genom. 2023;3:100371.
Kendler KS, Pedersen NL, Neale MC, Mathé AA. A pilot Swedish twin study of affective illness including hospital- and population-ascertained subsamples: results of model fitting. Behav Genet. 1995;25:217–32.
Sanchez-Roige S, Palmer AA, Fontanillas P, Elson SL, 23andMe Research Team, the Substance Use Disorder Working Group of the Psychiatric Genomics Consortium, Adams MJ, et al. Genome-Wide Association Study Meta-Analysis of the Alcohol Use Disorders Identification Test (AUDIT) in Two Population-Based Cohorts. Am J Psychiatry. 2019;176:107–18.
Kendler KS, Gallagher TJ, Abelson JM, Kessler RC. Lifetime prevalence, demographic risk factors, and diagnostic validity of nonaffective psychosis as assessed in a US community sample. The National Comorbidity Survey. Arch Gen Psychiatry. 1996;53:1022–31.
All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature. 2024;627:340–6.
Verma A, Huffman JE, Rodriguez A, Conery M, Liu M, Ho Y-L, et al. Diversity and scale: genetic architecture of 2068 traits in the VA Million Veteran Program. Science. 2024;385:eadj1182.
Belbin GM, Cullina S, Wenric S, Soper ER, Glicksberg BS, Torre D, et al. Toward a fine-scale population health monitoring system. Cell. 2021;184:2068–83.e11.
Smith MA, Gigot M, Harburn A, Bednarz L, Curtis K, Mathew J, et al. Insights into measuring health disparities using electronic health records from a statewide network of health systems: A case study. J Clin Transl Sci. 2023;7:e54.
Yan C, Zhang X, Yang Y, Kang K, Were MC, Embí P, et al. Differences in health professionals’ engagement with electronic health records based on inpatient race and ethnicity. JAMA Netw Open. 2023;6:e2336383.
Hsu C-Y, Yang W, Parikh RV, Anderson AH, Chen TK, Cohen DL, et al. Race, genetic ancestry, and estimating kidney function in CKD. N Engl J Med. 2021;385:1750–60.


Acknowledgements

BV is supported by the Brain and Behavior Research Foundation (BBRF 31397). OAA is a consultant to Cortechs.ai and Precision Health, and has received speaker’s honoraria from Lundbeck, Janssen, Otsuka and Sunovion. He is supported by the Research Council of Norway (#324499, #324252, #296030), NIH 1R01MH124839, KG Jebsen Stiftelsen (SKGJ-MED-021), and the European Union’s Horizon 2020 RIA grant (#964874). JB is supported by the EU-AIMS (European Autism Interventions) and AIMS-2-TRIALS programs, which receive support from Innovative Medicines Initiative Joint Undertaking Grant No. 115300 and 777394, the resources of which are composed of financial contributions from the European Union’s FP7 and Horizon2020 Programs, and from the European Federation of Pharmaceutical Industries and Associations (EFPIA) companies’ in-kind contributions, and AUTISM SPEAKS, Autistica and SFARI; and by the Horizon2020 supported programs CANDY Grant No. 847818, and R2D2 Grant No. 101057385. AG was supported by NIH Grant R01MH120219. KJ is a consultant to Allia Health. PL was supported by R01MH119243. TTM was supported by K08MH135343. EMTD was supported by R01MH120219. KSK was supported by R01MH130665 and U01MH126798. These funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. Any views expressed are those of the author(s) and not necessarily those of the funders.

Author information

These authors contributed equally: Na Cai, Brad Verhulst.

Authors and Affiliations

Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany
Na Cai
Computational Health Centre, Helmholtz Munich, Neuherberg, Germany
Na Cai
School of Medicine and Health, Technical University of Munich, Munich, Germany
Na Cai
Department of Psychiatry and Behavioral Sciences, Texas A&M University, College Station, TX, USA
Brad Verhulst
Centre of Precision Psychiatry, University of Oslo, Oslo, Norway
Ole A. Andreassen
Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
Ole A. Andreassen
KG Jebsen Centre for Neurodevelopmental disorders, University of Oslo, Oslo, Norway
Ole A. Andreassen
Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behavior, Radboud University Medical Center, Nijmegen, The Netherlands
Jan Buitelaar
Karakter Child and Adolescent University Center, Nijmegen, The Netherlands
Jan Buitelaar
Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
Howard J. Edenberg
Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
Howard J. Edenberg & John I. Nurnberger Jr
Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
John M. Hettema, Michael C. Neale & Kenneth S. Kendler
Departments of Psychiatry and Genetics, University of Pennsylvania, Philadelphia, PA, USA
Michael Gandal
Lifespan Brain Institute at Penn Med and the Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Michael Gandal
Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
Andrew Grotzinger
Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
Andrew Grotzinger
Department of Psychiatry & Behavioral Health, Stony Brook University, Stony Brook, NY, USA
Katherine Jonas
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Phil Lee
Department of Psychiatry, Harvard Medical School, Boston, MA, USA
Phil Lee
Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
Travis T. Mallard & Jordan W. Smoller
Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Travis T. Mallard & Jordan W. Smoller
Department of Community Health and Epidemiology and Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
Manuel Mattheisen
Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital of Munich, Munich, Germany
Manuel Mattheisen
Department of Biomedicine, Aarhus University, Aarhus, Denmark
Manuel Mattheisen
Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
Michael C. Neale & Kenneth S. Kendler
Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN, USA
John I. Nurnberger Jr
Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, USA
John I. Nurnberger Jr
Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
Wouter J. Peyrot
Amsterdam Public Health, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
Wouter J. Peyrot
Department of Psychology, University of Texas at Austin, Austin, TX, USA
Elliot M. Tucker-Drob
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Jordan W. Smoller

Authors

Na Cai
Brad Verhulst
Ole A. Andreassen
Jan Buitelaar
Howard J. Edenberg
John M. Hettema
Michael Gandal
Andrew Grotzinger
Katherine Jonas
Phil Lee
Travis T. Mallard
Manuel Mattheisen
Michael C. Neale
John I. Nurnberger Jr
Wouter J. Peyrot
Elliot M. Tucker-Drob
Jordan W. Smoller
Kenneth S. Kendler

Contributions

NC, BV and KSK outlined the specific issues reviewed in this paper and prepared the first draft of the manuscript. OA, JB, HE, JH, MG, AG, KJ, PL, TM, MM, MN, JN, WP, ET-D, and JW participated in the initial discussions on developing this paper, reviewed the initial draft, and provided important input into the final document, which was reviewed and approved by all authors.

Corresponding author

Correspondence to Kenneth S. Kendler.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: In this article the author’s name Wouter J. Peyrot was incorrectly written as Wouter Peyrout. The original article has been corrected.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Cai, N., Verhulst, B., Andreassen, O.A. et al. Assessment and ascertainment in psychiatric molecular genetics: challenges and opportunities for cross-disorder research. Mol Psychiatry 30, 1627–1638 (2025). https://doi.org/10.1038/s41380-024-02878-x


Received: 19 June 2024
Revised: 07 November 2024
Accepted: 16 December 2024
Published: 27 December 2024
Version of record: 27 December 2024
Issue date: April 2025
DOI: https://doi.org/10.1038/s41380-024-02878-x
