Table 2. Sources of bias in individual-disorder genetics, their potential impact on cross-disorder genetic studies, mitigation strategies in analyses, and recommendations for future data collection that take these biases and their effects into account.

From: Assessment and ascertainment in psychiatric molecular genetics: challenges and opportunities for cross-disorder research

Source of bias

Potential effect on cross-disorder analysis

Strategies for mitigation

Future data collection recommendations

Depth

Diagnoses from shallow phenotyping assessments are usually less valid, incurring high levels of misdiagnosis that may not be random

Nonspecific genetic effects on individual disorders; inflated rG between disorders in the case of bidirectional misdiagnosis between two disorders; mixture of unknown biases in shared genetic effects identified between two disorders

Assess the replicability or heterogeneity of effects across assessment strategies; assess specificity of polygenic risk scores (PRS); methods to combine different measures while maintaining specificity: (1) LT-FH, (2) MTAG, (3) Genomic SEM, (4) imputation

Use deep phenotyping where possible; use brief self-report versions of full diagnostic criteria if only shallow phenotyping is possible; repeated assessments; expand collection of data to include non-DSM symptoms and non-diagnostic information to supplement clinical characteristics obtained

Source

Diagnoses made by different sources may have different levels of validity and biases; concordance between diagnoses made by different sources may differ by disorder

Use assessments by trained mental health professionals who are familiar with the relevant symptomatology of individual disorders and their usual comorbidities; establish quality control of interviewers; complement interviews with doctors' notes, prescription records, and other medical records. For online assessment, avoid single-item screens and use brief measures that assess full diagnostic criteria

Timeframe

Diagnoses made in different timeframes may reflect subsyndromal states or lifetime liabilities to disorder; effects compound with those from source of information and depth of assessment; effect of timeframe may be disorder-specific

Focus on diagnoses made with assessments of lifetime, not current, symptoms; repeated assessments

EHRs

Only capture those who interact with the healthcare system, who may be less healthy yet of higher socioeconomic status than the general population

Genetics of individual disorders unrepresentative of disorder in the population; rG between disorders may be inflated if they share common ascertainment patterns; disorders with different levels of dysfunction may not share ascertainment patterns, leading to deflation in their rGs

Epidemiologically verify disorder validity using known relationships with non-clinical factors; use inverse probability (IP) weighting to improve representativeness (up-weighting participants with features known to be associated with lower participation)

Collect non-clinical epidemiological information; collect repeated measurements; for international studies, pay attention to translations of assessment instruments and, when possible, assess their success through measurement non-invariance techniques

Biobanks

Only capture those who volunteer to participate, who may be healthier, better educated, and of higher socioeconomic status than the population

Case-control cohorts

Biased toward treatment-seeking, high severity, excess comorbidity, and treatment non-responsiveness

Exaggerated case-control differences; may deflate rG between disorders, depending on genetic sharing between high-severity forms of both disorders

Assess extent of biases to understand differences between inclusion and exclusion criteria

Design an ascertainment frame for cases that avoids oversampling of those with severe and/or treatment-resistant illness

The exclusion of prior or lifetime diagnoses of other disorders

Deflates rG between disorders, because those with shared genetics are explicitly removed

Collect information on prior or lifetime diagnosis of other disorders to assess their impact on individual disorder liability and cross-disorder sharing

Screened “super-normal” controls

Screened for the disorder being studied and other psychiatric disorders not screened out in cases

Exaggerates case-control differences; produces spurious co-aggregation between disorders; inflates rG proportional to the population prevalence of the two disorders

Predict disorder liability for unscreened controls

Use representative (not super-normal) controls

Unscreened controls

Contain cases of the target disorder at approximately the population prevalence

GWAS genetic associations are biased downward, with the magnitude of the bias increasing for disorders that are more prevalent in the population; rG between disorders is affected accordingly

Screen controls as much as possible
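
Note: the inverse probability (IP) weighting listed among the mitigation strategies for EHRs and biobanks can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the two strata (`high_edu`, `low_edu`), the toy counts, and the participation probabilities, which are assumed to be known here but in practice would have to be estimated from external population data. It is not the procedure of any specific study or software package.

```python
def ip_weights(participation_probs):
    """Weight each participant by 1 / P(participation | features)."""
    return [1.0 / p for p in participation_probs]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical volunteer sample: (stratum, has_disorder) pairs.
sample = (
    [("high_edu", 0)] * 70
    + [("high_edu", 1)] * 10
    + [("low_edu", 0)] * 12
    + [("low_edu", 1)] * 8
)

# Assumed probability that a member of each stratum takes part in the study.
participation = {"high_edu": 0.8, "low_edu": 0.2}

outcomes = [has_disorder for _, has_disorder in sample]
weights = ip_weights([participation[stratum] for stratum, _ in sample])

# Unweighted prevalence in the sample versus the IP-weighted estimate.
naive_prevalence = sum(outcomes) / len(outcomes)
weighted_prevalence = weighted_mean(outcomes, weights)

print(f"naive: {naive_prevalence:.4f}, IP-weighted: {weighted_prevalence:.4f}")
```

In this toy sample the under-participating stratum has the higher prevalence, so up-weighting it raises the estimate from 0.18 to about 0.26, which is the prevalence in the implied source population of 100 people per stratum. The same weights could in principle be passed to a weighted association analysis, but that step is outside this sketch.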