Table 2 Sources of bias in individual-disorder genetics, their potential impact on cross-disorder genetic studies, strategies for mitigating them in analyses, and recommendations for future data collection that take these biases and their effects into account.
Source of bias | Description | Potential effect on cross-disorder analysis | Strategies for mitigation | Future data collection recommendations |
---|---|---|---|---|
Depth | Diagnoses from shallow phenotyping assessments are usually less valid, incurring high levels of misdiagnosis that may not be random | Nonspecific genetic effects on individual disorders; inflated rG between disorders in the case of bidirectional misdiagnosis between two disorders; mixture of unknown biases in shared genetic effects identified between two disorders | Assess the replicability or heterogeneity of effects across assessment strategies; assess specificity of polygenic risk scores (PRS); methods to combine different measures while maintaining specificity: (1) LT-FH, (2) MTAG, (3) Genomic SEM, (4) imputation | Use deep phenotyping where possible; use brief self-report versions of full diagnostic criteria if only shallow phenotyping is possible; repeated assessments; expand collection of data to include non-DSM symptoms and non-diagnostic information to supplement clinical characteristics obtained |
Source | Diagnoses made by different sources may have different levels of validity and biases; concordance between diagnoses made by different sources may differ by disorder | | | Assessments by trained mental health professionals who are familiar with the relevant symptomatology of individual disorders and their usual comorbidities; establish quality control of interviewers; complement interviews with doctors’ notes, prescription and other medical records; for online assessment, avoid single-item screens and use brief measures that assess full diagnostic criteria |
Timeframe | Diagnoses made in different timeframes may reflect subsyndromal states or lifetime liabilities to disorder; effects compound with those from source of information and depth of assessment; effect of timeframe may be disorder specific | | | Focus on diagnoses made with assessments of lifetime, not current, symptoms; repeated assessments |
EHRs | Only capture those who interact with the healthcare system, who may be unhealthier while having higher socioeconomic status than the population | Genetics of individual disorders unrepresentative of disorder in the population; rG between disorders may be inflated if they share common ascertainment patterns; disorders with different levels of dysfunction may not share ascertainment patterns, leading to deflation in their rGs | Epidemiologically verify disorder validity using known relationships with non-clinical factors; inverse probability (IP) weighting to improve representativeness (up-weighting participants with features identified to be associated with lower participation) | Collection of non-clinical epidemiological information; collection of repeated measurements; for international studies, pay attention to translations of assessment instruments and, when possible, assess the success of translation through measurement non-invariance techniques |
Biobanks | Only capture those who volunteer to participate, who may be healthier, better educated, and of higher socioeconomic status than the population | |||
Case-control cohorts | Biased toward treatment-seeking, high severity, excess comorbidity, and treatment non-responsiveness | Exaggerated case-control differences; may deflate rG between disorders, depending on the genetic sharing between high-severity forms of both disorders | Assess extent of biases to understand differences between inclusion and exclusion criteria | Design an ascertainment frame for cases that avoids oversampling of those with severe and/or treatment-resistant illness |
The exclusion of prior or lifetime diagnoses of other disorders | Deflates rG between disorders because those with shared genetics are explicitly removed | Collect information on prior or lifetime diagnosis of other disorders to assess their impact on individual disorder liability and cross-disorder sharing | ||
Screened “super-normal” controls | Screened for the disorder being studied and other psychiatric disorders not screened out in cases | Exaggerates case-control differences; produces spurious co-aggregation between disorders; inflates rG proportional to the population prevalence of the two disorders | Predicting disorder liability for unscreened controls | Use representative (not super-normal) controls |
Unscreened controls | Containing cases of the target disorder at approximately the population prevalence | Genetic associations of the GWAS are downwardly biased, with the magnitude of the bias increasing for more prevalent disorders in the population, affecting rG between disorders accordingly | Screen controls as much as possible |
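
The inverse probability (IP) weighting strategy listed for EHRs and biobanks can be illustrated with a minimal sketch. It assumes a single hypothetical stratifying feature (education level), population shares for that feature that are known from an external source such as a census, and toy data invented purely for illustration; the function names `ip_weights` and `weighted_mean` are likewise hypothetical. Real applications would typically estimate participation probabilities from many features with a fitted model.

```python
from collections import Counter

def ip_weights(sample_strata, population_shares):
    """Return one weight per participant, proportional to
    1 / P(participation | stratum), estimated as
    population share / sample share of the participant's stratum."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical toy cohort: the "high" education stratum is over-represented
# among volunteers relative to the (assumed) population shares below.
strata = ["high"] * 6 + ["low"] * 4
status = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]          # 1 = case, 0 = control
population_shares = {"high": 0.3, "low": 0.7}    # assumption, e.g. from a census

weights = ip_weights(strata, population_shares)
naive_prevalence = sum(status) / len(status)
adjusted_prevalence = weighted_mean(status, weights)

print(f"naive prevalence:    {naive_prevalence:.2f}")
print(f"adjusted prevalence: {adjusted_prevalence:.2f}")
```

In this toy example, participants from the under-represented stratum receive larger weights, so the weighted prevalence moves toward the value expected in the assumed population. The same weights could in principle be carried into downstream association analyses, although how best to do so depends on the method used.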