Table 1 Sources of dirty data.

From: Predicting the future of neuroimaging predictive models in mental health

| Category | Problem | Examples |
|---|---|---|
| Phenotypic measures | Measures are subjective | Poor inter-rater reliability and high variability in gold-standard diagnostic tools and behavioral measures [33, 34, 35, 81] |
| | Measures are nonspecific | High false-positive rate of the Autism Diagnostic Observation Schedule (ADOS) in adults with schizophrenia [36] |
| | Measures focus on the tails of behavior | Questionnaire scores of healthy controls tend to be zero-inflated [37, 38] |
| Participants | Comorbidity | Symptoms of psychiatric disorders often overlap across diagnoses, whereas most predictive models in psychiatry rely on binary classification approaches |
| | Medication | Psychiatric medications can alter BOLD signal patterns; because the signal is confounded, the psychiatric phenomena of interest become difficult to study |
| | Episodic symptoms | Symptoms change with disease state, so brain states captured in a scan on one day may differ substantially from those captured on another day |
| Data collection | Multi-site | Inter-scanner differences can induce significant variability [82, 83], and the complexity of data analysis workflows can affect reproducibility [84] |
| | Missing data | Subjects not completing questionnaires; inability to complete behavioral testing or scan sessions in clinical populations [85, 86] |
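Inter-rater reliability, the first problem listed above, is commonly quantified for categorical diagnoses with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), which discounts the observed agreement p_o by the agreement p_e expected by chance. The sketch below is a minimal illustration under that assumption; the function name `cohens_kappa` and the ratings are hypothetical and do not come from the article or its cited studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement p_e from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical diagnoses from two clinicians for the same 10 participants.
rater_a = ["ASD", "ASD", "TD", "TD", "ASD", "TD", "ASD", "TD", "TD", "ASD"]
rater_b = ["ASD", "TD", "TD", "TD", "ASD", "TD", "ASD", "ASD", "TD", "ASD"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

In this hypothetical example the raters agree on 8 of 10 participants (p_o = 0.8), but with balanced label frequencies chance agreement is already 0.5, so κ is only 0.6. This gap between raw and chance-corrected agreement is one reason subjective phenotypic labels can be noisier targets than they first appear.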