Assessing the validity of a self-reported clinical diagnosis of schizophrenia

Woolway, Grace E.; Legge, Sophie E.; Lynham, Amy J.; Smart, Sophie E.; Hubbard, Leon; Daniel, Ellie R.; Pardiñas, Antonio F.; Escott-Price, Valentina; O’Donovan, Michael C.; Owen, Michael J.; Jones, Ian R.; Walters, James T. R.

doi:10.1038/s41537-024-00526-5

Published: 30 October 2024

Schizophrenia volume 10, Article number: 99 (2024)

Abstract

The increasing availability of biobanks is changing the way individuals are identified for genomic research. This study assesses the validity of a self-reported clinical diagnosis of schizophrenia. The study included 1744 clinically-ascertained participants with schizophrenia or schizoaffective disorder depressed-type (SA-D) diagnosed by self-report and/or research interview and 1453 UK Biobank participants with self-reported and/or medical record diagnosis of schizophrenia or SA-D. Unaffected controls included a total of 501,837 participants. We assessed the positive predictive values (PPV) of self-reported clinical diagnoses against research interview and medical record diagnoses. Polygenic risk scores (PRS) and phenotypes relating to demographics, education and employment were compared across diagnostic groups. The variance explained (r²) by schizophrenia PRS for each diagnostic group was compared to samples in the Psychiatric Genomics Consortium (PGC). In the clinically-ascertained participants, the PPV of self-reported schizophrenia for a research diagnosis of schizophrenia was 0.70, which increased to 0.81 after expanding the research diagnosis to schizophrenia or SA-D. In UK Biobank, the PPV of self-reported schizophrenia for a medical record diagnosis was 0.74. Compared to participants who self-reported, participants with a clinically-ascertained research diagnosis were younger and more likely to have a high school qualification. Participants with a medical record diagnosis in UK Biobank were less likely to be employed or have a high school qualification than those who self-reported. Schizophrenia PRS did not differ between participants who had a diagnosis from self-report, research diagnosis or medical records. Polygenic liability r², for all diagnosis definitions, fell within the distribution of PGC schizophrenia cohorts. Self-reported measures of schizophrenia are justified in genomic research to maximise sample size and reduce the burden of in-depth interviews on participants, although within-sample validation of diagnoses is recommended.

Introduction

Schizophrenia is a severe mental health condition characterised by positive, negative, and disorganised symptoms as well as cognitive deficits¹ and has a lifetime prevalence of 0.32% worldwide². In research studies, schizophrenia diagnoses are now determined from various sources, for example research interview and/or clinical note review, electronic health records, or self-report of a clinical diagnosis made by a health professional. Methods combining diagnostic interviews and note reviews are considered the gold standard for defining cases in research³, but they are resource intensive and often associated with ascertainment biases: they can exclude severely affected individuals, who may be unable to complete a lengthy interview, as well as milder cases who are not in contact with secondary health services, leading to unrepresentative samples^4,5.

Traditionally, genomics research has been founded on clinically-ascertained samples that provide a clinical or research interview diagnosis. The availability of large-scale biobanks is only set to increase in the coming years⁶ and holds great potential for psychiatric genomics, but the appropriateness and validity of these different diagnostic sources is unclear.

Diagnoses generated from medical records have been shown to have good concordance with research interview diagnoses^7,8, with particularly high convergence seen in schizophrenia^9,10,11. Ascertainment through medical records overcomes some of the practical limitations for participation, but still hinders representation by relying on records typically from secondary care³ and thus under-represents less severely affected patients, who are less likely to be admitted to hospital. Alternatively, self-report of a diagnosis made by a health professional could be one approach to improving representativeness and increasing sample size in genomic research, circumventing the need for a labour-intensive research interview. However, the validity of self-reported diagnoses is likely to differ between psychiatric disorders, contexts, and cultures. Research using self-reported medical diagnoses from large-scale genomic datasets such as 23andMe^{12,13,14,15,16,17}, UK Biobank^{13,16,18,19,20,21} and the Million Veterans Programme^16,22 is now common, but the reliability and validity of self-reported diagnoses is unclear²³.

It is also unknown what impact different diagnostic methodologies such as self-report and medical records have on the outcome of genetic studies²⁴. To enhance power in GWAS, individuals with a self-reported diagnosis have been included^13,17, and this practice is likely to become more common, despite some studies suggesting that individuals defined using minimal phenotyping approaches show genetic differences to participants who are strictly defined^23,25,26. In one study, the effect sizes for schizophrenia polygenic risk scores (PRS) were reported to be smaller in samples where diagnoses were derived from electronic health records compared to clinically-ascertained case-control research cohorts in the Psychiatric Genomics Consortium (PGC)²⁷. However, analyses comparing samples from the Schizophrenia Working Group of the PGC found no differences in PRS across consensus DSM/ICD diagnosis (by psychiatrists), diagnostic interview, medical records, and mixed methods²⁸. To our knowledge, there is no published research comparing a self-reported clinical diagnosis of schizophrenia from a health professional against a gold-standard research interview diagnosis. In this study, we address this knowledge gap and assess whether a self-reported clinical diagnosis of schizophrenia is a valid approach to identify relevant individuals for genomic research.

Methods

Participants

Study participants came from two clinically-ascertained Cardiff University cohorts, the National Centre for Mental Health (NCMH) and CardiffCOGS, and from the UK Biobank. All participants provided written informed consent. Table 1 provides information on the assessments used to determine diagnosis in each sample. A flowchart of the samples, methods and number of participants recruited is shown in Fig. 1.

Table 1 Diagnosis definitions for self-reported, research interview, and medical record diagnosis groups.

**Fig. 1: Sample recruitment, assessment methods and number of participants.**

NCMH

NCMH participants were recruited via health care services, voluntary organisations or via public advertisement²⁹. Trained researchers administered a brief standardized assessment to gather demographic and clinical information and participants were asked to provide a sample for DNA extraction and genetic analyses. Participants self-reporting a schizophrenia, psychosis or affective diagnosis were invited to take part in a research interview based on the Schedules for Clinical Assessment in Neuropsychiatry (SCAN)³⁰. NCMH received approval from Health Research Authority and Wales Research Ethics Committee (REC) 2 (16/WA/0323).

CardiffCOGS

CardiffCOGS participants were recruited from community, in-patient and voluntary sector mental health services across the UK³¹. All participants completed a SCAN-based research interview³⁰, underwent a case-note review and were asked to provide a sample for DNA extraction and genetic analyses. CardiffCOGS received approval from Southeast Wales REC (07/WSE03/110). CardiffCOGS participants were included to increase the sample size for the genetic analysis. These participants were not included in comparisons of self-report and research diagnoses, as self-report diagnoses are not available in this sample.

UK Biobank

UK Biobank is a population-based UK cohort of around 500,000 participants, aged 40–69 at recruitment³². Participants completed a range of assessments and provided a sample for genetic analysis. Ethical approval was granted by the Northwest Multi-Centre Ethics Committee. This study was conducted under UK Biobank project number 13310.

Diagnosis definitions

Table 1 provides an overview of the self-reported, research interview and medical record diagnosis definitions used in this study.

Self-reported diagnosis

In NCMH, participants were asked whether a doctor or health professional had ever told them that they had a mental health diagnosis and were prompted with a list of psychiatric diagnoses to choose from (Supplementary Fig. 1). In UK Biobank, participants were asked in the initial assessment to report whether a doctor had told them they had any serious medical condition. A subset of participants in the UK Biobank (31%) completed the Mental Health Questionnaire (MHQ), where they were prompted with a list of psychiatric diagnoses to choose from (Supplementary Fig. 2). For both NCMH and UK Biobank, if the participant chose schizophrenia from the list or verbally self-reported a schizophrenia diagnosis, they were assigned a schizophrenia self-reported diagnosis in this study. Table 1 describes the subtypes of self-reported diagnoses available in the clinically-ascertained sample. A self-reported schizoaffective disorder diagnosis was excluded from analyses, as it was not possible to differentiate between the depressed and manic subtypes.

Research interview diagnosis

In the clinically-ascertained samples (NCMH and CardiffCOGS), DSM-IV, DSM-5, and ICD-10 research diagnoses were derived from a SCAN-based clinical interview and note review where available. A research interview diagnosis of schizophrenia was given in this study if either DSM or ICD criteria for schizophrenia were met. Participants who met criteria for schizoaffective disorder depressed-type (SA-D) were included alongside participants with schizophrenia, given evidence that these participants do not differ on a range of phenotypic and genetic measures, including symptoms, cognition and polygenic risk³³. ‘Other psychotic disorders’ in this study refers to the following diagnoses: psychosis not otherwise specified, schizophreniform disorder, delusional disorder, brief psychotic disorder, acute polymorphic disorder, and other psychotic illness.

Medical record diagnosis

In UK Biobank, a medical record diagnosis of schizophrenia or SA-D was defined as an F20/F25.1 ICD-10 code from national hospital admission records or death records, or an equivalent Read code from primary care (Supplementary Table 1). Hospital records date back to 1997 for England, 1998 for Wales and 1981 for Scotland and contain coded data on admissions, operations, and procedures. Primary care data were obtained for approximately 45% of the UK Biobank cohort. In secondary analyses, hospital admissions for schizophrenia were further subdivided into primary and secondary admissions. Primary ICD-10 codes represent conditions that caused the admission and secondary ICD-10 codes represent conditions that coexist at the time of admission, affect the treatment received, or develop after admission.

Unaffected controls

Unaffected controls for the clinically-ascertained samples were NCMH participants with no history of a mental health diagnosis who were recruited through participants with a psychiatric diagnosis (e.g., a family member/partner) or via advertisements. Unaffected controls for the UK Biobank analyses consisted of participants in UK Biobank who did not have a psychotic disorder diagnosis (F20-F29 inclusive) from admission records, death records, primary care records, or from self-reported sources.

Phenotypic data

The phenotypes compared across diagnostic groups included sex, age at interview (in years), educational attainment, and employment status. Educational attainment was dichotomised to GCSE (General Certificate of Secondary Education) and above, usually achieved at 16 years upon completing high school, or below GCSE/no qualification, consistent with previous research³⁴, in addition to degree/no degree. Employment status was dichotomised to in current paid employment or not and restricted to participants under the age of 65 who did not report being retired.

Genetic data

Clinically-ascertained sample

The clinically-ascertained participants were genotyped on the Illumina OmniExpress (Infinium OmniExpress-24 Kit), Illumina PsychArray (Infinium PsychArray-24 Kit) or Illumina GSA (Infinium Global Screening Array-24 Kit) genotyping platforms. Quality control and imputation using the Haplotype Reference Consortium (HRC)³⁵ was performed as part of the DRAGON-Data protocol³⁶. Datasets containing participants from the clinically-ascertained samples were restricted to those with the diagnoses described above and who did not carry a neurodevelopmental CNV³⁶. These samples were combined with samples from 1000 Genomes European phase 3³⁷ using PLINK v1.9³⁸ after restricting to overlapping SNPs. The 1000 Genomes sample was included to provide a population reference to allow studies using different arrays to be directly compared³⁹. The following quality control exclusion criteria were subsequently applied to SNPs: minor allele frequency (MAF) < 0.05, genotype missingness > 0.05, and Hardy-Weinberg equilibrium p ≤ 10⁻⁶. Linkage disequilibrium-pruned SNPs (500 variant count window size, 20 variant count to shift the window at the end of each step, a pairwise r² threshold of 0.2) were used to identify related individuals and to derive principal components (PC). One individual from each pair assumed to be duplicates (kinship coefficient > 0.98) or related (kinship coefficient > 0.1875) was removed at random. The first 5 PCs were used to perform multi-dimensional clustering to identify an ancestrally-homogenous subsample of individuals⁴⁰. The first 5 PCs explained the majority of the variance in the principal components, and adding additional PCs did not change the classifications. Individuals within a 90% threshold from the most central point were included for analyses. There were insufficient numbers of participants of non-European ancestries in NCMH and CardiffCOGS to allow us to analyse PRS in different ancestries.
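The final ancestry-selection step can be illustrated with a minimal sketch. It assumes that the "most central point" is the per-component median of the first 5 PCs and that the "90% threshold" keeps the 90% of individuals with the smallest Euclidean distance from that point. The text does not specify either detail, and the function name is hypothetical.

```python
import math
import random
import statistics

def select_central_subsample(pcs, keep_fraction=0.90):
    """Flag the individuals closest to the centre of principal-component space.

    pcs: list of per-individual PC coordinate lists (e.g. the first 5 PCs).
    Assumptions (not from the source): centre = per-component median,
    distance = Euclidean, threshold = fraction of individuals retained.
    """
    n_components = len(pcs[0])
    centre = [statistics.median(row[j] for row in pcs) for j in range(n_components)]
    distances = [math.dist(row, centre) for row in pcs]
    n_keep = max(1, round(keep_fraction * len(pcs)))
    cutoff = sorted(distances)[n_keep - 1]
    return [d <= cutoff for d in distances]

# Simulated data: 95 tightly clustered individuals followed by 5 distant outliers.
rng = random.Random(0)
cluster = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(95)]
outliers = [[rng.gauss(25.0, 1.0) for _ in range(5)] for _ in range(5)]
mask = select_central_subsample(cluster + outliers)
print(sum(mask), "of", len(mask), "individuals retained")
```

A median-based centre is used here so that a handful of outlying individuals cannot drag the reference point towards themselves; a mean-based centre would be an equally plausible reading of the text.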

UK Biobank

Imputed genetic data were provided by UK Biobank. Pre-imputation quality control and imputation have been described elsewhere⁴¹. Briefly, participants were assayed at the Affymetrix Research Services laboratory using the UK Biobank Axiom or UK BiLEVE Axiom purpose-built arrays. Imputation was completed using the HRC panel³⁵. We applied additional quality control procedures using the same thresholds used in our clinically-ascertained sample and detailed elsewhere^39,42. Genetic analyses were restricted to participants with European ancestry, to mirror the clinically-ascertained sample, using the method described above (see also Legge et al.⁴²).

Polygenic risk scores

In the clinically-ascertained sample and UK Biobank, PRSice v2⁴³ was used to calculate PRS for schizophrenia using GWAS de-duplicated summary statistics that were derived separately from our clinical sample and UK Biobank²⁸. PRS were also calculated for bipolar disorder¹³ and major depressive disorder⁴⁴. Summary statistics underwent quality control³⁶ and SNPs with MAF > 0.01 outside of the major histocompatibility complex region were used in the PRS analysis. PRS were calculated using relatively independent SNPs (r² < 0.1, within a 500 kb window) at a p-value threshold of 0.05²⁸. Polygenic risk scores were standardised within samples prior to analysis.
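As a rough illustration of what a PRS is and what within-sample standardisation does, the sketch below scores individuals as a weighted sum of risk-allele dosages and then z-scores the result. It is not the software's implementation: the effect sizes, dosages and function names are hypothetical, and SNP selection (clumping and p-value thresholding) is assumed to have happened already.

```python
import statistics

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0-2) using GWAS effect sizes (log odds ratios)."""
    return sum(d * w for d, w in zip(dosages, weights))

def standardise(scores):
    """Z-score the scores within one sample so they have mean 0 and SD 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical effect sizes for three SNPs retained after clumping and thresholding.
weights = [0.05, -0.02, 0.03]
# Hypothetical risk-allele dosages for four individuals.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]

raw_scores = [polygenic_score(g, weights) for g in genotypes]
z_scores = standardise(raw_scores)
print([round(z, 3) for z in z_scores])
```

Because the scores are standardised within each sample, an odds ratio from a downstream regression is read as the change in odds per standard deviation of PRS, and any constant rescaling of the raw score (for example dividing by the number of alleles) has no effect on the result.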

Analysis

In NCMH, positive predictive values (PPV) were used to assess the proportion of participants with a self-reported schizophrenia diagnosis from a health professional who had a concordant DSM/ICD research interview diagnosis. We also considered a research interview diagnosis of schizophrenia and schizoaffective disorder depressed-type (SA-D) together as there is evidence these two groups do not substantially differ with respect to genetic liability to schizophrenia^28,33. It was not possible to assess negative predictive values (NPV), sensitivity and specificity in the clinically-ascertained sample due to the recruitment methods; participants were only approached to complete a SCAN-based research interview if they self-reported a mood or psychotic disorder diagnosis.

In the UK Biobank, PPV, NPV, sensitivity and specificity were used to assess how predictive a self-reported clinical diagnosis from a health professional was of a medical record diagnosis. We scaled the PPV and NPV to the population point prevalence of schizophrenia (0.6%) (Supplementary Note 1). We could not calculate PPV related to a medical record diagnosis of schizophrenia and SA-D together due to a very low prevalence of SA-D in the UK Biobank.
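The effect of scaling predictive values to a population prevalence can be sketched with Bayes' theorem. This is a generic illustration, not a reproduction of the procedure in Supplementary Note 1, which is not shown here. The sensitivity and specificity values in the example are hypothetical; only the 0.6% prevalence comes from the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given population prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical sensitivity and specificity, evaluated at a 0.6% point prevalence.
ppv, npv = predictive_values(sensitivity=0.40, specificity=0.999, prevalence=0.006)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

At a prevalence this low, the PPV is very sensitive to small changes in specificity, while the NPV stays close to 1 almost regardless of sensitivity.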

In both NCMH and the UK Biobank, logistic regressions were used to test for phenotypic differences between individuals that only self-reported a diagnosis and those who had a research interview diagnosis/medical record diagnosis (some of whom also self-reported). Year of birth and sex were included as covariates.
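For readers less familiar with logistic regression output, the sketch below shows how a fitted coefficient and its standard error translate into an odds ratio and confidence interval. It assumes a Wald-type 95% interval, which the text does not explicitly state, and the coefficient and standard error are illustrative values rather than estimates from this study.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Illustrative coefficient for a binary predictor, adjusted for covariates.
or_, lower, upper = odds_ratio_with_ci(beta=0.476, se=0.180)
print(f"OR = {or_:.2f}; 95% CI = {lower:.2f}, {upper:.2f}")
```

An interval that excludes 1 corresponds to p < 0.05 in a two-sided Wald test of the same coefficient.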

Due to the limited number of genotyped participants in NCMH, the genetic analyses included participants from both NCMH and CardiffCOGS. In both the clinically-ascertained sample and the UK Biobank, logistic regressions were used to test for differences in schizophrenia PRS between the self-report-only and the research interview diagnosis/medical record diagnosis groups.

We compared the variance explained by schizophrenia PRS on the liability-scale (r², assuming 1% lifetime risk) in schizophrenia case/control status in the clinically-ascertained sample and UK Biobank, separated by diagnosis definitions, against the variances reported by other samples of European genetic ancestry in the PGC3 schizophrenia GWAS. The r² values refer to the variance explained by the schizophrenia PRS in comparison to a covariates-only baseline model. In addition, we calculated the variance explained in schizophrenia case/control status in UK Biobank for bipolar disorder¹³ and major depressive disorder⁴⁴ PRS.
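One widely used way to express case/control variance explained on the liability scale is the transformation of Lee et al. (2012), sketched below. The text does not state which transformation was applied, so this is an assumption, and the function name and input values are hypothetical. Here `prevalence` is the assumed lifetime risk (1% in this study) and `case_fraction` is the proportion of cases in the analysed sample.

```python
from statistics import NormalDist

def liability_r2(r2_observed, prevalence, case_fraction):
    """Transform an observed-scale R² to the liability scale (Lee et al. 2012)."""
    K, P = prevalence, case_fraction
    normal = NormalDist()
    t = normal.inv_cdf(1 - K)   # liability threshold for a trait with prevalence K
    z = normal.pdf(t)           # height of the normal density at the threshold
    m = z / K                   # mean liability of cases
    # Corrects for the 0/1 scale and for over-sampling of cases relative to K.
    C = (K * (1 - K) / z**2) * (K * (1 - K) / (P * (1 - P)))
    theta = m * (P - K) / (1 - K) * (m * (P - K) / (1 - K) - t)
    return r2_observed * C / (1 + r2_observed * theta * C)

# Hypothetical observed-scale R² in a sample where half of participants are cases.
example = liability_r2(r2_observed=0.05, prevalence=0.01, case_fraction=0.5)
print(round(example, 4))
```

The correction matters when comparing cohorts: samples with very different case fractions give different observed-scale values for the same underlying signal, whereas liability-scale values are intended to be comparable across designs.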

In the UK Biobank sample, further logistic regressions were used to assess if schizophrenia PRS was associated with the number of times a diagnosis was reported, the number of admissions and type of admission (primary and secondary). These PRS analyses were covaried for the first 5 PCs, array, age at assessment, and sex.

All statistical tests were two-sided. Unless otherwise specified, data analysis was conducted in R.

Results

We identified participants with a self-reported, research or medical record diagnosis of either schizophrenia or schizoaffective disorder depressed-type (SA-D) from across the three samples. This included 1112 participants from NCMH, 632 participants from CardiffCOGS and 1453 participants from the UK Biobank. Unaffected controls included 749 participants from NCMH and 501,088 participants from UK Biobank. Demographic information for each sample is provided in Table 2. The number of participants included in each analysis varies according to the availability of data (e.g. number with both self-report and research/medical record diagnoses for PPV analyses and number of participants with genotype data) and is detailed in the relevant sections.

Table 2 Sample characteristics by cohort.

Positive predictive values

In NCMH, the proportion of participants with a self-reported schizophrenia diagnosis from a health professional who had a research interview diagnosis was used to estimate PPV (n = 273, Table 3). A self-reported current clinical diagnosis of schizophrenia had a PPV of 0.74 for receiving a research interview diagnosis of schizophrenia, 0.85 for receiving a diagnosis of schizophrenia or SA-D and 0.90 for receiving a diagnosis of schizophrenia, SA-D or any other psychotic disorder. A self-reported current clinical diagnosis produced slightly higher PPVs than self-reported lifetime clinical diagnoses or participant opinion diagnoses (Table 3). Secondary PPV analyses indicated that a self-reported psychotic disorder was less predictive of receiving a research diagnosis of schizophrenia, SA-D or other psychotic disorder (PPV range: 0.27–0.65, Supplementary Table 2). All participants who self-reported schizophrenia but did not go on to receive a research interview diagnosis of schizophrenia received other mood or psychotic research diagnoses, except one participant for whom there was insufficient data to make a research interview diagnosis (Supplementary Table 3).

Table 3 Positive predictive values for self-reported diagnoses of schizophrenia and subsequent research interview diagnoses.

In UK Biobank, predictive values were calculated in participants who had both a self-reported diagnosis and a medical record diagnosis. After correction for the point prevalence⁴⁵ of schizophrenia (0.6%), the PPV of having a medical record diagnosis of schizophrenia for those who self-reported a schizophrenia diagnosis was 0.83, the NPV 0.996, the specificity 0.9995 and the sensitivity 0.383 (Table 4). When including a medical record diagnosis of schizophrenia or any other psychotic-related disorder the specificity decreased to 0.77 and sensitivity to 0.21 (Supplementary Table 4).

Table 4 Predictive values for self-reported schizophrenia and medical record diagnosis of schizophrenia.

Phenotypic and genetic differences across diagnosis source

Clinically-ascertained sample

We compared 458 participants in NCMH with a research interview diagnosis of schizophrenia or SA-D with 654 participants whose only source of diagnosis was a self-reported diagnosis of schizophrenia. Participants who had a research interview diagnosis were younger (mean age 43 vs. 47; OR = 0.77; 95% CI = 0.67, 0.88; p = 9.11 × 10⁻⁵) and more likely to have a high school qualification (GCSE) or above (OR = 1.61; 95% CI = 1.13, 2.29; p = 0.008) than self-reporting only participants. Having a degree did not significantly differ across self-report only and research interview groups (OR = 1.15; 95% CI = 0.79, 1.67; p = 0.47). No significant differences were detected in employment (OR = 1.35; 95% CI = 0.87, 2.08; p = 0.18) or sex (OR = 1.07; 95% CI = 0.83, 1.37; p = 0.61) (Fig. 2).

**Fig. 2: Phenotype differences across methods of diagnosis.**

For the genetic analyses, we added participants from the CardiffCOGS cohort (who all had a research diagnosis) to the NCMH sample to increase the sample size. Therefore, the genetic analyses included 803 participants with a research diagnosis, 449 participants who exclusively had a self-report diagnosis and 710 controls. We found no significant difference in schizophrenia PRS between participants who had a research interview diagnosis and those who only self-reported a diagnosis (OR = 0.97; 95% CI = 0.86, 1.09; p = 0.59) (Fig. 3).

**Fig. 3: Schizophrenia polygenic risk scores plotted by method of diagnosis.**

CardiffCOGS samples were genotyped on a different array platform (OmniExpress) to NCMH cases and controls, which were split across GSA and PsychChip (Supplementary Table 5). To test whether there were any batch effects, we removed the CardiffCOGS samples from the case/control analysis. We found a consistent effect for the association between PRS and schizophrenia case/control status (OR = 1.70; 95% CI = 1.44, 2.02; p = 1.91 × 10⁻¹⁰; r² = 0.036; se = 0.011; AUC = 0.64). This finding, alongside Supplementary Fig. 3, suggests that there were no batch effects in the genetic data.

UK Biobank sample

Compared to participants whose basis for a diagnosis was solely self-report (n = 252), participants who had a medical record diagnosis of schizophrenia or SA-D (n = 1201) were less likely to be in paid employment (OR = 0.55; 95% CI = 0.39, 0.79; p = 0.001), and less likely to have a GCSE (high school) or higher qualification (OR = 0.70; 95% CI = 0.51, 0.95; p = 0.02). Furthermore, participants with a medical record diagnosis were less likely to have a degree (OR = 0.59, 95%CI = 0.44, 0.79, p = 0.0005). There were no differences in sex (OR = 0.95; 95% CI = 0.72, 1.26; p = 0.75) or age across the groups (OR = 0.91; 95% CI = 0.80, 1.04; p = 0.18) (Fig. 2).

The genetic analysis included 181 participants with only a self-report diagnosis and 809 participants with a medical record diagnosis. No significant difference in schizophrenia PRS was found between participants who had a medical record diagnosis and those who had only a self-report diagnosis (OR = 1.01; 95% CI = 0.87, 1.19; p = 0.85; Fig. 3).

Liability explained in case/control status

The proportion of variance on the liability scale attributable to schizophrenia PRS in both diagnostic groups in the clinically-ascertained sample and UK Biobank studies fell within the distribution of studies in the latest PGC analysis (Fig. 4, Table 5). In the clinically-ascertained sample, the schizophrenia PRS explained 5.0% of the variability in the self-reported-only group, and 4.7% in the research interview diagnosis group. In the UK Biobank sample, the schizophrenia PRS explained 6.5% of the variability in the self-reported-only group and 6.1% in the medical record diagnosis group (Table 5).

**Fig. 4: Variance explained by schizophrenia PRS by diagnostic method compared to PGC studies.**

Table 5 Variance explained by schizophrenia PRS for each diagnostic method.

Further examination of diagnosis source in the UK Biobank

Schizophrenia PRS increased with the number of times a schizophrenia diagnosis was reported; OR = 1.82 (95%CI = 1.67, 1.99) for 1 endorsement compared to controls and OR = 2.11 (95%CI = 1.92, 2.32) for 2 or more endorsements (Supplementary Fig. 4). Participants who had two or more diagnosis endorsements had a significantly higher schizophrenia PRS than participants who only had one diagnosis endorsement (OR = 1.15; 95%CI = 1.01, 1.31; P = 0.03). The schizophrenia PRS also increased as the number of schizophrenia hospital admissions increased from OR = 1.85 (95%CI = 1.72, 2.00) for 0 admissions (participants had an alternative source of schizophrenia diagnosis), to OR = 1.92 (95%CI = 1.77, 2.08) for 1 admission, and OR = 2.28 (95%CI = 2.01, 2.58) for 2 or more admissions (Supplementary Fig. 5).

Schizophrenia cases with a primary ICD-10 admission code had a higher schizophrenia PRS than those who had schizophrenia as a secondary ICD-10 admission code (OR = 1.28; 95% CI = 1.10, 1.49; p = 0.002). Participants identified with a schizophrenia diagnosis from a secondary code only had, on average, lower schizophrenia PRS than those identified from self-report or a primary hospital admission code (Supplementary Fig. 6). These findings did not appear to be related to the secondary-code-only participants having different associated diagnoses (Supplementary Tables 6 and 7).

Discussion

In this study, we demonstrated that participants who self-reported a clinical diagnosis of schizophrenia were likely to be given a subsequent research interview diagnosis of schizophrenia, SA-D or other psychotic disorder (PPVs between 0.70 and 0.90). Furthermore, we found that participants in UK Biobank who self-reported a clinical schizophrenia diagnosis were likely to have a medical record diagnosis of schizophrenia or another psychotic disorder (PPV = 0.80). Although we found some phenotypic differences, genetic liability to schizophrenia did not significantly differ between participants with a self-reported diagnosis compared to those diagnosed via research interview or medical records. The variance explained by the schizophrenia PRS for all diagnostic methods fell within the distribution of PGC studies. These findings suggest that using a self-reported clinical diagnosis of schizophrenia is a valid approach for identifying participants for large-scale genomic research.

In the clinically-ascertained sample, participants who self-reported schizophrenia were likely to receive a research diagnosis of schizophrenia, SA-D or other psychotic disorder. However, participants who self-reported a lifetime clinical diagnosis of psychosis (without schizophrenia and bipolar) were much less likely to obtain a research interview diagnosis of schizophrenia, SA-D, or other psychotic disorder (PPVs 0.27–0.65). Previous research has shown that a schizophrenia diagnosis has much better agreement between diagnostic methods (PPVs 0.69–1.00) than other diagnoses such as bipolar, depression and other psychotic disorders⁹. Although self-reported diagnoses have generally been shown to have poor predictive accuracy when it comes to obtaining a gold-standard research interview diagnosis^46,47, our results suggest that for schizophrenia specifically, self-reported diagnoses could be used in place of a research interview diagnosis to identify participants in genomic research. No research diagnostic method, including clinical interviews, is totally free of bias, and recruiting participants via a range of sources is likely to lead to more representative studies as a whole.

In UK Biobank, participants who self-reported schizophrenia were likely to have a medical record diagnosis of schizophrenia. However, the low sensitivity indicates that self-report in UK Biobank did not capture everyone who had a medical record diagnosis. This could be for many different reasons including later onset of illness, the stigma associated with reporting a schizophrenia diagnosis, or the non-specific nature of the question in the initial assessment (“Has a doctor ever told you that you have any other serious medical conditions?”). Participants were not specifically asked about mental health and prompts, if any, were only given for physical health conditions. This may have led to under-reporting of schizophrenia and underscores the importance of ensuring self-report questions specifically reference the diagnosis of interest. The high NPV and specificity in UK Biobank show that participants who did not self-report schizophrenia very rarely had a medical record diagnosis of schizophrenia, and that participants without a medical record diagnosis very rarely self-reported one. Although these values are certainly enhanced by the low prevalence, they suggest that self-reported diagnoses are effective at ruling out non-cases.

Currently, using a research interview and note review to obtain a diagnosis is considered the gold standard. However, we find that the requirement to attend and undergo a detailed and time-consuming interview may induce recruitment biases: those participating in such an interview (the majority after having a brief interview first) were younger and more likely to have a high school qualification (GCSEs) than those who had only self-reported. No difference in degree qualification was observed across groups; however, this is likely due to the small proportion of individuals with a degree in the clinical sample (research interview = 18% vs self-report = 16%). Research interviews may exclude participants who are more acutely unwell or cognitively impaired and unable to complete a long assessment. In the UK Biobank, participants with a medical record diagnosis were less likely to have GCSEs and to be employed. Furthermore, participants with a medical record diagnosis were less likely to have a degree (medical record = 25% vs self-report = 36%). This suggests that these participants may have more impaired functioning than those with a self-report only, and that by using medical records alone researchers may miss participants who are functioning well and/or have not been admitted to hospital. An alternative explanation is that those who did not self-report schizophrenia in UK Biobank may not have had sufficient insight to verbally report their diagnosis to the research nurse or may not have been able to complete the online Mental Health Questionnaire due to poorer functioning or education. Thus, the self-report sample in UK Biobank may be missing participants with more impaired functioning. Taken together, our results highlight some phenotypic differences between different methods of identifying a diagnosis of schizophrenia, particularly within UK Biobank where sample representativeness is a known issue. However, for clinically ascertained samples, our results suggest that self-report of a diagnosis made by a health professional may be sufficient given the limited phenotypic differences between self-report and research interview diagnosis, and the potential to include more participants with lower education using self-report.

In contrast to a depression study in UK Biobank which found participants defined by minimal phenotyping (self-report, help-seeking, and symptom-based) had lower SNP-derived heritability than the strictly defined participants (Composite International Diagnostic Interview)²³, we did not find a difference in schizophrenia PRS between the self-report and research interview/medical record diagnosis groups. This may reflect differences in obtaining clinical diagnoses of schizophrenia and depression. Depression can be diagnosed in many settings including primary care, whilst schizophrenia is almost universally diagnosed in secondary care following comprehensive assessment. There may also be hesitancy to disclose a diagnosis of schizophrenia to patients due to the associated stigma. There is evidence that health professionals are more likely to communicate diagnoses of depression than schizophrenia to patients, are more likely to diagnose schizophrenia after recurrent episodes than during the first episode, and show a preference for using alternative terms to schizophrenia such as psychosis⁴⁸,⁴⁹,⁵⁰. As such, a self-reported diagnosis of schizophrenia may have higher validity than a self-reported diagnosis of depression.

Our results highlight the potential, especially for genomic studies, of using this self-report method to identify participants for schizophrenia research. However, we did find differences in schizophrenia PRS within hospital admission diagnoses (in primary and secondary admissions). We also found that the number of diagnosis reports and the number of admissions were associated with higher PRSs in UK Biobank, as has been reported in previous literature⁵¹,⁵². This is consistent with findings from the PGC, who reported schizophrenia PRS to be higher in patients who were recruited from inpatient settings²⁸. These findings could indicate greater severity or improved accuracy of diagnosis, or both. We also found that participants whose primary reason for admission was schizophrenia, and those who only self-reported schizophrenia, had a higher schizophrenia PRS than those with a secondary admission diagnosis. One explanation of the difference in schizophrenia PRS could be that the secondary admission group were participants who were not admitted primarily for psychosis because their symptoms were milder or were well treated. Alternatively, a secondary diagnosis (e.g., schizophrenia recorded when a participant was admitted for a heart attack) may be more prone to error than a diagnosis given as the primary reason for admission to hospital, although this did not appear to be the case when looking specifically at psychiatric comorbidities.

Limitations

It is important to note limitations of the current study. Participants were invited to complete a SCAN-based research interview if they self-reported psychosis or a schizophrenia diagnosis. This study design prevented us from assessing other metrics (negative predictive value, sensitivity, and specificity) in the clinically-ascertained sample. This also meant we were unable to adjust the PPV to the population point prevalence of schizophrenia. As a result, the PPV could have been inflated by the high proportion of schizophrenia participants in our clinically-ascertained sample. Additionally, some participants had their diagnosis confirmed by a clinician if systematically recruited, which could have increased the positive predictive values, and only a subset of the sample were asked their own opinion of their diagnosis (n = 99). Despite these limitations, our clinically-ascertained sample is one of the only psychosis-based samples with both self-report diagnosis data and a gold-standard research diagnosis. In the UK Biobank, 93% of the participants with a medical record diagnosis of schizophrenia had a hospital admission; the predictive values therefore primarily reflect how predictive a self-report was of a hospital admission.
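For reference, the dependence of PPV on prevalence mentioned above follows from Bayes' theorem. The relation below is the standard textbook form rather than a formula taken from this study, and the symbols are notation introduced here: Se is sensitivity, Sp is specificity and π is the population point prevalence.

```latex
\mathrm{PPV}(\pi) = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1 - \mathrm{Sp})\,(1 - \pi)}
```

Because the relation requires both Se and Sp, which could not be estimated under a design that interviewed only participants who self-reported, the adjustment could not be carried out in the clinically-ascertained sample.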

Both the clinically-ascertained sample and UK Biobank were sampled from the UK; therefore, the findings may not apply to other countries. The generalisability of the UK Biobank findings is also limited because this sample is not wholly representative of the UK population⁵³. Furthermore, the primary and secondary admission diagnoses in the UK Biobank may have differed depending on the location of the admission (e.g., psychiatric hospital vs general) or on the clinician’s expertise. Polygenic risk scores were restricted to participants of European genetic ancestry. We were unable to investigate whether the polygenic risk scores differ across diagnostic groups in non-European genetic ancestries due to the limited number of participants from non-European genetic ancestries in our samples. Lastly, the recruitment methods in the NCMH study could have enriched for relatives of those with mood and psychotic disorders, although only 5% (n = 33) of our controls reported having a family history of bipolar disorder or schizophrenia. Any such enrichment would bias our results conservatively, but it could nonetheless weaken the variance explained in our schizophrenia PRS analyses. Our findings apply to large-scale genomic studies, and caution is needed before applying self-report methods of diagnosis to studies such as clinical trials, where diagnostic accuracy for individual participants is paramount and where participants may receive financial reimbursement for their time.

Conclusion

Self-reporting a clinical schizophrenia diagnosis may be a valid method for identifying cases in schizophrenia genomic research, provided that systematic differences between methodologies are transparently noted. Participants who only self-reported a schizophrenia diagnosis showed differences in age, education, and employment but, crucially, they did not differ in schizophrenia genetic liability. These findings provide preliminary evidence for using less stringent methods of ascertaining diagnoses in schizophrenia genomic research, which could reduce the burden on participants and researchers to complete extensive interviews, and thereby potentially improve the representativeness of future samples and increase sample sizes.

Data availability

UK Biobank data can be obtained upon application from https://www.ukbiobank.ac.uk/enable-your-research. The de-duplicated PGC summary statistics for schizophrenia are available on the PGC website https://figshare.com/articles/dataset/scz2022/19426775. All code will be made available upon request. Data from the clinically-ascertained cohorts cannot be made publicly available due to restrictions in ethical approvals.

References

American Psychiatric Association. Diagnostic and statistical manual of mental disorders: DSM-5™, 5th ed. Arlington, VA, US: American Psychiatric Publishing, Inc.; 2013.
World Health Organisation. Schizophrenia. https://www.who.int/news-room/fact-sheets/detail/schizophrenia. Published 2022. Accessed 20th December 2022.
Harvey, P. D. et al. Diagnosis of schizophrenia: consistency across information sources and stability of the condition. Schizophr. Res. 140, 9–14 (2012).
Haapea, M. et al. Non-participation may bias the results of a psychiatric survey. Soc. Psychiatry Psychiatr. Epidemiol. 42, 403–409 (2007).
Reinikainen, J. et al. Participation rates by educational levels have diverged during 25 years in Finnish health examination surveys. Eur. J. Public Health 28, 237–243 (2017).
O’Connell, K. S. et al. Genomics yields biological and phenotypic insights into bipolar disorder. Preprint at https://www.medrxiv.org/content/10.1101/2023.10.07.23296687v3 (2024).
Weiser, M. et al. Sensitivity of ICD-10 diagnosis of psychotic disorders in the Israeli National Hospitalization Registry compared with RDC diagnoses based on SADS-L. Compr. Psychiatry. 46, 38–42 (2005).
Williams, J., Farmer, A., Ackenheil, M., Kaufmann, C. & McGuffin, P. Group ORR. A multicentre inter-rater reliability study using the OPCRIT computerized diagnostic system. Psychol. Med. 26, 775–783 (1996).
Vares, M., Ekholm, A., Sedvall, G. C., Hall, H. & Jönsson, E. G. Characterization of patients with schizophrenia and related psychoses: evaluation of different diagnostic procedures. Psychopathology 39, 286–295 (2006).
Ekholm, B. et al. Evaluation of diagnostic procedures in Swedish patients with schizophrenia and related psychoses. Nord. J. psychiatry 59, 457–464 (2005).
Davis, K. A., Sudlow, C. L. & Hotopf, M. Can mental health diagnoses in administrative data be used for research? A systematic review of the accuracy of routinely collected diagnoses. BMC Psychiatry 16, 1–11 (2016).
Pasman, J. A. et al. GWAS of lifetime cannabis use reveals new risk loci, genetic overlap with psychiatric traits, and a causal effect of schizophrenia liability. Nat. Neurosci. 21, 1161–1170 (2018).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63 (2019).
Dalby, M. et al. Characterizing mood disorders in the AFFECT study: a large, longitudinal, and phenotypically rich genetic cohort in the US. Transl. Psychiatry 12, 121 (2022).
Levey, D. F. et al. Bi-ancestral depression GWAS in the Million Veteran Program and meta-analysis in >1.2 million individuals highlight new therapeutic directions. Nat. Neurosci. 24, 954–963 (2021).
Hyde, C. L. et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat. Genet. 48, 1031–1036 (2016).
Davis, K. A. et al. Indicators of mental disorders in UK Biobank—A comparison of approaches. Int. J. Methods Psychiatr. Res. 28, e1796 (2019).
Mallard, T. T. et al. Multivariate GWAS of psychiatric disorders and their cardinal symptoms reveal two dimensions of cross-cutting genetic liabilities. Cell Genom. 2, 100140 (2022).
Wainberg, M., Jacobs, G. R., di Forti, M. & Tripathy, S. J. Cannabis, schizophrenia genetic risk, and psychotic experiences: a cross-sectional study of 109,308 participants from the UK Biobank. Transl. Psychiatry 11, 211 (2021).
Jermy, B. S., Hagenaars, S., Coleman, J. R., Vassos, E. & Lewis, C. M. Risk factor profiles for depression following childbirth or a chronic disease diagnosis: case–control study. BJPsych Open. 8, e182 (2022).
Levey, D. F. et al. Reproducible genetic risk loci for anxiety: results from ~200,000 participants in the Million Veteran Program. Am. J. Psychiatry 177, 223–232 (2020).
Cai, N. et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat. Genet. 52, 437 (2020).
Waszczuk, M. A. et al. Dimensional and transdiagnostic phenotypes in psychiatric genome-wide association studies. Mol. Psychiatry 28, 4943–4953 (2023).
Liu, H. et al. Mendelian randomization highlights significant difference and genetic heterogeneity in clinically diagnosed Alzheimer’s disease GWAS and self-report proxy phenotype GWAX. Alzheimer’s. Res. Ther. 14, 17 (2022).
Schoeler, T., Pingault, J.-B. & Kutalik, Z. Self-report inaccuracy in the UK Biobank: Impact on inference and interplay with selective participation. Preprint at https://www.medrxiv.org/content/10.1101/2023.10.06.23296652v1 (2023).
Zheutlin, A. B. et al. Penetrance and Pleiotropy of polygenic risk scores for schizophrenia in 106,160 patients across four health care systems. Am. J. Psychiatry 176, 846–855 (2019).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Underwood, J. F. G. et al. Autism spectrum disorder diagnosis in adults: phenotype and genotype findings from a clinically derived cohort. Br. J. Psychiatry 215, 647–653 (2019).
Wing, J. K. et al. SCAN: Schedules for clinical assessment in neuropsychiatry. Arch. Gen. Psychiatry 47, 589–593 (1990).
Legge, S. E. et al. Clinical indicators of treatment-resistant psychosis. Br. J. Psychiatry 216, 259–266 (2020).
Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Dennison, C. A. et al. Risk factors, clinical features, and polygenic risk scores in schizophrenia and schizoaffective disorder depressive-type. Schizophr. Bull. 47, 1375–1384 (2021).
Escott-Price, V. et al. Genetic liability to schizophrenia is negatively associated with educational attainment in UK Biobank. Mol. Psychiatry 25, 703–705 (2020).
McCarthy, S., Das, S. & Kretzschmar, W. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Lynham, A. J. et al. DRAGON-Data: a platform and protocol for integrating genomic and phenotypic data across large psychiatric cohorts. BJPsych Open. 9, e32 (2023).
The 1000 Genomes Project Consortium, Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, s13742-015-0047-8 (2015).
Leonenko, G. et al. Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat. Commun. 12, 1–10 (2021).
Conomos, M. P. et al. Genetic diversity and association studies in US Hispanic/Latino populations: applications in the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 98, 165–184 (2016).
Bycroft, C. et al. Genome-wide genetic data on ~ 500,000 UK Biobank participants. Preprint at https://www.biorxiv.org/content/10.1101/166298v1 (2017).
Legge, S. E. et al. Association of genetic liability to psychotic experiences with neuropsychiatric disorders and traits. JAMA Psychiatry 76, 1256–1265 (2019).
Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8, giz082 (2019).
Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. 51, 793–803 (2019).
Davies, M. R. et al. Comparison of symptom-based versus self-reported diagnostic measures of anxiety and depression disorders in the GLAD and COPING cohorts. J. Anxiety Disord. 85, 102491 (2022).
Vieira, L. S. et al. Self-reporting of psychiatric illness in an online patient registry is a good indicator of the existence of psychiatric illness. J. Psychiatr. Res. 151, 34–41 (2022).
Mitchell, A. J. Reluctance to disclose difficult diagnoses: a narrative review comparing communication by psychiatrists and oncologists. Support. Care Cancer 15, 819–828 (2007).
Paccaloni, M., Moretti, F. & Zimmermann, C. Giving information and involving in treatment: what do psychiatrists think? A review. Epidemiol. Psychiatr. Sci. 14, 198–216 (2005).
Clafferty, R. A., McCabe, E. & Brown, K. W. Conspiracy of silence? Telling patients with schizophrenia their diagnosis. Psychiatr. Bull. 25, 336–339 (2001).
Meier, S. M. et al. High loading of polygenic risk in cases with chronic schizophrenia. Mol. Psychiatry 21, 969–974 (2016).
Glanville, K. P. et al. Multiple measures of depression to enhance validity of major depressive disorder in the UK Biobank. BJPsych Open. 7, e44 (2021).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).

Acknowledgements

JTRW, MCO’D and MJO received a research grant to Cardiff University from Takeda Pharmaceuticals that funded this work and GW’s research position. This work was supported by the following grants: Medical Research Council Program (MR/P005748/1), DATAMIND (MR/W014386/1) and a grant from NIH (Award U01MH109514). We thank the participants, clinicians, lab staff and field team for their help with the NCMH, CardiffCOGS and UK Biobank studies.

Author information

Authors and Affiliations

Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
Grace E. Woolway, Sophie E. Legge, Amy J. Lynham, Sophie E. Smart, Leon Hubbard, Ellie R. Daniel, Antonio F. Pardiñas, Valentina Escott-Price, Michael C. O’Donovan, Michael J. Owen, Ian R. Jones & James T. R. Walters

Contributions

G.W. conducted statistical analyses with supervision from S.E.L., A.J.L., I.R.J., and J.T.R.W. S.S. and L.H. processed the clinically-ascertained genomic data with supervision from A.F.P. and J.T.R.W. E.R.D. assisted in the processing of the phenotypic data from the clinically-ascertained sample. J.T.R.W., V.E.P., M.C.O.D. and M.J.O. critically reviewed analyses and results. G.W. drafted the manuscript with input from S.E.L. and A.J.L. All authors read, contributed to revisions, and approved the final draft of the manuscript.

Corresponding authors

Correspondence to Sophie E. Legge or James T. R. Walters.

Ethics declarations

Competing interests

JTRW, MCO’D and MJO received a research grant to Cardiff University from Takeda Pharmaceuticals that funded this work and GW’s research position. Takeda Pharmaceuticals have not had any input into the study design, analysis, or interpretation of results. JTRW, MCO’D, MJO, IRJ and AFP have received research funding from Akrivia Health for work unrelated to this study. The remaining authors have nothing to disclose.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Woolway, G.E., Legge, S.E., Lynham, A.J. et al. Assessing the validity of a self-reported clinical diagnosis of schizophrenia. Schizophr 10, 99 (2024). https://doi.org/10.1038/s41537-024-00526-5

Received: 23 May 2024
Accepted: 15 October 2024
Published: 30 October 2024
DOI: https://doi.org/10.1038/s41537-024-00526-5
