Introduction

This paper describes the work of the Accelerating Medicines Partnership® Schizophrenia (AMP®SCZ) Clinical Ascertainment and Outcome Measures Team, led by J. Addington and A. Yung. The team aimed to establish a harmonized clinical assessment protocol across the ProNET and PRESCIENT research networks and to define ascertainment criteria and primary and secondary endpoints1,2. Once the assessment protocol was finalized, the team set three additional goals for the AMP SCZ project: (1) to implement and monitor clinical training, ascertainment of participants, and clinical assessments; (2) to provide expert clinical input to the Psychosis Risk Evaluation, Data Integration and Computational Technologies: Data Processing, Analysis, and Coordination Center (PREDICT-DPACC) on data collection, quality control, and preparation of the clinical measures for analysis; and (3) to provide ongoing support for the collection, analysis, and reporting of clinical data. This paper presents (1) the protocol's clinical endpoints and outcomes; (2) the rationale for the selection of the clinical measures; (3) the extensive training of clinical staff; (4) the preparation of clinical measures for a multisite study that includes several sites where English is not the native language; and (5) the assessment of measure stability over time in the AMP SCZ observational study, with initial data comparing clinical ratings at baseline and at the 2-month follow-up.

Comprehensive details about the complete AMP SCZ project can be found at www.ampscz.org. Under the “For Scientists” section (direct link: www.ampscz.org/scientists/), the standard operating procedures developed by this team are available under the title “Clinical Data Acquisition”.

Screening, endpoints and outcomes

Screening

The initial screening assessment determines whether the individual meets criteria for the study. Two groups are included: individuals meeting Clinical High Risk (CHR) criteria (also known as Ultra High Risk criteria), which are intended to prospectively capture symptoms of the psychosis prodrome3, and community controls (CC). Inclusion and exclusion criteria are presented in Table 1.

Table 1 Inclusion, exclusion, and conversion criteria.

At screening, CHR status is determined by:

  (i) Attenuated positive symptoms and/or brief limited positive symptoms: these are assessed using the Positive Symptoms and Diagnostic Criteria for the CAARMS Harmonized with the SIPS (PSYCHS). This new instrument was developed to ensure that the study includes CHR individuals as defined by either of the two most commonly used instruments, the CAARMS (Comprehensive Assessment of At-Risk Mental States)4 and the SIPS (Structured Interview for Psychosis-Risk Syndromes)5. The development of the PSYCHS is described in detail elsewhere6.

  (ii) Trait vulnerability or genetic risk, combined with functional deterioration: these are assessed using the schizotypal personality disorder section of the Structured Clinical Interview for DSM-5 (SCID), the Social and Occupational Functioning Assessment Scale (SOFAS)7, and the Family Interview for Genetic Studies (FIGS)8.

Exclusion criteria are presented in Table 1. At screening, the eligibility of community control participants is determined by using the above instruments to rule out CHR status, and by ruling out a history of brain injury and/or current use of psychotropic medications.

Primary clinical endpoint

The primary endpoint is transition to threshold psychosis, as assessed by the PSYCHS, by 12 and 24 months. Transition to psychosis is defined in Table 1.

Secondary clinical endpoints

Traditionally, endpoints in CHR research have focused on transition to psychosis. However, there is growing interest not only in transition but also in the remission and persistence of attenuated psychotic symptoms (APS)9,10. Thus, the first two secondary endpoints, defined by the PSYCHS, are (i) sustained remission of CHR status for ≥6 months that continues until the individual's last available assessment, and (ii) persistent CHR status, defined as meeting neither the sustained remission nor the transition criteria by study end.
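The two PSYCHS-defined secondary endpoints can be expressed as a simple decision rule over a participant's sequence of follow-up assessments. The Python sketch below is illustrative only and is not the AMP SCZ scoring code. It assumes that each visit has already been reduced to two hypothetical boolean flags, `meets_chr` (PSYCHS CHR criteria met) and `transitioned` (Table 1 psychosis criteria met), and it assumes that remission duration is counted from the first visit of the final uninterrupted run of non-CHR visits to the last available visit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    month: int          # months since baseline
    meets_chr: bool     # hypothetical flag: PSYCHS CHR criteria met at this visit
    transitioned: bool  # hypothetical flag: Table 1 transition criteria met


def classify_outcome(visits: List[Visit], min_remission_months: int = 6) -> str:
    """Return 'transition', 'sustained_remission', or 'persistent_chr' (illustrative rule)."""
    visits = sorted(visits, key=lambda v: v.month)
    if any(v.transitioned for v in visits):
        return "transition"
    # Walk backwards from the last visit to find where the final
    # uninterrupted run of non-CHR visits begins (None if the last visit is CHR).
    remission_start = None
    for v in reversed(visits):
        if v.meets_chr:
            break
        remission_start = v.month
    if remission_start is not None and visits[-1].month - remission_start >= min_remission_months:
        return "sustained_remission"
    # Neither transition nor sustained remission by study end.
    return "persistent_chr"
```

For example, a participant who meets CHR criteria at baseline but not at months 6 and 12 is classified as `"sustained_remission"`, whereas a participant whose non-CHR status spans only 4 months at study end remains `"persistent_chr"`.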

In addition, a considerable proportion of CHR individuals do not develop psychosis but may have an onset of a mood or anxiety disorder, or continue to experience mood and anxiety symptoms, cognitive difficulties, and/or significant functional difficulties. That is, even though CHR individuals may not develop threshold psychosis, they may nonetheless have persistent clinical and/or functional difficulties11,12. A recent special issue of Schizophrenia Research emphasized that “embracing heterogeneity creates new opportunities for understanding and treating those at CHR for psychosis”13. Indeed, it is important to deconstruct the heterogeneity of the CHR syndrome and to broaden the endpoints, both to capture the full range of clinical outcomes in those at CHR for psychosis and to improve treatment outcomes. As recommended by Woods et al.14, the AMP SCZ project has developed a range of measures that may help to dissect this heterogeneity and aid progress toward new treatments. These secondary outcomes include APS severity, APS remission, functional outcome, depression, persistent functional impairment, negative symptoms, cognitive deficits, and incident and persistent non-psychotic disorders.

The clinical measures: rationale and description

As described in the AMP SCZ introductory paper1, several factors were considered in selecting the clinical measures and their time points. First, there needed to be existing evidence of their relevance to CHR outcomes. Second, the measures needed to provide data suitable for both static and dynamic predictive modeling15. Third, the timing of follow-up assessments needed to be informed by the design of future clinical trials. Finally, the burden on both participants and clinical raters had to be considered. It should also be noted that although most of the measures have predictive potential for the primary and secondary endpoints, they can also serve as clinical endpoints themselves.

The PSYCHS

The key clinical measure in AMP SCZ is the newly developed PSYCHS, which is used to operationalize CHR criteria, assess the type and severity of APS, and determine the onset of a first episode of psychosis. A main consideration in harmonizing the clinical measures was that ProNET proposed to use the Structured Interview for Psychosis-risk Syndromes (SIPS)5, whereas PRESCIENT proposed to use the Comprehensive Assessment of At-Risk Mental States (CAARMS)4,16. The PSYCHS is a semi-structured interview created to generate CHR criteria and severity scores for both the CAARMS and the SIPS. The development of the PSYCHS is described in two papers6,17. The PSYCHS assesses 15 APS and generates the relevant CAARMS and SIPS diagnoses for lifetime, past year, and past month, as well as APS severity ratings for both the CAARMS and the SIPS. Furthermore, the PSYCHS allows data from the AMP SCZ project to be compared with legacy data collected using either the CAARMS or the SIPS. A recent issue of the journal “Early Intervention in Psychiatry” is dedicated to the development of this measure6,17.

The fifteen symptoms include unusual thoughts and experiences, suspiciousness/paranoia, unusual somatic ideas, ideas of guilt, jealous ideas, unusual religious ideas, erotomanic ideas, grandiosity, six perceptual abnormality symptoms (auditory, visual, olfactory, gustatory, tactile, and somatic), and disorganized communication.

Psychosocial functioning

Impaired functioning is a major characteristic of psychotic disorders and is also evident in CHR individuals. The primary measures of functioning are the Global Functioning (GF): Social and Role scales, developed by Cornblatt and colleagues to measure changes in functioning over time in CHR participants18,19. Each scale generates a single global score, which is well suited to measuring developmental trajectories of functioning and to defining ongoing functional disability.

The rater-generated GF scores were designed to provide brief, easy-to-use clinician ratings that detect change over time and avoid confounding with psychiatric symptoms18. The GF: Social scale rates peer relationships, peer conflict, and family involvement. The GF: Role scale assesses performance in, and the amount of support needed for, school and work18. Each scale ranges from 1 (extreme dysfunction) to 10 (superior functioning). Each GF scale generates three scores: (i) current level, (ii) highest level in the past year, and (iii) lowest level of functioning in the past year prior to the assessment18. For most analyses, the current level of functioning is used. However, a decline in social functioning in the year before study entry (defined as a drop of 1 point or more on the GF: Social scale from the highest level in the past year to the current (baseline) level) was found to be a significant predictor of psychosis in the NAPLS Psychosis Risk Calculator20,21.
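To show how the three GF scores and the decline criterion relate, the sketch below stores one participant's GF: Social ratings and flags a past-year decline of 1 point or more. The class and function names are hypothetical; only the 1–10 range, the three scores, and the 1-point threshold come from the text, and the sketch does not reproduce how the NAPLS calculator weights this variable.

```python
from dataclasses import dataclass


@dataclass
class GFScores:
    current: int        # (i) current level, 1-10
    highest_year: int   # (ii) highest level in the past year, 1-10
    lowest_year: int    # (iii) lowest level in the past year, 1-10

    def __post_init__(self) -> None:
        # Reject ratings outside the 1 (extreme dysfunction) to 10 (superior) range.
        for name in ("current", "highest_year", "lowest_year"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name}={value} is outside the 1-10 GF range")


def social_decline(gf_social: GFScores, min_drop: int = 1) -> bool:
    """True if GF: Social fell by >= min_drop points from the past-year high to baseline."""
    return gf_social.highest_year - gf_social.current >= min_drop
```

A participant rated 8 at their best in the past year and 7 at baseline meets the decline definition; one rated 8 at both points does not.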

Negative symptoms

Negative symptoms are predictive of conversion to psychosis, are associated with functional decline and cognitive impairment22, and are therefore a critical dimension to assess in those at CHR for psychosis23. Unfortunately, first-generation CHR rating instruments used to assess negative symptoms (e.g., SIPS, CAARMS, SPI) are suboptimal for several reasons. For example, they are based on outdated conceptualizations of negative symptoms, conflate constructs, fail to incorporate relevant contemporary developments in basic affective science, do not isolate primary negative symptoms, are highly influenced by secondary sources (e.g., depression, anxiety), have limited coverage of motivational and social problems relevant to adolescents, and are confounded by cognitive impairment23.

To address these issues and to develop a negative symptom instrument designed specifically for CHR youth, the Negative Symptom Inventory-Psychosis Risk (NSI-PR)24 was created in response to the NIMH MATRICS initiative25. Following guidelines established at the 2005 NIMH Consensus Conference on Negative Symptoms, the NSI-PR was developed using an iterative, data-driven approach with two phases. Phase 1 consisted of item creation and psychometric evaluation of an initial 16-item beta scale. A multisite psychometric study (the Georgia and Illinois Negative Symptom Study: GAINS) was conducted in 218 CHR participants to evaluate the psychometric properties of the beta scale, so that it could later be revised into a final, briefer scale to be validated in Phase 2. Results of the Phase 1 study indicated that the 16 items were best fit by five-factor (anhedonia, avolition, asociality, alogia, blunted affect) and hierarchical structures, whereas one- and two-factor structures offered a poor fit. The five domains demonstrated adequate internal consistency, good temporal stability, and high inter-rater reliability. Convergent validity was supported by moderate correlations with the SIPS negative subscale and the GF: Social and GF: Role scales. Discriminant validity was supported by low correlations with SIPS positive, disorganized, and general symptoms. Item response theory and other psychometric criteria (e.g., factor loadings, item-level correlations, item-total correlations, and skew) identified items for removal and revision. These results were used to create a revised, briefer, final 11-item scale. Phase 2 of the GAINS study validated the final 11-item scale in a second sample of 222 participants, which supported the same five-factor and hierarchical structures identified in the first study. Inter-rater reliability, internal consistency, and temporal stability were again good for the final 11-item measure. Convergent validity was established via associations with the SIPS and functioning measures, while discriminant validity was supported by low correlations with positive, disorganized, and general symptoms. Thus, the psychometric properties of the final 11-item NSI-PR were excellent, and this version is used in the AMP SCZ project. The scores used include average scores for the five domains (avolition: items 1–2; asociality: items 3–5; anhedonia: items 6–7; blunted affect: items 8–10; alogia: item 11), as well as scores for the two broader dimensions of Motivation and Pleasure (MAP: items 1–7) and Diminished Expressivity (EXP: items 8–11).
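The NSI-PR domain and dimension scores amount to averaging item ratings over fixed item sets. The sketch below encodes the item mapping given in the text; the function name is hypothetical, item ratings are passed as a dictionary keyed by item number, and computing MAP and EXP as the mean of their items (rather than, for example, the mean of their domain scores) is an assumption.

```python
from statistics import mean
from typing import Dict, List

# Item-to-domain mapping for the final 11-item NSI-PR, as listed in the text.
DOMAINS: Dict[str, List[int]] = {
    "avolition": [1, 2],
    "asociality": [3, 4, 5],
    "anhedonia": [6, 7],
    "blunted_affect": [8, 9, 10],
    "alogia": [11],
}
# Broader dimensions; ASSUMPTION: scored as the mean of their items.
DIMENSIONS: Dict[str, List[int]] = {
    "MAP": list(range(1, 8)),   # Motivation and Pleasure: items 1-7
    "EXP": list(range(8, 12)),  # Diminished Expressivity: items 8-11
}


def score_nsi_pr(items: Dict[int, float]) -> Dict[str, float]:
    """Average item ratings into domain and dimension scores (illustrative sketch)."""
    missing = sorted(set(range(1, 12)) - set(items))
    if missing:
        raise ValueError(f"missing NSI-PR items: {missing}")
    scores = {name: mean(items[i] for i in idx) for name, idx in DOMAINS.items()}
    scores.update({name: mean(items[i] for i in idx) for name, idx in DIMENSIONS.items()})
    return scores
```

Because avolition and anhedonia each have two items while asociality has three, the item-mean MAP score weights asociality slightly more than a mean of the three domain scores would; this is the design choice the assumption above refers to.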

Diagnoses

The Structured Clinical Interview for DSM-526 (Modules A through E) is used in the AMP SCZ project to assess diagnostic criteria for psychotic, mood, and substance use disorders. Administering these modules allows for differential diagnosis of DSM-5 schizophrenia spectrum and associated disorders, which are primary outcome variables of AMP SCZ.

Depression

In addition to APS, ~40–60% of CHR individuals report current or previous depressive episodes27,28. This observation is noteworthy because depression is often the primary initial complaint29. For some CHR individuals, more serious or recurrent depression has been observed to develop over two years27, making depression an important outcome. In addition to obtaining DSM-5 diagnoses, we use the Calgary Depression Scale for Schizophrenia (CDSS), a structured interview, to capture depression severity on a continuous scale that is independent of negative symptoms30. The CDSS is one of the most widely used depression measures in clinical trials for psychosis and has been validated for CHR31.

General psychopathology

The Brief Psychiatric Rating Scale (BPRS)32 is a rating scale in which 24 items relating to different types of psychopathology are rated on a continuum from ‘not present’ to ‘extremely severe.’ The measure provides a score for general psychopathology, as well as subscale scores for positive symptoms, negative symptoms, affective disturbance, disorganization, and activation. The BPRS has a well-established history of acceptable psychometric properties33,34. The BPRS was chosen for AMP SCZ for several reasons. First, there is increasing recognition of the importance of transdiagnostic psychopathology in psychiatric research35, including CHR research36, and empirical work supports the utility and validity of the BPRS as a transdiagnostic scale37, which has not yet been established for other commonly used instruments in this field, such as the Positive and Negative Syndrome Scale38. Second, given its wide use in psychiatric research, there are many other data sets with which AMP SCZ BPRS ratings can be compared, which will help to characterize the AMP SCZ sample in relation to other psychiatric cohorts, including other CHR cohorts. Finally, the BPRS has been included in a number of previous prediction models of CHR clinical outcomes39 and thus may contribute to prediction models developed in AMP SCZ. The BPRS is administered monthly in the AMP SCZ project because it provides time-series data that can be used to develop dynamic prediction models of the final clinical endpoints. In previous work, a joint model using repeated monthly BPRS assessments yielded better prediction statistics for conversion to psychosis than static baseline assessments in the same CHR cohort40,41.

Suicide and suicidal ideation

We included the Columbia Suicide Severity Rating Scale (CSSRS)42 to provide a standard method for quantifying the severity of suicidal ideation and behavior. The CSSRS has demonstrated convergent and divergent validity with other scales, high sensitivity and specificity for behavioral classification, sensitivity to change, and internal consistency when administered in adolescent and adult cohorts43. It is easy to administer, is available in more than one hundred languages, imposes minimal burden, and is part of the PhenX Toolkit. Furthermore, the CSSRS uniquely captures a range of suicidal ideation, from the “wish to be dead” through “non-specific” suicidal thoughts to active suicidal ideation with intent and/or plan. This makes it useful for capturing the spectrum of suicidal ideation and behavior that is elevated among individuals at CHR for psychosis relative to the general population, ranging from the less severe ideation of “wish to be dead”43 to self-harm and suicide attempts44.

Patient-reported outcomes (PROMs)

In our clinical assessments of psychopathology and functioning, we may miss the important perspectives of CHR individuals themselves. Petros et al.45 point out that the assessment of CHR individuals tends to focus on psychopathology, with an emphasis on transition to psychosis, and that few studies have investigated participant perspectives, particularly potential protective or resilience factors that might contribute to such outcomes. It is therefore important to consider the perspectives of CHR individuals by using Patient-Reported Outcome Measures (PROMs). Indeed, a recent review46 observed that PROMs were rarely a primary focus in CHR studies and that very few PROMs have been validated for youth or CHR individuals. There is thus a need for a core set of PROMs that can be used across different outcome studies, especially since data on patients’ experiences can provide supporting information when a condition is not well defined, and can be useful in conjunction with biomarkers of symptoms or health improvement47.

Several brief PROMs are used. First, participants are asked at each assessment to rate their sense of the severity of their symptoms using a Patient Global Impression of Severity (PGI-S)48 scale developed for this project.

The Overall Anxiety Severity and Impairment Scale (OASIS)49, which is administered in AMP SCZ, is a valid and reliable measure of anxiety severity and related impairment. Anxiety has been identified as a key feature in CHR cohorts and is associated with severe APS50. This measure was chosen because it is brief (5 items) and can be used across multiple anxiety disorders as well as in those experiencing subthreshold anxiety symptoms. It also includes a cut-off score indicating clinically significant anxiety49.

Emerging studies also report significant sleep problems among CHR youth. Thus, the 8-item Patient-Reported Outcomes Measurement Information System-Sleep Disturbance (PROMIS-SD) self-report measure is administered to assess participants’ perceptions of their sleep quality, depth, and restoration within the past seven days51,52. This measure covers perceived difficulties falling asleep and staying asleep, as well as satisfaction with sleep. Although there is not yet evidence that disturbed sleep at baseline is associated with later transition to psychosis, it is associated with increased APS over time and thus may be a promising intervention target for optimizing outcomes53,54.

The Perceived Stress Scale (PSS)55 is used to capture the degree to which situations in participants’ lives are appraised as stressful (i.e., the subjective experience of stress). Stress is well established as playing a role in the emergence and recurrence of psychotic symptoms56,57 and has therefore been included in the current battery. The PSS is also one of the more widely used psychological instruments for measuring the perception of stress.

Although there is limited evidence that increased substance use is associated with transition to psychosis in CHR cohorts58, there may be CHR subgroups for whom substance use is related to transition or to clinical outcomes other than transition to psychosis. Accordingly, the Alcohol, Smoking and Substance Involvement Screening Test (ASSIST)59 was included in the AMP SCZ battery. The ASSIST is a widely used measure endorsed by the World Health Organization for the assessment of drug and alcohol use.

Perceived discrimination is observed to occur more frequently in CHR individuals60 and has been associated with later transition to psychosis61. We therefore included a brief version of the Perceived Discrimination Scale62 to determine whether participants have experienced discrimination in their lifetime.

The Psychosis Polyrisk Score (PPS) is used to capture exposure to a range of environmental risk and protective factors (e.g., family history of mental illness, family environment, neighborhood, psychological, physical, or sexual trauma or abuse, nutrition, cannabis, and tobacco use) that may be associated with psychosis. Historically, the identification of psychosis risk and the prediction of outcomes have been based on subthreshold symptomatology63. The developers of this scale propose that symptoms are not necessarily the underlying cause of psychosis, and suggest instead that they are epiphenomena of underlying gene-by-environment interactions that directly modulate psychosis risk64,65,66. Of note, the accumulation of these interactions through exposure to risk factors results in increased psychosis risk67,68,69,70,71. Exposure to these risk factors is common among CHR individuals at recruitment72,73. However, these risk factors have typically been measured individually and considered independently of one another.

Thus, the PPS, a self-report tool, was developed70,74 to measure multivariate exposure to environmental risk and protective factors for psychosis onset. The PPS shows high variance in simulated general population data70,74 and significantly higher scores in individuals referred for a CHR assessment than in healthy controls74. Although the PPS appears to have the potential to enhance the identification of at-risk individuals and the prediction of their outcomes, it requires further validation in larger samples, such as the AMP SCZ observational study. With the data collected in AMP SCZ, we will be able to validate the PPS prospectively as a prediction tool and refine it to allow for more complex relationships between risk factors and outcome. The PPS will not only allow the creation of a global score of environmental risk but will also make it possible to investigate more specific links between particular environmental insults and outcomes, as well as the mediating mechanisms.

Other measures

Other measures in AMP SCZ include a comprehensive assessment of demographics at baseline. The sample ranges from 12 to 30 years of age, which covers a range of developmental stages. In participants aged 18 years and younger, these stages are assessed using the Pubertal Development Scale75, which determines the development of secondary sexual characteristics. To complement the assessment of functional outcome, the well-established Premorbid Adjustment Scale (PAS)76 is administered at the month-1 follow-up. In addition, health conditions and medication use are monitored at regular intervals. See Table 2 for the Schedule of Clinical Assessments.

Table 2 AMP SCZ Schedule of Clinical Assessments.

Validity of measures for adolescents and young adults

There are often concerns about using measures designed for adults with individuals under 18. Several of our scales were designed specifically for the CHR age range of 12–30 (PSYCHS, GF: Social and Role, NSI-PR) or to include 12–18-year-olds (CSSRS, PROMIS measures), and some have been validated in a CHR population (CDSS). Scales that have been used with adolescents and CHR populations but have not been validated for youth include the ASSIST, the Perceived Discrimination Scale, the PSS, the BPRS, the OASIS, and the SCID. Given the wealth of data from the AMP SCZ study, we plan to examine the validity of as many of these measures as possible for CHR youth.

Training on clinical measures

For the key clinical measures (PSYCHS, GF: Social and Role, NSI-PR, BPRS, CDSS, and SCID), all raters undergo intensive training and must meet predetermined reliability standards to be certified. All initial training is conducted live on Zoom and offered at different times to accommodate the range of time zones involved in the AMP SCZ project. Training sessions are also video-recorded for future use by new raters. Raters are sent materials to review before the training sessions. Following the training, raters practice the measures, can ask the trainers further questions, and finally complete the training video cases required for certification. Training outcomes are presented in Supplementary Material 1: Training on Clinical Measures. Intraclass correlations (ICCs), based on data collected to date, demonstrate the inter-rater reliability between trainees and the gold standard for these measures (Table 3). We selected a single-measures ICC from a two-way mixed-effects model with absolute agreement, given our design in which multiple raters rate multiple cases. All ICCs are in the excellent range.

Table 3 Intraclass correlations between interviewers and gold standards on key clinical measures on training case videos.
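For readers who want to reproduce this type of reliability statistic, the single-measures, absolute-agreement ICC from a two-way model (ICC(A,1) in McGraw and Wong's notation) can be computed from a two-way ANOVA decomposition of a cases × raters matrix, for example training videos rated by a trainee and by the gold standard. The pure-Python sketch below is not the software used to produce Table 3; it assumes a complete matrix with no missing ratings. The two-way mixed and two-way random models share the same formula for this ICC type and differ only in interpretation.

```python
from typing import Sequence


def icc_a1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(A,1): two-way model, absolute agreement, single measures.

    ratings[i][j] is the score given to case i by rater j (n cases x k raters).
    ICC(A,1) = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]                      # per case
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]  # per rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
```

Absolute agreement penalizes systematic bias: a trainee who scores every case exactly one point above the gold standard, as in `icc_a1([[1, 2], [2, 3], [3, 4]])`, obtains ICC(A,1) = 2/3, whereas a consistency-type ICC would equal 1 for the same ratings.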

For the self-report measures, raters are trained by completing the scales themselves so that they become familiar with the items, and a 1-h live Zoom meeting is then held for raters to ask any questions arising from their practice sessions. Similarly, a 2-h live Zoom meeting is held to demonstrate how to complete the medication and treatment logs. These meetings are repeated a second time and recorded. Finally, for the CSSRS, raters are required to complete the online training and submit their training certificates. Follow-up training includes reviews of the key clinical measures and annual repeat assessments of PSYCHS reliability.

Translation of measures for a multi-national study

In addition to the English-speaking sites, the AMP SCZ project includes non-English-speaking sites in Germany, Switzerland, Chile, Korea, China, Italy, Spain, Denmark, and Quebec, Canada. The consortium therefore required several measures to be translated. Our first step was to identify translations that were already available and, where possible, validated. Next, we checked whether any of the non-English sites already had translations of the clinical measures we had chosen. A significant number of the key scales had already been translated and back-translated by a professional translation company. Finally, the remaining scales, which were neither already available nor professionally translated, were translated by the site that required them and back-translated by another group at the same site, and the back-translation was then independently reviewed by the Calgary site. Details of the translations are presented in Supplementary Table 1.

Consensus diagnosis for study entry and psychosis transition

After the screening assessment with the PSYCHS, vignettes are prepared for CHR participants who meet criteria in order to obtain a consensus diagnosis. Raters complete the vignette template with a written description of each symptom and ratings for the four PSYCHS measurement concepts (description, tenacity/source, distress, and interference) for each of the 15 symptoms6. The vignette is written so that another rater can review the information under each symptom category and provide a reliable rating. Once approved at the site level, the vignette is presented on a conference call for a consensus decision on the symptom ratings and the diagnosis. Because of the number of AMP SCZ sites and their international distribution, sites are divided among five consensus calls that meet weekly and are attended by all raters from the participating sites. These weekly calls are led by J. Addington and J. Schiffman; M. Calkins; M. Kerr and B. Nelson; B. Walsh; and A. Yung. Submitted vignettes are individually reviewed and discussed, and consensus is reached on each symptom rating, the diagnosis, and ultimate admission into the study. When a participant is thought to have transitioned to psychosis, a vignette detailing the transition is prepared and the same consensus procedure is followed. Monthly calls are held with all consensus leaders to review any concerns and to develop an ongoing FAQ document. The FAQ document compiles the sites' queries on specific rating issues and aids training and ongoing rating with the PSYCHS.

Concept stability

As the initial participants (n = 160) were recruited, we examined concept stability. The AMP SCZ study was approved by the Institutional Review Boards of all participating sites. All participants provided written informed consent, including parental consent for all minors. For this analysis, we focused only on key clinical symptom measures that, at the time of submission, had been administered at baseline and at the 2-month follow-up. To examine change over time, we used paired t-tests and also report the correlation between scores at the two time points. The measures examined included total scores from the PSYCHS, BPRS, CDSS, NSI-PR, OASIS, PGI-S, GF: Social and Role, and SOFAS. Because the clinical symptoms of CHR participants may change in either direction over this interval, we expected more variation in these stability estimates than in test-retest estimates, with less variation expected for negative symptoms and functioning, which tend to be more stable. These results are presented in Table 4.

Table 4 Paired sample analyses between baseline and 2-month follow-up on total scores.

For all measures, correlations between baseline and 2-month follow-up were highly significant. On average, there were no significant within-subject differences between baseline and 2-month ratings for GF: Social and Role, negative symptoms, SOFAS, or patient global impression of severity. However, there were significant changes in APS, anxiety, depression, and general psychopathology, all in the direction of improvement. The average decrease was 4.25 points for the PSYCHS (confidence limits 3.07–5.43), 2.56 points for the BPRS (confidence limits 1.24–3.88), 0.89 points for the CDSS (confidence limits 0.22–1.56), and 1.02 points for the OASIS (confidence limits 0.37–1.67). Although these changes were statistically significant in this large cohort, they are not necessarily clinically significant. As more data become available, similar stability estimates could inform expected placebo effect sizes for future clinical trials.
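The within-subject comparison reported above can be sketched as follows. The function is a hypothetical helper, not the study's analysis code: it returns the mean decrease, its standard error, the paired t statistic, a two-sided confidence interval, and the Pearson correlation between baseline and follow-up scores. To stay dependency-free it takes the critical t value as an argument; if SciPy is available, `scipy.stats.t.ppf(0.975, n - 1)` gives the 95% value, and `scipy.stats.ttest_rel` gives the same t statistic. The intervals reported above are symmetric about the mean decrease (e.g., 4.25 ± 1.18 for the PSYCHS), which is consistent with this mean ± t × SE construction.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PairedSummary:
    mean_decrease: float  # mean of (baseline - follow_up); positive = lower at follow-up
    se: float             # standard error of the mean difference
    t: float              # paired t statistic with n - 1 degrees of freedom
    ci_low: float
    ci_high: float
    r: float              # Pearson correlation between the two time points


def paired_summary(baseline: Sequence[float], follow_up: Sequence[float],
                   t_crit: float) -> PairedSummary:
    """Paired comparison of total scores; t_crit is the two-sided critical t value."""
    n = len(baseline)
    if n != len(follow_up) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - f for b, f in zip(baseline, follow_up)]
    d_bar = sum(diffs) / n
    sd = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    se = sd / math.sqrt(n)
    # Pearson r between baseline and follow-up scores.
    mb, mf = sum(baseline) / n, sum(follow_up) / n
    sxy = sum((b - mb) * (f - mf) for b, f in zip(baseline, follow_up))
    sxx = sum((b - mb) ** 2 for b in baseline)
    syy = sum((f - mf) ** 2 for f in follow_up)
    r = sxy / math.sqrt(sxx * syy)
    half_width = t_crit * se
    return PairedSummary(d_bar, se, d_bar / se, d_bar - half_width, d_bar + half_width, r)
```

For a hypothetical sample of four participants, `paired_summary([10, 12, 14, 16], [9, 10, 13, 14], t_crit=3.182)` (3.182 being the two-sided 95% critical value for 3 degrees of freedom) gives a mean decrease of 1.5 points.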

Conclusion

The AMP SCZ Clinical Ascertainment and Outcome Measures working group had the important tasks of defining inclusion and exclusion criteria, determining the primary and secondary clinical and functional outcomes, and choosing valid and reliable measures for each of these outcomes. Training, translation of measures, and consensus ratings of the PSYCHS to determine whether an individual meets inclusion criteria were also part of the team’s remit. Early analysis shows that symptom and functional ratings are relatively stable between baseline and the 2-month follow-up, with some improvement occurring over this period at the group level. Our robust procedures for the clinical and functional assessments should allow our methodology to be replicated by other groups. Details of the AMP SCZ project, including standard operating procedures, are available at www.ampscz.org.