Introduction

The Accelerating Medicines Partnership® Schizophrenia (AMP®SCZ) initiative is a public-private partnership that ultimately aims to identify preventative interventions for schizophrenia and related psychotic disorders. AMP®SCZ builds on previous studies that developed and validated criteria for psychosis clinical high-risk (CHR) syndromes. Meta-analyses confirm that individuals meeting CHR criteria have an elevated risk of psychosis, with about 20% developing a psychotic disorder within two years and 35% within ten years1,2. While about 40% of persons meeting CHR criteria have symptomatic remission and good functioning, the remainder experience significant functional impairments and a mix of attenuated psychosis, cognitive, negative and/or mood symptoms1,2. As an initial foundational step, AMP®SCZ aims to ascertain biomarkers that both predict clinical outcomes and provide insights into the biological processes driving clinical outcomes in persons meeting CHR criteria3. AMP®SCZ includes the Psychosis Risk Outcomes Network (ProNET), involving 28 international sites recruiting 1040 CHR and 390 community controls, and the Prediction Scientific Global Consortium (PRESCIENT), involving 15 international sites recruiting 937 CHR and 250 community controls. Participants’ clinical status is monitored periodically over two years, and biomarker assessments are conducted at baseline and at two-month follow-up, with the intention to examine both baseline levels and two-month change as predictors of clinical outcomes.

The following provides our rationale for examining biomarkers derived from blood and saliva to predict clinical outcomes in individuals meeting CHR criteria. We also describe our preanalytical and analytical methodological considerations. Other papers in this special issue discuss biomarkers derived from brain imaging4, electrophysiology5, cognitive evaluations6, speech and facial expression analyses7, as well as actigraphy and other digital health technologies8.

Overview: biomarkers from blood and saliva

As discussed below, previous studies have shown that psychosis and psychosis risk are associated with altered levels of various blood-based biomarkers and salivary cortisol. These changes reflect dysregulation in stress-related hormones, immune responses, complement and coagulation systems, and redox mechanisms. The abundance of a variety of hormones and other circulating molecules in saliva is highly correlated with their abundance in blood9. However, saliva is the preferred source for measuring cortisol and other hormones because it contains only the unbound, and thus bioactive, forms, whereas blood levels include both the protein-bound (inactive) and protein-free (active) fractions9.

More than 10,000 proteins have been identified in blood10,11, with the majority entering via the lymphatic system. These proteins originate from interstitial spaces and include locally secreted proteins as well as cellular and extracellular matrix debris. The cellular sources of these blood molecules are becoming increasingly understood12,13. For example, a recent study identified 184 blood proteins primarily sourced from the brain14, most of which have not been investigated in psychosis-risk cohorts. Many of the biomarkers associated with psychosis and psychosis risk are part of the “blood secretome,” which currently includes 780 proteins identified as being actively secreted from various tissues into the bloodstream13,15. For example, the Human Protein Atlas15 lists 14 brain-derived secreted proteins in blood, including hypothalamic hormones such as oxytocin and corticotropin releasing hormone.

Advances in proteomic assay methodologies and genomics have significantly improved our understanding of the blood proteome. Recent research shows that protein levels remain relatively stable within a healthy individual over days to a few years16,17,18,19. Large-scale, longitudinal population-based studies have measured thousands of blood proteins, examining protein-protein and protein-gene relationships in relation to disease status and risk20,21,22,23. These studies have identified networks of correlated proteins, characterized by highly connected “hub” proteins, that systematically differ between individuals who have a disease or later develop it and those who do not21,22,23,24,25. Levels of “hub” proteins are associated with genetic variants within or near the gene coding for that protein (in cis). These “hub” protein variants also affect levels of other proteins within the hub protein network (in trans)20,21,22. Mendelian randomization studies have further linked “hub” proteins in disease-associated networks to genetic variants identified in genome-wide association studies for that disease21,24,26. Several studies have included schizophrenia24,27,28, and while further research is needed, it is worth noting that many of the blood proteins identified as associated with schizophrenia are novel discoveries.

Proteomic networks in blood are thought to play roles in maintaining homeostasis and integrating the functions of organ systems throughout the body, including the brain20,21. Blood molecules influence brain function by interacting with brain-neurovascular interfaces and the afferent vagus nerve29,30,31. Most of the brain’s vasculature is surrounded by astrocyte endfeet, enabling blood-borne molecules such as hormones and cytokines to signal astrocytes. This can occur directly when these molecules cross the blood-brain barrier, or indirectly when they activate blood vessel endothelial cells to release signaling molecules that target receptors on the astrocyte endfeet. Additionally, blood-borne molecules may interact directly with neurons at circumventricular organs, which lack a blood-brain barrier32,33. For example, during a viral infection, the immune system releases cytokines into the blood that are hypothesized to signal the brain to conserve bodily resources for defense. Experimental evidence indicates that the resulting subjective experience and behavioral responses, known as “sickness syndrome,” include decreases in motivation, goal-directed behaviors, energy, speech, affective expression, and social engagement, all of which resemble negative symptoms30,34.

In addition to proteins, blood contains numerous other constituents, including secreted vesicles35 and cells. Vesicles are enclosed by the plasma membrane of their source cells and carry proteins, lipids, and nucleic acids. The vesicle membrane displays signaling molecules reflective of the source cell, enabling recognition and communication with distant target cells. Secreted vesicles play key roles in processes such as immune system regulation and neurodevelopment36. Circulating cells, such as leukocytes (white blood cells), are vital to the inflammatory response. However, it is less commonly recognized that the brain relies on molecular signals from specific leukocyte subtypes for healthy functioning37,38,39. Circulating cells are readily accessible and thus can be used as proxies for specific biomarkers of health status. For example, the fatty acid content of red blood cell membranes is used to measure omega-3 levels40. Additionally, circulating leukocytes are a common source of nucleic acids for DNA, transcriptome, and microRNA analyses.

Promising biomarkers

This review highlights findings from schizophrenia and clinical high-risk (CHR) studies that support our plans to use polygenic scores as biomarkers of genetic risk and salivary cortisol as a biomarker of hypothalamic-pituitary-adrenal axis dysfunction. We also review blood proteomic studies in schizophrenia and CHR populations. While no single blood proteomic biomarker has been identified to reliably distinguish CHR individuals who transition to psychosis from those who do not, only a small portion of the blood proteome has been explored to date. Here, we present our rationale for utilizing next-generation multiplex affinity assays, which are capable of measuring thousands of blood proteins not yet studied in CHR cohorts. We also discuss other promising biomarkers, including those involved in redox balance and related mechanisms, that merit further study.

Genomics

Evidence of a genetic contribution to schizophrenia is extensive. First-degree relatives of probands with schizophrenia are at increased risk, with relative risks for schizophrenia ranging from 8.55 to 12.8 (parent 9.43, offspring 9.7–12.8, full siblings 8.55–9)41,42,43,44. The heritability of schizophrenia is estimated to be high, with pedigree-based estimates42,43,45 of ~0.63–0.67 and twin-based estimates46,47,48,49 of ~0.57–0.81. The genetic architecture is polygenic, with contributions from common variants (single nucleotide polymorphism (SNP)-based heritability estimates48,49 of ~0.23–0.32), rare variants50,51,52,53,54, rare copy number variations55,56, and de novo mutations57,58. A polygenic score (PGS) captures the contribution of many risk alleles across the genome and has been applied in schizophrenia research59; the PGS difference between schizophrenia cases and controls has been robustly reproduced50,60. Genetic risk factors point to causal factors, in contrast to all other biomarkers, which may reflect either cause or consequence of disease onset. Individuals in the top decile of the PGS distribution have a 10-fold higher schizophrenia risk than those in the bottom decile60. PGS for schizophrenia was higher among CHR converters than among nonconverters or unaffected controls, whereas no consistent difference was found between nonconverters and unaffected controls61. In addition, polygenic risk for attention deficit disorder was elevated in persons meeting CHR criteria62.
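To make the idea of a polygenic score concrete, the following is a minimal sketch of the standard weighted-sum form, in which each variant's risk-allele dosage is multiplied by an effect-size weight and the products are summed. The variant identifiers, weights, and dosages are hypothetical placeholders, not values from any study cited here, and real pipelines add steps (variant matching, linkage-disequilibrium adjustment, ancestry-aware standardization) that are omitted.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical per-variant effect sizes (e.g., log odds ratios from a GWAS).
weights = {"rs_a": 0.05, "rs_b": -0.02, "rs_c": 0.10}

# Hypothetical risk-allele dosages (0, 1, or 2; imputed dosages may be fractional).
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_score(person, weights)
print(f"polygenic score: {score:.3f}")
```

In practice the raw score is usually standardized against a reference sample of matching ancestry before it is used as a predictor.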

Large-scale schizophrenia genetic studies primarily involve populations of European ancestry, while studies of those of Asian and Black/African American ancestry trail behind50,63,64,65. While it might seem reasonable to assume that the common causal risk variants are ancient and shared across ancestries, in fact the correlation structure in genomes differs between ancestries, reflecting population history events66. Hence, the PGS predictive power differs for patients of different genetic ancestries. Method development to improve translation of PGS across ancestries is an active area of research67,68. The AMP®SCZ study includes participants of diverse ancestry, and so we propose to use the Blended Genome Exome (BGE) assay, recently developed at the Broad Institute. It is a cost-effective approach to capturing genetic diversity that blends high-pass sequencing of protein-coding and low-pass sequencing of nonprotein-coding regions69.

Hypothalamic-pituitary-adrenal axis dysfunction

Stressors, both physical (such as inflammation and injury) and psychological, stimulate the release of corticotropin-releasing hormone from the hypothalamus. This, in turn, triggers the pituitary gland to secrete adrenocorticotropic hormone into the bloodstream, which prompts the adrenal glands to release cortisol into circulation70. Circulating cortisol enters the brain through passive diffusion71 and activates glucocorticoid receptors in hypothalamic and hippocampal neurons, providing negative feedback that dampens further corticotropin-releasing hormone release. Chronic stressors may result in prolonged hypothalamic-pituitary-adrenal (HPA) hyperactivity, leading to an adaptive reduction in the sensitivity of glucocorticoid receptors and reducing the cortisol response to stress72.

Cortisol studies benefit from accounting for its diurnal variation. In healthy individuals, salivary cortisol levels begin to rise rapidly during the last few hours of sleep, peaking about 30 minutes after awakening. Levels then decline throughout the day, with the rate of decline slowing from late afternoon into the night, reaching their lowest point around midnight. Within a single individual, there is typically a 3- to 4-fold difference between peak and nadir cortisol levels, making it essential to control for this diurnal variation in cortisol research.

It is important to address diurnal variation and potential confounders in studies involving cortisol as part of study design and analysis procedures. One promising approach to address diurnal effects is “residualization,” which involves creating a new cortisol value where diurnal effects have been removed. For example, this could be calculated as the difference between the observed cortisol level at a specific time of day and the expected level based on data from a control group73. We and others have found that controlling for diurnal effects meaningfully impacts study results74,75,76. In addition, both antipsychotic medications and antidepressants have been shown to suppress cortisol levels in both healthy and clinical populations77,78,79. One challenge in clinical high-risk research is that a subset of individuals is already on psychotropic medication at baseline, with a significant portion receiving antipsychotics. Follow-up studies show that CHR individuals on antipsychotics at baseline are more likely to transition to psychosis80. Additionally, those with more severe attenuated positive symptoms at baseline are at higher risk of psychosis conversion81. CHR individuals on antipsychotics are not only more likely to transition to psychosis but are also more likely to exhibit medication-induced cortisol suppression. This creates a confounding factor that can obscure the identification of reliable biomarkers for conversion.
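The residualization idea described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in any cited study: the expected level is modeled as a straight line in log cortisol against hours since awakening, fitted to control-group data, and all numbers are hypothetical.

```python
import math
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residualized_cortisol(hours_awake: float, cortisol: float,
                          slope: float, intercept: float) -> float:
    """Observed log cortisol minus the level expected at that time in controls."""
    expected_log = intercept + slope * hours_awake
    return math.log(cortisol) - expected_log


# Hypothetical control-group samples: hours since awakening and cortisol level.
control_hours = [1.0, 3.0, 5.0, 7.0]
control_cortisol = [math.exp(3.0 - 0.1 * h) for h in control_hours]

# Assumed model: log cortisol declines linearly with time since awakening.
slope, intercept = fit_line(control_hours, [math.log(c) for c in control_cortisol])

# A hypothetical participant sampled 4 hours after awakening with cortisol
# 1.5 times the control expectation; the residual is log(1.5), about 0.405.
residual = residualized_cortisol(4.0, 1.5 * math.exp(2.6), slope, intercept)
print(f"residualized cortisol: {residual:.3f}")
```

A positive residual indicates cortisol above what the control data would predict at that time of day. A nonlinear diurnal curve could be substituted for the straight line without changing the logic.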

To address these methodological challenges, the North American Prodrome Longitudinal Study (NAPLS2) measured salivary cortisol at multiple time points, excluded participants taking antipsychotics, and adjusted cortisol values for the time of sample collection using residualization. Baseline cortisol levels in the NAPLS2 study were significantly higher in CHR individuals who later converted to psychosis compared to non-converters and community controls75,76,82. Furthermore, adding baseline cortisol levels to a multivariate psychosis risk calculator (which already included factors such as the severity of unusual beliefs and paranoia, decline in social functioning, verbal memory, and information processing speed) significantly improved the calculator’s ability to predict psychosis conversion75. Two other studies that measured salivary cortisol at multiple time points and either adjusted for the time of sample collection74 or standardized sample collection times83 also found differences in cortisol levels between CHR psychosis converters and non-converters. Notably, Labad and colleagues74 standardized cortisol sample collection times (on awakening and 30 and 60 minutes post-awakening) and observed that additionally controlling for awakening time and other potential confounders yielded significant differences between CHR psychosis converters and non-converters. In contrast, differences in raw, unadjusted cortisol levels did not reach significance (p = 0.14). Studies that used a single cortisol measurement and/or did not adjust cortisol values for collection time have not found differences in baseline cortisol between CHR converters and nonconverters84,85,86.

CHR individuals show evidence of prolonged hypothalamic-pituitary-adrenal hyperactivity related to chronic stress, which results in an adaptive reduction in the sensitivity of glucocorticoid receptors, blunting the cortisol response to acute stressors70,72,87. Many76,82,85,88,89,90, but not all84,89,91,92,93,94,95,96, studies have found elevations in basal cortisol levels or cortisol reactivity in CHR compared to controls, especially if CHR individuals on psychotropic medications are excluded85,88. Studies linking hypothalamic-pituitary-adrenal axis dysregulation to other psychosis risk biomarkers, especially brain imaging and cognition, may improve psychosis risk prediction and further our understanding of mechanisms. For example, imaging studies in CHR found salivary cortisol levels to be inversely correlated with hippocampal volumes76,91,97.

Biomarkers of systemic chronic inflammation

The canonical role of the immune system is to orchestrate an organism’s defense against infections. The acute phase response begins when sentinel cells of the innate immune system detect invaders or tissue damage, releasing cytokines locally and into the bloodstream. This signals the recruitment of immune cells and prompts the liver to produce acute phase proteins (e.g., C-reactive protein, complement, and coagulation molecules). Cytokines also communicate the threat to other organs, including the brain, triggering energy-conserving adaptive responses98. If necessary, the innate immune system activates the adaptive immune system to produce antibodies. Once the threat is resolved, pro-inflammatory molecules are replaced by anti-inflammatory ones, promoting tissue repair.

Systemic chronic inflammation is a low-grade, persistent inflammatory state that can arise from prolonged exposure to xenobiotics, psychological stress, or other immune triggers, coupled with a failure to resolve inflammation99. Unlike acute inflammation, it involves subtle changes in cytokines, complement, coagulation factors, C-reactive protein, and related molecules with both pro- and anti-inflammatory roles. Systemic chronic inflammation becomes more common and severe with aging and is linked to conditions such as cardiovascular disease and diabetes.

Molecular hallmarks of systemic chronic inflammation in schizophrenia include elevated levels of proinflammatory cytokines, including interleukin-1B, interleukin-6, interleukin-8, and tumor necrosis factor, and of anti-inflammatory cytokines (interleukin-10, interleukin-1 receptor antagonist, and soluble interleukin-2 receptor)100,101,102. Meta-analyses also found C-reactive protein elevations in schizophrenia102,103. Cytokine levels vary across the disease course, with interleukin-2 and interferon-gamma elevated in acute but not chronic schizophrenia, while interleukin-4, interleukin-12 and interferon-gamma were lower in chronic schizophrenia100. Alterations of the coagulation system in schizophrenia point towards a pro-coagulant phenotype, including higher prevalence of protein S deficiency, lower tissue plasminogen activator activity104,105, and increased D-dimer levels106.

Evidence of systemic chronic inflammation is found at the onset of psychosis. Meta-analyses reported increased blood levels of interleukin-6, interleukin-12, interleukin-17 and interferon-gamma relative to controls107,108,109. In addition, a meta-analysis found that interleukin-1B, interleukin-2, interleukin-6 and tumor necrosis factor levels were positively correlated with severity of negative symptoms108. Additional studies have found alterations in various complement factors110,111,112 and coagulation system molecules113 in early schizophrenia compared to controls, consistent with systemic chronic inflammation.

The few available CHR studies provide evidence that systemic chronic inflammation precedes and predicts the onset of psychosis. Two contemporaneous meta-analyses114,115 found higher interleukin-6 in CHR relative to controls, and one also found lower interleukin-1B114. Longitudinal CHR studies have examined relationships between baseline levels of inflammatory molecules and subsequent development of psychosis, including the European Gene-Environment Interaction (EU-GEI), Shanghai At Risk for Psychosis (SHARP), and North American Prodrome Longitudinal Study (NAPLS) studies and the NEURAPRO clinical trial. Studies in the EU-GEI cohort found higher levels of the cytokine vascular endothelial growth factor and higher interleukin-10 to interleukin-6 ratios116, as well as differences in several complement and coagulation system molecules117, in CHR converters compared to nonconverters. The SHARP study reported lower levels of interleukin-2, interleukin-6, and interleukin-10 as well as levels of two complement proteins in CHR converters versus nonconverters118,119. The NAPLS2 study used a multiplex assay with 185 biomarkers and found that 19 were differentially expressed in CHR converters compared with controls120. No biomarkers were differentially expressed in CHR converters compared to nonconverters, although the assay was not sensitive enough to detect many cytokines, including interleukin-6. In contrast to findings in the EU-GEI cohort117, a recent analysis in the NAPLS2, NAPLS3, and NEURAPRO cohorts focused on complement and coagulation proteins did not find differences in CHR converters versus nonconverters121. However, in both NAPLS2 and NAPLS3, complement and coagulation regulatory proteins predicted severity of negative symptoms at follow-up, even after controlling for baseline severity of negative symptoms and other prognostic factors122.

Population-based studies have found relationships between elevated C-reactive protein and increased risk of developing schizophrenia in early adulthood123. In a longitudinal study124 applying discovery-based proteomic techniques to blood samples taken at age 11, several complement and coagulation proteins were differentially expressed among those who developed a psychotic disorder at age 18. Similar findings were observed in a study of psychotic experiences125.

Systemic chronic inflammation involves subtle protein network changes, making single-molecule differences weak predictors. However, prediction improves with multivariable risk models. Two studies120,126 used different but partially overlapping panels of 185 and 225 biomarkers, respectively, involved in inflammation, hormonal signaling, metabolism, and oxidative stress. Machine learning approaches were used to generate multivariable risk classifiers that significantly discriminated CHR converters from nonconverters (AUC = 0.88 and 0.82, respectively120,126) and CHR converters from healthy comparison subjects (AUC = 0.91 and 0.90, respectively120,126). The majority of the chosen biomarkers have roles in hormonal and inflammatory responses consistent with systemic chronic inflammation.

Redox dysregulation

Redox reactions are positioned at the nexus of multiple critical biochemical pathways, including immune signaling/neuroinflammation as well as mitochondrial function/bioenergetics, neurodevelopment, and neuronal plasticity127,128,129. Redox dysregulation (an imbalance between reactive oxygen species and antioxidant defenses) related to mitochondrial dysfunction links genetic and environmental risk factors driving the onset of psychosis130,131,132,133,134,135,136,137. Meta-analyses of first-episode psychosis (FEP) and chronic schizophrenia reported elevations of oxidatively damaged lipids, proteins, and nucleic acids in blood138,139,140,141, but yielded mixed results regarding redox pathway substrates and enzymes139,141,142, possibly related to preanalytic factors137, sexual dimorphisms143, or confounders140,144. Evidence of oxidative damage is found in CHR120,145,146 and in one study was predictive of psychosis conversion120.

Disruptions in parvalbumin fast-spiking GABA interneurons are a hallmark of schizophrenia147,148,149,150. Parvalbumin interneurons play a pivotal role in maintaining cortical excitatory-inhibitory balance, essential for high-frequency neuronal synchrony151 and optimal cognitive, emotional, and social behavior149,152. Parvalbumin interneurons are particularly vulnerable to redox dysregulation and mitochondrial damage due to their elevated metabolic activity needed to support high-frequency discharge153. Indeed, in multiple rodent models relevant to schizophrenia, oxidative stress-induced parvalbumin interneuron impairments appear as a common underlying pathway133,154,155.

MicroRNAs are RNA molecules that are not translated into proteins but instead regulate the translation of proteins. A novel microRNA mechanism linking mitochondrial and oxidative stress-induced parvalbumin interneuron impairments in schizophrenia has been identified156. Both genome-wide association studies and other lines of research implicate microRNA miR-137 in schizophrenia risk157. Oxidative stress upregulates miR-137 expression in parvalbumin interneurons. In the mitochondria of these cells, miR-137 negatively regulates COX6A2, a subunit of cytochrome c oxidase (complex IV) that is specific to parvalbumin interneurons, impairing mitophagy and leading to the accumulation of mitochondrial damage156. Remarkably, a formulation of the mitochondrial-targeted coenzyme Q with enhanced bioavailability rescued these pathological processes156.

Translating these findings to individuals with early psychosis, brain-derived vesicles (exosomes) from blood were analyzed for miR-137 and COX6A2 as proxy markers for parvalbumin interneuron microcircuit impairment. Among 272 individuals with early psychosis, a sub-group showed increased blood exosomal miR-137 and decreased COX6A2, as well as mitophagy markers, when compared to controls156. The colocalization of COX6A2 and parvalbumin in exosomes of neuronal origin indicates that plasma exosomal COX6A2 levels reflect the integrity of central parvalbumin interneurons. Individuals with abnormal miR-137 also showed a reduction of the auditory steady-state response in gamma oscillations measured using electroencephalography, worse psychopathological status and neurocognitive performance, and impaired global and social functioning. As the auditory steady-state response requires healthy parvalbumin interneuron-related networks, alterations in plasma exosome levels of miR-137/COX6A2 may represent a proxy marker of parvalbumin interneuron cortical microcircuit impairment156. Stratification based on exosomal miR-137/COX6A2 allowed, with high selectivity and specificity (AUC = 0.96), the selection of individuals with early psychosis for studies of treatments targeting brain mitochondrial dysregulation.

Sample acquisition and processing methods

Ex vivo changes in the concentration of many blood proteins are often linked to improper blood acquisition and processing procedures, particularly those that result in hemolysis and platelet activation158,159,160. To address these issues, the Standard Operating Procedures for the AMP®SCZ project (available on the ampscz.org website) were designed to minimize the impact of preanalytical factors on blood and saliva samples. Our procedures include detailed phlebotomy and blood processing protocols designed to minimize hemolysis and platelet activation. In addition, lab technicians recorded levels of hemolysis and lipemia in plasma and serum samples. Time from sample acquisition to freezer was set at 60 minutes in order to minimize biomarker degradation. Study participants were given information about avoiding specific activities and substances known to impact cortisol levels, and saliva samples were collected on ice to minimize degradation. We collected information on various factors known to influence blood- and saliva-based biomarker levels for use in analyses, for example to control for potential confounders or to aid in interpreting outliers. This includes data on current health status, recent medication and drug use, activity levels, blood pressure, body temperature, sleep and wake times, and body mass index161.

We employed strategies to minimize the risk of sample mix-up after processing. We required the cryovial barcodes to be scanned into the database immediately before or at the time of sample processing. We used different color-coded cryovial caps for whole blood, serum, and plasma specimens and specified an orderly cryovial storage arrangement in the storage boxes, which allowed us to easily monitor the database for inconsistencies. Finally, differences between genetically determined sex and recorded biological sex will be available as identifiers of misidentified samples. We also monitored other critical metrics, including the time from sample collection to freezing and freezer temperature excursions. We intervened with retraining at sites where issues were identified.

To increase adherence to blood collection protocols, metadata documentation, and blood processing procedures, study staff were required to complete a training program that reviewed the standard operating procedures and pass a test on the material. Staff were also required to complete an annual refresher training. Additionally, we held monthly meetings with study staff to review implementation issues and provide retraining when problems were identified.

Variables, assays and analyses considerations

Polygenic scores

In addition to an elevated risk for psychosis, individuals meeting high-risk criteria for psychosis also have higher rates of other mental disorders. Therefore, polygenic scores developed for cross-disorder associations (e.g., the Cross-Disorder Group of the Psychiatric Genomics Consortium) as well as for specific disorders (e.g., schizophrenia, bipolar disorder, depression, ADHD, autism) and other relevant traits (e.g., cognition, resilience) are proposed as potential predictors of clinical outcomes. We plan to use the Blended Genome Exome (BGE) assay, a cost-effective approach that identifies rare variants in the coding genome while simultaneously capturing common variation. The BGE combines high depth sequencing (~ 30X) of the exome (2% of the genome) with the remainder of the genome covered at 1-3X, a depth selected to capture common variation in all ancestries162. This approach will allow identification of rare variants previously reported as associated with psychiatric disorders including copy number variants, and will contribute to the global rare variant genetic discovery efforts51,53.

QC parameters for BGE data will include exome coverage, whole genome coverage, per sample and per variant call rate, contamination rate, and chimera read rate. Additional sample QC filters will exclude samples that deviate by more than 4 median absolute deviations, within ancestry groups defined by reference datasets, on metrics including transition/transversion ratio, heterozygosity ratio, number of insertions, and number of deletions. Samples where reported biological sex does not agree with genetically determined sex will be excluded. Results will be imputed163 to allow generation of polygenic scores based on results from large-scale GWAS.
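A median-absolute-deviation filter of the kind described for sample QC can be sketched as below. This is a minimal illustration, assuming the filter is applied to one metric within one ancestry group at a time; the sample identifiers and transition/transversion ratios are hypothetical.

```python
from statistics import median
from typing import Dict, Set


def mad_outliers(metric_by_sample: Dict[str, float], n_mads: float = 4.0) -> Set[str]:
    """Return sample IDs lying more than n_mads median absolute deviations from the median."""
    center = median(metric_by_sample.values())
    mad = median(abs(value - center) for value in metric_by_sample.values())
    if mad == 0:
        # No spread in the metric, so the threshold is undefined; flag nothing.
        return set()
    return {
        sample
        for sample, value in metric_by_sample.items()
        if abs(value - center) > n_mads * mad
    }


# Hypothetical transition/transversion ratios for one ancestry group.
titv = {"s1": 2.05, "s2": 2.07, "s3": 2.06, "s4": 2.04, "s5": 2.06, "s6": 1.60}

flagged = mad_outliers(titv)
print(sorted(flagged))
```

The same function would be called once per metric and per ancestry group, so that a sample is compared only against others of similar ancestry.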

Cortisol

Salivary cortisol levels will be included as a variable in development of the AMP®SCZ risk prediction calculator. Saliva samples are collected on ice at three time points over two hours and then stored at −20 °C. Saliva will be assayed for cortisol using the Salimetrics ELISA platform at two Salimetrics facilities using assays from the same lot (to avoid any lot-to-lot variation). Cortisol levels will be adjusted for diurnal phase75 and other potential confounders. The mean value of the three adjusted cortisol levels will be used in the risk prediction model.

Proteomics

The final proteomic platform choice will depend on assay cost and coverage of high-priority proteins, including inflammatory, complement, coagulation, and redox system biomarkers associated with psychosis, brain-derived blood proteins, other proteins influenced by genes implicated in schizophrenia risk, and proteins identified as part of the blood secretome. Our prediction models will incorporate baseline protein levels and their changes over two months. Our overarching goal will be the development of multivariable risk classifiers for psychosis and other clinical outcomes.

For this study, we plan to utilize either the Olink or SomaScan platform18,164. Olink employs a proximity extension assay, where a pair of antibodies specific to the target protein are linked to DNA oligonucleotides. After antibodies bind to the target protein, the oligonucleotides anneal to form a DNA duplex, which is then quantified using real-time PCR. SomaScan uses fluorescently labeled single-stranded DNA molecules that are chemically modified to bind specific proteins with high sensitivity and specificity. Both platforms offer validated multiplex assays for thousands of proteins, covering a wide dynamic concentration range consistent with much of the blood proteome, as well as excellent specificity (minimal or no cross-reactivity) and reproducibility11,18.

Raw biomarker data will be analyzed following the guidelines provided by the platform manufacturer. We anticipate that most proteins will show consistent levels between baseline and two months, allowing us to use the distribution of coefficients of variation as a measure of platform reproducibility. Principal component analysis or Grubbs' test will be conducted to identify outliers. Additionally, we will evaluate the relationships between protein levels and factors such as hemolysis levels, time to freezer, and clinical confounders (e.g., body mass index). A quality control assay using proteins assayed by the chosen platform will also be employed to detect ex vivo blood cell or platelet activation or damage that has impacted certain biomarker levels159,165.
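One way the coefficient-of-variation summary could be computed from the paired baseline and two-month measurements is sketched below. This is an assumption-laden illustration: the function name `paired_cv_percent` and the example values are hypothetical, protein levels are assumed to be on a linear (not log-transformed) scale, and the median across participants is only one possible per-protein summary.

```python
import numpy as np

def paired_cv_percent(baseline, followup):
    """Per-protein median within-participant CV (%) across two time points.

    baseline, followup: arrays shaped (participants, proteins) on a linear
    scale (assumption; log-scale platform output would need back-transforming).
    """
    pair = np.stack([np.asarray(baseline, float), np.asarray(followup, float)])
    mean = pair.mean(axis=0)
    sd = pair.std(axis=0, ddof=1)      # sample SD of the two time points
    cv = 100.0 * sd / mean             # CV per participant and protein
    return np.median(cv, axis=0)       # one summary value per protein

# Hypothetical levels: 2 participants x 2 proteins.
baseline = [[100.0, 50.0], [200.0, 80.0]]
followup = [[100.0, 70.0], [200.0, 40.0]]
cv_by_protein = paired_cv_percent(baseline, followup)
```

The resulting per-protein values can then be examined as a distribution. Proteins with unusually high values may reflect either poor assay reproducibility or genuine biological change over two months, so the summary is best interpreted across many proteins rather than one at a time.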

We will use machine learning algorithms to develop multivariable classifiers predicting clinical outcomes. Machine learning algorithms are powerful at pattern detection, selecting combinations of variables with improved performance metrics over single biomarker predictors166,167,168. The curse of their complexity is that, unchecked, these algorithms “overfit” the data, meaning that they work well or even perfectly in the discovery data set but fail external validation. For this reason, we plan to utilize machine learning algorithms with features designed to minimize the risk of overfitting169. We will also employ several strategies that address overfitting, including internal resampling methods (cross-validation and bootstrapping) and permutation testing. Resampling strategies estimate model performance in the population of individuals from which the study participants were drawn by repeatedly testing the performance of an algorithm in randomly selected subsets of the development data. Also important is permutation testing, which addresses the fact that machine learning algorithms can detect patterns in random data169. With permutation testing, the measure of model fit (e.g., AUC) is generated repeatedly (1000 or more times) in data where the outcome (e.g., developed or did not develop psychosis) has been randomly reassigned, dissociating any relationship with disease risk but preserving the relationships among the predictor variables (“nonsense data”). The model fit obtained using the true data is compared with the distribution of model fit obtained across all the nonsense data. If the nonsense data never or seldom provide as good a fit as the true data, then permutation testing implies the model has discovered information in the data.
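The permutation-testing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AMP®SCZ analysis code: a cross-validated least-squares linear scorer stands in for whichever machine learning algorithm is ultimately chosen, the function names (`auc`, `cv_auc`, `permutation_test`) and the simulated data are hypothetical, and only 200 permutations are run to keep the example fast.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores receive average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def cv_auc(X, y, n_folds=5, seed=0):
    """AUC of pooled out-of-fold scores from a least-squares linear scorer.

    The linear scorer is a stand-in (assumption) for the real classifier.
    """
    idx = np.random.default_rng(seed).permutation(len(y))
    Xb = np.column_stack([np.ones(len(y)), X])  # add intercept column
    scores = np.empty(len(y))
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef, *_ = np.linalg.lstsq(Xb[train], y[train].astype(float), rcond=None)
        scores[test] = Xb[test] @ coef
    return auc(scores, y)

def permutation_test(X, y, n_perm=1000, seed=0):
    """Compare the true-data AUC with AUCs from outcome-shuffled data."""
    rng = np.random.default_rng(seed)
    observed = cv_auc(X, y)
    # Shuffling only y keeps the predictor relationships in X intact.
    null = np.array([cv_auc(X, rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Simulated data: 120 participants, 5 predictors, outcome driven by the first.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
observed, null, p_value = permutation_test(X, y, n_perm=200)
```

The whole model-fitting procedure, including cross-validation, is repeated for every shuffled outcome, so the null distribution reflects any optimism built into the pipeline itself. Adding 1 to both the numerator and denominator of the p-value prevents a reported value of exactly zero when no permuted data set matches the true fit.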

Future directions

Integrating genetic findings with quantitative traits related to psychosis risk offers the possibility of elucidating causal mechanisms driving the development of psychosis. For example, it is well-recognized that some variation in blood biomarkers reflects genetic variation28. Moreover, the genetic architecture of biological traits such as cognition usually includes variants of much larger effect than those for common diseases, particularly in cis (i.e., genomically close) to key genes associated with the biomarker. The impact of environmental risk factors, such as social determinants of health, on proteins found to be associated with psychosis risk could also be modeled25. Finally, we recognize that our plan does not include investigation of other substances in blood associated with psychosis risk, including microRNAs157,170,171, antibodies172,173,174,175, lipids132,146,176, and the kynurenine pathway177,178, to name a few, and future studies might well add further types of biomarkers. For this reason, DNA, plasma, serum, whole blood, and saliva will be biobanked at the NIMH Biorepository as a limited resource for the scientific community.

Individual biomarkers have only small effect sizes; hence, multivariate models combining data across modalities may improve the accuracy of psychosis prediction166,167. These models have evolved from using purely clinical predictors to a diverse range of modalities measuring fluids, cognition, behavior, MRI, spectroscopy, and EEG179,180. Arguably, fluid biomarkers have so far been under-utilized in multimodal modeling of psychosis risk. A recent meta-analysis of machine learning and Cox regression–based models identified only three models using fluid markers with other modalities (two included genetic analyses and one included serum proteins180). A further three studies were fluid-based (fatty acids, antioxidants and plasma biomarkers). Across these studies, average model performance was reasonable: CHR diagnostic models averaged 78% sensitivity and 77% specificity, while prognostic models averaged 67% sensitivity and 78% specificity180. The breadth of the AMP®SCZ protocol will allow the combination of fluid markers with unparalleled deep phenotyping of psychosis risk, including emerging modalities such as environmental and digital markers, facial expression, speech and ecological momentary assessment, and actigraphy181,182. Fluid markers should be added to multimodal modeling because they uniquely capture inheritance and molecular mechanisms.

Summary

In summary, the AMP®SCZ project was designed to address several limitations of previous studies by recruiting a large number of participants (approximately 2000 CHR and 640 healthy participants), implementing detailed biospecimen collection and processing procedures designed to minimize preanalytic factors that might impact biomarker levels ex vivo, and collecting metadata critical to the interpretation of assay results. The results of AMP®SCZ have the potential to lay the groundwork for the development of psychosis-risk blood or saliva tests and to further our understanding of (likely diverse) mechanisms related to psychosis onset. The results of the genetic and biomarker assays will be publicly available and are likely to be a valuable resource to the scientific community.