Introduction

Schizophrenia is one of the most important public health problems in the world. Its prevalence ranges from 1.4 to 4.6 per 1,000 individuals at risk worldwide [1]. It begins early in life, tends to be chronic, and often persists with significant severity despite treatment. The complex and heterogeneous nature of schizophrenia, characterized by a broad spectrum of symptoms that vary widely among individuals, contributes to the challenges in diagnosis and treatment. Historically, specific symptoms have been defined as fundamental to the disease. Kraepelin defined avolition, along with other symptoms now classified as negative symptoms, as a fundamental feature of the disease. He also specifically noted the early onset of symptoms (in late adolescence or early adulthood) as well as the deteriorating course of the illness [2]. Therefore, it is unsurprising that Kraepelin initially named the disease dementia praecox. On the other hand, Bleuler, who gave the disease its current name, recognized loss of affective responsiveness and disruptions in the continuity of associations (i.e., thought processes) as the fundamental symptoms, while he did not consider chronicity and deterioration to be primary defining characteristics [2].

Despite the initial efforts by pioneers to delineate the clinical syndrome of schizophrenia, there was considerable confusion regarding the classification and understanding of its symptoms. Based on Hughlings-Jackson’s dissolution theory, negative symptoms were defined as a loss of function in higher cortical regions [3]. In contrast, positive symptoms such as delusions and hallucinations represented an exaggeration of normal function, and might represent release or ‘disinhibition’ effects due to the loss of higher cortical functions [4]. However, merely defining and categorizing symptoms (including those in the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders (DSM) system) has not resolved the issue of patient heterogeneity, nor has it significantly improved predictions regarding disease progression or treatment response. Moreover, the symptom profile of the patients might change according to psychotic exacerbations (e.g., more stable negative symptoms but relapsing and remitting positive symptoms) [5]. Thus, particularly with the advent and increased use of neuroimaging in the 1970s, some researchers have shifted their focus towards the pathological processes within the brain and have aimed to elucidate the biological foundations of the disorder. Tim J. Crow suggested classifying patients into two types based on symptom presentation, course, and neuroimaging findings [6]. Type I schizophrenia is primarily associated with positive symptoms and minimal cognitive decline, with generally normal CT/MRI scans, favorable responses to antipsychotic treatments, and good treatment outcomes. In contrast, Type II schizophrenia is predominantly characterized by negative symptoms, such as speech poverty and avolition, poor treatment response, and unfavorable outcomes, with neuroimaging revealing enlarged ventricles and cortical atrophy. 
Although this subtyping approach exhibited limited validity, it marked a significant step towards classifying schizophrenia patients based not only on phenomenological but also on neurobiological characteristics. Subsequent researchers, such as Carpenter, have defined a deficit syndrome within schizophrenia, based on persistent primary negative symptoms that are not attributable to depression, antipsychotic side effects, or inadequate environmental stimulation [7]. Accumulating research has raised the possibility that patients with the deficit syndrome may exhibit distinct neuroimaging and neuropsychological characteristics, suggesting a potential for subclassifying patients according to phenotypes that reflect the neurobiology of the disease [8,9,10,11]. On the other hand, Goldberg and Weinberger have proposed that variations among patients could be more concisely explained along a severity dimension, potentially accounting for much of the observed variance in symptomatology (Fig. 1, Model 1) [12]. They further suggested that schizophrenia is a disorder with multiple etiologies and phenocopies, potentially representing the manifestations of a singular pathogenic process with varying degrees of impact.

Fig. 1

Simplified models illustrate concepts from Goldberg and Weinberger [12] for schizophrenia and Tamminga et al. [13, 14] for psychosis. In Model 1, various etiologies contribute to distinct phenotypes stemming from a single pathological process.

More recently, Tamminga et al. proposed that psychosis may represent a broader phenotype resulting from multiple distinct pathological processes, drawing an analogy to heart failure in cardiology (Fig. 1, Model 2) [13, 14]. This perspective emphasizes the need to explore diverse biomarkers, imaging methods, and genetic studies to identify distinct subtypes and underlying causes of psychosis, which could help explain the heterogeneity observed in psychotic disorders. Therefore, the alternative approach classifies subtypes or biotypes based on biological markers rather than purely on symptomatic observations. Biotyping aims to identify biologically distinct subgroups of schizophrenia, which could facilitate the development of more targeted and effective treatments and preventive measures. Additionally, biotyping may enhance our understanding of the course of the disease and add new phenotypes. Biotyping of a disease can be achieved through genetics, proteomics, metabolomics, cognitive and behavioral assessments, and neuroimaging. A similar effort has been fruitful in dementia treatment and research: as anti-amyloid treatments became available for Alzheimer’s disease, the field has recently redefined and subtyped dementias with the AT(N) [amyloid, tau, neurodegeneration] system, using these and other biomarkers (such as alpha-synuclein, a marker of Parkinson’s disease) to subtype patients before enrolling them in treatment trials [15]. A comparable biotyping approach may be valuable in psychiatry, supporting the goal of precision medicine where treatments are matched to patients based on their clinical and biomarker profiles [16].

Recent developments in unbiased machine learning (ML) algorithms might identify individuals with different biological subtypes and help to disentangle the heterogeneity in disorders such as schizophrenia. This review will focus on recent developments in using structural neuroimaging data for schizophrenia subtyping with ML algorithms and try to compare their findings with the previous literature, mostly from group (schizophrenia versus controls) comparison studies. The research in this area is rapidly growing and can be challenging for many clinicians unfamiliar with these algorithms. Therefore, we tried to limit our discussion to the basic principles of the algorithms, focusing on the meaning of the results for a better clinical understanding of the condition. We will solely focus on structural imaging due to its availability and simplicity in acquisition and processing; as the available data from diffusion MRI and functional neuroimaging increase, we will likely see a growth in subtyping approaches using these modalities or multiple imaging methods [17, 18].

Structural imaging in schizophrenia

Schizophrenia is among the earliest psychiatric disorders to be extensively studied using structural imaging techniques. Ventricular enlargement and cortical volume reductions or thinning are the most common findings in schizophrenia studies. In the last 30 years, heterogeneous findings in brain MRI studies initially arose due to low statistical power and samples of under a hundred patients, despite the interest in neuroimaging in clinical trials [19]. More recently, however, large-scale consortium studies were able to pool data from many sites worldwide, using standardized protocols for imaging analysis and quality control [19, 20]. As such, initial skepticism on the magnitude and pattern of effects in the brain gave way to well-powered, reproducible studies yielding brain maps with high reliability (see van Erp et al., for example, for a study of 4474 individuals with schizophrenia and 5098 healthy volunteers, assessed with standardized methods at 39 centers worldwide [21]).

Meta-analyses suggest that cortical thinning is prominent in the left superior and medial temporal lobes, including fusiform gyrus, insula, inferior and medial frontal regions [21, 22]. Similar but less prominent cortical thinning in the frontotemporal areas is also present in clinical high-risk populations and those who convert to schizophrenia [23]. Despite widespread cortical thinning in patients, it does not uniformly affect every brain region. For instance, when accounting for individual variations in global mean cortical thickness, individuals with schizophrenia exhibit a thicker cortex in some brain regions (such as the superior parietal cortex, precuneus, and paracentral lobule) compared to healthy volunteers [21]. However, the volumetrics of these regions cannot reliably be used, on their own, for diagnosis of schizophrenia, due to large interindividual variability (overlap between patients and controls) and limited anatomical consistency of the measures. Moreover, schizophrenia patients show larger within-group cortical variation compared to controls [24, 25]. This increased variability may partly stem from disease-related factors, such as illness duration and the severity of negative symptoms [23, 26, 27]. Other factors, such as antipsychotic exposure, correlate with thinner cortex in various brain regions, including the fusiform, temporal, and frontal lobes, independent of negative symptoms [21]. It is important to note that patients with more severe illness are often those who have been medicated for longer durations. Consequently, medication status is confounded with other factors that influence the severity of the disease, such as disease duration, substance abuse, and childhood trauma.

The cortical thickness reduction pattern of patients may not be random but instead follows an organization aligned with cortical networks that interconnect various regions [28]. This pattern differs significantly from that observed in healthy controls, particularly in the distribution, extent, and progression of grey matter reduction. Patients with first-episode psychosis (FEP) exhibit reduced structural covariance between regions of cortical thinning compared to healthy controls, indicating disrupted early network organization. As the illness progresses, patients with chronic schizophrenia show increased structural covariance among the affected regions, suggesting that cortical thinning may propagate along interconnected brain networks over time. These deviations clearly diverge from the characteristic patterns associated with typical aging [19, 29,30,31]. Overall, the findings support a neurobiological basis for schizophrenia, marked by progressive grey matter loss and individual variability.

Recent advances in machine learning have led to limited but growing efforts directed toward diagnosing the disorder based on structural brain MRI (sMRI) images. Multivariate pattern recognition studies showed that the accuracy rate of structural data for diagnosing schizophrenia is approximately 78% with a sensitivity of 76.42% (95% CI: 71.9–80.4%) and specificity of 79.01% (95% CI: 74.6–82.8%) [32]. The reported accuracy rates for sMRI-based ML models range from 73.6 to 83.13% over the last decade of research [33, 34]. With the transition from traditional models to deep learning and access to larger training datasets, diagnostic accuracy has approached or exceeded 85%. The best-reported results are often achieved using 3D convolutional neural network (3D-CNN) models, with overall accuracy ranging from 86.7 to 87.2%, sensitivity from 90 to 92%, and specificity from 85 to 87.4% [35, 36]. Furthermore, ensemble methods, which combine the results of several machine learning models to achieve a consensus, can help to improve the classification performance relative to using a single classifier model. A recently reported ensemble model reached 92.2% accuracy, 94.4% sensitivity, and 90% specificity with a lightweight 3D CNN model ensembled with a bagging classifier [37]. However, the reported findings raise concerns due to the effect sizes, small sample sizes, and limited testing on out-of-domain data from different sites and scanners, particularly as some subsets of sites and scanners appear to contribute disproportionately to the variability in results [38]. These factors suggest that the reported accuracy metrics may be overly optimistic, as accuracy may decline when models are applied to multisite data.
Moreover, ML algorithms may struggle to achieve the desired accuracy for schizophrenia when challenged by other neurodegenerative diseases, such as behavioral-variant frontotemporal dementia (bvFTD) or Alzheimer’s disease, due to the overlap in neuroanatomical patterns [39]. Therefore, at its best, artificial intelligence (AI) or ML algorithms can achieve diagnostic accuracy with MRI images ranging from 80 to 92%.

According to the U.S. Food and Drug Administration and National Institutes of Health Biomarkers Working Group, a diagnostic biomarker should have a sensitivity of 80% in detecting a particular psychiatric disorder, a specificity of 80% in distinguishing this disorder from other psychiatric disorders, and a positive predictive value that approaches 90% [40]. As a result, structural MRI data barely meet the criteria needed to serve as a diagnostic biomarker with the methods tested so far. Furthermore, this approach assumes that schizophrenia is a single disease with near-uniform pathology among patients. However, the diverse clinical presentations and variability in brain structure observed in schizophrenia underscore the need to subtype patients based on their symptoms and neuroanatomical features.
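The sensitivity, specificity, and positive predictive value (PPV) thresholds above are not independent: for fixed sensitivity and specificity, PPV depends on how common the disorder is among the people tested. The following is a minimal sketch of that relationship using Bayes' rule. The two prevalence values are hypothetical (a balanced case-control sample and a setting where 1 in 10 tested individuals has the disorder) and do not come from the cited guidance.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule: P(disorder | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalences, chosen only for illustration.
for prevalence in (0.5, 0.1):
    ppv = positive_predictive_value(0.80, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f} -> PPV={ppv:.2f}")
```

Under these assumptions, a test with 80% sensitivity and 80% specificity yields a PPV of 0.80 at 50% prevalence but only about 0.31 at 10% prevalence, which is one reason accuracy figures from balanced research samples may not carry over to clinical use.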

Validation of clinical subgroups with neuroimaging

Factor-analytic studies of clinical symptom assessments have consistently supported a three-dimensional model of schizophrenia symptoms comprising negative, positive, and disorganized dimensions [41]. This widely accepted model in clinical practice provides a useful top-down approach for identifying more homogeneous subgroups with distinct neural signatures. Zhang et al. [42] used this factor analysis model in combination with Optimally-Discriminative Voxel-Based Analysis (ODVBA), a method that, they argued, offers higher sensitivity and specificity than conventional voxel-based morphometry (VBM) [43]. ODVBA’s spatially adaptive technique accounts for the brain’s spatial interrelatedness, allowing for the detection of subtle structural variations in schizophrenia subgroups. Those with predominantly positive symptoms (pPOS) showed significant gray matter (GM) volume reductions in the ventromedial prefrontal cortex (vmPFC), occipitotemporal regions, and portions of the lingual gyrus. Patients with predominantly disorganized symptoms (pDIS) exhibited the highest degree of preservation of GM volume among the three subtypes. The study found that patients with predominantly negative symptoms (pNEG) had lower GM volume in the cerebellum and the vmPFC. However, the vmPFC was relatively preserved in these patients compared to other subtypes. Several other studies found a negative association between superior temporal cortex thickness (including Heschl’s gyrus) and positive symptoms [27, 44, 45]. While these findings highlight the structural heterogeneity of schizophrenia and suggest that each subgroup may correspond to distinct neuroanatomical signatures, several factors hinder the practical application of this approach. These include the absence of clear boundaries between symptom-based subtypes in clinical practice and the possibility that one type of symptom may not reliably predict the later emergence of others [5, 46].

Deficit and non-deficit schizophrenia

In the effort to reduce heterogeneity in schizophrenia, subgrouping as deficit and non-deficit appeals to many clinicians. This is because deficit schizophrenia aligns well with the classical Kraepelinian description of the disorder, while ‘non-deficit’ patients encompass the remaining cases. Deficit patients generally show poor social functioning and limited response to treatment, suggesting that widespread brain structural alterations may underlie their unfavorable clinical outcomes [10, 11]. Banaj et al. [8], using global meta- and mega-analyses in a large international sample, observed that deficit schizophrenia (DSZ) patients exhibited more pronounced cortical thinning than non-deficit schizophrenia (NDSZ) patients, particularly in the right frontoparietal cortex, with small but consistent effect sizes (Cohen’s d ≈ –0.13 to –0.20) in key regions such as the right inferior parietal lobule, caudal middle frontal gyrus, and superior frontal gyrus, although these differences did not reach statistical significance. The authors suggested that clinical heterogeneity within the NDSZ group may have reduced the observed effects. These findings are in line with other schizophrenia studies linking negative symptoms to reduced cortical thickness [27, 47, 48], although contrasting findings are also present [49, 50]. One study found that over five years, DSZ patients showed greater lateral ventricle expansion than NDSZ patients, with moderate effect sizes (η² = 0.089 left, 0.102 right), indicating more pronounced grey matter loss [9]. However, some direct comparisons between DSZ and NDSZ patients have reported larger lateral ventricles and more pronounced cortical reductions in NDSZ patients [51,52,53]. While these studies suggest that DSZ and NDSZ may reflect distinct neurological differences among phenotypically diverse schizophrenia subtypes, a consensus on the neuroanatomical differentiation between DSZ and NDSZ patients has yet to be reached.
The inconsistencies across the studies might indicate that the relationship between phenotype and brain structure is more complex than previously understood.
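Several of the group differences in this review are reported as Cohen’s d. As a reminder of what that number measures, the following minimal sketch computes the common pooled-standard-deviation form of d. The thickness values are hypothetical and chosen only for illustration, and the cited studies may have used covariate-adjusted variants of the statistic.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical regional cortical thickness values in mm (not from any cited study).
deficit = [2.40, 2.45, 2.50, 2.55, 2.60]
non_deficit = [2.45, 2.50, 2.55, 2.60, 2.65]
print(round(cohens_d(deficit, non_deficit), 2))
```

A negative d here means the first group has the thinner cortex. For these toy values d is about -0.63, several times larger than the d ≈ -0.13 to -0.20 range reported for the deficit versus non-deficit comparison, which gives a sense of how much the two groups overlap in the published data.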

Cognitive deficit and cognitively spared schizophrenia

Another approach to reducing phenotypic heterogeneity in schizophrenia is the delineation of two potential subgroups: cognitive deficit (CD) and cognitively spared (CS) patients, based on performance across multiple cognitive domains [54, 55]. Multiple studies have shown that CD patients tend to exhibit impairments across all cognitive domains, are more likely to be male, have an earlier onset of illness, and suffer from greater functional disability. Gould et al. [56] applied support vector machine classification to differentiate healthy controls from schizophrenia patients and to further distinguish between the CD and CS subgroups using MRI data. Cognitive subgroups were distinguished from healthy controls with an accuracy of up to 72%. Classification accuracy between the cognitive subgroups was relatively low at 60% without stratification, but it improved to 83% for females when the analysis was stratified by sex. Thus, while there is significant overlap in disease patterns between cognitive subgroups, sex-related differences in brain organization appear to play a role in improving the classification of schizophrenia subtypes based on neuroanatomical features. In another study, researchers used cluster analysis to classify schizophrenia patients based on differences between their current and premorbid IQ [57]. They found that patients with relatively preserved IQ showed reductions primarily in the inferior parietal lobe. However, as the IQ difference increased in the moderate to severe groups, more extensive cortical reductions were observed, particularly in the insula, temporal and frontal cortices, as well as in medial cortical structures, in addition to the parietal lobe. Additionally, higher negative symptom scores were associated with greater IQ decline. Thus, the loss of cognitive abilities in patients is associated with a more widespread reduction in the cortex and the medial temporal lobe.
This finding suggests that the severity of cognitive symptoms is associated with more pronounced structural brain abnormalities, although other factors, such as sex, may also interact with cognition to influence brain structure.

The bipolar-schizophrenia network intermediate phenotype (B-SNIP) and associated studies

B-SNIP represents one of the largest (711 patients, 883 first-degree relatives and 278 healthy controls) and most influential studies for subtyping psychosis [13]. Even though B-SNIP includes schizoaffective and psychotic bipolar patients, not just patients with schizophrenia, many of its conclusions are valuable for understanding the subtyping of schizophrenia. The initial objective of the study was to identify biomarkers for traditional psychosis diagnoses; however, no specific biomarker or set of biomarkers could be identified for psychosis. Consequently, researchers shifted focus towards using biomarker data to identify distinct subtypes within psychosis characterized by shared neurobiological features, irrespective of traditional DSM diagnostic categories [13]. Nine composite variables, such as cognitive control and sensorimotor reactivity, were used to derive biomarkers for subsequent clustering analyses. The study employed unsupervised clustering methods (k-means) to identify subtypes, or “biotypes,” based on neurobiological markers. The optimal number of biotypes was determined to be three, using the “gap statistic” and two-step clustering methods.
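For readers unfamiliar with how a cluster number is chosen, the following is a minimal sketch of k-means combined with the gap statistic. It is not the B-SNIP pipeline: the data are synthetic two-dimensional points rather than nine biomarker composites, the k-means uses a deterministic farthest-point initialization chosen here only for reproducibility, and all function names and parameter values are assumptions for illustration. The gap statistic compares the within-cluster dispersion of the real data with that of structureless reference data drawn uniformly over the same range, and selects the smallest k whose gap is not meaningfully exceeded by the gap at k + 1.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd k-means with farthest-point initialization (illustrative choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2_all = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2_all.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


def within_dispersion(X, labels):
    """Sum of squared distances of points to their own cluster mean."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))


def gap_statistic(X, k_max=6, n_refs=10, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, kmeans(X, k)[0]))
        ref_log_w = []
        for _ in range(n_refs):
            R = rng.uniform(lo, hi, size=X.shape)  # structureless reference data
            ref_log_w.append(np.log(within_dispersion(R, kmeans(R, k)[0])))
        ref_log_w = np.array(ref_log_w)
        gaps.append(ref_log_w.mean() - log_w)
        errs.append(ref_log_w.std() * np.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


# Synthetic demo: three well-separated groups of 50 hypothetical "participants".
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_demo = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centers])
best_k, gaps = gap_statistic(X_demo)
print("selected number of clusters:", best_k)
```

On clearly separated synthetic groups such as these the procedure recovers the number of groups; on real biomarker data the gap curve is usually much flatter, which is why the choice of k is often cross-checked with a second method, as in the study described above.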

Biotype 1 was characterized by the highest inhibition errors, the lowest brain responses to auditory stimuli, exaggerated responses to repeated auditory stimuli, sluggish reactions to sensory inputs, and poor detection of critical stimuli. Biotype 2 showed moderate impairment in cognitive control, falling between Biotypes 1 and 3, but had heightened sensorimotor reactivity, including robust neural responses to auditory inputs and high levels of intrinsic neural activity, while still maintaining intact target detection. Biotype 3, despite a psychosis diagnosis, displayed no significant deficits in cognitive control compared to healthy controls but exhibited modest deviations in sensorimotor reactivity. Biotype 1 was associated with more cases of schizophrenia (59%), while Biotype 3 had a higher representation of bipolar disorder with psychosis (44%). Schizoaffective disorder, however, was fairly evenly distributed across all biotypes. Social functioning was lowest in Biotype 1 and highest in Biotype 3. Biotypes 1 and 2 had similar Positive and Negative Syndrome Scale (PANSS) scores, whereas Biotype 3 had lower scores, indicating that it was clinically less severe and more socially active. MRI data revealed gray-matter reductions across all three biotypes. Biotype 1 showed the largest deficits (e.g., frontal Cohen’s d ≈ −1.0; cingulate d ≈ −0.7 to −1.0; temporal d ≈ −0.9), extending to the thalamus and basal ganglia (d ≈ −0.6 to −1.1). Biotype 2 displayed intermediate losses (most regions d ≈ −0.6 to −0.8), while Biotype 3 exhibited milder yet still notable reductions (d ≈ −0.3 to −0.5), concentrated in anterior–limbic areas. Relatives showed a parallel but attenuated pattern (d ≈ −0.2 to −0.4), indicating that gray-matter differences extend beyond patients themselves.
Although biotypes initially seem to represent a gradient of clinical syndrome severity or align more closely with schizophrenia, the observed differences cannot be attributed to disease severity, as Biotypes 1 and 2 exhibited similar PANSS scores. Notably, Biotype 2 stands out due to its heightened sensorimotor reactivity. Deficit/non-deficit subgroups are also not accurate for understanding biotypes in B-SNIP because, in the group with the most severe GM loss (Biotype 1), more than 50% of the population were non-deficit patients (Fig. 2).

Fig. 2

Psychotic patients show widespread neocortical volume reduction compared to healthy controls in Biotype 1 (left), substantial fronto-temporal reduction in Biotype 2 (middle), and localized limbic reductions in Biotype 3 (right) [13].

Xiao et al. [58] aimed to identify discrete subtypes in first-episode schizophrenia (FES) patients based on structural brain features and to determine if these patterns were similar to those observed in midcourse schizophrenia patients from the B-SNIP study. In other words, they tried to replicate the B-SNIP findings in FES. Schizophrenia subtyping based on structural brain alterations was conducted using a density peak-based clustering (DPC) algorithm. Compared to the k-means method used in the original study, DPC offers advantages for datasets with complex structures or unknown cluster numbers, and it is more robust to noise and outliers due to its reliance on density peaks. DPC identified three reliable neurobiological subtypes of FES patients, replicating the cluster number of the original B-SNIP study. FES patients in Subtype 1 showed decreased surface area, thickness, and volume, mainly in the cortical-thalamic-cortical circuitry, with increased thickness in the left rostral anterior cingulate gyrus. In contrast, Subtypes 2 and 3 showed no significant cortical or subcortical alterations compared to controls. Midcourse schizophrenia patients from the B-SNIP study showed more severe GM loss compared to FES patients. In this cohort, Subtype 1 exhibited widespread GM deficits across all lobes, as well as in the insular cortex and bilateral hippocampus, with increased GM volume in the bilateral pallidum (not observed in the original B-SNIP study). Subtype 2 showed reduced GM volume in the left hippocampus, while Subtype 3 showed no significant brain alterations. Notably, there were no significant differences in PANSS scores across the three subtypes in either the FES or midcourse schizophrenia cohorts. While the basic structure of subgroup classification remained consistent in later-course patients, differences between the two patient samples (FES and midcourse) suggest that certain neuroanatomical features may be influenced by illness progression or antipsychotic treatment in B-SNIP biotyping.
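To make the idea of density peak-based clustering concrete, the following is a minimal sketch of the general technique (in the style of Rodriguez and Laio’s density-peaks method), not the implementation used by Xiao et al. Each point receives a local density and a distance to the nearest point of higher density; cluster centers are points that score high on both, and every other point inherits the label of its nearest denser neighbor. For brevity the sketch takes the number of clusters as an input, whereas in practice the centers are usually read off a decision graph. The function name, the Gaussian density kernel, the cutoff percentile, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def density_peak_clustering(X, n_clusters, dc_percentile=2.0):
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Cutoff distance: a low percentile of all pairwise distances (illustrative choice).
    d_c = np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile)
    # Local density with a Gaussian kernel, excluding each point's own contribution.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # indices from densest to sparsest
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], nearest_denser[i] = dist[i, j], j
    # Centers: largest product of density and distance to a denser point.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:  # densest first, so the denser neighbor is already labeled
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers


# Synthetic demo: three groups of 50 hypothetical feature vectors.
rng = np.random.default_rng(7)
group_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_dpc = np.vstack([m + 0.5 * rng.standard_normal((50, 2)) for m in group_means])
dpc_labels, dpc_centers = density_peak_clustering(X_dpc, n_clusters=3)
print(np.bincount(dpc_labels))
```

Because labels propagate from dense regions outward, isolated points far from any dense core do not pull cluster centers toward themselves the way they can in k-means, which is the robustness property mentioned above.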

Zhao et al. [59] investigated brain structural alterations in antipsychotic-treated patients, building on the subtype findings from drug-naïve FES patients in the previous study by Xiao et al. [58]. They performed principal component analysis (PCA) on neuroanatomical features in never-treated patients to generate principal components (PCs) and corresponding loadings, which were then applied to the antipsychotic-treated sample to produce predicted PCs for each participant. Pearson’s correlation coefficient was used to measure the similarity between each treated patient’s brain morphology and the predefined subtype vectors from the never-treated sample. In other words, they used PCA on drug-naïve patients to create a reference model of illness-related brain features that is free of treatment effects. Applying this model to treated patients allowed them to parse out which features were likely due to the illness and which might be due to medication or chronic institutionalization, and thus to disentangle the neuroanatomical effects of treatment from those of the disorder itself. Despite comparable PANSS scores and chlorpromazine equivalent doses across the three subtypes, each demonstrated distinct GM patterns. Subtypes 1 and 2 exhibited severe and widespread cortical thickness deficits, while Subtype 3 displayed thicker cortices in the parietal and occipital regions. All subtypes showed decreased volumes in the bilateral thalamus and the left nucleus accumbens. Subtype 3 also showed reductions in the bilateral hippocampus, amygdala, and right nucleus accumbens, suggesting more pronounced subcortical deficits. It should be noted that Subtype 1 showed an increased globus pallidus volume, which was not observed in the other subtypes. Although Zhao et al. [59] demonstrated that three subtypes could be identified in antipsychotic-treated patients, the overlap in structural findings between this study and the B-SNIP midcourse patients in Xiao et al. [58] is limited.
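The reference-model logic described above can be sketched in a few lines. This is an illustration of the general approach under stated assumptions, not the authors’ code: PCA is fitted on a reference (never-treated) sample, the same mean and loadings are applied to a new (treated) sample, and each new participant’s component scores are correlated with one vector per reference subtype. Defining the subtype vector as the mean score profile of that subtype, and assigning each participant to the most correlated subtype, are assumptions made here, as are the function names and the synthetic data.

```python
import numpy as np


def fit_pca(X_ref, n_components):
    """Return the reference mean and loadings (n_components x n_features)."""
    mean = X_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, loadings):
    """Apply the reference mean and loadings to any sample."""
    return (X - mean) @ loadings.T


def assign_by_correlation(scores, subtype_vectors):
    """Pearson r between each participant's scores and each subtype vector."""
    n = len(scores)
    r = np.corrcoef(np.vstack([scores, subtype_vectors]))[:n, n:]
    return r.argmax(axis=1), r


# Synthetic demo: three hypothetical subtype patterns over six features.
rng = np.random.default_rng(0)
prototypes = 2.0 * np.eye(3, 6)
true_ref = np.repeat(np.arange(3), 20)       # reference ("never-treated") labels
X_ref = prototypes[true_ref] + 0.1 * rng.standard_normal((60, 6))
true_treated = np.repeat(np.arange(3), 5)    # labels used only to check the demo
X_treated = prototypes[true_treated] + 0.1 * rng.standard_normal((15, 6))

mean, loadings = fit_pca(X_ref, n_components=4)
ref_scores = project(X_ref, mean, loadings)
subtype_vectors = np.array([ref_scores[true_ref == k].mean(axis=0) for k in range(3)])
assigned, r = assign_by_correlation(project(X_treated, mean, loadings), subtype_vectors)
print(assigned)
```

The key design point is that the treated sample never influences the components: its scores are computed with the reference mean and loadings, so any systematic departure of treated patients from the reference subtype profiles can be examined separately from the subtype definition itself.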

Imaging data-driven subtyping

In recent years, there has been increasing interest in subtyping schizophrenia using data-driven brain imaging biomarkers. A key advantage of this approach is that it essentially minimizes dependence on clinical features, which can be influenced by factors such as the psychiatrist’s assessment, medication history, and, most critically, the fluctuating nature of symptoms over time [5, 60]. Therefore, image data-driven approaches hold the potential for a more precise definition of objective subtypes that can be applied across different sites and cohorts, ideally leading to the identification of more homogeneous subgroups. By employing a data-driven approach, it is possible to decompose the variability within a population into consistent brain patterns associated with specific clinical and cognitive measures.

Many studies use cross-sectional data from one or more centers to identify these biomarkers, providing a snapshot of brain structure at a single time point. In contrast, longitudinal data offer valuable insights into the progression of brain changes over time, potentially revealing more dynamic and temporally sensitive subtypes. To date, however, no longitudinal data have been published. Multi-center studies face challenges such as variations in imaging protocols and differences in scanner types. To address these issues, techniques such as ComBat [61] and other harmonization methods are often applied; however, they can influence the outcomes of clustering analyses [62]. The choice of clustering algorithm is another key factor, as different methods can lead to the identification of varying subgroup structures depending on the nature of the data being analyzed. The validation process also differs across studies. While some use independent datasets for cross-validation, others focus on internal validation within their sample set. Robust validation strategies are essential for ensuring the reproducibility and generalizability of the identified subtypes. Based on these principles, new studies have emerged in recent years (Table 1).
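To illustrate what site harmonization does to the data that are later clustered, the following is a deliberately simplified sketch. It is not ComBat: ComBat additionally preserves covariate effects (such as age, sex, and diagnosis) and uses empirical Bayes shrinkage of the site parameters. The sketch only removes per-site differences in mean and spread for each feature. The function name and the synthetic two-site data are assumptions for illustration.

```python
import numpy as np


def harmonize_location_scale(X, site):
    """Align each site's per-feature mean and SD to pooled values (simplified, not ComBat)."""
    X = np.asarray(X, dtype=float)
    site = np.asarray(site)
    grand_mean = X.mean(axis=0)
    # Pooled within-site SD, computed from residuals around each site's mean.
    residuals = np.empty_like(X)
    for s in np.unique(site):
        rows = site == s
        residuals[rows] = X[rows] - X[rows].mean(axis=0)
    pooled_sd = residuals.std(axis=0)
    out = np.empty_like(X)
    for s in np.unique(site):
        rows = site == s
        z = residuals[rows] / X[rows].std(axis=0)
        out[rows] = z * pooled_sd + grand_mean
    return out


# Synthetic demo: site B has a hypothetical scanner offset of 0.8 on every feature.
rng = np.random.default_rng(1)
site = np.repeat(["A", "B"], 40)
X_sites = rng.standard_normal((80, 3)) + np.where(site[:, None] == "B", 0.8, 0.0)
X_harmonized = harmonize_location_scale(X_sites, site)
print(X_harmonized[site == "A"].mean(axis=0) - X_harmonized[site == "B"].mean(axis=0))
```

The sketch also shows why harmonization can affect clustering: if a biological subtype happens to be more common at one site, removing that site’s mean offset removes part of the subtype signal along with the scanner effect.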

Table 1 Studies* using Machine Learning Algorithms for Subtyping Schizophrenia Patients.

In one of the early studies in this field, Dwyer et al. [63] tested the hypothesis that subtyping schizophrenia could enhance the accuracy of computer-aided discrimination between patients and controls. In a small group of patients (N = 71), they applied an unsupervised, data-driven clustering approach to sMRI data using the fuzzy c-means (FCM) algorithm. This method organizes data into clusters without prior knowledge of group membership, allowing for fuzzy boundaries between clusters. A consensus-based clustering technique was used to ensure stability and reliability, running the FCM algorithm multiple times to identify the final subgroup partitioning based on consensus. The number of clusters was limited to two due to the small sample size. Indeed, subtyping improved diagnostic accuracy in external validation samples with an average gain of 9%. Subtype 1 was characterized by cortical and subcortical volume reductions, including the insula, striatal, thalamic, hippocampal, and right-hemispheric superior temporal regions, alongside volume increases in the medial and lateral parietal lobes. Patients in this subtype had longer illness duration, a later age of onset, and more hospitalizations. Subtype 2, in contrast, was primarily defined by cortical reductions, particularly in the lateral prefrontal, medial parietal, and temporal cortices, along with volume increases in cerebellar structures. Patients in this subtype had a shorter illness duration and exhibited symptoms of conceptual disorganization, hallucinations, and hyperthymic tendencies. In this study, illness duration emerged as the key factor for subgroup separation (predictive accuracy of 81.2% using the top 10 variables), correlating with negative symptoms but not necessarily with widespread cortical differences, as certain areas, like the temporoparietal junction (TPJ), showed increased volume.
These findings do not clearly support the presence of distinct subtypes in schizophrenia, as the duration of illness might be a key factor influencing both symptoms and brain structure rather than distinct subgroups.
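The idea of fuzzy c-means with consensus over repeated runs can be sketched as follows. This is an illustrative toy, not the pipeline of Dwyer et al.: the consensus rule (majority co-assignment with the first subject, valid only for two clusters), the parameter values, and the function names are assumptions made for the example.

```python
import random

def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means weighted by membership raised to the fuzzifier m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        for i in range(n):
            dist = [max(1e-12, sum((points[i][d] - c[d]) ** 2
                                   for d in range(dim)) ** 0.5) for c in centers]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u

def consensus_two_clusters(points, n_runs=10):
    """Repeat FCM with different seeds and keep the majority co-assignment."""
    n = len(points)
    together = [0] * n  # runs in which subject j shares a cluster with subject 0
    for run in range(n_runs):
        _, u = fuzzy_c_means(points, seed=run)
        hard = [row.index(max(row)) for row in u]
        for j in range(n):
            if hard[j] == hard[0]:
                together[j] += 1
    return [0 if together[j] * 2 >= n_runs else 1 for j in range(n)]

# Toy data: two well-separated groups in a two-feature space.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels = consensus_two_clusters(toy)
```

Comparing subjects by co-assignment rather than by raw cluster index avoids the label-switching problem, since the numbering of clusters is arbitrary in each run.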

In another single-center study involving 157 patients with schizophrenia and 169 controls, Honnorat et al. [64] used a semi-supervised clustering method, CHIMERA (Clustering of Heterogeneous Disease Effects via Distribution Matching of Imaging Patterns), to identify schizophrenia subtypes [65]. Instead of clustering the patients directly, CHIMERA clusters the differences between patients and a demographically matched group of healthy controls, focusing specifically on disease-related neuroanatomical alterations. The authors integrated demographic variables such as age, sex, and height directly into the CHIMERA model’s optimization function to match patients and controls, thereby minimizing the bias introduced by these factors. The method determines a set of transformations that deform the control group’s neuroanatomy to match that of the patients, representing the disease’s varying effects on the brain. Each patient is then associated with a linear combination of these disease subtypes, and a discrete clustering is achieved by retaining only the most significant influence on each patient’s brain structure.

The study identified three subgroups of patients with schizophrenia, each characterized by distinct neuroanatomical patterns and demographic traits. Group 1 showed significant GM atrophy in the thalamus, anterior cingulate cortex, and superior temporal gyrus, along with increased cerebrospinal fluid (CSF) and white matter expansion in temporal regions, predominantly affecting older males (86.7%) with longer disease duration and stronger positive symptoms. Group 2 exhibited frontal CSF expansion and volumetric reductions in thalamic, peri-Sylvian, and cerebellar regions, especially in white matter, with a younger male demographic (82.2%) and similar disease duration to Group 1 (suggesting earlier onset), though their positive symptoms did not differ from those of Group 1. Group 3 presented a mix of the neuroanatomical features seen in Group 1 and Group 2, with milder CSF expansions and a more balanced sex distribution (55.8% male), lower brain volumes, and significantly lower education levels, particularly among female patients. The results of this study suggest that structurally different subgroups might be present in schizophrenia but cannot be expressed with a simple discriminative approach. Unlike previous studies that primarily focus on gray matter (GM), this study utilizes CHIMERA and incorporates both gray and white matter (WM) features. Given that myelin integrity is known to be compromised in schizophrenia, potentially reducing the contrast between GM and WM, this inclusion is particularly relevant. Indeed, the thalamic GM reduction was almost completely counterbalanced by an increase in white matter volume. This could indicate a change in tissue contrast in the thalamic region that does not necessarily reflect volumetric change but is rather associated with a degree of (de)myelination. The authors demonstrate that excluding WM features results in lower silhouette scores and reduced cluster stability, indicating that WM contributes unique variance beyond GM alone.
Consequently, the current approach produces distinct clusters of differences among subgroups that may diverge from those identified in other studies. These findings underscore the need for future research to jointly examine both gray and white matter, while carefully addressing contrast-related challenges in neuroimaging analyses.

Shi et al. [66] identified two subtypes of schizophrenia using Tensor-Based Morphometry (TBM) in combination with W-scores, which represent individual deviations from a normative brain structure model. Using a large multi-site cohort across seven centers (Table 1), they constructed the normative model from healthy control data, correcting for confounding variables such as age, sex, and scanner/site differences. This process generated individualized W-score maps that quantified disorder-related brain abnormalities while minimizing demographic and technical variability. Rather than clustering on raw MRI-derived morphometric values, which embed normal inter-individual differences, the study focused on residuals—specifically atypical TBM patterns that reflect deviations from the normative model. These residual-based inputs enabled identification of biologically meaningful subtypes independent of clinical symptoms, avoiding confounds unrelated to the disease itself. This approach not only improves the precision of subtype classification based solely on brain structure differences but also enhances sensitivity to clinically relevant neurobiological patterns. They used sparse K-Means clustering, which has advantages such as a feature selection step and being better suited for high-dimensional data (like brain MRI) compared to traditional K-Means. They validated the subtypes using an independent dataset scanned with General Electric scanners, confirming results similar to those found in the discovery dataset, which was scanned with Siemens scanners. The identified subtypes have lower TBM values (more tissue loss) in the ventral caudate and cortical areas like the superior and middle frontal gyri, cingulate gyrus, superior temporal gyrus, fusiform gyrus, and insular gyrus. They differed: Subtype 1 showed lower values in the superior and middle frontal regions when compared to Subtype 2, while Subtype 2 had lower values in the middle temporal gyrus. 
Subtype 1 had a larger globus pallidus and putamen but smaller amygdala and hippocampus, whereas Subtype 2 showed normal globus pallidus and putamen but more widespread deformities in the hippocampus, amygdala, and thalamus. Subtype 1 had, on average, a smaller cerebellum, while Subtype 2 showed cerebellar deformities in opposite directions. These findings could account for the inconsistent results observed in previous studies of the cerebellum and striatal regions, which may have disproportionately included one subtype. On the other hand, other regions shared between the subtypes, such as the prefrontal cortex, cingulate, insula, and fusiform, are consistently reported in previous studies [67, 68]. Compared to Subtype 1, Subtype 2 had higher PANSS negative symptom scores (d = 0.2), while there were no differences in the other subscales. A partial least-squares correlation analysis relating individual PANSS scores to the subtype-specific W-score deviation map identified a first latent component, dominated by negative-symptom loadings, that explained 39.26% of the shared covariance between negative symptoms and neuroanatomical heterogeneity.
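The W-score logic described above can be illustrated with a minimal sketch: a normative model is fitted in healthy controls, and each patient's deviation from the model prediction is divided by the residual standard deviation of the controls. For brevity, the sketch assumes a single covariate (age) and a single regional measure; the study's model also corrected for sex and scanner/site, and the function names and toy numbers here are illustrative assumptions.

```python
from statistics import mean

def fit_normative_model(ages, values):
    """Least-squares fit of a regional measure on age in healthy controls.

    Returns (intercept, slope, residual_sd).
    """
    a_bar, v_bar = mean(ages), mean(values)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (v - v_bar) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = v_bar - slope * a_bar
    residuals = [v - (intercept + slope * a) for a, v in zip(ages, values)]
    # Residual SD with n - 2 degrees of freedom (two fitted parameters).
    residual_sd = (sum(r ** 2 for r in residuals) / (len(ages) - 2)) ** 0.5
    return intercept, slope, residual_sd

def w_score(age, value, model):
    """Deviation from the normative prediction in units of control residual SD."""
    intercept, slope, residual_sd = model
    return (value - (intercept + slope * age)) / residual_sd

# Toy control data: the regional measure declines with age.
control_ages = [20, 30, 40, 50, 60]
control_values = [10.2, 9.1, 8.3, 6.8, 6.1]
model = fit_normative_model(control_ages, control_values)
patient_w = w_score(35, 7.0, model)  # negative: below the value expected at age 35
```

Clustering is then run on such W-scores rather than on raw values, so that two patients of different ages are compared by how atypical they are for their age instead of by absolute tissue measures.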

The PHENOM (‘Psychosis Heterogeneity Evaluated via Dimensional Neuroimaging’) consortium sought to identify neuroanatomical subtypes by applying a semi-supervised machine learning method termed HYDRA (Heterogeneity Through Discriminative Analysis) [69]. Unlike traditional approaches that cluster patients based on similarities, HYDRA focuses on identifying disease effects by modeling differences from healthy controls (a case–control subtraction approach), helping to isolate true disease subtypes while minimizing confounding influences like age, sex, scanner differences, and ethnicity. It uses a semi-supervised learning approach, performing classification and clustering simultaneously by separating patients from healthy controls with linear maximum-margin classifiers and associating patient subtypes with different faces of a polytope (hyperplanes). This contrasts with fully supervised methods like support vector machines, which cannot differentiate between patient subtypes. HYDRA’s comparison of patients to healthy controls makes it more robust against irrelevant inter-individual variations, enhancing the focus on disease-specific pathology. Additionally, HYDRA emphasizes reproducibility, using extensive validation methods such as permutation tests, split-sample analyses, and leave-one-site-out validation, ensuring consistent and reliable identification of subtypes across datasets. These characteristics make HYDRA particularly effective in uncovering distinct neuroanatomical subtypes in complex, heterogeneous conditions such as schizophrenia. The consortium identified two subtypes replicated in later studies by their research group and others (Fig. 3) [70, 71]. 
Subtype 1 comprised 63% of the sample and was characterized by widespread cortical gray matter (GM) reductions, with the most pronounced atrophy observed in the thalamus (Cohen’s d ≈ –0.6), nucleus accumbens (d ≈ –0.55), medial temporal regions (d ≈ –0.5), medial prefrontal cortex (d ≈ –0.45), and insular cortex (d ≈ –0.5). In contrast, Subtype 2 (37% of the sample) exhibited a largely preserved brain anatomy except for significantly larger GM volumes in the basal ganglia, particularly in the pallidum (d ≈ +0.4), putamen (d ≈ +0.45), and parts of the caudate nucleus (d ≈ +0.35). These effect sizes, though moderate, underscore distinct and non-overlapping neuroanatomical profiles between the subtypes, supporting the notion of underlying pathophysiological heterogeneity in schizophrenia. Sex-specific analyses, antipsychotic dose adjustment and restriction analyses to patients with less than 2 years of duration did not change the findings. Duration of illness is associated with GM reduction only in Subtype 1. Both subtypes have similar positive and negative symptoms. This study clearly demonstrated that schizophrenia patients can be neuroanatomically subtyped independently of clinical symptoms, and that gray matter reduction might not apply uniformly to all patients.
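The polytope geometry underlying HYDRA can be sketched with the assignment rule alone: controls lie inside a region bounded by several hyperplanes, and each patient is assigned to the face that separates them most strongly from the controls. The hyperplane weights below are hand-set, hypothetical values over two illustrative features; in HYDRA they are learned with maximum-margin classifiers, and that training procedure is omitted here.

```python
def hydra_style_assign(features, hyperplanes):
    """Return the index of the best-separating face, or None if control-like.

    `hyperplanes` is a list of (weights, bias) pairs. A subject is treated as
    control-like when every face gives a non-positive score, i.e. when the
    subject lies inside the polytope.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, features)) + b
              for w, b in hyperplanes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > 0 else None

# Hypothetical features: (thalamic GM z-score, pallidal GM z-score).
# Face 0 responds to reduced thalamic GM, face 1 to enlarged pallidal GM.
faces = [((-1.0, 0.0), -1.0), ((0.0, 1.0), -1.0)]

control_like = hydra_style_assign((0.0, 0.0), faces)
reduced_thalamus = hydra_style_assign((-2.0, 0.0), faces)
enlarged_pallidum = hydra_style_assign((0.2, 1.8), faces)
```

A single linear classifier has only one such face and therefore yields a single patient-versus-control direction, which is why it cannot distinguish subtypes.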

Fig. 3: Gray matter volume differences between schizophrenia subtypes identified by HYDRA analysis.

A Subtype 1 shows widespread gray matter volume reductions, particularly in the thalamus, nucleus accumbens, medial temporal, medial prefrontal/frontal, and insular cortices. B Subtype 2 shows relatively greater gray matter volumes in the basal ganglia regions, including the pallidum, putamen, and portions of the caudate nucleus, compared to healthy controls.

The same group validated the presence of these subtypes in a large first-episode schizophrenia cohort by applying the pre-trained subtype models derived from chronic schizophrenia patients in their earlier study [71]. Subtype 1 had a higher proportion of first-episode psychosis (FEP) cases (32%) compared to healthy controls (19%), while Subtype 2 had a similar proportion in both groups (21% in FEP, 23% in controls). Clinically, Subtype 2 was linked to higher educational attainment, a higher proportion of female patients, more positive psychosis symptoms at first presentation, and a higher likelihood of symptom remission over 1-, 3-, and 5-year follow-up periods, whereas Subtype 1 was associated with more schizophrenia diagnoses and lower educational attainment. Therefore, Subtype 1 is more closely associated with progressive GM loss, greater cognitive deficits, and lower remission rates, regardless of antipsychotic dosages. Similar results were reported using K-means++ clustering in a group of institutionalized patients [72]. This analysis revealed that patients in Subtype 1, characterized by reduced volumes in multiple cortical regions (dorsomedial medial frontal areas, parietal cortex, middle and superior temporal cortex), subcortical basal ganglia, the amygdala and hippocampus, displayed poorer cognitive function compared to patients who showed increased pallidal volume and otherwise normal subcortical regions (Subtype 2) [72]. When the authors applied the classifier developed from institutionalized patients to categorize community-dwelling individuals with schizophrenia, either with long-term illness or FEP, they observed similar subcortical volume increases in long-term illness patients classified as Subtype 2. This finding suggests that increased pallidum volume is a consistent regional brain alteration observed in both institutionalized patients and community-dwelling individuals with long-term illness. However, this pattern was not observed in first-episode patients.
In contrast, drug-naïve FEP patients in Subtype 2 exhibited increased caudate and putamen volumes, suggesting that other basal ganglia nuclei are enlarged during the early stages of illness.

Based on the concept that a significant portion of the general population may carry a biological vulnerability to schizophrenia without ever developing psychosis [73], HYDRA was used to identify imaging signatures within the general population. The analysis was then applied to two independent population-level datasets: typically developing youths and youths with psychosis spectrum symptoms from the Philadelphia Neurodevelopmental Cohort (N = 359; ages 16–23 years) and adults from the UK Biobank study (N = 836; ages 44–50 years) [70]. Signature 1, which corresponds to Subtype 1 in schizophrenia patients in the previous studies, was found to be more prevalent in youths with psychosis spectrum symptoms compared to typically developing youths (40 vs. 23%), while the prevalence of Signature 2 (representing Subtype 2) was not higher in youths with psychosis spectrum symptoms (14 vs. 24%). Signature 1 was linked to poorer cognitive performance in both youths and adults, whereas Signature 2 was associated with no significant cognitive impairment. Additionally, adults expressing Signature 1 had higher schizophrenia polygenic risk scores compared to those without either signature.

Data from follow-up studies in schizophrenia indicate that at least a subset of patients follows a distinct trajectory, reflecting neuro-pathophysiological progression that differs from healthy controls [74,75,76]. However, the absence of studies with longitudinal data limits our understanding of the trajectories of potential subtypes. The SuStaIn (Subtype and Stage Inference) algorithm is a probabilistic, unsupervised machine learning approach designed to identify disease subtypes and stages of progression from large cross-sectional datasets derived from single time-point MRI scans [77]. SuStaIn clusters individuals into subtypes based on shared patterns of brain abnormalities and infers distinct trajectories of disease progression across different brain regions. By combining these two capabilities (clustering into subtypes and staging within subtypes), SuStaIn allows researchers to capture both what kind of disease a person has and how far along they are in its progression, all from cross-sectional data. These trajectories are modeled as pseudo-longitudinal sequences, representing typical patterns of disease progression inferred from the population. In doing so, SuStaIn leverages structural MRI data to reveal biologically meaningful subtypes and temporal stages, offering deeper insight into the neurobiological mechanisms underlying heterogeneous conditions such as schizophrenia. However, SuStaIn relies on cross-sectional MRI data, which may not accurately reflect the dynamic progression of brain changes over time. Additionally, SuStaIn is built on the assumption of a progressive pathological trajectory in schizophrenia, which may align more closely with a neurodegenerative model of the illness. This assumption may not fully capture the heterogeneity or neurodevelopmental aspects of schizophrenia across all patient populations. Using the SuStaIn algorithm, Jiang et al. [78] identified two neurostructural subgroups by mapping the temporal and spatial trajectories of gray matter changes in schizophrenia patients based on a cohort of 4,222 patients and 7,038 controls (Fig. 4).

Fig. 4: Distinct spatiotemporal trajectories of gray matter loss in schizophrenia.

The spatiotemporal pattern of gray matter loss displays a progressive spatial extension accompanied by later ‘temporal’ stages of pathological progression that differ between the two trajectories. Reproduced from Jiang et al. 2024, Nature Communications, licensed under CC BY 4.0.

“Trajectory 1” (62.1% of patients) exhibited an “early cortical-predominant loss” biotype, starting with a reduction in Broca’s area, spreading to adjacent fronto-insular regions (Cohen’s d ≈ 0.7, increasing as the disease progresses), then extending across the neocortex, and eventually affecting the subcortex. By contrast, “Trajectory 2” (37.9% of patients) displayed an “early subcortical-predominant loss” biotype, with volume loss beginning in the hippocampus (Cohen’s d ≈ 0.6, increasing as the disease progresses), then progressing to the amygdala and parahippocampus, followed by the accumbens and caudate, before reaching the cerebral cortex. Both medication-naïve patients and those with less than two years of disease duration followed the same trajectories, with either Broca’s area or the hippocampus as the initiating region. Later stages of either trajectory were significantly correlated with reduced GM volume in Broca’s area and the hippocampus, longer disease duration, more severe negative symptoms, and worse cognitive deficits. Interestingly, compared to healthy controls, the striatum (including the caudate and putamen) was larger in Subtype 1 patients and smaller in Subtype 2 patients. This finding was also observed in medication-naïve individuals. Subtype 1 exhibited a gradual worsening of negative symptoms as the disease progressed, whereas Subtype 2 showed stable negative symptoms. Furthermore, a progressive increase in depression and anxiety symptoms was observed exclusively in Subtype 1. Overall, Subtype 1 was marked by increasing negative and depression/anxiety symptoms, alongside an exacerbation of positive symptoms in the later stages of the disease compared to Subtype 2. The SuStaIn algorithm introduced a novel concept, revealing that distinct disease subtypes originate in different brain regions at the onset.
However, structural alterations converge as the disease progresses, with only a few regions, such as the striatum, exhibiting noticeable differences. Although they did not incorporate covariates into the SuStaIn model directly, the authors validated the subtypes across medication-naïve, first-episode, and multi-ethnic samples, supporting the robustness of their findings against potential confounds [78].

Discussion

The complex and heterogeneous nature of schizophrenia has posed a challenge for clinicians and researchers since its initial definition by early pioneers. Traditional approaches, such as a priori classifications (deficit/nondeficit) or symptom categorizations (positive/negative), have proven insufficient for effectively subtyping the disease. These limitations impede our ability to identify distinct neuroanatomical subtypes that could enhance diagnostic precision or support the development of robust pathophysiological models for schizophrenia. However, advancements in data analysis techniques have opened new possibilities for identifying novel subtypes and explaining inconsistent findings. For instance, recent studies suggest that cerebellar volume may vary across different subtypes [63, 66]. Consequently, the heterogeneity of the studied population can significantly influence the results, leading to variability in findings across studies.

A biomarker-based approach in the B-SNIP studies revealed three psychosis subtypes, and subsequent research has confirmed that such subtypes may also be present in FES and midcourse patients [58, 59, 72, 79, 80]. The B-SNIP studies highlight several notable points. For instance, while Biotypes 1 and 2 exhibit similar clinical symptoms, differences in GM reduction suggest that clinical presentations may not accurately reflect the underlying pathophysiology. The finding of limited GM reduction in Biotype 3 suggests that psychosis is not necessarily a consequence of neuropathology driven by GM reduction.

Image-driven approaches have generally identified two subtypes, with the exception of Honnorat et al. [64], who defined three distinct subtypes. The PHENOM consortium, utilizing the HYDRA framework, delineated two subtypes with comparable clinical profiles but divergent neuroanatomical characteristics. One subtype demonstrated significant cortical and thalamic GM reductions, whereas the other exhibited increased basal ganglia (BG) volumes without notable cortical GM alterations [70, 71, 81]. Interestingly, increased volumes of the globus pallidus and other BG structures have been consistently observed across several other image-based subtyping studies, suggesting a robust association with specific schizophrenia subtypes [58, 66, 78]. These neuroanatomical differences have also been detected in individuals with first-episode schizophrenia (FES) and in at-risk individuals in the general population [70, 71]. While BG volumetric changes are often attributed to dopamine-blocking antipsychotic treatment, similar patterns have emerged in antipsychotic-naïve and high-risk individuals, implying that medication use is unlikely to be the sole driver of BG alterations in schizophrenia [80, 82]. Additionally, several investigations have failed to detect significant antipsychotic effects on subcortical structures [83]. Collectively, these findings underscore the potential of the pallidum and other BG regions as critical biomarkers in schizophrenia subtyping. Supporting this, schizophrenia polygenic risk scores and individual risk alleles have been linked to larger putamen volumes in non-clinical samples [83]. Furthermore, a recent study reported increased putamen volume in a transdiagnostic, medication-naïve cohort and among unaffected family members [84], further emphasizing a possible genetic and trait-level contribution to BG structural changes in schizophrenia.

Beyond the basal ganglia, long-term antipsychotic use has been associated with cortical thinning in the frontal and temporal regions, alongside increased volume in the anterior cingulate cortex among patients with schizophrenia [85, 86]. However, it remains unclear whether these cortical changes are due to neurotoxic or neuroplastic effects of antipsychotic medications, or from other processes related to the underlying pathology of the illness. In fact, recent studies suggest that cortical reductions may primarily reflect intrinsic, neurodevelopmental, or illness-related factors rather than medication effects [87, 88]. To account for the potential influence of antipsychotic treatment in image-driven subtyping studies, researchers have adopted various strategies. The most common approach involves adjusting antipsychotic doses to chlorpromazine equivalents in statistical analyses. More rigorous methodologies have also been employed. For instance, Jiang et al. replicated their findings in medication-naïve and first-episode patients, while Chand et al. identified similar subtypes even when restricting their analyses to patients with a disease duration of less than two years [78, 81]. Moreover, most studies [64, 81, 89] report comparable antipsychotic doses across the identified subtypes. Taken together, these findings suggest that although antipsychotic medications may influence brain structure, their impact on schizophrenia subtyping is likely minimal, with observed differences more reflective of underlying disease mechanisms than of treatment effects.

The identified subtypes may correspond to treatment-resistant patients, who typically show greater frontal cortical thinning and more severe clinical symptoms, comprising approximately 15–30% of schizophrenia cases [90]. However, no subtype reported to date represents such a small group with significantly more frontal thinning. Additionally, clinical symptoms were generally similar across subtypes in many studies [13, 64, 81, 89]. Thus, no current subtype clearly corresponds to treatment-resistant schizophrenia as defined by today’s criteria.

Gray matter loss in the cortex, particularly in the prefrontal and temporal regions, is a well-documented finding in schizophrenia. This cortical GM reduction is often accompanied by loss in the hippocampus and other medial temporal structures. One subtype identified by HYDRA specifically exhibits these patterns, with GM loss correlating with the duration of illness, suggesting an ongoing degenerative process, at least in this subtype. However, this raises the question of how brain changes unfold across different subtypes throughout the illness. To date, there is only one study providing insight into longitudinal changes. While SuStaIn uses cross-sectional data, its algorithm estimates pseudo-longitudinal sequences—trajectories of disease progression reconstructed from the data. These trajectories represent typical patterns of disease progression rather than direct longitudinal observations. Current findings suggest two distinct trajectories of progression: the “Cortical Trajectory” (trajectory 1), where GM reduction begins in Broca’s area, spreads to adjacent fronto-insular regions, extends across the neocortex, and eventually involves subcortical structures. In the second one, the “Subcortical Trajectory” (trajectory 2), volume loss originates in the hippocampus, progresses to the amygdala and parahippocampus, and then affects the accumbens and caudate before reaching the cerebral cortex. These findings suggest that schizophrenia as a disease can originate from two different brain areas and follow different paths as the disease progresses. The involvement of the hippocampus in dopamine dysregulation has been implicated in the pathophysiology of schizophrenia in prior studies [91, 92]. If this mechanism is relevant, then patients following the “Subcortical Trajectory” may exhibit a distinct response to current dopamine-blocking antipsychotic treatments compared to those following the “Cortical Trajectory”. 
However, as the disease progresses, treatment response might decrease independently of subtype, suggesting the importance of disease staging [93].

This perspective is supported by findings from neuroimaging studies using other modalities. Positron Emission Tomography (PET) studies have shown that elevated striatal dopamine release correlates with the clinical efficacy of dopamine-blocking antipsychotic drugs [94]. However, this hyperdopaminergic state is not present in all patients. Individuals who show poor response to antipsychotics often exhibit greater glutamatergic abnormalities, particularly in cortical regions [95]. Building on these observations, Howes et al. [96] proposed two subtypes of schizophrenia: Type A, characterized by hyperdopaminergia, and Type B, which lacks this feature. Indeed, data-driven subtyping studies [78, 81, 89] discussed in this review have identified distinct cortical and subcortical patterns, which are consistent with findings from PET and MR spectroscopy studies and may be associated with variations in disease pathology, progression, cognitive function, and treatment outcomes. Collectively, these findings lay the groundwork for enhancing diagnostic accuracy, uncovering underlying mechanisms, and developing more personalized interventions.

Conclusion

Current approaches predominantly propose two or three cluster (subtype) solutions for deconstructing schizophrenia based on patients’ neuroanatomical characteristics, providing a framework that transcends traditional symptom-based classifications. However, these clusters vary depending on the classification algorithm used, despite some consistent overlap across studies. This underscores the need to validate current findings using larger datasets and more robust validation methods. Novel algorithms, such as SuStaIn, which show potential for identifying disease trajectories, also require longitudinal data to ensure reliability. Additionally, existing data have struggled to align these proposed subtypes with clinical symptomatology, primarily focusing on positive and negative symptoms. To address these gaps, future research should investigate the mental (dys)functioning of schizophrenia patients concerning the neuroanatomical patterns identified in the proposed clusters, bridging the divide between structural findings and clinical symptoms.