Introduction

Schizophrenia remains a major psychiatric disorder affecting approximately 1% of the global population, despite decades of extensive research. It is characterized by severe disturbances in cognition, perception, and emotion [1,2,3]. Among its core symptoms, auditory verbal hallucination (AVH) is a hallmark feature, with 60–80% of patients experiencing the sensation of "hearing voices" at some point in their illness [4,5,6]. These voices are typically perceived as originating from distinct external agents, are often negative or derogatory in content, and are associated with high levels of distress [7, 8]. Indeed, persistent AVH significantly impairs cognitive function, elevates the risk of self-harm, and reduces quality of life (F.Y. Chen et al., [9]; Han et al., [10]).

Although various explanatory models have been proposed, including source-monitoring deficit and predictive-coding error models [11], the underlying mechanism of AVH remains unclear. Growing evidence points to brain network dysconnectivity as a fundamental mechanism [12, 13]. This dysconnectivity involves large-scale circuits connecting Broca’s area, supplementary motor area (SMA), superior and middle temporal gyri (STG, MTG), insula, and cerebellum (Ćurčić-Blake et al., [11]; Barber et al., [14]). From a structural neuroimaging perspective, diffusion tensor imaging (DTI) has attracted substantial attention for its ability to assess white-matter integrity [15] and thereby probe brain dysconnectivity in AVH. However, confounding factors such as study design, sample selection, and analysis method may have contributed to the heterogeneous results [16, 17], and some studies even suggest that DTI indices may vary with the course of illness rather than with the severity of AVH [18].

Functional connectivity (FC) abnormalities detected by neuroimaging modalities such as functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and functional near-infrared spectroscopy (fNIRS) have remained a mainstream line of evidence for elucidating the neural mechanisms of AVH. However, the precise pattern of dysconnectivity has remained ambiguous. Early task-based and symptom-capture studies demonstrated that multiple large-scale networks are implicated in AVH, most prominently the Default Mode Network (DMN), Salience Network (SN), and auditory–language network [19, 20], reflecting state-level neural activity when AVH is present. In contrast, resting-state FC (rs-FC) has received less attention, yet it may capture trait-level distinctions that reflect an individual’s vulnerability to AVH. As highlighted by Alderson-Day et al. and Northoff [21, 22], intrinsic brain activity at rest continues to provide valuable insights into the underlying mechanisms of AVH. Meanwhile, findings from rs-FC remain heterogeneous across networks, underscoring the need for trait-sensitive rs-FC designs that do not prespecify the direction of effect [23,24,25]. Owing to its ability to capture temporal variability in intrinsic brain activity [26], dynamic functional connectivity (dFC) has gained increasing attention for exploring the neural mechanisms of AVH; yet findings across dFC studies also remain heterogeneous, with discrepancies in both the direction and the spatial distribution of FC alterations [27,28,29]. Recently, Zhang et al. combined dFC and static FC (sFC) analyses and identified a characteristic pattern in AVH, marked by reduced overall sFC yet enhanced variability of dFC [30]. These results indicate that sFC remains valuable for achieving a more comprehensive understanding of functional disturbances in AVH, particularly when interpreted alongside dFC measures.

In the present study, we aimed to characterize trait-relevant rs-FC in schizophrenia with specific attention to AVH history (AVHh+ vs. AVHh-), irrespective of whether AVH occurred during scanning. Guided by prior work implicating left-lateralized language–auditory networks and the DMN (Barber et al., [14]; Ćurčić-Blake et al., [11]), we predefined regions of interest (ROIs) within these networks in the left hemisphere and treated ROI-to-ROI rs-FC among these nodes as the primary outcome. fNIRS, which has recently gained recognition as a feasible tool [31, 32], was employed to measure rs-FC. While fMRI provides whole-brain volumetric coverage, including subcortical and deep cortical regions, fNIRS adequately samples the frontal–temporal cortical surfaces on which our ROIs lie, enabling bedside, low-noise, motion-resilient recordings in inpatients [33,34,35]. We hypothesized that FC would differ significantly between AVHh+ and AVHh-.

Materials and methods

Participants and experimental procedure

In total, thirty-nine patients with schizophrenia were recruited from the inpatient ward of Peking University Sixth Hospital (Beijing, China). Inclusion criteria were as follows: 1) a diagnosis of schizophrenia or schizoaffective disorder made by an experienced psychiatrist, based on comprehensive interviews using the Structured Clinical Interview for DSM-IV (SCID); 2) Han Chinese ethnicity; 3) age between 18 and 65 years. Exclusion criteria were as follows: 1) a history of other psychotic disorders; 2) a history of a neurological or severe medical disorder, substance abuse or dependency; 3) electroconvulsive therapy within the past 6 months or head injury resulting in loss of consciousness; 4) intellectual disability or neurological impairment; 5) other factors precluding fNIRS scanning, such as severe agitation. Patients with schizophrenia were further divided into two subgroups: AVHh+, who had ever experienced AVH (n = 23), and AVHh-, who had never experienced AVH (n = 16). The division was determined by retrospective chart review and clinician interview confirming a documented history of AVH, consistent with approaches used in prior trait-level AVH research (e.g. Panikratova et al., [25]; Sone et al., [36]). Seventeen healthy controls (HCs) were recruited from neighboring communities and met the following criteria: 1) no history of neurological, psychiatric or physical diseases; 2) age between 18 and 65 years. All included participants were right-handed and provided written informed consent to participate following the review of a complete study description. All experiments were performed in accordance with relevant guidelines and regulations of the Ethics Committee of Peking University Sixth Hospital, which comply with the Declaration of Helsinki.

Resting-state fNIRS data were acquired for at least 8 minutes in a dimly lit, isolated room. During the recording, the participants were instructed to sit still and focus on a fixation cross on the computer screen without falling asleep. This resting-state recording did not require overt perceptual input or behavioral output.

fNIRS data acquisition

In this study, a multichannel fNIRS system (NirSmart-6000A, Danyang Huichuang Medical Equipment Co., Ltd., Jiangsu, China) with two wavelengths (730 and 850 nm) was used to continuously measure and record the concentration changes of brain oxyhemoglobin (HbO) and deoxyhemoglobin (HbR) during the resting state. Data were sampled at a frequency of 11 Hz. Each fNIRS channel was defined as the midpoint of the corresponding light source-detector pair. A total of 54 channels were formed by 21 sources and 16 detectors, arranged symmetrically over both hemispheres and positioned according to the international 10/20 system (Fig. 1). The inter-optode distance was 30 mm (range 27–33 mm), ensuring sensitivity to cortical gray matter beneath the optodes. Cap placement was guided by cranial landmarks (Cz, nasion [Nz], inion [Iz], left and right pre-auricular points [LPA/RPA]). Cz was first marked on the cap following the 10/20 convention; the D3 source-detector pair was then positioned such that its midpoint coincided with Cz. The acquired coordinates were then transformed into MNI coordinates and further projected to the MNI standard brain template using a spatial registration approach in NirSpace (Danyang Huichuang Medical Equipment Co., Ltd., China). Anatomical labeling was performed in MNI space with the Brodmann area (BA) atlas. Results of the channel registration and BA labeling are listed in Supplementary Material Table S1. For each channel, the BA label was selected according to the highest overlap percentage, which is listed in Supplementary Material Table S2. Channels were then assigned to prespecified ROIs based on their BA labels and MNI locations (see Table 1).

Fig. 1: The fNIRS data acquisition.

The fNIRS optode layout is shown above. Optodes were placed using the 10–20 system to cover bilateral frontotemporal regions and were centered on Cz (midline, midpoint of D3). Purple circles (S1-S21) and blue circles (D1-D16) represent sources and detectors, respectively. Brown rectangles represent channels. Semi-transparent overlays denote the ROIs and the corresponding channels: green, Superior Temporal Gyrus (STG); red, Broca's area (BA); yellow, Wernicke's area (WA); pink, left Supplementary Motor Area (SMA).

Table 1 ROIs with assigned fNIRS channels (all in the left hemisphere).

fNIRS data analysis

The NirSpark software package (Huichuang, China), which has been used in previous studies [37], was used to preprocess the fNIRS signals. Data were preprocessed as follows. First, an analyst performed a preliminary inspection of the raw data, marking and rejecting poor-quality signals, and then extracted a stable 7-minute hemoglobin time series for each participant. Second, motion artifacts, which manifest as impulse or cliff-type jumps caused by relative sliding between the scalp and the probes [38], were identified with a moving standard deviation (SD) criterion and corrected by cubic spline interpolation. Spline interpolation is a commonly used correction method whose advantage is that it corrects only pre-localized artifacts. Subsequently, the HbO and HbR data of channels covering functionally involved areas, namely the four ROIs in the left hemisphere, were analyzed further. A band-pass filter with cut-off frequencies of 0.01–0.20 Hz was applied to the resting-state data to remove physiological noise (e.g., respiration and cardiac activity) and low-frequency signal drift. Then, the modified Beer-Lambert law was used to convert optical densities into changes in the HbO and HbR concentrations [39]. Once the noise components were identified, the concentration signal was reconstructed with these components eliminated from the original hemoglobin time course. The filtered concentration signal was used for further analysis. In this study, we used the HbO signal to present the following results because the HbO signal generally has a better signal-to-noise ratio than the HbR signal [40].
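To make the moving-SD idea concrete, the following is a minimal sketch of how cliff-type jumps could be localized before spline correction. It is not the NirSpark implementation: the function names, the window half-width, and the threshold multiplier `k` are hypothetical choices for illustration, and the trace is synthetic. Only the detection step is shown; the spline correction itself is omitted.

```python
import math
import statistics


def moving_sd(signal, half_window):
    """Standard deviation of `signal` in a sliding window centred on each sample."""
    n = len(signal)
    sds = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        sds.append(statistics.pstdev(signal[lo:hi]))
    return sds


def flag_artifacts(signal, half_window=5, k=3.0):
    """Flag samples whose local SD exceeds k times the median local SD.

    half_window and k are illustrative values, not the study's settings.
    """
    sds = moving_sd(signal, half_window)
    threshold = k * statistics.median(sds)
    return [sd > threshold for sd in sds]


# Synthetic trace sampled at 11 Hz: a slow 0.05 Hz oscillation
# with a cliff-type jump introduced at sample 60.
fs_hz = 11.0
trace = [0.1 * math.sin(2 * math.pi * 0.05 * t / fs_hz) for t in range(110)]
trace = [x + (1.0 if i >= 60 else 0.0) for i, x in enumerate(trace)]

flags = flag_artifacts(trace)
flagged = [i for i, is_artifact in enumerate(flags) if is_artifact]
print(flagged)  # indices of windows that straddle the jump
```

Because only the flagged segments would be passed on for interpolation, the clean portions of the signal are left untouched, which is the advantage of pre-localized correction noted above.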

Statistical analysis

For each participant, FC at the channel level was defined as the Pearson correlation between the preprocessed HbO time series for every channel pair over the 7-min resting period. The formula is as follows:

$${R}_{x,y}=\frac{cov(X,Y)}{{\sigma }_{X}\,{\sigma }_{Y}}=\frac{E\left({XY}\right)-E(X)E(Y)}{\sqrt{E\left({X}^{2}\right)-{E}^{2}(X)}\sqrt{E\left({Y}^{2}\right)-{E}^{2}(Y)}}$$

where cov(X, Y) represents the covariance of X and Y; E(X) and E(Y) represent the mean values of X and Y; and σX and σY represent the standard deviations of X and Y. These correlation coefficients (r) were then normalized to z-values with Fisher’s r-to-z transformation:

$$z=\frac{1}{2}\mathrm{ln}\frac{1+r}{1-r}={\tanh }^{-1}r$$
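As an illustration of the two formulas above, the sketch below computes the Pearson correlation and its Fisher z-transform for one pair of time series. The signals are synthetic and the function names are hypothetical; the only values taken from the study are the 11 Hz sampling rate and the 7-minute window, which together give 4620 samples per channel.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def fisher_z(r):
    """Fisher r-to-z transformation: 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


fs_hz = 11                   # sampling rate reported in the study
n_samples = 7 * 60 * fs_hz   # 7-minute window -> 4620 samples

# Two synthetic "channels": y shares 80% of its amplitude with x and
# carries an orthogonal component, so r is 0.8 by construction.
phase = [2 * math.pi * 0.05 * t / fs_hz for t in range(n_samples)]
x = [math.sin(p) for p in phase]
y = [0.8 * math.sin(p) + 0.6 * math.cos(p) for p in phase]

r = pearson_r(x, y)
z = fisher_z(r)
print(round(r, 3), round(z, 3))
```

The transformation stretches values near ±1, so that z is approximately normally distributed and better suited to the parametric group comparisons that follow.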

The FC matrix was also computed in the NirSpark package (Huichuang, China). We generated a 54 × 54 correlation matrix for each participant. The group-averaged channel-pair matrices (HCs, AVHh-, AVHh+) are shown in Fig. 2 to document the data from which ROI signals were derived. Channels were assigned to the corresponding ROIs as stated in Table 1 before analysis. All statistical analyses and data visualizations were performed using R version 4.5.0 [41]. The following packages were used: ggplot2, tidyverse, readxl, igraph and brainconn. For each ROI pair, Fisher z-transformed rs-FC values were compared across the three groups (HCs, AVHh-, AVHh+) using a one-way analysis of variance (ANOVA). For any ROI pair showing a significant group effect, Tukey-Kramer post-hoc tests were performed to identify the source of the difference and to control the family-wise error rate across all pairwise comparisons among the three groups; this procedure is appropriate for potentially unequal group sizes [42].
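The group comparison described above can be sketched as follows. This is an illustrative Python re-expression of the one-way ANOVA F statistic, not the R code used in the study; the function name and the toy values are hypothetical, and the ROI labels are shorthand for the four left-hemisphere ROIs. The p-value and the Tukey-Kramer step are left to a statistics package, since they require the F and studentized-range distributions.

```python
from itertools import combinations

# Four left-hemisphere ROIs -> 6 ROI pairs, each tested with its own ANOVA.
ROIS = ["Broca", "Wernicke", "STG", "SMA"]
roi_pairs = list(combinations(ROIS, 2))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy Fisher z values for three unlabeled groups (synthetic, not study data).
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
group_c = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])
print(len(roi_pairs), f_stat, df_b, df_w)

# With the study's group sizes (17 HCs, 16 AVHh-, 23 AVHh+),
# the within-group degrees of freedom would be 56 - 3 = 53.
study_df_within = (17 + 16 + 23) - 3
```

Unequal group sizes enter the between-group sum of squares through the per-group weights, which is also why the Tukey-Kramer variant is used for the follow-up pairwise comparisons.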

Fig. 2: FC matrices derived from fNIRS data across three groups (from left to right: HCs, AVHh- and AVHh+, respectively).

Each matrix displays Fisher z-transformed correlation coefficients between fNIRS channels during the resting state. The color bar on the right indicates FC, ranging from 0 (blue, low FC) to 1 (red, high FC), with warmer colors representing higher FC.

Demographic information was compared among the three groups using one-way ANOVA for continuous variables and the chi-square test for categorical variables. Clinical characteristics of the two patient groups were compared (Chou et al., 2021) using two-sample t tests. When the assumption of normality was violated (assessed with the Kolmogorov-Smirnov test), two-tailed Mann-Whitney U tests were used instead.
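For readers less familiar with the two non-parametric statistics named above, the following sketch shows how the chi-square statistic for a group-by-category table and the Mann-Whitney U statistic are computed. The function names and all counts and values are hypothetical and purely illustrative; p-values are omitted because they require reference distributions from a statistics package.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def mann_whitney_u(a, b):
    """Mann-Whitney U statistic (smaller of the two U values); ties count 0.5."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical 2 x 3 table (two categories by three groups); not study data.
toy_table = [[10, 20, 30],
             [20, 20, 20]]
chi2, chi2_df = chi_square_statistic(toy_table)

# Hypothetical clinical scores for two patient subgroups; not study data.
u_stat = mann_whitney_u([1, 2, 3], [2, 4, 5, 6])
print(round(chi2, 3), chi2_df, u_stat)
```

A table with two categories and three groups has 2 degrees of freedom, matching the three-group comparison of a binary variable such as gender.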

Results

Demographic and clinical characteristics

There were no significant differences across the three groups in age (F = 0.40, p = 0.675), education level (F = 0.98, p = 0.384), or gender (χ² = 2.75, p = 0.253). Additionally, there were no significant differences between the AVHh+ and AVHh- subgroups in illness duration (t(32.32) = 0.10, p = 0.92, 95% CI [−6.66, 7.38]) or antipsychotic dosage (chlorpromazine equivalents, CPZ equivalents; t(31.36) = 0.69, p = 0.49, 95% CI [−135.17, 274.47]). The details of demographic and clinical characteristics are listed in Table 2.

Table 2 Demographic and Clinical Characteristics of Participants.

Broca’s area-left SMA rs-FC decline in AVHh+

In the one-way ANOVA across the three groups, rs-FC between Broca’s area and the left SMA showed a significant group effect (F = 4.79, p = 0.010). No other ROI pairs reached significance (all p ≥ 0.050). Tukey-Kramer post-hoc analysis localized this effect to the AVHh+ versus AVHh- contrast (p = 0.009; Fig. 3), whereas comparisons involving HCs were not significant (p = 0.244 for HCs vs. AVHh-; p = 0.258 for HCs vs. AVHh+). These findings indicate that the Broca-left SMA connection is selectively associated with the presence of AVH history among patients with schizophrenia.

Fig. 3: ROI Level FC Comparison of AVHh+ and AVHh- groups.

From left to right are top, left-lateral, and right-lateral views of the cortical surface. Orange = left Supplementary Motor Area (SMA); red = Broca's Area (BA); black line = the Broca-left SMA connection.

Discussion

In this study, we used fNIRS to investigate whether rs-FC within language- and perception-related networks in the left hemisphere indexes trait-like vulnerability to AVH in schizophrenia. We found that AVHh+ showed significantly lower rs-FC between Broca’s area and the left SMA compared with AVHh-, extending previous evidence of resting-state network dysconnectivity in AVH [43, 44]. Broca’s area and the SMA have been proposed as key nodes for the generation and monitoring of inner speech [20, 45, 46]; our finding is broadly consistent with these studies and suggests a dysfunction of inner speech generation and monitoring. In contrast, our results diverge from those of Clos et al., who examined resting-state fMRI connectivity in psychotic patients with frequent AVH and reported increased FC between Broca’s area and the SMA relative to HCs [47]. They interpreted this pattern as reflecting reduced task-related SMA activation but increased rs-FC between the left SMA and Broca’s area, potentially indexing an upregulated generation of inner speech in psychosis. Several methodological differences may account for this discrepancy. Patients in Clos et al.’s study experienced AVH several times per day for at least one year, and the comparison was restricted to AVH patients versus HCs, without including a schizophrenia group without AVH; as a result, their findings may be confounded by general differences between schizophrenia and health. By directly contrasting schizophrenia patients with and without a history of AVH, our study offers an alternative, within-diagnosis approach that may aid in characterizing connectivity alterations specifically related to AVH.

Beyond its potential role in AVH, the SMA has increasingly been conceptualized as a core hub for temporal prediction, cognitive control, and self-agency in schizophrenia. In healthy individuals, functional neuroimaging studies consistently demonstrate SMA engagement during timing and related cognitive operations (Ortuño et al., 2005, 2011; Wiener, 2024) [48,49,50]. A recent large meta-analysis further identified the SMA as the only region robustly recruited across both sub-second and supra-second intervals and across both motor and non-motor timing paradigms, underscoring its centrality to generic time processing [51]. In schizophrenia, this timing network appears systematically compromised: reduced recruitment of the SMA-basal ganglia-thalamic circuit has been observed during both timing tasks and high-load cognitive-control paradigms, with SMA and striatal hypoactivation becoming more pronounced as temporal or executive demands increase [52,53,54]. Converging multimodal evidence further indicates that the SMA shows both structural abnormalities and functional activation deficits, supporting its disruption as a key neurobiological substrate of impaired cognitive control [55]. Complementing this, SMA effective connectivity has been linked to performance in social cognition, reasoning/problem solving, and working memory in schizophrenia, suggesting that alterations in SMA-centered networks contribute to broad cognitive dysfunction [56]. Finally, studies of the sense of agency highlight a pivotal role of the SMA: it is selectively engaged when mismatches between predicted and actual outcomes of self-generated actions are detected in healthy individuals and, in lesion-based Bayesian accounts, has been implicated in unwanted, seemingly autonomous movements such as those observed in alien limb syndrome [57, 58].
Taken together, these findings position the SMA at the intersection of temporal processing, cognitive control, and self-agency dysfunctions in schizophrenia, providing a coherent framework within which our observed reduction in Broca-SMA rs-FC may index a trait-like vulnerability underlying AVH.

A comprehensive review of functional and anatomical connectivity studies in AVH conducted by Ćurčić-Blake et al. concluded that aberrant coupling within the broader language–auditory–memory network is a hallmark of hallucination-prone patients [11]. Our current results are in line with this general framework and suggest that trait-like connectivity disruptions extend beyond the classic fronto-temporal loop to include frontal speech-motor integration circuits. Specifically, whereas Ćurčić-Blake et al. highlighted ongoing debates about fronto-temporal disconnection in schizophrenia, our finding suggests that weakened Broca-SMA coupling may represent a stable marker of AVH vulnerability. This complements prior observations of fronto-temporal dysconnectivity by suggesting that the speech generation network itself may be fundamentally impaired in those susceptible to hallucinations. Our results also complement findings by Storchak et al., who showed that reduced activation in motor-related regions correlates with higher hallucination proneness in non-clinical populations [59]. They emphasized deficits in monitoring rather than generating inner speech, implicating the SMA as crucial for self-attribution processes. This supports our interpretation that diminished Broca-SMA connectivity could impair the brain’s ability to correctly label internally generated speech as self-originated, contributing to the misattribution characteristic of AVH.

Several limitations temper the interpretation of our findings. First, the cross-sectional design prevents firm causal inferences about whether reduced Broca-left SMA rs-FC reflects a pre-existing vulnerability factor or a consequence of recurrent AVH; longitudinal studies, ideally including medication-naïve or first-episode patients, will be needed to replicate and extend these results. Second, the sample size, while comparable to similar fNIRS studies in schizophrenia, limited statistical power to detect more subtle associations between connectivity and AVH phenomenology or to examine potential moderating effects of clinical variables (e.g., medication, symptom dimensions). Third, an important methodological constraint concerns the spatial coverage and localization inherent in our fNIRS setup. The optode montage in the present study was primarily focused on frontal regions, which restricted our analyses to superficial portions of language- and control-related cortical networks [28, 60]. Although we defined ROIs including Broca’s area, Wernicke’s area, SMA and STG, only anterior segments of the STG and perisylvian cortex could be partially sampled; posterior STG and more caudal temporal-parietal language regions remained largely inaccessible to fNIRS. Thus, our conclusions regarding “language-related circuits” and Broca-SMA connectivity should be interpreted as referring to anterior, dorsolateral portions of these networks rather than to the entire canonical language system. Moreover, owing to the limited penetration depth of near-infrared light, we were unable to assess connectivity with subcortical and deep cortical structures (e.g., basal ganglia, thalamus, and medial temporal regions) that are also implicated in AVH and cognitive-control processes [49, 54]. 
In addition, the limited spatial resolution and depth sensitivity of fNIRS mean that the exact cortical generators of the recorded signals cannot be localized with the same precision as in high-field fMRI, which further constrains the anatomical specificity with which our connectivity findings can be interpreted. Finally, the present study focused on rs-FC and did not include task-based paradigms (e.g., time-estimation or cognitive-control tasks), precluding direct inferences about how Broca-SMA rs-FC abnormalities manifest during active inner-speech, temporal-processing, or control operations.

Conclusion

In conclusion, within a dysconnectivity framework, the reduced Broca-left SMA rs-FC observed in AVHh+ may reflect a disruption of language-related circuits that confers a neurophysiological predisposition to AVH, serving as a potential trait-like marker of AVH vulnerability. Set against growing evidence that large-scale dysconnectivity in temporal-processing, cognitive-control, and self-agency networks characterizes schizophrenia, our findings highlight Broca-SMA coupling as one node-level manifestation of broader network-level abnormalities rather than an isolated regional deficit. Future studies should therefore move beyond cross-sectional resting-state designs by combining longitudinal follow-up with task-based fNIRS paradigms that explicitly probe temporal prediction and time estimation, as well as high-load cognitive-control demands, to characterize how Broca-SMA connectivity dynamically interacts with AVH onset, persistence, and remission. In parallel, integrating fNIRS with multimodal and network-analytic approaches and fine-grained behavioral measures of timing, cognitive control, and sense of agency will be crucial for establishing the stability and clinical utility of this FC marker and for refining our understanding of AVH as an emergent property of large-scale network dysconnectivity rather than a purely focal abnormality.