Introduction

Schizophrenia is a chronic, severe mental illness that affects how a person thinks, feels, and behaves. Its symptoms are commonly grouped into positive, negative, and cognitive domains. Cognitive dysfunction is a core feature of schizophrenia that predicts functional outcome and treatment adherence [1, 2]. Accumulating evidence indicates that cognitive deficits are present in adolescents at risk for schizophrenia and in untreated patients with first-episode schizophrenia [3]. However, the neural basis of cognitive deficits in schizophrenia remains unclear, and current antipsychotics have demonstrated limited efficacy and reliability in treating cognitive impairments [4].

Decision making is a fundamental cognitive function involving intricate interactions among a distributed network of brain circuits. Individuals with schizophrenia show different behavioral patterns and worse performance than healthy controls in many decision-making tasks, such as the Iowa gambling task [5, 6], the probabilistic stimulus selection task [7], and probabilistic reversal learning [8]. Our group also identified unique behavioral patterns in individuals with schizophrenia exhibiting high and low levels of psychosis in a two-choice probabilistic task [9, 10]. Fitting a reinforcement-learning model to the behavioral data indicated that these patients tend to update their reward values more rapidly and exhibit lower levels of choice perseveration than controls.

Genetic studies highlight the involvement of susceptibility genes, including AKT1 (protein kinase Bα), in schizophrenia pathogenesis [11,12,13]. A ~70% reduction in AKT1 protein levels has been reported in the lymphocytes, frontal cortex, and hippocampus of individuals with schizophrenia [11]. Similarly, Akt1 heterozygous mutant (HET) mice show an approximately 76% decrease in Akt1 protein levels in the striatum, along with decreases in the prefrontal cortex, hippocampus, and cerebellum [14]. AKT1 is a key signaling intermediate downstream of the dopamine D2 receptor, the best-established target of antipsychotic drugs, and the AKT1-GSK3 signaling cascade plays a crucial role in dopamine-associated behaviors [15, 16]. Studies of postmortem brain tissue from individuals with schizophrenia [11, 17], Akt1-deficient mice [18,19,20], and functional neuroimaging in humans [21] further support the biological function of AKT1 and its role in schizophrenia susceptibility. Intriguingly, Akt1-deficient mice exhibited altered neural properties of striatal medium spiny neurons (MSNs) and methamphetamine-induced alteration of striatal activity [19]. These mice also displayed heightened reward prediction error (RPE, the discrepancy between expected and actual rewards) and updated reward information more rapidly in a two-choice dynamic foraging T-maze [14].

The striatum has been proposed to contribute to reward learning and decision-making. Lesions of the dorsal striatum impaired working memory, attention, and cognitive control [22]. In a reward-based decision-making task, subjects need to integrate feedback information to estimate the causal relationship between an action and its result and thereby maximize rewards [23]. The activity of mesostriatal dopamine neurons has been shown to signal the RPE [24]. Dorsal striatal neurons in rats showed RPE coding similar to that of dopamine neurons during a probabilistic Pavlovian conditioning task [25]. Selective lesions of the dorsomedial striatum (DMS) impaired serial spatial reversal learning in rats [26], whereas lesions of neither the nucleus accumbens core/shell nor the dorsolateral striatum affected behavioral performance in reversal learning. These results suggest the involvement of the dorsal striatum in reversal learning. Further clarification is required concerning the precise role of the striatum in complex decision-making scenarios where there is no obvious correct choice, and its implications for AKT1 and cognitive impairments related to schizophrenia.

In this study, a series of six experiments was conducted to investigate various aspects of decision-making in a mouse model related to schizophrenia risk. Experiment 1 examined the contributions of different striatal subregions using lesions and model fitting in a two-choice probabilistic task (2C task, Fig. 1A). Experiment 2 employed Akt1 heterozygous mutant (Akt1+/− or HET) mice, targeting the role of AKT1 as a susceptibility gene, to compare choice behaviors and model parameters with those of wild-type (WT) littermate controls in the 2C task. Experiment 3 recorded local field potentials (LFP) in the dorsomedial striatum (DMS) during the behavioral task to correlate neural activity with performance. Experiment 4 used chemogenetic inhibition in HET mice and WT controls to establish a causal link between DMS activity and behavioral outcomes. Experiment 5 employed RNA sequencing (RNA-seq) and immunohistochemistry in the striatum to identify genes differentially expressed between HET and WT mice that may influence the regulation of decision-making. Finally, Experiment 6 selectively lesioned parvalbumin (PV) interneurons in the DMS to investigate their role in modulating decision-making processes and their impact on behavioral outcomes in the 2C task.

Fig. 1: Effects of NMDA excitotoxic lesions on striatal subregions (the dorsolateral striatum (DLS), dorsomedial striatum (DMS), and nucleus accumbens (NA)) in the mouse version of two-choice probabilistic task (2C task) of Experiment 1.

A Illustration depicting the trial structure of the mouse version of the 2C task utilized in this study. B Mean ± SEM of accumulated trials for the lesion and sham groups across three subregions of the striatum during the testing phase of the 2C task. C Mean ± SEM percentage of different choice strategies (win-stay, win-shift, lose-stay, and lose-shift) for the lesion and sham groups within the three striatal subregions. D Group-level distributions of model parameters derived from reward-no-reward model-based analysis in Experiment 1. Notably, mice with DMS lesions exhibited significantly lower reward learning rate (αrew) and no-reward learning rate (αnor) but higher choice consistency (β) compared to their sham controls (* p < 0.05).

Materials and methods

Animals

Male C57BL/6 mice were obtained from the Animal Center of National Taiwan University (NTU) School of Medicine. Akt1 heterozygous (HET) male mice and their wild-type (WT) littermates were bred from Akt1 HET pairs. PV-Cre (Jax-008069) and GAD2-Cre (Jax-010802) mice were used to study the specific roles of cell types in the 2C task. All mice were on a C57BL/6 background and genotyped via PCR of tail DNA. They were housed individually with ad libitum food and water and started experiments at 2–3 months of age. Mice were handled and weighed daily for one week before experiments. All animal procedures adhered to protocols approved by the Animal Care and Use Committees at NTU. Animals were randomly assigned to experimental groups based on their genotypes or experimental treatments to ensure unbiased allocation.

Two-choice probabilistic task (2C task)

The 2C task was adapted from a dynamic foraging task used previously in humans and mice [9, 10, 14, 27]. As shown in Fig. 1A, this task featured a two-alternative forced-choice paradigm with one lever offering high-rate rewards and the other low-rate rewards, conducted in daily 45-min sessions. Trial counts and choice outcomes were recorded using Graphic State 4.2.03 software (Coulbourn Instruments). The experimental protocol included shaping, surgery and recovery, reshaping, and testing phases, detailed below.

The shaping and reshaping phases

Animals underwent food (or water in Experiment 1) restriction to 85% of their original body weight and locomotor activity assessment before the shaping phase. Each shaping stage lasted 45 min, with mice advancing to the next stage daily upon meeting criteria.

The surgery and recovery phase

Experiments 1, 3, 4, and 6 included surgery and recovery. Under isoflurane anesthesia (1.5%), mice underwent stereotaxic surgery with skull burr holes drilled. Procedures included neurotoxin microinjection, electrode implantation, or viral microinjection as dictated by experimental conditions. Post-surgery, spontaneous locomotor activity was assessed in an open field using EthoVision video tracking (Noldus Information Technology).

For Experiment 1, lesions targeted the dorsomedial striatum (DMS; AP, 0.5 mm; ML, ±1.5 mm; DV, −3.0 mm), dorsolateral striatum (DLS; AP, 0.5 mm; ML, ±2.5 mm; DV, −3.0 mm), or nucleus accumbens (NA; AP, 1.8 mm; ML, ±1.1 mm; DV, −4.7 mm) with NMDA solution (20 mg/mL in 0.2 µL) infusion via Hamilton syringe. Post-operative analgesics were administered for 7 days.

In Experiment 3, electrode implants targeted the DMS region (AP, 0.5 mm; ML, ±1.5 mm; DV, −3.0 mm) using a 4-electrode array. Electrodes were secured with dental cement and analgesics were provided for 7 days. Mice fully recovered before entering the reshaping phase in all experiments.

For viral microinjection surgery (Experiments 4 and 6), each mouse underwent bilateral microinjection of a virus mix targeting the DMS (AP, 0.5 mm; ML, ±1.5 mm; DV, −3.0 mm; 0.6 μL per site). The virus mix consisted of AAV with Cre-inducible Gi-coupled human M4 muscarinic receptor (AAV-hsyn-DIO-hM4D(Gi)-mCherry, NTU AAV core) and AAV with Cre expression driven by CMV promoter (AAV-CMV-Cre, NTU AAV core) in a 1:1 ratio. AAV groups received the full virus mix, while sham groups received AAV-CMV-Cre only, matched in volume to the virus mix. For PV-Cre mice, AAV8 injections contained Cre-inducible expression of diphtheria toxin A (AAV-mCherry-FLEx-DTA, UNC vector core). Mice remained in their home cage for 3 weeks post-surgery to allow for full virus expression and recovery before entering the reshaping phase of the 2C task.

The testing phase

Following the shaping phase (or reshaping phase in Experiments 1, 3, 4, and 6), mice entered the testing phase, in which the two levers delivered rewards at different rates: 60% versus 20% for sucrose water in Experiment 1 and 80% versus 20% for food pellets in the other experiments (Fig. 1A). Each 45-min daily session comprised 3–6 blocks (each block with 10 trials). Sessions began with the house and food magazine lights illuminated. A nose-poke initiated a trial, extinguishing the food magazine light. After a 5-s fixed inter-trial interval (ITI), the two stimulus-response levers were inserted, and mice pressed one. Each press led to a reward or no-reward outcome, followed by food magazine illumination. Trials ended when the reward was collected or after a 5-s wait post-nose-poke. Mice learned through trial and error to identify the lever with the high reward rate. Completion criteria required achieving ≥ 70% accuracy in lever choice across three consecutive blocks, with an average accuracy > 75%. Mice had 2 weeks to meet these criteria; failure resulted in data exclusion.

The analysis of choice strategy in the 2C task

Trial-by-trial choice data from all mice in the testing phase of the 2C task were recorded and analyzed for accumulated trials and choice strategy. Four choice strategies were distinguished: win-stay, win-shift, lose-stay, and lose-shift. The ratio of each strategy was computed with custom R code by dividing the number of occurrences of that strategy by the total accumulated trials.
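The strategy classification described above can be sketched in Python. This is an illustration only, not the authors' custom R code: the function name `strategy_ratios`, the lever labels, and the 1/0 reward coding are assumptions. The denominator follows the text in using the total accumulated trials.

```python
from collections import Counter

def strategy_ratios(choices, rewards):
    """Classify each trial after the first by the previous outcome (win/lose)
    and by whether the same lever was chosen again (stay/shift).

    choices: sequence of lever labels, e.g. "L" / "R" (hypothetical encoding)
    rewards: sequence of 1 (reward) / 0 (no reward), same length as choices
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    names = ("win-stay", "win-shift", "lose-stay", "lose-shift")
    total = len(choices)  # denominator: total accumulated trials, as in the text
    if total == 0:
        return {name: 0.0 for name in names}
    counts = Counter()
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        outcome = "win" if prev_reward else "lose"
        action = "stay" if choice == prev_choice else "shift"
        counts[f"{outcome}-{action}"] += 1
    return {name: counts[name] / total for name in names}

# Illustrative five-trial sequence (made-up data).
ratios = strategy_ratios(["L", "L", "R", "R", "L"], [1, 0, 0, 1, 0])
```

Because the first trial has no preceding outcome, the four ratios sum to (n − 1)/n rather than exactly 1 under this convention.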

Fitting a reinforcement learning model to behavioral data in the 2C task

To explore the mechanism governing RPE (reward prediction error)-driven choice behavior, we fitted a reinforcement learning model to trial-by-trial behavioral data from mice engaged in the 2C task. Model fitting was performed using the RStan [28] and hBayesDM [29] R packages with custom code (available on request). Hierarchical Bayesian modeling with the MCMC algorithm estimated parameters from trial-by-trial choice data. Differences in parameters among groups of mice were compared using posterior distribution values from the Bayesian estimation.

We applied a modified Q-learning model to examine how RPE affects and updates expectations. The model separates the learning rate (α) into αrew for rewarding results and αnor for no-reward results, determining the update speed of expected values. The model equations are as follows:

$$Q_{c}(t)=Q_{c}(t-1)+\alpha_{\mathrm{rew}}\,\delta(t-1)+\alpha_{\mathrm{nor}}\,\delta(t-1)$$
$$\delta(t-1)=R_{c}(t-1)-Q_{c}(t-1)$$

Here, αnor is set to 0 on reward trials, and αrew is set to 0 on no-reward trials.
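A minimal sketch of this update rule in Python, assuming rewards are coded as 1 (reward) and 0 (no reward). The function name `update_q` and the numerical values are illustrative and are not taken from the fitted model.

```python
def update_q(q_chosen, reward, alpha_rew, alpha_nor):
    """One update of the chosen lever's expected value.

    Only one learning rate is active per trial: alpha_rew on reward trials,
    alpha_nor on no-reward trials (the other is treated as zero).
    """
    delta = reward - q_chosen  # reward prediction error, delta(t-1)
    alpha = alpha_rew if reward > 0 else alpha_nor
    return q_chosen + alpha * delta

# Illustrative values only (assumed, not estimated from data).
q_after_reward = update_q(0.5, reward=1, alpha_rew=0.4, alpha_nor=0.2)
q_after_omission = update_q(0.5, reward=0, alpha_rew=0.4, alpha_nor=0.2)
```

With these numbers the expected value rises from 0.5 to about 0.7 after a reward and falls to about 0.4 after an omission, showing how separate learning rates let the two outcomes move the estimate by different amounts.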

To characterize how the choice tendency is guided by the updated expectation, we assumed that the probability of choosing the previously selected lever, Pc(t), was determined by the Boltzmann exploration, represented in a logistic form assigning a weight to each action:

$$P_{c}(t)=\frac{e^{\beta Q_{c}}}{e^{\beta Q_{c}}+e^{\beta Q_{\mathrm{nc}}}}$$

Here, the parameter β denotes the choice consistency (choice perseveration or exploration/exploitation) parameter, describing the tendency to make actions guided by expected reward values.
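The choice rule can be illustrated with a short Python sketch. The function name `p_choose` and the example values are assumptions; the two-option softmax is rewritten as a logistic function of the value difference, which is algebraically identical to the expression above.

```python
import math

def p_choose(q_c, q_nc, beta):
    """Boltzmann (softmax) probability of choosing lever c over lever nc.

    Equal to exp(beta*q_c) / (exp(beta*q_c) + exp(beta*q_nc)), obtained by
    dividing numerator and denominator by exp(beta*q_c).
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_c - q_nc)))

# Illustrative values: the same value difference under low and high beta.
p_low_beta = p_choose(0.7, 0.4, beta=1.0)
p_high_beta = p_choose(0.7, 0.4, beta=5.0)
```

A larger β makes the choice follow the value difference more deterministically, while β near 0 drives the probability toward 0.5 regardless of the expected values.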

For MCMC analysis, both αrew and αnor were assigned a non-informative beta prior, Beta(1.2, 1.2), on the interval from 0 to 1. A Gaussian prior bounded between 0 and 10 was assigned to β.

In vivo electrophysiological recording of the DMS

Measuring local field potentials (LFPs)

In Experiment 3, LFPs in the DMS were recorded using the Plexon system (Plexon, Inc., Dallas, TX, USA) during the testing phase of the 2C task, with a sampling rate of 1 kHz. Event time points were imported into MATLAB for ERP analysis. Normalized LFPs were segmented into −1 to 1-s epochs around each event: (1) Trial initiation (nose-poke to start), (2) Lever press (choice-making), and (3) Outcome (entering the food magazine for reward or no reward). This segmentation facilitated ERP component extraction for decision-making analysis. To calculate power, raw LFP data were normalized to the root mean square of the voltage signal over the entire session. Power spectra were then computed using the Chronux MATLAB toolbox [30] with the multi-taper method. The resulting power spectral density (PSD) was averaged across specific frequency bands (θ: 5–8 Hz; γ: 30–80 Hz), as described previously [17]. The 60 Hz frequency component, considered noise, was excluded from statistical analysis.

1/f background removal

To isolate narrowband oscillations from broadband aperiodic activity, we applied 1/f correction by fitting a linear function to the log–log transformed power spectrum, following best practices [31]. This approach minimizes misinterpretation of broadband shifts as rhythmic changes and enhances detection of genuine oscillations (e.g., θ, γ). For each trial, a 2-s window (−1 to +1 s around head entry) was analyzed using a sliding window (1.0 s length, 50 ms step). At each time point, the PSD was estimated, transformed to log–log space, and fit with a linear model. The fitted 1/f component was then subtracted to yield background-corrected spectra.
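The core of this correction, a linear fit in log–log space followed by subtraction, can be sketched in Python with NumPy. This is an assumption-laden illustration rather than the authors' MATLAB pipeline: it takes a single precomputed spectrum as input, uses base-10 logarithms, and omits the multi-taper estimation and sliding window. The function name `remove_aperiodic` and the synthetic spectrum are hypothetical.

```python
import numpy as np

def remove_aperiodic(freqs, psd):
    """Fit a line to log10(power) versus log10(frequency) and subtract it.

    Returns the background-corrected log-power spectrum and the fitted slope.
    """
    log_f = np.log10(np.asarray(freqs, dtype=float))
    log_p = np.log10(np.asarray(psd, dtype=float))
    slope, intercept = np.polyfit(log_f, log_p, 1)  # first-order least-squares fit
    background = slope * log_f + intercept          # estimated 1/f component
    return log_p - background, slope

# Synthetic spectrum (made-up data): pure 1/f power with a bump at 5-8 Hz.
freqs = np.arange(2.0, 101.0)
psd = 100.0 / freqs
theta = (freqs >= 5) & (freqs <= 8)
psd[theta] *= 3.0
corrected, slope = remove_aperiodic(freqs, psd)
```

In the corrected spectrum the theta-band points stand above the remaining frequencies, whereas in the raw spectrum they are partly masked by the steep 1/f decline.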

Histological verification of electrode placement

After behavioral testing, mice were euthanized, and electrode positions were marked by passing current (10 μA, 30 s) to create iron deposits, which were visualized with potassium ferrocyanide.

Inhibition of the DMS of Akt1 HET mice during the 2C task

To investigate the causal relationship between DMS neuronal activity and reward-related decision-making behavior, we used chemogenetic modulation to directly inhibit DMS activity during the 2C task. Adult male HET and WT mice (90–100 days old, n = 4–5 per group) were used in Experiment 4. Following virus mixture injection (AAV-hsyn-DIO-hM4D(Gi)-mCherry + AAV-CMV-Cre), mice received clozapine N-oxide (CNO, 5 mg/kg, i.p.) 30 min before testing. CNO was freshly prepared in saline containing 1% DMSO. After meeting criteria, mice underwent 2-day CNO-off sessions to mitigate chronic injection effects [32].

RNA sequencing (RNA-seq) and validation

RNA sample collection

Left or right striatum was dissected from male HET and WT mice (90–100 days old, n = 4 each) in Experiment 5. RNA was extracted using TRIzol (Thermo Fisher) and the QIAamp RNeasy Mini Kit (QIAGEN). Samples were assessed by Qsep100 capillary gel electrophoresis (RQN > 8.0), Nanodrop 2000 (260/280 ratio of 1.8–2.0, 260/230 > 2.0), and Qubit 3 Fluorometer (RNA concentration). Only high-quality RNA was used for RNA sequencing.

RNA-Seq library construction and sequencing

Poly-A enriched libraries were prepared using the SureSelect Strand Specific RNA Library Prep Kit (Integrated Science) and sequenced on the Illumina MiniSeq system with an eight-base index for sample identification.

Analysis for RNA-Seq data

Raw read quality was assessed with FastQC (Babraham Bioinformatics), and reads were aligned with STAR 2.7.6a to the Mus musculus genome GRCm38 with Gencode vM25 annotation (mapping rates > 98%). Alignment quality was checked with RSeQC, and gene expression levels were quantified by featureCounts as transcripts per million. Differential expression analysis was performed and volcano plots were generated using limma in R.

Gene selection and primer design

Target and reference genes were selected based on differential expression analysis, prioritizing the top 10 genes ranked by p-value, significant fold changes (log₂FC > 2), and known associations with schizophrenia, parvalbumin (PV) expression, or Akt1 function. Key genes of interest included Akt1, PV, GAD67, Calr, Ascl1, and Cldn5. Gapdh was used as the reference gene. Primer sequences were designed using PrimerQuest (Integrated DNA Technologies), and full primer details are provided in the Supplementary Methods.

Reverse transcription quantitative real-time PCR (RT-qPCR)

RNA was extracted as described above, and cDNA was synthesized using the LunaScript RT SuperMix Kit (#E3010, New England Biolabs). For qPCR, 0.5 μl of cDNA was used in a 10 μl reaction with SYBR Green I-based Luna Universal qPCR Master Mix (#M3003, New England Biolabs) on an Applied Biosystems StepOne qPCR machine. Threshold cycles (CT) were calculated, and relative expression was determined using the ΔΔCT method: ΔΔCT = (CTA – CTref) − (CTB – CTref); relative expression = 2^(−ΔΔCT).
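The ΔΔCT calculation can be written out as a short Python sketch. It assumes that A denotes the sample of interest and B the calibrator sample, and that the reference gene (Gapdh in this study) is measured separately in each sample; the function name and CT values are illustrative and are not data from this study.

```python
def relative_expression(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """Relative expression of a target gene in sample A versus sample B.

    Each sample's target CT is first normalized to its own reference-gene CT,
    then the two normalized values are compared (delta-delta CT).
    """
    delta_a = ct_target_a - ct_ref_a  # CTA - CTref in sample A
    delta_b = ct_target_b - ct_ref_b  # CTB - CTref in sample B
    ddct = delta_a - delta_b
    return 2.0 ** (-ddct)

# Illustrative CT values (made up): the target crosses threshold one cycle
# later in sample A, while the reference gene is unchanged.
fold = relative_expression(25.0, 18.0, 24.0, 18.0)
```

A one-cycle delay in the target gene with an unchanged reference corresponds to a relative expression of 0.5, i.e. half the calibrator level, under the usual assumption of perfect doubling per cycle.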

Immunohistochemistry

Animals were transcardially perfused with 0.1% saline, followed by 4% paraformaldehyde. Brains were extracted and sectioned coronally at a thickness of 40 μm using a cryostat (HM-520, Thermo Scientific, Waltham, MA, USA). Free-floating sections were incubated in 3% H₂O₂ to quench endogenous peroxidase activity, then rinsed in 0.02 M potassium phosphate-buffered saline (KPBS, pH 7.0). After blocking with 5% skim milk for 1 h at room temperature, sections were incubated overnight at 4 °C in 5% skim milk containing the primary antibody. Parvalbumin-positive interneurons were labeled using a rabbit anti-parvalbumin antibody (1:1500, Sigma-Aldrich). Neuronal density in the DMS was quantified using NIH ImageJ software. Detailed procedures are described in the Supplementary Methods.

Selective lesioning of PV interneurons in the DMS

To investigate the causal relationship between DMS PV interneurons and reward-related decision-making behavior, we selectively lesioned DMS PV interneurons with virus-expressed diphtheria toxin A (DTA) in PV-Cre mice before the 2C task. Adult male PV-Cre mice and their WT littermates (90–100 days old, n = 8–11 per group) were used in Experiment 6. The experimental schedule followed the protocol described above, with virus injection (AAV-mCherry-FLEx-DTA) occurring 3 weeks before the task to allow for virus expression.

Data analyses and statistics

The investigators were blinded to group allocation during the experiments and when assessing outcomes to minimize bias. Data are presented as mean ± SEM. Behavioral data were analyzed using Student’s t-test or one-way ANOVA for genotypic differences, and the Mann-Whitney U test for choice strategy ratios. Effect sizes were measured by Cohen’s d (≥ 0.8, large effect) and rank-biserial r (maximum = 1). Pearson correlation was used to evaluate relationships between behavioral data and neural oscillation power. Data from animals with misplaced injections or electrodes were excluded. The two-sample Kolmogorov-Smirnov test was employed to reveal genotypic/group differences in the distributions of the reinforcement learning model parameters. A p-value below 0.05 was considered statistically significant.
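For readers who want to relate the reported test statistics to the effect sizes, the following Python sketch uses two standard conversions. The text does not state which formulas were used, so both are assumptions: Cohen's d is derived from an independent-samples t statistic assuming equal group sizes, and rank-biserial r is derived from the Mann-Whitney U statistic and the two group sizes.

```python
import math

def cohens_d_from_t(t, df):
    """Cohen's d from an independent-samples t statistic (equal group sizes assumed)."""
    return 2.0 * abs(t) / math.sqrt(df)

def rank_biserial_from_u(u, n1, n2):
    """Rank-biserial correlation from the Mann-Whitney U statistic and group sizes."""
    return 1.0 - 2.0 * u / (n1 * n2)

# Statistics as reported in the Results of this paper.
d_trials = cohens_d_from_t(3.695, 16)        # Experiment 2, accumulated trials
r_cno = rank_biserial_from_u(0.50, 5, 4)     # Experiment 4, HET-CNO vs HET control
```

These conversions give values that match the reported Cohen's d of 1.848 and rank-biserial r of 0.95 to rounding, which is consistent with, but does not confirm, their use in the original analysis.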

Results

Experiment 1: Lesions in the DMS notably disrupted performance in the 2C task more than other striatal subregions

In Experiment 1, mice with dorsomedial striatum (DMS), dorsolateral striatum (DLS), and nucleus accumbens (NA) lesions were used to evaluate striatal subregion roles in the 2C task (Fig. 1A). Compared to the sham group, selective excitotoxic lesions of different subregions of the striatum had no significant effect on spontaneous locomotor activity in the open field before and after brain surgery, except in the NA lesion condition (t(9) = 2.602, p < 0.05, Cohen’s d = 1.735 > 0.8). As depicted in Fig. 1B, DMS lesions significantly impaired behavioral performance in the 2C task compared to the sham group (t(14) = 2.003, p < 0.05, Cohen’s d = 1.071 > 0.8). Notably, no significant differences were observed in goal-directed motor behaviors during the 2C task, such as latency to nose poke or latency to collect food rewards, before and after DMS lesions (data not shown). Conversely, lesions of the DLS and NA had no effect on accumulated trials compared to the sham group (both p > 0.05). No significant group difference in choice strategy was found for any of the three subregions (Fig. 1C). Furthermore, the two-sample Kolmogorov-Smirnov test revealed that DMS lesions caused more pronounced differences in reinforcement learning model parameters than lesions of the other two striatal subregions. As shown in Fig. 1D, mice with DMS lesions had a significantly lower reward learning rate (αrew, D(40000) = 0.3459, p < 0.05), a lower no-reward learning rate (αnor, D(40000) = 0.7985, p < 0.05), and higher choice consistency (β, D(40000) = 0.5552, p < 0.05) compared to their sham controls in our reward-no-reward model-based analysis. Additionally, the NA lesion mice exhibited a lower no-reward learning rate and greater choice consistency, while the DLS lesion mice showed decreased learning rates for both reward and no-reward conditions. These findings suggest that distinct striatal subregions contribute to different aspects of the decision-making process in the 2C task, with DMS lesions resulting in more pronounced behavioral deficits compared to the other lesion groups.

Experiment 2: Akt1-deficient (HET) mice displayed aberrant behaviors in the 2C task compared to their WT littermates

HET mice were employed as a mouse model related to schizophrenia risk in Experiments 2 and 3 to investigate reward-based probabilistic decision making, choice strategy, and neural activity in the 2C task. No significant genotypic effect was observed in their locomotor activity (data not shown). Compared to WT controls, HET mice required fewer trials to reach learning criteria (t(16) = 3.695, p < 0.01, Cohen’s d = 1.848 > 0.8; Fig. 2A), and exhibited decreased lose-stay strategy (t(16) = 2.536, p < 0.05, Cohen’s d = 1.268 > 0.8; Fig. 2B). The trial-by-trial choice behavioral data were further fitted with a reinforcement learning model to estimate αrew, αnor, and β. The Kolmogorov–Smirnov test revealed genotypic differences in the probability of the posterior distributions of parameters. HET mice exhibited a lower αrew (D(40000) = 0.6483, p < 0.05), a higher αnor (D(40000) = 0.8708, p < 0.05), and a lower β (D(40000) = 0.4189, p < 0.05) compared to their WT littermates (Fig. 2C).

Fig. 2: Behavioral, model-fitting, and local field potential data comparison between Akt1 heterozygous (HET) mice and wild-type (WT) mice in the two-choice probabilistic task of Experiments 2 and 3.

A Mean ± SEM of accumulated trials for HET and WT mice in Experiment 2. B Mean ± SEM of the ratio of different choice strategies (win-stay, win-shift, lose-stay, and lose-shift) for HET and WT mice. Analysis revealed a decreased lose-stay strategy in the HET group compared to the WT group. C Group-level distributions of model parameters (reward learning rate (αrew), no-reward learning rate (αnor), and choice consistency (β)) in Experiment 2. HET mice exhibited a lower αrew, a higher αnor, and a lower β compared to the WT group. D Time–frequency representations of local field potential (LFP) activity in the dorsomedial striatum (DMS), aligned to food magazine entry (0 s; indicated by white dashed lines), are shown for WT and HET mice under reward and no-reward conditions in Experiment 3. Average power maps are presented for the reward condition (left), no-reward condition (middle), and the difference between the two conditions (right), all following 1/f background correction. Food magazine entry served as a consistent behavioral anchor, enabling the analysis of neural dynamics associated with reward evaluation and consumption. E & F Correlations between theta/gamma power and accumulated trials in Experiment 3 under reward and no-reward conditions, respectively. Theta and gamma powers were significantly correlated with accumulated trials only in the no-reward condition. *p < 0.05.

Experiment 3: The power of local field potential in the DMS is highly correlated with no-reward condition

In Experiment 3, neural activity in the DMS of both WT and HET mice was closely associated with choice outcomes. After 1/f correction, WT mice showed increased theta power (5–8 Hz) during the reward condition, especially near head entry (time 0), suggesting a role in reward anticipation (Fig. 2D, left). Despite low raw gamma power, corrected maps revealed localized increases in the 50–70 Hz range for reward > no-reward, indicating true oscillatory activity. In HET mice, theta and gamma power were elevated during the no-reward condition (Fig. 2D, right), reflecting altered reward processing. Electrode implantation did not affect open field locomotion (data not shown). In the reward condition (Fig. 2E), neither theta nor gamma power correlated with accumulated trial count, but in the no-reward condition (Fig. 2F), both were significantly correlated (theta: r(18) = 0.40; gamma: r(18) = 0.66; p < 0.05). These findings underscore the importance of theta and gamma activity in the DMS for guiding decision-making, especially when expected rewards are omitted.

Further analysis revealed correlations between DMS local field potential power and parameters of the reinforcement learning model, detailed in Table 1. Notably, in the no-reward condition, theta and gamma powers were significantly correlated with the no-reward learning rate in both WT and HET mice (all p < 0.05). A marginal correlation was observed between DMS local field potential power and choice consistency in HET mice, but not in WT mice. In contrast, in the reward condition, no significant correlations were found between DMS local field potential power and reinforcement learning model parameters. These results underscore a robust link between DMS local field potential power and choice behavior specifically under conditions where an anticipated reward is not received during the 2C task.

Table 1 The Pearson correlation between DMS power and model parameters in wild-type (WT) and Akt1+/− (HET) mice.

Experiment 4: Observed behavioral abnormalities in HET mice were restored by DMS inhibition

Compared to WT controls, HET mice showed significantly higher DMS gamma power during habituation (t(16) = 2.34, p < 0.05, Cohen’s d = 1.17 > 0.8; Fig. 3A). To establish causality, DREADDs were used to inhibit DMS activity in HET mice during decision-making. A heatmap depicting the overall viral spread in the DMS reveals an average coverage of approximately 73% (range: 47.8–98.5%). Representative histological images are shown for the WT, HET-Control, and HET-CNO groups (Fig. 3B). There were no significant differences in open field test locomotor activity before or after microinjection surgery (Fig. 3C). Chemogenetic inhibition of DMS activity with CNO in HET mice significantly increased accumulated trials in the 2C task compared to HET vehicle-treated controls (U(5, 4) = 0.50, p < 0.05, rank-biserial r = 0.95; Fig. 3D). Despite the small sample size, the effect size index (rank-biserial r = 0.95) indicates a substantial effect, close to its maximum value of 1. Although a trend was observed between HET-control and WT groups (U(4, 5) = 4, p = 0.057, rank-biserial r = 0.6), no significant difference was found between WT and HET-CNO groups (U(4, 4) = 5, p = 0.2429, rank-biserial r = 0.38). Additionally, HET-CNO mice exhibited an increased lose-stay strategy compared to HET controls (U(6, 4) = 2, p < 0.05, rank-biserial r = 0.83; Fig. 3E). Model-based analysis (Fig. 3F) revealed that CNO-treated HET mice had decreased αnor and increased β compared to HET controls. These results indicate that HET mice exhibited significantly higher DMS gamma power during habituation, and that chemogenetic inhibition of DMS activity during decision-making increased accumulated trials and altered decision-making strategies in the 2C task, underscoring the critical role of DMS activity in modulating behavior.

Fig. 3: Effects of chemogenetic inhibition of the dorsomedial striatum (DMS) on decision-making behavior in Akt1 heterozygous (HET) mice in Experiment 4.

A Average powers in the DMS between HET and WT mice during habituation in the testing chamber. B Heatmap showing overall viral spread within the dorsomedial striatum (DMS), averaging ~73%, along with representative images from WT, HET-Control, and HET-CNO (clozapine N-oxide) groups. C Mean ± SEM travel distance among the three groups before (left panel) and after (right panel) brain surgery, showing no significant differences. D Mean ± SEM accumulated trials for the three groups in the two-choice probabilistic task (Experiment 4). A significant enhancement of accumulated trials was observed in the HET-CNO group compared to the HET-control group. E Mean ± SEM of the ratio of different choice strategies (win-stay, win-shift, lose-stay, and lose-shift) for the three groups. Analysis revealed an increased lose-stay strategy in the HET-CNO group compared to the HET-control group. F Group-level distributions of model parameters (reward learning rate (αrew), no-reward learning rate (αnor), and choice consistency (β)) derived from reward-no-reward model-based analysis. The HET-CNO group exhibited a decreased αnor and an increased β compared to the HET-control group. *p < 0.05.

Experiment 5: RNA-seq analysis and immunohistochemistry data highlight unique expression of striatal parvalbumin (PV) interneurons in HET mice

RNA sequencing (RNA-seq) analysis was performed on striatal samples from 4 WT and 4 HET mice, followed by validation using RT-qPCR and immunohistochemistry. Principal components analysis (PCA) depicted the distribution of gene expression across samples (Fig. 4A). Volcano plot analysis identified the top differentially expressed genes, with Akt1 and parvalbumin (Pvalb/PV) among the notable up- and down-regulated genes (Fig. 4B). Detailed results of the RNA-seq analysis, including fold changes, are provided in the Supplementary Table.

Fig. 4: RNA sequencing (RNA-seq) analysis and immunohistochemistry data indicate the unique expression of striatal parvalbumin (PV) interneurons in Akt1 heterozygous (HET) mice in Experiment 5.

A Three-dimensional plot illustrating the principal component analysis (PCA) on the RNA-seq dataset comparing Akt1 heterozygous (HET) mice and wild-type (WT) mice (n = 4 each). B Volcano plot displaying the differentially expressed genes (DEGs) in the striatum of HET and WT mice, showing log2 fold change on the x-axis and false discovery rate-adjusted p-value on the y-axis. The top 10 modulated DEGs are highlighted with gene names (red: up-regulation; blue: down-regulation), including Akt1, parvalbumin (PV), and Ascl1. C Relative expression level (mean ± SEM) of genes of interest between HET and WT mice using quantitative polymerase chain reaction (qPCR). D Representative images and immunohistochemistry results (mean ± SEM) for PV-positive neuron densities in the dorsomedial striatum (DMS) between HET and WT mice. Consistent with results from RNA-seq analysis and qPCR, a significant reduction of PV interneurons was confirmed in the DMS of HET mice. *p < 0.05.

RT-qPCR confirmed decreased expression of Akt1 (t(7) = 5.532, p < 0.01, Cohen’s d = 4.182 > 0.8) and parvalbumin (t(7) = 1.925, p < 0.05, Cohen’s d = 1.455 > 0.8) in HET mice compared to WT controls. No significant differences were observed in the expression of glutamic acid decarboxylase 67 (GAD67) and calretinin (Calr) between genotypes (Fig. 4C). Additionally, downregulated expression of Ascl1, a key transcription factor in striatal neurogenesis, was validated (t(7) = 2.815, p < 0.05, Cohen’s d = 2.128 > 0.8). Immunohistochemistry further confirmed reduced PV interneuron expression in the DMS of HET mice compared to WT controls (t(9) = 3.406, p < 0.01, Cohen’s d = 2.271 > 0.8; Fig. 4D), consistent with the gene expression findings.
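The effect sizes reported above can be cross-checked against their t statistics. The following is a minimal sketch, assuming the common approximation d = 2t/√df for two independent groups; the conversion actually used in this study is not stated here, and the function name is hypothetical. Under that assumption, the reported values are reproduced to three decimals.

```python
import math


def cohens_d_from_t(t, df):
    """Approximate Cohen's d from an independent-samples t statistic.

    Assumption: d = 2t / sqrt(df), a common approximation for two groups
    of similar size. This may not be the formula used by the authors.
    """
    return 2.0 * t / math.sqrt(df)


# (label, t, df, reported d) copied from the RT-qPCR and
# immunohistochemistry results in the paragraph above.
reported = [
    ("Akt1 (RT-qPCR)", 5.532, 7, 4.182),
    ("Parvalbumin (RT-qPCR)", 1.925, 7, 1.455),
    ("Ascl1 (RT-qPCR)", 2.815, 7, 2.128),
    ("PV density (IHC)", 3.406, 9, 2.271),
]

for label, t, df, d_reported in reported:
    d = cohens_d_from_t(t, df)
    print(f"{label}: d = {d:.3f} (reported {d_reported})")
```

The "> 0.8" attached to each reported value refers to the conventional threshold for a large effect.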

Experiment 6: Selective lesion of PV interneurons in the DMS of DTA-treated PV-Cre mice disrupted choice behaviors as observed in HET mice

In Experiment 6, we selectively lesioned PV interneurons in the DMS of PV-Cre mice using diphtheria toxin A (DTA) to test for a causal relationship between reduced PV interneuron expression and choice behavior in the 2C task. As depicted in Fig. 5A, the region-specific expression of DTA resulted in an approximately 50% reduction in the density of PV-positive interneurons in the DMS (t(12) = 4.843, p < 0.01, Cohen’s d = 2.796 > 0.8). The expression of DTA did not affect spontaneous locomotor activity in PV-Cre mice during habituation in the testing chamber before or after the treatment (Fig. 5B). However, compared to the control group, selective lesion of PV interneurons in the DMS led to a significant reduction of accumulated trials in the acquisition of the 2C task (t(16) = 2.179, p < 0.05, Cohen’s d = 1.09 > 0.8; Fig. 5C). The analysis of choice strategy further revealed a decreased lose-shift strategy in DTA-treated PV-Cre mice (U(8, 11) = 14, z = −2.435, p < 0.01, rank-biserial r = 0.68; Fig. 5D). Similar to HET mice, our trial-by-trial model fitting revealed that DTA-treated PV-Cre mice exhibited a higher αrew (D(40000) = 0.9937, p < 0.05), a higher αnor (D(40000) = 0.9474, p < 0.05), and a lower β (D(40000) = 0.9305, p < 0.05) than the sham controls (Fig. 5E). In contrast, selective chemogenetic inhibition of GABAergic interneurons in the DMS of GAD2-Cre mice had no significant effect on accumulated trials, choice strategies, or model parameters (except the choice consistency, β) in the 2C task (data not shown).
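The choice-strategy measure and the nonparametric effect size used above can be illustrated with a short sketch. It assumes that each trial is labeled by the outcome of the preceding trial (win or lose) and by whether the choice was repeated (stay or shift), and that each ratio is the count of a strategy divided by all labeled trials. The exact definition used in this study may differ, and the function names are hypothetical. The rank-biserial correlation is computed as r = 1 − 2U/(n1·n2), a standard formula that reproduces the reported 0.68 for U(8, 11) = 14.

```python
from collections import Counter

STRATEGIES = ("win-stay", "win-shift", "lose-stay", "lose-shift")


def strategy_ratios(choices, rewards):
    """Fraction of trials following each of the four strategies.

    choices: sequence of option labels (e.g. 0 or 1), one per trial.
    rewards: sequence of 0/1 outcomes, one per trial.
    Assumption: ratios are taken over all trials that have a predecessor.
    """
    counts = Counter()
    for prev_choice, prev_reward, cur_choice in zip(choices, rewards, choices[1:]):
        outcome = "win" if prev_reward else "lose"
        action = "stay" if cur_choice == prev_choice else "shift"
        counts[f"{outcome}-{action}"] += 1
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in STRATEGIES}


def rank_biserial(u, n1, n2):
    """Rank-biserial correlation from a Mann-Whitney U statistic."""
    return 1.0 - 2.0 * u / (n1 * n2)


# Toy session (illustrative data only, not from the study).
choices = [0, 0, 1, 1, 0]
rewards = [1, 0, 0, 1, 0]
print(strategy_ratios(choices, rewards))

# Effect size for the lose-shift comparison reported above.
print(round(rank_biserial(14, 8, 11), 2))
```

An alternative convention conditions each ratio on the preceding outcome (for example, lose-shift divided by all trials that follow a no-reward outcome); the sketch would need only a different denominator to follow it.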

Fig. 5: Selective lesion of parvalbumin (PV) interneurons in the dorsomedial striatum (DMS) of DTA-treated PV-Cre mice resulted in a reduction of PV-positive cells and diminished behavioral performance in the two-choice probabilistic task in Experiment 6.

A Representative images illustrating PV-positive cells in the DMS of DTA-treated PV-Cre mice (PV-DTA group) and PV-sham controls (PV-Sham group). A significant reduction of PV-positive cells was observed in the PV-DTA group. B Mean ± SEM travel distance before (left panel) and after (right panel) selective lesion of PV interneurons in the DMS, showing no significant group difference. C Mean ± SEM accumulated trials in the two-choice probabilistic task, revealing decreased accumulated trials in the PV-DTA group compared to the PV-Sham group. D Mean ± SEM of the ratio of different choice strategies (win-stay, win-shift, lose-stay, and lose-shift) for the PV-DTA group compared to the PV-Sham group. Analysis revealed a decreased lose-shift strategy in the PV-DTA group compared to the PV-Sham group. E Group-level distributions of model parameters (reward learning rate (αrew), no-reward learning rate (αnor), and choice consistency (β)) derived from reward-no-reward model-based analysis. Mice in the PV-DTA group exhibited higher αrew, higher αnor, and lower β compared to the PV-Sham group. *p < 0.05.

Discussion

This study comprises six experiments using the Akt1 mutant mouse model related to schizophrenia risk, revealing impaired behavioral performance and altered choice strategies in the 2C task, particularly in the absence of rewards. The integration of in vivo local field potential recordings and chemogenetic data highlights the critical role of the DMS in mediating abnormal decision-making behaviors. RNA-seq analysis, immunohistochemistry, and selective lesioning of PV interneurons in the DMS further support these findings. Together, these results underscore the pivotal roles of Akt1 and PV interneurons in regulating choice strategies during reward-based decision-making, especially under no-reward conditions. This study also identifies the influence of Akt1 on probabilistic decision-making strategies within the DMS and the contribution of PV interneurons in a mouse model related to schizophrenia risk. The findings suggest targeting PV interneurons in the DMS as a potential therapeutic approach for addressing cognitive deficits in schizophrenia.

PV interneurons have emerged as a significant focus in understanding cognitive functions, particularly in schizophrenia. Postmortem studies consistently reveal reduced PV interneuron numbers in individuals with schizophrenia [33,34,35], with NMDAR hypofunction in these cells proposed as a potential underlying mechanism [36]. Additionally, PV interneurons are critical for generating gamma oscillations (30–80 Hz), crucial for cognitive processes [37,38,39,40,41]. Studies in Huntington’s disease models suggest that striatal PV interneurons play a crucial role in learning by providing local inhibitory input to striosomes [42]. Our findings align with these insights, emphasizing that altered brain oscillations due to decreased PV interneurons in the striatum significantly impact cognitive functions. Moreover, fast-spiking PV interneurons in the striatum tightly regulate MSNs, balancing firing between direct (D1) and indirect (D2) pathway neurons [43]. Optogenetic and chemogenetic studies further indicate that PV interneurons modulate striatal output, enhancing early learning and action selection [44, 45]. Despite greater abundance in the lateral than medial striatum in rodents [46, 47], PV interneurons exert stronger regulatory control over DMS efferents compared to other GABAergic interneurons [48]. Additionally, PV interneurons in medial and lateral striatum exhibit intrinsic excitability differences [49], potentially influencing reward-related decision-making in distinct dorsal striatal subregions. Our findings support the critical role of striatal PV interneurons in choice strategy and probabilistic decision-making, relevant to cognitive deficits observed in schizophrenia. Consistent with this, we previously reported that individuals with schizophrenia who are experiencing psychosis exhibit rapid reward value updating (high learning rate, α) and reduced choice perseveration (low β) in a dynamic two-choice task [9, 10]. 
These behavioral observations may reflect alterations in striatal PV interneurons. Our behavioral task and analyses are adaptable to both human and mouse models, providing a valuable and translational method to investigate reward-related learning and decision-making in basic and clinical research contexts.

Previous studies have highlighted the role of Akt1 in modulating reward learning and prediction error in mice [14], as well as its involvement in methamphetamine-induced psychosis and striatal neuronal activity [19]. Given the association of Akt1 with dopamine-related behaviors and schizophrenia, it is valuable to compare Akt1- and PV-related findings across experiments in this study. In Experiment 2, consistent with our previous study [14], Akt1 HET mice showed reduced trial accumulation, a decrease in lose-stay choices under no-reward conditions, and altered model parameters (higher αnor and lower β). Similarly, deluded patients with schizophrenia tend to revise probability estimates more readily and require less information before making decisions [50]. However, a faster reward learning rate or heightened reward prediction error (RPE) does not necessarily enhance performance. In our study, this tendency happened to be advantageous within the specific constraints of the mouse task. Comparable results were observed in Experiment 6, where PV-Cre mice with selective PV interneuron lesions (via DTA treatment) also showed reduced trial accumulation, fewer lose-shift choices under no-reward conditions, and changes in αnor and β relative to controls. Experiment 4, in contrast, yielded different behavioral outcomes. Building on our previous findings of increased excitability in striatal MSNs in Akt1-deficient mice [19] and elevated LFP power in the DMS during the habituation phase of the 2C task in Akt1 HET mice (Fig. 3A), we employed chemogenetic inhibition to nonspecifically suppress neuronal activity in the DMS in Experiment 4, aiming to investigate the role of DMS activity in the abnormal behavioral performance of Akt1 HET mice during the task. Chemogenetic inhibition of the DMS in CNO-treated Akt1 HET mice reversed the behavioral abnormalities observed in untreated HET controls.
Despite the potential for reduced modulatory efficacy with the dual-virus approach in Experiment 4, the findings in Fig. 3D–F demonstrate that chemogenetic inhibition rescued behavioral deficits in Akt1 HET mice during the 2C task, highlighting the role of DMS activity in reward-based decision-making. Specifically, treated mice exhibited an increased lose-stay strategy under no-reward conditions, decreased αnor, and increased β compared to controls, in line with the indication from Experiment 3 that the DMS plays a pivotal role in choice behavior, especially in no-reward situations. Thus, while Experiment 2 with Akt1 HET mice and Experiment 6 with selective PV interneuron lesions showed similar changes in decision strategies and model parameters, DMS inhibition in HET mice in Experiment 4 produced divergent effects. These findings highlight the complex role of the DMS in decision-making and suggest a context-dependent role of DMS activity in modulating choice behaviors.
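The model parameters compared above (αrew, αnor, and β) can be made concrete with a minimal sketch of a reward/no-reward learning model. The delta-rule update, the softmax choice rule, the initial values of 0, and all function names are assumptions made for illustration. The model fitted in this study is specified in its Methods and was estimated as group-level distributions, which this sketch does not attempt to reproduce.

```python
import math


def prob_choose_first(q, beta):
    """Softmax probability of choosing option 0 over option 1.

    beta (choice consistency) scales how strongly value differences
    drive choice; beta = 0 gives random choice.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def update_values(q, choice, reward, alpha_rew, alpha_nor):
    """Delta-rule update of the chosen option's value.

    Assumption: alpha_rew applies after rewarded trials and alpha_nor
    after unrewarded trials; the unchosen value is left unchanged.
    """
    alpha = alpha_rew if reward == 1 else alpha_nor
    new_q = list(q)
    new_q[choice] += alpha * (reward - new_q[choice])
    return new_q


def neg_log_likelihood(choices, rewards, alpha_rew, alpha_nor, beta):
    """Negative log-likelihood of one session under the sketch model."""
    q = [0.0, 0.0]  # assumed initial values
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        p0 = prob_choose_first(q, beta)
        p_choice = p0 if choice == 0 else 1.0 - p0
        nll -= math.log(max(p_choice, 1e-12))  # guard against log(0)
        q = update_values(q, choice, reward, alpha_rew, alpha_nor)
    return nll


# After an unrewarded choice of option 0, a larger alpha_nor lowers its
# value more, so the probability of repeating that choice (lose-stay) drops.
q_start = [0.6, 0.4]
for alpha_nor in (0.2, 0.8):
    q_next = update_values(q_start, choice=0, reward=0,
                           alpha_rew=0.5, alpha_nor=alpha_nor)
    p_stay = prob_choose_first(q_next, beta=3.0)
    print(f"alpha_nor={alpha_nor}: P(stay after no reward) = {p_stay:.3f}")
```

Within this sketch, a higher αnor makes shifting after a no-reward outcome more likely, and a lower β moves choice probabilities toward chance whatever the learned values are. This is one way to read the link between the parameter changes and the stay/shift patterns, not a claim about the fitted model.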

Experiment 5, designed to identify potential targets for further investigation, revealed a reduction in striatal PV interneurons and decreased expression of Akt1, parvalbumin, and Ascl1 in HET mice, as assessed by RNA-seq, RT-qPCR, and immunohistochemistry. Ascl1 is crucial for interneuron development [51, 52], and Akt1 inhibition has been shown to impair Ascl1-induced PV-neuron differentiation in cell culture [18]. Therefore, the chronic deficiency of PV interneurons in HET mice may be a result of compensatory mechanisms involving Ascl1, triggered by long-term Akt1 knockdown during brain development. Building on this finding, we used DTA to selectively lesion DMS PV neurons in PV-Cre mice, mimicking the reduced PV interneuron population observed in Akt1 HET mice. The striatum, a key component of the basal ganglia involved in motor function and reward learning, is predominantly composed of GABAergic MSNs [53]. Compared to other brain regions, these MSNs express relatively low levels of G-protein-gated inwardly rectifying potassium (GIRK) channel subunits, which are key mediators of chemogenetic inhibition. Nevertheless, recent evidence reveals detectable GIRK immunofluorescence signals in the striatum and supports the functional relevance of GIRK-mediated inhibition in this region [54]. Akt1-deficient mice also exhibit reduced cumulative miniature IPSC amplitudes in striatal MSNs and altered activity following methamphetamine exposure [19]. These findings align with our interpretation that DMS chemogenetic inhibition temporarily suppressed striatal MSN activity in Experiment 4, which may have attenuated GABAergic inhibition from MSNs onto downstream regions and normalized striatal circuit activity in Akt1 HET mice. Future studies using in vivo electrophysiology could further explore neural oscillation changes in the DMS of Akt1 mutant mice.
Additionally, PV interneurons in the DMS receive glutamatergic input from the cingulate cortex, which is distinct from the dorsolateral striatum (DLS) [49]. These PV neurons locally inhibit striosomal activity [42], playing a critical role in the GPi-LHb circuit, which is involved in anti-reward processing [55, 56]. Future research could benefit from multi-site electrophysiological recordings to examine the dynamics of the striatum-GPi-LHb circuit during decision-making tasks.

It is important to acknowledge a significant limitation of this study: sex was not examined as a biological variable. Sex plays a critical role in the pathology and pathogenesis of schizophrenia, with systematic reviews and meta-analyses indicating a higher incidence of schizophrenia in men than in women [57]. Additionally, sex-specific behavioral deficits have been observed in Akt1-deficient mice [19]. In our previous work, male Akt1 HET mice displayed behavioral patterns similar to those seen in individuals with schizophrenia [14], and only male Akt1-deficient mice showed a reduction in methamphetamine-induced striatal activity, whereas female mice did not exhibit this effect [19]. Given these findings, we focused exclusively on male mice in this study. Future research should investigate the behavioral performance of female mice in the 2C task to better understand potential sex differences in reward-based decision-making.

In conclusion, this study establishes a strong correlation between altered behavioral responses, shifts in choice strategies, and variations in model parameters observed in HET mice during the 2C task. The pivotal role of PV interneurons in the DMS in regulating strategic decision-making, particularly evident under no-reward conditions, suggests their potential as therapeutic targets for addressing cognitive impairments in schizophrenia. Despite the limited sample size in some experiments, the observed effects and effect sizes remain substantial and consistent. Our findings pave the way for future investigations, providing deeper insights into the neural circuits underlying strategic decision-making and cognitive dysfunctions in schizophrenia. Specifically, multi-site electrophysiological recordings of the striatum-globus pallidus internus-lateral habenula (striatum-GPi-LHb) circuit during decision-making could offer valuable avenues for further exploration. Overall, our findings illuminate the role of Akt1 in modulating probabilistic decision-making strategies within specific brain regions and cell types in a mouse model related to schizophrenia risk, advancing our understanding of schizophrenia and the development of treatments, particularly for cognitive deficits.