Introduction

Emotion is a significant yet complex psychological phenomenon, attributable in part to its multicomponent nature1,2. Extensive psychophysiological evidence demonstrates that emotional stimuli evoke a range of responses, including subjective feelings and bodily reactions, that are systematically coordinated3. Among these responses, the subjective experience of emotional valence and the corresponding facial expressions have garnered considerable attention. The subjective experience constitutes a core component of emotion, with emotional valence—defined as the continuum from positive to negative affective states—serving as a fundamental low-dimensional descriptor2,4,5. Facial expressions are among the most distinctive bodily responses associated with emotional states6. Numerous studies have demonstrated a systematic association between subjective emotional valence and facial expressions7,8,9,10,11,12,13,14,15,16,17,18,19,20,21. For example, a previous study found that participants exposed to emotionally positive films showed a positive association between their positively valenced subjective experiences and increased activation of the zygomatic major muscle, which produces the lip corner pulling action19. However, the strength of this association was moderate; for example, the correlation coefficient between dynamic valence ratings and zygomatic major muscle activity was approximately 0.219. These data imply that facial and subjective emotional responses may reflect related but distinct underlying processes.

Previous studies have suggested a number of mechanistic relationships between these responses, which can largely be classified into two categories. First, some researchers, including Darwin22, proposed that facial expressions are readouts of inner emotional experiences, a concept referred to as the readout hypothesis23 and constituting the commonsense view24. Others, including James25, proposed that facial expressions are produced first and subsequently produce or modulate emotional experiences, which is referred to as the facial feedback hypothesis26,27,28,29. Although several empirical investigations have tested the facial feedback hypothesis, the results have been inconsistent, including both positive30,31,32 and negative33 findings, and the hypothesis therefore remains a matter of debate.

To elucidate the neural mechanisms underlying emotion, numerous functional neuroimaging studies using functional magnetic resonance imaging (fMRI) and positron emission tomography have examined brain regions associated with facial expressions and subjective experiences. Several studies have investigated the neural substrates involved in the production of emotional facial expressions by concurrently recording facial electromyography34,35 or video data36,37. These studies identified activation in subcortical regions, including the amygdala34,35,37, basal ganglia36, thalamus37, and cerebellum37, and in cortical regions including the occipital and temporal cortices37, lateral posterior parietal cortices37, somatosensory motor cortices37, supplementary motor cortices36, and prefrontal cortices37. Numerous studies have investigated the neural correlates of the intensity or arousal of positive and negative subjective emotional experiences. These studies reported the involvement of subcortical regions such as the amygdala38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53, basal ganglia43,46,47,48,49,54,55,56, and cerebellum46,48,49,51,57,58, as well as various neocortical areas including the occipital and temporal cortices38,47,48,50,51,53, medial parietal regions (e.g., posterior cingulate cortex and precuneus)47,50,51,54,59, lateral posterior parietal cortices46,48,51,58, insular cortex43,46,47,55,59,60,61, somatosensory motor cortices48, and prefrontal cortices39,42,46,47,48,49,50,52,54,57,58,59,60,62,63. Taken together, these findings imply that a broad network of brain regions, spanning subcortical and cortical structures, may be implicated in facial expression and subjective emotional processing.

However, few studies have simultaneously recorded facial and subjective emotional responses while statistically dissociating their effects. This represents a critical gap in the existing literature, as numerous psychological studies have reported moderate correlations among various components of emotional responses. These findings raise the possibility that observed neural associations with facial expressions or subjective emotional experiences may, at least in part, reflect the influence of other, correlated emotional processes. One notable exception is an fMRI study that directly addressed this issue37. In that study, the researchers recorded the facial reactions of 13 participants in response to humorous cartoons and subsequently collected humor perception ratings after the scanning session. Brain activity associated with smiling, after statistically controlling for humor perception, was observed in subcortical regions, including the amygdala, basal ganglia, and cerebellum, as well as in neocortical regions such as the temporal and prefrontal cortices. Conversely, activity related to humor perception, after removing the influence of facial expression, was predominantly found in neocortical areas, including the temporo-parieto-occipital junction, temporal cortex, and prefrontal cortex. These findings imply a functional dissociation between the neural substrates underlying facial expressions and those supporting subjective emotional experience. However, the study37 had several limitations. Measurement of humor perception may not fully reflect purely subjective emotional experiences, as it entails cognitive appraisal of the stimuli64. Moreover, the study37 used only positively valenced stimuli and included a relatively small sample size. Other studies have reported inconsistent findings regarding the neural correlates of facial and subjective responses; for instance, the relationship between amygdala activity and self-reported emotional experience has varied across investigations. Further studies are therefore needed to clarify these associations. Our study was designed to address these concerns by testing the hypothesis that distinct neural regions are differentially associated with facial and subjective emotional responses, consistent with the results reported previously37.

Furthermore, it remains unclear whether the brain regions associated with facial and subjective emotional responses form distinct functional networks. To our knowledge, no prior study has investigated segregated patterns of functional connectivity corresponding to facial and subjective emotional responses. Considering that previous studies have identified functional network configurations during emotional processing65 and the execution of facial expressions66 involving multiple interconnected brain regions, we hypothesized that the brain areas exhibiting activity related to facial or subjective responses would constitute a functionally integrated neural network.

In addition, assuming the existence of distinct neural networks associated with facial and subjective emotional responses, the dynamic relationship between these networks remains uncertain. As noted above, several psychological theories have long proposed models of emotional responses that may correspond to such network dynamics. The readout hypothesis posits that the recognition of emotional stimuli initially elicits subjective emotional experiences, which subsequently generate facial emotional expressions23. In contrast, proponents of the facial feedback hypothesis have argued for the opposite sequence, suggesting that facial expressions themselves can influence and shape subjective emotional experiences25. Alternatively, some studies have proposed that stimulus recognition elicits both subjective and facial emotional responses67. Considering the relatively robust empirical support for the facial feedback hypothesis30,31,32, we hypothesized that the neural network involved in stimulus recognition would first activate the network associated with facial responses, which would subsequently modulate the network underlying subjective emotional experiences.

To test these hypotheses, we conducted fMRI while participants viewed emotionally evocative films, during which their facial reactions were simultaneously video recorded. Following image acquisition, participants provided cued-recall dynamic valence ratings. First, we performed an analysis using a general linear model (GLM) incorporating both facial responses, quantified using Action Unit (AU) 12 (lip corner pulling) as defined by the Facial Action Coding System (FACS)68,69, and subjective emotional responses, indexed by the absolute values of dynamic valence ratings as a measure of emotional intensity (regardless of valence polarity), as in previous studies54,70,71,72,73. Because the 32-channel head coil covered the upper half of the face, we limited our analysis to AU 12, a prototypical indicator of positive emotional expression (Fig. 1). We used the cued-recall procedure based on the proposal that online introspective monitoring could interfere with the authenticity of emotional responses74,75,76 and on data showing high positive correlations between online and cued-recall dynamic ratings in response to emotionally evocative films14,77. We analyzed the absolute values of valence ratings because meta-analyses of neuroimaging studies have reported that several emotion-related brain regions described above, such as the amygdala, can be consistently activated in response to both negative and positive emotions relative to neutral emotions78,79,80. This analysis statistically dissociated neural activity associated with facial and subjective emotional responses. Next, to identify large-scale functional connectivity networks associated with each response type, we conducted a group-level independent component analysis (ICA) of the fMRI data81. Functional connectivity was defined as temporally synchronized activity among spatially distinct brain regions82. ICA is one of the two principal methods used to examine functional connectivity, the other being seed-based correlation analysis83. Whereas seed-based methods are suited to identifying connectivity between a predefined seed region and the rest of the brain, ICA is more appropriate for uncovering global connectivity networks across multiple brain regions84. Finally, to examine the dynamic coupling between the independent components (ICs) associated with facial and subjective emotional responses, we applied dynamic causal modeling (DCM)85 to the ICs. DCM for ICs has been proposed as a suitable method for investigating causal interactions between large-scale brain networks comprising multiple functionally distinct regions86,87,88.

Fig. 1

Illustration of video data of participants’ emotional facial responses.

Results

Subjective and facial emotional responses

Figure 2 shows the group mean second-by-second dynamic valence ratings (left) and AU 12 (right) during each film. The emotionally evocative film clips elicited both facial and subjective emotional responses. For example, the valence ratings and AU 12 for the negative film showed a slight increase followed by a decline, reflecting the film’s content: a pleasant group gathering scene followed by a massacre. Repeated measures analyses of variance (ANOVAs) with emotion as a factor showed significant main effects of emotion both for averaged valence ratings (mean ± standard error, −0.84 ± 0.11, −0.27 ± 0.12, and 1.76 ± 0.15 for negative, neutral, and positive, respectively) and averaged AU 12 (mean ± standard error, 0.03 ± 0.03, 0.06 ± 0.04, and 0.22 ± 0.06 for negative, neutral, and positive, respectively) (F[2, 64] = 106.99 and 5.99, p < 0.001 and p = 0.004, η²p = 0.77 and 0.16, respectively). Holm-corrected multiple comparisons revealed that the averaged valence ratings were significantly higher for the positive film than for the neutral and negative films, and higher for the neutral film than for the negative film (t > 3.01, p < 0.005). The averaged AU 12 responses were significantly higher for the positive film than for the neutral and negative films (t > 2.72, p < 0.02). When the absolute value of valence was calculated as the measure of emotional intensity, ANOVA for the averaged values (mean ± standard error, 0.91 ± 0.09, 0.48 ± 0.10, and 1.76 ± 0.15 for negative, neutral, and positive, respectively) showed a significant main effect of emotion (F[2, 64] = 34.87, p < 0.001, η²p = 0.52). Multiple comparisons revealed significantly higher values for both the positive and negative films than for the neutral film, and for the positive than for the negative film (t > 2.75, p < 0.01).

Fig. 2

Group mean dynamic ratings (left) and action unit (AU) 12 (right) elicited by the negative, neutral, and positive emotionally evocative films.

Regional brain activity

For fMRI data analysis, a random-effects analysis89 was first conducted to identify regions showing significant activation associated with film observation, facial emotional responses, and subjective emotional responses.

Contrasts associated with the film observation revealed widespread activation across neocortical regions. Significant clusters were identified in the bilateral occipital and temporal lobes, left and right parietal lobes, bilateral dorsomedial frontal lobes, and left ventromedial frontal cortex (Table 1 and Fig. 3).

Fig. 3: Statistical parametric maps indicating brain activity associated with film observation, facial responses, and subjective responses.

The significant areas (at p < 0.05 cluster-level family-wise error-corrected; n = 33) were superposed on the SPM-render brain and mean normalized structural images of the study participants.

Table 1 Brain regions demonstrating significant activity associated with film observation, facial responses, and subjective responses

Contrasts associated with facial emotional responses (i.e., AU 12) revealed significant activation in the bilateral somatosensory and motor cortices, including the postcentral gyrus, insula, operculum, precentral gyrus, supplementary motor area, and middle cingulate cortex (Table 1 and Fig. 3). In addition, widespread activation was observed in the right limbic regions, with prominent foci in the amygdala and putamen.

Contrasts assessing positive associations with subjective emotional responses (indexed by the absolute values of dynamic valence ratings) demonstrated significant activity in the bilateral medial parietal regions, including the precuneus; the bilateral lateral temporal regions, including the middle temporal gyri; and the bilateral cerebellum (Table 1 and Fig. 3).

Independent component analysis

Next, group ICA of the fMRI data was conducted to analyze functional brain networks associated with film observation, facial emotional responses, and subjective emotional responses. The group ICA estimated 30 ICs, and the reconstructed time courses of the ICs were then evaluated by random-effects analyses, as in the regional activity analyses discussed above.

Contrasts associated with the film observation revealed three significant components: IC #5 in the early visual areas, IC #7 in the higher-order visual areas (e.g., the middle temporal gyrus), and IC #11 in the early auditory areas (Fig. 4). Further contrasts revealed one component to be significantly associated with facial emotional responses (IC #13) and another to be significantly related to subjective emotional responses (IC #15) (Fig. 4). These components encompassed brain regions previously identified in the regional brain activity analyses, indicating functional connectivity among these areas.

Fig. 4: Group independent component (IC) maps indicating significant associations with early visual processing, higher-order visual processing, auditory processing, facial responses, and subjective responses (n = 33).

The IC maps were superposed on mean normalized structural images of the study participants. The IC maps were converted into z-score maps and thresholded at z ≥ 1 for display purposes.

Dynamic causal modeling for independent components

Finally, we applied stochastic DCM90,91,92 to the ICs identified in the above ICA to examine the dynamic interactions between the functional brain networks associated with sensory processing (ICs #5, #7, and #11 for early visual, higher-order visual, and auditory processing, respectively), facial emotional responses (IC #13), and subjective emotional responses (IC #15). Based on ample psychological evidence of associations between facial and subjective emotional responses (e.g., ref. 7), we assumed an interaction between the facial and subjective emotional response networks. In addition, based on anatomical evidence indicating that large-scale connections are generally reciprocal93,94, we assumed bidirectional connectivity across the network. We tested the relationships between facial and subjective emotional responses by constructing three model variants, in which the sensory processing networks interacted with both the facial and subjective emotional response networks independently, with the subjective response network only, or with the facial response network only. In addition, we explored relationships among the three sensory processing networks by constructing three variants in which early visual information converged on the higher-order visual network, both early visual and auditory information converged on the higher-order visual network (i.e., multimodal sensory association95,96), or the three networks processed information independently. In total, we constructed nine models (Fig. 5). The models were systematically compared using random-effects Bayesian model selection97. In addition, we grouped the models into three families98 according to the relationships between the sensory and facial/subjective emotional response networks and conducted model-family comparisons.
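To make this 3 × 3 model space concrete, the following minimal Python sketch enumerates the nine models as binary adjacency matrices over the five components. The node labels, the build_model helper, and the factor names are hypothetical conveniences for illustration only; they are not part of SPM or DCM12.

```python
import numpy as np

nodes = ["eVis", "hVis", "Aud", "Fac", "Sub"]
idx = {n: i for i, n in enumerate(nodes)}

def build_model(emotion_target, sensory_scheme):
    """emotion_target: which emotion network(s) receive sensory input
    ('both', 'sub', or 'fac'); sensory_scheme: how the sensory networks
    are linked ('evis_to_hvis', 'evis_and_aud_to_hvis', 'independent')."""
    A = np.eye(len(nodes))                       # self-connections
    def connect(a, b):                           # bidirectional edge
        A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0
    connect("Fac", "Sub")                        # assumed in all nine models
    targets = {"both": ["Fac", "Sub"], "sub": ["Sub"], "fac": ["Fac"]}
    for s in ("eVis", "hVis", "Aud"):
        for t in targets[emotion_target]:
            connect(s, t)
    if sensory_scheme in ("evis_to_hvis", "evis_and_aud_to_hvis"):
        connect("eVis", "hVis")
    if sensory_scheme == "evis_and_aud_to_hvis":
        connect("Aud", "hVis")
    return A

families = ("both", "sub", "fac")                # the three model families
schemes = ("evis_to_hvis", "evis_and_aud_to_hvis", "independent")
models = {(f, s): build_model(f, s) for f in families for s in schemes}
print(len(models))                               # 9 models in 3 families
```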

Fig. 5: Models and families of dynamic causal modeling for independent components associated with early (eVis) and higher-order (hVis) visual and auditory (Aud) processing, and facial (Fac) and subjective (Sub) emotional responses.

The nine models were divided into three families (each containing the three models in dashed boxes), based on three alternative hypotheses, in which the sensory processing components interacted with both facial and subjective emotional response components, the subjective response component only, or the facial response component only. The arrows indicate connections between components.

The exceedance probability in the model comparison revealed that the model in which the three sensory processing networks interacted independently with the facial emotional response system, which in turn interacted with the subjective emotional response system, was optimal (Fig. 6). Comparisons of model families confirmed that the models with connectivity between the sensory and facial response systems were superior to the other models positing connectivity between the sensory and subjective response systems or connections of the sensory processing networks with both emotional response networks (Fig. 6).

Fig. 6

Bayesian model selection results (expected and exceedance probabilities for model and family).

Discussion

The results of our regional brain activity analyses demonstrated that the production of emotional facial expressions, measured by AU 12, was associated with activity in several brain regions, including both subcortical and neocortical structures. In particular, subcortical activation was observed in the amygdala, putamen, and thalamus, whereas neocortical activation involved the somatosensory regions, insular cortex, supplementary motor area, and middle cingulate cortex. The activity observed in these regions is largely consistent with previous studies, which have specifically identified facial-expression-related activity in the amygdala34,35,37, basal ganglia36, thalamus37, somatosensory and motor cortices37, and supplementary motor cortices36. Several lesion studies have investigated the neural correlates of emotional facial expressions, reporting that damage to specific brain regions, including the thalamus, insular cortex, supplementary motor area, and middle cingulate cortex, can impair the production of these expressions, consistent with our findings (e.g., ref. 99; for a review, see ref. 100). Furthermore, prior clinical studies have indicated that patients with temporal lobe epilepsy exhibit unilateral weakness in emotional facial expressions contralateral to the side of mesial temporal sclerosis, in line with our findings101,102.

Notably, amygdala activity was associated with facial, but not subjective, emotional responses. Consistent with a prior study37, our findings statistically dissociated facial and subjective emotional responses. However, numerous previous studies have reported inconsistencies, with amygdala activity being linked to subjective emotional responses (e.g., ref. 38). These discrepancies could be explained by the fact that previous studies did not measure facial emotional responses, which are more directly associated with amygdala activity and also positively correlated with subjective emotional responses; such studies may therefore have detected spurious associations between amygdala activity and subjective emotional responses. Our findings are consistent with previous neuroimaging103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122, neurophysiological123,124, and lesion125,126,127 studies indicating that the amygdala is involved in nonconscious emotional processing prior to the production of subjective experiences. Some recent neuroimaging studies have also shown that the amygdala is activated in response to emotional stimuli independently of the conscious evaluation of those stimuli128,129. In particular, the amygdala appears to contribute to the rapid, preconscious appraisal of the emotional salience of stimuli130,131, a process that may trigger immediate emotional facial expressions132,133 and that corresponds to the appraisal process proposed in psychological emotion theories1,4,134,135,136,137. The amygdala may enable such rapid and complex computation through its rich intra- and interregional connectivity138,139,140.

Conversely, our findings revealed that subjective emotional responses, statistically dissociated from facial emotional responses, were primarily associated with activity in the medial parietal cortex and lateral temporoparietal regions. This pattern of activation aligns with previous neuroimaging studies that have investigated the neural correlates of subjective emotional experiences (e.g., ref. 59), including studies that have explicitly aimed to dissociate facial and subjective emotional components37. Notably, activation in these regions is consistent with earlier findings implicating the medial parietal and temporoparietal areas in non-emotional, self-reflective subjective states (e.g., ref. 141; for a review, see refs. 142,143). Our findings support psychological models suggesting that valenced subjective experiences function as a monitoring component, emerging from the integration and representation of changes in other affective and cognitive components4. Moreover, the medial parietal and lateral temporoparietal regions have also been implicated in processes related to the estimation of others’ mental states (e.g., refs. 144,145; for reviews, see refs. 146,147,148). This overlap aligns with the conceptualization of theory of mind, or mentalizing, as the cognitive capacity to attribute mental states to oneself and others149. It supports the proposal that subjective emotional experiences may be constructed through mechanisms similar to those involved in understanding the subjective experiences of others29. Taken together, these findings imply that subjective emotional experiences are likely constructed via mentalizing processes, emphasizing a convergence between self-referential emotional appraisal and social cognitive functions.

In addition, our group ICA revealed that the aforementioned brain regions associated with facial or subjective emotional responses form distinct ICs. Our findings are consistent with anatomical evidence from non-human primates demonstrating white matter (WM) connectivity among the amygdala, basal ganglia, and sensorimotor cortices (for reviews, see refs. 138,150,151), and between the medial parietal and lateral temporoparietal regions (for reviews, see refs. 152,153), as well as their connections with the visual cortices93,154. The results are also consistent with previous findings of functional or effective connectivity among these regions (e.g., ref. 66), although no studies have directly demonstrated associations between such connectivity and either facial or subjective emotional responses. Our findings provide novel evidence implying that these brain regions cooperate as integrated functional neural networks, each supporting the implementation of either facial or subjective components of emotional experience.

Furthermore, our DCM for ICs revealed dynamic interactions among the relevant neural networks. In particular, the sensory processing networks were functionally connected with the network underlying facial, but not subjective, emotional responses. Subsequently, the facial and subjective emotional response networks exhibited bidirectional interactions. Our findings have important theoretical implications for understanding the psychological mechanisms underlying emotional responses. Darwin22 proposed that facial expressions are readouts of inner emotional experiences, and several researchers subsequently supported this commonsense view of a direct pathway from subjective feelings to bodily changes23. In contrast, James25 proposed the inverse: that stimuli first evoke bodily responses, which subsequently give rise to subjective emotional experiences. Several subsequent theories have suggested that facial expressions play a formative role in constructing subjective emotional experiences (e.g., ref. 28; for a review, see ref. 29). There remains long-standing contention regarding these hypotheses32, partly because behavioral data alone have limited power to elucidate the underlying mechanisms. Our findings regarding the neural dynamics provide empirical support for the foundational premise of James’s theory25 and the facial feedback hypothesis. In particular, our study revealed that the sensory processing of emotionally salient stimuli initiates activation of appraisal and facial response systems, which subsequently influence the emergence of subjective emotional experiences.

Our DCM for ICs also suggested that the early visual, higher-order visual, and auditory processing networks interacted independently with the facial response network. These results are plausible if we assume that the facial response network receives sensory input via the amygdala, because anatomical evidence indicates that the amygdala receives inputs from both visual and auditory areas138,139. In addition, the results are consistent with previous findings showing that the amygdala modulates activity in widespread sensory areas during the perception of emotional stimuli155,156,157,158,159,160,161.

This study had several limitations. First, we assessed facial emotional responses solely in the lower face. This constraint arose from the use of a head coil that covered the upper face, limiting our ability to evaluate action units involving the upper facial region, such as brow lowering, an expression typically associated with negative emotional states. Future studies using half-head coils or other advanced imaging setups are needed to enable comprehensive assessment of upper and lower facial expressions.

Second, we assessed only valence ratings for subjective emotional responses. Although the dimensional emotional perspective generally posits that subjective emotional experience can be well represented by the two dimensions of valence and arousal3,162, we did not assess arousal ratings because several previous studies reported that the absolute values of valence ratings in response to emotional visual stimuli could produce overlapping information with arousal ratings163. Our preliminary analyses also showed a positive correlation between the absolute values of valence ratings and arousal ratings for the films used in this study (mean ± standard error, r = 0.55 ± 0.07; see “Methods”). However, there remains debate about the relationship between valence and arousal164,165, and it may be possible to induce valence and arousal states independently with certain stimuli, such as surprising films166,167,168. Testing of subjective arousal remains a matter for future research.

Third, we assessed subjective emotional ratings using the cued-recall procedure. We selected this procedure based on the proposal that online introspective monitoring could interfere with the authenticity of emotional responses74,75,76, such as by reducing immersion75, and on data showing significant positive correlations between online and cued-recall ratings for the stimuli used in the present study (mean ± standard error, r = 0.66 ± 0.04; see “Methods”)14. However, these data also imply substantial differences between online and cued-recall ratings (approximately 56% of the variance was unshared, given r = 0.66), and our regressors of subjective emotional experience may therefore have provided an incomplete estimate of online subjective emotional responses. Future studies should test online ratings to confirm the generalizability of the present findings.

Fourth, the sample size was small, which may have resulted in insufficient statistical power169,170,171,172 for detecting associations between facial or subjective emotional responses and other brain regions. For example, some previous studies have suggested that the posterior superior temporal sulcus could be involved in the production of facial expressions173,174,175,176. Several studies showed that the dorsomedial prefrontal cortex was activated during tasks involving mentalizing (e.g., ref. 177; for reviews, see refs. 178,179,180). These data imply that additional brain regions may be involved in facial or subjective emotional responses, which should be investigated in future studies.

Finally, our fMRI measurement may have lacked the temporal resolution needed to dissociate the functional networks. Electrophysiological measures, including scalp or intracranial electroencephalography, are needed to investigate neural activity with high temporal resolution. Some previous electrophysiological studies showed that the amygdala is activated as early as 100 ms in response to emotional photographs181,182. Other studies reported that the medial parietal cortex is activated at about 300 ms while individuals evaluate their own mental states183. Several other studies also reported electrical activity in the posterior cortices, with differing temporal or frequency profiles, while participants viewed emotionally evocative films184,185,186,187. These data imply that the brain regions and networks identified in this study may exhibit different temporal profiles during the production of facial and subjective emotional responses. Future electrophysiological studies are warranted to test this idea.

In conclusion, our fMRI study delineated the regional brain activities, functional networks, and dynamic interaction patterns specifically associated with facial and subjective emotional responses. Regional brain activity analyses revealed that facial responses measured by AU 12 were primarily associated with limbic, motor, and somatosensory regions, whereas subjective emotional experiences, reflected by absolute valence ratings, were associated with medial parietal and lateral temporoparietal regions. Group ICA identified that these regions formed distinct ICs. DCM further demonstrated that the IC associated with visual recognition interacted with the facial motor response IC, which subsequently influenced the IC underlying subjective emotional responses. These findings imply that emotional responses are implemented through a dynamic and hierarchical interaction from the limbic–motor network to the mentalizing network.

Methods

Participants

Our study included 33 healthy volunteers (11 females and 22 males; mean age ± standard deviation, 22.3 ± 2.9 years). The sample size was determined heuristically185, guided by prior fMRI studies investigating neural activity associated with facial and subjective emotional responses (n = 1337; n = 2835). All participants were right-handed, as determined using the Edinburgh Handedness Inventory188, and possessed normal or corrected-to-normal visual acuity. Before enrollment, the experimental procedures were thoroughly explained, and written informed consent was obtained from all participants. The Ethics Committee of the Unit for Advanced Studies of the Human Mind, Kyoto University, approved the study protocol. All ethical regulations relevant to human research participants were followed.

Experimental design

A within-subject, one-factor design was used, with emotion (negative, neutral, and positive) as the sole factor.

Stimuli

Film clips were used to induce negative, neutral, and positive emotional states. Negative affect was induced using scenes from “Cry Freedom” that portrayed acts of violence against vulnerable individuals. A neutral emotional state was evoked through the presentation of a standard screensaver display. These film stimuli were developed by Gross and Levenson189. To induce positive affect, a comedic dialogue between two individuals, excerpted from commercial films (M-1 Grand Prix The Best 2007–2009, Yoshimoto, Tokyo, Japan) and previously used by Sato et al.14, was shown. The durations of the negative, neutral, and positive stimuli were 156, 150, and 196 s, respectively. The efficacy of these stimuli in eliciting the intended subjective and facial emotional responses has been validated in multiple prior studies14,17,19,20,166. Specifically, the negative film we used was shown to be the most effective in eliciting negative emotional states (indexed by the ratio of the mean to the standard deviation of valence ratings) among the films developed by Gross and Levenson189 that are suitable for Japanese participants (i.e., those with Japanese dubbing or subtitles)166. Although the negative, neutral, and positive films were developed to induce anger, neutral, and amusement emotions14,189, we refer to the films by dimensional emotional states because previous studies assessing categorical emotion ratings showed that the films could elicit multiple categorical emotional states166,189, and inducing a single emotional category with film stimuli has proven generally difficult190. In addition to the three films, a film clip from “Silence of the Lambs”189 was used for practice. All visual stimuli subtended a visual angle of 9.3° vertically and 7.0° horizontally.

Presentation apparatus

Experimental events were controlled using Presentation Software, version 14.8 (Neurobehavioral Systems, Albany, CA, USA), running on a Windows-based computer (Microsoft, Redmond, WA, USA). Visual stimuli were projected via a liquid crystal projector (DLA-F110; Victor, Yokohama, Japan) at a refresh rate of 60 Hz onto a mirror positioned within the MRI scanner in front of the participants.

Participants’ facial responses were recorded using the MRI Communication Relay System (Panasonic, Tokyo, Japan), which comprised an MRI-compatible video camera, frame synchronizer (FA-125; Panasonic), video timer (VTG-55D; Panasonic), digital mixer (WR-D01; Panasonic), memory card portable recorder (AG-HPD24; Panasonic), and workstation (HP Z800; Hewlett-Packard Laboratories, Palo Alto, CA, USA).

For off-line dynamic valence ratings, a Windows-based laptop computer (CF-SV8; Panasonic) was used.

Procedure

In the fMRI experiment, participants completed three film viewing trials in a block design paradigm. Each trial began with the presentation of a fixation point—a small white “+” displayed on a black background—for 2 s, followed by a plain white screen for 10 s. Subsequently, the film stimulus was presented. After each film, the plain white screen reappeared for an additional 10 s. Interspersed between trials were off-epochs lasting 18 s to allow for signal stabilization and baseline comparison. The order of film presentation was counterbalanced across participants to mitigate order effects. Participants were informed that they would view a series of films during which brain activity and facial video data would be recorded. Prior to fMRI data acquisition, participants completed a practice trial to familiarize themselves with the procedure.

Following the imaging session, participants completed a subjective rating task outside the scanner. During each trial, one of the three film stimuli was presented on a monitor, accompanied by a horizontal nine-point scale for assessing valence. Participants were instructed to recall their emotional experience from the initial viewing and continuously rate this recalled experience in terms of valence by manipulating a computer mouse. Mouse coordinates were recorded continuously at a sampling rate of 10 Hz and converted into second-by-second dynamic valence rating scores. This cued-recall paradigm was used to avoid inducing online introspective monitoring, which may interfere with the authenticity of emotional responses74,75,76. Notably, a previous study reported significant positive correlations between online and cued-recall ratings for the stimuli used in the present study (mean ± standard error, r = 0.66 ± 0.04)14.
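A minimal Python sketch of this downsampling step is shown below, assuming the mouse x-coordinates have already been mapped onto the −4 to +4 valence scale; the function name and the within-window averaging rule are illustrative assumptions rather than the study's exact implementation.

```python
import numpy as np

def to_dynamic_ratings(mouse_x, hz=10):
    """Collapse 10-Hz mouse x-coordinates (assumed already mapped to
    the -4..+4 valence scale) into second-by-second rating scores by
    averaging within each 1-s window."""
    x = np.asarray(mouse_x, dtype=float)
    n_sec = len(x) // hz
    return x[:n_sec * hz].reshape(n_sec, hz).mean(axis=1)
```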

MRI acquisition

Neuroimaging data were acquired using a 3-Tesla MRI system (MAGNETOM Verio; Siemens, Malvern, PA, USA) equipped with a 32-channel head coil. To minimize head motion, participants’ heads were stabilized using elastic padding. Functional images were acquired as 76 consecutive slices aligned parallel to the anterior–posterior commissure plane, providing whole-brain coverage. Imaging was performed using a T2*-weighted multiband gradient-echo echo-planar imaging sequence with the following parameters: repetition time (TR) = 2,000 ms; echo time (TE) = 41.2 ms; flip angle (FA) = 80°; multiband acceleration factor = 4; matrix size = 96 × 96; voxel size = 2 × 2 × 2 mm. At the beginning of each fMRI run, a gradient-echo field map was acquired to correct for geometric distortions (TR = 738 ms; TE1/TE2 = 4.92/7.38 ms [ΔTE = 2.46 ms]; FA = 60°; matrix size = 96 × 96; 76 slices with the same orientation and geometry as the functional echo-planar imaging). Following the functional image acquisition, a high-resolution T1-weighted anatomical image was obtained using a magnetization-prepared rapid acquisition gradient-echo sequence (TR = 2,250 ms; TE = 3.06 ms; FA = 9°; inversion time = 900 ms; field of view = 256 × 256 mm; matrix size = 256 × 256; voxel size = 1 × 1 × 1 mm).

Statistics and reproducibility: facial and subjective response analysis

Participants’ emotional facial expressions throughout the fMRI runs were coded from video recordings using FACS68,69. FACS is a comprehensive, anatomically based coding system that describes visible facial muscular movements in terms of AUs without attributing interpretive meaning. Although AU 4 (brow lowering), a prototypical indicator of negative emotional expression, and AU 12 (lip corner pulling), a prototypical indicator of positive emotional expression, are both relevant for tracking dynamic valence changes20, the use of a 32-channel head coil (Fig. 1) precluded reliable analysis of AU 4. Therefore, only AU 12 was evaluated. A trained FACS coder, blinded to the study conditions, scored AU 12 on a second-by-second basis using a binary coding scheme. To assess inter-rater reliability, a second coder independently evaluated 10% of the data, comprising randomly selected 1-min segments from each participant. Inter-coder agreement was high, with a Cronbach’s alpha of 0.80. The binary time-series data for AU 12 were subjected to the statistical tests of the behavioral data and then resampled to match the fMRI TR (2 s) for use as input in the subsequent imaging analyses.
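The sketch below illustrates these two steps in Python under stated assumptions: the TR resampling averages the binary 1-s codes within each 2-s window (the exact resampling rule used in the study is not specified, so this is an assumption), and the agreement index is the standard Cronbach's alpha formula applied to the two coders' time series.

```python
import numpy as np

def resample_au12_to_tr(au12_per_sec, tr_s=2):
    """Average the second-by-second binary AU 12 codes within each
    TR (2-s) window, yielding a regressor aligned to the fMRI series.
    Assumed resampling rule, shown for illustration."""
    x = np.asarray(au12_per_sec, dtype=float)
    n_tr = len(x) // tr_s
    return x[:n_tr * tr_s].reshape(n_tr, tr_s).mean(axis=1)

def cronbach_alpha(codes):
    """codes: (n_timepoints, n_coders) matrix of AU 12 codes from the
    coders; returns Cronbach's alpha as an inter-coder agreement index:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(item sums))."""
    c = np.asarray(codes, dtype=float)
    k = c.shape[1]
    item_var = c.var(axis=0, ddof=1).sum()   # sum of per-coder variances
    total_var = c.sum(axis=1).var(ddof=1)    # variance of summed codes
    return k / (k - 1) * (1 - item_var / total_var)
```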

Participants’ subjective emotional responses during film viewing were assessed using second-by-second dynamic valence rating scores. The valence data were sampled at 1-s intervals for the statistical tests of the behavioral data and resampled at 2-s intervals to align with the imaging data, serving as input for the subsequent statistical analyses. Based on meta-analysis findings of neuroimaging data showing that several emotion-related brain regions can be consistently activated in response to both negative and positive emotions relative to neutral emotions78,79,80, we calculated the absolute values of the valence ratings to quantify the intensity of positive or negative emotion, as in several previous studies70,71,72,73. As the absolute values of valence ratings in response to emotional visual stimuli can produce overlapping information with arousal ratings163, we did not perform an independent assessment of arousal ratings. To confirm this rationale, we conducted a preliminary analysis of a previous data set in which 20 participants provided second-by-second dynamic ratings of valence and arousal while watching the negative, neutral, and positive emotionally evocative films used in the present study14. The mean ± standard error intraindividual correlation coefficient between the absolute values of valence ratings and arousal ratings was 0.55 ± 0.07 (one-sample t test contrasting with zero [two-tailed], t[19] = 7.85, p < 0.001, d = 1.76), implying overlapping information regarding emotional intensity.
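For illustration, a minimal Python sketch of the intensity transform and the group-level correlation test follows; the function names are hypothetical, and the one-sample Cohen's d is computed as mean/SD of the per-participant coefficients, an assumption consistent with the reported t and d values.

```python
import numpy as np
from scipy import stats

def intensity(valence):
    """Emotional intensity as the absolute value of the dynamic
    valence ratings, regardless of polarity."""
    return np.abs(np.asarray(valence, dtype=float))

def group_correlation_test(valence_series, arousal_series):
    """Per-participant Pearson correlation between |valence| and
    arousal, followed by a two-tailed one-sample t test of the
    coefficients against zero; d is the one-sample Cohen's d."""
    rs = np.array([stats.pearsonr(intensity(v), a)[0]
                   for v, a in zip(valence_series, arousal_series)])
    t, p = stats.ttest_1samp(rs, 0.0)
    d = rs.mean() / rs.std(ddof=1)
    return rs.mean(), stats.sem(rs), t, p, d
```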

For the statistical evaluation of AU 12, valence ratings, and absolute valence ratings, the average values during each film were calculated and subjected to one-way repeated-measures ANOVA with emotion (negative, neutral, and positive) as a factor, followed by multiple comparisons using the Holm method. Statistical significance was set at p < 0.05.
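A minimal Python sketch of this behavioral analysis is shown below, using statsmodels and SciPy as stand-ins for whatever statistical package was actually used (the paper does not specify one); the long-format column names are hypothetical.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multitest import multipletests

def rm_anova_with_holm(df):
    """df: long-format data with columns 'subject', 'emotion'
    ('negative'/'neutral'/'positive'), and 'value' (film-averaged
    AU 12, valence, or absolute valence score)."""
    # One-way repeated-measures ANOVA with emotion as the factor
    print(AnovaRM(df, depvar="value", subject="subject",
                  within=["emotion"]).fit())
    # Holm-corrected pairwise paired t tests
    wide = df.pivot(index="subject", columns="emotion", values="value")
    pairs = [("positive", "neutral"), ("positive", "negative"),
             ("neutral", "negative")]
    pvals = [stats.ttest_rel(wide[a], wide[b]).pvalue for a, b in pairs]
    reject, p_holm, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    return list(zip(pairs, p_holm, reject))
```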

Statistics and reproducibility: image analysis

Neuroimaging data were analyzed using the Statistical Parametric Mapping software package (SPM12; revision 7771; http://www.fil.ion.ucl.ac.uk/spm), implemented in MATLAB R2018a (MathWorks, Natick, MA, USA). The analysis pipeline comprised preprocessing, regional brain activity analysis, group ICA, and DCM for ICs (Fig. 7).

Fig. 7: Flowchart of data analysis.

The analysis included preprocessing, regional brain activity analysis, group independent component (IC) analysis, and dynamic causal modeling for ICs.

For preprocessing, all functional images were initially corrected for slice timing. Subsequently, images from each run were realigned to the first scan to correct for head motion and were unwarped to correct for geometric distortions and for the interaction of motion and distortion based on the field map, using the FieldMap Toolbox191,192. Realignment parameters indicated minimal motion (maximum translation < 3.1 mm; mean ± standard deviation translation = 0.47 ± 0.33 mm, 0.42 ± 0.28 mm, and 1.28 ± 0.69 mm in the x, y, and z directions, respectively). Notably, motion was limited to < 2 mm for 28 participants. Functional images were coregistered to the skull-stripped anatomical image. Subsequently, all anatomical and functional images were spatially normalized to Montreal Neurological Institute space using the unified segmentation–spatial normalization procedure based on the anatomical image191. The normalized functional images were resampled to a voxel size of 2 × 2 × 2 mm and smoothed using an 8 mm full-width at half-maximum (FWHM) isotropic Gaussian kernel to account for inter-individual anatomical variability. Previous methodological work has demonstrated that an 8-mm FWHM provides optimal sensitivity and inter-subject comparability for GLM group analyses, whereas other findings imply that variations in FWHM (0, 4, and 8 mm) exert minimal influence on group-level ICA results193.
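The smoothing step can be sketched with a NumPy/SciPy stand-in (not SPM itself), which makes explicit the conversion from the FWHM specification to the Gaussian kernel's standard deviation; the function name and the assumption of isotropic 2-mm voxels after resampling follow the parameters stated above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_fwhm(volume, fwhm_mm=8.0, voxel_mm=2.0):
    """Isotropic Gaussian smoothing specified by FWHM, as in the SPM
    step above (a SciPy stand-in for illustration):
    sigma = FWHM / (2 * sqrt(2 * ln 2)) ~ FWHM / 2.355."""
    sigma_vox = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    return gaussian_filter(np.asarray(volume, dtype=float), sigma_vox)
```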

Random-effects analyses were conducted to identify significantly activated voxels at the population level89. Initially, single-subject analyses were performed using the GLM framework194. Task events, including film observation and facial reactions, were modeled using boxcar and delta functions, respectively. Subjective responses were incorporated as parametric modulators of the film observation. These three task-related regressors were convolved with a canonical hemodynamic response function. As multicollinearity among regressors can be problematic, we conducted preliminary analyses to evaluate the variance inflation factor (VIF). The results showed mean ± standard deviation (range) VIF = 1.1 ± 0.1 (1.0–1.6), implying no problematic multicollinearity relative to the commonly used threshold of 10195. A high-pass filter comprising a discrete cosine basis set with a cutoff period of 384 s was applied to remove low-frequency signal drift. To mitigate motion-related artifacts and physiological confounds, such as respiratory, cardiac, or vascular fluctuations, additional nuisance regression was conducted using the PhysIO Toolbox (version 3.2.0), part of the Translational Algorithms for Psychiatry-Advancing Science software collection (https://www.tnu.ethz.ch/de/software/tapas)196. Nuisance regressors included six motion parameters derived from the realignment step, as well as six WM and six cerebrospinal fluid (CSF) components calculated using the CompCor approach197. To extract the WM and CSF components for each participant, the anatomical image was segmented to create WM and CSF masks. The time-series signals were then extracted from voxels within these masks and subjected to principal component analysis. The first five principal components and the mean signal were included as nuisance regressors for each of the WM and CSF signals. Serial autocorrelations were modeled assuming a first-order autoregressive process; these were estimated from the pool of active voxels using restricted maximum likelihood and used to whiten the data and the design matrix198. Contrast images corresponding to each task-related regressor from the first-level (single-subject) analysis were entered into a full factorial model for the second-level (random-effects) analysis. Corrections for non-sphericity, accounting for potential dependencies and unequal variance across factor levels, were implemented using the restricted maximum likelihood approach to ensure valid GLM assumptions of independent and identically distributed errors198.
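To make the design construction and the VIF check concrete, the following Python sketch builds the three task regressors and computes VIFs. The double-gamma function is a common approximation of SPM's canonical HRF (not SPM's exact implementation), and the within-epoch mean-centring of the parametric modulator is an assumption for illustration.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr=2.0, duration=32.0):
    """Double-gamma approximation of the canonical HRF (response
    peaking ~6 s, undershoot ~16 s), sampled at the TR."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

def make_design(film_box, au12, abs_valence, tr=2.0):
    """Assemble the three task regressors: film boxcar, AU 12 events,
    and |valence| as a parametric modulator of film observation
    (mean-centred within film epochs), each convolved with the HRF."""
    h = canonical_hrf(tr)
    pm = film_box * (abs_valence - abs_valence[film_box > 0].mean())
    return np.column_stack([np.convolve(r, h)[:len(film_box)]
                            for r in (film_box, au12, pm)])

def vifs(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    on the remaining columns (plus an intercept)."""
    Xi = np.column_stack([X, np.ones(len(X))])
    out = []
    for j in range(X.shape[1]):
        y, Z = Xi[:, j], np.delete(Xi, j, axis=1)
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r2 = 1.0 - ((y - Z @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out
```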

Initially, regional brain activity associated with film observation, emotional facial responses, and emotional subjective responses was investigated. Clusters were considered statistically significant if they exceeded an extent threshold of p < 0.05, family-wise error-corrected for the whole brain, with a cluster-forming threshold of p < 0.001 (uncorrected). Anatomical labeling and identification of brain structures were conducted using the Automated Anatomical Labeling atlas and Brodmann area maps (Brodmann.nii) available via the MRIcron software package (https://www.nitrc.org/projects/mricron)199,200.

We conducted preliminary analyses of the above-described GLM using the original valence ratings (−4 to 4) instead of the absolute valence ratings, searching for brain regions whose activity increased or decreased monotonically with subjective emotional valence. It has been argued that the original and absolute valence ratings reflect valence-specific and valence-general responses, respectively80. The analyses revealed that the positive or negative contrasts of the subjective ratings reproduced the significant activations reported in the Results section (e.g., increased activity in the bilateral lateral temporal regions with increasing valence, and increased activity in the bilateral medial parietal regions, lateral temporal regions, and cerebellum with decreasing valence; Supplementary Table 1 and Supplementary Fig. 1), as well as several other regions (e.g., activity in the left inferior frontal gyrus with increasing valence). Other effects, including those associated with film observation and facial emotional responses, showed almost identical patterns (e.g., activations in the bilateral somatosensory and motor cortices and the right limbic regions related to facial responses; Supplementary Table 1 and Supplementary Fig. 1). These results indicate that (1) most brain regions associated with valence-general subjective responses (i.e., those identified using absolute valence ratings) were also detectable using the original valence ratings; (2) some regions may exhibit valence-specific patterns, either within these areas or in distinct regions; and (3) the choice between original and absolute valence ratings had minimal impact on the other effects related to film observation and facial emotional responses. Given these findings, and our focus on valence-general emotional responses, supported by prior meta-analyses showing that emotion-related brain activity is predominantly valence-general80, we reported the results for subjective emotional responses using the absolute values of valence ratings in the Results section.

To assess functional connectivity, group ICA was performed. Prior to ICA, spatially preprocessed data underwent further cleaning through nuisance regression, incorporating the same nuisance variables used in the GLM analysis: three discrete cosine transform basis functions (high-pass filter with a cutoff of 384 s), six head motion parameters, and six WM and six CSF-related time courses. Next, denoised data were analyzed using the Group ICA of fMRI Toolbox (GIFT, version 4.0b)81. Dimensionality reduction was conducted via principal component analysis at the individual and group levels. To identify ICs associated with film observation, emotional facial responses, and emotional subjective responses, temporal concatenation across all participants was used. The Infomax algorithm201 was used to estimate 30 ICs. Component stability was assessed using ICASSO (http://research.ics.aalto.fi/ica/icasso/) with 20 iterations. Finally, data were back-reconstructed using the default GIFT settings, with z-scoring applied to derive each participant’s time courses and spatial maps.
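As a schematic illustration of this pipeline, the sketch below performs a toy group spatial ICA by temporal concatenation. Note the substitution: GIFT estimates components with the Infomax algorithm and assesses stability with ICASSO, whereas this sketch uses scikit-learn's FastICA (with its built-in PCA whitening) and omits subject-level back-reconstruction; all names are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

def group_spatial_ica(subject_data, n_components=30):
    """Toy group spatial ICA: temporally concatenate each subject's
    (time x voxel) matrix, then unmix the voxel dimension so that
    sources are spatial maps and the mixing matrix holds the
    component time courses."""
    X = np.vstack(subject_data)              # (total_time, n_voxels)
    ica = FastICA(n_components=n_components, max_iter=1000,
                  random_state=0)
    maps = ica.fit_transform(X.T)            # (n_voxels, n_components)
    tcs = ica.mixing_                        # (total_time, n_components)
    z_maps = (maps - maps.mean(axis=0)) / maps.std(axis=0)  # z-scored
    return z_maps, tcs
```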

For each of the 30 ICs, we assessed their involvement in the three task conditions using temporal and spatial sorting procedures. In the temporal sorting procedure, we extracted the time courses associated with each IC for all participants and performed two-level random-effects analyses similar to those conducted for regional activity. At the first level (corresponding to “Temporal Sorting” in GIFT), we regressed the back-reconstructed time course of each of the 30 components for each participant onto a first-level design matrix containing only the task-related regressors (film observation, emotional facial responses, and emotional subjective responses), yielding one beta weight per regressor that reflected the strength and direction of the task-related activity in the time series of that IC. These beta weights were then entered into second-level random-effects analyses (corresponding to “Stats on Beta Weights” in GIFT) across participants to test whether each task-related regressor was significantly related to each IC. Statistical significance was assessed using a one-tailed threshold of p < 0.05, corrected for multiple comparisons across the 30 ICs using the false discovery rate (FDR) method. Following temporal sorting, spatial sorting was performed to identify further task-relevant components. This process involved evaluating the spatial correspondence between the IC maps and the activation maps obtained from the prior regional activity analysis (i.e., SPM{T} maps for the contrasts of film observation, emotional facial responses, or emotional subjective responses, thresholded at uncorrected p < 0.001). Specifically, we calculated the spatial correlation coefficient between each IC map and the SPM-derived template map to assess their degree of overlap. To identify components potentially related to, but not identical with, the task-related activations identified by SPM, we applied a spatial correlation threshold of 0.3. Although this threshold may appear modest, it aligns with multiple prior studies that have employed similar spatial correlation criteria (r = 0.26–0.3) for ICA spatial sorting202,203,204,205. These precedents demonstrate that a moderate threshold effectively balances sensitivity to partially overlapping neural networks with the exclusion of noise-driven or redundant components. In our analysis pipeline, spatial sorting was applied as a complementary step following rigorous temporal sorting with FDR correction. This two-stage approach ensured robust and reliable identification of task-related ICA components extending beyond conventional SPM-based findings.
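The two sorting steps can be sketched in Python as follows. The function names are hypothetical; the Benjamini-Hochberg procedure is assumed as the FDR method (the specific FDR variant is not stated in the text), and the one-tailed p-values are derived from the two-tailed t test in the conventional way.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

def first_level_beta(ic_timecourse, design):
    """Regress one IC's back-reconstructed time course onto the task
    design matrix (film, facial, subjective regressors), returning
    one beta per regressor ('Temporal Sorting')."""
    X = np.column_stack([design, np.ones(len(design))])  # + intercept
    beta, *_ = np.linalg.lstsq(X, ic_timecourse, rcond=None)
    return beta[:design.shape[1]]

def second_level_fdr(betas):
    """betas: (n_subjects, n_ics) beta weights for one task regressor.
    One-tailed one-sample t tests across participants, FDR-corrected
    across the 30 ICs ('Stats on Beta Weights')."""
    t, p_two = stats.ttest_1samp(betas, 0.0, axis=0)
    p_one = np.where(t > 0, p_two / 2.0, 1.0 - p_two / 2.0)
    return multipletests(p_one, alpha=0.05, method="fdr_bh")[0]

def spatial_sort(ic_map, spm_t_map, r_threshold=0.3):
    """Spatial sorting: correlate an IC z-map with the thresholded
    SPM{T} template map and flag components with r >= 0.3."""
    r = np.corrcoef(ic_map.ravel(), spm_t_map.ravel())[0, 1]
    return r, r >= r_threshold
```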

To investigate the causal relationships among the ICs, DCM85 was applied using DCM12. This approach enables the evaluation of functional network patterns, including large-scale brain networks comprising multiple regions86,87,88. We investigated the intrinsic connectivity among the ICs associated with film observation, which were presumed to reflect sensory processing (early visual, higher-order visual, and auditory networks), facial expressions, and subjective emotional responses. For each participant, we extracted the time-series data corresponding to the ICs of interest and used them as inputs for DCM. Subsequently, individual hypothetical models were constructed, incorporating bidirectional (i.e., forward and backward) intrinsic connections (Fig. 5). Based on psychological and anatomical evidence, bidirectional connectivity was assumed between facial and subjective emotional response networks, as well as among sensory networks. Nine DCM models were constructed to test hypotheses about whether sensory networks interact with facial and subjective emotional response networks, and how sensory networks converge or process information. A detailed description of the network model is provided in the Results section.

We used the stochastic DCM approach, which enables modeling of both time-varying endogenous fluctuations and sensory perturbations elicited by continuous film viewing90,91,92. Model selection was conducted using a random-effects Bayesian model selection framework97, and exceedance probabilities were used as the evaluation metric, based on the premise that one model was more likely than the others to best explain the group-level data206,207. We then compared the families98 in the same way.
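For illustration, the sketch below implements the core of random-effects Bayesian model selection in the variational form described by ref. 97, estimating a Dirichlet posterior over model frequencies from a subjects × models log-evidence matrix and deriving expected and exceedance probabilities by Monte Carlo sampling; in practice this step is performed by SPM's BMS routines. Family-level comparison can be approximated analogously by summing the sampled frequencies within each family before taking the argmax.

```python
import numpy as np
from scipy.special import digamma

def random_effects_bms(log_evidence, n_samples=100_000, seed=0):
    """Variational random-effects BMS: log_evidence is an
    (n_subjects x n_models) array of log model evidences. Returns
    expected and exceedance probabilities over models."""
    n_sub, n_mod = log_evidence.shape
    alpha0 = np.ones(n_mod)                    # flat Dirichlet prior
    alpha = alpha0.copy()
    for _ in range(200):                       # variational updates
        log_u = log_evidence + digamma(alpha) - digamma(alpha.sum())
        u = np.exp(log_u - log_u.max(axis=1, keepdims=True))
        u /= u.sum(axis=1, keepdims=True)      # per-subject assignments
        alpha_new = alpha0 + u.sum(axis=0)
        if np.abs(alpha_new - alpha).max() < 1e-8:
            alpha = alpha_new
            break
        alpha = alpha_new
    expected = alpha / alpha.sum()             # expected probabilities
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, size=n_samples)   # sampled frequencies
    exceedance = np.bincount(r.argmax(axis=1),
                             minlength=n_mod) / n_samples
    return expected, exceedance
```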

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.