Introduction

The capacity of human conscious perception is limited1, and this has driven psychologists to explore the possibility of unconscious processing in the sensory system. However, the effects of unconscious processing have been inconsistent and weak, leading to a never-ending debate about the existence of unconscious perception/cognition2. In recent years, interocular suppression, specifically continuous flash suppression (CFS), has emerged as a promising tool to examine the existence of unconscious processing. CFS allows researchers to investigate whether visual stimuli that are suppressed and rendered invisible can still be processed unconsciously. Previous studies have reported various behavioral effects when the target stimulus presented in the suppressed eye was made invisible by a sequence of dynamic high-contrast suppressors presented in the other eye. Evidence suggests that CFS is capable of detecting differences between stimuli in gaining access to awareness3,4,5,6. Additionally, studies using suppressed stimuli as unconscious primes have shown effects on subsequent behavioral responses7,8, suggesting sub-perceptual-threshold processing in interocular suppression. In the present study, we sought to further investigate whether the neural underpinnings of such sub-perceptual-threshold processing can be reliably identified in the brain.

Among the vast variety of visual stimuli, the human face is arguably of utmost importance to the vision science community due to its ecological relevance. Brain regions dedicated to face processing have been consistently reported in both humans and non-human primates9,10. The CFS paradigm has been widely used to investigate whether various types of facial information can be accessed without awareness. For example, fearful faces have been shown to receive privileged processing compared to neutral and happy faces, as evidenced by shorter suppression durations11,12,13. This method has also been used to differentiate individuals with varying levels of depression14 and psychopathic traits15. Additionally, studies have examined the role of unconscious appraisal in the evaluation of faces along dimensions of social interaction such as dominance, trustworthiness, and attractiveness. For instance, Stewart et al.16 demonstrated that dominant and untrustworthy faces emerged into consciousness significantly more slowly than neutral faces, while attractive faces broke suppression and reached consciousness earlier17,18.

Although these behavioral effects are widely supported, the neural basis of visual signals under interocular suppression is still not fully understood, and it remains controversial whether interocularly suppressed faces generate reliable neural signals. For instance, a study by Fogelson et al.19 demonstrated that the middle occipital gyrus, lingual gyrus (LING), and middle occipital and lunate sulci could distinguish suppressed faces from tools, suggesting that unconscious face information could be processed in these regions. However, Fang and He20 reported that brain activation for invisible faces was almost eliminated in both the ventral and dorsal streams. Using magnetoencephalography (MEG), Kume et al.21 showed that the M170 component, generated by the fusiform face area (FFA), was attenuated in amplitude and delayed in latency under binocular rivalry compared to control conditions. Sterzer et al.22 demonstrated that activity patterns in the FFA and the parahippocampal place area differentiated faces and houses even when the stimuli were rendered invisible by interocular suppression. Notably, in their study, information about the invisible stimuli could only be retrieved through fine-scale multivariate analysis, not conventional univariate analysis.

Furthermore, the nature of the stimulus and the type of analysis used may also play a crucial role in detecting neural activation under interocular suppression. For example, when invisible faces were compared to invisible tools, activation in the FFA was barely detectable; however, activation emerged when emotional content was introduced by contrasting invisible fearful faces against invisible neutral faces, highlighting the importance of emotional information in activating the FFA20. Another study further showed that FFA activation was positively correlated with amygdala activation in the invisible condition23.

Together, these mixed results are inconclusive with regard to the depth of unconscious processing in the visual hierarchy. Specifically, the inconsistent brain imaging results have raised concerns about the nature of behavioral effects found under CFS. Some effects seemingly driven by suppressed faces could be due to lower-level visual features rather than face processing24,25,26. Therefore, robust neural findings are necessary to unequivocally demonstrate that interocularly suppressed faces are processed in depth and in a ā€œface-likeā€ manner.

The inconsistency of previous studies in identifying the neural correlates of unconscious face perception may be due to variations in analysis techniques and in the signal-to-noise ratios of stimuli. In this study, we therefore aimed to investigate the impact of these two factors on the retrieval of suppressed-face-driven signals in the brain. To achieve this, we made three major improvements to the study design. First, we employed a novel variant of interocular suppression known as dis-continuous flash suppression (dCFS), which allows visual information to be presented for a longer duration by repeatedly turning the suppressor and the suppressed target on and off in alternation. Previous research has suggested that dCFS enhances the probability of retrieving unconscious signals24,25. Second, we used both univariate and multivariate analyses to determine whether voxel-level analysis is essential for revealing the neural underpinnings of subliminal facial information. We hypothesized that multivariate analysis would be more effective in identifying face-driven activations in both lower and higher visual regions (e.g., the primary visual cortex, the FFA, and the occipital face area (OFA)). Finally, we utilized dynamic video clips alongside static images to increase the signal-to-noise ratio of the stimuli, as previous studies have shown that dynamic stimuli elicit stronger and more widespread brain responses26,27. We expected these manipulations to enhance the detectability of unconscious face signals.

With these improvements, our study provides evidence that unconscious facial information can be reliably extracted under visual suppression, bridging the gap between existing brain imaging and behavioral findings. Our results suggest that using a novel visual suppression technique, sophisticated analysis, and dynamic stimuli can strengthen the signal-to-noise ratio, allowing for the observation of robust unconscious signals. These findings call for a paradigm shift in studies aimed at detecting weak sensory signals.

Results

Behavior results

Each trial in both the conscious and unconscious conditions involved two tasks: a detection task and a localization task. Both tasks were used to examine whether the suppressed targets were perceived in the unconscious condition and to maintain participants’ engagement in the conscious condition.

Given the task instructions, we reasoned that if the suppressed targets did not break into participants’ awareness (i.e., were unseen), performance on the localization task (measured by accuracy rate) for those trials should be close to chance level. Conversely, if the accuracy rates for reported unseen trials were above chance, it is likely that those nominally unseen trials had in fact reached participants’ awareness.

With this rationale, we first selected trials labeled as ā€œseenā€ in the conscious condition and trials labeled as ā€œunseenā€ in the unconscious condition, based on the results of the detection task. We then compared the accuracy rates of the localization task between the conscious and unconscious conditions for these trials. The accuracy rate for the conscious condition was extremely high (M = 97.91, SD = 3.27, SE = 0.50, t(42) = 96.17, p < 0.001, Cohen’s d = 14.65), whereas that for the unconscious condition did not differ from chance level (M = 52.35, SD = 8.35, SE = 1.27, t(42) = 1.84, p = 0.07, Cohen’s d = 0.28), supporting the validity of participants’ behavioral reports (Fig.Ā 1a). In other words, participants performed the localization task almost perfectly when reporting ā€œseenā€ stimuli, whereas their performance was at chance when reporting ā€œunseenā€ stimuli.
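For illustration, the chance-level check described above amounts to a one-sample t-test of localization accuracies against the 50% guessing rate of the two-alternative task. The sketch below shows this test in Python with simulated placeholder data (not the actual participant data):

```python
import numpy as np
from scipy import stats

# Simulated placeholder accuracies (%) standing in for the 43 participants'
# localization performance on reported-unseen trials.
rng = np.random.default_rng(0)
loc_acc = rng.normal(loc=52, scale=8, size=43)

# One-sample t-test against the 50% chance level of the left/right
# localization task; accuracy at chance is taken as evidence that the
# suppressed targets stayed outside awareness.
result = stats.ttest_1samp(loc_acc, popmean=50)
cohens_d = (loc_acc.mean() - 50) / loc_acc.std(ddof=1)  # one-sample Cohen's d
print(f"t({len(loc_acc) - 1}) = {result.statistic:.2f}, "
      f"p = {result.pvalue:.3f}, d = {cohens_d:.2f}")
```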

Fig. 1: The performance of behavior task in both conscious and unconscious conditions.

a The accuracy rate in the conscious condition was significantly higher than chance level, whereas that in the unconscious condition did not differ from chance. b The unseen rate in the conscious condition was relatively low, while that in the unconscious condition was slightly higher than 75%. In both graphs, error bars denote standard error, and the horizontal lines mark 50% and 75%, respectively.

Additionally, we examined the validity of the three-up-one-down staircase procedure by calculating the unseen rates for the conscious and unconscious conditions independently. As expected, the unseen rate for the conscious condition was low (M = 1.72%, SD = 3.99), indicating that participants remained engaged. In contrast, the unseen rate for the unconscious condition was slightly higher than 75% (M = 80.49%, SD = 14.94, t(42) = 2.41, p = 0.04, Cohen’s d = 0.37) (Fig.Ā 1b).

Lastly, we calculated the mean Z-score of each participant to check for outliers in the present cohort (Fig.Ā S3). In the follow-up imaging analyses, only seen trials in the conscious condition and unseen trials in the unconscious condition were included.

Univariate results

In both the conscious and the unconscious conditions, three contrasts were created: static faces vs. static scenes, dynamic faces vs. dynamic scenes, and combined (i.e., static and dynamic) faces vs. combined scenes.

Our second-level analysis showed that in the conscious condition, static faces yielded stronger activation than static scenes in several regions, including the left inferior occipital gyrus (IOG), left fusiform gyrus (FG), right inferior temporal gyrus (ITG), and the triangular part of the right inferior frontal gyrus (IFGtr) (Fig.Ā 2a, black text). Additionally, dynamic faces showed greater activations than dynamic scenes in the left IOG, right middle temporal gyrus (MTG), and bilateral FG (Fig.Ā 2b, black text). Collapsing across the static and dynamic conditions, combined faces elicited greater activations than combined scenes in regions including the left IOG, the opercular part of the left inferior frontal gyrus (IFGop), the right IFGtr, the right hippocampus (HP), and bilateral FG (Fig.Ā 2c, black text).

Fig. 2: Results of three univariate contrasts in the conscious condition.

a The contrast of static faces > static scenes demonstrated significant effects in several regions, including the left IOG, left FG, right ITG, and right IFGtr. b The contrast of dynamic faces > dynamic scenes demonstrated significant effects in the left IOG, right MTG, and bilateral FG. c The contrast of combined faces > combined scenes found significant effects in the left IOG, left IFGop, right IFGtr, right HP, and bilateral FG. In addition to the regions with the strongest univariate activations (black text), adjacent regions with more than 30 voxels are also labeled (gray text).

In contrast to the conscious condition, there was no difference in univariate activation between faces and scenes in the unconscious condition for any of the static, dynamic, or combined comparisons. TableĀ S3 provides further details on the main clusters and subregions observed in the univariate analysis.

Whole-brain multivariate results

Three sets of binary decoding were performed in the conscious and unconscious conditions using a supervised linear SVM with a searchlight method. In the conscious condition, the left postcentral gyrus (PoCG), left anterior cingulate cortex (ACC), right calcarine cortex (CAL), right superior temporal gyrus (STG), right supplementary motor area (SMA), and the dorsal part of the right superior frontal gyrus (SFGdor) could distinguish static faces from static scenes. Moreover, the left superior occipital gyrus (SOG), left IOG, left SMA, and right IFGtr could distinguish dynamic faces from dynamic scenes. When static and dynamic trials were combined, the right lingual gyrus (LING) and right STG could differentiate faces from scenes (see Fig.Ā 3, left column, and TableĀ S4, panels in blue).

Fig. 3: Results from the whole brain decoding in the conscious and unconscious conditions.

Results from the whole-brain decoding in the conscious (left) and unconscious (right) conditions. In the conscious condition, regions including the left PoCG, left ACC, right CAL, right STG, right SMA, and right SFGdor could distinguish static faces from static scenes; regions including the left SOG, left IOG, left SMA, and right IFGtr could differentiate dynamic faces from dynamic scenes; and when static and dynamic trials were collapsed, the right LING and right STG could distinguish faces from scenes. In the unconscious condition, the left ACC, left SMA, right CAL, right SFGdor, and right MTG could distinguish static faces from static scenes; regions including the left PCL, left STG, left ACC, right LING, right SFGdor, right STG, right TPsup, and right SPG could differentiate dynamic faces from dynamic scenes while both were suppressed and invisible; and when static and dynamic trials were collapsed, the left LING could distinguish invisible faces from invisible scenes. Color bars indicate t values across all figures.

In the unconscious condition, we first decoded static faces versus static scenes and dynamic faces versus dynamic scenes when both stimuli were suppressed and invisible. The results revealed that the left ACC, left SMA, right CAL, right SFGdor, and right MTG could distinguish invisible static faces from invisible static scenes. Moreover, the left paracentral lobule (PCL), left STG, left ACC, right LING, right SFGdor, right STG, the superior part of the right temporal pole (TPsup), and the right superior parietal gyrus (SPG) could differentiate invisible dynamic faces from invisible dynamic scenes. Lastly, when static and dynamic trials were collapsed and combined faces were decoded against combined scenes, the left LING could distinguish invisible faces from invisible scenes (see Fig.Ā 3, right column, and TableĀ S4, panels in green).

ROI multivariate results

Utilizing a more refined approach for identifying individual ROIs, we investigated whether face-related areas, such as the FFA and OFA, could differentiate unconscious faces and scenes under static and dynamic conditions. The bilateral coordinates of the OFA and FFA for each set of stimuli (static, dynamic, and combined) are presented separately for each individual in TableĀ S5 and TableĀ S6 (Supplementary Material). Our findings show that the bilateral OFA and FFA could discriminate dynamic faces from dynamic scenes (lOFA: t(42) = 4.07, corrected p < 0.001 (all p values FDR-corrected), Cohen’s d = 0.62; rOFA: t(42) = 3.12, corrected p = 0.007, Cohen’s d = 0.48; lFFA: t(42) = 2.26, corrected p = 0.034, Cohen’s d = 0.35; rFFA: t(42) = 2.01, corrected p = 0.039, Cohen’s d = 0.31). However, only the right OFA (t(42) = 2.17, corrected p = 0.034, Cohen’s d = 0.33) could differentiate static faces from static scenes (Fig.Ā 4a). Moreover, when static and dynamic trials were collapsed, the right OFA could distinguish invisible faces from invisible scenes (rOFA: t(42) = 2.60, corrected p = 0.027, Cohen’s d = 0.40) (Fig.Ā 4b). Individual results are provided in Fig.Ā S4.

Fig. 4: Results from the ROIs decoding.

a In the unconscious condition, all predetermined ROIs could differentiate dynamic faces from dynamic scenes, yet only the right OFA could distinguish static faces from static scenes. b When static and dynamic stimuli were collapsed, the right OFA could differentiate invisible faces from invisible scenes. Error bars denote ±SEM. Asterisks: *p < 0.05; **p < 0.01; ***p < 0.001; p values were corrected for multiple comparisons using FDR adjustment.

Discussion

Consistent with prior reports20,22,28,29, we found that unconscious facial signals were detected only when detailed, voxel-level information was preserved in the analysis. Specifically, our univariate analysis showed no differential activations when contrasting invisible faces against invisible scenes. In contrast, our whole-brain multivariate pattern analysis (MVPA; linear support vector machine decoding) revealed distinctive activation patterns between invisible faces and invisible scenes in the left lingual region and surrounding areas, including the FFA, regardless of whether the stimuli were static or dynamic. We found that using dynamic face stimuli was key to observing these effects. This was further supported by our follow-up region-of-interest (ROI) analysis, which demonstrated that the bilateral occipital face area (OFA) and FFA played a crucial role in unconscious face perception: while the stimuli remained invisible, the bilateral FFA and OFA differentiated dynamic faces from dynamic scenes, whereas only the right OFA could distinguish static faces from static scenes.

Our findings are consistent with previous studies of visual perception involving dynamic stimuli. For example, dynamic faces have been shown to strengthen activity in face-selective ROIs compared to static faces30,31,32. Furthermore, the signal changes inherent to dynamic facial stimuli activate a larger pool of neurons33. Additionally, TMS studies have found that disruption of the OFA reduced responses to static faces but not to dynamic faces, suggesting the enhanced power of dynamic stimuli in face perception34.

In addition to the well-established face-related areas, our whole-brain MVPA revealed significant involvement of the LING as a critical component of the cortical face network for both stimulus types35,36,37,38. Furthermore, with static stimuli, our results also support Pourtois et al.’s39 finding that the left MTG is a convergence area during face perception40. Moreover, our results revealed that frontal areas could differentiate between faces and scenes when both were suppressed under dCFS. These results also align with prior reports and may have benefited from the experimental paradigm employed in our study. For instance, using a chromatic flicker fusion paradigm, Fogelson et al.19 demonstrated that categorical information was not confined to early visual areas but extended to regions such as the temporal cortex and the superior part of the precentral sulcus of the frontal lobe.

In fact, there is compelling evidence that the frontal lobe plays a crucial role in handling unconscious information across a spectrum of cognitive functions, including visual discrimination, cognitive control, memory formation, and language processing.

For instance, using a visual discrimination task, Mei et al.41 showed that patterns of neural activity associated with unconscious stimuli could be decoded not only in the ventral visual pathway but also within parieto-frontal brain regions. Furthermore, using computer vision models, their results showed that the neural patterns of conscious items could correctly predict their unconscious counterparts, suggesting that the neural representations of the two types of stimuli are similar41.

Additionally, by scrutinizing the congruency effect, an unconscious priming effect has been attributed to activity within the mid-dorsolateral prefrontal cortex (mid-DLPFC)42. Although the underlying mechanism remains unclear, these results are consistent with the idea that unconscious information can influence higher cognitive areas (i.e., the mid-DLPFC).

Thirdly, it has been demonstrated that the dorsolateral and anterior prefrontal cortex can process unconscious information in a way that goes beyond automatic forms of sensorimotor priming, supporting implicit working memory and higher-level cognitive function43. In the same vein, Bergstrƶm & Eriksson44 demonstrated that patterns of neural activity in the frontal cortex could differentiate between the presence and absence of sample stimuli. Their results further showed that non-consciously retained information requires persistent neural activity in the frontal and occipital cortex and may involve additional cognitive control mechanisms during memory recognition.

Lastly, using a novel experimental paradigm, Axelrod et al.45 demonstrated that meaningful sentences rendered invisible via CFS could still be discerned in regions including the left posterior superior temporal sulcus and the left middle frontal gyrus, underscoring the frontal cortex’s involvement in processing unconscious semantic content.

In sum, empirical studies show that unconscious information can be processed across a network involving the frontal lobes and other brain areas, and that the neural representations of conscious and unconscious stimuli overlap.

These findings of unconscious activations in frontal regions may seem to challenge consciousness theories that assign frontal regions a key role in generating consciousness, such as global workspace theory46,47 and higher-order thought theory48. However, whether the activity underlying unconscious face processing differs qualitatively or quantitatively from that underlying conscious face processing remains largely unknown. We believe our approach provides a new paradigm that allows future research to directly examine and disambiguate the role of frontal regions in unconscious and conscious processing.

However, contrary to expectation, the activations in the unconscious condition, especially for dynamic stimuli, were broader than those in the conscious condition. These results might stem from the Mondrian suppressors presented in the unconscious condition. More specifically, in the unconscious condition, in addition to the visual target presented to participants’ non-dominant eye, a series of flashing Mondrians was presented to participants’ dominant eye. This suppression procedure might influence the unconscious processing of the targets. It is possible that there is an unknown interaction between a class of suppressed stimuli (e.g., faces) and the Mondrian patterns. For example, similar features in the Mondrian patterns and the suppressed stimulus could lead to stronger feature-based suppression, with an unknown effect on decoding accuracy. To mitigate these unknown influences, planned ROI analyses were conducted.

Generally, the aforementioned results are in accordance with prior reports49.

To ensure that participants were indeed unconscious of the suppressed stimulus, we showed that their performance in stimulus localization was at chance level (Fig.Ā 1a). However, we acknowledge that, given its binary nature, this was not a perfect indicator of their conscious content41. Moreover, the fact that we found decodable unconscious information in the face regions does not lend unequivocal support to ā€œface-likeā€ unconscious processing. For example, conscious and unconscious face processing could differ in nature, as a recent study pointed to the fragmented processing of unconscious shapes50. Future research is needed to address these intriguing questions.

Based on our findings, we speculate that the observed differential effect may be at least partially related to the processing of social information carried by human faces and eye gaze. Recent research has highlighted the significant role of social cues, such as eye gaze, in shaping perceptual processing unconsciously. More specifically, it has been demonstrated that faces with direct gaze can undergo effective processing even without conscious awareness4,51. This finding suggests preferential access to conscious awareness for faces with direct gaze, emphasizing the significant impact of social cues on the modulation of perceptual processing. Furthermore, previous imaging studies have consistently revealed the involvement of distinct brain regions, including the FG, right superior temporal sulcus (STS), and medial prefrontal cortex, during the processing of faces with eye gaze52,53,54,55. Cumulatively, these previous findings support the proposition that the observed activation of temporo-frontal regions in our study likely reflects the processing of social cues, consistent with the ecological relevance of face perception. The integration of additional social information conveyed through facial expressions may facilitate face processing even in the absence of conscious awareness, highlighting the profound impact of social cognition on perceptual mechanisms.

Overall, the current study demonstrates the strength of incorporating dynamic stimuli and preserving voxel-level information when searching for unconscious neural signals in the brain. These results bridge previous rich behavioral findings and sparse neural findings and suggest that extracting reliable unconscious signals requires an overarching consideration encompassing stimulus, experimental paradigm, and analysis pipeline. We believe the current study has important implications for future studies investigating the unconscious processing of sensory information.

Methods

Participants

Forty-three healthy right-handed volunteers (18 male; Edinburgh Handedness Inventory56), aged 20 to 35 years (mean age = 22.40; SD = 3.00), were recruited from the local community. All participants had normal or corrected-to-normal vision and reported no history of neurological or psychological disorders. They were reimbursed approximately USD 60 for participating in two 60-min sessions. This study was performed in accordance with the ethical standards of the Declaration of Helsinki and approved by the Institutional Review Board of the National Taiwan University. All ethical regulations relevant to human research participants were followed. To determine the appropriate sample size, a pilot study was conducted with 10 participants using the same analysis. With a significance criterion of α = 0.05 and a power of 0.90, the minimum sample size required to detect static face signals was N = 113, while the minimum sample size needed to detect dynamic face signals was N = 43. Based on these calculations, the sample size for the current study was set at N = 43.
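For reference, such an a-priori sample-size calculation can be reproduced with a standard power routine; the sketch below uses statsmodels with placeholder effect sizes (the pilot effect-size estimates are not reported in the text, so the values here are assumptions chosen to illustrate the calculation):

```python
from statsmodels.stats.power import TTestPower

# Placeholder effect sizes (Cohen's d); the actual pilot estimates are not
# reported. d ~ 0.31 and d ~ 0.51 roughly reproduce the stated minimum
# sample sizes of N = 113 and N = 43.
power_analysis = TTestPower()  # power for a one-sample t-test
for label, d in [("static face signal", 0.31), ("dynamic face signal", 0.51)]:
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.90,
                                   alternative="two-sided")
    print(f"{label}: minimum N ā‰ˆ {n:.0f}")
```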

Stimuli

The present study employed both dynamic and static stimuli to investigate the perception of facial expressions. The dynamic stimuli were derived from a previous study by Pitcher et al.34 and comprised 3-s video clips of faces and scenes. Face videos were films of seven children dancing and playing with toys and adults on a black background. Scene videos were captured in various locations, including pastoral landscapes filmed from a moving vehicle in leafy suburbs, as well as aerial footage of flying through canyons and footage of walking through tunnels (Fig.Ā 5a). The static stimuli were created by extracting five frames at intervals of 0.6 s from each video clip, resulting in five static images per clip (Fig.Ā 5b).
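As an illustration of this extraction step, the following sketch samples five frames at 0.6-s intervals from a 3-s clip (it assumes OpenCV; the original extraction tool and the file names are not reported, so both are hypothetical):

```python
import cv2

def extract_static_frames(video_path, out_prefix, interval_s=0.6, n_frames=5):
    """Save n_frames still images sampled every interval_s seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    for i in range(n_frames):
        # Jump to the i-th sampling point (0.0 s, 0.6 s, 1.2 s, ...).
        cap.set(cv2.CAP_PROP_POS_FRAMES, round(i * interval_s * fps))
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(f"{out_prefix}_frame{i + 1}.png", frame)
    cap.release()

extract_static_frames("face_clip01.mp4", "face_clip01")  # hypothetical file
```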

Fig. 5: An illustration of part of experimental stimuli, design, and trial procedure.

a Scene video clips featured landscapes in different locations; presented as a dynamic sequence, each video lasted 3 s. b From each video clip, static stimuli were generated at intervals of 0.6 s. c Participants completed a two-session scan, each session containing eight runs. Four of the eight runs were unconscious, labeled in gray (i.e., R2, R3, R6, and R7), while the remaining four were conscious (i.e., R1, R4, R5, and R8). Also shown is the trial procedure for the unconscious condition: for an SF trial, each trial consisted of five on-off cycles in which the Mondrian suppressors presented to the dominant eye and the visual targets presented to the non-dominant eye were turned on and off simultaneously. On each trial, participants performed a detection task during the on-off cycles, followed by a localization task at the end.

Procedure and design

Prior to the scanning, the dominant eye was identified using the hole-in-the-card test57. During the fMRI sessions, participants viewed the binocular stimuli under dCFS25 through a goggle system with 4.7’ × 2.4’ × 1.3’ dimensions (Video Goggle/Resonance Technology, Inc.). Stimuli were presented against a black background with a resolution of 900 by 600 pixels and a refresh rate of 60 Hz.

The experiment consisted of two 60-min sessions, each comprising eight runs split between two conditions: conscious (CON) and unconscious (UNCON) (see Fig.Ā 5c). Each run consisted of four block types of stimuli: static faces (SF), static scenes (SS), dynamic faces (DF), and dynamic scenes (DS). Each block contained 11 trials, each lasting 6 s, and blocks were interleaved with blank periods of 8 s, 10 s, or 12 s.

Following the completion of half of the runs in the first session, a T1-weighted scan was acquired. The two 60-min sessions were scheduled at least 1 h apart to minimize fatigue. To minimize potential order effects, the presentation order of conditions and stimulus types was counterbalanced within each session.

At the onset of each experimental trial, a blank screen was presented for 2 s. Thereafter, a series of flashing colored Mondrian suppressors was presented to the dominant eye while a target stimulus (SF, SS, DF, or DS) was presented to the non-dominant eye for 400 ms, constituting an ā€œonā€ period. The binocular presentation was then terminated and a blank screen was displayed for 400 ms, constituting an ā€œoffā€ period.

As noted above, a set of five static stimuli was generated from each video clip. In static trials, each of these images was assigned to a separate ā€œonā€ period, yielding five ā€œonā€ periods per trial. In dynamic trials, each video clip was segmented into 400-ms segments, and each segment was assigned to a separate ā€œonā€ period, again yielding five ā€œonā€ periods per trial. This approach allowed visual processing and awareness to be compared across static and dynamic stimuli.
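To make the trial structure concrete, the sketch below builds the event timeline of one trial under the description above: five 400-ms ā€œonā€ periods interleaved with 400-ms ā€œoffā€ blanks, with one static image or one 400-ms video segment per ā€œonā€ period (illustrative code only; the original presentation software is not specified):

```python
def build_trial_schedule(trial_type, n_cycles=5, on_ms=400, off_ms=400):
    """Return (start_ms, end_ms, phase, content) tuples for one dCFS trial."""
    events, t = [], 0
    for cycle in range(n_cycles):
        if trial_type.startswith("static"):
            content = f"static image {cycle + 1} of 5"   # one image per "on" period
        else:
            content = f"video segment {cycle + 1} of 5"  # one 400-ms segment per "on" period
        # "On": Mondrians to the dominant eye, target to the non-dominant eye.
        events.append((t, t + on_ms, "on", content))
        t += on_ms
        # "Off": blank screen for both eyes.
        events.append((t, t + off_ms, "off", "blank"))
        t += off_ms
    return events

for start, end, phase, content in build_trial_schedule("dynamic face"):
    print(f"{start:4d}-{end:4d} ms  {phase:3s}  {content}")
```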

While scanning, participants, lying supine, held two response pads (Lumina, LS-PAIR), one in each hand, each equipped with two buttons. Participants were required to press the right button on the left pad if any part of the suppressed target became visible during any 400-ms-on/400-ms-off cycle (the detection task). If a breakthrough was reported, participants then reported the location of the target by pressing either the left or right button on the right pad (the localization task). If no breakthrough was reported, participants were encouraged to make their best guess; if they remained unable to decide, they were instructed to press the left or right key at random. Notably, there were no questions related to facial expressions. During data analysis, ā€œconscious trialsā€ were defined as trials in which the target was detected successfully in the CON condition, while ā€œunconscious trialsā€ were trials in which the suppressed target did not break through to conscious awareness in the UNCON condition.

On each trial, the contrast of the colored Mondrian suppressors remained constant at 100%, whereas the contrast of the target increased linearly from 0% to a designated value, determined by a trial-by-trial thresholding procedure, over the course of the trial. More specifically, the thresholding procedure used a three-up-one-down staircase, whereby the contrast of the target decreased if the suppressed target was detected and increased if it was not. The step size was 5% of full contrast. The trial procedure is illustrated in Fig.Ā 5c. In addition, to facilitate binocular fusion, a two-layer white frame (subtending 7° 9’ 0.16ā€ā€‰Ć—ā€‰7° 9’ 0.16ā€) remained on the screen throughout the experiment.
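The sketch below implements one reading of this three-up-one-down rule (our assumption: contrast drops 5% after each detection and rises 5% only after three consecutive non-detections; the starting contrast is also an assumption). Under this rule the staircase converges on a non-detection rate of roughly 79%, consistent with the ~80% unseen rate reported in the Results:

```python
class Staircase:
    """Three-up-one-down staircase on target contrast (% of full contrast)."""

    def __init__(self, start=50.0, step=5.0, lo=0.0, hi=100.0):
        self.contrast = start      # starting value is an assumption
        self.step = step           # 5% of full contrast, as in the text
        self.lo, self.hi = lo, hi
        self.miss_streak = 0       # consecutive non-detections

    def update(self, detected):
        if detected:
            # "One-down": any detection lowers the target contrast.
            self.miss_streak = 0
            self.contrast = max(self.lo, self.contrast - self.step)
        else:
            # "Three-up": three consecutive non-detections raise it.
            self.miss_streak += 1
            if self.miss_streak == 3:
                self.miss_streak = 0
                self.contrast = min(self.hi, self.contrast + self.step)
        return self.contrast

stair = Staircase()
for detected in [False, False, False, True, False]:  # example responses
    print(f"detected={detected}: next contrast = {stair.update(detected):.0f}%")
```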

Image acquisition and preprocessing

MRI scanning was performed on a 3-Tesla Siemens Prisma scanner at the Imaging Center for Integrative Body, Mind and Culture Research at the National Taiwan University. Whole-brain functional T2*-weighted echo-planar images (EPI) were collected with a blood-oxygenation-level-dependent (BOLD) sequence (TR/TE = 2000/32 ms, FOV = 256 mm, matrix = 74 × 74, slice thickness = 3.4 mm); the in-plane resolution was 3.4 × 3.4 mm. A T1-weighted magnetization-prepared rapid-acquisition gradient echo (MPRAGE; TR/TE = 2000/2.28 ms, FOV = 256 mm, matrix = 256 × 256, slice thickness = 1 mm) sequence was used to collect a high-resolution anatomical image of each participant’s brain. Thirty-two slices were collected with a 20-channel head coil, oriented roughly parallel to the AC-PC line with whole-brain coverage.

Preprocessing for each experimental run was conducted using SPM 12. First, the first volume of each run was aligned to the first volume of the first run for each participant (realignment); each image in a run was then registered to the first volume of that run. Next, the realigned and registered images were normalized to MNI space (ICBM 152 Nonlinear Asymmetrical template, version 2009c58). Finally, the resulting images were smoothed with a Gaussian kernel (8 mm FWHM) for the univariate analyses but not for the multivariate analyses. Notably, the region-of-interest (ROI) SVM analysis was performed in native space without normalization or smoothing.
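For orientation, these steps can be scripted through nipype's SPM interfaces; the sketch below is schematic (it assumes a working MATLAB/SPM 12 installation, and the file names and any parameters not stated in the text are assumptions, not the authors' actual batch):

```python
from nipype.interfaces import spm

# 1) Realign every volume in each run to the first volume of the first run.
realign = spm.Realign(in_files=["run1.nii", "run2.nii"],  # hypothetical files
                      register_to_mean=False)

# 2) Normalize the realigned images to MNI space (SPM ships the ICBM template).
normalize = spm.Normalize12(image_to_align="meanrun1.nii",
                            apply_to_files=["rrun1.nii", "rrun2.nii"],
                            jobtype="estwrite")

# 3) Smooth with an 8-mm FWHM Gaussian kernel -- for univariate analyses only;
#    the multivariate analyses use the unsmoothed images.
smooth = spm.Smooth(in_files=["wrrun1.nii", "wrrun2.nii"], fwhm=[8, 8, 8])

for step in (realign, normalize, smooth):
    step.run()  # each step writes its outputs alongside the inputs
```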

Univariate analyses

First, brain activity (BOLD signal change) associated with each stimulus type in the conscious and unconscious conditions was analyzed independently using a mass-univariate approach59.

For the univariate analyses, data across the two scanning sessions were concatenated, and GLM models (GLM model 1) were fitted separately for the conscious and unconscious conditions. Each GLM included four regressors, one for each stimulus type: SF (static face), SS (static scene), DF (dynamic face), and DS (dynamic scene). For each participant, the regressors were constructed as boxcar functions representing the onset and duration of the relevant stimulus type, convolved with the canonical hemodynamic response function. Nuisance regressors included the white matter signal and the six motion parameters obtained from motion correction. To remove low-frequency drift, a high-pass filter with a cut-off of 128 s was employed.

As part of our primary goal, we examined three types of face effects: static, dynamic, and combined. Group-level inferences for each face effect were made by entering the appropriate contrast into a t-test, using the following three contrasts: (1) static faces vs. static scenes; (2) dynamic faces vs. dynamic scenes; and (3) combined faces vs. combined scenes. T-statistic images were thresholded at p < 0.001 (uncorrected) with a cluster extent threshold of more than 20 voxels.
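As a rough analogue of GLM model 1 (the original analysis was run in SPM 12), the sketch below expresses the same design in nilearn; the events table, confounds file, and image names are hypothetical:

```python
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

events = pd.DataFrame({                      # one row per trial, e.g.:
    "onset":      [2.0, 16.0, 30.0],         # seconds (hypothetical onsets)
    "duration":   [6.0, 6.0, 6.0],           # 6-s trials
    "trial_type": ["SF", "SS", "DF"],        # SF / SS / DF / DS stimulus types
})
confounds = pd.read_csv("confounds.csv")     # 6 motion parameters + WM signal

glm = FirstLevelModel(t_r=2.0,               # TR = 2000 ms
                      hrf_model="spm",       # canonical HRF
                      high_pass=1.0 / 128,   # 128-s high-pass cutoff
                      smoothing_fwhm=8)      # univariate analyses only
glm = glm.fit("concatenated_bold.nii.gz", events=events, confounds=confounds)

# The three face contrasts; combined = (SF + DF) - (SS + DS).
static_map = glm.compute_contrast("SF - SS")
dynamic_map = glm.compute_contrast("DF - DS")
combined_map = glm.compute_contrast("SF + DF - SS - DS")
```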

Multivariate analysis

In contrast to the univariate analyses, the GLM models for the multivariate analyses (GLM model 2) were not only fitted separately for the conscious and unconscious conditions but also fitted to each experimental run across the two scanning sessions. Each GLM included five regressors, one for each stimulus type and one for blank. Six contrasts were then created (static faces against blank, static scenes against blank, dynamic faces against blank, dynamic scenes against blank, combined faces against blank, and combined scenes against blank). Whole-brain binary decoding was then performed on the voxel-wise data obtained from the GLM results. Notably, GLM model 2 was fitted to the realigned, co-registered, and normalized functional images.

The binary decoding analyses, using a supervised support vector machine (SVM), were then conducted independently for each participant, with n-fold cross-validation performed at the run level. The supervised SVM was employed in both the conscious and unconscious conditions, using the CoSMoMVPA package59. To identify brain regions that could distinguish suppressed faces from suppressed scenes in the unconscious condition, three sets of binary decoding were conducted: (1) static faces vs. static scenes, (2) dynamic faces vs. dynamic scenes, and (3) combined faces vs. combined scenes. Each set of binary decoding was processed as follows.

First, a sphere with a radius of 3 mm was defined and centered on each voxel, and the pattern of responses within each sphere was represented by a feature vector for each stimulus type. We then divided each participant’s runs into a training set (N āˆ’ 1 runs, where N denotes the total number of runs) and a testing set (the remaining run). From the aforementioned feature vectors, two feature matrices representing the spatial patterns of the training and testing data sets were derived.

To solve these two-class problems, a linear SVM model was constructed after normalizing the training data set. Repeating this process for all gray matter voxels (i.e., a searchlight analysis)60 under the n-fold principle (i.e., leave-one-run-out cross-validation) yielded a three-dimensional accuracy map as a measure of discriminability between conditions (static faces vs. static scenes, dynamic faces vs. dynamic scenes, and combined faces vs. combined scenes). To convert the accuracy map into a p-value map, each accuracy was tested against a binomial distribution under the null hypothesis of no difference between the two classes. Lastly, significant clusters were identified with FWE-corrected p < 0.05 and cluster sizes >30 voxels in both the conscious and unconscious conditions.
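The original searchlight was run in CoSMoMVPA; as a loose, illustrative analogue, the sketch below uses nilearn's SearchLight with a linear SVM and leave-one-run-out cross-validation, followed by the binomial conversion of accuracies to p values (all file names, label arrays, and run counts are hypothetical, and the FWE cluster correction is not shown):

```python
import numpy as np
from nilearn.decoding import SearchLight
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC
from scipy.stats import binom

# beta_imgs: one contrast-vs-blank image per stimulus type per run (4D file);
# labels: face/scene class per image; runs: run index for leave-one-run-out CV.
beta_imgs = "per_run_beta_maps.nii.gz"        # hypothetical 4D image
labels = np.array(["face", "scene"] * 16)     # hypothetical labels
runs = np.repeat(np.arange(16), 2)            # hypothetical run indices

searchlight = SearchLight(mask_img="gray_matter_mask.nii.gz",
                          radius=3.0,                # 3-mm spheres
                          estimator=LinearSVC(),     # linear SVM
                          cv=LeaveOneGroupOut(), n_jobs=-1)
searchlight.fit(beta_imgs, labels, groups=runs)

# Convert each sphere's mean CV accuracy into a p value under the binomial
# null of 50% accuracy, as described in the text.
n_samples = len(labels)
k = np.round(searchlight.scores_ * n_samples).astype(int)
p_map = binom.sf(k - 1, n_samples, 0.5)  # P(X >= k) under chance
```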

Moreover, to account for inter-individual variability in the localization of face-related regions such as the FFA and OFA, we identified the individual FG using T1 parcellation results. Following previous research, we defined the FFA and OFA as the portions of the FG adjacent to the posterior temporal gyrus and occipital gyrus, respectively. Spherical masks with a radius of 16 mm were used to create anatomical FFA and OFA regions, which were then overlaid onto each participant’s general linear model results in the native space. We derived three sets of individual functional ROIs by centering 10 mm spheres around the maximum activation within each participant’s FFA and OFA regions. We employed a linear SVM independently on each ROI and assessed the resulting classification performance using the same methods described previously. The analysis pipeline is presented in Fig.Ā 6. The distribution of classification accuracies was assessed with the Shapiro-Wilk test, and results were validated using one-tailed one-sample t-tests.
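A condensed, illustrative version of this ROI pipeline is sketched below (assuming precomputed per-participant sphere masks and per-run beta images, both hypothetical; the original analysis was performed in native space with CoSMoMVPA):

```python
import numpy as np
from nilearn.maskers import NiftiMasker
from sklearn.model_selection import cross_val_score, LeaveOneGroupOut
from sklearn.svm import LinearSVC
from scipy.stats import shapiro, ttest_1samp
from statsmodels.stats.multitest import multipletests

labels = np.array(["face", "scene"] * 16)  # hypothetical per-run beta labels
runs = np.repeat(np.arange(16), 2)         # run index, for leave-one-run-out CV

def roi_accuracy(beta_img, roi_mask):
    """Leave-one-run-out linear-SVM accuracy on one ROI's multivoxel pattern."""
    X = NiftiMasker(mask_img=roi_mask).fit_transform(beta_img)
    return cross_val_score(LinearSVC(), X, labels,
                           groups=runs, cv=LeaveOneGroupOut()).mean()

# One accuracy per participant per ROI (file names are hypothetical).
accuracies = {roi: np.array([roi_accuracy(f"sub{s:02d}_betas.nii.gz",
                                          f"sub{s:02d}_{roi}_sphere.nii.gz")
                             for s in range(1, 44)])
              for roi in ["lOFA", "rOFA", "lFFA", "rFFA"]}

# Group level: check normality, test accuracy against 50% chance
# (one-tailed), then FDR-correct across the four ROIs.
pvals = []
for roi, acc in accuracies.items():
    print(f"{roi}: Shapiro-Wilk p = {shapiro(acc).pvalue:.3f}")
    pvals.append(ttest_1samp(acc, 0.5, alternative="greater").pvalue)
reject, p_fdr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```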

Fig. 6: An illustration of the analysis pipeline.

An illustration of the analysis pipeline, which involves three GLM models (Model 1, Model 2, and Model 3) for the univariate, whole-brain decoding, and ROI decoding analyses, respectively. GLM Model 1 analyzed CON and UNCON separately, and three contrasts (static faces against static scenes, dynamic faces against dynamic scenes, and combined faces against combined scenes) were examined for each condition. GLM Model 2 compared each stimulus type (static faces, static scenes, dynamic faces, dynamic scenes, combined faces, and combined scenes) against baseline, resulting in six contrasts. In addition, to obtain functionally guided ROIs on an individual basis for the subsequent analysis, GLM Model 3, with the same contrasts as Model 1, was created in native space for the CON condition only.

Reporting summary

Further information on research design is available in theĀ Nature Portfolio Reporting Summary linked to this article.