Introduction

A growing body of research supports the interplay between language and emotion in the human brain1,2,3. In this regard, prior accounts suggest that cognition and emotion are not categorically different but are deeply intertwined in the human brain4,5. However, the extent to which, and the manner in which, language and emotion processes interact deserves further investigation. On the other hand, visual bodily signals –including facial expressions– are also fundamental to pragmatic processes in human communication6. This study addresses this issue by investigating the possible influence of subliminal emotional facial expressions on syntactic processing while listening to connected speech, as indexed by event-related brain potentials (ERPs).

Relevant to our aims, syntactic violations during sentence processing (e.g., gender or number violations), relative to syntactically correct sentence elements, typically elicit a biphasic ERP response: an early left anterior negativity (E/LAN) followed by a late posterior positivity (P600)7,8,9. Past research has linked the LAN component to the early (and probably highly automatic) detection of a morphosyntactic mismatch based on the agreement relations used for structure-building7,8,10. Some studies have also observed LAN modulations in response to verbal working memory operations11,12 (see also13,14). This negative deflection usually peaks at left frontal electrodes –although fronto-central distributions have also been reported– between 300 and 500 ms after the onset of a syntactic anomaly10,11. Possible neural sources of the LAN component include Brodmann area (BA) 44 in the inferior frontal gyrus, the frontal operculum (FOP), and the anterior superior temporal gyrus (aSTG), all of which are considered key nodes within the syntactic network7,8 (for a review, see15). Subsequently, the P600 component, a centro-parietal positivity peaking around 600 ms after stimulus onset, is often associated with more controlled language-related processes (relatively more strategic and context-sensitive), thus reflecting a later phase of sentence-level integration and processes of syntactic reanalysis and repair7,8,16 (for an alternative view, see17,18). Although the neural sources of the P600 remain an open question in the current literature, the posterior STG (pSTG) and the superior temporal sulcus (STS) have been proposed, which would underlie the integration of different sources of information, including semantics and syntax7,8.

Interestingly, the pSTS may serve as a key hub for integrating visual and auditory signals in speech, including emotion perception. Neural models of face processing19,20 have proposed the pSTS as a key brain structure for the visual analysis of changeable facial aspects, such as expression, eye gaze, or lip movement. More recently, Deen et al.21 demonstrated that face-sensitive areas within the STS also respond to voice perception compared to nonvocal music and environmental sounds, supporting the notion that this brain region subserves the multimodal processing of voice and face signals22. Furthermore, neuroimaging studies indicate that the STS, alongside the ventromedial prefrontal cortex (vmPFC), encodes perceived emotion categories (e.g., happiness, anger) in an abstract, modality-independent fashion23,24. Similarly, current research suggests that the anterior temporal lobe (ATL) is also a critical region for multimodal language and emotion processing22,25,26,27. Satpute & Lindquist3, for instance, noted that brain regions often involved in semantic processing –including the ATL– are also engaged during the perception and experience of discrete emotions, as they provide the conceptual knowledge needed to make meaning of one's bodily sensations. Moreover, the ATL contains subregions functionally linked to emotion-related structures, such as the orbitofrontal cortex or the amygdala26,27. Collectively, the extant literature supports a potential interaction between language and emotion.

Of note, a growing number of studies show that syntactic violations can be processed subliminally, i.e., under reduced levels of perceptual awareness10,28,29,30. Batterink and Neville28, for instance, observed similar early negativities (LAN) in response to both undetected and detected syntactic violations in a dual task. They found a later positivity (P600) only in response to detected/conscious violations, suggesting that this component depends on conscious awareness and more controlled processing. Jiménez-Ortega et al.10,30, in turn, showed that masked emotional adjectives, which could contain morphosyntactic anomalies, modulate the syntactic processing of ongoing unmasked sentences at both early and late stages of syntactic processing (as indexed by LAN and P600 components). Furthermore, Hernández-Gutiérrez et al.31 observed an interaction between supraliminally presented emotional facial expressions (happy, neutral, and fearful) and morphosyntactic correctness, but only during the P600 component (450–650 ms). Consequently, a growing body of evidence suggests that social and emotional information can impact syntactic operations under both supraliminal31,32,33,34,35,36,37 and subliminal10,30,41 conditions. Taken together, these studies reveal how both conscious and unconscious information –including emotional and linguistic stimuli– may impact ongoing language processes, paving the way to study how subliminally presented emotional expressions affect the processing of supraliminal syntactic information. Indeed, it is of interest to explore the degree and manner of sensitivity of the linguistic processor to subtle social and emotional signals. This raises the question of how language processing fits within an ecological perspective, in which syntax may operate as part of a complex, integrated system for human communication1,2,3,42.

Importantly, there is a long-standing debate in psycholinguistics as to the functional independence of syntactic processing from other perceptual and cognitive processes, and mixed findings can be found in the current literature. By and large, the studies reviewed above are in line with an interactive view of syntax. This view holds that different types of information (i.e., phonological, lexical, syntactic, semantic, and contextual) can be accessed in parallel during the early stages of linguistic processing, possibly allowing for free information exchange between processing subcomponents43,44. In contrast, other studies did not observe effects of emotional or social information on early syntactic processing45,46,47,48, in line with traditional views of syntax, in which syntactic processing is opaque or encapsulated from other perceptual and cognitive processes49,50,51.

The present study

Although recent accounts support the interplay between emotion and language1,2,3,4,5, it remains unclear whether syntactic processing can be modulated by subliminal affective visual stimuli, especially when it comes to automatic first-pass syntactic parsing (as reflected by the LAN component)31,38. This study, therefore, investigates the influence of subliminal emotional facial expressions on syntactic speech processing, using ERPs as the main research tool. Building on prior work, we used the same paradigm as Rubianes et al.41, in which participants saw a scrambled face while listening to emotionally neutral spoken sentences that could contain morphosyntactic anomalies (based on number or gender agreement). In that study, a face identity (one's own face, a friend's face, or a stranger's face) appeared for 16 ms immediately before the target word and was masked by the scrambled stimulus. Here, we maintained the same procedure but, instead of manipulating face identity, presented emotional expressions from unfamiliar identities under masked conditions. With this approach, the target word and the emotional expression are presented almost simultaneously in order to test for interactions between the corresponding processes.

If syntactic processing is context-sensitive and permeable, we hypothesized that it would be modulated by subliminal emotional expressions, as indexed by the LAN and P600 components. More specifically, we predicted that angry faces, relative to happy and neutral faces, would elicit the largest LAN amplitude, followed by a reduced P600 amplitude, in line with previous studies33,37,41. Alternatively, the LAN effect could be reduced or even abolished if the emotional expressions capture processing resources, an outcome that would be compatible with prior results for both social and emotional stimuli10,47. In either scenario, the results would speak against the view that syntactic processing is modular and encapsulated from other cognitive or emotional processes. Conversely, if syntactic processing is indeed modular and encapsulated, we would expect no modulation of either the LAN or the P600 components by subliminal emotional faces.

Methods

Participants

Thirty-six native Spanish speakers (twenty-four females; mean age = 23.24 years, SD = 4.78) with no history of neurological or cognitive disorders and with normal or corrected-to-normal vision were included. Participants were recruited from a pool of undergraduate and graduate students of the Complutense University of Madrid (Faculties of Psychology and Education) and received 20€ for their participation. According to the Edinburgh Handedness Inventory52, all participants were right-handed (mean = +88; range = +72 to +100). Written informed consent was obtained from all participants before the experiment. The study was conducted in accordance with the international ethical standards for human research (Declaration of Helsinki of the World Medical Association) and was approved by the Ethics Committee of the Faculty of Psychology of the Complutense University of Madrid.

An a priori power analysis was conducted using G*Power software53. Based on an effect size of f = 0.25 derived from Hernández-Gutiérrez et al.31, the analysis indicated that a minimum sample size of 28 participants was required to achieve a statistical power of 0.80 at an alpha level of 0.05. Nevertheless, a total of thirty-six participants were included in the present study to ensure that all possible combinations of the stimulus features were adequately counterbalanced. These features were (see also below): sentence structure (three levels), voice type (two levels), correctness (two levels), and emotional expression (three levels), resulting in thirty-six condition combinations. In line with prior studies31,41, the presentation sequence of these features was counterbalanced for each participant. Thus, the order in which participants encountered the different combinations of sentence structure, voice type, correctness, and facial expression varied across individuals. For example, if one participant was presented with a particular sentence structure along with a specific voice and a neutral face, the next participant would be presented with a different, randomly generated sequence of these stimulus combinations (see the sketch below). This approach was intended to mitigate any systematic influence of presentation order on the results. No participant was excluded from the sample.
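To illustrate this counterbalancing logic, a minimal MATLAB sketch is given below; it simply enumerates the 3 × 2 × 2 × 3 = 36 condition combinations and draws an independent random ordering for each participant. The variable names are hypothetical and the actual implementation may have differed.

% Hypothetical sketch of the counterbalancing logic described above:
% enumerate the 3 x 2 x 2 x 3 = 36 condition combinations and assign an
% independent random ordering of these combinations to each participant.
nParticipants = 36;

[structure, voice, correctness, expression] = ndgrid(1:3, 1:2, 1:2, 1:3);
combos = [structure(:), voice(:), correctness(:), expression(:)];   % 36 x 4 matrix

rng(1);                                    % for reproducibility of the sketch only
orderPerParticipant = cell(nParticipants, 1);
for p = 1:nParticipants
    orderPerParticipant{p} = combos(randperm(size(combos, 1)), :);  % new order per participant
end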

Design and stimuli

The study used a within-subjects design in which Emotional Expression (three levels: happy, neutral, and angry) and Correctness (two levels: correct and incorrect sentences) were manipulated.

The linguistic material consisted of two hundred and forty Spanish sentences with three different structures, adapted from previous studies (see Table 1 for more details; see also54). Depending on the sentence structure, the critical word (carrying the gender or number agreement) was either an adjective (structure one) or a noun (structures two and three), and it was pseudo-randomly shuffled between sentences. The set of sentences was spoken with neutral prosody by two different voices (one female, one male). The length of the target words varied between two and five syllables, and linguistic characteristics such as word frequency, concreteness, imageability, familiarity, and emotional content were controlled by presenting every word, in each voice, across all experimental conditions. Some examples of the linguistic material are provided in Table 1 (critical words are highlighted in bold):

Table 1 Examples of sentences presented in this study.

Participants listened to sentences that could contain morphosyntactic anomalies while viewing emotional facial expressions masked with a scrambled face. The scrambled version was created from a neutral face using a 30 × 40 matrix in Adobe Photoshop©. This control stimulus keeps most physical characteristics intact while rendering the facial features unidentifiable. The facial stimuli were obtained from validated datasets, namely the Chicago Face Database55 (CFD) and the Warsaw Set of Emotional Facial Expression Pictures56 (WSEFEP). Each emotional facial expression was presented for 16 ms and masked with the scrambled face. All facial stimuli were processed in Adobe Photoshop© to normalize several parameters (grayscale, black background, luminance, and facial proportions). In total, 240 sentences (half of them incorrect) were presented to each participant, along with one hundred and twenty different facial stimuli per emotional expression. Thus, each sentence (both correct and incorrect) was counterbalanced across facial expressions (happy, neutral, and angry).

Procedure

Participants were first informed that they would take part in a study on syntactic processing in the human brain. In the lab, they were told that their task was to listen to spoken sentences, some of which would contain syntactic anomalies (i.e., number or gender disagreement), while viewing a visual stimulus on the monitor (the scrambled stimulus). Participants were instructed to judge whether each sentence was grammatically correct or incorrect (i.e., a grammaticality judgment task) by pressing one of two designated buttons after the end of each sentence. Participants were also asked to avoid blinking while the sentences were presented, if possible. A small set of practice trials (5–10) was run before the EEG recording; these practice trials were excluded from the experimental material.

As shown in Fig. 1, the procedure was as follows: a blank screen appeared 500 ms after the onset of the fixation cross, followed 300 ms later by the scrambled stimulus. After a randomized interval of 100–1700 ms, the scrambled stimulus was replaced by the emotional face, presented for 16 ms, and then reappeared until the end of the sentence. Following this, two blank screens (300 ms each) and a fixation cross (500 ms) were presented before the response window (1500 ms). The response alternatives (correct or incorrect) were displayed on either side of the screen, corresponding to the index and middle fingers, respectively, and the response-button mapping was counterbalanced across participants.

Fig. 1
figure 1

Schematic representation of the procedure. The facial expressions were masked by the scrambled stimuli. During the auditory presentation of the sentence, the critical word began 16 ms after the onset of the masked face. Faces are blurred for anonymity purposes.
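As an illustration of the trial timeline described above, a minimal Psychtoolbox (MATLAB) sketch follows. The paper does not state which presentation software was used, so this is only a sketch under that assumption; the texture variables scrambledTex and faceTex are hypothetical, the sentence audio playback is omitted, and one refresh on a 60 Hz display is taken to approximate the 16 ms face presentation.

% Hypothetical Psychtoolbox sketch of the trial timeline (presentation software
% not specified in the paper). scrambledTex and faceTex are assumed to be
% textures created beforehand with Screen('MakeTexture'); sentence audio
% playback (e.g., via PsychPortAudio) is omitted.
[win, ~] = Screen('OpenWindow', 0, 0);            % full-screen window, black background
ifi = Screen('GetFlipInterval', win);             % refresh interval (~16.7 ms at 60 Hz)

DrawFormattedText(win, '+', 'center', 'center', 255);
vbl = Screen('Flip', win);                        % fixation cross onset
vbl = Screen('Flip', win, vbl + 0.500 - 0.5*ifi); % blank screen 500 ms after fixation onset

Screen('DrawTexture', win, scrambledTex);
vbl = Screen('Flip', win, vbl + 0.300 - 0.5*ifi); % scrambled mask 300 ms later

soa = 0.100 + rand*1.600;                         % randomized 100-1700 ms interval
Screen('DrawTexture', win, faceTex);
vbl = Screen('Flip', win, vbl + soa - 0.5*ifi);   % emotional face for one refresh (~16 ms)

Screen('DrawTexture', win, scrambledTex);
Screen('Flip', win, vbl + ifi - 0.5*ifi);         % mask reappears until sentence offset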

Following the EEG recording session, participants carried out a visibility task to assess their degree of awareness of the masked faces. This task consisted of 40 trials identical to those of the EEG session, but participants were asked to report whether they had detected anything beyond the visual (scrambled) stimulus and to describe to the experimenter what they saw. This task is a subjective measure of visibility57 and has been employed successfully in previous work using masked adjectives10,30 and facial stimuli41. Moreover, previous research using a forced-choice questionnaire to assess the subliminal perception of complex visual stimuli has demonstrated that conscious awareness is almost non-existent at a presentation time of 16 ms58. In our visibility task, sixteen participants reported detecting the shape of a face, but no participant reported being able to recognize the facial stimuli (in terms of expression or identity). In fact, all participants were surprised when told that facial stimuli corresponding to happy, neutral, and angry expressions had been presented during the EEG experiment.

Regarding the timing of the experiment, the session started with the EEG setup, which took approximately 30–45 min per participant. Participants then completed the experimental task, which lasted around 25 min. To mitigate potential fatigue effects, the experimental session included three breaks (one break after every 60 sentences), and participants could resume the task at their own pace (break durations typically averaged around 5 min). Finally, the visibility task lasted around 10 min. This timeline is consistent with previous EEG experiments conducted in our lab31,41.

EEG recordings and analysis

Continuous EEG was recorded from 59 scalp electrodes (EasyCap; Brain Products, Gilching, Germany) positioned according to the international 10–20 system. EEG data were recorded with a BrainAmp DC amplifier at a sampling rate of 250 Hz with a band-pass of 0.01–100 Hz. All scalp electrodes, plus the left mastoid, were referenced to the right mastoid during the recording and re-referenced off-line to the average of the left and right mastoids. The impedance of all electrodes was kept below 5 kΩ. The ground electrode was located at AFz. Eye movements were monitored using two vertical (VEOG) and two horizontal (HEOG) electrodes placed above and below the left eye and on the outer canthi of both eyes, respectively.

EEG data were preprocessed with Brain Vision Analyzer® (Brain Products). Raw data were filtered offline with a band-pass of 0.1–30 Hz and subsequently segmented into 1200 ms epochs starting 216 ms prior to the onset of the critical word. Baseline correction was applied from −216 to −16 ms relative to the onset of the critical word. Trials with incorrect or omitted responses were excluded from the analyses. Trials exceeding a threshold of 100 microvolts (μV) in any channel were automatically rejected. Common artifacts (eye movements or muscle activity) were corrected through infomax independent component analysis59 (ICA). After preprocessing, the data for each condition were exported to Fieldtrip60, an open-source MATLAB toolbox (R2021b, MathWorks, Natick, MA, USA), for further analyses.
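Although preprocessing was performed in Brain Vision Analyzer, an equivalent pipeline can be sketched in FieldTrip for illustration. The file name, trigger code, mastoid labels, and selected ICA components below are hypothetical placeholders, not the study's actual settings.

% Hypothetical FieldTrip sketch of the preprocessing steps described above.
cfg                     = [];
cfg.dataset             = 'subject01.vhdr';     % hypothetical BrainVision file
cfg.trialdef.eventtype  = 'Stimulus';
cfg.trialdef.eventvalue = {'S 10'};             % hypothetical critical-word trigger
cfg.trialdef.prestim    = 0.216;                % 216 ms before critical-word onset
cfg.trialdef.poststim   = 0.984;                % 1200 ms epochs in total
cfg = ft_definetrial(cfg);

cfg.bpfilter        = 'yes';
cfg.bpfreq          = [0.1 30];                 % band-pass 0.1-30 Hz
cfg.demean          = 'yes';
cfg.baselinewindow  = [-0.216 -0.016];          % baseline relative to critical-word onset
cfg.reref           = 'yes';
cfg.refchannel      = {'M1' 'M2'};              % hypothetical mastoid channel labels
data = ft_preprocessing(cfg);

% Infomax ICA for ocular/muscular artifacts, then back-projection without the
% artifactual components (component selection not shown)
cfg           = [];
cfg.method    = 'runica';                       % (extended) infomax implementation
comp          = ft_componentanalysis(cfg, data);
cfg           = [];
cfg.component = [1 2];                          % hypothetical artifact components
data_clean    = ft_rejectcomponent(cfg, comp, data);

% Reject trials exceeding +/- 100 microvolts in any channel
cfg = [];
cfg.artfctdef.threshold.channel  = 'all';
cfg.artfctdef.threshold.bpfilter = 'no';
cfg.artfctdef.threshold.max      =  100;
cfg.artfctdef.threshold.min      = -100;
cfg = ft_artifact_threshold(cfg, data_clean);
cfg.artfctdef.reject = 'complete';
data_clean = ft_rejectartifact(cfg, data_clean);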

Cluster-based permutation tests

Cluster-based permutation tests were conducted to statistically evaluate the ERP data using functions implemented in Fieldtrip60. The significance probability was computed from the permutation distribution using the Monte Carlo method and the cluster-based test statistic61. The permutation distribution was formed by randomly reassigning the values corresponding to each condition across all participants 8000 times. If the p-value for a cluster (computed under the permutation distribution of the maximum cluster-level statistic) was smaller than the critical alpha level (0.05), the two experimental conditions were considered significantly different. This statistical test was applied to evaluate the differences between the experimental conditions of our design: 3 Emotional Expression (happy, neutral, and angry) \(\times\) 2 Correctness (correct and incorrect). To assess the interaction effects, we first tested for differences between three conditions using a cluster-based F-test; these three conditions were obtained by computing the difference between incorrect and correct sentences for each emotional expression (happy, neutral, and angry). We then identified specific differences between pairs of conditions by means of cluster-based permutation t-tests (i.e., happy vs. angry, happy vs. neutral, angry vs. neutral), with the critical alpha value corrected for multiple comparisons (0.05/3 = 0.016). These analyses included the whole time window and all channels (see the sketch below).
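A minimal FieldTrip sketch of these tests follows, assuming cell arrays holding each subject's incorrect-minus-correct difference wave per expression (e.g., diffAngry{s}); these variable names and the neighbour definition are assumptions, while the number of randomizations (8000), the cluster alpha (0.05), and the corrected pairwise alpha (0.016) follow the values reported above.

% Hypothetical FieldTrip sketch of the cluster-based permutation tests.
nSub = 36;

cfg_nb        = [];
cfg_nb.method = 'distance';                     % requires electrode positions in the data
neighbours    = ft_prepare_neighbours(cfg_nb, diffAngry{1});

% Omnibus F-test across the three difference waves (interaction test)
cfg                  = [];
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_depsamplesFunivariate';
cfg.correctm         = 'cluster';
cfg.clusteralpha     = 0.05;
cfg.clusterstatistic = 'maxsum';
cfg.numrandomization = 8000;
cfg.alpha            = 0.05;
cfg.tail             = 1;                       % F statistic: right-tailed
cfg.clustertail      = 1;
cfg.neighbours       = neighbours;
cfg.design           = [ones(1,nSub) 2*ones(1,nSub) 3*ones(1,nSub); ...
                        1:nSub       1:nSub        1:nSub];
cfg.ivar = 1;                                   % condition (independent variable)
cfg.uvar = 2;                                   % subject (unit of observation)
statF = ft_timelockstatistics(cfg, diffHappy{:}, diffNeutral{:}, diffAngry{:});

% Pairwise follow-up t-tests with Bonferroni-corrected alpha (0.05/3)
cfg.statistic   = 'ft_statfun_depsamplesT';
cfg.alpha       = 0.016;
cfg.tail        = 0;                            % two-tailed
cfg.clustertail = 0;
cfg.design      = [ones(1,nSub) 2*ones(1,nSub); 1:nSub 1:nSub];
statAngryVsHappy = ft_timelockstatistics(cfg, diffAngry{:}, diffHappy{:});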

To estimate the magnitude of the effects, both the effect size62 (Cohen's d) and the mean difference were calculated for the average of all channels within the latency range reported by the cluster-based permutation tests.
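For illustration, these two descriptive measures could be computed as follows, assuming per-participant mean amplitudes averaged over all channels and over the cluster latency window; the paired-samples (d_z) convention for Cohen's d is an assumption, as the paper does not specify which variant was used.

% Hypothetical sketch: erpIncorrect and erpCorrect are 36 x 1 vectors of mean
% amplitude (in microvolts), averaged over all channels and over the latency
% window identified by the cluster-based permutation test.
diffs    = erpIncorrect - erpCorrect;     % within-subject differences
meanDiff = mean(diffs);                   % mean difference (reported as Delta)
cohensD  = mean(diffs) / std(diffs);      % paired-samples Cohen's d (d_z)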

We also explored the main effects on typical components related to facial expression perception, both as a manipulation check and as an exploration of these effects under subliminal conditions. This was done by selecting a priori time windows (including all channels) based on previous research63,64: 100–200 ms for the N170 component and 200–300 ms for the EPN component. Since our linked-mastoid reference might attenuate the effects on these components65, the data were re-referenced to the average of all scalp channels only for the statistical analysis of these two components.
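In FieldTrip, this re-referencing step for the face-related analyses could look as follows; this is only a sketch, with data_clean standing for the hypothetical preprocessed dataset from the pipeline sketched earlier.

% Hypothetical sketch: common average reference for the N170/EPN analyses only.
cfg            = [];
cfg.reref      = 'yes';
cfg.refchannel = 'all';          % average of all scalp channels
cfg.refmethod  = 'avg';
data_avgref    = ft_preprocessing(cfg, data_clean);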

Results

Behavioral results

Reaction times were examined with a repeated-measures ANOVA with the factors Emotional Expression and Correctness. The main effect of Emotional Expression was nonsignificant (F(2,70) = 2.518; p = 0.088; \(\eta_p^2\) = 0.067), whereas the main effect of Correctness was significant (F(1,35) = 26.436; p < 0.001; \(\eta_p^2\) = 0.43). Post hoc analysis showed shorter reaction times for incorrect sentences than for correct ones (\(\Delta\) = −32.71 ± 6.36 ms; p < 0.001), as shown in Table 2. The interaction between Emotional Expression and Correctness was nonsignificant (F(2,70) = 0.557; p = 0.57; \(\eta_p^2\) = 0.01). As for response accuracy, measured as the percentage of sentences correctly classified in terms of syntactic correctness, the main effect of Emotional Expression was not significant (F(2,70) = 1.995; p = 0.14; \(\eta_p^2\) = 0.05), whereas a significant effect of Correctness was found (F(1,35) = 30.129; p < 0.001; \(\eta_p^2\) = 0.46). In particular, accuracy was higher for correct sentences than for incorrect ones (\(\Delta\) = −5.432 ± 0.99%; p < 0.001). The interaction between Emotional Expression and Correctness was not significant (F(2,70) = 0.141; p = 0.87; \(\eta_p^2\) = 0.01).

Table 2 Mean values and standard deviation (SD) corresponding to reaction times and accuracy of the participants’ response.
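The software used for the behavioral analyses is not stated; as a minimal sketch, an equivalent 3 × 2 repeated-measures ANOVA on reaction times could be run in MATLAB with fitrm and ranova (Statistics and Machine Learning Toolbox), assuming a hypothetical 36 × 6 matrix rt of condition-wise mean reaction times.

% Hypothetical sketch of the 3 x 2 repeated-measures ANOVA on reaction times.
% rt is assumed to be a 36 x 6 matrix (participants x condition cells), with
% columns ordered as expression (happy/neutral/angry) crossed with correctness.
condNames = {'hap_cor','hap_inc','neu_cor','neu_inc','ang_cor','ang_inc'};
t  = array2table(rt, 'VariableNames', condNames);
within = table(categorical({'happy';'happy';'neutral';'neutral';'angry';'angry'}), ...
               categorical({'cor';'inc';'cor';'inc';'cor';'inc'}), ...
               'VariableNames', {'Expression','Correctness'});
rm = fitrm(t, 'hap_cor-ang_inc ~ 1', 'WithinDesign', within);
ranovatbl = ranova(rm, 'WithinModel', 'Expression*Correctness');
disp(ranovatbl)                              % F, p, and effect terms per factor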

Electrophysiological results

Face-related components

The cluster-based permutation tests yielded a significant main effect of Emotional Expression during the N170 time window (F(2,70) = 4.5; p = 0.01). In particular, a smaller N170 was found in response to angry faces compared to happy (negative cluster: p = 0.007) and neutral faces (negative cluster: p = 0.026), while the difference between happy and neutral faces did not reach statistical significance (p = 0.091). These differences were most pronounced over parieto-occipital channels, as depicted in Fig. 2. In turn, no significant differences were observed during the EPN time window, which typically peaks between 200 and 300 ms over parieto-occipital channels (p < 0.14).

Fig. 2
figure 2

Average-referenced waveforms corresponding to the face-related components. Main effects of Emotional Expression (A). Cluster plots for the significant N170 contrasts (B). No significant differences emerged between happy and neutral faces. * p < .05, ** p < .01.

Language-related components

The cluster-based permutation tests first indicated a significant difference between incorrect and correct sentences for each emotional expression. Based on their latency and topographical distribution (see Fig. 3A), these effects were first associated with the LAN when happy (negative cluster: p < 0.001; \(\Delta\) = −0.49; Cohen's d = −0.25), neutral (negative cluster: p = 0.010; \(\Delta\) = −0.93; d = −0.44), and angry faces (negative cluster: p < 0.001; \(\Delta\) = −1.49; d = −0.58) were displayed. Notably, they consisted of a large, long-lasting negativity, particularly for angry faces (approximately 220–960 ms), compared with happy (approximately 380–860 ms) and neutral faces (approximately 550–850 ms). Thereafter, the P600 component emerged in response to morphosyntactic anomalies preceded by happy (positive cluster: p = 0.003; \(\Delta\) = 1.99; d = 0.71), neutral (positive cluster: p < 0.001; \(\Delta\) = 2.18; d = 0.93), and angry faces (positive cluster: p = 0.015; \(\Delta\) = 1.93; d = 0.57). As shown in Fig. 3B, the latency of the P600 component was similar across emotional expressions (approximately 840–1200 ms).

Fig. 3
figure 3

Grand average LAN (A) and P600 (B) waveforms and their topographical distributions when comparing morphosyntactic correctness for each emotional expression. Note that the ERPs were time-locked to the onset of the subliminal face, with the critical word starting 16 ms later. Box plots for the interaction effects between Correctness and Emotional Expression during the LAN (C) and P600 (D) time windows. * p < .05, ** p < .01, *** p < .001.

After observing both LAN and P600 components in response to each emotional expression, a cluster-based F-test was conducted (including the whole time window and all channels) to examine whether there was an interaction between Emotional Expression and Correctness. This analysis showed a significant effect (F(2,70) = 3.66; p = 0.031). Subsequently, cluster-based permutation t-tests were performed to assess the specific differences between pairs of conditions (alpha corrected for the number of comparisons: 0.05/3 = 0.016). Relevant to our aims, this analysis showed a larger LAN amplitude for angry faces in comparison with happy (negative cluster: p = 0.010; \(\Delta\) = −1.43; d = −0.40) and neutral faces (negative cluster: p = 0.014; \(\Delta\) = −1.10; d = −0.42). Additionally, a larger LAN amplitude was found in response to happy faces compared to neutral faces (negative cluster: p = 0.010; \(\Delta\) = −1.07; d = −0.46; latency: 400–470 ms), as can be observed in Fig. 3C. As for the P600 effects, a smaller P600 amplitude was observed for angry faces relative to neutral faces (positive cluster: p = 0.012; \(\Delta\) = 0.28; d = 0.11), while nonsignificant differences were found between angry and happy faces (positive cluster: p = 0.39; \(\Delta\) = 0.04; d = 0.01), as well as between happy and neutral faces (positive cluster: p = 0.11; \(\Delta\) = 0.33; d = 0.11) (see Fig. 3D).

Discussion

This study investigated whether syntactic processing can be affected by masked visual presentations of emotional facial expressions. Typical morphosyntax-related ERP components (LAN and P600) emerged across all emotional expressions; interestingly, however, these components varied as a function of the specific masked emotional expression, supporting an impact of the latter on syntactic processing. Namely, the results revealed a long-lasting negativity (LAN effect) at frontocentral electrodes that was larger for angry expressions than for the other expressions. Similarly, a larger LAN amplitude was observed for happy expressions relative to neutral ones. These findings show that first-pass syntactic processing can be biased by emotional information even under reduced levels of visual awareness, as shown here for angry and happy expressions. Furthermore, a reduced P600 effect was found for angry expressions compared with the other expressions. Taken together, these results support interactive accounts of language comprehension43,44,66,67, which in general terms suggest that different types of information (phonological, syntactic, semantic, and contextual), including emotionally relevant cues, are processed in parallel and can rapidly influence each other to reach an overall interpretation. From this perspective, language processing seems to unfold in continuous interaction with affective and cognitive processes1,2,3,30,31,40,41.

Threat-related signals, such as anger or fear, might enjoy prioritized access to cognitive resources, as they are key to individuals' survival and protection68,69,70 (see also71). Beyond their phylogenetic and adaptive value, past research has shown that threat-related stimuli are powerful cues for visual attention72,73,74, even when they are not consciously perceived75,76. The experimental manipulation of this study provides further evidence on the neural correlates of emotional facial expression processing under reduced levels of visual awareness, which is a matter of current debate in the literature64,77,78. The data obtained here add to this line of research by showing a smaller N170 amplitude in the presence of angry faces relative to other facial expressions under masked conditions. This emotion-selective response might reflect an early detection of threat-related signals, relative to other emotional content, which is needed to respond efficiently to potential threats in the social environment. By contrast, an increased N170 amplitude to happy expressions was found, in line with past research (for a recent review, see64). The observation of these face-related ERP modulations in response to our masked emotional stimuli suggests early access to emotional processing, occurring irrespective of morphosyntactic processing. In addition, no differences were found for the EPN component, probably due to the brief presentation of the subliminal stimuli. Likewise, it should be noted that participants were performing a grammaticality judgment task during the processing of the facial expressions.

Notably, this study provides novel evidence that emotional facial expressions can modulate first-pass syntactic processing under subliminal conditions. In this regard, the patterns observed here extend prior work. For instance, Martín-Loeches et al.37, manipulating the emotional valence of adjectives during sentence comprehension, observed an increase in LAN amplitude for negative words and a decrease for positive words, as compared to neutral ones. Similarly, other LAN modulations in response to emotional information have been reported in the literature10,30,33,35. Accordingly, and taking into account that the LAN component is often considered to reflect the early detection of a morphosyntactic anomaly7,79 (see also11,13), the LAN increase for angry expressions could be linked to the recruitment of additional processing resources during first-pass syntactic parsing. This interpretation is consistent with the notion that negative content typically engages more processing resources (i.e., a negativity bias) relative to neutral information37,80 (see also81). It is also plausible that this negativity bias triggered a shift in the processing strategy applied to syntactic agreement violations, favoring a more analytic and rule-based approach, in line with the nature of early syntactic processes15,28,30,79. Consistent with this, prior research has interpreted modulations of syntactic processing by emotional information as reflecting a similar shift towards analytic and rule-based strategies when encountering syntactic agreement violations33,39 (for a review, see1), as opposed to more heuristic or associative processing styles, which, in turn, may trigger N400-like effects10,30,34. Collectively, this evidence clearly contrasts with proposals that syntactic processing –especially at its earlier stages– is encapsulated and largely unaffected by information provided by other perceptual and cognitive systems. It further calls into question the idea that the LAN component reflects a modular process, suggesting instead more flexible and context-sensitive processes30,41.

Our results showed that the late stages of syntactic operations can also be modulated by masked facial expressions of anger, as indexed by the P600. This finding is in general agreement with past research showing a smaller P600 amplitude for negative than for positive conditions32,38. Indeed, similar morphosyntax-related ERP patterns have been previously reported in the literature. For instance, Rubianes et al.41, using the same spoken sentences but presenting facial identities (self, friend, and unknown faces) instead of emotional expressions, also found a larger LAN amplitude followed by a reduced P600 amplitude only in response to self-faces, while a larger LAN amplitude with no reduction of the P600 amplitude was observed for friend faces as compared to unknown faces. This might indicate that both emotional and self-relevant information concurring with language processing capture processing resources in the service of rapidly adapting behavior to environmental conditions82,83. Moreover, considering the brief presentation of the facial stimuli (16 ms) and the masking procedure, these data indicate that socially and emotionally relevant stimuli may influence syntactic processing without the need for explicit recognition, most likely engaging low-level attentional capture and bottom-up mechanisms. This suggests that even subtle or briefly presented affective stimuli can influence ongoing cognitive processing and decision-making, even under reduced levels of perceptual awareness.

It should be noted that we presented neutrally valenced spoken sentences, which implies that facial expressions of anger could not capture attentional resources towards sentence parts with negative valence. Specifically, we interpreted our main result –a larger LAN effect followed by a reduced P600 effect in response to facial expressions of anger– as reflecting increased processing resources during first-pass syntactic parsing (LAN), which in turn reduces the need for subsequent syntactic reanalysis or repair processes (P600). Similar biphasic patterns have been observed in the literature, suggesting that fewer reanalysis/repair processes are necessary to successfully resolve the morphosyntactic mismatch when the earlier stages have recruited increased processing resources34,41,84. In that vein, prior research also suggests that this biphasic pattern might reflect more efficient syntactic processing30,33,36,85, which would tentatively fit, in our study, with the notion that threat-related signals are important for survival. We consider, however, that interpretations in terms of facilitation should be supported by corresponding behavioral measures, such as shorter reaction times or higher accuracy rates. Given the absence of such behavioral effects in our study, we refrain from interpreting this biphasic pattern as indicative of facilitation, focusing instead on the mobilization of processing resources during early and late syntactic processes, which addresses our main research question.

It is also worth noting that a larger LAN amplitude was observed in response to happy faces as compared to neutral ones, with no differences in P600 amplitude. This result might reflect that first-pass syntactic parsing was modulated by emotionally arousing (i.e., both happy and angry) compared to neutral (non-arousing) content, even if emotional valence may have differential effects. Comparable results can be found in the extant literature. For instance, Espuny et al.33, who presented emotion-laden words of different emotional valences (equated in arousal) preceding neutral sentences that could contain morphosyntactic anomalies, observed a larger LAN amplitude for both positive and negative conditions relative to neutral ones. This pattern was consistent with Jiménez-Ortega et al.86. However, in these studies33,86 no differences were found between positive and negative conditions during the LAN window, which suggests that the observed effects were driven by arousal. In our study, by contrast, a larger LAN amplitude was observed for angry compared to happy expressions, presumably due to emotional valence. Nevertheless, our study cannot fully elucidate the independent contributions of valence and arousal because of the constraints of the databases used (the Chicago Face Database and the Warsaw Set of Emotional Facial Expression Pictures), which are organized by emotional category (e.g., anger, happiness) rather than by quantitative valence and arousal ratings. Hence, further research is needed to disentangle the effects of valence and arousal of affective subliminal visual stimuli on syntactic processing.

In sum, the data presented here seem to indicate that language processes may unfold over time in continuous interaction with other perceptual, affective, and cognitive processes, which is in line with an interactive view of language processing1,2,43,44,66,67. More broadly, this notion is also consistent with prior accounts that suggest that cognition and emotion are intertwined in the brain, as opposed to reflecting isolated processes3,4,5.

Several limitations apply to the present study. One is the near-simultaneous presentation of the visual (face) and auditory (critical word) stimuli, with only a 16 ms difference. Consequently, we cannot strictly isolate the effects of the face presentation itself from those of the auditory input. Nevertheless, this temporal overlap was necessary to directly investigate our main research question, which focuses on the interplay between facial expressions of emotion and syntactic processing. Another limitation is the lack of facilitatory effects in the behavioral data. Any potential facilitation of syntactic processes driven by emotional expressions would ideally manifest in observable behavioral advantages (e.g., shorter reaction times or higher accuracy rates). The absence of such effects might be due to several factors, such as the delay in participants' responses, ERP procedural constraints, or the response window, which in our task appeared only after the end of the sentence, thereby potentially obscuring subtle differences between conditions. It is important to note, however, that this is also the case in most studies in the field30,31.

A critical question for future research is whether social and emotional information facilitates or impedes language processing. More broadly, it is necessary to elucidate the mechanisms by which emotional processes and language comprehension mutually influence one another. Current models of language comprehension do not yet adequately describe the precise mechanisms underlying this interaction7,43,44,67. For instance, it remains unclear whether non-linguistic (e.g., emotional faces) and linguistic (e.g., emotion-laden words) information share the same mechanisms when influencing core language components such as phonology, syntax, or semantics (for a recent discussion, see87,88,89). To this end, cross-domain studies might prove valuable90, as they could reveal distinct ERP patterns and help account for mixed findings in the extant literature.

To conclude, this study shows that emotional facial expressions can be decoded under reduced levels of awareness (i.e., subliminally). Regarding the modularity of syntax, the data obtained here indicate that both first-pass and late syntactic processes can be affected by affective visual stimuli even when the latter are outside awareness, as reflected by the mobilization of processing resources. These results support an interactive view of language processing in which emotional and morphosyntactic features may interact during both the early and late computation of agreement dependencies. Altogether, syntax appears sensitive to subtle emotional signals, supporting the depiction of language as integrated within a broader, more complex system devoted to human communication.