Abstract
Social identity threat (SIT) is a situational stressor that increases arousal and negative affect, biases memory encoding towards domain-specific negative affect, and impairs women’s performance in contexts where they are outnumbered by men. One consequence of these effects could be that women develop learned aversions towards stigmatized domains in Science, Technology, Engineering or Mathematics (STEM). Four studies tested whether stereotypic STEM images (STEMIs) prompt aversive-like responses that predict SIT-like outcomes, including underperformance in SIT contexts and more negative SIT-oriented memories over time. Using a dot-probe paradigm, Study 1 found that only SIT women exhibited greater arousal responses to STEMIs compared to stereotypic non-STEM images (NonSTEMIs), perceived STEMIs as more negatively arousing compared to men, and underperformed; men in this context showed a similar arousal response to STEMIs and NonSTEMIs and performed better. Study 2 replicated this effect among women in STEM majors and linked aversive responses to more negative affect-laden memories for the STEM lab experience five weeks later. Using EEG, Study 3 found that enhanced processing of STEMIs presented during an attentional blink task (indexed via increased communication between occipital and prefrontal cortical regions) predicted underperformance on a math test among SIT women but marginally better performance among men. Study 4 mitigated SIT underperformance effects among women utilizing a dot-probe training paradigm that blunted arousal responses to STEMIs; instructing men to attend to STEMIs facilitated their performance. STEM aversions may thus facilitate SIT-like effects, possibly defining what the “threat” in SIT is; blunting these aversions, however, may attenuate the effects when women work alongside men in STEM performance situations.
Introduction
Beginning around middle school, girls and women leave Science, Technology, Engineering, and Math (STEM) fields at higher rates than their male counterparts1–3. Consequently, women in STEM often find themselves outnumbered by men, and past research suggests that such contexts may prime negative stereotypes about women’s inferior math ability4, 5. The fear of confirming such stereotypes, labeled social identity threat (SIT) and its performance-based derivative, stereotype threat6, 7, can lead to physiological stress, reduced working memory capacity, underperformance, and, ironically, confirmation of the stereotype stigmatized individuals are motivated to disprove (Steele, 1997)8.
Although the impact of men-dominant STEM contexts on women’s performance is well documented, less research examines how being placed in identity threatening contexts over time, i.e., being continuously exposed to negative stereotypic contexts or STEM images (STEMIs) that prime the male-dominant nature of these contexts, may prompt psychological and physiological responses (i.e., SIT-consistent responses) that help perpetuate the negative stereotype itself. These images, depicting men in stereotypical STEM settings, may not only reinforce a sense that women do not belong6, 9; they may also be aversive to women at a basic level. That is, they may evoke a negative emotional reaction that facilitates an attentional bias towards category-relevant imagery, much as a person with a snake phobia might show an aversive response to images of snakes. These responses may have downstream consequences for how memories in these contexts are formed, as well as for performance, much as any biological stressor affects these processes in general10–13.
The current studies utilized well-established cognitive, clinical, and neuroscience methodologies to assess women’s aversive responses to STEMIs. We posited that although men and women may both exhibit arousing responses to these images (whereas men may be positively aroused by STEMIs given their positive stereotypic connotations, women would find STEMIs negatively arousing or aversive), this arousal response will result in SIT consistent effects, such as negative performance outcomes and negative affective memories, for women only.
Identity threatening images may be aversive and undermine performance
Past research suggests that subtle situational cues can signal to stigmatized individuals, including women in STEM fields, that they do not belong in a domain in which their group is negatively stereotyped4, 6, 14. One situational cue that powerfully primes these effects is a context in which men outnumber women. These contexts can elicit physiological arousal responses, anxiety, and negative affect among women5, 13, 15,16,17,18,19,20. Given the link between cue exposure and physiological arousal and anxiety, it stands to reason that men-dominant STEM contexts are threatening and anxiety provoking in a physiological sense.
To date, it is unclear why identity threatening contexts (or primes representative of them) are physiologically arousing and anxiety provoking. One possibility is that the physiological arousal, stress, and anxiety experienced in identity threatening contexts initially alters how memories for these experiences are encoded and then later recalled in similar contexts to evoke congruent affective, domain-specific responses over time. Research on emotional memory shows that heightened arousal can bias encoding and recall of negative experiences21. These memories tend to be more vivid and enduring compared to memories devoid of emotion. They also can be spontaneously recalled in contexts that evoke similar emotions and can prompt the re-experience of a given emotion, possibly outside the conscious awareness of an individual22, 23.
Evidence for emotional memory processes has been found in SIT contexts. Women placed in SIT contexts exhibited dramatic fluctuations in startle responses (operationalized as indices of amygdala activity, a harbinger for emotional memory encoding processes) in response to stereotype consistent information received throughout a supposed diagnostic math test13. Importantly, startle responses (i.e., amygdala activity) indirectly predicted emotion-memory based encoding of negative (but not positive) information, and decreased perceptions of math ability. If women in SIT contexts exhibit biased encoding of negative stereotype consistent information, and similar affective states or contextual cues can prime recall of these memories on a continual basis (strengthening them over time in the process24, 25), it is possible that SIT contexts or cues could engender SIT-like responses spontaneously and unconsciously.
If SIT contexts (or images representing them) trigger negative affect and arousal through spontaneous recall of negative past experiences, in line with basic associative learning processes, these responses may become paired with those contexts over time. Thus, when individuals are cued with either identity threatening contexts, or images representative of them, learned aversive responses may be primed that stem from the recall of domain-congruent negative affective memories, in addition to a learned affective response. These responses could then undermine performance, elicit negative affect, and bias memories of future experiences, much as any biological stressor interferes with performance and executive function or biases encoding processes8, 13, 26, 27. This dynamic establishes a cycle in which initial stress and underperformance bias memories of that experience, fostering negative affect and underperformance in similar environments. Over time, such learned aversions can become fortified. SIT contexts may thus render stereotyped images inherently aversive for stigmatized individuals, because these cues facilitate the retrieval of negative domain memories and associations. The present studies tested this proposition by examining how aversive responses to STEMIs relate to negative affective memory recall and performance in SIT contexts.
Overview of studies
Four studies integrate cognitive, clinical, and neuroscience methods to test whether STEMIs are aversive to women in SIT contexts, and whether such aversions are associated with other common SIT outcomes like negative affective recall and underperformance (Fig. 1). Study 1 used a dot-probe paradigm (commonly used to assess attentional biases toward threat) to test whether images of men-only STEM contexts (vs. women-only non-STEM images) elicit unique arousal biases for women in SIT contexts. Study 2 examined how STEM aversion relates to long-term SIT outcomes in male and female STEM majors. Study 3 utilized brain-imaging methods to explore rapid neural responses to STEMIs and their link to underperformance using another cognitive task designed to assess attentional bias, also referred to as emotion induced blindness. Finally, Study 4 tested an intervention that blunts women’s aversive responses to STEMIs and examined whether blunting aversion reverses typical SIT performance decrements.
Fig. 1. Examples of (a) STEMI and (b) NonSTEMI stimuli used in all dot probe and attentional blink tasks reported in each study (Panel A) and an overview of the various methods and procedures employed for each study (Panel B).
Study 1 methods
In Study 1, men and women were randomly assigned to one of two conditions: a “diagnostic math test” (DMT), which primes SIT for women, or a “problem-solving task” (PST), a stereotype-neutral context. After completing an adaptive math task, participants performed a dot-probe task featuring STEMIs (men-only STEM settings; Fig. 1) versus NonSTEMIs (women-only non-STEM settings; Fig. 1). The dot probe task was used given its well established link to arousal and aversion as indexed via attentional bias towards stimuli of interest. Aversive responses to visual cues (threatening compared to non-threatening) have been widely studied in clinical and social psychology. This work consistently demonstrates that anxiety and fear promote vigilance for threat-relevant stimuli which results in faster reaction times to these stimuli28,29,30,31,32. Similar anxiety-related attentional biases have been found in SIT contexts via a dot-probe paradigm18. Thus, the dot probe task provides an ideal means to measure the extent to which women may find identity-threatening images related to stereotypic STEM contexts as aversive (i.e., as a function of the attentional bias exhibited towards these images). Participants also rated how positively or negatively arousing these images would be for men and women in general, allowing us to gauge whether any observed attentional bias stemmed from a more negative aversion (for women) or positive affinity (for men).
We hypothesized that women in the DMT (SIT) condition would show faster responses (i.e., heightened arousal/attentional bias) to STEMIs compared to NonSTEMIs and compared to women in the PST condition. Men in the DMT condition might also show arousal toward STEMIs, but given the link between DMT and positive performance outcomes for men (i.e., stereotype lift33), we expected that women would rate STEMIs as more negatively arousing than men would overall.
Complete methods, results, and sample justifications for all studies can be found online in the supplementary methods and results (SMAR). All studies were approved by the University of Delaware IRB (473669-1). Written informed consent was obtained from all participants in all studies, and all research was performed in accordance with relevant guidelines and regulations. All studies reported in this paper also implemented several tactics to maximize power and validate all measures via conducting sensitivity analyses, using validated stimuli and methods, and presenting stimuli and questions in random order. See the SMAR for specifics. Materials for all studies are located on the Open Science Framework: https://osf.io/54pum/.
Participants
113 Caucasian individuals participated in Study 1 in exchange for $10/hour. Fifty-eight participants (37 women, 21 men) were randomly assigned to the diagnostic math test (DMT) condition while 55 participants (31 women, 24 men) were assigned to the problem-solving task (PST) condition. All participants had knowledge of the stereotype that men are better than women at math (evidenced by responding 3 or lower on the question “Regardless of what you think, what is the stereotype that people have about women and men’s math ability, in general?” where 1 = men are much better than women and 4 = men and women are the same). We aimed for comparable numbers of participants in each cell and numbers consistent with past research utilizing dot probe procedures (e.g., Reese et al.34). Our stopping rule was to reach between 110 and 120 participants. A sensitivity analysis in G*Power revealed that 113 participants would be sufficient to capture a medium to large size effect (Cohen’s f = 0.32) with an alpha of 0.05 and power of 0.80 using four groups.
Procedure
As part of a larger study, participants were seated in their own sound-dampened chamber in front of a computer screen (they were also prepped for EEG recording; however, only behavioral findings are reported here in light of the primary research question for this study). This study directly manipulated SIT via instructions and gender composition of research assistants as utilized in previous laboratory studies13, 35. To manipulate SIT, participants in the DMT condition were told they would complete a math task diagnostic of their “math intelligence”, recorded their gender on a questionnaire prior to the task, and completed the session with at least one male research assistant present. Participants in the PST condition were told that they would be completing an experimental problem-solving task that was diagnostic of the type of problem-solving strategies they prefer and were asked what year in college they were; only women research assistants were present when women completed tasks in the PST condition. Participants then completed an adaptive math task, a dot-probe task that measured STEM aversion/affinity, and a manipulation check. Participants were invited for a second session of the study (that took place one week after the first visit) to complete ratings on the images they were exposed to during the initial visit.
Math feedback task
Participants completed a 15-minute math task consisting of standard multiplication and division problems (e.g., 10 × 20 =) that initial pilot tests confirmed varied in degree of difficulty (easy, medium, and hard), ensuring all participants would solve problems correctly and incorrectly13. During each trial, three possible answers were shown; participants had up to 16 s to respond. Each correct or incorrect response was followed by 2 s of performance feedback. Failure to respond within 16 s resulted in negative feedback. On average, participants solved about 53 problems (SD = 2.90).
Dot probe task
The dot-probe task36 consisted of a brief practice phase followed by a testing phase. The testing phase consisted of 128 trials (64 critical, 64 filler). In each trial, a crosshair was presented in the middle of the screen as a fixation point (1000 ms). On the critical trials, the fixation point was then replaced by one STEMI and one NonSTEMI presented on either side of the crosshair for 500 ms. In the filler trials, two STEMIs or two NonSTEMIs were presented on either side of the crosshair for 500 ms. After 500 ms, a small dot appeared where one of the images was previously located. The dot remained on the screen until participants responded by pressing either the “9” key on the keypad when the dot was located on the right side of the screen or the “1” key on the keypad when the dot was located on the left side of the screen. Reaction times to identify the location of the dot were used as our primary dependent variable. More arousing images (e.g., STEMIs) prime an attentional bias that yields faster reaction times when the dot replaces that image in comparison to the other image36. Note, however, that more recent demonstrations of the reliability and validity of the dot-probe paradigm have been yoked to using response-based computation procedures and extended testing protocols37–41.
All STEMIs contained only men in STEM settings (e.g., men standing around a laboratory bench wearing laboratory coats; Fig. 1). All NonSTEMIs depicted only women as the focal point in stereotypic academic settings (e.g., a woman professor teaching a class of students; Fig. 1). We held stereotypicality of gender in academic settings constant across the sets of images. As such, we were able to rule out the possibility that images that highlight where women do belong in academic settings (according to role-based gender stereotypes) might also be aversive to women because they unintentionally signal that they do not fit the stereotypical profile for STEM settings42,43,44.
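The reaction-time logic of the dot-probe task can be illustrated with a minimal Python sketch. It is not the authors' scoring procedure: the trial fields (`critical`, `probe_at`, `rt_ms`) and the example values are hypothetical, and the difference score is shown only for illustration, since the analyses reported here treat STEMI and NonSTEMI reaction times as two levels of a repeated-measures factor.

```python
from statistics import mean


def attentional_bias(trials):
    """Summarize critical-trial reaction times for one participant.

    `trials` is a list of dicts with hypothetical keys:
      'critical' - True for STEMI/NonSTEMI pairs, False for filler trials
      'probe_at' - 'STEMI' or 'NonSTEMI' (the image the dot replaced)
      'rt_ms'    - reaction time in milliseconds
    Returns (mean STEMI RT, mean NonSTEMI RT, bias), where a positive
    bias means faster responses when the dot replaced a STEMI.
    """
    stemi = [t["rt_ms"] for t in trials
             if t["critical"] and t["probe_at"] == "STEMI"]
    non_stemi = [t["rt_ms"] for t in trials
                 if t["critical"] and t["probe_at"] == "NonSTEMI"]
    if not stemi or not non_stemi:
        raise ValueError("need critical trials for both image types")
    stemi_mean, non_mean = mean(stemi), mean(non_stemi)
    return stemi_mean, non_mean, non_mean - stemi_mean


# Made-up trials for a single participant; the filler trial is ignored.
trials = [
    {"critical": True, "probe_at": "STEMI", "rt_ms": 390},
    {"critical": True, "probe_at": "NonSTEMI", "rt_ms": 410},
    {"critical": True, "probe_at": "STEMI", "rt_ms": 400},
    {"critical": True, "probe_at": "NonSTEMI", "rt_ms": 420},
    {"critical": False, "probe_at": "STEMI", "rt_ms": 900},
]
stemi_rt, non_rt, bias = attentional_bias(trials)
print(f"STEMI: {stemi_rt} ms, NonSTEMI: {non_rt} ms, bias: {bias} ms")
```

In this sketch, a positive bias corresponds to the pattern the task is designed to detect: faster responses when the dot replaces the more arousing image.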
Manipulation check
Participants were asked to rate the extent to which “I am concerned the researcher will judge me based on my performance on the task” via a seven-point Likert scale (1 = “Strongly Disagree” and 7 = “Strongly Agree”)17.
Image rating task
In the second session of this study, participants rated how they believed hypothetical men or women would feel in the situations portrayed in the images used in the dot-probe task on three separate items (‘how worried would a man/woman feel about fitting in here?’, ‘do you think a man/woman would fear being perceived as less capable in this situation?’, and ‘How comfortable would a man/woman feel in this situation?’). Participants answered these questions on a 1 to 7 scale (1 = more positive perceptions, 7 = more negative perceptions). It is important to note that we asked participants to rate how a hypothetical man or woman would feel in these scenarios as opposed to how the men or women targets in each image would have felt. Due to high inter-item reliability among the three items (woman-target STEMIs =0.97; woman-target NonSTEMIs =0.94; man-target STEMIs =0.90; man-target NonSTEMIs =0.93), a mean composite perceived arousal score was generated where higher numbers indicated perceived increased negative arousal among men or women targets in STEMIs or NonSTEMIs.
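The composite described above can be sketched in Python. This is an illustration under stated assumptions: the text does not name the reliability statistic, so Cronbach's alpha is used here as one common choice; the ratings are made up; and all three items are assumed to be keyed so that higher values mean more negative perceptions.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` holds one list of scores per item, with participants in the
    same order in every list. Uses sample variances (n - 1).
    """
    k = len(items)
    n = len(items[0])
    if k < 2 or any(len(col) != n for col in items):
        raise ValueError("need at least two items of equal length")
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def composite(items):
    """Per-participant mean across items (the composite score)."""
    n = len(items[0])
    return [mean([col[i] for col in items]) for i in range(n)]


# Hypothetical 1-7 ratings from five participants on the three items.
worry = [6, 5, 7, 2, 3]
less_capable = [6, 4, 7, 2, 3]
discomfort = [5, 5, 6, 1, 3]

alpha = cronbach_alpha([worry, less_capable, discomfort])
scores = composite([worry, less_capable, discomfort])
print(f"alpha = {alpha:.2f}")
print("composite scores:", [round(s, 2) for s in scores])
```

If an item were keyed in the opposite direction, it would need to be reverse-scored (8 minus the rating on a 1 to 7 scale) before computing either quantity.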
Study 1 results
Note that full statistical details, exclusion criteria, and additional analyses (e.g., marginal effects) can be found in the SMAR.
Manipulation check
Three participants (1 PST woman, 2 PST men) did not finish due to time constraints and were excluded. A 2 (Gender: Women, Men) × 2 (Condition: DMT, PST) ANOVA on participants’ concern about being judged by the researcher17 yielded a main effect of gender, F(1,106) = 5.70, p < .02, η2 = 0.05. Overall, women (DMT: M = 3.49, SD = 1.81; PST: M = 4.10, SD = 1.79) expressed greater evaluative concern compared to men (DMT: M = 2.67, SD = 1.35; PST: M = 3.23, SD = 2.20), regardless of condition. No other effects were significant (p’s > 0.10). Since participants in both conditions were exposed to STEM images before completing the measure, it is possible this exposure may have primed evaluative concerns in women overall.
Math feedback task
One PST woman did not complete the math task. Because of the repeated measures nature of the math feedback task, and to account for within-subject variability, performance data were analyzed using Generalized Estimating Equations45, 46. We specified a logit model given that the outcome variable is binary (correct coded 1; wrong coded 0) and an unstructured working correlation matrix (Fitzmaurice, Laird, & Rotnitzky47). We included the main effects of condition, gender, and the two-way condition by gender interaction as predictors. The main effect of gender was significant, B = 0.19 (SE = 0.04); Wald χ2(1) = 23.77, p < .001 (95% Wald LL CI = 0.112; UL CI = 0.264), indicating men were more likely than women to answer problems correctly overall. No other effects emerged (p’s > 0.97). Thus, women in both conditions showed somewhat lower math performance, consistent with SIT-like effects.
Image ratings
Twenty-two participants did not complete the image rating task and thus were excluded from these analyses (8 DMT women, 6 PST women, 3 DMT men, 5 PST men; one other DMT man had no men-oriented STEMI ratings but was considered in the analysis as he had all other image ratings). To determine whether there were gender differences in the perceived valence of the STEMI and NonSTEMI dot probe images, a 2 (Gender: Men, Women) × 2 (Condition: DMT, PST) × 2 (Image Type: STEMI, NonSTEMI) × 2 (Question Orientation: Hypothetical men targets, Hypothetical women targets) mixed factors ANOVA was conducted on participants’ image arousal composite ratings. Pairwise comparisons used a Bonferroni adjustment to control for multiple comparisons. A three-way interaction between gender, image type, and question orientation (F(1,86) = 6.00, p < .02, η2 = 0.07; Fig. 2) revealed that women participants rated hypothetical women targets in STEMI contexts as more negatively aroused compared to men participants (p = .002). Both women and men participants rated women STEMI targets as more negatively aroused than men STEMI targets (p < .001), and women STEMI targets as more negatively aroused than women NonSTEMI targets (p’s < 0.001). This difference was not statistically significant in relation to NonSTEMI images or for hypothetical men targets in either image type (p’s > 0.53), although interestingly, men NonSTEMI targets were perceived as more negatively aroused than women NonSTEMI targets (p’s < 0.001).
Fig. 2. STEMI/NonSTEMI arousal ratings for hypothetical men (male oriented) and women (female oriented) targets among men and women participants in the DMT and PST conditions. STEMIs were rated as more negatively arousing for women targets compared to all image-target combinations.
Overall, STEMIs were perceived as more negative for women (hypothetical female targets), but more positive for (hypothetical) male targets. Thus, these findings provide initial evidence that STEMIs were perceived as more negatively arousing among women but more positively arousing among men.
Dot-probe task
Four participants (3 women, 1 man in DMT) were excluded as outliers in an initial Grubbs test48. A 2 (Gender: Men or Women) × 2 (Condition: DMT or PST) × 2 (Image Type: STEMI or NonSTEMI) mixed factors ANOVA with repeated measures on the latter variable was conducted on reaction times for STEMIs and NonSTEMIs. Among findings of import, there was a main effect of participant gender, F(1,102) = 11.16, p < .01, indicating that women had slower reaction times to the images overall compared to men. There was also a condition × image type interaction, F(1,102) = 5.39, p < .03 (Fig. 3). Individuals in the DMT condition showed faster reaction times to STEMIs compared to NonSTEMIs, relative to individuals in the PST condition.
Fig. 3. STEMI dot-probe reaction times. DMT contexts exacerbate arousal responses to stereotypic STEMIs.
Given the main effect for gender, the interaction for condition, and our a priori hypotheses, additional exploratory comparisons were conducted. Simple effect analyses using a Bonferroni adjustment to control for multiple comparisons indicated that only women in the DMT condition showed a significant attentional bias to STEMIs as evidenced by faster reaction times to STEMIs compared to NonSTEMIs (p < .01). This effect was not found among women in the PST condition nor among men in either condition (p’s > 0.14). No differences were apparent within gender for any group (p’s > 0.10). Women in the DMT condition were marginally faster to STEMIs compared to women in the PST group (p = .07). Men in the DMT condition were not faster to respond to STEMIs compared to women in the DMT condition (p = .29) but were marginally faster to respond to NonSTEMIs (p =.06).
These results suggest that diagnostic math test contexts prime heightened attentional bias toward stereotype-consistent STEM images. Image ratings further support that this arousal is likely negative for women but positive (or at least less negative) for men. The attentional bias that women (but not men) in the DMT condition exhibited in response to STEMIs compared to NonSTEMIs (and marginally more so than women in PST contexts) also suggests that STEMI-based aversive-like responses are more evident in SIT contexts.
Study 1 discussion
Study 1 provided initial evidence that only women in the DMT condition showed heightened arousal and bias (faster responses) to STEMIs compared to NonSTEMIs, as well as lower math performance compared to men. Although men performed better overall and also showed some arousal to STEMIs in the dot-probe task, this effect was not specific to STEMIs compared to NonSTEMIs and emerged only in the neutral problem-solving (PST) context. Participants broadly rated STEMIs as more negative for women and more positive for men, suggesting that women’s dot-probe responses possibly reflected threat-based arousal, whereas men’s reflected a more affinity-like response. These findings offer preliminary evidence that stereotype-threatening contexts may prime a more comprehensive threat response that causes otherwise seemingly benign imagery to be perceived as particularly threatening and off-putting. They also provide evidence that a rapid STEM-oriented threat response may be triggered in SIT contexts, making male-dominated STEM imagery threatening. Such threat responses could arise via stressful experiences in negatively stereotyped domains, creating a cycle in which underperformance and negative affect reinforce one another, ultimately biasing future experiences and memories13, 35. This dynamic is consistent with research showing that individuals in negative mood states recall past events as more negative than they may have been49,50,51. Over time, learned aversions and negative memories may intensify, exacerbating stress and performance deficits in these domains. Study 2 attempts to provide initial evidence for a link between learned STEM aversions and more negatively biased memories, over time, of specific experiences in identity threatening STEM contexts.
Study 2 methods
Study 2 tested whether there was a relationship between threat-based responses (negative affective arousal) and more negative long-term memory recall for specific life events in SIT contexts, processes that are otherwise commonly linked in cognitive and emotion neuroscience22, 23, 49,50,51. Specifically, we assessed whether threat responses elicited to STEMIs during an initial lab visit in a SIT context predicted the extent to which women recalled more negative emotional memories of their visit 1 STEM-related performance 5 weeks later in a similar context. That is, we tested whether the confluence of negative affect and rapid threat responses elicited during memory encoding at time 1 prompted context-dependent negative affective memory retrieval in a similar environment at time 2. We examined this possibility in a group that is both intimately identified with STEM domains and likely to find themselves in SIT contexts more regularly (and thus are more likely to have pre-existing STEM biases): a group of men and women STEM majors who specialize in fields where women are largely underrepresented.
Participants
As part of a larger study, 107 STEM majors (60 women) visited the lab multiple times over the course of an academic year for $10/hour. Visits were typically five weeks apart and never fewer than five weeks apart. All participants were assigned to a DMT condition and had knowledge of the stereotype “men are better than women at math.” We aimed for comparable numbers of participants in each cell and numbers consistent with past research utilizing dot probe procedures34. Three women and one man were excluded as outliers on dot-probe reaction times, and one man was excluded as an outlier on memory variables at Visit 2 (via Grubbs test). A sensitivity analysis conducted for the analysis of the main question of interest (the effect of STEM aversions on negative affective memory recall) revealed that our sample size would be enough to capture a small size effect (Cohen’s f = 0.11) with an alpha of 0.05 and power of 0.80.
Procedure
During Visit 1, participants sat in sound-dampened chambers, were prepared for EEG recording, and then received SIT instructions similar to Study 1 (only behavioral findings are reported here). They completed a 34-minute math feedback task that was an extended version of the task used in Study 1. After the math task, participants were prompted to write for five minutes about the thoughts and feelings they were experiencing during the math task. Participants then completed the same dot-probe task used in Study 1, as well as a manipulation check.
Seventy-four participants (42 women) returned approximately five weeks later for Visit 2 and completed the same EEG protocol. They were told they would complete another math intelligence test, replicating the SIT context for women. Before proceeding with other tasks, participants were asked to freely recall their thoughts and feelings about visit 1 (time 1) in addition to answering manipulation check questions. They then completed other tasks as part of a larger study, were thanked and debriefed.
Autobiographical memory encoding
The typed responses from each visit were coded for anxiety, negative affect, and positive affect using Linguistic Inquiry and Word Count (LIWC). Second-session recollections were also coded for anxiety, negative affect, and positive affect via LIWC. These metrics constituted a baseline measure of the potential memories participants encoded while performing in a SIT context (for women); i.e., the thoughts and memories they harbored as they left their first visit. Time 2 LIWC metrics were operationalized as memory recall from Time 1 in a SIT congruent context. Higher numbers represented greater frequencies of emotionally relevant words from each category.
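The kind of score this coding produces can be illustrated with a short Python sketch. It is not LIWC itself: the word lists below are tiny hypothetical stand-ins for the proprietary LIWC dictionaries, and the percentage-of-total-words scoring is an assumption based on how dictionary-based word-count tools typically report category scores.

```python
import re

# Hypothetical miniature word lists, NOT the LIWC dictionaries.
CATEGORIES = {
    "anxiety": {"nervous", "worried", "anxious", "afraid"},
    "negative": {"nervous", "worried", "anxious", "afraid",
                 "bad", "frustrated", "stupid"},
    "positive": {"good", "confident", "happy", "fun"},
}


def category_percentages(text):
    """Return, per category, the percentage of words in `text` that
    appear in that category's word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(word in vocab for word in words) / len(words)
        for name, vocab in CATEGORIES.items()
    }


# Made-up free-recall response (10 words).
response = "I felt nervous and frustrated but the ending was fun."
liwc_like = category_percentages(response)
print(liwc_like)
```

Under this scoring, a higher value means that a larger share of the participant's words fell into that emotional category, which matches the interpretation of higher numbers given above.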
Study 2 results
Manipulation check
An independent samples t-test conducted on the manipulation check revealed unexpectedly that there were no differences on this self-report measure between men and women (p = .71). Both men (M = 3.57, SD = 1.93) and women (M = 3.72, SD = 2.00) STEM majors reported comparable levels of concern that the researcher would judge them based on their performance during their initial visit to the lab.
Performance on math feedback task
Performance data were analyzed using GEE specifying a logit model (correct coded 1; wrong coded 0) and an unstructured working correlation matrix. We included gender as the predictor of this model (men coded −1, women coded 1). Results yielded an effect for gender on performance, B = −0.598 (SE = 0.28); Wald χ2(1) = 4.65, p = .031 (95% Wald LL CI = −1.142; UL CI = −0.054). For every one unit increase in gender, the log odds of getting the question correct decreased by 0.598 units; men were more likely than women to get a question correct on the math feedback task. Thus, consistent with Study 1, these findings suggest that women were exhibiting SIT-consistent performance effects.
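To make the log-odds coefficient easier to read, the following Python sketch converts the reported values to odds ratios. The conversion itself is standard (an odds ratio is the exponential of a logit coefficient); the two-unit contrast is an interpretation that assumes the −1/+1 gender coding described above, so that moving from men to women spans two units of the predictor.

```python
import math


def odds_ratio(b, units=1.0):
    """Odds ratio implied by a logit coefficient `b` for a change of
    `units` in the predictor."""
    return math.exp(b * units)


# Coefficient and Wald 95% CI limits as reported in the text.
b, ci_low, ci_high = -0.598, -1.142, -0.054

per_unit = odds_ratio(b)                 # one-unit change in coded gender
women_vs_men = odds_ratio(b, units=2.0)  # -1 (men) to +1 (women)
ci_per_unit = (odds_ratio(ci_low), odds_ratio(ci_high))

print(f"odds ratio per unit: {per_unit:.2f}")
print(f"odds ratio, women vs. men (assuming -1/+1 coding): {women_vs_men:.2f}")
print(f"95% CI per unit: {ci_per_unit[0]:.2f} to {ci_per_unit[1]:.2f}")
```

An odds ratio below 1 for the women-versus-men contrast corresponds to the conclusion stated above: men were more likely than women to answer a given problem correctly.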
Dot probe task
A 2 (Gender: Men or Women) x 2 (Image Type: STEMI or NonSTEMI) mixed factors ANOVA with repeated measures on the latter variable was conducted on reaction times for STEMIs and NonSTEMIs at time 1. This analysis yielded a main effect indicating both men and women were faster to respond to STEMIs (M = 391.82, SD = 57.4) compared to NonSTEMIs (M = 399.41, SD = 68.62), F(1, 101) = 6.57, p < .02, d = 0.50. There was also a main effect for gender indicating that men were faster overall to detect dots compared to women, F(1, 101) = 4.94, p < .03, d = 0.44, suggesting women were solely discriminating between image type. The interaction was not significant (p = .36). Similar to Study 1, planned simple effect analyses using a bonferroni adjustment to control for multiple comparisons found that only women exhibited a bias between image types; they were faster to respond to STEMIs (M = 402.55, SD = 55.21) compared to NonSTEMIs (M = 412.50, SD = 66.13), F(1, 100) = 6.75, p < .02, d = 0.52, on the dot probe task. Men, however, did not exhibit a bias towards STEMIs (M = 378.53, SD = 57.99) compared to NonSTEMIs (M = 383.20, SD = 68.89; p = .27; Fig. 4).
STEMI dot-probe reaction times. DMT contexts exacerbate arousal responses to stereotypic STEMIs among women STEM majors.
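The attentional bias described above can be summarized as a simple difference score. The sketch below is illustrative only: it uses the cell means reported in the text, and the sign convention (positive values indicate faster responses on STEMI trials) and function name are assumptions rather than the authors' scoring procedure.

```python
def bias_score(rt_nonstemi_ms, rt_stemi_ms):
    """Difference score in ms; positive values mean faster responses on STEMI trials."""
    return rt_nonstemi_ms - rt_stemi_ms

# Cell means (ms) as reported for the time 1 dot probe task
cell_means = {
    "women": {"stemi": 402.55, "nonstemi": 412.50},
    "men": {"stemi": 378.53, "nonstemi": 383.20},
}

scores = {
    group: bias_score(m["nonstemi"], m["stemi"])
    for group, m in cell_means.items()
}

for group, score in scores.items():
    print(f"{group}: {score:.2f} ms")
```

On these means the difference is about 10 ms for women and about 5 ms for men, in line with the simple effects reported above.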
Basic LIWC analyses (time 1 and time 2)
To test for basic differences in positive or negative memory encoding (at time 1) and subsequent recall (at time 2) between men and women, a 2 (Gender: Men or Women) by 2 (Valence reported: Positive or Negative) by 2 (Time: time 1 or time 2) mixed-factors ANOVA was conducted on LIWC values for positive or negative affective words reported by participants during memory encoding and recall. These analyses yielded a main effect for valence indicating that participants reported more negative compared to positive affective words, F(1, 70) = 15.43, p < .001, d = 0.95. No other effects reached significance (p’s > 0.14). Simple effect analyses using a Bonferroni adjustment to control for multiple comparisons found that both men, F(1, 70) = 5.78, p < .02, d = 0.58, and women, F(1, 70) = 5.84, p < .02, d = 0.59, reported greater negative affect (women: M = 4.46, SD = 3.33; men: M = 5.18, SD = 8.50) compared to positive affect (women: M = 1.91, SD = 2.24; men: M = 2.18, SD = 1.94) immediately after the difficult math task at time 1. However, only women reported more negative affect (M = 5.25, SD = 4.51) compared to positive affect (M = 2.50, SD = 2.89) during recall of time 1 at time 2, F(1, 70) = 10.60, p < .01, d = 0.79. Men did not exhibit this bias towards negative affective memory recall (M = 3.80, SD = 3.26) compared to positive affective recall (M = 2.55, SD = 2.99) at time 2 (p = .22). No other comparisons reached significance (p’s > .15).
The effect of STEM aversions on negative affective memory recall
To determine whether there was a link between STEM aversions (STEMI reaction times on the dot probe task) exhibited at time 1 and negative affective recall exhibited over five weeks later, moderated regression analyses were conducted. We tested for moderation by deriving unstandardized regression coefficients and 95% bias-corrected confidence intervals (CIs) from 5,000 bootstrap estimates (Hayes, 2013; model 1). STEMI reaction times (RTs) were entered as a predictor and mean-centered Gender (originally coded as women = 0, men = 1) was entered as a moderator predicting negative affective LIWC scores at time 2. To account for the basic memory differences between men and women described above, we included negative affective LIWC scores at time 1 as a covariate.
This analysis revealed a unique relationship between STEMI RTs and negative affective recall at time 2 among women only. When predicting LIWC negative affective scores at time 2 as a function of STEMI RTs and gender there was a main effect for gender, β = −2.05, SE = 0.96, t(67) = −2.12, p < .04, LLCI = −3.96, ULCI = −0.13, and a main effect for STEMI RTs, β = −0.02, SE = 0.008, t(67) = −2.14, p < .04, LLCI = −0.03, ULCI = −0.001, indicating that the more women exhibited a STEM aversion at time 1, the more negative affective recall they reported more than 5 weeks later at time 2 (Fig. 5). Negative affect reported at time 1 was a significant covariate in the model (p = .02) but the interaction was not significant (p = .24). Additional analyses were conducted to determine whether findings were specific to women and STEMI RTs and to rule out alternatives vis-à-vis positive affective recall and NonSTEMI RTs (this yielded 3 additional models: STEMI RTs and positive affective recall, and NonSTEMI RTs predicting negative or positive affective recall). No relationships were found for positive affective recall (p’s > 0.60) or when NonSTEMI RTs were included as a predictor in the model (p’s > 0.17).
Scatter plot of LIWC negative affect score residuals at time 2 (accounting for LIWC negative affect scores at time 1) and STEMI reaction times. Women STEM majors recalled more negative affective math memories associated with their first visit at time 2, five weeks after their first visit, to the extent they exhibited greater STEM aversion at time 1.
Study 2 discussion
Findings from this study provide additional evidence for a unique STEM aversion among women (a bias towards negative stereotypic STEMIs specifically), but this time among a sample of women STEM majors who are chronically outnumbered by men and thus more likely to find themselves in SIT contexts on a continual basis. Consistent with previous work13, 35, women not only underperformed compared to men, but also exhibited a negative affective memory bias up to five weeks after the initial (controlled) SIT STEM experience (for women). Notably, women showing stronger STEM aversion at time 1 recalled more negative affect-laden memories for time 1 at time 2, suggesting an ongoing cycle in which initial aversion intensifies negative encoding and recall of STEM events in future SIT contexts. That the negative affective recall was unprompted and associated with stronger STEM aversions suggests these processes may be more non-conscious in nature.
These findings highlight how STEM aversion may foster a recursive process: a bias toward encoding negative aspects of a STEM task may lead to spontaneously retrieving those negative memories in subsequent settings, perpetuating negative affect and potentially contributing to performance decrements. Given that negative affect often impairs performance in SIT contexts35, 52, a link between STEM aversion and poor outcomes on standardized math tests seems plausible. Study 3 explores this possibility further by assessing real-time neural responses to aversive STEM images and examining whether such responses predict underperformance in SIT conditions on more traditional (and difficult) diagnostic math tasks.
Study 3 methods
Studies 3 and 4 sought to expand upon evidence from Study 2 by examining whether STEM aversion plays a role in another standard identity threat outcome: the underperformance of women compared to men on difficult math tasks. In Study 3 we sought converging evidence for the existence of STEM aversions by examining neural activity in regions integral for arousal and threat detection on-line, while women were exposed to STEMIs in SIT contexts. Furthermore, we used a different, well-established task from cognitive psychology known to assess the extent to which stimuli are deemed threatening or arousing to individuals: the attentional blink (AB) task, also known as the emotion-induced blindness task.
Specifically, all women were placed in a SIT context and completed an AB task while continuous EEG activity was recorded to assess neural communication between the calcarine sulcus (CS) and inferior frontal gyrus (IFG), two regions known to play an integral role in emotion-induced blindness effects53. Neural communication was operationalized as phase locking between CS and IFG within specific oscillations or frequency bands associated with excitatory neural processes54,55,56. Individuals then completed a difficult math test, and analyses were conducted to determine whether the (neural) threat/arousal response elicited by women in response to STEMIs influenced performance outcomes in a SIT context. Consistent with previous studies, we predicted that women would exhibit a neural threat response to STEMIs compared to NonSTEMIs. Furthermore, we predicted that to the extent aversion and negative affective processes tax executive function processes otherwise needed for optimal performance on difficult cognitive tasks (much like any stressor tends to compromise performance on cognitively intensive tasks26, 27), the severity of this threat response would be associated with more negative performance outcomes for women compared to men on a standard difficult math test often used in SIT studies.
Participants
Sixty-nine White undergraduates participated for partial course credit in a mixed-factors design, crossing gender (men vs. women), T1 image type (STEM vs. non-STEM), and stimulus onset asynchrony (SOA length: short vs. long) with repeated measures on the latter two factors. Eligibility criteria included right-handedness, native English fluency, and no disabilities affecting task performance. Of the 69 participants, 59 (32 women) are included in the present analysis. Reasons for exclusion included incomplete data (2 men), excessive motion during EEG recording (2 men and 2 women), fewer than 10 usable epochs for data analysis in main categories of interest (2 women), and statistical outliers identified via Grubbs’ test (2 men). All participants had knowledge of the stereotype that men are better than women at math, verified as in previous studies. Based on past SIT EEG research, which found moderate to large effect sizes both overall and in moderated regression analyses13, 57, we aimed for approximately 35 subjects per cell knowing that equipment malfunction would result in loss of some data. Our stopping rule was to stop collecting participants once we had 35 men and women in each cell. A sensitivity analysis was conducted for the main analyses (the effect of phase locking on GRE performance) and showed that 59 participants would be enough to capture a small to medium size effect on that analysis (Cohen’s f = 0.14) with an alpha of 0.05 and power of 0.80.
Procedure
Following EEG setup, participants were seated in a sound-dampened chamber. All were told they would complete a DMT mirroring SIT induction in Studies 1 and 2. They first completed an attentional blink (rapid serial visual presentation) task wherein the supposed “T1” was either a STEMI or a NonSTEMI (all stimuli were the same used in the previous 2 studies). Immediately afterward, participants were given 15 difficult GRE math problems to complete in five minutes58. Math accuracy was calculated by dividing the total correct by the total attempted, then multiplying by 100. A final questionnaire contained the same manipulation check as in Studies 1 and 2.
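The accuracy score described above (total correct divided by total attempted, multiplied by 100) can be sketched as follows. The function name is hypothetical, and the handling of a participant who attempted no items is an assumption, since the text does not say how that case was treated.

```python
def math_accuracy(n_correct, n_attempted):
    """Percent correct among attempted items.

    Returns None when no items were attempted (assumed convention;
    the source does not specify how this case was handled).
    """
    if n_attempted == 0:
        return None
    if not 0 <= n_correct <= n_attempted:
        raise ValueError("n_correct must lie between 0 and n_attempted")
    return 100.0 * n_correct / n_attempted

# Hypothetical participant: 6 correct out of 10 attempted (of the 15 items)
example_score = math_accuracy(6, 10)
print(example_score)
```

Note that scoring against attempted items, rather than all 15 items, means a participant who answers few questions but answers them correctly receives a high score.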
Attentional blink task
It is well established that, because emotionally arousing stimuli hijack attentional resources, recognition of a target image (termed T2) is impaired when it is presented in close temporal proximity to an arousing image (termed T1, which in this study was either a STEMI or a NonSTEMI; for detailed explanations of the cognitive processes underlying this effect see59,60,61). Thus, given that arousing images are particularly attention grabbing, the degree to which T1 is aversive (to women) or positively arousing (to men) can be gauged by assessing the extent to which individuals fail to detect a neutral image (T2) that is presented immediately after T1.
Procedures for the attentional blink task were identical to62 with two primary exceptions: (1) T1 was always either a STEM (72 trials) or NonSTEM (72 trials) image, and (2) to maximize the number of blinks assessed in relation to the STEMIs or NonSTEMIs, more short SOA trials (which yield the expected blink effect) were included in the task compared to long SOA trials (see SMAR for complete description of the method). Participants were shown a series of 7 images in a rapid serial visual presentation format for a total of 144 trials and were asked to identify whether T2 was present using a scale from 0 (not visible) to 99 (highly visible). Participants were considered to have seen T2 if their rating was above the midway point on the scale. Consistent with standard attentional blink paradigm analyses, only trials in which the participants correctly identified T1 (indicating whether the picture contained only men or only women) were included in the analysis63. Participants correctly identified T1 on 89% of the trials.
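The scoring rule described above can be sketched as follows: keep only trials on which T1 was identified correctly, then count T2 as seen when the visibility rating exceeds the scale midpoint. This is a minimal illustration, not the authors' code. The `Trial` structure, the function name, the numeric midpoint of 49.5, and the demo trials are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    t1_type: str      # "STEMI" or "NonSTEMI"
    soa: str          # "short" or "long"
    t1_correct: bool  # whether T1 was identified correctly
    t2_rating: int    # visibility rating, 0 (not visible) to 99 (highly visible)

MIDPOINT = 49.5  # assumed midpoint of the 0-99 scale

def detection_rate(trials, t1_type, soa):
    """Percent of retained trials in one cell on which T2 was rated above the midpoint."""
    kept = [
        t for t in trials
        if t.t1_correct and t.t1_type == t1_type and t.soa == soa
    ]
    if not kept:
        return None  # no usable trials in this cell
    seen = sum(t.t2_rating > MIDPOINT for t in kept)
    return 100.0 * seen / len(kept)

# Hypothetical trials for one participant
demo_trials = [
    Trial("STEMI", "short", True, 80),
    Trial("STEMI", "short", True, 20),
    Trial("STEMI", "short", False, 90),  # dropped: T1 misidentified
    Trial("STEMI", "short", True, 55),
]
rate = detection_rate(demo_trials, "STEMI", "short")
print(rate)
```

A lower detection rate at short SOAs than at long SOAs is the blink effect; a lower rate after STEMIs than after NonSTEMIs would indicate that STEMIs captured more attention.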
EEG recording and data reduction
EEG was recorded continuously (64 scalp electrodes, two mastoid electrodes, plus two electrodes for ocular movement) with a BioSemi ActiveTwo system (2048 Hz sampling). EEG signals were epoched from 500 ms before to 1000 ms after T1 onset, corrected for ocular artifacts, and screened for amplitude and gradient outliers. At least 10 clean epochs per stimulus type were required to remain in the final dataset64.
We utilized source localization and time-frequency analyses to assess neural activity in regions of interest. Phase-locking, i.e., the extent to which oscillations in two regions resonate in synchrony with one another in specific frequency bands, was calculated. One way to determine whether stimuli are perceived to be aversive is to assess whether occipital and prefrontal cortical regions phase lock with one another in response to a given stimulus. With respect to the AB task, CS is thought to process the T1 image and relay that information to IFG, which serves as the bottleneck that inhibits processing of T2 when it’s presented temporally close to T153. Thus, more aversive or attention-grabbing images should engender greater communication, or higher phase locking values, between occipital and prefrontal cortical regions, i.e., CS and IFG. The method for this was identical to previous studies35, 57, 65. See the SMAR for complete description of EEG analyses.
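For readers unfamiliar with phase locking, the textbook phase-locking value (PLV) is the magnitude of the average unit vector of the phase difference between two signals. The sketch below illustrates that generic definition only. It assumes instantaneous phases (in radians) have already been extracted for one frequency band in each region, and it is not the source-localized pipeline described in the SMAR. The function name and demo values are hypothetical.

```python
import cmath
import math

def phase_locking_value(phases_a, phases_b):
    """Textbook PLV: |mean(exp(i * (phase_a - phase_b)))|, ranging from 0 to 1."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase lists must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)

# Hypothetical phases: region B lags region A by a constant 0.5 rad
cs_phases = [0.1 * k for k in range(8)]
ifg_locked = [p - 0.5 for p in cs_phases]
plv_locked = phase_locking_value(cs_phases, ifg_locked)

# Hypothetical phases: differences spread evenly around the circle
ifg_unlocked = [p - 2 * math.pi * k / 8 for k, p in enumerate(cs_phases)]
plv_unlocked = phase_locking_value(cs_phases, ifg_unlocked)

print(plv_locked, plv_unlocked)
```

A constant phase lag yields a value near 1 regardless of the size of the lag, whereas phase differences scattered around the circle yield a value near 0. This is why higher values are read as greater communication between regions.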
Study 3 results
Manipulation check
An independent samples t-test on the manipulation check showed that women reported greater concern about being judged (M = 4.09, SD = 2.13) compared to men (M = 2.82, SD = 1.72), t(57) = 2.36, p = .02, d = 0.62, suggesting evaluative concerns associated with the negative math stereotype in typical SIT contexts were successfully primed in women compared to men.
Attentional blink task
In the attentional blink task, a 2 (Gender) × 2 (T1 Type: STEMI vs. NonSTEMI) × 2 (SOA length: Short vs. Long) mixed-factors analysis, using GEE to account for repeated measures, revealed a main effect of SOA length, Wald χ2(1) = 18.53, p < .001, providing evidence for the standard blink effect: participants had fewer correct T2 detections at the short lag (M = 63.10, SE = 4.22) than at the long lag (M = 71.27, SE = 4.01). There was also a main effect of T1 Type, Wald χ2(1) = 6.32, p < .02. Participants detected fewer T2 trials following STEMIs (M = 65.72, SE = 4.21) compared to NonSTEMIs (M = 68.66, SE = 3.88), suggesting that, similar to Studies 1 and 2, both men and women found STEMIs more arousing in DMT contexts. No other effects were significant (p’s > 0.58).
The effect of phase locking on GRE performance
Phase-locking analyses (detailed in the SMAR) indicated that women generally showed greater oscillatory communication between threat-detection regions in response to STEMIs compared to NonSTEMIs and compared to men. We next tested whether greater communication between CS and IFG differentially predicted math performance for men and women using bootstrapped moderated regression analyses66. Although participants answered multiple math questions, with questions nested within participant, phase-locking values were aggregated across the session. Therefore, we reasoned that it made the most conceptual sense to examine the effect of aggregate phase locking scores on aggregate accuracy scores on the math test (rather than on the likelihood that any one question was correct).
Separate models were analyzed for each frequency band. Separate phase locking values were calculated for STEMI and NonSTEMIs in line with EEG analyses. Regression analyses consisting of Gender (women = 0, men = 1), phase locking values between CS and IFG in response to STEMI or NonSTEMIs (per the four frequency bands), and the 2-way interaction variable were conducted for a total of four STEMI-based and four NonSTEMI-based analyses. We didn’t have a priori hypotheses specific to frequency band; thus to control for multiple comparisons, an Adaptive Benjamini–Hochberg procedure was utilized to adjust for false discovery rates (FDR) across four tests per image type.
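To illustrate the logic of false discovery rate control across four tests, the sketch below implements the plain Benjamini–Hochberg step-up rule. It is a simplification: the adaptive variant used here additionally estimates the proportion of true null hypotheses, which is not shown. The p-values are hypothetical and are not the values from these analyses.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Plain (non-adaptive) BH step-up procedure.

    Returns a list of booleans, in the original order, marking which
    hypotheses are rejected at false discovery rate q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four frequency-band tests (illustrative only)
demo_p = [0.004, 0.020, 0.300, 0.700]
decisions = benjamini_hochberg(demo_p)
print(decisions)
```

With four tests and q = 0.05, the rank-wise critical values are 0.0125, 0.025, 0.0375, and 0.05, so each p-value is compared against a threshold that grows with its rank rather than against a single Bonferroni cutoff.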
Analyses on CS–IFG phase locking in the beta band elicited in response to stereotypic STEM images revealed a main effect of gender (β = −63.55, SE = 31.42, t(55) = −2.02, p < .05, LLCI = −126.52, ULCI = −0.58), reflecting women’s lower math performance relative to men overall. The only comparison that survived multiple comparison correction (BH critical value = 0.03) was a significant interaction between gender and CS–IFG phase locking in the beta band (β = 417.99, SE = 189.28, t(55) = 2.21, p = .03, LLCI = 38.65, ULCI = 797.33); simple slope analyses indicated that women tended to perform worse as their CS–IFG phase locking increased in response to STEMIs (β = −6.89, SE = 4.08, t(55) = −1.69, p = .097, LLCI = −401.06, ULCI = 34.18), whereas men showed a nonsignificant trend in the opposite direction (β = 8.81, SE = 5.82, t(55) = 1.51, p = .14, LLCI = −76.15, ULCI = 545.26).
Additional analyses at one standard deviation below and above the mean of phase locking revealed that women performed worse than men at higher levels of CS–IFG phase locking in the beta band in response to STEMIs, β = 22.23, SE = 10.51, t(55) = 2.12, p < .04, LLCI = 1.17, ULCI = 43.28 (Fig. 6). Similar patterns were found in the theta and gamma frequency bands (p’s = 0.12 for the interaction terms). The slopes for women were significant or approached significance in the theta, p = .047, and gamma, p = .05, bands but not the alpha band (p = .84). This pattern is consistent with past literature indicating neural activity in the theta, beta and gamma bands taps into similar, perhaps more excitatory neural processes, colloquially referred to as the “gas” in the brain, compared to the alpha band, which taps into more inhibitory-oriented activity, colloquially referred to as the “brakes” in the brain54,55,56. No other neural variables associated with viewing STEMIs were significant predictors of performance outcomes on the math test (p’s > 0.2), nor were there any relationships between neural variables associated with viewing NonSTEMIs and performance outcomes on the math test (p’s > 0.3).
The effect of phase locking on GRE performance. A gender x phase locking interaction was found for CS–IFG phase locking in the beta band; increased CS–IFG phase locking to STEMIs (at one standard deviation above the mean) predicted worse performance on the difficult math test for women but not men.
Study 3 discussion
Study 3 provided neural evidence, via a different, well-established measure of attentional bias and arousal, that women perceive STEMIs as more arousing in SIT contexts. Although both men and women exhibited more attentional blinks following STEMIs, only women tended to underperform on challenging GRE math questions to the extent they exhibited heightened CS-IFG connectivity in response to STEMIs (to various degrees in the theta, beta, and gamma bands, which are all associated with excitatory neural activity). In contrast, men trended toward better performance when displaying a similar neural response.
This intriguing pattern suggests a context-dependent, and possibly learned, shift between maladaptive and adaptive responses within the same visual processing and gating mechanisms in the brain. For women experiencing stereotype-based stress, stronger CS–IFG phase locking to STEMIs may reflect threat-biased routing of visual information into inferior frontal circuits that sit at the precipice of higher-level, executive functions. Based on neural models of the attentional blink/emotion induced blindness53, it’s possible IFG up-regulates/biases attention towards threat-relevant stimuli, which diverts attention and taxes executive resources needed for goal-relevant processing on demanding tasks (e.g., solving difficult math problems). By contrast, the positive trend for men implies that the same CS–IFG phase locking can be adaptive, either by selectively amplifying domain-affirming cues (ingroup-supportive imagery) or by enhancing inhibitory control over distractors.
Accordingly, the direction of the performance effect depends on how IFG up- or down-regulates arousing information. When information is appraised as self-threatening, it biases attention and competes with executive control; when appraised as self-affirming, it is less distracting or more easily inhibited, leaving executive resources intact. Thus, the same behavioral signature (the blink effect) can arise from similar bottom-up sensory dynamics yet yield markedly different top-down consequences, depending on whether arousing cues are perceived as threatening versus affirming, which is a distinction shaped in part by associative learning (societal stereotypes primed in SIT contexts) and prior domain experiences (learned aversions).
Overall, these findings support the hypothesis that aversion and negative affect interfere with cognitive resources required for optimal performance, consistent with earlier evidence that STEM aversions relate to negative affect in SIT contexts (Study 2), as well as a multitude of past research linking the experience of situational stress to compromised executive functions26, 67. However, this study’s correlational design merely suggests a potential causal role of STEM aversion in standard SIT underperformance. Consequently, Study 4 tested whether directly manipulating the arousal response to STEMIs could alter men’s and women’s performance on an executive function-intensive task in identity-threatening settings.
Study 4 methods
Study 4 aimed to establish a causal link between arousal responses to STEMI primes and the underperformance of women in DMT contexts. To do this, we employed a well-documented experimental approach for demonstrating this effect known as manipulating the mediator68, 69 by directly altering participants’ attentional responses to STEMIs during a dot-probe task. Specifically, men and women were instructed to either attend toward or away from STEMIs, after which they completed a diagnostic math test in a mixed-gender setting.
We hypothesized that women instructed to attend to STEMIs would show typical SIT effects (lower math performance relative to men in the same condition), whereas women trained to divert attention from STEMIs would be buffered from SIT effects and perform comparably to men. In contrast, men were expected to show performance gains (i.e., “stereotype lift”) when instructed to attend toward STEMIs. Such findings would indicate that learned domain aversions to STEM contexts may be a direct mechanism fueling stereotypic underperformance in women.
Participants
One hundred fourteen participants were selected for Study 4. Fifty-eight participants (32 women, 26 men) were randomly assigned to the “exacerbate arousal condition” while 56 participants (26 women, 30 men) were assigned to the “mitigate arousal condition”. All participants had knowledge of the stereotype that men are better than women at math and were identified via the same means described in previous studies. We aimed for comparable numbers of participants in each cell and numbers similar to those in past research that utilized dot probe training procedures (e.g.,34). Our stopping rule was to reach between 110 and 120 participants. A sensitivity analysis on the gender by condition interaction (the effect of primary interest here) showed that 114 participants would be enough to capture a small to medium size effect (Cohen’s f = 0.27) with an alpha of 0.05 and power of 0.80.
Procedure
Participants were brought into the lab and seated at computers four at a time. All sessions contained at least one woman and one man. Participants first completed a dot probe task that either directed attention towards (exacerbate condition) or away from (mitigate condition) STEMIs. Then, participants received instructions indicating they would be taking a math test that was diagnostic of their math intelligence. Instructions were presented on the computer screen and read aloud by a recording of a man experimenter. Following instructions, participants completed demographic questions including being asked to indicate their gender, thus priming SIT7. Next, participants completed a difficult math test in a room with 1 to 3 men (the same test described in Study 3). Finally, participants completed the manipulation check and were debriefed.
Dot probe task
This task was identical to the task used in Studies 1 and 2, except that, depending on condition, the dot always appeared in the location where either the STEMI (exacerbate arousal) or the NonSTEMI (mitigate arousal) had been, directing attention towards or away from STEMIs accordingly.
Difficult math test
This task was identical to the test used in Study 3.
Manipulation check
Participants were asked the same question used in all previous studies.
Study 4 results
Manipulation check
An ANOVA conducted on the manipulation check question yielded no differences between conditions (p = .35); however, there was a main effect for gender, F(1, 110) = 8.60, p < .01, η2 = 0.073 (women M = 4.33, SD = 2.16; men M = 3.20, SD = 2.04), indicating that women were more concerned than men that the researcher would judge them based on their performance on the math task. The interaction was not significant (p = .98). Thus, women reported the desired SIT-consistent concern.
Difficult math test
Given the repeated-measures nature of the math task and within-subject variability, performance data were again analyzed using Generalized Estimating Equations45, 46, specifying a logit model given that the outcome variable is binary (correct coded 1; wrong coded 0) and an unstructured working correlation matrix47. We included the main effects of condition, gender, and the two-way condition by gender interaction as predictors. There were no main effects of gender or condition (p’s > 0.41); however, the hypothesized gender x condition interaction was found, Wald χ2(1) = 7.51, p < .01 (95% Wald LL CI = 0.312; UL CI = 1.880). This interaction indicated that in the exacerbate arousal condition, for every one-unit increase in gender (women = −1, men = 1) the log odds of getting the question correct decreased by 0.357 units, B = −0.357 (SE = 0.14); Wald χ2(1) = 6.122, p = .013 (95% Wald LL CI = −0.640; UL CI = −0.074). In other words, women were less likely to get the question correct in the exacerbate arousal condition than men. This effect was not present in the mitigate arousal condition, however (p = .17). For men, a main effect of condition was also found, B = 0.35 (SE = 0.15); Wald χ2(1) = 5.75, p < .02 (95% Wald LL CI = 0.064; UL CI = 0.635), indicating that for every one-unit increase in condition (mitigate arousal = −1, exacerbate arousal = 1), the log odds of getting the question correct increased by 0.35 units; men were more likely to get a question correct in the exacerbate arousal condition than the mitigate arousal condition. Women did not show this relationship (p = .15, Fig. 7). These findings suggest that blunting the arousal response associated with STEMI primes improves performance for women in SIT contexts but not men, whereas exacerbating the arousal response increases performance among men, but fosters standard SIT-like effects among women.
The effect of exacerbating or mitigating arousal on performance. Women’s performance improves in the mitigate arousal condition while men’s performance improves in the exacerbate arousal condition.
General discussion
Across four studies, we drew on neuroscience, cognitive, and clinical approaches to show that stereotypic STEMIs presented in identity-threatening contexts evoke an aversive-like response for women, i.e., they exhibited attentional biases towards images that were rated as negatively emotionally arousing for women in general. This aversion undermined performance on difficult math tasks (Study 4) and was associated with more negative affective memories about STEM over time (Study 2). Study 1 used a dot-probe paradigm to demonstrate that while both men and women exhibited an attentional bias to STEMIs in diagnostic math contexts (SIT contexts for women) compared to stereotype neutral contexts, only women in SIT contexts exhibited a bias towards STEMIs compared to NonSTEMIs. Explicit ratings supported the notion that STEMIs were more aversive in nature for women but more positive for men. Study 2 replicated these patterns in women STEM majors and suggested that STEM aversions may bias long-term negative affect and memory not unlike what is seen in more clinical samples10, 11, 21,22, 50, 51.
Study 3 used EEG-based methods and an attentional blink task to corroborate findings from Studies 1 and 2, indicating that both women and men exhibited a behavioral attentional bias towards STEMIs. Similar to the previous studies, however, women’s attentional bias towards STEMIs was linked to more negative outcomes, e.g., heightened threat-related neural communication between CS-IFG (two regions integral to the attentional blink effect) in response to STEMIs was associated with worse math performance among only women in SIT contexts. Men trended towards the opposite pattern, performing marginally better to the extent they exhibited increased CS-IFG communication in response to STEMIs. Finally, Study 4 directly manipulated the arousal response via a modified dot-probe task. When women were instructed to attend away from STEMIs, theoretically blunting the arousal response, their performance improved, attenuating typical SIT performance effects. Men showed the opposite pattern, benefiting from enhanced attention to STEMIs, consistent with a stereotype lift effect.
These findings complement past research demonstrating that various primes within a context can be threatening to women and alter basic attentional and physiological processes accordingly6, 70, 71. These findings also offer potentially intriguing insight into what the “threat” in SIT might represent. Although Steele (1997)8 originally argued that SIT stems from fear of confirming a negative group stereotype, the precise nature of the threat remained unclear. Our results suggest that this threat could be a learned aversion that develops over time in a stigmatized domain. In other words, when individuals enter identity-threatening contexts, they may automatically retrieve negative memories tied to past failures or social cues, leading to heightened stress and negative affect that interfere with the cognitive resources required for demanding tasks.
According to Schmader and colleagues8, stereotype threat (the performance-based derivative of SIT) sets off a cascade of physiological stress, negative appraisals, and performance monitoring that drain working memory capacity. They attribute this cascade to an internal conflict between one’s personal and group identities and the domain-specific stereotype. While there is some support for these moderating factors, the underlying mechanism remains largely untested. As an alternative, we propose that this cascade may arise, or be initiated, through spontaneous recall of negative memories and learned aversions from previous stereotype-threat experiences. Although further research is needed to test this model, growing evidence13, 35 points to a central role for negative memory recall and domain aversions in both triggering and manifesting the classic effects observed in SIT contexts. Future research would be necessary to find unequivocal support for such a conjecture, however, given the sheer complexity of the various constructs involved.
It is also worth noting that although group-level effects across studies suggest that STEM aversion may play a role in negatively biasing women’s performance in the moment and memories for specific events over time, past research hints at a much more nuanced picture as it pertains to performance in SIT contexts, STEM identity (i.e., the semantic memory-based self-concept), and resilience in stigmatized domains. Indeed, our Study 2 sample consisted of women STEM majors who are likely outnumbered by men on a regular basis. This is likely a more resilient group. Thus, while in the aggregate these STEM majors still underperformed on the math feedback task compared to men, it is possible a subset of these women are invariably more resilient to stereotype-based stress and subsequent STEM aversions. While there are likely many individual difference factors that foster resilience (e.g.,72), our findings place a spotlight on the role that memory, and the accumulation of valenced memories (and context-dependent memory retrieval), may play in fostering resilience in the moment and over time.
To explore this possibility, we conducted post-hoc analyses on Study 2 participants (all STEM majors) to determine which factors were most important for better math performance among women STEM majors on the math feedback task. To assess potential adaptive memory processes, we calculated context-dependent memory scores (d prime scores, which are measures of memory sensitivity) for both veridical positive and negative feedback received on the math feedback task in a manner identical to methods reported in Forbes et al.13, 35. We conducted separate ordinary least squares regression analyses for women and men on standardized variables for STEM aversion (mean dot-probe reaction times for STEMIs), feedback memory scores (for positive and negative feedback), and positive and negative affective recall for STEM memories on visit two (LIWC scores from visit two accounting for visit one). Among women, findings revealed that overall encoding of both positive and negative performance-related feedback was the strongest predictor of better math performance. Positive affective recall for STEM memories of visit one was also a strong positive predictor of women’s math performance. Consistent with findings across studies, STEM aversion was a strong predictor of decreased math performance. Interestingly, for women, faster STEMI reaction times were also marginally correlated with better memory for negative task-related feedback specifically, suggesting a relationship between STEM aversions and encoding of negative information in the moment. For men, better encoding of positive performance-related feedback and STEMI reaction times (assumed to reflect a STEM affinity as opposed to an aversion) were the most influential predictors of heightened math performance.
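For readers less familiar with signal detection measures, the d prime memory sensitivity score mentioned above can be sketched as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The following is a minimal illustration only, not the scoring code used in these studies: the function name `d_prime`, its four count arguments, and the log-linear correction for extreme rates are assumptions introduced here, and the exact procedure is the one reported in Forbes et al.13, 35.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return memory sensitivity: z(hit rate) - z(false-alarm rate).

    All four arguments are raw response counts from a recognition test
    (hypothetical names chosen for this sketch).
    """
    # Log-linear correction (an assumption of this sketch): adding 0.5 to
    # each count keeps both rates strictly between 0 and 1, where the
    # inverse normal CDF is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


# Illustrative counts only (not data from the studies): a participant who
# recognizes 18 of 20 feedback items and falsely endorses 2 of 20 lures.
example_score = d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=18)
```

Under these assumptions, a score of zero reflects chance-level discrimination between feedback that was and was not received, and larger positive scores reflect better memory sensitivity. Computing the score separately for positive and negative feedback would yield the two feedback memory predictors described above.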
Thus, while our women STEM majors performed more poorly in the aggregate compared to men STEM majors, those who performed better (i.e., exhibited more SIT-related resilience) did appear to exhibit context-relevant, enhanced encoding of all feedback, as well as more positive affective memories of the initial visit at time 2. These findings complement neurophysiological patterns found in the past, which provide evidence that resilience may be fostered in part by the optimal coordination between executive functions, emotion regulation, and autobiographical memory processes integral for the self-concept72. Thus, it is possible that among women who remain in STEM, vigilance systems sensitive to situational stressors like SIT can be leveraged to foster (i) momentary vigilance that supports error monitoring and enhanced encoding of positive and negative performance-related feedback, and (ii) the rapid reconfiguration of executive function, emotion, and self-related networks that enable the integration of arousing information with past positive STEM experiences that complement (rather than compete with) executive resources needed for optimal performance on cognitively intensive tasks. When the converse patterns emerge, susceptibility to stereotype-based stressors may be exacerbated, setting the stage for underperformance in the moment, and more biased negative memories and possibly STEM aversions, over time. Establishing causal links between STEM stigma susceptibility, cumulative SIT experiences in STEM, STEM resilience, and STEM persistence would require a decades-long longitudinal study, however.
Although these studies offer valuable insights, they are limited by their reliance on predominantly White undergraduate samples, laboratory settings, and focus on specific STEM contexts (e.g., math tasks). While care was taken to approximate real-world class and diagnostic testing contexts, these factors diminish the ecological validity of the studies. Also, while sensitivity analyses suggested all studies were powered well enough to detect small to medium sized effects and the degree of difficulty was particularly high for two studies (recruiting STEM majors for a longitudinal study in Study 2, and employing neuroimaging methods in Study 3), the moderated regression analyses conducted in Studies 2 and 3 could have benefitted from larger sample sizes. The bootstrapping approach employed, coupled with multiple comparisons corrections, helps alleviate these concerns, but it would be important to replicate findings with larger, more diverse samples. It was also curious that men were reliably faster than women to all images in the dot-probe tasks, regardless of image stereotypicality or target. One explanation for this could be that SIT contexts primed increases in cognitive load, altering attentional processing speeds accordingly. For instance, it is well documented that SIT contexts can tax cognitive capacity, with increased stress and performance monitoring processes being two key sources of taxation (for a review, see 8). Past work has also revealed a link between selective attention and cognitive load (e.g., 12). Given that women demonstrated SIT-consistent patterns across tasks (evidenced by the manipulation check and performance decrements across studies) and were placed in these contexts before completing the attentional bias tasks, it is possible that SIT-dependent cognitive load slowed performance on the attentional blink task overall, in relative terms.
Another possibility is that the primes themselves (men in STEM contexts and women in NonSTEM contexts) were both motivationally relevant to men in DMT and PST conditions respectively, serving as positive affirmations in positively stereotyped domains. We did not ask about participants’ perceptions of targets vis-à-vis role models, however, so this possibility could not be tested.
The dot probe itself has also been questioned in the past, with issues of validity and reliability at the forefront73, 74. Ultimately, as with many measures in psychology, there is considerable evidence in support of both camps, and caution is necessary, as with any measure that is sensitive to dynamic cognitive, motivational, emotional, and individual difference-oriented processes37, 38, 40, 73, 74, 75. The fact that we supported our dot-probe based findings with self-report measures indicating that women had a negative affective perception of STEMIs, and replicated findings with another well-validated measure of attentional bias/emotional threat detection, helps alleviate these concerns.
In sum, these findings have significant implications for STEM settings, where images of men in labs or other contexts implying women’s lack of belonging may spark aversive reactions. Women’s aversions to stereotypic STEM cues could appear even in seemingly neutral environments, biasing performance, domain perceptions, and memory. Although the current work focused on static images, dynamic cues (e.g., classroom behaviors, commercials, or interactions with professors) could similarly prime these reactions. Future research should clarify how these aversions develop and operate without overt SIT primes, and whether they impact performance in more subtly biased STEM environments. Overall, these studies highlight how SIT contexts can cultivate a learned fear of STEM domains that operates largely unconsciously, thereby perpetuating stereotype-consistent outcomes for stigmatized individuals.
Data availability
Materials for all studies are located on the Open Science Framework: https://osf.io/54pum/.
References
Settles, I. H., Cortina, L. M., Malley, J. & Stewart, A. J. The climate for women in academic science: the good, the bad, and the changeable. Psychol. Women Q. 30 (1), 47–58 (2006).
Singh, S. N., Mishra, S. & Kim, D. Research-related burnout among faculty in higher education. Psychol. Rep. 83 (2), 463–473 (1998).
Trower, C. A. & Chait, R. P. Forum: faculty diversity why women and minorities are underrepresented in the professoriate, and fresh ideas to induce needed reform. Harv. Magazine. 104 (4), 33–37 (2002).
Inzlicht, M. & Ben-Zeev, T. A threatening intellectual environment: why women are susceptible to experience problem-solving deficits in the presence of men. Psychol. Sci. 11, 365–371 (2000).
Spencer, S. J., Steele, C. M. & Quinn, D. M. Stereotype threat and women’s math performance. J. Exp. Soc. Psychol. 35 (1), 4–28 (1999).
Murphy, M. C., Steele, C. M. & Gross, J. J. Signaling threat: how situational cues affect women in math, science, and engineering settings. Psychol. Sci. 18 (10), 879–885. https://doi.org/10.1111/j.1467-9280.2007.01995.x (2007).
Steele, C. M. & Aronson, J. Stereotype threat and the intellectual test performance of African Americans. J. Personal. Soc. Psychol. 69 (5), 797 (1995).
Schmader, T., Johns, M. & Forbes, C. An integrated process model of stereotype threat effects on performance. Psychol. Rev. 115 (2), 336 (2008).
McCarty, M., Kelly, J. & Williams, K. The impact of fleeting exposure to female exemplars of success in STEM. Group. Processes Intergroup Relations. 25 (2), 474–488 (2022).
Herrera, S., Montorio, I., Cabrera, I. & Botella, J. Memory bias for threatening information related to anxiety: an updated meta-analytic review. J. Cogn. Psychol. 29 (7), 832–854 (2017).
Mitte, K. Memory bias for threatening information in anxiety and anxiety disorders: a meta-analytic review. Psychol. Bull. 134 (6), 886 (2008).
Lavie, N., Hirst, A., de Fockert, J. & Viding, E. Load theory of selective attention and cognitive control. J. Exp. Psychol. Gen. 133 (3), 339–354 (2004).
Forbes, C. E., Amey, R., Magerman, A. B., Duran, K. & Liu, M. Stereotype-based stressors facilitate emotional memory neural network connectivity and encoding of negative information to degrade math self-perceptions among women. Soc. Cognit. Affect. Neurosci. 13 (7), 719–740 (2018).
Purdie-Vaughns, V., Steele, C. M., Davies, P. G., Ditlmann, R. & Crosby, J. R. Social identity contingencies: how diversity cues signal threat or safety for African Americans in mainstream institutions. J. Personal. Soc. Psychol. 94 (4), 615 (2008).
Blascovich, J., Spencer, S. J., Quinn, D. & Steele, C. African Americans and high blood pressure: the role of stereotype threat. Psychol. Sci. 12 (3), 225–229 (2001).
Keller, J. & Dauenheimer, D. Stereotype threat in the classroom: dejection mediates the disrupting threat effect on women’s math performance. Personal. Soc. Psychol. Bull. 29 (3), 371–381. https://doi.org/10.1177/0146167202250218 (2003).
Schmader, T., Forbes, C. E., Zhang, S. & Mendes, W. B. A meta-cognitive perspective on the cognitive deficits experienced in intellectually threatening environments. Pers. Soc. Psychol. Bull. 35, 584–596 (2009).
Johns, M., Inzlicht, M. & Schmader, T. Stereotype threat and executive resource depletion: examining the influence of emotion regulation. J. Exp. Psychol. Gen. 137 (4), 691 (2008).
O’Brien, L. T. & Crandall, C. S. Stereotype threat and arousal: effects on women’s math performance. Pers. Soc. Psychol. Bull. 29 (6), 782–789 (2003).
Townsend, S. S., Major, B., Gangi, C. E. & Mendes, W. B. From in the air to under the skin: cortisol responses to social identity threat. Pers. Soc. Psychol. Bull. 37 (2), 151–164 (2011).
LaBar, K. S. & Cabeza, R. Cognitive neuroscience of emotional memory. Nat. Rev. Neurosci. 7 (1), 54–64 (2006).
Smith, A. P., Henson, R. N., Rugg, M. D. & Dolan, R. J. Modulation of retrieval processing reflects accuracy of emotional source memory. Learn. Mem. 12, 472–479 (2005).
Smith, A. P., Stephan, K. E., Rugg, M. D. & Dolan, R. J. Task and content modulate amygdala-hippocampal connectivity in emotional retrieval. Neuron 49, 631–638 (2006).
Alberini, C. M. & LeDoux, J. E. Memory reconsolidation. Curr. Biol. 23 (17), R746–R750 (2013).
Nadel, L., Hupbach, A., Gomez, R. & Newman-Smith, K. Memory formation, consolidation and transformation. Neurosci. Biobehavioral Reviews. 36 (7), 1640–1645 (2012).
Shields, G. S., Sazma, M. A. & Yonelinas, A. P. The effects of acute stress on core executive functions: A meta-analysis and comparison with cortisol. Neurosci. Biobehavioral Reviews. 68, 651–668 (2016).
Beilock, S. L. Math performance in stressful situations. Curr. Dir. Psychol. Sci. 17 (5), 339–343 (2008).
Bradley, B. P., Mogg, K., Falla, S. J. & Hamilton, L. R. Attentional bias for threatening facial expressions in anxiety: manipulation of stimulus duration. Cognition Emot. 12 (6), 737–753 (1998).
Bradley, B. P., Mogg, K. & Millar, N. H. Covert and overt orienting of attention to emotional faces in anxiety. Cogn. Emot. 14 (6), 789–808. https://doi.org/10.1080/02699930050156636 (2000).
MacLeod, C., Mathews, A. & Tata, P. Attentional bias in emotional disorders. J. Abnorm. Psychol. 95 (1), 15 (1986).
Mogg, K. & Bradley, B. P. Some methodological issues in assessing attentional biases for threatening faces in anxiety: A replication study using a modified version of the probe detection task. Behav. Res. Ther. 37 (6), 595–604 (1999a).
Mogg, K. & Bradley, B. P. Orienting of attention to threatening facial expressions presented under conditions of restricted awareness. Cognition Emot. 13 (6), 713–740 (1999b).
Walton, G. M. & Cohen, G. L. Stereotype lift. J. Exp. Soc. Psychol. 39 (5), 456–467 (2003).
Reese, H. E., McNally, R. J., Najmi, S. & Amir, N. Attention training for reducing spider fear in spider-fearful individuals. J. Anxiety Disord. 24 (7), 657–662 (2010).
Forbes, C. E., Duran, K. A., Leitner, J. B. & Magerman, A. Stereotype threatening contexts enhance encoding of negative feedback to engender underperformance and anxiety. Soc. Cogn. 33 (6), 605–625 (2015).
Trawalter, S., Todd, A. R., Baird, A. A. & Richeson, J. A. Attending to threat: Race-based patterns of selective attention. J. Exp. Soc. Psychol. 44 (5), 1322–1327 (2008).
Evans, T. C. & Britton, J. C. Improving the psychometric properties of dot-probe attention measures using response-based computation. J. Behav. Ther. Exp. Psychiatry. 60, 95–103 (2018).
Aday, J. S. & Carlson, J. M. Extended testing with the dot-probe task increases test–retest reliability and validity. Cogn. Process. 20, 65–72 (2019).
Rubin, M. et al. Measuring and modifying threat-related attention bias in posttraumatic stress disorder: an attention bias modification study. Depress. Anxiety. 2024, 3683656. https://doi.org/10.1155/2024/3683656 (2024).
Price, R. B. et al. Empirical recommendations for improving the stability of the dot-probe task in clinical research. Psychol. Assess. 27 (2), 365 (2015).
Franja, S., McCrae, A. E., Jahnel, T., Gearhardt, A. N. & Ferguson, S. G. Measuring food-related attentional bias. Front. Psychol. 12, 629115 (2021).
Eagly, A. H. & Karau, S. J. Role congruity theory of prejudice toward female leaders. Psychol. Rev. 109, 573–598 (2002).
Heilman, M. E. Sex bias in work settings: the lack of fit model. Res. Organizational Behav. 5, 269–298 (1983).
Heilman, M. E. Description and prescription: how gender stereotypes prevent women’s ascent up the organizational ladder. J. Soc. Issues. 57, 657–674 (2001).
Ballinger, G. A. Using generalized estimating equations for longitudinal data analysis. Organizational Res. Methods. 7, 127–150 (2004).
Liang, K. Y. & Zeger, S. L. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22 (1986).
Fitzmaurice, G. M., Laird, N. M. & Rotnitzky, A. Regression models for discrete longitudinal responses. Stat. Sci. 8, 284–309 (1993).
Grubbs, F. E. Procedures for detecting outlying observations in samples. Technometrics 11 (1), 1–21 (1969).
Bower, G. H. Mood and memory. Am. Psychol. 36 (2), 129 (1981).
Lewis, P. A. & Critchley, H. D. Mood-dependent memory. Trends Cogn. Sci. 7 (10), 431–433 (2003).
Miranda, R. & Kihlstrom, J. Mood congruence in childhood and recent autobiographical memory. Cognition Emot. 19 (7), 981–998 (2005).
Cadinu, M., Maass, A., Rosabianca, A. & Kiesner, J. Why do women underperform under stereotype threat? Evidence for the role of negative thinking. Psychol. Sci. 16 (7), 572–578 (2005).
Kennedy, B. L., Rawding, J., Most, S. B. & Hoffman, J. E. Emotion-induced blindness reflects competition at early and late processing stages: an ERP study. Cogn. Affect. Behav. Neurosci. 14 (4), 1485–1498 (2014).
Klimesch, W., Schack, B. & Sauseng, P. The functional significance of theta and upper alpha oscillations. Exp. Psychol. 52 (2), 99–108 (2005).
Lee, J. H., Whittington, M. A. & Kopell, N. J. Top-down beta rhythms support selective attention via interlaminar interaction: a model. PLoS Comput. Biol. 9 (8), e1003164 (2013).
Buzsáki, G. Rhythms of the Brain. (Oxford University Press, 2006).
Forbes, C. E. & Leitner, J. B. Stereotype threat engenders neural attentional bias towards negative feedback to undermine performance. Biol. Psychol. https://doi.org/10.1016/j.biopsycho.2014.07.007 (2014).
Forbes, C. E. & Schmader, T. Retraining attitudes and stereotypes to affect motivation and cognitive capacity under stereotype threat. J. Personal. Soc. Psychol. 99 (5), 740–741. https://doi.org/10.1037/a0020971 (2010).
Chun, M. M. & Potter, M. C. A two-stage model for multiple target detection in rapid serial visual presentation. J. Exp. Psychol. Hum. Percept. Perform. 21 (1), 109 (1995).
Raymond, J. E., Shapiro, K. L. & Arnell, K. M. Temporary suppression of visual processing in an RSVP task: an attentional blink? J. Exp. Psychol. Hum. Percept. Perform. 18 (3), 849 (1992).
Smith, S. D., Most, S. B., Newsome, L. A. & Zald, D. H. An emotion-induced attentional blink elicited by aversively conditioned stimuli. Emotion 6 (3), 523 (2006).
Sergent, C., Baillet, S. & Dehaene, S. Timing of the brain events underlying access to consciousness during the attentional blink. Nat. Neurosci. https://doi.org/10.1038/nn1549 (2005).
MacLean, M. & Arnell, K. A conceptual and methodological framework for measuring and modulating the attentional blink. Atten. Percept. Psychophys. 74 (6), 1080–1097. https://doi.org/10.3758/s13414-012-0338-4 (2012).
Cohen, M. X. Analyzing Neural time Series Data: Theory and Practice. (MIT Press, 2014).
Hanslmayr, S. et al. The electrophysiological dynamics of interference during the Stroop task. J. Cogn. Neurosci. 20, 215–225 (2008).
Hayes, A. Process SPSS macro. Computer Software and Manual. http://www.afhayes.com/public/process.pdf (2012).
Plieger, T. & Reuter, M. Stress & executive functioning: A review considering moderating factors. Neurobiol. Learn. Mem. 173, 107254 (2020).
Jacoby, J. & Sassenberg, K. Interactions do not only tell us when, but can also tell us how: testing process hypotheses by interaction. Eur. J. Social Psychol. 41 (2), 180–190 (2011).
Pirlott, A. G. & MacKinnon, D. P. Design approaches to experimental mediation. J. Exp. Soc. Psychol. 66, 29–38. https://doi.org/10.1016/j.jesp.2015.09.012 (2016).
Kaiser, C. R., Vick, S. B. & Major, B. Prejudice expectations moderate preconscious attention to cues that are threatening to social identity. Psychol. Sci. 17 (4), 332–338 (2006).
Chaney, K. E. Preconscious attentional bias to rejection facilitates social distancing for white women in STEM contexts. Soc. Cogn. 40 (5), 438–458 (2022).
Liu, M., Backer, R. A., Amey, R. C. & Forbes, C. E. How the brain negotiates divergent executive processing demands: evidence of network reorganization in fleeting brain states. Neuroimage 245, 118653 (2021).
Xu, I. et al. No evidence of reliability across 36 variations of the emotional dot-probe task in 9,600 participants. Clin. Psychol. Sci. 13 (2), 261–277 (2025).
Kruijt, A. W., Parsons, S. & Fox, E. A meta-analysis of bias at baseline in RCTs of attention bias modification: no evidence for dot-probe bias towards threat in clinical anxiety and PTSD. J. Abnorm. Psychol. 128 (6), 563 (2019).
Meissel, E. E. et al. The reliability and validity of response-based measures of attention bias. Cogn. Ther. Res. 46, 1–15 (2022).
Acknowledgements
All aspects of these studies and manuscript were supported by National Science Foundation grant #1329281 awarded to C.E.F. AI (Chat GPT v4o) was used to assist with editing portions of the manuscript for clarity, brevity, and citation consistency.
Author information
Contributions
C.E.F. conceived and designed the experiments. R.C.A. performed the experiments. C.E.F., R.C.A., I.O.O. analyzed the data, prepared the figures, and drafted the work and revised it critically for important content.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Forbes, C.E., Amey, R.C. & Olcaysoy Okten, I. Aversive responses to stereotypic science and math-based (STEM) images predict women’s long–term STEM memories and underperformance in math. Sci Rep 16, 9581 (2026). https://doi.org/10.1038/s41598-025-27999-3
DOI: https://doi.org/10.1038/s41598-025-27999-3