Abstract
Prior research demonstrates that eyewitness memory is susceptible to misinformation. Specifically, memory for an original event can be contaminated by post-event information. Recently, we found that susceptibility to misinformation is reduced when mock eyewitnesses are given a warning about the threat of misinformation either before exposure to the post-event information (pre-warning) or after exposure to the post-event information (post-warning). In the present study, we investigated whether the timing of the warning (pre-warning vs. post-warning) and warning frequency (one warning vs. two warnings) impact memory accuracy as well as metacognitive assessments of memory accuracy. In Experiment 1, we found that pre- and post-warnings similarly decreased the negative impact of misinformation on memory and improved metacognitive assessments of memory. In Experiment 2, repeated warnings (two warnings) also decreased the negative impact of misinformation on memory and improved metacognitive assessments of memory related to misinformation. However, these benefits of repeated warning came at the cost of metacognitive assessments of memory for information that had not been contaminated by misleading post-event information. These results suggest that warnings can improve eyewitness memory accuracy and support the relationship between memory and confidence; however, over-warning an eyewitness may result in under-confidence in accurate memory.
Introduction
When to warn: examining warning effectiveness on eyewitness memory
In the United States judicial system, the reliance on eyewitness reports and testimony is often based on the assumption that eyewitnesses can accurately recount the witnessed crime. Yet, research shows that when eyewitnesses are exposed to post-event information that conflicts with what they witnessed (i.e., misinformation), they produce less accurate reports that are more likely to include erroneous post-event information. This misinformation effect (cf.1) is even more pronounced when individuals retrieve details about the original event prior to the presentation of post-event information, as is often the case during police interviews or calls to first responders (i.e., Retrieval Enhanced Suggestibility2).
Given that it is difficult to ascertain the accuracy of testimony in a real court case, people use other measures, such as an eyewitness’s confidence in their testimony, as a representation of their memory accuracy. Prior research has found that mock jurors are more likely to give a guilty verdict if an eyewitness was highly confident in their memory than if the eyewitness had low confidence in their memory3. However, mock eyewitness confidence has been found to be higher when information is quicker to retrieve, regardless of the accuracy of the information4,5. Importantly, providing mock eyewitnesses with a simple warning about the threat of misinformation can greatly reduce susceptibility to misinformation and lower their confidence in the accuracy of misleading post-event information5,6,7. The present study explored how the timing and frequency of warnings given to mock eyewitnesses impact their susceptibility to misinformation as well as the relationship between their memory accuracy and confidence.
Warnings about the threat of misinformation can occur after the presentation of misleading post-event information (post-warning) or prior to the presentation of post-event information (pre-warning). Research has consistently demonstrated that post-warnings reduce susceptibility to misinformation (for review see8). Although limited research has examined pre-warnings in eyewitness memory paradigms, results from this work suggest that pre-warnings can also improve memory accuracy in the face of misinformation9. Recently, Karanian and colleagues extended past research by directly comparing the effects of pre- and post-warnings on susceptibility to misinformation in a repeated retrieval paradigm in which participants were given an initial test prior to being exposed to post-event information6. Specifically, participants watched a silent crime video (Witnessed Event), took an initial memory test about the video (Initial Test), listened to an audio retelling of the event containing misleading (inaccurate) details, consistent (accurate) details, and neutral details (Post-Event Information), and then took a final four-alternative forced choice memory test (Retrieval). They found that a warning provided before or after the Post-Event Information reduced susceptibility to misinformation on the final memory test compared to when no warning was given. Importantly, pre- and post-warnings benefitted memory to a similar extent. These warning-related memory benefits were associated with increased neural reactivation of visual regions associated with the witnessed event and decreased reactivation of auditory regions associated with the post-event information, suggesting that both pre- and post-warnings may encourage participants to engage in more effortful retrieval processes, such as source monitoring, during memory decisions10. In follow-up work, pre-warnings were also found to influence neural activity during the encoding of post-event information7.
Specifically, participants who received a pre-warning had greater activation in frontal regions associated with source encoding and conflict detection when sentences containing misinformation were presented compared to individuals who did not receive a pre-warning. These results align with the proposal that pre-warnings may increase memory accuracy by additionally modulating encoding-related processes during exposure to misinformation9.
While prior work indicates that pre- and post-warnings can be an effective tool to increase memory accuracy in the face of misinformation, important questions remain about how warnings affect metacognitive assessments of memory accuracy. Past research has found that when mock eyewitnesses were exposed to misleading post-event information and later remembered misinformation as part of the original event, they were as confident in their memory decisions about this misinformation as they were in their memory decisions about accurate information11. Thus, exposure to misleading post-event information results in poor calibration between participants’ memory accuracy and confidence. However, it has been suggested that warnings may improve the confidence-accuracy relationship and promote more effortful memory retrieval5,6,12,13, resulting in improved memory and metacognitive assessments of one’s memory. Alternatively, warned participants may view any information they remember from the post-event information as questionable, even if it was accurate, and thus lower their confidence when responding to those questions (i.e., tainted truth14).
To investigate these alternatives, the present study used the same eyewitness memory paradigm used by Karanian and colleagues6,7 to explore how different types of warnings influence memory accuracy as well as the calibration of eyewitness memory accuracy and confidence. In Experiment 1, participants were given a pre-warning, a post-warning, or no warning. We predicted that receiving a warning before or after presentation of post-event information would result in higher final memory test accuracy and lower misinformation selection compared to not receiving a warning. Importantly, we also predicted that both pre- and post-warnings would result in better calibration between memory accuracy and confidence on final memory test questions relating to misleading information.
In Experiment 2, we added an additional condition that introduced two warnings (one before the post-event information and one after the post-event information, i.e., repeated warning). To date, no study has examined the degree to which the frequency of warning impacts memory accuracy and metacognitive assessments of memory accuracy. We hypothesized that providing participants with a warning both before and after the post-event information would further increase memory accuracy and decrease misinformation selection by promoting both encoding and retrieval strategies attributed to pre- and post-warnings in prior research. However, repeatedly warning participants could also lower participants’ confidence in their responses, resulting in worse calibration between memory accuracy and confidence.
Experiment 1
Results
Initial memory test
We conducted a 3 (trial type: misleading, consistent, neutral) x 3 (warning group: no warning, pre-warning, post-warning) ANOVA on initial test memory accuracy. Consistent with past research, there was no significant main effect of trial type (F [2,264] = 1.63, p = .187, \(\eta_p^2\) = 0.01), warning group (F [2,132] = 0.09, p = .912, \(\eta_p^2\) = 0.001), or interaction (F [4,264] = 0.63, p = .641, \(\eta_p^2\) = 0.01). The average accuracy for the initial memory test was 0.68 (SD = 0.47). The average spontaneous selection of the misleading lure was 0.17 (SD = 0.38) and the foil lures was 0.15 (SD = 0.35).
Final memory test
Accuracy
We conducted a 3 (trial type: misleading, consistent, neutral) x 3 (warning group: no warning, pre-warning, post-warning) ANOVA on accuracy on the final memory test (see Fig. 1). Correct recognition of witnessed event details depended on trial type, F [2,264] = 122.70, p < .001, \(\eta_p^2\) = 0.48. Accuracy on misleading trials (M = 0.49) was significantly lower than on consistent trials (M = 0.82), t [268] = -14.91, p < .001*, d = 1.46, and neutral trials (M = 0.67), t [268] = -8.13, p < .001*, d = 0.77. Accuracy on consistent trials was also significantly greater than on neutral trials (t [268] = 6.78, p < .0001*, d = 0.78). Memory accuracy did not significantly differ according to warning group (F [2,132] = 0.53, p = .589, \(\eta_p^2\) = 0.01). Importantly, there was a significant interaction (F [4,264] = 7.73, p < .001, \(\eta_p^2\) = 0.10). Warnings significantly impacted memory accuracy on misleading trials (F [2,132] = 6.91, p = .001, \(\eta_p^2\) = 0.09), but did not impact accuracy on consistent trials (F [2,134] = 1.56, p = .214, \(\eta_p^2\) = 0.02) or neutral trials (F [2,134] = 0.33, p = .719, \(\eta_p^2\) = 0.01; Fig. 1). Memory accuracy for misleading trials was greater in both the pre-warning group (M = 0.53; t [132] = 3.06, p = .008*, d = 0.65) and post-warning group (M = 0.55; t [132] = 3.36, p = .003*, d = 0.71) compared to the no warning group (M = 0.37). There was no difference between the two warning groups (t [132] = -0.27, p = .961*, d = -0.06).
Accuracy results from the final memory test. “Proportion Correct” refers to the proportion of trials (consistent, neutral, misleading) that were answered correctly (i.e., the number of trials in which participants selected the correct details from the witnessed event divided by the total number of trials associated with each trial type). Error bars indicate standard error.
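The partial eta squared values reported for the accuracy ANOVA can be cross-checked from each F statistic and its degrees of freedom, using the standard identity \(\eta_p^2 = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error})\). The following is a minimal sketch of that check, not the authors' analysis code; the function name `partial_eta_squared` is introduced here for illustration only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F statistics reported above for the Experiment 1 final-test accuracy ANOVA.
trial_type_effect = partial_eta_squared(122.70, 2, 264)  # main effect of trial type
interaction_effect = partial_eta_squared(7.73, 4, 264)   # trial type x warning group

print(f"trial type: {trial_type_effect:.2f}, interaction: {interaction_effect:.2f}")
```

Both values round to the effect sizes given in the text (0.48 and 0.10), which suggests the reported F statistics and effect sizes are mutually consistent.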
Misinformation selection
We conducted a 3 (trial type: misleading, consistent, neutral) x 3 (warning group: no warning, pre-warning, post-warning) ANOVA on misinformation selection on the final memory test. Misinformation selection was also dependent upon trial type (see Fig. 2; F [2, 264] = 126.41, p < .001, \(\eta_p^2\) = 0.49). As expected, misinformation selection was significantly higher on misleading trials (M = 0.43) compared to consistent trials (M = 0.09; t [268] = 14.61, p < .001*, d = 1.61) and neutral trials (M = 0.19; t [268] = 10.29, p < .001*, d = 1.10). Further, misinformation selection was significantly lower for consistent trials compared to neutral trials (t [268] = -4.32, p = .001*, d = -0.77). Warning group did not significantly affect misinformation selection (F [2,132] = 3.03, p = .051, \(\eta_p^2\) = 0.04). However, there was a significant interaction between trial type and warning group (F [4, 264] = 9.18, p < .001, \(\eta_p^2\) = 0.12; Fig. 2). Warning group significantly impacted misinformation selection on questions relating to misinformation (F [2, 132] = 10.08, p < .001, \(\eta_p^2\) = 0.13). Specifically, misinformation selection on misleading trials was significantly lower in both the pre-warning group (M = 0.37; t [132] = -3.71, p < .001*, d = 0.76) and post-warning group (M = 0.35; t [132] = -4.05, p < .001*, d = 0.85) compared to the no warning group (M = 0.58). The difference in misinformation selection on misleading trials between the post-warning and pre-warning conditions was not significant (t [132] = 0.291, p = .954*, d = 0.06). Warning group did not significantly impact misinformation selection on questions relating to neutral (F [2, 132] = 0.90, p = .411, \(\eta_p^2\) = 0.01) or consistent information (F [2, 132] = 1.19, p = .309, \(\eta_p^2\) = 0.02).
Misinformation selection results from the final memory test. Misinformation Selection refers to the proportion of trials (consistent, neutral, misleading) that were answered with misinformation from the post-event information (i.e., the number of trials in which participants selected the misinformation from the post-event information on the final memory test divided by the total number of trials associated with each trial type). Error bars indicate standard error.
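The two dependent measures defined in the figure captions, proportion correct and misinformation selection, are both per-trial-type proportions of a particular response option. The sketch below illustrates that scoring for a single participant. The response records and the option labels ("correct", "misinfo", "foil") are hypothetical and chosen only for illustration; they are not data from the study.

```python
from collections import defaultdict

# Hypothetical final-test responses for one participant: (trial_type, selected_option).
# "correct" = witnessed detail, "misinfo" = misleading lure, "foil" = another lure.
responses = [
    ("misleading", "correct"), ("misleading", "misinfo"),
    ("misleading", "misinfo"), ("misleading", "foil"),
    ("consistent", "correct"), ("consistent", "correct"),
    ("neutral", "correct"), ("neutral", "foil"),
]


def proportion_by_trial_type(responses, option):
    """Proportion of trials of each type on which `option` was selected."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for trial_type, selected in responses:
        totals[trial_type] += 1
        if selected == option:
            counts[trial_type] += 1
    return {trial_type: counts[trial_type] / totals[trial_type] for trial_type in totals}


proportion_correct = proportion_by_trial_type(responses, "correct")
misinformation_selection = proportion_by_trial_type(responses, "misinfo")
print(proportion_correct)
print(misinformation_selection)
```

Because the final test is four-alternative forced choice, the two measures are not complements of each other: a participant can avoid the misinformation and still answer incorrectly by choosing a foil.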
Calibration error
Calibration is the comparison of confidence levels with the corresponding frequency of accurate answers within those confidence levels (0-100 in units of 10). Calibration error ranges from 0 to 1, with 0 corresponding to complete calibration between accuracy and confidence and 1 corresponding to being completely uncalibrated. We conducted a 3 (trial type: misleading, consistent, neutral) x 3 (warning group: no warning, pre-warning, post-warning) ANOVA on calibration error (see Fig. 3). There was a main effect of warning group (F [2,132] = 3.12, p = .047, \(\eta_p^2\) = 0.05). Participants who received no warning had significantly higher calibration error (M = 0.24) compared to participants who received a post-warning (M = 0.20, t [132] = 2.47, p = .04*, d = 0.29). There was no difference in calibration error between no warning and pre-warning (M = 0.21, t [132] = 1.55, p = .270*, d = 0.17) nor between pre-warning and post-warning (t [132] = 0.90, p = .643*, d = 0.12).
Calibration error across warning groups and trial types. Calibration error represents how closely each participant’s confidence tracks their accuracy. The y-axis is calibration error on a scale from 0 to 1, with 0 representing no calibration error (perfect calibration) and 1 representing maximal calibration error.
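The text defines calibration error only verbally, as a comparison of each confidence level with the proportion of correct answers at that level, bounded between 0 and 1. One common way to compute such an index is a weighted mean of squared differences between confidence (rescaled to 0-1) and accuracy within each confidence bin. The sketch below assumes that formula; the source does not state the exact computation used, and the function name and example data are hypothetical.

```python
from collections import defaultdict


def calibration_error(trials):
    """Weighted squared gap between confidence and accuracy across confidence bins.

    trials: iterable of (confidence, is_correct) pairs, with confidence on the
    0-100 scale in steps of 10. Assumed formula (not confirmed by the source):
    sum_j n_j * (c_j - a_j)^2 / N, where c_j is the confidence level rescaled
    to 0-1, a_j is the proportion correct at that level, and n_j is the number
    of trials at that level.
    """
    bins = defaultdict(list)
    for confidence, is_correct in trials:
        bins[confidence].append(1.0 if is_correct else 0.0)
    n_total = sum(len(outcomes) for outcomes in bins.values())
    if n_total == 0:
        raise ValueError("at least one trial is required")
    error = 0.0
    for confidence, outcomes in bins.items():
        accuracy = sum(outcomes) / len(outcomes)
        error += len(outcomes) * (confidence / 100 - accuracy) ** 2
    return error / n_total


# Hypothetical participants, for illustration only.
well_calibrated = [(100, True), (100, True), (50, True), (50, False), (0, False)]
overconfident = [(100, False), (100, False), (100, True), (100, False)]

print(calibration_error(well_calibrated))
print(calibration_error(overconfident))
```

Under this formula, a participant whose accuracy matches their stated confidence in every bin scores 0, and one who is fully confident but always wrong scores 1, which matches the range described in the text.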
There was a main effect of trial type on calibration error (F [2,264] = 76.46, p < .001, \(\eta_p^2\) = 0.37). Calibration error was higher on questions relating to misleading items (M = 0.31) compared to questions relating to consistent items (M = 0.14; t [268] = −11.72, p < .001*, d = −1.23) and questions relating to neutral items (M = 0.20; t [268] = −7.47, p < .001*, d = −0.76). Calibration error was also higher on neutral items compared to consistent items (t [268] = −4.25, p < .001*, d = −0.60).
There was a significant interaction between trial type and warning group (F [4,264] = 6.37, p < .001, \(\eta_p^2\) = 0.09). Warning group significantly impacted calibration error on misleading trials (F [2,132] = 8.37, p < .001, \(\eta_p^2\) = 0.11). Specifically, participants who did not receive a warning had higher calibration error on misleading trials (M = 0.39) compared to participants who received a post-warning (M = 0.26; t [132] = 3.73, p < .001*, d = 0.81) or a pre-warning (M = 0.27; t [132] = 3.28, p = .004*, d = 0.63). There was no significant difference in calibration between participants who received a pre-warning and those who received a post-warning (t [132] = −0.42, p = .909*, d = −0.09). Warning group did not impact calibration error on consistent (F [2,132] = 1.52, p = .223, \(\eta_p^2\) = 0.02) or neutral trials (F [2,132] = 0.98, p = .377, \(\eta_p^2\) = 0.01).
Experiment 1 discussion
Experiment 1 is an important replication and extension of prior research into the benefits of warning participants about the threat of misinformation on eyewitness memory6. First, we found that pre-warnings and post-warnings significantly increase eyewitness memory accuracy and decrease misinformation selection compared to when no warning is given. In addition, we found that warnings influence the relationship between memory accuracy and confidence. Overall, there was higher calibration error on final test questions that were inaccurately described in the Post-Event Information narrative (misleading trials) compared to those that were not inaccurately described (consistent or neutral trials)12,13,15. However, when participants received either type of warning, this calibration error on misleading trials was reduced. Together, these results suggest that both pre- and post-warnings mitigate the negative effect of misinformation on memory and improve metacognitive assessments of memory.
Experiment 2
Experiment 2 expanded the findings of Experiment 1 by examining the impact of a repeated warning (i.e., before and after the post-event information) on eyewitness memory accuracy and metacognitive assessment of memory accuracy.
Results
Initial memory test
We conducted a 3 (trial type: misleading, consistent, neutral) x 4 (warning group: no warning, pre-warning, post-warning, repeated warning) ANOVA on initial test memory accuracy. Consistent with past research, we did not find a significant effect of trial type (F [2,376] = 0.796, p = .452, \(\eta_p^2\) = 0.004), warning group (F [3,188] = 0.14, p = .452, \(\eta_p^2\) = 0.002), or interaction (F [6,376] = 1.63, p = .138, \(\eta_p^2\) = 0.03). The average accuracy for the initial memory test was 0.68 (SD = 0.46). The average spontaneous selection of the misleading lure was 0.17 (SD = 0.38) and the foil lures was 0.14 (SD = 0.35).
Final memory test
Accuracy
We conducted a 3 (trial type: misleading, consistent, neutral) x 4 (warning group: no warning, pre-warning, post-warning, repeated warning) ANOVA on accuracy on the final memory test. Correct recognition of witnessed event details depended on trial type, F [2,376] = 136.53, p < .001, \(\eta_p^2\) = 0.42. Accuracy on misleading trials (M = 0.51) was significantly lower than on consistent trials (M = 0.80), t [382] = -15.88, p < .001*, d = 1.34, and neutral trials (M = 0.69), t [382] = -9.80, p < .001*, d = 0.76. Accuracy on consistent trials was also significantly greater than on neutral trials (t [382] = 6.07, p < .001*, d = 0.58). Memory accuracy did not significantly differ according to warning group (F [3,188] = 0.159, p = .924, \(\eta_p^2\) = 0.003). Importantly, there was a significant interaction between trial type and warning group (F [6,376] = 5.08, p < .001, \(\eta_p^2\) = 0.07). Warnings impacted memory accuracy on misleading trials (F [3,188] = 3.77, p = .011, \(\eta_p^2\) = 0.06; Fig. 4). There was no effect of warning on consistent trial accuracy (F [3,188] = 2.03, p = .111, \(\eta_p^2\) = 0.03) or neutral trial accuracy (F [3,188] = 0.53, p = .663, \(\eta_p^2\) = 0.01). Memory accuracy on misleading trials was greater in the pre-warning group (M = 0.55; t [188] = 2.74, p = .040*, d = 0.58) and repeated warning group (M = 0.56; t [188] = 2.99, p = .017*, d = 0.62) compared to the no warning group (M = 0.41). Memory accuracy on misleading trials was numerically greater in the post-warning group compared to the no warning group, but this difference was not statistically significant (M = 0.52; t [188] = -2.38, p = .083*, d = 0.48).
There was no significant difference between the post-warning and pre-warning group (t [188] = 0.34, p = .986*, d = 0.07), the pre-warning group and repeated warning group (t [188] = -0.23, p = .995*, d = -0.05), and post-warning group and repeated warning group (t [188] = -0.57, p = .940*, d = -0.11).
Accuracy results from the final memory test. “Proportion Correct” refers to the proportion of trials (consistent, neutral, misleading) that were answered correctly (i.e., the number of trials in which participants selected the correct details from the witnessed event divided by the total number of trials associated with each trial type). Error bars indicate standard error.
Misinformation selection
We conducted a 3 (trial type: misleading, consistent, neutral) x 4 (warning group: no warning, pre-warning, post-warning, repeated warning) ANOVA on misinformation selection on the final memory test. Misinformation selection on the final memory test was dependent upon trial type (F [2,376] = 150.67, p < .001, \(\eta_p^2\) = 0.44). As expected, misinformation selection was significantly higher on misleading trials (M = 0.40) compared to consistent trials (M = 0.11; t [382] = 15.98, p < .001*, d = 0.70) and neutral trials (M = 0.17; t [382] = 12.38, p < .001*, d = 0.51). Further, misinformation selection was significantly lower for consistent trials compared to neutral trials (t [382] = −3.60, p = .001*, d = −0.19). Warning group did not significantly affect misinformation selection (F [3,188] = 1.58, p = .196, \(\eta_p^2\) = 0.02).
There was a significant interaction between trial type and warning group (F [6, 376] = 5.56, p = .0002, \(\eta_p^2\) = 0.08; Fig. 5). Warning group significantly impacted misinformation selection on questions relating to misinformation (F [3,188] = 5.46, p = .001, \(\eta_p^2\) = 0.08) and questions relating to consistent information (F [3,188] = 3.78, p = .01, \(\eta_p^2\) = 0.06). There was no effect of warning on misinformation selection on neutral trials (F [3,188] = 0.27, p = .848, \(\eta_p^2\) = 0.004). Misinformation selection on misleading trials was significantly reduced in both the pre-warning group (M = 0.36; t [188] = −3.09, p = .012*, d = 0.64) and repeated warning group (M = 0.32; t [188] = −3.81, p = .001*, d = 0.79) compared to the no warning group (M = 0.52). Misinformation selection was numerically reduced in the post-warning group (M = 0.39), but this difference was not statistically significant (t [188] = 2.40, p = .081*, d = 0.48). There was no difference in misinformation selection on misleading trials between the post-warning and pre-warning groups (t [188] = −0.67, p = .907*, d = −0.14), the pre-warning and repeated warning groups (t [188] = 0.70, p = .896*, d = 0.15), or the post-warning and repeated warning groups (t [188] = 1.38, p = .515*, d = 0.28).
Misinformation selection results from the final memory test. Misinformation Selection refers to the proportion of trials (consistent, neutral, misleading) that were answered with misinformation from the post-event information (i.e., the number of trials in which participants selected the misinformation from the post-event information on the final memory test divided by the total number of trials associated with each trial type). Error bars indicate standard error.
On consistent trials, participants in the repeated warning condition (M = 0.15) selected significantly more misinformation compared to the no warning condition (M = 0.08; t [188] = 3.31, p = .006*, d = 0.68). There were no other significant differences between the post-warning and pre-warning groups (t [188] = 0.02, p = 1.000*, d = 0.004), the pre-warning and repeated warning groups (t [188] = − 2.08, p = .165*, d = − 0.397), or the post-warning and repeated warning groups (t [188] = − 2.09, p = .162*, d = − 0.394) for misinformation selection on consistent trials.
Calibration error
We conducted a 3 (trial type: misleading, consistent, neutral) x 4 (warning group: no warning, pre-warning, post-warning, repeated warning) ANOVA on calibration error (see Fig. 6). There was a main effect of trial type (F [2,376] = 79.80, p < .001, \(\eta_p^2\) = 0.30). Participants had significantly worse calibration on misleading trials (M = 0.24) compared to neutral trials (M = 0.18; t [382] = 8.86, p < .001*, d = 0.82) and consistent trials (M = 0.15; t [382] = 11.64, p < .001, d = 1.06). Participants also had worse calibration on neutral trials compared to consistent trials (t [382] = 2.78, p = .016, d = 0.35). There was no main effect of warning group on calibration (F [3,188] = 1.30, p = .276, \(\eta_p^2\) = 0.02).
Calibration error across warning groups and trial types. Calibration error represents how closely each participant’s confidence tracks their accuracy. The y-axis is calibration error on a scale from 0 to 1, with 0 representing no calibration error (perfect calibration) and 1 representing maximal calibration error.
Importantly, there was a significant interaction between trial type and warning group (F [6,376] = 6.08, p < .001, \(\eta_p^2\) = 0.09). Warning group significantly impacted calibration error on misleading trials (F [3,188] = 5.74, p = .001, \(\eta_p^2\) = 0.08) and consistent trials (F [3,188] = 3.75, p = .01, \(\eta_p^2\) = 0.06). Warning group was not found to impact calibration error on neutral trials (F [3,188] = 0.95, p = .419, \(\eta_p^2\) = 0.01). For misleading trials, the no warning group had higher calibration error (M = 0.37) compared to participants who received a post-warning (M = 0.26; t [188] = 3.46, p = .004*, d = 0.66), a pre-warning (M = 0.26; t [188] = 3.43, p = .004*, d = 0.68), or repeated warning (M = 0.27; t [188] = 3.25, p = .008*, d = 0.63). There was no significant difference in calibration error for misleading trials between the post-warning and pre-warning groups (t [188] = −0.05, p = 1.000*, d = −0.01), the pre-warning and repeated warning groups (t [188] = 0.18, p = .998*, d = 0.04), and the post-warning and repeated warning groups (t [188] = 0.23, p = .996*, d = 0.05). Participants who received a repeated warning had significantly higher calibration error on questions associated with consistent trials (M = 0.18) compared to participants in the pre-warning condition (M = 0.12; t [188] = 3.18, p = .010*, d = 0.65). There were no other significant differences in calibration error on consistent trials between the post-warning and pre-warning groups (t [188] = 1.00, p = .749*, d = 0.20), the pre-warning and repeated warning groups (t [188] = 1.81, p = .273*, d = 0.37), and the post-warning and repeated warning groups (t [188] = 0.79, p = .857*, d = 0.14).
Experiment 2 discussion
We found that warnings influenced both memory accuracy and metacognitive assessments of memory accuracy. In terms of accuracy, pre-warnings and repeated warnings significantly improved memory accuracy on questions relating to misleading post-event information and reduced misinformation selection compared to the no warning condition. Interestingly, post-warning only numerically improved memory accuracy and reduced misinformation selection. A sensitivity analysis did not suggest that the analysis was underpowered, and an analysis of outliers (z-score > |3|) did not suggest that any extreme data points were skewing the data.
To further assess the effects of post-warning on memory accuracy and misinformation production within our two experiments we conducted a mega-analysis16. Mega-analysis, also known as a meta-analysis of individual participant data, pools the raw data from multiple studies that were conducted under comparable conditions. This form of analysis was deemed appropriate due to the identical procedure experienced by the no warning, pre-warning, and post-warning conditions in Experiment 1 and Experiment 2 (for complete analysis see Supplemental Materials). The mega-analysis of memory accuracy and misinformation selection replicated the findings from Experiment 1 and prior research5,6,8,17 demonstrating that post-warning and pre-warning resulted in significantly higher memory accuracy and lower misinformation selection compared to a no warning condition on the final memory test.
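The pooling step that defines a mega-analysis can be illustrated with a short sketch: participant-level records from both experiments are combined, restricted to the conditions the experiments share, and tagged with their study of origin. This shows the pooling idea only; the records below are hypothetical, and the actual statistical model is reported in the Supplemental Materials and is not reproduced here.

```python
from statistics import mean

# Hypothetical per-participant accuracy on misleading trials (not study data).
experiment_1 = [
    {"group": "no warning", "accuracy": 0.25},
    {"group": "no warning", "accuracy": 0.50},
    {"group": "pre-warning", "accuracy": 0.50},
    {"group": "post-warning", "accuracy": 0.75},
]
experiment_2 = [
    {"group": "no warning", "accuracy": 0.25},
    {"group": "pre-warning", "accuracy": 0.75},
    {"group": "post-warning", "accuracy": 0.50},
    {"group": "repeated warning", "accuracy": 0.50},
]

# Only conditions run with an identical procedure in both experiments are pooled.
shared_groups = {"no warning", "pre-warning", "post-warning"}

pooled = [
    {**record, "study": study}
    for study, records in (("Experiment 1", experiment_1), ("Experiment 2", experiment_2))
    for record in records
    if record["group"] in shared_groups
]

# Descriptive group means on the pooled raw data.
group_means = {
    group: mean(r["accuracy"] for r in pooled if r["group"] == group)
    for group in sorted(shared_groups)
}
print(group_means)
```

Keeping the `study` label on each record allows a later model to include experiment as a factor, so that any difference between the two samples is not mistaken for a warning effect.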
In terms of metacognitive accuracy, post-warning, pre-warning, and repeated warning all resulted in better calibration between memory accuracy and confidence on misleading questions compared to the no warning condition. However, the repeated warning group also demonstrated significantly worse calibration on consistent questions on the final memory test compared to the no warning group. This suggests that while warnings benefit memory accuracy and calibration on final test questions relating to misinformation, there may be a cost to repeatedly warning an eyewitness with respect to confidence on consistent trials.
General discussion
The present study investigated the effect of warning on eyewitness memory accuracy and confidence within a repeated testing misinformation paradigm. Consistent with Karanian and colleagues6, we found that pre-warning is as effective as post-warning at reducing the misinformation effect. Past research suggested pre-warning may increase the ability to distinguish between different sources of remembered information by influencing attention processes during encoding7,9 or by promoting more effortful memory retrieval6. Post-warnings, on the other hand, likely increase protection from misinformation by acting only on retrieval processes, for example by increasing source monitoring during retrieval5,6,8. Despite some potential differences in their underlying mechanisms, the current study confirmed that both pre- and post-warnings improve memory accuracy to a similar extent in the face of misinformation.
Importantly, in addition to improving memory accuracy, we found that pre- and post-warnings also significantly improved the calibration between memory accuracy and confidence on questions relating to misinformation. One explanation for these improved metacognitive judgements is that pre- and post-warnings improve participants’ ability to distinguish between memory sources, resulting in greater accuracy and greater confidence in their responses. Indeed, one study found a post-warning can improve both memory accuracy and result in a better calibrated confidence-accuracy relationship in an eyewitness memory paradigm12. The present findings build upon prior work suggesting that warnings improve memory accuracy by encouraging source monitoring at retrieval6 and suggest that warnings also result in participants making more accurate metacognitive assessments of their memory when rating their confidence18.
In Experiment 2, repeated warnings also significantly improved memory accuracy and confidence-accuracy calibration on questions relating to misinformation. Interestingly, although warnings both before and after post-event information could theoretically promote both encoding and retrieval strategies that benefit memory accuracy, repeated warnings did not have a significantly greater impact on memory accuracy in the face of misinformation compared to a single warning. However, repeated warnings may have a greater benefit for memory accuracy with longer retention intervals before testing, when the effectiveness of single warnings is diminished. Indeed, recent research using a repeated retrieval paradigm found that when there is a delay of 48 h between post-event information exposure and a memory test, the protective effect of post-warnings prior to testing is reduced12. Repeated warnings may remain effective against misinformation in such scenarios given that they would encourage both encoding-based (e.g., conflict detection7) and retrieval-based strategies (e.g., source monitoring10). Of note, the impact of warning will likely depend on the presence of an initial memory test. Indeed, prior work without a memory test prior to misinformation exposure found a post-warning effect when the warning was presented a month after misinformation exposure17. Therefore, future work exploring the effectiveness of repeated warnings compared to single warnings needs to consider the role of repeated memory retrieval in the effectiveness of warnings.
In addition to improving memory accuracy in the face of misinformation, repeated warnings also reduced calibration errors between accuracy and confidence on misleading trials and did so to a similar extent as single (pre or post) warnings. Yet, repeated warnings negatively impacted the relationship between confidence and accuracy on consistent trials. Repeated warnings may lead participants to view any detail from the post-event information, even details consistent with the original event, with skepticism (i.e., tainted truth14), resulting in lower confidence. These results suggest that repeated warnings may need to be used with caution in the context of real-world witnesses, given that lower eyewitness confidence may reduce the perceived accuracy of correct information given in their testimony.
Experiment 1 method
Participants
We recruited 167 participants from Mechanical Turk (MTurk). Prior work6 established that a sample size of 81 was needed to appropriately power an analysis of eyewitness memory accuracy with this paradigm, but because data were collected online in Experiment 1, we increased the sample size to approximately 50 participants per condition. All participants were required to have a “Masters” qualification from MTurk, which is awarded to workers who demonstrate superior performance across thousands of HITs. Sixteen participants were removed before data screening for taking the study more than once. Eight participants were removed from the study due to failure to watch or listen to the entire crime video and/or audio narrative, respectively. Two participants were excluded for being older than 65, which was outside the range we specified for data collection (18–65). Finally, six participants were removed during data screening for scoring below chance on the initial memory test (< 25%).
After exclusions, there were 135 participants included in the analysis (n = 45 per warning group). Participants on average were 43 years old (SD = 10.75), and approximately half (54%) the sample identified as female. Participants identified as white (n = 105), Black or African American (n = 12), Asian or Asian American (n = 7), Native American or Native Alaskan (n = 4), and other or multiple racial identities (n = 7). Participants provided written informed consent in accordance with the experimental procedures of the Institutional Review Board at Fairfield University (protocol number 3781), and they were compensated $8 per hour.
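The participant flow reported above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts given in the text; the variable names and dictionary labels are illustrative and are not taken from the authors' analysis code.

```python
# Participant flow for Experiment 1, using only counts reported in the text.
recruited = 167

# Exclusion reasons and counts as described in the Participants section.
exclusions = {
    "took the study more than once": 16,
    "did not finish the video and/or audio narrative": 8,
    "older than 65": 2,
    "below chance on the initial memory test": 6,
}

# Experiment 1 had three warning groups: no warning, pre-warning, post-warning.
warning_groups = 3

final_sample = recruited - sum(exclusions.values())
per_group = final_sample / warning_groups

print(f"Final sample: {final_sample}")                 # 135
print(f"Participants per warning group: {per_group:.0f}")  # 45
```

The result matches the reported final sample of 135 and the reported n = 45 per warning group.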
A sensitivity analysis was performed after data collection to assess if the analyses were appropriately powered19. Using G*Power, we calculated minimum effect sizes at a power of 0.8 for the between-participants factor of warning group (\(\eta_p^2\) = 0.031), the within-participants factor of trial type (\(\eta_p^2\) = 0.009), and the within-between interaction of group and trial type (\(\eta_p^2\) = 0.011). We calculated these minimum detectable effect sizes with a sample size of 135. Our results had effect sizes that were larger than the minimum detectable effect size, which were all small-to-medium effects.
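For readers who want to relate these minimum detectable effects to Cohen's f, the effect size metric that G*Power works with for ANOVA designs, the standard conversion is f = sqrt(η²p / (1 − η²p)). The sketch below applies that conversion to the three values reported above. It is an illustration only: the function name is hypothetical, and the text does not state which G*Power options the authors used, so this does not reproduce their sensitivity analysis.

```python
from math import sqrt


def partial_eta_squared_to_f(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f using f = sqrt(eta / (1 - eta))."""
    if not 0 <= eta_p2 < 1:
        raise ValueError("partial eta squared must be in [0, 1)")
    return sqrt(eta_p2 / (1 - eta_p2))


# Minimum detectable effects reported for Experiment 1 (labels are illustrative).
minimum_effects = {
    "warning group (between)": 0.031,
    "trial type (within)": 0.009,
    "group x trial type (interaction)": 0.011,
}

for label, eta in minimum_effects.items():
    f_value = partial_eta_squared_to_f(eta)
    print(f"{label}: eta_p^2 = {eta:.3f}, f = {f_value:.3f}")
```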
Materials
Experiment 1 was a conceptual replication of a prior behavioral study by Karanian and colleagues6 in a larger online sample where participants read the warnings instead of having the experimenters read the warnings to the participants. Experiment 1 further extended prior work through analyzing the impact of warnings on metacognitive assessments of memory (i.e., calibration error). Experiment 2 extends the findings of Experiment 1 and prior work by exploring the impact of repeated warnings on memory accuracy and metacognitive assessments of memory. The materials include notes of how materials were adapted from prior work6 to be administered online. All materials and experimental protocol for Experiment 1 and Experiment 2 were approved by the Institutional Review Board at Fairfield University (protocol number 3781).
Witnessed event (video)
The witnessed event was a 22-minute excerpt from the black and white silent film “Rififi”20. The clip portrayed four men committing a burglary in the middle of the night. No participant reported seeing this movie before.
Initial memory test
The initial memory test occurred immediately after the witnessed event. Twenty-four questions about specific details from the witnessed event were constructed as stimuli for the initial memory test. An example of one of these questions is, “What do the men hang for privacy?”. Each multiple-choice question relating to the witnessed event appeared in the center of the screen, with four answer choices below it. After each question, participants were asked to rate their confidence in the answer they just gave on a scale of 0–100 (in increments of 10), where 0 corresponded to “not confident at all” and 100 corresponded to “completely confident”.
Filler task
The filler task was two games of Sudoku for participants to complete at their own pace. There was a timer counting down 10 min at the top of the web page. Participants would be automatically moved to the next section of the experiment after the 10 min were up. Even if participants finished the Sudoku puzzles early, they would not be moved to the next section until the 10-minute timer expired. This filler task ensured the paradigm mirrored the timescale in previous work by Karanian and colleagues6.
Post-event information (audio narrative)
The auditory post-event information was approximately 6 min long and consisted of 115 sentences spoken at a rate of 135 to 160 words per minute. There were 24 critical sentences. Specifically, eight sentences contained consistent information, eight sentences contained neutral information, and eight sentences contained misleading information. Consistent phrases contained details that were accurate regarding the witnessed event (e.g., “The number of men rolling up the rug is three”). Neutral phrases included details from the witnessed event in which the critical detail neither supported nor contradicted what was shown in the witnessed event (e.g., “The number of men rolling up the rug is a few”). Misleading phrases included details from the witnessed event that had been changed in the post-event information (e.g., “The number of men rolling up the rug is two”).
Final memory test
The final memory test was largely identical to the initial memory test: all questions and possible answer choices were the same on both tests. However, we randomized the order of the response options for each multiple-choice question, so that the order of answer choices differed between the two tests.
Procedure
Participants first indicated whether they were accessing the study from a laptop or desktop computer with stable internet. Because the study is compatible only with non-mobile devices, those who answered “no” were told to switch to a non-mobile device to take part in the study. After providing consent, participants confirmed they were within the United States. Participants were told they would watch a 22-minute video and were asked to adjust their computer volume before starting the video. All participants were instructed to watch the witnessed event in one sitting and to watch it carefully, as they might have a memory test about it later. Participants had the option to advance to the next screen whenever they chose. If they advanced before the 22-minute mark, their data were removed from the analyses.
After watching the witnessed event, participants were given instructions about the initial memory test. They were told to answer each question about the witnessed event and then to rate their confidence in each answer. If they did not know the answer, they were told to make their best guess, even if they had no idea. Participants were unable to advance to the next question if the current question was unanswered. Participants answered each question at their own pace. Following the initial test, participants were given instructions on how to play Sudoku. All participants were given 10 min to play two games of Sudoku. After 10 min, participants were automatically advanced to the next part of the experiment. Participants were not able to advance to the next phase before the 10 min were up, even if they completed both puzzles.
Participants then listened to the post-event information, which was in the form of an audio narrative. Before listening to the post-event information, those in the pre-warning group received a warning in written form. They read: “You will have to answer questions regarding the video you previously watched for a second time. We will play a narrative of that video; however, we are uncertain as to the source of the narrative. Therefore, we were unable to verify the accuracy of the narrative. As such, base your answers only on what you saw in the video, and not on what you hear in the narrative.” The post-event information phase was approximately six minutes in length.
Following the post-event information, participants were given instructions for the final memory test. Those in the post-warning group also received a warning in written form. They read: “You will have to answer questions regarding the video you previously watched for a second time. We just played a narrative of that video; however, we are uncertain as to the source of the narrative. Therefore, we were unable to verify the accuracy of the narrative. As such, base your answer only on what you saw in the video, and not on what you heard in the narrative.” After the final memory test, participants filled out the post-retrieval survey and were given the option to provide additional written explanations of their answers. Participants clicked to select their answer and then typed out a response on the keyboard, instead of writing their responses by hand.
Finally, participants provided their age, gender, race/ethnicity, level of English proficiency, and how many psychology classes they have taken (in college or otherwise). Participants were debriefed and asked to provide their MTurk ID to receive compensation.
Measures
See the Supplemental Materials for a signal detection-based analysis of eyewitness discriminability. In that analysis, proportion accuracy on the final memory test is scored as hits, while proportion misinformation selection is scored as false alarms.
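To make the hit and false-alarm scoring concrete, the sketch below computes d′, one common signal detection index of discriminability, as z(hit rate) − z(false-alarm rate). This is an assumption for illustration: the text here does not say which index the Supplemental Materials report, and the `floor` clamp used to avoid infinite z-scores at rates of 0 or 1 is a hypothetical choice, not the authors' correction.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float, floor: float = 0.01) -> float:
    """Return d' = z(hit rate) - z(false-alarm rate).

    Rates are clamped to [floor, 1 - floor] because the inverse normal CDF
    is undefined at exactly 0 and 1. The clamp value is an illustrative
    assumption, not a correction described in the article.
    """
    def clamp(p: float) -> float:
        return min(max(p, floor), 1 - floor)

    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return z(clamp(hit_rate)) - z(clamp(false_alarm_rate))


# Hypothetical participant: 75% accuracy (hits), 12.5% misinformation selection.
print(round(d_prime(0.75, 0.125), 2))
```

Under this scoring, a participant who selects the correct answer and the misleading foil equally often has d′ = 0, and larger values indicate better discrimination between the original event and the misinformation.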
Test accuracy
Initial test accuracy was calculated as the proportion of questions participants got correct on the initial memory test. Initial test confidence was calculated from the average confidence rating for each initial memory test question. Final test accuracy was also calculated as the proportion of questions that participants got correct on the final memory test. This accuracy was further investigated by trial type and by warning type.
Final test misinformation selection
Misinformation selection was calculated as the proportion of questions on which participants selected the misleading foil from the post-event information as being from the original event. Importantly, participants only encountered the misleading foil in the post-event information for the final memory test questions. We examine misinformation selection for all the trial types to check if misinformation selection on misleading trials is statistically greater than spontaneous misinformation selection on consistent and neutral trials.
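The accuracy and misinformation selection measures described above can be sketched as simple proportions per trial type. The record layout below (the keys `trial_type`, `chosen`, `correct_option`, and `misleading_foil`) is hypothetical and chosen for illustration; it is not the format of the authors' data files. The example responses reuse the rug item from the Materials section.

```python
from collections import defaultdict


def proportions_by_trial_type(responses):
    """Return accuracy and misinformation selection per trial type.

    Each response is a dict with the (hypothetical) keys:
    trial_type, chosen, correct_option, misleading_foil.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "misinfo": 0})
    for r in responses:
        c = counts[r["trial_type"]]
        c["n"] += 1
        c["correct"] += r["chosen"] == r["correct_option"]
        c["misinfo"] += r["chosen"] == r["misleading_foil"]
    return {
        trial_type: {
            "accuracy": c["correct"] / c["n"],
            "misinformation_selection": c["misinfo"] / c["n"],
        }
        for trial_type, c in counts.items()
    }


# Made-up responses for one participant (not real data).
example = [
    {"trial_type": "misleading", "chosen": "two",
     "correct_option": "three", "misleading_foil": "two"},
    {"trial_type": "misleading", "chosen": "three",
     "correct_option": "three", "misleading_foil": "two"},
    {"trial_type": "consistent", "chosen": "three",
     "correct_option": "three", "misleading_foil": "two"},
]
summary = proportions_by_trial_type(example)
print(summary)
```

Computing misinformation selection for consistent and neutral trials as well gives the spontaneous selection baseline against which misleading trials are compared.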
Final test confidence—accuracy relationship (calibration error)
We calculated calibration error to investigate the relationship between confidence and accuracy. The calibration error of a participant represents the absolute difference between actual accuracy and their predicted accuracy (i.e., confidence) as a function of the frequency of final memory test questions that received a specific confidence rating21,22,23. Perfect calibration occurs when predicted accuracy and actual accuracy are perfectly aligned (e.g., items that have an 80% confidence rating have an average of 80% accuracy). Importantly, a larger mismatch between confidence and actual accuracy would result in larger calibration error. The calibration analyses included all questions on the final memory test.
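One way to read the calibration error described above is as a frequency-weighted mean of the absolute gap between each confidence level and the accuracy of the items rated at that level. The sketch below implements that reading as an assumption; the exact formula used by the authors follows the cited calibration literature and may differ in detail (for example, some formulations square the gap rather than take its absolute value). The function and argument names are hypothetical.

```python
from collections import defaultdict


def calibration_error(confidences, correct):
    """Frequency-weighted mean absolute gap between confidence and accuracy.

    confidences: ratings on the 0-100 scale (0, 10, ..., 100), one per question
    correct: matching booleans, True when the answer was correct
    Returns a value between 0 (perfect calibration) and 1.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")

    # Group question outcomes by the confidence rating they received.
    outcomes_by_rating = defaultdict(list)
    for rating, is_correct in zip(confidences, correct):
        outcomes_by_rating[rating].append(1.0 if is_correct else 0.0)

    total = len(confidences)
    error = 0.0
    for rating, outcomes in outcomes_by_rating.items():
        accuracy = sum(outcomes) / len(outcomes)
        weight = len(outcomes) / total  # share of questions at this rating
        error += weight * abs(rating / 100 - accuracy)
    return error


# Five answers rated 80% confident, four of them correct: perfectly calibrated.
print(calibration_error([80] * 5, [True, True, True, True, False]))
```

For instance, a participant who rates two answers at 100 and gets only one correct, and rates two answers at 0 and gets both wrong, would have a calibration error of 0.25 under this reading.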
Only calibration error was analyzed due to length and focus of the current manuscript. Average confidence by condition and trial type is included in Table 1. Table 2 includes average accuracy by condition and trial type to compare to the confidence results.
All post-hoc pairwise comparisons were Tukey-corrected and are marked with the notation (*). Furthermore, the effect size Cohen’s d was calculated by dividing the difference between groups by the standard deviation24, which is deemed an appropriate standardized effect size for both repeated-measures and independent-groups designs25.
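As a minimal sketch of the effect size calculation, the function below divides the difference between two group means by their pooled standard deviation. Using the pooled standard deviation is an assumption based on the conventional definition of Cohen's d; the text does not specify which standard deviation was used, and the function name and example values are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation.

    The pooled SD is an assumed choice of standardizer for this sketch.
    Each group needs at least two observations.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Made-up accuracy scores for two groups (not real data).
print(cohens_d([1, 2, 3], [2, 3, 4]))  # -1.0
```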
Experiment 2 method
Participants
Participants were recruited from MTurk (n = 151) for $11.00 USD or the Fairfield University Psychology Participant Pool (n = 61) for course credit or a $10.00 USD Amazon gift card. As in Experiment 1, the goal was to collect approximately 50 participants per warning condition. All participants completed the study online, and participants recruited from MTurk had a “Masters” qualification. Fifteen participants were excluded for not completing the entirety of the crime video and/or audio narrative phase. An additional five participants were excluded for performance below chance (< 25%) on the initial memory test. The final sample was 192 participants from MTurk (n = 143) and the participant pool for either credit (n = 43) or monetary compensation (n = 6). Comparisons of in-person and online data collected for Experiment 2 can be found in the Supplemental Materials. The mean age of participants was 37 years old (SD = 13.87). Participants identified as female (n = 114), male (n = 74), non-binary (n = 2), or withheld this information (n = 2). Participants identified as white (n = 146), Black or African American (n = 17), Asian or Asian American (n = 18), Native American or Native Alaskan (n = 1), and other or multiple racial identities (n = 8). Participants provided written informed consent in accordance with the experimental procedures of the Institutional Review Board at Fairfield University (protocol number 3781).
We conducted a sensitivity analysis after data collection to assess if the analyses were appropriately powered19. Using G*Power, we calculated minimum effect sizes at a power of 0.8 for the between-participants factor of warning group (\(\eta_p^2\) = 0.027), the within-participants factor of trial type (\(\eta_p^2\) = 0.007), and the within-between interaction of group and trial type (\(\eta_p^2\) = 0.010). We calculated these minimum detectable effect sizes with a sample size of 188, given that G*Power assumes equal size groups and our lowest sample size per group was 47 (post-warning). Our results had effect sizes that were larger than the minimum detectable effect size, which were all small-to-medium effects.
Materials
The same materials from Experiment 1 were used in Experiment 2.
Procedure
The procedure in Experiment 2 was nearly identical to that of Experiment 1, with the exception of the warning groups. Specifically, in addition to the no warning, pre-warning, and post-warning groups, there was a fourth group that received two warnings (repeated warning group): one prior to the presentation of post-event information (using the same wording as the pre-warning), and the other directly following the post-event information and just prior to the final memory test (using the same wording as the post-warning).
Data availability
Neither study was pre-registered. The first author can be contacted to request access to all study materials. All data and analysis code can be found here: https://osf.io/wr56v/?view_only=ff7c0eedd96f47bb8e0081f02d47f6f9.
References
Loftus, E. F. & Elizabeth, F. in Loftus. In A History of Psychology In Autobiography. 199–227 (eds Lindzey, I. X. & Runyan, W. M.) (American Psychological Association, 2005). https://doi.org/10.1037/11571-006
Chan, J. C. K., Thomas, A. K. & Bulevich, J. B. Recalling a witnessed event increases eyewitness suggestibility: the reversed testing effect. Psychol. Sci. 20, 66–73 (2009).
Brewer, N. & Burke, A. Effects of testimonial inconsistencies and eyewitness confidence on mock-juror judgments. Law Hum. Behav. 26, 353–364 (2002).
Nelson, T. O. & Narens, L. Metamemory: A theoretical framework and new findings. In Psychology of Learning and Motivation vol. 26, 125–173 (Elsevier, 1990).
Thomas, A. K., Bulevich, J. B. & Chan, J. C. K. Testing promotes eyewitness accuracy with a warning: implications for retrieval enhanced suggestibility. J. Mem. Lang. 63, 149–157 (2010).
Karanian, J. M. et al. Protecting memory from misinformation: Warnings modulate cortical reinstatement during memory retrieval. Proc. Natl. Acad. Sci. 117, 22771–22779 (2020).
Karanian, J. M., Thomas, A. K. & Race, E. Warning before misinformation exposure modulates memory encoding. Cogn. Affect. Behav. Neurosci. https://doi.org/10.3758/s13415-024-01183-y (2024).
Blank, H. & Launay, C. How to protect eyewitness memory against the misinformation effect: A meta-analysis of post-warning studies. J. Appl. Res. Mem. Cogn. 3, 77–88 (2014).
Greene, E., Flynn, M. S. & Loftus, E. F. Inducing resistance to misleading information. J. Verbal Learn. Verbal Behav. 21, 207–219 (1982).
Johnson, M. K. Memory and reality. Am. Psychol. 61, 760–771 (2006).
Bonham, A. J. & González-Vallejo, C. Assessment of calibration for reconstructed eye-witness memories. Acta Psychol. (Amst). 131, 34–52 (2009).
Chan, J. C. K., O’Donnell, R. & Manley, K. D. Warning weakens retrieval-enhanced suggestibility only when it is given shortly after misinformation: the critical importance of timing. J. Exp. Psychol. Appl. 28, 694–716 (2022).
Higham, P. A., Blank, H. & Luna, K. Effects of postwarning specificity on memory performance and confidence in the eyewitness misinformation paradigm. J. Exp. Psychol. Appl. 23, 417–432 (2017).
Echterhoff, G., Groll, S. & Hirst, W. Tainted truth: overcorrection for misinformation influence on eyewitness memory. Soc. Cogn. 25, 367–409 (2007).
Berntsen, D. & Bohn, A. Remembering and forecasting: the relation. Mem. Cognit. 38, 265–278 (2010).
Eisenhauer, J. G. Meta-analysis and mega‐analysis: A simple introduction. Teach. Stat. 43, 21–27 (2021).
Oeberst, A. & Blank, H. Undoing suggestive influence on memory: the reversibility of the eyewitness misinformation effect. Cognition 125, 141–159 (2012).
Horry, R., Colton, L. M. & Williamson, P. Confidence–accuracy resolution in the misinformation paradigm is influenced by the availability of source cues. Acta Psychol. (Amst). 151, 164–173 (2014).
Giner-Sorolla, R. et al. Power to Detect What? Considerations for Planning and Evaluating Sample Size. Pers. Soc. Psychol. Rev. 28(3), 276–301. https://doi.org/10.1177/10888683241228328 (2024).
Rififi (Pathé-Consortium Cinéma, 1955).
Wong, J. T., Cramer, S. J. & Gallo, D. A. Age-related reduction of the confidence–accuracy relationship in episodic memory: effects of recollection quality and retrieval monitoring. Psychol. Aging. 27, 1053–1065 (2012).
Dodson, C. S., Bawa, S. & Krueger, L. E. Aging, metamemory, and high-confidence errors: A misrecollection account. Psychol. Aging. 22, 122–133 (2007).
Wagenaar, W. Calibration and the effects of knowledge and reconstruction in retrieval from memory. Cognition 277–296. https://doi.org/10.1016/0010-0277(88)90016-9 (1988).
Cohen, J. Statistical Power Analysis for the Behavioral Sciences (Academic, 1988).
Dunlap, W. P., Cortina, J. M., Vaslow, J. B. & Burke, M. J. Meta-analysis of experiments with matched groups or repeated measures designs. Psychol. Methods. 1, 170–177 (1996).
Author information
Contributions
McKinzey Torrance, Jessica Karanian, Elizabeth Race, and Ayanna Thomas contributed to the conception, interpretation, and writing of the manuscript and approve of the submitted version. McKinzey Torrance conducted all data analyses.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Torrance, M.G., Karanian, J.M., Race, E. et al. Examining the impact of warnings on eyewitness memory. Sci Rep 15, 33508 (2025). https://doi.org/10.1038/s41598-025-17377-4