Introduction

One night in May 2011, people heard screaming next door and decided to call the police. Upon the police’s arrival, 6-year-old Vicky opened the door and immediately stated: “My father was here. He just left. My father did it.” The police entered the house and found a woman lying on the floor in a pool of her own blood. When the ambulance arrived, the emergency physician confirmed that the woman had been stabbed to death. Again, Vicky incriminated her father. That same day, police investigators interviewed Vicky in a child-friendly studio, where she implicated her father once more. When the police questioned Vicky a second time 2 months later, she remained consistent in her testimony but also added some new details to the story. According to Vicky, she was in a bedroom when she heard a noise downstairs. She stated that when she went downstairs, she saw her father stabbing her mother with a knife. The father denied all charges and, hence, Vicky became a crucial witness in the legal case1,2.

Children’s testimony can become crucial when evidence such as photographs, videos, or audio tapes is lacking3. The central point of discussion is oftentimes whether the child’s testimony is accurate or whether it contains memory errors (e.g., distortions or false memories). Such memory errors may arise due to suggestive pressure and are classified as suggestion-induced false memories4. The source monitoring framework5 stipulates that memory can stem from one’s own experience, overhearing someone else’s story, imagination, or watching an event on TV. Thus, sources have different visual, auditory, and other sensory characteristics. Suggestion-induced false memories arise when individuals attribute an incorrect source to their memory (i.e., a source monitoring error). Young children are more prone to conform to suggestive pressure than older children and adults6. That is, young children are more likely to change their accounts and report false details than older children or adults when they encounter (i) repeated questioning, (ii) incorrect information, and (iii) questions that imply the expected answer7.

Based on findings on children’s susceptibility to different sorts of external influence, one might assume that children are more likely to create false memories than adults. However, there are specific conditions where false memories increase with age8,9,10. This developmental reversal phenomenon refers to the finding that young children produce fewer (spontaneous) false memories compared to older children and adults. This effect has often been demonstrated when using the Deese/Roediger-McDermott (DRM) paradigm11,12. In this paradigm, participants study wordlists that are associatively related to each other, for example tears, weep, sorrow. The critical lure word, cry, is missing from the list. A significant proportion of participants erroneously recall and recognize the critical lure in subsequent memory tests13. Participants with a history of childhood sexual abuse or diagnosed with post-traumatic stress disorder have shown heightened false memory levels when the words were associated with the experienced trauma14,15. Though the relevance of the DRM paradigm for legal (and clinical) practice can be argued to be limited, the results do provide information about associations between memories that might be activated in eyewitness testimony after experiencing a traumatic event.

Theories such as Fuzzy Trace Theory16 and Associative Activation Theory13 explain the production of spontaneous false memories. The Fuzzy Trace Theory postulates that experiences are stored along two memory traces: verbatim and gist traces. Verbatim traces involve item-specific details (e.g., the font of words), whereas gist traces rely on the underlying semantic background of an experience (e.g., retrieving the word ‘cry’). Following Fuzzy Trace Theory, false memories occur due to reliance on gist traces when verbatim traces cannot be retrieved. According to the Associative Activation Theory, false memories arise due to spreading activation. That is, information spreads to associated concepts in one’s knowledge and related, but non-represented concepts become activated. Older children and adults are more prone to such false memories, because young children’s knowledge base is less well developed17 and spreading activation is less automatic and slower in children’s memory networks than in adults’. Such false memories are classified as spontaneous false memories because they occur without external suggestive pressure. Taken together, research suggests that the claim that young children are more prone to false memories than older children and adults is not fully tenable. Instead, it depends upon the nature and mechanisms underlying false memory generation. That is, younger children are more prone to false memories because of suggestive pressure6,7, but they are less prone to spontaneous false memories than older children and adults8,9,10.

In cases like Vicky’s, expert witnesses can support the legal decision-making process. For legal cases, involved parties (e.g., lawyers and judges) must be knowledgeable about children’s memory. In Vicky’s case, two psychological experts, a clinical psychologist and a memory researcher, examined Vicky’s statement1,2. The clinical psychologist opined that Vicky made an incorrect association due to autosuggestion, a subtype of spontaneous false memory that sometimes occurs in young children (i.e., self-suggestion)18,19. Moreover, the clinical psychologist reasoned that additional details were a result of suggestive interviewing. The memory researcher stated that autosuggestion was unlikely due to developmental reversal effects, but that children were able to provide accurate statements20. Furthermore, he argued that reporting additional (i.e., reminiscent) details was not indicative of a false report21. Rather, adding details might be the result of appropriate interviewing22. Indeed, Vicky was interviewed in a child-friendly studio without suggestive questions. Furthermore, her initial disclosure was made spontaneously. Eventually, Vicky’s father received an 18-year prison sentence. Vicky’s case illustrates the importance of memory experts in assessing the validity of eyewitness statements. Each case is unique, and it is vital that experts possess up-to-date knowledge regarding, in this case, (children’s) memory and false memory development.

In cases when memory experts are not involved, it is up to legal professionals to evaluate the validity of an eyewitness statement. The scientific consensus is that legal professionals lack sufficient knowledge of eyewitness memory23,24,25. For example, police investigators, prosecutors, and defense lawyers believed that children aged 7–11 were less reliable and more suggestible than adult witnesses23,26,27. However, in contrast to research on developmental reversal, spontaneous disclosures of children were judged less reliable than spontaneous disclosures of adults28.

Inspired by Vicky’s case, we explored legal professionals’ knowledge about eyewitness memory. Specifically, we provided legal professionals with a case vignette describing Vicky’s case. We manipulated Vicky’s age (i.e., 6 or 22 years old) and disclosure (i.e., spontaneous or suggestion-induced). We were interested in legal practitioners’ awareness of the difference between suggestion-induced and spontaneous false memories and the accompanying developmental trends. Based on earlier studies, we hypothesized that (i) legal professionals believe children to provide less reliable statements compared to adults8,9,10, and (ii) that it is unlikely that legal professionals would be aware of the fact that this difference may be narrowed when taking into account the spontaneous versus suggestion-induced nature of the statement23,24,25,26,27.

Methods

Data and materials are available on the Open Science Framework (https://osf.io/4p3f7/?view_only=bcafa16ec7fe4172870a1158348b2135).

Participants

Data collection took place between February 2015 and March 2016. Judges, attorneys, and police officers were invited by e-mail to participate in a study about their expert opinion on the credibility of eyewitness statements. The authors’ networks were used to send e-mails to different professionals and participants were encouraged to share the link with professional acquaintances. The survey was administered through Qualtrics29 and took approximately 15 min to complete. The performed procedure was in accordance with the ethical standards of the institutional and/or national research committee and its later amendments or comparable ethical standards.

A total of 201 respondents started the online study. Of these, data from 99 participants had to be excluded because they did not answer any of the questions (n = 84) or did not complete the survey (n = 15). The final sample (N = 102; Swedish/Norwegian: n = 63, Dutch: n = 39; 49 women, 53 men; mean age = 43.8 years, SD = 11.5) consisted of 49 judges, 39 police officers, 12 attorneys, and 2 without further specification of their profession. The sample varied considerably concerning their professional experience (mean years of experience = 12.7, SD = 8.8, range 1–39 years) and the number of cases involving eyewitness credibility they dealt with in their profession (ranging from less than 1 to more than 31 cases per month). We randomly assigned participants in a 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: spontaneous vs. potentially suggestion-induced) between-subjects design. The sample sizes per condition were 6-year-old Vicky AND spontaneous disclosure (n = 22), 22-year-old Vicky AND spontaneous disclosure (n = 32), 6-year-old Vicky AND suggestion-induced disclosure (n = 25), and 22-year-old Vicky AND suggestion-induced disclosure (n = 23). A sensitivity power analysis in G*Power (α = 0.05, power = 0.80, N = 102, df = 98, number of groups = 4) revealed that the study can detect effects of at least f = 0.79 (i.e., Cohen’s d = 0.39)30,31.

Materials and procedure

Case vignette

The case vignette described a case report from ‘Vicky’. Vicky was 6 or 22 years old and the sole witness to the murder of her mother. The neighbors had called the police because they heard someone screaming. The police arrived within 15 min and Vicky opened the door. Vicky either spontaneously stated that her father murdered her mother when the police arrived or had phoned her grandmother before the police arrived. The grandmother stated that she had asked Vicky questions about what happened. On the same day, the police interviewed Vicky in a child-friendly studio. During this interview, she accused her father once more. Two weeks later, she was interviewed again, and her statement was consistent (i.e., her father murdered her mother). During the police interviews, Vicky was not subjected to suggestive questions. According to Vicky, she heard noises when she was in her bedroom. She stated that she went downstairs and saw her father stabbing her mother with a knife. Her father denies all accusations. Vicky was the only eyewitness and there was no other evidence.

Credibility ratings

Participants answered the following questions: (1) In the trial against Vicky’s father, how strongly does Vicky’s witness statement weigh as case-supporting evidence?; (2) What strength of evidence does Vicky’s witness statement have in court?; (3) To what extent is Vicky’s witness statement incriminating evidence?; (4) How likely do you think that Vicky’s father is guilty?; (5) How likely do you think that Vicky formed a false memory?; (6) How would you judge the general credibility of Vicky’s statement?; (7) How would you judge Vicky’s reliability? The questions were answered on a 7-point Likert scale (1 = very weak/unlikely/incredible/unreliable; 7 = very strong/likely/credible/reliable). Participants could also provide any remaining comments. We wanted to check whether (some) items represent one scale. The initial reliability analysis (excluding item 4 due to lack of content overlap) resulted in Cronbach’s α = 0.68. Removing item 5 from the reliability analysis resulted in Cronbach’s α = 0.85. We therefore computed a mean credibility assessment score from items 1, 2, 3, 6, and 7. Though the items had different answer options, all were a measure of credibility and reliability.
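As an illustration of the reliability analysis described above, the following is a minimal sketch of how Cronbach’s α can be computed from item-level ratings. The function name and the toy ratings are hypothetical and are not the study’s data; the sketch assumes complete responses and uses sample variances.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: list of per-item lists, each holding one score per respondent
    (all items must cover the same respondents in the same order).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in item_scores)
    # Sample variance of each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 7-point ratings from five respondents on three items.
toy_items = [
    [2, 4, 5, 6, 7],
    [3, 4, 5, 5, 7],
    [2, 5, 4, 6, 6],
]
print(round(cronbach_alpha(toy_items), 2))
```

Dropping an item that does not covary with the others (as was done with item 5) lowers the summed item variance relative to the total-score variance, which is why α can rise after removal.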

Metacognition ratings

The metacognition ratings included questions about general factors that might have led to the credibility ratings. The questions were: (8) How much was your judgment influenced by intuition?; (9) How much was your judgment influenced by experience?; (10) How much was your judgment influenced by knowledge?; (11) How much did Vicky’s age influence your credibility ratings?; (12) How much did the fact that Vicky had not had contact with any other person/or had called her grandmother before she talked to the police influence your credibility ratings?; (13) How useful do you think it would be to ask a memory expert for their opinion on the credibility of Vicky’s statement? All questions were answered on a 7-point Likert scale (1 = very little/useless; 7 = very much/useful). The participants could then list any remaining comments. We did not perform a reliability analysis, because the content of the metacognition items does not overlap.

Belief ratings

The belief ratings pertained to beliefs about age differences in spontaneous and suggestion-induced false memories and therefore assessed legal professionals’ awareness of developmental reversals. More specifically, the two items were: (14) In your opinion, compared to adults, are children generally more or less likely to develop a spontaneous false memory?; (15) Compared to adults, are children generally more or less likely to develop a suggestion-induced false memory? Both questions were answered on a 7-point Likert scale (1 = much less likely; 7 = much more likely). The participants could then list any remaining comments. We did not perform a reliability analysis, because the two questions pertain to two different constructs (i.e., spontaneous and suggestion-induced false memories).

Results

Table 1 shows an overview of the credibility, metacognition, and belief ratings per condition. Reported numbers and percentages are relative to the complete sample (N = 102) unless otherwise specified. Supplementary Table 1 provides all ratings per condition. Supplementary Table 2 (see the Open Science Framework) reports the estimated marginal means per main effect. To control the false discovery rate (FDR) under multiple testing, we applied the Benjamini-Hochberg (BH32) correction method with a chosen FDR threshold of 0.10. Following the steps of ANOVA, we first applied the BH correction to interaction effects per group rating. Per the correction method, we ranked the p-values in ascending order and calculated the critical value for each test based on its rank. The largest p-value less than or equal to its critical value was used as the threshold for significance. If an interaction effect was significant, we computed and examined simple effects. If an interaction effect was non-significant, we computed and examined main effects. Table 3 in the Supplemental Materials on the Open Science Framework provides further details on the BH correction procedure.
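The ranking-and-threshold steps described above can be sketched as follows. This is a generic illustration of the BH step-up procedure with an FDR of 0.10, not the authors’ analysis script; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans: True where a test is significant under BH."""
    m = len(p_values)
    # Rank the p-values in ascending order, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is <= its critical value (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= cutoff_rank
    return significant

# Hypothetical family of four tests (not the study's p-values).
print(benjamini_hochberg([0.01, 0.04, 0.30, 0.08]))
```

Because the procedure is step-up, a p-value that exceeds its own critical value can still be declared significant if a larger p-value further down the ranking meets its critical value.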

Table 1 Overview of credibility, metacognition, and belief ratings per condition.

Credibility ratings

A 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: Spontaneous vs. Suggestion-induced) two-way ANOVA on the credibility assessment score (i.e., mean score of items 1, 2, 3, 6, and 7) showed statistically non-significant interaction and main effects (all ps > BH critical value).

A 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: Spontaneous vs. Suggestion-induced) two-way ANOVA on the credibility ratings of item 4 resulted in statistically non-significant interaction and main effects (all ps > BH critical value). For item 5, a 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: Spontaneous vs. Suggestion-induced) two-way ANOVA showed a non-significant interaction effect (p > BH critical value), a significant main effect of Age, F(1,98) = 5.87, p = 0.017, ƞ2 = 0.06, and a significant main effect of Disclosure, F(1,98) = 6.44, p = 0.013, ƞ2 = 0.06, regarding the likelihood of Vicky having formed a false memory. That is, participants were more likely to assume that a false memory was formed when the case vignette included a 6-year-old witness (M = 3.45, SD = 1.41) than a 22-year-old witness (M = 2.69, SD = 1.40; Cohen’s d = 0.54), irrespective of the disclosure (i.e., spontaneous or suggestion-induced). Professionals were also more likely to assume that Vicky formed a false memory when she had talked to the grandmother (M = 3.46, SD = 1.43) rather than when she disclosed spontaneously (M = 2.76, SD = 1.37; Cohen’s d = 0.50), irrespective of age. Sixty participants (58.8%) needed more information to form a conclusion about Vicky’s credibility/reliability.
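For readers who wish to check the reported effect sizes, the sketch below recomputes Cohen’s d from the summary statistics above using a pooled standard deviation. It assumes that the group sizes are obtained by summing the cell sizes given in the Participants section (e.g., 22 + 25 = 47 for the 6-year-old conditions); the function name is hypothetical, and the original analysis may have used estimated marginal means or a different pooling rule.

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)

# Item 5, main effect of Age: 6-year-old (n = 22 + 25) vs. 22-year-old (n = 32 + 23).
d_age = cohens_d(3.45, 1.41, 47, 2.69, 1.40, 55)

# Item 5, main effect of Disclosure: grandmother (n = 25 + 23) vs. spontaneous (n = 22 + 32).
d_disclosure = cohens_d(3.46, 1.43, 48, 2.76, 1.37, 54)

print(round(d_age, 2), round(d_disclosure, 2))
```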

Metacognition ratings

A 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: Spontaneous vs. Suggestion-induced) two-way ANOVA resulted in significant main effects of Age for three of the ratings. All other main and interaction effects were statistically non-significant (all ps > BH critical value).

First, there was a statistically significant main effect of Age for how much Vicky’s age influenced the credibility rating (item 11), F(1,98) = 14.20, p < 0.001, ƞ2 = 0.127. Participants who read a case vignette in which Vicky was 6 years old rated that her age influenced their credibility ratings more strongly (M = 4.89, SD = 1.52) than participants who read a case vignette in which Vicky was 22 years old (M = 3.76, SD = 1.53; Cohen’s d = 0.74).

Second, we found a statistically significant main effect of Age for how much credibility judgments were influenced by the fact that Vicky did (or did not) have contact with any other person/her grandmother (item 12), F(1,98) = 16.48, p < 0.001, ƞ2 = 0.144. Participants who received a case vignette in which Vicky was 6 years old rated that having had contact with no other person/her grandmother influenced their credibility ratings more strongly (M = 5.57, SD = 1.26) than participants who received a case vignette in which Vicky was 22 years old (M = 4.56, SD = 1.41; Cohen’s d = 0.75).

Finally, the main effect of Age was also significant for how useful it would be to ask a memory expert for his or her opinion on the credibility of Vicky’s statement (item 13), F(1,98) = 5.022, p = 0.027, ƞ2 = 0.049. Participants who received a case vignette in which Vicky was 6 years old found it more useful (M = 4.64, SD = 2.11) to ask a memory expert than participants who received a case vignette in which Vicky was 22 years old (M = 3.67, SD = 2.09; Cohen’s d = 0.46).
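The ƞ2 values reported for the three Age effects can be recovered from the F statistics and their degrees of freedom. The sketch below uses the standard conversion for partial eta squared; that the reported ƞ2 is the partial variant is an assumption on our part (the text does not say so), and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values for the main effect of Age on items 11, 12, and 13, each with df = (1, 98).
for item, f_value in [(11, 14.20), (12, 16.48), (13, 5.022)]:
    print(item, round(partial_eta_squared(f_value, 1, 98), 3))
```

Under this assumption, the three conversions agree with the reported values of 0.127, 0.144, and 0.049 to three decimals.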

Twenty-four participants (23.5%) elaborated on their opinions about using an expert witness in court, and these opinions were mixed. Fourteen participants (13.7%) thought it depended upon the case at hand whether a memory expert was useful. For example, some mentioned that it was only useful when police officers, lawyers, and/or judges had no knowledge about child development or memory functioning, when their decision-making was influenced by their expertise, or when there were doubts about reliability (all n = 1). Only five participants (4.9%) thought involving a memory expert was useful (e.g., “It is always useful to ask for an expert opinion in order to underpin an explanation”). Three participants (2.9%) stated that memory experts were not useful (e.g., “Memory experts? Sounds like nonsense”; “It is about facts. Knowledge about memory and explanations by a memory expert are not allowed to be decisive”). Two participants (2.0%) misunderstood the questions.

Belief ratings

We conducted a 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: Spontaneous vs. Suggestion-induced) two-way ANOVA on the belief about children’s proneness to develop a spontaneous false memory (item 14). There was no statistically significant interaction effect or main effect of Age (all ps > BH critical value). A main effect of Disclosure emerged, F(1,98) = 5.772, p = 0.018, ƞ2 = 0.06. That is, children were rated more likely to develop a spontaneous false memory by participants who received the suggestion-induced case vignette (M = 4.96, SD = 1.15) than by participants who received the case vignette in which Vicky spontaneously disclosed (M = 4.28, SD = 1.46; Cohen’s d = 0.52).

For the belief about proneness to develop suggestion-induced false memories (item 15), a 2 (Age: 6 vs. 22 years old) × 2 (Disclosure: Spontaneous vs. Suggestion-induced) two-way ANOVA returned no statistically significant interaction effect, nor main effects (all ps > BH critical value).

Nineteen participants (18.6%) elaborated on the susceptibility to false memories. The largest group (n = 6, 5.9%) stated that they did not have the knowledge to form an opinion. Four participants (3.9%) stated that children were more likely to form a false memory, while two participants (2.0%) stated that adults were most susceptible. Three participants (2.9%) stated that other factors are of importance (e.g., interviewing techniques) and two participants (2.0%) stated that everyone is prone to false memories.

Further investigation

Participants had the opportunity to express their interest in conducting a further investigation of the case if they were involved in prosecuting the father. Only four participants (3.9%) did not want to examine the case further if they were involved in the prosecution of the father. Two of them (both judges) elaborated that it was not the task of a judge to do so. All other participants (n = 98, 96.1%) wanted more information. Topics for further investigation included the testimonies of everyone involved (e.g., alibis, motives, details of Vicky’s conversation with her grandmother, family dynamics; n = 58), Vicky’s mental state and testimony (e.g., details of the police interview, Vicky’s motive; n = 27), any forensic evidence (e.g., bloodstains, details about phone calls; n = 27), and any missing evidence (not specified; n = 1).

Discussion

In this vignette study, we examined legal professionals’ awareness of false memory development. We were primarily interested in their knowledge about suggestion-induced and spontaneous false memories and the accompanying developmental trends (i.e., developmental reversal). Instead of examining legal professionals’ knowledge by solely relying on belief statements, we measured their knowledge through case vignettes. Generally, respondents were more likely to assume that a false memory was formed when the witness was a child rather than an adult and when the witness had talked to a grandmother before making her incriminating statement to the police rather than making a spontaneous disclosure. Respondents were more likely to consider the opinion of a memory expert and were more cautious about the reliability of the memory of a 6-year-old witness than a 22-year-old witness.

Professionals who received a case vignette in which Vicky was 6 years old deemed it more likely that she had formed a false memory than professionals who received the case vignette in which Vicky was 22 years old, regardless of the type of disclosure (i.e., spontaneous or suggestion-induced). This finding reflects the general belief that children are more vulnerable to false memories compared to adults and likely reflects the default assumption that children are inherently susceptible to making memory errors8. However, this general belief is in contrast with research on spontaneous false memory8,9,10. That is, following theoretical explanations of false memories, adults are more prone to spontaneous false memories than children (i.e., developmental reversal)16,33. Endorsing a default assumption on children’s susceptibility to producing false memories might result in legal professionals incorrectly disregarding the child’s testimony or wrongly accepting the adult’s testimony at face value. These findings are in line with earlier studies illustrating that legal professionals lack detailed knowledge on the functioning of (eyewitness) memory23,24,25.

Interestingly, the professionals did evaluate 6-year-old Vicky as more reliable than 22-year-old Vicky. This finding parallels the finding that young children do not tend to lie about their parents’ wrongdoings34 and might be related to research showing that children are able to provide accurate accounts35, even when they pertain to traumatic events36. Although speculative, our results might be a sign that although legal professionals lack knowledge on nuanced developmental trends in false memory generation, they do possess accurate knowledge on other aspects related to children’s memory.

Legal professionals also deemed it more likely that Vicky had formed a false memory when she spoke to the grandmother, regardless of age, than that the false memory would occur spontaneously. This finding may reflect their general awareness of memory suggestibility in that for both adult and child eyewitnesses, suggestive information can distort memory37,38, but with children being more likely to form suggestion-induced false memories compared to adults6,35,39.

Lastly, legal professionals who received a vignette in which Vicky was 6 years old found it more useful to consult a memory expert than professionals who read the vignette in which Vicky was 22 years old. Yet, in general, legal professionals did not think it was useful to consult memory experts. This is again in line with findings that legal professionals lack (some) knowledge about the functioning of memory28,40. In such cases, legal professionals could benefit from the advice of memory experts1,41.

An important question is how practically relevant the current findings are. One way to assess the practical relevance is to interpret the effect sizes42,43. The effect sizes (Cohen’s d) of this study were in the range of 0.50–0.75. This range indicates that between 69.1 and 77.3% of the data of one group (e.g., the group receiving information that Vicky talked to her grandmother) is above the mean of the other group (e.g., the group receiving information that Vicky made a spontaneous disclosure)44. These percentages can be interpreted as high. However, a critical and closer look at the data also shows that many of the observed means were below a rating of 4 (on a scale from 1 = unlikely to 7 = likely). For example, participants were more likely to rate that a false memory was formed when the witness was 6 years old (M = 3.45, SD = 1.41) than when the witness was 22 years old (M = 2.69, SD = 1.40). Both means (3.45 and 2.69) are below 4 and thus fall on the “unlikely” part of the scale. Overall, what our results might imply is that although some findings reached statistical significance, the legal professionals were generally cautious in their responding.
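The percentages quoted above correspond to what is commonly called Cohen’s U3: the proportion of one group that lies above the mean of the other group. The sketch below shows the conversion under the assumption of normally distributed ratings with equal variances in both groups; the function name is hypothetical.

```python
from statistics import NormalDist

def cohens_u3(d):
    """Proportion of the higher-scoring group that lies above the other group's mean.

    Assumes both groups are normally distributed with equal variances.
    """
    return NormalDist().cdf(d)

# Lower and upper bounds of the effect-size range discussed in the text.
for d in (0.50, 0.75):
    print(d, round(100 * cohens_u3(d), 1))
```

With d = 0, the function returns 0.5, that is, half of one group lies above the mean of the other, which is the baseline against which the 69.1% and 77.3% figures should be read.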

This study has several limitations. First, we used a forced choice format. This format might have restricted the option for participants to state that they did not have the knowledge to answer the question and therefore prevented them from giving a more nuanced response. Second, most legal professionals indicated that they wanted to obtain more information to form a conclusion about Vicky’s case. Indeed, in actual cases, more information would be available. However, we were interested in the basic knowledge about false memory propensity. Future studies could include more information (e.g., suspect’s account and other eyewitness testimonies) in a case vignette to more closely mimic actual case details. Furthermore, other factors (e.g., gender and race of participants and individuals in the case vignette, situational/case specific variables) were not considered in the current study. Such stimulus sampling45 allows participants to develop more flexible and generalized responses to different situations, which is lacking in the current study. Finally, while DRM studies offer insights into general memory processes, the findings may not fully translate to the spontaneous recall of an entire criminal event. Specifically, remembering related words differs on many dimensions from remembering personal emotional experiences (e.g., level of emotion, self-relevance, etc.46). Therefore, caution is needed when using DRM-based developmental trends in false memory generation to draw conclusions about whether our participants’ knowledge aligns with scientific research on false memory development.

In summary, legal professionals did not seem to follow a basic rule when evaluating a witness’s statement. In general, legal professionals leaned towards Vicky’s statement having strong weight as case-supporting evidence and found it unlikely that Vicky formed a false memory. They were more likely to assume that a false memory was formed when the witness was 6 years old than 22 years old, irrespective of the type of disclosure (i.e., spontaneous or suggestion-induced false memory). Although children are more prone to suggestion-induced false memories than adults6, they are not more likely to form spontaneous false memories8,9,10. Both adults and children can provide reliable testimony, but both are also susceptible to false memory formation. It is important that each legal case be examined separately, because individual case details might make it less likely that children form false memories than adults (e.g., spontaneous statements increase the likelihood of the statement being true47).

The current study highlights (1) the need for awareness among legal professionals about the implications of research on developmental trends in eyewitness memory and (2) the advantage of consulting a memory expert for statement validity assessment. Eventually, this will contribute to improved statement validity assessment and legal decision-making.