Introduction

Working memory (WM) is a limited data processing system responsible for encoding and briefly maintaining information, so that it can still be accessed in the absence of inputs from the environment1,2,3. Encoding processes transform transient perceptual representations into a more durable state, while maintenance mechanisms allow information to be retained so that it can be used to guide behaviour over brief delays4,5. WM performance may decline with ageing6,7,8, and is further impaired in several neurological conditions such as Alzheimer’s Disease (AD)9,10 and Parkinson’s Disease (PD)11,12. In healthy ageing, several components contributing to declining performance in visual WM have been identified: a reduction in the number of items that can be stored, or WM capacity13; a deficit in maintaining the associations (bindings) between individual object features14 resulting in a higher proportion of misbinding or “swap errors”15; an increase in random guessing16,17; and a decrease in response precision18,19. Since WM is limited in capacity, items compete for access to it. Hence, the ability to efficiently filter out irrelevant information is crucial for optimal use of WM resources, and its impairment constitutes an additional source of decline of WM performance and capacity20,21,22,23,24,25,26,27.

Filtering out distractors at encoding (i.e., when the items are presented), versus maintenance (i.e., when they are retained in memory), poses different challenges28,29. Filtering at encoding has been envisaged as deployment of a flexible gating system, allowing at the same time relevant targets to enter WM while keeping distractors out29,30. Filtering during maintenance on the other hand might rely on the ability to keep the gate shut firmly once targets are successfully encoded, in order to prevent any further information entering WM and corrupting task-relevant information24,26,27,30. The integrity of fronto-striatal networks has been commonly associated with performance in filtering paradigms31,32,33, and linked to brain structures involved in WM storage including the parietal cortex31,34. However, the contribution of the hippocampus to filtering abilities, despite being crucial for short-term and long-term memory function35,36,37,38, has received little attention.

Elderly participants seem to be particularly prone to errors with increasing memory load13, which has been linked to a decline in their filtering abilities20. Some evidence suggests older adults show poorer performance if distractors are presented during maintenance rather than at encoding29. It is currently unknown whether this is due to decreased memory fidelity, to a failure in successfully form a bound, stable representation of an object, or to information exceeding WM capacity, leading to random guessing. These questions can be addressed using delayed reproduction paradigms that ask participants to reproduce from memory, using a continuous analogue response space, specific features of items presented at encoding2Both an age-related decline in memory precision18,19 as well as increased guessing39 have previously been reported using such paradigms. Evidence of increased misbinding has been more mixed, with some data showing higher rates in healthy ageing15, while other have not40.

Maintenance of bound features seems to be even more impaired in AD compared to healthy ageing41. Increased misbinding in patients with both sporadic, late-onset AD and familial AD has now been reported using both change-detection and delayed reproduction tasks12,35,42,43,44, and specifically linked to hippocampal dysfunction45,46,47.

An increased vulnerability to distractors is also detectable from the earliest phases of AD48,49,50,51,52, particularly if task instructions need to be maintained over time, pointing towards a deficit of goal-maintenance processes53,54 Crucially, comparative data on distractor filtering presented at encoding or maintenance within the same study are lacking.

PD is also associated with deficits in visual WM. Different paradigms have provided evidence of reduced WM capacity55 as well as greater levels of guessing, rather than misbinding12,56Further, PD patients have been shown to exhibit deficits in distractor filtering at encoding, which might be linked to basal ganglia and prefrontal dysfunction31,55. However, other studies have shown that PD patients also fail to efficiently manipulate information over the maintenance period, which can be selectively modulated by replenishing their dopaminergic tone17.

In summary, healthy ageing, AD and PD have been associated with reduction of filtering abilities, which impact negatively on their overall WM capacity and performance. Comparative data on filtering at encoding and maintenance in these groups are currently lacking. Here, using a unified delayed reproduction task and computational modelling, we investigated whether filtering deficits in healthy ageing, AD and PD are related to a decline in memory precision, a decreased ability to detect the correct target, an increased likelihood of misbinding or greater random guessing. We also examined whether these performance metrics relate to hippocampal integrity, given the crucial role of the hippocampus in both encoding and maintenance processes36,57,58. First, we established normal performance in young and elderly healthy controls (Study 1). Then, we studied filtering ability in AD (Study 2) and PD (Study 3).

Study 1 | Effects of healthy ageing on working memory filtering

Materials and methods

Participants

28 young (YHC) and 28 elderly healthy controls (EHC) were recruited (YHC: 18–35 years and EHC: 50–90 years), through respectively either the departmental online participant recruitment system (SONA), or open day events. All performed the Filtering at Encoding and Maintenance Task (Fig. 1). We also collected a measure of global cognition – the Addenbrooke’s Cognitive Examination-III (ACE-III)59 – in the elderly group as baseline screening for cognitive impairment. Participants who reported any psychiatric or neurological illness, were on psychoactive drugs or scored below the cut-off for normality (88/100 total ACE-III score) were excluded from the study. A summary of participants’ demographics and tests results is presented in Supplementary Table 1.

Fig. 1
figure 1

Filtering at Encoding and Maintenance Task design. Participants were presented with either two (SS2) or three (SS3E) arrows at encoding and were instructed to remember the orientations of all arrows (SS2 and SS3E), or to ignore one of them in the Filter at Encoding (FE) condition. After a blank interval (2000 msec), one of the target arrows (probe) re-appeared in black in its original location but in a random orientation, which the participants had to rotate to its remembered orientation. In the Filter at Maintenance (FM) and Set Size three at Maintenance (SS3M) the first two arrows appeared simultaneously and had to be remembered, and after a 500 msec delay during the maintenance period, a third arrow appeared for 1000 msec, which either had to be ignored (FM), or remembered (SS3M). After 500 msec the probe was displayed and had to be rotated to its original orientation.

For all three studies, the following principles were observed: All participants had normal or corrected-to-normal vision acuity and no color blindness. All participants gave written informed consent prior to the start of the study and received financial compensation for their participation. Ethical approval was granted by the University of Oxford Medical Sciences Inter-Divisional Research Ethics Committee (MS IDREC). The study was performed in accordance with the Declaration of Helsinki and following the relevant guidelines and regulations.

Filtering at encoding and maintenance task

The Filtering at Encoding and Maintenance Task (Fig. 1) had 5 different conditions.

In Set Size 2 (SS2) and Set Size three at Encoding (SS3E) conditions, participants were presented with either two (SS2) or three (SS3E) different coloured arrows (red, green or blue) and were instructed to remember their colours and orientations. After a blank delay interval (2000 ms), one of the target arrows (i.e., the probe) re-appeared in black in its original location but presented in a random orientation. Participants were instructed to rotate the probe using the left and right arrow keys on a keyboard back to the remembered orientation. In the Filter at Encoding (FE) condition three arrows were presented simultaneously, as in the SS3E condition, but now participants were asked to ignore one of them.

In the Filter at Maintenance (FM) and Set Size three at Maintenance (SS3M) conditions, the first two arrows appeared simultaneously and had to be remembered. Subsequently, after a 500 msec delay during the maintenance period, a third arrow appeared for 1000 msec, which either had to be ignored (FM) or remembered (SS3M) and after 500 msec the probe was displayed. Further task specifics can be found in Supplementary materials.

Statistical analysis

All analyses were conducted using MATLAB 2019a and JASP (JASP team, 2020). As first step, we computed the mean absolute error (MAE) for each condition for each participant. This provides an index of raw error. We then fitted the probabilistic Mixture Model introduced by Bays et al. in 200960 to each participant for each condition. The mixture model allows modelling of responses in terms of probability of correctly identifying the target (target detection), misbinding the features of an object with another among the probed items (misbinding or “swap errors”), random guessing (guessing), and extracts a measure of memory precision, (Supplementary Model 1)60.

Key parameters used for statistical analyses

For all variables, Instruction x Condition x Group frequentist mixed ANOVAs were performed, where:

  • Instruction – Refers to 3-item conditions with filtering (FE, FM) vs. no filtering (SS3E, SS3M).

  • Condition – All stimuli at encoding (FE, SS3E) vs. one presented at maintenance (FM, SS3M).

  • Group - YHC vs. EHC.

We also computed from the MAE a main effect of Set Size, both at Encoding and at Maintenance and their difference or Set Size Cost:

  • Set Size at Encoding = SS3E - SS2.

  • Set Size at Maintenance = SS3M - SS2.

  • Set Size Cost = Set Size Encoding - Set Size Maintenance.

A main effect of Filtering, or Filtering rate, was calculated separately at Encoding and Maintenance from MAE as follows:

  • Filtering rate at Encoding = FE - SS2.

  • Filtering rate at Maintenance = FM - SS2.

  • Filtering rate Cost = Filtering rate at Encoding - Filtering rate at Maintenance.

If perfect filtering was achieved, Filtering rate would be zero. The higher the Filtering rate, the worse the performance. For the ANOVA, effect size was quantified using Eta Squared (η2). Between-group differences for Set Size and Filtering rate metrics were assessed either with a t-test or Mann-Whitney U test, and a Bayesian independent sample analysis was performed. For the Bayesian analysis, we used the default priors for the effects (Fixed effects: r = 0.5, Random effects: r = 1). Outliers’ removal was performed across groups for values more than 3 standard deviations from the mean.

Results

Mean absolute error (MAE)

EHC performed significantly worse than their younger counterparts across all conditions (main effect of Group [F (1,48) = 40.46, p < 0.001, η2 = 0.292]; Fig. 2A). Furthermore, both groups were significantly worse in the conditions where all three items had to be remembered (SS3E and SS3M) compared to when they had to filter an item out (main effect of Instruction [F (1,48) = 17.75, p < 0.001, η2 = 0.049]). There was no significant main effect of Condition [(F (1,48) = 1.77, p = 0.190, η2 = 0.014], whilst there was a significant 3-way Group x Instruction x Condition interaction [(F (1,48) = 8.22, p = 0.006, η2 = 0.014], suggesting EHC performed significantly worse if the three items to be remembered were presented simultaneously at encoding, compared to maintenance.

Fig. 2
figure 2

Mean absolute error (MAE) and mixture model metrics in young (YHC) and elderly healthy controls (EHC). Performance of participant groups are shown labeled with different colors, EHC in coral red, YHC in turquoise. Set Size 2 (SS2), Filtering at Encoding (FE), Filtering at Maintenance (FM), Set Size three at Encoding (SS3E), Set Size three at Maintenance (SS3M). 2 A: MAE performance across the five different conditions. On the Y-axis MAE in degrees. Shaded areas represent confidence intervals (CI). 2B: Mixture Model metrics across conditions in YHC and EHC. Panel a: Precision as concentration parameter κ, Panel b: Target probability, panel c: Misbinding probability, panel d: Guessing probability.

Impact of set size at encoding and maintenance

EHC exhibited a higher Set Size effect at Encoding compared to YHC (t(48) = −4.548, p < 0.001), (Supplementary Fig. 1 A. However, the two groups were not significantly different in Set Size at Maintenance (t(48) = − 0.936, p = 0.354). YHC had also a significantly lower Set Size Cost compared to EHC (t(47) = −3.391, p = 0.001). Bayesian independent sample test showed a BF10 = 523.1 for the Set Size at Encoding and BF10 = 23.28 for Set Size Cost, providing strong evidence for these effects.

Overall, performance at remembering three items simultaneously was much worse for older individuals compared to younger people, while when the WM load of encoding three items was distributed across the initial encoding and maintenance phases elderly participants were able to cope with the increased cognitive demand as well as younger ones.

Impact of filtering at encoding and maintenance

Both groups were able to efficiently filter out irrelevant items, shown by the Filtering rates being not significantly different from zero in both groups at Encoding and Maintenance (Supplementary Fig. 1B). EHC and YHC were not significantly different in Filtering rate at Encoding (U = 325, p = 0.993). However, EHC had worse performances (higher values) compared to YHC in Filtering rate at Maintenance (U = 205, p = 0.038). Nevertheless, the Filtering rate Cost was not significantly different between the two groups (U = 356, p = 0.402). Moreover, the Bayesian independent sample test showed a BF10 = 1.047, providing no definitive evidence for a significant difference in Filtering rate at Maintenance between groups.

Analysis of mixture model parameters

Precision

YHC had significantly higher Precision than EHC [F(1, 42) = 59.06, p < 0.001, η2 = 0.399]. There were no other significant main effects or interactions, suggesting the difference in precision reflects a pure ageing effect, regardless of task demands (Fig. 2B, panel a).

Target detection

YHC showed significantly better target detection compared to EHC [F(1, 48) = 25.38, p < 0.001, η2 = 0.198], (Fig. 2B, panel b). Across both groups, significantly more targets were detected in the filtering conditions than in SS3 conditions, indexed by a main effect of Instruction [F (1,48) = 10.01, p = 0.003, η2 = 0.031]. Furthermore, there was a main effect of Condition [F (1,48) = 6.64, p = 0.013, η2 = 0.011], with significantly fewer targets identified at Encoding than during Maintenance.

EHC performed worse than YHC when all three items needed to be remembered compared to when an item had to be filtered out, as shown by a significant two-way interaction between Instruction and Group [F (1,48) = 9.62, p = 0.003, η2 = 0.030]. Finally, there was a significant three-way Instruction x Condition x Group interaction [F (1,48) = 4.39, p = 0.041, η2 = 0.010], with EHC being worse at detecting targets when 3 items had to be remembered at Encoding compared to Maintenance. There were no other significant interactions (see Supplementary Table 2). These results suggest that YHC and EHC found it more challenging to detect targets when all three items had to be encoded, and if the stimuli appeared at Encoding, with EHC paying the highest cost.

Misbinding

There was a significant main effect of Group [F(1, 43) = 15.29, p < 0.001, η2 = 0.103], and of Instruction [F(1, 43) = 6.74, p = 0.013, η2 = 0.038], with significantly more misbinding errors committed by both groups in the SS3 conditions compared to the filtering conditions (Fig. 2B, panel c). There were no other significant main effects or interactions (see Supplementary Table 2). Thus, even in healthy young individuals, misbinding rates were higher if three items had to be remembered compared to when one could be filtered out, suggesting that filtering can reduce WM resources consumption.

Guessing

A main effect of Group (F(1, 44) = 7.32, p = 0.010, η2 = 0.070], no main effects of either Instruction or Condition (see Supplementary Table 2), but significant 2-way Instruction x Group (F(1, 44) = 4.329, p = 0.043, η2 = 0.016] and Instruction x Condition [F(1, 44) = 5.391, p = 0.025, η2 = 0.017] interactions were found (Fig. 2B, panel d). This suggests that EHC guessed more than YHC, and more so in the more taxing SS3 conditions, independently of being presented during the Encoding or the Maintenance phase. Moreover, filtering during the Encoding phase led to more guessing in both groups, compared to when filtering had to be carried out during Maintenance.

Full results of the ANOVA for all metrics can be found in Supplementary Table 2.

Overall, Study 1 showed that both EHC and YHC were able to filter out distractors efficiently, with EHC benefitting from encoding three items non-simultaneously. EHC made more errors compared to YHC, and this was due to worse performance across all types of errors: decline in memory Precision, reduced Target detection, increased Misbinding rates and increased Guessing. Filtering out an item was clearly beneficial in term of increased rates of target detection and reduced misbinding rates across the two groups. EHC also guessed more, especially when one item could not be filtered out. Memory precision was lower in EHC across all conditions and was the metric that showed the highest effect size, suggesting it might be a good marker of ageing.

Study 2 | Filtering in Alzheimer’s disease

Materials ad methods

Participants

Twenty-eight patients with Alzheimer’s clinical syndrome, defined as per61, i.e., AD group, and 28 EHC (the same as Study 1) were recruited respectively from the Oxford Centre for Cognitive Disorders and through open day events. Participants underwent the Filtering at Encoding and Maintenance Task, a neuropsychological battery (Table 1), and a subset of patients (48/56) consented to perform a 3 T MRI brain scan.

Table 1 EHC and AD demographics and tests.

The neuropsychological battery comprised measures of global cognition, verbal short-term memory, depression, apathy and fatigue. These included ACE-III, Digit Span (DS)62, Apathy Motivation Index (AMI)63, Beck’s Depression Inventory (BDI)64, Hospital Anxiety and Depression Scale (HADS)65, Snaith-Hamilton Pleasure Scale (SHAPS)66, Mood Scale (MS), i.e. the 15-Item Geriatric Depression Scale (GDS-15)67, Fatigue Severity Scale (FSS)68, Visual Analogue Fatigue Scale (VAFS)69, Pittsburgh Sleep Quality Index (PSQI)70, World Health Organisation Five Well-Being Index (WHO-5)71 and Cantril Ladder (CL)72. A summary of participants’ demographics and tests results is presented in Table 1. EHC and AD were not significantly different with respect to age, gender and handedness. All elderly healthy controls had global cognitive scores within the normal range, and none was significantly depressed.

MRI acquisition and analysis

The MRI protocol and pipeline used to calculate head-size-corrected hippocampal volumes have been described elsewhere44, and can be found in Supplementary materials.

Statistical analysis

For MAE, Precision, Target detection, Misbinding, and Guessing, we performed 2 (Instruction: Filter and SS3) x 2 (Condition: Encoding and Maintenance) x 2 (Group: EHC, AD) mixed ANOVAs. We followed the same principles described in Study 1 in computing Set Size and Filtering rate metrics. Between-group differences were assessed through frequentist and Bayesian ANCOVAs using age, sex, education and handedness as covariates. The same principles were applied in Study 3.

Results

Mean absolute error (MAE)

There was a main effect of Group [F (1,52) = 43.01, p < 0.001, η2 = 0.361], (Fig. 3) and Instruction [F (1,52) = 16.54, p < 0.001, η2 = 0.022], with higher errors in the conditions where three items had to be remembered compared to when one item could be filtered out. There was no main effect of Condition [F (1,52) = 0.91, p < 0.001], whilst there was a significant Condition x Group interaction [(F (1,52) = 4.26, p = 0.044, η2 = 0.004], with AD performing worse at Maintenance and EHC performing worse at Encoding. There was also a significant Instruction x Condition interaction [(F (1,52) = 11.79, p = 0.001, η2 = 0.010], with Filtering at Maintenance and Set Size at Encoding being challenging conditions compared to their counterparts Filtering at Encoding and Set Size at Maintenance.

Fig. 3
figure 3

Mean absolute error (MAE) and mixture model metrics in EHC and AD. Performance across the five different conditions: Set Size 2 (SS2), Filtering at Encoding (FE), Filtering at Maintenance (FM), Set Size three at Encoding (SS3E), Set Size three at Maintenance (SS3M). The two groups are shown as EHC in coral red, AD in green. 3 A: On the Y-axis MAE in degrees. Note in particular the differential effect of Filtering at Maintenance (FM) compared to Filtering at Encoding (FE) within the AD group compared to within the EHC group. Further, in AD, FM performance (when a to-be-ignored distractor was presented during maintenance) was equivalent to that Set Size three at Maintenance (SS3M, when the new item had to be retained). 3B: a | Precision as concentration parameter κ. b | Target probability. c | Misbinding probability. d | Guessing probability.

Impact of set size at encoding and maintenance

AD patients showed a greater Set Size effect at Maintenance [F (1,49) = 4.35, p = 0.042] compared to EHC, but no difference in Set Size at Encoding [F (1,50) = 0.61, p = 0.438] or their Cost [F (1,48) = 1.28, p = 0.263] (Supplementary Fig. 2 A). The Bayesian ANCOVA showed moderate evidence for this effect (BFM = 3.160).

Impact of filtering at encoding and maintenance

There was no difference in Filtering rate at Encoding between the two groups [F (1,49) = 0.46, p = 0.499], but groups were different in Filtering rate at Maintenance [F (1,48) = 8.94, p = 0.004] and Filtering rate Cost [F (1,48) = 4.63, p = 0.036], with AD having higher Filtering rates at Maintenance and lower Filtering rate Cost, indicative of worse performances at Maintenance (Supplementary Fig. 2B). The Bayesian ANCOVA showed a BFM = 5.762 for Filtering rate at Maintenance and BFM = 3.324 for Filtering rate Cost, therefore providing evidence for a significant difference.

Analysis of mixture model parameters

Precision

AD patients had lower Precision compared to EHC [F (1,39) = 4.34, p = 0.044, η2 = 0.039] (Fig. 3B, panel a). There were no main effects of Instruction or Condition, nor any significant interactions. Therefore, AD had lower precision compared to EHC regardless of the condition.

Target detection

For Target detection, a main effect of Group [F (1,52) = 36.66, p < 0.001, η2 = 0.291], with AD patients exhibiting worse performance compared to EHC, was found (Fig. 3B, panel b). There was also a main effect of Instruction [F (1,52) = 10.20, p = 0.002, η2 = 0.021], with fewer targets detected in SS3 than in the filtering conditions. While AD patients were worse at detecting targets compared to EHC, both groups benefitted from filtering one item out.

Misbinding

There was a main effect of Group [F (1,44) = 7.87, p = 0.007, η2 = 0.047], with higher misbinding rates in AD patients compared to EHC (Fig. 3B, panel c) and a main effect of Instruction [F (1,52) = 6.02, p = 0.018, η2 = 0.029], with more misbinding in SS3 than in the filtering conditions for both groups.

Guessing

A main effect of Group [F (1,48) = 31.27, p < 0.001, η2 = 0.241], with AD patients performing worse than EHC (Fig. 3B, panel d), was found.

Full results of the ANOVA for all metrics can be found in Supplementary Table 3.

In the AD group there was no correlation between disease duration and filtering abilities, working memory capacity or any other WM metric.

Overall, Study 2 showed that filtering out an item was beneficial in term of increased rates of Target detection and reduced Misbinding across the two groups but was unable to increase memory Precision. Filtering at Maintenance was worse in AD patients compared to EHC, with AD patients showing similar performances in FM as in the SS3M condition (Fig. 3). AD patients showed impairment across all multiple metrics compared to EHC: reduced memory Precision and Target detection, increased Misbinding and Guessing. However, in this case Target detection and Guessing, and not memory Precision as in healthy ageing, showed the strongest effect size in group comparisons.

Study 3 | Filtering in Parkinson’s disease

Materials and methods

Participants

Twenty-eight patients with idiopathic Parkinson’s Disease, diagnosed as per73, were recruited from the Cognitive Disorder Clinic and the Oxford Parkinson’s Disease Centre Discovery Cohort, based at the John Radcliffe Hospital in Oxford. Their performance was compared to the EHC group of Study 1 and 2. We recruited only PD patients with no evidence of cognitive impairment (ACE-III total score > 88/100), to avoid potential confounding factors in results’ interpretation. Nevertheless, further studies including a wider range of cognitively intact PD patients, PD with mild cognitive impairment and PD patients with dementia would be advisable. PD patients were tested in ‘ON’ state, i.e. within 3 h of their usual dopaminergic medications. Participants underwent the same assessments described in Study 2, and a subset (43/56) consented to MRI brain scanning. In addition, we collected data on Unified Parkinson’s Disease Rating Scale (UPDRS) scores and total Levodopa dose, to investigate whether disease stage or dopaminergic medications could impact performance. Participants’ demographics and test scores are presented in Table 2. PD patients had lower ACE scores, but still within normal ranges (above 88/100 total).

Table 2 EHC and PD demographics and tests.

Results

Mean absolute error (MAE)

There was no significant group difference between EHC and PD patients in MAE [F (1,49) = 1.10, p = 0.300], (Fig. 4A). However, there was a main effect of Instruction [F (1,49) = 27.07, p < 0.001, η2 = 0.079], with better performance in filtering conditions compared to Set Size three conditions. There was an Instruction x Condition interaction [F (1,49) = 8.12, p = 0.006, η2 = 0.016], suggesting remembering three items was more difficult at Encoding.

Fig. 4
figure 4

Mean absolute error (MAE) and mixture model metrics in EHC and PD. Performance across the five different conditions: Set Size 2 (SS2), Filtering at Encoding (FE), Filtering at Maintenance (FM), Set Size three at Encoding (SS3E), Set Size three at Maintenance (SS3M). Different participant groups are labeled with different colors, i.e. EHC in coral red, PD in violet. 4 A: On the Y-axis, MAE in degrees for the two groups. 4B: a | Precision as concentration parameter κ. b | Target probability. c | Misbinding probability. d | Guessing probability.

Set size and filtering strategies

There was no significant difference in Set Size at Encoding [F (1,50) = 2.23, p = 0.634], Maintenance [F (1,49) = 0.21, p = 0.652], or their Cost [F (1,47) = 1.33, p = 0.254], nor in Filtering rate at Encoding [F (1,48) = 1.33, p = 0.066], Maintenance [F (1,42) = 0.44, p = 0.509], and their Cost [F (1,47) = 2.61, p = 0.113], among the two groups.

Analysis of mixture model parameters

Precision

No main effect of Group [F (1,38) = 0.001, p = 0.972], Instruction [F (1,38) = 1.47, p = 0.233] or Condition [F (1,38) = 0.42, p = 0.518], but a significant Instruction x Condition interaction [F (1,38) = 4.58, p = 0.039, η2 = 0.022] was found, suggesting that overall participants were more precise during Set Size three at Maintenance rather than Set Size three at Encoding (Fig. 4B, panel a). Therefore, memory Precision does not seem to be affected in PD.

Target detection

There were no main effects of Group [F (1,49) = 0.18, p = 0.180], or Condition [F (1,49) = 0.75, p = 0.391], but a main effect of Instruction [F (1,49) = 29.60, p < 0.001, η2 = 0.090], with less targets detected in conditions where three items needed to be remembered compared to when one item could be filtered out (Fig. 4B, panel b).

Misbinding

No main effects of Group [F (1,46) = 0.20, p = 0.653], or Condition [F (1,46) = 0.10, p = 0.758], but a main effect of Instruction [F (1,46) = 7.72, p = 0.008, η2 = 0.046] was seen, with misbinding occurring more frequently when three items need to be remembered compared to when one could be filtered out (Fig. 4B, panel c).

Guessing

There was a main effect of Group [F (1,45) = 9.43, p = 0.004, η2 = 0.074], with PD guessing significantly more than EHC, and a main effect of Instruction [F (1,45) = 8.50, p = 0.006, η2 = 0.037], with more Guessing occurring in the conditions when all three items had to be remembered compared to filtering conditions (Fig. 4B, panel d).

Full results of the ANOVA for all metrics can be found in Supplementary Table 4.

In the PD group, we found no correlation between UPDRS, Levodopa dose or disease duration and filtering abilities, working memory capacity or any of the WM metrics.

In agreement with Study 1 and 2, Study 3 showed that filtering out an unwanted item improved Target detection, and reduced Misbinding, while having no effect on memory Precision. PD patients exhibited a selective increase in Guessing compared to EHC, but otherwise had comparable performances on all other metrics. In Study 3, filtering out an item was beneficial also to reduce Guessing rates in EHC and PD if taken together. A visual comparison across the 4 groups for MAE and each of the mixture model metrics are presented in Supplementary Fig. 3 and 4.

Transdiagnostic analysis of cognitive measures and hippocampal volumes

We next investigated the interplay between WM capacity, filtering abilities, different sources of error derived from the Mixture Model, and hippocampal volumes using a transdiagnostic approach. Cumulative metrics for each Mixture Model variable were calculated by averaging the performance across all five conditions (SS2, FE, FM, SS3E, SS3M). Correlations between variables were computed using a generalized linear model, with age, sex, handedness and education included as covariates of no interest.

Relationship between hippocampal volumes and filtering ability

Across groups, Filtering at Maintenance was significantly negatively correlated with whole hippocampal volume (WHV) (df = 55, r = − 0.404 p = 0.002), (Figs. 5a and 6). Figure 6 displays the relationship between Filtering rate at Maintenance and hippocampal volumes, with participants with higher Filtering rate at Maintenance (on the right side), having lower hippocampal volumes (bottom of the graph). There was no significant correlation between hippocampal volume measures with Filtering rate at Encoding (df = 60, r = − 0.155, p = 0.228) or with Set Size, either at Encoding (df = 61, r = − 0.113 p = 0.379) or Maintenance (df = 60, r = − 0.117 p = 0.365). To further validate this result, we looked at whether Filtering rate at Maintenance was correlated with whole brain volume, and there was no such correlation (df = 55, r = − 0.116, p = 0.392).

Fig. 5
figure 5

Mixture model parameters correlations with hippocampal volumes. a | Correlation between Filtering rate at Maintenance and whole hippocampal volume (WHV). b |Correlation between Mixture Model metrics and whole hippocampal volume. On the X-axis the volume in mm3 of the whole hippocampus. In b, the Y-axis represent from left to right respectively MAE in degrees, probabilities of Target detection, Misbinding and random Guessing.

Fig. 6
figure 6

Filtering and MAE stratified by WHV and diagnosis. FM on the X-axis, MAE on the Y-axis, hippocampal volumes (WHV) on the Z-axis. Subjects with higher WHV (top of the image) have lower MAE (back of the image), lower Filtering rate at Maintenance scores (on the left) and are predominantly represented by elderly healthy subjects (in coral red), and PD patients (in violet). On the other end of the spectrum, AD patients (in green) have higher Filtering rate at Maintenance scores (on the right), higher MAE (front of the image), and have lower hippocampal volumes (bottom of the image).

Relationship between hippocampal volume and other WM performance metrics

WHV was negatively correlated with MAE (df = 57, WHV: r = − 0.473, p < 0.001), Misbinding (df = 48, WHV: r = − 0.476, p < 0.001) and Guessing (df = 50, WHV: r = − 0.578, p < 0.001) and positively correlated with Target detection (df = 57, WHV: r = 0.474, p < 0.001), (Figs. 5b and 6 for MAE). Figure 6 shows how higher MAE is negatively correlated with hippocampal volumes, as the regression plane goes from higher hippocampal values (top if the image) to lower values (bottom) with increasing MAE values (from back to front of the image on the Y-axis). No association between hippocampal volume and Precision was found (df = 43, WHV: r = −0.057, p = 0.710). In this case, both MAE (df = 57, r = − 0.319, p = 0.014), Target detection (df = 57, r = 0.310, p = 0.017) and Guessing (df = 50, r = −0.335, p = 0.015) were correlated with whole brain volume, while Misbinding was not (df = 48, r = −0.101, p = 0.487).

We then performed a stepwise regression analysis to determine which metric was the best predictor of hippocampal volume, with WHV as dependent variable, Misbinding, Guessing, and Filtering rate at Maintenance as covariates. Target detection was not included for avoiding collinearity in multiple regression, which arises from how Target, Misbinding and Guessing probabilities are calculated. Overall, WHV was better explained by Guessing (df = 48, r = −0.526, p < 0.001).

Discussion

This study, using a novel delayed reproduction task, compared WM filtering performance at encoding and maintenance in healthy ageing, AD and PD.

Previous evidence suggests that filtering abilities are tightly related to higher WM capacity20,74. Our data support this view, as shown by Fig. 6, where there is a positive correlation between mean absolute error and Filtering rate at maintenance. Elderly controls showed preserved filtering abilities if the distractor was presented at Encoding (Fig. 2A). However, their ability to filter out a distractor during the Maintenance period might not be as efficient, as suggested by the small age-related effect in decline in Filtering rate at Maintenance (Supplementary Fig. 1B),which is in line with previous evidence of an age-dependent decline in performance during maintenance20.

AD patients on the other hand showed clear WM deficits in Filtering at Maintenance (Fig. 3A, Supplementary Fig. 2B). Patients with AD have been found to struggle to prioritize relevant over irrelevant visual information across different tasks where an optimal inhibitory control is required to avoid unwanted attentional capture75,76,77,78. Our study showed not only that filtering abilities are impaired in AD patients, but that their deficit is specifically limited to the maintenance phase.

In contrast, PD patients showed excellent filtering ability both at Encoding and Maintenance. All our PD patients were tested in an ON Levodopa state (when dopaminergic medications are effective). Other studies have reported an impaired filtering ability in PD patients OFF Levodopa55, so it is possible that there is a beneficial effect of dopamine replenishment on ignoring or filtering in PD17. The integrity of WM maintenance processes in PD concurs with previous data on tasks that likely require both object and spatial processing79. Since we did not specifically test for a delay effect, we cannot exclude that at longer delay PD patients could also suffer from delay-dependent declines in maintenance performances as previous evidence suggests80.

Overall, healthy elderly participants performed worse compared to younger participants and were specifically impaired when three items had to be encoded simultaneously (Fig. 2, Supplementary Fig. 1A), in line with a well-known age-related decline in WM capacity18,29 . AD patients showed overall lower performances compared to elderly controls. Unlike the latter, however, they seemed to cope well with increased Set Size at Encoding (Fig. 3, Supplementary Fig. 2 A), but were mildly impaired during the Maintenance phase, as reflected by the weak Set Size effect at Maintenance.

Mixture Model analysis revealed that healthy ageing is associated with different types of errors: reduced memory Precision, decreased Target detection, increased Misbinding and increased Guessing (Fig. 2B). But while filtering out an item was beneficial in increasing the numbers of detected targets, and reducing Misbinding, it was not able to help maintaining a high memory Precision, which declined invariably across all conditions in elderly subjects (Fig. 2B, panel a). Filtering out a distractor was also beneficial in reducing Guessing, and that was true specifically for EHC, with YHC not showing such effect. A possible interpretation is that the task was more taxing for the EHC, who had to deploy a cognitive strategy to free up space in WM. These results support the idea that memory Precision could be considered as a marker of ageing, as previously reported by other groups15,18,19,20. In this study, we were also able to show that not only memory Precision declines with ageing, but also that AD patients have a further drop in memory precision compared to age-matched healthy controls, albeit of a much smaller magnitude, while this does not occur in PD patients.

Another memory feature that seems to decline with ageing is the ability to correctly identify a target15,81,82. In Study 1, a clear decline in Target detection was observed in elderly compared to younger controls across all conditions (Fig. 2B, panel b). The significant three-way Instruction x Condition x Group interaction further suggested that elderly controls were worse at detecting targets when three items had to be remembered, and when this had to be accomplished during the Encoding phase. AD patients were even less likely to correctly identify a target compared to age-matched controls (Fig. 3B, panel b). This supports previous findings of reduced correct object identity recognition in familial AD35 as well as sporadic, late onset AD44,83.

In line with previous data15,18, Misbinding increased in healthy ageing, particularly if more items had to be remembered (Fig. 2B, panel c). Misbinding was further increased in AD patients compared to healthy controls (Fig. 3B), unlike in PD patients (Fig. 4B). Previous data showed that both familial35and sporadic late onset AD patients12,44 exhibit increased rates of Misbinding. Whilst confirming that object binding is a feature that is frail in healthy ageing, and more so, in AD, we also found that filtering out an item decreases Misbinding rates in EHC, AD and PD.

Lastly, Guessing rates were higher in healthy ageing, AD and PD patients (Figs. 2B, 3B and 4B, panel d), and it was the only metric which could differentiate between PD patients and EHC. This supports previous evidence showing that Guessing is primarily affected in PD, but can be increased in AD patients44. Filtering out an item was able to help Guessing less in EHC but not YHC, in Study 1, was ineffective in AD patients in Study 2, and was beneficial in EHC and PD in Study 3.

Misbinding has been extensively linked to hippocampal dysfunction45,47,84,85,86, which is in line with our results (Fig. 5b). The lack of correlation with whole brain volume, unlike other metrics such as mean absolute error or target detection, further reinforces this view.

In our dataset Guessing was the best predictor of hippocampal volumes across the WM examined. Notably, our sample consisted of an heterogeneous population, comprising also PD patients, which are more prone to guessing, as well more cognitively impaired AD patients compared to similar studies35,46. For misbinding errors to occur, at least a partial memory of the object (its identity) must be retained, while the memory of its location might be lost. When no mnemonic trace is left, Guessing remains the only viable option, and therefore its rates might be higher in more cognitively impaired populations.

Conversely, memory Precision was not associated with hippocampal integrity, which is in keeping with previous findings showing that the parietal cortex rather than the hippocampus could be primarily involved in maintaining information with high detail of precision over time37,87.

Our data demonstrate also that filtering ability during maintenance – but not encoding – is related to hippocampal volume (Filtering rate at Maintenance in Figs. 5a and 6), suggesting that the hippocampus might protect the contents of WM, once encoded. This is further supported by the lack of correlation between Filtering rate at Maintenance and whole brain volume, pointing towards a specific contribution of the hippocampus in WM filtering during maintenance. Hippocampal atrophy, which is a hallmark of Alzheimer’s Disease, might be the driver of the higher deficits in Filtering at Maintenance observed in the current study. Further studies are needed to explore the relationship between other brain regions involved in visual working memory including the prefrontal cortex, parietal cortex and the basal ganglia and patients’ performance, including markers of filtering abilities, working memory capacity and mixture model indices’. The current study, which uses the Mixture Model of WM, was aimed at studying of patients’ populations, where different factors such as memory Precision and Guessing are clearly dissociable. Examples include Rolinski et al, where memory Precision was unaffected but Guessing rates were higher in patients with Rem Behavioural Sleep Disorder and Parkinson’s Disease compared to age-matched healthy controls88 and Zokaei et al, where Alzheimer’s Disease and Parkinson’s Disease patients showed respectively higher rates of Misbinding and Guessing compared to age-matched elderly healthy controls but no change in memory Precision12. However, in healthy individuals, simpler models such as the one by Schurgin et al, showing a predictable relationship between memory Precision and Guessing, might also work effectively89.

Conclusion

Elderly participants and PD had relatively preserved filtering abilities both at encoding and during maintenance. In AD, however, there were significant filtering deficits when the distractor appeared during maintenance. The leading sources of memory error were reduced memory precision in healthy ageing, higher guessing and lower target detection in AD patients and solely higher guessing in PD patients. Finally, hippocampal volume correlated significantly with filtering ability during maintenance – but not at encoding, as well as other mixture model metrics of WM performance, providing further evidence for the role of the hippocampus in WM.