Introduction

Learning is a fundamental process that is crucial for shaping behavior and cognitive development in both humans and animals. It involves the establishment of associations between stimuli and outcomes, thus enabling individuals to adapt to and navigate their environment effectively. Theories of learning encompass diverse perspectives, from Bayesian inference models that explore how people update their existing beliefs based on new observations1 to reinforcement learning models that focus on how rewards and punishments influence future actions2. Such approaches have yielded significant insights into behavior under simple laboratory conditions in which only a few salient cues are linked to reward probabilities and attention is guided by explicit instructions. Yet it is possible that these approaches cannot be generalized to natural environments that offer multiple routes to rewards or that do not provide explicit training.

Learning in real-world environments is far more complex than in simple laboratory settings. In natural environments, input is multidimensional and explicit instructions are rarely available to serve as a guide3,4,5. Consider the example of someone who enjoyed eating a blueberry muffin in the local coffee shop. Why this person enjoyed the muffin is not immediately clear. Is it because the person likes blueberries (pastry dimension), because all the pastries in this specific coffee shop are of high quality (shop dimension), or a combination of the two?

Recent evidence indicates that our cognitive system employs selective attention as a mechanism to simplify the complexities inherent in such learning environments, thereby reducing the dimensionality of the problem at hand4,6. Selective attention holds the task representations that the learning mechanisms act upon7. Occasionally, attentional focus is guided for us, as when the relevance of a dimension is explicitly conveyed. More often, we must glean the informational value of dimensions through experience, which in turn shapes our attentional priorities. Studies in this field highlight the bidirectional interplay between attentional mechanisms and the learning process4,8. From a neurobiological perspective, the assumption is that striatal regions, which are associated with reinforcement learning, and frontoparietal areas, which are responsible for executive control processes, dynamically interact in such complex learning environments8. In computational terms, hybrid models that integrate both reinforcement learning and Bayesian inference strategies explain participants’ choices in such contexts better than traditional reinforcement learning models9.

These insights carry significant implications for understanding various neurodevelopmental conditions in which attention and learning processes, including Bayesian inference, are often atypical10,11,12,13,14,15,16,17. Neurodevelopmental disorders encompass a spectrum of conditions characterized by impaired cognitive, motor, or social development. These conditions typically appear early in childhood and persist without marked remission or relapses18. While the categorization of these disorders remains a topic of ongoing discussion19, the current classification encompasses both ADHD and developmental dyslexia20. These conditions vary considerably in etiology and characteristics, yet they also share high comorbidity and common risk factors21,22.

A plausible common risk factor among these disorders revolves around reinforcement learning14,23,24, often attributed to altered cortico-striatal activity16,25,26. Individuals diagnosed with ADHD and dyslexia demonstrate atypical striatal-cortical connectivity16 and significant learning difficulties across diverse paradigms, notably on probabilistic tasks in which explicit instruction acts as a guide e.g.23,24,27,28, and in which only a few dimensions are present14. Nevertheless, these tasks may underestimate the challenges faced by individuals with ADHD and dyslexia in navigating real-world multidimensional environments devoid of explicit guidance. Furthermore, although attentional functions are sometimes altered in such conditions29, explicit attentional training interventions in neurodevelopmental disorders have proven ineffective30,31. Therefore, understanding how learning and attention interact under less explicit conditions can shed light on how individuals with neurodevelopmental disorders learn behaviorally relevant input regularities in the real world. Computational modelling of behavior is particularly informative here, as similar performance may be driven by different mechanisms.

To study how attention influences learning and is refined through experience among individuals with ADHD and dyslexia, we capitalized on previous studies showing that cue-outcome associations can be acquired when the environment contains multiple stimulus dimensions, only one of which is predictive of reward, and no explicit instructions are afforded6,32,33,34. In this paradigm, participants are required to choose one of three stimuli on each trial and are given reinforcement feedback (Fig. 1). Each stimulus comprises three features defined along three stimulus dimensions: color, shape, and texture. The task includes multiple games. In each game, only the features of one dimension are probabilistically related to reward (e.g., color determines reward regardless of shape and texture). Although the start of a new game is signaled to participants, they are not explicitly informed which dimension is currently predictive of reward. Therefore, as in real-world scenarios, participants must use trial and error to discover which dimension to prioritize for obtaining maximal reward. Hence this task is called the No-Explicit-Assist task. Participants also complete a control condition that is identical to the multidimensional task, except that at the beginning of each game they are explicitly instructed as to which dimension to focus their attention on, eliminating the need to learn the relevant dimension. Hence this task is called the Explicit-Assist task. Participants completed both conditions, thus enabling us to compare learning under less explicit and explicit guidance in ADHD and dyslexia relative to controls.

Fig. 1: Three dimensions (color, shape, texture), each with three separate defining features: shape (circle, square, or triangle), color (yellow, green, or red), and texture (plaid, waves, or dots).

At the beginning of each trial, participants are prompted to select one of three stimuli. Three different features define each stimulus: color, shape and texture. After indicating their choice, participants receive feedback indicating the number of points they receive: 1 (depicted) or 0 points. Subsequently, a new trial begins with a new set of stimuli. In the Explicit-Assist task, participants are informed about the predictive dimension. In the No-Explicit-Assist task, participants do not receive such guidance. To allow repeated measurements of learning, the task is split into several “games”. In any given game, only one of the three stimulus dimensions is predictive of reward. Choosing the most rewarding feature on that dimension results in being awarded a point 75% of the time, while choosing any other feature results in being awarded a point 25% of the time. At the end of each game, a selection screen prompts participants to indicate what they think is the most rewarding feature (or to choose “I don’t know”).

We hypothesized that performance under less supervised training conditions would be reduced in individuals with ADHD and dyslexia due to reported attention and associative learning impairments associated with these disorders. While this reduced performance may be most apparent in the No-Explicit-Assist task, we also considered the possibility that individuals with these neurodevelopmental disorders may experience greater deficits than control participants even when explicit instruction serves as a guide (Explicit-Assist task). This hypothesis is grounded in observations showing impaired simple probabilistic reinforcement learning among individuals with ADHD and dyslexia28,35.

We also sought to determine whether parameters derived from computational models of participant choices could serve as markers for neurodevelopmental disorders. Evidence of reduced reinforcement learning performance in both dyslexia and ADHD, including suboptimal use of strategies36,37,38, led us to expect similar suboptimal strategy use here. Specifically, individuals with dyslexia exhibit selective impairments in incremental learning of stimulus values through trial-and-error feedback36, whereas those with ADHD experience dual impairments in both value-based reinforcement learning and rule-based mechanisms37. Based on this evidence, one can assume a broader disruption in the cognitive processes for learning and adapting to complex reward structures among individuals with ADHD. For instance, problems in Bayesian inference in ADHD10,39 are likely to influence the ability to accurately assess the likelihood of different outcomes and to update beliefs based on new information. Moreover, a growing body of evidence shows difficulties in skill learning and memory consolidation among individuals with dyslexia23,35,40,41,42,43,44, including fast forgetting of recently learned information45,46, whereas declarative memory is preserved23. These findings lead to the prediction that these individuals will have difficulty retaining recently learned feature-reward associations in complex learning environments, especially under less explicit training conditions.

Results

Simple and complex reinforcement learning are affected in neurodevelopmental disorders

One-sample t-tests indicated that performance exceeded chance level (33%) in the ADHD group and its matched control group for both the Explicit-Assist and No-Explicit-Assist tasks (all p values <0.001). A 2 × 2 mixed factorial analysis revealed that participants performed significantly better on the Explicit-Assist task (M = 0.79, S.E. = 0.02) than on the No-Explicit-Assist task (M = 0.53, S.E. = 0.03), F (1, 51) = 147.33, p < 0.001, ηp² = 0.74, consistent with prior findings. Performance was significantly poorer, F (1, 51) = 4.57, p = 0.037, ηp² = 0.08, in the ADHD group (M = 0.62, S.E. = 0.03) than in the control group (M = 0.71, S.E. = 0.02). The group by task interaction was not significant, F (1, 51) = 1.1, p = 0.29, ηp² = 0.02 (Fig. 2a). Despite this lack of significance, simple t-tests (one-tailed) were conducted based on their theoretical importance. Their results indicated that the ADHD group performed significantly worse than controls in the No-Explicit-Assist task, t (51) = 2.37, p = 0.02, but not in the Explicit-Assist task, t (51) = 1.38, p = 0.12 (after Bonferroni correction).

Fig. 2: Overall task performance.

a The percentage of games learned as a function of task in the ADHD group and the matched controls and b in the dyslexia group and the matched controls. Error bars represent one standard error.

One-sample t-tests indicated that performance exceeded chance level (33%) in the dyslexia group and its matched control group for both the Explicit-Assist and No-Explicit-Assist tasks (all p values <0.001). A 2 × 2 mixed factorial analysis revealed that participants performed significantly better, F (1, 41) = 91.52, p < 0.001, ηp² = 0.69, on the Explicit-Assist task (M = 0.80, S.E. = 0.02) than on the more complex No-Explicit-Assist task (M = 0.53, S.E. = 0.02), consistent with prior findings. Performance was significantly poorer, F (1, 43) = 6.67, p = 0.02, ηp² = 0.13, in the dyslexia group (M = 0.62, S.E. = 0.03) than in the control group (M = 0.72, S.E. = 0.02). The group by task interaction was not significant, F (1, 41) = 0.13, p = 0.72, ηp² = 0.003 (Fig. 2b). Despite the lack of significance, simple t-tests (one-tailed) indicated that the dyslexia group differed from the control group in the Explicit-Assist task, t (41) = 2.405, p = 0.02, with a marginally significant difference in the No-Explicit-Assist task, t (41) = 1.85, p = 0.07 (after Bonferroni correction).

Individuals with ADHD rely on suboptimal strategies in complex learning settings

One-tailed independent-samples t-tests applied to the parameters derived from the computational modeling analysis did not reveal any significant differences between the ADHD and control groups on the Explicit-Assist task. On the No-Explicit-Assist task, the two groups differed significantly in the best-fit value of the free parameter alpha, which was lower in the ADHD group. This parameter reflects the balance between the contributions of reinforcement learning and Bayesian learning processes, with lower values indicating greater reliance on the reinforcement learning mechanism. On average, the best-fitting value was 0.23 in the control group and only 0.12 in the ADHD group (see Table 1).

Table 1 Comparisons of the model parameters across each clinical group and its matched controls

Individuals with dyslexia demonstrate heightened decay rates in complex learning settings

We did not observe any significant differences in model parameters between the dyslexia and control groups on the Explicit-Assist task. We did, however, observe a significant difference in the decay rate parameter between the two groups on the No-Explicit-Assist task, such that the parameter was higher in the dyslexia group (see Table 1). This parameter reflects how quickly the influence of past experiences or learned information diminishes over time. In other words, compared to the control group, the dyslexia group more quickly forgot or discounted previously learned information and associations during learning.

Discussion

Learning studies among individuals with neurodevelopmental disorders often focus on simplified learning challenges that offer very few dimensions and provide explicit instructions as a guide. Yet, our understanding remains incomplete when it comes to how these individuals navigate more complex, real-world learning environments where attention and learning processes are in constant interplay.

In the present study we compared the performance of two neurodevelopmental groups to that of controls, focusing on two distinct tasks: an Explicit-Assist simple reinforcement learning task in which participants were informed of the pertinent dimension, and a more intricate reinforcement learning task in which the relevant dimension was kept undisclosed (No-Explicit-Assist task). On the second task, not only did participants face the challenge of learning the feature linked to the highest reward, but they also needed to employ selective attention to learn the dimension most predictive of reward. We observed that overall performance was better on the Explicit-Assist reinforcement learning task than on the more complex No-Explicit-Assist reinforcement learning task, in line with previous studies32,33. However, when compared to their age-matched counterparts, individuals with ADHD and dyslexia exhibited different performance profiles, even though their performance exceeded chance level. Individuals with ADHD faced challenges when they needed to learn to attend to the relevant dimension for maximizing rewards. Those with dyslexia were also affected when the relevant dimension was explicitly communicated to them, setting them apart from their matched controls. This interpretation should be taken with caution, as the task-by-group interaction was not significant in either population. These results align with prior research demonstrating impaired reinforcement learning in individuals with ADHD or dyslexia14,28,47,48. Moreover, they extend these findings to more complex reinforcement learning challenges that are ubiquitous in real-world environments.

We observed that individuals with ADHD and dyslexia performed worse than their matched controls. Deficits in learning, attentional functions, or both may contribute to these shortcomings. It is conceivable that individuals with ADHD and dyslexia can focus their attention, yet they may struggle with trial-and-error learning to identify the most predictive feature or dimension for rewards, or they may rapidly forget the learned associations. It may also be the case that they encounter challenges in initially directing their attention toward relevant features or dimensions, thus limiting the effectiveness of state representation learning mechanisms. Moreover, there may be an atypical bidirectional interaction between learning and attention in these conditions. From a behavioral standpoint alone, a learning impairment is the more likely explanation in the dyslexia group. Attentional deficits, or atypical interactions between attentional and learning processes, would be expected to produce greater impairment on the No-Explicit-Assist task than on the Explicit-Assist task. The fact that, relative to matched controls, performance in the dyslexia group was not disproportionately affected by the increased attentional demands of the task reduces the likelihood that attentional deficits alone account for the observed group differences. In contrast, simple t-tests in the ADHD group point to differences from controls on the No-Explicit-Assist task but not on the Explicit-Assist task. Thus, impaired performance in ADHD may arise from difficulties in focusing attention or from atypical interactions between attention and learning.

Drawing conclusions from behavior alone is limited, as similar performance can sometimes arise from different underlying processes. To delve more deeply into group differences in the trial-by-trial dynamics of learning, we employed a hybrid computational model integrating aspects of both reinforcement learning and statistically optimal Bayesian approaches, consistent with previous studies (Daniel et al.33). Our analysis revealed significantly higher decay rates in the dyslexia group than in their control counterparts on the complex No-Explicit-Assist reinforcement learning task. A similar, though nonsignificant, trend toward higher decay rates was found in the neurodevelopmental disorder groups across all conditions. These findings imply that individuals with dyslexia tend to discount the values of all unchosen options and focus on updating the value of the chosen option. These individuals are more likely to maintain high values for single features than for combinations of features. Individuals with dyslexia exhibit such an updating bias primarily when confronted with high attentional demands, such as in complex multidimensional learning environments.

Two mechanistic explanations can account for the higher decay rates observed in the dyslexia group. One possibility is that individuals with dyslexia employ narrower selective attention during the learning phase, thus attributing the reward to a smaller set of stimulus features. Alternatively, they may be more prone to forgetting recently learned feature-reward associations. At present, it is unclear which explanation is more plausible. Although additional research is necessary to differentiate between these two explanations, our prior research may offer some insights. Our previous computational modeling work suggests that individuals with dyslexia are selectively impaired when incrementally learning the value of stimuli based on trial-and-error feedback, though they perform on par with controls when employing strategies that explicitly represent and evaluate hypotheses36. This indicates that their difficulty lies specifically in the reinforcement-learning mechanism rather than in cognitive strategies that involve hypothesis testing and rule application. Future research is needed to determine whether the impairments observed in the present study are of the same nature. Experimental manipulations that can differentiate between the contributions of selective attention and memory decay to task performance in this population will be particularly informative. Meanwhile, our results highlight the possibility that individuals with dyslexia have difficulties learning the relationships between reward feedback and incidental cues. These findings resonate with previous research showing impaired incidental learning of complex sound categories when associated with notable events and behaviors among individuals with dyslexia49, as well as impaired consolidation of statistically structured input40. Rapid forgetting in dyslexia can allow greater flexibility and more rapid adaptation to new situations50, but it can also hinder the development of long-term perceptual representations, especially in situations where stable long-term knowledge is crucial. It may be that such a bias in dyslexia represents resource-rational behavior in linguistic environments that provide low endogenous precision51,52.

An intriguing difference between individuals with ADHD and their control counterparts revealed by our computational modelling lies in a crucial parameter, alpha, which governs the balance between reinforcement learning and Bayesian learning in determining attentional weights. On the Explicit-Assist task, the alpha parameter hovered near zero in the control group, indicating a predominant reliance on reinforcement learning to steer attention and learning within the specified dimension. Conversely, on the No-Explicit-Assist task, the alpha parameter diverged significantly across the groups, with higher values observed in the control group. These findings suggest that in intricate multidimensional environments, typically developed individuals lean more towards statistically optimal Bayesian learning mechanisms, whereas those with ADHD lean more towards computationally efficient yet suboptimal reinforcement-learning processes. The present findings suggest that the observed learning impairments in ADHD are more closely related to the executive control processes that direct reinforcement learning toward operating on the currently important dimensions of the environment than to a deficiency in reinforcement learning per se. Indeed, those with ADHD appear to face challenges in properly integrating prior knowledge with new information to update beliefs accurately, consistent with recent theoretical models10,39.

The present study underscores the importance of examining both behavioral performance and computational markers to elucidate the impaired mechanisms underlying neurodevelopmental disorders. Although individuals with dyslexia and ADHD show similar performance in navigating complex learning environments, they exhibit alterations in different processes when considering the dynamics of learning across trials: individuals with dyslexia exhibit difficulties in maintaining recently learned associations, whereas those with ADHD tend to rely on suboptimal learning strategies compared to controls. These findings align with previous research showing a selective impairment in value-based learning in dyslexia36 but a broader, dual impairment in ADHD37. The dual impairment observed in ADHD suggests a broader disruption in the cognitive processes for learning and adapting to reward structures, corroborating our prior research14. One leading related theoretical framework posits that a family of neurodevelopmental disorders, including dyslexia and ADHD, is characterized by altered procedural learning and memory systems16. Our results resonate with this theoretical framework but also reveal different potential mechanisms that can contribute to impaired learning in dyslexia and ADHD. Note that while we observed different mechanisms for ADHD and dyslexia, it is possible that both mechanisms affect both groups, but to different extents. Individuals with ADHD showed numerically higher decay rates than controls, and individuals with dyslexia showed numerically lower alpha values than controls, although neither difference was significant. This may indicate that in complex learning scenarios multiple mechanisms affect learning, and the differences between the disorders may depend on the different weights given to these mechanisms rather than on completely separate mechanistic deficits.

Research interest in applying reinforcement learning methods to study cognitive processes in neurodevelopmental disorders is growing51,52. Traditionally, research has focused on simple reinforcement learning tasks. Our study, in contrast, examined how individuals with ADHD and dyslexia navigate complex learning environments in which multiple dimensions predict reward and no explicit assistance is provided. Compared to controls, individuals with ADHD and dyslexia showed impaired learning. Our computational modeling revealed different impaired processes: ADHD is associated with difficulties in updating beliefs based on new evidence, whereas in dyslexia past experiences quickly lose their influence on current decisions. These findings underscore different computational markers in neurodevelopmental disorders, highlighting the need for tailored interventions that address these unique learning and attentional challenges to enhance educational outcomes for individuals with dyslexia and ADHD.

Methods

Participants

Individuals with ADHD and matched controls (N = 21 with ADHD, 10 female; N = 32 controls, 7 female; age range = 20–32, mean = 24, S.D. = 2.58) and individuals with dyslexia and matched controls (N = 20 with dyslexia, 8 female; N = 23 controls, 16 female; age range = 20–27, mean = 24.84, S.D. = 2.66) participated in the study (see Tables 3 and 4 for participants’ behavioral characterization). A control group was matched to each clinical population (dyslexia control group and ADHD control group). Of the 32 participants in the ADHD control group, 22 were also included in the dyslexia control group. Participants with ADHD and dyslexia were recruited through the Yael Learning Disabilities Center at the University of Haifa in Israel. Control participants were recruited through advertisements. All participants were native speakers of Hebrew with no history of neurological or psychiatric disorders. All had normal or corrected-to-normal vision and hearing and normal cognitive ability based on the Raven test53. The Institutional Review Board at the University of Haifa (no. 18/099) approved the study, which was conducted in accordance with the Declaration of Helsinki. Participants received compensation for their participation in the study (150 shekels, approximately $30 USD) and provided signed informed consent.

Cognitive assessments

In addition to completing the experimental tasks, all participants underwent a series of cognitive tests designed to evaluate their cognitive ability, as measured by Raven’s Progressive Matrices53, verbal short-term memory54, word reading55, and attentional functions, as measured by the ADHD Self-Report Scale (ASRS)56. Participants with dyslexia and their matched controls also completed tests designed to assess rapid automatized naming (RAN)57, phonological processing skills (phoneme segmentation, phoneme deletion, and Spoonerisms)58, and non-word reading59. Table 2 presents the details of these tasks, with results summarized in Tables 3 and 4.

Table 2 Psychometric tests
Table 3 Demographic and psychometric data of the ADHD and control group
Table 4 Demographic and psychometric data of the dyslexia and control groups

Participants were included in the ADHD group if they met all the following criteria: (1) a formal diagnosis of ADHD given by a qualified psychologist/neurologist; (2) no comorbid neurodevelopmental disorders, such as developmental dyslexia, DLD, or any sensory or neurological disability; (3) reading skills at or above the average as measured by the word reading test55. Participants were included in the ADHD control group if they had: (1) no history of neurodevelopmental disorder; (2) attentional skills at or above average according to the ASRS56.

Participants were included in the dyslexia group if they met all the following criteria: (1) a formal diagnosis of dyslexia given by a qualified psychologist; (2) no comorbid neurodevelopmental disorders such as ADHD, developmental language disorder (DLD), or any sensory or neurological disability; (3) scores lower than expected on tests of reading skills55,59. Since there are no standardized reading tests for adults in Hebrew, selection was based on local norms, using similar criteria as in other studies conducted among Hebrew readers with dyslexia40,60. In accordance with standard practice in the Hebrew literature57,61, a cutoff of one standard deviation below the mean of the local norms was chosen. Participants were included in the dyslexia control group if they had: (1) no history of neurodevelopmental disorder; and (2) reading skills at or above average based on the word and non-word reading tests55,59.

The ADHD group did not differ from its matched control group in cognitive ability or age, but the groups differed in attentional skills. Similarly, those in the dyslexia group did not differ in age or in attentional or cognitive abilities from those in its matched control group. However, compared to the control group, the dyslexia group displayed a reading disability profile compatible with the symptomatology of dyslexia. The dyslexia group differed significantly from the control group both on rate and on accuracy measures of word reading and decoding skills. Moreover, the dyslexia group demonstrated deficits in the three key phonological domains: phonological processing (Spoonerism, phoneme segmentation, phoneme deletion), verbal short-term memory (digit span), and rapid naming (rapid automatized naming).

Experimental task

The experimental design mirrored that of Daniel et al.33. In this game-like task, participants are presented with three perceptual stimuli differing on one to three dimensions: color, shape, and texture. On each trial, participants are asked to choose among stimuli consisting of a random combination of features (e.g., a green triangle with dots). Each stimulus is defined by three distinct features: color (red, yellow, or green), shape (circle, square, or triangle), and texture (dots, waves, or plaid). At any given moment, only one dimension is relevant for obtaining a reward. Within this dimension, selecting a specific feature results in a high probability of being rewarded (75% chance of receiving 1 point, otherwise 0), whereas choosing either of the other two features has a lower reward probability (25%). For instance, in the stimulus slide depicted in Fig. 1, if the relevant dimension is color and the rewarding feature is red, a participant who chooses the object on the right has a 75% chance of being rewarded.

After becoming familiarized with these rules, participants are told that their goal is to accumulate as many points as possible. In the control Explicit-Assist task, participants are explicitly informed about the dimension predictive of reward (color, shape, or texture). In contrast, in the No-Explicit-Assist task, this explicit prompt is not provided, and the participants’ task is to discover the rewarding dimension. After the familiarization, three stimuli are presented, and participants are required to select one by pressing a corresponding button. Failure to respond within 2 seconds leads to trial abandonment, accompanied by a display of the message “No response was received”. Upon responding, participants are immediately notified whether their choice yielded a point; feedback is presented with no delay (0 seconds). After a fixed intertrial interval (ITI) of 300 milliseconds, the subsequent trial starts. At the end of each game, a selection screen featuring all nine features is shown (see Fig. 1), prompting participants to indicate what they think is the feature most predictive of reward. There is no feedback at this stage. The game ends and a new game begins if participants correctly select the appropriate feature eight consecutive times (successfully learned games) or after 25 failed attempts to identify the appropriate feature (failed-to-learn games). The measure of interest is the number of games successfully learned, where the chance level is 33%.
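As a concrete illustration of this reward schedule, the short R sketch below simulates the feedback for a single choice. The function and variable names are ours and purely illustrative; the actual task was implemented in E-Prime (see Procedure), not in R.

```r
# Illustrative sketch of the reward schedule (names are ours; the task itself ran in E-Prime).
# One dimension is relevant per game; choosing its target feature pays 1 point with p = .75,
# while any other choice pays 1 point with p = .25.
simulate_feedback <- function(chosen_features, relevant_dim, target_feature) {
  # chosen_features: named vector, e.g. c(color = "red", shape = "triangle", texture = "dots")
  p_reward <- if (chosen_features[[relevant_dim]] == target_feature) 0.75 else 0.25
  rbinom(1, size = 1, prob = p_reward)  # returns 1 point or 0 points
}

# Example: color is the relevant dimension and "red" is the rewarding feature
simulate_feedback(c(color = "red", shape = "triangle", texture = "dots"),
                  relevant_dim = "color", target_feature = "red")
```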

Procedure

Participants played ten games (five No-Explicit-Assist and five Explicit-Assist) in a training phase and then continued to the experiment proper. The experiment included 36 games: 18 No-Explicit-Assist games and 18 Explicit-Assist games (6 color, 6 shape, and 6 texture), randomly interleaved. Stimulus presentation and recording of response time and accuracy were controlled by an E-Prime computer program (King & Schneider, 2002). The experiment was conducted in a single session that did not exceed one hour.

Data analyses

Each clinical group (ADHD/dyslexia) was assessed against its matched control group. One-tailed t-tests were used to assess the performance of each group on each task relative to chance (33%). In line with prior research, a mixed-design ANOVA was conducted on the percentage of games learned, with task (Explicit-Assist/No-Explicit-Assist) as the within-subject factor and group (controls vs. ADHD/dyslexia) as the between-subject factor. Moreover, simple t-tests corrected for multiple comparisons were conducted to examine group differences separately for each task. Given prior research e.g.23,24,27,28, we expected the clinical groups to exhibit significantly poorer performance than controls on both the Explicit-Assist and the No-Explicit-Assist tasks.
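For readers who wish to see the structure of this analysis, a minimal R sketch follows. The data frame and column names (d, subject, group, task, games_learned) are hypothetical placeholders rather than the study's actual variable names, and the values are simulated.

```r
# Hypothetical long-format data: one row per participant x task,
# games_learned = proportion of games learned in that task (placeholder values).
d <- data.frame(
  subject       = factor(rep(1:53, each = 2)),
  group         = factor(rep(c("ADHD", "Control"), times = c(21 * 2, 32 * 2))),
  task          = factor(rep(c("Explicit", "NoExplicit"), times = 53)),
  games_learned = runif(106)
)

# One-sample test of one group/task cell against the 33% chance level (one-tailed)
with(subset(d, group == "ADHD" & task == "NoExplicit"),
     t.test(games_learned, mu = 1/3, alternative = "greater"))

# 2 x 2 mixed ANOVA: task (within subject) x group (between subjects)
summary(aov(games_learned ~ task * group + Error(subject / task), data = d))
```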

Modeling the dynamics of learning

Although measures of overall task performance are informative for evaluating group differences, they tell us little about the underlying mechanisms that contribute to these differences. To characterize the temporal dynamics of learning and compare the use of different strategies in the ADHD and dyslexia groups vs. their matched control groups, a computational model was fit to the sequence of each participant’s choices for each task separately. Building on previous work that modeled the current multidimensional reinforcement task8,9,62, the model combined aspects of reinforcement learning2,8 with a more statistically optimal Bayesian approach63. Prior research indicates that the hybrid reinforcement learning/Bayes model provided the best fit to participants’ strategies in the same experimental paradigm33. Hence it was used in the present study to model participants’ choices. By examining the parameters of the hybrid reinforcement learning/Bayes model, we were able to discern subtle discrepancies in learning dynamics among the groups of participants.

The reinforcement learning part of the hybrid model updates the value of each feature of the chosen stimulus (color, shape, and texture) based on the reward obtained by choosing it. This update is controlled by a free parameter η (learning rate); a higher learning rate indicates increased reliance on recent observations. The process also includes another parameter ηk (decay rate) that updates the values of all unchosen features, dictating the width of an implicit attentional filter. Higher decay rates indicate faster decay in the values of unchosen features (i.e., the features of the unchosen options), reflecting reduced representation of those options’ outcomes8. For instance, if on a given trial a participant chooses a red triangle rather than a green square, then under a high decay rate the values assigned to “green” and “square” will be strongly discounted after that trial. Accordingly, higher decay rates imply that less information is retained regarding previous choices that were not strengthened on the current trial, which can be detrimental to task performance. The reinforcement learning part also assigns weights to different dimensions based on their currently learned values during the decision process. The specificity of these weights is controlled by a free parameter \(\delta\), where higher values indicate assigning lower weights to low-value dimensions for the decision at hand, without changing their value estimates.

The Bayesian part of the model assigns values to dimensions based on the frequency with which individual features were associated with reward, using Bayes’ rule64. This process does not require any free parameter but assumes complete memory of all stimulus-reward observations.

Finally, the hybrid model integrates the reinforcement learning and Bayesian parts to generate a value for each stimulus, using a parameter \(\alpha\) that balances the reinforcement learning-based and Bayesian-based attention weights33. High values of \(\alpha\) represent increased reliance on Bayesian learning. This combined value is used in a softmax decision rule, with β (softmax inverse temperature) capturing the noise in participants’ choices, where low β indicates more random choices.

The hybrid reinforcement learning/Bayes model assumes that the probability of choosing stimulus Si depends on its combined (RL and Bayesian) value V. A softmax decision rule with parameter \(\beta\) (inverse temperature) was used to map V(Si) to the probability of choosing it:

$$p\left({choosing}\,{S}_{i}\right)=\frac{{e}^{\beta V\left({S}_{i}\right)}}{\mathop{\sum}\nolimits_{j=1}^{3}{e}^{\beta V\left({S}_{j}\right)}}$$
(1)
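In code, Eq. (1) corresponds to the following short R sketch (our own illustration; the function name and the example values of V and beta are arbitrary):

```r
# Softmax choice rule of Eq. (1): maps the three stimulus values to choice probabilities.
# beta is the inverse temperature; lower beta yields more random (closer to uniform) choices.
softmax_choice <- function(V, beta) {
  exp_v <- exp(beta * V)
  exp_v / sum(exp_v)
}

softmax_choice(V = c(0.6, 0.3, 0.3), beta = 5)   # e.g. ~0.69, ~0.15, ~0.15
```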

To estimate the value of Si, the following function was used, in which separate weight (\(W\)) values were learned for each feature (\(f\)) on each dimension (\(d\)). The value of \({S}_{i}\) was calculated as a weighted linear combination of the values of its features \({f}_{d}\) on the three dimensions:

$$V\left({S}_{i}\right)=\mathop{\sum }\limits_{d=1}^{3}{\phi }_{d}{W}_{{f}_{d}}$$
(2)

Where \({\phi }_{d}\) is the weight of dimension \(d\). On each trial of the task, the value \({W}_{f}\) of each of the features \(f\) of the chosen stimulus was updated based on the prediction error (PE) and a learning rate \(\eta\):

$${W}_{f}^{{new}}={W}_{f}^{{old}}+\eta {PE}$$
(3)

Where PE is computed based on the reward received and the value of the chosen stimulus:

$${PE}=R-V\left({S}_{{chosen}}\right)$$
(4)

In addition, the values of the features \(f\) that were not chosen were updated with a decay rate \({\eta }_{k}\) as follows:

$${W}_{f}^{{new}}=(1-{\eta }_{k}){W}_{f}^{{old}}$$
(5)
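Equations (3)–(5) can be implemented as a single per-trial update of the nine feature weights. The R sketch below is our own illustration (function and argument names are ours, not the published code):

```r
# Per-trial update of the nine feature weights (Eqs. 3-5); names are ours.
# W: named vector of nine feature weights; chosen: the three feature names of the chosen stimulus;
# V_chosen: the model's current value of the chosen stimulus (Eq. 2).
update_weights <- function(W, chosen, V_chosen, reward, eta, eta_k) {
  pe <- reward - V_chosen                    # prediction error, Eq. (4)
  W[chosen] <- W[chosen] + eta * pe          # learning for chosen features, Eq. (3)
  unchosen <- setdiff(names(W), chosen)
  W[unchosen] <- (1 - eta_k) * W[unchosen]   # decay of all unchosen features, Eq. (5)
  W
}
```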

Moreover, an explicit attention filter was implemented, such that the proportion of attentional resources directed toward each dimension was based on an RL-based weight vector \({\phi }^{{RL}}\) and a Bayesian-inference-based one, \({\phi }^{{Bayes}}\).

Concerning \({\phi }_{d}^{{RL}}\) in the No-Explicit-Assist task, the dimension weight was \(\delta\) if dimension \(d\) included the currently highest feature weight, while the remaining \(1-\delta\) was evenly distributed across the other two dimensions:

$${\phi }_{d}^{{RL}}=\left\{\begin{array}{l}\delta\qquad\qquad{if}\; {argma}{x}_{f}({W}_{f})\,\in {d}\,\\ \frac{1-\delta}{2}\qquad\quad{otherwise}\end{array}\right.$$
(6)

In the Explicit-Assist task, \({\phi }_{d}^{{RL}}\) was always \(\delta\) for the instructed dimension and \(\frac{1-\delta }{2}\) otherwise.
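A compact R sketch of the RL-based attention weights of Eq. (6) follows (our illustrative code; dim_of is a hypothetical lookup table from feature names to dimension names):

```r
# RL-based dimension weights, Eq. (6); names are ours.
# W: named vector of nine feature weights;
# dim_of: named character vector mapping each feature to its dimension,
#         e.g. c(red = "color", circle = "shape", dots = "texture", ...).
rl_dim_weights <- function(W, dim_of, delta) {
  dims <- unique(unname(dim_of))
  best_dim <- dim_of[[names(which.max(W))]]           # dimension holding the highest feature weight
  phi <- setNames(rep((1 - delta) / 2, length(dims)), dims)
  phi[best_dim] <- delta                              # delta goes to the favored dimension
  phi
}
# In the Explicit-Assist task, best_dim is simply replaced by the instructed dimension.
```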

Concerning \({\phi }^{{Bayes}}\), it was assumed that participants can employ knowledge about reward probabilities to estimate the probability that each of the nine features \(f\) is the most rewarding one, \({f}^{* }\). This probability was initialized to \(\frac{1}{9}\) for each feature \(f\) at the beginning of the task and was updated based on the rewards R and choices C using Bayes’ rule:

$$p(f={f}^{* }|{C}_{1:t},{R}_{1:t})\propto p\left({R}_{t}|f={f}^{* },{C}_{t}\right)p(f={f}^{* }|{C}_{1:t-1},{R}_{1:t-1})$$
(7)

At the beginning of each trial, \(p(f={f}^{* }|{C}_{1:t-1},\,{R}_{1:t-1})\) was used to derive dimensional attention weights \({\phi }^{{Bayes}}\). \({\phi }^{{Bayes}}\) was normalized using \(z\) so it sums up to one across the three dimensions:

$${\phi }^{{Bayes}}=\frac{1}{z}\left[\sum _{f\in d}p\left(f={f}^{* }|{C}_{1:t-1},{R}_{1:t-1}\right)\right]$$
(8)
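A sketch of Eqs. (7) and (8) in R is given below. The likelihood term is our assumption, derived from the task's 75%/25% reward schedule rather than taken from the published code; p_f, dim_of, and the function names are illustrative.

```r
# Bayesian update of the feature posterior (Eq. 7) and dimension weights (Eq. 8); names are ours.
# p_f: posterior over which of the nine features is the most rewarding one f* (sums to 1).
# Assumed likelihood: if the chosen stimulus contains f*, reward occurs with p = .75, else p = .25.
bayes_update <- function(p_f, chosen, reward) {
  p_reward_if_f <- ifelse(names(p_f) %in% chosen, 0.75, 0.25)
  lik <- if (reward == 1) p_reward_if_f else 1 - p_reward_if_f
  post <- lik * p_f
  post / sum(post)                               # Eq. (7), renormalized
}

bayes_dim_weights <- function(p_f, dim_of) {
  phi <- tapply(p_f, dim_of[names(p_f)], sum)    # sum posterior mass within each dimension
  phi / sum(phi)                                 # Eq. (8), z-normalization across dimensions
}
```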

Lastly, a free parameter \(\alpha\) was used that gave the model the flexibility to rely more on either the RL-based attention weighting approach or the Bayesian one:

$$\phi =\alpha {\phi }^{{Bayes}}+\left(1-\alpha \right){\phi }^{{RL}}$$
(9)
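Equations (2) and (9) together determine a stimulus's combined value on each trial; a minimal sketch with our own names is:

```r
# Blend the two attention vectors (Eq. 9) and score one stimulus (Eq. 2); names are ours.
# phi_rl, phi_bayes: dimension weights from the RL and Bayesian parts (same dimension order);
# W: the nine feature weights; features: the stimulus's three feature names;
# dim_of: named character vector mapping features to dimensions.
stimulus_value <- function(W, features, dim_of, phi_rl, phi_bayes, alpha) {
  phi <- alpha * phi_bayes + (1 - alpha) * phi_rl   # Eq. (9): higher alpha = more Bayesian
  sum(phi[dim_of[features]] * W[features])          # Eq. (2): attention-weighted sum of features
}
```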

The values of the parameters \((\beta ,\eta ,\,{\eta }_{k},\delta ,\alpha )\) were fit individually for each participant by estimating the joint likelihood of all T choices the participant made:

$${\mathcal{L}}=p\left({C}_{1:T}|\beta ,\eta ,{\eta }_{k},\delta ,\alpha \right)=\mathop{\prod }\limits_{t=1}^{T}p({C}_{t}|{C}_{1:t-1},{R}_{1:t-1},\beta ,\eta ,{\eta }_{k},\delta ,\alpha )$$
(10)

Maximum-likelihood estimates of all parameters were obtained by minimizing the negative log likelihood of the participant’s choice data.

We estimated each participant’s free parameters by calculating the joint log likelihood of all choices and fitting the model separately for each participant with the optim function in R, identifying the parameter values that minimize the negative log likelihood. A full description of the model is provided in Daniel et al.33, and the code and data are available online (see data and code availability).
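To illustrate how such a per-participant fit might look with optim, a hedged sketch follows. Here neg_log_lik is a placeholder for a function that runs the hybrid model over one participant's trial sequence and returns the negative log likelihood of Eq. (10); the starting values and parameter bounds are our own illustrative choices, not those of the published code (which is available online).

```r
# Illustrative per-participant fit (not the published code; see data and code availability).
# theta = c(beta, eta, eta_k, delta, alpha); neg_log_lik(theta, data) is assumed to return
# -sum(log p(C_t)) of Eq. (10) for one participant's choice data.
fit_participant <- function(data, neg_log_lik) {
  optim(
    par    = c(beta = 5, eta = 0.3, eta_k = 0.3, delta = 0.6, alpha = 0.5),  # starting values
    fn     = neg_log_lik,
    data   = data,                     # passed through to neg_log_lik
    method = "L-BFGS-B",
    lower  = c(0, 0, 0, 1/3, 0),       # illustrative bounds (delta >= 1/3; rates and alpha in [0, 1])
    upper  = c(100, 1, 1, 1, 1)
  )
}
```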

We assessed whether each clinical group differed in the model parameters from its matched control group (one-tailed independent-samples t-tests; Bonferroni-corrected, p < 0.01 considered significant). Given prior research e.g.23,24,27,28, we expected the clinical groups to demonstrate reduced learning abilities relative to controls. Specifically, relative to controls, the clinical groups were expected to show less incremental updating, resulting in reduced learning rates (lower η values); elevated noise (lower β values); weaker representation of unchosen features, manifested in higher decay rates (higher ηk values); reduced dimension weighting (lower \(\delta\) values); and reduced reliance on Bayesian learning (lower \(\alpha\) values).