Reversal learning is influenced by cognitive flexibility and develops throughout early adolescence

Bamberg, Christoph; Weigelt, Sarah; Hagelweide, Klara

doi:10.1038/s41539-025-00308-3

Article
Open access
Published: 12 May 2025

npj Science of Learning volume 10, Article number: 27 (2025)

Subjects

Human behaviour

Abstract

Learning behavioural responses and adapting them based on feedback is crucial from a young age, continuing to develop into young adulthood. This study examines the development trajectory and contributing factors from childhood to adulthood using a reversal learning paradigm. We tested 202 participants aged 10 to 22 in an online study, where they learned and reversed stimulus-outcome associations in a new blocked design paradigm and were assessed for working memory capacity. Results showed that reversal learning performance improved with age, particularly for 10- to 14-year-olds. Flexible responses to negative feedback correlated with better reversal learning. Additionally, pubertal development and working memory were positively associated with reversal learning. These findings align with previous research, highlighting flexible feedback responses as a key factor in reversal learning. As the overall rate of flexible reactions did not change with age, it could support reversal learning independent of age, potentially changing its role during development.

Introduction

Learning is a life-long process. Not only is what we learn the result of a continuous process of adaptation, but the very way we learn also changes throughout our lifetime (e.g.,^1,2). For instance, adults take a strikingly different approach to learning a new language than infants, who begin to speak without even having a concept of language. This indicates that learning does not just gradually increase in quantity but also changes in quality during development³.

In adolescence in particular, both improvements and qualitative changes in learning processes are accompanied by far-reaching restructuring of the brain^4,5. In this way, accelerated growth and maturation go together with an expansion of cognitive, affective, and self-regulatory abilities that pave the way for the transition from childhood to adulthood⁶. Neuroscientific studies show that changes in the brain that begin at puberty alter social information processing⁷ and the intake, integration, and retrieval of information in general^8,9. Accordingly, developments in underlying reinforcement learning processes are assumed to shape social behaviour and social learning during adolescence^10,11. Notably, adolescents do not automatically outperform children in all learning tasks or situations. For instance, fear extinction, i.e., the unlearning of a conditioned fear response, seems to be slowed in adolescents compared to both children and adults^12,13. This observation is supported by studies in juvenile rodents¹⁴ and can be related to developmental plasticity in the neuronal circuits underlying fear conditioning². Furthermore, children seem to be more receptive to statistical properties of phonetic input¹⁵ and motor sequences¹⁶ than adolescents and adults and were found to be more open to unusual causal relations than older participants¹. These studies show the complexity of learning and point to the importance of investigating the developmental trajectories of different learning processes and their underlying mechanisms. Research on one aspect of learning, namely incorporating instrumental feedback, shows significant changes in adolescence that are not yet fully understood.

Reinforcement learning includes the ability to form stimulus-outcome associations via instrumental feedback. As a simple example of an instrumental reinforcement learning task, participants may get positive feedback if they press a button in reaction to a blue triangle, but negative feedback if they press the same button in reaction to a red square. This feedback is instrumental in that it informs the individual on how to adapt their response and increase the associated reward of their response. Oftentimes feedback is provided in a probabilistic way, so that, for example, in 80% of trials the blue triangle is associated with “correct” feedback and in 20% with “incorrect”, making it more challenging to learn the dominant association¹⁷. The ability to learn associations from probabilistic information was found to increase with age, especially during adolescence^18,19,20. Nevertheless, in some studies no age-dependent performance differences were observed²¹, or adolescents outperformed adults when false conceptual information misguided the older participants²² or even in a simple probabilistic learning task²³. Hence, while instrumental learning seems to generally improve with age¹¹, specific aspects of it may follow a more complex trajectory²⁴.

How learned associations are adapted to new circumstances is one important aspect that may help to understand the intricate development of instrumental learning. Reversal learning (RL) paradigms test the ability to revise stimulus-outcome associations by changing instrumental feedback after participants learned to exhibit a certain response to a stimulus. Following the example from above, a button press in reaction to the blue triangle may suddenly turn out to be “incorrect”, reversing its association. Research generally indicates that RL ability increases during adolescence^25,26,27,28,29. This might be based on an increase of brain signal variability between childhood and mid-adulthood^30,31. Nevertheless, results are not consistent. For instance, Eckstein and colleagues found the best learning strategies and highest RL performance in adolescents (13–15 years) compared to both children and adults³². Similarly, in a more recent study by Eckstein and colleagues of 291 participants aged 8–30 using a reversal-learning task designed to test the balance of persistence and flexibility, adolescents in their mid-teen years were found to outperform both younger and older age groups³⁰. A possible peak of RL in adolescence is supported by some evidence from rodent studies: For instance, juvenile mice performed better on an odour discrimination task with reversals than adult mice³³. In contrast, Hauser and colleagues³⁴ did not observe performance differences between adolescents (12–16 years old) and adults in their RL paradigm. Nevertheless, using a computational modelling approach for the employed learning strategy, they found that adolescents learned faster from negative reward prediction errors than adults, i.e., they showed higher sensitivity to changes in reward expectations.
Similarly, van den Bos and colleagues did not find behavioural differences in learning associative rules comparing 8- to 11-year-olds, 13- to 16-year-olds and 18- to 22-year-olds in a probabilistic associative learning task. Nonetheless, they did find differences in neural activation between children and adults when comparing the processing of positive vs. negative feedback. These studies highlight that changes in the processing of negative feedback should be considered when investigating the development of RL performance during adolescence.

Indeed, studies found that children, adolescents, and adults use negative feedback to a different extent to change their behaviour: A study comparing 8- to 9-year-olds with 11- to 13-year-olds and 18- to 25-year-olds indicates that only children learn less well from negative instrumental feedback, i.e., the information that a former response was incorrect, compared to positive instrumental feedback²¹. Van der Schaaf et al. found that adolescents between 16 and 17 years in particular learn most strongly from negative instrumental feedback, compared to both children and adults²⁷. Reviewing neurodevelopmental research, DePasque and Galván assume that the prefrontal control to learn from negative feedback might develop during adolescence and that its effect on instrumental and reversal learning depends on task properties such as the informative value and quality of feedback¹⁸. They further suggest that adolescents exhibit greater cognitive flexibility in response to negative feedback compared to adults, which might lead to an advantage especially seen in reversal learning tasks²⁹. Moreover, this may be driven by increased dopamine-driven flexible behaviour in adolescence³⁵.

To summarise this body of research, the ability to reverse learned stimulus-outcome associations seems to increase from childhood to adulthood. Rapid changes in performance during adolescence and sometimes superior performance compared to adults were observed, although not always on the level of reversal performance but in relation to the underlying handling of instrumental feedback. This suggests that not only RL changes with age, but the way feedback is used to form new stimulus-outcome associations is subject to developmental change as well. Only a few studies investigated to what extent and in what way children and adolescents use negative instrumental feedback during RL compared to adults, although it seems to be an important factor contributing to the idiosyncratic RL performance in adolescence. To better understand these two aspects - the developmental trajectory of RL and the importance of reacting to negative feedback for RL performance - we designed a novel RL paradigm.

The reviewed studies on the development of RL mostly used a “continuous” design, i.e., stimulus-outcome associations reversed after a fixed number of correct answers learned from instrumental feedback^25,34,36. Here, we take a different approach with a new paradigm by guiding the participant’s learning throughout the course of the experiment. The paradigm is based on an associative learning task by Kinner and colleagues (ref. ³⁷; adapted from ref. ³⁸) that was modified to provide probabilistic feedback (80% valid and 20% invalid feedback trials). The paradigm uses a blocked design where invalid feedback appears at the beginning and end of each block, with valid trials in the middle. True stimulus-outcome associations may change between blocks, requiring participants to determine whether feedback signals a genuine change or is misleading. A key focus is how flexibly participants adapt their responses to negative feedback. Furthermore, participants can properly learn the associations before a reversal takes place, as at least eight consecutive valid trials are provided in the middle section of a learning block for each stimulus. Previous paradigms mostly used shorter phases of valid feedback (e.g., five trials) and more reversals. On the one hand, this set-up makes it more feasible to control the difficulty of the paradigm, so that participants at the age of 10 years manage to learn the associations even via probabilistic feedback. On the other hand, this helps to distinguish the associative-learning aspects from those related to flexibly reacting to feedback: We can measure the participants’ tendency to react flexibly to negative instrumental feedback (henceforth referred to as flexible reactions, omitting the qualification “to negative instrumental feedback”) at the beginning of each of three learning blocks independent of overall RL performance.
Furthermore, we can observe the participants’ ability to adapt their response strategy according to the overall structure of the experiment: Reacting flexibly in the first learning block can be interpreted as openness to meaningful feedback; this is also true in the second and third block, as a break between learning blocks and half of the stimuli occurring in a new context indicate a potential change in contingency. Nevertheless, a decrease in flexible responses over the three blocks indicates an adaptation to the paradigm’s structure: Since feedback at the beginning of a block is partly misleading, it is most effective to stick to the association learned in the previous block and change the association only after repeated negative feedback.
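The block structure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, ten trials per stimulus and block, with one invalid trial at the start and one at the end; the function names and these counts are hypothetical and are not taken from the study materials. Under these assumptions the schedule yields the stated 80% valid feedback and eight consecutive valid trials.

```python
def block_feedback_schedule(n_valid_middle=8, n_invalid_start=1, n_invalid_end=1):
    """Per-stimulus feedback validity for one block (True = valid, False = invalid).

    Hypothetical counts: the text only states that invalid feedback sits at the
    beginning and end of a block and that at least eight valid trials follow
    each other in the middle.
    """
    if n_valid_middle < 8:
        raise ValueError("expected at least eight consecutive valid trials")
    return (
        [False] * n_invalid_start
        + [True] * n_valid_middle
        + [False] * n_invalid_end
    )


def valid_proportion(schedule):
    """Share of trials in the schedule that carry valid feedback."""
    return sum(schedule) / len(schedule)


schedule = block_feedback_schedule()
print(schedule)
print(valid_proportion(schedule))  # 0.8 with the default, assumed counts
```

Placing the misleading trials at the block boundaries is what lets flexible reactions be scored at the start of a block, separately from how well the association is learned in the valid middle section.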

We also added several questionnaires to the study to identify other age-dependent factors that may influence task performance in addition to the tendency for flexible reactions. Studies indicate that higher working memory has a positive impact on cognitive tasks in general (e.g.,^39,40,41,42) and is related to better probabilistic learning performance⁴³. As it has not yet been clarified whether this relationship changes from childhood to adulthood¹⁹, we included a measure of working memory capacity to test its relation to reversal learning performance. Furthermore, we assessed the motivation to engage in an effortful cognitive task, measured by the need for cognition⁴⁴, as a possible factor that might interact with the effect of working memory capacity on performance, especially during adolescence^45,46.

In sum, in this study we used a new paradigm to investigate developmental changes in RL performance, its relation to working memory and need for cognition, and the role of a specific aspect of the learning process, the reaction to negative instrumental feedback.

The main questions of interest in this study are how RL performance develops from early adolescence to young adulthood and how it is related to flexible reactions. We also ask how these flexible reactions differ between age groups, i.e., between early-adolescents (10 to 13 years), mid-adolescents (14 to 17 years) and young adults (18 to 22 years). Furthermore, we explore the relation of working memory capacity and the motivation to engage in effortful cognitive processes to RL performance. We preregistered further hypotheses focusing on probabilistic learning, the role of valence and contextual information on performance, and the stability of responses. These questions are beyond the scope of this article and will be considered in a later publication (see the public preregistration on the OSF repository: https://osf.io/ep5ua).

The hypotheses that we investigate here are the following: In line with the majority of findings on RL during development, we expect older participants to learn reversed stimulus-outcome associations better (H1a) and faster than younger participants (H1b). According to the studies on the role of negative feedback, this RL performance is hypothesised to be generally higher in participants that react flexibly to negative instrumental feedback (H2a). Nevertheless, we expect that the mid-adolescents’ openness to a behavioural change after negative feedback may override the learned prediction of the correct response, hindering RL instead of facilitating it. Therefore, the association between RL performance and flexible responses is expected to be positive in adults and early-adolescents, but non-existent or negative in mid-adolescents (H2b). Based on a study that investigated reactions to negative instrumental feedback in adolescents²⁷, we expect that flexible reactions are higher in mid-adolescents compared to early-adolescents and adults, following an inverted-U trajectory (H3a). Additionally, we hypothesised that adults optimise their response strategy more successfully than adolescents, i.e., adults should decrease their flexible reactions throughout the experiment as a sign of strategy adaptation, while early- and mid-adolescents should not (H3b). Finally, going further into the factors underlying successful RL, we expected working memory capacity and the motivation to engage in an effortful cognitive process to both be positively associated with RL performance (H4a), motivation to be more crucial for younger than for older participants, so that the effect of motivation decreases with age (H4b), and working memory and motivation to interact in their effect on RL (H4c). In addition, we explored whether there are sex differences for these outcomes.

Results

In our analysis, we first considered the necessary precursor for reversal learning, namely that the initial stimulus-outcome associations are learned before they can be reversed. After confirming this, we examined the relation between age and RL (H1a, H1b) as well as the association between puberty scores and reversal learning. Third, we investigated the link between flexible reactions and RL (H2a). Fourth, we assessed age group differences in these flexible reactions and their change during the experiment (H2b, H3a, H3b). Lastly, we explored the effects of motivation and cognitive capacity on performance in our paradigm (H4a-c). Here, we report the statistics without outliers removed. However, the supplementary analysis includes corrections for data points that exert a strong influence on the regression, which were excluded if their Cook’s D was above one. This additional caution had only a minor impact on the results.
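To make the Cook’s D criterion concrete, the following is a minimal sketch for a simple linear regression with one predictor, written in plain Python. It is not the authors’ analysis code; the function name and the toy data are hypothetical, and only the cut-off of one comes from the text.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in a one-predictor OLS fit."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances


# Toy data (invented): the last point combines high leverage with a poor fit.
x = [1, 2, 3, 4, 5, 20]
y = [1, 2, 3, 4, 5, 0]
d = cooks_distance(x, y)
flagged = [i for i, di in enumerate(d) if di > 1]  # cut-off of one, as in the text
print(flagged)
```

In this toy example only the last observation exceeds the cut-off, because Cook’s D grows with both the size of the residual and the leverage of the point.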

Did Participants Learn the Initial Associations?

RL presupposes proper learning of a stimulus-outcome association. Hence, we excluded participants who did not perform above our pre-specified initial learning threshold in the first learning block (N = 54, 27%, see Supplementary Table 1). Excluded participants tended to be younger (logistic regression for above/below threshold: coeff age = −0.15, SE = 0.04, 95% CI = [−0.2, −0.03], p = 0.006). After excluding these participants, the analysed sample comprised N = 142 participants (mean age M = 17.32, SD = 3.74; see Table 1), of whom 32% were male (see Supplementary Table 2).

Table 1 Age Distribution Of The Final Sample

As a next step, we considered how well participants learned stimulus-outcome associations over the whole experiment. The pattern of learning performance in the different age groups is as expected: Early-adolescents performed significantly worse in the second and third block compared to the first learning block (see Supplementary Table 2; Supplementary Fig. 1). In contrast, mid-adolescents’ performance is only significantly decreased for the third compared to the first learning block. The performance of adults did not change significantly from learning block one to two or three.

This shows that the paradigm allowed learning the stimulus-outcome associations and elicited differences between age groups as planned. After this initial confirmation of our approach, we turn to our hypotheses on RL.

Does Reversal Learning Differ Between Age Groups?

We observed the expected age-dependent increase in RL performance: For an increase in age by one year, the estimated improvement in RL performance is 0.01 units (SE = 0.003, 95% CI = [0.01, 0.02], p < 0.001; Fig. 1). This means that, for example, a 17-year-old participant compared to a 10-year-old participant is estimated to have a reversal score 0.07 units higher on a scale from zero to one. This association remained when adding sex as a covariate (see below). To better understand this relation between age and RL, we fitted a regression spline model with separate slopes for early-, mid-adolescents and adults (Fig. 2). For early-adolescents (between the y-intercept and the first knot at 14 years), the estimated slope is 0.05 (see Supplementary Table 3); for mid-adolescence it is −0.002; and for adults 0.01. This indicates that the relationship between age and RL performance is strongest in early-adolescents between 10 and 14 years compared to the rest of the sampled age range.
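The segment-wise slopes can be read as a piecewise linear trend. The sketch below takes the three reported slopes at face value as per-year changes and assumes, for illustration only, a second knot at 18 years and an upper bound of 22 years (the text names only the first knot at 14); the constant and function names are hypothetical.

```python
# (segment start, segment end, slope per year); the 18-year knot is an assumption.
SEGMENTS = [
    (10, 14, 0.05),    # early-adolescents
    (14, 18, -0.002),  # mid-adolescents
    (18, 22, 0.01),    # adults
]


def predicted_change(age_from, age_to):
    """Sum slope * years over the part of each segment inside [age_from, age_to]."""
    total = 0.0
    for start, end, slope in SEGMENTS:
        overlap = min(age_to, end) - max(age_from, start)
        if overlap > 0:
            total += slope * overlap
    return total


# Single-slope model from the text: 0.01 units per year over seven years.
linear_change = 0.01 * (17 - 10)
print(round(linear_change, 3))             # 0.07
print(round(predicted_change(10, 14), 3))  # 0.2
print(round(predicted_change(14, 18), 3))  # -0.008
```

Read this way, almost all of the age-related gain in the piecewise model falls between 10 and 14 years, which is the pattern the paragraph above describes.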

**Fig. 1: Scatterplot For Age and RL Performance.**

**Fig. 2: Linear Basis-Spline Regression for Age and RL Performance.**

As a next step, we investigated whether this relation is also true for RL response times (H1b). Against our expectations, we did not find a significant linear decrease in log-response times with age (coeff = −0.01; SE = 0.01, 95% CI = [−0.03, 0.001], p = 0.086). Yet, when excluding response times below 200 ms, which are too fast to be plausible responses to the task, the relation with age is significant (coeff = −0.02; SE = 0.01, 95% CI = [−0.03, −0.001], p = 0.011).

Does Reversal Learning Differ Depending on Pubertal Maturation and Sex?

As an exploratory step, we assessed how much of the developmental change in RL performance can be explained by self-reported puberty levels for underage participants (N = 75, pubertal development data for one participant was missing). With every unit increase in puberty, the RL performance increases by 0.2 units (SE = 0.09, 95% CI = [0.008, 0.38], p = 0.04).

There were approximately twice as many female as male participants in the sample (see Supplementary Table 5). Sex as well as age show a significant association with RL performance when adding them to the same regression model (coeff female sex = 0.05; SE = 0.02, 95% CI = [0.001, 0.09], p = 0.045; coeff age = 0.01; SE = 0.003, 95% CI = [0.01, 0.02], p < 0.001).

In sum, RL performance was slightly higher for older participants, especially within the early-adolescent group. Furthermore, it increased with pubertal development and was higher with female sex but did not co-occur with faster response times.

What is the Relation between Reversal Learning Performance and Flexible Reactions to Negative Instrumental Feedback?

After establishing a general relation between maturation and RL performance, we turned in more detail to a specific determinant of RL: flexibly reacting to negative instrumental feedback. As hypothesised (H2a), there is a positive association between RL performance and flexible reactions: For a 0.1 increase in flexible reactions (scale from 0 to 1), there is a 0.036 increase in reversal learning performance (SE = 0.09, 95% CI = [0.18, 0.54], p < 0.001; Fig. 3). This effect remained significant when controlling for sex as a covariate (see Supplementary Table 6).
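A score on a 0-to-1 scale for flexible reactions could be computed as the proportion of negative-feedback trials that are followed by a changed response. The sketch below illustrates that idea only; the exact scoring rule used in the study is defined in its Methods, and the function name, the data layout, and the toy trial sequence here are assumptions.

```python
def flexible_reaction_rate(trials):
    """Proportion of negative-feedback trials followed by a different response.

    trials: ordered list of (response, got_negative_feedback) tuples for one
    stimulus. Returns None if no trial with negative feedback has a successor.
    """
    switches = 0
    opportunities = 0
    for (response, negative), (next_response, _) in zip(trials, trials[1:]):
        if negative:
            opportunities += 1
            if next_response != response:
                switches += 1
    if opportunities == 0:
        return None
    return switches / opportunities


# Toy sequence (invented): two negative-feedback trials, one followed by a switch.
trials = [("A", True), ("B", False), ("B", True), ("B", False)]
rate = flexible_reaction_rate(trials)
print(rate)  # 0.5
```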

**Fig. 3: Scatterplot For Flexibility and Reversal Learning Performance.**

How Do Age and Flexible Reactions Together Contribute to Reversal Learning?

We found age and flexible reactions to be associated with RL performance in separate regression models. When using them both in the same regression model, i.e., controlling for the variance explained by the other factor, each of them still significantly explains RL performance (coeff age = 0.01, SE = 0.003, 95% CI = [0.006, 0.02], p < 0.001; coeff flexible reactions = 0.32, SE = 0.09, 95% CI = [0.14, 0.49], p < 0.001). This indicates that both higher age and a stronger tendency to change one’s answer after negative feedback contribute to better RL performance.

Moreover, we hypothesised both factors to interact in a specific way (H2b): better performance in learning reversed contingencies should be associated with more flexible reactions in adults and early-adolescents while we expected this association to be non-existent or negative for mid-adolescents, based on previous research. To test this hypothesis, we fitted linear regressions with the predictor flexible reactions separately for early-adolescents, mid-adolescents, and adults. Indeed, results show no significant relation between flexible reactions and RL performance in early-adolescents or mid-adolescents while a significant association was observed in adults (Table 2; Fig. 4).

**Fig. 4: Relationship Between Flexibility and RL Performance for Early-Adolescents, Mid-Adolescents and Adults.**

Table 2 Results of Regressions with Flexible Reactions for Each Age Group

Nevertheless, when we predicted RL performance by age, flexible responses and the interaction between the two variables in a regression model, thereby assuming a linearly increasing influence of age on the positive effect of flexible reactions on RL, we found no significant interaction between age and flexible reactions (coeff = 0.03, SE = 0.02, 95% CI = [−0.01, 0.07], p = 0.174) while age and flexible responses still significantly predicted RL performance. Taken together, these results indicate that participants with a higher tendency for flexible reactions performed better, irrespective of their age, but that this relation was most pronounced for adults.

What is the Relation Between Age and Flexible Reactions to Negative Feedback?

To test our hypothesis that flexible reactions are highest in mid-adolescence, we fitted a model with a linear and a quadratic term for age as predictors of flexible reactions. We found evidence neither for a quadratic relation between age and flexible reactions (coeff = −0.01, SE = 0.01, 95% CI = [−0.03, 0.01], p = 0.56) nor for a linear relation (coeff = 0.01, SE = 0.01, 95% CI = [0.0, 0.03], p = 0.16), with no change when adding sex as a covariate (see Supplementary Table 7). Since a quadratic term for age did not describe the development in flexible reactions well, we conducted an exploratory basis-spline regression for flexible reactions (Fig. 5). We only found significant basis functions for early-adolescents and adults. The slope for early-adolescents, 0.03, was three times as large as for adults, 0.01 (see Supplementary Table 4). The slope for mid-adolescents is −0.01.

**Fig. 5: Linear Basis Spline Regression for Age and Flexible Reactions.**

After considering the flexible reactions throughout the whole paradigm, we also investigated differences between the three learning blocks. We anticipated that adults would reduce flexible reactions in response to negative feedback, indicating that they saw through the pattern of invalid feedback at the beginning of a block, while adolescents would not show this adaptation (H3b). To assess this, we applied linear regression models to each age group, using the learning block as a categorical predictor with the first block as the intercept. In early-adolescents, flexibility significantly decreases from the first to the second learning block (Table 3; Fig. 6) but not from the first to the third learning block. In mid-adolescents, there is neither a significant change in flexibility from learning block one to two nor from one to three. In adults, flexibility significantly decreases between the first and second learning block and between the first and third learning block. Notwithstanding the reduction from the first to the second learning block in early-adolescents, only adults showed the expected decrease in flexible reactions in both learning block two and three, indicating that they may have understood the feedback at the beginning of the learning blocks to be invalid.
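With a categorical block predictor and the first block as the reference level, an ordinary least-squares fit reduces to simple mean comparisons: the intercept equals the mean of block one, and each block coefficient equals that block’s mean minus the block-one mean. The sketch below shows this equivalence on invented numbers; the function name and the data are hypothetical and do not reproduce the values in Table 3.

```python
def block_contrasts(scores_by_block, reference=1):
    """Return (intercept, {block: coefficient}) for dummy-coded blocks.

    For a single categorical predictor, the OLS intercept is the mean of the
    reference block and each coefficient is the difference in means to it.
    """
    means = {block: sum(v) / len(v) for block, v in scores_by_block.items()}
    intercept = means[reference]
    coefficients = {
        block: mean - intercept
        for block, mean in means.items()
        if block != reference
    }
    return intercept, coefficients


# Invented flexibility scores for one age group across the three learning blocks.
scores = {1: [0.6, 0.8], 2: [0.4, 0.6], 3: [0.3, 0.5]}
intercept, coefficients = block_contrasts(scores)
print(round(intercept, 2), {b: round(c, 2) for b, c in coefficients.items()})
```

A negative coefficient for block two or three therefore corresponds directly to fewer flexible reactions in that block than in the first block.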

**Fig. 6: Change In Flexible Reactions Over Learning Blocks, Split By Age Group.**

Table 3 Flexible Reactions in Learning Blocks, Split by Age Groups

Influences of Working Memory Capacity and Need for Cognition

Besides testing the relation between age, flexible reactions and RL, we also investigated whether there is an association with working memory and Need for Cognition (NFC) as two likely candidates of situational and personality factors, respectively, that may influence RL (H4a-c). We found a positive association between working memory and reversal learning performance (coeff = 0.03, SE = 0.01, 95% CI = [0.009, 0.053], p = 0.007). However, NFC and RL performance were not significantly associated (coeff = 0.014, SE = 0.01, 95% CI = [−0.006, 0.04], p = 0.18). There was no interaction between working memory or NFC and age in their effect on RL. Interestingly, in an exploratory analysis, working memory was significantly associated with flexibility (coeff = 0.02, SE = 0.004, 95% CI = [0.007, 0.02], p < 0.001). Moreover, working memory was higher for older participants (coeff = 0.08, SE = 0.02, 95% CI = [0.04, 0.13], p < 0.001).

Discussion

We investigated how RL differs between age groups and in relation to flexible reactions to negative instrumental feedback. Our main finding is that older participants learned stimulus-outcome association reversals slightly better than younger participants, which is consistent with our first hypothesis. This was especially the case for early adolescents and may be accounted for by pubertal development. Reaction times did not change in relation to age according to our preregistered analysis. When excluding implausibly fast responses as an exploratory analysis, age is associated with faster responses. In line with the second hypothesis, participants who reacted more flexibly to negative feedback showed higher RL scores than participants who reacted less flexibly. Only partly in accordance with our expectation, this relation was most pronounced in adults. Third, we hypothesised an inverted-U relation between age and the number of flexible reactions, meaning that adolescents should react most sensitively to negative feedback. Contrary to our expectation, we found no general differences in flexibility between age groups. Nevertheless, as expected, adults reduced their responsiveness to negative instrumental feedback over the course of the experiment, indicating that they adapted their response strategy, unlike early-adolescents and mid-adolescents. Fourth, we found that working memory scores were associated with RL and flexible reactions, while NFC did not show any associations. Taken together, RL seems to improve during adolescence, especially between the ages of 10 and 14, and may be supported by proficient use of instrumental feedback. Below, these two aspects and their relation are discussed in more detail.

Our cross-sectional developmental results are in line with some previous research using different kinds of RL tasks. For instance, Crone et al.²⁵ identified children between 8 and 11 years of age, compared to adolescents and adults, as the group performing worst in RL. In their task stimulus-outcome associations changed continuously after two to four correct trials. In a similar continuous task with reversals after six to ten trials executed by children between 8 and 12 years as well as adolescents between 13 and 16 years, children also showed lower performance than the adolescent group²⁹. Although we used a blocked design, our finding that adolescents between 10 and 14 years show the highest performance increase may therefore extend to younger participants as well. Hauser and colleagues³⁴, who did not include participants below 12 years, found that adolescents (12 to 16 years old) did not differ from adults in the number of switches in their response behaviour when contingencies reversed after at least six correct choices. Similarly, we also only observed small performance differences between older adolescents and adults. Van der Schaaf and colleagues²⁷ tested RL performance in four age groups (10 to 11 years vs. 13 to 14 years, 16 to 17 years, and 20 to 25 years) with a task that changed associations after four to six correct responses. They also found the strongest increase in performance in 10- to 11-year-olds and decreasing improvements for 13- to 14-year-olds while performance was similar in the two oldest age-groups. Summing up, these studies support our finding that RL accuracy may reach a ceiling in late adolescence and that the highest gain in the ability to revise associations happens during early adolescence. This is in line with the observation that executive function more broadly improves more strongly in early adolescence compared to later adolescence⁴⁷.

Complementing these observations, we explored to what extent the development of RL in adolescents can be explained by pubertal development. We tested the influence of pubertal development with self-reported puberty scores, which are inherently less precise than external medical assessments⁴⁸. Notwithstanding this caveat, we found that progress in pubertal development is related to higher RL performance amongst adolescents. These results may indicate that pubertal changes contribute to the observed improvements, but further studies are needed to examine the influence of puberty on learning measures more thoroughly.

Besides age or pubertal development, sex explains some differences in reversal learning performance in our study, with female participants performing better than male. There may be a genuine difference between female and male RL, or the higher performance of girls and women may point to the fact that females mature earlier than males in some regards, e.g., in grey matter development⁴⁹ or structural connectivity⁵⁰, and show earlier changes of reward-related behaviours⁵¹. This would further support the cross-sectional developmental differences observed in our study.

Furthermore, we found better RL performance to be related to higher working memory capacity irrespective of age, i.e., all participants similarly profited from higher executive function. Older participants had higher working memory scores, which could be related to the cross-sectional increase in RL performance we found. This is in line with research showing that the ability to retain and manipulate information in working memory promotes cognitive flexibility⁵². Interestingly, participants with higher working memory scores also tended to react more flexibly, pointing to the importance of cognitive capacity for flexible behavioural adaptation. In summary, previous behavioural research supports our result that flexibly relearning stimulus-outcome associations is most challenging for adolescents between 10 and 14 years. Still, the improvements in RL performance during early adolescence may be limited to certain tasks that are tailored in a specific way. On the one hand, they may not extend to less taxing tasks such as associative learning without reversals: for example, van den Bos et al.²¹ did not find differences in associative learning performance and response times between 8- to 11-, 13- to 16- and 18- to 22-year-olds. On the other hand, learning differences between older adolescents and adults may be more nuanced than is apparent in a simple measure such as the sum of correct responses, or may require a higher degree of task difficulty to emerge. For instance, in the study by Palminteri and colleagues²⁶, adolescents (12 to 17 years) and adults performed an RL task that provided either rewarding or punishing feedback, with or without additional counterfactual information. Adolescents performed worse than adults only when counterfactual information was provided: adults could make use of this information, while adolescents seemingly did not take it into account. Furthermore, adolescents learned less than adults from punishing feedback, while there was no difference in the rewarding condition.
Thus, there seem to be more nuanced differences between mid-adolescents and adults not reflected in overall RL performance. Responding to negative feedback is one such possible nuance that we investigated in our study and discuss in the next section.

To better understand the development of RL, we examined its relation to the tendency to react flexibly to negative instrumental feedback, i.e., to change one’s answer after receiving the feedback that the former response to a stimulus was wrong. Our study points to a supporting role of flexible reactions in RL for all participants, a role whose influence might change during development. We expected that mid-adolescents would exhibit the most flexible reactions, although they might not necessarily profit from these responses in terms of better RL. Against this expectation, the number of flexible reactions did not change with age, either in an inverted U-shaped or in a linear relation. Although there was no linear increase in flexible responses across the complete age range, an exploratory analysis showed a significant increase of flexible responses within the age ranges of 10 to 14 years and 18 to 22 years. The increase in early adolescence is in line with the proposal of DePasque and Galván¹⁸ that the prefrontal control that allows negative feedback to be processed more efficiently develops during puberty.

We based our hypothesis on the study by Van der Schaaf and colleagues²⁷, which showed an inverted-U relation between age and reactions to negative instrumental feedback, a measure close to our flexible reaction score. Similarly, Hauser et al.³⁴ found that adolescents learned more quickly from negative instrumental feedback than adults. The study by Waltmann et al.²⁸ included participants from 12 to 45 years and also showed that younger participants changed their responses after negative feedback more readily than older participants. However, these studies differ from ours in important respects. In Hauser and colleagues’ study³⁴, the underage sample was on average 14.7 (SD = 1.3, range = [12, 16]) years old, lacking the younger adolescents we included. Van der Schaaf et al.²⁷ found that adolescents between 16 and 17 years in particular learn most strongly from negative instrumental feedback, compared to both children and adults. In contrast to our study, participants were not directly informed of the correct response but rather had to deduce it by comparing their predicted outcome with the actual outcome. It could therefore be that adolescents are better than adults and children in their ability to react flexibly to counterfactual feedback, but not to transparently displayed instrumental feedback. This is partially supported by the models of reasoning Palminteri et al.²⁶ used to explain their previously mentioned results, since they found that adolescents may not profit as much from counterfactual information as adults. On the other hand, in the study by Waltmann and colleagues²⁸ direct instrumental feedback was provided just as in our paradigm. Nevertheless, their task was designed in such a way that more flexible responses to negative feedback meant not being sensitive enough to former positive feedback, i.e., being overall less reinforcement sensitive, corresponding with worse learning performance in the stable phases between reversals.
In our study, adults may have reacted as flexibly as adolescents because we looked at switching behaviour at the beginning of the learning blocks, so that changing responses after negative feedback was adaptive. Accordingly, flexible reactions were related to better RL performance irrespective of age. Examining the age groups separately, we found that this relationship was most prominent in adults, hinting at their ability to make use of flexible responses most efficiently.

This assumption is further supported when considering the rate of flexible reactions across the three blocks. Only adults showed an adaptation of their response strategy to the task demands, reflected as a decrease in flexible reactions over the course of the experiment, while such changes were less clear in early adolescents and absent in mid-adolescents. This decrease in flexibility among adults is not related to a decline in overall performance but rather suggests they use the full extent of the provided information, including shifts in the optimal strategy, more effectively than adolescents. In the paradigm, adapting responses immediately to negative feedback is generally successful, but this strategy can be improved upon by recognizing that some feedback at the beginning and end of a learning block is invalid and should not be followed. Adults reacted more flexibly overall but may have intentionally reduced their flexible reactions toward the end of the paradigm, having realized the occasional invalidity of feedback. This represents a higher-order strategy requiring the ability to judge the validity of feedback before adapting responses, which we expected only adults to achieve. Thus, adults may use complex, less immediate or tangible sources of information better than adolescents, in our study in the form of slow changes in the optimal strategy or, as previously reported, in the form of counterfactual feedback²⁶. Accordingly, a review of computational modelling studies suggests that the adaptation of learning rates might increase with age, i.e., adults are better at optimizing how heavily their responses are influenced by recent feedback²⁴. Studies using other measures of learning and adaptation also showed that only older adolescents and adults can properly adjust their learning strategy to specific environments, e.g., in a recognition memory task⁵³, during decision making^54,55 and in an estimation and choice task⁵⁶.
As Nussenbaum and Hartley²⁴ point out, the adult ability to use the most effective response strategy may explain why, on the surface, instrumental learning studies are inconsistent regarding the extent to which feedback is used by different age groups. Task specifics may determine whether it is better to take recent negative or positive feedback into account or not, thus influencing whether adolescents or adults react more flexibly. Therefore, in the following paragraphs, we discuss the strengths and limitations of our paradigm and suggest aspects that might be considered in further studies.

In designing this paradigm, we tried to find a balance between task difficulty, so that developmental differences can be measured, and feasibility, so that participant drop-out is not too high. The results indicate that we achieved this goal since we found age-dependent differences and retained 73% of the sample. Nonetheless, the paradigm could be adapted to increase the number of younger participants who manage to learn the initial associations. Moreover, we did not manipulate the learning context in a way that allowed us to investigate its influence on RL. Besides the role of instrumental feedback, we initially planned to investigate whether adults or adolescents are more open to contextual cues that inform about contingency changes. After beginning the data collection, we realised that the design only allows us to test the relation between contextual information and probabilistic learning, but not RL, since the intended variables are confounded. Future replications or adaptations of this paradigm may improve on this.

Furthermore, there are some caveats to consider when interpreting our results. First, since the study was run online, some variance in the participants’ performance might be attributed to their different environments at home, the devices they used and the reliability of their internet connection. When excluding implausibly fast responses, the results regarding response times change, indicating that quality control is necessary for such online experiments. If younger adolescents had more difficulties with using their computers than the university students who make up most of the adult participants, this might have confounded the results. We tried to avoid this by calling the participants via phone, providing instructions orally and giving enough time to ask questions regarding the task or procedure. Furthermore, we made sure that the experiment ran properly on the participant’s web browser, asking for the parents’ help if necessary. Although we do not believe that there are other factors that may have systematically influenced the data, ideally, the results should be replicated in a more controlled environment, such as a laboratory. Second, the paradigm should be repeated in a longitudinal study, to complement our cross-sectional results.

A strength of our paradigm is that we measured the role of responses to negative instrumental feedback separately from RL. This way, we were able to observe that the ability to respond flexibly seems to increase between 10 and 14 years and is related to better RL performance in all participants, but most prominently in adults. Nevertheless, the paradigm’s design determines the adaptive value of processing negative feedback, and therefore may influence the age-related results. As discussed above, inconsistent findings related to adolescent vs. adult advantages during RL may be the result of differences in the consequences of flexible responses to negative feedback. If changing responses is adaptive and reflects in better task performance, like in our paradigm, adults may show more flexible responses due to their ability to optimize behaviour in varying environments. If, however, flexible responses are in competition with more adaptive behaviour, adolescents may be most flexible because of a higher openness to new information or a higher expectation of environment volatility⁵⁶, depending also on the type and value of feedback^27,29,57,58. Therefore, on the one hand, the advantage or disadvantage of adolescents compared to adults in RL might depend substantially on task properties, while on the other hand adolescents might gain substantial improvements in cognitive flexibility compared to children. Keeping this in mind, future paradigms might try to manipulate the adaptivity of flexible behaviour within a paradigm and consider its possible non-linear effects on RL in different age groups.

Moreover, recommendations for educational settings can be deduced from this study. Younger adolescents particularly struggled to discern valid from invalid feedback, which hindered their ability to learn stimulus-outcome associations. Clear, unambiguous feedback may be crucial for their learning, while training the ability to identify patterns in noisy information could be a valuable educational focus for this age group. Furthermore, the difficulty younger participants had in learning associations, compared to the stronger performance of older participants, highlights the importance of calibrating task demands to children’s and adolescents’ developmental capacities in both research and educational contexts. It was shown that game-based flexibility training for children improved untrained executive functions which even transferred to better sentence comprehension⁵⁹. In sum, the study showed that learning and its components develop in a complex pattern, requiring age-appropriate learning situations for individuals to excel.

Methods

Participants

Seventy-four male and 138 female participants aged 10 to 22 participated. They were recruited via social media, the city’s birth registry, a databank of former participants, and word of mouth. Exclusion criteria were no internet access, uncorrected refractive errors, deuteranopia or protanopia (“red-green deficiency”), diagnosed special educational needs or insufficient German proficiency. Participation was compensated with a 10€ voucher. Informed consent was obtained online from adults or legal guardians. The study was approved by the ethics committee of the Technical University Dortmund (date of approval 15.03.2021) and is in accordance with the Declaration of Helsinki. To test hypothesised differences between age groups, we categorise early adolescents as between 10 and 14 years, mid-adolescents as between 14 and 18 years and adults as above 18 in all analyses. This separation is based on the studies reviewed in the introduction and evenly splits the age range in the sample.

Of the 212 participants, data from 13 participants could not be used: Data of four participants were excluded because they did not respond for more than 10 consecutive trials. Three participants terminated the experiment prematurely. Data from one participant could not be used due to technical issues. One participant was excluded due to giving false information. A further four participants were excluded because of missing data.

Reversal Learning Paradigm

The RL paradigm was developed using the PsychoPy Builder (Version 2020.1.2⁶⁰) and was run on the repository and experimental platform Pavlovia⁶¹, accessed through the participants’ web browser. In the experiment, participants learned to predict whether sweets, which could be bought in kiosk A or B, cause a stomach-ache or not in a fictitious group of friends based on probabilistic feedback. A trial was set up as follows: After a fixation cross, participants saw sweets framed by the picture of either kiosk A or B for one second. Then, participants were prompted to predict whether the sweets cause stomach-ache (“Yes”) or not (“No”) within three seconds. After a delay of half a second, feedback was presented for three seconds. The feedback was both instrumental (a green frame marking their choice if they are correct, a red frame if not) and valence-based (written feedback that their friends do or do not get a stomach-ache accompanied by a corresponding happy or sad emoji).

Participants completed three learning blocks (LB 1–3), each consisting of 80 trials. During each block, they learned the outcomes of four types of sweets: two presented in kiosk A and two in kiosk B. The assignment of sweets to contexts and outcomes was randomized per participant. As shown in Fig. 7, each sweet caused stomach-ache in either 20% or 80% of its 20 presentations per block, balanced across contexts. This resulted in four invalid feedback trials per sweet per block.

**Fig. 7: Trials In The Three Learning Blocks.**

To manipulate the feedback’s informational value, two invalid feedback trials were included in the first six presentations of each sweet. Invalid feedback could either be negative (incorrectly marking a correct response as wrong; Fig. 7b, trial 1) or positive (incorrectly marking an incorrect response as correct; Fig. 7b, trial 6). With this design we assessed how flexibly participants adjusted their responses after feedback (see analysis plan for details on the outcome measure “flexible reactions”). The remaining two invalid feedback trials occurred toward the end of the block (during trials 15–19) to evaluate how well participants retained learned associations when faced with invalid feedback. In the middle trials (7–14), only valid feedback was provided, allowing participants to learn the predominant stimulus contingencies. This feedback pattern applied to all four sweets and across all three learning blocks.
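The feedback pattern described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' PsychoPy implementation: the function name `make_validity_schedule` is hypothetical, and drawing the invalid positions at random inside each window is an assumption (the paradigm may well have used fixed positions, as the examples in Fig. 7b suggest).

```python
import random

N_PRESENTATIONS = 20        # presentations of one sweet per block (from the text)
EARLY = range(1, 7)         # presentations 1-6 contain two invalid feedback trials
LATE = range(15, 20)        # presentations 15-19 contain two invalid feedback trials


def make_validity_schedule(rng):
    """Return 20 flags for one sweet in one block, True where feedback is valid.

    Assumption: invalid positions are drawn at random inside each window.
    """
    invalid = set(rng.sample(list(EARLY), 2)) | set(rng.sample(list(LATE), 2))
    return [pos not in invalid for pos in range(1, N_PRESENTATIONS + 1)]


schedule = make_validity_schedule(random.Random(0))
n_invalid = schedule.count(False)        # invalid trials per sweet per block
share_invalid = n_invalid / N_PRESENTATIONS
```

Whatever positions are drawn, each sweet receives four invalid trials out of 20, which corresponds to the 20%/80% contingencies stated above, and presentations 7 to 14 always carry valid feedback.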

Stimulus presentations were pseudo-randomized to prevent the same stimulus from appearing in consecutive trials. To test adaptation to changes in stimulus-outcome associations, the same stimuli were used across blocks, but their outcome associations changed without prior notice. As shown in Fig. 7, two stimuli remained in the same context but changed outcomes between LB2 and LB3. The other two switched contexts in each block and changed outcomes between LB1 and LB2. Context switches were introduced to explore their impact on probabilistic learning, which will be addressed in a future publication.

In addition, there were brief test blocks after LB1 and LB2 during which the participants had to indicate which stimulus outcome they expect without receiving feedback. In the first test block (TB1), the four stimuli were presented in their initial learning context (i.e., the kiosk they were presented in during LB1) five times each to test whether the associations were learned properly. During the second test block (TB2) all stimuli were presented in all contexts in order to assess whether the reversed stimulus-outcome associations in LB2 were recalled.

In order to ensure that participants properly learned the stimulus-outcome associations in the first learning block, we defined an initial learning threshold they had to reach for the paradigm to continue. Learning performance was considered sub-threshold if participants gave fewer than 21 correct answers out of the 32 possible in the middle eight presentations of the stimuli during the first learning block (i.e., the probability that they answered randomly is less than 5% according to the binomial distribution).
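The binomial reasoning behind such a threshold can be checked with a short calculation. The sketch below assumes a chance model in which each of the 32 scored responses is correct with probability 0.5 and computes a one-sided upper tail; the helper name `upper_tail` is hypothetical, and the text does not specify exactly how the authors computed their criterion.

```python
from math import comb


def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_SCORED = 4 * 8   # four sweets x middle eight presentations = 32 responses

p_at_21 = upper_tail(N_SCORED, 21)   # chance of 21 or more correct by guessing
p_at_22 = upper_tail(N_SCORED, 22)   # chance of 22 or more correct by guessing
```

Under these assumptions, the tail probability at 21 correct answers comes out slightly above 0.05 and the one at 22 clearly below it, so the exact figure depends on how the criterion is counted (for instance, strictly more than 21 versus at least 21).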

Questionnaires

Before scheduling the experimental session, participants, and an accompanying adult in case of minors, answered several questionnaires. These included demographic questions, the short version of the depression anxiety stress scale (DASS-21)⁶², 10-item short version of the Big Five inventory⁶³, need for cognition scale⁴⁴, emotion regulation questionnaire⁶⁴ and the behavioural inhibition system/ behavioural approach system questionnaire⁶⁵.

In addition, underage participants answered the strengths and difficulties questionnaire⁶⁶ and the pubertal development scale⁶⁷. The need for cognition (NFC) scale and the pubertal development scale are used in the analysis here. The remaining questionnaires may be analysed at a later point. The pubertal development scale consists of four questions, which differ for male and female participants and ask for a self-report on characteristic bodily changes occurring during puberty. The average score across the answers is calculated for use in the statistical analysis.

Working Memory Assessment

Participants’ working memory was assessed with a forward and a backward digit span test, adapted from the Wechsler intelligence scale⁶⁸. The experimenter read number sequences to the participant over the phone. The participant had to retain the sequence in memory and recite it immediately afterwards. They were explicitly asked not to write the sequences down. Sequence length increased after every second sequence and ranged from two to nine digits for the forward and two to eight digits for the backward version. The number of correctly recited sequences is averaged across both versions.

Procedure

Questionnaires and the consent form were provided via Qualtrics (Qualtrics, Provo, UT, USA). The experiment was remotely administered via telephone and web browser. Before attending the experiment, participants and their parents (in case of minors) independently answered the questionnaires. Immediately before the experimental paradigm started, they underwent the working memory assessment and reported what web browser they used. To make the environment more comparable, all participants were asked to turn off distracting devices and sit in a quiet atmosphere, if possible. Researchers followed a script for the instruction of participants and tested whether they understood the study task by asking them to respond to an example trial. Participants received their remuneration after completing the experiment. Several experimenters administered the study after being trained sufficiently. The study took approximately 35 minutes, of which 20 minutes were spent on the RL paradigm.

Statistical Analysis Plan

In this manuscript we focus on preregistered hypotheses on RL (research question b and parts of c and f in the preregistration, https://osf.io/ep5ua). Some of the respective analyses are reported in the supplementary material for completeness.

We calculated the variables used in the statistical analysis in the following way. Reversal learning performance was calculated for stimuli whose stimulus-outcome association had changed compared to the previous learning block. We used the first 14 trials with these reversed stimuli per block to calculate a quota between 0 and 1 for how well participants learned the reversed stimulus-outcome associations. The last six trials did not factor into this calculation since they were intended specifically to test the stability of participants’ responses. Flexible reactions are also calculated as a fraction between 0 and 1. A reaction is counted as flexible if a participant changes the response to a certain stimulus in the next trial with that stimulus after receiving negative feedback within the first ten presentations of the stimulus in the learning block. We include the first ten trials since the effects of invalid feedback may extend beyond the trial right after (see, for example, Fig. 7b, trials 6–9). The quota of flexible reactions is then calculated as the number of flexible answers divided by the number of all trials in which negative feedback was given within the first ten stimulus presentations and is averaged over all stimuli of a learning block. That is, participants who never change their answer after negative feedback would get a score of 0, while participants who always change their response after negative feedback would receive a score of 1. Note that a response counts as flexible if the participant changes their response after negative feedback, irrespective of the correctness of the answer. Therefore, flexible reactions measure the openness to change behaviour according to negative feedback, independent of reversal learning performance. Furthermore, erroneous changes in responses, without prior negative feedback, are not counted towards the flexible reactions. Therefore, the measure of flexible behaviour is distinguishable from impulsive or erroneous changes in responses.
As planned in the preregistration, to further assure the independence of variables in the regression models, the measures of the quota of flexible responses are adjusted to the respective outcome variables. The RL performance of the two stimuli that changed their contingency in LB2 and LB3, respectively, are confounded with the quota of flexible reactions in reaction to these stimuli. Hence, the relation of flexible reactions with RL performance is tested by only including the quota of flexible reactions for all stimuli in LB1 and the stimuli in LB2 and LB3 that did not change their outcome-association. When analysing flexible reactions as a dependent variable, the values from all LBs and stimuli are used.
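The two outcome measures described above can be expressed compactly in code. This is a minimal sketch under stated assumptions rather than the authors' analysis code (which is available from the OSF project): the function names `rl_performance` and `flexible_quota` and the input format (per-stimulus lists in presentation order) are hypothetical, and returning `None` when no negative feedback occurred in the window is an assumption about an edge case the text does not cover.

```python
WINDOW = 10     # negative feedback counts within the first ten presentations
N_SCORED = 14   # RL performance uses the first 14 presentations of a reversed stimulus


def rl_performance(correct, n_scored=N_SCORED):
    """Proportion of correct responses in the first n_scored presentations."""
    return sum(correct[:n_scored]) / n_scored


def flexible_quota(responses, negative_feedback, window=WINDOW):
    """Fraction of negative-feedback trials followed by a changed response.

    responses: answers ("yes"/"no") to one stimulus, in presentation order.
    negative_feedback: bools, True where the feedback shown was negative.
    Response changes that do not follow negative feedback are ignored.
    """
    last = min(window, len(responses) - 1)   # a following presentation must exist
    changed = [
        responses[i + 1] != responses[i]
        for i in range(last)
        if negative_feedback[i]
    ]
    return sum(changed) / len(changed) if changed else None


# Hypothetical responses to one stimulus (shortened to 12 presentations).
responses = ["yes", "no", "no", "yes", "yes", "yes",
             "yes", "yes", "yes", "yes", "yes", "yes"]
negative = [True, False, True, False, True, False,
            False, False, False, False, False, False]
quota = flexible_quota(responses, negative)
```

In this hypothetical sequence, two of the three negative-feedback trials are followed by a changed response. The per-stimulus quotas would then be averaged over the stimuli of a learning block, restricted to the stimuli named in the paragraph above when RL performance is the outcome.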

Response times were calculated as the median of reaction times of trials with a correct response for all four stimuli in one learning block. Response times were not normally distributed in 96% of participants according to an omnibus test of normality⁶⁹. Hence, the logarithm of the response times was used in the analysis. Since the web browser used to run the experiment may influence the accuracy of response time measurement⁷⁰, we added the type of browser as a factor in analyses of reaction times.
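As a minimal sketch of this response time measure, the function below takes the median over correctly answered trials and then applies a logarithm. The function name `log_median_rt`, the use of the natural logarithm, and applying the logarithm after (rather than before) taking the median are assumptions; the text does not specify these details.

```python
import math
from statistics import median


def log_median_rt(rts, correct):
    """Log of the median response time (in seconds) over correct trials.

    Assumptions: natural logarithm, applied after taking the median.
    """
    correct_rts = [rt for rt, ok in zip(rts, correct) if ok]
    return math.log(median(correct_rts))


# Hypothetical response times in seconds for one learning block.
example_rts = [0.8, 1.2, 2.5, 0.9]
example_correct = [True, True, False, True]
value = log_median_rt(example_rts, example_correct)
```

The median limits the influence of single very slow or very fast trials, and the logarithm reduces the right skew that is typical of response time distributions.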

For hypotheses proposing a linear relation between independent and dependent variable, linear regression models are fitted to the data of those participants that reached the initial learning threshold. If the hypothesis stipulated an inverted-U relation, quadratic regression terms are used. As the sex distribution in the sample was not equal, we added sex as a covariate in additional analyses to exclude that results are driven by sex effects. We consider a test to be significant if it is expected to reach a false positive rate of 5% or below in the long run.

As an exploratory analysis in addition to linear regression, we also fitted linear basis-spline regression models for developmental research questions. This allows for more flexibility in investigating the relation between age and the outcome variable. On the one hand, this method allows for a more flexible fit than a standard regression and may better account for non-linear developmental effects. On the other hand, the shape of the regression line is more constrained than when fitting a higher-order polynomial regression. Two knots at 14 and 18 years of age were chosen so that the resulting slopes represent the development in pre- and early puberty, during mid-puberty, and during young adulthood. Furthermore, we used the pubertal development scores as an alternative predictor to age and refitted models with this predictor instead because the relation between age and RL may be further understood by taking into account the pubertal development in underage participants. Puberty does not progress with the same continuity as age and reaches a natural end point, therefore self-reported puberty and RL may be related in a different way than age and RL.
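To illustrate what a linear spline with knots at 14 and 18 years estimates, the sketch below fits a continuous piecewise-linear function by ordinary least squares and returns one slope per age segment. It uses a truncated-power ("hinge") basis rather than a B-spline basis; both describe the same family of continuous piecewise-linear functions, but this is a substitution for readability and not necessarily how the authors parameterised their models. All function names are hypothetical, and the data are synthetic and noise-free.

```python
KNOTS = (14.0, 18.0)  # knot positions in years, as stated in the text


def design_row(age, knots=KNOTS):
    """Intercept, age, and one hinge term max(age - knot, 0) per knot."""
    return [1.0, float(age)] + [max(age - k, 0.0) for k in knots]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def segment_slopes(ages, scores, knots=KNOTS):
    """Least-squares fit via the normal equations; one slope per age segment."""
    rows = [design_row(a, knots) for a in ages]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(p)]
    coef = solve(xtx, xty)
    slopes, running = [], 0.0
    for c in coef[1:]:          # each hinge coefficient changes the slope
        running += c
        slopes.append(running)
    return slopes


# Synthetic, noise-free illustration only; these are not the study's data.
ages = [10 + 0.5 * i for i in range(25)]             # 10.0, 10.5, ..., 22.0
true_coef = [-0.1, 0.05, -0.04, -0.01]
scores = [sum(c * x for c, x in zip(true_coef, design_row(a))) for a in ages]
slopes = segment_slopes(ages, scores)                # slopes for 10-14, 14-18, 18-22
```

Because the hinge coefficients add up, the three returned values are the slopes before the first knot, between the knots, and after the second knot. This is what allows developmental change to be compared across the three age ranges.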

Data availability

The data that support the findings of this study are available from the corresponding author, CB, upon reasonable request.

Code availability

The code used to analyse the data is publicly available from the OSF project accompanying this publication: https://osf.io/8mc5r/files/osfstorage.

References

Gopnik, A. et al. Changes in cognitive flexibility and hypothesis search across human life history from childhood to adolescence to adulthood. Proc. Natl Acad. Sci. USA 114, 7892–7899 (2017).
Hartley, C. A. & Lee, F. S. Sensitive Periods in Affective Development: Nonlinear Maturation of Fear Learning. Neuropsychopharmacol 40, 50–60 (2015).
Ramscar, M. & Gitcho, N. Developmental change and the nature of learning in childhood. Trends Cogn. Sci. 11, 274–279 (2007).
van Duijvenvoorde, A. C. K., Whitmore, L. B., Westhoff, B. & Mills, K. L. A methodological perspective on learning in the developing brain. NPJ Sci. Learn 7, 12 (2022).
Thompson, A. & Steinbeis, N. Sensitive periods in executive function development. Curr. Opin. Behav. Sci. 36, 98–105 (2020).
Dahl, R. E., Allen, N. B., Wilbrecht, L. & Suleiman, A. B. Importance of investing in adolescence from a developmental science perspective. Nature 554, 441–450 (2018).
Blakemore, S. J. & Robbins, T. W. Decision-making in the adolescent brain. Nat. Neurosci. 15, 1184–1191 (2012).
Fuhrmann, D., Knoll, L. J. & Blakemore, S. -J. Adolescence as a Sensitive Period of Brain Development. Trends Cogn. Sci. 19, 558–566 (2015).
Sturman, D. A. & Moghaddam, B. Reduced neuronal inhibition and coordination of adolescent prefrontal cortex during motivated behavior. J. Neurosci.: J. Soc. Neurosci. 31, 1471–1478 (2011).
Hofmans, L. & Van Den Bos, W. Social learning across adolescence: A Bayesian neurocognitive perspective. Dev. Cogn. Neurosci. 58, 101151 (2022).
Towner, E., Chierchia, G. & Blakemore, S. -J. Sensitivity and specificity in affective and social learning in adolescence. Trends Cogn. Sci. 27, 642–655 (2023).
Baker, K. D., Den, M. L., Graham, B. M. & Richardson, R. A window of vulnerability: impaired fear extinction in adolescence. Neurobiol. Learn. Mem. 113, 90–100 (2014).
Pattwell, S. S. et al. Dynamic changes in neural circuitry during adolescence are associated with persistent attenuation of fear memories. Nat. Commun. 7, 1–9 (2016).
Pattwell, S. S. et al. Altered fear learning across development in both mouse and human. Proc. Natl Acad. Sci. USA 109, 16318–16323 (2012).
McNealy, K., Mazziotta, J. C. & Dapretto, M. Age and experience shape developmental changes in the neural basis of language-related learning: Neurodevelopmental changes in language learning. Dev. Sci. 14, 1261–1282 (2011).
Janacsek, K., Fiser, J. & Nemeth, D. The best time to acquire new skills: age-related differences in implicit sequence learning across the human lifespan. Dev. Sci. 15, 496–505 (2012).
Izquierdo, A., Brigman, J. L., Radke, A. K., Rudebeck, P. H. & Holmes, A. The neural basis of reversal learning: An updated perspective. Neuroscience 345, 12–26 (2017).
DePasque, S. & Galván, A. Frontostriatal development and probabilistic reinforcement learning during adolescence. Neurobiol. Learn. Mem. 143, 1–7 (2017).
Master, S. L. et al. Disentangling the systems contributing to changes in learning during adolescence. Dev. Cogn. Neurosci. 41 (2020).
Xia, L. et al. Modeling changes in probabilistic reinforcement learning during adolescence. PLoS Comput Biol. 17, e1008524 (2021).
Van den Bos, W. et al. Better than expected or as bad as you thought? The neurocognitive development of probabilistic feedback processing. Front. Human Neurosci. 3 (2009).
Decker, J. H., Lourenco, F. S., Doll, B. B. & Hartley, C. A. Experiential reward learning outweighs instruction prior to adulthood. Cogn., Affect. Behav. Neurosci. 15, 310–320 (2015).
Davidow, J. Y., Foerde, K., Galván, A. & Shohamy, D. An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence. Neuron 92, 93–99 (2016).
Nussenbaum, K. & Hartley, C. A. Reinforcement learning across development: What insights can we draw from a decade of research? Dev. Cogn. Neurosci. 40, 100733 (2019).
Crone, E. A., Zanolie, K., Leijenhorst, L. V., Westenberg, P. M. & Rombouts, S. A. R. B. Neural mechanisms supporting flexible performance adjustment during development. Cogn., Affect. Behav. Neurosci. 8, 165–177 (2008).
Palminteri, S., Kilford, E. J., Coricelli, G. & Blakemore, S. J. The Computational Development of Reinforcement Learning during Adolescence. PLoS Comput. Biol. 12, 1–25 (2016).
Van Der Schaaf, M. E., Warmerdam, E., Crone, E. A. & Cools, R. Distinct linear and non-linear trajectories of reward and punishment reversal learning during development: Relevance for dopamine’s role in adolescent decision making. Dev. Cogn. Neurosci. 1, 578–590 (2011).
Waltmann, M. et al. Diminished reinforcement sensitivity in adolescence is associated with enhanced response switching and reduced coding of choice probability in the medial frontal pole. Dev. Cogn. Neurosci. 60, 101226 (2023).
Weiss, E. O. et al. Developmental Differences in Probabilistic Reversal Learning: A Computational Modeling Approach. Front. Neurosci. 14 (2021).
Eckstein, M. K., Master, S. L., Dahl, R. E., Wilbrecht, L. & Collins, A. G. E. Reinforcement learning and Bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal. Dev. Cogn. Neurosci. 55, 101106 (2022).
Uddin, L. Q. Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).
Eckstein, M., Master, S., Dahl, R., Wilbrecht, L. & Collins, A. The Unique Advantage of Adolescents in Probabilistic Reversal: Reinforcement Learning and Bayesian Inference Provide Adequate and Complementary Models. 1–34 https://doi.org/10.1101/2020.07.04.187971 (2020).
Johnson, C. & Wilbrecht, L. Juvenile mice show greater flexibility in multiple choice reversal learning than adults. Dev. Cogn. Neurosci. 1, 540–551 (2011).
Hauser, T. U., Iannaccone, R., Walitza, S., Brandeis, D. & Brem, S. Cognitive flexibility in adolescence: Neural and behavioral mechanisms of reward prediction error processing in adaptive decision making during development. NeuroImage 104, 347–354 (2015).
Parr, A. C., Sydnor, V. J., Calabro, F. J. & Luna, B. Adolescent-to-adult gains in cognitive flexibility are adaptively supported by reward sensitivity, exploration, and neural variability. Curr. Opin. Behav. Sci. 58, 101399 (2024).
Vo, A., Seergobin, K. N. & MacDonald, P. A. Independent effects of age and levodopa on reversal learning in healthy volunteers. Neurobiol. Aging 69, 129–139 (2018).
Kinner, V. L., Merz, C. J., Lissek, S. & Wolf, O. T. Cortisol disrupts the neural correlates of extinction recall. NeuroImage 133, 233–243 (2016).
Üngör, M. & Lachnit, H. Contextual control in discrimination reversal learning. J. Exp. Psychol.: Anim. Behav. Process. 32, 441–453 (2006).
Alloway, T. P. & Copello, E. Working Memory: The What, the Why, and the How. Aust. Educ. Dev. Psychol. 30, 105–118 (2013).
Rolison, J. J., Evans, J. S. B. T., Walsh, C. R. & Dennis, I. The Role of Working Memory Capacity in Multiple-Cue Probability Learning. Q. J. Exp. Psychol. 64, 1494–1514 (2011).
Sohn, Y. W. & Doane, S. M. Roles of working memory capacity and long-term working memory skill in complex task performance. Mem. Cogn. 31, 458–466 (2003).
St Clair-Thompson, H. L. & Gathercole, S. E. Executive functions and achievements in school: Shifting, updating, inhibition, and working memory. Q. J. Exp. Psychol. 59, 745–759 (2006).
Yoo, A. H. & Collins, A. G. E. How Working Memory and Reinforcement Learning Are Intertwined: A Cognitive, Neural, and Computational Perspective. J. Cogn. Neurosci. 34, 551–568 (2022).
Keller, U. et al. A Need for Cognition Scale for Children and Adolescents: Structural Analysis and Measurement Invariance. Eur. J. Psychol. Assess. 35, 137–149 (2019).
Kramer, A.-W., Van Duijvenvoorde, A. C. K., Krabbendam, L. & Huizenga, H. M. Individual differences in adolescents’ willingness to invest cognitive effort: Relation to need for cognition, motivation and cognitive capacity. Cogn. Dev. 57, 100978 (2021).
Luong, C. et al. Need for cognition in children and adolescents: Behavioral correlates and relations to academic achievement and potential. Learn. Individ. Differ. 53, 103–113 (2017).
Tervo-Clemmens, B. et al. A canonical trajectory of executive function maturation from adolescence to adulthood. Nat. Commun. 14, 6922 (2023).
Koopman-Verhoeff, M. E., Gredvig-Ardito, C., Barker, D. H., Saletin, J. M. & Carskadon, M. A. Classifying Pubertal Development Using Child and Parent Report: Comparing the Pubertal Development Scales to Tanner Staging. J. Adolesc. Health 66, 597–602 (2020).
Giedd, J. N. Structural Magnetic Resonance Imaging of the Adolescent Brain. Ann. N. Y. Acad. Sci. 1021, 77–85 (2004).
Lim, S., Han, C. E., Uhlhaas, P. J. & Kaiser, M. Preferential Detachment During Human Brain Development: Age- and Sex-Specific Structural Connectivity in Diffusion Tensor Imaging (DTI) Data. Cereb. Cortex 25, 1477–1489 (2015).
Walker, D. M. et al. Adolescence and Reward: Making Sense of Neural and Behavioral Changes Amid the Chaos. J. Neurosci. 37, 10855–10866 (2017).
Dajani, D. R. & Uddin, L. Q. Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).
Horn, S. S., Ruggeri, A. & Pachur, T. The development of adaptive decision making: Recognition-based inference in children and adolescents. Dev. Psychol. 52, 1470–1485 (2016).
McCormick, E. M. & Telzer, E. H. Adaptive Adolescent Flexibility: Neurodevelopment of Decision-making and Learning in a Risky Context. J. Cogn. Neurosci. 29, 413–423 (2017).
Van Duijvenvoorde, A. C. K., Jansen, B. R. J., Bredman, J. C. & Huizenga, H. M. Age-related changes in decision making: Comparing informed and noninformed situations. Dev. Psychol. 48, 192–203 (2012).
Jepma, M., Schaaf, J. V., Visser, I. & Huizenga, H. M. Uncertainty-driven regulation of learning and exploration in adolescents: A computational account. PLoS Comput. Biol. 16, e1008276 (2020).
Chierchia, G. et al. Confirmatory reinforcement learning changes with age during adolescence. Dev. Sci. 26, e13330 (2023).
Javadi, A. H., Schmidt, D. H. K. & Smolka, M. N. Adolescents adapt more slowly than adults to varying reward contingencies. J. Cogn. Neurosci. 26, 2670–2681 (2014).
Johann, V. E. & Karbach, J. Effects of game-based and standard executive control training on cognitive and academic abilities in elementary school children. Dev. Sci. 23, e12866 (2020).
Peirce, J. et al. PsychoPy2: Experiments in behavior made easy. Behav. Res. Methods 51, 195–203 (2019).
Bridges, D., Pitiot, A., MacAskill, M. R. & Peirce, J. W. The timing mega-study: comparing a range of experiment generators, both lab-based and online. PeerJ 8, e9414 (2020).
Szabó, M. The short version of the Depression Anxiety Stress Scales (DASS-21): factor structure in a young adolescent sample. J. Adolesc. 33, 1–8 (2010).
Rammstedt, B. & John, O. P. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. J. Res. Personal. 41, 203–212 (2007).
Abler, B. & Kessler, H. Emotion regulation questionnaire–Eine deutschsprachige Fassung des ERQ von Gross und John. Diagnostica 55, 144–152 (2009).
Strobel, A., Beauducel, A., Debener, S. & Brocke, B. Eine deutschsprachige Version des BIS/BAS-Fragebogens von Carver und White. Zeitschrift für Differentielle und Diagnostische Psychologie (2001).
Petermann, U., Petermann, F. & Schreyer, I. The German Strengths and Difficulties Questionnaire (SDQ). Eur. J. Psychol. Assess. (2010).
Carskadon, M. A. & Acebo, C. A self-administered rating scale for pubertal development. J. Adolesc. Health 14, 190–195 (1993).
Petermann, F. & Petermann, U. Wechsler Intelligence Scale for Children®–Fourth Edition. Frankfurt a. M.: Pearson Assessment (2011).
D’Agostino, R. & Pearson, E. S. Tests for Departure from Normality. Empirical Results for the Distributions of b₂ and √b₁. Biometrika 60, 613 (1973).
Anwyl-Irvine, A., Dalmaijer, E. S., Hodges, N. & Evershed, J. K. Realistic precision and accuracy of online experiment platforms, web browsers, and devices. Behav. Res. Methods 53, 1407–1425 (2021).


Acknowledgements

This work was supported by the German Research Foundation (DFG) under Collaborative Research Centre grant 1280.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Vision, Visual Impairment and Blindness, Technical University Dortmund, Dortmund, Germany
Christoph Bamberg, Sarah Weigelt & Klara Hagelweide
Department of Psychology, Paris Lodron University of Salzburg, Salzburg, Austria
Christoph Bamberg

Contributions

C.B.: conceptualisation, design, programming, data collection, analysis, writing, revision. K.H.: conceptualisation, design, writing, revision. S.W.: conceptualisation, design, revision. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Christoph Bamberg or Sarah Weigelt.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bamberg, C., Weigelt, S. & Hagelweide, K. Reversal learning is influenced by cognitive flexibility and develops throughout early adolescence. npj Sci. Learn. 10, 27 (2025). https://doi.org/10.1038/s41539-025-00308-3

Received: 28 June 2024
Accepted: 18 March 2025
Published: 12 May 2025
DOI: https://doi.org/10.1038/s41539-025-00308-3
