Fig. 3 | Nature Communications

Fig. 3

From: Dissociating task acquisition from expression during learning reveals latent knowledge

Contextual scaling in a reinforcement learning model. a Schematic of the model, which implements reinforcement-driven learning of stimulus-action associations with a readout function that can be contextually modulated. The model simultaneously captures reinforced and probe learning trajectories by dissociating reward-driven plasticity, which represents task acquisition (gray, reinforcement signals), from context-dependent changes in the expression of the learned values (orange, contextual scaling). Plastic synapses between sensory and decision-making populations represent stimulus-action values, and their weights are updated only during reinforced trials (gray shading, reinforcement signals). Actions are generated by the decision-making units (D and I; orange shading), which read out the sensory input filtered through the synaptic weight matrix (W). The activity of the decision unit (D) determines the probability that the model will respond. The parameters of the readout units (orange shading) are modulated between the reinforced and probe contexts via selective scaling of inhibition and excitation, noise modulation, or threshold changes (see Supplementary Fig. 4). b Left: illustration of the effect of selective scaling of inhibition on the activity of D. Decision activity represents the net input to the decision-making unit, i.e., the difference between the values of the go and no-go actions, for the target (magenta) and foil (cyan) tones over the course of learning. Solid and dashed lines indicate the probe and reinforced contexts, respectively (inhibition scaling cI: 0.52). Right: schematic of the model with contextual modulation via inhibitory scaling, and a description of how inhibitory scaling is implemented mathematically. The net output of the model is proportional to the excitatory weights (WE) minus the inhibitory weights (WI). In the probe context, the inhibitory weights are multiplied by a scalar factor (CI, orange). Throughout panels a and b, orange highlights the context-dependent parameters. c Comparison between average mouse behavioral data (n = 7 mice) and fits of the inhibitory scaling model for the two contexts. d Same as c, but for the learning trajectory of one individual mouse. e, f Same as c, d, but for rat behavioral data (e: n = 6 rats; f: rat rt003).
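
The inhibitory-scaling readout described for panel b can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the example weights, the unscaled reinforced context (scaling factor of 1), and the logistic mapping from decision activity to response probability are all assumptions made for illustration. Only the form "excitatory weights minus scaled inhibitory weights" and the scaling value 0.52 come from the caption.

```python
import numpy as np


def decision_activity(stimulus, w_exc, w_inh, c_inh=1.0):
    """Net input to the decision unit D: sensory input filtered through
    excitatory weights minus (context-scaled) inhibitory weights."""
    stimulus = np.asarray(stimulus, dtype=float)
    w_exc = np.asarray(w_exc, dtype=float)
    w_inh = np.asarray(w_inh, dtype=float)
    return float(stimulus @ (w_exc - c_inh * w_inh))


def response_probability(activity, gain=1.0):
    """ASSUMPTION: a logistic readout. The caption only states that the
    activity of D determines the probability of responding."""
    return 1.0 / (1.0 + np.exp(-gain * activity))


# Hypothetical one-hot stimuli and learned weights (illustrative values only).
target = np.array([1.0, 0.0])
foil = np.array([0.0, 1.0])
w_exc = np.array([0.8, 0.6])
w_inh = np.array([0.5, 0.7])

C_INH_REINFORCED = 1.0  # ASSUMPTION: inhibition is unscaled in the reinforced context
C_INH_PROBE = 0.52      # inhibition scaling value quoted in the caption

for name, stim in (("target", target), ("foil", foil)):
    reinforced = decision_activity(stim, w_exc, w_inh, C_INH_REINFORCED)
    probe = decision_activity(stim, w_exc, w_inh, C_INH_PROBE)
    print(name, reinforced, probe, response_probability(probe))
```

In this sketch the same learned weights yield different decision activity in the two contexts purely through `c_inh`, which mirrors the caption's separation of acquisition (the weights) from expression (the readout).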
