Table 1. MPL agents learn an alternating pattern. Two MPL agents learn a sequence of outcomes x generated by alternating deterministically between 0 and 1. Each agent’s parameters are given in the first row. The p₁ column gives the probability that the agent will respond 1 (it will respond 0 with probability 1 − p₁). By trial t = 4, both agents have learned the pattern. From then on, the agent with optimal parameters (left) always makes correct predictions, whereas the agent with suboptimal parameters (right) may not.

MPL k = 1, A = 1, ρ = 1, θ → ∞ | | | | | | | MPL k = 1, A = 0.9, ρ = 0.9, θ = 0.3 | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
t | E₀ (η = 0) | E₁ (η = 0) | E₀ (η = 1) | E₁ (η = 1) | p₁ | x | t | E₀ (η = 0) | E₁ (η = 0) | E₀ (η = 1) | E₁ (η = 1) | p₁ | x |
1 | 0 | 0 | 0 | 0 | 0.5 | 0 | 1 | 0 | 0 | 0 | 0 | 0.5 | 0 |
2 | 0 | 0 | 0 | 0 | 0.5 | 1 | 2 | 0 | 0 | 0 | 0 | 0.5 | 1 |
3 | 0 | 1 | 0 | 0 | 0.5 | 0 | 3 | 0 | 1 | 0 | 0 | 0.5 | 0 |
4 | 0 | 1 | 1 | 0 | 1 | 1 | 4 | 0 | 0.9 | 1 | 0 | 0.57 | 1 |
5 | 0 | 2 | 1 | 0 | 0 | 0 | 5 | 0 | 1.73 | 0.9 | 0 | 0.43 | 0 |
6 | 0 | 2 | 2 | 0 | 1 | 1 | 6 | 0 | 1.56 | 1.73 | 0 | 0.61 | 1 |
7 | 0 | 3 | 2 | 0 | 0 | 0 | 7 | 0 | 2.26 | 1.56 | 0 | 0.39 | 0 |
8 | 0 | 3 | 3 | 0 | 1 | 1 | 8 | 0 | 2.03 | 2.26 | 0 | 0.65 | 1 |
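
The p₁ values for trials 4 to 8 can be checked with a short calculation. The sketch below rests on two assumptions that the table does not state: that the response follows a softmax (logistic) rule with sensitivity θ over the two expectations, and that with k = 1 the active history η at trial t is the outcome of trial t − 1. The function name `p_respond_1` is hypothetical. The E values are copied from the table; the rule that updates them is not modelled here.

```python
import math


def p_respond_1(e0, e1, theta):
    """Probability of responding 1 under an ASSUMED softmax choice rule."""
    if math.isinf(theta):
        # Limit theta -> infinity: choose the larger expectation, split ties evenly.
        if e1 > e0:
            return 1.0
        if e1 < e0:
            return 0.0
        return 0.5
    # Two-option softmax, written as a logistic function of the difference.
    return 1.0 / (1.0 + math.exp(-theta * (e1 - e0)))


# (t, E0, E1) for the history active at trial t (assumed: the previous outcome),
# read from the right-hand agent (theta = 0.3).
right_rows = [
    (4, 0.0, 0.9),
    (5, 0.9, 0.0),
    (6, 0.0, 1.56),
    (7, 1.56, 0.0),
    (8, 0.0, 2.03),
]
right_p1 = [round(p_respond_1(e0, e1, 0.3), 2) for _, e0, e1 in right_rows]

# Same trials for the left-hand agent (theta -> infinity).
left_rows = [(4, 0.0, 1.0), (5, 1.0, 0.0), (6, 0.0, 2.0), (7, 2.0, 0.0), (8, 0.0, 3.0)]
left_p1 = [p_respond_1(e0, e1, math.inf) for _, e0, e1 in left_rows]
```

Under these assumptions, `right_p1` comes out as 0.57, 0.43, 0.61, 0.39, 0.65 and `left_p1` as 1, 0, 1, 0, 1, which agrees with the p₁ columns. It also shows why the right-hand agent can still err: its p₁ never reaches 0 or 1.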