Figure 2
From: Scalable photonic reinforcement learning by time-division multiplexing of laser chaos

Chaotic time series, inherent time-correlated structures, RF spectra, and decision-making performance for the two-armed bandit problem. (a) Snapshots of the time series used for solving the N-armed bandit problem. Four kinds of chaotic signals (Chaos 1–4), as well as quasiperiodic sequences, pseudorandom numbers (RAND), and coloured noise, are used. (b) Radio-frequency (RF) power spectra of the Chaos 1–4 and quasiperiodic signals. (c) Evolution of the correct decision ratio (CDR), which indicates the likelihood of selecting the slot machine with the highest reward probability. The reward probabilities of Machines 0 and 1 are 0.9 and 0.7, respectively. (d) Autocorrelation of the Chaos 1–4, quasiperiodic, and coloured-noise signals.
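The caption names two quantities without defining how they are computed: the CDR in panel (c) and the autocorrelation in panel (d). The following is a minimal sketch, not the authors' code, under two stated assumptions: the CDR at time step t is taken to be the fraction of repeated cycles in which the best machine (Machine 0 here) was selected at step t, and the autocorrelation is the standard normalized sample autocorrelation. The function names and the `decisions[cycle][step]` data layout are hypothetical.

```python
from typing import List, Sequence


def correct_decision_ratio(decisions: Sequence[Sequence[int]], best_machine: int) -> List[float]:
    """Per-step fraction of cycles that chose `best_machine`.

    Assumed (hypothetical) layout: decisions[c][t] is the machine index
    selected at time step t in repeated cycle c.
    """
    n_cycles = len(decisions)
    n_steps = len(decisions[0])
    return [
        sum(1 for cycle in decisions if cycle[t] == best_machine) / n_cycles
        for t in range(n_steps)
    ]


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized sample autocorrelation r(k) for k = 0..max_lag, with r(0) = 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]          # mean-removed signal
    var = sum(d * d for d in dev)        # lag-0 sum used as the normalizer
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


# Toy usage: 4 cycles x 3 steps; Machine 0 (reward probability 0.9) is the best machine.
toy_decisions = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(correct_decision_ratio(toy_decisions, best_machine=0))  # [0.5, 0.75, 1.0]
print(autocorrelation([1, -1, 1, -1], max_lag=2))             # [1.0, -0.75, 0.5]
```

In this toy data the CDR rises towards 1 as every cycle settles on Machine 0, which is the qualitative behaviour panel (c) tracks; the alternating series shows how a sign-flipping autocorrelation would appear in a plot like panel (d).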