Figure 5
From: Exploring optimal control of epidemic spread using reinforcement learning

The graph shows how the reward value decays as the percentage of active cases increases (ignoring cumulative deaths, i.e. with \(D_t = 0\)). Values of \(E_t\) of 1, 0.75, and 0.25 approximately correspond to level-0, level-1, and level-2 movement restrictions, respectively. Once active cases exceed 0.82%, the reward falls to zero for every restriction level.