Figure 6
From: Exploring optimal control of epidemic spread using reinforcement learning

The figure shows the neural network architecture of the memory-based agent. The agent uses three bidirectional LSTM layers with 128, 64, and 64 nodes, respectively, followed by four dense layers with 128, 64, 32, and 3 nodes.
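
To make the layer stack concrete, the following minimal sketch traces the output width and trainable-parameter count of each layer. Only the layer sizes come from the caption. The rest is assumed for illustration: that each LSTM size is per direction, that the forward and backward outputs are concatenated, that the standard LSTM parameterisation with four gates is used, and that the input has `n_features` values per time step (a hypothetical parameter, since the caption does not give the input size). The helper names are likewise hypothetical.

```python
# Sketch only: layer sizes are from the caption; everything else is an assumption.
LSTM_UNITS = [128, 64, 64]       # three bidirectional LSTM layers
DENSE_UNITS = [128, 64, 32, 3]   # four dense layers


def lstm_params(input_dim, units):
    """Parameters of one unidirectional LSTM layer.

    Four gates, each with an input kernel, a recurrent kernel, and a bias.
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Parameters of a fully connected layer: weights plus biases."""
    return input_dim * units + units


def summarize(n_features):
    """Return (kind, units, output_width, params) for every layer in order."""
    rows = []
    width = n_features
    for units in LSTM_UNITS:
        # Assumption: two directions, outputs concatenated -> width doubles.
        params = 2 * lstm_params(width, units)
        width = 2 * units
        rows.append(("bi-LSTM", units, width, params))
    for units in DENSE_UNITS:
        params = dense_params(width, units)
        width = units
        rows.append(("dense", units, width, params))
    return rows


if __name__ == "__main__":
    # n_features=10 is an arbitrary illustrative value, not from the paper.
    for kind, units, width, params in summarize(n_features=10):
        print(f"{kind:8s} units={units:4d} out_width={width:4d} params={params}")
```

Under these assumptions only the first layer's parameter count depends on the input size; every later layer is fixed by the widths given in the caption. If the paper's sizes instead refer to the combined width of both directions, the LSTM figures would differ.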