Figure 4

Accuracy and loss for different models. Left column: training accuracy (blue) and validation accuracy (orange) over the course of training. Right column: the corresponding loss. (A) For a discount factor of \(\gamma = 0.3\), the highest accuracy of \(60\%\) was achieved. (B) For \(\gamma = 0.7\), the accuracy saturates after 200 epochs at \(30\%\). (C) For \(\gamma = 1.0\), the accuracy saturates after 200 epochs at \(35\%\).