Table 2 Detailed statistical summary of episodic rewards for all evaluated agent types.
| Agent type | Mean reward | SD | 25th percentile (Q1) | Median reward (Q2) | 75th percentile (Q3) |
|---|---|---|---|---|---|
| Random agent | 5.231 | 7.013 | −3.878 | 8.522 | 10.502 |
| Heuristic (hint) | 5.564 | 6.885 | −4.49 | 9.854 | 10.277 |
| Heuristic (repeat) | 6.564 | 6.885 | −3.49 | 10.854 | 11.277 |
| PPO (no cognitive-behavioral) | 5.213 | 7.012 | −3.868 | 8.526 | 10.503 |
| PPO agent | 6.563 | 6.885 | −3.49 | 10.853 | 11.276 |
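
As an illustration of how the columns of such a summary can be derived from raw per-episode rewards, the following is a minimal sketch using Python's standard library. The function name `summarize_rewards` is hypothetical. The use of the sample standard deviation and of the "inclusive" quantile method are assumptions, since the source does not state which conventions were used.

```python
import statistics


def summarize_rewards(rewards):
    """Summarize a sequence of episodic rewards.

    Assumptions (not stated in the source): SD is the sample standard
    deviation, and quartiles use the "inclusive" interpolation method.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(rewards, n=4, method="inclusive")
    return {
        "mean": statistics.mean(rewards),
        "sd": statistics.stdev(rewards),
        "q1": q1,
        "median": q2,
        "q3": q3,
    }


if __name__ == "__main__":
    # Toy rewards for demonstration only; these are not data from the study.
    toy_rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
    for name, value in summarize_rewards(toy_rewards).items():
        print(f"{name}: {value:.3f}")
```

Calling the function once per agent type on that agent's list of episode rewards would yield one row of a table with this layout. A different quantile convention (for example, the "exclusive" method) can shift Q1 and Q3 slightly for small samples.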