Table 2 Detailed statistical summary of episodic rewards for all evaluated agent types.

Agent type	Mean reward	SD	25th percentile (Q1)	Median reward (Q2)	75th percentile (Q3)
Random agent	5.231	7.013	−3.878	8.522	10.502
Heuristic (hint)	5.564	6.885	−4.49	9.854	10.277
Heuristic (repeat)	6.564	6.885	−3.49	10.854	11.277
PPO (no cognitive-behavioral)	5.213	7.012	−3.868	8.526	10.503
PPO agent	6.563	6.885	−3.49	10.853	11.276
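The table does not say how these statistics were computed. The sketch below shows one way to produce the five values in each row, assuming each agent's results are stored as one float reward per episode and using only Python's standard `statistics` module. Two choices are assumptions and were not taken from the study. SD is the sample standard deviation (n − 1 denominator). Quartiles use the "inclusive" interpolation method, which gives the same result as NumPy's default linear percentile. The function name `summarize_rewards` is hypothetical.

```python
import statistics
from typing import Sequence


def summarize_rewards(rewards: Sequence[float]) -> dict[str, float]:
    """Summarize per-episode rewards as mean, SD, Q1, median and Q3.

    Assumptions (not stated in the source table): SD is the sample
    standard deviation (n - 1 denominator), and quartiles interpolate
    linearly between order statistics (method="inclusive").
    """
    if len(rewards) < 2:
        # stdev and quantiles both need at least two data points
        raise ValueError("need at least two episodes")
    # n=4 splits the data into quartiles: returns [Q1, Q2, Q3]
    q1, q2, q3 = statistics.quantiles(rewards, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(rewards),
        "sd": statistics.stdev(rewards),
        "q1": q1,
        "median": q2,
        "q3": q3,
    }


if __name__ == "__main__":
    # Hypothetical rewards for illustration only; not data from the study.
    print(summarize_rewards([-4.0, 2.0, 9.0, 10.0, 11.0]))
```

With these hypothetical rewards, the sketch returns a mean of 5.6, Q1 = 2.0, median 9.0 and Q3 = 10.0. A single strongly negative episode pulls the mean below the median. Every row of Table 2 shows the same ordering, with the mean below the median.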
