Table 2. Detailed statistical summary of episodic rewards for all evaluated agent types.

From: The necessity of multimodal feedback for learning effective pedagogical policies with reinforcement learning

| Agent type | Mean reward | Standard deviation (SD) | 25th percentile (Q1) | Median reward (Q2) | 75th percentile (Q3) |
|---|---|---|---|---|---|
| Random agent | 5.231 | 7.013 | −3.878 | 8.522 | 10.502 |
| Heuristic (hint) | 5.564 | 6.885 | −4.49 | 9.854 | 10.277 |
| Heuristic (repeat) | 6.564 | 6.885 | −3.49 | 10.854 | 11.277 |
| PPO (no cognitive-behavioral) | 5.213 | 7.012 | −3.868 | 8.526 | 10.503 |
| PPO agent | 6.563 | 6.885 | −3.49 | 10.853 | 11.276 |
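Each row of the table reduces one agent's per-episode rewards to five summary statistics. The sketch below shows one way to compute such a row in Python. The source does not state whether the SD is the sample or population standard deviation, nor which quartile interpolation rule was used, so the sample SD and the "inclusive" quartile method here are assumptions; `summarize_rewards` is a hypothetical helper, not the authors' code.

```python
import statistics
from typing import Dict, Sequence


def summarize_rewards(rewards: Sequence[float]) -> Dict[str, float]:
    """Summarize per-episode rewards with the statistics reported in Table 2.

    Assumptions (not stated in the source): SD is the sample standard
    deviation (n - 1 denominator), and quartiles use the 'inclusive'
    linear-interpolation method of statistics.quantiles.
    """
    if len(rewards) < 2:
        # A sample SD and interpolated quartiles need at least two episodes.
        raise ValueError("need at least two episodes to summarize")
    # Three cut points split the data into four equal-probability groups.
    q1, q2, q3 = statistics.quantiles(rewards, n=4, method="inclusive")
    return {
        "mean": statistics.mean(rewards),
        "sd": statistics.stdev(rewards),  # sample standard deviation
        "q1": q1,
        "median": q2,  # equals statistics.median under the inclusive method
        "q3": q3,
    }


if __name__ == "__main__":
    # Toy rewards for illustration only; not data from the study.
    print(summarize_rewards([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Reporting quartiles alongside the mean and SD is informative here: for every agent in the table the mean lies well below the median and Q1 is negative, so the reward distributions are left-skewed and the mean alone would understate typical episode performance.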