Table 2 Components of Markov Decision Processes
From: A Primer on Reinforcement Learning in Medicine for Clinicians
| Component name | Description |
|---|---|
| States (S) | The set of possible states of the environment; an individual state is denoted s∈S |
| Actions (A) | The set of actions available to the agent; an individual action is denoted a∈A |
| Transition probabilities (P) | Specifies the likelihood of moving from one state to another when the agent takes a particular action |
| Rewards (R) | The positive or negative reward received upon transitioning from state s to s′ (s, s′∈S) by taking action a |
| Discount factor (γ) | Determines the importance of future rewards relative to immediate ones in the calculation of cumulative reward. A discount factor of 0 means only immediate rewards are considered, while a discount factor of 1 means future rewards are valued equally with immediate rewards. |
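The role of the discount factor can be sketched with a short computation of the cumulative (discounted) reward; the function name and reward sequence below are illustrative, not from the article.

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward:
    G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...
    gamma is the discount factor from Table 2."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]  # hypothetical reward at each of three steps
discounted_return(rewards, 0.0)  # 1.0: only the immediate reward counts
discounted_return(rewards, 1.0)  # 3.0: future rewards valued equally
discounted_return(rewards, 0.5)  # 1.75: future rewards progressively downweighted
```

The two extreme values of γ reproduce the behavior described in the table: γ=0 keeps only the first term, while γ=1 sums all rewards without attenuation.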