Table 2 Components of Markov Decision Processes

From: A Primer on Reinforcement Learning in Medicine for Clinicians

| Component name | Description |
| --- | --- |
| State (S) | The set of states (s ∈ S) |
| Action (A) | The set of actions (a ∈ A) |
| Transition probability (P) | The transition probabilities, specifying the likelihood of moving from one state to another when the agent takes a particular action |
| Reward (R) | The positive or negative reward received upon transitioning from state s to s′ (s, s′ ∈ S) by taking an action a |
| Discount factor (γ) | Determines the importance of future rewards relative to immediate ones in the calculation of cumulative reward. A discount factor of 0 means that only immediate rewards are considered, while a discount factor of 1 means that future rewards are valued equally with immediate rewards. |
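The components above can be sketched in code. Below is a minimal, hypothetical two-state example (the state and action names are illustrative, not from the article): `P` and `R` encode the transition probabilities and rewards, and `discounted_return` shows how γ weights future rewards in the cumulative return.

```python
# Hypothetical MDP with states S = {"sick", "healthy"} and actions
# A = {"treat", "wait"} (illustrative names, not from the article).

# Transition probabilities P: P[s][a] maps each next state s' to its likelihood.
P = {
    "sick":    {"treat": {"healthy": 0.7, "sick": 0.3},
                "wait":  {"healthy": 0.1, "sick": 0.9}},
    "healthy": {"treat": {"healthy": 1.0},
                "wait":  {"healthy": 0.8, "sick": 0.2}},
}

# Rewards R: R[(s, a, s')] is the reward for reaching s' from s via action a.
R = {
    ("sick", "treat", "healthy"): 1.0,
    ("sick", "wait", "sick"): -1.0,
}

def discounted_return(rewards, gamma):
    """Cumulative reward: sum over t of gamma**t * r_t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# A reward received two steps in the future, under different discount factors:
rewards = [0.0, 0.0, 1.0]
discounted_return(rewards, 0.0)  # only the immediate reward counts -> 0.0
discounted_return(rewards, 1.0)  # future valued equally -> 1.0
discounted_return(rewards, 0.9)  # future reward shrunk to about 0.81
```

With γ = 0 the delayed reward vanishes entirely, and with γ = 1 it counts at full value, matching the description of the discount factor above.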