Figure 6

Reward function for reinforcement learning. Contour plot of the reward as a function of the probability of local control (\(P_{LC}\)) and the probability of radiation-induced pneumonitis of grade 2 or higher (\(P_{RP2}\)). The area enclosed by the blue line corresponds to the clinically desirable outcome, i.e., \(P_{LC} > 70\%\) and \(P_{RP2} < 17.2\%\). Similarly, the area enclosed by the green lines corresponds to the computationally desirable outcome, i.e., \(P_{LC} > 50\%\) and \(P_{RP2} < 50\%\). In addition to the base term \(P_{LC} \times (1-P_{RP2})\), the AI agent receives +10 for achieving the clinically desirable outcome, +5 for achieving the computationally desirable outcome, and -1 when it achieves neither.
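
The reward described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `reward` and the constant names are hypothetical, probabilities are assumed to be fractions in [0, 1], and the three bonus terms are assumed to be mutually exclusive, with the clinically desirable region taking priority over the computationally desirable one (the caption does not say whether the bonuses stack).

```python
# Thresholds taken from the caption, expressed as fractions.
CLINICAL_LC_MIN = 0.70        # P_LC > 70%
CLINICAL_RP2_MAX = 0.172      # P_RP2 < 17.2%
COMPUTATIONAL_LC_MIN = 0.50   # P_LC > 50%
COMPUTATIONAL_RP2_MAX = 0.50  # P_RP2 < 50%


def reward(p_lc: float, p_rp2: float) -> float:
    """Return the base term P_LC * (1 - P_RP2) plus a region-dependent bonus.

    Assumption: the bonuses are mutually exclusive and the clinical
    region is checked first, so a clinically desirable outcome earns
    +10 rather than +10 and +5 combined.
    """
    base = p_lc * (1.0 - p_rp2)

    if p_lc > CLINICAL_LC_MIN and p_rp2 < CLINICAL_RP2_MAX:
        bonus = 10.0   # clinically desirable outcome
    elif p_lc > COMPUTATIONAL_LC_MIN and p_rp2 < COMPUTATIONAL_RP2_MAX:
        bonus = 5.0    # computationally desirable outcome
    else:
        bonus = -1.0   # neither region reached

    return base + bonus
```

Under these assumptions, an outcome of \(P_{LC} = 80\%\) and \(P_{RP2} = 10\%\) falls inside the blue region and yields \(0.8 \times 0.9 + 10 = 10.72\), while \(P_{LC} = 40\%\) and \(P_{RP2} = 60\%\) falls outside both regions and yields \(0.4 \times 0.4 - 1 = -0.84\).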