Table 1 Reward comparison among the proposed DQN method, one-size-fit-all, random forest, and experts’ treatment.

From: Learning the Dynamic Treatment Regimes from Medical Registry Data through Deep Q-network

| Treatments | Method | Reward | 95% Confidence Interval |
|---|---|---|---|
| AGVHD | DQN | 0.717 | (0.683, 0.729) |
| AGVHD | one-size-fit-all | 0.693 | (0.659, 0.705) |
| AGVHD | Random forest | 0.677 | (0.666, 0.703) |
| AGVHD | experts’ treatment | 0.673 | (0.663, 0.694) |
| CGVHD | DQN | 0.706 | (0.678, 0.722) |
| CGVHD | one-size-fit-all | 0.684 | (0.671, 0.712) |
| CGVHD | Random forest | 0.672 | (0.663, 0.713) |
| CGVHD | experts’ treatment | 0.671 | (0.661, 0.697) |