Table 2 Hyperparameter selection for the proposed deep reinforcement learning.

From: A machine learning optimized Dielectric Ultra-focused Oscillatory (DUO) electrode for low temperature electrosurgery

| Category | Parameter | Description | Value |
|---|---|---|---|
| Q-learning | discount factor (γ) | Discount rate for the Q-value (reward) of future steps (0 ≤ γ ≤ 1) | 0.85 |
| | learning rate (α) | Proportion of new learning results reflected in the update (0 ≤ α ≤ 1) | 0.001 |
| | epsilon (εQ) | Ratio of exploring various actions, decreasing from the initial value to the final value | 1.0 → 0.2 |
| | epsilon decay (λ) | Gradual reduction of the exploration rate over time to stabilize learning | 0.99 |
| | target network update | Frequency (in episodes) of updating the target network | 1 |
| Experience replay | batch size | Amount of data used for one learning step | 64 |
| | memory size | Capacity of the memory that stores learning data | 2000 |
| Network configuration | input layer | Number of input nodes, i.e., state variables | 3 |
| | hidden layer | Number of nodes per hidden layer, adjusting the complexity of the model | 32-32 |
| | output layer | Number of selectable actions | 7 |
| | loss | Loss function: Mean Squared Error (MSE) | |
| | optimizer | Optimization algorithm: Adam | |
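The hyperparameters above can be collected into a configuration sketch. This is a minimal illustration, not the authors' code: the dictionary keys and the `epsilon_schedule` helper are hypothetical names, and the multiplicative per-episode decay floored at the final value is an assumption consistent with the table's εQ (1.0 → 0.2) and λ (0.99) entries.

```python
import math

# Illustrative configuration mirroring Table 2 (names are hypothetical;
# the paper does not publish its training code).
HYPERPARAMS = {
    "gamma": 0.85,            # discount factor for future Q-values
    "alpha": 0.001,           # learning rate
    "epsilon_start": 1.0,     # initial exploration rate
    "epsilon_end": 0.2,       # final exploration rate
    "epsilon_decay": 0.99,    # assumed multiplicative decay per episode
    "target_update": 1,       # update target network every episode
    "batch_size": 64,         # replay batch size
    "memory_size": 2000,      # replay buffer capacity
    "state_dim": 3,           # input layer: 3 state variables
    "hidden_layers": (32, 32),# two hidden layers of 32 nodes each
    "n_actions": 7,           # output layer: 7 selectable actions
}

def epsilon_schedule(hp):
    """Yield the exploration rate per episode: epsilon is multiplied by
    epsilon_decay each episode and floored at epsilon_end."""
    eps = hp["epsilon_start"]
    while True:
        yield eps
        eps = max(hp["epsilon_end"], eps * hp["epsilon_decay"])

# Under this assumed schedule, the number of episodes for epsilon to
# decay from 1.0 to the 0.2 floor is ceil(ln 0.2 / ln 0.99) = 161.
episodes_to_floor = math.ceil(math.log(0.2) / math.log(0.99))
```

With these values the exploration phase spans roughly the first 161 episodes, after which the agent acts greedily 80% of the time.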