Table 1 The Q algorithm.

From: The analysis of educational informatization management learning model under the internet of things and artificial intelligence

Input: state set S, action set A

 

Training phase:

  
 

Initialize the E and D matrices as zero matrices; set the discount factor γ = 0.8;

 

For each episode, do

  

Randomly select the initial state s0;

  

s := s0;

  

While convergence is not reached, do

Select behavior a for the current state;

Execute a;

Generate the next state sʹ;

Calculate new Q[s, a];

s := sʹ;

end while

 
 

end for

  

Use stage:

s := s0;

Determine the current optimal behavior a according to Q[s, a] = max over aʹ of Q[s, aʹ];

Execute a and generate the next state sʹ;

s := sʹ;

Iterate continuously until convergence;

Output: E and D matrices;
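
The training and use stages in the table can be illustrated with a minimal Python sketch. Several parts are assumptions and do not come from the table: the toy chain environment in `step`, the reward of 1 for reaching the last state, purely random action selection, and the update rule Q[s, a] = r + γ · max Q[sʹ, aʹ] (the table only says "Calculate new Q[s, a]"). The sketch keeps a single table `q`; how the E and D matrices relate to Q is not stated in the table, so they are not modeled here. Only γ = 0.8 is taken from the source.

```python
import random

GAMMA = 0.8  # discount factor given in the table


def step(state, action, n_states):
    """Hypothetical toy environment: a chain of states.

    Action 0 moves one state left, action 1 moves one state right.
    Reaching the last state yields reward 1 (an assumption for illustration).
    """
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward


def train(n_states=5, n_actions=2, episodes=500, max_steps=50, seed=0):
    """Training phase: fill the Q table from randomly started episodes."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # zero-initialized table
    goal = n_states - 1
    for _ in range(episodes):
        s = rng.randrange(goal)  # random non-terminal initial state s0
        for _ in range(max_steps):
            a = rng.randrange(n_actions)      # select a behavior (random here)
            s_next, r = step(s, a, n_states)  # execute it, observe next state
            # Assumed update rule: immediate reward plus discounted best next value.
            q[s][a] = r + GAMMA * max(q[s_next])
            s = s_next                        # s := s'
            if s == goal:
                break
    return q


def greedy_path(q, s0, n_states, max_steps=20):
    """Use stage: follow the action with the largest Q value from s0."""
    path = [s0]
    s = s0
    while s != n_states - 1 and len(path) <= max_steps:
        a = max(range(len(q[s])), key=lambda i: q[s][i])
        s, _ = step(s, a, n_states)
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table, 0, 5))
```

In this sketch the inner loop ends when the goal state is reached or after `max_steps` steps, which stands in for the table's "until convergence" condition; a real application would substitute its own stopping criterion and action-selection policy.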