Table 1 \(\ell _2\) norm of the deviation of the velocity from its true projection for \(t\in [0,1]\).

From: Variational multiscale reinforcement learning for discovering reduced order closure models of nonlinear spatiotemporal transport systems

| ROM | Action | RMSE (\(Re = 1200\)) | RMSE (\(Re = 1500\)) | RMSE (\(Re = 2000\)) |
|---|---|---|---|---|
| GP | – | \(11.837\times 10^{-3}\) | \(16.217\times 10^{-3}\) | \(22.809\times 10^{-3}\) |
| LMRL | \(a_t \in \{\eta _e(t)\}\) | \(4.645\times 10^{-3}\) | \(6.144\times 10^{-3}\) | \(10.092\times 10^{-3}\) |
| MMRL | \(a_t \in \{\eta _1(t), \eta _2(t),\ldots , \eta _R(t)\}\) | \(3.111\times 10^{-3}\) | \(5.258\times 10^{-3}\) | \(9.746\times 10^{-3}\) |
| VMRL | \(a_t \in \{\eta _1(t), \eta _2(t),\ldots , \eta _R(t)\}\) | \(5.341\times 10^{-3}\) | \(5.262\times 10^{-3}\) | \(8.063\times 10^{-3}\) |

  1. We note that both the snapshot data generation for POD analysis and the RL training are performed at \(Re=1000\). Here the LMRL and MMRL models use the reward function defined in Eq. (23), which utilizes the true snapshot data, whereas the VMRL model uses the reward function defined in Eq. (22), which utilizes the variational multiscale formalism. The computational time of the RL training is about 5.1 s per episode, where each episode consists of 1000 time steps.
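The tabulated RMSE values measure the pointwise deviation of the ROM velocity from its true projection over the time window. A minimal sketch of this metric is given below; the function name, the synthetic arrays, and their shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rmse_deviation(u_rom: np.ndarray, u_proj: np.ndarray) -> float:
    """Root-mean-square deviation of the ROM velocity field `u_rom`
    from its true projection `u_proj`, averaged over all space-time
    sample points (a discrete analogue of the l2-norm deviation)."""
    return float(np.sqrt(np.mean((u_rom - u_proj) ** 2)))

# Illustrative usage with synthetic space-time fields of shape
# (time steps, spatial grid points); values here are arbitrary.
rng = np.random.default_rng(0)
u_proj = rng.standard_normal((1000, 256))   # "true projection" stand-in
u_rom = u_proj + 1e-2 * rng.standard_normal((1000, 256))  # perturbed ROM
err = rmse_deviation(u_rom, u_proj)         # on the order of 1e-2
```

Under this convention, the \(\times 10^{-3}\) entries in the table correspond to deviations that are small relative to the velocity scale.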