Fig. 1: Schematic flowchart of the RL-based materials optimization framework.

The flowchart shows two alternative training approaches: an inner loop that uses a surrogate model for sample-efficient training of the RL agent (orange), and an on-the-fly loop that interacts directly with the experimental environment (green). The RL agent can be trained through either approach, or a combination of both, to optimize materials properties through iterative design decisions. The dashed blue arrow and the gray argmax formula represent the ranking of candidates using an acquisition function (α(·)) applied to model predictions, as in Bayesian Optimization.
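The argmax ranking step in the figure can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the upper-confidence-bound (UCB) form of α(·), the candidate values, and all names are assumptions chosen only to show how surrogate predictions feed an acquisition function whose argmax selects the next candidate.

```python
import numpy as np

def ucb_acquisition(mean, std, kappa=2.0):
    """Illustrative UCB acquisition alpha(.): predicted value plus an
    uncertainty bonus, trading off exploitation and exploration."""
    return mean + kappa * std

# Toy surrogate-model predictions for 5 candidate materials.
mean = np.array([0.8, 1.2, 0.5, 1.0, 0.9])   # predicted property value
std  = np.array([0.1, 0.05, 0.4, 0.3, 0.2])  # predictive uncertainty

alpha = ucb_acquisition(mean, std)
best = int(np.argmax(alpha))    # candidate proposed for the next experiment
ranking = np.argsort(-alpha)    # full ranking of candidates by alpha(.)
print(best, list(ranking))
```

Other common choices for α(·), such as expected improvement or probability of improvement, slot into the same argmax selection step.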