Fig. 5: Optimization and validation of AgBiI₄ synthesis conditions using the reinforcement learning (RL) algorithm.
From: Data-driven microstructural optimization of Ag-Bi-I perovskite-inspired materials

a Total reward as a function of training episode for the RL baseline (detailed in Fig. S7), showing how RL performance evolves over time. The reward peaks as the synthesis conditions are optimized, indicating convergence.

b Reward distribution across 1000 training episodes for the exploratory RL (described in Fig. 4d); the optimal synthesis conditions correspond to the highest reward values. A minimal sketch of how such reward curves and distributions can be plotted is given after the caption.

c Principal component analysis (PCA) visualization of all unique synthesis conditions explored by the RL agent over 1000 training episodes. Each point represents a unique condition, colored by its corresponding reward. Gray points in the background represent the full design space of 1728 possible parameter combinations, illustrating that the agent selectively explores only a subset of this space during training; a sketch of this kind of projection follows below.

d Comparison and validation of the best synthesis conditions identified by the RL baseline and the exploratory RL against the top-performing condition in the historical dataset, i.e., the condition with the largest detected average grain size. The upper panel places the synthesis conditions in the parameter space, projected via PCA fitted on the original full dataset, which contains all three compositions: AgBiI₄, Ag₂BiI₅, and Ag₃BiI₆. The lower panel presents scanning electron microscopy (SEM) images of the solar cells fabricated under the respective synthesis conditions; their average grain sizes are listed in Table 1. To allow direct visual comparison, the SEM image for the “Highest Reward in RL” condition, originally acquired at a different magnification, was cropped and rescaled to match the scale of the other two images.
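The following is a minimal sketch, not the authors' code, of how the panel-a reward curve and panel-b reward histogram could be produced from a per-episode reward log; the array episode_rewards here is synthetic placeholder data standing in for the agent's logged total rewards, and the smoothing window is an arbitrary choice.

```python
# Sketch only: panel-a-style reward curve and panel-b-style reward histogram
# from a per-episode reward log. `episode_rewards` is synthetic placeholder data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Placeholder: 1000 episodes of noisy rewards that improve over training.
episode_rewards = np.tanh(np.arange(1000) / 300) + rng.normal(0, 0.2, 1000)

fig, (ax_a, ax_b) = plt.subplots(1, 2, figsize=(9, 3.5))

# Panel-a analogue: total reward vs. episode, smoothed with a moving average
# so the convergence plateau is visible through episode-to-episode noise.
window = 25
smoothed = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")
ax_a.plot(episode_rewards, alpha=0.3, label="per episode")
ax_a.plot(np.arange(window - 1, len(episode_rewards)), smoothed,
          label=f"{window}-episode mean")
ax_a.set(xlabel="Episode", ylabel="Total reward")
ax_a.legend()

# Panel-b analogue: distribution of rewards across all training episodes;
# the optimal synthesis conditions sit in the right-hand tail.
ax_b.hist(episode_rewards, bins=40)
ax_b.set(xlabel="Total reward", ylabel="Episode count")

plt.tight_layout()
plt.show()
```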
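For panel c, the sketch below shows one way such a PCA projection of a discrete design space can be built with scikit-learn. The parameter names, levels, visited indices, and rewards are all hypothetical placeholders (chosen only so the grid contains 1728 combinations, 12 × 12 × 12); the paper's actual synthesis parameters and rewards differ.

```python
# Sketch only: PCA projection of a discrete synthesis design space, with the
# RL-visited subset overlaid and colored by reward, as in panel c.
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical parameter grids; 12 levels each gives 12^3 = 1728 combinations.
temperature = np.linspace(60, 170, 12)     # placeholder annealing temps (C)
concentration = np.linspace(0.2, 1.3, 12)  # placeholder precursor conc. (M)
spin_speed = np.linspace(1000, 6500, 12)   # placeholder spin speeds (rpm)

# Full design space: every combination of the three placeholder parameters.
design_space = np.array(list(itertools.product(temperature, concentration, spin_speed)))
assert len(design_space) == 1728

X = StandardScaler().fit_transform(design_space)  # put parameters on one scale
coords = PCA(n_components=2).fit_transform(X)     # 2-D projection for plotting

# Stand-in for the agent's log: indices of visited conditions and their rewards.
rng = np.random.default_rng(1)
visited = rng.choice(len(design_space), size=300, replace=False)
rewards = rng.random(300)  # placeholder rewards; the paper's derive from grain size

# Gray background = full design space; colored points = conditions the agent tried.
plt.scatter(coords[:, 0], coords[:, 1], c="lightgray", s=8, label="full design space")
sc = plt.scatter(coords[visited, 0], coords[visited, 1], c=rewards, s=16, cmap="viridis")
plt.colorbar(sc, label="reward")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```

Standardizing the parameters before PCA keeps the projection from being dominated by whichever parameter has the largest numeric range (here, spin speed).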