Fig. 4: The reward trajectory during the training process in the lower level.
From: Adaptive hierarchical learning for uncertainty-aware distributed energy resource planning

The solid blue line shows the mean reward achieved during training, while the surrounding light blue band denotes the standard deviation or confidence interval, reflecting the variability and stability of the learning process across multiple trials.
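
As an illustration of how such a curve and band could be computed from several training runs, the following minimal sketch assumes the rewards are stored as an array with one row per trial and one column per episode, and that the band is one standard deviation wide. The function name `reward_band`, the data layout, and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np

def reward_band(rewards):
    """Return the mean curve and a +/- one standard deviation band.

    rewards: array-like of shape (n_trials, n_episodes), a hypothetical layout.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=0)   # average over trials at each episode
    std = rewards.std(axis=0)     # population standard deviation (ddof=0)
    return mean, mean - std, mean + std

# Made-up rewards for two trials of three episodes each (illustration only).
trials = [[1.0, 2.0, 3.0],
          [3.0, 4.0, 5.0]]
mean, lower, upper = reward_band(trials)
```

Here `mean` would be drawn as the solid line and the region between `lower` and `upper` as the shaded band. If the band were a confidence interval instead, the half-width would be computed differently, for example from the standard error of the mean.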