This work introduces a bilevel optimization framework that discovers optimal reward functions for embodied reinforcement learning agents by minimizing regret. The approach accelerates policy optimization and improves adaptability across diverse tasks.
- Renzhi Lu
- Zonghe Shao
- Hai-Tao Zhang
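As a rough illustration of the bilevel idea, the sketch below uses a toy multi-armed bandit rather than an embodied agent, and it is not the authors' method. It assumes the following: a learnable per-arm reward vector `theta`; an inner level that runs a few exact softmax policy-gradient steps on `theta`; and regret defined as the gap between the best true arm value and the expected true value of the resulting policy. The outer level lowers that regret with a finite-difference gradient, chosen here only for simplicity. All function names (`inner_policy`, `regret`, `outer_update`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inner_policy(theta, steps=5, lr=0.5):
    """Inner level (hypothetical): a few exact policy-gradient steps on the learned reward theta."""
    logits = [0.0] * len(theta)
    for _ in range(steps):
        pi = softmax(logits)
        baseline = sum(p * r for p, r in zip(pi, theta))
        # Gradient of E_pi[theta] w.r.t. softmax logits is pi_a * (theta_a - baseline).
        logits = [z + lr * p * (r - baseline) for z, p, r in zip(logits, pi, theta)]
    return softmax(logits)


def regret(theta, true_reward):
    """Gap between the best true arm value and the true value of the inner-level policy."""
    pi = inner_policy(theta)
    return max(true_reward) - sum(p * r for p, r in zip(pi, true_reward))


def outer_update(theta, true_reward, lr=1.0, eps=1e-4):
    """Outer level (hypothetical): one finite-difference gradient-descent step on regret."""
    grad = []
    for i in range(len(theta)):
        up = theta[:]
        up[i] += eps
        down = theta[:]
        down[i] -= eps
        grad.append((regret(up, true_reward) - regret(down, true_reward)) / (2 * eps))
    return [t - lr * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    true_reward = [0.1, 0.5, 0.9]  # assumed task reward, hidden from the inner level
    theta = [0.0, 0.0, 0.0]        # learned reward starts uninformative
    print("initial regret:", regret(theta, true_reward))
    for _ in range(50):
        theta = outer_update(theta, true_reward)
    print("final regret:", regret(theta, true_reward))
```

Because the inner policy is recomputed from scratch for every candidate `theta`, the outer loop is optimizing the reward for how well a short, fixed training budget performs under it. This is one simple way to read "accelerating policy optimization" in a bilevel setting. A practical system would replace the finite differences with implicit or unrolled gradients and the bandit with an environment.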