Bounded Ratio Reinforcement Learning
arXiv:2604.18578v1 Announce Type: new
Abstract: Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significan…