Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO
arXiv:2604.13517v1 Announce Type: cross
Abstract: Temporal credit assignment in reinforcement learning has long been a central challenge. Inspired by the multi-timescale encoding of the dopamine system in neurobiology, recent research has sought to in…