cs.AI, cs.LG

Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

arXiv:2604.13517v1 Announce Type: cross
Abstract: Temporal credit assignment has long been a central challenge in reinforcement learning. Inspired by the multi-timescale encoding of the dopamine system in neurobiology, recent research has sought to in…