StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning
arXiv:2604.08620v2
Abstract: Reinforcement learning is typically treated as a uniform, data-driven optimization process, where updates are guided by rewards and temporal-difference errors without explicitly exploiting global str…