Zhancun Mu, Guangyu Zhao, Yiwu Zhong, Chi Zhang

Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

Zhancun Mu, Guangyu Zhao, Yiwu Zhong, Chi Zhang / April 27, 2026

arXiv:2604.22229v1 Announce Type: cross
Abstract: One-step offline RL actors are attractive because they avoid backpropagating through long iterative samplers and keep inference cheap, but they still have to improve under a critic without drifting awa…
