EXPO: Stable Reinforcement Learning with Expressive Policies
arXiv:2507.07986v3
Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a uniq…