Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism
arXiv:2512.04341v3 Announce Type: replace
Abstract: Popular offline reinforcement learning (RL) methods rely on explicit conservatism, penalizing out-of-dataset actions or restricting rollout horizons. We question the universality of this principle an…