cs.AI, cs.LG

VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning

arXiv:2504.11944v3 Announce Type: replace-cross
Abstract: Offline reinforcement learning (RL) learns effective policies from pre-collected datasets, offering a practical solution for applications where online interactions are risky or costly. Model-ba…