cs.LG

Peng’s Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

arXiv:2605.14779v1 Announce Type: new
Abstract: We propose a model-free offline multi-step reinforcement learning (RL) algorithm, Conservative Peng’s Q($\lambda$) (CPQL). Our algorithm adapts Peng’s Q($\lambda$) (PQL) operator for conservative val…
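The abstract is cut off before it says how CPQL makes the estimate conservative. The sketch below therefore covers only the standard Peng's Q($\lambda$) multi-step target that the PQL operator is built on. It is not the paper's CPQL. It assumes a discrete action space, a critic that returns one Q-value per action for each next state, and a single logged trajectory segment. The function name `pengs_q_lambda_targets` and its arguments are hypothetical. The target follows the usual backward recursion $G_t^\lambda = r_t + \gamma\big[(1-\lambda)\max_a Q(s_{t+1},a) + \lambda G_{t+1}^\lambda\big]$, with no bootstrapping past a terminal state.

```python
from typing import List, Optional, Sequence


def pengs_q_lambda_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float,
    lam: float,
) -> List[float]:
    """Hypothetical helper: standard Peng's Q(lambda) targets for one segment.

    rewards[t]       : reward r_t observed after taking a_t in s_t
    next_q_values[t] : Q(s_{t+1}, a) for every discrete action a (assumed critic output)
    dones[t]         : True if s_{t+1} is terminal, so there is nothing to bootstrap
    Note: this is plain PQL, not the conservative CPQL variant from the paper.
    """
    T = len(rewards)
    targets = [0.0] * T
    g_next: Optional[float] = None  # lambda-return of step t+1; None past the segment end
    for t in reversed(range(T)):
        if dones[t]:
            # Terminal transition: the return is just the immediate reward.
            g = rewards[t]
        else:
            greedy_bootstrap = max(next_q_values[t])
            if g_next is None:
                # Last step of a truncated segment: use the one-step Q-learning target.
                mixed = greedy_bootstrap
            else:
                # Peng's Q(lambda): blend the greedy bootstrap with the longer
                # lambda-return; no importance-sampling ratios are applied.
                mixed = (1.0 - lam) * greedy_bootstrap + lam * g_next
            g = rewards[t] + gamma * mixed
        targets[t] = g
        g_next = g
    return targets


if __name__ == "__main__":
    r = [1.0, 2.0, 3.0]
    nq = [[0.0, 4.0], [1.0, 0.0], [2.0, 2.0]]
    d = [False, False, False]
    print(pengs_q_lambda_targets(r, nq, d, gamma=0.5, lam=0.5))
```

With $\lambda = 0$ the recursion reduces to one-step Q-learning targets $r_t + \gamma \max_a Q(s_{t+1}, a)$. With $\lambda = 1$ it becomes the discounted return along the logged segment, bootstrapped only at its end. Unlike Retrace or Tree-Backup, Peng's Q($\lambda$) applies no importance-sampling or trace-cutting corrections, so the multi-step target mixes behaviour-policy returns with greedy bootstraps.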