Byeongchan Kim, Min-hwan Oh

Peng’s Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

May 15, 2026

arXiv:2605.14779v1
Abstract: We propose a model-free offline multi-step reinforcement learning (RL) algorithm, Conservative Peng’s Q($\lambda$) (CPQL). Our algorithm adapts the Peng’s Q($\lambda$) (PQL) operator for conservative val…
