cs.AI, cs.LG

Reinforcement Learning via Value Gradient Flow

arXiv:2604.14265v1
Abstract: We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent …