cs.AI, cs.LG

SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data

arXiv:2605.05863v1 Announce Type: new
Abstract: Incorporating prior data into online reinforcement learning accelerates training but typically forces a difficult trade-off between high computational costs and long, multi-stage training pipelines. Whil…