Dillon Sandhu, Ronald Parr

Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL

Dillon Sandhu, Ronald Parr / May 8, 2026

arXiv:2605.05481v1 Announce Type: new
Abstract: We revisit a classic “chicken-and-egg” problem in reinforcement learning: to safely improve a policy, the value function must be accurate on the state-visitation distribution of the updated policy. That …
