Nima H. Siboni - Provide.ai

On the “Causality” Step in Policy Gradient Derivations: A Pedagogical Reconciliation of Full Return and Reward-to-Go

Nima H. Siboni / April 7, 2026

arXiv:2604.04686v1 (announce type: new)
Abstract: In introductory presentations of policy gradients, one often derives the REINFORCE estimator using the full trajectory return and then states, by “causality,” that the full return may be replaced by th…
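
To make the claim in the abstract concrete, here is a minimal numerical sketch. It is not taken from the paper. It assumes a toy problem invented for illustration: a single state, two actions, a horizon of two steps, and a sigmoid policy with one scalar parameter `theta`. The function name `expected_estimators` and the reward values are hypothetical. The code enumerates every trajectory and computes the exact expectation of the REINFORCE estimator twice, once weighted by the full return and once weighted by the reward-to-go.

```python
import itertools
import math


def expected_estimators(theta, rewards=(1.0, 3.0), horizon=2):
    """Exact expected policy-gradient estimates for a toy one-state problem.

    Assumed setup (illustrative only): pi(a=1) = sigmoid(theta), and the
    reward at each step depends only on the action taken at that step.
    """
    p1 = 1.0 / (1.0 + math.exp(-theta))
    probs = (1.0 - p1, p1)
    # Score function d/dtheta log pi(a) for a = 0 and a = 1.
    score = (-p1, 1.0 - p1)

    full = 0.0  # estimator weighted by the full trajectory return
    rtg = 0.0   # estimator weighted by the reward-to-go from step t
    for actions in itertools.product((0, 1), repeat=horizon):
        prob = math.prod(probs[a] for a in actions)
        r = [rewards[a] for a in actions]
        total = sum(r)
        full += prob * sum(score[a] * total for a in actions)
        rtg += prob * sum(score[a] * sum(r[t:]) for t, a in enumerate(actions))
    return full, rtg


full, rtg = expected_estimators(0.3)
print(abs(full - rtg) < 1e-12)
```

In this toy setting the two expectations agree because each dropped term pairs the score at a later step with a reward from an earlier step, and the score has zero mean under the policy. The per-sample estimates still differ, so the sketch says nothing about variance.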
