Beyond Importance Sampling: Rejection-Gated Policy Optimization
arXiv:2604.14895v1 Announce Type: new
Abstract: We propose a new perspective on policy optimization: rather than reweighting all samples by their importance ratios, an optimizer should select which samples are trustworthy enough to drive a policy upda…
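
To make the contrast in the abstract concrete, here is a minimal sketch of the two ideas: reweighting every sample by its importance ratio, versus gating samples so that only some contribute to the update. It is not the paper's algorithm, which the truncated abstract does not specify. The function names, the tolerance `eps`, and the acceptance rule (keep a sample when its ratio lies within `eps` of 1) are all assumptions chosen for illustration.

```python
import math


def importance_weighted_objective(logp_new, logp_old, advantages):
    """Importance-sampling surrogate: every sample contributes,
    weighted by its ratio pi_new(a|s) / pi_old(a|s)."""
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    return sum(r * a for r, a in zip(ratios, advantages)) / len(advantages)


def gated_objective(logp_new, logp_old, advantages, eps=0.2):
    """Hypothetical hard gate (an assumption, not the paper's rule):
    accept a sample only if its ratio lies in [1 - eps, 1 + eps],
    then average the surrogate over accepted samples only.

    Returns (objective_value, accepted_mask)."""
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    accepted = [abs(r - 1.0) <= eps for r in ratios]
    kept = [r * a for r, a, ok in zip(ratios, advantages, accepted) if ok]
    # With no accepted samples there is nothing to update on.
    value = sum(kept) / len(kept) if kept else 0.0
    return value, accepted


if __name__ == "__main__":
    # Three samples; the second has drifted far from the behavior policy.
    logp_old = [0.0, 0.0, 0.0]
    logp_new = [0.0, math.log(3.0), math.log(1.1)]
    advantages = [1.0, 1.0, -1.0]

    print(importance_weighted_objective(logp_new, logp_old, advantages))
    print(gated_objective(logp_new, logp_old, advantages))
```

In this toy input the sample with ratio 3 dominates the importance-weighted estimate, while the gate drops it entirely. Whether the paper uses a hard threshold, a stochastic acceptance test, or some other criterion cannot be determined from the text above.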