Qi Zhang - Provide.ai

Positive-Only Drifting Policy Optimization

Qi Zhang / April 21, 2026

arXiv:2604.16519v1 Announce Type: cross
Abstract: In the field of online reinforcement learning (RL), traditional Gaussian policies and flow-based methods are often constrained by their unimodal expressiveness, complex gradient clipping, or stringent …
