Positive-Only Drifting Policy Optimization
arXiv:2604.16519v1 Announce Type: cross
Abstract: In online reinforcement learning (RL), traditional Gaussian policies and flow-based methods are often constrained by unimodal expressiveness, complex gradient clipping, or stringent …