Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling
arXiv:2604.09159v1 Announce Type: new
Abstract: Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its conventional Gaussian policy parameterization is inherently unimodal, limiting its a…