Jincheng Mei, Ian Osband

Delightful Gradients Accelerate Corner Escape

Jincheng Mei, Ian Osband / May 13, 2026

arXiv:2605.11908v1 Announce Type: new
Abstract: Softmax policy gradient converges at $O(1/t)$, but its transient behavior near sub-optimal corners of the simplex can be exponentially slow. The bottleneck is self-trapping: negative-advantage actions re…
