cs.CV

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

arXiv:2605.14278v1 Announce Type: new
Abstract: Aligning streaming autoregressive (AR) video generators with human preferences is challenging. Existing reinforcement learning methods predominantly rely on noise-based exploration and SDE-based surrogat…