Fairoz Nower Khan, Nabuat Zaman Nahim, Peizhong Ju

Discrete Flow Matching for Offline-to-Online Reinforcement Learning

Fairoz Nower Khan, Nabuat Zaman Nahim, Peizhong Ju / May 13, 2026

arXiv:2605.12379v1 Announce Type: new
Abstract: Many reinforcement learning (RL) tasks have discrete action spaces, but most generative policy methods based on diffusion and flow matching are designed for continuous control. Meanwhile, generative poli…

