Polychromic Objectives for Reinforcement Learning
arXiv:2509.25424v4
Abstract: Reinforcement learning fine-tuning (RLFT) is a dominant paradigm for improving pretrained policies for downstream tasks. These pretrained policies, trained on large datasets, produce generation…