When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
arXiv:2603.05659v2 Announce Type: replace
Abstract: Reinforcement learning with verifiable rewards (RLVR) and Rubrics as Rewards (RaR) have driven strong gains in domains with clear correctness signals and even in subjective domains by synthesizing ev…