PARM: Pipeline-Adapted Reward Model
arXiv:2604.18327v1 Announce Type: cross
Abstract: Reward models (RMs) are central to aligning large language models (LLMs) with human preferences, powering RLHF and advanced decoding strategies. While most prior work focuses on single-step generation,…